[nycphp-talk] email system for website
Paul A Houle
paul at devonianfarm.com
Mon Jan 4 12:16:11 EST 2010
Matt Juszczak wrote:
> Paul,
>
>> Sure, but I'd also send "high priority" and "low priority" emails
>> through separate systems (sendmail/postfix/whatever instances.)
>
> Well, what's to stop me from using the same database table for high
> priority and low priority, but having the high priority background
> process continuously loop to check for new items in the queue?
>
> Even if I have the webs send out high priority directly, like you
> said, that could cause some damage.
>
> Perhaps I could create a centralized relay server that the webs use
> for high priority, and have the webs send out high priority mail. But
> at that point, I unfortunately have to duplicate the high priority
> mail code =(
>
Ok, you're talking about two sorts of queue here.
(i) there's the queue of an SMTP-compliant mailer (I assume), and
(ii) there's a queue that you're maintaining of messages you want to
send; this might be a set of database rows, one per message, and
you're doing a "mail merge" process to fill in a template and push
messages gradually into queue (i).
Presumably you've got some rate control on (ii), and the mail merge
process is watching the length of the queue in (i) (and maybe some other
variables), so that you can control the load on the SMTP server.
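A rough sketch of that feedback loop, in Python for brevity. The names batch_size, max_mta_queue and batch_limit are made up for illustration, and how you actually read the length of queue (i) depends on your mailer, so treat this as an outline rather than a finished implementation.

```python
def batch_size(mta_queue_len, max_mta_queue=500, batch_limit=50):
    """Return how many messages to move from queue (ii) into queue (i).

    mta_queue_len: messages currently sitting in the mailer's own queue.
    max_mta_queue: hypothetical ceiling the mailer's queue should stay under.
    batch_limit:   hypothetical cap on messages pushed per pass (rate control).
    """
    headroom = max_mta_queue - mta_queue_len
    if headroom <= 0:
        # The mailer is already backed up; push nothing on this pass.
        return 0
    return min(headroom, batch_limit)
```

The mail merge process would call this on each pass, pull that many rows from the database, hand them to the mailer, and then sleep before checking again.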
One trouble with this is that a certain fraction of mail takes a
long time to deliver; an SMTP-compliant mailer will keep retrying a
message for several days (up to seven with some mailers' defaults)
before giving up. If you're sending enough mail, you're eventually
going to get a large "plug" of stuck messages that the mail server
keeps trying to deliver and re-deliver. Ultimately this burns up
resources on the mail server, which will impact other things running on
that machine, such as the high-priority mail you want to send.
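To get a feel for how big the plug can get, here is a back-of-the-envelope estimate in Python. The function name and the numbers are purely illustrative assumptions, not measurements: at steady state the plug is roughly (messages per day) x (fraction that gets stuck) x (days the mailer keeps retrying).

```python
def stuck_plug_size(sent_per_day, stuck_fraction, retry_days):
    """Estimate the steady-state number of stuck messages in the mailer's queue.

    Each day adds sent_per_day * stuck_fraction undeliverable messages,
    and each of them lingers for retry_days before the mailer gives up.
    """
    return round(sent_per_day * stuck_fraction * retry_days)

# Illustrative numbers only: 100,000 messages a day, 2% stuck, 7 days of retries.
print(stuck_plug_size(100_000, 0.02, 7))  # prints 14000
```

Even a small stuck fraction adds up, and every one of those messages gets retried over and over for as long as it sits in the queue.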
Now, process (ii) can certainly stop putting messages in (i) once
the "plug" of stuck messages reaches a significant size, but that's
going to really slow down the bulk mail.
The main factor in mail server performance is the effect of fsync()
calls on a mechanical disk. Mail delivery events really ought to be
transactional, since you don't want to deliver mail twice or fail to
deliver it, so the mailer has to sync its queue to disk as it goes.
fsync() doesn't (honestly) return until a chunk of metal moves to a
certain place. The bottleneck isn't so much "you can do so many a
second" as it is a systemwide lock, since there can be multiple
processes trying to fsync() at once, such as syslogd or a database
server that's committing a transaction. The traffic jam can get backed
up, since other processes can be waiting for the first process to
complete, can be holding more locks and so forth... So you end up with
a performance bottleneck that's a real pain to understand... It might
even take you 5 minutes to get to a shell prompt when you ssh in.
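If you want to see what an fsync() costs on a particular box, a small Python sketch like this one gives a rough number. The function name and defaults are made up for illustration; pass a directory on the disk you actually care about, since the default temp directory may be a RAM-backed filesystem where fsync() is nearly free.

```python
import os
import tempfile
import time

def time_fsyncs(count=20, payload=b"x" * 512, directory=None):
    """Write small records to a temp file, calling fsync() after each one,
    and return the average seconds per write+fsync."""
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        start = time.perf_counter()
        for _ in range(count):
            f.write(payload)
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # ask the OS to push it to the disk
        elapsed = time.perf_counter() - start
    return elapsed / count

if __name__ == "__main__":
    print(f"{time_fsyncs() * 1000:.2f} ms per fsync'd write")
```

This only measures one process on an otherwise quiet disk; the pile-up described above shows up when several processes are doing the same thing at once.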
If you're on a virtual server, you may (or may not) have somebody
else doing a lot of fsync() calls, in which case your performance could
be hosed for reasons outside your observation and control. Just as
likely, the host will be programmed to return from fsync() before
the sync is actually done, which means someday you're going to have a
big database wreck...
For $200 a month you can rent a server that will do a great job
delivering email. Or you can spend 10-100x that trying to figure
problems out.