
RE: Ok, I'll show my ignorance...please help...?



> We're running Cyrus IMAP on Linux now, and are really happy with it.  I'm
> not sure to what extent it's a memory hog, but since it is an inetd service
> there's one process per connection, so it's not exactly lightweight.
> Regardless, it seems to be doing pretty well with 20-50 user connections at
> any given time.

Yes, I have used Cyrus.  I used it on my home server for over a year.  When
my inbox exceeded 32,768 messages Netscape wouldn't scroll the messages list
properly, but the server was still OK.  However, it stores each email in a
separate file, and so it soon became impossible to even do an ls in the mail
directory.  (I tried once - I believe I left it for a few hours and came
back and it was still thrashing the hard drive to death.)  But with all that
email, I believe Cyrus took up a few tens of megs of memory when a
connection was active.  That caused a lot of swapping on that system (I had
20 megs in it at the time).  Multiple users with that much mail would have
been devastating.  Also, now and then Netscape would spend quite a bit of
time "compressing folders" (and the server seemed to be involved in this),
but I always wondered how much compression was really possible with each
mail in a separate file.  Maybe it was really just preening its indices or
something.

I quit using it for several reasons - I wanted to upgrade the Linux
distribution on that system, and when I re-installed Cyrus and just dumped
all that old mail back into its directories, it crashed.  And, I decided it
was going to be hard to manage all those separate messages; ideally I'd like
a way to move them into some kind of archive when they get old enough, but
only Cyrus knows how to deal with the mess it creates (numbered files for
each message, and indices to keep track of headers and ordering (I guess
that's what it keeps in them)).  Ideally I'd like Cyrus to be able to manage
all the mail I have ever received and not deleted (I could put a year's
worth into a folder so the main inbox doesn't get so huge), but it doesn't
scale that well.  Failing that, I'd like to put the mail into a web-based
archive, but I haven't done enough research to find out if the existing
tools for constructing such things can deal with Cyrus folders or if I'd
have to cat all the messages together to get back to something resembling a
normal Unix mail folder.  So in the meantime I tarred up the cyrus folders
and put the tarball on a CD (saving several hundred megs of hd space) for
whenever I get back to the problem, and nowadays I use mutt and just move
the mailbox aside, gzip it and start a new one when it gets unwieldy.  This
is of course rather suboptimal.
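
The "cat all the messages together" step can be sketched in Python with the
standard mailbox module.  This is only a rough sketch under assumptions: it
supposes each message is a file named with a number followed by a dot ("1.",
"2.", ...), and the function name cyrus_dir_to_mbox is made up for
illustration.  Anything else in the directory, such as index files, is
skipped.

```python
import mailbox
from pathlib import Path


def cyrus_dir_to_mbox(src_dir, mbox_path):
    """Append every numbered message file in src_dir to a Unix mbox file.

    Assumption: messages are stored one per file, named "<number>."
    Returns the number of messages copied.
    """
    files = [
        p for p in Path(src_dir).iterdir()
        if p.is_file() and p.name.endswith(".") and p.name[:-1].isdigit()
    ]
    # Sort numerically so that "10." comes after "2.".
    files.sort(key=lambda p: int(p.name[:-1]))

    # Assumes nothing else is writing to mbox_path, so no locking is done.
    box = mailbox.mbox(mbox_path)
    try:
        for path in files:
            # mailbox.mbox writes the "From " separator line itself and
            # escapes body lines that begin with "From ".
            box.add(path.read_bytes())
    finally:
        box.close()  # flushes pending changes to disk
    return len(files)
```

The resulting file can then be read by mutt or handed to an archiving tool
that expects an ordinary Unix mail folder.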

On the other hand, the imap server that comes with Debian (which I believe
is based on the U.Wash. sources) keeps mail in conventional Unix mail files.
That makes it easier to manage.  But, I tried to use it on the mail server
at work for a while, and curiously, the amount of memory it took up when
there was an active connection seemed to be about the same as the size of
the mail file.  Why it
couldn't just index the headers in memory rather than loading the entire
mail folder I have no idea.  And it can't manage folders worth a darn; it
assumes everything in your home directory is mail-related, so in Netscape I
got all the files in my home directory showing up as a huge collection of
mail folders.  The idea would work if they had just implemented it better
(like, use elm's ~/Mail/received for the inbox, and other files and
directories under Mail for the other IMAP folders).
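
To illustrate what indexing only the headers might look like, here is a
minimal sketch in Python.  It is not how the U.Wash. server works; it is a
hypothetical index_mbox function that reads a Unix mail file one line at a
time and keeps only a byte offset and the Subject for each message, so memory
use follows the number of messages rather than the size of the file.  It
assumes a well-formed mbox (a blank line before each "From " separator) and
ignores folded header lines.

```python
def index_mbox(path):
    """Return a list of (byte_offset, subject) pairs, one per message.

    The file is streamed line by line; message bodies are never kept.
    """
    index = []
    in_headers = False
    prev_blank = True  # the start of the file counts as "after a blank line"
    offset = 0
    with open(path, "rb") as f:
        for line in f:
            blank = line in (b"\n", b"\r\n")
            if prev_blank and line.startswith(b"From "):
                # Separator line: a new message starts at this offset.
                index.append([offset, b""])
                in_headers = True
            elif in_headers:
                if blank:
                    in_headers = False  # end of this message's headers
                elif line.lower().startswith(b"subject:"):
                    index[-1][1] = line[len(b"subject:"):].strip()
            prev_blank = blank
            offset += len(line)
    return [(off, subj.decode("latin-1")) for off, subj in index]
```

A client request for one message could then seek to the stored offset and
read just that message, instead of holding the whole folder in memory.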
>
> Why would they scale worse with more mail?  Just curious.  Interesting

There's practically always going to be more of something in memory on the
client and/or the server for more emails.  The idea would be to get it
scaling much less than linearly.  I suppose a really simplified textual
client could only have as many headers in memory as it is displaying on one
screenful of listing, but then searches would be slow.  I've wondered if
IMAP allows for windowing techniques like that or if the protocol assumes
that the mapping from a numerical mail ID to the actual mail exists and is
consistent on both the client and server.  If that is true (as I suspect)
then the server must maintain this mapping somewhere.  It could be done on
disk but for speed you want it in memory.
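
For what it's worth, the IMAP4rev1 FETCH command does take a range of message
sequence numbers, so a windowing client seems possible in principle.  Below is
a minimal sketch in Python, assuming an imaplib.IMAP4 connection with a
mailbox already selected.  The helper names header_window and
fetch_visible_headers, and the choice of header fields, are made up for
illustration.

```python
# Header fields to request; BODY.PEEK avoids marking messages as seen.
FETCH_ITEMS = "(BODY.PEEK[HEADER.FIELDS (FROM SUBJECT DATE)])"


def header_window(total_messages, first_visible, rows):
    """Return the IMAP message set for the rows on screen, or None.

    Sequence numbers run from 1 to total_messages, so the window is
    clamped to that range.
    """
    if total_messages <= 0 or rows <= 0 or first_visible > total_messages:
        return None
    first = max(1, first_visible)
    last = min(total_messages, first + rows - 1)
    return f"{first}:{last}"


def fetch_visible_headers(conn, total_messages, first_visible, rows):
    """Fetch headers only for the visible window.

    conn is assumed to be an imaplib.IMAP4 object (or anything with a
    compatible fetch method) with a mailbox already selected.
    """
    msg_set = header_window(total_messages, first_visible, rows)
    if msg_set is None:
        return []
    status, data = conn.fetch(msg_set, FETCH_ITEMS)
    return data if status == "OK" else []
```

With 100 messages and a 20-line screen scrolled to message 41, the client
would ask only for "41:60".  The server still has to map those numbers to
actual messages, which is the mapping discussed above.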

> though, since it actually wouldn't be terribly hard to get Cyrus [I'm not
> familiar with Washington's implementation] to store mail in an RDB as
> opposed to the filesystem.  Neat project.

That's an idea.  And how well would an RDB scale if you stick a lifetime of
email in it?  How well do databases map onto hierarchical storage management
systems?  I figure that's where this is headed - if I really do keep all my
email then I'm going to want to move the old ones onto CDs which will be
rarely accessed, yet still have it transparent to the client(s).  I just got
a CD changer and I'm trying to figure out how to write a driver for it.
(That's another story... the short of it is that it is supported by iXOS
jukeman, which is probably expensive, and I need to capture some of the
serial control data being sent back and forth so I can try to
reverse-engineer the command language.)  At least plaintext Unix-style mail
folders will probably still be decipherable by some means 20 years from now,
and particular RDB file formats may not.
>
> In general, though, LDAP servers aren't really appropriate for storing
> mail.  Based on what I've read, it's more along the lines of maintaining
> indices (small pieces of data which get a ton of reads and few writes) --
> so it's kind of like DNS.

Yeah.  Funny thing though, wasn't the relational model touted as being
superior to the hierarchical model way back when?  And now there is this
trend back to hierarchical again.  If only dealing with it wasn't so
tedious; why couldn't the tools look more like RegEdit?

AFAICT each db-file is just a map from one thing to another.  There are
several of those, indexing various aspects of the tree.  So it's very fully
indexed and this is why reads are fast and writes are slow.
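
A toy model of that trade-off, in Python.  This is not the server's actual
on-disk layout; the class and map names are illustrative only.  Each dict
stands in for one db-file.  A search on an indexed attribute is a single map
lookup, while adding one entry has to update every map.

```python
class TinyDirectory:
    """Toy directory store: one dict per 'db-file' (illustrative only)."""

    def __init__(self, indexed_attrs):
        self.id2entry = {}  # entry id -> (dn, attributes)
        self.dn2id = {}     # normalized dn -> entry id
        # One map per indexed attribute: value -> set of entry ids.
        self.attr_index = {name: {} for name in indexed_attrs}
        self.next_id = 1

    def add(self, dn, attrs):
        """Store an entry; return how many map updates it took."""
        entry_id = self.next_id
        self.next_id += 1
        self.id2entry[entry_id] = (dn, attrs)
        self.dn2id[dn.lower()] = entry_id
        writes = 2
        for name, index in self.attr_index.items():
            for value in attrs.get(name, []):
                index.setdefault(value.lower(), set()).add(entry_id)
                writes += 1
        return writes

    def search(self, attr, value):
        """Look up entries by an indexed attribute: one map read."""
        ids = self.attr_index[attr].get(value.lower(), set())
        return [self.id2entry[i][0] for i in sorted(ids)]
```

Adding an entry with two indexed attributes costs four map updates here, and
the count grows with every extra index, while each search stays a single
lookup.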