RE: indexing & hardware questions -- again



All, thanks for responding. Here are some follow-up
clarifications/remarks:

> >Could somebody explain what information is put in the index files when
> >substring indexes are generated? Or, more generally, what OpenLDAP's
> >indexing strategy is?
> 
> I don't know the answer to that one.

If anybody knows more about this, I'd be curious to hear the details.

> >I'm also concerned about the sheer size of the index files -- generating
> >"sub" indices increases the size of the index files by an order of
> >magnitude. My directory is quite large -- about 1,000,000 entries --
> >so I have to make sure that the machine running LDAP can handle the
> >memory requirements, which are in part determined by options like
> >dbcachesize.
> >Does anybody have some rough metrics for memory usage, e.g. with X
> >number of entries, Y number of attributes per entry, Z indexes on
> >these attributes, etc?
> 
> What is the "average" object size?  Are you storing binary information
> (photos, etc.) in the directory?  I think the backend you use in this case
> is the most important component (in addition to sufficient resources).

All of the information is plain text, but there is still a lot of data --
e.g. I believe that the LDIF could be about 1 GB (based on a sample LDIF of
1.5 million entries).
Add the index files to this, and you can see why I'm worried about memory
usage.
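
To make the worry concrete, here is a rough back-of-the-envelope sketch in
Python. It assumes that the 1 GB figure corresponds to the 1.5 million entry
sample, and the index_factor value is a pure placeholder (I have no measured
number for it), so treat the output as an illustration rather than a metric:

```python
def estimate_bytes(ldif_bytes, sample_entries, target_entries, index_factor):
    """Scale a sample LDIF size to a target entry count and add index overhead.

    index_factor is a guess at index bytes per byte of entry data; it is an
    assumption for illustration, not a measured OpenLDAP value.
    """
    per_entry = ldif_bytes / sample_entries      # average bytes per entry
    data = per_entry * target_entries            # entry data for the target size
    indexes = data * index_factor                # hypothetical index overhead
    return per_entry, data, indexes, data + indexes


if __name__ == "__main__":
    per_entry, data, indexes, total = estimate_bytes(
        ldif_bytes=1_000_000_000,    # ~1 GB sample LDIF (assumed)
        sample_entries=1_500_000,    # entries in the sample
        target_entries=1_000_000,    # entries in my directory
        index_factor=1.0,            # placeholder, replace with a measured ratio
    )
    print(f"average entry size: {per_entry:.0f} bytes")
    print(f"entry data:         {data / 1e6:.0f} MB")
    print(f"index files:        {indexes / 1e6:.0f} MB (assumed factor)")
    print(f"total on disk:      {total / 1e6:.0f} MB")
```

If somebody has a measured ratio of index size to data size for "eq" versus
"sub" indices, plugging it in as index_factor would give a far better
estimate than my placeholder.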

 
> >One option that I'm considering is to maintain different indices on
> >different slave directories. This way, performance on the server used
> >by my web application can be optimized for the search filters used by the
> >web application, and performance on the server used by the service and
> >support staff can degrade a bit more...
> >Has anybody else tried this? Or do you have a better suggestion?
> 
> I haven't tried this.  Sounds like an interesting idea.

Anybody? (To preclude references to the FAQ, I understand that you
should index based on both the attributes that are searched on and the
specific search filters used...)
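
For illustration, this is roughly what I have in mind for the two slaves'
slapd.conf files. The attribute names and index types below are placeholders
only, not my real configuration:

```
# slapd.conf on the slave used by the web application
# (hypothetical attributes; substring indices only where the web
# application's filters need them)
index cn,mail    eq,sub
index uid        eq

# slapd.conf on the slave used by service and support staff
# (hypothetical; equality indices only, so the index files stay smaller
# and substring searches run slower)
index cn,mail,uid    eq
```

Both slaves would hold the same entries; only the index directives would
differ.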
 
> >Finally, is slurpd actually more efficient at handling modify
> >operations? While a directory is handling a modify operation from a
> >client, its performance is pretty bad. So, by "efficient" I mean, do
> >the slaves take significantly LESS of a performance hit if the slurpd
> >process sends modify operations than if a slave handles client
> >requests directly?
> 
> slurpd doesn't perform modify operations.  It mirrors such operations on a
> master slapd to the slave slapd.  At least as I understand it.  slapd still
> is the one performing the modify op.
>

My specific question is this:
I am concerned about the costs and benefits of processing modify
operations via slurpd, with respect to how modify operations hurt the
speed of searches.
ldapadd is much faster at importing a lot of data than any client that I
would write myself. Is slurpd also more efficient?
Or, since presumably multiple modify operations can be sent by slurpd at
once (or in a row?), could slurpd actually cause longer blocks of time
during which the slave is less responsive to searches?
Finally, is there a way to control how long slurpd sleeps between processing
changes?