Re: (ITS#8430) Improved handling for large number of databases
- To: openldap-its@OpenLDAP.org
- Subject: Re: (ITS#8430) Improved handling for large number of databases
- From: hyc@symas.com
- Date: Fri, 27 May 2016 11:17:06 +0000
- Auto-submitted: auto-generated (OpenLDAP-ITS)
juerg.bircher@helmedica.com wrote:
> Full_Name: Juerg Bircher
> Version: lmdb (master)
> OS: MacOS / Linux
> URL: ftp://ftp.openldap.org/incoming/Juerg_Bircher_160527-Improved-handling-for-large-number-of-databases.patch
> Submission from: (NULL) (178.82.37.195)
Thanks for the patch, but for a change of this size you also need to include
an IP Rights notice, as documented in the Contributing guidelines.
>
>
> The more databases are created within the same environment, the greater the
> performance penalty. I was looking for a way to improve that while keeping the
> simplicity of tracking databases in a list with direct access by index
> (MDB_dbi).
> mdb_dbi_open() is not improved, however, on the assumption that the
> application caches the database handle (dbi). So mdb_dbi_open() should happen
> only once for each database during the lifetime of an application.
>
>
> One issue is that mdb_txn_begin() (for read-only transactions) callocs
> sizeof(MDB_txn) + me_maxdbs * (sizeof(MDB_db) + 1) bytes; the plus 1 is for
> the dbflags. However, it is sufficient to malloc that size and clear only
> sizeof(MDB_txn):
>
> memset(txn, 0, sizeof(MDB_txn));
>
> After that, the data beyond the MDB_txn is not initialized, which is OK for
> the moment.
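
The allocation change described above can be sketched as follows. This is only an illustration under stated assumptions: hdr_db, hdr_txn and hdr_txn_alloc() are hypothetical stand-ins, not the real MDB_db/MDB_txn layouts or any LMDB function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for MDB_db and MDB_txn; not the real LMDB layouts. */
typedef struct hdr_db { unsigned flags; unsigned long root; } hdr_db;
typedef struct hdr_txn {
    unsigned numdbs;
    hdr_db *dbs;             /* points into the tail of the same allocation */
    unsigned char *dbflags;  /* one flag byte per database, after the dbs array */
} hdr_txn;

/* Allocate room for maxdbs slots but clear only the fixed-size header. */
static hdr_txn *hdr_txn_alloc(unsigned maxdbs)
{
    size_t size = sizeof(hdr_txn) + (size_t)maxdbs * (sizeof(hdr_db) + 1);
    hdr_txn *txn = malloc(size);
    if (txn == NULL)
        return NULL;
    assert(size >= sizeof(hdr_txn));
    memset(txn, 0, sizeof(hdr_txn));   /* the per-database tail stays uninitialized */
    txn->dbs = (hdr_db *)(txn + 1);
    txn->dbflags = (unsigned char *)(txn->dbs + maxdbs);
    return txn;
}
```

With a large maxdbs the cost of clearing no longer grows with the number of slots; the tail must then be initialized before it is read.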
>
> The next improvement happens in mdb_txn_renew0() where the dbflags are only set
> to DB_UNUSED (a new flag) for each database currently opened in the
> environment.
>
> memset(txn->mt_dbflags, DB_UNUSED, txn->mt_numdbs);
>
> The former code used to loop through each database to calculate the dbflags.
> This is still done, but lazily for each accessed database, on the assumption
> that a read-only transaction rarely uses all databases of the environment.
>
> The lazy initialization of the dbflags happens in the macro TXN_DBI_EXIST,
> which is always used when a database handle (dbi) is passed to a function.
> The flags are updated in mdb_setup_db_info() once a database that is marked
> as unused (DB_UNUSED) is accessed.
>
> static int mdb_setup_db_info(MDB_txn *txn, MDB_dbi dbi, unsigned int validity) {
> 	/* Setup db info */
> 	uint16_t x = txn->mt_env->me_dbflags[dbi];
> 	txn->mt_dbs[dbi].md_flags = x & PERSISTENT_FLAGS;
> 	txn->mt_dbflags[dbi] = (x & MDB_VALID) ? DB_VALID|DB_USRVALID|DB_STALE : 0;
> 	return (txn->mt_dbflags[dbi] & validity);
> }
>
> /** Check \b txn and \b dbi arguments to a function and initialize db info
> if needed */
> #define TXN_DBI_EXIST(txn, dbi, validity) \
> 	((txn) && (dbi) < (txn)->mt_numdbs && (((txn)->mt_dbflags[dbi] & (validity)) \
> 	|| (((txn)->mt_dbflags[dbi] & DB_UNUSED) && mdb_setup_db_info((txn), (dbi), \
> 	(validity)))))
>
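
A minimal, self-contained sketch of the lazy scheme described above: one memset at renew time, then per-slot setup on first access. The names lazy_txn, lazy_renew(), lazy_setup(), lazy_dbi_exist() and the LAZY_VALID bit are hypothetical and only mirror the idea; the real code would consult the environment's me_dbflags.

```c
#include <assert.h>
#include <string.h>

#define LAZY_UNUSED 0x20   /* slot not yet touched in this txn (mirrors DB_UNUSED) */
#define LAZY_VALID  0x08   /* hypothetical "slot is usable" bit */

enum { LAZY_MAXDBS = 16 };

typedef struct lazy_txn {
    unsigned numdbs;
    unsigned char dbflags[LAZY_MAXDBS];
    unsigned setups;   /* counts slow-path runs, for illustration only */
} lazy_txn;

/* Cheap per-transaction reset: one memset instead of a per-database loop. */
static void lazy_renew(lazy_txn *txn, unsigned numdbs)
{
    assert(numdbs <= LAZY_MAXDBS);
    txn->numdbs = numdbs;
    txn->setups = 0;
    memset(txn->dbflags, LAZY_UNUSED, numdbs);
}

/* Slow path: compute the real flags for one slot on first access. */
static int lazy_setup(lazy_txn *txn, unsigned dbi, unsigned validity)
{
    txn->setups++;
    txn->dbflags[dbi] = LAZY_VALID;   /* assumption: every slot is valid here */
    return txn->dbflags[dbi] & validity;
}

/* Check a handle, initializing its slot if it is still marked unused. */
static int lazy_dbi_exist(lazy_txn *txn, unsigned dbi, unsigned validity)
{
    if (txn == NULL || dbi >= txn->numdbs)
        return 0;
    if (txn->dbflags[dbi] & validity)
        return 1;
    if (txn->dbflags[dbi] & LAZY_UNUSED)
        return lazy_setup(txn, dbi, validity) != 0;
    return 0;
}
```

Only slots that are actually checked pay the setup cost; all others keep the LAZY_UNUSED marker for the life of the transaction.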
>
> The next improvement applies to any function that needs to loop through the
> databases, for example mdb_cursors_close(). Again, the more databases in the
> environment, the longer the execution time.
> It is best to loop only through dbflags and look for the databases that are
> in use (!DB_UNUSED). This could be done bytewise or, more efficiently, in
> 8- or 4-byte steps, comparing against an extended mask DB_UNUSED_LONG
> instead of DB_UNUSED. That way we can skip 8 (or 4 on 32-bit) unused
> databases in one step (still on the assumption that a transaction rarely
> uses all databases of the environment).
>
> So the loop looks as follows, always starting at the lower index to avoid
> alignment issues on ARM prior to v6.
>
> #define DB_UNUSED 0x20 /**< DB not used in this txn */
>
> #ifdef MDB_VL32
> #define DB_UNUSED_LONG 0x20202020 /* DB_UNUSED long mask for fast tracking */
> #else
> #define DB_UNUSED_LONG 0x2020202020202020 /* DB_UNUSED long mask for fast tracking */
> #endif
>
>
> #ifdef MDB_VL32
> #define MDB_WORD unsigned int
> #else
> #define MDB_WORD unsigned long long
> #endif
>
>
>
> MDB_dbi n = src->mt_numdbs;
> MDB_dbi i = 0;
>
> while (1) {
> 	unsigned int upper = i + sizeof(MDB_WORD);
> 	if (upper < n) {
> 		// skip unused
> 		if ((*(MDB_WORD *)(tdbflags + i)) == DB_UNUSED_LONG) {
> 			i = upper;
> 			continue;
> 		}
> 	}
> 	else {
> 		upper = n;
> 	}
>
> 	for (; i < upper; i++) {
> 		// any other filter criteria appropriate to the function
> 		....
> 	}
> 	if (i >= n) {
> 		break;
> 	}
> }
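
For illustration, the same word-wise skip can be written as a standalone function. This sketch is not the patch's code: it swaps the pointer cast for memcpy so the load is valid at any alignment, uses a fixed 8-byte word, and the names scan_count_used() and SCAN_UNUSED are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SCAN_UNUSED 0x20   /* mirrors DB_UNUSED */

/* Count slots not marked SCAN_UNUSED, skipping eight unused slots per step. */
static size_t scan_count_used(const unsigned char *flags, size_t n)
{
    uint64_t mask;
    size_t i = 0, used = 0;

    assert(flags != NULL || n == 0);
    memset(&mask, SCAN_UNUSED, sizeof mask);   /* every byte is 0x20 */
    while (i < n) {
        size_t upper = i + sizeof(uint64_t);
        if (upper <= n) {
            uint64_t word;
            memcpy(&word, flags + i, sizeof word);   /* alignment-safe load */
            if (word == mask) {     /* all eight slots unused: skip them */
                i = upper;
                continue;
            }
        } else {
            upper = n;              /* short tail: fall back to bytewise */
        }
        for (; i < upper; i++) {
            if (!(flags[i] & SCAN_UNUSED))
                used++;
        }
    }
    return used;
}
```

Compilers typically turn the fixed-size memcpy into a single load where the target allows it, so the portable form need not cost more than the cast.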
>
>
>
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/