Full_Name: Juerg Bircher Version: lmdb (master) OS: MacOS / Linux URL: ftp://ftp.openldap.org/incoming/Juerg_Bircher_160527-Improved-handling-for-large-number-of-databases.patch Submission from: (NULL) (178.82.37.195) There is a increased performance penalty the more databases are created within the same environment. I was looking for a way the improve that by keeping the simplicity of tracking databases within a list with direct access by index (MDB_dbi). mdb_dbi_open() is however not improved with the assumption that the database handle (dbi) is cached in the application. So mdb_dbi_open() should happen only once for each database during the life time of an application. One issue is that mdb_txn_begin() (for read-only transactions) calloc the sizeof(MDB_txn) + me_maxdbs * sizeof(MDB_db + 1). The plus 1 for the dbflags. However it is sufficient only to malloc that size and clear the sizeof(MDB_txn) memset(txn, 0, sizeof(MDB_txn) After that the data beyond the MDB_txn is not initialized which is ok for the moment. The next improvement happens in mdb_txn_renew0() where the dbflags are only set to DB_UNUSED (a new flag) for each database currently opened in the environment. memset(txn->mt_dbflags, DB_UNUSED, txn->mt_numdbs); The former code used to to loop through each database to calculate the dbflags. This is still done but lazily for each accessed database with the assumption that a read only transaction rarely uses all databases of the environment. The lazy initialization of the dbflag happens in the macro TXN_DBI_EXIST which is always used when a database handle (dbi) is passed to an function. The flags are updated in mdb_setup_db_info() once a database is access which is marked as unused (DB_UNUSED). static int mdb_setup_db_info(MDB_txn *txn, MDB_dbi dbi) { /* Setup db info */ uint16_t x = txn->mt_env->me_dbflags[dbi]; txn->mt_dbs[dbi].md_flags = x & PERSISTENT_FLAGS; txn->mt_dbflags[dbi] = (x & MDB_VALID) ? DB_VALID|DB_USRVALID|DB_STALE : 0; return (txn->mt_dbflags[dbi] & validity); } /** Check \b txn and \b dbi arguments to a function and initialize db info if needed */ #define TXN_DBI_EXIST(txn, dbi, validity) \ ((txn) && (dbi#C3C(txn)->mt_numdbs && (((txn)->mt_dbflags[dbi] & (validity)) || (((txn)->mt_dbflags[dbi] & DB_UNUSED) && mdb_setup_db_info((txn), (dbi), (validity))))) The next improvement is done in any function which needs to loop through the databases for example in mdb_cursors_close(). Again the more databases in the environment the longer the execution time. It should be best if looping only through dbflags and searching for those databases which are used (!DB_UNUSED). This could be done byte wise or more efficient in 8/4 byte steps comparing with an extended mask DB_UNUSED_LONG instead of DB_UNUSED. So we can skip 8 or 4 (32 bit) unused databases in one step (still with the assumption that a transaction rarely uses all databases of the environment). So the loop looks as follows always starting at the lower index to avoid alignment issues with ARM prior v6. #define DB_UNUSED 0x20 /**< DB not used in this txn */ #ifdef MDB_VL32 #define DB_UNUSED_LONG 0x20202020 /* DB_UNUSED long mask for fast tracking */ #else #define DB_UNUSED_LONG 0x2020202020202020 /* DB_UNUSED long mask for fast tracking */ #endif #ifdef MDB_VL32 #define MDB_WORD unsigned int #else #define MDB_WORD unsigned long long #endif MDB_dbi %3= src->mt_numdbs; MDB_dbi i = 0; while (1) { unsigned int upper = i + sizeof(MDB_WORD); if (upper < n) { // skip unused if ((*(MDB_WORD *)(tdbflags + i)) == DB_UNUSED_LONG) { i = upper; continue; } } else { upper = n; } for (; i < upper; i++) { // any other filter criteria appropriate to the function .... } if (i >= n) { break; } }
juerg.bircher@helmedica.com wrote: > Full_Name: Juerg Bircher > Version: lmdb (master) > OS: MacOS / Linux > URL: ftp://ftp.openldap.org/incoming/Juerg_Bircher_160527-Improved-handling-for-large-number-of-databases.patch > Submission from: (NULL) (178.82.37.195) Thanks for the patch, but for a change of this size you also need to include an IP Rights notice, as documented on the Contributing guidelines. > > > There is a increased performance penalty the more databases are created within > the same environment. I was looking for a way the improve that by keeping the > simplicity of tracking databases within a list with direct access by index > (MDB_dbi). > mdb_dbi_open() is however not improved with the assumption that the database > handle (dbi) is cached in the application. So mdb_dbi_open() should happen only > once for each database during the life time of an application. > > > One issue is that mdb_txn_begin() (for read-only transactions) calloc the > sizeof(MDB_txn) + me_maxdbs * sizeof(MDB_db + 1). The plus 1 for the dbflags. > However it is sufficient only to malloc that size and clear the sizeof(MDB_txn) > > memset(txn, 0, sizeof(MDB_txn) > > After that the data beyond the MDB_txn is not initialized which is ok for the > moment. > > The next improvement happens in mdb_txn_renew0() where the dbflags are only set > to DB_UNUSED (a new flag) for each database currently opened in the > environment. > > memset(txn->mt_dbflags, DB_UNUSED, txn->mt_numdbs); > > The former code used to to loop through each database to calculate the dbflags. > This is still done but lazily for each accessed database with the assumption > that a read only transaction rarely uses all databases of the environment. > > The lazy initialization of the dbflag happens in the macro TXN_DBI_EXIST which > is always used when a database handle (dbi) is passed to an function. > The flags are updated in mdb_setup_db_info() once a database is access which is > marked as unused (DB_UNUSED). > > static int mdb_setup_db_info(MDB_txn *txn, MDB_dbi dbi) { > /* Setup db info */ > uint16_t x = txn->mt_env->me_dbflags[dbi]; > txn->mt_dbs[dbi].md_flags = x & PERSISTENT_FLAGS; > txn->mt_dbflags[dbi] = (x & MDB_VALID) ? DB_VALID|DB_USRVALID|DB_STALE : 0; > return (txn->mt_dbflags[dbi] & validity); > } > > /** Check \b txn and \b dbi arguments to a function and initialize db info > if needed */ > #define TXN_DBI_EXIST(txn, dbi, validity) \ > ((txn) && (dbi#C3C(txn)->mt_numdbs && (((txn)->mt_dbflags[dbi] & (validity)) > || (((txn)->mt_dbflags[dbi] & DB_UNUSED) && mdb_setup_db_info((txn), (dbi), > (validity))))) > > > The next improvement is done in any function which needs to loop through the > databases for example in mdb_cursors_close(). Again the more databases in the > environment the longer the execution time. > It should be best if looping only through dbflags and searching for those > databases which are used (!DB_UNUSED). This could be done byte wise or more > efficient in 8/4 byte steps comparing with an extended mask DB_UNUSED_LONG > instead of DB_UNUSED. So we can skip 8 or 4 (32 bit) unused databases in one > step (still with the assumption that a transaction rarely uses all databases of > the environment). > > So the loop looks as follows always starting at the lower index to avoid > alignment issues with ARM prior v6. > > #define DB_UNUSED 0x20 /**< DB not used in this txn */ > > #ifdef MDB_VL32 > #define DB_UNUSED_LONG 0x20202020 /* DB_UNUSED long mask for fast > tracking */ > #else > #define DB_UNUSED_LONG 0x2020202020202020 /* DB_UNUSED long mask for fast > tracking */ > #endif > > > #ifdef MDB_VL32 > #define MDB_WORD unsigned int > #else > #define MDB_WORD unsigned long long > #endif > > > > MDB_dbi %3= src->mt_numdbs; > MDB_dbi i = 0; > > while (1) { > unsigned int upper = i + sizeof(MDB_WORD); > if (upper < n) { > // skip unused > if ((*(MDB_WORD *)(tdbflags + i)) == DB_UNUSED_LONG) { > i = upper; > continue; > } > } > else { > upper = n; > } > > for (; i < upper; i++) { > // any other filter criteria appropriate to the function > .... > } > if (i >= n) { > break; > } > } > > > -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
On 27. mai 2016 12:57, juerg.bircher@helmedica.com wrote: > mdb_dbi_open() is however not improved with the assumption that the database > handle (dbi) is cached in the application. So mdb_dbi_open() should happen only > once for each database during the life time of an application. It must check if the DB is open anyway, since LDMB breaks if the user has two DBIs to the same DB. Though maybe we could add an optimization flag to skip the search, for users who know they don't reopen open DBs. Other notes: LMDB and OpenLDAP mostly use tab-width = indentation = 4. (We should comment that somewhere.) You seem to use tab-width = 8. I think MDB_WORD intends to be mdb_size_t, but should be 'unsigned long': It's probably no optimization to use 'long long' when that is not a native integer type. But I haven't profiled that. The delay in picking up env->me_dbflags is a bit troubling, since it means creating a new txn is no longer quite atomic. I can't think of how it'll make a difference when DBIs are used correctly, but that's one thing people screw up. Maybe it'd be OK if renew0() just copies it into mt_dbs[].md_flags without the '&', and mdb_setup_db_info() fixes it up. -- Hallvard
Disclaimer The attached patch file is derived from OpenLDAP Software. All of the modifications to OpenLDAP Software represented in the following patch(es) were developed by Juerg Bircher juerg.bircher@helmedica.com. I have not assigned rights and/or interest in this work to any party. I, Juerg Bircher, hereby place the following modifications to OpenLDAP Software (and only these modifications) into the public domain. Hence, these modifications may be freely used and/or redistributed for any purpose with or without attribution and/or other notice. On 27/05/16 13:16, "Howard Chu" <hyc@symas.com> wrote: >juerg.bircher@helmedica.com wrote: >> Full_Name: Juerg Bircher >> Version: lmdb (master) >> OS: MacOS / Linux >> URL: ftp://ftp.openldap.org/incoming/Juerg_Bircher_160527-Improved-handling-for-large-number-of-databases.patch >> Submission from: (NULL) (178.82.37.195) > >Thanks for the patch, but for a change of this size you also need to include >an IP Rights notice, as documented on the Contributing guidelines. >> >> >> There is a increased performance penalty the more databases are created within >> the same environment. I was looking for a way the improve that by keeping the >> simplicity of tracking databases within a list with direct access by index >> (MDB_dbi). >> mdb_dbi_open() is however not improved with the assumption that the database >> handle (dbi) is cached in the application. So mdb_dbi_open() should happen only >> once for each database during the life time of an application. >> >> >> One issue is that mdb_txn_begin() (for read-only transactions) calloc the >> sizeof(MDB_txn) + me_maxdbs * sizeof(MDB_db + 1). The plus 1 for the dbflags. >> However it is sufficient only to malloc that size and clear the sizeof(MDB_txn) >> >> memset(txn, 0, sizeof(MDB_txn) >> >> After that the data beyond the MDB_txn is not initialized which is ok for the >> moment. >> >> The next improvement happens in mdb_txn_renew0() where the dbflags are only set >> to DB_UNUSED (a new flag) for each database currently opened in the >> environment. >> >> memset(txn->mt_dbflags, DB_UNUSED, txn->mt_numdbs); >> >> The former code used to to loop through each database to calculate the dbflags. >> This is still done but lazily for each accessed database with the assumption >> that a read only transaction rarely uses all databases of the environment. >> >> The lazy initialization of the dbflag happens in the macro TXN_DBI_EXIST which >> is always used when a database handle (dbi) is passed to an function. >> The flags are updated in mdb_setup_db_info() once a database is access which is >> marked as unused (DB_UNUSED). >> >> static int mdb_setup_db_info(MDB_txn *txn, MDB_dbi dbi) { >> /* Setup db info */ >> uint16_t x = txn->mt_env->me_dbflags[dbi]; >> txn->mt_dbs[dbi].md_flags = x & PERSISTENT_FLAGS; >> txn->mt_dbflags[dbi] = (x & MDB_VALID) ? DB_VALID|DB_USRVALID|DB_STALE : 0; >> return (txn->mt_dbflags[dbi] & validity); >> } >> >> /** Check \b txn and \b dbi arguments to a function and initialize db info >> if needed */ >> #define TXN_DBI_EXIST(txn, dbi, validity) \ >> ((txn) && (dbi#C3C(txn)->mt_numdbs && (((txn)->mt_dbflags[dbi] & (validity)) >> || (((txn)->mt_dbflags[dbi] & DB_UNUSED) && mdb_setup_db_info((txn), (dbi), >> (validity))))) >> >> >> The next improvement is done in any function which needs to loop through the >> databases for example in mdb_cursors_close(). Again the more databases in the >> environment the longer the execution time. >> It should be best if looping only through dbflags and searching for those >> databases which are used (!DB_UNUSED). This could be done byte wise or more >> efficient in 8/4 byte steps comparing with an extended mask DB_UNUSED_LONG >> instead of DB_UNUSED. So we can skip 8 or 4 (32 bit) unused databases in one >> step (still with the assumption that a transaction rarely uses all databases of >> the environment). >> >> So the loop looks as follows always starting at the lower index to avoid >> alignment issues with ARM prior v6. >> >> #define DB_UNUSED 0x20 /**< DB not used in this txn */ >> >> #ifdef MDB_VL32 >> #define DB_UNUSED_LONG 0x20202020 /* DB_UNUSED long mask for fast >> tracking */ >> #else >> #define DB_UNUSED_LONG 0x2020202020202020 /* DB_UNUSED long mask for fast >> tracking */ >> #endif >> >> >> #ifdef MDB_VL32 >> #define MDB_WORD unsigned int >> #else >> #define MDB_WORD unsigned long long >> #endif >> >> >> >> MDB_dbi %3= src->mt_numdbs; >> MDB_dbi i = 0; >> >> while (1) { >> unsigned int upper = i + sizeof(MDB_WORD); >> if (upper < n) { >> // skip unused >> if ((*(MDB_WORD *)(tdbflags + i)) == DB_UNUSED_LONG) { >> i = upper; >> continue; >> } >> } >> else { >> upper = n; >> } >> >> for (; i < upper; i++) { >> // any other filter criteria appropriate to the function >> .... >> } >> if (i >= n) { >> break; >> } >> } >> >> >> > > >-- > -- Howard Chu > CTO, Symas Corp. http://www.symas.com > Director, Highland Sun http://highlandsun.com/hyc/ > Chief Architect, OpenLDAP http://www.openldap.org/project/
Here the missing disclaimer for the patch: The attached patch file is derived from OpenLDAP Software. All of the modifications to OpenLDAP Software represented in the following patch(es) were developed by Juerg Bircher juerg.bircher@helmedica.com. I have not assigned rights and/or interest in this work to any party. I, Juerg Bircher, hereby place the following modifications to OpenLDAP Software (and only these modifications) into the public domain. Hence, these modifications may be freely used and/or redistributed for any purpose with or without attribution and/or other notice.
I am switching to gmail as in outlook my name uses Umlaute which results in base64 encoded mails! Strangely the mailing list openldap-technical can handle it and openldap-its cannot! Here the missing disclaimer for the patch: The attached patch file is derived from OpenLDAP Software. All of the modifications to OpenLDAP Software represented in the following patch(es) were developed by Juerg Bircher juerg.bircher@helmedica.com. I have not assigned rights and/or interest in this work to any party. I, Juerg Bircher, hereby place the following modifications to OpenLDAP Software (and only these modifications) into the public domain. Hence, these modifications may be freely used and/or redistributed for any purpose with or without attribution and/or other notice.
First of all thank you for having a look at the improvement. On 27/05/16 14:42, "Hallvard Breien Furuseth" <h.b.furuseth@usit.uio.no> wrote: >On 27. mai 2016 12:57, juerg.bircher@helmedica.com wrote: >> mdb_dbi_open() is however not improved with the assumption that the database >> handle (dbi) is cached in the application. So mdb_dbi_open() should happen only >> once for each database during the life time of an application. > >It must check if the DB is open anyway, since LDMB breaks if the user >has two DBIs to the same DB. Though maybe we could add an optimization >flag to skip the search, for users who know they don't reopen open DBs. > >Other notes: > >LMDB and OpenLDAP mostly use tab-width = indentation = 4. (We should >comment that somewhere.) You seem to use tab-width = 8. I will consider that for the next patch! > >I think MDB_WORD intends to be mdb_size_t, but should be 'unsigned >long': It's probably no optimization to use 'long long' when that >is not a native integer type. But I haven't profiled that. The MDB_WORD is used to get the appropriate word size of the CPU for the optimised dbflag check. So on a 32 bit CPUs it should be an 32 bit integer and on a 64 bit CPUs a 64 bit integer. But the mdb_size_t seems always to be 64 bits? > >The delay in picking up env->me_dbflags is a bit troubling, since it >means creating a new txn is no longer quite atomic. I can't think of >how it'll make a difference when DBIs are used correctly, but that's one >thing people screw up. Maybe it'd be OK if renew0() just copies it into >mt_dbs[].md_flags without the '&', and mdb_setup_db_info() fixes it up. Is it really atomic in original lmdb? The flags are copied from the environment and calculated. But while copying the dbflags from the environment they may change. So read level consistency and therefore atomicity is not guaranteed. But I guess it just does not matter. So picking up the flags lazily just extends the time of copying the flags from the environment but is basically the same. Atomicity and also isolation is a must in respect to the data but I think not in respect to the database handles and their flags. Maybe I am overseeing something as I do not know all the thoughts between the lines. > >-- >Hallvard Regards Juerg
On 27/05/16 23:39, juerg.bircher@gmail.com wrote: > On 27/05/16 14:42, "Hallvard Breien Furuseth" <h.b.furuseth@usit.uio.no> wrote: >> I think MDB_WORD intends to be mdb_size_t, but should be 'unsigned >> long': It's probably no optimization to use 'long long' when that >> is not a native integer type. But I haven't profiled that. > > The MDB_WORD is used to get the appropriate word size of the CPU for > the optimised dbflag check. So on a 32 bit CPUs it should be an 32 bit > integer and on a 64 bit CPUs a 64 bit integer. That's why I suggested unsigned long. It seems the best guess for a native wide type. But it's hard to guess what kind of CPU is in use. > But the mdb_size_t seems always to be 64 bits? On a 32-bit host it's normally 32-bit, but you get 64-bit mdb_size_t anyway if you build and use LMDB with MDB_VL32 defined. I misread your patch, it should not use MDB_VL32. >> The delay in picking up env->me_dbflags is a bit troubling, since it >> means creating a new txn is no longer quite atomic. I can't think of >> how it'll make a difference when DBIs are used correctly, but that's one >> thing people screw up. Maybe it'd be OK if renew0() just copies it into >> mt_dbs[].md_flags without the '&', and mdb_setup_db_info() fixes it up. > > Is it really atomic in original lmdb? The flags are copied from the > environment and calculated. But while copying the dbflags from the > environment they may change. So read level consistency and therefore > atomicity is not guaranteed. I should have said, it's atomic when DBIs are used correctly. Such as obeying this limitation from the mdb_dbi_open() doc and a similar one for mdb_dbi_close(): * A transaction that uses * this function must finish (either commit or abort) before * any other transaction in the process may use this function. To guarante atomicity we'd need locks around some DBI operations. To avoid that, LMDB simply calls unsafe use "user errors" - which can cause breakage. Hence those annoying DBI usage restrictions. LMDB does attempt to catch user errors when it's cheap to try, see e.g. TXN_DBI_CHANGED(). Doesn't work at all for read-only txns, they have txn->mt_dbiseqs == env->me_dbiseqs. So you can see why DBI cleverness makes me nervous. I think the MDB_WORD looping can go in now, but we'll need to think carefully about the lazy init. > But I guess it just does not matter. > So picking up the flags lazily just extends the time of copying the > flags from the environment but is basically the same. > Atomicity and also isolation is a must in respect to the data but I > think not in respect to the database handles and their flags. > Maybe I am overseeing something as I do not know all the thoughts > between the lines. With ITS#8431 I can think of one thing: There's no sync between mdb_dbi_open() in one txn and another txn using the new DBI. It'll be up to the user to synchronize DBIs. For that to work in write-txs, TXN_DBI_CHANGED() must go away, and writers will have no way to try to catch user errors with sync. Or if it should only work in read-only txns, then read-only txns and write-txns must have different definitions of what is a "user error" with DBIs. Anyway, we'll need to think about the definition of user errors. Other notes: /** ... */ above a global declaration or /**< ... */ to the right of it tells Doxygen that the comment is the doc of the declared item. "#foo" inside a Doxygen comment is a link to the declaration of foo. Maybe mt_dbflags should have an extra byte or word at the end which does not have DB_UNUSED, and with size rounded up to sizeof(MDB_WORD). Then stepping to next DBI can do something like this, unrelated to the check against mt_numdbs. Not sure if that's an improvement, I'm getting tired: /** Return an #MDB_WORD where each byte is \b byte */ #define WORD_BYTES(byte) ((MDB_WORD)-1 / 0xff * (byte)) /** Step up to next used DBI */ #define DBI_STEP(txn, dbi) do { \ if (! (++(dbi) & (sizeof(mdb_word_t)-1))) \ while (*(mdb_word_t *)((txn)->mt_dbflags + (dbi)) == WORD_BYTES(DB_UNUSED)) (dbi) += sizeof(MDB_WORD); \ } while (0)
On 28/05/16 03:28, Hallvard Breien Furuseth wrote: >On 27/05/16 23:39, juerg.bircher@gmail.com wrote: >> Maybe I am overseeing something as I do not know all the thoughts >> between the lines. > > With ITS#8431 I can think of one thing: > (...blah...) Actually the main reason I worry about lazy init is simply that we've screwed up DBIs many times, so I prefer to go slowly. That's Howard's call though, not mine.
On Sat, May 28, 2016 at 3:28 AM, Hallvard Breien Furuseth <h.b.furuseth@usit.uio.no> wrote: > On 27/05/16 23:39, juerg.bircher@gmail.com wrote: >> >> On 27/05/16 14:42, "Hallvard Breien Furuseth" <h.b.furuseth@usit.uio.no> >> wrote: >>> >>> I think MDB_WORD intends to be mdb_size_t, but should be 'unsigned >>> long': It's probably no optimization to use 'long long' when that >>> is not a native integer type. But I haven't profiled that. >> >> >> The MDB_WORD is used to get the appropriate word size of the CPU for >> the optimised dbflag check. So on a 32 bit CPUs it should be an 32 bit >> integer and on a 64 bit CPUs a 64 bit integer. > > > That's why I suggested unsigned long. It seems the best guess for a > native wide type. But it's hard to guess what kind of CPU is in use. > >> But the mdb_size_t seems always to be 64 bits? > > > On a 32-bit host it's normally 32-bit, but you get 64-bit mdb_size_t anyway > if you build and use LMDB with MDB_VL32 defined. I misread > your patch, it should not use MDB_VL32. > >>> The delay in picking up env->me_dbflags is a bit troubling, since it >>> means creating a new txn is no longer quite atomic. I can't think of >>> how it'll make a difference when DBIs are used correctly, but that's one >>> thing people screw up. Maybe it'd be OK if renew0() just copies it into >>> mt_dbs[].md_flags without the '&', and mdb_setup_db_info() fixes it up. >> >> >> Is it really atomic in original lmdb? The flags are copied from the >> environment and calculated. But while copying the dbflags from the >> environment they may change. So read level consistency and therefore >> atomicity is not guaranteed. > > > I should have said, it's atomic when DBIs are used correctly. Such as > obeying this limitation from the mdb_dbi_open() doc and a similar one > for mdb_dbi_close(): > > * A transaction that uses > * this function must finish (either commit or abort) before > * any other transaction in the process may use this function. > > To guarante atomicity we'd need locks around some DBI operations. > To avoid that, LMDB simply calls unsafe use "user errors" - which can > cause breakage. Hence those annoying DBI usage restrictions. > > LMDB does attempt to catch user errors when it's cheap to try, see e.g. > TXN_DBI_CHANGED(). Doesn't work at all for read-only txns, they have > txn->mt_dbiseqs == env->me_dbiseqs. > > So you can see why DBI cleverness makes me nervous. > I think the MDB_WORD looping can go in now, but we'll need to think > carefully about the lazy init. I understand that it is a tricky thing. But maybe the lazy init is still feasible? In my understand without having all the background it seems ok to get the current snapshot of the database flags from the env once it is firstly accessed whether it is at the beginning or later in the transaction. Read level consistency should be preserved anyway as mt_txnId is set in mdb_txn_renew0(). > >> But I guess it just does not matter. >> So picking up the flags lazily just extends the time of copying the >> flags from the environment but is basically the same. >> Atomicity and also isolation is a must in respect to the data but I >> think not in respect to the database handles and their flags. >> Maybe I am overseeing something as I do not know all the thoughts >> between the lines. > > > With ITS#8431 I can think of one thing: > > There's no sync between mdb_dbi_open() in one txn and another txn > using the new DBI. It'll be up to the user to synchronize DBIs. For > that to work in write-txs, TXN_DBI_CHANGED() must go away, and writers > will have no way to try to catch user errors with sync. Or if it > should only work in read-only txns, then read-only txns and write-txns > must have different definitions of what is a "user error" with DBIs. > > Anyway, we'll need to think about the definition of user errors. > So if you drop ITS#8431 it would at least be very helpful to get an separate error like MDB_INVALID_DPI instead of EINVAL So the application sharing and caching handles (dbi) could handle the situation appropriate like use mdb_env_open() so that the dbi becomes valid in that txn as well. > > Other notes: > > /** ... */ above a global declaration or /**< ... */ to the right > of it tells Doxygen that the comment is the doc of the declared item. > "#foo" inside a Doxygen comment is a link to the declaration of foo. > > > Maybe mt_dbflags should have an extra byte or word at the end which > does not have DB_UNUSED, and with size rounded up to sizeof(MDB_WORD). > Then stepping to next DBI can do something like this, unrelated to > the check against mt_numdbs. Not sure if that's an improvement, I'm > getting tired: > > /** Return an #MDB_WORD where each byte is \b byte */ > #define WORD_BYTES(byte) ((MDB_WORD)-1 / 0xff * (byte)) > > /** Step up to next used DBI */ > #define DBI_STEP(txn, dbi) do { \ > if (! (++(dbi) & (sizeof(mdb_word_t)-1))) \ > while (*(mdb_word_t *)((txn)->mt_dbflags + (dbi)) == > WORD_BYTES(DB_UNUSED)) > (dbi) += sizeof(MDB_WORD); \ > } while (0) > Yes round it up to sizeof(MDB_WORD) would be obvious in connection with the optimized loop. If you are willing to add some more #ifdef we could even get it faster at least for x86_64 using SSE2 instructions. Then MDB_WORD would be 128 bit and we check 16 flags in one step using _mm_cmpeq_epi8 and _mm_movemask_epi8 to get the mask of matching flags. If the mask != 0xffffffff we found an used database. Then we negate the mask and with __builtin_ffs(mask) we could then find the index of the first used database.
---------- Forwarded message ---------- From: Juerg Bircher <juerg.bircher@gmail.com> Date: Sat, May 28, 2016 at 10:30 PM Subject: Re: (ITS#8430) Improved handling for large number of databases To: Hallvard Breien Furuseth <h.b.furuseth@usit.uio.no> On Sat, May 28, 2016 at 3:41 AM, Hallvard Breien Furuseth <h.b.furuseth@usit.uio.no> wrote: > On 28/05/16 03:28, Hallvard Breien Furuseth wrote: >> >> On 27/05/16 23:39, juerg.bircher@gmail.com wrote: >>> >>> Maybe I am overseeing something as I do not know all the thoughts >>> between the lines. >> >> >> With ITS#8431 I can think of one thing: >> (...blah...) > > > Actually the main reason I worry about lazy init is simply that > we've screwed up DBIs many times, so I prefer to go slowly. > That's Howard's call though, not mine. Maybe a good way in between could be to memcpy txn->mt_env->me_dbflags. However md_flags would need to be moved out of the MDB_dbi struct like mt_dbflags or easier just add another array serving as snapshot. This would not break anything with flag initialization as we get a snapshot of them in the mdb_txn_renew0. memcpy uses SSE instructions for x86_64. We should consider to align on 16 byte boundary for best performance. The dbflags could then still be initialized with MDB_UNSUSED as suggested.
On 28/05/16 22:08, Juerg Bircher wrote: > I understand that it is a tricky thing. But maybe the lazy init is > still feasible? (...) Right, lazy init is feasible. Just keep cross-txn DBI stuff out of it, I'll take your latest variant to ITS#8431. Copied from a private mail: > Maybe a good way in between could be to memcpy txn->mt_env->me_dbflags. > (...) memcpy uses SSE instructions for x86_64. Yes. If we make MDB_txn.mt_dbflags an uint16_t*, we can memcpy to it. That costs <maxdbs> bytes/txn + one more reserved env flag bit in PERSISTENT_FLAGS: Reserve DB_UNUSED, and keep it always set in me_dbflags so the memcpy will set it in mt_dbflags. Or maybe your code can be used without initial memcpy and I'm just paranoid. The one problem I can think of is user errors - people coming up with workarounds like your cross-txn DBI stuff. LMDB tries to catch some of that, but maybe late pickup defeats some such checks. I'll wait for Howard to say something at this point. BTW, all this reminds me - I should add a test which tries to catch mdb_dbi_open() in overlapping txns. Got the code, but forgot about it while pondering a good error code name. > However md_flags would need to be moved out of the MDB_dbi struct Nope. Sub-DBs with duplicate data have md_flags which do not correspond to anything saved in env->me_flags. Your lazy-init code needs to use the memcpyed flags instead of me_dbflags, that's all. > We should consider to align on 16 byte boundary for best performance. It's already aligned.
On 30. mai 2016 13:51, Hallvard Breien Furuseth wrote: > Copied from a private mail: >> Maybe a good way in between could be to memcpy txn->mt_env->me_dbflags. >> (...) memcpy uses SSE instructions for x86_64. > > Yes. If we make MDB_txn.mt_dbflags an uint16_t*, we can memcpy to it. > That costs <maxdbs> bytes/txn + one more reserved env flag bit in > PERSISTENT_FLAGS: Reserve DB_UNUSED, and keep it always set in > me_dbflags so the memcpy will set it in mt_dbflags. Oops, that was too clever. Then a bunch of code which assumes a DBI is valid if various txn flags are set, must also check DB_UNUSED. Or invert that to DB_USED which might be easier - (flags & DB_DIRTY) becomes F_ISSET(flags, DB_DIRTY|DB_USED) which checks if both are set. Maybe the simple solution using 2*<maxdbs> bytes/txn would be better. -- Hallvard