Re: mdb_dbi_open and threads



Hallvard wrote

"Currently a moderate number of slots are cheap but a huge number gets
expensive: 7-120 words per transaction, and every #mdb_dbi_open()
does a linear search of the opened slots."

I haven't seen a performance hit with around 10000 named databases. By the way, I was hoping
to open those dbis only on demand rather than opening all of them at initialization.


"With threads 1 and 2 coexisting? When thread 2 called mdb_dbi_open(),
thread 1's prospect of using mdb_dbi_open() at all was lost."

Yeah, with both coexisting. That's what I thought.

@Klaus


Yeah, I know there can be only one write transaction at a time. I was talking about one write transaction and one or more read transactions.
It is not that I first try to open the dbi in a read transaction. It is that I can't guarantee that another read transaction won't
start and attempt to open the same named dbi while a write is in progress.

"And first looking in a read transaction whether a database exists and then creating it in a second write transaction is definitely a bad and risky programming style, as it carries an assumption from one transaction to the next, which is typically not valid."
 
That was not what I tried to do.

"you still have the option to combine all your logical databases into a big single database"

It's a workaround that I hadn't thought about before. I was hoping to avoid the extra complexity.

Is there any prospect of implementing mdb_dbi_open, or an mdb_db_open_immediate, so that it puts the dbi into the shared environment without waiting for the txn commit?
I learned earlier from Howard Chu that this is not desirable under ACID. But I ask just in case, because otherwise (without opening all the dbis at initialization) in a multi-threaded
environment, the chance that opening a dbi on demand ends in failure goes up.






On Mon, May 22, 2017 at 2:01 PM, Klaus Malorny <Klaus.Malorny@knipp.de> wrote:
On 5/21/17 9:43 PM, Muhammed Muneer wrote:
Howard Chu wrote

"Just follow the recommendation to open all handles at the beginning of the program."

But what if I have lots of named databases, maybe 10000 or more? Wouldn't this be expensive?

I am developing a MongoDB-like database (similar in query and update syntax) around LMDB.
The thing is, I have some enhancements of my own, like the ability to generate update queries
from within an ongoing update.

So in a multi-threaded environment, suppose the name of a named dbi is generated within a write
transaction (thread 1), which then calls mdb_dbi_open on it, only to find that a read transaction
(thread 2) opened the same named dbi after thread 1's write txn started. In that case the prospect of
thread 1 opening that named dbi with mdb_dbi_open is lost forever.


Please remember that you can have only one writing transaction at once. And first looking in a read transaction whether a database exists and then creating it in a second write transaction is definitely a bad and risky programming style, as it carries an assumption from one transaction to the next, which is typically not valid.

I have no experience with a large number of databases, but if it is a performance problem as Hallvard and the docs describe, then you still have the option to combine all your logical databases into a big single database. In this case you would maintain a database ID (e.g. four byte integer) that is prepended to the user provided key for all get and put operations. Only some care needs to be taken for range searches and cursor operations, as you might get a key/value pair that belongs to another logical database, but this is not a big deal. I use that approach for composite search keys quite a lot.
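
A minimal sketch of the prefixed-key idea described above, in plain C with no LMDB calls. The helper names (make_composite_key, key_belongs_to) and the DB_ID_LEN constant are hypothetical, chosen only for illustration. The ID is written big-endian on the assumption that keys are compared byte-wise in lexicographic order, so that all keys of one logical database sort next to each other.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constant: width of the logical database ID prefix. */
#define DB_ID_LEN 4

/* Write the ID big-endian so byte-wise key ordering groups keys by ID. */
static void encode_db_id(uint32_t db_id, unsigned char out[DB_ID_LEN])
{
    out[0] = (unsigned char)(db_id >> 24);
    out[1] = (unsigned char)(db_id >> 16);
    out[2] = (unsigned char)(db_id >> 8);
    out[3] = (unsigned char)(db_id);
}

/* Build "ID || user key" in out. Returns the composite key length,
 * or 0 if the output buffer is too small. */
size_t make_composite_key(uint32_t db_id,
                          const void *user_key, size_t user_len,
                          unsigned char *out, size_t out_cap)
{
    if (out_cap < DB_ID_LEN || user_len > out_cap - DB_ID_LEN)
        return 0;
    encode_db_id(db_id, out);
    if (user_len > 0)
        memcpy(out + DB_ID_LEN, user_key, user_len);
    return DB_ID_LEN + user_len;
}

/* Check used during range scans and cursor walks: returns 1 if the key
 * carries the given ID prefix, 0 if it belongs to another logical
 * database (or is too short to carry a prefix at all). */
int key_belongs_to(uint32_t db_id, const unsigned char *key, size_t key_len)
{
    unsigned char prefix[DB_ID_LEN];
    if (key_len < DB_ID_LEN)
        return 0;
    encode_db_id(db_id, prefix);
    return memcmp(key, prefix, DB_ID_LEN) == 0;
}
```

In a cursor loop, one would stop iterating as soon as key_belongs_to returns 0 for the current key, since that key already belongs to a neighbouring logical database.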

The association between database names and their IDs could be maintained in a separate database.

Regards,

Klaus