
Re: (ITS#7651) LMDB: Uncontrolled database when opened from multiple processes



dimitrij.denissenko@blacksquaremedia.com wrote:
>
> Hi,
>
> All writes occur in the parent process only. The child (normally) only
> reopens the environment and performs a few short reads.
>
> But, it's the actual opening of the env in the forked child that is causing
> the database growth. I tried to close the env straight after opening it in
> the child (without performing any reads), and have encountered the same
> issues.

I don't see anything like that here. There's nothing in 
mdb_env_open(MDB_NOSYNC) that can even affect the size of the DB file. I don't 
think there's anything we can investigate without sample code that reproduces 
the situation.
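
As a starting point for such a reproduction, here is a minimal sketch of a harness that records the size of a file before and after one fork. It uses only POSIX calls, so the LMDB part is left as a hook: `child_fn`, `file_size` and `growth_across_fork` are hypothetical names introduced for this illustration, and the hook is where the mdb_env_open()/mdb_env_close() sequence under test would go. It assumes the environment directory is "." and that the file to watch is the default data file, ./data.mdb.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical hook: the work the forked child performs.
 * Returns 0 on success, nonzero on failure. */
typedef int (*child_fn)(void *arg);

/* Apparent size of a file in bytes, or -1 if stat() fails. */
static long long file_size(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_size;
}

/* Fork once, run child_work in the child (nothing if it is NULL),
 * wait for the child, and store in *delta how much the file at
 * path grew in the meantime. Returns 0 on success, -1 on error. */
static int growth_across_fork(const char *path, child_fn child_work,
                              void *arg, long long *delta)
{
    long long before, after;
    int status;
    pid_t pid;

    before = file_size(path);
    if (before < 0)
        return -1;

    fflush(NULL);   /* keep buffered stdio output from being duplicated */
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: run the hook and leave without running atexit handlers. */
        int rc = child_work ? child_work(arg) : 0;
        _exit(rc == 0 ? 0 : 1);
    }

    /* Parent: wait for the child, then measure again. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    after = file_size(path);
    if (after < 0)
        return -1;
    *delta = after - before;
    return 0;
}
```

Calling growth_across_fork("./data.mdb", hook, NULL, &delta) once per fork, with and without the open in the hook, would show whether the file grows while the parent is idle. The sketch only compares st_size, so it says nothing about which pages inside the file are actually in use.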

>
> Hope that makes sense,
> Dimitrij
> On 30 Jul 2013 21:19, "Howard Chu" <hyc@symas.com> wrote:
>
>> dimitrij.denissenko@blacksquaremedia.com wrote:
>>
>>> Full_Name: Dimitrij Denissenko
>>> Version:
>>> OS: Ubuntu 12.04
>>> URL:
>>> Submission from: (NULL) (62.30.100.0)
>>>
>>>
>>> Hi,
>>>
>>> I found an interesting issue with LMDB. I have populated the DB with a
>>> bunch of records and it uses ~30M on disk (after sync). Then I added a
>>> background process to my app and populated the database again with the
>>> same record set. Surprisingly, the resulting size on disk was >70M.
>>>
>>> The background process is forked periodically to perform some maintenance
>>> tasks; here is my (simplified) code:
>>>
>>> /* Close env before forking */
>>> mdb_env_close(env);
>>>
>>> if ((childpid = fork()) == 0) {
>>>       /* Child */
>>>       rc = mdb_env_open(env, ".", MDB_NOSYNC, 0644);
>>>       ...
>>> } else {
>>>       /* Parent */
>>>       rc = mdb_env_open(env, ".", MDB_NOSYNC, 0644);
>>>       ...
>>> }
>>>
>>> I could narrow it down to the mdb_env_open call in the child. If I add
>>> exit(0) before the mdb_env_open line, the DB size remains consistently at
>>> ~30M. The data size seems to grow proportionally to the number of forks
>>> performed during data load. What could be causing the growth? What can I
>>> do to prevent it?
>>>
>>> Thanks in advance
>>>
>>> PS: I tried it with MDB_FIXMAP and without, same result.
>>>
>>
>> Without seeing more of your code, it's impossible to tell. Are you adding
>> the data on both sides of the fork? In the above code snippet, where are
>> your mdb_put calls occurring? Are both the parent and child processes
>> writing identical data?
>>
>> --
>>    -- Howard Chu
>>    CTO, Symas Corp.           http://www.symas.com
>>    Director, Highland Sun     http://highlandsun.com/hyc/
>>    Chief Architect, OpenLDAP  http://www.openldap.org/project/
>>
>


-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/