Re: (ITS#7651) LMDB: Uncontrolled database when opened from multiple processes



Hi,

All writes occur in the parent process only. The child (normally) only
reopens the environment and performs a few short reads.

But it is the act of opening the env in the forked child that causes the
database growth. I tried closing the env straight after opening it in the
child (without performing any reads) and encountered the same issue.

Hope that makes sense,
Dimitrij
On 30 Jul 2013 21:19, "Howard Chu" <hyc@symas.com> wrote:

> dimitrij.denissenko@blacksquaremedia.com wrote:
>
>> Full_Name: Dimitrij Denissenko
>> Version:
>> OS: Ubuntu 12.04
>> URL:
>> Submission from: (NULL) (62.30.100.0)
>>
>>
>> Hi,
>>
>> I found an interesting issue with LMDB. I have populated the DB with a
>> bunch of records and it uses ~30M on disk (after sync). Then I added a
>> background process to my app and populated the database again with the
>> same record set. Surprisingly, the resulting size on disk was >70M.
>>
>> The background process is forked periodically to perform some maintenance
>> tasks. Here is my (simplified) code:
>>
>> /* Close env before forking */
>> mdb_env_close(env);
>>
>> if ((childpid = fork()) == 0) {
>>      /* Child */
>>      rc = mdb_env_open(env, ".", MDB_NOSYNC, 0644);
>>      ...
>> } else {
>>      /* Parent */
>>      rc = mdb_env_open(env, ".", MDB_NOSYNC, 0644);
>>      ...
>> }
>>
>> I could narrow it down to the mdb_env_open call in the child. If I add
>> exit(0) before the mdb_env_open line, the DB size remains consistently at
>> ~30M. The data size seems to grow proportionally to the number of forks
>> performed during data load. What could be causing the growth? What can I
>> do to prevent it?
>>
>> Thanks in advance
>>
>> PS: I tried it with MDB_FIXEDMAP and without, same result.
>>
>
> Without seeing more of your code, it's impossible to tell. Are you adding
> the data on both sides of the fork? In the above code snippet, where are
> your mdb_put calls occurring? Are both the parent and child processes
> writing identical data?
>
> --
>   -- Howard Chu
>   CTO, Symas Corp.           http://www.symas.com
>   Director, Highland Sun     http://highlandsun.com/hyc/
>   Chief Architect, OpenLDAP  http://www.openldap.org/project/
>
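
The observation discussed in this thread (the data file growing each time a
forked child opens the environment) could be checked with a small harness
like the one below. This is only a sketch under stated assumptions, not code
from the application in question: file_size, growth_after_fork, the child_fn
callback type and the two sample callbacks are hypothetical names introduced
here, and the harness uses plain POSIX calls rather than LMDB so that it
stays self-contained. It forks, runs a callback in the child, waits for the
child to exit, and reports how many bytes the file grew by.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical callback type: the work performed by the forked child.
 * It returns 0 on success and non-zero on failure. */
typedef int (*child_fn)(const char *path);

/* Size of a file in bytes, or -1 if stat() fails. */
static long long file_size(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_size;
}

/* Fork, run fn(path) in the child, wait for the child, and return the
 * change in file size (after minus before). *ok is set to 1 only if every
 * step succeeded and the child exited with status 0. */
static long long growth_after_fork(const char *path, child_fn fn, int *ok)
{
    assert(path != NULL && fn != NULL && ok != NULL);
    *ok = 0;

    long long before = file_size(path);
    if (before < 0)
        return 0;

    fflush(NULL);               /* do not duplicate buffered output */
    pid_t pid = fork();
    if (pid < 0)
        return 0;
    if (pid == 0)
        _exit(fn(path) == 0 ? 0 : 1);   /* child: run the callback only */

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return 0;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return 0;

    long long after = file_size(path);
    if (after < 0)
        return 0;

    *ok = 1;
    return after - before;
}

/* Sample callback: a child that does nothing to the file. */
static int child_noop(const char *path)
{
    (void)path;
    return 0;
}

/* Sample callback: a child that appends four bytes to the file. */
static int child_append4(const char *path)
{
    FILE *f = fopen(path, "ab");
    if (f == NULL)
        return -1;
    size_t n = fwrite("xxxx", 1, 4, f);
    if (fclose(f) != 0)
        return -1;
    return n == 4 ? 0 : -1;
}
```

To apply this to the case above, path would point at the environment's
data.mdb file and the callback would open and then close the environment.
LMDB's documentation states that a handle passed to mdb_env_close must not
be used again, so such a callback would presumably obtain a fresh handle
with mdb_env_create before calling mdb_env_open.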
