[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: (ITS#7894) LMDB: MDB_DUPSORT and mdb_cursor_put(.., MDB_CURRENT) can put DB in illegal state



--001a11c2f26695c81a04ffbb03f9
Content-Type: text/plain; charset=UTF-8

On 2 August 2014 17:43, Howard Chu <hyc@symas.com> wrote:

> Max Bolingbroke wrote:
>
>> OK, that does make sense. So MDB_CURRENT is totally useless with
>> MDB_DUPSORT?
>> Since you are either pointing at the (k, v) pair already (so the put will
>> do
>> nothing) or you aren't (so the put will break the DB).
>>
>
> The feature would not exist if it were useless.


An alternative explanation for the existence of MDB_CURRENT could be that
it is useful without MDB_DUPSORT (which is certainly is!) but is useless
with it on. I take it from this remark that you do see some use for the
combination of the two options but I guess I must be too stupid to see what
that is!

I suppose that MDB_CURRENT might be useful with MDB_DUPSORT if you could
use it to replace the (key, value) pair being pointed to by the cursor with
a different (key, value) pair, so long as the LMDB user knew that the
replacement would not violate the LMDB invariant that duplicates must be
sorted and distinct. Though this use case "seems" to work with the current
implementation the docs certainly do not promise that this is supported.


>  Do you think it would be better to error out when the supplied key/data
>> doesn't match the data pointed to by the key? That would protect the DB
>> from
>> this kind of corruption in the presence of bugs in the application.
>>
>
> We don't attempt to protect from user errors; we expect competent
> programmers.


Is that really true? For example your docs from MDB_WRITEMAP say:

"""Use a writeable memory map unless MDB_RDONLY is set. This is faster
and uses fewer mallocs, but loses protection from application bugs
like wild pointer writes and other bad updates into the database."""

So LMDB's default behaviour is actually to take a perf hit in the name of
protecting programmers from themselves. My proposal is consistent with that
philosophy.


>  p.s. you may want to add a note to the docs that MDB_RESERVE is not
>> allowed
>> with MDB_DUPSORT. The combination doesn't seem to make a lot of sense
>>
>
> Indeed it makes no sense at all. And no, we don't attempt to protect from
> nonsensical usage.


Except that in e.g. mdb_put you explicitly check for nonsensical flags
making it through to mdb_cursor_put:

    if ((flags &
(MDB_NOOVERWRITE|MDB_NODUPDATA|MDB_RESERVE|MDB_APPEND|MDB_APPENDDUP)) !=
flags)
return EINVAL;

A check for the nonsensical combination of MDB_RESERVE and MDB_DUPSORT
would be in that spirit.

--001a11c2f26695c81a04ffbb03f9
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div class=3D"gmail_extra"><div class=3D"gmail_quote">On 2=
 August 2014 17:43, Howard Chu <span dir=3D"ltr">&lt;<a href=3D"mailto:hyc@=
symas.com" target=3D"_blank">hyc@symas.com</a>&gt;</span> wrote:<br><blockq=
uote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left-wi=
dth:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-=
left:1ex">
<div class=3D"">Max Bolingbroke wrote:<br>
<blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-=
left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;p=
adding-left:1ex">
OK, that does make sense. So MDB_CURRENT is totally useless with MDB_DUPSOR=
T?<br>
Since you are either pointing at the (k, v) pair already (so the put will d=
o<br>
nothing) or you aren&#39;t (so the put will break the DB).<br>
</blockquote>
<br></div>
The feature would not exist if it were useless.</blockquote><div><br></div>=
<div>An alternative explanation for the existence of MDB_CURRENT could be t=
hat it is useful without MDB_DUPSORT (which is certainly is!) but is useles=
s with it on. I take it from this remark that you do see some use for the c=
ombination of the two options but I guess I must be too stupid to see what =
that is!</div>
<div><br>I suppose that MDB_CURRENT might be useful with MDB_DUPSORT if you=
 could use it to replace the (key, value) pair being pointed to by the curs=
or with a different (key, value) pair, so long as the LMDB user knew that t=
he replacement would not violate the LMDB invariant that duplicates must be=
 sorted and distinct. Though this use case &quot;seems&quot; to work with t=
he current implementation the docs certainly do not promise that this is su=
pported.</div>
<div>=C2=A0</div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px =
0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-l=
eft-style:solid;padding-left:1ex"><div class=3D"">
<blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-=
left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;p=
adding-left:1ex">
Do you think it would be better to error out when the supplied key/data<br>
doesn&#39;t match the data pointed to by the key? That would protect the DB=
 from<br>
this kind of corruption in the presence of bugs in the application.<br>
</blockquote>
<br></div>
We don&#39;t attempt to protect from user errors; we expect competent progr=
ammers.</blockquote><div><br></div><div>Is that really true? For example yo=
ur docs from MDB_WRITEMAP say:</div><div><br></div><div><div>&quot;&quot;&q=
uot;Use a writeable memory map unless MDB_RDONLY is set. This is faster</di=
v>
<div>and uses fewer mallocs, but loses protection from application bugs</di=
v><div>like wild pointer writes and other bad updates into the database.&qu=
ot;&quot;&quot;</div></div><div><br></div><div>So LMDB&#39;s default behavi=
our is actually to take a perf hit in the name of protecting programmers fr=
om themselves. My proposal is consistent with that philosophy.</div>
<div>=C2=A0</div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px =
0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-l=
eft-style:solid;padding-left:1ex"><div class=3D"">
<blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-=
left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;p=
adding-left:1ex">
p.s. you may want to add a note to the docs that MDB_RESERVE is not allowed=
<br>
with MDB_DUPSORT. The combination doesn&#39;t seem to make a lot of sense<b=
r>
</blockquote>
<br></div>
Indeed it makes no sense at all. And no, we don&#39;t attempt to protect fr=
om nonsensical usage.</blockquote><div><br></div><div>Except that in e.g. m=
db_put you explicitly check for nonsensical flags making it through to mdb_=
cursor_put:</div>
<div><br></div><div><div>=C2=A0 =C2=A0 if ((flags &amp; (MDB_NOOVERWRITE|MD=
B_NODUPDATA|MDB_RESERVE|MDB_APPEND|MDB_APPENDDUP)) !=3D flags)</div><div><s=
pan class=3D"" style=3D"white-space:pre">		</span>return EINVAL;</div></div=
><div><br></div>
<div>A check for the nonsensical combination of MDB_RESERVE and MDB_DUPSORT=
 would be in that spirit.</div></div></div></div>

--001a11c2f26695c81a04ffbb03f9--