I am a caffe user. In my use case I read a ~300 GB lmdb sequentially: each element is read once and not accessed again until every other element in the db has been read and the loop starts over. It seems that every element read through lmdb ends up in the page cache. This becomes a problem because the dataset is read fairly fast, and after about an hour nearly 60% of RAM is devoted to page cache for my lmdb.
At that point the system runs out of unmapped memory, so it starts evicting page frames that belong to other users' processes, many of which have not been accessed in the last hour. The system therefore prefers to evict those page frames rather than the ones mapped at the beginning of my run. This behavior is entirely understandable, but it causes extremely severe thrashing and an unresponsive system, because the other users' page frames are needed again very soon. Does my diagnosis of the situation seem reasonable?
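A rough way to check this diagnosis on Linux is to sample `/proc/meminfo` while training runs and watch whether the `Cached` and `Mapped` fields grow toward the fraction of RAM described above. The sketch below only parses that file; the helper names `parse_meminfo` and `cache_fractions` are made up for illustration and are not part of caffe or lmdb.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: numeric value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def cache_fractions(fields):
    """Return the page-cache and mapped-page sizes as fractions of total RAM."""
    total = fields["MemTotal"]
    return {
        "cached": fields.get("Cached", 0) / total,
        "mapped": fields.get("Mapped", 0) / total,
    }


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    # Take a single sample; run this periodically during training to see the trend.
    with open("/proc/meminfo") as f:
        print(cache_fractions(parse_meminfo(f.read())))
```

If the cached fraction climbs steadily during an epoch and other processes start swapping once it plateaus, that would be consistent with the thrashing described above.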
Since many caffe use cases involve a similar sequential read of a dataset much larger than available RAM, many other caffe users besides me report a similar issue. They all report their systems becoming unresponsive, presumably due to the same thrashing.
"training is freezing for multiple hours"
"Caffe memory increases with time(iterations?)"
Is there some option I missed that tells lmdb that a certain read-only transaction is going to be purely sequential, so it shouldn't bother to cache elements that have already been read? If not, is there a plan to include such a feature?
Or is there an option that limits the maximum memory a single lmdb transaction may use for caching?
Or is there some other possible solution to this problem?
I have been using a hack based on this fork
to avoid this issue. However, I would love to know if there is any less hacky way to solve this problem.
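To make concrete what I mean by not keeping already-read elements cached, here is a minimal sketch on a plain memory-mapped file using only Python's standard library. It is not lmdb's API and it is not the code from the fork; the function name `sequential_read_drop_behind`, the `consume` callback and the window size are hypothetical. It assumes Linux and Python 3.8+, where `mmap.madvise` and `os.posix_fadvise` are available, and both calls are only hints that the kernel may ignore.

```python
import mmap
import os


def sequential_read_drop_behind(path, window=64 * 1024 * 1024, consume=None):
    """Read a file once, front to back, through a read-only mmap.

    After each window is consumed, ask the kernel to release its pages.
    Returns the number of bytes read.
    """
    page = mmap.PAGESIZE
    window = max(page, (window // page) * page)  # keep window starts page-aligned
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        if size == 0:
            return 0  # an empty file cannot be mapped
        with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as m:
            can_madvise = hasattr(m, "madvise")
            if can_madvise and hasattr(mmap, "MADV_SEQUENTIAL"):
                m.madvise(mmap.MADV_SEQUENTIAL)  # hint: access is front to back
            for start in range(0, size, window):
                length = min(window, size - start)
                chunk = m[start:start + length]  # copies the bytes, touching the pages
                if consume is not None:
                    consume(chunk)
                total += len(chunk)
                # Drop this process's mapping of the pages just read ...
                if can_madvise and hasattr(mmap, "MADV_DONTNEED"):
                    m.madvise(mmap.MADV_DONTNEED, start, length)
                # ... and ask the kernel to evict the clean cached pages as well.
                if hasattr(os, "posix_fadvise"):
                    os.posix_fadvise(fd, start, length, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return total
```

Something equivalent inside a read-only lmdb cursor loop is what I am hoping for, ideally without having to patch the library.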
I have seen this thread
but it looks like it is for multiple readers.
Any advice on this will be much appreciated!