[Date Prev][Date Next] [Chronological] [Thread] [Top]

The Case for Learned Index Structures



This has been making some waves today on social media:

https://www.arxiv-vanity.com/papers/1712.01208v1/

For now, it's only a novelty. Just like perfect hash functions. It assumes a static data set being used in read-only fashion, so it's unsuitable for a directory or database that serves ongoing modifications. It also assumes an entire data set fits in RAM, which is generally not true for database applications. In particular, the "fast" case of using highly parallel GPUs assumes everything fits inside GPU RAM, which is even more tightly constrained than server main memory.

It's axiomatic that if you have advance knowledge about the shape/characteristics of a dataset, you can construct a dedicated mapping function for that dataset that is perfectly optimal, and outperforms any general-purpose mapping. That's kind of the point of general-purpose mappings - they are general. There are plenty of use cases where this fact may be useful. In LDAP and any database system ingesting data in realtime, these findings are irrelevant since advance knowledge of the dataset doesn't exist.

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/