[Date Prev][Date Next]
[Chronological]
[Thread]
[Top]
The Case for Learned Index Structures
This has been making some waves today on social media:
https://www.arxiv-vanity.com/papers/1712.01208v1/
For now, it's only a novelty. Just like perfect hash functions. It assumes a 
static data set being used in read-only fashion, so it's unsuitable for a 
directory or database that serves ongoing modifications. It also assumes an 
entire data set fits in RAM, which is generally not true for database 
applications. In particular, the "fast" case of using highly parallel GPUs 
assumes everything fits inside GPU RAM, which is even more tightly constrained 
than server main memory.
It's axiomatic that if you have advance knowledge about the 
shape/characteristics of a dataset, you can construct a dedicated mapping 
function for that dataset that is perfectly optimal, and outperforms any 
general-purpose mapping. That's kind of the point of general-purpose mappings 
- they are general. There are plenty of use cases where this fact may be 
useful. In LDAP and any database system ingesting data in realtime, these 
findings are irrelevant since advance knowledge of the dataset doesn't exist.
--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/