[Date Prev][Date Next] [Chronological] [Thread] [Top]

Proxy Cache Contribution






The proxy cache code from Apurva Kumar of IBM India Research was committed
to HEAD recently.

The proxy cache provides the following features :
                                                                             
    1. Semantic caching of positive conjunctive LDAP queries                 
    2. Attribute level caching.                                              
    3. Consistency support                                                   
    4. Support for multiple backend types                                    
    5. Support for caching multiple directories                              
    6. Support for multiple database instances for a single cache directory  
       tree.                                                                 
                                                                             



Please refer to the attached document and ITS#2062 for further information
and history.
Any comments and feedbacks to the developers' list are welcome.

(See attached file: ldapcache.html)

------------------------
Jong Hyuk Choi
IBM Thomas J. Watson Research Center - Enterprise Linux Group
P. O. Box 218, Yorktown Heights, NY 10598
email: jongchoi@us.ibm.com
(phone) 914-945-3979    (fax) 914-945-4425   TL: 862-3979
Title: LDAP proxy cache

LDAP Proxy Cache extension for OpenLDAP

Features

  1. Semantic caching of positive conjunctive LDAP queries
    • Answering of repeat and contained queries.
    • Support for answering queries with equality, GE, LE and substring assertions.
    • Answering of queries corresponding to specified query templates eg. (cn=), &(cn=)(c=).
  2. Attribute level caching.
    • Only required attributes of an entry are cached to improve cache utilization.
  3. Consistency support
    • TTL based weak consistency support provided.
  4. Support for multiple backend types
    • Caching operations implemented using backend APIs.
  5. Support for caching multiple directories
    • Could be used as meta-directory cache.
  6. Support for multiple database instances for a single cache directory tree.

LDAP proxy caching: Motivation

There has been a growth of websites providing dynamic content on the web. Typically dynamic content is generated in response to a user request (query) evaluated against a database. Techniques used for traditional content caching are thus not useful for caching dynamic content. Directories are specialized databases which are capable of storing heterogeneous real world information in a single instance. A large number of dynamic content websites use directories at the backend to access personnel information, customer profiles, preferences, authentication information etc. In most of these applications even though the content provided to the user is dynamic, the directory data is relatively static.

The solutions proposed for active query caching of database backed web applications are typically based on a semantic query caching engine implemented in a servlet running at a proxy. The servlet contacts the backend database for queries which can not be answered from the cache. Applied to the directory context, the servlet will contact the origin LDAP directory server for such queries. Since directory is a part of the LDAP protocol, performing directory caching outside the LDAP framework poses several problems. Firstly, the servlet will require some schema information. This is because the attribute syntaxes and matching rules are required for query containment. Secondly, the solution will have to rely on some non-standard (outside the LDAP framework) means of ensuring consistency. Thirdly, the security framework of LDAP has to be implemented at the servlet. Lastly, for supporting clients which access LDAP directly (through LDAP URLs or browser address books etc.) such a solution will still require to export an LDAP interface.

Our approach is to extend an LDAP directory server to an LDAP proxy cache by implementing the query cache within the server. Implementing LDAP caching at the middleware (LDAP) level ensures that unlike the servlet based approach different applications and directories can use the solution. In this solution, schema information can be replicated from the backend server. Access control information can either be replicated or a different policy might be supported. Consistency issues can be addressed in a standard way, for eg. using the LDAP client update protocol (LCUP).

LDAP proxy cache: Architecture

A typical directory server has a front end which processes incoming client requests, calls a backend interface to perform the corresponding operation on the database, and passes the results back to the client. The backend is a database driver for LDAP. It implements an interface for each of the LDAP operations which performs the read and write operations corresponding to that operation on the database. The result of the operation is sent back to the front end.

    
	     +=============================================================+
	     |								   |	
	     |			  +-------------------+     		   |	
+-------+    |    +--------+      |                   |     +----------+   |
|LDAP	|<---|--->|Protocol|<---->|Cache Manager  [QC]|<--->|Cache     |   |
|Client |    |    |Engine  |      |                   |     |Backend   |   |
+-------+    |    +--------+      +--------+----------+     +----------+   |
	     |                             |                               |
	     +=============================+===============================+
	QC: Query cache  		   |
	     				   |
					   |      +--------------+
					   +----->| Backend LDAP |
					          | Server       |
						  +--------------+ 
    
    
Figure 1: LDAP proxy cache architecture

Figure 1 shows the architecture of an LDAP proxy cache. The proxy cache backend supports the backend interfaces like any other backend. It also consists of a query cache which implements query containment, cache replacement, prefetching algorithms and stores semantic information for the cached queries. The following cache specific operations are peformed by the proxy cache backend.

  1. Query containment: Implements the algorithm for determining whether the incoming query is contained in any of the cached queries.
  2. Cache search: This operation searches the cache for answerable queries and returns the results to the cache backend.
  3. Remote search/prefetching: For unanswerable queries, either the query or a generalized query depending on the caching algorithm is sent to the origin server.
  4. Add/merge entries: Entries returned from the origin server are added to the cache backend database. If the entry is already cached, any fetched attributes not present in the cached entry are added. This is termed as merge.
  5. Cache replacement, prefetching: If merging/adding can not proceed due to the cache being full, the caching algorithm specific policy is used to determine which filter(s) to remove.
  6. Remove query: This operation actually removes the entries corresponding to the filter marked by cache replacement algorithm for removal. All the entries from the cache which belong to only that filter are removed.
  7. Consistency check: Weak consistency based on TTLs is supported. Each type of query is given a TTL, after which the query is considered inconsistent. Periodically consistency check is performed and the queries (and corresponding data) which are stale are removed.

Proxy Cache: Implementation in OpenLDAP

Caching is based on templates. A template is completely defined in terms of a filter format and a set of projected attributes. For eg. queries

Query 1: Filter: &(cn=Foo)(c=US) attrset: {mail,telephoneNumber}.

Query 2: Filter: &(cn=Bar)(c=IN) attrset: {mail,telephoneNumber}.

belong to the same template filter: &(cn=)(c=) attrset: {mail,telephoneNumber}. Typical LDAP applications use a finite number of such templates. The proxy cache allows specification of those templates and attribute sets for which the corresponding queries are to be cached.

Each template is associated with a time to live (TTL). This is the maximum duration for which the cached queries corresponding to the template are considered valid.

Cache replacement removes queries which are least recently used (LRU). Cache replacement is invoked when cache size reaches a threshold (hi_thresh). Queries are removed until the cache size reaches (lo_thresh).

The query cache and cache management routines are implemented in back-meta while the cached entries can be stored in any backend (eg. back-ldbm, back-bdb etc).

Back-meta extensions

The meta backend performs basic LDAP proxying with respect to a set of remote LDAP servers, called "targets". The information contained in these servers can be presented as belonging to a single Directory Tree (DIT).

Cache configuration specific directives: Three cache specific directives are added:
  • cacheparams: Used to set various cache parameters. The syntax is:
    		
    cacheparams <hi_thresh> <lo_thresh> <# of attrsets> 
    <max entries> <consistency cycle time>
    
    		
    	    
    hi_thresh = cache capacity (bytes).
    lo_thresh = lower threshold for cache replacement (CR).
    # attrsets = Total number of attribute sets.
    max entries = Maximum entries for a cacheable queries.
    consistency cycle time = Time in sec between successive consistency checks.

  • attrset: Used to associate a set of attributes to an index.
     
    		
    attrset <index> <attr1> <attr2> <attr3> ...
    		
    	    
    index = index identifying the attribute set.
    attr1..attrn = attributes comprising the set.

  • addtemplate: Adds a cacheable template with a given TTL.
     
    		
    addtemplate <template string> <attr set index> <ttl>
    		
    	    
    template string: RFC 2254 string representation.
    attr set index: Index identifying the set of projected attributes.
    ttl: Time for which cached queries belonging to this template are valid.

Query caching specific structs

The struct metainfo is extended to contain a pointer to struct cache_manager.

	    
typedef struct cache_manager_s {
    unsigned long       cache_size;	/* current cache size (bytes) */ 
    unsigned long 	thresh_hi;	/* cache capacity (bytes) */
    unsigned long       thresh_lo;	/* lower threshold for cache replacement */
    unsigned long	num_cached_queries;/* total # of cached queries */
    unsigned long       max_queries;	/* upper bound on # of cached queries */ 

    int 	numattrsets;		/* number of attribute sets */
    int 	numtemplates;		/* number of cacheable templates */
    int 	total_entries;		/* total number of entries cached */ 
    int         num_entries_limit;	/* max # of entries in a cacheable query */ 

    int         consistency_cycle_time;	/* consistency cycle time (sec) */ 
    int 	consistency_time;	/* last consistency check (sec) */ 

    query_manager*   qm;	/* query cache managed by the cache manager */ 
    /* mutexes */
    ...
} cache_manager; 
	    
        

struct query_manager represents the query cache.

	    
typedef struct query_manager_s {
    struct attr_set* 	attr_sets;	/* possible sets of projected attributes */
    QueryTemplate*  	templates;	/* cacheable templates */

    CachedQuery* lru_top;	/* top and bottom of LRU list */
    CachedQuery* lru_bottom;

    /* Query cache methods */
    QCfunc	qcfunc;		/* Query containment*/  
    CRfunc	crfunc; 	/* cache replacement */
    AddQueryfunc addfunc;	/* add query */ 
} query_manager;
            
	

The above structs are defined in back-meta/cache.h.

Routine for query containment

Query containment is provided for substring, GE, LE and equality queries. An incoming query belonging to a cacheable template is checked for containment with the cached queries of that template or a more general template. The general query containment routine is in back-meta/query-cache.c while containment using substring queries is implemented in back-meta/substring.c.

Handling search request

The search backend API of back-meta is modified to handle an incoming search request at the proxy cache. If the incoming query is contained in any of the cached queries, it performs a local cache search and sends the result to the clients. Otherwise, it sends the query to the origin server. The entries returned are sent to the client and are merged in the cache database.

Merge/Remove implementation

The merge and remove operations are implemented using the callback mechanism provided in OpenLDAP. We introduce a cache specific multi-valued operational attribute queryid which is required in all the cached entries. It represents the IDs of all the queries, the entry belongs to.

Add/merge: To add/merge an entry e: {(a1,v1), (a2,v2)..(an,vn)} with name, dn, obtained as a result of query with ID qid, using the callback mechanism the following steps are performed:

  1. Perform base search with DN=dn and filter (queryid=*).
  2. In the callback function for send_search_entry perform a MODIFY REPLACE using the modify backend API on the cached entry replacing previous values with fetched values. Add value qid to the queryid attribute.
  3. In the callback function for send_search_result if nentries = 0, ADD the entry e with name dn using the add backend API.

Remove: To remove a query with search base dn, scope s, and ID qid.

  1. Perform a search with base dn and scope s with filter queryid = qid.
  2. In the callback function for send_search_entry find the number of values, n, in the queryid attribute.
    If n=1, delete the entry.
    If n>1, use MODIFY DELETE to delete the value qid from the queryid attribute.
Associating multiple database instances with the proxy cache backend.

The backglue backend is used to provide appearance of a single cache database while allowing multiple databases to serve a single cache directory tree.

Support for weak consistency of the cached information. Support for weak consistency of the cached queries is implemented. Consistency check is carried out periodically in which stale queries with expired TTLs are removed. The interval between checks and TTL values can be specified during configuration using cache specific directives.
Author: Apurva Kumar kapurva@in.ibm.com