How to Reduce Time for LDAP Caching


For large organizations with thousands of users in their global LDAP directory, caching LDAP users can take up to an hour in the worst case.

Learn how to reduce the time for LDAP caching during Datameer boot up.


LDAP cache build times are directly related to the size and depth of the LDAP tree structure. The easiest way to shorten long cache times is to specify a search base for users and groups that excludes most of the LDAP tree.

A typical organizational structure looks something like this: com > company > region > office > room. There are multiple regions, offices, and rooms to search through for users. If a high-level search base such as dc=company, dc=com is used, Datameer has to scan each level of the tree, and the tree gets wider as you traverse downward. With that search base, every user in the entire tree is found.

When a product has no more than 50 concurrent users, why scan all 10,000? Put those 50 users in an organizational unit (ou) and scan only that ou. By grouping users correctly and specifying a narrow search base, the time to build an LDAP cache can be reduced from over an hour to under a second.


Assume there are 3 regions, 4 offices in each region, and 5 rooms per office. We'll call them 1, 2, 3; A, B, C, D; and a, b, c, d, e respectively. Further, assume there are 50 people in each room. That's 3 x 4 x 5 x 50 = 3000 user entities that need to be loaded and cached.
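The count above can be checked with a short sketch. This is only an illustration of the example tree, not Datameer code; the names REGIONS, OFFICES, ROOMS, and room_dns are made up for this example, and the DNs are written without spaces after the commas.

```python
from itertools import product

# Hypothetical example tree from the text: 3 regions, 4 offices each, 5 rooms each.
REGIONS = ["1", "2", "3"]
OFFICES = ["A", "B", "C", "D"]
ROOMS = ["a", "b", "c", "d", "e"]
PEOPLE_PER_ROOM = 50


def room_dns(base="dc=company,dc=com"):
    """Return the DN of every room OU in the example tree."""
    return [
        f"ou={room},ou={office},ou={region},{base}"
        for region, office, room in product(REGIONS, OFFICES, ROOMS)
    ]


room_count = len(room_dns())
total_users = room_count * PEOPLE_PER_ROOM
print(room_count, total_users)  # 60 3000
```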


The people who need to log in are limited to one office and, better yet, to one room in that office. Let's say region 1, office B, room c. Instead of specifying dc=company, dc=com as the search base, which results in a full search of all regions, offices, and rooms, you can specify a search base that points directly at where the users are found. Only that portion of the tree is scanned, and only that portion is cached. In this example, the search base to set is: ou=c, ou=B, ou=1, dc=company, dc=com. The result is that only the 50 people in region 1, office B, room c are scanned and cached.
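To make the effect of the narrower search base concrete, here is a minimal sketch that models the example tree in memory and counts how many user entries fall under each search base. It does not talk to an LDAP server or to Datameer; build_search_base, example_user_dns, users_in_scope, and the uid=userN naming are all assumptions made for illustration, and a subtree search is approximated by a simple DN suffix match.

```python
from itertools import product

BASE = "dc=company,dc=com"


def build_search_base(path, base=BASE):
    """Build a search base DN from OU names listed from the top of the tree down.

    ["1", "B", "c"] means region 1, office B, room c. A DN lists the most
    specific component first, so the path is reversed.
    """
    ous = [f"ou={name}" for name in reversed(path)]
    return ",".join(ous + [base])


def example_user_dns():
    """Generate the 3000 hypothetical user DNs of the example tree."""
    dns = []
    for region, office, room in product("123", "ABCD", "abcde"):
        room_dn = build_search_base([region, office, room])
        dns.extend(f"uid=user{n},{room_dn}" for n in range(1, 51))
    return dns


def users_in_scope(search_base, user_dns):
    """Approximate a subtree search: keep entries located below search_base."""
    suffix = "," + search_base
    return [dn for dn in user_dns if dn.endswith(suffix)]


all_users = example_user_dns()
broad = users_in_scope(BASE, all_users)
narrow_base = build_search_base(["1", "B", "c"])
narrow = users_in_scope(narrow_base, all_users)

print(narrow_base)              # ou=c,ou=B,ou=1,dc=company,dc=com
print(len(broad), len(narrow))  # 3000 50
```

With the broad base, all 3000 entries are in scope; with the room-level base, only 50 are.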

The benefit of this is twofold. First, setting the search base to a more specific scope significantly reduces the time spent actively scanning and retrieving the tree structure, group objects, and user objects. Second, limiting the result set significantly reduces the amount of data to cache. Combined, these two factors dramatically reduce the time to build the cache.

Another parameter to consider is pagination control (PC). Use it to improve the performance of requests that return large numbers of results; it limits the number of result objects per page. Please note that not all LDAP implementations support PC. Enter 0 to disable it.
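As a rough illustration of what a page size means, the sketch below splits a result set into pages in plain Python. It is not how Datameer or any LDAP server implements paging; the function name paged and the treatment of 0 as "return everything in one response" are assumptions that mirror the setting described above.

```python
def paged(results, page_size):
    """Yield results in pages of at most page_size entries.

    A page_size of 0 is treated as "paging disabled": everything is
    returned in a single response.
    """
    if page_size < 0:
        raise ValueError("page_size must be 0 or greater")
    if page_size == 0:
        yield list(results)
        return
    for start in range(0, len(results), page_size):
        yield results[start:start + page_size]


entries = [f"user{n}" for n in range(3000)]  # hypothetical result set

pages = list(paged(entries, 500))
print(len(pages), len(pages[0]))      # 6 500

unpaged = list(paged(entries, 0))
print(len(unpaged), len(unpaged[0]))  # 1 3000
```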