Jenkins and LDAP and Timeouts

Fixing Jenkins LDAP authentication when using a Load Balancer

If you’re using a combination of LDAP + Jenkins + a load balancer (say, AD domain controllers behind an NLB in an AWS environment) and you’re seeing odd timeout issues when users try to log in, you may be hitting problems with JNDI LDAP connection pooling.

We were seeing errors similar to the following in our jenkins.log file:

nested exception is org.acegisecurity.ldap.LdapDataAccessException: LdapCallback;LDAP response read timed out, timeout used:60000ms.; nested exception is javax.naming.NamingException: LDAP response read timed out, timeout used:60000ms.; remaining name ''

A bit of digging, a bit of tcpdump and a bit of head-scratching and hair-pulling seemed to indicate this might be a problem with firewall timeouts, especially as there was no LDAP traffic leaving the Jenkins server while we were seeing the Jenkins timeout issues. Odd.

A bit more digging, and reading the quite informative but also quite hateful Java JNDI LDAP documentation, seemed to point to connection pooling being the issue.

It looks as though connections in the pool are not kept alive, so any sort of stateful/tracking firewall (or in our case, load balancer) in the way would drop the connection after a specific period of idle time, but this was never ‘noticed’ by the JNDI provider. So when the application needed to query LDAP, it’d pluck a connection from the pool, send a query… and nada. Packets were dropped by the firewall/load balancer, the connection timed out and the application picked another connection from the pool… rinse and repeat until it had exhausted them all, at which point a new connection got created and then everything worked normally for as long as it takes you to get bored of debugging the problem and move on to something else.

The easiest option seemed to be to turn off connection pooling, which can (thankfully) be easily configured by an environment property when initialising the JNDI LDAP provider. Set the property com.sun.jndi.ldap.connect.pool to false in the Jenkins LDAP configuration, and everything seems to work.
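For anyone wondering what that setting looks like outside Jenkins, here is a minimal Java sketch of a JNDI environment with pooling switched off. The class name, the helper method and the ldap.example.internal host are made up for illustration; only the property name comes from the setup described above. The sketch builds the environment but does not open a connection.

```java
import java.util.Hashtable;
import javax.naming.Context;

class LdapEnvSketch {
    // Builds a JNDI environment for the LDAP provider with connection
    // pooling disabled. buildEnv is a hypothetical helper, not Jenkins code.
    static Hashtable<String, String> buildEnv(String providerUrl) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, providerUrl);
        // The property from the post: "false" means every context gets its
        // own fresh connection instead of borrowing one from the pool.
        env.put("com.sun.jndi.ldap.connect.pool", "false");
        return env;
    }

    public static void main(String[] args) {
        // The host name is an assumption for illustration only.
        Hashtable<String, String> env = buildEnv("ldap://ldap.example.internal:389");
        System.out.println("pooling enabled: " + env.get("com.sun.jndi.ldap.connect.pool"));
        // Passing env to new javax.naming.directory.InitialDirContext(env)
        // would open the actual connection; left out so the sketch runs offline.
    }
}
```

In Jenkins itself you shouldn't need any code: as far as I can tell, the same name and value pair goes into the environment properties section of the LDAP settings.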

For now, the problem seems to have gone away… 🙏