If you’re using a combination of LDAP + Jenkins + a load balancer (say, AD domain controllers behind an NLB in an AWS environment) and you’re seeing odd timeout issues when users try to log in, perhaps you’re hitting problems with Java connection pooling.
We were seeing errors similar to the following:
nested exception is org.acegisecurity.ldap.LdapDataAccessException:
LdapCallback;LDAP response read timed out, timeout used:60000ms.; nested
exception is javax.naming.NamingException: LDAP response read timed out,
timeout used:60000ms.; remaining name ''
A bit of digging, a bit of tcpdump and a bit of head-scratching and hair-pulling seemed to indicate this might be a problem with firewall timeouts, especially as, whenever we saw the Jenkins timeout issues, there was no LDAP traffic leaving the Jenkins server. Odd.
A bit more digging, and reading the quite informative but also quite hateful [Java JNDI LDAP documentation](https://docs.oracle.com/javase/7/docs/technotes/guides/jndi/jndi-ldap.html#POOL), seemed to point to connection pooling being the issue.
It looks as though connections in the pool are not kept alive, so any sort of stateful/tracking firewall (or in our case, load balancer) in the way would drop the connection after a specific period of idle time, and this was never ‘noticed’ by the JNDI provider. So when the application needed to query LDAP, it’d pluck a connection from the pool, send a query… and nada. The packets were dropped by the firewall/load balancer, the connection timed out, and another connection was picked from the pool… rinse and repeat until they were all exhausted, at which point a new connection got created and everything worked normally for as long as it takes you to get bored of debugging the problem and move on to something else.
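To make that failure mode concrete, here is a toy simulation of it. This is not how the JNDI provider is actually implemented; `StalePoolSketch`, `Conn` and `attemptsUntilSuccess` are made-up names, and a "stale" connection simply stands in for one the load balancer has silently dropped.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StalePoolSketch {
    // Hypothetical stand-in for a pooled LDAP connection.
    // alive == false models a connection the load balancer has silently dropped.
    static final class Conn {
        final boolean alive;
        Conn(boolean alive) { this.alive = alive; }
    }

    // Returns how many attempts one query needs before it gets an answer.
    static int attemptsUntilSuccess(Deque<Conn> pool) {
        int attempts = 0;
        while (true) {
            attempts++;
            // Reuse an idle pooled connection if there is one, otherwise open a fresh one.
            Conn c = pool.isEmpty() ? new Conn(true) : pool.poll();
            if (c.alive) {
                pool.push(c); // hand the working connection back to the pool
                return attempts;
            }
            // Stale connection: in real life this is where the read timeout
            // elapses before the connection is thrown away.
        }
    }

    public static void main(String[] args) {
        Deque<Conn> pool = new ArrayDeque<>();
        for (int i = 0; i < 3; i++) {
            pool.add(new Conn(false)); // three idle connections, all dropped upstream
        }
        System.out.println(attemptsUntilSuccess(pool)); // prints 4
        System.out.println(attemptsUntilSuccess(pool)); // prints 1
    }
}
```

With three stale connections, the first query burns three attempts before a fresh connection finally answers. If each failed attempt waits out a 60000ms read timeout like the one in the error above, that would be roughly three minutes of a user staring at a login page.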
The easiest option seemed to be to turn off connection pooling, which can
(thankfully) be easily configured by an environment variable when initialising
the JNDI LDAP provider.
Set that variable to false in the Jenkins
LDAP configuration, and everything seems to work.
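For reference, here is a rough sketch of what disabling pooling looks like on the Java side. The JNDI LDAP guide linked above documents a `com.sun.jndi.ldap.connect.pool` environment property, and the sketch assumes that is the setting in question. The class name, method name and LDAP URL are made up for illustration.

```java
import java.util.Hashtable;
import javax.naming.Context;

public class LdapEnvSketch {
    // Builds a JNDI environment for the LDAP provider with connection pooling switched off.
    static Hashtable<String, String> buildEnv(String providerUrl) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, providerUrl);
        // Assumption: this is the pooling property meant in the text.
        // "false" asks the provider not to reuse pooled connections for this context.
        env.put("com.sun.jndi.ldap.connect.pool", "false");
        return env;
    }

    public static void main(String[] args) {
        // Hypothetical URL; substitute your load balancer's address.
        Hashtable<String, String> env = buildEnv("ldap://ldap.example.com:389");
        env.forEach((key, value) -> System.out.println(key + " = " + value));
        // Passing env to new javax.naming.directory.InitialDirContext(env) would open
        // a real connection; that step is left out so the sketch runs without a server.
    }
}
```

In Jenkins you would not write this code yourself; the equivalent is presumably adding that name/value pair wherever your version of the LDAP plugin lets you set environment properties.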
For now, the problem seems to have gone away… 🙏