If you’re using a combination of LDAP + Jenkins + a load balancer (say, AD domain controllers behind an NLB in an AWS environment) and you’re seeing odd timeout issues when users try to log in, perhaps you’re hitting problems with Java connection pooling.
We were seeing errors similar to the following in our `jenkins.log` file:
```
LdapCallback;LDAP response read timed out, timeout used:60000ms.; nested
exception is javax.naming.NamingException: LDAP response read timed out,
timeout used:60000ms.; remaining name ''
```
A bit of digging, a bit of tcpdump and a bit of head-scratching and hair
pulling seemed to indicate this might be a problem with firewall timeouts,
especially since, whenever we saw the Jenkins timeouts, no LDAP traffic was
leaving the Jenkins server. Odd.
A bit more digging, and reading the quite informative but also quite hateful
[Java JNDI LDAP documentation](https://docs.oracle.com/javase/7/docs/technotes/guides/jndi/jndi-ldap.html#POOL),
seemed to point to connection pooling being the issue.
It looks as though connections in the pool are not kept alive, so any sort of
stateful/tracking firewall (or, in our case, load balancer) in the way would
drop the connection after a certain period of idle time, and the JNDI provider
never 'noticed'. So when the application needs to query LDAP, it plucks a
connection from the pool, sends a query... and nada. The firewall/load balancer
drops the packets, the connection times out and another connection is picked
from the pool... rinse and repeat until you've exhausted them all, at which
point a new connection gets created and everything works normally, just long
enough for you to get bored of debugging the problem and move on to something
else.
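To make that failure mode concrete, here is a toy Java model of the sequence just described. It is not how the JNDI provider's pool is actually implemented; it simply assumes that a pooled connection idle for longer than the firewall/load balancer idle timeout is silently dead, and that each query on a dead connection costs one full read timeout before the next pooled connection is tried. The 350-second idle timeout is an assumption (it is the figure AWS has documented for NLB TCP connections; substitute your own), and `StalePoolModel`, `PooledConn` and `queryWait` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StalePoolModel {
    // Hypothetical model: a pooled connection is reduced to the time it was last used.
    record PooledConn(long lastUsedSec) {}

    static final long IDLE_DROP_SEC = 350;   // assumed idle timeout of the firewall/load balancer
    static final long READ_TIMEOUT_SEC = 60; // matches the 60000ms in the log

    // Returns how many seconds one LDAP query waits under the behaviour described above.
    static long queryWait(Deque<PooledConn> pool, long nowSec) {
        long waited = 0;
        while (!pool.isEmpty()) {
            PooledConn c = pool.pop();
            if (nowSec - c.lastUsedSec() < IDLE_DROP_SEC) {
                // Connection is still alive: the query succeeds and it goes back in the pool.
                pool.push(new PooledConn(nowSec));
                return waited;
            }
            // Connection was silently dropped: the query hangs until the read timeout.
            waited += READ_TIMEOUT_SEC;
        }
        // Pool exhausted: a fresh connection is created, works, and is pooled.
        pool.push(new PooledConn(nowSec));
        return waited;
    }

    public static void main(String[] args) {
        Deque<PooledConn> pool = new ArrayDeque<>();
        for (int i = 0; i < 3; i++) {
            pool.push(new PooledConn(0)); // three connections, all last used at t=0
        }
        // An hour of quiet, then two logins one second apart.
        System.out.println("first query waits " + queryWait(pool, 3600) + "s");
        System.out.println("second query waits " + queryWait(pool, 3601) + "s");
    }
}
```

With three stale connections, the first login after an hour of quiet waits three read timeouts (180 s), while the next one succeeds immediately on the freshly created connection, which is the "works again for a while" pattern described above.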
The easiest option seemed to be to turn off connection pooling, which can
(thankfully) be easily configured via an environment property when initialising
the JNDI LDAP provider.
Set the property `com.sun.jndi.ldap.connect.pool` to `false` in the Jenkins
LDAP configuration, and everything seems to work.
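As far as we can tell, the Jenkins setting is passed through as a JNDI environment property. As a minimal standalone sketch (assuming a plain JNDI client rather than Jenkins itself; `NoPoolLdap` and the `ldap.example.com` URL are placeholders), this is what turning pooling off looks like when building an LDAP context directly. The read-timeout property is included only to mirror the 60000ms value from the log.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

public class NoPoolLdap {
    // Build a JNDI environment for the LDAP provider with connection pooling disabled.
    static Hashtable<String, String> buildEnv(String providerUrl) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, providerUrl);
        // The property from this post: do not pool (and so never reuse) LDAP connections.
        env.put("com.sun.jndi.ldap.connect.pool", "false");
        // Read timeout in milliseconds; 60000 matches the timeout in the log above.
        env.put("com.sun.jndi.ldap.read.timeout", "60000");
        return env;
    }

    public static void main(String[] args) throws NamingException {
        String url = args.length > 0 ? args[0] : "ldap://ldap.example.com:389";
        Hashtable<String, String> env = buildEnv(url);
        System.out.println("pooling enabled: " + env.get("com.sun.jndi.ldap.connect.pool"));

        // Only talk to a real server when a URL is supplied on the command line.
        if (args.length > 0) {
            DirContext ctx = new InitialDirContext(env); // anonymous bind
            try {
                // Read the root DSE (empty name) as a simple connectivity check.
                System.out.println("root DSE attributes: " + ctx.getAttributes("").size());
            } finally {
                ctx.close();
            }
        }
    }
}
```

Run with no arguments it only prints the environment; pass an LDAP URL to bind anonymously and read the root DSE. The trade-off is that, without pooling, each new context opens a fresh TCP connection: a little extra latency per lookup, but never a connection the load balancer has quietly dropped.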
For now, the problem _seems_ to have gone away... 🙏 <i class="fab
fa-jenkins"></i>