AD Plugin Join Replication Issues

I was getting random AD plugin connection issues after joining to Active Directory. dsconfigad showed no errors, but sometimes I would not get a connection and I would have to rejoin. The problem turned out to be related to replication.

The AD plugin initially has no knowledge of which AD site and domain controllers are considered local to your subnet, so it discovers any domain controllers and contacts one to lookup the site information. During this process, and in general, the AD Plugin keeps an LDAP connection open to the domain controller. The AD plugin likes to reuse these LDAP connections, presumably for performance reasons. When it is time to actually add the computer to the domain, the AD Plugin reuses this existing connection. The problem is that this domain controller is not necessarily one within your AD site.

At this point, if the Mac is restarted or DirectoryService is killed, any new connections will be made to a DC in the subnet’s AD site, but if your computer was added to a non-local DC, the local DCs may have no knowledge of your computer because the computer account has not yet replicated to them.

This problem can appear to be quite random because sometimes you’ll get lucky and get a local DC for the join, or you might catch the replication at the right time. You might also see bad password errors in the DirectoryService debug logs. I have filed a bug report on this, and I don’t have a good workaround for now other than — don’t reboot or restart DirectoryService after a join. Of course if you know your replication schedules, you could just wait until you are sure replication is completed.

This same issue can present itself with unjoins and rejoins.

You can see what domain controllers you are connecting to during the join using the following shell command assuming your are joining using dsconfigad:

while [ 1 ]; do if netstat -a | grep ldap| grep ESTAB; then ps auxww | grep dsconfigad | grep -v grep; date;fi; done

If you have joined, unjoined, and rejoined and think you may be seeing replication issues, compare the whenCreated attribute of the computer account on different domain controllers using ldapsearch.

ldapsearch -LLL -v -W -x -h -D -b "OU=Computers,DC=subdomain,DC=forest,DC=com" CN=machine-join-name | grep whenCreated:

If an older out of sync computer account exists, its whenCreated date will be different from the domain controller the computer was just added to until the last join has replicated to all the servers.