Active Directory Domain Services — Troubleshooting & Operations Field Guide
Purpose
This document summarizes practical Active Directory Domain Services troubleshooting concepts and command-line workflows for Microsoft hybrid and on-premises identity environments.
The focus is operational: how to reason through AD DS failures logically instead of checking random components in isolation.
Scope
This is a learning and portfolio-support document for AD DS troubleshooting reasoning.
It uses placeholder names and general operational patterns only. It does not include customer environment data, production domain names, credential identifiers, private incident evidence or privileged operational records.
Troubleshooting principle
Investigate most AD DS issues from the lowest dependency layer upward, because each layer relies on the ones before it:
1. DNS resolution
2. Secure channel
3. Replication
4. FSMO role availability
5. Group Policy processing
6. Event logs and exact failure evidence
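DNS should usually be checked first because domain-joined clients and domain controllers rely on DNS SRV records to locate domain controllers and the LDAP, Kerberos and Global Catalog services they host. If name resolution is broken, failures in the higher layers are usually symptoms rather than causes.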
Core diagnostic flow
nslookup domain.local
nslookup -type=SRV _ldap._tcp.domain.local
dcdiag /test:dns /v
nltest /sc_query:domain.local
repadmin /showrepl
repadmin /replsummary
netdom query fsmo
gpresult /r
eventvwr.msc
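The same dependency order can be enforced in a script so that a failure in a lower layer stops the run before higher layers are tested. The following is a minimal sketch, not a finished tool: `Invoke-OrderedCheck` is a hypothetical helper name, and the sample checks assume a domain-joined Windows host with the DnsClient and ActiveDirectory modules and the placeholder names `domain.local` and `DC1`.

```powershell
# Run named checks in order and stop at the first one that fails.
# Each check is a hashtable: Name (string) and Test (script block returning a truthy/falsy value).
function Invoke-OrderedCheck {
    param(
        [Parameter(Mandatory)]
        [hashtable[]]$Check
    )
    foreach ($c in $Check) {
        $passed = $false
        try {
            # A thrown error or empty output both count as a failed check.
            $passed = [bool](& $c.Test)
        } catch {
            $passed = $false
        }
        [pscustomobject]@{ Name = $c.Name; Passed = $passed }
        if (-not $passed) { break }   # do not test layers that depend on a broken one
    }
}

# Example checks (assumed names; nothing runs until Invoke-OrderedCheck is called).
$checks = @(
    @{ Name = 'DNS SRV lookup'; Test = { Resolve-DnsName -Name '_ldap._tcp.domain.local' -Type SRV -ErrorAction Stop } }
    @{ Name = 'Secure channel'; Test = { Test-ComputerSecureChannel } }
    @{ Name = 'Replication';    Test = { -not (Get-ADReplicationFailure -Target 'DC1') } }
)
# Invoke-OrderedCheck -Check $checks
```

Stopping at the first failure keeps the output focused on the lowest broken layer, which is usually where the root cause sits.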
FSMO roles
FSMO roles are special single-master roles used for operations that cannot safely occur as normal multi-master updates.
Forest-wide roles
- Schema Master
- Domain Naming Master
Domain-wide roles
- PDC Emulator
- RID Master
- Infrastructure Master
Operational notes
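The PDC Emulator is usually the most operationally visible FSMO role. It receives priority replication of password changes, processes account lockouts, sits at the top of the domain's time synchronization hierarchy and is the default domain controller that the Group Policy management tools connect to for edits.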
netdom query fsmo
Get-ADDomain | Select-Object PDCEmulator, RIDMaster, InfrastructureMaster
Get-ADForest | Select-Object SchemaMaster, DomainNamingMaster
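To see at a glance how the five roles are distributed, the output of the two cmdlets above can be grouped by role holder. This is an illustrative sketch: `Get-FsmoPlacement` is a hypothetical helper, and it only assumes input objects that expose the same property names as `Get-ADDomain` and `Get-ADForest`.

```powershell
# Group the five FSMO roles by the domain controller that holds them.
function Get-FsmoPlacement {
    param(
        [Parameter(Mandatory)] $Domain,   # object shaped like Get-ADDomain output
        [Parameter(Mandatory)] $Forest    # object shaped like Get-ADForest output
    )
    $roles = [ordered]@{
        SchemaMaster         = $Forest.SchemaMaster
        DomainNamingMaster   = $Forest.DomainNamingMaster
        PDCEmulator          = $Domain.PDCEmulator
        RIDMaster            = $Domain.RIDMaster
        InfrastructureMaster = $Domain.InfrastructureMaster
    }
    $roles.GetEnumerator() |
        Group-Object -Property Value |
        ForEach-Object {
            [pscustomobject]@{
                Holder = $_.Name
                Roles  = @($_.Group | ForEach-Object { $_.Key })
            }
        }
}

# Usage on a machine with the ActiveDirectory module:
# Get-FsmoPlacement -Domain (Get-ADDomain) -Forest (Get-ADForest)
```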
AD replication
AD DS replication is multi-master for most directory data: any writable domain controller can accept a change, which then replicates to its partners. The commands below summarize replication health, show per-partner status, trigger a synchronization and display the inbound replication queue.
repadmin /replsummary
repadmin /showrepl
repadmin /syncall /AeD
repadmin /queue
Common replication failure causes
1. DNS resolution failure between domain controllers
2. Network connectivity or blocked RPC / LDAP ports
3. Kerberos or secure channel failure
4. Excessive time drift (Kerberos tolerates 5 minutes of clock skew by default)
5. Tombstone lifetime (commonly 180 days) exceeded after a domain controller has been offline too long
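The last cause in the list can be spotted early by comparing each replication link's last success time against a warning threshold and against the tombstone lifetime. The sketch below is one possible way to do that. `Get-StaleReplicationLink` is a hypothetical helper, the 24-hour warning threshold is an arbitrary assumption, and 180 days is only a common default that should be confirmed for the forest in question. Input objects are assumed to expose the `Server`, `Partner`, `LastReplicationSuccess` and `ConsecutiveReplicationFailures` properties returned by `Get-ADReplicationPartnerMetadata`.

```powershell
# Report replication links whose last success is older than a warning threshold,
# and flag those that have also passed the tombstone lifetime.
function Get-StaleReplicationLink {
    param(
        [Parameter(Mandatory)] [object[]]$Metadata,
        [int]$WarnAfterHours = 24,          # assumed alert threshold, not an AD default
        [int]$TombstoneLifetimeDays = 180,  # common default; verify the real value
        [datetime]$Now = (Get-Date)
    )
    foreach ($link in $Metadata) {
        if ($null -eq $link.LastReplicationSuccess) {
            $age = [timespan]::MaxValue     # never replicated successfully
        } else {
            $age = $Now - $link.LastReplicationSuccess
        }
        if ($age.TotalHours -ge $WarnAfterHours) {
            [pscustomobject]@{
                Server                = $link.Server
                Partner               = $link.Partner
                HoursSinceSuccess     = [math]::Round($age.TotalHours, 1)
                ConsecutiveFailures   = $link.ConsecutiveReplicationFailures
                PastTombstoneLifetime = ($age.TotalDays -gt $TombstoneLifetimeDays)
            }
        }
    }
}

# Usage with the ActiveDirectory module (placeholder domain name):
# Get-StaleReplicationLink -Metadata (Get-ADReplicationPartnerMetadata -Target 'domain.local' -Scope Domain)
```

A link flagged as past the tombstone lifetime should not simply be forced to replicate again, because that risks reintroducing lingering objects.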
Secure channel
A secure channel is the trusted machine-account relationship between a domain-joined computer and the domain.
Common symptom:
The trust relationship between this workstation and the primary domain failed.
Useful checks:
nltest /sc_query:domain.local
nltest /sc_reset:domain.local
Test-ComputerSecureChannel -Verbose
Test-ComputerSecureChannel -Repair
Typical cause: the machine was restored from an old snapshot or backup, so its locally stored machine-account password no longer matches the password held in Active Directory.
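The test-then-repair sequence above can be wrapped so that a repair is attempted only when the channel is actually broken. This is a sketch under stated assumptions: `Repair-SecureChannelIfBroken` is a hypothetical helper, and it assumes a domain-joined machine running a session where `Test-ComputerSecureChannel` is available (the cmdlet ships with Windows PowerShell 5.1).

```powershell
# Test the secure channel and attempt a repair only if the test fails.
# Returns 'Healthy', 'Repaired' or 'RepairFailed'.
function Repair-SecureChannelIfBroken {
    param(
        [pscredential]$Credential   # optional account allowed to reset the machine password
    )
    if (Test-ComputerSecureChannel) {
        return 'Healthy'
    }
    $repairArgs = @{ Repair = $true }
    if ($Credential) { $repairArgs.Credential = $Credential }
    if (Test-ComputerSecureChannel @repairArgs) { 'Repaired' } else { 'RepairFailed' }
}

# Usage from an elevated session on the affected machine:
# Repair-SecureChannelIfBroken -Credential (Get-Credential)
```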
Domain controller health
Domain controller health should be checked with both high-level and targeted diagnostics.
dcdiag /v
dcdiag /test:dns /v
dcdiag /s:DC1 /v
Important areas:
- Connectivity
- Replications
- NetLogons
- Advertising
- KnowsOfRoleHolders
- Services
- DNS registration
DNS SRV records are critical for domain controller discovery:
nslookup -type=SRV _ldap._tcp.domain.local
nslookup -type=SRV _kerberos._tcp.domain.local
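The two lookups above cover only part of what the DC locator uses. The sketch below generates a fuller set of standard SRV names for a domain and checks each one. `Get-DcLocatorSrvName` and `Test-DcLocatorSrv` are hypothetical helper names, `domain.local` is the document's placeholder, and the forest root is assumed to equal the domain unless specified. `Test-DcLocatorSrv` assumes a Windows host where `Resolve-DnsName` is available.

```powershell
# Build the standard SRV record names that domain controllers register in DNS.
function Get-DcLocatorSrvName {
    param(
        [Parameter(Mandatory)] [string]$Domain,
        [string]$ForestRoot = $Domain   # assumption: single-domain forest unless specified
    )
    @(
        "_ldap._tcp.$Domain"
        "_ldap._tcp.dc._msdcs.$Domain"
        "_ldap._tcp.pdc._msdcs.$Domain"
        "_kerberos._tcp.$Domain"
        "_kpasswd._tcp.$Domain"
        "_gc._tcp.$ForestRoot"
    )
}

# Query each name and report whether any SRV record came back.
function Test-DcLocatorSrv {
    param([Parameter(Mandatory)] [string]$Domain)
    foreach ($name in Get-DcLocatorSrvName -Domain $Domain) {
        $records = @(Resolve-DnsName -Name $name -Type SRV -ErrorAction SilentlyContinue |
            Where-Object { $_.Type -eq 'SRV' })
        [pscustomobject]@{
            Name    = $name
            Found   = ($records.Count -gt 0)
            Targets = @($records | ForEach-Object { $_.NameTarget })
        }
    }
}

# Usage: Test-DcLocatorSrv -Domain 'domain.local'
```

A missing `_msdcs` record while the plain `_ldap._tcp` record resolves often points to a registration problem on a specific domain controller rather than a zone-wide outage.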
Group Policy troubleshooting
Group Policy troubleshooting should verify two things: that the GPO actually targets the intended user or computer, and that the client has processed it.
gpupdate /force
gpresult /r
gpresult /h report.html
Common causes of GPO issues:
- replication delay
- WMI filter excludes target
- security filtering excludes target
- GPO linked to wrong OU
- client has not refreshed policy
- Block Inheritance / Enforced behavior misunderstood
Group Policy processing order
1. Local policy
2. Site
3. Domain
4. OU
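Policies are applied in this order, and a setting applied later overwrites a conflicting setting applied earlier, so the GPO linked closest to the object normally wins. Enforced and Block Inheritance change that default: an Enforced GPO cannot be overridden by GPOs processed after it, and Block Inheritance stops non-enforced GPOs linked to parent containers from applying.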
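The precedence rule can be modeled in a few lines to reason about which GPO supplies a conflicting setting. This is a simplified sketch, not how the Group Policy engine is implemented: `Resolve-GpoSetting` is a hypothetical helper, the input objects (`Name`, `Level`, `Enforced`, `Value`) are an assumed shape, link order within a level is taken from the input order, and Block Inheritance is deliberately not modeled.

```powershell
# Decide which GPO wins a conflicting setting under Local > Site > Domain > OU processing.
# Without Enforced, the last GPO processed wins. With Enforced, the enforced GPO
# highest in the hierarchy (processed earliest) wins.
function Resolve-GpoSetting {
    param(
        [Parameter(Mandatory)] [object[]]$Gpo
    )
    # Arrange GPOs in processing order while keeping input order within each level.
    $ordered = @(foreach ($level in 'Local', 'Site', 'Domain', 'OU') {
        $Gpo | Where-Object { $_.Level -eq $level }
    })
    $enforced = @($ordered | Where-Object { $_.Enforced })
    if ($enforced.Count -gt 0) {
        return $enforced[0]
    }
    return $ordered[-1]
}

# Example with placeholder GPO names:
$sample = @(
    [pscustomobject]@{ Name = 'Workstations OU GPO'; Level = 'OU';     Enforced = $false; Value = 'ScreenLock=5min' }
    [pscustomobject]@{ Name = 'Domain Baseline';     Level = 'Domain'; Enforced = $false; Value = 'ScreenLock=15min' }
)
(Resolve-GpoSetting -Gpo $sample).Name
```

With both links non-enforced the OU-level GPO wins. Setting `Enforced` to `$true` on the domain-level GPO reverses the result.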
Fine-Grained Password Policy
Fine-Grained Password Policies are applied to users or global security groups, not directly to OUs.
Get-ADFineGrainedPasswordPolicy -Filter *
Get-ADUserResultantPasswordPolicy username
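Key rule:
Lower precedence number = higher priority. A Password Settings Object linked directly to a user takes priority over any PSO the user receives through group membership.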
If an OU-level targeting model is needed, use a group-based approach instead of trying to link a Password Settings Object directly to an OU.
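The selection logic behind the resultant policy can be sketched as follows. This is an illustrative model only: `Resolve-PasswordPolicy` is a hypothetical helper, the input shape (`Name`, `Precedence`, `LinkedTo`) is an assumption made for the example, and tie-breaking between PSOs with equal precedence is not modeled. On a real domain, `Get-ADUserResultantPasswordPolicy` remains the authoritative check.

```powershell
# Pick the effective password policy for one user:
# 1. PSOs linked directly to the user, lowest precedence number first
# 2. otherwise PSOs received through groups, lowest precedence number first
# 3. otherwise the domain's default password policy
function Resolve-PasswordPolicy {
    param(
        [object[]]$Pso = @(),   # each item: Name, Precedence, LinkedTo ('User' or 'Group')
        [string]$DefaultPolicyName = 'Default Domain Policy'
    )
    foreach ($scope in 'User', 'Group') {
        $winner = $Pso |
            Where-Object { $_.LinkedTo -eq $scope } |
            Sort-Object -Property Precedence |
            Select-Object -First 1
        if ($winner) { return $winner.Name }
    }
    return $DefaultPolicyName
}

# Example with placeholder PSO names:
$psos = @(
    [pscustomobject]@{ Name = 'PSO-Admins';   Precedence = 10; LinkedTo = 'Group' }
    [pscustomobject]@{ Name = 'PSO-Helpdesk'; Precedence = 20; LinkedTo = 'Group' }
)
Resolve-PasswordPolicy -Pso $psos
```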
FSMO seizure
FSMO seizure is a last-resort recovery operation when the original role holder will not return.
PowerShell approach (the -Force switch is what makes this a seizure rather than a transfer):
Move-ADDirectoryServerOperationMasterRole `
-Identity "DC2" `
-OperationMasterRole SchemaMaster,RIDMaster,PDCEmulator,InfrastructureMaster,DomainNamingMaster `
-Force
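Important warning: once roles have been seized, the previous role holder must not simply be brought back online. Keep it isolated from the network, clean up its metadata in the directory and rebuild it before reintroducing it as a domain controller. This applies most strictly to the Schema Master, Domain Naming Master and RID Master.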
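Because seizure is a last resort, it can help to make the transfer-versus-seize decision explicit before calling the cmdlet. The sketch below builds the parameter set and adds `-Force` only when the current holder has been confirmed unreachable. `Get-FsmoMoveParameter` is a hypothetical helper, `DC2` is the document's placeholder, and how reachability is determined is left to the operator.

```powershell
# Build parameters for Move-ADDirectoryServerOperationMasterRole.
# -Force (seizure) is added only when the current holder is confirmed unreachable.
function Get-FsmoMoveParameter {
    param(
        [Parameter(Mandatory)] [string]$TargetDc,
        [Parameter(Mandatory)] [string[]]$Role,
        [Parameter(Mandatory)] [bool]$CurrentHolderReachable
    )
    $params = @{
        Identity            = $TargetDc
        OperationMasterRole = $Role
    }
    if (-not $CurrentHolderReachable) {
        $params.Force = $true
    }
    $params
}

# Usage (dry run first with -WhatIf, then remove it to execute):
# $p = Get-FsmoMoveParameter -TargetDc 'DC2' -Role PDCEmulator, RIDMaster -CurrentHolderReachable $false
# Move-ADDirectoryServerOperationMasterRole @p -WhatIf
```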
Event logs
Important logs:
- Directory Service
- DNS Server
- System
- Security
- DFS Replication (or File Replication Service where SYSVOL still uses FRS)
Common event IDs to recognize:
| Event ID | Area | Meaning |
|---|---|---|
| 5719 | Netlogon | No domain controller could be reached to set up the secure channel |
| 1311 | Directory Service (KCC) | The KCC cannot build a complete replication topology |
| 13568 | NTFRS (File Replication Service) | Journal wrap error on SYSVOL replicated by FRS |
| 4768 / 4769 | Security / Kerberos | Kerberos TGT request (4768) and service ticket request (4769) |
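Several of these event IDs can be pulled from the command line instead of browsing Event Viewer. The following is a minimal sketch: `Get-AdEventFilter` and `Get-AdRecentEvent` are hypothetical helper names, the 24-hour window and 20-event cap are arbitrary assumptions, and the log names assume the events are read on a domain controller with sufficient rights for the Security log.

```powershell
# Build Get-WinEvent filter hashtables for a few of the event IDs in the table.
function Get-AdEventFilter {
    param(
        [datetime]$Since = (Get-Date).AddHours(-24)   # assumed look-back window
    )
    @(
        @{ LogName = 'System';            Id = 5719;       StartTime = $Since }
        @{ LogName = 'Directory Service'; Id = 1311;       StartTime = $Since }
        @{ LogName = 'Security';          Id = 4768, 4769; StartTime = $Since }
    )
}

# Run each filter and return a compact view of the matching events.
function Get-AdRecentEvent {
    param([int]$MaxEvents = 20)
    foreach ($filter in Get-AdEventFilter) {
        Get-WinEvent -FilterHashtable $filter -MaxEvents $MaxEvents -ErrorAction SilentlyContinue |
            Select-Object TimeCreated, Id, ProviderName, Message
    }
}

# Usage on a domain controller: Get-AdRecentEvent | Format-Table -Wrap
```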
Root cause priority
Common AD DS root causes in practical troubleshooting order:
1. DNS misconfiguration or missing SRV records
2. Replication failure caused by DNS or network issues
3. Broken secure channel
4. Incorrect Fine-Grained Password Policy targeting
5. Unavailable or misplaced FSMO role holder
6. Group Policy targeting, inheritance or refresh issue
Summary
Active Directory troubleshooting is dependency-driven. DNS, secure channel and replication must be validated before higher-level components such as Group Policy or password policy behavior can be trusted.