2015年9月3日星期四

Don't be afraid of DNS Scavenging. Just be patient.

DNS Scavenging is a great answer to a problem that has been nagging everyone since RFC 2136 came out way back in 1997.  Despite many clever methods of ensuring that clients and DHCP servers that perform dynamic updates clean up after themselves sometimes DNS can get messy.  Remember that old test server that you built two years ago that caught fire before it could be used?  Probably not.  DNS still remembers it though.  There are two big issues with DNS scavenging that seem to come up a lot:
"I'm hitting this 'scavenge now' button like a snare drum and nothing is happening.  Why?"
or
"I woke up this morning, my DNS zones are nearly empty and Active Directory is sitting in a corner rocking back and forth crying.  What happened?"
This post should help us figure out when the first issue will happen and completely avoid the second.  We'll go through how scavenging is setup then I'll give you my best practices. 

Scavenging setup

Scavenging will help you clean up old unused records in DNS.  Since "clean up" really means "delete stuff" a good understanding of what you are doing and a healthy respect for "delete stuff" will keep you out of the hot grease.  Because deletion is involved there are quite a few safety valves built into scavenging that take a long time to pop.  When enabling scavenging patience is required.  It will work just fine, but not today!
Note: For purposes of this discussion we are going to concentrate on the most common Windows DNS scenario: Windows Server 2003 DNS servers hosting AD integrated zones.
Scavenging is set in three places on a Windows Server:
  1. On the individual resource record to be scavenged.
  2. On a zone to be scavenged.
  3. At one or more servers performing scavenging.
It must be set in all three places or nothing happens.

Scavenging settings on a Resource Record

To see the scavenging setting on a record hit View | Advanced in the DNS MMC then bring up properties on a record. 
image
Scavenging gets set on a resource record in one of three ways.  The first is by someone coming in here, checking the "Delete this record when it becomes stale" checkbox and hitting apply.  When you hit apply the time of day will be rounded down to the nearest hour and applied as the timestamp on the record.  Static records have a timestamp of 0 indicating do not scavenge. 
The second is when a record gets created by a client machine registering using dynamic DNS.  Windows clients will attempt to dynamically update DNS every 24 hours.  All DDNS records get set to scavenge.  When a record is first created by a client that has no existing record it is considered an "Update" and the timestamp is set.  If the client has an existing host record and changes the IP of the host record this is also considered an "Update" and the timestamp is set.  If the client has an existing host record with the same IP address then this is considered a "Refresh" and the timestamp may or may not get changed depending on zone settings.  More on this later.
The third way to set scavenging on records is by using DNScmd.exe with the /ageallrecords switch.  Let's pause here for a few moments to consider a few important words: All, Records, Delete, Stuff.  If you actually run this command against a zone it will truly set scavenging and a timestamp on all records in the zone including static records that you never want to be scavenged.  Because of the time it takes scavenging to do it's thing people find this command and get tempted to give it a try.  Do not.  It will delete stuff.  Have patience instead.
Once a timestamp is set on a record it will replicate around to all servers that host the zone.  There is one caveat to this.  If scavenging is not enabled on the zone that hosts the record then it will never scavenge so the timestamp is essentially irrelevant.  The timestamp may get updated on the server where the client dynamically registers but it will not replicate around to the other servers in the zone.

Scavenging Settings at the Zone

Before a server will even look at a record to see if it will be scavenged the zone must have scavenging enabled.  To access the scavenging settings for a zone right click the zone, select properties then on the general tab hit the "Aging" button.  This screen is universal for the zone.  If you view it on any DNS server where this zone is replicated it will be the same.
image
When you first set scavenging on a zone the timestamp seen at the bottom (reload zone if you don't see it) will be set to the current time of day rounded down to the nearest hour plus the Refresh interval.  This also gets reset any time the zone is loaded or any time dynamic updates get enabled on the zone. 
The "zone can be scavenged after" timestamp is the first of your safety valves.  It gives clients time to get their record timestamp updated before the big axe swings.  Since new record timestamps are not replicated while zone scavenging is disabled this also gives replication time to get things in order.
Refresh and No-Refresh intervals
The next safety valves are the Refresh and No-refresh intervals.  Both of these must elapse before a record can be deleted.
The No-refresh interval is a period of time during which a resource record cannot be refreshed.  Recall from earlier that a refresh is a dynamic update where we are not changing the host/IP of a resource record, just touching the timestamp.  If a client changes the IP of a host record this is considered an "update" and is exempt from the No-refresh interval.  The purpose of a No-refresh interval is simply to reduce replication traffic.  A change to a record means a change that must be replicated.
After the (Record Timestamp) + (No-refresh interval) elapses we enter the Refresh interval.  The refresh interval is the time when refreshes to the timestamp are allowed.  This is the time when good things must happen.  The client is allowed to come in and update it's timestamp.  This timestamp will be replicated around and the No-refresh interval begins again.  If for some reason the client fails to update it's record during the refresh interval it becomes eligible to be scavenged.  Will it disappear immediately?  Probably not but it is certainly possible.
Note: When setting Refresh and No-Refresh intervals be sure to allow enough time for clients to get several registration attempts during a Refresh interval.  Failure to do so could allow a record to become eligible for scavenging simply from a failed refresh attempt.
One last thing before we leave the zone setting behind.  If you right click on your server you will see the option to "Set Aging/Scavenging for All Zones...".  Selecting this will take you to a screen similar to the one above.  What does this do?  This sets the default settings that will be used if a new zone is created by this server.  Unless you check the subsequent box "Apply these settings to the existing Active Directory-integrated zones" it will not touch existing zones.

Scavenging settings on the Server

So you now have a resource resource record set to scavenge and a zone set to scavenge.  All that is left is for somebody to come along, check all the timestamps and delete some stuff.  This is done by any server that hosts the AD integrated zone. 
Setting scavenging on the server is done by right clicking the server in the MMC, selecting properties, going to the advanced tab and checking the "Enable automatic scavenging of stale records" checkbox.
image
The Scavenging Period is how often this particular server will attempt to scavenge.  When a server scavenges it will log a DNS event 2501 to indicate how many records were scavenged.  An event 2502 will be logged if no records were scavenged.  Only one server is required to scavenge since the zone data is replicated to all servers hosting the zone.
Tip: You can tell exactly when a server will attempt to scavenge by taking the timestamp on the most recent 2501/2502 event and adding the Scavenging period to it.
Although you can set every server hosting the zone to scavenge I recommend just having one.  The logic for this is simple: If the one server fails to scavenge the world won't end.  You'll have one place to look for the culprit and one set of logs to check.  If on the other hand you have many servers set to scavenge you have many logs to check if scavenging fails.  Worse yet, if things start disappearing unexpectedly you don't want to go hopping from server to server looking for 2501 events.
To facilitate strict control over which server is scavenging for a zone you can use DNSCmd.exe to specify exactly which servers may scavenge.  For example the following command will make it so that only 192.168.1.1 and 192.168.1.2 DNS servers are allowed to scavenge on the contoso.com zone:
DNSCmd . /ZoneResetScavengeServers contoso.com 192.168.1.1 192.168.1.2
With the server now scavenging, zones enabled for scavenging, and resources records set what actually happens when the server does it's thing?
The scavenging process and final safety valves
When the last 2501/2052 event + the server scavenging period comes around the server is going to make a scavenging attempt.  You can also manually initiate an attempt by right clicking the server and selecting "Scavenge Stale Resource Records".  Note that manually making an attempt in no way bypasses the safety valves.  These are the final safety valves before we "delete stuff":
  • Is scavenging enabled on the zone?  Pretty self explanatory.
  • Is dynamic update enabled on the zone?  If it's not there is a good chance timestamps will be old enough that mass deletions can occur.
  • Is the scavenging server listed as one of the "Scavenge Servers" for the zone?
  • Are we past the "zone can be scavenged after" timestamp on the zone?  This gives the clients and AD replication to get things squared away before we start. 
  • Has it been longer than a refresh interval since this zone was last replicated in Active Directory?  If scavenging gets enabled on a server that has replication issues this will prevent it from tombstoning a bunch of records that may be perfectly fine on other servers.
If all of the above checks are good then the zone is ready to be scavenged.  At this point the scavenging server checks the timestamp on each individual resource record.  If the current date/time is greater than the timestamp + No-refresh + Refresh then the record is deleted.

My best practices

Here is how I set scavenging up on a preexisting zone.  This procedure is designed for maximum safety.  Using default settings this process can take as long as 4-5 weeks (2 weeks Sanity phase, 2-3 weeks for Enable phase)
Setup phase
  1. Turn off scavenging on all servers.  To confirm scavenging won't inadvertently run use the DNSCmd /ZoneResetScavengeServers to confine scavenging to a single server then ensure this server has scavenging disabled.
  2. Turn on scavenging on the zones you wish to scavenge.  Set the refresh and No-refresh intervals as desired.  If you want things to scavenge more aggressively I would recommend lowering the No-refresh interval at the cost of some replication traffic.  Leave the refresh at the default.
  3. Add today's date plus the Refresh and No-Refresh intervals.  Come back in a few weeks when this time has elapsed.  Seriously you can't rush this.
Sanity check phase
Sift through your DNS records looking for any records older than the Refresh + No-Refresh interval.  If you see any then something has gone wrong with the dynamic registration process and it must be corrected before proceeding.  A thorough check at this point is the most important step in setup
Things to check if you find old records:
  • Does an IPConfig /registerdns work?
  • Who is the owner of the record (see security tab in the record properties)?
  • Was the record statically created by an admin then later enabled for scavenging?  If so you may need to delete the record to clear ownership and run an IPConfig /registerdns to get it updated.
  • Is the server replicating OK with AD?
Do not proceed unless you can explain any outdated records.  In the next phase they will be deleted.
Enable phase
The final step is to actually enable scavenging.  Enable scavenging on the single server you used the /ZoneResetScavengServers command on.
Once enabled create a new test record and enable it for scavenging.  Then map out the point in time when this record will disappear.  Here is how:
  1. Start with the timestamp on the record
  2. Add the refresh interval
  3. Add the no refresh interval
  4. The result will be your "eligible to scavenge" time.  The record will not disappear at this time though.  It's just eligible.
  5. Check your DNS event logs for 2501 and 2502 events to find what hour the DNS server is doing a scavenging run.
  6. Take your "eligible to scavenge" time, find the most recent 2501/2502 event and add the server's Scavenging Period (from server properties | advanced tab) to it.  This is the point in time when the test record you just created will disappear.
Lets look at an example with the following assumptions:
  • Zone is set to a 3 day Refresh and a 3 day No-Refresh interval
  • Server Scavenging period is set to 3 days
  • Last DNS Event id 2501 or 2502 occurred at 6am on 1/1/2008
  • We have a record with a timestamp of 1/1/2008 at 12:00 noon
Given these assumptions you can rub your temples for a bit and predict that the record will be deleted at approximately 6am on 1/10/2008.
image
Once scavenging is enabled you can check back periodically to look for the 2501 and 2502 events to see how things are going.  You can also come back at the predicted date and time and see if your test record disappeared.
That's it!

Understanding aging and scavenging

Applies To: Windows Server 2008, Windows Server 2008 R2
The DNS Server service supports aging and scavenging features. These features are provided as a mechanism for performing cleanup and removal of stale resource records, which can accumulate in zone data over time.
With dynamic update, resource records are automatically added to zones when computers start on the network. However, in some cases, they are not automatically removed when computers leave the network. For example, if a computer registers its own host (A) resource record at startup and is later improperly disconnected from the network, its host (A) resource record might not be deleted. If your network has mobile users and computers, this situation can occur frequently.
If left unmanaged, the presence of stale resource records in zone data may cause some problems:
  • If a large number of stale resource records remain in zones, they can eventually take up server disk space and cause unnecessarily long zone transfers.
  • Domain Name System (DNS) servers that load zones that contain stale resource records might use outdated information to answer client queries, potentially causing the clients to experience name resolution problems on the network.
  • The accumulation of stale resource records at the DNS server can degrade its performance and responsiveness.
  • In some cases, the presence of a stale resource record in a zone can prevent a DNS domain name from being used by another computer or host device.
To solve these problems, the DNS Server service has the following features:
  • Time stamping, based on the current date and time that is set at the server computer, for any resource records that are added dynamically to primary-type zones. In addition, time stamps are recorded in standard primary zones where aging and scavenging is enabled.

    For resource records that you add manually, a time-stamp value of zero is used, indicating that these records are not affected by the aging process and that they can remain without limitation in zone data unless you otherwise change their time stamp or delete them.
  • Aging of resource records in local data, based on a specified refresh time period, for any eligible zones.

    Only primary-type zones that are loaded by the DNS Server service are eligible to participate in this process.
  • Scavenging for any resource records that persist beyond the specified refresh period.

    When a DNS server performs a scavenging operation, it can determine that resource records have aged to the point of becoming stale and remove them from zone data. You can configure servers to perform recurring scavenging operations automatically, or you can initiate an immediate scavenging operation at the server.

    For more information, see either Enable Automatic Scavenging of Stale Resource Records or Start Immediate Scavenging of Stale Resource Records.
CautionCaution
By default, the aging and scavenging mechanism for the DNS Server service is disabled. It should be enabled only when all parameters are fully understood. Otherwise, the server can be accidentally configured to delete records that should not be deleted. If a record is accidentally deleted, not only will users fail to resolve queries for that record, but any user can create a record and take ownership of it, even on zones that are configured for secure dynamic update.
A server uses the contents of each resource-record-specific time stamp, along with other aging and scavenging properties that you can adjust or configure, to determine when it scavenges records.

Prerequisites for aging and scavenging

Before you can use the aging and scavenging features of DNS, several conditions must be met:
  1. Scavenging and aging must be enabled, both at the DNS server and on the zone.

    By default, aging and scavenging of resource records is disabled.
  2. Resource records must either be dynamically added to zones or manually modified to be used in aging and scavenging operations.

    Typically, only those resource records that are added dynamically using the DNS dynamic update protocol are subject to aging and scavenging.

    You can, however, enable scavenging for other resource records that are added through nondynamic means. For records that are added to zones in this way, either by loading a text-based zone file from another DNS server or by manually adding them to a zone, a time stamp of zero is set. This makes these records ineligible for use in aging and scavenging operations.

    To change this default, you can administer these records individually, to reset and permit them to use a current (nonzero) time-stamp value. This makes it possible for these records to become aged and scavenged.

    For more information, see Reset Aging and Scavenging Properties for a Specified Resource Record.
noteNote
In the case of changing a zone from standard primary to Active Directory-integrated, you may want to enable scavenging of all existing resource records in the zone. To enable aging for all existing resource records in a zone, you can use the AgeAllRecords command, which is available through the dnscmd command-line tool.

Aging and scavenging terminology

The following table indicates new or revised terms that have been introduced to help specifically when discussing aging and scavenging.

 

TermDescription
Resource record time stamp
A date and time value that is used by the DNS server to determine removal of the resource record when it performs aging and scavenging operations.
Current server time
The current date and time on the DNS server. This number can be expressed as an exact numeric value at any point in time.
No-refresh interval
An interval of time, determined for each zone, as bounded by the following two events:
  1. The date and time when the record was last refreshed and its time stamp was set.
  2. The date and time when the record next becomes eligible to be refreshed and have its time stamp reset.
This value is needed to decrease the number of write operations to the Active Directory database. By default, this interval is set to seven days. It should not be increased to an unreasonably high level, because the benefits of the aging and scavenging feature might either be lost or diminished.
Refresh interval
An interval of time, determined for each zone, as bounded by the following two distinct events:
  1. The earliest date and time when the record becomes eligible to be refreshed and have its time stamp reset.
  2. The earliest date and time when the record becomes eligible to be scavenged and removed from the zone database.
This value should be large enough to allow all clients to refresh their records. By default, this interval is set to seven days. It should not be increased to an unreasonably high level, because the benefits of the aging and scavenging feature might either be lost or diminished.
Start scavenging time
A specific time, expressed as a number. This time is used by the server to determine when a zone becomes available for scavenging.
Scavenging period
When automatic scavenging is enabled at the server, this period represents the time between repetitions of the automated scavenging process. The default value for this is seven days. To prevent deterioration of DNS server performance, the minimum allowed value for this is one hour.
Record refresh
When a DNS dynamic update is processed for a resource record when only the resource record time stamp, and no other characteristics of the record, are revised.
Refreshes generally occur for the following reasons:
  1. When a computer is restarted on the network and, if at startup, its name and IP address information are consistent with the same name and address information it used before being shut down, it sends a refresh to renew its associated resource records for this information.
  2. A periodic refresh is sent by the computer while it is running.

    The Windows DNS Client service renews DNS registration of client resource records every 24 hours. When this dynamic update occurs, if the dynamic update request does not cause modification to the DNS database, it is considered to be a refresh and not a resource record update.
  3. Other network services make refresh attempts, such as: DHCP servers, which renew client address leases; cluster servers, which register and update records for a cluster; and the Net Logon service, which can register and update resource records that are used by Active Directory domain controllers.
Record update
When a DNS dynamic update is processed for a resource record where other characteristics of the record in addition to its time stamp are revised.
Updates generally occur for the following reasons:
  1. When a new computer is added to the network and, at startup, it sends an update to register its resource records for the first time with its configured zone.
  2. When a computer with existing records in the zone has a change in IP address, causing updates to be sent for its revised name-to-address mappings in DNS zone data.
  3. When the Net Logon service registers a new Active Directory domain controller.
Scavenging servers
An optional advanced zone parameter that enables you to specify a restricted list of IP addresses for DNS servers that are enabled to perform scavenging of the zone.
By default, if this parameter is not specified, all DNS servers that load a directory-integrated zone (also enabled for scavenging) attempt to perform scavenging of the zone. In some cases, this parameter can be useful if it is preferable that scavenging only be performed at some servers loading the directory-integrated zone.
To set this parameter, you must specify the list of IP addresses for the servers that are enabled to scavenge the zone in the ZoneResetScavengeServers parameter for the zone. This can be done using the dnscmd command, a command-line based tool for administering Windows DNS servers.

When scavenging can start

After all prerequisites for enabling the use of scavenging are met, it can start for a server zone when the current server time is greater than the value of the start scavenging time for the zone.
The server sets the time value to start scavenging on a per-zone basis whenever one of the following events occurs:
  • Dynamic updates are enabled for the zone.
  • A change in the state of the Scavenge stale resource records check box is applied. You can use DNS Manager to modify this setting at either an applicable DNS server or one of its primary zones.
  • The DNS server loads a primary zone that is enabled to use scavenging.

    This can occur when the server computer is started or when the DNS Server service is started.
  • When a zone resumes service after having been paused.
  • If the zone is AD DS-integrated, replication for the zone must have taken place at least once since the DNS service was restarted or the domain controller was rebooted. When the previous events occur, the DNS server sets the value of start scavenging time by calculating the following sum:

    Current server time + Refresh interval = Start scavenging time

    This value is used as a basis of comparison during scavenging operations.

Example: the aging and scavenging process for a sample record

To understand the process of aging and scavenging at the server, consider the life span and successive stages of a single resource record, as it is added to a server and zone where this process is in effect and then aged and removed from the database.
  1. A sample DNS host, "host-a.example.microsoft.com", registers its host (A) resource record at the DNS server for a zone where aging and scavenging are enabled for use.
  2. When registering the record, the DNS server places a time stamp on this record based on current server time.

    After the record time stamp is written, the DNS server does not accept refreshes for this record for the duration of the zone no-refresh interval. It can, however, accept updates before that time. For example, if the IP address for "host-a.example.microsoft.com" changes, the DNS server can accept the update. In this case, the server also updates (resets) the record time stamp.
  3. Upon expiration of the no-refresh period, the server begins to accept attempts to refresh this record.

    When the initial no-refresh period ends, the refresh period immediately begins for the record. During this time, the server does not suppress attempts to refresh the record for its remaining life span.
  4. During and after the refresh period, if the server receives a refresh for the record, it processes it.

    This resets the time stamp for the record based on the method that is described in step 2.
  5. When subsequent scavenging is performed by the server for the "example.microsoft.com" zone, the record (and all other zone records) are examined by the server.

    Each record is compared to current server time on the basis of the following sum to determine whether the record should be removed:

    Record time stamp + No-refresh interval for zone + Refresh interval for zone

    • If the value of this sum is greater than current server time, no action is taken and the record continues to age in the zone.
    • If the value of this sum is less than current server time, the record is deleted both from any zone data currently loaded in server memory and also from the applicable DnsZone object store in Active Directory Domain Services (AD DS) for the directory-integrated "example.microsoft.com" zone.






https://technet.microsoft.com/en-us/library/cc771677.aspx