Friday, April 20, 2012

Resetting Lync CMS Replication

The Central Management Store (CMS) stores a copy of the entire Lync topology for your deployment.  Every server has a copy of the CMS, but there can be only one master.  Each Lync server downloads a copy of the CMS from this master at regular intervals.  By default, the first Lync server you deploy is designated the master CMS.  However, there may be cases where you have to move the master CMS to another server.  This can be done relatively easily, assuming you follow the documentation properly.

The CMS replication process uses a local file share to copy updates between servers.  The share is called \\servername\xds-replica.  Every server has this share, including the CMS master.  The shared folder is typically C:\RtcReplicaRoot\xds-replica, in the root of the C: drive.  If you installed Lync on another drive, the RtcReplicaRoot folder will be in the root of that drive.

Sometimes, you may find that CMS replication is not working on a specific server.  You can check the CMS replication status by running the command Get-CsManagementStoreReplicationStatus.  If all is well, every server's UpToDate status will be True.  If a server's status is False, try to force replication by running Invoke-CsManagementStoreReplication -ReplicaFqdn servername.  Wait a few minutes to see if its status changes.  If not, look in the event logs on both the failed replica and the CMS master for clues as to what is wrong.
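If several servers report False, the check-and-force routine described above can be scripted. This is a minimal sketch, assuming it runs in the Lync Server Management Shell. Get-StaleReplica is a hypothetical helper, not a Lync cmdlet; it relies on the UpToDate and ReplicaFqdn properties that Get-CsManagementStoreReplicationStatus reports for each replica.

```powershell
# Hypothetical helper (not part of Lync): emits the FQDN of every replica
# whose UpToDate flag is False, given the objects returned by
# Get-CsManagementStoreReplicationStatus.
function Get-StaleReplica {
    param(
        [Parameter(ValueFromPipeline = $true)]
        $Status
    )
    process {
        if (-not $Status.UpToDate) {
            $Status.ReplicaFqdn
        }
    }
}

# Usage, from the Lync Server Management Shell: force replication to every
# replica that is not up to date, then re-check the status after a few minutes.
# Get-CsManagementStoreReplicationStatus | Get-StaleReplica |
#     ForEach-Object { Invoke-CsManagementStoreReplication -ReplicaFqdn $_ }
```

Keeping the filter in a separate function means it can be checked against sample objects before it is pointed at a live deployment.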

If you can't find any reason for the failed replication, I've found that deleting the xds-replica folder on the failed replica and recreating it seems to reset things and solve the problem.  Unfortunately, even full Lync administrators do not have permissions to view the contents of the xds-replica folder (likely to prevent people like me from making a mess of things).

To reset the xds-replica folder to its installation defaults, follow these steps:
  1. Stop the following services used for CMS replication:
     • Lync Server File Transfer Agent
     • Lync Server Replica Replicator Agent (courtesy of the Department of Redundancy Department)
  2. Take ownership of the C:\RtcReplicaRoot\xds-replica folder, using the picture below as a guide.  Be warned that once you start this procedure, you're committed to following through.  Taking ownership of the folder wipes out the permissions Lync needs to replicate the CMS and removes the share.
  3. Once you have taken ownership, delete the entire xds-replica folder under C:\RtcReplicaRoot.
  4. Go to Control Panel > Programs and Features, select Microsoft Lync Server 2010, Core Components, and click Repair.  This creates a new xds-replica folder and share with the proper permissions.
  5. Go back to the Services snap-in and start the two services again.  The repair process may have set the Replicator service to Disabled; if so, set it to Automatic before starting it.
  6. Run Invoke-CsManagementStoreReplication -ReplicaFqdn servername.  After a few minutes, the server's UpToDate status in Get-CsManagementStoreReplicationStatus should change to True.
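For those who prefer the shell, stopping the services, taking ownership of and deleting the folder, and restarting replication could be scripted roughly as follows. The Core Components repair still has to be done by hand from Programs and Features. This is a minimal sketch under several assumptions: the function names are hypothetical, it assumes the default C:\RtcReplicaRoot\xds-replica path, an English-language OS (the Administrators group name and the "y" answer to takeown's prompt are locale-specific), Automatic as the normal startup type for both services, and an elevated Lync Server Management Shell on the failed replica.

```powershell
# Hypothetical helpers sketching the manual reset procedure. Loading this
# script only defines the functions; nothing runs until you call them.

function Remove-XdsReplicaFolder {
    param(
        # Assumes Lync was installed on C:; pass another path if it lives elsewhere.
        [string] $ReplicaPath = 'C:\RtcReplicaRoot\xds-replica',
        [string[]] $ServiceDisplayName = @('Lync Server File Transfer Agent',
                                           'Lync Server Replica Replicator Agent')
    )
    # Stop the two services used for CMS replication.
    Stop-Service -DisplayName $ServiceDisplayName

    # Take ownership recursively, answering "y" to takeown's confirmation
    # prompt (English OS assumed), then grant the local Administrators group
    # full control so the folder can be deleted. This destroys the
    # permissions Lync needs, so there is no going back after this point.
    takeown /f $ReplicaPath /r /d y | Out-Null
    icacls $ReplicaPath /grant Administrators:F /t | Out-Null

    # Delete the whole xds-replica folder.
    Remove-Item -LiteralPath $ReplicaPath -Recurse -Force
}

# Run this only after repairing Microsoft Lync Server 2010, Core Components,
# which recreates the xds-replica folder, share, and permissions.
function Restart-XdsReplication {
    param(
        [Parameter(Mandatory = $true)]
        [string] $ReplicaFqdn,
        [string[]] $ServiceDisplayName = @('Lync Server File Transfer Agent',
                                           'Lync Server Replica Replicator Agent')
    )
    # The repair may leave the Replicator service Disabled, so set both
    # services back to Automatic before starting them.
    Get-Service -DisplayName $ServiceDisplayName | Set-Service -StartupType Automatic
    Start-Service -DisplayName $ServiceDisplayName

    # Ask the CMS master to push a fresh copy to this replica.
    Invoke-CsManagementStoreReplication -ReplicaFqdn $ReplicaFqdn
}
```

Usage: run Remove-XdsReplicaFolder, repair Core Components, then run Restart-XdsReplication -ReplicaFqdn with the failed replica's FQDN and check Get-CsManagementStoreReplicationStatus a few minutes later. The procedure is split into two functions because the repair step in between is manual.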
This procedure worked like a charm for me on a few occasions.  Let me know if it doesn't work for you.

11 comments:

  1. Yes, it works well!!
    laurent Teruin
    Lteruin@hotmail.com

  2. Thanks Ken. This procedure works!

  3. Worked for me. Thanks, I owe you another beer.

  4. Ken,

    From this article it's not clear to me where things went wrong:
    did the replica directory permissions go wrong on the slave server or on the master server (CMS)?
    Do I need to reset the folder permissions on the slave, on the master, or on both?

    1. I've only tried this on a slave server, not the master.

      Ken

    2. Thanks Ken,

      Also, please remember that the XDS folder structure, though for a different purpose, is also generated on the Lync FileShare, and that FileShare can be on a different server from the CMS (for example, the CMS is on serverA while the Lync FileShare is hosted on serverB), so it further complicates the architecture :)

    3. In my case, I had both the Front End and Edge servers showing False. I followed this procedure on the master (Front End Server) and now both are showing True. Thanks, this worked like a charm.

  5. Actually, I am in the middle of moving the CMS from 2010 to 2013. After Move-CsManagementServer completed successfully, TechNet tells you to run the Deployment Wizard on both servers to add/remove Lync components. However, after doing that, neither the Lync Server File Transfer Agent nor the Lync Server Master Replicator Agent will start on the 2010 server, and the Lync 2013 server is not up to date :'(

    1. Try restarting the servers in each pool. When you move the CMS role, the Master Replicator Agent should be removed from the 2010 server. Restarting should clean that up.

  6. These instructions are still valid for Lync 2013, thanks!

    This solved issues I was having with one node of an Enterprise pool. This server hadn't been getting the topology updates and didn't know about the upgraded 2013 Edge server (it was still looking for the 2010 Edge at the same FQDN), and the error logs were sending me off on a wild goose chase. Users connected to this server were having strange problems, while the other half of the users, connected to the other node, were doing fine.

  7. Hey Ken, I am having this issue, but my setup is a little different than usual: I am seeing it on an HP Survivable Branch Appliance in a child domain. I have tried all the suggested solutions, including copying the folder from the Front End server using XCopy and your solution here. I am no longer getting any errors in Event Viewer on either the SBA or the FE server, but the status still comes up as False. I would greatly appreciate it if you could come up with a suggestion. Best regards, Hafsteinn Isaksen, Senior IT, DeepOcean.
