Showing posts with label edge. Show all posts
Showing posts with label edge. Show all posts

Friday, March 9, 2018

Edge Topology Replication Failures Caused by Mismatched Windows Updates

While getting a new Skype for Business edge server ready for production, I generally make sure all the latest Windows Updates are applied. I was doing exactly this for the company I work for (Nectar Services Corp).  Everything seemed to go just fine, but I noticed the edge server's replication status was showing as False when I ran Get-CsManagementStoreReplicationStatus.  The usual procedure of running Invoke-CsManagementStoreReplication -ReplicaFQDN servername did nothing. Even wiping out the C:\RtcReplicaRoot\xds-replica folder as described in this dusty old blog post didn't make a difference.

The Event Logs on the front-end server that was the master replication partner showed these fairly frequent error events:
Log Name:      Lync Server
Source:        LS File Transfer Agent Service
Date:          2/22/2018 10:01:29 AM
Event ID:      1046
Task Category: (1121)
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      FRONTENDSERVERNAME.contoso.com
Description:
Skype for Business Server 2015, File Transfer Agent cannot send replication data to Replica Replicator Agent on Edge
Edge machine: EDGESERVERNAME.contoso.com
Exception: System.ServiceModel.EndpointNotFoundException: There was no endpoint listening at https://edgeservername.contoso.com:4443/ReplicationWebService that could accept the message. This is often caused by an incorrect address or SOAP action. See InnerException, if present, for more details. ---> System.Net.WebException: Unable to connect to the remote server ---> System.Net.Sockets.SocketException: No connection could be made because the target machine actively refused it 192.168.1.2:4443
I could reach the edge server's replication web service URL via port 4443 as described in the event log, so there wasn't a firewall issue or an issue with the web service that I could determine.

The edge server was also throwing errors, saying that it hadn't heard from any of the replication servers in a while and its feelings were very hurt.
Log Name:      Lync Server
Source:        LS Replica Replicator Agent Service
Date:          2/22/2018 12:04:32 PM
Event ID:      3045
Task Category: (3003)
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      EDGESERVERNAME.contoso.com
Description:
The replication synthetic transaction has not been updated in a significant time period.
Time since the last update: 01.03:43:09
Cause: The Master Replicator Agent has not updated the replication transaction document in a significant time period.
Resolution:
If other replicas are experiencing similar issues, check the Master Replicator Agent and File Transfer Agent service health.  Verify access to the DFS files shares and replica file shares
The weird thing was that replication WAS working prior to me installing the final batch of Windows Updates. So, it seemed to make sense to uninstall each of the last Windows Update until things start working again. Unfortunately, after uninstalling those updates, along with a bunch of other ones in an increasingly desperate attempt to find the issue, the issue still persisted. Back to the drawing board.

I noticed that I was unable to copy binary files to the edge server via RDP copy/paste. Text files would copy fine, but any other filetype would cause the RDP session to crash with "Unexpected Server Error".  This seemed to be related to the replication issue, because this also worked fine prior to installing those updates. This is a good time to note that to access the edge server, I had to first RDP to a front-end server, then RDP to the edge from there, because the firewall was blocking all other access. This would prove to be relevant later.

I went as far as uninstalling and reinstalling Skype for Business along with all the supporting components, but it STILL didn't work.

Frustration level: STRATOSPHERIC

Finally, I nuked the entire edge server from orbit (It's the only way to be sure) and had the edge server rebuilt from scratch. Once again, replication worked.

End of story, right? Wrong. I had to figure out what the issue was, so I re-applied each of the original suspected updates until the issue re-appeared. And re-appear it did, along with the RDP copy/paste issue. The offensive update that caused the issue was 4072650, which is vaguely described as Hyper-V integration components update for Windows virtual machines. The description wasn't much help, but it certainly had an effect. Once again, removing that patch didn't make the issue disappear.

Finally, I checked to see if the front-end servers had this truly despicable update, AND THEY DIDN'T.  It was buried in Windows Update as an optional update, which weren't being automatically downloaded and installed. On a hunch, I installed the update (no restart required, thankfully), and VOILA, replication started working again, and I could copy/paste files between front-ends and edge via RDP.

So, this vague Windows Update was the source of all my issues. Sigh... Presumably, this would only show up in the following circumstances:

  • Using Hyper-V hosted virtual machines
  • Running Windows 2012 R2
  • Update 4072650 is installed on some VMs but not all

TL;DR version

If you're having replication errors on your edge server, and are getting LS File Transfer Agent Service error 1046 on your master replication server, then make sure that all front-end and edge servers have Update 4072650 if you see it applied on any one server.



Friday, January 6, 2012

Lync Edge Server Static Routes

If you're following the Technet articles on how to setup your edge server, you will eventually get to the point where you have to setup your NICs on your edge server.  According to the Set Up Network Interfaces for Edge Servers page:, you should set the default gateway on the external interface, but not the internal. The guide then helpfully tells you to....
Create persistent static routes on the internal interface to all internal networks where clients, Lync Server 2010, and Exchange Unified Messaging (UM) servers reside.
If you're not a Windows networking expert, this might stump you a bit.  Doing some searches might help, but here's a simple way to ensure that all internal networks are covered, even if you aren't aware of exactly which ones are in use.  This can easily happen if you're a consultant doing a Lync deployment for a large, multi-site company.

There are 3 well-known IP subnets that are reserved for internal use.  Any networking person can tell you what they are, and should be using these for their internal corporate network.  If not, then I would recommend running away.  The 3 well-known subnets are 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16.

I typically add static routes for all 3 subnets even if they aren't all in use.  This will future-proof your deployment  in case the company adds or changes their subnetting scheme.  To add these static routes on the internal interface of your edge server, do the following:
  1. Using the Network Connections interface, make sure your NICs have descriptive names that make sense (Ie. Internal and External)
  2. Open a command prompt in Administrative Mode on the edge server.
  3. Make sure you know what the internal default gateway should be.  In this example, we will use 192.168.100.1
  4. Type the following commands in the command window:
netsh interface ipv4 add route 10.0.0.0/8 "Internal" 192.168.100.1
netsh interface ipv4 add route 172.16.0.0/12 "Internal" 192.168.100.1
netsh interface ipv4 add route 192.168.0.0/16 "Internal" 192.168.100.1
When you do a netsh interface ipv4 show route, you should see the new routes show up at the bottom of the list.  If you make a mistake, you can delete a route by using the same command above, and replace add with delete.  Now,  your Lync edge server should be able to route to any internal address, both now and in the future.

UPDATE:  Apparently, I'm part Amish, and I was using ROUTE ADD instead of the updated netsh commands shown above.  Thanks to @twharrington on Twitter for pointing out the error to this ol' timer.  I'M DOIN' IT OLD SKOOL!