Thursday, January 16, 2014

High Processor Utilization on Lync 2013 Front-End Servers

UPDATE (2015-Oct-02): The problem has finally been fixed!  Took nearly 2 years, but here's a link to the KB article. The fix is in the September 2015 CU for Lync Server 2013.  The cause is "because the topology snapshot was recomputed multiple times by the Response Group Service."  Thanks to @sublimeashish for telling me. 

We have a customer who is about to migrate from Lync 2010 to Lync 2013.  They've got a few lightly loaded Lync 2013 Enterprise Edition pools with 3 servers each.  All are running Windows 2008 R2 Standard Edition on VMWare.  All patches are up-to-date.

For inexplicable reasons, some of the servers will suddenly see their processor utilization spike to near 100% for extended periods of time, when their typical utilization is less than 5%. A look at Task Manager shows two instances of the W3WP.exe service (IIS web service) that are consuming large amounts of processor resources.  There are no events in the Event Logs to indicate an issue.

Performing an IISReset on the affected node makes the processor go back to normal, but this is obviously not a real solution.  We opened a ticket with Microsoft PSS, and they confirmed there are others seeing the same thing.  It seems the source of the problem is the "garbage collection" process in the LyncIntFeature and LyncExtFeature application pools in IIS.  Recycling those pools makes processor utilization return to normal (for a while at least).

Microsoft is actively working to resolve the issue, and I will post a permanent solution for all to see as soon as one becomes available.

UPDATE:  Thanks to @dannydpa  on Twitter, it appears the trigger may be Lync topology publishing. I confirmed this by updating the topology and publishing it.  Less than 10 minutes later, all the servers processor utilization spiked.  Recycling the aforementioned apppools resolved the issue.

To help others with this issue, I've created a little Powershell script that will recycle the LyncIntFeature and LyncExtFeature app pools for all Lync servers.  For the script to work, you need to make sure that remote management is enabled on all Lync servers.  On Windows Server 2012, this is on by default, but in Windows 2008 R2, you need to log on locally and run: Enable-PSRemoting -Force before running the script.
$WebPools = (Get-CSService -WebServer).PoolFQDN

ForEach ($Pool in $WebPools)
  $PoolMembers = (Get-CSPool $Pool).Computers
  Foreach ($Computer in $PoolMembers)
    Write-Host "Resetting LyncExtFeature and LyncIntFeature app pools on $Computer"
    $Session = New-PSSession -ComputerName $Computer
    Invoke-Command -session $Session -ScriptBlock {Restart-WebAppPool LyncExtFeature}
    Invoke-Command -session $Session -ScriptBlock {Restart-WebAppPool LyncIntFeature}
    Remove-PSSession $Session