Thursday, January 16, 2014

High Processor Utilization on Lync 2013 Front-End Servers

We have a customer who is about to migrate from Lync 2010 to Lync 2013.  They've got a few lightly loaded Lync 2013 Enterprise Edition pools with three servers each, all running Windows Server 2008 R2 Standard Edition on VMware, with all patches up to date.

For no apparent reason, some of the servers will suddenly see their processor utilization spike to near 100% for extended periods, when their typical utilization is less than 5%. A look at Task Manager shows two instances of the IIS worker process (w3wp.exe) consuming large amounts of processor resources.  There are no events in the Event Logs to indicate an issue.

Running an IISReset on the affected node brings processor utilization back to normal, but this is obviously not a real solution.  We opened a ticket with Microsoft PSS, and they confirmed that others are seeing the same thing.  The source of the problem appears to be garbage collection in the LyncIntFeature and LyncExtFeature application pools in IIS.  Recycling those pools returns processor utilization to normal (for a while, at least).

Microsoft is actively working to resolve the issue, and I will post a permanent solution for all to see as soon as one becomes available.

UPDATE:  Thanks to @dannydpa on Twitter, it appears the trigger may be Lync topology publishing. I confirmed this by updating the topology and publishing it.  Less than 10 minutes later, processor utilization spiked on all the servers.  Recycling the aforementioned app pools resolved the issue.

To help others with this issue, I've created a little PowerShell script that recycles the LyncIntFeature and LyncExtFeature app pools on every server in the pool hosting the Central Management Store.

$CMPool = (Get-CsService -CentralManagement | Where-Object {$_.Active}).PoolFqdn
$CMMembers = (Get-CsPool $CMPool).Computers
ForEach ($Computer in $CMMembers)
{
    $Session = New-PSSession -ComputerName $Computer
    Invoke-Command -Session $Session -ScriptBlock {
        # WebAdministration is not auto-loaded under PowerShell 2.0
        # (the default on Windows Server 2008 R2), so import it explicitly
        Import-Module WebAdministration
        Restart-WebAppPool LyncExtFeature
        Restart-WebAppPool LyncIntFeature
    }
    Remove-PSSession $Session
}
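If you want to confirm that LyncIntFeature and LyncExtFeature really are the pools behind the busy w3wp.exe instances before recycling anything, something like this should work (a sketch, assuming IIS 7+ and that you run it locally on the affected front end):

```powershell
# Map each running w3wp.exe PID to its application pool name
# (appcmd.exe ships with IIS in System32\inetsrv)
& "$env:windir\System32\inetsrv\appcmd.exe" list wp

# Then see which w3wp.exe PIDs are actually burning CPU
Get-Process w3wp | Sort-Object CPU -Descending | Select-Object Id, CPU
```

Match the high-CPU PIDs from the second command against the pool names from the first; if they line up with the two Lync feature pools, the recycle script above should clear it.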

8 comments:

  1. The CPU spike only happens when adding or removing an object from the Topology. Changing a value does not cause this. It also appears to be somehow related to Response Groups.

    If these are VMs, disable NUMA support.

  2. We have NUMA spanning disabled and also made sure there is no CPU overcommit. We have a little over 100 RGS workflows in our pool. We created all of them via PowerShell on our new Lync 2013 pool; we never migrated them from Lync 2010 to Lync 2013. Microsoft is investigating new traces. Hopefully they find something.

    1. Seems as though MS is homing in on response groups, but we had just one for testing. Probably not the cause in our case.

      Ken

  3. From what I am seeing, it only affects the pool that hosts the CMS. We have no Response Groups, but it appears an addition to the topology triggered it.

    1. We saw this happen on the same pool in December, and it wasn't the CMS at the time. It is now, but I suspect the CMS isn't part of the issue.

  4. Had the same issue and opened a ticket with MS.
    They saw nothing suspicious, but CPU was around 60% at all times.
    I figured out the Call Park Service consumes up to 70% CPU at times, and the worst part is that it's not being used at all.
    Disabled Call Park in all policies: same issue. Restarted the service: same issue.
    It was only fixed after I manually removed it from "Programs and Features" and stopped the service.
    CPU is now around 10-15% most of the time.

  5. We have had this high CPU spike in the w3wp processes when publishing topology changes, and we had the issue even before we upgraded to 2013. Shortly after installing our first 2013 pool we had the issue again, but it seemed to impact other 2010 servers in the pool as well; it was not restricted to servers hosting the CMS. It does not seem to happen every time: we have made several topology additions without hitting the issue.
    I have just had the issue again after the topology was modified to remove a 2010 pool post-migration. In this instance all four cores on the CMS-hosting server were at 100%, with two of the eight or so w3wp.exe processes consuming the resources between them. One odd detail about this instance: I only noticed it because my remote PowerShell session timed out while connecting, and I only noticed that because I was attempting to access the RGS config page on a 2010 pool. I checked the CPU on the 2010 front end and it was quite low, but there was an RGS error in the Lync event log and an ASP.NET error in the Application event log.

    Mike Dickin
    Hempel

  6. Problem Solved:
    I had this problem on a two-node Enterprise Edition front-end pool. It started on one of the front ends and then spread to the second one. After a lot of investigation, the solution was to apply SQL Server 2012 Express SP1 to both the LYNCLOCAL and RTCLOCAL instances. The LYNCLOCAL instance would not install using the unattended install, but did install using the GUI.
