Ken's Unified Communications Blog: fix


Friday, October 21, 2011

Lync Loses Connection Every 8min 28sec

If you enjoy CSI-like detective stories (and by CSI, I mean cheesy quips as he puts on his sunglasses, NOT this) centered around esoteric computer issues involving Lync, then you'll love this post. For the other 6.99999 billion people in the world, move along....nothing to see here.

Late last week, my Lync client on my main work desktop started acting oddly. Every so often, it would spontaneously lose its connection, log out, and log back in, all within about 5-10 seconds. At first, I suspected a network problem, but a steady ping against the Lync edge server showed a constant connection. I checked the Lync servers for anything that might be affecting everybody, but all seemed well. It was late in the day on Friday, so I dealt with it like I deal with all my problems.....ignore it and hope it just goes away.

Monday morning - same thing. It was starting to get annoying. I'd be in the middle of an IM conversation, and BAM, logged out and back in again. Audio calls were unaffected, apart from the limited connectivity message popping up. Weirdly enough, when I logged on via my laptop, the issue never came up, which led me to believe it was a problem with my desktop computer. Even weirder, if I logged onto another Lync account on the same pool from my desktop, the problem never came up either.

The first thing I tried was to restart my computer. Didn't fix it. Then I exited Lync and deleted all my settings in my %userprofile%\AppData\Local\Microsoft\Communicator folder. No change in behaviour.

There were no relevant messages in the Event Log, even though Event Log logging was turned on. I decided to turn on detailed logging via Options - General - Turn on logging in Lync. You should only turn on this option when troubleshooting an issue, because it can take up a lot of disk space. Logs are stored in the %userprofile%\Tracing folder. The Communicator-uccapi-0.uccapilog is the log file to look at. This file will grow to about 50 MB before it rolls over into Communicator-uccapi-1.uccapilog and so on. It's Notepad-friendly.
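If you end up with several rolled-over logs, it helps to list them in order before you start digging. This is a minimal Python sketch. It assumes the Tracing folder sits directly under %userprofile% and that the files follow the Communicator-uccapi-N.uccapilog naming described above. The function names are hypothetical. It sorts by the numeric suffix only and makes no claim about which file holds the newest entries.

```python
import os
import re
from pathlib import Path

# Matches the rollover names described above, e.g. Communicator-uccapi-0.uccapilog
LOG_NAME = re.compile(r"^Communicator-uccapi-(\d+)\.uccapilog$", re.IGNORECASE)


def default_tracing_dir():
    """Assumption: the Tracing folder sits directly under %userprofile%."""
    profile = os.environ.get("USERPROFILE", str(Path.home()))
    return Path(profile) / "Tracing"


def find_uccapi_logs(tracing_dir=None):
    """Return the uccapilog files in tracing_dir, sorted by their numeric suffix."""
    folder = Path(tracing_dir) if tracing_dir is not None else default_tracing_dir()
    if not folder.is_dir():
        return []
    indexed = []
    for path in folder.iterdir():
        match = LOG_NAME.match(path.name)
        if match and path.is_file():
            indexed.append((int(match.group(1)), path))
    # Sort numerically so -10 lands after -9 rather than after -1.
    indexed.sort(key=lambda item: item[0])
    return [path for _, path in indexed]


if __name__ == "__main__":
    for log in find_uccapi_logs():
        size_mb = log.stat().st_size / (1024 * 1024)
        print(f"{log.name}: {size_mb:.1f} MB")
```

The numeric sort starts to matter after 10 or more rollovers, because a plain string sort would put -10 ahead of -2.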

Digging through the log, I focussed on the times when the issue arose and started to notice a pattern: my client would lose its connection EXACTLY every 8 minutes 28 seconds. I racked my brain to think of what would be happening every 8:28, but nothing came to mind. I pulled out all the log data surrounding each disconnect to see what they had in common. This is the sequence that repeated every 8:28:

10/18/2011|12:23:30.119 1674:2350 INFO :: UCCP:ClientAllowedAuthProts0x10004
10/18/2011|12:23:30.120 1674:2350 INFO :: SIP_REGISTER::RefreshRegistration force(1),refreshSA(1),refreshRoute(0),state(2)
10/18/2011|12:23:30.120 1674:2350 INFO :: SIP_REGISTER:State (2) => (4)
10/18/2011|12:23:30.120 1674:2350 INFO :: SA(b0b8c38) Dropped
10/18/2011|12:23:30.120 1674:2350 INFO :: Out trxn corr-id (0CE823A8)
10/18/2011|12:23:30.127 1674:2350 INFO :: SignMsg:NoSA for request(ce823a8)
10/18/2011|12:23:30.127 1674:2350 INFO :: SignMsg: send request without signature for trans(ce823a8)
10/18/2011|12:23:30.127 1674:2350 INFO :: Trxn corr-id (0CE823A8), SIP msg corr-id (7aa8ef7f)
10/18/2011|12:23:30.127 1674:2350 INFO :: Sending Packet - 209.xx.xxx.xxx:443 (From Local Address: 10.0.0.2:55072) 793 bytes:
10/18/2011|12:23:30.127 1674:2350 INFO :: REGISTER sip:contoso.com SIP/2.0
Via: SIP/2.0/TLS 10.0.0.2:55072

The client would then try to re-register itself about 10 times over 3-5 seconds, and the server answered these attempts with SIP/2.0 401 Unauthorized. Then it would just log back in as if nothing had ever happened.

It looked to me like the line that had SIP_REGISTER::RefreshRegistration force(1),refreshSA(1),refreshRoute(0),state(2) was the likely culprit, but I still had no idea why it was happening. I ran log captures on working machines and never saw that line appear anywhere else. Internet searches came up blank. I even ran traces from the front-end Lync server, which only showed me the multiple 401 Unauthorized messages it was returning to the client. No indication why it was not allowing me in.
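Spotting an exact 8:28 cycle by eye in a 50 MB log is tedious. A rough Python sketch like the one below could pull the timestamp of every RefreshRegistration force(1) line and report the gaps between them. It assumes each log line starts with a timestamp in the 10/18/2011|12:23:30.120 format shown above. MARKER, forced_refresh_times, intervals_seconds and the script name in the usage comment are my own inventions, not part of any Lync tooling.

```python
import sys
from datetime import datetime

# The line that showed up at every disconnect in the trace above.
MARKER = "SIP_REGISTER::RefreshRegistration force(1)"
# Timestamp prefix as it appears in the log, e.g. 10/18/2011|12:23:30.120
TIME_FORMAT = "%m/%d/%Y|%H:%M:%S.%f"


def forced_refresh_times(lines):
    """Yield the timestamp of every log line containing MARKER."""
    for line in lines:
        if MARKER in line:
            stamp = line.split(" ", 1)[0]
            yield datetime.strptime(stamp, TIME_FORMAT)


def intervals_seconds(lines):
    """Return the gaps, in whole seconds, between consecutive forced refreshes."""
    times = list(forced_refresh_times(lines))
    return [round((later - earlier).total_seconds()) for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    # Usage (hypothetical script name): python refresh_gaps.py Communicator-uccapi-0.uccapilog
    for path in sys.argv[1:]:
        # The log is plain text; errors="replace" guards against stray bytes.
        with open(path, encoding="utf-8", errors="replace") as log:
            for gap in intervals_seconds(log):
                print(f"{path}: {gap} s ({gap // 60} min {gap % 60} s)")
```

An 8 min 28 sec cycle would show up as a run of 508-second gaps, which makes a fixed timer much easier to spot than scrolling through Notepad.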

I tried uninstalling and reinstalling Lync. Nope. I tried uninstalling it, then searching for and deleting any trace of Lync in folders and the registry. I deleted any cached credentials in Windows via Credential Manager. Nope. In frustration, I even deleted my Lync account and re-created it. Nope.....Chuck Testa.

Then I tried creating a new local account on my desktop and logged into Lync. Great success!!! The problem was gone. However, I wasn't all that thrilled because I didn't want to go through the pain of moving all my profile data to a new account. Plus, it felt like giving up....kind of like when you re-install the OS to deal with an issue.

I reasoned that there must be something about my profile that was causing the issue. So I logged back into my main account, deleted all the temp files in %userprofile%\TEMP, cleaned out my browser history and cache, and restarted. Amazingly, the problem was gone. It's been 3 days and my client has been working normally.

I'm not sure if it was cleaning out the TEMP folder or the IE cache that fixed the problem. Ideally, I should have tried one or the other first, but I was getting tired of troubleshooting by this point. If anybody has any insight as to what was really going on, please drop me a line.

Wednesday, October 5, 2011

Lync Desktop Sharing UI Issue

I love the Lync client (I also love lamp). It's a slick, easy-to-use application that handles IM, voice, video, conferencing, whiteboarding and desktop sharing. But like everything in life, you've got to take the good with the bad. The bad things in Lync are really more minor annoyances to me, but when I see them trip up users over and over again (myself included), I think "there's got to be a better way to do this". I'm a stickler for a well-designed user interface, so when I see something missing, it gets on my nerves a bit.

The thing I'm talking about today is related to desktop/application sharing. UI-wise, it works like a charm in most cases, but there's one scenario that, more often than not, confuses the hell out of people. Let me take you through it:

You start a desktop sharing session with one of your colleagues. You easily find the Sharing button in the Lync client and click Share desktop.

You share your desktop, and everything is great. You'll notice a yellow bar notifying you that you're sharing your screen.

If you click the Preview button, it will open the "stage", which shows you what other people are seeing. The stage will show you one of those cool effects you get when you have 2 mirrors facing each other....the scene just keeps repeating itself smaller and smaller and smaller until it's the size of atoms (or so I imagine). Normally, you don't want the stage showing, because it just gets in the way, so you click "Hide stage".

Now, the other guy wants to show you something on his desktop, so he shares his desktop out. He'll get a notification saying his sharing will replace yours.

And your screen sharing session will just go away and you'll see this yellow bar notification.

However, even though the double arrow beside the user sharing is "lit up", you won't see his desktop. You'll thrash around in confusion until you either quit your job because Lync has "beaten you" or stumble onto how to see the other person's desktop. If neither of those has happened yet, I'll share the secret: you have to click Share - Show Stage for the other guy's desktop to show up on your screen.

A very simple way out of this little UI conundrum is to add a button on the yellow notification bar to "Show stage". If the right people see this blog, maybe we'll see it incorporated in a future patch.

Until next time, cheers!

Tuesday, December 14, 2010

Finally! Exchange 2010 OWA Redirect Bug Fixed!

Ever since the release of Exchange 2010 SP1, there have been numerous reports of users being told they have to be redirected to another server, instead of getting the silent redirect that worked like a charm before SP1. I blogged about this in September after I experienced the issue for myself. Since then, it's been one of the most-read pages on my blog.

I've just received word that the issue has finally been fixed in Exchange 2010 SP1 Rollup 2. This KB article describes the issue and confirms that it's been fixed by RU2.

You can get Rollup 2 here.

Friday, October 1, 2010

Check your Exchange 2010 UM Dial Plans Before Upgrading to SP1

I thought I'd pass along my experience with some undocumented UM changes in Exchange 2010 SP1 that recently caused a client some grief.

This particular client has made extensive use of UM dial plans and auto attendants (AAs), all tied in to a Cisco Call Manager telephony environment. They hadn't configured their AAs the way Microsoft recommends. Specifically, they didn't consistently assign Dialing Rule Groups to their AAs.

If you're not familiar with Dialing Rule Groups, they are essentially groupings of dialing rules used to determine the types of calls that users can make when they make outgoing calls via Exchange UM. For instance, you might have a dialing rule group that contains a set of rules that only allows local calls. According to MS Best Practices, every dial plan and auto attendant should have at least one dialing rule group assigned to handle every possible combination of numbers it is expected to see.

The UM Dial Plans used by this client were almost exclusively set to 4-digit extensions. Many of the AAs had key mappings (e.g., Press 1 to reach Sales) that routed to 7-digit extensions, and there were no dialing rule groups in place on the AAs. In Exchange 2007 and Exchange 2010 RTM, this didn't seem to matter: the auto attendants always routed the calls properly.

However, once we put in SP1, every auto attendant that routed calls to extensions with more than 4 digits failed. Callers would reach the main menu, press the button corresponding to the key mapping, hear a message saying the call could not be completed, and then be returned to the main menu.

After much sweating, hand-wringing and a call to MS Premier Support, we determined that we required a dialing rule group on each of the auto attendants that routed to 7-digit extensions. Once done, calls were routed as before.

The Microsoft support rep said that the AAs should never have worked with the configuration this client had in place. However, this client had successfully used this method in both Exchange 2007 and Exchange 2010 RTM. It only became an issue with Exchange 2010 SP1. One way to look at it is that Exchange 2010 SP1 corrected a logical oversight in previous versions.

So, in essence, make sure your dial plans and auto attendants are configured according to Microsoft's Best Practices BEFORE upgrading to Exchange 2010 SP1.
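A script can help with the "check before upgrading" part. The sketch below is a toy Python model and does not reproduce how Exchange actually evaluates dialing rules. The AutoAttendant class, the helper names, the sample extensions, and the use of "x" as a single-digit wildcard in a rule mask are all assumptions for illustration. The sketch flags any key mapping whose target is neither the dial plan's extension length nor covered by a rule mask from the AA's dialing rule groups. That is the gap that bit this client after SP1. You would still have to populate it by hand from your real AA and dial plan settings.

```python
from dataclasses import dataclass, field


@dataclass
class AutoAttendant:
    # Hypothetical model of an AA; fill in from your actual configuration.
    name: str
    key_mappings: dict  # key pressed -> extension it routes to
    rule_masks: list = field(default_factory=list)  # masks from assigned dialing rule groups


def mask_matches(mask, number):
    """True if number fits mask, treating 'x' as any single digit (assumption)."""
    if len(mask) != len(number):
        return False
    return all((m.lower() == "x" and n.isdigit()) or m == n for m, n in zip(mask, number))


def uncovered_mappings(aa, dial_plan_digits):
    """Return key mappings whose target neither matches the dial plan's
    extension length nor any dialing rule mask on the auto attendant."""
    problems = {}
    for key, extension in aa.key_mappings.items():
        if extension.isdigit() and len(extension) == dial_plan_digits:
            continue  # a normal in-plan extension
        if any(mask_matches(mask, extension) for mask in aa.rule_masks):
            continue  # covered by a dialing rule
        problems[key] = extension
    return problems


# Hypothetical example: a 4-digit dial plan with one 7-digit key mapping.
main_menu = AutoAttendant("Main Menu", {"1": "5551234", "2": "4321"})
print(uncovered_mappings(main_menu, dial_plan_digits=4))
```

With this sample data, only the Press 1 mapping to the 7-digit extension gets flagged. Adding a mask such as 555xxxx to rule_masks clears it.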

Friday, September 24, 2010

New September 2010 Hotfixes For OCS R2 Response Group Issue

Microsoft has released a new set of hotfixes to fix the Response Group problems introduced with the July 2010 set of hotfixes. There are 4 required updates for the following services:

Response Group
Core Components
Web Components
Administration Tools

Get a direct link to the server hotfix installer here, which will install all the required updates.

Monday, August 16, 2010

CS "14" Beta Refresh Topology Publish Error

So, after returning from CS "14" Ignite training in ~~shady, cool~~ effin' hot Scottsdale, AZ, I was excited to install the beta refresh in my "lab" (which just so happens to be in our production environment....*cough cough*). I uninstalled the beta before installing the refresh, as recommended by MS. Everything seemed to go relatively well, with a few minor hiccups here and there, until it came to the point where I had to publish the topology. I got this unhelpful error:

The existing topology identifies server mycs14serverFQDN\rtclocal as the CMS but the topology that you are trying to publish identifies server mycs14serverFQDN\rtlcocal as the CMS. The CMS must match before the topology can be published.

At first glance, it appears to be a rather odd message, since the server name is identical in both spots in the error message. Since I hadn't changed the server name, I reasoned that Active Directory still had references to the old CMS database I removed along with the beta. CS "14" stores most of its data in the CMS database instead of Active Directory. The reason behind this is that it's much easier to add functionality to a database than to extend the AD schema, with all the crap that goes along with it (approvals, hand-wringing, etc). However, AD still has a few references to CS "14", like where to find the CMS database on the network.

Originally, I used ADSIEdit to remove the configuration store information from Active Directory. This caused some other issues with OCS R2, which I solved with further ADSIEdits to remove some other dead references left over from the failed CS "14" removal. Recently, I found a better way to remove the configuration settings from Active Directory: Remove-CsConfigurationStoreLocation. I saw this mentioned in one of the forums. I haven't tried it yet, but I'm guessing this is a much better way to clean things up than my ADSIEdit method.
