Wednesday, January 21, 2015

Adventures in Normalization

One of the things that differentiates Lync from its competitors is its reliance on third-party peripherals like deskphones. This is often touted as a Good Thing, since you're not locked into a single vendor for phones. Phones from single-vendor solutions are often more expensive than an equivalent Lync phone simply because you can't get them anywhere else, so competition is low and prices are higher.

Since Lync phones are provided by different vendors, there is naturally a lot of variation in the specific details on how things work. Microsoft has alleviated this somewhat by instituting a Lync-qualified devices program.  This is a fairly thorough program that puts devices through their paces to ensure they function properly within Lync. 

However rigorous the program, little details slip through the cracks. One of the details that I've noticed is how different phones (and even the Lync client) handle regular expressions in different ways. For anyone familiar with the minutiae (thanks Word of the Day Calendar) of Lync Enterprise Voice, regular expressions are used for phone number normalization and routing. 

All Lync-qualified devices handle basic regex in the same manner, so most people never run into a problem. Sometimes, though, you need lesser-known regex patterns to accomplish certain goals. When you try to get fancy with regex, things sometimes break down, with disparate results between clients and phones. 

Over the years, I've heard about and personally experienced situations where Lync Optimizer-generated regex doesn't work on specific phones. The affected vendors have eventually fixed these issues, but I figured that I should do some hardcore 60 Minutes style investigation into just how well different devices handle some of the more esoteric regex patterns out there. 

To do this, I used the wide variety of Lync deskphones I have available for testing in my home lab. (You should hear what it sounds like when someone phones me. Everything is ringing and flashing, and I usually go "AAAAhhhhh" and run out of the room.) I used a Polycom VVX 600, a Polycom CX600, a Snom 760, and an old LG-Nortel "Tanjay" series phone. My AudioCodes 440HD wasn't connecting to our Lync server at the time, so it was left out of testing. Every phone was updated with the latest firmware available at testing time. 

For the first round of the Lync Normalization Olympics, I decided to keep it simple, using some of the rule formats commonly used by the Lync Optimizer. I entered the normalization rules into Lync via the Control Panel, and I used the "Create voice routing test case information" option to see how Lync Control Panel would interpret the regex, and included that in the results too. Across all the rulesets, I gave each rule a unique "identifier" prefix, which let me set up a series of rules that I could test all at once. 

Commas

First off, I used a regex rule that included a comma, which I had heard caused normalization issues with one particular vendor. In the example below, I'm looking for a pattern of numbers that starts with 999 followed by 10 or 11 digits. The 999 gets stripped and a + is put in its place.

Pattern: ^999(\d{10,11})$
Translation: +$1
Input number: 99915552223333 then 9995552223333
Expected output: +15552223333 then +5552223333

Control Panel: +15552223333 then +5552223333
Lync Client: +15552223333 then +5552223333
Tanjay Phone: +15552223333 then +5552223333
Polycom CX: +15552223333 then +5552223333
Polycom VVX: +15552223333 then +5552223333
Snom: +15552223333 then +5552223333

Everything passed that one with flying colours, so rumours of a phone type not handling commas seem to no longer apply.
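
For anyone who wants to poke at a rule like this without a rack of phones, here is a minimal Python sketch of what a normalization rule does. Python's re module is not the .NET engine that Lync uses, so treat it as an approximation: the helper name normalize and the conversion of $1 into Python's \g<1> are assumptions made purely for illustration, and the third input is a made-up number that is too short to match.

```python
import re

def normalize(pattern, translation, dialed):
    """Apply one Lync-style rule; return None when the pattern does not match."""
    match = re.search(pattern, dialed)
    if match is None:
        return None
    # Convert .NET-style $1 into Python's \g<1> (single-digit groups only,
    # a simplification for this sketch).
    template = re.sub(r"\$(\d)", r"\\g<\1>", translation)
    return match.expand(template)

# The {10,11} quantifier (the part with the comma) accepts 10 or 11 digits.
for dialed in ("99915552223333", "9995552223333", "999555222"):
    print(dialed, "->", normalize(r"^999(\d{10,11})$", "+$1", dialed))
# 99915552223333 -> +15552223333
# 9995552223333 -> +5552223333
# 999555222 -> None
```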

Dealing with Extensions, Part I

Another common rule used in the Optimizer accepts any dialed number, but strips out any extension and leaves an externally routable E.164 number. This rule was meant mostly for click-to-dial from the Lync client, but I did hear about a particular vendor's phone not being able to parse it. In the next example, I'm looking for a pattern of numbers that starts with 998 followed by 11 digits, followed by some regex that matches (and discards) anything beyond those 11 digits. The 998 gets stripped and a + is put in its place.

Pattern: ^998(\d{11})\d*(\D+\d+)?$
Translation: +$1
Input number: 99815552223333
Expected output: +15552223333

Control Panel: +15552223333
Lync Client: +15552223333
Tanjay Phone: +15552223333
Polycom CX: +15552223333
Polycom VVX: +15552223333
Snom: +15552223333

Everything passed that one with flying colours, so rumours of a phone type not parsing those rules seem to no longer apply.
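
The test above only used a bare number, so here is a rough Python sketch of what the tail of that pattern is for. The inputs with extensions are hypothetical click-to-dial strings that were not part of the phone tests, the function name strip_extension is made up, and Python's re engine is only standing in for the .NET one.

```python
import re

# 998 prefix, 11 captured digits, then optional extra digits and an
# optional "non-digits followed by digits" extension that gets discarded.
RULE = re.compile(r"^998(\d{11})\d*(\D+\d+)?$")

def strip_extension(dialed):
    """Return the E.164 number without any extension, or None if no match."""
    match = RULE.match(dialed)
    return "+" + match.group(1) if match else None

samples = [
    "99815552223333",           # the number used in the test above
    "99815552223333x4321",      # hypothetical: extension after an "x"
    "99815552223333 ext. 4321", # hypothetical: extension spelled out
    "9981555222",               # hypothetical: too few digits
]
for dialed in samples:
    print(repr(dialed), "->", strip_extension(dialed))
# The first three give +15552223333; the last one gives None.
```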

Dealing with Extensions, Part II

The next rule test is a variation on the previous example, using a slightly different regex pattern to strip any extension. This format was used a long time ago in the Lync Optimizer, but was replaced with the "cleaner" looking version above. One person told me he used this format to get around a particular phone vendor's trouble parsing the previous example's format.

Pattern: ^997(\d{11})(\s*\S*)*?$
Translation: +$1
Input number: 99715552223333
Expected output: +15552223333

Control Panel: +15552223333
Lync Client: +15552223333
Tanjay Phone: +15552223333
Polycom CX: +15552223333
Polycom VVX: +15552223333
Snom: +15552223333

Again, no issues on any platform.

Extensions with Optional Site Code

In companies with multiple sites that provide their users with extension dialing, it's often necessary to use site codes to differentiate one site from another. This is especially true if there are overlapping extension ranges originating from legacy PBXs. A company using 4-digit extensions may prepend a site code to ensure uniqueness, so Site #101 could use 1015xxx for its extension range, and Site #102 could use 1025xxx.

One particular company wanted to ensure that users within any given site could dial their own users by extension without the prepended site code, but still allow them to enter the site code if they wanted to. I did this in one rule by using a bit of fancy regex. In the example below, a user should be able to enter the extension with or without the 996 "site code".

Pattern: ^(?:996)?(5\d{3})$
Translation: +1555222$1
Input number: 9965000 then 5000
Expected output: +15552225000


Control Panel: +15552225000 for both inputs
Lync Client: +15552225000 for both inputs
Tanjay Phone: +15552225000 for both inputs
Polycom CX: +15552225000 for both inputs
Polycom VVX: +15552225000 for both inputs
Snom: +15552225000 for both inputs

Sigh, once again, no issues on any platform.
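
The "fancy" part of the site-code rule is the optional non-capturing group, (?:996)?, which matches the site code when it is dialed but never takes up a capture group number. A small Python sketch of the idea follows; the function name to_e164 and the two rejected inputs are hypothetical, and Python's re engine is only an approximation of what Lync does.

```python
import re

# (?:996)? is optional and non-capturing, so the extension is always group 1.
SITE_RULE = re.compile(r"^(?:996)?(5\d{3})$")

def to_e164(dialed):
    """Normalize an extension dialed with or without the 996 site code."""
    match = SITE_RULE.match(dialed)
    if match is None:
        return None
    return match.expand(r"+1555222\g<1>")

print(to_e164("9965000"))  # +15552225000
print(to_e164("5000"))     # +15552225000
print(to_e164("9966000"))  # None (hypothetical: extension outside 5xxx)
print(to_e164("9975000"))  # None (hypothetical: wrong site code)
```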

OK, boooooring. All the phones passed the basic regex tests without issue. Not at all "exciting". I use "exciting" in quotes, because how exciting can the topic of normalization be?

Multiple Replacement Strings

Sometimes you need to use multiple replacement strings in a rule (i.e. $1, $2, etc.). Rumour has it some phones have issues with this.

Pattern: ^996(\d{3})000(\d{7})$
Translation: +1$1$2
Input number: 9965550002223333
Expected output: +15552223333

Control Panel: +15552223333
Lync Client: +15552223333
Tanjay Phone: +15552223333
Polycom CX: +15552223333
Polycom VVX: +15552223333
Snom: +15552223333

Running out of ways to say "No issues".
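
To make it clearer what each capture group picks up in that rule, here is a short Python sketch that pulls the two groups out and glues them back together. The names rebuild, area and subscriber are labels invented for this illustration, and Python's re module is standing in for the .NET engine.

```python
import re

# Group 1 is the three digits after 996; group 2 is the seven digits
# after the literal 000 in the middle.
RULE = re.compile(r"^996(\d{3})000(\d{7})$")

def rebuild(dialed):
    """Equivalent of the translation +1$1$2, or None when the rule misses."""
    match = RULE.match(dialed)
    if match is None:
        return None
    area, subscriber = match.groups()
    return "+1" + area + subscriber

print(RULE.match("9965550002223333").groups())  # ('555', '2223333')
print(rebuild("9965550002223333"))              # +15552223333
```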


Advanced Normalization Tests 

To really gauge how well a phone's regex engine functions, we have to throw some regex at it that would only be used in very specific circumstances. 

For example, let's say I need to translate a set of digits into a pattern where those digits are buried within other numbers. I want to take 92xx and translate it to +155592xx999, so 9201 translates to +15559201999. If you try to create a pattern like ^(92\d{2})$ and translate it to +1555$1999, it is ambiguous whether you mean +1555 $1 999 or +1555 $19 99. 

Normally, I would follow the .NET regex reference which has different ways of dealing with this, but none of them work in the Lync client.  I did stumble upon one that did work, even though it isn’t a valid .NET expression.  Not only that, but testing with other phones gave different results most of the time.

I tried various patterns, and plugged in 9100, 9200, 9300 etc to see the results in both Control Panel and Lync softclient and desk phones. The results proved interesting.

Pattern: ^(91\d{2})$
Translation: +1555$1999
Input number: 9100
Expected output: Expect failure, but want to see how various devices handle this situation.

Control Panel: +1555$1999
Lync Client: +155599
Tanjay Phone: +15559100999
Polycom CX: +15559100999
Polycom VVX: No match
Snom: +1555

It seems that only the old Tanjay and the CX-series phone produce the desired result in this situation, and nothing else does. No big deal, because I wouldn't expect any consistent behaviour here.


Pattern: ^(92\d{2})$
Translation: +1555${1}999
Input number: 9200
Expected output: +15559200999

Control Panel: +15559200999
Lync Client: No match
Tanjay Phone: +15559200999
Polycom CX: +15559200999
Polycom VVX: +15559200999
Snom: No match

Using ${1} is a valid way to prevent any ambiguity between $1 and $19 in this example. Sadly, the Lync client and the Snom are the only two that don't respect this valid syntax.
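
The same ambiguity exists outside of Lync. As a loose analogy only (Python's re module is a different engine with its own replacement syntax), Python reads \1999 as a reference to group 19 and refuses it, while \g<1> plays the role that ${1} plays in .NET. The helper name try_template is made up for this sketch.

```python
import re

PATTERN = r"^(92\d{2})$"

def try_template(template, dialed="9200"):
    """Return the substituted string, or the error text if the template is rejected."""
    try:
        return re.sub(PATTERN, template, dialed)
    except re.error as exc:
        return f"error: {exc}"

# Ambiguous: Python takes the first two digits as the group number (19),
# and there is no group 19, so the template is rejected.
print(try_template(r"+1555\1999"))

# Explicit: the group number is delimited, so the trailing 999 is literal.
print(try_template(r"+1555\g<1>999"))  # +15559200999
```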


Pattern: ^(93\d{2})$
Translation: +1555$_999
Input number: 9300
Expected output: +15559300999

Control Panel: +15559300999
Lync Client: No match
Tanjay Phone: No match
Polycom CX: No match
Polycom VVX: +15559300999
Snom: No match

Using $_ is a way to substitute the entire input string. Not ideal code-wise, but I wanted to see if it would work. Only the Control Panel and the Polycom VVX parsed it properly.
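
Python has no $_ token, so the closest I can sketch is a comparison between "the whole input string" and "the whole match", which happen to be the same thing here because the pattern is anchored at both ends. The function names are hypothetical, and this is an analogy rather than a reproduction of .NET behaviour.

```python
import re

PATTERN = re.compile(r"^(93\d{2})$")

def whole_input(dialed):
    """Rough analogue of $_: splice in the entire input string."""
    match = PATTERN.match(dialed)
    return "+1555" + match.string + "999" if match else None

def whole_match(dialed):
    """Splice in the entire matched text (\\g<0> in Python templates)."""
    match = PATTERN.match(dialed)
    return match.expand(r"+1555\g<0>999") if match else None

# With ^...$ anchors the input and the match are identical, so both agree.
print(whole_input("9300"))  # +15559300999
print(whole_match("9300"))  # +15559300999
```

With an unanchored pattern the two would differ, which is part of why leaning on the whole input string is not ideal.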


Pattern: ^(?<1>94\d{2})$
Translation: +1555${1}999
Input number: 9400
Expected output: +15559400999

Control Panel: +15559400999
Lync Client: No match
Tanjay Phone: No match
Polycom CX: No match
Polycom VVX: +15559400999
Snom: No match

This is the valid .NET way to use named groups instead of numbered ones in the replacement string. So, I could just as easily have used ^(?<DigItDude>94\d{2})$ --> +1555${DigItDude}999. Again, only the Control Panel and the Polycom VVX parsed it properly.
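
Named groups are not a .NET-only idea, which makes the patchy support a bit more disappointing. For comparison, here is the same rule sketched in Python, which spells the group (?P<name>...) and the reference \g<name>. The group name ext and the function name are made up for the example.

```python
import re

# Python's spelling of a named group; the name "ext" is arbitrary.
NAMED_RULE = re.compile(r"^(?P<ext>94\d{2})$")

def normalize_named(dialed):
    """Insert the named group between fixed digits, or return None."""
    match = NAMED_RULE.match(dialed)
    return match.expand(r"+1555\g<ext>999") if match else None

print(NAMED_RULE.match("9400").groupdict())  # {'ext': '9400'}
print(normalize_named("9400"))               # +15559400999
```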


Pattern: ^(95\d{2})$
Translation: +1555($1)999
Input number: 9500
Expected output: +15559500999

Control Panel: +1555(9500)999
Lync Client: +15559500999
Tanjay Phone: +15559500999
Polycom CX: +15559500999
Polycom VVX: +15559500999
Snom: +1555(9500)999

This is actually not a valid .NET regex method, and the Control Panel treated the parentheses as literal characters. Surprisingly, the Lync client and all the phones except for the Snom dropped the parentheses and produced the result I was after.

Conclusions

As long as your phone number normalization requirements don't require any digit replacement buried within other digits, you can safely use just about any Lync-qualified device. However, if you know in advance that you have such a requirement (and it's very unlikely that you would), I would avoid Snom phones at this time, simply because they couldn't handle any of the advanced regex I threw at them in my attempts to solve that specific issue.
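
To back that up, here is a quick Python tally of the five advanced tests, using the results reported above. One assumption to flag: for the first test I expected failure, but the tally counts a platform as passing only if it produced the output I actually wanted (+15559100999). The dictionary keys are just labels for this sketch.

```python
PLATFORMS = ["Control Panel", "Lync Client", "Tanjay", "Polycom CX", "Polycom VVX", "Snom"]

# Desired output per advanced test, keyed by the translation style used.
DESIRED = {
    "$1999": "+15559100999",
    "${1}": "+15559200999",
    "$_": "+15559300999",
    "named group": "+15559400999",
    "($1)": "+15559500999",
}

# Reported results in PLATFORMS order; None means "No match".
RESULTS = {
    "$1999": ["+1555$1999", "+155599", "+15559100999", "+15559100999", None, "+1555"],
    "${1}": ["+15559200999", None, "+15559200999", "+15559200999", "+15559200999", None],
    "$_": ["+15559300999", None, None, None, "+15559300999", None],
    "named group": ["+15559400999", None, None, None, "+15559400999", None],
    "($1)": ["+1555(9500)999", "+15559500999", "+15559500999", "+15559500999", "+15559500999", "+1555(9500)999"],
}

def tally():
    """Count, per platform, how many tests produced the desired output."""
    counts = {name: 0 for name in PLATFORMS}
    for test, outputs in RESULTS.items():
        for name, output in zip(PLATFORMS, outputs):
            if output == DESIRED[test]:
                counts[name] += 1
    return counts

for name, passed in tally().items():
    print(f"{name}: {passed} of {len(RESULTS)}")
# Control Panel: 3, Lync Client: 1, Tanjay: 3, Polycom CX: 3, Polycom VVX: 4, Snom: 0
```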

The way I see it, if I build a regex pattern in Lync Control Panel, I should be able to trust that every single device will parse that regex pattern the same way. I find it rather annoying that there isn't any rigorous checking to see if vendors adhere to .NET regex standards. On that front, I feel Microsoft needs to be more diligent with their phone qualification process. It's these little details that can lead to frustration for anybody deploying Lync Enterprise Voice.

If you got to the end of this post and are still not asleep... congratulations! This post has been entered in the "Most Boring Tech Post of the Year" for 2015. Yes, I know it's early, but I think this is an early front-runner for the win.