Disabling HTTP in OWAS/WAC

•2014/03/26 • 2 Comments

We built our OWAS farms and, like most Lync people, had no clue what we were doing. But they ended up working anyway, so hooray for us.

Now that we are begrudgingly learning a little about it we have learned that we should disable HTTP on the pools and run with HTTPS only.

So we tried the obvious command to disable HTTP:

Set-OfficeWebAppsFarm -AllowHTTP $False

That gives this wonderful error:

Set-OfficeWebAppsFarm : A positional parameter cannot be found that accepts argument ‘False’.
At line:1 char:1
+ Set-OfficeWebAppsFarm -AllowHTTP $False
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo          : InvalidArgument: (:) [Set-OfficeWebAppsFarm], ParameterBindingException
+ FullyQualifiedErrorId : PositionalParameterNotFound,Microsoft.Office.Web.Apps.Administration.SetFarmCommand

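After asking around, we found that the secret to this command is to use a colon (:) instead of a space between the parameter and the value. AllowHTTP is a switch parameter, so PowerShell treats a space-separated $False as a separate positional argument, which is exactly what the error complains about. As such, this is the proper syntax: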

Set-OfficeWebAppsFarm -AllowHTTP:$False
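The same binding behavior can be reproduced without an OWAS farm. Here is a minimal sketch using a made-up function (Set-DemoFarm is hypothetical and is not part of the Office Web Apps module) that declares a switch parameter the way AllowHTTP appears to be declared:

```powershell
# Hypothetical stand-in for a cmdlet that exposes a switch parameter.
function Set-DemoFarm {
    [CmdletBinding()]
    param(
        [switch]$AllowHttp
    )
    # Report what the switch was bound to.
    "AllowHttp = $($AllowHttp.IsPresent)"
}

# A colon binds the value to the switch itself.
Set-DemoFarm -AllowHttp:$false

# A space makes $false a separate positional argument, which fails to bind.
try {
    Set-DemoFarm -AllowHttp $false
}
catch {
    "Binding failed: $($_.Exception.Message)"
}
```

The second call should fail with the same "positional parameter cannot be found" complaint shown above, since nothing in the function accepts a positional value.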

Note that if the SSLOffloaded parameter is set to True, you cannot disable AllowHTTP. If you try, you get this warning:

WARNING: When offloading SSL, AllowHttp is automatically enabled.

To work around this, run the following command to set both to false.

Set-OfficeWebAppsFarm -SSLOffloaded:$False -AllowHTTP:$False

For more detail and tips on how to secure your Office Web Apps, see this blog.

Lync\SCOM bug

•2014/03/04 • 3 Comments

Just a quick note. We have recently been receiving a lot of SCOM alerts like the following in our Lync 2013 environment:

Resolution State: New

Alert Name: [LYNC] Total number of Storage Service EWS Autodiscovery errors.

Source: LS Storage Service Component [lync203-1.flinchbot.com]

Path: lync203-1.flinchbot.com

Last modified by: System

Last modified time: 3/4/2014 3:48:54 PM

Perf Object Name:

Perf Counter Name: LYSS – Total number of Storage Service EWS Autodiscovery errors.

Perf Counter Value: 186

Error Threshold: 25

Warning Threshold: 1

Consecutive Samples Repeat Count: 2

Please see the ‘Product Knowledge’ and the ‘Alert Context’ tab on Alert Properties view for more information.

[end of alert description]

These seemed odd because, though I am not at all involved in Exchange, our Exchange guys are sharp and would have caught a misconfigured autodiscover record.

Further, we do not do any archiving so why is a Lync Front End even bothering to check for EWS?

So I checked with my contact at Microsoft and he informed me that this is a bug and we should disable the alert. I don’t know (and don’t care!) if it is a Lync or SCOM error.

The Hidden Logs That Could Crash Your Lync Servers!

•2014/02/28 • 4 Comments

How’s that for the title of a blog article! Apparently I’ve been reading too much Huffington Post or something. For the record, I never read that website. I have standards, as low as they may be.

So back to the title and the point of this post. Are there actually hidden log files that could cause some unintended problems with your Lync 2013 environment? Absolutely. I am assuming you are already aware that IIS logs could fill up your local hard drive. It is also a good idea to keep an eye on the trace files created by OCS Logger and Snooper.

However, there are some hidden logfiles that are created by Windows Fabric that could very much fill up your hard drive and it would be a decent challenge to find them. If you are unaware, Lync 2013 sits on top of a technology called Windows Fabric. For a nice overview, check out this Technet blog article as well as this article on masteringlync.com.

By default, Windows Fabric is set to create log files in this hidden system directory:

C:\programdata\Windows Fabric\Fabric\log\Traces

Once a log file reaches 128MB, Windows Fabric creates a brand new log file. Over time, all of these 128MB log files will fill up your hard drive. When the hard drive gets full it’s very likely that you will see some issues with Lync (yes, even including the potential of one of your Lync servers to crash).

Here is a screenshot of one of my lab servers where I have done nothing to address this potential issue.

Windows Fabric Log Files

That is a lot of disk space used for logs I can’t even read.

According to Windows Explorer, that is 810MB of disk space taken up in my lab by Windows Fabric log files. Note that these are binary log files, so it’s not as if I could read them to see what is happening. As such, these log files are only useful to Microsoft when troubleshooting a potential issue. You know, an issue like your hard drive has filled up! I don’t think there is a point in keeping a year’s worth of Windows Fabric log files.
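If you would rather not click through a hidden directory in Windows Explorer, the same number can be pulled from PowerShell. This is a rough sketch; Get-FolderSizeMB is a helper name I made up, and the path is the default location mentioned above, so adjust it if your install differs:

```powershell
# Hypothetical helper: total size of all files under a folder, in MB.
function Get-FolderSizeMB {
    param(
        [Parameter(Mandatory)]
        [string]$Path
    )
    # -Force includes hidden and system files; -File skips directories.
    $sum = (Get-ChildItem -LiteralPath $Path -Recurse -File -Force -ErrorAction SilentlyContinue |
        Measure-Object -Property Length -Sum).Sum
    # An empty or missing folder yields no Sum, so treat that as 0.
    if ($null -eq $sum) { $sum = 0 }
    [math]::Round($sum / 1MB, 1)
}

# Default Windows Fabric trace location from this article.
Get-FolderSizeMB -Path 'C:\ProgramData\Windows Fabric\Fabric\log\Traces'
```

Run it on each Front End and you have a quick way to spot the server where the traces are piling up.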

So how do we keep these log files from eating up our drive space? For the paranoid, create a scheduled task on all of your Front End Servers (and Directors and SBAs/SBSes) to move the logs to some other server that has disk space you want to waste. For the rest of us looking for an easy, one-time fix, run this command from an elevated command prompt (this is not a PowerShell command):

Logman update trace FabricLeaseLayerTraces -f bincirc --cnf

This will change the logging to circular. According to this Technet article, -cnf is used to “create a new file when the log size has been exceeded”, and logman’s help notes that an extra hyphen in front of an option negates it. So --cnf turns that behavior off: instead of starting a new file once the 128MB file size has been reached, logging goes back to the beginning of the same file and continues.

So there you go. Either keep an eye on this directory or run the Microsoft-recommended command to make sure these hidden log files don’t cause you unnecessary heartache.

NUMA

•2014/02/24 • Leave a Comment

Microsoft should have been embarrassed that they publicly claimed support for virtualizing Lync 2013 but were incapable of providing guidance... until last week, when they finally released their 14-month-overdue white paper.

Why can’t school be like the real world? If I could re-write the papers I wrote in college (aka release them way late), I so could up my college GPA from B- to a solid B!

Far too late to help us, we did get some tidbits of information out of Microsoft months before this paper was published. One of the main tripping points we came across is mentioned in the white paper as such:

Disable non-uniform memory access (NUMA) spanning on the hypervisor, as this can reduce guest performance

One of our environments was having all kinds of performance issues, and disabling NUMA spanning provided a clear boost to performance. As such, the following little meme flew around our office for a few days. Now that this guidance is official, I thought I’d share it with the rest of you.

 

newman

Is the Lync Stress Tool worthless?

•2014/02/24 • Leave a Comment

There has been a lot of chatter lately about the Lync 2013 Stress tool, particularly since Microsoft just released a new guide about this tool. The guide is very useful, as figuring out the tool on your own is... challenging.

In short, the tool works by simulating a heavy load of traffic against your Lync environment. If your servers can handle the load you have defined then you can be fairly confident that your installation is ready for production.

However, there is a big caveat that needs to be explained before you launch this against Lync servers that sit in any semblance of a production environment. By “any semblance of a production environment” I mean the Active Directory domain that houses your production or pre-production Lync 2013 servers, any other Lync installs that share the same Lync Organization as the pool you want to test, and anything else that might get pegged harder than usual due to this testing, such as network bandwidth or firewalls.

In section 5.1 of the guide, Microsoft even mentions the following:

To stress test Lync Server using LSS, it is best to use an isolated lab environment. The stress testing lab needs to include:

  • Active Directory Domain Services domain controllers

  • Active Directory Certificate Services root certification authority

So if Microsoft says you should only use this in a lab environment, begin to ask yourself: what is the point of testing lab servers? Well... there isn’t much of a point unless you build an exact duplicate in your lab of what you will put into production. Depending on the size of your environment, this could be a very sizable investment. (It’s not unheard of to have over 30 Lync-related servers in a single pool. Plan on deploying more than one pool in a paired-pool config and your lab will get really large (and expensive), though this tool doesn’t stress test every component.)

Alternately, you could bring up your entire Lync environment in a Lab domain, stress test it, then uninstall everything (bootstrapper.exe /scorch) and re-install it into production. Assuming you do everything exactly correct then you will at least have a decent idea that your moving-to-production hardware can handle your anticipated load. But that is an awful lot of work to build your environment twice just to get some metrics.

So then what’s the big deal with just running this in production? Why does Microsoft warn against it?

The guide mentions that you need to build client machines to launch the tests. Each client machine can handle no more than 4,500 simulated endpoints (with Multiple Points of Presence (MPOP), it goes up to 6,300, but for the purposes of this article the focus is on the 4,500 endpoints). Each endpoint is actually a user created in your Active Directory environment, and each one of these users will be Lync enabled. What happens when you Lync enable a few thousand or tens of thousands of users? You need to regenerate the Lync Address Book and push it out to all of your users.

This is exactly what you don’t want your users seeing just because you are testing.

If you are in a small environment then maybe this isn’t a problem. But if you are geographically dispersed and/or your users have limited bandwidth then you can start seeing how there might be issues by throwing abnormally large address books around your network. And if you didn’t think ahead and name your thousands of test users something like ZZZZZZ_LyncUserX then you will have a few thousand new “users” buried smack in the middle of your Lync Address book.

Look at all of those accounts clogging up the address book.

When you remove all of these users a new Address Book will need to be generated and pushed too.

Depending on how robust your AD infrastructure is, do you think your network can handle several thousand users all logging in over a short period of time? Sure you can set the tool to log in users at a rate of one per second but what will this do to any security logging or auditing software you might have in production?

The testing tool can also create a bunch of conference directories that you will have to manually clean up afterwards.

So what should you do instead? Well, ask yourself this: What are you really trying to test? The ability of Lync to handle thousands of connections, or the ability of your servers to support thousands of connections to Lync? Because quite honestly, I trust Microsoft to make Lync scalable enough to handle the maximum load you are looking to run. Where the bottlenecks come in is your server infrastructure. Is your SQL Server properly scaled? Do you have enough bandwidth between servers, or is your switch overrun and dropping packets?

Microsoft has released a Key Health Indicators document that works with Windows Performance Monitor to collect the key metrics you need to make sure that your servers running Lync are running well. You can download a script to create these counters in Performance Monitor. They are part of the Network Planning, Monitoring, and Troubleshooting with Lync Server document. Just run the PowerShell script and it will create a set of Key Health Indicators for you to monitor.

The script creates a Lync-specific KHI collector set within Performance Monitor for you.

Now run the KHI collector set for a few days with no one using the servers. This will create your baseline metrics. Now, begin adding or migrating your users to the Lync 2013 servers. Every week run the KHI metrics and see if you notice any unusual spikes. If so, investigate them as these could be pointing out potential bottlenecks such as disk that is too slow or not enough CPU resources.

Using this method will actually let you monitor your environment and let you know if it is handling the actual stress of your deployment and not a theoretical stress in your lab.

Now, how could Microsoft improve the stress tool? Well, create 1 or 10 or 100 users and have them log in 4000 or 400 or 40 times. The tool allows you to have each user log in multiple times but only up to a “100% ratio”. This means if you have 1000 test users you can have up to 2000 sessions with multiple logins (MPOP). However if you need to stress an environment that would need up to 30,000 endpoints you still need to create 15,000 test users.
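To put numbers on that, here is a small back-of-the-envelope sketch. Get-RequiredTestUserCount is a name I made up, and the two-sessions-per-user figure is simply my reading of the “100% ratio” limit described above, not something taken from the tool itself:

```powershell
# Hypothetical helper: how many AD test users are needed for a target endpoint count.
function Get-RequiredTestUserCount {
    param(
        [Parameter(Mandatory)]
        [int]$Endpoints,
        # Assumption: at a 100% MPOP ratio each test user provides two sessions.
        [int]$SessionsPerUser = 2
    )
    # Round up so a partial user still counts as one more account to create.
    [int][math]::Ceiling($Endpoints / $SessionsPerUser)
}

Get-RequiredTestUserCount -Endpoints 30000
```

For 30,000 endpoints that works out to the 15,000 test accounts mentioned above, every one of which lands in Active Directory and the address book.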

So to answer the question that is asked in the title of this article: Is the Lync Stress Tool worthless? My answer is that, unless you are in a small deployment or you are really digging deep into Lync architecture, it is basically worthless. Instead, proactively monitor your Lync servers as you would any other production server and should any issues pop up you will be prepared to handle them before they become catastrophic.

Manipulating the msRTCSIP-UserRoutingGroupID

•2014/01/07 • 2 Comments

First off, if you haven’t yet read this article then you should – Understanding how Windows Fabric Works (with regards to Lync). This article is a very minor ancillary to that article so out of context it will make very little sense.

In the Mastering Lync article, the following is written:

As you can see here, the msRTCSIP-UserRoutingGroupID in active directory corresponds to a routing group defined within Lync.  Some of the numbers are reversed from Active Directory to what is seen in the RTCLocal database.

It’s the “Some of the numbers are reversed” bit that the below script handles.

For no apparent reason, I decided to write a script that would take the routing group IDs in Active Directory and manipulate the “reversed numbers” so that they match what is seen in Lync’s SQL Server tables. Note that it doesn’t edit or change anything in AD; rather, it just outputs a list of user names and routing groups that you can view and manually compare with what is in SQL Server.

So is there a real point to this script? Probably not but hey – I learned some fun PowerShell manipulation. Plus, you can get all of this information right out of SQL Server.

The first line of the script is a Get-ADUser command which grabs the msRTCSIP-UserRoutingGroupID attribute from all of your users.

$user = get-aduser -filter * -Properties name,MsRTCSIP-UserRoutingGroupID | where {$_."MsRTCSIP-UserRoutingGroupID" -ne $Null}

After that is gathered, a simple foreach loop runs through all of the users (each user is $item in the snippets that follow) and manipulates the numbers. The attribute comes back from AD as an array of bytes, so the first thing that is done is to convert each byte to a hexadecimal string so we can easily manipulate it.

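# Start each user with an empty array so values from the previous user don't carry over
$HexRoutingGroupID = @()
ForEach ($StringValue in $item."MsRTCSIP-UserRoutingGroupID")
{
    # Format each byte as two hex digits and append it to the array
    $HexValue = "{0:x2}" -f $StringValue
    $HexRoutingGroupID = $HexRoutingGroupID + $HexValue
}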

Once this is done, we can now do a bunch of index manipulations to flip the values to match what is seen in SQL Server. In short, what happens is that the list of hex values is “counted backward” from one position to another. In the first case we are selecting the values from -13 to -16. The cool thing here is that it also adds those values to our variable in “negative order”, so we don’t have to do any further manipulation once we’ve picked off a group of values we want to reverse. It’s pretty slick.

# Negative ranges pick the values off in reverse order
$UserRoutingGroupID1Inverse = $HexRoutingGroupID[-13..(-16)]
$UserRoutingGroupID2Inverse = $HexRoutingGroupID[-11..(-12)]
$UserRoutingGroupID3Inverse = $HexRoutingGroupID[-9..(-10)]
# The remaining values keep their original order
$UserRoutingGroup4 = $HexRoutingGroupID[8..15]

Finally the string gets concatenated together and output to the screen.

$outputvalue = $item.name +","+ $UserRoutingGroupID1Inverse +" " + $UserRoutingGroupID2Inverse + " " + $UserRoutingGroupID3Inverse + " " + $UserRoutingGroup4
$outputvalue

So there it is. No clue if this script will ever provide any major value as you can also just output this information directly from SQL. Check into the rtclocal.rtc.dbo.Resource,  rtclocal.rtc.dbo.FrontEnd and rtclocal.rtc.dbo.RoutingGroupAssignment tables. Then do SQL magic I don’t know how to do.
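For anyone who just wants the reordering in one place, here is a self-contained sketch of the same idea. It is not the downloadable script: Convert-RoutingGroupId is a name I made up, and it assumes the attribute is a 16-byte value that SQL Server displays in the usual GUID text layout. The sample bytes are invented purely for illustration.

```powershell
# Hypothetical helper: reorder 16 raw bytes into GUID-style text.
function Convert-RoutingGroupId {
    param(
        [Parameter(Mandatory)]
        [byte[]]$Bytes
    )
    if ($Bytes.Length -ne 16) {
        throw "Expected 16 bytes, got $($Bytes.Length)."
    }
    # One two-digit hex string per byte.
    $hex = @($Bytes | ForEach-Object { '{0:x2}' -f $_ })
    # The first three groups are byte-reversed; the last eight bytes keep their order.
    $part1 = $hex[3..0] -join ''
    $part2 = $hex[5..4] -join ''
    $part3 = $hex[7..6] -join ''
    $part4 = $hex[8..9] -join ''
    $part5 = $hex[10..15] -join ''
    "$part1-$part2-$part3-$part4-$part5"
}

# Made-up sample value, not a real routing group.
$sample = [byte[]](1..16)
Convert-RoutingGroupId -Bytes $sample

# .NET applies the same byte order when it builds a Guid from raw bytes.
([guid]::new($sample)).ToString()
```

Both lines should print the same value, which suggests the “reversed numbers” are just the normal difference between raw GUID bytes and their text form. If that holds in your environment, casting the attribute straight to [guid] may be all you need.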

Download the full script here.

How to Validate that OWAS is Working

•2014/01/02 • Leave a Comment

This article covers how to validate an Office Web Apps Server installation. I am too new to OWAS to know how to fix it if it gets broken. Maybe someday I’ll write an article on how to fix a broken OWAS. Until then, this article will have to suffice.


We are currently installing and configuring our Lync 2013 environment. We followed the instructions on how to install and configure OWAS. The immediate question we had was: How do we test if this is working at all? The instructions show a few event log entries to look for to validate that OWAS is installed correctly. This article shows a few additional ways to get more information to validate your OWAS installation.

Ultimately, the easiest way to test OWAS with Lync is to just upload a PowerPoint file in a meeting and see if people can view it. If so, job done!

But if you want to do some more in-depth testing, below are some commands you can try to see if your OWAS servers are functional.

On your OWAS server, open a PowerShell session and type the following command:

Get-OfficeWebAppsFarm

You should get output similar to the following:

FarmOU :
InternalURL : https://owaspoolint.flinchbot.com/
ExternalURL : https://owaspoolext.flinchbot.com/
AllowHTTP : True
SSLOffloaded : True
CertificateName : OWASFarmCert
EditingEnabled : False
LogLocation : C:\ProgramData\Microsoft\OfficeWebApps\Data\Logs\ULS
LogRetentionInDays : 7
LogVerbosity :
Proxy :
CacheLocation : C:\ProgramData\Microsoft\OfficeWebApps\Working\d
MaxMemoryCacheSizeInMB : 75
DocumentInfoCacheSize : 5000
CacheSizeInGB : 15
ClipartEnabled : False
TranslationEnabled : False
MaxTranslationCharacterCount : 125000
TranslationServiceAppId :
TranslationServiceAddress :
RenderingLocalCacheLocation : C:\ProgramData\Microsoft\OfficeWebApps\Working\waccache
RecycleActiveProcessCount : 5
AllowCEIP : False
ExcelRequestDurationMax : 300
ExcelSessionTimeout : 450
ExcelWorkbookSizeMax : 10
ExcelPrivateBytesMax : -1
ExcelConnectionLifetime : 1800
ExcelExternalDataCacheLifetime : 300
ExcelAllowExternalData : True
ExcelWarnOnDataRefresh : True
OpenFromUrlEnabled : False
OpenFromUncEnabled : True
OpenFromUrlThrottlingEnabled : True
PicturePasteDisabled : True
RemovePersonalInformationFromLogs : False
AllowHttpSecureStoreConnections : False
Machines : {OWAS01,OWAS02}

This lets you know that your OWAS farm is configured. If you want to see if your OWAS servers are healthy, run the following command (yes – with the parentheses):

(Get-OfficeWebAppsFarm).Machines

This should return output such as the following which will let you know if one of your OWAS servers is experiencing issues:

MachineName    Roles    HealthStatus
-----------    -----    ------------
OWAS01         {All}    Healthy
OWAS02         {All}    Healthy
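With two servers you can eyeball that table, but in a larger farm it is handy to list only the machines that are not healthy. A minimal sketch follows; Get-UnhealthyOwasMachine is a made-up helper name and the sample objects are invented stand-ins for the real output:

```powershell
# Hypothetical helper: return only machines whose HealthStatus is not 'Healthy'.
function Get-UnhealthyOwasMachine {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyCollection()]
        [object[]]$Machine
    )
    # Convert HealthStatus to text first so strings and enum values compare alike.
    $Machine | Where-Object { "$($_.HealthStatus)" -ne 'Healthy' }
}

# Invented sample data shaped like the table above.
$sample = @(
    [pscustomobject]@{ MachineName = 'OWAS01'; HealthStatus = 'Healthy' }
    [pscustomobject]@{ MachineName = 'OWAS02'; HealthStatus = 'Unhealthy' }
)
Get-UnhealthyOwasMachine -Machine $sample

# On an OWAS server, assuming the Office Web Apps cmdlets are available:
# Get-UnhealthyOwasMachine -Machine (Get-OfficeWebAppsFarm).Machines
```

No output from the live call would mean every machine in the farm reports Healthy.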

To see if the OWAS servers are responding to web requests, you can run the following command where the name of the server is the Internal URL displayed via the Get-OfficeWebAppsFarm cmdlet from above:

 invoke-webrequest https://owaspoolint.flinchbot.com/m/metparticipant.svc/jsonAnonymous/BroadcastPing

This kicks out data seen in the following screenshot:

invoke-webrequest

You can also run the following test from your web browser by plowing the following URL into the web browser’s address bar:

https://owaspool.flinchbot.com/hosting/discovery

This should return a pile of XML to your web browser.
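The browser test can also be scripted. The sketch below checks that a response body is well-formed XML with a wopi-discovery root element, which is what I would expect from the discovery URL. Test-WopiDiscoveryXml is a made-up helper name, and the live call is left commented out because it needs your own farm URL:

```powershell
# Hypothetical helper: $true if the text parses as XML with a wopi-discovery root.
function Test-WopiDiscoveryXml {
    param(
        [Parameter(Mandatory)]
        [string]$Content
    )
    try {
        $doc = [xml]$Content
    }
    catch {
        # Not well-formed XML (for example, an HTML error page or plain text).
        return $false
    }
    $doc.DocumentElement.Name -eq 'wopi-discovery'
}

# Against a live farm (URL taken from this article; substitute your own):
# $response = Invoke-WebRequest -Uri 'https://owaspool.flinchbot.com/hosting/discovery' -UseBasicParsing
# Test-WopiDiscoveryXml -Content $response.Content
```

This only confirms that the farm answers with a discovery document. It says nothing about whether Lync can actually render a PowerPoint through it, so the upload test above is still the real proof.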

 