Recently I wrote about an internal Lync Server CMS replication issue we were experiencing. While we were fixing that issue, we also had a report come in that our Response Groups were not working properly. Hooray! When it rains, it pours I suppose.
We use Lync RGS for simple Hunt Groups – Tech Support, Purchasing, Sales, and so on. All inbound phone calls (other than private DIDs) come into Lync, then to an Exchange Auto Attendant, and then to Response Groups. They are pretty important to us.
We are a managed services provider (MSP), so having our Tech Support Response Group Queue down is not good — really not good. Let’s see what’s going on, shall we? Warning: there are a lot of screen shots below. I apologize in advance for all the scrolling you’re about to do.
Symptom 1: When you called a Response Group directly, you got a fast busy.
Symptom 2: When you called our main number, the Exchange AA picked up, and you transferred to a Response Group, you got a “call cannot be completed” message and were sent back to the AA.
Symptom 3: When you called a Response Group while tracing S4 in OCSLogger/Snooper, you got a nice, generic 26017 entry:
- ms-diagnostics-public: 26017;reason="Internal server error."
Symptom 4: When you restarted the Response Group Service (RTCRGS), it consistently pushed errors related to WorkflowRuntime and Contact Objects.
Event 31028: LS Response Group Service – WorkflowRuntime / CommitWorkBatch
Event 31067: LS Response Group Service – Match Making / Contact Object
And another curious “Information” log entry in the middle of those…
Event 31035: LS Response Group Service – Application Endpoint
That doesn’t make sense. At all.
I started by focusing on the 31067 error and began looking at the RGS Application Endpoints in the Lync Shell.
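In the Lync Server Management Shell, that check looks roughly like this (the `OwnerUrn` filter value is my assumption about how RGS tags its endpoints; adjust for your environment):

```shell
# List application endpoints owned by the Response Group application.
# Assumption: RGS endpoints carry the urn:application:Rgs owner URN.
Get-CsApplicationEndpoint |
    Where-Object { $_.OwnerUrn -eq "urn:application:Rgs" } |
    Format-Table SipAddress, DisplayName, RegistrarPool
```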
Then I verified those Application Contact Objects existed in ADSI Edit – and they did.
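If you would rather not click through ADSI Edit, you can query AD directly instead; a sketch using the ActiveDirectory module (the object class filter is an assumption based on how Lync stores application contacts):

```shell
# Query AD for Lync application contact objects without ADSI Edit.
# Assumption: the contacts use the msRTCSIP-ApplicationContact class.
Import-Module ActiveDirectory
Get-ADObject -LDAPFilter "(objectClass=msRTCSIP-ApplicationContact)" `
    -Properties msRTCSIP-PrimaryUserAddress |
    Format-Table Name, msRTCSIP-PrimaryUserAddress
```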
So that’s annoying.
Let’s go back to the 31035 Event. I traced all the RGS items in OCSLogger/Snooper while I restarted the server and found this curious log:
TL_ERROR(TF_COMPONENT) 1C38.2C98::11/16/2014-02:45:12.980.00000457 (RgsHostingFramework,AepManager.StartConnectingAep:aepmanager.cs
(1144))(0000000002D826E8)[Exit] – Could not establish AEP AEP address=[sip:RtcApplicationemail@example.com], exception=[System.InvalidOperationException: An application endpoint with the same uri already exists on the CollaborationPlatform.
Okay. That’s the RGS Presence Watcher. Interesting.
Another interesting log entry just above it:
TL_ERROR(TF_COMPONENT) 1C38.216C::11/16/2014-02:44:11.371.00000238 (RgsHostingFramework,AepManager.StartConnectingAep:aepmanager.cs
(1144))(0000000002D826E8)[Exit] – Could not establish AEP AEP address=[sip:RtcApplicationfirstname.lastname@example.org], exception=[System.InvalidOperationException: The requested Performance Counter is not a custom counter, it has to be initialized as ReadOnly.
Huh…Performance Counters. I reached out to some friends and they asked me to look at the Registry Entry for the Windows Workflow Foundation entries.
Specifically, I was asked about HKLM\SYSTEM\CurrentControlSet\Services – the Windows Workflow Foundation 3.0.0.0 and 4.0.0.0 entries, like below:
You’ll notice that under PerfIniFile it references PerfCounters.ini. That’s what it says now. Before, it said “PerfCounters_d.ini” (with the underscore d), which is odd.
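You can read those values without opening Regedit; something like this should work (the key names assume the default Windows Workflow Foundation 3.0.0.0 / 4.0.0.0 service entries):

```shell
# Check the PerfIniFile value for both WF performance counter services.
# Assumption: default key names for the WF 3 and WF 4 entries.
Get-ItemProperty "HKLM:\SYSTEM\CurrentControlSet\Services\Windows Workflow Foundation 3.0.0.0\Performance" |
    Select-Object PerfIniFile
Get-ItemProperty "HKLM:\SYSTEM\CurrentControlSet\Services\Windows Workflow Foundation 4.0.0.0\Performance" |
    Select-Object PerfIniFile
```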
At my friends’ recommendation, I reloaded the performance counters in both WWF 3.0 and 4.0 folders.
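For reference, two common ways to reload performance counters from an elevated prompt – the paths here are typical 64-bit defaults, so confirm they match your server before running anything:

```shell
# Option 1: rebuild all counter settings from the registry backup store.
lodctr /R

# Option 2: re-register the WCF/WF performance counters specifically.
# Path assumes a default 64-bit .NET 4 install.
& "C:\Windows\Microsoft.NET\Framework64\v4.0.30319\ServiceModelReg.exe" -r
```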
And the following log entries appeared:
Okay. I waited and took a deep breath. And I restarted Response Groups – RTCRGS again.
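From PowerShell, that restart looks like this (service name taken from the events above):

```shell
# Restart the Lync Response Group service and confirm it came back up.
Restart-Service RTCRGS
Get-Service RTCRGS
```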
Hooray! You could see RGS Stop and Start, and the Application Endpoints were created without error.
All is well.
So, what was the problem? The Response Group failure was related to performance counters – specifically the Windows Workflow Foundation performance counters. Reloading those fixed whatever was causing the RGS errors at the top of this blog post.