Wednesday, September 27, 2017

Workflow, Core Service and Tridion Crashing Blues


Tridion Workflow is basically used to audit content before it is published to a specific environment. It is a really good module of the Tridion CMS, though a little underestimated in my opinion.

Recently, I came across an issue where Tridion became unresponsive when we had 30+ workflow processes running simultaneously. It took us a good while to figure out the root cause. My intention in writing this post is to save you investigation time in case you face the same issue.


The issue:
Tridion becomes unresponsive when the CMS server is under load from Workflow Processes.

The Root Cause:
While setting up a Tridion CMS, one has to configure the maximum number of concurrent Core Service instances allowed before the Core Service Host gets throttled. If the number of Core Service instances goes beyond that limit, the Core Service Host service is throttled and becomes unresponsive. In our case, the Workflow Processes were creating those Core Service instances.

In our configuration, the value was left at the default minimum (100) rather than the recommended value (100 x {No. of Cores}). Under high load the Core Service instances easily exceeded this limit of 100, and so Tridion became unresponsive.

These configuration settings are shown below and can be found in the following files:

1. {TridionDir}\webservices\web.config
2. {TridionDir}\bin\TcmServiceHost.exe.config

<serviceThrottling maxConcurrentSessions="100" maxConcurrentCalls="16" />

*The default value for the Maximum Concurrent Instances is {maxConcurrentSessions + maxConcurrentCalls}.

The recommendation is to multiply these default values by the number of cores on the server.
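
The scaling rule can be worked through with a small Python sketch. It is only an illustration of the arithmetic: the function name `recommended_throttling` is made up for this example, and `os.cpu_count()` reports logical processors, which may differ from the physical core count of your CMS server.

```python
import os

# Per-core baseline values quoted in this post.
SESSIONS_PER_CORE = 100
CALLS_PER_CORE = 16


def recommended_throttling(cores):
    """Return the throttling values scaled by the number of cores (hypothetical helper)."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    sessions = SESSIONS_PER_CORE * cores
    calls = CALLS_PER_CORE * cores
    return {
        "maxConcurrentSessions": sessions,
        "maxConcurrentCalls": calls,
        # The default instance limit is the sum of the two values above.
        "maxConcurrentInstances": sessions + calls,
    }


if __name__ == "__main__":
    cores = os.cpu_count() or 1  # cpu_count() can return None
    print(recommended_throttling(cores))
```

For a server with 4 cores this gives 400 sessions, 64 calls and therefore 464 instances, compared with 116 for the unscaled values.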

Workflow Connection to the Issue:
When an automatic External Activity in a workflow process starts, it automatically gets an instance of a “ready-made” Core Service client. This client is not closed or disposed once the activity finishes (proved); instead, it is cleaned up when the system later performs garbage collection (yet to be confirmed with SDL). More Workflow Processes mean more automatic External Activities and therefore more concurrent Core Service clients. Once the maximum allowed number of Core Service instances is exceeded, the CMS is throttled and becomes unresponsive.
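
To see why delayed cleanup matters, here is a toy model in Python. It is not Tridion code and it does not measure anything real: the function `peak_open_clients` and its parameters are invented for illustration, and the "garbage collection" is simply a reset after a fixed number of activities.

```python
def peak_open_clients(activities, cleanup_every, close_on_finish):
    """Peak number of simultaneously open clients in a toy model.

    activities:      number of automatic activities, run one after another
    cleanup_every:   a simulated garbage collection runs after this many activities
    close_on_finish: True if each activity closes its client when it ends
    """
    open_clients = 0
    peak = 0
    for finished in range(1, activities + 1):
        open_clients += 1  # the activity starts and receives a client
        peak = max(peak, open_clients)
        if close_on_finish:
            open_clients -= 1  # client released as soon as the activity ends
        elif finished % cleanup_every == 0:
            open_clients = 0  # simulated garbage collection frees everything
    return peak
```

With 300 activities and a simulated cleanup every 150 activities, the peak is 150 open clients when nothing is closed explicitly, which is already above a limit of 116. If every activity closed its own client, the peak in this model would be 1.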

The Fix:
We had to raise the maximum load allowed by the configuration files on the CMS. To do so, we increased the Core Service throttling attributes to allow more concurrent Core Service instances, as shown below (the configuration is in the files already listed above):

<serviceThrottling maxConcurrentSessions="100*{No. of Cores}" maxConcurrentCalls="16*{No. of Cores}" />

*Where {No. of Cores} is the number of cores available on the CMS Server.
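
If you prefer to script the change rather than edit the files by hand, a minimal Python sketch could look like the one below. Treat it as an assumption-laden example: `SAMPLE_CONFIG` is a simplified stand-in for a real config file, `scale_throttling` is a made-up helper, and it writes 100 and 16 times the core count rather than multiplying whatever values are currently in the file. Note also that `xml.etree.ElementTree` drops XML comments by default, so back up the original file before writing anything over it.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for a real config file.
SAMPLE_CONFIG = """<configuration>
  <system.serviceModel>
    <behaviors>
      <serviceBehaviors>
        <behavior>
          <serviceThrottling maxConcurrentSessions="100" maxConcurrentCalls="16" />
        </behavior>
      </serviceBehaviors>
    </behaviors>
  </system.serviceModel>
</configuration>"""


def scale_throttling(config_xml, cores):
    """Set every serviceThrottling element to 100 x cores sessions and 16 x cores calls."""
    root = ET.fromstring(config_xml)
    for node in root.iter("serviceThrottling"):
        node.set("maxConcurrentSessions", str(100 * cores))
        node.set("maxConcurrentCalls", str(16 * cores))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Example: a CMS server with 4 cores.
    print(scale_throttling(SAMPLE_CONFIG, 4))
```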




I hope this helps someone someday :)

