In late February and early March we experienced the error “All the Citrix XML Services configured for farm failed to respond to this XML Service transaction.” Because virtual desktops have become increasingly important in our organization, this failure needed to be resolved quickly. Unfortunately, the first time it happened, rebooting the delivery controllers and StoreFront servers didn’t resolve the problem; it eventually went away on its own. When it happened again the next week, in early March, we pushed through an “emergency” change request to upgrade from XD 7 to XD 7.1. About a week and a half later the problem occurred again, and we contacted our Citrix rep. This third time, though, I noticed something strange: the Citrix Monitor Service on one of the DCs had memory usage that stuck out like a sore thumb, a whopping 3.7 GB compared to a mere 477 MB on the other DC.
After seeing this behavior, I configured service monitoring using System Center Operations Manager and set a memory threshold of 800 MB. If you have System Center Operations Manager, you owe it to yourself to configure monitoring of critical XenDesktop services like this. Once I did, the monitoring revealed some dramatic behavior.
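If you don't have Operations Manager, the same kind of threshold check can be roughed out with a script. The following is only a minimal sketch, not what I actually ran: it parses the CSV output of the Windows `tasklist` command and flags any matching process above the 800 MB threshold. The function names are made up for illustration, and the image name `Citrix.Monitor.exe` is my assumption for the Monitor Service process, so verify it on your own controllers. `tasklist` reports working-set memory, which may not match the counter your monitoring tool uses.

```python
import csv
import io
import subprocess

THRESHOLD_MB = 800  # same threshold as the Operations Manager monitor above


def parse_mem_kb(field):
    """Convert a tasklist 'Mem Usage' field such as '488,448 K' to integer KB."""
    # Keep digits only, so the thousands separator and the ' K' suffix drop out.
    return int("".join(ch for ch in field if ch.isdigit()))


def processes_over_threshold(tasklist_csv, image_name, threshold_mb=THRESHOLD_MB):
    """Return a list of (pid, mem_mb) for matching processes above the threshold."""
    hits = []
    for row in csv.reader(io.StringIO(tasklist_csv)):
        if len(row) < 5:
            continue  # skips blank lines and the "INFO: No tasks..." message
        name, pid, _session, _session_num, mem = row[:5]
        if name.lower() != image_name.lower():
            continue
        mem_mb = parse_mem_kb(mem) / 1024
        if mem_mb > threshold_mb:
            hits.append((int(pid), round(mem_mb)))
    return hits


def read_tasklist(image_name):
    """Run tasklist (Windows only) and return its CSV output without a header."""
    result = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH", "/FI", f"IMAGENAME eq {image_name}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Hypothetical usage on a delivery controller (image name is an assumption):
# for pid, mb in processes_over_threshold(read_tasklist("Citrix.Monitor.exe"),
#                                         "Citrix.Monitor.exe"):
#     print(f"PID {pid} is using {mb} MB")
```

Run from a scheduled task, something like this could send an e-mail or restart the service, though a proper monitoring tool also gives you the history that a one-shot check does not.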
The memory utilization shot from about 700 MB to nearly 3 GB in just a few moments. I received an e-mail alert from Operations Manager, so I restarted the service, trying to stay ahead of the problem. However, as you can see on the chart, the problem occurred again a short time later, and I restarted the service again. This happened two more times that night before I decided to throw in the towel and simply let the service use 3 GB of memory while I waited to apply the private fix I obtained from Citrix. If you are experiencing this problem, you should contact Citrix to obtain the private fix. I’ve been told this issue can occur with XenDesktop 7 as well (it may well have occurred for me before; I simply was not monitoring for it until we had broker issues).
I’m pleased to report that after installing the private hotfix the issue has not recurred. We were lucky: the memory utilization was “only” about 3-4 GB, and our DCs have very large RAM assignments because we previously saw a similar leak in XD 7 with the broker service while using Hyper-V/SCVMM (a problem we haven’t been able to duplicate since switching to VMware, oddly enough). The support agent said that other users reported that XML services failed to respond when this issue occurred, which may have been due to massive disk thrashing if the server didn’t have enough RAM to handle such a drastic increase.
After applying the patch last Tuesday, I’m not entirely convinced the leak is gone, but it is certainly better in that the growth no longer happens all at once. Notice the steady increase over the past few days since the patch was applied on 4/23.
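To put a number on "steady increase" versus "all at once", you could export the memory samples from your monitoring tool and fit a straight line through them. This is a rough sketch under my own assumptions: samples are `(hours, megabytes)` pairs, the function names are invented for illustration, and a linear fit only makes sense while the growth actually looks linear.

```python
def growth_rate(samples):
    """Least-squares slope of memory (MB) against time (hours), in MB per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t


def largest_jump(samples):
    """Biggest increase in MB between two consecutive samples."""
    return max(m2 - m1 for (_, m1), (_, m2) in zip(samples, samples[1:]))


def hours_until(samples, limit_mb):
    """Hours after the last sample until limit_mb is reached at the fitted rate.

    Extrapolates from the last observed value; returns None if memory is not growing.
    """
    rate = growth_rate(samples)
    if rate <= 0:
        return None
    return max(0.0, (limit_mb - samples[-1][1]) / rate)
```

A large `largest_jump` with a modest overall slope points to the sudden pre-patch behavior, while a small jump with a consistently positive slope looks more like the slow post-patch growth. `hours_until` gives a rough idea of how long you have before the service reaches a level you care about.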