BizTalk Server 2013, Performance

Can I have 100 BizTalk Server Host Instances?

On occasion you see these questions. Can I run X number of host instances? What will happen? Without diving deep into the reason why you think you need to, or the details of what is happening inside BizTalk Server when you do, I will present some results of doing that.

First, I needed a machine to play around with. I also wanted a reasonably powerful machine, so where better to go than Windows Azure? I selected a BizTalk Server 2013 Evaluation Edition pre-configured image.

image

I chose the Extra Large size: 8 cores and 14 GB of RAM.

image

When that machine was provisioned for me, this was how the performance of it looked:

image

Now, I am fully aware that Task Manager’s view of the processor is a very limited view of “performance”, but I am purposefully using it so that you, the reader, understand that this is NOT intended to be a deep dive. It is merely an indication.

So, I configured a script I have to create 100 hosts and host instances for me, along with handlers for the FILE adapter for those hosts. So far there are no ports and no traffic. This is how that looked.

image

You can see the Processor churning away at about 20% utilization, while Memory is largely unaffected.

Taking it one step further, I wanted to make sure that the host instances actually had something to check for, so I created 1 receive port and 100 send ports: one send port for each of the host instances (each send port uses that host’s handler for the FILE transport).

image

That put the machine under a little more pressure. Obviously it is doing something more when it actually has ports configured. Processor is at ~60%. Memory again not really affected.

Remember now, this is all just doing nothing at this point. Let’s see what happens if we actually do something:

image

That piece of the graph marked in blue above is when I dropped a file into the receive location of the receive port that all 100 send ports (one for each of the hosts) were subscribed to. It went to 100% quickly. All files went through quickly though, and the event didn’t last long enough for BizTalk to start throttling.

So what about the memory? Is that really not affected? How much does a BizTalk host instance use, and won’t 100 of them make an impact? Well, it turns out that each host instance will, at this particular point, only use about 20 MB, so 100 of them add up to roughly 2 GB of the machine’s 14 GB.

image

But OK, 100 host instances is a lot. What about 50? Still the same config as above, but only 50 host instances are started.

image

A bit jagged, but still, running only 50 takes Processor utilization down from 60% to 20%. Now what if we send something through?

image

A short spike when sending the document through to its 50 subscribers.

Taking a little bit of a deeper look at that spike we can see that SQL Server is the main contributor to that spike.

image

Hmmm. OK. “But,” says the customer, “I want all of this to be low latency. 50 ms polling. Do it!”

image

CPU goes up from ~20% to a little under 40%. But it also changes characteristics. Where it was jagged before, it now becomes more or less a straight line. The processor does not get to rest, and it does not get spikes in the same way.

Unless of course you send a message through, in which case it does again spike.

image

But it’s just a very short spike. It is far too short to say, based on this simple test, whether this would or would not be a bottleneck. I have not done any extended tests to see what the maximum sustainable throughput (MST) would be for this machine.

What if we raise the number of host instances just slightly? To 75.

image

See that marked point in time above? That’s where I enabled the host instances. Proc goes to about 60%. 100 again you say? Let’s try it…

image

Again a visible increase in the power needed. The processor is now at about 70% and pretty flat.

image

Sending a message through again spikes it.

image

And SQL is the thing that is grabbing most of that processing power.

image

So there you go. That’s what would happen if you run 100 hosts and 100 host instances on a single machine, and if you put them all to poll the database at 50 ms.
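
As a rough back-of-the-envelope sketch of why the polling interval matters, the snippet below estimates how many polls per second hit the database. It assumes, as a simplification, that every host instance issues exactly one poll per interval; what BizTalk actually does per interval is not covered here, and the function name and the 500 ms comparison value are only illustrative.

```python
def polls_per_second(host_instances, polling_interval_ms, polls_per_interval=1):
    """Estimate database polls per second for a number of host instances.

    Assumption (not measured): each host instance issues
    `polls_per_interval` polls every `polling_interval_ms` milliseconds.
    """
    if polling_interval_ms <= 0:
        raise ValueError("polling_interval_ms must be positive")
    return host_instances * polls_per_interval * 1000.0 / polling_interval_ms


# Compare the host instance counts used in this post at two intervals.
for instances in (50, 75, 100):
    for interval_ms in (500, 50):
        rate = polls_per_second(instances, interval_ms)
        print(f"{instances} host instances at {interval_ms} ms: {rate:.0f} polls/s")
```

Under that assumption, going from 50 to 100 host instances doubles the polling rate, while going from 500 ms to 50 ms multiplies it by ten, which fits the flat, never-resting CPU line seen above.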

This was done on a Windows Azure Virtual Machine with the BizTalk Server Evaluation image in its default configuration. I did nothing to it. No updates, no tuning, no alterations. I know for certain that I can improve the performance of what we see above.

You can draw your own conclusion from the above. My own conclusion is “inconclusive” ;). That is: I can see that running 100 host instances with 50 ms polling, on an unoptimized machine that BizTalk and SQL Server share, does not bring down the machine by the sheer volume of polling alone. However, when running even simple traffic through, we hit the roof. If this load were placed on a distributed environment, with SQL Server and BizTalk on separate machines, SQL Server with a more optimized storage architecture, and BizTalk with other configuration changes such as Global Tracking disabled, I should think that the scenario is doable.

I would however highly question why you think you need 100 hosts and 100 host instances. There is a lot of functionality in BizTalk, for example SSO Affiliate Applications, that solves some of the reasons why you would think you need that many. My recommendation is certainly not to go there unless absolutely necessary.

HTH,
/Johan

BizTalk, Performance

The rogue agent that brought BizTalk to its knees

To help others that might find themselves in a similar situation I am posting this odd experience we had with a BizTalk environment during the fall of 2011.

We had a pretty standard setup with good hardware to back it up all the way, set up according to best practices. We were using the BizTalk Benchmark Wizard (BBW) to benchmark our environment and were coming up short at around 70 msg/s.

We should have had values that were around 900 msg/s. Overall, from scrutinizing the performance logs using Performance Analysis of Logs (PAL) as well as our own best judgement, we at first couldn’t find anything alarming. Processor, Memory, disk, network etc. All good. We also ran things like the BizTalk Best Practice Analyzer (BizTalk BPA), the MessageBoxViewer tool (MBV), the Monitor BizTalk Server SQL Server Agent job, but it all came back looking good. The environment just seemed… slow.

In hindsight, the processor numbers were the most interesting part. Average utilization across the processors (two per server, each with 6 cores) was very low, but one process was consuming the equivalent of one full core (its Process % Processor Time was at 100). Since it did not stay on one core, the problem was hard to spot. PAL doesn’t have an alert for this, and finding that one process and performance counter among all of them is not easy.
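
To illustrate why a process like this hides so well in machine-wide averages, here is a minimal sketch that flags processes whose average Process % Processor Time is at or near 100 (one full core) and shows how small that looks as a share of the whole machine. The readings are invented for illustration, the function name is hypothetical, and the 12 logical cores assume two 6-core processors without hyper-threading.

```python
from statistics import mean


def find_core_hogs(samples, logical_cores, threshold_pct=90.0):
    """Return processes whose average '% Processor Time' is about one core or more.

    samples: dict mapping process name -> list of Process '% Processor Time'
             readings, where 100 means one fully busy core.
    Returns a dict: name -> (average reading, share of the whole machine in %).
    """
    if logical_cores <= 0:
        raise ValueError("logical_cores must be positive")
    hogs = {}
    for name, readings in samples.items():
        if not readings:
            continue
        avg = mean(readings)
        if avg >= threshold_pct:
            hogs[name] = (avg, avg / logical_cores)
    return hogs


# Invented readings for illustration only; not the measured data.
samples = {
    "cqmgserv": [100, 99, 101, 100],
    "BTSNTSvc": [12, 30, 8, 20],
    "sqlservr": [40, 55, 35, 50],
}

for name, (avg, machine_share) in find_core_hogs(samples, logical_cores=12).items():
    print(f"{name}: {avg:.0f}% of one core, {machine_share:.1f}% of the machine")
```

A process pinned at one core’s worth of work accounts for only about 8% of a 12-core machine, so a total processor graph stays comfortably low while that single process is effectively maxed out.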

The process was the “HP Insight Server Agents” (cqmgserv.exe). The theory goes that as it was failing, recovering and retrying it was pumping the machine full of events and clogging up the underlying bus.

The closest we got to a match in the form of a support document from HP was this. Once the service was disabled the tests ran as expected at around 1000 msg/s. Later the service was updated to a newer version and started again without causing the same issues.

The purpose of this post is not to lay the blame at HP’s door but instead to make readers aware that similar situations can occur, and to highlight the value of a tool like BBW. Without it, this anomaly would likely never have been caught, and this server would have gone into production delivering much less value on the investment than it should.

HTH
/Johan

BizTalk, PAL, Performance

PAL not working with restricted account

Just thought I’d post this since I couldn’t find a concise hit myself when searching for it. I’m using the PAL (Performance Analysis of Logs) tool in a restricted environment where my default user isn’t an administrator. In certain scenarios this will cause PAL to fail.

The definition of certain in my case is when the LogParser CSVInMaxRowSize registry value is too low, causing PAL to want to write a new value, on line 173:

WshShell.RegWrite "HKLM\SOFTWARE\Microsoft\Log Parser\CSVInMaxRowSize", iRowSize, "REG_DWORD"

The error message that PAL will display in its Command window is:

C:\Program Files\PAL\PAL v1.3.3\PAL.vbs(173, 9) WshShell.RegWrite: Invalid root in registry key "HKLM\SOFTWARE\Microsoft\Log Parser\CSVInMaxRowSize".

I “solved” this by running PAL (the call to CScript, that is) as an administrative user, since I had that option. I guess smaller log files than mine (with fewer counters collected) would not get this error, nor would situations where the LogParser maximum row size is already set to a large enough value.

BAM, BizTalk, Monitoring, Performance, Readings

BAM Tracking and Failed Messages, and a new issue of BizTalk HotRod

Mikael Håkansson has a post called How to Replace Tracking with BAM in BizTalk that features a performance comparison, in hard figures, between running BizTalk Server 2006 with global tracking disabled and replacing that tracking with BAM. He also posts a sample solution and talks about concepts such as activities and tracking profiles. The post mentions that the approach is meant for tracking successful messages, and he suggests (as an example) the use of a WMI service to catch suspended messages.

A concept that he leaves out is the tracking of failed messages using BAM and failed message routing. By coincidence, at roughly the same time his post was published, a white paper was released on MSDN that describes the process of creating an activity, tying it to a tracking profile by connecting it to relevant context properties, and deploying it for a failed message routing scenario; see How to Track Failed Messages in BAM. It’s a very basic step-by-step article.

Now…I am not taking a stance to say that failed message routing is the way to go. There are many considerations to take into account before opting for that or for allowing messages to get suspended. I just wanted to post this to tie these two articles together, since I think they are both good reads and give you a view of what is required to replace tracking in BizTalk with BAM, for both successful and failed messages, and of what the performance benefits might be for your solution.

Also check out the new issue of BizTalk HotRod that (among other things) also discusses Failed Message Routing and how to log these messages, but does so in the context of the ESB Guidance.

BizTalk, Functoids, Performance

Caching database functoid and throttling

In case you haven’t seen the Blogical.Shared.Functoids.ExecuteQuery, and the feature post at Mikael Håkansson’s blog, it’s a cache-enabled database lookup functoid, a revised version of a functoid that we have successfully used on previous projects, available for download on CodePlex. Caching can greatly increase the performance of any solution. However, BizTalk caching, as with all caching, has to be done wisely and in moderation. So be sure to test (and monitor) thoroughly for memory saturation so that you don’t hit throttling limits, effectively bringing your BizTalk solution to a standstill. A good read that exemplifies the latter is this recent post by Yossi Dahan. Keep it in mind when using the functoid.
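
To make “wisely and in moderation” a little more concrete, here is a minimal sketch of a lookup cache that is bounded both in size and in entry lifetime, so that it cannot grow without limit. This is not how the Blogical functoid is implemented; the class name, the parameters and the fake loader are all hypothetical and only illustrate the general idea.

```python
import time
from collections import OrderedDict


class BoundedTTLCache:
    """Lookup cache with a maximum entry count and a per-entry time to live."""

    def __init__(self, max_entries, ttl_seconds, clock=time.monotonic):
        if max_entries <= 0:
            raise ValueError("max_entries must be positive")
        self._max_entries = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = OrderedDict()  # key -> (value, expires_at)

    def get_or_load(self, key, loader):
        """Return the cached value for key, calling loader(key) on a miss."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None:
            value, expires_at = entry
            if now < expires_at:
                self._entries.move_to_end(key)  # mark as recently used
                return value
            del self._entries[key]  # expired, load again below
        value = loader(key)
        self._entries[key] = (value, now + self._ttl)
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict least recently used
        return value

    def __len__(self):
        return len(self._entries)


# Hypothetical stand-in for a database lookup.
def fake_lookup(customer_id):
    return f"name-for-{customer_id}"


cache = BoundedTTLCache(max_entries=1000, ttl_seconds=300)
print(cache.get_or_load("42", fake_lookup))  # miss: calls fake_lookup
print(cache.get_or_load("42", fake_lookup))  # hit: served from the cache
```

The size limit caps the memory the cache can take, and the time to live keeps stale lookups from living forever. Both limits are trade-offs that would need testing and monitoring against the throttling thresholds of the actual environment.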