I had the pleasure of delivering a short talk at the Business Process Integration & Workflow Conference last week in Redmond. The whole conference was great, especially meeting quite a few folks in person I'd only conversed with via email. Being notified of MVP status for BizTalk on Friday was a great cap to the week!
Although the sample I presented during my talk isn't quite ready to release, the slides (on scatter/gather scenarios in BizTalk) can be downloaded here.
I worked through a problem recently with a client that really took me by surprise - because I would think that many BizTalk shops would be running into this issue regularly. So! Here goes with an explanation and a solution.
We ran into this problem initially using the MSMQ (not MSMQT) adapter with BizTalk 2004. We had roughly 10 MSMQ receive locations, as well as a few send ports that were using the loopback adapter. These were all executing in the same host.
The initial symptom was that the loopback adapter appeared to not work - messages were just not getting through! They sat in the "delivered, not consumed" state for no good reason. But we quickly reproduced the problem with just MSMQ receive locations (i.e. without the loopback adapter.)
On a single processor virtual machine, the repro looked like this: Create four MSMQ receive locations, and one MSMQ send port (with the send port subscribed to one of the receive ports, just to keep things easy.) No messages will flow through the send port at all.
To repro: This download has a binding file with receive ports/locations for local (non-tx) private queues Q1-Q4, plus a send port for local private queue NONTXQ with a filter for the first receive port. There is also a bit of VB script to put a message into a queue...If you turn off one of the receive locations for Q2-Q4, you'll find things work just fine. If you don't, then (reiterating) no messages will flow through the send port.
What was the resolution? Well, with BizTalk 2004 Service Pack 1 installed, you can create a "CLR Hosting" key under the registry service definition for the BizTalk host instance, and set .NET thread pool values such as MinWorkerThreads and MaxWorkerThreads beneath it.
In our case, we actually had to increase these values - you should determine the values you need through testing. Consider having min worker threads equal to 7x the number of MSMQ receive locations, and max worker threads equal to 10x the number of MSMQ receive locations. (More on these numbers later...)
Does the documentation address this? Good question. If you look at the topic "Managing Multiple Receive Locations" in the MSMQ adapter documentation, you will find some reference to this. It indicates you should create a "CLR Hosting" key as described above...but no actual values are mentioned (clearly just a documentation mishap.)
But why do these have to be tweaked at all? Good question. The documentation for the MSMQ adapter has some unfortunate quotables, like:
To increase performance, Microsoft BizTalk® 2004 Adapter for MSMQ is multi-threaded. If you have many receive locations, there may not be enough threads available for all the receive locations. This prevents some of the receive locations from picking up messages.
The reality is that you really shouldn't have to starve any particular receive location because of a lack of threads...you should just wind up with increased latency. But, such is not the implementation of the MSMQ adapter (at least for BizTalk 2004.)
Some background: The MSMQ adapter has a "Batch Size" parameter and a "Serial Processing" parameter that can be set per receive location. "Batch Size" determines how many messages the adapter will attempt to read from the queue (and submit to the message box) on each iteration. "Serial Processing" determines whether one thread is engaged in the peek/get/submit activity per receive location (Serial Processing = 'true') or multiple threads (Serial Processing = 'false'). If "Serial Processing" is true, the "Batch Size" is forced to one regardless of the actual setting.
So what is the execution flow for a given receive location? The internal class MsmqReceiverEndpoint is instantiated per receive location, and when it initializes, it calls ThreadPool.QueueUserWorkItem with a reference to itself. If "Serial Processing" is false...it does this exactly seven (7) times.
What does it do with the QueueUserWorkItem callback? Well, when MsmqReceiverEndpoint.ProcessWorkItem is called, it enters into a do/while loop that doesn't exit until the endpoint (receive location) becomes invalid (i.e. the receive location is shut down.) In other words, ProcessWorkItem sits on a .NET thread pool thread - and if Serial Processing is false, it sits on seven of them. The do/while loop executes a peek on the queue (with a hard-coded 10 second timeout), and if there are messages waiting, it receives up to "Batch Size" and submits them to the message box. (It will give up attempting to receive a "Batch Size" worth of messages if the 10 second timeout is reached on any attempt within the batch receive loop - i.e. if you drop a single message on a queue, and the batch size is greater than one, expect to wait 10 seconds before further activity begins...) The behavior of consuming seven threads per queue leads to the recommendation of MinWorkerThreads = 7x MSMQ receive locations provided above.
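To make the starvation concrete, here is a minimal sketch in Python (not the adapter's code, which is .NET). It uses concurrent.futures.ThreadPoolExecutor as a stand-in for the .NET thread pool, and the names run_starvation_demo and looping_work_item are made up for illustration. The pool size of 25 assumes the .NET 1.1 default of 25 worker threads per processor; that figure is an assumption here, not something stated above. Under that assumption, four receive locations at seven threads each (28) exceed the pool, while three (21) leave room for other work.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_starvation_demo(pool_size, looping_items, wait_seconds=0.5):
    """Return True if a short work item managed to run while the loops were active."""
    stop = threading.Event()            # stands in for "receive location shut down"
    short_item_ran = threading.Event()  # set by the short work item if it ever runs

    def looping_work_item():
        # Mimics a work item that never returns its thread to the pool:
        # it keeps "peeking" until the endpoint is stopped.
        while not stop.wait(timeout=0.01):
            pass

    pool = ThreadPoolExecutor(max_workers=pool_size)
    try:
        for _ in range(looping_items):
            pool.submit(looping_work_item)
        # Any other adapter's work item is queued behind the loops.
        pool.submit(short_item_ran.set)
        ran = short_item_ran.wait(timeout=wait_seconds)
    finally:
        stop.set()                # let the loops exit so the pool can shut down
        pool.shutdown(wait=True)
    return ran


if __name__ == "__main__":
    print("4 locations x 7 threads:", run_starvation_demo(25, 4 * 7))
    print("3 locations x 7 threads:", run_starvation_demo(25, 3 * 7))
```

The short work item is the interesting part: it is perfectly well behaved, yet it never gets a thread once the looping items have claimed the whole pool.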
Now, I confess - I'm not a BizTalk adapter expert. But, this design seems to be in conflict with the advice offered in "Writing Effective BizTalk Server Adapters", where it says:
Don't starve the .NET thread pool: ...While starving the .NET thread pool is a risk to all asynchronous programming in .NET, it is particularly important for the BizTalk Server adapter programmer to watch out for this. It has impacted many BizTalk Server adapters: take great care not to starve the .NET thread pool. The .NET thread pool is a limited but widely shared resource. It is very easy to write code that uses one of its threads and holds onto it for ages and in so doing blocks other work items from ever being executed....If you have multiple pieces of work to do (for example copying messages out of MQSeries into BizTalk Server), you should execute one work item (one batch of messages into BizTalk Server) and simply requeue in the thread pool if there is more work to do. What ever you do, don't sit in a while loop on the thread.
Is this fixed in BizTalk 2006? Surely it is... And, in fact, it sure seems to be in Beta 1. The design of the adapter is a bit different...First, "Serial Processing" refers to whether additional messages will be received from the queue prior to the "EndBatchComplete" event being set (downstream of IBTDTCCommitConfirm.Done.) (This part of "Serial Processing" is true for BizTalk 2004 as well, along with forcing the batch size to one.) "Serial Processing" in BizTalk 2006 does not affect how many threads will be reading from your queue - you will have just one (despite what the Beta 1 docs say...), unless you have multiple host instances in play. (That one thread using a large batch size and operating with serial processing set to 'false' - not blocking on the actual message box submission - should keep up with a fairly large message arrival rate, but multiple host instances might be needed for your particular case.)
More importantly, the ProcessWorkItem implementation returns immediately after a single peek/get/submit operation (and simply calls QueueUserWorkItem again, per the advice cited above.) (Side note: There seems to be some room in the design for the idea that you in fact wouldn't return immediately if more than a threshold number of messages were received, but currently this condition is "if # of messages received > BatchSize", which won't ever happen.)
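The "do one batch, then requeue" shape described above can be sketched as follows. This is an illustration in Python, not the BizTalk 2006 adapter's code: queue.Queue stands in for an MSMQ queue, ThreadPoolExecutor for the .NET thread pool, and drain_with_requeue and make_queue are hypothetical names. For the sake of a demo that terminates, a work item stops requeuing once its queue is empty, whereas a real receive location would keep polling until it is disabled.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def make_queue(name, count):
    """Build a demo queue holding `count` messages named after the queue."""
    q = queue.Queue()
    for i in range(count):
        q.put(f"{name}-msg{i}")
    return q


def drain_with_requeue(queues, batch_size, pool_size):
    """Drain every queue using work items that handle one batch and then requeue."""
    received = {name: [] for name in queues}
    done = {name: threading.Event() for name in queues}
    pool = ThreadPoolExecutor(max_workers=pool_size)

    def process_work_item(name):
        q = queues[name]
        # One batch only: read up to batch_size messages without blocking.
        for _ in range(batch_size):
            try:
                received[name].append(q.get_nowait())
            except queue.Empty:
                break
        if q.empty():
            done[name].set()                      # demo only: stop when drained
        else:
            pool.submit(process_work_item, name)  # give the thread back, requeue

    for name in queues:
        pool.submit(process_work_item, name)
    for event in done.values():
        event.wait(timeout=5)
    pool.shutdown(wait=True)
    return received


if __name__ == "__main__":
    demo = {name: make_queue(name, 5) for name in ("Q1", "Q2", "Q3", "Q4")}
    print(drain_with_requeue(demo, batch_size=2, pool_size=1))
```

Note that four queues are serviced here by a pool with a single thread. Because no work item holds onto its thread, the cost of having more endpoints than threads is latency rather than starvation.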
So what should I do for now with BizTalk 2004? For those using the MSMQ Adapter with BizTalk 2004...consider whether you can set "Serial Processing" equal to true. Keep in mind this forces you to a batch size of 1, so this might not work depending on your message arrival rate. If you test this configuration and find an unacceptable performance loss, consider setting the MinWorkerThreads value to 7x the number of MSMQ receive locations you are maintaining, and MaxWorkerThreads to roughly 10x (to provide breathing room.) As an alternative, spread your receive locations among multiple hosts (though avoid an over-proliferation of hosts - that has its own issues.)
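As a worked example of the arithmetic, the helper below turns a count of MSMQ receive locations into the 7x / 10x starting values suggested above. It is only a sketch: suggest_worker_threads is a hypothetical name, the multipliers are the rules of thumb from this post rather than documented constants, and the results are starting points for load testing, not final settings.

```python
MIN_THREADS_PER_LOCATION = 7   # rule of thumb from the post (seven pool threads per location)
MAX_THREADS_PER_LOCATION = 10  # rule of thumb from the post (breathing room)


def suggest_worker_threads(msmq_receive_locations):
    """Return starting values for the two worker-thread settings."""
    if msmq_receive_locations < 1:
        raise ValueError("expected at least one MSMQ receive location")
    return {
        "MinWorkerThreads": MIN_THREADS_PER_LOCATION * msmq_receive_locations,
        "MaxWorkerThreads": MAX_THREADS_PER_LOCATION * msmq_receive_locations,
    }


if __name__ == "__main__":
    # Roughly the situation described earlier: about 10 MSMQ receive locations in one host.
    print(suggest_worker_threads(10))
```

For ten receive locations this yields 70 and 100.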
And never draw any conclusions until you have performance tested at load with your final host configuration - that is, your final allocation of send handlers, receive handlers, send ports, and receive locations among your hosts! Other adapters may affect the outcome if they involve polling on the receive side, or polling on the "response" side of a solicit-response send port. (If they use a thread pool thread to do their work, they can be affected by any adapter that consumes threads whether they themselves are written correctly or not!) Finally, I've heard from a gentleman who has done extensive testing that the threading parameters above are useful/necessary when using large numbers of MSMQT receive locations as well.
Never a dull day in BizTalk land...!
I’ve been involved a bit in getting a local BizTalk user group started – and I'm really looking forward to seeing it get off the ground! The first time we’ll meet is Thursday, September 22nd from 6:00pm – 7:30pm, at the Microsoft offices in Bloomington.
You can expect the focus to be both on present development and operational issues for BizTalk 2004, as well as early hands-on time with BizTalk 2006. Food, beverages, fellow biztalkers…gotta love it.
Unfortunately, I won't be able to make it to the PDC this year...However, a good colleague by the name of Jordan Terrell has put together this great PST file for the PDC with all the sessions as calendar appointments. (Open as a data file in Outlook, and in the Calendar view, you will be able to select a calendar corresponding to each of the conference tracks.)
In my last entry, I discussed some ways to make working with binding files a bit easier. Here is another post in that same vein that addresses a common pain point...
Un-escaping TransportTypeData
One of the annoying things about binding files is that adapters only have a string element available to store adapter-specific information for send ports and receive locations. As a result, adapters will store escaped XML (or even "doubly escaped" xml...) This can be extremely hard to manage, especially for adapters such as MQSeries that keep quite a bit of information in this form.
To solve this problem, I introduced a new command-line tool in the most recent version of the Deployment Framework called "ElementTunnel.exe" (the source for which is in the Tools download.) This utility will take in an xml file, along with a file containing xpaths to elements that should be "encoded" or "decoded". The end result is that you can choose to manage a "master" binding file (not directly useable) and run ElementTunnel on it immediately prior to deployment. (You may also run XmlPreProcess on the same file for macro expansion! The sample in the deployment framework shows both occurring - XmlPreProcess should occur first!)
So what does this mean? Taking a single Send Port snippet as an example, it means that, in the case of MQSeries, instead of storing and maintaining this mess:
You can store and maintain this:
Ahhh, isn't that better? Of course, similar goodness for all other adapters. And, in the clean version, you'll find it easier to place/maintain XmlPreProcess macros.
In the Deployment Framework sample, you'll see that we pass the following xpaths to ElementTunnel (along with the "master" binding file itself):
/BindingInfo/ReceivePortCollection/ReceivePort/ReceiveLocations/ReceiveLocation/ReceiveLocationTransportTypeData/CustomProps/AdapterConfig
/BindingInfo/ReceivePortCollection/ReceivePort/ReceiveLocations/ReceiveLocation/ReceiveLocationTransportTypeData
/BindingInfo/SendPortCollection/SendPort/*/TransportTypeData/CustomProps/AdapterConfig
/BindingInfo/SendPortCollection/SendPort/*/TransportTypeData
/BindingInfo/ReceivePortCollection/ReceivePort/ReceiveLocations/ReceiveLocation/ReceiveLocationTransportTypeData
/BindingInfo/ReceivePortCollection/ReceivePort/ReceiveLocations/ReceiveLocation/ReceiveLocationTransportTypeData/CustomProps/AdapterConfig
/BindingInfo/SendPortCollection/SendPort/*/TransportTypeData
/BindingInfo/SendPortCollection/SendPort/*/TransportTypeData/CustomProps/AdapterConfig
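To illustrate the idea of "tunneling" XML through a string element, here is a small Python sketch built on the standard xml.etree.ElementTree module. It is not the source of ElementTunnel.exe and does not claim to match its behavior in detail; encode_elements and decode_elements are hypothetical names, the sample document is invented, and the paths are written relative to the root element because ElementTree does not accept absolute paths. In this sketch the order matters: encoding works from the innermost element outward, and decoding works from the outermost element inward.

```python
import xml.etree.ElementTree as ET

# Paths relative to <BindingInfo>, modeled on the send port xpaths listed above.
INNER = "./SendPortCollection/SendPort/*/TransportTypeData/CustomProps/AdapterConfig"
OUTER = "./SendPortCollection/SendPort/*/TransportTypeData"


def encode_elements(root, path):
    """Replace the child elements of each match with their serialized XML as text."""
    for elem in root.findall(path):
        children = list(elem)
        inner_xml = "".join(ET.tostring(c, encoding="unicode") for c in children)
        for child in children:
            elem.remove(child)
        elem.text = inner_xml  # escaped automatically when the document is written


def decode_elements(root, path):
    """Parse the text of each match as XML and attach it as real child elements."""
    for elem in root.findall(path):
        if not (elem.text and elem.text.strip()):
            continue
        wrapper = ET.fromstring("<wrapper>" + elem.text + "</wrapper>")
        elem.text = None
        elem.extend(list(wrapper))


# An invented, readable "master" fragment with no escaping at all.
MASTER = (
    '<BindingInfo><SendPortCollection><SendPort Name="DemoPort"><PrimaryTransport>'
    "<TransportTypeData><CustomProps><AdapterConfig><Config><queue>Q1</queue></Config>"
    "</AdapterConfig></CustomProps></TransportTypeData>"
    "</PrimaryTransport></SendPort></SendPortCollection></BindingInfo>"
)

if __name__ == "__main__":
    root = ET.fromstring(MASTER)
    encode_elements(root, INNER)   # innermost first
    encode_elements(root, OUTER)
    print(ET.tostring(root, encoding="unicode"))  # the doubly escaped form
```

Running the two encode steps produces the doubly escaped form that is so painful to edit by hand; running the decode steps in the opposite order recovers the readable master.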
With BizTalk 2004, it can be quite helpful to eventually maintain binding files as "source code". After a solution has reached a certain point of stability (where port definitions are not changing often), many projects will use the Deployment Wizard to do one last export of the binding information -- and then maintain it by hand for any future changes (storing it in version control along with the rest of the solution.)
There are some interesting benefits that come along with this. One such benefit is the ability to use the XmlPreProcess tool to merge environment-specific elements into the binding file (like URIs, retry counts, etc.), using the SettingsFileGenerator.xls spreadsheet to assist -- as has been discussed on this blog before. Even if you are not using the Deployment Framework (which uses XmlPreProcess extensively), you should consider using XmlPreProcess as a standalone tool. The ability to easily maintain a matrix of logical names (for physical endpoints, etc.) versus "environment names" (development, QA, production, etc.) is a huge help. See the example spreadsheet below. (The Deployment Framework also shows how to extend use of this same spreadsheet to manage run-time configuration settings that are stored in the BizTalk SSO.)
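The matrix of logical names versus environments can be pictured as a simple lookup with a default column. The Python sketch below is only a model of the idea, not of the spreadsheet's file format or of XmlPreProcess itself: the setting names other than LogInboundPODocsToFile, all of the values, and the function name settings_for are invented for illustration, and falling back to the default column when an environment has no value of its own is an assumption.

```python
# Each logical name maps to its per-environment values; "default" applies
# wherever an environment does not override it. All values are made up.
SETTINGS_MATRIX = {
    "POInboundUri": {
        "default": r"C:\Dev\POInbound\*.xml",
        "production": r"\\prodshare\POInbound\*.xml",
    },
    "SendRetryCount": {"default": "3", "production": "10"},
    "LogInboundPODocsToFile": {"default": "true", "production": "false"},
}


def settings_for(environment, matrix=SETTINGS_MATRIX):
    """Flatten the matrix into name -> value for a single environment."""
    return {
        name: columns.get(environment, columns["default"])
        for name, columns in matrix.items()
    }


if __name__ == "__main__":
    for env in ("development", "QA", "production"):
        print(env, settings_for(env))
```

Adding another environment is then just a matter of adding another key to whichever rows need a different value there.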
Optional Deployment of Port Definitions
On to a bit more advanced topic: If you have a set of port definitions that you want to conditionally deploy into a given environment, you can define a true/false value within the spreadsheet and use simple "ifdef" logic in your binding file around the port definition. For instance, you might want a particular File Send Port or Receive Location to only be active in your development and test environments. To do this, define a name such as "LogInboundPODocsToFile", and set the default value to "true" - and set it to "false" in the "production" column. Mark up your binding file accordingly. See the example spreadsheet and binding file snippet below. (When XmlPreProcess is run on this binding file, the port definition will only be included for environments where the LogInboundPODocsToFile value is true.)
<!-- ifdef ${LogInboundPODocsToFile} -->
<SendPort Name="LogSalesOrderResponse_FILE" IsStatic="true" IsTwoWay="false">
<TransmitPipeline Name="SendWithDefaultNamespaceFormat" FullyQualifiedName="SendWithDefaultNamespaceFormat, XYZCo.BizTalk.Pipelines, Version=1.0.0.0, Culture=neutral, PublicKeyToken=343bd7a15fff8d6e"
Type="2" />
<PrimaryTransport>
<Address>C:\Dev\FileLog\%MessageID%.xml</Address>
<TransportType Name="FILE" Capabilities="11" ConfigurationClsid="5e49e3a6-b4fc-4077-b44c-22f34a242fdb" />
...
</SendPort>
<!-- endif -->
Why would you want to conditionally deploy ports? Like many, I have found it useful to have an additional file-based Receive Location (associated with a Receive Port bound to an orchestration) to kick off orchestrations during development - even if the actual transport used in production will be something different. In addition, binding an orchestration to a Send Port Group allows you to have an additional file-based Send Port that will create an easy log of outbound traffic. Finally, you might create a file-based Send Port that acts as an "additional" subscriber (by Receive Port Name) to your inbound messages for an easy log of inbound traffic. (And, combined with a file-based receive port, these two mechanisms give you an easy re-processing mechanism - just drag/drop in Explorer.) But you might want all of this machinery shut off in production, hence the technique we just discussed.
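For readers who want to see the conditional logic spelled out, here is a rough Python sketch of what the ifdef/endif markup amounts to. It is not XmlPreProcess and makes no claim about how that tool is implemented: apply_ifdefs is a hypothetical name, the regular expression only handles the simple, non-nested comment form shown in the snippet above, and treating anything other than the value "true" as "leave it out" is an assumption based on the description in this post.

```python
import re

# Matches <!-- ifdef ${Name} --> ... <!-- endif --> (non-nested, illustration only).
IFDEF_BLOCK = re.compile(
    r"<!--\s*ifdef\s+\$\{(\w+)\}\s*-->(.*?)<!--\s*endif\s*-->",
    re.DOTALL,
)


def apply_ifdefs(binding_text, settings):
    """Keep the body of each ifdef block only when its setting is 'true'."""

    def keep_or_drop(match):
        name, body = match.group(1), match.group(2)
        return body if settings.get(name, "").lower() == "true" else ""

    return IFDEF_BLOCK.sub(keep_or_drop, binding_text)


SAMPLE = (
    "<SendPortCollection>"
    "<!-- ifdef ${LogInboundPODocsToFile} -->"
    '<SendPort Name="LogSalesOrderResponse_FILE" />'
    "<!-- endif -->"
    "</SendPortCollection>"
)

if __name__ == "__main__":
    print(apply_ifdefs(SAMPLE, {"LogInboundPODocsToFile": "true"}))
    print(apply_ifdefs(SAMPLE, {"LogInboundPODocsToFile": "false"}))
```

With the value set to "true" the send port survives; with "false" (the production column in the example) the collection comes out empty.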
Macro Recursion
Another feature of XmlPreProcess is "macro" recursion. This means you can define a macro (logical name) such as QueueServer (with a different value for development, QA, production, etc.) and then define additional values in the spreadsheet that build on it, such as: POAckQueue = ${QueueServer}\private$\POAckQueue. This indirection can make maintaining large numbers of endpoint URIs even easier...See the example below - where POAckQueue can now appear in the "default" column (applicable to all environments.)
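The recursion itself is easy to picture with a few lines of Python. This is a sketch of the concept only, not of XmlPreProcess: expand_macros is a hypothetical name, the server names are invented, the ${Name} form is assumed from the ifdef snippet earlier in this post, and leaving unknown macros untouched and capping the depth at ten are design choices made for the example.

```python
import re

MACRO = re.compile(r"\$\{(\w+)\}")


def expand_macros(value, settings, depth=0):
    """Expand ${Name} references, including references inside other values."""
    if depth > 10:
        raise ValueError("macro nesting too deep (possible circular definition)")

    def replace(match):
        name = match.group(1)
        if name not in settings:
            return match.group(0)  # unknown macro: leave it as written
        return expand_macros(settings[name], settings, depth + 1)

    return MACRO.sub(replace, value)


# POAckQueue is defined once and builds on the per-environment QueueServer.
DEVELOPMENT = {
    "QueueServer": "DEVSERVER01",
    "POAckQueue": r"${QueueServer}\private$\POAckQueue",
}
PRODUCTION = dict(DEVELOPMENT, QueueServer="PRODSERVER01")

if __name__ == "__main__":
    print(expand_macros("${POAckQueue}", DEVELOPMENT))
    print(expand_macros("${POAckQueue}", PRODUCTION))
```

Only QueueServer differs between the two environments, yet every value built on it picks up the change.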
Multiple Environments
Note that the SettingsFileGenerator.xls spreadsheet provided with the Deployment Framework sample (and with XmlPreProcess) has room for four environments (development, QA, staging, and production.) However, you can simply add columns to manage additional environments if need be. One such use of this would be to create a column for "unit testing", where the URIs and other binding file substitutions point to resources under the control of your unit testing framework.
More to say on binding file management in another post...
Scott Colestock lives, writes, and works as an independent consultant in the Twin Cities (Minneapolis, Minnesota) area.
© Copyright 2010