Despite all the functional testing and stress testing you do prior to releasing
your BizTalk app into production, unexpected behavior can (and often will) happen
just the same. Production usage just winds up introducing all sorts of permutations
(including interactions with external systems) that are hard to predict earlier
in the lifecycle.
The goal, of course, is to minimize the operational "care and feeding" that
an application requires over time. Making this happen is mostly a function
of using the application's "diagnostic surface area" (logs, counters, MOM packs,
etc.) to feed
back into each release cycle. But we also need post-mortem tools when the
host environment terminates unexpectedly or stops responding (whether that environment
is BizTalk, IIS, COM+, SQL Server SSIS, etc.).
While a well-designed app will be able to successfully restart and resume processing
(with full data integrity) at such a point (i.e. after the host has been terminated),
there is still an operational cost every time this happens. We want to find
and eliminate these problems.
Using the Visual Studio debugger is almost never an option in production, of course.
We need the ability to capture the current state as a "dump file" and do offline
analysis.
The "Windows
Debugging Tools" are designed for this purpose (and you will often use these
during a call with Microsoft's support staff, so it is good to be familiar with
them.) The debugging tools are a pretty large subject - so here, we are just
going to cover the bare minimum required to capture a dump file for your running
BizTalk process when it appears to be hung with a large number of "Active" service
instances.
Step By Step:
- Install or xcopy the
Windows Debugging Tools to the server where BizTalk is currently
hung (or crashing unexpectedly.) It can be helpful to install in an easy location
for command line access like 'c:\debuggers'.
- From the command line, run the following to get the process IDs for all BizTalk hosts:
typeperf "\BizTalk:Messaging(*)\ID Process" -sc 1
- Run 'adplus.vbs' in crash or hang mode, depending on whether the process ends unexpectedly
(crash) or has become unresponsive (hang). To generate a hang dump, your command
line might look like:
cscript c:\debuggers\adplus.vbs -hang -CTCF -p (pid from last step) -o c:\temp
- Copy the dump file to an offline location if need be.
- Set an environment variable called '_NT_SYMBOL_PATH' to 'srv*c:\symbols*http://msdl.microsoft.com/download/symbols'.
Alternatively, launch WinDbg.exe from the debuggers directory and use the File-'Symbol
File Path' menu. This will ensure that you are automatically downloading the correct
symbols when you analyze the crash dump.
- Start WinDbg.exe, and use File-'Open Crash Dump' to open your dump file. Then,
in the command window, use:
'.load C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\sos.dll'
to load managed code debugging extensions.
- In the command window, use !EEStack to get a full stack trace. Use Edit-Find
to search for your custom code method name or the name of your orchestration.
Look for patterns that indicate the cause of the hang ("hmmm, all my threads seem
to be inside Thread.Sleep. That's funny.") Use !help from the
command window to begin learning about the rest of SOS (to assist with diagnosing
managed memory leaks, etc.)
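
If the !EEStack output is long, the pattern search in the last step can be done offline as well. One option is to save the output to a text file (for example with WinDbg's .logopen and .logclose commands around the !EEStack call) and scan it with a small script. The following is only a rough sketch: it assumes each thread's section starts with a line like "Thread   12", which may differ between SOS versions, and the sample text is made up purely for illustration.

```python
import re

# Assumption: each thread section in the saved !EEStack output begins with a
# line such as "Thread   12". Adjust this pattern if your output differs.
THREAD_HEADER = re.compile(r"^Thread\s+(\d+)$")


def threads_matching(eestack_text, needle):
    """Return the sorted thread numbers whose stack lines contain `needle`."""
    hits = set()
    current = None
    for line in eestack_text.splitlines():
        header = THREAD_HEADER.match(line.strip())
        if header:
            # Start of a new thread section.
            current = int(header.group(1))
        elif current is not None and needle in line:
            hits.add(current)
    return sorted(hits)


# Hypothetical, hand-written sample (not real debugger output).
SAMPLE = """\
Thread   0
ChildEBP RetAddr  Caller,Callee
0012f0a4 7c90e9c0 ntdll!ZwWaitForSingleObject+0xc
Thread   7
ChildEBP RetAddr  Caller,Callee
0345f1b0 792d6d66 (MethodDesc 0x79102290 +0x1a System.Threading.Thread.Sleep(Int32))
Thread   9
ChildEBP RetAddr  Caller,Callee
0412e9c4 792d6d66 (MethodDesc 0x79102290 +0x1a System.Threading.Thread.Sleep(Int32))
"""

if __name__ == "__main__":
    # Lists the threads that have Thread.Sleep somewhere on their stack.
    print(threads_matching(SAMPLE, "Thread.Sleep"))
```

Passing your orchestration or custom method name as the search string gives a quick count of how many threads are sitting in that code.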
For more information on the Windows debugging tools, see
here.
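
If you capture dumps often, the first two command-line steps above (finding the process IDs and building the ADPlus command line) can be scripted. Below is a minimal sketch, not a finished tool. It assumes typeperf prints its usual CSV output (a "(PDH-CSV 4.0)" header row followed by one sample row), and it reuses the 'c:\debuggers' and 'c:\temp' locations from the steps above. The function names, server name, host names and PIDs in the sample are all made up.

```python
import csv
import io
import re

# Pulls the host instance name out of a counter path such as
# \\SERVER\BizTalk:Messaging(HostName)\ID Process
INSTANCE = re.compile(r"\\BizTalk:Messaging\((.+)\)\\ID Process$")


def parse_host_pids(typeperf_output):
    """Map BizTalk host instance name -> process ID from typeperf CSV text."""
    rows = [row for row in csv.reader(io.StringIO(typeperf_output)) if row]
    for header, sample in zip(rows, rows[1:]):
        # Assumption: the header row starts with "(PDH-CSV ..." and the
        # sample row that follows has the same number of columns.
        if header[0].startswith("(PDH-CSV") and len(sample) == len(header):
            pids = {}
            for counter, value in zip(header[1:], sample[1:]):
                match = INSTANCE.search(counter)
                if match and value.strip():
                    pids[match.group(1)] = int(float(value))
            return pids
    return {}


def adplus_hang_command(pid, out_dir=r"c:\temp", tools_dir=r"c:\debuggers"):
    """Build the hang-mode ADPlus command line from the steps above."""
    return ["cscript", tools_dir + r"\adplus.vbs",
            "-hang", "-CTCF", "-p", str(pid), "-o", out_dir]


# Hypothetical sample of what typeperf might print (names and PIDs invented).
SAMPLE_OUTPUT = r"""
"(PDH-CSV 4.0)","\\SERVER01\BizTalk:Messaging(BizTalkServerApplication)\ID Process","\\SERVER01\BizTalk:Messaging(BizTalkServerIsolatedHost)\ID Process"
"03/01/2007 10:15:30.123","4312.000000","2788.000000"
Exiting, please wait...
The command completed successfully.
"""

if __name__ == "__main__":
    for host, pid in parse_host_pids(SAMPLE_OUTPUT).items():
        print(host, " ".join(adplus_hang_command(pid)))
```

On the server itself, the text passed to parse_host_pids would come from running the typeperf command shown earlier (for instance through Python's subprocess module), and the resulting list can be handed to subprocess.run to start the dump.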