There are several QFEs for BizTalk 2004 that you may want to know about, in
case you encounter the situations described below.
Situation: You attempt to create or edit Receive or Send ports
within the BizTalk Explorer in Visual Studio, and CPU consumption hits
100%. Memory consumption climbs until the IDE crashes with an
OutOfMemoryException.
Fix: Ask PSS for the hotfix associated with KB870619 (also
known as hotfix 1185).
Situation: You have a CDATA section in an inbound xml document
within an orchestration. The CDATA section contains flat file data, with
vital trailing (or leading) whitespace. After you execute a transform in
your orchestration, the CDATA designation is stripped (although the flat data
is still there) - and the leading/trailing whitespace is now lost.
Curiously, using the "Test Map" feature (by right-clicking on a map in the
solution explorer) doesn't exhibit this behavior.
Fix: Ask PSS for the hotfix associated with KB841563
Situation: You have a scope shape, and scope timeouts aren't
working as expected. Specifically, for blocking calls in an expression
shape within the scope (like a DCOM call, etc.) where the timeout is being
exceeded, you see 100% CPU consumption in the BizTalk service and the
orchestration never terminates.
Fix: Ask PSS for the hotfix associated with KB811250.
(Note: this problem may have been addressed in the fix rollup released in April
- I'm not sure what the ordering was here.)
(Update: See the latest on the Deployment Framework
here.)
Addressing a few comments/questions that have appeared -
- Bootstrapping your binding file: It was pointed out that a bootstrapping process is required for your initial BizTalk Orchestration binding file. This is quite true - you either need to manually deploy your orchestration(s) and associated assemblies, and use the BizTalk Deployment Wizard to create a binding file, or you need to create a binding file by hand (using a previous file as an example...)
- Automatic maintenance of your binding file: Hermo Terblanche would like the binding file to be maintained automatically (as part of the build process.) One could imagine automatically “refreshing” this file using BTSDeploy at build time. However, there is a question of what the source of truth should be for your bindings - I would argue it should be the binding file, not the current configuration of the server. Others might disagree, at least during development.
- Rule deployment: Chris Delaney is looking for a method of deploying rules. There doesn't appear to be a command line tool for this (that you could call from NAnt), but it looks somewhat trivial to write. See \program files\microsoft biztalk server 2004\sdk\samples\business rules\business rules hello world2\HelloWorld2.cs - specifically the LoadFromFile and DeployRuleSet methods.
(Update: See the latest on the Deployment Framework
here.)
I realized today that in the "deploy.orchestrations" target of the NAnt build
file (discussed
here) there is a bug in the ordering of operations. The original
file imported the binding file prior to deploying the orchestration - which
doesn't work for a "first time" deployment.
In addition, Hermo Terblanche made a good point (in comments) regarding
pipelines which are used by Send/Receive ports - i.e. the ports need to be
removed prior to removing/updating the corresponding pipeline assembly.
Therefore, an additional dependency has been added to the "undeploy.piplines"
target - namely, "remove.ports". This target will remove exactly one Send
and one Receive port, the names of which are derived from the binding
file. This section of the build file will have to be customized for your
purposes, just like the names and ordering of orchestrations. Note that
in my sample, the orchestration uses “Specify Now” ports, and
removing the Send/Receive ports is probably not necessary. However, for
“Specify Later” ports (the more typical case) that use custom
pipelines, you will get the error referenced in Hermo's comment if you do not
remove the ports prior to updating the pipeline assembly.
The zip file with a new build file is still located
here. Current NAnt output from the sample will look like
this. Enjoy!
(Update: See the latest on the Deployment Framework
here.)
One of the more complicated aspects for a developer using BizTalk 2004 is the
large number of steps required during the edit/run/debug cycle. Since
your BizTalk artifacts can't be run "in place" (that is, where they were
compiled) but must instead be deployed to a local (or remote) BizTalk server,
life can get quite complicated.
If you have done much BizTalk 2004 development, you know the routine quite
well at this point. If you have orchestrations, schemas, transforms,
pipelines, and (say) C# components all partitioned into separate
assemblies - and you have a number of orchestrations with "Call/Start
Orchestration" shape dependencies that introduce specific start/stop-ordering -
you can spend a lot of time doing the whole
stop/unenlist/undeploy/redeploy/enlist/start/bounce BizTalk routine.
Worse, you might not get it right - which can lead to hours spent debugging
problems (over the course of your project) that don't really exist.
To alleviate this problem, it can be quite helpful to use a tool like
NAnt to coordinate the "update my BizTalk server" process that must
occur after each build. (NAnt is a large topic - suffice it to say it is an
XML-driven software build system.) As long as your NAnt build file (and
BizTalk bindings file) are kept up to date, the whole process can be reduced
to:
- Compile your solution (which might have multiple orchestrations, schemas,
transforms, pipelines, and external components in separate assemblies)
- Choose "Deploy to BizTalk" from your Tools menu.
- Wait 60 seconds or so, enjoying the feeling that you have a consistent &
reliable deployment mechanism.
In addition, you can of course use NAnt to kick off full unattended builds for
nightly builds or continuous integration systems (like
Cruise Control). Since Visual Studio.NET is the
only supported way to build BizTalk projects - despite that tempting
XSharpP.exe file sitting in your BizTalk installation directory - this part of
your NAnt build file must defer to calling devenv.exe in command-line
fashion. (A short "getting started" for NAnt & directions for using
with a VS external tool can be found here.)
So, how about a sample that shows using NAnt to coordinate a reasonably complex
deployment of inter-related BizTalk projects? Available
for download is a zip of a BizTalk solution that contains the following
projects:
- BizTalkSample.Components (a C# project, with a class called from
orchestration)
- BizTalkSample.Orchestrations (that contains two orchestrations, one which
calls the other via a Call Orchestration shape)
- BizTalkSample.Pipelines (a send and receive pipeline, which are not needed
but included to illustrate the deployment aspects)
- BizTalkSample.Schemas (which includes two schemas)
- BizTalkSample.Transforms (which contains a single transform used by one of
the orchestrations)
The item I hope you will find most interesting, however, is a
file called BizTalkSample.sln.build. This is a NAnt
build file that has a deployment target which
captures the following dependency tree (only partially blown out here, for
brevity - but this no doubt seems familiar to you if you've been using BizTalk
2004 for a while…)
Deploy
    Deploy Schemas
        Undeploy Schemas
            Undeploy Orchestrations
                Unenlist Orchestrations
                    Stop Orchestration
            Undeploy Transforms
                Undeploy Orchestrations
    Deploy Components
    Deploy Pipelines
        Undeploy Pipelines
            Undeploy Orchestrations
    Deploy Transforms
        Undeploy Transforms
            Undeploy Orchestrations
    Deploy Orchestrations
    Bounce BizTalk
Perhaps more illustrative is the output of the deployment itself - a sample of
which can be seen here. Note that the
NAnt script relies heavily on the EnlistOrch.vbs and StopOrch.vbs script files
that ship as samples with BizTalk 2004, as well as the BTSDeploy command line
utility.
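As a rough illustration of how a build tool walks a dependency tree like the one above, here is a minimal Python sketch (it is not the NAnt file itself). The target names come from the tree; the dependency edges are one reading of that tree, and the "dependencies first, each target at most once" behavior is an assumption about how the build tool resolves targets.

```python
# Dependency edges as read from the tree above (an assumption; the real
# build file may declare more, or different, dependencies).
DEPENDS = {
    "Deploy": ["Deploy Schemas", "Deploy Components", "Deploy Pipelines",
               "Deploy Transforms", "Deploy Orchestrations", "Bounce BizTalk"],
    "Deploy Schemas": ["Undeploy Schemas"],
    "Undeploy Schemas": ["Undeploy Orchestrations", "Undeploy Transforms"],
    "Undeploy Orchestrations": ["Unenlist Orchestrations"],
    "Unenlist Orchestrations": ["Stop Orchestration"],
    "Undeploy Transforms": ["Undeploy Orchestrations"],
    "Deploy Pipelines": ["Undeploy Pipelines"],
    "Undeploy Pipelines": ["Undeploy Orchestrations"],
    "Deploy Transforms": ["Undeploy Transforms"],
}


def execution_order(target, depends=DEPENDS, done=None):
    """Return targets in dependencies-first order, each listed only once.

    No cycle detection: a circular dependency would recurse forever.
    """
    if done is None:
        done = []
    if target in done:
        return done
    for dependency in depends.get(target, []):
        execution_order(dependency, depends, done)
    done.append(target)
    return done


order = execution_order("Deploy")
for step, name in enumerate(order, start=1):
    print(step, name)
```

Even though "Undeploy Orchestrations" appears four times in the tree, it shows up only once in the resulting order, ahead of everything that depends on it.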
The BizTalkSample.sln.build script included in the download should
represent a BizTalk project organization of sufficient complexity to act as a
template for your own projects. You will want to maintain the list (and
correct ordering!) of the names of your orchestrations within the build targets
for "deploy.orchestrations" and "unenlist.orchestrations" (this isn't
derivable automatically…)
Note that the build script relies on a set of naming conventions that will be
evident once you spend some time with it - namely, that directory names
correspond to assembly names. In addition, the standard "Development" and
"Deployment" configurations in BizTalk projects have been replaced with "Debug"
and "Release", in order to not create inconsistencies with standard .NET
projects (and allow for one "Configuration" property within the build
script.) This replacement was accomplished with a file-level
search/replace.
There is probably room for more sophistication in this build file. It
takes a somewhat pessimistic view of what must happen for each
edit/run/debug/deploy cycle, but I've found that despite the 60 seconds spent
executing the script, your net productivity gain will be quite high given the
time you won't waste trying to figure out what aspect of your deployment isn't
correct. Leave comments with suggestions for improvements, if you like.
In the latest round of documentation that was released for BizTalk 2004 there is an "orchestration operator" defined that was not previously documented: 'succeeded()'
The documentation states that this operator can be used to determine the outcome of a transactional scope or orchestration. When might this operator be needed?
Well, it turns out that the orchestration compiler has some interesting rules about what you can do in an exception handler that might not be entirely intuitive at first (though as you reflect on analogies to C# or other exception-enabled languages, it begins to make sense.)
Suppose that you have defined a Request/Response port as the means of interacting with an orchestration, and you want to ensure that some response is generated regardless of the failure conditions you encounter. Your first attempt might look like this (I know mine did…) Stretch this JPG out to full size to see it clearly (IE will shrink it.)
This will generate a compiler error:
error X2162: must receive before sending a fault message on an implemented port
What is going on here? It sure seems as if we have received a message already - we did it in the Rcv_SomeDoc shape. However, we have the Snd_ResponseDoc shape inside of Scope_WorkThatMightFault, and the orchestration compiler is assuming that we might have already executed that shape prior to the catch block executing (i.e. prior to an exception being raised.) A Request/Response port must only have one response for any given request…and our Snd_FaultDoc shape has the potential to violate this rule. It sure would be nice if X2162 could be more explanatory in this regard…
How do we overcome this? It isn't terribly obvious…We must wrap the Snd_ResponseDoc shape in an (additional) transactional scope, and check in our catch block to ensure that the associated transaction did not succeed before performing Snd_FaultDoc. See this diagram (Acrobat required).
What we are doing here is structuring the flow such that exactly one response will be sent for the original Rcv_SomeDoc shape. The way we do this is to use a Decide shape, with an expression such as "!succeeded(Transaction_SndResponse)" in the rule branch. The Snd_FaultDoc will be in the 'true' side of the branch (i.e. we did not successfully perform Snd_ResponseDoc), while the 'false' side will likely be empty.
This is a pretty subtle bit of enforcement that the orchestration compiler is performing. It is somewhat analogous to a typical language compiler ensuring that all code paths have a return value for non-void functions or methods. And, of course, even though it is not enforced, it is certainly the case that 'catch' and 'finally' blocks in standard languages often have to be aware of what has or hasn't taken place in the associated 'try' block. The orchestration compiler (apparently) just has some well-defined & strict rules it wants orchestrations to adhere to (such as "exactly one response for each request emanating from a Request/Response port".)
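The "exactly one response" rule can be illustrated outside of BizTalk. The following Python sketch is only an analogy, not XLANG/s: the response_sent flag plays the role that succeeded(Transaction_SndResponse) plays in the Decide shape, and work, after_send and send are hypothetical stand-ins for the orchestration's shapes.

```python
def handle_request(work, after_send, send):
    """Send exactly one message (response or fault) for a single request."""
    response_sent = False  # analogous to succeeded(Transaction_SndResponse)
    try:
        result = work()
        send("response: " + result)
        response_sent = True
        after_send()  # later work in the same scope may still fault
    except Exception as exc:
        # Only send a fault if the response did not already go out.
        if not response_sent:
            send("fault: " + str(exc))


def boom():
    raise RuntimeError("boom")


def run(work, after_send):
    """Collect everything 'sent' for one simulated request."""
    sent = []
    handle_request(work, after_send, sent.append)
    return sent


print(run(lambda: "ok", lambda: None))  # normal path
print(run(boom, lambda: None))          # fault before the response
print(run(lambda: "ok", boom))          # fault after the response
```

In all three cases the caller sees a single message, which is the property the orchestration compiler appears to be enforcing.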
There is a somewhat similar case that is described briefly in the BizTalk documentation. Imagine we wish to make reference to a message or variable in our 'catch' block that was initialized within the associated scope. In this case, the orchestration compiler will assume that we might not have gotten around to initializing that variable/message prior to the exception being thrown - and a compiler error will be generated as a result: error X2109: use of unassigned local variable 'Variable_Blah'
In this case, we can wrap the portion of the scope's work that is responsible for initializing the variable/message of interest in an (additional) transactional scope (i.e. "Scope_InitWork"), and we can use a Decide shape with an expression such as "succeeded(Transaction_InitWork)" in the rule branch. This will allow the orchestration to compile…
Wow! One might start to agree with Charles' boss...
(See
here for the current version of the naming conventions!)
One of the primary benefits of the BizTalk Orchestration model is the great
transparency you can get when a software implementation is pictorial.
Regardless of how well a developer comments code, there will always be a
need to maintain a separate set of artifacts (UML diagrams, Visio diagrams with
random shapes of your choosing, prose documents, whiteboard discussions, etc.)
that are used to convey what the code is actually doing - especially when working
with a business audience. This is true if for no other reason than that a
discussion of interesting functionality will often take place at a different
level of granularity than raw code can support.
Round-trip engineering tools - that attempt to keep code in sync with diagrams -
often seem to suffer from a lack of fidelity that renders them ineffective.
With BizTalk Orchestration, the diagram is the implementation (at least
at a particular level) of a piece of functionality. Yes, you can
disappear into components and lose sight of what might happen. Yes, there
is a code representation (xlang/s) underneath the orchestration - but it seems to
be completely isomorphic with the diagram.
So the opportunity exists to use an orchestration diagram in several
interesting ways within a project lifecycle:
- As a way to capture an initial high-level design, using all the orchestration
shapes as intended but not yet bothering with real maps and schemas.
Stubbing out schemas (so you can declare messages and variables to be of proper
types) and maps will allow you to flesh out the orchestration diagram(s) quite
a bit, using the compiler as just a way to check for consistency. All of
the external system interactions, communication patterns, decision points,
parallel vs. joined flows, etc. can be represented at this point in a shell
orchestration.
- As a way to gain consensus with the development team & business sponsor
about whether the right functionality is indeed going to be built. The
high level design just described is a great tool for this discussion. Put
your orchestration(s) up on a wall with a projector and do a walk-through with
as many of the project stakeholders as makes sense. Or use a tool like
CutePDF to print the orchestration as a PDF to send around via email. (Of course,
once Microsoft ships the Visio add-on for BizTalk 2004 orchestrations, this
will represent another option for non-VS.NET users. This has the added
benefit of allowing you to exclude what you might consider to be lower-level
detail by setting the Report to Analyst switch on various orchestration shapes
to False.)
- As a way to estimate work. The various shapes in your initial
orchestration can often represent reasonable granularity for time estimates.
- And finally, as a way to guide project work... Rather than starting with the
entire orchestration that you created to support the first three points above,
you might find it easier to create a new orchestration that represents the
path(s) you are tackling at a particular point. You can cut/paste portions of that
original orchestration or simply use it as a reference for what comes next - it
serves as your outline.
To help realize some of these benefits, naming conventions within an
orchestration are quite important.
While the naming conventions are good practice for variables, Messages,
Multi-Part types, etc. they are even more important for the workflow shapes.
The goal is to ensure that the intent of each shape is clear, and that the text
associated with the shape conveys as much as possible given the space
constraints. In this way, a non-technical audience will be able to use
the orchestration as documentation.
(See
here for the current version of the naming conventions.)
Respond with comments & the document will remain updated per your feedback!
I intend to cover some more foundational material for
BizTalk 2004 in the future, but today I wanted to cover an issue that at
least some people will run into fairly quickly when beginning to use the
product.
There are times when it is desirable to work with multiple XML schemas that
specify the same target namespace, and which specify different definitions for
the same element.
For instance, you may wish to have a lax schema when a document lands on your
doorstep initially - but further into the processing of that document (along a
particular path) you may wish to validate against a stricter schema. Or,
you may have a situation where you have what is arguably an envelope structure
which can't be cleanly stripped off (for a variety of reasons) - leaving you
with documents which might look quite different, but have the same target
namespace and element usage.
BizTalk 2004, in general, wants to see one schema deployed to the BizTalk
management database for any given combination of target namespace and element
declaration. If you deploy two schemas with target namespace
http://MyNamespace and element declaration MyRoot, and then attempt to receive
a MyRoot-rooted document through a receive port using the default Xml Receive
pipeline, you will receive an error from BizTalk like this one:
There was a failure executing the receive pipeline…Source:"XML
Disassembler"…
Reason: The disassembler cannot retrieve the document specification using this
type: "http://MyNamespace#MyRoot". Either the schema is not deployed correctly,
or more than one schema is deployed for the same message type.
To overcome this, you can use custom BizTalk Pipelines on the send and receive
ports that will be dealing with schemas that are subject to the
ambiguity. Within a pipeline, you can restrict the set of available
schemas from "everything that is deployed" down to the schema(s) that you are
interested in.
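To make the ambiguity concrete, here is a small conceptual model in Python. It is not how BizTalk is implemented; it only mirrors the behavior described above. The schema names (Schemas.LaxMyRoot, Schemas.StrictMyRoot) are hypothetical, and the document_schemas argument stands in for the restricted schema set that a custom pipeline provides.

```python
def resolve_schema(deployed, namespace, root, document_schemas=None):
    """Pick the one schema matching a 'namespace#root' message type.

    deployed: list of (schema_name, target_namespace, root_element) tuples.
    document_schemas: optional set of schema names to restrict the search to.
    """
    message_type = f"{namespace}#{root}"
    matches = [
        name
        for name, ns, element in deployed
        if f"{ns}#{element}" == message_type
        and (document_schemas is None or name in document_schemas)
    ]
    if len(matches) != 1:
        raise LookupError(
            f'cannot resolve "{message_type}": {len(matches)} candidate schema(s)'
        )
    return matches[0]


# Two hypothetical schemas deployed with the same namespace and root element.
DEPLOYED = [
    ("Schemas.LaxMyRoot", "http://MyNamespace", "MyRoot"),
    ("Schemas.StrictMyRoot", "http://MyNamespace", "MyRoot"),
]

try:
    resolve_schema(DEPLOYED, "http://MyNamespace", "MyRoot")
except LookupError as error:
    print("default lookup fails:", error)

print(resolve_schema(DEPLOYED, "http://MyNamespace", "MyRoot",
                     document_schemas={"Schemas.StrictMyRoot"}))
```

Searching everything that is deployed finds two candidates and fails, while restricting the candidate set resolves the same message type unambiguously.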
Specifically, for receive ports, you can add a new item to your project (a
"Receive Pipeline"), and add the default Disassembler and Party Resolution
pipeline components. For the Disassembler component, edit the "Document
Schemas" collection and add the particular schema you are interested in.
See this picture for an illustration.
Likewise, for send ports, you can add a "Send Pipeline" item to your project,
and add the default Assembler pipeline component. Again, specify the
schema you are interested in with the "Document Schemas" collection of the
Assembler.
For each of the Send or Receive ports that will be trafficking in these
messages, specify your newly created pipelines - instead of the default Xml
Receive/Send pipelines!
Now for the gotcha! (You knew there had to be one, right?)
BizTalk 2004 will require that the assembly containing your custom pipeline(s)
is deployed to the GAC (along with every other BizTalk project assembly.)
When components loaded from the GAC wish to dynamically load other types &
assemblies, they must do so with fully qualified assembly names. Applying
this to our current discussion means this: BizTalk pipelines must have fully
qualified information for the assembly that contains the schemas you configure
within pipeline components.
If the pipeline component lives in the same BizTalk project as the schemas you
are attempting to reference, the property designer (when editing the "Document
Schemas" collection) will only be populated with a namespace-qualified type
name - the fully qualified assembly name will be missing. At run time,
the schemas will not be found…and the behavior at run time will appear
completely unchanged from the case where no custom pipeline was specified at
all.
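The difference between the two kinds of names is easy to see in a sketch. The Python below simply splits a .NET-style type reference into its parts and checks whether the assembly portion is complete; the type name, assembly name and public key token are made up for illustration, and the parsing is deliberately naive (it does not handle generic type names that contain commas).

```python
REQUIRED_PARTS = ("Version", "Culture", "PublicKeyToken")


def parse_type_reference(reference):
    """Split 'Type, Assembly, Key=Value, ...' into (type, assembly, properties)."""
    parts = [part.strip() for part in reference.split(",")]
    type_name = parts[0]
    assembly = parts[1] if len(parts) > 1 else None
    properties = dict(part.split("=", 1) for part in parts[2:] if "=" in part)
    return type_name, assembly, properties


def is_fully_qualified(reference):
    """True only if an assembly name plus version, culture and token are present."""
    _, assembly, properties = parse_type_reference(reference)
    return assembly is not None and all(key in properties for key in REQUIRED_PARTS)


# Hypothetical examples of what a designer might record for a schema type.
short_name = "BizTalkSample.Schemas.MyRoot"
full_name = ("BizTalkSample.Schemas.MyRoot, BizTalkSample.Schemas, "
             "Version=1.0.0.0, Culture=neutral, PublicKeyToken=0123456789abcdef")

print(is_fully_qualified(short_name))  # namespace-qualified type name only
print(is_fully_qualified(full_name))   # includes the assembly identity
```

A check along these lines against the saved pipeline file could flag the problem before deployment rather than at run time.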
To work around this, simply put your pipelines in a different
project/assembly than the project containing the schemas you need to reference
in the designer.
A bug you say? Certainly it would be nice if the designer warned
you, and it deserves a KB article soon…But keep in mind what is
happening: A GAC-destined component (a pipeline Disassembler) is
providing designer support which allows you to select another component (a
schema) which will be loaded dynamically at run time...It raises an interesting
problem that goes beyond just BizTalk 2004.
Whenever a component that is destined for the GAC has IDE designer support which
in turn allows you to select a type for a "plug in" component that will be
loaded by the "host" component dynamically at run time (without using
Assembly.LoadFrom semantics) - you will run into this issue. Why?
Because if you select a type from the same project, the fully
qualified name can't be reliably known. After all, the project might not
have been compiled yet, or the fully-qualified name might be set up to change
with each compilation (gasp!) via 1.0.*.* versioning policy. If you use
such a designer to select a type in a distinct assembly, the fully qualified
name can indeed be known - and shame on the component author if the versioning
policy isn't sane.
Of course, being deployed into the GAC raises all kinds of thorny issues, but
this one was a bit subtle...
I agree wholeheartedly with
Ian Griffiths' response to Sam Gentile's recent
post…
I went through this exercise with a client last summer - it was fairly long
and drawn out. The organization had always had a physically distinct
middle tier that was responsible for data access - based on the belief that
scalability and security would both be improved.
For the application in question at the time, DataSets were being returned from
middle-tier objects - marshaled via .NET Remoting (binary/tcp). Now,
DataSets don't have a very compact representation when serialized, as has been
described here.
Whether due to the serialization format or other aspects of the serialization
process, our performance tests indicated that the time spent
serializing/deserializing DataSets imposed a tremendous CPU tax on both the web
server and the application server - even after implementing
techniques to address it. The throughput (requests per
second) on the web servers & request latency also suffered dramatically.
We conducted extensive tests using ACT (Application Center Test) to drive 40
virtual users with no dwell time (i.e. each driver thread executed another
request as soon as the last returned.) Two web servers were used, and in
the remoted middle-tier case, we had a single middle-tier server. A
single Sql Server was used. All servers were "slightly aged"
four-processor machines. The read operations brought back large DataSets,
whereas the write operations were fairly simple by comparison - the workload
was intended to simulate real user profiles.
| Configuration | Operation | Requests Per Second (RPS) | Latency |
| Remoted middle tier | Read | 28 | 1400 msec |
| Local middle tier | Read | 115 | 322 msec |
| Remoted middle tier | Write | 14 | 2791 msec |
| Local middle tier | Write | 200 | 193 msec |
Notice that not only was the local middle tier (non-remoted case) able to
sustain a much higher throughput, but it had far less latency as well.
CPU utilization indicated we would need one physical middle tier server for
every web server. (Of course, when comparing raw performance of "physical
middle tier vs. not", you always need to ask "what would happen if I deployed
these middle tier servers as front-end web servers instead?" In practice,
you don't even need to go that far - just getting rid of the middle tier
servers altogether will often improve performance…)
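The ratios implied by the table are easy to work out, and the numbers can also be sanity-checked. With 40 virtual users and no dwell time, throughput multiplied by latency should come out near 40 requests in flight (Little's Law). The short Python sketch below does both calculations using only the figures from the table; the function and variable names are just for illustration.

```python
# (requests per second, latency in milliseconds), taken from the table above
RESULTS = {
    ("remoted", "read"): (28, 1400),
    ("local", "read"): (115, 322),
    ("remoted", "write"): (14, 2791),
    ("local", "write"): (200, 193),
}


def speedup(operation):
    """Throughput of the local middle tier relative to the remoted one."""
    remoted_rps, _ = RESULTS[("remoted", operation)]
    local_rps, _ = RESULTS[("local", operation)]
    return local_rps / remoted_rps


def implied_concurrency(configuration, operation):
    """Little's Law: requests in flight = throughput * latency."""
    rps, latency_ms = RESULTS[(configuration, operation)]
    return rps * latency_ms / 1000.0


for operation in ("read", "write"):
    print(f"{operation}: local tier sustains {speedup(operation):.1f}x the throughput")

for configuration, operation in RESULTS:
    in_flight = implied_concurrency(configuration, operation)
    print(f"{configuration} {operation}: ~{in_flight:.1f} requests in flight")
```

The local tier works out to roughly 4x the read throughput and 14x the write throughput, and every row lands between about 37 and 39 requests in flight, which is consistent with the 40 virtual users driving the test.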
So, after evaluating performance, we decided we wanted to push
for a local middle tier, and allow (gasp) access to the database from the
DMZ. This led us to a long and serious discussion of the security
implications, and our reasoning followed Ian's quite closely. The
Threats and Countermeasures text was a very valuable resource. We
certainly avoided the use of all dynamic Sql (in favor of using stored
procedures), used a low privilege (Windows) account to access Sql Server (that
only had access to stored procedures), used strongly-typed (SqlParameter)
parameters for all database calls (that are type/length checked), avoided
storing connection strings in the clear via the .NET config encryption
mechanism, used non-standard ports for Sql Server, etc. The quantity of
advice to digest is large indeed - but necessary regardless of whether you are
deployed with a physical middle tier or not...
Two closing thoughts on this topic… First, Martin Fowler sums up this whole
topic well in his book Patterns of Enterprise Application Architecture (Chapter
7) on the topic of “Errant Architectures” - excerpted in a
SD magazine article. After introducing the topic, he says:
"…Hence, we get to my First Law of Distributed Object Design: Don’t
distribute your objects! How, then, do you effectively use multiple processors
[servers]? In most cases, the way to go is clustering [of more front-end web
servers]. Put all the classes into a single process and then run multiple
copies of that process on the various nodes. That way, each process uses local
calls to get the job done and thus does things faster. You can also use
fine-grained interfaces for all the classes within the process and thus get
better maintainability with a simpler programming model. …All things being
equal, it’s best to run the Web and application servers in a single process—but
all things aren’t always equal. "
Second, does anyone remember the nile.com
benchmarks that DocuLabs conducted? I can't find the
exact iteration of the benchmark I'm looking for, but they found that ISAPI
components calling local COM+ components on a single Compaq 8500 (8-way) could
achieve 3000 requests per second, vs. just 500 once the COM+ components were
placed on a separate Compaq 8500. Unreal. (And by the way, with
those numbers, what the heck was wrong with the ASP.NET code above? Oh well,
nile.com WAS a benchmark after all…)
Steve
Maine's post on "single-parameter service interfaces" - and the
assertion that such interfaces are more in keeping with the SOA theme - got me
thinking just a bit about the real relationship between [WebMethod] methods and
the associated WSDL.
Recall that WSDL port types consist
of operations that define (at most) an input message and an output
message. WSDL messages consist of "parts" - and for
literal formatting with “wrapped” parameter style (the default for
ASMX), you will have a single "part". The part in turn refers to an
XML schema-defined element. (Here is a concrete
example to look at.)
Notice that at this point, we haven't said anything about whether a) we have
multiple parameters to our service interface, with a mapping between those
parameters and child elements in the WSDL-referenced schema or b) we have a
single (xml document) parameter to our service interface that is expected to
conform to the WSDL-referenced schema (or for that matter, a single parameter
consisting of a serializable class.)
But the WSDL operation definition is quite clear - there is only one
message associated with each of the potential directions (input, output, and
fault.) The operation definition doesn't care whether the underlying code
that supplies implementation shreds the associated schema types to and from
method parameters!
And in an important way, it doesn’t matter. From the client's
perspective, I can submit an xml document (or serialized object) to an
operation defined on a port type, as long as that xml document conforms to the
associated schema. The client isn't forced to take a parameter-oriented
view of a web service interface regardless of whether or not the server
implementation is "parameterized". Likewise, from the server's
perspective, a web service interface could be implemented with consumption of
(compliant) xml documents - without forcing that view on the client (who might
very well prefer a parameter-style proxy to be generated from WSDL.)
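A small sketch may help show why the two views are interchangeable on the wire. The Python below builds the same message body twice: once from individual parameters (roughly what a generated proxy would do) and once from a ready-made XML document. The namespace, element names and values are hypothetical, and this says nothing about how ASMX itself does the work.

```python
import xml.etree.ElementTree as ET

NS = "http://example.org/orders"  # hypothetical target namespace


def from_parameters(customer_id, quantity):
    """Parameter view: assemble the message element from method arguments."""
    root = ET.Element(f"{{{NS}}}PlaceOrder")
    ET.SubElement(root, f"{{{NS}}}customerId").text = str(customer_id)
    ET.SubElement(root, f"{{{NS}}}quantity").text = str(quantity)
    return root


def from_document(xml_text):
    """Document view: the caller already holds a schema-conformant document."""
    return ET.fromstring(xml_text)


def shape(element):
    """Reduce an element to (tag, text, children) so two trees can be compared."""
    return (element.tag, (element.text or "").strip(),
            [shape(child) for child in element])


document = ('<PlaceOrder xmlns="http://example.org/orders">'
            '<customerId>42</customerId><quantity>3</quantity></PlaceOrder>')

same = shape(from_parameters(42, 3)) == shape(from_document(document))
print("identical message content:", same)
```

Either side of the conversation can pick whichever view it prefers, because what is exchanged is the schema-defined element, not the parameter list.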
This point remains true even if I was using “bare” parameter style (i.e.
if I had multiple message parts) or if I was using RPC formatting (i.e. if I
had a parent element for my parameters named after the web service method.)
Of course, your philosophical bent will lead you to either the
WSDL-first path (for the document view) or the ASMX
path-of-least-resistance (for the parameter view.)
And, handling the open content case that
Steve discussed is only possible with a document-oriented
approach. (XmlAnyElementAttribute
could assist with the case where you want to rely on serialized/deserialized
objects to stand in for raw xml documents.)
Note that the parameterized view exhibits some aspects of being a
leaky abstraction. SOAP 1.1 allows for missing values
("Applications MAY process requests with missing parameters but also MAY return
a fault.") - and so does the XmlSerializer. This means that you can
wind up with malformed requests, and not know it. (Is your service really
going to be ok with treating missing parameters the same as freshly initialized
data types?) Since ASMX offers no schema validation by default, you
really need to rely on a
schema-validation SoapExtension to solve this problem.
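The leak is easy to reproduce in miniature. The Python sketch below is an analogy only, not the XmlSerializer: a lenient reader quietly substitutes a default for a missing element, so a malformed request looks the same as a legitimate zero, whereas a strict reader (standing in for schema validation) rejects it. The PlaceOrder and quantity names are hypothetical.

```python
import xml.etree.ElementTree as ET


def lenient_quantity(xml_text):
    """Mimic a lenient deserializer: a missing <quantity> silently becomes 0."""
    node = ET.fromstring(xml_text).find("quantity")
    return int(node.text) if node is not None else 0


def strict_quantity(xml_text):
    """Mimic validation: a missing required element is an error."""
    node = ET.fromstring(xml_text).find("quantity")
    if node is None:
        raise ValueError("required element <quantity> is missing")
    return int(node.text)


complete = "<PlaceOrder><quantity>0</quantity></PlaceOrder>"
missing = "<PlaceOrder></PlaceOrder>"

# The lenient reader cannot tell these two requests apart.
print(lenient_quantity(complete), lenient_quantity(missing))

try:
    strict_quantity(missing)
except ValueError as error:
    print("rejected:", error)
```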