Showing posts with label Patterns.

BizTalk Aggregation Pattern for Large Messages

Monday, August 29, 2011

Source: Richard Seroter, MSDN Blogs
There are a few instances of Aggregator patterns for BizTalk Server, but I recently had a customer who was dealing with a very large batch, and therefore the existing patterns were insufficient. So, I threw together a sample that showed how to handle very large batches in a fairly speedy manner.
In this use case, an XML message comes into BizTalk; we want to split out the batch, process each message individually, then aggregate the whole batch back together and send a single message out. Existing aggregation patterns didn't work here, since they mostly involve building up an XML file in memory. When a batch of 1,000+ messages was submitted, it took over 10 minutes to process, which was far too slow.
So, my solution uses a combination of envelope de-batching, sequential convoy pattern, and file streaming. The first part of the solution was to build the XML schemas necessary. The first schema, the envelope schema, looked like this ...

Three things to note here. First, I set the <Schema> node's Envelope property to Yes, so I won't have to configure anything in my pipeline for it to process the batch. Second, I promoted both the BatchID and Count fields; we'll see the significance in a moment. Third, I set the Body XPath property of the root node to the Record node of the schema. This way, the pipeline will recognize that the message is an envelope and know where to split the batch.
The second schema is the body schema. It looks like this ...

Notice that it too has promoted properties. By doing this, the values from the master envelope schema get copied down to EACH debatched message without me doing anything special. So, setting up my schemas this way, in combination with using the standard XML receive pipeline, will cause my message to be debatched and will push values down from the envelope to each message. Cool!
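To make this concrete, here is a rough sketch of what a batch instance for this kind of envelope might look like; the element names and namespace are placeholders of my own, not the original schemas:

```xml
<!-- Hypothetical batch instance: BatchID and Count are promoted from the
     envelope, Record is the Body XPath node, and each child element under
     Record becomes its own message after debatching. -->
<ns0:Batch xmlns:ns0="http://AggDemo.BatchEnvelope">
  <BatchID>BATCH-001</BatchID>
  <Count>2</Count>
  <Record>
    <Order><OrderID>1</OrderID></Order>
    <Order><OrderID>2</OrderID></Order>
  </Record>
</ns0:Batch>
```

With this shape, the XML receive pipeline would publish two Order messages, each carrying the promoted BatchID and Count in its context.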
Remember that I have to rebuild the message on the way out. So, I use the file system to build this message up, instead of doing it in memory and increasing my memory footprint. Therefore, when I first receive the message into my orchestration I need to create this temporary file on disk (and re-create the envelope header). Likewise, when all the messages have been received and processed, I also need to write the closing envelope text. This way I have a temporary file on disk that contains the envelope and all the processed message bodies.
So, I wrote a quick component to write the envelope header and footer to disk. It also will load the completed file and return an XML document. It looks like this ...
using System.IO;
using System.Xml;

// Helper component: appends the envelope header/footer to the temporary
// file and loads the completed batch back as an XmlDocument. Note: the
// actual header and footer XML strings were stripped from the original
// post; fill them with the opening and closing envelope markup for your
// own schema.
public void WriteBatchHeader()
{
    string docpath = @"C:\Data\AggTemp\Dropoff\output.xml";
    string header = ""; // opening envelope XML text (elided in the source)

    // second constructor argument "true" opens the file in append mode
    using (StreamWriter sw = new StreamWriter(docpath, true))
    {
        sw.Write(header);
        sw.Flush();
    }
}

public void WriteBatchFooter()
{
    string docpath = @"C:\Data\AggTemp\Dropoff\output.xml";
    string footer = ""; // closing envelope XML text (elided in the source)

    using (StreamWriter sw = new StreamWriter(docpath, true))
    {
        sw.Write(footer);
        sw.Flush();
    }
}

public XmlDocument LoadBatchFile()
{
    string docpath = @"C:\Data\AggTemp\Dropoff\output.xml";

    XmlDocument xmlDoc = new XmlDocument();
    using (StreamReader sr = new StreamReader(docpath))
    {
        xmlDoc.Load(sr);
    }

    return xmlDoc;
}
Alright, now I have my orchestration. As you can see below, in this process I:
  • Receive the first debatched message.
  • Call my helper component (code above) to create the temporary file and write the opening XML structure.
  • Send the message out (initializing the correlation set) via the FILE adapter (see below for configuration).
  • In an Expression shape, set the loop counter from the promoted/distinguished Count field (which holds the number of records in the entire batch).
  • Process each subsequent debatched message via a sequential convoy, following the previously initialized correlation set. In this example, I do no further message processing and simply send each message back out via the FILE adapter.
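The Expression shape's loop bookkeeping can be sketched in XLANG/s roughly as follows; the variable names and the property schema (AggDemo.PropertySchema) are my own assumptions, not taken from the original orchestration:

```csharp
// XLANG/s Expression shape (sketch): read the promoted Count from the
// first debatched message, then track how many convoy messages remain.
totalRecords = (System.Int32)InboundMsg(AggDemo.PropertySchema.Count);
processed = 1;  // the activating receive already consumed message #1

// Loop shape condition:  processed < totalRecords
// Inside the loop, after each convoy receive:  processed = processed + 1;
```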
Once the loop is done, that means I've processed all the debatched records. So, the last half of the orchestration deals with the final processing. Here I do the following:
  • Add a 10 second delay. This was necessary because the messaging engine was still writing files to disk as the next component was trying to add the closing XML tags.
  • Call the helper component (code above) to write the XML footer to the entire message.
  • Next I construct the outbound message by setting the message variable equal to the response of the LoadBatchFile() defined above. I added trace points to see how long this took, and it was subsecond. This was to be expected even on the large file since I'm just streaming it all in.
  • Finally, send the rebuilt message back out via any adapter.
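The construct step in that last half can be sketched as a Message Assignment shape like the following; the helper instance and message names are illustrative, not from the original project:

```csharp
// Message Assignment shape (sketch): the outbound message is set to the
// XmlDocument streamed back in from the completed temporary file.
OutboundMsg = batchHelper.LoadBatchFile();
```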

The last step was to configure the FILE send adapter being used to write this file to disk. As you can see in the picture, I set the PassThruTransmit pipeline and configured the Copy mode to Append. This caused each message sent to this port to be added to the same outbound file.
So, when it was all said and done, I was able to process a batch with 1000 messages in around 2 minutes on a laptop. This was significantly better than the 10+ minutes the customer was experiencing. I'm still using a sequential convoy so I can't process messages as quickly as if I was spawning lots of unique orchestrations, but, our speed improvement came from streaming files around vs. continually accessing the XML DOM. Neat stuff.

BizTalk Asynchronous Aggregation Pattern

Monday, August 29, 2011

Source: MSDN, Windows Server AppFabric Customer Advisory Team, Technocrati Blog

Developing a BizTalk Server solution can be challenging, and especially complex for those who are unfamiliar with it. Developing with BizTalk Server, like any software development effort, is like playing chess. There are some great opening moves in chess, and there are some great patterns out there to start a solution. Besides being an outstanding communications tool, design patterns help make the design process faster, allowing solution providers to concentrate on the business implementation. More importantly, patterns help formalize the design to make it reusable. Reusability applies not only to the components themselves, but also to the stages the design must go through to morph itself into the final solution. The ability to apply a patterned, repeatable solution is worth the little time spent learning formal patterns, or even formalizing your own. This entry looks at how architectural design considerations associated with BizTalk Server messaging and orchestration can be applied using patterns, across all industries. The aim is to provide a technical deep-dive using BizTalk Server-anchored examples to demonstrate best practices and patterns regarding parallel processing and correlation.
The blog entry http://blogs.msdn.com/b/quocbui/archive/2009/10/16/biztalk-patterns-the-parallel-shape.aspx examines a classic example of parallel execution. It consists of de-batching a large message file into smaller files, applying child business process executions to each of the smaller files, and then aggregating all the results into a single large result. Each subset (small file) represents an independent unit of work that does not rely on or relate to other subsets, except that it belongs to the same batch (large message). Eventually each child process (the independent unit of work) will return a result message that will be collected with other results into a collated response message to be leveraged for further processing.
This pattern is a relatively common requirement by clients and systems that interact with BizTalk Server, where a batch is sent in and they require a batch response with the success/failure result for each message. With this pattern one can still leverage the multithreaded nature of BizTalk's batch disassembly while still conforming to requirement of the client.
Let’s closely examine how this example works. First, it uses several capabilities of BizTalk that are natively provided – some that are well documented, and some that are not. A relatively under-utilized feature - that BizTalk Server natively provides, is the ability to de-batch a message (file splitting) from its Receive pipeline. BizTalk Server can take a large message with any number of child records (which are not dependent with one another, except for the fact that they all belong to the same batch), and split out the child records to become independent messages for a child business process to act on. This child business process can be anything – such as a BizTalk Orchestration function, or a Windows Workflow.
BizTalk Server uses the concept of envelopes, which are BizTalk Schemas to define the parent and child data structures. These schemas are then referenced in a custom receive pipeline’s disassembler properties. This capability works on both XML and Flat file type messages.
Two schema envelopes (a parent and a child) are normally required to instruct BizTalk Server on how to split the file. However, using one schema is also possible.
To learn how to split a large file (also known as de-batching) into smaller files, please refer to the BizTalk Server 2006 Split File Pipeline sample. Even though the sample is based on BizTalk Server 2006 (R1), it remains relevant for BizTalk Server 2006 R2, BizTalk Server 2009, and BizTalk Server 2010.
Note: There are several examples on how to de-batch messages, such as de-batching from SQL Server, as communicated on Rahul Garg’s blog on de-batching.
After the large file has been split into smaller child record messages, child business processes can simultaneously work on each of these smaller files, which will eventually create child result messages. These child result messages can then be picked up by BizTalk Server and published to the BizTalk MessageBox database.
Once the messages are placed into the MessageBox database, they can be moved into a temporary repository, such as a SQL Server database table.

Messages collected into the database repository, can be collated, and even sorted by a particular identifier, such as the parent (the original large batched message) GUID.
To aggregate the results back into the same batch, correlation has to be used. BizTalk Orchestration provides the ability to correlate, and aggregation with Orchestration is relatively easy to develop. Using Orchestration, however, may require more hardware resources (memory, CPU) than necessary: one Orchestration instance is required per batch (so if there are 1,000 large batch messages, there will be 1,000 Orchestration instances), and Orchestrations linger until the last result message has arrived (or until they time out via planned exception handling). There is an alternative way to aggregate, without Orchestration, but correlation is still necessary to collate all the results into the same batch. The trick is to correlate the result messages without Orchestration.
Let’s examine how correlation without Orchestration can be achieved.
Paolo Salvatori, in his blog entry http://blogs.msdn.com/paolos/archive/2008/07/21/msdn-blogs.aspx, describes how to correlate without Orchestration by using a two-way receive port. This is important to note, because the two-way receive port provides key information that can be leveraged by promoting it to the context property bag.
These key properties are:
  • EpmRRCorrelationID
  • ReqRespTransmitPipelineID
  • IsRequestResponse
  • Correlation Token
  • RouteDirectToTP
Correlation without Orchestration is as easy as just promoting the right information to the context property bag. Of course, this information needs to be persisted, maintained, and returned with the result messages. However, this method only works with a two-way receive port. What about from a one-way receive port, such as from a file directory or MSMQ?
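In a custom pipeline component, promotion to the context property bag looks roughly like the following fragment; the property name, namespace URI, and variable are illustrative, not lifted from the sample:

```csharp
// Inside IComponent.Execute of a custom pipeline component (sketch):
// promote a correlation value so routing can match the result messages
// back to their originating request.
inmsg.Context.Promote(
    "EpmRRCorrelationID",                                          // property name
    "http://schemas.microsoft.com/BizTalk/2003/system-properties", // property namespace
    correlationId);                                                // value to promote
```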
That is still possible because the de-batching (file splitting) mechanism of BizTalk Server provides a different set of compound key properties that can be used.
These are:
  • Seq# (the sequence number of the child message)
  • UID (the identifier of the parent message)
  • isLast (a Boolean indicating whether the child message is the last in the sequence)
  

The sequence number of the child message and the isLast Boolean can be used to determine the total number of records within the large message. For example, if there are 30 messages in a large message, the 30th message will have a sequence number of 30 (sequencing starts at 1) and an isLast value of True.
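A minimal sketch (plain C#; class and property names are my own, not part of the sample) of how an aggregator might use these context properties to detect batch completion:

```csharp
using System.Collections.Generic;

// Tracks debatched child messages per parent UID and reports when a
// batch is complete. Sequence numbering starts at 1, and the message
// carrying isLast == true has the highest Seq#, which therefore equals
// the total record count for that batch.
public class BatchTracker
{
    private readonly Dictionary<string, HashSet<int>> _received =
        new Dictionary<string, HashSet<int>>();
    private readonly Dictionary<string, int> _expectedCount =
        new Dictionary<string, int>();

    // Returns true once every message in the batch identified by uid
    // has arrived, regardless of the order the messages showed up in.
    public bool Record(string uid, int seq, bool isLast)
    {
        if (!_received.ContainsKey(uid))
            _received[uid] = new HashSet<int>();
        _received[uid].Add(seq);

        if (isLast)
            _expectedCount[uid] = seq; // last message's Seq# == total count

        return _expectedCount.ContainsKey(uid)
            && _received[uid].Count == _expectedCount[uid];
    }
}
```

Because messages can arrive out of order, the tracker only reports completion once the isLast message has been seen and the count of distinct sequence numbers matches it.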


The final step is to aggregate all the result messages and ensure that they're correctly collated into the same batch that their original source messages came from. The UID that was used for correlation is the parent identifier that these result messages can be grouped by. An external database table can be used to temporarily store the incoming messages, and a single SQL stored procedure can be used to collate these messages into a single output file.
  • A result message is received by a FILE Receive Port
  • This result message is stored in a SQL Server table. This is achieved by being routed to a SQL Send Port where the original message is wrapped into a sort of envelope by a map which uses a custom XSLT. The outbound message is an Updategram which calls a stored procedure to insert the message into the table.
  • A SQL receive port calls a stored procedure to retrieve the records of a certain type from the custom table. This operation generates a single document.
  • This document is routed to a File Send Port where the message is eventually transformed into another format.
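The retrieval stored procedure in the third step could be sketched in T-SQL along these lines; the table and column names are assumptions, not the sample's actual schema:

```sql
-- Returns all stored result messages for one batch as a single XML
-- document, ordered by sequence number, ready to be picked up by the
-- SQL receive port.
CREATE PROCEDURE dbo.GetBatchResults
    @ParentUID uniqueidentifier
AS
BEGIN
    SELECT MessageBody
    FROM dbo.ResultMessages
    WHERE ParentUID = @ParentUID
    ORDER BY SeqNumber
    FOR XML PATH('Record'), ROOT('BatchResult');
END
```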
This method of using a SQL aggregator was communicated in http://geekswithblogs.net/asmith/archive/2005/06/06/42281.aspx
Note that the XSLT can be eventually replaced by a custom pipeline component or directly by an envelope (in this case the send pipeline needs to contain the Xml Assembler component).
Sample code is provided courtesy of Ji Young Lee, of Microsoft (South) Korea. Uncompress the ZIP file with the directory structure intact on the C:\ root directory (e.g., C:\QuocProj): http://code.msdn.microsoft.com/AsynchAggregation
Acknowledgements to: Thiago Almeida (Datacom), Werner Van Huffel (Microsoft), Ji Young Lee (Microsoft), Paolo Salvatori (Microsoft)

Implementing the FIFO Pattern in BizTalk 2010

Friday, July 8, 2011

1. Problem

You are implementing an integration point where message order must be maintained. Messages must be delivered from your source system to your destination system in first-in/first-out (FIFO) sequence.

2. Solution

In scenarios where FIFO-ordered delivery is required, sequential convoys handle the race condition that occurs as BizTalk attempts to process subscriptions for messages received at the same time. Ordered message delivery is a common requirement that necessitates the use of sequential convoys. For example, FIFO processing of messages is usually required for financial transactions. It is easy to see why ordered delivery is required when looking at a simple example of a deposit and withdrawal from a bank account. If a customer has $0.00 in her account, makes a deposit of $10.00, and then makes a withdrawal of $5.00, it is important that these transactions are committed in the correct order. If the withdrawal transaction occurs first, the customer will likely be informed that she has insufficient funds, even though she has just made her deposit.
Sequential convoys are implemented by message correlation and ordered delivery flags in BizTalk Server, as outlined in the following steps.
  1. Open the project that contains the schema. (We assume that an XSD schema used to define a financial transaction message is already created.)
  2. Add a new orchestration to the project, and give it a descriptive name. In our example, the orchestration is named SequentialConvoyOrchestration.
  3. Create a new message, and specify the name and type. In our example, we create a message named FinancialTransactionMessage, which is defined by the FinancialTransactionSchema schema.
  4. In the Orchestration View window, expand the Types node of the tree view so that the Correlation Types folder is visible.
  5. Right-click the Correlation Types folder, and select New Correlation Type, which creates a correlation type and launches the Correlation Properties dialog box.
  6. In the Correlation Properties dialog box, select the properties that the convoy's correlation set will be based on. In our example, we select the BTS.ReceivePortName property, which indicates which receive port the message was received through.
  7. Click the new correlation type, and give it a descriptive name in the Properties window. In our example, the correlation type is named ReceivePortNameCorrelationType.
  8. In the Orchestration View window, right-click the Correlation Set folder, select New Correlation Set, and specify a name and correlation type. In our example, we create a correlation set named ReceivePortNameCorrelationSet and select ReceivePortNameCorrelationType.
  9. From the toolbox, drag the following onto the design surface in top-down order. The final orchestration is shown in Figure 1.
    • Receive shape to receive the initial order message: Configure this shape to use the FinancialTransactionMessage, activate the orchestration, initialize ReceivePortNameCorrelationSet, and to use an orchestration receive port.
    • Loop shape to allow the orchestration to receive multiple messages: Configure this shape with the expression Loop == true (allowing the orchestration to run in perpetuity).
    • Send shape within the Loop shape: This delivers the financial transaction message to the destination system. Configure this shape to use an orchestration send port.
    • Receive shape within the Loop shape: This receives the next message (based on the order in which messages were received) in the convoy. Configure this shape to use the FinancialTransactionMessage, to follow the ReceivePortNameCorrelationSet, and to use the same orchestration receive port as the first Receive shape.

    Figure 1. Configuring a sequential convoy
  10. Build and deploy the BizTalk project.
  11. Create a receive port and receive location to receive messages from the source system.
  12. Create a send port to deliver messages to the destination system. In our solution, we send messages to an MSMQ queue named TransactionOut. In the Transport Advanced Options section of the Send Port Properties dialog box, select the Ordered Delivery option, as shown in Figure 2.

    Figure 2. Configuring an ordered delivery send port
  13. Bind the orchestration to the receive and send ports, configure the host for the orchestration, and start the orchestration.

3. How It Works

In this solution, we show how a convoy can be used to sequentially handle messages within an orchestration. The sequential convoy consists of the ReceivePortNameCorrelationSet and the ordered delivery flags specified on the receive location and send port. The first Receive shape initializes the correlation set, which is based on the receive port name by which the order was consumed. Initializing a correlation set instructs BizTalk Server to associate the correlation type data with the orchestration instance. This allows BizTalk to route all messages that have identical correlation type criteria (in our case, all messages consumed by the receive port bound to the orchestration) to the same instance. The Ordered Processing flag further instructs BizTalk Server to maintain order when determining which message should be delivered next to the orchestration.
NOTE

The adapter used to receive messages into sequential convoy orchestrations must implement ordered delivery. If an adapter supports ordered delivery, the check box will appear on the Transport Advanced Options tab.
The Send shape in the orchestration delivers the financial transaction message to a destination system for further processing. The second Receive shape follows the correlation set, which allows the next message consumed by the receive port to be routed to the already running orchestration instance. Both the Send and second Receive shapes are contained within a loop, which runs in perpetuity. This results in a single orchestration instance that processes all messages for a given correlation set, in sequential order. This type of orchestration is sometimes referred to as a singleton orchestration.
3.1. Working with Sequential Convoys
The term convoy set is used to describe the correlation sets used to enforce convoy message handling. While our example used only a single correlation set, you can use multiple correlation sets to implement a sequential convoy. Regardless of how many correlation sets are used, sequential convoy sets must be initialized by the same Receive shape and then followed by a subsequent Receive shape.
Sequential convoys can also accept untyped messages (messages defined as being of type XmlDocument). You can see how this is important by extending the financial transaction scenario, and assuming that a deposit and withdrawal are actually different message types (defined by different schemas). In this case, a message type of XmlDocument would be used on the convoy Receive shapes.
3.2. Fine-Tuning Sequential Convoys
While our example does implement a sequential convoy, you can fine-tune the solution to handle sequential processing in a more efficient and graceful manner. As it stands now, the SequentialConvoyOrchestration handles each message received from the source MSMQ queue in order. This essentially single-threads the integration point, significantly decreasing throughput. Single-threading does achieve FIFO processing, but it is a bit heavy-handed. In our example, all transactions do not have to be delivered in order—just those for a particular customer. By modifying the convoy set to be based on a customer ID field in the financial transaction schema (instead of the receive port name), you can allow transactions for different customers to be handled simultaneously. This change would take advantage of BizTalk Server's ability to process multiple messages simultaneously, increasing the performance of your solution.
NOTE

In this scenario, you must use a pipeline that promotes the customer ID property (such as the XmlReceive pipeline) on the receive location bound to the sequential convoy orchestration. The PassThru receive pipeline cannot be used in this scenario.
Changing the convoy set to include a customer ID field would also impact the number of orchestrations running in perpetuity. Each new customer ID would end up creating a new orchestration, which could result in hundreds, if not thousands, of constantly running instances. This situation is not particularly desirable from either a performance or management perspective. To address this issue, you can implement a timeout feature allowing the orchestration to terminate if subsequent messages are not received within a specified period of time. Take the following steps to implement this enhancement. The updated orchestration is shown in Figure 3.
  1. Add a Listen shape in between the Send shape and second Receive shape.
  2. Move the second Receive shape to the left-hand branch of the Listen shape.
  3. Add a Delay shape to the right-hand branch of the Listen shape. Configure this shape to delay for the appropriate timeout duration. In our example, we set the timeout to be 10 seconds by using the following value for the Delay property:
    new System.TimeSpan(0,0,0,10)

  4. Add an Expression shape directly below the Delay shape. Configure this shape to exit the convoy by using the following expression:
    Loop = false;

Figure 3. Configuring a terminating sequential convoy

Finally, you can enhance the solution to ensure that messages are successfully delivered to the destination system before processing subsequent messages. In the current solution, messages are sent out of the orchestration to the MessageBox database via the orchestration port. Once this happens, the orchestration continues; there is no indication that the message was actually delivered to its end destination. For example, if the destination MSMQ queue was momentarily offline, a message may be suspended while subsequent messages may be delivered successfully. Take the following steps to implement this enhancement. The updated orchestration portion is shown in Figure 4.
  1. Change the orchestration send port's Delivery Notification property to Transmitted.
  2. Add a Scope shape directly above the Send shape.
  3. Move the Send shape inside the Scope shape.
  4. Add an exception handler by right-clicking the Scope shape. Configure this shape to have an Exception Object Type property of Microsoft.XLANGs.BaseTypes.DeliveryFailureException. Enter a descriptive name for the Exception Object Name property.
  5. Add an Expression shape inside the exception handler block added in the previous step. Configure this shape to appropriately handle delivery failure exceptions. In our solution, we simply write the event to the trace log via the following code:
    System.Diagnostics.Trace.Write("Delivery Failure Exception Occurred - " +
    deliveryFailureExc.Message);

Figure 4. Capturing delivery failure exceptions


BizTalk Singleton Orchestration Design

Tuesday, July 5, 2011

Here’s a white paper from Microsoft on creating sequential FIFO orchestrations (this applies to singletons as well). It’s actually very hard to create an orchestration that ends cleanly with zero risk of losing messages, because a message might come in after the Listen shape fires but before the orchestration has terminated.
To test this, I created a simple singleton orchestration that adds a deliberate wait of 2 minutes before finishing.
I then deployed the orchestration and started sending in messages one by one (by the way, “Do Something” simply writes the one field in the message to the debugger).
So the message with value 1 was sent first, here’s the debugger output:
[12880] Field was: 1
Then the message with value 2 was sent in, here’s the output:
[12880] Field was: 2
Then, I waited about 45 seconds, just long enough to get past the first Delay shape in the orchestration (of 30 seconds). I submitted a message with value 3, but received no output (as expected). I waited until the remaining time had finished and saw the orchestration suspend with this error:
The orchestration was not resumable.  I then sent in a message with value 4, here’s the output:
[12880] Field was: 4
So, what does this prove? If a message comes in before your orchestration has had time to complete (and you’re no longer waiting to receive a message), you will have unprocessed messages in the orchestration when it goes to complete. I guess the good news is that you can see what that message was by clicking on the message tab, so if you’re willing to run this risk, you might go ahead with this decent, but not perfect, design.
To be absolutely fail-proof, the paper offers a few suggestions, such as stopping the receive location via a WMI script as part of the shutdown process. This is fine and all, except for one question: how is it supposed to be turned back on? If you do this as part of the same orchestration, you have the same problem you started with! I guess one way this could be done would be by adding a Start Orchestration shape, which begins with a delay and then enables the receive location again (the delay allows time for the calling orchestration to finish with no risk of losing messages).
Good luck!