BizTalk Aggregation Pattern for Large Messages

Monday, August 29, 2011

Source : MSDN Blogs Richard Seroter
There are a few instances of Aggregator patterns for BizTalk Server, but I recently had a customer who was dealing with a very large batch, and therefore the existing patterns were insufficient. So, I threw together a sample that showed how to handle very large batches in a fairly speedy manner.
In this use case, an XML message comes into BizTalk, and we want to split out the batch, process each message individually, then aggregate the whole batch back together and send a single message out. The existing aggregation patterns didn't work here since they mostly involve building up an XML file in memory. When a batch of 1000+ messages was submitted, it took over 10 minutes to process, which was way too slow.
So, my solution uses a combination of envelope de-batching, sequential convoy pattern, and file streaming. The first part of the solution was to build the XML schemas necessary. The first schema, the envelope schema, looked like this ...

Three things to note here. First, I set the <Schema> node's Envelope property to Yes. This way I won't have to set anything in my pipeline for it to process the batch. Secondly, I promoted both the BatchID and Count fields. We'll see the significance in a moment. Thirdly, I set the Body XPath property of the root node to the Record node of the schema. This way, the pipeline will see that it's an envelope and know where to split the batch.
The second schema is the body schema. It looks like this ...

Notice that it too has promoted properties. By doing this, the values from the master envelope schema get copied down to EACH debatched message without me doing anything special. So, setting up my schemas this way, in combination with using the standard XML receive pipeline, will cause my message to be debatched, and, push values down from the envelope to each message. Cool!
Remember that I have to rebuild the message on the way out. So, I use the file system to build this message up, instead of doing it in memory and increasing my footprint. Therefore, when I first receive the message into my orchestration, I need to create this temporary file on disk (and re-create the envelope header). Likewise, when all the messages have been received and processed, I also need to write the closing envelope text. This way I have a temporary file on disk that contains the envelope and all the processed message bodies.
So, I wrote a quick component to write the envelope header and footer to disk. It also will load the completed file and return an XML document. It looks like this ...
using System.IO;
using System.Xml;

public void WriteBatchHeader()
{
    string docpath = @"C:\Data\AggTemp\Dropoff\output.xml";
    // Envelope header text
    string header = "";

    // Open in append mode; the using block closes the writer even if the write fails.
    using (StreamWriter sw = new StreamWriter(docpath, true))
    {
        sw.Write(header);
    }
}

public void WriteBatchFooter()
{
    string docpath = @"C:\Data\AggTemp\Dropoff\output.xml";
    // Closing envelope text
    string footer = "";

    // Append the footer after the message bodies already written to the file.
    using (StreamWriter sw = new StreamWriter(docpath, true))
    {
        sw.Write(footer);
    }
}

public XmlDocument LoadBatchFile()
{
    string docpath = @"C:\Data\AggTemp\Dropoff\output.xml";

    // Read the completed file from disk and return it as one XML document.
    XmlDocument xmlDoc = new XmlDocument();
    using (StreamReader sr = new StreamReader(docpath))
    {
        xmlDoc.Load(sr);
    }

    return xmlDoc;
}
Alright, now I have my orchestration. The first half of the process does the following:
  • Receive the first debatched message.
  • Call my helper component (code above) to create the temporary file and add the top XML structure.
  • Send the message out (initializing the correlation set) via the FILE adapter (see below for configuration).
  • In the Expression shape, set the loop counter from the promoted/distinguished Count field (which holds the number of records in the entire batch).
  • Process each subsequent debatched message via a sequential convoy, following the previously initialized correlation set. In this example, I do no further message processing and simply send the message back out via the FILE adapter.
Once the loop is done, that means I've processed all the debatched records. So, the last half of the orchestration deals with the final processing. Here I do the following:
  • Add a 10-second delay. This was necessary because the messaging engine was still writing files to disk as the next component was trying to add the closing XML tags.
  • Call the helper component (code above) to write the XML footer to the end of the file.
  • Construct the outbound message by setting the message variable equal to the response of the LoadBatchFile() method defined above. I added trace points to see how long this took, and it was sub-second. This was to be expected even on the large file since I'm just streaming it all in.
  • Finally, send the rebuilt message back out via any adapter.
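
The fixed 10-second delay in the steps above is a guess about how long the messaging engine needs. One possible refinement, sketched below, is to poll the output file until its size stops changing before writing the footer. This is only a heuristic and an assumption on my part, not part of the original sample: the FileQuietWait class, the WaitUntilQuiet method, and the timing values are all hypothetical names and numbers chosen for illustration.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical helper: waits for a file to stop growing instead of sleeping a fixed time.
public static class FileQuietWait
{
    // Returns true once the file size has stayed unchanged for quietPeriod,
    // or false if the timeout elapses first (including when the file never appears).
    public static bool WaitUntilQuiet(string path, TimeSpan quietPeriod, TimeSpan timeout)
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        long lastLength = -1;
        DateTime lastChange = DateTime.UtcNow;

        while (DateTime.UtcNow < deadline)
        {
            // Create a fresh FileInfo each pass so the length is not a cached value.
            var info = new FileInfo(path);
            long length = info.Exists ? info.Length : -1;

            if (length != lastLength)
            {
                // The file grew (or just appeared): restart the quiet timer.
                lastLength = length;
                lastChange = DateTime.UtcNow;
            }
            else if (length >= 0 && DateTime.UtcNow - lastChange >= quietPeriod)
            {
                return true;
            }

            Thread.Sleep(100);
        }

        return false;
    }
}
```

A call such as FileQuietWait.WaitUntilQuiet(docpath, TimeSpan.FromSeconds(2), TimeSpan.FromSeconds(30)) would stand in for the Delay shape. A quiet file is still not proof that every message has been written, so treat the quiet period as a tunable safety margin.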

The last step was to configure the FILE send adapter being used to write this file to disk. As you can see in the picture, I set the PassThruTransmit pipeline and configured the Copy mode to Append. This caused each message sent to this port to be added to the same outbound file.
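
To see the append-based flow in isolation, here is a minimal sketch that imitates it outside of BizTalk: write the envelope header, append each record to the same file (standing in for the FILE send port in Append mode), write the footer, then load the result. The BatchFileSketch class, the Batch root element, the Id field, and the temp-file path are all assumptions made for illustration; the real envelope text comes from your own schemas.

```csharp
using System;
using System.IO;
using System.Xml;

public static class BatchFileSketch
{
    // Hypothetical envelope text; the real header and footer depend on the envelope schema.
    const string Header = "<Batch><BatchID>B1</BatchID>";
    const string Footer = "</Batch>";

    // Rebuilds a batch on disk: header first, then one append per record, then the footer.
    public static XmlDocument BuildBatch(string path, int recordCount)
    {
        // Start clean so a stale file from an earlier run is not appended to.
        File.Delete(path);

        // Step 1: the helper component writes the opening envelope text.
        File.AppendAllText(path, Header);

        // Step 2: stand-in for the FILE send port in Append mode,
        // which adds each processed message body to the same file.
        for (int i = 1; i <= recordCount; i++)
        {
            File.AppendAllText(path, "<Record><Id>" + i + "</Id></Record>");
        }

        // Step 3: the helper component closes the envelope.
        File.AppendAllText(path, Footer);

        // Step 4: load the finished file as a single XML document.
        var doc = new XmlDocument();
        using (var reader = new StreamReader(path))
        {
            doc.Load(reader);
        }
        return doc;
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "agg-sketch-output.xml");
        XmlDocument doc = BuildBatch(path, 1000);
        Console.WriteLine(doc.SelectNodes("/Batch/Record").Count);
    }
}
```

Note that the file is not well-formed XML until the footer is written, which is why the load has to wait until every record has been appended.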
So, when it was all said and done, I was able to process a batch with 1000 messages in around 2 minutes on a laptop. This was significantly better than the 10+ minutes the customer was experiencing. I'm still using a sequential convoy, so I can't process messages as quickly as if I were spawning lots of unique orchestrations, but the speed improvement came from streaming files around vs. continually accessing the XML DOM. Neat stuff.
