Our integration implementation makes heavy use of the Canonical Data Model pattern. In short, we accept data in a variety of formats (XML, CSV or proprietary) and map it to an XML schema and/or Java object model. Beyond the standard transport transformations supplied by Mule, we needed to implement a zoo of custom transformers to move to the canonical format. I was looking for a framework that would reduce this complexity overhead.
Around the time I was thinking about this, I read this article on InfoQ about Smooks. It seemed like a good fit, especially since there is a Mule module for it. To make a long story short, we were able to upgrade to Mule 2.x and, using Smooks, avoid implementing any model-specific Mule transformers.
Smooks works by streaming data in, transforming it and streaming it out. "Cartridges" supply various transformation capabilities and exist for common data formats like XML, JSON and CSV. The streaming model means that a transformation doesn't require the entire document to be loaded into memory, so large documents can be transformed without a correspondingly large memory footprint.
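To make the streaming idea concrete, here is a minimal sketch using the plain JDK SAX parser rather than Smooks itself. The class and method names (`StreamingCount`, `countElements`) are made up for illustration. The handler sees one event at a time, so nothing here builds a tree of the whole document.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class StreamingCount {
    /** Counts elements with the given name while the document streams past. */
    public static int countElements(String xml, String name) {
        final int[] count = {0};
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String local,
                                             String qName, Attributes attrs) {
                        // Called once per opening tag; no document tree is kept.
                        if (qName.equals(name)) {
                            count[0]++;
                        }
                    }
                });
        } catch (Exception e) {
            // Wrapped only to keep the sketch easy to call.
            throw new IllegalStateException(e);
        }
        return count[0];
    }
}
```

A `String` is used as input only to keep the example short. With a file or socket stream as the `InputSource`, the same handler would work on documents much larger than the heap.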
The transformations can be accomplished via XML configuration, assuming the data formats being used have an associated cartridge. The same is true if the data can easily be converted to a format that has one. For instance, we have Nagios 2.x instances that write alert data to a semicolon-delimited status.log. A simple Groovy script allowed me to replace the semicolons with commas. I was then able to use the CSV cartridge to convert the data to XML.
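The conversion step could look roughly like the sketch below. It is written in Java rather than Groovy and is not the original script. The quoting of fields that already contain a comma or a double quote is an extra precaution I am assuming here, and it only helps if the CSV parser downstream follows the usual double-quote convention. The name `SemicolonToCsv` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SemicolonToCsv {
    /** Converts one semicolon-delimited line into a comma-delimited line. */
    public static String toCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        for (String field : line.split(";", -1)) {
            if (field.contains(",") || field.contains("\"")) {
                // Quote the field and double any embedded quotes (assumed convention).
                field = "\"" + field.replace("\"", "\"\"") + "\"";
            }
            fields.add(field);
        }
        return String.join(",", fields);
    }
}
```

For example, `toCsvLine("a;b,c;;d")` keeps the empty third field and quotes the second one, so the column count stays the same after conversion.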
The above Nagios instances are being upgraded to Nagios 3.x, where the status.log format is different. Instead of being semicolon-delimited, it is in a proprietary format that looks a bit like JSON.
There obviously isn't a Smooks cartridge that supports this format. One solution might be to convert it to JSON first. That would probably work, but it would likely be error-prone (and annoying to implement). An alternative is to implement an XMLReader that parses the file and emits XML. Here's an example of the format:
servicestatus {
host_name=liro_url_laces0
service_description=liro_https://acmesoft.com/VI/Pages/General/TestConn.aspx
modified_attributes=0
check_command=check_https!/VI/
check_period=24x7
notification_period=24x7
check_interval=15.000000
retry_interval=2.000000
event_handler=
has_been_checked=1
..
}
Smooks uses implementations of XMLReader to parse arbitrary file formats as XML. It then operates on the SAX stream or DOM as dictated by a configuration file. The following illustrates an implementation of the parse method of XMLReader that will parse the status.log format above:
public void parse(InputSource inputSource) throws IOException, SAXException {
    if (contentHandler == null) {
        throw new IllegalStateException("'contentHandler' not set. Cannot parse Nagios status.log stream.");
    }
    // Name of the block currently open, or null when outside a block.
    String currentBlock = null;
    contentHandler.startDocument();
    contentHandler.startElement(XMLConstants.NULL_NS_URI, "statusLog", "", EMPTY_ATTRIBS);
    for (String line : getString(inputSource).split("\n")) {
        // Skip comment lines.
        if (line.startsWith("#"))
            continue;
        // "servicestatus {" opens a block, which becomes an element.
        if (line.contains("servicestatus")) {
            String block = StringUtils.deleteWhitespace(line.split("\\{")[0]);
            contentHandler.startElement(XMLConstants.NULL_NS_URI, block, "", EMPTY_ATTRIBS);
            currentBlock = block;
        }
        if (currentBlock != null) {
            // Each key=value line becomes a child element named after the key.
            if (line.contains("=")) {
                String[] fields = line.split("=", 2);
                String fieldName = StringEscapeUtils.escapeXml(StringUtils.deleteWhitespace(fields[0].replace("=", "")));
                contentHandler.startElement(XMLConstants.NULL_NS_URI, fieldName, "", EMPTY_ATTRIBS);
                if (fields.length > 1) {
                    String content = StringEscapeUtils.escapeXml(fields[1]);
                    contentHandler.characters(content.toCharArray(), 0, content.length());
                } else {
                    contentHandler.characters(" ".toCharArray(), 0, 1);
                }
                contentHandler.endElement(XMLConstants.NULL_NS_URI, fieldName, "");
            }
            // "}" closes the current block.
            if (line.contains("}")) {
                contentHandler.endElement(XMLConstants.NULL_NS_URI, currentBlock, "");
                currentBlock = null;
            }
        }
    }
    contentHandler.endElement(XMLConstants.NULL_NS_URI, "statusLog", "");
    contentHandler.endDocument();
}
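A reader like this is easy to exercise outside of Smooks and Mule, since it only needs a ContentHandler to talk to. The following is a self-contained sketch of that idea, not the reader from this post: it extends the JDK's `XMLFilterImpl` to get default implementations of the other XMLReader methods, reads the input line by line instead of splitting one big string, requires a character stream on the InputSource, and treats any `name {` line as a block opener. All of those choices, and the names `StatusLogReaderSketch` and `trace`, are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class StatusLogReaderSketch extends XMLFilterImpl {
    private static final Attributes NO_ATTRS = new AttributesImpl();

    @Override
    public void parse(InputSource input) throws IOException, SAXException {
        ContentHandler out = getContentHandler();
        if (out == null) {
            throw new IllegalStateException("contentHandler not set");
        }
        if (input.getCharacterStream() == null) {
            throw new IOException("this sketch needs a character stream");
        }
        BufferedReader lines = new BufferedReader(input.getCharacterStream());
        String block = null; // name of the open block, or null
        out.startDocument();
        out.startElement("", "statusLog", "statusLog", NO_ATTRS);
        for (String raw = lines.readLine(); raw != null; raw = lines.readLine()) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank line or comment
            }
            if (block == null) {
                if (line.endsWith("{")) { // e.g. "servicestatus {"
                    block = line.substring(0, line.length() - 1).trim();
                    out.startElement("", block, block, NO_ATTRS);
                }
            } else if (line.equals("}")) {
                out.endElement("", block, block);
                block = null;
            } else {
                int eq = line.indexOf('=');
                if (eq > 0) { // key=value, value may be empty
                    String name = line.substring(0, eq).trim();
                    char[] value = line.substring(eq + 1).toCharArray();
                    out.startElement("", name, name, NO_ATTRS);
                    out.characters(value, 0, value.length);
                    out.endElement("", name, name);
                }
            }
        }
        out.endElement("", "statusLog", "statusLog");
        out.endDocument();
    }

    /** Runs the reader over a string and records the SAX events as text. */
    public static String trace(String statusLog) {
        final StringBuilder sb = new StringBuilder();
        StatusLogReaderSketch reader = new StatusLogReaderSketch();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes a) {
                sb.append('<').append(qName).append('>');
            }
            @Override
            public void characters(char[] ch, int start, int length) {
                sb.append(ch, start, length);
            }
            @Override
            public void endElement(String uri, String local, String qName) {
                sb.append("</").append(qName).append('>');
            }
        });
        try {
            reader.parse(new InputSource(new StringReader(statusLog)));
        } catch (IOException | SAXException e) {
            throw new IllegalStateException(e);
        }
        return sb.toString();
    }
}
```

The recording handler in `trace` does no XML escaping, so it is only good for checking the event sequence in a unit test. Values are passed to `characters` unescaped here on the assumption that whatever serializes the events handles escaping.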
We can plug the reader into the Smooks XML config:
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
                      xmlns:csv="http://www.milyn.org/xsd/smooks/csv-1.1.xsd"
                      xmlns:ftl="http://www.milyn.org/xsd/smooks/freemarker-1.1.xsd"
>
    <params>
        <param name="stream.filter.type">SAX</param>
        <param name="default.serialization.on">false</param>
    </params>
    <reader class="net.opsource.osb.reader.NagiosReader"/>
    <resource-config selector="servicestatus">
        <resource>org.milyn.delivery.DomModelCreator</resource>
    </resource-config>
    <ftl:freemarker applyOnElement="statusLog">
        <ftl:template><!--
<ApplicationResponseTimes>
<?TEMPLATE-SPLIT-PI?>
</ApplicationResponseTimes>
-->
        </ftl:template>
    </ftl:freemarker>
    <ftl:freemarker applyOnElement="servicestatus">
        <ftl:template>smooks/monitoring/application_response_time/metric.ftl</ftl:template>
    </ftl:freemarker>
</smooks-resource-list>
Now we plug it into Mule using the Smooks module and we're ready to go.
<smooks:transformer name="nagiosStatusLineToXML"
                    configFile="smooks/monitoring/application_response_time/smooks-config.xml"
                    resultType="STRING"/>
I'm pretty excited about this because I'm no longer writing a dedicated transformer for each domain model I'm mapping data to. I just need to implement XMLReaders when I come across a data format not already supported by a Smooks cartridge.