com.reverseXSL.parser
Class Parser

java.lang.Object
  extended by com.reverseXSL.parser.Parser

public final class Parser
extends java.lang.Object

Provides the methods to translate input character streams into XML documents. The parsing is based on DEF files.

Please refer to the MS-Word documentation 'ReverseXSL DEF file specs.doc' for a complete description of the Definition objects and file syntaxes handled by this parser.
See also Definition.

Design Note: Given the simplicity of a parsing environment (simply comprising a DEF file) we have not associated a ParserFactory to the Parser itself. One shall simply instantiate a parser via the constructor:
myParser = new Parser(def, maxFatal, maxExceptions);
and then call it as often as desired, repeating in this case the same transformation, each time on a new message, as in:
myParser.parse(dataIn, ...);
Note that the parse() method itself returns a count of recorded exceptions. Additional methods are used to inspect the results and to render the output, which in the present version is available only as an XML-formatted document (additional output formats could be added in future releases). A Parser instance is a stateful object whose state is reset at the start of every new parse() call.

The present class provides a fairly low-level API for reverse XSL transformations. Please consider the TransformerFactory and Transformer objects for improved productivity.
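
The calling pattern described in the design note can be sketched as follows. The reverseXSL classes are not reproduced here, so MessageParser and ToyParser are hypothetical stand-ins that mirror only the parse(String, String, int) and getXML(boolean, boolean) shapes listed on this page (without their checked exceptions); the blank-line rule used by ToyParser is invented purely to produce a non-zero count. The sketch shows one instance being reused across several messages, with the returned count checked before rendering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ParserLifecycleSketch {

    /** Hypothetical stand-in that mirrors two method shapes documented on this page. */
    interface MessageParser {
        int parse(String msgID, String dataIn, int startLineNb);
        String getXML(boolean withRAW, boolean indent);
    }

    /** Toy parser: records one "exception" per blank line. NOT the reverseXSL logic. */
    static final class ToyParser implements MessageParser {
        private final List<String> recorded = new ArrayList<>();
        private int lineCount;

        @Override
        public int parse(String msgID, String dataIn, int startLineNb) {
            recorded.clear(); // state is reset at the start of every parse() call
            String[] lines = dataIn.split("\n", -1);
            lineCount = lines.length;
            for (int i = 0; i < lines.length; i++) {
                if (lines[i].trim().isEmpty()) {
                    recorded.add(msgID + ": blank line " + (startLineNb + i));
                }
            }
            return recorded.size();
        }

        @Override
        public String getXML(boolean withRAW, boolean indent) {
            // Placeholder rendering; the real parser renders the tagged message.
            return "<Message lines=\"" + lineCount + "\"/>";
        }
    }

    /** One parser instance, many messages: parse, check the count, then render. */
    static List<String> run(MessageParser parser, List<String> messages) {
        List<String> results = new ArrayList<>();
        int n = 0;
        for (String message : messages) {
            n++;
            int count = parser.parse("MSG-" + n, message, 1);
            results.add(count == 0 ? parser.getXML(false, true)
                                   : "rejected: " + count + " exception(s)");
        }
        return results;
    }

    public static void main(String[] args) {
        for (String r : run(new ToyParser(), Arrays.asList("A\nB", "A\n\nB", "C"))) {
            System.out.println(r);
        }
    }
}
```

The third message is accepted even though the second one was rejected, which is the observable effect of the state being reset at the start of each parse() call.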


Nested Class Summary
 class Parser.ExceptionListIterator
          This inner class specializes a ListIterator to support methods specific to handling the exception list recorded by the parser.
 
Constructor Summary
Parser()
          Required, but of little use on its own.
Parser(Definition msgDef, int maxFatEx, int maxEx)
          Initialises a new Parser object with a reference Definition and Exception handling parameters.
Parser(Definition msgDef, int maxFatEx, int maxEx, int maxMisMatch)
          Variant of Parser(Definition, int, int) that allows setting the maximum number of successive segment/element matching failures after which the parser will attempt to 'backtrack'.
 
Method Summary
 void adjustExceptionsLineOffsets(int adjustment)
          Adjusts the line offsets of all recorded exceptions by adding the given adjustment value to each of them (relevant whenever the input message is a set of lines).
 Parser.ExceptionListIterator exceptionIterator()
          Provides a List Iterator on the Array List of recorded exceptions (stored in the parser state after a parse(String, String, int) call).
 int extractCompositeValue(java.lang.StringBuffer sb, java.util.regex.Pattern ptrn)
          Extracts the concatenated value of all capturing groups in a complex pattern applied to a string (the pattern may match one or more times).
 int extractCompositeValue(java.lang.StringBuffer sb, java.util.regex.Pattern ptrn, java.lang.String sep)
          Extracts the concatenated value of all capturing groups in a complex pattern applied to a string (the pattern may match one or more times), with a separator between group values.
 java.io.StringWriter getXML(boolean withRAW, boolean indent)
          Provides an XML rendering of the tagged message as resulting from parsing.
 int parse(java.lang.String msgID, java.io.LineNumberReader dataIn, int startLineNb)
          Parses an input character stream using a LineNumberReader.
 int parse(java.lang.String msgID, java.lang.String dataIn, int startLineNb)
          Variant parse method starting from a string and falling back onto the other parse(String, LineNumberReader, int) method whenever the system discovers that the cut function at the Message level is CUT-ON-NL.
 void removeNonRepeatableNilOptionalElements(boolean tf)
          This method must be called before parsing itself.
 void setBaseNamespace(java.lang.String bns)
          Sets the base XML namespace for every following parse(String, LineNumberReader, int) invocation followed by getXML(boolean, boolean).
 java.lang.String toString()
          Dumps an overview of the parser state into a text string.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Parser

public Parser()
Required, but of little use on its own.

This constructor was made public only so that diverse utility methods can be invoked, notably in RegexCheck and Definition.

See Also:
Parser(Definition, int, int), Parser(Definition, int, int, int)

Parser

public Parser(Definition msgDef,
              int maxFatEx,
              int maxEx)
Initialises a new Parser object with a reference Definition and Exception handling parameters. The parse(String, LineNumberReader, int) method can then be invoked repeatedly on different input messages.

The maximum number of successive element-matching failures before backtracking is 3 by default. See Parser(Definition, int, int, int).

Parameters:
msgDef - the message Definition object to use for parsing
maxFatEx - the max number of fatal exceptions that will be recorded before being thrown
maxEx - the max number of all kinds of exceptions (including fatal ones) that will be recorded before being thrown
See Also:
Definition

Parser

public Parser(Definition msgDef,
              int maxFatEx,
              int maxEx,
              int maxMisMatch)
Variant of Parser(Definition, int, int) that allows setting the maximum number of successive segment/element matching failures after which the parser will attempt to 'backtrack'.

Backtracking means that the Parser will give up on the current input message element (i.e. skip data and leave it untagged as RAW input data), jump back (i.e. 'backtrack') to the last unmatched definition, and attempt to resume parsing from there.

Parameters:
msgDef - the message Definition object to use for parsing
maxFatEx - the max number of fatal exceptions that will be recorded before being thrown
maxEx - the max number of all kinds of exceptions (including fatal ones) that will be recorded before being thrown
maxMisMatch - the maximum number of successive element-matching failures after which the parser will attempt to resume parsing by skipping input data and backtracking into the definition
See Also:
Definition

Method Detail

adjustExceptionsLineOffsets

public void adjustExceptionsLineOffsets(int adjustment)
Adjusts the line offsets of all recorded exceptions by adding the given adjustment value to each of them (relevant whenever the input message is a set of lines). The adjustment can be either positive or negative.
Consistency rule: existing line offsets that would become negative after the adjustment are not updated. Using this method in a proper message parsing context should never produce such a case.

This method makes it possible to parse the text body of an input message alone (e.g. without the message's header lines) and then report the line offsets of any parsing errors relative to the very beginning of the message, header lines included. If an input interchange contains several messages, this facility helps to parse each message in turn while reporting offsets relative to the whole interchange.

Release note: a future release is planned that will clean up and normalize well-known EDI formats like EDIFACT and X12 before parsing, and treat segment offsets like line offsets.

Parameters:
adjustment - the value, positive or negative, added to the line offset of every recorded exception
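
The adjustment and its consistency rule can be illustrated with a minimal sketch. It works on a plain list of integers rather than on the parser's recorded exceptions; the class and method names are hypothetical and this is not the library's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LineOffsetAdjustSketch {

    /** Returns a copy of the offsets with the adjustment applied where the result stays >= 0. */
    static List<Integer> adjust(List<Integer> lineOffsets, int adjustment) {
        List<Integer> adjusted = new ArrayList<>();
        for (int offset : lineOffsets) {
            int candidate = offset + adjustment;
            // Consistency rule: an offset that would turn negative is left as it was.
            adjusted.add(candidate < 0 ? offset : candidate);
        }
        return adjusted;
    }

    public static void main(String[] args) {
        // Body parsed alone, while the original message had 4 header lines in front of it.
        System.out.println(adjust(Arrays.asList(1, 3, 7), 4));  // [5, 7, 11]
        // A negative adjustment that would push the first offset below zero.
        System.out.println(adjust(Arrays.asList(1, 5), -3));    // [1, 2]
    }
}
```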

exceptionIterator

public Parser.ExceptionListIterator exceptionIterator()
Provides a List Iterator on the Array List of recorded exceptions (stored in the parser state after a parse(String, String, int) call).

Note that the last exception, i.e. the one that caused the MaxFatal or MaxAllExceptions count to be exceeded (and was therefore thrown), is also recorded.

Returns:
an extended ListIterator supporting extra methods for improved iteration through a Parser state.

extractCompositeValue

public int extractCompositeValue(java.lang.StringBuffer sb,
                                 java.util.regex.Pattern ptrn)
Extracts the concatenated value of all capturing groups in a complex pattern applied to a string (the pattern may match one or more times). The procedure ignores sub-capturing groups (i.e. nested capturing groups), which would otherwise duplicate data in the result.

NOTE: This method was made public only so that it can be invoked by the RegexCheck tool.

Parameters:
sb - string buffer containing original string and returned with the extracted result
ptrn - pattern of reference with capturing groups
Returns:
the offset in the original string of the first byte of the extracted part

extractCompositeValue

public int extractCompositeValue(java.lang.StringBuffer sb,
                                 java.util.regex.Pattern ptrn,
                                 java.lang.String sep)
Extracts the concatenated value of all capturing groups in a complex pattern applied to a string (the pattern may match one or more times), inserting the given separator between group values. The procedure ignores sub-capturing groups (i.e. nested capturing groups), which would otherwise duplicate data in the result.

NOTE: This method was made public only so that it can be invoked by the RegexCheck tool.

Parameters:
sb - string buffer containing original string and returned with the extracted result
ptrn - pattern of reference with capturing groups
sep - separator between multiple capturing group values in the concatenated resulting string
Returns:
the offset in the original string of the first byte of the extracted part
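
The idea behind both extractCompositeValue variants can be approximated with java.util.regex alone. The sketch below is an assumption-laden illustration, not the library's algorithm: it detects nested groups with a positional heuristic (a group that starts inside a group already appended is skipped), and it returns -1 when nothing is captured, a convention this page does not specify. The class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CompositeValueSketch {

    /** Two-argument variant: same extraction with an empty separator. */
    static int extract(StringBuffer sb, Pattern ptrn) {
        return extract(sb, ptrn, "");
    }

    /**
     * Replaces the content of sb with the concatenated top-level group values and
     * returns the offset of the first extracted character in the original string,
     * or -1 if nothing was captured (assumption of this sketch).
     */
    static int extract(StringBuffer sb, Pattern ptrn, String sep) {
        Matcher m = ptrn.matcher(sb.toString());
        StringBuilder result = new StringBuilder();
        int firstOffset = -1;
        int lastEnd = 0; // end position of the most recently appended group
        while (m.find()) { // the pattern may match one or more times
            for (int g = 1; g <= m.groupCount(); g++) {
                int start = m.start(g);
                if (start < 0) {
                    continue; // group did not participate in this match
                }
                if (start < lastEnd) {
                    continue; // starts inside an appended group: treated as nested
                }
                if (firstOffset < 0) {
                    firstOffset = start;
                } else {
                    result.append(sep);
                }
                result.append(m.group(g));
                lastEnd = m.end(g);
            }
        }
        sb.setLength(0);
        sb.append(result);
        return firstOffset;
    }

    public static void main(String[] args) {
        StringBuffer sb = new StringBuffer("xx 12-ab 34-cd");
        // Group 3 is nested in group 2 and would duplicate its first character.
        int offset = extract(sb, Pattern.compile("(\\d+)-((\\w)\\w*)"), "|");
        System.out.println(offset + " " + sb); // 3 12|ab|34|cd
    }
}
```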

getXML

public java.io.StringWriter getXML(boolean withRAW,
                                   boolean indent)
                            throws javax.xml.parsers.ParserConfigurationException,
                                   javax.xml.parsers.FactoryConfigurationError,
                                   javax.xml.transform.TransformerFactoryConfigurationError,
                                   javax.xml.transform.TransformerException
Provides an XML rendering of the tagged message as resulting from parsing, i.e. after a parse(String, LineNumberReader, int) method call.

Data and Marks elements whose names start with the special character @ are promoted as attributes of the parent element.

Parameters:
withRAW - whether to generate RAW elements, i.e. untagged elements as well as those explicitly tagged as 'RAW'
indent - asks for indentation (only line breaks between elements, as true indentation does not work)
Returns:
the XML output in a StringWriter
Throws:
javax.xml.parsers.FactoryConfigurationError
javax.xml.parsers.ParserConfigurationException
javax.xml.transform.TransformerFactoryConfigurationError
javax.xml.transform.TransformerException
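
The promotion of @-prefixed names to attributes can be pictured with the standard DOM and transformer APIs. This is only a sketch of the rule, assuming the children of one parent are available as name/value pairs; the class name, the sample names Piece, Weight and @unit, and the map-based input are all hypothetical and say nothing about how the parser stores its tagged message.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class AttributePromotionSketch {

    /** Builds the parent element; names starting with '@' become attributes of it. */
    static Element build(Document doc, String parentName, Map<String, String> children) {
        Element parent = doc.createElement(parentName);
        for (Map.Entry<String, String> entry : children.entrySet()) {
            String name = entry.getKey();
            if (name.startsWith("@")) {
                // '@' is not a legal XML name character, so the prefix is dropped.
                parent.setAttribute(name.substring(1), entry.getValue());
            } else {
                Element child = doc.createElement(name);
                child.setTextContent(entry.getValue());
                parent.appendChild(child);
            }
        }
        return parent;
    }

    /** Serializes the document into a StringWriter, optionally asking for indentation. */
    static StringWriter render(Document doc, boolean indent) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, indent ? "yes" : "no");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out;
    }

    /** Small end-to-end demo with checked exceptions wrapped for convenience. */
    static String demo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Map<String, String> children = new LinkedHashMap<>();
            children.put("@unit", "KG");
            children.put("Weight", "12");
            doc.appendChild(build(doc, "Piece", children));
            return render(doc, true).toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```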

parse

public int parse(java.lang.String msgID,
                 java.io.LineNumberReader dataIn,
                 int startLineNb)
          throws java.io.IOException,
                 ParserException
Parses an input character stream using a LineNumberReader. This implementation is able to trace line offsets in Parser Exceptions whenever the MSG level cut-function is actually CUT-ON-NL.

The parsing is successful when no exceptions are thrown and the returned number of recorded exceptions is 0.

After parsing, the XML document can be generated using:
getXML(boolean, boolean)

Parameters:
msgID - a message ID (will be recorded in exceptions and traced)
dataIn - the line number reader, possibly reset(), so that readLine() will get the very first characters
startLineNb - the line number to assume after the first dataIn.readLine()
Returns:
the total count of exceptions that were recorded
Throws:
java.io.IOException
ParserException

parse

public int parse(java.lang.String msgID,
                 java.lang.String dataIn,
                 int startLineNb)
          throws java.io.IOException,
                 ParserException
Variant parse method starting from a string and falling back onto the other parse(String, LineNumberReader, int) method whenever the system discovers that the cut function at the Message level is CUT-ON-NL.

Parameters:
msgID - a message ID
dataIn - input string data message
startLineNb - starting line number (e.g. from the original message that also possibly contained a header/envelope)
Returns:
the total count of exceptions that were recorded
Throws:
java.io.IOException
ParserException
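
How a string message might be handed over as a LineNumberReader with a chosen starting line number can be sketched with the standard library. This is an assumption about one reasonable way to do it, not the parser's internal code; the class and method names are hypothetical. Because getLineNumber() is incremented by each readLine(), the counter is set to startLineNb - 1 so that the first line read is reported as startLineNb.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class LineNumberedInputSketch {

    /** Reads the message line by line and prefixes each line with its reported number. */
    static List<String> numberLines(String dataIn, int startLineNb) {
        List<String> numbered = new ArrayList<>();
        try (LineNumberReader reader = new LineNumberReader(new StringReader(dataIn))) {
            // Start one below, since every readLine() increments the counter.
            reader.setLineNumber(startLineNb - 1);
            String line;
            while ((line = reader.readLine()) != null) {
                numbered.add(reader.getLineNumber() + ": " + line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return numbered;
    }

    public static void main(String[] args) {
        // Message body that started on line 5 of the original interchange.
        for (String s : numberLines("AAA\nBBB\n", 5)) {
            System.out.println(s);
        }
    }
}
```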

removeNonRepeatableNilOptionalElements

public void removeNonRepeatableNilOptionalElements(boolean tf)
This method must be called before parsing itself (i.e. before parse(String, LineNumberReader, int)). When set to TRUE, it causes the removal of all data elements with a NIL value that are optional or conditional, whose matching definition indicates that the element is non-repeatable (i.e. ACC 1), and whose minimum size requirement is >0.

This function is quite useful on messages based on the principle of positional data elements within 'segments' (e.g. EDIFACT, TRADACOMS, X12, etc.). Indeed, most positions (think 'slots') in such segments are occupied by optional/conditional data elements, all unique and distinguished by their relative position in the 'segment'. Every unoccupied position will yield a corresponding NIL data element in XML, which can be suppressed from the XML output if this flag is set to TRUE.

NIL data elements are suppressed only if they have a min/max size specification (of the kind [1..15] ) with a minimum of at least 1. Obviously, if 0 is an acceptable size, there is no reason to suppress the element.

Moreover, the element must be non-repeatable; otherwise there is a risk of swallowing first and intermediate elements, causing undesirable rank shifts.

The default value is false.

Parameters:
tf - new value for the flag
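
The removal conditions listed above combine into a single predicate, sketched below. The Element class is a hypothetical, simplified view of a parsed data element together with the relevant bits of its definition, and treating NIL as a null or empty value is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class NilElementFilterSketch {

    /** Hypothetical view of a data element plus the relevant parts of its definition. */
    static final class Element {
        final String name;
        final String value;
        final boolean mandatory; // false for optional or conditional elements
        final int maxOccurs;     // 1 corresponds to ACC 1 (non-repeatable)
        final int minSize;       // minimum of the [min..max] size specification

        Element(String name, String value, boolean mandatory, int maxOccurs, int minSize) {
            this.name = name;
            this.value = value;
            this.mandatory = mandatory;
            this.maxOccurs = maxOccurs;
            this.minSize = minSize;
        }
    }

    /** True when all conditions described for the flag hold at once. */
    static boolean isRemovable(Element e) {
        boolean nil = e.value == null || e.value.isEmpty(); // assumption: NIL means no data
        return nil && !e.mandatory && e.maxOccurs == 1 && e.minSize >= 1;
    }

    /** Names of the elements that survive, given the flag value. */
    static List<String> keptNames(List<Element> elements, boolean removeFlag) {
        List<String> kept = new ArrayList<>();
        for (Element e : elements) {
            if (!(removeFlag && isRemovable(e))) {
                kept.add(e.name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Element> segment = Arrays.asList(
                new Element("C1", "", false, 1, 1),   // NIL, optional, ACC 1, min 1: removed
                new Element("C2", "X", false, 1, 1),  // has a value: kept
                new Element("C3", "", false, 1, 0),   // size 0 is acceptable: kept
                new Element("C4", "", false, 9, 1),   // repeatable: kept to avoid rank shifts
                new Element("C5", "", true, 1, 1));   // mandatory: kept
        System.out.println(keptNames(segment, true));  // [C2, C3, C4, C5]
        System.out.println(keptNames(segment, false)); // [C1, C2, C3, C4, C5]
    }
}
```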

setBaseNamespace

public final void setBaseNamespace(java.lang.String bns)
Sets the base XML namespace for every following parse(String, LineNumberReader, int) invocation followed by getXML(boolean, boolean).

This namespace is not reset in between parse(...) calls.

The default namespace is "http://www.reverseXSL.com/FreeParser". Calling this method with a null or empty argument resets the namespace to the default (as if setBaseNamespace() had never been invoked). Note that the namespace can be set either via the Java API or via the SET BASENAMESPACE statement in DEF files. If both are used, the API takes precedence.

Parameters:
bns - the namespace that applies to this parser instance, e.g. "http://www.reverseXSL.com/Cargo"
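
The precedence between the API value, the DEF file statement and the default can be summarized in a small sketch. The class and method names are hypothetical, and the assumption that a DEF file value applies whenever the API value is null or empty is an interpretation of the text above rather than documented behaviour.

```java
final class BaseNamespaceSketch {

    static final String DEFAULT_NS = "http://www.reverseXSL.com/FreeParser";

    /** Resolves the namespace to use: API value first, then DEF file value, then default. */
    static String effective(String apiValue, String defFileValue) {
        if (apiValue != null && !apiValue.isEmpty()) {
            return apiValue; // the API takes precedence when both are set
        }
        if (defFileValue != null && !defFileValue.isEmpty()) {
            return defFileValue; // assumed: SET BASENAMESPACE applies when the API is unset
        }
        return DEFAULT_NS;
    }

    public static void main(String[] args) {
        System.out.println(effective("http://www.reverseXSL.com/Cargo", "http://example.org/def"));
        System.out.println(effective("", null));
    }
}
```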

toString

public java.lang.String toString()
Dumps an overview of the parser state into a text string. Useful for debugging Definitions under development and testing their effect on sample messages.

Overrides:
toString in class java.lang.Object
Returns:
complete parser state dump as text.