Data

com.reverseXSL.message
Class Data

java.lang.Object
  com.reverseXSL.message.Data

public class Data
extends java.lang.Object

Wraps byte-oriented collections or structures into an object enriched with numerous methods capable of normalizing interchange data.

The original data piece (e.g. a ByteBuffer) is only wrapped; it is neither cloned nor copied. Therefore any later change to the argument data piece may jeopardize operations.
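The aliasing hazard described above can be illustrated with a plain java.nio.ByteBuffer, without involving Data itself. This is only a minimal sketch: the class WrapAliasingDemo and its methods are hypothetical names introduced for illustration and are not part of reverseXSL.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only uses JDK classes, not com.reverseXSL.message.Data.
final class WrapAliasingDemo {

    /** Decodes the whole buffer as UTF-8 through a duplicate, leaving the original position untouched. */
    static String decode(ByteBuffer bb) {
        ByteBuffer view = bb.duplicate();
        view.rewind();
        return StandardCharsets.UTF_8.decode(view).toString();
    }

    /** Returns the decoded text before and after the caller modifies the wrapped array. */
    static String[] demo() {
        byte[] raw = "ABC".getBytes(StandardCharsets.UTF_8);
        ByteBuffer wrapped = ByteBuffer.wrap(raw); // shares raw, no copy is made
        String before = decode(wrapped);
        raw[0] = 'X';                              // later change to the argument
        String after = decode(wrapped);            // the wrapper sees the change
        return new String[] { before, after };
    }

    public static void main(String[] args) {
        String[] r = demo();
        System.out.println(r[0] + " -> " + r[1]);
    }
}
```

A caller that needs to keep modifying its own array can pass a copy (for example via java.util.Arrays.copyOf) so that the wrapped bytes stay stable.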

Raw data received via some communication channel or read from various media is often affected by additional control characters, spurious record delimiters or other 'pollution' of the canonical formats required for fully automated processing. This Data class is used to wrap such raw data and yield clean, de-polluted, streams of bytes or characters for message processing.

NOTE: full use of this class is for future functional extensions of the reverseXSL software.

Field Summary
`static int`	`_1NewLineAtEnd` arg for `getConvertedData(int)` : suppress trailing empty lines and ensure that the very last data line bears a single line terminator.
`static int`	`_ASCII7bits` arg for `getConvertedData(int)` : control characters (value<32) are discarded except for tabs, carriage returns and line feeds.
`static int`	`_NoBlankLine` arg for `getConvertedData(int)` : suppress blank lines everywhere in the original data.
`static int`	`_NoCRLFBytes` arg for `getConvertedData(int)` : suppress all CR's and LF's.
`static int`	`_NoCtrlBytes` arg for `getConvertedData(int)` : control BYTES (value<32) are discarded except for tabs, carriage returns and line feeds.
`static int`	`_NoCtrlChars` arg for `getConvertedData(int)` : control characters (value<32) are discarded except for tabs, carriage returns and line feeds.
`static int`	`_NONE` arg for `getConvertedData(int)` : case of no conversion requested.
`static int`	`_ToCRLF` arg for `getConvertedData(int)` : convert standalone LF's to CRLF's, and preserve existing CRLF's.
`static int`	`_ToLF` arg for `getConvertedData(int)` : remove CR's.
`static int`	`_ToUPPER` arg for `getConvertedData(int)` : convert all characters to their uppercase equivalents (based on built-in java String methods).
`static int`	`_TrimNBSP` arg for `getConvertedData(int)` : Trim Non-Breaking SPaces (i.e. space chars and tabs) at the beginning and end of each line in the message body.
`static int`	`_UnfoldPSCRMRemarks` arg for `getConvertedData(int)` : restore the canonical long line of IATA PSCRM remarks elements that older systems cut in the middle to enforce the 69 chars limit of the even older TELEX transmission system.

Constructor Summary
`Data(byte[] ba)` Instantiate a Data object from a byte array, assuming UTF-8 as charset for character oriented operations on this data.
`Data(byte[] ba, java.nio.charset.Charset cs)` Instantiate a Data object from a byte array, with the explicit charset that must be assumed for character oriented operations on this data.
`Data(java.nio.ByteBuffer bb)` Instantiate a Data object from a byte buffer, assuming UTF-8 as charset for character oriented operations on this data.
`Data(java.nio.ByteBuffer bb, java.nio.charset.Charset cs)` Instantiate a Data object from a byte buffer, with the explicit charset that must be assumed for character oriented operations on this data.
`Data(java.io.InputStream inS, java.nio.charset.Charset cs)` Instantiate a Data object from an input stream, with the explicit charset that must be assumed for character oriented operations on this data.

Method Summary
`byte[]`	`getArray()` Get the backing byte array.
`byte[]`	`getBytes()`
`java.lang.StringBuffer`	`getConvertedData(int conversions)` Converts the Data bytes to characters while at the same time filtering and normalizing data.
`DataFormat`	`getFormat()` Get the data format type.
`DataFormat`	`identify()` Inspect data and set the data format type.
`static DataFormat`	`identify(java.lang.String msg)` Attempts an identification of the data format based on a short string.
`int`	`length()` get the actual data length.
`static int`	`tokenValue(java.lang.String opt)` utility method to convert named conversion tokens into the corresponding conversion token value.

Methods inherited from class java.lang.Object
`equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

_1NewLineAtEnd

public static final int _1NewLineAtEnd

arg for getConvertedData(int) : suppress trailing empty lines and ensure that the very last data line bears a single line terminator. The line terminator is either LF or CRLF according to other conversions or the existing data contents (by default).

See Also:: Constant Field Values

_ASCII7bits

public static final int _ASCII7bits

arg for getConvertedData(int) : control characters (value<32) are discarded except for tabs, carriage returns and line feeds. Character values above 127 are replaced by a '?'.

Note that this conversion operates on characters, not bytes, and thus also properly replaces each multibyte character (whose Unicode value is always >127) with a single '?'.

See Also:: Constant Field Values
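The character-level rule described for _ASCII7bits can be sketched as follows. This is not the reverseXSL implementation; the class Ascii7Sketch and the method toAscii7 are hypothetical, and the sketch assumes that "character" means a Unicode code point, so that a surrogate pair also yields a single '?'.

```java
// Hypothetical sketch of the documented _ASCII7bits behaviour, applied to decoded text.
final class Ascii7Sketch {

    /** Drops control characters (<32) except tab, CR and LF; replaces anything above 127 with '?'. */
    static String toAscii7(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            if (cp > 127) {
                sb.append('?');              // one '?' per character, whatever its encoded size
            } else if (cp >= 32 || cp == '\t' || cp == '\r' || cp == '\n') {
                sb.append((char) cp);        // plain 7-bit character, kept as-is
            }
            // remaining control characters are silently discarded
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAscii7("caf\u00e9\u0001!"));
    }
}
```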

_NoBlankLine

public static final int _NoBlankLine

arg for getConvertedData(int) : suppress blank lines everywhere in the original data. Precisely, both the true empty lines and those containing only spaces or tabs are removed.

This is a byte-oriented method, applied before decoding bytes into characters!

See Also:: Constant Field Values

_NoCRLFBytes

public static final int _NoCRLFBytes

arg for getConvertedData(int) : suppress all CR's and LF's. In other words, consider the original data as a very long line.

This is a byte-oriented method, applied before decoding bytes into characters!

It is most useful when added to _NoCtrlBytes, in which case tabs are the only bytes with a value <32 that are preserved.

See Also:: Constant Field Values

_NoCtrlBytes

public static final int _NoCtrlBytes

arg for getConvertedData(int) : control BYTES (value<32) are discarded except for tabs, carriage returns and line feeds. 8-bit values (above 127) are preserved.

This is a byte-oriented method, applied before decoding bytes into characters!

Compared with _NoCtrlChars, the supporting function operates on bytes and not characters, and thus may discard bytes actually belonging to multibyte character encodings, thus scrambling the original data!

However, this potential side effect will not affect UTF-8 encodings, because every byte of a multibyte UTF-8 sequence has a value of 128 or above (the most significant bit is always 1 by construction). UTF-16 encodings (misnamed 'Unicode' versus UTF-8 in MS-Windows) will notably be scrambled.

The function is particularly useful whenever legacy single-byte character codings are expected (e.g. ISO-8859) and must be de-polluted.

See Also:: Constant Field Values
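The difference between UTF-8 and UTF-16 under a byte-level filter can be checked with a small experiment. The sketch below is an assumption about what such a filter looks like, based only on the description of _NoCtrlBytes; CtrlByteFilterSketch, dropCtrlBytes and roundTrip are hypothetical names, not reverseXSL APIs.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: filter bytes first, decode afterwards.
final class CtrlByteFilterSketch {

    /** Discards bytes with a value below 32, except tab, LF and CR; 8-bit values are kept. */
    static byte[] dropCtrlBytes(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length);
        for (byte b : in) {
            int v = b & 0xFF; // unsigned byte value
            if (v >= 32 || v == '\t' || v == '\n' || v == '\r') {
                out.write(v);
            }
        }
        return out.toByteArray();
    }

    /** Encodes text, filters the bytes, and decodes the result with the same charset. */
    static String roundTrip(String text, Charset cs) {
        return new String(dropCtrlBytes(text.getBytes(cs)), cs);
    }

    public static void main(String[] args) {
        String text = "A\u0007\u00e9"; // 'A', BEL control character, e-acute
        // UTF-8: only the BEL byte is dropped, the two-byte e-acute survives.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals("A\u00e9"));
        // UTF-16BE: the 0x00 high bytes are dropped too, so the text is scrambled.
        System.out.println(roundTrip(text, StandardCharsets.UTF_16BE).equals("A\u00e9"));
    }
}
```

In UTF-16 every character below U+0100 carries a 0x00 byte, which the filter treats as a control byte; this is why the remaining bytes no longer pair up into the original characters.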

_NoCtrlChars

public static final int _NoCtrlChars

arg for getConvertedData(int) : control characters (value<32) are discarded except for tabs, carriage returns and line feeds. All other character values are preserved.

Compared with _NoCtrlBytes, the supporting function preserves character values that would be encoded as 8-bit values in ISO-8859, and all multibyte characters in UTF-16 Unicode Transformation Formats.

See Also:: Constant Field Values

_NONE

public static final int _NONE

arg for getConvertedData(int) : case of no conversion requested.

See Also:: Constant Field Values

_ToCRLF

public static final int _ToCRLF

arg for getConvertedData(int) : convert standalone LF's to CRLF's, and preserve existing CRLF's.

See Also:: Constant Field Values

_ToLF

public static final int _ToLF

arg for getConvertedData(int) : remove CR's.

See Also:: Constant Field Values
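The two line-terminator conversions, _ToCRLF and _ToLF, can be approximated on already decoded text with standard String methods. This is a sketch under the assumption that the documented behaviour is all there is to them; LineTerminatorSketch, toCrLf and toLf are hypothetical names.

```java
// Hypothetical sketch of the documented _ToCRLF and _ToLF behaviour.
final class LineTerminatorSketch {

    /** Converts standalone LF's to CRLF's; an LF already preceded by CR is left alone. */
    static String toCrLf(String s) {
        return s.replaceAll("(?<!\r)\n", "\r\n");
    }

    /** Removes every CR, so that CRLF terminators become plain LF's. */
    static String toLf(String s) {
        return s.replace("\r", "");
    }

    public static void main(String[] args) {
        String mixed = "a\nb\r\nc\n";
        System.out.println(toCrLf(mixed).equals("a\r\nb\r\nc\r\n"));
        System.out.println(toLf(mixed).equals("a\nb\nc\n"));
    }
}
```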

_ToUPPER

public static final int _ToUPPER

arg for getConvertedData(int) : convert all characters to their uppercase equivalents (based on built-in java String methods).

See Also:: Constant Field Values

_TrimNBSP

public static final int _TrimNBSP

arg for getConvertedData(int) : Trim Non-Breaking SPaces (i.e. space chars and tabs) at the beginning and end of each line in the message body.

Note that if you combine this trim operation with _NoCRLFBytes you will only trim NBSP leading and trailing the entire data because _NoCRLFBytes transforms the whole data into a single big line first!

See Also:: Constant Field Values
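The interaction between _TrimNBSP and _NoCRLFBytes comes down to the order of operations, which the following sketch makes visible on plain strings. It assumes that line terminators are removed before trimming, as the note above states; TrimOrderSketch, trimEachLine and dropCrLf are hypothetical names, not reverseXSL methods.

```java
// Hypothetical sketch: per-line trimming with and without prior CR/LF removal.
final class TrimOrderSketch {

    /** Trims spaces and tabs at the beginning and end of each line. */
    static String trimEachLine(String s) {
        return s.replaceAll("(?m)^[ \t]+|[ \t]+$", "");
    }

    /** Removes all CR's and LF's, turning the data into one long line. */
    static String dropCrLf(String s) {
        return s.replace("\r", "").replace("\n", "");
    }

    public static void main(String[] args) {
        String data = " a \n b ";
        // Trim alone: every line loses its leading and trailing blanks.
        System.out.println("[" + trimEachLine(data) + "]");
        // CR/LF removal first: only the ends of the single remaining line are trimmed.
        System.out.println("[" + trimEachLine(dropCrLf(data)) + "]");
    }
}
```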

_UnfoldPSCRMRemarks

public static final int _UnfoldPSCRMRemarks

arg for getConvertedData(int) : IATA PSCRM messages still generated by older systems can enforce the 69 chars limit of the even older TELEX transmission system by cutting lines in the middle of remarks elements, e.g.

1BIMMEL/LMRS-BV2 .L/272397 .R/TOP KRSV .R/CKIN HK1 1BAG 05KG-1BIMMEL/LMRS

becomes:

1BIMMEL/LMRS-BV2 .L/272397 .R/TOP KRSV .R/CKIN HK1 1BAG
.RN/05KG-1BIMMEL/LMRS

thus breaking the .R/CKIN check-in luggage segment that normally runs to the end of line. This flag restores the canonical long line.

See Also:: Constant Field Values
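The unfolding step can be sketched from the single example given above. The rule used here (a line terminator followed by ".RN/" is replaced by one space) is inferred from that example only and may not cover every case the real conversion handles; PscrmUnfoldSketch and unfold are hypothetical names.

```java
// Hypothetical sketch, inferred from the documented example only.
final class PscrmUnfoldSketch {

    /** Joins a ".RN/" continuation line back onto the preceding line with a single space. */
    static String unfold(String s) {
        return s.replaceAll("\\r?\\n\\.RN/", " ");
    }

    public static void main(String[] args) {
        String folded = "1BIMMEL/LMRS-BV2 .L/272397 .R/TOP KRSV .R/CKIN HK1 1BAG\r\n"
                      + ".RN/05KG-1BIMMEL/LMRS";
        System.out.println(unfold(folded));
    }
}
```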

Constructor Detail

Data

public Data(byte[] ba)

Instantiate a Data object from a byte array, assuming UTF-8 as charset for character oriented operations on this data.

Parameters:: ba - the byte array wrapped as Data, which is NOT copied (later changes to the byte array may adversely affect this Data)

Data

public Data(byte[] ba,
            java.nio.charset.Charset cs)

Instantiate a Data object from a byte array, with the explicit charset that must be assumed for character oriented operations on this data.

Parameters:: ba - the byte array wrapped as Data, which is NOT copied (later changes to the byte array may adversely affect this Data); cs - if null defaults back to UTF-8

Data

public Data(java.nio.ByteBuffer bb)

Instantiate a Data object from a byte buffer, assuming UTF-8 as charset for character oriented operations on this data.

Parameters:: bb - the byte buffer wrapped as Data, which is rewound but NOT copied (later changes to the ByteBuffer may adversely affect this Data)

Data

public Data(java.nio.ByteBuffer bb,
            java.nio.charset.Charset cs)

Instantiate a Data object from a byte buffer, with the explicit charset that must be assumed for character oriented operations on this data.

Parameters:: bb - the byte buffer wrapped as Data, which is rewound but NOT copied (later changes to the ByteBuffer may adversely affect this Data); cs - if null defaults back to UTF-8

Data

public Data(java.io.InputStream inS,
            java.nio.charset.Charset cs)
     throws java.io.IOException

Instantiate a Data object from an input stream, with the explicit charset that must be assumed for character oriented operations on this data.

Parameters:: inS - the source of byte-oriented data; cs - if null defaults back to UTF-8
Throws:: java.io.IOException - as would result from read errors from the argument input stream

Method Detail

getArray

public byte[] getArray()

Get the backing byte array. Note that its size is often greater than the actual data.

Returns:: backing array of bytes.

getBytes

public byte[] getBytes()

getConvertedData

public java.lang.StringBuffer getConvertedData(int conversions)

Converts the Data bytes to characters while at the same time filtering and normalizing data.

The character set specified at instantiation (or default UTF-8) is used to interpret bytes into characters.

Parameters:: conversions - either the value _NONE, or the sum of one or more of the constants _ToCRLF, _ToLF, _1NewLineAtEnd, _ToUPPER, _ASCII7bits, _TrimNBSP, _NoCtrlBytes, _NoCRLFBytes, _NoBlankLine, _NoCtrlChars.
Returns:: string buffer
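Combining conversions by addition only works unambiguously if each constant occupies its own bit. The sketch below illustrates that idea with made-up values; the actual values of the Data constants are not given on this page (see Constant Field Values), and ConversionFlagsSketch with its fields and method is entirely hypothetical.

```java
// Hypothetical sketch: the values below are assumptions, NOT the real Data constants.
final class ConversionFlagsSketch {
    static final int NONE = 0;
    static final int TO_LF = 1;          // assumed distinct powers of two
    static final int NO_BLANK_LINE = 2;
    static final int TO_UPPER = 4;

    /** Tells whether a given conversion is part of the combined request. */
    static boolean requested(int conversions, int flag) {
        return (conversions & flag) != 0;
    }

    public static void main(String[] args) {
        int conversions = TO_LF + TO_UPPER; // sum of distinct flags, same as TO_LF | TO_UPPER
        System.out.println(requested(conversions, TO_LF));
        System.out.println(requested(conversions, NO_BLANK_LINE));
    }
}
```

With the real class, the documented call form would be along the lines of getConvertedData(Data._ToLF + Data._1NewLineAtEnd); adding the same constant twice should be avoided, since under the power-of-two assumption the sum would then denote a different conversion.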

getFormat

public DataFormat getFormat()

Get the data format type.

Returns:: one of ANY, IATA, CSV, TEXT, XML, EDIFACT, X12, TRADACOMS, SWIFT, PROPRIETARY, BINARY

identify

public DataFormat identify()

Inspect data and set the data format type.

Returns:: one of ANY, IATA, CSV, TEXT, XML, EDIFACT, X12, TRADACOMS, SWIFT, PROPRIETARY, BINARY

identify

public static DataFormat identify(java.lang.String msg)

Attempts an identification of the data format based on a short string. Only the first 100 characters are actually inspected.

Parameters:: msg - typically, a short string
Returns:: one of ANY, IATA, CSV, TEXT, XML, EDIFACT, X12, TRADACOMS, SWIFT, PROPRIETARY, BINARY

length

public int length()

get the actual data length.

Returns:: length in bytes, as an integer, i.e. up to 2 Gbytes (Integer.MAX_VALUE)

tokenValue

public static final int tokenValue(java.lang.String opt)

utility method to convert named conversion tokens into the corresponding conversion token value.

Parameters:: opt - the named value
Returns:: the matching integer value, -1 if not found
