java.lang.Object
  extended by com.reverseXSL.message.Data
public class Data
Wraps byte-oriented collections or structures into an object enriched with numerous methods capable of normalizing interchange data.
The original data piece (e.g. a ByteBuffer) is only wrapped, neither cloned nor copied. Therefore any later change to the argument data piece may jeopardize operations.
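The JDK's own ByteBuffer.wrap shares its backing array in the same way. The following plain-JDK sketch (no reverseXSL classes; the class name WrapVsCopy is illustrative) shows the effect, and the defensive copy a caller could make with Arrays.copyOf before handing a reusable array to a Data constructor.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrates "wrap, don't copy" with the JDK's ByteBuffer.wrap, which shares
// its backing array just as Data is documented to do.
class WrapVsCopy {

    // Decodes the whole buffer as UTF-8, reading from a duplicate so the
    // original buffer's position is left untouched.
    static String contents(ByteBuffer bb) {
        ByteBuffer view = bb.duplicate();
        view.rewind();
        byte[] out = new byte[view.remaining()];
        view.get(out);
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = "HELLO".getBytes(StandardCharsets.UTF_8);

        ByteBuffer shared = ByteBuffer.wrap(raw);                               // no copy
        ByteBuffer isolated = ByteBuffer.wrap(Arrays.copyOf(raw, raw.length));  // defensive copy

        raw[0] = 'J'; // a later change to the original array...

        System.out.println(contents(shared));   // ...is visible here: JELLO
        System.out.println(contents(isolated)); // ...but not here: HELLO
    }
}
```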
Raw data received via some communication channel or read from various media is often affected by additional control characters, spurious record delimiters or other 'pollution' of the canonical formats required for fully automated processing. This Data class is used to wrap such raw data and yield clean, de-polluted streams of bytes or characters for message processing.
NOTE: full use of this class is for future functional extensions of the reverseXSL software.
Field Summary | |
---|---|
static int | _1NewLineAtEnd - arg for getConvertedData(int): suppress trailing empty lines and ensure that the very last data line bears a single line terminator. |
static int | _ASCII7bits - arg for getConvertedData(int): control characters (value < 32) are discarded except for tabs, carriage returns and line feeds; characters above 127 are replaced by '?'. |
static int | _NoBlankLine - arg for getConvertedData(int): suppress blank lines everywhere in the original data. |
static int | _NoCRLFBytes - arg for getConvertedData(int): suppress all CR's and LF's. |
static int | _NoCtrlBytes - arg for getConvertedData(int): control BYTES (value < 32) are discarded except for tabs, carriage returns and line feeds. |
static int | _NoCtrlChars - arg for getConvertedData(int): control characters (value < 32) are discarded except for tabs, carriage returns and line feeds. |
static int | _NONE - arg for getConvertedData(int): case of no conversion requested. |
static int | _ToCRLF - arg for getConvertedData(int): convert standalone LF's to CRLF's, and preserve existing CRLF's. |
static int | _ToLF - arg for getConvertedData(int): remove CR's. |
static int | _ToUPPER - arg for getConvertedData(int): convert all characters to their uppercase equivalents (based on built-in Java String methods). |
static int | _TrimNBSP - arg for getConvertedData(int): trim Non-Breaking SPaces (i.e. space chars and tabs) at the beginning and end of each line in the message body. |
static int | _UnfoldPSCRMRemarks - arg for getConvertedData(int): IATA PSCRM messages still generated by older systems may enforce the 69-character line limit of the even older TELEX transmission system by cutting lines in the middle of remarks elements; this flag restores the canonical long lines. |
Constructor Summary | |
---|---|
Data(byte[] ba) | Instantiate a Data object from a byte array, assuming UTF-8 as charset for character-oriented operations on this data. |
Data(byte[] ba, java.nio.charset.Charset cs) | Instantiate a Data object from a byte array, with the explicit charset that must be assumed for character-oriented operations on this data. |
Data(java.nio.ByteBuffer bb) | Instantiate a Data object from a byte buffer, assuming UTF-8 as charset for character-oriented operations on this data. |
Data(java.nio.ByteBuffer bb, java.nio.charset.Charset cs) | Instantiate a Data object from a byte buffer, with the explicit charset that must be assumed for character-oriented operations on this data. |
Data(java.io.InputStream inS, java.nio.charset.Charset cs) | Instantiate a Data object from an input stream, with the explicit charset that must be assumed for character-oriented operations on this data. |
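The constructor contracts above (wrap without copying, rewind a ByteBuffer, fall back to UTF-8 when cs is null, propagate read errors as IOException) can be sketched as follows. DataSketch is a hypothetical stand-in written for illustration, not the reverseXSL source; in particular, reading the whole stream with InputStream.readAllBytes() (Java 9+) and taking the buffer limit as the length are assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in mirroring the documented constructor contracts of Data;
// it is NOT the reverseXSL implementation.
class DataSketch {
    private final ByteBuffer bytes;
    private final Charset charset;

    DataSketch(byte[] ba, Charset cs) {
        this(ByteBuffer.wrap(ba), cs);                // wrapped, not copied
    }

    DataSketch(ByteBuffer bb, Charset cs) {
        bb.rewind();                                  // rewound, not copied
        this.bytes = bb;
        this.charset = (cs == null) ? StandardCharsets.UTF_8 : cs; // null -> UTF-8
    }

    DataSketch(InputStream inS, Charset cs) throws IOException {
        this(inS.readAllBytes(), cs);                 // read errors surface as IOException
    }

    Charset charset() { return charset; }

    // Assumption: "actual data length" taken as the buffer limit.
    int length() { return bytes.limit(); }
}
```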
Method Summary | |
---|---|
byte[] | getArray() - Get the backing byte array. |
byte[] | getBytes() |
java.lang.StringBuffer | getConvertedData(int conversions) - Convert the Data bytes to characters while filtering and normalizing the data. |
DataFormat | getFormat() - Get the data format type. |
DataFormat | identify() - Inspect the data and set the data format type. |
static DataFormat | identify(java.lang.String msg) - Attempt to identify the data format based on a short string. |
int | length() - Get the actual data length. |
static int | tokenValue(java.lang.String opt) - Utility method to convert named conversion tokens into the corresponding conversion token value. |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final int _1NewLineAtEnd
Arg for getConvertedData(int): suppress trailing empty lines and ensure that the very last data line bears a single line terminator.
The line terminator is either LF or CRLF, according to other conversions or, by default, the existing data contents.
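A minimal character-level sketch of this rule, assuming LF-based terminators (LF or CRLF) and choosing CRLF only when the data already contains one. The class OneNewLineAtEnd is hypothetical and does not model the interaction with other flags.

```java
// Hypothetical sketch of the _1NewLineAtEnd behaviour on already-decoded text.
class OneNewLineAtEnd {

    static String apply(String text) {
        // Assumption: reuse CRLF if the data already contains one, otherwise LF.
        String eol = text.contains("\r\n") ? "\r\n" : "\n";
        int end = text.length();
        // Strip every trailing CR and LF, which also removes trailing empty lines.
        while (end > 0 && (text.charAt(end - 1) == '\n' || text.charAt(end - 1) == '\r')) {
            end--;
        }
        // Re-append exactly one terminator (empty data stays empty).
        return end == 0 ? "" : text.substring(0, end) + eol;
    }
}
```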
public static final int _ASCII7bits
Arg for getConvertedData(int): control characters (value < 32) are discarded except for tabs, carriage returns and line feeds. Character values above 127 are replaced by a '?'.
Note that this conversion operates on characters, not bytes, and thus also properly replaces each multibyte character (whose Unicode value is always > 127) with a single '?'.
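A sketch of the _ASCII7bits rule on decoded text. Iterating over Unicode code points, rather than Java chars, is what makes a supplementary character (two chars in a Java String) collapse into one '?'. The class name is hypothetical and this is not the library's code.

```java
// Hypothetical character-level sketch of the _ASCII7bits rule.
class Ascii7Bits {

    static String apply(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (cp > 127) {
                out.append('?');                 // any non-ASCII code point -> single '?'
            } else if (cp >= 32 || cp == '\t' || cp == '\r' || cp == '\n') {
                out.appendCodePoint(cp);         // printable ASCII and TAB/CR/LF kept
            }
            // any other control character (< 32) is dropped
        });
        return out.toString();
    }
}
```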
public static final int _NoBlankLine
Arg for getConvertedData(int): suppress blank lines everywhere in the original data. Precisely, both truly empty lines and lines containing only spaces or tabs are removed.
This is a byte-oriented conversion, applied before decoding bytes into characters!
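A byte-level sketch of _NoBlankLine, assuming LF-terminated lines (a CR just before the LF counts as blank content). The class NoBlankLine is hypothetical, not the library's code.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical byte-level sketch: drops lines that are empty or hold only
// spaces/tabs, before any charset decoding takes place.
class NoBlankLine {

    static byte[] apply(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(data.length);
        int start = 0;
        while (start < data.length) {
            // Find the end of the current line (just after its LF, or end of data).
            int end = start;
            while (end < data.length && data[end] != '\n') end++;
            if (end < data.length) end++; // include the LF terminator

            boolean blank = true;
            for (int i = start; i < end; i++) {
                byte b = data[i];
                if (b != ' ' && b != '\t' && b != '\r' && b != '\n') { blank = false; break; }
            }
            if (!blank) out.write(data, start, end - start); // keep line with its terminator
            start = end;
        }
        return out.toByteArray();
    }
}
```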
public static final int _NoCRLFBytes
Arg for getConvertedData(int): suppress all CR's and LF's. In other words, consider the original data as one very long line.
This is a byte-oriented conversion, applied before decoding bytes into characters!
It is most useful when added to _NoCtrlBytes, in which case tabs are the only control characters (value < 32) that are preserved.
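Byte-level sketches of _NoCRLFBytes alone and combined with _NoCtrlBytes (hypothetical class, not the library code). Java bytes are signed, so the unsigned value b & 0xFF is compared in order to keep 8-bit values.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketches of _NoCRLFBytes and of _NoCRLFBytes + _NoCtrlBytes.
class NoCrLfBytes {

    // _NoCRLFBytes: drop every CR (13) and LF (10), turning the data into one line.
    static byte[] noCrLf(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(data.length);
        for (byte b : data) {
            if (b != '\r' && b != '\n') out.write(b);
        }
        return out.toByteArray();
    }

    // Combined flags: of all bytes below 32, only TAB (9) survives;
    // bytes of 128 and above are kept.
    static byte[] noCrLfNoCtrl(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(data.length);
        for (byte b : data) {
            int u = b & 0xFF; // unsigned byte value
            if (u >= 32 || u == '\t') out.write(b);
        }
        return out.toByteArray();
    }
}
```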
public static final int _NoCtrlBytes
Arg for getConvertedData(int): control BYTES (value < 32) are discarded except for tabs, carriage returns and line feeds. 8-bit values (above 127) are preserved.
This is a byte-oriented conversion, applied before decoding bytes into characters!
Compared with _NoCtrlChars, the supporting function operates on bytes and not characters, and thus may discard bytes that actually belong to multibyte character encodings (e.g. UTF-16), scrambling the original data!
However,
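The following sketch (hypothetical class, not the library code) applies the _NoCtrlBytes rule to UTF-16BE data, where every ASCII character carries a 0x00 byte, to show the scrambling described above. UTF-8 text is not affected in this way, because every byte of a UTF-8 multibyte sequence is 0x80 or above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical byte-level sketch of _NoCtrlBytes.
class NoCtrlBytes {

    static byte[] apply(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(data.length);
        for (byte b : data) {
            int u = b & 0xFF; // unsigned byte value; 8-bit values are kept
            if (u >= 32 || u == '\t' || u == '\r' || u == '\n') out.write(b);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] utf16 = "AB".getBytes(StandardCharsets.UTF_16BE); // 00 41 00 42
        byte[] filtered = apply(utf16);                          // 41 42: the 00 bytes are gone
        System.out.println(filtered.length);                     // 2 instead of 4
        // Decodes to one unrelated character instead of "AB".
        System.out.println(new String(filtered, StandardCharsets.UTF_16BE));
    }
}
```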
public static final int _NoCtrlChars
Arg for getConvertedData(int): control characters (value < 32) are discarded except for tabs, carriage returns and line feeds. All other character values are preserved.
Compared with _NoCtrlBytes, the supporting function preserves character values that would be encoded as 8-bit values in ISO-8859, and all multibyte characters in the UTF-16 Unicode Transformation Format.
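By contrast, a character-level filter such as the sketch below (hypothetical class, not the library code) runs after decoding, so accented and multibyte characters pass through whatever the original charset was.

```java
// Hypothetical character-level sketch of _NoCtrlChars.
class NoCtrlChars {

    static String apply(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Surrogate halves are >= 0xD800, so supplementary characters are always kept.
            if (c >= 32 || c == '\t' || c == '\r' || c == '\n') out.append(c);
        }
        return out.toString();
    }
}
```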
public static final int _NONE
Arg for getConvertedData(int): case of no conversion requested.
public static final int _ToCRLF
Arg for getConvertedData(int): convert standalone LF's to CRLF's, and preserve existing CRLF's.
public static final int _ToLF
Arg for getConvertedData(int): remove CR's.
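The two line-ending conversions can be sketched with the standard java.util.regex API, where a standalone LF is matched with a negative lookbehind for CR. The class is hypothetical, not the library's implementation.

```java
import java.util.regex.Pattern;

// Hypothetical sketches of the _ToCRLF and _ToLF conversions.
class LineEndings {

    // An LF not preceded by a CR (a "standalone" LF).
    private static final Pattern LONE_LF = Pattern.compile("(?<!\r)\n");

    // _ToCRLF: standalone LFs become CRLF; existing CRLFs are left as they are.
    static String toCrlf(String text) {
        return LONE_LF.matcher(text).replaceAll("\r\n");
    }

    // _ToLF: simply remove every CR.
    static String toLf(String text) {
        return text.replace("\r", "");
    }
}
```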
public static final int _ToUPPER
Arg for getConvertedData(int): convert all characters to their uppercase equivalents (based on built-in Java String methods).
public static final int _TrimNBSP
Arg for getConvertedData(int): trim Non-Breaking SPaces (i.e. space chars and tabs) at the beginning and end of each line in the message body.
Note that if you combine this trim operation with _NoCRLFBytes, only the NBSP leading and trailing the entire data are trimmed, because _NoCRLFBytes first transforms the whole data into a single big line!
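A sketch of per-line trimming with multiline regular expressions (the class TrimNbsp is hypothetical). In Java's MULTILINE mode, $ matches before a CRLF as well as before a lone LF, so the terminators themselves are kept.

```java
// Hypothetical sketch of _TrimNBSP: trims spaces and tabs at both ends of every line.
class TrimNbsp {

    static String apply(String text) {
        // (?m) makes ^ and $ match at line boundaries; [ \t] is the "NBSP" set
        // as defined above (space and tab), so CR/LF are never removed.
        return text.replaceAll("(?m)^[ \t]+", "").replaceAll("(?m)[ \t]+$", "");
    }
}
```

Run on data already flattened by _NoCRLFBytes, the same function sees a single line, which is why only the spaces and tabs at the outer ends of the data are trimmed in that case.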
public static final int _UnfoldPSCRMRemarks
Arg for getConvertedData(int): IATA PSCRM messages still generated by older systems may enforce the 69-character line limit of the even older TELEX transmission system by cutting lines in the middle of remarks elements, e.g.
1BIMMEL/LMRS-BV2 .L/272397 .R/TOP KRSV .R/CKIN HK1 1BAG 05KG-1BIMMEL/LMRS
becomes:
1BIMMEL/LMRS-BV2 .L/272397 .R/TOP KRSV .R/CKIN HK1 1BAG
.RN/05KG-1BIMMEL/LMRS
thus breaking the .R/CKIN check-in luggage segment that normally runs to the end of the line. This flag restores the canonical long line.
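A sketch of the unfolding rule, inferred only from the single example above: a line break (LF or CRLF) immediately followed by .RN/ is treated as a TELEX fold and replaced by one space. The real flag may recognise other continuation markers; the class UnfoldPscrm is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of _UnfoldPSCRMRemarks, based only on the documented example.
class UnfoldPscrm {

    // Line break followed by the ".RN/" continuation marker.
    private static final Pattern FOLD = Pattern.compile("\r?\n\\.RN/");

    static String apply(String text) {
        return FOLD.matcher(text).replaceAll(" ");
    }
}
```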
Constructor Detail |
---|
public Data(byte[] ba)
ba - the byte array wrapped as Data, which is NOT copied (later changes to the byte array may adversely affect this Data)
public Data(byte[] ba, java.nio.charset.Charset cs)
ba - the byte array wrapped as Data, which is NOT copied (later changes to the byte array may adversely affect this Data)
cs - if null, defaults back to UTF-8
public Data(java.nio.ByteBuffer bb)
bb - the byte buffer wrapped as Data, which is rewound but NOT copied (later changes to the ByteBuffer may adversely affect this Data)
public Data(java.nio.ByteBuffer bb, java.nio.charset.Charset cs)
bb - the byte buffer wrapped as Data, which is rewound but NOT copied (later changes to the ByteBuffer may adversely affect this Data)
cs - if null, defaults back to UTF-8
public Data(java.io.InputStream inS, java.nio.charset.Charset cs) throws java.io.IOException
inS - the source of byte-oriented data
cs - if null, defaults back to UTF-8
java.io.IOException - as would result from read errors on the argument input stream
Method Detail |
---|
public byte[] getArray()
Get the backing byte array.
public byte[] getBytes()
public java.lang.StringBuffer getConvertedData(int conversions)
Convert the Data bytes to characters while filtering and normalizing the data. The character set specified at instantiation (or the default UTF-8) is used to interpret bytes as characters.
conversions - either the value _NONE, else the sum of one or more of the constants _ToCRLF, _ToLF, _1NewLineAtEnd, _ToUPPER, _ASCII7bits, _TrimNBSP, _NoCtrlBytes, _NoCRLFBytes, _NoBlankLine, _NoCtrlChars.
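The conversions argument is described as an addition of constants. That works like a bit mask only if each constant is a distinct power of two and each is added at most once; the sketch below assumes that, and its constant names and values are invented for illustration (use the real Data constants in practice).

```java
// Hypothetical illustration of decoding an additive "conversions" argument.
// The bit values below are invented; they are NOT the reverseXSL constants.
class ConversionFlags {
    static final int NONE = 0;
    static final int TO_CRLF = 1;
    static final int TO_LF = 2;
    static final int NO_BLANK_LINE = 4;
    static final int TO_UPPER = 8;

    // With distinct single-bit flags, each added once, "+" behaves like "|".
    static boolean has(int conversions, int flag) {
        return (conversions & flag) != 0;
    }

    public static void main(String[] args) {
        int conversions = NO_BLANK_LINE + TO_UPPER;
        System.out.println(has(conversions, TO_UPPER)); // true
        System.out.println(has(conversions, TO_LF));    // false
    }
}
```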
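public DataFormat getFormat()
Get the data format type.
public DataFormat identify()
Inspect the data and set the data format type.
public static DataFormat identify(java.lang.String msg)
Attempt to identify the data format based on a short string.
msg - typically, a short string
public int length()
Get the actual data length.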
public static final int tokenValue(java.lang.String opt)
Utility method to convert named conversion tokens into the corresponding conversion token value.
opt - the named value
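A tokenValue-style lookup could look like the sketch below. The token names mirror the field names documented above, but the numeric values, the use of an IllegalArgumentException for unknown names, and the combine helper are all assumptions for illustration (class TokenValues is hypothetical).

```java
import java.util.Map;

// Hypothetical sketch of a tokenValue-style lookup; values are illustrative only.
class TokenValues {
    private static final Map<String, Integer> TOKENS = Map.of(
            "_NONE", 0,
            "_ToCRLF", 1,
            "_ToLF", 2,
            "_NoBlankLine", 4,
            "_ToUPPER", 8);

    static int tokenValue(String opt) {
        Integer v = TOKENS.get(opt);
        if (v == null) throw new IllegalArgumentException("unknown conversion token: " + opt);
        return v;
    }

    // Adds several named tokens, e.g. read from a configuration file.
    static int combine(String... names) {
        int total = 0;
        for (String n : names) total += tokenValue(n);
        return total;
    }
}
```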