org.apache.directory.api.util
Class Unicode

java.lang.Object
  extended by org.apache.directory.api.util.Unicode

public final class Unicode
extends Object

Various Unicode manipulation methods that are more efficient than chaining operations: everything is done in the same buffer, without creating intermediate String objects.

Author:
Apache Directory Project

Constructor Summary
Unicode()
           
 
Method Summary
static char bytesToChar(byte[] bytes)
          Return the Unicode char which is coded in the bytes at position 0.
static char bytesToChar(byte[] bytes, int pos)
          Return the Unicode char which is coded in the bytes at the given position.
static byte[] charToBytes(char car)
          Return the byte array that encodes the given Unicode char.
static int countBytes(char[] chars)
          Count the number of bytes included in the given char[].
static int countBytesPerChar(byte[] bytes, int pos)
          Count the number of bytes needed to return a Unicode char.
static int countChars(byte[] bytes)
          Count the number of chars included in the given byte[].
static int countNbBytesPerChar(char car)
          Return the number of bytes that hold a Unicode char.
static boolean isUnicodeSubset(byte b)
          Check if the current byte is in the Unicode subset: all chars except '\0', '(', ')', '*' and '\'.
static boolean isUnicodeSubset(char c)
          Check if the current char is in the Unicode subset: all chars except '\0', '(', ')', '*' and '\'.
static boolean isUnicodeSubset(String str, int pos)
          Check if the current char is in the Unicode subset: all chars except '\0', '(', ')', '*' and '\'.
static String readUTF(ObjectInput objectInput)
          Reads in a string that has been encoded using a modified UTF-8 format.
static void writeUTF(ObjectOutput objectOutput, String str)
          Writes four bytes of length information to the output stream, followed by the modified UTF-8 representation of every character in the string str.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Unicode

public Unicode()
Method Detail

countBytesPerChar

public static int countBytesPerChar(byte[] bytes,
                                    int pos)
Count the number of bytes needed to return a Unicode char. This can be from 1 to 6.

Parameters:
bytes - The bytes to read
pos - Position to start counting. It must be the valid start of an encoded char.
Returns:
The number of bytes needed to create a char, or -1 if the encoding is wrong. TODO: Should stop after the third byte, as a char is only 2 bytes long.
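
As an illustration of how a length between 1 and 6 can be derived from the first byte alone, here is a minimal stand-alone sketch. It is not the Apache Directory implementation; the class name Utf8LengthSketch is hypothetical, and the bounds checks are an assumption. The 5- and 6-byte forms belong to the original UTF-8 definition and are no longer valid in current UTF-8.

```java
// Hypothetical stand-alone sketch; not the library's own code.
final class Utf8LengthSketch {

    /**
     * Returns the length (1 to 6) of the sequence starting at pos, judged
     * from the lead byte only, or -1 if that byte cannot start a sequence.
     */
    static int countBytesPerChar(byte[] bytes, int pos) {
        // Assumption: out-of-range input is reported as -1 rather than thrown.
        if (bytes == null || pos < 0 || pos >= bytes.length) {
            return -1;
        }

        int lead = bytes[pos] & 0xFF;

        if ((lead & 0x80) == 0x00) return 1; // 0xxxxxxx
        if ((lead & 0xE0) == 0xC0) return 2; // 110xxxxx
        if ((lead & 0xF0) == 0xE0) return 3; // 1110xxxx
        if ((lead & 0xF8) == 0xF0) return 4; // 11110xxx
        if ((lead & 0xFC) == 0xF8) return 5; // 111110xx
        if ((lead & 0xFE) == 0xFC) return 6; // 1111110x

        return -1; // continuation byte (10xxxxxx), 0xFE or 0xFF
    }
}
```

Only the lead byte is inspected in this sketch; the continuation bytes that follow are not validated.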

bytesToChar

public static char bytesToChar(byte[] bytes)
Return the Unicode char which is coded in the bytes at position 0.

Parameters:
bytes - The byte[] representation of a Unicode string.
Returns:
The first char found.

bytesToChar

public static char bytesToChar(byte[] bytes,
                               int pos)
Return the Unicode char which is coded in the bytes at the given position.

Parameters:
bytes - The byte[] representation of a Unicode string.
pos - The current position to start decoding the char
Returns:
The decoded char, or -1 if no char can be decoded. TODO: Should stop after the third byte, as a char is only 2 bytes long.
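
The decoding step can be sketched as follows for sequences of one to three bytes, which is all a 16-bit char can hold. This is a hypothetical stand-alone example (the class Utf8DecodeSketch and the constant INVALID are not part of the library). Since the return type is char, the sketch assumes that "-1" means (char) -1, which is the value 0xFFFF.

```java
// Hypothetical stand-alone sketch; not the library's own code.
final class Utf8DecodeSketch {

    // Assumption: failure is signalled by (char) -1, i.e. the value 0xFFFF.
    static final char INVALID = (char) -1;

    private static boolean isContinuation(byte b) {
        return (b & 0xC0) == 0x80; // 10xxxxxx
    }

    /** Decodes the 1, 2 or 3 byte sequence starting at pos into one char. */
    static char bytesToChar(byte[] bytes, int pos) {
        if (bytes == null || pos < 0 || pos >= bytes.length) {
            return INVALID;
        }

        int b0 = bytes[pos] & 0xFF;

        if (b0 < 0x80) {
            return (char) b0; // single byte
        }

        if ((b0 & 0xE0) == 0xC0 && pos + 1 < bytes.length
                && isContinuation(bytes[pos + 1])) {
            return (char) (((b0 & 0x1F) << 6) | (bytes[pos + 1] & 0x3F));
        }

        if ((b0 & 0xF0) == 0xE0 && pos + 2 < bytes.length
                && isContinuation(bytes[pos + 1]) && isContinuation(bytes[pos + 2])) {
            return (char) (((b0 & 0x0F) << 12)
                    | ((bytes[pos + 1] & 0x3F) << 6)
                    | (bytes[pos + 2] & 0x3F));
        }

        return INVALID; // longer, truncated or malformed sequence
    }
}
```

The sketch does not reject overlong encodings; a stricter decoder would check for them as well.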

countNbBytesPerChar

public static int countNbBytesPerChar(char car)
Return the number of bytes that hold a Unicode char.

Parameters:
car - The character whose encoded length is counted
Returns:
The number of bytes to hold the char. TODO: Should stop after the third byte, as a char is only 2 bytes long.

countBytes

public static int countBytes(char[] chars)
Count the number of bytes included in the given char[].

Parameters:
chars - The char array to decode
Returns:
The number of bytes needed to encode the char array
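
The two counting methods above can be illustrated together. The following is a minimal stand-alone sketch under the assumption that the encoding is standard UTF-8 and that each char is counted on its own; the class name Utf8SizeSketch is hypothetical and the code is not taken from the library.

```java
// Hypothetical stand-alone sketch; not the library's own code.
final class Utf8SizeSketch {

    /** Number of UTF-8 bytes needed for a single 16-bit char (1, 2 or 3). */
    static int countNbBytesPerChar(char c) {
        if (c < 0x80) {
            return 1; // up to 7 bits
        }
        if (c < 0x800) {
            return 2; // up to 11 bits
        }
        return 3;     // up to 16 bits
    }

    /** Sums the per-char byte counts; a null array counts as 0 (assumption). */
    static int countBytes(char[] chars) {
        if (chars == null) {
            return 0;
        }
        int total = 0;
        for (char c : chars) {
            total += countNbBytesPerChar(c);
        }
        return total;
    }
}
```

Because every char is counted separately, a surrogate pair counts as 3 + 3 bytes in this sketch, whereas String.getBytes with UTF-8 would produce 4 bytes for the same pair.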

countChars

public static int countChars(byte[] bytes)
Count the number of chars included in the given byte[].

Parameters:
bytes - The byte array to decode
Returns:
The number of chars encoded in the byte array
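
One common way to count encoded characters without decoding them is to count every byte that is not a continuation byte. The sketch below shows that technique; it assumes well-formed UTF-8 input, the class name Utf8CharCountSketch is hypothetical, and the library may well count differently.

```java
// Hypothetical stand-alone sketch; not the library's own code.
final class Utf8CharCountSketch {

    /**
     * Counts the encoded characters by counting lead bytes, that is every
     * byte that is not of the form 10xxxxxx. Assumes well-formed input.
     */
    static int countChars(byte[] bytes) {
        if (bytes == null) {
            return 0; // assumption: null counts as empty
        }
        int count = 0;
        for (byte b : bytes) {
            if ((b & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }
}
```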

charToBytes

public static byte[] charToBytes(char car)
Return the byte array that encodes the given Unicode char.

Parameters:
car - The character to be transformed to an array of bytes
Returns:
The byte array representing the char. TODO: Should stop after the third byte, as a char is only 2 bytes long.
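
For illustration, encoding a single 16-bit char as standard UTF-8 can be sketched as follows. The class name Utf8EncodeSketch is hypothetical, the code is not the library's own, and surrogate chars are simply encoded as 3 bytes like any other value above 0x7FF.

```java
// Hypothetical stand-alone sketch; not the library's own code.
final class Utf8EncodeSketch {

    /** Encodes one 16-bit char as 1, 2 or 3 UTF-8 bytes. */
    static byte[] charToBytes(char c) {
        if (c < 0x80) {
            // 0xxxxxxx
            return new byte[] {(byte) c};
        }
        if (c < 0x800) {
            // 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (c >> 6)),
                (byte) (0x80 | (c & 0x3F))
            };
        }
        // 1110xxxx 10xxxxxx 10xxxxxx
        return new byte[] {
            (byte) (0xE0 | (c >> 12)),
            (byte) (0x80 | ((c >> 6) & 0x3F)),
            (byte) (0x80 | (c & 0x3F))
        };
    }
}
```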

isUnicodeSubset

public static boolean isUnicodeSubset(String str,
                                      int pos)
Check if the current char is in the Unicode subset: all chars except '\0', '(', ')', '*' and '\'.

Parameters:
str - The string to check
pos - Position of the current char
Returns:
True if the current char is in the unicode subset

isUnicodeSubset

public static boolean isUnicodeSubset(char c)
Check if the current char is in the Unicode subset: all chars except '\0', '(', ')', '*' and '\'.

Parameters:
c - The char to check
Returns:
True if the current char is in the unicode subset

isUnicodeSubset

public static boolean isUnicodeSubset(byte b)
Check if the current byte is in the Unicode subset: all chars except '\0', '(', ')', '*' and '\'.

Parameters:
b - The byte to check
Returns:
True if the current byte is in the unicode subset
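
The subset test described by the three overloads can be sketched as a simple exclusion check. The five excluded characters appear to be the ones that need escaping in LDAP search filter strings. The class name UnicodeSubsetSketch is hypothetical, and returning false for a null string or an out-of-range position is an assumption, not documented behaviour.

```java
// Hypothetical stand-alone sketch; not the library's own code.
final class UnicodeSubsetSketch {

    /** True for every char except NUL, '(', ')', '*' and backslash. */
    static boolean isUnicodeSubset(char c) {
        switch (c) {
            case '\0':
            case '(':
            case ')':
            case '*':
            case '\\':
                return false;
            default:
                return true;
        }
    }

    /** Same check for the char at pos; false if pos is out of range (assumption). */
    static boolean isUnicodeSubset(String str, int pos) {
        if (str == null || pos < 0 || pos >= str.length()) {
            return false;
        }
        return isUnicodeSubset(str.charAt(pos));
    }
}
```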

writeUTF

public static void writeUTF(ObjectOutput objectOutput,
                            String str)
                     throws IOException
Writes four bytes of length information to the output stream, followed by the modified UTF-8 representation of every character in the string str. If str is null, the string value 'null' is written with a length of 0 instead of throwing a NullPointerException. Each character in str is converted to a group of one, two, or three bytes, depending on the value of the character. Because a single write cannot exceed 65535 bytes, the total length is written first as four bytes (writeInt), and the string is then split into smaller parts where necessary and written part by part. As each character may be converted to at most 3 bytes and at most 65535 bytes can be written at once, writing chunks of no more than 21845 (65535/3) characters is always on the safe side. See also DataOutput.writeUTF(String).

Parameters:
objectOutput - The objectOutput to write to
str - The value to write
Throws:
IOException - If the value can't be written to the output
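
The chunking scheme described for writeUTF can be sketched roughly as follows. This is a stand-alone illustration, not the library's code: the class ChunkedUtfWriterSketch, the constant MAX_CHUNK_CHARS and the helper serialize are hypothetical, and the sketch assumes that the four-byte length holds the number of chars in the string.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical stand-alone sketch; not the library's own code.
final class ChunkedUtfWriterSketch {

    // 65535 bytes per write, at most 3 bytes per char: 21845 chars per chunk.
    static final int MAX_CHUNK_CHARS = 65535 / 3;

    static void writeUTF(ObjectOutput out, String str) throws IOException {
        if (str == null) {
            // As described above: the value 'null' with a length of 0.
            out.writeInt(0);
            out.writeUTF("null");
            return;
        }

        // Assumption: the length information is the char count of the string.
        out.writeInt(str.length());

        // Each chunk stays within the 65535 byte limit of DataOutput.writeUTF.
        for (int start = 0; start < str.length(); start += MAX_CHUNK_CHARS) {
            int end = Math.min(str.length(), start + MAX_CHUNK_CHARS);
            out.writeUTF(str.substring(start, end));
        }
    }

    /** Convenience helper for trying the sketch out in memory. */
    static byte[] serialize(String str) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            writeUTF(out, str);
        }
        return buffer.toByteArray();
    }
}
```

A string of 50000 three-byte characters would make a single DataOutput.writeUTF call fail with a UTFDataFormatException, whereas the sketch writes it as three chunks of 21845, 21845 and 6310 chars.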

readUTF

public static String readUTF(ObjectInput objectInput)
                      throws IOException
Reads in a string that has been encoded using a modified UTF-8 format. The general contract of readUTF is that it reads a representation of a Unicode character string encoded in modified UTF-8 format; this string of characters is then returned as a String. First, four bytes are read (readInt) as the length information; this differs from DataInput.readUTF(), which reads two bytes as an unsigned 16-bit integer in the manner of the readUnsignedShort method. This integer value is the length information written by writeUTF and determines how much additional data is read. The bytes that follow are then converted to characters by considering them in groups. The length of each group is computed from the value of the first byte of the group. The byte following a group, if any, is the first byte of the next group. See also DataInput.readUTF().

Parameters:
objectInput - The objectInput to read from
Returns:
The read string
Throws:
IOException - If the value can't be read
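
A reader for a length-prefixed, chunked layout could look like the sketch below. It rests on several assumptions that the text above does not confirm: that the four-byte length is the total number of chars, that the data follows as one or more chunks each readable with DataInput.readUTF(), and that a length of 0 can be returned as an empty string (the null case is left out). The class name ChunkedUtfReaderSketch is hypothetical.

```java
import java.io.IOException;
import java.io.ObjectInput;

// Hypothetical stand-alone sketch; not the library's own code.
final class ChunkedUtfReaderSketch {

    static String readUTF(ObjectInput in) throws IOException {
        // Four bytes of length information (assumed to be a char count).
        int totalChars = in.readInt();

        if (totalChars <= 0) {
            // Assumption: nothing more is read and an empty string is returned.
            return "";
        }

        StringBuilder result = new StringBuilder(totalChars);

        // Keep reading modified UTF-8 chunks until the announced length is reached.
        while (result.length() < totalChars) {
            result.append(in.readUTF());
        }

        return result.toString();
    }
}
```

If the stream ends before the announced length is reached, the underlying readUTF call throws an EOFException, which is an IOException.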


Copyright © 2003-2013 The Apache Software Foundation. All Rights Reserved.