public class PdfTokenizer extends Object implements Closeable
| Modifier and Type | Class and Description |
|---|---|
| static class | PdfTokenizer.TokenType |
| Modifier and Type | Field and Description |
|---|---|
| static byte[] | F |
| static byte[] | False |
| protected int | generation |
| protected boolean | hexString |
| static byte[] | N |
| static byte[] | Null |
| static byte[] | Obj |
| protected ByteBuffer | outBuf |
| static byte[] | R |
| protected int | reference |
| static byte[] | Startxref |
| static byte[] | Stream |
| static byte[] | Trailer |
| static byte[] | True |
| protected PdfTokenizer.TokenType | type |
| static byte[] | Xref |
| Constructor and Description |
|---|
| PdfTokenizer(RandomAccessFileOrArray file): Creates a PdfTokenizer for the specified RandomAccessFileOrArray. |
| Modifier and Type | Method and Description |
|---|---|
| void | backOnePosition(int ch) |
| void | checkFdfHeader() |
| static int[] | checkObjectStart(PdfTokenizer lineTokenizer): Checks whether the line starts with an object declaration. |
| String | checkPdfHeader() |
| static boolean | checkTrailer(ByteBuffer line): Checks whether line equals 'trailer'. |
| void | close() |
| static byte[] | decodeStringContent(byte[] content, boolean hexWriting): Resolves escape symbols or hexadecimal symbols. |
| protected static byte[] | decodeStringContent(byte[] content, int from, int to, boolean hexWriting): Resolves escape symbols or hexadecimal symbols. |
| byte[] | getByteContent() |
| byte[] | getDecodedStringContent() |
| int | getGenNr() |
| int | getHeaderOffset() |
| int | getIntValue() |
| long | getLongValue() |
| long | getNextEof(): Gets the next %%EOF marker in the current PDF file. |
| int | getObjNr() |
| long | getPosition() |
| RandomAccessFileOrArray | getSafeFile() |
| long | getStartxref() |
| String | getStringValue() |
| PdfTokenizer.TokenType | getTokenType() |
| boolean | isCloseStream() |
| protected static boolean | isDelimiter(int ch) |
| protected static boolean | isDelimiterWhitespace(int ch) |
| boolean | isHexString() |
| static boolean | isWhitespace(int ch): Checks whether a character is whitespace. Currently checks for the following byte values: 0, 9, 10, 12, 13, 32. |
| protected static boolean | isWhitespace(int ch, boolean isWhitespace): Checks whether a character is whitespace. |
| long | length() |
| boolean | nextToken() |
| void | nextValidToken() |
| int | peek(): Gets the next byte of the PDF source without moving the source position. |
| int | peek(byte[] buffer): Gets the next buffer.length bytes of the PDF source without moving the source position. |
| int | read() |
| void | readFully(byte[] bytes) |
| boolean | readLineSegment(ByteBuffer buffer): Reads data into the provided ByteBuffer. |
| boolean | readLineSegment(ByteBuffer buffer, boolean isNullWhitespace): Reads data into the provided ByteBuffer. |
| String | readString(int size) |
| void | seek(long pos) |
| void | setCloseStream(boolean closeStream) |
| void | throwError(String error, Object... messageParams): Helper method to handle content errors. |
| boolean | tokenValueEqualsTo(byte[] cmp) |
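
The two `peek` methods are described as reading ahead without moving the source position. The sketch below illustrates that contract on a plain in-memory stream using mark/reset. `PeekSketch` is a hypothetical stand-in written for this page, not iText code, and it says nothing about how `PdfTokenizer` implements peeking internally.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a PDF byte source; not part of iText. */
final class PeekSketch {
    private final ByteArrayInputStream in;

    PeekSketch(byte[] data) {
        this.in = new ByteArrayInputStream(data);
    }

    /** Consumes and returns the next byte, or -1 at end of input. */
    int read() {
        return in.read();
    }

    /** Returns the next byte, or -1 at end of input, without consuming it. */
    int peek() {
        in.mark(1);      // remember the current position
        int b = in.read();
        in.reset();      // rewind to the remembered position
        return b;
    }

    /** Copies upcoming bytes into buffer without consuming them; returns how many were copied. */
    int peek(byte[] buffer) {
        in.mark(buffer.length);
        int total = 0;
        while (total < buffer.length) {
            int n = in.read(buffer, total, buffer.length - total);
            if (n < 0) {
                break;   // end of input reached before the buffer was filled
            }
            total += n;
        }
        in.reset();
        return total;
    }

    public static void main(String[] args) {
        PeekSketch src = new PeekSketch("%PDF-1.7".getBytes(StandardCharsets.US_ASCII));
        System.out.println((char) src.peek()); // peeked byte
        System.out.println((char) src.read()); // the same byte, now consumed
    }
}
```

A return value from `peek(byte[])` that is smaller than `buffer.length` signals that the end of the input was reached, which matches the EOF remark in the method detail for `peek(byte[] buffer)`.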
public static final byte[] Obj
public static final byte[] R
public static final byte[] Xref
public static final byte[] Startxref
public static final byte[] Stream
public static final byte[] Trailer
public static final byte[] N
public static final byte[] F
public static final byte[] Null
public static final byte[] True
public static final byte[] False
protected PdfTokenizer.TokenType type
protected int reference
protected int generation
protected boolean hexString
protected ByteBuffer outBuf
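
The keyword fields are byte arrays, and `tokenValueEqualsTo(byte[] cmp)` accepts such an array. The sketch below shows one way a token's raw bytes could be compared with a keyword without building a `String`. It assumes each constant holds the ASCII spelling of its PDF keyword, which this page does not state; `KeywordSketch` and its members are hypothetical and not part of iText.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical illustration of byte-wise keyword matching; not iText code. */
final class KeywordSketch {
    // Assumed values: the ASCII spelling of each PDF keyword.
    static final byte[] OBJ = ascii("obj");
    static final byte[] TRAILER = ascii("trailer");
    static final byte[] STARTXREF = ascii("startxref");

    static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    /** True when the token bytes match the keyword exactly (same length, same bytes). */
    static boolean tokenEquals(byte[] token, byte[] keyword) {
        return Arrays.equals(token, keyword);
    }

    public static void main(String[] args) {
        System.out.println(tokenEquals(ascii("trailer"), TRAILER));
        System.out.println(tokenEquals(ascii("Trailer"), TRAILER));
    }
}
```

The comparison is case-sensitive because it works on raw bytes, which is consistent with PDF keywords being case-sensitive.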
public PdfTokenizer(RandomAccessFileOrArray file)
Creates a PdfTokenizer for the specified RandomAccessFileOrArray.
The beginning of the file is read to determine the location of the header, and the data source is adjusted
as necessary to account for any junk that occurs in the byte source before the header.
Parameters:
file - the source

public void seek(long pos)
public void readFully(byte[] bytes) throws IOException
Throws:
IOException

public long getPosition()

public void close() throws IOException
Specified by:
close in interface Closeable
close in interface AutoCloseable
Throws:
IOException

public long length()

public int read() throws IOException
Throws:
IOException

public int peek() throws IOException
Gets the next byte of the PDF source without moving the source position.
Throws:
IOException - in case of any reading error.

public int peek(byte[] buffer) throws IOException
Gets the next buffer.length bytes of the PDF source without moving the source position.
Parameters:
buffer - buffer to store read bytes
Returns:
the number of bytes read; if it is less than buffer.length, EOF has been reached.
Throws:
IOException - in case of any reading error.

public String readString(int size) throws IOException
Throws:
IOException

public PdfTokenizer.TokenType getTokenType()
public byte[] getByteContent()
public String getStringValue()
public byte[] getDecodedStringContent()
public boolean tokenValueEqualsTo(byte[] cmp)
public int getObjNr()
public int getGenNr()
public void backOnePosition(int ch)
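
The constructor description mentions junk that may precede the header. As a rough illustration of locating a header in such input, the sketch below scans a byte array for the first `%PDF-` marker. `HeaderOffsetSketch` is hypothetical; the marker string, the unbounded scan, and the `-1` result for a missing header are assumptions of this sketch and are not taken from this page or from iText.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical header search; not iText code. */
final class HeaderOffsetSketch {
    /** Returns the index of the first "%PDF-" marker in data, or -1 if there is none. */
    static int headerOffset(byte[] data) {
        byte[] marker = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        for (int i = 0; i + marker.length <= data.length; i++) {
            int j = 0;
            while (j < marker.length && data[i + j] == marker[j]) {
                j++;
            }
            if (j == marker.length) {
                return i; // every marker byte matched starting at index i
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = "junk\n%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(headerOffset(data));
    }
}
```

Positions inside the file can then be interpreted relative to the returned offset, which is one way to account for leading junk.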
public int getHeaderOffset() throws IOException
Throws:
IOException

public String checkPdfHeader() throws IOException
Throws:
IOException

public void checkFdfHeader() throws IOException
Throws:
IOException

public long getStartxref() throws IOException
Throws:
IOException

public long getNextEof() throws IOException
Gets the next %%EOF marker in the current PDF file.
Throws:
IOException - in case of input-output related exceptions during PDF document reading

public void nextValidToken() throws IOException
Throws:
IOException

public boolean nextToken() throws IOException
Throws:
IOException

public long getLongValue()
public int getIntValue()
public boolean isHexString()
public boolean isCloseStream()
public void setCloseStream(boolean closeStream)
public RandomAccessFileOrArray getSafeFile()
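
`decodeStringContent` takes a `hexWriting` flag for hex-encoded strings such as `<69546578…>`. The sketch below shows what decoding the hex form amounts to: pairs of hexadecimal digits become bytes, and a trailing unpaired digit is treated as if followed by `0`, as the PDF specification requires. `HexStringSketch` is a hypothetical helper, not the iText implementation, and it assumes the input excludes the enclosing angle brackets.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical hex-string decoder; not iText code. */
final class HexStringSketch {
    /** Decodes the characters found between '<' and '>' of a PDF hex string. */
    static byte[] decodeHex(byte[] content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int high = -1; // pending high nibble, or -1 when none is pending
        for (byte b : content) {
            int digit = Character.digit((char) (b & 0xFF), 16);
            if (digit < 0) {
                continue; // whitespace is ignored; this sketch also skips any other non-hex byte
            }
            if (high < 0) {
                high = digit;
            } else {
                out.write((high << 4) | digit);
                high = -1;
            }
        }
        if (high >= 0) {
            out.write(high << 4); // odd number of digits: the missing last digit counts as 0
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] decoded = decodeHex("69546578".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(decoded, StandardCharsets.US_ASCII)); // prints "iTex"
    }
}
```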
protected static byte[] decodeStringContent(byte[] content, int from, int to, boolean hexWriting)
Resolves escape symbols or hexadecimal symbols.
NOTE: According to PDF Reference 1.7, section 3.2.3, string values contain ASCII characters, so they can be converted directly to a byte array.
Parameters:
content - string bytes to be decoded
from - given start index
to - given end index
hexWriting - true if the given string is hex-encoded, e.g. '<69546578…>'. False otherwise, e.g. '((iText( some version)…)'
Returns:
byte[] for the decoded String.

public static byte[] decodeStringContent(byte[] content, boolean hexWriting)
Resolves escape symbols or hexadecimal symbols.
Parameters:
content - string bytes to be decoded
hexWriting - true if the given string is hex-encoded, e.g. '<69546578…>'. False otherwise, e.g. '((iText( some version)…)'
Returns:
byte[] for the decoded String.

public static boolean isWhitespace(int ch)
Checks whether a character is whitespace. Currently checks for the following byte values: 0, 9, 10, 12, 13, 32. Equivalent to isWhitespace(ch, true).
Parameters:
ch - int

protected static boolean isWhitespace(int ch, boolean isWhitespace)
Checks whether a character is whitespace.
Parameters:
ch - int
isWhitespace - boolean

protected static boolean isDelimiter(int ch)
protected static boolean isDelimiterWhitespace(int ch)
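
The whitespace check is documented as covering the byte values 0, 9, 10, 12, 13 and 32, with the single-argument form delegating to the two-argument form with `true`. A minimal sketch of that rule follows. It assumes the boolean argument decides whether the value 0 (NUL) counts as whitespace, by analogy with the `isNullWhitespace` parameter of `readLineSegment`; `WhitespaceSketch` is hypothetical and not iText code.

```java
/** Hypothetical re-statement of the documented whitespace rule; not iText code. */
final class WhitespaceSketch {
    /** Treats 0, 9, 10, 12, 13 and 32 as whitespace. */
    static boolean isWhitespace(int ch) {
        return isWhitespace(ch, true);
    }

    /** Assumption: the flag controls only whether 0 (NUL) counts as whitespace. */
    static boolean isWhitespace(int ch, boolean isNullWhitespace) {
        return (isNullWhitespace && ch == 0)
                || ch == 9    // horizontal tab
                || ch == 10   // line feed
                || ch == 12   // form feed
                || ch == 13   // carriage return
                || ch == 32;  // space
    }

    public static void main(String[] args) {
        System.out.println(isWhitespace(0));        // true
        System.out.println(isWhitespace(0, false)); // false
        System.out.println(isWhitespace('A'));      // false
    }
}
```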
public void throwError(String error, Object... messageParams)
Helper method to handle content errors.
Parameters:
error - message.
messageParams - error params.
Throws:
IOException - wraps the error message into a PdfRuntimeException and adds the position in the file.

public static boolean checkTrailer(ByteBuffer line)
Checks whether line equals 'trailer'.
Parameters:
line - for check

public boolean readLineSegment(ByteBuffer buffer) throws IOException
Reads data into the provided ByteBuffer. See isWhitespace(int) or isWhitespace(int, boolean) for a list of whitespace characters.
Equivalent to readLineSegment(buffer, true).
Parameters:
buffer - a ByteBuffer to which the result of reading will be saved
Throws:
IOException - in case of any reading error

public boolean readLineSegment(ByteBuffer buffer, boolean isNullWhitespace) throws IOException
Reads data into the provided ByteBuffer. See isWhitespace(int) or isWhitespace(int, boolean) for a list of whitespace characters.
Parameters:
buffer - a ByteBuffer to which the result of reading will be saved
isNullWhitespace - boolean to indicate whether the byte value 0 is whitespace or not. If in doubt, use true or the overloaded method readLineSegment(buffer).
Throws:
IOException - in case of any reading error

public static int[] checkObjectStart(PdfTokenizer lineTokenizer)
Checks whether the line starts with an object declaration.
Parameters:
lineTokenizer - tokenizer, built by single line.

Copyright © 1998–2025 Apryse Group NV. All rights reserved.