Class RegexPlainTextFilter
- java.lang.Object
-
- net.sf.okapi.common.filters.AbstractFilter
-
- net.sf.okapi.filters.plaintext.regex.RegexPlainTextFilter
-
- All Implemented Interfaces:
AutoCloseable, Iterator<Event>, IFilter
public class RegexPlainTextFilter extends AbstractFilter
PlainTextFilter extracts lines of input text, separated by line terminators. The filter is aware of the following line terminators:
- Carriage return character followed immediately by a newline character ("\r\n")
- Newline (line feed) character ("\n")
- Stand-alone carriage return character ("\r")
- Next line character ("\u0085")
- Line separator character ("\u2028")
- Paragraph separator character ("\u2029").
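To make the terminator list above concrete, here is a minimal sketch in plain java.util.regex that splits text on those six terminators (next line is U+0085, line separator U+2028, paragraph separator U+2029). It is not the filter's own code: the class name LineTerminatorSketch and the method splitLines are hypothetical and exist only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Okapi: splits text on the six line
// terminators named in the class description.
final class LineTerminatorSketch {

    // The CR LF pair comes first so that it counts as a single terminator
    // rather than as a CR followed by a separate LF.
    private static final Pattern TERMINATOR =
            Pattern.compile("\\r\\n|[\\n\\r\\u0085\\u2028\\u2029]");

    static List<String> splitLines(String text) {
        // A limit of -1 keeps trailing empty strings, so a final terminator
        // produces a final empty line instead of being dropped.
        return Arrays.asList(TERMINATOR.split(text, -1));
    }

    public static void main(String[] args) {
        for (String line : splitLines("one\r\ntwo\nthree\rfour")) {
            System.out.println(line);
        }
    }
}
```

Listing the two-character sequence before the single characters is the design choice that matters here: with the alternatives reversed, "\r\n" would be treated as two terminators with an empty line between them.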
- Version:
- 0.1, 09.06.2009
-
-
Field Summary
Fields (Modifier and Type, Field):
- static String FILTER_CONFIG
- static String FILTER_CONFIG_LINES
- static String FILTER_CONFIG_PARAGRAPHS
- static String FILTER_MIME
- static String FILTER_NAME
-
Fields inherited from interface net.sf.okapi.common.filters.IFilter
SUB_FILTER
-
-
Constructor Summary
Constructors:
- RegexPlainTextFilter()
-
Method Summary
Methods (Modifier and Type, Method: Description):
- void cancel(): Cancels the current process.
- void close(): Closes the input document.
- IFilterWriter createFilterWriter(): Creates a new IFilterWriter object from the most appropriate class to use with this filter.
- ISkeletonWriter createSkeletonWriter(): Creates a new ISkeletonWriter object that corresponds to the type of skeleton this filter uses.
- String getMimeType(): Gets the input document mime type.
- String getName(): Gets the name/identifier of this filter.
- Parameters getParameters(): Gets the current parameters for this filter.
- Parameters getRegexParameters(): Provides access to the internal line extractor's Parameters object.
- boolean hasNext(): Indicates if there is an event to process.
- Event next(): Gets the next event available.
- void open(RawDocument input): Opens the input document described in a given RawDocument object.
- void open(RawDocument input, boolean generateSkeleton): Opens the input document described in a given RawDocument object, and optionally creates skeleton information.
- void setParameters(IParameters params): Sets new parameters for this filter.
- void setRule(String rule, int sourceGroup, int regexOptions): Configures an internal line extractor.
-
Methods inherited from class net.sf.okapi.common.filters.AbstractFilter
addConfiguration, addConfiguration, addConfiguration, addConfigurations, createEndFilterEvent, createStartFilterEvent, findConfiguration, getConfiguration, getConfigurations, getDisplayName, getDocumentId, getDocumentName, getEncoderManager, getEncoding, getFilterConfigurationMapper, getFilterWriter, getNewlineType, getParameters, getParametersClassName, getParentId, getSrcLoc, getTrgLoc, isCanceled, isGenerateSkeleton, isMultilingual, isUtf8Bom, isUtf8Encoding, removeConfiguration, setDisplayName, setDocumentName, setEncoding, setFilterConfigurationMapper, setFilterWriter, setGenerateSkeleton, setMimeType, setMultilingual, setName, setNewlineType, setOptions, setParentId, setSrcLoc, setTrgLoc
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface java.util.Iterator
forEachRemaining, remove
-
-
-
-
Field Detail
-
FILTER_NAME
public static final String FILTER_NAME
- See Also:
- Constant Field Values
-
FILTER_MIME
public static final String FILTER_MIME
- See Also:
- Constant Field Values
-
FILTER_CONFIG
public static final String FILTER_CONFIG
- See Also:
- Constant Field Values
-
FILTER_CONFIG_LINES
public static final String FILTER_CONFIG_LINES
- See Also:
- Constant Field Values
-
FILTER_CONFIG_PARAGRAPHS
public static final String FILTER_CONFIG_PARAGRAPHS
- See Also:
- Constant Field Values
-
-
Method Detail
-
setRule
public void setRule(String rule, int sourceGroup, int regexOptions)
Configures an internal line extractor. If you want to set a custom rule, call this method with a modified rule.
- Parameters:
rule - Java regex rule used to extract lines of text. Default: "^(.*?)$".
sourceGroup - regex capturing group denoting text to be extracted. Default: 1.
regexOptions - Java regex options. Default: Pattern.MULTILINE.
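The three parameters map directly onto java.util.regex, so their effect can be tried outside the filter. The sketch below is a stand-alone illustration under that assumption, not Okapi code: the class LineRuleSketch and its extract method are hypothetical, and the filter's internal extractor may do more than this (for example, build events and skeleton).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch, not Okapi code: applies a line rule using
// the same three values that setRule accepts.
final class LineRuleSketch {

    static List<String> extract(String text, String rule, int sourceGroup, int regexOptions) {
        Pattern pattern = Pattern.compile(rule, regexOptions);
        Matcher matcher = pattern.matcher(text);
        List<String> lines = new ArrayList<>();
        while (matcher.find()) {
            // sourceGroup selects which capturing group holds the extracted text.
            lines.add(matcher.group(sourceGroup));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Documented defaults: rule "^(.*?)$", group 1, Pattern.MULTILINE.
        List<String> lines = extract("first\nsecond\r\nthird", "^(.*?)$", 1, Pattern.MULTILINE);
        lines.forEach(System.out::println);
    }
}
```

A modified rule narrows what is captured. For instance, calling extract with the rule "^#(.*?)$" and the same group and options keeps only lines that start with '#', and returns them without that leading character.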
-
getRegexParameters
public Parameters getRegexParameters()
Provides access to the internal line extractor's Parameters object.
- Returns:
Parameters object; with this object you can access the line extraction rule, source group, regex options, etc.
-
cancel
public void cancel()
Description copied from interface: IFilter
Cancels the current process.
- Specified by:
cancel in interface IFilter
- Overrides:
cancel in class AbstractFilter
-
close
public void close()
Description copied from interface: IFilter
Closes the input document. Developers should call this method from within their code before sending the last event: This can allow writer objects to overwrite the input file when they receive the last event. This method must also be safe to call even if the input document is not opened.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface IFilter
- Overrides:
close in class AbstractFilter
-
createFilterWriter
public IFilterWriter createFilterWriter()
Description copied from interface: IFilter
Creates a new IFilterWriter object from the most appropriate class to use with this filter.
- Specified by:
createFilterWriter in interface IFilter
- Overrides:
createFilterWriter in class AbstractFilter
- Returns:
- A new instance of IFilterWriter for the preferred implementation for this filter.
-
createSkeletonWriter
public ISkeletonWriter createSkeletonWriter()
Description copied from interface: IFilter
Creates a new ISkeletonWriter object that corresponds to the type of skeleton this filter uses.
- Specified by:
createSkeletonWriter in interface IFilter
- Overrides:
createSkeletonWriter in class AbstractFilter
- Returns:
- A new instance of ISkeletonWriter for the type of skeleton this filter uses.
-
getMimeType
public String getMimeType()
Description copied from class: AbstractFilter
Gets the input document mime type.
- Specified by:
getMimeType in interface IFilter
- Overrides:
getMimeType in class AbstractFilter
- Returns:
- the mime type
-
getName
public String getName()
Description copied from interface: IFilter
Gets the name/identifier of this filter.
- Specified by:
getName in interface IFilter
- Overrides:
getName in class AbstractFilter
- Returns:
- The name/identifier of the filter.
-
getParameters
public Parameters getParameters()
Description copied from interface: IFilter
Gets the current parameters for this filter.
- Specified by:
getParameters in interface IFilter
- Overrides:
getParameters in class AbstractFilter
- Returns:
- The current parameters for this filter, or null if this filter has no parameters.
-
hasNext
public boolean hasNext()
Description copied from interface: IFilter
Indicates if there is an event to process.
Implementer Note: The caller must be able to call this method several times without changing state.
- Returns:
- True if there is at least one event to process, false if not.
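One common way to satisfy the implementer note above is to buffer the next item inside hasNext(), so that repeated calls only inspect the buffer. The sketch below shows that pattern on plain strings. It is not the filter's implementation: PeekingLineIterator is a hypothetical class, it assumes the source never yields null, and unlike the filter's next() (documented as returning null when there are no events) it follows the java.util.Iterator convention of throwing NoSuchElementException.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical iterator, not Okapi code: yields the non-blank lines of a
// source iterator while keeping hasNext() safe to call repeatedly.
final class PeekingLineIterator implements Iterator<String> {

    private final Iterator<String> source;
    private String buffered; // next non-blank line, once it has been fetched

    PeekingLineIterator(Iterator<String> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        // Advance the source only while nothing is buffered; once a line is
        // buffered, further calls return the same answer without reading more.
        while (buffered == null && source.hasNext()) {
            String candidate = source.next();
            if (!candidate.trim().isEmpty()) {
                buffered = candidate;
            }
        }
        return buffered != null;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String result = buffered;
        buffered = null; // each item is handed out exactly once
        return result;
    }
}
```

The buffer is what makes the answer stable: the caller sees the same result from hasNext() however many times it asks, and only next() moves the sequence forward.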
-
next
public Event next()
Description copied from interface: IFilter
Gets the next event available. Calling this method can be done only once on each event.
- Returns:
- The next event available or null if there are no events.
-
open
public void open(RawDocument input)
Description copied from interface: IFilter
Opens the input document described in a given RawDocument object. Skeleton information is always created when you use this method.
- Parameters:
input - The RawDocument object to use to open the document.
-
open
public void open(RawDocument input, boolean generateSkeleton)
Description copied from interface: IFilter
Opens the input document described in a given RawDocument object, and optionally creates skeleton information.
- Specified by:
open in interface IFilter
- Overrides:
open in class AbstractFilter
- Parameters:
input - The RawDocument object to use to open the document.
generateSkeleton - true to generate the skeleton data, false otherwise.
-
setParameters
public void setParameters(IParameters params)
Description copied from interface: IFilter
Sets new parameters for this filter.
- Specified by:
setParameters in interface IFilter
- Overrides:
setParameters in class AbstractFilter
- Parameters:
params - The new parameters to use.
-
-