public class ContentFilter extends AbstractMarkupFilter
Filters Microsoft Office Word, Excel, and PowerPoint documents, which are stored in the OpenXML format.
Since OpenXML files are Zip files that contain XML documents, OpenXMLFilter handles opening and processing the zip file, and instantiates this filter to process the XML documents.
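
The container structure described above can be illustrated with a minimal sketch that uses only `java.util.zip`. It is not the OpenXMLFilter implementation: the class `OpenXmlPartLister`, its methods, and the in-memory sample archive are hypothetical names introduced for illustration, and the sample is a plain zip with an OpenXML-style entry name rather than a valid Word document.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper: lists the XML parts inside a zip-based package. */
public class OpenXmlPartLister {

    /** Returns the names of all non-directory entries ending in ".xml". */
    public static List<String> listXmlParts(InputStream in) throws IOException {
        List<String> parts = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.isDirectory() && entry.getName().endsWith(".xml")) {
                    parts.add(entry.getName());
                }
            }
        }
        return parts;
    }

    /** Builds a tiny in-memory zip (assumption: stands in for a real document). */
    public static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write("<w:document/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("word/media/image1.png"));
            zip.write(new byte[] {1, 2, 3});
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Only the XML part is reported; the binary media entry is skipped.
        System.out.println(listXmlParts(new ByteArrayInputStream(sampleZip())));
    }
}
```

In a real pipeline each listed part would be handed to a per-part filter such as this class; the sketch stops at enumerating the parts.
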
This filter extends AbstractMarkupFilter, which extends AbstractFilter. It uses the Jericho parser to analyze the XML files.
The filter behaves slightly differently depending on whether the XML file comes from Word, Excel, PowerPoint, or a chart in Word. How each tag is handled is specified in YAML configuration files. These configuration files are
| Constructor and Description |
|---|
| ContentFilter(ConditionalParameters filterParams, String partName) |
| Modifier and Type | Method and Description |
|---|---|
| void | displayOneEvent(Event event)<br>Logs information about the event if the log level is FINEST. |
| protected boolean | getBInSettingsFile() |
| protected TaggedFilterConfiguration | getConfig()<br>Get the current TaggedFilterConfiguration. |
| ParseType | getConfigurationType() |
| ConditionalParameters | getFilterParameters() |
| String | getName()<br>Returns the name of the filter. |
| YamlParameters | getParameters()<br>Gets the current parameters for this filter. |
| ParseType | getParseType() |
| protected void | handleComment(net.htmlparser.jericho.Tag tag)<br>Treats XML comments as DocumentParts. |
| protected void | handleDocTypeDeclaration(net.htmlparser.jericho.Tag tag)<br>Treats XML doc type declarations as DocumentParts. |
| protected void | handleEndTag(net.htmlparser.jericho.EndTag endTag)<br>Handles end tags. |
| protected void | handleMarkupDeclaration(net.htmlparser.jericho.Tag tag)<br>Treats XML markup declarations as DocumentParts. |
| protected void | handleProcessingInstruction(net.htmlparser.jericho.Tag tag)<br>Treats XML processing instructions as DocumentParts. |
| protected void | handleServerCommon(net.htmlparser.jericho.Tag tag)<br>Treats XML server common tags as DocumentParts. |
| protected void | handleServerCommonEscaped(net.htmlparser.jericho.Tag tag)<br>Treats server common escaped tags as DocumentParts. |
| protected void | handleStartTag(net.htmlparser.jericho.StartTag startTag)<br>Handles a start tag. |
| protected void | handleText(CharSequence text)<br>Handles text. |
| protected void | handleXmlDeclaration(net.htmlparser.jericho.Tag tag)<br>Treats XML declarations as DocumentParts. |
| protected String | normalizeAttributeName(String attrName, String attrValue, net.htmlparser.jericho.Tag tag)<br>Normalizes naming of attributes whose values are the encoding or a language name, so that they can be automatically changed to the output encoding and output. |
| protected void | setBInSettingsFile(boolean bInSettingsFile) |
| void | setParameters(IParameters params)<br>Sets new parameters for this filter. |
| void | setUpConfig(ParseType filetype)<br>Sets the name of the YAML configuration file for the current file type, reads the file, and sets the parameters. |
| String | toString() |
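
The per-file-type selection that setUpConfig performs can be sketched as a lookup from document kind to configuration file name. This is an illustration under stated assumptions, not the filter's code: the enum `PartKind` is a hypothetical stand-in for ParseType, and the `.yml` file names are invented placeholders, not the names the filter actually uses.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical sketch: picks a YAML configuration file per document kind. */
public class ConfigSelector {

    /** Hypothetical stand-in for ParseType; the real constants may differ. */
    public enum PartKind { WORD, EXCEL, POWERPOINT, WORD_CHART }

    private static final Map<PartKind, String> CONFIG_FILES = new EnumMap<>(PartKind.class);

    static {
        // Placeholder file names, invented for this example.
        CONFIG_FILES.put(PartKind.WORD, "word-tags.yml");
        CONFIG_FILES.put(PartKind.EXCEL, "excel-tags.yml");
        CONFIG_FILES.put(PartKind.POWERPOINT, "powerpoint-tags.yml");
        CONFIG_FILES.put(PartKind.WORD_CHART, "word-chart-tags.yml");
    }

    /** Returns the configuration file name registered for the given kind. */
    public static String configFileFor(PartKind kind) {
        String name = CONFIG_FILES.get(kind);
        if (name == null) {
            throw new IllegalArgumentException("No configuration for " + kind);
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(configFileFor(PartKind.EXCEL));
    }
}
```

Keeping the mapping in one table means a new document kind needs only one new entry; reading and parsing the selected file is left out of the sketch.
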
Methods inherited from class AbstractMarkupFilter
addCodeToCurrentTextUnit, addCodeToCurrentTextUnit, addFilterEvent, addToDocumentPart, addToTextUnit, addToTextUnit, addToTextUnit, addToTextUnit, addToTextUnit, appendToFirstSkeletonPart, canStartNewTextUnit, close, createEventBuilder, createPropertyTextUnitPlaceholder, createPropertyTextUnitPlaceholders, detectEncoding, determineTagType, endDocumentPart, endFilter, endGroup, endTextUnit, getBufferedWhiteSpace, getCurrentDocName, getDocumentPartId, getEventBuilder, getGroupIdSequence, getParsedHeader, getRuleState, getTextUnitId, handleCdataSection, handleCharacterEntity, handleDocumentPart, handleNumericEntity, hasNext, isBOM, isDocumentEncoding, isInsideTextRun, isPreserveWhitespace, isUtf8Bom, isUtf8Encoding, isWhiteSpace, next, open, open, peekTempEvent, popTempEvent, postProcessTextUnit, preProcess, setCurrentDocName, setDocumentPartId, setGroupIdSequence, setMimeType, setPreserveWhitespace, setTextUnitId, setTextUnitMimeType, setTextUnitName, setTextUnitPreserveWhitespace, setTextUnitTranslatable, setTextUnitType, startDocumentPart, startFilter, startGroup, startGroup, startTextUnit, startTextUnit, startTextUnit, startTextUnit, updateEndTagRuleState, updateStartTagRuleState

Methods inherited from class AbstractFilter
addConfiguration, addConfiguration, addConfiguration, addConfigurations, cancel, createEndFilterEvent, createFilterWriter, createSkeletonWriter, createStartFilterEvent, findConfiguration, getConfiguration, getConfigurations, getDisplayName, getDocumentId, getDocumentName, getEncoderManager, getEncoding, getFilterConfigurationMapper, getFilterWriter, getMimeType, getNewlineType, getParameters, getParametersClassName, getParentId, getSrcLoc, getTrgLoc, isCanceled, isGenerateSkeleton, isMultilingual, removeConfiguration, setDisplayName, setDocumentName, setEncoding, setFilterConfigurationMapper, setFilterWriter, setGenerateSkeleton, setMultilingual, setName, setNewlineType, setOptions, setParentId, setSrcLoc, setTrgLoc

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from interface java.util.Iterator
forEachRemaining, remove

public ContentFilter(ConditionalParameters filterParams, String partName)

public void displayOneEvent(Event event)
Parameters:
event - event to log information about

public ParseType getParseType()

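
The log-level guard described for displayOneEvent can be sketched with `java.util.logging`. This is a hedged illustration, not the filter's implementation: `EventLogSketch`, `describeIfFinest`, and the string arguments standing in for an Event are hypothetical, and the source does not say which logging framework the filter uses.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch: only builds and logs a message when FINEST is enabled. */
public class EventLogSketch {

    /**
     * Logs a one-line description at FINEST and returns it, or returns null
     * without building the message when FINEST is not enabled on the logger.
     */
    public static String describeIfFinest(Logger logger, String eventType, String text) {
        if (!logger.isLoggable(Level.FINEST)) {
            return null; // skip the string building entirely
        }
        String message = eventType + ": " + text;
        logger.finest(message);
        return message;
    }

    public static void main(String[] args) {
        Logger logger = Logger.getAnonymousLogger();
        logger.setLevel(Level.FINEST);
        // The message is returned here; whether it is also printed depends on
        // the levels of the installed handlers.
        System.out.println(describeIfFinest(logger, "TEXT_UNIT", "Hello"));
    }
}
```

Checking the level before formatting keeps event dumps cheap when detailed logging is turned off.
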
public void setUpConfig(ParseType filetype)
Parameters:
filetype - type of XML in the current file

protected void handleText(CharSequence text)
handleText in class AbstractMarkupFilter
Parameters:
text - the text to be handled

protected void handleStartTag(net.htmlparser.jericho.StartTag startTag)
handleStartTag in class AbstractMarkupFilter
Parameters:
startTag - the start tag to process

protected void handleEndTag(net.htmlparser.jericho.EndTag endTag)
handleEndTag in class AbstractMarkupFilter
Parameters:
endTag - the end tag to process

protected void handleComment(net.htmlparser.jericho.Tag tag)
handleComment in class AbstractMarkupFilter
Parameters:
tag - comment tag

protected void handleDocTypeDeclaration(net.htmlparser.jericho.Tag tag)
handleDocTypeDeclaration in class AbstractMarkupFilter
Parameters:
tag - doc type declaration tag

protected void handleMarkupDeclaration(net.htmlparser.jericho.Tag tag)
handleMarkupDeclaration in class AbstractMarkupFilter
Parameters:
tag - markup declaration tag

protected void handleProcessingInstruction(net.htmlparser.jericho.Tag tag)
handleProcessingInstruction in class AbstractMarkupFilter
Parameters:
tag - processing instruction tag

protected void handleServerCommon(net.htmlparser.jericho.Tag tag)
handleServerCommon in class AbstractMarkupFilter
Parameters:
tag - server common tag

protected void handleServerCommonEscaped(net.htmlparser.jericho.Tag tag)
handleServerCommonEscaped in class AbstractMarkupFilter
Parameters:
tag - server common escaped tag

protected void handleXmlDeclaration(net.htmlparser.jericho.Tag tag)
handleXmlDeclaration in class AbstractMarkupFilter
Parameters:
tag - XML declaration tag

public String getName()
getName in interface IFilter
getName in class AbstractFilter

protected String normalizeAttributeName(String attrName, String attrValue, net.htmlparser.jericho.Tag tag)
normalizeAttributeName in class AbstractMarkupFilter
Parameters:
attrName - name of the attribute
attrValue - value of the attribute
tag - tag that contains the attribute

public ParseType getConfigurationType()

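
The idea behind normalizeAttributeName, mapping differently named encoding and language attributes onto one canonical name each, might look like the sketch below. Everything here is an assumption made for illustration: the class `AttributeNameSketch`, the canonical names, and the recognized attribute names are hypothetical, and unlike the real method the sketch ignores the attribute value and the enclosing tag.

```java
import java.util.Locale;

/** Hypothetical sketch: maps encoding/language attribute names to canonical names. */
public class AttributeNameSketch {

    // Canonical names chosen for this example only.
    public static final String ENCODING = "encoding";
    public static final String LANGUAGE = "language";

    /** Returns a canonical name for known attributes, otherwise the name unchanged. */
    public static String normalize(String attrName) {
        String lower = attrName.toLowerCase(Locale.ROOT);
        if (lower.equals("encoding") || lower.equals("charset")) {
            return ENCODING;
        }
        if (lower.equals("lang") || lower.equals("xml:lang")) {
            return LANGUAGE;
        }
        return attrName; // not an encoding or language attribute
    }

    public static void main(String[] args) {
        System.out.println(normalize("xml:lang"));
        System.out.println(normalize("charset"));
        System.out.println(normalize("id"));
    }
}
```

With a single canonical name per concept, later processing can rewrite every encoding or language attribute in one place.
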
protected void setBInSettingsFile(boolean bInSettingsFile)

protected boolean getBInSettingsFile()

protected TaggedFilterConfiguration getConfig()
Description copied from class: AbstractMarkupFilter
Get the current TaggedFilterConfiguration. A TaggedFilterConfiguration is the result of reading in a YAML configuration file and converting it into Java Objects.
getConfig in class AbstractMarkupFilter
Returns:
TaggedFilterConfiguration

public YamlParameters getParameters()
Description copied from interface: IFilter
Gets the current parameters for this filter.
getParameters in interface IFilter
getParameters in class AbstractFilter

public void setParameters(IParameters params)
Description copied from interface: IFilter
Sets new parameters for this filter.
setParameters in interface IFilter
setParameters in class AbstractFilter
Parameters:
params - The new parameters to use.

public ConditionalParameters getFilterParameters()

Copyright © 2021. All rights reserved.