public class TaggedFilterConfiguration extends Object
Extraction rules can handle the following cases:
Default rule - Do not extract the element.
INLINE - Elements that are included with text.
EXCLUDED - Elements and children that should be excluded from extraction.
INCLUDED - Elements and children within EXCLUDED ranges that should be extracted.
GROUP - Elements that are grouped together structurally such as lists, tables etc.
ATTRIBUTES - Attributes on specific elements which should be extracted. May be translatable or localizable.
ATTRIBUTES ANY ELEMENT - Convenience rule for attributes which can occur on any element. May be translatable or localizable.
TEXT UNIT - Elements whose start and end tags become part of a TextUnit rather than DocumentPart.
Any of the above rules may have conditional rules based on attribute names and/or values. Conditional rules may be attached to both elements and attributes. When a rule has more than one condition, the conditions are evaluated as an OR expression. For example, "type=button" OR "type=default".
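The OR evaluation described above can be sketched as follows. This is a minimal illustration, not the class's actual implementation: `ConditionalRuleSketch`, `Condition`, and `applies` are hypothetical names, each condition is assumed to be a simple attribute-name/value equality test, and a rule without conditions is assumed to always apply.

```java
import java.util.List;
import java.util.Map;

public class ConditionalRuleSketch {

    /** Hypothetical condition: the attribute `name` must equal `value`. */
    static final class Condition {
        final String name;
        final String value;

        Condition(String name, String value) {
            this.name = name;
            this.value = value;
        }

        boolean matches(Map<String, String> attributes) {
            return value.equals(attributes.get(name));
        }
    }

    /**
     * Assumption: a rule with no conditions always applies; otherwise it
     * applies when ANY of its conditions matches (OR semantics).
     */
    static boolean applies(List<Condition> conditions, Map<String, String> attributes) {
        if (conditions.isEmpty()) {
            return true;
        }
        return conditions.stream().anyMatch(c -> c.matches(attributes));
    }

    public static void main(String[] args) {
        // "type=button" OR "type=default", as in the example above.
        List<Condition> conditions = List.of(
                new Condition("type", "button"),
                new Condition("type", "default"));

        System.out.println(applies(conditions, Map.of("type", "default")));
        System.out.println(applies(conditions, Map.of("type", "hidden")));
    }
}
```

With the two conditions shown, an element carrying `type="default"` satisfies the rule because one condition matches, while `type="hidden"` satisfies neither.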
| Modifier and Type | Class and Description |
|---|---|
| static class | TaggedFilterConfiguration.RULE_TYPE: AbstractMarkupFilter rule types. |
| Modifier and Type | Field and Description |
|---|---|
| static EnumSet<TaggedFilterConfiguration.RULE_TYPE> | ATTRIBUTE_ON_ELEMENT_RULES |
| static EnumSet<TaggedFilterConfiguration.RULE_TYPE> | FAILED |
| static EnumSet<TaggedFilterConfiguration.RULE_TYPE> | INLINE_AND_EXCLUDE |
| static EnumSet<TaggedFilterConfiguration.RULE_TYPE> | INLINE_AND_EXCLUDE_FAIL |
| static EnumSet<TaggedFilterConfiguration.RULE_TYPE> | INLINE_AND_INCLUDE |
| static EnumSet<TaggedFilterConfiguration.RULE_TYPE> | INLINE_AND_INCLUDE_FAIL |
| Constructor and Description |
|---|
| TaggedFilterConfiguration() |
| TaggedFilterConfiguration(File configurationFile) |
| TaggedFilterConfiguration(String configurationScript) |
| TaggedFilterConfiguration(URL configurationPathAsResource) |
public static final EnumSet<TaggedFilterConfiguration.RULE_TYPE> ATTRIBUTE_ON_ELEMENT_RULES
public static final EnumSet<TaggedFilterConfiguration.RULE_TYPE> INLINE_AND_EXCLUDE
public static final EnumSet<TaggedFilterConfiguration.RULE_TYPE> INLINE_AND_EXCLUDE_FAIL
public static final EnumSet<TaggedFilterConfiguration.RULE_TYPE> INLINE_AND_INCLUDE
public static final EnumSet<TaggedFilterConfiguration.RULE_TYPE> INLINE_AND_INCLUDE_FAIL
public static final EnumSet<TaggedFilterConfiguration.RULE_TYPE> FAILED
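The constants above are all `EnumSet` values, so callers can compare them against the sets returned by the rule-lookup methods using ordinary set operations. The sketch below shows one possible way to do that. It is only an illustration: `RuleType` is a hypothetical stand-in for `TaggedFilterConfiguration.RULE_TYPE`, and the contents given to `INLINE_AND_EXCLUDE` here are guessed from the constant's name, not taken from the library.

```java
import java.util.EnumSet;

public class RuleTypeSetSketch {

    /** Hypothetical stand-in for TaggedFilterConfiguration.RULE_TYPE. */
    enum RuleType { INLINE, EXCLUDED, INCLUDED, GROUP, TEXT_UNIT }

    /** Assumed contents, inferred from the constant's name only. */
    static final EnumSet<RuleType> INLINE_AND_EXCLUDE =
            EnumSet.of(RuleType.INLINE, RuleType.EXCLUDED);

    /** True when every rule type in `expected` is also present in `actual`. */
    static boolean hasAll(EnumSet<RuleType> actual, EnumSet<RuleType> expected) {
        return actual.containsAll(expected);
    }

    public static void main(String[] args) {
        EnumSet<RuleType> found =
                EnumSet.of(RuleType.INLINE, RuleType.EXCLUDED, RuleType.GROUP);

        System.out.println(hasAll(found, INLINE_AND_EXCLUDE));
        System.out.println(hasAll(EnumSet.of(RuleType.INLINE), INLINE_AND_EXCLUDE));
    }
}
```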
public TaggedFilterConfiguration()
public TaggedFilterConfiguration(URL configurationPathAsResource)
public TaggedFilterConfiguration(File configurationFile)
public TaggedFilterConfiguration(String configurationScript)
public YamlConfigurationReader getConfigReader()
public boolean isGlobalPreserveWhitespace()
public boolean isGlobalExcludeByDefault()
public boolean isWellformed()
public boolean isInlineCdata()
public boolean isUseCodeFinder()
public boolean getBooleanParameter(String parameterName)
public int getIntegerParameter(String parameterName)
public String getGlobalPCDATASubfilter()
public String getGlobalCDATASubfilter()
public String getCodeFinderRules()
public String getElementType(net.htmlparser.jericho.Tag element)
public EnumSet<TaggedFilterConfiguration.RULE_TYPE> getAttributeRuleTypes(String attribute, String tag, Map<String,String> attributes)
public EnumSet<TaggedFilterConfiguration.RULE_TYPE> getAttributeRuleTypes(String attribute)
public EnumSet<TaggedFilterConfiguration.RULE_TYPE> getAttributeRuleTypes(String attribute, String tag)
public EnumSet<TaggedFilterConfiguration.RULE_TYPE> getAttributeOnElementRuleTypes(String tag, String attribute, Map<String,String> attributes)
TaggedFilterConfiguration.RULE_TYPEs for attributes found on element rules.
Parameters:
tag -
attribute -
attributes -
public EnumSet<TaggedFilterConfiguration.RULE_TYPE> getElementRuleTypes(String tag, Map<String,String> attributes, boolean isStartTag)
public EnumSet<TaggedFilterConfiguration.RULE_TYPE> getElementRuleTypes(String tag, boolean isStartTag)
TaggedFilterConfiguration.RULE_TYPE
Any rules with conditions are automatically false since we have no attributes.
Parameters:
tag - the markup tag (converted to lowercase for search)
isEndTag - is this tag an ending tag?
Returns:
TaggedFilterConfiguration.RULE_TYPE as an EnumSet
public boolean doesElementRuleConditionApply(Map elementRule, Map<String,String> attributes)
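The behaviour documented for `getElementRuleTypes(String tag, boolean isStartTag)` (the tag is lowercased for the search, and rules with conditions count as false because no attributes are supplied) might look roughly like the sketch below. All names here (`ElementRuleLookupSketch`, `RuleType`, `ElementRule`, `addRule`) are hypothetical, the start/end tag flag is omitted for brevity, and returning an empty set for "no applicable rule" is an assumption of this sketch rather than the library's documented result.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class ElementRuleLookupSketch {

    /** Hypothetical stand-in for TaggedFilterConfiguration.RULE_TYPE. */
    enum RuleType { INLINE, EXCLUDED, GROUP }

    /** Hypothetical element rule: one rule type plus a "has conditions" flag. */
    static final class ElementRule {
        final RuleType type;
        final boolean hasConditions;

        ElementRule(RuleType type, boolean hasConditions) {
            this.type = type;
            this.hasConditions = hasConditions;
        }
    }

    private final Map<String, ElementRule> rules = new HashMap<>();

    /** Rules are stored under lowercase keys so lookups are case-insensitive. */
    void addRule(String tag, RuleType type, boolean hasConditions) {
        rules.put(tag.toLowerCase(Locale.ROOT), new ElementRule(type, hasConditions));
    }

    /**
     * Attribute-free lookup: the tag is lowercased before the search, and a
     * rule that carries conditions is treated as not applying because there
     * are no attributes to test it against.
     */
    EnumSet<RuleType> getElementRuleTypes(String tag) {
        ElementRule rule = rules.get(tag.toLowerCase(Locale.ROOT));
        if (rule == null || rule.hasConditions) {
            return EnumSet.noneOf(RuleType.class); // assumption: empty set means "no rule"
        }
        return EnumSet.of(rule.type);
    }

    public static void main(String[] args) {
        ElementRuleLookupSketch config = new ElementRuleLookupSketch();
        config.addRule("b", RuleType.INLINE, false);
        config.addRule("input", RuleType.EXCLUDED, true);

        System.out.println(config.getElementRuleTypes("B"));
        System.out.println(config.getElementRuleTypes("input"));
    }
}
```

When the element's attributes are available, the overload that accepts a `Map<String,String>` of attributes is the one that can evaluate conditional rules.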
public boolean isTranslatableAttribute(String tag, String attribute, Map<String,String> attributes)
public boolean isReadOnlyLocalizableAttribute(String tag, String attribute, Map<String,String> attributes)
public boolean isWritableLocalizableAttribute(String tag, String attribute, Map<String,String> attributes)
public boolean isIdAttribute(String tag, String attribute, Map<String,String> attributes)
public boolean isPreserveWhitespaceCondition(String attribute, Map<String,String> attributes)
public boolean isDefaultWhitespaceCondition(String attribute, Map<String,String> attributes)
public String getSimplifierRules()
public void setSimplifierRules(String rules)
public boolean getQuoteModeDefined()
public void setQuoteModeDefined(boolean defined)
public int getQuoteMode()
public void setQuoteMode(String quoteMode)
Copyright © 2022. All rights reserved.