Class TaggedFilterConfiguration


  • public class TaggedFilterConfiguration
    extends Object
    Defines extraction rules useful for markup languages such as HTML and XML.

    Extraction rules can handle the following cases:

    Default rule - Do not extract the element.

    INLINE - Elements that are included with text.

    EXCLUDED - Elements and children that should be excluded from extraction.

    INCLUDED - Elements and children within EXCLUDED ranges that should be extracted.

    GROUP - Elements that are grouped together structurally, such as lists and tables.

    ATTRIBUTES - Attributes on specific elements which should be extracted. May be translatable or localizable.

    ATTRIBUTES ANY ELEMENT - Convenience rule for attributes which can occur on any element. May be translatable or localizable.

    TEXT UNIT - Elements whose start and end tags become part of a TextUnit rather than DocumentPart.
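
    The rule types listed above can be pictured as a lookup from element name to rule, falling back to the default rule when nothing matches. The following is a minimal sketch only: ExtractionRuleSketch, RuleType, addElementRule and ruleFor are hypothetical names invented for illustration and are not part of this class's API.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration; none of these names come from the library.
final class ExtractionRuleSketch {

    // Mirrors the rule names listed in the class description.
    enum RuleType {
        DEFAULT, INLINE, EXCLUDED, INCLUDED, GROUP,
        ATTRIBUTES, ATTRIBUTES_ANY_ELEMENT, TEXT_UNIT
    }

    private final Map<String, RuleType> elementRules = new HashMap<>();

    // Registers a rule for an element name (assumption: names are case-insensitive).
    void addElementRule(String elementName, RuleType type) {
        elementRules.put(elementName.toLowerCase(Locale.ROOT), type);
    }

    // Returns the registered rule, or DEFAULT when the element has no rule.
    RuleType ruleFor(String elementName) {
        return elementRules.getOrDefault(
                elementName.toLowerCase(Locale.ROOT), RuleType.DEFAULT);
    }

    public static void main(String[] args) {
        ExtractionRuleSketch rules = new ExtractionRuleSketch();
        rules.addElementRule("b", RuleType.INLINE);
        rules.addElementRule("script", RuleType.EXCLUDED);
        rules.addElementRule("ul", RuleType.GROUP);

        System.out.println(rules.ruleFor("B"));      // a registered element
        System.out.println(rules.ruleFor("custom")); // an element with no rule
    }
}
```

    In this sketch an element without a registered rule resolves to DEFAULT, which corresponds to the "don't extract" behaviour described above.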

    Any of the above rules may have conditional rules based on attribute names and/or values. Conditional rules may be attached to both elements and attributes. When a rule has more than one conditional rule, the conditions are evaluated as an OR expression. For example, "type=button" OR "type=default".
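
    The OR evaluation of conditional rules can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: ConditionalRuleSketch, Condition and anyMatch are hypothetical names, conditions are limited to simple "attribute equals value" tests, and a rule with no conditions is assumed to apply unconditionally.

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration; none of these names come from the library.
final class ConditionalRuleSketch {

    // A single condition of the form attribute=value, e.g. type=button.
    static final class Condition {
        final String attribute;
        final String value;

        Condition(String attribute, String value) {
            this.attribute = attribute;
            this.value = value;
        }

        // True when the element carries the attribute with exactly this value.
        boolean matches(Map<String, String> attributes) {
            return value.equals(attributes.get(attribute));
        }
    }

    // OR semantics: the rule applies if at least one condition matches.
    // Assumption: an empty condition list means the rule is unconditional.
    static boolean anyMatch(List<Condition> conditions, Map<String, String> attributes) {
        if (conditions.isEmpty()) {
            return true;
        }
        for (Condition condition : conditions) {
            if (condition.matches(attributes)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "type=button" OR "type=default", as in the example above.
        List<Condition> conditions = List.of(
                new Condition("type", "button"),
                new Condition("type", "default"));

        System.out.println(anyMatch(conditions, Map.of("type", "button")));
        System.out.println(anyMatch(conditions, Map.of("type", "submit")));
    }
}
```

    With these conditions, an element whose type attribute is "button" or "default" satisfies the rule, while any other value (or a missing type attribute) does not.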