org.cyberneko.html

Class HTMLConfiguration

Implemented Interfaces:
XMLPullParserConfiguration

public class HTMLConfiguration
extends ParserConfigurationSettings
implements XMLPullParserConfiguration

An XNI-based parser configuration for parsing HTML documents. It can be used directly to parse HTML documents, or in conjunction with any XNI-based tool, such as the Xerces2 implementation.

This configuration recognizes the following features:

This configuration recognizes the following properties:

For complete usage information, refer to the documentation.

Version:
$Id: HTMLConfiguration.java,v 1.9 2005/02/14 03:56:54 andyc Exp $
Author:
Andy Clark
See Also:
HTMLScanner, HTMLTagBalancer, HTMLErrorReporter

Nested Class Summary

protected class
HTMLConfiguration.ErrorReporter
Defines an error reporter for reporting HTML errors.

Field Summary

protected static String
AUGMENTATIONS
Include infoset augmentations.
protected static String
BALANCE_TAGS
Balance tags.
protected static String
ERROR_DOMAIN
Error domain.
protected static String
ERROR_REPORTER
Error reporter.
protected static String
FILTERS
Pipeline filters.
protected static String
NAMESPACES
Namespaces.
protected static String
NAMES_ATTRS
Modify HTML attribute names: { "upper", "lower", "default" }.
protected static String
NAMES_ELEMS
Modify HTML element names: { "upper", "lower", "default" }.
protected static String
REPORT_ERRORS
Report errors.
protected static String
SIMPLE_ERROR_FORMAT
Simple report format.
protected static boolean
XERCES_2_0_0
Parser version is Xerces 2.0.0.
protected static boolean
XERCES_2_0_1
Parser version is Xerces 2.0.1.
protected static boolean
XML4J_4_0_x
Parser version is XML4J 4.0.x.
protected boolean
fCloseStream
Stream opened by parser.
protected XMLDTDContentModelHandler
fDTDContentModelHandler
DTD content model handler.
protected XMLDTDHandler
fDTDHandler
DTD handler.
protected XMLDocumentHandler
fDocumentHandler
Document handler.
protected HTMLScanner
fDocumentScanner
Document scanner.
protected XMLEntityResolver
fEntityResolver
Entity resolver.
protected XMLErrorHandler
fErrorHandler
Error handler.
protected HTMLErrorReporter
fErrorReporter
Error reporter.
protected Vector
fHTMLComponents
Components.
protected Locale
fLocale
Locale.
protected NamespaceBinder
fNamespaceBinder
Namespace binder.
protected HTMLTagBalancer
fTagBalancer
HTML tag balancer.

Constructor Summary

HTMLConfiguration()
Default constructor.

Method Summary

protected void
addComponent(HTMLComponent component)
Adds a component.
void
cleanup()
If the application terminates parsing before the document is fully parsed, it should call this method to free any resources allocated during parsing.
XMLDTDContentModelHandler
getDTDContentModelHandler()
Returns the DTD content model handler.
XMLDTDHandler
getDTDHandler()
Returns the DTD handler.
XMLDocumentHandler
getDocumentHandler()
Returns the document handler.
XMLEntityResolver
getEntityResolver()
Returns the entity resolver.
XMLErrorHandler
getErrorHandler()
Returns the error handler.
Locale
getLocale()
Returns the locale.
void
parse(XMLInputSource source)
Parses a document.
boolean
parse(boolean complete)
Parses the document in a pull parsing fashion.
void
pushInputSource(XMLInputSource inputSource)
Pushes an input source onto the current entity stack.
protected void
reset()
Resets the parser configuration.
void
setDTDContentModelHandler(XMLDTDContentModelHandler handler)
Sets the DTD content model handler.
void
setDTDHandler(XMLDTDHandler handler)
Sets the DTD handler.
void
setDocumentHandler(XMLDocumentHandler handler)
Sets the document handler.
void
setEntityResolver(XMLEntityResolver resolver)
Sets the entity resolver.
void
setErrorHandler(XMLErrorHandler handler)
Sets the error handler.
void
setFeature(String featureId, boolean state)
Sets a feature.
void
setInputSource(XMLInputSource inputSource)
Sets the input source for the document to parse.
void
setLocale(Locale locale)
Sets the locale.
void
setProperty(String propertyId, Object value)
Sets a property.

Field Details

AUGMENTATIONS

protected static final String AUGMENTATIONS
Include infoset augmentations.

BALANCE_TAGS

protected static final String BALANCE_TAGS
Balance tags.

ERROR_DOMAIN

protected static final String ERROR_DOMAIN
Error domain.

ERROR_REPORTER

protected static final String ERROR_REPORTER
Error reporter.

FILTERS

protected static final String FILTERS
Pipeline filters.

NAMESPACES

protected static final String NAMESPACES
Namespaces.

NAMES_ATTRS

protected static final String NAMES_ATTRS
Modify HTML attribute names: { "upper", "lower", "default" }.

NAMES_ELEMS

protected static final String NAMES_ELEMS
Modify HTML element names: { "upper", "lower", "default" }.

REPORT_ERRORS

protected static final String REPORT_ERRORS
Report errors.

SIMPLE_ERROR_FORMAT

protected static final String SIMPLE_ERROR_FORMAT
Simple report format.

XERCES_2_0_0

protected static boolean XERCES_2_0_0
Parser version is Xerces 2.0.0.

XERCES_2_0_1

protected static boolean XERCES_2_0_1
Parser version is Xerces 2.0.1.

XML4J_4_0_x

protected static boolean XML4J_4_0_x
Parser version is XML4J 4.0.x.

fCloseStream

protected boolean fCloseStream
Stream opened by parser. The parser must therefore close the stream itself when parsing terminates.

fDTDContentModelHandler

protected XMLDTDContentModelHandler fDTDContentModelHandler
DTD content model handler.

fDTDHandler

protected XMLDTDHandler fDTDHandler
DTD handler.

fDocumentHandler

protected XMLDocumentHandler fDocumentHandler
Document handler.

fDocumentScanner

protected HTMLScanner fDocumentScanner
Document scanner.

fEntityResolver

protected XMLEntityResolver fEntityResolver
Entity resolver.

fErrorHandler

protected XMLErrorHandler fErrorHandler
Error handler.

fErrorReporter

protected HTMLErrorReporter fErrorReporter
Error reporter.

fHTMLComponents

protected Vector fHTMLComponents
Components.

fLocale

protected Locale fLocale
Locale.

fNamespaceBinder

protected NamespaceBinder fNamespaceBinder
Namespace binder.

fTagBalancer

protected HTMLTagBalancer fTagBalancer
HTML tag balancer.

Constructor Details

HTMLConfiguration

public HTMLConfiguration()
Default constructor.

Method Details

addComponent

protected void addComponent(HTMLComponent component)
Adds a component.

cleanup

public void cleanup()
If the application terminates parsing before the document is fully parsed, it should call this method to free any resources allocated during parsing, for example to close all opened streams.
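
The relationship between cleanup() and the fCloseStream field can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not the NekoHTML implementation: the class CloseStreamSketch, its methods setInput and openInput, and the helper TrackingStream are hypothetical names, and the sketch assumes that a stream supplied by the caller is left open while a stream opened by the parser is closed during cleanup.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical model of "close only what the parser opened".
final class CloseStreamSketch {
    private InputStream stream;
    // Plays the role of fCloseStream: true only if this object opened the stream.
    private boolean closeStream;

    /** The caller supplies the stream, so the caller remains responsible for closing it. */
    public void setInput(InputStream supplied) {
        stream = supplied;
        closeStream = false;
    }

    /** The sketch opens the stream itself, so it must close it during cleanup. */
    public TrackingStream openInput(byte[] content) {
        TrackingStream opened = new TrackingStream(content);
        stream = opened;
        closeStream = true;
        return opened;
    }

    /** Frees resources when parsing stops early; closes the stream only if we opened it. */
    public void cleanup() {
        if (stream != null && closeStream) {
            try {
                stream.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        stream = null;
    }

    /** In-memory stream that records whether close() was called. */
    public static final class TrackingStream extends ByteArrayInputStream {
        public boolean closed;

        public TrackingStream(byte[] content) {
            super(content);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }
}
```

An application that stops early would typically call cleanup() from a finally block, so that any stream opened on its behalf is released even if an exception interrupts parsing.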

getDTDContentModelHandler

public XMLDTDContentModelHandler getDTDContentModelHandler()
Returns the DTD content model handler.

getDTDHandler

public XMLDTDHandler getDTDHandler()
Returns the DTD handler.

getDocumentHandler

public XMLDocumentHandler getDocumentHandler()
Returns the document handler.

getEntityResolver

public XMLEntityResolver getEntityResolver()
Returns the entity resolver.

getErrorHandler

public XMLErrorHandler getErrorHandler()
Returns the error handler.

getLocale

public Locale getLocale()
Returns the locale.

parse

public void parse(XMLInputSource source)
            throws XNIException,
                   IOException
Parses a document.

parse

public boolean parse(boolean complete)
            throws XNIException,
                   IOException
Parses the document in a pull parsing fashion.
Parameters:
complete - True if the pull parser should parse the remaining document completely.
Returns:
True if there is more document to parse.
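
The contract described above (the return value is true while more of the document remains, and complete = true finishes the rest in one call) suggests a caller loop along the following lines. This is a minimal sketch, not NekoHTML code: PullParser is a hypothetical stand-in interface that mirrors only the shape of parse(boolean complete), TokenParser is a toy implementation that consumes one token per incremental call, and the checked exceptions declared by the real method (XNIException, IOException) are omitted.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of a pull-parsing driver loop.
final class PullLoopSketch {
    /** Stand-in with the same shape as parse(boolean complete). */
    public interface PullParser {
        boolean parse(boolean complete);
    }

    /** Toy parser: one token per incremental call, all remaining tokens if complete. */
    public static final class TokenParser implements PullParser {
        private final Iterator<String> tokens;
        public final List<String> seen = new ArrayList<>();

        public TokenParser(List<String> tokens) {
            this.tokens = tokens.iterator();
        }

        @Override
        public boolean parse(boolean complete) {
            do {
                if (!tokens.hasNext()) {
                    return false; // nothing left to parse
                }
                seen.add(tokens.next());
            } while (complete);
            return tokens.hasNext(); // true if there is more document to parse
        }
    }

    /** Parses incrementally for at most maxSteps calls, then finishes in one call. */
    public static int drive(PullParser parser, int maxSteps) {
        int steps = 0;
        boolean more = true;
        while (more && steps < maxSteps) {
            more = parser.parse(false);
            steps++;
        }
        if (more) {
            parser.parse(true); // parse the remaining document completely
        }
        return steps;
    }
}
```

With the real configuration, the input would first be supplied through setInputSource(XMLInputSource) before the first call to parse(boolean).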

pushInputSource

public void pushInputSource(XMLInputSource inputSource)
Pushes an input source onto the current entity stack. This enables the scanner to transparently scan new content (e.g. the output written by an embedded script). At the end of the pushed entity, the scanner returns to where it left off at the time this entity source was pushed.

Hint: To use this feature to insert the output of <SCRIPT> tags, remember to buffer the entire output of the processed instructions before pushing a new input source. Otherwise, events may appear out of sequence.

Parameters:
inputSource - The new input source to start scanning.
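
The entity-stack behaviour described above can be modelled in a few lines. This is an illustrative sketch only, not the scanner's actual implementation: EntityStackSketch and its demo method are hypothetical, plain java.io.Reader objects stand in for XMLInputSource, and the '|' character stands in for the point at which a script's output would be inserted.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of scanning from a stack of input sources.
final class EntityStackSketch {
    private final Deque<Reader> stack = new ArrayDeque<>();

    public EntityStackSketch(Reader document) {
        stack.push(document);
    }

    /** Scanning continues from the pushed reader until it is exhausted. */
    public void pushInputSource(Reader pushed) {
        stack.push(pushed);
    }

    /** Returns the next character, or -1 once every source is exhausted. */
    public int read() throws IOException {
        while (!stack.isEmpty()) {
            int c = stack.peek().read();
            if (c != -1) {
                return c;
            }
            // End of the current entity: resume the one underneath it.
            stack.pop().close();
        }
        return -1;
    }

    /** Scans "ab|cd" and pushes the fully buffered text "XY" when '|' is seen. */
    public static String demo() {
        EntityStackSketch scanner = new EntityStackSketch(new StringReader("ab|cd"));
        StringBuilder out = new StringBuilder();
        try {
            int c;
            while ((c = scanner.read()) != -1) {
                if (c == '|') {
                    // The whole script output is buffered before it is pushed.
                    scanner.pushInputSource(new StringReader("XY"));
                } else {
                    out.append((char) c);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

Here demo() yields "abXYcd": the pushed text appears exactly at the insertion point, and scanning of the original source resumes afterwards. Pushing the complete buffered output in one piece is what keeps the resulting characters in sequence.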

reset

protected void reset()
            throws XMLConfigurationException
Resets the parser configuration.

setDTDContentModelHandler

public void setDTDContentModelHandler(XMLDTDContentModelHandler handler)
Sets the DTD content model handler.

setDTDHandler

public void setDTDHandler(XMLDTDHandler handler)
Sets the DTD handler.

setDocumentHandler

public void setDocumentHandler(XMLDocumentHandler handler)
Sets the document handler.

setEntityResolver

public void setEntityResolver(XMLEntityResolver resolver)
Sets the entity resolver.

setErrorHandler

public void setErrorHandler(XMLErrorHandler handler)
Sets the error handler.

setFeature

public void setFeature(String featureId,
                       boolean state)
            throws XMLConfigurationException
Sets a feature.

setInputSource

public void setInputSource(XMLInputSource inputSource)
            throws XMLConfigurationException,
                   IOException
Sets the input source for the document to parse.
Parameters:
inputSource - The document's input source.

setLocale

public void setLocale(Locale locale)
Sets the locale.

setProperty

public void setProperty(String propertyId,
                        Object value)
            throws XMLConfigurationException
Sets a property.

(C) Copyright 2002-2005, Andy Clark. All rights reserved.