public class HtmlParserImpl extends GenericParser implements HtmlParser
This is the main class in the package. It implements the HtmlParser interface.
This class is not thread-safe. In particular, you cannot invoke any state-changing operations (such as parse) from multiple threads on the same object.
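One way to respect the thread-safety restriction above is to give each thread its own parser instance. The sketch below is only an illustration: StubParser is a hypothetical stand-in for a stateful parser (it is not part of this package), and PerThreadParserSketch and parseOnCurrentThread are names invented for the example.

```java
final class PerThreadParserSketch {
    /** Hypothetical stand-in for a stateful, non-thread-safe parser. */
    static final class StubParser {
        private int charsSeen; // mutable state that must not be shared across threads

        void parse(String input) {
            charsSeen += input.length();
        }

        int charsSeen() {
            return charsSeen;
        }
    }

    // Each thread lazily receives its own StubParser, so no instance is shared.
    private static final ThreadLocal<StubParser> PARSER =
            ThreadLocal.withInitial(StubParser::new);

    /** Feeds input to the calling thread's parser and returns its running total. */
    static int parseOnCurrentThread(String input) {
        StubParser parser = PARSER.get();
        parser.parse(input);
        return parser.charsSeen();
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() ->
                System.out.println("worker total: " + parseOnCurrentThread("<p>")));
        worker.start();
        worker.join();
        System.out.println("main total: " + parseOnCurrentThread("<b>hi</b>"));
    }
}
```

With a real parser, the same pattern would apply: create one instance per thread (or per request) rather than sharing one.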
If you are looking at this class, chances are very high you are implementing Auto-Escaping for a new template system. Please see the landing page including a design document at Auto-Escape Landing Page.
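Nested classes/interfaces inherited from interface HtmlParser: HtmlParser.ATTR_TYPE, HtmlParser.Mode
Fields inherited from class GenericParser: columnNumber, currentState, initialState, intToExtStateTable, lineNumber, parserStateTable
Fields inherited from interface HtmlParser: STATE_ATTR, STATE_COMMENT, STATE_CSS_FILE, STATE_JS_FILE, STATE_TAG, STATE_TEXT, STATE_VALUE
Fields inherited from interface Parser: STATE_ERROR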
Constructor and Description |
---|
HtmlParserImpl(): Creates an HtmlParserImpl object. |
HtmlParserImpl(HtmlParserImpl aHtmlParserImpl): Creates an HtmlParserImpl that is a copy of the one provided. |
Modifier and Type | Method and Description |
---|---|
String | getAttribute(): Returns the name of the HTML attribute the parser is currently processing. |
HtmlParser.ATTR_TYPE | getAttributeType(): Returns the type of the attribute that the parser is in or ATTR_TYPE.NONE if we are not parsing an attribute. |
ExternalState | getJavascriptState(): Returns the state the Javascript parser is in. |
String | getTag(): Returns the name of the HTML tag if the parser is currently within one. |
String | getValue(): Returns the value of an HTML attribute if the parser is currently within one. |
int | getValueIndex(): Returns the current position of the parser within the HTML attribute value, zero being the position of the first character in the value. |
protected com.google.streamhtmlparser.impl.InternalState | handleEnterState(com.google.streamhtmlparser.impl.InternalState currentState, com.google.streamhtmlparser.impl.InternalState expectedNextState, char input): Invoked when the parser enters a new state. |
protected com.google.streamhtmlparser.impl.InternalState | handleExitState(com.google.streamhtmlparser.impl.InternalState currentState, com.google.streamhtmlparser.impl.InternalState expectedNextState, char input): Invoked when the parser exits a state. |
protected com.google.streamhtmlparser.impl.InternalState | handleInState(com.google.streamhtmlparser.impl.InternalState currentState, char input): Invoked for each character read when no state change occurred. |
boolean | inAttribute(): Returns true if and only if the parser is currently within an attribute, be it within the attribute name or the attribute value. |
boolean | inCss(): Returns true if and only if the parser is currently within a CSS context. |
boolean | inJavascript(): Returns true if the parser is currently processing Javascript. |
void | insertText(): A specialized directive to tell the parser there is some content that will be inserted here but that it will not get to parse. |
boolean | isAttributeQuoted(): Returns true if and only if the parser is currently within an attribute value and that attribute value is quoted. |
boolean | isJavascriptQuoted(): Returns true if the parser is currently processing a Javascript literal that is quoted. |
boolean | isUrlStart(): Returns true if and only if the current position of the parser is at the start of a URL HTML attribute value. |
protected void | record(char input): Invokes recording on all CharacterRecorder objects. |
void | reset(): Resets the state of the parser to the initial state of parsing HTML. |
void | resetMode(HtmlParser.Mode mode): Resets the state of the parser, allowing for reuse of the HtmlParser object. |
Methods inherited from class GenericParser: getColumnNumber, getLineNumber, getState, parse, parse, setColumnNumber, setLineNumber, setNextState
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Parser: getColumnNumber, getLineNumber, getState, parse, parse, setColumnNumber, setLineNumber
public HtmlParserImpl()
Creates an HtmlParserImpl object.
Both for performance reasons and to leverage a state-flow machine that is automatically generated from Python for multiple target languages, this object uses a static, read-only ParserStateTable obtained from the generated code in HtmlParserFsm. That code also maintains the mapping from internal states (InternalState) to external states (ExternalState).
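The idea of a static, read-only state table shared by all parser instances can be sketched as follows. This is a toy illustration, not the generated HtmlParserFsm code: the states, input classes, table contents, and the class name SharedStateTableSketch are all assumptions made up for the example.

```java
final class SharedStateTableSketch {
    // Hypothetical internal states (indices into the tables below).
    private static final int TEXT = 0;
    private static final int TAG_OPEN = 1;
    private static final int TAG_NAME = 2;

    // Mapping from internal states to coarser external state names.
    private static final String[] EXTERNAL = {"TEXT", "TAG", "TAG"};

    // Transition table built once and never written afterwards:
    // TABLE[state][inputClass], with input classes 0 = '<', 1 = '>', 2 = other.
    private static final int[][] TABLE = {
        {TAG_OPEN, TEXT, TEXT},      // from TEXT
        {TAG_OPEN, TEXT, TAG_NAME},  // from TAG_OPEN
        {TAG_NAME, TEXT, TAG_NAME},  // from TAG_NAME
    };

    // The only per-instance data is the current state.
    private int state = TEXT;

    private static int classify(char c) {
        return c == '<' ? 0 : (c == '>' ? 1 : 2);
    }

    /** Advances the state machine one character at a time. */
    void parse(String input) {
        for (int i = 0; i < input.length(); i++) {
            state = TABLE[state][classify(input.charAt(i))];
        }
    }

    /** Returns the external name of the current internal state. */
    String externalState() {
        return EXTERNAL[state];
    }

    public static void main(String[] args) {
        SharedStateTableSketch parser = new SharedStateTableSketch();
        parser.parse("<div");
        System.out.println(parser.externalState());
    }
}
```

Because the table is shared and never mutated, constructing many parser objects stays cheap; only the small per-instance state is duplicated.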
public HtmlParserImpl(HtmlParserImpl aHtmlParserImpl)
Creates an HtmlParserImpl that is a copy of the one provided.
Parameters:
aHtmlParserImpl - the HtmlParserImpl object to copy
public boolean inJavascript()
Description copied from interface: HtmlParser
Returns true if the parser is currently processing Javascript. Such is the case if and only if the parser is processing an attribute that takes Javascript, a Javascript script block, or the parser is (re)set with HtmlParser.Mode.JS.
Specified by:
inJavascript in interface HtmlParser
Returns:
true if the parser is processing Javascript, false otherwise
public boolean isJavascriptQuoted()
Description copied from interface: HtmlParser
Returns true if the parser is currently processing a Javascript literal that is quoted. The caller will typically invoke this method after determining that the parser is processing Javascript. Knowing whether the element is quoted or not helps determine which escaping to apply to it when needed.
Specified by:
isJavascriptQuoted in interface HtmlParser
Returns:
true if and only if the parser is inside a quoted Javascript literal
public boolean inAttribute()
Description copied from interface: HtmlParser
Returns true if and only if the parser is currently within an attribute, be it within the attribute name or the attribute value.
Specified by:
inAttribute in interface HtmlParser
Returns:
true if and only if inside an attribute
public boolean inCss()
Returns true if and only if the parser is currently within a CSS context. A CSS context is one of the below:
Specified by:
inCss in interface HtmlParser
Returns:
true if and only if the parser is inside CSS
public HtmlParser.ATTR_TYPE getAttributeType()
Description copied from interface: HtmlParser
Returns the type of the attribute that the parser is in or ATTR_TYPE.NONE if we are not parsing an attribute. The caller will typically invoke this method after determining that the parser is processing an attribute.
This is useful to determine which escaping to apply based on the type of value this attribute expects.
Specified by:
getAttributeType in interface HtmlParser
See Also:
HtmlParser.ATTR_TYPE
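Choosing an escaping strategy from the attribute type, as described for getAttributeType, might look like the sketch below. The AttrType enum is a local stand-in whose constants are assumed to mirror HtmlParser.ATTR_TYPE, and the strategy names returned by chooseEscaping are purely illustrative labels, not part of this API.

```java
final class AttrEscapingSketch {
    /** Local stand-in; constants assumed to mirror HtmlParser.ATTR_TYPE. */
    enum AttrType { NONE, REGULAR, URI, JS, STYLE }

    /** Maps an attribute type to the (illustrative) name of an escaping strategy. */
    static String chooseEscaping(AttrType type) {
        switch (type) {
            case URI:
                return "url";            // validate the scheme, then URL-escape
            case JS:
                return "javascript";     // escape for a Javascript context
            case STYLE:
                return "css";            // escape for a CSS context
            case REGULAR:
                return "html-attribute"; // plain HTML attribute escaping
            case NONE:
            default:
                return "html";           // not inside an attribute
        }
    }

    public static void main(String[] args) {
        for (AttrType type : AttrType.values()) {
            System.out.println(type + " -> " + chooseEscaping(type));
        }
    }
}
```

In a real auto-escaper, the argument would come from the parser's getAttributeType() call at the point where template output is inserted.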
public ExternalState getJavascriptState()
Description copied from interface: HtmlParser
Returns the state the Javascript parser is in. See JavascriptParser for more information on the valid external states. The caller will typically first determine that the parser is processing Javascript and then invoke this method to obtain more fine-grained state information.
Specified by:
getJavascriptState in interface HtmlParser
public boolean isAttributeQuoted()
Description copied from interface: HtmlParser
Returns true if and only if the parser is currently within an attribute value and that attribute value is quoted.
Specified by:
isAttributeQuoted in interface HtmlParser
Returns:
true if and only if the attribute value is quoted
public String getTag()
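Description copied from interface: HtmlParser
Returns the name of the HTML tag if the parser is currently within one. Returns an empty String if the parser is not in a tag as determined by getCurrentExternalState.
Specified by:
getTag in interface HtmlParser
Returns:
the name of the HTML tag or an empty String if we are not within an HTML tag
public String getAttribute()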
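Description copied from interface: HtmlParser
Returns the name of the HTML attribute the parser is currently processing. Returns an empty String if the parser is not in an attribute as determined by getCurrentExternalState.
Specified by:
getAttribute in interface HtmlParser
Returns:
the name of the HTML attribute or an empty String if we are not within an HTML attribute
public String getValue()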
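Description copied from interface: HtmlParser
Returns the value of an HTML attribute if the parser is currently within one. Whether the parser is within an attribute value is determined by getCurrentExternalState.
Specified by:
getValue in interface HtmlParser
Returns:
the value, or an empty String if the parser is not in an HTML attribute value
public int getValueIndex()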
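Description copied from interface: HtmlParser
Returns the current position of the parser within the HTML attribute value, zero being the position of the first character in the value. The caller can check that the parser is within an attribute value using Parser.getState().
Specified by:
getValueIndex in interface HtmlParser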
public boolean isUrlStart()
Description copied from interface: HtmlParser
Returns true if and only if the current position of the parser is at the start of a URL HTML attribute value. This is the case when three conditions are all met, one of which is HtmlParser.getAttributeType() returning HtmlParser.ATTR_TYPE.URI.
This method may be used by an Html Sanitizer or an Auto-Escape system to determine whether to validate the URL for well-formedness and to validate that the scheme of the URL (e.g. HTTP, HTTPS) is safe. In particular, it is recommended to use this method instead of checking that HtmlParser.getValueIndex() is 0, in order to support attribute types where the URL does not start at index zero, such as the content attribute of the meta HTML tag.
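The scheme validation mentioned above could be approached along the following lines. This is a simplified sketch, not a complete sanitizer: the allow-list, the class name UrlStartCheckSketch, and both helper methods are assumptions for illustration, and the atUrlStart flag stands in for the result of calling isUrlStart() on a real parser.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class UrlStartCheckSketch {
    // Illustrative allow-list of schemes considered safe.
    private static final Set<String> SAFE_SCHEMES =
            new HashSet<>(Arrays.asList("http", "https"));

    /** Returns true for scheme-less (relative) URLs and for allow-listed schemes. */
    static boolean hasSafeScheme(String url) {
        String trimmed = url.trim();
        int colon = trimmed.indexOf(':');
        if (colon < 0) {
            return true; // no scheme at all: a relative URL
        }
        // A ':' that follows '/', '?' or '#' is part of the path, query or
        // fragment rather than a scheme separator.
        for (int i = 0; i < colon; i++) {
            char c = trimmed.charAt(i);
            if (c == '/' || c == '?' || c == '#') {
                return true;
            }
        }
        String scheme = trimmed.substring(0, colon).toLowerCase(Locale.ROOT);
        return SAFE_SCHEMES.contains(scheme);
    }

    /** Validates the value only when it is inserted at the start of a URL. */
    static boolean shouldAllow(boolean atUrlStart, String value) {
        return !atUrlStart || hasSafeScheme(value);
    }

    public static void main(String[] args) {
        System.out.println(shouldAllow(true, "https://example.com/"));
        System.out.println(shouldAllow(true, "javascript:alert(1)"));
    }
}
```

Content inserted later in the URL (for example a query parameter) cannot change the scheme, which is why the check is only applied when the parser reports the start of the URL.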
Specified by:
isUrlStart in interface HtmlParser
Returns:
true if and only if the parser is at the start of the URL
public void resetMode(HtmlParser.Mode mode)
Description copied from interface: HtmlParser
Resets the state of the parser to a state consistent with the Mode provided, allowing for reuse of the HtmlParser object. This will reset finer-grained state information back to a default value, hence use only when you want to parse text from a very clean slate.
See the HtmlParser.Mode enum for information on all the valid modes.
Specified by:
resetMode in interface HtmlParser
Parameters:
mode - is an enum representing the high-level state of the parser
public void reset()
Resets the state of the parser to the initial state of parsing HTML.
Specified by:
reset in interface Parser
Overrides:
reset in class GenericParser
public void insertText() throws ParseException
Description copied from interface: HtmlParser
A specialized directive to tell the parser there is some content that will be inserted here but that it will not get to parse.
The two cases where insertText() affects our parsing are:
getValueIndex()
'=' character). In that case, we change internal state to be now inside a non-quoted HTML attribute value.
Specified by:
insertText in interface HtmlParser
Throws:
ParseException - if an unrecoverable error occurred during parsing
protected com.google.streamhtmlparser.impl.InternalState handleEnterState(com.google.streamhtmlparser.impl.InternalState currentState, com.google.streamhtmlparser.impl.InternalState expectedNextState, char input)
Description copied from class: GenericParser
Invoked when the parser enters a new state.
Overrides:
handleEnterState in class GenericParser
Parameters:
currentState - the current state of the parser
expectedNextState - the next state according to the state table definition
input - the last character parsed
Returns:
the expectedNextState provided
protected com.google.streamhtmlparser.impl.InternalState handleExitState(com.google.streamhtmlparser.impl.InternalState currentState, com.google.streamhtmlparser.impl.InternalState expectedNextState, char input)
Description copied from class: GenericParser
Invoked when the parser exits a state.
Overrides:
handleExitState in class GenericParser
Parameters:
currentState - the current state of the parser
expectedNextState - the next state according to the state table definition
input - the last character parsed
Returns:
the expectedNextState provided
protected com.google.streamhtmlparser.impl.InternalState handleInState(com.google.streamhtmlparser.impl.InternalState currentState, char input) throws ParseException
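Description copied from class: GenericParser
Invoked for each character read when no state change occurred.
Overrides:
handleInState in class GenericParser
Parameters:
currentState - the current state of the parser
input - the last character parsed
Returns:
the expectedNextState provided
Throws:
ParseException - if an unrecoverable error occurred during parsing
protected void record(char input)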
Invokes recording on all CharacterRecorder objects.
Overrides:
record in class GenericParser
Parameters:
input - the input character to operate on
Copyright © 2010-2012 Google. All Rights Reserved.