Public Member Functions |
| PdfParser (PdfVecObjects *pVecObjects) |
| PdfParser (PdfVecObjects *pVecObjects, const char *pszFilename, bool bLoadOnDemand=true) |
| PdfParser (PdfVecObjects *pVecObjects, const char *pBuffer, long lLen, bool bLoadOnDemand=true) |
| PdfParser (PdfVecObjects *pVecObjects, const PdfRefCountedInputDevice &rDevice, bool bLoadOnDemand=true) |
virtual | ~PdfParser () |
void | ParseFile (const char *pszFilename, bool bLoadOnDemand=true) |
void | ParseFile (const char *pBuffer, long lLen, bool bLoadOnDemand=true) |
void | ParseFile (const PdfRefCountedInputDevice &rDevice, bool bLoadOnDemand=true) |
bool | QuickEncryptedCheck (const char *pszFilename) |
int | GetNumberOfIncrementalUpdates () const |
const PdfVecObjects * | GetObjects () const |
EPdfVersion | GetPdfVersion () const |
const char * | GetPdfVersionString () const |
const PdfObject * | GetTrailer () const |
bool | GetLoadOnDemand () const |
bool | IsLinearized () const |
size_t | GetFileSize () const |
bool | GetEncrypted () const |
const PdfEncrypt * | GetEncrypt () const |
PdfEncrypt * | TakeEncrypt () |
void | SetPassword (const std::string &sPassword) |
bool | IsStrictParsing () const |
void | SetStringParsing (bool bStrict) |
bool | GetIgnoreBrokenObjects () |
void | SetIgnoreBrokenObjects (bool bBroken) |
Protected Member Functions |
void | FindToken (const char *pszToken, const long lRange) |
void | FindToken2 (const char *pszToken, const long lRange, size_t searchEnd) |
void | ReadDocumentStructure () |
void | HasLinearizationDict () |
void | MergeTrailer (const PdfObject *pTrailer) |
void | ReadTrailer () |
void | ReadXRef (pdf_long *pXRefOffset) |
void | ReadXRefContents (pdf_long lOffset, bool bPositionAtEnd=false) |
void | ReadXRefSubsection (long long &nFirstObject, long long &nNumObjects) |
void | ReadXRefStreamContents (pdf_long lOffset, bool bReadOnlyTrailer) |
void | ReadObjects () |
void | ReadObjectsInternal () |
void | ReadObjectFromStream (int nObjNo, int nIndex) |
bool | IsPdfFile () |
Detailed Description
PdfParser reads a PDF file into memory. The file can be modified in memory and written back using the PdfWriter class. Most PDF features are supported.
Constructor & Destructor Documentation
Create a new PdfParser object. You have to open a PDF file using ParseFile later.
- Parameters:
-
| pVecObjects | vector to write the parsed PdfObjects to |
- See also:
- ParseFile
PoDoFo::PdfParser::PdfParser (PdfVecObjects *pVecObjects, const char *pszFilename, bool bLoadOnDemand=true)
Create a new PdfParser object, open a PDF file, and parse it into memory.
- Parameters:
-
| pVecObjects | vector to write the parsed PdfObjects to |
| pszFilename | filename of the file which is going to be parsed |
| bLoadOnDemand | If true, each object is read from the file the first time it is accessed, which is faster if you do not need the complete PDF file in memory. If false, all objects are read immediately. |
This might throw a PdfError( ePdfError_InvalidPassword ) exception if a password is required to read this PDF. Call SetPassword with the correct password in this case.
- See also:
- SetPassword
PoDoFo::PdfParser::PdfParser (PdfVecObjects *pVecObjects, const char *pBuffer, long lLen, bool bLoadOnDemand=true)
Create a new PdfParser object and parse a PDF file held in a memory buffer.
- Parameters:
-
| pVecObjects | vector to write the parsed PdfObjects to |
| pBuffer | buffer containing a PDF file in memory |
| lLen | length of the buffer containing the PDF file |
| bLoadOnDemand | If true, each object is read from the file the first time it is accessed, which is faster if you do not need the complete PDF file in memory. If false, all objects are read immediately. |
This might throw a PdfError( ePdfError_InvalidPassword ) exception if a password is required to read this PDF. Call SetPassword with the correct password in this case.
- See also:
- SetPassword
Create a new PdfParser object and parse a PDF file read from an input device.
- Parameters:
-
| pVecObjects | vector to write the parsed PdfObjects to |
| rDevice | read from this PdfRefCountedInputDevice |
| bLoadOnDemand | If true, each object is read from the file the first time it is accessed, which is faster if you do not need the complete PDF file in memory. If false, all objects are read immediately. |
This might throw a PdfError( ePdfError_InvalidPassword ) exception if a password is required to read this PDF. Call SetPassword with the correct password in this case.
- See also:
- SetPassword
PoDoFo::PdfParser::~PdfParser () [virtual]
Member Function Documentation
void PoDoFo::PdfParser::FindToken (const char *pszToken, const long lRange) [protected]
Searches backwards from the end of the file and tries to find a token. The file is then positioned right after the token.
- Parameters:
-
| pszToken | a token to find |
| lRange | range in bytes in which to search, beginning at the end of the file |
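The backward search described above can be illustrated with a minimal, standalone sketch. `FindTokenFromEnd` is a hypothetical helper written for this page and is not PoDoFo's implementation: it assumes the whole file is already in a `std::string` and reports a missing token with `std::string::npos`.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of PoDoFo.
// Looks for the last occurrence of `token` that starts within the final
// `range` bytes of `data`. Returns the offset just past the token, or
// std::string::npos if no such occurrence exists.
std::size_t FindTokenFromEnd(const std::string& data,
                             const std::string& token,
                             std::size_t range)
{
    if (token.empty() || token.size() > data.size())
        return std::string::npos;

    // First byte of the search window (the last `range` bytes of the data).
    const std::size_t windowStart = data.size() - std::min(range, data.size());

    // rfind returns the occurrence closest to the end of the data.
    const std::size_t pos = data.rfind(token);
    if (pos == std::string::npos || pos < windowStart)
        return std::string::npos;

    return pos + token.size();
}
```

Called with the token `startxref`, such a helper would return the offset from which the xref byte offset can be read.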
void PoDoFo::PdfParser::FindToken2 (const char *pszToken, const long lRange, size_t searchEnd) [protected]
Searches backwards from the specified position of the file and tries to find a token. The file is then positioned right after the token.
- Parameters:
-
| pszToken | a token to find |
| lRange | range in bytes in which to search, beginning at the specified position of the file |
| searchEnd | position in the file at which the backward search starts |
const PdfEncrypt* PoDoFo::PdfParser::GetEncrypt () const [inline]
- Returns:
- the parser's encryption object, or NULL if the PDF file that was read is not encrypted
bool PoDoFo::PdfParser::GetEncrypted () const [inline]
- Returns:
- true if the parsed PDF file is encrypted
size_t PoDoFo::PdfParser::GetFileSize () const [inline]
- Returns:
- the length of the file
bool PoDoFo::PdfParser::GetIgnoreBrokenObjects () [inline]
- Returns:
- true if broken objects are ignored while parsing
bool PoDoFo::PdfParser::GetLoadOnDemand () const [inline]
- Returns:
- true if this PdfParser loads objects on demand, that is, when they are accessed for the first time. The default is to load all objects immediately, in which case false is returned.
int PoDoFo::PdfParser::GetNumberOfIncrementalUpdates () const [inline]
Retrieve the number of incremental updates that have been applied to the last parsed PDF file.
0 means no update has been applied.
- Returns:
- the number of incremental updates to the parsed PDF.
const PdfVecObjects * PoDoFo::PdfParser::GetObjects () const [inline]
Get a reference to the sorted internal objects vector.
- Returns:
- the internal objects vector.
EPdfVersion PoDoFo::PdfParser::GetPdfVersion () const [inline]
Get the file format version of the PDF.
- Returns:
- the file format version as enum
const char * PoDoFo::PdfParser::GetPdfVersionString () const
Get the file format version of the PDF.
- Returns:
- the file format version as string
const PdfObject * PoDoFo::PdfParser::GetTrailer () const [inline]
Get the trailer dictionary which can be written unmodified to a pdf file.
void PoDoFo::PdfParser::HasLinearizationDict () [protected]
Checks whether this PDF is linearized or not. Initializes the linearization dictionary on success.
bool PoDoFo::PdfParser::IsLinearized () const [inline]
- Returns:
- whether the parsed document contains linearization tables
bool PoDoFo::PdfParser::IsPdfFile () [protected]
Checks the magic number at the start of the pdf file and sets the m_ePdfVersion member to the correct version of the pdf file.
- Returns:
- true if this is a pdf file, otherwise false
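As a rough illustration of a magic-number check, the sketch below extracts the version text that follows `%PDF-`. `ReadPdfHeaderVersion` is a hypothetical helper, not PoDoFo code; it assumes the magic sits at offset 0 and that the version has the form digit, dot, digit, and it returns a string instead of setting a member such as m_ePdfVersion.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not PoDoFo code.
// Returns the version text following the "%PDF-" magic (for example "1.4"),
// or an empty string if `header` does not begin with a well-formed magic.
std::string ReadPdfHeaderVersion(const std::string& header)
{
    const std::string magic = "%PDF-";
    const std::size_t versionLength = 3; // major digit, dot, minor digit

    if (header.size() < magic.size() + versionLength)
        return std::string();
    if (header.compare(0, magic.size(), magic) != 0)
        return std::string();

    const std::string version = header.substr(magic.size(), versionLength);
    const bool wellFormed =
        std::isdigit(static_cast<unsigned char>(version[0])) != 0 &&
        version[1] == '.' &&
        std::isdigit(static_cast<unsigned char>(version[2])) != 0;

    return wellFormed ? version : std::string();
}
```

An empty result corresponds to the "not a PDF file" case; a real parser would additionally map the text to an enum value such as EPdfVersion.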
bool PoDoFo::PdfParser::IsStrictParsing () const [inline]
void PoDoFo::PdfParser::MergeTrailer (const PdfObject *pTrailer) [protected]
Merge the information of this trailer object into the parser's main trailer object.
- Parameters:
-
| pTrailer | take the keys to merge from this dictionary. |
void PoDoFo::PdfParser::ParseFile (const char *pszFilename, bool bLoadOnDemand=true)
Open a PDF file and parse it.
- Parameters:
-
| pszFilename | filename of the file which is going to be parsed |
| bLoadOnDemand | If true, each object is read from the file the first time it is accessed, which is faster if you do not need the complete PDF file in memory. If false, all objects are read immediately. |
This might throw a PdfError( ePdfError_InvalidPassword ) exception if a password is required to read this PDF. Call SetPassword with the correct password in this case.
- See also:
- SetPassword
void PoDoFo::PdfParser::ParseFile (const char *pBuffer, long lLen, bool bLoadOnDemand=true)
Open a PDF file and parse it.
- Parameters:
-
| pBuffer | buffer containing a PDF file in memory |
| lLen | length of the buffer containing the PDF file |
| bLoadOnDemand | If true, each object is read from the file the first time it is accessed, which is faster if you do not need the complete PDF file in memory. If false, all objects are read immediately. |
This might throw a PdfError( ePdfError_InvalidPassword ) exception if a password is required to read this PDF. Call SetPassword with the correct password in this case.
- See also:
- SetPassword
Open a PDF file and parse it.
- Parameters:
-
| rDevice | the input device to read from |
| bLoadOnDemand | If true, each object is read from the file the first time it is accessed, which is faster if you do not need the complete PDF file in memory. If false, all objects are read immediately. |
This might throw a PdfError( ePdfError_InvalidPassword ) exception if a password is required to read this PDF. Call SetPassword with the correct password in this case.
- See also:
- SetPassword
bool PoDoFo::PdfParser::QuickEncryptedCheck (const char *pszFilename)
Quick method to detect secured PDF files, i.e. a PDF with an /Encrypt key in the trailer dictionary.
- Returns:
- true if document is secured, false otherwise
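The idea of looking for an /Encrypt key in the trailer can be sketched with plain string searching. `TrailerMentionsEncrypt` is a hypothetical helper, not the PoDoFo implementation: it assumes the file contents are in memory, that the file uses a classic `trailer` keyword rather than an xref stream, and that a textual match is good enough, so it is only a heuristic.

```cpp
#include <cstddef>
#include <string>

// Hypothetical heuristic, not the PoDoFo implementation.
// Returns true if the text after the last "trailer" keyword contains
// the name "/Encrypt".
bool TrailerMentionsEncrypt(const std::string& data)
{
    // The last trailer in the file is the most recent one.
    const std::size_t trailerPos = data.rfind("trailer");
    if (trailerPos == std::string::npos)
        return false;

    return data.find("/Encrypt", trailerPos) != std::string::npos;
}
```

A robust check would parse the trailer dictionary properly instead of matching text.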
void PoDoFo::PdfParser::ReadDocumentStructure () [protected]
Reads the xref sections and the trailers of the file into memory in the correct order and takes care of linearized PDF files.
void PoDoFo::PdfParser::ReadObjectFromStream (int nObjNo, int nIndex) [protected]
Read the object with index nIndex from the object stream nObjNo and push it on the objects vector m_vecOffsets.
All objects are read from this stream and the stream object is freed from memory. Further calls that try to read from the same stream simply do nothing.
- Parameters:
-
| nObjNo | object number of the stream object |
| nIndex | index of the object which should be parsed |
void PoDoFo::PdfParser::ReadObjects () [protected]
Reads all objects from the pdf into memory from the offsets listed in m_vecOffsets.
If required, an encryption object is set up first.
The actual reading happens in ReadObjectsInternal(), either when no encryption is required or once a correct encryption object has been initialized by SetPassword.
void PoDoFo::PdfParser::ReadObjectsInternal () [protected]
Reads all objects from the pdf into memory from the offsets listed in m_vecOffsets.
Requires a correctly set up PdfEncrypt object with the correct password.
This method is called from ReadObjects or SetPassword.
- See also:
- ReadObjects
- SetPassword
void PoDoFo::PdfParser::ReadTrailer () [protected]
Read the trailer dictionary at the end of the file.
void PoDoFo::PdfParser::ReadXRef (pdf_long *pXRefOffset) [protected]
Looks for a startxref entry at the current file position and saves its byte offset to pXRefOffset.
- Parameters:
-
| pXRefOffset | store the byte offset of the xref section into this variable. |
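Reading the number that follows `startxref` can be sketched as below. `ReadStartXRefOffset` is a hypothetical helper, not PoDoFo code; it assumes the data is in memory, that `pos` already points just past the `startxref` keyword, and it does not guard against integer overflow.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not PoDoFo code.
// `pos` is the offset just past the "startxref" keyword in `data`.
// Skips whitespace, then reads decimal digits into `offset`.
// Returns false (leaving `offset` untouched) if no digits follow.
bool ReadStartXRefOffset(const std::string& data, std::size_t pos, long long& offset)
{
    while (pos < data.size() &&
           std::isspace(static_cast<unsigned char>(data[pos])))
        ++pos;

    if (pos >= data.size() ||
        !std::isdigit(static_cast<unsigned char>(data[pos])))
        return false;

    long long value = 0;
    while (pos < data.size() &&
           std::isdigit(static_cast<unsigned char>(data[pos]))) {
        value = value * 10 + (data[pos] - '0'); // no overflow check in this sketch
        ++pos;
    }

    offset = value;
    return true;
}
```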
void PoDoFo::PdfParser::ReadXRefContents (pdf_long lOffset, bool bPositionAtEnd=false) [protected]
Reads the xref table from a pdf file. If there is no xref table, ReadXRefStreamContents() is called.
- Parameters:
-
| lOffset | read the table from this offset |
| bPositionAtEnd | if true the xref table is not read, but the file stream is positioned directly after the table, which allows reading a following trailer dictionary. |
void PoDoFo::PdfParser::ReadXRefStreamContents (pdf_long lOffset, bool bReadOnlyTrailer) [protected]
Reads an xref stream contents object.
- Parameters:
-
| lOffset | read the stream from this offset |
| bReadOnlyTrailer | only the trailer is skipped over, the contents of the xref stream are not parsed |
void PoDoFo::PdfParser::ReadXRefSubsection (long long &nFirstObject, long long &nNumObjects) [protected]
Read an xref subsection.
Throws ePdfError_NoXref if the number of objects read was not the number specified by the subsection header (as passed in nNumObjects).
- Parameters:
-
| nFirstObject | object number of the first object |
| nNumObjects | how many objects should be read from this section |
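The count check described above can be sketched for classic xref entries, each of which consists of an offset, a generation number, and an `n` (in use) or `f` (free) marker. `XRefEntry` and `ParseXRefSubsection` are hypothetical names for this sketch, not PoDoFo types; the sketch reads from a string and throws `std::runtime_error` as a stand-in for the ePdfError_NoXref condition.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical type for this sketch, not a PoDoFo class.
struct XRefEntry {
    long long offset;   // byte offset ('n' entries) or next free object ('f' entries)
    long generation;    // generation number
    bool inUse;         // true for 'n', false for 'f'
};

// Parses exactly `numObjects` classic xref entries from `text`.
// Throws std::runtime_error if fewer well-formed entries are present,
// standing in for the ePdfError_NoXref condition.
std::vector<XRefEntry> ParseXRefSubsection(const std::string& text, long long numObjects)
{
    std::istringstream in(text);
    std::vector<XRefEntry> entries;

    for (long long i = 0; i < numObjects; ++i) {
        XRefEntry entry;
        char type = 0;
        if (!(in >> entry.offset >> entry.generation >> type) ||
            (type != 'n' && type != 'f'))
            throw std::runtime_error("xref subsection is shorter than its header claims");

        entry.inUse = (type == 'n');
        entries.push_back(entry);
    }
    return entries;
}
```

The whitespace-tolerant stream extraction is a simplification; entries in a real xref table have a fixed width.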
void PoDoFo::PdfParser::SetIgnoreBrokenObjects (bool bBroken) [inline]
Specify if the parser should ignore broken objects, i.e. XRef entries that do not point to valid objects.
Default is to not ignore broken objects and throw an exception if one is found.
- Parameters:
-
| bBroken | if true broken objects will be ignored |
void PoDoFo::PdfParser::SetPassword (const std::string &sPassword)
If you try to open an encrypted PDF file, which requires a password to open, PoDoFo will throw a PdfError( ePdfError_InvalidPassword ) exception.
If you got such an exception, you have to set a password which should be used for opening the PDF.
The usual way will be to ask the user for the password and set the password using this method.
PdfParser will immediately continue to read the PDF file.
- Parameters:
-
| sPassword | a user or owner password which can be used to open an encrypted PDF file. If the password is invalid, a PdfError( ePdfError_InvalidPassword ) exception is thrown. |
void PoDoFo::PdfParser::SetStringParsing (bool bStrict) [inline]
Enable/disable strict parsing mode. Strict parsing is by default disabled.
If you enable strict parsing, PoDoFo will fail on a few more common PDF failures. Please note that PoDoFo's parser is by default very strict already and does not recover from e.g. wrong XREF tables.
- Parameters:
-
| bStrict | new setting for strict parsing mode. |
PdfEncrypt * PoDoFo::PdfParser::TakeEncrypt () [inline]
Takes the encryption object from the parser. The internal handle will be set to NULL and the ownership of the object is given to the caller.
Only call this if you need access to the encryption object before deleting the parser.
- Returns:
- the parser's encryption object, or NULL if the PDF file that was read is not encrypted