org.apache.nutch.parse.pdf
Class PdfParser

java.lang.Object
  extended by org.apache.nutch.parse.pdf.PdfParser
All Implemented Interfaces:
Configurable, Parser, Pluggable

public class PdfParser
extends Object
implements Parser

Parser for the MIME type application/pdf, based on org.pdfbox.*. How well it handles PDFs in practice has yet to be evaluated.

Author:
John Xing

Note (2004-06-14, Xing): Some code is kept here for convenience (see inline comments). It may be moved to a more appropriate place once the new codebase stabilizes, especially after the indexing code is written.

Field Summary
static org.apache.commons.logging.Log LOG
           
 
Fields inherited from interface org.apache.nutch.parse.Parser
X_POINT_ID
 
Constructor Summary
PdfParser()
           
 
Method Summary
 Configuration getConf()
           
 Parse getParse(Content content)
          Creates the parse for some content.
 void setConf(Configuration conf)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static final org.apache.commons.logging.Log LOG
Constructor Detail

PdfParser

public PdfParser()
Method Detail

getParse

public Parse getParse(Content content)
Description copied from interface: Parser
Creates the parse for some content.

Specified by:
getParse in interface Parser
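
A minimal sketch of the call shape described above: a parser receives some content and returns a parse for it. Every type here (SketchContent, SketchParse, SketchParser, PdfHeaderCheckingParser) is a hypothetical stand-in so the example compiles on its own. None of them is the real Nutch Content, Parse, or Parser type, and the body is not PdfParser's implementation. The only PDF-specific fact it relies on is that a PDF file begins with the bytes "%PDF-".

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for fetched content: a URL, a MIME type, and raw bytes.
final class SketchContent {
    final String url;
    final String contentType;
    final byte[] bytes;

    SketchContent(String url, String contentType, byte[] bytes) {
        this.url = url;
        this.contentType = contentType;
        this.bytes = bytes;
    }
}

// Hypothetical stand-in for a parse result: a success flag plus extracted text.
final class SketchParse {
    final boolean success;
    final String text;

    SketchParse(boolean success, String text) {
        this.success = success;
        this.text = text;
    }
}

// Hypothetical stand-in mirroring the single-method shape of getParse(Content).
interface SketchParser {
    SketchParse getParse(SketchContent content);
}

// Assumption: a parser for application/pdf rejects anything that is not a PDF
// before handing the bytes to a PDF library. Text extraction is omitted here.
final class PdfHeaderCheckingParser implements SketchParser {
    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Returns true when the data starts with the PDF header bytes.
    static boolean looksLikePdf(byte[] data) {
        if (data == null || data.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (data[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    @Override
    public SketchParse getParse(SketchContent content) {
        if (!"application/pdf".equals(content.contentType)) {
            return new SketchParse(false, "");
        }
        if (!looksLikePdf(content.bytes)) {
            return new SketchParse(false, "");
        }
        // A real parser would delegate to a PDF library at this point.
        return new SketchParse(true, "");
    }
}
```

The sketch reports failure through the returned object instead of throwing. That is a design choice of the example. This page does not state how PdfParser signals a failed parse.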

setConf

public void setConf(Configuration conf)
Specified by:
setConf in interface Configurable

getConf

public Configuration getConf()
Specified by:
getConf in interface Configurable
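
The setConf/getConf pair follows the usual configurable-object pattern: the caller hands in a configuration once, and the object reads its settings from it later. The sketch below illustrates that pattern only. SketchConfigurable, SketchPdfParser, and the key "sketch.max.bytes" are hypothetical, and java.util.Properties stands in for the real Configuration class so the example is self-contained.

```java
import java.util.Properties;

// Hypothetical stand-in for the two-method Configurable contract.
interface SketchConfigurable {
    void setConf(Properties conf);

    Properties getConf();
}

// Hypothetical parser-like class that stores its configuration and reads from it.
final class SketchPdfParser implements SketchConfigurable {
    // Assumption: an illustrative key and default, not real Nutch properties.
    static final String MAX_BYTES_KEY = "sketch.max.bytes";
    static final int DEFAULT_MAX_BYTES = 65536;

    private Properties conf;

    @Override
    public void setConf(Properties conf) {
        this.conf = conf;
    }

    @Override
    public Properties getConf() {
        return conf;
    }

    // Reads a setting from the stored configuration, falling back to the default
    // when no configuration has been set or the key is absent.
    int maxBytes() {
        if (conf == null) {
            return DEFAULT_MAX_BYTES;
        }
        String raw = conf.getProperty(MAX_BYTES_KEY, String.valueOf(DEFAULT_MAX_BYTES));
        return Integer.parseInt(raw);
    }
}
```

Storing the configuration object itself, and not a copy of individual values, means getConf() returns exactly what the caller supplied, so other components can be given the same settings.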


Copyright © 2006 The Apache Software Foundation