Uses of Interface
org.apache.nutch.parse.Parse

Packages that use Parse
org.apache.nutch.analysis.lang Text document language identifier. 
org.apache.nutch.crawl Crawl control code. 
org.apache.nutch.indexer Maintain Lucene full-text indexes. 
org.apache.nutch.indexer.basic A basic indexing plugin. 
org.apache.nutch.indexer.more A more indexing plugin. 
org.apache.nutch.microformats.reltag A microformats Rel-Tag Parser/Indexer/Querier plugin. 
org.apache.nutch.parse   
org.apache.nutch.parse.ext   
org.apache.nutch.parse.html An HTML document parsing plugin. 
org.apache.nutch.parse.js   
org.apache.nutch.parse.ms Common API for Microsoft © documents parsing. 
org.apache.nutch.parse.msexcel A Microsoft © Excel document parsing plugin. 
org.apache.nutch.parse.mspowerpoint A Microsoft © PowerPoint document parsing plugin. 
org.apache.nutch.parse.msword A Microsoft © Word document parsing plugin. 
org.apache.nutch.parse.oo   
org.apache.nutch.parse.pdf A PDF parsing plugin. 
org.apache.nutch.parse.rss   
org.apache.nutch.parse.swf   
org.apache.nutch.parse.text A plain text parsing plugin. 
org.apache.nutch.parse.zip   
org.apache.nutch.scoring   
org.apache.nutch.scoring.opic   
org.creativecommons.nutch Sample plugins that parse and index Creative Commons metadata. 
 

Uses of Parse in org.apache.nutch.analysis.lang
 

Methods in org.apache.nutch.analysis.lang that return Parse
 Parse HTMLLanguageParser.filter(Content content, Parse parse, HTMLMetaTags metaTags, DocumentFragment doc)
          Scan the HTML document looking for possible indications of content language.
 

Methods in org.apache.nutch.analysis.lang with parameters of type Parse
 Parse HTMLLanguageParser.filter(Content content, Parse parse, HTMLMetaTags metaTags, DocumentFragment doc)
          Scan the HTML document looking for possible indications of content language.
 Document LanguageIndexingFilter.filter(Document doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
           
 

Uses of Parse in org.apache.nutch.crawl
 

Methods in org.apache.nutch.crawl with parameters of type Parse
 byte[] TextProfileSignature.calculate(Content content, Parse parse)
           
abstract  byte[] Signature.calculate(Content content, Parse parse)
           
 byte[] MD5Signature.calculate(Content content, Parse parse)
           
 

Uses of Parse in org.apache.nutch.indexer
 

Methods in org.apache.nutch.indexer with parameters of type Parse
 Document IndexingFilters.filter(Document doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
          Run all defined filters.
 Document IndexingFilter.filter(Document doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
          Adds fields or otherwise modifies the document that will be indexed for a parse.
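
The "run all defined filters" behaviour described above can be pictured as a simple chain in which each filter receives the document produced by the previous one. The following is only a minimal sketch under stated assumptions: `DocFilter`, `runAll`, and the map-based document are hypothetical stand-ins invented for illustration, not the Nutch `IndexingFilter` or Lucene `Document` types, and the real `IndexingFilters` class may handle ordering and failures differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FilterChainSketch {
    /** Hypothetical filter: takes a document (a field map) plus parsed text and returns the document. */
    public interface DocFilter {
        Map<String, String> filter(Map<String, String> doc, String parseText);
    }

    /** Runs every filter in order, feeding each one the previous filter's output. */
    public static Map<String, String> runAll(List<DocFilter> filters,
                                             Map<String, String> doc,
                                             String parseText) {
        Map<String, String> current = doc;
        for (DocFilter f : filters) {
            current = f.filter(current, parseText);
        }
        return current;
    }

    public static void main(String[] args) {
        List<DocFilter> filters = new ArrayList<>();
        // First filter stores the parsed text, second one records its length.
        filters.add((doc, text) -> { doc.put("content", text); return doc; });
        filters.add((doc, text) -> { doc.put("length", String.valueOf(text.length())); return doc; });

        Map<String, String> doc = runAll(filters, new LinkedHashMap<>(), "hello");
        System.out.println(doc); // prints {content=hello, length=5}
    }
}
```

Returning the document from each filter, rather than mutating a shared one silently, lets a filter replace the document entirely if it needs to.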
 

Uses of Parse in org.apache.nutch.indexer.basic
 

Methods in org.apache.nutch.indexer.basic with parameters of type Parse
 Document BasicIndexingFilter.filter(Document doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
           
 

Uses of Parse in org.apache.nutch.indexer.more
 

Methods in org.apache.nutch.indexer.more with parameters of type Parse
 Document MoreIndexingFilter.filter(Document doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
           
 

Uses of Parse in org.apache.nutch.microformats.reltag
 

Methods in org.apache.nutch.microformats.reltag that return Parse
 Parse RelTagParser.filter(Content content, Parse parse, HTMLMetaTags metaTags, DocumentFragment doc)
          Scan the HTML document looking for possible rel-tags.
 

Methods in org.apache.nutch.microformats.reltag with parameters of type Parse
 Parse RelTagParser.filter(Content content, Parse parse, HTMLMetaTags metaTags, DocumentFragment doc)
          Scan the HTML document looking for possible rel-tags.
 Document RelTagIndexingFilter.filter(Document doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
           
 

Uses of Parse in org.apache.nutch.parse
 

Classes in org.apache.nutch.parse that implement Parse
 class ParseImpl
          The result of parsing a page's raw content.
 

Methods in org.apache.nutch.parse that return Parse
 Parse HtmlParseFilters.filter(Content content, Parse parse, HTMLMetaTags metaTags, DocumentFragment doc)
          Run all defined filters.
 Parse HtmlParseFilter.filter(Content content, Parse parse, HTMLMetaTags metaTags, DocumentFragment doc)
          Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page.
 Parse ParseStatus.getEmptyParse(Configuration conf)
          A convenience method.
 Parse Parser.getParse(Content c)
          Creates the parse for some content.
 Parse ParseUtil.parse(Content content)
          Performs a parse by iterating through a List of preferred Parsers until a successful parse is performed and a Parse object is returned.
 Parse ParseUtil.parseByExtensionId(String extId, Content content)
          Method parses a Content object using the Parser specified by the parameter extId, i.e., the Parser's extension ID.
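
The fallback behaviour described for `ParseUtil.parse` (try each preferred parser until one succeeds) can be illustrated roughly as follows. This is a sketch, not the Nutch implementation: `SimpleParse`, `SimpleParser`, `result`, and the `String` content are hypothetical stand-ins for the real `Parse`, `Parser`, and `Content` types, and the choice to skip a parser that throws is an assumption.

```java
import java.util.List;
import java.util.Optional;

public class ParseFallbackSketch {
    /** Hypothetical stand-in for a parse result; not the Nutch Parse interface. */
    public interface SimpleParse {
        String text();
        boolean isSuccess();
    }

    /** Hypothetical stand-in for a parser plugin. */
    public interface SimpleParser {
        SimpleParse getParse(String content);
    }

    /** Small helper that builds a parse result with the given text and status. */
    public static SimpleParse result(final String text, final boolean ok) {
        return new SimpleParse() {
            public String text() { return text; }
            public boolean isSuccess() { return ok; }
        };
    }

    /** Tries each parser in order and returns the first successful parse, if any. */
    public static Optional<SimpleParse> parse(List<SimpleParser> parsers, String content) {
        for (SimpleParser parser : parsers) {
            SimpleParse parse;
            try {
                parse = parser.getParse(content);
            } catch (RuntimeException e) {
                continue; // assumption: a failing parser does not stop the remaining ones
            }
            if (parse != null && parse.isSuccess()) {
                return Optional.of(parse);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        SimpleParser failing = c -> result("", false);
        SimpleParser plain = c -> result(c.trim(), true);
        Optional<SimpleParse> parsed = parse(List.of(failing, plain), "  hello  ");
        System.out.println(parsed.map(SimpleParse::text).orElse("<no parse>")); // prints hello
    }
}
```

The sketch returns an empty `Optional` when every parser fails. The listing above does not say what `ParseUtil.parse` does in that case, so treat that part as illustrative only.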
 

Methods in org.apache.nutch.parse with parameters of type Parse
 Parse HtmlParseFilters.filter(Content content, Parse parse, HTMLMetaTags metaTags, DocumentFragment doc)
          Run all defined filters.
 Parse HtmlParseFilter.filter(Content content, Parse parse, HTMLMetaTags metaTags, DocumentFragment doc)
          Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page.
 

Constructors in org.apache.nutch.parse with parameters of type Parse
ParseImpl(Parse parse)
           
 

Uses of Parse in org.apache.nutch.parse.ext
 

Methods in org.apache.nutch.parse.ext that return Parse
 Parse ExtParser.getParse(Content content)
           
 

Uses of Parse in org.apache.nutch.parse.html
 

Methods in org.apache.nutch.parse.html that return Parse
 Parse HtmlParser.getParse(Content content)
           
 

Uses of Parse in org.apache.nutch.parse.js
 

Methods in org.apache.nutch.parse.js that return Parse
 Parse JSParseFilter.filter(Content content, Parse parse, HTMLMetaTags metaTags, DocumentFragment doc)
           
 Parse JSParseFilter.getParse(Content c)
           
 

Methods in org.apache.nutch.parse.js with parameters of type Parse
 Parse JSParseFilter.filter(Content content, Parse parse, HTMLMetaTags metaTags, DocumentFragment doc)
           
 

Uses of Parse in org.apache.nutch.parse.ms
 

Methods in org.apache.nutch.parse.ms that return Parse
protected  Parse MSBaseParser.getParse(MSExtractor extractor, Content content)
          Parses a Content with a specific Microsoft document extractor.
 

Uses of Parse in org.apache.nutch.parse.msexcel
 

Methods in org.apache.nutch.parse.msexcel that return Parse
 Parse MSExcelParser.getParse(Content content)
           
 

Uses of Parse in org.apache.nutch.parse.mspowerpoint
 

Methods in org.apache.nutch.parse.mspowerpoint that return Parse
 Parse MSPowerPointParser.getParse(Content content)
           
 

Uses of Parse in org.apache.nutch.parse.msword
 

Methods in org.apache.nutch.parse.msword that return Parse
 Parse MSWordParser.getParse(Content content)
           
 

Uses of Parse in org.apache.nutch.parse.oo
 

Methods in org.apache.nutch.parse.oo that return Parse
 Parse OOParser.getParse(Content content)
           
 

Uses of Parse in org.apache.nutch.parse.pdf
 

Methods in org.apache.nutch.parse.pdf that return Parse
 Parse PdfParser.getParse(Content content)
           
 

Uses of Parse in org.apache.nutch.parse.rss
 

Methods in org.apache.nutch.parse.rss that return Parse
 Parse RSSParser.getParse(Content content)
           Implementation method, parses the RSS content, and then returns a ParseImpl.
 

Uses of Parse in org.apache.nutch.parse.swf
 

Methods in org.apache.nutch.parse.swf that return Parse
 Parse SWFParser.getParse(Content content)
           
 

Uses of Parse in org.apache.nutch.parse.text
 

Methods in org.apache.nutch.parse.text that return Parse
 Parse TextParser.getParse(Content content)
           
 

Uses of Parse in org.apache.nutch.parse.zip
 

Methods in org.apache.nutch.parse.zip that return Parse
 Parse ZipParser.getParse(Content content)
           
 

Uses of Parse in org.apache.nutch.scoring
 

Methods in org.apache.nutch.scoring with parameters of type Parse
 float ScoringFilters.indexerScore(Text url, Document doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
           
 float ScoringFilter.indexerScore(Text url, Document doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
          This method calculates a Lucene document boost.
 void ScoringFilters.passScoreAfterParsing(Text url, Content content, Parse parse)
           
 void ScoringFilter.passScoreAfterParsing(Text url, Content content, Parse parse)
          Currently a part of score distribution is performed using only data coming from the parsing process.
 

Uses of Parse in org.apache.nutch.scoring.opic
 

Methods in org.apache.nutch.scoring.opic with parameters of type Parse
 float OPICScoringFilter.indexerScore(Text url, Document doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
          Dampen the boost value by scorePower.
 void OPICScoringFilter.passScoreAfterParsing(Text url, Content content, Parse parse)
          Copy the value from Content metadata under Fetcher.SCORE_KEY to parseData.
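
One plausible reading of "dampen the boost value by scorePower" is that the stored score is raised to a power below 1 and then scaled by the initial score. The sketch below shows that arithmetic only. The formula, the `dampen` helper, and the sample values are assumptions made for illustration, since this page does not state the exact computation used by `OPICScoringFilter.indexerScore`.

```java
public class BoostDampeningSketch {
    /**
     * Assumed formula: boost = score^scorePower * initScore.
     * With 0 < scorePower < 1, large scores are compressed more than small ones.
     */
    public static float dampen(float score, float scorePower, float initScore) {
        return (float) Math.pow(score, scorePower) * initScore;
    }

    public static void main(String[] args) {
        // A score of 16 with scorePower 0.5 is reduced to its square root.
        System.out.println(dampen(16f, 0.5f, 1f)); // prints 4.0
    }
}
```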
 

Uses of Parse in org.creativecommons.nutch
 

Methods in org.creativecommons.nutch that return Parse
 Parse CCParseFilter.filter(Content content, Parse parse, HTMLMetaTags metaTags, DocumentFragment doc)
          Adds metadata or otherwise modifies a parse of an HTML document, given the DOM tree of a page.
 

Methods in org.creativecommons.nutch with parameters of type Parse
 Parse CCParseFilter.filter(Content content, Parse parse, HTMLMetaTags metaTags, DocumentFragment doc)
          Adds metadata or otherwise modifies a parse of an HTML document, given the DOM tree of a page.
 Document CCIndexingFilter.filter(Document doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
           
 



Copyright © 2006 The Apache Software Foundation