Uses of Parse in org.apache.nutch.analysis.lang |
---|
Methods in org.apache.nutch.analysis.lang that return Parse | |
---|---|
Parse |
HTMLLanguageParser.filter(Content content,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
Scan the HTML document looking for possible indications of content language. |
Methods in org.apache.nutch.analysis.lang with parameters of type Parse | |
---|---|
Parse |
HTMLLanguageParser.filter(Content content,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
Scan the HTML document looking for possible indications of content language. |
Document |
LanguageIndexingFilter.filter(Document doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|
Uses of Parse in org.apache.nutch.crawl |
---|
Methods in org.apache.nutch.crawl with parameters of type Parse | |
---|---|
byte[] |
TextProfileSignature.calculate(Content content,
Parse parse)
|
abstract byte[] |
Signature.calculate(Content content,
Parse parse)
|
byte[] |
MD5Signature.calculate(Content content,
Parse parse)
|
Uses of Parse in org.apache.nutch.indexer |
---|
Methods in org.apache.nutch.indexer with parameters of type Parse | |
---|---|
Document |
IndexingFilters.filter(Document doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
Run all defined filters. |
Document |
IndexingFilter.filter(Document doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
Adds fields or otherwise modifies the document that will be indexed for a parse. |
Uses of Parse in org.apache.nutch.indexer.basic |
---|
Methods in org.apache.nutch.indexer.basic with parameters of type Parse | |
---|---|
Document |
BasicIndexingFilter.filter(Document doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|
Uses of Parse in org.apache.nutch.indexer.more |
---|
Methods in org.apache.nutch.indexer.more with parameters of type Parse | |
---|---|
Document |
MoreIndexingFilter.filter(Document doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|
Uses of Parse in org.apache.nutch.microformats.reltag |
---|
Methods in org.apache.nutch.microformats.reltag that return Parse | |
---|---|
Parse |
RelTagParser.filter(Content content,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
Scan the HTML document looking for possible rel-tags. |
Methods in org.apache.nutch.microformats.reltag with parameters of type Parse | |
---|---|
Parse |
RelTagParser.filter(Content content,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
Scan the HTML document looking for possible rel-tags. |
Document |
RelTagIndexingFilter.filter(Document doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|
Uses of Parse in org.apache.nutch.parse |
---|
Classes in org.apache.nutch.parse that implement Parse | |
---|---|
class |
ParseImpl
The result of parsing a page's raw content. |
Methods in org.apache.nutch.parse that return Parse | |
---|---|
Parse |
HtmlParseFilters.filter(Content content,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
Run all defined filters. |
Parse |
HtmlParseFilter.filter(Content content,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page. |
Parse |
ParseStatus.getEmptyParse(Configuration conf)
A convenience method. |
Parse |
Parser.getParse(Content c)
Creates the parse for some content. |
Parse |
ParseUtil.parse(Content content)
Performs a parse by iterating through a List of preferred Parsers
until a successful parse is performed and a Parse object is
returned. |
Parse |
ParseUtil.parseByExtensionId(String extId,
Content content)
Parses a Content object using the Parser specified
by the parameter extId, i.e., the Parser's extension ID. |
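
The description of ParseUtil.parse above (try each preferred Parser in turn until one succeeds) can be illustrated with a minimal, self-contained sketch. It does not use the Nutch classes: `SimpleParser`, `parseWithFallback`, and the string-based "content" are hypothetical stand-ins chosen for illustration, and the sketch makes no attempt to reproduce how Nutch selects parsers or reports failures.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

public class ParserFallbackSketch {

    /** Hypothetical stand-in for a parser: returns empty when it cannot handle the input. */
    interface SimpleParser extends Function<String, Optional<String>> {}

    /** Tries each parser in preference order and returns the first successful result. */
    static Optional<String> parseWithFallback(List<SimpleParser> parsers, String content) {
        for (SimpleParser parser : parsers) {
            Optional<String> result = parser.apply(content);
            if (result.isPresent()) {
                return result; // stop at the first parser that succeeds
            }
        }
        return Optional.empty(); // no parser could handle the content
    }

    public static void main(String[] args) {
        // Assumed example parsers: one that only accepts HTML-looking input, one that accepts anything.
        SimpleParser htmlOnly = c -> c.startsWith("<html")
                ? Optional.of("html:" + c.length())
                : Optional.empty();
        SimpleParser plainText = c -> Optional.of("text:" + c.length());

        System.out.println(parseWithFallback(List.of(htmlOnly, plainText), "hello"));
    }
}
```

The order of the list encodes the preference: a more specific parser is listed first, and a permissive one acts as the fallback.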
Methods in org.apache.nutch.parse with parameters of type Parse | |
---|---|
Parse |
HtmlParseFilters.filter(Content content,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
Run all defined filters. |
Parse |
HtmlParseFilter.filter(Content content,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page. |
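
The two filter methods listed above follow a chain pattern: each filter receives a parse and returns a (possibly modified) parse, and "run all defined filters" applies them in sequence. The sketch below shows that pattern only. It models a parse as a plain metadata map, and `runAll`, `addLang`, and `addLicense` are hypothetical names, not part of the Nutch API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

public class FilterChainSketch {

    /** Applies every filter in order, feeding each one the result of the previous one. */
    static Map<String, String> runAll(List<UnaryOperator<Map<String, String>>> filters,
                                      Map<String, String> parseMeta) {
        Map<String, String> current = parseMeta;
        for (UnaryOperator<Map<String, String>> filter : filters) {
            current = filter.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        // Assumed example filters: each copies the metadata and adds one field.
        UnaryOperator<Map<String, String>> addLang = m -> {
            Map<String, String> out = new LinkedHashMap<>(m);
            out.put("lang", "en");
            return out;
        };
        UnaryOperator<Map<String, String>> addLicense = m -> {
            Map<String, String> out = new LinkedHashMap<>(m);
            out.put("license", "unknown");
            return out;
        };

        System.out.println(runAll(List.of(addLang, addLicense), new LinkedHashMap<>()));
    }
}
```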
Constructors in org.apache.nutch.parse with parameters of type Parse | |
---|---|
ParseImpl(Parse parse)
|
Uses of Parse in org.apache.nutch.parse.ext |
---|
Methods in org.apache.nutch.parse.ext that return Parse | |
---|---|
Parse |
ExtParser.getParse(Content content)
|
Uses of Parse in org.apache.nutch.parse.html |
---|
Methods in org.apache.nutch.parse.html that return Parse | |
---|---|
Parse |
HtmlParser.getParse(Content content)
|
Uses of Parse in org.apache.nutch.parse.js |
---|
Methods in org.apache.nutch.parse.js that return Parse | |
---|---|
Parse |
JSParseFilter.filter(Content content,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
|
Parse |
JSParseFilter.getParse(Content c)
|
Methods in org.apache.nutch.parse.js with parameters of type Parse | |
---|---|
Parse |
JSParseFilter.filter(Content content,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
|
Uses of Parse in org.apache.nutch.parse.ms |
---|
Methods in org.apache.nutch.parse.ms that return Parse | |
---|---|
protected Parse |
MSBaseParser.getParse(MSExtractor extractor,
Content content)
Parses a Content with a specific Microsoft document extractor. |
Uses of Parse in org.apache.nutch.parse.msexcel |
---|
Methods in org.apache.nutch.parse.msexcel that return Parse | |
---|---|
Parse |
MSExcelParser.getParse(Content content)
|
Uses of Parse in org.apache.nutch.parse.mspowerpoint |
---|
Methods in org.apache.nutch.parse.mspowerpoint that return Parse | |
---|---|
Parse |
MSPowerPointParser.getParse(Content content)
|
Uses of Parse in org.apache.nutch.parse.msword |
---|
Methods in org.apache.nutch.parse.msword that return Parse | |
---|---|
Parse |
MSWordParser.getParse(Content content)
|
Uses of Parse in org.apache.nutch.parse.oo |
---|
Methods in org.apache.nutch.parse.oo that return Parse | |
---|---|
Parse |
OOParser.getParse(Content content)
|
Uses of Parse in org.apache.nutch.parse.pdf |
---|
Methods in org.apache.nutch.parse.pdf that return Parse | |
---|---|
Parse |
PdfParser.getParse(Content content)
|
Uses of Parse in org.apache.nutch.parse.rss |
---|
Methods in org.apache.nutch.parse.rss that return Parse | |
---|---|
Parse |
RSSParser.getParse(Content content)
Implementation method: parses the RSS content and returns a ParseImpl. |
Uses of Parse in org.apache.nutch.parse.swf |
---|
Methods in org.apache.nutch.parse.swf that return Parse | |
---|---|
Parse |
SWFParser.getParse(Content content)
|
Uses of Parse in org.apache.nutch.parse.text |
---|
Methods in org.apache.nutch.parse.text that return Parse | |
---|---|
Parse |
TextParser.getParse(Content content)
|
Uses of Parse in org.apache.nutch.parse.zip |
---|
Methods in org.apache.nutch.parse.zip that return Parse | |
---|---|
Parse |
ZipParser.getParse(Content content)
|
Uses of Parse in org.apache.nutch.scoring |
---|
Methods in org.apache.nutch.scoring with parameters of type Parse | |
---|---|
float |
ScoringFilters.indexerScore(Text url,
Document doc,
CrawlDatum dbDatum,
CrawlDatum fetchDatum,
Parse parse,
Inlinks inlinks,
float initScore)
|
float |
ScoringFilter.indexerScore(Text url,
Document doc,
CrawlDatum dbDatum,
CrawlDatum fetchDatum,
Parse parse,
Inlinks inlinks,
float initScore)
This method calculates a Lucene document boost. |
void |
ScoringFilters.passScoreAfterParsing(Text url,
Content content,
Parse parse)
|
void |
ScoringFilter.passScoreAfterParsing(Text url,
Content content,
Parse parse)
Currently a part of score distribution is performed using only data coming from the parsing process. |
Uses of Parse in org.apache.nutch.scoring.opic |
---|
Methods in org.apache.nutch.scoring.opic with parameters of type Parse | |
---|---|
float |
OPICScoringFilter.indexerScore(Text url,
Document doc,
CrawlDatum dbDatum,
CrawlDatum fetchDatum,
Parse parse,
Inlinks inlinks,
float initScore)
Dampen the boost value by scorePower. |
void |
OPICScoringFilter.passScoreAfterParsing(Text url,
Content content,
Parse parse)
Copy the value from Content metadata under Fetcher.SCORE_KEY to parseData. |
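
As a rough illustration of what "dampen the boost value by scorePower" could mean, the sketch below raises a page score to a power and multiplies it by the initial score. This formula is an assumption made for illustration, not a statement of what OPICScoringFilter does internally; `dampenedBoost` and its parameters are hypothetical names.

```java
public class BoostDampeningSketch {

    /**
     * Assumed dampening formula: score^scorePower * initScore.
     * With scorePower between 0 and 1, large scores are compressed.
     */
    static float dampenedBoost(float dbScore, float scorePower, float initScore) {
        return (float) Math.pow(dbScore, scorePower) * initScore;
    }

    public static void main(String[] args) {
        // A score of 4.0 with scorePower 0.5 gives a boost of 2.0 when initScore is 1.0.
        System.out.println(dampenedBoost(4.0f, 0.5f, 1.0f));
    }
}
```

With an exponent below 1, a page whose score is many times larger than another's receives a boost that is only modestly larger, which keeps very high scores from dominating ranking.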
Uses of Parse in org.creativecommons.nutch |
---|
Methods in org.creativecommons.nutch that return Parse | |
---|---|
Parse |
CCParseFilter.filter(Content content,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
Adds metadata or otherwise modifies a parse of an HTML document, given the DOM tree of a page. |
Methods in org.creativecommons.nutch with parameters of type Parse | |
---|---|
Parse |
CCParseFilter.filter(Content content,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
Adds metadata or otherwise modifies a parse of an HTML document, given the DOM tree of a page. |
Document |
CCIndexingFilter.filter(Document doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|
|