org.apache.nutch.indexer
Interface IndexingFilter
- All Superinterfaces:
- Configurable, Pluggable
- All Known Implementing Classes:
- BasicIndexingFilter, CCIndexingFilter, LanguageIndexingFilter, MoreIndexingFilter, RelTagIndexingFilter
public interface IndexingFilter
- extends Pluggable, Configurable
Extension point for indexing. Implementations can add metadata fields to
the document being indexed. Every plugin found that implements this
extension point is run, in sequence, on the parse.
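The sequential behavior described above can be sketched as a simple loop in which each filter receives the document returned by the previous one. This is only an illustration under stated assumptions: SketchDocument, SketchIndexingFilter, SketchIndexingException and FilterChainSketch are hypothetical stand-ins so that the example compiles on its own. They are not Nutch classes, and the two-argument filter method is a reduced form of the real five-argument signature.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the indexed document (not the Nutch/Lucene type).
class SketchDocument {
    final Map<String, String> fields = new LinkedHashMap<>();
}

// Hypothetical stand-in for IndexingException.
class SketchIndexingException extends Exception {
    SketchIndexingException(String message) { super(message); }
}

// Reduced stand-in for IndexingFilter: the real filter() also takes
// the parse, the crawl datum and the inlinks.
interface SketchIndexingFilter {
    SketchDocument filter(SketchDocument doc, String url) throws SketchIndexingException;
}

class FilterChainSketch {
    // Runs every filter in list order, passing along whatever document
    // the previous filter returned (the same instance or a new one).
    static SketchDocument runAll(List<SketchIndexingFilter> filters,
                                 SketchDocument doc, String url)
            throws SketchIndexingException {
        for (SketchIndexingFilter f : filters) {
            doc = f.filter(doc, url);
        }
        return doc;
    }

    // Small demonstration: the second filter relies on a field
    // that the first filter added, so order matters.
    static String demoUrlLength() {
        List<SketchIndexingFilter> filters = new ArrayList<>();
        filters.add((doc, url) -> { doc.fields.put("url", url); return doc; });
        filters.add((doc, url) -> {
            doc.fields.put("urlLength", String.valueOf(doc.fields.get("url").length()));
            return doc;
        });
        try {
            return runAll(filters, new SketchDocument(), "http://example.org/")
                    .fields.get("urlLength");
        } catch (SketchIndexingException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the output of one filter is the input of the next, a filter placed later in the chain can read fields written by an earlier one, as the second lambda does here.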
X_POINT_ID
static final String X_POINT_ID
- The name of the extension point.
filter
Document filter(Document doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
throws IndexingException
- Adds fields or otherwise modifies the document that will be indexed for a
parse.
- Parameters:
doc
- document instance for collecting fields
parse
- parse data instance
url
- page url
datum
- crawl datum for the page
inlinks
- page inlinks
- Returns:
- modified (or a new) document instance
- Throws:
IndexingException
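A minimal sketch of the filter contract follows: it takes the five arguments, adds a field, returns the document, and signals failure with a checked exception. All types here (DemoDocument, DemoParse, DemoIndexingException, DemoIndexingFilter, TitleFilterSketch) are hypothetical stand-ins, not the Nutch classes, and the "title" field and the null-parse check are assumptions made for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins so the sketch compiles alone; none are Nutch classes.
class DemoDocument {
    final Map<String, String> fields = new LinkedHashMap<>();
}

class DemoParse {
    final String title;
    DemoParse(String title) { this.title = title; }
}

class DemoIndexingException extends Exception {
    DemoIndexingException(String message) { super(message); }
}

// Mirrors the shape of filter(); the crawl datum and inlinks are plain
// Objects here because this sketch does not use them.
interface DemoIndexingFilter {
    DemoDocument filter(DemoDocument doc, DemoParse parse, String url,
                        Object datum, Object inlinks) throws DemoIndexingException;
}

// Adds a "title" field to the document it was given and returns that instance.
class TitleFilterSketch implements DemoIndexingFilter {
    @Override
    public DemoDocument filter(DemoDocument doc, DemoParse parse, String url,
                               Object datum, Object inlinks)
            throws DemoIndexingException {
        if (parse == null) {
            // Assumed failure case, chosen only to show the checked exception.
            throw new DemoIndexingException("no parse available for " + url);
        }
        doc.fields.put("title", parse.title);
        return doc;
    }

    // Small demonstration: returns the title stored by the filter.
    static String demoTitle() {
        try {
            DemoDocument out = new TitleFilterSketch().filter(
                    new DemoDocument(), new DemoParse("Home"),
                    "http://example.org/", null, null);
            return out.fields.get("title");
        } catch (DemoIndexingException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Returning the same instance is the simpler choice; the contract above also allows returning a new document, in which case callers must use the returned reference rather than the one they passed in.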
Copyright © 2006 The Apache Software Foundation