org.apache.nutch.crawl
Class Generator.CrawlDbUpdater

java.lang.Object
  extended by org.apache.hadoop.mapred.MapReduceBase
      extended by org.apache.nutch.crawl.Generator.CrawlDbUpdater
All Implemented Interfaces:
Closeable, JobConfigurable, Mapper, Reducer
Enclosing class:
Generator

public static class Generator.CrawlDbUpdater
extends MapReduceBase
implements Mapper, Reducer

Updates the CrawlDB so that the next generate run does not select the same URLs again.
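The idea behind this class can be sketched as a map/shuffle/reduce pass. All records are keyed by URL, records for the same URL are grouped together, and one merged record per URL is written back, marked if that URL was just placed in a fetchlist. The sketch below is a plain-Java, in-memory simulation under stated assumptions, not Nutch's implementation. The class `CrawlDbUpdateSketch`, the `DbEntry` type, its `generated` flag and the methods `update` and `nextGenerate` are all hypothetical. The real class operates on Hadoop `Writable` records through the `map` and `reduce` methods documented on this page, and it may record the "already generated" marker differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of a CrawlDb update pass; not Nutch code.
class CrawlDbUpdateSketch {

    // Hypothetical stand-in for a CrawlDb record (the real one is a CrawlDatum).
    static final class DbEntry {
        final String url;
        final boolean generated; // assumed marker: URL already placed in a fetchlist

        DbEntry(String url, boolean generated) {
            this.url = url;
            this.generated = generated;
        }
    }

    // Merges the current CrawlDb with the URLs just selected for fetching.
    static Map<String, DbEntry> update(List<DbEntry> crawlDb, List<String> fetchlist) {
        // "map" + "shuffle": key records from both inputs by URL and group them
        Map<String, List<DbEntry>> grouped = new TreeMap<>();
        for (DbEntry e : crawlDb) {
            grouped.computeIfAbsent(e.url, k -> new ArrayList<>()).add(e);
        }
        for (String url : fetchlist) {
            grouped.computeIfAbsent(url, k -> new ArrayList<>()).add(new DbEntry(url, true));
        }
        // "reduce": emit exactly one record per URL, marked if any input marked it
        Map<String, DbEntry> updated = new TreeMap<>();
        for (Map.Entry<String, List<DbEntry>> group : grouped.entrySet()) {
            boolean generated = false;
            for (DbEntry e : group.getValue()) {
                generated |= e.generated;
            }
            updated.put(group.getKey(), new DbEntry(group.getKey(), generated));
        }
        return updated;
    }

    // A later generate run skips URLs that are already marked as generated.
    static List<String> nextGenerate(Map<String, DbEntry> crawlDb) {
        List<String> selected = new ArrayList<>();
        for (DbEntry e : crawlDb.values()) {
            if (!e.generated) {
                selected.add(e.url);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<DbEntry> db = List.of(new DbEntry("http://a/", false), new DbEntry("http://b/", false));
        Map<String, DbEntry> updated = update(db, List.of("http://a/"));
        System.out.println(nextGenerate(updated)); // prints [http://b/]
    }
}
```

Grouping by URL is what lets a reducer see the existing CrawlDb record and the freshly generated one side by side, so it can decide which state to keep without a random-access lookup into the database.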


Constructor Summary
Generator.CrawlDbUpdater()
           
 
Method Summary
 void configure(JobConf job)
           
 void map(WritableComparable key, Writable value, OutputCollector output, Reporter reporter)
           
 void reduce(WritableComparable key, Iterator values, OutputCollector output, Reporter reporter)
           
 
Methods inherited from class org.apache.hadoop.mapred.MapReduceBase
close
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.io.Closeable
close
 

Constructor Detail

Generator.CrawlDbUpdater

public Generator.CrawlDbUpdater()
Method Detail

configure

public void configure(JobConf job)
Specified by:
configure in interface JobConfigurable
Overrides:
configure in class MapReduceBase

map

public void map(WritableComparable key,
                Writable value,
                OutputCollector output,
                Reporter reporter)
         throws IOException
Specified by:
map in interface Mapper
Throws:
IOException

reduce

public void reduce(WritableComparable key,
                   Iterator values,
                   OutputCollector output,
                   Reporter reporter)
            throws IOException
Specified by:
reduce in interface Reducer
Throws:
IOException


Copyright © 2006 The Apache Software Foundation