org.apache.nutch.crawl
Class LinkDbMerger

java.lang.Object
  extended by org.apache.hadoop.util.ToolBase
      extended by org.apache.nutch.crawl.LinkDbMerger
All Implemented Interfaces:
Configurable, Tool

public class LinkDbMerger
extends ToolBase

This tool merges several LinkDb instances into one, optionally passing URLs through the current URLFilters to drop prohibited URLs and links.

The tool can also be used for filtering alone; in that case, specify only one LinkDb in the arguments.

If more than one LinkDb contains information about the same URL, the inlinks from all of them are accumulated, but no more than db.max.inlinks inlinks are ever added.

If activated, URLFilters are applied both to the target URLs and to every incoming link URL. If a target URL is prohibited, the target and all inlinks to it are removed. If only some of the incoming links are prohibited, only those links are removed, and they do not count toward the db.max.inlinks limit.
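
The merge and filtering rules described above can be illustrated with a small in-memory sketch. It is not the actual Nutch implementation: the class InlinkMergeSketch, its merge method, and the use of plain maps in place of LinkDb data are hypothetical, a Predicate stands in for the URLFilters chain, and maxInlinks stands in for db.max.inlinks.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical illustration only; not part of org.apache.nutch.crawl.
final class InlinkMergeSketch {

    /**
     * Merges several in-memory maps of target URL -> inlink URLs.
     * urlAllowed stands in for URLFilters, maxInlinks for db.max.inlinks.
     */
    static Map<String, Set<String>> merge(
            List<Map<String, List<String>>> dbs,
            Predicate<String> urlAllowed,
            int maxInlinks) {
        Map<String, Set<String>> merged = new LinkedHashMap<>();
        for (Map<String, List<String>> db : dbs) {
            for (Map.Entry<String, List<String>> entry : db.entrySet()) {
                String target = entry.getKey();
                if (!urlAllowed.test(target)) {
                    continue; // prohibited target: dropped together with all its inlinks
                }
                // Inlinks for the same target accumulate across all input maps.
                Set<String> inlinks =
                        merged.computeIfAbsent(target, k -> new LinkedHashSet<>());
                for (String inlink : entry.getValue()) {
                    if (inlinks.size() >= maxInlinks) {
                        break; // cap reached, nothing more is added for this target
                    }
                    if (urlAllowed.test(inlink)) {
                        inlinks.add(inlink); // rejected inlinks never count toward the cap
                    }
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, List<String>> db1 = new LinkedHashMap<>();
        db1.put("http://a.example/", List.of("http://x.example/", "http://bad.example/"));
        db1.put("http://bad.example/page", List.of("http://x.example/"));

        Map<String, List<String>> db2 = new LinkedHashMap<>();
        db2.put("http://a.example/", List.of("http://y.example/", "http://z.example/"));

        Predicate<String> allowed = url -> !url.contains("bad.example");
        System.out.println(merge(List.of(db1, db2), allowed, 2));
        // prints {http://a.example/=[http://x.example/, http://y.example/]}
    }
}
```

In this sketch the prohibited inlink from the first map is skipped without using up the limit of two, so one inlink from the second map is still accepted before the cap stops further additions. Passing a single map with a filter corresponds to the filtering-only use mentioned earlier.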

Author:
Andrzej Bialecki

Field Summary
 
Fields inherited from class org.apache.hadoop.util.ToolBase
conf
 
Constructor Summary
LinkDbMerger()
           
LinkDbMerger(Configuration conf)
           
 
Method Summary
static void main(String[] args)
           
 void merge(Path output, Path[] dbs, boolean normalize, boolean filter)
           
 int run(String[] args)
           
 
Methods inherited from class org.apache.hadoop.util.ToolBase
doMain, getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

LinkDbMerger

public LinkDbMerger()

LinkDbMerger

public LinkDbMerger(Configuration conf)
Method Detail

merge

public void merge(Path output,
                  Path[] dbs,
                  boolean normalize,
                  boolean filter)
           throws Exception
Throws:
Exception

main

public static void main(String[] args)
                 throws Exception
Parameters:
args - command-line arguments
Throws:
Exception

run

public int run(String[] args)
        throws Exception
Throws:
Exception
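
This page does not document the command-line arguments that run accepts. The following sketch shows one way an argument array could be mapped onto the parameters of merge(Path output, Path[] dbs, boolean normalize, boolean filter). The class MergeArgsSketch, the argument order, and the flag names -normalize and -filter are assumptions made for illustration, and plain strings are used in place of Path so that the example has no Hadoop dependency.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; the argument layout is an assumption.
final class MergeArgsSketch {
    final String output;      // would become the 'output' Path
    final List<String> dbs;   // would become the 'dbs' Path[]
    final boolean normalize;
    final boolean filter;

    private MergeArgsSketch(String output, List<String> dbs,
                            boolean normalize, boolean filter) {
        this.output = output;
        this.dbs = dbs;
        this.normalize = normalize;
        this.filter = filter;
    }

    /** Parses: output, then one or more LinkDb locations, with optional flags. */
    static MergeArgsSketch parse(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException(
                    "expected: <output> <linkdb> [<linkdb> ...] [-normalize] [-filter]");
        }
        String output = args[0];
        List<String> dbs = new ArrayList<>();
        boolean normalize = false;
        boolean filter = false;
        for (int i = 1; i < args.length; i++) {
            if ("-normalize".equals(args[i])) {
                normalize = true;
            } else if ("-filter".equals(args[i])) {
                filter = true;
            } else {
                dbs.add(args[i]); // anything that is not a flag is an input LinkDb
            }
        }
        if (dbs.isEmpty()) {
            throw new IllegalArgumentException("at least one input LinkDb is required");
        }
        return new MergeArgsSketch(output, dbs, normalize, filter);
    }

    public static void main(String[] args) {
        MergeArgsSketch parsed =
                parse(new String[] {"merged_linkdb", "linkdb1", "linkdb2", "-filter"});
        System.out.println(parsed.output + " <- " + parsed.dbs
                + " normalize=" + parsed.normalize + " filter=" + parsed.filter);
    }
}
```

Under these assumptions, a single input LinkDb combined with the filter flag would correspond to the filtering-only use described in the class overview.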


Copyright © 2006 The Apache Software Foundation