I crawled a website at two different times and have a log file from each crawl. My goal is to reduce the address list to 100 URLs that belong to the main domain, and those 100 should be chosen from among the URLs that change the most.
To compare the two log files, I need an app that finds the differences between them.
A log file contains lines like this:
2004-07-21T23:29:40.502Z 200 225 http://127.0.0.1:9999/selftest/MaxLinkHops/5.html LLLLL http://127.0.0.1:9999/selftest/MaxLinkHops/4.html text/html #000 20040721232940481+12 sha1:M77KNTBZH2IU6V2SIG5EEG45EJICNQNM -
If a page differs between the two crawls, the sha1 column near the end of the line (sha1:M77KNTBZH2IU6V2SIG5EEG45EJICNQNM) changes. The important columns are the date, URL, and sha1 columns. Can you suggest an app to do this, or is there a better way to do it with Heritrix?
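To make the comparison concrete, here is a rough Python sketch of what I have in mind. It assumes the fields are separated by whitespace, with the date in field 1, the URL in field 4, and the sha1 digest in field 10, as in the sample line above. The function names `parse_crawl_log` and `changed_urls` and the `host` and `limit` parameters are made up for illustration. With only two logs it can tell whether a URL changed, not how often it changes.

```python
from urllib.parse import urlsplit


def parse_crawl_log(lines):
    """Map each URL to (timestamp, digest) from crawl-log style lines."""
    records = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or truncated lines
        # Assumed layout: field 0 = date, field 3 = URL, field 9 = sha1 digest.
        timestamp, url, digest = fields[0], fields[3], fields[9]
        records[url] = (timestamp, digest)  # a later line for the same URL wins
    return records


def changed_urls(old_lines, new_lines, host=None, limit=100):
    """Return up to `limit` URLs present in both logs whose digest differs."""
    old = parse_crawl_log(old_lines)
    new = parse_crawl_log(new_lines)
    changed = []
    for url in sorted(old.keys() & new.keys()):
        if old[url][1] == new[url][1]:
            continue  # same digest, content unchanged
        if host is not None and urlsplit(url).hostname != host:
            continue  # keep only URLs on the main domain
        changed.append(url)
    return changed[:limit]
```

Both functions take any iterable of lines, so two open log files can be passed directly, for example `changed_urls(open("first.log"), open("second.log"), host="127.0.0.1")`, where the file names are placeholders.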