
Ace:Comparing Collections

From Adapt

Revision as of 16:21, 26 May 2009 by Toaster

The ACE Audit Manager (AM) lets you compare a collection against an external manifest. This is useful for comparing sites that hold replicated data, or for verifying that all data was loaded successfully during ingestion.

There are two ways to compare collections. The first uses an external data file containing digests and filenames. The second is to register peer ACE AM installations and compare collections automatically as part of an audit.

Manually Compare

Manually comparing collections is a two-step process. First, generate a source list of digests and filenames. Second, load that list into an AM installation for comparison.

Step 1: Generate Digest List

The list uses a simple text format: each line holds a SHA-256 digest (64 hexadecimal characters), followed by one or more spaces or tabs, then the path to the file within the collection.

An example source list:

01348875911b94af38b35f304dcd75348f437734696e26b40fd868eecd687d35	/state/data/state/state-2007-10-ARC/state-20071001-aud-000000.arc.gz
365fe0af21237b750258f6b8c48b25964d0dd5c7d612748eff2f6526f43682bb	/state/data/state/state-2007-10-ARC/state-20071001-aud-000001.arc.gz
893298b40da08a1b9ce0c7994c8f2717cedb23d046936c8e24bd62655ca1962b	/state/data/state/state-2007-10-ARC/state-20071001-aud-000002.arc.gz
ba7cce400971bd56377e2d79a21192c63e0328e7651728345c49ebf35fb4999d	/state/data/state/state-2007-10-ARC/state-20071001-aud-000003.arc.gz
9827832cdfd4a9565422e41fd334eb09a23c835772184936ffebabb147eb5b8a	/state/data/state/state-2007-10-ARC/state-20071001-aud-000004.arc.gz
3d55a5b19dd6133e598fb29ff89444fb05196863c21d8773a03dbe16c0b42615	/state/data/state/state-2007-10-ARC/state-20071001-aud-000005.arc.gz
ea0880b33fb9b237299c7e92578f4881c820b93e3d130a5818e3b3a3e90b8872	/state/data/state/state-2007-10-ARC/state-20071001-aud-000006.arc.gz
45b632a3de7ca7c38a916242d78cedce6f11004cef99e4642194a48651db597f	/state/data/state/state-2007-10-ARC/state-20071001-aud-000007.arc.gz
bca0a7d6f78b9d46196bb502ef31782d0b3ea5a075ca61691bdd0a2ffc3cfd24	/state/data/state/state-2007-10-ARC/state-20071001-aud-000008.arc.gz
4ec209a01449552454b57d82e12c0848982010ebd7f36e4ac3206576819531cf	/state/data/state/state-2007-10-ARC/state-20071001-aud-000009.arc.gz
c4ed9102ba6e8f0ea5f9bfeb06318b3db2230733c5fd9a22b18405dcfe820a7f	/state/data/state/state-2007-10-ARC/state-20071001-aud-000010.arc.gz
b6f97b66eff760a3bbef4e7895be9432fa4dfcb206d42d789c5ecfb4343eadc1	/state/data/state/state-2007-10-ARC/state-20071001-aud-000011.arc.gz
ab3f8d618e51032418a3285fd24687f7a2a006cd546de43e575c72b1fed727e4	/state/data/state/state-2007-10-ARC/state-20071001-aud-000012.arc.gz

Directories must be separated by a forward slash (/). Note that this differs from Windows, which separates directories with a backslash (\).
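To check a list before uploading it, the format rules above can be expressed as a small validator. The following Python sketch is an illustration under stated assumptions, not part of ACE: the function name `parse_digest_list` is hypothetical, blank lines are skipped, uppercase hex digits are accepted and normalized to lowercase (whether ACE itself accepts them is an assumption), and the path is taken to be everything after the whitespace run, so it may contain spaces.

```python
from __future__ import annotations

import re

# Hypothetical validator for the digest-list format described above:
# a 64-character hex SHA-256 digest, one or more spaces or tabs, then the path.
LINE_RE = re.compile(r"^([0-9a-fA-F]{64})[ \t]+(\S.*)$")


def parse_digest_list(text: str) -> dict[str, str]:
    """Return {path: digest}; raise ValueError on the first malformed line."""
    entries: dict[str, str] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines (assumption)
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"line {lineno}: expected '<sha256> <path>'")
        digest, path = match.groups()
        if "\\" in path:
            raise ValueError(f"line {lineno}: use '/' as the directory separator")
        if path in entries:
            raise ValueError(f"line {lineno}: duplicate path {path!r}")
        # Lowercase so digests compare consistently (assumption, not ACE behavior).
        entries[path] = digest.lower()
    return entries
```

Rejecting duplicate paths is a design choice of this sketch: a path can only have one digest, so a repeated path most likely indicates a concatenation mistake.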

The Audit Manager can generate a digest list for a collection or directory:

  1. From the status page, select the collection you want to generate a list for.
  2. Click 'more...', then 'Download Digests'. A list of digests and filenames in the correct format is displayed. Right-click and save the list to your hard drive.
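If the source site does not run an Audit Manager, a list in the same format could be produced with a short script instead. This Python sketch is hypothetical: `write_digest_list` and its `prefix` parameter are illustrative names, and it assumes paths should be written relative to the collection root under a prefix you choose to match how the collection is registered in ACE. It uses `Path.as_posix()` so that directories are separated by `/` even on Windows.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_digest_list(root: str, out_file: str, prefix: str = "/") -> int:
    """Write '<digest>\\t<path>' lines for every file under root.

    Paths are relative to root, use forward slashes, and are joined to the
    hypothetical `prefix` (always yielding a leading '/').
    Returns the number of files listed.
    """
    root_path = Path(root)
    count = 0
    with open(out_file, "w", encoding="utf-8", newline="\n") as out:
        for p in sorted(root_path.rglob("*")):
            if p.is_file():
                rel = p.relative_to(root_path).as_posix()
                out.write(f"{sha256_file(p)}\t{prefix.rstrip('/')}/{rel}\n")
                count += 1
    return count
```

Write the output file outside `root`; otherwise the list may end up hashing itself while it is being written.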

Step 2: Upload to ACE

From the status screen, select the collection you want to compare. Click the 'more...' link to open the drop-down menu, then click 'Compare Collection'. The following screen appears. Click 'Browse', select the file you saved your digest list to in step 1, and click Submit.

[Screenshot: Compare Collection upload screen (Compare-1.png)]


If everything in the file you submitted matches the selected collection, you will see the following summary showing no differences.

[Screenshot: comparison summary showing no differences (Compare-3.png)]


If there are differences, your screen may look something like the following:

[Screenshot: comparison summary listing differences (Compare.png)]

There are four different ways in which collections may differ.

Files in original collection, but not in supplied
This is a list of files that appear in the collection being monitored by ACE, but do not appear in the uploaded file.
Files in supplied file, but not original collection
These are files that appear in the uploaded list, but not in the collection monitored by ACE.
Files with different names, but same digests
These are files that have the same content but differ only in directory or name. This is typically seen when files have been renamed or moved, for example when a collection is copied between different operating systems.
Files with same names, but different digests
These are files that have the same directory and name but different content. This is most likely the result of a bad replication.
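The four categories can be illustrated with a short Python sketch that compares two {path: digest} mappings, one for the collection ACE monitors and one for the supplied list. This is one plausible way to compute them, not ACE's actual implementation; the function and result key names are hypothetical, and it assumes that a file reported as renamed is not also reported as missing from either side.

```python
from __future__ import annotations

from collections import defaultdict


def compare_collections(original: dict[str, str], supplied: dict[str, str]) -> dict[str, list]:
    """Classify differences between two {path: digest} maps (illustrative only)."""
    # Same path on both sides, but different content.
    same_name_diff_digest = sorted(
        p for p in original.keys() & supplied.keys() if original[p] != supplied[p]
    )

    # Paths present on only one side.
    only_original = original.keys() - supplied.keys()
    only_supplied = supplied.keys() - original.keys()

    # Group unmatched paths by digest to find renamed or moved files.
    orig_by_digest = defaultdict(list)
    for p in only_original:
        orig_by_digest[original[p]].append(p)
    supp_by_digest = defaultdict(list)
    for p in only_supplied:
        supp_by_digest[supplied[p]].append(p)
    shared = orig_by_digest.keys() & supp_by_digest.keys()

    renamed = sorted(
        (o, s) for d in shared for o in orig_by_digest[d] for s in supp_by_digest[d]
    )
    # Assumption: files explained by a rename are not reported as missing.
    matched_orig = {o for d in shared for o in orig_by_digest[d]}
    matched_supp = {s for d in shared for s in supp_by_digest[d]}

    return {
        "in_original_not_supplied": sorted(only_original - matched_orig),
        "in_supplied_not_original": sorted(only_supplied - matched_supp),
        "different_name_same_digest": renamed,
        "same_name_different_digest": same_name_diff_digest,
    }
```

For example, comparing `{"/a": d1}` against `{"/b": d1}` reports the pair `("/a", "/b")` as a rename rather than as one missing and one extra file.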