Ace:Comparing Collections

From Adapt

Revision as of 21:44, 14 May 2018 by Shake (talk | contribs)

The ACE AM allows you to compare collections to an external manifest. This is useful when you want to compare sites that have replicated data, or to verify all data was successfully loaded during ingestion.

There are two ways to compare collections. The first is to register peer ACE AM installations and automatically compare collections as part of an audit. The second is to compare against an external data file containing digests and filenames.

Peer Comparisons

Comparing collections with a Peer site is a multi-step process which allows your ACE instance to query a peer ACE's digests for a given collection. First a user is needed at the Peer site for your ACE to be able to query for file digests. Then the Peer site needs to be registered in your local ACE. Finally, comparisons can be done by either manually issuing a comparison or by adding a PeerCollection to your Collection which you wish to compare.

Step 1: Create a Peer User

At the remote site, a User needs to be created in order for your ACE to have access to the remote ACE. The comparisons run through a series of HTTP requests, and as such it is recommended to only use a minimal set of Roles for the Peer User.

A User can be created by browsing to the Accounts servlet (/Users).

Step 2: Add the Peer User to a Local ACE

Once a user is created for you at the Peer site, your ACE needs to store the credentials. This can be done on the PartnerSite servlet (/PartnerSite), which can be found by browsing to:

  • Status (Collection): Compare Collection
    • Partner: Add New

Step 3a: Manually Selecting the Collection

After the user setup is complete and your ACE AM can communicate with the Peer ACE AM, a manual collection comparison can be done through the CompareCollection servlet.

This is the same servlet as in Step 2. After selecting a Partner Site to compare against, a listing of collections will be shown which you can compare against.

  • Note: If the credentials are not correct, an Exception will be recorded in the aceam.log
  • Note: There is currently a bug where, if a PartnerSite has a large number of collections, not all of them will be shown.

Step 3b: Registering a PeerCollection with a Local Collection

If you wish to have collections compared automatically when auditing, you simply need to go to the Collection Settings and select the 'Add Peer' dialogue. This then follows the same process as the manual selection. Once a collection has been chosen, it will be persisted to the ACE AM database and used to compare against during audits.


Manually Compare

Manually comparing collections is a two-step process. First, a source list of digests and filenames needs to be generated. Second, that list is loaded into an AM installation for comparison. To manually compare collections between AM installations, you would download the digest list from the first installation, then upload the list to the second for comparison.

Step 1: Generate Digest List

The format of the list is simple: each line contains a SHA-256 digest, followed by one or more spaces or tabs, then the path to the file in the collection.

Example source list:

01348875911b94af38b35f304dcd75348f437734696e26b40fd868eecd687d35	/state/data/state/state-2007-10-ARC/state-20071001-aud-000000.arc.gz
365fe0af21237b750258f6b8c48b25964d0dd5c7d612748eff2f6526f43682bb	/state/data/state/state-2007-10-ARC/state-20071001-aud-000001.arc.gz
893298b40da08a1b9ce0c7994c8f2717cedb23d046936c8e24bd62655ca1962b	/state/data/state/state-2007-10-ARC/state-20071001-aud-000002.arc.gz
ba7cce400971bd56377e2d79a21192c63e0328e7651728345c49ebf35fb4999d	/state/data/state/state-2007-10-ARC/state-20071001-aud-000003.arc.gz
9827832cdfd4a9565422e41fd334eb09a23c835772184936ffebabb147eb5b8a	/state/data/state/state-2007-10-ARC/state-20071001-aud-000004.arc.gz
3d55a5b19dd6133e598fb29ff89444fb05196863c21d8773a03dbe16c0b42615	/state/data/state/state-2007-10-ARC/state-20071001-aud-000005.arc.gz
ea0880b33fb9b237299c7e92578f4881c820b93e3d130a5818e3b3a3e90b8872	/state/data/state/state-2007-10-ARC/state-20071001-aud-000006.arc.gz
45b632a3de7ca7c38a916242d78cedce6f11004cef99e4642194a48651db597f	/state/data/state/state-2007-10-ARC/state-20071001-aud-000007.arc.gz
bca0a7d6f78b9d46196bb502ef31782d0b3ea5a075ca61691bdd0a2ffc3cfd24	/state/data/state/state-2007-10-ARC/state-20071001-aud-000008.arc.gz
4ec209a01449552454b57d82e12c0848982010ebd7f36e4ac3206576819531cf	/state/data/state/state-2007-10-ARC/state-20071001-aud-000009.arc.gz
c4ed9102ba6e8f0ea5f9bfeb06318b3db2230733c5fd9a22b18405dcfe820a7f	/state/data/state/state-2007-10-ARC/state-20071001-aud-000010.arc.gz
b6f97b66eff760a3bbef4e7895be9432fa4dfcb206d42d789c5ecfb4343eadc1	/state/data/state/state-2007-10-ARC/state-20071001-aud-000011.arc.gz
ab3f8d618e51032418a3285fd24687f7a2a006cd546de43e575c72b1fed727e4	/state/data/state/state-2007-10-ARC/state-20071001-aud-000012.arc.gz

Directories must be separated by a /. This is different from Windows where directories are separated by a \.
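
If the data lives outside ACE (for example, on the machine the files were ingested from), a list in this format can also be produced with a short script. The following is a minimal sketch, not part of ACE: the function names are hypothetical, and it assumes that paths are written relative to the collection root with a leading /, as in the example above.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digest_lines(root):
    """Yield 'digest<TAB>/relative/path' for every file under root."""
    root = Path(root)
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # as_posix() forces '/' separators, even when run on Windows.
        # Assumption: paths are relative to the collection root, with a leading '/'.
        relative = "/" + path.relative_to(root).as_posix()
        yield f"{sha256_of(path)}\t{relative}"


def write_digest_list(root, out_file):
    """Write the digest list for root to out_file, one entry per line."""
    with open(out_file, "w", encoding="utf-8", newline="\n") as out:
        for line in digest_lines(root):
            out.write(line + "\n")
```

Calling write_digest_list with a collection directory and an output filename produces a file that can be uploaded in Step 2. The output file should be written outside the collection directory so that it is not included in its own list.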

The Audit Manager is able to supply a list of digests for a collection or directory.

  1. From the status page, select the collection you wish to generate a list for
  2. Click on 'more...', then 'Download Digests'. You will see a list of digests and filenames in the correct format. You can right-click and save the list to your hard drive.

Step 2: Upload to ACE

From the status screen, select the collection you wish to compare. Click the 'more...' link to bring up the drop-down menu, then click 'Compare Collection'. You will see the following screen. Click on 'Browse' and select the file you saved your digest list into during Step 1. Click submit.

Compare-1.png


If everything in the file you submitted matches the selected collection, you will see the following summary showing no differences.

Compare-3.png


If there are differences, your screen may look something like the following:

Compare.png

There are four different ways in which collections may differ.

Files in original collection, but not in supplied
This is a list of files that appear in the collection being monitored by ACE, but do not appear in the uploaded file.
Files in supplied file, but not original collection
These are files that appear in the uploaded list, but not in the collection monitored by ACE.
Files with different names, but same digests
These are files that have the same content and differ only in directory or name. This is seen when files are renamed and moved across different operating systems.
Files with same names, but different digests
These are files that have the same directory and name, but have different content. Most likely seen during a bad replication.
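
The four categories above can be reproduced outside ACE when checking two digest lists by hand. The following is a rough sketch of one way to do so, not ACE's own implementation: the function names and result keys are hypothetical, and it assumes that a file reported as renamed (same digest, different name) is left out of the two "only in" lists.

```python
import re


def parse_digest_list(text):
    """Map path -> digest for each non-blank 'digest<whitespace>path' line."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first run of spaces or tabs; the path may contain spaces.
        digest, path = re.split(r"[ \t]+", line.strip(), maxsplit=1)
        entries[path] = digest.lower()
    return entries


def compare(original, supplied):
    """Sort the differences between two path -> digest maps into four groups."""
    only_original = {p: d for p, d in original.items() if p not in supplied}
    only_supplied = {p: d for p, d in supplied.items() if p not in original}
    changed = sorted(
        p for p in original if p in supplied and original[p] != supplied[p]
    )

    # Index unmatched supplied paths by digest to spot renamed or moved files.
    by_digest = {}
    for path, digest in only_supplied.items():
        by_digest.setdefault(digest, []).append(path)

    renamed = []
    for path, digest in sorted(only_original.items()):
        for other in sorted(by_digest.get(digest, [])):
            renamed.append((path, other))

    # Assumption: renamed files are not also reported as missing.
    renamed_original = {a for a, _ in renamed}
    renamed_supplied = {b for _, b in renamed}
    return {
        "in_original_not_supplied": sorted(set(only_original) - renamed_original),
        "in_supplied_not_original": sorted(set(only_supplied) - renamed_supplied),
        "different_names_same_digest": renamed,
        "same_names_different_digest": changed,
    }
```

Here original stands for the list downloaded from the monitored collection and supplied for the uploaded file. If every list in the result is empty, the two digest lists describe the same set of files.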