HistoryTaxonomy
Event Model
The history system employs a very simple life-cycle model for content objects. It recognizes only 3 types of significant (and therefore recorded) events:
- creation - min occur: 1, max occur: 1
- modification - min occur: 0, max occur: n
- removal - min occur: 0, max occur: 1
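The occurrence constraints above can be sketched as a small validity check. This is only an illustration: the class, enum, and method names are hypothetical and are not part of the DSpace history API, and the ordering rule (creation first, removal last) is an added assumption that the list above does not state explicitly.

```java
import java.util.List;

/** Illustrative only: hypothetical names, not the DSpace history API. */
final class LifeCycleCheck {

    /** The three event types recognized by the life-cycle model. */
    enum EventType { CREATION, MODIFICATION, REMOVAL }

    /**
     * Returns true if the events (in chronological order) contain exactly
     * one creation, any number of modifications, and at most one removal.
     * Assumption: the creation comes first and a removal, if any, comes last.
     */
    static boolean isValidHistory(List<EventType> events) {
        // A history must begin with its single creation event.
        if (events.isEmpty() || events.get(0) != EventType.CREATION) {
            return false;
        }
        for (int i = 1; i < events.size(); i++) {
            EventType e = events.get(i);
            if (e == EventType.CREATION) {
                return false; // a second creation violates max occur: 1
            }
            if (e == EventType.REMOVAL && i != events.size() - 1) {
                return false; // nothing may follow the removal
            }
        }
        return true;
    }
}
```

For example, a creation followed by two modifications and a removal is accepted, while a sequence that starts with a modification is rejected.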
Object Scope
The content objects modeled by the history system may be divided into 3 groups:
- 'Persistent' objects, namely Communities, Collections, and Items. Only these objects are assigned persistent identifiers by DSpace.
- 'Agent' objects, of which the only representative is EPerson.
- 'Ingest' objects, namely Workspace items (items in the process of being submitted) and Workflow items (items in the process of review, editing, etc.).
Simple arithmetic (6 object types times 3 event types) suggests that there are only 18 invocations of the history system in the codebase. This is essentially correct, although the special case of Item withdrawal and reinstatement, both of which are considered item modifications, generates a few more invocations.
Integrity
There are 2 principal means of ensuring the integrity of the history system. First, each event record is associated with an object 'state' identifier, which is contained in the RDF record. Roughly, the view is that each event puts an object into a new state, and the entire life of the object consists of this temporal series of states. The RDBMS maintains a table of these states, correlated with the objects to which they pertain.
Thus, as long as the database is available, one could determine whether all RDF records for a given object have been accounted for, that is, whether every state assigned to the object has been found. This technique would make the removal of any RDF record detectable.
Comment: The "state" records are actually meaningless because of errors in the code and data model. The "action" records refer to states by numbers which are keys to the `HistoryState` DB table; but alas, the other column in that table is simply a URI which does not uniquely identify a "state" record. The result is that there is no way to relate state records to action records based on the data available. That's the result of building a system without ever attempting to use it. --lcs
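Setting aside the defects noted in the comment, the intended completeness check amounts to a set difference between the states the database says were assigned and the states actually found in the surviving RDF records. The following is a minimal sketch under stated assumptions: the class and method names are hypothetical, state identifiers are assumed to be integers, and both sets are assumed to have been loaded already (no database or RDF access is shown).

```java
import java.util.Set;
import java.util.TreeSet;

/** Illustrative only: hypothetical names, not the DSpace history API. */
final class StateAudit {

    /**
     * Returns the state identifiers that the database lists for an object
     * but that no recovered RDF record accounts for. A non-empty result
     * suggests that at least one RDF record has been removed.
     */
    static Set<Integer> missingStates(Set<Integer> assignedInDb,
                                      Set<Integer> foundInRdf) {
        // Copy first so the caller's set is left untouched.
        Set<Integer> missing = new TreeSet<>(assignedInDb);
        missing.removeAll(foundInRdf);
        return missing;
    }
}
```

If the database lists states 1, 2, and 3 for an object but only records for states 1 and 3 are found, the result contains state 2.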
Second, the database stores a checksum of the RDF record itself. Thus, attempts to tamper with (modify) any RDF record would also be detectable.
Comment: This checks out; every legitimate history file has a row in `History` that includes, apparently, an accurate checksum. (Some files don't have corresponding rows because they were written before the event got undone in a transaction rollback.) Don't give the design too much credit for including checksums, however, since they seem to have been intended mainly as a mechanism to avoid writing duplicate state-description records! --lcs
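The tamper check described above can be sketched as recomputing a digest of the RDF record and comparing it with the stored value. This is a sketch only: the class and method names are hypothetical, the stored checksum is assumed to be a hex string, and SHA-256 is used purely for illustration, since the text above does not say which algorithm the history system uses.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative only: hypothetical names, not the DSpace history API. */
final class ChecksumCheck {

    /** Returns the lowercase hex SHA-256 digest of the given bytes. */
    static String hexDigest(byte[] data) {
        try {
            // Assumption: SHA-256 stands in for whatever the system really uses.
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(data);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if the record's digest equals the stored checksum. */
    static boolean matches(byte[] rdfBytes, String storedChecksum) {
        return hexDigest(rdfBytes).equalsIgnoreCase(storedChecksum);
    }
}
```

A record whose bytes have changed since the checksum was stored no longer matches, which is what makes modification detectable.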
