org.apache.commons.jrcs.diff
Class SimpleDiff

java.lang.Object
  extended by org.apache.commons.jrcs.diff.SimpleDiff
All Implemented Interfaces:
DiffAlgorithm

public class SimpleDiff
extends java.lang.Object
implements DiffAlgorithm

Implements a simple differencing algortithm.

Version:
$Revision: 1.7 $
Author:
Juanco Anez

Overview of Algorithm

by bwm

The algorithm is optimised for situations where the input sequences have few repeated objects. If it is given input with many repeated objects it will report sub-optimal changes. However, given appropriate input, it is fast, and linear in memory usage.

The algorithm consists of the following steps:

  • compute an equivalence set for the input data
  • translate each element of the orginal and revised input sequences to a member of the equivalence set
  • match the the input sequences to determine the deltas, i.e. the differences between the original and revised sequences.

The first step is to compute a an equivalence set for the input data. The equivalence set is computed from objects that are in the original input sequence

   eq(x) = the index of the first occurence of x in the original sequence.
 

With this equivalence function, the algorithm can compare integers rather than strings, which is considerably more efficient.

The second step is to compute the datastructure on which the algorithm will operate. Having computed the equivalence function in the previous step, we can compute two arrays where indx[i] = eqs(orig[i]) and jndx[i] = eqs(rev[i]). The algorithm can now operate on indx and jndx instead of orig and rev. Thus, comparisons are then on O(int == int) instead of O(Object.equals(Object)).

The algorithm now matches indx and jndx. Whilst indx[i] == jndx[i] it skips matching objects in the sequence. In seeking to match objects in the input sequence it assumes that each object is likely to be unique. It uses the known characteristics of the unique equivalence function. It can tell from the eq value if this object appeared in the other sequence at all. If it did not, there is no point in searching for a match.

Recall that the eq function value is the index earliest occurrence in the orig sequence. This information is used to search efficiently for the next match. The algorithm is perfect when all input objects are unique, but degrades when input objects are not unique. When input objects are not unique an optimal match may not be found, but a correct match will be.

Having identified common matching objects in the orig and revised sequences, the differences between them are easily computed.

See Also:
Delta, Modifications: 27/Apr/2003 bwm Added some comments whilst trying to figure out the algorithm 03 May 2003 bwm Created this implementation class by refactoring it out of the Diff class to enable plug in difference algorithms

Field Summary
(package private) static int EOS
           
(package private) static int NOT_FOUND_i
           
(package private) static int NOT_FOUND_j
           
 
Constructor Summary
SimpleDiff()
           
 
Method Summary
protected  java.util.Map buildEqSet(java.lang.Object[] orig, java.lang.Object[] rev)
          create a Map from each common item in orig and rev to the index of its first occurrence in orig
protected  int[] buildIndex(java.util.Map eqs, java.lang.Object[] seq, int NF)
          build a an array such each a[i] = eqs([i]) or NF if eqs([i]) undefined
 Revision diff(java.lang.Object[] orig, java.lang.Object[] rev)
          Compute the difference between original and revised sequences.
protected  int scan(int[] ndx, int i, int target)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

NOT_FOUND_i

static final int NOT_FOUND_i
See Also:
Constant Field Values

NOT_FOUND_j

static final int NOT_FOUND_j
See Also:
Constant Field Values

EOS

static final int EOS
See Also:
Constant Field Values
Constructor Detail

SimpleDiff

public SimpleDiff()
Method Detail

scan

protected int scan(int[] ndx,
                   int i,
                   int target)

diff

public Revision diff(java.lang.Object[] orig,
                     java.lang.Object[] rev)
              throws DifferentiationFailedException
Compute the difference between original and revised sequences.

Specified by:
diff in interface DiffAlgorithm
Parameters:
orig - The original sequence.
rev - The revised sequence to be compared with the original.
Returns:
A Revision object describing the differences.
Throws:
DifferenciationFailedException - if the diff could not be computed.
DifferentiationFailedException - if the diff could not be computed.

buildEqSet

protected java.util.Map buildEqSet(java.lang.Object[] orig,
                                   java.lang.Object[] rev)
create a Map from each common item in orig and rev to the index of its first occurrence in orig

Parameters:
orig - the original sequence of items
rev - the revised sequence of items

buildIndex

protected int[] buildIndex(java.util.Map eqs,
                           java.lang.Object[] seq,
                           int NF)
build a an array such each a[i] = eqs([i]) or NF if eqs([i]) undefined

Parameters:
eqs - a mapping from Object to Integer
seq - a sequence of objects
NF - the not found marker


Copyright 2002 the Apache Software Foundation
Copyright ? 1999-2001 Juancarlo A?ez, Caracas, Venezuela.
All rights reserved
. http://www.suigeneris.org/jrcs