Class TttdChunker


  • public class TttdChunker
    extends Chunker
    The TTTD chunker implements the Two Threshold Two Divisor (TTTD) chunking method described in the 2005 paper by Kave Eshghi and Hsiu Khuern Tang.

    The class implements a content-based Chunker, i.e. it determines breakpoints based on the content rather than on the offset. It uses a fingerprinting algorithm to calculate window fingerprints and determine a breakpoint. The fingerprinting algorithm named in the constructor is instantiated as an implementation of a Fingerprinter.

    The TTTD chunking method makes sure that it does not produce chunks smaller than a certain threshold. To do so, it ignores chunk boundaries until a minimum size is reached. Even though this negatively affects duplicate detection (because bigger chunks are created), chunk boundaries are still "natural" because they are based on the underlying data.

    To handle chunks that exceed a certain maximum size, TTTD applies two techniques. It defines the two divisors D (regular divisor) and D' (backup divisor) with D > D'. Because D' is smaller than D, it is more likely to find a breakpoint.

    If D does not find a chunk boundary before the maximum chunk size is reached, the backup breakpoint found by D' is used. If D' also does not find any breakpoints, TTTD simply cuts the chunk at the maximum chunk size. TTTD hence guarantees that the chunks it emits have a minimum and a maximum size.
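
    The breakpoint rules described above can be sketched as follows. This is a minimal illustration, not the TttdChunker source: the class name TttdSketch, the method breakpoints, and the toy fingerprint (the sum of the last `window` bytes) are assumptions made for the example, whereas the real class delegates fingerprinting to a Fingerprinter. The sketch also assumes that the final chunk of the input may be shorter than Tmin.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative TTTD breakpoint search; hypothetical, not the library implementation. */
final class TttdSketch {

    /** Returns the exclusive end offset of every chunk found in data. */
    static List<Integer> breakpoints(byte[] data, int tMin, int tMax,
                                     int d, int dDash, int window) {
        List<Integer> ends = new ArrayList<>();
        int start = 0;
        while (start < data.length) {
            int limit = Math.min(data.length, start + tMax); // never scan past Tmax
            int backup = -1; // last position accepted by the backup divisor D'
            int end = -1;    // position accepted by the regular divisor D
            long sum = 0;    // toy fingerprint: sum of the last `window` bytes (assumption)
            for (int i = start; i < limit && end < 0; i++) {
                sum += data[i] & 0xff;
                if (i - start >= window) {
                    sum -= data[i - window] & 0xff;
                }
                if (i - start + 1 < tMin) {
                    continue; // ignore boundaries until the minimum size is reached
                }
                if (sum % dDash == dDash - 1) {
                    backup = i + 1; // remember the backup breakpoint
                }
                if (sum % d == d - 1) {
                    end = i + 1;    // regular breakpoint found
                }
            }
            if (end < 0) {
                boolean hitMax = (limit - start == tMax);
                // At Tmax: prefer the backup breakpoint, else cut hard at Tmax.
                // At end of input: emit whatever is left.
                end = (hitMax && backup > 0) ? backup : limit;
            }
            ends.add(end);
            start = end;
        }
        return ends;
    }

    public static void main(String[] args) {
        byte[] data = new byte[100_000];
        new java.util.Random(42).nextBytes(data);
        List<Integer> ends = breakpoints(data, 460, 2800, 540, 270, 48);
        System.out.println(ends.size() + " chunks");
    }
}
```

    Because the backup breakpoint is only recorded once the chunk has reached Tmin, every chunk except possibly the last one ends up between Tmin and Tmax bytes long.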

    See Also:
    Original TTTD paper: A framework for analyzing and improving content-based chunking algorithms (2005, Kave Eshghi and Hsiu Khuern Tang)
    • Constructor Summary

      Constructors 
      Constructor Description
      TttdChunker​(int avgChunkSize)  
      TttdChunker​(int Tmin, int Tmax, int D, int Ddash, int windowSize)  
      TttdChunker​(int Tmin, int Tmax, int D, int Ddash, int windowSize, java.lang.String digestAlg)  
      TttdChunker​(int Tmin, int Tmax, int D, int Ddash, int windowSize, java.lang.String digestAlg, java.lang.String fingerprintAlg)  
      TttdChunker​(int avgChunkSize, int windowSize, java.lang.String digestAlg, java.lang.String fingerprintAlg)
      Infers the TTTD parameters (Tmin, Tmax, D, Ddash) for the given avgChunkSize from the original paper's optimal (measured) values.
    • Method Summary

      Modifier and Type Method Description
      Chunker.ChunkEnumeration createChunks​(java.io.File file)
      Opens the given file and creates an enumeration of Chunks.
      java.lang.String getChecksumAlgorithm()
      Returns the checksum algorithm used by the chunker to calculate the chunk and file checksums.
      java.lang.String toString()
      Returns a string representation of the chunker implementation.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Constructor Detail

      • TttdChunker

        public TttdChunker​(int Tmin,
                           int Tmax,
                           int D,
                           int Ddash,
                           int windowSize)
      • TttdChunker

        public TttdChunker​(int Tmin,
                           int Tmax,
                           int D,
                           int Ddash,
                           int windowSize,
                           java.lang.String digestAlg)
      • TttdChunker

        public TttdChunker​(int avgChunkSize)
      • TttdChunker

        public TttdChunker​(int avgChunkSize,
                           int windowSize,
                           java.lang.String digestAlg,
                           java.lang.String fingerprintAlg)
        Infers the TTTD parameters (Tmin, Tmax, D, Ddash) for the given avgChunkSize from the original paper's optimal (measured) values. LBFS: avg. chunk size = 1015 bytes --> Tmin = 460, Tmax = 2800, D = 540, Ddash = 270
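
        One plausible way to derive the four parameters from the reference values is to scale them proportionally to the requested average chunk size. The sketch below assumes exactly that. The class TttdParams and the method forAvgChunkSize are hypothetical names, and the constructor's actual derivation is not documented on this page.

```java
/** Hypothetical holder for TTTD parameters; not part of the documented API. */
final class TttdParams {
    final int tMin;
    final int tMax;
    final int d;
    final int dDash;

    TttdParams(int tMin, int tMax, int d, int dDash) {
        this.tMin = tMin;
        this.tMax = tMax;
        this.d = d;
        this.dDash = dDash;
    }

    /**
     * Scales the reference values (avg. 1015 bytes: Tmin = 460, Tmax = 2800,
     * D = 540, Ddash = 270) linearly. Linear scaling is an assumption.
     */
    static TttdParams forAvgChunkSize(int avgChunkSize) {
        double scale = avgChunkSize / 1015.0;
        return new TttdParams(
                (int) Math.round(460 * scale),
                (int) Math.round(2800 * scale),
                (int) Math.round(540 * scale),
                (int) Math.round(270 * scale));
    }

    public static void main(String[] args) {
        TttdParams p = forAvgChunkSize(4096);
        System.out.println("Tmin=" + p.tMin + " Tmax=" + p.tMax
                + " D=" + p.d + " Ddash=" + p.dDash);
    }
}
```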
      • TttdChunker

        public TttdChunker​(int Tmin,
                           int Tmax,
                           int D,
                           int Ddash,
                           int windowSize,
                           java.lang.String digestAlg,
                           java.lang.String fingerprintAlg)
    • Method Detail

      • createChunks

        public Chunker.ChunkEnumeration createChunks​(java.io.File file)
                                              throws java.io.IOException
        Description copied from class: Chunker
        Opens the given file and creates an enumeration of Chunks. This method should not read the whole file into memory at once, but instead read and emit new chunks when they are requested via nextElement().

        The enumeration must be closed with the close() method to release any possible locks.

        Specified by:
        createChunks in class Chunker
        Parameters:
        file - The file to be chunked
        Returns:
        An enumeration of individual chunks; it must be closed at the end of processing
        Throws:
        java.io.IOException - If any file exceptions occur
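
        The read-on-demand contract described above can be illustrated with a small stand-in. LazyChunkEnumeration is a hypothetical class written for this example, not Chunker.ChunkEnumeration itself: it emits fixed-size byte arrays instead of Chunks, reads the next block only when it is asked for, and keeps the file open until close() is called.

```java
import java.io.Closeable;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a chunk enumeration; fixed-size chunks keep it short. */
final class LazyChunkEnumeration implements Enumeration<byte[]>, Closeable {
    private final InputStream in;
    private final int chunkSize;
    private byte[] next;  // chunk read ahead by hasMoreElements(), or null
    private boolean eof;

    LazyChunkEnumeration(File file, int chunkSize) throws IOException {
        this.in = new FileInputStream(file); // the file stays open until close()
        this.chunkSize = chunkSize;
    }

    @Override
    public boolean hasMoreElements() {
        if (next == null && !eof) {
            try {
                next = readChunk(); // read only when a chunk is requested
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            eof = (next == null);
        }
        return next != null;
    }

    @Override
    public byte[] nextElement() {
        if (!hasMoreElements()) {
            throw new NoSuchElementException();
        }
        byte[] chunk = next;
        next = null;
        return chunk;
    }

    /** Reads up to chunkSize bytes; returns null at end of file. */
    private byte[] readChunk() throws IOException {
        byte[] buf = new byte[chunkSize];
        int n = 0;
        while (n < chunkSize) {
            int r = in.read(buf, n, chunkSize - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        return n == 0 ? null : Arrays.copyOf(buf, n);
    }

    @Override
    public void close() throws IOException {
        in.close(); // releases the underlying file handle
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("chunks", ".bin");
        file.deleteOnExit();
        java.nio.file.Files.write(file.toPath(), new byte[10]);
        try (LazyChunkEnumeration chunks = new LazyChunkEnumeration(file, 4)) {
            while (chunks.hasMoreElements()) {
                System.out.println(chunks.nextElement().length + " bytes");
            }
        }
    }
}
```

        Implementing Closeable lets callers use try-with-resources, so the file handle is released even if processing a chunk throws.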
      • getChecksumAlgorithm

        public java.lang.String getChecksumAlgorithm()
        Description copied from class: Chunker
        Returns the checksum algorithm used by the chunker to calculate the chunk and file checksums. For the deduplication process to function properly, all chunkers must use the same checksum algorithm.
        Specified by:
        getChecksumAlgorithm in class Chunker
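
        The reason the algorithms have to match can be shown with java.security.MessageDigest: identical data hashed with two different algorithms yields different checksums, so identical chunks would not be recognised as duplicates. The class ChecksumDemo and its hex helper are hypothetical and only serve this illustration. How TttdChunker computes checksums internally is not shown on this page.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper showing why checksum algorithms must agree. */
final class ChecksumDemo {

    /** Returns the lowercase hex digest of data for the given algorithm name. */
    static String hex(String algorithm, byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown algorithm: " + algorithm, e);
        }
    }

    public static void main(String[] args) {
        byte[] chunk = "abc".getBytes(StandardCharsets.US_ASCII);
        // Same data and same algorithm give the same checksum, so the chunks match.
        System.out.println(hex("SHA-1", chunk).equals(hex("SHA-1", chunk)));
        // Same data with different algorithms give different checksums, so no match.
        System.out.println(hex("SHA-1", chunk).equals(hex("MD5", chunk)));
    }
}
```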
      • toString

        public java.lang.String toString()
        Description copied from class: Chunker
        Returns a string representation of the chunker implementation.
        Specified by:
        toString in class Chunker