Class TttdChunker
- java.lang.Object
-
- org.syncany.chunk.Chunker
-
- org.syncany.chunk.TttdChunker
-
public class TttdChunker extends Chunker
The TTTD chunker is an implementation of the Two Threshold Two Divisor (TTTD) chunking method based on the paper of Kave Eshghi and Hsiu Khuern Tang, 2005.
The class implements a content-based Chunker, i.e. it determines breakpoints based on the content rather than on the offset. It uses a fingerprinting algorithm to calculate window fingerprints and determine a breakpoint. The algorithm given in the constructor is instantiated to an implementation of a Fingerprinter.
The TTTD chunking method makes sure that it does not produce chunks smaller than a certain threshold. To do so, it ignores chunk boundaries until a minimum size is reached. Even though this negatively affects the duplicate detection (because bigger chunks are created), chunk boundaries are still "natural" because they are based on the underlying data.
To handle chunks that exceed a certain maximum size, TTTD applies two techniques. It defines two divisors, D (regular divisor) and D' (backup divisor), with D > D'. Because D' is smaller than D, it is more likely to find a breakpoint.
If D does not find a chunk boundary before the maximum chunk size is reached, the backup breakpoint found by D' is used. If D' also does not find any breakpoints, TTTD simply cuts the chunk at the maximum chunk size. TTTD hence guarantees that the chunks it emits have a minimum and a maximum size.
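The breakpoint selection described above can be sketched as follows. This is a minimal illustration, not the code of this class: it assumes a simple additive window fingerprint in place of a real Fingerprinter, it treats a position as a match when the fingerprint modulo the divisor equals the divisor minus one, and the names TttdSketch, fingerprint and breakpoints are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class TttdSketch {
    /** Hypothetical stand-in fingerprint: sum of the last windowSize bytes ending at index end (inclusive). */
    static int fingerprint(byte[] data, int end, int windowSize) {
        int sum = 0;
        for (int i = Math.max(0, end - windowSize + 1); i <= end; i++) {
            sum += data[i] & 0xff;
        }
        return sum;
    }

    /** Returns the exclusive end offset of every chunk, assuming 1 <= tMin <= tMax. */
    static List<Integer> breakpoints(byte[] data, int tMin, int tMax, int d, int dDash, int windowSize) {
        List<Integer> ends = new ArrayList<>();
        int start = 0;
        while (start < data.length) {
            int backup = -1;        // last position matched by the backup divisor D'
            int end = data.length;  // default: the remainder of the input is the last chunk
            for (int pos = start; pos < data.length; pos++) {
                int size = pos - start + 1;
                if (size < tMin) {
                    continue;       // ignore all boundaries below the minimum size
                }
                int fp = fingerprint(data, pos, windowSize);
                if (fp % dDash == dDash - 1) {
                    backup = pos + 1;   // remember a backup breakpoint
                }
                if (fp % d == d - 1) {
                    end = pos + 1;      // regular breakpoint found by D
                    break;
                }
                if (size >= tMax) {
                    // Maximum size reached: prefer the backup breakpoint, else cut hard.
                    end = (backup != -1) ? backup : pos + 1;
                    break;
                }
            }
            ends.add(end);
            start = end;
        }
        return ends;
    }
}
```

In this sketch every chunk is at most Tmax bytes long, and every chunk except possibly the last is at least Tmin bytes long. For an input of 6000 zero bytes with Tmin = 460, Tmax = 2800, D = 540 and D' = 270, neither divisor ever matches, so the sketch cuts at offsets 2800, 5600 and 6000.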
-
-
Nested Class Summary
Nested Classes
class TttdChunker.TTTDEnumeration
-
Nested classes/interfaces inherited from class org.syncany.chunk.Chunker
Chunker.ChunkEnumeration
-
-
Field Summary
Fields
static java.lang.String DEFAULT_DIGEST_ALG
static java.lang.String DEFAULT_FINGERPRINT_ALG
static int DEFAULT_WINDOW_SIZE
-
Fields inherited from class org.syncany.chunk.Chunker
PROPERTY_SIZE
-
-
Constructor Summary
Constructors
TttdChunker(int avgChunkSize)
TttdChunker(int Tmin, int Tmax, int D, int Ddash, int windowSize)
TttdChunker(int Tmin, int Tmax, int D, int Ddash, int windowSize, java.lang.String digestAlg)
TttdChunker(int Tmin, int Tmax, int D, int Ddash, int windowSize, java.lang.String digestAlg, java.lang.String fingerprintAlg)
TttdChunker(int avgChunkSize, int windowSize, java.lang.String digestAlg, java.lang.String fingerprintAlg)
Infers the values of Tmin, Tmax, D and Ddash for the given avgChunkSize from the original paper's optimal (measured) values.
-
Method Summary
Methods
Chunker.ChunkEnumeration createChunks(java.io.File file)
Opens the given file and creates an enumeration of Chunks.
java.lang.String getChecksumAlgorithm()
Returns the checksum algorithm used by the chunker to calculate the chunk and file checksums.
java.lang.String toString()
Returns a string representation of the chunker implementation.
-
-
-
Field Detail
-
DEFAULT_WINDOW_SIZE
public static final int DEFAULT_WINDOW_SIZE
- See Also:
- Constant Field Values
-
DEFAULT_DIGEST_ALG
public static final java.lang.String DEFAULT_DIGEST_ALG
- See Also:
- Constant Field Values
-
DEFAULT_FINGERPRINT_ALG
public static final java.lang.String DEFAULT_FINGERPRINT_ALG
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
TttdChunker
public TttdChunker(int Tmin, int Tmax, int D, int Ddash, int windowSize)
-
TttdChunker
public TttdChunker(int Tmin, int Tmax, int D, int Ddash, int windowSize, java.lang.String digestAlg)
-
TttdChunker
public TttdChunker(int avgChunkSize)
-
TttdChunker
public TttdChunker(int avgChunkSize, int windowSize, java.lang.String digestAlg, java.lang.String fingerprintAlg)
Infers the values of Tmin, Tmax, D and Ddash for the given avgChunkSize from the original paper's optimal (measured) values. LBFS: avg. chunk size = 1015 bytes --> Tmin = 460, Tmax = 2800, D = 540, Ddash = 270
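One plausible way to derive the four parameters is to scale the LBFS values linearly with the requested average chunk size. The sketch below assumes that linear scaling and rounding to the nearest integer. The class TttdParameters and the method forAverageChunkSize are hypothetical and not part of this API, and the actual constructor may compute its values differently.

```java
class TttdParameters {
    // Reference values quoted above for an average chunk size of 1015 bytes.
    static final double REFERENCE_AVG = 1015.0;
    static final int REFERENCE_TMIN = 460;
    static final int REFERENCE_TMAX = 2800;
    static final int REFERENCE_D = 540;
    static final int REFERENCE_DDASH = 270;

    final int tMin;
    final int tMax;
    final int d;
    final int dDash;

    TttdParameters(int tMin, int tMax, int d, int dDash) {
        this.tMin = tMin;
        this.tMax = tMax;
        this.d = d;
        this.dDash = dDash;
    }

    /** Assumption: all four parameters scale linearly with the average chunk size. */
    static TttdParameters forAverageChunkSize(int avgChunkSize) {
        double scale = avgChunkSize / REFERENCE_AVG;
        return new TttdParameters(
                (int) Math.round(REFERENCE_TMIN * scale),
                (int) Math.round(REFERENCE_TMAX * scale),
                (int) Math.round(REFERENCE_D * scale),
                (int) Math.round(REFERENCE_DDASH * scale));
    }
}
```

Under this assumption, an average chunk size of 1015 reproduces the reference values, and an average chunk size of 2030 doubles them to Tmin = 920, Tmax = 5600, D = 1080, Ddash = 540.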
-
TttdChunker
public TttdChunker(int Tmin, int Tmax, int D, int Ddash, int windowSize, java.lang.String digestAlg, java.lang.String fingerprintAlg)
-
-
Method Detail
-
createChunks
public Chunker.ChunkEnumeration createChunks(java.io.File file) throws java.io.IOException
Description copied from class: Chunker
Opens the given file and creates an enumeration of Chunks. This method should not read the file into memory at once, but instead read and emit new chunks when requested using nextElement().
The enumeration must be closed by the close() method to remove any possible locks.
- Specified by:
- createChunks in class Chunker
- Parameters:
- file - The file that is supposed to be chunked
- Returns:
- An enumeration of individual chunks, must be closed at the end of processing
- Throws:
- java.io.IOException - If any file exceptions occur
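The contract above (read lazily, emit a chunk per nextElement() call, release the file on close()) can be illustrated with a small stand-alone enumeration. This is a sketch only: it cuts fixed-size chunks rather than TTTD chunks, it returns raw byte arrays instead of Chunk objects, and the class LazyChunkEnumeration is hypothetical and unrelated to Chunker.ChunkEnumeration.

```java
import java.io.Closeable;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.NoSuchElementException;

class LazyChunkEnumeration implements Enumeration<byte[]>, Closeable {
    private final InputStream in;
    private final int chunkSize;
    private byte[] next; // the chunk that the next nextElement() call returns, or null at end of file

    LazyChunkEnumeration(File file, int chunkSize) throws IOException {
        this.in = new FileInputStream(file);
        this.chunkSize = chunkSize;
        this.next = readNext();
    }

    /** Reads at most chunkSize bytes; returns null once the stream is exhausted. */
    private byte[] readNext() throws IOException {
        byte[] buffer = new byte[chunkSize];
        int filled = 0;
        while (filled < chunkSize) {
            int n = in.read(buffer, filled, chunkSize - filled);
            if (n < 0) {
                break;
            }
            filled += n;
        }
        return filled == 0 ? null : Arrays.copyOf(buffer, filled);
    }

    @Override
    public boolean hasMoreElements() {
        return next != null;
    }

    @Override
    public byte[] nextElement() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        byte[] current = next;
        try {
            next = readNext(); // only one chunk ahead is ever held in memory
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return current;
    }

    @Override
    public void close() throws IOException {
        in.close(); // releases the file handle and any lock held on it
    }
}
```

Because the sketch implements Closeable, a caller can place it in a try-with-resources statement so that the file is released even if processing fails part-way through.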
-
getChecksumAlgorithm
public java.lang.String getChecksumAlgorithm()
Description copied from class: Chunker
Returns the checksum algorithm used by the chunker to calculate the chunk and file checksums. For the deduplication process to function properly, the checksum algorithms of all chunkers must be equal.
- Specified by:
- getChecksumAlgorithm in class Chunker
-
-