Class MultiChunk

  • Direct Known Subclasses:
    ZipMultiChunk

    public abstract class MultiChunk
    extends java.lang.Object
    A multichunk represents the container format that stores one or more Chunks. Multichunks are created during the chunking/deduplication process by a MultiChunker.

    There are two modes to handle multichunks:

    • When a new multichunk is written and filled with chunks, the Deduper adds chunks only until the multichunk's minimum size has been reached, and closes the multichunk afterwards. During that process, the write() method is called for each chunk, and isFull() is checked to decide whether that size has been reached.
    • When a multichunk is read from a file or an input stream, it can be processed sequentially using the read() method (currently unused in the code base), or in random order using the getChunkInputStream() method. Because of the latter method, the underlying container format must support random read access.
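
The write-mode interplay of write(), isFull() and close() described above can be sketched as follows. This is a minimal illustration, not the actual Deduper code: SketchMultiChunk and WriteModeSketch are hypothetical stand-ins, chunks are reduced to plain byte arrays, and the "size >= minSize" check is an assumption about how the fill state is determined.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a multichunk (not the real class).
class SketchMultiChunk {
    private final int minSize;
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private long size = 0;
    private boolean closed = false;

    SketchMultiChunk(int minSize) {
        this.minSize = minSize;
    }

    // Appends the chunk data and grows the size by the input size.
    void write(byte[] chunkData) throws IOException {
        out.write(chunkData);
        size += chunkData.length;
    }

    // Assumption: "full" means the minimum size has been reached.
    boolean isFull() {
        return size >= minSize;
    }

    void close() throws IOException {
        out.close();
        closed = true;
    }

    long getSize() {
        return size;
    }

    boolean isClosed() {
        return closed;
    }
}

// Hypothetical caller playing the role of the deduplication loop.
class WriteModeSketch {
    static List<SketchMultiChunk> pack(List<byte[]> chunks, int minSize) throws IOException {
        List<SketchMultiChunk> result = new ArrayList<>();
        SketchMultiChunk current = null;

        for (byte[] chunk : chunks) {
            // The caller, not the multichunk, checks the fill state.
            if (current != null && current.isFull()) {
                current.close();
                current = null;
            }
            if (current == null) {
                current = new SketchMultiChunk(minSize);
                result.add(current);
            }
            current.write(chunk);
        }

        // Close the last, possibly only partially filled, multichunk.
        if (current != null) {
            current.close();
        }
        return result;
    }
}
```

With five 4-byte chunks and a minimum size of 10, this sketch produces two multichunks of 12 and 8 bytes. The first one exceeds the minimum because the fill state is only checked before the next chunk is added.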
    • Constructor Detail

      • MultiChunk

        public MultiChunk​(MultiChunkEntry.MultiChunkId id,
                          int minSize)
        Creates a new multichunk.

        This constructor should be used if the multichunk identifier is known to the caller. This is typically the case when a new multichunk is written.

        Parameters:
        id - Unique multichunk identifier (can be randomly chosen)
        minSize - Minimum multichunk size, used to determine if chunks can still be added
      • MultiChunk

        public MultiChunk​(int minSize)
        Creates a new multichunk.

        This constructor should be used if the multichunk identifier is not known to the caller. This is typically the case when a multichunk is read from a file.

        Parameters:
        minSize - Minimum multichunk size, used to determine if chunks can still be added
    • Method Detail

      • write

        public abstract void write​(Chunk chunk)
                            throws java.io.IOException
        In write mode, this method can be used to write Chunks to a multichunk.

        Implementations must increase the multichunk's size by the amount written to the multichunk (counting the size of the input chunk is sufficient) and, if the container format requires one, make sure that a header is written when the first chunk is written.

        Implementations do not have to check whether or not a multichunk is full. This should be done outside the multichunker/multichunk as part of the deduplication algorithm in the Deduper.

        Parameters:
        chunk - Chunk to be written to the multichunk container
        Throws:
        java.io.IOException - If an exception occurs when writing to the multichunk
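
As a rough sketch of these implementation requirements, the following stand-in writes a header exactly once and increases its size by the input size of each chunk. The class name HeaderWritingSketch, the MAGIC value and the record layout (length-prefixed checksum and content) are purely illustrative assumptions and do not describe any real multichunk format.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical write() implementation; layout and names are assumptions.
class HeaderWritingSketch {
    // Made-up header value, not part of any real container format.
    static final int MAGIC = 0x4D43484B;

    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final DataOutputStream out = new DataOutputStream(buffer);
    private boolean headerWritten = false;
    private long size = 0;

    void write(byte[] checksum, byte[] content) throws IOException {
        // Write the header only once, together with the first chunk.
        if (!headerWritten) {
            out.writeInt(MAGIC);
            headerWritten = true;
        }

        // One record per chunk: checksum length, checksum, content length, content.
        out.writeInt(checksum.length);
        out.write(checksum);
        out.writeInt(content.length);
        out.write(content);

        // Count only the input size; header and record overhead are ignored.
        size += content.length;
    }

    long getSize() {
        return size;
    }

    byte[] toByteArray() throws IOException {
        out.flush();
        return buffer.toByteArray();
    }
}
```

Note that the sketch does not check whether the multichunk is full; in line with the description above, that decision is left to the caller.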
      • read

        public abstract Chunk read()
                            throws java.io.IOException
        In read mode, this method can be used to sequentially read Chunks from a multichunk. The method returns a chunk until no more chunks are available, at which point it will return null.

        If random read access on a multichunk is desired, the getChunkInputStream() method should be used instead.

        Returns:
        Returns the next chunk in the opened multichunk, or null if no more chunks are available
        Throws:
        java.io.IOException - If an exception occurs when reading from the multichunk
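
The read-until-null contract can be illustrated with a small stand-in. SequentialReadSketch is hypothetical: it returns raw chunk content instead of Chunk objects and assumes a simple length-prefixed layout, which is not the format of any real multichunk implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sequential reader over length-prefixed chunk records.
class SequentialReadSketch {
    private final DataInputStream in;

    SequentialReadSketch(byte[] container) {
        this.in = new DataInputStream(new ByteArrayInputStream(container));
    }

    // Returns the next chunk's content, or null once the input is exhausted.
    byte[] read() throws IOException {
        int length;
        try {
            length = in.readInt();
        } catch (EOFException e) {
            return null;
        }
        byte[] content = new byte[length];
        in.readFully(content);
        return content;
    }

    // Typical caller loop: keep reading until read() returns null.
    static List<byte[]> readAll(byte[] container) throws IOException {
        SequentialReadSketch multiChunk = new SequentialReadSketch(container);
        List<byte[]> chunks = new ArrayList<>();
        byte[] chunk;
        while ((chunk = multiChunk.read()) != null) {
            chunks.add(chunk);
        }
        return chunks;
    }

    // Helper that builds a container in the assumed layout.
    static byte[] encode(byte[]... chunks) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        for (byte[] chunk : chunks) {
            out.writeInt(chunk.length);
            out.write(chunk);
        }
        out.flush();
        return buffer.toByteArray();
    }
}
```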
      • getChunkInputStream

        public abstract java.io.InputStream getChunkInputStream​(byte[] checksum)
                                                         throws java.io.IOException
        In read mode, this method can be used to read Chunks in random access mode, using a chunk checksum as identifier. The method returns a chunk input stream (the chunk's data) if the chunk is found, and null otherwise.

        If all chunks are read from a multichunk sequentially, the read() method should be used instead.

        Parameters:
        checksum - The checksum identifying a chunk instance
        Returns:
        Returns a chunk input stream (chunk data) if the chunk can be found in the multichunk, or null otherwise
        Throws:
        java.io.IOException - If an exception occurs when reading from the multichunk
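
A minimal sketch of checksum-based random access, assuming the chunks are already held in memory. RandomAccessSketch and its put() method are hypothetical; a real implementation would look the chunk up in the underlying container instead. The checksum is wrapped in a ByteBuffer because byte arrays compare by identity and therefore do not work as map keys.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory lookup of chunk data by checksum.
class RandomAccessSketch {
    // ByteBuffer keys compare by content, unlike raw byte[] keys.
    private final Map<ByteBuffer, byte[]> chunksByChecksum = new HashMap<>();

    // Illustrative helper to populate the sketch; copies guard against later mutation.
    void put(byte[] checksum, byte[] content) {
        chunksByChecksum.put(ByteBuffer.wrap(checksum.clone()), content.clone());
    }

    // Returns a stream over the chunk's data, or null if the checksum is unknown.
    InputStream getChunkInputStream(byte[] checksum) {
        byte[] content = chunksByChecksum.get(ByteBuffer.wrap(checksum));
        return (content != null) ? new ByteArrayInputStream(content) : null;
    }
}
```

Callers need to handle the null return value explicitly, since a missing chunk is not signalled by an exception.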
      • close

        public abstract void close()
                            throws java.io.IOException
        Closes a multichunk after writing/reading.

        Implementations should close the underlying input/output stream (depending on whether the multichunk was opened in read or write mode).

        Throws:
        java.io.IOException - If an exception occurs when closing the multichunk
      • isFull

        public boolean isFull()
        In write mode, this method determines the fill state of the multichunk and returns whether it is full, i.e. whether no further chunks should be added. It is used by the Deduper.
        Returns:
        Returns true if no more chunks should be added and the multichunk should be closed, false otherwise
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class java.lang.Object
      • equals

        public boolean equals​(java.lang.Object obj)
        Overrides:
        equals in class java.lang.Object