The behavior described in this article applies to both backups and archives. Backups are used as the example throughout the text.
The TSM server reads and writes predominantly in 256 KB blocks for sequential access pools. The problem described here does not occur to the same extent on random access disk storage pools, which use a block size of only 4 KB.
TSM’s logic during backup and restore transactions is to combine multiple smaller files into an aggregate. When only a single small file is backed up per transaction/command, each very small file occupies a full block of 256 KB. The minimum unit is one block of 256 KB, whether the originating file is, for example, 1 KB or 64 KB.
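To make the block arithmetic above concrete, here is a minimal sketch in Python. It is a simplified model, not TSM code: it assumes only the 256 KB block size mentioned in the text, ignores any metadata TSM stores alongside an aggregate, and the function names are made up for illustration.

```python
import math

BLOCK_SIZE_KB = 256  # sequential-access block size quoted in the text


def blocks_needed(size_kb):
    """Number of 256 KB blocks a payload occupies; the minimum is one block."""
    return max(1, math.ceil(size_kb / BLOCK_SIZE_KB))


def space_one_file_per_transaction(file_sizes_kb):
    """Each file sent in its own transaction occupies at least one full block."""
    return sum(blocks_needed(s) for s in file_sizes_kb) * BLOCK_SIZE_KB


def space_aggregated(file_sizes_kb):
    """All files sent in one transaction are packed together (simplified model)."""
    return blocks_needed(sum(file_sizes_kb)) * BLOCK_SIZE_KB


files = [1] * 1000  # 1000 files of 1 KB each
print(space_one_file_per_transaction(files))  # 256000 KB
print(space_aggregated(files))                # 1024 KB
```

In this model, 1000 files of 1 KB each occupy 256000 KB when sent one per transaction, against 1024 KB when they end up in a single aggregate.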
This information is meant to discuss some useful TSM server parameters in relation to deduplication. It is by no means an in-depth explanation of TSM’s deduplication, or even of deduplication in general.
If you have a Virtual Tape Library (VTL) which can deduplicate the stored data in-line, e.g. IBM’s ProtecTIER, you most likely want to use this built-in hardware deduplication functionality. As you are aware, you can also choose to use TSM’s native deduplication. This means deduplication at the TSM application level, and it is available client-side, server-side, or as a combination of both.
There might be problems when one is trying to deduplicate very large objects, because this will result in many, many chunks, associated metadata, server processing, etc. Sometimes it might be beneficial to just exclude very large objects from deduplication altogether.
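A rough sketch in Python of why very large objects are costly to deduplicate, and of a simple size cut-off. Both constants are assumptions picked for illustration: the average chunk size and the cut-off are not TSM values, and the object names are made up.

```python
import math

AVG_CHUNK_KB = 256       # hypothetical average chunk size, not a TSM value
MAX_DEDUP_SIZE_GB = 50   # hypothetical cut-off chosen by the administrator


def estimated_chunks(size_gb):
    """Rough number of chunks (and metadata entries) an object would produce."""
    return math.ceil(size_gb * 1024 * 1024 / AVG_CHUNK_KB)


def should_deduplicate(size_gb):
    """Exclude objects above the cut-off from deduplication altogether."""
    return size_gb <= MAX_DEDUP_SIZE_GB


# Made-up example objects: (name, size in GB)
for name, size_gb in [("small_archive", 2), ("large_db_dump", 500)]:
    print(name, estimated_chunks(size_gb), should_deduplicate(size_gb))
```

Under these assumptions a 2 GB object yields about 8 thousand chunks, while a 500 GB object yields about 2 million, each of which needs its own metadata and server processing.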
The following is a guest post written by Julien Sauvanet. I worked with Julien in the past, and asked him to write a guest post because I think it might be beneficial to the intended audience. The only thing I did was some moderation.
First of all, let’s explain what these CTL files are. When performing a virtual machine backup using TSM4VE, two different pieces of information (seen as files within the TSM server database) are sent to the TSM server:
.DAT files: these are basically the files containing the client’s data being backed up;
.CTL files: these are the control files containing information about each .DAT file. Each time a .DAT file is sent, there is a .CTL file associated with it, which is stored on the TSM server as well.