Analyze Disk Access Pattern of File Systems for Content Addressable Storage

Kuniyasu Suzaki

Content Addressable Storage (CAS) is virtual storage that applies deduplication: it merges chunks with identical content and thereby reduces the consumption of physical storage. The performance of CAS depends on the access patterns of a file system (size, frequency, and locality of reference), because the effectiveness of merging depends on the size of the chunk (access unit) used in deduplication.
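The dependence on chunk size can be illustrated with a minimal sketch. It assumes fixed-size chunking and SHA-256 as the content address; neither is stated to be the scheme used in this paper, and the function name dedup_stats is hypothetical. The same byte sequence deduplicates at one chunk size but not at a larger one, because identical blocks no longer fall on identical chunk boundaries.

```python
import hashlib
from typing import Tuple


def dedup_stats(data: bytes, chunk_size: int) -> Tuple[int, int]:
    """Return (total_chunks, unique_chunks) for fixed-size chunking.

    Assumption: chunks are cut at fixed offsets and addressed by their
    SHA-256 digest; a real CAS may use a different hash or chunking rule.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    unique_digests = set()
    total = 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        unique_digests.add(hashlib.sha256(chunk).digest())
        total += 1
    return total, len(unique_digests)


if __name__ == "__main__":
    block_a = b"a" * 4096
    block_b = b"b" * 4096
    block_c = b"c" * 4096
    # Block A appears twice, so one copy could be merged.
    image = block_a + block_b + block_a + block_c

    print(dedup_stats(image, 4096))  # (4, 3): the duplicate A is merged
    print(dedup_stats(image, 8192))  # (2, 2): A+B and A+C differ, nothing merges
```

With 4 KiB chunks the repeated block is stored once, while with 8 KiB chunks each chunk spans two different blocks and no merging occurs. This is why the way a file system lays out and accesses blocks matters to CAS.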

This paper reveals the semantic gap and affinity between Linux file systems (EXT3, EXT4, XFS, JFS, and ReiserFS) and CAS. We measured block accesses during installation of Ubuntu 10.10 and at boot time, and found that access patterns vary across file systems even when the same contents are installed. The results indicate that all five file systems scatter data when viewed macroscopically but keep data in contiguous blocks when viewed microscopically. The results also expose characteristics of each file system, which make it possible to guess which file system is in use.

