2014 Linux Symposium, July 14-16

Btr-Diff: An Innovative Approach to Differentiate BTRFS Snapshots

Nafisa O Mandliwala (nafisa.mandliwala@gmail.com)

Efficient storage and fast retrieval of data have always been of utmost importance. The BtrFs file system is a copy-on-write (COW) B-tree file system that has built-in support for snapshots and is considered a potential replacement for the EXT4 file system. Snapshots are useful for keeping local online "copies" of the file system that can be referred back to, for implementing a form of deduplication, or for taking a full backup of the file system. The ability to compare snapshots is therefore crucial for system administrators as well as end users. The existing snapshot management tools perform a directory-based comparison at the block level in user space. Their algorithms use a 'diff'-like mechanism with a few more 'smarts' to avoid comparing files that haven't changed. This approach requires all of the metadata for the two trees being compared to be read. The most I/O-intensive part is not comparing the files but generating the list of changed files: the tool has to list every file in the tree and 'stat' it to see whether it has changed between the snapshots. This is slow and only gets slower as the file system grows.
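The user-space approach described above can be illustrated with a minimal Python sketch. It is not the code of any existing snapshot tool: the function names `scan` and `naive_diff` are hypothetical, and the choice of size, modification time, and mode as the "has it changed" signal is an assumption made for illustration. The point is that every entry in both trees must be listed and 'stat'-ed before a single change can be reported.

```python
import os


def scan(root):
    """Walk the whole tree under root and lstat every entry.

    Returns a dict mapping each relative path to (size, mtime_ns, mode).
    This full walk is the expensive, metadata-heavy step.
    """
    entries = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            rel = os.path.relpath(full, root)
            entries[rel] = (st.st_size, st.st_mtime_ns, st.st_mode)
    return entries


def naive_diff(old_root, new_root):
    """Compare two snapshot directories purely from user space."""
    old = scan(old_root)  # reads all metadata of the first tree
    new = scan(new_root)  # reads all metadata of the second tree
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, removed, changed
```

The cost of `scan` grows with the total number of entries in each snapshot, not with the number of entries that actually changed, which is why this style of comparison slows down as the file system grows.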

With the BtrFs send/receive patchset, the file system can be instructed to calculate the set of changes made between two snapshots and serialize them to a file. Our objective is to leverage the 'send' part in the kernel to implement a new mechanism that lists all the files that have been added, removed, changed, or had their metadata changed in some way. The 'send' ioctl does the tree compare in kernel space using the on-disk metadata format (rather than the abstract 'stat' format exported to user space), which includes the ability to recognize when entire sub-trees can be skipped during comparison. Our approach takes advantage of the built-in BtrFs B-tree structure to speed up the tree traversal and detects changes between snapshots based on inode values. In addition to detecting changes between successive snapshots, our tool can also detect changes between a snapshot and an explicitly specified parent. This solution is useful where daily incremental backups of the file system are taken, and it can be easily integrated with existing snapshot management tools.
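The idea of skipping entire sub-trees can be sketched with a toy copy-on-write tree in Python. This is only a model under stated assumptions: it does not reproduce the BtrFs on-disk format or the 'send' ioctl, the names `Leaf`, `Node`, `cow_update`, and `diff` are hypothetical, object identity stands in for "both snapshots point at the same on-disk block", and both trees are assumed to have the same shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True, eq=False)
class Leaf:
    items: dict  # inode number -> metadata value (hypothetical, e.g. a size)


@dataclass(frozen=True, eq=False)
class Node:
    children: tuple  # child Leaf/Node objects in key order


def cow_update(node, path, inode, value):
    """Return a new root with inode set to value (None deletes it).

    Only the nodes along path are copied; all other sub-trees are
    shared between the old and the new root, as in a COW snapshot.
    """
    if isinstance(node, Leaf):
        items = dict(node.items)
        if value is None:
            items.pop(inode, None)
        else:
            items[inode] = value
        return Leaf(items)
    children = list(node.children)
    children[path[0]] = cow_update(children[path[0]], path[1:], inode, value)
    return Node(tuple(children))


def diff(old, new):
    """List (kind, inode) changes, skipping sub-trees shared by both roots."""
    if old is new:
        return []  # shared sub-tree: nothing below it can differ
    if isinstance(old, Leaf):
        changes = []
        for ino in sorted(old.items.keys() | new.items.keys()):
            if ino not in old.items:
                changes.append(("added", ino))
            elif ino not in new.items:
                changes.append(("removed", ino))
            elif old.items[ino] != new.items[ino]:
                changes.append(("changed", ino))
        return changes
    changes = []
    for a, b in zip(old.children, new.children):
        changes.extend(diff(a, b))
    return changes
```

In this model, a snapshot that modifies one leaf shares every other leaf with its parent, so `diff` descends only into the copied path. The work done is proportional to the amount of change rather than to the size of the tree.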
