FSCache and the on-disk structure of the cached data

The ‘cachefiles’ kernel module creates two directories under the location specified in /etc/cachefilesd.conf. By default, that location is /var/cache/fscache/.

[root@montypython ~]# lsmod |grep -i cache
cachefiles             40871  1
fscache                62354  3 nfs,cachefiles,nfsv4

The two directories are /var/cache/fscache/cache and /var/cache/fscache/graveyard.

The cache structure is maintained inside ‘/var/cache/fscache/cache/’, while anything that is retired or culled is moved to ‘graveyard’. The ‘cachefilesd’ daemon monitors ‘graveyard’ using ‘dnotify’ and will delete anything that is in there.
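To see which two directories apply on a given box, the cache root can be read back from the config file. This is a minimal sketch: the helper names `cache_root` and `cache_dirs` are made up for illustration, and falling back to /var/cache/fscache when no ‘dir’ line is found is an assumption of the sketch.

```shell
#!/bin/sh
# Print the cache root set by the "dir" directive of a cachefilesd.conf-style
# file. Falling back to /var/cache/fscache when the file is missing or has no
# "dir" line is an assumption made for this sketch.
cache_root() {
    conf=${1:-/etc/cachefilesd.conf}
    root=$(awk '$1 == "dir" { print $2; exit }' "$conf" 2>/dev/null)
    printf '%s\n' "${root:-/var/cache/fscache}"
}

# Print the two directories described above for a given config file.
cache_dirs() {
    root=$(cache_root "$1")
    printf '%s\n' "$root/cache" "$root/graveyard"
}
```

Running `cache_dirs /etc/cachefilesd.conf` prints the ‘cache’ and ‘graveyard’ paths, one per line, which can then be handed to `ls -ld`.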

We’ll try an example. Consider an NFS share mounted with fscache support. The share contains the following files, with some random text.

# ls /vol1
files1.txt  files2.txt  files3.txt  files4.txt

a) Configure ‘cachefiles’ by editing ‘/etc/cachefilesd.conf’, and start the ‘cachefilesd’ daemon.

# systemctl start cachefilesd

b) Mount the NFS share on the client with the ‘fsc’ mount option, to enable ‘fscache’ support.

# sudo mount localhost:/vol1 /vol1-backup/ -o fsc

c) Access the data from the mount point, and fscache will create the backing cache index at the location specified in /etc/cachefilesd.conf. By default, it’s /var/cache/fscache/.

d) Once the files are accessed on the client side, fscache builds an index as follows:

NOTE: The index structure is dependent on the netfs (NFS in our case). The netfs driver can structure the cache index as it sees fit.

Explanation of the caching structure:

# tree /var/cache/fscache/
└── @4a
    └── I03nfs
        ├── @22
        │   └── Jo00000008400000000000000000000000400
        │       └── @59
        │           └── J110000000000000000w080000000000000000000000
        │               ├── @53
        │               │   └── EE0g00sgwB-90600000000ww000000000000000
        │               ├── @5e
        │               │   └── EE0g00sgwB-90600000000ww000000000000000
        │               ├── @61
        │               │   └── EE0g00sgwB-90600000000ww000000000000000
        │               ├── @62
        │               │   └── EE0g00sgwB-90600000000ww000000000000000
        │               ├── @70
        │               │   └── EE0g00sgwB-90600000000ww000000000000000
        │               ├── @7c
        │               │   └── EE0g00sgwB-90600000000ww000000000000000
        │               └── @e8
        │                   └── EE0g00sgwB-90600000000ww0000000000000000
        └── @42
            └── Jc000000000000EggDj00
                └── @0a
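If the tree comes up empty on your client, a first thing to check is whether the share was really mounted with ‘fsc’. Here is a rough sketch that looks for the option in a /proc/mounts-style table. The function name `has_fsc` is hypothetical, and the sketch assumes the option is listed in the fourth field of the mount table, as NFS mounts normally show it.

```shell
#!/bin/sh
# Succeed if the given mount point is listed in a /proc/mounts-style table
# with "fsc" among its comma-separated mount options (fourth field).
has_fsc() {
    mnt=${1%/}                      # tolerate a trailing slash
    table=${2:-/proc/mounts}
    awk -v mnt="$mnt" '
        $2 == mnt {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == "fsc") found = 1
        }
        END { exit !found }
    ' "$table"
}
```

For the mount used in this post, that would be `has_fsc /vol1-backup/ && echo "fsc is on"`.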

a) The ‘cache’ directory under /var/cache/fscache/ is a special index and can be seen as the root of the entire cache index structure.

b) Data objects (actual cached files) are represented as files if they have no children, or as directories if they do. If represented as a directory, a data object has a file inside named ‘data’ which holds the data.

c) The ‘cachefiles’ kernel module names objects according to their type:

i)   Index objects are directories whose names start with ‘I’ or ‘J’.

ii)  Data objects have names starting with ‘D’ or ‘E’.

iii) Special objects are similar to data objects, and their names start with ‘S’ or ‘T’.

In general, any object that has children is represented as a directory.

d) In the directory hierarchy, between each parent object and its child objects sit directories named with *hash values* of the immediate child object keys, prefixed with an ‘@’.

The child objects are placed inside this directory. A child object is a directory if it has children of its own, or a file if it is the cached data itself. This nesting continues down the path until it reaches the file holding the cached data.
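The naming rules above are simple enough to apply mechanically. The sketch below labels each component of a cache-relative path by its first character. The function names and the label strings (‘hash-dir’, ‘index’ and so on) are invented for this example; only the prefixes themselves come from the rules above.

```shell
#!/bin/sh
# Label one path component according to the naming rules described above.
classify_component() {
    case $1 in
        @*)    echo "hash-dir" ;;    # hash of the child object key
        [IJ]*) echo "index" ;;
        [DE]*) echo "data" ;;
        [ST]*) echo "special" ;;
        data)  echo "data-file" ;;   # payload file inside a data directory
        *)     echo "other" ;;
    esac
}

# Walk a cache-relative path and print "<label> <component>" for each level.
describe_path() {
    old_ifs=$IFS
    IFS=/
    set -f                           # no globbing while splitting on "/"
    for part in $1; do
        if [ -n "$part" ]; then
            printf '%s %s\n' "$(classify_component "$part")" "$part"
        fi
    done
    set +f
    IFS=$old_ifs
}
```

Feeding it a path from the tree, such as `describe_path '@4a/I03nfs/@22/Jo00000008400000000000000000000000400'`, prints four lines that alternate between ‘hash-dir’ and ‘index’, matching the parent, hash directory, child layering.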

Representation of the object indexes (For NFS, in this case)

INDEX     INDEX      INDEX                             DATA FILES
========= ========== ================================= ================

FS-Cache and CacheFS, what are the differences?

FS-Cache and CacheFS. Are there any differences between these two? Initially, I thought both were the same. But no, they’re not.

CacheFS is the backend implementation which caches the data onto the disk and manipulates it, while FS-Cache is an interface which talks to CacheFS.

So why do we need two levels here?

FS-Cache was introduced as an API or front-end for CacheFS, which can be used by any file system driver. The file system driver talks to the FS-Cache API, which in turn talks to CacheFS in the back-end. Hence, FS-Cache acts as a common interface for file system drivers, which don’t need to understand the complexities of the CacheFS backend or how it’s implemented.

The only drawback is the additional code that needs to go into each file system driver that wants to use FS-Cache, i.e. every such driver has to be patched with the support to talk to it. Moreover, the cache structure differs slightly between the file systems using it, and thus lacks a standard. This, unfortunately, prevents FS-Cache from being used by every network filesystem out there.

The data flow would be as:

VFS <-> File system driver (NFS/CIFS etc..) <-> FS-Cache <-> CacheFS <-> Cached data

CacheFS need not cache every file in its entirety; it can also cache files partially. This partial caching is possible because FS-Cache caches ‘pages’ rather than entire files. Pages are small fixed-size segments of data, and which of them get cached depends on how much of each file is read initially.
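To get a feel for what page-level caching means for a partial read, here is a back-of-the-envelope sketch that counts how many pages a read touches. The function name `pages_touched` is made up, and the 4096-byte default page size is an assumption; `getconf PAGESIZE` reports the real value on a given machine.

```shell
#!/bin/sh
# Number of fixed-size pages touched by reading `length` bytes starting at
# byte `offset`. The page size defaults to 4096 bytes, which is an assumption.
pages_touched() {
    offset=$1
    length=$2
    page_size=${3:-4096}
    if [ "$length" -le 0 ]; then
        echo 0
        return 0
    fi
    first=$(( offset / page_size ))
    last=$(( (offset + length - 1) / page_size ))
    echo $(( last - first + 1 ))
}
```

For instance, `pages_touched 0 10000` reports 3: reading the first 10000 bytes of a large file touches only three 4096-byte pages, so only those would be candidates for caching, whatever the size of the whole file.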

FS-Cache does not require an open file to be loaded into the cache before it is accessed. This is a nice feature as far as I understand, and the reasons are:

a) Not every open file in the remote file system can be loaded into the cache, due to size limits. In such a case, only certain parts (pages) may be loaded, and the rest of the file is accessed normally over the network.

b) The cache won’t necessarily be large enough to hold all the open files on the remote system.

c) Even if the cache is not populated properly, the file should be accessible, i.e. it should be possible to bypass the cache entirely.

This hopefully clears up the differences between FS-Cache and CacheFS.