File system – Part 2: High-level API design

Posted on September 14, 2011 by Stefan Reinalter

Building upon the low-level API introduced in an earlier post, today we will take a look at the platform-independent high-level API, which provides the features one would expect from a game engine file system.

Specifically, Molecule’s file system provides the following features:

Multiple file devices (native, memory, double-buffered, pak, network, etc.)
Encryption / decryption
Compression / decompression (ZIP, LZMA, etc.)
Synchronous / asynchronous operations
Aliases

One thing that should be kept in mind is that all of the above should be implemented as orthogonal features. The following is a short list of possible use cases:

A pak-file can contain both compressed and uncompressed files, hence compression needs to be something which can piggyback onto other functionality.
Any file, no matter where it comes from, can be encrypted – hence, decryption also needs to be implemented as a piggyback feature.
The above should work for both synchronous and asynchronous operations.
A streamed file might be contained in a pak-file.

Following the Law of Demeter, we would like to implement each feature in its own isolated space, but allow clients to build combinations thereof.

This time, I opted not to use policy-based design, because template-based designs like these often force you into compile-time decisions, whereas a file system is something where I wanted the option of run-time changes, configurations, etc.

Instead, the design I came up with for Molecule is the following:

A file system supports an arbitrary number of file devices, which can be mounted to and unmounted from the system.
Each file device takes care of exactly one thing, be it reading from disk, encrypting data, compressing data, sending data over the network, etc.
A file device is responsible for returning a proper file interface, not the file system itself.
File devices can be piggybacked onto other file devices, if they need to.
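To make the split between devices and files concrete, here is a minimal C++ sketch of what the two abstract interfaces could look like, together with one trivial device. Everything beyond the names File and FileDevice (StringFile, StringFileDevice, GetDevice(), the remainingDevices parameter) is hypothetical, and plain new/delete stands in for whatever allocator the engine uses; this is an illustration, not Molecule's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

class FileDevice;

// Abstract file: reading/writing only, no knowledge of where the bytes come from.
class File
{
public:
  explicit File(FileDevice* owner) : m_owner(owner) {}
  virtual ~File() = default;

  // Both return the number of bytes actually transferred.
  virtual std::size_t Read(void* buffer, std::size_t length) = 0;
  virtual std::size_t Write(const void* buffer, std::size_t length) = 0;

  FileDevice* GetDevice() const { return m_owner; }

private:
  FileDevice* m_owner;  // the device that created this file (needed for closing)
};

// Abstract device: creates and destroys the matching File implementation.
class FileDevice
{
public:
  explicit FileDevice(std::string id) : m_id(std::move(id)) {}
  virtual ~FileDevice() = default;

  const std::string& GetId() const { return m_id; }

  // 'remainingDevices' is the rest of the device list ("disk" for "crypto:disk").
  // Leaf devices ignore it; piggyback devices forward it to the file system.
  virtual File* Open(const char* remainingDevices, const char* path) = 0;
  virtual void Close(File* file) = 0;

private:
  std::string m_id;
};

// Trivial read-only leaf file serving bytes from an in-memory string.
class StringFile : public File
{
public:
  StringFile(FileDevice* owner, std::string data) : File(owner), m_data(std::move(data)) {}

  std::size_t Read(void* buffer, std::size_t length) override
  {
    const std::size_t n = std::min(length, m_data.size() - m_pos);
    std::memcpy(buffer, m_data.data() + m_pos, n);
    m_pos += n;
    return n;
  }

  std::size_t Write(const void*, std::size_t) override { return 0; }  // read-only

private:
  std::string m_data;
  std::size_t m_pos = 0;
};

// Leaf device: every Open() returns a fresh StringFile over the same contents.
class StringFileDevice : public FileDevice
{
public:
  StringFileDevice(std::string id, std::string contents)
    : FileDevice(std::move(id)), m_contents(std::move(contents)) {}

  File* Open(const char* /*remainingDevices*/, const char* /*path*/) override
  {
    return new StringFile(this, m_contents);
  }

  void Close(File* file) override { delete file; }

private:
  std::string m_contents;
};
```

Note that the device, not the file system, decides which concrete File type is created, which is exactly the responsibility described in the list above.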

The last bullet point is a very important one, and will be discussed in a minute. Let’s take a look at the relevant parts of the file system API first:

class FileSystem
{
public:
  // ...

  typedef Flags<internal::FileSystemModeFlags> Mode;

  /// Mounts a file device to the file system
  void Mount(FileDevice* device);

  /// Unmounts a file device from the file system
  void Unmount(FileDevice* device);

  /// Opens a file for synchronous operations.
  /// NOTE: A nullptr is returned if no device for opening the file could be found.
  File* Open(const char* deviceList, const char* path, Mode mode);

  /// Opens a file for asynchronous operations.
  /// NOTE: A nullptr is returned if no device for opening the file could be found.
  AsyncFile* OpenAsync(const char* deviceList, const char* path, Mode mode);

  /// Closes a file previously returned by a call to Open()
  void Close(File* file);

  /// Closes a file previously returned by a call to OpenAsync()
  void Close(AsyncFile* file);

  // ...
};

Whenever a file is opened via a call to FileSystem::Open(), a File instance is returned. This interface offers common functionality for reading, writing, seeking, etc., and serves as an abstract base class. Examples of concrete implementations are the following:

DiskFile – for reading from HDD, DVD, BluRay, etc.
MemoryFile – for reading the entire file into memory first, and then serving subsequent reads by copying from memory. Uses any other File internally.
CryptoFile – for decrypting/encrypting data upon reading/writing. Uses any other File internally.
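The MemoryFile idea can be sketched as a wrapper that drains its source File once, in the constructor, and never touches it again afterwards. This is a hypothetical minimal version: the File interface is cut down to Read(), StringFile is an illustrative stand-in for a DiskFile, and the wrapper does not take ownership of its source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Cut-down stand-in for the File interface (hypothetical).
class File
{
public:
  virtual ~File() = default;
  virtual std::size_t Read(void* buffer, std::size_t length) = 0;
};

// Illustrative leaf file used as the data source.
class StringFile : public File
{
public:
  explicit StringFile(std::string data) : m_data(std::move(data)) {}
  std::size_t Read(void* buffer, std::size_t length) override
  {
    const std::size_t n = std::min(length, m_data.size() - m_pos);
    std::memcpy(buffer, m_data.data() + m_pos, n);
    m_pos += n;
    return n;
  }

private:
  std::string m_data;
  std::size_t m_pos = 0;
};

// Reads the whole source up front, then serves every Read() by copying from memory.
class MemoryFile : public File
{
public:
  explicit MemoryFile(File* source)
  {
    char chunk[4096];
    std::size_t n;
    while ((n = source->Read(chunk, sizeof(chunk))) > 0)
      m_buffer.insert(m_buffer.end(), chunk, chunk + n);
  }

  std::size_t Read(void* buffer, std::size_t length) override
  {
    const std::size_t n = std::min(length, m_buffer.size() - m_pos);
    if (n > 0)  // avoid memcpy from a null data() when the buffer is empty
      std::memcpy(buffer, m_buffer.data() + m_pos, n);
    m_pos += n;
    return n;
  }

private:
  std::vector<char> m_buffer;
  std::size_t m_pos = 0;
};
```

Because MemoryFile only depends on the File interface, the source can be a disk file, a decrypted file, or anything else.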

As stated above, file devices are responsible for returning a proper File implementation. The way these devices behave is the following:

A FileDevice is an abstract base class, which offers functionality for opening and closing a file.
Each file device implementation (e.g. DiskFileDevice, CryptoFileDevice, etc.) takes care of returning the proper File instance.
The file system walks through the list of mounted file devices, and asks the device whose identifier matches the first entry in the device list to open the file.
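The lookup step boils down to splitting the device list at the first colon and matching the head against the identifiers of the mounted devices. A sketch, assuming ':' as the separator (as in the examples in this post) and hypothetical helper names SplitDeviceList() and FindDevice():

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Splits "zip:crypto:tcp" into {"zip", "crypto:tcp"}; a single ID yields {id, ""}.
std::pair<std::string_view, std::string_view> SplitDeviceList(std::string_view deviceList)
{
  const std::size_t colon = deviceList.find(':');
  if (colon == std::string_view::npos)
    return {deviceList, std::string_view{}};
  return {deviceList.substr(0, colon), deviceList.substr(colon + 1)};
}

// Returns the index of the first mounted device whose ID matches, or -1 if none does
// (in which case Open() would return a nullptr).
int FindDevice(const std::vector<std::string>& mountedIds, std::string_view id)
{
  for (std::size_t i = 0; i < mountedIds.size(); ++i)
    if (mountedIds[i] == id)
      return static_cast<int>(i);
  return -1;
}
```

The remainder ("crypto:tcp") is what gets handed to the chosen device, so each device only ever looks at one identifier.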

In order to make it easier to understand, let’s walk through a simple example:

// build a simple file system
FileSystem fs(fsArena, 8);

DiskFileDevice diskDevice;
fs.Mount(&diskDevice);

CryptoFileDevice cryptoDevice;
fs.Mount(&cryptoDevice);

// open a file
File* file = fs.Open("crypto:disk", "test.txt", FileSystem::Mode::WRITE | FileSystem::Mode::RECREATE);

// write something into the file
// ...

fs.Close(file);

Upon the call to FileSystem::Open(), the file system internally walks the list of mounted devices and checks which device's ID matches the first entry in the device list (“crypto”). It then asks the CryptoFileDevice to Open() the file.

And here comes the interesting part – the crypto file device is a piggyback device, which means it never touches files by itself. Instead, it asks the file system to have the remaining devices in the list open the file. In this case, the DiskFileDevice (“disk”) is responsible for opening the file, and returns a DiskFile implementation to the caller, which is the CryptoFileDevice.

The CryptoFileDevice in turn takes this DiskFile, and hands it to the CryptoFile implementation, which is returned to the user. Therefore, each time the user calls Read() on the given File, the underlying CryptoFile implementation does something like the following:

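unsigned int CryptoFile::DoRead(void* buffer, unsigned int length)
{
  // read from the underlying File, whichever implementation it is
  const unsigned int bytesRead = m_file->Read(buffer, length);

  // very simple XOR obfuscation (not real encryption);
  // only transform the bytes that were actually read
  char* b = static_cast<char*>(buffer);
  for (unsigned int i = 0; i < bytesRead; ++i)
  {
    b[i] ^= 58;
    b[i] ^= 129;
  }

  return bytesRead;
}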
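Two properties of this routine are worth spelling out: the two XORs collapse into a single key (58 ^ 129 == 187), and XOR is its own inverse, so the same transform can serve both DoRead() and DoWrite(). That also makes it obfuscation rather than real encryption. A quick check of both properties, using a hypothetical free function XorTransform() that mirrors the loop above:

```cpp
#include <cassert>
#include <cstddef>

// Same byte transform as the loop in CryptoFile::DoRead().
void XorTransform(unsigned char* data, std::size_t length)
{
  for (std::size_t i = 0; i < length; ++i)
  {
    data[i] ^= 58;
    data[i] ^= 129;  // together equivalent to a single XOR with 187
  }
}
```

Applying XorTransform() once scrambles the data; applying it a second time restores the original bytes.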

The implementation doesn’t care which File implementation (m_file) it internally uses for reading. It can be any implementation, which makes it possible to arbitrarily piggyback files onto each other, as in the following examples:

// open an encrypted, zipped file, read from the network
File* networkFile = fs.Open("zip:crypto:tcp", "test.txt", FileSystem::Mode::READ);

// open an encrypted file living on the cartridge (e.g. savegames)
File* saveFile = fs.Open("crypto:cartridge", "test.txt", FileSystem::Mode::READ);

As long as each file device implementation which is to be used as a piggyback device just asks the file system to open a file, which in turn asks the remaining mounted devices to do the job, features can be combined endlessly, even with user-provided file devices.
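Putting the pieces together, here is a self-contained sketch of the whole piggyback chain: a hypothetical leaf device (“store”, an in-memory stand-in for “disk”) and a piggyback device (“xor”) that never touches storage and only asks the file system to open the rest of the list. For brevity it hands out std::unique_ptr instead of using explicit Close() calls and leaves out open modes; the XOR key 0xBB equals 58 ^ 129, the combined key from the CryptoFile example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <memory>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

class FileSystem;

class File
{
public:
  virtual ~File() = default;
  virtual std::size_t Read(void* buffer, std::size_t length) = 0;
};

class FileDevice
{
public:
  virtual ~FileDevice() = default;
  virtual std::string_view Id() const = 0;
  // 'rest' is whatever follows this device's ID in the list ("disk" for "crypto:disk").
  virtual std::unique_ptr<File> Open(FileSystem& fs, std::string_view rest, const std::string& path) = 0;
};

class FileSystem
{
public:
  void Mount(FileDevice* device) { m_devices.push_back(device); }

  // Dispatches to the device named by the head of the list; nullptr if none matches.
  std::unique_ptr<File> Open(std::string_view deviceList, const std::string& path)
  {
    const std::size_t colon = deviceList.find(':');
    const std::string_view head = deviceList.substr(0, colon);
    const std::string_view rest = colon == std::string_view::npos ? std::string_view{} : deviceList.substr(colon + 1);
    for (FileDevice* device : m_devices)
      if (device->Id() == head)
        return device->Open(*this, rest, path);
    return nullptr;
  }

private:
  std::vector<FileDevice*> m_devices;
};

// Leaf pair: serves named files from an in-memory table.
class StringFile : public File
{
public:
  explicit StringFile(std::string data) : m_data(std::move(data)) {}
  std::size_t Read(void* buffer, std::size_t length) override
  {
    const std::size_t remaining = m_data.size() - m_pos;
    const std::size_t n = length < remaining ? length : remaining;
    std::memcpy(buffer, m_data.data() + m_pos, n);
    m_pos += n;
    return n;
  }
private:
  std::string m_data;
  std::size_t m_pos = 0;
};

class StoreDevice : public FileDevice
{
public:
  std::map<std::string, std::string> files;
  std::string_view Id() const override { return "store"; }
  std::unique_ptr<File> Open(FileSystem&, std::string_view, const std::string& path) override
  {
    const auto it = files.find(path);
    if (it == files.end()) return nullptr;
    return std::make_unique<StringFile>(it->second);
  }
};

// Piggyback pair: decodes whatever the inner file delivers.
class XorFile : public File
{
public:
  explicit XorFile(std::unique_ptr<File> inner) : m_inner(std::move(inner)) {}
  std::size_t Read(void* buffer, std::size_t length) override
  {
    const std::size_t n = m_inner->Read(buffer, length);
    auto* bytes = static_cast<unsigned char*>(buffer);
    for (std::size_t i = 0; i < n; ++i) bytes[i] ^= 0xBB;
    return n;
  }
private:
  std::unique_ptr<File> m_inner;
};

class XorDevice : public FileDevice
{
public:
  std::string_view Id() const override { return "xor"; }
  std::unique_ptr<File> Open(FileSystem& fs, std::string_view rest, const std::string& path) override
  {
    std::unique_ptr<File> inner = fs.Open(rest, path);  // remaining devices do the real work
    if (!inner) return nullptr;
    return std::make_unique<XorFile>(std::move(inner));
  }
};
```

Neither device knows about the other; “xor:store” and plain “store” both work, and a missing device anywhere in the chain results in a nullptr.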

Additionally, using this system in conjunction with config variables turns out to be really powerful, and offers a whole new set of possibilities:

ConfigSettingString g_sgDevice("g_sgDevice", "The device used for savegames.", "crypto:cartridge");
ConfigSettingString g_defDevice("g_defDevice", "The default device.", "disk");

// open any file on HDD, DVD, etc.
File* dataFile = fs.Open(g_defDevice, "test.txt", FileSystem::Mode::READ);

// open a savegame
File* saveFile = fs.Open(g_sgDevice, "test.txt", FileSystem::Mode::READ);

Because config variables can be set in source code, in configuration files, or via the in-game console, device lists can now be changed on the fly. This is extremely useful during development and debugging.

Developers with a lot of memory available might change their g_defDevice configuration from “disk” to “memory:disk”, resulting in extremely fast loading times. People from the QA department might want to disable encryption of savegames during development: they can simply pull down the in-game console mid-game, change the corresponding variable via “set g_sgDevice disk”, and have their unencrypted savegames stored to disk, ready to be attached to a bug in the database. Programmers, in turn, will want to switch between “disk” and “pak:disk” during development (disabling/enabling big pak-files, because those often cause trouble), which can easily be done the same way.
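Since Open() receives the device list on every call, whatever value the config variable holds at that moment is used; files that are already open keep the chain they were opened with. A minimal sketch of the console side, with a hypothetical ConfigRegistry class standing in for ConfigSettingString and the “set <name> <value>” syntax taken from the example above:

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical registry of named string settings that the console can overwrite.
class ConfigRegistry
{
public:
  void Register(const std::string& name, const std::string& defaultValue) { m_values[name] = defaultValue; }

  // Throws std::out_of_range for names that were never registered.
  const std::string& Get(const std::string& name) const { return m_values.at(name); }

  // Handles console input of the form "set <name> <value>".
  // Returns false for other commands, malformed input, or unknown settings.
  bool Execute(const std::string& line)
  {
    std::istringstream in(line);
    std::string command, name, value;
    if (!(in >> command >> name >> value) || command != "set")
      return false;
    auto it = m_values.find(name);
    if (it == m_values.end())
      return false;
    it->second = value;
    return true;
  }

private:
  std::map<std::string, std::string> m_values;
};
```

In this sketch, a call such as fs.Open(cfg.Get("g_sgDevice").c_str(), ...) made after “set g_sgDevice disk” would go straight to disk, without any code change or rebuild.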

One part of the implementation I haven’t talked about is the AsyncFile interface. It is similar to the File interface, but offers facilities for asynchronous operations instead. The underlying piggyback mechanism is exactly the same – OpenAsync() is delegated to the mounted file devices.
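The post does not show the AsyncFile interface, so the following is only an assumption about its shape: a read that completes later and reports back through a callback. It illustrates why piggybacking carries over unchanged, since the wrapper simply forwards the request and does its decoding inside its own completion callback. QueuedStringFile, AsyncXorFile, ReadCallback, and Poll() are hypothetical; Poll() simulates an I/O request finishing at a later point.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using ReadCallback = std::function<void(std::size_t bytesRead)>;

// Hypothetical async interface: onComplete fires once the data is in 'buffer'.
class AsyncFile
{
public:
  virtual ~AsyncFile() = default;
  virtual void ReadAsync(void* buffer, std::size_t length, ReadCallback onComplete) = 0;
};

// Leaf file: queues requests and completes them in Poll(), standing in for OS-level I/O.
class QueuedStringFile : public AsyncFile
{
public:
  explicit QueuedStringFile(std::string data) : m_data(std::move(data)) {}

  void ReadAsync(void* buffer, std::size_t length, ReadCallback onComplete) override
  {
    m_pending.push_back({buffer, length, std::move(onComplete)});
  }

  // Completes all queued reads (always from the start of the data, for brevity).
  void Poll()
  {
    std::vector<Request> done;
    done.swap(m_pending);
    for (Request& r : done)
    {
      const std::size_t n = r.length < m_data.size() ? r.length : m_data.size();
      std::memcpy(r.buffer, m_data.data(), n);
      r.onComplete(n);
    }
  }

private:
  struct Request { void* buffer; std::size_t length; ReadCallback onComplete; };
  std::string m_data;
  std::vector<Request> m_pending;
};

// Piggyback file: forwards the request and decodes inside the completion callback.
class AsyncXorFile : public AsyncFile
{
public:
  explicit AsyncXorFile(AsyncFile* inner) : m_inner(inner) {}

  void ReadAsync(void* buffer, std::size_t length, ReadCallback onComplete) override
  {
    m_inner->ReadAsync(buffer, length, [buffer, onComplete](std::size_t n) {
      auto* bytes = static_cast<unsigned char*>(buffer);
      for (std::size_t i = 0; i < n; ++i) bytes[i] ^= 0xBB;
      onComplete(n);
    });
  }

private:
  AsyncFile* m_inner;
};
```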

That’s all there is to the file system, which concludes today’s post!

13 thoughts on “File system – Part 2: High-level API design”

Riley L on February 4, 2012 at 12:59 pm said:

What do you think of this Stefan?

Buffer-centric IO

- Stefan Reinalter on February 6, 2012 at 10:19 am said:
  
  I like the approach, but I’m not yet sure whether it would work for the whole filesystem in an engine.
  
Gavin on January 22, 2013 at 1:31 pm said:

Hey Stefan,

I’m a bit confused here and have some questions. First of all, I like your approach to the file system, but what does the FileDevice do in connection with the File class? And what does the OsFile class from part one do?

Let’s assume I’d like to load a zip file from my HDD. I’d use two file devices: one for loading the zip file from the HDD into a buffer, and one for loading the actual zip file, e.g. creating an instance of a ZipFile class derived from the File class. But what would be the actual task of the FileDevice, besides creating the correct instance of the file class returned to the file system? Should it take care of loading all the correct zip headers and stuff? Or should this be done by the ZipFile class itself?

Thanks in advance

- Stefan Reinalter on January 22, 2013 at 5:31 pm said:
  
  The FileDevice implementations simply return an instance of a File, e.g. a DiskFileDevice would return a DiskFile*, a ZipFileDevice would return a ZipFile*. The same goes for asynchronous files (those derive from a different interface). All high-level code deals with FileDevice* and File* only, though.
  
  With this mechanism, you can piggy-back implementations on top of each other, without having to let them know of each other. In your example, you would have a ZipFileDevice, which is responsible for creating a ZipFile. The ZipFile would have a File* as member which it uses for reading data, but it doesn’t matter if it’s a DiskFile or not. It could also be a NetworkFile, so that zip-files could be read from the network. The file system/devices take care of creating the correct instance, so you can either have “zip:disk” or “zip:tcp”, or something entirely different.
  
  Anyway, the ZipFile would take care of reading headers, decompressing, etc. But it only uses its internal File* member for reading data, so the location where the data actually comes from is transparent to the ZipFile itself.
  
  A note on the OsFile: It’s the only platform-dependent part of the filesystem, and used by the DiskFile and AsyncDiskFile for reading/writing data. If you port the filesystem to a new platform, the OsFile is the only thing that needs to be ported.
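A sketch of that split, assuming the OsFile wraps whatever the platform provides; here std::FILE from <cstdio> serves as a portable stand-in for platform calls such as ReadFile() or read(). The method names are hypothetical, and DiskFile is shown without the File base class to keep the example short:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// The only class that would touch platform APIs; std::FILE stands in for them here.
class OsFile
{
public:
  OsFile() = default;
  OsFile(const OsFile&) = delete;             // owns a handle, so no copies
  OsFile& operator=(const OsFile&) = delete;
  ~OsFile() { Close(); }

  bool Open(const char* path, const char* mode)
  {
    Close();  // release any previously opened handle
    m_handle = std::fopen(path, mode);
    return m_handle != nullptr;
  }

  std::size_t Read(void* buffer, std::size_t length) { return std::fread(buffer, 1, length, m_handle); }
  std::size_t Write(const void* buffer, std::size_t length) { return std::fwrite(buffer, 1, length, m_handle); }

  void Close()
  {
    if (m_handle)
      std::fclose(m_handle);
    m_handle = nullptr;
  }

private:
  std::FILE* m_handle = nullptr;
};

// Platform-independent: owns an OsFile and simply forwards to it.
class DiskFile
{
public:
  bool Open(const char* path, const char* mode) { return m_os.Open(path, mode); }
  std::size_t Read(void* buffer, std::size_t length) { return m_os.Read(buffer, length); }
  std::size_t Write(const void* buffer, std::size_t length) { return m_os.Write(buffer, length); }
  void Close() { m_os.Close(); }

private:
  OsFile m_os;
};
```

Porting then means rewriting OsFile's five member functions; DiskFile, and everything piggybacked on top of it, stays untouched.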
  
  - Gavin on January 22, 2013 at 11:42 pm said:
    
    Ok,
    
    now I understand the part with the OsFile and DiskFile. Thanks a lot for the reply. Hope to hear about some new articles in the future 🙂
    
    Best regards,
    Gavin
  - Garnold on January 24, 2013 at 8:59 am said:
    
    Hi Stefan,
    I also have one question regarding ZipFile/PackFile. These files are mainly used to group several files together. How do you open a single file from a zip file using this system? Is ZipFileDevice responsible for finding it and creating correct ZipFile for reading only this single file? If so, is it parsing all the headers every time you want to open a file?
  - Stefan Reinalter on January 24, 2013 at 10:46 am said:
    
    Hi Garnold,
    
    I would probably add a pair of Mount/Unmount functions to the ZipFileDevice, which can be used for making files inside a zip-file available to the file system. Mount() could simply open the zip-file and parse the headers once, storing an open handle to the file for later access.
    Subsequent Open() requests to the file system would then ask the ZipFileDevice for returning a ZipFile (as you said).
    
    This means you only have to parse the headers once, and can treat files inside zip-files almost the same as files on disk.
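A minimal sketch of “parse once on Mount(), look up on Open()”. To stay self-contained it does not parse real zip headers: the archive uses a made-up text table of contents (“<count>”, then one “<name> <offset> <size>” line per entry, then the raw data), and Open() returns a std::string_view slice instead of a ZipFile. ArchiveDevice and the format are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <map>
#include <optional>
#include <sstream>
#include <string>
#include <string_view>
#include <utility>

class ArchiveDevice
{
public:
  // Parses the table of contents once and keeps the archive for later Open() calls.
  // Note: re-mounting invalidates views returned by earlier Open() calls.
  bool Mount(std::string archive)
  {
    std::istringstream header(archive);
    std::size_t count = 0;
    if (!(header >> count)) return false;

    std::map<std::string, Entry> toc;
    for (std::size_t i = 0; i < count; ++i)
    {
      std::string name;
      Entry entry;
      if (!(header >> name >> entry.offset >> entry.size)) return false;
      toc[name] = entry;
    }
    header.ignore(1);  // the newline that ends the header
    const std::streamoff pos = header.tellg();
    if (pos < 0) return false;

    // Reject entries that point past the end of the data section.
    const std::size_t dataStart = static_cast<std::size_t>(pos);
    const std::size_t dataSize = archive.size() - dataStart;
    for (const auto& kv : toc)
      if (kv.second.offset > dataSize || kv.second.size > dataSize - kv.second.offset) return false;

    m_archive = std::move(archive);
    m_dataStart = dataStart;
    m_toc = std::move(toc);
    return true;
  }

  // Pure lookup in the cached table of contents; no header parsing happens here.
  std::optional<std::string_view> Open(const std::string& path) const
  {
    const auto it = m_toc.find(path);
    if (it == m_toc.end()) return std::nullopt;
    return std::string_view(m_archive).substr(m_dataStart + it->second.offset, it->second.size);
  }

private:
  struct Entry { std::size_t offset = 0; std::size_t size = 0; };
  std::map<std::string, Entry> m_toc;
  std::string m_archive;
  std::size_t m_dataStart = 0;
};
```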
Marc Costa on January 9, 2014 at 6:53 am said:

Hi Stefan,

I’ve got some questions on the filesystem. I like your approach to the FileSystem and I’m trying a similar implementation.

First of all, what is the fsArena passed to the FileSystem constructor for? I guess it’s the memory used to store the File instances (DiskFile, NetworkFile, ZipFile and so on). What type of allocator should I use, though? Different File implementations might be of different size, which rules out the PoolAllocator. The stack and linear allocator don’t let you deallocate random memory, though.

Then, I don’t know how you store the pointers to FileDevices. Maybe the fsArena is for these pointers, which would lead to the question: where are the File instances stored? If not, where and how are the pointers to FileDevices stored? I guess a simple static array would do, though.

When calling fs.Open(“memory:disk”, “dir/file.txt”, Mode::READ), is the FileSystem::Open() function responsible for determining all the necessary FileDevice instances (listed in deviceList), requesting each FileDevice to create a File, and passing it as a constructor argument to the next FileDevice? Then, I guess the FileDevice will need an arena to store the File instances it creates.

Finally, how does the FileSystem know how to close a file? Does the File instance keep a pointer to the FileDevice that has created it?

I know I’m asking for a lot of information, I’m very interested in how the system works though.

Thanks in advance!

- Stefan Reinalter on January 9, 2014 at 8:41 pm said:
  
  First of all, what is the fsArena passed to the FileSystem constructor for? I guess it’s the memory used to store the File instances (DiskFile, NetworkFile, ZipFile and so on). What type of allocator should I use, though? Different File implementations might be of different size, which rules out the PoolAllocator.
  
  Yes, it stores the file device instances. Those instances are allocated by the user, and mounted to the file system. For allocating the instances, I normally use a linear allocator because all devices have application lifetime anyway. An alternative would be to use a pool allocator which can hold allocations the size of the largest file device instance.
  
  Then, I don’t know how you store the pointers to FileDevices. Maybe the fsArena is for these pointers, which would lead to the question: where are the File instances stored? If not, where and how are the pointers to FileDevices stored? I guess a simple static array would do, though.
  
  I just store them in a dynamic array. You could also use a compile-time size array for that (one that e.g. holds 8 or 16 instances).
  
  When calling fs.Open(“memory:disk”, “dir/file.txt”, Mode::READ), is the FileSystem::Open() function responsible for determining all the necessary FileDevice instances (listed in deviceList), requesting each FileDevice to create a File, and passing it as a constructor argument to the next FileDevice? Then, I guess the FileDevice will need an arena to store the File instances it creates.
  
  FileSystem::Open() determines the first file device it needs (in this case a MemoryFileDevice), and passes the rest of the device list (and other parameters) to the Open()-method of that device. This device will in turn request the correct file device and pass along parameters until the final device in the list is reached. The final device will create an instance of the corresponding file (in this case a DiskFile). The instance is allocated using a pool allocator that can hold any file instance.
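A pool that “can hold any file instance” can be sketched as a fixed-size free-list pool whose slot size is that of the largest File type. This is an assumption about how such a pool might look, not the engine's allocator; FilePool and its template parameters are hypothetical, and the pool is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Every slot fits the largest File implementation; free slots form an
// intrusive singly linked list threaded through the slots themselves.
template <std::size_t SlotSize, std::size_t SlotCount>
class FilePool
{
  static_assert(SlotCount > 0, "pool needs at least one slot");
  static constexpr std::size_t kAlign = alignof(std::max_align_t);
  static constexpr std::size_t kRaw = SlotSize > sizeof(void*) ? SlotSize : sizeof(void*);
  static constexpr std::size_t kStride = (kRaw + kAlign - 1) / kAlign * kAlign;

public:
  FilePool()
  {
    for (std::size_t i = 0; i < SlotCount; ++i)
      StoreNext(SlotAt(i), i + 1 < SlotCount ? SlotAt(i + 1) : nullptr);
    m_freeList = SlotAt(0);
  }

  FilePool(const FilePool&) = delete;
  FilePool& operator=(const FilePool&) = delete;

  // Raw storage for one file instance, or nullptr if the pool is exhausted.
  void* Allocate()
  {
    if (!m_freeList) return nullptr;
    void* slot = m_freeList;
    m_freeList = LoadNext(slot);
    return slot;
  }

  // Returns a slot; the object living in it must already have been destroyed.
  void Free(void* slot)
  {
    StoreNext(slot, m_freeList);
    m_freeList = slot;
  }

private:
  void* SlotAt(std::size_t i) { return m_storage + i * kStride; }
  static void StoreNext(void* slot, void* next) { ::new (slot) void*(next); }
  static void* LoadNext(void* slot) { return *std::launder(static_cast<void**>(slot)); }

  alignas(std::max_align_t) unsigned char m_storage[kStride * SlotCount];
  void* m_freeList = nullptr;
};
```

Files are then created with placement new into a slot and destroyed with an explicit destructor call before the slot is freed.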
  
  Finally, how does the FileSystem know how to close a file? Does the File instance keep a pointer to the FileDevice that has created it?
  
  Yes, a file knows its owning device.
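Since each file carries a pointer to its creating device, FileSystem::Close() needs no lookup at all, and a piggyback device can close its wrapped file through the file system in the same way. A hypothetical sketch: LeafDevice, WrapperDevice, and the closed counters exist only for illustration, and WrapperDevice::Open() takes the inner file directly instead of going through the device list, to keep the focus on closing.

```cpp
#include <cassert>

class FileDevice;

// Every file remembers the device that created it.
class File
{
public:
  explicit File(FileDevice* device) : m_device(device) {}
  virtual ~File() = default;
  FileDevice* GetDevice() const { return m_device; }

private:
  FileDevice* m_device;
};

class FileDevice
{
public:
  virtual ~FileDevice() = default;
  virtual void Close(File* file) = 0;
};

// Closing needs no lookup: the file knows which device to hand it back to.
class FileSystem
{
public:
  void Close(File* file)
  {
    if (file)
      file->GetDevice()->Close(file);
  }
};

// Leaf device (stand-in for DiskFileDevice).
class LeafDevice : public FileDevice
{
public:
  File* Open() { return new File(this); }
  void Close(File* file) override { ++closed; delete file; }
  int closed = 0;
};

// Piggyback file: wraps a file that some other device created.
class WrapperFile : public File
{
public:
  WrapperFile(FileDevice* device, File* innerFile) : File(device), inner(innerFile) {}
  File* inner;
};

// Piggyback device (stand-in for CryptoFileDevice).
class WrapperDevice : public FileDevice
{
public:
  explicit WrapperDevice(FileSystem& fs) : m_fs(fs) {}
  File* Open(File* inner) { return new WrapperFile(this, inner); }
  void Close(File* file) override
  {
    auto* wrapper = static_cast<WrapperFile*>(file);
    m_fs.Close(wrapper->inner);  // hand the inner file back to whichever device created it
    ++closed;
    delete wrapper;
  }
  int closed = 0;

private:
  FileSystem& m_fs;
};
```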
  
  Hope that helps!
  
Tobias Kammerer on March 20, 2014 at 7:30 pm said:

First off: I really like your blog and the ideas you are proposing, very informative and motivating!

I am trying to implement something similar to what you described in those two posts at the moment. There is one thing I am not quite sure how to handle: suppose we have File and FileDevice as abstract classes, and DiskFile and DiskFileDevice as derived classes with implementations.
Now if DiskFileDevice::Open() gets called, we should open the file on disk (with the help of the OsFile class) and return an instance of DiskFile. So far so good. Here is the thing I don’t understand: both classes need the same OsFile instance to work properly. DiskFileDevice needs it to close the file, and DiskFile needs it to call read, write, seek, etc. How do you give both access to it? At the moment, my File interface stores the calling device as a pointer and exposes it via a simple getter, but using this to access the OsFile instance would not work nicely, because then the FileDevice interface would need an OsFile member, which most FileDevices would never need.

I would love to hear from you!

- Stefan Reinalter on March 25, 2014 at 6:33 pm said:
  
  Thanks Tobias!
  
  DiskFileDevice doesn’t need OsFile. The OsFile is only used by the DiskFile class, which has an OsFile member. The DiskFileDevice creates & destroys instances of DiskFile, and shouldn’t need an OsFile in order to work correctly. That is the responsibility of the File instance, not the Device instance.
  
ARNAISE Zacharie on March 23, 2018 at 2:06 pm said:

Very interesting post and well-written, as usual! 🙂
I have some questions about it:
– Using a string for deviceList is necessary because you need to check if there is one or multiple devices chained and pass the rest of the devices names to implementations of FileDevice. Would it be possible to use a compile-time string hashing mechanism (like the one you presented) instead? (involves string manipulation at compile-time and passing a variable number of string IDs to Filesystem::Open())
– Not something you talked about in your post, but do you use an in-house format when you talk about “big pak-files”? If so, what advantages does it provide compared to, say, a zip file?

- Stefan Reinalter on March 23, 2018 at 2:14 pm said:
  
  Using a string for deviceList is necessary because you need to check if there is one or multiple devices chained and pass the rest of the devices names to implementations of FileDevice. Would it be possible to use a compile-time string hashing mechanism (like the one you presented) instead? (involves string manipulation at compile-time and passing a variable number of string IDs to Filesystem::Open())
  
  Might be, but I haven’t looked into it. It should never show up in your profiler anyway.
  
  Not something you talked about in your post, but do you use an in-house format when you talk about “big pak-files”? If so, what advantages does it provide compared to, say, a zip file?
  
  Yes, it’s a custom chunk-based format. The main advantage is that you get to choose which compression algorithm you want to use, and there are far better alternatives than zip. It’s also easy to use different compression algorithms on different platforms, e.g. LZMA provides a really good compression ratio but mobile CPUs struggle with decompression.
  Another advantage is that I can easily use the same file format for supporting both development builds (single files that can all be hot-reloaded individually, but the pak-file serves as a kind of container) and retail builds (all files put into one or several big pak-files, no more single files).
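A chunk-based format with a per-chunk codec ID might be decoded roughly as follows. Everything here is hypothetical: the header layout, the Codec values, and the toy run-length codec standing in for a real compressor such as LZMA are illustrative and not Molecule's format. The point is that each chunk names its own codec, so a build for a given platform can pick a different one without changing the reader.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-chunk codec IDs; a real pak would map these to actual compressors.
enum class Codec : std::uint8_t { Stored = 0, Rle = 1 };

// Hypothetical header preceding each chunk's payload.
struct ChunkHeader
{
  Codec codec;
  std::uint32_t compressedSize;
  std::uint32_t uncompressedSize;
};

// Toy run-length decoder: the payload is a sequence of (count, byte) pairs.
bool DecodeRle(const std::uint8_t* in, std::size_t inSize, std::vector<std::uint8_t>& out)
{
  if (inSize % 2 != 0) return false;
  for (std::size_t i = 0; i < inSize; i += 2)
    out.insert(out.end(), static_cast<std::size_t>(in[i]), in[i + 1]);
  return true;
}

// Appends one decoded chunk to 'out'; the codec may differ from chunk to chunk.
bool DecodeChunk(const ChunkHeader& header, const std::uint8_t* payload, std::vector<std::uint8_t>& out)
{
  const std::size_t before = out.size();
  switch (header.codec)
  {
    case Codec::Stored:
      out.insert(out.end(), payload, payload + header.compressedSize);
      break;
    case Codec::Rle:
      if (!DecodeRle(payload, header.compressedSize, out)) return false;
      break;
    default:
      return false;  // unknown codec ID, e.g. a corrupted header
  }
  return out.size() - before == header.uncompressedSize;  // sanity check
}
```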
  