Linux has long been fertile ground for the creation of various sorts of file systems. The reasons for this have been manifold:
People have moved to Linux from many places, and have wanted to make use of "legacy" file systems from wherever it was that they came, including such systems as Minix, MS-DOS, OS/2, Apple Macintosh, Atari ST, Amiga, and various "official" UNIXes;
Since the kernel is "hackable," some people just like to play with file systems, tuning them for better performance (e.g., reiserfs). A notable application area where improved performance is particularly valuable is news processing: news involves "lots and lots of tiny files," which tend to be challenging to handle.
People want to build "more secure" networked filesystems, typically by creating encrypted variants of NFS;
People want to build "highly reliable" file systems, generally using journalling;
People want to build file systems that provide special sorts of functionality, such as maintaining versioning or creating documents "on the fly." Virtual File System hooks make it possible to build "file systems" that run programs to handle data requests...
Regrettably, there is some conflict in this. The enormous appetite for "more functionality" prevents kernel folk from stopping to comprehensively fix problems. Somewhere between the FibreChannel device drivers, the mapping of those onto SCSI access, the VFS layer, and the various filesystems that connect to it, there are some evident holes.
My employer sponsored some reliability testing in the interests of seeing whether Linux on Opterons, connecting to FibreChannel disk arrays, would make a viable platform for large, highly-available PostgreSQL databases. All the filesystems corrupted painfully easily, even though the hardware ought to support better.
It's not going to be easy to resolve this; supporting HA hardware requires thoroughly verifying all the details and supporting the SCSI and FibreChannel protocols fully, and that would require a deceleration of Linux kernel development efforts, which is inconsistent with the way that Git is allowing larger and larger sets of contributors to cascade flurries of patches, like snow storms, onto the Linux development team.
Analysis of the Ext2fs structure
One of Linux's "claims to fame," at least compared to certain horribly-unstable operating systems, is that it has a pretty decent base file system known as ext2fs. There are rumblings back and forth that ext2fs is better than Berkeley FFS and vice-versa; what is certainly the case is that both provide decent performance for most purposes, and both are quite robust. I've got a section on the issue of defragmentation of ext2, as this is a topic that people ask about frequently. The brief synopsis is that a defragmentation utility is available, but it's not too likely that people actually need to use it.
Proposed upcoming extensions to ext2 include handling of:
Very large file systems (and large file sizes)
Better support for automatic file compression
Hashed directory lookups
Backups of inode table directories
Theodore Ts'o is a member of the "Linux Kernel Core Group" that has often been responsible for ext2 "stuff."
A TOPS-10-like approach to ACLs?
Tim Smith, tzs@halcyon.com, 1999/01/22
The problem with ACLs is that actually managing them is a complex task. Using ACLs implies adding more staff to do "security management."
Few organizations have seen much point in bothering with this. If there are such fine-grained security requirements that you really need this stuff, you probably need to go B1, in which case neither Linux nor NT is realistically the answer to the question anyway.
Actually, managing ACLs is only complex because most implementations take the approach of making an ACL some sort of extended attribute of the file or directory it applies to. This leads to complexity because (1) every utility that can copy or move files or directories has to be modified to know about ACLs, or the system has to have complex kludges to guess what should be done (they have to be complex because they have to be right, because botching the ACLs can lead to a security breach), and (2) if you've got a thousand files somewhere, you've got a thousand ACLs to worry about.
There is another approach, used on TOPS-10. I don't know if it was original, or if TOPS-10 borrowed it from somewhere else. In this approach, the ACLs are not associated with particular files. Rather, there is a separate file that contains the access control information for an entire group of files. Here's a description of how it might work on Unix, based on the TOPS-10 version, with appropriate changes for a Unix world.
On an attempt to access a file, the normal permissions would be checked. If they do not forbid it, the access is allowed.
If the permissions prohibit the access, the file .access_list in the home directory of the owner of the file is checked.
The file .access_list contains a series of entries, one per line, of the form

name:user:group:program:perms

Some form of wildcarding would be allowed (this is handy because many sites have patterns in the way they assign usernames). The "perms" field lists the access granted, e.g., rx for read and execute.
The way .access_list is used is that it is scanned looking for an entry whose "name" field matches the file name, whose "user" field matches the user trying to access the file, whose "group" field matches the group of the user, and whose "program" field matches the executable that is trying to access the file. When such a line is found, the "perms" field tells what type of access is to be granted.
Here's an example:
# anything in a directory named "private" is off limits
*/private/*:*:*:*:
# people in group "foo" get full (create, delete, read, write,
# execute) access to everything in the foo project directory
~/projects/foo/*:*:foo:*:cdrwx
# people playing mygame can update the high score file
~/mygame/score.dat:*:*: ~/mygame/bin/mygame:rw
# some friends have access to the RCS files for mygame
~/mygame/src/RCS/*:dennis,kevin,josh:*: /usr/bin/ci:rw
~/mygame/src/RCS/*:dennis,kevin,josh:*: /usr/bin/co:rw
# I'll put stuff I want everyone to read in my ~/public directory
# I'll make the public directory 744, so no one will actually have
# to check .access_list, but I'll still put in this entry for completeness
~/public/*:*:*:*:r
# anything left over gets no access
*:*:*:*:
I realize that there are problems with hard links with this scheme.
Note that this scheme is not nearly as inefficient as it looks, because most accesses would be to things where the normal permissions would be set to allow access, so .access_list would not be checked. You would only need .access_list for those special cases that don't fit in well with the user/group/other +users_in_multiple_groups model. Furthermore, one could greatly increase the efficiency by going to a binary format that can be compiled from .access_list.
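To make the scanning pass concrete, here's a little sketch in C of what the matcher might look like. This is purely my own illustration (check_access_list is a hypothetical helper, not anything lifted from TOPS-10 or FILDAE); it assumes the name:user:group:program:perms layout from the example above, uses fnmatch() for the wildcarding, and, for brevity, punts on ~-expansion and comma-separated user lists.

/* Hypothetical sketch of the .access_list matching pass described
   above.  Entries take the form name:user:group:program:perms; the
   first entry whose patterns all match determines the access. */
#include <stdio.h>
#include <string.h>
#include <fnmatch.h>

/* Returns the perms of the first matching entry, or NULL if the
   list could not be read or no entry matches. */
const char *check_access_list(const char *listpath, const char *file,
                              const char *user, const char *group,
                              const char *program)
{
    static char perms[64];
    char line[512];
    FILE *fp = fopen(listpath, "r");
    if (!fp)
        return NULL;
    while (fgets(line, sizeof line, fp)) {
        char name[256], u[64], g[64], prog[256];
        if (line[0] == '#' || line[0] == '\n')
            continue;                 /* skip comments and blank lines */
        perms[0] = '\0';              /* an entry may grant nothing */
        if (sscanf(line, "%255[^:]:%63[^:]:%63[^:]:%255[^:]:%63s",
                   name, u, g, prog, perms) < 4)
            continue;                 /* malformed entry */
        if (fnmatch(name, file, 0) == 0 && fnmatch(u, user, 0) == 0 &&
            fnmatch(g, group, 0) == 0 && fnmatch(prog, program, 0) == 0) {
            fclose(fp);
            return perms;             /* first match wins */
        }
    }
    fclose(fp);
    return NULL;
}

A production version would, as suggested above, compile the list into a binary form rather than rescanning text on each failed access.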
I think this kind of scheme actually fits in better than NT style access lists with the way people really think about security. No one wants to think about security on a thousand files on a file by file basis. You want to group them, and think about security by groups. e.g., "things that are part of the frobozz project need to be kept away from the marketing people", "only people at the VP level should see accounting files", etc.
Note that this scheme does not require any changes to existing utilities. Note also that no changes to existing filesystems are required.
I don't recall if TOPS-10 had one access file per user, or one per directory, or both.
PS: in case anyone is curious, the main problem with this scheme on TOPS-10 was that it was added as a kludge. There was a program, FILDAE (File Daemon -- TOPS-10 used 6.3 names), that handled the access lists. FILDAE was outside the kernel. When an access failed, the kernel sent a message to FILDAE asking if the access list allowed it. FILDAE made its decision and reported back to the kernel. People found attacks based on getting the kernel and FILDAE to become confused over which response was to which request.
If you have Partition Magic 3.0, ext2 partitions can be resized via Theodore Ts'o's resizefs. For the time being, only people who own Partition Magic 3.0 can get resizefs. Theodore has indicated that eventually resizefs will come out in GPL form.
e2fsprogs.sourceforge.net - ext2 resizer finally released
(hosted at SourceForge)
FSDEXT2: Second extended file system (ext2fs) for Windows 95
Reiserfs - File system based on balanced trees
This has been a most interesting project; the author has been benchmarking a file system that stores both inodes and data blocks as a balanced tree. It provides increased performance over ext2fs for small files and for certain types of directory accesses. I think he does a very good presentation of the benchmarks that compare performance. Note that as of version 2.4.2, reiserfs became part of the "official" Linux kernel, and tools such as RIP are available to help manage installations using it.
The work is also very useful as an example of benchmarking. In order to have measurably different results between ext2fs and reiserfs, it proved necessary to construct somewhat artificial benchmarks. (Which tells us that ext2fs performance isn't too bad at all, although we already knew that...) It is necessary to interpret the results carefully. See also Trees Are Fast - Hans Reiser on reiserfs.
The overall approach is quite similar to the log-structured filesystems; this one has the advantage that it actually exists now and is not merely in planning stages.
See also Filesystem Benchmarking using PostMark
PostMark is a filesystem benchmarking tool produced by Network Appliance. While it is possible that it may favor their products, there may still be useful insights available from comparing the performance on this benchmark of other systems.
Regrettably, Reiserfs has had something of a "soap opera" attached to it; Hans Reiser evidently murdered his ex-wife, a sad and sordid tale where nobody turns out to really have been "in the right." This appears to have turned the filesystem, now oddly-orphaned, into a curiosity.
btrfs - copy-on-write Filesystem for Linux
This is a copy-on-write filesystem created by Oracle.
Tru64 AdvFS for Linux Compatibility
AdvFS is a filesystem developed originally at Digital (now part of HP) which had, 10 years back, many of the sorts of features that Sun is hawking with their "ZFS". Tru64 is getting to be something of a curiosity, rather than an interesting product, and HP has made a "code drop" with a view to possibly making it usable on Linux.
Sun Microsystems developed ZFS for use in Solaris, but released the code as open source, so ports have been done to Linux, FreeBSD, and MacOS. It offers many features for flexible management of filesystems, including snapshotting, on-the-fly data compression, built-in awareness of RAID, volume management, transactional semantics, and checksums enabling self-healing of some filesystem failures.
Ext3fs is intended as a successor to Ext2fs, adding in journalling capabilities to allow faster recovery after unexpected reboots.
The Tux2 Failsafe Filesystem for Linux
Rather than using journalling to maintain consistency, the Tux2 filesystem uses a "phase tree" scheme where a tree-structure filesystem is updated in carefully delineated phases. The phase tree approach allows failsafe operation to be achieved with only a slight performance penalty.
See also the Tux2 development at SourceForge.
BULMA: Journal File Systems in Linux
This is a review comparing most of the journalling filesystems available on Linux, with some performance statistics. Since many such filesystems are undergoing active development efforts, statistics that are accurate today may not be accurate six months from now, so your mileage certainly may vary.
LVM - Logical Volume Manager
This appears to be the first of the "logical volume" projects for Linux to produce actual results. This system seems to be modelled after the logical volume system IBM provides with AIX.
Log-Structured File System Project
This is a discontinued project that planned to provide Linux with an "ultra-high-performance" file system that would simultaneously provide "ultra-high-reliability." The general approach was to use the lessons learned in writing robust, fast database management systems.
Grossly oversimplifying, robustness is provided by logging all updates before updating the database ("file system") proper, and speed is provided by having the database be a "view" that references the update logs. A separate process runs when the system is not very busy to "vacuum" out areas of the disk that have become fragmented due to files having been created and deleted. This approach takes after the way modern relational databases are implemented.
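Oversimplifying yet further, the kernel of the idea fits in a few lines of C. This toy is entirely my own sketch, not the project's code: every block update is appended to a log, and an in-memory index points at the newest copy of each block; the "vacuuming" pass that reclaims dead log space is omitted.

/* Toy log-structured store: updates only ever append to the log;
   reads go through an index that tracks the newest copy of each
   logical block. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define BLKSZ   4096
#define NBLOCKS 1024

static off_t index_map[NBLOCKS];   /* logical block -> log offset */
static int log_fd;

int log_write(int blkno, const char buf[BLKSZ])
{
    off_t where = lseek(log_fd, 0, SEEK_END);   /* always append */
    if (write(log_fd, buf, BLKSZ) != BLKSZ)
        return -1;
    if (fsync(log_fd) != 0)         /* make it durable before acking */
        return -1;
    index_map[blkno] = where;       /* newest copy wins */
    return 0;
}

int log_read(int blkno, char buf[BLKSZ])
{
    if (index_map[blkno] < 0)
        return -1;                  /* block never written */
    return pread(log_fd, buf, BLKSZ, index_map[blkno]) == BLKSZ ? 0 : -1;
}

int main(void)
{
    char block[BLKSZ] = "hello, log-structured world";
    memset(index_map, -1, sizeof index_map);   /* all offsets start at -1 */
    log_fd = open("fs.log", O_RDWR | O_CREAT, 0644);
    if (log_fd < 0)
        return 1;
    log_write(7, block);
    log_read(7, block);
    puts(block);
    close(log_fd);
    return 0;
}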
The research material may still be useful; other projects continue.
Successor project to lfs.
LFS - Large File Summit - Greater than 2GB file sizes on 32-bit systems
Provides several things:
Allows support for files whose size exceeds the maximum value of the long datatype on 32-bit systems (2GB-1).
Page cache supporting at least up to 1TB file sizes.
A few behaviour fixes to bring some file-related functions into POSIX compliance.
Note that this will not be compatible with older NFS (pre-NFSv3; RFC 1094), which defines only a 32-bit file offset.
LFS has been widely implemented on all the Unix flavours still in widespread use. We are now getting to the point where 64-bit systems are common enough that LFS is starting to become irrelevant.
The point of this filesystem is that it:
Supports DVDs, and
Allows more efficient "CD-burning" schemes to be used.
News File System
Some of the above schemes could be combined to achieve a file system optimized for handling NNTP/INN news spools.
News is quite special in that it results in:
Relatively small files, as most posts do not exceed 4K in size
This encourages small cluster sizes so that space is not wasted
Spool directories containing thousands of files
With ext2, where directory entries are kept in a "list-like" data structure, accesses to files by name become increasingly inefficient as directory size grows
Various expiry information that is probably more important than creation/modification dates.
If expiry information were stored in the date "fields," both addition and deletion of news could take place faster.
An interesting test of the efficacy of new filesystems is to try to use them for a news spool.
Large File Support
A current weakness with Linux is in support for very large files. The commonly-used ext2 file system supports up to 4TB filesystems, which indeed does qualify as "very large." Files are nonetheless restricted to 2GB, which, for some applications, is not very large.
SAS held the SAS Large Files Summit for UNIX, where suggested APIs and approaches were presented as part of an X/Open "summit" to allow UNIX systems to portably support very large files. Linux's approach should follow this...
"I need to have files bigger than 2GB. What's the big problem?"
There are several issues.
REAL reason for 2GB file size limit
NIT: The real limitation for POSIX file sizes on 32-bit architectures isn't directly from POSIX, but rather, indirectly from ISO/ANSI C (which is assumed by POSIX).
In POSIX, the lseek() file offset is defined as off_t. There's nothing to prevent a 32-bit POSIX-compliant implementation from typedef'ing off_t as a 64-bit signed "long long". Similarly, ISO/ANSI C's fsetpos()/fgetpos() can be fixed by typedef'ing fpos_t to be a 64-bit long long.

The problem lies with ISO/ANSI C's fseek()/ftell(), which use a "long" for the offset. Why fpos_t wasn't used (consistently!) is beyond me, but hindsight is 20/20. AFAIK, these are the only two functions that must break if greater-than-2GB files are permitted with 32-bit longs. Other functions can break (like the present case), but aren't required to break by the POSIX+C standards.
Thus, it is ISO/ANSI C compatibility that is necessarily broken for file systems that support bigger-than-2GB files with 32-bit longs -- something that applies even to non-POSIX file systems that claim compatibility with ISO/ANSI C. I'm sure you all can think of at least two, but that's a discussion for the advocacy lists.
-- Chuck Phillips <cdp@peakpeak.com>
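For reference, here's roughly what the fix looks like from the application side on a modern glibc system; this is a sketch, not authoritative LFS documentation. Compiling with _FILE_OFFSET_BITS=64 makes off_t 64 bits wide, and fseeko()/ftello() take off_t where ISO C's fseek()/ftell() are stuck with long.

/* Ask glibc for the 64-bit off_t interfaces (the LFS transitional
   API); must come before any system header. */
#define _FILE_OFFSET_BITS 64

#include <stdio.h>
#include <sys/types.h>

int main(void)
{
    FILE *fp = fopen("bigfile", "w+");
    if (!fp)
        return 1;

    /* Seek past the old 2GB barrier; with a 32-bit long this offset
       simply cannot be expressed as an fseek() argument. */
    off_t big = (off_t)3 * 1024 * 1024 * 1024;   /* 3GB */
    if (fseeko(fp, big, SEEK_SET) != 0)
        return 1;
    fputc('x', fp);                /* creates a sparse 3GB file */

    printf("now at offset %lld\n", (long long)ftello(fp));
    fclose(fp);
    return 0;
}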
Or, alternatively, Alexander Viro has an entertaining answer for you...
A: because VM in Linux 2.2 and earlier can't cope with files larger than 2GB on 32-bit architectures. Regardless of filesystem.
A: use 2.4 or 2.2 with LFS patches or FreeBSD. All of them will handle more than 2Gb on ext2.
A: because if libc thinks that offsets are 32 bit it's not going to pass anything larger to the kernel
A: get sufficiently recent libc. And learn to use search engines, already - all that stuff has been beaten to death many times.
SGI has made their XFS filesystem available on Linux. Some notable properties of XFS include:
Supports Very Large Files
Supports journalled metadata
Uses B-Trees to represent directories, so that directory accesses take O(log n) time.
Clockwise real-time filesystem for Linux
Allows control over the scheduling of disk access requests, providing both "best effort" servicing as well as "real time," with a specified Quality of Service "contract."
StegFS - A Steganographic File System for Linux
StegFS is a Steganographic File System for Linux. Not only does it encrypt data, it also hides it such that it cannot be proved to be there.
Tailmerging is a technique that I first heard about from its use in ReiserFS. Tailmerging makes use of the wasted space in the last block of each file by sharing each tail block among a few files. Each file knows where to look in the tail block to find its own tail.
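To see what's at stake, here's a back-of-the-envelope calculation (assuming, for illustration, 4KB blocks) of the slack space that tail merging could reclaim:

/* Per-file slack: the unused portion of the partially-filled last
   block, which tail merging lets several files share. */
#include <stdio.h>

int main(void)
{
    const long blocksize = 4096;
    const long sizes[] = { 100, 4000, 4096, 9000 };  /* sample file sizes */

    for (int i = 0; i < 4; i++) {
        long tail  = sizes[i] % blocksize;           /* bytes in last block */
        long waste = tail ? blocksize - tail : 0;    /* slack to reclaim */
        printf("size %5ld: tail %4ld, wasted %4ld bytes\n",
               sizes[i], tail, waste);
    }
    return 0;
}

For a news spool full of sub-4K files, nearly every file wastes most of a block, which is why this matters.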
Cloudburst: A Compressing, Log-Structured Virtual Disk for Flash Memory
BSD Soft Updates
This is an alternative approach to journalling, which tracks and enforces dependencies among metadata updates to ensure that disk filesystems remain consistent. It imposes a partial ordering on buffer cache operations, which allows the requirement for synchronous directory updates to be eliminated; directory updates can see large performance increases. It also may allow deferring fsck runs, which may be run in the background while the system is "live". Performance is similar to what journalling provides: somewhat better in some cases, a little worse in others.
A read/write NTFS-compatible filesystem for Linux. It uses the Microsoft Windows ntfs.sys driver, running atop a layer that emulates the needful portions of the Windows NT kernel.
LFS: A Log Structured File System for Linux that Supports Snapshots
The Perl Filesystem (Kernel module to let you hook Perl code in to make up FSes)
Perl vs. traditional Filesystems
People have been known to react to the Perl Filesystem with "Why?", so I thought I'd compare the job of writing filesystems in Perl and C and let you draw your own conclusions.
Perl
Filesystem will work the same on any supported system, any supported kernel version. If somebody gives you a pre-built module, you won't even need the kernel sources.
Most bugs will cause error messages and meaningful syslog entries.
Some filesystems might be slower (but our example "Net" filesystem spends all the time waiting for servers at the other end, so it'd be just as slow in any other language).
Traditional
You need to recompile your filesystem for every combination of operating system/version where you want to use it. In most cases, this requires extensive rewriting (just look at the loadable kernel module which supports PerlFS - it tries to work on two kernel versions of the same operating system, and it contains more conditional compilation than is good for sanity)
Most bugs will result in a kernel panic or at best some obscure syslog entry.
Some filesystems might be faster.
"Why can't you use userfs?" - I wish I could find a recent version.
Another question I get is "Why not write a Perl NFS server instead?" - Because the NFS protocol is not flexible enough for some of the things I plan to do.
docfs - Unified Documentation Storage and Retrieval for Linux Systems
And now, for something completely different...
This project proposes creating special file systems that dynamically format documentation into the requested format. For instance, the "original source" would be in /usr/doc/sgml in SGML form. When a request is made for the manual page in /usr/man, this file system would dynamically run the SGML-to-GROFF translator, producing the manual page "on the fly." Similarly, accessing /usr/info/something would result in the SGML source being turned into TeXInfo form.
Usenetfs: A Stackable File System for Large Article Directories
File System development is very difficult and time consuming. Even small changes to existing file systems require deep understanding of kernel internals, making the barrier to entry for new developers high. Moreover, porting file system code from one operating system to another is almost as difficult as the first port. Past proposals to provide extensible (stackable) file system interfaces would have simplified the development of new file systems. These proposals, however, advocated massive changes to existing operating system interfaces and existing file systems; operating system vendors and maintainers resist making any large changes to their kernels because of stability and performance concerns. As a result, file system development is still a difficult, long, and non-portable process.
The FiST (File System Translator) system combines two methods to solve the above problems in a novel way: a set of stackable file system templates for each operating system, and a high-level language that can describe stackable file systems in a cross-platform portable fashion. Using FiST, stackable file systems need only be described once. FiST's code generation tool, fistgen, compiles a single file system description into loadable kernel modules for several operating systems (currently Solaris, Linux, and FreeBSD).
PyVen - for implementing Userspace Filesystems in Python, atop Coda
A file system "server" that stores files in a PostgreSQL database, accesses being handled using NFS clients.
The point of the exercise is to provide automatic versioning, so that one can compare current file "sets" to those that existed at a previous point in time, rolling forward and back as necessary.
This provides a "pervasive" equivalent to CVS.
Code hasn't been sighted in several years.
The Design and Implementation of the Inversion File System
A filesystem implemented atop Postgres. It was slower than NFS when each update was treated as atomic under "standard" Unix/NFS semantics. When they were able to run file operations within the DBMS, it was rather a lot faster...
Alex Viro's Per-Process Namespaces for Linux 2.4.2
This is based on the Plan 9 notion of namespaces.
In effect, a namespace associates a set of mounts of filesystems with a process, rather than the traditional Unix approach of associating them with a central table for the system as a whole.
This leads to the notion of mounting "private" filesystems that are visible only to a particular process (and perhaps its children). One thing that this would be useful for is in enhancing system security.
For instance, if I'm using CFS to secure a directory, with the traditional Unix approach, I might use the command cattach /home/cbbrowne/secret_stuff secretstuff to mount the data in /home/cbbrowne/secret_stuff on /crypt/secretstuff. Unfortunately, anyone on the system with suitable permissions can look in /crypt/secretstuff and see the readable version of the data. That's not terribly secret; I have to be quite careful to keep my data secret!
With a per-process namespace, the mount might be associated with a specific process, and its children. It would be invisible to other processes belonging to other users, and (for better or worse) is even invisible to processes that are not children of that environment. That's rather more secure.
Mind you, that does not necessarily help in this particular situation, since CFS behaves as a pretty much public NFS server for the host; the "mount" is for /crypt as a whole, not for each individual encrypted directory...
The other really cool thing that starts to become more practical is the notion of mapping data structures onto virtual filesystems. For instance, you might create a "driver" that maps DBM files so that one looks like a directory with a whole bunch of files.
I might thus do mount -t dbm /home/cbbrowne/data/file.dbm /home/cbbrowne/mounts/file and be given the ability to do the following sorts of things:
List the keys via ls /home/cbbrowne/mounts/file
Achieving:
key1 key2 key3 key4
Show the value for a key via cat /home/cbbrowne/mounts/file/key4
value4
More interestingly, we might create entries via echo "value 5" > /home/cbbrowne/mounts/file/key5
None of this would be conceptually impossible with a public namespace; the merit of the namespaces remaining private is that these sorts of isomorphisms are not blathered around publicly.
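For reference, here's roughly what that isomorphism amounts to when coded directly against the POSIX ndbm interface rather than through a (hypothetical) filesystem driver; this sketch of my own just walks the keys and fetches each value, which is what the ls and cat above would map onto:

/* Enumerate a DBM file the way the hypothetical "mount -t dbm"
   would expose it: each key becomes a "file name", each value its
   contents.  Uses the POSIX ndbm interface. */
#include <fcntl.h>
#include <ndbm.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s /path/to/file.dbm\n", argv[0]);
        return 1;
    }

    DBM *db = dbm_open(argv[1], O_RDONLY, 0);
    if (!db)
        return 1;

    for (datum key = dbm_firstkey(db); key.dptr != NULL;
         key = dbm_nextkey(db)) {
        datum val = dbm_fetch(db, key);
        int vlen = val.dptr ? (int)val.dsize : 0;
        printf("%.*s -> %.*s\n",
               (int)key.dsize, (char *)key.dptr,
               vlen, vlen ? (char *)val.dptr : "");
    }
    dbm_close(db);
    return 0;
}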
There are a number of cryptographic filesystems wherein a virtual filesystem is somehow authenticated at mount time and made accessible to the user.
AtFS - Attribute Filesystems - provides uniform access to immutable revisions of files
Allowing use of AES encryption for filesystems...
LUFS (Linux Userland FileSystem) is a hybrid userspace filesystem framework supporting an indefinite number of filesystems (localfs, sshfs, ftpfs, cardfs, and cefs implemented so far), transparently for any application.
For instance, consider ftpfs, the FTP File System, a Linux kernel module enhancing the VFS with FTP volume mounting capabilities. That is, you can "mount" FTP shared directories in your very personal file system and take advantage of local file ops.
LOCASEFS
LoCaseFS provides a lowercase mapping of the local file system. It comes in handy when importing win32 source trees on *nix systems.
SSHFS
SshFS is probably the most advanced LUFS file system because of its security, usefulness, and completeness. It is based on the SFTP protocol and requires OpenSSH. You can mount remote file systems accessible through sftp (the scp utility).
GNUTELLAFS
You mount a gnetfs in ~/gnet. You wait a couple of minutes so it can establish its peer connections. You start a search by creating a subdirectory of SEARCH: mkdir "~/gnet/SEARCH/metallica mp3". You wait a few seconds for the results to accumulate. Then you chdir to "SEARCH/metallica mp3" and try an ls; surprise - the files are there!
You fire up mpg123 and enjoy... You are happy.
A project to replace the traditional filesystem with a new document store.
The idea is to store data as BLOBs in a relational database, notably PostgreSQL, along with document attributes. Users would then look for documents based on the attributes, as opposed to designing (usually badly) a hierarchy.
redisfs - Replication-Friendly Redis-based filesystem
This implements a filesystem which stores data atop the Redis database.
Allows mounting Google Drive as a Linux filesystem
Tagsistant is a tool to organize files in a semantic way, which means using tags.
NFS is the "traditional" networked filesystem used on Linux and Unix.
The goal of the Global File System research project is to develop a serverless file system that exploits new interfaces like Fibre Channel that allow network attached storage. (Buzzword: SAN = Storage Area Network.)
The critical notion is that the system is serverless. With a traditional networked storage system like NFS, one host "owns" the filesystem and then provides access as a server, so that other hosts access the data through that server.
GFS eschews having "a server;" the shared-SCSI version exploits SCSI command extensions that provide a locking scheme such that multiple hosts may simultaneously access and update the filesystem directly across the SCSI bus. None of the hosts "owns" the filesystem.
Coda is a distributed filesystem with its origin in AFS2. It has many features that are very desirable for network filesystems. Currently, Coda has several features not found elsewhere.
Disconnected operation for mobile computing
Is freely available under a liberal license
High performance through client side persistent caching
Server replication
Security model for authentication, encryption and access control
Continued operation during partial network failures in server network
Network bandwidth adaptation
Good scalability
Well defined semantics of sharing, even in the presence of network failures
Oversimplifying somewhat, clients use a cache to store changes that are made to files. They then push updates back to the server, which then distributes changes to other clients.
By having a sufficiently large cache, it can operate even when systems are disconnected, deferring "pushing updates back to the server" until the server is again available.
It implemented the cache using RVM (Recoverable Virtual Memory).
InterMezzo is a new distributed file system with a focus on high availability. InterMezzo is an Open Source project, currently on Linux (2.2 and 2.3). A primary target of our development is to provide support for flexible replication of directories, with disconnected operation and a persistent cache. It was "deeply inspired" by Coda, and was originally started as part of that project.
Unison is a file-synchronization tool for Unix and Windows. It allows two replicas of a collection of files and directories to be stored on different hosts (or different disks on the same host), modified separately, and then brought up to date by propagating the changes in each replica to the other.
The Inferno operating system provides a distributed file access protocol called Styx which would be interesting to use on other OSes, perhaps even Linux...
A new distributed file-sharing system featuring fast, exhaustive searches and modest network bandwidth requirements. Written in Java 1.1 (with Swing GUI) for platform independence.
A file sharing system organized around users, allowing users to expose files to those users they wish to provide them to.
Lustre is a storage and file system architecture and implementation designed for use with very large clusters.
A partition editor for creation, deletion, resizing, moving, and copying of disk partitions.
linux.oreillynet.com: Proper Filesystem Layout [Oct. 11, 2001]
fstransform may be used to do in-place transformations of filesystems between several interesting Linux choices, including xfs, jfs, reiserfs, ext2, ext3, ext4, without a need to do additional backups.
Amazon has created a storage service, S3, which offers a web-service-based API. It is quite widely used for data access, file storage, and backups, and is extensively used by Amazon for hosting data for its EC2 virtualization service.
I haven't yet had call to use it directly, though I use it via some proxies (e.g., Dropbox). I would be particularly interested in seeing alternative implementations of the server side emerge. There's code out there, though at this stage it's neither particularly easy to deploy nor totally interoperable.