Q1: How are small files generated in the HEP field? Any other scientific fields?
Q2: What are the small files used for? Read? Write? Copy? Transfer?
Q3: What is the scale of clients? thousands? millions?
Investigation of the "small files" problem
Q2: Why is there a "small files" problem?
Efficient Access to Many Small Files in a Filesystem for Grid Computing, Douglas Thain et al., Sep. 2007
Unfortunately, the data throughput of small file operations on both networks and filesystems is many orders of magnitude worse than the bulk transfer speeds available with large files. On the network, this is because protocols such as FTP [15] treat a single file as a distinct heavyweight transaction that requires an individual network stream and authentication step.
A network filesystem such as NFS [16] has the opposite problem: files are accessed on demand in small page-sized chunks, resulting in many network round trips and poor performance. This is particularly harmful in grids, where high network latencies are common.
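To make the quoted point concrete, here is a back-of-the-envelope sketch (not from the paper; the per-file setup cost and link bandwidth are illustrative assumptions) of why per-file overhead dominates when many small files are moved one transaction at a time:

```python
# Back-of-the-envelope model of small-file transfer overhead.
# All numbers below are illustrative assumptions, not measurements.

PER_FILE_SETUP_S = 0.2      # assumed stream setup + authentication cost per file (FTP-style)
BANDWIDTH_BPS = 100e6 / 8   # assumed 100 Mbit/s link, in bytes per second

def transfer_time(num_files: int, bytes_per_file: int) -> float:
    """Total time = per-file setup cost + payload / bandwidth, summed over files."""
    return num_files * (PER_FILE_SETUP_S + bytes_per_file / BANDWIDTH_BPS)

small = transfer_time(num_files=10_000, bytes_per_file=1_000)      # 10,000 x 1 KB
bulk  = transfer_time(num_files=1,      bytes_per_file=10_000_000) # 1 x 10 MB

print(f"10,000 small files: {small:8.1f} s")   # setup cost dominates (~2000 s)
print(f"one 10 MB file:     {bulk:8.1f} s")    # payload dominates (~1 s)
```

Under these assumed numbers the same 10 MB of payload takes roughly three orders of magnitude longer as 10,000 small transfers, which matches the "many orders of magnitude" claim above.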
from the asker: The filesystems that I tried (ext4, btrfs) have some problems with the positioning of files on disk. Over a longer span of time, the physical positions of files on the disk (rotating media, not a solid-state disk) become more randomly distributed. The negative consequence of this random distribution is that the filesystem gets slower (e.g., 4 times slower than a fresh filesystem).
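One rough way to inspect this on Linux is the filefrag tool from e2fsprogs, which reports how many extents each file occupies. The sketch below (assuming filefrag is installed and the files are readable) tallies extent counts over a directory tree; note that per-file extent fragmentation is only a proxy for the inter-file scattering the asker describes.

```python
# Rough fragmentation survey: count extents per file with filefrag (e2fsprogs).
# Assumes a Linux system with filefrag installed and read access to the files.
import os
import re
import subprocess
import sys

def extent_count(path: str) -> int:
    """Parse 'path: N extents found' from filefrag output; return 0 on failure."""
    try:
        out = subprocess.run(["filefrag", path], capture_output=True,
                             text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        return 0
    match = re.search(r"(\d+) extents? found", out)
    return int(match.group(1)) if match else 0

root = sys.argv[1] if len(sys.argv) > 1 else "."
counts = []
for dirpath, _dirs, files in os.walk(root):
    for name in files:
        counts.append(extent_count(os.path.join(dirpath, name)))

if counts:
    print(f"files: {len(counts)}, "
          f"avg extents: {sum(counts)/len(counts):.1f}, max: {max(counts)}")
```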
Q1: How are small files generated in the HEP field? Any other scientific fields?
Efficient Access to Many Small Files in a Filesystem for Grid Computing, Douglas Thain et al., Sep. 2007
There are also many production workloads that manipulate primarily large numbers of small files. For example, in bioinformatics applications such as BLAST [14], it is common to run thousands of small (less than 100 bytes) string queries against a constant genomic database of several GB.
In other cases, a standard grid application may depend on the installation of a complex software package consisting of executables, dynamic libraries, and configuration files. The package must either be accessed at run-time over the network, resulting in many small network operations, or installed on a worker node, resulting in a large number of small file creations.
Or, a grid computing system may create a large number of small files internally in the course of executing a workload for the inputs, outputs, log files, and so forth.
XFS's focus is higher scale - more disks, more cores, etc
btrfs's focus is advanced technology, esp. snapshots and online repair capability.
non-local file systems
NFSv3 largely about familiarity, broad feature support, good performance for common workloads
GFS2 & OCFS2, shared-storage filesystems, focused on high availability & high consistency
Lustre focused on very high performance on "embarrassingly parallel" I/O workloads when backed by top-of-the-line hardware
PVFS2 has historically pursued a similar track (as Lustre), though the OrangeFS branch is trying to address general-purpose needs more
GlusterFS is focused on general-purpose use with cheap commodity hardware, and on flexibility/modularity
Ceph is focused on the very latest algorithms to scale up to petabytes, even though the production-level implementation of those algorithms might take longer than a simpler approach would have
HDFS and GoogleFS are highly specialized for the needs of their respective creators
worthy of mention are XtreemFS and Gfarm (both focused on wide-area distribution), Pomegranate (optimizing for very many small files), and Tahoe-LAFS (privacy/security)
NFSv4, especially with pNFS, is a bit of a weird network/distributed hybrid. It's still fundamentally a single-server model, but with little bits of multi-server support grafted on
there's still a lot of work on B-tree based filesystems like ZFS and btrfs, but I think that has passed its peak as a research area and the focus will shift elsewhere
Making all layers of the storage stack work better with SSDs was a huge topic at FAST'11
Better repair and data-integrity guarantees are also getting more attention as capacities continue to outstrip speeds by ever greater ratios
For distributed filesystems
the biggest challenge IMO (in my opinion) - and this is said as a filesystem developer - is keeping them relevant
I think Gluster is on the right track offering both filesystem and object (S3/Swift) APIs on top of the same basic infrastructure, as is Ceph with filesystem and block APIs (RBD)
Another area of inquiry is providing ways for higher-level systems such as Hadoop to reason about the physical location of data that has been put into a distributed filesystem (or for that matter any other kind of storage besides Hadoop's own HDFS)
from Kartik Ayyar, Distributed filesystem developer
from Ravi Tandon
I believe file systems are moving towards a flat, object-oriented structure; file system sizes are getting too large to store them hierarchically. Flat file systems are those where files, directories, and symbolic links are objects that can be tagged. These tags will be generated semantically, based on the content of the files, and can then be hashed to locate a file (a small sketch of the idea follows below). This has two primary advantages:
First, file system objects can be clustered together in a semantically meaningful way.
Second, search time would be greatly reduced thanks to object tagging. File system storage technologies are also moving towards log-structured file systems; flash disks and SSDs will propel the file systems of the future.
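A minimal sketch of that tag-hashing idea, with invented names and in-memory dicts standing in for a real flat namespace:

```python
# Minimal sketch of a flat, tag-addressed namespace (illustrative only; the tag
# names, hashing scheme, and in-memory dicts are assumptions, not a real design).
import hashlib
from collections import defaultdict

class FlatStore:
    def __init__(self):
        self.objects = {}                 # object id -> content
        self.by_tag = defaultdict(set)    # tag hash -> set of object ids

    @staticmethod
    def _tag_key(tag: str) -> str:
        # Hash the semantic tag so lookup is by a fixed-size key, not a path walk.
        return hashlib.sha256(tag.encode()).hexdigest()

    def put(self, obj_id: str, content: bytes, tags: list[str]) -> None:
        self.objects[obj_id] = content
        for tag in tags:
            self.by_tag[self._tag_key(tag)].add(obj_id)

    def find(self, tag: str) -> set[str]:
        # Locate objects by tag instead of walking a directory hierarchy.
        return self.by_tag.get(self._tag_key(tag), set())

store = FlatStore()
store.put("run-42.log", b"...", tags=["log", "experiment:atlas", "2024-10"])
print(store.find("experiment:atlas"))   # {'run-42.log'}
```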
from Jim Dennis
need to resolve the problem of traversal
I think we need to see filesystems adding support for their own indexing (essentially providing continuously/incrementally updated indexes for various attributes, plus a query language and engine for retrieving subsets of those) and providing enhanced "inotify/dnotify"-style APIs (allowing applications to register for event-driven notification of changes to directories, file contents, and/or file metadata for whole trees), as sketched below.
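A toy sketch of the whole-tree change-notification idea; it polls st_mtime for simplicity, whereas a real filesystem-level API would push events inotify/dnotify-style, and all names here are illustrative:

```python
# Toy change-notification sketch: an application registers a callback for a whole
# tree and gets events when files are created, deleted, or modified. Polling-based
# for simplicity; a real kernel-level API would push these events instead.
import os
import time
from typing import Callable

def snapshot(root: str) -> dict[str, float]:
    """Map every file under root to its current modification time."""
    state = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                state[path] = os.stat(path).st_mtime
            except OSError:
                pass   # file disappeared between walk and stat
    return state

def watch(root: str, on_change: Callable[[str, str], None], interval: float = 1.0) -> None:
    """Call on_change(event, path) whenever the tree under root changes."""
    old = snapshot(root)
    while True:
        time.sleep(interval)
        new = snapshot(root)
        for path in new.keys() - old.keys():
            on_change("created", path)
        for path in old.keys() - new.keys():
            on_change("deleted", path)
        for path in new.keys() & old.keys():
            if new[path] != old[path]:
                on_change("modified", path)
        old = new

# Example (runs forever): watch("/tmp/watched", lambda event, path: print(event, path))
```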
from Roy Lacombe -- from the perspective of hardware changes
I think adoption of memristors (why isn't this word in my Safari's spellchecker already?) alone can result in big paradigm shifts not only in file systems, but in OS philosophy and in programming too.
from Alan Cohen -- local filesystem
I think we will see file systems that are more like ZFS.
Why are traditional file systems rapidly becoming obsolete?
First, there's the "lots of objects" problem.
Second, there's the metadata problem.
Third, there's the policies and services problem.
Object-based information stores are different than filesystems in several important ways
First, you use a token or other uniform identifier to get your information
Second, they have the ability to associate all sorts of metadata with the object itself
Third, the ability to hang metadata off the object gives us the ability to create all sorts of useful policies and services around the information without having to put everything in some sort of database or repository.
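A minimal sketch tying those three points together: retrieval by a uniform token, metadata attached to the object, and a simple metadata-driven policy. The class and field names are assumptions for illustration only:

```python
# Minimal sketch of an object-based information store: objects are retrieved by a
# uniform token rather than a path, arbitrary metadata rides along with the object,
# and simple policies (here, retention) are driven purely by that metadata.
import time
import uuid

class ObjectStore:
    def __init__(self):
        self._data = {}   # token -> (content, metadata)

    def put(self, content: bytes, **metadata) -> str:
        token = str(uuid.uuid4())                 # uniform identifier, no hierarchy
        metadata.setdefault("created", time.time())
        self._data[token] = (content, metadata)
        return token

    def get(self, token: str) -> bytes:
        return self._data[token][0]

    def metadata(self, token: str) -> dict:
        return self._data[token][1]

    def expire(self, max_age_s: float) -> int:
        """Policy example: drop objects older than max_age_s, using only metadata."""
        now = time.time()
        stale = [t for t, (_, md) in self._data.items()
                 if now - md["created"] > max_age_s]
        for t in stale:
            del self._data[t]
        return len(stale)

store = ObjectStore()
token = store.put(b"calibration constants", detector="tracker", owner="alice")
print(store.metadata(token)["detector"])   # tracker
```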
In short, any data-intensive job is a good target for parallel filesystems. However, you're likely to see more gains on large I/Os than on small I/Os, because smaller I/Os have a heavier metadata component.
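A quick way to see that metadata component locally (a rough sketch; the sizes and counts are arbitrary, and absolute numbers depend heavily on filesystem, cache state, and hardware) is to time reading the same number of bytes as many small files versus one large file:

```python
# Rough local timing sketch: same total bytes read as 10,000 small files vs. one
# large file. Sizes/counts are arbitrary; results vary with filesystem and cache.
import os
import tempfile
import time

def write_files(directory: str, count: int, size: int) -> list[str]:
    """Create `count` files of `size` random bytes each; return their paths."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"f{i:05d}")
        with open(path, "wb") as f:
            f.write(os.urandom(size))
        paths.append(path)
    return paths

def read_all(paths: list[str]) -> float:
    """Time reading every file; each open/read/close pays per-file metadata overhead."""
    start = time.perf_counter()
    for path in paths:
        with open(path, "rb") as f:
            f.read()
    return time.perf_counter() - start

with tempfile.TemporaryDirectory() as d:
    small_paths = write_files(d, count=10_000, size=1_024)   # 10,000 x 1 KiB
    big_path = os.path.join(d, "big.bin")
    with open(big_path, "wb") as f:
        f.write(os.urandom(10_000 * 1_024))                  # one 10 MiB file

    print(f"10,000 small reads: {read_all(small_paths):.3f} s")
    print(f"one large read:     {read_all([big_path]):.3f} s")
```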