
FileStore Project Overview


The goal of this project is to build a prototype of a file storage service targeted at end users in the high energy physics community. Although nothing in the system is specific to physics, members of that community will be our first target users. The experience we gain in satisfying their use cases will help us extend or adapt the system to a larger audience.


Experiments in the field of high energy physics are now accustomed to exploiting the computing and storage resources of the grid infrastructures they are allowed to use. Tools for storing, cataloguing and transferring large amounts of experimental data split across millions of files exist and are used on a daily basis by the experiments' computing operations teams. However, the tools available to end users for storing their individual data are not convenient enough.

This project aims to explore the available technology for building a more user-friendly way of remotely storing and retrieving individual users' data. Ideally, users will interact with the system through the usual tools and metaphors of their laptop or desktop, as if the files were stored locally.

Who will use the service?

The service is intended to be used by individuals involved in high energy physics experiments around the world.

Who will provide the service?

The storage service would be provided by computing centres of the high energy physics (HEP) community around the world. Those centres have good international connectivity, generally provide 24x7 service, have expertise in running IT services, and are accustomed to storing significant amounts of research data.

What mechanisms will the system provide for interacting with it?

The system will provide a reduced set of basic operations, namely:

  • list files
  • store file
  • retrieve file
  • delete file
  • create directory
  • delete directory

The system is not intended to provide the full set of POSIX semantics (in particular seek and partial read or write, among others). Its main goal is to provide a repository of users' files that can easily be stored and retrieved for processing with the computing capacity of the user's laptop or desktop, the computing capacity provided by the grid (think of it as a personal storage element), or both. We target a usage profile in which retrieve operations are more frequent than store operations.
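
To make the scope of the six basic operations more concrete, here is a minimal sketch in Python of what such an interface could look like. It is only an illustration: the class name `InMemoryFileStore`, the method names and the in-memory storage are assumptions made for this example and are not part of the prototype's design. Note that files are stored and retrieved whole, with no seek or partial read or write.

```python
from __future__ import annotations

import posixpath


class InMemoryFileStore:
    """Toy, in-memory stand-in for the six basic operations (hypothetical API)."""

    def __init__(self) -> None:
        self._dirs: set[str] = {"/"}          # known directories
        self._files: dict[str, bytes] = {}    # full path -> whole file content

    @staticmethod
    def _norm(path: str) -> str:
        # Turn any input into a normalized absolute path such as "/a/b".
        return posixpath.normpath("/" + path.strip("/"))

    def _require_dir(self, path: str) -> None:
        if path not in self._dirs:
            raise FileNotFoundError(f"no such directory: {path}")

    def list_files(self, path: str = "/") -> list[str]:
        path = self._norm(path)
        self._require_dir(path)
        entries = self._dirs | set(self._files)
        return sorted(
            posixpath.basename(p)
            for p in entries
            if p != "/" and posixpath.dirname(p) == path
        )

    def store_file(self, path: str, data: bytes) -> None:
        path = self._norm(path)
        self._require_dir(posixpath.dirname(path))
        if path in self._dirs:
            raise IsADirectoryError(path)
        self._files[path] = bytes(data)  # whole-file write, replaces any old copy

    def retrieve_file(self, path: str) -> bytes:
        path = self._norm(path)
        if path not in self._files:
            raise FileNotFoundError(f"no such file: {path}")
        return self._files[path]  # whole-file read, no seek

    def delete_file(self, path: str) -> None:
        path = self._norm(path)
        if path not in self._files:
            raise FileNotFoundError(f"no such file: {path}")
        del self._files[path]

    def create_directory(self, path: str) -> None:
        path = self._norm(path)
        self._require_dir(posixpath.dirname(path))
        if path in self._files:
            raise FileExistsError(path)
        self._dirs.add(path)

    def delete_directory(self, path: str) -> None:
        path = self._norm(path)
        self._require_dir(path)
        if path == "/":
            raise PermissionError("cannot delete the root directory")
        if self.list_files(path):
            raise OSError(f"directory not empty: {path}")
        self._dirs.remove(path)


# Example session; the paths and content are made up.
store = InMemoryFileStore()
store.create_directory("/analysis")
store.store_file("/analysis/histos.root", b"payload")
print(store.list_files("/analysis"))
```

A real service would of course sit behind a network protocol and persistent, distributed storage; the sketch only shows how small the operation set is compared with a POSIX file system.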


The prototype we are building will support about 100 users, each with 1 TB or 2 TB of storage allocated for individual use.
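
As a rough sizing aid, the total capacity implied by these figures can be worked out as follows. The function name and the `copies` parameter are assumptions introduced for this example; in particular, the number of copies kept of each file is not specified in this document and is shown only to illustrate how redundancy would multiply the raw capacity.

```python
def required_capacity_tb(users: int, quota_tb: int, copies: int = 1) -> int:
    """Raw capacity in TB if every user fills the quota (hypothetical helper)."""
    return users * quota_tb * copies


low = required_capacity_tb(100, 1)    # 100 users with 1 TB each
high = required_capacity_tb(100, 2)   # 100 users with 2 TB each
print(f"{low}-{high} TB")             # prints "100-200 TB"

# Assumption for illustration only: two copies of every file.
print(required_capacity_tb(100, 2, copies=2))  # prints "400"
```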


Below is a list of requirements we think the system must satisfy to be usable:

  • Availability: the system must tolerate unavailability of computing centres hosting the data, as well as intra- and inter-centre network failures. There must be no single point of failure.
  • Confidentiality: the data must be transported using secured channels. Whether the data should be stored encrypted or not remains to be determined with the initial user community.
  • Performance: no stringent I/O performance is required from this system, as it is not intended to be used as a high-performance storage system serving data directly to the applications processing them, but rather as a resilient and extensible repository for files. However, some degree of interactivity will be welcome.
  • Scalability: the system needs to scale horizontally by adding new file servers.
  • Resiliency: the system must automatically handle the inevitable failures of disk servers and network partitions.
  • Data sharing: sharing of files among authorized users must be possible and easy.
Topic revision: r4 - 2011-01-05 - FabioHernandez