
Meeting Minutes - 2012-08-09

Attendees: Ziyan, Gang, Fabio

Time: 2:30 p.m.

Secretary: Fabio

Project organization

  • Project information will be made publicly available through an editable web space, so that the participants can easily contribute
  • We will meet twice a month, on Thursdays at 2:30pm
  • The goals of the meetings are to review progress, to identify problems, to look for solutions and to identify next steps

Goal of the project

The goal of this project can be summarized as:

  • to explore the possibilities of unstructured data stores for storing and exploiting DIRAC accounting records
  • to explore the possibilities of integrating such a solution into the DIRAC source code

The benefits of using an unstructured data store for recording accounting data for the DIRAC system are:

  • separation of live data (for which consistency, low latency and integrity are needed), managed by MySQL, from historical data (at least for accounting), managed by another tool
  • scalability in terms of volume of data stored: an unstructured data store would allow us to keep historical records as detailed as necessary for a very long time

Work macro-plan

We have identified 3 major steps for this project. They are:

  • Step 1: get familiar with unstructured data stores
In this step Gang will get familiar with the possibilities and constraints of unstructured data stores. To do so, he will use Redis or MongoDB to store records of accounting information similar to the ones generated by DIRAC. This requires understanding what is currently stored in the DIRAC accounting records, by looking at the source code or contacting the DIRAC developers. The goal of this exercise is to understand what is needed to model the problem domain in terms of keys and associated values.
Even though we know upfront that neither Redis nor MongoDB is well suited for the problem at hand, they are very interesting tools for getting familiar with the field and for testing in the context of a single developer. In other words, one can very easily download and test them on a personal computer without needing a more complicated testbed.
The DIRAC code repository is hosted by Github and available at https://github.com/DIRACGrid
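
As an illustration of what modelling a record in terms of a key and an associated value could look like, here is a minimal sketch. It makes several assumptions: the field names are hypothetical and are not the actual DIRAC accounting schema, the key layout is only one possible choice, and a plain Python dictionary stands in for a key-value store such as Redis so that the example runs without any server.

```python
import json

# Hypothetical job accounting record. The field names are illustrative
# assumptions, not the actual DIRAC accounting schema.
record = {
    "type": "Job",
    "site": "LCG.Example.org",
    "user": "someuser",
    "start_time": 1344522600,  # epoch seconds
    "end_time": 1344526200,    # epoch seconds
    "cpu_time": 3200.0,        # seconds
}


def make_key(rec, sequence):
    """Build a key that encodes the record type, site and start time."""
    return "acct:{type}:{site}:{start_time}:{seq}".format(seq=sequence, **rec)


def put(store, rec, sequence):
    """Serialize the record to JSON, store it under its key, return the key."""
    key = make_key(rec, sequence)
    store[key] = json.dumps(rec, sort_keys=True)
    return key


def get(store, key):
    """Fetch and deserialize the record stored under the given key."""
    return json.loads(store[key])


store = {}  # plain dict standing in for a key-value store (assumption)
key = put(store, record, 1)
print(key)
print(get(store, key))
```

Encoding the site and start time in the key is one design choice among many. It matters because, in a key-value store, the key layout largely determines which lookups are cheap.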

  • Step 2: understand the queries the accounting database needs to serve in the framework of the DIRAC system
In this step we need to understand the kinds of queries that the accounting database needs to serve, for instance to answer the requests issued by the DIRAC dashboard.
We also plan to use real accounting data extracted from the production database of the DIRAC instance used for the LHCb experiment. We will need help from the DIRAC developers for this extraction.
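
To make the notion of a query more concrete, the sketch below shows the sort of aggregation a dashboard plot might need: total CPU time per site and per hour over a time window. This is only an assumption about what such a query could look like. The record fields, the site names and the function name are hypothetical, and the actual queries have to be identified from the DIRAC code and with the DIRAC developers.

```python
from collections import defaultdict

# Hypothetical accounting records. Field names and values are
# illustrative assumptions, not real DIRAC data.
records = [
    {"site": "SiteA", "end_time": 1344522600, "cpu_time": 100.0},
    {"site": "SiteA", "end_time": 1344524000, "cpu_time": 50.0},
    {"site": "SiteB", "end_time": 1344523000, "cpu_time": 70.0},
    {"site": "SiteA", "end_time": 1344530000, "cpu_time": 30.0},
]


def cpu_time_by_site_and_hour(recs, start, end):
    """Sum cpu_time per (site, hour bucket) for records ending in [start, end)."""
    totals = defaultdict(float)
    for rec in recs:
        if start <= rec["end_time"] < end:
            # Round the end time down to the start of its hour.
            bucket = rec["end_time"] - rec["end_time"] % 3600
            totals[(rec["site"], bucket)] += rec["cpu_time"]
    return dict(totals)


result = cpu_time_by_site_and_hour(records, 1344520800, 1344531600)
for (site, bucket), cpu in sorted(result.items()):
    print(site, bucket, cpu)
```

Whether a candidate store can serve this kind of grouped, time-windowed aggregation efficiently, or whether the results have to be pre-computed at insertion time, is one of the questions this step should help answer.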

  • Step 3: evaluate candidate stores for the DIRAC use-cases
In this step we will deploy a testbed for evaluating the open-source software systems that could be candidates for the use-cases we need to satisfy. Examples of systems to explore, in no particular order, are Cassandra, Riak, HBase and HyperDex. We want to understand the strengths and weaknesses of each system in the light of our use-cases of interest, not only from the software development point of view, but also from the system operations point of view (availability, scalability, resilience to faults, error logging, monitoring tools, etc.).

All our work will be done bearing in mind that the DIRAC system is written in Python and that, if our study is satisfactory, we will eventually need to integrate this work into the DIRAC system.

Next meeting

August 23rd, 2:30pm, Fabio's office.

Meeting ends at 3:35pm.

-- FabioHernandez - 2012-08-16

Topic revision: r1 - 2012-08-16 - FabioHernandez