
Meeting Minutes - 2012-09-13

Attendees: Ziyan, Gang, Fabio

Time: 2:30 p.m.

Secretary: Fabio

Progress report

Gang reported on the progress made since the previous meeting.

  • Gang's work was focused on understanding the organization of the accounting data in the MySQL database used by DIRAC and how the data is used by the DIRAC web portal. He prepared a presentation for this meeting which is available here.
  • The sample data was provided by Ricardo, who extracted it from the accounting database of the DIRAC instance used by the LHCb experiment. The volume of the sample is about 5 GB. Gang will find out how many records it contains and what period of time it covers. This information will allow us to estimate the accounting data volume of a relatively heavily loaded DIRAC instance.
  • The process of adding accounting records to the database can be summarized as follows: accounting records are stored in a table which acts as an input buffer. Records in this buffer are periodically scanned by an asynchronous process which normalizes the data, stores the normalized rows in their final destination tables and deletes the records from the input buffer. More details on Gang's findings are documented here and here.
  • Gang also performed an initial test of storing the accounting data in a key-value database, Redis in this case. This exercise is useful for understanding what is needed to model the problem with a non-relational model. He chose to store all the values of a column (in the accounting table) as a Redis list, associated with a key (the column name). As a consequence, each Redis key contains a list with more than 4 million values. In addition, extracting all the fields of a given accounting record requires reading through the 4 million values of every field, which is inefficient.
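
The input-buffer process described above can be sketched as follows. This is only an illustration, assuming SQLite in place of MySQL so that it runs without a server; the table and column names (input_buffer, key_user, key_site, job_records, user_name, site, cpu_time) are hypothetical and are not the actual DIRAC accounting schema.

```python
import sqlite3


def create_schema(conn):
    """Create a hypothetical buffer table, two key tables and a final table."""
    conn.executescript(
        """
        CREATE TABLE input_buffer (
            id INTEGER PRIMARY KEY, user_name TEXT, site TEXT, cpu_time REAL);
        CREATE TABLE key_user (id INTEGER PRIMARY KEY, value TEXT UNIQUE);
        CREATE TABLE key_site (id INTEGER PRIMARY KEY, value TEXT UNIQUE);
        CREATE TABLE job_records (
            id INTEGER PRIMARY KEY, user_id INTEGER, site_id INTEGER, cpu_time REAL);
        """
    )


def key_id(conn, table, value):
    """Return the id of `value` in a key table, inserting it if missing.

    `table` is an internal constant here, never user input.
    """
    conn.execute(f"INSERT OR IGNORE INTO {table} (value) VALUES (?)", (value,))
    row = conn.execute(f"SELECT id FROM {table} WHERE value = ?", (value,)).fetchone()
    return row[0]


def flush_buffer(conn):
    """Normalize every buffered record, store it, then delete it from the buffer."""
    rows = conn.execute(
        "SELECT id, user_name, site, cpu_time FROM input_buffer"
    ).fetchall()
    with conn:  # one transaction: commit on success, roll back on error
        for row_id, user_name, site, cpu_time in rows:
            conn.execute(
                "INSERT INTO job_records (user_id, site_id, cpu_time) VALUES (?, ?, ?)",
                (key_id(conn, "key_user", user_name),
                 key_id(conn, "key_site", site),
                 cpu_time),
            )
            conn.execute("DELETE FROM input_buffer WHERE id = ?", (row_id,))
    return len(rows)
```

In a real deployment flush_buffer would be called periodically by a separate process; here it simply returns the number of records it moved.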

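The cost of the column-per-key layout tested in Redis can be illustrated with plain Python dictionaries standing in for the Redis keyspace, so the sketch runs without a server. The second layout, one key per accounting record, is shown only for contrast: it is an assumption of this sketch and was not part of the test reported at the meeting. Field names and the "record" key prefix are hypothetical.

```python
def to_column_lists(records):
    """Column layout: one key per column, holding the list of all its values.

    In Redis this corresponds to one list per column name (RPUSH per value).
    """
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


def read_record_from_columns(columns, index):
    """Rebuild one record: one positional lookup in every column list.

    In Redis each lookup would be a LINDEX on a multi-million-element list,
    which is not a constant-time operation.
    """
    return {field: values[index] for field, values in columns.items()}


def to_record_hashes(records, prefix="record"):
    """Record layout: one key per record, holding all of its fields.

    In Redis this would be one hash per record (HSET to write, HGETALL to read).
    """
    return {f"{prefix}:{i}": dict(record) for i, record in enumerate(records)}
```

With the column layout, reading a record touches as many large lists as there are fields; with the record layout it is a single key lookup, at the price of making per-column scans more expensive.
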
After discussion, we agreed on the following next steps:

  • Load the sample data into a MySQL database and write the Python code required to produce the 5 most frequent accounting plots. The code will reuse as much as possible of what the DIRAC web portal uses, including the library for generating the images (matplotlib). The goal of this work is to have a baseline for comparison with an alternative implementation using NoSQL databases (see below).
  • Study several implementations of NoSQL databases in order to understand their strengths and weaknesses and to identify the criteria for selecting some of them for testing in this project. In particular, the databases which seem of interest for this use case are Hadoop, Cassandra, Riak and CouchDB. There are certainly others that can be considered.
  • Once this study is finished, we will select one of the databases and deploy it in a testbed. We will extract the sample records from the MySQL database and insert them into the selected NoSQL implementation. The Python code for generating the accounting plots will then be modified to take into account the new database, and the performance of this solution will be measured and compared with the results obtained using MySQL.
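
The comparison between the MySQL baseline and the NoSQL implementation could be driven by a small timing harness such as the one below. It is a minimal sketch: the function names time_query and compare_backends are hypothetical, and the callables passed in are assumed to wrap whatever code fetches the data for one accounting plot from each database.

```python
import statistics
import time


def time_query(run_query, repeats=5):
    """Run a zero-argument callable several times and summarize wall-clock time."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return {
        "median": statistics.median(durations),
        "min": min(durations),
        "max": max(durations),
    }


def compare_backends(backends, repeats=5):
    """Time the same logical query on each backend.

    `backends` maps a label (for example "mysql") to a zero-argument callable.
    """
    return {name: time_query(run, repeats) for name, run in backends.items()}
```

Reporting the median alongside the minimum and maximum makes it easier to spot runs distorted by caching or by other load on the testbed.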

Next meeting:

Thursday, September 27th, 2:30 pm, Fabio's office

-- FabioHernandez - 2012-09-17

Topic attachments
ac.pptx (r1, 261.4 K, 2012-09-17 07:19, FabioHernandez): Presentation made by Gang for this meeting
Topic revision: r1 - 2012-09-17 - FabioHernandez