
Revision 1 - 2013-03-25 - FabioHernandez


Meeting Minutes - 2013-03-14

Attendees (IHEP): Ziyan, Gang, Fabio

Time: 2:30 p.m.

Secretary: Fabio

Progress Report

This is the first meeting after spring break. Gang reported on his work since the last meeting. Gang's presentation is attached.

  • Gang looked at the way the aggregated job accounting records are stored in DIRAC's MySQL database. A pyramidal model is implemented, with finer granularity for recent records and coarser granularity for old ones. Specifically, records for jobs executed in the last week are aggregated per hour; records for jobs executed in the last month are aggregated in 2-hour buckets; records for the last 5 months are aggregated in 1-day buckets; 2-day buckets are used for jobs up to 6 months old; and jobs older than 6 months are aggregated in 1-week buckets.
  • Gang performed some tests using Cassandra to store all the fields included in individual job records. This is in contrast with the previous data schema, in which jobs were aggregated using only one grouping criterion (e.g. by site, by user, ...) in 1-day buckets. That schema proved useful for generating plots quickly with a single selection criterion, but it cannot be used for generating plots with several selection criteria, because that information is not stored.
  • The Python routines for generating the plots were modified to take this new schema into account. Some plot generation tests were performed, first without additional selection criteria and then with more than one selection criterion. The results of the first type of test are presented in slide 7. The times for generating those plots are too high (275 to 450 seconds) for the plots to be usable in the portal. Generating the plots with multiple selection criteria is faster (145 to 170 seconds), but still not fast enough to be considered usable.
  • In the previous meeting we had identified a plot that was generated by the portal and looked very different from the one generated by Gang's tests. Gang looked at it again (see slide 11), using exactly the same data from the LHCb accounting records, and the plot still looks different from the one on the DIRAC portal.
  • We then discussed how to make progress and decided to:
    • ask the Cassandra community for guidance, to understand whether there are features of Cassandra, or better ways to organize the data, that could help improve the performance.
    • evaluate other kinds of data stores that can exploit the fact that the accounting records are structured (i.e. all the records contain roughly the same information). Column-oriented data stores could be an alternative to a relational DBMS. Examples of such systems are MonetDB (free, open source), Vertica (commercial) and VoltDB, among others.
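
The pyramidal bucketing scheme discussed above can be sketched as follows. This is an illustrative Python sketch, not DIRAC's actual code: the function name is hypothetical, and only the age thresholds and bucket lengths stated in the minutes are taken from the source.

```python
from datetime import timedelta

def bucket_length(record_age: timedelta) -> timedelta:
    """Return the aggregation bucket length for a job record of a given age,
    following the pyramidal model: finer buckets for recent records,
    coarser buckets for old ones."""
    if record_age <= timedelta(weeks=1):
        return timedelta(hours=1)      # last week: 1-hour buckets
    if record_age <= timedelta(days=30):
        return timedelta(hours=2)      # last month: 2-hour buckets
    if record_age <= timedelta(days=150):
        return timedelta(days=1)       # last 5 months: 1-day buckets
    if record_age <= timedelta(days=180):
        return timedelta(days=2)       # up to 6 months: 2-day buckets
    return timedelta(weeks=1)          # older than 6 months: 1-week buckets

print(bucket_length(timedelta(days=3)))   # prints "1:00:00"
```

The month and 5-month thresholds are approximated here as 30 and 150 days; the actual cut-off dates used by DIRAC may differ.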
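
The contrast between single-criterion and multi-criteria aggregation can be illustrated with a toy example (hypothetical data and function names, not DIRAC code): once records are pre-aggregated per site only, a combined per-site, per-user breakdown can no longer be recovered, which is why the individual-record schema is needed for multi-criteria plots.

```python
from collections import defaultdict

# Toy job records; field names are illustrative only.
jobs = [
    {"site": "SiteA", "user": "alice", "cpu": 10},
    {"site": "SiteA", "user": "bob",   "cpu": 5},
    {"site": "SiteB", "user": "alice", "cpu": 7},
]

def aggregate(records, keys):
    """Sum CPU time grouped by the given tuple of selection criteria."""
    totals = defaultdict(int)
    for r in records:
        totals[tuple(r[k] for k in keys)] += r["cpu"]
    return dict(totals)

by_site = aggregate(jobs, ("site",))              # single criterion
by_site_user = aggregate(jobs, ("site", "user"))  # multiple criteria
```

Here `by_site` alone cannot reproduce `by_site_user`: the per-user split within each site is lost after the first aggregation.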

Next meeting

April 1st, 2:30 pm, Fabio's office

-- FabioHernandez - 2013-03-25

Attachment: discussion(3.14).ppt.pptx (uploaded 2013-03-25 by FabioHernandez)