Revision 1 - 2013-01-16 - FabioHernandez

META TOPICPARENT name="DiracAccounting"

Meeting Minutes - 2013-01-10

Attendants: Ziyan, Gang, Fabio

Time: 3:30 p.m.

Secretary: Fabio

Progress report

Gang reported on the progress made since the previous meeting. His presentation is attached.

  • Gang revisited the data modeling in Cassandra. The new model is composed of four column families. One column family contains the raw accounting records, and the remaining three contain the data aggregated according to several criteria. The granularity for aggregation is one day, which is considered sufficient for generating the plots.
  • Each of the three column families that contain aggregated data is composed of several columns storing, for each day, the values of the relevant fields, namely CPUTime, Exectime, DiskSpace, InputSandboxSize, OutputSandboxSize and JobCount. The rows of those column families are keyed by user, site and job type.
  • Modelling the data in this way allows for efficient generation of the plots.
  • Gang paid special attention to giving the columns meaningful names, so that the data model is understandable for humans. The price to pay is a slight increase in the storage volume needed to hold the data.
  • Gang ran some tests to quantify the time needed to aggregate the data and insert it into Cassandra. Using a text file with 1000 job accounting records, he measured that it takes 18 seconds to read these records, aggregate them and insert them into Cassandra using the data model described above. To put this number in perspective, the DIRAC instance of LHCb adds about 30,000 new job accounting records per day. At the measured rate, adding this data to Cassandra would take about 9 minutes per day (30 × 18 s = 540 s). NOTE: these tests were performed on a machine with a server-like hardware configuration.
  • In addition, Gang gave a demonstration of the system generating a subset of plots in real time (only pie plots). He verified that the generated plots are similar to the ones produced by the LHCb DIRAC portal for the same period. In some cases there are slight discrepancies (of the order of 0.1) which could be due to rounding.
  • Gang developed a simple web form for collecting the input data (time period and type of plot) and generating the plot after extracting the data from Cassandra. In every demonstrated case (only pie-type plots, though) the time required was less than 3 seconds, and in some cases less than 1 second. The same test was performed against the LHCb DIRAC portal, and the time required for generating the same plots was noticeably higher. However, the difference may be due at least in part to network latency, so we cannot draw firm conclusions from such a simple test. Nevertheless, it is encouraging that, without much tuning, it seems possible to achieve short plot generation times for accounting purposes, which is very important for keeping the desired level of interactivity with the DIRAC portal.
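
The daily aggregation and the timing extrapolation reported above can be illustrated with a small sketch. It is not Gang's code and does not talk to Cassandra: plain dictionaries stand in for the three aggregated column families. The record keys (User, Site, JobType, Day), the "day:field" column naming and all function names are assumptions made for this illustration only; the value fields are the ones listed in the minutes.

```python
from collections import defaultdict

# Per-record value fields summed per day (names taken from the minutes).
VALUE_FIELDS = ("CPUTime", "Exectime", "DiskSpace",
                "InputSandboxSize", "OutputSandboxSize")
# Hypothetical record keys, one per aggregated column family.
GROUP_KEYS = ("User", "Site", "JobType")


def aggregate(records):
    """Build {group_key: {row_key: {"<day>:<field>": value}}} with one-day granularity."""
    families = {key: defaultdict(lambda: defaultdict(float)) for key in GROUP_KEYS}
    for rec in records:
        for key in GROUP_KEYS:
            row = families[key][rec[key]]
            for field in VALUE_FIELDS:
                # Assumed human-readable column name: day plus field name.
                row[f"{rec['Day']}:{field}"] += rec[field]
            # JobCount is derived: one job per raw accounting record.
            row[f"{rec['Day']}:JobCount"] += 1
    return families


def pie_shares(family, field, days):
    """Fraction of `field` per row key over `days`, i.e. the slices of a pie plot."""
    totals = {
        row_key: sum(cols.get(f"{day}:{field}", 0) for day in days)
        for row_key, cols in family.items()
    }
    grand_total = sum(totals.values())
    if not grand_total:
        return {}
    return {row_key: value / grand_total for row_key, value in totals.items()}


def estimated_insert_seconds(records_per_day, seconds_per_batch=18.0, batch_size=1000):
    """Linear extrapolation of the measured 18 s per 1000 records."""
    return records_per_day * seconds_per_batch / batch_size
```

With these assumptions, estimated_insert_seconds(30000) gives 540 seconds, which is the 9 minutes per day quoted above. The extrapolation assumes that the cost grows linearly with the number of records, which the test with 1000 records does not by itself establish.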

Next steps

We agreed on the following next steps:

  • develop the code for generating the full set of plots (time series, pie plots, etc.) including all the grouping criteria (by user, by site, by job type, etc.). This requires a better understanding of Matplotlib.
  • improve the test portal so that it accepts the same input data as the production DIRAC portal
  • make the demonstration portal accessible from outside the IHEP network, so that the experts of the DIRAC core team can try it
  • call for a video-conference meeting (via H323, Skype or EVO), if possible before February 1st, that is, before the Spring Festival break. The goal of the meeting is to discuss the progress made with the DIRAC team members, demonstrate the current status of this work and make sure we are covering all the use cases. Ziyan will contact Ricardo for this.
  • prepare a presentation for the conference call, providing details on the work performed, the current status and the next steps.

Next meeting

We agreed to meet again on January 24th to finalize the preparation of the conference call.

-- FabioHernandez - 2013-01-16

META FILEATTACHMENT attachment="Discussion(2012.01.10).ppt" attr="" comment="" date="1358324143" name="Discussion(2012.01.10).ppt" path="Discussion(2012.01.10).ppt" size="4186112" stream="Discussion(2012.01.10).ppt" tmpFilename="/usr/tmp/CGItemp7057" user="FabioHernandez" version="1"