Meeting Minutes - 2013-05-24

Attendees: Ziyan, Gang, Fabio

Time: 2:30 p.m.

Secretary: Fabio

Progress Report

Gang reported on his work since the last meeting; his presentation is attached. The work focused on evaluating MongoDB as a backend for storing and serving the job accounting data, so that the data can be analysed in detail (as opposed to the "canned plots" delivered by the DIRAC portal).

The tests used a Python script that retrieves the data from MongoDB and generates the plot directly, avoiding the intermediate step described in the DIRAC paper of retrieving the data and processing them in ROOT. To extract the data for a plot, the script issues a MongoDB 'find' query, which returns a cursor; the script iterates over this cursor to get the results of the query.
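The retrieval step could look roughly like the sketch below. It is not Gang's script: the database, collection and field names (Site, CPUTime, ExecTime as document keys) are illustrative assumptions. The function only relies on the find(filter, projection) call of a pymongo collection, which returns a lazy cursor.

```python
def fetch_job_records(collection, site):
    """Return (CPUTime, ExecTime) pairs for the jobs run at one site.

    `collection` is expected to behave like a pymongo Collection, e.g.
    MongoClient("mongodb://localhost:27017")["accounting"]["jobs"]
    (hypothetical host, database and collection names).
    """
    query = {"Site": site}                                # hypothetical field name
    projection = {"_id": 0, "CPUTime": 1, "ExecTime": 1}  # fetch only what the plot needs
    cursor = collection.find(query, projection)           # lazy cursor over the matches
    # Iterating the cursor pulls the documents from the server in batches.
    return [(doc["CPUTime"], doc["ExecTime"]) for doc in cursor]
```

Restricting the projection to the fields a given plot needs keeps the amount of data transferred per query small.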

This script was used to generate three types of plots: CPU efficiency per user, CPU efficiency per site, and CPU efficiency per job type (e.g. MC simulation). The generated plots are compared to the ones in the DIRAC paper (see slides 6 to 10). The observed time required to generate each plot is shown under it. Some plots require less than one second, some about 50 seconds, and others about 120 seconds. From the slides it is not clear whether most of the time is spent in the data retrieval phase or in the generation of the plot itself. Gang's feeling is that the data retrieval phase is fast enough, but that the generation of the plot (via NumPy and Matplotlib) sometimes takes a long time, in particular for complex plots. We agreed that it is better to measure the time spent in each phase, in order to understand which phase needs optimization work.
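One possible way to measure the two phases separately is sketched below. The helper names (timed, make_plot_with_timing) and the retrieve/render callables are hypothetical placeholders for the corresponding parts of the plotting script.

```python
import time


def timed(phase_times, name, func, *args, **kwargs):
    """Run func, store its wall-clock duration (seconds) in phase_times[name]."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    phase_times[name] = time.perf_counter() - start
    return result


def make_plot_with_timing(retrieve, render):
    """Call retrieve(), then render(data), timing each phase separately."""
    phase_times = {}
    data = timed(phase_times, "retrieval", retrieve)
    figure = timed(phase_times, "plotting", render, data)
    return figure, phase_times
```

Since a MongoDB cursor is lazy, the retrieve callable should fully consume it (for example by building a list) before returning. Otherwise the time spent fetching documents would be attributed to the plotting phase.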

Gang had some questions about the meaning of some fields in the DIRAC accounting database, specifically NormCPU, CPUTime and ExecTime. NormCPU is the normalized CPU time, CPUTime is the CPU time that the job consumed on the particular worker node where it was executed, and ExecTime is the wall-clock time of the job.
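As an illustration of how these fields combine, the sketch below computes a CPU efficiency per group (user, site or job type). It assumes that CPU efficiency is the ratio of total CPUTime to total ExecTime, which the minutes do not spell out. The record layout and the grouping key "User" in the test data are hypothetical.

```python
from collections import defaultdict


def cpu_efficiency_by(records, key):
    """Return {group: sum(CPUTime) / sum(ExecTime)} for records grouped by `key`.

    Assumption: efficiency is CPU time divided by wall-clock time.
    Groups with zero total wall-clock time are skipped.
    """
    cpu = defaultdict(float)
    wall = defaultdict(float)
    for rec in records:
        cpu[rec[key]] += rec["CPUTime"]
        wall[rec[key]] += rec["ExecTime"]
    return {group: cpu[group] / wall[group] for group in wall if wall[group] > 0}
```

Summing before dividing weights each job by its duration. Averaging per-job ratios instead would give short jobs the same weight as long ones.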

The generated plots of ExecTime vs. DiskSpace (see slides 14 and 15) look different from the ones in the DIRAC paper used as reference.

Gang also performed some tests to obtain figures on the scalability of MongoDB as a backend for detailed analysis. He submitted up to 200 concurrent tasks querying data from MongoDB and averaged the results, which are shown in slide 19. A single data retrieval operation takes on average 1012 milliseconds when 200 similar queries run concurrently. The same information is presented graphically in slide 20.
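A test of this kind could be sketched as follows. The minutes do not say how the concurrent tasks were launched, so the use of a thread pool here is an assumption, and run_query stands for any callable that executes one query and consumes its results.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def average_query_latency_ms(run_query, concurrency):
    """Run `concurrency` copies of run_query in parallel threads.

    Returns (average latency in ms, list of individual latencies in ms).
    """
    def one_task(_):
        start = time.perf_counter()
        run_query()
        return (time.perf_counter() - start) * 1000.0

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(one_task, range(concurrency)))
    return mean(latencies), latencies
```

Threads are adequate in Python when the tasks mostly wait on the network. If client-side processing dominates, separate processes or separate client machines would give a more faithful picture of the server's behaviour.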

In conclusion, from this series of tests, we think that MongoDB looks like a good candidate for storing the accounting data of DIRAC, both for the needs of the DIRAC portal (i.e. generation of canned plots) and for more detailed ad hoc data analysis. To satisfy those two sets of needs, different strategies for storing and organizing the data in MongoDB need to be implemented.

There may be some concerns regarding the operation of such an installation. To deliver the required performance while maintaining a high level of availability, it is necessary to have at least 3 machines devoted to MongoDB for the DIRAC accounting database. This may be an additional constraint for some DIRAC installations.

Next steps

We agreed to organize a remote meeting with Ricardo and Adria to present these results and collect additional feedback. Ziyan will take care of scheduling the meeting.

-- FabioHernandez - 2013-06-13

Topic attachments

Attachment: discussion(05.24).pptx (PowerPoint, r1, 1784.8 K, 2013-06-13 04:26, FabioHernandez)
Topic revision: r1 - 2013-06-13 - FabioHernandez