
Meeting Minutes - 2012-08-23

Attendees: Ziyan, Gang, Fabio

Time: 2:30 p.m.

Secretary: Fabio

Progress report

Gang reports on the progress made since the previous meeting.

  • Gang has installed Redis and is getting familiar with the data structures it provides. He has also written some Python scripts for interacting with a Redis server on his laptop.
  • Ricardo has sent us a dump containing an excerpt of the records of the production instance of DIRAC used by the LHCb experiment. This MySQL dump contains 35M job accounting records. The total volume of this data (in its current encoded form) is about 4 GB.
  • Gang has set up a MySQL server and used this dump to initialize a database, which is useful for easily inspecting the contents of the dump and understanding the structure of the DIRAC accounting data.
  • There are about a dozen tables, which can be categorized into 3 types:
    • ac_type_*
    • ac_in_*
    • ac_key_*
  • Below we reproduce an excerpt of Ricardo's mail dated 17/08/2012, which provides additional details on the structure and organization of the data in the dump:

The relevant information about what gets stored in the DIRAC accounting records comes from:

For the info that each record includes, you can find something like:

class Job( BaseAccountingType ):
  def __init__( self ):
    BaseAccountingType.__init__( self )
    self.definitionKeyFields = [ ( 'User', 'VARCHAR(32)' ),
                                 ( 'UserGroup', 'VARCHAR(32)' ),
                                 ( 'JobGroup', "VARCHAR(64)" ), 
                                 ( 'JobType', 'VARCHAR(32)' ),
                                 ( 'JobClass', 'VARCHAR(32)' ), 
                                 ( 'ProcessingType', 'VARCHAR(32)' ),
                                 ( 'Site', 'VARCHAR(32)' ),
                                 ( 'FinalMajorStatus', 'VARCHAR(32)' ),
                                 ( 'FinalMinorStatus', 'VARCHAR(64)' ) ]
    self.definitionAccountingFields = [ ( 'CPUTime', "INT UNSIGNED" ),
                                        ( 'NormCPUTime', "INT UNSIGNED" ),
                                        ( 'ExecTime', "INT UNSIGNED" ), 
                                        ( 'InputDataSize', 'BIGINT UNSIGNED' ),
                                        ( 'OutputDataSize', 'BIGINT UNSIGNED' ), 
                                        ( 'InputDataFiles', 'INT UNSIGNED' ),
                                        ( 'OutputDataFiles', 'INT UNSIGNED' ), 
                                        ( 'DiskSpace', 'BIGINT UNSIGNED' ),
                                        ( 'InputSandBoxSize', 'BIGINT UNSIGNED' ),
                                        ( 'OutputSandBoxSize', 'BIGINT UNSIGNED' ),
                                        ( 'ProcessedEvents', 'INT UNSIGNED' ) ]
The "definitionKeyFields" are those that allow classifying the record, and the "definitionAccountingFields"
are those that are "accounted" for in the record, in this case for Jobs. All records of any type also
provide a startTime and an endTime.
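The split described above can be sketched as follows. This is a hypothetical illustration, not DIRAC code: the field names are taken from the excerpt, while the `split_record` helper and the plain-dict record format are our own assumptions for the sketch.

```python
# Hypothetical illustration (not DIRAC code): splitting one job accounting
# record into the "key" fields that classify it, the "accounting" fields
# that get accumulated, and the startTime/endTime every record carries.
# Field names come from the Job class excerpt above.

KEY_FIELDS = ["User", "UserGroup", "JobGroup", "JobType", "JobClass",
              "ProcessingType", "Site", "FinalMajorStatus", "FinalMinorStatus"]
ACCOUNTING_FIELDS = ["CPUTime", "NormCPUTime", "ExecTime", "InputDataSize",
                     "OutputDataSize", "InputDataFiles", "OutputDataFiles",
                     "DiskSpace", "InputSandBoxSize", "OutputSandBoxSize",
                     "ProcessedEvents"]

def split_record(record):
    """Return (classification, metrics, time window) views of a raw record dict."""
    keys = {f: record[f] for f in KEY_FIELDS}
    metrics = {f: record[f] for f in ACCOUNTING_FIELDS}
    window = (record["startTime"], record["endTime"])
    return keys, metrics, window
```

The point of the split is that queries group and filter on the key fields, while the accounting fields are the quantities being summed or plotted over the start/end time window.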

In the MySQL schema, each "Key" field is accompanied by an auxiliary table that assigns a numeric
key to each possible value.

Next steps

We agreed on the following next steps:

  • Gang will explore the available data in detail in order to understand the contents of each table. His findings are documented here and here. As a result of this work, we will have a clear understanding of the current organization of the data in the MySQL database; that understanding is essential for identifying the most important queries the DIRAC accounting system needs to answer in an efficient way. Gang will inspect the DIRAC source code, in particular the modules relevant for accounting data storage and retrieval. The code of the DIRAC portal, which is in charge of generating plots by extracting data from the accounting database, also needs to be well understood. Gang will document the queries the DIRAC portal sends to the accounting system.
  • Gang will prepare a detailed presentation of his findings on the previous point for the next meeting.
  • Once the current organization of the accounting data and the most frequent queries are understood and documented, Gang will do an exercise of modelling the data using the data structures provided by Redis. We all understand that Redis is not our target data store, but it is a simple way to get familiar with the possibilities offered by this kind of data store. We will afterwards focus on evaluating a persistent data store suitable for the current and desired use cases of the DIRAC accounting system.
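The modelling exercise above could start along these lines. This is purely an exploratory sketch under our own assumptions, not an agreed design: a hash per record plus a sorted set indexed by endTime, in the style of Redis HSET/ZADD/ZRANGEBYSCORE, with plain Python structures standing in for a real Redis server.

```python
# Exploratory sketch (our own assumptions, not an agreed design): modelling
# a job record with Redis-like structures -- a hash per record keyed by job
# id, and a sorted set indexing records by endTime for time-range queries.
# Plain dicts/lists stand in for a real Redis server here.

import bisect

hashes = {}        # stands in for: HSET job:<id> <field> <value> ...
by_end_time = []   # stands in for: ZADD jobs_by_endTime <endTime> job:<id>

def store_job(job_id, record, end_time):
    key = f"job:{job_id}"
    hashes[key] = dict(record)                   # the record's fields
    bisect.insort(by_end_time, (end_time, key))  # keep sorted by endTime

def jobs_in_range(t0, t1):
    """Stands in for: ZRANGEBYSCORE jobs_by_endTime t0 t1."""
    lo = bisect.bisect_left(by_end_time, (t0, ""))
    hi = bisect.bisect_right(by_end_time, (t1, "\xff"))
    return [key for _, key in by_end_time[lo:hi]]
```

The sorted-set index is what would make time-window queries (the typical input to an accounting plot) cheap, without scanning every record hash.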

Next meeting:

  • Thursday September 6th, 2012, 2:30pm, Fabio's office

-- FabioHernandez - 2012-08-24

Topic revision: r1 - 2012-08-24 - FabioHernandez