
Meeting Minutes - 2012-12-06

Attendees: Ziyan, Gang, Fabio

Time: 2:30 p.m.

Secretary: Fabio

Progress report

Gang reports on the progress made since the previous meeting. His presentation is attached.

  • Gang started working on modeling the data to take advantage of Cassandra's capabilities and overcome its limitations. Several iterations have been performed to find a convenient way of selecting keys for the accounting records. This is an important point because data in Cassandra is accessed by key. In addition, Cassandra provides a means to build an index for each column family, which is useful for rapid access to the information in each column.
  • Gang built a tool for extracting the data from MySQL and inserting it into Cassandra. He found several problems (timeouts, memory exhaustion, etc.) that still need to be understood. He eventually managed to insert a subset of the accounting records into Cassandra for some initial testing; this subset is roughly 10% of the available data set.
  • He is facing the problem of modeling the accounting data using the abstractions and tools provided by Cassandra. In particular, unlike MySQL, Cassandra does not itself provide data aggregation and counting; these must be performed by the client application. More work is needed to understand the data modeling possibilities offered by Cassandra and how to exploit them for storing the job accounting data.
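
Because Cassandra leaves aggregation and counting to the client, a query by key returns raw records that the application must reduce itself, where MySQL would use GROUP BY with COUNT and SUM. A minimal Python sketch of that client-side step follows. The record layout (site, day, user, CPU seconds) and the choice of (site, day) as the grouping key are assumptions for illustration, not the data model Gang is building, and the rows are plain tuples standing in for whatever a Cassandra query would return.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical accounting record layout (an assumption, not the real schema):
# (site, day, user, cpu_seconds)
Record = Tuple[str, str, str, float]
GroupKey = Tuple[str, str]  # (site, day)


def aggregate_jobs(rows: Iterable[Record]) -> Dict[GroupKey, Tuple[int, float]]:
    """Count jobs and sum CPU seconds per (site, day) on the client side.

    MySQL would do this server-side with GROUP BY, COUNT(*) and SUM();
    here the application iterates over the rows returned by a key lookup.
    """
    counts: Dict[GroupKey, int] = defaultdict(int)
    cpu_totals: Dict[GroupKey, float] = defaultdict(float)
    for site, day, _user, cpu_seconds in rows:
        key = (site, day)
        counts[key] += 1
        cpu_totals[key] += cpu_seconds
    return {key: (counts[key], cpu_totals[key]) for key in counts}


if __name__ == "__main__":
    # Made-up sample rows with placeholder site and user names.
    sample = [
        ("SITE-A", "2012-12-01", "user1", 3600.0),
        ("SITE-A", "2012-12-01", "user2", 1800.0),
        ("SITE-B", "2012-12-01", "user1", 600.0),
    ]
    for key, (jobs, cpu) in sorted(aggregate_jobs(sample).items()):
        print(key, jobs, cpu)
```

The cost of this approach grows with the number of raw records fetched per plot, which is one reason the next steps consider storing pre-computed summaries instead.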

Next steps

We agreed on the following next steps:

  • investigate in more detail the timeout problem when inserting the data into Cassandra, so that the whole data set (about 5 GB) can be used for the subsequent tests.
  • explore ways of storing the data in Cassandra so as to generate at least 3 different types of plots among those produced by the DIRAC web portal. The chosen set of plots must be representative of the types of plots to be generated. The goal is to make sure that the data modeling decisions are applicable to the whole spectrum of plot generation needs.
  • explore ways of preprocessing the data so as to store in Cassandra pre-computed summaries that can be readily used for generating the plots.
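
One common way to approach the timeout and memory problems seen during insertion is to stream the source rows in fixed-size chunks and write each chunk as a small batch, retrying with a back-off when a write times out. The Python sketch below assumes nothing about the actual tool: `insert_batch` is a hypothetical callable standing in for the real Cassandra write, the built-in `TimeoutError` stands in for the driver's timeout exception, and the batch size and retry parameters are illustrative values to be tuned during the investigation.

```python
import itertools
import time
from typing import Callable, Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(rows: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of at most `size` rows, so the whole data set never sits in memory."""
    it = iter(rows)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk


def insert_in_batches(
    rows: Iterable[T],
    insert_batch: Callable[[Sequence[T]], None],  # hypothetical write function
    batch_size: int = 500,       # illustrative value
    max_retries: int = 5,        # illustrative value
    base_delay: float = 0.5,     # seconds before the first retry
) -> int:
    """Insert rows batch by batch, retrying a batch on TimeoutError with exponential back-off.

    Returns the number of rows inserted. Re-raises the TimeoutError if a
    batch still fails after `max_retries` retries.
    """
    inserted = 0
    for chunk in chunked(rows, batch_size):
        for attempt in range(max_retries + 1):
            try:
                insert_batch(chunk)
                break
            except TimeoutError:
                if attempt == max_retries:
                    raise
                time.sleep(base_delay * (2 ** attempt))
        inserted += len(chunk)
    return inserted
```

Replaying a batch is reasonable here because a Cassandra write to an existing key overwrites the previous value, so retrying ordinary columns does not duplicate data; this would not hold for counter columns.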
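
The pre-computation idea can be sketched as folding each record, at ingestion time, into a summary row keyed by site and time bin, so that generating a plot only requires reading a handful of summary rows instead of scanning raw records. In this minimal Python sketch the summary table is an in-memory dict standing in for a Cassandra column family; the record fields, the one-day bin width and the row-key format are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

BIN_SECONDS = 86400  # assumed bin width: one day


@dataclass
class Summary:
    jobs: int = 0
    cpu_seconds: float = 0.0


def bin_start(timestamp: int, width: int = BIN_SECONDS) -> int:
    """Round a Unix timestamp down to the start of its bin."""
    return timestamp - timestamp % width


def row_key(site: str, timestamp: int) -> str:
    """Hypothetical row key format: '<site>:<bin start>'."""
    return f"{site}:{bin_start(timestamp)}"


def update_summaries(
    summaries: Dict[str, Summary], records: Iterable[Tuple[str, int, float]]
) -> None:
    """Fold (site, end_time, cpu_seconds) records into per-bin summary rows."""
    for site, end_time, cpu_seconds in records:
        s = summaries.setdefault(row_key(site, end_time), Summary())
        s.jobs += 1
        s.cpu_seconds += cpu_seconds


def series_for_plot(
    summaries: Dict[str, Summary], site: str, start: int, end: int
) -> List[Tuple[int, int]]:
    """Jobs per bin for one site over [start, end), read only from the summaries."""
    points = []
    t = bin_start(start)
    while t < end:
        s = summaries.get(row_key(site, t))
        points.append((t, s.jobs if s else 0))
        t += BIN_SECONDS
    return points
```

For example, after folding in two SITE-A jobs ending on day 3 and one ending on day 5 (days counted from the Unix epoch), `series_for_plot(summaries, "SITE-A", 3 * 86400, 6 * 86400)` yields one point per day, with a zero for the empty day 4. In Cassandra itself such running totals might be kept in counter columns or rewritten periodically by a batch job; which fits better is the kind of question the next steps are meant to answer.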

Next meeting

Thursday, December 20th, 2:30 pm, Fabio's office

-- FabioHernandez - 2012-12-12

Topic revision: r1 - 2012-12-12 - FabioHernandez