Notes on the queries served by the DIRAC accounting system
Introduction
In this space we document our understanding of the queries that the current DIRAC accounting system must serve to satisfy the requirements of its clients, for instance the DIRAC dashboard. The goal of this work is to identify the most important queries for which the future system, based on unstructured data stores, needs to be optimized.
This space is mainly maintained by Gang and reflects the results of his findings while studying the DIRAC accounting system.
--
FabioHernandez - 2012-08-24
The DIRAC accounting system used on the DIRAC web portal
DIRAC web portal
The DIRAC accounting system offers five types of query to DIRAC users through the DIRAC web portal: Data Operation Plots, Job Plots, WMS History Plots, Pilot Plots and SRM Space Token Deployment Plots. Figure 1 is a screenshot showing the five types.
Figure 1. Accounting types
Figure 2. Job plots and items
Figure 2 shows the page displayed when the user selects the job plots. The user has to choose which plot to generate, such as CPU time, CPU efficiency, input space and so on, together with the "Group by" and "Time span" options. The plot matching the selected options is then displayed on the right-hand side of the web page, as in Figure 3:
Figure 3. A job plot
As Figure 2 shows, both "Plot to generate" and "Group by" offer a set of candidate items. The values of these items are stored in the DIRAC accounting database. As an example, the fields of the table named "ac_in_lhcb-production_job" are listed below:
Field | Type |
JobGroup | varchar(64) |
DiskSpace | bigint(20) unsigned |
InputDataSize | bigint(20) unsigned |
FinalMajorStatus | varchar(32) |
OutputDataSize | bigint(20) unsigned |
InputSandBoxSize | bigint(20) unsigned |
OutputDataFiles | int(10) unsigned |
NormCPUTime | int(10) unsigned |
User | varchar(32) |
JobType | varchar(32) |
JobClass | varchar(32) |
ProcessingType | varchar(32) |
ExecTime | int(10) unsigned |
CPUTime | int(10) unsigned |
startTime | int(10) unsigned |
UserGroup | varchar(32) |
FinalMinorStatus | varchar(64) |
Site | varchar(32) |
ProcessedEvents | int(10) unsigned |
OutputSandBoxSize | bigint(20) unsigned |
InputDataFiles | int(10) unsigned |
endTime | int(10) unsigned |
In this table, the red items are the members of the "Group by" item set, the blue items determine the "Time span", and the rest are the members of "Plot to generate".
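To make the relationship between these three groups of fields concrete, here is a minimal sketch of the kind of aggregation a plot request implies: one quantity to plot, summed per value of the "Group by" field, restricted to a time span. It is only an illustration under stated assumptions. It uses an in-memory SQLite table instead of the real MySQL database, reuses only a few of the column names listed above, and the sample rows and the helper cpu_time_by are invented for the example. It does not reproduce the SQL that DIRAC actually issues.

```python
import sqlite3

# Toy in-memory table reusing a few column names from the listing above.
# SQLite is used only so that the sketch runs anywhere; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job (Site TEXT, User TEXT, CPUTime INTEGER, "
    "ExecTime INTEGER, startTime INTEGER, endTime INTEGER)"
)
rows = [
    ("LCG.CERN.ch", "alice", 100, 200, 1000, 1200),
    ("LCG.CERN.ch", "bob", 50, 100, 1100, 1200),
    ("LCG.IN2P3.fr", "alice", 300, 400, 1500, 1900),
    ("LCG.IN2P3.fr", "bob", 10, 20, 5000, 5020),  # outside the time span used below
]
conn.executemany("INSERT INTO job VALUES (?, ?, ?, ?, ?, ?)", rows)


def cpu_time_by(conn, group_by, start, end):
    """Sum CPUTime per value of group_by for jobs that ran inside [start, end]."""
    # Column names cannot be bound as SQL parameters, so they are whitelisted.
    allowed = {"Site", "User"}
    if group_by not in allowed:
        raise ValueError("unsupported grouping: %s" % group_by)
    sql = (
        "SELECT {g}, SUM(CPUTime) FROM job "
        "WHERE startTime >= ? AND endTime <= ? "
        "GROUP BY {g} ORDER BY {g}"
    ).format(g=group_by)
    return conn.execute(sql, (start, end)).fetchall()


by_site = cpu_time_by(conn, "Site", 0, 2000)
by_user = cpu_time_by(conn, "User", 0, 2000)
```

In this sketch the "Group by" choice selects the grouping column, the "Time span" choice becomes the bounds on startTime and endTime, and the "Plot to generate" choice decides which column is aggregated (CPUTime here).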
Source code analysis
We can get more information from DIRACWeb/dirac/controllers/systems/accountingPlots.py
The flow below shows how a user obtains statistical data from the database and has it displayed on the DIRAC web portal. We first describe some of the methods of the class AccountingplotsController in "accountingPlots.py".
FUN1 def __getUniqueKeyValues( self, typeName ):
"Gets the unique key values of the different types, such as Job Plots, Data Operation Plots, ..."
FUN2 def __showPlotPage( self, typeName, templateFile ):
"Shows the plot page corresponding to the selected type."
FUN3 def __parseFormParams( self ):
"Parses the parameters selected by the user and returns them as *params (typeName, reportName, startTime, endTime, conDict, grouping, extraArgs)."
FUN4 def __queryForPlot( self ):
"Generates the plot."
FUN5 def getPlotData( self ):
"Gets the plot data from the DIRAC accounting database."
FUN6 def getPlotImg( self ):
"Gets the image and shows it on the web page."
These functions contain statements such as
repClient = ReportsClient( rpcClient = getRPCClient( "Accounting/ReportGenerator" ) )
retVal = repClient.getReport( *params )
When the cache does not hold the data requested by the user, the DIRAC web portal accesses the database to extract the relevant information.
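The cache-then-query behaviour described here can be sketched as follows. This is a hedged illustration, not the portal's implementation. The class CachedReports, the function fake_get_report and the sample values (such as the report name "CPUEfficiency") are hypothetical. Only the order of the fields in the parameter tuple is taken from the description of __parseFormParams above. The stand-in backend replaces the remote getReport call so that the sketch runs on its own.

```python
from collections import namedtuple

# Field order follows the *params tuple described for __parseFormParams.
PlotParams = namedtuple(
    "PlotParams",
    "typeName reportName startTime endTime conDict grouping extraArgs",
)


class CachedReports:
    """Hypothetical cache placed in front of a report-fetching callable."""

    def __init__(self, fetch):
        self._fetch = fetch  # stand-in for repClient.getReport
        self._store = {}
        self.misses = 0      # number of times the backend was actually queried

    def get(self, params):
        # The tuple contains dicts, which are not hashable, so a string key
        # is used. This is enough for a sketch but is sensitive to the
        # insertion order of the dicts.
        key = repr(params)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._fetch(*params)
        return self._store[key]


def fake_get_report(typeName, reportName, startTime, endTime,
                    conDict, grouping, extraArgs):
    """Stand-in backend; the real call would reach the accounting service."""
    return "%s/%s grouped by %s" % (typeName, reportName, grouping)


reports = CachedReports(fake_get_report)
params = PlotParams("Job", "CPUEfficiency", 1345766400, 1345852800,
                    {"User": ["alice"]}, "Site", {})
first = reports.get(params)   # cache miss: the backend is called
second = reports.get(params)  # cache hit: the backend is not called again
```

In this sketch, only requests that miss the cache reach the backend, which suggests that the queries hitting the database are the ones whose parameter combination has not been requested recently.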
Here is the flow chart: