Difference: PbsMonitor (1 vs. 2)

Revision 2 (2012-02-13) - KanBowen

-- ShiJingyan - 2012-02-06

Added: attachment PBS_Job_Monitoring_System.docx ("PBS monitoring system code-flow document", 20183 bytes), uploaded by Kanbw on 2012-02-13.

Revision 1 (2012-02-06) - ShiJingyan

-- ShiJingyan - 2012-02-06

torqmaui mounts the /var/spool/pbs/ directory of pbssrv.

I. Process monitoring and checks:

(1) pbs_server: a crontab entry on pbssrv runs /root/pbs_Check.sh

(2) maui: a crontab entry on pbssrv runs /root/ma_Check.sh
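pbs_Check.sh and ma_Check.sh are not reproduced on this page. The following is a minimal Python sketch of such a liveness check. It assumes the check only looks for a process with the daemon's name in /proc and restarts it if none is found. The /etc/init.d restart commands are hypothetical placeholders, not taken from the original scripts.

```python
import os
import subprocess

# Hypothetical restart commands; the real pbs_Check.sh / ma_Check.sh may differ.
RESTART = {
    "pbs_server": ["/etc/init.d/pbs_server", "start"],
    "maui": ["/etc/init.d/maui", "start"],
}

def running_process_names(proc_root="/proc"):
    """Collect command names from /proc/<pid>/comm (the kernel truncates them to 15 chars)."""
    names = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                names.add(f.read().strip())
        except OSError:
            continue  # the process exited between listdir() and open()
    return names

def missing_daemons(wanted, running):
    """Return the daemons from `wanted` that do not appear in `running`."""
    return [name for name in wanted if name not in running]

def check_and_restart(restart=RESTART):
    """Restart every daemon in `restart` that is not running; return the names restarted."""
    missing = missing_daemons(restart, running_process_names())
    for name in missing:
        subprocess.run(restart[name], check=False)
    return missing
```

Run from cron every few minutes, check_and_restart() acts only when a daemon has disappeared. Both "pbs_server" and "maui" fit within the 15-character comm limit.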

II. Memory cleanup:

A crontab entry on pbssrv runs /root/clearCacheNew.sh, which clears the cache when the "cached" value shown by top exceeds 2 GB.
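clearCacheNew.sh is not shown, so the sketch below is an assumption of how such a check could work. It reads the "Cached:" figure (the same value top reports) from /proc/meminfo. If that figure exceeds 2 GB, taken here as 2 GiB, it syncs and writes to /proc/sys/vm/drop_caches, which requires root. Whether the original script writes 1 (page cache only) or 3 (page cache plus dentries and inodes) is not stated.

```python
import os

CACHE_LIMIT_KB = 2 * 1024 * 1024  # 2 GB threshold from the text, in kB as /proc/meminfo reports

def cached_kb(meminfo_text):
    """Return the 'Cached:' value (kB) from /proc/meminfo content."""
    for line in meminfo_text.splitlines():
        if line.startswith("Cached:"):  # does not match 'SwapCached:'
            return int(line.split()[1])
    raise ValueError("no Cached: line found")

def should_drop(meminfo_text, limit_kb=CACHE_LIMIT_KB):
    """True when the page cache is above the limit."""
    return cached_kb(meminfo_text) > limit_kb

def drop_caches(level="1"):
    """Flush dirty pages, then drop clean caches (root only).

    '1' drops the page cache only; '3' would also drop dentries and inodes.
    """
    os.sync()
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write(level)

def main():
    with open("/proc/meminfo") as f:
        text = f.read()
    if should_drop(text):
        drop_caches()
```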

III. Checking whether jobs are running where they should not:

A crontab entry on pbssrv runs /root/shijy/crontab_CheckJobRunningNodes.sh.
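The page does not say what rule crontab_CheckJobRunningNodes.sh enforces. This Python sketch assumes the check compares each job's Torque exec_host (host/slot entries joined by "+", as printed by qstat -f) against a hypothetical mapping from queue to permitted nodes.

```python
def parse_qstat_f(text):
    """Extract (job_id, queue, exec_host) from `qstat -f` output.

    Long values that qstat wraps onto continuation lines are not reassembled here.
    """
    jobs = []
    for line in text.splitlines():
        if line.startswith("Job Id:"):
            jobs.append({"id": line.split(":", 1)[1].strip()})
        elif jobs and " = " in line:
            key, value = line.strip().split(" = ", 1)
            jobs[-1][key] = value
    return [(j["id"], j.get("queue", ""), j.get("exec_host", "")) for j in jobs]

def exec_hosts(exec_host):
    """'node01/0+node01/1+node02/0' -> {'node01', 'node02'}"""
    return {part.split("/")[0] for part in exec_host.split("+") if part}

def misplaced_jobs(jobs, allowed_nodes):
    """Map job_id -> sorted hosts outside the job queue's allowed node set.

    allowed_nodes is a hypothetical policy {queue: set of hosts}; a queue missing
    from it counts as allowing no hosts, so all of its running jobs are reported.
    """
    bad = {}
    for job_id, queue, exec_host in jobs:
        extra = exec_hosts(exec_host) - allowed_nodes.get(queue, set())
        if extra:
            bad[job_id] = sorted(extra)
    return bad
```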

IV. Periodically releasing held jobs:

Crontab entries on pbssrv run:

/root/kanbw/release.sh

/root/shijy/release-reverse.sh

/root/shijy/release.sh
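The three release scripts are not shown, and the page does not say why jobs end up held. A minimal Python sketch of the basic pattern follows. It assumes the scripts list jobs in state H with Torque's qselect -s H, which prints one job ID per line, and pass them to qrls. By default, qrls releases only user holds. The holds argument is forwarded to qrls -h (u = user, o = other/operator, s = system).

```python
import subprocess

def held_job_ids(qselect_output):
    """Parse `qselect -s H` output: one held job ID per line."""
    return [line.strip() for line in qselect_output.splitlines() if line.strip()]

def release_held_jobs(holds="u", dry_run=True):
    """Release all jobs in state H; with dry_run=True only report them."""
    result = subprocess.run(["qselect", "-s", "H"], capture_output=True, text=True, check=True)
    ids = held_job_ids(result.stdout)
    if ids and not dry_run:
        subprocess.run(["qrls", "-h", holds] + ids, check=True)
    return ids
```

Starting with dry_run=True lets the list of affected jobs be checked before anything is actually released.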

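V. PBS configuration backup:

1. Every day at 00:00, pbssrv writes the output of qmgr -c 'p s' to /var/spool/pbs/pbs-bak/server-bak-`/bin/date +%Y%m%d -d 'yesterday'`

2. Every day at 14:01, torqmaui copies the backup files in /var/spool/pbs/pbs-bak/, together with maui.cfg and the nodes file, to the /pbs-bak/crontab-bak directory.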
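Both backup steps are sketched below in Python. backup_name reproduces the server-bak-YYYYMMDD naming with yesterday's date, as date +%Y%m%d -d 'yesterday' does. The source list for copy_backups is an assumption. The page does not give the locations of maui.cfg or the nodes file on torqmaui, so a caller might pass something like /var/spool/pbs/server_priv/nodes (Torque's default nodes path under the mounted directory) and wherever maui.cfg is installed.

```python
import datetime
import os
import shutil
import subprocess

BACKUP_DIR = "/var/spool/pbs/pbs-bak"

def backup_name(today=None):
    """server-bak-YYYYMMDD named after *yesterday*, like `date +%Y%m%d -d 'yesterday'`."""
    today = today or datetime.date.today()
    return "server-bak-" + (today - datetime.timedelta(days=1)).strftime("%Y%m%d")

def dump_server_config(backup_dir=BACKUP_DIR):
    """Step 1 (pbssrv, 00:00): save the output of `qmgr -c 'p s'`."""
    out = subprocess.run(["qmgr", "-c", "p s"], capture_output=True, text=True, check=True).stdout
    path = os.path.join(backup_dir, backup_name())
    with open(path, "w") as f:
        f.write(out)
    return path

def copy_backups(sources, dest="/pbs-bak/crontab-bak"):
    """Step 2 (torqmaui, 14:01): copy files, or the regular files inside directories, into dest."""
    os.makedirs(dest, exist_ok=True)
    copied = []
    for src in sources:
        if os.path.isdir(src):
            for name in sorted(os.listdir(src)):
                full = os.path.join(src, name)
                if os.path.isfile(full):
                    copied.append(shutil.copy2(full, dest))
        else:
            copied.append(shutil.copy2(src, dest))
    return copied
```

If the date command is written directly into a crontab line instead of a script, each % must be escaped as \%, because cron treats an unescaped % as a newline.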

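VI. PBS monitoring:

(1) Job queue-time monitoring:

a. On the torqmaui server, /home/farmer/shijy/crontab_pbs.sh runs every half hour and saves information on completed jobs to the jobmanage database.

b. On torqmaui, /root/shijy/PbsConfigAna.py `/bin/date +%Y-%m-%d -d 'yesterday'` runs once a day and writes job queue times, job counts, and core counts to the database.

c. On lxslc11, /root/shijy/QueueTime/QueueTime.sh runs every day at 2:00 to update the queue-time web page.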
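Steps a to c form a small pipeline, but crontab_pbs.sh, PbsConfigAna.py, QueueTime.sh and the jobmanage schema are not shown. The Python sketch below illustrates one possible shape, under stated assumptions:

- Completed jobs are read from "E" (job end) records of the Torque accounting log.
- Queue time is start minus qtime.
- Cores are counted as exec_host entries.
- SQLite stands in for the jobmanage database.
- The table and column names are hypothetical.

```python
import datetime
import html
import sqlite3

def parse_end_records(lines):
    """Yield one dict per 'E' (job end) record of a Torque accounting log.

    Assumed line format: 'MM/DD/YYYY HH:MM:SS;E;<job id>;key=value key=value ...'.
    """
    for line in lines:
        parts = line.rstrip("\n").split(";", 3)
        if len(parts) != 4 or parts[1] != "E":
            continue  # not a job-end record
        fields = dict(kv.split("=", 1) for kv in parts[3].split() if "=" in kv)
        if "qtime" not in fields or "start" not in fields:
            continue  # e.g. a job deleted before it ever started
        hosts = [h for h in fields.get("exec_host", "").split("+") if h]
        yield {
            "job_id": parts[2],
            "day": datetime.datetime.strptime(parts[0][:10], "%m/%d/%Y").strftime("%Y-%m-%d"),
            "queue": fields.get("queue", ""),
            "wait_s": int(fields["start"]) - int(fields["qtime"]),
            "cores": len(hosts),  # one host/slot entry per core
        }

def store(conn, records):
    """Insert or overwrite completed jobs, so re-running every half hour does not double-count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs "
        "(job_id TEXT PRIMARY KEY, day TEXT, queue TEXT, wait_s INTEGER, cores INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO jobs VALUES (:job_id, :day, :queue, :wait_s, :cores)",
        list(records),
    )
    conn.commit()

def daily_summary(conn, day):
    """Per queue for one day (YYYY-MM-DD): job count, total cores, mean wait in seconds."""
    return conn.execute(
        "SELECT queue, COUNT(*), SUM(cores), AVG(wait_s) FROM jobs "
        "WHERE day = ? GROUP BY queue ORDER BY queue",
        (day,),
    ).fetchall()

def render_html(rows, day):
    """Render the daily summary as a simple HTML table for the queue-time page."""
    out = [
        f"<h2>Queue time, {html.escape(day)}</h2>",
        "<table>",
        "<tr><th>Queue</th><th>Jobs</th><th>Cores</th><th>Mean wait (min)</th></tr>",
    ]
    for queue, jobs, cores, mean_wait in rows:
        out.append(
            f"<tr><td>{html.escape(queue)}</td><td>{jobs}</td>"
            f"<td>{cores}</td><td>{mean_wait / 60:.1f}</td></tr>"
        )
    out.append("</table>")
    return "\n".join(out)
```

In this sketch, parse_end_records plus store roughly play the role of step a. daily_summary, called with yesterday's YYYY-MM-DD date, plays step b, and render_html plays step c.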

 
This site is powered by the TWiki collaboration platform. Copyright © 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.