-- ShiJingyan - 2012-02-06

torqmaui mounts the /var/spool/pbs/ directory from pbssrv.

I. Process monitoring and checks:

(1) pbs_server: a crontab entry on pbssrv runs /root/pbs_Check.sh

(2) maui: a crontab entry on pbssrv runs /root/ma_Check.sh
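
The contents of pbs_Check.sh and ma_Check.sh are not shown here. As a minimal sketch only, assuming such a check simply looks for the daemon name in a process listing, the logic might resemble the following. The function names are hypothetical, and what happens to a missing daemon (restart, alert) is left open because it depends on the site's init scripts.

```python
import subprocess


def is_running(name, ps_output):
    """Return True if a process whose command name equals `name` appears
    in `ps_output` (one command name per line, as from `ps -e -o comm=`)."""
    return any(line.strip() == name for line in ps_output.splitlines())


def missing_daemons(names, ps_output):
    """Return the daemons from `names` that are absent from the listing."""
    return [n for n in names if not is_running(n, ps_output)]


def current_process_names():
    """List the command names of all processes via `ps -e -o comm=`."""
    result = subprocess.run(
        ["ps", "-e", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

A cron job could call `missing_daemons(["pbs_server", "maui"], current_process_names())` and act on whatever names come back.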

II. Memory cleanup:

A crontab entry on pbssrv runs /root/clearCacheNew.sh. When the cached value shown in top exceeds 2G, the script clears the cache.
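
clearCacheNew.sh itself is not reproduced on this page. The sketch below shows one way the threshold test could work, assuming the cached value is read from /proc/meminfo (the figure that top reports) and that "2G" means 2 GiB. The function names are hypothetical. Writing to /proc/sys/vm/drop_caches is the standard Linux interface for dropping caches and requires root.

```python
import os

# Assumed reading of "2G": 2 GiB, expressed in KiB to match /proc/meminfo.
THRESHOLD_KIB = 2 * 1024 * 1024


def cached_kib(meminfo_text):
    """Extract the `Cached:` value (in kB) from /proc/meminfo text."""
    for line in meminfo_text.splitlines():
        # startswith() deliberately skips the separate `SwapCached:` line.
        if line.startswith("Cached:"):
            return int(line.split()[1])
    raise ValueError("no Cached: line found")


def should_clear(meminfo_text, threshold_kib=THRESHOLD_KIB):
    """Return True when the page cache is larger than the threshold."""
    return cached_kib(meminfo_text) > threshold_kib


def clear_cache():
    """Flush dirty pages, then drop caches. Must run as root."""
    os.sync()
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")  # 3 = page cache plus dentries and inodes
```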

III. Checking for jobs running on unexpected nodes:

A crontab entry on pbssrv runs /root/shijy/crontab_CheckJobRunningNodes.sh

IV. Periodic release of held jobs:

Crontab entries on pbssrv run:

/root/kanbw/release.sh

/root/shijy/release-reverse.sh

/root/shijy/release.sh
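
What the three release scripts do, and how they differ, is not recorded here. As a rough sketch under the assumption that a release pass finds jobs in state H in the default `qstat` listing and releases them with `qrls`, the core could look like this. The function names are hypothetical, and any ordering or filtering applied by release-reverse.sh is not modeled.

```python
import subprocess


def held_job_ids(qstat_output):
    """Return IDs of jobs whose state column is H in default `qstat` output."""
    ids = []
    for line in qstat_output.splitlines():
        fields = line.split()
        # Default columns: Job id, Name, User, Time Use, S, Queue.
        # The state is the second-to-last field; header lines never match "H".
        if len(fields) >= 6 and fields[-2] == "H":
            ids.append(fields[0])
    return ids


def release(job_ids):
    """Release each held job with `qrls`; failures are not fatal."""
    for job_id in job_ids:
        subprocess.run(["qrls", job_id], check=False)
```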

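V. PBS configuration backup:

1. Every day at 00:00, on pbssrv, the output of qmgr -c 'p s' is written to /var/spool/pbs/pbs-bak/server-bak-`/bin/date +%Y%m%d -d 'yesterday'`

2. Every day at 14:01, on torqmaui, the backup files under /var/spool/pbs/pbs-bak/, together with maui.cfg and the nodes file, are copied to /pbs-bak/crontab-bak.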
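
The file naming in step 1 stamps each dump with the previous day's date. A small sketch of that naming rule, with a hypothetical helper that captures the `qmgr -c 'p s'` output, is given below. The function names are illustrative, and the real job is a crontab entry rather than Python.

```python
import subprocess
from datetime import date, timedelta

BACKUP_DIR = "/var/spool/pbs/pbs-bak"


def backup_path(today):
    """Build the backup file name, stamped with yesterday's date (YYYYMMDD)."""
    yesterday = today - timedelta(days=1)
    return f"{BACKUP_DIR}/server-bak-{yesterday:%Y%m%d}"


def dump_server_config(today=None):
    """Write the output of `qmgr -c 'p s'` to the dated backup file."""
    path = backup_path(today or date.today())
    result = subprocess.run(
        ["qmgr", "-c", "p s"],
        capture_output=True, text=True, check=True,
    )
    with open(path, "w") as f:
        f.write(result.stdout)
    return path
```

For a run on 2012-02-06, `backup_path` yields /var/spool/pbs/pbs-bak/server-bak-20120205.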

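VI. PBS monitoring:

(1) Job queue-time monitoring:

a. On the torqmaui server, every half hour: /home/farmer/shijy/crontab_pbs.sh saves information about completed jobs to the jobmanage database.

b. On torqmaui, once a day: /root/shijy/PbsConfigAna.py `/bin/date +%Y-%m-%d -d 'yesterday'` writes the job queue time, job count, and core count to the database.

c. On lxslc11, every day at 2:00: /root/shijy/QueueTime/QueueTime.sh updates the queue-time web page.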
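
How these scripts obtain and compute the queue time is described in the attached document, not on this page. Purely as an illustration, and assuming the input is Torque accounting records of the form `timestamp;type;job_id;key=value ...` with `qtime` and `start` given in epoch seconds, queue time per job could be derived as `start - qtime`. The function names are hypothetical. Core counts and the database write are left out.

```python
def parse_record(line):
    """Split 'timestamp;type;job_id;key=value ...' into type, id, attributes."""
    _timestamp, rec_type, job_id, message = line.strip().split(";", 3)
    attrs = dict(item.split("=", 1) for item in message.split() if "=" in item)
    return rec_type, job_id, attrs


def queue_wait_seconds(attrs):
    """Queue time: seconds between entering the queue and starting to run."""
    return int(attrs["start"]) - int(attrs["qtime"])


def summarize(lines):
    """Count finished jobs (E records) and average their queue time."""
    waits = []
    for line in lines:
        if line.count(";") < 3:
            continue  # skip blank or malformed lines
        rec_type, _job_id, attrs = parse_record(line)
        if rec_type == "E" and "start" in attrs and "qtime" in attrs:
            waits.append(queue_wait_seconds(attrs))
    count = len(waits)
    return {"jobs": count, "mean_wait_s": sum(waits) / count if count else 0.0}
```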

Topic attachments
Attachment | Revision | Size | Date | Who | Comment
PBS_Job_Monitoring_System.docx (Microsoft Word) | r1 | 19.7 K | 2012-02-13 09:05 | KanBowen | Code-flow documentation for the PBS monitoring system
Topic revision: r2 - 2012-02-13 - KanBowen
 