Difference: 201111 (1 vs. 2)

Revision 2: 2011-11-17 - KanBowen

Line: 1 to 1
 
META TOPICPARENT name="Maintenance"
-- KanBowen - 2011-11-07

November 7

[torqueusers] Exit_status=271

I know that the exit status gets offset by some number (128? 256?), but
it's not clear to me whether there is a correlation between the signal
number (SIGTERM, or signal 15) and the program's exit status. If a program
that is killed by signal 15 reports an exit code of 15, and if the offset
is 256, that would explain the exit code of 271 that you see (256+15).
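
To make the arithmetic above concrete, here is a minimal Python sketch that splits an Exit_status value into an offset and a signal number. The 256 offset is only the hypothesis from the message above, not a confirmed Torque rule, so it is left as a parameter; the function name decode_exit_status is made up for this illustration.

```python
import signal

# Assumption: the offset discussed above (256). Pass offset=128 to try
# the other candidate mentioned in the message.
ASSUMED_OFFSET = 256


def decode_exit_status(status, offset=ASSUMED_OFFSET):
    """Split an exit status into (kind, value, signal_name).

    Values at or above the offset are treated as "offset + signal number";
    anything below is treated as a plain exit code.
    """
    if status >= offset:
        signum = status - offset
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = "unknown"
        return ("signal", signum, name)
    return ("exit", status, None)


if __name__ == "__main__":
    print(decode_exit_status(271))
```

Under the 256 assumption, 271 decodes to signal 15 (SIGTERM), which matches the reasoning in the message.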

From the snippet of logs, it looks like Maui decided somehow to delete
the job. SIGTERM (15) is the first signal that Torque sends to the
job's process; if it fails to exit in a short period, it then sends
SIGKILL (9), which can't be caught/ignored. We sometimes have users
catch TERM in their job script, and do some cleanup.
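
As a rough illustration of catching TERM for cleanup, here is a minimal Python sketch for a job whose payload is a Python program. The helper names (make_term_handler, install_term_handler) are hypothetical, and exiting with 128 + signal number is just the common shell convention, assumed here for illustration; how the batch system then reports the status is a separate question.

```python
import signal
import sys


def make_term_handler(cleanup):
    """Build a signal handler that runs cleanup() and then exits.

    The exit code 128 + signum follows the usual shell convention;
    that choice is an assumption of this sketch.
    """
    def handler(signum, frame):
        cleanup()
        sys.exit(128 + signum)
    return handler


def install_term_handler(cleanup):
    """Register the handler for SIGTERM (call this from the main thread)."""
    signal.signal(signal.SIGTERM, make_term_handler(cleanup))


# Example use inside a job:
#
#     def remove_scratch_files():
#         ...  # delete temporary files, flush partial results, etc.
#
#     install_term_handler(remove_scratch_files)
#     run_the_actual_work()
```

The cleanup has to finish quickly: as noted above, SIGKILL follows if the process has not exited within a short period, and that signal cannot be caught.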

I'd look into why Maui decided to delete it, if I were you. That's
likely the root of the problem.

Added:
>
>
November 17

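Because bes did not have enough computing resources, all of the mbh queue's resources (16*8) were moved into the bes group.

If mbh needs them, these computing resources will have to be moved back.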

 