-- KanBowen - 2011-11-07


[torqueusers] Exit_status=271

I know that the exit status gets offset by some number (128? 256?), but
it's not clear to me whether there is a correlation between the signal
number (SIGTERM is signal 15) and the program's exit status. If a program
that is killed by signal 15 reports an exit code of 15, and the offset
is 256, that would explain the exit code of 271 that you see (256+15).
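
To make the arithmetic concrete, here is a minimal Python sketch of that decoding. It assumes the offset really is 256, as guessed above; the function name and the constant are made up for illustration and are not part of Torque. (Shells such as bash use 128 plus the signal number instead, which is why the offset is a parameter.)

```python
import signal

# Assumption taken from the discussion above: values at or above this
# offset mean "killed by signal (value - offset)".
ASSUMED_SIGNAL_OFFSET = 256


def decode_exit_status(exit_status, offset=ASSUMED_SIGNAL_OFFSET):
    """Split an exit status into ("signal", number, name) or ("exit", code, None)."""
    if exit_status >= offset:
        signum = exit_status - offset
        try:
            # Look up the symbolic name on this platform, e.g. SIGTERM.
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown"
        return ("signal", signum, name)
    return ("exit", exit_status, None)


print(decode_exit_status(271))
```

On Linux, where SIGTERM is signal 15, this reports 271 as a kill by SIGTERM, consistent with the 256+15 reading.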

From the snippet of logs, it looks like Maui somehow decided to delete
the job. SIGTERM (15) is the first signal that Torque sends to the
job's process; if the process fails to exit within a short period,
Torque then sends SIGKILL (9), which can't be caught or ignored. We
sometimes have users catch TERM in their job script and do some cleanup.

I'd look into why Maui decided to delete it, if I were you. That's
likely the root of the problem.




Topic revision: r2 - 2011-11-17 - KanBowen