OpenPBS SSS
OpenPBS SSS, developed at Pacific Northwest National Laboratory (PNNL),
is an enhanced version of the stock OpenPBS_2.3.12 source. We applied patches
to get it to version 2.3.15. These patches include:
-
an addparam.sh script that is used to add another parameter for jobs so
that the Maui scheduler can do a few extra things
-
a patch from OSC for IA-64 boxes that touches config.guess and config.sub
-
a patch so that the PBS server won't overwrite the default server file if
it's already there; the makefile will also clean up documents with a
make distclean
-
some changes from ANL to increase some RPP timeouts and the number of
connections to listen for
-
some changes to scaling-related values, taken from the NCSA scaling patch
-
another patch to grab the memory information from /proc/meminfo rather
than trying to get it from /dev/kmem
-
other patches that increase the timeout values in some of the socket code
-
a fix for some Linux header problems in resmom/linux/mom_mach.c: we
removed linux/quota.h and added time.h and sys/quota.h
-
a patch to keep PBS from dumping the job load on a quick shutdown
NOTE: The patches above were mostly the work of others.
We didn't make many changes that were not available on the web.
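The /proc/meminfo patch listed above replaces reads of kernel memory with
parsing of a plain text file. The idea can be sketched as follows. This is a
minimal Python illustration, not the C code in the mom; the function names
parse_meminfo and read_meminfo are hypothetical, and the sketch assumes the
"Name: value [unit]" line layout that /proc/meminfo uses.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Values are returned as they appear in the file (kB for most fields).
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields = rest.split()
        # Keep only lines whose first field after the colon is a number.
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the memory information file (Linux only)."""
    with open(path) as handle:
        return parse_meminfo(handle.read())
```

On a Linux node, read_meminfo()["MemTotal"] would give the total memory
without needing access to /dev/kmem.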
PNNL additions:
-
changed the xpbs_scriptload.c file to stop using tmpnam(), since the
compiler complained about it being insecure. We use the mkstemp() call
instead
-
added new code that lets the pbs_server keep much of the information a
scheduler needs, so that the scheduler does not have to go to each of the
nodes in addition to getting info from the server. Our changes add another
item to the list that you get by running pbsnodes -a. There should now
also be a "status" entry that holds a list of name=value,name2=value2
data. We picked information that we thought might be useful for a
scheduler. It should be pretty easy to change the information sent by the
mom. At this time the list is fixed at compile time and is not an option
in qmgr. If you want to change what is sent to the server, try changing
the list in:
srv/server/query_configs.h.
-
There were some fault tolerance patches that introduce non-blocking sockets
to get around the problem that the server still needs to contact all of
the moms. If one of the connections hangs, the server never gets to
all of the other nodes. The non-blocking sockets fix that, but I thought
that a better way to fix things was to have the nodes connect to the
server and check in every so often.
-
In addition to checking in with the server, the nodes also send node
status information to be added in with the other node information. This
information should then be available for any scheduling software without
needing to contact the individual nodes.
-
added a couple of parameters that can be set with qmgr:
set server node_ping_rate = 15
set server node_check_rate = 600
This tells the nodes to check in every 15 seconds (that rate is probably a
bit high for normal-sized clusters) and tells the server to count a node
as down if it hasn't checked in for more than 600 seconds. If a mom
shuts down, it will close the socket and the server will know right away.
To make this work, we added a requirement for a $clienthost
line in the mom_priv/config file:
$clienthost myserver.mylocation.net
The $clienthost parameter can be used to add nodes to the
mom ACL list. For check-ins to work, the first $clienthost
entry needs to point to the node running the pbs_server process.
-
The client-side code that updates the server is in the is_update_stat()
routine in the resmom/mom_main.c file. We added code on the server side
to handle a node checking in, so that the node still gets the list of IP
addresses to accept connections from.
-
disabled the "ping nodes" code
-
changed the neednodes attribute on the job to allow it to be updated
while the job is running
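The check-in scheme described in the list above can be sketched as follows.
This is a minimal Python illustration of the bookkeeping, not the C code in
pbs_server: the names parse_status and NodeTable are hypothetical, and the
sketch assumes that status values contain no commas. Only the
node_check_rate setting and the name=value,name2=value2 layout come from
the text above.

```python
import time


def parse_status(status):
    """Split a "name=value,name2=value2" status string into a dict."""
    result = {}
    for item in status.split(","):
        name, sep, value = item.partition("=")
        if sep and name.strip():
            result[name.strip()] = value.strip()
    return result


class NodeTable:
    """Server-side record of node check-ins (illustrative only)."""

    def __init__(self, node_check_rate=600, clock=time.monotonic):
        self.node_check_rate = node_check_rate  # seconds of silence allowed
        self.clock = clock                      # injectable for testing
        self.last_seen = {}
        self.status = {}

    def check_in(self, node, status_string):
        """Record a check-in and the status the node reported with it."""
        self.last_seen[node] = self.clock()
        self.status[node] = parse_status(status_string)

    def disconnect(self, node):
        """A mom that shuts down closes its socket: forget its check-in."""
        self.last_seen.pop(node, None)

    def is_down(self, node):
        """Down if never seen, disconnected, or silent for too long."""
        seen = self.last_seen.get(node)
        if seen is None:
            return True
        return self.clock() - seen > self.node_check_rate
```

A scheduler-facing query would then read the stored status from the table
instead of contacting each node, which is the point of having the nodes
connect to the server rather than the other way around.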
While looking for bugs in our code, we found some problems that we may not
fully understand:
-
It seems that there are times when the pbs_mom dies
with a job in a "PRERUN" substate. If this happens,
when the pbs_mom restarts, it doesn't find the job.
We didn't know what to do in this case, so we decided that it should
do what it would do if the job were running. This seemed to clear up
some conditions in which nodes weren't freed from jobs that had crashed.
-
There are times when a job has exited but not all of the files
in the PBS_HOME/mom_priv/jobs directory were cleaned out.
The pbs_mom startup didn't have a case for fixing that
situation either.
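The PRERUN decision described above amounts to a small mapping applied when
the mom restarts. A minimal sketch in Python, assuming substates are
represented as plain strings; the function name recovery_substate and the
string constants are hypothetical, and the real logic is C code in the
pbs_mom startup path:

```python
RUNNING = "RUNNING"
PRERUN = "PRERUN"


def recovery_substate(saved_substate):
    """Return the substate the restart logic should act on.

    A job saved in PRERUN is handled as if it had been running;
    every other substate is passed through unchanged.
    """
    if saved_substate == PRERUN:
        return RUNNING
    return saved_substate
```

The sketch deliberately does nothing about leftover files in
PBS_HOME/mom_priv/jobs, since the text above does not say how that case
should be handled.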
If you notice problems with this code, please contact
Gary Skouson (gary.skouson@pnl.gov).