Torque PBS on a workstation fails

Hello:
I am trying to install the Torque PBS system on my workstation (two Xeon E5-2680 v3 CPUs). I installed it as follows:

(1) ./configure --without-tcl --with-nvidia-gpus --enable-nvidia-gpus --prefix=/soft/torque-5.1.1 --with-nvml-include=/usr/local/cuda/gpukit/usr/include/nvidia/gdk --with-nvml-lib=/usr/local/cuda/lib64
(2) make && make install
(3) set path=(/soft/torque-5.1.1/bin $path)
set path=(/soft/torque-5.1.1/sbin $path)
(4) # vi /etc/hosts as follows:
127.0.0.1 localhost cudaC
xx.xx.xx.xx torqueserver
(5) # ./torque.setup albert torqueserver
(6) # vi /var/spool/torque/server_priv/nodes
cudaC np=4
(7) # cd /soft/torque-5.1.1/sbin
# ./pbs_server
# ./pbs_sched
# ./pbs_mom


(8) Check the status with

# pbsnodes

which gives:

cudaC
     **state = down**
     power_state = Running
     np = 12
     ntype = cluster
     mom_service_port = 15002
     mom_manager_port = 15003

It seems that the service doesn’t start…
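`state = down` usually means pbs_server is not getting status reports from pbs_mom on that node, so a first check is whether the daemons are listening at all. Below is a minimal Python sketch, assuming everything runs on `localhost`. Ports 15002 and 15003 come from the `mom_service_port`/`mom_manager_port` lines above; 15001 is assumed to be the default pbs_server port of this build. `port_open` and `check_daemons` are hypothetical helper names, not part of Torque.

```python
import socket
from typing import Dict

# 15002/15003 are taken from the pbsnodes output above;
# 15001 is ASSUMED to be the default pbs_server port for this build.
TORQUE_PORTS = {
    "pbs_server": 15001,
    "pbs_mom (service)": 15002,
    "pbs_mom (manager)": 15003,
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host not resolvable
        return False


def check_daemons(host: str = "localhost") -> Dict[str, bool]:
    """Map each daemon label to whether its port accepts connections on host."""
    return {name: port_open(host, port) for name, port in TORQUE_PORTS.items()}


if __name__ == "__main__":
    for name, ok in check_daemons().items():
        print(f"{name:20s} {'listening' if ok else 'NOT reachable'}")
```

If the mom ports are not reachable, the problem is with pbs_mom itself (not started, or exited) rather than with the nodes file.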
If I configure /var/spool/torque/server_priv/nodes as follows instead:

node01.cudaC np=12
node02.cudaC np=12
node03.cudaC np=12
node04.cudaC np=12


and then run

# pbsnodes

it fails with the message:

pbsnodes: Server has no node list MSG=none of the nodes in the 'server_priv/nodes' file resolves to a valid address


Does anybody have any idea what the problem is?
thx a lot
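The error says that none of the names in `server_priv/nodes` resolves to an address. With the `/etc/hosts` from step (4), only `localhost`, `cudaC` and `torqueserver` are defined, so `node01.cudaC` to `node04.cudaC` presumably cannot be resolved unless DNS knows them. A minimal Python sketch to list which node names resolve, using `socket.gethostbyname` (IPv4 only); the helper names and the treatment of `#` as a comment marker are assumptions for illustration.

```python
import os
import socket
from typing import Callable, Dict, List, Optional


def node_names(nodes_text: str) -> List[str]:
    """Return the node name (first field) of each non-blank line of a nodes file,
    e.g. 'node01.cudaC np=12' -> 'node01.cudaC'. '#' is ASSUMED to start a comment."""
    names = []
    for line in nodes_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            names.append(line.split()[0])
    return names


def resolve(name: str) -> Optional[str]:
    """Return an IPv4 address for name, or None if the lookup fails."""
    try:
        return socket.gethostbyname(name)
    except OSError:  # socket.gaierror / socket.herror are OSError subclasses
        return None


def check_nodes(nodes_text: str,
                resolver: Callable[[str], Optional[str]] = resolve) -> Dict[str, Optional[str]]:
    """Map every node name in the file to its address (None = does not resolve)."""
    return {name: resolver(name) for name in node_names(nodes_text)}


if __name__ == "__main__":
    path = "/var/spool/torque/server_priv/nodes"  # path used in the post above
    if os.path.exists(path):
        with open(path) as fh:
            for name, addr in check_nodes(fh.read()).items():
                print(f"{name:20s} {addr or 'DOES NOT RESOLVE'}")
```

Any name printed as `DOES NOT RESOLVE` would likely need an `/etc/hosts` entry (or a DNS record) before pbs_server can use it.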

And the workstation runs openSUSE, I assume. Which version of openSUSE?

This appears to be GPU computing. I don’t see anything obvious, and the openSUSE version is probably less relevant (the CUDA setup is visible in the configure command).

My guess is that you’ll get a better response asking in a CUDA forum.

IMO,
TSU

openSUSE 13.2 X64

Yes, I compiled Torque with GPU support.

But the main problem is configuring the PBS system on SUSE. I don’t know where the problem is… :(

Hi
Why not ask the package maintainers to upgrade the OBS version?
https://build.opensuse.org/package/show?project=network%3Acluster&package=torque

Otherwise, check how it is configured via the files and the spec file.

Thanks a lot for the advice.

I went through the link you provided, but I cannot find anything about how to configure it… :(

Hi
The spec file shows what goes where. Then there are the sysconfig files for configuration and the systemd service files to start things up.

I see it’s enabled for NVIDIA GPUs, so maybe just ask the maintainers to update it?

Thank you for the further advice.

I finally made some progress: I can at least set up the whole machine as one node now:


cudaC torque-5.1.1/sbin# pbsnodes 
cudaC
     state = free
     power_state = Running
     np = 12
     ntype = cluster
     status = rectime=1435653757,cpuclock=Fixed,varattr=,jobs=,state=free,netload=59771850,gres=,loadave=0.11,ncpus=48,physmem=65982324kb,availmem=86084536kb,totmem=86954864kb,idletime=245,nusers=2,nsessions=6,sessions=1519 2350 2353 11014 11017 11049,uname=Linux cudaC 3.16.7-21-desktop #1 SMP PREEMPT Tue Apr 14 07:11:37 UTC 2015 (93c1539) x86_64,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003
     gpus = 4
     gpu_status = gpu[3]=gpu_id=0000:83:00.0;gpu_pci_device_id=398594270;gpu_pci_location_id=0000:83:00.0;gpu_product_name=Graphics Device;gpu_display=Enabled;gpu_fan_speed=22%;gpu_memory_total=12287 MB;gpu_memory_used=23 MB;gpu_mode=Default;gpu_state=Unallocated;gpu_utilization=0%;gpu_memory_utilization=0%;gpu_temperature=44 C,gpu[2]=gpu_id=0000:82:00.0;gpu_pci_device_id=398594270;gpu_pci_location_id=0000:82:00.0;gpu_product_name=Graphics Device;gpu_display=Enabled;gpu_fan_speed=22%;gpu_memory_total=12287 MB;gpu_memory_used=23 MB;gpu_mode=Default;gpu_state=Unallocated;gpu_utilization=0%;gpu_memory_utilization=0%;gpu_temperature=44 C,gpu[1]=gpu_id=0000:03:00.0;gpu_pci_device_id=398594270;gpu_pci_location_id=0000:03:00.0;gpu_product_name=Graphics Device;gpu_display=Enabled;gpu_fan_speed=22%;gpu_memory_total=12287 MB;gpu_memory_used=23 MB;gpu_mode=Default;gpu_state=Unallocated;gpu_utilization=0%;gpu_memory_utilization=0%;gpu_temperature=46 C,gpu[0]=gpu_id=0000:02:00.0;gpu_pci_device_id=398594270;gpu_pci_location_id=0000:02:00.0;gpu_product_name=Graphics Device;gpu_display=Enabled;gpu_fan_speed=22%;gpu_memory_total=12287 MB;gpu_memory_used=45 MB;gpu_mode=Default;gpu_state=Unallocated;gpu_utilization=0%;gpu_memory_utilization=1%;gpu_temperature=40 C,driver_ver=346.46,timestamp=Tue Jun 30 10:42:37 2015
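The `gpu_status` attribute packs one record per GPU into a single line: records are separated by commas, fields inside a GPU record by semicolons, and `driver_ver` and `timestamp` come at the end. A small Python sketch to split it for inspection, assuming you pass only the value after `gpu_status = `; `parse_gpu_status` is a hypothetical helper, not a Torque tool.

```python
from typing import Dict


def parse_gpu_status(gpu_status: str) -> Dict[str, Dict[str, str]]:
    """Split a pbsnodes gpu_status value into {'gpu[N]': {field: value}},
    plus a 'global' entry for trailing fields such as driver_ver and timestamp."""
    result: Dict[str, Dict[str, str]] = {"global": {}}
    for item in gpu_status.split(","):
        key, _, value = item.partition("=")
        key = key.strip()
        if key.startswith("gpu["):
            # Per-GPU record: semicolon-separated key=value pairs.
            fields: Dict[str, str] = {}
            for pair in value.split(";"):
                k, _, v = pair.partition("=")
                fields[k.strip()] = v.strip()
            result[key] = fields
        else:
            result["global"][key] = value.strip()
    return result
```

Applied to the line above, it should give four per-GPU entries, `gpu[0]` to `gpu[3]`, each with `gpu_state` = `Unallocated`, which suggests pbs_mom detects all four cards.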

However, I don’t know how to create multiple nodes on a single workstation…

I’ve contacted the Torque developer for SUSE, but haven’t received a reply yet. I installed the latest Torque version with the command line I posted yesterday.

thanks a lot

Hi,

I am facing the same problem. How did you get to this point, or resolve the issue? I have an HP Z840 workstation with 2 NVIDIA Quadro 4200 GPUs and wish to use them for GPU computing. I followed your instructions for compiling Torque with CUDA enabled. However, pbsnodes gives the same error as in your previous post. I have not been able to resolve it; please let me know how to proceed.

thanks and regards,
ajit