Hello:
I am trying to install the Torque PBS system on my workstation (which contains 2x E5-2680 v3). I installed it as follows:
(1)./configure --without-tcl --with-nvidia-gpus --enable-nvidia-gpus --prefix=/soft/torque-5.1.1 --with-nvml-include=/usr/local/cuda/gpukit/usr/include/nvidia/gdk --with-nvml-lib=/usr/local/cuda/lib64
(2) make && make install
(3)set path=(/soft/torque-5.1.1/bin $path)
set path=(/soft/torque-5.1.1/sbin $path)
(4) #vi /etc/hosts as follows:
127.0.0.1 localhost cudaC
xx.xx.xx.xx torqueserver
(5) #./torque.setup albert torqueserver
(6)# vi /var/spool/torque/server_priv/nodes
cudaC np=4
(7) #cd /soft/torque-5.1.1/sbin
#./pbs_server
#./pbs_sched
#./pbs_mom
(8)
#pbsnodes
to check status:
cudaC
**state = down**
power_state = Running
np = 12
ntype = cluster
mom_service_port = 15002
mom_manager_port = 15003
It seems that the service doesn’t start…
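For a quick look at which nodes the server reports as down, the `pbsnodes` listing can be reduced to name/state pairs. This is only a sketch: `node_states` is a hypothetical helper name, and it assumes the usual layout in which each node name sits on its own line, followed by `attribute = value` lines.

```shell
# Summarize `pbsnodes` output as "node state" pairs, one per line.
# Reads the listing on stdin, so it also works on saved output.
node_states() {
    awk '
        # A node name is a non-empty line that contains no "=".
        NF > 0 && index($0, "=") == 0 { node = $1; next }
        # The state attribute looks like "state = down" (possibly indented).
        $1 == "state" && $2 == "=" { print node, $3 }
    '
}

# Usage on a live system (assumes pbsnodes is on the PATH):
#   pbsnodes | node_states
```

A node reported as `down` here only says the server has not heard from that node's pbs_mom; it does not say why.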
If I configure /var/spool/torque/server_priv/nodes as follows:
node01.cudaC np=12
node02.cudaC np=12
node03.cudaC np=12
node04.cudaC np=12
then run
#pbsnodes
it fails with the message:
pbsnodes: Server has no node list MSG=none of the nodes in the 'server_priv/nodes' file resolves to a valid address
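The message suggests that the server could not resolve any of the listed names to an address. One way to check that directly is to look up each name from the nodes file. This is a rough sketch, not a Torque tool: `check_nodes` is a hypothetical helper, and it assumes `getent` is available and that the host name is the first field on each line of the file.

```shell
# Report whether each host name in a Torque nodes file resolves.
# Prints "OK   <name>" or "FAIL <name>"; returns non-zero if any name fails.
check_nodes() {
    nodes_file="$1"
    status=0
    # The "|| [ -n ... ]" part handles a last line without a trailing newline.
    while read -r host _rest || [ -n "$host" ]; do
        case "$host" in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        if getent hosts "$host" >/dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            status=1
        fi
    done < "$nodes_file"
    return "$status"
}

# Usage with the path from this post:
#   check_nodes /var/spool/torque/server_priv/nodes
```

Any name reported as FAIL would need an entry in /etc/hosts (or DNS) before the server can accept it.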
Does anybody have any idea what the problem is?
thx a lot
hcvv
June 29, 2015, 8:26pm
2
And the workstation runs openSUSE I assume. And the version of openSUSE is …?
tsu2
June 29, 2015, 8:48pm
3
Since this appears to be GPU computing,
I don’t see anything obvious, and the openSUSE version is probably less relevant (the cuda sdk version is revealed in the command).
My guess is that you’ll get a better response asking in a CUDA forum.
IMO,
TSU
yes, I compiled torque with GPU support.
But the major problem is configuring the PBS system on SUSE. I don’t know where the problem is…
Hi
Why not ask the package maintainers to upgrade the OBS version?
https://build.opensuse.org/package/show?project=network%3Acluster&package=torque
Otherwise, check how they configure it via the files and spec file.
Thanks a lot for the advice.
I went through the link you provided, but I could not find anything about how to configure it…
Hi
The spec file shows what goes where. Then there are the sysconfig files for configuration and the systemd service files to start things up.
I see it’s enabled for nvidia gpus, so maybe just ask the maintainers to update?
Thank you for further advice.
I finally made some progress: I can at least set up the whole machine as one node now:
cudaC torque-5.1.1/sbin# pbsnodes
cudaC
state = free
power_state = Running
np = 12
ntype = cluster
status = rectime=1435653757,cpuclock=Fixed,varattr=,jobs=,state=free,netload=59771850,gres=,loadave=0.11,ncpus=48,physmem=65982324kb,availmem=86084536kb,totmem=86954864kb,idletime=245,nusers=2,nsessions=6,sessions=1519 2350 2353 11014 11017 11049,uname=Linux cudaC 3.16.7-21-desktop #1 SMP PREEMPT Tue Apr 14 07:11:37 UTC 2015 (93c1539) x86_64,opsys=linux
mom_service_port = 15002
mom_manager_port = 15003
gpus = 4
gpu_status = gpu[3]=gpu_id=0000:83:00.0;gpu_pci_device_id=398594270;gpu_pci_location_id=0000:83:00.0;gpu_product_name=Graphics Device;gpu_display=Enabled;gpu_fan_speed=22%;gpu_memory_total=12287 MB;gpu_memory_used=23 MB;gpu_mode=Default;gpu_state=Unallocated;gpu_utilization=0%;gpu_memory_utilization=0%;gpu_temperature=44 C,gpu[2]=gpu_id=0000:82:00.0;gpu_pci_device_id=398594270;gpu_pci_location_id=0000:82:00.0;gpu_product_name=Graphics Device;gpu_display=Enabled;gpu_fan_speed=22%;gpu_memory_total=12287 MB;gpu_memory_used=23 MB;gpu_mode=Default;gpu_state=Unallocated;gpu_utilization=0%;gpu_memory_utilization=0%;gpu_temperature=44 C,gpu[1]=gpu_id=0000:03:00.0;gpu_pci_device_id=398594270;gpu_pci_location_id=0000:03:00.0;gpu_product_name=Graphics Device;gpu_display=Enabled;gpu_fan_speed=22%;gpu_memory_total=12287 MB;gpu_memory_used=23 MB;gpu_mode=Default;gpu_state=Unallocated;gpu_utilization=0%;gpu_memory_utilization=0%;gpu_temperature=46 C,gpu[0]=gpu_id=0000:02:00.0;gpu_pci_device_id=398594270;gpu_pci_location_id=0000:02:00.0;gpu_product_name=Graphics Device;gpu_display=Enabled;gpu_fan_speed=22%;gpu_memory_total=12287 MB;gpu_memory_used=45 MB;gpu_mode=Default;gpu_state=Unallocated;gpu_utilization=0%;gpu_memory_utilization=1%;gpu_temperature=40 C,driver_ver=346.46,timestamp=Tue Jun 30 10:42:37 2015
However, I don’t know how to create multiple nodes on a single workstation…
I’ve contacted the Torque developer for SUSE, but I haven’t received a reply yet. I installed the latest Torque version using the command line I posted yesterday.
thanks a lot
au73
May 2, 2016, 2:17pm
10
Hi,
I am facing the same problem. How did you get to this point or resolve the issue? I have an HP Z840 workstation with 2 NVIDIA Quadro 4200 GPUs and wish to use it for GPU computing. I followed your instructions for compiling Torque with CUDA enabled. However, pbsnodes gives the same error as mentioned in your previous post. I am not able to resolve the issue; please let me know how to proceed.
thanks and regards,
ajit