Thread: torque pbs in workstation failed

  1. #1

    torque pbs in workstation failed

    Hello,
    I am trying to install the Torque PBS system on my workstation (which has 2x E5-2680 v3 CPUs). I installed it as follows:
    Code:
    (1) ./configure --without-tcl --with-nvidia-gpus --enable-nvidia-gpus --prefix=/soft/torque-5.1.1 --with-nvml-include=/usr/local/cuda/gpukit/usr/include/nvidia/gdk --with-nvml-lib=/usr/local/cuda/lib64
    (2) make && make install
    (3) set path=(/soft/torque-5.1.1/bin $path)
        set path=(/soft/torque-5.1.1/sbin $path)
    (4) # vi /etc/hosts as follows:
        127.0.0.1 localhost cudaC
        xx.xx.xx.xx torqueserver
    (5) # ./torque.setup albert torqueserver
    (6) # vi /var/spool/torque/server_priv/nodes
        cudaC np=4
    (7) # cd /soft/torque-5.1.1/sbin
        # ./pbs_server
        # ./pbs_sched
        # ./pbs_mom
    (8) Then I run
    Code:
    # pbsnodes
    to check the status:
    Code:
    cudaC
         state = down
         power_state = Running
         np = 12
         ntype = cluster
         mom_service_port = 15002
         mom_manager_port = 15003
    It seems that the service does not start.
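
One possible reason for a node showing state = down is that nothing is answering on the pbs_mom ports. As a rough sketch (not part of the original post), the Python helper below tests whether a TCP connection to a given port succeeds. The helper name `port_open` is hypothetical, and treating the two mom ports from the output above as plain TCP ports is an assumption.

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False

# Example use on the workstation from the thread (host and ports are taken
# from the pbsnodes output; adjust them for another machine):
#   for port in (15002, 15003):
#       print(port, port_open("cudaC", port))
```

If both ports report False, it may be worth checking that pbs_mom is actually running before looking at the server configuration.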
    If I instead configure /var/spool/torque/server_priv/nodes as follows:
    Code:
    node01.cudaC np=12
    node02.cudaC np=12
    node03.cudaC np=12
    node04.cudaC np=12
    and then run
    Code:
    # pbsnodes
    it fails with the message:

    Code:
    pbsnodes: Server has no node list MSG=none of the nodes in the 'server_priv/nodes' file resolves to a valid address
    Does anybody have any idea what the problem is?
    thx a lot
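
The second error message says that none of the names in server_priv/nodes resolve to an address. A minimal sketch for checking this, assuming the file holds one node per line with the host name as the first field (as in the examples above), could look like the following. The function names `parse_nodes` and `unresolved` are hypothetical, and the check uses only the standard resolver, which may differ from what pbs_server does internally.

```python
import socket

def parse_nodes(text):
    """Return the host names (first field of each non-blank line)."""
    names = []
    for line in text.splitlines():
        fields = line.split()
        if fields:
            names.append(fields[0])
    return names

def unresolved(names, resolve=socket.gethostbyname):
    """Return the names that the resolver cannot map to an address."""
    bad = []
    for name in names:
        try:
            resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            bad.append(name)
    return bad

# Example use (path taken from the thread):
#   with open("/var/spool/torque/server_priv/nodes") as fh:
#       print(unresolved(parse_nodes(fh.read())))
```

Any name this prints would need an entry in /etc/hosts or DNS. In the /etc/hosts shown above only cudaC and torqueserver are defined, so names such as node01.cudaC would not be expected to resolve.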

  2. #2
    Re: torque pbs in workstation failed

    Quote Originally Posted by albumns View Post
    Hello:
    I am trying to install Torque PBS system in my workstation
    I assume the workstation runs openSUSE. Which version of openSUSE is it?
    Henk van Velden

  3. #3
    Re: torque pbs in workstation failed

    Since this appears to be GPU computing,
    I don't see anything obvious, and the openSUSE version is probably less relevant (the cuda sdk version is revealed in the command).

    My guess is that you'll get a better response asking in a CUDA forum.

    IMO,
    TSU

  4. #4

    Re: torque pbs in workstation failed

    openSUSE 13.2 X64

    Quote Originally Posted by hcvv View Post
    And the workstation runs openSUSE I assume. And the version of openSUSE is ........?

  5. #5

    Re: torque pbs in workstation failed

    Yes, I compiled Torque with GPU support.

    But the main problem is configuring the PBS system on SUSE. I don't know where the problem is.


    Quote Originally Posted by tsu2 View Post
    Since this appears to be GPU computing,
    I don't see anything obvious, and the openSUSE version is probably less relevant (the cuda sdk version is revealed in the command).

    My guess is that you'll get a better response asking in a CUDA forum.

    IMO,
    TSU

  6. #6
    Re: torque pbs in workstation failed

    Quote Originally Posted by albumns View Post
    yes, I compiled torque with GPU support.

    But the major problem is configuring the PBS system in SUSE. I don't know where is the problems....
    Hi
    Why not ask the package maintainers to upgrade the OBS version?
    https://build.opensuse.org/package/s...package=torque

    Otherwise, check how the package is configured via its files and spec file.
    Cheers Malcolm °¿° SUSE Knowledge Partner (Linux Counter #276890)
    SUSE SLE, openSUSE Leap/Tumbleweed (x86_64) | GNOME DE
    If you find this post helpful and are logged into the web interface,
    please show your appreciation and click on the star below... Thanks!

  7. #7

    Re: torque pbs in workstation failed

    Thanks a lot for the advice.

    I went through the link you provided, but I could not find anything about how to configure it. :-(


    Quote Originally Posted by malcolmlewis View Post
    Hi
    Why not ask the package maintainers to upgrade the OBS version?
    https://build.opensuse.org/package/s...package=torque

    Else check out how the configure via the files and spec file.

  8. #8
    Re: torque pbs in workstation failed

    Quote Originally Posted by albumns View Post
    thx a lot for advice.

    I go through the link you provided but I cannot find anything concerning how to configure it.....:-(
    Hi
    The spec file shows what goes where. Then there are the sysconfig files for configuration and the systemd service files to start things up.

    I see it's enabled for nvidia gpus, so maybe just ask the maintainers to update?
    https://build.opensuse.org/package/u...cluster/torque

  9. #9

    Re: torque pbs in workstation failed

    Thank you for the further advice.

    I finally made some progress: I can at least set up the whole machine as one node now:

    Code:
    cudaC torque-5.1.1/sbin# pbsnodes 
    cudaC
         state = free
         power_state = Running
         np = 12
         ntype = cluster
         status = rectime=1435653757,cpuclock=Fixed,varattr=,jobs=,state=free,netload=59771850,gres=,loadave=0.11,ncpus=48,physmem=65982324kb,availmem=86084536kb,totmem=86954864kb,idletime=245,nusers=2,nsessions=6,sessions=1519 2350 2353 11014 11017 11049,uname=Linux cudaC 3.16.7-21-desktop #1 SMP PREEMPT Tue Apr 14 07:11:37 UTC 2015 (93c1539) x86_64,opsys=linux
         mom_service_port = 15002
         mom_manager_port = 15003
         gpus = 4
         gpu_status = gpu[3]=gpu_id=0000:83:00.0;gpu_pci_device_id=398594270;gpu_pci_location_id=0000:83:00.0;gpu_product_name=Graphics Device;gpu_display=Enabled;gpu_fan_speed=22%;gpu_memory_total=12287 MB;gpu_memory_used=23 MB;gpu_mode=Default;gpu_state=Unallocated;gpu_utilization=0%;gpu_memory_utilization=0%;gpu_temperature=44 C,gpu[2]=gpu_id=0000:82:00.0;gpu_pci_device_id=398594270;gpu_pci_location_id=0000:82:00.0;gpu_product_name=Graphics Device;gpu_display=Enabled;gpu_fan_speed=22%;gpu_memory_total=12287 MB;gpu_memory_used=23 MB;gpu_mode=Default;gpu_state=Unallocated;gpu_utilization=0%;gpu_memory_utilization=0%;gpu_temperature=44 C,gpu[1]=gpu_id=0000:03:00.0;gpu_pci_device_id=398594270;gpu_pci_location_id=0000:03:00.0;gpu_product_name=Graphics Device;gpu_display=Enabled;gpu_fan_speed=22%;gpu_memory_total=12287 MB;gpu_memory_used=23 MB;gpu_mode=Default;gpu_state=Unallocated;gpu_utilization=0%;gpu_memory_utilization=0%;gpu_temperature=46 C,gpu[0]=gpu_id=0000:02:00.0;gpu_pci_device_id=398594270;gpu_pci_location_id=0000:02:00.0;gpu_product_name=Graphics Device;gpu_display=Enabled;gpu_fan_speed=22%;gpu_memory_total=12287 MB;gpu_memory_used=45 MB;gpu_mode=Default;gpu_state=Unallocated;gpu_utilization=0%;gpu_memory_utilization=1%;gpu_temperature=40 C,driver_ver=346.46,timestamp=Tue Jun 30 10:42:37 2015
    However, I don't know how to create multiple nodes on a single workstation.

    I've contacted the Torque maintainer for SUSE, but I haven't received a reply yet. I installed the latest Torque version using the command line I posted yesterday.

    thanks a lot
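
The gpu_status line above is hard to read as one string. As an illustration only, the sketch below splits such a value into one record per GPU plus the shared trailing fields. It assumes the layout seen in this output (records separated by commas, fields inside a record separated by semicolons, and no commas inside values); `parse_gpu_status` is a hypothetical helper, not a Torque tool.

```python
import re

def parse_gpu_status(value):
    """Split a gpu_status value into per-GPU dicts plus shared fields."""
    gpus, shared = {}, {}
    for item in value.split(","):
        match = re.match(r"gpu\[(\d+)\]=(.*)", item)
        if match:
            # A GPU record: "gpu[N]=key=val;key=val;..."
            pairs = match.group(2).split(";")
            gpus[int(match.group(1))] = dict(p.split("=", 1) for p in pairs)
        else:
            # A shared field such as driver_ver or timestamp
            key, _, val = item.partition("=")
            shared[key] = val
    return gpus, shared

# Shortened sample in the same layout as the output above
sample = ("gpu[1]=gpu_id=0000:03:00.0;gpu_state=Unallocated;gpu_temperature=46 C,"
          "gpu[0]=gpu_id=0000:02:00.0;gpu_state=Unallocated;gpu_temperature=40 C,"
          "driver_ver=346.46,timestamp=Tue Jun 30 10:42:37 2015")
gpus, shared = parse_gpu_status(sample)
for index in sorted(gpus):
    print(index, gpus[index]["gpu_id"], gpus[index]["gpu_temperature"])
```

This only restructures what pbsnodes already reports; it does not change how Torque sees the node.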

  10. #10

    Re: torque pbs in workstation failed

    Hi,

    I am facing the same problem. How did you get to this point or resolve the issue? I have an HP Z840 workstation with 2 NVIDIA Quadro 4200 GPUs and want to use it for GPU computing. I followed your instructions for compiling Torque with CUDA enabled. However, pbsnodes gives the same error as mentioned in your previous post. I am not able to resolve the issue, so please let me know how to proceed.

    thanks and regards,
    ajit
