10Gb ethernet (Mellanox ConnectX-2)

I have a couple of computers (both Dell R610 dual hexacore 3.33 GHz) which were linked together through standard 1Gb Ethernet using a 1Gb router (DHCP). Then I modified the setup to use a pair of Mellanox ConnectX-2 10Gb PCIe cards directly connected between the computers. I was able to disconnect the 1Gb Ethernet (192.167.1.x) and could still ssh between the computers using just the directly-connected Mellanox cards (thereby verifying operation). I was also able to run OpenMPI 3.0.0 apps (which I compiled and installed under /usr/local) across the cards (they are 192.166.1.1 and 192.166.1.3). I also have three directories exported via NFS from the “head node” (192.166.1.1), using the hostname and IP addresses, which the other system mounts correctly. So the hardware setup seems to be correct.
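(For reference, the exports and the mount on the second box are along these lines - the directory names here are only placeholders, not my real paths:)

# /etc/exports on the head node (192.166.1.1); directory is a placeholder
/srv/share1  192.166.1.0/24(rw,sync,no_subtree_check)

# and on the other node
mount -t nfs 192.166.1.1:/srv/share1 /srv/share1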

My question is: I don’t know how to tell whether I’m achieving full bandwidth. I get this notification when I run an OpenMPI app:

submitting:   ./E011_NINT_patti.qsub
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port.  As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

  Local host:           linux-68t00
  Local device:         mlx4_0
  Local port:           1
  CPCs attempted:       rdmacm, udcm
--------------------------------------------------------------------------
[linux-68t00:26916] 19 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port
[linux-68t00:26916] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

I guess the way I’d know whether this notification actually matters is if I weren’t getting well over a 1Gb transfer rate - but I don’t know how to test that. I’m wondering whether I needed to compile OpenMPI 3.0.0 with “openib BTL” or some other support.
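Is there a quick way to check what got built in? I’m guessing something like this would at least show whether an openib component exists in my install (the path assumes my /usr/local prefix):

# check whether the openib BTL component was built into this OpenMPI
/usr/local/bin/ompi_info | grep -i openib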

Thanks for any help!
PattiM

PS: I wasn’t sure if this question should be in this forum or in the hardware forum! :slight_smile:

Looks like a job for “iperf”.
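Something like this between the two boxes should do it (from memory, so double-check the options, and point the client at your 10Gb address):

# on one machine, start the server
iperf3 -s

# on the other, run the client against the 10Gb address for ~10 seconds
iperf3 -c 192.166.1.1 -t 10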

AK


Never attribute to malice that which can be adequately explained by stupidity.
(R.J. Hanlon)

Depending on the characteristics of your direct connection, you might also benefit from configuring Layer 2 jumbo frames and/or enlarging your Layer 3 TCP/IP sliding windows…

I describe both briefly, including how to modify the Layer 3 sliding windows, in a paper I wrote long ago. You may have to do a bit of research, since what I described then should be updated to what is recommended today:

https://sites.google.com/site/4techsecrets/optimize-and-fix-your-network-connection
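Very roughly (the interface name and the numbers below are only examples - check what is recommended today):

# Layer 2: enable jumbo frames on the 10Gb interface (both ends must match)
ip link set dev eth2 mtu 9000        # substitute your Mellanox interface name

# Layer 3: raise the TCP window (buffer) limits - example values only
sysctl -w net.core.rmem_max=67108864
sysctl -w net.core.wmem_max=67108864
sysctl -w net.ipv4.tcp_rmem='4096 87380 67108864'
sysctl -w net.ipv4.tcp_wmem='4096 65536 67108864'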

TSU

Thanks! iperf - who knew? lol!

BTW: I rebuilt OpenMPI 3.0.0 and this time the ./configure utility detected openib and included it; however, I still get the notification shown in my initial post. Hopefully iperf will tell me something. Another fun Sunday morning!! rotfl!
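Next I might turn up the BTL verbosity to see why the connection schemes are being rejected - I believe something along these lines works (not sure of the exact level to use):

# rerun with verbose BTL output to see why rdmacm/udcm fail on the port
mpirun -np 20 --mca btl_base_verbose 100 E011_NINT_patti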

Thank You!! I’ve heard of (and set) “jumbo frames” before on a router, and I’ve seen them work. I will check out your paper - maybe the Mellanox hardware is much more flexible than a regular NIC. :slight_smile:

Server issues - anyway, I tested with iperf and got promising results, which apparently verifies normal hardware operation: as one would expect, just under 1 Gbit/sec on the gigabit link and about 9+ Gbit/sec on the IB link. (The 192.167.x addresses are the gigabit Ethernet and 192.166.x are the IB.)

patti@linux-68t2:~> iperf3 -c 192.166.1.1                                                                       
Connecting to host 192.166.1.1, port 5201                                                                       
[  5] local 192.166.1.3 port 35008 connected to 192.166.1.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.03 GBytes  8.81 Gbits/sec  140    518 KBytes
[  5]   1.00-2.00   sec  1.07 GBytes  9.18 Gbits/sec    0    684 KBytes
[  5]   2.00-3.00   sec  1.08 GBytes  9.27 Gbits/sec    0    759 KBytes
[  5]   3.00-4.00   sec  1.07 GBytes  9.21 Gbits/sec    0    795 KBytes
[  5]   4.00-5.00   sec  1.09 GBytes  9.33 Gbits/sec    0    805 KBytes
[  5]   5.00-6.00   sec  1.06 GBytes  9.12 Gbits/sec    0    816 KBytes
[  5]   6.00-7.00   sec  1.02 GBytes  8.76 Gbits/sec    0    827 KBytes
[  5]   7.00-8.00   sec  1.08 GBytes  9.29 Gbits/sec    0    827 KBytes
[  5]   8.00-9.00   sec  1.08 GBytes  9.26 Gbits/sec    0    829 KBytes
[  5]   9.00-9.48   sec   518 MBytes  9.15 Gbits/sec    0    830 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-9.48   sec  10.1 GBytes  9.14 Gbits/sec  140             sender
[  5]   0.00-9.48   sec  0.00 Bytes  0.00 bits/sec                  receiver
iperf3: interrupt - the client has terminated
patti@linux-68t2:~> iperf3 -c 192.167.1.3
Connecting to host 192.167.1.3, port 5201
[  5] local 192.167.1.2 port 37590 connected to 192.167.1.3 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   114 MBytes   955 Mbits/sec    0    363 KBytes
[  5]   1.00-2.00   sec   112 MBytes   937 Mbits/sec    0    413 KBytes
[  5]   2.00-3.00   sec   112 MBytes   939 Mbits/sec    0    413 KBytes
[  5]   3.00-4.00   sec   112 MBytes   941 Mbits/sec    0    413 KBytes
[  5]   4.00-5.00   sec   113 MBytes   945 Mbits/sec    0    434 KBytes
[  5]   5.00-6.00   sec   112 MBytes   943 Mbits/sec    0    434 KBytes
[  5]   6.00-7.00   sec   112 MBytes   941 Mbits/sec    0    434 KBytes
[  5]   7.00-8.00   sec   112 MBytes   944 Mbits/sec    0    454 KBytes
[  5]   8.00-9.00   sec   111 MBytes   933 Mbits/sec    0    522 KBytes
^C[  5]   9.00-9.48   sec  52.4 MBytes   924 Mbits/sec    0    546 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-9.48   sec  1.04 GBytes   941 Mbits/sec    0             sender
[  5]   0.00-9.48   sec  0.00 Bytes  0.00 bits/sec                  receiver
iperf3: interrupt - the client has terminated
patti@linux-68t2:~> 

So now I have to read up on what iperf is actually doing, because it may be using different software/libraries than OpenMPI (could they be relying on different kernel modules? Is that a possibility?):

patti@linux-68t00:~/decks> mpirun -np 20 E011_NINT_patti
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port.  As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

  Local host:           linux-68t00
  Local device:         mlx4_0
  Local port:           1
  CPCs attempted:       rdmacm, udcm
--------------------------------------------------------------------------
[linux-68t00:26916] 19 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port
[linux-68t00:26916] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
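To see whether both the plain Ethernet driver and the RDMA/verbs side are loaded (I’m guessing the openib BTL needs the latter, while iperf/ssh/NFS only need the former - someone correct me if that’s wrong), I think a check like this would show it:

# which Mellanox / RDMA modules are actually loaded?
lsmod | grep -E 'mlx4|ib_uverbs|rdma'
# the openib BTL presumably needs mlx4_ib and ib_uverbs on top of mlx4_en,
# while plain TCP traffic (iperf, ssh, NFS) only needs the mlx4_en driver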

Something still isn’t right, because I have to “force” mpirun to use the 192.166 path by unplugging the cables from 192.167, and sometimes also rebooting. It’s also tricky to get the 192.166 link actually going after boot - but that may be because I’m not using an IB router; I’m just connecting the two cards directly with a cable (I read that this is a supported ConnectX-2 configuration). So I tried a few diagnostics I found in yast, but I don’t really know how to interpret them. (I’m adding them here in case anyone recognizes anything; below the output I’ve also noted what I plan to try next.)

linux-68t00:/home/patti # iblinkinfo
ibwarn: [25263] mad_rpc_open_port: client_register for mgmt 1 failed
Failed to open (null) port 0
linux-68t00:/home/patti # ibnetdiscover
ibwarn: [25268] mad_rpc_open_port: client_register for mgmt 1 failed
src/ibnetdisc.c:784; can't open MAD port ((null):0)
ibnetdiscover: iberror: failed: discover failed

linux-68t00:/home/patti # ibstat
CA 'mlx4_0'
        CA type: MT26448
        Number of ports: 1                                                                                            
        Firmware version: 2.9.1200                                                                                      
        Hardware version: b0                                                                                             
        Node GUID: 0x0002c9030054e89a                                                                                      
        System image GUID: 0x0002c9030054e89a                                                                               
        Port 1:                                                                                                              
                State: Active                                                                                                 
                Physical state: LinkUp                                                                                          
                Rate: 10                                                                                                          
                Base lid: 0                                                                                                       
                LMC: 0                                                                                                              
                SM lid: 0                                                                                                           
                Capability mask: 0x04010000                                                                                              
                Port GUID: 0x0202c9fffe54e89a                                                                                            
                Link layer: Ethernet                                                                                                     
linux-68t00:/home/patti # ibstatus
Infiniband device 'mlx4_0' port 1 status:
        default gid:     fe80:0000:0000:0000:0202:c9ff:fe54:e89a
        base lid:        0x0
        sm lid:          0x0
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            10 Gb/sec (1X QDR)
        link_layer:      Ethernet

linux-68t00:/home/patti # 
linux-68t00:/home/patti # ibdiagui
Loading IBDIAGUI from: /usr/lib64/ibdiagui1.5.7
Loading IBDM from: /usr/lib64/ibdm1.5.7
-W- Topology file is not specified.
    Reports regarding cluster links will use direct routes.
-I- Using port 1 as the local port.
-E- Fail to ibvs_bind.
linux-68t00:/home/patti # ibdmchk
-------------------------------------------------
-E- Could not find a readble osm.fdbs in /var/log or  /tmp
-E- Could not find a readble osm.mcfdbs in /var/log or  /tmp
-E- Could not find a readble /var/log/osm-subnet.lst or /tmp/subnet.lst
linux-68t00:/home/patti # 
linux-68t00:/home/patti # ibdiagnet
Loading IBDIAGNET from: /usr/lib64/ibdiagnet1.5.7
-W- Topology file is not specified.
    Reports regarding cluster links will use direct routes.
Loading IBDM from: /usr/lib64/ibdm1.5.7
-I- Using port 1 as the local port.
-E- Fail to ibvs_bind.
linux-68t00:/home/patti #
linux-68t00:/home/patti # rxe_cfg
Can't exec "ifconfig": No such file or directory at /usr/bin/rxe_cfg line 192.
Can't exec "ifconfig": No such file or directory at /usr/bin/rxe_cfg line 192.
Can't exec "ifconfig": No such file or directory at /usr/bin/rxe_cfg line 192.
Can't exec "ifconfig": No such file or directory at /usr/bin/rxe_cfg line 192.
Can't exec "ifconfig": No such file or directory at /usr/bin/rxe_cfg line 192.
rdma_rxe module not loaded
  Name  Link  Driver   Speed  NMTU  IPv4_addr  RDEV  RMTU  
  em1   no    bnx2                                         
  em2   no    bnx2                                         
  em3   yes   bnx2                                         
  em4   no    bnx2                                         
  p1p1  yes   mlx4_en                                      
linux-68t00:/home/patti #
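Two things I’m planning to try so I don’t have to keep unplugging cables: pinning mpirun to the 10Gb subnet, and giving the Mellanox interface (p1p1) a static config so it comes up at boot. I’m not certain these are exactly right, but roughly:

# keep MPI traffic on the 10Gb network without unplugging the 1Gb cables
mpirun -np 20 --mca btl_tcp_if_include 192.166.1.0/24 E011_NINT_patti

# openSUSE-style static config so p1p1 comes up at boot
# (/etc/sysconfig/network/ifcfg-p1p1; use .1 on the head node, .3 on the other box)
#   STARTMODE='auto'
#   BOOTPROTO='static'
#   IPADDR='192.166.1.1/24'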

But thanks for the suggestion of iperf!! lol! Now I know for sure that the hardware is operating correctly.