multipath

Hi,
I am trying to set up multipath with a failover policy on openSUSE 11. I have two qla2xxx HBAs installed and they appear to be working. Here is the output of the “multipath -l” command:

SAN_dsk (WWIDnnn) dm-0 SUN,CSM200_R
[size=1][features=1 queue_if_no_path][hwhandler=1 rdac]
\_ round-robin 0 [prio=-1][enabled]
 \_ 4:0:0:0  sdc 8:32  [active][undef]
\_ round-robin 0 [prio=-1][active]
 \_ 5:0:0:0  sdd 8:48  [active][undef]

While testing, I pulled one of the two connections to the SAN, and the connection failed over to the second HBA connection to the SAN.

When I plug the cable back in, it does not fall back to the original connection… it stays in a failed state.

Also, I noticed that the failed disk (sdd) comes back as a different device (sdg), which is probably why the connection does not fall back to the original HBA.

But when I run “/sbin/service multipathd restart”, the sdg disk shows as enabled in multipath -l…

What am I missing here? Any ideas / pointers?

Thanks

> on openSuSE 11

is this on openSUSE 11.0, 11.1 or 11.2?

or SUSE Linux Enterprise Server (SLES)?

from the level of the question i guess the latter…

cat /etc/SuSE-release
should be definitive…

don’t confuse what i write: you ARE welcome here and eventually
someone who can help will probably wander by…but, for the most part
the volunteer helpers here on the openSUSE side are dealing with less
complex ‘problems’ (like the growing pains of n00bs fleeing Redmond
nose rings, etc)…but, if you are running SLES 11 i guess you are
likely to have a better answer from the folks you purchased from, over
at forums.novell.com

but, check back because there are a few folks here with the level of
know-how your Q needs…


goldie
Give a hacker a fish and you feed him for a day.
Teach man and you feed him for a lifetime.

@daksh

This is from memory, so please check multipath docs.

You have set up failover on your array, but you do not have automatic failback set up. As far as I can remember, it’s a config parameter.

And one more thing. Again, as far as I can remember, round-robin I/O policy only makes sense if you have a true active-active storage controller pair. For active-passive and ALUA controllers, a round-robin policy decreases performance. I do not know your array, so rr may be a good choice.
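
For illustration, a minimal multipath.conf fragment (a sketch only, not tuned for your array; the vendor/product strings are taken from your multipath -l output, and the policy choice is exactly the thing to verify):

devices {
    device {
        vendor  "SUN"
        product "CSM200_R"
        # active-active controllers: spread I/O over all paths
        #path_grouping_policy multibus
        # active-passive / ALUA controllers: keep I/O on the owning controller
        path_grouping_policy group_by_prio
    }
}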

HTH

Milan

I am using openSuSE version 11.0; here is the content of /etc/SuSE-release

openSUSE 11.0 (i586)
VERSION = 11.0

@Milan,
I am reading the man pages for multipath, which do not say anything about failback; perhaps it is time to start googling…
We have Sun StorageTek 6140.

Also, thanks for your tip about round-robin…

Thanks

Ahem.


man multipath.conf

failback         Tell  the  daemon  to  manage  path group failback, or not to. 0 or immediate means immediate failback, values >0 means deferred failback (in seconds). manual means no failback. Default value is manual
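
So, as a rough sketch (please check placement and exact spelling against the man page), it goes either into the defaults section or into the device section for your array:

defaults {
    # fail back to the preferred path group as soon as it is healthy again
    failback immediate
}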

Sorry, I am not familiar with your Sun box. Perhaps you should check at their site, and at device-mapper-related mailing lists and sites.

Chances are, your storage is well known and device mapper & multipath have a default configuration for it. Otherwise it’s reading time.

Have a nice day.

Milan

Found the failback setting and set it in my multipath.conf.

Here are the steps I just took to test failover and failback (after setting failback)…


multipath -l
SAN_dsk (WWID-nnn) dm-0 SUN,CSM200_R
[size=1][features=1 queue_if_no_path][hwhandler=1 rdac]
\_ round-robin 0 [prio=-1][enabled]
 \_ 4:0:0:0  sdc 8:32  [active][undef]
\_ round-robin 0 [prio=-1][active]
 \_ 5:0:0:0  sdd 8:48  [active][undef]

Now, I unplug the cable from one of the HBAs to test failover:


multipath -l
SAN_dsk (WWID-nnn) dm-0 SUN,CSM200_R
[size=1][features=1 queue_if_no_path][hwhandler=1 rdac]
\_ round-robin 0 [prio=-1][active]
 \_ 4:0:0:0  sdc 8:32  [active][undef]
\_ round-robin 0 [prio=-1][enabled]
 \_ #:#:#:#  -   #:#   [failed][undef]

I checked my data to make sure failover worked, and it did…
I can still read and write to the disk on the SAN; so far so good.

Now, a few minutes later, I plug the cable back into the HBA, so it should fail back…


SAN_dsk (WWID-nnn) dm-0 SUN,CSM200_R
[size=1][features=1 queue_if_no_path][hwhandler=1 rdac]
\_ round-robin 0 [prio=-1][active]
 \_ 4:0:0:0  sdc 8:32  [active][undef]
\_ round-robin 0 [prio=-1][enabled]
 \_ #:#:#:#  -   #:#   [failed][undef]

But it does not look like failback worked. I waited a few minutes… still the same output from multipath -l.

Now I restart the multipathd service:


/sbin/service multipathd restart
Shutting down multipathd                                              done
Starting multipathd                                                   done

And here is the output of multipath -l after restarting multipathd:


SAN_dsk (WWID-nnn) dm-0 SUN,CSM200_R
[size=1][features=1 queue_if_no_path][hwhandler=1 rdac]
\_ round-robin 0 [prio=-1][enabled]
 \_ 4:0:0:0  sdc 8:32  [active][undef]
\_ round-robin 0 [prio=-1][active]
 \_ 5:0:0:0  sde 8:64  [active][undef]

So, it seems like failback only works after restarting multipathd, and sdd now appears as sde.
Maybe the disk changing from sdd to sde is causing failback to not work properly?
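
Next time I test, I plan to watch whether the kernel actually re-adds the path device when I re-plug the cable, and whether multipathd notices it. Roughly like this (a sketch; host5 is my guess based on the 5:0:0:0 path above):

# watch kernel / multipathd messages while re-plugging the cable
tail -f /var/log/messages

# if the path device does not come back on its own, force a SCSI rescan
# of the affected FC host (adjust host5 to your setup)
echo "- - -" > /sys/class/scsi_host/host5/scan

# then check whether multipath sees the path again
multipath -ll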

daksh,

please read the following thread at dm-devel:

[dm-devel] Question about dm-multipath and Sun StorageTek 6140 (http://www.redhat.com/archives/dm-devel/2007-February/msg00016.html)

and set up your box accordingly.

Also, there might be other threads interesting to you at dm-devel.

Best wishes

Milan

Just read the article and some more about LUN trespassing…
I made changes to my config file, but I need to wait until I get the OK to test again (as I do not know for sure whether the changes I made will crash the machine).
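
When I do get the OK, the rough plan for picking up the new multipath.conf without a reboot is (assuming this multipath-tools version supports the interactive console; otherwise a plain restart):

# ask the running daemon to re-read multipath.conf
multipathd -k"reconfigure"

# or simply restart the daemon
/sbin/service multipathd restart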

Thanks Milan for your help in this; I will post the results after I test.

I tried testing failover and failback after making the changes described here:
[dm-devel] Question about dm-multipath and Sun StorageTek 6140 (http://www.redhat.com/archives/dm-devel/2007-February/msg00016.html)

But, same results as before.

In my test, after the failover happens successfully, I reconnect the cable, but failback does not happen until I restart the multipathd service.

Strange, and it feels like I am missing something simple…

Thanks

> Strange, and it feels like I am missing something simple…

Is your multipathd running? I mean the daemon (service), not the user command multipath.

boot.multipathd will set things up during boot, but you need a running multipathd to take care of things after that.

I think it’s worth checking.

Milan

Sorry for the late reply…
I checked, and the multipathd service is running:


/etc/init.d/multipathd status
Checking for multipathd:                                             running
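
I can also ask the daemon directly what it sees (a sketch; the interactive console and its exact commands may differ between multipath-tools versions):

# open the multipathd interactive console and list what it knows about
multipathd -k
multipathd> show paths
multipathd> show maps
multipathd> quit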

Daksh,

I think that you should ask the people on the dm-devel list for help. After all, they are way more qualified to help you than I am.

Have a nice day.

Milan

I have an add-on query on this topic:

Thanks to this thread, I am now able to see all the paths on a SUSE 11.0 client connected to a cluster head. I have discovered the FC LUN and mounted it on the SUSE client. After a failover of the cluster head, the LUN should be accessible from the other head. I performed the head failover and the client can still see the multipath as shown below, however the mounted LUN becomes a read-only filesystem. My SUSE client has an Emulex HBA attached. Are there any changes I need to make on the initiator side to resolve this issue? (See also the sketch after the config below.)

More details:
Multipath output after failover:
3600144f098396baf00004c2477c00004 dm-3 SUN,Sun <BOX>
[size=10][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:0:3 sdd 8:48   [failed][faulty]
\_ round-robin 0 [prio=1][active]
 \_ 0:0:1:3 sdk 8:160  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:3 sdr 65:16  [failed][faulty]
\_ round-robin 0 [prio=1][enabled]
 \_ 1:0:1:3 sdy 65:128 [active][ready]

linux-rv2s:~ # cat /etc/multipath.conf
device {
        vendor               "SUN"
        product              "*"
        path_grouping_policy group_by_prio
        getuid_callout       "/lib/udev/scsi_id --page=0x83 --whitelisted --device=/dev/%n"
        path_checker         tur
        path_selector        "round-robin 0"
        prio                 alua
        prio_callout         "/sbin/mpath_prio_tpc /dev/%n"
        rr_weight            uniform
        failback             immediate
        hardware_handler     "0"
        no_path_retry        12
        rr_min_io            100
}
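
What I plan to try on the initiator side (a sketch only; the mount point is a placeholder, and it is my assumption that the read-only remount is triggered by I/O errors seen during the head failover — if so, letting dm-multipath queue I/O while no path is available, via no_path_retry or queue_if_no_path, might avoid it):

# check whether the kernel logged I/O errors during the head failover;
# on ext3 with errors=remount-ro that is what flips the mount to read-only
dmesg | tail -n 50

# once the paths are healthy again, remount the filesystem read-write
# (mount point is a placeholder)
mount -o remount,rw /mnt/san_lun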