Volume group failover error when DLM is killed on one node.

Hello,

I have configured a two-node cluster that uses a STONITH SBD device for fencing. I am now testing the cluster behavior in different scenarios in order to validate the configuration. The cluster configuration includes DLM/CLVMD clones on both nodes that manage the lockspace for a common volume group.

  1. The case I am testing now is the behavior of the cluster when the dlm process is killed on one node (the one on which the common volume group is active). In this scenario the dirty node is fenced (correct), but when the volume group then attempts to start on the second node it fails with a DLM lockspace error. I would expect the volume group to be able to fail over to the second node. The following errors are evident:

2017-09-11T09:48:08.151190+03:00 nm-migr-srv02 kernel: [ 1047.458464] dlm: node 2: socket error sending to node 1, port 21064, sk_err=113/113
2017-09-11T09:48:14.419357+03:00 nm-migr-srv02 lrmd[2605]: warning: migr-common_start_0 process (PID 5508) timed out
2017-09-11T09:48:14.420049+03:00 nm-migr-srv02 lvm[4156]: Read on local socket 5, len = 0
2017-09-11T09:48:14.420447+03:00 nm-migr-srv02 lvm[4156]: EOF on local socket: inprogress=1
2017-09-11T09:48:14.420788+03:00 nm-migr-srv02 lvm[4156]: Sending SIGUSR2 to pre&post thread (0x55b60fb427f0 in-progress)
2017-09-11T09:48:14.421124+03:00 nm-migr-srv02 lvm[4156]: ret == 0, errno = 2. removing client
2017-09-11T09:48:14.421466+03:00 nm-migr-srv02 lrmd[2605]: warning: migr-common_start_0:5508 - timed out after 30000ms
2017-09-11T09:48:14.421823+03:00 nm-migr-srv02 lrmd[2605]: notice: finished - rsc:migr-common action:start call_id:81 pid:5508 exit-code:1 exec-time:30003ms queue-time:0ms
2017-09-11T09:48:14.422170+03:00 nm-migr-srv02 crmd[2609]: error: Result of start operation for migr-common on nm-migr-srv02: Timed Out
2017-09-11T09:48:14.427834+03:00 nm-migr-srv02 crmd[2609]: warning: Action 28 (migr-common_start_0) on nm-migr-srv02 failed (target: 0 vs. rc: 1): Error
2017-09-11T09:48:14.428232+03:00 nm-migr-srv02 crmd[2609]: notice: Transition aborted by operation migr-common_start_0 'modify' on nm-migr-srv02: Event failed
2017-09-11T09:48:14.428581+03:00 nm-migr-srv02 crmd[2609]: warning: Action 28 (migr-common_start_0) on nm-migr-srv02 failed (target: 0 vs. rc: 1): Error

  2. Another scenario that I have already worked around, although I would expect a different behavior, is the case where the clvmd process is killed on the node where the volume group is active. When the monitor interval of the volume group was more frequent (e.g. 10s) than that of clvmd (e.g. 20s), the volume group returned an error before clvmd did, and although failover worked fine, the subsequent resource cleanup left the cluster in an unstable state (until both nodes were rebooted). To solve this, I had to make the clvmd monitor interval more frequent than that of the volume group (a minimal sketch follows below). Another solution is to remove the volume group monitor operation entirely.
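
For reference, this is what that workaround looks like in crm syntax, i.e. the clvmd monitor running more often than the volume group monitor (resource ids as in my configuration below; the concrete intervals are only an example):

primitive clvmd ocf:lvm2:clvmd \
        op monitor interval=20 timeout=60
primitive migr-common LVM \
        op monitor interval=60 timeout=60 \
        params volgrpname=migr-common exclusive=true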

Could you please help me with the first case at least? I am attaching the cluster configuration below.

primitive clvmd ocf:lvm2:clvmd \
        operations $id=clvmd-operations \
        op monitor interval=20 timeout=60 \
        op start interval=0 timeout=90 \
        op stop interval=0 timeout=120 \
        params daemon_timeout=160 daemon_options=-d2
primitive dlm ocf:pacemaker:controld \
        operations $id=dlm-operations \
        op monitor interval=30 start-delay=0 timeout=60 \
        op start interval=0 timeout=90 \
        op stop interval=0 timeout=120
primitive migr-common LVM \
        operations $id=migr-common-operations \
        op start interval=0 timeout=30 \
        op stop interval=0 timeout=30 \
        params volgrpname=migr-common exclusive=true
group dlm-clvm dlm clvmd
group migr VIP migr-common migrfs \
        meta target-role=Started
clone dlm-clvm-clone dlm-clvm \
        meta interleave=true ordered=true target-role=Started
order ord-dlm-clvm-migr inf: dlm-clvm-clone migr-common

Is there any suggestion regarding this issue? My intention is for a server whose DLM has been killed or stopped either to be able to restart dlm or to be fenced. On many other virtual servers running SLES 11 SP2-SP4, I used the following configuration:

params daemon="dlm_controld.pcmk" args="-q 0" configdir="/sys/kernel/config"

But on Leap 42.3, when this configuration is used, CLVMD cannot log in to and use the DLM lockspace, so I am running with the default configuration.
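
For completeness, on SLES 11 those parameters were attached to the controld primitive roughly as follows (a sketch of the old setup only; the resource id and the monitor values are assumptions, not the configuration in use now):

primitive dlm ocf:pacemaker:controld \
        params daemon="dlm_controld.pcmk" args="-q 0" configdir="/sys/kernel/config" \
        op monitor interval=30 timeout=60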

I tried the test - killing the dlm_controld daemon on nodeA - on SLE12 SP3 (I'd say almost all of the software is the same as on Leap 42.3). The result is as expected: nodeA gets fenced and rejoins successfully, while nodeB keeps running fine.

===
clvm2:~ # pgrep -a dlm
1715 dlm_controld -s 0
1843 dlm_scand
1844 dlm_recv
1845 dlm_send
1846 dlm_recoverd
clvm2:~ # kill -9 1715

===
Stack: corosync
Current DC: clvm1 (version 1.1.16-4.8-77ea74d) - partition with quorum
Last updated: Mon Oct 16 21:07:35 2017
Last change: Mon Oct 16 20:42:54 2017 by root via cibadmin on clvm1

2 nodes configured
7 resources configured

Node clvm1: online
dlm (ocf::pacemaker:controld): Started
libvirt_stonith (stonith:external/libvirt): Started
vgtest1 (ocf::heartbeat:LVM): Started
clvm (ocf::heartbeat:clvm): Started
Node clvm2: OFFLINE

I think openSUSE Leap 42.3 is actively supported by its developers. Please file a bug at https://bugzilla.suse.com/ to get help.
I rarely visit forums; this time Google led me to this thread.

  1. Check that your SBD stonith works correctly (a quick check is sketched after this list).
  2. From SLE12/openSUSE Leap 42.1 onwards, we recommend using the "clvm" RA shown below for clvmd, not "primitive clvmd ocf:lvm2:clvmd \ operations $id=clvmd-operations".
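
For the check in point 1, something along these lines is usually enough (the device path and node name are placeholders, adjust them to your setup):

sbd -d /dev/<your_sbd_device> dump    # is the SBD header with its timeouts readable?
sbd -d /dev/<your_sbd_device> list    # are the message slots of both nodes visible and clear?
crm node fence <peer_node>            # trigger a real fence from the surviving node as a test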

The "clvm" RA is from the resource-agents project, while "clvmd" came from the lvm2 package and is deprecated now.

===
crm(live)configure# show
node 172204569: clvm1
node 172204570: clvm2
primitive clvm clvm \
        op start interval=0 timeout=90 \
        op stop interval=0 timeout=90 \
        op monitor interval=20 timeout=90
primitive dlm ocf:pacemaker:controld \
        op start interval=0 timeout=90 \
        op stop interval=0 timeout=100 \
        op monitor interval=20 timeout=600
primitive vgtest1 LVM \
        op start interval=0 timeout=90 trace_ra=1 \
        op stop interval=0 timeout=100 trace_ra=1 \
        op monitor interval=20 timeout=100 \
        params volgrpname=vgtest1
group base-group dlm clvm vgtest1
clone base-clone base-group

Hello zRen,

Thanks for your recommendations. Indeed, I replaced "ocf:lvm2:clvmd" with "ocf:heartbeat:clvm", and now when the DLM process is killed the resources fail over to the second node (the failed node is fenced and kept powered off). However, I have one last problem that I would like to solve, and it is related to Stonith/SBD. My configuration is now the following:

node 1: nm-migr-srv01 \
        attributes standby=off
node 2: nm-migr-srv02 \
        attributes standby=off
primitive clvm clvm \
        op start interval=0 timeout=90 \
        op stop interval=0 timeout=100 \
        op monitor interval=20 timeout=60
primitive dlm ocf:pacemaker:controld \
        op monitor interval=60 timeout=60 \
        op start timeout=90 interval=0 \
        op stop timeout=100 interval=0
primitive migr-common LVM \
        params volgrpname=migr-common exclusive=yes \
        op monitor interval=60 timeout=60 \
        op start timeout=30 interval=0 \
        op stop timeout=30 interval=0
primitive migrfs Filesystem \
        operations $id=migrfs-operations \
        op monitor interval=10 start-delay=0 timeout=60 OCF_CHECK_LEVEL=10 \
        op start interval=0 timeout=60 \
        op stop interval=0 timeout=60 \
        params device="/dev/migr-common/lv_common" directory="/projects" options="rw,relatime,data=ordered" fstype=ext4 fast_stop=yes statusfile_prefix=".Filesystem_status/" run_fsck=auto
primitive stonith_sbd stonith:external/sbd \
        meta target-role=Started \
        params pcmk_delay_max=20 sbd_device="/dev/mapper/sbd_device_part1;/dev/mapper/sbd_device_2"
group base-group dlm
group migr clvm migr-common migrfs \
        meta target-role=Started
clone base-clone base-group \
        meta interleave=true target-role=Started
order ord-dlm-clvm-migr inf: base-clone migr
property cib-bootstrap-options: \
        have-watchdog=true \
        dc-version=1.1.17-279.9-d134f83b4 \
        cluster-infrastructure=corosync \
        cluster-name=cluster-migr \
        no-quorum-policy=ignore \
        default-resource-stickiness=100 \
        stonith-timeout=129 \
        stonith-action=poweroff \
        last-lrm-refresh=1510140522

  1. The problem is that when DLM is killed via "pkill -9 dlm", stonith/sbd is the only resource that fails to fail over to the second node. The Stonith/SBD failover fails ONLY in this scenario. The situation is as follows:

Scenario A: Node A hosts Stonith/SBD and is powered off. Result: Successful (everything migrates to Node B).
Scenario B: Node A hosts Stonith/SBD and is rebooted. Result: Successful (everything migrates to Node B).
Scenario C: Node A hosts Stonith/SBD and the rest of the resources, CLVM is killed. Result: Successful (CLVM and LVM are restarted on the node).
Scenario D: Node A hosts only the rest of the resources, DLM is killed. Result: Successful (the rest of the resources migrate to Node B).
Scenario E: Node A hosts Stonith/SBD and the rest of the resources, DLM is killed. Result: Partially successful (the first attempts to migrate Stonith/SBD to Node B are unsuccessful; the rest of the resources migrate to Node B once Stonith/SBD reaches its max-failure limit and is stopped; a cleanup of Stonith/SBD then succeeds and it starts without problem, see the sketch after this list).
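
For reference, the Scenario E cleanup was along these lines (a minimal sketch using crmsh; the node name matches the logs below):

crm resource failcount stonith_sbd show nm-migr-srv01   # inspect the accumulated start failures
crm resource cleanup stonith_sbd                        # clear the failure history so it can start again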

So, I would like a recommendation about Scenario E. The following errors appear:

2017-11-08T13:24:30.756707+02:00 nm-migr-srv01 dlm_controld[2677]: 721 fence result 2 pid 3180 result 0 exit status
2017-11-08T13:24:30.757410+02:00 nm-migr-srv01 dlm_controld[2677]: 721 fence status 2 receive 0 from 1 walltime 1510140270 local 721
2017-11-08T13:24:30.937860+02:00 nm-migr-srv01 crmd[2451]: warning: Timer popped (timeout=20000, abort_level=1000000, complete=false)
2017-11-08T13:24:30.938201+02:00 nm-migr-srv01 crmd[2451]: error: [Action 4]: In-flight rsc op stonith_sbd_start_0 on nm-migr-srv01 (priority: 0, waiting: none)
2017-11-08T13:24:30.938441+02:00 nm-migr-srv01 crmd[2451]: warning: rsc_op 4: stonith_sbd_start_0 on nm-migr-srv01 timed out
2017-11-08T13:24:30.941705+02:00 nm-migr-srv01 crmd[2451]: warning: Action 4 (stonith_sbd_start_0) on nm-migr-srv01 failed (target: 0 vs. rc: 1): Error
2017-11-08T13:24:30.942033+02:00 nm-migr-srv01 crmd[2451]: warning: Action 4 (stonith_sbd_start_0) on nm-migr-srv01 failed (target: 0 vs. rc: 1): Error
2017-11-08T13:24:36.888416+02:00 nm-migr-srv01 corosync[1980]: [TOTEM ] A processor failed, forming new configuration.
2017-11-08T13:24:42.889654+02:00 nm-migr-srv01 corosync[1980]: [TOTEM ] A new membership (10.40.1.69:764) was formed. Members left: 2
2017-11-08T13:24:42.890329+02:00 nm-migr-srv01 corosync[1980]: [TOTEM ] Failed to receive the leave message. failed: 2
2017-11-08T13:24:42.890580+02:00 nm-migr-srv01 corosync[1980]: [QUORUM] Members[1]: 1
2017-11-08T13:24:42.890816+02:00 nm-migr-srv01 corosync[1980]: [MAIN ] Completed service synchronization, ready to provide service.
2017-11-08T13:24:42.891060+02:00 nm-migr-srv01 pacemakerd[2393]: notice: Node nm-migr-srv02 state is now lost
2017-11-08T13:24:42.891313+02:00 nm-migr-srv01 crmd[2451]: notice: Node nm-migr-srv02 state is now lost
2017-11-08T13:24:42.891596+02:00 nm-migr-srv01 crmd[2451]: warning: No reason to expect node 2 to be down
2017-11-08T13:24:42.891840+02:00 nm-migr-srv01 crmd[2451]: notice: Stonith/shutdown of nm-migr-srv02 not matched
2017-11-08T13:24:42.892325+02:00 nm-migr-srv01 crmd[2451]: notice: Updating quorum status to true (call=68)
2017-11-08T13:24:42.893213+02:00 nm-migr-srv01 kernel: [ 733.771956] dlm: closing connection to node 2
2017-11-08T13:25:44.933655+02:00 nm-migr-srv01 stonith-ng[2445]: notice: Action poweroff (d54e4b74-ab57-44d1-a246-fb81e4828a47) for nm-migr-srv02 (crmd.2451) timed out
2017-11-08T13:25:44.934290+02:00 nm-migr-srv01 stonith-ng[2445]: error: Operation poweroff of nm-migr-srv02 by nm-migr-srv01 for crmd.2451@nm-migr-srv01.d54e4b74: Timer expired
2017-11-08T13:25:44.934677+02:00 nm-migr-srv01 crmd[2451]: notice: Stonith operation 2/34:0:0:ceabd0cf-b745-4ccb-8604-611c9abcc4e4: Timer expired (-62)
2017-11-08T13:25:44.934979+02:00 nm-migr-srv01 crmd[2451]: notice: Stonith operation 2 for nm-migr-srv02 failed (Timer expired): aborting transition.
2017-11-08T13:25:44.935243+02:00 nm-migr-srv01 crmd[2451]: notice: Peer nm-migr-srv02 was not terminated (poweroff) by nm-migr-srv01 on behalf of crmd.2451: Timer expired
2017-11-08T13:25:44.935497+02:00 nm-migr-srv01 crmd[2451]: notice: Transition 0 (Complete=7, Pending=0, Fired=0, Skipped=0, Incomplete=20, Source=/var/lib/pacemaker/pengine/pe-warn-27.bz2): Complete
2017-11-08T13:25:44.941277+02:00 nm-migr-srv01 pengine[2449]: notice: Watchdog will be used via SBD if fencing is required
2017-11-08T13:25:44.941617+02:00 nm-migr-srv01 pengine[2449]: notice: On loss of CCM Quorum: Ignore
2017-11-08T13:25:44.941873+02:00 nm-migr-srv01 pengine[2449]: warning: Processing failed op start for stonith_sbd on nm-migr-srv01: unknown error (1)
2017-11-08T13:25:44.942122+02:00 nm-migr-srv01 pengine[2449]: warning: Processing failed op start for stonith_sbd on nm-migr-srv01: unknown error (1)
2017-11-08T13:25:44.942502+02:00 nm-migr-srv01 pengine[2449]: warning: Forcing stonith_sbd away from nm-migr-srv01 after 1000000 failures (max=1000000)
2017-11-08T13:25:44.944636+02:00 nm-migr-srv01 pengine[2449]: notice: Stop stonith_sbd#011(nm-migr-srv01)
2017-11-08T13:25:44.944948+02:00 nm-migr-srv01 pengine[2449]: notice: Start clvm#011(nm-migr-srv01)
2017-11-08T13:25:44.945328+02:00 nm-migr-srv01 pengine[2449]: notice: Start migr-common#011(nm-migr-srv01)
2017-11-08T13:25:44.945573+02:00 nm-migr-srv01 pengine[2449]: notice: Start migrfs#011(nm-migr-srv01)
2017-11-08T13:25:44.962359+02:00 nm-migr-srv01 pengine[2449]: notice: Calculated transition 1, saving inputs in /var/lib/pacemaker/pengine/pe-input-213.bz2

Thanks for your help.