I cannot get ocfs2 to function on openSUSE 11.0 with the user-mode heartbeat (as opposed to ocfs2’s own file-based heartbeat).
To reproduce: on openSUSE 11.0 x86_64 with updates, configure an ocfs2 Heartbeat V2 Filesystem resource backed by an iSCSI volume, then try to mount it.
I can create, manually mount and unmount, and use ocfs2 in the file-based heartbeat mode without error. However, this is incompatible with the Linux-HA Heartbeat package’s use of ocfs2, as noted in the various guides available on the internet, from Novell and others. Empirical testing shows that Heartbeat will indeed not mount ocfs2 filesystems when rco2cb configure has user-mode heartbeat=Y.
The symptoms of the failure are striking and make debugging very difficult. As soon as Heartbeat attempts to mount an ocfs2 filesystem, the node becomes inaccessible via the console or the network. Even Alt-SysRq-b is ignored! Only a power cycle or hardware reset will reboot the node and return it to an operating condition.
From the few clues I am left with (/var/log/messages ends before any errors that would explain the failure), and having disabled Heartbeat’s fencing as much as possible, it appears that the ocfs2 filesystem’s built-in fencing is freezing the system, lest the filesystem be corrupted. Unfortunately, I have no idea how to debug this with what I have, or even how to confirm that this is the case. Other possibilities are welcome, if anyone can supply them.
Exacerbating this is the fact that I have no “user-mode heartbeat” generator other than Linux-HA Heartbeat’s to test with, and Heartbeat demands that the ocfs2 Filesystem resource be a clone, so I have to reboot multiple systems after each failure, not just one.
Please, can anyone shed light on this? Or at least provide suggestions as to how I can:
- test ocfs2 user-mode heartbeat with a program other than Linux-HA Heartbeat, if such a program exists,
- cancel, limit, or disable any ocfs2 fencing, or at least work around it for long enough to get some useful error messages logged. Yes, I am aware this could cause file corruption on the ocfs2 filesystem. However, there will never be anything useful on the ocfs2 filesystem unless I can get this fixed.
- use the Linux-HA Heartbeat V2 Filesystem resources to mount and monitor ocfs2 in file-based heartbeat mode, or
- otherwise usefully approach this horrible mess.
I apologize for the complaints, but this problem seems to be designed to be undebuggable in any ordinary way.
Note: in the above I have tried to distinguish between the “heartbeat” signal used by ocfs2 and the “Heartbeat” Linux-HA software by always capitalizing the latter.