Thread: HA DRBD failover fails...

  1. #1

    HA DRBD failover fails...

    Hello,

    I am a newbie to HA and DRBD, but I have been confronted with a serious problem that I have no way of reproducing. DRBD is used here to mirror /var/data.

    According to the logs, the primary server shut down for whatever reason and notified the secondary server to take over. The secondary was correctly promoted to primary, and the application services were started. Then the resources were released again. I am not sure whether error code 239 (see logs) from cryptsetup (called from the Crypto script) provoked this release, which must have stopped the services, but 'CRIT: Giving up resources due to failure' seems to be a strong candidate. Supposedly, 239 means that the resource had already been loaded.

    What confuses me is the order of events. Can anyone shed some light on this? What's odd to me is that this error occurred only 3 seconds after the primary server had shut down completely. Probably just a coincidence.

    OS 10.3 (2.6.22)
    Heartbeat 2.0.7
    DRBD 8.0.6
    Code:
    Jun 25 11:43:13 sms1 shutdown[6718]: shutting down for system halt
    Jun 25 11:43:14 sms1 init: Switching to runlevel: 0
    ....
    Jun 25 11:43:24 sms1 ResourceManager[6862]: debug: Starting /etc/ha.d/resource.d/Filesystem /dev/mapper/cr_drbd0 /var/data ext3 acl,user_xattr stop
    Jun 25 11:43:24 sms1 Filesystem[7119]: INFO: Running stop for /dev/mapper/cr_drbd0 on /var/data
    Jun 25 11:43:24 sms1 Filesystem[7119]: INFO: Trying to unmount /var/data
    Jun 25 11:43:24 sms1 Filesystem[7119]: INFO: unmounted /var/data successfully
    Jun 25 11:43:24 sms1 Filesystem[7116]: INFO:  Success
    Jun 25 11:43:24 sms1 ResourceManager[6862]: debug: /etc/ha.d/resource.d/Filesystem /dev/mapper/cr_drbd0 /var/data ext3 acl,user_xattr stop done. RC=0
    Jun 25 11:43:24 sms1 ResourceManager[6862]: info: Running /etc/ha.d/resource.d/Crypto cr_drbd0 /dev/drbd0 /etc/key.cr_drbd0 noauto stop
    Jun 25 11:43:24 sms1 ResourceManager[6862]: debug: Starting /etc/ha.d/resource.d/Crypto cr_drbd0 /dev/drbd0 /etc/key.cr_drbd0 noauto stop
    Jun 25 11:43:24 sms1 ResourceManager[6862]: debug: /etc/ha.d/resource.d/Crypto cr_drbd0 /dev/drbd0 /etc/key.cr_drbd0 noauto stop done. RC=0
    Jun 25 11:43:24 sms1 ResourceManager[6862]: info: Running /etc/ha.d/resource.d/drbddisk drbd0 stop
    Jun 25 11:43:24 sms1 ResourceManager[6862]: debug: Starting /etc/ha.d/resource.d/drbddisk drbd0 stop
    Jun 25 11:43:24 sms1 kernel: drbd0: role( Primary -> Secondary )
    ....
    Jun 25 11:43:26 sms1 heartbeat: [3831]: info: sms1 Heartbeat shutdown complete.
    Jun 25 11:43:26 sms1 logd: [7262]: debug: Stopping ha_logd with pid 3772
    Jun 25 11:43:26 sms1 logd: [7262]: info: Waiting for pid=3772 to exit
    Jun 25 11:43:26 sms1 logd: [3772]: debug: logd_term_action: received SIGTERM
    Jun 25 11:43:26 sms1 logd: [3772]: debug: logd_term_action: waiting for 0 messages to be read by write process
    Jun 25 11:43:26 sms1 logd: [3772]: debug: logd_term_action: sending SIGTERM to write process
    Jun 25 11:43:26 sms1 logd: [3787]: info: logd_term_write_action: received SIGTERM
    Jun 25 11:43:26 sms1 logd: [3787]: debug: Writing out 0 messages then quitting
    Jun 25 11:43:26 sms1 logd: [3787]: info: ha_logd: Exiting write process
    Jun 25 11:43:27 sms1 logd: [7262]: info: Pid 3772 exited
    Jun 25 11:43:27 sms1 kernel: drbd0: peer( Primary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
    Jun 25 11:43:27 sms1 kernel: drbd0: short read receiving data: read 1680 expected 4096
    Jun 25 11:43:27 sms1 kernel: drbd0: error receiving Data, l: 4120!
    Jun 25 11:43:27 sms1 kernel: drbd0: asender terminated
    Jun 25 11:43:27 sms1 kernel: drbd0: tl_clear()
    Jun 25 11:43:27 sms1 kernel: drbd0: Connection closed
    Jun 25 11:43:27 sms1 kernel: drbd0: Writing meta data super block now.
    Jun 25 11:43:27 sms1 kernel: drbd0: conn( Disconnecting -> StandAlone )
    Jun 25 11:43:27 sms1 kernel: drbd0: receiver terminated
    Jun 25 11:43:27 sms1 kernel: drbd0: disk( UpToDate -> Diskless )
    Jun 25 11:43:27 sms1 kernel: drbd0: drbd_bm_resize called with capacity == 0
    Jun 25 11:43:27 sms1 kernel: drbd0: worker terminated
    Jun 25 11:43:27 sms1 kernel: drbd: module cleanup done.
    Jun 25 11:43:27 sms1 sshd[3600]: Received signal 15; terminating.
    ....
    Code:
    Jun 25 11:43:24 sms2 kernel: drbd0: peer( Primary -> Secondary )
    Jun 25 11:43:24 sms2 kernel: klogd 1.4.1, ---------- state change ----------
    Jun 25 11:43:24 sms2 heartbeat: [3855]: info: Received shutdown notice from 'sms1'.
    Jun 25 11:43:24 sms2 heartbeat: [3855]: info: Resources being acquired from sms1.
    Jun 25 11:43:24 sms2 heartbeat: [3855]: debug: StartNextRemoteRscReq(): child count 1
    Jun 25 11:43:24 sms2 heartbeat: [27727]: info: acquire all HA resources (standby).
    Jun 25 11:43:24 sms2 heartbeat: [27728]: info: No local resources [/usr/lib/heartbeat/ResourceManager listkeys sms2] to acquire.
    Jun 25 11:43:24 sms2 heartbeat: [3855]: debug: StartNextRemoteRscReq(): child count 1
    Jun 25 11:43:24 sms2 ResourceManager[27747]: info: Acquiring resource group: sms1 IPaddr::xx.xx.xx.xx drbddisk::drbd0 Crypto::cr_drbd0::/dev/drbd0::/etc/key.cr_drbd0::noauto Filesystem::/dev/mapper/cr_drbd0::/var/data::ext3::acl,user_xattr sms_ha
    Jun 25 11:43:24 sms2 IPaddr[27771]: INFO:  Resource is stopped
    Jun 25 11:43:24 sms2 ResourceManager[27747]: info: Running /etc/ha.d/resource.d/IPaddr xx.xx.xx.xx start
    Jun 25 11:43:24 sms2 ResourceManager[27747]: debug: Starting /etc/ha.d/resource.d/IPaddr xx.xx.xx.xx start
    ...
    Jun 25 11:43:25 sms2 IPaddr[27814]: INFO:  Success
    Jun 25 11:43:25 sms2 ResourceManager[27747]: debug: /etc/ha.d/resource.d/IPaddr xx.xx.xx.xx start done. RC=0
    Jun 25 11:43:25 sms2 ResourceManager[27747]: info: Running /etc/ha.d/resource.d/drbddisk drbd0 start
    Jun 25 11:43:25 sms2 ResourceManager[27747]: debug: Starting /etc/ha.d/resource.d/drbddisk drbd0 start
    Jun 25 11:43:25 sms2 kernel: drbd0: role( Secondary -> Primary )
    Jun 25 11:43:25 sms2 kernel: drbd0: Writing meta data super block now.
    Jun 25 11:43:25 sms2 ResourceManager[27747]: debug: /etc/ha.d/resource.d/drbddisk drbd0 start done. RC=0
    Jun 25 11:43:25 sms2 ResourceManager[27747]: info: Running /etc/ha.d/resource.d/Crypto cr_drbd0 /dev/drbd0 /etc/key.cr_drbd0 noauto start
    Jun 25 11:43:25 sms2 ResourceManager[27747]: debug: Starting /etc/ha.d/resource.d/Crypto cr_drbd0 /dev/drbd0 /etc/key.cr_drbd0 noauto start
    Jun 25 11:43:26 sms2 ResourceManager[27747]: debug: /etc/ha.d/resource.d/Crypto cr_drbd0 /dev/drbd0 /etc/key.cr_drbd0 noauto start done. RC=0
    Jun 25 11:43:26 sms2 Filesystem[28000]: INFO:  Resource is stopped
    Jun 25 11:43:26 sms2 ResourceManager[27747]: info: Running /etc/ha.d/resource.d/Filesystem /dev/mapper/cr_drbd0 /var/data ext3 acl,user_xattr start
    Jun 25 11:43:26 sms2 ResourceManager[27747]: debug: Starting /etc/ha.d/resource.d/Filesystem /dev/mapper/cr_drbd0 /var/data ext3 acl,user_xattr start
    Jun 25 11:43:26 sms2 Filesystem[28041]: INFO: Running start for /dev/mapper/cr_drbd0 on /var/data
    Jun 25 11:43:26 sms2 kernel: kjournald starting.  Commit interval 5 seconds
    Jun 25 11:43:26 sms2 kernel: EXT3-fs warning: checktime reached, running e2fsck is recommended
    Jun 25 11:43:26 sms2 kernel: EXT3 FS on dm-0, internal journal
    Jun 25 11:43:26 sms2 kernel: EXT3-fs: mounted filesystem with ordered data mode.
    Jun 25 11:43:26 sms2 Filesystem[28038]: INFO:  Success
    Jun 25 11:43:26 sms2 ResourceManager[27747]: debug: /etc/ha.d/resource.d/Filesystem /dev/mapper/cr_drbd0 /var/data ext3 acl,user_xattr start done. RC=0
    Jun 25 11:43:26 sms2 logger: /etc/ha.d/resource.d/sms_ha status was called
    Jun 25 11:43:26 sms2 ResourceManager[27747]: info: Running /etc/ha.d/resource.d/sms_ha  start
    Jun 25 11:43:26 sms2 ResourceManager[27747]: debug: Starting /etc/ha.d/resource.d/sms_ha  start
    Jun 25 11:43:26 sms2 logger: /etc/ha.d/resource.d/sms_ha start was called
    Jun 25 11:43:26 sms2 su: (to lsvadb_3.0-2) root on none
    Jun 25 11:43:26 sms2 su: (to lsvadb_3.0-2) root on none
    Jun 25 11:43:27 sms2 kernel: drbd0: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
    Jun 25 11:43:27 sms2 kernel: drbd0: Creating new current UUID
    Jun 25 11:43:27 sms2 kernel: drbd0: Writing meta data super block now.
    Jun 25 11:43:27 sms2 kernel: drbd0: meta connection shut down by peer.
    Jun 25 11:43:27 sms2 kernel: drbd0: asender terminated
    Jun 25 11:43:27 sms2 kernel: drbd0: tl_clear()
    Jun 25 11:43:27 sms2 kernel: drbd0: Connection closed
    Jun 25 11:43:27 sms2 kernel: drbd0: conn( TearDown -> Unconnected )
    Jun 25 11:43:27 sms2 kernel: drbd0: receiver terminated
    Jun 25 11:43:27 sms2 kernel: drbd0: receiver (re)started
    Jun 25 11:43:27 sms2 kernel: drbd0: conn( Unconnected -> WFConnection )
    Jun 25 11:43:27 sms2 kernel: JBD: barrier-based sync failed on dm-0 - disabling barriers
    Jun 25 11:43:29 sms2 su: (to lsva) root on none
    ...
    Jun 25 11:43:29 sms2 etnetclient[28225]: Login successfully done
    Jun 25 11:43:29 sms2 etnetclient[28225]:  DBConn::Connect ...
    Jun 25 11:43:30 sms2 ResourceManager[27747]: debug: /etc/ha.d/resource.d/sms_ha  start done. RC=0
    Jun 25 11:43:30 sms2 heartbeat: [27727]: info: all HA resource acquisition completed (standby).
    Jun 25 11:43:30 sms2 heartbeat: [3855]: info: Standby resource acquisition done [all].
    Jun 25 11:43:30 sms2 heartbeat: [28288]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
    Jun 25 11:43:30 sms2 harc[28288]: info: Running /etc/ha.d/rc.d/status status
    Jun 25 11:43:30 sms2 mach_down[28307]: info: Taking over resource group IPaddr::xx.xx.xx.xx
    Jun 25 11:43:30 sms2 ResourceManager[28330]: info: Acquiring resource group: sms1 IPaddr::xx.xx.xx.xx drbddisk::drbd0 Crypto::cr_drbd0::/dev/drbd0::/etc/key.cr_drbd0::noauto Filesystem::/dev/mapper/cr_drbd0::/var/data::ext3::acl,user_xattr sms_ha
    Jun 25 11:43:30 sms2 IPaddr[28354]: INFO:  Running OK
    Jun 25 11:43:30 sms2 ResourceManager[28330]: info: Running /etc/ha.d/resource.d/Crypto cr_drbd0 /dev/drbd0 /etc/key.cr_drbd0 noauto start
    Jun 25 11:43:30 sms2 ResourceManager[28330]: debug: Starting /etc/ha.d/resource.d/Crypto cr_drbd0 /dev/drbd0 /etc/key.cr_drbd0 noauto start
    Jun 25 11:43:30 sms2 ResourceManager[28330]: debug: /etc/ha.d/resource.d/Crypto cr_drbd0 /dev/drbd0 /etc/key.cr_drbd0 noauto start done. RC=239
    Jun 25 11:43:30 sms2 ResourceManager[28330]: ERROR: Return code 239 from /etc/ha.d/resource.d/Crypto
    Jun 25 11:43:30 sms2 ResourceManager[28330]: CRIT: Giving up resources due to failure of Crypto::cr_drbd0::/dev/drbd0::/etc/key.cr_drbd0::noauto
    Jun 25 11:43:30 sms2 ResourceManager[28330]: info: Releasing resource group: sms1 IPaddr::xx.xx.xx.xx drbddisk::drbd0 Crypto::cr_drbd0::/dev/drbd0::/etc/key.cr_drbd0::noauto Filesystem::/dev/mapper/cr_drbd0::/var/data::ext3::acl,user_xattr sms_ha
    Jun 25 11:43:30 sms2 ResourceManager[28330]: info: Running /etc/ha.d/resource.d/sms_ha  stop
    Jun 25 11:43:30 sms2 ResourceManager[28330]: debug: Starting /etc/ha.d/resource.d/sms_ha  stop
    Jun 25 11:43:31 sms2 logger: /etc/ha.d/resource.d/sms_ha stop was called
    ....
    Jun 25 11:43:33 sms2 su: (to lsvadb_3.0-2) root on none
    Jun 25 11:43:34 sms2 etnetclient[28225]: LOG WARNING: TEXT NOT INSERTED:
    Jun 25 11:43:34 sms2 etnetclient[28225]: LOG TEXT : Security Module Server Shut Down.
    Jun 25 11:43:34 sms2 etnetclient[28225]: land.cpp 7:  security::Exception: Send Notification Mail: Fehler in Datenbank : Interner Fehler in Datenbank.(SqlState=YE000)
    Jun 25 11:43:34 sms2 etnetclient[28225]: land.cpp 7:  SM-Server ist beendet.
    Jun 25 11:43:39 sms2 ResourceManager[28330]: debug: /etc/ha.d/resource.d/sms_ha  stop done. RC=0
    Jun 25 11:43:39 sms2 ResourceManager[28330]: info: Running /etc/ha.d/resource.d/Filesystem /dev/mapper/cr_drbd0 /var/data ext3 acl,user_xattr stop
    Jun 25 11:43:39 sms2 ResourceManager[28330]: debug: Starting /etc/ha.d/resource.d/Filesystem /dev/mapper/cr_drbd0 /var/data ext3 acl,user_xattr stop
    Jun 25 11:43:39 sms2 Filesystem[28606]: INFO: Running stop for /dev/mapper/cr_drbd0 on /var/data
    Jun 25 11:43:39 sms2 Filesystem[28606]: INFO: Trying to unmount /var/data
    Jun 25 11:43:39 sms2 Filesystem[28606]: INFO: unmounted /var/data successfully
    Jun 25 11:43:39 sms2 Filesystem[28603]: INFO:  Success
    Jun 25 11:43:39 sms2 ResourceManager[28330]: debug: /etc/ha.d/resource.d/Filesystem /dev/mapper/cr_drbd0 /var/data ext3 acl,user_xattr stop done. RC=0
    Jun 25 11:43:39 sms2 ResourceManager[28330]: info: Running /etc/ha.d/resource.d/Crypto cr_drbd0 /dev/drbd0 /etc/key.cr_drbd0 noauto stop
    Jun 25 11:43:39 sms2 ResourceManager[28330]: debug: Starting /etc/ha.d/resource.d/Crypto cr_drbd0 /dev/drbd0 /etc/key.cr_drbd0 noauto stop
    Jun 25 11:43:39 sms2 ResourceManager[28330]: debug: /etc/ha.d/resource.d/Crypto cr_drbd0 /dev/drbd0 /etc/key.cr_drbd0 noauto stop done. RC=0
    Jun 25 11:43:39 sms2 ResourceManager[28330]: info: Running /etc/ha.d/resource.d/drbddisk drbd0 stop
    Jun 25 11:43:39 sms2 ResourceManager[28330]: debug: Starting /etc/ha.d/resource.d/drbddisk drbd0 stop
    Jun 25 11:43:39 sms2 kernel: drbd0: role( Primary -> Secondary )
    Jun 25 11:43:39 sms2 kernel: drbd0: Writing meta data super block now.
    Jun 25 11:43:39 sms2 ResourceManager[28330]: debug: /etc/ha.d/resource.d/drbddisk drbd0 stop done. RC=0
    Jun 25 11:43:39 sms2 ResourceManager[28330]: info: Running /etc/ha.d/resource.d/IPaddr xx.xx.xx.xx stop
    Jun 25 11:43:39 sms2 ResourceManager[28330]: debug: Starting /etc/ha.d/resource.d/IPaddr xx.xx.xx.xx stop
    Jun 25 11:43:39 sms2 avahi-daemon[3555]: Withdrawing address record for xx.xx.xx.xx on eth0.
    Jun 25 11:43:39 sms2 IPaddr[28720]: INFO:  Success
    Jun 25 11:43:39 sms2 ResourceManager[28330]: debug: /etc/ha.d/resource.d/IPaddr xx.xx.xx.xx stop done. RC=0
    Jun 25 11:43:39 sms2 mach_down[28307]: info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
    Jun 25 11:43:39 sms2 mach_down[28307]: info: mach_down takeover complete for node sms1.
    Jun 25 11:43:39 sms2 heartbeat: [3855]: info: mach_down takeover complete.
    Jun 25 11:43:56 sms2 heartbeat: [3855]: WARN: node sms1: is dead
    Jun 25 11:43:56 sms2 heartbeat: [3855]: info: Dead node sms1 gave up resources.
    Jun 25 11:43:56 sms2 heartbeat: [3855]: info: Link sms1:eth0 dead.
    Jun 25 11:44:01 sms2 /usr/sbin/cron[28765]: (lsva) CMD (${HOME}/scripts/lsvascpcron)
    Jun 25 11:44:09 sms2 hb_standby[28777]: Going standby [foreign].
    Jun 25 11:44:09 sms2 heartbeat: [3855]: info: sms2 wants to go standby [foreign]
    Jun 25 11:44:19 sms2 heartbeat: [3855]: WARN: No reply to standby request.  Standby request cancelled.

  2. #2
    Join Date
    Mar 2010
    Location
    Austin - Texas
    Posts
    10,140
    Blog Entries
    48

    Re: HA DRBD failover fails...

    This is pretty technical stuff, and beyond the fact that you had a problem, what help are you requesting? Consider that openSUSE 12.1 is current, 12.2 is next, and we still support 11.4, but SUSE version 10.3 has long been left in the unsupported dust pile of past OSes. I also find that DRBD, if that is what you are referring to, has been part of the mainline kernel since version 2.6.33; openSUSE 12.2 will contain at least kernel version 3.4, while openSUSE 12.1 sports kernel 3.1. We also have the ability to load any newer kernel should you need it, though again, SUSE 10.3 is no longer supported. So I ask explicitly: what help are you requesting, and can you consider an upgrade to a newer version of openSUSE? For DRBD I found the following two links:

    DRBD:What is DRBD & DRBD-RBD is a part of Linux

    You can find the latest openSUSE versions here: software.opensuse.org: Download openSUSE 12.1

    Anyone can use the following script to update their kernel version if they like: S.A.K.C. - SUSE Automated Kernel Compiler - Version 2.75 - Blogs - openSUSE Forums

    Thank You,
    My Blog: https://forums.opensuse.org/blogs/jdmcdaniel3/

    Software efficiency halves every 18 months, thus compensating for Moore's Law

    It's James again from Austin, Texas

  3. #3
    Join Date
    Jun 2008
    Location
    Netherlands
    Posts
    24,864

    Re: HA DRBD failover fails...

    Apart from the valuable information provided by jdmcdaniel3, I wonder how you got that logging posted within the CODE tags, but without any line feeds. Did you really copy/paste that from a terminal window into the post? I guess you agree that this is even less readable than it would be without CODE tags, instead of the tags improving readability.
    Henk van Velden
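
    For anyone stuck with a log that has already lost its line feeds, a minimal Python sketch along these lines can put one entry back on each line. It assumes classic syslog timestamps of the form "Jun 25 11:43:13" and that no message text itself contains such a timestamp; the name split_syslog is made up for illustration.

```python
import re

# Zero-width lookahead for a classic syslog timestamp such as "Jun 25 11:43:13 ".
# Single-digit days are space padded by syslog ("Jun  5"), hence "[ \d]\d".
STAMP = re.compile(
    r"(?=(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) [ \d]\d \d{2}:\d{2}:\d{2} )"
)


def split_syslog(blob):
    """Split run-together syslog text into one entry per timestamp."""
    # Splitting on an empty (lookahead) match requires Python 3.7 or newer.
    return [part.strip() for part in STAMP.split(blob) if part.strip()]


if __name__ == "__main__":
    blob = (
        "Jun 25 11:43:13 sms1 shutdown[6718]: shutting down for system halt"
        "Jun 25 11:43:14 sms1 init: Switching to runlevel: 0"
    )
    for entry in split_syslog(blob):
        print(entry)
```

    Elision markers such as "...." simply stay attached to the end of the preceding entry, so nothing is dropped.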

  4. #4
    dd@home.dk NNTP User

    Re: HA DRBD failover fails...

    On 07/05/2012 06:46 PM, richardavilez wrote:
    > .OS 10.3 (2.6.22)


    is that SUSE Enterprise Linux v10 SP3? if it is, then it is still
    supported (i think) over at http://forums.suse.com/

    otherwise...well, i am probably the last one of those who hang out here
    to leave 10.3 hmmmmm about Feb 2011, so i'd be no help (i can't remember
    breakfast)..

    anyway, your setup is FAR more involved than was my little stand alone
    personal PC..

    i think your best bet is to *hope* you are running some kind of SUSE
    Enterprise...it is easy to find out for sure with:

    cat /etc/SuSE-release
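
    If the output of that command needs to be checked on more than one box, a rough Python sketch like the following could tell the two product families apart. It assumes the file consists of a product line followed by KEY = VALUE lines; the two sample strings are illustrative only and are not taken from the poster's machine, and the function names are made up.

```python
def parse_suse_release(text):
    """Parse the text of /etc/SuSE-release into a small dict."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    info = {"product": lines[0] if lines else ""}
    for ln in lines[1:]:
        if "=" in ln:
            key, _, value = ln.partition("=")
            info[key.strip().lower()] = value.strip()
    return info


def looks_like_enterprise(info):
    """Heuristic: enterprise releases carry 'Enterprise' in the product line."""
    return "enterprise" in info["product"].lower()


# Illustrative samples (assumed layout, not real output from this thread).
OPENSUSE_SAMPLE = "openSUSE 10.3 (i586)\nVERSION = 10.3\n"
SLES_SAMPLE = "SUSE Linux Enterprise Server 10 (x86_64)\nVERSION = 10\nPATCHLEVEL = 3\n"

if __name__ == "__main__":
    for sample in (OPENSUSE_SAMPLE, SLES_SAMPLE):
        info = parse_suse_release(sample)
        print(info["product"], "->", "enterprise" if looks_like_enterprise(info) else "not enterprise")
```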

    good luck!

    --
    dd


  5. #5

    Re: HA DRBD failover fails...

    Thanks, and sorry for the nasty code post. I will not post it again, because there seem to be few qualified responses apart from the very good advice to switch to some other OS.
    Unfortunately, it's not up to me.
    Should there however be an expert with an interest, I will re-send the relevant parts of the logs, hopefully in a readable format.

    tata

  6. #6
    Join Date
    Jun 2008
    Location
    Netherlands
    Posts
    24,864

    Re: HA DRBD failover fails...

    It is not that surprising that there are no further responses. I guess that several people here are still waiting for some answers from your side before making up their minds. For example, we are still not sure that you have openSUSE 10.3 (you say OS 10.3, which can mean a lot of things, and is almost certainly not the output of a command like cat /etc/SuSE-release). And while we are not really angry at you because we cannot read the logs, we still cannot read them.

    But you may be correct in your assumption that not many people here use an HA setup, and thus knowledge and experience may be minimal.

    Wishing you success in your search for a solution.
    Henk van Velden

  7. #7

    Re: HA DRBD failover fails...

    Originally posted by hcvv:
    > It is not that surprising that there are no further responses. I guess that several people here are still waiting for some answers from your side before making up their minds. For example, we are still not sure that you have openSUSE 10.3 (you say OS 10.3, which can mean a lot of things, and is almost certainly not the output of a command like cat /etc/SuSE-release). And while we are not really angry at you because we cannot read the logs, we still cannot read them.
    >
    > But you may be correct in your assumption that not many people here use an HA setup, and thus knowledge and experience may be minimal.
    >
    > Wishing you success in your search for a solution.

    Yep, you're right. It's openSUSE 10.3, kernel 2.6.22-19. I have read that cryptsetup returns error 239 when a device has already been mounted, but I don't see how this applies here unless it was never unmounted in the first place. Apparently, fixes to cryptsetup have not resolved the problem: https://bugzilla.redhat.com/show_bug.cgi?id=574933
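
    One possible reading of the number itself, not confirmed by anything in this thread: an exit status is a single byte, so a program that returns a negative errno value ends up with 256 minus that errno. The Python sketch below only does this arithmetic; whether the Crypto script or cryptsetup actually returned a negative errno here is an assumption, and decode_exit_status is a made-up helper name.

```python
import errno
import os


def decode_exit_status(status):
    """Treat an exit status as a wrapped negative errno and look up its name.

    Returns (errno_value, symbolic_name); the name is None when the value
    does not correspond to a known errno on this platform.
    """
    code = 256 - status  # e.g. 239 -> 17
    return code, errno.errorcode.get(code)


if __name__ == "__main__":
    code, name = decode_exit_status(239)
    # os.strerror gives the platform's human-readable text for the errno.
    print(code, name, os.strerror(code))
```

    Under that assumption, 239 maps to errno 17 (EEXIST), which would fit the "already loaded" explanation mentioned above.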

    Code:
    Jun 25 11:43:30 sms2 ResourceManager[27747]: debug: /etc/ha.d/resource.d/sms_ha  start done. RC=0
    Jun 25 11:43:30 sms2 heartbeat: [27727]: info: all HA resource acquisition completed (standby).
    Jun 25 11:43:30 sms2 heartbeat: [3855]: info: Standby resource acquisition done [all].
    Jun 25 11:43:30 sms2 heartbeat: [28288]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
    Jun 25 11:43:30 sms2 harc[28288]: info: Running /etc/ha.d/rc.d/status status
    Jun 25 11:43:30 sms2 mach_down[28307]: info: Taking over resource group IPaddr::192.184.76.59
    Jun 25 11:43:30 sms2 ResourceManager[28330]: info: Acquiring resource group: sms1 IPaddr::192.184.76.59 drbddisk::drbd0 Crypto::cr_drbd0::/dev/drbd0::/etc/key.cr_drbd0::noauto Filesystem::/dev/mapper/cr_drbd0::/var/data::ext3::acl,user_xattr sms_ha
    Jun 25 11:43:30 sms2 etnetclient[28225]: ALARMIERUNG: Ausfuehrung echo "SM Server gestartet." | mailx -s "SM SERVER NOTIFICATION" christoph.reichenbach@dtc-ag.ch erfolgreich 
    Jun 25 11:43:30 sms2 IPaddr[28354]: INFO:  Running OK
    Jun 25 11:43:30 sms2 ResourceManager[28330]: info: Running /etc/ha.d/resource.d/Crypto cr_drbd0 /dev/drbd0 /etc/key.cr_drbd0 noauto start
    Jun 25 11:43:30 sms2 ResourceManager[28330]: debug: Starting /etc/ha.d/resource.d/Crypto cr_drbd0 /dev/drbd0 /etc/key.cr_drbd0 noauto start
    Jun 25 11:43:30 sms2 ResourceManager[28330]: debug: /etc/ha.d/resource.d/Crypto cr_drbd0 /dev/drbd0 /etc/key.cr_drbd0 noauto start done. RC=239
    Jun 25 11:43:30 sms2 ResourceManager[28330]: ERROR: Return code 239 from /etc/ha.d/resource.d/Crypto
    Jun 25 11:43:30 sms2 ResourceManager[28330]: CRIT: Giving up resources due to failure of Crypto::cr_drbd0::/dev/drbd0::/etc/key.cr_drbd0::noauto
    Jun 25 11:43:30 sms2 ResourceManager[28330]: info: Releasing resource group: sms1 IPaddr::192.184.76.59 drbddisk::drbd0 Crypto::cr_drbd0::/dev/drbd0::/etc/key.cr_drbd0::noauto Filesystem::/dev/mapper/cr_drbd0::/var/data::ext3::acl,user_xattr sms_ha
    Jun 25 11:43:30 sms2 ResourceManager[28330]: info: Running /etc/ha.d/resource.d/sms_ha  stop
    Jun 25 11:43:30 sms2 ResourceManager[28330]: debug: Starting /etc/ha.d/resource.d/sms_ha  stop
    Jun 25 11:43:31 sms2 logger: /etc/ha.d/resource.d/sms_ha stop was called
    Jun 25 11:43:31 sms2 etnetclient[28225]: ALARMIERUNG: Ausfuehrung echo "SM Server gestartet." | mailx -s "SM SERVER NOTIFICATION" x@x erfolgreich 
    Jun 25 11:43:33 sms2 su: (to lsvadb_3.0-2) root on none
    Jun 25 11:43:34 sms2 etnetclient[28225]: LOG WARNING: TEXT NOT INSERTED: 
    Jun 25 11:43:34 sms2 etnetclient[28225]: LOG TEXT : Security Module Server Shut Down.
    Jun 25 11:43:34 sms2 etnetclient[28225]: land.cpp 7:  security::Exception: Send Notification Mail: Fehler in Datenbank : Interner Fehler in Datenbank.(SqlState=YE000)
    Jun 25 11:43:34 sms2 etnetclient[28225]: land.cpp 7:  SM-Server ist beendet.
    Jun 25 11:43:39 sms2 ResourceManager[28330]: debug: /etc/ha.d/resource.d/sms_ha  stop done. RC=0
    Jun 25 11:43:39 sms2 ResourceManager[28330]: info: Running /etc/ha.d/resource.d/Filesystem /dev/mapper/cr_drbd0 /var/data ext3 acl,user_xattr stop
    Jun 25 11:43:39 sms2 ResourceManager[28330]: debug: Starting /etc/ha.d/resource.d/Filesystem /dev/mapper/cr_drbd0 /var/data ext3 acl,user_xattr stop
    Jun 25 11:43:39 sms2 Filesystem[28606]: INFO: Running stop for /dev/mapper/cr_drbd0 on /var/data
    Jun 25 11:43:39 sms2 Filesystem[28606]: INFO: Trying to unmount /var/data
    Jun 25 11:43:39 sms2 Filesystem[28606]: INFO: unmounted /var/data successfully
    Jun 25 11:43:39 sms2 Filesystem[28603]: INFO:  Success
    Jun 25 11:43:39 sms2 ResourceManager[28330]: debug: /etc/ha.d/resource.d/Filesystem /dev/mapper/cr_drbd0 /var/data ext3 acl,user_xattr stop done. RC=0
    Jun 25 11:43:39 sms2 ResourceManager[28330]: info: Running /etc/ha.d/resource.d/Crypto cr_drbd0 /dev/drbd0 /etc/key.cr_drbd0 noauto stop
    Jun 25 11:43:39 sms2 ResourceManager[28330]: debug: Starting /etc/ha.d/resource.d/Crypto cr_drbd0 /dev/drbd0 /etc/key.cr_drbd0 noauto stop
    Jun 25 11:43:39 sms2 ResourceManager[28330]: debug: /etc/ha.d/resource.d/Crypto cr_drbd0 /dev/drbd0 /etc/key.cr_drbd0 noauto stop done. RC=0
    Jun 25 11:43:39 sms2 ResourceManager[28330]: info: Running /etc/ha.d/resource.d/drbddisk drbd0 stop
    Jun 25 11:43:39 sms2 ResourceManager[28330]: debug: Starting /etc/ha.d/resource.d/drbddisk drbd0 stop
    Jun 25 11:43:39 sms2 kernel: drbd0: role( Primary -> Secondary ) 
    Jun 25 11:43:39 sms2 kernel: drbd0: Writing meta data super block now.
    Jun 25 11:43:39 sms2 ResourceManager[28330]: debug: /etc/ha.d/resource.d/drbddisk drbd0 stop done. RC=0
    Jun 25 11:43:39 sms2 ResourceManager[28330]: info: Running /etc/ha.d/resource.d/IPaddr 192.184.76.59 stop
    Jun 25 11:43:39 sms2 ResourceManager[28330]: debug: Starting /etc/ha.d/resource.d/IPaddr 192.184.76.59 stop
    Jun 25 11:43:39 sms2 avahi-daemon[3555]: Withdrawing address record for 192.184.76.59 on eth0.
    Jun 25 11:43:39 sms2 IPaddr[28720]: INFO:  Success
    Jun 25 11:43:39 sms2 ResourceManager[28330]: debug: /etc/ha.d/resource.d/IPaddr 192.184.76.59 stop done. RC=0
    Jun 25 11:43:39 sms2 mach_down[28307]: info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
    Jun 25 11:43:39 sms2 mach_down[28307]: info: mach_down takeover complete for node sms1.
    Jun 25 11:43:39 sms2 heartbeat: [3855]: info: mach_down takeover complete.
    Jun 25 11:43:56 sms2 heartbeat: [3855]: WARN: node sms1: is dead
    Jun 25 11:43:56 sms2 heartbeat: [3855]: info: Dead node sms1 gave up resources.
    Jun 25 11:43:56 sms2 heartbeat: [3855]: info: Link sms1:eth0 dead.
    Jun 25 11:44:01 sms2 /usr/sbin/cron[28765]: (lsva) CMD (${HOME}/scripts/lsvascpcron)
    Jun 25 11:44:09 sms2 hb_standby[28777]: Going standby [foreign].
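
    One way to make the order of events easier to follow is to keep only the "done. RC=" lines and check whether a resource script was started a second time without a stop in between. A minimal Python sketch, assuming the heartbeat ResourceManager debug format seen in the logs of sms2; the function names are made up for illustration, and the two sample lines are copied from those logs.

```python
import re

# Matches ResourceManager completion lines such as
# "... ResourceManager[27747]: debug: /etc/ha.d/resource.d/Crypto <args> start done. RC=0"
DONE = re.compile(
    r"ResourceManager\[(?P<pid>\d+)\]: debug: /etc/ha\.d/resource\.d/(?P<script>\S+)"
    r" (?P<args>.*?)\s*(?P<action>start|stop) done\. RC=(?P<rc>\d+)"
)


def resource_events(lines):
    """Return (pid, script, action, rc) for every completed start or stop."""
    events = []
    for line in lines:
        m = DONE.search(line)
        if m:
            events.append((m["pid"], m["script"], m["action"], int(m["rc"])))
    return events


def repeated_starts(events):
    """Find scripts started again while an earlier successful start is still active."""
    active = {}   # script -> pid of the run that started it successfully
    repeats = []  # (script, first_pid, second_pid, rc_of_second_start)
    for pid, script, action, rc in events:
        if action == "stop":
            active.pop(script, None)
        elif script in active:
            repeats.append((script, active[script], pid, rc))
        elif rc == 0:
            active[script] = pid
    return repeats


SAMPLE = [
    "Jun 25 11:43:26 sms2 ResourceManager[27747]: debug: /etc/ha.d/resource.d/Crypto cr_drbd0 /dev/drbd0 /etc/key.cr_drbd0 noauto start done. RC=0",
    "Jun 25 11:43:30 sms2 ResourceManager[28330]: debug: /etc/ha.d/resource.d/Crypto cr_drbd0 /dev/drbd0 /etc/key.cr_drbd0 noauto start done. RC=239",
]

if __name__ == "__main__":
    for script, first, second, rc in repeated_starts(resource_events(SAMPLE)):
        print(f"{script}: started by [{first}], started again by [{second}] with RC={rc}")
```

    On those two lines, Crypto start succeeds under ResourceManager[27747] and is attempted again under ResourceManager[28330], where it returns 239. Whether that second start is the root cause is not established anywhere in this thread.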

  8. #8
    dd@home.dk NNTP User

    Re: HA DRBD failover fails...

    On 07/06/2012 01:36 PM, richardavilez wrote:
    > Should there however be an expert with an interest,


    suggest you join the [opensuse@opensuse.org] mail list and try
    there...generally there are many more technically capable folks
    there...this forum is users helping users...often coming here are the
    very new to Linux crowd, who don't need to be on the mail lists..

    personally, i won't try to help extend the life of an unsupported and
    insecure system...but someone there _may_, begin here:
    http://en.opensuse.org/openSUSE:Communication_channels

    good luck...and try to get the person making the which-OS-to-use
    decisions to pay attention...as 10.3 has (widely known) security
    vulnerabilities..

    --
    dd

  9. #9
    Join Date
    Feb 2009
    Location
    Spain
    Posts
    25,547

    Re: HA DRBD failover fails...

    On 2012-07-06 16:56, dd@home.dk wrote:
    > On 07/06/2012 01:36 PM, richardavilez wrote:
    >> Should there however be an expert with an interest,

    >
    > suggest you join the [opensuse@opensuse.org] mail list and try
    > there...generally there are many more technically capable folks
    > there...this forum is users helping users...often coming here are the very
    > new to Linux crowd, who don't need to be on the mail lists..


    The mail list is also mostly users helping users :-)

    But it is a different crowd, and chances are there is someone who
    understands HA. On the other hand, 10.3 is "old" and no longer maintained.
    If there was a bug, it would have been corrected in current versions.

    --
    Cheers / Saludos,

    Carlos E. R.
    (from 11.4 x86_64 "Celadon" at Telcontar)


