Single USB external hard disk with NTFS and FAT32 partitions crashed openSUSE 10.3 (kernel 2.6.22)

Hi

I am not a guru, but I need to figure out why the system crashed. I have not yet been able to get hold of either the USB model or the logs from the primary server, but
could someone point out any known ‘features’ or clues, and how to remedy this? I was told that the system behaved all right with one FAT32 partition, but a few days ago
its removal caused the server to go down as well.

Richard :frowning:

It will be a bit difficult to help you because 10.3 has been out of support for a long time, so almost nobody here will have a system on which to reproduce your problem. Also, the mechanism used for dynamic attachment/mounting and unmounting/detachment of storage devices has changed considerably since 10.3 (HAL is gone, to name only one change).

On 06/29/2012 12:46 PM, hcvv wrote:
>
> It will be a bit difficult to help you because 10.3 has been out of
> support for a long time, so almost nobody here will have a system on
> which to reproduce your problem. Also, the mechanism used for dynamic
> attachment/mounting and unmounting/detachment of storage devices has
> changed considerably since 10.3 (HAL is gone, to name only one change).

I hope that your 10.3 system is not attached to the network. Without support for
10.3, the security holes are also not patched.

On 2012-06-29 19:06, richardavilez wrote:
>
> Hi
>
> I am not a guru, but I need to figure out why the system crashed. I have
> not yet been able to get hold of either the USB model or the logs from
> the primary server, but could someone point out any known ‘features’ or
> clues, and how to remedy this? I was told that the system behaved all
> right with one FAT32 partition, but a few days ago its removal caused
> the server to go down as well.

There is not enough info, but I’ll venture a guess: the USB disk was
removed without unmounting it first. This might crash a machine if the open
files were important to the system.
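
A rough way to check for that situation before anyone pulls the cable is to look at the kernel's mount table. The sketch below is only an illustration: the function names are made up, the device `/dev/sdb1` and mount point `/media/disk` are taken from the log posted later in this thread, and the `/dev/sda2` line in the sample is hypothetical. On a real system the text would come from `/proc/mounts`.

```python
# Minimal sketch: decide whether a device still appears in a
# /proc/mounts-style table. Function names and SAMPLE are illustrative.

SAMPLE = """\
/dev/sda2 / ext3 rw 0 0
/dev/sdb1 /media/disk vfat rw,nosuid,nodev 0 0
"""


def mount_points(device, mounts_text):
    """Return every mount point listed for `device`."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        # Spaces inside a mount point are escaped as \040 and are not
        # decoded here.
        if len(fields) >= 2 and fields[0] == device:
            points.append(fields[1])
    return points


def safe_to_unplug(device, mounts_text):
    """True when the device has no entry in the mount table."""
    return not mount_points(device, mounts_text)


if __name__ == "__main__":
    for dev in ("/dev/sdb1", "/dev/sdc1"):
        print(dev, "safe to unplug:", safe_to_unplug(dev, SAMPLE))
```

To use it on a live machine, read `/proc/mounts` into a string and pass that instead of `SAMPLE`. An empty result only says the filesystem is not mounted; it says nothing about whether data was flushed cleanly beforehand.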


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)


Thanks for the advice. In fact, it’s a high-security environment with a dedicated network,
and only authorized and x-rayed ;) personnel can access the server via a USB connection.

If I had the option to upgrade the 32-bit system, let’s say to SLE 11 x86, would I see big improvements
in performance as well as stability?

richardavilez wrote:
> Thanks for the advice. In fact, it’s a high-security environment with a
> dedicated network, and only authorized and x-rayed ;) personnel can
> access the server via a USB connection.

It sounds like you should have a spare/test machine that’s not in the
production environment. That would be useful to diagnose the problem. If
you have the opportunity to clone the disk and remove whatever keys or
database is sensitive from the clone, that would be a good start for a
test system if you don’t have an existing one.

> If I had the option to upgrade the 32-bit system, let’s say to SLE 11
> x86, would I see big improvements in performance as well as stability?

It depends what the workload is, but you probably wouldn’t see a huge
change in performance. Stability/compatibility for new hardware (and
NTFS?) would be better, but if the old hardware has been stable with the
old software, stability won’t (can’t?) improve much either.

The main benefits of current software are:
(1) fixes of bugs, especially security-related
(2) possibility to report bugs and get them fixed

I am still waiting for the log files. Is there an easy way to preempt such a situation?
Are you saying that removing the pen drive without unmounting it first could provoke a crash on any openSUSE system?

On 2012-07-03 12:46, richardavilez wrote:

> Thanks for the advice. In fact, it’s a high-security environment with a
> dedicated network, and only authorized and x-rayed ;) personnel can
> access the server via a USB connection.

The worst and most frequent attack vector is an attack from the inside.

> If I had the option to upgrade the 32-bit system, let’s say to SLE 11
> x86, would I see big improvements in performance as well as stability?

Not necessarily.

However, you mention SLE. Is your system really openSUSE 10.3, or is it SLE
10 SP3? Because if it is the latter, it is still maintained, but this is
the wrong forum for it.


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

On 2012-07-03 13:16, richardavilez wrote:

> Are you saying that removing the pen drive without unmounting it first
> could provoke a crash on any openSUSE system?

Yes, of course. It depends on what the system is accessing on the external
media, and how it is accessed (what software).
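
To see what is actually accessing the external media at a given moment, one can walk the open file descriptors under `/proc`. The following is a sketch under assumptions: the function name and the `proc_root` parameter are invented for illustration, and `/media/disk` is simply the mount point that shows up in the log posted in this thread. It only looks at open file descriptors, not at working directories or memory-mapped files, so tools such as `lsof` or `fuser` will give a fuller picture.

```python
import os


def processes_using(mount_point, proc_root="/proc"):
    """Map PID -> list of open paths located under mount_point.

    Only /proc/<pid>/fd symlinks are inspected; processes we may not
    read (permission denied) or that exit meanwhile are skipped.
    """
    base = mount_point.rstrip("/") or "/"
    prefix = base if base.endswith("/") else base + "/"
    users = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            if target == base or target.startswith(prefix):
                users.setdefault(int(entry), []).append(target)
    return users


if __name__ == "__main__":
    for pid, paths in sorted(processes_using("/media/disk").items()):
        print(pid, paths)
```

If the result is empty and the filesystem has been unmounted, removal should not take anything else down with it. If a system service shows up in the list, that is the kind of dependency that can turn an unplugged disk into a bigger problem.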


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

It’s openSUSE 10.3. I just ported some software to SLE 11 x86_64 for a gateway server, and I liked the new OS.

Carlos E. R. wrote:
> On 2012-07-03 13:16, richardavilez wrote:
>
>> Are you saying that removing the pen drive without unmounting it first
>> could provoke a crash on any openSUSE system?
>
> Yes, of course. It depends on what the system is accessing on the external
> media, and how it is accessed (what software).

That comes across a little negative. One could equally well write:

No, of course. Unless you happen to be running a system mounted from the
device, which is a bizarre thing to do.

Actually, I have been waiting for the test servers in order to reproduce the problem.
They will most likely arrive near the end of this month, as they are coming from Zürich (Switzerland).
I will take your suggestions into account. Thanks.

On 2012-07-03 14:29, Dave Howorth wrote:
> Carlos E. R. wrote:
>> On 2012-07-03 13:16, richardavilez wrote:
>>
>>> Are you saying that removing the pen drive without unmounting it first
>>> could provoke a crash on any openSUSE system?
>>
>> Yes, of course. It depends on what the system is accessing on the external
>> media, and how it is accessed (what software).
>
> That comes across a little negative. One could equally well write:
>
> No, of course. Unless you happen to be running a system mounted from the
> device, which is a bizarre thing to do.

I have seen a system crash on removal of external media. As I said, it
depends on what is on it and what needs it.

For example, it can crash because the log floods with errors.


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

It’s really openSUSE. I just ported some software from 32-bit HP-UX to SLE 11 x86_64, and I liked the OS, especially how it handles USB media.

Here comes the first part (not complete). The user successfully mounted a 15 GB pen drive. Then, after some operations in an admin GUI app,
the system went down. But was it the USB pen drive or the GUI application that forced the shutdown 71 seconds later?
Did the user accidentally touch the ‘Shutdown’ pushbutton?
How does one identify the problem from this?


Jun 25 11:39:55 sms1 lsva_gui[4368]: Neuen Benutzer erstellt: tt-gl-konfig ,
Jun 25 11:39:55 sms1 etnetclient[8429]: xmlrpcBackend::execute Ausfuehrung OK
Jun 25 11:40:01 sms1 syslog-ng[2281]: last message repeated 6 times
Jun 25 11:40:01 sms1 /usr/sbin/cron[5230]: (lsva) CMD (${HOME}/scripts/lsvascpcron)
Jun 25 11:40:01 sms1 etnetclient[8429]: xmlrpcBackend::execute Ausfuehrung OK
Jun 25 11:41:01 sms1 /usr/sbin/cron[5759]: (lsva) CMD (${HOME}/scripts/lsvascpcron)
Jun 25 11:41:06 sms1 etnetclient[8429]: xmlrpcBackend::execute Ausfuehrung OK
Jun 25 11:41:56 sms1 lsva_gui[4368]: -> GUI STATE changed to: sickly
Jun 25 11:41:59 sms1 lsva_gui[4368]: clusterorder-thread stopped
Jun 25 11:42:01 sms1 lsva_gui[4368]: dbmeter-thread stopped
Jun 25 11:42:01 sms1 lsva_gui[4368]: ampel-thread stopped
Jun 25 11:42:01 sms1 lsva_gui[4368]: Lockfile erfolgreich geloescht
Jun 25 11:42:01 sms1 /usr/sbin/cron[6252]: (lsva) CMD (${HOME}/scripts/lsvascpcron)
Jun 25 11:42:10 sms1 hald: unmounted /dev/sdb1 from ‘/media/disk’ on behalf of uid 1002
Jun 25 11:42:11 sms1 etnetclient[8429]: xmlrpcBackend::execute Ausfuehrung OK
Jun 25 11:42:29 sms1 hald: mounted /dev/sdb1 on behalf of uid 1003
Jun 25 11:43:01 sms1 /usr/sbin/cron[6559]: (lsva) CMD (${HOME}/scripts/lsvascpcron)
Jun 25 11:43:09 sms1 kernel: usb 2-3: USB disconnect, address 11
Jun 25 11:43:09 sms1 hald[2315]: forcibly attempting to lazy unmount /dev/sdb1 as enclosing drive was disconnected
Jun 25 11:43:09 sms1 hald: unmounted /dev/sdb1 from ‘/media/disk’ on behalf of uid 0
Jun 25 11:43:13 sms1 shutdown[6718]: shutting down for system halt
Jun 25 11:43:14 sms1 init: Switching to runlevel: 0
Jun 25 11:43:15 sms1 heartbeat: [3831]: info: Heartbeat shutdown in progress. (3831)
Jun 25 11:43:15 sms1 heartbeat: [6816]: info: Giving up all HA resources.
Jun 25 11:43:15 sms1 audispd[3526]: input read: EOF
Jun 25 11:43:15 sms1 auditd[3524]: The audit daemon is exiting.
Jun 25 11:43:15 sms1 kernel: audit(1340617395.495:6): audit_pid=0 old=3524 by auid=4294967295
Jun 25 11:43:15 sms1 kernel: tg3: eth0: Link is down.
Jun 25 11:43:15 sms1 ResourceManager[6862]: info: Releasing resource group: sms1 IPaddr::xxx.xxx.xxx.xxx drbddisk::drbd0 Crypto::cr_drbd0::/dev/drbd0::/etc/key.cr_drbd0::noauto Filesystem::/dev/mapper/cr_drbd0::/var/data::ext3::acl,user_xattr sms_ha
Jun 25 11:43:15 sms1 ResourceManager[6862]: info: Running /etc/ha.d/resource.d/sms_ha stop
Jun 25 11:43:15 sms1 ResourceManager[6862]: debug: Starting /etc/ha.d/resource.d/sms_ha stop
Jun 25 11:43:15 sms1 logger: /etc/ha.d/resource.d/sms_ha stop was called
Jun 25 11:43:15 sms1 kernel: tg3: eth0: Link is up at 100 Mbps, full duplex.
Jun 25 11:43:15 sms1 kernel: tg3: eth0: Flow control is off for TX and off for RX.
Jun 25 11:43:15 sms1 avahi-dnsconfd[3740]: Got SIGTERM, quitting.
Jun 25 11:43:16 sms1 avahi-daemon[3537]: Got SIGTERM, quitting.

On 2012-07-03 16:26, richardavilez wrote:

> Here comes the first part (not complete).

Please use code tags to post these things; see “Posting in Code Tags - A Guide”.

> The user successfully mounted a 15 GB pen drive. Then, after some
> operations in an admin GUI app, the system went down. But was it the
> USB pen drive or the GUI application that forced the shutdown 71
> seconds later?
> Did the user accidentally touch the ‘Shutdown’ pushbutton?
> How does one identify the problem from this?

I cannot follow all of it because not all of it is in English.


> Jun 25 11:42:10 sms1 hald: unmounted /dev/sdb1 from '/media/disk' on behalf of uid 1002

umount

> Jun 25 11:42:11 sms1 etnetclient[8429]: xmlrpcBackend::execute Ausfuehrung OK
> Jun 25 11:42:29 sms1 hald: mounted /dev/sdb1 on behalf of uid 1003

mount again :-?
And by a different user (uid 1003 instead of 1002).


> Jun 25 11:43:09 sms1 kernel: usb 2-3: USB disconnect, address 11
> Jun 25 11:43:09 sms1 hald[2315]: forcibly attempting to lazy unmount /dev/sdb1 as enclosing drive was disconnected

And here the stick was removed; the system tries to compensate.

> Jun 25 11:43:09 sms1 hald: unmounted /dev/sdb1 from '/media/disk' on behalf of uid 0
> Jun 25 11:43:13 sms1 shutdown[6718]: shutting down for system halt

I think here a halt was requested.

> Jun 25 11:43:14 sms1 init: Switching to runlevel: 0
> Jun 25 11:43:15 sms1 heartbeat: [3831]: info: Heartbeat shutdown in progress. (3831)


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

richardavilez wrote:
> Here comes the first part (not complete). The user successfully mounted
> a 15 GB pen drive. Then, after some operations in an admin GUI app,
> the system went down. But was it the USB pen drive or the GUI application
> that forced the shutdown 71 seconds later?
> Did the user accidentally touch the ‘Shutdown’ pushbutton?
> How does one identify the problem from this?
>
> …
> Jun 25 11:39:55 sms1 lsva_gui[4368]: Neuen Benutzer erstellt:
> tt-gl-konfig ,
> Jun 25 11:39:55 sms1 etnetclient[8429]: xmlrpcBackend::execute
> Ausfuehrung OK
> Jun 25 11:40:01 sms1 syslog-ng[2281]: last message repeated 6 times
> Jun 25 11:40:01 sms1 /usr/sbin/cron[5230]: (lsva) CMD
> (${HOME}/scripts/lsvascpcron)
> Jun 25 11:40:01 sms1 etnetclient[8429]: xmlrpcBackend::execute
> Ausfuehrung OK
> Jun 25 11:41:01 sms1 /usr/sbin/cron[5759]: (lsva) CMD
> (${HOME}/scripts/lsvascpcron)
> Jun 25 11:41:06 sms1 etnetclient[8429]: xmlrpcBackend::execute
> Ausfuehrung OK
> Jun 25 11:41:56 sms1 lsva_gui[4368]: → GUI STATE changed to:
> sickly
> Jun 25 11:41:59 sms1 lsva_gui[4368]: clusterorder-thread stopped
> Jun 25 11:42:01 sms1 lsva_gui[4368]: dbmeter-thread stopped
> Jun 25 11:42:01 sms1 lsva_gui[4368]: ampel-thread stopped
> Jun 25 11:42:01 sms1 lsva_gui[4368]: Lockfile erfolgreich geloescht
> Jun 25 11:42:01 sms1 /usr/sbin/cron[6252]: (lsva) CMD
> (${HOME}/scripts/lsvascpcron)
> Jun 25 11:42:10 sms1 hald: unmounted /dev/sdb1 from ‘/media/disk’ on
> behalf of uid 1002

> Jun 25 11:42:11 sms1 etnetclient[8429]: xmlrpcBackend::execute
> Ausfuehrung OK

> Jun 25 11:42:29 sms1 hald: mounted /dev/sdb1 on behalf of uid 1003
> Jun 25 11:43:01 sms1 /usr/sbin/cron[6559]: (lsva) CMD
> (${HOME}/scripts/lsvascpcron)

> Jun 25 11:43:09 sms1 kernel: usb 2-3: USB disconnect, address 11
> Jun 25 11:43:09 sms1 hald[2315]: forcibly attempting to lazy unmount
> /dev/sdb1 as enclosing drive was disconnected

> Jun 25 11:43:09 sms1 hald: unmounted /dev/sdb1 from ‘/media/disk’ on
> behalf of uid 0

> Jun 25 11:43:13 sms1 shutdown[6718]: shutting down for system halt

So it didn’t crash, it was shut down.

At 11:42:10 the drive was unmounted.
At 11:42:29 it was mounted again.
At 11:43:01 some script was run.
At 11:43:09 the USB was disconnected.
At 11:43:09 the drive was unmounted again (as a result of the disconnect).
At 11:43:13 the system was instructed to shut down.
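
The gaps between those events can be computed directly from the syslog timestamps. This is a small sketch, not a general syslog parser: the helper names are invented, the four log lines are copied from the excerpt above, and the year 2012 is an assumption based on the dates of this thread, since syslog lines of this format carry no year.

```python
from datetime import datetime

# Four lines copied from the posted log excerpt.
LOG = """\
Jun 25 11:42:10 sms1 hald: unmounted /dev/sdb1 from '/media/disk' on behalf of uid 1002
Jun 25 11:42:29 sms1 hald: mounted /dev/sdb1 on behalf of uid 1003
Jun 25 11:43:09 sms1 kernel: usb 2-3: USB disconnect, address 11
Jun 25 11:43:13 sms1 shutdown[6718]: shutting down for system halt
"""


def parse_line(line, year=2012):
    """Split one syslog line into (timestamp, message without host)."""
    stamp, rest = line[:15], line[16:]  # "Mon DD HH:MM:SS" is 15 chars
    when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    _host, _, message = rest.partition(" ")
    return when, message


def deltas(log_text):
    """Seconds elapsed between each event and the one before it."""
    events = [parse_line(l) for l in log_text.splitlines() if l.strip()]
    result = []
    for (t0, _), (t1, message) in zip(events, events[1:]):
        result.append((int((t1 - t0).total_seconds()), message))
    return result


if __name__ == "__main__":
    for seconds, message in deltas(LOG):
        print(f"+{seconds:>3}s  {message}")
```

For these lines the gaps come out as 19, 40 and 4 seconds: the stick was mounted for 40 seconds before it was pulled, and the halt was requested 4 seconds after the disconnect.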

I can’t see anything there that indicates any problem with the USB drive
or indeed anything else.

What does the user claim happened? What does ${HOME}/scripts/lsvascpcron do?

On 2012-07-03 16:46, Dave Howorth wrote:
> So it didn’t crash, it was shut down.

There is something I have doubts about, this message:


> Jun 25 11:43:15 sms1 heartbeat: [3831]: info: Heartbeat shutdown in progress. (3831)

Is it that “heartbeat” is being shut down, or is it that heartbeat initiated
a shutdown because of something it detected or a check that failed?


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

Carlos E. R. wrote:
> On 2012-07-03 16:46, Dave Howorth wrote:
>> So it didn’t crash, it was shut down.
>
> There is something I have doubts about, this message:
>
>> Jun 25 11:43:15 sms1 heartbeat: [3831]: info: Heartbeat shutdown in progress. (3831)
>
> Is it that “heartbeat” is being shut down, or is it that heartbeat initiated
> a shutdown because of something it detected or a check that failed?

Well, is 11:43:15 earlier or later than 11:43:13 ??
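
Spelled out as a sketch: the timestamps and messages below are copied from the posted log, while the tuple layout and the function names are made up for illustration. Zero-padded HH:MM:SS strings from the same day sort chronologically as plain strings, so no date parsing is needed.

```python
# (time, program, message) tuples taken from the posted log, listed
# out of order on purpose.
EVENTS = [
    ("11:43:15", "heartbeat", "Heartbeat shutdown in progress."),
    ("11:43:09", "kernel", "usb 2-3: USB disconnect, address 11"),
    ("11:43:13", "shutdown", "shutting down for system halt"),
    ("11:43:14", "init", "Switching to runlevel: 0"),
]


def in_order(events):
    """Sort events by their zero-padded HH:MM:SS string."""
    return sorted(events, key=lambda event: event[0])


def first_mentioning(events, word):
    """Return (time, program) of the earliest message containing word."""
    for stamp, program, message in in_order(events):
        if word in message.lower():
            return stamp, program
    return None


if __name__ == "__main__":
    print(first_mentioning(EVENTS, "shut"))
```

The earliest message mentioning a shutdown comes from `shutdown` at 11:43:13, two seconds before the heartbeat line. Syslog timestamps only have one-second resolution, so this ordering suggests that heartbeat was reacting to the halt rather than requesting it, but it does not show who or what called `shutdown` in the first place.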