All filesystems vanish (look empty) after a while.

I have a mysterious filesystem issue with openSUSE on an old machine.
Installation wasn’t very pleasant but finally worked. It boots fine. However, after 5 or 10 minutes, all filesystems vanish. Most programs (but not all) cannot be found anymore, and “ls” output is empty everywhere. “top” used to work (it was running over an ssh connection that I just lost). Now it complains about an unset TERM variable. I was able to set it to xterm again, and “top” now works and displays the running processes: mysqld, Xorg, icewm (my WM). Nothing strange there, nothing is eating up the resources.

I have other Linux systems installed on that machine: Ubuntu freezes occasionally, mostly with Gnome, but that seems to be a totally different problem (maybe related to the radeon driver); Mandriva works fine so far. The two IBM 120 GB hard drives are quite old, but the manufacturer’s fitness test didn’t find any failures. I also checked for bad sectors before formatting in the openSUSE setup. It doesn’t look like a physical I/O problem. In such a case, it would crash or at least complain about not being able to access the HD.

Well … I switched to the console but didn’t get a login prompt, and now I cannot switch back to X. I can still ping the machine but of course not ssh to it. The SysRq keys (although enabled) don’t work. I guess I will have to reboot brutally. Last time it took me about one hour of pressing ‘y’ to repair the filesystem with Ubuntu. I’m sure I will be back soon.

What do you guys think about this situation? I suspect a kernel issue with that hardware. Have you heard about similar problems? I’m afraid I will have to reinstall 11.1, which used to work fine, on that machine.

Some details about the hardware :
The (onboard) Promise controller and the SATA controller are not used.
The 2 HD are on the first IDE interface as master and slave.

----------------------------------------------------------------------------
* Mainboard
model : A7V266-E | BIOS : ASUS A7V266-E ACPI
version : REV 1.xx | rev : 100 date : 11/07/2001
vendor : ASUSTeK | vendor : Award
pci-pci : VT8366/A/7 [Apollo KT266/A/333 AGP]
pci Host : VT8366/A/7 [Apollo KT266/A/333]
pci-isa : VT8233 PCI to ISA Bridge
----------------------------------------------------------------------------
* CPU (1)
model : AMD Athlon™ XP1700+
vendor : Advanced Micro Devices [AMD]
freq : 1100 MHz
clock : 100000000 Hz
arch : 32 bits
----------------------------------------------------------------------------
* Memory
DIMM 1 : DIMM DRAM Synchronous - 512 MB - 64 bits
DIMM 2 : DIMM DRAM Synchronous - 512 MB - 64 bits
----------------------------------------------------------------------------
* Graphic Card
chipset : Radeon RV200 QW [Radeon 7500]
vendor : ATI Technologies Inc
clock : 66000000 Hz
driver : radeon
----------------------------------------------------------------------------


----------------------------------------------------------------------------
* Mass storage controllers
: PDC20265 (FastTrak100 Lite/Ultra100)
vendor : Promise Technology, Inc.
driver : | version : 02
clock : 33000000 Hz
: SiI 3512 [SATALink/SATARaid] Serial ATA Controller
vendor : Silicon Image, Inc.
driver : | version : 01
clock : 66000000 Hz
----------------------------------------------------------------------------
* Harddisks
sda : IC35L120AVV207-0
size : 115 GB
version : V24O
sdb : IC35L120AVV207-0
size : 115 GB
version : V24O
----------------------------------------------------------------------------
disk /dev/sda size : 123.5 GB 123522416640 bytes
heads: 255 sectors/track: 63 15017 cylinders
Device Boot Start End Sectors Id System
/dev/sda1 63 514079 514016 06 FAT16
/dev/sda2 * 514080 3711014 1598467 a6 OpenBSD
/dev/sda3 3711015 6907949 3196934 a5 FreeBSD
/dev/sda4 6907950 241248104 234340154 0f Extended
/dev/sda5 6908013 72501344 65593332 7 NTFS
/dev/sda6 72501408 76710374 4208966 82 swap
/dev/sda7 76710438 85064174 8353736 83 Linux
/dev/sda8 85064238 118270529 33206292 83 Linux
/dev/sda9 118270593 126624329 8353736 83 Linux
/dev/sda10 126624393 134978129 8353736 83 Linux
/dev/sda11 134978193 151476884 16498692 83 Linux
/dev/sda12 151476948 192522959 41046012 83 Linux
/dev/sda13 192523023 241248104 48725082 83 Linux
/dev/sda14 3711015 5767334 2056320 - BSD
/dev/sda15 5767335 6907949 1140615 - BSD
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
disk /dev/sdb size : 123.5 GB 123522416640 bytes
heads: 255 sectors/track: 63 15017 cylinders
Device Boot Start End Sectors Id System
/dev/sdb1 63 21848399 21848336 a5 FreeBSD
/dev/sdb2 21848400 65593394 43744994 a5 FreeBSD
/dev/sdb3 43728930 65593394 21864464 0 Empty
/dev/sdb4 65593395 241248104 175654710 0f Extended
/dev/sdb5 65593458 118286594 52693136 7 NTFS
/dev/sdb6 118286658 124439489 6152832 83 Linux
/dev/sdb7 124439553 156906854 32467302 83 Linux
/dev/sdb8 156906918 163140074 6233156 83 Linux
/dev/sdb9 163140138 169389359 6249222 83 Linux
/dev/sdb10 169389423 173550194 4160772 83 Linux
/dev/sdb11 173550258 236637449 63087192 83 Linux
/dev/sdb12 236637513 241248104 4610592 82 swap
/dev/sdb13 63 2056382 2056320 - BSD
/dev/sdb14 2056383 4112702 2056320 - BSD
/dev/sdb15 4112703 8225342 4112640 - BSD
/dev/sdb16 8225343 20563262 12337920 - BSD
/dev/sdb17 20563263 21848399 1285137 - BSD
/dev/sdb18 43728930 45785249 2056320 - BSD
/dev/sdb19 23920785 25977104 2056320 - BSD
/dev/sdb20 21864465 23920784 2056320 - BSD
/dev/sdb21 45785250 56066849 10281600 - BSD
/dev/sdb22 56066850 60179489 4112640 - BSD
/dev/sdb23 38315025 43728929 5413905 - BSD
/dev/sdb24 28033425 38315024 10281600 - BSD
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

|----------------------------------------------------------------------------|
| /etc/fstab |
| |
| /dev/sdb6 / ext3 |
| /dev/sdb7 /usr ext4 |
| /dev/sda13 /home ext3 |
| /dev/sdb8 /local ext4 |
| /dev/sdb9 /var ext4 |
| /dev/sdb11 /srv ext4 |
| /dev/sdb10 /tmp ext3 |
| /dev/sda7 /usr/local/mnt/ubuntu ext3 |
| /dev/sda8 /usr/local/mnt/ubuntu/usr ext3 |
| /dev/sda9 /usr/local/mnt/ubuntu/usr/local ext3 |
| /dev/sda10 /usr/local/mnt/ubuntu/var ext4 |
| /dev/sda11 /usr/local/mnt/mandriva ext3 |
| /dev/sda12 /usr/local/mnt/mandriva/usr ext3 |
| /dev/sda1 /usr/local/mnt/c vfat |
| /dev/sda5 /usr/local/mnt/d ntfs |
| /dev/sdb5 /usr/local/mnt/e ntfs |
| /dev/sdb14 /usr/local/mnt/bsd/tmp ufs |
| /dev/sdb23 /usr/local/mnt/bsd/home ufs |
| /dev/sdb22 /usr/local/mnt/bsd/share ufs |
| /dev/sdb13 /usr/local/mnt/freebsd ufs |
| /dev/sdb16 /usr/local/mnt/freebsd/usr ufs |
| /dev/sdb15 /usr/local/mnt/freebsd/var ufs |
| /dev/sdb17 /usr/local/mnt/freebsd/tmp ufs |
| /dev/sda14 /usr/local/mnt/freebsd2 ufs |
| /dev/sdb18 /usr/local/mnt/netbsd ufs |
| /dev/sdb21 /usr/local/mnt/netbsd/usr ufs |
| /dev/sdb20 /usr/local/mnt/openbsd ufs |
| /dev/sdb24 /usr/local/mnt/openbsd/usr ufs |
|----------------------------------------------------------------------------|

Get a new hard drive. If you are having file system problems on Ubuntu, then you will have them on SUSE and anything else you put on it.

I’ve been monitoring /var/log/messages over ssh :

Jan 17 08:59:24 miriam kernel: [ 1012.198135] attempt to access beyond end of device
Jan 17 08:59:24 miriam kernel: [ 1012.198210] sdb7: rw=32, want=34359730520, limit=32467302
Jan 17 08:59:24 miriam kernel: [ 1012.198229] EXT3-fs error (device sdb7): ext3_get_inode_loc: unable to read inode block - inode=148327, block=4294966314
Jan 17 08:59:24 miriam kernel: [ 1012.200417] attempt to access beyond end of device
Jan 17 08:59:24 miriam kernel: [ 1012.200448] sdb9: rw=32, want=27099922480, limit=6249222
Jan 17 08:59:24 miriam kernel: [ 1012.200467] EXT3-fs error (device sdb9): ext3_get_inode_loc: unable to read inode block - inode=105967, block=3387490309
Jan 17 08:59:24 miriam kernel: [ 1012.215485] EXT3-fs error (device sdb9) in ext3_reserve_inode_write: IO failure
Jan 17 08:59:24 miriam kernel: [ 1012.243797] attempt to access beyond end of device
Jan 17 08:59:24 miriam kernel: [ 1012.243824] sdb7: rw=32, want=34359738328, limit=32467302
Jan 17 08:59:24 miriam kernel: [ 1012.243842] EXT3-fs error (device sdb7): ext3_get_inode_loc: unable to read inode block - inode=35589, block=4294967290
Jan 17 08:59:24 miriam kernel: [ 1012.255703] attempt to access beyond end of device
Jan 17 08:59:24 miriam kernel: [ 1012.255747] sdb9: rw=32, want=27099922480, limit=6249222
Jan 17 08:59:24 miriam kernel: [ 1012.255766] EXT3-fs error (device sdb9): ext3_get_inode_loc: unable to read inode block - inode=105967, block=3387490309
Jan 17 08:59:24 miriam kernel: [ 1012.256105] EXT3-fs error (device sdb9) in ext3_reserve_inode_write: IO failure
Jan 17 08:59:24 miriam kernel: [ 1012.257125] attempt to access betail: error reading `/var/log/messages': Input/output error

Now I cannot access /var/log/messages anymore or any other file (the system was running for about 10 minutes).
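The figures in these kernel messages can be checked by hand. Here is a minimal sketch, assuming 4 KiB filesystem blocks and 512-byte sectors (neither is stated in the log, but both fit the numbers). Under that assumption, each `want` value is the sector just past the end of the block named in the following EXT3-fs error. The two sdb7 block numbers sit just below 2^32, which is what a small negative number looks like when read as an unsigned 32-bit value.

```python
# Assumptions (not stated in the log): 4 KiB ext3 blocks, 512-byte sectors.
SECTORS_PER_BLOCK = 4096 // 512

# (device, block from the EXT3-fs error, "want" and "limit" from the line before it)
errors = [
    ("sdb7", 4294966314, 34359730520, 32467302),
    ("sdb9", 3387490309, 27099922480, 6249222),
    ("sdb7", 4294967290, 34359738328, 32467302),
]


def end_sector(block):
    """Sector just past the end of a filesystem block."""
    return (block + 1) * SECTORS_PER_BLOCK


def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed one."""
    return value - 2**32 if value >= 2**31 else value


for dev, block, want, limit in errors:
    print(dev, "block", block,
          "-> end sector", end_sector(block),
          "| matches want:", end_sector(block) == want,
          "| partition has", limit, "sectors",
          "| block as signed 32-bit:", as_signed32(block))
```

The `limit` values (32467302 and 6249222) equal the sector counts listed for sdb7 and sdb9 in the partition table above, so the kernel's idea of the partition sizes looks consistent. It is the requested block numbers that make no sense.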

I also tried to change filesystems, reformat ext4 partitions in ext3 and reiserfs in ext4, also deleted and recreated partitions with different sizes between installation attempts. It didn’t make any difference.

I don’t have file system problems on Ubuntu, neither did I on 11.1. And I can mount these openSUSE filesystems in Ubuntu or Mandriva for more than 10 minutes. I also checked the hard drives several times and didn’t find any failure. However, I don’t know how reliable the IBM drive fitness test is. I checked for bad sectors before formatting the partitions in openSUSE. I don’t know what the openSUSE setup would do if it had found any. Fedora would refuse to install (this happened to me once with a broken HD).

On Sun, 17 Jan 2010 17:26:01 +0000, please try again wrote:

> I also tried to change filesystems, reformat ext4 partitions in ext3 and
> reiserfs in ext4, also deleted and recreated partitions with different
> sizes between installation attempts. It didn’t make any difference.

Looks like an imminent hardware failure to me. I’ve seen this kind of
behaviour before - in the weeks leading up to a head crash that rendered
the drive inoperable.

Get a new drive and then get as much data off the drive as you can before
it dies.

Jim


Jim Henderson
openSUSE Forums Moderator

Looks like you have a couple of hard drives. You did not say which drive holds which Linux OS. But it still feels like a hardware problem to me. I use a commercial program called Spinrite to do low level scans on drives, but you should be able to get a free one from the drive manufacturer’s site.

I use the manufacturer’s program for low level scans. That’s the first thing I do. openSUSE is on the second drive, but also mounts /home from the first drive. Ubuntu and Mandriva are on the first drive but also use /tmp and /srv from the second drive. All systems swap on both drives.

I didn’t have problems with 11.1 on that hard drive. I know, there could also be (at least) two unrelated problems occurring at the same time, like a hardware failure during a system update.

BTW, SysRq reboot is OK after changing the default value (176) to 1.
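For reference, the SysRq setting is a bitmask. The sketch below decodes the two values mentioned here, using the bit meanings as listed in the kernel's SysRq documentation; the helper name `decode_sysrq` is made up for this illustration.

```python
# Bit meanings of the kernel.sysrq setting, as listed in the kernel's
# SysRq documentation. 0 disables SysRq, 1 enables every function.
SYSRQ_BITS = {
    2: "control of console logging level",
    4: "control of keyboard (SAK, unraw)",
    8: "debugging dumps of processes etc.",
    16: "sync command",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}


def decode_sysrq(value):
    """Hypothetical helper: list the functions a sysrq value allows."""
    if value == 0:
        return ["SysRq disabled"]
    if value == 1:
        return ["all SysRq functions enabled"]
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]


for v in (176, 1):
    print(v, "->", decode_sysrq(v))
```

176 is 16 + 32 + 128, so it covers only sync, remount read-only and reboot/poweroff, while 1 enables everything.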

i) Make a backup
ii) Make a second backup, just in case

When you are having the problem, is the output of ‘mount’ different from the normal state?

From what I can tell/guess, it seems possible that your partition table doesn’t quite correspond with the physical layout of the drives (well, the numbers on SDB7/SDB9 look way out).

Check that the BIOS isn’t getting reset because the battery is dying… that could explain a lot if sometimes you were booting with the disk addressing mode set up incorrectly.

And what about an fsck? When was that last done?

I don’t worry about that. Important stuff, if any, is on a RAID fileserver (running openSUSE 11.1).

> When you are having the problem, is the output of ‘mount’ different from the normal state?

Nope. mount output is the same. However umount is missing, as most commands cannot be found anymore.

> From what I can tell/guess, it seems possible that your partition table doesn’t quite correspond with the physical layout of the drives (well, the numbers on SDB7/SDB9 look way out).

I use the same kind of layout, or even a more sophisticated one, on a dozen machines running Linux and Unix. In an attempt to fix the problem, I recreated all the partitions from sdb7 to the end before installing openSUSE, since I wanted to resize sdb7, which had become too small for /usr with 11.2. I created the partitions with gparted under Ubuntu. Normally I would use Partition Magic under DOS (for historical reasons!), but my old version can only create ext3 partitions and can no longer delete existing ones. I guess it cannot read them since the default inode size has changed (I had the same problem with Unix and had to patch some kernels in order to mount ext3 partitions).
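The layout question can be checked against the numbers posted earlier in the thread. Here is a minimal sketch, assuming 512-byte sectors: it takes the disk size, the geometry and the start/end sectors of the extended partition and the Linux logical partitions sdb6 to sdb11 from the /dev/sdb listing, then looks for partitions that overlap or fall outside their container. The function name `check_layout` is made up for this illustration.

```python
SECTOR_SIZE = 512  # assumption: classic 512-byte sectors

# Values copied from the /dev/sdb listing earlier in the thread.
disk_bytes = 123522416640
heads, sectors_per_track, cylinders = 255, 63, 15017

# (name, start sector, end sector)
extended = ("sdb4", 65593395, 241248104)
logicals = [
    ("sdb6", 118286658, 124439489),
    ("sdb7", 124439553, 156906854),
    ("sdb8", 156906918, 163140074),
    ("sdb9", 163140138, 169389359),
    ("sdb10", 169389423, 173550194),
    ("sdb11", 173550258, 236637449),
]


def check_layout(disk_sectors, extended, logicals):
    """Return a list of problems: partitions outside their container or overlapping."""
    problems = []
    _, ext_start, ext_end = extended
    if ext_end >= disk_sectors:
        problems.append("extended partition ends beyond the disk")
    previous_end = ext_start - 1
    for name, start, end in sorted(logicals, key=lambda p: p[1]):
        if start < ext_start or end > ext_end:
            problems.append(f"{name} lies outside the extended partition")
        if start <= previous_end:
            problems.append(f"{name} overlaps the previous partition")
        previous_end = end
    return problems


disk_sectors = disk_bytes // SECTOR_SIZE
print("sectors on disk:", disk_sectors)
print("sectors covered by the CHS geometry:", heads * sectors_per_track * cylinders)
print("problems:", check_layout(disk_sectors, extended, logicals) or "none")
```

For the values as posted, this reports no problems, and the extended partition ends on the last sector covered by the 15017-cylinder geometry. It only checks the printed numbers, not what is actually on the disk.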

> Check that the BIOS isn’t getting reset because the battery is dying… that could explain a lot if sometimes you were booting with the disk addressing mode set up incorrectly.

The BIOS looks OK. This machine must not be that old and seems to use only LBA addressing (doesn’t offer the choice of normal or large mode, as older mainboards did). I replaced the battery about two years ago.

> And what about an fsck? When was that last done?

When this problem occurs, I first reboot into Ubuntu and do a manual fsck. It repairs the filesystem when necessary (sometimes it is not).

Missing files and directories are a sure sign of file system corruption. Since no one else is seeing this, there are only a couple of reasons it could be occurring.

  1. Hard drive mechanical problems
  2. Power problems
  3. Bad install, i.e. corrupted files on the DVD/CD

I booted openSUSE in runlevel 3, then ran top over ssh and tail -f /var/log/messages in another terminal. Nothing happened for about 2 hours. Since it became boring, I decided to start X, neither Gnome nor KDE but icewm (started with startx). As I expected, the filesystems on sdb7, sdb8 and sdb9 became unreadable after 10 minutes. However, this time the / filesystem is still there, so all commands in /bin are still available and I could run “df -hl”, which produces this fascinating output:

Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb6             2.9G  349M  2.5G  13% /
udev                  504M  416K  504M   1% /dev
/dev/sdb7              16T   16T  8.4G 100% /usr
/dev/sda13             16T   16T   23G 100% /home
/dev/sdb8              16T   16T  2.8G 100% /local
/dev/sdb9              16T   16T  2.6G 100% /var
/dev/sdb11             64Z   64Z   12G 100% /srv
/dev/sdb10             16T   16T  2.0G 100% /tmp
/dev/sda11            7.8G  746M  6.7G  10% /local/mnt/mandriva
/dev/sda12             20G  5.8G   13G  32% /local/mnt/mandriva

All filesystems except sda11 and sda12 (which still look OK) were mounted at boot time.

The interesting thing is that sda13 (/home) which is on the first harddisk - as its name indicates - and only contains user stuff and config files also looks empty and shows a 16T size (!)
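The odd 16T and 64Z sizes are not random. df works out a size as block count times block size, and with 4 KiB blocks (an assumption, the df output does not show the block size) those two figures correspond exactly to block counts of 2^32 and 2^64. A small sketch of the arithmetic:

```python
BLOCK_SIZE = 4096  # assumption: 4 KiB filesystem blocks

UNITS = ["B", "K", "M", "G", "T", "P", "E", "Z", "Y"]


def human(size_bytes):
    """Format a byte count with binary prefixes, roughly like df -h (rounds down)."""
    value = size_bytes
    for unit in UNITS:
        if value < 1024:
            return f"{value}{unit}"
        value //= 1024
    # Larger than the biggest unit: express in that unit anyway.
    return f"{value * 1024}{UNITS[-1]}"


for bits in (32, 64):
    blocks = 2**bits
    print(f"2^{bits} blocks of {BLOCK_SIZE} bytes = {human(blocks * BLOCK_SIZE)}")
```

One possible reading is that the block counts the kernel holds for these filesystems have turned into values near the 32-bit or 64-bit maximum. No 120 GB drive can hold that much, so the numbers would be wrong in memory rather than the disks having changed.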

Do you still think it’s a hardware or even a partitioning issue with my second hard disk?

As I already mentioned, I didn’t have such problems before with 11.1 on the same hard disk, nor do I have filesystem errors under Ubuntu and Mandriva, which mount sdb10 (/tmp) and sdb11 (/srv) at boot time. I have been running Mandriva for the past 3 or 4 days with Gnome (!) without any filesystem error.

mount output still looks fine.
“ls /boot” output looks empty, although “/boot” is not on a separate partition and “/” still looks reasonable.

I guess the best thing to do would be to reinstall 11.1 on the same partitions and see if the problem persists. If it does not, I would say that this openSUSE kernel doesn’t like the IDE controller on that mainboard. Before doing that, I’ll try to plug the two harddisks on the Promise onboard controller and see if it makes a difference …

If the file systems were corrupted, they would remain corrupted (or get repaired) when rebooting into another Linux. But that is not necessarily the case. I just rebooted into Ubuntu, checked the filesystems and they were clean. They might or might not get corrupted when I have to brutally reset the machine (btw, SysRq does NOT work after all), but at the time they appear empty under openSUSE, I guess they are not corrupted.

Did you try a repair? Before doing it though do a media check.

Your df check is just too odd. If the FS shows OK from another OS, I guess that it is not really a FS problem but that the kernel is losing track of stuff somehow.

What does df say if you run it first thing after a boot, preferably into runlevel 3?
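One way to make that comparison repeatable is a small script that can be run right after boot and again later. This is only a sketch: the mount points are the ones from the df output above, the drive size is the 123522416640 bytes from the disk listing, and the helper names are made up. It flags any filesystem that reports a total size larger than the drive it lives on.

```python
import os

DISK_BYTES = 123522416640  # size of each drive, from the listing earlier in the thread


def suspicious(total_bytes, disk_bytes=DISK_BYTES):
    """A filesystem cannot be larger than the drive it lives on."""
    return total_bytes > disk_bytes


def report(mount_points):
    """Print the reported size of each mount point and flag impossible values."""
    for mp in mount_points:
        try:
            st = os.statvfs(mp)
        except OSError as exc:
            print(f"{mp}: cannot stat ({exc})")
            continue
        total = st.f_blocks * st.f_frsize
        flag = "SUSPICIOUS" if suspicious(total) else "ok"
        print(f"{mp}: {total} bytes {flag}")


if __name__ == "__main__":
    report(["/", "/usr", "/home", "/local", "/var", "/srv", "/tmp"])
```

Running it once in runlevel 3 and once after X has been up for a while would show whether the sizes are already wrong at boot or only change later.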

What is being described sounds very similar to this…

Bug#550562: Blob firmware loader corrupts filesystem - Linux Archive

You got it! That’s exactly the situation happening there. That explains why the trouble began about 10 minutes after running X: xscreensaver and gslideshow started at this point.

Who wrote in this forum that there was no trouble with ATI cards on openSUSE?! ATI cards are a nightmare under 11.2. I gave up trying to get anything but a black screen on an iMac with a Radeon HD 2400 XT and finally reinstalled 11.1. Now on that old old computer with that old old All In Wonder Radeon 7500 (RV200), I end up with a garbage filesystem like I’ve never seen on any Linux before.

Thanks a lot for pointing to this explanation! That is an awesome bug report too.

Whooo, that is really nasty.

Note to self never ever buy ATI cards.

There is more here. I did look briefly upstream but didn’t see it, though I suspect it is there…

[#550977 - radeon DRI driver corrupts memory - Debian Bug report logs](http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=550977) At the bottom you’ll find altogether 4 bugs merged as one.

This one is slightly earlier and seemed to be progressing at the time. Since it is on the Debian list, it might be worth raising on the Bugzilla.

As for finding it, Google search terms with “df 16tb” hit it; I just figured that figure had to be significant.

Anyway, glad to have enlightened you, but without a bug report you (and others) may be waiting until the upstream fix is mainstream (if it is there). I noticed there had been chats on #radeon implying they could see where the potential problem may lie.

I did a fresh install of openSUSE 11.2 last night. I glided through setup and it detected all my hardware correctly (except my printer, a Canon Pixma MS850). No issues (yet) with my ATI Radeon 7500, maybe because I disabled the screensaver. I don’t like screensavers and prefer my screen to power off after 30 minutes of inactivity.

Wow… I read bug report #550562 and it certainly cannot be ignored. Although I do not use screensavers, what happens if other apps invoke OpenGL? I would be toast. Hmmm… perhaps best for me to go back to the drawing board…

Unless I’m mistaken, using the proprietary driver should fix it. Those bug reports seemed to be based on radeon and maybe radeonhd, so I would have thought fglrx.

This doesn’t seem that common and seems tricky to distinguish, though I guess some other ATI problems are masquerading as something else. But I didn’t find a bug report upstream, or on the Novell one either.

This is specifically related to file system corruption, and what I see as common is df showing 16tb when it shouldn’t. (glxgears seems to trigger it as well, going by the other link.) The other bug report highlights this further, but last time I looked neither had moved on, and if they had, it was on things like testing 2.6.33, which is still very bleeding edge. So if people want it fixed in the distro, they’ll need to file a bug report.