System lock-up during excessive disk I/O

Has anyone else had this problem? At times, the disk I/O will go through the roof and cause the system to slow to a crawl. X becomes really slow and can go completely unresponsive, dropping all keyboard input, preventing me from restarting X. Sometimes I will switch to a virtual terminal to log in and run iotop to check what is using all the I/O, but it will be so I/O bound that the login times out. Sometimes I do manage to get in and can run iotop and then it is a matter of waiting 10 minutes or so for python to load up. Usually by this time, whatever was causing the slowdown has finished and so I don’t even know what the problem is. Other times it never recovers (I once left it overnight, since it was the end of my work day, and just went home).

This is my work computer so it is very inconvenient to have to wait for this I/O blockage or lose all my work rebooting. I thought it was due to swapping, so I disabled swap, but that hasn’t helped. I’m running the cfq I/O scheduler, but I’ve tried both deadline and noop.

Any ideas?

Also, is there a way to disable the login timeout so I can at least get logged in to the box from the console when this happens?

So you did not tell us much about your computer hardware, but the last time I had this happen was this very winter with my Dell work laptop. After using it outside in 24 degrees F weather, the hard drive went bad. I attributed the slow downs and lock ups to a bad sector or two or three. A new hard drive did fix the problem. This was a Dell E6400 and a 240 GB WD hard drive if that matters.

Thank You,

I’ve seen similar things, though seldom this bad. The utilities that go
through and index all of your data are usually my first suspect which is
why I disable them on my boxes. Any regularity
(seconds/minutes/hours/days) to the issue’s frequency? You may want to
set up some kind of cron job to run once per minute and dump output to a
file so you do not need to do anything to get output.
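
A minimal sketch of such a once-per-minute logger, assuming a Linux system with /proc and procps `ps`. The function name `snapshot`, the script path and the log path in the comments are hypothetical and not from this thread.

```shell
#!/bin/sh
# Append one timestamped snapshot of load, memory and blocked processes.
# Intended to be called from cron, e.g. (hypothetical paths):
#   * * * * * /usr/local/bin/io-snapshot.sh
# where that script sources this function and runs:
#   snapshot /var/log/io-snapshot.log
snapshot() {
    log="$1"
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        echo "loadavg: $(cat /proc/loadavg 2>/dev/null)"
        # Memory fields that hint at cache pressure and pending writes.
        grep -E '^(MemFree|Cached|Dirty|Writeback|SwapFree):' /proc/meminfo 2>/dev/null || true
        # Processes in uninterruptible sleep (state D) are usually waiting on disk.
        ps -eo state,pid,comm 2>/dev/null | awk '$1 ~ /^D/'
    } >> "$log"
}
```

Because the log is written continuously, the last few entries before a lock-up can be read after the fact, without having to log in while the system is stalled.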

Good luck.


Want to yell at me in person?
Come to BrainShare 2011 in October: http://tinyurl.com/brainshare2011

On 08/17/2011 07:06 PM, jdmcdaniel3 wrote:
> So you did not tell us much about your computer hardware, but the last
> time I had this happen was this very winter with my Dell work laptop.
> After using it outside in 24 degrees F weather, the hard drive went bad.
> I attributed the slow downs and lock ups to a bad sector or two or
> three. A new hard drive did fix the problem. This was a Dell E6400 and
> a 240 GB WD hard drive if that matters.

You should switch to the CTRL-ALT-F10 log console to see if disk errors are
being logged. CTRL-ALT-F7 to get back to the GUI.

Another possibility is that your computer is paging. The ‘free’ command will
show that.
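
Besides ‘free’, the kernel's swap counters show whether pages are actually moving to and from swap. A small sketch, assuming the `pswpin` and `pswpout` fields of /proc/vmstat; the function name `swap_counters` is made up for illustration.

```shell
# Print the cumulative swap-in and swap-out page counts as "in out".
# Takes an optional file argument so it can be tried on a saved copy.
swap_counters() {
    vmstat_file="${1:-/proc/vmstat}"
    awk '$1 == "pswpin"  { i = $2 }
         $1 == "pswpout" { o = $2 }
         END { print i + 0, o + 0 }' "$vmstat_file"
}

# Usage: run it twice, some seconds apart. If the numbers grow between
# the two runs, the machine is paging during that interval.
#   swap_counters; sleep 10; swap_counters
```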

On Wed, 17 Aug 2011 23:36:03 +0000, adler187 wrote:

> Has anyone else had this problem? At times, the disk I/O will go through
> the roof and cause the system to slow to a crawl. X becomes really slow
> and can go completely unresponsive, dropping all keyboard input,
> preventing me from restarting X. Sometimes I will switch to a virtual
> terminal to log in and run iotop to check what is using all the I/O, but
> it will be so I/O bound that the login times out. Sometimes I do manage
> to get in and can run iotop and then it is a matter of waiting 10
> minutes or so for python to load up. Usually by this time, whatever was
> causing the slowdown has finished and so I don’t even know what the
> problem is. Other times it never recovers (I once left it over night,
> since it was at the end of my work day and just went home)
>
> This is my work computer so it is very inconvenient to have to wait for
> this I/O blockage or lose all my work rebooting. I thought it was due to
> swapping, so I disabled swap, but that hasn’t helped. I’m running the
> cfq I/O scheduler, but I’ve tried both deadline and noop.
>
> Any ideas?
>
> Also, is there a way to disable the login timeout so I can at least get
> logged in to the box from the console when this happens?

Usually when I see this kind of problem it’s because the system load has
gone through the roof, which is what it sounds like is happening to you,
too.

The last time this happened, I figured I’d use the “Magic SysRq” key to
force the system down relatively cleanly. You have to enable this in
advance (it can be done through YaST). I started the ‘normal’ way of
doing a shutdown this way, with Alt+SysRq+S (to sync the disks before
repeating with U and O - for “Remount read-only” and “Power Off”), and
syncing the disks seemed to resolve the issue (I didn’t have to finish
the shutdown).

You might give that a try and see if that works for you. I only happened
upon it by accident once, and it may well have been a complete
coincidence, but the system had wedged up pretty good for over 30 minutes
(with VMware running, I might add), and I was quite surprised that it
kicked free right after doing that.
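
Whether the sync key is permitted can be checked ahead of time from the kernel.sysrq setting. The sketch below assumes the bitmask values given in the kernel's sysrq documentation (1 = everything, 16 = sync, 32 = remount read-only, 128 = reboot/power off); the helper name `sysrq_allows` is hypothetical.

```shell
# Succeeds if the given kernel.sysrq value permits the function whose
# bitmask is passed as the second argument.
sysrq_allows() {
    value="$1"
    bit="$2"
    [ "$value" -eq 1 ] && return 0     # 1 enables all SysRq functions
    [ $(( value & bit )) -ne 0 ]
}

# Usage on a live system (reading the setting needs no privileges):
#   sysrq_allows "$(cat /proc/sys/kernel/sysrq)" 16 && echo "sync is allowed"
# As root, the same sync can be requested without the keyboard:
#   echo s > /proc/sysrq-trigger
```

The `/proc/sysrq-trigger` route is only useful if a root shell is still responsive, for example over an ssh session that was opened before the stall.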

Jim


Jim Henderson
openSUSE Forums Administrator
Forum Use Terms & Conditions at http://tinyurl.com/openSUSE-T-C

jdmcdaniel3 wrote:

> So you did not tell us much about your computer hardware, but the last
> time I had this happen was this very winter with my Dell work laptop.
> After using it outside in 24 degrees F weather, the hard drive went bad.
> I attributed the slow downs and lock ups to a bad sector or two or
> three. A new hard drive did fix the problem. This was a Dell E6400 and
> a 240 GB WD hard drive if that matters.

There’s probably a simpler explanation for your case. Going from a cold
environment to a warmer one presents the risk of condensation on/in
equipment. You see it more often on camera lenses until you get to the
REALLY cold climates but I’ve seen condensation on a laptop screen just
taking a laptop that was cold-soaked in the air conditioned room out to the
pool area in a Florida hotel. I guess the moral is to watch the girls while
the lappy warms up :wink:

That temperature transition (cold to hot) is a real problem in some climates.


WHonea

On 08/18/2011 01:36 AM, adler187 wrote:
> Any ideas?

personally, i’d like to know a little more info, like:

  1. what operating system/version?

  2. desktop environment (if used)/version?

  3. file system in use, number of drives, RAID?

  4. the conditions of this symptom:
    a. are they totally random, no cause-effect detected (and, it has done
    that from the first day of this install)?
    b. or, beginning some days/weeks/months after the initial install and
    some now forgotten routine system update/patch
    c. or, after a recent kernel update (had one last/this week)
    d. or, after installing software from an openSUSE repo with any of
    these strings in its name: factory, unstable, playground, tumbleweed,
    evergreen?

  5. you say this is your “work computer” but tell us a little more, is it
    a desktop or rackmounted server…are there heavy use applications,
    database apps working in the background…etc…is it networked with
    other operating systems…do you have any reason to suspect possible
    networking problems (collisions, etc) eating cycles [does ntop show
    anything interesting?]

but, without any of the above i could offer [these are offered in addition
to the other suggestions offered thus far–which are very possibly
helpful]:

-re-enable swap

-any clues in /var/log/messages?

-install atop; as installed by default it logs a top-like ‘snapshot’ every
ten minutes…adjustable to more/less often (could serve as AB’s
suggested cron job)

-use YaST/zypper to uninstall preload

-does this ‘lock-up’ always occur when using Flash in any browser
window? [if you are using 64bit and suspect Flash, see
http://tinyurl.com/3fe5aw7]

-(i don’t know the answer to your login timeout question, but) you can
get around any login time out by logging in and starting an iotop
instance prior to the crazy attack (you could probably do that
programmatically at each boot, until this prob is solved)
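
For the /var/log/messages check, a rough filter for disk-related kernel messages might look like the sketch below. The patterns are assumptions based on common libata and block-layer messages of this kernel generation, not an exhaustive list, and `disk_errors` is a made-up name.

```shell
# Print log lines that look like disk or controller errors.
# Takes an optional log file argument; defaults to /var/log/messages.
disk_errors() {
    logfile="${1:-/var/log/messages}"
    grep -E -i 'I/O error|ata[0-9]+(\.[0-9]+)?: (exception|failed command)|medium error' "$logfile"
}

# Usage (reading the system log usually needs root):
#   disk_errors | tail -n 20
```

An empty result does not prove the disk is healthy; it only means none of these particular patterns were logged.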

finally, it is highly likely i can’t give you the answer you seek even
if i had all the above questions answered…and the terminal
output from


cat /etc/SuSE-release
uname -a
zypper lr -d
df -h
cat /proc/partitions
cat /etc/fstab
mount
sudo /sbin/fdisk -l
sudo cat /boot/grub/menu.lst

copied and the output pasted back to this thread using the instructions
here: http://goo.gl/i3wnr

but, someone else (a real guru) might have a spot on answer then.


DD Caveat
openSUSE®, the “German Engineered Automobile” of operating systems!

One possible explanation could be that you have hard disks with the advanced format 4K blocks and did not align partitions. But to know for sure we would have to know what exactly the brand and model number of your hard disk is.

openSUSE 11.4, though I’ve had issues with 11.3 and 11.2 before I upgraded

KDE 4.7

Root is ext3, home is btrfs.

These seem to be random and have been going on for some time. Certainly if I have more open applications, it can become more problematic. At first I had 2GiB of memory and it would happen quite frequently once the memory “filled up.” When I say it filled up, I mean to say that free showed very little unused memory, but disk cache would be accounting for 800-900MiB of memory. From my understanding of the disk cache, any new request for memory should just take from the disk cache if there is no unused memory.
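
The reading of ‘free’ described above can be reproduced from /proc/meminfo. This is only a rough sketch: it adds MemFree, Buffers and Cached, which overstates what can really be reclaimed because Cached also counts tmpfs and shared memory pages. The function name `reclaimable_kb` is hypothetical.

```shell
# Print an approximate "free plus cache" figure in kB.
# Takes an optional file argument so it can be tried on a saved copy.
reclaimable_kb() {
    meminfo="${1:-/proc/meminfo}"
    awk '/^(MemFree|Buffers|Cached):/ { sum += $2 }
         END { print sum + 0 }' "$meminfo"
}

# Usage:
#   echo "roughly $(reclaimable_kb) kB could be handed to applications"
```

If this figure is still large while the system is crawling, the stall is less likely to be plain memory exhaustion and more likely to be the disk itself being saturated.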

I talked to the IT support guy here at work about my disk thrashing problem and he just gave me more ram - which was nice, but didn’t solve the underlying problem, just made it less frequent.

I only run latest openSUSE releases, no factory or tumbleweed, etc… but I do run latest KDE release from repos.

This is my work laptop and primary workstation.
It’s a Lenovo Thinkpad T400.
Core 2 Duo CPU P8600 2.40GHz
4GiB DDR3
160GB Fujitsu 7200 rpm hd

I disabled swap because I thought the swapping was contributing to the problem, but perhaps that is also just a symptom of the underlying problem.

Not that I can tell.

I couldn’t find atop in the repos and the website shows it requires a kernel patch (though this is rather old information 2.6.33)

I’ll give removing preload a try

No, but thanks for reminding me to get the 64 bit flash 11 beta installed.

cat /etc/SuSE-release


openSUSE 11.4 (x86_64)
VERSION = 11.4
CODENAME = Celadon

uname -a

Linux kadler 2.6.37.6-0.7-desktop #1 SMP PREEMPT 2011-07-21 02:17:24 +0200 x86_64 x86_64 x86_64 GNU/Linux

zypper lr -u


#  | Alias                           | Name                                                       | Enabled | Refresh | URI                                                                                                                                                                                                                                                                               
---+---------------------------------+------------------------------------------------------------+---------+---------+--------------------------------------------------------------------------------------------                                                                                                                                                                                       
 1 | Home_1                          | Home                                                       | Yes     | No      | http://download.opensuse.org/repositories/home%3a/adler187/openSUSE_11.4                                                                                                                                                                                                          
 2 | KDE_1                           | KDE                                                        | Yes     | No      | http://download.opensuse.org/repositories/KDE:/Release:/47/openSUSE_11.4/                                                                                                                                                                                                         
 3 | Kernel_stable                   | Kernel Builds for stable (openSUSE_11.4)                   | No      | No      | http://download.opensuse.org/repositories/Kernel:/stable/openSUSE_11.4/                                                                                                                                                                                                           
 4 | Libdvdcss_repository            | Libdvdcss repository                                       | Yes     | No      | http://opensuse-guide.org/repo/11.4/                                                                                                                                                                                                                                       
 5 | LibreOffice:Unstable            | LibreOffice:Unstable                                       | No      | No      | http://download.opensuse.org/repositories/LibreOffice:/Unstable/openSUSE_11.4/                                                                                                                                                                                                    
 6 | Packman_Repository              | Packman Repository                                         | No      | No      | http://packman.unixheads.com/suse/11.4/                
 7 | Sub_Pixel_Hinting_Enablement    | Sub Pixel Hinting Enablement                               | No      | No      | http://opensuse-community.org/subpixel/openSUSE_11.4                                                                                                                                                                                                                              
 8 | Updates_for_openSUSE            | Updates for openSUSE                                       | Yes     | No      | http://download.opensuse.org/update/11.4/                                                                                                                                                                                                                                         
 9 | X11_XOrg                        | X.Org development (openSUSE_11.4)                          | No      | No      | http://download.opensuse.org/repositories/X11:/XOrg/openSUSE_11.4/                                                                                                                                                                                                                
10 | devel_languages_perl            | perl modules (openSUSE_11.4)                               | Yes     | No      | http://download.opensuse.org/repositories/devel:/languages:/perl/openSUSE_11.4/            
11 | devel_languages_ruby_extensions | Ruby Extensions (openSUSE_11.4)                            | Yes     | No      | http://download.opensuse.org/repositories/devel:/languages:/ruby:/extensions/openSUSE_11.4/
12 | filesystems                     | Filesystem tools and FUSE-related packages (openSUSE_11.4) | Yes     | No      | http://download.opensuse.org/repositories/filesystems/openSUSE_11.4/                                 
13 | openSUSE:11.4:Update            | openSUSE:11.4:Update                                       | Yes     | Yes     | http://download.opensuse.org/update/11.4/                                                  
14 | openSUSE_11.4                   | openSUSE_11.4                                              | No      | No      | http://download.opensuse.org/repositories/network:/samba:/STABLE/openSUSE_11.4             
15 | openSUSE_Contrib                | openSUSE Contrib                                           | Yes     | No      | http://download.opensuse.org/repositories/openSUSE%3a/11.4%3a/Contrib/standard/            
16 | openSUSE_Debug                  | openSUSE Debug                                             | Yes     | No      | http://download.opensuse.org/debug/distribution/11.4/repo/oss/                             
17 | openSUSE_OSS                    | openSUSE OSS                                               | Yes     | No      | http://download.opensuse.org/distribution/11.4/repo/oss/                                   
18 | openSUSE_Source                 | openSUSE Source                                            | Yes     | No      | http://download.opensuse.org/source/distribution/11.4/repo/oss/                            
19 | openSUSE_non-OSS                | openSUSE non-OSS                                           | Yes     | No      | http://download.opensuse.org/distribution/11.4/repo/non-oss/                               
20 | packman                         | Packman repository (openSUSE_11.4)                         | Yes     | No      | http://packman.inode.at/suse/openSUSE_11.4                              

df -h


Filesystem            Size  Used Avail Use% Mounted on
rootfs                 20G   18G  1.7G  92% /
devtmpfs              1.9G  248K  1.9G   1% /dev
tmpfs                 1.9G  812K  1.9G   1% /dev/shm
/dev/sda1              20G   18G  1.7G  92% /
/dev/sda2             128G   48G   77G  39% /home

cat /proc/partitions


major minor  #blocks  name

   8        0  156290904 sda
   8        1   20973568 sda1
   8        2  133217280 sda2
   8        3    2099032 sda3

cat /etc/fstab


/dev/disk/by-id/ata-FUJITSU_MHZ2160BJ_G1_K84BT8B27AJT-part1 /                    ext3       defaults              1 1
/dev/disk/by-id/ata-FUJITSU_MHZ2160BJ_G1_K84BT8B27AJT-part2 /home                btrfs      noatime               1 2
#/dev/disk/by-id/ata-FUJITSU_MHZ2160BJ_G1_K84BT8B27AJT-part3 swap                 swap       defaults              0 0

proc                 /proc                proc       defaults              0 0
sysfs                /sys                 sysfs      noauto                0 0
debugfs              /sys/kernel/debug    debugfs    noauto                0 0
usbfs                /proc/bus/usb        usbfs      noauto                0 0
devpts               /dev/pts             devpts     mode=0620,gid=5       0 0

mount


devtmpfs on /dev type devtmpfs (rw,relatime,size=1968136k,nr_inodes=492034,mode=755)
tmpfs on /dev/shm type tmpfs (rw,relatime)
devpts on /dev/pts type devpts (rw,relatime,gid=5,mode=620,ptmxmode=000)
/dev/sda1 on / type ext3 (rw,relatime,errors=continue,commit=15,barrier=1,data=ordered)
proc on /proc type proc (rw,relatime)
sysfs on /sys type sysfs (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
/dev/sda2 on /home type btrfs (rw,noatime)
securityfs on /sys/kernel/security type securityfs (rw,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
gvfs-fuse-daemon on /home/kadler/.gvfs type fuse.gvfs-fuse-daemon (rw,nosuid,nodev,relatime,user_id=1000,group_id=100)
/etc/auto.gsa on /gsa type autofs (rw,relatime,fd=6,pgrp=23974,timeout=600,minproto=5,maxproto=5,indirect)
/etc/auto.gsaro on /gsaro type autofs (rw,relatime,fd=12,pgrp=23974,timeout=600,minproto=5,maxproto=5,indirect)

sudo /sbin/fdisk -l


Disk /dev/sda: 160.0 GB, 160041885696 bytes
255 heads, 63 sectors/track, 19457 cylinders, total 312581808 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x64656469

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048    41949183    20973568   83  Linux
/dev/sda2        41949184   308383743   133217280   83  Linux
/dev/sda3       308383744   312581807     2099032   82  Linux swap / Solaris


sudo cat /boot/grub/menu.lst


# Modified by YaST2. Last modification on Thu Aug  4 12:17:53 CDT 2011
# THIS FILE WILL BE PARTIALLY OVERWRITTEN by perl-Bootloader
# Configure custom boot parameters for updated kernels in /etc/sysconfig/bootloader

default 0
timeout 8
gfxmenu (hd0,0)/boot/message
##YaST - activate

###Don't change this comment - YaST2 identifier: Original name: linux###
title Desktop -- openSUSE 11.4 - 2.6.37.6-0.7
    root (hd0,0)
    kernel /boot/vmlinuz-2.6.37.6-0.7-desktop root=/dev/disk/by-id/ata-FUJITSU_MHZ2160BJ_G1_K84BT8B27AJT-part1 resume=/dev/disk/by-id/ata-FUJITSU_MHZ2160BJ_G1_K84BT8B27AJT-part3 splash=silent quiet showopts vga=0x367
    initrd /boot/initrd-2.6.37.6-0.7-desktop

###Don't change this comment - YaST2 identifier: Original name: failsafe###
title Failsafe -- openSUSE 11.4 - 2.6.37.6-0.7
    root (hd0,0)
    kernel /boot/vmlinuz-2.6.37.6-0.7-desktop root=/dev/disk/by-id/ata-FUJITSU_MHZ2160BJ_G1_K84BT8B27AJT-part1 showopts apm=off noresume edd=off powersaved=off nohz=off highres=off processor.max_cstate=1 nomodeset x11failsafe vga=0x367
    initrd /boot/initrd-2.6.37.6-0.7-desktop

Sometimes during these lockups X stops responding to keyboard commands and I can’t switch to VT10 or any other VT for that matter. When it isn’t that bad and I can get to VT10, I don’t really see any problems being logged.

I’ll give it a shot if I can remember it. I have the Magic SysRq keys enabled, but I can never remember what they are. Of course I could look it up, but that would require my computer to not be locked up…

On 08/18/2011 11:36 AM, adler187 wrote:
>> 3. file system in use, number of drives, RAID?
>>
> Root is ext3, home is btrfs.

Why btrfs? It is still a work in progress. If you have any spare disk left, try
putting /home there on an ext3 or 4 partition. If no spare space, backup,
reformat to ext{3,4}, and restore /home.

On Thu, 18 Aug 2011 17:06:03 +0000, adler187 wrote:

> hendersj;2375913 Wrote:
>>
>>
>> The last time this happened, I figured I’d use the “Magic SysRq” key to
>> force the system down relatively cleanly. You have to enable this in
>> advance (it can be done through YaST). I started the ‘normal’ way of
>> doing a shutdown this way, with Alt+SysRq+S (to sync the disks before
>> repeating with U and O - for “Remount read-only” and “Power Off”), and
>> syncing the
>> discs seemed to resolve the issue (I didn’t have to finish the
>> shutdown).
>>
>> You might give that a try and see if that works for you. I only
>> happened
>> upon it by accident once, and it may well have been a complete
>> coincidence, but the system had wedged up pretty good for over 30
>> minutes
>> (with VMware running, I might add), and I was quite surprised that it
>> kicked free right after doing that.
>
> I’ll give it a shot if I can remember it. I have the Magic SysRq keys
> enabled, but I can never remember what they are. Of course I could look
> it up, but that would require my computer to not be locked up…

I find them to be pretty easy to remember, but the important one to try
in this instance (if it works) is S - for “Sync disks”. :slight_smile:

I’ll be interested to know if that helps in your case. I’ve only got a
sample size of 1, so it may well have been just a coincidence.

Jim

Jim Henderson
openSUSE Forums Administrator
Forum Use Terms & Conditions at http://tinyurl.com/openSUSE-T-C

It was ext3 in the beginning (originally installed as openSUSE 11.2). At some point I removed the windows partition I left just in case I needed some windows utility for work and merged that partition with the /home partition. At that point I think I wanted to try btrfs and see if that might help my problem. I still left root as ext3 just because I’m not that crazy and this is my work machine, after all. I’m not sure why I thought btrfs would help disk i/o.

I guess I’m just frustrated by all this. I remember running older versions of SUSE (I started with SuSE 8.0, and Mandrake 8.2 before that) on much slower hardware and never had mouse lag, X lockups, etc… during high disk I/O. I thought that linux was supposed to be highly responsive even under high load.

On 08/18/2011 01:06 PM, adler187 wrote:
>
> I guess I’m just frustrated by all this. I remember running older
> versions of SUSE (I started with SuSE 8.0, and Mandrake 8.2 before that)
> on much slower hardware and never had mouse lag, X lockups, etc…
> during high disk I/O. I thought that linux was supposed to be highly
> responsive even under high load.

It is, but if there is any slowdown in disk I/O for any reason, responsiveness is
gone. I once installed a new disk that had 4K sectors without observing the
partition alignment restrictions. Until I fixed the problem, that system would hang.
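
A quick way to test the alignment theory, assuming 512-byte logical sectors: a partition is aligned to 4K physical sectors when its start sector is divisible by 8. The helper name `is_4k_aligned` is made up for this sketch.

```shell
# Succeeds if a partition start (in 512-byte sectors) falls on a 4 KiB boundary.
is_4k_aligned() {
    start_sector="$1"
    [ $(( start_sector % 8 )) -eq 0 ]
}

# Start sectors taken from the fdisk -l listing posted earlier in the thread.
for s in 2048 41949184 308383744; do
    if is_4k_aligned "$s"; then
        echo "start sector $s: aligned"
    else
        echo "start sector $s: NOT aligned"
    fi
done
```

All three starts in that listing are multiples of 8, and fdisk reports 512-byte physical sectors for this drive, so misalignment looks like an unlikely cause here. An old-style start at sector 63 would fail the check.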

> I’ll give it a shot if I can remember it. I have the Magic SysRq keys
> enabled, but I can never remember what they are. Of course I could look
> it up, but that would require my computer to not be locked up…

heh! some folks write their password on the monitor…on mine was


REISUB

which is a proven, effective way to use Magic SysRq to shut down safely…

but, after reading what Jim wrote (thanks) i added to mine and it now reads:

S REISUB

with the space between S and R i hope to remember to wait and see if
that sync alone will clear up whatever it is that has me dipping into
Merlin’s bag of tricks!!


DD
openSUSE®, the “German Engineered Automobile” of operating systems!

On 08/18/2011 06:36 PM, adler187 wrote:
> 4 | Libdvdcss_repository | Libdvdcss repository | Yes
> Filesystem Size Used Avail Use% Mounted on
> rootfs 20G 18G 1.7G 92% /

continuing with my ideas (not to exclude the ideas of others–especially
those of real gurus like Larry, voodoo, Jim, Will, Jim, James and AB) i
offer:

WAIT for some others to comment on my post below…maybe i got it all
wrong! and before you do anything i recommend you see my sig caveat!

then: use YaST to add a new test user, log out and back in as that test
user and do all you can to make it lock-up again…if it doesn’t you
have something in the config files of your old home that is bad news (i
guess probably .kde4…which you can confirm by, while logged in as the
new user, do the following in a terminal to rename your old kde4 with this:


mv /home/[your_old_ID]/.kde4 /home/[your_old_ID]/.kde4.BAK

if that does the trick, great…just go back to it but also follow the
info below to open up more free space in your root partition and tend to
the wayward repos…

if renaming .kde4 does not fix it then probably you will need to either
completely format and reinstall or methodically abandon your old
home and move data from it to your new home…

otherwise, most of the things you gave as a result of my questions
earlier looks pretty good, except:

  1. i too recommend not to use btrfs…use it on your sand box to TEST,
    don’t use it on your production machine…yes, i read you tried it to see
    if it would fix your problem…it didn’t so dump it…Larry advised use
    either ext3 or 4, i would advise ext4 for both root and home (but, only
    because mine is that way and so far, knock on wood, i’ve had no
    significant problems)…

now, i do not know if that will help this problem or not…it might, it
might not…but i feel good advising you to not run btrfs yet…read
around and decide for yourself…

  2. at the least i would disable your repo number 4
    “Libdvdcss_repository” because it could introduce incompatible versions
    of libdvd…further i try to hold very close to the excellent advice
    given in the paragraph beginning with “IMPORTANT” here
    http://tinyurl.com/33qc9vu, which i see you do not…it is your machine,
    but i can tell you that i seldom run into problems which many other
    folks have by opening up to tons of potentially incompatible
    packages…routinely i have only four repos enabled and only two
    refreshing (packman and update)…but, if you like your production
    machine out there on the edge, that is your machine, your job, and your
    business!

  3. “rootfs 20G 18G 1.7G 92% /” i do not like that at all! what do
    you have in there taking up so much room?? highly suggest you set your
    system to clean out /tmp during boot, this way:
    http://tinyurl.com/yzmzp5b then, reboot and do another


df -h

if you are still short on space:
    a. have you adjusted your logrotate to pack down and throw out old
    logs? (are any logs over [say] 3 or 4 MB [none over 1 GB]?)
    b. if you look in /var/log/messages do you see hundreds of repeating
    errors next to each other?
    c. what are they?
    d. do you have a giant DB in your root directory?
    e. or do you have an enormous amount of programs installed?

well, with only 1.7G of space for the system to shuttle stuff through
/tmp and /var/tmp you could be forcing the system to instead leapfrog
data in/out of memory and /swap…and that could be a big I/O
hog…oh, and we are looking for an I/O hog…maybe we found one!!

let’s get root to having (say) 7 empty gigs and see how that goes…
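
To keep an eye on a cramped root partition, a filter over df output can flag filesystems above a chosen fill level. This is a sketch that assumes the usual six-column df layout and mount points without spaces; `full_filesystems` and the 90% default are made up for illustration.

```shell
# Read df-style output on stdin and print "mountpoint use%" for every
# filesystem at or above the threshold (default 90).
full_filesystems() {
    threshold="${1:-90}"
    awk -v t="$threshold" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= t) print $6, $5
    }'
}

# Usage:
#   df -P | full_filesystems 90
# To see which top-level directories take the space (GNU du, as root):
#   du -xk -d 1 / | sort -rn | head
```

Fed the df listing posted earlier, this would report / at 92% (once for rootfs and once for /dev/sda1, since df lists both).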

  4. yes i know you disabled swap on purpose, i just didn’t want you to
    forget it (by the way, how does one disable swap?) but, another thing i
    see you have 4 Gigs of RAM and only 2 Gigs of swap…which might just
    barely be enough if you wish to hibernate or sleep…i suggest you
    should have at least 3G of swap…many would say 4.

ok, i do not know what the problem is, but i guess if you

A. empty tmp as described and learn the real disk size for root
B. backup your data and system to a different disk
C. reformat to include ext4 on root and home, and give swap 4G
D. restore from backup into root and home

and, then see how it sails…if the problem persists then i have to say the
problem is in one of only two areas:

  • video graphics problem

  • hardware (main board? disk controller? cabling?)


DD Caveat
openSUSE®, the “German Engineered Automobile” of operating systems!

Ok, so I finally had a chance to make some changes on my laptop. I’ve since repartitioned the disk so the root partition is not so cramped and increased the swap size to 4GiB. I’ve also formatted everything ext4. So far things have gone much better, but I still did get the lock up of death the other day.

It does seem that when it happens and I look at iotop, there are a lot of processes using I/O and a lot of I/O wait. It’s hard to say what the culprit is when the largest user of I/O (almost always read) is using maybe 3MB/s. Total I/O is very low as well. Usually, most processes are using KB/s or so and it doesn’t add up to much. I think the bigger issue is actually not total I/O, but disk thrashing caused by lots of random I/O. This is much harder to diagnose since there is no clear red flag process using all the I/O. Also, it tends to have a compounding effect: as one process starts causing disk thrashing, other processes start queuing up, which leads to even more thrashing.
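
The I/O wait share can be measured directly from two samples of the aggregate “cpu” line in /proc/stat. This is a sketch assuming the standard field order (user, nice, system, idle, iowait, irq, softirq, steal); the function name `iowait_pct` is hypothetical.

```shell
# Print the percentage of CPU time spent in iowait between two samples.
# $1 and $2 are "cpu ..." lines from /proc/stat, taken some seconds apart.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        { total = 0
          # Fields 2..9 are user, nice, system, idle, iowait, irq, softirq, steal.
          for (i = 2; i <= NF && i <= 9; i++) total += $i
          t[NR] = total; w[NR] = $6 }
        END { d = t[2] - t[1]
              if (d > 0) printf "%.1f\n", 100 * (w[2] - w[1]) / d
              else print "0.0" }'
}

# Usage:
#   a=$(head -n 1 /proc/stat); sleep 5; b=$(head -n 1 /proc/stat)
#   iowait_pct "$a" "$b"
```

A high iowait percentage together with only a few MB/s of throughput would be consistent with the seek-bound, random-I/O pattern described above, though it does not identify the process responsible.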

Sometimes I can recover by killing firefox, but I don’t think that is a bulletproof solution.

BTW, I found the way to change the login timeout that I was having issues with. Change the LOGIN_TIMEOUT value in /etc/login.defs.
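
For anyone who wants to check the current value before editing, a small sketch that reads the setting from login.defs. The function name `login_timeout` is made up, and the 600 in the comment is just an example value, not a recommendation from this thread.

```shell
# Print the active LOGIN_TIMEOUT value (in seconds), ignoring commented lines.
# Takes an optional file argument; defaults to /etc/login.defs.
login_timeout() {
    defs="${1:-/etc/login.defs}"
    awk '$1 == "LOGIN_TIMEOUT" { v = $2 }
         END { if (v != "") print v }' "$defs"
}

# Usage:
#   login_timeout
# To allow ten minutes, the line in /etc/login.defs would read:
#   LOGIN_TIMEOUT   600
```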

On 2011-11-02 02:36, adler187 wrote:
>
> Ok, so I finally had a chance to make some changes on my laptop. I’ve
> since repartitioned the disk so the root partition is not so cramped and
> increased the swap size to 4GiB. I’ve also formatted everything ext4. So
> far things have gone much better, but I still did get the lock up of
> death the other day.

I would do a checkup of the HD using the manufacturer utility, including a
surface test. Otherwise, try a smartctl long test.
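
A sketch of the smartctl route, assuming smartmontools is installed and the disk is /dev/sda as in the listings above. The helper `smart_raw` is a made-up name; it only pulls one raw value out of the attribute table that `smartctl -A` prints.

```shell
# Read `smartctl -A` output on stdin and print the raw value (10th column)
# of the named attribute.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Usage (as root):
#   smartctl -t long /dev/sda          # start the long self-test
#   smartctl -l selftest /dev/sda      # read the result once it has finished
#   smartctl -A /dev/sda | smart_raw Reallocated_Sector_Ct
#   smartctl -A /dev/sda | smart_raw Current_Pending_Sector
```

Non-zero and growing counts for reallocated or pending sectors would point toward the failing-disk explanation suggested earlier in the thread.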


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)