file system becomes read-only, cannot restart apache

running 11.0, Linux 2.6.25.20-0.5-pae

using a 3ware 9500 adapter - raid 5 array with hot spare

the system is only 6 weeks old.

several days ago, apache would stop responding. trying a restart gives:

/etc/init.d/apache2 start
/bin/mktemp: failed to create file via template `/tmp/apache2.nb6xHyzlga2e': Read-only file system
Error: could not create temporary file for writing loadmodules.conf.
/usr/share/apache2/get_includes: line 15: /etc/apache2/sysconfig.d/include.conf: Read-only file system
/usr/share/apache2/get_includes: line 16: 3: Bad file descriptor
/usr/share/apache2/get_includes: line 43: 3: Bad file descriptor
Warning: found stale pidfile (unclean shutdown?)
Starting httpd2 (prefork) /etc/init.d/apache2: line 108: /var/log/apache2/rcapache2.out: Read-only file system
Syntax OK

The command line was:
/usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf -DSSL
failed

mount shows all file systems mounted as r/w
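One likely reason `mount` disagrees with the failing writes: on systems of this vintage, `mount` with no arguments usually just prints /etc/mtab, a plain file on / that cannot be updated once the kernel itself has remounted / read-only. /proc/mounts is generated by the kernel, so it shows the real current options. A minimal sketch, assuming a POSIX shell and awk; the function name find_ro_mounts is hypothetical:

```shell
# Print every mount point whose *current* options include "ro".
# Reads /proc/mounts by default (the kernel's own view), not /etc/mtab,
# which may still say "rw" after the kernel forced a read-only remount.
find_ro_mounts() {
    mounts_file=${1:-/proc/mounts}
    # Field 2 is the mount point, field 4 the comma-separated options.
    # Options are compared whole, so "errors=remount-ro" does not match.
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) {
            if (opts[i] == "ro") { print $2; break }
        }
    }' "$mounts_file"
}
```

Running `find_ro_mounts` with no argument checks the live system. If / shows up there while `mount` still reports rw, the mtab listing is simply stale.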

there are no specific messages in /var/log/messages, nor anything in the apache error log.
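The empty /var/log/messages fits the same picture: syslog writes that file on /, so once / is read-only nothing new can land there. The kernel's message buffer lives in memory, so `dmesg` may still hold whatever the filesystem or controller driver reported before the reboot. A rough filter; the function name fs_error_lines and the keyword list are assumptions, not the exact wording reiserfs or the 3ware driver use:

```shell
# Filter kernel log text (stdin by default, or a file given as $1)
# for lines that often accompany a forced read-only remount.
# The keyword list is a loose guess and will also catch harmless lines.
fs_error_lines() {
    grep -iE 'reiserfs|read-only|i/o error|journal' "${1:--}"
}
```

For example `dmesg | fs_error_lines` on the live system, before a reboot clears the buffer.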

running status on the raid controller shows no errors or alerts.

if there’s an error on the raid array, shouldn’t it rebuild and keep going?

any help would be appreciated!

advwebsys wrote:
> running 11.0,Linux 2.6.25.20-0.5-pae
>
> using a 3ware 9500 adapter - raid 5 array with hot spare
>
> the system is only 6 weeks old.

did you buy it six weeks ago loaded with SUSE Linux Enterprise Server
version 11? (aka SLES 11)

or was it bare metal and you loaded openSUSE 11.0?

and why would you, in mid-November, load a new machine with a release
that goes unsupported in June 2010 (leaving only EIGHT months of use)???

do this at a command line and report the results back


cat /etc/SuSE-release
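A small helper that turns that file into a direct answer to the SLES-or-openSUSE question. It assumes the usual layout (a product line such as "openSUSE 11.0 (i586)" followed by "VERSION = ..."); the function name suse_product is hypothetical:

```shell
# Report which product a SuSE-release file describes.
# Assumes line 1 names the product and a "VERSION = x" line follows.
suse_product() {
    release_file=${1:-/etc/SuSE-release}
    first=$(head -n 1 "$release_file")
    # Strip the "VERSION =" prefix and print only the value.
    version=$(sed -n 's/^VERSION[[:space:]]*=[[:space:]]*//p' "$release_file")
    case $first in
        *"Enterprise Server"*) echo "SLES $version" ;;
        openSUSE*)             echo "openSUSE $version" ;;
        *)                     echo "unknown: $first" ;;
    esac
}
```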

if you are actually running openSUSE 11.0, have you updated it to the
current level? (i remember that 11.0 was born with a crummy kernel that
was very soon replaced…) and you are not running reiser, are you?

hmmmmm…how come apache thinks it is trying to write to a read-only fs??

> several days ago, apache would stop responding. trying a restart
> gives:

when you say “apache would stop responding”, what do you mean? that is,
were you using a browser on another machine somewhere and couldn’t
fetch html pages? or did you use telnet/ssh/other to access the
running system directly and see that apache was ‘frozen’? what tool
told you it was not responding?

what error did you see?

that is, how did you determine what was wrong? the problem (not the
symptom)?

and how did you learn that a restart was the correct solution to the
problem that caused the “stopped responding” symptom??

ooops! “Warning: found stale pidfile (unclean shutdown?)”. after you
determined apache had stopped responding, did you shut down the entire
operating system, or just apache?

[you didn’t use the old Redmond Standard: force a shutdown, reboot,
pray a little and hope it fixes itself, did you?]


palladium

palladium - i’m not a newbie. we have been designing and hosting websites for 15 years. this machine is the most recent one we have built on a standardized hardware platform, with minor variations like the quantity of ram. the last four we built run openSUSE 10. this was our first attempt using openSUSE 11.

in all my experience, going back to redhat rel 4, i’ve never encountered a similar problem.

here’s the mount info:

/dev/sda2 on / type reiserfs (rw,acl,user_xattr)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
debugfs on /sys/kernel/debug type debugfs (rw)
udev on /dev type tmpfs (rw)
devpts on /dev/pts type devpts (rw,mode=0620,gid=5)
fusectl on /sys/fs/fuse/connections type fusectl (rw)
securityfs on /sys/kernel/security type securityfs (rw)

so we are running reiser (it’s our standard).

when we did the initial install, i recollect that we pulled package updates, but probably did not update the kernel. btw - while the system has only been online for about six weeks, the installation was done about six weeks before this, since this system is a replacement for a customer’s dedicated server. we had a lot of migration to do.

as far as problem diagnosis goes, the symptom showed up as firefox (and other browsers) reporting that the page load was interrupted.

logging into the system with ssh and using top and ps showed that apache was in fact running and was not frozen. but with the entire file system set to read-only, apache presumably could not do any logging, not even to the error log; that is consistent with there being no error messages in the apache error log indicating a problem. explicitly shutting down mysql, then shutting down apache, then issuing a reboot clears the problem (for a while).

so your “hmmmmm…how come apache thinks it is trying to write to a read-only file” is the 64-dollar question.

to further prove that the file system ‘thinks’ it is read-only: running touch ANYWHERE in the file system fails.
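The same touch test, made repeatable across several directories at once. A sketch only; probe_writable is a hypothetical name, and it relies on nothing more than mktemp creating a file from a template path (the same call that failed in the apache2 init script):

```shell
# Try to create and then delete a temp file in each directory given.
# Prints one line per directory; returns 1 if any directory refused.
probe_writable() {
    status=0
    for dir in "$@"; do
        if f=$(mktemp "$dir/.rwprobe.XXXXXX" 2>/dev/null); then
            rm -f "$f"
            echo "writable: $dir"
        else
            echo "not writable: $dir"
            status=1
        fi
    done
    return $status
}
```

For example `probe_writable / /tmp /var/log /etc/apache2`, run as root so that permissions are not what blocks the write.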

the ‘found a stale pidfile’ message is most likely an artifact: while shutting down, apache could not delete its pidfile.
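One way to check that theory is to see whether the PID recorded in the pidfile still belongs to a running process. A minimal sketch with a hypothetical function name; the real pidfile path comes from the PidFile directive in the apache config, so it is not hard-coded here:

```shell
# Return 0 (stale) if the pidfile is empty/missing or its PID is gone,
# 1 (current) if a process with that PID exists.
# kill -0 sends no signal; it only tests that the process exists and
# can be signalled, so run as root or a live process may look stale.
pidfile_is_stale() {
    pid=$(cat "$1" 2>/dev/null)
    [ -n "$pid" ] || return 0
    if kill -0 "$pid" 2>/dev/null; then
        return 1
    fi
    return 0
}
```

Usage: `pidfile_is_stale /path/to/httpd.pid && echo stale`. PIDs get reused, so a live PID is not absolute proof that the process is apache.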

sounds like a kernel upgrade is a possible solution. didn’t know 11.0 shipped with a crummy kernel ;)

advwebsys wrote:
> palladium - i’m not a newbie. we have been designing and hosting
> websites for 15 years.

but i can’t see that from here, and you didn’t say. (we have windows
whizkids popping in here all the time…they have one back-office
server they try to fix with the Redmond reboot…yours sounded like it
could be one of those.)

you can drag the fora for clues but i’m about 98% certain that the
kernel which shipped with 11.0 would NOT work with reiser…i don’t
recall if they patched the kernel or patched reiser…

and btw, most folks are breaking with reiser and moving to ext4…

btw, are you doing your clients any favors by giving them new
boxes that will stop getting security updates in six months???


palladium

palladium wrote:

> you can drag the fora for clues but i’m about 98% certain that the
> kernel which shipped with 11.0 would NOT work with reiser…i don’t
> recall if they patched the kernel or patched reiser…

sorry, i hit send too soon…i meant to add, i would highly recommend
you make sure that box is fully updated and patched, and see if the
problem goes away…iirc the new kernel/reiser will fix it…

if not, the next thing i’d look at is that raid system…you can do
that while YaST is updating. see for example
http://www.google.com/search?q=site%3Aopensuse.org+3ware+9500+adapter


palladium

> didn’t know 11.0 shipped with a crummy kernel

So did 11.1 … some updates save you a lot of grey hair.

Though to be fair to Novell, it was an upstream off-by-one kernel bug, and 11.1 was released in the window between the bug being introduced and being fixed. What was less forgivable was how long it took the fixed kernel to get through QA.

This is not the first time SUSE has shipped a kernel with a filesystem bug. I still remember the days of SUSE 9.1 Pro, which had an XFS bug: if you chose XFS as the filesystem during install, the kernel would just oops and the whole install would abort. Of course, you got the same behavior if you installed with another FS and then chose to format a disk with XFS… kernel oops :)