Improper shutdown? RAID failed?

Hi Guys, I don’t have much experience with openSUSE. It was installed on a machine before I started working here.

Recently we had some power problems, with the power going on and off frequently. Yesterday it booted up fine and everything was normal; today I get an error.

Sorry, I don’t have the exact error at the moment. I’m in the middle of the semester with assignments due/overdue, so I had to ask someone at work to read out what it says:

Something like: RAID array 0 cannot be mounted?
Root directory cannot be found.

If possible, can you repair the array?
If not, can I get the data off the hard drives?

Any help or support pointing me in the right direction would be greatly appreciated. I can try to get more information if required.

Cheers

Maybe yes, maybe no.

  1. What version?
  2. Do you have an install disk?
  3. Do you know what kind of RAID? 0, 1, 5, 10?

Power failures can cause disk sector and/or file system corruption. This is bad for a normal setup but is a real disaster with some forms of RAID.
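To make "a real disaster with some forms of RAID" a bit more concrete, here is a toy Python model of the RAID 5 "write hole". It is a sketch of the idea, not how Linux md is implemented internally, and the block contents are made up. Parity is the XOR of the data blocks in a stripe, so one lost block can be rebuilt from the others; but if the power dies after a data block has been rewritten and before the parity has been, the stripe is inconsistent and a later rebuild silently returns garbage.

```python
from functools import reduce


def parity(blocks):
    """RAID 5 style parity: the byte-wise XOR of all blocks in a stripe."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity_block):
    """Recreate one lost data block by XOR-ing the surviving blocks with the parity."""
    return parity(surviving_blocks + [parity_block])


# Hypothetical stripe: a 4-disk RAID 5 holds 3 data blocks + 1 parity block per stripe.
stripe = [b"AAAA", b"BBBB", b"CCCC"]
p = parity(stripe)

# Normal case: disk 1 dies and its block is rebuilt correctly from the others.
assert rebuild_missing([stripe[0], stripe[2]], p) == b"BBBB"

# Write hole: disk 0 gets new data, then the power fails before the parity is rewritten.
stripe[0] = b"ZZZZ"  # the new data reached disk 0...
# ...but p is still the old parity, so this stripe is now inconsistent ("dirty").
rebuilt = rebuild_missing([stripe[0], stripe[2]], p)
print(rebuilt)  # b'YYYY', not disk 1's real data b'BBBB'
```

That stale-parity risk is, roughly, why md keeps track of whether an array was shut down cleanly and is reluctant to start one that is both dirty and missing a disk.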

Since the data is important, I’d suggest going to your backup. You do have a backup right? :open_mouth:

Cheers for your quick reply, gogalthorp.

There is an external HDD that backs up weekly. I haven’t checked how that is going yet; hopefully it held out. If we can avoid re-entering a week’s worth of data, that would be preferable.
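Checking how the external HDD is going can start with finding its newest file. A minimal Python sketch, assuming the drive is mounted at a hypothetical `/media/backup` (adjust the path): it walks the whole tree and reports the most recently modified file and its age. If the backup tool preserves the original files' timestamps, the newest time reflects the newest source file rather than the last backup run, so treat it as a rough hint only.

```python
import os
import time


def newest_file(root):
    """Walk a directory tree and return (path, mtime) of the most recently modified file."""
    newest = (None, 0.0)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                continue  # unreadable or vanished file: skip it
            if mtime > newest[1]:
                newest = (path, mtime)
    return newest


if __name__ == "__main__":
    # Hypothetical mount point: change it to wherever the external drive is mounted.
    path, mtime = newest_file("/media/backup")
    if path is None:
        print("no files found")
    else:
        age_days = (time.time() - mtime) / 86400
        print(f"newest file: {path} ({age_days:.1f} days old)")
```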

I can’t answer 1 and 3 at the moment, but for 2 I can say for sure, no, there is no installation disk.

I can say, the old box had 2 drives in RAID 1 configuration. I can only assume this one is set up similarly.

Would be great to know what sort of information I need to get, so I can get it all in 1 go, rather than driving back and forth to work.

TekFox wrote:
> Would be great to know what sort of information I need to get, so I can
> get it all in 1 go, rather than driving back and forth to work.

work has neither internet access nor uninterruptible (battery backup)
power…in a place depending on a Linux server for doing business, and
making only weekly backups?

hmmmm…i wonder who set up that house of cards?


palladium

I won’t say his name, because he actually threatened to counter-sue my workplace for slander after they threatened to sue him for the dodgiest network ever set up.
But he’s gone now, and I get the thrilling job of cleaning up the mess.

It does have internet, but I am too busy with university studies to hang around work all day =)
It has a UPS too, but it really does nothing.
The power goes out, the UPS kicks in, its battery runs out, and the server, which does nothing in the meantime, still shuts down unexpectedly.

We are in the process of setting up a new, HIGHLY improved system. It is practically ready to go and was planned to go live on 1 April (new financial quarter), but the old box had to die just 7 days before the switch. :’(

I’m bringing the box + external HDD home now, so I’ll get more info very shortly…

TekFox wrote:
> I won’t say his name

what a story! sorry for you…
what you need to tell us is what gogalthorp already asked…

get version info with the commands below (with the errors given so far,
i’d say you will most likely have to boot from something like a Live
CD/DVD with Linux, openSUSE, Knoppix, Red Hat, it does not matter so
much, then mount the drives --==READ ONLY==-- and collect info):


cat /etc/SuSE-release
cat /etc/issue
uname -a
cat /etc/fstab
df -h
df --print-type
su -c "fdisk -l"

there are probably others i’m not thinking of at the moment…perhaps
the real gurus will speak up now and add to my neophyte list…
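Since the goal is to get everything in one trip, the commands above can be bundled into a single report. A rough Python sketch, assuming Python 3.7 or newer is available on whatever Live CD you boot and that you run it as root. It runs the list above plus two read-only additions that are not in that list: `cat /proc/mdstat` (the kernel's view of the RAID arrays) and `mdadm --examine --scan` (reads the RAID superblocks, changes nothing). `old_root` is a hypothetical parameter for wherever the old system is mounted read only; left at `/`, the `/etc` files you see belong to the Live CD, not the server.

```python
import subprocess


def build_commands(old_root="/"):
    """Read-only info-gathering commands; old_root points at the old system's mounted root."""
    etc = old_root.rstrip("/") + "/etc"
    return [
        ["cat", etc + "/SuSE-release"],
        ["cat", etc + "/issue"],
        ["cat", etc + "/fstab"],
        ["uname", "-a"],
        ["df", "-h"],
        ["df", "--print-type"],
        ["fdisk", "-l"],                   # needs root; only lists partitions
        ["cat", "/proc/mdstat"],           # addition: current state of md arrays
        ["mdadm", "--examine", "--scan"],  # addition: reads RAID superblocks only
    ]


def collect(commands):
    """Run each command and return one text report containing every command's output."""
    sections = []
    for cmd in commands:
        try:
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
            body = result.stdout + result.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            body = f"[could not run: {exc}]"  # missing tool, no permission, hang...
        sections.append("$ " + " ".join(cmd) + "\n" + body.rstrip())
    return "\n\n".join(sections) + "\n"


if __name__ == "__main__":
    print(collect(build_commands()))
```

Saved as, say, `collect_info.py` (any name works), running `python3 collect_info.py > raid-report.txt` gives one file to copy to a USB stick and post.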

i read where you say “don’t have much experience with OpenSuse” so
please state your level of generic Linux experience…that way we have
a clue how basic to get…


palladium

Please get as basic as you can… My only experience is with Red Hat, 4 years ago in college… doing entry-level stuff. All I’ve done with openSUSE is follow step-by-step guides to install software.

Alright, here are the errors from booting it up. It is version 10.2 apparently.

md: md2: raid array is not clean -- starting background reconstruction
raid5: device sda3 operational as raid disk 0
raid5: device sda3 operational as raid disk 3
raid5: device sda3 operational as raid disk 1
raid5: cannot start dirty degraded array for md2
RAID5 conf printout
--- rd:4 wd:3 fd:1
disk 0, o:1, dev:sda3
disk 1, o:1, dev:sda3
disk 3, o:1, dev:sda3
raid5: failed to run raid set md2
md: pers->run() failed ...
mdadm: failed to RUN_ARRAY /dev/md2: Input/output error
Waiting for device /dev/md2 to appear: ok
/dev/md2: unknown volume type
invalid root filesystem -- exiting to /bin/sh
sh: no job control in this shell
$
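For anyone reading along, here is a rough Python sketch of how to decode the md lines above. In the old-style "RAID5 conf printout", rd/wd/fd are the raid disks, working disks and failed disks, and the "operational as raid disk N" lines say which slots showed up. Only slot 2 is absent, so this reads as a 4-disk RAID 5 with one member missing, and because the array was also not shut down cleanly, md refuses to start it. (All three lines as typed say sda3; the real device names were probably different, so don't rely on them.)

```python
import re


def summarize_md_log(text):
    """Extract the failing array, its disk counts, and which raid-disk slots are present."""
    summary = {}
    m = re.search(r"cannot start dirty degraded array for (md\d+)", text)
    if m:
        summary["array"] = m.group(1)
    # Old-style conf printout line: raid disks, working disks, failed disks.
    m = re.search(r"rd:(\d+)\s+wd:(\d+)\s+fd:(\d+)", text)
    if m:
        summary["raid_disks"], summary["working"], summary["failed"] = map(int, m.groups())
    slots = {int(s) for s in re.findall(r"operational as raid disk (\d+)", text)}
    summary["present_slots"] = sorted(slots)
    if "raid_disks" in summary:
        # Slots the array expects but that never reported in.
        summary["missing_slots"] = sorted(set(range(summary["raid_disks"])) - slots)
    return summary


LOG = """\
md: md2: raid array is not clean -- starting background reconstruction
raid5: device sda3 operational as raid disk 0
raid5: device sda3 operational as raid disk 3
raid5: device sda3 operational as raid disk 1
raid5: cannot start dirty degraded array for md2
--- rd:4 wd:3 fd:1
"""

print(summarize_md_log(LOG))
# {'array': 'md2', 'raid_disks': 4, 'working': 3, 'failed': 1, 'present_slots': [0, 1, 3], 'missing_slots': [2]}
```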

TekFox wrote:
> Alright, here are the errors from booting it up. It is version 10.2
> apparently.

ok now, here is the deal: i’m not a real guru (sometimes i get lucky
and ask the right question), i’m not able to help you with this
problem, and i hope someone who can stops in and helps.

however, i’m pretty sure the real guru is gonna need more info than just
that error message…so, do you have a Linux boot disk from any distro?


palladium

You might download a copy of GParted and boot from it.

GParted – Download

Hopefully this will at least let us see the partitions.

We still are not sure whether this is hardware, fake, or software RAID, or whether it is RAID 0, 1, 5 or 10. Also, the errors are a bit odd: there seems to be a missing disk. This may also point to a RAID 10 setup (mirrored and striped). Do you know how many drives the box actually has?
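The drive count matters because the same disks give a different usable size and failure tolerance at each RAID level, so comparing the array's size with the raw disk sizes can hint at which level is in use. A simplified Python sketch: the 500 GB drive size is a made-up figure, RAID 1 is treated as one n-way mirror, and RAID 10 as mirrored pairs.

```python
def raid_summary(level, disks, disk_size):
    """Return (usable capacity, drives that can fail with guaranteed survival).

    Simplified: identical drives, RAID 1 as a single n-way mirror,
    RAID 10 as mirrored pairs (2 copies of each block).
    """
    if level == 0:
        return disks * disk_size, 0
    if level == 1:
        return disk_size, disks - 1
    if level == 5:
        return (disks - 1) * disk_size, 1
    if level == 10:
        return (disks // 2) * disk_size, 1
    raise ValueError(f"unsupported RAID level: {level}")


for level in (0, 1, 5, 10):
    usable, survives = raid_summary(level, disks=4, disk_size=500)
    print(f"RAID {level:>2}: {usable} GB usable, survives losing {survives} drive(s)")
```

RAID 10 can sometimes survive two failures, but only if they hit different mirror pairs; the sketch reports the guaranteed figure.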

As for physical hard drives, there are 4, plus the external USB hard drive.
As for drive letters (C:, D:, E: etc.), unfortunately I can’t remember…

So, I download GParted, burn it to CD, and boot from the CD.
Alright I’ll give that a shot.

I really appreciate your help.

Anything else that needs doing, let me know how to do it and what info you need.

Cheers

when i woke up this morning i had a thought:

i don’t know how big that business is, or how much they stand to lose
if you spend a week and then have all the data on those drives
trashed (which can easily happen if you don’t know what you are
doing…and you have said you do not…no fault of your own…it is
obvious you are not stupid, but unlearned in *nix it is . . . well…)

so, i’m gonna suggest you seek competent local help…i don’t know
where you are but i know you are in a place big enough for a
university…there are Linux User Groups all over the earth
<http://www.linux.org/groups/> i think you ought to call/email someone
and ask them for a recommendation or three for a competent Linux
administrator in your area…and, give them a call, explain the
situation and ask for a quote…

it could be that it gets done for a couple of hundred bucks, or less (OR
more, after you buy new hard drives and maybe a raid controller) . . .

but, i’m gonna guess every day delay is costing someone some money!

ymmv but sometimes it is better to say “I’m over my head here.”

do what you want, but until we have more info we can’t possibly give
help…and, “C: D: E:” is not info we can use…

just you extracting the info we need could do damage IF you do not
mount the drives in read only mode…

your move, shoot us some info or . . .


palladium

Thanks for the advice palladium,

Well, the business is running on the new system now. It’s been bumped ahead because of this hiccup. The emails (previously hosted on the Linux box) have been redirected to the new datacenter. So it is running smoothly now; all that remains is to get the old data off this box.

I do admit I am not well learned in Linux, which is why I came to the forum seeking advice. I know the business would not hesitate to pay someone to do the work, but there are very few people around that will do it.

I’ve called one company, and they are now contacting all their technicians to see if anyone is willing to work on our system.

We’ll see how that goes for now.

Still in the dark, but if there are 4 hard drives (that is, physical hard drives, not the pretend C:, D: etc. drives that have confused Windows users forever and which are actually partitions), then it is a good guess that you are running RAID 10. This means that the data is mirrored and striped. But because every other sector is written to a different pair of drives, it is almost impossible to recover the data if the RAID system is too broken to recover itself.
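To picture what "every other sector is written to a different pair of drives" means, here is a simplified Python model of Linux md RAID 10's default "near 2" layout with an even number of disks, assuming the box really is RAID 10 as guessed. Data is split into chunks, each chunk is stored on both disks of a mirror pair, and consecutive chunks alternate between pairs. Losing one disk is survivable because its partner holds a copy; losing both disks of one pair leaves a hole in every other chunk of the device.

```python
def raid10_near2_location(chunk, disks=4):
    """Map a logical chunk number to (mirror pair holding it, chunk offset on those disks).

    Simplified model of md RAID 10 "near 2" with an even disk count:
    consecutive chunks alternate between mirror pairs.
    """
    pairs = disks // 2
    pair = chunk % pairs
    return (2 * pair, 2 * pair + 1), chunk // pairs


# Which of the first 8 logical chunks live only on the second pair (disks 2 and 3)?
lost = [c for c in range(8) if raid10_near2_location(c)[0] == (2, 3)]
print(lost)  # [1, 3, 5, 7]: lose both disks of that pair and every other chunk is gone
```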

This becomes a serious economic decision. Exactly how much is that week of data worth?? It could be truly costly to recover it. :’(

Always remember RAID is not backup!!

TekFox wrote:
> I do admit I am not well learned in Linux, which is why I came to the
> forum seeking advice.

i hope you didn’t get the feeling i was trying to belittle your
knowledge…in fact i was impressed with your correct decision to
seek assistance…

it was just that after a couple of days i could see no progress (there,
where you are, coming this way), so i felt it better to get the business
back up quickly AND with a high expectation of success…

and, i couldn’t see both of those happening via this venue…

fact is, i’m still not sure you were actually running openSUSE 10.2
and not SUSE Linux Enterprise Server version 10 SP2 (aka SLES 10.),
which IS still supported by Novell…

and i guessed the company has PAID for support from Novell…so that
is why i asked for this:


cat /etc/SuSE-release
cat /etc/issue

because i HOPED for you that you actually had SLES and i’d send you
over there where i would expect that they COULD see you to a quick AND
safe fix…

> I’ve called one company, and they are now contacting all their
> technicians to see if anyone is willing to work on our system.

i’m kinda concerned that maybe you are looking for Linux help in
places where only Windows PowerUsers live…

can you say the city you are in, and/or the nearest ‘large’
city…perhaps someone reading HERE, now, is right down around the
corner…

and, would consider it a fun challenge to see if s/he can get you
cooking again…(oh, and in addition to fun, rewarding)


palladium