power failure protection during long compute runs

Dear openSUSE User Community: I’m thinking that the best way to protect LOOOONNNG runs of computer simulations is for the computer to sense a power failure and initiate Hibernate. Sleep might work if you have enough UPS capacity. It’s never predictable how long power outages will last. Does anyone have wisdom to share on this? Some googling around revealed a few ways this might be handled with Linux, so I’m wondering if there is a preferred way with openSUSE 13.1? I’ve never had any luck with the software that comes with UPSes, so I was wondering if the Linux folks had already worked this out. Thank You!! Patricia

On 2014-10-18 18:26, PattiMichelle wrote:
>
> Dear Opensuse User Community: I’m thinking that the best way to protect
> LOOOONNNG runs of computer simulations is with the computer sensing a
> power failure and initiating Hibernate. Sleep might work if you have
> enough UPS. It’s never predictable how long powerdowns will be. Does
> anyone have wisdom to share on this? Some googling around revealed a
> few ways this might be handled with linux - so I’m wondering if there is
> a preferred way with OS13.1? I’ve never had any luck with the software
> that comes with UPS’s before, and so was wondering if the linux folks
> had already worked this out.

Yes, I have used this for years, but not currently. There is a daemon
for it; I think the name is “nut” (Network UPS Tools).

The problem is that hibernation sometimes fails and aborts. You need a
failsafe so that the machine then powers off, or tries to force the issue.


Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)

On 2014-10-18 18:26, PattiMichelle wrote:
>
> Dear Opensuse User Community: I’m thinking that the best way to protect
> LOOOONNNG runs of computer simulations

… would be the software saving intermediate states periodically. :wink:
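
A minimal sketch of what periodic saving could look like, assuming the simulation state fits in a picklable Python object. The names `save_checkpoint`, `load_checkpoint` and `run` are made up for illustration, and the loop body is only a stand-in for a real simulation step. The file is written to a temporary name, synced, and then renamed, so a power cut during a save should leave the previous checkpoint intact.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    """Write state to path via a temp file, fsync, then an atomic rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            pickle.dump(state, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # the old checkpoint survives until this point
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path, default):
    """Return the saved state, or default if no checkpoint exists yet."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default


def run(total_steps, path, every=100):
    """Resume from the last checkpoint and save again every `every` steps."""
    state = load_checkpoint(path, {"step": 0, "value": 0.0})
    while state["step"] < total_steps:
        state["value"] += 1.0  # stand-in for one real simulation step
        state["step"] += 1
        if state["step"] % every == 0:
            save_checkpoint(state, path)
    save_checkpoint(state, path)
    return state
```

After a crash, calling `run` again with the same path picks up from the last saved step, so at most `every` steps are lost.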


Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)

Thanks for the reply!

That is sort of how I’ve been doing it. But I guess if the power goes out
at just the wrong moment, then you can crash a hard drive. So I guess
I was thinking more of hardware… and also things in the *nix world do
evolve, so I was thinking there might be a common kernel-ish solution
in hand, but there doesn’t seem to be.

Thanks again,
Patricia

APC offer some good UPS technology, and provide utilities to handle graceful shutdown/hibernation in the event of mains outages.

A good wiki here:
https://wiki.archlinux.org/index.php/APC_UPS

Lots of other good material online if you care to search further. :slight_smile:

Patricia;

As deano ferrari mentioned APC, I would like to add that there is a package
in the openSUSE repositories that supports APC devices, apcupsd. I’ve been
using this for a number of years. It seems to work just fine. I normally
edit the config file for the options that suit me.
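
For anyone who wants to script around apcupsd, here is a rough sketch of reading its status from Python. It assumes the `apcaccess` tool that ships with apcupsd prints `KEY : value` lines with fields such as STATUS, BCHARGE and TIMELEFT. The `SAMPLE` text below is illustrative, not captured from a real unit, and the function names are made up for this example.

```python
import subprocess

# Illustrative status text in the "KEY : value" layout; not real output.
SAMPLE = """\
STATUS   : ONBATT
BCHARGE  : 62.0 Percent
TIMELEFT : 14.5 Minutes
"""


def parse_status(text):
    """Turn 'KEY : value' lines into a dict, splitting at the first colon."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def leading_number(field):
    """Extract the number from a field such as '62.0 Percent'."""
    return float(field.split()[0])


def summarize(status):
    """Reduce the status dict to the values a power-failure script needs."""
    return {
        "on_battery": "ONBATT" in status.get("STATUS", ""),
        "charge_percent": leading_number(status["BCHARGE"]),
        "minutes_left": leading_number(status["TIMELEFT"]),
    }


def read_status():
    """Query a running apcupsd (assumes apcaccess is installed and on PATH)."""
    result = subprocess.run(
        ["apcaccess", "status"], capture_output=True, text=True, check=True
    )
    return parse_status(result.stdout)
```

With the sample text, `summarize(parse_status(SAMPLE))` reports a machine on battery at 62.0 percent charge with 14.5 minutes left.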


P.V.
“We’re all in this together, I’m pulling for you” Red Green

On 2014-10-18 23:16, PattiMichelle wrote:
>
> Thanks for the reply!
>
> That is sort of how I’ve been doing it. But I guess if the power goes
> out
> at just the wrong moment, then you can crash a hard drive.

There are tricks even for that :wink:

Say at the interval time you:

  • mount partition A
  • save intermediate status to it
  • umount it.

so it is safe. At next interval time you do the same to partition B, so
that in the event of power failure happening at the worst possible
instant, that is, when it is actually saving the status, the other
status partition is safely umounted perhaps minutes before. You lose at
worst one interval.
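
The alternating scheme can be sketched roughly as follows. This is only an illustration: two directories (the hypothetical `slot_a` and `slot_b`) stand in for partitions A and B, and the actual mount and umount steps, which need root, are left out. On restart the loader takes the newest slot that still parses, so a half-written save falls back to the other slot.

```python
import json
import os

# Hypothetical directory names standing in for partitions A and B.
SLOTS = ("slot_a", "slot_b")


def save_alternating(state, counter, base_dir):
    """Save to slot A on even counters and slot B on odd ones."""
    slot_dir = os.path.join(base_dir, SLOTS[counter % 2])
    os.makedirs(slot_dir, exist_ok=True)
    path = os.path.join(slot_dir, "state.json")
    with open(path, "w") as f:
        json.dump({"counter": counter, "state": state}, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the data reaches the disk
    return path


def load_newest(base_dir):
    """Return the readable record with the highest counter, or None."""
    best = None
    for slot in SLOTS:
        path = os.path.join(base_dir, slot, "state.json")
        try:
            with open(path) as f:
                record = json.load(f)
        except (FileNotFoundError, ValueError):
            continue  # slot missing, or the save was cut off half-way
        if best is None or record["counter"] > best["counter"]:
            best = record
    return best
```

If the save with counter 2 is interrupted, `load_newest` returns the record with counter 1 from the other slot, which matches the "lose at worst one interval" behaviour described above.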

> So I guess
> I was thinking more of hardware… and also things in the *nix world do
> evolve, so I was thinking there might be a common kernel-ish solution
> in hand, but there doesn’t seem to be.

Yes, there are. There are some daemons that watch the status from the
UPS unit and power down or hibernate the machine.

I would do it like:

  • on mains failure and battery below certain level,
    attempt to hibernate.
  • when the attempt to hibernate returns, and we are
    still running on battery, it means that hibernation
    failed, so halt the machine instead.

This can be tested on a virtual machine, to avoid crashing the real one.
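
The decision logic above could be sketched like this. It is not how any particular daemon implements it. The function name, the 30 percent threshold and the four callables are assumptions for illustration; on a real system the callables would have to be wired to the UPS daemon and to the system's hibernate and power-off commands.

```python
def handle_power_event(on_battery, charge_percent, hibernate, halt, threshold=30.0):
    """Try to hibernate on low battery; halt if hibernation did not take.

    on_battery:     callable returning True while mains power is out
    charge_percent: callable returning the battery charge (0 to 100)
    hibernate:      callable that attempts to hibernate the machine
    halt:           callable that powers the machine off
    threshold:      charge level below which we act (an arbitrary default)
    """
    if not on_battery() or charge_percent() >= threshold:
        return "wait"  # mains is present, or the battery still has headroom

    hibernate()

    # Execution continues here either after a successful resume or
    # immediately after a failed hibernation attempt.
    if on_battery():
        halt()  # still on battery: treat the hibernation as failed
        return "halted"
    return "resumed"
```

Because the callables are passed in, the same function can be exercised with fakes (or inside a virtual machine) before it is trusted on the real one.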


Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)

Depending on your load, a UPS would also allow you to shut down in an
orderly way. The time they give you depends on the amount of battery power
they have. I agree with the recommendations above on APC. When I was working
I had up to 800 APC brand UPSes throughout the world. Today at home I have three.

If you need to keep the application running, a Motor Generator setup is
one way. That will depend on how much fuel you have. These are usually used
for large data centers.

The main goal in either case should be an orderly save of your data and
then shutting down the application and computer.

Russ

openSUSE 13.1(Linux 3.11.10-21-desktop x86_64|
Intel(R) Quad Core™ i5-4440 CPU @ 3.10GHz|8GB DDR3|
GeForce 8400GS (NVIDIA-Linux-x86_64-340.46)|KDE 4.14.2

On 2014-10-20 18:27, upscope wrote:

> The main goal in either case should be an orderly save of your data and
> then shutting down the application and computer.

They can just as easily hibernate the machine, in which case nothing is
lost.

Unfortunately, there is no… “something” to tell all running
applications to save data and quit as fast as they can because the machine
is going down in an emergency. Do not ask questions, do not stop for
anything whatsoever, no optimization of the database on closing: no time.

Even if you are in front of the machine: with no room lights, the UPS
beeping faster every second, making you nervous, then mad, as some
application you try to close as fast as you can starts asking you silly
questions about overwriting something or what name to give the file.

Because typically you don’t know for certain how many minutes the
battery will last…


Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)

+1 for the apcupsd software. Recently, I was doing some wiring at my house, and flipped the wrong circuit breaker. Only when I felt the resulting hot wiring (several times :O) was I willing to concede the fact. I had shut down the outlets to my computer system(s). The only one running at the time was this one (13.1 - KDE - apcupsd) and it had been hibernated and shut down properly when I checked. This was the one and only time I have had the software activate because of power loss, but I feel much more secure now.

On 2014-10-21 10:16, montana suse user wrote:

> been hibernated and shut down properly when I checked. This was the one
> and only time I have had the software activate because of power loss,
> but I feel much more secure now.

Some time ago, I found out when coming back that a machine that should
be up full time was off. It happened a few times. Power was running. Other
machines were not affected, apparently.

I blamed system upgrade, kernel, heat… nope, it was none of that.
Simply, the UPS battery was too old to cope; it went out within a second
of power failure. So a quick mains glitch made my “server” go down
instantly… I happened to be looking once when it happened.

So I just bought a new battery, and it never happened again. Yet.

Don’t you trust those things too much…


Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)

I shall take your advice to heart! lol!

The UPS I mentioned has the capability of self testing the battery. I do it regularly.

As to UPS’s, in addition to making sure you have a good battery:

  1. Never use a surge protector before or after the UPS and
  2. Properly size your UPS. Sure it takes some time to read the labels on your equipment, but it’s well worth it. My golden rule is ‘More is better’. If your equipment requires 600 watts, go to the next higher unit. Ideally, I like to see them running at about 70-80% of their full load capacity.
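
As a worked example of the sizing rule, here is a small sketch. The function names and the list of unit ratings are made up for illustration. It works in watts only, so any VA rating printed on a UPS would need converting separately.

```python
def minimum_capacity_watts(load_watts, max_load_percent=80):
    """Smallest UPS rating that keeps the load at or below max_load_percent."""
    if not 0 < max_load_percent <= 100:
        raise ValueError("max_load_percent must be in (0, 100]")
    return load_watts * 100 / max_load_percent


def pick_unit(load_watts, available_watts, max_load_percent=80):
    """Pick the smallest rating from a list that satisfies the rule, or None."""
    needed = minimum_capacity_watts(load_watts, max_load_percent)
    candidates = [rating for rating in sorted(available_watts) if rating >= needed]
    return candidates[0] if candidates else None
```

For a 600 watt load and an 80% ceiling, the rule asks for at least 750 watts; from a hypothetical range of 500, 700, 900 and 1000 watt units, `pick_unit` would choose the 900 watt one.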

On 2014-10-21 15:06, sparkz alot wrote:
>
> As to UPS’s, in addition to making sure you have a good battery:
> 1) Never use a surge protector before or after the UPS

No. Do not meddle with surge protection. Keep it. Under DEATH penalty. :expressionless:


Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)

Modern UPS’s already come with surge and brown-out protection. Placing a surge protector before or after the UPS interferes with the UPS’s ability to respond correctly to these situations, resulting in some unintended and sometimes unpleasant results. :slight_smile:

Seeing that the majority is recommending APC, here is their reasoning:

Using surge strips with APC’s Back-UPS and Smart-UPS products.
Published 01/07/2002 08:00 AM | Updated 07/13/2010 02:19 PM | Answer ID 1372

APC recommends against the use of any surge protector, power strip or extension cord being plugged into the output of any APC Back-UPS and Smart-UPS products. This document will explain why.

Plugging a surge protector into your UPS: Surge protectors filter the power for surges and offer EMI/RFI filtering but do not efficiently distribute the power, meaning that some equipment may be deprived of the necessary amperage it requires to run properly causing your attached equipment (computer, monitor, etc) to shutdown or reboot.

If you need to supply additional receptacles on the output of your UPS, we recommend using Power Distribution Units (PDU’s). PDUs evenly distribute the amperage among the outlets, while the UPS will filter the power and provide surge protection. PDUs use and distribute the available amperage more efficiently, allowing your equipment to receive the best available power to maintain operation.

However, please note that the UPS is designed to handle a limited amount of equipment. Please be cautious about plugging too much equipment into the UPS to avoid an overload condition. To understand the load limit of your particular model UPS please consult the User’s Manual, or visit APC’s Product Page at www.apcc.com/products.

Plugging your UPS into a surge protector: In order for your UPS to get the best power available, you should plug your UPS directly into the wall receptacle. Plugging your UPS into a surge protector may cause the UPS to go to battery often when it normally should remain online. This is because other, more powerful equipment may draw necessary voltage away from the UPS which it requires to remain online.

On 2014-10-21 19:46, sparkz alot wrote:
>
> Seeing that majority is recommending APC, here is their reasoning:
>
>>
>> Using surge strips with APC’s Back-UPS and Smart-UPS products.
>> Published 01/07/2002 08:00 AM | Updated 07/13/2010 02:19 PM | Answer ID
>> 1372 This document will explain why APC recommends against the use of
>> any surge protector, ‘power strip’

NOTE: Removing the Surge Protection is ILLEGAL in several countries. In
case of accidents or mishaps, the insurance will plain refuse to pay,
and you may face the court.


Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)

On 10/21/2014 7:06 AM, montana suse user wrote:
>
> I shall take your advice to heart! lol!
>
> The UPS I mentioned has the capability of self testing the battery. I
> do it regularly.
>
>
I’m not sure that self test is all that reliable. After a few hard lessons with the UPS failing, I have taken to simply
replacing the battery every couple of years. (Or more precisely when I suddenly remember that the battery is ancient.) :slight_smile:


P.V.
“We’re all in this together, I’m pulling for you” Red Green

I’m wondering if we have a misinterpretation of what a Surge Protector is. Here in the US, what we almost always call a Surge Protector is a fairly inexpensive device that consists of a short, fairly heavy power cord with a plug on one end that plugs into the wall outlet, and has a block or strip of outlets on the other end. These outlets are connected to a (usually) pretty cheap component, typically a metal-oxide varistor (MOV) rather than a true diode, that shunts spikes to ground. The component degrades each time a spike occurs and therefore loses effectiveness rather quickly, necessitating a replacement of the device.

It sounds to me, that your system is built into the building’s wiring and is designed to be permanent. How close am I?

Bart

Oh, I don’t know. It seems to work. In any event, I always keep the UPS size so that I am never using more than 25%-28% of capacity. Computer, Monitor, Modems, Routers and desk lamp. The sound amp and the printers can go down. The laptops are on Surge Protection only as they have their own “UPS”. I re-wired all my outlets to isolated ground and actually have very little trouble.

In addition, twice a year, I close most applications on my system, pull the UPS plug out of the wall, and let things run until things shut themselves down. If that’s less than 30 minutes, I replace the UPS.

Bart