Recent Forums downtime

First,
If the reason for the recent Forums downtime is related to the infrastructure migration and upgrade announced recently (https://news.opensuse.org/2017/07/14/heroes-preparing-to-make-the-leap/), congrats on executing this with relatively little interruption.

But,
The fact that the Forums were down on at least 2 nights/days this past week was noteworthy and perhaps disappointing.
I assume that whether the downtime was expected depends entirely on the objectives stated prior to these events…

  • Were they expected and planned or were they unexpected?
  • Was there an evaluation that a certain amount of downtime was bearable, considering that the Forums (and any related web services) are considered non-critical, as part of defining “allowed downtime”?
  • Even if a certain amount of allowed downtime was defined, did anyone think this might have been a unique opportunity to test related strategy and policy like disaster recovery (it’s always nice to know you can execute a disaster recovery when you’re not in the middle of a real emergency)?
  • If any of these optional objectives were considered, were there sufficient resources allocated to do the planning, setup, testing and then actual execution of a perhaps complex sequence of steps?

So,
For example, I would think that this would have provided a nice opportunity to:

  • Test virtual machine orchestration and management (OpenStack?) in both the USA and Germany colo sites.
  • Test the upgrade process of the Forums, which I would think would include connecting and disconnecting the web frontend and the database backends step by step, in turn (or not), in an effort to cut downtime to minutes instead of the many hours I observed.
  • Perhaps include a number of staging and test instances.

The real value is that if the above were done, all issues and solutions would be fully documented to improve procedures and practices, perhaps even resulting in white papers.

IMO,
TSU

A message stating “stuff is happening” might have been a good idea. As it was, you were greeted with a login error message from MF after a lengthy timeout.

We have https://status.opensuse.org for that.

And yes, a lot of migrations are taking place, with the goal of getting the openSUSE infrastructure community-managed. And (thank you, Mr. Murphy) this all should go smoothly. I’m logged in, posting here, so it’s back to normal. The changes made in the infrastructure have all been initiated by the community.

We might know this; the random person trying to browse the forums… most likely doesn’t.

I know it’s a rare occurrence, but it would have been nice.

Edit:
Yesterday the API was down like a legless donkey but status showed everything green :slight_smile:

FYI any forum downtime was not expected. I talked to the IT staff during the
process and they weren’t expecting forums to be offline and were surprised they
were. I hope they’ve found all the issues now and have them resolved.


Kim - 7/17/2017 9:27:34 AM