Cannot open any new processes after some uptime

As of today, I’m suddenly experiencing a major and very weird issue in my openSUSE 12.3 system. After a certain amount of uptime, I am unable to open any new applications on my machine. The only way to open new programs is to close existing ones, like there’s some sort of process limit. Each process I try to start lasts for less than a second then dies. The error I’m given in the console when this happens is:

Maximum number of clients reached : cannot connect to X server

I searched for this issue, and from what I understand it happens when too many processes are started and a certain limit is reached. However, I compared the list of processes while the issue was taking place and after I restarted (when it was not happening), and I have the same number of processes open. This is confirmed both by the xlsclients command and by the processes I see in KSysGuard.
xlsclients output with issue

xlsclients output without issue

What can be causing such major breakage after a few hours of uptime? I’ve never had something like this happen until today. The only system updates applied in the last few hours were for the X server and the fglrx driver, but I’d be surprised if something like that could cause this. Please don’t ask me to change the driver or downgrade packages (I no longer have the old ones); I prefer to avoid anything risky. If there’s any safe way to debug it further, however, I will.

Check whether your issue is related to this https://forums.opensuse.org/english/get-technical-help-here/applications/483937-maximum-number-clients-reached.html#post2532796

Already checked. It’s unrelated to Akonadi, which I tried restarting / shutting down and it didn’t free the taken process slots.

Also, I just woke up, and as always I left my system running throughout the night. The problem doesn’t seem to be happening now. So it might be an application I open that spawns multiple hidden processes in some way. I wonder if there’s a way to track which could be causing it.

On 04/30/2013 10:26 AM, MirceaKitsune wrote:
> So it might be
> an application I open that spawns multiple hidden processes in some way.
> I wonder if there’s a way to track which could be causing it.

two or three times in the last several months i have ‘caught’ Chrome
with many many open processes…like maybe 50+…while only two or
three tabs are open…

if you use Chrome, next time you see this problem, check for one or
more zombies and try this at a user terminal–maybe you catch the same:


ps -ef | grep 'chrom[e]'

and, if you close/kill all chrome processes the zombie(s) die too…
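To check for zombies from any program, not only Chrome, the process state column can be filtered directly. A minimal sketch, assuming the procps version of ps; the function name only_zombies is made up for illustration:

```shell
# Keep only zombie (defunct) processes.
# Input: lines of "<stat> <pid> <ppid> <comm>", where a stat starting
# with "Z" marks a zombie.
only_zombies() {
    awk '$1 ~ /^Z/'
}

# Usage on a live system (separate -o options keep the header line empty):
#   ps -e -o stat= -o pid= -o ppid= -o comm= | only_zombies
```

The ppid column shows which parent has not collected the zombie yet, which is usually the process worth looking at.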

i don’t understand it, so i won’t bugzilla it.


dd
openSUSE®, the “German Engineered Automobile” of operating systems!

I use Firefox not Chrome. I tried shutting FF down, but that didn’t fix the problem (apart from freeing one process slot so I could open one other program instead of FF).

Ok, it appears I found out where the issue is coming from. For a completely unknown reason, my Firefox configuration broke. Luckily I had a backup of my settings so I restored that and only kept what’s new.

It’s pretty strange this caused it (if this was it indeed), since even after shutting down Firefox the problem would persist and I’d still have to restart. Will see how it goes and if the old configuration solved it.

Got even further in finding out what this most likely is. An addon update in Firefox is causing the problem to take place. Several addons are updated so I can’t tell which, but for the time being I reverted them to old versions and disabled automatic updates.

I assume it would be off-topic to post a list of my addons and ask Firefox users here which could be it. I can however mention that the 3 addons I suspect are FireBug, DownloadHelper and Save Images. The best solution might be to update them all manually until I find out which one it is, unless someone here has an idea.

Ok… it appears this isn’t caused by the Firefox addons either. After a fresh relog, it does it with the old addon versions too. I don’t understand what’s happening any more… this is the craziest thing I’ve ever seen on a Linux system. Please let me know if you have more advice.

The only other things I did yesterday (when the issue started happening) were installing the official updates for X, fglrx, and apparently something to do with timezones. I doubt they’re related, but I mention them just in case.

On 2013-04-30 15:46, MirceaKitsune wrote:
>
> Ok… it appears this isn’t caused by the Firefox addons either. After a
> fresh relog, it does it with the old addon versions too. I don’t
> understand what’s happening any more… this is the craziest thing I’ve
> ever seen on a Linux system. Please let me know if you have more advice.

I think you should start with:


ps afxu | less -S

I’m not sure if threads are included or not.
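As far as I know, plain “ps afxu” prints one line per process and leaves threads out; procps shows them with the H or -L options, and the nlwp column gives the thread count of each process. A small sketch that totals threads per command name (the function name threads_by_command is made up):

```shell
# Sum thread counts per command name, largest first.
# Input: lines of "<nlwp> <comm>", i.e. thread count then command name.
threads_by_command() {
    awk '{ n = $1; $1 = ""; sub(/^ +/, ""); total[$0] += n }
         END { for (c in total) printf "%d %s\n", total[c], c }' | sort -rn
}

# Usage on a live system:
#   ps -e -o nlwp= -o comm= | threads_by_command | head
```

This only counts threads; it says nothing about how many connections each process has open to the X server.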


Cheers / Saludos,

Carlos E. R.
(from 12.1 x86_64 “Asparagus” at Telcontar)

What does “ps afxu | less -S” show exactly? Can’t understand the output fully. Will compare what it says before and after the problem… currently I’m making a list of comparison outputs which I’ll put here when ready.

For now I’m trying the “xrestop” command which might be very helpful for this. I’m waiting for the issue to happen again and hopefully it will indicate the bad processes. Currently it’s monitoring about 54 clients which sounds like the normal amount.

Issue is happening again. I got to compare all outputs which I put in a pastebin. My paste is divided between the outputs before the problem takes place, and the outputs of the same commands after it does. Here is the link:

http://pastebin.com/raw.php?i=Ty9PB2Gk

From what I can tell, all command outputs are nearly the same in both cases, whether run as root or as a user. My only conclusion is that there’s no secret process flood after all, and that the X server basically lowers the client limit in real time: as time passes, the client limit somehow decreases. This might be a huge bug in X and indicate a critical problem with the latest openSUSE packages (since it doesn’t look like anything I could have broken locally).
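Comparing two snapshots by eye is error-prone, so a small helper can print only the lines that are new in the second one. A sketch, assuming the before and after outputs were saved to plain text files (the function name and file names are made up):

```shell
# Print lines that appear in the "after" snapshot but not in the "before" one.
# $1 = before file, $2 = after file (e.g. saved xlsclients output).
new_in_snapshot() {
    grep -Fxv -f "$1" "$2"
}

# Hypothetical usage:
#   xlsclients > before.txt      # right after login
#   xlsclients > after.txt       # once the error appears
#   new_in_snapshot before.txt after.txt
```

It matches whole lines literally and ignores how often a line repeats, so a client that merely shows up more times than before is not reported.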

During this time, I kept running “xrestop” in a console and watching the output. This confirms that the limit most likely changes. At first, I could open processes as long as it said “Monitoring 55 clients”. Soon after, once xrestop said “Monitoring 54 clients”, I could no longer open anything, meaning the limit went down by 1.

What do you think of this and the info in my pastebin? I really need to get my system fixed and understand what’s happening. Also, here’s the output of “ulimit -a”, which may be related to the problem:

mircea@linux-qz0r:~> ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 71817
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 71817
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
mircea@linux-qz0r:~> 

[EDIT] Discovered another clue. At some point, I was able to open Kwrite from a console but not Firefox. Where Kwrite would open, Firefox would give the “maximum clients reached” error. But after closing another process, both would open. So it would also seem that different processes weigh differently somehow.

Sorry for posting so many times in a row again, but someone on IRC mentioned something relevant and important: A program can open multiple X handles and forget to close them. He even tested that and said that a simple program could call XOpenDisplay() countless times and it would lead to my issue (X server reaches its client limit and you can’t open anything on the system until you relog). In this case, closing the offending process doesn’t make a difference any more because the handles stay.

I’ve been debugging this the whole day. I can confirm with a dozen commands that no excess processes are open on my system, and I even experimented by shutting down nearly all processes in my session which didn’t fix this either (till I accidentally closed some init process and it logged me out). I’m disappointed that X allows this, and even a simple program can render the system unusable by opening connections and not closing them.

The question is how I find out which program is going crazy and doing this. Also, isn’t there some way to enable a timeout in X and close handles opened by processes which are now dead? Is there a way to debug what’s leaving handles open in the X server?
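One way to see who actually holds connections to the X server is to look at the sockets rather than the process list. A rough sketch, assuming the column layout that recent iproute2 versions print for “ss -x -p” (address and inode in separate columns, process info last) and that the display uses the usual /tmp/.X11-unix socket. The function name is made up, and older ss versions may format things differently:

```shell
# Count established X11 socket connections per client process.
# $1 = file holding the output of "ss -x -p".
x_clients_by_process() {
    awk '
        # First pass: remember the peer inode of every socket whose
        # local address is the X server socket.
        NR == FNR { if ($5 ~ /X11-unix/) peer[$8] = 1; next }
        # Second pass: a socket whose own inode is one of those peers
        # is the client end of an X connection.
        ($6 in peer) {
            name = "(unknown)"
            if (match($0, /\(\("[^"]+"/))
                name = substr($0, RSTART + 3, RLENGTH - 4)
            count[name]++
        }
        END { for (n in count) printf "%d %s\n", count[n], n }
    ' "$1" "$1" | sort -rn
}

# Hypothetical usage:
#   ss -x -p > /tmp/ss.txt
#   x_clients_by_process /tmp/ss.txt
```

Run ss as root, otherwise processes of other users show up as (unknown). A program near the top with dozens of connections would be the first suspect.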

For now, all I can do is downgrade fglrx (I think I can re-obtain the old packages) as well as disable all addons in Firefox if that doesn’t do it. Those are the only two important things that happened yesterday and could have led to this.

Ok. So it seems that at last, I found out what the cause was. The latest fglrx driver (posted on the Geeko repository for openSUSE 12.3 yesterday) is very broken and caused multiple problems. From bad performance and glitches in some games to this. I couldn’t find the actual trigger (what established X connections without removing them) but the video driver was obviously the cause. Luckily I was able to downgrade and the issue no longer happens after about 2:30 hours of uptime and several tests.

I shall contact Tigerfoot about this when I see him online, as he handles packaging the fglrx driver and the latest version is dangerous. Till then, everyone should be warned not to upgrade fglrx yet, otherwise they’re going to waste hours repairing their system too. I don’t wanna put my OS at risk so I probably can’t debug this any further, just hope someone can find what’s wrong with the latest fglrx driver (and maybe tell AMD to fix it).

On 05/01/2013 01:36 AM, MirceaKitsune wrote:
> The latest
> fglrx driver (posted on the Geeko repository for openSUSE 12.3
> yesterday) is very broken

it is also (i believe) pre-release/BETA software and should NOT be
discussed in this forum…

in fact, if it is your intention to run such software then you can
never be sure if any problem you have is caused by pre-release
software or not, and you should probably post all your problems to
one of only three places:

  1. the factory mail list
    http://en.opensuse.org/openSUSE:Mailing_lists#Development_lists

  2. the Pre-Release/Beta forum http://tinyurl.com/2du7r4s

  3. irc
    http://en.opensuse.org/openSUSE:Communication_channels#Instant_chat_.28IRC.29

posting to any of the technical help forums for released software
only sucks up volunteer helper time needlessly.

on the other hand, please post in these help forums if you have
problems with any system you run which has not installed any
factory, geeko, playground, experimental, unstable, Tumbleweed or
otherwise distro version unreleased code.

ymmv, maybe others here have another opinion?


dd

DenverD: I had no idea the latest fglrx in Geeko is an unstable / beta version. I use the URL Index of /mirror/amd-fglrx/openSUSE_12.3 and the update was automatically offered there, which I applied in Apper together with the daily updates (9.012 (working) to 12.104 (broken)). I’ll keep in mind from now on to check fglrx updates before clicking the Apply button in their case, although I’d recommend the owner of Geeko to name the beta package differently so people who aim to use the latest stable one (like me) don’t apply the unstable version unknowingly. Also sorry if I dispatched people over a beta software bug thinking it’s a release issue… I had no idea about any of this during the course of yesterday. Will still post my findings in the beta section later on.

Either the driver is broken, your hardware might not be supported, or your kernel is not supported. 12.104 may require a newer kernel version.

On 05/01/2013 11:46 AM, MirceaKitsune wrote:
>
> DenverD: I had no idea the latest fglrx in Geeko is an unstable / beta
> version. I use the URL ‘Index of /mirror/amd-fglrx/openSUSE_12.3’

as far as i can tell once each eight months the openSUSE community
releases a new version which is made up entirely from the contents
of the version numbered oss and non-oss repo…and, changes to that
version are released via the update and non-oss update repos.

so, if you (or anyone) adds software from factory, playground, geeko,
unstable, etc etc etc–you operate an experimental and testing system…

that is to say: until a new fglrx flows into a current numbered
release via an update repo, or into the next general version
release, it is pre-release for testing.

ymmv


dd

DenverD: fglrx is not distributed in the opensuse-update repository, since it’s the proprietary ATI driver and can only be found in third-party repositories (eg: Geeko). If it was I’d use it from there which would indeed be safer.

As I said on IRC, after several hours of burn testing I can’t reproduce it on a freshly installed 12.3 + updates + Games + FGLRX repository.
My hardware is an AMD-x2 3400, 8 GB RAM, a 40 GB SSD and a HIS 5750 GPU card.
So I have no way to find a root cause.

Could you remove every installed fglrx RPM, then run depmod -a and mkinitrd, and reboot? Check that no xorg.conf and nothing fglrx-related in xorg.conf.d is left, then reinstall the 13.4 driver.

I don’t wish to take my system apart and do anything this risky, as much as I wanna fix this… sorry. At least until a safe and tested fix is known which I can then try to apply. Tigerfoot also tested the latest fglrx package yesterday and so far he said he can’t reproduce it either. So (as usual) it’s only something that happens on my machine. It’s been a day since I reverted to the old driver and the issue is clearly gone, so there’s no doubt the latest fglrx was it.

Just to be safe, make sure you have Catalyst AI set to Advanced. What I did to reproduce it (after logging in) was restart Firefox and Thunderbird many times, but I assume any program with a window will do. I also played the game Xonotic with maximum settings, although during one of the sessions I didn’t even open it and the issue still happened. Xonotic is where another problem with the latest driver shows up: weird lag when the FPS is too high and you look from certain angles on the map (the view jitters and snaps around, and the mouse input lags).