Ah ha - I might - I say might - have found the issue. Now to hunt down a solution.

I noticed over the last few days, while getting more familiar with latencytop, that the following were popping up a lot:
- fsync() on a file
- Writing a page to disk
- Writing buffer to disk (synchronous)
- EXT3: Waiting for journal access

Top and latencytop keep reporting in most cases, but occasionally they would stop too. Usually up to 3 CPUs would have high wait states of 90% or more. The machine would not unfreeze until all wait states were back to 0.

I had a brainstorm... As I use the Evolution email client, and it is the most painful when this happens, I put a search together and found:

More on the EeePC hangs | Community Matters

This seems to be the same issue on an EeePC. There seems to be an issue with fsync() and the ext3 filesystem. It seems that Evolution and Firefox use sqlite, and sqlite uses fsync() heavily; the fsync() calls block, causing the freeze.

Now to solve the issue... Will report back.

Cheers
Jim
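For anyone wanting to check the fsync() theory on their own machine, here is a minimal sketch that times fsync() after small writes. It is not from the thread: the function name `time_fsync` and its parameters are made up for illustration, and it only measures one file, so treat the numbers as a rough indication rather than a diagnosis.

```python
import os
import tempfile
import time


def time_fsync(directory, size_bytes=4096, rounds=5):
    """Return a list of fsync() durations in milliseconds.

    Hypothetical helper for illustration: writes `size_bytes` of random
    data to a temporary file in `directory`, then times each fsync().
    """
    durations = []
    with tempfile.NamedTemporaryFile(dir=directory) as tmp:
        for _ in range(rounds):
            tmp.write(os.urandom(size_bytes))
            tmp.flush()  # hand Python's buffered data to the kernel
            start = time.perf_counter()
            os.fsync(tmp.fileno())  # block until the kernel reports it written
            durations.append((time.perf_counter() - start) * 1000.0)
    return durations


if __name__ == "__main__":
    # Assumption: point this at a directory on the filesystem under test.
    for ms in time_fsync(tempfile.gettempdir()):
        print(f"fsync took {ms:.1f} ms")
```

Running it once on an idle machine and once while a large copy is in progress should show whether fsync() latency balloons under write load.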
|
|||
|
FYI,

Found very high "wait load" on CPU, often over 90%, when editing a large document in OOo 3. After letting latencytop run overnight, found this high reading on file lock contention. Does this make sense to anyone?

Generic notebook (eRacks), openSUSE 11.1, OOo 3.1.1
|
|||
|
konsultor, smpoole7

I have seen latencytop report the minus numbers when my machine has done a "hard pause" (all is frozen, no mouse movement etc.) and I think its internal timers lose the plot too. As for the large document issue, I am not sure if OOo uses fsync or not, but for me this is where I think the issue is.

My findings since last post:

After some reorganisation of my file systems, I am now running all partitions except /boot on reiserfs. When I had moved my home partition to reiser (on its own disk), and the /root and /mnt/disk1 were still ext3, there did seem to be some improvement. The number of freezes seemed to decrease a bit. Evolution was almost usable. However, now that I have moved all partitions to reiser, the issue is still there.

The one big thing I notice is that when the freeze is about to happen/happening, the HD light on the front panel of the machine is on solid and the wait states increase across multiple CPUs. When it is working OK the HD light does its usual flickering as expected. So there is something not right in that area.

The really annoying part of all this is the machine is quick, really quick, when it is working, but it is taking me longer to do things as I have to wait around for up to minutes at a time before it gets past a "hang". Interestingly enough, if I continue to click and type while it is "hung", these are buffered somewhere and are acted upon when the machine returns. So the jam is not on inputs.

As 11.2 is on its way, I might wait for it to be released and give it a go. Have to check which version of the kernel is being implemented for it. Maybe check Bugzilla again and post a bug if it does not fix the issue.

Cheers
Jim
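The per-CPU wait states described above can also be logged without top, in case top itself stalls. The following is a rough sketch, assuming a Linux /proc/stat in the usual layout (user, nice, system, idle, iowait, irq, softirq, steal, ...); the function names are made up for illustration.

```python
import time


def parse_proc_stat(text):
    """Map each per-CPU line ('cpu0', 'cpu1', ...) to its tick counters."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the aggregate 'cpu' line and non-CPU lines such as 'intr'.
        if parts and parts[0].startswith("cpu") and parts[0] != "cpu":
            counters[parts[0]] = [int(v) for v in parts[1:]]
    return counters


def iowait_percent(before, after):
    """Return {cpu: iowait share in percent} between two samples."""
    result = {}
    for cpu, new in after.items():
        old = before.get(cpu)
        if old is None or len(new) < 5 or len(old) < 5:
            continue
        deltas = [n - o for n, o in zip(new, old)]
        # Sum only the first 8 fields: guest time is already counted in user/nice.
        total = sum(deltas[:8])
        result[cpu] = 100.0 * deltas[4] / total if total > 0 else 0.0
    return result


if __name__ == "__main__":
    # Linux only: take two samples one second apart and print the difference.
    with open("/proc/stat") as f:
        first = parse_proc_stat(f.read())
    time.sleep(1)
    with open("/proc/stat") as f:
        second = parse_proc_stat(f.read())
    for cpu, pct in sorted(iowait_percent(first, second).items()):
        print(f"{cpu}: {pct:.1f}% iowait")
```

Leaving something like this appending to a log file during a freeze would show which CPUs were sitting in I/O wait, and for how long, even when the desktop is unresponsive.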
|
|||
|
Latest update...

I think I have found the issue... Had a bit of a think about this and wondered what would happen if I fail one of the disks in the MD RAID..?

YESSSSSSSSssssss!!!!! It has stopped the issue.

The machine is now whizzing along. Launching apps, Evolution, Firefox, all the ones that were causing issues are now working. The wait states are jumping up and down, but only in single digits. The HD light is acting normal again.

So having MD RAID in a RAID 1 (mirroring) on the root partition was not a good idea. Or it needs some fine tuning for 1 TB disks and/or the ICH10 SATA controller.

More research needed...

Jim
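When experimenting with failing a mirror member like this, it helps to confirm what state the array is actually in. Below is a small sketch that reads the `[n/m] [UU]` status from /proc/mdstat text. It assumes the common mdstat layout for active arrays and does not handle inactive arrays or resync progress lines specially; `parse_mdstat` is a made-up name.

```python
import re

# 'md0 : active raid1 sdb1[1] sda1[0]'
_HEADER = re.compile(r"^(md\S*)\s*:\s*(\S+)\s+(\S+)")
# '... blocks [2/1] [_U]' -> expected members, active members, per-slot map
_STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def parse_mdstat(text):
    """Return {array: {level, expected, active, degraded}} from mdstat text."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = _HEADER.match(line)
        if header:
            current = header.group(1)
            arrays[current] = {"level": header.group(3), "expected": None,
                               "active": None, "degraded": None}
            continue
        status = _STATUS.search(line)
        if status and current is not None:
            expected, active = int(status.group(1)), int(status.group(2))
            arrays[current].update(expected=expected, active=active,
                                   degraded=active < expected)
    return arrays


if __name__ == "__main__":
    # Linux with md arrays only.
    with open("/proc/mdstat") as f:
        for name, info in parse_mdstat(f.read()).items():
            print(name, info)
```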
|
|||
|
An interesting thread and an interesting problem, but not a very clear one, so it seems hard for anyone to comment much. A few comments strike me.

There's nothing intrinsically wrong with having / on a mirrored md(4) device, and I have used such setups on SMP systems in the past. I saw something similar in effect when there was a driver issue, but then there were error messages logged in files (/var/log), which you have not mentioned. Are you sure there's nothing reported about DMA errors, for example?

CPU memory wait states are a very low-level hardware issue: the CPU stalls waiting to load from RAM, hence pre-fetching, L1, L2 & L3 caches, and SMT/HT to utilise the core logic better whilst the CPU is (invisibly to the OS) cycling on a memory access.

The I/O wait state reported by top(1) is something different: not memory, but a separation of idle time into "waiting for I/O" and "nothing to do". 10 years ago you did not see that information. My I/O wait goes up whenever I have I/O work to do.

The netbook report you found was the famous SSD "stutter" problem: basically, small writes would cause a slow read-erase-write cycle on random I/O, which was very inefficient. SSDs have since had improved firmware and larger RAM buffers to improve predictability on non-sequential writes.

konsultor's latency on fsync(2) of 60 msec seems to be expected; after all, that call returns when ALL outstanding write I/O has reached the hard disk.

> There seems to be an issue with fsync() and the ext3
> filesystem. It seems that Evolution, and Firefox, use
> sqlite and this uses fsync() heavily which blocks
> causing the freeze.

It was very regular fsync(2) calls, each writing ALL outstanding cached blocks, which killed interactive performance. Hence the fixes: splitting /home, /var, /tmp and /, mounting with data=writeback, and performance improvers like mounting with relatime (or noatime) so inodes aren't forced to be written after file reads.

Given that you mention several cores showing similar behaviour, you might have stumbled on something like a lock contention issue with multi-threading in md(4).

But more likely it's related to the Intel ICH10 SATA controller. Digging around, I have come across some mention of issues with some Intel SATA controllers:

- [Bug 187383] Re: System monitor causes Xorg to consume 100% CPU
- Nabble - freebsd-current - SATA DMA errors on second ICH10 bus

Perhaps there's a quirk, showing itself up during RAID 1 writes?

Are you using a "tainted" kernel with a binary graphics driver?
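To see which of the mount options mentioned above (data=writeback, relatime, noatime) are actually in effect, something like the following sketch can summarise /proc/mounts. It assumes the standard six-field layout, and the function names and the default list of filesystem types are illustrative only. Note that a journal mode only shows up here if the kernel lists it in the options string.

```python
def atime_policy(options):
    """Classify a mount's access-time behaviour from its option list."""
    if "noatime" in options:
        return "noatime"
    if "relatime" in options:
        return "relatime"
    return "atime"  # neither option listed: strict atime updates assumed


def summarise_mounts(text, fstypes=("ext3", "ext4", "reiserfs")):
    """Return (mountpoint, fstype, atime policy, data= mode or None) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in fstypes:
            continue
        options = fields[3].split(",")
        data_mode = next(
            (o.split("=", 1)[1] for o in options if o.startswith("data=")),
            None,
        )
        rows.append((fields[1], fields[2], atime_policy(options), data_mode))
    return rows


if __name__ == "__main__":
    # Linux only.
    with open("/proc/mounts") as f:
        for row in summarise_mounts(f.read()):
            print(*row)
```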