Yesterday I installed 11.4. At first it ran very nicely, but now I am facing an odd problem:
On my new 11.4 box I am mounting a folder via NFSv4 from an openSUSE 11.2 server (updated to kernel 2.6.33.6).
I have some folders in which the 11.4 box cannot see files that are definitely present on the server. They are completely invisible to the client! In one particular case there are 108 files in the folder on the server, but the 11.4 box sees only 107. One is missing. It does not show up in ls, and I also can’t access it blindly with e.g. cat. The file does not differ from the others in that folder in permissions, ownership, ACLs etc. It is not big (1728 bytes), nor is it a special file (link, socket etc.); it is just a plain C header file. That single file is simply not visible when the folder is mounted with NFSv4 on the 11.4 box. When I mount it with NFSv3 instead, the file is visible. But for obvious reasons I want to stick with NFSv4.
I restarted all NFS-related daemons on both the server and the client a couple of times, but it makes no difference. So far I also cannot find any problems in the log files on the server or the client. I rebooted the client but not the server (I can’t do that right now, as it is in heavy production use).
I do not see these problems with other openSUSE boxes running 11.0, 11.2 or 11.3 in my network. They can see and access these files just fine.
This is really strange. Does anyone have a clue what is going on here, or how I can narrow the problem down?
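One way to narrow this down is to compare what two mounts of the same export actually list. The sketch below is only an illustration, not something taken from the setup above: the function name missing_entries and the mount points /mnt/export-v3 and /mnt/export-v4 are made up, and it assumes that file names contain no newlines. It prints every name that appears in a reference directory (for example an NFSv3 mount, where the file is visible) but not in the directory under test (the NFSv4 mount).

```shell
#!/bin/sh
# Print names that exist in directory $1 (reference) but not in
# directory $2 (under test). Assumption: no newlines in file names.
missing_entries() {
    ref_dir=$1
    test_dir=$2
    ref_list=$(mktemp) || return 1
    test_list=$(mktemp) || return 1
    # ls -1A lists one entry per line, including dot files.
    # LC_ALL=C gives the plain byte-order sort that comm expects.
    ( cd "$ref_dir" && LC_ALL=C ls -1A ) | LC_ALL=C sort > "$ref_list"
    ( cd "$test_dir" && LC_ALL=C ls -1A ) | LC_ALL=C sort > "$test_list"
    # comm -23 keeps only the lines unique to the first file.
    LC_ALL=C comm -23 "$ref_list" "$test_list"
    rm -f "$ref_list" "$test_list"
}

# Example call (hypothetical mount points):
#   missing_entries /mnt/export-v3/include /mnt/export-v4/include
```

An empty result means both mounts list the same names; any name printed is one the NFSv4 mount fails to show.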
I did some more tests by taking my server box offline this weekend.
I updated the (client) 11.4 box to kernel 2.6.37.3 (from 2.6.37.1) -> no change
I updated the (server) 11.2 box to kernel 2.6.37.3 (from 2.6.33.6) -> no change
On the server I updated rpcbind/tirpc/nfs-client/nfs-kernel-server/idmap to the same versions as on the client -> also no change.
I ran a filesystem check on my server (ext4, by the way): no errors and no change.
With NFSv4 I can’t see certain files on my server from the 11.4 client, while this works nicely with e.g. my 11.0 client boxes.
I am going nuts on this… Has anyone made any similar observations in the meantime?
Another update on this; maybe I am getting closer now…
I have now noticed that one thing changed during my updates.
I can now blindly access the invisible files using e.g. cat or stat, but some files still do not show up in ls.
When I copy my set of header files to a different location on my server (using cp -a), the problem is copied along with them.
I started to play with ownerships and noticed a change when changing the group ownership.
All files in this folder belong to a certain user/group. All have the same permissions, 0644 (-rw-r--r--). The folder itself has 0755 (drwxr-xr-x). The folder originally belongs to a user with ID 5000; the group ID is 1001. My users and groups are stored in LDAP. When I change the group ownership of that folder on the server to a group stored in LDAP with an ID < 1000, it works: I can see all files after that. When I change the group ownership of the folder to a group in LDAP with an ID >= 1000, some files appear to be missing. When I change the group ownership to an unknown ID (< 1000 or >= 1000), it appears to work, too.
Changing the user ownership does not matter, as far as I can tell so far.
It is sufficient to change the group ownership of the folder itself. I do not need to change the ownership of the files in the folder.
Still very strange. I am now looking more closely at the LDAP settings and the idmapper.
I also cannot see any pattern in which files are visible and which are not. The only thing I can tell is that it is always the same files that are invisible. The behaviour is consistent.
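To keep track of which files are affected, something like the following could help. It is a rough sketch under my own assumptions: the function name classify_entries and the status labels are invented for illustration, and it expects a list of the names that should be there (for example produced with ls on the server) on standard input. For each name it reports whether the directory listing shows it, whether only a direct lookup finds it (the "cat or stat works, ls does not" case), or whether it cannot be reached at all.

```shell
#!/bin/sh
# Classify each expected name (read from stdin) against directory $1:
#   listed     the directory listing (ls) shows it
#   stat-only  ls does not show it, but a direct lookup succeeds
#   missing    neither the listing nor a direct lookup finds it
classify_entries() {
    dir=$1
    listing=$(mktemp) || return 1
    ( cd "$dir" && LC_ALL=C ls -1A ) > "$listing"
    while IFS= read -r name; do
        # -F fixed string, -x whole line, -q quiet
        if grep -Fxq -- "$name" "$listing"; then
            printf 'listed\t%s\n' "$name"
        elif [ -e "$dir/$name" ] || [ -L "$dir/$name" ]; then
            printf 'stat-only\t%s\n' "$name"
        else
            printf 'missing\t%s\n' "$name"
        fi
    done
    rm -f "$listing"
}

# Example call (hypothetical file and path):
#   classify_entries /mnt/export-v4/include < names-from-server.txt
```

Running it before and after a change of group ownership would show whether the same names move between "listed" and "stat-only".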
I know this won’t be very productive, but I have the same problem (only on openSUSE 11.4) with a NIS system. So it has nothing to do with your LDAP.
I’ll try to change the GID tomorrow.
BTW, by googling around I found that the problem seems to be related to readdir and that patches are on their way ([\[PATCH 2/2\] NFS: NFSv4 readdir loses entries – Linux NFS](http://www.spinics.net/lists/linux-nfs/msg19019.html)).
Thanks for your reply… I am glad to hear that I am not alone with this problem.
Also thanks for the pointer to the patch. I will give it a try hopefully tomorrow morning.
Another thing I found out:
When I change the owner of the files/folder to a user who is not a member of the group that owns the files/folder, it also seems to work.
Edit: I have just seen that kernel 2.6.37.4 was released a few minutes ago. From the changelog it appears to contain the fix for this problem.
It is solved now.
I just compiled kernel 2.6.37.4 on my 11.4 box and it finds all files from my test set again!
So everyone using NFSv4 on openSUSE 11.4 is well advised to update to at least kernel 2.6.37.4. Otherwise you risk incomplete file copies, because files the client cannot see are silently left out.
I hope openSUSE itself will soon distribute a replacement kernel as an update. The problem should be present in all kernels from 2.6.37 through 2.6.37.3.
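For anyone who wants to check a number of client boxes quickly, here is a small sketch. It only encodes the range suspected in this thread (2.6.37 through 2.6.37.3), which is an assumption based on my tests and not a confirmed list of affected kernels. The function name in_suspect_range is made up, and it assumes that any distribution suffix in the release string starts at the first "-".

```shell
#!/bin/sh
# Return 0 if the kernel release string $1 falls into the range this
# thread suspects (2.6.37 through 2.6.37.3), 1 otherwise.
# Assumption: the range itself is a guess from the thread.
in_suspect_range() {
    release=${1%%-*}    # drop everything from the first "-" onwards
    case $release in
        2.6.37|2.6.37.[0-3]) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use on the running kernel:
#   if in_suspect_range "$(uname -r)"; then
#       echo "kernel is in the suspected range; check NFSv4 listings"
#   fi
```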
I am a bit astonished that this (in my eyes major) problem slipped through the openSUSE quality assurance cycle. It is sad that this problem taints the otherwise very good 11.4 release.
I haven’t finished all my tests yet, but everything looks very promising. If there is any news on this, I will post here again.
I’ll try to remember how to compile a kernel. It has been quite a long time since I last did that. I hope a kernel patch will arrive before I start compiling.
Indeed, it is quite astonishing that this kind of problem could slip through in such a distribution.
NFS is not the only glitch: Samba is not working well because AppArmor is broken, and then there are the usual NVIDIA problems…
Daniel
EDIT: Yes, changing the group to one with an ID < 1000 seems to work, but I have 24TB of files to change…
cd /usr/src
tar xvfj /path/to/linux-2.6.37.4.tar.bz2
rm -rf linux
ln -s linux-2.6.37.4 linux
cd linux
zcat /proc/config.gz >.config
make oldconfig
(answer all questions regarding new functionality)
make
make modules_install
make install
edit /boot/grub/menu.lst
(to match your boot options and to make the labels in the comments unique for YaST)
reboot
Regarding Samba: in over 10 years of using SUSE I have always done fine with compiling Samba on my own.
Thanks for the tip. It has been such a long time since my Slackware 3.1 days!
Apart from a slower “initrd”, a missing “preloadtrace.ko” and AppArmor being unable to start, things are working fine. At least I can get by until an “official” SUSE-flavored kernel appears (soon, I hope).
I’ll try that later, as it is less important for the time being. Anyway, as AppArmor is dead, there is no problem with Samba…
On 03/15/2011 04:06 AM, Real Rosch wrote:
> I am a bit astonished that this (in my eyes major problem) slipped thru
> openSuse quality assurance cycle. It is sad that this problem taints the
> elsewise very good 11.4 version.
Did YOU test the MX or RCX versions? In openSUSE, we rely on the users to help
with the testing.
As far as I can remember I gave M4 a short try, but did not spot any severe problems in that area. But it was only a short test drive. Maybe I did not notice the problem because there are no error messages or syslog entries. There are “just” some missing files after a cp -a or cp -r. Or maybe the problem was not present in that milestone.
Don’t get me wrong. I love openSUSE and appreciate all the hard work of all contributors. I have used SUSE Linux since version 5.2 but cannot remember any final release hitting me in such a central part of my work scenario. I simply did not expect that to happen. Maybe it was just bad luck. Apart from this, 11.4 looks very well done to me.
I hope there will soon be a kernel update distributed fixing this for all users.
On 03/15/2011 04:36 PM, Real Rosch wrote:
> I hope there will soon be a kernel update distributed fixing this for
> all users.
Kernel 2.6.37.4 is propagating through the system right now. I’m not sure how
soon it will be available, but it will be soon.
Did you see any problem with CUPS? I can’t use the management interface (http://localhost:631), and the YaST interface seems to be broken for me when using this vanilla kernel.
Here the CUPS admin interface listens and responds on 127.0.0.1:631 but not on external interfaces (as configured in /etc/cups/cupsd.conf).
YaST works here in both console and GUI mode.
Maybe you broke something when answering the questions in make oldconfig?
But anyway lwfinger promised that there is a new “official” kernel in the pipeline. Maybe you should give this version a try then?
Roland
PS: I am currently also testing kernel 2.6.38 on a different machine. The NFSv4 bug is also fixed there. And 2.6.38 is really noticeably faster in filesystem lookups, but this is off-topic for this thread.