Broken between 20200930 (working) and 20201209 (not working).
I create a network namespace via a script, run a VPN inside it, and open specific applications within that namespace. After the update, commands like ping and dig do not work.
ping 8.8.8.8
ping: socket: Operation not permitted
Running commands as root (or even sudo) within the namespace works as expected. The routing and everything remains unchanged and works properly.
Looking at the changelogs of various RPMs pertaining to networking, the only bit that seems relevant is iputils, with the following:
Wed Oct 7 12:16:44 UTC 2020 - Matthias Gerstner <matthias.gerstner@suse.com>
- No longer invoke permissions macros for ping. It now uses ICMP_PROTO sockets
(bsc#1174504).
Seems like that might be part of it, but it doesn’t explain why other tools had access within the namespace before (like even web browsers).
If it were as simple as providing cap_net_raw to my user, that would be workable, but one cannot add such capabilities to users (without custom modules), and adding them to every executable I would want to use is definitely not realistic or logical. Using capsh does not seem to work either.
What is the expectation here? I do not want to run applications as root just to use a network namespace. I can post the full script, but it really should not be relevant since the namespace works and the issue is lack of capabilities.
As insane as the following looks this was the line of thinking I ended up with (tried numerous variations).
Access to ICMP_PROTO sockets is controlled by the net.ipv4.ping_group_range sysctl. It is reset to its default in a new net namespace. The default (“1 0”, an empty range) does not allow any user, even root. If ICMP_PROTO fails, ping tries raw sockets as a fallback, which explains why root works.
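A minimal sketch of that check, assuming a namespace named vpnns (a hypothetical name; substitute your own). The first helper only tests whether a group ID falls inside a ping_group_range value; the second shows how the range could be widened inside the namespace as root.

```shell
# Succeeds if GID lies within a "LOW HIGH" ping_group_range value.
# The kernel default "1 0" is an empty range, so nobody matches.
ping_range_allows() {
    gid=$1
    set -- $2            # split "LOW HIGH" into $1 and $2
    [ "$gid" -ge "$1" ] && [ "$gid" -le "$2" ]
}

# Widen the range inside namespace $1 (run as root).
allow_ping_in_ns() {
    ip netns exec "$1" sysctl -w net.ipv4.ping_group_range="0 2147483647"
}

# Example ("vpnns" is a hypothetical namespace name):
#   allow_ping_in_ns vpnns
```

Since the setting is per namespace, it would presumably have to be applied each time the namespace is re-created.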
Provide the full output of “strace -f dig …” in the namespace; this may give some clue as to why it fails (upload to https://susepaste.org/).
That is an entirely different issue from ping (which is why saying “nothing works” and providing only the output of ping is highly misleading). dig tries to connect to 8.8.8.8 and 192.168.7.1 and gets a connection timeout. And dig provides you with a pretty clear diagnostic message about what happens.
This appears to be a different problem than the one I encountered a few days ago. I have not touched the machine, so that is something else to explore.
The network namespace is brought up via script that is even tracked via git. Nothing changed in the script, but post-update neither/nothing works. I understand the vagueness issue, but clearly something substantial changed in the update. I can post more things as they are useful, but certainly not my area of expertise.
If dig fails in a different way than ping I fail to see how that invalidates the original premise. I appreciate your attention thus far.
Dig does not work with sudo either. Either my memory is faulty or that changed.
$ sudo dig opensuse.org @8.8.8.8
; <<>> DiG 9.16.8 <<>> opensuse.org @8.8.8.8
;; global options: +cmd
;; connection timed out; no servers could be reached
Is there an “obvious” solution when sudo ping 8.8.8.8 works? Some networking stack change of which I am unaware? Does sudo ping effectively escape the namespace by using ICMP_PROTO?
I’ll need to check my bash history, but I also managed to have dig/git/firefox working on Friday, but have not managed to repeat it.
Now if one protocol/port works and another protocol/port does not, the first obvious suspect is the firewall. Show the output of
ip l
ip a
ip r
ip -6 r
iptables -L -n -v
ip6tables -L -n -v
in your namespace. Capturing traffic on your physical interface while testing dig in the namespace could give some more hints (use tcpdump, wireshark, tshark, dumpcap or whatever you are familiar with).
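One way to gather all of that in one go, as a rough sketch. The namespace name vpnns, the interface name eth0, and the RUN override variable are assumptions for illustration, not part of any existing script.

```shell
# Run the diagnostic commands listed above inside namespace $1.
# Set RUN to override the command prefix (e.g. RUN=echo for a dry run).
collect_ns_state() {
    ns=$1
    runner=${RUN:-"ip netns exec $ns"}
    for cmd in "ip l" "ip a" "ip r" "ip -6 r" \
               "iptables -L -n -v" "ip6tables -L -n -v"; do
        echo "### $cmd"
        $runner $cmd 2>&1
    done
}

# Example (as root; "vpnns" and "eth0" are hypothetical names):
#   collect_ns_state vpnns > ns-state.txt
#   tcpdump -n -i eth0 host 8.8.8.8 and port 53
```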
If you’re interested in a fresh set of eyes…
First, although anything is possible, I wonder how likely the network namespace might be broken by the update.
To test whether the update is actually the problem, you can roll back to a time before the update and then either stay or roll forward to undo the rollback once you’ve confirmed what you remember.
Your original problem appears to be different than what you are trying to solve later.
Wasn’t your original problem a permissions problem when running ping, which later seems to have evolved into inspecting firewall configuration and blocked ports, which is something different?
That’s confusing.
You also stated that ping (ICMP) may be behaving differently than other protocols.
If that’s the case, you should decide whether to continue troubleshooting ICMP or to focus on the specific protocol you want to enable. ICMP is not a TCP protocol, so it has different properties and may be implemented differently than the more commonly used TCP/IP protocols.
Once you either pin down exactly what the network namespace problem is (define your problem!) or determine that the problem doesn’t really have to do with network namespaces, I assume you don’t actually want a different network configuration for each of your paired network namespaces (correct me if I’m wrong). You didn’t explain your purpose for using a network namespace, which may affect my understanding of what you’re trying to accomplish. So I’m guessing, at least for the moment, that you only want to confine a User, and that User’s ability to modify the network, to the namespace without modifying the default network settings (from a basic configuration, the User can still set up the VPN, or your script can do so in a separate module).
Regardless of all these questions in my mind, I’d recommend you try setting up a new network namespace pair using your script (in fact, if you feel comfortable enough, please post the script, with private information changed to protect your actual setup). If it works, then your breakage was likely caused by the upgrade or by something random that might never be known for sure. But if it’s still broken, then the commands in your script likely need to be inspected.
My original take was that ping was the core of my problem, so I was trying to add network capabilities to resolve it, but I realized later that multiple problems exist and the primary area I care about is TCP/IP.
I cannot roll back, as I do not have a snapshot containing the prior version (aside from the kernel), and the snapshot has been removed from the TW Snapshots repository, so short of rebuilding it I would be hard pressed to roll back.
My initial expectation was that something specific to network namespaces was changed and was hoping for a gut reaction from someone familiar, but after continued exploration and responses here it seems that is perhaps not the case. I do not believe the firewall is my issue as I have even tried disabling it entirely, but perhaps there is some network magic that does not work as expected.
The goal is to run a VPN for specific applications (shell, browser, remote desktop, etc.) while not affecting the default connection. Since I need to be able to visit the same addresses with and without the VPN simultaneously, no clever routing rules solve the problem, hence the network namespace.
I always rebuild the namespace from the script (which has not changed - it’s in git and synced with a remote server). I’ll take a look at posting the relevant meat of it tomorrow.
Several days ago I did manage to get it working, while playing with capsh and what not. I have a screenshot and witness to verify I am not insane. I thought my problems were behind me, but alas I have not been able to recreate that success. This also leads me to believe either something is borked or I’m a centimeter from getting it working again.
How are you isolating User access? Are you using your network namespaces with VMs or containers, and if so, what container technology?
I’m assuming network namespaces work similarly to how namespaces are used elsewhere, and I’m not familiar with any namespace technology that requires permissions integration. In fact, namespaces in general give you the ability to isolate without having to configure other things like permissions. But user access, e.g. in a container, may be supporting a User security context.
Summarizing at the moment…
I’m assuming that you’re back to your original problem which is that ping and dig both don’t work (doublecheck).
Let’s focus on dig which is likely operating like a typical DNS client, trying to query using UDP port 53.
Verify you’re still getting “no servers could be reached”, which typically means that TCP/IP is generally working but is misconfigured (it can’t find 8.8.8.8 in your example), probably lacking a Default Gateway in this case. You don’t have a more serious network problem like mismatched network configuration settings or no functionality at all, because then a different error would be thrown. It’s also possible that your machine is only looking at its end of the network namespace and is unable to know what’s happening on the other end.
This means you should first verify you have a valid IP address within the namespace, and then verify a working Default Gateway.
And, re-inspect how your network namespace pair is being defined.
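A small sketch of those two checks. The helper just looks for a default route in the output of “ip r”; the namespace name vpnns is a hypothetical placeholder.

```shell
# Reads `ip r` output on stdin; succeeds if a default route is present.
has_default_route() {
    grep -q '^default '
}

# Example (as root; "vpnns" is a hypothetical namespace name):
#   ip netns exec vpnns ip -4 addr show
#   ip netns exec vpnns ip r | has_default_route || echo "no default gateway"
```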
So this is a veth pair, and the system to which the peer interface is connected must perform forwarding. That is really basic network troubleshooting. Check the settings on your peer system. Is forwarding enabled, is the firewall active, etc. If you provide a diagram or a detailed enough description of your network, a more specific answer may be possible.
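For the forwarding part, a quick check on the host side could look like the sketch below. The optional file argument is only there so the helper can be tried against a test file; by default it reads the kernel’s IPv4 forwarding switch.

```shell
# Succeeds if IPv4 forwarding is enabled. The optional argument points
# at a different file; the default is the kernel's own switch.
forwarding_enabled() {
    [ "$(cat "${1:-/proc/sys/net/ipv4/ip_forward}")" = "1" ]
}

# Example, run on the host side of the veth pair:
#   forwarding_enabled || echo "forwarding is off: sysctl -w net.ipv4.ip_forward=1"
```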
I’ll start digging into network config and try to see if there is something that is no longer setup correctly (which assumes something changed).
I have uploaded the meat of the script to pastebin; the rest is management bits, VPN setup, and conveniences for executing GUI and non-GUI applications within the namespace. None of that should be relevant, since the base TCP stack of the namespaced network is not working. Presumably once that works the rest will work, or is a separate problem anyway.
Alright…it’s a combination of things that gets it working again, and this explains perfectly how I had it working once while I was playing with these bits. I would also be curious to understand what changed and whether it was intentional.
If I run:
systemctl stop firewalld
that alone does not solve the problem, but if I re-create the namespace afterwards (could have sworn I did this many times), it works. I can do this repeatedly and it works.
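The order that worked can be written down as a short sketch. The namespace name and the setup script path are placeholders for whatever your own script uses, and RUN is a hypothetical variable that lets you print the commands instead of running them.

```shell
# Stop firewalld first, then tear down and rebuild the namespace.
# $1 = namespace name, $2 = your own setup script (both hypothetical here).
# Set RUN=echo to print the commands instead of executing them.
reset_namespace() {
    ns=$1
    setup=$2
    ${RUN:-} systemctl stop firewalld
    ${RUN:-} ip netns del "$ns"
    ${RUN:-} "$setup"
}

# Example (as root): reset_namespace vpnns ./create-ns.sh
```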
Looking at firewalld changelog I see:
- Remove the patch which enforces usage of iptables instead of
nftables: