route only selected ports over VPN

So I have a VPN that is set up correctly and that works.

The VPN address is 10.8.0.6 and its gateway is 10.8.0.5.

The regular gateway to the internet is 192.168.103.254 (reached through wls1).

My own address is something like 192.168.103.203 (assigned by DHCP).

The main routing table without the VPN is:

default         192.168.103.254 0.0.0.0         UG    1024   0        0 wls1
192.168.99.1    192.168.103.254 255.255.255.255 UGH   10     0        0 wls1
192.168.100.0   *               255.255.252.0   U     0      0        0 wls1

With VPN set up, normally it is:

default         10.8.0.5        0.0.0.0         UG    0      0        0 tun0
default         192.168.103.254 0.0.0.0         UG    1024   0        0 wls1
10.8.0.0        10.8.0.5        255.255.255.0   UG    20     0        0 tun0
10.8.0.1        10.8.0.5        255.255.255.255 UGH   20     0        0 tun0
10.8.0.5        *               255.255.255.255 UH    0      0        0 tun0
<vpn host>      192.168.103.254 255.255.255.255 UGH   0      0        0 wls1
localhost       192.168.103.254 255.255.255.255 UGH   0      0        0 wls1
128.0.0.0       10.8.0.5        128.0.0.0       UG    20     0        0 tun0
192.168.1.0     10.8.0.5        255.255.255.0   UG    20     0        0 tun0
192.168.99.1    192.168.103.254 255.255.255.255 UGH   10     0        0 wls1
192.168.100.0   *               255.255.252.0   U     0      0        0 wls1

I’ve replaced the VPN server’s address with <vpn host> for privacy; that entry is the direct route to the VPN host.

The second default route is NetworkManager’s doing; it won’t let me remove the default route it manages (the one on wls1, the wifi interface).

Actually this question is about not routing over VPN by default, so I will delete the first default route (to the VPN):

…# ip route del default via 10.8.0.5

Now the VPN is practically inactive. What I want is for traffic to certain ports to be routed over the VPN, that is, via tun0 / 10.8.0.5.

The recipes on the internet are fairly simple and consistent with each other, but they don’t work for me.

  1. Add a “mark” to packets destined for certain ports. We take everything except ports 80 and 443 and route that through the VPN:

…# iptables -t mangle -A OUTPUT -p tcp -m multiport ! --dport 80,443 -m state --state NEW -j MARK --set-mark 1

  2. Add a new routing table, or at least use a fresh one. We call it table 1, with the name “open”; the name itself is irrelevant:

…# echo "1 open" >> /etc/iproute2/rt_tables

  3. Add a new default route for packets that are looked up in table 1:

…# ip route add default table 1 via 10.8.0.5

  4. Add a rule that sends marked packets to table 1:

…# ip rule add fwmark 1 table 1

  5. Make sure packets whose source address belongs to wls1 are given a source address of 10.8.0.6 before they are sent out over tun0:

…# iptables -t nat -A POSTROUTING -o tun0 -j SNAT --to-source 10.8.0.6
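
For reference, the steps above can be collected into one small script. This is only a sketch of the commands as described in this post, not a known-good setup. The `DRY_RUN` variable and the `run` and `setup_port_routing` helpers are hypothetical additions for illustration, and by default the script only prints what it would execute. Step 2 is left out because the table is referred to by its number, and `--dports` is used as the documented spelling of the multiport option.

```shell
#!/bin/sh
# Sketch only: addresses and interface names are the ones from this post.
# DRY_RUN, run() and setup_port_routing() are hypothetical helpers.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them (needs root)
VPN_GW="10.8.0.5"         # gateway on the tunnel
VPN_ADDR="10.8.0.6"       # my address on the tunnel
VPN_IF="tun0"
MARK="1"
TABLE="1"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_port_routing() {
    # Step 1: mark the first (state NEW) packet of TCP connections
    # to anything except ports 80 and 443
    run iptables -t mangle -A OUTPUT -p tcp -m multiport ! --dports 80,443 \
        -m state --state NEW -j MARK --set-mark "$MARK"
    # Step 3: default route via the VPN gateway in the separate table
    run ip route add default via "$VPN_GW" table "$TABLE"
    # Step 4: look up marked packets in that table
    run ip rule add fwmark "$MARK" table "$TABLE"
    # Step 5: rewrite the source address of whatever leaves through the tunnel
    run iptables -t nat -A POSTROUTING -o "$VPN_IF" -j SNAT --to-source "$VPN_ADDR"
}

setup_port_routing
```

Run as is, it prints the four commands prefixed with `+`. Setting `DRY_RUN=0` and running it as root would execute them instead.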

At this point you would expect things to work. But the kernel is said to check reverse routes to make sure we are not being spoofed. That is, for every incoming packet it checks whether the packet came IN through the interface that would have been used if a packet to that address were sent *OUT*. It might not take my fancy firewall rules into account while doing that, only the routing table. So if an application binds to 192.168.103.203 (wls1) but its traffic gets rerouted through table 1, the replies arrive on an interface that does not match the normal default route. I’m not sure exactly how this logic works, but it’s called the “reverse path” filter. The guides say:

…# for f in /proc/sys/net/ipv4/conf/*/rp_filter; do echo 0 > "$f"; done

to disable this rp-filtering.
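
To check that this actually took effect, a small sketch like the following could print the value the kernel ends up using for each interface. As far as I understand the kernel's ip-sysctl documentation, source validation on an interface uses the higher of `conf/all/rp_filter` and `conf/<interface>/rp_filter`, so a leftover non-zero value in either place keeps the filter active. `PROC_ROOT`, `effective_rp_filter` and `report_rp_filter` are hypothetical names made up for this example.

```shell
#!/bin/sh
# PROC_ROOT is a hypothetical override so the functions can be tried on a fake tree.
PROC_ROOT="${PROC_ROOT:-/proc/sys/net/ipv4/conf}"

# Print the larger of conf/all/rp_filter and conf/<iface>/rp_filter.
effective_rp_filter() {
    all_val="$(cat "$PROC_ROOT/all/rp_filter")"
    if_val="$(cat "$PROC_ROOT/$1/rp_filter")"
    if [ "$all_val" -gt "$if_val" ]; then
        echo "$all_val"
    else
        echo "$if_val"
    fi
}

# One line per interface directory (0 = off, 1 = strict, 2 = loose).
report_rp_filter() {
    for dir in "$PROC_ROOT"/*/; do
        iface="$(basename "$dir")"
        case "$iface" in all|default) continue ;; esac
        echo "$iface: $(effective_rp_filter "$iface")"
    done
}

# Only report when the sysctl tree is actually present.
if [ -d "$PROC_ROOT/all" ]; then
    report_rp_filter
fi
```

If every interface reports 0, reverse path filtering is off and the dropped packets would need another explanation.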

However, it still does not work. Packets (I always call them packages) do get sent out through the interface (tun0), and replies do come back:

listening on tun0, link-type RAW (Raw IP), capture size 262144 bytes
IP 10.8.0.6.41735 > 85.17.251.149.21: Flags [S], seq 3834239717, win 29200, options [mss 1460,sackOK,TS val 179609710 ecr 0,nop,wscale 1], length 0
IP 85.17.251.149.21 > 10.8.0.6.41735: Flags [S.], seq 138399698, ack 3834239718, win 28960, options [mss 1366,sackOK,TS val 1910536405 ecr 179609710,nop,wscale 7], length 0
IP 85.17.251.149.21 > 10.8.0.6.41735: Flags [S.], seq 138399698, ack 3834239718, win 28960, options [mss 1366,sackOK,TS val 1910536755 ecr 179609710,nop,wscale 7], length 0
IP 85.17.251.149.21 > 10.8.0.6.41735: Flags [S.], seq 138399698, ack 3834239718, win 28960, options [mss 1366,sackOK,TS val 1910537255 ecr 179609710,nop,wscale 7], length 0
IP 85.17.251.149.21 > 10.8.0.6.41735: Flags [S.], seq 138399698, ack 3834239718, win 28960, options [mss 1366,sackOK,TS val 1910538355 ecr 179609710,nop,wscale 7], length 0
IP 85.17.251.149.21 > 10.8.0.6.41735: Flags [S.], seq 138399698, ack 3834239718, win 28960, options [mss 1366,sackOK,TS val 1910540355 ecr 179609710,nop,wscale 7], length 0
IP 85.17.251.149.21 > 10.8.0.6.41735: Flags [S.], seq 138399698, ack 3834239718, win 28960, options [mss 1366,sackOK,TS val 1910544355 ecr 179609710,nop,wscale 7], length 0

You can see the SYN packet being sent and the server (an FTP server) replying with SYN-ACK, hoping to get an ACK back to complete the TCP handshake. But the ACK never gets returned, so the server keeps retransmitting the same SYN-ACK.

For some strange reason all my existing connections with forums.opensuse.org also go through the tunnel at this point, but new connections go through the regular wlan (wls1).

I just have no clue why my routing setup doesn’t work. I have followed all the guides and done everything they say. The kernel must still be dropping packets, and I don’t know why. Meanwhile some connections (currently port 443 traffic to openSUSE sites) still get tunneled over tun0 for no apparent reason :P. But the traffic that should be going over the VPN never gets a usable response.

(Maybe that 443 anomaly is just because the caching proxy server has bound to 10.8.0.6 as well as the other IP …). (Those connections are not being diverted by my rules; it is currently only the non-443, non-80 traffic that gets diverted onto the tunnel.)

Does anyone know what I can do to investigate? Or what the actual solution here is going to be? I thought it was going to be so easy, and this has already taken me many, many hours.
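
A few read-only checks that might narrow this down are sketched below. The sketch only prints the commands, each with a comment on what to look for, so they can be reviewed before being run as root. `TARGET`, `MARK`, `TABLE` and `diagnostics` are hypothetical names for this example; the address is the FTP server from the capture above. One possibility these checks could confirm or rule out, and it is only a guess: the mark rule matches `--state NEW` only, so perhaps just the SYN carries the mark and the later ACK follows the main table out through wls1.

```shell
#!/bin/sh
# Sketch only: prints diagnostic commands instead of running them.
TARGET="${TARGET:-85.17.251.149}"   # FTP server seen in the tun0 capture
MARK="${MARK:-1}"
TABLE="${TABLE:-1}"

diagnostics() {
    cat <<EOF
# policy rules: is there a fwmark rule pointing at table $TABLE?
ip rule show
# contents of the extra table: is the default via 10.8.0.5 really there?
ip route show table $TABLE
# route chosen for an unmarked packet to the server
ip route get $TARGET
# route chosen for a packet carrying mark $MARK
ip route get $TARGET mark $MARK
# packet counters of the marking rule: how many packets were marked?
iptables -t mangle -L OUTPUT -v -n
# packet counters of the SNAT rule
iptables -t nat -L POSTROUTING -v -n
# sockets bound to the tunnel address (the proxy theory)
ss -tn src 10.8.0.6
# does the missing ACK leave through wls1 instead of tun0?
tcpdump -ni wls1 -c 20 host $TARGET and tcp
EOF
}

diagnostics
```

The output is itself valid shell, so something like `diagnostics | sh -x` in a root shell would run the checks in order. The `tcpdump` line comes last because it waits for traffic until it has seen 20 packets or is interrupted.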

No one? I might need to install Kubuntu and see whether I have the same issue there. It might be a kernel problem.

Regards, B.