rsnapshot/rsync: massive performance decrease

michael_of · October 24, 2018, 11:14pm

Hi all,

I have a VPS running Leap 15.0 (originally 42.1, upgraded several times). It works fine.

As not much disk data changes from day to day, I have been running rsnapshot backups for years from my VPS to my Leap 15.0 box at home, or more exactly to a USB 3.0 HDD connected to that box.
This has worked for years with fine network speed.

Starting a while ago, network performance dramatically decreased, to only about 80 KB/s.

I asked in #SUSE, and user “tacit” advised me to check the general TCP speed with netcat. I did, and the general TCP speed is more or less as expected for my VDSL connection (50 MBit/s VPS->box@home, 10 MBit/s box@home->VPS). I also tried the “speedtest” utility from the package “speedtest_cli”, both at home and on the VPS, and the VPS really seems to have 100 MBit/s download and upload bandwidth.

So it seems to be an encryption/ssh/whatever issue. I tried an isolated scp of a large file, and voilà: only ~80 KB/s.

As far as I remember, I haven’t changed anything regarding ciphers etc. since first setting up ssh years ago, so I have NO IDEA why network performance has suddenly decreased that much.
I can’t say exactly when this started, because at first I thought the slow backups were due to temporary network issues.

Any hints what might be wrong? Any hints where to start?

Thanks,
Michael

michael_of · October 24, 2018, 11:34pm

I tested a little more and found out that it’s for sure an ssh issue:

I ran the same scp command several times in parallel, in different bash sessions. Each of them runs at the SAME slow speed, so it’s neither a network issue nor a limited-CPU issue: 10 times 80 KB/s are 800 KB/s >:(
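
For anyone who wants to repeat the parallel test, here is a minimal sketch. The function name parallel_scp, the SCP override variable and the target file names are assumptions for illustration, not the commands that were actually run.

```shell
# Hypothetical helper: start N copies of the same scp transfer in parallel.
# SCP can be overridden (e.g. SCP=echo) to print the arguments instead of copying.
parallel_scp() {
    n=$1; src=$2; destdir=$3
    i=1
    while [ "$i" -le "$n" ]; do
        # each copy writes to its own target file so the transfers do not clash
        ${SCP:-scp} "$src" "$destdir/file.test.$i" &
        i=$((i + 1))
    done
    wait   # block until all background transfers have finished
}

# Example (placeholder host, not executed here):
#   parallel_scp 10 'michael@vserver.domain.tld:~/tmp/file.test' /tmp
```

If every transfer reports the same rate no matter how many run at once, the per-connection speed rather than the total bandwidth is what is limited.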

michael_of · October 31, 2018, 11:52pm

Wow!

My VPS hosting company gave me the hint to make sure that this is not an IPv6-related issue.

And in fact: IT IS an IPv6-related issue!!!

scp -v  michael@<hostname, resolving to IPv6>:~/tmp/file.test .

runs with ~80 KB/sec

whereas

scp -v  michael@<IPv4>:~/tmp/file.test .

runs with more than 5 MB/sec !!!

Any hints why??? Known issue???
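
To compare both protocols against the same hostname, the address family can be forced with the -4 and -6 options of OpenSSH's scp. A minimal sketch; the wrapper name scp_family and the SCP override variable are made up for illustration.

```shell
# Hypothetical wrapper: run the same scp transfer over a forced address family.
# -4 and -6 are standard OpenSSH scp options; host and path are placeholders.
scp_family() {
    family=$1; src=$2; dest=$3
    case "$family" in
        4|6) ;;
        *) echo "usage: scp_family 4|6 SRC DEST" >&2; return 2 ;;
    esac
    # SCP can be overridden (e.g. SCP=echo) to print the arguments only
    ${SCP:-scp} "-$family" -v "$src" "$dest"
}

# Example (not executed here):
#   scp_family 6 'michael@vserver.domain.tld:~/tmp/file.test' .
#   scp_family 4 'michael@vserver.domain.tld:~/tmp/file.test' .
```

This way the hostname stays the same in both runs and only the protocol differs.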

tsu2 · November 1, 2018, 1:06am

Maybe you should start by analyzing the path and router responsiveness by doing an IPv6 traceroute… The problem could be a router and not a problem at either end point.
You could also do a tcpdump and import the dump into wireshark to see if there is any unusual traffic like outright errors or non-responses, re-sends.

TSU
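
A capture along these lines could be sketched as follows. The function name capture_ssh, the interface name and the output file names are assumptions, and so is ssh running on port 22; -i, -w and the ip/ip6, host and port filter primitives are standard tcpdump/pcap syntax.

```shell
# Hypothetical helper: capture the ssh traffic to one host for one address family
# and write it to a file that wireshark can open. Usually needs root.
capture_ssh() {
    iface=$1; host=$2; family=$3; outfile=$4
    case "$family" in
        4) proto=ip ;;    # pcap filter primitive for IPv4
        6) proto=ip6 ;;   # pcap filter primitive for IPv6
        *) echo "usage: capture_ssh IFACE HOST 4|6 OUTFILE" >&2; return 2 ;;
    esac
    # port 22 is an assumption (default ssh port); TCPDUMP=echo prints the arguments only
    ${TCPDUMP:-tcpdump} -i "$iface" -w "$outfile" "$proto and host $host and port 22"
}

# Example (not executed here; stop with Ctrl-C once the transfer is done):
#   capture_ssh eth0 vserver.domain.tld 6 ipv6.pcap
```

One capture per protocol while the same file is copied gives two dumps that can be compared side by side.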

michael_of · November 1, 2018, 4:10pm

TSU, funny:

traceroute6 is much faster than traceroute:


$ time /usr/sbin/traceroute vserver.domain.tld
traceroute to vserver.domain.tld (IPv4), 30 hops max, 60 byte packets
 1  fritz.box (192.168.2.1)  0.529 ms  0.886 ms  0.973 ms
 2  IPv4 (IPv4)  16.881 ms  18.162 ms  20.468 ms
 3  217.0.158.34 (217.0.158.34)  20.377 ms  21.515 ms  22.050 ms
 4  * * *
 5  ae-2-3204.edge7.Frankfurt1.Level3.net (4.69.159.38)  35.324 ms  36.263 ms ae-1-3104.edge7.Frankfurt1.Level3.net (4.69.159.34)  39.290 ms
 6  gw02-n.contabo.net (195.16.162.234)  34.719 ms  26.465 ms  27.070 ms
 7  domain.tld (IPv4)  26.002 ms !X  24.859 ms !X  25.230 ms !X

real    0m5,007s
user    0m0,001s
sys     0m0,007s

$ time /usr/sbin/traceroute6 vserver.domain.tld
traceroute to vserver.domain.tld (IPv6), 30 hops max, 80 byte packets
 1  fritz.box (IPv6)  0.682 ms  1.359 ms  1.749 ms
 2  p......dip0.t-ipconnect.de (IPv6)  17.574 ms  18.128 ms  18.387 ms
 3  dtag-ic-319284-ffm-b4.c.telia.net (2001:2000:3080:104f::2)  20.847 ms  20.834 ms  20.905 ms
 4  ffm-b4-link.telia.net (2001:2000:3080:104f::1)  21.177 ms  22.373 ms  25.193 ms
 5  ffm-bb4-v6.telia.net (2001:2000:3019:6a::1)  29.166 ms  29.924 ms  31.225 ms
 6  nug-d-i40-v6.telia.net (2001:2000:3018:8d::1)  32.046 ms  23.850 ms  24.097 ms
 7  contabo-ic-305268-ffm-b11.c.telia.net (2001:2000:3080:953::2)  74.829 ms  77.704 ms  78.066 ms
 8  domain.tld (IPv6)  80.430 ms !X  79.301 ms !X  80.444 ms !X

real    0m0,106s
user    0m0,000s
sys     0m0,007s

I’ll read up on tcpdump and give it a try. Interesting.

michael_of · November 2, 2018, 2:11pm

Did as recommended: one tcpdump during an IPv4 transfer and one during an IPv6 transfer, with the same file scp-ed both times.

I’m not very familiar with wireshark, but at first glance both dumps look pretty similar: a few re-transmissions for both IPv4 and IPv6, even slightly more for the much faster IPv4 transmission. As far as I can see, no “non-responses”.

No idea what else I might look for.

As said, this looks strange to me:

If I start IPv6 transfers in parallel, ALL of them run at the same slow speed, 1/40 of the IPv4 speed. I just tried 20 parallel IPv6 scps; all together they run at 50% of the speed of a single IPv4 scp.

So IMHO neither a) network bandwidth nor b) CPU power for encryption can be the reason for this issue, which is a mystery to me.
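
The arithmetic behind the 50% figure, as a small sketch (the function name aggregate_percent is made up for illustration):

```shell
# Aggregate speed of N parallel streams as a percentage of a single IPv4 stream,
# when each stream runs at 1/DENOM of the IPv4 speed (integer arithmetic).
aggregate_percent() {
    n=$1; denom=$2
    echo $(( 100 * n / denom ))
}

aggregate_percent 20 40   # prints 50
```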

michael_of · November 14, 2018, 11:36am

Finally I’ve got an idea what might be wrong.

In addition to the VPS itself, my VPS hosting company offers a couple of rescue systems: for years now the Linux-based SystemRescueCD, and quite recently a Debian 9 Livesystem and Clonezilla. So I stopped my VPS, started the Debian 9 Livesystem, configured the missing IPv6, and tested. To be sure that it is not an openSUSE issue, I tested not only with my Leap 15 box@home but also with my Android smartphone:

→ Same results, IPv6 much, much slower than IPv4

Armed with this, I told my hosting company’s tech support that they no longer need root access to my running VPS, which I don’t like giving. Instead they can test with the Debian 9 Livesystem. And they did, and found out that the IPv6 bottleneck is ONLY between the hosting company’s carriers (Level3/CenturyLink, Telia, Versatel) and my telco provider at home, Deutsche Telekom. They informed me, late but better than never, that the peerings between Deutsche Telekom and their carriers are heavily overloaded, especially in the evenings, which explains what I have been seeing.

I take everything up to this point to be fact; what follows should be read with reservation. From my hosting company’s point of view, eliminating these bottlenecks is the responsibility of the major German ISP, Deutsche Telekom, which seems unwilling to do so and instead “offers” direct peering agreements at prices much higher than usual in the market.

If this is true, IMHO this would be a clear breach of net neutrality. I’ve asked Deutsche Telekom for a response.

glistwan · November 14, 2018, 4:22pm

Interesting problem, and thank you for the follow-up. I wonder how this story will end.