I am using openSUSE 11.2 and had the same issue once before on openSUSE 11.1.
After some time, two domains I use very often can’t be accessed any more: Google and api.openoffice.org (I have been doing some OO development recently). At the time of writing, only these two are “blocked”.
When I try to access these two domains using Firefox, w3m or telnet, it tries to open the connection forever. Ping works fine.
I tried opening the IP directly in the browser, but got the same problem: the connection is never established.
I know it’s not a router issue, as the other computers can access these domains. Still, I tried to restart the router, but the issue persists.
I tried restarting the network service (/etc/init.d/network stop + start), also nscd, without success.
The only way I found to make it work again is to reboot.
This problem reappears from time to time, after using the computer for a long time and accessing these domains many times.
Note that I also use suspend to disk, so the “blocked” state is kept after resume. Only reboot “cures” it.
I can ping both the google.com and its IP.
But if I open both using firefox, w3m or even telnet, the connection is never established.
At one point I thought that maybe it was an old cached IP, so I pinged Google from another computer and used that IP on my computer. Same issue.
I tried disabling the firewall and unloading all the firewall modules, but I still cannot connect to Google.
What is even stranger is that it’s always the same domains that don’t work. Right now it is google.com, google.de, google.fr, my IMAP server on Yahoo and the URL of Bejeweled Blitz on Facebook. It is almost as if something knows the domain names I use frequently and somehow blocks them.
It’s not a DNS issue, as the IPs can be resolved. When I try to telnet, it stays stuck at the connection stage.
I also tried to run wireshark to see what’s going on: one suspicious thing is that I get many HTTP RST packets followed by matching “TCP Previous segment lost”. I have no clue what is causing the reset here.
After reboot it will work fine again.
My wife’s Ubuntu doesn’t have this problem, she can access those domains while I can’t.
Another thing I found out with Wireshark is that Google actually responds. My computer sends one SYN packet and I get about 4 ACK packets, then my computer sends a SYN packet again, followed by 2-3 ACK packets from Google, and so on, until it gives up. For some reason, it seems that my computer doesn’t acknowledge the ACK packets from Google, or maybe there are too many of them?
I tried the same with Yahoo, which still works, and I saw that Yahoo only sends a single ACK packet, then the TCP connection continues normally.
The issue appears both when using the wireless network card and when using a network cable connected to my router (which is a TP-LINK WR340G).
As someone advised, I changed the MTU to a smaller value, like 1490 instead of 1500. It made Google work for about two minutes, then it was broken again.
I gave up and rebooted. Next time it happens again, if I find some new information I will post it here. I hope one day I can solve this mystery.
A suggestion which may or may not help…
When I’m developing, I disable caching everywhere I can… At the browser, any proxy caching (if you’re behind a Proxy Firewall).
Changing the MTU might have an effect only if you (or someone in your Internet chain of routers) are using very old hardware… about 7 or more years old. All newer hardware (that isn’t faulty) shouldn’t have “old” MTU settings.
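To confirm which MTU is actually in effect before and after such a change, the value can be read from sysfs. This is only a minimal sketch: the helper name `read_mtu` is made up for illustration, and it assumes a Linux system exposing `/sys/class/net/<interface>/mtu` (the interface name `wlan1` is taken from the tcptraceroute output in this thread).

```python
from pathlib import Path


def read_mtu(interface, sys_net="/sys/class/net"):
    """Return the MTU (in bytes) that the kernel reports for a network interface.

    `read_mtu` is a hypothetical helper; `sys_net` is a parameter only so the
    function can be pointed at a different directory when testing.
    """
    mtu_file = Path(sys_net) / interface / "mtu"
    return int(mtu_file.read_text().strip())
```

Calling `read_mtu("wlan1")` on the affected laptop would show whether the smaller value is still set at the moment the connections start failing again.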
If it really is only these two domains, have you done a traceroute? Or, more precisely, you can do a similar analysis using something like tcptraceroute, which can test the route for the specific protocol. You might be able to pinpoint the specific hardware or point of congestion…
Yes it is always the same series of domains that are not accessible: www.google.*, www.tp-link.com, pop.yahoo.com and some others I forgot.
The problem doesn’t appear right now as I have booted recently. As soon as it appears again, I will try traceroute and tcptraceroute. I will also disable the cache in Firefox next time when testing.
Note that I’m not using any proxy or firewall (disabling the SUSE firewall didn’t fix the issue). Also, I had already disabled IPv6, in case it was related.
In the three-way handshake used to initiate a TCP/IP connection, A (you) sends a segment with the SYN bit set, B responds with SYN,ACK, and A then sends ACK (but no more SYN). At this point, the connection is established and data can be transferred.
If you keep sending SYN packets, the connection with the other end has not been established successfully.
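Since ping only exercises ICMP, a quick way to test whether the handshake itself completes is to attempt a plain TCP connection with a timeout. The following is a rough sketch, not a diagnostic tool; the function name `tcp_handshake_ok` and the 5-second default are assumptions chosen for illustration.

```python
import socket


def tcp_handshake_ok(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) is established in time.

    create_connection() only returns once the three-way handshake has
    completed; a timeout, a refused connection or a failed name lookup all
    raise an OSError subclass, which is reported here as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

On an affected machine one would expect `tcp_handshake_ok("www.google.com", 80)` to come back False after the timeout while ping to the same host still succeeds.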
vincent:~ # tcptraceroute www.google.com
Selected device wlan1, address 192.168.1.3, port 48557 for outgoing packets
Tracing the path to www.google.com (74.125.77.147) on TCP port 80 (http), 30 hops max
1 * * *
2 * * *
(a few more with three stars)
13 * * *
14 ew-in-f147.1e100.net (74.125.77.147) [open] 59.605 ms 59.721 ms 61.866 ms
vincent:~ # ping www.google.com
PING www.l.google.com (74.125.77.99) 56(84) bytes of data.
64 bytes from ew-in-f99.1e100.net (74.125.77.99): icmp_seq=1 ttl=56 time=61.9 ms
Like I said, ping works but I can’t connect using http or https.
@malcolmlewis: I tried using the Google DNS. The result is the same as above, but with a different google IP.
@please_try_again:
Here the three way handshake doesn’t complete, it does this:
1. A (my computer) sends a SYN packet.
2. B (Google server) responds with SYN, ACK.
3. B (Google server) responds again with SYN, ACK multiple times.
4. A sends a RST packet.
5. A sends SYN again from the same outgoing port.
6. Then B answers again, etc.
7. After some time, A will retry with another outgoing port.
Normally, A would answer ACK to B before step 3.
It is almost like my host ignores the response from step 2.
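The difference between the healthy and the broken sequence can be written down as a small classifier over the packets seen in a capture. This is only an illustrative sketch: the `(sender, flags)` tuples are a made-up simplification of what Wireshark shows, and `classify_handshake` is a hypothetical name, not part of any tool.

```python
def classify_handshake(packets):
    """Classify a simplified packet trace of a TCP connection attempt.

    packets: iterable of (sender, flags), where sender is "A" (client) or
    "B" (server) and flags is a set such as {"SYN"} or {"SYN", "ACK"}.
    Returns "established", "reset" or "incomplete".
    """
    syn_sent = False
    synack_seen = False
    for sender, flags in packets:
        flags = frozenset(flags)
        if sender == "A" and flags == {"SYN"}:
            # A new attempt starts; forget any earlier SYN,ACK.
            syn_sent = True
            synack_seen = False
        elif sender == "B" and flags == {"SYN", "ACK"} and syn_sent:
            # Duplicates of SYN,ACK simply keep this state set.
            synack_seen = True
        elif sender == "A" and flags == {"ACK"} and synack_seen:
            return "established"
        elif "RST" in flags and syn_sent:
            # A reset arrived before the final ACK was ever sent.
            return "reset"
    return "incomplete"
```

With this model, the Yahoo trace (SYN, then SYN,ACK, then ACK) classifies as "established", while the Google trace described above ends in "reset" because A never sends the final ACK.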
If I compare the wireshark traffic with Yahoo, Yahoo has a clean handshake without any duplicate packets.
It is strange that I have to reboot to fix the issue. It is almost as if the kernel itself (the network part?) were in a bad state that can’t be fixed by unloading/reloading modules.
Well,
That’s pretty obvious as a starting point… Your problem isn’t even getting past your local Internet Gateway although your DNS is working.
Try using traceroute, too.
I suspect either hardware failure or a firewalling problem at your Gateway blocking TCP packets, particularly if traceroute works, since neither ping (ICMP) nor traceroute (UDP or ICMP probes by default) uses TCP packets.
You didn’t describe specifics about your IG. Do you manage it? Do you know if your successes/failures happen regularly at specific times?
Traceroute gave me results similar to tcptraceroute.
This is a home network, and I use a TP-Link router connected to my ISP using a DSL modem. I’m pretty sure the problem is not my ISP because before I was living in another country and used the same router, and got the same issue.
My TP-Link gateway/router doesn’t have firewalling enabled, and no NAT or anything else in particular. I also upgraded the firmware after moving but it didn’t fix the issue. And even if the router were the cause, restarting it (unplug AC cable, wait a few seconds, plug it again) doesn’t help: the problem persists until I reboot my laptop.
Disabling the SUSE firewall also doesn’t solve it.
If it were the gateway blocking packets, then Wireshark should still show my laptop sending out an ACK packet after receiving SYN, ACK from Google. But it doesn’t, so the problem is likely on my computer.
The problem seems to appear at least once per day now, often around 6-7 PM. It could be after my laptop has been running for a certain time. It also seems to occur mostly when my wife is also connected to the network for a certain time.
I’ll try to find out more precisely when it happens.
That means the problem is in the transport layer (TCP), not in the Internet layer (IP).
Packets might be rejected if one end is too fast or too slow for the other end. Something like that could happen between a fast router and a slow network card, or between a fast gateway and a slow router. You might have a bottleneck somewhere. Also, “clever” routers can do packet filtering and queuing to give some services or protocols precedence or priority (VoIP, for example); even the handshake can be handled in a separate queue. That could drastically increase … or decrease network performance.
At this point I generally advise weighing the time (and instructional benefit) of further investigation against simply buying a replacement router (often about $60 US), in the hope that it fixes things, although that is hit and miss.
Once you’ve established it’s a Layer2 vs Layer3 issue, troubleshooting becomes more technical and difficult (IMO). Your current tests have verified that the issue is definitely within your network (router/client) and not in your ISP’s equipment.
The problem seems to be independent of my wife’s laptop being in the network. It occurred to me once or twice while she was not in the network.
I tried checking the uptime when the issue occurs, but it is not consistent. It seems it happens after at least 10 hours of uptime, whether or not split by hibernate.
Thanks for the suggestion Tony, I will consider buying a new router later, maybe one that runs Linux.
The thing I can’t understand is why when I reboot the router the problem still exists. Maybe the router is messing up my laptop’s network stack/modules and leaves them in a buggy state.
I have some new information. I’ve got another laptop running openSUSE 11.3 and that has the exact same problem, but it doesn’t happen at exactly the same time. For example, while one might have trouble connecting to Google as described above, the other one still can.
One thing both laptops have in common is that they both have the following wireless network card:
04:00.0 Network controller: Intel Corporation WiFi Link 5100
using the iwlagn kernel module.
The openSUSE version is the same (11.3), at the same update/patch level.
I haven’t tried yet whether the problem also occurs with a network cable on the other laptop.
On 2011-01-18 20:36, PVince81 wrote:
>
> I have some new information. I’ve got another laptop running openSUSE
> 11.3 and that has the exact same problem, but it doesn’t happen exactly
> at the same time. For example while one might have trouble connecting to
> google like described above, the other one still can.
>
> One thing both laptops have in common is that they both have the
> following wireless network card:
> 04:00.0 Network controller: Intel Corporation WiFi Link 5100
> using the iwlagn kernel module.
>
> The openSUSE version is the same (11.3), at the same update/patch
> level.
Put another Linux distro on one laptop and try to reproduce the problem. If
that laptop does not have problems, it is an openSUSE problem.
Otherwise, router or card - perhaps.
–
Cheers / Saludos,
Carlos E. R.
(from 11.2 x86_64 “Emerald” at Telcontar)
Try this, which changes the kernel’s Congestion Control algorithm to one that’s better suited to handling the packet loss typical of wireless links. If it improves your situation, it’s only a mitigation. If your wireless connection is line of sight (no walls, floors, etc.) with no electromagnetic interference and you still see an improvement, you should consider new hardware.
First, you might consider installing and running nTop, then launching “http://localhost:3000” as one way to identify packet loss and current throughput.
I’m days away from publishing the following procedure, here is an excerpt from my current draft, but specifying the veno algorithm recommended for wireless…
Although the new algorithm is effective immediately, it can take some time (a minute or a few minutes) for changes to start taking effect. It is best to run these commands immediately after booting and before you start transferring files. The newly specified algorithm is effective only for your current system session (until the next reboot).
Tony
Typical procedure to check and change the Congestion Control algorithm
View currently configured Congestion Control Algorithm
Load new Pluggable Congestion Control Algorithm
View currently configured Congestion Control Algorithm
View currently configured Congestion Control Algorithm
# cat /proc/sys/net/ipv4/tcp_congestion_control
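The same check can be scripted, for example to log the active algorithm each time the problem shows up. This is a minimal sketch, assuming a Linux system; the function names and the `base` parameter are illustrative choices. It also reads the companion file `tcp_available_congestion_control`, which lists the algorithms the running kernel currently offers.

```python
from pathlib import Path

# Directory holding the kernel's IPv4 TCP tunables on Linux.
PROC_IPV4 = Path("/proc/sys/net/ipv4")


def current_congestion_control(base=PROC_IPV4):
    """Return the name of the currently configured algorithm."""
    return (Path(base) / "tcp_congestion_control").read_text().strip()


def available_congestion_controls(base=PROC_IPV4):
    """Return the algorithms the running kernel currently offers, as a list."""
    text = (Path(base) / "tcp_available_congestion_control").read_text()
    return text.split()
```

If the algorithm you want is missing from `available_congestion_controls()`, its kernel module has most likely not been loaded yet.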
Load new Congestion Control Algorithm (effective immediately without reboot)
Using sysctl (preferred method)