Slow DNS queries

I have been having some problems with my installation of openSUSE 11.0 on my laptop… DNS queries are really slow with all applications that make use of DNS.

I have searched around for this problem and it seems that many people reported the culprit to be IPv6. I have disabled IPv6 and the situation has not changed. The output of dmesg does not show anything out of the ordinary. Other system logs do not show anything unusual. I have tried disabling the firewall and that does not make a difference. I have tried different DNS servers and I get the same results. Different networks (home, work, a friend’s house) produce the same results.

If I ping my ISP’s DNS server by IP, the ping starts instantly. If I use the hostname, there is a delay of about 5-10 seconds before it starts. The same happens with any other IP/hostname combination. Other computers on the same network do not have this issue.
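
One way to measure the delay described above without ping in the picture is to time the resolver call on its own. This is only a rough sketch: `time_lookup` is a made-up helper name, and its `resolver` parameter exists solely so the timing logic can be tried with a stand-in function. By default it calls Python's standard `socket.getaddrinfo`, which goes through the same system resolver that most applications use.

```python
import socket
import time


def time_lookup(hostname, resolver=socket.getaddrinfo):
    """Return (elapsed_seconds, error) for a single name lookup.

    `resolver` is a hypothetical hook for testing; the default is the
    normal system resolver as exposed by the socket module.
    """
    start = time.monotonic()
    try:
        resolver(hostname, None)
        error = None
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        error = exc
    return time.monotonic() - start, error
```

Calling `time_lookup("www.example.com")` a few times in a row (the hostname is a placeholder) should show whether only the first lookup is slow or every one is.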

So… what else can I do or check?

Is nscd not running by some chance?

When I checked YaST->System Services (RunLevel) it was set to run at runlevels 3 and 5, but it was not actually running. I double-checked with ‘ps aux | grep nscd’ just to be sure, and the service was definitely not running.

In YaST I requested the service to start, and it told me that boot.clock needed to be started first because nscd depends on it…

Now things are a bit better. Initial lookups are still slow, but subsequent lookups are being properly cached.

Now for some reason I don’t think this is a permanent fix as boot.clock is only set to run at levels 0 and 6…

That’s normal. boot.clock is only run on bootup and shutdown.

Well, it’s me again…

At first things were working a bit better… then it stopped.

So I dug around again in my logs and found some errors related to nscd…

Jun 26 23:57:45 linux kernel: type=1505 audit(1214524656.878:9): operation="profile_load" name="/usr/sbin/nscd" name2="default" pid=2154
Jun 27 05:45:43 dynames kernel: type=1505 audit(1214559936.867:9): operation="profile_load" name="/usr/sbin/nscd" name2="default" pid=2070
Jun 27 07:41:17 dynames kernel: type=1505 audit(1214566870.995:9): operation="profile_load" name="/usr/sbin/nscd" name2="default" pid=2074
Jun 27 17:07:41 dynames kernel: type=1505 audit(1214600854.155:9): operation="profile_load" name="/usr/sbin/nscd" name2="default" pid=1973
Jun 27 22:46:34 dynames kernel: type=1505 audit(1214621187.595:9): operation="profile_load" name="/usr/sbin/nscd" name2="default" pid=1966
Jun 27 23:48:26 dynames kernel: nscd[3085]: segfault at 7a15 ip b80527c1 sp afe9103c error 4 in nscd[b8042000+1c000]
Jun 30 11:55:17 dynames kernel: type=1505 audit(1214841310.971:9): operation="profile_load" name="/usr/sbin/nscd" name2="default" pid=2013
Jul  3 13:00:48 dynames kernel: type=1505 audit(1215104442.307:9): operation="profile_load" name="/usr/sbin/nscd" name2="default" pid=2076
Jul  3 15:41:12 dynames kernel: nscd[3165]: segfault at e4ea282d ip b805a99d sp afe9a180 error 5 in nscd[b804b000+1c000]
Jul  3 16:28:45 dynames kernel: type=1505 audit(1215116918.099:9): operation="profile_load" name="/usr/sbin/nscd" name2="default" pid=2061
Jul  3 17:39:08 dynames kernel: type=1505 audit(1215121141.594:9): operation="profile_load" name="/usr/sbin/nscd" name2="default" pid=2046
Jul  3 18:35:49 dynames kernel: type=1505 audit(1215124542.195:9): operation="profile_load" name="/usr/sbin/nscd" name2="default" pid=2033
Jul  3 18:45:09 dynames nscd: 6069 invalid persistent database file "/var/run/nscd/services": verification failed
Jul  3 19:30:29 dynames kernel: type=1505 audit(1215127822.704:9): operation="profile_load" name="/usr/sbin/nscd" name2="default" pid=2055
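
Most of the lines above are `profile_load` audit messages (they look like routine AppArmor profile loads), so the actual nscd faults are easy to miss. A small filter along these lines could pull them out; it is a sketch only, `nscd_problems` is a made-up name, and the two patterns are taken from nothing more than the two failure messages visible in this log.

```python
import re

# Patterns based only on the two kinds of failure seen in the log above.
SEGFAULT = re.compile(r"nscd\[(\d+)\]: segfault")
BAD_DB = re.compile(r'invalid persistent database file "([^"]+)"')


def nscd_problems(lines):
    """Return (kind, detail) pairs for log lines that report an nscd fault.

    For a segfault the detail is the PID; for a rejected persistent
    database it is the path of the cache file.
    """
    found = []
    for line in lines:
        match = SEGFAULT.search(line)
        if match:
            found.append(("segfault", match.group(1)))
            continue
        match = BAD_DB.search(line)
        if match:
            found.append(("corrupt-cache", match.group(1)))
    return found
```

Fed the log above, this would report the two segfaults and the rejected `/var/run/nscd/services` file and skip the `profile_load` lines.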

Any ideas?

Try deleting that corrupted cache file and restarting nscd.

On Thu, 2008-07-03 at 22:36 +0000, dcabanaw wrote:
> I have been having some problems with my installation of openSUSE 11.0
> on my laptop… DNS queries are really slow with all applications that
> make use of DNS.
>
> I have searched around for this problem and it seems that many people
> reported the culprit to be IPv6. I have disabled IPv6 and the
> situation has not changed. The output of dmesg does not show anything
> out of the ordinary. Other system logs do not show anything unusual.
> I have tried disabling the firewall and that does not make a difference.
> I have tried different DNS servers and I get the same results.
> Different networks (home, work, a friend’s house) produce the same
> results.
>
> If I ping my ISP’s DNS server by IP, the ping starts instantly. If I
> use the hostname, there is about a 5-10 second delay. The same happens
> with any other IP/hostname combination. Other computers on the same
> network do not have this issue.
>
> So… what else can I do or check?
>
>

More than likely it’s all of the mdns junk. Disable the mdns
stuff and avahi, etc in your runlevel editor. Edit /etc/host.conf
and add:
mdns off

See if that helps.

I have disabled avahi and mdns. I have removed the old cache as well. It seems that the problem is still around.

After browsing the net for a bit (20 or so sites) and using other apps that need DNS, I checked the stats with ‘nscd -g’ and it produces the following:

hosts cache:

            yes  cache is enabled
             no  cache is persistent
            yes  cache is shared
            211  suggested size
         216064  total data pool size
            152  used data pool size
            600  seconds time to live for positive entries
              0  seconds time to live for negative entries
              0  cache hits on positive entries
              0  cache hits on negative entries
              7  cache misses on positive entries
              0  cache misses on negative entries
              0% cache hit rate
              1  current number of cached values
              4  maximum number of cached values
              1  maximum chain length searched
              0  number of delays on rdlock
              0  number of delays on wrlock
              0  memory allocations failed
            yes  check /etc/{hosts,resolv.conf} for changes

services cache:

            yes  cache is enabled
            yes  cache is persistent
            yes  cache is shared
            211  suggested size
         216064  total data pool size
            400  used data pool size
          28800  seconds time to live for positive entries
             20  seconds time to live for negative entries
              0  cache hits on positive entries
              0  cache hits on negative entries
              3  cache misses on positive entries
              1  cache misses on negative entries
              0% cache hit rate
              3  current number of cached values
              4  maximum number of cached values
              0  maximum chain length searched
              0  number of delays on rdlock
              0  number of delays on wrlock
              0  memory allocations failed
            yes  check /etc/services for changes

Under hosts cache, should “no cache is persistent” be yes? If so, where would I change it? It also seems that the cache is not being very useful (nothing but cache misses). In any case something does not seem right.
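
To put numbers on the "cache misses" impression, the hit and miss counters can be summed per cache from the `nscd -g` text. The following is a sketch that assumes the output has the same layout as the listing above; `cache_counters` and `hit_rate` are made-up helper names, not part of nscd.

```python
import re

SECTION = re.compile(r"^(\w+) cache:$")
COUNTER = re.compile(r"^(\d+)\s+cache (hits|misses) on (?:positive|negative) entries$")


def cache_counters(text):
    """Map each cache name to its summed hit and miss counters."""
    stats = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        section = SECTION.match(line)
        if section:
            current = section.group(1)
            stats[current] = {"hits": 0, "misses": 0}
            continue
        counter = COUNTER.match(line)
        if counter and current is not None:
            stats[current][counter.group(2)] += int(counter.group(1))
    return stats


def hit_rate(counters):
    """Fraction of lookups answered from the cache (0.0 if there were none)."""
    total = counters["hits"] + counters["misses"]
    return counters["hits"] / total if total else 0.0
```

For the listing above this gives 0 hits and 7 misses for the hosts cache, and 0 hits and 4 misses for the services cache.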

Another thing that bothers me is that any other computer on the same network has very fast DNS response times even if its caches are cleared. Even this machine had fast response times when it was running XP and SUSE 10. The name servers they all use are only one hop away, and are physically located in the same building… so should I not expect similar performance on my laptop compared to other systems on the network?

Thanks again for all the help so far.

Additional info:

dynames:~ # cat /etc/host.conf 
#
# /etc/host.conf - resolver configuration file
#
# Please read the manual page host.conf(5) for more information.
#
#
# The following option is only used by binaries linked against
# libc4 or libc5. This line should be in sync with the "hosts"
# option in /etc/nsswitch.conf.
#
order hosts, bind
#
# The following options are used by the resolver library:
#
multi on
mdns off

dcabanaw wrote:
> I have disabled avahi and mdns. I have removed the old cache as well. It
> seems that the problem is still around.
>
> After browsing the net for a bit (20 or so sites) and using other apps
> that need DNS, I checked the stats with ‘nscd -g’ and it produces the
> following:
>

IMHO, nscd is evil. Install BIND and run a caching DNS server instead, if
that interests you.

Just my opinion.

I had the same problem with nscd.
It was using both CPUs at full speed,
killing the battery on my laptop.
Installing bind fixes DNS queries,
but nscd also caches password and group
lookups, which can be expensive if you
have a large password file. nscd looked
like it was deadlocked in my case, judging
from the strace output:

strace -f -p 4961

Process 17958 attached with 15 threads - interrupt to quit
[pid 4961] time( <unfinished ...>
[pid 4962] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 4961] <... time resumed> NULL) = 1215219993
[pid 4961] epoll_wait(12, <unfinished ...>
[pid 4963] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 4965] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 17273] futex(0xb7fb1464, FUTEX_WAIT_PRIVATE, 3018, NULL <unfinished ...>
[pid 17958] futex(0xb7fb1464, FUTEX_WAIT_PRIVATE, 3016, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 17958] futex(0xb7fb1464, FUTEX_WAIT_PRIVATE, 3018, NULL^C <unfinished ...>

I will build a debug version to see if I can get more
information.

I will likely set up a BIND caching server… but first I want to figure out why initial queries take so long. My thinking is that if another computer on the network with caches cleared can do a query in less than a second, then my laptop should have similar results on the same network, using the same name servers.

I guess at this point I will need to run a sniffer and see what traffic is actually leaving my laptop and go from there. I am thinking I should disable all DNS caching methods and set up a performance baseline. From there I can start making changes and compare to that baseline.

As far as nscd goes, is there anything else I should do to disable it besides turning it off in the runlevel config?

Again, thank you for all the input and help so far.

Have you tried a direct query against the nameservers in /etc/resolv.conf to see how long it takes?

dig @nameserverip domainname

Make sure you cut and paste the IP addresses from /etc/resolv.conf so that you don’t have any transcription errors. If the problem is slow answers from the name servers themselves, then any kind of caching, nscd or BIND, is just a band-aid. It may require more detective work to work out why DNS queries are slow, though.
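
To avoid transcription errors altogether, the `dig` command lines can be generated from the contents of /etc/resolv.conf. This is a rough sketch; `nameservers` and `dig_commands` are made-up helper names, and the addresses used to try it out should be whatever your own resolv.conf contains.

```python
def nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        # Lines starting with '#' or ';' are comments in resolv.conf.
        if not stripped or stripped.startswith(("#", ";")):
            continue
        fields = stripped.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers


def dig_commands(resolv_conf_text, domain):
    """Build one `dig @server domain` argument list per configured server."""
    return [["dig", "@" + server, domain] for server in nameservers(resolv_conf_text)]
```

Each argument list can be run by hand or passed to `subprocess.run`; the ";; Query time:" line that dig prints shows how long each server took to answer.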

dcabanaw wrote:

>
> I will likely set up a BIND caching server… but first I want to figure
> out why initial queries take so long. My thinking is that if another
> computer on the network with caches cleared can do a query in less than
> a second, then my laptop should have similar results on the same
> network, using the same name servers.
>
> I guess at this point I will need to run a sniffer and see what traffic
> is actually leaving my laptop and go from there. I am thinking I should
> disable all DNS caching methods and set up a performance baseline. From
> there I can start making changes and compare to that baseline.
>
> As far as nscd goes, is there anything else I should do to disable it
> besides turning it off in the runlevel config?
>
> Again, thank you for all the input and help so far.

On the odd chance it might help: I had similar molasses in the system until
I disabled IPv6. The problem was due to the router I use: it presented
itself as the primary DNS server and forwarded requests for what it didn’t
have to the secondary/tertiary DNS servers supplied by the ISP (DSL
connection). I was also able to (finally) find the setting to disable this.
As soon as the router got out of the way, DNS lookups whizzed right through.
Check to see if the DHCP setup is handing out your ISP’s DNS servers
directly; once I got to that point, IPv6 made no difference.


Will Honea

Will Honea wrote:
> On the odd chance it might help: I had similar molasses in the system until
> I disabled IPv6. The problem was due to the router I use: it presented
> itself as the primary DNS server and forwarded requests for what it didn’t
> have to the secondary/tertiary DNS servers supplied by the ISP (DSL
> connection). I was also able to (finally) find the setting to disable this.
> As soon as the router got out of the way, DNS lookups whizzed right through.
> Check to see if the DHCP setup is handing out your ISP’s DNS servers
> directly; once I got to that point, IPv6 made no difference.
>

Also, if you have IPv6 enabled, for every host lookup you’re going
to ask for an AAAA and then an A record. Just the nature of the beast,
I’m afraid.
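
A quick way to see whether the extra AAAA query is what costs the time is to compare an unrestricted lookup with one limited to IPv4. This is a sketch under a few assumptions: `time_by_family` is a made-up name, the `resolver` parameter is only there so the logic can be tried without a network, and the second lookup may be helped by any cache (nscd or otherwise) warmed by the first. With `socket.AF_INET` the resolver is expected to ask only for A records.

```python
import socket
import time


def time_by_family(hostname, resolver=socket.getaddrinfo):
    """Time one lookup for any address family and one for IPv4 only."""
    timings = {}
    for label, family in (("any", socket.AF_UNSPEC), ("ipv4-only", socket.AF_INET)):
        start = time.monotonic()
        try:
            resolver(hostname, None, family)
        except OSError:
            pass  # a failed lookup still shows how long the resolver waited
        timings[label] = time.monotonic() - start
    return timings
```

If "any" takes seconds while "ipv4-only" returns at once, the AAAA side of the lookup is the likely suspect.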

Well, I found the problem… IPv6 was still enabled even though I thought I had turned it off. Initially it would show up as disabled in the config, but after ‘refreshing’ the settings dialog it turned out not to be disabled after all.
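
For anyone else who hits this: rather than trusting the settings dialog, it is possible to ask the kernel which interfaces actually hold an IPv6 address. The sketch below assumes a Linux system that exposes /proc/net/if_inet6 with six whitespace-separated fields per line, the last being the interface name; the helper names are made up for illustration.

```python
def ipv6_interfaces(if_inet6_text):
    """Return sorted interface names that have at least one IPv6 address."""
    names = set()
    for line in if_inet6_text.splitlines():
        fields = line.split()
        if len(fields) == 6:  # address, index, prefix length, scope, flags, name
            names.add(fields[5])
    return sorted(names)


def read_if_inet6(path="/proc/net/if_inet6"):
    """Read the kernel's IPv6 address table; empty if the file is absent."""
    try:
        with open(path) as handle:
            return handle.read()
    except FileNotFoundError:
        # Assumption: no file means the ipv6 module is not loaded at all.
        return ""
```

An empty result from `ipv6_interfaces(read_if_inet6())` would suggest IPv6 really is off; any interface name in the list means it is still active there.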

Thanks for all the help and the input. Some of the input has helped me in other situations as well.