Bind problems - DNS server stopped working

I inherited a SUSE-based DNS server. Unfortunately, I believe it is running SUSE 9.2 or 9.3 home edition, stripped of the GUI desktop (so I have been told). Everything had been working fine for a little over a year, and then it suddenly stopped. I currently see two problems that I do not think are related to each other. The first is that the primary evidently was not updating the secondary DNS server. The other problem (and the most obvious one to fix first) is that this box boots up and appears to be working, except that DNS (BIND) is not doing its job. My only real clue at this time is the warning message on boot that says:

Starting name server Bind9 -warning: /var/run/named/named.pid exists! done

Is this something as simple as removing the named.pid so that when it boots it does not find one in that location, or is this a deeper problem? I do not have a lot of Linux background, as you can probably tell. Any hints or advice would be appreciated. I can connect to the box through telnet, and the network card’s lights are all correctly lit, so I don’t see how the card could be faulty.

putz3000 wrote:
> that this box boots up and appears to be working except that dns (bind)
> is not doing its job. My only real clue at this time is the warning
> message on boot that says:
>
> Starting name server Bind9 -warning: /var/run/named/named.pid exists!
> done
>
> Is this something as simple as removing the named.pid so that when it
> boots it does not find one in that location

It may well be, and there’s no harm in trying.
Do a “ps ax” to see that named is really not running, and then just
“rm /var/run/named/named.pid” followed by “rcnamed start”.

> or is this a deeper problem?

You’ll know it is if the named process still doesn’t run after the above.
But then you’ll probably find more hints in the log file /var/log/messages
about what went wrong.
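
The check described above (is named really gone, and is the pid file just a leftover?) can be sketched as a small shell function. This is only a sketch: the function name pidfile_is_stale is made up for illustration, and it assumes it is run as root so that the test on the recorded PID is not blocked by permissions.

```shell
# Return 0 if the pid file exists but no live process has the recorded PID
# (a stale pid file), and 1 otherwise.
pidfile_is_stale() {
    pidfile="$1"
    # No pid file at all: nothing to clean up.
    [ -f "$pidfile" ] || return 1
    pid=$(cat "$pidfile" 2>/dev/null)
    # Empty or non-numeric content cannot belong to a running process.
    case "$pid" in
        ''|*[!0-9]*) return 0 ;;
    esac
    # kill -0 sends no signal; it only tests whether the PID exists.
    if kill -0 "$pid" 2>/dev/null; then
        return 1
    fi
    return 0
}
```

On the box in question this might be used as "pidfile_is_stale /var/run/named/named.pid && rm /var/run/named/named.pid", followed by “rcnamed start”.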

> I do not have a lot of linux background - as you can probably tell. Any
> hints or advice would be appreciated. I can connect to the box through
> telnet and the network card’s lights are all correctly lit so I don’t see
> how the card could be faulty.

Your problem is certainly not caused by a defective network card.

But while we’re at it, I strongly recommend that you:

  1. reinstall that machine with a current, supported SuSE version;
    there are security holes in the BIND version you are using

  2. switch from Telnet to ssh for connecting to the machine;
    it’s not more difficult to use, and infinitely more secure

HTH
T.

Thank you for the reply. I will give this a try. Just for clarification: I chose my words poorly. I am connecting by SSH, not telnet, and I agree that SSH is infinitely better than telnet from a security standpoint. I also agree that this box needs to be rebuilt with a current OS and a better version of BIND. That is on the docket for early next year (budget issues). But I need to get this one back up for now.

Thank you again.

OK, no problem deleting the named.pid file. I ran “rcnamed start” and everything seemed to go OK, but if I run “ps ax”, there does not appear to be anything related to named at all. The first time, I removed the named.pid, restarted the service, rebooted, and had the same error I described originally. So this time I stopped the service, removed the file, and then restarted the service, but when I run “ps ax”, named is just not there.

I took a look at the message log, and here is the bottom of the log file (server name/public ip edited out):

Dec 1 18:09:40 <server name> named[4938]: starting BIND 9.2.3 -u named
Dec 1 18:09:40 <server name> named[4938]: using 1 CPU
Dec 1 18:09:40 <server name> named[4938]: loading configuration from ‘/etc/named.conf’
Dec 1 18:09:40 <server name> named[4938]: listening on IPv4 interface lo, 127.0.0.1#53
Dec 1 18:09:40 <server name> named[4938]: listening on IPv4 interface eth0, <PUBLIC IP ADDRESS OF SERVER>#53
Dec 1 18:09:40 <server name> named[4938]: command channel listening on 127.0.0.1#953
Dec 1 18:09:40 <server name> named[4938]: command channel listening on ::1#953
Dec 1 18:09:41 <server name> named[4938]: unable to rename log file ‘/var/log/named_querylog.1’ to ‘/var/log/named_querylog.2’: permission denied
Dec 1 18:09:41 <server name> named[4938]: unable to rename log file ‘/var/log/named_querylog.0’ to ‘/var/log/named_querylog.1’: permission denied
Dec 1 18:09:41 <server name> named[4938]: unable to rename log file ‘/var/log/named_querylog’ to ‘/var/log/named_querylog.0’: permission denied

Is the permission denied a good thing, or does this indicate that permissions are all out of whack and, because it cannot rename the file(s), it is unable to run properly?

Well, normally named runs as the named account and, in addition, it runs chrooted, so the actual path is /var/lib/named/var/log/named_querylog. It seems named has lost its rights on /var/lib/named/var/log. You should check that directory.

Also if the pid file is there but the process is not, usually the process has terminated unexpectedly. Fix what problems you can see and go from there.

putz3000 wrote:
> ok, no problem deleting the named.pid file and I ran a rcnamed start and
> everything seems to go ok but if I run the “ps ax”, there does not
> appear to be anything related to named at all.

That would indicate BIND came up but then died rather quickly.

> The first time I removed
> the named.pid and then restarted the service, I rebooted and had the
> same error I described originally. So this time, I stopped the service,
> then removed the file then restarted the service but when running “ps
> ax”, just not there.

That fits the diagnosis above. If the BIND process dies unexpectedly
it leaves the pid file behind for a subsequent run of the service
script to trip over.

> Dec 1 18:09:40 <server name> named[4938]: starting BIND 9.2.3 -u named
> Dec 1 18:09:40 <server name> named[4938]: using 1 CPU
> Dec 1 18:09:40 <server name> named[4938]: loading configuration from ‘/etc/named.conf’
> Dec 1 18:09:40 <server name> named[4938]: listening on IPv4 interface lo, 127.0.0.1#53
> Dec 1 18:09:40 <server name> named[4938]: listening on IPv4 interface eth0, <PUBLIC IP ADDRESS OF SERVER>#53
> Dec 1 18:09:40 <server name> named[4938]: command channel listening on 127.0.0.1#953
> Dec 1 18:09:40 <server name> named[4938]: command channel listening on ::1#953
> Dec 1 18:09:41 <server name> named[4938]: unable to rename log file ‘/var/log/named_querylog.1’ to ‘/var/log/named_querylog.2’: permission denied
> Dec 1 18:09:41 <server name> named[4938]: unable to rename log file ‘/var/log/named_querylog.0’ to ‘/var/log/named_querylog.1’: permission denied
> Dec 1 18:09:41 <server name> named[4938]: unable to rename log file ‘/var/log/named_querylog’ to ‘/var/log/named_querylog.0’: permission denied
>
> Is the permission denied a good thing, or does this indicate that
> permissions were all out of whack and because it can not do any renaming
> of the file(s) it is unable to properly run?

The “permission denied” messages are definitely not good. But what’s
worse is what’s missing: The next lines after the “listening” lines
should be

<timestamp> <server> named[<pid>]: zone <zone>: loaded serial <serial>

for all your zones, followed by

<timestamp> <server> named[<pid>]: running

So it looks like your BIND chokes on those permissions.
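
A quick way to see whether a given start-up got that far is to grep the log for those two kinds of lines. The sketch below assumes the syslog line format shown in this thread; the function names are made up for illustration.

```shell
# Count the "zone ...: loaded serial ..." lines logged by one named process.
named_zones_loaded() {
    logfile="$1"; pid="$2"
    # [[] matches a literal "[" so the PID is matched inside named[PID].
    grep -c "named[[]${pid}]: zone .*: loaded serial" "$logfile"
}

# Succeed only if that named process logged the final "running" line.
named_startup_complete() {
    logfile="$1"; pid="$2"
    grep -q "named[[]${pid}]: running\$" "$logfile"
}
```

For the log excerpt quoted above, something like "named_startup_complete /var/log/messages 4938 || echo 'named never reached running'" would flag the failed start.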

Does your BIND run chrooted? Have a look in /etc/sysconfig/named to
check. If so, the path it tries to use will be prefixed by the chroot
path (probably /var/lib/named). Then check the permissions of that
directory (either /var/log or /var/lib/named/var/log) to see that the
user named, which BIND usually runs as, has write permission.
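
The path translation described above can be sketched as follows. The helper name effective_log_dir is hypothetical, and NAMED_RUN_CHROOTED is assumed to be the name of the relevant setting in /etc/sysconfig/named; check the file itself if it differs on your version.

```shell
# Print the directory named really writes to: the chroot path is prepended
# when named runs chrooted, otherwise the path is used as-is.
effective_log_dir() {
    chrooted="$1"      # "yes" or "no"
    chroot_dir="$2"    # e.g. /var/lib/named
    log_dir="$3"       # path as named sees it, e.g. /var/log
    if [ "$chrooted" = "yes" ]; then
        printf '%s%s\n' "$chroot_dir" "$log_dir"
    else
        printf '%s\n' "$log_dir"
    fi
}

# Example use on the server (shown as comments, not run here):
#   grep NAMED_RUN_CHROOTED /etc/sysconfig/named   # assumed setting name
#   ls -ld "$(effective_log_dir yes /var/lib/named /var/log)"
```

The ls -ld output then shows the owner, group and mode of the directory, which is what needs to allow writes by the user named.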

Btw, the fact that BIND wants to rotate /var/log/named_querylog seems
to indicate query logging is activated. Is that intentional?

HTH
T.


Tilman Schmidt
Phoenix Software GmbH
Bonn, Germany

It does appear to be a permissions issue regarding the query logging. What does query logging do anyhow? My “guess” would be that it logs all of the zone query requests that the server gets, would that be an accurate guess?

Anyhow, I made a copy of the named.conf file then rem’d out (#) all of the query logging lines, saved the file and then restarted the service. All is happy in happy land now. So I know what part of the named.conf file is tripping the process. For now I am working around it, but I will probably have to fix the permissions if we re-enable it before I get a new system built.
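
For anyone retracing this step, a rough way to confirm that no query logging lines are still active is to list the uncommented lines that mention it. This is a sketch only: the function name is made up, it only recognises "#" and "//" comments (not /* ... */ blocks), and the search words "querylog" and "queries" are an assumption about how the lines in a given named.conf are written.

```shell
# Print, with line numbers, the lines of a named.conf-style file that
# mention query logging and are not commented out with # or //.
active_querylog_lines() {
    conf="$1"
    # First grep finds candidate lines and prefixes them with "N:";
    # second grep drops those whose text starts with a comment marker.
    grep -Ein 'querylog|queries' "$conf" \
        | grep -Ev '^[0-9]+:[[:space:]]*(#|//)'
}
```

Running "active_querylog_lines /etc/named.conf" should print nothing once every query logging line has been commented out.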

Thank you all for your help, it helped point me in the right direction.

Yeah, I would guess too that query logging logs queries. ;-)

It’s commented out in the default config, so some sysadmin must have decided to start doing it.

putz3000 wrote:
> It does appear to be a permissions issue regarding the query logging.
> What does query logging do anyhow? My “guess” would be that it logs all
> of the zone query requests that the server gets, would that be an
> accurate guess?

'xactly. It’s actually a debugging option and not really intended to
be permanently active, because it can produce quite a lot of data
and put noticeable additional load on your server.

> Anyhow, I made a copy of the named.conf file then rem’d out (#) all of
> the query logging lines, saved the file and then restarted the service.
> All is happy in happy land now.

Glad to hear.

> So I know what part of the named.conf
> file is tripping the process. For now I am working around it, but will
> probably have to fix the permissions if we re-enable it before I get a new
> system built.

Well, if you really want to activate query logging again, you may
want to redirect the log to a different directory where the user ID
BIND is running under has write permission. But if I may hazard a
guess, more likely than not this was just added for debugging some
time ago and never removed, and nobody will remember why it is
there. Why not just leave it off until somebody complains? :-)

HTH
T.