Unbound cache ttl

Post by **henk** » 30 Jul 2018 08:15

My gateway died last night, resulting in a flood of messages from monitoring. Monitoring works best when you are awake.

After I fixed the gateway, unbound was still unable to resolve DNS queries. No other host had issues, since they use different DNS servers.

I would sure like to know whether anyone has had the same issue and how they fixed it.

Code:

freshclam -v
Current working dir is /var/lib/clamav
Max retries == 3
ClamAV update process started at Mon Jul 30 09:00:50 2018
Using IPv6 aware code
Querying current.cvd.clamav.net
WARNING: Can't query current.cvd.clamav.net
WARNING: Invalid DNS reply. Falling back to HTTP mode.
If-Modified-Since: Wed, 07 Jun 2017 21:38:10 GMT
Reading CVD header (main.cvd): WARNING: Can't get information about db.nl.clamav.net: Name or service not known
WARNING: Can't read main.cvd header from db.nl.clamav.net (IP: )
Trying again in 5 secs...
etc., etc.

Code:

dig db.nl.clamav.net

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.68.rc1.el6 <<>> db.nl.clamav.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 44327
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;db.nl.clamav.net.              IN      A

;; Query time: 4 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Mon Jul 30 09:02:42 2018
;; MSG SIZE  rcvd: 34

The failed updates are the minor issue. The failing RBL lookups are the bigger one, since mail was still being processed in the meantime:

Code:

dig test.uribl.com.multi.uribl.com txt +short

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.68.rc1.el6 <<>> test.uribl.com.multi.uribl.com txt +short
;; global options: +cmd
;; connection timed out; no servers could be reached

DNS only worked again after I restarted unbound.

[root@sansspam scripts]#

Code:

service unbound restart
Stopping unbound:                                          [  OK  ]
Starting unbound: Jul 30 09:04:25 unbound[11758:0] warning: increased limit(open files) from 1024 to 8266
                                                           [  OK  ]
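As a lighter alternative to the full restart shown above, it may be enough to flush only the failure-related state. This is a sketch, not something tested on this box: it assumes remote control is enabled in unbound.conf (control-enable: yes) and that the installed version supports these unbound-control subcommands. The function name and the UNBOUND_CONTROL variable are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: flush the state Unbound may have built up while the
# gateway was down, without discarding the rest of the cache.
# UNBOUND_CONTROL can be overridden, e.g. with "echo" for a dry run.
UNBOUND_CONTROL=${UNBOUND_CONTROL:-unbound-control}

flush_failure_state() {
    # Negative answers (nxdomain, nodata, servfail) held in the cache.
    $UNBOUND_CONTROL flush_negative &&
    # Data marked bogus by the validator.
    $UNBOUND_CONTROL flush_bogus &&
    # Per-server timing and lameness info for all upstream hosts.
    $UNBOUND_CONTROL flush_infra all &&
    # Queries still in flight from before the outage.
    $UNBOUND_CONTROL flush_requestlist
}
```

Calling flush_failure_state as root after the gateway is back would run the four flushes in order and stop at the first one that fails.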

[root@sansspam scripts]#

Code:

dig test.uribl.com.multi.uribl.com txt +short
"permanent testpoint"

[root@sansspam scripts]#

Code:

dig 2.0.0.127.zen.spamhaus.org +short
127.0.0.4
127.0.0.2
127.0.0.10

[root@sansspam scripts]#

Code:

dig @127.0.0.1 db.nl.clamav.net

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.68.rc1.el6 <<>> @127.0.0.1 db.nl.clamav.net
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4125
;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;db.nl.clamav.net.              IN      A

;; ANSWER SECTION:
db.nl.clamav.net.       900     IN      CNAME   db.nl.clamav.net.cdn.cloudflare.net.
db.nl.clamav.net.cdn.cloudflare.net. 900 IN A   104.16.189.138
db.nl.clamav.net.cdn.cloudflare.net. 900 IN A   104.16.186.138
db.nl.clamav.net.cdn.cloudflare.net. 900 IN A   104.16.187.138
db.nl.clamav.net.cdn.cloudflare.net. 900 IN A   104.16.185.138
db.nl.clamav.net.cdn.cloudflare.net. 900 IN A   104.16.188.138

;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Mon Jul 30 09:04:29 2018
;; MSG SIZE  rcvd: 160
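Since only a restart brought resolution back, a small watchdog could catch this state while nobody is awake. The following is a rough sketch under assumptions: the function names, the dns-watchdog log tag and the 3-second timeout are all invented here, and treating every status other than NOERROR or NXDOMAIN as "stuck" is a simplification that could cause needless restarts when the problem is upstream.

```shell
#!/bin/sh
# Hypothetical watchdog sketch; names and thresholds are illustrative.

# Read dig output on stdin and print its status field (NOERROR, SERVFAIL, ...).
# Prints TIMEOUT when dig reported a timeout, UNKNOWN when neither was found.
dns_status() {
    awk '
        /status: /             { sub(/.*status: /, ""); sub(/,.*/, ""); print; found = 1; exit }
        /connection timed out/ { print "TIMEOUT"; found = 1; exit }
        END                    { if (!found) print "UNKNOWN" }
    '
}

# Return 0 (true) when the given status suggests the local resolver is stuck.
resolver_is_stuck() {
    case "$1" in
        NOERROR|NXDOMAIN) return 1 ;;
        *) return 0 ;;
    esac
}

# Query the local resolver for the name in $1 and restart Unbound if it looks stuck.
check_and_restart() {
    status=$(dig @127.0.0.1 "$1" +tries=1 +time=3 2>&1 | dns_status)
    if resolver_is_stuck "$status"; then
        logger -t dns-watchdog "resolver returned $status for $1, restarting unbound"
        service unbound restart
    fi
}
```

Run from cron as, for example, check_and_restart test.uribl.com.multi.uribl.com, this would have restarted unbound once the SERVFAIL or timeout shown above appeared.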

The only cause I can think of is the Unbound cache settings:

cache-max-ttl: <seconds>
Time to live maximum for RRsets and messages in the cache.
Default is 86400 seconds (1 day). If the maximum kicks in,
responses to clients still get decrementing TTLs based on the
original (larger) values. When the internal TTL expires, the
cache item has expired. Can be set lower to force the resolver
to query for data often, and not trust (very large) TTL values.

cache-min-ttl: <seconds>
Time to live minimum for RRsets and messages in the cache.
Default is 0. If the minimum kicks in, the data is cached for
longer than the domain owner intended, and thus less queries are
made to look up the data. Zero makes sure the data in the cache
is as the domain owner intended, higher values, especially more
than an hour or so, can lead to trouble as the data in the cache
does not match up with the actual data any more.

cache-max-negative-ttl: <seconds>
Time to live maximum for negative responses, these have a SOA in
the authority section that is limited in time. Default is 3600.
This applies to nxdomain and nodata answers.

The best fix I can think of is to lower cache-max-negative-ttl to a more reasonable value.
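A minimal sketch of what that change might look like in unbound.conf. The value of 60 seconds is an illustrative assumption, not a tested recommendation, and the other two lines only restate the defaults quoted above. Note that the quoted description says this setting applies to nxdomain and nodata answers, while the dig output earlier showed SERVFAIL and a timeout, so lowering it may not be the whole story.

```
server:
    # Assumed value for illustration; pick what suits your mail volume.
    cache-max-negative-ttl: 60
    # Defaults as quoted above, shown only for context.
    cache-min-ttl: 0
    cache-max-ttl: 86400
```

Running unbound-checkconf before restarting should catch syntax errors in the edited file.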