eFa Filter sudden restarts

wstemb
Posts: 13
Joined: 16 Mar 2022 07:35

eFa Filter sudden restarts

Post by wstemb »

Since the day of installation I have been seeing a lot of "silent", unsolicited restarts of the server. There are many directories in /var/crash, about 3 to 5 new ones a day.
Most of the restarts are so short that users hardly notice them, but some of them delayed messages (the msmilter service was down and the server restarted repeatedly).
I investigated (as far as my knowledge of eFa and MailScanner allows) and the problem seems to be connected in some way to the unbound service: just before the restart, some of the logs contain errors like this:

mail postfix/smtp[65537]: 4KjyMK2nxczN2spt: to=<xxxxxxxxxxxxx>, relay=none, delay=391070, delays=391070/0.03/0/0, dsn=4.4.3, status=deferred (Host or domain name not found. Name service error for name=xxxxxxxxxxx type=MX: Host not found, try again)

I am not sure, but it seems to me that this error occurred when Postfix tried to resend deferred messages from the outbound queue.
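To see whether these deferrals cluster around the restarts, one could count them in the mail log. This is only a minimal sketch: the function name count_dns_deferrals is made up for illustration, and /var/log/maillog is assumed to be the log location (the usual CentOS default, not confirmed in this thread).

```shell
# Count Postfix deliveries that were deferred because of a DNS lookup failure.
# Reads log lines on stdin and prints the number of matching lines.
count_dns_deferrals() {
    # grep -c prints 0 but exits non-zero when nothing matches; "|| true"
    # keeps the function's exit status clean in that case.
    grep 'status=deferred' | grep -c 'Name service error' || true
}

# Usage on the mail host (the path is an assumption):
#   count_dns_deferrals < /var/log/maillog
```

Comparing the count before and after a restart would show whether the DNS failures come first or only appear while the queue is retried.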

No errors on the company firewall at the time of restart.

After the automatic restart, at least after the last two, there was an error message in the log and in the output of systemctl status unbound:

failed lookup, cannot probe to master k.root-servers.net

eFa is set up as a relay at the perimeter, controlling the mail entering and leaving the network, and most of the time it does its job.

MailWatch Version: 1.2.18
Operating System Version: CentOS Stream 8
Postfix Version: 3.5.9
MailScanner Version: 5.4.4
ClamAV Version: 0.103.5
SpamAssassin Version: 3.4.6
PHP Version: 7.2.24
MySQL Version: 10.3.28-MariaDB
GeoIP Database Version: GeoLite2 Country database 2018-06-07 22:38:29

unbound -V
Version 1.11.0

Configure line: --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-pythonmodule --with-pyunbound PYTHON=/usr/libexec/platform-python --with-libevent --with-pthreads --with-ssl --disable-rpath --disable-static --enable-relro-now --enable-pie --enable-subnet --enable-ipsecmod --with-conf-file=/etc/unbound/unbound.conf --with-pidfile=/var/run/unbound/unbound.pid --enable-sha2 --disable-gost --enable-ecdsa --with-rootkey-file=/var/lib/unbound/root.key
Linked libs: libevent 2.1.8-stable (it uses epoll), OpenSSL 1.1.1k FIPS 25 Mar 2021
Linked modules: dns64 python ipsecmod subnetcache respip validator iterator

BSD licensed, see LICENSE in source package for details.
Report bugs to unbound-bugs@nlnetlabs.nl or https://github.com/NLnetLabs/unbound/issues

unbound has the initial configuration plus a localforward.conf in conf.d that defines the local (intranet) DNS queries.

Has anybody experienced something similar?
Walter
shawniverson
Posts: 3644
Joined: 13 Jan 2014 23:30
Location: Indianapolis, Indiana USA

Re: eFa Filter sudden restarts

Post by shawniverson »

It sounds like things are dying on your box before the system crashes. How do your resources look? Any indication in the logs as to why?
pdwalker
Posts: 1553
Joined: 18 Mar 2015 09:16

Re: eFa Filter sudden restarts

Post by pdwalker »

It is really unusual for a CentOS box to restart unexpectedly.

Is the server having hardware problems? Is the system short of memory? Are there any errors before a restart in /var/log/messages?

I think you have to identify and resolve this problem first. The programs that make up eFa are incapable of causing server restarts.
wstemb
Posts: 13
Joined: 16 Mar 2022 07:35

Re: eFa Filter sudden restarts

Post by wstemb »

Resources are not an issue. The eFa filter is running in an ESX VM on a blade server, so as a first step I gave it plenty of computing resources: 4 cores, 16GB RAM.

The number of restarts is very high, but most of them go unnoticed by the rest of the system, since the mail throughput is still not so high. On 20 Apr. we had a problem with the clamd@scan service because of a "Malformed Database" error. Until I deleted the database and rebuilt it using freshclam, the system was crashing / rebooting every 3-6 minutes.

I can't find anything in the messages log: minutes of nothing and then the boot-up process.

I am now working with journalctl and crash to find the parts of the logs that were not written to files.
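If the directories in /var/crash were written by kdump (an assumption; the thread does not say what created them), each one normally holds a vmcore-dmesg.txt with the kernel messages leading up to the crash. A minimal sketch for skimming them, with the hypothetical helper name show_crash_tails:

```shell
# Print the tail of the kernel log that kdump saved next to each vmcore.
# Assumes kdump's usual layout: <crash_root>/<dump-dir>/vmcore-dmesg.txt
# $1 = crash root (default /var/crash), $2 = lines per dump (default 20)
show_crash_tails() {
    crash_root=${1:-/var/crash}
    lines=${2:-20}
    for f in "$crash_root"/*/vmcore-dmesg.txt; do
        [ -f "$f" ] || continue      # glob matched nothing, or not a regular file
        printf '== %s ==\n' "$f"
        tail -n "$lines" "$f"
    done
}

# Usage on the affected host:
#   show_crash_tails /var/crash 30
#   journalctl --list-boots     # boots the journal still knows about
#   journalctl -b -1 -e         # end of the previous boot (needs a persistent journal)
```

With journald's default Storage=auto setting, the journal only survives a reboot if /var/log/journal exists, which would explain logs that seem to vanish before each restart.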
freyuh
Posts: 62
Joined: 04 Oct 2018 11:21

Re: eFa Filter sudden restarts

Post by freyuh »

A stupid question, but have you done a file system check on your eFa?
Maybe the filesystem is faulty...
wstemb
Posts: 13
Joined: 16 Mar 2022 07:35

Re: eFa Filter sudden restarts

Post by wstemb »

The machine and file systems are OK.

The restarts are in some way connected to the applications running on the system, but I still have to find the connection. One piece of evidence is from Apr. 20, when, because of the "Malformed Database" error, the clamd@scan service was stuck in "activating" instead of active, and we had a crash / restart every few minutes...

______________________

I changed two things on the system:
1. I suppressed NDRs on the mail server. The Postfix outbound queue was always long because of NDRs that could not be delivered and were deferred. However, we also had a system restart with mailq = 2.
2. I changed a parameter in unbound.conf, just as an experiment (because of the error at unbound startup: failed lookup, cannot probe to master k.root-servers.net). I will post what I changed if this turns out to be the cause.

Now the uptime is 2 days and 3 hours, the mailq is 0, and there have been no restarts of services or of the machine (the start time of most services is close to the boot time of the machine).

Before, I had a lot of service restarts, many more than system restarts, probably due to the cron job:

CROND[592305]: (root) CMD (/usr/sbin/eFa-Monitor-cron >/dev/null 2>&1)

which tests the services every minute and restarts them if necessary.

But I cannot be sure that this is solved now (better to say worked around). I am watching journalctl and waiting for the next crash to see the real cause. In a few days I will go back to the previous configuration, step by step, and keep watching journalctl.

I noticed something in the system startup:

Postfix is started at: " Active: active (running) since Tue 2022-04-26 11:32:35 CEST; 2 days ago"
Unbound is started at: "Active: active (running) since Tue 2022-04-26 11:33:09 CEST; 2 days ago"

Postfix started about half a minute before Unbound, so I think this is the reason for the unresolved-domain errors found in maillog when Postfix restarts (and retries the mailq).

mail postfix/smtp[65537]: 4KjyMK2nxczN2spt: to=<xxxxxxxxxxxxx>, relay=none, delay=391070, delays=391070/0.03/0/0, dsn=4.4.3, status=deferred (Host or domain name not found. Name service error for name=xxxxxxxxxxx type=MX: Host not found, try again)
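If the start order really is the cause, one option would be a systemd drop-in that makes Postfix wait for Unbound. This is a sketch under assumptions: the drop-in file name after-unbound.conf and the helper write_postfix_ordering are invented here, and it has not been checked against the units eFa ships. After= only orders the two units; it does not guarantee that Unbound already answers queries.

```shell
# Write a systemd drop-in that orders postfix.service after unbound.service.
# $1 = optional root prefix, so the function can be tried against a scratch
#      directory first; leave it empty to write under /etc on the real host.
write_postfix_ordering() {
    dir="${1:-}/etc/systemd/system/postfix.service.d"
    mkdir -p "$dir" || return 1
    printf '[Unit]\nAfter=unbound.service\n' > "$dir/after-unbound.conf"
}

# On the real host (as root), followed by a reload so systemd picks it up:
#   write_postfix_ordering && systemctl daemon-reload
#   systemctl show -p After postfix.service    # should now list unbound.service
```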

What is the purpose of this cron job (run every minute):

CROND[593919]: (root) CMD (/usr/sbin/checkreboot.sh)

It checks whether the file /reboot.system exists and, if it does, reboots the system. But I can't find who places the reboot.system file, or when and why :-(
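One way to find out who creates the file would be an audit watch on it. A minimal sketch, assuming auditd is installed and running (not confirmed in this thread); the key name reboot-flag and the helper names are arbitrary, and the DRY_RUN switch is only there so the commands can be printed before they are run as root:

```shell
# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Add an audit watch for writes and attribute changes on the flag file.
# $1 = path to watch (default /reboot.system); "reboot-flag" is an arbitrary key.
watch_reboot_flag() {
    run auditctl -w "${1:-/reboot.system}" -p wa -k reboot-flag
}

# Show recorded events for that key, with numeric fields interpreted.
show_reboot_flag_events() {
    run ausearch -k reboot-flag -i
}

# Usage (as root):
#   watch_reboot_flag
#   ... after the next unexpected reboot ...
#   show_reboot_flag_events
```

Rules added with auditctl are lost at reboot, so the watch has to be added again afterwards (or placed in a file under /etc/audit/rules.d/); the events already written to the audit log stay searchable.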
Last edited by wstemb on 28 Apr 2022 17:51, edited 2 times in total.
freyuh
Posts: 62
Joined: 04 Oct 2018 11:21

Re: eFa Filter sudden restarts

Post by freyuh »

Which virtual SCSI controller and network adapter do you use?
wstemb
Posts: 13
Joined: 16 Mar 2022 07:35

Re: eFa Filter sudden restarts

Post by wstemb »

VM settings:

Virtual network adapter type: VMXNET3
SCSI controller: VMware Paravirtual

lspci extract:

03:00.0 Serial Attached SCSI controller: VMware PVSCSI SCSI Controller (rev 02)
0b:00.0 Ethernet controller: VMware VMXNET3 Ethernet Controller (rev 01)
freyuh
Posts: 62
Joined: 04 Oct 2018 11:21

Re: eFa Filter sudden restarts

Post by freyuh »

OK, the same as ours.
Two eFas on two different ESXi hosts.
One is Rocky Linux 8.5 and one is CentOS 8.
And both are running flawlessly ...
pdwalker
Posts: 1553
Joined: 18 Mar 2015 09:16

Re: eFa Filter sudden restarts

Post by pdwalker »

That's very strange.

I've been running eFa on ESXi 6.5 for years without a single unplanned reboot.

Are there any logs in ESXi you can check? Or is there anything inside the VM itself that might clue you in?

This is not an EFA problem, but highly likely a problem with your vm.
wstemb
Posts: 13
Joined: 16 Mar 2022 07:35

Re: eFa Filter sudden restarts

Post by wstemb »

pdwalker wrote: This is not an EFA problem, but highly likely a problem with your vm.
I am not so sure about this. I can't prove it; it is still just a "feeling". The existing logs from previous restarts are always empty just before the restart, so I have to wait for the next crash.

Why I am not sure it is an infrastructure problem:

1. We have hundreds of VMs on server blades, and this is the only one with such problems. No alarms in vSphere.

2. We have two exceptions to the "2-6 restarts a day" rule:
  • 77 crashes: On one day the eFa services were in an abnormal state because of the clamd@scan "Malformed database" error and "activating" status. Mail was not relayed in either direction, it just entered the mailq, and I had restarts every 5-6 minutes.
    Once the problem was resolved, the services restarted fine, and when the outbound queue (>500 at the time of the problem) had emptied, the restarts dropped back to the usual rate (2-6 a day).
  • 0 crashes: I now have an uptime of 3 days (this never happened before). Mailq = 0 every time I look at it, and all services (except MailScanner, which restarts every day) are as old as the uptime. I did not touch anything in the infrastructure or the OS; I just eliminated the NDRs on the MS Exchange server and changed one parameter in unbound.conf. I now have journalctl -f scrolling in an SSH terminal to see in real time what is happening.
Our email and mailbox situation was a little unusual and it was generating a very long Postfix outbound queue full of deferred NDRs. I will not explain the reasons here publicly, but I think this situation pushed the server into some race condition or over some "edge" that was probably not anticipated in the design.

Next week, if eFa stays stable, I will revert the configuration changes step by step towards the initial situation, to see whether the crashes reappear and to try to find the cause in journalctl.

I am a fan of eFa, not a detractor; all this is just to find and solve the real cause, which is still hidden :-)
When I find the cause, I will report it here in detail.
shawniverson
Posts: 3644
Joined: 13 Jan 2014 23:30
Location: Indianapolis, Indiana USA

Re: eFa Filter sudden restarts

Post by shawniverson »

Please keep us posted. I would like to recreate the conditions that caused this crash to improve the system.
wstemb
Posts: 13
Joined: 16 Mar 2022 07:35

Re: eFa Filter sudden restarts

Post by wstemb »

shawniverson wrote: 01 May 2022 14:21 Please keep us posted. I would like to recreate the conditions that caused this crash to improve the system.
Sure. I am here, and I have been watching the logs for the last 5 days: no crashes. As a side effect of looking for the cause, I noticed something in the logs during the restarts after crashes; we can discuss later whether it is normal (premature start of Postfix?) or can be avoided.

Tomorrow I will begin reverting the config changes one by one; I do not like things that get repaired "for no evident reason".
pdwalker
Posts: 1553
Joined: 18 Mar 2015 09:16

Re: eFa Filter sudden restarts

Post by pdwalker »

Very weird, and a super annoying problem. Good luck isolating the cause, and I hope you find a reason.
wstemb
Posts: 13
Joined: 16 Mar 2022 07:35

Re: eFa Filter sudden restarts

Post by wstemb »

No crashes since Apr. 26... the system and the eFa filter are stable and doing their work.
I tried reverting the MS Exchange NDR settings and had >180 mails in the Postfix outbound queue: no problem.

In the next few days I will revert the unbound config changes to check further.

Last week Unbound proved to be extremely unstable as a service during external routing / connectivity problems (ISP network, out of my control). During this routing / connectivity issue the unbound service always dropped to failed status at the first DNS request from the system.
pdwalker
Posts: 1553
Joined: 18 Mar 2015 09:16

Re: eFa Filter sudden restarts

Post by pdwalker »

Are DNS requests being blocked by a firewall somewhere?
wstemb
Posts: 13
Joined: 16 Mar 2022 07:35

Re: eFa Filter sudden restarts

Post by wstemb »

pdwalker wrote: 10 May 2022 10:00 Are DNS requests being blocked by a firewall somewhere?
No, the firewall and its logs are under my control; the firewall is open for DNS and the logs are "green".
The unbound issue mentioned in my last post was caused by an error in the routing tables of the ISP's routers. After I notified them and they corrected it, everything is working again.
No new system crashes since April 26.
wstemb
Posts: 13
Joined: 16 Mar 2022 07:35

Re: eFa Filter sudden restarts

Post by wstemb »

After days of uptime, the system crashed again, only once, on 26 May. No new crashes since then, but I found (again) an error in the unbound log after the start:

[root@mail crash]# systemctl status unbound
● unbound.service - Unbound recursive Domain Name Server
Loaded: loaded (/usr/lib/systemd/system/unbound.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/unbound.service.d
└─override.conf
Active: active (running) since Thu 2022-05-26 13:07:00 CEST; 1 weeks 3 days ago
Process: 2371 ExecStartPre=/usr/sbin/unbound-anchor -a /var/lib/unbound/root.key -c /etc/unbound/icannbundle.pem -f /etc/resolv.conf -R (code=exited, status=0/SUCCESS)
Process: 2357 ExecStartPre=/usr/sbin/unbound-checkconf (code=exited, status=0/SUCCESS)
Main PID: 5434 (unbound)
Tasks: 4 (limit: 101059)
Memory: 70.8M
CGroup: /system.slice/unbound.service
└─5434 /usr/sbin/unbound -d

May 26 13:06:25 mail.uljanik.hr systemd[1]: Starting Unbound recursive Domain Name Server...
May 26 13:06:25 mail.uljanik.hr unbound-checkconf[2357]: unbound-checkconf: no errors in /etc/unbound/unbound.conf
May 26 13:07:00 mail.uljanik.hr systemd[1]: Started Unbound recursive Domain Name Server.
May 26 13:07:00 mail.uljanik.hr unbound[5434]: [5434:0] notice: init module 0: iterator
May 26 13:07:00 mail.uljanik.hr unbound[5434]: [5434:0] info: start of service (unbound 1.11.0).
May 26 13:07:00 mail.uljanik.hr unbound[5434]: [5434:0] error: .: failed lookup, cannot probe to master k.root-servers.net


I saw a similar error before, after some of the crashes. This time I was not near the server when the crash happened, and the last 6 minutes of the log just before the restart are missing, so I can't be sure.
Aryfir
Posts: 21
Joined: 04 Sep 2020 13:52

Re: eFa Filter sudden restarts

Post by Aryfir »

That's weird: your unbound fails to probe only K.ROOT-SERVERS.NET? What about A, B, C ... M.ROOT-SERVERS.NET?

Try to ping K.ROOT-SERVERS.NET on IPv4 193.0.14.129 or IPv6 2001:7fd::1, and see if you can reach it.

Also try updating your unbound root.hints.
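Since ping only shows ICMP reachability, an actual DNS query against the same addresses may tell more. A minimal sketch, assuming dig (from bind-utils) is installed on the box; the helper name probe_roots and the DIG override are made up for illustration:

```shell
# Probe each given name server with one short query for the root NS set.
# dig exits non-zero when it gets no reply, so the exit status is used
# rather than the printed text. DIG can be overridden for testing.
probe_roots() {
    for ip in "$@"; do
        if ${DIG:-dig} +time=2 +tries=1 @"$ip" . NS >/dev/null 2>&1; then
            echo "$ip ok"
        else
            echo "$ip FAILED"
        fi
    done
}

# Usage, with the two K-root addresses quoted above:
#   probe_roots 193.0.14.129 2001:7fd::1
```

Running it right after a boot and again a few minutes later would show whether the failure is limited to the moment unbound starts.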