Inbound mail queue has become slow

General eFa discussion
Post Reply
iglooo
Posts: 30
Joined: 25 Jan 2019 19:52

Inbound mail queue has become slow

Post by iglooo » 09 Apr 2019 15:04

Hey,

Normally any inbound queues get processed within a second or so but today I've noticed it can take upwards of 15 seconds. Spam learning takes very long to process too.

Here's lint for mailscanner and spamassassin:

Apr 9 10:57:34.635 [41763] dbg: rules: __HAS_LIST_ID merged duplicates: __ML2 0.66906

SpamAssassin reported no errors. 2.6422

I've attached a screenshot from "top" - anything else I can provide to help figure this out? /var/log/messages looks clean. Running 3.0.2.6 and CentOS packages are up to date, and here's a disk space report:

/ 4.98GB [64%]
/boot 353.02MB [73%]
/tmp 9.2GB [95%]
/var 94.31GB [92%]
Attachments
Capture.PNG
Capture.PNG (29.59 KiB) Viewed 347 times

iglooo
Posts: 30
Joined: 25 Jan 2019 19:52

Re: Inbound mail queue has become slow

Post by iglooo » 10 Apr 2019 14:24

Anyone?

henk
Posts: 375
Joined: 14 Dec 2015 22:16
Location: Netherlands
Contact:

Re: Inbound mail queue has become slow

Post by henk » 10 Apr 2019 15:08

Just read back your post, and then try to give some advise.... :shifty:
Somehow i've got the feeling that the answer on the question : What changed? is: Nothing
"/var/log/messages looks clean" means there are no errors/warnings in the logs?
As will try to help, can you post

Code: Select all

df -h
and

Code: Select all

ls -l /var/dcc/log |wc -l
and

Code: Select all

find /var/lib/mysql/ -maxdepth 1 -type d ! -name '.' -exec du -sh {} \;
and a mysqltuner report.

iglooo
Posts: 30
Joined: 25 Jan 2019 19:52

Re: Inbound mail queue has become slow

Post by iglooo » 10 Apr 2019 15:38

Haha henk, indeed nothing has changed HOWEVER the night this happened, our hyperv host was set to reboot, and upon restarting the iscsi drive EFA lives on didn't come back online automatically. It has since been moved to a physical drive - actually, 2 different drives for testing, but the performance remains unchanged.

That being said, EFA was shutdown properly through hyperv integration and everything seems intact. Recent /var/log/messages just show " clamd[48501]: SelfCheck: Database status OK." Here's the other information you asked for:

Code: Select all

Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_00-lv_root
                      7.8G  2.4G  5.0G  33% /
tmpfs                 7.8G     0  7.8G   0% /dev/shm
/dev/sda1             485M  107M  354M  24% /boot
/dev/mapper/vg_00-lv_tmp
                      9.8G   23M  9.2G   1% /tmp
/dev/mapper/vg_00-lv_var
                      103G  3.3G   95G   4% /var
none                  7.8G  1.7M  7.8G   1% /var/spool/MailScanner/incoming

Code: Select all

6450

Code: Select all

377M    /var/lib/mysql
16K     /var/lib/mysql/efa
872K    /var/lib/mysql/sqlgrey
4.0K    /var/lib/mysql/temp
1.2M    /var/lib/mysql/mysql
8.0K    /var/lib/mysql/performance_schema
49M     /var/lib/mysql/sa_bayes
154M    /var/lib/mysql/mailscanner

Code: Select all

-------- Log file Recommendations ------------------------------------------------------------------
[--] Log file: /var/lib/mysql/efaserv.local.domain.err(62K)
[OK] Log file /var/lib/mysql/efaserv.local.domain.err exists
[!!] Log file /var/lib/mysql/efaserv.local.domain.err isn't readable.

-------- Storage Engine Statistics -----------------------------------------------------------------
[--] Status: +Aria +CSV +InnoDB +MEMORY +MRG_MyISAM +MyISAM +PERFORMANCE_SCHEMA +SEQUENCE
[--] Data in MyISAM tables: 1.0K (Tables: 1)
[--] Data in InnoDB tables: 137.5M (Tables: 29)
[OK] Total fragmented tables: 0

-------- Analysis Performance Metrics --------------------------------------------------------------
[--] innodb_stats_on_metadata: OFF
[OK] No stat updates during querying INFORMATION_SCHEMA.

-------- Security Recommendations ------------------------------------------------------------------
[OK] There are no anonymous accounts for any database users
[OK] All database users have passwords assigned
[!!] There is no basic password file list!

-------- CVE Security Recommendations --------------------------------------------------------------
[--] Skipped due to --cvefile option undefined

-------- Performance Metrics -----------------------------------------------------------------------
[--] Up for: 1d 3h 7m 10s (470K q [4.822 qps], 5K conn, TX: 154M, RX: 94M)
[--] Reads / Writes: 41% / 59%
[--] Binary logging is disabled
[--] Physical Memory     : 15.6G
[--] Max MySQL memory    : 864.0M
[--] Other process memory: 0B
[--] Total buffers: 425.0M global + 2.9M per thread (151 max threads)
[--] P_S Max memory usage: 0B
[--] Galera GCache Max memory usage: 0B
[OK] Maximum reached memory usage: 442.4M (2.78% of installed RAM)
[OK] Maximum possible memory usage: 864.0M (5.42% of installed RAM)
[OK] Overall possible memory usage with other process is compatible with memory available
[OK] Slow queries: 0% (0/470K)
[OK] Highest usage of available connections: 3% (6/151)
[OK] Aborted connections: 0.12%  (6/5077)
[!!] name resolution is active : a reverse name resolution is made for each new connection and can reduce performance
[!!] Query cache may be disabled by default due to mutex contention.
[!!] Query cache efficiency: 0.0% (0 cached / 186K selects)
[OK] Query cache prunes per day: 0
[OK] Sorts requiring temporary tables: 0% (0 temp sorts / 54 sorts)
[OK] No joins without indexes
[!!] Temporary tables created on disk: 58% (185 on disk / 314 total)
[!!] Thread cache is disabled
[OK] Table cache hit rate: 85% (115 open / 134 opened)
[OK] Open file limit used: 2% (87/4K)
[OK] Table locks acquired immediately: 100% (454K immediate / 454K locks)

-------- Performance schema ------------------------------------------------------------------------
[--] Performance schema is disabled.
[--] Memory used by P_S: 0B
[--] Sys schema isn't installed.

-------- ThreadPool Metrics ------------------------------------------------------------------------
[--] ThreadPool stat is enabled.
[--] Thread Pool Size: 4 thread(s).
[--] Using default value is good enough for your version (10.1.37-MariaDB)

-------- MyISAM Metrics ----------------------------------------------------------------------------
[!!] Key buffer used: 18.3% (24M used / 134M cache)
[OK] Key buffer size / total MyISAM indexes: 128.0M/124.0K
[!!] Read Key buffer hit rate: 73.9% (69 cached / 18 reads)

-------- InnoDB Metrics ----------------------------------------------------------------------------
[--] InnoDB is enabled.
[--] InnoDB Thread Concurrency: 0
[OK] InnoDB File per table is activated
[!!] InnoDB buffer pool / data size: 128.0M/137.5M
[!!] Ratio InnoDB log file size / InnoDB Buffer pool size (75 %): 48.0M * 2/128.0M should be equal to 25%
[!!] InnoDB buffer pool <= 1G and Innodb_buffer_pool_instances(!=1).
[--] InnoDB Buffer Pool Chunk Size not used or defined in your version
[OK] InnoDB Read buffer efficiency: 99.97% (39000998 hits/ 39011183 total)
[!!] InnoDB Write Log efficiency: 84.9% (1526434 hits/ 1797940 total)
[OK] InnoDB log waits: 0.00% (0 waits / 271506 writes)

-------- AriaDB Metrics ----------------------------------------------------------------------------
[--] AriaDB is enabled.
[OK] Aria pagecache size / total Aria indexes: 128.0M/1B
[!!] Aria pagecache hit rate: 75.4% (700 cached / 172 reads)

-------- TokuDB Metrics ----------------------------------------------------------------------------
[--] TokuDB is disabled.

-------- XtraDB Metrics ----------------------------------------------------------------------------
[--] XtraDB is disabled.

-------- Galera Metrics ----------------------------------------------------------------------------
[--] Galera is disabled.

-------- Replication Metrics -----------------------------------------------------------------------
[--] Galera Synchronous replication: NO
[--] No replication slave(s) for this server.
[--] Binlog format: STATEMENT
[--] XA support enabled: ON
[--] Semi synchronous replication Master: Not Activated
[--] Semi synchronous replication Slave: Not Activated
[--] This is a standalone server

-------- Recommendations ---------------------------------------------------------------------------
General recommendations:
    Configure your accounts with ip or subnets only, then update your configuration with skip-name-resolve=1
    When making adjustments, make tmp_table_size/max_heap_table_size equal
    Reduce your SELECT DISTINCT queries which have no LIMIT clause
    Set thread_cache_size to 4 as a starting value
    Performance schema should be activated for better diagnostics
    Consider installing Sys schema from https://github.com/mysql/mysql-sys for MySQL
    Consider installing Sys schema from https://github.com/good-dba/mariadb-sys for MariaDB
    Before changing innodb_log_file_size and/or innodb_log_files_in_group read this: https://bit.ly/2TcGgtU
Variables to adjust:
    query_cache_size (=0)
    query_cache_type (=0)
    query_cache_limit (> 1M, or use smaller result sets)
    tmp_table_size (> 16M)
    max_heap_table_size (> 16M)
    thread_cache_size (start at 4)
    performance_schema = ON enable PFS
    innodb_buffer_pool_size (>= 137.5M) if possible.
    innodb_log_file_size should be (=16M) if possible, so InnoDB total log files size equals to 25% of buffer pool size.
    innodb_buffer_pool_instances (=1)
Edit: Today mailscanner lint part for spamassassin is even slower than yesterday: SpamAssassin reported no errors. 6.80256

henk
Posts: 375
Joined: 14 Dec 2015 22:16
Location: Netherlands
Contact:

Re: Inbound mail queue has become slow

Post by henk » 10 Apr 2019 18:09

:shhh:
Not clear what you mean with: " It has since been moved to a physical drive - actually, 2 different drives for testing, "
I use vmware, but the concept would be the same.

There is no harm in udating mysql server a bit ( server config in the mysqld section )
viewtopic.php?f=13&t=3157

Mysql is only 137.5M , so we need to find the issue somewhere else. :think:

Can you monitor ( with top) during message receiveval?

Did you change the vm config ( processor/mem or whatever, besides moving to a physical drive)

Do you use recursion? ( is dns working ok)

Did you check all the logfiles? ( also the mysql logfile)

Can you see message details in the mailwatch gui?

iglooo
Posts: 30
Joined: 25 Jan 2019 19:52

Re: Inbound mail queue has become slow

Post by iglooo » 11 Apr 2019 14:21

henk wrote:
10 Apr 2019 18:09
:shhh:
Not clear what you mean with: " It has since been moved to a physical drive - actually, 2 different drives for testing, "
I use vmware, but the concept would be the same.
I mean the hyperv host has multiple physical drives so initially I moved EFA from an iscsi drive to a physical drive, and that's when I noticed the performance issues so I then moved it to another physical drive to rule out that hard drive as the cause

There is no harm in udating mysql server a bit ( server config in the mysqld section )
viewtopic.php?f=13&t=3157
Is this necessary? I can certainly follow your guide but it worked fine this entire time with default config

Mysql is only 137.5M , so we need to find the issue somewhere else. :think:

Can you monitor ( with top) during message receiveval?
For sure! Anything specific you want me to look out for? I've checked top during incoming mail and RAM at least is nowhere near maxed out

Did you change the vm config ( processor/mem or whatever, besides moving to a physical drive)
No vm config changes besides moving the VM to another drive

Do you use recursion? ( is dns working ok)
I disabled it during initial setup and "dig" seems to show that DNS is working quickly

Did you check all the logfiles? ( also the mysql logfile)
Sorry, I'm nowhere near as proficient in linux as you are - what are all the log files? Maillog, messages, mysql, what else?

Can you see message details in the mailwatch gui?
Yep, checking message details is actually very fast!
This morning I ran sa-learn with the debugging option on a single message and noticed where sa-learn slows down but it's not exactly clear what that part is doing.

Code: Select all

Apr 11 10:09:59.051 [18557] dbg: bayes: tokenized body: 510 tokens
Apr 11 10:09:59.055 [18557] dbg: bayes: tokenized uri: 585 tokens
Apr 11 10:09:59.055 [18557] dbg: bayes: tokenized invisible: 34 tokens
Apr 11 [b]10:09:59.058[/b] [18557] dbg: bayes: tokenized header: 125 tokens
Apr 11 [b]10:10:44.888[/b] [18557] dbg: bayes: seen (0ab33c1c8d55f1b953047ce7d1abe8a2a81808c0@sa_generated) put
Apr 11 10:10:44.889 [18557] dbg: bayes: learned '0ab33c1c8d55f1b953047ce7d1abe8a2a81808c0@sa_generated', atime: 1554990189
Apr 11 10:10:44.889 [18557] dbg: TxRep: learning a message
Apr 11 10:10:44.889 [18557] dbg: check: pms new, time limit in 254.103 s
Also noticed this in maillog:

Code: Select all

Apr 11 10:36:29 efaserv MailScanner[3021]: New Batch: Found 2 messages waiting
Apr 11 10:36:29 efaserv MailScanner[3021]: New Batch: Scanning 1 messages, 19719 bytes
Apr 11 10:36:29 efaserv MailScanner[3021]: Virus and Content Scanning: Starting
Apr 11 10:36:29 efaserv MailScanner[3021]: <A> tag found in message 1DDC6100054.A41E3 from email@address.com
Apr 11 10:36:29 efaserv MailScanner[3021]: Spam Checks: Starting
Apr 11 10:36:29 efaserv MailScanner[3021]: Expired 4 records from the SpamAssassin cache
[b]Apr 11 10:36:43 efaserv MailScanner[3011]: SpamAssassin timed out and was killed, failure 2 of 10[/b]
Apr 11 10:36:45 efaserv MailScanner[3011]: Content Checks: Detected and have disarmed web bug tags in HTML message in C9868101550.A9080 from email@address2.com
Apr 11 10:36:45 efaserv MailScanner[3011]: Requeue: C9868101550.A9080 to 0D783101555
Apr 11 10:36:45 efaserv MailScanner[3011]: Uninfected: Delivered 1 messages
Apr 11 10:36:45 efaserv MailScanner[3011]: Deleted 1 messages from processing-database
Apr 11 10:36:45 efaserv MailScanner[3011]: MailWatch: Logging message C9868101550.A9080 to SQL
So the issue seems to be with spamassassin but where do I take it from here? Mind you this timeout doesn't happen all the time

EDIT: Someone on IRC helped figure me out. It was caused by the physical drive - moving EFA back to ISCSI cleared the issue right up. Thanks for your help too henk!

Post Reply