Hi
How is the SpamAssassin (SA) Bayes DB managed?
How is the expiration run? From a cron job?
Thx
Bayes management
- shawniverson
- Posts: 3650
- Joined: 13 Jan 2014 23:30
- Location: Indianapolis, Indiana USA
- Contact:
Re: Bayes management
Here is a snippet from the SpamAssassin sa-learn man page. To answer your question: expiration is opportunistic. It is checked on each call to sa-learn and runs when the criteria below are met. Details are in the man page.
EXPIRATION
Since SpamAssassin can auto-learn messages, the Bayes database files
could increase perpetually until they fill your disk. To control this,
SpamAssassin performs journal synchronization and bayes expiration
periodically when certain criteria (listed below) are met.
SpamAssassin can sync the journal and expire the DB tokens either
manually or opportunistically. A journal sync is due if --sync is
passed to sa-learn (manual), or if the following is true
(opportunistic):
- bayes_journal_max_size does not equal 0 (means don't sync)
- the journal file exists
and either:
- the journal file has a size greater than bayes_journal_max_size
or
- a journal sync has previously occurred, and at least 1 day has passed
since that sync
Expiry is due if --force-expire is passed to sa-learn (manual), or if
all of the following are true (opportunistic):
- the last expire was attempted at least 12hrs ago
- bayes_auto_expire does not equal 0
- the number of tokens in the DB is > 100,000
- the number of tokens in the DB is > bayes_expiry_max_db_size
- there is at least a 12 hr difference between the oldest and newest
token atimes
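The two sets of conditions above can be read as a pair of boolean checks. The following is only a sketch of the man page text, not SpamAssassin's actual code: the function and parameter names are made up for illustration, times are plain seconds, and "at least 1 day" / "at least 12hrs" are taken to mean >=.

```python
DAY = 24 * 60 * 60        # seconds
HALF_DAY = 12 * 60 * 60   # seconds


def journal_sync_due(manual, journal_max_size, journal_exists,
                     journal_size, last_sync, now):
    """Return True if a journal sync is due.

    manual    -- True when --sync was passed to sa-learn
    last_sync -- time of the previous sync, or None if none has occurred
    (all names here are hypothetical, chosen for this sketch)
    """
    if manual:
        return True
    # Opportunistic path: syncing must be enabled and the journal must exist.
    if journal_max_size == 0 or not journal_exists:
        return False
    if journal_size > journal_max_size:
        return True
    return last_sync is not None and now - last_sync >= DAY


def expiry_due(manual, auto_expire, last_expire, token_count,
               max_db_size, oldest_atime, newest_atime, now):
    """Return True if an expire run is due.

    manual      -- True when --force-expire was passed to sa-learn
    last_expire -- time of the last attempted expire (0 if never, an assumption)
    """
    if manual:
        return True
    # Opportunistic path: every condition from the list must hold.
    return (
        now - last_expire >= HALF_DAY
        and auto_expire != 0
        and token_count > 100_000
        and token_count > max_db_size
        and newest_atime - oldest_atime >= HALF_DAY
    )
```

Note that with bayes_auto_expire set to 0 the opportunistic path never fires, so only --force-expire triggers a run.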
EXPIRE LOGIC
If either the manual or opportunistic method causes an expire run to
start, here is the logic that is used:
- figure out how many tokens to keep. take the larger of either
bayes_expiry_max_db_size * 75% or 100,000 tokens. therefore, the goal
reduction is number of tokens - number of tokens to keep.
- if the reduction number is < 1000 tokens, abort (not worth the
effort).
- if an expire has been done before, guesstimate the new atime delta
based on the old atime delta. (new_atime_delta = old_atime_delta *
old_reduction_count / goal)
- if no expire has been done before, or the last expire looks "weird",
do an estimation pass. The definition of "weird" is:
- last expire over 30 days ago
- last atime delta was < 12 hrs
- last reduction count was < 1000 tokens
- estimated new atime delta is < 12 hrs
- the difference between the last reduction count and the goal
reduction count is > 50%
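To make the arithmetic in the expire logic concrete, here is a rough sketch of the decision steps as described in the man page excerpt. It is not SpamAssassin's implementation: the names (LastExpire, plan_expire) are invented for this example, and the "> 50%" difference is assumed to be measured relative to the goal reduction, which the text does not spell out.

```python
from dataclasses import dataclass
from typing import Optional

DAY = 24 * 60 * 60        # seconds
HALF_DAY = 12 * 60 * 60   # seconds


@dataclass
class LastExpire:
    """Stats remembered from the previous expire run (hypothetical layout)."""
    time: int          # when the last expire ran
    atime_delta: int   # atime delta used by that run, in seconds
    reduction: int     # number of tokens it removed


def tokens_to_keep(max_db_size):
    # The larger of 75% of bayes_expiry_max_db_size or 100,000 tokens.
    return max(int(max_db_size * 0.75), 100_000)


def plan_expire(token_count, max_db_size, last: Optional[LastExpire], now):
    """Return (action, new_atime_delta).

    action is "abort", "estimate" (do an estimation pass) or "use_delta".
    """
    goal = token_count - tokens_to_keep(max_db_size)
    if goal < 1000:
        return ("abort", None)          # not worth the effort
    if last is None:
        return ("estimate", None)       # no previous expire to extrapolate from
    # new_atime_delta = old_atime_delta * old_reduction_count / goal
    new_delta = last.atime_delta * last.reduction / goal
    weird = (
        now - last.time > 30 * DAY
        or last.atime_delta < HALF_DAY
        or last.reduction < 1000
        or new_delta < HALF_DAY
        # Assumption: "difference > 50%" is taken relative to the goal.
        or abs(last.reduction - goal) / goal > 0.5
    )
    return ("estimate", None) if weird else ("use_delta", new_delta)
```

For example, with bayes_expiry_max_db_size at 150,000 and 200,000 tokens in the DB, the run keeps 112,500 tokens and aims to remove 87,500.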
Re: Bayes management
Thx for help.
In my SA setups I have always preferred running the Bayes expiration from a cron job, for performance reasons.
I also see that the expiration can be controlled internally by MailScanner with:
Rebuild Bayes Every = 0
Wait During Bayes Rebuild = no
I still prefer to use sa-learn --force-expire from a daily cron job.
Thx
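For reference, a minimal sketch of what such a daily job could call. The wrapper functions are made up for illustration; only the sa-learn --force-expire command itself comes from the man page, and this assumes sa-learn is on the PATH of the user that owns the Bayes DB.

```python
import subprocess


def force_expire_command(sa_learn="sa-learn"):
    """Build the argument list for a manual expire run."""
    return [sa_learn, "--force-expire"]


def run_force_expire(sa_learn="sa-learn"):
    """Run the manual expire and return sa-learn's exit status."""
    # check=False: let the caller (e.g. the cron wrapper) decide how to
    # react to a non-zero exit status instead of raising here.
    result = subprocess.run(force_expire_command(sa_learn), check=False)
    return result.returncode
```

A script that calls run_force_expire() can then be scheduled once a day from cron. If opportunistic expiry should no longer run during scanning, bayes_auto_expire would presumably be set to 0 as well, per the conditions quoted above.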
Re: Bayes management
Just as confirmation of my doubts:
http://www.maiamailguard.com/maia/wiki/BayesAutoExpire
- shawniverson
Re: Bayes management
We will take this under advisement.
One possibility is that we could offer the ability to turn this on and off as you describe as a configurable option.
Re: Bayes management
IMHO it would be better to silently "crontab" the process.
It is a feature that does not change the way the expiration works, so it could be set up transparently from the user's point of view.
The less the user has to configure, the better things work.
- shawniverson
Re: Bayes management
We will consider this as well. Thanks!