
Bayes management

Posted: 13 Feb 2014 10:27
by buzzzo
Hi

How is the SA Spam Bayes DB managed?
How is the expiration run? From a cron job?

Thx

Re: Bayes management

Posted: 13 Feb 2014 15:37
by shawniverson
Here is a snippet from the SpamAssassin sa-learn man page. To answer your question, expiration is opportunistic and runs during each call to sa-learn. Details are in the man page.
EXPIRATION
Since SpamAssassin can auto-learn messages, the Bayes database files
could increase perpetually until they fill your disk. To control this,
SpamAssassin performs journal synchronization and bayes expiration
periodically when certain criteria (listed below) are met.

SpamAssassin can sync the journal and expire the DB tokens either
manually or opportunistically. A journal sync is due if --sync is
passed to sa-learn (manual), or if the following is true
(opportunistic):

- bayes_journal_max_size does not equal 0 (means don't sync)
- the journal file exists

and either:

- the journal file has a size greater than bayes_journal_max_size

or

- a journal sync has previously occurred, and at least 1 day has passed
since that sync

Expiry is due if --force-expire is passed to sa-learn (manual), or if
all of the following are true (opportunistic):

- the last expire was attempted at least 12hrs ago
- bayes_auto_expire does not equal 0
- the number of tokens in the DB is > 100,000
- the number of tokens in the DB is > bayes_expiry_max_db_size
- there is at least a 12 hr difference between the oldest and newest
token atimes
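
The two sets of criteria above can be sketched as a pair of predicates. This is only an illustration of the conditions as quoted, not SpamAssassin's actual code; the function and parameter names are made up for the example, times are assumed to be Unix timestamps in seconds, and last_sync is assumed to be None when no sync has happened yet.

```python
HOUR = 3600
DAY = 24 * HOUR


def journal_sync_due(journal_exists, journal_size, journal_max_size,
                     last_sync, now, manual=False):
    """Return True if a journal sync is due under the quoted criteria."""
    if manual:  # --sync was passed to sa-learn
        return True
    # Opportunistic path: both preconditions must hold.
    if journal_max_size == 0 or not journal_exists:
        return False
    too_big = journal_size > journal_max_size
    # A previous sync exists and at least one day has passed since it.
    stale = last_sync is not None and now - last_sync >= DAY
    return too_big or stale


def expiry_due(last_expire, now, auto_expire, token_count, max_db_size,
               oldest_atime, newest_atime, manual=False):
    """Return True if an expire run is due under the quoted criteria."""
    if manual:  # --force-expire was passed to sa-learn
        return True
    # Opportunistic path: all five conditions must hold.
    return (
        now - last_expire >= 12 * HOUR
        and auto_expire != 0
        and token_count > 100_000
        and token_count > max_db_size
        and newest_atime - oldest_atime >= 12 * HOUR
    )
```

For example, a DB with 200,000 tokens and bayes_expiry_max_db_size set to 150,000 is due for expiry once 12 hours have passed since the last attempt, while a DB with 90,000 tokens never is, whatever the other settings say.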

EXPIRE LOGIC
If either the manual or opportunistic method causes an expire run to
start, here is the logic that is used:

- figure out how many tokens to keep. take the larger of either
bayes_expiry_max_db_size * 75% or 100,000 tokens. therefore, the goal
reduction is number of tokens - number of tokens to keep.
- if the reduction number is < 1000 tokens, abort (not worth the
effort).
- if an expire has been done before, guesstimate the new atime delta
based on the old atime delta. (new_atime_delta = old_atime_delta *
old_reduction_count / goal)
- if no expire has been done before, or the last expire looks "weird",
do an estimation pass. The definition of "weird" is:
    - last expire over 30 days ago
    - last atime delta was < 12 hrs
    - last reduction count was < 1000 tokens
    - estimated new atime delta is < 12 hrs
    - the difference between the last reduction count and the goal
    reduction count is > 50%
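
To make the arithmetic concrete, here is a rough sketch of the expire logic quoted above. It is not SpamAssassin's implementation: the LastExpire record and the function names are invented for the example, times are assumed to be in seconds, and the "> 50%" difference is assumed to be measured relative to the goal reduction, which the man page does not spell out.

```python
from dataclasses import dataclass

HOUR = 3600
DAY = 24 * HOUR


@dataclass
class LastExpire:
    """Hypothetical record of the previous expire run."""
    time: int         # when it ran (Unix seconds)
    atime_delta: int  # atime delta it used (seconds)
    reduction: int    # number of tokens it removed


def expire_goal(token_count, max_db_size):
    """Return (tokens to keep, goal reduction)."""
    keep = max(int(max_db_size * 0.75), 100_000)
    return keep, token_count - keep


def plan_expire(token_count, max_db_size, last=None, now=0):
    """Return ('abort' | 'estimate' | 'reuse', new_atime_delta or None)."""
    _, goal = expire_goal(token_count, max_db_size)
    if goal < 1000:
        return "abort", None  # not worth the effort
    if last is None:
        return "estimate", None  # no earlier expire to extrapolate from
    # Guesstimate the new atime delta from the previous run.
    new_delta = last.atime_delta * last.reduction / goal
    weird = (
        now - last.time > 30 * DAY
        or last.atime_delta < 12 * HOUR
        or last.reduction < 1000
        or new_delta < 12 * HOUR
        # Assumption: the 50% is taken relative to the goal reduction.
        or abs(last.reduction - goal) / goal > 0.5
    )
    return ("estimate", None) if weird else ("reuse", new_delta)
```

With 200,000 tokens and bayes_expiry_max_db_size = 150,000, the number to keep is max(112,500, 100,000) = 112,500, so the goal reduction is 87,500 tokens.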

Re: Bayes management

Posted: 13 Feb 2014 16:17
by buzzzo
Thx for help.

In my SA setups I have always preferred running the Bayes expiration from a cron job for performance reasons.
I also see that the expiration can be controlled internally by MailScanner with:

Rebuild Bayes Every = 0
Wait During Bayes Rebuild = no

I still prefer to use: sa-learn --force-expire from a daily cron job
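
As a rough idea of what such a daily job could look like, here is a small wrapper that a cron entry might call. Only the sa-learn --force-expire command comes from this thread; the function names and the injectable runner argument are made up for the example, and it assumes sa-learn is on the PATH of the user that owns the Bayes DB.

```python
import subprocess


def build_expire_command(sa_learn="sa-learn"):
    """Build the argument list for a forced Bayes expiry."""
    return [sa_learn, "--force-expire"]


def run_expire(runner=subprocess.run):
    """Run the expiry and return sa-learn's exit code.

    The runner argument exists only so the call can be stubbed in tests.
    """
    result = runner(build_expire_command(), check=False)
    return result.returncode
```

A script that calls run_expire() and exits with its return value can then be scheduled once a day, ideally at a quiet time for the mail server.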

Thx

Re: Bayes management

Posted: 13 Feb 2014 16:28
by buzzzo

Re: Bayes management

Posted: 13 Feb 2014 16:35
by shawniverson
We will take this under advisement.

One possibility is that we could offer the ability to turn this on and off as you describe as a configurable option.

Re: Bayes management

Posted: 14 Feb 2014 10:13
by buzzzo
IMHO it would be better to silently "crontab" the process.
It is a feature that does not change the way the expiration works, so it could be set up transparently from the user's point of view.
The less the user has to set, the better things work.

Re: Bayes management

Posted: 14 Feb 2014 20:57
by shawniverson
We will consider this as well. Thanks!