Bayes management

buzzzo
Posts: 94
Joined: 03 Feb 2014 09:09

Bayes management

Post by buzzzo »

Hi

How is the SpamAssassin Bayes DB managed?
How is expiration run? From a cron job?

Thx
shawniverson
Posts: 3650
Joined: 13 Jan 2014 23:30
Location: Indianapolis, Indiana USA

Re: Bayes management

Post by shawniverson »

Here is a snippet from the SpamAssassin sa-learn man page. To answer your question: expiration is opportunistic and runs during each call to sa-learn. Details are in the man page.
EXPIRATION
Since SpamAssassin can auto-learn messages, the Bayes database files
could increase perpetually until they fill your disk. To control this,
SpamAssassin performs journal synchronization and bayes expiration
periodically when certain criteria (listed below) are met.

SpamAssassin can sync the journal and expire the DB tokens either
manually or opportunistically. A journal sync is due if --sync is
passed to sa-learn (manual), or if the following is true
(opportunistic):

- bayes_journal_max_size does not equal 0 (means don't sync)
- the journal file exists

and either:

- the journal file has a size greater than bayes_journal_max_size

or

- a journal sync has previously occurred, and at least 1 day has passed
since that sync

Expiry is due if --force-expire is passed to sa-learn (manual), or if
all of the following are true (opportunistic):

- the last expire was attempted at least 12hrs ago
- bayes_auto_expire does not equal 0
- the number of tokens in the DB is > 100,000
- the number of tokens in the DB is > bayes_expiry_max_db_size
- there is at least a 12 hr difference between the oldest and newest
token atimes
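
As a rough illustration of the opportunistic checks quoted above, here is a minimal Python sketch. It is not SpamAssassin's actual code: the function names, argument names, and the use of plain epoch seconds are assumptions made for the example, and only the thresholds come from the man page text.

```python
HOUR = 3600
DAY = 24 * HOUR


def journal_sync_due(journal_max_size, journal_exists, journal_size, last_sync, now):
    """Opportunistic journal sync check (hypothetical helper).

    last_sync is None when no sync has happened yet; times are epoch seconds.
    """
    # bayes_journal_max_size == 0 means "don't sync"; no journal, nothing to do.
    if journal_max_size == 0 or not journal_exists:
        return False
    # Either the journal has outgrown the limit ...
    if journal_size > journal_max_size:
        return True
    # ... or a previous sync exists and is at least one day old.
    return last_sync is not None and now - last_sync >= DAY


def expiry_due(auto_expire, last_expire, token_count, max_db_size,
               oldest_atime, newest_atime, now):
    """Opportunistic expiry check (hypothetical helper); all conditions must hold."""
    return (
        now - last_expire >= 12 * HOUR
        and auto_expire != 0
        and token_count > 100_000
        and token_count > max_db_size
        and newest_atime - oldest_atime >= 12 * HOUR
    )
```

Passing --sync or --force-expire to sa-learn bypasses these checks, which is the manual path described in the same section.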

EXPIRE LOGIC
If either the manual or opportunistic method causes an expire run to
start, here is the logic that is used:

- figure out how many tokens to keep. take the larger of either
bayes_expiry_max_db_size * 75% or 100,000 tokens. therefore, the goal
reduction is number of tokens - number of tokens to keep.
- if the reduction number is < 1000 tokens, abort (not worth the
effort).
- if an expire has been done before, guesstimate the new atime delta
based on the old atime delta. (new_atime_delta = old_atime_delta *
old_reduction_count / goal)
- if no expire has been done before, or the last expire looks "weird",
do an estimation pass. The definition of "weird" is:
    - last expire over 30 days ago
    - last atime delta was < 12 hrs
    - last reduction count was < 1000 tokens
    - estimated new atime delta is < 12 hrs
    - the difference between the last reduction count and the goal
      reduction count is > 50%
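
The arithmetic in the expire logic above can be sketched in a few lines of Python. This is only an illustration of the quoted rules, not SpamAssassin's implementation: all function and parameter names are made up for the example, and the "> 50%" difference is interpreted here as relative to the goal reduction, which is an assumption because the man page does not say what the percentage is measured against.

```python
HOUR = 3600
DAY = 24 * HOUR


def tokens_to_keep(max_db_size):
    # Larger of 75% of bayes_expiry_max_db_size or 100,000 tokens.
    return max(max_db_size * 3 // 4, 100_000)


def goal_reduction(token_count, max_db_size):
    # Number of tokens the expire run aims to remove.
    return token_count - tokens_to_keep(max_db_size)


def worth_expiring(goal):
    # Below 1000 tokens the run is aborted as not worth the effort.
    return goal >= 1000


def estimate_new_atime_delta(old_atime_delta, old_reduction_count, goal):
    # new_atime_delta = old_atime_delta * old_reduction_count / goal
    return old_atime_delta * old_reduction_count / goal


def looks_weird(now, last_expire, last_atime_delta, last_reduction,
                new_atime_delta, goal):
    """True if the last expire looks "weird", so an estimation pass is needed.

    Assumes goal > 0 (the run would already have been aborted otherwise).
    The 50% test is measured against the goal; that reading is an assumption.
    """
    return (
        now - last_expire > 30 * DAY
        or last_atime_delta < 12 * HOUR
        or last_reduction < 1000
        or new_atime_delta < 12 * HOUR
        or abs(last_reduction - goal) / goal > 0.5
    )
```

For example, with a hypothetical limit of 150,000 tokens and 200,000 tokens in the DB, the sketch keeps 112,500 tokens and aims to remove 87,500.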
buzzzo
Posts: 94
Joined: 03 Feb 2014 09:09

Re: Bayes management

Post by buzzzo »

Thx for help.

In my SA setups I have always preferred running the Bayes expiration from a cron job, for performance reasons.
I also see that expiration can be controlled internally by MailScanner with:

Rebuild Bayes Every = 0
Wait During Bayes Rebuild = no

I still prefer to use sa-learn --force-expire from a daily cron job.
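
A minimal sketch of what such a daily job could run, written as a small Python wrapper. The wrapper and its function names are hypothetical; the only real pieces are the sa-learn command and its --force-expire flag. It assumes sa-learn is on the PATH and that the job runs as the user that owns the Bayes database.

```python
import shutil
import subprocess


def force_expire_command(sa_learn="sa-learn"):
    # Command line for a manual expire run, as described in the sa-learn man page.
    return [sa_learn, "--force-expire"]


def run_force_expire(sa_learn="sa-learn"):
    """Run a forced expire; return the exit code, or None if sa-learn is not found."""
    if shutil.which(sa_learn) is None:
        return None
    result = subprocess.run(force_expire_command(sa_learn), check=False)
    return result.returncode
```

Calling run_force_expire() once a day from cron would take the place of the opportunistic expire; the schedule and the account it runs under depend on the local setup.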

Thx
buzzzo
Posts: 94
Joined: 03 Feb 2014 09:09

Re: Bayes management

Post by buzzzo »

shawniverson
Posts: 3650
Joined: 13 Jan 2014 23:30
Location: Indianapolis, Indiana USA

Re: Bayes management

Post by shawniverson »

We will take this under advisement.

One possibility is that we could offer the ability to turn this on and off as you describe as a configurable option.
buzzzo
Posts: 94
Joined: 03 Feb 2014 09:09

Re: Bayes management

Post by buzzzo »

IMHO it would be better to silently "crontab" the process.
It is a feature that does not change the way expiration works, so it could be set up transparently from the user's point of view.
The less the user has to set, the better things work.
shawniverson
Posts: 3650
Joined: 13 Jan 2014 23:30
Location: Indianapolis, Indiana USA

Re: Bayes management

Post by shawniverson »

We will consider this as well. Thanks!