Bayes management
Posted: 13 Feb 2014 10:27
Hi
How is the SA Bayes DB managed?
How is the expiration run? From a cron job?
Thx
EXPIRATION
Since SpamAssassin can auto-learn messages, the Bayes database files
could increase perpetually until they fill your disk. To control this,
SpamAssassin performs journal synchronization and bayes expiration
periodically when certain criteria (listed below) are met.
SpamAssassin can sync the journal and expire the DB tokens either
manually or opportunistically. A journal sync is due if --sync is
passed to sa-learn (manual), or if the following is true
(opportunistic):
- bayes_journal_max_size does not equal 0 (means don't sync)
- the journal file exists
and either:
- the journal file has a size greater than bayes_journal_max_size
or
- a journal sync has previously occurred, and at least 1 day has passed
since that sync
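To make the journal sync criteria easier to follow, here is a minimal Python sketch of the decision. It is not SpamAssassin's actual code: the function name journal_sync_due and its parameters are hypothetical, sizes are in bytes, times are in epoch seconds, and a journal_size of None stands for "the journal file does not exist".

```python
ONE_DAY = 24 * 60 * 60  # seconds


def journal_sync_due(journal_size, journal_max_size, last_sync, now, manual=False):
    """Illustrative check only; names and argument layout are assumptions.

    journal_size     -- size of the journal file in bytes, or None if it is missing
    journal_max_size -- value of bayes_journal_max_size (0 means never sync)
    last_sync        -- epoch time of the previous sync, or None if none happened
    now              -- current epoch time
    manual           -- True when --sync was passed to sa-learn
    """
    if manual:
        return True
    if journal_max_size == 0:
        return False
    if journal_size is None:
        return False
    if journal_size > journal_max_size:
        return True
    # Otherwise a sync is due only if one happened before and is at least a day old.
    return last_sync is not None and now - last_sync >= ONE_DAY
```

With these assumptions, an oversized journal triggers a sync immediately, while a small journal is only synced once the previous sync is at least a day old.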
Expiry is due if --force-expire is passed to sa-learn (manual), or if
all of the following are true (opportunistic):
- the last expire was attempted at least 12hrs ago
- bayes_auto_expire does not equal 0
- the number of tokens in the DB is > 100,000
- the number of tokens in the DB is > bayes_expiry_max_db_size
- there is at least a 12 hr difference between the oldest and newest
token atimes
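The opportunistic expiry conditions above can be read as one boolean expression. The following Python sketch shows that reading. The function expiry_due and its parameter names are hypothetical and only mirror the list; they are not taken from the SpamAssassin source. All times are epoch seconds.

```python
TWELVE_HOURS = 12 * 60 * 60  # seconds
MIN_TOKENS = 100_000         # fixed token floor from the list above


def expiry_due(now, last_expire_attempt, auto_expire, token_count,
               max_db_size, oldest_atime, newest_atime, manual=False):
    """Illustrative check only; names are assumptions.

    auto_expire -- value of bayes_auto_expire
    max_db_size -- value of bayes_expiry_max_db_size
    manual      -- True when --force-expire was passed to sa-learn
    """
    if manual:
        return True
    # Every opportunistic condition must hold at the same time.
    return (
        now - last_expire_attempt >= TWELVE_HOURS
        and auto_expire != 0
        and token_count > MIN_TOKENS
        and token_count > max_db_size
        and newest_atime - oldest_atime >= TWELVE_HOURS
    )
```

Note that under this reading a database with fewer than 100,000 tokens is never expired opportunistically, regardless of bayes_expiry_max_db_size.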
EXPIRE LOGIC
If either the manual or opportunistic method causes an expire run to
start, here is the logic that is used:
- figure out how many tokens to keep. take the larger of either
bayes_expiry_max_db_size * 75% or 100,000 tokens. therefore, the goal
reduction is number of tokens - number of tokens to keep.
- if the reduction number is < 1000 tokens, abort (not worth the
effort).
- if an expire has been done before, guesstimate the new atime delta
based on the old atime delta. (new_atime_delta = old_atime_delta *
old_reduction_count / goal)
- if no expire has been done before, or the last expire looks "weird",
do an estimation pass. The definition of "weird" is:
  - last expire over 30 days ago
  - last atime delta was < 12 hrs
  - last reduction count was < 1000 tokens
  - estimated new atime delta is < 12 hrs
  - the difference between the last reduction count and the goal
  reduction count is > 50%
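The expire logic up to this point can be sketched in Python as follows. This is an illustration of the steps as described, not the real implementation. The function plan_expire, its parameters, and the returned action labels are hypothetical, and the "> 50%" difference is assumed here to be measured relative to the goal reduction, which the text does not state explicitly.

```python
TWELVE_HOURS = 12 * 60 * 60      # seconds
THIRTY_DAYS = 30 * 24 * 60 * 60  # seconds


def plan_expire(token_count, max_db_size, now, last_expire=None,
                old_atime_delta=None, old_reduction_count=None):
    """Return (action, atime_delta); action is 'abort', 'estimate' or 'expire'.

    max_db_size is the value of bayes_expiry_max_db_size. The last three
    arguments describe the previous expire run, or are None if there was none.
    """
    # Keep the larger of 75% of the configured maximum or 100,000 tokens.
    keep = max(max_db_size * 3 // 4, 100_000)
    goal = token_count - keep
    if goal < 1000:
        return ("abort", None)  # not worth the effort

    if last_expire is None:
        return ("estimate", None)  # no earlier run to extrapolate from

    # Guess the new atime delta from the previous run.
    new_delta = old_atime_delta * old_reduction_count / goal

    weird = (
        now - last_expire > THIRTY_DAYS
        or old_atime_delta < TWELVE_HOURS
        or old_reduction_count < 1000
        or new_delta < TWELVE_HOURS
        # Assumption: the 50% difference is taken relative to the goal.
        or abs(old_reduction_count - goal) / goal > 0.5
    )
    if weird:
        return ("estimate", None)
    return ("expire", new_delta)
```

For example, with 200,000 tokens and bayes_expiry_max_db_size set to 150,000, the sketch keeps 112,500 tokens and aims to remove 87,500. If the previous run removed 80,000 tokens with an atime delta of two days, the guessed new delta is 172800 * 80000 / 87500 seconds, a little under two days.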