spamassassin autolearn always no
My bayes database is scoring, but every spam email I receive shows "spamassassin autolearn: no". I do quite a lot of manual learning, but it seems that autolearn is not working. I have added
bayes_auto_learn_threshold_spam 9.0
to /etc/MailScanner/spamassassin.conf and restarted mailscanner, but still no autolearn.
Does anybody know how to enable this?
Thanks,
Roger
- shawniverson
- Posts: 3650
- Joined: 13 Jan 2014 23:30
- Location: Indianapolis, Indiana USA
- Contact:
Re: spamassassin autolearn always no
This?
Code: Select all
bayes_auto_learn 1
Re: spamassassin autolearn always no
I have that too, and I just saw an autolearn "yes" (not spam), so it seems to be working. But I can't find a single spam message where the autolearn also said yes.
- shawniverson
Re: spamassassin autolearn always no
I wonder if the new bayes behavior is the culprit?
https://spamassassin.apache.org/full/3. ... shold.html
(look at the end)
Re: spamassassin autolearn always no
How so?
Assuming that bayes_auto_learn is set to 1 in /etc/MailScanner/spamassassin.conf, the system should be auto-learning according to the following settings:
bayes_auto_learn_threshold_nonspam n.nn (default: 0.1)
bayes_auto_learn_threshold_spam n.nn (default: 12.0, minimum 6, 3 from header, 3 from body score)
bayes_auto_learn_on_error (0 | 1) (default: 0)
The documentation for this last entry says:
"With bayes_auto_learn_on_error off, autolearning will be performed even if bayes classifier already agrees with the new classification (i.e. yielded BAYES_00 for what we are now trying to teach it as ham, or yielded BAYES_99 for spam). This is a traditional setting, the default was chosen to retain backwards compatibility."
which means the default value should always "learn" detected spam/ham according to the thresholds above.
[edit]
My bayes_auto_learn_threshold_spam is set to 9.0, and my spam that scores 9.0 or greater shows "SpamAssassin Autolearn: N".
Maybe the bayes_auto_learn_on_error default is actually 1? I'm going to set this value to 0 and see if it makes a difference.
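To make the quoted passage concrete, here is a rough Python sketch of how I read it. This is not SpamAssassin code: should_autolearn is a hypothetical helper, and the behaviour with the option switched on (skip learning when the classifier already agrees) is inferred from the wording of the quote. Only BAYES_00 and BAYES_99 are modelled because those are the two rules the quote names.

```python
def should_autolearn(target, bayes_hits, learn_on_error=False):
    """Hypothetical helper, not part of SpamAssassin.

    target         -- 'spam' or 'ham': what autolearn wants to teach
    bayes_hits     -- set of BAYES_* rule names that fired on the message
    learn_on_error -- stands in for bayes_auto_learn_on_error
    """
    # The classifier "already agrees" when it hit the extreme bucket
    # matching the target, as in the quoted documentation.
    agrees = (
        (target == "spam" and "BAYES_99" in bayes_hits)
        or (target == "ham" and "BAYES_00" in bayes_hits)
    )
    # Assumption: with the option on, an agreeing classifier means
    # there is nothing new to teach, so the message is skipped.
    if learn_on_error and agrees:
        return False
    # With the option off (the default), learn regardless.
    return True


default_case = should_autolearn("spam", {"BAYES_99"})
on_error_case = should_autolearn("spam", {"BAYES_99"}, learn_on_error=True)
print(default_case, on_error_case)
```

Under this reading, the default of 0 cannot by itself stop a message from being learned, which fits the observation that changing the option made no difference.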
Re: spamassassin autolearn always no
So I set it to 0, and it didn't make a difference - the header still showed "spam assassin autolearn = N"
I did a manual learn, and it reported "learned 1 message" which definitely means it didn't autolearn.
Now that I think about it, I wonder if there is a way to report whether a particular message has been "learned" or not by spamassassin?
*scratches chin* Looks like I have a new distraction for today.
Re: spamassassin autolearn always no
It might be a bug in Mailscanner. Checking...
[edit] no, not a bug. still checking.
Re: spamassassin autolearn always no
I ran this query on the mailscanner maillog table:
Code: Select all
SELECT timestamp, sascore, spamreport FROM mailscanner.maillog where (spamreport like '%autolearn=%' and not spamreport like '(blacklisted)') order by timestamp desc;
and I get the following interesting results (only a few examples shown):
Code: Select all
'2017-09-01 00:18:04','-4.48','not spam, SpamAssassin (not cached, score=-4.478, required 4, autolearn=not spam, BAYES_00 -1.90, DKIM_SIGNED 0.10, DKIM_VALID -0.10, DKIM_VALID_AU -0.10, HTML_MESSAGE 0.00, MIME_QP_LONG_LINE 0.00, ML_SPAM_HEADER_NO -0.01, ML_SPF_PASS -0.68, MXPF_TEST 0.00, RCVD_IN_DNSWL_MED -2.30, RCVD_IN_SORBS_SPAM 0.50, SPF_HELO_PASS -0.00, T_SPF_PERMERROR 0.01)'
'2017-09-01 00:17:25','13.00','spam, SpamAssassin (not cached, score=13.003, required 4, autolearn=spam, BAYES_50 0.80, DATE_IN_PAST_12_24 1.05, DCC_CHECK 1.10, DEAR_SOMETHING 1.97, DIGEST_MULTIPLE 0.29, FREEMAIL_FORGED_FROMDOMAIN 0.20, FREEMAIL_FROM 0.00, FREEMAIL_REPLYTO_END_DIGIT 0.25, HEADER_FROM_DIFFERENT_DOMAINS 0.00, HPF_PASS -0.10, HTML_MESSAGE 0.00, KAM_LAZY_DOMAIN_SECURITY 1.00, MISSING_MID 0.50, ML_SPAMINFO_EXISTS 3.00, MXPF_TEST 0.00, PYZOR_CHECK 1.39, RCVD_IN_BL_SPAMCOP_NET 1.35, RCVD_IN_DNSWL_MED -2.30, SPF_HELO_PASS -0.00, URIBL_DBL_SPAM 2.50)'
The key thing to see in the spamassassin report is the "autolearn=" field.
When I look at all my records, not every entry has an autolearn value, including some that, based on my autolearn threshold settings, should have been learned.
Filtered for the ones with an autolearn in the spamreport, I can see that it does autolearn some messages, both spam and not spam, so autolearn is definitely working in some circumstances. My log entries above show this.
The only explanation I can think of comes from the spamassassin documentation:
"Note: SpamAssassin requires at least 3 points from the header, and 3 points from the body to auto-learn as spam. Therefore, the minimum working value for this option is 6."
So I'm guessing that the messages that are not getting autolearned are skipped because the header or body is not scoring at least three points on the spam autolearn test.
Let's test a message that was marked as spam, but not autolearned:
spam, SpamAssassin (not cached, score=12.714, required 4, BAYES_99 4.00, BAYES_999 2.00, HTML_FONT_FACE_BAD 0.98, HTML_FONT_LOW_CONTRAST 0.00, HTML_MESSAGE 0.00, KAM_BADIPHTTP 2.00, KAM_LAZY_DOMAIN_SECURITY 1.00, ML_SPAM_HEADER_NO -0.01, MXPF_TEST 0.00, RAZOR2_CF_RANGE_51_100 0.50, RAZOR2_CF_RANGE_E8_51_100 1.89, RAZOR2_CHECK 0.92, RCVD_IN_DNSWL_MED -2.30, SPF_HELO_PASS -0.00, T_KAM_HTML_FONT_INVALID 0.01, URIBL_SBL 1.62, URIBL_SBL_A 0.10)
A spamassassin -D -t on that message reveals:
Code: Select all
Sep 1 14:04:41.927 [12309] dbg: learn: auto-learn: message score: 11.714, computed score for autolearn: 5.26
Sep 1 14:04:41.927 [12309] dbg: learn: auto-learn? ham=-10, spam=9, body-points=1.045, head-points=1.991, learned-points=6
Sep 1 14:04:41.927 [12309] dbg: learn: auto-learn? no: inside auto-learn thresholds, not considered ham or spam
and that appears to be the case.
tl;dr: spamassassin autolearn is working correctly as designed.
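For anyone who wants to check their own debug output against this, here is a minimal Python sketch of the decision as it appears in the log lines and the documentation note above. It is a simplification under stated assumptions, not the actual plugin logic: autolearn_decision and AutolearnInput are hypothetical names, the exact comparison operators are my guess, and only the thresholds and the 3-point header/body rule are modelled. Note that the value compared against the thresholds is the separately computed "score for autolearn" (5.26 here), not the message score.

```python
from dataclasses import dataclass


@dataclass
class AutolearnInput:
    score: float        # "computed score for autolearn" from the debug log
    body_points: float  # "body-points" from the debug log
    head_points: float  # "head-points" from the debug log


def autolearn_decision(msg, ham_threshold, spam_threshold,
                       min_body=3.0, min_head=3.0):
    """Return 'ham', 'spam', or a 'no: ...' reason (hypothetical helper)."""
    # Assumption: strictly below the ham threshold means learn as ham.
    if msg.score < ham_threshold:
        return "ham"
    # Between the two thresholds nothing is learned at all.
    if msg.score < spam_threshold:
        return "no: inside auto-learn thresholds"
    # At or above the spam threshold, the documentation note still
    # requires 3 points from the body and 3 from the header.
    if msg.body_points < min_body:
        return "no: too few body points"
    if msg.head_points < min_head:
        return "no: too few header points"
    return "spam"


# Values taken from the debug lines above (ham=-10, spam=9).
msg = AutolearnInput(score=5.26, body_points=1.045, head_points=1.991)
decision = autolearn_decision(msg, ham_threshold=-10, spam_threshold=9)
print(decision)
```

With the numbers from my log, the autolearn score of 5.26 sits between -10 and 9, so the message is never considered for learning, and even if it had cleared 9 the body and header points would both have been under 3.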
Re: spamassassin autolearn always no
I didn't need to do this, but I felt compelled to answer my own question anyway.
If you edit /var/www/html/mailscanner/status.php and find the sql query at the beginning of the file, insert the following lines just before the "FROM" line:
Code: Select all
,case
when spamreport like '%autolearn=spam%' then 'spam'
when spamreport like '%autolearn=not spam%' then 'not spam'
else '-'
end as autolearn
so this:
Code: Select all
mcpsascore,
'' AS status
FROM
maillog
becomes this:
Code: Select all
mcpsascore,
'' AS status
,case
when spamreport like '%autolearn=spam%' then 'spam'
when spamreport like '%autolearn=not spam%' then 'not spam'
else '-'
end as autolearn
FROM
maillog
Note to self: Make the modifications necessary to also display whether it has been "learned" by the bayes classifier or not, in the same column.
Note to self: but don't do this today no matter how much you want to.
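If you would rather check exported rows outside of the web interface, the same classification can be sketched in Python. This is only an illustration: autolearn_status is a hypothetical helper that mirrors the CASE expression above, and the two sample strings are shortened excerpts of the reports shown earlier in the thread.

```python
def autolearn_status(spamreport):
    """Mirror of the SQL CASE: 'spam', 'not spam', or '-' (hypothetical helper)."""
    # Substring tests play the role of LIKE '%...%'. They are
    # case-sensitive, which may differ from LIKE depending on the
    # collation of the maillog table.
    if "autolearn=spam" in spamreport:
        return "spam"
    if "autolearn=not spam" in spamreport:
        return "not spam"
    return "-"


# Shortened excerpts of spamreport values from the earlier query results.
learned_spam = "spam, SpamAssassin (not cached, score=13.003, required 4, autolearn=spam, BAYES_50 0.80)"
learned_ham = "not spam, SpamAssassin (not cached, score=-4.478, required 4, autolearn=not spam, BAYES_00 -1.90)"
not_learned = "spam, SpamAssassin (not cached, score=12.714, required 4, BAYES_99 4.00)"

statuses = [autolearn_status(r) for r in (learned_spam, learned_ham, not_learned)]
print(statuses)
```

The third report has no autolearn field at all, so it falls through to '-', the same as the else branch in the query.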
- shawniverson
Re: spamassassin autolearn always no
Big thanks for digging into this!!!
Re: spamassassin autolearn always no
No worries.
It's moments like the above that make me think I have OCD. Might as well make use of it!
Actually, having a "learned" column would be useful, especially if I can combine the autolearn information with the sa_learn data.
I think I'll do that next.