I ran this query on the mailscanner maillog table:
SELECT timestamp, sascore, spamreport FROM mailscanner.maillog WHERE (spamreport LIKE '%autolearn=%' AND NOT spamreport LIKE '(blacklisted)') ORDER BY timestamp DESC;
and I get the following interesting results (only a few examples shown):
'2017-09-01 00:18:04','-4.48','not spam, SpamAssassin (not cached, score=-4.478, required 4, autolearn=not spam, BAYES_00 -1.90, DKIM_SIGNED 0.10, DKIM_VALID -0.10, DKIM_VALID_AU -0.10, HTML_MESSAGE 0.00, MIME_QP_LONG_LINE 0.00, ML_SPAM_HEADER_NO -0.01, ML_SPF_PASS -0.68, MXPF_TEST 0.00, RCVD_IN_DNSWL_MED -2.30, RCVD_IN_SORBS_SPAM 0.50, SPF_HELO_PASS -0.00, T_SPF_PERMERROR 0.01)'
'2017-09-01 00:17:25','13.00','spam, SpamAssassin (not cached, score=13.003, required 4, autolearn=spam, BAYES_50 0.80, DATE_IN_PAST_12_24 1.05, DCC_CHECK 1.10, DEAR_SOMETHING 1.97, DIGEST_MULTIPLE 0.29, FREEMAIL_FORGED_FROMDOMAIN 0.20, FREEMAIL_FROM 0.00, FREEMAIL_REPLYTO_END_DIGIT 0.25, HEADER_FROM_DIFFERENT_DOMAINS 0.00, HPF_PASS -0.10, HTML_MESSAGE 0.00, KAM_LAZY_DOMAIN_SECURITY 1.00, MISSING_MID 0.50, ML_SPAMINFO_EXISTS 3.00, MXPF_TEST 0.00, PYZOR_CHECK 1.39, RCVD_IN_BL_SPAMCOP_NET 1.35, RCVD_IN_DNSWL_MED -2.30, SPF_HELO_PASS -0.00, URIBL_DBL_SPAM 2.50)'
The key thing to look for in the SpamAssassin report is the "autolearn=" field.
When I look at all my records, not every entry has an autolearn value, including some that, based on my autolearn threshold settings, should have been learned.
Filtering for the entries with an autolearn value in the spamreport, I can see that it does autolearn some messages as both spam and not spam, so autolearn is definitely working in some circumstances. My log entries above show this.
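For anyone who would rather inspect the reports outside of SQL, here is a rough Python sketch that pulls the autolearn value and the per-rule scores out of a spamreport string. The function name parse_spamreport is my own invention, and the regular expressions assume the report layout shown in the rows above (an "autolearn=" field terminated by a comma, and rule hits written as an upper-case rule name followed by a decimal score). Treat it as a starting point, not as anything MailScanner ships.

```python
import re


def parse_spamreport(report):
    """Return (autolearn, rules) parsed from a MailScanner spamreport string.

    autolearn is None when the report has no "autolearn=" field.
    Hypothetical helper; assumes the report layout shown in this post.
    """
    # The autolearn value runs up to the next comma, e.g. "spam" or "not spam".
    match = re.search(r"autolearn=([^,)]+)", report)
    autolearn = match.group(1).strip() if match else None

    # Rule hits look like "BAYES_00 -1.90": upper-case name, space, decimal score.
    hits = re.findall(r"\b([A-Z][A-Z0-9_]*) (-?\d+\.\d+)", report)
    rules = {name: float(score) for name, score in hits}
    return autolearn, rules


if __name__ == "__main__":
    sample = ("not spam, SpamAssassin (not cached, score=-4.478, required 4, "
              "autolearn=not spam, BAYES_00 -1.90, DKIM_SIGNED 0.10)")
    print(parse_spamreport(sample))
```

A report without the field comes back with autolearn set to None, which makes it easy to list the messages that were scanned but never considered for learning.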
The only explanation I can think of comes from the SpamAssassin documentation:
Note: SpamAssassin requires at least 3 points from the header, and 3 points from the body to auto-learn as spam. Therefore, the minimum working value for this option is 6.
So I'm guessing that the messages are not getting autolearned because the header or body is not scoring at least three points on the spam autolearn test.
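To make that guess concrete, here is a simplified Python model of the decision as I understand it from the documentation note. It is not SpamAssassin's actual code: the function name, the return strings, and the exact handling of scores that land precisely on a threshold are my assumptions, and it ignores any other checks SpamAssassin may apply. The numbers in the example call are made up for illustration.

```python
def autolearn_decision(autolearn_score, body_points, head_points,
                       ham_threshold, spam_threshold,
                       min_section_points=3.0):
    """Simplified, hypothetical model of the autolearn decision.

    Only covers the two thresholds and the "3 points from the header and
    3 points from the body" rule quoted from the documentation.
    """
    if autolearn_score < ham_threshold:
        return "ham"
    if autolearn_score < spam_threshold:
        # Between the two thresholds: learned as neither ham nor spam.
        return "no: inside thresholds"
    # Spam candidate: header and body must each contribute enough points.
    if body_points < min_section_points:
        return "no: too few body points"
    if head_points < min_section_points:
        return "no: too few header points"
    return "spam"


if __name__ == "__main__":
    # Made-up values: high score, but almost all of it from header rules.
    print(autolearn_decision(12.0, 1.0, 5.0, ham_threshold=-10, spam_threshold=9))
```

The point of the model is that a message can be well over the spam threshold and still be skipped if one of the two sections contributes fewer than three points.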
Let's test a message that was marked as spam but not autolearned:
spam, SpamAssassin (not cached, score=12.714, required 4, BAYES_99 4.00, BAYES_999 2.00, HTML_FONT_FACE_BAD 0.98, HTML_FONT_LOW_CONTRAST 0.00, HTML_MESSAGE 0.00, KAM_BADIPHTTP 2.00, KAM_LAZY_DOMAIN_SECURITY 1.00, ML_SPAM_HEADER_NO -0.01, MXPF_TEST 0.00, RAZOR2_CF_RANGE_51_100 0.50, RAZOR2_CF_RANGE_E8_51_100 1.89, RAZOR2_CHECK 0.92, RCVD_IN_DNSWL_MED -2.30, SPF_HELO_PASS -0.00, T_KAM_HTML_FONT_INVALID 0.01, URIBL_SBL 1.62, URIBL_SBL_A 0.10)
A spamassassin -D -t on that message reveals:
Sep 1 14:04:41.927 [12309] dbg: learn: auto-learn: message score: 11.714, computed score for autolearn: 5.26
Sep 1 14:04:41.927 [12309] dbg: learn: auto-learn? ham=-10, spam=9, body-points=1.045, head-points=1.991, learned-points=6
Sep 1 14:04:41.927 [12309] dbg: learn: auto-learn? no: inside auto-learn thresholds, not considered ham or spam
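and that appears to be the case: body-points (1.045) and head-points (1.991) are both below the required 3. On top of that, the score computed for autolearn (5.26) sits between my ham threshold (-10) and my spam threshold (9), which is the reason the debug output gives for not learning the message.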
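If you want to run the same check over a lot of debug output, a small Python sketch like the following pulls the numbers out of the "auto-learn?" line. The helper name parse_autolearn_debug is hypothetical, and the regular expression assumes the key=value layout seen in the debug line from this post; other SpamAssassin versions may format the line differently.

```python
import re

# Debug line copied from the spamassassin -D -t output in this post.
DEBUG_LINE = ("Sep 1 14:04:41.927 [12309] dbg: learn: auto-learn? ham=-10, "
              "spam=9, body-points=1.045, head-points=1.991, learned-points=6")


def parse_autolearn_debug(line):
    """Extract the key=value numbers from an 'auto-learn?' debug line."""
    pairs = re.findall(r"([a-z-]+)=(-?\d+(?:\.\d+)?)", line)
    return {key: float(value) for key, value in pairs}


values = parse_autolearn_debug(DEBUG_LINE)

if __name__ == "__main__":
    # Flag the sections that fall short of the 3-point minimum from the docs.
    for section in ("body-points", "head-points"):
        status = "ok" if values[section] >= 3 else "below 3"
        print(section, values[section], status)
```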
tl;dr: spamassassin autolearn is working correctly as designed.