spamassassin autolearn always no

Post by **Woger** » 29 Aug 2017 13:53

My Bayes database is scoring, but every spam email I receive has "spamassassin autolearn: no". I do quite a lot of manual learning, but it seems that autolearn is not working. I have added
bayes_auto_learn_threshold_spam 9.0
to /etc/MailScanner/spamassassin.conf and restarted MailScanner, but still no autolearn.

Does anybody know how to enable this?

Thanks,
Roger

Post by **shawniverson** » 29 Aug 2017 20:47

This?

Code:

bayes_auto_learn 1

Post by **Woger** » 29 Aug 2017 20:58

I have that too, and I just saw an "autolearn: yes" on a non-spam message, so it seems to be working. But I can't find a single spam message where autolearn also said yes.

Post by **shawniverson** » 31 Aug 2017 23:40

I wonder if the new bayes behavior is the culprit?

https://spamassassin.apache.org/full/3. ... shold.html

(look at the end)

Post by **pdwalker** » 01 Sep 2017 04:17

How so?

Assuming that bayes_auto_learn is set to 1 in /etc/MailScanner/spamassassin.conf, the system should be autolearning according to the following settings:

bayes_auto_learn_threshold_nonspam n.nn (default: 0.1)

bayes_auto_learn_threshold_spam n.nn (default: 12.0; minimum 6, i.e. 3 points from the header and 3 from the body score)

bayes_auto_learn_on_error (0 | 1) (default: 0)

This last entry says

"With bayes_auto_learn_on_error off, autolearning will be performed even if bayes classifier already agrees with the new classification (i.e. yielded BAYES_00 for what we are now trying to teach it as ham, or yielded BAYES_99 for spam). This is a traditional setting, the default was chosen to retain backwards compatibility."

which means that, with the default value, detected spam/ham should always be "learned" according to the thresholds above, even when Bayes already agrees with the classification.
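To make the interaction of these settings concrete, here is a simplified Python model. It is a sketch under stated assumptions, not SpamAssassin's actual code: the function `autolearn_decision` and its `bayes_verdict` parameter (a stand-in for what the Bayes classifier already thinks) are hypothetical, and the real plugin applies further checks (such as the header/body point minimums) that are left out here.

```python
from typing import Optional

# Defaults as listed in the documentation quoted above.
NONSPAM_THRESHOLD = 0.1  # bayes_auto_learn_threshold_nonspam
SPAM_THRESHOLD = 12.0    # bayes_auto_learn_threshold_spam


def autolearn_decision(score: float,
                       bayes_verdict: Optional[str] = None,
                       learn_on_error: bool = False,
                       nonspam_threshold: float = NONSPAM_THRESHOLD,
                       spam_threshold: float = SPAM_THRESHOLD) -> str:
    """Return 'ham', 'spam' or 'no' (simplified model, not SpamAssassin's code).

    score is the score computed for autolearn; bayes_verdict is a hypothetical
    stand-in for the Bayes classifier's current opinion: 'ham', 'spam' or None.
    """
    if score < nonspam_threshold:
        candidate = "ham"
    elif score >= spam_threshold:
        candidate = "spam"
    else:
        return "no"  # inside the thresholds: neither ham nor spam
    # bayes_auto_learn_on_error 1: skip messages Bayes already classifies the same way.
    if learn_on_error and bayes_verdict == candidate:
        return "no"
    return candidate


print(autolearn_decision(13.0))                       # spam
print(autolearn_decision(9.5))                        # no
print(autolearn_decision(9.5, spam_threshold=9.0))    # spam
print(autolearn_decision(-4.48))                      # ham
print(autolearn_decision(13.0, bayes_verdict="spam", learn_on_error=True))  # no
```

With the thread's `bayes_auto_learn_threshold_spam 9.0`, pass `spam_threshold=9.0`: an autolearn score of 9.5 is then learned as spam, while with the 12.0 default it is not.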

[edit]

My bayes_auto_learn_threshold_spam is set to 9.0, and my spam that is 9.0 or greater shows "SpamAssassin Autolearn: N".

Maybe the bayes_auto_learn_on_error default is actually 1? I'm going to set this value to 0 and see if it makes a difference.

Post by **pdwalker** » 01 Sep 2017 04:46

So I set it to 0, and it didn't make a difference - the header still showed "spam assassin autolearn = N"

I did a manual learn, and it reported "learned 1 message", which definitely means the message had not been autolearned.

Now that I think about it, I wonder if there is a way to report whether a particular message has been "learned" or not by spamassassin?

*scratches chin* Looks like I have a new distraction for today.

Post by **pdwalker** » 01 Sep 2017 05:03

It might be a bug in Mailscanner. Checking...

[edit] no, not a bug. still checking.

Post by **pdwalker** » 01 Sep 2017 06:06

I ran this query on the mailscanner maillog table:

Code:

SELECT timestamp, sascore, spamreport FROM mailscanner.maillog where (spamreport like '%autolearn=%' and not spamreport like '%(blacklisted)%') order by timestamp desc;

and I get the following interesting results (only a few examples shown):

Code:

'2017-09-01 00:18:04','-4.48','not spam, SpamAssassin (not cached, score=-4.478, required 4, autolearn=not spam, BAYES_00 -1.90, DKIM_SIGNED 0.10, DKIM_VALID -0.10, DKIM_VALID_AU -0.10, HTML_MESSAGE 0.00, MIME_QP_LONG_LINE 0.00, ML_SPAM_HEADER_NO -0.01, ML_SPF_PASS -0.68, MXPF_TEST 0.00, RCVD_IN_DNSWL_MED -2.30, RCVD_IN_SORBS_SPAM 0.50, SPF_HELO_PASS -0.00, T_SPF_PERMERROR 0.01)'
'2017-09-01 00:17:25','13.00','spam, SpamAssassin (not cached, score=13.003, required 4, autolearn=spam, BAYES_50 0.80, DATE_IN_PAST_12_24 1.05, DCC_CHECK 1.10, DEAR_SOMETHING 1.97, DIGEST_MULTIPLE 0.29, FREEMAIL_FORGED_FROMDOMAIN 0.20, FREEMAIL_FROM 0.00, FREEMAIL_REPLYTO_END_DIGIT 0.25, HEADER_FROM_DIFFERENT_DOMAINS 0.00, HPF_PASS -0.10, HTML_MESSAGE 0.00, KAM_LAZY_DOMAIN_SECURITY 1.00, MISSING_MID 0.50, ML_SPAMINFO_EXISTS 3.00, MXPF_TEST 0.00, PYZOR_CHECK 1.39, RCVD_IN_BL_SPAMCOP_NET 1.35, RCVD_IN_DNSWL_MED -2.30, SPF_HELO_PASS -0.00, URIBL_DBL_SPAM 2.50)'

The key thing to look at in the SpamAssassin report is the "autolearn=" field.

When I look at all my records, not every entry has an autolearn field, including some that, based on my autolearn threshold settings, should have been learned.

Filtering for the entries with an autolearn field in the spamreport, I can see that it does autolearn some messages, as both spam and not spam, so autolearn is definitely working in some circumstances. My log entries above show this.
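Checking for the autolearn field by eye gets tedious, so here is a small Python sketch that pulls the `autolearn=` value out of a spamreport string. The regular expression is an assumption based on the report format shown above (the value runs up to the next comma or closing parenthesis), `autolearn_value` is a hypothetical helper, and the sample reports are shortened excerpts of reports quoted in this thread.

```python
import re
from typing import Optional

# The value runs from "autolearn=" up to the next comma or closing parenthesis.
AUTOLEARN_RE = re.compile(r"autolearn=([^,)]+)")


def autolearn_value(spamreport: str) -> Optional[str]:
    """Return the autolearn value in a spam report, or None if there is none."""
    match = AUTOLEARN_RE.search(spamreport)
    return match.group(1).strip() if match else None


# Shortened excerpts of spam reports quoted in this thread.
reports = [
    "not spam, SpamAssassin (not cached, score=-4.478, required 4, autolearn=not spam, BAYES_00 -1.90)",
    "spam, SpamAssassin (not cached, score=13.003, required 4, autolearn=spam, BAYES_50 0.80)",
    "spam, SpamAssassin (not cached, score=12.714, required 4, BAYES_99 4.00, BAYES_999 2.00)",
]

values = [autolearn_value(report) for report in reports]
print(values)  # ['not spam', 'spam', None]
print(sum(value is None for value in values), "report(s) without autolearn")
```

A `None` result marks exactly the kind of entry described above: a report with no autolearn field at all.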

The only thing I can think of is from the spamassassin documentation

Note: SpamAssassin requires at least 3 points from the header, and 3 points from the body to auto-learn as spam. Therefore, the minimum working value for this option is 6.

So I'm guessing that the messages that are not getting autolearned are the ones where the header or the body does not score at least 3 points on the spam autolearn test.
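The documented rule can be sketched in Python as follows. This is a simplified model, not SpamAssassin's code: `can_autolearn_spam` is a hypothetical helper, its default `spam_threshold` of 9.0 is the value used in this thread, and it only covers the spam side (the score threshold plus the 3-point header and body minimums from the note above).

```python
# Minimums from the documentation note quoted above.
MIN_HEAD_POINTS = 3.0
MIN_BODY_POINTS = 3.0


def can_autolearn_spam(autolearn_score: float, head_points: float,
                       body_points: float, spam_threshold: float = 9.0) -> bool:
    """Return True if a message would be auto-learned as spam (simplified)."""
    if autolearn_score < spam_threshold:
        return False  # below bayes_auto_learn_threshold_spam
    # Both the header and the body must contribute at least 3 points each.
    return head_points >= MIN_HEAD_POINTS and body_points >= MIN_BODY_POINTS


# Hypothetical point values, for illustration only.
print(can_autolearn_spam(10.0, head_points=4.0, body_points=3.5))  # True
print(can_autolearn_spam(10.0, head_points=7.0, body_points=1.0))  # False
```

This also shows why 6 is the minimum working threshold: a message can only pass both point checks if header and body together contribute at least 3 + 3 points.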

Let's test a message that was marked as spam but not autolearned:

spam, SpamAssassin (not cached, score=12.714, required 4, BAYES_99 4.00, BAYES_999 2.00, HTML_FONT_FACE_BAD 0.98, HTML_FONT_LOW_CONTRAST 0.00, HTML_MESSAGE 0.00, KAM_BADIPHTTP 2.00, KAM_LAZY_DOMAIN_SECURITY 1.00, ML_SPAM_HEADER_NO -0.01, MXPF_TEST 0.00, RAZOR2_CF_RANGE_51_100 0.50, RAZOR2_CF_RANGE_E8_51_100 1.89, RAZOR2_CHECK 0.92, RCVD_IN_DNSWL_MED -2.30, SPF_HELO_PASS -0.00, T_KAM_HTML_FONT_INVALID 0.01, URIBL_SBL 1.62, URIBL_SBL_A 0.10)

Running spamassassin -D -t on that message reveals:

Code:

Sep  1 14:04:41.927 [12309] dbg: learn: auto-learn: message score: 11.714, computed score for autolearn: 5.26
Sep  1 14:04:41.927 [12309] dbg: learn: auto-learn? ham=-10, spam=9, body-points=1.045, head-points=1.991, learned-points=6
Sep  1 14:04:41.927 [12309] dbg: learn: auto-learn? no: inside auto-learn thresholds, not considered ham or spam

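and that appears to be the case: the head points (1.991) and body points (1.045) are both below 3, and the score computed for autolearn (5.26) falls inside the auto-learn thresholds (ham=-10, spam=9).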
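As a worked check, this Python sketch parses the numeric debug lines above and lists every condition that blocks spam autolearning. `parse_autolearn_debug` and `autolearn_reasons` are hypothetical helpers, and the 3-point minimums are taken from the documentation note quoted earlier. Note that the debug output itself reports only the threshold check; the sketch lists every failing condition anyway.

```python
import re

# Debug lines copied from the spamassassin -D -t output above.
DEBUG_OUTPUT = """\
Sep  1 14:04:41.927 [12309] dbg: learn: auto-learn: message score: 11.714, computed score for autolearn: 5.26
Sep  1 14:04:41.927 [12309] dbg: learn: auto-learn? ham=-10, spam=9, body-points=1.045, head-points=1.991, learned-points=6
"""

MIN_POINTS = 3.0  # header/body minimum from the documentation note


def parse_autolearn_debug(text: str) -> dict:
    """Collect the numeric autolearn values from SpamAssassin debug output."""
    values = {}
    match = re.search(r"computed score for autolearn: (-?[\d.]+)", text)
    if match:
        values["autolearn_score"] = float(match.group(1))
    # key=value pairs such as ham=-10, spam=9, body-points=1.045
    for key, number in re.findall(r"([\w-]+)=(-?[\d.]+)", text):
        values[key] = float(number)
    return values


def autolearn_reasons(values: dict) -> list:
    """List every condition that blocks autolearning as spam (simplified)."""
    reasons = []
    if values["ham"] <= values["autolearn_score"] < values["spam"]:
        reasons.append("autolearn score inside thresholds")
    if values["head-points"] < MIN_POINTS:
        reasons.append("fewer than 3 header points")
    if values["body-points"] < MIN_POINTS:
        reasons.append("fewer than 3 body points")
    return reasons


values = parse_autolearn_debug(DEBUG_OUTPUT)
print(values["autolearn_score"], values["spam"])  # 5.26 9.0
print(autolearn_reasons(values))
```

For this message all three conditions fail, so even lowering the spam threshold below 5.26 would not have made it autolearn.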

tl;dr: spamassassin autolearn is working correctly as designed.

Post by **pdwalker** » 01 Sep 2017 06:36

pdwalker wrote: 01 Sep 2017 04:46
Now that I think about it, I wonder if there is a way to report whether a particular message has been "learned" or not by spamassassin?

I didn't need to do this, but I felt compelled to answer my own question anyway.

If you edit /var/www/html/mailscanner/status.php, find the SQL query at the beginning of the file and insert the following lines just before the "FROM" line:

,case
when spamreport like '%autolearn=spam%' then 'spam'
when spamreport like '%autolearn=not spam%' then 'not spam'
else '-'
end as autolearn

so

Code:

 mcpsascore,
 '' AS status
FROM
 maillog

becomes:

Code:

 mcpsascore,
 '' AS status
,case
  when spamreport like '%autolearn=spam%' then 'spam'
  when spamreport like '%autolearn=not spam%' then 'not spam'
  else '-'
end as autolearn
FROM
 maillog

and you'll end up with an extra column on the far right that answers the question of whether a message has been autolearned or not, and if so, as what. (This isn't friendly to multiple display languages; I was in a hurry.)
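To try the CASE expression without touching a live MailScanner database, here is a Python sketch that runs it against an in-memory SQLite table. SQLite is only a stand-in: the real `mailscanner.maillog` table has many more columns, the `id` column here is hypothetical, and the sample reports are shortened excerpts from earlier in this thread.

```python
import sqlite3

# In-memory stand-in for the maillog table (only the columns needed here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE maillog (id INTEGER PRIMARY KEY, spamreport TEXT)")
conn.executemany(
    "INSERT INTO maillog (spamreport) VALUES (?)",
    [
        ("not spam, SpamAssassin (score=-4.478, required 4, autolearn=not spam, BAYES_00 -1.90)",),
        ("spam, SpamAssassin (score=13.003, required 4, autolearn=spam, BAYES_50 0.80)",),
        ("spam, SpamAssassin (score=12.714, required 4, BAYES_99 4.00, BAYES_999 2.00)",),
    ],
)

# Same CASE expression as the status.php change, aliased as autolearn.
QUERY = """
SELECT id,
  CASE
    WHEN spamreport LIKE '%autolearn=spam%' THEN 'spam'
    WHEN spamreport LIKE '%autolearn=not spam%' THEN 'not spam'
    ELSE '-'
  END AS autolearn
FROM maillog
ORDER BY id
"""

rows = conn.execute(QUERY).fetchall()
for row_id, autolearn in rows:
    print(row_id, autolearn)
conn.close()
```

Each row is labeled `not spam`, `spam`, or `-` when the report has no autolearn field. The two WHEN patterns cannot collide, because "autolearn=spam" is not a substring of "autolearn=not spam".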

[Attachment: autolearn.png]

Note to self: Make the modifications necessary to also display whether it has been "learned" by the bayes classifier or not, in the same column.

Note to self: but don't do this today no matter how much you want to.

Post by **shawniverson** » 04 Sep 2017 17:03

Big thanks for digging into this!!!

Post by **pdwalker** » 05 Sep 2017 03:34

No worries.

It's moments like the above that make me think I have OCD. Might as well make use of it!

Actually, having a "learned" column would be useful, especially if I can combine the autolearn information with the sa_learn data.

I think I'll do that next.

efa-project.org
