
Custom Bad Words

Posted: 16 May 2017 11:13
by wolffpu
Hello,
I'm trying to add a custom filter for bad words in the email subject or body text, in our language, but it doesn't seem to work.
Every guide I've found suggests editing /etc/mail/spamassassin/local.cf.
I've also tried changing /etc/MailScanner/spamassassin.conf, but that doesn't work either.

What's wrong?

Code: Select all

header CONTAINS_VIG Subject =~ /badword, badword2,badoword3,etc/i

body CONTAINS_PEN /badword, badword2,badoword3,etc/i

score CONTAINS_VIG 1.5
score CONTAINS_PEN 1.5

describe CONTAINS_VIG Bad Word
describe CONTAINS_PEN Bad Word
Custom changes to KAM.cf won't survive an update, so I'd rather not change it.

Could you help me?

Thank you,
Regards

Re: Custom Bad Words

Posted: 16 May 2017 12:09
by shawniverson
Did you run an sa-compile and restart MailScanner after the change?

Re: Custom Bad Words

Posted: 16 May 2017 14:03
by wolffpu
To reduce errors, I've tried a simpler version and it works:

Code: Select all

header   BAD_WORDS_ITA Subject =~ /badword/i
score    BAD_WORDS_ITA 2.0
If I add another word as below it doesn't work anymore:

Code: Select all

header   BAD_WORDS_ITA Subject =~ /badword,badword2/i
What's wrong?

I also need every word I list to be matched, and the more of them a message contains, the higher its score should be.
Thank you,
Regards

Re: Custom Bad Words

Posted: 16 May 2017 21:23
by shawniverson

Code: Select all

header   BAD_WORDS_ITA Subject =~ /badword,badword2/i
You cannot comma-separate words in a regex; use alternation instead:

Code: Select all

header   BAD_WORDS_ITA Subject =~ /(badword|badword2)/i
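As a hedged illustration of why the comma version stops matching: inside a regex a comma is just a literal character, so /badword,badword2/ only hits a subject containing the exact text "badword,badword2". Below is a minimal sketch of the alternation form, assuming placeholder words and a made-up rule name (LOCAL_BAD_WORDS_SUBJ does not come from this thread). The \b word boundaries are an optional addition to avoid matching inside longer words.

```
# Hypothetical rule name; replace the placeholder words with your own list.
header   LOCAL_BAD_WORDS_SUBJ Subject =~ /\b(?:badword|badword2|badword3)\b/i
score    LOCAL_BAD_WORDS_SUBJ 2.0
describe LOCAL_BAD_WORDS_SUBJ Subject contains a locally listed bad word
```

Note that this scores the same 2.0 whether one word or several match, so on its own it does not give more points for more words.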
Better yet, use META rules to score different combinations of bad words...

Code: Select all

header BAD_WORD_1 Subject =~ /badword1/i
header BAD_WORD_2 Subject =~ /badword2/i
header BAD_WORD_3 Subject =~ /badword3/i

meta BAD_WORDSET1 (BAD_WORD_1 && BAD_WORD_2)
score BAD_WORDSET1 2.5

meta BAD_WORDSET2 (BAD_WORD_1 && BAD_WORD_3)
score BAD_WORDSET2 4.0
https://wiki.apache.org/spamassassin/WritingRules

Re: Custom Bad Words

Posted: 19 May 2017 15:41
by peter.munnelly
header BAD_WORD_1 Subject =~ /badword1/i
header BAD_WORD_2 Subject =~ /badword2/i
header BAD_WORD_3 Subject =~ /badword3/i

meta BAD_WORDSET1 (BAD_WORD_1 && BAD_WORD_2)
score BAD_WORDSET1 2.5

meta BAD_WORDSET2 (BAD_WORD_1 && BAD_WORD_3)
score BAD_WORDSET2 4.0

This works, but the problem is that the individual rules BAD_WORD_1 and BAD_WORD_2 are also each scored at 1.0.

For example, I want to block mail sent from one specific domain to another:

header CONTAINS_domain1 To =~ /domain1\.co\.uk/i
header CONTAINS_domain2 From =~ /domain2\.co\.uk/i
meta CONTAINS_domain1domain2 (CONTAINS_domain1 && CONTAINS_domain2)
score CONTAINS_domain1domain2 10.0

What happens in the headers is this;

1.00 CONTAINS_domain1
1.00 CONTAINS_domain2
10.00 CONTAINS_domain1domain2

Then every other email with either domain1 in To or domain2 in From is also scored 1.00.

If I try to set the scores of the individual header rules myself, the meta is not processed at all.

I hope that makes sense. Do you have any ideas on how to correct this?

Re: Custom Bad Words

Posted: 20 May 2017 11:24
by shawniverson
Use a META for a single header instead of scoring the header itself.

Re: Custom Bad Words

Posted: 26 May 2017 17:09
by peter.munnelly
Sorry, I'm not sure what you mean. Could you give me an example, please?

Re: Custom Bad Words

Posted: 26 May 2017 17:27
by shawniverson
meta BAD_WORD_1_ONLY (BAD_WORD_1 && !BAD_WORDSET1 && !BAD_WORDSET2)
score BAD_WORD_1_ONLY 1.0
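A possible alternative sketch for the domain example, using the double-underscore sub-rule convention from the SpamAssassin rule-writing docs: rules whose names start with __ are evaluated but are not scored or listed on their own, so only the meta shows up in the report. The rule names below are made up for illustration. If the individual rules were given a score of 0, that may also explain why the meta stopped firing: a rule scored 0 is disabled, so a meta that depends on it can no longer match.

```
# Sub-rules: the leading "__" means they are evaluated but get no score of their own.
header   __LOCAL_TO_DOMAIN1   To   =~ /domain1\.co\.uk/i
header   __LOCAL_FROM_DOMAIN2 From =~ /domain2\.co\.uk/i

# Only the combination is scored and reported.
meta     LOCAL_DOMAIN1_FROM_DOMAIN2 (__LOCAL_TO_DOMAIN1 && __LOCAL_FROM_DOMAIN2)
score    LOCAL_DOMAIN1_FROM_DOMAIN2 10.0
describe LOCAL_DOMAIN1_FROM_DOMAIN2 To domain1 and From domain2
```

With this layout, mail that matches only one of the two conditions should pick up no extra points from these rules.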

Re: Custom Bad Words

Posted: 26 May 2017 20:41
by peter.munnelly
Thanks, I've got that working now.

Another question: how can I catch this header?

List-Unsubscribe:

I have tried
header contains_listunsubscribe1 Header =~ /Unsubscribe\/i

but this does not catch it.

Re: Custom Bad Words

Posted: 27 May 2017 08:45
by pdwalker
Try one of the following, depending on what exactly you want to search for:

Code: Select all

header UNSUBSCRIBE_HEADER List-Unsubscribe =~ /<value>/
or

Code: Select all

header UNSUBSCRIBE_HEADER exists:List-Unsubscribe
Let me know if that works for you.
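
For what it's worth, here is a rough sketch of both variants side by side. In a header rule, the word after the rule name is the name of the header to inspect, which is probably why the attempt with a literal "Header" never matched. The rule names, scores, and the mailto pattern below are assumptions for illustration only.

```
# Fires whenever the header is present, whatever its value.
header   LOCAL_HAS_LIST_UNSUB exists:List-Unsubscribe
score    LOCAL_HAS_LIST_UNSUB 0.5

# Fires only when the header value matches the pattern (here: a mailto link).
header   LOCAL_LIST_UNSUB_MAILTO List-Unsubscribe =~ /<mailto:/i
score    LOCAL_LIST_UNSUB_MAILTO 0.5
```

The exists: form is the simpler choice when the mere presence of the header is what matters.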

Re: Custom Bad Words

Posted: 29 May 2017 04:15
by rooter_c
shawniverson wrote: 26 May 2017 17:27
meta BAD_WORD_1_ONLY (BAD_WORD_1 && !BAD_WORDSET1 && !BAD_WORDSET2)
score BAD_WORD_1_ONLY 1.0

I'm new to all this, but after reading the rule-writing guide, I think I've managed to use double underscores to keep the individual rules from scoring, so that a message only gets a score when any two of them match together.

header __badword1 from =~ /mikej/i
header __badword2 from =~ /bounce/i
header __badword3 subject =~ /summit/i

meta BAD_WORDSET1 (__badword1 + __badword2 + __badword3 >1)
score BAD_WORDSET1 10.0
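
Building on the double-underscore idea, here is one way the "more words, more points" requirement from earlier in the thread could be sketched. It assumes cumulative thresholds, where each meta adds its score on top of the previous one. All rule names, words, and scores are placeholders.

```
header __LOCAL_BW_1 Subject =~ /badword1/i
header __LOCAL_BW_2 Subject =~ /badword2/i
header __LOCAL_BW_3 Subject =~ /badword3/i

# At least one word matched: 1.0
meta  LOCAL_BW_1PLUS (__LOCAL_BW_1 + __LOCAL_BW_2 + __LOCAL_BW_3 > 0)
score LOCAL_BW_1PLUS 1.0

# At least two words matched: 1.0 + 1.5 = 2.5 in total
meta  LOCAL_BW_2PLUS (__LOCAL_BW_1 + __LOCAL_BW_2 + __LOCAL_BW_3 > 1)
score LOCAL_BW_2PLUS 1.5

# All three words matched: 2.5 + 1.5 = 4.0 in total
meta  LOCAL_BW_3PLUS (__LOCAL_BW_1 + __LOCAL_BW_2 + __LOCAL_BW_3 > 2)
score LOCAL_BW_3PLUS 1.5
```

Because the sub-rules start with __, none of them adds a default 1.0 of its own.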

Re: Custom Bad Words

Posted: 29 May 2017 04:47
by pdwalker
Are you getting hits on your BAD_WORD_1_ONLY rule? If so, then you've got it working.

Need someone to send you some test emails with suitably appropriate badwords to get flagged as spam?

Re: Custom Bad Words

Posted: 29 May 2017 04:57
by rooter_c
Thanks, I'm pretty sure mister mikej will hit me up again later tonight...

Re: Custom Bad Words

Posted: 30 May 2017 11:23
by peter.munnelly
This seems to be doing the trick, thanks.

header UNSUBSCRIBE_HEADER exists:List-Unsubscribe

Re: Custom Bad Words

Posted: 31 May 2017 10:03
by pdwalker
Excellent!

Re: Custom Bad Words

Posted: 25 Jul 2017 14:54
by jhavell
Is there an existing ruleset for profanity in headers?

Specifically, I have had an issue over the past few months with a particular set of emails that Bayesian marking has not been able to flag reliably, despite my repeatedly marking and reporting them as spam. They are blatantly obvious advertisements for those "rude dating" websites, but they come from random IP addresses and random (probably spoofed) email addresses.

Image
Edited for obvious reasons.

I want to add a simple list of words that are blocked, but I am not exactly sure how to go about creating and testing these rules.

Would it be as simple as adding the following to /spamassasin.conf, then running sa-compile and restarting MailScanner?

Code: Select all

header   BAD_WORDS_ITA Subject =~ /(F##K|S#X)/i
or is this better?

Code: Select all

header __badword1 Subject =~ /F##K/i
header __badword2 Subject =~ /EXPRESS/i
header __badword3 Subject =~ /S#X/i

meta BAD_WORDSET1 (__badword1 + __badword2 + __badword3 >1)
score BAD_WORDSET1 10.0


Re: Custom Bad Words

Posted: 26 Jul 2017 10:26
by pdwalker
Strange, I'd have thought that the bayesian filters would have caught it. Can you show us the spamassassin scoring for these messages?

Perhaps the Bayesian filters are catching it, but the overall SpamAssassin score is too low for the message to be considered spam. The SpamAssassin reports will tell us for sure.

If you're unhappy with the current SA filters, and you want to give them a little push, then I'd do what you are suggesting; add in my own rules to /etc/mail/spamassassin/local.cf which is preserved across updates.

I had a look in the current spamassassin rulesets and there doesn't seem to be a rule for trapping the stuff you are interested in, so go ahead and update your local.cf and then restart MailScanner. I've never needed to run sa-compile before.

Oh, and my preference is the second way. Also, don't forget you can do body checks as well.
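
A rough sketch of what such a local.cf addition might look like, with a Subject check and a body check as suggested. The words, rule names, and scores are placeholders. One thing to watch: in SpamAssassin config files an unescaped # starts a comment, so a literal # inside a pattern has to be written as \#. Running spamassassin --lint before restarting MailScanner should flag syntax errors.

```
# Subject check.
header   LOCAL_BW_SUBJ Subject =~ /\b(?:badword1|badword2)\b/i
score    LOCAL_BW_SUBJ 2.0
describe LOCAL_BW_SUBJ Subject contains a locally listed bad word

# Body check. Body rules also see the Subject line, so this can hit on the subject too.
body     LOCAL_BW_BODY /\b(?:badword1|badword2)\b/i
score    LOCAL_BW_BODY 1.0
describe LOCAL_BW_BODY Body contains a locally listed bad word
```

Starting with modest scores and checking the SpamAssassin report on a few real messages is probably safer than jumping straight to 10.0.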

Re: Custom Bad Words

Posted: 26 Jul 2017 20:27
by jhavell
Here is a screen capture of a header search for "F##K EXPRESS". Indeed, some are coming through as clean while others are not. Some even have a negative SA score!


Also, after posting this, I marked all the "clean" messages as spam and reported them.
Image

I wonder if there is something else in common. I am surprised that "F##K" isn't a standard filter for the header. I will add rules to my local.cf using the multiple-rule meta method and run some tests.

Re: Custom Bad Words

Posted: 27 Jul 2017 02:11
by pdwalker
To understand the scores, you really need to show the SpamAssassin report scoring for these messages, like the one below.
[Attachment: Screen Shot 2017-07-27 at 10.03.png]
You can see that I've given more spam weight to the Bayesian filter values for 99% and 99.9%, plus some other custom rules I've added to my system over time.

I would really like to know the spam report for your negatively scored F##K spam to see what is going wrong.
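
For readers wondering what giving more weight to the Bayes hits could look like: stock rules can be re-scored from local.cf with plain score lines. The values below are made-up examples, not recommendations, and this assumes your SpamAssassin version ships both the BAYES_99 and BAYES_999 rules.

```
# Example values only; tune them against your own mail.
score BAYES_99  4.5
score BAYES_999 1.0
```

As with any local change, restart MailScanner afterwards so the new scores are picked up.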

Re: Custom Bad Words

Posted: 02 Aug 2017 14:50
by jhavell
-3.3 spam score
Image

Bayesian may be working now, however: everything past 7/1 was marked. The date filtering on the message listing sure sorts funny; it can't handle the dates.

From and To addresses removed. Keyword replaced with ### for posting.


Columns: Date/Time, Subject, Size, SA Score, Status (From and To removed)
16/06/17 15:22:05 Easily find girlfriend for ###! 66.47kB -3.3 Clean
11/6/2017 7:58 Easily find girlfriend for ###! 66.63kB -2.49 Clean
17/06/17 03:57:23 Easily find girlfriend for ###! 66.51kB -1.49 Clean
16/06/17 22:27:09 Easily find girlfriend for ###! 66.57kB -1.48 Clean
20/06/17 01:33:46 Easily find girlfriend for ###! 66.55kB -0.79 Clean
16/06/17 09:51:03 Easily find girlfriend for ###! 66.82kB -0.49 Clean
26/06/17 15:33:55 Easily find girlfriend for ###! 66.5kB -0.45 Clean
19/06/17 12:25:21 Easily find girlfriend for ###! 80.04kB -0.36 Clean
12/6/2017 7:11 Easily find girlfriend for ###! 66.26kB 0 Clean
19/06/17 12:08:54 Easily find girlfriend for ###! 66.53kB 0.01 Clean
27/06/17 08:34:15 Easily find girlfriend for ###! 66.46kB 0.04 Clean
10/6/2017 20:41 Easily find girlfriend for ###! 67.93kB 0.1 Clean
19/06/17 12:28:05 Easily find girlfriend for ###! 78.3kB 0.13 Clean
19/06/17 15:11:08 Easily find girlfriend for ###! 77.77kB 0.33 Clean
10/6/2017 16:39 Easily find girlfriend for ###! 66.71kB 0.42 Clean
19/06/17 08:37:54 Easily find girlfriend for ###! 67.51kB 0.5 Clean
26/06/17 20:56:39 Easily find girlfriend for ###! 66.5kB 0.54 Clean
20/06/17 23:22:34 Easily find girlfriend for ###! 66.76kB 0.56 Clean
24/06/17 12:46:28 Easily find girlfriend for ###! 67.73kB 0.91 Clean
19/06/17 14:23:35 Easily find girlfriend for ###! 78.78kB 1.13 Clean
23/06/17 10:38:33 Easily find girlfriend for ###! 78.75kB 1.16 Clean
30/06/17 17:00:14 Easily find girlfriend for ###! 66.63kB 1.21 Clean
30/06/17 20:41:01 Easily find girlfriend for ###! 66.51kB 1.76 Clean
19/06/17 22:46:05 ***wahrsch. SPAM*** Easily find girlfriend for ###! 68.12kB 1.77 Clean
19/06/17 20:47:35 Easily find girlfriend for ###! 66.65kB 2.04 Clean
23/07/17 23:05:51 Easily find girlfriend for ###! 67.67kB 2.21 Clean
29/06/17 02:44:12 Easily find girlfriend for ###! 66.51kB 2.36 Clean
21/06/17 20:40:27 Easily find girlfriend for ###! 66.71kB 2.46 Clean
23/06/17 06:57:43 Easily find girlfriend for ###! 66.55kB 2.84 Clean
23/06/17 06:59:21 Easily find girlfriend for ###! 66.4kB 2.84 Clean
3/7/2017 18:44 Easily find girlfriend for ###! 66.78kB 3.4 Clean
20/06/17 07:30:17 Easily find girlfriend for ###! 67.22kB 3.67 Clean
24/07/17 02:47:29 Easily find girlfriend for ###! 66.75kB 3.87 Clean
23/07/17 11:56:34 Easily find girlfriend for ###! 66.7kB 4.22 Spam
23/07/17 20:40:37 Easily find girlfriend for ###! 67.14kB 4.98 Spam
24/07/17 01:42:23 Easily find girlfriend for ###! 66.75kB 4.98 Spam
24/07/17 10:24:00 Easily find girlfriend for ###! 66.84kB 4.98 Spam
24/07/17 11:46:31 Easily find girlfriend for ###! 66.86kB 4.98 Spam
23/07/17 13:17:42 Easily find girlfriend for ###! 67.57kB 5.1 Spam
23/06/17 20:00:58 Easily find girlfriend for ###! 66.77kB 5.28 Spam
29/06/17 12:19:11 Easily find girlfriend for ###! 66.66kB 5.44 Spam
17/06/17 09:53:55 Easily find girlfriend for ###! 66.75kB 5.56 Spam
24/06/17 09:39:00 Easily find girlfriend for ###! 66.7kB 5.61 Spam
23/07/17 22:29:15 Easily find girlfriend for ###! 67.3kB 5.78 Spam
20/06/17 07:02:35 Easily find girlfriend for ###! 67.55kB 5.95 Spam
22/07/17 22:40:27 Easily find girlfriend for ###! 66.75kB 6.14 Spam
27/06/17 04:35:21 Easily find girlfriend for ###! 66.91kB 6.16 Spam
3/7/2017 7:13 Easily find girlfriend for ###! 79.12kB 6.21 Spam
30/06/17 17:43:03 Easily find girlfriend for ###! 66.66kB 6.25 Spam
24/06/17 10:26:31 Easily find girlfriend for ###! 67.69kB 6.75 Spam
25/06/17 13:14:51 Easily find girlfriend for ###! 67.33kB 7.18 Spam
23/07/17 18:01:08 Easily find girlfriend for ###! 66.74kB 8.02 Spam
24/06/17 11:05:24 Easily find girlfriend for ###! 67.14kB 8.31 Spam
24/06/17 01:23:23 Easily find girlfriend for ###! 67.21kB 8.86 Spam
27/06/17 22:36:41 Easily find girlfriend for ###! 66.75kB 8.92 Spam
28/06/17 14:04:42 Easily find girlfriend for ###! 67.48kB 8.96 Spam
3/7/2017 9:29 Easily find girlfriend for ###! 67.03kB 8.99 Spam
24/07/17 07:15:18 Easily find girlfriend for ###! 66.85kB 9.45 Spam
27/06/17 04:19:34 Easily find girlfriend for ###! 67.3kB 9.8 Spam
23/06/17 05:22:09 Easily find girlfriend for ###! 66.7kB 9.9 Spam
28/06/17 05:51:10 Easily find girlfriend for ###! 67.61kB 10.1 Spam
27/06/17 19:17:27 Easily find girlfriend for ###! 67.51kB 10.89 Spam
24/06/17 21:58:58 Easily find girlfriend for ###! 67.07kB 11.19 Spam
28/06/17 01:51:15 Easily find girlfriend for ###! 67.54kB 13 Spam
2/7/2017 11:52 Easily find girlfriend for ###! 66.7kB 13.39 Spam
30/06/17 21:20:35 Easily find girlfriend for ###! 66.63kB 13.39 Spam
28/06/17 22:56:47 Easily find girlfriend for ###! 67.29kB 14 Spam

Re: Custom Bad Words

Posted: 02 Aug 2017 22:17
by shawniverson
It kind of looks hit and miss to me...

Which is typical of borderline bayesian spam....or adaptive spam that is trying to defeat the bayesian filter....

Re: Custom Bad Words

Posted: 03 Aug 2017 07:49
by pdwalker
More importantly, if the bayesian filter is being trained, then it should stop scoring the message as "definitely not spam" (BAYES_00) and move it somewhere into one of the higher classifications.

Also, you may need to add your own filters to help boost your spam score for seemingly obvious spammy messages.

Edit: about your sort, that's strange. Mine sorts perfectly by date. Maybe you can change your date format to yyyy-mm-dd and then see if it sorts correctly for you.