Custom Bad Words

wolffpu
Posts: 2
Joined: 16 May 2017 11:02

Post by wolffpu »

Hello,
I'm trying to add a custom filter for bad words in the email subject or body text in our language, but it doesn't seem to work.
Every guide I've found suggests editing /etc/mail/spamassassin/local.cf.
I've also tried editing /etc/MailScanner/spamassassin.conf, but that doesn't work either.

What's wrong?

Code: Select all

header CONTAINS_VIG Subject =~ /badword, badword2,badoword3,etc/i

body CONTAINS_PEN /badword, badword2,badoword3,etc/i

score CONTAINS_VIG 1.5
score CONTAINS_PEN 1.5

describe CONTAINS_VIG Bad Word
describe CONTAINS_PEN Bad Word
Custom changes to KAM.cf won't survive an update, so I'd rather not change it.

Could you help me?

Thank you,
Regards
shawniverson
Posts: 3644
Joined: 13 Jan 2014 23:30
Location: Indianapolis, Indiana USA

Re: Custom Bad Words

Post by shawniverson »

Did you run an sa-compile and restart MailScanner after the change?
wolffpu
Posts: 2
Joined: 16 May 2017 11:02

Re: Custom Bad Words

Post by wolffpu »

To reduce errors, I've tried a simpler version and it works:

Code: Select all

header   BAD_WORDS_ITA Subject =~ /badword/i
score    BAD_WORDS_ITA 2.0
If I add another word as below it doesn't work anymore:

Code: Select all

header   BAD_WORDS_ITA Subject =~ /badword,badword2/i
What's wrong?

I also need to match every word I list, and the more words that match, the higher the score should be.
Thank you,
Regards
shawniverson
Posts: 3644
Joined: 13 Jan 2014 23:30
Location: Indianapolis, Indiana USA

Re: Custom Bad Words

Post by shawniverson »

Code: Select all

header   BAD_WORDS_ITA Subject =~ /badword,badword2/i
You cannot comma-separate words in a regex; the commas are matched literally. Use alternation instead:

Code: Select all

header   BAD_WORDS_ITA Subject =~ /(badword|badword2)/i
Better yet, use META rules to score different combinations of bad words...

Code: Select all

header BAD_WORD_1 Subject =~ /badword1/i
header BAD_WORD_2 Subject =~ /badword2/i
header BAD_WORD_3 Subject =~ /badword3/i

meta BAD_WORDSET1 (BAD_WORD_1 && BAD_WORD_2)
score BAD_WORDSET1 2.5

meta BAD_WORDSET2 (BAD_WORD_1 && BAD_WORD_3)
score BAD_WORDSET2 4.0
https://wiki.apache.org/spamassassin/WritingRules
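
For the "more words, more points" part of the question, one possible sketch is below. It assumes the words are checked in the Subject header, the words are placeholders, and the rule names and scores are made up for illustration. Rules whose names start with a double underscore match without scoring on their own, and each meta adds its own score, so the total grows as more words match.

```
# Hypothetical names, placeholder words, arbitrary scores.
# Double-underscore subrules match but do not score by themselves.
header   __LOCAL_BW_1   Subject =~ /\bbadword1\b/i
header   __LOCAL_BW_2   Subject =~ /\bbadword2\b/i
header   __LOCAL_BW_3   Subject =~ /\bbadword3\b/i

# Each meta contributes its own score, so hits stack.
meta     LOCAL_BW_ANY   ((__LOCAL_BW_1 + __LOCAL_BW_2 + __LOCAL_BW_3) >= 1)
describe LOCAL_BW_ANY   Subject contains at least one listed word
score    LOCAL_BW_ANY   1.5

meta     LOCAL_BW_TWO   ((__LOCAL_BW_1 + __LOCAL_BW_2 + __LOCAL_BW_3) >= 2)
describe LOCAL_BW_TWO   Subject contains at least two listed words
score    LOCAL_BW_TWO   1.0

meta     LOCAL_BW_ALL   ((__LOCAL_BW_1 + __LOCAL_BW_2 + __LOCAL_BW_3) >= 3)
describe LOCAL_BW_ALL   Subject contains all three listed words
score    LOCAL_BW_ALL   1.0
```

With these illustrative numbers, one matching word adds 1.5, two add 2.5, and three add 3.5 in total.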
peter.munnelly
Posts: 23
Joined: 25 Nov 2015 16:31

Re: Custom Bad Words

Post by peter.munnelly »

header BAD_WORD_1 Subject =~ /badword1/i
header BAD_WORD_2 Subject =~ /badword2/i
header BAD_WORD_3 Subject =~ /badword3/i

meta BAD_WORDSET1 (BAD_WORD_1 && BAD_WORD_2)
score BAD_WORDSET1 2.5

meta BAD_WORDSET2 (BAD_WORD_1 && BAD_WORD_3)
score BAD_WORDSET2 4.0

This works, but the problem is that BAD_WORD_1 and BAD_WORD_2 are each also scored at 1.0 on their own.

For example, I want to block email sent between two specific domains:

header CONTAINS_domain1 To =~ /domain1\.co\.uk/i
header CONTAINS_domain2 From =~ /domain2\.co\.uk/i
meta CONTAINS_domain1domain2 (CONTAINS_domain1 && CONTAINS_domain2)
score CONTAINS_domain1domain2 10.0

This is what shows up in the headers:

1.00 CONTAINS_domain1
1.00 CONTAINS_domain2
10.00 CONTAINS_domain1domain2

So every other email with either domain1 in the To or domain2 in the From is scored at 1.00.

If I try to set scores on the individual header rules, the meta is not processed at all.

I hope that makes sense. Do you have any ideas on how to correct this?
shawniverson
Posts: 3644
Joined: 13 Jan 2014 23:30
Location: Indianapolis, Indiana USA

Re: Custom Bad Words

Post by shawniverson »

Use a META for a single header instead of scoring the header itself.
peter.munnelly
Posts: 23
Joined: 25 Nov 2015 16:31

Re: Custom Bad Words

Post by peter.munnelly »

Sorry, I'm not sure what you mean. Could you give me an example, please?
shawniverson
Posts: 3644
Joined: 13 Jan 2014 23:30
Location: Indianapolis, Indiana USA

Re: Custom Bad Words

Post by shawniverson »

meta BAD_WORD_1_ONLY (BAD_WORD_1 && !BAD_WORDSET1 && !BAD_WORDSET2)
score BAD_WORD_1_ONLY 1.0
peter.munnelly
Posts: 23
Joined: 25 Nov 2015 16:31

Re: Custom Bad Words

Post by peter.munnelly »

Thanks, I've got that working now.

Another question: how can I catch this header?

List-Unsubscribe:

I have tried
header contains_listunsubscribe1 Header =~ /Unsubscribe\/i

but this does not catch it?
pdwalker
Posts: 1553
Joined: 18 Mar 2015 09:16

Re: Custom Bad Words

Post by pdwalker »

try one of the following, depending on what exactly you want to search for:

Code: Select all

header UNSUBSCRIBE_HEADER List-Unsubscribe =~ /<value>/
or

Code: Select all

header UNSUBSCRIBE_HEADER exists:List-Unsubscribe
Let me know if that works for you.
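
As a rough illustration of how the two forms might be combined, here is a sketch that flags mail which has a List-Unsubscribe header but no mailto: entry in it. The rule names, the regex, and the score are all assumptions made for the example, not something taken from a stock ruleset.

```
# Hypothetical names and score; subrules with a leading "__" do not score.
header   __LOCAL_HAS_UNSUB      exists:List-Unsubscribe
header   __LOCAL_UNSUB_MAILTO   List-Unsubscribe =~ /<mailto:[^>]+>/i

# Hit only when the header is present but carries no mailto: target.
meta     LOCAL_UNSUB_NO_MAILTO  (__LOCAL_HAS_UNSUB && !__LOCAL_UNSUB_MAILTO)
describe LOCAL_UNSUB_NO_MAILTO  List-Unsubscribe present without a mailto entry
score    LOCAL_UNSUB_NO_MAILTO  0.5
```

If you only care whether the header is present, the plain exists: form on its own is enough.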
rooter_c
Posts: 13
Joined: 05 Mar 2013 04:52

Re: Custom Bad Words

Post by rooter_c »

shawniverson wrote: 26 May 2017 17:27
meta BAD_WORD_1_ONLY (BAD_WORD_1 && !BAD_WORDSET1 && !BAD_WORDSET2)
score BAD_WORD_1_ONLY 1.0

I'm new to all this, but after reading the rule-writing guide, I think I've managed to use double underscores to keep individual rules from scoring, so that a score is applied only when any two of them match together:

header __badword1 from =~ /mikej/i
header __badword2 from =~ /bounce/i
header __badword3 subject =~ /summit/i

meta BAD_WORDSET1 (__badword1 + __badword2 + __badword3 >1)
score BAD_WORDSET1 10.0
pdwalker
Posts: 1553
Joined: 18 Mar 2015 09:16

Re: Custom Bad Words

Post by pdwalker »

Are you getting hits on your BAD_WORD_1_ONLY rule? If so, then you've got it working.

Need someone to send you some test emails with suitably appropriate badwords to get flagged as spam?
rooter_c
Posts: 13
Joined: 05 Mar 2013 04:52

Re: Custom Bad Words

Post by rooter_c »

Thanks, I'm pretty sure mister mikej will hit me up again later tonight...
peter.munnelly
Posts: 23
Joined: 25 Nov 2015 16:31

Re: Custom Bad Words

Post by peter.munnelly »

This seems to be doing the trick, thanks.

header UNSUBSCRIBE_HEADER exists:List-Unsubscribe
pdwalker
Posts: 1553
Joined: 18 Mar 2015 09:16

Re: Custom Bad Words

Post by pdwalker »

Excellent!
jhavell
Posts: 5
Joined: 12 Apr 2017 19:26

Re: Custom Bad Words

Post by jhavell »

Is there an existing ruleset for profanity in headers?

Specifically, I have had an issue over the past few months with a particular set of emails that Bayesian marking has not been able to flag reliably, despite my repeatedly marking and reporting them as spam. They are blatantly obvious advertisements for those "rude dating" websites, but they come from random IP addresses and random (probably spoofed) email addresses.

Image
Edited for obvious reasons.

I want to add a simple list of words that are blocked. I am not exactly sure how to go about creating and testing these rules.

Would it be as simple as adding the following to /spamassasin.conf,
then running sa-compile and restarting MailScanner after the change?

Code: Select all

header   BAD_WORDS_ITA Subject =~ /(F##K|S#X)/i
or is this better?

Code: Select all

header __badword1 Subject =~ /F##K/i
header __badword2 Subject =~ /EXPRESS/i
header __badword3 Subject =~ /S#X/i

meta BAD_WORDSET1 (__badword1 + __badword2 + __badword3 >1)
score BAD_WORDSET1 10.0

pdwalker
Posts: 1553
Joined: 18 Mar 2015 09:16

Re: Custom Bad Words

Post by pdwalker »

Strange, I'd have thought that the bayesian filters would have caught it. Can you show us the spamassassin scoring for these messages?

Perhaps the Bayesian filters are catching it, but the SpamAssassin score is too low for it to be considered spam. The SpamAssassin reports will tell us for sure.

If you're unhappy with the current SA filters, and you want to give them a little push, then I'd do what you are suggesting; add in my own rules to /etc/mail/spamassassin/local.cf which is preserved across updates.

I had a look in the current spamassassin rulesets and there doesn't seem to be a rule for trapping the stuff you are interested in, so go ahead and update your local.cf and then restart MailScanner. I've never needed to run sa-compile before.

Oh, and my preference is the second way. Also, don't forget you can do body checks as well.
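
Since body checks came up, here is a minimal sketch of what one might look like in local.cf. The rule name, the placeholder words, and the score are all made up for illustration.

```
# Hypothetical rule: placeholder words, arbitrary score.
# A "body" rule is matched against the message body text rather than a named header.
body     LOCAL_BODY_BADWORDS   /\b(?:badword1|badword2|badword3)\b/i
describe LOCAL_BODY_BADWORDS   Body contains a locally listed word
score    LOCAL_BODY_BADWORDS   1.5
```

Running spamassassin --lint after editing the file should report syntax errors in the new rules before MailScanner is restarted.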
jhavell
Posts: 5
Joined: 12 Apr 2017 19:26

Re: Custom Bad Words

Post by jhavell »

Here is a screen capture using a header search for "F##K EXPRESS". Indeed, some are coming through as clean while others are not. Some even have a negative SA score!

Also, after posting this, I marked all the "clean" messages as spam and reported them.
Image

I wonder if there is something else in common. I am surprised that "F##K" isn't a standard filter for the header. I will add to my local.cf using the meta multiple method and run some tests.
pdwalker
Posts: 1553
Joined: 18 Mar 2015 09:16

Re: Custom Bad Words

Post by pdwalker »

To understand the scores, you really need to show the SpamAssassin report scores for these messages, like the one below:
Screen Shot 2017-07-27 at 10.03.png
You can see that I've given more spam weight to the Bayesian filter values for 99% and 99.9%, plus some other custom rules I've added to my system over time.

I would really like to know the spam report for your negatively scored F##K spam to see what is going wrong.
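
For anyone wanting to try the same kind of Bayes weighting, a sketch of the local.cf overrides might look like this. BAYES_99 and BAYES_999 are the stock rule names for the 99% and 99.9% buckets; the numbers below are purely illustrative and are not the values from my own system.

```
# Hypothetical score overrides; check the stock scores on your install first.
score BAYES_99    4.5
score BAYES_999   1.0
```

Pick values with your own spam threshold in mind, since a large Bayes score can push a message over the line on its own.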
jhavell
Posts: 5
Joined: 12 Apr 2017 19:26

Re: Custom Bad Words

Post by jhavell »

-3.3 spam score
Image

Bayesian may be working now, however: everything after 7/1 was marked. The date filtering on the message listing sure sorts funny; it can't handle the dates.

From and To addresses removed; keyword replaced with ### for posting.


Date/Time  From  To  Subject  Size  SA Score  Status
16/06/17 15:22:05 Easily find girlfriend for ###! 66.47kB -3.3 Clean
11/6/2017 7:58 Easily find girlfriend for ###! 66.63kB -2.49 Clean
17/06/17 03:57:23 Easily find girlfriend for ###! 66.51kB -1.49 Clean
16/06/17 22:27:09 Easily find girlfriend for ###! 66.57kB -1.48 Clean
20/06/17 01:33:46 Easily find girlfriend for ###! 66.55kB -0.79 Clean
16/06/17 09:51:03 Easily find girlfriend for ###! 66.82kB -0.49 Clean
26/06/17 15:33:55 Easily find girlfriend for ###! 66.5kB -0.45 Clean
19/06/17 12:25:21 Easily find girlfriend for ###! 80.04kB -0.36 Clean
12/6/2017 7:11 Easily find girlfriend for ###! 66.26kB 0 Clean
19/06/17 12:08:54 Easily find girlfriend for ###! 66.53kB 0.01 Clean
27/06/17 08:34:15 Easily find girlfriend for ###! 66.46kB 0.04 Clean
10/6/2017 20:41 Easily find girlfriend for ###! 67.93kB 0.1 Clean
19/06/17 12:28:05 Easily find girlfriend for ###! 78.3kB 0.13 Clean
19/06/17 15:11:08 Easily find girlfriend for ###! 77.77kB 0.33 Clean
10/6/2017 16:39 Easily find girlfriend for ###! 66.71kB 0.42 Clean
19/06/17 08:37:54 Easily find girlfriend for ###! 67.51kB 0.5 Clean
26/06/17 20:56:39 Easily find girlfriend for ###! 66.5kB 0.54 Clean
20/06/17 23:22:34 Easily find girlfriend for ###! 66.76kB 0.56 Clean
24/06/17 12:46:28 Easily find girlfriend for ###! 67.73kB 0.91 Clean
19/06/17 14:23:35 Easily find girlfriend for ###! 78.78kB 1.13 Clean
23/06/17 10:38:33 Easily find girlfriend for ###! 78.75kB 1.16 Clean
30/06/17 17:00:14 Easily find girlfriend for ###! 66.63kB 1.21 Clean
30/06/17 20:41:01 Easily find girlfriend for ###! 66.51kB 1.76 Clean
19/06/17 22:46:05 ***wahrsch. SPAM*** Easily find girlfriend for ###! 68.12kB 1.77 Clean
19/06/17 20:47:35 Easily find girlfriend for ###! 66.65kB 2.04 Clean
23/07/17 23:05:51 Easily find girlfriend for ###! 67.67kB 2.21 Clean
29/06/17 02:44:12 Easily find girlfriend for ###! 66.51kB 2.36 Clean
21/06/17 20:40:27 Easily find girlfriend for ###! 66.71kB 2.46 Clean
23/06/17 06:57:43 Easily find girlfriend for ###! 66.55kB 2.84 Clean
23/06/17 06:59:21 Easily find girlfriend for ###! 66.4kB 2.84 Clean
3/7/2017 18:44 Easily find girlfriend for ###! 66.78kB 3.4 Clean
20/06/17 07:30:17 Easily find girlfriend for ###! 67.22kB 3.67 Clean
24/07/17 02:47:29 Easily find girlfriend for ###! 66.75kB 3.87 Clean
23/07/17 11:56:34 Easily find girlfriend for ###! 66.7kB 4.22 Spam
23/07/17 20:40:37 Easily find girlfriend for ###! 67.14kB 4.98 Spam
24/07/17 01:42:23 Easily find girlfriend for ###! 66.75kB 4.98 Spam
24/07/17 10:24:00 Easily find girlfriend for ###! 66.84kB 4.98 Spam
24/07/17 11:46:31 Easily find girlfriend for ###! 66.86kB 4.98 Spam
23/07/17 13:17:42 Easily find girlfriend for ###! 67.57kB 5.1 Spam
23/06/17 20:00:58 Easily find girlfriend for ###! 66.77kB 5.28 Spam
29/06/17 12:19:11 Easily find girlfriend for ###! 66.66kB 5.44 Spam
17/06/17 09:53:55 Easily find girlfriend for ###! 66.75kB 5.56 Spam
24/06/17 09:39:00 Easily find girlfriend for ###! 66.7kB 5.61 Spam
23/07/17 22:29:15 Easily find girlfriend for ###! 67.3kB 5.78 Spam
20/06/17 07:02:35 Easily find girlfriend for ###! 67.55kB 5.95 Spam
22/07/17 22:40:27 Easily find girlfriend for ###! 66.75kB 6.14 Spam
27/06/17 04:35:21 Easily find girlfriend for ###! 66.91kB 6.16 Spam
3/7/2017 7:13 Easily find girlfriend for ###! 79.12kB 6.21 Spam
30/06/17 17:43:03 Easily find girlfriend for ###! 66.66kB 6.25 Spam
24/06/17 10:26:31 Easily find girlfriend for ###! 67.69kB 6.75 Spam
25/06/17 13:14:51 Easily find girlfriend for ###! 67.33kB 7.18 Spam
23/07/17 18:01:08 Easily find girlfriend for ###! 66.74kB 8.02 Spam
24/06/17 11:05:24 Easily find girlfriend for ###! 67.14kB 8.31 Spam
24/06/17 01:23:23 Easily find girlfriend for ###! 67.21kB 8.86 Spam
27/06/17 22:36:41 Easily find girlfriend for ###! 66.75kB 8.92 Spam
28/06/17 14:04:42 Easily find girlfriend for ###! 67.48kB 8.96 Spam
3/7/2017 9:29 Easily find girlfriend for ###! 67.03kB 8.99 Spam
24/07/17 07:15:18 Easily find girlfriend for ###! 66.85kB 9.45 Spam
27/06/17 04:19:34 Easily find girlfriend for ###! 67.3kB 9.8 Spam
23/06/17 05:22:09 Easily find girlfriend for ###! 66.7kB 9.9 Spam
28/06/17 05:51:10 Easily find girlfriend for ###! 67.61kB 10.1 Spam
27/06/17 19:17:27 Easily find girlfriend for ###! 67.51kB 10.89 Spam
24/06/17 21:58:58 Easily find girlfriend for ###! 67.07kB 11.19 Spam
28/06/17 01:51:15 Easily find girlfriend for ###! 67.54kB 13 Spam
2/7/2017 11:52 Easily find girlfriend for ###! 66.7kB 13.39 Spam
30/06/17 21:20:35 Easily find girlfriend for ###! 66.63kB 13.39 Spam
28/06/17 22:56:47 Easily find girlfriend for ###! 67.29kB 14 Spam
shawniverson
Posts: 3644
Joined: 13 Jan 2014 23:30
Location: Indianapolis, Indiana USA

Re: Custom Bad Words

Post by shawniverson »

It kind of looks hit and miss to me...

Which is typical of borderline bayesian spam....or adaptive spam that is trying to defeat the bayesian filter....
pdwalker
Posts: 1553
Joined: 18 Mar 2015 09:16

Re: Custom Bad Words

Post by pdwalker »

More importantly, if the bayesian filter is being trained, then it should stop scoring the message as "definitely not spam" (BAYES_00) and move it somewhere into one of the higher classifications.

Also, you may need to add your own filters to help boost your spam score for seemingly obvious spammy messages.

Edit: about your sort, that's strange. Mine sorts perfectly by date. Maybe you can change your date format to yyyy-mm-dd and then see if it sorts correctly for you.