ImageCerberusPLG5 high score, no?

robertboyl · Post by **robertboyl** » 01 Apr 2016 14:41

Hi, everyone

I found a false-positive email in which the rule ImageCerberusPLG5 (4.50) hit with a high score. All the email contained was a banner/letterhead image with the customer's logo.

I found it strange, as this rule is not in official SA and, as I said, the score is really high; it helped a lot to mark the innocent message as spam. I'll try to analyse whether the rule helps at all...

I saw other posts in the forum from people asking about this score and considering lowering it. Any thoughts on whether this rule gets good hits, why it has such a high score, and how it works? Does it try to catch pornographic images or something?

Thanks

Post by **shawniverson** » 02 Apr 2016 14:25

For some this is the case. Image analysis is not always perfect.

Simply set a lower score for ImageCerberusPLG5 in /etc/mail/spamassassin/local.cf

robertboyl · Post by **robertboyl** » 08 Apr 2016 14:09

Thanks, but is this an official SA rule? I don't see it in the SA rules. What exactly does it do, and what type of image does it catch? Porn?

Why such a high score? I will try to analyse to see if it does have some good hits also...

What are other folks experience with this rule? Worth lowering score?

Thanks

dwmp · Post by **dwmp** » 19 May 2016 10:22

Hello,

We have the same problem (EFA 3.0.0.9 installed), but there is nothing about ImageCerberusPLGx in /etc/mail/spamassassin/local.cf. Instead, the scores are configured in /etc/mail/spamassassin/ImageCerberusPLG.cf.
So in which file should we make changes to adjust the scores?
Thanks in advance!

dwmp

ovizii · Post by **ovizii** » 19 May 2016 14:51

you edit /etc/mail/spamassassin/ImageCerberusPLG.cf to lower your scores

Post by **pdwalker** » 19 May 2016 16:20

No. Don't edit that file. That file could get overwritten on an update.

The proper answer is to override the values in local.cf.

I also found ImageCerberus scoring too highly for the messages I received, so I reduced the scores to a tenth of what they were. Here is what I added to /etc/mail/spamassassin/local.cf:

# scoring too high.  Reduce
score     ImageCerberusPLG5     0.5  0.5  0.5  0.5
score     ImageCerberusPLG4     0.4  0.4  0.4  0.4
score     ImageCerberusPLG3     0.3  0.3  0.3  0.3
score     ImageCerberusPLG2     0.2  0.2  0.2  0.2
score     ImageCerberusPLG1     0.1  0.1  0.1  0.1
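
If you would rather derive the reduced values than type them by hand, a small script can generate the override lines. This is only a sketch in Python: the starting scores below are assumptions (only the 4.50 for ImageCerberusPLG5 is quoted in this thread, so check your own ImageCerberusPLG.cf), and `scaled_score_lines` is a hypothetical helper name. It writes a single value per rule, which SpamAssassin applies to all four score sets.

```python
# Assumed starting scores. Only ImageCerberusPLG5 = 4.50 is quoted in this
# thread; replace the other values with those from your own installation.
current_scores = {
    "ImageCerberusPLG5": 4.5,
    "ImageCerberusPLG4": 4.0,
    "ImageCerberusPLG3": 3.0,
    "ImageCerberusPLG2": 2.0,
    "ImageCerberusPLG1": 1.0,
}


def scaled_score_lines(scores, factor=0.1):
    """Return local.cf 'score' lines with every score multiplied by factor."""
    lines = []
    for rule in sorted(scores, reverse=True):
        new_score = round(scores[rule] * factor, 2)
        lines.append(f"score {rule} {new_score:g}")
    return lines


if __name__ == "__main__":
    print("# scoring too high.  Reduce")
    for line in scaled_score_lines(current_scores):
        print(line)
```

Paste the output into /etc/mail/spamassassin/local.cf and run `spamassassin --lint` to check the syntax before restarting anything.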

ovizii · Post by **ovizii** » 19 May 2016 16:34

Sorry if I gave wrong advice. All the sites I've been browsing said custom .cf and .pm files go into /etc/mail/spamassassin/, so I didn't expect anything to overwrite files in there. After reading up on it, it seems I was only partially right: you can place your custom files in there and they will stay, but everything that's already in there by default could be overwritten.

Post by **pdwalker** » 19 May 2016 16:39

ovizii, that is the place for custom cf and pm files (at least that is where I am putting mine), but it's not really the place to alter preexisting files (except local.cf), as they were created by other packages.

So yes, preexisting stuff (except local.cf, hopefully) could get overwritten. New stuff should be left alone.

dwmp · Post by **dwmp** » 20 May 2016 10:34

Alright, I will edit local.cf.
Thanks very much guys!

robertboyl · Post by **robertboyl** » 20 May 2016 17:07

Hi,

Thanks, everyone! Is it not possible/worth it to lower these scores by default in EFA?

Are these official SA rules?

Thanks

Post by **shawniverson** » 28 May 2016 19:43

robertboyl wrote:Hi,

Thanks, everyone! Is it not possible/worth it to lower these scores by default in EFA?

Are these official SA rules?

Thanks

https://github.com/E-F-A/v3/issues/284

robertboyl · Post by **robertboyl** » 03 Jun 2016 14:31

Thanks a lot, Shawn, very nice of you.

Congrats on EFA and constant improvements!!

Daniel Beardsmore · Post by **Daniel Beardsmore** » 08 Jun 2016 20:12

robertboyl wrote:What are other folks experience with this rule? Worth lowering score?

I've just spotted ImageCerberusPLG3 trip up over a couple of innocuous images in a mail signature (one blank(!) and one being the company name in black text) — this earned the message +3 for its audacity, taking it to 4.08 total (it got hit for 1.20 for KAM_LINEPADDING, but reprieved a little for using DKIM).

I seem to recall being concerned with ImageCerberus scoring in the past, so I think it's time to lower the scores myself too.

robertboyl wrote:Why such a high score? I will try to analyse to see if it does have some good hits also...

This reminds me — something MailScanner seems to lack is a way to ask it "what have the Romans^W^W^Whas this rule ever done for me?" — it's all very well cursing a rule for a false positive, but maybe it's 99% accurate. I've yet to see any feature that allows you to conduct this search, although I've yet to actively seek a solution to this (unless I already tried, failed and forgot about it, which is possible …)

Post by **pdwalker** » 09 Jun 2016 16:06

There is a report that'll show you the spam assassin rule hits and the spam/non spam scoring.

Sorry, not at a computer so cannot tell you exactly where. Look under reports or tools and you'll find it.

Daniel Beardsmore · Post by **Daniel Beardsmore** » 09 Jun 2016 16:49

pdwalker wrote:There is a report that'll show you the spam assassin rule hits and the spam/non spam scoring.

That report doesn't bring up the individual messages associated with each rule. You can't determine from the report whether a rule is scoring too lowly (i.e. there are too many false negatives) or whether the rule is scoring too highly (i.e. there are too many false positives).

The one thing you do learn from it is the significance of the rule: the ratio of messages affected (positively or negatively) vs total messages for ImageCerberusPLG3 is 0.4% for me, so it doesn't seem a huge deal to largely write it out of the equation.

ImageCerberusPLG1 is the only one seeing a sizeable usage, of 7.5%, but (as I understand it) was only adding +1 anyway.

ovizii · Post by **ovizii** » 12 Jun 2016 19:16

Not sure what you are looking for, but in my opinion EFA => Reports => SA Rule Hits shows all that I need to fine-tune and tweak my scores.

ovizii · Post by **ovizii** » 13 Jun 2016 07:39

Has anyone got good experiences with this ImageCerberus plugin?

I just checked my stats and ImageCerberusPLG1 - ImageCerberusPLG4 are 100% HAM and ImageCerberusPLG5 is 50% HAM / 50% SPAM so this plugin basically does nothing to help me...

Daniel Beardsmore · Post by **Daniel Beardsmore** » 13 Jun 2016 07:57

So far as I can tell, the only information that you can gather from the SpamAssassin Rule Hits report is what percentage of messages are being affected by a rule. If this percentage is low, and the rule caused a false positive, it's safe to disable the rule, as it was doing very little anyway.

Let's imagine, however, that a rule was found to be involved with a lot of messages, and scored 50% spam/50% ham. There are several possible explanations for this. One is that it should be scoring 100% spam, but the rule is scored too low and isn't effective enough. Another explanation is that the rule is either entirely inappropriate or is being scored too high, and is causing a large number of false positives. It may be that the rule is actually largely ineffective and just happens to be there doing very little.

As I understand it, the "Ham" and "Spam" columns of the report don't tell you what the message really was, since SpamAssassin doesn't know. The figures only tell you how messages got classified, rather than how they should have been classified.

The report doesn't tell you how strong the rule is (that is, what score was applied), and the report doesn't give you any means to check for yourself whether the ham/spam classification is behaving as desired.

Going back to that 50/50 rule: if you could bring up a report of all messages affected by that rule, you could skim-read the list and check that all the messages in red appear to be spam, and that all the messages in grey appear to be ham. If you were to see ham messages in red and spam messages in grey, then it would become apparent that the messages are being misclassified, and you could check each message to see how the rule in question was being applied, and whether it was responsible for the misclassification.

Unless I am missing something here?
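
To make that skim-read concrete, here is a minimal Python sketch of the tally it would produce. It assumes you have already reviewed the messages that hit one rule by hand and recorded, for each, how it was classified and what it really was; `tally_rule_outcomes` and the sample data are hypothetical.

```python
from collections import Counter


def tally_rule_outcomes(reviewed):
    """Count outcomes for the messages that hit one rule.

    `reviewed` is an iterable of (classified_as_spam, really_spam) pairs,
    where the second value comes from a human reading the message.
    """
    labels = {
        (True, True): "spam caught",
        (True, False): "false positive",
        (False, True): "false negative",
        (False, False): "ham passed",
    }
    return Counter(labels[(bool(c), bool(r))] for c, r in reviewed)


# Hypothetical review of four messages that hit the rule.
sample = [(True, True), (True, False), (True, False), (False, False)]
counts = tally_rule_outcomes(sample)
print(counts["false positive"])  # 2
```

A high "false positive" count relative to "spam caught" suggests the rule is scored too high; the report's own Ham/Spam columns cannot show this, because they only reflect how messages were classified.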

ovizii · Post by **ovizii** » 13 Jun 2016 08:11

Makes sense, and you're right, but in my situation I can "trust" the report: I only started using EFA a few weeks ago, so I have been monitoring it very closely, daily, and have corrected every single mistake. I'd say that every spam message got caught, or at least learned by SA as spam if it slipped through, and every ham message got marked and learned as ham. TxRep and Bayes are working perfectly.
I did look at as many of the emails that hit ImageCerberusPLG5 as I could and, well, they were 50% spam and 50% ham. :-/

Yes, I second the idea that being able to pull out all messages marked by a specific SA plugin would be awesome but I don't think that would be interesting to too many people or doable.

Post by **pdwalker** » 16 Jun 2016 04:57

If you're interested in playing with SQL, you can run a query to list which messages were affected by certain rules.

Of course, you may want to process the results further to make it a little more readable.

select 
  timestamp, id, from_address, to_address, subject, isspam, ishighspam, issaspam, sascore, spamreport 
from 
  mailscanner.maillog 
where 
  spamreport like '% ImageCerberusPLG5 %' 
order by 
  sascore;
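
As one way to process the results further, the Python sketch below checks whether a given rule was what pushed a message over the spam threshold. It assumes the spamreport column holds text of the form `score=5.7, required 5, RULE_NAME 1.23, ...`; check that against your own rows first. The function names and the sample report are hypothetical.

```python
import re

# A rule hit is assumed to look like "RULE_NAME 1.23" inside the report text.
RULE_RE = re.compile(r"\b([A-Za-z][A-Za-z0-9_]*) (-?\d+(?:\.\d+)?)")
REQUIRED_RE = re.compile(r"required (-?\d+(?:\.\d+)?)")


def parse_spamreport(report):
    """Return (required_score, {rule_name: score}) from a spamreport string."""
    match = REQUIRED_RE.search(report)
    if match is None:
        raise ValueError("no 'required N' found in report")
    required = float(match.group(1))
    hits = {
        name: float(value)
        for name, value in RULE_RE.findall(report)
        if name != "required"  # "required 5" is the threshold, not a rule
    }
    return required, hits


def rule_was_decisive(report, rule):
    """True if the message is spam with the rule but would be ham without it."""
    required, hits = parse_spamreport(report)
    if rule not in hits:
        return False
    total = sum(hits.values())
    return total >= required > total - hits[rule]


# Made-up report in the assumed format.
example = ("spam, SpamAssassin (not cached, score=5.7, required 5, "
           "BAYES_50 0.80, ImageCerberusPLG5 4.50, HTML_MESSAGE 0.40)")
print(rule_was_decisive(example, "ImageCerberusPLG5"))  # True
```

Running this over each spamreport value returned by the query narrows the list to the messages where the rule actually changed the outcome, which are the ones worth reading by hand.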

robertboyl · Post by **robertboyl** » 17 Aug 2016 17:06

Guys/Shawn,

Just curious, what value do you suggest for the ImageCerberusPLG5 score? Maybe 1 point instead of 4.50?

I don't have root access, but I'll ask my sysadmin whether he can assess this: filter out a few days of emails and see how many good results it has, etc. I see some very weird false positives. A very basic customer's signature in an email, with his company logo and name, hit ImageCerberusPLG5 4.50.

I wonder whether this works at all, and at what ratio. If anyone has any analysis, please send it.

Will try also to get more details on the ratio of when it works versus causes FP.

Thanks!

Post by **pdwalker** » 19 Aug 2016 09:56

You can see my changes near the top of this thread. I've had fewer false positives since then.

robertboyl · Post by **robertboyl** » 19 Aug 2016 16:30

Thanks a lot, pdwalker. Just curious, do you see it getting some legitimate hits?

How does it work, more or less? Does it analyse images, OCR-style, trying to find patterns? That seems hard to do... The FPs are strange: basic company logos with people's names.
