ImageCerberusPLG5 high score, no?

robertboyl
Posts: 25
Joined: 09 Feb 2015 11:29

Post by robertboyl »

Hi, everyone

I found a false-positive email where the rule ImageCerberusPLG5 hit with a high score of 4.50. All the email contained was a banner/letterhead image with the customer's logo.

I found this strange, as the rule is not in official SA and, as I said, the score is really high; it did a lot to mark an innocent message as spam. I'll try to analyse whether the rule helps at all...

I saw other forum posts from people asking about this score and considering lowering it. Any thoughts on whether this rule gets good hits, why the score is so high, and how it works? Does it try to catch pornographic images or something?

Thanks
shawniverson
Posts: 3644
Joined: 13 Jan 2014 23:30
Location: Indianapolis, Indiana USA

Post by shawniverson »

For some this is the case. Image analysis is not always perfect.

Simply set a lower score for ImageCerberusPLG5 in /etc/mail/spamassassin/local.cf
robertboyl
Posts: 25
Joined: 09 Feb 2015 11:29

Post by robertboyl »

Thanks, but is this an official SA rule? I don't see it in the SA rules. What exactly does it do, and what type of image does it catch? Porn?

Why such a high score? I will try to analyse whether it also has some good hits...

What are other folks' experiences with this rule? Is it worth lowering the score?

Thanks
dwmp
Posts: 54
Joined: 05 Feb 2016 13:42

Post by dwmp »

Hello,

We have the same problem (EFA 3.0.0.9 installed), but there is nothing about ImageCerberusPLGx in /etc/mail/spamassassin/local.cf. Instead, the scores are configured in /etc/mail/spamassassin/ImageCerberusPLG.cf.
So in which file should we make changes to edit the score level?
Thanks in advance!

dwmp
ovizii
Posts: 463
Joined: 11 May 2016 08:08

Post by ovizii »

You edit /etc/mail/spamassassin/ImageCerberusPLG.cf to lower your scores.
pdwalker
Posts: 1553
Joined: 18 Mar 2015 09:16

Post by pdwalker »

No. Don't edit that file. That file could get overwritten on an update.

The proper answer is to override the values in local.cf.

I also found ImageCerberus scoring too high for the messages I received, so I reduced the scores to a tenth of what they were. Here is what I added to /etc/mail/spamassassin/local.cf:

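# ImageCerberus scores too high; reduce them.
# Four values per rule, one per SpamAssassin score set (Bayes/network tests off or on).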
score     ImageCerberusPLG5     0.5  0.5  0.5  0.5
score     ImageCerberusPLG4     0.4  0.4  0.4  0.4
score     ImageCerberusPLG3     0.3  0.3  0.3  0.3
score     ImageCerberusPLG2     0.2  0.2  0.2  0.2
score     ImageCerberusPLG1     0.1  0.1  0.1  0.1
ovizii
Posts: 463
Joined: 11 May 2016 08:08

Post by ovizii »

Sorry if I gave wrong advice. All the sites I've been browsing said custom .cf and .pm files go into /etc/mail/spamassassin/, so I didn't expect anything to overwrite files in there. After reading up on it, it seems I was only partially right: you can place your custom files in there and they will stay, but everything that's already in there by default could be overwritten.
pdwalker
Posts: 1553
Joined: 18 Mar 2015 09:16

Post by pdwalker »

ovizii, that is the place for custom cf and pm files (at least that is where I am putting mine), but it's not really the place to alter preexisting files (except local.cf), as they were created by other packages.

So yes, preexisting stuff (except local.cf, hopefully) could get overwritten. New stuff should be left alone.
dwmp
Posts: 54
Joined: 05 Feb 2016 13:42

Post by dwmp »

Alright, I will edit local.cf.
Thanks very much guys!
robertboyl
Posts: 25
Joined: 09 Feb 2015 11:29

Post by robertboyl »

Hi,

Thanks, everyone! Is it not possible/worth it to lower these scores by default in EFA?

Are these official SA rules?

Thanks
shawniverson
Posts: 3644
Joined: 13 Jan 2014 23:30
Location: Indianapolis, Indiana USA

Post by shawniverson »

robertboyl wrote: Is it not possible/worth it to lower these scores by default in EFA? Are these official SA rules?

https://github.com/E-F-A/v3/issues/284
robertboyl
Posts: 25
Joined: 09 Feb 2015 11:29

Post by robertboyl »

Thanks a lot, Shawn, very nice of you.

Congrats on EFA and constant improvements!!
Daniel Beardsmore
Posts: 28
Joined: 06 Jan 2016 18:54
Location: Hertfordshire, UK

Post by Daniel Beardsmore »

robertboyl wrote: What are other folks experience with this rule? Worth lowering score?

I've just spotted ImageCerberusPLG3 trip up over a couple of innocuous images in a mail signature (one blank(!) and one being the company name in black text). This earned the message +3 for its audacity, taking it to 4.08 total (it got hit for 1.20 for KAM_LINEPADDING, but was reprieved a little for using DKIM).

I seem to recall being concerned with ImageCerberus scoring in the past, so I think it's time to lower the scores myself too.

robertboyl wrote: Why such a high score? I will try to analyse to see if it does have some good hits also...

This reminds me: something MailScanner seems to lack is a way to ask it "what have the Romans^W^W^Whas this rule ever done for me?" It's all very well cursing a rule for a false positive, but maybe it's 99% accurate. I've yet to see any feature that allows you to conduct this search, although I've yet to actively seek a solution to this (unless I already tried, failed and forgot about it, which is possible…)
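
To put numbers on the +3 example above, here is a minimal Python sketch of the "what if the score were lower" arithmetic. It only uses figures quoted in this thread (4.08 total, 3.00 from ImageCerberusPLG3, and the 0.3 override pdwalker posted for local.cf); `rescored_total` is a made-up helper name, not anything provided by SpamAssassin or EFA.

```python
def rescored_total(total, old_rule_score, new_rule_score):
    """Swap one rule's contribution and return the message's new total.

    Hypothetical helper for illustration; all inputs are taken from
    the figures quoted in the thread.
    """
    return round(total - old_rule_score + new_rule_score, 2)


# 4.08 total, of which ImageCerberusPLG3 contributed 3.00.
# Replace that contribution with the 0.3 override from local.cf.
new_total = rescored_total(4.08, 3.00, 0.3)
print(new_total)
```

With the override in place, the same signature images would leave the message at about 1.38 instead of 4.08.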
pdwalker
Posts: 1553
Joined: 18 Mar 2015 09:16

Post by pdwalker »

There is a report that'll show you the SpamAssassin rule hits and the spam/non-spam scoring.

Sorry, not at a computer so cannot tell you exactly where. Look under reports or tools and you'll find it.
Daniel Beardsmore
Posts: 28
Joined: 06 Jan 2016 18:54
Location: Hertfordshire, UK

Post by Daniel Beardsmore »

pdwalker wrote: There is a report that'll show you the spam assassin rule hits and the spam/non spam scoring.

That report doesn't bring up the individual messages associated with each rule. You can't determine from the report whether a rule is scoring too low (i.e. there are too many false negatives) or too high (i.e. there are too many false positives).

The one thing you do learn from it is the significance of the rule: the ratio of messages affected (positively or negatively) vs total messages for ImageCerberusPLG3 is 0.4% for me, so it doesn't seem a huge deal to largely write it out of the equation.

ImageCerberusPLG1 is the only one seeing a sizeable usage, of 7.5%, but (as I understand it) was only adding +1 anyway.
ovizii
Posts: 463
Joined: 11 May 2016 08:08

Post by ovizii »

Not sure what you are looking for, but in my opinion going to EFA => Reports => SA Rule Hits shows all that I need to fine-tune and tweak my scores.
ovizii
Posts: 463
Joined: 11 May 2016 08:08

Post by ovizii »

Has anyone got good experiences with this ImageCerberus plugin?

I just checked my stats and ImageCerberusPLG1 - ImageCerberusPLG4 are 100% HAM and ImageCerberusPLG5 is 50% HAM / 50% SPAM so this plugin basically does nothing to help me...
Daniel Beardsmore
Posts: 28
Joined: 06 Jan 2016 18:54
Location: Hertfordshire, UK

Post by Daniel Beardsmore »

So far as I can tell, the only information that you can gather from the SpamAssassin Rule Hits report is what percentage of messages are being affected by a rule. If this percentage is low, and the rule caused a false positive, it's safe to disable the rule, as it was doing very little anyway.

Let's imagine, however, that a rule was involved with a lot of messages and scored 50% spam/50% ham. There are several possible explanations. One is that it should be scoring 100% spam, but the rule is scored too low to be effective. Another is that the rule is either entirely inappropriate or scored too high, and is causing a large number of false positives. Or the rule may be largely ineffective and just happen to be there, doing very little.

As I understand it, the "Ham" and "Spam" columns of the report don't tell you what the message really was, since SpamAssassin doesn't know. The figures only tell you how messages got classified, rather than how they should have been classified.

The report doesn't tell you how strong the rule is (that is, what score was applied), and the report doesn't give you any means to check for yourself whether the ham/spam classification is behaving as desired.

Going back to that 50/50 rule: if you could bring up a report of all messages affected by that rule, you could skim-read the list and check that all the messages in red appear to be spam, and that all the messages in grey appear to be ham. If you were to see ham messages in red and spam messages in grey, then it would become apparent that the messages are being misclassified, and you could check each message to see how the rule in question was being applied, and whether it was responsible for the misclassification.

Unless I am missing something here?
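
The kind of check described above could be sketched roughly like this in Python. It is only an illustration under stated assumptions: the message records, their three keys (`rules`, `flagged_spam`, `really_spam`) and the `audit_rule` helper are all hypothetical, and the "really spam" flag has to come from a human reading each message, since neither SpamAssassin nor the report knows it.

```python
from collections import Counter


def audit_rule(messages, rule):
    """Count how a rule's hits split between correct and wrong verdicts.

    `messages` is an iterable of dicts with three hypothetical keys:
      rules        - set of rule names that fired on the message
      flagged_spam - how the filter classified the message
      really_spam  - what a human decided after reading it
    """
    tally = Counter()
    for msg in messages:
        if rule not in msg["rules"]:
            continue  # the rule did not fire, so it is not implicated
        if msg["flagged_spam"] and not msg["really_spam"]:
            tally["false_positive"] += 1
        elif not msg["flagged_spam"] and msg["really_spam"]:
            tally["false_negative"] += 1
        else:
            tally["correct"] += 1
    return tally


# Made-up sample data, for illustration only.
sample = [
    {"rules": {"ImageCerberusPLG5"}, "flagged_spam": True, "really_spam": False},
    {"rules": {"ImageCerberusPLG5"}, "flagged_spam": True, "really_spam": True},
    {"rules": {"KAM_LINEPADDING"}, "flagged_spam": False, "really_spam": False},
]
result = audit_rule(sample, "ImageCerberusPLG5")
print(dict(result))
```

A rule with many false positives among its hits is a candidate for a lower score. Note that this only shows the rule was present on misclassified messages, not that it alone caused the misclassification.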
ovizii
Posts: 463
Joined: 11 May 2016 08:08

Post by ovizii »

Makes sense, and you're right, but in my situation I can "trust" the report: I only started using EFA a few weeks ago, so I have been monitoring it very closely, daily, and have corrected every single mistake. I'd say that every SPAM got caught, or at least learned by SA as SPAM if it slipped through, and every HAM got marked and learned as HAM. TxRep and Bayes are working perfectly.
I did look at as many of the emails hit by ImageCerberusPLG5 as I could and, well, they were 50% SPAM and 50% HAM. :-/

Yes, I second the idea that being able to pull out all messages marked by a specific SA plugin would be awesome, but I don't think it would interest many people or be doable.
pdwalker
Posts: 1553
Joined: 18 Mar 2015 09:16

Post by pdwalker »

If you're interested in playing with SQL, you can run a query to list which messages were affected by certain rules.

Of course, you may want to process the results further to make it a little more readable.

select 
  timestamp, id, from_address, to_address, subject, isspam, ishighspam, issaspam, sascore, spamreport
from 
  mailscanner.maillog 
where 
  spamreport like '% ImageCerberusPLG5 %' 
order by 
  sascore;
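
As one way of processing the results further, here is a small Python sketch that pulls a single rule's score out of a spamreport string. It assumes the report lists hits as "RULE_NAME score" pairs, which is what the LIKE pattern in the query suggests; the sample report text is made up and `rule_score` is a hypothetical helper, so adjust the regex if your reports look different.

```python
import re


def rule_score(spamreport, rule):
    """Return the score recorded for `rule` in a spamreport string, or None.

    Assumes hits appear as "<RULE_NAME> <score>" pairs (an assumption
    about the report format, not something confirmed in the thread).
    """
    pattern = r"(?<!\w)" + re.escape(rule) + r"\s+(-?\d+(?:\.\d+)?)"
    match = re.search(pattern, spamreport)
    return float(match.group(1)) if match else None


# Made-up report text, for illustration only.
report = (
    "spam, SpamAssassin (score=6.5, required 5, BAYES_50 0.80, "
    "ImageCerberusPLG5 4.50, KAM_LINEPADDING 1.20)"
)
print(rule_score(report, "ImageCerberusPLG5"))
print(rule_score(report, "ImageCerberusPLG3"))
```

Subtracting the extracted score from sascore shows what each message would have scored without the rule, which helps judge whether the rule was what pushed it over the threshold.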
robertboyl
Posts: 25
Joined: 09 Feb 2015 11:29

Post by robertboyl »

Guys/Shawn,

Just curious, what value would you suggest for ImageCerberusPLG5? Maybe 1 point instead of 4.50?

I don't have root access, but I'll ask my sysadmin to assess this: filter a few days of emails and see how many good results it has, etc. I see some very weird false positives, such as a very basic customer's signature with his company logo and name hitting ImageCerberusPLG5 at 4.50.

I wonder if this works at all, and at what ratio. If anyone has any analysis, please send it.

I will also try to get more details on the ratio of when it works versus when it causes FPs.

Thanks!
pdwalker
Posts: 1553
Joined: 18 Mar 2015 09:16

Post by pdwalker »

You can see my changes near the top of this thread. I've had fewer false positives since then.
robertboyl
Posts: 25
Joined: 09 Feb 2015 11:29

Post by robertboyl »

Thanks a lot, pdwalker. Just curious, do you see it getting some legit hits?

How does it work, more or less? Does it analyse images OCR-style, trying to find patterns? That seems hard to do... The FPs are strange: basic company logos with people's names.