Train Bayes with an email archive
Posted: 10 May 2017 01:27
G'day everyone,
I've recently implemented eFa (3.0.2.2) to filter inbound email for my Exchange 2013 server, which receives 50 to 75 emails a day. We don't receive a lot of spam, but after a couple of cryptolocker-style attachments were executed recently, I've taken it upon myself to do something about it.
With such a low volume of email, it will take some time for the Bayes database to start working, and it may not be effective given the small dataset it will have to learn from. To speed up the process, I've come across an archive of spam messages that could help populate the dataset:
http://untroubled.org/spam/
I would also need to export known good emails from a few of the mailboxes to balance the good against the bad.
1. I searched but came up empty. Has anyone else already managed to do this?
2. How effective is a fully populated Bayes database, and will the gain be worth the effort?
3. Any tips and pointers on what needs to be done would be greatly appreciated, e.g. where the emails need to be stored, what format they need to be in, and, once the files are in place, how to run Bayes so it starts learning.
If I can pull it off I'll post back the process and script(s) so others can replicate it.
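As a starting point, below is a rough sketch of the sort of script I have in mind. It is untested against eFa and rests on several assumptions. SpamAssassin's `sa-learn` tool (which eFa uses through MailScanner) is assumed to be on the PATH. The untroubled.org archive and the exported mailboxes are assumed to be already unpacked into one-message-per-file directories. The directory names, the helper names, and the `extra_args` hook (for any options needed to point `sa-learn` at eFa's own SpamAssassin configuration) are all hypothetical. One thing worth knowing for the "level out good vs bad" part: by default SpamAssassin won't apply Bayes scores until it has learned at least 200 spam and 200 ham messages (`bayes_min_spam_num` and `bayes_min_ham_num`).

```python
"""Hypothetical sketch: feed sorted spam/ham directories to SpamAssassin's sa-learn.

Assumptions: sa-learn is on PATH, and messages were already extracted into
one-message-per-file directories. Directory names are placeholders.
"""
import subprocess
from pathlib import Path
from typing import Iterable, List, Sequence

# SpamAssassin's defaults for bayes_min_spam_num / bayes_min_ham_num.
MIN_PER_CLASS = 200


def count_messages(dirs: Iterable[str]) -> int:
    """Count regular files (assumed one message each) under directories that exist."""
    return sum(1 for d in dirs if Path(d).is_dir()
               for p in Path(d).rglob("*") if p.is_file())


def enough_for_bayes(spam_count: int, ham_count: int) -> bool:
    """True once both classes reach the default minimum Bayes needs before scoring."""
    return spam_count >= MIN_PER_CLASS and ham_count >= MIN_PER_CLASS


def build_learn_command(kind: str, directory: str,
                        extra_args: Sequence[str] = ()) -> List[str]:
    """Return the sa-learn argument list to learn one directory as spam or ham."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    # sa-learn accepts directories as well as individual message files.
    return ["sa-learn", *extra_args, f"--{kind}", directory]


def train(spam_dirs: Sequence[str], ham_dirs: Sequence[str],
          extra_args: Sequence[str] = (), dry_run: bool = True) -> List[List[str]]:
    """Build one sa-learn call per directory; print them, or run them if dry_run is False."""
    commands = [build_learn_command("spam", d, extra_args) for d in spam_dirs]
    commands += [build_learn_command("ham", d, extra_args) for d in ham_dirs]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises if sa-learn exits non-zero
    return commands
```

Calling `train(["./untroubled-spam"], ["./exported-ham"])` only prints the commands. Passing `dry_run=False` actually runs them. Afterwards, `sa-learn --dump magic` should report the `nspam` and `nham` counts, which shows whether the 200/200 threshold has been reached.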