Hi Griffo,
Having used EFA for a while and benefited from it greatly, I thought it about time I helped out someone here. So first post! Lucky you... I hope. I also hope you're getting notifications on replies as your post is getting a little old; I sure took my time! (c;
Answer for question 2 is everyone will tell you Bayes isn't worth much without training.
Answer for question 3 is I leave them in Exchange, but to get them into a format for SpamAssassin (mbox) I use Thunderbird to download via IMAP, 'cos Thunderbird stores the emails in an mbox file. I can just grab the file once it's synced and upload it to EFA.
Answer for question 1 is the wall of text that follows...
What I'd recommend you do to get all these old emails is the following (and I'll assume you have some moderate understanding of Exchange here);
- Open the ECP and do an enterprise search (compliance management > in-place eDiscovery & hold). How you do this is up to you, but I'd suggest choosing "Enable de-duplication" when you do -- this will collapse all the found emails into one folder.
See https://technet.microsoft.com/en-au/lib ... .150).aspx for more information.
- Confirm the results are the emails you want, obviously.
- Open up Outlook and fire up a profile for the administrator (or whichever account you ran the ECP search from).
- Add a secondary mailbox of Discovery Search Mailbox.
https://msdn.microsoft.com/en-us/librar ... .149).aspx for more info.
- Copy out the emails into an empty folder that you have control of. You can export to PST and then import into your mailbox if you so desire.
- Review the emails in your new folder. Delete anything that shouldn't be there.
Alternatively you can review each email and move them into a folder when you know they're the spam/malware you want (to never see again). Either way is fine, but make sure you end up with all the unwanted emails in a single folder and nothing else in it.
- Enable IMAP on the Exchange server. If IMAP is abhorrent to you then you can disable it after we're done. I leave it on, but there's no rule for it on our firewall.
- Fire up Thunderbird and sync it to the account that has this crap-ridden folder. When setting up the sync folders you can turn off all folders except the one you need.
- Close Thunderbird once the sync is finished.
- Browse to %appdata%\Thunderbird\Profiles\bunch-of-alphanumeric.default\ImapMail\Exchange.server.name.
- Find the file that's the name of the folder you've dumped the emails in. If it's in a subfolder then you need to browse down to that. The folder path will be the same as the path in Outlook and then the data file will have the same name as the folder the junk is in, no extension.
- Open WinSCP, log into the EFA box (SFTP or SCP, port 22) and then drag the file in to wherever you like. Default will be home (/home/accountname).
Finally...!
- SSH into EFA, drop to shell. Then run;
sudo sa-learn --spam --mbox --showdots /full/path/to/the/FILE
That's it. Watch the dots and the result at the end.
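If you want a quick sanity check before you run sa-learn, something like the sketch below counts the messages in the file using Python's standard mailbox module, so you can compare the number against what Outlook shows for the folder. This is my assumption of a handy check, not part of EFA; the function name count_mbox_messages is made up, and it assumes Python 3 is available wherever you run it (the Windows box or the EFA box).

```python
import mailbox


def count_mbox_messages(path):
    """Return the number of messages in the mbox file at `path`."""
    # create=False makes a wrong path raise an error instead of
    # silently creating a new, empty mbox file.
    box = mailbox.mbox(path, create=False)
    try:
        return len(box)
    finally:
        box.close()
```

Call it as count_mbox_messages("/full/path/to/the/FILE"). If the count is way off from the folder in Outlook, the sync probably hadn't finished when you grabbed the file.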
---------------------------------------------------------------
Theoretically you could get EFA to run its mail client (forgotten which it is) to connect back to the Exchange server via IMAP and download the folder directly to the box. You could also cron a job that runs sa-learn on said downloaded file on a regular basis. If automating, you just have to make sure the folder is always devoid of any false positives.
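I haven't automated this myself, so take the following as a rough sketch of the cron idea rather than something tested on EFA. It wraps the same sa-learn call as above (minus --showdots, which is pointless under cron) and skips the run if the file is missing or empty. The names build_sa_learn_command and learn_spam are made up, and it assumes the job runs from root's crontab so no sudo is needed.

```python
import os
import subprocess


def build_sa_learn_command(mbox_path):
    """Build the same sa-learn call as the manual step, without --showdots."""
    return ["sa-learn", "--spam", "--mbox", mbox_path]


def learn_spam(mbox_path, runner=subprocess.run):
    """Run sa-learn on mbox_path. Return True if it ran, False if skipped."""
    # Skip when the file is missing or empty so the cron job stays quiet.
    if not os.path.isfile(mbox_path) or os.path.getsize(mbox_path) == 0:
        return False
    # check=True raises CalledProcessError if sa-learn exits non-zero.
    runner(build_sa_learn_command(mbox_path), check=True)
    return True
```

You'd call learn_spam("/full/path/to/the/FILE") from a small script and point a cron entry at it. The hard part is still getting the file onto the box and keeping false positives out of it.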
At the moment I've got anything with a header of "X-Spam-Status: Yes" or an SCL of 8 or more being BCC'ed to a "Spam Analysis" mailbox, which I have added as a secondary mailbox. I review the junk that's in there, delete anything that's legit, and when I'm happy with it I go to step 8 (the Thunderbird sync) and proceed from there. Once done I move the emails out to a processed folder (keeping them just in case we need to rebuild the database at any time), which means the next time I run sa-learn it doesn't have to trawl over thousands of emails it's already processed. The Thunderbird file can get a bit bloated over time, so I sometimes just delete that as well and resync.
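If you'd like a second look at the synced file before learning from it, here's a minimal sketch that lists sender, subject and whether "X-Spam-Status" starts with "Yes" for every message in the mbox. It only reads the file. The name summarize_mbox is made up and the header check is my assumption of what's useful; it doesn't look at the SCL at all.

```python
import mailbox


def summarize_mbox(path):
    """Return a list of (sender, subject, flagged) tuples, one per message."""
    box = mailbox.mbox(path, create=False)
    try:
        rows = []
        for msg in box:
            # str() guards against header objects returned for odd encodings.
            status = str(msg.get("X-Spam-Status", ""))
            flagged = status.strip().lower().startswith("yes")
            sender = str(msg.get("From", ""))
            subject = str(msg.get("Subject", ""))
            rows.append((sender, subject, flagged))
        return rows
    finally:
        box.close()
```

Anything that comes back unflagged, or from a sender you recognise, is worth a manual check in Outlook before the file goes anywhere near sa-learn.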
Hope that helps!