For a long time I have been training my Bayes DB with the mails from http://untroubled.org/spam/. There are a lot of plain-text mails and mails with attachments to train on.
So that I don't have to do this manually, I created some scripts that are run every night by a cron job.
The scripts need the following folder structure; otherwise you have to adjust the paths in the scripts:
Code:
home/
└── root/
    └── scripts/
        └── learn/
            ├── ham/
            └── spam/
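If the folders do not exist yet, they could be created with a small helper like the one below. This is only a sketch: the function name make_learn_dirs is hypothetical and not part of my scripts, and the base path in the usage line is simply the one used in this post.

```shell
# Create the ham and spam learning folders below a base directory.
# make_learn_dirs is a hypothetical helper, not part of the original scripts.
make_learn_dirs() {
    base=$1
    # -p creates missing parent folders and does not fail if they already exist
    mkdir -p "$base/learn/ham" "$base/learn/spam"
}

# Usage with the paths from this post:
# make_learn_dirs /home/root/scripts
```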
First Part - Download and train the spam archive: "download-spam"
Code:
# Work in the spam learning folder; stop if it does not exist
cd /home/root/scripts/learn/spam || exit 1
# Archive of the current month (YYYY-MM.7z) and the folder named after the month
filename=$(date '+%Y-%m').7z
foldername=$(date '+%m')
wget "http://untroubled.org/spam/$filename"
echo "Extracting Files"
/usr/bin/7z/7za e "/home/root/scripts/learn/spam/$filename" -o/home/root/scripts/learn/spam/
sleep 10
rm "/home/root/scripts/learn/spam/$filename"
# Train the extracted mails, then remove the leftover month folder
/home/root/scripts/spam-learn
rm -r "/home/root/scripts/learn/spam/$foldername"
# Download and train the attachment samples, then clean up what is left over
/home/root/scripts/download-spam-attachments
rm -f -- *.orig *.txt
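The script above does not check whether the download worked, so if wget fails the later steps still run. A possible way to guard against that is sketched below. The function names archive_url and fetch_archive are made up for this example, and the sketch assumes the archives keep the YYYY-MM.7z naming used above.

```shell
# Build the archive URL for a month given as YYYY-MM (default: current month).
# archive_url and fetch_archive are hypothetical helpers for illustration.
archive_url() {
    month=${1:-$(date '+%Y-%m')}
    printf 'http://untroubled.org/spam/%s.7z\n' "$month"
}

# Download URL $1 to file $2; on failure remove the partial file and return 1.
fetch_archive() {
    if ! wget -q -O "$2" "$1"; then
        echo "download of $1 failed" >&2
        rm -f "$2"
        return 1
    fi
}

# Usage inside the script:
# fetch_archive "$(archive_url)" "/home/root/scripts/learn/spam/$(date '+%Y-%m').7z" || exit 1
```

With the `|| exit 1` in the usage line the script stops before extracting or training when the archive of the current month is not available yet.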
Code:
# Fetch all .orig attachment samples (recursive, one level deep)
wget -r -l1 -A.orig http://untroubled.org/spam/attachments/ -P /home/root/scripts/learn/spam
echo "Moving spam"
mv /home/root/scripts/learn/spam/untroubled.org/spam/attachments/*.orig /home/root/scripts/learn/spam/
echo "Waiting 5 seconds until the spam has been moved"
sleep 5
echo "Deleting folder"
rm -r /home/root/scripts/learn/spam/untroubled.org/
echo "Training spam"
# Feed both folders to the Bayes DB, then empty them
/usr/bin/sa-learn --spam /home/root/scripts/learn/spam --progress
/usr/bin/sa-learn --ham /home/root/scripts/learn/ham --progress
rm /home/root/scripts/learn/spam/*
rm /home/root/scripts/learn/ham/*
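The training step above always runs sa-learn and always empties both folders, even when a folder is empty or sa-learn fails. A more careful variant could look like the sketch below. The helpers has_mail and train_folder and the SA_LEARN variable are assumptions for this example; only the sa-learn options --spam, --ham and --progress are taken from the script above.

```shell
# Return 0 if folder $1 contains at least one regular file.
has_mail() {
    [ -n "$(find "$1" -type f 2>/dev/null | head -n 1)" ]
}

# Train one folder and empty it only if training succeeded.
# $1: "spam" or "ham", $2: folder to learn from.
# SA_LEARN (hypothetical) can point to a different sa-learn binary.
train_folder() {
    if ! has_mail "$2"; then
        echo "nothing to learn in $2"
        return 0
    fi
    "${SA_LEARN:-/usr/bin/sa-learn}" "--$1" "$2" --progress && rm -f "$2"/*
}

# Usage with the folders from this post:
# train_folder spam /home/root/scripts/learn/spam
# train_folder ham /home/root/scripts/learn/ham
```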
I start the scripts from a cron job, and they train a lot of new spam every night.
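For the cron job it can help to keep a log of each run. The following is a rough sketch of how that might look, not my exact setup: the helper run_logged, the wrapper script name nightly-learn, the log path and the start time are all assumptions.

```shell
# Run a command and append its output, framed by start/end markers, to a log file.
# $1: log file, remaining arguments: the command to run.
# run_logged is a hypothetical helper for illustration.
run_logged() {
    log=$1
    shift
    echo "=== start $(date '+%Y-%m-%d %H:%M:%S') ===" >> "$log"
    "$@" >> "$log" 2>&1
    status=$?
    echo "=== end (exit $status) ===" >> "$log"
    return "$status"
}

# A hypothetical wrapper script /home/root/scripts/nightly-learn would call:
#   run_logged /var/log/spam-learn.log /home/root/scripts/download-spam
# Example crontab line (crontab -e) that starts it at 03:15 every night:
#   15 3 * * * /home/root/scripts/nightly-learn
```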
Cheers