jeroen.se
by jnieuwen
using bayesian filtering to determine the importance of email
16 April 2009 18:54 - Most people are familiar with using some kind of naive Bayesian filtering to classify mail as spam or non-spam. Or at least they use a spam filter that relies on Bayesian filtering without knowing it. Nothing new so far. But besides spam I had another criterion for classifying email: how important is it to me? In my opinion there are two kinds of mail, apart from spam of course.
The first is mail that needs no immediate action. For example LinkedIn invites, the mail with the link to that funny YouTube movie, mail statistics from servers, updates on who started following you on Twitter, et cetera.
The second is of course email that needs immediate action. For example the mail from Nagios telling you that your RAID 1 mirror is degraded, or the invitation to that cool party. For me this is email that I want to see while I am travelling and that cannot wait until I am at my desk again.
Therefore I wanted to classify every incoming non-spam mail as either important or unimportant, and I decided to try doing this with Bayesian filtering. The idea was that Bayesian filtering should save me the overhead of setting up complex rules.
Setting this up was straightforward. First I divided all regular email of the last 2 months into 2 categories: important and unimportant. This gave me the training sets for my filter. I then trained the filter and configured it to place all unimportant email in a separate 'later' mailbox folder, instead of putting it in the default Inbox.
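The post does not say which filter software was used, so the following is only a minimal sketch of the idea, assuming a plain multinomial naive Bayes classifier with add-one smoothing. The class name `NaiveBayes`, the labels and the sample messages are all made up for illustration.

```python
import math
import re
from collections import Counter


class NaiveBayes:
    """Tiny multinomial naive Bayes text classifier (illustrative only)."""

    def __init__(self):
        self.word_counts = {}         # label -> Counter of token frequencies
        self.doc_counts = Counter()   # label -> number of training messages

    @staticmethod
    def tokenize(text):
        # Lowercase and split into simple word-like tokens.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        self.word_counts.setdefault(label, Counter()).update(self.tokenize(text))
        self.doc_counts[label] += 1

    def classify(self, text):
        # Vocabulary across all labels, needed for add-one smoothing.
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())
        tokens = self.tokenize(text)

        best_label, best_score = None, float("-inf")
        for label, counts in self.word_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(vocab)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data standing in for two months of sorted mail.
nb = NaiveBayes()
nb.train("nagios alert raid mirror degraded on server", "important")
nb.train("invitation party this friday", "important")
nb.train("linkedin invitation to connect", "unimportant")
nb.train("new follower on twitter", "unimportant")
```

With such a classifier, delivery comes down to one decision per message: anything labelled "unimportant" goes to the 'later' folder, everything else stays in the Inbox.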
Now the only challenge was making sure classification mistakes get corrected. This is done by archiving every mail in one of two archive mailboxes, in my case '2009l' or '2009n', so a misclassified mail is corrected simply by filing it in the right archive. Every night a scheduled job retrains the filter on these archives.
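The nightly retraining could look roughly like the sketch below. It assumes the two archives are stored as mbox files, which the post does not state, and it uses Python's standard `mailbox` module. The function names `message_text` and `rebuild_counts` and the labels are hypothetical.

```python
import mailbox
from collections import Counter


def message_text(msg):
    """Return the subject plus all text/plain body parts as one string."""
    parts = [str(msg.get("Subject", ""))]
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            if payload:
                parts.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(parts)


def rebuild_counts(archives):
    """Rebuild per-label word counts from scratch.

    `archives` maps a label to the path of an mbox file (an assumption;
    the real archives may use another mailbox format).
    """
    counts = {}
    for label, path in archives.items():
        counter = counts.setdefault(label, Counter())
        box = mailbox.mbox(path, create=False)
        try:
            for msg in box:
                counter.update(message_text(msg).lower().split())
        finally:
            box.close()
    return counts
```

Rebuilding the counts from scratch on every run, rather than updating them incrementally, means that moving a message from one archive to the other is enough to undo an earlier mistake. A call such as `rebuild_counts({"important": "2009n", "unimportant": "2009l"})` would be the nightly entry point, where the mapping of archive names to labels is a guess.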
An interesting question after all this is: does it work? In short: yes, it does. Around 90% of my incoming email is classified correctly, and as far as I know I have not missed any cool parties.