View Full Version : Spam filtering (again)


Colin Wilson
January 22nd 04, 02:39 AM
RE: Mailwasher

In case it's of any use to any of you, I've put a lightly annotated copy
of my filters.txt online that you can copy and paste into your own if
you so desire...

http://www.phoenixbbs.dsl.pipex.com/filterlist.html

I've now had to manually mark 5 spam mails for deletion over the last 4
days that weren't picked up between these filters and a short(ish) list
of blocked country codes in the blacklist.txt (in fact, I'll paste them
below) - I used to use an 80k+ list of blocked domains, but this seems to
perform almost as well - just doesn't mark them as blacklisted.

I check 8 accounts and at least 4 get spammed heavily, so I don't think
this is a bad result overall :-p - and for the very observant, you will
probably be able to guess at least 6 of my real email addresses :-}

---
[Blacklisted emails]

0
37999
37898
38006
38007
38004
38007
37876
37817
37924
38004
0
37980
37822
38008
38006
38007
38003
38006
37988
37778
37817
37992
37802
37895
38006
38008
38007
37863
38006
37957
37989
37984
38005
.* 37913
38007
37999
38007
37976
38003
37939
37949
37982
37984
38003
38007
38007
38002
38000
38007
37993
38002
38005
38007
38005
37901
37881
37961
37898
37975
37995
37984
37961
37819
37913
38006
37923
37923
.* 37927
37971
38004
37963
37905
38002
38007
38004
38008
38001
38006
37987
38001
37956
37891
38005
38004
38007
38007
38003
38006
37807
38006
38005
38003
37995
37881
.* 37928
38002
37954
38001
37977
0
38006
38006
38005
.* 0
* 37985
38007
37988
37980
37966
37899
37948
37961
37883
37864
37970
37937
38003
37895
.* 37987
37922
37923
*@*_msn.* 37922
37849
0
*@*bestusrx* 37889
*@*bluerocketonline* 37924
38008
37956
37960
37966
*@*d59072* 0
*@*deals-and-steals* 37938
*@*easymeds* 37960
37833
37897
*@*homelender* 37892
*@*malibumailz* 37882
*@*microsoft* 37990
*@*msdn* 37968
*@*msnbc* 37902
37898
*@*savings1friend* 37893
*@*savingsfriend* 37929
37785
0
*@*specialbuys* 37930
*@*trbrgns* 38007
*@*true-bargains* 37948
38008
---
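
A rough sketch of how wildcard entries like the ones that survive in the list above (for example `*@*easymeds*`) could be checked against a sender address. It assumes shell-style wildcard matching and adds an illustrative country-code entry (`*.ru`, one of the codes named later in the thread); how Mailwasher itself evaluates blacklist.txt is not stated here, and the addresses and the function name `is_blacklisted` are made up for the example.

```python
from fnmatch import fnmatchcase

# Wildcard entries copied from the list above, plus one illustrative
# country-code pattern. Treating them as shell-style wildcards is an assumption.
BLACKLIST = ["*@*easymeds*", "*@*true-bargains*", "*@*homelender*", "*.ru"]


def is_blacklisted(sender, patterns=BLACKLIST):
    """Return True if the sender address matches any wildcard pattern."""
    address = sender.strip().lower()
    # Lower-case both sides so matching is case-insensitive on every platform.
    return any(fnmatchcase(address, pattern.lower()) for pattern in patterns)


print(is_blacklisted("offers@mail.easymeds.example"))  # True
print(is_blacklisted("friend@example.org"))            # False
```

A short list of broad patterns like this is cheap to scan, which fits the observation that it performs almost as well as an 80k+ list of individual domains.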

--
Please add "[newsgroup]" in the subject of any personal replies via email
* old email address "btiruseless" abandoned due to worm-generated spam *
--- My new email address has "ngspamtrap" & @btinternet.com in it ;-) ---

Colin Wilson
January 22nd 04, 11:05 PM
> Hmm, I tried blacklisting programs like Mailwasher, and got fed up, as
> it still cost me time constantly updating the lists. I moved to a
> Bayesian-based filter, and now spam is gone for good!! I have well over
> 99% accuracy. I highly recommend POPFile (popfile.sf.net)

Yeah, it looks ok, but you have to download the spam for it to filter -
and with the proliferation of viruses, I prefer to get shut of the crap
at server level. How newer AV programs don't go ape**** at downloaded
viral attachments in spam prior to the Bayesian filtering and subsequent
deletion is beyond me. My old version of AVP clamps the machine
immediately one appears even as a temp file.

I was spending a lot of time updating the lists too, until I tried
Mailwasher on a friend's old clanking machine. It was impossibly,
horribly slow.

I deleted almost all the domains in my blacklist (80k+ of them) and now
just block country codes in the blacklist (.uk .ie .ru etc) and have the
filters as shown on the webpage I pointed at.

So far in 4 days I've had to manually mark 5 spams; considering I use
8 accounts, of which 4 are industrial-strength spam magnets, I don't
think that's a bad track record :-}

--
Please add "[newsgroup]" in the subject of any personal replies via email
* old email address "btiruseless" abandoned due to worm-generated spam *
--- My new email address has "ngspamtrap" & @btinternet.com in it ;-) ---

Jock
January 23rd 04, 06:04 PM
On Thu, 22 Jan 2004 22:05:20 -0000, Colin Wilson > wrote:

>> Hmm, I tried blacklisting programs like Mailwasher, and got fed up, as
>> it still cost me time constantly updating the lists. I moved to a
>> Bayesian-based filter, and now spam is gone for good!! I have well over
>> 99% accuracy. I highly recommend POPFile (popfile.sf.net)
>
>Yeah, it looks ok, but you have to download the spam for it to filter -
>and with the proliferation of viruses, I prefer to get shut of the crap
>at server level. How newer AV programs don't go ape**** at downloaded
>viral attachments in spam prior to the Bayesian filtering and subsequent
>deletion is beyond me. My old version of AVP clamps the machine
>immediately one appears even as a temp file.
>
>I was spending a lot of time updating the lists too, until I tried
>Mailwasher on a friend's old clanking machine. It was impossibly,
>horribly slow.
>
>I deleted almost all the domains in my blacklist (80k+ of them) and now
>just block country codes in the blacklist (.uk .ie .ru etc) and have the
>filters as shown on the webpage I pointed at.
>
>So far in 4 days I've had to manually mark 5 spams; considering I use
>8 accounts, of which 4 are industrial-strength spam magnets, I don't
>think that's a bad track record :-}

It's easier, if expensive, to use a spam deletion service, but even
that isn't perfect.

If spam comes in between the times they scan your mailbox and you access
it before their next scan you still get the spam.

Nothing's perfect it seems!

--
Jock.

Julian Knight
January 26th 04, 01:43 PM
In message >, Zapp Brannigan
> writes
>Colin Wilson wrote:
>
>> RE: Mailwasher
>>
>
>Hmm, I tried blacklisting programs like Mailwasher, and got fed up, as
>it still cost me time constantly updating the lists. I moved to a
>Bayesian-based filter, and now spam is gone for good!! I have well over
>99% accuracy. I highly recommend POPFile (popfile.sf.net)
>
Or K9 for Windows users.
--
Julian Knight,
Sheffield, South Yorkshire, United Kingdom.
Security, Directory, Messaging, Network & PC Consultant
http://www.knightnet.org.uk/

Julian Knight
January 26th 04, 01:45 PM
In message >, Colin
Wilson > writes
>> Hmm, I tried blacklisting programs like Mailwasher, and got fed up, as
>> it still cost me time constantly updating the lists. I moved to a
>> Bayesian-based filter, and now spam is gone for good!! I have well over
>> 99% accuracy. I highly recommend POPFile (popfile.sf.net)
>
>Yeah, it looks ok, but you have to download the spam for it to filter -
>and with the proliferation of viruses, I prefer to get shut of the crap
>at server level. How newer AV programs don't go ape**** at downloaded
>viral attachments in spam prior to the Bayesian filtering and subsequent
>deletion is beyond me. My old version of AVP clamps the machine
>immediately one appears even as a temp file.
>
>I was spending a lot of time updating the lists too, until I tried
>Mailwasher on a friend's old clanking machine. It was impossibly,
>horribly slow.
>
>I deleted almost all the domains in my blacklist (80k+ of them) and now
>just block country codes in the blacklist (.uk .ie .ru etc) and have the
>filters as shown on the webpage I pointed at.
>
>So far in 4 days I've had to manually mark 5 spams; considering I use
>8 accounts, of which 4 are industrial-strength spam magnets, I don't
>think that's a bad track record :-}
>
K9 is written from scratch in C (or C++?) and has very low overheads.
Even in the first few days, it is pretty accurate (80+%) and runs at
98+% for me now. (Based on an average of 50 emails per day of which >70%
are spam).
--
Julian Knight,
Sheffield, South Yorkshire, United Kingdom.
Security, Directory, Messaging, Network & PC Consultant
http://www.knightnet.org.uk/

Pete Smith
January 26th 04, 06:36 PM
In article >,
says...
> Peter > wrote:
>
> > But there is a limit to what any individual email recipient will
> > be able to do. You can start with viagra, v1agra, penis, pen1s,
> > septic, porn, sex, etc in the subject header, and end up with a
> > dictionary analysis rejecting every email with more than a certain %
> > of mis-spelt words (a lot of spam uses randomly generated text within
> > it). But it will never be 100%.
>
> Well, that's where Bayesian filtering scores (no pun intended). It
> adapts as the spammers do, and you don't have to waste time tweaking
> rules manually. The one I'm using (POPfile) is currently scoring 99.57%
> accuracy.

I'm using the Bayesian filtering plugin for TheBat v2.

I downloaded someone else's rules when I started, so I didn't have to
train it much myself.
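
For readers wondering what "training" a Bayesian filter involves, here is a minimal naive Bayes sketch. It is not the code used by TheBat's plugin, POPFile or K9; the class name `TinyBayes`, the tokeniser and the sample messages are all invented for illustration. The idea is only that the filter counts words in mail you have labelled, then scores new mail against those counts.

```python
import math
import re
from collections import Counter


class TinyBayes:
    """Toy two-class naive Bayes text classifier (illustrative only)."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text):
        # Very crude tokeniser: lower-case runs of letters, digits, apostrophes.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        """Record the words of one message under 'spam' or 'ham'."""
        self.words[label].update(self.tokens(text))
        self.messages[label] += 1

    def classify(self, text):
        """Return the label with the higher log-probability score."""
        vocab = len(set(self.words["spam"]) | set(self.words["ham"]))
        total_messages = sum(self.messages.values())
        scores = {}
        for label in ("spam", "ham"):
            # Smoothed prior: how common this class has been so far.
            score = math.log((self.messages[label] + 1) / (total_messages + 2))
            word_total = sum(self.words[label].values())
            for tok in self.tokens(text):
                # Add-one smoothing; the extra +1 leaves room for unseen words.
                count = self.words[label][tok]
                score += math.log((count + 1) / (word_total + vocab + 1))
            scores[label] = score
        return max(scores, key=scores.get)


f = TinyBayes()
f.train("cheap viagra buy now", "spam")
f.train("buy cheap meds online now", "spam")
f.train("meeting agenda for monday", "ham")
f.train("lunch on monday with the team", "ham")
print(f.classify("cheap meds now"))         # spam
print(f.classify("monday meeting agenda"))  # ham
```

Starting from someone else's word counts simply means the counters are pre-filled, which is why little personal training is needed at first.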

It took about a week to work 100% - the main problem was false positives.
Emails that look like spam that I had actually signed up to.

It's now been 100% reliable for my mail.

As an experiment, I've just connected it to a mailbox which I'm looking
after, which has about 200 spams a day.

There are 877 emails.

774 identified as junk. Of this 774, there was 1 false positive, because it
was ALL IN CAPS.

103 not junk. Of this 103, only 6 were junk, and 3 of those were
questionable junk TBH.

This makes it 94% accurate, with 0.12% false positives.

I'm hoping with the extra 773 spams, it'll make it even more accurate :-)
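
The figures above can be cross-checked with a few lines of arithmetic. This sketch takes 774 as the junk count (877 - 103) and uses only the numbers given in the post; the variable names and the choice of which ratios to report are mine. On these numbers the 94% appears to be the share of the "not junk" pile that really was clean, while overall accuracy comes out higher.

```python
total = 877
flagged_junk = 774     # taken as 877 - 103
passed_as_good = 103
false_positives = 1    # wanted mail filed as junk
false_negatives = 6    # junk that was passed as good

assert flagged_junk + passed_as_good == total

# Messages the filter got right, as a share of everything it saw.
correct = total - false_positives - false_negatives
accuracy = correct / total

# Actual spam in the mailbox, and the share of it that was caught.
spam_total = (flagged_junk - false_positives) + false_negatives
spam_caught = (flagged_junk - false_positives) / spam_total

# Share of the "not junk" pile that really was clean.
clean_share_of_passed = (passed_as_good - false_negatives) / passed_as_good

# False positives as a share of all mail.
fp_share_of_all = false_positives / total

print(f"overall accuracy:        {accuracy:.1%}")
print(f"spam caught:             {spam_caught:.1%}")
print(f"clean share of not-junk: {clean_share_of_passed:.1%}")
print(f"false positives / all:   {fp_share_of_all:.2%}")
```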

Pete.

--
NOTE! Email address is spamtrapped. Any email will be bounced to you
Remove the news and underscore from my address to reply by mail

Tom Ruben
January 26th 04, 08:08 PM
In article >, Julian Knight
> writes
>>
>K9 is written from scratch in C (or C++?) and has very low overheads.
>Even in the first few days, it is pretty accurate (80+%) and runs at
>98+% for me now. (Based on an average of 50 emails per day of which >70%
>are spam).

Oh that my spam count was so low. However, K9 is currently catching
over 99% of it.
--
Tom

Julian Knight
January 28th 04, 02:18 PM
In message >, phoenix
> writes
>On Mon, 26 Jan 2004 12:45:22 +0000, Julian Knight wrote:
>
.....
>>>
>> K9 is written from scratch in C (or C++?) and has very low overheads.
>> Even in the first few days, it is pretty accurate (80+%) and runs at
>> 98+% for me now. (Based on an average of 50 emails per day of which >70%
>> are spam).
>
>You might also like to take a look at the add-on Filter and Blacklist for
>K9 here www.edcottrell.com/k9.cfm

Thanks Bill, I'd love to but get the response:

"The system cannot find the file specified."

on the site as well as the page.

--
Julian Knight,
Sheffield, South Yorkshire, United Kingdom.
Security, Directory, Messaging, Network & PC Consultant
http://www.knightnet.org.uk/

Julian Knight
January 28th 04, 02:20 PM
In message >, Tom Ruben
> writes
>In article >, Julian Knight
> writes
>>>
>>K9 is written from scratch in C (or C++?) and has very low overheads.
>>Even in the first few days, it is pretty accurate (80+%) and runs at
>>98+% for me now. (Based on an average of 50 emails per day of which >70%
>>are spam).
>
>Oh that my spam count was so low. However, K9 is currently catching
>over 99% of it.

Well, I do work in security!

Actually, my old personal address gets another 150/day but I reject all
of those at source as I no longer use the old address. Unfortunately I
cannot change my work address.

--
Julian Knight,
Sheffield, South Yorkshire, United Kingdom.
Security, Directory, Messaging, Network & PC Consultant
http://www.knightnet.org.uk/

Julian Knight
January 28th 04, 02:22 PM
In message >, Peter
> writes
>
> Jock > wrote:
>
>>If spam comes in between the times they scan your mailbox and you access
>>it before their next scan you still get the spam.
>>
>>Nothing's perfect it seems!
>
>I find mailwasher to be pretty good, better than 95%, in its
>identification of spam using various built-in rules and by checking
>against known spam sources. But it's not 100% - what could be when
>today's spam uses a different subject etc for every recipient.
>
>An *ISP* could do a better job because when their system sees 10k
>emails from the same source to 10k of their customers, it is obviously
>spam. But there is a limit to what any individual email recipient will
>be able to do. You can start with viagra, v1agra, penis, pen1s,
>septic, porn, sex, etc in the subject header, and end up with a
>dictionary analysis rejecting every email with more than a certain %
>of mis-spelt words (a lot of spam uses randomly generated text within
>it). But it will never be 100%.
>

Whilst an ISP could indeed do better with certain mass mailings, they
can never be as accurate for an individual as your own personal
settings, because your profile of emails will differ from everyone else's.

So while ISP filtering SHOULD be more common, you will always need your
personal filters as well.
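
The ISP-side check from the quoted post (one source mailing thousands of customers) could look roughly like the sketch below. It is only an illustration of the counting idea: the function name `bulk_sources`, the shape of the delivery records and the threshold are assumptions, and a real ISP would also need a whitelist for legitimate mailing lists.

```python
from collections import defaultdict


def bulk_sources(deliveries, threshold=10_000):
    """Return sources that mailed at least `threshold` distinct recipients.

    `deliveries` is an iterable of (source, recipient) pairs, a hypothetical
    record of what the mail server has accepted.
    """
    recipients = defaultdict(set)
    for source, recipient in deliveries:
        recipients[source].add(recipient)
    return {source for source, seen in recipients.items() if len(seen) >= threshold}


# Small demonstration with a low threshold.
deliveries = [("bulk.example", f"user{i}@isp.example") for i in range(5)]
deliveries.append(("friend.example", "user1@isp.example"))
print(bulk_sources(deliveries, threshold=5))  # {'bulk.example'}
```

An individual mailbox only ever sees one of those deliveries, which is why this signal is available to the ISP and not to the recipient.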

--
Julian Knight,
Sheffield, South Yorkshire, United Kingdom.
Security, Directory, Messaging, Network & PC Consultant
http://www.knightnet.org.uk/

Julian Knight
January 28th 04, 02:23 PM
In message >, Tim Hodgson
> writes
>Peter > wrote:
>
>> But there is a limit to what any individual email recipient will
>> be able to do. You can start with viagra, v1agra, penis, pen1s,
>> septic, porn, sex, etc in the subject header, and end up with a
>> dictionary analysis rejecting every email with more than a certain %
>> of mis-spelt words (a lot of spam uses randomly generated text within
>> it). But it will never be 100%.
>
>Well, that's where Bayesian filtering scores (no pun intended). It
>adapts as the spammers do, and you don't have to waste time tweaking
>rules manually. The one I'm using (POPfile) is currently scoring 99.57%
>accuracy.
>

Yes, but on a personal level not a "corporate" one. Bayesian analysis is
MUCH less useful at the ISP level. In fact it is almost certainly pretty
useless.

--
Julian Knight,
Sheffield, South Yorkshire, United Kingdom.
Security, Directory, Messaging, Network & PC Consultant
http://www.knightnet.org.uk/

Julian Knight
January 28th 04, 02:27 PM
In message >, Peter
> writes
>
> (Tim Hodgson) wrote:
>
>>> But there is a limit to what any individual email recipient will
>>> be able to do. You can start with viagra, v1agra, penis, pen1s,
>>> septic, porn, sex, etc in the subject header, and end up with a
>>> dictionary analysis rejecting every email with more than a certain %
>>> of mis-spelt words (a lot of spam uses randomly generated text within
>>> it). But it will never be 100%.
>>
>>Well, that's where Bayesian filtering scores (no pun intended). It
>>adapts as the spammers do, and you don't have to waste time tweaking
>>rules manually. The one I'm using (POPfile) is currently scoring 99.57%
>>accuracy.
>
>But it's not perfect, which means you still can't allow it to delete
>emails automatically, and you still have to download everything which
>doesn't give itself away in the subject header.
>
>I could send you an email which refers to e.g. buggering something up
>and it could reject it.
>
>ISPs should be proactive in this stuff.

Whilst this response is often seen and is certainly valid, it is
actually not the whole story. Internet-based email can NEVER be relied
on, even without spam issues; you would need X.400 or a proprietary
system for that.

There are plenty of other things that can go wrong and lead to lost or
delayed emails.

--
Julian Knight,
Sheffield, South Yorkshire, United Kingdom.
Security, Directory, Messaging, Network & PC Consultant
http://www.knightnet.org.uk/

Julian Knight
January 28th 04, 02:32 PM
In message >, Peter
> writes
>
> Pete Smith > wrote:
>
>>It took about a week to work 100% - the main problem was false positives.
>>Emails that look like spam that I had actually signed up to.
>>
>>It's now been 100% reliable for my mail.
>>
>>As an experiment, I've just connected it to a mailbox which I'm looking
>>after, which has about 200 spams a day.
>>
>>There are 877 emails.
>>
>>774 identified as junk. Of this 774, there was 1 false positive, because it
>>was ALL IN CAPS.
>
>That's pretty impressive; I can see this would work fine for an
>individual. But try for a business, where a lot of (especially
>foreign) customers type all in caps :) Also many less email-aware or
>less educated people type all in caps.
>
>A 100% effective measure is this: in response to your email to a new
>destination, you get an automated email back which asks you to go to a
>URL where you copy some text (which is actually a graphic) into a box.
>Then you get white-listed at the recipient. This works fine for
>individuals but it's no good for a business where frankly most
>customers wouldn't bother.
>
>I get a high enough rate with Mailwasher to save a lot of time, but so
>long as there are any false positives I have to read all the subject
>headers just to make sure.
>
With K9, I've only ever had a false positive in the first week of use
(based on several new installs).

I find it reliable enough to only do a quick check of the spams (which
are diverted into a separate folder) every few days. Actually, the way
that K9 works, I could delete the spams and use K9's cache for checking.
That would have the advantage of being a lot more secure than using
Outlook (though I use a utility called PocketKnife Peek).

--
Julian Knight,
Sheffield, South Yorkshire, United Kingdom.
Security, Directory, Messaging, Network & PC Consultant
http://www.knightnet.org.uk/

Julian Knight
January 29th 04, 01:38 PM
In message >, Tim Hodgson
> writes
>Julian Knight > wrote:
>
>> In message >, Tim Hodgson
>> > writes
>> >Peter > wrote:
>> >
>> >> But there is a limit to what any individual email recipient will
>> >> be able to do. You can start with viagra, v1agra, penis, pen1s,
>> >> septic, porn, sex, etc in the subject header, and end up with a
>> >> dictionary analysis rejecting every email with more than a certain %
>> >> of mis-spelt words (a lot of spam uses randomly generated text within
>> >> it). But it will never be 100%.
>> >
>> >Well, that's where Bayesian filtering scores (no pun intended). It
>> >adapts as the spammers do, and you don't have to waste time tweaking
>> >rules manually. The one I'm using (POPfile) is currently scoring 99.57%
>> >accuracy.
>> >
>>
>> Yes, but on a personal level not a "corporate" one. Bayesian analysis is
>> MUCH less useful at the ISP level. In fact it is almost certainly pretty
>> useless.
>
>Well, possibly, but I was responding to the previous posting (above)
>which was about individual spam filtering at the client level.
>
I understand. I just wanted to point out the limitations of ISP provided
options.
--
Julian Knight,
Sheffield, South Yorkshire, United Kingdom.
Security, Directory, Messaging, Network & PC Consultant
http://www.knightnet.org.uk/

robert of northworthige
January 30th 04, 02:55 PM
In article >, Julian Knight
> writes
>In message >, Tim Hodgson
> writes
>>Julian Knight > wrote:
>>> >Well, that's where Bayesian filtering scores (no pun intended). It
>>> >adapts as the spammers do, and you don't have to waste time tweaking
>>> >rules manually. The one I'm using (POPfile) is currently scoring 99.57%
>>> >accuracy.
>>> >
>>>
>>> Yes, but on a personal level not a "corporate" one. Bayesian analysis is
>>> MUCH less useful at the ISP level. In fact it is almost certainly pretty
>>> useless.
>>
>>Well, possibly, but I was responding to the previous posting (above)
>>which was about individual spam filtering at the client level.
>>
>I understand. I just wanted to point out the limitations of ISP provided
>options.

As an aside, Demon have had their Brightmail-based spam filtering system
(using 'honeypot' addresses to find unsolicited spam) up for about 24
hours now - for the majority of the stuff I used to get it's working
very well...
Bob
--
robert of northworthige

Julian Knight
January 30th 04, 03:47 PM
In message >, robert of
northworthige > writes
>In article >, Julian Knight
> writes
>>In message >, Tim Hodgson
> writes
>>>Julian Knight > wrote:
>>>> >Well, that's where Bayesian filtering scores (no pun intended). It
>>>> >adapts as the spammers do, and you don't have to waste time tweaking
>>>> >rules manually. The one I'm using (POPfile) is currently scoring 99.57%
>>>> >accuracy.
>>>> >
>>>>
>>>> Yes, but on a personal level not a "corporate" one. Bayesian analysis is
>>>> MUCH less useful at the ISP level. In fact it is almost certainly pretty
>>>> useless.
>>>
>>>Well, possibly, but I was responding to the previous posting (above)
>>>which was about individual spam filtering at the client level.
>>>
>>I understand. I just wanted to point out the limitations of ISP provided
>>options.
>
>As an aside, Demon have had their Brightmail-based spam filtering system
>(using 'honeypot' addresses to find unsolicited spam) up for about 24
>hours now - for the majority of the stuff I used to get it's working
>very well...
>Bob

It seems to be working well here too. I reject all mail to my demon
domain (except for postmaster and webmaster of course). I was getting
between 100 and 200 messages a day; I now have 5! Even with envelope
rejection, this should speed up my mail downloads a fair bit.

Clearly this is one area of centralised spam filtering that does work
well.

--
Julian Knight,
Sheffield, South Yorkshire, United Kingdom.
Security, Directory, Messaging, Network & PC Consultant
http://www.knightnet.org.uk/

robert of northworthige
January 30th 04, 06:07 PM
In article >, Peter
> writes
>
> robert of northworthige > wrote:
>
>>As an aside, Demon have had their Brightmail-based spam filtering system
>>(using 'honeypot' addresses to find unsolicited spam) up for about 24
>>hours now - for the majority of the stuff I used to get it's working
>>very well...

yes, that's what they're doing
>
>I take on board what's been said here, but I still think an ISP should
>be proactive because they are uniquely well placed to get rid of
>*most* spam, leaving users to do their own solutions for what gets
>through.
>
>As an aside, what are Demon like as a BB/dialup ISP? I know they used
>to offer fixed IPs, and had long periods of being very very slow...
>but that was years ago.
>
There's a separate thread running on Demon & broadband.
Fixed IP is standard on both dialup and broadband for Demon
(and yes, I've had some rough periods of connectivity on dialup in the
last 18 months - nothing yet on BB (been up since 10/9/03)).
Bob

Julian Knight
February 2nd 04, 12:00 PM
In message >, Peter
> writes
>
> robert of northworthige > wrote:
>
>>As an aside, Demon have had their Brightmail-based spam filtering system
>>(using 'honeypot' addresses to find unsolicited spam) up for about 24
>>hours now - for the majority of the stuff I used to get it's working
>>very well...
>
>I take on board what's been said here, but I still think an ISP should
>be proactive because they are uniquely well placed to get rid of
>*most* spam, leaving users to do their own solutions for what gets
>through.
>
>As an aside, what are Demon like as a BB/dialup ISP? I know they used
>to offer fixed IPs, and had long periods of being very very slow...
>but that was years ago.
>
Like all ISPs, they have bad spots from time to time but in the last
year or so they seem to have been pretty good. They are not free of
course and there are certainly better value options.

The reasons I have stuck with them so far:
- I'm an early adopter so I still have a fixed IP address
- Flexible email gathering (smtp & pop3) though some other ISPs now
offer smtp
- My demon sub-domain was known and used (no longer an issue, I haven't
actively used it for a couple of years now since I got a proper domain
name)
- Their dial-up does NOT require caller-id validation (which my other
favorite, PlusNet does). I can't use caller-id on customer sites as it
is typically disabled.

I plan to leave demon sometime this year if I can get ADSL at home.
--
Julian Knight,
Sheffield, South Yorkshire, United Kingdom.
Security, Directory, Messaging, Network & PC Consultant
http://www.knightnet.org.uk/
