"A group of graduates from the Massachusetts Institute of Technology (MIT) aim to change that by crawling the Web with hundreds, and soon thousands, of virtual computers that detect which Web sites attempt to download software to a visitor's computer and whether giving out an e-mail address during registration can lead to an avalanche of spam.
The goal is to create a service that lets the average Internet user know what a Web site actually does with any information collected or what a download will do to a computer, Tom Pinckney, vice president of engineering and co-founder of the start-up SiteAdvisor, said during a presentation at the CodeCon conference here."
The concept is simply amazing, and while it's been around for ages, it still needs more acceptance from decision makers who tend to fall back on perimeter and antivirus defense only. Let's start from the basics: it is my opinion that users do more surfing than downloading, that is, the Web and its insecurities represent a greater threat than users receiving malware in their mailboxes or IMs. Not that they don't receive any, but I see a major shift towards URL droppers, and while defacement groups are more than willing to share these with phishers and the like, a URL dropper is easily replaced by an IP-based one, so you end up with infected PCs infecting others by hosting and distributing the malware themselves. Sneaky, isn't it? My point is that initiatives that crawl the Web for malicious sites, then list, categorize and keep their status up to date, are a great opportunity, both security-wise and business-wise. The same way you know the bad neighbourhoods around your town, you need a visualization of the Web's bad neighbourhoods to assist research or act as a security measure, and while it's hard to map the Web and keep that map current, I find the idea great!
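To make the idea a bit more concrete, here is a minimal sketch of such a crawl-and-categorize loop. The seed list, the "executable link" heuristic and the verdict labels are my own illustrative assumptions, not anything SiteAdvisor has published about how it actually rates sites.

```python
# Minimal crawl-and-categorize sketch. The seed list, the heuristic and the
# verdict labels are illustrative assumptions, not SiteAdvisor's method.
import re
import urllib.request
from urllib.parse import urljoin

SEEDS = ["http://example.com/"]   # hypothetical list of sites to rate
EXECUTABLE = re.compile(r'href="([^"]+\.(?:exe|scr|cab|msi))"', re.I)

def rate(url):
    """Fetch a page and return a crude verdict plus any download links found."""
    try:
        html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "replace")
    except Exception as err:
        return "unreachable", [], str(err)
    downloads = [urljoin(url, link) for link in EXECUTABLE.findall(html)]
    verdict = "check downloads" if downloads else "no direct downloads seen"
    return verdict, downloads, None

if __name__ == "__main__":
    for site in SEEDS:
        verdict, downloads, err = rate(site)
        print(site, "->", verdict, downloads or err or "")
```

A real crawler would obviously follow links, respect robots.txt, render pages and run the downloads in instrumented machines, but the listing/categorizing loop itself is this simple.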
So what is SiteAdvisor up to? Another built-to-flip startup? I doubt it, as I can almost smell the quality entrepreneurship coming from MIT's graduates, provided, of course, they assign a CEO with a business background :) APIs, plugins, the majority of popular sites already tested according to them, and it's free, at least to the average Internet user whose virtual "word of mouth" will help the project gain the scale and popularity necessary to see it licensed and included within current security solutions. They simply cannot test the entire Web, and I feel they shouldn't even set that as an objective; instead, map the most trafficked web sites, or do it on the fly with the top 20 results from Google. I wonder how downloads are tested, are they run through VirusTotal for instance, and how significant could a "push" approach from end users, submitting direct links to malicious files found within a domain for automatic analysis, turn out to be here?
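For the sake of argument, a "push" submission could be as simple as hashing the downloaded sample and looking the hash up against a multi-engine service. The sketch below uses VirusTotal's present-day v3 REST API purely as an example (the endpoint and response fields follow its public documentation, the API key and file path are placeholders); it says nothing about how SiteAdvisor itself tests downloads.

```python
# Sketch of a user "push": hash a downloaded sample and query a multi-engine
# service for existing verdicts. Endpoint/fields follow VirusTotal's public
# v3 API docs; the API key and sample path are placeholders.
import hashlib
import json
import urllib.request

API_KEY = "YOUR_VT_API_KEY"   # placeholder

def sha256_of(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def lookup(file_hash):
    req = urllib.request.Request(
        "https://www.virustotal.com/api/v3/files/" + file_hash,
        headers={"x-apikey": API_KEY})
    with urllib.request.urlopen(req, timeout=30) as resp:
        report = json.load(resp)
    return report["data"]["attributes"]["last_analysis_stats"]

if __name__ == "__main__":
    stats = lookup(sha256_of("suspicious_download.exe"))
    print("malicious verdicts:", stats.get("malicious", 0))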
I think the usefulness of their idea can only be fully achieved through cooperation with, or acquisition by, a leading search engine. My point is that some of the project's downsides are the lack of on-the-fly checking (that would be like v2.0 and a major breakthrough with respect to performance), the lack of resources to catch up with Google on the known Web (25,270,000,000 pages according to them recently), and how IP-based droppers instead of URL-based ones can ruin the idea in real-life situations (it takes more effort to register and maintain a domain than to use a zombie host's capabilities for the same purpose, doesn't it?).
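Telling IP-based dropper links from domain-based ones is at least cheap to do. A possible sketch, with the page source and the href regex as illustrative assumptions:

```python
# Sketch: extract hrefs from a page and flag any whose host is a bare IP
# address rather than a domain. The sample HTML and regex are illustrative.
import ipaddress
import re
from urllib.parse import urlparse

HREF = re.compile(r'href="([^"]+)"', re.I)

def ip_hosted_links(html):
    flagged = []
    for link in HREF.findall(html):
        host = urlparse(link).hostname
        if not host:
            continue
        try:
            ipaddress.ip_address(host)
        except ValueError:
            continue              # a normal domain name, skip it
        flagged.append(link)
    return flagged

if __name__ == "__main__":
    sample = ('<a href="http://203.0.113.7/load.exe">update</a> '
              '<a href="http://example.com/a.exe">mirror</a>')
    print(ip_hosted_links(sample))   # -> ['http://203.0.113.7/load.exe']
```

Of course, the whole problem is that a flagged IP belongs to somebody's infected PC today and somebody else's tomorrow, which is exactly why a static catalogue struggles here.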
In one of my previous posts on why you should aim higher than antivirus signature protection only, I mentioned some of my ideas: "Is client side sandboxing an alternative as well? Could and would a customer agree to act as a sandbox, compared to the current (if any!) contribution of forwarding a suspicious sample? Would v2.0 consist of a collective automated web patrol in a PC's spare time?"
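If such a collective patrol ever materialized, the client side could be little more than a polite worker that asks a coordinator for URLs whenever the machine is idle. The coordinator endpoint and the idle heuristic below are hypothetical placeholders, just to show the shape of the idea:

```python
# Sketch of a volunteer "web patrol" client: poll a central queue for URLs to
# inspect while the machine is idle. Coordinator URL and idle check are
# hypothetical placeholders, not anything SiteAdvisor has announced.
import json
import time
import urllib.request

QUEUE_URL = "https://patrol.example.org/next-url"   # hypothetical coordinator

def fetch_task():
    """Ask the coordinator for one URL to inspect; None if the queue is empty."""
    with urllib.request.urlopen(QUEUE_URL, timeout=10) as resp:
        task = json.load(resp)
    return task.get("url")

def machine_is_idle():
    """Placeholder idle check; a real client would look at CPU and user activity."""
    return True

def patrol_once():
    if not machine_is_idle():
        return
    url = fetch_task()
    if url:
        # hand the URL to a local sandboxed checker (not implemented here)
        print("would inspect", url)

if __name__ == "__main__":
    while True:
        patrol_once()
        time.sleep(60)   # poll once a minute during "spare time"
```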
Crawling for malicious content and making sense of the approaches used, in order to provide an effective solution, is a very exciting topic. As a matter of fact, in one of my previous posts, "What search engines know, or may find about us?", I mentioned the existence of a project to mine the Web for terrorist sites dating back to 2001. I'm curious about its progress with respect to the current threat of Cyberterrorism, and I feel crawling for malicious content and crawling for terrorist propaganda have a lot in common. Find the bad neighbourhoods and have your spiders do whatever you instruct them to do, but I still feel quality and in-depth overview would inevitably be sacrificed for automation.
What do you think the potential of web crawling for malicious content is, where by malicious I also include content that is harmful in the sense of Cyberterrorism PSYOPS (I once came across a comic PSYOPS worth reading!) techniques that I come across on a daily basis? Feel free to test any site you want, or browse through their catalogue as well.
You can also find more info on the topic, and alternative crawling solutions, projects and Cyberterrorism activities, online here:
A Crawler-based Study of Spyware on the Web
Covert Crawling: A Wolf Among Lambs
IP cloaking and competitive intelligence/disinformation
Automated Web Patrol with HoneyMonkeys: Finding Web Sites That Exploit Browser Vulnerabilities
The Strider HoneyMonkey Project
STRIDER: A Black-box, State-based Approach to Change and Configuration Management and Support
Webroot's Phileas Malware Crawler
Methoden und Verfahren zur Optimierung der Analyse von Netzstrukturen am Beispiel des AGN-Malware Crawlers (in German)
Jihad Online: Islamic Terrorists and the Internet
Right-wing Extremism on the Internet
Terrorist web sites courtesy of the SITE Institute
The HATE Directory November 2005 update (very rich content!)
Recruitment by Extremist Groups on the Internet
Technorati tags:
security, information security, SiteAdvisor, web crawler, search engine, cyberterrorism