Sunday, December 16, 2007

Cached Malware Embedded Sites

Google, with its almost real-time crawling capabilities, has rarely proved useful while researching malware embedded sites that were cleaned before they could be analyzed, especially popular sites that get crawled several times daily. However, Yahoo's and MSN's search engines, with MSN providing a kind of historical crawl archive, have been an invaluable resource for actionable historical intelligence: what was embedded at the site, where it was pointing, and whether other sites are currently embedded by the same campaign. Here's an interesting opinion stating that cached malware embedded sites are a security problem. Well, they are, but the bigger problem to me is that Google is the only one that has taken steps to deal with it, unlike its market challengers Yahoo and MSN. From "Google, Yahoo, Microsoft Live search engines contain page-caching flaw, says Aladdin":

"Researchers at Aladdin Knowledge Systems have discovered a “significant” vulnerability in the page-caching technologies of three major search engines, allowing them to deliver malicious pages that have been removed from the web. The researchers discovered the vulnerability when analysing the content of a hacked university website. The site was cleaned, but malicious content was still reachable via search engine caches. The flaw is a "glimpse of the future" of multifaceted web-based attacks, said Ofer Elzam, director of product management at Aladdin."

Let's discuss the current model of dealing with such sites. Whenever Google comes across a site that's potentially malware embedded, it doesn't just label it "This site may harm your computer", it also removes all cached copies of the site. This protects the "cached surfers crowd", and it often prompts me to look for cached copies that hopefully still contain the embedded malware in other search engines, ones whose crawling isn't as fast as Google's.

Therefore, don't put Google in the same category as Yahoo and MSN. Yahoo and MSN do not offer comparable in-house notification for malware embedded sites, and their slower crawling is among the top reasons I rely on their search engines whenever I'm aware of a malware embedded site but couldn't obtain the obfuscated JavaScript/IFRAME before it got removed.

Here's an example of how useful cached malware sites are for research purposes. Back in September, the U.S. Consulate in St. Petersburg was serving malware, and the embedded malware link was removed before I could obtain a copy of the infected page. Best of all, there were still cached copies available serving the malware, which led to the assessment of the campaign. It's also another great example of how intelligence sharing between the industry, independent researchers, and non-profit organizations results in far more detailed exposures of malicious campaigns than a vendor's self-sufficiency mentality does.

Here is how Google understands the malicious economies of scale, where efficiency comes at the cost of a shorter campaign lifecycle, a trade-off I've been discussing for a while, especially with respect to the Rock Phish Kit:

"Examining our data corpus over time, we discovered that the majority of the exploits were hosted on third-party servers and not on the compromised web sites. The attacker had managed to compromise the web site content to point towards an external URL hosting the exploit either via iframes or external JavaScript. Another, less popular technique, is to completely redirect all requests to the legitimate site to another malicious site. It appears that hosting exploits on dedicated servers offers the attackers ease of management. Having pointers to a single site offers an aggregation point to monitor and generate statistics for all the exploited users. In addition, attackers can update their portfolio of exploits by just changing a single web page without having to replicate these changes to compromised sites. On the other hand, this can be a weakness for the attackers since the aggregating site or domain can become a single point of failure."
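The pattern described in the quote, compromised pages pointing to an external URL via an IFRAME or external JavaScript, is also what a researcher looks for when pulling up a cached copy of a cleaned site. Below is a minimal sketch, assuming the cached HTML has already been saved locally; the helper names (`EmbedCollector`, `external_embeds`) and the `.example` hosts are hypothetical, and it relies only on Python's standard `html.parser`. It lists `iframe` and `script` sources hosted somewhere other than the page's own host, i.e. candidate aggregation points.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class EmbedCollector(HTMLParser):
    """Hypothetical helper: collects src attributes of <iframe> and <script> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag in ("iframe", "script"):
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page URL.
                self.sources.append((tag, urljoin(self.base_url, src)))


def external_embeds(html, page_url):
    """Return (tag, url) pairs whose host differs from the page's own host."""
    collector = EmbedCollector(page_url)
    collector.feed(html)
    own_host = urlparse(page_url).hostname
    return [(tag, url) for tag, url in collector.sources
            if urlparse(url).hostname != own_host]


cached = ('<p>Welcome</p>'
          '<iframe src="http://exploit.example/x.html" width=0 height=0></iframe>'
          '<script src="/js/site.js"></script>')
print(external_embeds(cached, "http://university.example/index.html"))
```

The same-site script is resolved to `university.example` and filtered out, leaving only the hidden IFRAME. Note that this only sees static markup; an IFRAME written at runtime by obfuscated JavaScript won't show up here.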

Google is clearly aware of what's going on, but is trying to limit false positives, that is, sites wrongly flagged as serving malware, which is where malicious parties will be innovating in the future. It still remains questionable, however, why they haven't yet acted through obvious means, RBN's directory permissions gone wrong being one example.

The bottom line: cached malware embedded sites are a valuable resource in the arsenal of tools for the security researcher/malware analyst, and not necessarily a threat when it comes to Google, whose approach is to remove the cached copies before notifying about the infection. This brings us to a more realistic attack tactic than the one discussed in the article, in which an attacker would supposedly embed malware at different sites, let the search engines crawl and cache them, then clean the sites and wait for visitors to use the cache, thereby infecting themselves. Case in point: the U.S. Consulate's site wasn't even flagged by Google as malware embedded, which is hopefully the result of its fast crawling capabilities. The ugly attack tactic I have in mind, however, is not just embedding an IFRAME, but embedding an obfuscated IFRAME that leads to the usual obfuscated exploit URL, which is what happened in the Consulate's case: an obfuscated IFRAME by itself.
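The exact obfuscation used on the Consulate's site isn't described here, so the sketch below assumes one of the most common layers seen in embedded IFRAMEs: a `document.write(unescape("..."))` call carrying percent-encoded markup, possibly nested. The function name and the sample string are hypothetical. It decodes the cached script as plain text, without ever executing it, by repeatedly replacing `unescape("...")` calls with Python's `urllib.parse.unquote` output. It does not handle JavaScript's `%uXXXX` escapes or other encodings.

```python
import re
from urllib.parse import unquote

# Matches unescape("...") or unescape('...') with a quoted literal argument.
UNESCAPE_CALL = re.compile(r"""unescape\(\s*(["'])(.*?)\1\s*\)""", re.DOTALL)


def decode_unescape_layers(script, max_layers=5):
    """Hypothetical helper: peel nested unescape("%XX...") layers as text.

    Only %XX percent-escapes are decoded; %uXXXX escapes are left as-is.
    """
    for _ in range(max_layers):
        decoded = UNESCAPE_CALL.sub(lambda m: unquote(m.group(2)), script)
        if decoded == script:  # nothing left to peel
            break
        script = decoded
    return script


obfuscated = ('document.write(unescape("%3Ciframe%20src%3D%22http%3A//exploit.example/'
              '%22%20width%3D0%20height%3D0%3E%3C/iframe%3E"));')
print(decode_unescape_layers(obfuscated))
```

Once the IFRAME's target is exposed, that URL is typically the next obfuscated layer to pull from a cache before it disappears too.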