May 12, 2007

Google and the Web-Based Malware

Google published an interesting study [PDF, 438 KB] about pages that try to automatically install malware (the so-called "drive-by download") by exploiting flaws in Microsoft's Internet Explorer. By analyzing all the pages from Google's index, the study found that 450,000 URLs launched files that contained malware. If we assume Google's index has 20 billion pages, that means roughly one in 44,000 pages launches malware (20 billion / 450,000 ≈ 44,444). Trojans were the most frequent category of malware, followed by adware.
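The ratio is easy to check. The short Python sketch below divides the assumed index size by the number of malicious URLs, and does the same for the roughly 4.5 million URLs that the report says received in-depth analysis. The 20 billion figure is only an estimate of the index size, and the constant and function names are made up for this illustration.

```python
# Figures quoted from the study, plus one assumption (the index size).
MALICIOUS_URLS = 450_000              # URLs that launched drive-by downloads
IN_DEPTH_URLS = 4_500_000             # URLs that received in-depth analysis
ASSUMED_INDEX_SIZE = 20_000_000_000   # assumption: an estimate, not an official number


def one_in(bad: int, total: int) -> int:
    """Return N such that roughly one in N items is bad."""
    return round(total / bad)


index_ratio = one_in(MALICIOUS_URLS, ASSUMED_INDEX_SIZE)
in_depth_ratio = one_in(MALICIOUS_URLS, IN_DEPTH_URLS)

print(f"Against the assumed index: one in {index_ratio:,}")
print(f"Against the in-depth set:  one in {in_depth_ratio:,}")
```

The result depends entirely on the denominator: measured against the pre-filtered in-depth set the rate is one in ten, while measured against the assumed index it is several thousand times smaller.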

"The installed malware often enables an adversary to gain remote control over the compromised computer system and can be used to steal sensitive information such as banking passwords, to send out spam or to install more malicious executables over time."

It's also useful to know "the four prevalent mechanisms used to inject malicious content on popular websites: web server security, user contributed content, advertising and third-party widgets". As an example of a widget, the study mentions a free stats counter that required users to include links to some external JavaScript files in order to monitor traffic. At some point, the files started to include exploit code. In this case, the malware was outside the control of the webmaster, but it could still be dangerous to the site's users.

"Examining our data corpus over time, we discovered that the majority of the exploits were hosted on third-party servers and not on the compromised web sites. The attacker had managed to compromise the web site content to point towards an external URL hosting the exploit either via iframes or external JavaScript."
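To make the mechanism in the quote concrete, here is a minimal Python sketch of how a webmaster might list the iframes and scripts on a page that load from a different host than the page itself. It is not taken from the study. The class name, function name and all URLs are hypothetical, and it uses only the standard library. An external source is not necessarily malicious (legitimate counters and ads look the same), so the output is just a list of things worth reviewing.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ExternalSourceFinder(HTMLParser):
    """Collect <iframe> and <script> tags whose src points to another host."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.page_host = urlparse(page_url).hostname
        self.external = []  # list of (tag, absolute_url) tuples

    def handle_starttag(self, tag, attrs):
        if tag not in ("iframe", "script"):
            return
        src = dict(attrs).get("src")
        if not src:
            return  # inline script or iframe without a source
        absolute = urljoin(self.page_url, src)  # resolve relative URLs
        host = urlparse(absolute).hostname
        if host and host != self.page_host:
            self.external.append((tag, absolute))


def find_external_sources(page_url: str, html: str):
    finder = ExternalSourceFinder(page_url)
    finder.feed(html)
    finder.close()
    return finder.external


# Hypothetical page: one local script, one third-party counter, one hidden iframe.
SAMPLE_HTML = """
<html><body>
<script src="/js/site.js"></script>
<script src="http://stats.example.net/counter.js"></script>
<iframe src="http://exploit.example.org/x.html" width="0" height="0"></iframe>
</body></html>
"""

external = find_external_sources("http://www.example.com/index.html", SAMPLE_HTML)
for tag, url in external:
    print(tag, url)
```

The sketch compares host names exactly, so a subdomain of the same site would also be reported as external. That is a deliberate simplification.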

Google has started to flag the web sites that try to install malware (example of query). They're still included in Google's index, but you have to manually copy the URL and paste it into the address bar to visit the site. Most of the pages let you download pirated software and music. The newest version of Google Desktop also shows warnings if you visit one of these sites.

The best defense against these threats is to use more secure browsers like Firefox or Opera and to install anti-virus / anti-spyware software (Google Pack includes all of these: Firefox, Norton Security Scan and Spyware Doctor, but there are other free alternatives).

{ via BBC, which hires people who don't know how to count and who draw the inaccurate conclusion that "one in 10 web pages scrutinised by search giant Google contained malicious code that could infect a user's PC". }


  1. Yay! No more malware! Thanks, Google!

  2. Google's of course right. There are hundreds of thousands of IE pages that deposit serious malware, just like there are tens of thousands of FF pages that do the same thing which Google happens to omit mentioning lol. Install SpywareBlaster to block all the validated IE & FF pages that are longtime offenders, or any of the other top-rated Hosts File managers.

    Google + Security is a sensitive matter. I'll be diplomatic and say Google isn't in the business of optimizing your system's security.

    They could've done it for free by selecting the proper freeware security products for the Pack, but they've chosen not to.

    Norton's anti-virus scan is useless baitware ~ Google could have easily chosen the excellent full-function free AVG Anti-Virus. Spyware Doctor Starter Version is equally useless baitware ~ Google had chosen the excellent full-function free Lavasoft AdAware for the original Pack but very strangely chose to delete it. It was a telling move by Google for those of us who are familiar with the Lavasoft program which does quite a bit more than removing IE third-party cookies if the user wishes it to ~ specifically, it gives the option to remove MRU identifiers on your system, something one might guess isn't exactly desired by Google. They also could easily offer the excellent Zone Alarm free firewall in the Pack, if they truly cared about user security.

    A useful plug-in for Googlers using Firefox is customizegoogle dot com. It offers brilliant security functions such as encrypting GMail, disabling tracking cookies from Google searches, disabling ads on GMail pages, etc.

    Google's a great company but we should all bear in mind that, at least for the current time, they're an advertising firm. And online advertising ~ by its very nature ~ brings with it a host of security issues.

  3. That's nothing. Check out how many of those sites are hacked and contain malware because of that. And then check out how many sites were hacked for SEO reasons. Once you've searched for an hour and have 10k domains in your list, try to contact them about it: 99+% don't answer, 90+% don't react. People don't care if their site is hacked. Companies don't care. Government agencies don't care. Then add XSS to the picture.... How bad do they have to hurt to care about their sites being misused?

  4. "Hacking" is a convenient excuse to host malware. Quit being such a dupe. You're being SEd.

  5. Matt Cutts, from Google, writes more about this.

    "All in all, I think Google does a pretty good job of protecting users from getting infected, while at the same time providing tools that assist webmasters in detecting and correcting hacked urls that could spread malware. Certainly compared to other search engines I think we provide more notice to users about potential malware urls, and we provide more info to webmasters about potentially hacked urls. So I think Google’s response to this issue balances the needs of users and webmasters pretty well."

  6. How did you arrive at that keyword to find the size of Google Index?

  7. At one point, you could see the number of pages from Google's index on the homepage. Then Google removed the number, but you could search for "* *" and still see it. After some months, Google disabled that query (and other similar void queries).

    The best replacement I could find is a query like "the * * * *" that contains a very frequent English word and many wildcards. The more stars you add, the bigger the estimated number of results. The biggest number I could find is 20,330,000,000, but this is just an estimate and doesn't include many non-English web pages.

  8. The study says they analyzed about 4.5 million URLs and found 450,000 malicious ones. That's actually 1 out of 10 pages. Where is the BBC wrong?


  9. The study says they analyzed all the pages from the web and found 4.5 million URLs that needed further investigation. Of these, 450,000 actually linked to malware.

    So it's 450,000 / Google's index.

    Quote from the report:

    << We analyzed the content of several billion URLs and executed an in-depth analysis of approximately 4.5 million URLs. From that set, we found about 450,000 URLs that were successfully launching drive-by-downloads of malware binaries and another 700,000 URLs that seemed malicious but had lower confidence. >>

    << Our automated analysis harnesses the fact that Google, as part of indexing the web, has the content of most web pages already available for post-processing. We divide the analysis into three phases: identification of candidate URLs, in-depth verification of URLs and aggregation of malicious URLs into site level ratings. (...) In first phase we employ MapReduce to process all the crawled web pages for properties indicative of exploits. >>

  10. how can we track down these websites?

  11. What do those websites look like?

  12. Even Google confirms what I said:

    << Unfortunately, the scope of the problem has recently been somewhat misreported to suggest that one in 10 websites are potentially malicious. To clarify, a sample-based analysis puts the fraction of malicious pages at roughly 0.1%. The analysis described in our paper covers billions of URLs. >>

  13. << how can we track down these websites? >>

    Google Desktop warns if you visit one of these sites. Also the results from Google search show warnings like in the screenshot included in the post.

    << What do those websites look like? >>

    Some of these sites have been hacked, so they look like any other site. Other sites use popular content like MP3s or serial numbers for software to attract users.

  14. I was running a free scan of my computer from Garbage Clean LLC when my security software stopped the scan. This site likes to inject malicious things into your files at the same time. I can't be the only sucker in this world, but FREE seems to attract us to these lousy no-good sites. THANKS GOOGLE! D.C.C.

  15. Here you listed the top 4 ways that malware gets into your computer: "web server security, user contributed content, advertising and third-party widgets." In my experience, it is advertising (or, false advertising) by a landslide. I would be very interested to see exactly how these numbers break down as far as percentages go. Perhaps you will consider posting this in the future?