My blog has moved! Redirecting...

You should be automatically redirected. If not, visit http://www.dataminingblog.com and update your bookmarks.

Data Mining Research - dataminingblog.com: Google

Showing posts with label Google. Show all posts

Tuesday, March 11, 2008

Google mines behavioral data

For what? Simply to improve their SERPs (search engine results pages). Google's ultimate aim is to help users find the information they are looking for (at least, it should be). By mining the data gathered from Google Toolbar, Google Analytics, Feedburner (now part of Google) and probably other free tools, they are presumably trying to produce better SERPs.

Some of you may already have suspected this. This news gives better insight into the issue. However, as is often the case when drawing conclusions from experiments, the findings may not generalize to other webpages. The news should therefore be taken with caution. Do you have an opinion about this? Feel free to comment.

Thursday, August 02, 2007

Google mining the web for blog spam comments

It is always a pleasure to come back from holidays and read comments on your blog. However, not all comments are worth your time. An example of an undesirable comment can be found here. On a first read, it already sounds strange. Expressions such as "Hey buddy!" make it feel like spam. If you look deeper into the text, you can see that it says nothing specific about my blog or the topics I cover. This is typical of spam comments.

However, deleting a legitimate comment may be annoying, especially for the person who posted it. If you want to avoid the extremes of word verification or comment moderation, a simple solution is to use Google. Just copy and paste the first line of the comment, and Google will mine the web for similar comments.

In the case described above, simply search for the first sentence (in quotes, to get an exact match) and Google will link you to this site. You can easily see that the first comment there is exactly the same, so it can safely be considered spam.
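
The quoted-search trick can be sketched in a few lines of Python. This is only an illustration: the function name `exact_phrase_search_url` is made up for this example, and the code simply wraps the sentence in double quotes and URL-encodes it for the public Google search page, which you then open in a browser.

```python
from urllib.parse import urlencode


def exact_phrase_search_url(sentence):
    """Build a Google search URL that looks for `sentence` as an exact phrase.

    Hypothetical helper for illustration; it only constructs the URL.
    """
    # Drop any double quotes inside the sentence so they cannot end the
    # exact-match phrase early, then wrap the whole sentence in quotes.
    phrase = '"' + sentence.strip().replace('"', "") + '"'
    return "https://www.google.com/search?" + urlencode({"q": phrase})


# Paste the first sentence of the suspicious comment here.
url = exact_phrase_search_url("Hey buddy!")
print(url)
```

If the same sentence shows up word for word on unrelated sites, the comment is very likely spam.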

Wednesday, June 06, 2007

Using Google to mine sensitive data

The time when hackers were trying to attack companies directly may be over. Nowadays, sensitive information about a company can be found using Google and associated tools. A recent article on SearchSecurity.com explains how hacker techniques may benefit from the Google search engine. The author, Bill Brenner, describes the tools hackers use these days to find useful information about a given company (your competitors could even pay hackers to find such information). He gives many examples of such tools (e.g. Google Patent Search, Google Blog Search, etc.).

Brenner also warns against the use of blogs inside a company: intellectual property may be shared involuntarily. I think one of the first (and simplest) examples of such information on the web came from the ability of search engines such as Google to mine .doc or .pdf files. Before Google, people could put sensitive documents on the web and they were somehow hidden, since nobody knew the exact URL. Techniques are now more complex, as you will see if you read Brenner's article.

Friday, November 24, 2006

Now boarding!

Here is some food for the weekend:

  • Will explains a good alternative to the standard Euclidean distance by introducing the Mahalanobis distance on his blog
  • Andy writes that Google seems to have started integrating blog posts into its results (pointed out by Matthew)
By the way, I would like to thank Joël Arnold for the nice drawing he made for me (picture on the right).
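
For readers who have not met the Mahalanobis distance mentioned in the list above, here is a minimal sketch (not taken from Will's post). It assumes two-dimensional points and a known 2x2 covariance matrix so that it needs no external library; the function names are made up for this example.

```python
import math


def euclidean(x, y):
    """Standard Euclidean distance between two points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))


def mahalanobis_2d(x, y, cov):
    """Mahalanobis distance between 2-D points x and y.

    `cov` is a 2x2 covariance matrix given as ((a, b), (c, d)).
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - y[0], x[1] - y[1])
    # Quadratic form dx^T * inv * dx.
    q = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
         + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.sqrt(q)


# The first feature has variance 4 (standard deviation 2), the second variance 1.
cov = ((4.0, 0.0), (0.0, 1.0))
print(euclidean((2.0, 0.0), (0.0, 0.0)))
print(mahalanobis_2d((2.0, 0.0), (0.0, 0.0), cov))
```

The point (2, 0) is two units from the origin in Euclidean terms, but only one standard deviation away along the first feature, which is what the Mahalanobis distance measures. With the identity matrix as covariance, the two distances coincide.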

Monday, November 20, 2006

Google #1 in 2007

If you thought, like me, that Google was the most visited website in the world, then you're wrong. At the moment, the most visited website is Yahoo!, with 130 million visitors a month. However, this will change in 2007, according to a MarketWatch article (citing Citigroup): Google is predicted to become the most visited website worldwide in 2007. And as you perhaps know, Google keeps track of every search made on its search engine. Can you imagine the quantity of data Google will then be able to mine? (Picture from www.vivelavie.fr)

Friday, October 27, 2006

Smile! You have just been mined!

You think this title is provocative? Indeed, you're right. Data mining is a powerful tool, so any mention of mining something or (worse) someone's behavior will certainly generate curiosity as well as anxiety. Neither is justified on my blog, since you haven't really been mined (you already knew that?). In fact, only some information from your IP address has been mined. Therefore, I'm not able to say who you are, respected reader of my blog, but only which town you come from. This is one of the many possibilities offered by Google Analytics. A plot of some of the geographic connections to this blog for 2006 is given below.

Not all geographic connections are taken into account in this picture.

This function lets you know how many people connect to your website and from where. Of course, the idea is not to look for someone in particular (knowing that someone connects from New York is of no help, to be honest), but rather to get a worldwide view of reader locations.
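
The kind of aggregation behind such a view is simple to sketch. The snippet below is not the Google Analytics API; it assumes you already have one record per visit with `country` and `city` fields (both names, and the sample records, are invented for illustration) and just counts visits per location.

```python
from collections import Counter


def visits_by_city(visits):
    """Count visits per (country, city) pair, most frequent first."""
    counts = Counter((v["country"], v["city"]) for v in visits)
    return counts.most_common()


# Made-up sample records, only to show the shape of the input.
sample_visits = [
    {"country": "US", "city": "New York"},
    {"country": "CH", "city": "Lausanne"},
    {"country": "US", "city": "New York"},
    {"country": "FR", "city": "Paris"},
]

for (country, city), n in visits_by_city(sample_visits):
    print(f"{city}, {country}: {n}")
```

Note that the result says nothing about individual readers: all that remains is a count per town.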
