Data Mining Research

Thursday, March 08, 2007

The future of search engines

Mining the web for information relevant to the user is the main task of search engines. According to a recent article on TechNewsWorld, Google will not stay on top forever. The author interviewed the president of SiteSpect, a provider of search engine marketing and Web optimization technology, about the future of search engines. In his view, Google will no longer be the leading search engine in that future.

This pessimistic view of Google's future is supported by the fact that people are working on new search engines that provide different kinds of information to the user. According to Larry Kerschberg, a professor at George Mason University, complex queries such as "What is the best way to treat my cancer?" are the kind of question that search engines should be able to answer tomorrow. Even if Google's role becomes less important in a few years, it will certainly remain part of our everyday life through popular applications such as Gmail.

Friday, November 03, 2006

When web mining meets clustering

Google is nowadays the most widely used search engine on the planet. Many people use it and are satisfied with its performance. However, Google suffers from several drawbacks. First, many results are redundant, and Google sometimes gives you too many answers. Suppose the information you are looking for is in a .pdf file linked from a specific web page, which itself belongs to a larger website. Google may give you three different links (the main website, the specific web page and the .pdf file itself).

Another drawback of Google (and of many other free-text search engines) is the lack of structure among results. Information is given in a raw manner, without themes, hierarchies or categories, so users often find themselves drowned in the results. A search on the term data mining, for example, returns 52,600,000 hits.
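The redundancy problem (three links into the same website) can be illustrated with a minimal sketch, assuming the result URLs are available as plain strings. Grouping them by host name shows at a glance which hits point into the same site. The URLs and the helper `group_by_site` are hypothetical; real engines use more elaborate duplicate detection than a simple host comparison.

```python
from urllib.parse import urlparse


def group_by_site(urls):
    """Group result URLs by host name, keeping the order in which hosts first appear."""
    groups = {}
    for url in urls:
        host = urlparse(url).netloc.lower()
        groups.setdefault(host, []).append(url)
    return groups


# Hypothetical result list: three hits lead to the same website.
results = [
    "http://www.example.org/",
    "http://www.example.org/research/mining.html",
    "http://www.example.org/papers/mining.pdf",
    "http://other.example.com/intro.html",
]

for host, urls in group_by_site(results).items():
    print(host, len(urls))
```

Showing only one entry per host, with the other links folded underneath it, would turn the three redundant hits into a single result.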

Clusty, a recent search engine (Pittsburgh, 2004), is a good alternative to Google. Clusty is a meta search engine, which means it queries top search engines and combines their results for the user. Clusty uses clustering techniques to group the results into categories, and the results are automatically clustered according to selected keywords. For the query data mining, Clusty returns 246 results out of the 36,244,144 hits found. The figure below shows the results obtained.

Clusty displays clusters and sub-clusters that can be browsed (left part of the figure). Information is not raw as in Google, but organized. So far, the only drawback I have noticed with Clusty concerns its ads: they are placed too close to the results, which sometimes confuses the user.
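The meta search idea described above (query several engines, combine their results, then group them by keywords) can be sketched in a few lines. This is not Clusty's actual algorithm, which the post does not describe. It is a naive illustration under stated assumptions: the two engine result lists are hardcoded and hypothetical, results are merged round-robin with duplicate URLs dropped, and cluster labels are simply the most frequent title words (query terms and stopwords excluded). Only one flat level of clusters is produced, with no sub-clusters.

```python
import re
from collections import Counter

# Hypothetical ranked results from two engines, as (url, title) pairs.
ENGINE_A = [
    ("http://a.example/dm-tutorial", "Data mining tutorial for beginners"),
    ("http://a.example/weka", "Weka data mining software"),
    ("http://a.example/kdd", "KDD conference on data mining"),
]
ENGINE_B = [
    ("http://a.example/weka", "Weka data mining software"),
    ("http://b.example/dm-course", "Online data mining course and tutorial"),
    ("http://b.example/kdd-papers", "Data mining conference papers"),
]

# Query terms and common words that should never become cluster labels.
STOPWORDS = {"data", "mining", "for", "on", "and", "the", "of", "online"}


def merge(*result_lists):
    """Round-robin merge of several ranked lists, dropping duplicate URLs."""
    merged, seen = [], set()
    for rank in range(max(len(r) for r in result_lists)):
        for results in result_lists:
            if rank < len(results) and results[rank][0] not in seen:
                seen.add(results[rank][0])
                merged.append(results[rank])
    return merged


def terms(title):
    """Lower-case words of a title, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", title.lower()) if w not in STOPWORDS}


def cluster(results, max_clusters=3):
    """Label clusters with frequent terms; a result may join several clusters."""
    counts = Counter(t for _, title in results for t in terms(title))
    # Keep only terms shared by at least two results.
    labels = [t for t, c in counts.most_common() if c >= 2][:max_clusters]
    clusters = {label: [] for label in labels}
    clusters["other"] = []
    for url, title in results:
        matched = [label for label in labels if label in terms(title)]
        for label in matched:
            clusters[label].append(url)
        if not matched:
            clusters["other"].append(url)
    return clusters


for label, urls in cluster(merge(ENGINE_A, ENGINE_B)).items():
    print(label, urls)
```

With these sample lists, the duplicate Weka link is kept once, the tutorial and course pages end up under "tutorial", the two KDD pages under "conference", and the Weka page falls into "other". A real clustering engine would use snippets as well as titles and a proper clustering algorithm rather than raw word counts.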
