Data Mining Research - dataminingblog.com: Trends

Showing posts with label Trends.

Monday, May 14, 2007

SVM, neural network and decision tree

After reading a post concerning the PAKDD 2007 competition on Abbott's Analytics, I was curious about the trends of some data mining methods. I decided to play with Google Trends using three common methods: Support Vector Machine (SVM), Artificial Neural Network (ANN) and Decision Tree (DT). The following picture shows the trends in search on Google for the three terms "svm", "neural network" and "decision tree" since 2004:


Red = "neural network", blue = "svm", orange = "decision tree"

The main observation is that SVM and ANN seem to have become less trendy over the last few years, while interest in DT has stayed roughly constant. These are the first conclusions we could draw from this picture. However, it is always dangerous to draw conclusions from raw numbers. In the above case, several factors have to be taken into account:
  • The way of writing the searched terms. For example, SVM could be found under "support vector machine", "support vector", "svm", etc. However, it seems that "svm" is most often used. The same remark applies to neural networks.

  • The diversity of search engines. Although the most popular, Google is not the only search engine on the web. A lot of people may use other engines such as Yahoo!, Live Search or All the Web. Only searches on Google are considered in this picture.

  • The difference between "searching" and "using". In other words, people may search for some methods but finally decide to use another one. Therefore, the fact that a keyword is often searched on Google does not mean that the corresponding method is used.
Consequently, even if these kinds of plots look nice, interpreting the information they give, and knowing in which context it is valid, is not an easy task.
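
To illustrate the first caveat, here is a minimal sketch of how the different spellings of one method could be added up before comparing methods. The function name and all the numbers are made up for illustration (they are not real Google Trends values), and the sum only makes sense if every series is expressed on the same scale. A query containing two spellings at once would also be counted twice.

```python
def combined_interest(series_by_variant):
    """Sum the per-period interest over every spelling of one method."""
    lengths = {len(values) for values in series_by_variant.values()}
    if len(lengths) != 1:
        raise ValueError("every variant needs one value per period")
    # zip(*...) groups the values period by period across all variants.
    return [sum(period) for period in zip(*series_by_variant.values())]


# Hypothetical interest values for three consecutive periods.
svm_variants = {
    "svm": [40, 35, 30],
    "support vector machine": [10, 12, 15],
    "support vector": [5, 5, 5],
}
svm_total = combined_interest(svm_variants)
print(svm_total)
```

With these invented values, the short form "svm" drops from 40 to 30 while the combined series only drops from 55 to 50, which is why looking at a single spelling can exaggerate a trend.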


Thursday, April 05, 2007

Data mining trends

I receive several emails from people asking me what the trends in data mining are and what the topics of interest of tomorrow will be. Since I am not a data mining algorithm that can predict the future, I prefer to give a list of articles that deal with data mining trends. Here is an incomplete list of articles related to the future of data mining:

  • From the application point of view, the editorial by Perner, which appeared in Engineering Applications of Artificial Intelligence (2006), is a recent summary of future trends in data mining.
  • A very comprehensive and quite long reference is the paper by Dietterich (1997), which is already ten years old.
  • The excellent paper by Fayyad and Uthurusamy contains a section about future trends in data mining. Note that other articles in this special issue of Communications of the ACM also contain ideas about future directions for data mining.
  • Several possible applications for data mining and different topics of interest are discussed in an article by Hsu.
  • The technology-oriented article by Cios and Kurgan focuses on supporting data mining with XML and other languages and technologies.
Some articles may not be accessible without a subscription (e.g. a university subscription). An earlier post about data mining trends is accessible on DMR. Feel free to comment on this post with interesting articles or books about future trends in data mining.

[DMR blog is on holidays for two weeks and will be back on April the 23rd]


Wednesday, January 03, 2007

2006 trends on Data Mining Research

Welcome back to Data Mining Research! I hope you enjoyed your holidays. Although I will not make predictions about the future of data mining, I want to highlight three topics that have emerged from last year's posts on this blog.

The first one is the data mining software or language used by people in research and industry. It is clear that several possibilities exist (examples can be found in this post). I think that the diversity of the people using them, as well as of their aims, makes it difficult to have a universal language for data mining.

The second topic is about data mining pitfalls and the related difficulties for beginners using data mining as a tool. After the discussions on the posts about data mining pitfalls and garbage in, garbage out, it is clear that many different pitfalls and traps lie on the path to knowledge.

The last one, related to the previous one, concerns the automation of the data mining task. One of the main issues is the management of the above-mentioned pitfalls. How to automate clustering when the number of clusters is unknown? How to automate neural networks while avoiding underfitting and overfitting? How to choose the right data mining method? Some of these questions may be answered by following a methodology from a book. In addition, companies such as KXEN may be helpful.


Thursday, August 31, 2006

GIF or JPG?

If you are interested in knowing what is trendy these days (or years!), have a look at Google Trends. It is easy to use and gives you a good idea of the information people are looking for on the web (or at least on Google).

Here is Google's definition of this functionality: "Google Trends analyzes a portion of Google web searches to compute how many searches have been done for the terms you enter relative to the total number of searches done on Google over time."
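
To make the "relative to the total number of searches" part concrete, here is a small sketch with made-up counts. The function name and the numbers are assumptions for illustration only. Google does not publish raw counts, and any further scaling it applies is not covered by the definition quoted above.

```python
def relative_interest(term_counts, total_counts):
    """Share of all searches that matched the term, one value per period."""
    if len(term_counts) != len(total_counts):
        raise ValueError("both series must cover the same periods")
    # A period with no searches at all gets a share of 0.0.
    return [t / n if n else 0.0 for t, n in zip(term_counts, total_counts)]


# Hypothetical counts: searches for the term grow from period to period,
# but the total number of searches grows faster.
term_counts = [100, 120, 150]
total_counts = [1000, 1500, 3000]
shares = relative_interest(term_counts, total_counts)
print(shares)
```

In this invented example the share falls from 10% to 5% even though the term itself is searched more often each period. A declining curve on Google Trends therefore does not necessarily mean fewer searches in absolute terms.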

For example, you can try to see which image format, GIF or JPG, is searched for more often on Google. Results are shown in the figure below (GIF is blue and JPG red):

We can clearly see that JPG is searched for less than GIF. This is normal to a certain extent, since the GIF format is so widely used on the web. What is more interesting is that searches for JPG are decreasing (!)
