2006 trends on Data Mining Research
Welcome back to Data Mining Research! I hope you enjoyed your holidays. Although I will not make predictions about the future of data mining, I want to highlight three topics that emerged from last year's posts on this blog.
The first one is the data mining software or language used by people in research and industry. It is clear that several options exist (examples can be found in this post). I think that the diversity of the people using them, as well as of their goals, makes it difficult to have a universal language for data mining.
The second topic is data mining pitfalls and the related difficulties for beginners who use data mining as a tool. After the discussions on the posts about data mining pitfalls and garbage in, garbage out, it is clear that many different pitfalls and traps lie on the path to knowledge.
The last one, related to the previous one, concerns the automation of the data mining task. One of the main issues is the management of the above-mentioned pitfalls. How can clustering be automated when the number of clusters is unknown? How can neural networks be automated while avoiding both underfitting and overfitting? How should the right data mining method be chosen? Some of these questions may be answered by following a methodology from a book. In addition, companies such as KXEN may be helpful.
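As an illustration of the clustering question above, here is a minimal sketch of one common heuristic: run k-means for several candidate values of k and keep the one with the highest mean silhouette score. Everything in it is an assumption made for illustration, not a method taken from the posts discussed: the function names (`kmeans`, `mean_silhouette`, `choose_k`), the toy points, the farthest-point initialisation and the search range for k are all hypothetical choices.

```python
import math

def kmeans(points, k, iters=50):
    """Plain k-means with a deterministic farthest-point initialisation (an assumption)."""
    centres = [points[0]]
    while len(centres) < k:
        # Add the point that is farthest from all centres chosen so far.
        centres.append(max(points, key=lambda p: min(math.dist(p, c) for c in centres)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centre.
        labels = [min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points]
        new_centres = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                new_centres.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centres.append(centres[j])  # keep the centre of an empty cluster
        if new_centres == centres:
            break  # converged
        centres = new_centres
    return labels

def mean_silhouette(points, labels):
    """Mean silhouette over all points; singleton clusters contribute 0."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return -1.0  # silhouette is undefined for a single cluster
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def choose_k(points, k_max=6):
    """Return the k in 2..k_max with the highest mean silhouette, plus all scores."""
    scores = {k: mean_silhouette(points, kmeans(points, k)) for k in range(2, k_max + 1)}
    return max(scores, key=scores.get), scores

# Hypothetical toy data: three well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0), (21, 1)]
best_k, scores = choose_k(points)
print(best_k, scores)
```

On toy data like this the groups are obvious. On real data the silhouette score is only one heuristic among several, and it still leaves the choice of distance measure and search range to the analyst, which is exactly the kind of pitfall that makes full automation hard.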
3 comments:
With regard to your comment about the data mining tools and development languages which people use, there is a poll on free tools currently (Jan-03-2007) being conducted over on the Data Mining and Predictive Analytics blog.
Do you see graph-based DM as a trend in the near future?
Thank you,
Ed Garcia
I am not familiar with graph-based data mining, so I will not make random predictions. In the books and papers I have recently read about future challenges in data mining, nobody has pointed out this subject. However, as you certainly know, books do not always tell the truth :-)