Data Mining Research - dataminingblog.com: data mining

Friday, July 13, 2007

Why is Matlab the best language for data mining?

When I started a new project a few days ago, I had to answer the recurring question: which language should I choose? In research, we are free to choose any language, free or commercial. This is usually not the case in industry, where the language may be imposed for many reasons (price, customer choice, boss's choice, compatibility with an existing system, etc.).

I basically had to choose between Java and Matlab (C++ was quickly dropped from my list since I don't like spending time on pointers and freeing memory by hand, but this is very personal). Of course many other languages are available, but these are the two I am most comfortable with. As most of my previous work was done in Matlab, I decided to start with Java. Contradictory? Not at all: I just wanted to know how easy it is to use Java for raw data mining tasks (i.e. without using the JDM framework or the like).

In data mining, a large part of the work is manipulating data. The algorithm itself can take very little code, since Matlab has many toolboxes for data mining. And for manipulating data, Matlab is definitely better. This is to be expected, since it was designed to work with matrices (MATrix LABoratory). Deleting a row or a column, transposing a matrix, calculating the determinant: each of these can be done in one line of code. To my knowledge, this is not the case with Java, but if you know a way, feel free to comment.
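To make the comparison concrete, here is a minimal sketch of what some of these operations look like in plain Java, without any matrix library. In Matlab they are one-liners: `A(2,:) = []` deletes a row, `A(:,2) = []` deletes a column, and `A'` transposes. The class `MatrixOps` and its methods are hypothetical helpers written for this illustration, not part of any library; they assume a non-empty rectangular `double[][]` and a valid 0-based index.

```java
import java.util.Arrays;

// Hypothetical helper class (an assumption for illustration, not a library API):
// plain-Java versions of operations that Matlab expresses in one line each.
class MatrixOps {

    // Matlab: A(r, :) = []   (r is 0-based here, 1-based in Matlab)
    static double[][] deleteRow(double[][] a, int r) {
        double[][] out = new double[a.length - 1][];
        for (int i = 0, k = 0; i < a.length; i++) {
            if (i != r) {
                out[k++] = a[i].clone(); // copy so the input is left untouched
            }
        }
        return out;
    }

    // Matlab: A(:, c) = []   (c is 0-based here)
    static double[][] deleteColumn(double[][] a, int c) {
        double[][] out = new double[a.length][];
        for (int i = 0; i < a.length; i++) {
            out[i] = new double[a[i].length - 1];
            for (int j = 0, k = 0; j < a[i].length; j++) {
                if (j != c) {
                    out[i][k++] = a[i][j];
                }
            }
        }
        return out;
    }

    // Matlab: A'   (assumes a non-empty rectangular matrix)
    static double[][] transpose(double[][] a) {
        double[][] out = new double[a[0].length][a.length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < a[0].length; j++) {
                out[j][i] = a[i][j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2, 3}, {4, 5, 6}};
        System.out.println(Arrays.deepToString(deleteRow(a, 0)));    // [[4.0, 5.0, 6.0]]
        System.out.println(Arrays.deepToString(deleteColumn(a, 1))); // [[1.0, 3.0], [4.0, 6.0]]
        System.out.println(Arrays.deepToString(transpose(a)));       // [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
    }
}
```

The determinant is left out of this sketch because it needs an elimination routine rather than a single loop, which only widens the gap with Matlab's `det(A)`.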

For more information about using Matlab for data mining, the best place is Will's blog. In the next post, I will write about the other side of the coin and explain some of Matlab's drawbacks.

Tuesday, April 24, 2007

Data mining definitions

In the literature, the field of data mining goes by several other names. Below are examples of definitions related to the field of data mining:

  • Machine Learning: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." (Mitchell, 1997)

  • Exploratory Data Analysis: "A philosophy of data analysis where the researcher examines the data without any pre-conceived ideas in order to discover what the data can tell him about the phenomena being studied." (Martinez and Martinez, 2004)

  • Pattern Recognition: "Statistical pattern recognition is a term used to cover all stages of an investigation from problem formulation and data collection through to discrimination and classification, assessment of results and interpretation." (Webb, 2002)

  • Data Mining: "Data mining is a technology that blends traditional data analysis methods with sophisticated algorithms for processing large volumes of data." (Tan et al., 2006)

  • Knowledge Discovery: "[...] a new generation of techniques and tools with the ability to intelligently and automatically assist humans in analyzing the mountains of data for nuggets of useful knowledge." (Fayyad et al., 1996)

Some of these definitions are technical while others are intuitive. Do you have other examples of such definitions, or remarks about these? Feel free to comment.
