
Data Mining Research - dataminingblog.com: Why is Matlab the best language for data mining? (cont'd)


Sunday, July 22, 2007


In the previous post, I argued that Matlab is an excellent programming language (and environment) for data mining. However, no programming language is perfect, and Matlab has its drawbacks too. Here are a few of them.

First, Matlab is an interpreted language, which means there is no separate compilation step. The advantage is that you can write and run code on the fly. On the other hand, there are no variable declarations and no type checking. This is expected, since Matlab is not a statically typed language: essentially, every element is a matrix. For example, the number 3 is stored as a 1x1 matrix containing the value 3. Because there is no type checking, if you mistakenly put the string '3' into a numeric matrix, you get neither an error nor a warning; Matlab silently converts the character to its character code (51).
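The following sketch, assuming a recent Matlab release or GNU Octave (all variable names are hypothetical), illustrates both points: a scalar is a 1x1 matrix, and a stray '3' is accepted without complaint. The isnumeric check at the end is just one possible safeguard, not a type system.

```matlab
% The scalar 3 is stored as a 1x1 matrix of class double.
a = 3;
sz = size(a);          % [1 1]
cls = class(a);        % 'double'

% Mistakenly storing the string '3' in a numeric vector:
% no error and no warning, the character is converted to its code.
x = [1 2 3];
x(3) = '3';            % x is now [1 2 51]

% Arithmetic on the character also succeeds silently.
z = '3' + 1;           % 52, i.e. the code of '3' plus one

% Concatenating numbers with a character yields a char array instead.
y = [1 2 '3'];         % class(y) is 'char'

% One defensive option: check the class explicitly before using the data.
isOk = isnumeric(y);   % false here, so the mistake can be caught
if ~isOk
    fprintf('Expected numeric data but got class %s\n', class(y));
end
```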

Again, because variables are not declared, the following situation can occur in Matlab:

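myvariable = 0;
% ... other code ...
myVariable = 10*i;   % typo: capital V silently creates a second variable
% ... other code ...
disp(myvariable)     % still displays 0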

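Because variable names are case-sensitive and no declaration is needed, the typo creates a new variable instead of raising an error, and disp prints the old value 0. Mistyping errors like this one are therefore dangerous in Matlab.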

Another issue with Matlab is its execution time, which is quite high compared to C++ or even Java. One solution is the MEX interface, through which you can call C/C++ code directly from Matlab. However, passing data between Matlab and the C code takes time, so the result is generally slower than a standalone C/C++ program.

Even with these limitations, I'm personally convinced that Matlab is a very powerful tool for data miners. The main reason is that you spend less time on programming and more on the problem you are trying to solve. If you think Matlab has other important drawbacks or, on the contrary, that the ones I mentioned are not really drawbacks, feel free to comment.

Note: Data Mining Research is on holiday until August 2.


1 comment:

Will Dwinnell said...

For some time, MATLAB has been saddled with an undeserved reputation for being slow. I'll make the following points:

1. Depends on application: If the code does things which MATLAB is good at (numerical array manipulation, for example), then MATLAB tends to perform well, sometimes even outpacing hand-built code in other "faster" languages. See, for instance, the answer given by Big Toe (Mtl) at:

How slow is Matlab?!?

2. Depends on code: Code which takes advantage of MATLAB features such as vectorization will execute faster than code which does not. This can make an enormous difference.

3. Programmer time, readability and maintainability count, too. In a field like data mining, in which operations are performed across entire arrays of data, MATLAB can be easier to write, understand and modify. The lines-of-code ratio between many procedural and OO languages and MATLAB must be around 10-to-1. Elimination of loops, alone, will make an enormous difference. Leveraging built-in MATLAB functions will improve programming on all of these counts.

In conclusion, I will say that, for some applications, MATLAB will not be the fastest-executing choice, but that is far from always the case.
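The vectorization point made in this comment can be illustrated with a small sketch, assuming a Matlab release or GNU Octave that provides mean, std, repmat, tic and toc. It standardizes each column of a hypothetical random data matrix (z-scores), once with explicit loops and once with vectorized built-in functions. Actual timings vary with the machine and Matlab version, so only the agreement of the two results is checked.

```matlab
% Hypothetical example data: 1000 observations of 5 attributes.
X = rand(1000, 5);
[n, p] = size(X);

% Loop version: compute each column's mean and standard deviation,
% then standardize element by element.
tic;
Zloop = zeros(n, p);
for j = 1:p
    mu = sum(X(:, j)) / n;
    sd = sqrt(sum((X(:, j) - mu) .^ 2) / (n - 1));
    for k = 1:n
        Zloop(k, j) = (X(k, j) - mu) / sd;
    end
end
tLoop = toc;

% Vectorized version: built-in mean and std work down each column,
% and repmat expands the 1-by-p row vectors to the size of X.
tic;
muRow = mean(X, 1);
sdRow = std(X, 0, 1);   % 0 = normalize by n - 1, 1 = along dimension 1
Zvec = (X - repmat(muRow, n, 1)) ./ repmat(sdRow, n, 1);
tVec = toc;

% Both versions should agree up to floating-point rounding.
maxDiff = max(abs(Zloop(:) - Zvec(:)));
fprintf('loop: %.4f s, vectorized: %.4f s, max difference: %g\n', ...
        tLoop, tVec, maxDiff);
```

The vectorized version replaces two nested loops with three lines built on mean, std and elementwise operators, which is the kind of loop elimination the comment refers to.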

 