Data Mining Research - dataminingblog.com: Java data mining

Monday, October 23, 2006

Java data mining

Are you interested in data mining? Yes... you are reading this blog. Do you program in Java? If so, the book Java Data Mining may interest you. It is briefly described by KDnuggets. In my view, Java is perhaps not the best language for data mining. If you work in industry and need a fast application, you will most likely use C++ or .NET. If you do research and need more interactivity and simpler coding, you will probably use MATLAB, for example. Java is neither as fast as C++ nor as easy to use as MATLAB. Perhaps this book explains why you should use Java for data mining. By the way, a good book on data mining (with Java examples) is Data Mining: Practical Machine Learning Tools and Techniques.

8 comments:

Anonymous said...

There are two freely available open-source data mining suites implemented in Java: WEKA and YALE. YALE comes with an easy-to-use graphical user interface (GUI), but it can also be used from the command line or as a library from your own programs. YALE provides more than 400 data mining operators and fully integrates WEKA. For further details, see http://yale.sf.net/

For more open-source software for data mining, you may want to check the corresponding lists at Wikipedia or KDnuggets.

Best regards,
Ralf

Sandro Saitta said...

Hi Ralf,

Thanks for the comment. I knew WEKA but not YALE. I think WEKA is well known in the data mining community. As a researcher, I prefer to work with Matlab. However, I'm quite sure that WEKA and YALE are more useful to data mining practitioners.

Anonymous said...

Hi Sandro,

WEKA is better known because it is the older project and, so far, also the more widespread one. As far as I know, WEKA started sometime around 1998, while YALE started in 2001. However, YALE is by now far more comprehensive than WEKA in terms of the flexibility of the experimental setup and the number of available operators.

As for how widely WEKA and YALE are used, YALE is catching up quickly. This month, YALE has already had more than 16,000 downloads, as counted by the SourceForge.net download statistics (see the YALE download statistics).

WEKA and YALE are both used in academia for research and teaching, as well as in industry for research, development, and applications. I know that many researchers have a personal preference for R or MatLab, especially those with a background in mathematics, statistics, or physics. Nonetheless, there are also many researchers who prefer WEKA and YALE for their work. Anyway, we live in a free world, and everybody should be free to choose their favorite tool(s).

WEKA and YALE try to address both researchers and practitioners. By the way, as you can tell from my home page, I am also a researcher. ;^)

Best regards,
Ralf

Sandro Saitta said...

As you have perhaps seen, I have added a post about our discussion.

Anonymous said...

Hi Sandro,

yes, I saw your post about our discussion. Thanks for your blog entry about YALE.

The WEKA project started even earlier than I had thought, in 1993. So WEKA started 13 years ago.

The YALE project started in 2001, i.e. only about 5 years ago.

The R project, with its R programming language, is another widely used open-source data mining tool with a large and active user community. There is also an alternative graphical user interface (GUI) for R called Rattle.
However, unlike WEKA and YALE, R is not implemented in Java, so it may be a little off topic here (for a blog entry on Java data mining).
Most of the user-visible functions in R are written in R, an interpreted language. For efficiency, users can interface to procedures written in C, C++, or FORTRAN.

Best regards,
Ralf

Anonymous said...

Besides Rattle, there are many other GUIs for R that integrate the R programming language.

Sandro Saitta said...

Ralf,

Thanks for all this information. I'm sure it can help people choose (or change?) the programming language they use for data mining.

Will Dwinnell said...

I'd just like to say, as a practitioner, that MATLAB is my tool of choice for statistical and data mining work. Although I do research on my own, I am not an academic. For the past 4 years, I've been working for a bank.
