Data Mining Research - dataminingblog.com: Top 5 Reasons R is Good for you

Wednesday, January 21, 2009

Top 5 Reasons R is Good for you

After reading Ajay's interesting post, I decided to write about the good aspects of R. First, I should state that I use neither SAS nor Clementine, so the following arguments are my opinions as an R programmer:

  • R is easy and free to improve: R contains hundreds of useful packages (data mining, finance, etc.). If this is not enough, you can write your own packages and share them with others. You do not depend on someone else's programmers.

  • R is a white box: Since R is a programming language, it is easy to understand the overall process of the system under development. There is no GUI inviting you to drop in black-box components whose behavior may be unclear.

  • When you know R, you know everything: OK, this is a bit too much. But the message is that it is much easier to start with R and then move to SAS or Clementine than the opposite, especially for users who only use the GUI.

  • R is free: This is very good, since small companies don't have the money to buy SAS or Clementine. Also, if several users need such tools, the price increases. Of course, in a large company, SAS and SPSS tools may be an alternative.

  • R is a good choice: R is as convenient as Matlab (or even more?) and as cheap as Java (which means free). This makes R an excellent choice among existing tools and programming languages.

Here is an article about R from the New York Times. Since the above list is completely subjective, you are invited to give your own opinion by posting a comment.

8 comments:

Steffen said...

I totally agree...

I wondered ... Sandro, can you recommend a good R programming book? Or (more importantly) one on software development with R (S4 ...)? One of the drawbacks of a scripting language like R is the invitation to hack code together...

kind regards,

Steffen

Matthias said...

I agree with you.

R is alive! Lots of functions, methods, docs and tutorials, all totally free!

Unfortunately, R cannot work with matrices larger than the physical memory of the PC. But if you work on "small" datasets (or aggregated data), it's the one.

Nevertheless, it is an excellent companion for a data miner (to look deeply into the data, build amazing graphics or develop personal algorithms).

Thank you, Sandro.

Erik said...

I would like to give what I think is the top reason why R is not used in operational data mining: one of R's main weaknesses is the way data is managed. There is a workspace in memory into which data has to be imported and from which results are then exported. This means that for big datasets, memory issues are frequent.

Remember that the vast majority of operational data mining (by that I mean data mining projects whose results are used operationally on a day-to-day basis) is done in CRM. In this field, we regularly have training data sets with hundreds or thousands of columns and hundreds of thousands of rows, so R is cornered into domains with fewer data volume constraints.

Sandro Saitta said...

@Steffen: I don't know about R books, but I'm sure they exist. I prefer to use tutorials such as Data Mining with R, for example.

@Matthias: I agree that R has some limitations, and maybe in some situations (very big data sets) it is not possible to use R.

@Erik: That's a very good point. In fact, I have the same issue when using R in finance, since I have to load all prices for a given time period and a set of stocks... in my case, this is not feasible under Windows (due to RAM limitations).

Will Dwinnell said...

"R is as convenient as Matlab"

Whoa! Let's not say anything that we can't take back! Heh heh...

Actually, I am curious about scalability. I see that someone else has mentioned that data size is limited to physical RAM, but I wonder more about speed of computation. In my limited experience several years ago with S-Plus (R's commercial cousin), performance on data sets I would consider small was abysmally slow. Can you characterize R's performance on data tables whose sizes are typical of data mining projects?


-Will Dwinnell
Data Mining in MATLAB

Sandro Saitta said...

@Will: Thanks for your comment! What I meant by "R is as convenient as Matlab" was from the programming point of view (I realize the sentence was not clear enough). It is easy to program in R and Matlab (compared to other languages). Of course, this is a very personal point of view.

Regarding R's performance, I have not run any tests so far.

Paolo said...

@Sandro
Regarding the performance issue (and more), the R-help mailing list can be very useful: see, for example, the thread starting here:
http://tolstoy.newcastle.edu.au/R/e6/help/09/01/0138.html

Sandro Saitta said...

@Paolo: Thanks for the link. This is an interesting discussion!

 