Data Mining Research - Why not to use Wikipedia as a reference

Friday, August 31, 2007

Why not to use Wikipedia as a reference

I recently had an interesting discussion with my director about references. The goal of references is twofold. The writer can refer to a text positively, for example when he uses an existing algorithm. A reference can also be used negatively, when the writer wants to highlight a gap in the literature and thus justify the originality of his work. As already mentioned in a former post, some books and articles refer to Wikipedia. That's where things go bad...

Wikipedia is a huge database, an open encyclopedia (i.e. anybody can contribute to it). Its main advantage is the tremendous quantity of articles in any domain, which makes it a good source of quick information. But there are two main drawbacks. First, anybody can modify it. Some people may stop me here and argue that articles are reviewed by the community. For references, however, the bigger problem is the second drawback: the content changes over time! And this is really bad...

When writing a data mining article, people usually refer to journals or books. The implicit assumption is that the content is fixed and will not evolve over time. If a writer refers today to a data mining algorithm on Wikipedia, he has no guarantee that the description will be the same in two, three or ten years. Reliable references must i) remain unchanged over time and ii) be easily accessible. Following these ideas, writers should cite sources in this order of priority:

  • Journal articles or books
  • Theses (if in English)
  • Technical reports
  • Conference proceedings
  • Websites (but they should be avoided)

I think Wikipedia is definitely not a good source to cite. What is your opinion? Is Wikipedia a reliable source for references?

[Thanks to Prof. Ian Smith, for fruitful discussions about this topic]

Anonymous said...

You can use to give you a snapshot of a webpage at a point in time. For instance, here's wikipedia's page on data mining as of June 29, 2007:

David Gerard said...

... or, as we recommend, link to the particular version in the history! - the version as of June 29, which is actually as of June 27.
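Linking to a particular version can be sketched in a few lines of Python. This is only an illustration: it assumes the usual `index.php?title=...&oldid=...` form of Wikipedia permanent links, the helper name `wikipedia_permalink` is made up for this example, and the revision id shown is a placeholder, not the actual June 2007 revision of the data mining page.

```python
from urllib.parse import urlencode


def wikipedia_permalink(title: str, oldid: int, lang: str = "en") -> str:
    """Build a link to one fixed revision of a Wikipedia article.

    `oldid` is the revision id shown in the article's history page.
    """
    # Wikipedia page titles use underscores instead of spaces.
    query = urlencode({"title": title.replace(" ", "_"), "oldid": oldid})
    return f"https://{lang}.wikipedia.org/w/index.php?{query}"


# 123456789 is a placeholder revision id (an assumption for illustration).
print(wikipedia_permalink("Data mining", 123456789))
```

A link built this way keeps pointing at the same text even after the article is edited, although it still depends on the address scheme itself staying stable.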

Dean Abbott said...

I'll use Wikipedia for ideas, but I also realize that it is not necessarily vetted. I still prefer trusted authors and publications for reliable information. That said, I must say too that most of the time, I find the content related to data mining pretty good on Wikipedia.

Sandro Saitta said...

I think the same problem may happen with since it may no longer exist in a few years.

Regarding the particular version history, although it is a solution, web addresses may change.

I agree that the above-mentioned drawbacks happen only in the worst cases. But the Web is constantly evolving. However, Wikipedia is definitely a good source of inspiration for data mining and related algorithms.

Will Dwinnell said...

I am reluctant to use anything on the Internet as a reference, unless it is something which has already been published elsewhere.

The fundamental nature of the Internet is that it makes sharing of information extremely easy.

This is beneficial to the extent that barriers, such as the economics of book and article publishing, are removed. Yay! Now that guy who's an expert on dragonflies in some small town in Wyoming (and who isn't associated with a university) has a mechanism for sharing his knowledge.

This is detrimental to the extent that the Internet also removes the filtering effect provided by more traditional publishing processes. Ugh. Now every idiot who's got some crackpot theory has a soap-box to stand upon.

Quality of material on-line, including Wikipedia, is certainly mixed.

Will Dwinnell said...

Here's one perspective on this subject:

The Faith-Based Encyclopedia

Sandro Saitta said...

Here are other examples from

Datashaping said...

Wikipedia has a history of arbitrary censorship, even on subjects such as data mining or statistics. They censor on a very large scale: more than 50% of the best references are blacklisted on Wikipedia.

Sandro Saitta said...

Thanks for the information, it is always good to know.

Shilpa said...

Can anybody help me in getting the code related to "Semantic annotation applied to Frequent Pattern Mining"?