Data Mining Research


Monday, August 20, 2007

MLDM 2007: A brief overview

This is my last post about the MLDM 2007 conference in Leipzig. As mentioned in an earlier post, several different topics were covered at this meeting. In my opinion, no trendy topic (SVM, ANN, GA, etc.) crowded out the other methods. Below is a list of interesting papers.

"Kernel MDL to Determine the Number of Clusters" by Kyrgyzov et al., which combines Minimum Description Length (MDL) with kernel K-means to estimate the number of clusters.
"A Case-Based Approach to Anomaly Intrusion Detection" by Micarelli et al., a work that combines Case-Based Reasoning (CBR) and clustering for intrusion detection.
"Comparing state-of-the-art collaborative filtering systems" by Candillier et al. (selected for best paper award), which compares different collaborative filtering methods, points out their advantages and drawbacks, and proposes some basic options to consider when using a particular technique.

The best paper award went to "Affine Feature Extraction: A Generalization of the Fukunaga-Koontz Transformation", written by Wenbo Cao and Robert Haralick. Note that R. Haralick also gave an additional (very strange) presentation about data mining and religion. As you can imagine, this presentation sparked a lot of discussion and controversy. Maybe some risky subjects should not be presented...


Thursday, August 16, 2007

MLDM 2007: Anil K. Jain's presentation on clustering

As mentioned in the previous post, Anil K. Jain was the invited speaker of MLDM 2007. He gave an interesting presentation about clustering, focusing on the user's dilemma. He started with a comprehensive introduction to clustering and then showed some of the ongoing and future work he is involved in: semi-supervised clustering and clustering with co-association. Below is the abstract of his presentation:

Data clustering is a long standing research problem in pattern recognition, computer vision, machine learning, and data mining with applications in a number of diverse disciplines. The goal is to partition a set of n d-dimensional points into k clusters, where k may or may not be known. Most clustering techniques require the definition of a similarity measure between patterns, which is not easy to specify in the absence of any prior knowledge about cluster shapes. While a large number of clustering algorithms exist, there is no optimal algorithm. Each clustering algorithm imposes a specific structure on the data and has its own approach for estimating the number of clusters. No single algorithm can adequately handle various cluster shapes and structures that are encountered in practice. Instead of spending our effort in devising yet another clustering algorithm, there is a need to build upon the existing published techniques. In this talk we will address the following problems: (i) clustering via evidence accumulation, (ii) simultaneous clustering and dimensionality reduction, (iii) clustering under pair-wise constraints, and (iv) clustering with relevance feedback. Experimental results show that these approaches are promising in identifying arbitrary shaped clusters in multidimensional data.
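
To make item (i) of the abstract a little more concrete, here is a minimal Python sketch of the evidence accumulation idea as I understand it: several base clusterings of the same points are summarized in a co-association matrix (the fraction of clusterings in which two points end up together), and a final partition is read off that matrix. The function names, the toy partitions and the 0.5 threshold are my own assumptions for illustration; they are not taken from the talk.

```python
def co_association(partitions):
    """Fraction of partitions in which each pair of points shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def consensus_clusters(coassoc, threshold=0.5):
    """Link points whose co-association exceeds the threshold and
    return the connected components as cluster labels."""
    n = len(coassoc)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and coassoc[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three hypothetical base partitions of six points.
# Label values are arbitrary; only "same label or not" matters.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 2, 2],
    [2, 2, 2, 0, 0, 0],
]
coassoc = co_association(partitions)
labels = consensus_clusters(coassoc)
print(labels)
```

The second base partition disagrees with the other two about points 2 and 3, but since every pair is judged by majority evidence, the consensus still groups the first three points together and the last three together.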

He made some interesting remarks during his talk. I have noted three of them:

K-means was invented in 1955, 1957, 1965 and 1967 (!)
In a good feature space, any simple clustering algorithm will work
A clustering method is not the same as a clustering algorithm (an algorithm is an implementation of a particular method)
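
The second remark can be illustrated with a small Python sketch. The example is mine, not from the talk: two concentric rings of points, clustered with a plain K-means (Lloyd iterations with a deterministic start, a simplification I chose for reproducibility), once on the raw (x, y) coordinates and once on a single hand-picked feature, the distance to the origin. All names and the toy data are assumptions for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd iterations, starting from k evenly spaced input points."""
    n = len(points)
    centroids = [points[i * (n - 1) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * n
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: mean of each cluster (an empty cluster keeps its centroid)
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def ring(radius, count=12):
    """Points evenly spaced on a circle around the origin."""
    return [(radius * math.cos(2 * math.pi * i / count),
             radius * math.sin(2 * math.pi * i / count)) for i in range(count)]


def agreement(a, b):
    """Share of points on which two 2-cluster labelings agree, up to label swap."""
    same = sum(x == y for x, y in zip(a, b))
    return max(same, len(a) - same) / len(a)


points = ring(1.0) + ring(5.0)
truth = [0] * 12 + [1] * 12  # ring membership

raw_labels = kmeans(points, 2)
radius_features = [(math.hypot(x, y),) for x, y in points]
feature_labels = kmeans(radius_features, 2)

print("raw coordinates:", agreement(raw_labels, truth))
print("radius feature: ", agreement(feature_labels, truth))
```

On the raw coordinates, two centroids can only split the plane in half, so the rings cannot be recovered. On the radius feature the two rings become two well-separated groups on a line, and the same simple algorithm separates them.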

If interested, you can find more information related to his work.


Thursday, August 09, 2007

MLDM 2007: Clustering in Leipzig

I recently came back from the Machine Learning and Data Mining (MLDM) conference in Leipzig, Germany. It was an interesting meeting covering a variety of subjects. Apart from the usual subjects such as classification (SVM, etc.), feature selection and clustering, many papers were dedicated to applications of data mining.

Examples of application domains are:

Intrusion detection
Marketing data
Image mining
Medical and biological data mining
Text and document mining
Spam, newsgroups and blogs

The conference ran as a single session over three days. Compared with huge conferences with parallel sessions, the advantage is that more people attend your presentation. I was there to present my work on cluster validity. The most interesting presentation, in my opinion, was the invited talk given by Anil K. Jain about data clustering (no doubt because I work on clustering myself). In the next post, I will point out some of his conclusions and recommendations for clustering.


Tuesday, May 01, 2007

Machine Learning and Data Mining (MLDM'2007)

Every two years since 1999, the IBAI institute in Germany has organized the International Conference on Machine Learning and Data Mining (MLDM). Well, even in Europe we have data mining related conferences :-) Many subjects are covered, and application papers are also encouraged in fields such as multimedia, biomedicine and web mining.

I will go to MLDM this year to present an article about cluster validity. I hope to see some of you there. One of the keynote speakers is Anil K. Jain, a pioneer in the field of clustering. He is giving a presentation entitled "Data Clustering: User’s Dilemma". I'm sure this will be very interesting, and I will give you some feedback after the conference.