Thursday, October 1, 2015

Data Mining by Clustering

Clustering is one of the most widely used techniques in data mining. In the ARGOS project, its objective is to help analyze existing data by dividing it into a small number of clusters (groups), such that the events within a cluster are very similar to each other and very dissimilar to the events in the other clusters.
The clustering process yields several important results, two of which stand out:

• The reduction of a large number of events to a smaller number of events that retain almost all of the useful and relevant information.

• The highlighting, on the one hand, of regular and recurring events, which represent the frequent and “normal” events, and, on the other hand, of relatively rare events, which represent the infrequent and possibly “abnormal” events.

Each cluster is described by the attributes that contributed most to the grouping of its events. This leads to the discovery of the profiles of the “normal” events as well as the “abnormal” ones.
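As a rough illustration of how cluster sizes and profiles can be read off once events have been clustered, the sketch below summarizes each cluster by its size, its share of all events, and the most frequent value of each attribute. The function name `cluster_profiles`, the tuple encoding of events, and the use of the most frequent value as a stand-in for "most important attributes" are assumptions made for this example, not the ARGOS implementation.

```python
from collections import Counter, defaultdict


def cluster_profiles(events, labels):
    """Summarize each cluster by size, share of all events, and modal attribute values.

    events: list of equal-length tuples of categorical attribute values (assumed encoding).
    labels: one cluster label per event, as produced by any clustering method.
    """
    groups = defaultdict(list)
    for event, label in zip(events, labels):
        groups[label].append(event)

    profiles = {}
    for label, members in groups.items():
        # For each attribute (column), keep its most frequent value in the cluster.
        modal_values = tuple(
            Counter(column).most_common(1)[0][0] for column in zip(*members)
        )
        profiles[label] = {
            "size": len(members),
            "share": len(members) / len(events),
            "profile": modal_values,
        }
    return profiles
```

Under this reading, a cluster with a large share would be a candidate "normal" profile, while a cluster with a small share would point to infrequent events that may deserve a closer look.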

The clustering method we use in the ARGOS project is based on “Relational Analysis theory” [2]. This method has significant advantages over the k-means method (the most widely used clustering method), including:

• No need to fix the number of clusters arbitrarily in advance.

• No need to fix the clusters’ initial centroids (as in the k-means method).
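To make these two points concrete, here is a minimal sketch of a greedy heuristic in the spirit of relational analysis for categorical data. It assumes that the link between two events is the number of attributes on which they agree minus the number on which they differ, and that an event joins the existing cluster with which its total link is largest and positive, or else starts a new cluster. The function names and the single-pass strategy are assumptions for illustration; this is not the method as implemented in ARGOS, and the result of such a pass can depend on the order of the events.

```python
from typing import List, Sequence


def agreement_link(a: Sequence, b: Sequence) -> int:
    """Attributes on which two events agree minus attributes on which they differ."""
    agree = sum(1 for x, y in zip(a, b) if x == y)
    return agree - (len(a) - agree)


def greedy_relational_clustering(events: List[Sequence]) -> List[int]:
    """Assign each event to a cluster in one pass; no cluster count or centroid is supplied."""
    clusters: List[List[int]] = []  # each cluster holds indices into `events`
    labels: List[int] = []
    for i, event in enumerate(events):
        best_cluster, best_link = None, 0
        for c, members in enumerate(clusters):
            # Total link between the new event and every member of the cluster.
            link = sum(agreement_link(event, events[j]) for j in members)
            if link > best_link:
                best_cluster, best_link = c, link
        if best_cluster is None:
            # No existing cluster has a positive total link: open a new one.
            clusters.append([i])
            labels.append(len(clusters) - 1)
        else:
            clusters[best_cluster].append(i)
            labels.append(best_cluster)
    return labels
```

The number of clusters emerges from the data: a new cluster is created only when an event disagrees, on balance, with every cluster found so far.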

This method has been applied in several fields (insurance, banking, video recordings, marketing, etc.). An example of its use in video recordings can be found in [1].

By Dr. Hamid Benhadda and Mikael Griffoulieres  -  Thales Services


[1] H. Benhadda, J.L. Patino, E. Corvee, F. Bremond, and M. Thonnat. “Data mining on large video recordings”. Colloque V.S.S.T.2007 : Veille Stratégique Scientifique & Technologique, 21-25 Octobre, Marrakech, 2007.

[2] Mustapha Lebbah, Younes Bennani and Hamid Benhadda. “Relational Analysis for Consensus Clustering from Multiple Partitions”. Machine Learning and Applications. ICMLA 2008: Seventh International Conference. pp. 218-223. San Diego, California, December 11-13, 2008.

