@jbarnoud
Last active September 28, 2015 09:48
Here is a notebook that explores K-means in the context of PBxplore.
@alexdb27

@jbarnoud, About clustering reproducibility

I am not sure I understood everything you asked. I carried out the clustering of the same 270 PB sequences from an MD trajectory 100 times, and we can observe that the succession of clusters along the trajectory is not always the same. The figure is difficult to analyze; I will try to come up with something more quantitative and more readable.

-> OK, I've cleared up our own confusion point. You cannot take cluster i in run S(t) and check whether it matches cluster i in run S(t+1). What you need to do is: (a) build a confusion table based only on the data assigned to each cluster, i.e. count the number of data points found in each pair of clusters from run S(t) and run S(t+1); (b) take the max of each row (or column), which gives you the correspondence between a cluster of S(t) and a cluster of S(t+1); (c) sum it all up and you have the true agreement. Do that run after run, and you will see whether the clustering is reproducible.
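
As a rough sketch of that procedure (assuming each run simply returns one integer cluster label per frame, which is not necessarily how PBxplore exposes its results), the agreement between two runs could be computed like this:

```python
import numpy as np

def clustering_agreement(labels_a, labels_b, n_clusters):
    """Fraction of frames on which two clustering runs agree, up to relabelling."""
    labels_a = np.asarray(labels_a)
    labels_b = np.asarray(labels_b)
    # confusion[i, j]: number of frames in cluster i of run A and cluster j of run B
    confusion = np.zeros((n_clusters, n_clusters), dtype=int)
    for i, j in zip(labels_a, labels_b):
        confusion[i, j] += 1
    # for each cluster of run A, keep its best-matching cluster of run B, then sum
    matched = confusion.max(axis=1).sum()
    return matched / len(labels_a)

# Two runs on 10 frames with 3 clusters: the same partition, relabelled
run_t  = [0, 0, 1, 1, 2, 2, 0, 1, 2, 0]
run_t1 = [2, 2, 0, 0, 1, 1, 2, 0, 1, 2]
print(clustering_agreement(run_t, run_t1, n_clusters=3))  # -> 1.0
```

An agreement of 1.0 means the two runs produced the same partition, just with the cluster labels permuted.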

@alexdb27

@jbarnoud, On testing the method

I would like to test the pertinence of the clustering based on structure similarity within the clusters. What do you usually use to compute GDT TS and TM-scores?

It is mainly RMSD. GDT TS and TM-score will not be very sensitive for such highly similar structures. :-)
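
For reference, a bare-bones RMSD after optimal superposition can be done with plain NumPy (Kabsch algorithm); this is only a sketch assuming two (N, 3) coordinate arrays for the same atoms in the same order:

```python
import numpy as np

def kabsch_rmsd(coords_p, coords_q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    # centre both structures on their geometric centre
    p = coords_p - coords_p.mean(axis=0)
    q = coords_q - coords_q.mean(axis=0)
    # Kabsch: SVD of the covariance matrix gives the optimal rotation
    u, _, vt = np.linalg.svd(p.T @ q)
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T  # avoid reflections
    diff = p @ rotation.T - q
    return np.sqrt((diff ** 2).sum() / len(p))
```

Averaging this over all pairs of frames within a cluster would give one number per cluster to compare.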

@pierrepo

Impressive indeed and very nice.
About scipy, I was just wondering if the built-in k-means clustering implemented in scipy would be easier to use or quicker.
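
What I have in mind is roughly this (assuming the PB sequences are first turned into numeric feature vectors, since scipy's k-means only knows about points in Euclidean space):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Hypothetical input: one numeric feature vector per frame of the trajectory,
# e.g. some encoding of the 270 PB sequences.
rng = np.random.default_rng(0)
data = rng.normal(size=(270, 16))

# kmeans2 returns the centroids and one cluster label per row of `data`
centroids, labels = kmeans2(data, 4, minit='points')
print(labels[:10])
```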

@HubLot

HubLot commented Sep 28, 2015

Great job Jonathan!
RMSD could be a nice measure for the different clusters. The issue with a regular MD run (the 270 sequences you tested) is knowing the right number of clusters. Maybe '4' is not a good one, hence the reproducibility is hard to assess.
The issue, I think, with the built-in k-means is that it is really difficult to use a custom distance metric and a custom representation of the centroids.
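
To illustrate that point, here is a very rough sketch (not the PBxplore implementation) of a k-means loop that accepts an arbitrary distance function and an arbitrary centroid builder, so it can work directly on PB strings; the Hamming distance and per-position consensus below are only placeholders:

```python
import random

def kmeans(items, k, distance, make_centroid, n_iter=100):
    """Toy k-means parameterized by a distance function and a centroid builder.

    `distance(a, b)` returns a dissimilarity between two items and
    `make_centroid(members)` builds a representative item from a list of
    cluster members, so the loop can run directly on PB sequences.
    """
    centroids = random.sample(items, k)
    labels = None
    for _ in range(n_iter):
        # assignment step: attach each item to its closest centroid
        new_labels = [
            min(range(k), key=lambda c: distance(item, centroids[c]))
            for item in items
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # update step: rebuild each centroid from its members
        for c in range(k):
            members = [item for item, lab in zip(items, labels) if lab == c]
            if members:
                centroids[c] = make_centroid(members)
    return labels, centroids

def hamming(seq_a, seq_b):
    """Number of positions where two PB sequences differ (placeholder metric)."""
    return sum(a != b for a, b in zip(seq_a, seq_b))

def consensus(members):
    """Most frequent block at each position (placeholder centroid)."""
    return ''.join(max(set(column), key=column.count)
                   for column in (list(c) for c in zip(*members)))

# labels, centroids = kmeans(sequences, 4, distance=hamming, make_centroid=consensus)
```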
