
k-means clustering

To reduce noise by clustering data, we can use the simple k-means clustering algorithm.
The idea of k-means is quite simple. Here are the steps of the k-means algorithm:

1. Randomly pick k samples (one for each group we want to cluster) as the initial references.
2. Compute the distance between each data point and each reference.
3. Assign each data point to the reference with the shortest distance.
4. Compute the centroid of each group and use it as the new reference.
5. Repeat steps 2-4 until the centroids no longer change from the previous result.
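The steps above can be sketched in a few lines of pure Python (an illustrative sketch only; the actual implementation below is in MATLAB, and the function name `kmeans` here is just a placeholder):

```python
import random
import math

def kmeans(points, k, seed=0):
    """Cluster `points` (lists of equal-length float lists) into k groups."""
    rng = random.Random(seed)
    # Step 1: randomly pick k samples as the initial references.
    cen = [list(p) for p in rng.sample(points, k)]
    while True:
        # Steps 2-3: assign each point to its nearest reference.
        labels = []
        for p in points:
            dists = [math.dist(p, c) for c in cen]
            labels.append(dists.index(min(dists)))
        # Step 4: recompute each centroid as the mean of its group.
        new_cen = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_cen.append([sum(v) / len(members) for v in zip(*members)])
            else:
                new_cen.append(cen[j])  # keep old centroid if a group is empty
        # Step 5: stop when the centroids no longer change.
        if new_cen == cen:
            return labels, cen
        cen = new_cen
```

For two well-separated groups, any random initialization converges to the obvious grouping in a couple of iterations.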

Here is the MATLAB code:
=======================================
% An example for k-means clustering
%
% Renfong 2018/12/19
%
% create test data
% There are 3 types of signal:
% rows 1-20, 36-70, 91-100 are group 1
% rows 21-35 are group 2
% rows 71-90 are group 3
xx=0:1:1024;
sp=zeros(100,length(xx));  % preallocate the spectra

cen=[120, 360, 780];
amp=[100, 60, 55];
sig=[50, 10, 30];

% peak 1+3
for i=1:20
    sp(i,:)=amp(1)*exp((-1*(xx-cen(1)).^2/(2*sig(1)^2))) + ...
        amp(3)*exp((-1*(xx-cen(3)).^2/(2*sig(3)^2))) + 10 * rand(size(xx));
end

% peak 2+3 
for i=21:35
    sp(i,:)=amp(2)*exp((-1*(xx-cen(2)).^2/(2*sig(2)^2))) + ...
        amp(3)*exp((-1*(xx-cen(3)).^2/(2*sig(3)^2))) + 10 * rand(size(xx));
end

% peak 1+3
for i=36:70
    sp(i,:)=amp(1)*exp((-1*(xx-cen(1)).^2/(2*sig(1)^2))) + ...
        amp(3)*exp((-1*(xx-cen(3)).^2/(2*sig(3)^2))) + 10 * rand(size(xx));
end

% peak 1+2
for i=71:90
    sp(i,:)=amp(1)*exp((-1*(xx-cen(1)).^2/(2*sig(1)^2))) + ...
        amp(2)*exp((-1*(xx-cen(2)).^2/(2*sig(2)^2))) + 10 * rand(size(xx));
end

% peak 1+3
for i=91:100
    sp(i,:)=amp(1)*exp((-1*(xx-cen(1)).^2/(2*sig(1)^2))) + ...
        amp(3)*exp((-1*(xx-cen(3)).^2/(2*sig(3)^2))) + 10 * rand(size(xx));
end


% k-means parameter
group=3;
[m,n]=size(sp);

% do k-means
distMap=zeros(m,group);
c=randsample(m,group);  % step 1: random initial references
cen=sp(c,:);            % note: 'cen' is reused here as the cluster centroids

h1=figure(1);
set(h1,'Position',[220 378 560 420]);
set(h1,'Color','white');
for i=1:group
    subplot(group,1,i);plot(cen(i,:));
end

temp=cen;
count=1;
while 1
    fprintf('step %i. \n',count);
    for i=1:m
        for j=1:group
            distMap(i,j)=sqrt(sum((sp(i,:)-cen(j,:)).^2));
        end
    end

    [minD,ind]=min(distMap,[],2);

    h1=figure(100+count);
    set(h1,'Position',[220 378 560 420]);
    set(h1,'Color','white');
    for i=1:group
        cen(i,:)=mean(sp(ind==i,:),1);  % new centroid (NaN if a group is empty)
        subplot(group,1,i);
        plot(cen(i,:));
        axis([0,n,0,inf]);
        if i==1
            title(['Iter=',num2str(count,'%03i')]);
        end
    end
    
    h2=figure(200+count);
    set(h2,'Position',[820 378 560 420]);
    set(h2,'Color','white');
    plot(ind);
    title(['Iter=',num2str(count,'%03i')]);
    axis([0,m,0,group+1])
    
    if isequal(temp,cen)  % step 5: stop when the centroids no longer change
        break;
    else
        temp=cen;
        count=count+1;
    end

end

===========
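The core of the loop above is the distance map: distMap(i,j) is the Euclidean distance between spectrum i and centroid j, and the row-wise minimum gives the group assignment. A rough Python equivalent of that step (using tiny hypothetical toy spectra, not the blog's Gaussian test data):

```python
import math

# Toy "spectra": each row is a signal; rows 0-1 share one peak shape,
# rows 2-3 another (hypothetical stand-ins for the Gaussian test data).
sp = [
    [1.0, 9.0, 1.0, 0.0],
    [1.2, 8.8, 0.9, 0.1],
    [0.0, 1.0, 9.0, 1.0],
    [0.1, 0.9, 9.1, 1.1],
]
cen = [sp[0], sp[2]]  # two references, one from each shape

# Mirrors distMap(i,j)=sqrt(sum((sp(i,:)-cen(j,:)).^2)) in the MATLAB code.
distMap = [[math.dist(row, c) for c in cen] for row in sp]

# Mirrors [minD,ind]=min(distMap,[],2): nearest centroid per spectrum.
ind = [row.index(min(row)) for row in distMap]
print(ind)  # prints [0, 0, 1, 1]
```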
The results: (figures omitted: the centroid spectra and the group-assignment plot at each iteration)