Kmeans

算法学习-Kmeans聚类

farthest_first_traversal

1
2
3
4
5
6
7
8
input Data(a set of points), k

initial_center = a single random point in Data
while |centers| < k:
data_point = the point farthest from the centers
centers.add(data_point)

return centers

Objective function:

1
2
3
4
5
\text{Distortion}(\text{Data}, \text{Centers})
=
\frac{1}{n}
\sum_{i=1}^{n}
d(x_i, \text{Centers})

compute the squared error distortion

K-means聚类

1
2
3
4
5
6
7
8
9
10
input data,k
randomly select k points in data as initial center


compute the distance of each point to each center
label the point by the nearest center
then get the inital cluster
compute the gravity center by the current cluster

repeat the process by num_iters

EM-example by coins

1
2
3
4
5
6
7
8
9
10
there are two biased coins, with probability of head A,B.

if you know the hidden_vector, you can initialize the parameter
sequence1, 0.9 1
sequence2, 0.4 0
sequence3, 0.8. 1

A = 0.85, B = 0.4

if you don't know, you can guess the parameter, and then

UPGMA

留着之后再做