## cluster techniques

Topics about the trajectory clustering program for HYSPLIT.
tianshui

### cluster techniques

Which clustering technique does the HYSPLIT model use? The k-means algorithm, self-organizing maps, or some other method?

ariel.stein

### Re: cluster techniques

Description of the clustering process:

Initially, the total spatial variance is zero: each trajectory is defined to be its own cluster, so there are N trajectories and N clusters. For the first iteration, which two clusters (trajectories) are paired? For every possible pair of trajectories, the cluster spatial variance (SPVAR) is calculated. SPVAR is the sum of the squared distances between the endpoints of the cluster's component trajectories and the corresponding endpoints of the cluster-mean trajectory. Then the total spatial variance (TSV), the sum of the SPVAR of all clusters, is calculated. The pair that is combined is the one that gives the smallest increase in TSV. After the first iteration, the number of clusters is N-1. Once paired, clusters always stay together.
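For the first iteration the cost of a candidate pair has a simple closed form. Assuming D is an ordinary Euclidean distance and the cluster mean is taken endpoint by endpoint (the symbols below are introduced only for this illustration), merging two single trajectories a and b with endpoints x_{a,k} and x_{b,k}, k = 1, …, K, places each mean endpoint m_k at the midpoint, so each trajectory sits half the separation away from it:

$$
\mathrm{SPVAR}_{ab} = \sum_{k=1}^{K}\left( \lVert \mathbf{x}_{a,k}-\mathbf{m}_k \rVert^2 + \lVert \mathbf{x}_{b,k}-\mathbf{m}_k \rVert^2 \right) = \frac{1}{2}\sum_{k=1}^{K} \lVert \mathbf{x}_{a,k}-\mathbf{x}_{b,k} \rVert^2, \qquad \mathbf{m}_k = \tfrac{1}{2}\left(\mathbf{x}_{a,k}+\mathbf{x}_{b,k}\right)
$$

Because every other cluster is still a single trajectory with SPVAR = 0, this value is the entire increase in TSV, so the first merge joins the two trajectories with the smallest summed squared endpoint separation.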

D = distance between a trajectory endpoint and the corresponding cluster-mean endpoint

SPVAR = SUM(all trajectories in cluster) [SUM(all trajectory endpoints) {D*D} ]

TSV = SUM(all SPVAR)
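The two definitions above can be sketched in Python as follows. This is a minimal illustration, not the HYSPLIT cluster program: it assumes each endpoint is a planar (x, y) position (for example, km on a map projection) so that D is a Euclidean distance, and that all trajectories have the same number of endpoints. The helper names `mean_trajectory`, `spvar`, and `tsv` are hypothetical; HYSPLIT's own distance calculation may differ.

```python
from typing import List, Sequence, Tuple

# Assumption: each endpoint is a planar (x, y) position, e.g. km on a map
# projection, so D is an ordinary Euclidean distance.
Point = Tuple[float, float]
Trajectory = Sequence[Point]  # endpoints in time order; all trajectories same length


def mean_trajectory(cluster: Sequence[Trajectory]) -> List[Point]:
    """Cluster-mean trajectory: average each endpoint across the cluster's members."""
    n = len(cluster)
    return [
        (sum(t[k][0] for t in cluster) / n, sum(t[k][1] for t in cluster) / n)
        for k in range(len(cluster[0]))
    ]


def spvar(cluster: Sequence[Trajectory]) -> float:
    """SPVAR = sum over trajectories of (sum over endpoints of D*D)."""
    mean = mean_trajectory(cluster)
    total = 0.0
    for traj in cluster:
        for (x, y), (mx, my) in zip(traj, mean):
            total += (x - mx) ** 2 + (y - my) ** 2  # D*D for one endpoint
    return total


def tsv(clusters: Sequence[Sequence[Trajectory]]) -> float:
    """TSV = sum of SPVAR over all clusters."""
    return sum(spvar(c) for c in clusters)


a = [(0.0, 0.0), (2.0, 0.0)]
b = [(0.0, 2.0), (2.0, 2.0)]
c = [(5.0, 5.0), (6.0, 6.0)]
print(spvar([a]))          # 0.0: a single-trajectory cluster has no spread
print(tsv([[a, b], [c]]))  # 4.0: four endpoints, each 1 unit from the mean
```

A cluster holding a single trajectory always has SPVAR = 0, which is why TSV starts at zero before any merging.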

For the second iteration, which two clusters are paired? Each cluster is now either an individual trajectory or the two-trajectory cluster formed in the first iteration. Again every combination is tried, and SPVAR and TSV are calculated for each. The two clusters combined are the ones that give the smallest increase in TSV. The percent change in TSV and the number of clusters (N-2) are written to a file.

The iterations continue until the last two clusters are combined, leaving all N trajectories in a single cluster.
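The iteration described above amounts to agglomerative hierarchical clustering with a minimum-variance merge rule (the same criterion as Ward's method), rather than k-means or self-organizing maps. The Python sketch below is a brute-force illustration under stated assumptions, not the HYSPLIT cluster program itself: endpoints are planar (x, y) points with Euclidean distance, all trajectories have the same number of endpoints, and `cluster_trajectories` and its (clusters, TSV, percent change) record layout are hypothetical. Because TSV starts at zero, the percent change for the first merge is undefined; the sketch records it as infinity.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # assumption: planar (x, y) endpoints
Trajectory = Sequence[Point]
Cluster = List[Trajectory]


def spvar(cluster: Cluster) -> float:
    """Sum of squared distances from each endpoint to the cluster-mean endpoint."""
    n = len(cluster)
    total = 0.0
    for k in range(len(cluster[0])):
        mx = sum(t[k][0] for t in cluster) / n
        my = sum(t[k][1] for t in cluster) / n
        total += sum((t[k][0] - mx) ** 2 + (t[k][1] - my) ** 2 for t in cluster)
    return total


def cluster_trajectories(trajs: Sequence[Trajectory]) -> List[Tuple[int, float, float]]:
    """Merge clusters pairwise until one remains.

    Returns one (number_of_clusters, TSV, percent_change_in_TSV) record per iteration.
    """
    clusters: List[Cluster] = [[t] for t in trajs]  # N trajectories -> N clusters
    current_tsv = 0.0  # every SPVAR is zero at the start
    history = []
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Only clusters i and j change, so the TSV increase is local to them.
            increase = (spvar(clusters[i] + clusters[j])
                        - spvar(clusters[i]) - spvar(clusters[j]))
            if best is None or increase < best[0]:
                best = (increase, i, j)
        increase, i, j = best
        merged = clusters[i] + clusters[j]
        # Merged clusters are never split again.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        if current_tsv > 0:
            pct = 100.0 * increase / current_tsv
        else:
            pct = math.inf if increase > 0 else 0.0  # undefined while TSV is 0
        current_tsv += increase
        history.append((len(clusters), current_tsv, pct))
    return history
```

Each pass tries every pair and recomputes SPVAR from scratch, which keeps the code close to the description but makes it practical only for small trajectory sets.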

In the first several clustering iterations the TSV increases greatly; then, for much of the clustering, it typically increases at a small, roughly constant rate; but at some point it again increases rapidly, indicating that the clusters being combined are not very similar. This later increase suggests where to stop the clustering. It is clearly seen in a plot of percent change in TSV versus number of clusters, where the number of clusters decreases from left to right. The iteration just before the large increase (immediately to its left on the plot) gives the final number of clusters. Typically there are a few such "large" increases.
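The stopping rule can be sketched as a scan over the per-iteration percent change in TSV. This is only an illustration of the heuristic described above, not HYSPLIT's own selection logic: the function name `candidate_cluster_counts`, the `(clusters remaining, percent change)` input layout, the `threshold_pct` cutoff for a "large" increase, and the `max_clusters` limit that skips the early iterations are all hypothetical choices.

```python
from typing import List, Sequence, Tuple


def candidate_cluster_counts(
    history: Sequence[Tuple[int, float]],
    threshold_pct: float,
    max_clusters: int = 20,
) -> List[int]:
    """Suggest final cluster counts from the percent change in TSV.

    history:       (clusters remaining after the merge, percent change in TSV),
                   one entry per iteration (hypothetical layout).
    threshold_pct: hypothetical cutoff for what counts as a "large" increase.
    max_clusters:  hypothetical limit that ignores the early iterations, where
                   the change in TSV is also large but not a useful stopping point.
    """
    candidates = []
    for n_after, pct in history:
        n_before = n_after + 1  # the step just before (left of) this merge
        if n_before <= max_clusters and pct >= threshold_pct:
            candidates.append(n_before)
    return candidates


history = [(5, 2.0), (4, 2.5), (3, 30.0), (2, 5.0), (1, 50.0)]
print(candidate_cluster_counts(history, threshold_pct=20.0))  # [4, 2]
```

The function returns every candidate rather than a single answer, since there are typically a few large increases and choosing among them remains a judgment call.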

tianshui