### cluster techniques

A Forum for HYSPLIT Dispersion Model Users to Communicate Questions, Problems, and Ideas for Upgrades, etc.

https://hysplitbbs.arl.noaa.gov/

Posted: **November 25th, 2013, 9:52 am**

Which clustering technique does the HYSPLIT model use: the K-means algorithm or self-organizing maps? Or which one is it most like?

Posted: **December 2nd, 2013, 3:05 pm**

Description of clustering process:

Initially, total spatial variance is zero. Each trajectory is defined to be a cluster; in other words, there are N trajectories and N clusters. For the first iteration, which two clusters (trajectories) are paired? For every combination of trajectory pairs, the cluster spatial variance (SPVAR) is calculated. SPVAR is the sum of the squared distances between the endpoints of the cluster's component trajectories and the mean of the trajectories in that cluster. Then the total spatial variance (TSV), the sum of all the cluster spatial variances, is calculated. The pair of clusters combined is the one with the lowest increase in total spatial variance. After the first iteration, the number of clusters is N-1. Once paired, clusters always stay together.

D = distance between a trajectory endpoint and the corresponding cluster-mean endpoint

SPVAR = SUM(all trajectories in cluster) [SUM(all trajectory endpoints) {D*D} ]

TSV = SUM(all SPVAR)

For the second iteration, which two clusters are paired? The clusters are either individual trajectories or the cluster of two trajectories that were initially paired. Again every combination is tried, and the SPVAR and TSV for each are calculated. The two clusters combined are the ones that result in the lowest increase in TSV. The percent change in TSV and the number of clusters (N-2) are written to a file.

The iterations continue until the last two clusters are combined, resulting in N trajectories in one cluster.
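The procedure described above can be sketched in a few lines of Python. This is only an illustration of the description in this post, not the HYSPLIT source code. It assumes each trajectory is a list of (x, y) endpoints of equal length and uses plain Euclidean distance for D, which is a simplification. The function names `spvar` and `cluster_trajectories` are made up for this example.

```python
from itertools import combinations


def spvar(trajs, members):
    """Cluster spatial variance: sum of D*D over all endpoints of all member trajectories.

    D is the (assumed Euclidean) distance between an endpoint and the
    cluster-mean endpoint at the same position along the trajectory.
    """
    n = len(members)
    total = 0.0
    for k in range(len(trajs[members[0]])):
        mean_x = sum(trajs[m][k][0] for m in members) / n
        mean_y = sum(trajs[m][k][1] for m in members) / n
        for m in members:
            total += (trajs[m][k][0] - mean_x) ** 2 + (trajs[m][k][1] - mean_y) ** 2
    return total


def cluster_trajectories(trajs):
    """Merge clusters one pair at a time until a single cluster remains.

    Returns a list of (n_clusters, TSV, clusters) recorded after each merge,
    where each cluster is a list of trajectory indices.
    """
    clusters = [[i] for i in range(len(trajs))]  # N trajectories, N clusters
    history = []
    while len(clusters) > 1:
        best = None
        # Try every pair and keep the one with the lowest increase in TSV.
        for i, j in combinations(range(len(clusters)), 2):
            increase = (spvar(trajs, clusters[i] + clusters[j])
                        - spvar(trajs, clusters[i])
                        - spvar(trajs, clusters[j]))
            if best is None or increase < best[0]:
                best = (increase, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        # Paired clusters stay together from here on.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        tsv = sum(spvar(trajs, c) for c in clusters)
        history.append((len(clusters), tsv, [list(c) for c in clusters]))
    return history


# Four short made-up trajectories: two heading east, two heading west.
trajs = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(0.0, 0.0), (1.0, 0.2)],
    [(0.0, 0.0), (-1.0, 0.0)],
    [(0.0, 0.0), (-1.0, 0.2)],
]
history = cluster_trajectories(trajs)
for n_clusters, tsv, clusters in history:
    print(n_clusters, round(tsv, 4), clusters)
```

With this toy input, the TSV stays small while similar trajectories are merged and jumps when the eastward and westward groups are forced into one cluster.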

In the first several clustering iterations the TSV increases greatly. Then, for much of the clustering, it typically increases at a small, generally constant rate, but at some point it again increases rapidly, indicating that the clusters being combined are not very similar. This latter increase suggests where to stop the clustering and is clearly seen in a plot of percent change in TSV vs. number of clusters, where the number of clusters decreases to the right on the plot. The iterative step just before the large increase in the change of TSV (to its left on the plot) gives the final number of clusters. Typically there are a few "large" increases.
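As a rough illustration of reading that plot programmatically, the sketch below flags the steps just before a large percent change in TSV. Two things here are assumptions and not taken from HYSPLIT: the percent change is computed relative to the TSV of the previous iteration (steps where the previous TSV is zero are skipped), and the `threshold_pct` cutoff for a "large" increase is a hypothetical value that the user would choose. The function names and the sample numbers are also made up.

```python
def percent_changes(tsv_steps):
    """Percent change in TSV between consecutive clustering steps.

    tsv_steps is a list of (n_clusters, tsv) in the order the clustering
    proceeds, so n_clusters decreases from one entry to the next.
    """
    changes = []
    for (n_prev, tsv_prev), (n_cur, tsv_cur) in zip(tsv_steps, tsv_steps[1:]):
        if tsv_prev > 0:  # the very first TSV is zero, so skip that step
            pct = 100.0 * (tsv_cur - tsv_prev) / tsv_prev
            changes.append((n_prev, n_cur, pct))
    return changes


def candidate_cluster_counts(tsv_steps, threshold_pct=30.0):
    """Cluster counts at the step just before each large increase in TSV."""
    return [n_prev
            for n_prev, _, pct in percent_changes(tsv_steps)
            if pct >= threshold_pct]


# Made-up TSV values for the last few merges of a clustering run.
steps = [(5, 1.0), (4, 1.1), (3, 1.2), (2, 2.4), (1, 6.0)]
print(candidate_cluster_counts(steps))
```

Since there are typically a few large increases, the function returns every candidate and leaves the final choice to the user.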

Posted: **December 22nd, 2013, 9:11 am**

Thank you for your answer.

May I ask another question? I use HYSPLIT to calculate trajectories with the GDAS data, which has a mixed layer depth variable (MIXDEPTH). Is it the PBL (planetary boundary layer) height of the GDAS data? If not, what is the difference between them?

Posted: **January 6th, 2014, 2:15 pm**

It should be the same.