Ski Resort Clustering near Montreal

November 18, 2017 by Byron ZHU

massif

Rationale

My third ski season in Montérégie is coming, so this post uses a very basic unsupervised machine-learning clustering algorithm, k-means, to take an interesting look at how to choose your next ski trip.

Algorithm

Quoting the Wikipedia page on k-means clustering:

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

The problem is computationally difficult (NP-hard); however, there are efficient heuristic algorithms that are commonly employed and converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions via an iterative refinement approach employed by both algorithms. Additionally, they both use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.

kmeans
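
The "efficient heuristic" in the quote is most often Lloyd's iteration: assign each observation to its nearest center, move each center to the mean of its members, and repeat until nothing changes. Below is a minimal NumPy sketch of that loop. The function name `lloyd_kmeans`, the random initialization (plain sampling, not k-means++) and the toy points are assumptions made for illustration only.

```python
import numpy as np

def lloyd_kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd iteration: assign to nearest center, then recompute means."""
    rng = np.random.default_rng(seed)
    # Start from k distinct observations picked at random (not k-means++).
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members; keep it if it has none.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # converged to a local optimum
        centers = new_centers
    return labels, centers

# Two well-separated toy blobs (made-up numbers, not the ski data).
toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
labels, centers = lloyd_kmeans(toy, k=2)
print(labels)
print(centers)
```

Which blob ends up with label 0 depends on the random start, which is why cluster labels should be read as arbitrary group ids rather than rankings.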

Data

Collecting data is always essential to the success of an ML model. The goal here is to group the ski resorts near Montreal (more specifically, near my home) into clusters, the machine-learning term for groups of similar items.

The columns are the features that k-means takes into account.

name	distance(km)	altitude(m)	vertical drop(m)	Skiable area(acres)	lifts	night	easy	intermediate	difficult	extreme
st-bruno	3.4	175	134	50	4	true	10	6	2	1
bromont	86.1	565	385	450	9	true	35	54	26	26
owl’s head	139	753	540	163	8	false	15	17	9	12
orford	122	850	589	245	8	false	21	16	8	17
sutton	114	962	460	230	9	false	15	18	11	16
st-sauveur	91.3	416	213	142	7	true	9	9	16	6
olympia	92.8	440	200	80	3	true	14	10	6	7
morin heights	101	465	200	80	4	true	10	10	10	5
mont-blanc	143	580	208	140	7	true	7	12	18	6
mont-tremblant	160	875	645	665	11	false	22	28	32	14
jay peak	136	1209	656	385	9	false	14	31	34	0
la reserve	132	700	305	100	2	false	9	8	12	11
le massif	332	806	770	406	7	false	13	20	19	8
mont saint-anne	283	800	625	547	5	true	15	33	14	9
massif du sud	309	915	400	226	2	false	6	3	14	9
stoneham	262	593	346	333	4	true	8	11	16	7
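
These columns live on very different scales (kilometres, metres, acres, trail counts), and k-means relies on Euclidean distance, so an unscaled column with large values would dominate the result. One common remedy is mean normalization, (x - mean) / (max - min), applied per column. The sketch below shows it on three rows and three columns taken from the table; the function name `mean_normalize` and the guard for constant columns are assumptions added for illustration.

```python
import numpy as np

# distance (km), vertical drop (m), skiable area (acres)
raw = np.array([
    [3.4, 134.0, 50.0],     # st-bruno
    [86.1, 385.0, 450.0],   # bromont
    [160.0, 645.0, 665.0],  # mont-tremblant
])

def mean_normalize(matrix):
    """Column-wise (x - mean) / (max - min)."""
    span = matrix.max(axis=0) - matrix.min(axis=0)
    span[span == 0] = 1.0  # guard (an addition): a constant column maps to 0
    return (matrix - matrix.mean(axis=0)) / span

scaled = mean_normalize(raw)
print(scaled.round(3))
```

After this step every column has mean 0 and a range of exactly 1, so no single feature outweighs the others purely because of its unit.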

Code Snippet

A quick-and-dirty Python script to generate the data needed for the Google Maps API


import io
import json
import numpy as np
from sklearn.cluster import KMeans

columns = (
    'distance','alt','vertical','skiable terrain','lifts','night',
    'easy','intermediate','difficult','extreme')

def parse_ski(stations):

    for col in columns:
        if col == 'night': # doesn't make sense to scale boolean value
            for station in stations:
                station[col] = int(station[col])
        else:
            # mean normalization: (x - mean) / (max - min)
            buf = [station[col] for station in stations]
            minval = min(buf)
            maxval = max(buf)
            mean = sum(buf)/len(buf)
            for station in stations:
                station[col] = (station[col] - mean)/(maxval-minval)

    return io.StringIO('\n'.join(
        (','.join((str(station.get(col)) for col in columns))
        for station in stations)
    ))

# GPS location not shown in the table
# Placeholder: replace the string with your raw JSON (a list of station objects)
stations = json.loads('your json raw data')
matrix_fp = np.loadtxt(parse_ski(stations), delimiter=',')
colors = {
    k:v
    for k, v in enumerate(('#4E4EB2','#FF5600','#66CC46','#99A695','#0001FF',))}

for num in (2, 3, 4, 5):
    print('clusters numbers: ', num)
    km = KMeans(num, init='k-means++').fit(matrix_fp)
    locations = {
        station['name']: {'center': station['gps'], 'color': colors[lbl], 'group': lbl}
        for station, lbl in zip(stations, km.labels_)
    }
    print(locations)
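
`print(locations)` emits a Python dict representation (single quotes, NumPy integers for `group`), which is not valid JSON and cannot be pasted into JavaScript as-is. A small sketch of one way to serialize it instead, using only the standard library, follows. The helper name `to_citymap_json` and the single-resort input are hypothetical; the input only mirrors the shape of `locations`.

```python
import json

def to_citymap_json(locations):
    """Serialize a {name: {'center', 'color', 'group'}} dict for use in JavaScript."""
    plain = {
        name: {
            'center': info['center'],
            'color': info['color'],
            'group': int(info['group']),  # NumPy integers are not JSON serializable
        }
        for name, info in locations.items()
    }
    return json.dumps(plain, indent=2)

# Hypothetical single-resort input in the same shape as `locations`.
example = {
    'st-bruno': {'center': {'lat': 45.558709, 'lng': -73.336873},
                 'color': '#4E4EB2', 'group': 0},
}
print(to_citymap_json(example))
```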

For each `locations` map, we can visualize the result in Google Maps using the official demo example.

// Draws one circle per ski resort; the fill color encodes its cluster.

// Using the locations generated by the Python code
var citymap = {
  "jay peak": {"color": "#14CCC8", "center": {"lng": -72.5071207, "lat": 44.9376778}}, "Stoneham": {"color": "#FFDF43", "center": {"lng": -71.3978895, "lat": 47.0303657}}, "massif du sud": {"color": "#14CCC8", "center": {"lng": -70.4917626, "lat": 46.6213833}}, "st-bruno": {"color": "#FFDF43", "center": {"lng": -73.336873, "lat": 45.558709}}, "le massif": {"color": "#14CCC8", "center": {"lng": -70.59809, "lat": 47.2820407}}, "la reserve": {"color": "#14CCC8", "center": {"lng": -74.183668, "lat": 46.286398}}, "orford": {"color": "#14CCC8", "center": {"lng": -72.223443, "lat": 45.3176101}}, "olympia": {"color": "#FFDF43", "center": {"lng": -74.1552723, "lat": 45.9004148}}, "owl's head": {"color": "#14CCC8", "center": {"lng": -72.2977126, "lat": 45.0753163}}, "morin heights": {"color": "#FFDF43", "center": {"lng": -74.270762, "lat": 45.899502}}, "sutton": {"color": "#14CCC8", "center": {"lng": -72.564034, "lat": 45.104728}}, "st-sauveur": {"color": "#FFDF43", "center": {"lng": -74.1598336, "lat": 45.8815953}}, "mont-blanc": {"color": "#FFDF43", "center": {"lng": -74.4849394, "lat": 46.1090299}}, "mont-tremblant": {"color": "#B25B9F", "center": {"lng": -74.732755, "lat": 46.1756729}}, "mont saint-anne": {"color": "#B25B9F", "center": {"lng": -70.9409543, "lat": 47.0864416}}, "bromont": {"color": "#B25B9F", "center": {"lng": -72.6543549, "lat": 45.2909317}}
};

function initMap() {
  // Create the map.
  var map = new google.maps.Map(document.getElementById('map'), {
    zoom: 8,
    center: {'lat': 46.1587401,'lng': -71.0195173},
    mapTypeId: 'terrain'
  });

  // Construct the circle for each value in citymap.
  // Note: every circle has the same fixed radius; only the fill color varies.
  for (var city in citymap) {
    // Add the circle for this city to the map.
    var cityCircle = new google.maps.Circle({
      strokeColor: '#AABBAA',
      strokeOpacity: 0.5,
      strokeWeight: 1.5,
      fillColor: citymap[city].color,
      fillOpacity: 0.9,
      map: map,
      center: citymap[city].center,
      radius: 3500
    });
  }
}

Result

Before any analysis, we might expect the best number of clusters to be 3 or 4, for these reasons:

The primary difference is the size of the mountain: vertical drop and skiable area. This largely determines the number of trails.
The distance from home is another important factor.
A minor factor is whether the resort is open at night.

Cluster 2

2clusters

Cluster 3

Note that the initialization has a big impact on the output. This is a perfect example showing two outcomes; interestingly, I think both of them make sense.

The first result has Bromont as a single cluster, which reminds me of how some of AlphaGo’s moves are labelled as Go Seigen style. It’s the biggest resort in the area with night operating hours.

3clusters

The second puts Mont-Tremblant, Bromont and Saint-Anne together. All three are very tourist-oriented and commercially successful.

ski
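
To see how much the starting centers matter, one option is to run k-means many times with a single initialization each and count the distinct partitions that come out. The sketch below does this for k = 3. It is only an approximation of the analysis above: it uses just three columns from the table (distance, vertical drop, skiable area), the seeds 0 to 19 are arbitrary, and the `partition` helper is hypothetical, so it is not expected to reproduce the exact maps shown here.

```python
import numpy as np
from sklearn.cluster import KMeans

# distance (km), vertical drop (m), skiable area (acres), in table order
raw = np.array([
    [3.4, 134, 50], [86.1, 385, 450], [139, 540, 163], [122, 589, 245],
    [114, 460, 230], [91.3, 213, 142], [92.8, 200, 80], [101, 200, 80],
    [143, 208, 140], [160, 645, 665], [136, 656, 385], [132, 305, 100],
    [332, 770, 406], [283, 625, 547], [309, 400, 226], [262, 346, 333],
])
# Mean normalization, as in the script: (x - mean) / (max - min) per column.
scaled = (raw - raw.mean(axis=0)) / (raw.max(axis=0) - raw.min(axis=0))

def partition(labels):
    """Label-independent view of a clustering: a set of sets of row indices."""
    groups = {}
    for idx, lbl in enumerate(labels):
        groups.setdefault(int(lbl), []).append(idx)
    return frozenset(frozenset(members) for members in groups.values())

outcomes = {}
for seed in range(20):
    # n_init=1 keeps a single initialization per run, so differences show up.
    km = KMeans(n_clusters=3, init='k-means++', n_init=1, random_state=seed)
    km.fit(scaled)
    outcomes.setdefault(partition(km.labels_), []).append(round(km.inertia_, 4))

for part, inertias in outcomes.items():
    print(len(inertias), 'runs, inertia', inertias[0],
          sorted(sorted(group) for group in part))
```

If more than one partition is printed, the one with the lowest inertia is the tightest fit in the least-squares sense, although, as with the two outcomes above, a higher-inertia grouping can still be a sensible reading of the data.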

Cluster 4

4clusters

Cluster 5

5clusters