Ski Resort Clustering near Montreal

November 18, 2017 by Byron ZHU


Rationale

My third ski season in Montérégie is coming. This post uses a very basic unsupervised machine-learning algorithm, k-means clustering, to offer an interesting angle on how to choose your next ski trip.

Algorithm

Quoted from the Wikipedia article on k-means clustering:

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

The problem is computationally difficult (NP-hard); however, there are efficient heuristic algorithms that are commonly employed and converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions via an iterative refinement approach employed by both algorithms. Additionally, they both use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.
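To make the two alternating steps of that heuristic concrete, here is a minimal NumPy sketch of Lloyd's algorithm. It uses plain random initialization rather than the k-means++ seeding that scikit-learn applies by default, and `lloyd_kmeans` and `nearest_center` are illustrative names, not a library API. The assignment step is what carves the data into the Voronoi cells mentioned above, and the loop stops at a local optimum, which is not guaranteed to be the global one.

```python
import numpy as np


def nearest_center(X, centers):
    """Assignment step: index of the closest center (Euclidean) for each row of X."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd iteration for k-means with random initial centers (a sketch)."""
    rng = np.random.default_rng(seed)
    # start from k distinct observations chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = nearest_center(X, centers)
        # update step: move each center to the mean of its points
        # (an empty cluster keeps its previous center)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # no center moved: a local optimum has been reached
        centers = new_centers
    return nearest_center(X, centers), centers


# toy data: two well-separated groups on a line
X = np.array([[0.0], [0.2], [0.4], [10.0], [10.2], [10.4]])
labels, centers = lloyd_kmeans(X, k=2)
print(labels, np.sort(centers.ravel()))
```

Running it with different `seed` values changes the starting centers; on well-separated toy data like this every start ends in the same partition, but on real data different starts can settle in different local optima.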

[Figure: k-means illustration]

Data

Collecting data is always essential to the success of a machine-learning model. The goal here is to group various ski resorts near Montreal, or more specifically near my home, into different clusters.

Each column below, apart from the resort name, is a feature used by k-means.

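| Name | Distance (km) | Altitude (m) | Vertical drop (m) | Skiable area (acres) | Lifts | Night skiing | Easy | Intermediate | Difficult | Extreme |
|---|---|---|---|---|---|---|---|---|---|---|
| st-bruno | 3.4 | 175 | 134 | 50 | 4 | true | 10 | 6 | 2 | 1 |
| bromont | 86.1 | 565 | 385 | 450 | 9 | true | 35 | 54 | 26 | 26 |
| owl’s head | 139 | 753 | 540 | 163 | 8 | false | 15 | 17 | 9 | 12 |
| orford | 122 | 850 | 589 | 245 | 8 | false | 21 | 16 | 8 | 17 |
| sutton | 114 | 962 | 460 | 230 | 9 | false | 15 | 18 | 11 | 16 |
| st-sauveur | 91.3 | 416 | 213 | 142 | 7 | true | 9 | 9 | 16 | 6 |
| olympia | 92.8 | 440 | 200 | 80 | 3 | true | 14 | 10 | 6 | 7 |
| morin heights | 101 | 465 | 200 | 80 | 4 | true | 10 | 10 | 10 | 5 |
| mont-blanc | 143 | 580 | 208 | 140 | 7 | true | 7 | 12 | 18 | 6 |
| mont-tremblant | 160 | 875 | 645 | 665 | 11 | false | 22 | 28 | 32 | 14 |
| jay peak | 136 | 1209 | 656 | 385 | 9 | false | 14 | 31 | 34 | 0 |
| la reserve | 132 | 700 | 305 | 100 | 2 | false | 9 | 8 | 12 | 11 |
| le massif | 332 | 806 | 770 | 406 | 7 | false | 13 | 20 | 19 | 8 |
| mont saint-anne | 283 | 800 | 625 | 547 | 5 | true | 15 | 33 | 14 | 9 |
| massif du sud | 309 | 915 | 400 | 226 | 2 | false | 6 | 3 | 14 | 9 |
| stoneham | 262 | 593 | 346 | 333 | 4 | true | 8 | 11 | 16 | 7 |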
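The table can also be typed in directly as Python records, which is handy when the original JSON is not at hand. A minimal sketch, assuming the key names of the `columns` tuple used in the script below; the `gps` coordinates that the script also reads are not in the table, so they are missing from these records.

```python
# Feature keys, matching the `columns` tuple of the clustering script
COLUMNS = ('distance', 'alt', 'vertical', 'skiable terrain', 'lifts', 'night',
           'easy', 'intermediate', 'difficult', 'extreme')

# One tuple per table row: name followed by the values in COLUMNS order
ROWS = [
    ('st-bruno', 3.4, 175, 134, 50, 4, True, 10, 6, 2, 1),
    ('bromont', 86.1, 565, 385, 450, 9, True, 35, 54, 26, 26),
    ("owl's head", 139, 753, 540, 163, 8, False, 15, 17, 9, 12),
    ('orford', 122, 850, 589, 245, 8, False, 21, 16, 8, 17),
    ('sutton', 114, 962, 460, 230, 9, False, 15, 18, 11, 16),
    ('st-sauveur', 91.3, 416, 213, 142, 7, True, 9, 9, 16, 6),
    ('olympia', 92.8, 440, 200, 80, 3, True, 14, 10, 6, 7),
    ('morin heights', 101, 465, 200, 80, 4, True, 10, 10, 10, 5),
    ('mont-blanc', 143, 580, 208, 140, 7, True, 7, 12, 18, 6),
    ('mont-tremblant', 160, 875, 645, 665, 11, False, 22, 28, 32, 14),
    ('jay peak', 136, 1209, 656, 385, 9, False, 14, 31, 34, 0),
    ('la reserve', 132, 700, 305, 100, 2, False, 9, 8, 12, 11),
    ('le massif', 332, 806, 770, 406, 7, False, 13, 20, 19, 8),
    ('mont saint-anne', 283, 800, 625, 547, 5, True, 15, 33, 14, 9),
    ('massif du sud', 309, 915, 400, 226, 2, False, 6, 3, 14, 9),
    ('stoneham', 262, 593, 346, 333, 4, True, 8, 11, 16, 7),
]

# list of dicts, the shape json.loads would produce for the raw data
stations = [dict(zip(('name',) + COLUMNS, row)) for row in ROWS]
print(len(stations), stations[0])
```

Adding a `gps` dict such as `{'lat': ..., 'lng': ...}` to each record would make the list a drop-in replacement for the JSON data in the script.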

Code Snippet

A quick-and-dirty Python script that runs k-means for 2 to 5 clusters and generates the data needed by the Google Maps JavaScript API:


import io
import json

import numpy as np
from sklearn.cluster import KMeans

# features fed to k-means; the name and GPS coordinates are not features
columns = (
    'distance', 'alt', 'vertical', 'skiable terrain', 'lifts', 'night',
    'easy', 'intermediate', 'difficult', 'extreme')


def parse_ski(stations):
    """Scale the feature columns in place and return them as CSV text."""
    for col in columns:
        if col == 'night':  # doesn't make sense to scale a boolean value
            for station in stations:
                station[col] = int(station[col])  # True/False -> 1/0
        else:
            # mean normalization: (x - mean) / (max - min)
            buf = [station[col] for station in stations]
            minval = min(buf)
            maxval = max(buf)
            mean = sum(buf) / len(buf)
            for station in stations:
                station[col] = (station[col] - mean) / (maxval - minval)

    return io.StringIO('\n'.join(
        ','.join(str(station.get(col)) for col in columns)
        for station in stations
    ))


# One record per resort with the table's columns plus a 'gps' entry
# ({'lat': ..., 'lng': ...}); the GPS locations are not shown in the table
stations = json.loads('your json raw data')
matrix_fp = np.loadtxt(parse_ski(stations), delimiter=',')

# one marker color per cluster label (enough for up to 5 clusters)
colors = ('#4E4EB2', '#FF5600', '#66CC46', '#99A695', '#0001FF')

for num in (2, 3, 4, 5):
    print('number of clusters:', num)
    km = KMeans(n_clusters=num, init='k-means++').fit(matrix_fp)
    locations = {
        station['name']: {
            'center': station['gps'],
            'color': colors[lbl],
            'group': int(lbl),  # numpy int -> plain int so json can serialize it
        }
        for station, lbl in zip(stations, km.labels_)
    }
    # JSON output can be pasted into the citymap variable of the JavaScript below
    print(json.dumps(locations))
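parse_ski applies mean normalization to every column except the night flag. Without it, columns with large numeric ranges such as distance (km) or altitude (m) would dominate the Euclidean distances that k-means compares, and small-range columns such as lifts would barely count. Worked on the lifts column of the table (16 resorts, 99 lifts in total, minimum 2, maximum 11):

```latex
x' = \frac{x - \bar{x}}{\max(x) - \min(x)}, \qquad
\bar{x}_{\text{lifts}} = \frac{99}{16} = 6.1875, \qquad
\max - \min = 11 - 2 = 9

\text{Bromont (9 lifts):} \quad x' = \frac{9 - 6.1875}{9} = 0.3125
```

After this transformation every scaled column has mean 0 and spans exactly 1, so no single feature outweighs the others just because of its units.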

Each locations map can then be visualized in Google Maps by adapting the official circle example from the Google Maps JavaScript API documentation:

// This example draws one circle per ski resort on the map, filled with the
// color of the cluster the resort belongs to.

// citymap holds the locations generated by the Python code
var citymap = {
  "jay peak": {"color": "#14CCC8", "center": {"lng": -72.5071207, "lat": 44.9376778}},
  "Stoneham": {"color": "#FFDF43", "center": {"lng": -71.3978895, "lat": 47.0303657}},
  "massif du sud": {"color": "#14CCC8", "center": {"lng": -70.4917626, "lat": 46.6213833}},
  "st-bruno": {"color": "#FFDF43", "center": {"lng": -73.336873, "lat": 45.558709}},
  "le massif": {"color": "#14CCC8", "center": {"lng": -70.59809, "lat": 47.2820407}},
  "la reserve": {"color": "#14CCC8", "center": {"lng": -74.183668, "lat": 46.286398}},
  "orford": {"color": "#14CCC8", "center": {"lng": -72.223443, "lat": 45.3176101}},
  "olympia": {"color": "#FFDF43", "center": {"lng": -74.1552723, "lat": 45.9004148}},
  "owl's head": {"color": "#14CCC8", "center": {"lng": -72.2977126, "lat": 45.0753163}},
  "morin heights": {"color": "#FFDF43", "center": {"lng": -74.270762, "lat": 45.899502}},
  "sutton": {"color": "#14CCC8", "center": {"lng": -72.564034, "lat": 45.104728}},
  "st-sauveur": {"color": "#FFDF43", "center": {"lng": -74.1598336, "lat": 45.8815953}},
  "mont-blanc": {"color": "#FFDF43", "center": {"lng": -74.4849394, "lat": 46.1090299}},
  "mont-tremblant": {"color": "#B25B9F", "center": {"lng": -74.732755, "lat": 46.1756729}},
  "mont saint-anne": {"color": "#B25B9F", "center": {"lng": -70.9409543, "lat": 47.0864416}},
  "bromont": {"color": "#B25B9F", "center": {"lng": -72.6543549, "lat": 45.2909317}}
};

function initMap() {
  // Create the map.
  var map = new google.maps.Map(document.getElementById('map'), {
    zoom: 8,
    center: {'lat': 46.1587401,'lng': -71.0195173},
    mapTypeId: 'terrain'
  });

  // Construct the circle for each resort in citymap.
  // Note: every circle has the same fixed radius; only the fill color encodes the cluster.
  for (var city in citymap) {
    // Add the circle for this city to the map.
    var cityCircle = new google.maps.Circle({
      strokeColor: '#AABBAA',
      strokeOpacity: 0.5,
      strokeWeight: 1.5,
      fillColor: citymap[city].color,
      fillOpacity: 0.9,
      map: map,
      center: citymap[city].center,
      radius: 3500
    });
  }
}
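Rather than pasting the printed dictionary into the page by hand, the Python side could emit the citymap declaration itself. A sketch under assumptions: `to_citymap_js` is a hypothetical helper, it keeps only the `color` and `center` keys that the circle code reads, and saving the string into the page's script is left out.

```python
import json


def to_citymap_js(locations):
    """Render {name: {'center': ..., 'color': ..., ...}} as a JS citymap declaration."""
    # keep only the keys the circle-drawing code reads
    citymap = {
        name: {'color': info['color'], 'center': info['center']}
        for name, info in locations.items()
    }
    # a JSON object is also a valid JavaScript object literal
    return 'var citymap = ' + json.dumps(citymap, indent=2) + ';\n'


# one entry in the shape printed by the clustering script
example = {
    'st-bruno': {'center': {'lng': -73.336873, 'lat': 45.558709},
                 'color': '#FFDF43', 'group': 0},
}
js = to_citymap_js(example)
print(js)
```

Calling it once per value of k inside the clustering loop would give one ready-to-use map script per cluster count.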

Result

Before any analysis, my guess was that the best number of clusters would be 3 or 4:

  • The primary difference is the size of the mountain: vertical drop and skiable area, which largely determine the number of trails.
  • The distance from home is another important factor.
  • A minor factor is whether the resort opens at night.
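One way to test this intuition, which the original analysis does not use, is the silhouette score: for each point it compares the mean distance to its own cluster with the mean distance to the nearest other cluster, and the average over all points lies between -1 and 1 (higher means better separated clusters). A sketch with scikit-learn's `silhouette_score`, re-entering the table's numeric columns and applying the same mean normalization as parse_ski; `n_init=10` and `random_state=0` are assumptions, not settings from the original run, and no scores are reported here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Numeric columns of the table: distance, altitude, vertical drop, skiable area,
# lifts, night (1 = yes), easy, intermediate, difficult, extreme
RAW = np.array([
    [3.4, 175, 134, 50, 4, 1, 10, 6, 2, 1],         # st-bruno
    [86.1, 565, 385, 450, 9, 1, 35, 54, 26, 26],    # bromont
    [139, 753, 540, 163, 8, 0, 15, 17, 9, 12],      # owl's head
    [122, 850, 589, 245, 8, 0, 21, 16, 8, 17],      # orford
    [114, 962, 460, 230, 9, 0, 15, 18, 11, 16],     # sutton
    [91.3, 416, 213, 142, 7, 1, 9, 9, 16, 6],       # st-sauveur
    [92.8, 440, 200, 80, 3, 1, 14, 10, 6, 7],       # olympia
    [101, 465, 200, 80, 4, 1, 10, 10, 10, 5],       # morin heights
    [143, 580, 208, 140, 7, 1, 7, 12, 18, 6],       # mont-blanc
    [160, 875, 645, 665, 11, 0, 22, 28, 32, 14],    # mont-tremblant
    [136, 1209, 656, 385, 9, 0, 14, 31, 34, 0],     # jay peak
    [132, 700, 305, 100, 2, 0, 9, 8, 12, 11],       # la reserve
    [332, 806, 770, 406, 7, 0, 13, 20, 19, 8],      # le massif
    [283, 800, 625, 547, 5, 1, 15, 33, 14, 9],      # mont saint-anne
    [309, 915, 400, 226, 2, 0, 6, 3, 14, 9],        # massif du sud
    [262, 593, 346, 333, 4, 1, 8, 11, 16, 7],       # stoneham
], dtype=float)
NIGHT = 5  # index of the boolean night column, which is not rescaled


def mean_normalize(data):
    """Rescale each column to (x - mean) / (max - min), except the night flag."""
    out = data.copy()
    for j in range(out.shape[1]):
        if j != NIGHT:
            col = data[:, j]
            out[:, j] = (col - col.mean()) / (col.max() - col.min())
    return out


X = mean_normalize(RAW)
scores = {}
for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, init='k-means++', n_init=10,
                    random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
    print(k, round(scores[k], 3))
```

With only 16 resorts the differences between k values can be small, so the score is best read as a second opinion next to the maps rather than a verdict.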

Cluster 2

[Figure: 2 clusters]

Cluster 3

Note that the initialization has a big impact on the output. This is a perfect example: two runs gave two different outcomes, and interestingly, I think both of them make sense.

The first result puts Bromont in a cluster of its own, which reminds me of how some of AlphaGo’s moves were labelled as Go Seigen style. It’s the biggest resort in the area that operates at night.

[Figure: 3 clusters]

The second puts Mont-Tremblant, Bromont and Saint-Anne together. All three are very tourist-oriented and commercially successful.
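Because the outcome depends on the starting centers, one way to see how many distinct answers a dataset admits is to run k-means many times with a single initialization each and compare the resulting partitions. A sketch with scikit-learn; `canonical` and `distinct_partitions` are hypothetical helpers, and the random matrix only stands in for the script's matrix_fp. For a single reproducible answer, scikit-learn's `n_init` keeps the run with the lowest inertia out of several, and a fixed `random_state` makes it repeatable.

```python
import numpy as np
from sklearn.cluster import KMeans


def canonical(labels):
    """Relabel clusters by order of first appearance so equal partitions compare equal."""
    mapping = {}
    return tuple(mapping.setdefault(lbl, len(mapping)) for lbl in labels)


def distinct_partitions(X, k, seeds):
    """Run k-means once per seed (n_init=1) and collect the distinct partitions."""
    found = {}
    for seed in seeds:
        km = KMeans(n_clusters=k, init='k-means++', n_init=1,
                    random_state=seed).fit(X)
        part = canonical(km.labels_)
        # remember the lowest inertia (within-cluster sum of squares) for this partition
        found[part] = min(found.get(part, np.inf), km.inertia_)
    return found


# placeholder data: 16 random points with 10 features, standing in for matrix_fp
X = np.random.default_rng(0).random((16, 10))
results = distinct_partitions(X, 3, range(20))
for part, inertia in sorted(results.items(), key=lambda kv: kv[1]):
    print(round(inertia, 3), part)
```

Passing matrix_fp with k = 3 instead of the placeholder would list every partition the 20 seeds reach, which is one way to check how often each of the two outcomes above appears and which one has the lower inertia.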

Cluster 4

[Figure: 4 clusters]

Cluster 5

[Figure: 5 clusters]

© 2018 | 朱曉清 | powered by Hugo