Multidimensional Scaling (MDS) with R

This page shows how to do Multidimensional Scaling (MDS) with R. It demonstrates MDS with an example: an automatic layout of eight Australian cities computed from the distances between them. The layout obtained with MDS is very close to the cities' locations on a map.
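For reference, here is a short summary of classical MDS (also known as principal coordinates analysis), the method computed by the R function cmdscale(). Given distances d_ij between n objects, it squares them, double-centres the result, and takes the top k eigenvectors scaled by the square roots of their eigenvalues:

```latex
\begin{aligned}
J &= I_n - \tfrac{1}{n}\,\mathbf{1}\mathbf{1}^{\top} && \text{(centring matrix)} \\
B &= -\tfrac{1}{2}\, J\, D^{(2)} J, \quad D^{(2)} = \big[\, d_{ij}^{2} \,\big] && \text{(double centring)} \\
B &= V \Lambda V^{\top}, \quad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n && \text{(eigendecomposition)} \\
X &= V_k \Lambda_k^{1/2} \in \mathbb{R}^{n \times k} && \text{(coordinates)}
\end{aligned}
```

Here V_k holds the first k eigenvectors and Lambda_k the corresponding eigenvalues. If the d_ij were exact Euclidean distances, B would equal X X^T for the centred configuration; for other distances some eigenvalues can be negative, and the k-dimensional configuration is an approximation.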

First, the distances between 8 cities in Australia are loaded from http://rosetta.reltech.org/TC/v15/Mapping/data/dist-Aus.csv.

dist.au <- read.csv("http://rosetta.reltech.org/TC/v15/Mapping/data/dist-Aus.csv")

Alternatively, we can download the file first and then read it into R from a local drive.

dist.au <- read.csv("dist-Aus.csv")
dist.au
##    X    A   AS    B    D    H    M    P    S
## 1  A    0 1328 1600 2616 1161  653 2130 1161
## 2 AS 1328    0 1962 1289 2463 1889 1991 2026
## 3  B 1600 1962    0 2846 1788 1374 3604  732
## 4  D 2616 1289 2846    0 3734 3146 2652 3146
## 5  H 1161 2463 1788 3734    0  598 3008 1057
## 6  M  653 1889 1374 3146  598    0 2720  713
## 7  P 2130 1991 3604 2652 3008 2720    0 3288
## 8  S 1161 2026  732 3146 1057  713 3288    0
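Classical MDS expects a square, symmetric matrix of non-negative distances with zeros on the diagonal. A quick sanity check is sketched below; it uses a hand-typed copy of the table above so that it runs on its own, and the names cities and d are only for this illustration.

```r
# Hand-typed copy of the distance table printed above (city acronyms as names)
cities <- c("A", "AS", "B", "D", "H", "M", "P", "S")
d <- matrix(c(
     0, 1328, 1600, 2616, 1161,  653, 2130, 1161,
  1328,    0, 1962, 1289, 2463, 1889, 1991, 2026,
  1600, 1962,    0, 2846, 1788, 1374, 3604,  732,
  2616, 1289, 2846,    0, 3734, 3146, 2652, 3146,
  1161, 2463, 1788, 3734,    0,  598, 3008, 1057,
   653, 1889, 1374, 3146,  598,    0, 2720,  713,
  2130, 1991, 3604, 2652, 3008, 2720,    0, 3288,
  1161, 2026,  732, 3146, 1057,  713, 3288,    0),
  nrow = 8, byrow = TRUE, dimnames = list(cities, cities))

# Basic checks that the table is a valid distance matrix for MDS
stopifnot(
  nrow(d) == ncol(d),   # square
  isSymmetric(d),       # d[i, j] equals d[j, i]
  all(diag(d) == 0),    # each city is at distance 0 from itself
  all(d >= 0)           # no negative distances
)
```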

Then we use the first column, which holds the city acronyms, as row names and remove it from the data.

row.names(dist.au) <- dist.au[, 1]  # use the city acronyms as row names
dist.au <- dist.au[, -1]            # drop the acronym column
dist.au
##       A   AS    B    D    H    M    P    S
## A     0 1328 1600 2616 1161  653 2130 1161
## AS 1328    0 1962 1289 2463 1889 1991 2026
## B  1600 1962    0 2846 1788 1374 3604  732
## D  2616 1289 2846    0 3734 3146 2652 3146
## H  1161 2463 1788 3734    0  598 3008 1057
## M   653 1889 1374 3146  598    0 2720  713
## P  2130 1991 3604 2652 3008 2720    0 3288
## S  1161 2026  732 3146 1057  713 3288    0
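The same result can also be obtained in one step, since read.csv() accepts row.names = 1 to take the row names from the first column. The sketch below writes a small stand-in CSV (csv_file and dist2 are illustrative names, and the file is not the original dist-Aus.csv) and reads it back that way.

```r
# Stand-in for dist-Aus.csv: same layout, first column holds the city acronyms
csv_file <- tempfile(fileext = ".csv")
writeLines(c(
  ",A,AS,B,D,H,M,P,S",
  "A,0,1328,1600,2616,1161,653,2130,1161",
  "AS,1328,0,1962,1289,2463,1889,1991,2026",
  "B,1600,1962,0,2846,1788,1374,3604,732",
  "D,2616,1289,2846,0,3734,3146,2652,3146",
  "H,1161,2463,1788,3734,0,598,3008,1057",
  "M,653,1889,1374,3146,598,0,2720,713",
  "P,2130,1991,3604,2652,3008,2720,0,3288",
  "S,1161,2026,732,3146,1057,713,3288,0"
), csv_file)

# row.names = 1 turns the first column into row names while reading
dist2 <- read.csv(csv_file, row.names = 1)
rownames(dist2)   # "A" "AS" "B" "D" "H" "M" "P" "S"
dist2["A", "M"]   # 653
```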

After that, we run classical MDS with the function cmdscale(), asking for k = 2 dimensions (eig = TRUE also returns the eigenvalues), and take the x and y coordinates from the returned points.

fit <- cmdscale(dist.au, eig = TRUE, k = 2)  # classical MDS in 2 dimensions
x <- fit$points[, 1]  # first MDS coordinate
y <- fit$points[, 2]  # second MDS coordinate
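To make the computation concrete, here is a minimal from-scratch version of classical MDS, checked against cmdscale(). It is a sketch for illustration: the helper classical_mds() is hypothetical, not part of R, and the distance table is hand-typed so the snippet runs on its own. Because eigenvectors are only defined up to sign, the two results are compared through their pairwise distances rather than coordinate by coordinate.

```r
# Hand-typed copy of the distance table above
cities <- c("A", "AS", "B", "D", "H", "M", "P", "S")
d <- matrix(c(
     0, 1328, 1600, 2616, 1161,  653, 2130, 1161,
  1328,    0, 1962, 1289, 2463, 1889, 1991, 2026,
  1600, 1962,    0, 2846, 1788, 1374, 3604,  732,
  2616, 1289, 2846,    0, 3734, 3146, 2652, 3146,
  1161, 2463, 1788, 3734,    0,  598, 3008, 1057,
   653, 1889, 1374, 3146,  598,    0, 2720,  713,
  2130, 1991, 3604, 2652, 3008, 2720,    0, 3288,
  1161, 2026,  732, 3146, 1057,  713, 3288,    0),
  nrow = 8, byrow = TRUE, dimnames = list(cities, cities))

# Hypothetical helper: classical MDS via double centring and eigendecomposition
classical_mds <- function(d, k = 2) {
  n <- nrow(d)
  J <- diag(n) - matrix(1 / n, n, n)       # centring matrix
  B <- -0.5 * J %*% (d^2) %*% J            # double-centred squared distances
  e <- eigen(B, symmetric = TRUE)          # eigenvalues in decreasing order
  X <- e$vectors[, 1:k] %*% diag(sqrt(e$values[1:k]))
  rownames(X) <- rownames(d)
  X
}

X_hand <- classical_mds(d, k = 2)
X_cmd  <- cmdscale(d, k = 2)

# Column signs may differ, but the pairwise distances should agree
same <- isTRUE(all.equal(as.vector(dist(X_hand)), as.vector(dist(X_cmd))))
same
```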

Then we visualise the result. The positions of the cities are very close to their relative locations on a map.

plot(x, y, pch = 19, xlim = range(x) + c(0, 600))  # extra room on the right for labels
city.names <- c("Adelaide", "Alice Springs", "Brisbane", "Darwin", "Hobart", 
    "Melbourne", "Perth", "Sydney")
text(x, y, pos = 4, labels = city.names)

 

[Figure mds1: the eight cities plotted at their MDS coordinates]
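How close is "very close"? Below is a sketch of two quick checks on the same hand-typed table. With eig = TRUE, cmdscale() also returns GOF, the share of the eigenvalue sum captured by the k dimensions (two variants, described in ?cmdscale). The distances between the fitted points can also be compared directly with the input distances; the names d_fit, low and r are illustrative.

```r
# Hand-typed copy of the distance table above
cities <- c("A", "AS", "B", "D", "H", "M", "P", "S")
d <- matrix(c(
     0, 1328, 1600, 2616, 1161,  653, 2130, 1161,
  1328,    0, 1962, 1289, 2463, 1889, 1991, 2026,
  1600, 1962,    0, 2846, 1788, 1374, 3604,  732,
  2616, 1289, 2846,    0, 3734, 3146, 2652, 3146,
  1161, 2463, 1788, 3734,    0,  598, 3008, 1057,
   653, 1889, 1374, 3146,  598,    0, 2720,  713,
  2130, 1991, 3604, 2652, 3008, 2720,    0, 3288,
  1161, 2026,  732, 3146, 1057,  713, 3288,    0),
  nrow = 8, byrow = TRUE, dimnames = list(cities, cities))

fit <- cmdscale(d, eig = TRUE, k = 2)

# Goodness of fit of the 2-D solution (two variants)
fit$GOF

# Distances between the fitted points versus the input distances
d_fit <- as.matrix(dist(fit$points))
low <- lower.tri(d)                      # each pair of cities once
max(abs(d_fit[low] - d[low]))            # largest absolute discrepancy
r <- cor(d[low], d_fit[low])             # agreement between the two sets
r
```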

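An MDS solution is only determined up to rotation, reflection and translation, because these leave all pairwise distances unchanged. We can therefore flip both the x- and y-axis, which moves Darwin and Brisbane to the top (north) and makes the plot easier to compare with a map.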

x <- -x  # flip the x-axis
y <- -y  # flip the y-axis
plot(x, y, pch = 19, xlim = range(x) + c(0, 600))
text(x, y, pos = 4, labels = city.names)

 

[Figure mds2: MDS layout after flipping both axes]
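The flip is safe because pairwise distances, the only thing MDS tries to reproduce, do not change under reflection or rotation. A short check on the same hand-typed table; the 30-degree angle is an arbitrary choice and same_dist() is a hypothetical helper.

```r
# Hand-typed copy of the distance table above
cities <- c("A", "AS", "B", "D", "H", "M", "P", "S")
d <- matrix(c(
     0, 1328, 1600, 2616, 1161,  653, 2130, 1161,
  1328,    0, 1962, 1289, 2463, 1889, 1991, 2026,
  1600, 1962,    0, 2846, 1788, 1374, 3604,  732,
  2616, 1289, 2846,    0, 3734, 3146, 2652, 3146,
  1161, 2463, 1788, 3734,    0,  598, 3008, 1057,
   653, 1889, 1374, 3146,  598,    0, 2720,  713,
  2130, 1991, 3604, 2652, 3008, 2720,    0, 3288,
  1161, 2026,  732, 3146, 1057,  713, 3288,    0),
  nrow = 8, byrow = TRUE, dimnames = list(cities, cities))

X <- cmdscale(d, k = 2)

# Flipping both axes, as done above with x <- -x and y <- -y
X_flip <- -X

# A rotation by 30 degrees (orthogonal 2 x 2 matrix)
theta <- pi / 6
R <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), nrow = 2)
X_rot <- X %*% R

# Hypothetical helper: do two configurations have the same pairwise distances?
same_dist <- function(a, b) isTRUE(all.equal(as.vector(dist(a)), as.vector(dist(b))))
same_dist(X, X_flip)  # TRUE
same_dist(X, X_rot)   # TRUE
```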

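MDS is also implemented in the igraph package as layout_with_mds() (named layout.mds() in older versions of igraph).

library(igraph)
g <- make_full_graph(nrow(dist.au))  # complete graph, one vertex per city (formerly graph.full())
V(g)$label <- city.names
layout <- layout_with_mds(g, dist = as.matrix(dist.au))
plot(g, layout = layout, vertex.size = 3)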

[Figure mds3: MDS layout of the cities produced with igraph]

 

About Yanchang Zhao

I am a data scientist, using R for data mining applications. My work on R and data mining: RDataMining.com; Twitter; Group on Linkedin; and Group on Google.

Responses to Multidimensional Scaling (MDS) with R

  3. Bob Muenchen says:

    Nice example! I did something similar long ago using cities in the US (done in SAS). I told it to just use the rankings of distances since I was preparing for a social science example. It gave an almost identical result, which made me think that there was a bug in PROC MDS that let it still use the actual mileage. So I ranked it before passing it into MDS and, of course, the result was the same. Then I thought about all the constraints these many comparisons made, NY City is closer to Miami than San Francisco, etc. (but not by how much since it’s just a rank) and I realized how it could do it. I was using airline mileage and when I asked for a third dimension, sure enough, it showed NY City and San Francisco were “close” and Topeka was far away (center of the country, so higher in elevation). Fun stuff!
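This observation is what non-metric MDS is built on: it uses only the rank order of the distances. As a sketch with MASS::isoMDS() (Kruskal's non-metric MDS; MASS is a recommended package shipped with R), fitting the raw distances and their ranks from the same starting configuration should give the same solution, because isoMDS() only uses the order of the distances. The variable names are illustrative.

```r
library(MASS)  # isoMDS(): Kruskal's non-metric MDS

# Hand-typed copy of the Australian distance table from the post
cities <- c("A", "AS", "B", "D", "H", "M", "P", "S")
d <- matrix(c(
     0, 1328, 1600, 2616, 1161,  653, 2130, 1161,
  1328,    0, 1962, 1289, 2463, 1889, 1991, 2026,
  1600, 1962,    0, 2846, 1788, 1374, 3604,  732,
  2616, 1289, 2846,    0, 3734, 3146, 2652, 3146,
  1161, 2463, 1788, 3734,    0,  598, 3008, 1057,
   653, 1889, 1374, 3146,  598,    0, 2720,  713,
  2130, 1991, 3604, 2652, 3008, 2720,    0, 3288,
  1161, 2026,  732, 3146, 1057,  713, 3288,    0),
  nrow = 8, byrow = TRUE, dimnames = list(cities, cities))

# Replace each distance by its rank: only the order of the distances survives
d_rank <- matrix(rank(d), nrow = 8, dimnames = dimnames(d))
diag(d_rank) <- 0

# Start both fits from the same configuration so they can be compared directly
init <- cmdscale(d, k = 2)
fit_raw  <- isoMDS(d,      y = init, k = 2, trace = FALSE)
fit_rank <- isoMDS(d_rank, y = init, k = 2, trace = FALSE)

all.equal(fit_raw$points, fit_rank$points)
c(raw = fit_raw$stress, rank = fit_rank$stress)
```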

  4. Rick says:

    Nice clear example, and it worked fine for me. However, is it correct that in the cmdscale() implementation of MDS there is no method to either calculate Stress or control or set the number of iterations used in developing the solution? [These do not appear as arguments under ‘?cmdscale()’]
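On this question: classical MDS is computed in one step by an eigendecomposition, so cmdscale() has no iterations to control and does not report stress. Two options are sketched below under stated assumptions. The first computes a stress value for the cmdscale() solution by hand; the helper stress1() is hypothetical and applies Kruskal's stress-1 formula with the raw distances as targets, which is only one of several stress definitions. The second uses an iterative method, MASS::isoMDS(), whose maxit and tol arguments control the iterations and which returns its stress in percent.

```r
library(MASS)  # isoMDS()

# Hand-typed copy of the Australian distance table from the post
cities <- c("A", "AS", "B", "D", "H", "M", "P", "S")
d <- matrix(c(
     0, 1328, 1600, 2616, 1161,  653, 2130, 1161,
  1328,    0, 1962, 1289, 2463, 1889, 1991, 2026,
  1600, 1962,    0, 2846, 1788, 1374, 3604,  732,
  2616, 1289, 2846,    0, 3734, 3146, 2652, 3146,
  1161, 2463, 1788, 3734,    0,  598, 3008, 1057,
   653, 1889, 1374, 3146,  598,    0, 2720,  713,
  2130, 1991, 3604, 2652, 3008, 2720,    0, 3288,
  1161, 2026,  732, 3146, 1057,  713, 3288,    0),
  nrow = 8, byrow = TRUE, dimnames = list(cities, cities))

X <- cmdscale(d, k = 2)  # non-iterative: one eigendecomposition

# Hypothetical helper: stress-1 with the raw distances as targets (an assumption)
stress1 <- function(d, X) {
  d_obs <- d[lower.tri(d)]                    # input distances
  d_fit <- as.matrix(dist(X))[lower.tri(d)]   # distances between fitted points
  sqrt(sum((d_fit - d_obs)^2) / sum(d_fit^2))
}
s <- stress1(d, X)
s

# Iterative non-metric MDS, starting from the cmdscale() solution
iso <- isoMDS(d, y = X, k = 2, maxit = 100, tol = 1e-4, trace = FALSE)
iso$stress  # in percent
```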
