If we have two numeric vectors of equal length, we can determine the angle between them. An angle of 0° indicates that the two vectors point in the same direction (maximal similarity), while an angle of 180° means that they point in opposite directions (maximal dissimilarity). Here we will try to use this knowledge to estimate which pairs of people could make good friends.
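As a minimal sketch of this idea, the angle can be computed from the dot product and the vector norms. The function name `angle_degrees` and the small example vectors are illustrative assumptions, not part of the original analysis.

```python
import math

def angle_degrees(u, v):
    """Return the angle between two equal-length numeric vectors, in degrees."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("the angle is undefined for a zero vector")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))

# Hypothetical toy vectors, chosen only to show the extreme cases.
same_direction = angle_degrees([1, 2, 3], [2, 4, 6])     # close to 0 degrees
opposite = angle_degrees([1, 2, 3], [-1, -2, -3])        # close to 180 degrees
perpendicular = angle_degrees([1, 0], [0, 1])            # 90 degrees
```

Note that the angle depends only on direction, not on length: doubling every entry of a vector leaves the result unchanged.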

Suppose that we know ten people and their preferences for various activities, e.g. whether they like: 1) walking in parks, 2) jogging, 3) climbing mountains, 4) swimming, 5) talking to strangers, 6) reading books, 7) playing chess, 8) writing code, 9) cooking, 10) taking photos, 11) travelling, 12) going to concerts, 13) going to conferences, 14) ice skating, 15) watching TV, 16) playing piano, 17) learning languages. For each of these dimensions we could assign every person a score from 1 to 5, depending on how strong their preference is. We could end up with a table like this:

Person | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Matilde | 3 | 5 | 2 | 2 | 3 | 4 | 1 | 1 | 3 | 4 | 5 | 2 | 2 | 4 | 5 | 2 | 1
Shelby | 3 | 2 | 3 | 4 | 3 | 2 | 2 | 2 | 4 | 4 | 4 | 3 | 2 | 4 | 2 | 2 | 4
Norris | 5 | 3 | 5 | 2 | 4 | 3 | 3 | 3 | 2 | 3 | 4 | 2 | 5 | 3 | 4 | 1 | 1
Emanuel | 4 | 5 | 5 | 1 | 4 | 4 | 3 | 4 | 2 | 4 | 5 | 3 | 4 | 4 | 1 | 2 | 4
Emilio | 3 | 4 | 5 | 5 | 3 | 5 | 5 | 2 | 3 | 4 | 5 | 2 | 1 | 1 | 4 | 1 | 2
Nena | 4 | 4 | 4 | 2 | 4 | 4 | 1 | 1 | 5 | 5 | 5 | 2 | 1 | 3 | 2 | 5 | 5
Chase | 3 | 2 | 3 | 5 | 3 | 3 | 4 | 2 | 2 | 3 | 4 | 1 | 1 | 2 | 4 | 4 | 4
Altha | 4 | 5 | 1 | 2 | 1 | 4 | 2 | 1 | 3 | 3 | 3 | 1 | 2 | 1 | 2 | 3 | 5
Ehtel | 5 | 2 | 2 | 4 | 2 | 3 | 3 | 1 | 4 | 5 | 5 | 4 | 2 | 4 | 4 | 1 | 3
Cora | 5 | 5 | 5 | 3 | 1 | 3 | 1 | 4 | 2 | 4 | 5 | 3 | 4 | 4 | 3 | 1 | 4

Now we can compute the cosine similarity between each pair of individuals. For instance, the vector describing Matilde's preferences is [3, 5, 2, 2, 3, 4, 1, 1, 3, 4, 5, 2, 2, 4, 5, 2, 1]. The cosine similarity of two vectors is defined as their dot product divided by the product of their norms, which equals the cosine of the angle between them. This allows us to create a matrix in which each cell is color-coded: white cells show the lowest similarity and black cells the highest similarity between people.

[Figure: 10 × 10 cosine similarity matrix, color-coded from white (lowest similarity) to black (highest similarity). Rows and columns, in order: Matilde, Shelby, Norris, Emanuel, Emilio, Nena, Chase, Altha, Ehtel, Cora.]
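To make the definition concrete, here is a small worked sketch for a single pair, using the rows for Matilde and Shelby from the table. The function name `cosine_similarity` and the intermediate variable names are illustrative assumptions.

```python
import math

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Preference scores copied from the table (activities 1 to 17).
matilde = [3, 5, 2, 2, 3, 4, 1, 1, 3, 4, 5, 2, 2, 4, 5, 2, 1]
shelby = [3, 2, 3, 4, 3, 2, 2, 2, 4, 4, 4, 3, 2, 4, 2, 2, 4]

# Intermediate quantities of the definition, kept separate for readability.
dot = sum(a * b for a, b in zip(matilde, shelby))   # 146
matilde_sq = sum(a * a for a in matilde)            # 173 (squared norm)
shelby_sq = sum(b * b for b in shelby)              # 160 (squared norm)

sim = cosine_similarity(matilde, shelby)            # 146 / sqrt(173 * 160)
print(f"Matilde-Shelby: {sim:.4f}")
```

Because the measure is symmetric, swapping the two arguments gives the same value, and comparing a person with themselves gives 1.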

Since all scores are positive, the similarities fall in a narrow range, from 0.7794 to 0.9559, so we need to scale the color map to that range. As expected, the matrix is symmetric, and its diagonal values are meaningless, since no one can be friends with themselves. From the colors, we see that Altha and Norris are the least compatible pair (0.7794). In contrast, Shelby and Ehtel show the highest similarity (0.9559), followed by the pairs Emanuel-Cora (0.9472), Shelby-Nena (0.9299), Emanuel-Norris (0.9283) and Chase-Emilio (0.9260). These individuals have the highest chance of becoming friends, given this data.
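The whole analysis can be sketched in a few lines, assuming the scores are stored in a plain dictionary. The names `scores`, `pair_sims`, `gray_level` and `ranked` are hypothetical, and the min-max scaling is only one possible way to map similarities to shades of gray; the original color mapping is not specified.

```python
import math
from itertools import combinations

# Preference scores copied from the table (17 activities per person).
scores = {
    "Matilde": [3, 5, 2, 2, 3, 4, 1, 1, 3, 4, 5, 2, 2, 4, 5, 2, 1],
    "Shelby":  [3, 2, 3, 4, 3, 2, 2, 2, 4, 4, 4, 3, 2, 4, 2, 2, 4],
    "Norris":  [5, 3, 5, 2, 4, 3, 3, 3, 2, 3, 4, 2, 5, 3, 4, 1, 1],
    "Emanuel": [4, 5, 5, 1, 4, 4, 3, 4, 2, 4, 5, 3, 4, 4, 1, 2, 4],
    "Emilio":  [3, 4, 5, 5, 3, 5, 5, 2, 3, 4, 5, 2, 1, 1, 4, 1, 2],
    "Nena":    [4, 4, 4, 2, 4, 4, 1, 1, 5, 5, 5, 2, 1, 3, 2, 5, 5],
    "Chase":   [3, 2, 3, 5, 3, 3, 4, 2, 2, 3, 4, 1, 1, 2, 4, 4, 4],
    "Altha":   [4, 5, 1, 2, 1, 4, 2, 1, 3, 3, 3, 1, 2, 1, 2, 3, 5],
    "Ehtel":   [5, 2, 2, 4, 2, 3, 3, 1, 4, 5, 5, 4, 2, 4, 4, 1, 3],
    "Cora":    [5, 5, 5, 3, 1, 3, 1, 4, 2, 4, 5, 3, 4, 4, 3, 1, 4],
}

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# One entry per unordered pair; the diagonal (a person with themselves) is skipped.
pair_sims = {
    (a, b): cosine_similarity(scores[a], scores[b])
    for a, b in combinations(scores, 2)
}

lo, hi = min(pair_sims.values()), max(pair_sims.values())

def gray_level(sim):
    """Min-max scale a similarity to 0.0 (white, lowest) .. 1.0 (black, highest)."""
    return (sim - lo) / (hi - lo)

# Most similar pairs first.
ranked = sorted(pair_sims.items(), key=lambda item: item[1], reverse=True)
for (a, b), sim in ranked[:5]:
    print(f"{a}-{b}: {sim:.4f} (gray level {gray_level(sim):.2f})")
```

Storing only the 45 unordered pairs is sufficient because the matrix is symmetric; a full 10 × 10 grid can be filled from `pair_sims` by looking up either ordering of a pair.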

But finding good friends takes a lot more than a simple numeric analysis. We often change in unexpected ways over the course of our lives: people whose interests were initially close to ours can grow progressively different, and people who were very different from us can become more like us. Such a matrix therefore quickly becomes unstable, as it can't capture dynamic behavioral changes. We still have to live with our gut feeling.