Finding friends via cosine similarity

If we have two numeric vectors of equal length, we can determine the angle between them. An angle of 0° indicates that the two vectors point in the same direction (maximal similarity), while an angle of 180° means that they point in opposite directions (maximal dissimilarity). Here we will use this idea to estimate which pairs of people could make good friends.

Suppose that we know ten people and their preferences for various activities, e.g. whether they like: 1) walking in parks, 2) jogging, 3) climbing mountains, 4) swimming, 5) talking to strangers, 6) reading books, 7) playing chess, 8) writing code, 9) cooking, 10) taking photos, 11) travelling, 12) going to concerts, 13) going to conferences, 14) ice skating, 15) watching TV, 16) playing piano, 17) learning languages. For each of these activities we assign every person a score from 1 to 5, depending on how strong their preference is. We could end up with a table like this:

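Person    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
Matilde   3  5  2  2  3  4  1  1  3  4  5  2  2  4  5  2  1
Shelby    3  2  3  4  3  2  2  2  4  4  4  3  2  4  2  2  4
Norris    5  3  5  2  4  3  3  3  2  3  4  2  5  3  4  1  1
Emanuel   4  5  5  1  4  4  3  4  2  4  5  3  4  4  1  2  4
Emilio    3  4  5  5  3  5  5  2  3  4  5  2  1  1  4  1  2
Nena      4  4  4  2  4  4  1  1  5  5  5  2  1  3  2  5  5
Chase     3  2  3  5  3  3  4  2  2  3  4  1  1  2  4  4  4
Altha     4  5  1  2  1  4  2  1  3  3  3  1  2  1  2  3  5
Ehtel     5  2  2  4  2  3  3  1  4  5  5  4  2  4  4  1  3
Cora      5  5  5  3  1  3  1  4  2  4  5  3  4  4  3  1  4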

Now we can compute the cosine similarity between every pair of individuals. For instance, the vector describing Matilde's preferences is [3,5,2,2,3,4,1,1,3,4,5,2,2,4,5,2,1]. The cosine similarity of two vectors is defined as their dot product divided by the product of their norms, which equals the cosine of the angle between them. Arranging the values for all pairs gives a matrix, shown here color-coded so that white cells mark the lowest similarity and black cells the highest.
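As a minimal sketch of this definition, the following Python snippet computes the cosine similarity of two preference vectors and converts it back to an angle. The function name cosine_similarity and the choice of Matilde and Shelby as the example pair are illustrative assumptions; the score vectors themselves are taken from the table.

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their Euclidean norms."""
    if len(u) != len(v):
        raise ValueError("vectors must have equal length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Scores range from 1 to 5, so neither norm can be zero here.
    return dot / (norm_u * norm_v)


# Preference vectors for two people, copied from the table (illustrative pair).
matilde = [3, 5, 2, 2, 3, 4, 1, 1, 3, 4, 5, 2, 2, 4, 5, 2, 1]
shelby = [3, 2, 3, 4, 3, 2, 2, 2, 4, 4, 4, 3, 2, 4, 2, 2, 4]

similarity = cosine_similarity(matilde, shelby)
angle_deg = math.degrees(math.acos(similarity))
print(f"similarity = {similarity:.4f}, angle = {angle_deg:.1f} degrees")
```

Because every score is positive, the dot product is always positive, so the similarity of any two people in this table lies strictly between 0 and 1 and the angle stays below 90°.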

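[Figure: 10 × 10 cosine similarity matrix, color-coded from white (lowest similarity) to black (highest similarity). Rows and columns, in order: Matilde, Shelby, Norris, Emanuel, Emilio, Nena, Chase, Altha, Ehtel, Cora.]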

Since the similarities range only from 0.7794 to 0.9559, the color scale has to be stretched over this interval to make the differences visible. As expected, the matrix is symmetric, and its diagonal values are meaningless, since no one can be friends with themselves. From the colors, we see that Altha and Norris are the least compatible pair (0.7794). In contrast, Shelby and Ehtel show the highest similarity (0.9559), followed by the pairs Emanuel-Cora (0.9472), Shelby-Nena (0.9299), Emanuel-Norris (0.9283) and Chase-Emilio (0.9260). These pairs have the highest chance of becoming friends, given this data.
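The ranking described above can be reproduced with a short script. This is only a sketch under a few assumptions: the dictionary name scores, the helper cosine_similarity and the use of itertools.combinations to visit each unordered pair once are illustrative choices, not necessarily how the original matrix was produced. The scores are copied from the table.

```python
import math
from itertools import combinations

# Scores for the 17 activities, one list per person, copied from the table.
scores = {
    "Matilde": [3, 5, 2, 2, 3, 4, 1, 1, 3, 4, 5, 2, 2, 4, 5, 2, 1],
    "Shelby":  [3, 2, 3, 4, 3, 2, 2, 2, 4, 4, 4, 3, 2, 4, 2, 2, 4],
    "Norris":  [5, 3, 5, 2, 4, 3, 3, 3, 2, 3, 4, 2, 5, 3, 4, 1, 1],
    "Emanuel": [4, 5, 5, 1, 4, 4, 3, 4, 2, 4, 5, 3, 4, 4, 1, 2, 4],
    "Emilio":  [3, 4, 5, 5, 3, 5, 5, 2, 3, 4, 5, 2, 1, 1, 4, 1, 2],
    "Nena":    [4, 4, 4, 2, 4, 4, 1, 1, 5, 5, 5, 2, 1, 3, 2, 5, 5],
    "Chase":   [3, 2, 3, 5, 3, 3, 4, 2, 2, 3, 4, 1, 1, 2, 4, 4, 4],
    "Altha":   [4, 5, 1, 2, 1, 4, 2, 1, 3, 3, 3, 1, 2, 1, 2, 3, 5],
    "Ehtel":   [5, 2, 2, 4, 2, 3, 3, 1, 4, 5, 5, 4, 2, 4, 4, 1, 3],
    "Cora":    [5, 5, 5, 3, 1, 3, 1, 4, 2, 4, 5, 3, 4, 4, 3, 1, 4],
}


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# The matrix is symmetric and its diagonal is ignored, so it is enough to
# evaluate each unordered pair of distinct people once (45 pairs for 10 people).
pairs = {
    (a, b): cosine_similarity(scores[a], scores[b])
    for a, b in combinations(scores, 2)
}

# Sort pairs from most to least similar.
ranked = sorted(pairs.items(), key=lambda item: item[1], reverse=True)

print("Most similar pairs:")
for (a, b), sim in ranked[:5]:
    print(f"  {a}-{b}: {sim:.4f}")

(a, b), sim = ranked[-1]
print(f"Least similar pair: {a}-{b}: {sim:.4f}")
```

The script prints values rounded to four decimals, so the last digit may differ by one from a figure quoted in the text if that figure was truncated rather than rounded.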

But finding good friends involves a lot more than a simple numeric analysis. We often change in unexpected ways over the course of our lives: people whose interests were initially close to ours can drift away, and people who were very different from us can become more like us. A similarity matrix is only a snapshot, so it quickly becomes unstable, as it cannot capture such changes in behavior. We still have to rely on our gut feeling.