Friend List Name Statistics

Back in 2015, I did some basic analysis on the names of the people in my FB friend list. With the help of some app that lets you export Facebook contacts, I got hold of all the names and then computed the most common first names, last names and so on. I was reminded of it recently, and felt like doing it again. It was easier this time, since Facebook now gives us a convenient way of downloading our data, and while it’s not very amenable to scripting etc., the list of friends is pretty straightforward to obtain. For this analysis, I took names from my Friend List, list of Followers and list of Pending Friend Requests. After de-duplication (between Followers and Pending Friends), I ended up with a list of 8127 people, who I will refer to hereafter as the friend pool.

Lots of names have two or more spelling variations, since transliteration of Indian languages into the Roman script, as I have discussed here, here and here, can be particularly inconsistent. To fix this, I first cleaned up the names by trying to merge spelling variations into one canonical form. There are several heuristics for doing this for English (such as Soundex), and Prashant [Sachdeva] and I had come up with a similar heuristic for Indian languages 4 years ago. I kinda sorta followed it, using a series of string replacements.
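To give a flavour of what a series of string replacements can look like, here is a minimal sketch in Python. The rules in `RULES` and the helper `canonical_key` are made up for illustration; they are assumptions, not the actual heuristic or the rules used in the script.

```python
import re

# Hypothetical replacement rules, applied in order. These are illustrative
# guesses at common transliteration variants, not the original heuristic.
RULES = [
    (r"ee", "i"),        # Prateek -> Pratik
    (r"oo", "u"),        # Pooja   -> Puja
    (r"aa", "a"),        # Aayush  -> Ayush
    (r"w", "v"),         # Wadhwa  -> Vadhva
    (r"y$", "i"),        # Chaudhary -> Chaudhari
    (r"(.)\1+", r"\1"),  # collapse doubled letters (must come after the vowel rules)
]


def canonical_key(name: str) -> str:
    """Return a lowercase key that is shared by simple spelling variants."""
    key = re.sub(r"[^a-z]", "", name.lower())  # keep ASCII letters only
    for pattern, replacement in RULES:
        key = re.sub(pattern, replacement, key)
    return key


print(canonical_key("Siddharth"), canonical_key("Sidharth"))  # sidharth sidharth
print(canonical_key("Agarwal"), canonical_key("Agrawal"))     # agarval agraval
```

Names that map to the same key can then be merged under whichever spelling is most frequent. Generic rules like these only go so far: the last line shows two spellings of the same surname that still end up with different keys.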

I also needed some specific rules to fix spelling variations in commonly occurring names. For instance, names like Siddharth are also sometimes spelled as Sidharth. And don’t get me started about all the ways in which people transliterate अग्रवाल. For this one, general heuristics were going to be useless, so I went with using a regular expression.

/agg?(ra|ar)(v|w)aa?ll?a?/

One pattern to bring them all, and in a RegEx bind them. Likewise for Goyal / Goel. The script is here. Note that I haven’t tried to be very efficient here—it’s a one-off quick-and-dirty thing.
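For the curious, this is roughly how such patterns can be applied in Python. It is a sketch under assumptions: the Agarwal pattern is the one quoted above, but the Goyal / Goel pattern, the `CANONICAL` table and the `canonical_surname` helper are hypothetical and may not match what the script does.

```python
import re

# Pattern quoted in the post, matched case-insensitively.
AGARWAL = re.compile(r"agg?(ra|ar)(v|w)aa?ll?a?", re.IGNORECASE)
# Hypothetical pattern for Goyal / Goel; the post does not show the one used.
GOYAL = re.compile(r"go(ya|e)l", re.IGNORECASE)

# (pattern, canonical spelling) pairs, checked in order.
CANONICAL = [(AGARWAL, "Agarwal"), (GOYAL, "Goyal")]


def canonical_surname(surname: str) -> str:
    """Map a surname to its canonical spelling, or return it unchanged."""
    cleaned = surname.strip()
    for pattern, canonical in CANONICAL:
        # fullmatch so that only whole surnames are rewritten
        if pattern.fullmatch(cleaned):
            return canonical
    return cleaned


for raw in ["Agrawal", "Aggarwal", "Agarwalla", "Goel", "Gupta"]:
    print(raw, "->", canonical_surname(raw))
```

Using `fullmatch` rather than `search` keeps the pattern from firing on longer surnames that merely contain one of these spellings.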

There were a total of 2778 unique first names. Here are the top 20 most common first names, along with their frequency—

Abhishek      89
Aditya        81
Rahul         79
Gaurav        75
Saurabh       73
Ankit         67
Prateek       63
Ashish        60
Siddharth     60
Shubham       59
Rohit         58
Nikhil        57
Rishabh       52
Mohit         50
Akshay        49
Amit          47
Vaibhav       46
Akash         42
Yash          42
Vivek         41

There are no big surprises here, though I never expected Gaurav to be so high in the list. I would have thought Nikhil would be higher, Ankit would be lower, and Amit, Yash and Vivek wouldn’t be present at all.

The top 20 first names account for 15% of the friend pool.
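Counting of this kind is a few lines with `collections.Counter`. The sketch below is not the original script; `top_n_with_share` is a hypothetical helper, and the toy list stands in for the real (cleaned) first names. The last part simply redoes the 15% arithmetic from the frequencies listed above.

```python
from collections import Counter


def top_n_with_share(names, n=20):
    """Return the n most common names and the fraction of the list they cover.

    Assumes `names` is a non-empty list with one entry per person.
    """
    counts = Counter(names)
    top = counts.most_common(n)
    share = sum(count for _, count in top) / len(names)
    return top, share


# Toy data standing in for the real friend pool.
pool = ["Abhishek", "Aditya", "Abhishek", "Rahul", "Aditya", "Abhishek"]
top, share = top_n_with_share(pool, n=2)
print(top)  # [('Abhishek', 3), ('Aditya', 2)]

# Sanity check of the figure above, using the 20 frequencies from the table.
TOP20_FIRST = [89, 81, 79, 75, 73, 67, 63, 60, 60, 59,
               58, 57, 52, 50, 49, 47, 46, 42, 42, 41]
POOL_SIZE = 8127
print(sum(TOP20_FIRST), round(100 * sum(TOP20_FIRST) / POOL_SIZE))  # 1190 15
```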

Let’s move on to family names. This distribution is more skewed than that for first names—

Singh         248
Jain          242
Agarwal       224
Sharma        217
Gupta         183
Kumar         181
Chaudhari     123
Shah          84
Mehta         83
Goyal         73
Yadav         63
Chauhan       58
Patel         57
Mathur        56
Joshi         53
Rathore       53
Meena         51
Varma         50
Shrivastava   49
Soni          48

The top 20 family names account for 27% of the friend pool.

Remember the RegEx I showed you to capture all Agarwal surnames? I was curious what the actual distribution of these was in my friend pool, so I asked the program to spit that out, and here’s how it looks:

Agarwal       121 (54%)
Agrawal       87 (39%)
Aggarwal      14 (6%)
Agarwalla     1
Agrawall      1

I knew the RegEx was going to be worth it!
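A breakdown like this can be produced by counting the raw spellings that match the pattern before they are merged. The following is a sketch, not the actual script: `variant_distribution` is a hypothetical helper, and the input list is rebuilt from the counts above rather than taken from real data.

```python
import re
from collections import Counter

# The pattern quoted earlier in the post, matched case-insensitively.
AGARWAL = re.compile(r"agg?(ra|ar)(v|w)aa?ll?a?", re.IGNORECASE)


def variant_distribution(surnames, pattern):
    """Count each raw spelling that fully matches `pattern`.

    Returns (spelling, count, rounded percentage) tuples, most common first.
    """
    variants = Counter(
        s.strip().title() for s in surnames if pattern.fullmatch(s.strip())
    )
    total = sum(variants.values())
    return [(v, c, round(100 * c / total)) for v, c in variants.most_common()]


# Rebuild a surname list from the counts shown above, plus one non-match.
surnames = (["Agarwal"] * 121 + ["Agrawal"] * 87 + ["Aggarwal"] * 14
            + ["Agarwalla", "Agrawall", "Gupta"])

for spelling, count, percent in variant_distribution(surnames, AGARWAL):
    print(f"{spelling:<12}{count:>4} ({percent}%)")
```

The two one-off spellings round down to 0%, which is why the table above leaves their percentages out.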

Here are word clouds of the top 100 first names and last names, made using the wordcloud module in Python.

[Image: word cloud of the top 100 last names]

[Image: word cloud of the top 100 first names (firstnames_wc)]

Oh, and which were the most common full names overall? There is a large n-way tie at frequency = 4, but we have an undisputed (by a very small margin) winner!

Nikhil Jain       7
Rishabh Jain      6
Aayush Agarwal    6
Amit Kumar        6
Ankit Agrawal     6
Saurabh Sharma    5
Mohit Agrawal     5
Rahul Jain        5
Shreyansh Jain    4
Ajay Kumar        4
Rohit Sharma      4
Amit Singh        4
Abhishek Gupta    4
Abhimanyu Singh   4
Ashish Agrawal    4
Abhinav Gupta     4
Ankit Jain        4
Rahul Kumar       4
Prashant Jain     4
Akshat Jain       4
Vikash Kumar      4
Prateek Agarwal   4
Saurabh Gupta     4
Aditya Gupta      4
Arpit Agarwal     4
Nikhil Sharma     4
Aman Jain         4
manoj kumar       4

Back when I did this analysis for the first time, the app that I used to export FB data also exported birthdays. The FB data downloader doesn’t give you those, but from what I remember, 25th December was the most common birthday, followed by 14th Feb, and most birthdays were in the Jul-Aug-Sep quarter. That data also included sex, and the most common girl names were Neha, Puja, Priyanka, Akanksha, Shruti, Aditi, Divya and Garima (in that order). I wish I had an updated version of that data now, but I think it’s going to be tedious to collect it.

Is there any other analysis you’d like me to do on this? Just ask for it in the comment section!

4 Responses to “Friend List Name Statistics”

  1. mayank lodha January 14, 2016 at 1:54 am #

    Antariksh, Most uncommon names?

  2. Abhilash Bhati January 14, 2016 at 2:17 am #

    cool. make it mapped by the most last names living in different places..?

  3. Abhilash Bhati January 14, 2016 at 2:18 am #

    cool . make it mapped by the last names living in different cities.

  4. Komal January 14, 2016 at 4:25 am #

    And what about the unique names ?
