In the age of big data and machine learning, the data scientist role is one of the most sought-after among graduates. But which graduates are suited to the title of ‘data scientist’?
Universities do not yet offer an established data science education; this much is plainly obvious. It is therefore logical to presume that mathematicians and statisticians are the best fit for the job. But is the data scientist position exclusive to those fields? As any such professional would remark, this is a question worth looking into from a quantitative perspective. So, what does the data have to say?
We at 365 Data Science embarked on a journey to find the answer.
Before jumping to conclusions or slicing and dicing the data, we need context. The data was collected from the LinkedIn profiles of 1,001 data scientists. We used convenience sampling, together with country quotas to limit bias: US (40%), UK (30%), India (15%), and other countries (15%). Unlike previous research, this study relies on LinkedIn profiles, which are perceived to be a good proxy for a person’s resume.
The overall results of the study show that the ‘average data scientist’ is male, speaks 2 languages, and has a median of 4.5 years of work experience, including 2 years in a data scientist position. R and Python are their main tools, with 74% of the cohort ‘speaking’ at least one of these programming languages.
But what truly interests us in this article is education. Seventy-five percent of the sample hold a postgraduate degree, either a Master’s (48%) or a PhD (27%), while 15% hold a Bachelor’s as their highest degree. So although postgraduate degrees dominate, the field is not exclusive to PhD or Master’s holders.
There are, however, major differences when we take the country of employment into account. According to the data, the US follows the overall trend just described, but in the UK the share of PhD holders is much higher, at 37%. While the UK presents a tough labour market, India’s booming economy is much more flexible: there, only 5% of data scientists hold a PhD, whereas Bachelor’s degrees are much more common (25%).
Area of study
Still, education level alone can only tell us so much. What is truly surprising is the heterogeneity of degree subjects: the sample contains a total of 472 unique degrees! The overall trend in degree choice, however, points towards ‘something quantitative’, with 90.4% of the sample falling under this umbrella term.
With so many unique degrees, some form of clustering is needed. We identified six major clusters, the top two being Computer Science (19.4%) and Statistics and Mathematics (19.3%). Surprisingly, Economics and Social Sciences (18.7%) completes the top three, which suggests that ‘math geeks’ do not dominate the field.
In addition to degree subjects, we also examined the rankings of the data scientists’ alma maters, using the Times Higher Education World University Rankings. Our hypothesis was that graduates of higher-ranked universities would be strongly represented in such a lucrative career. Moreover, university ranking is generally a good predictor of salary, and data scientists sit close to the top of the ‘salary chain’.
Unexpectedly, the data shows that only 28% of the cohort graduated from a top-50 university. This is comparable to the share of data scientists (25%) who attended institutions outside the 1,100 universities ranked by Times Higher Education. So, perhaps something else is going on behind the scenes, right?
Given these findings on degree subjects and cross-country differences, one may well wonder how people from unranked universities managed to get their foot in the data science door.
Taking the experience of professionals such as Vik Paruchuri, founder of dataquest.io, as a reference point, we considered self-preparation a plausible explanation. The only measurable indicator available to us was the completion of online courses. Our hypothesis was that students from lower-ranked universities catch up with their peers through self-preparation.
Data on whether a data scientist has listed an online course or certificate on their profile gives a clear answer. Sure enough, 40% of the sample listed at least one online course. Because profiles tend to under-report such courses, this figure is best read as a lower bound: at least 40% of current data scientists have taken online courses. The underlying assumption is that not everyone lists every course they have completed; an online course in Python, for example, would add little value to the LinkedIn profile of a data scientist who has been working in Python for several years.
When self-preparation is segmented by university rank, it is clear that graduates of the lowest-ranked or unranked universities were compensating with online courses. Those data scientists likely did not have the same on-campus recruitment opportunities right after graduation, but they nonetheless reached the top of the data science ladder through motivation and self-preparation.