Data collection

Method Snapshot

Data Source | Why did we include it? | What did we do with it? | Size
Facebook | Facebook posts were included to explore what business schools are posting about: Do they mention learning activities, skills students are learning, important events that happen on campus? | Facebook posts were not modified. Very large posts and non-English posts were excluded. | 265,743 posts
Twitter | Tweets were included to explore what business schools are tweeting about: Do they mention learning activities, skills students are learning, important events that happen on campus? | Tweets were not modified. Non-English tweets were excluded. | 386,882 tweets
LinkedIn | For each business school we collected aggregated information about their students. We did not collect personal information, only the top 15 skills, courses, and employers. | We categorised the skills according to the 21st century skills and data-driven decision-making skills. | Top 15 observations/institution
Indeed | Not all employers are equal. Some provide nourishing work environments that give graduates the chance to build on their skills. Others are less good for employability and better suited to pure short-term task execution. | We ran a sentiment analysis on a subset of the reviews to create a score for each company. The score ranges from +1 (very positive emotions attached to the company) to -1 (very negative emotions attached to the company). | 832,368 reviews in total
Digital Skill Survey (EC) | This survey provides information about the skills currently being used in the European business world. Its results therefore indicate which skills students should at least be knowledgeable in. | We did not transform the data. | 7,800 participants
Adult Skill Survey (OECD) | This survey provides information about the skill level of adults in different countries. | We did not transform the data. | 34 countries
Academic Vacancies | Vacancies were investigated to discover whether business schools are seeking talented employees who can teach the skills graduates need in their workplace. | We explored the vacancies using natural language processing to obtain a set of topics that group skills together. | 16,318 vacancies


Our sample consisted of 150 business schools accredited by EQUIS, the European accreditation association conducting quality assurance of business schools. The institutions in our sample are located in 42 countries.

Social Media

For each institution, we captured social media activity between January 1st 2011 and September 7th 2017. We focused on content curated by the institutions on Twitter and Facebook. In practice, this means the data contains all tweets that institutions created or retweeted, and all posts that the institutions published on their official Facebook pages. Student comments and @ mentions are excluded. This resulted in a dataset of 975,574 social media posts. Due to language and technical issues, we focused on posts that were in English and contained no more than 4000 bytes (between 1024 and 4096 characters). After data cleaning, our database contained a total of 652,625 posts, consisting of 386,882 tweets and 265,743 Facebook posts.
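The size filter described above can be sketched as follows. This is a minimal illustration only: it assumes posts are measured as UTF-8 bytes (the encoding is not stated in the text), the name within_size_limit is hypothetical, and the English-language filter is left out.

```python
MAX_BYTES = 4000  # size limit stated in the text


def within_size_limit(text: str, max_bytes: int = MAX_BYTES) -> bool:
    """Return True if the UTF-8 encoding of `text` fits within `max_bytes`.

    Assumption: size is measured on UTF-8 bytes; the source does not say.
    """
    return len(text.encode("utf-8")) <= max_bytes


posts = [
    "Welcome to our new MBA cohort!",
    "a" * 4000,   # 4000 ASCII characters occupy 4000 bytes
    "é" * 2001,   # 2001 characters, but 4002 bytes in UTF-8
]

# Keep only the posts that fit within the byte limit.
kept = [p for p in posts if within_size_limit(p)]
```

Because accented and non-Latin characters take more than one byte, the same byte limit allows a different number of characters from post to post.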
Drawing on the literature, we created skill categories to help explore the data. We focused on three data-driven decision-making skills (big data, data science, and relational big data) and four 21st century skills (collaboration, communication, ICT literacy, and citizenship). For each skill category we developed a thesaurus, a list of words that describe the skill.
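To illustrate how such a thesaurus could be applied to a post, here is a minimal sketch. The wildcard semantics are assumptions, since the text does not define them: * is read as any run of word characters, ? as at most one arbitrary character, and a trailing ~ is ignored. The function names and the three-category sample are illustrative and cover only a few terms from the thesaurus.

```python
import re


def term_to_regex(term: str) -> re.Pattern:
    """Translate one thesaurus term into a case-insensitive pattern.

    Assumed semantics (not specified in the source):
      *  any run of word characters
      ?  at most one arbitrary character
      ~  trailing marker, ignored here
    """
    term = term.rstrip("~").strip('"')
    parts = []
    for ch in term:
        if ch == "*":
            parts.append(r"\w*")
        elif ch == "?":
            parts.append(r".?")
        else:
            parts.append(re.escape(ch))
    # Lookarounds instead of \b so terms ending in symbols still match.
    return re.compile(r"(?<!\w)" + "".join(parts) + r"(?!\w)", re.IGNORECASE)


# A small sample of terms taken from the thesaurus.
THESAURUS = {
    "Collaboration": ["teamwork", "negotiat*", '"team member"'],
    "Communication": ['"communication skill"', "mother?tongue", "dialogue"],
    "Data Science": ['"deep learning"', '"neural network"~', "k?means"],
}

# Compile every term once, up front.
PATTERNS = {
    category: [term_to_regex(t) for t in terms]
    for category, terms in THESAURUS.items()
}


def tag_post(text: str) -> list:
    """Return the skill categories with at least one matching term."""
    return [
        category
        for category, patterns in PATTERNS.items()
        if any(p.search(text) for p in patterns)
    ]


tags = tag_post("Our students practised negotiation and teamwork "
                "in a deep learning workshop.")
```

Under these assumptions, a term such as mother?tongue matches "mother tongue", "mother-tongue", and "mothertongue", while a quoted phrase matches only the exact word sequence.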


Information about students’ self-reported skills and employment was gathered using LinkedIn’s school pages. This does not include any personal information. The school pages provide aggregated information about the top self-reported skills and current employers of graduates who attended the institutions during our selected time period.

Employment Review

Once our students graduate from their programs, they will be working for companies, big and small. We wanted to understand what it means to work for these companies. Are they innovative? Do they provide a good working climate? To gain insight into this, we gathered information about what employees say about their employers by collecting employee reviews from the recruitment site Indeed. We scraped reviews of companies where our graduates worked, with the aim of ranking companies according to how favorable the reviews were. To make the analysis manageable, for each company we took a sample of 1000 reviews, calculated the sentiment of each review, and averaged the sentiments for that company. These steps were conducted five times, and the sentiment values for each company were averaged across these five analysis cycles. Our ranking ranged from +1 (good company) to -1 (bad company). The mean sentiment was 0.64 (sd = 0.1). The lowest score was -0.18 and the highest score was 0.85.
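The sampling-and-averaging procedure above can be sketched as follows. This is an illustration under stated assumptions, not the scoring actually used: toy_sentiment and its word lists are hypothetical stand-ins for the sentiment tool (which the text does not name), sampling is assumed to be without replacement, and companies with fewer than 1000 reviews are assumed to contribute all of their reviews.

```python
import random
from statistics import mean

# Hypothetical word lists; a stand-in for a real sentiment model.
POSITIVE = {"great", "good", "supportive", "innovative"}
NEGATIVE = {"bad", "stressful", "poor", "toxic"}


def toy_sentiment(review: str) -> float:
    """Score a review in [-1, 1] from counts of positive and negative words."""
    words = review.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def company_score(reviews, sample_size=1000, cycles=5, seed=0):
    """Average the per-sample mean sentiment over several sampling cycles."""
    rng = random.Random(seed)
    k = min(sample_size, len(reviews))  # assumption: use all reviews if fewer
    cycle_means = []
    for _ in range(cycles):
        sample = rng.sample(reviews, k)  # assumption: without replacement
        cycle_means.append(mean(toy_sentiment(r) for r in sample))
    return mean(cycle_means)


reviews = ["great supportive team"] * 3 + ["stressful and toxic"]
score = company_score(reviews)
```

Repeating the sampling several times and averaging reduces the influence of any single random draw on a company's score.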

Requirements for Academic Jobs

To understand the research and teaching expertise of business schools, we collected information about vacancies from academic institutions. We collaborated with Academic Transfer, a company that publicizes vacancies in the Dutch academic labor market. From their dataset we analyzed vacancies across all scientific disciplines from 2010 to 2017. We conducted an automatic topic analysis of the vacancy descriptions and requirements (Latent Dirichlet Allocation). We compared the word distributions across several different models and decided that the clustering of words into topics was most insightful with eleven topics. The results are reported in [add tab]. Each circle represents one topic. Selecting a topic shows its words and their probability of belonging to that topic.

National Data

To gain better insight into the labor market and skill levels, we searched for open data sets. The European Commission conducts research on digital skills. From this research project, we included the Digital Skill Survey. Specifically, we added survey respondents’ information about their company (size of company, country of HQ, job title) and their estimates of the digital skills needed and possessed across different job categories (professionals, managers, sales). In the survey’s code book these are questions Q16 and Q23 for managers, professionals, technicians, clerical support workers, and sales workers.
Also from the European Commission, we included Eurostat’s dataset on managers’ perceptions of the digital skills of employees, downloaded via Datamarket.
From the OECD, we included results from the 2013 Adult Skill Survey (PIAAC), which provides country-level information about abilities, work style, skill, and knowledge. The survey assessed literacy, numeracy, and problem-solving in technology-rich environments.
From the OECD we also included two tables with country-specific trend data on the percentage of workers who report structural change in their workplace and the percentage of workers who report new ways of working in the workplace (1990 – 2009).

Skill Category | Words
Collaboration | teamwork teaming "collaboration skill" negotiat* compromise "shared responsibility" "team member" "individual contribution"
Communication | "communication skill" mother?tongue foreign?tongue language "written communication" "verbal communication" "interpret world" "interpret reality" vocabulary vocab grammar grammatical "verbal interaction" dialogue intercultural "listening skill" "speaking skill" "reading skill" "writing skill" "spoken message"
ICT Literacy | "information literacy" "digital environment" "digital skill" "process information" "organize information" "transform information" "create knowledge" "create information" "research skill" problem?solving~ "integrate information" "summarize information" "summarise information" "analyze information" "analyse information" "interpret information" "model information" "ICT literacy" "ICT skill"
Citizenship | citizenship "societal literacy" "cultural literacy" "personal competence" "personal skill" "interpersonal competence" "interpersonal skill" "intercultural competence" "intercultural skill" "social life" "working life" "codes of conduct" manners norms multi?cultural socio?economic tolerance viewpoint "negotiating skill" empathy assertiveness integrity citizenship "civil right" solidarity "value systems" "ethnic group" "religious group" responsibility "shared value"
Big Data | Apache Hadoop Hive Java Spark Yarn Scala SQL TensorFlow Python "streaming data analytics"~ "Temporal data analytics"~ "geospatial data analytics"~ "network analysis" "network analytics" "text analysis" "text analytics" "latent semantic analysis" lsa Database program?ing MapReduce HDFS NoSQL HBase Cassandra MongoDB "C++" "C#" "Structured Query language"
Relational Big Data | Data?warehousing~ dimensional?modeling "extraction transformation and loading" "dimensional data visualization" dashboard scorecard "advanced SQL" "procedural extensions" "intermediate SQL" "relational data model" normalization "ER diagrams" Hive
Data Science | "Data products development" "R Shiny" "deep learning" "reinforcement learning" "ensemble learning" "neural network"~ "natural language processing" nlp "random forests"~ "uni-directed graphical" math statistic~ classification Regression "support vector machines"~ svm algorithm "linear regression" "nonlinear regression" "logistic regression" "link analysis" "sequence analysis" "cluster analysis" "graph-based techniques"~ k?means "hierarchical clustering" "text mining"~ "web analytics"~ "linear algebra" calculus probability R?program?ing