Data-informed decision-making is now one of the key drivers of good judgement. In a school context, it involves collecting data points across the student journey, teacher performance, operational processes and more. However, in our work with top educational organizations, we see one recurring problem that hinders data analysis: the quality of the input data.
In many schools and institutions, data is highly fragmented, so significant time goes into data prep: the process of turning raw data into a form that is ready for analysis.
One Example of Data Prep
Suppose all 50 students come from one school, Hanoi Innovative Schools. Without consistent data entry, the data team may receive something like this:
As shown, the data team must manually:
correct the School Name entries so that they all share the same value; and
convert the Parent Phone Number entries into numeric values.
Only then can they count the total number of students from Hanoi Innovative Schools and set up automated SMS channels for fast communication with parents.
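Much of this manual cleanup can be scripted. The sketch below is a minimal illustration in Python, assuming the records arrive as a list of dictionaries; the sample records, the alias list and the helper names (`school_key`, `clean_school`, `clean_phone`) are hypothetical and invented for illustration. It maps known spellings of the school name to one canonical value, strips non-digits from phone numbers, and then counts the students.

```python
import re

CANONICAL_SCHOOL = "Hanoi Innovative Schools"


def school_key(name: str) -> str:
    """Lowercase a school name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


# Hypothetical alias table: known spellings (compared by school_key) mapped to
# the one canonical name. Someone still has to build this list by hand.
SCHOOL_ALIASES = {
    school_key(alias): CANONICAL_SCHOOL
    for alias in ["Hanoi Innovative Schools", "Hanoi Innovative School", "Hanoi Innov. Schools"]
}


def clean_school(name: str) -> str:
    # Unknown spellings are returned stripped but otherwise unchanged, so they can be reviewed.
    return SCHOOL_ALIASES.get(school_key(name), name.strip())


def clean_phone(phone: str) -> str:
    # Keep digits only. The result stays a string: converting to int would drop a leading 0.
    return re.sub(r"\D", "", phone)


# Hypothetical raw records, invented for illustration (not real student data).
raw_records = [
    {"student": "A", "school": "Hanoi Innovative Schools", "parent_phone": "0912 345 678"},
    {"student": "B", "school": "hanoi innovative school", "parent_phone": "0912-345-679"},
    {"student": "C", "school": "  HANOI INNOVATIVE SCHOOLS. ", "parent_phone": "(0912) 345 680"},
    {"student": "D", "school": "Hanoi Innov. Schools", "parent_phone": "0912.345.681"},
]

cleaned = [
    {**r, "school": clean_school(r["school"]), "parent_phone": clean_phone(r["parent_phone"])}
    for r in raw_records
]

total = sum(1 for r in cleaned if r["school"] == CANONICAL_SCHOOL)
print(total)  # 4
```

Note that the alias table still has to be maintained by a person; the script only removes the repetitive part of the work, which is why preventing inconsistent input in the first place pays off.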
What Data Analysts Say
According to a recent survey by CrowdFlower, a provider of a “data enrichment” platform for data scientists, this task accounts for 80% of data analysts’ work.
Ironically, 76% of them considered it the least enjoyable part of their work.
In large doses, data prep drains data analysts and scientists of valuable resources such as time and motivation through repetitive, manual and unrewarding tasks.
Simple Steps to Streamline
1. Collecting Data with Purpose
Leaders receive advice on how to collect data from hundreds of leading practitioners and high-performing schools. This abundance of input can make it hard to choose which data to record and which platform to use for logging each type of data.
In this situation, school leaders must clearly define and articulate the “WHY – Why do we do it?” behind each data-driven strategy, whether it focuses on the student learning environment, teacher well-being or another purpose. They must also decide which goal to prioritize.
On this foundation, leaders can better align data points, reduce wasted input, and obtain better-structured, more connected results.
2. Prevent Data Defects
Data defects are often caused by input errors (e.g. typos, missing information, redundant data collection, inconsistent formats). One simple remedy is to constrain the input format wherever possible:
Take the phone number field as an example. Instead of letting users type freely, restrict that cell to “numbers only” and show an example of the expected format.
If the data set covers thousands of students, this simple trick could save hours of data correction.
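In a spreadsheet this is usually done with the tool's built-in data validation. If the data is collected through a custom form instead, the same rule can be enforced in code. The Python sketch below is only an illustration: the expected format (10 digits starting with 0), the example number and the function name `validate_phone` are assumptions, so adjust them to the format your school actually uses.

```python
import re

# Assumed format for illustration: 10 digits starting with 0. Replace the
# pattern and the example with whatever format your school actually uses.
PHONE_PATTERN = re.compile(r"0\d{9}")
PHONE_EXAMPLE = "0912345678"


def validate_phone(value: str) -> str:
    """Accept the entry only if it is digits-only and matches the expected format;
    otherwise reject it with a message that shows an example."""
    if not PHONE_PATTERN.fullmatch(value):
        raise ValueError(f"Please enter digits only, e.g. {PHONE_EXAMPLE} (got {value!r})")
    return value


print(validate_phone("0912345678"))  # accepted as-is
try:
    validate_phone("0912 345 678")
except ValueError as err:
    print(err)  # rejected at entry time, with an example of the expected format
```

Rejecting a bad entry at the moment it is typed costs the user a few seconds; finding it later in a file of thousands of rows costs the data team far more.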
3. Enable Automation, Adjust and Refine
To make data collection easier, analysts can (1) keep relevant input data as a reference for future use and (2) set up pre-filled data inputs.
(1) Future Use
For example, if a school runs a background check on all teachers, this data (after cleaning) should be codified and serve as a source for other data-driven projects. When the next project starts, teachers should only need to enter their personal code, and every other field (e.g. address, age, teaching grade, email address) should be filled in automatically. The data is then cleaned once, the first time it is collected, rather than again for every project.
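A minimal Python sketch of the "enter a code, get every other field" idea follows. It assumes the cleaned background-check data has already been stored as a lookup table keyed by teacher code; the table contents, the `TeacherRecord` type and the `prefill_form` function are hypothetical, with fields taken from the examples in the text.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TeacherRecord:
    """Cleaned, codified teacher data (hypothetical fields from the text's examples)."""
    address: str
    age: int
    teaching_grade: int
    email: str


# Hypothetical master table, built once from the cleaned background-check data.
# Codes and values are invented for illustration.
TEACHER_MASTER = {
    "T001": TeacherRecord("12 Example Street", 34, 5, "t001@example.edu"),
    "T002": TeacherRecord("34 Sample Road", 41, 8, "t002@example.edu"),
}


def prefill_form(teacher_code: str) -> dict:
    """Return every known field for a teacher, given only their personal code."""
    record = TEACHER_MASTER.get(teacher_code)
    if record is None:
        # Catch unknown codes at entry time instead of cleaning them up later.
        raise ValueError(f"Unknown teacher code: {teacher_code!r}")
    return {"teacher_code": teacher_code, **asdict(record)}


print(prefill_form("T001"))
```

Because every project reads from the same cleaned table, a correction made there (say, a new email address) reaches all later forms without further cleanup.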
Similarly, for transfer students, analysts can list the schools that students most often transferred from in previous years and pre-fill their details beforehand. Once a student picks their previous school from a drop-down list, all of that school's fixed details are filled in automatically.
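The drop-down approach can be sketched the same way. In this Python illustration the school names, the `district` and `address` fields, and the `OTHER` fallback are all assumptions made up for the example; the point is that users choose from a fixed list instead of typing, and each choice brings its stored details with it.

```python
# Hypothetical list of schools that students most often transferred from in
# previous years, with their fixed details entered once. All names and fields
# are invented for illustration.
PREVIOUS_SCHOOLS = {
    "School A": {"district": "District 1", "address": "1 First Street"},
    "School B": {"district": "District 2", "address": "2 Second Street"},
}
OTHER = "Other"


def dropdown_options() -> list[str]:
    """Choices offered in the drop-down; no free text, so no new spellings."""
    return sorted(PREVIOUS_SCHOOLS) + [OTHER]


def transfer_details(choice: str) -> dict:
    """Fill in the previous school's fixed details from the chosen option."""
    if choice not in dropdown_options():
        raise ValueError(f"Not a drop-down option: {choice!r}")
    if choice == OTHER:
        # Schools not on the list still need manual entry (and may be added next year).
        return {"previous_school": None, "district": None, "address": None}
    return {"previous_school": choice, **PREVIOUS_SCHOOLS[choice]}


print(dropdown_options())
print(transfer_details("School B"))
```

Keeping an explicit "Other" option is a design choice: it avoids forcing a wrong answer, and the entries it collects show which schools to add to the list in the next iteration.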
Adjust and Refine
Data collection processes should be a work in progress. In each iteration, leaders should take a participative approach to deciding which data needs more detail and which should be trimmed.
Again, we urge leaders to be clear about why and how their organizations should collect data, especially amid the stream of advice from “top practitioners”, so that they obtain the best results without squandering their budget on low-value tasks.