This guest post was written by Restart volunteer Steve Cook. We'd like to thank Steve for his long service to Restart. Read on to learn more about his invaluable contributions.
Our Fixometer system enables community teams to record data on their repair events, and provides them with metrics that help to quantify the beneficial impact of their activities. Event organisers use it to record data such as the type of devices brought to Restart Parties, and what the repair outcome was.
The patterns and trends in this data can give us powerful insights into the repair movement, and there are now over 6,000 device repair attempts captured in the Fixometer. The data also highlights opportunities for improving the Fixometer, a virtuous circle of feedback that will improve the quality and quantity of data being captured.
Here at Restart HQ we have been exploring tools and methods for working with this data. This is an ongoing process, so I'll describe the journey so far.
Querying our data
Our first objective was to create a quick and flexible way to query the Fixometer data and assemble dashboards of summary results. The Fixometer itself has some of the latter but isn’t intended for ad-hoc queries, so we were looking for a separate solution. We evaluated re:dash, Grafana and Tableau, but settled on Metabase for its ease of use, simplicity, widespread adoption and open source pedigree. Metabase gives us powerful reporting and charting tools:
This makes it much easier to draw inferences or conclusions from the data, for example:
- Over 90% of paper shredders were successfully repaired (most likely because they are electrically and mechanically simple, and commonly fail due to blockages which can be removed)
- Only around 1 in 3 of the cameras we see are repaired successfully (potentially due to the high level of miniaturisation, complexity and fragility of small mechanical parts, likelihood of physical impact damage and non-availability of spare parts)
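Insights like these come down to a simple calculation: the share of devices in each category whose repair succeeded. As a minimal sketch (the record fields and outcome labels here are illustrative, not the actual Fixometer schema):

```python
from collections import Counter

def success_rate_by_category(records):
    """Return the share of 'Fixed' outcomes per device category.

    `records` is an iterable of (category, outcome) pairs.
    """
    totals, fixed = Counter(), Counter()
    for category, outcome in records:
        totals[category] += 1
        if outcome == "Fixed":
            fixed[category] += 1
    return {cat: fixed[cat] / totals[cat] for cat in totals}

# Hypothetical sample of device records, not real Fixometer data.
records = [
    ("Paper shredder", "Fixed"),
    ("Paper shredder", "Fixed"),
    ("Camera", "End of life"),
    ("Camera", "Fixed"),
    ("Camera", "Repairable"),
]
rates = success_rate_by_category(records)
```

A tool like Metabase lets us express the same aggregation as a point-and-click question or a SQL query, without writing code at all.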
We can also build dashboards of our event activities, showing growth over time and geographical spread:
What Metabase gives us is very powerful, but it also highlights the importance of accuracy in the underlying data. So, we began analysing the details.
Data quality challenges
We found that the main data quality challenge lies in the Fixometer's device data: this data originates at each repair event, where it is typically first recorded on a flipchart or similar offline medium and then entered into the Fixometer after the event.
There are two main data points we want to capture on each device:
- What was it? What category of device, and ideally what brand and model?
- What was the outcome? Was it repaired successfully, not repairable, or needs further effort?
The category drives our calculation of CO2 footprint and the outcome determines impact (e.g. a repaired device is likely to mean no replacement device needs to be manufactured, which drives the CO2 impact calculation).
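The dependency between category, outcome and impact can be sketched as a simple lookup. The emission factors below are placeholders purely for illustration, not The Restart Project's actual figures:

```python
# Illustrative embodied-CO2 figures per category, in kg. These numbers are
# placeholders for the sketch, NOT real emission factors.
CO2_KG_PER_CATEGORY = {
    "Laptop": 200.0,
    "Mobile": 55.0,
    "Paper shredder": 25.0,
}

def co2_saved(category, outcome):
    """Credit a device's embodied CO2 as avoided only when the repair
    succeeded, on the assumption that a fixed device means no replacement
    needs to be manufactured."""
    if outcome != "Fixed":
        return 0.0
    return CO2_KG_PER_CATEGORY.get(category, 0.0)
```

This is exactly why a mis-categorised device or a missing outcome skews the impact metrics: both inputs feed directly into the calculation.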
Because this data is recorded ‘in the field’, some gaps and inconsistencies are to be expected. We used Metabase to create dashboards on data quality, showing metrics on missing brand/model for example, and completed a manual ‘data wrangling’ exercise to review and re-categorise devices that had been recorded in the ‘Misc’ category.
The first pass of this was done using a spreadsheet to edit the data, and then applying the changes to a Fixometer test environment to validate the impact on the calculated metrics. We also looked at using data wrangling tools for this, including OpenRefine and Trifacta; these may be useful in future as they’re well suited to the task.
Manual review was also very useful in identifying where we might want to create new categories, for items such as sewing machines which occur fairly frequently but don’t currently have a dedicated category. A big thanks to Restart volunteer Stefania Fantini for her fine-grain review of the data in this project.
Next, we looked at how best to align data with a common standard in order to ‘normalise’ it. We saw several variant spellings of brand names, and also some data where model and brand had been conflated; to avoid this it would be preferable for the Fixometer to guide the user towards standardised forms of each. For example:
| Brand (current) | Model (current) | Brand (target) | Model (target) |
| --- | --- | --- | --- |
| iPhone 4S | (blank) | Apple | iPhone 4S |
| Black & Deckre | drill | Black & Decker | Drill |
| Black and decker | sander | Black & Decker | Sander |
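Given a reference list of canonical brand names, variant spellings like these can often be resolved automatically with fuzzy string matching. A minimal sketch using Python's standard-library `difflib` (the canonical brand list here is an assumed, illustrative subset):

```python
import difflib

# Illustrative reference list of canonical brand names.
CANONICAL_BRANDS = ["Apple", "Black & Decker", "Samsung", "Sony"]

def normalise_brand(raw):
    """Match a free-text brand entry to the closest canonical name.

    Returns the raw value unchanged if no sufficiently close match exists,
    so uncertain cases can be flagged for manual review instead.
    """
    # Fold common variants like "and" vs "&", and normalise capitalisation,
    # before fuzzy matching.
    cleaned = raw.strip().replace(" and ", " & ").title()
    matches = difflib.get_close_matches(cleaned, CANONICAL_BRANDS, n=1, cutoff=0.8)
    return matches[0] if matches else raw
```

Dedicated tools such as OpenRefine offer far more sophisticated clustering of variants, but the principle is the same.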
This would need a clean source of reference data for brands and models; for this we investigated using open data sources including WikiData and Open Knowledge International, the iFixit product set, Google Product identifiers and the World Intellectual Property Organization brand database, as well as self-referencing the data from Fixometer.
WikiData in particular has the benefit of being open, actively maintained, with good query tools and a well established model for describing an item (e.g. “iPhone 4S” is a subclass of “iPhone”, which is a subclass of “Mobile Device”, and is manufactured by “Apple”).
Google has been working towards standardisation of product data, but its scheme is really intended for merchants, didn't seem particularly accessible, and is unlikely to include legacy products.
The iFixit dataset is especially appropriate given it is geared toward repair, and the iFixit website’s product selector for repair guides is a good example of how to prompt a user to provide detailed input without it being tedious.
This needs further work to reach a conclusion, and should be done as part of defining the Open Repair Data Standard.
Machine learning and the future
Finally we looked at using Machine Learning to automatically (re)classify each device record based on existing data (category, brand/model and comments).
We used uclassify.com, a web service for training and using classifiers that could be integrated with the Fixometer. Early results are promising, particularly for automatically assigning the device category based on the brand/model/comment text.
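uClassify is a hosted service, but the underlying idea can be illustrated with a toy word-frequency classifier. This is a deliberately simplified sketch (real classifiers use probabilistic models such as naive Bayes, and the training examples below are invented):

```python
from collections import Counter, defaultdict

def train(examples):
    """Count word frequencies per category from (text, category) pairs."""
    word_counts = defaultdict(Counter)
    for text, category in examples:
        word_counts[category].update(text.lower().split())
    return word_counts

def classify(model, text):
    """Pick the category whose training vocabulary best matches the text."""
    words = text.lower().split()
    scores = {
        category: sum(counts[w] for w in words)
        for category, counts in model.items()
    }
    return max(scores, key=scores.get)

# Invented training data mimicking combined brand/model/comment fields.
model = train([
    ("apple iphone cracked screen", "Mobile"),
    ("samsung galaxy battery swollen", "Mobile"),
    ("dell laptop keyboard broken", "Laptop"),
    ("lenovo thinkpad will not boot", "Laptop"),
])
```

In practice a trained classifier could suggest a category as the organiser types, turning free-text entry into a guided choice.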
This would be useful for further data cleansing, and could potentially simplify the data capture process in the Fixometer, which would improve data quality.