Manually labelling data is nobody’s favourite machine learning chore. You needn’t worry about asking others to help out, though, provided you can give them a pleasant tool for the task. Let me present to you: generated Google Forms using Google Apps Script!
The usual way to label data is to type the labels straight into a spreadsheet. I would normally do this as well; however, in a recent task I needed to label paragraphs of text. Have you ever tried to read paragraphs of text in a spreadsheet? It’s hell! Luckily, whilst trying to figure out a way to make the labelling process less gruelling, I came across a way of auto-generating a form from the data in a spreadsheet using Google Apps Script.
Creating the script that will generate our Form
To get started, open the Apps Script editor from within the Google Sheets spreadsheet containing the data we want to gather labels for.
What’s great about using Forms for labelling is that you can guarantee consistency in the user input by specifying the data input type. For example:
item.createChoice('Is a cat')
Multi-class label
See the Apps Script API docs for details on more input types (or just look at the different input types when manually creating a Google Form).
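To make the two label types concrete, here is a minimal sketch of how a binary and a multi-class question might be added to a Form. The helper names (`addBinaryLabel`, `addMultiClassLabel`, `createExampleForm`), the titles and the class names are my own placeholders for illustration, not part of the original script; the `FormApp` calls are the standard Apps Script Forms service.

```javascript
// Binary label: a single checkbox the labeller ticks when the statement holds.
function addBinaryLabel(form, title, statement) {
  const item = form.addCheckboxItem();
  item.setTitle(title);
  item.setChoices([item.createChoice(statement)]);
  return item;
}

// Multi-class label: a multiple choice question, so exactly one class is picked.
function addMultiClassLabel(form, title, classes) {
  const item = form.addMultipleChoiceItem();
  item.setTitle(title);
  item.setChoiceValues(classes);
  item.setRequired(true);
  return item;
}

// Hypothetical entry point: run this one from the Apps Script editor.
function createExampleForm() {
  const form = FormApp.create('Labelling example'); // placeholder title
  addBinaryLabel(form, 'Example 1', 'Is a cat');
  addMultiClassLabel(form, 'Example 2', ['Cat', 'Dog', 'Other']);
  Logger.log(form.getEditUrl());
}
```

Passing the form in as an argument keeps the helpers independent of how the Form was created, so the same functions work on a new Form or an existing one.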
You can grab the script I used to generate a Form for labelling text documents with numbers 0 to 10 from my GitHub.
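If you just want a feel for the shape of such a script, the following is a rough sketch rather than the script from my GitHub. It assumes the text to label sits in the first column of the active sheet with a header in row 1, and the function names, form title and "Document N" question titles are placeholders I made up for this example.

```javascript
// Turn the sheet's raw values into the list of texts to label.
// Assumes row 1 is a header and the text lives in the first column.
function extractTexts(values) {
  return values
    .slice(1)
    .map(row => String(row[0] == null ? '' : row[0]).trim())
    .filter(text => text.length > 0);
}

// Add one 0-10 scale question per text; the text itself goes in the help text.
function addScaleQuestions(form, texts) {
  texts.forEach((text, i) => {
    const item = form.addScaleItem();
    item.setTitle('Document ' + (i + 1));
    item.setHelpText(text);
    item.setBounds(0, 10);
    item.setRequired(true);
  });
}

// Hypothetical entry point: select this function in the editor and run it.
function generateLabellingForm() {
  const sheet = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet();
  const texts = extractTexts(sheet.getDataRange().getValues());
  const form = FormApp.create('Text labelling'); // placeholder title
  addScaleQuestions(form, texts);
  Logger.log('Edit: ' + form.getEditUrl());
  Logger.log('Share: ' + form.getPublishedUrl());
}
```

Keeping the sheet parsing in its own function means you can adapt it to a different column layout without touching the Form-building code.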
Once your script is written (or copied and pasted), select your script’s entry point and run it! Warning: you’ll probably have to jump through a few authorisation hoops the first time you do it.
Using the generated Form
After the script has run, head over to your Google Forms, where you should find a brand new Form! You can send the Form to whoever you want to do the labelling.
Accessing the data labels
After the labelling is done, you can view the labels as a spreadsheet and export them as a CSV.
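If you would rather pull the labels out with a script than click through the Forms UI, something along these lines should work. Treat it as a sketch under assumptions: the function names, the three-column layout and the `'YOUR_FORM_ID'` placeholder are mine, and the CSV is built by hand rather than through any built-in export.

```javascript
// Quote a field only when it contains a comma, quote or line break.
function csvField(value) {
  const s = String(value);
  return /[",\n\r]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
}

// Join rows of values into CSV text.
function toCsv(rows) {
  return rows.map(row => row.map(csvField).join(',')).join('\n');
}

// Flatten every submitted answer into one [timestamp, question, label] row.
function collectLabels(form) {
  const rows = [['timestamp', 'question', 'label']];
  form.getResponses().forEach(response => {
    response.getItemResponses().forEach(itemResponse => {
      rows.push([
        response.getTimestamp().toISOString(),
        itemResponse.getItem().getTitle(),
        itemResponse.getResponse(),
      ]);
    });
  });
  return rows;
}

// Hypothetical entry point: replace the placeholder with your Form's ID.
function logLabelsAsCsv() {
  const form = FormApp.openById('YOUR_FORM_ID');
  Logger.log(toCsv(collectLabels(form)));
}
```

One row per answer (rather than one row per labeller) is a design choice here: it keeps the output tidy when several people label the same documents.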
Hopefully this saves you a bit of a headache in your future machine learning efforts!
The full script and dataset used in this article can be found on my GitHub.
Friendlier data labelling using generated Google Forms was originally published in Towards Data Science on Medium.