How I trained a language detection AI in 20 minutes with a 97% accuracy

Weird — I actually kind of look like that guyThis story is a step-by-step guide to how I built a language detection model using machine learning (that ended up being 97% accurate) in under 20 minutes.Language detection is a great use case for machine l…

Conspiracy Theories about AI Self-Driving Cars

By Lance Eliot, the AI Trends Insider What do these aspects have in common: Roswell, Area 51, JFK assassination, moon landing, AI self-driving cars? They are all considered by some to be conspiracies. Most of us are familiar with Roswell, New Mexico and the claims of a UFO that landed there. Likewise, most of us […]

Diabetes Prediction — Artificial Neural Network Experimentation

Diabetes Prediction — Artificial Neural Network ExperimentationBeing a data science profesional, we tend to learn about all the available techniques to crunch our data and deduce meaningful insights from them. In this article, I have described my exper…

Friendlier data labelling using generated Google Forms

Manually labelling data is nobodies favourite machine learning chore. You needn’t worry though about asking others to help out provided you can give them a pleasant tool for the task. Let me present to you: generated Google Forms using Google App Script!

Google App Scripts allow you to build automation between Google Apps

The regular way people might label data is just by typing in the labels into a spreadsheet. I would normally do this as well, however in a recent task I needed to label paragraphs of text. Have you ever tried to read paragraphs of text in a spreadsheet?.. it’s hell! Luckily whilst trying to figure out a way to make the labelling process less gruelling I came across a way of auto generating a form based on data in a spreadsheet document using Google App Script.

Nasty! Nobody wants to strain their eyes trying to read documents in spreadsheet cells!

Creating the script that will generate our Form

To get started we just jump into the App Script editor from within the Google Spread Sheet containing the data we want to gather labels for:

Opening the App Script editor from a Google Spreadsheet

Using App Script (pssst! it’s just JavaScript) we can read the spreadsheet data and send commands to other Google Apps (in this case the Google Forms).

What’s great about using Forms for labelling is that you can guarantee consistency in the user input by specifying the data input type. For example:

Number range:

form.addScaleItem()
.setTitle(dataToLabel)
.setBounds(1, 10)
.setRequired(true);

Binary label:

form.addCheckboxItem()
.setTitle(dataToLabel)
.setChoices([
item.createChoice('Is a cat')
])

Multi class label

form.addMultipleChoiceItem()
.setTitle(dataToLabel)
.setChoices([
item.createChoice('Cats'),
item.createChoice('Dogs'),
item.createChoice('Fish')
])

See the details for more input types in the App Script API docs (or just look at the different input types when manually creating a Google Form).

You can grab the script I have used to generate a Form for labelling text documents with numbers 0 to 10 from my Github:

ZackAkil/friendlier-data-labelling

After you have your script written (or copy and pasted); you then select your scripts’ entry point and run it! Warning You’re probably going to have to jump through a few authorisation hoops the first time you do it.

Make sure to select the entry point function of the script before running.

Using the generated Form

After the script has run, you can head over to your Google Forms and there you should find a brand new Form! You can send the Form to whoever you want to do the labelling:

Finally you can send your labellers a convenient link to a familiar Google Form that they can use to carry out the labelling task.

Accessing the data labels

After the labelling is done, you can then just view the labels as a spreadsheet and export as a CSV:

It’s pretty straight forward to get the labels out as a CSV.

Hopefully this saves you a bit of headache in your future machine learning efforts!

The full script and dataset used in this article can be found on my Github:

ZackAkil/friendlier-data-labelling


Friendlier data labelling using generated Google Forms was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.

Responsible deployment of machine learning

[A version of this post appears on the O’Reilly Radar.] We need to build machine learning tools to augment our machine learning engineers. In this post, I share slides and notes from a talk I gave in December 2017 at the Strata Data Conference in Singapore offering suggestions to companies that are actively deploying products … Continue reading Responsible deployment of machine learning

Behavioral Analysis of GitHub and StackOverflow Users

In this blog post, we will take a look at the activity on websites that
became a significant part of development across all areas in, as well as,
outside of Data Science: GitHub and StackOverflow. It doesn’t matter where
developers are from or what their specific focus is, everyone uses these
websites.