This article was first published on Towards Data Science - Medium.
Whether you’re a new startup or an existing business, here’s one way you can get an AI-enabled product or service into production in 1 week or less. And you don’t have to just take my word for it: I’ll share some tools you can use to really speed things up.
Pick a problem that can be solved with machine learning. This sounds obvious, but believe me, it isn’t always. I’ve written a lot about good and bad use cases for machine learning, so have a look at these or do some research, and make sure you’re tackling something that is possible.
Think tagging product images, recommending content, making data more searchable or classifying Tweets, not predicting the stock market or building a self-driving car.
There are actually a ton of really good use cases for machine learning and AI. The key is to start simple, with the idea that you can get more complicated over time.
Let’s take the idea of a news aggregation website that lets your users search by topics, people, and content.
First, sit down and decide what categories of news you want your users to be able to browse through. Perhaps they are science, sports, politics, health, technology, and crime.
You could spend the rest of your first day gathering data. Go out and grab as many articles as you can in those categories and organize them into folders with the names of the categories.
It’s important you have an equal number of examples for each category.
Make sure you have 100–500 (the more the better) examples of articles in each category.
Now comes the fun part. Visit https://machinebox.io and download Classificationbox. You’ll have to sign up with your e-mail, install Docker, and run a Terminal command. It shouldn’t take more than a few minutes.
Once Classificationbox is up and running, clone this open source tool that lets you create a machine learning model from text files on your computer. Run textclass on your folders with the sample articles in them.
/training-data
    /sports
        sports-example1.txt
        sports-example2.txt
        sports-example3.txt
    /politics
        politics-example1.txt
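Before you train, it’s worth a quick sanity check that your folders actually follow the advice above: enough examples per category, and roughly the same number in each. Here is a minimal Python sketch of that check. It assumes the folder layout shown above with `.txt` files; the function names, the 10% tolerance, and the default minimum of 100 are my own illustrative choices, not part of any Machine Box tool.

```python
from pathlib import Path


def count_examples(root):
    """Return {category: number of .txt files} for each subfolder of root."""
    counts = {}
    for folder in sorted(Path(root).iterdir()):
        if folder.is_dir():
            counts[folder.name] = sum(1 for _ in folder.glob("*.txt"))
    return counts


def check_balance(counts, minimum=100, tolerance=0.1):
    """Return a list of problems; an empty list means the set looks fine."""
    if not counts:
        return ["no category folders found"]
    problems = []
    largest = max(counts.values())
    for category, n in sorted(counts.items()):
        if n < minimum:
            # Too few examples to train on, regardless of balance.
            problems.append(f"{category}: only {n} examples (want at least {minimum})")
        elif n < largest * (1 - tolerance):
            # Enough examples, but noticeably fewer than the biggest category.
            problems.append(f"{category}: {n} examples vs {largest} in the largest category")
    return problems
```

Calling `check_balance(count_examples("training-data"))` gives you a list of categories to top up before you run the training tool.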
After a few seconds, you’ll have a machine learning model that can automatically classify news articles. Assuming the data set was good, your accuracy should be above 90%. If it isn’t, consider cleaning the data set a bit or simply adding another 100 samples or so to each category.
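If your accuracy comes in low, it helps to know which categories are dragging it down before you go hunting for more samples. One way to find out, sketched below in Python, is to hold back some labeled articles, run them through the model, and compare the results per category. How you collect the predictions is up to you; this function only does the arithmetic, and its name is hypothetical.

```python
from collections import defaultdict


def accuracy_by_category(expected, predicted):
    """Return {category: fraction of held-out articles classified correctly}."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must be the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for want, got in zip(expected, predicted):
        total[want] += 1
        if want == got:
            correct[want] += 1
    return {category: correct[category] / total[category] for category in total}
```

A category that scores well below the others is the first place to clean up or add examples.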
It’s not even 10am and you’ve already got a machine learning model ready to go into production. But let’s keep going.
The next thing you’re going to want to do is make your articles searchable, but as you start aggregating millions of articles, you’ll quickly hit a wall if you’re trying to index every word. Furthermore, if someone searches for the word Radcliffe, you want to return relevant results, whether that person is searching for the place or for news about Daniel Radcliffe.
Once again, visit Machine Box and download Textbox. Textbox is a keyword and named entity extraction tool that will give you a nice bit of metadata for every news article, and that metadata works well in Elasticsearch. There’s some great detail on how to do this here, but the highlights are:
- Filter search by people, places, time etc.
- Easily build visualizations of the entirety of the data set and see what is trending.
- Process and analyze the data blazingly fast.
- And a whole lot more…
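To make the first highlight concrete, here is a rough Python sketch of a search that matches on the article text and then filters by extracted people or places, which is how you’d separate Daniel Radcliffe from the place. It builds a standard Elasticsearch `bool` query as a plain dictionary. The field names `body`, `people`, and `places` are assumptions about how you chose to index your articles, and the `term` filters assume those entity fields are mapped as exact-match keyword fields.

```python
def build_search_query(text, people=None, places=None):
    """Build an Elasticsearch bool query: full-text match plus exact entity filters."""
    filters = []
    for field, values in (("people", people), ("places", places)):
        for value in values or []:
            # Each term clause must match exactly; filters do not affect scoring.
            filters.append({"term": {field: value}})
    return {
        "query": {
            "bool": {
                "must": [{"match": {"body": text}}],
                "filter": filters,
            }
        }
    }
```

The returned dictionary is the request body you would send to your index’s search endpoint with whichever Elasticsearch client you prefer.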
Articles sometimes come with photos of people and places. You might want to be able to tag people in articles, and tag things found in images to further improve search and categorization.
Although Tagbox is teachable with custom labels, it also comes pretrained with tens of thousands of labels, so it is easy to get started without putting in any work. When you start writing the web services part of the app, you can pass every photo into Tagbox and store the response in Elasticsearch. You might even decide to keep only the top two tags, or only tags above a certain confidence.
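Picking the top two tags, or only the confident ones, is a few lines of code. The Python sketch below assumes each tag arrives as a dictionary with a `tag` name and a numeric `confidence`; treat that shape as an assumption and adjust it to whatever the response you store actually looks like. The function name and parameters are hypothetical.

```python
def select_tags(tags, top_n=None, min_confidence=None):
    """Return tag names, most confident first, optionally thresholded and capped."""
    ranked = sorted(tags, key=lambda t: t["confidence"], reverse=True)
    if min_confidence is not None:
        ranked = [t for t in ranked if t["confidence"] >= min_confidence]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [t["tag"] for t in ranked]
```

For example, `select_tags(tags, top_n=2)` keeps the two strongest tags, while `select_tags(tags, min_confidence=0.5)` keeps everything at or above that confidence.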
You might also want to start recognizing people in photos inside the articles. The first step is to pass every photo into Facebox, and then store the Faceprint in your database. You can read more about how the Faceprint works here.
Once you’ve built up a nice dataset of Faceprints, you can, at any point, go through and teach Facebox who these people are. It will not only recognize the people you teach it in new photos, but you can also apply the learning back through time to the photos you’ve already processed, simply by updating the Faceprint-to-name relationship in your database.
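Here is one way that Faceprint-to-name relationship could look in a database, sketched in Python with SQLite. It is a deliberately simplified illustration: the table and function names are hypothetical, Faceprints are treated as opaque strings, and a name is attached to an exact stored Faceprint. How Faceprints from different photos of the same person get matched up is left to Facebox and is not modeled here.

```python
import sqlite3


def create_schema(db):
    # One row per face found in a photo; the Faceprint is stored as an opaque string.
    db.execute("CREATE TABLE faces (photo_id TEXT, faceprint TEXT)")
    # Names live in their own table, so they can be added or changed at any time.
    db.execute("CREATE TABLE names (faceprint TEXT PRIMARY KEY, name TEXT)")


def record_face(db, photo_id, faceprint):
    db.execute("INSERT INTO faces VALUES (?, ?)", (photo_id, faceprint))


def name_faceprint(db, faceprint, name):
    # INSERT OR REPLACE lets a later correction overwrite an earlier name.
    db.execute("INSERT OR REPLACE INTO names VALUES (?, ?)", (faceprint, name))


def photos_of(db, name):
    rows = db.execute(
        "SELECT DISTINCT faces.photo_id FROM faces "
        "JOIN names ON names.faceprint = faces.faceprint "
        "WHERE names.name = ? ORDER BY faces.photo_id",
        (name,),
    )
    return [row[0] for row in rows]
```

Because photos only reference the Faceprint, naming a person later makes every photo you processed earlier searchable by that name without touching the photos again.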
Now that you’ve got the machine learning models ready to go, you could start to build the web app in the afternoon of Day 3.
Spend these two days building the web app, aggregating news stories, deciding how to display the information to your customers, integrating Elasticsearch, and maybe handling customer logins.
Then, build a flow for every new article to pass through the various Machine Box boxes, and handle the output.
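That flow can be as simple as one function that runs each article through every step and collects the results into a single document to index. The Python sketch below keeps the steps pluggable: `classify`, `extract_entities`, and `tag_image` are stand-ins you would write yourself to call each box, and the article and document fields are assumptions made for illustration.

```python
def enrich_article(article, classify, extract_entities, tag_image):
    """Run one article through each step and return a document ready to index."""
    doc = {
        "title": article["title"],
        "body": article["body"],
        "category": classify(article["body"]),
        "entities": extract_entities(article["body"]),
        "image_tags": [],
    }
    # Articles without photos simply end up with an empty tag list.
    for image_url in article.get("images", []):
        doc["image_tags"].extend(tag_image(image_url))
    return doc
```

Passing the steps in as functions also makes the flow easy to test with fakes before any of the boxes are running.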
Deploy to your favorite hosting site.
Now you’ve got a product that has multiple machine learning models integrated, doing everything from auto-categorizing news articles based on how they’re written, to improving search using people, places, things, and more, both in photos and in the content of the article itself.
And there is a lot more you can do. You could start recommending articles that people are more likely to click on based on who they are, detect and weed out fake news, or build your own language detection model to engage auto-translation.
The point is, you should be able to launch a product or service with AI in less than a week, even if you’re starting from scratch.
What is Machine Box?
Machine Box puts state-of-the-art machine learning capabilities into Docker containers so developers like you can quickly incorporate natural language processing, facial detection, object recognition, etc. into your own apps.
The boxes are built for scale, so when your app really takes off, just add more boxes.
By: Aaron Edell