This article was first published on Towards Data Science - Medium.
Diabetes Prediction — Artificial Neural Network Experimentation
As data science professionals, we tend to learn about all the available techniques for crunching our data and deducing meaningful insights from it. In this article, I describe my experiments with neural network architectures for exploratory data analysis.
Here is the GitHub link to my code repository, which contains the code I used for the exploratory data analysis and for all the architectural designs mentioned in this article. I used Python 3.6 along with the Pandas, NumPy and Keras (TensorFlow backend) modules.
Here is the link to the dataset I used for the exploratory data analysis, from the Kaggle website. The data description and the column metadata are given at the link.
- Number of Observations: 768
- Number of Features: 8
- Input Neurons: 8
- Output Neurons: 2 (Diabetic and Non-diabetic)
- Test Data size: 20%
- Train Data size: 80%
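As a quick sanity check of these numbers, here is a minimal NumPy sketch of an 80/20 split with one-hot targets for the two output neurons. It runs on random placeholder data of the same shape, not on the real dataset, and the helper name `train_test_split_80_20` is hypothetical. Scikit-learn's `train_test_split(X, y, test_size=0.2)` would be the usual shortcut; it also rounds the test size up, giving 154 test rows.

```python
import numpy as np

# Dimensions quoted in the article: 768 observations, 8 features, 2 classes.
N_OBSERVATIONS = 768
N_FEATURES = 8
N_CLASSES = 2
TEST_FRACTION = 0.2  # 20% test / 80% train


def train_test_split_80_20(X, y, test_fraction=TEST_FRACTION, seed=0):
    """Hypothetical helper: shuffle rows once, hold out the first `test_fraction` as test."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    n_test = int(np.ceil(test_fraction * len(X)))  # ceil(153.6) = 154 test rows
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


# Placeholder data with the same shape as the Kaggle dataset (NOT the real values).
rng = np.random.default_rng(42)
X = rng.normal(size=(N_OBSERVATIONS, N_FEATURES))
y = rng.integers(0, N_CLASSES, size=N_OBSERVATIONS)  # 0 = non-diabetic, 1 = diabetic

X_train, X_test, y_train, y_test = train_test_split_80_20(X, y)
# One-hot targets: one column per output neuron.
Y_train = np.eye(N_CLASSES)[y_train]
print(X_train.shape, X_test.shape)  # (614, 8) (154, 8)
```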
First, I computed the correlation matrix and plotted it as a heatmap, along with a pair plot (a plot between every two features of the dataset), using the Seaborn module for data visualization. The correlation plot gives an idea of the (linear) dependencies between the features by comparing them against one another. Correlation is part of the bivariate analysis of any dataset.
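The correlation step could look roughly like this with Pandas and Seaborn. This is a sketch, not the author's code: the column names are assumed from the Kaggle Pima Indians diabetes CSV, the data is synthetic, and `plot_correlations` is a hypothetical helper that only works if Seaborn and Matplotlib are installed. `DataFrame.corr()` computes Pearson correlation by default.

```python
import numpy as np
import pandas as pd

# Column names ASSUMED from the Kaggle Pima Indians diabetes CSV; verify against the metadata.
COLUMNS = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
           "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]


def correlation_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation between every pair of columns (the matrix behind the heatmap)."""
    return df.corr(method="pearson")


def plot_correlations(df: pd.DataFrame) -> None:
    """Hypothetical helper: heatmap of the correlation matrix, then the pair plot."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.heatmap(correlation_matrix(df), annot=True, fmt=".2f", cmap="coolwarm")
    plt.show()
    # Off-diagonal cell [i, j]: scatter plot of feature i against feature j.
    # Diagonal cell [i, i]: histogram of feature i.
    sns.pairplot(df, diag_kind="hist")
    plt.show()


# Synthetic stand-in for the real data (e.g. pd.read_csv("diabetes.csv"), a hypothetical path).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(768, len(COLUMNS))), columns=COLUMNS)
# Make the synthetic Outcome depend on Glucose so that a correlation is visible.
df["Outcome"] = (df["Glucose"] + 0.5 * rng.normal(size=768) > 0).astype(int)

corr = correlation_matrix(df)
# Features sorted by their correlation with the target.
print(corr["Outcome"].drop("Outcome").sort_values(ascending=False))
```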
Below is the pair plot, which shows the bivariate plot between every two features together with the histogram of every feature. Cell [i, i] on the diagonal shows the histogram of the i-th feature.
One inference that can be drawn from the histograms above is that a few of the features may follow standard, known distributions such as Gaussian or Rayleigh, judging by the shapes of the plots. This assumption can be handy when building a predictive model, if required, since we then know (or at least assume) the distribution of the data and its mathematical formulation.
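To back the "looks Gaussian" impression with numbers, one option (an assumption, not something the article does) is to compare a feature's sample skewness and excess kurtosis with those of a normal distribution, which are both 0. The sketch uses synthetic samples rather than the real features; a Rayleigh distribution, by contrast, is right-skewed with a skewness of roughly 0.63. The helper names are hypothetical.

```python
import numpy as np


def skewness_and_excess_kurtosis(x):
    """Moment-based shape statistics; both are about 0 for Gaussian data."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3)), float(np.mean(z ** 4) - 3.0)


def fit_gaussian(x):
    """Maximum-likelihood estimates (mean, standard deviation) of a normal distribution."""
    x = np.asarray(x, dtype=float)
    return float(x.mean()), float(x.std())  # np.std uses ddof=0, i.e. the MLE


rng = np.random.default_rng(1)
gaussian_like = rng.normal(loc=120.0, scale=30.0, size=50_000)  # synthetic, glucose-like scale
rayleigh_like = rng.rayleigh(scale=1.0, size=50_000)            # synthetic, right-skewed

print(skewness_and_excess_kurtosis(gaussian_like))  # both close to 0
print(skewness_and_excess_kurtosis(rayleigh_like))  # skewness near 0.63
print(fit_gaussian(gaussian_like))                  # close to (120, 30)
```

On real features, a histogram that looks bell-shaped but shows a large skewness would suggest the Gaussian assumption is too optimistic.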
From the above plot, it is fair to conclude (because of their high correlation scores) that the ‘Outcome-Glucose’ and ‘Outcome-BMI’ pairs are the most interdependent. Plotting a joint plot annotated with the Pearson correlation therefore lets us focus on their behaviour. The joint plots below also show that Outcome (the target) has only two possible values.
From the first Pearson correlation plot, it is evident that ‘Glucose’ is the feature most highly correlated with Outcome, which makes ‘Glucose’ the most important feature by this measure.
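Pearson's r between a feature x and the target y is their covariance divided by the product of their standard deviations; when y is the binary Outcome, the same number is also called the point-biserial correlation. Below is a minimal NumPy sketch of the formula and of ranking features by |r|. The data is synthetic, with "Glucose" built to drive the outcome, and `rank_features_by_correlation` is a hypothetical helper.

```python
import numpy as np


def pearson_r(x, y):
    """r = sum((x - x_mean) * (y - y_mean)) / sqrt(sum((x - x_mean)**2) * sum((y - y_mean)**2))"""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))


def rank_features_by_correlation(X, y, names):
    """Sort features by |r| with the target, strongest first."""
    scores = {name: pearson_r(X[:, j], y) for j, name in enumerate(names)}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Synthetic example (NOT the real data).
rng = np.random.default_rng(2)
names = ["Glucose", "BMI", "Age"]
X = rng.normal(size=(768, 3))
# Binary Outcome driven mostly by "Glucose", a little by "BMI", not at all by "Age".
y = (1.5 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=768) > 0).astype(int)

for name, r in rank_features_by_correlation(X, y, names):
    print(f"{name:8s} r = {r:+.3f}")
```

A high |r| only measures linear association with the target, so it is a useful first ranking rather than a complete measure of feature importance.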
Now comes the most interesting part: experimenting with the various possible architectures of the Neural Network. Before diving in, I would like to point out a few key points regarding the choice of architecture.
- #Input Neurons = #features in X
- #Output Neurons = #Classes in target
- #Hidden Layers >0
- #Neurons in Hidden Layer1 ~ #Neurons in Hidden Layer2 ~ #Neurons in Hidden Layer3 …. ~ #Neurons in Hidden LayerN (If the architecture has N hidden layers)
- #Neurons in Hidden Layers ~ #Input Neurons, or #Neurons in Hidden Layers ~ 2 × #Input Neurons
- Weights must be randomly initialized
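The rules above can be sketched in NumPy as a forward pass through a few candidate architectures. This is illustrative only: the article builds its models in Keras, while the ReLU hidden activation, the softmax output and the Glorot-style uniform initialization here are assumptions, and the helper names are hypothetical.

```python
import numpy as np


def init_mlp(n_inputs, hidden_sizes, n_outputs, seed=0):
    """Randomly initialize every layer (never start from all-equal weights)."""
    rng = np.random.default_rng(seed)
    sizes = [n_inputs, *hidden_sizes, n_outputs]
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        # Glorot/Xavier-style uniform range; biases start at zero.
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        W = rng.uniform(-limit, limit, size=(fan_in, fan_out))
        b = np.zeros(fan_out)
        params.append((W, b))
    return params


def forward(params, X):
    """ReLU hidden layers followed by a softmax over the output neurons."""
    a = X
    for W, b in params[:-1]:
        a = np.maximum(0.0, a @ W + b)
    W, b = params[-1]
    logits = a @ W + b
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


N_FEATURES, N_CLASSES = 8, 2  # input neurons = features, output neurons = classes
# Hidden layers of roughly equal size, about 1x or 2x the number of input neurons.
architectures = {
    "1 hidden layer (8)": [8],
    "2 hidden layers (16, 16)": [16, 16],
    "3 hidden layers (8, 8, 8)": [8, 8, 8],
}
X = np.random.default_rng(3).normal(size=(5, N_FEATURES))  # 5 synthetic samples
for label, hidden in architectures.items():
    probs = forward(init_mlp(N_FEATURES, hidden, N_CLASSES), X)
    print(label, probs.shape)  # one probability per class for each sample
```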
The Artificial Neural Network (ANN) is one of the most popular machine learning algorithms, with a wide range of applications in predictive modelling and in building classifiers. Today, more advanced neural network models, such as Convolutional Neural Networks and other deep learning models, are popular in computer vision, network security, artificial intelligence, robotics, health care and many other advanced technologies.
A few exciting facts that drive data scientists to use Artificial Neural Networks:
- Adapts and trains itself to complex non-linear problems.
- Flexible to various kinds of problem sets.
- Fundamentally compatible with real-time learning (Online Learning).
- On the other hand, in most cases you need a lot of data and a fast GPU for computation to build an ANN.
All the metrics and graphs apply to this particular problem set only, since this is an exploratory data analysis.
Single Hidden Layer Architecture
For our problem, a single-hidden-layer architecture yields an accuracy that saturates at 64.28% for any number of neurons in the hidden layer (a strong assumption, based on the graph above).
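One possible explanation for an accuracy that stays fixed regardless of network size (a hypothesis, not something the article tests) is that the network predicts the same class for every test sample, so its accuracy equals the share of the majority class in the test set. The sketch below computes that baseline on hypothetical labels. The counts are invented for illustration, but note that 154 is the size of a 20% test split of 768 rows rounded up, and 99/154 = 0.642857..., which is 64.28% when truncated to two decimals. Both helper names are hypothetical.

```python
import numpy as np


def majority_class_accuracy(y_true):
    """Accuracy of a baseline that always predicts the most frequent class."""
    counts = np.bincount(np.asarray(y_true))
    return counts.max() / counts.sum()


def predicts_single_class(y_pred):
    """True if a model outputs the same class for every sample (a 'collapsed' model)."""
    return np.unique(np.asarray(y_pred)).size == 1


# HYPOTHETICAL test labels: 154 samples, 99 of them class 0 (non-diabetic).
y_test = np.array([0] * 99 + [1] * 55)
print(round(100 * majority_class_accuracy(y_test), 2))  # 64.29, i.e. 99/154
```

If `predicts_single_class` returns True for a trained model's test predictions, feature scaling, learning rate or activation choices are worth revisiting before comparing architectures.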
Two Hidden Layer Architecture
With the two-hidden-layer architecture, the behaviour is similar: the accuracy always saturates at 64.28%.
Multiple Hidden Layer Architecture
Since we are discussing exploratory data analysis with neural networks, here are a few key points to keep in mind:
- The choice of activation functions affects performance to a great extent. Select them carefully, based on experiments and on the type of target and data we are dealing with.
- The number of neurons in the hidden layers should be similar to the number of input neurons. A larger number of neurons may increase performance, but it also increases model complexity, so a trade-off has to be maintained.
- Using momentum with backpropagation can speed up convergence and help the optimizer move past shallow local minima and plateaus, although it does not guarantee reaching the global optimum.
- When deciding on the architecture of the hidden layers, experimenting with various architectures helps, since every dataset may behave differently with a given architecture.
- The size of the data matters, so choose the amount of data accordingly. The bigger, the better!
- Random initialization of the network weights is mandatory when the network is built from scratch (i.e. with no pre-trained weights, such as those of the Inception model); otherwise, neurons in the same layer receive identical updates and learn the same features.
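The momentum and random-initialization points can be illustrated with a small NumPy network trained by backpropagation with classical momentum (v ← μ·v − η·∇L, then w ← w + v). It is a sketch under assumptions: one tanh hidden layer of 8 neurons, a single sigmoid output instead of two softmax outputs, learning rate 0.1, momentum 0.9, and synthetic data. In Keras, the corresponding optimizer would be `keras.optimizers.SGD(learning_rate=0.1, momentum=0.9)`.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_mlp_momentum(X, y, n_hidden=8, lr=0.1, momentum=0.9, epochs=500, seed=0):
    """One tanh hidden layer + sigmoid output, trained by full-batch backprop with momentum."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Random initialization breaks the symmetry between hidden neurons.
    params = {
        "W1": rng.normal(scale=1.0 / np.sqrt(d), size=(d, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(scale=1.0 / np.sqrt(n_hidden), size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }
    velocity = {k: np.zeros_like(v) for k, v in params.items()}
    y = y.reshape(-1, 1).astype(float)
    losses = []
    for _ in range(epochs):
        # Forward pass.
        h = np.tanh(X @ params["W1"] + params["b1"])
        p = sigmoid(h @ params["W2"] + params["b2"])
        eps = 1e-12
        losses.append(float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))))
        # Backward pass: for binary cross-entropy with a sigmoid output, dL/dz2 = (p - y) / n.
        dz2 = (p - y) / n
        grads = {"W2": h.T @ dz2, "b2": dz2.sum(axis=0)}
        dz1 = (dz2 @ params["W2"].T) * (1.0 - h ** 2)  # tanh'(z) = 1 - tanh(z)^2
        grads["W1"] = X.T @ dz1
        grads["b1"] = dz1.sum(axis=0)
        # Classical momentum: v <- mu * v - lr * grad, then w <- w + v.
        for k in params:
            velocity[k] = momentum * velocity[k] - lr * grads[k]
            params[k] += velocity[k]
    return params, losses


def predict(params, X):
    h = np.tanh(X @ params["W1"] + params["b1"])
    return (sigmoid(h @ params["W2"] + params["b2"]) >= 0.5).astype(int).ravel()


# Synthetic, standardized data with 8 features (NOT the real dataset).
rng = np.random.default_rng(4)
X = rng.normal(size=(600, 8))
y = (X[:, 0] + 0.5 * X[:, 5] > 0).astype(int)

params, losses = train_mlp_momentum(X, y)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
print("train accuracy:", np.mean(predict(params, X) == y))
```

Setting `momentum=0.0` in the same function gives plain gradient descent, which makes it easy to compare how quickly the loss falls with and without momentum on a given dataset.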
After the neural network, I applied a few other algorithms to the dataset to compare their performance. Here are the results:
In every real-world problem, the first step in building a solution-focused model is to perform an exploratory data analysis. This establishes a suitable model for the problem, which can then be tuned further to improve performance and solve the problem efficiently.
Exploratory data analysis with artificial neural networks is largely about experimenting with the hidden layers and activation functions. Advanced big-data problems, image-based problems and many other complex problems are now tackled with Convolutional Neural Networks (CNNs). Deep learning has been used extensively in many complex research problems because of its ability to gain insights from big data and, in many cases, to skip the feature-extraction step (a CNN can work directly on the images, without any separate feature extraction). In computer vision applications, CNNs have the further benefit of keeping the spatial properties of the image intact, which can be very useful for many geometry-based computations and inferences.
- Exploratory Data Analysis — https://www.kaggle.com/etakla/exploring-the-dataset-bivariate-analysis
- Keras — https://keras.io/
- Pandas — https://pandas.pydata.org/
- Seaborn — https://seaborn.pydata.org/
- Pearson Correlation — https://en.wikipedia.org/wiki/Pearson_correlation_coefficient
- Artificial Neural Network — https://www.tutorialspoint.com/artificial_intelligence/artificial_intelligence_neural_networks.htm
Diabetes Prediction — Artificial Neural Network Experimentation was originally published in Towards Data Science on Medium.
By: Monik Raj