Fundamentals of Machine Learning - Part 4 - Hand written digit Recognition using k- NN Classifier
Summary
In this video tutorial, Sathiesh Kumar V walks through handwritten digit recognition with the k-nearest neighbors (k-NN) classifier. Using the MNIST dataset of grayscale digit images (0-9), the tutorial covers data preparation, normalization, and model training with the Python libraries scikit-learn and Keras. It also shows how to tune the 'k' value on a validation set and select the value that gives the highest accuracy.
Highlights
- The tutorial emphasizes the importance of the MNIST dataset for training.
- Viewers are guided on reshaping and normalizing data for better accuracy.
- The video showcases the procedure for one hot encoding of labels.
- Learners are shown how to split data for training, validation, and testing effectively.
- Key parameter tuning, including the 'k' value selection, is demonstrated in detail.
Key Takeaways
- Harness the power of k-NN classifier for digit recognition.
- Leverage the MNIST dataset to train models effectively.
- Understand the importance of data normalization.
- Learn to tune parameters for optimal model accuracy.
- Follow along with Python code examples in the video.
Overview
Sathiesh Kumar V's video tutorial focuses on practical machine learning, specifically the application of k-nearest neighbors (k-NN) classifier for digit recognition. The journey begins with a comprehensive introduction to the MNIST dataset, which contains 60,000 training images and 10,000 test images, each representing digits from 0 to 9 in grayscale. This foundational knowledge prepares viewers for deeper learning.
Following the introduction, the video turns to data preprocessing. Each 28x28 image is reshaped into a 784-element vector, and pixel values are normalized from the 0-255 range to the 0-1 range by dividing by 255. One-hot encoding is then explained: each label from 0 to 9 is converted into a 10-element vector with a 1 at the position of its class and 0 elsewhere.
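The preprocessing steps described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the code from the video: the random array `images` and the `labels` array are hypothetical stand-ins for the MNIST data, and `np.eye` indexing is used for one-hot encoding in place of the Keras helper mentioned in the tutorial.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 5 random 28x28 grayscale images with labels.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
labels = np.array([2, 0, 9, 7, 1])

# Flatten each 28x28 image into a 784-element row vector.
num_pixels = images.shape[1] * images.shape[2]
flat = images.reshape(images.shape[0], num_pixels).astype("float32")

# Scale pixel values from the range [0, 255] to the range [0, 1].
flat /= 255.0

# One-hot encode: row i has a 1 at column labels[i] and 0 elsewhere.
num_classes = 10
one_hot = np.eye(num_classes, dtype="float32")[labels]

print(flat.shape, one_hot.shape)
```

With real MNIST data the same operations would turn the 60,000 training images into a 60,000 x 784 matrix and the labels into a 60,000 x 10 matrix.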
The final segments of the video are dedicated to model training and evaluation. Sathiesh explains how to use scikit-learn's train_test_split function to set aside 10% of the training data for validation, so that the k-NN model's parameters can be tuned without touching the test set. By trying 'k' values of 1, 3, 5, 7, and 9, learners see how the choice affects validation accuracy. The tutorial concludes with a performance analysis on the test data using a classification report, which lists precision, recall, and F1 score for each digit.
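The tuning and evaluation workflow can be sketched as follows. This is an illustrative sketch rather than the video's notebook, and it makes several assumptions: scikit-learn's small bundled `load_digits` dataset (8x8 images) stands in for MNIST so that nothing needs to be downloaded, integer labels are used instead of one-hot vectors, and the `test_size=0.25` and `random_state=42` values are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption): scikit-learn's bundled 8x8 digits, not MNIST.
digits = load_digits()
X = digits.data / 16.0  # pixel values in this dataset range from 0 to 16
y = digits.target

# Hold out a test set, then split 10% of the training data off for validation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.1, random_state=42)

# Try odd k values 1, 3, 5, 7, 9 and record the validation accuracy of each.
k_vals = list(range(1, 10, 2))
accuracies = []
for k in k_vals:
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)
    accuracies.append(model.score(X_val, y_val))

# Pick the k with the highest validation accuracy and retrain with it.
best_k = k_vals[int(np.argmax(accuracies))]
model = KNeighborsClassifier(n_neighbors=best_k)
model.fit(X_train, y_train)

# Evaluate once on the untouched test set.
predictions = model.predict(X_test)
print(classification_report(y_test, predictions))
```

Keeping the test set out of the tuning loop means the final report reflects data that played no part in choosing k.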
Chapters
- 00:00 - 00:30: Introduction The 'Introduction' chapter welcomes learners to the course on practical machine learning using Google Colab. The instructor mentions that the tutorial will cover handwritten digit recognition using the k-nearest neighbor (k-NN) classifier. Viewers are encouraged to watch a foundational video on the k-NN classifier before proceeding.
- 00:30 - 01:30: Overview of MNIST Dataset The chapter provides an overview of the MNIST dataset, which is commonly used for handwritten digit recognition. It contains grayscale images of handwritten digits from 0 to 9 (10 classes), each 28x28 pixels with pixel values from 0 to 255. The dataset provides 60,000 training images and 10,000 test images, which the tutorial uses to train and test a k-nearest neighbor (k-NN) classifier.
- 02:00 - 03:00: Preparing the Google Colab Environment The instructor opens Google Colab, selects a GPU in the notebook settings, and connects the notebook to the cloud environment. Once connected, the first step is importing the required packages, starting with the print function and sklearn.
- 03:00 - 05:00: Importing Libraries and Packages From sklearn, the instructor imports train_test_split (for splitting data into training, validation, and test sets), the KNeighborsClassifier class, and the classification report (for precision, recall, and F1 score). From Keras, which bundles several datasets such as Boston housing price regression, he imports the MNIST dataset and the np_utils helper used for one-hot encoding. NumPy is imported as np for array and matrix computation, along with some computer vision packages.
- 05:30 - 08:30: Loading and Reshaping the Data The data is loaded with mnist.load_data, which returns x_train and y_train (60,000 training images and labels) and x_test and y_test (10,000 test images and labels). Each 28x28 image is then reshaped into a 784-element vector, giving a 60,000 x 784 training matrix and a 10,000 x 784 test matrix.
- 08:30 - 09:30: Data Normalization Because the grayscale pixel values range from 0 to 255, the training and test data are divided by 255 so that every pixel value lies between 0 and 1.
- 09:30 - 12:30: One-Hot Encoding Each label (0 to 9) is converted into a 10-element vector with a 1 at the position of its class and 0 elsewhere, using np_utils. This is done for both training and test labels, and the number of classes (10) is read from the width of the encoded label matrix.
- 13:00 - 15:30: Splitting Data for Validation Since k-NN has parameters to tune, such as the k value and the distance metric, train_test_split is used to set aside 10% of the training data for validation, with a fixed random state for reproducibility. This yields 54,000 training samples, 6,000 validation samples, and the original 10,000 test samples.
- 15:30 - 19:30: Tuning K-Value for KNN The k values 1, 3, 5, 7, and 9 are tried in a loop. For each one, a KNeighborsClassifier is trained with model.fit and scored on the validation data with model.score, and the accuracy is appended to a list. The reported accuracies are 97.48% for k = 1 and k = 3, 97.12% for k = 5, and 96.77% for k = 7, with k = 9 lower still.
- 19:30 - 21:00: Retraining Model with Optimal K-Value np.argmax is used to find the index of the highest validation accuracy, which gives k = 1. The model is re-initialized with that k value and retrained on the training data and labels.
- 21:00 - 23:30: Model Evaluation Using Test Data The retrained model is applied to the test data with model.predict, and the classification report compares the predictions with the ground-truth labels in y_test. The report lists precision, recall, and F1 score for each digit (for digit 0: precision about 0.98, recall about 0.99, F1 score about 0.98), along with averages and the support, which is the number of test images per digit (980 for digit 0, 1,028 for digit 7).
- 23:30 - 25:00: Conclusion The instructor closes by noting that this is how classification can be done with a k-nearest neighbor classifier, then thanks viewers and invites them to subscribe.
Fundamentals of Machine Learning - Part 4 - Hand written digit Recognition using k- NN Classifier Transcription
- 00:00 - 00:30 hello learners welcome to this course on practical machine learning using google colab so in this tutorial i will be describing handwritten digit recognition using the k nearest neighbor classifier so i strongly recommend the viewers to look at my earlier video on the fundamentals of k-nearest neighbor classifier before progressing into
- 00:30 - 01:00 this tutorial so the mnist data set is a handwritten digit recognition data set so here if you see the mnist data set it contains the handwritten digits like this from digit 0 to digit 9 so it has 10 classes
- 01:00 - 01:30 where each image contains a digit and it is grayscale in nature so it has a pixel value varying from 0 to 255 the size of each image is 28 by 28 so 28 pixels along the row and 28 pixels along the column and it's a grayscale image and this mnist
- 01:30 - 02:00 data set contains about 60,000 training images and 10,000 test images so we make use of this mnist data set for training our k nearest neighbor classifier so we will be using this data set to categorize the digits by using a k nearest neighbor classifier so let us go on to this program
- 02:00 - 02:30 by opening the google colab environment so here i will set the notebook settings go to edit notebook settings so let me make use of a gpu and let us start our program by connecting it to the cloud environment let it get connected to the cloud environment so once it gets connected you will get this connected
- 02:30 - 03:00 right so now it is connected to the cloud environment the first step which we need to do is import the packages required for this program to work so i am going to import the print function because we might be requiring the print function to print some parameters then i am going to use sklearn which is a machine learning package so this sklearn contains a lot of
- 03:00 - 03:30 machine learning libraries so we are going to make use of this sklearn package from which i am going to import a function called train_test_split which is used for splitting your data into training and test data and the training data can also be split up into training and validation data so for that we make use of this train_test_split function again from the sklearn package we have a
- 03:30 - 04:00 class called KNeighborsClassifier which will be performing your k nearest neighbor classification process then sklearn also contains a classification report where i can see the performance metrics like precision recall f1 score etc then from keras keras is an advanced machine learning package which contains different data sets so for example if i go into google and type keras
- 04:00 - 04:30 data sets it opens this link here you can find the different data sets available in keras boston housing price regression data so like that we have some number of data sets available in the keras package so we are going to
- 04:30 - 05:00 make use of this mnist data set for our process so from keras we import the mnist data set and the labels of this mnist data set will be in terms of 0 to 9 so we need to convert this label into a vector so for that we use a procedure called the one hot encoding method in order to do the one hot encoding
- 05:00 - 05:30 method we make use of the utils from the keras package import the np_utils then we are going to use the numpy library which is a numerical python library mainly used for array and matrix computation and in short form we are going to use it as np in our code and we will be making use of some computer vision library packages also so we import all the necessary functions which we require for our program so execute the cell
- 05:30 - 06:00 then let us go and load the data so the first step is from the keras library i need to load the data into our program so for that i make use of the mnist.load_data function which loads the data into these output variables where x_train is the training data and y_train is the training data
- 06:00 - 06:30 labels same way x_test is the test data and y_test is the test data labels so x_train and y_train contain 60,000 samples of data along with labels same way x_test and y_test contain 10,000 data along with its labels so those get loaded into our program it gets downloaded
- 06:30 - 07:00 once it is downloaded we need to reshape it so in order to do the reshaping i am going to find the characteristics of the image by using .shape so .shape of 1 gives me the width information .shape of 2 gives me the height information so the height of the image and width of the image are found so x_train is the training data
- 07:00 - 07:30 so i find the training data width as well as the height so we know the width and height is nothing but 28 by 28 so when i multiply 28 by 28 i get 784 so 784 is assigned to the number underscore pixel then i try to reshape the vector x_train is the training data i am going to reshape so i say i'm going to have
- 07:30 - 08:00 .shape of 0 will give me the number of samples in the data set for training data i have about 60,000 data so we have 60,000 comma 784 so i reshape the matrix in terms of the first sample 784 elements second sample 784 elements third sample 784 elements so like that it goes on till the sixty thousandth sample 784 elements
- 08:00 - 08:30 so the 784 elements are nothing but the pixel values of the image the same process we do for the test data also so here .shape of zero gives me ten thousand test data and comma 784 which means again first data 784 elements second data 784 elements it goes until the ten thousandth sample 784 elements then we try to print the training data
- 08:30 - 09:00 shape of zero and when we visualize it we get about sixty thousand because .shape of zero will give me the number of samples in our list now once we have done reshaping our data we can normalize our data because our image contains pixel values between 0 and 255 since it is a grayscale image so we can convert these pixel values from 0
- 09:00 - 09:30 to 255 to 0 to 1 by doing x_train divided by 255 255 is the maximum value of my pixel so we can change the range of pixel values from 0 to 255 to 0 to 1 so we do it for both the training data and test data so we try to normalize the data so that the pixel values lie between 0 and 1 next step
- 09:30 - 10:00 in our program is we have labels associated with each and every data so this label for example the first image might be a digit two so the label will be two same way the fifth image might be nine and the label will be nine so we need to convert this label into a vector form so it will be easy for us to do the classification part so in order to do that we perform one hot encoding method
- 10:00 - 10:30 so one hot encoding is a very simple method where i convert a label into a vector so for example if the image contains digit one then i use a vector element for each class so the first element is for class digit zero second element is for digit one third element is for digit two digit three digit four digit five digit 6 digit 7
- 10:30 - 11:00 digit 8 digit 9 so for each class i have an element in my vector so depending upon what type of label it is for an image i assign 1 to that position so here in case of a digit one i place one at this position because this position represents the class one or digit one same way if my image contains digit seven it comes at this position so 0
- 11:00 - 11:30 1 2 3 4 5 6 7 so at the 7th position we try to put one for all the other labels we put it as zero so this process is called the one hot encoding process so to formulate this we make use of this np_utils function to change it to be in this vector form so we do it both for your training
- 11:30 - 12:00 labels as well as for testing labels so we convert our labels into one hot encoded vectors so again here when we see this y_train and y_test which is now a vector form of the labels so .shape of 0 will give me the number of samples .shape of 1 will give me the
- 12:00 - 12:30 width information so the width contains 10 classes so the 10 classes i try to take into this variable num_classes so y_test .shape of 1 will give me the width information so it represents the number of classes if the width is say some 20 elements then the number of classes will be 20 if the width is some five elements then the number of classes will be five in our case our width is ten elements so the number of classes
- 12:30 - 13:00 will be ten so i try to print it and see how many classes are there in our dataset so it says we have 10 classes then we start our process of machine learning so when we obtain the data set itself we get separate training data and test data so we need not worry about splitting the data into training and test data but we
- 13:00 - 13:30 have in k nearest neighbor classifier we have some parameters to be tuned something like a k value needs to be tuned or the distance metric can be tuned so in order to do that we need to divide our training data into training data and validation data so for that what we do is we make use of this train_test_split function we give the training data along with the training labels
- 13:30 - 14:00 and specify the split ratio so i want 10 percent to be split off from this training data and this 10 percent will be used for your validation or for estimating the parameters of your classifier so in order to have reproducibility in our result we set this random state to some value so usually this value should be greater than 40.
- 14:00 - 14:30 so once i do this split up i get training data and its training labels same way validation data with validation labels so where this validation data and validation labels is the ten percent of your original training data so then we print this data and see how many data is available in training category how many data is available for validation category how many data is available for test category
- 14:30 - 15:00 so when i print it and see you can see initially we had training data of about 60,000 and test data of about 10,000 so the test data remains same whereas the training data 60,000 we have divided into 54,000 for training and 6,000 for validation so this validation data is important to identify which parameter gives me the best result best performance metrics
- 15:00 - 15:30 so once this is done let me initialize the k value so i am going to take k value in the range of 1 to 10 with an increment of 2 so k value i am going to take it as 1 3 5 7 and 9 so remember in python the stop index will not be included for your computation so it will go just before your stop index so we start with one and
- 15:30 - 16:00 end with 10 which means it will go before 10 so before 10 the legal value is your 9 so it will compute till nine so we are going to assign these k values to this k values variable and in order to append the accuracies for each k value i am creating a list right now it is an empty list then we go and
- 16:00 - 16:30 change the k value and see our performance on the validation data so first let me use the for loop where k changes from 1 to 10 with an increment of 2 so first the k value is taken to be 1 and i am going to use that k value for initializing our KNeighborsClassifier we create a model then once the model is created for k value 1 then i am going to do the training
- 16:30 - 17:00 process so the training process is done by using this model.fit function so i give both training data and training labels for the training purpose once the model gets trained i need to validate it right with the k value 1 i need to see what is the performance metric so for that i use a function called model.score i provide validation data and validation labels
- 17:00 - 17:30 obtain the score then append that score to the accuracy list so when the for loop runs for the first time i will get k value equal to 1 and whatever accuracy we got it will be appended when the loop runs for the second time my k value will be 3 and whatever accuracy i get will get appended to the accuracy list same way for k 5 the accuracy will be appended k 7 accuracy will be appended and k 9 accuracy will be appended so we have created
- 17:30 - 18:00 a list so in this list we need to find which k value gave me a better performance metric better accuracy so in order to do that we use this np.argmax function to find which index or which k value gave me the highest accuracy so that index i try to
- 18:00 - 18:30 take it here i try to compute here so here you see right now the code is running for k value 1 we have obtained an accuracy of about 97.48 same way it will be computed for k value 3 5 7 and 9 so for k value 3 also we got a similar accuracy of 97.48 percent so like that it will be listed till k
- 18:30 - 19:00 value nine once it is done we use this np.argmax function to find which index gave me the highest accuracy then obtain the k value for which we got the highest accuracy and retrain the model with that k value so for k value equal to 5 you can see the accuracy got a little bit reduced 97.12 percent
- 19:00 - 19:30 for k value 7 it got reduced further 96.77 percent let us wait for some more time so that the k value will come to nine so here you can see for k value 9 it is reduced still more so we can find that either i can use a k value of 1 or 3
- 19:30 - 20:00 both have resulted in a similar accuracy so we will find the index 0 and 1 which has a k value of 1 and 3 to be having the highest accuracy so either i can use index 0 or 1 for retraining the model so we find the k value with the highest accuracy so k equal to 1 achieved the highest accuracy of 97.48 on test data
- 20:00 - 20:30 so now we use this k value for retraining our model so again let me initialize the model with the k value which got the highest accuracy once the model is created we again do the retraining process on the training data and training labels once it is done we can expose the model to the test data so to expose a model to the test data i use the function called model.predict here i specify only the test data
- 20:30 - 21:00 then the predictions are updated here once the predictions are obtained we can print the data by using the classification report so the classification report will take both the ground truth value so y_test is the ground truth value of the test data and the prediction is from our machine so we can make use of these two
- 21:00 - 21:30 data to see the performance metrics like precision recall f1 score etc so now let us print the classification report here you can see the digit-wise performance metrics for digit 0 the precision is about 0.98 recall is about 0.99 f1 score is about 0.98 same way for each digit
- 21:30 - 22:00 an individual performance metric is obtained and also we can see the average value of each performance metric where this support indicates the number of images containing digit 0 in the test data so for example 980 images are there in the test data with digit 0 same way 1028 images are there in the test data with digit 7 so like
- 22:00 - 22:30 that out of 10,000 images each category contains this many number of images so that is what is represented by the support functionality so this is how a classification can be done using a k nearest neighbor classifier thanks for watching please subscribe for more technical
- 22:30 - 23:00 learn thank you
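The transcript notes that k = 1 and k = 3 reach the same accuracy and that k = 1 ends up being selected. A small sketch shows why: `range` excludes its stop value, and `np.argmax` returns the first index when the maximum is tied. The variable names `k_vals` and `reported` are hypothetical, and only the four accuracies quoted in the transcript are used; the k = 9 value is not stated there, so it is left out.

```python
import numpy as np

# range(1, 10, 2) excludes the stop value 10, giving the odd k values up to 9.
k_vals = list(range(1, 10, 2))
print(k_vals)

# Accuracies quoted in the transcript for k = 1, 3, 5, 7.
# The k = 9 figure is not stated in the transcript, so it is omitted here.
reported = [0.9748, 0.9748, 0.9712, 0.9677]

# np.argmax returns the first index of the maximum, so a tie between
# k = 1 (index 0) and k = 3 (index 1) resolves to k = 1.
best_index = int(np.argmax(reported))
best_k = k_vals[best_index]
print(best_k)
```

If the larger of two tied k values were preferred instead, the tie would have to be broken explicitly, since `np.argmax` alone always picks the earliest entry.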