Understanding AI through Intuition

AI4E V3 Module 2

Estimated read time: 1:20


    Summary

    In Module 2 of the AI for Everyone course, the focus is on providing an intuitive understanding of how AI systems work. The module breaks down complex concepts by showing AI's applications in analyzing tabular data, recognizing images, processing natural language, and understanding speech. Participants are introduced to the fundamental mathematical principles such as linear regression and neural networks, demonstrating that AI is essentially sophisticated mathematics. Practical examples, like predicting housing prices and facial recognition systems, are used to illustrate the concepts, highlighting the importance of math skills acquired in secondary school as foundational for understanding AI.

      Highlights

      • AI is essentially a complex form of maths, making it accessible with basic math skills! 🎓
      • Predict housing prices efficiently using linear regression techniques. 🏡
      • Neural networks transform how AI interprets data, from housing prices to speech. 🔍
      • Facial recognition involves matching vector profiles to stored data using cosine similarity. 🖼️
      • NLP pipelines convert language into vectors for AI training and sentiment analysis. 📖
      • Speech processing involves transforming audio waves into vectors for recognition. 🔊
      • Advanced AI models like Megatron-Turing use billions of parameters for tasks like NLP. 🤖

      Key Takeaways

      • AI is essentially advanced mathematics, not magic! ✨
      • Linear regression can be used to predict outcomes in various scenarios, like housing prices. 🏠
      • Neural networks are complex versions of simple mathematical equations. 📐
      • Facial recognition systems use neural networks trained with vast datasets. 🤳
      • Natural Language Processing (NLP) transforms text into numeric vectors for analysis. 💬
      • AI is trained using real data and algorithms such as gradient descent. 📊
      • Understanding local language nuances is crucial for accurate speech processing. 🗣️
      • The skills learned in secondary school math are foundational for grasping AI concepts. 📚

      Overview

      Module 2 dives deeper into understanding how AI operates by simplifying complex systems into relatable examples and accessible math. From predicting real estate prices using linear regression to understanding the fundamentals of neural networks, this module breaks down AI into its mathematical essence. It's all about seeing AI as an extension of what you might have learned in school, even as it tackles tasks as sophisticated as facial recognition.

        Participants are guided through the applications of AI in recognizing images and processing language. The power of AI is made apparent through examples like facial recognition systems leveraging trained neural networks. These systems classify images, analyze sentiments in text, and even transform audio waves into interpretable data, demonstrating AI's broad range and capability.

          By demystifying AI and connecting it back to math principles, learners are empowered to see AI systems as an extension of what they learned in math class. The module emphasizes that AI is not an enigma but a tool grounded in math fundamentals accessible to those familiar with them, preparing participants for real-world AI applications in future modules.

            AI4E V3 Module 2 Transcription

            • 00:00 - 00:30 Welcome back to Module 2 of AI for Everyone. In this section we'll help you get an intuitive understanding of how AI systems work. We're not going to be very robust or rigorous here; the intent is to give you a good idea of how AI works and to appreciate that AI is really just maths. We'll show you how AI can be used on tabular data, how AI can see, how AI reads, and how AI can hear. Once you understand this, you can better
            • 00:30 - 01:00 understand the various AI applications and systems that you interact with every day. We briefly discussed y = mx + c in Module 1 of this course; let's dive a little deeper here. Let's assume we can predict HDB prices with just the floor area. We collect the prices of recently sold HDB flats and note the corresponding floor areas. Say we collected four data points, as shown.
            • 01:00 - 01:30 It looks like we can build a model by drawing a straight line through the points. One common method to fit a straight line is the least squares method. The basic idea is to minimise the errors: the distances between the data points and the estimated line. The error function is as shown, and from your secondary school maths, to minimise the function you differentiate the error function and set the result to zero. If you work out the maths, you get the final equations for m and c.
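The least squares fit just described can be sketched in a few lines of Python. The four (floor area, price) data points below are made up for illustration; the formulas for m and c come from setting the derivatives of the squared-error function to zero.

```python
# Fitting y = m*x + c with least squares. Data points are made up.
areas  = [70.0, 90.0, 100.0, 120.0]            # floor area in square metres
prices = [250_000, 310_000, 340_000, 400_000]  # resale price in SGD

n = len(areas)
mean_x = sum(areas) / n
mean_y = sum(prices) / n

# Differentiating the error function and setting it to zero gives:
m = sum((x - mean_x) * (y - mean_y) for x, y in zip(areas, prices)) \
    / sum((x - mean_x) ** 2 for x in areas)
c = mean_y - m * mean_x

# Predict the price of an 80 m^2 flat with the fitted line.
predicted = m * 80 + c
print(m, c, predicted)   # 3000.0 40000.0 280000.0
```

With these made-up points the line works out to y = 3000x + 40000, so every extra square metre adds $3,000 to the predicted price.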
            • 01:30 - 02:00 Once the model has been built, we can determine the price of an HDB flat by entering the floor area. However, the real world is more complicated: a better model to predict the HDB flat price should probably include not only the floor area but also which floor the flat is on and whether it is near an MRT station or a school.
            • 02:00 - 02:30 Fortunately, we can extend the earlier y = mx + c into y = m1x1 + m2x2 and so on. This is known as multiple linear regression, a form of machine learning algorithm: simple but very powerful and easy to understand. Do note that not all problems can be modelled this way. Assuming the HDB price can be modelled linearly, using the same least
            • 02:30 - 03:00 squares method, but this time for multiple linear regression, we can find the values of m1, m2, m3, m4 and c. With m1, m2, m3, m4 found, we can now apply the model to, say, a 120 square metre flat near a school, no MRT, on the fifth floor. If we plug in the numbers, we get $300,000.
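A least-squares fit with several inputs can be sketched with NumPy. Everything here (the training rows, the prices, the resulting coefficients) is made up for illustration; only the feature layout (floor area, storey, near MRT, near school) follows the transcript.

```python
import numpy as np

# Made-up training rows: [floor area m^2, storey, near MRT (0/1), near school (0/1)]
X = np.array([
    [ 90.0,  3, 1, 0],
    [110.0, 10, 0, 1],
    [ 70.0,  7, 1, 1],
    [130.0, 15, 0, 0],
    [100.0,  5, 1, 0],
])
y = np.array([320_000, 380_000, 300_000, 420_000, 350_000])  # prices in SGD

# Append a column of ones so the intercept c is fitted alongside m1..m4.
A = np.hstack([X, np.ones((len(X), 1))])
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)   # [m1, m2, m3, m4, c]

# Predict a 120 m^2 flat on the 5th floor, near a school, no MRT.
flat = np.array([120.0, 5, 0, 1, 1.0])
print(coeffs, flat @ coeffs)
```

The `@` in the last line is exactly the weighted sum m1x1 + m2x2 + m3x3 + m4x4 + c from the slide.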
            • 03:00 - 03:30 However, note that the algorithm has no concept of what floor area is, which floor the flat is on, or what an MRT or a school is; to the algorithm these are just variables. We will now show intuitively how an artificial neuron works, using the HDB price prediction as an example. This neuron was shown earlier: the weighted inputs are summed, and an activation function is then applied. The activation function is modelled after how a brain neuron works:
            • 03:30 - 04:00 if the inputs are strong enough, it fires. Now we overlay the HDB example, and you can see how the same multiple linear regression can be mapped into the form of a neuron. Intuitively, a neural network is nothing more than a more complicated version of the familiar y = mx + c, and a neat way to represent this is through vectors and matrices,
            • 04:00 - 04:30 again something you learned in secondary school. Let's extend our intuition further: the input layer is connected to a node in the hidden layer, which further connects to an output node in the output layer. Typically the hidden layer is a bit more complicated than just one node; we can add another node and connect them as shown.
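The layered structure just described can be sketched as weighted sums plus an activation function. This tiny network has 4 inputs, 3 hidden nodes, and 1 output, which gives 4×3 + 3×1 = 15 weights; all weight and input values below are arbitrary illustrative numbers, not trained ones.

```python
import math

def sigmoid(z):
    # A common activation function: "fires" (approaches 1) for strong inputs.
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W_hidden, w_out):
    # Input -> hidden: a weighted sum per node, then the activation function.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W_hidden]
    # Hidden -> output: one more weighted sum.
    return sum(w * h for w, h in zip(w_out, hidden))

x = [1.2, 0.5, 0.0, 1.0]            # e.g. scaled flat features
W_hidden = [[0.4, -0.2, 0.1, 0.3],  # 12 weights into the 3 hidden nodes
            [-0.1, 0.5, 0.2, -0.3],
            [0.2, 0.1, -0.4, 0.6]]
w_out = [0.7, -0.4, 0.3]            # 3 weights into the output node
n_weights = sum(len(row) for row in W_hidden) + len(w_out)
print(n_weights, forward(x, W_hidden, w_out))   # 15 weights in total
```

Each list-of-lists here is just the matrix form mentioned above: the forward pass is a vector-matrix multiplication followed by the activation.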
            • 04:30 - 05:00 And then another node: this is now a typical diagram for a neural network. You can also see that the mathematics involved from the input layer to the hidden layer is your secondary school maths of vectors and matrices. Question: how many weights, or m's, are there in this neural network? 15. So we need a way to find the 15 m's, or parameters. Because of the non-linear activation
            • 05:00 - 05:30 function used, we cannot use the least squares method here; we need another way. Let's use predicting HDB flat prices again, with the neural network constructed earlier. We initialise the weights m1 to m15 with some random values; it doesn't really matter what they are in the beginning. We take the first row of HDB price data and feed it to the neural network. This input data is
            • 05:30 - 06:00 often represented as a vector. We compute the predicted price, which will obviously be incorrect because the weights were randomly initialised. The computation in this forward pass is basically vector-matrix multiplication. We now compute the error: if the predicted price is higher than the actual price, we do a backward pass and adjust the weights downwards;
            • 06:00 - 06:30 similarly, if the predicted price is lower than the actual price, we increase the weights. We then move on to the next set of price data and continue to adjust the weights based on the errors. This technique of propagating the errors backwards is known as backpropagation. Of course, we do not just adjust the weights randomly: we use maths, specifically minimisation of the error function, similar to what you saw in the earlier slides, and a technique known as gradient descent; again,
            • 06:30 - 07:00 maths you will likely have learnt in secondary school. We may need thousands of training examples to find an optimal set of weights that can predict the price of an HDB flat correctly. One question you may have is: how do I know how many nodes and hidden layers I need, or how many rows of data? Well, it is often more art than science, and involves lots of trial and error.
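The training loop just described (forward pass, compute the error, nudge the weights against it) can be sketched with gradient descent on the simplest possible model, y = mx + c. The data below is made up and pre-scaled (areas divided by 100, prices by 100,000) so the numbers stay small.

```python
# Made-up, pre-scaled (floor area, price) training data.
data = [(0.7, 2.5), (0.9, 3.1), (1.0, 3.4), (1.2, 4.0)]

m, c = 0.0, 0.0   # initial weights: the first predictions will be wrong
lr = 0.1          # learning rate: how big each correction step is

for _ in range(5000):
    for x, y in data:
        pred = m * x + c      # forward pass
        err = pred - y        # positive if we over-predicted
        # Backward pass: step against the gradient of the squared error,
        # so over-predictions shrink the weights and under-predictions grow them.
        m -= lr * 2 * err * x
        c -= lr * 2 * err

print(m, c)   # converges to roughly m = 3.0, c = 0.4
```

In a real neural network the same idea is applied layer by layer (backpropagation), and the gradient involves the activation functions, but the shape of the loop is identical.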
            • 07:00 - 07:30 The earlier neural network only has 15 weights, or parameters, to optimise. On 11 October 2021, Microsoft and NVIDIA announced Megatron-Turing, a neural network with 530 billion parameters: hundreds of hidden layers with many nodes per layer. Megatron-Turing is a neural network for natural language processing. Now let's explore how a computer sees. Images are represented as
            • 07:30 - 08:00 pixels in a computer. In a black-and-white picture, wherever the picture appears the pixel is a one, and where there is nothing it is a zero. The number one here is shown in a five-by-five pixel square. We can stretch out the pixels vertically, as shown, and it becomes a vector of zeros and ones, which we can use to train a neural network. This neural network will have an input layer of 25 nodes, and if we use a hidden layer with four nodes, we will end up with
            • 08:00 - 08:30 a hundred m's, or parameters, to find. As we want to predict the digits one to nine, including zero, we could have an output layer with 10 nodes, representing the values 0 to 9. There is a total of only 140 parameters in this simple example with a 25-pixel image. Today's cameras, and even your smartphone, have sensors that are 24 megapixels or bigger.
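The stretching-out step and the parameter count can be sketched directly. The 5×5 image of the digit "1" below is a made-up example in the spirit of the slide.

```python
# A 5x5 black-and-white image of the digit "1": 1 = ink, 0 = background.
image = [
    [0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
]

# Stretch the pixels out into a single 25-long vector of zeros and ones.
vector = [pixel for row in image for pixel in row]

# Weight count for a 25 -> 4 -> 10 fully connected network, as in the slide.
n_params = 25 * 4 + 4 * 10
print(len(vector), n_params)   # 25 140
```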
            • 08:30 - 09:00 Which means at least 100 million parameters. Neural network researchers have developed more advanced techniques, such as convolutional neural networks, or CNNs, that reduce the number of parameters required. Now that you understand how a computer can see an image and be trained to recognise it, let's briefly discuss a common computer vision system you may encounter in the office: facial recognition entry systems. Typically the vendor trains the
            • 09:00 - 09:30 neural network with millions of pictures of faces, each tagged with a corresponding unique ID, typically a 128-dimensional vector. Once the model has been trained, it is deployed into your office. Your company then asks you to provide your latest photo; the photo is shown to the neural network, and it generates a unique ID, a 128-dimensional vector. This ID, together with your name and staff code, is saved into a database. When you come back to the office after the weekend
            • 09:30 - 10:00 and try to enter, your picture is captured at the door, and the same neural network generates a unique 128-dimensional vector. The system then finds the closest match to this vector. To see how close two vectors are, you can use cosine similarity, a technique you probably studied in your secondary school trigonometry class. Why do we not try to find the exact same 128-dimensional vector?
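Cosine similarity is just the dot product of two vectors divided by the product of their lengths: 1.0 means they point the same way, 0 means they are unrelated. The short "face vectors" below are made up (a real system would use all 128 dimensions).

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

stored   = [0.10, 0.80, 0.30, 0.55]   # from the photo you submitted
captured = [0.12, 0.78, 0.33, 0.50]   # from the camera at the door
print(cosine_similarity(stored, captured))   # close to 1.0: likely a match

# Entry would be granted if the similarity clears a vendor-chosen threshold.
```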
            • 10:00 - 10:30 Well, remember that the vector stored in the database was based on the photo you submitted, which could have been taken a few months or a few years ago, or Photoshopped. Over the weekend you went to Sentosa and got a tan, or you now sport a different haircut, or you have lost weight. These differences mean the neural network would generate a 128-dimensional vector that differs from the one stored in the database from your original photo.
            • 10:30 - 11:00 But this is okay: we only need to find the closest match, and the vendor will work with your company to determine the level of closeness that best fits the company's security policies. Now let's discuss natural language processing, or NLP. NLP uses techniques from computer science, AI, and linguistics. It has been used to classify documents, translate between languages, power chatbots, and auto-complete sentences, a
            • 11:00 - 11:30 feature I find very useful in the Gmail email client. Computers can only work with numbers, specifically vectors and matrices, so text data needs to be converted into vectors. Here is a typical NLP pipeline. Say we have the following sentences: "This laksa is spicy. We love it." We can do sentence segmentation to convert the text into two sentences,
            • 11:30 - 12:00 and then word tokenisation to break the sentences into individual words. We may also want to remove stop words, words like "is", "the", "this", to reduce the number of words for the algorithm. We then apply stemming or lemmatisation to get to the root word; for example, the root word for "spicy" here would be "spice". Once the data is clean, we can apply various algorithms to convert it into vectors. Let's see a simple example: how do we convert the sentence "This laksa is spicy, we love it" into a vector? The typical vocabulary of a person
            • 12:00 - 12:30 is thirty thousand words, so assume we have a dictionary of thirty thousand words, as shown. We place a one at each location of the dictionary where a word in the sentence appears, so we end up with a thirty-thousand-long vector with lots of zeros and only seven ones. With the vectors formed, we can feed them to a neural network and train it.
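This "place a one at each dictionary location" scheme is known as a bag-of-words vector, and it can be sketched in a few lines. A real vocabulary would have around 30,000 entries; the made-up dictionary below has 10 so the vector fits on screen.

```python
# A tiny bag-of-words example with a made-up 10-word dictionary.
vocabulary = ["this", "laksa", "is", "spicy", "we", "love",
              "it", "cat", "house", "dog"]

sentence = "this laksa is spicy we love it"

# Place a 1 at each vocabulary position whose word appears in the sentence.
words = set(sentence.split())
vector = [1 if word in words else 0 for word in vocabulary]
print(vector)   # [1, 1, 1, 1, 1, 1, 1, 0, 0, 0] - seven ones, the rest zeros
```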
            • 12:30 - 13:00 For example, for positive and negative sentiments: here is a positive sentiment that is tagged with a 1, and we use supervised learning to train the neural network. This is another sentence with a negative sentiment, tagged with a -1. We will of course need thousands of labelled sentences, some negative and some positive, to train the neural network. Once the neural network has been trained, it can be used to classify positive
            • 13:00 - 13:30 and negative sentiments in text, or in comments made on social media, for example. Note that when using the trained neural network, there is no need to label the sentences any more. Thirty-thousand-long vectors don't really make sense, so researchers have developed algorithms to convert text into shorter vectors, typically 32- or 128-dimensional. There are several methods for text vectorisation, but we will not cover them here.
            • 13:30 - 14:00 More importantly, word vectors have been found to embed meaning. For example, the words "puppy" and "dog" are close to each other, since a puppy is a young dog, whereas you would not expect the word "cat" or "house" to be close to the "dog" vector. Another interesting aspect of word vectors is that you can do mathematics with them: for example, king - man + woman = queen.
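The king - man + woman example can be sketched with tiny hand-written 3-dimensional embeddings. Real word vectors are 100+ dimensional and are learned from text, not written by hand like these; the values below are made up so that the analogy works out.

```python
# Made-up 3-d word embeddings, chosen so king - man + woman lands on queen.
embeddings = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.2, 0.1],
    "man":   [0.5, 0.9, 0.0],
    "woman": [0.5, 0.3, 0.0],
    "dog":   [0.1, 0.4, 0.9],
}

def add(a, b):  return [x + y for x, y in zip(a, b)]
def sub(a, b):  return [x - y for x, y in zip(a, b)]
def dist(a, b): return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# king - man + woman should land closest to queen.
target = add(sub(embeddings["king"], embeddings["man"]), embeddings["woman"])
closest = min(embeddings, key=lambda w: dist(embeddings[w], target))
print(closest)   # queen
```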
            • 14:00 - 14:30 With images, you converted images into numbers, specifically vectors. With NLP, the same: you converted text into vectors, also known as word embeddings. With speech, what do you need to do? Yes, convert it into a vector again. Let's say we have an audio clip of the word "hello". We slice the audio clip into 20-millisecond slices and use the amplitudes as the values of the vector.
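The slicing step can be sketched as follows. The audio amplitudes below are synthetic stand-ins; a real clip would be read from a WAV file, and real systems extract richer features per slice than a single amplitude.

```python
# Slicing a made-up audio signal into 20 ms frames. At a 16 kHz sampling
# rate, 20 ms corresponds to 16000 * 0.020 = 320 samples per slice.
sample_rate = 16_000
samples_per_slice = int(sample_rate * 0.020)   # 320

# Pretend amplitudes for a 0.1-second clip (1600 samples).
audio = [((i * 37) % 100) / 100.0 for i in range(sample_rate // 10)]

# One representative value per slice (here: the peak amplitude).
vector = [max(audio[i:i + samples_per_slice])
          for i in range(0, len(audio), samples_per_slice)]
print(len(vector))   # 1600 samples / 320 per slice = 5 values
```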
            • 14:30 - 15:00 We know its output value, or label, is the word "hello", so we can now use supervised machine learning to train the neural network. Of course, we have to collect thousands of hours of audio clips, slice them, annotate them, and use the annotated clips to train the neural network. With a trained neural network, we can now present an unlabelled audio clip, which goes through the same process of slicing the audio into 20-millisecond slices
            • 15:00 - 15:30 to extract the values of the vector, but this time we do not know what the output is. We feed the vector to the neural network, and it produces an output. It may generate "hello" or "a low", something close but not necessarily exact. We take the generated output, pass it through a dictionary, and get the best guess. What is important here
            • 15:30 - 16:00 is that you need properly trained annotators to listen carefully to the spoken sentences and label them correctly, often with local nuances, especially for a language like Singlish. This is why speech annotation is often hard to outsource to someone who is not local and may not understand the local slang. I hope by now you can see that AI is really just maths. AI is not magic, and the maths you need to
            • 16:00 - 16:30 understand how AI works intuitively is something you learned in secondary school. This ends Module 2. I hope you now have an intuitive understanding of AI. The maths required to understand how AI works is something you have already studied in secondary school, with techniques like the least squares method, differentiation, and finding the minima and maxima of functions. We also showed the maths behind how AI is used for computer vision,
            • 16:30 - 17:00 natural language processing, and speech processing. In the next module, we'll walk you through several real-world use cases of AI, done right here in Singapore.