R Tutorial For Beginners 2022 | R Programming Full Course In 7 Hours | R Tutorial | Simplilearn

Estimated read time: 1:20

    Summary

    In this engaging and comprehensive R tutorial, Simplilearn guides beginners through a seven-hour course designed to help them master the essentials of the R programming language. The course is led by an experienced trainer, Ajay, and covers a wide range of topics, including variables, data types, logical operators, functions, data manipulation packages such as dplyr and tidyr, and creating visualizations with charts and graphs. The tutorial offers a thorough understanding of R, emphasizing its popularity in the fields of data science and statistical computing, and provides practical examples to help learners apply their new skills to real-world scenarios.

      Highlights

      • Ajay introduces R as a powerful tool for data science and cites a survey of data mining experts in which R is used more often than Python. 💪
      • The course covers the easy setup of R on different operating systems and the benefits of using RStudio. 🖥️
      • In-depth explanations of vectors, lists, data frames, and matrices, including naming conventions and common operations. 📚
      • Instructions on how to use logical operators effectively in R and handle data types efficiently. 🔍
      • Tutorials on grouping, summarizing, and transforming data using dplyr and tidyr packages. 🔄
      • Visualizing data with ggplot2 and understanding its grammar of graphics to create informative charts and graphs. 🎨
      • Special focus on handling time series analysis and performing simple machine learning tasks using R. ⏳
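
The highlight on logical operators can be illustrated with a minimal base R sketch. The values `x = 100` and `y = 200` and the `10 > 20` / `10 < 20` conditions follow the transcript's walkthrough; the variable names on the left-hand side are illustrative assumptions, not names used in the video.

```r
# Values taken from the transcript's example.
x <- 100
y <- 200

# Relational operators compare values and return TRUE or FALSE.
is_equal        <- x == y   # FALSE
is_not_equal    <- x != y   # TRUE
y_greater       <- y > x    # TRUE
y_greater_equal <- y >= x   # TRUE

# Logical operators combine conditions.
both_true   <- (10 > 20) & (10 < 20)   # FALSE: the first condition fails
either_true <- (10 > 20) | (10 < 20)   # TRUE: the second condition holds
negated     <- !c(TRUE, FALSE)         # FALSE TRUE: flips each element

print(c(is_equal, is_not_equal, y_greater, y_greater_equal))
print(c(both_true, either_true))
print(negated)
```

Wrapping each condition in parentheses is optional here, but it keeps the intended grouping explicit when conditions get longer.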

      Key Takeaways

      • Master R programming language essentials with this thorough tutorial led by expert Ajay. 🎓
      • Learn about variables, data types, logical operators, and functions in R. 🤓
      • Discover how to manipulate data with packages like dplyr and tidyr. 🔧
      • Create stunning visualizations with R's graphical capabilities for data analysis. 📊
      • Understand R's role in data science and its comparison with Python. 🐍

      Overview

      The Simplilearn R Tutorial for Beginners is a comprehensive walkthrough of the R programming language, designed to teach you the necessary skills to excel in data science and statistical computing. Led by expert trainer Ajay, this course covers everything from the basics to more complex concepts, ensuring you have a solid foundation in R.

      Throughout this seven-hour course, you'll delve into various topics such as variables, data types, logical operators, and the powerful functionalities of R. You'll also learn about data manipulation using essential packages like dplyr and tidyr, and how to create insightful visualizations that can help in data analysis and decision making.

      By the end of this tutorial, you’ll understand why R is a preferred tool among data scientists and its advantages over Python for specific tasks. You'll gain hands-on experience in setting up R and using RStudio, writing scripts, handling data structures like vectors and matrices, applying functions, and performing time series analysis, all of which are crucial skills in the world of data science.

            Chapters

            • 00:00 - 01:00: Introduction to R Programming R is a popular programming language used by statisticians and machine learning enthusiasts for data analysis and solving business problems.
            • 03:00 - 10:00: Setting up R and RStudio The chapter titled "Setting up R and RStudio" begins by introducing the basics of the R programming language. It covers foundational concepts such as variables and data types. The chapter then explores various R objects, including vectors, matrices, lists, and data frames. Finally, it explains logical operators and functions in R. The chapter progresses to explain data manipulation packages like dplyr and tidyr, and concludes with instructions on creating various charts and graphs in R for data visualization and insight generation.
            • 12:00 - 31:00: Working with Variables and Data Types This chapter introduces the programming language R and explains its significance in data science. It begins with a welcome message and sets the stage for a tutorial on R programming. Ajay, the presenter, highlights R's prominence as a preferred language in data science over other languages like Python, according to surveys of data mining experts.
            • 31:00 - 46:00: Operators and Expressions This chapter discusses programming languages used in data science, specifically R and Python. It explains that while Python is also used, R is more predominant in data science activities. R is an open-source programming language used for statistical computing and is one of the most popular languages today. It was inspired by the S Plus language and is similar to the S programming language, making it particularly suitable for data science applications.
            • 46:00 - 95:30: Control Structures: If, Else, and Loops The chapter introduces fundamental programming control structures such as 'if', 'else', and loops. It highlights the popularity of a programming language, noting its free and open-source nature, and its optimization for vector operations. It also mentions the vibrant community supporting the language, boasting over 9000 community-contributed packages, enabling a wide array of functionalities.
            • 95:30 - 129:00: Functions in R Programming R is an open-source programming language that can be installed for free.
            • 129:00 - 189:00: Working with Vectors and Lists The chapter titled 'Working with Vectors and Lists' provides insights into utilizing R programming for data analysis. It emphasizes the importance of sample data sets, which facilitate easier reporting of analysis results. Before delving into variables, loops, and core functionalities of R, the chapter advises on setting up the R environment properly. This involves accessing resources from R-project, which serves as the foundation for the upcoming topics.
            • 189:00 - 254:00: Data Frames and Matrices The chapter begins with an introduction to obtaining R for statistical computing, guiding the reader to the project's home page.
            • 254:00 - 294:00: Data Manipulation with dplyr and tidyr The chapter discusses the process of installing R on different operating systems including Linux, macOS, and Windows. It provides guidance specifically for Windows users, explaining how to download and start R, and mentions RStudio as an additional package that makes working with R easier. The chapter emphasizes practical steps to get started with R quickly.
            • 294:00 - 338:00: Data Visualization in R The chapter provides a guide on setting up R and RStudio, starting with downloading R from the most suitable CRAN mirror for your location. It advises following the link to download the installer and notes that the file is typically saved in the Downloads folder on your computer. The emphasis is on how to access and start these downloads properly.
            • 338:00 - 682:00: Time Series Analysis Project The chapter provides guidance on setting up the R environment for time series analysis projects. It begins with instructions for installing R, including creating a desktop shortcut for easy access. It then mentions RStudio, a toolkit that works in conjunction with Base R to enhance the user's developmental capabilities. These foundational steps are necessary for initiating work on time series analysis using R.
            • 682:00 - 683:20: Conclusion and Further Learning The chapter discusses the ease of working with R and how to utilize existing scripts or files. It mentions the R console and demonstrates how to open and work with scripts through a menu option. The process includes selecting and opening a pre-written script, providing an editing interface for further development.

            R Tutorial For Beginners 2022 | R Programming Full Course In 7 Hours | R Tutorial | Simplilearn Transcription

            • 00:00 - 00:30 r is a popular programming language used by statisticians and machine learning enthusiasts for performing data analysis and solving business problems hey everyone welcome to this full course video on the r programming language in this video we will learn all the necessary skills that will help you master r programming we have our experienced trainer ajay who will take us through this full course in this video
            • 00:30 - 01:00 you will start off by learning the basics of r first we learn about variables and data types then we look at the different r objects such as vectors matrices lists and data frames next we will understand about logical operators and functions in r going further you will get an idea about the data manipulation packages in r such as dplyr and tidyr and finally you will learn how to create different charts and graphs in r to visualize your data and draw insights
            • 01:00 - 01:30 over to ajay now hi everyone welcome to this tutorial on r let's see what is r programming and how it helps so r is well known as a language of data science now if you really look at the ranking from a survey of data mining experts based on the software they have often used in their work r is used more than python when it comes
            • 01:30 - 02:00 to data science python is also used however r is predominantly more used for data science kind of activities it's an open source programming language used for statistical computing it is one of the most popular programming languages today it was inspired by s-plus and it is similar to the s programming language so when it comes to data science what we can say is r is
            • 02:00 - 02:30 a popularly used programming language across the globe it is free and open source as i mentioned it is optimized for vector operations which we will learn about later it has an amazing community has in fact 9000 plus contributed or community packages allowing us to do almost anything or everything using r now when we talk about features of r
            • 02:30 - 03:00 as i said it's open source programming language so you can install r for free and you can straight away start working you wouldn't have to really go for a licensed version or pay for the software non-coders can also understand and perform programming in r as it is easy to understand and it has various data structures and operators it can be integrated with other programming languages like c c plus plus java and python it consists of various inbuilt packages
            • 03:00 - 03:30 a lot of sample data sets which can be used and that makes reporting the results of an analysis easier by using r now before we start learning about variables loops how you work with r and so on it would be good to know how you can set up r and work on r so for that what you can do is you can just go to r-project.org
            • 03:30 - 04:00 and once we get to the home page of the r project for statistical computing using this link we can click on download r here now that brings you to a page to download it now there are various links here so it shows you the comprehensive r archive network that is cran mirrors and it is available at different urls however i would choose the first one
            • 04:00 - 04:30 which is 0-cloud you can just click on this one and then based on your operating system whether you are working on a linux machine on a macbook or windows you can install it so you can just click on this one as of now i'm using a windows machine so i can click on download r for windows and that takes me to this link which says binaries for base distribution now this is what we can use to work with r straight away however there is one more package that
            • 04:30 - 05:00 is rstudio we will see how we can set up that now this one takes us to the best mirror possible for our location from where we can download r so you can click on this base and then you can download by clicking on this link i have already downloaded this so once you click on this one you can just save it so i have it here already in my downloads and that's more than enough then you can just double click and you can
            • 05:00 - 05:30 go through the instructions to set up r that would also allow you to basically set up a desktop shortcut which i have already done here on my machine and if i go in here i see r base you can click on this one and that brings you to the page which you can use to straight away start working with r now yes there is one more package called rstudio which is set up on top of base r
            • 05:30 - 06:00 which makes working with r easier now here also you can start working so it shows you the r console and you can click on file and if you have some scripts or files already written in the format of r you can use those so i can click on open script and that takes me to a page where i have some files which are already existing i can just select this one and click on open and that shows me some options here so i have an editor which
            • 06:00 - 06:30 shows me say if i want to get a library to use built-in data sets i could summarize the data i could do a clean up and we'll see all of this but i would suggest using rstudio rather than just using base r however installing base r would be required and depending on your machine configuration like mine is a 64-bit i have chosen 64-bit while i was setting up base r now when it comes to rstudio
            • 06:30 - 07:00 it is basically a package which makes working with r easier so to install rstudio what you can do is you can go to the rstudio home page or you can just go to google and say type rstudio download and then it takes you to this page you can click on this which says download rstudio you can choose your version you can go for the free version that is rstudio desktop and you can click on this
            • 07:00 - 07:30 download and then you can download rstudio for windows which i have already done and then you have to run through the steps so just click on this one and i already have rstudio here right now i can just basically use that so for example if i go to downloads and if i look for rstudio if i do a double click i can say yes and then it takes me to the rstudio setup just click on next and here
            • 07:30 - 08:00 you can choose the location if you would want to place it in a specific location click on next and then it says select the start menu folder so let rstudio be chosen here click on install and then it will basically start installing this in a particular location now in my case it is already existing right so we can even click on show details and see what it is doing what packages or what executables it is extracting now once this is done
            • 08:00 - 08:30 then you will be able to use rstudio you can also add a shortcut to your taskbar and you can continue using it so i've already done this this might take a couple of seconds just wait for this to complete and you would have rstudio which is an easier way of working with r so a lot of developers across the globe would be using rstudio when they are working with r to work on their data science or programming requirements
            • 08:30 - 09:00 now let's just wait it is almost done and now i can click on finish so that part is done you can add it as a shortcut so rstudio has consistent commands it has a unified interface it makes it easy to navigate and manage through r and it is set up on top of your r base now if i click and open on this so that's my rstudio which is coming up now here you see console
            • 09:00 - 09:30 which will show you the result where you can give your commands so where we can get text output now again i can choose a file so i can just say open file and then i can go into a particular location where i have downloaded some data and then basically i can choose say for example rstudio and that brings me here so now you have your script which has some commands right
            • 09:30 - 10:00 on the left bottom you have console where you can see the output on the right side you also have environment now that is to use or provide variables and then we can also have plots which we can see here now we can look at this as an example so here i am loading the built-in data sets so what i can just do is i can place my cursor here and i can just do a control enter and that basically loads the built-in
            • 10:00 - 10:30 data sets which we can see here that has been done now there is an inbuilt iris data set and we can just use head option to look at the first six lines of iris data set so just place your cursor and do a control enter and that shows you a summary basically the first six lines of this data set what it contains we will look into this data set later this is a default data set which you can easily find when you are working with r
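
The step just described (loading the built-in data sets and looking at the first rows of `iris`) can be sketched in base R as follows. The variable name `first_rows` is an illustrative assumption; `library(datasets)` and `head()` are standard base R.

```r
# The datasets package ships with base R; loading it explicitly mirrors the demo.
library(datasets)

# head() returns the first six rows of a data frame by default.
first_rows <- head(iris)
print(first_rows)
```

In RStudio, placing the cursor on a line and pressing Ctrl+Enter runs that line in the console, which is how the walkthrough executes each command.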
            • 10:30 - 11:00 you can also have your cursor placed on summary and then just do a control enter so that basically shows you summary statistics for iris data you can do a plot and that basically shows you the plot which you can also maximize and look at it in full screen you can just do a zoom if you are interested in looking into this and we will discuss how or what kind of information we can infer from the plots now when it comes to cleaning up you can just do detach and then we
            • 11:00 - 11:30 can say package datasets and here we had loaded those data sets so we are just doing a detach and we can say unload equals true so i'll just do a control enter i can also clear off the plots by doing this for whatever plots we had and we can either do an edit and then we can do a clear console from here or the shortcut is ctrl l and you can clear off the console so that's a simple way of starting
            • 11:30 - 12:00 your working with r by installing rstudio so let's continue learning about working with r and basically the first thing which we should learn here is about variables in r so variables as in any programming language is a way to store your data value vector or list of values or a data set or object in r
            • 12:00 - 12:30 it allows us to conveniently reference the variable name basically saving us from rewriting the data value or object many times in our program so when we talk about variables in r they are mainly used to store data with named locations that your programs can manipulate a variable can be a combination of letters digits period and underscore
            • 12:30 - 13:00 so you can have some valid variables as total sum you can also have dot notation so there are different naming or style conventions in r and we can use dot to separate names in description of a variable we can also start a variable with dot we can include numbers in a variable and remember r is case sensitive so we have to whenever we declare a variable we need to remember
            • 13:00 - 13:30 what case was used in the name of the variable and there can be other conventions also such as using an underscore or even using a case in between the variables so variables can only consist of letters numbers periods underscores or a dot followed by a letter not a number and we can declare our variables we can also look at the type of the variables
            • 13:30 - 14:00 and the class to which it belongs so there are some invalid variables which we are seeing here so that also needs to be remembered so this is an example where you can use an assignment operator which you see here between x and 10 to assign a value to a variable you could also do that by doing a dot y and then assign a value you could be doing that by using a z and then having a computation done between x and y and
            • 14:00 - 14:30 finally you could do a print so let's see some example here before we move further and for that i can bring up my r studio here so as i said we can basically have different kind of variables or naming conventions for example i could do something like model 1 and then i can basically assign this so this is just a variable and i could be assigning anything to it i could be assigning different data types which are available
            • 14:30 - 15:00 here for example i could do something like this and i could do a control enter so that's my variable i can always do a typeof and then basically i can check what's the type of my variable so it tells it's a character i can also do a class and then i can basically say show me the
            • 15:00 - 15:30 class and that shows me it belongs to the character class we'll learn about data types later but we are using assignment operator now if i say what is model 1 it shows me the value but if i would do something like this then it says object model not found and why because it is case sensitive the variable which we had created was all in lower case and the one which we tried to call was
            • 15:30 - 16:00 starting with an upper case so you could have variables created in such a way i could also do something like hello underscore string and this could be my variable where we are using an underscore and then we can just give something here and that becomes my variable which you can always call and check what is the value of that you could also be doing something like this so you could be using
            • 16:00 - 16:30 different cases and then i could say something like this and that's also my variable and then i can basically look at the value of this variable now if we try to create a variable where we start the variable name with the number what would happen so if i say something like this and then if i try to assign a value to it for example let's say 100 now this one
            • 16:30 - 17:00 will throw an error message because you cannot have your variable starting with a number but if i used period and then basically give something like this and let's try doing this by giving it a number so if you see here since we gave a period the rule is that it should be followed always by a
            • 17:00 - 17:30 letter and not a number so i could just remove this and that works perfectly fine so these are some naming conventions which when you practice you will learn about so now i can assign a variable by just doing a dot pairs and then assign any value to it but always remember if you are using a period if you are using a notation then in that case that should always be followed by a letter one more thing which is always practiced in a real time environment
            • 17:30 - 18:00 is that we cannot have spaces when we are creating variables so for example if i say first num and then i try to assign this a value it basically fails but obviously i could have done this by doing it underscore and that perfectly works fine and you can basically then call the value for this one always remember one more standard practice which is followed in real time environment
            • 18:00 - 18:30 is you will try to have variable names with a little meaning to them so for example if i would create a variable and i would say for example let's say bird that's my variable name and then if i assign this a value tiger it works fine but then it really does not make sense and that would basically create a lot of ambiguity in our coding
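
The naming rules covered so far can be checked with a short base R sketch. It relies on `make.names()`, which rewrites syntactically invalid names, so a name is valid exactly when it comes back unchanged. The assigned values and the names `.pairs`, `2var`, and `.2var` are illustrative assumptions, since the transcript does not spell out what was typed on screen.

```r
# Assignment uses <- ; names may use letters, digits, periods, and underscores.
model1 <- "linear"
hello_string <- "hello"
.pairs <- 10        # hypothetical name: a leading dot must be followed by a letter
first_num <- 100

# R is case sensitive: model1 exists, Model1 does not.
print(exists("model1"))
print(exists("Model1"))

# make.names() rewrites invalid names, so unchanged means syntactically valid.
candidates <- c("model1", "hello_string", ".pairs", "2var", ".2var", "first num")
is_valid <- make.names(candidates) == candidates
names(is_valid) <- candidates
print(is_valid)
```

The last three candidates fail for the reasons given in the walkthrough: a name cannot start with a digit, a leading dot cannot be followed by a digit, and a name cannot contain a space.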
            • 18:30 - 19:00 so it is always good to say for example animal and then i would say okay so tiger is an animal and that basically not only allows me to assign a value to the variable but it is also a little bit more meaningful now when we talk about variables it is also good to know the different data types which are available in r now like any other programming language
            • 19:00 - 19:30 r also supports different data types so you have your logical data type such as true and false you have numeric values which is say these numbers you could also be creating an integer such as 3L 40L 4L and so on you can have a complex number you can have characters which can be just letters or a set of letters or anything which is within the quotes or you can even have
            • 19:30 - 20:00 raw data so these are different data types we can again see quick examples here on data types let me come out of this one and as we saw already when we created model 1 this was character now i can just say x and let's say 100 and obviously this is going to be not my integer okay so let's see this what is this one this one by default is double
            • 20:00 - 20:30 it is by default double so if i would want an integer then i would say for example something like this and this one you can check by using typeof and you can see the value for this one so this is an integer so similarly you can have character you can have complex you can have raw data you can have numeric values so all these are different data types you could also be
            • 20:30 - 21:00 saying for example i would want to check the boolean so i could check this and select this one and now when i check the value for a it is true and we will learn about logical operators where we can basically be using these values assigned to the variables to compare to compute between different variables so this is a simple small example of using variables
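
The data types listed in this part of the walkthrough can be inspected with `typeof()` and `class()`. This is a minimal sketch; the example values and the names `values` and `types` are assumptions chosen for illustration.

```r
# One example value per basic data type mentioned in the walkthrough.
values <- list(
  logical   = TRUE,
  double    = 100,            # plain numbers are stored as double by default
  integer   = 100L,           # the L suffix requests an integer
  complex   = 2 + 3i,
  character = "tiger",
  raw       = charToRaw("r")
)

# typeof() reports the internal storage type of each value.
types <- sapply(values, typeof)
print(types)

# class() reports "numeric" for a plain number even though typeof() says "double".
print(class(100))
```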
            • 21:00 - 21:30 so we have seen here using variables and also using the assignment operator and then assigning values to the variables and different naming conventions we can also be using different data types which r supports and work with the variables now once we have learned about variables or data types let's also just first learn about your operators
            • 21:30 - 22:00 and how they can be used in your r programming language now we might be intending to do some calculations on numeric values find out differences between values or say for example compare values so in that we can be using different kind of operators so we have various operators we have arithmetic operators we have relational operators we also have logical operators
            • 22:00 - 22:30 so before we straightaway look into logical operators let's also understand about the basics such as your arithmetic operators which r supports for example let me pull up a notepad file here and when we talk about arithmetic operators here we are talking about your addition you have subtraction you have multiplication
            • 22:30 - 23:00 you have division and you have remainder or modulus and you have exponent and what makes it also important is that when you're using arithmetic operators you also need to know about the order of operations so when you say order of operations always the priority is to parentheses
            • 23:00 - 23:30 so that takes the priority you have then exponent or your computation if that would involve exponent so let's say exponent here which is then followed by your multiplication and division and that one also follows an order of left to right whichever comes first when we talk about
            • 23:30 - 24:00 multiplication and division and similarly when we talk about addition and subtraction it is left to right whichever comes first so these are some of the arithmetic operators now we can see some examples here quickly although these are some simple examples so for example i can say 100 plus 100 and that gives me the value right you can always do a hundred minus 50.
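
The arithmetic operators and the order of operations listed above can be sketched in base R as follows. The operands and variable names are illustrative assumptions rather than the exact values typed in the video.

```r
a <- 100
b <- 3

results <- c(
  addition       = a + b,    # 103
  subtraction    = a - b,    # 97
  multiplication = a * b,    # 300
  exponent       = b ^ 2,    # 9
  remainder      = a %% b    # 1 (the modulus operator is a doubled percent sign)
)
print(results)
print(a / b)                 # division

# Precedence: parentheses, then exponent, then * and /, then + and -.
without_parens <- 2 + 3 * 4     # 14: multiplication happens first
with_parens    <- (2 + 3) * 4   # 20: parentheses take priority
left_to_right  <- 100 / 10 * 2  # 20: equal precedence is evaluated left to right
print(c(without_parens, with_parens, left_to_right))
```

A single `%` is not an operator in R, which is why the modulus needs the doubled form `%%`.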
            • 24:00 - 24:30 you can do a hundred multiplication you could do a hundred division two or you could also use modulus too which basically gives you an error here so just give me a minute so let's give here one more percentage sign
            • 24:30 - 25:00 and that basically says what would be the remainder so if we would want to look at the ordering when we are using this arithmetic operators we can see an example so for example if i say 34 plus 46 divided by 2 gives me 57 however if i use 34 plus 46 in parenthesis which gets the priority
            • 25:00 - 25:30 and then i divide my result is different so understanding what arithmetic operators you can use and also the ordering in which that leads to the computation is very important so we can use all of these arithmetic operators and to control the ordering we can be using parentheses or we can have our computations ordered with what kind of operation we would want whether that would be
            • 25:30 - 26:00 multiplication or division addition or subtraction now at any point of time i can always do a control l and that allows me to clear my console let's continue our learning and let's learn about operators so when we speak about arithmetic operators we see that allows us to do computations but we have also relational and logical operators which help us in
            • 26:00 - 26:30 doing our computations or comparing values or sometimes finding difference between different values whether those are group of values or whether those are individual values so with your relational and logical operators you can compare data values so if we would want to see if the values match or not match or if the values are above or below equal to something and so on so when we talk about your relational operators we basically have
            • 26:30 - 27:00 in case of relational or logical operators so we obviously have greater than you have less than you have greater than or equal
            • 27:00 - 27:30 you have less than or equal you have equal to and you have not equal these are some of your relational operators we can say and when you talk about your logical operators then you have and you have or and you have not so and is when it compares two values so it
            • 27:30 - 28:00 returns true if both the conditions are true else it will return a false so for example if i have 10 greater than 20 and 10 is less than 20 now that's not possible and we are comparing the result of both of these so we are checking if both the conditions are true and that's not really true here so we see the value is false now if i would have replaced this one this and with or
            • 28:00 - 28:30 it would check even if one of the conditions is true it would basically show me a result as true you can also use a not operator which takes each element of the vector and gives the opposite value so we can be using any one of these operators and then basically do our computations so let's see some examples about these logical operators now either you could just be assigning values to your variables and check or you could also be
            • 28:30 - 29:00 picking up a data set from your machine and then try to use these logical operators so for example if i say x has been assigned 100 y has been assigned 200 and if i try to say x equals y so that already
            • 29:00 - 29:30 checks the value and compares and tells me that's not true it is false and if i would have used a not operator for example if i would have said something like this one so it tells me true so i can just check simple conditions like this i can say is my y greater than x
            • 29:30 - 30:00 and that tells me yes it is true if i say y is greater than or equal to x well it would still say true because when you are saying greater than or equal to x so when you are saying this one it works fine right now we can also be picking up some data set and for that what i can do is i can pick up one of the data set from my machine so i can go in here and i have some data
            • 30:00 - 30:30 sets let's look into that and i would be interested in taking this auction data set and loading the values here so i'll get this path and i will come here i can use auction as my variable name you could have given a dot separated name for example i could have said auction dot data if this is what you want to do and then you can assign variable a value so here i'll say read.csv and i
            • 30:30 - 31:00 intend to pick up a file, so I give this path. When we are working on a Windows machine, we need to give a double backslash in the path, so I'll say auction.csv. Now I could give other arguments, like header being TRUE, what the separator is, or how to fill values to take care of missing values; we can look at all of those. So here I'll just add a backslash, I will add a backslash
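The comparisons and the read.csv call described so far can be sketched as follows. This is only a minimal sketch under stated assumptions: the real auction.csv is not shown in the transcript, so the file contents, the bidder names, and the temporary file path below are hypothetical stand-ins.

```r
# Scalar comparisons, using the values from the walkthrough
x <- 100
y <- 200
x == y                  # FALSE
!(x == y)               # TRUE: NOT flips the result
y > x                   # TRUE
y >= x                  # TRUE
(10 > 20) & (10 < 20)   # FALSE: AND needs both conditions to be true
(10 > 20) | (10 < 20)   # TRUE: OR needs only one condition to be true

# Hypothetical stand-in for auction.csv, written to a temporary file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("bidder,bid", "alice,100", "bob,250", "carol,100"), csv_path)

# With a literal Windows path, each backslash must be doubled,
# e.g. "C:\\data\\auction.csv" (a made-up path), or forward slashes used.
auction <- read.csv(csv_path, header = TRUE, sep = ",")
head(auction)           # first rows of the data frame
```

Writing the file first keeps the sketch runnable anywhere; with a real file you would pass its path to read.csv directly.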
            • 31:00 - 31:30 and I will just do a Ctrl+Enter. Now I can look at the values by just typing auction.data, and I can see what values it has. It has a lot of data here. You could have used some other functions, which we can see later, where I can use head and see just the first few rows. So we can basically
            • 31:30 - 32:00 assign data to the variable and continue working on it. Now we can keep it simple, so let me repeat this step, and here I will say auction as my variable name and assign this. I can also do a View on auction, and that shows me a tabular format of the data, which allows
            • 32:00 - 32:30 me to look into the data and understand it, and then I can use this to work on variables. So what I can do here is say x and assign it some value, for which I want to work on my data set, which is auction. Now what do I want to do here? Let's use auction, and then I can use a dollar symbol and I
            • 32:30 - 33:00 can choose which column I'm interested in. For example, let's choose bidder, and I can give a value to compare it with; let's pick a name, so let's say tweet, and that's the name. I could be assigning all the matching values to this, or I could say I want to use another condition, so I'll say auction
            • 33:00 - 33:30 dollar, and then let's take this value of bid, and let's say it equals 100. I ended up joining these with a comma, and I can try running this. Now here it gives me a problem, because we did not use the right operator. So we will use, for example, AND; so I will say x is being assigned the value of
            • 33:30 - 34:00 auction bidder being tweak and auction bid value being 100. Once we do this, I can look at the value of x, and that shows me the value. So this is just a simple example of using a logical operator. Now instead of AND I could have used OR, which is basically a pipe symbol that you have to use,
            • 34:00 - 34:30 and that gives you an OR condition. Now hit Enter, and if I look at the values of x, it will show me a lot of values, because we have given an OR condition, which matches when either one of the conditions is true. So in this way we can use logical operators and continue doing our computations. Let's learn about print formatting and how print can be used to
            • 34:30 - 35:00 view your data. R uses the print function to display variables. For example, if I have assigned the number 10 to x, I can do a print of x, and that will show me the value of x. What we see here, the 1 in square brackets, also has a meaning: it is the index of the first element shown on that line, because x is a vector, and we'll learn about vectors later. So R
            • 35:00 - 35:30 uses the paste and paste0 functions to combine strings and variables for printing in a few different ways. For example, if I do a print of paste and pass in two strings, or two words, such as hello and world, they are printed joined by a space. I could also do a print of paste and then use a separator, so my print
            • 35:30 - 36:00 would look something like this. If I use paste0, that avoids any space between these two words, or, for example, these three words. So let's see some basic examples of print. If I bring up my RStudio, here is an example: x, and then this is your assignment operator, which we already discussed. Now I can be assigning a value to this,
            • 36:00 - 36:30 so I can just place my cursor here and hit Ctrl+Enter, and the value has been assigned. Now let's look at the value of x. I could also do a print of x explicitly by using the print function. Similarly, I can assign hello to message and then print the message by using print. Now if, for example, I do something like this,
            • 36:30 - 37:00 this is not going to print anything until I call the variable or use a print function. So, for example, if I just type y, you see auto-printing shows us the value, or I could do it explicitly by using the print function, which is explicit printing. Now whenever we look at this number one, as I mentioned, it means y is a vector and five is its first element.
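A short sketch of the printing behaviour just described, using only base R. The variable names spaced, separated, and joined are illustrative and not from the session.

```r
x <- 10
print(x)                     # [1] 10

msg <- "hello"
print(msg)                   # [1] "hello"

# paste joins with a space by default; sep changes the separator
spaced    <- paste("hello", "world")
separated <- paste("hello", "world", sep = ", ")
# paste0 joins with no separator at all
joined    <- paste0("hello", "world")

print(spaced)                # [1] "hello world"
print(separated)             # [1] "hello, world"
print(joined)                # [1] "helloworld"

y <- 5      # assignment alone prints nothing
y           # auto-printing:     [1] 5
print(y)    # explicit printing: [1] 5
```

The [1] prefix is the index of the first element shown on that output line, which is why it appears even for a single value.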
            • 37:00 - 37:30 Now you can also use the colon operator to create integer sequences; we'll learn about sequences and lists later, but this is just a simple example. I am creating an integer sequence, and I can place my cursor here, which would start with 10 and end at 30 (that is 21 values). So let's look at the values of our sequence of integers. Now at any point of time you can always use class
            • 37:30 - 38:00 to look at the class of, say, x, and that shows me the class is integer. Looking further, when we talk about different data types, as we learned a few minutes before, R has five basic, or atomic, classes of objects: you have character, numeric values,
            • 38:00 - 38:30 that is, real numbers, you have integers, you have complex, and you have logical values. Let's spend some time understanding some basic arithmetic operations and how you can do them in the R programming language. Here I've opened up RStudio, and these are some basic examples of performing arithmetic operations. For example, we can add two numbers: I can just place my cursor here and
            • 38:30 - 39:00 press Ctrl+Enter, and that shows me the addition. I can do a subtraction, multiplication, division, also exponentiation, or use modulo, which returns the remainder. When we are performing operations, we can also change the order of operations, and in this case we are using parentheses. So I am putting 500 times 2 in parentheses, plus 80 divided by 2, so first it
            • 39:00 - 39:30 evaluates what is given in parentheses, and that's why I get the result 1040. Similarly I can change the order of operations: here I can give 500 times and then something in parentheses, so that gets evaluated first, and hence you get a result of 1,500. Now we have already discussed the assignment operator, and what we can do here is assign variables some value. So, for example, I create a
            • 39:30 - 40:00 variable called selling and assign it a value, similarly for cost, and then we can do some calculation: we can say profit is selling minus cost. We can do that, and here I can look at the value of profit, which shows me 250. Now let's also spend some time understanding data types in R. We can have different types of data, so this one shows me an example of
            • 40:00 - 40:30 assigning a decimal value, which is part of the numeric class. I can do this, and then if I'm interested in seeing the value of num, I can just look at the value of num. If I'm interested in looking at the type of num, I can do that by typing typeof and passing in num, and it shows me the type is double. I can also look at what class it belongs
            • 40:30 - 41:00 to, and that shows me it is numeric. So in this way you can not only assign values to a variable, but you can also look at its class and type. Now here we can assign whole numbers, which are also known as integers. If I look at the type of this, it shows me double, so if I want to explicitly assign an integer I could have done, for example, let's say j, and I could have used the
            • 41:00 - 41:30 assignment operator and done this, and then if I look at the value of j it shows me the value. But we would be interested in looking at the class of j, so we can do this, and it shows me it is an integer. So I can explicitly assign an integer either by using a capital L after the number or by using a function called as.integer; we'll see that later. Now we can also assign Boolean values, or
            • 41:30 - 42:00 basically your logicals. Here we assign TRUE, and then we do a FALSE, and we can look at the type of t, and that tells me it is of the logical class. Similarly, you might be interested in working on text or string values, and here we can do this by saying ch and then passing in a value. Look at the class of this: it tells me the data type is character, and if you look at the type of it, it says
            • 42:00 - 42:30 character. Similarly, R also supports complex data types, so we can do that too by just doing this, and look at the class of it: it tells me it is complex. You can also pull out the length; here we are doing a length on the character, so let's look at this one, and it shows me what the length is. Now one of the useful functions which we usually use
            • 42:30 - 43:00 in R is print. I can simply do a print of hey, and that prints whatever value is passed to print. I can assign a value to a variable and then print it, so that is also fine. You could also, without using the function, just type y, and that also shows the value. However, sometimes using print as an explicit function can be useful; it makes your code more readable. Now here we would use a built-in data set, that is mtcars,
            • 43:00 - 43:30 and if you want to print the data set, that shows me the values: the car models and different other features, such as mileage, cylinders, horsepower, and so on. One use case of print with the paste function can also be seen here: I'm doing a print of paste, and that prints whatever was passed in a concatenated way. I could also do a print of paste with a separator
            • 43:30 - 44:00 if I want to format my data in a particular way, so here I've used a comma as the separator. There is one more function, paste0, which can be used. I'm just doing paste0 here, and that tells R to concatenate these values without any space, so paste0 shows no space between the two elements which were passed. Now we can also do some formatted printing, and for that I'm using the sprintf
            • 44:00 - 44:30 function. I am going to pass in %s, which is for a string, and %f for a float, and we can print the values of this. So these are some basic operations, or usages of functions, to do some computations or look at your results. The most basic type of R object is a vector, and when we talk about vectors, empty vectors can be created with the vector
            • 44:30 - 45:00 function. A vector can contain only objects of the same type, or class. When we talk about a list, a list is a vector which contains objects of different classes. So these are some basic examples. Apart from print formatting, we can look at what we call R objects, such as vectors or lists and so on. When we talk about vectors, a vector is a
            • 45:00 - 45:30 sequence of data elements of the same basic type. We use the c function to declare a vector. For example, here we are creating a variable v1 and assigning it a vector by using c and then giving values of some basic type, such as the numbers one to five, or, for example, words. You can always do a print, or you can also use class to find out
            • 45:30 - 46:00 the class of the elements, or the values, which have been passed to the particular object. So we can look at some examples like this. For example, we can see here that a list is a vector which contains objects of different classes. You can have numeric objects, that is, your numbers such as 1, 2, etc.,
            • 46:00 - 46:30 which are your numeric values. For example, here we are assigning the value 1 to a, and that can then be used: I can either do a print or just use auto-printing. I can also assign a value here for a i, or I could do something like this, which shows me 0, which can be for a missing value. So if I
            • 46:30 - 47:00 want to use auto-printing, I can just call a, and it shows me the value that has been assigned to it. You can always use typeof to look at the type of a, which is double by default, and if I look at the typeof of a i, that is an integer, because we used L here. So in this way we can continue working with our different classes of objects. For example, let's create a
            • 47:00 - 47:30 vector here. I can say v1 and assign it by using the c function, passing in the values, and that gives me a variable, and you can look at the values assigned to it. If I look at the class of v1, that shows me it is numeric. If you use typeof and you want to see the
            • 47:30 - 48:00 type of v1, that shows me the values are double. As we were seeing here, we can look at the class. For example, if I create one more variable and assign values to it using c, passing in some words here, for example hello and world, then I can run this and look at the values of this one. I
            • 48:00 - 48:30 could also explicitly print, as we discussed earlier, by doing a print of v2. We could also use the paste function if we want to. So, for example, if I use the paste function, I could be using, and this is missing a bracket, so let's complete this, and that shows me the value. I could have
            • 48:30 - 49:00 also used, for example, the paste0 function, and that also works fine, so it depends on what we are looking for here. If I look at the class of v1, which we had, it is numeric, and v2 has elements which are of the class character. So this is just a simple example of
            • 49:00 - 49:30 using your print functions, creating vectors, printing out their values, and printing out their class and type. To continue our learning on vectors: as I mentioned earlier, we can use the c function to create vectors of objects by concatenating things together. For example, if we look at this one, which says x, and then I use the c
            • 49:30 - 50:00 function and I say 0.5 and 0.6, we can have a vector of numeric type. Let's do this, and then we can look at the value of x: it shows me my vector, which has 0.5 and 0.6. I can also have a vector of logical values, and now let's look at the value of x: it has TRUE and FALSE. Or we could have done it in this way,
            • 50:00 - 50:30 where we can then look at the values; so we can use the short form by using capital T and F. I can create a vector of character type and then look at its values. I can also create a sequence of integers, as we saw in the previous example, and then look at the values, which start at 9 and end at 29. Now you can also create vectors of complex
            • 50:30 - 51:00 type and look at the values. So these are some simple examples of creating vectors. We can also use the vector function to initialize vectors. For example, if I do this, where I am saying my vector will be of type numeric and the length is 10, and then look at the values, it just shows me a vector which has all zeros and whose length is 10. Now you can create a vector of numbers
            • 51:00 - 51:30 by doing this, as we saw in the previous example, and use explicit printing to look at the values, or it might be letters, and then use the print function to look at the values of the vector. Now we can also try concatenating the above two, so that creates a mixed vector which has two different types. So I can create a mixed vector
            • 51:30 - 52:00 by using the c function and passing in my numbers, which are of numeric type, and letters, which are of character type, and then we can print this, which shows me the value. But here what we see is coercion, which you may know as casting in other programming languages. It coerces the numbers to character, as characters cannot in general be coerced into numbers, and then you can
            • 52:00 - 52:30 print the values of this mixed vector, where everything is of character type. So, for example, at this point of time, if I do something like class of the mixed vector and look into the values of this one, it shows me everything is of character type here.
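The vector initialization and the mixed-vector coercion described above could look like the sketch below. The specific numbers and letters are assumptions, since the values used on screen are not captured in the transcript.

```r
# Initialize a numeric vector of length 10; it starts out as all zeros
zeros <- vector("numeric", length = 10)
print(zeros)

# Two vectors of different types (illustrative values)
numbers <- c(1, 2, 3, 4, 5)
letters_vec <- c("a", "b", "c")

# Concatenating them forces a single common type: character
mixed <- c(numbers, letters_vec)
print(mixed)     # [1] "1" "2" "3" "4" "5" "a" "b" "c"
class(mixed)     # "character"
```

The numbers become strings because every number has a text representation, while a letter such as "a" has no numeric one.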
            • 52:30 - 53:00 Now the data type of different vectors can be returned by the function class, as we saw just now. It is common to use the class function to interrogate an object, asking what its class is. We can create a one-dimensional object, such as an integer vector, which we have done earlier, and then look at the class of it, which tells me it is an integer. I can also create a numeric vector
            • 53:00 - 53:30 by giving in some values here. When we do this, I have used the c function, given the values, and looked at the class; it shows me it has numeric values. Now you can create a character vector and then look at its values. At any point of time, in all of these, for example if I type num, I can see what values are assigned to it. I can do letters,
            • 53:30 - 54:00 and I can see the values of this. So let me just create some space here. Now I can create a factor vector and then look at the values of it, or you can also see what the value in this factor vector is. Here we said as.factor, so the factor function is being used, and we are creating a factor from a vector of letters, and then we look at the class;
            • 54:00 - 54:30 we also look at the values that are in this particular vector. If you look at all of these vector examples, initially we were using the arrow assignment operator with the c function, and when we started creating vectors by concatenating, or vectors of particular types, we are using the equals sign here, and that also is fine.
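As a rough illustration of interrogating vectors with class, the sketch below builds one vector of each kind mentioned. The values and variable names are assumptions chosen for the example.

```r
int_vec <- 1:10                               # integer sequence
num_vec <- c(1.5, 2.5, 3.5)                   # numeric (double) values
chr_vec <- as.character(1:10)                 # "1", "2", ..., "10"
fct_vec <- as.factor(c("a", "b", "c", "a"))   # factor built from letters

class(int_vec)    # "integer"
class(num_vec)    # "numeric"
class(chr_vec)    # "character"
class(fct_vec)    # "factor"
levels(fct_vec)   # "a" "b" "c": the distinct values of the factor

# At the top level, = assigns just like <-
also_numeric = c(10, 20)
class(also_numeric)   # "numeric"
```

Both assignment forms work at the top level of a script; the arrow is the more common convention in R code.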
            • 54:30 - 55:00 Now, looking further, when we concatenate two different kinds of vectors, for example the numbers and letters we discussed earlier, R will do coercion, that is, change one type into the other. When we talk about one-dimensional objects, we can have integer vectors or, say, float, which we saw just now,
            • 55:00 - 55:30 ending at 10.5. When we say c of 1:10, it starts with 1, but there is also, you can say, coercion happening here, and then you have the values ending at 10.5, which is a float. I can look at the class of it; did we do a class here? So let's come here and do a class of this one, and it
            • 55:30 - 56:00 shows me it is numeric; you can look at the values of it. Similarly, you can create a character vector which is 1 to 10 and then look at the class of it, or the value of this vector, or, as we did, the factor vector. For two-dimensional objects, we will explore that when we learn about matrices, so as of now let's forget that. Now when you talk about mixing objects, there are occasions when
            • 56:00 - 56:30 classes of R objects get mixed together; that could be accidental or it could be intentional. If you look at this example, here we have y, which has been given the values 1.7 and a, and at this stage if I look at the value of y, that's my vector. If you look at the class of y, that shows me it is
            • 56:30 - 57:00 character. Now let's look at some other examples: let's pass in logical and numeric values. What would happen in this case? We can again use class of y, and that is numeric, and if you want to look at the value of y, that shows me 1 and 2 here. Let's go further.
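The implicit coercion cases walked through here can be sketched as follows. The exact calls typed on screen are not in the transcript, so these are assumed reconstructions that match the described results.

```r
# Numeric mixed with character: everything becomes character
y1 <- c(1.7, "a")
y1           # "1.7" "a"
class(y1)    # "character"

# Logical mixed with numeric: TRUE becomes 1
y2 <- c(TRUE, 2)
y2           # 1 2
class(y2)    # "numeric"

# Integer sequence mixed with a decimal: everything becomes double
y3 <- c(1:10, 10.5)
class(y3)    # "numeric"
typeof(y3)   # "double"
```

In each case R picks the most general type present, in the order logical, integer, double, character.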
            • 57:00 - 57:30 So let's look at the value of this one: y, and then see what the value of y is; it is a TRUE, and you can also look at the class of it. Now we are mixing objects of two different classes in a vector. Remember, when we talk about a vector, we always talk about a vector having elements of the same type, but when we talk about lists, which we will learn about later,
            • 57:30 - 58:00 each element can be of a different type; for vectors that is not allowed. So when different objects are mixed in a vector, coercion occurs so that every element in the vector is of the same class. We have seen earlier the implicit coercion, where R tries to find a way to represent all the objects, or
            • 58:00 - 58:30 elements, as I say, in the vector in a reasonable fashion. We can also do explicit coercion, that is, from one class to another, by using as dot followed by the relevant function name. So if I have x here, and I look at the class of x, it tells me it is an integer, but I can convert that to numeric by doing as.numeric, or as.logical, or as.character, to do a
            • 58:30 - 59:00 coercion and change the class of the objects. Now if R cannot figure out how to coerce an object, this will result in NAs being produced, which we can relate to missing, or not available, values. For example, if we create x and look at the class of x, it tells me it is character. Let's try changing character to numeric, which will not work, and it says NAs are
            • 59:00 - 59:30 introduced. If you do it even with logical, that would not work, and it shows me NA values, or if you do a complex, it says NAs have been introduced. At this point of time, if I look at the value of x, it tells me it was assigned a, b, c, and we tried to convert that into a different class. Now when we talk about vectors, it is also good to know about attributes in brief.
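A minimal sketch of explicit coercion with the as.* functions. The integer vector 0:6 is an assumed example; the character vector a, b, c follows the transcript.

```r
x <- 0:6
class(x)          # "integer"
as.numeric(x)     # 0 1 2 3 4 5 6, now stored as double
as.logical(x)     # FALSE TRUE TRUE TRUE TRUE TRUE TRUE (0 is FALSE)
as.character(x)   # "0" "1" "2" "3" "4" "5" "6"

z <- c("a", "b", "c")
class(z)          # "character"

# as.numeric(z) warns "NAs introduced by coercion"; the warning is
# suppressed here only to keep the sketch quiet when run.
z_num <- suppressWarnings(as.numeric(z))
z_num             # NA NA NA

z_lgl <- as.logical(z)
z_lgl             # NA NA NA

z                 # the original vector is unchanged: "a" "b" "c"
```

The as.* functions return a converted copy, which is why z still holds the letters afterwards.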
            • 59:30 - 60:00 R objects can have attributes, that is, metadata for the object. When you talk about R object attributes, you could have names, you can have dimension names, you can have the dimensions, that is, for matrices and arrays, you can look at the class, such as integer, numeric, and so on, and you can also look at length and other user-defined attributes. So if I say x, we are assigning a value to x. Now at this point of time if I see, my value of
            • 60:00 - 60:30 x is 1. But all objects need not necessarily have attributes, and in that case, whenever you use the attributes function, it returns NULL. So at this point of time, if I look at the attributes of x, it shows me NULL. So these are some of the basics which help us in working with R: using your vector function, or looking at the coercion which is
            • 60:30 - 61:00 happening implicitly, or which can be done explicitly by us by using an as-dot function. Now let's learn about lists and how we can work with lists in R. When we talk about a vector, which we saw in previous examples, a vector is a one-dimensional array, right, and it can hold elements only of the same
            • 61:00 - 61:30 type, so we would say a vector is one-dimensional. But when you talk about a list, a list is a generic vector that can contain objects of different types. When you talk about matrices, for example, matrices can also hold only elements of the same type, but a matrix is a two-dimensional array; we will learn about matrices too. When you talk about lists, they can contain all kinds of R
            • 61:30 - 62:00 objects: you can have dates, you can have data frames, you can have vectors, and many more. So in a list there is no coercion required, that is, no changing of data type; there is no loss of functionality, and lists do not follow any predefined structure. We can create lists using the list function, as shown here. You can create a variable and then assign a list to it, where you can be
            • 62:00 - 62:30 either passing in a vector, or you can simply create a list by using this list function. So let's see some examples here. For that I can bring up my RStudio, where we can see an example of a list and how it works. So when you talk about a list, what you can do here is, let me close this one and this one, yeah. So what we can do is we can basically
            • 62:30 - 63:00 say, for example, test, and I can give something here. For example, I can say music tracks, and then I can say how many of them, so let's give 100 as the number, and then we can say how many of them got 5 stars, and I can do this. So I can
            • 63:00 - 63:30 check this, and this shows me all the elements of this object, right. Now when we do this, what we are doing is creating a vector, right, and a vector can have coercion depending on the elements which are passed, because whenever you use c and create a vector, it will only hold elements of the same type. So, for example, if I do a class on test,
            • 63:30 - 64:00 it shows me here all the elements are of type character, right, and you can also use typeof to check our test variable, and it has all the elements as character. Now how would you create a list? What we can do is use the list function. So, for example, let's again do a test here, but this time I'm interested in creating a list,
            • 64:00 - 64:30 and a list can have objects of different types. So let's say music tracks, and then I can give 100, and I can say rating 5, and now if I look at my test, it shows me all the elements of this particular list. Here we see each element, or each object, with a double bracket, and we can see each element.
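The difference between c and list for the same three values can be sketched like this. The variable names test_vec and test_list are illustrative; the session reuses the single name test for both.

```r
# c() builds an atomic vector, so the numbers are coerced to character
test_vec <- c("music tracks", 100, 5)
test_vec            # "music tracks" "100" "5"
class(test_vec)     # "character"
typeof(test_vec)    # "character"

# list() keeps each element with its own type
test_list <- list("music tracks", 100, 5)
test_list           # prints each element under [[1]], [[2]], [[3]]
class(test_list)    # "list"

# Class of each element of the list
element_classes <- sapply(test_list, class)
element_classes     # "character" "numeric" "numeric"
```

The double-bracket labels in the printed list are the same index syntax used later to pull out a single element.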
            • 64:30 - 65:00 Now what we can also do is use the is.list function and pass in test to check what it is, and it is a list, right. So here we have created a list, but if, for example, we take the previous example, where we were creating a vector, and I do an is.list, it would show me FALSE, right. So we just created a simple list, and we can also assign labels,
            • 65:00 - 65:30 or we can use a name function to give names. So what I can do here is, let me create a list first; I can do that like this. And now what I can do is use a name function on this test, and then I can pass labels, so here I can just give some names.
            • 65:30 - 66:00 So, for example, let's give it the name product, say we are talking about a product of a company, and then here I can give count, and here I can give rating, and this is basically to give names. So let's just give
            • 66:00 - 66:30 there's some error here, let me just check this. So let's look at this name function here; what I'll do is use names instead, and now let's look at test, and that shows me the names that we have assigned to our
            • 66:30 - 67:00 list objects. Now we can always access the elements of a list using indices, or even using double square brackets. So, for example, I have test here, and I can give something like this, which gives me the elements based on the indices, the positions at which you are accessing the elements of the list. So we
            • 67:00 - 67:30 can do this. What I can also do is specify names when creating a particular list. So, for example, I can say product.category, and now I can just give the list function. I want to assign names while creating the list, so I can say, for example,
            • 67:30 - 68:00 product, and this would be, say, music tracks; then I can give, for example, count, and count would be 100; and then ratings, and I can say 5. And now we can access this list which we have created.
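Naming list elements, either afterwards with names or directly at creation, might look like the sketch below. The values follow the transcript; the variable name test_vec is an assumption added to show the is.list contrast.

```r
test <- list("music tracks", 100, 5)
is.list(test)       # TRUE

test_vec <- c("music tracks", 100, 5)
is.list(test_vec)   # FALSE: an atomic vector is not a list

# Assign names after creation; the function is names(), with an s
names(test) <- c("product", "count", "rating")
test                # elements now print under $product, $count, $rating

# Access by position with double square brackets
test[[1]]           # "music tracks"

# Or supply the names while creating the list
product.category <- list(product = "music tracks", count = 100, ratings = 5)
names(product.category)   # "product" "count" "ratings"
```

Both routes produce a named list; naming at creation simply saves the separate names call.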
            • 68:00 - 68:30 So what we have done here is unlike earlier, when we created a list and then used the names function to assign a name to each object; here, while creating the list itself, we passed in the names, so we can also do that. Now if you want to display the list, or compactly display the structure of a list,
            • 68:30 - 69:00 we can always use the str function, and here I can pass in the name, so let's choose this one, and this lists the elements of your list in a more compact way. A list can contain other lists too, and we can do that. So, for example, I create one more list; for example, I can say similar
            • 69:00 - 69:30 product, and here I can give a list again, and what I want to do is say product equals, and I can say film, and then I can give a count, and then I can give ratings, say 4. And here what I've done is I have just
            • 69:30 - 70:00 created one more list, but my intention is not just to create a list; I want to add this to our existing list. So what we can do here is take our previous list, that is product.category, like what we did earlier, and now I intend to say list, and here I want to say, for example,
            • 70:00 - 70:30 let's copy this, or we can just, so this is what we were doing when we were creating a list using product, giving the names while creating the list. And what I also want to do here is just say similar and then pass in similar
            • 70:30 - 71:00 dot prod. So now if you look at our list, we have just added new elements. So this is one more way we can create a list: our list can contain another list. Now when we talk about subsetting or extending lists, one of the main ways, as I said, to access a specific element or a subset is to use double brackets, and
            • 71:00 - 71:30 we can always do that. So, for example, we take our product.category, and I want to access a particular element, so I can always do this by giving the index position, and I can access the elements of my list. So this is one simple way. Now here, if we use a single bracket instead of a double bracket, then in that case
            • 71:30 - 72:00 the output would be a list. So if I look at this one, this would be a list, but if you use double brackets, then you are accessing a particular object. As with a vector, we could also take a subset by using the c function. What we can also do is subset by names, or even by logicals. So what we can do here is take this product.category, and if we have
            • 72:00 - 72:30 defined names, then in that case I can say I'm interested in music tracks, and this is the name we had given, so we can close this one and try accessing the elements here. So what's the name we had given? No, it's not music tracks, that's the value; the name is product. So we do this, and then we can access the
            • 72:30 - 73:00 elements. What we can also do is subset based on logicals. So what we can do is just give something like this, and here we can pass in values, something like this.
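The subsetting styles described above, by index, by name, and by logicals, are sketched here on the named list from earlier. The particular positions and the logical vector are assumptions, as the values typed on screen are not in the transcript.

```r
product.category <- list(product = "music tracks", count = 100, ratings = 5)

# Double brackets return the element itself
first_value <- product.category[[1]]        # "music tracks"

# Single brackets return a (smaller) list
first_as_list <- product.category[1]
class(first_as_list)                        # "list"

# Several positions at once, using c()
by_position <- product.category[c(1, 3)]

# By name: the name is product; "music tracks" is the value
by_name <- product.category[["product"]]    # "music tracks"

# By logicals: keep the elements where the vector is TRUE
by_logical <- product.category[c(TRUE, FALSE, TRUE)]
names(by_logical)                           # "product" "ratings"
```

Single brackets always keep the list wrapper, which is why only the double-bracket forms hand back the bare value.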
            • 73:00 - 73:30 and we missed a bracket. So that's also a way of pulling out the values. You can do subsetting using the names which you have assigned to the objects within your list, or, you can say, the names which you have assigned to the elements, or by using logicals. What we can also do is use the dollar operator. If you see here, we are looking at the name, and that is preceded by a dollar sign, so we can
            • 73:30 - 74:00 always pull out the values from our list by giving the list name, then a dollar symbol, and then choosing the name. For example, if I choose product, I can list the values here. I can look at, say, dollar and then choose count, and this is also one way of accessing the elements of the list, using your dollar
            • 74:00 - 74:30 symbol now to add elements to a list as i said you can add a vector of names and that can be passed to your list so these are different ways in which you can work with list and then you can access the elements either using indices or using names or even using dollar symbol and pointing the right names so this is one simple example of working with your list now one more and now i can just do a ctrl l
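The list access styles described above can be sketched as follows. The list contents here are assumptions for illustration (the transcript only confirms that the list has elements named product and count), so treat this as a minimal sketch rather than the instructor's exact code.

```r
# Hypothetical list; element names and values are assumed for illustration.
prod.category <- list(product = "Music Tracks", count = 5, available = TRUE)

prod.category[1]                     # single bracket: a list of length 1
prod.category[[1]]                   # double bracket: the element itself
prod.category["product"]             # subset by name, still a list
prod.category[c(TRUE, FALSE, TRUE)]  # logical subsetting keeps elements 1 and 3
prod.category$count                  # dollar: same as prod.category[["count"]]

class(prod.category[1])    # "list"
class(prod.category[[1]])  # "character"
```

Single brackets always hand back a list, which is convenient when you want to keep several elements together; double brackets or the dollar symbol are the usual choice when you need the value itself.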
            • 74:30 - 75:00 and i can clear that off so your list always remember is a generic vector that can contain objects of different types now when we talk about matrices now matrix is a collection of data elements arranged in two dimensional rectangular layout so we can use matrix function to create a matrix as shown here so matrix is two dimensional now we already know
            • 75:00 - 75:30 that vector is one dimensional array of data elements or a sequence of data elements but when we talk about matrix it's a collection of data elements that is two-dimensional arranged in fixed number of rows and columns so here you see that we are creating a matrix and we have specified the number of rows is three number of columns is three and we want it to be arranged by row where we have given the value as true so always remember matrix is 2
            • 75:30 - 76:00 dimensional and matrix can have only one atomic vector type unlike your list it's a natural extension of vector going from one dimension to two dimensions so matrix actually needs a vector which contains values that you place in a matrix and at least one matrix dimension so we can choose to specify the number of rows or number of columns when we are creating matrix so
            • 76:00 - 76:30 let's see a quick example of working with matrix so for example i could just say matrix which will have values 1 to 6 and then i can basically give n row and you can give a value to this one and that's my matrix similarly you could also be giving n columns so i can just say n col and i can choose this one and then pass in the value so that's a matrix where r
            • 76:30 - 77:00 fills values column by column now if you intend to fill up matrix in a row wise fashion so that your values 1 2 and 3 are in first row then we have to just modify this in a little bit different way so we have to say matrix 1 colon 6 n row is 2 and then i can give by row
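A minimal sketch of the matrix calls described above, showing the default column-wise fill next to the row-wise fill. The variable names are assumptions chosen for illustration.

```r
# Default: R fills the matrix column by column.
m_col <- matrix(1:6, nrow = 2)
m_col
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# Giving ncol instead of nrow produces the same 2 x 3 layout here.
m_col2 <- matrix(1:6, ncol = 3)

# byrow = TRUE fills row by row, so 1 2 3 land in the first row.
m_row <- matrix(1:6, nrow = 2, byrow = TRUE)
m_row
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
```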
            • 77:00 - 77:30 so you always have these helper functions which allow you to put out the values so for example i do this and then i can do a control enter so now if you see you have the values one two and three in your first row so when we pass the matrix function a vector that is too short to fill up an entire matrix then something different happens we can have a look at this so say you pass a vector containing
            • 77:30 - 78:00 value 1 to 3 to the matrix function and say explicitly you want a matrix with 2 rows and 3 columns how do we do that so for example i can say matrix and here i can say 1 is to 3 now i can give n row and then i can give the number of rows which we want is 2 and then i say n column and this one i'll say three
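The short-vector case can be sketched like this. The first call is the one described above; the second call, with a four-element vector, is added here as an assumed illustration of what happens when the matrix size is not a multiple of the vector length.

```r
# 3 values into a 2 x 3 matrix: R recycles the vector, filling column by column.
m <- matrix(1:3, nrow = 2, ncol = 3)
m
#      [,1] [,2] [,3]
# [1,]    1    3    2
# [2,]    2    1    3

# 4 values into 6 cells: 6 is not a multiple of 4, so R warns.
# tryCatch captures the warning text instead of printing it.
short <- tryCatch(matrix(1:4, nrow = 2, ncol = 3),
                  warning = function(w) conditionMessage(w))
short
```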
            • 78:00 - 78:30 so i can do this and here what i have done is i have given the values one two three i have said number of rows is two and your number of columns is three so here r fills the matrix column by column and simply repeats the vector now if you want to fill using a four element vector in a six element matrix in that case obviously r will generate a warning message now apart from the simple matrix
            • 78:30 - 79:00 function which we are seeing you also have some functions such as r bind and c bind which r offers when you are working with matrices so we can use those so for example i could say c bind i could say 1 colon 3 and then i can say 1 colon 3 and that's my c bind that is column bind where i am passing the values 1 to 3 and which are stacked in columns i can
            • 79:00 - 79:30 also do r bind and similarly we can be passing in the values so i can say r bind and that basically arranges the values row wise so let's be creating a variable for example let's say n and let me create a matrix here so i'll say matrix now that will contain 1 to 6 and i can say by row and then you can give
            • 79:30 - 80:00 value which is true and then i can basically say the number of rows is going to be 2 and this is also fine so let's look at the value of n here so you basically created a matrix with 1 to 6 you arrange them row wise and the number of rows what you have chosen is 2. so what we can also do is we can use r bind and we can add values to it so for example if i want to add values 7 to 9
            • 80:00 - 80:30 what i can simply do is i can do a r bind i can say i would want to add to my n and then pass in the values so i can just do this and this has basically appended or added values to existing matrix so similarly you could have done a column bind and you could have added values to your existing matrix so for example if i
            • 80:30 - 81:00 take this one and look at my n and what i could do is i could do a c bind and then i can basically take my n and then pass in values to this one so let's say 10 and 11 and basically i've added 10 11 as a column to my existing matrix so this is one simple way where you work with a matrix and you are appending the values either at a row level or at a column level
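The row and column binding steps above can be sketched as follows. The names n_rows and n_cols are assumptions; the transcript appears to print the results without reassigning n, which is why n stays 2 x 3 here.

```r
# Stacking two vectors as columns or as rows.
cbind(1:3, 1:3)  # 3 x 2 matrix
rbind(1:3, 1:3)  # 2 x 3 matrix

# The 2 x 3 matrix from the walkthrough, filled row by row.
n <- matrix(1:6, byrow = TRUE, nrow = 2)

# rbind appends a row; the new row needs one value per column.
n_rows <- rbind(n, 7:9)

# cbind appends a column; the new column needs one value per row.
n_cols <- cbind(n, c(10, 11))

dim(n_rows)  # 3 3
dim(n_cols)  # 2 4
dim(n)       # 2 3, unchanged because the results were not assigned back to n
```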
            • 81:00 - 81:30 so let's also look at some other examples so basically if you would want to work with matrix one of the useful things would be naming the matrix that is in case of matrices we can assign names to either the columns or the rows if you don't do it we see the default values here which follows a numbering but what we can also do is we can use two functions here one is row names or you can use
            • 81:30 - 82:00 column names so these are the two functions which can be used so for example let's do a ctrl l let's try to get our n and this is what we are doing here but what we would want to do is we would want to give them some names so for example i'll say row names and then i will basically pass in a vector which has row names or vector which has column names so what i can do here is i can say
            • 82:00 - 82:30 i would want to give row names to n and then i basically give some value so for example let's say row 1 and then let's say row 2 and now i can look at my n which has the row names assigned to my rows similarly i could have also given column names so all i need to do here is i need to say
            • 82:30 - 83:00 column one and then i will say column two and then i can be using column names and let's look at this one so what went wrong here so we have three columns here we forgot that so we have to add one more column name and then it should be fine so now if you look at this one we have just given row names and column names so naming the columns or rows in your matrices can
            • 83:00 - 83:30 be very useful now as the previous error says there is also a function called dim names and that's basically an argument of matrix function which can be used so we could also do something like this so for example i have dim names so let's have our n and then what you can do is you can do a dim names
            • 83:30 - 84:00 which you can then just create a list and in this one you can pass in a vector for row one and then vector for row two and what we can do here is once we have given this let's give a comma here and then give c and then give your column names which is column 1 column 2
            • 84:00 - 84:30 and then basically column 3 and now if you just look at dim names so you can just see that you have given some row names and column names and this can be used basically to assign to your list so if you try to store different objects in a matrix what would happen coercion would take place right so for example if i have x and let's basically
            • 84:30 - 85:00 try to create a matrix which will have 1 to 8 and let's say the number of columns is going to be 2 so let's look at our x and this has the values now what if i create say l and then basically i will create a matrix which will be a matrix of letters so let's say letters and then here with letters i'll
            • 85:00 - 85:30 say 1 colon 6. now i would want to give the number of rows and let's give it say 4 and let's say number of columns and let's give it 3 and now let's look at the value of l so it has letters and x is having numbers and what if we bind them together using c bind which is for column wise binding
            • 85:30 - 86:00 so for example if i do a c bind and then pass in my x comma l so if you see here there is a coercion which has happened where everything is converted into character so you can always do a class and you can check so this is a simple example of working with matrices there are much more you can do subsetting like what we saw in list but that we can learn later now let's learn about data frames
            • 86:00 - 86:30 and what is the data frame and how do you use r to work with data frame now data frame is used to store the data in the form of a table and for this we have a function data dot frame to create a data frame so what we know already is that data sets are comprised of observations or what we call as instances and variables and we always have observations
            • 86:30 - 87:00 to which some variables are associated for example we can talk about data sets of say five people now let's look at the information here here we look at the body mass index bmi where we are using a data dot frame function and then we are passing in say gender so we use the c function to pass in the values and then you have height and then you have weight and age and these things then become the columns
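The matrix naming and coercion steps above can be sketched together. The exact name strings are assumptions; the matrices x and l follow the dimensions given in the walkthrough.

```r
# Naming rows and columns after the matrix exists.
n <- matrix(1:6, byrow = TRUE, nrow = 2)
rownames(n) <- c("row1", "row2")
colnames(n) <- c("col1", "col2", "col3")  # one name per column, three in total

# The same result in one call, using the dimnames argument of matrix().
n2 <- matrix(1:6, byrow = TRUE, nrow = 2,
             dimnames = list(c("row1", "row2"),
                             c("col1", "col2", "col3")))

# Coercion: a matrix holds a single atomic type.
x <- matrix(1:8, ncol = 2)                     # 4 x 2, integer
l <- matrix(letters[1:6], nrow = 4, ncol = 3)  # 4 x 3, character (letters recycled)
both <- cbind(x, l)                            # 4 x 5, everything becomes character
typeof(both)  # "character"
```

Passing too few names, for example two column names for three columns, is what triggers the error seen in the walkthrough.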
            • 87:00 - 87:30 of your data frame so for example if we would want to work on creating a data frame for people where let's say each person is an instance and properties about each person such as name age child or if the person has a child would become the variables so if we have such kind of information we
            • 87:30 - 88:00 cannot easily store that in matrix or list now data frames can be used for such cases now it's a fundamental data structure to store data sets pretty similar to matrix as it has rows and columns and here rows correspond to observations now here we can talk about in every individual or every person columns correspond to variables that is properties for each person
            • 88:00 - 88:30 now difference between your data frame and matrix is that data frames can contain elements of different data types so for example we can have one column being character other being numeric and yet another being logical or numeric so restriction is that elements in one column should be of the same data type now how do we work with data frames let's see some examples so what we can do is we can bring up our r
            • 88:30 - 89:00 so when we talk about data frames usually we don't create data frames by ourselves we import data from data sources such as csv file or rdbms or even your excel or spss and then we create data frames of course r has ways to manually create data frames using data dot frame function so we can create three vectors first and then we can pass in those vectors to
            • 89:00 - 89:30 create our data frame so let's do that so let's say name and here i will use the assignment operator which we have learnt earlier and then i'll use c and then i can give some names here so let's say john and let's say peter let's say patrick and let's say julie and let's also give one more name
            • 89:30 - 90:00 so let's say bob so this is the vector which we are creating and we can check this is the vector which we have created now obviously you can do a class and you can check what is this and that says it is a vector of character now similarly we can create one more vector which is age and let's give some numbers here so for example let's say 28 and 30 31
            • 90:00 - 90:30 38 35 and these are the values for the age so age is also created similarly we can say if each person has children so we can say children and then i'll create one more vector and here i'll give values which are logicals i'm not going to give any numerics or character but i'm using logicals here so if a particular person has children or
            • 90:30 - 91:00 no so let's have this vector created and now we have three vectors that is name age and children and we can use this to create our data frame so we can just call our data frame as df and what we can do is we can use data dot frame function and then what we can do is we can pass our vectors within this such as name age
            • 91:00 - 91:30 children and that should create my data frame let's have a look at this and this shows me that the data frame is created now column names are inferred from variables which are passed to data dot frame function so the variables which we have passed to our data dot frame function is name age and children and those become the column headings for my data frame now what we could have also done is we could have created it in a different way so i could have said df
            • 91:30 - 92:00 and then i could have used my data dot frame function and in data dot frame function i could have said name is going to be name age would be age and then i could say children could be children and i could do this and this is also one more way where i'm creating a data frame
            • 92:00 - 92:30 and in this way we can now have rows of data frames like in matrix so this is also one way of creating a data frame to look into the data frame structure we can always use str and then we can pass our data frame and this basically prints out similar to that of list so we also need to know that under the hood
            • 92:30 - 93:00 data frame is a list and in this case this is a list with three elements so each list element is a vector of length five corresponding to the number of observations if we create data frame with vectors not of same length we would get an error now here when we look at our data frame we know that name is a column so name column which is character is actually a factor instead of character
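Both ways of building the data frame described above can be sketched as follows. The names and ages come from the walkthrough; the logical values for children are assumed, since the transcript does not state them.

```r
name <- c("John", "Peter", "Patrick", "Julie", "Bob")
age <- c(28, 30, 31, 38, 35)
children <- c(FALSE, TRUE, TRUE, FALSE, TRUE)  # assumed values

# Column names are inferred from the variable names.
df <- data.frame(name, age, children)

# Equivalent form with explicit column names.
df2 <- data.frame(name = name, age = age, children = children)

names(df)  # "name" "age" "children"
df
```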
            • 93:00 - 93:30 to suppress this behavior we can always use a property that is strings as factors equals false so what i can do is i can do a data frame like this use my data dot frame function and then basically we can pass in our vectors that is name age and children and then what i can do is i can say strings as factors and set this value to false
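A minimal sketch of inspecting the structure and controlling the string-to-factor conversion. One detail worth knowing: from R 4.0.0 onward the default of stringsAsFactors is FALSE, so character columns stay character unless you ask for factors; in older versions the default was TRUE. The children values are assumed.

```r
name <- c("John", "Peter", "Patrick", "Julie", "Bob")
age <- c(28, 30, 31, 38, 35)
children <- c(FALSE, TRUE, TRUE, FALSE, TRUE)  # assumed values

# Keep strings as character (explicit, works the same on any R version).
df_chr <- data.frame(name, age, children, stringsAsFactors = FALSE)

# Ask for factors explicitly, which was the default before R 4.0.0.
df_fct <- data.frame(name, age, children, stringsAsFactors = TRUE)

str(df_chr)         # structure: 5 obs. of 3 variables
class(df_chr$name)  # "character"
class(df_fct$name)  # "factor"

# Under the hood a data frame is a list with one element per column.
is.list(df_chr)  # TRUE
length(df_chr)   # 3
```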
            • 93:30 - 94:00 so if i do this and now if i look at my data frame structure sorry yeah now let's look at this one and this one shows me that unlike your earlier one now we are creating a data frame where our name would be containing characters there also by default it was showing as
            • 94:00 - 94:30 character because this value by default is set to false otherwise it would have created characters as factors as we say now how do we do a subset and extend and sort data frames in r so as we have learned so far in brief about your data frames so data frame is somewhere like an intersection between matrices and lists so if you would want to subset a data
            • 94:30 - 95:00 frame we can always use the square brackets and in that we can use the single square brackets which are from matrices or we can use double square brackets from list or we can also use the dollar symbol so that all these things can be used to subset the data frame so let's use our data frame which contains information about people so we can select single element from our
            • 95:00 - 95:30 data frame so here what we can do is we can just say df and then i can use a single bracket and i can just do a three comma two so it would be good if we can first print the data frame and that's my value and now let's do a single bracket and let's look at this one so this tells me that we are using the row index first which is number three which shows me that we would be going to
            • 95:30 - 96:00 the row number three and then we point or pass in our column index that is number two so we could have done it in a different way also so we could have done df and then give it row index and then give the column name which you are interested in looking at and that also gives me the value so just like matrices we can choose to omit one of two indices to end up with entire row or entire column and for example if we would be
            • 96:00 - 96:30 interested in looking for information for patrick what i could have done is i could have just said df 3 comma and this is showing me the entire row now always remember whatever results we see here that is giving me a data frame with a single observation because there has to be a way to store different data types and that's why the result is also a data frame what we can also do is to get entire age
            • 96:30 - 97:00 column we can just use our data frame and then we can pass in the column name here like this and that gives me just the column now here the point to notice is result is a vector because columns contain elements of the same type in previous example we were seeing a row and in that row was not a vector it was a data frame because
            • 97:00 - 97:30 values were of different data types now subsetting a data frame that results in a data frame and contains multiple observations can also be done by doing something like this for example i will do df and then i will say let me get 3 comma 5 and then i can just say age and children for example so let's say age
            • 97:30 - 98:00 and children and i can be pulling out the values in this way so i could also be just getting the results in the age column if i'm interested in by just saying df and here i can just pass in the column number and that also gives me the age column now we know data frame is a list containing vectors of same length this
            • 98:00 - 98:30 means we can use list syntax to select elements also and what we can do is we can use our dollar symbol and then choose the column name and this is also one way wherein you can pull out the values or you can use double brackets as i mentioned earlier and pass in the column name so that's also fine or you can give a column number and that also would work and in all
            • 98:30 - 99:00 these cases the result is a vector now with single brackets you can still do it always remember if you use single brackets then that will result in a data frame the result can be a data frame here but what we are seeing here is a list which contains only age column having the data elements so these are different ways in which you can do a subsetting of a data frame now
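The subsetting forms walked through above can be sketched side by side. The children values are assumed, and selecting rows 3 and 5 with c(3, 5) is one reading of the spoken "3 comma 5".

```r
df <- data.frame(name = c("John", "Peter", "Patrick", "Julie", "Bob"),
                 age = c(28, 30, 31, 38, 35),
                 children = c(FALSE, TRUE, TRUE, FALSE, TRUE))  # assumed values

# Matrix-style: [row, column]
df[3, 2]       # 31
df[3, "age"]   # 31
df[3, ]        # whole row for Patrick: a one-row data frame
df[, "age"]    # whole column: a numeric vector
df[c(3, 5), c("age", "children")]  # several rows and columns: a data frame

# List-style: these three all return the age column as a vector.
df$age
df[["age"]]
df[[2]]

# Single brackets with one index keep the data frame wrapper.
df["age"]
df[2]
```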
            • 99:00 - 99:30 using single brackets or double brackets can have serious consequences so we need to always think about what we are dealing with and how are we handling it now what we can also do is we can extend our data frames that is we can add variables we can add columns that is adding variables or we can add rows which are nothing but observations so adding columns is like adding new elements to the list and for which we
            • 99:30 - 100:00 can obviously use dollar or double brackets say for example now this is my data frame and if we would want to add height whose information is in a vector so let's say height let's create a vector here and this one is what i would want to add for each person so let me do this and let me pass in some values here and
            • 100:00 - 100:30 the last one something like this so this is a vector created now what i can do is so we have data frame is called df so we can say df dollar height and then i will pass in this vector here and now if i look at my data frame you see the fourth column has been added and that's my height column now what i can do is i could have done it in a different way
            • 100:30 - 101:00 basically if i had my data frame i could have just done df double brackets and then give it a name and then i could have passed my vector in this way however so this is also one way of doing it we have already added the column so we don't need to repeat the step now what we can also do is we can use a c bind function and if you remember c bind that is for column binding so for
            • 101:00 - 101:30 example let's create a weight vector now and let's pass in some values here so for example let's say 75 65 54 34 78 and these are my values of weight now what i can do is i can just do a c bind and then pass in my data frame and then pass in this vector and in this way i'm just adding columns or i'm extending my data frames by adding more columns to it now obviously
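The three ways of adding a column described above can be sketched as follows. The weight values come from the walkthrough; the height and children values are assumptions. Note that cbind returns a new data frame and leaves df untouched unless the result is assigned back, which matters for the row-binding step later.

```r
df <- data.frame(name = c("John", "Peter", "Patrick", "Julie", "Bob"),
                 age = c(28, 30, 31, 38, 35),
                 children = c(FALSE, TRUE, TRUE, FALSE, TRUE))  # assumed values

height <- c(163, 177, 163, 162, 157)  # assumed values
df$height <- height         # dollar syntax adds a fourth column
df[["height"]] <- height    # double-bracket syntax, same effect

weight <- c(75, 65, 54, 34, 78)
df_w <- cbind(df, weight)   # returns a new five-column data frame

ncol(df)    # 4
ncol(df_w)  # 5
```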
            • 101:30 - 102:00 if we can use c bind then we can also use r bind to add new rows so for r bind creating a new vector won't work because we need to create a new data frame with one single observation remember row will have values of different data types so we cannot create a vector we have to create a new data frame and then we can add it using r bind so let me create a data frame here for
            • 102:00 - 102:30 example let's call the data frame tom and let's pass in some values here so i will say data dot frame function and then let's give name what we can do is we can give age then we can give the logical value then we can give say height and since we have added weight let's
            • 102:30 - 103:00 also add weight and this is my data frame now we can use r bind function so i can say r bind and then i can pass in my data frame and this new data frame which we have created and this tells me that the number of columns of arguments do not match so we will have to check this one so we have our data frame which has just height so
            • 103:00 - 103:30 it does not have the weight that was only as the result of c bind so let's create tom again without weight and now let's do a r bind and let's again check what is the reason here so this is height and let me just check this so to look at this this is the error we were getting because i was creating a data frame with
            • 103:30 - 104:00 four columns and then i was trying to add that to a data frame which had three columns now yes we had done a c bind and c bind was showing us the fourth or fifth column but the original data frame only had three columns so what i did here was i did tom and then basically i created a data frame with three columns which matches with my original data frame
            • 104:00 - 104:30 which had three columns and then i could use r bind to basically add one more row so what we did was we used r bind and r bind was used to add a new row to our data frame now when it comes to sorting or ordering your data frame say for example we want to sort data frame by age now how do we do that so we could easily do sort df and then select
            • 104:30 - 105:00 our column and we could just do a sorting now if we do this it is good but not really what we need now other clear way of doing that would be using ranks so for example if i do a ranks and instead of doing a sort i would use order and then basically pass in my column so i would say df and then i would use age now in this case if i look at ranks it shows me
            • 105:00 - 105:30 a vector of ranks with rank position of each element now if i do a df dollar age it shows me the values and if you look at the ranks here the lowest value is your 28 and that's the lowest value and that's why we see its rank as one and so on we
            • 105:30 - 106:00 can look at the ranks so what we can also do is we can just do a df and then basically use ranks and we can just look at the result so this shows data frame which is an ordered data frame now based on ranks now if we would want to do it in a descending order what we could also do is we could do a df and then use order and within order i will basically pass
            • 106:00 - 106:30 my data frame i will choose my column and then i could say decreasing equals true and i could do this and here this could show me the value so it says undefined column names so what i would have to check is what is my data frame here so we have age and then what we would have to do is we would have to select a particular column so let's do that
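Adding a row and ordering by age, as described above, can be sketched like this. The values for tom and children are assumptions. One nuance: order() returns the row positions that would sort the column (a permutation), which is why it can be used directly as a row index; the trailing comma inside the brackets is what selects all columns.

```r
df <- data.frame(name = c("John", "Peter", "Patrick", "Julie", "Bob"),
                 age = c(28, 30, 31, 38, 35),
                 children = c(FALSE, TRUE, TRUE, FALSE, TRUE))  # assumed values

# A new row must be a one-row data frame with matching columns.
tom <- data.frame(name = "Tom", age = 37, children = FALSE)  # assumed values
df <- rbind(df, tom)

sort(df$age)            # sorts the ages only, not the data frame

ranks <- order(df$age)  # row positions in ascending order of age
df[ranks, ]             # data frame ordered by age

# Descending order; note the comma after the row index.
df[order(df$age, decreasing = TRUE), ]
```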
            • 106:30 - 107:00 and here i have just selected the column and then there is a comma missing that was showing an error so now we can have the data ordered in a descending way so there are dozens of packages such as dplyr and data.table which can help you manipulate filter merge and sort your data frames so this is in brief about the data frames working with data frames subsetting them and also sorting the data in your data
            • 107:00 - 107:30 frames now one more important type of object in your r is vector and that really helps us in various ways so let's see how we work on vectors here so to create a vector we can use the c function and pass in the values those will be the objects or elements within the vector and then you can look at the value of the vector or also at the class of it which tells me the values are numeric now in case of vector all the values
            • 107:30 - 108:00 have to be of the same type or belong to the same class we can say so here we are creating a vector looking at the value of it and then looking at the class which says the values passed in here are character similarly we can do it for logicals that is true false and then look at the value of this and this class is logical now what we can also do is we can print all the three vectors at once and here
            • 108:00 - 108:30 we will use semicolon to separate two or more variables and we can pull out the values of all the vectors which we see here now what happens if we pass in the values which belong to different classes or you can say different data types so within a vector if you do that there is something called as coercion which takes place which will convert all the values into one type and in this case it has converted everything into character
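A minimal sketch of vector classes and coercion. The variable names and values are assumptions for illustration.

```r
v_num <- c(1, 2, 3)
v_chr <- c("a", "b", "c")
v_lgl <- c(TRUE, FALSE, TRUE)

class(v_num)  # "numeric"
class(v_chr)  # "character"
class(v_lgl)  # "logical"

# Semicolons separate several expressions on one line.
v_num; v_chr; v_lgl

# Mixing types: any character value pulls everything to character.
mixed <- c(1, "a", TRUE)
mixed         # "1" "a" "TRUE"
class(mixed)  # "character"

# Logical mixed with numeric becomes numeric (TRUE is 1, FALSE is 0).
mixed_num <- c(TRUE, 2, FALSE)
mixed_num     # 1 2 0
```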
            • 108:30 - 109:00 similarly we can pass in values wherein we can pass logical and numeric and in this case it's not going to go for character it is going to convert everything into numeric now if i had done this where i passed a character and numeric and if you look at this then it has converted everything into character so character always takes a precedence if it is one of the values of vector and you have other values which are not characters then in that case coercion
            • 109:00 - 109:30 will happen there is one more way of creating a vector and that is by providing a range to your c function so we can do that here wherein i said c 1 colon 20 and then basically look at the value of vector 7 so it shows me all the values starting from 1 till 20 however there is one more way you can use the sequence function to do the same thing now i could avoid the bracket i could avoid
            • 109:30 - 110:00 the c function and i can straight away pass a range and that is also fine to create our vector starting from 1 ending at 25 so what if i want to create a vector with odd values between 1 to 20. now in this case i am going to say how many values to skip or to jump so i am creating a variable called odd value i am using sequence function and then to that i'm passing the beginning number the ending number and then the skip or
            • 110:00 - 110:30 the jump and now if you look at the values it shows me only the odd values well you could have done the same thing to get even values and that's not very complicated so you can start from two and then you can do skip wherein after two it basically gives you every second value so we are looking at the even values and this is how you can create a vector which is having odd or even values
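The range and sequence forms described above can be sketched as follows; the variable names are assumptions.

```r
v_a <- c(1:20)  # range wrapped in c()
v_b <- 1:25     # the colon operator alone is enough

# seq(from, to, by): start, end, and the step between values.
odd_value <- seq(1, 20, by = 2)   # 1 3 5 ... 19
even_value <- seq(2, 20, by = 2)  # 2 4 6 ... 20

odd_value
even_value
```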
            • 110:30 - 111:00 now what if you want to create a vector with 10 odd values starting from 20 so you are basically giving a length so here you can say from where you would want to start what is your skip and then the length of the vector which tells me it gives me 10 odd values beginning from 20 or from 20 onwards that is we take it from 21. now one of the
            • 111:00 - 111:30 requirements is always to name the values so that we can access the values either by indexing or by their name which have been passed to the value so let's see that so let's create a vector which is called temperature so variable is temperature pass in the values to this look at the values of temperature now what we would want to do is we would want to assign these names to each value which makes it more readable more accessible so i can use the names function
            • 111:30 - 112:00 pass in my temperature as a vector to names function and then assign the names to each value of temperature now if you look at temperature it shows me the names which have been assigned well we could have done it in a different way we could have created a vector of names something like this and then what i could have done is i could have created one more vector such as temperature and instead of assigning values we could have assigned the vector
            • 112:00 - 112:30 to our existing vector so if you do this so you are assigning the names vector to the temperature 1 and now look at the values it still does the same thing so this is where you are assigning names to every value of your existing vector now there is one more way and that is using your sequence so here i'm creating a sequence which starts with 100 and goes
            • 112:30 - 113:00 to 220 with a skip of 20 values or every jump would be 20 values so let's do that use your names function on price and then what i'm going to do is i'm going to use my paste 0 function which takes p and then 1 to 7 as the values so we know paste 0 basically skips the space and we are going to assign those values as names to price
            • 113:00 - 113:30 and now let's look at our price so that basically gives me the names as we desire so these are some smarter ways of assigning names to every element or every object within your vector now how do we perform some basic operations let's have a look so let's create a vector passing in the values and then you can simply do an addition on two vectors where each element is getting added to other element of the vector
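The naming approaches described above can be sketched as follows. The temperature values and day names are assumptions; the price sequence assumes seven values from 100 in steps of 20, which matches the seven names p1 to p7.

```r
# Assign names directly with names().
temperature <- c(72, 71, 68, 73, 70)  # assumed values
names(temperature) <- c("Mon", "Tue", "Wed", "Thu", "Fri")
temperature

# Or keep the names in their own vector and assign that.
days <- c("Mon", "Tue", "Wed", "Thu", "Fri")
temperature1 <- c(72, 71, 68, 73, 70)
names(temperature1) <- days

# Generate names with paste0(), which joins pieces without a space.
price <- seq(100, 220, by = 20)    # 7 values
names(price) <- paste0("p", 1:7)   # "p1" ... "p7"
price
```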
            • 113:30 - 114:00 you can subtract two vectors that is element to element subtraction element to element multiplication or division and you can basically perform operations on the vectors now how do we use some inbuilt basic math functions and that's pretty easy this is my vector now let's do a sum which sums up all the elements let's find out a standard deviation for all the values let's find out the variance
            • 114:00 - 114:30 for all the values here let's do a product of vector values find the maximum or find the minimum value so these are some basic inbuilt math functions which sometimes are useful in our data science or data analysis kind of activities now one more requirement might be comparing the vectors using comparison operators and this is where i create a vector 1 create a vector 2 and let's find out the
            • 114:30 - 115:00 values in v1 which are smaller than v2 values and that gives me the logicals as the response that is false true and false similarly you can do v1 greater than v2 or you can say where v1 values are not equal to v2 or equal to v2 so these are some simple comparison examples now i can create a different vector and then i can find out individually if the elements in the vector are lesser than 3 by just doing a
            • 115:00 - 115:30 v lesser than three so it compares each element with this so you are actually using one scalar value to compare it with all the elements and you can do that it gives you the logicals so you can also be doing slicing and indexing on vectors and this is very much important when you're storing your data in vectors how do you access them so let's create a vector using sequence let's give it some names as we have seen in past
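The element-wise operations, built-in math functions, and comparisons described above can be sketched as follows. The vector values are assumptions, chosen so that v1 < v2 gives FALSE TRUE FALSE as in the walkthrough.

```r
v1 <- c(8, 5, 7)  # assumed values
v2 <- c(5, 7, 6)  # assumed values

# Arithmetic works element by element.
v1 + v2  # 13 12 13
v1 - v2
v1 * v2
v1 / v2

# Built-in summaries over a single vector.
sum(v1)   # 20
sd(v1)    # standard deviation
var(v1)   # variance
prod(v1)  # 280
max(v1)   # 8
min(v1)   # 5

# Comparisons return logical vectors.
v1 < v2   # FALSE TRUE FALSE
v1 > v2
v1 != v2
v1 == v2

# A single value is compared against every element.
v <- c(1, 2, 3, 4, 5)
v < 3     # TRUE TRUE FALSE FALSE FALSE
```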
            • 115:30 - 116:00 and let's look at our price one so that tells me the name and the values now you can access the elements using indexing so let's get the third element and it shows me 590. remember the indexing here starts with 1 unlike other programming languages like python where indexing starts with 0. now i can also get the 3rd and 4th value by doing a 3 colon 4 i can also specify the vector and say one comma four and that shows me
            • 116:00 - 116:30 the first and the fourth position or the second and sixth positions so this is one way where you are using indexing to access the elements similarly i can give the names now that's where we see the benefit of giving names to every element so i can use the c function pass in the names and look at the values for those particular names or selectively pick different elements by their names
            • 116:30 - 117:00 or we can also use the square brackets wherein we pass the names so sometimes it can also be useful to use logical indexing that is we pass a true or false for each position and only the positions marked true are returned so there is one useful way where you can exclude a particular position which might be an
            • 117:00 - 117:30 NA value or might be a value which you are not interested in and that's where you will say minus 2 which will skip the p2 value or minus 2 and minus 5 where we are skipping p2 and p5 and we can exclude particular values from our vector now how do we do a comparison operator on the values of vector so you can just say price 1 and i would want all the values which are greater than 600 or you can assign this to a filter and
            • 117:30 - 118:00 then basically pass in the filter for your vector so these are some simple basic operations which you can run using your r programming where you would want to manipulate where you would want to store some data and extract that data use your different logical operators or other operators and perform your basic easy computations now that we have seen some basic operations using r let's look at some
            • 118:00 - 118:30 more operations when you're working with vectors such as one of the common issues is handling the missing values now here we are assigning a vector to a variable order detail and this one has a missing value now let's see how this is handled and you see all the values in the vector are assigned what you can also do is you can assign names as we have seen earlier by using the
            • 118:30 - 119:00 names function and then look at the value of order detail so you see the names and these are your missing values which are also taken care now what we can also do is we can perform an operation on a particular vector which will be applied to all values of the vector so for example here i will just add a scalar value plus 5 to the elements in the vector and that shows me number 5 has been added to each element or each object in the vector
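The indexing, exclusion, filtering, and missing-value steps from the last few segments can be sketched like this. The transcript only confirms that the vector is built with a sequence, that its third element is 590, and that order detail holds 10 20 30 NA 50 60; the sequence bounds and the names p1..p7 are assumptions.

```r
# Assumed sequence: chosen so that the third element is 590, as in the transcript.
price1 <- seq(550, 670, by = 20)
names(price1) <- paste0("p", 1:7)   # hypothetical element names

price1[3]              # third element (R indexing starts at 1)
price1[3:4]            # third and fourth elements
price1[c(1, 4)]        # first and fourth elements
price1[c("p1", "p4")]  # access by name
price1[-2]             # everything except p2
price1[c(-2, -5)]      # everything except p2 and p5

# Logical filter: keep only values greater than 600.
keep <- price1 > 600
price1[keep]

# A scalar is applied to every element; a missing value stays missing.
order_detail <- c(10, 20, 30, NA, 50, 60)
plus5 <- order_detail + 5
print(plus5)
```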
            • 119:00 - 119:30 now if you would want to work on two vectors for example to add two vectors let's create a vector called new order and then let's add it to order detail now in this case what we are doing is we have a vector which holds 5 and 10 and what we are doing is we are adding values to order detail now our order detail earlier was 10 20 30 NA 50 and 60
            • 119:30 - 120:00 and what i have done is i have passed in a vector which is 5 and 10 and you are adding it to the elements so 5 gets added to 10 and then your value 10 gets added to 20 and then you have again 5 which is added to 30. now you cannot add in anything to a missing value so that remains as it is then you add again 5 to 50 and then 10 is added to 60. so in this way you are adding two vectors which are not of same
            • 120:00 - 120:30 length but you are adding these values now what i can also do is i can update the order by doing this so i am creating an update order and now let's look at the value of update order what does it show so you are basically doing the same thing so if you would want to work on a subset of vector how do you do that so here you are using some indexes so i'm saying order detail and this is my order detail
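The addition of two vectors of different lengths described above relies on R's recycling rule: the shorter vector is repeated until it matches the longer one. A minimal sketch, using the values given in the transcript (the variable names are written with underscores here as an assumption):

```r
order_detail <- c(10, 20, 30, NA, 50, 60)
new_order <- c(5, 10)

# new_order is recycled as 5 10 5 10 5 10 before the addition.
# Anything added to NA is still NA.
update_order <- order_detail + new_order
print(update_order)
```

Because 6 is a multiple of 2, R recycles silently; with lengths that do not divide evenly it would also emit a warning.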
            • 120:30 - 121:00 so let's take one colon two and assign it to first two so if we look at the value which is assigned to first two we have just sliced and added a subset of vector to this one and if i would want to take the length of order detail it shows me the length here which is six elements here including the missing value also what we can also do is we can do some more operations so for
            • 121:00 - 121:30 example from order detail what i am doing is i am saying length minus 1 and then up to the length so let's do this and let's see the result of this so what we have done is we had our order detail which had these values and what we have done is we have said length minus 1 colon length so you have taken these two elements and you have assigned that to
            • 121:30 - 122:00 your v1 similarly we can go from length minus one back to the second element so i can do this and now let's look at the value of v2 so this shows me the values where you start at length minus one and then go back till the second index position which is 20 so you get your 50 NA 30 and 20 because you started with
            • 122:00 - 122:30 length minus 1 and went up till the second index position similarly we can use the length and we can take it from this element and let's look at the value of v3 so that shows me that i am doing some slicing or i'm getting subset of my vector so similarly you can also do this one so v4 and let's do this and then let's look at the value of v4
            • 122:30 - 123:00 so it gives me the values based on our subsetting or slicing now you can extract all the values below 30 and this is where you are doing a comparison so you will take your vector and then you would want to compare each value if it is less than 30 and you would want to take all the values here so it gives me the logicals or the response for all the values which are lesser than 30 what we can do is we can also use the square brackets and
            • 123:00 - 123:30 do this this will show me the actual values earlier we were just getting the logicals but here we are getting the values now to omit any missing value from the vector we can use na.omit and this one will help me in getting rid of the NA values plus i'm also checking the values if they are less than 30 and then i am basically wrapping that in na.omit
            • 123:30 - 124:00 so you can do something like this you can look at the values what you can also do is you can find the order details that are multiples of three and here we would want to use modulus and we would want to find out if the remainder is 0 then i am getting the numbers which are divisible or multiples of 3. so let's do this and it gives me again the logical values of all the values which are divisible by 3 giving us a remainder of 0
            • 124:00 - 124:30 or if you would want to look at the values then you can say order detail open up a square bracket and then pass in your condition now we can then omit n a from this one and then we can look at the values so this is simple way where you are subsetting a vector or extracting the values which you are interested in which might be one of the requirements of your data wrangling or data manipulation or just data extraction
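The subsetting steps covered in the preceding segments can be sketched as below. The vector values come from the transcript; the variable names and the exact expressions are assumptions about what was typed on screen.

```r
order_detail <- c(10, 20, 30, NA, 50, 60)
n <- length(order_detail)

first_two <- order_detail[1:2]        # 10 20
v1 <- order_detail[(n - 1):n]         # last two elements: 50 60
v2 <- order_detail[(n - 1):2]         # backwards from position 5 to 2: 50 NA 30 20
# Note the parentheses: n - 1:n would be parsed as n - (1:n).

# Comparison returns logicals; square brackets return the values themselves.
order_detail < 30
below_30 <- na.omit(order_detail[order_detail < 30])

# Multiples of 3: the remainder after dividing by 3 is 0.
order_detail %% 3 == 0
multiples_of_3 <- na.omit(order_detail[order_detail %% 3 == 0])

print(as.vector(below_30))
print(as.vector(multiples_of_3))
```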
            • 124:30 - 125:00 now i can also use a sum function now if we do this it returns NA because there is a missing value and the sum of anything with NA is NA now what i can do is i can use na.rm to remove the NA values so i can do a sum on order detail where i intend to add up all the values but what i also want to do is i want to remove the NA value so i'm giving it a
            • 125:00 - 125:30 value as true and then if i do it it gives me the sum of all the values so similarly you can do a mean you can do a maximum you can find out the minimum value standard deviation or even square root now these are some simple operations what we are doing on vector where we are interested in extracting some specific values now let's look at matrix which we have also discussed and matrix is also one way where you can use the matrix function to create a matrix
            • 125:30 - 126:00 which is multi-dimensional so for example if i do this and if i look at the value of v i get a vector which starts with the value of 20 ends with 30 and at any point of time you can convert this to matrix so first we created a vector and now i'll create a matrix out of it wherein i am seeing the row numbers i am seeing the column number and i am seeing the values in that particular column
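A short sketch of the two ideas just covered: summary functions with `na.rm = TRUE`, and turning a vector into a matrix. How `v` was created in the video is not stated, so `20:30` and `as.matrix` are assumptions.

```r
order_detail <- c(10, 20, 30, NA, 50, 60)

sum(order_detail)                          # NA, because one value is missing
total <- sum(order_detail, na.rm = TRUE)   # missing value dropped first
avg <- mean(order_detail, na.rm = TRUE)
print(c(total = total, mean = avg))

# Assumed vector running from 20 to 30, converted to a one-column matrix.
v <- 20:30
m <- as.matrix(v)
print(dim(m))
```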
            • 126:00 - 126:30 so you have already done that now let's take it to the next level so let's create a matrix wherein we are using the matrix function we will say 0 comma 3 comma 3 and now let's look what it has done so you have created a matrix which is of three columns and three rows and by default the row number and column numbers have been assigned to them we can also create a matrix by passing in values so we can say 1 colon 9 and then
            • 126:30 - 127:00 give the dimensions that is number of columns is 3 number of rows is 3 and if i look at the matrix now i have passed in the values to my matrix sometimes you may want to arrange the data in a matrix for particular kind of calculations you can also use nrow and byrow so you can say how many number of rows you would want and you would want to assign the data row wise so when we are doing this now
            • 127:00 - 127:30 if you notice the difference between the previous one where we just gave the values and we said three rows and three columns so it was doing it column wise so one two three four five six seven eight nine but here we said byrow is true so it has arranged the values in a row wise fashion so it goes one two three four five six and seven eight nine similarly i could have just done this by giving the dimension and selecting byrow and
            • 127:30 - 128:00 if i do this it is still doing the same thing now what we can also do is we can create matrix using vectors so here let's create a vector stock1 and then stock2 now we would want to merge both the vectors so you can always do a c function and then create a new vector that is stocks which is a merged result of stock one and
            • 128:00 - 128:30 stock two and let's look at the result so that's my stocks that's a vector and now what i would want to do is i would want to create a matrix using the stocks so i'm giving it a name that is stock.matrix i'm using the matrix function wherein i will pass my vector i will say by row so i want the values to be arranged row wise and i'm also selecting the number of rows so if you look at this one so the values
            • 128:30 - 129:00 which we had in our stock which was all the values have now been arranged row wise and in two rows so it starts with 450 51 52 45 and 68 that's my first row and the rest five values are arranged in the second row so one of the main requirements is instead of going for default column names and default row names we can give specific names to our columns
            • 129:00 - 129:30 and rows to make more sense to the data how do we do that so we can basically say days so this is a vector which we are creating and then what we want to do is we want to create a new variable which is stock 1 and stock 2. now this is for my columns and this will be for my rows now how do we assign that so we can say column names and this is where i will say on my stock.matrix
            • 129:30 - 130:00 i will assign days which has five values and that will become my column names and similarly using row names function i can basically assign row names to my matrix so if i look at my matrix now it shows me the column names and row names which we have assigned or which we have passed to our matrix now there are different functions which are associated with the matrix and let's look at some examples so these are some simple basic examples now if i say
            • 130:00 - 130:30 let me find out the number of rows and that gives me the number of rows or number of columns or get a dimension that is the number of rows and columns of your matrix now we might be just interested in getting the row names or column names or even the dimension names which basically returns the row and column names so in this way you can use these simple
            • 130:30 - 131:00 functions which are associated with matrix to extract information about your matrix or data which has been transformed into matrix to pull out some information about that one of the requirements which data scientist or data analyst might face is carrying out arithmetic operations on your matrix now what we can do is we can create a matrix which takes values 1 to 50. we want to arrange it by rows and we will say number of rows is 5 so that's my values
            • 131:00 - 131:30 starting from 1. now i can do an addition here by just doing a 5 plus mat1 and if you notice number five as a scalar value has been added to every element of the matrix similarly you can do a multiplication you can do a division you can basically return the quotient if you would want to do that or go for
            • 131:30 - 132:00 exponential values so you can perform simple arithmetic operations for every element of the matrix and what if you want to have arithmetic operations done on multiple matrices so let's do mat1 plus mat1 and we get a total where every element is added to the element in the same position you can do a subtraction you can do a multiplication and you can get the value so this might be also very
            • 132:00 - 132:30 useful when you are working on multi-dimensional data you can also do some more operations on matrix such as returning the sum for each column say you are doing a summation at a column level or at a row level or you want to do a mean for every row you can do that by using these simple functions now you can add rows and columns to a matrix using rbind and cbind functions
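The matrix steps walked through above (creation, naming, inspection, arithmetic, row and column summaries, and binding) can be sketched as follows. The stock prices, the day names, and the third stock vector are assumptions for illustration; only the function names and the overall shapes come from the transcript.

```r
# Default fill is column-wise; byrow = TRUE fills row-wise.
m_col <- matrix(1:9, nrow = 3, ncol = 3)
m_row <- matrix(1:9, nrow = 3, byrow = TRUE)

# Assumed stock prices for five days.
stock1 <- c(450, 451, 452, 445, 468)
stock2 <- c(230, 231, 232, 236, 228)
stocks <- c(stock1, stock2)
stock.matrix <- matrix(stocks, byrow = TRUE, nrow = 2)

# Meaningful names instead of the default indices.
days <- c("Mon", "Tue", "Wed", "Thu", "Fri")
colnames(stock.matrix) <- days
rownames(stock.matrix) <- c("stock1", "stock2")

nrow(stock.matrix); ncol(stock.matrix); dim(stock.matrix)
dimnames(stock.matrix)

# Scalar and element-wise arithmetic.
mat1 <- matrix(1:50, byrow = TRUE, nrow = 5)
plus5 <- 5 + mat1
doubled <- mat1 + mat1

# Column and row summaries.
colSums(stock.matrix)
rowSums(stock.matrix)
rowMeans(stock.matrix)

# rbind adds a row (the variable name becomes the row name);
# cbind adds a column.
stock3 <- c(150, 151, 149, 120, 165)   # hypothetical third stock
total_stock <- rbind(stock.matrix, stock3)
avg <- rowMeans(total_stock)
total_stock <- cbind(total_stock, avg)
print(total_stock)
```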
            • 132:30 - 133:00 so rbind is for row bind and cbind is for column bind but for that we have to first create a vector so let me create a vector of matching length which will then be added as a row to my existing matrix now my matrix has five columns so let's create a vector with five elements and then i can basically add this as a row to my
            • 133:00 - 133:30 existing matrix by doing this and now if i look at my values i will see the new values as the third row and if you also see the variable name becomes the row name and we have added a row to our matrix now similarly i can find out row means as we have seen earlier by calculating the mean or average so i can do that and i can find out the value of average now what i can do is i have got the
            • 133:30 - 134:00 average for every row and what we can do is we can basically do a column bind by using the cbind function and i will say i'm going to take the total stock which has three rows and then add the average and now let's look at the total stock which shows me the average value which is the new column which has been added to the matrix so these are some simple very simple operations which you can do
            • 134:00 - 134:30 but that gives you good insight in what can be done at a matrix level where your data is arranged in multi dimensions now how do we do a selection and indexing in matrix so in vectors we were using either names or we were using positions or we were using indexing now here let's create a matrix called student and we are using the matrix function but within the matrix function we are using the c function to create a vector
            • 134:30 - 135:00 which will pass in all the values which also has n a values if you closely notice we will split these values into number of rows is four so that means the values the number of values in this vector should be a multiple of four i'm saying columns is four and i would want to arrange this data row wise so i've done that and if you would want to get the dimensions out of this so i can do a dim names so what i'm doing here is
            • 135:00 - 135:30 on my student i'm assigning a list which will basically have these names which are basically assigned and now if you look at your student it basically shows me the values which were first applied to the row names that is john matthew sam and alice and then you have one more vector which goes as the column names for the values so you have not only created a matrix
            • 135:30 - 136:00 by using a vector by defining your dimensions that is number of rows and columns you have arranged the data in a row order and what you have also done is using a list function you have passed in the values which will be applied as row names and column names to your matrix now how do we extract particular columns here so we can take our matrix and we can just say comma 1 and that basically gives me the values for
            • 136:00 - 136:30 john matthew sam and alice and what we are looking at is the first column now i can also say from first column onwards i would want to look at how many columns so i can do this and now here i'm selecting first and second column i can also be using a vector function here where i'm saying one comma three and i'm getting the values from first and third column
            • 136:30 - 137:00 so second is not included here now if you would want to do row wise then you have to give the row position first so if i do a student one that gives me the row values and this is giving me values for my student which we are seeing here so for john we have 20 30 NA and 70 and that's what we get here when you do a row wise operation you can also do a row wise and how many rows do you want
            • 137:00 - 137:30 you can use the vector function to do that you can also select or slice out a value where you are getting an intersection of row 2 and column 2 and then you can also start from a particular position and then onwards get your rows so these are different ways in which you are slicing the values from your matrix by columns or by rows
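The row and column selections described above can be sketched on a small student matrix. Only John's scores (20, 30, NA, 70) and the row and column names are taken from the transcript; the other scores are illustrative assumptions.

```r
student <- matrix(
  c(20, 30, NA, 70,    # John (from the transcript)
    22, 28, 36, 80,    # Matthew (assumed)
    24, 26, 32, 75,    # Sam (assumed)
    28, 24, NA, 65),   # Alice (assumed)
  nrow = 4, ncol = 4, byrow = TRUE
)

# A list of two vectors: row names first, then column names.
dimnames(student) <- list(
  c("John", "Matthew", "Sam", "Alice"),
  c("Physics", "Chemistry", "Bio", "Maths")
)

student[, 1]          # first column, all students
student[, 1:2]        # first and second columns
student[, c(1, 3)]    # first and third columns
student[1, ]          # first row (John)
student[1:2, ]        # first two rows
student[2, 2]         # intersection of row 2 and column 2
student[2:4, ]        # rows 2 onwards
```

The position before the comma selects rows and the position after it selects columns; leaving either side empty selects everything along that dimension.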
            • 137:30 - 138:00 so at this point of time let me just type in student here and let's look at the value of student and then here we are interested in 3 colon 4 and then 2 colon 3 so what does that give me so you are looking at third to fourth row so you are looking at sam and alice and then you are looking at columns 2 and three so that basically gives you your 26 32 24 and NA
            • 138:00 - 138:30 so first is you're giving your row positions or how many rows you want and then you're giving your column so similarly you can do this you can say from row number two to four and then column wise you can say one two three so if we do this so this tells me two columns which is first and second and it shows me rows which is from second to fourth so in this way we can extract data based on rows and columns now if we
            • 138:30 - 139:00 would be interested in finding out a specific value so for example if i again bring up student this is my student and what i would be interested in is getting the value of john and for specific subjects so maybe we are looking for 2 colon 3 now if i do this it shows me for john and what we are interested in is 2 colon 3 so that gives me the value for chemistry and biology so you are
            • 139:00 - 139:30 giving the columns so row wise you have already specified the name and that basically selects the particular row i could have given a number and chosen which row or which rows we would want to pull out the values now if i would want to find out the value for john and sam now in that case i could use indexing or positioning but that has to be continuous but here you are talking about john and sam which has matthew in between so we will basically
            • 139:30 - 140:00 pass both names in a vector to get the values for john and sam and then we will look at column 4 now that is basically giving me the values in the fourth column which is 70 and 75 similarly if you go further you can look at maths and bio score of sam and alice so you will give your row names that is sam and alice and then you would want the values for maths and
            • 140:00 - 140:30 bio so that is basically your third and fourth column and we can do that by looking at the values how do you find out an average well that's pretty simple you can use the mean function on student you will select your row name that is john you also want to get rid of NA values otherwise that will give a problem so you get rid of that by saying na.rm
            • 140:30 - 141:00 equals true and then you get the average score of john now how do i do further computation that is if i want to find out the average and total score of all students so in this case i can use the apply function here i'm saying i'm working on student and we give the margin as 1 which means the function is applied to every row and then we give the function so i want to find out mean
            • 141:00 - 141:30 i want to remove or get rid of the NA values and now if i look at help apply it tells me how the apply function works over the array margins so i will do an apply function on student again with margin 1 so that it works row by row i would say i want the sum and i want to get rid of the NA values so this gives me the sum for each student
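A minimal sketch of the `apply` calls described above. As before, only John's row and the names are from the transcript; the remaining scores are assumptions. In `apply(X, MARGIN, FUN, ...)`, a margin of 1 means "once per row" and 2 means "once per column"; extra arguments such as `na.rm = TRUE` are passed on to the function.

```r
student <- matrix(
  c(20, 30, NA, 70,    # John (from the transcript)
    22, 28, 36, 80,    # Matthew (assumed)
    24, 26, 32, 75,    # Sam (assumed)
    28, 24, NA, 65),   # Alice (assumed)
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("John", "Matthew", "Sam", "Alice"),
    c("Physics", "Chemistry", "Bio", "Maths")
  )
)

# Average for one student, ignoring missing scores.
john_avg <- mean(student["John", ], na.rm = TRUE)

# Mean and total for every student (margin 1 = rows).
student_means <- apply(student, 1, mean, na.rm = TRUE)
student_totals <- apply(student, 1, sum, na.rm = TRUE)

print(student_means)
print(student_totals)
```

For John this gives a total of 20 + 30 + 70 = 120, matching the arithmetic in the transcript.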
            • 141:30 - 142:00 and here we are getting a mean value which was for each student so what we are doing here is for example let's look at student again just so that we avoid confusion so we have student and then we have physics chemistry bio maths and i have said row 1 so basically what we want is for john we want the total
            • 142:00 - 142:30 and what we can do here is we can say 20 plus 30 avoiding NA and then 70 that gives me 120 then you look at matthew so this is again doing a totaling there is no NA value and you look at the value right so when we have chosen apply function we have worked on student now here we are interested in the values that is sum of all the values in each row i am saying take care of NA and then
            • 142:30 - 143:00 give me a sum similarly you did a mean and that was giving you a mean for each student so these are some simple operations now what we can also do is we can basically create a vector called passing score and what we would want to do is we want to get the values or find in how many subjects alice has passed how do we do that we will have to compare alice score
            • 143:00 - 143:30 which should be greater than or equal to the passing score so what we can do is we can create a variable here pass now i am saying student i would be interested in the values for alice so i have mentioned that row name here i am then comparing it with passing score which we have created here and that will give me the values wherever alice has passed in a particular subject now i can obviously get rid of the na values and then look at this which
            • 143:30 - 144:00 basically tells me there was one subject in which alice passed and rest were either false or NA now same thing we can do for sam so sam is here and what we want to do is we want to look at the values here so we will say let's do the same thing for sam and find out the comparison with passing score and get rid of NA values so you are basically extracting value so these are
            • 144:00 - 144:30 some easier operations and usage of functions on your matrices which are filled in with values at row level and column level and then you can apply one of these functions or multiple functions to basically extract value which makes more meaning so that's with your matrix now let's also look at data frames now data frames as we know is basically data which has been ordered in rows and columns
            • 144:30 - 145:00 wherein we can assign row names we can assign column names we can do some operations on data frames so let's look at example so if i do a data here so that gets me some sample data sets or functions what we have here so let's do once we have our data here so it says use data package and then you can get
            • 145:00 - 145:30 list all the data sets in available packages and you can basically look at all the r data sets which we are seeing here it has opened up so i would be interested in getting the AirPassengers data so i'm going to pass that in the data function and then if i do a head to see the initial data from AirPassengers it shows me the values what we have similarly we can do that on iris data set and look at the head values i can do a view to look at specific values in
            • 145:30 - 146:00 a tabular format if that makes more meaning and that makes it easy for analysis now i can do a view on state.x77 and that basically shows me the population income and all this for different u.s states so these are some different data sets what we have you can do a view on them to basically understand the data or look in a more readable format you can just do a tail to get some end data so head
            • 146:00 - 146:30 and tail functions just give you the first six or the last six entries from that particular data set now the question is how do we work on this data so i can get a statistical summary so i have the iris data set which we had here so if i do a head it shows me iris data set this is a popular data set which shows the petal length sepal length of particular flowers and the species what
            • 146:30 - 147:00 is the length what is the width and what species does that flower belong to okay now here we can get a summary that is statistical summary of a data set which gives me minimum first quartile median mean third quartile and maximum values it basically shows you the count of the entries for each species what we have under the species column now what i can do is i can check the structure of this data set using str
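The inspection steps above can be tried on the built-in `iris` data set. This sketch uses only base R functions; `View()` is left out because it needs an interactive session.

```r
data(iris)          # load the built-in data set into the workspace

first_rows <- head(iris)   # first six rows
last_rows <- tail(iris)    # last six rows
print(first_rows)

# Minimum, quartiles, median, mean, and maximum for numeric columns,
# and a count per level for the Species factor.
summary(iris)

# Compact view of the structure: number of observations, variables, and types.
str(iris)
```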
            • 147:00 - 147:30 i can create a data frame now of this data using the data.frame function so for that we need to also have say for example if we would want to create a data frame let's see how do we do that so first we create a vector of days we can create a vector of temperatures and rain and then we want to create a data frame out of this so i use the data.frame
            • 147:30 - 148:00 function i pass in my days temp and rain as the vectors and now if you look at the data frame you basically see that i have my days my temp and rain so those were the variables those were the vector names and those have become the column names row names are auto assigned and basically we are seeing the values which have been passed in my data frame now i can do a summary on this to basically look at what is the length or
            • 148:00 - 148:30 how many values we have in data frame what is the class of elements so that is character you are looking at your values or summary which gives you minimum first quartile median mean and so on and then it also shows you the details for rain what is the mode here and how many false or how many true values we have you can also look at the structure of this data frame by doing a str
            • 148:30 - 149:00 which gives me how many objects we have how many variables we have what are the different variables so that is days temperature and rain and the values for those for days if you notice it is of the type character temperature is numeric rain is logical now how do we do data frame indexing so like your matrix which basically has rows and columns and in multi-dimensions similarly in data frames also you have
            • 149:00 - 149:30 indexing so you can do a data frame so i could just extract the first row by doing this and that basically gives me the value so you can always compare it by just typing df so that's my data frame and now let's look at the values extract the first row and that shows me monday 25.6 rain value is true now i can also do it column wise so for example i could do it in this way so
            • 149:30 - 150:00 here what i'm doing is extracting the second column from this one so it tells me 25.6 30.1 40.0 37.3 so you have extracted the values for the column so this is the second column not the second row now selecting using column names so that's the easiest way to extract the values
            • 150:00 - 150:30 for a particular column so i can just do this instead of giving the position of the column or the column number i'll give the column name and that gives me all the values of temperature and if i do this where i'm saying 2 colon 4 and then i'm giving the columns so it gets me the second third and fourth rows for day and temperature and we are looking at the values so you have given your row positions and then you
            • 150:30 - 151:00 have selected your columns you can also do a dollar sign if you would want all the values of a particular column so i can just do a df dollar days or df dollar rain and it shows me the values from my data frame now one more way of doing that is using your bracket notation to return a data frame format of same information so if you want the resultant data in a data frame format you can just do a df rain or df temperature and that is basically giving
            • 151:00 - 151:30 a data frame so if i had assigned this to a value and if i had look at the type of this that would be data frame now one of the things which we also require is filtering data frames using a subset function so that is subsetting the information from a data frame so we know we have our data frame let's look at our data frame again so that just reminds of what data values we have and here let's get a subset out
            • 151:30 - 152:00 of it using the subset function so i'm passing in my data frame i am saying i would be interested in the rain column so i am giving subset rain column and wherever the values are true so returns all the rows where it has rained similarly i can do a subsetting by giving a value for temperature wherever the value is greater than 25 and that shows me the value so this is where you are filtering
            • 152:00 - 152:30 the data in data frames using a subset function to which you have to provide a column name and then giving a condition now one more important thing which might be required is sorting your data frame using order function so i can create a variable by name sorted dot temp i want to do a ordering of data frame and here i am doing an ordering based on temp and now if i look at the value
            • 152:30 - 153:00 or i can create this in an ascending order so let's look at the values and now if i look at my data frame it just gives me the order or the ranking for the particular values so we have discussed this in other section also so what i can do is i can return all the columns with temperature sorted in a descending order so right now what we were seeing was we
            • 153:00 - 153:30 were seeing in ascending order but what we can do is we can do that in a descending order so here i'm creating a variable descending.temp i'm doing an ordering but when i'm doing a ordering i'm using the minus symbol and this one if you would look at in the form of a data frame it shows me the values which are ordered in a descending order based on the temperature column now another way of sorting is by using a particular column
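The data frame steps covered so far (creation, indexing, filtering with `subset`, and sorting with `order`) can be sketched as below. The four temperatures are the ones read out in the transcript; the day labels and the rain values after the first row are assumptions.

```r
days <- c("Mon", "Tue", "Wed", "Thu")    # assumed labels
temp <- c(25.6, 30.1, 40.0, 37.3)        # values from the transcript
rain <- c(TRUE, FALSE, FALSE, TRUE)      # assumed apart from the first TRUE

df <- data.frame(days, temp, rain)
summary(df)
str(df)

# Indexing: rows before the comma, columns after it.
df[1, ]                       # first row
df[, 2]                       # second column as a vector
df[, "temp"]                  # same column selected by name
df[2:4, c("days", "temp")]    # rows 2 to 4, two named columns
df$days                       # one column as a vector
rain_df <- df["rain"]         # one column, still a data frame

# Filtering rows with subset().
rainy <- subset(df, subset = rain == TRUE)
warm <- subset(df, subset = temp > 25)

# order() returns row positions, which are then used to index the data frame.
sorted.temp <- order(df$temp)          # ascending
descending.temp <- order(-df$temp)     # descending via the minus sign
df[sorted.temp, ]
df[descending.temp, ]
```

Note that `order` does not sort the data frame itself; it returns the row positions in sorted order, so the sorted view comes from indexing with that result.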
            • 153:30 - 154:00 so what i can do is i can sort i can do a order and then i can choose the column based on which i would want to order it and then if you would want to get the values of this so it tells me the values have been ordered based on temp so this can be very useful when you would want to sort the data or order it in a particular way to basically understand your data or to make more meaning out of it right similarly one more requirement might be
            • 154:00 - 154:30 merging your data frames so here i am creating a data frame so i'm saying authors and i'm using data.frame function and what we are doing is instead of creating three vectors i am basically doing that within my data frame function so let's do that and now what we can do is at this point of time i can check what my authors look like so this is my authors now here if you see we have
            • 154:30 - 155:00 the vector tukey venables tierney ripley and mcneil so that becomes my first column which is surname then you have your nationality and then you have deceased where you have also repeated the values four times right so that's something new which you might be seeing so you are creating a vector where you are passing in a value and for other set of values you are basically using the rep function
            • 155:00 - 155:30 now similarly we can create a data frame called books and this one is where i am having name column title and then i have other dot author and you are passing in the values so at this point of time if you would want to look at your books it would look something like this so you have given a name now just closely look at the data frame function so here you are using the names
            • 155:30 - 156:00 you have the titles whatever values you have passed in always remember when you have multiple vectors each one except the last is followed by a comma right so do not forget that and then you have other dot author so that's the name of the column and you are passing in the values where you have also passed some n a values and at this point of time you can look at authors this is your books and our intention will be to merge these data frames so that's what we would want to do might be we are interested in getting
            • 156:00 - 156:30 the data together so what i'm doing here is i'm saying m1 now i want to use the merge function i pass in my data frames that is authors and books so if we closely look at authors it has three columns and five rows and here you have three columns and we have seven rows so we would want to do a merge so we will say authors books and we will say by dot x so this is where i'm choosing which is the column based on which
            • 156:30 - 157:00 i would want to merge so i have by dot x which is surname and by dot y which is name so we would want to merge the data where we are giving a condition based on values in surname and name so you see there is tukey here there is tukey here we have venables we have venables we have tierney we have this one we have ripley which we have here multiple entries and then you have mcneil
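The merge being set up here can be sketched as below. The column names follow the transcript (surname, nationality, deceased, name, title, other.author), but most of the values are assumptions filled in for illustration, so treat them as placeholders rather than the exact data used in the video.

```r
# Illustrative data: five authors, seven books. Values are assumed.
authors <- data.frame(
  surname     = c("Tukey", "Venables", "Tierney", "Ripley", "McNeil"),
  nationality = c("US", "Australia", "US", "UK", "Australia"),
  deceased    = c("yes", rep("no", 4))  # rep() repeats "no" four times
)

books <- data.frame(
  name         = c("Tukey", "Venables", "Tierney", "Ripley",
                   "Ripley", "McNeil", "R Core"),
  title        = c("Book 1", "Book 2", "Book 3", "Book 4",
                   "Book 5", "Interactive Data Analysis", "Book 7"),
  other.author = c(NA, "Ripley", NA, NA, NA, NA, "Venables & Smith")
)

# Match rows where authors$surname equals books$name.
m1 <- merge(authors, books, by.x = "surname", by.y = "name")
print(m1)
```

By default `merge()` keeps only keys found in both data frames, which is why a book whose name has no matching surname drops out of the result.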
            • 157:00 - 157:30 now we have r core in books which is not there in your authors so let's see what happens when we do a merging here okay and now we see the result of this merge where it has taken the matching values from both the data frames so you have surname nationality deceased you get the title you get the other dot author which you are getting in from your books and the name column is avoided right
            • 157:30 - 158:00 because we are doing the merging based on surname and by dot y is name so we don't see the name column but what we are seeing here is the values which have been merged and then you can compare so for example let's do a random check so if i look at mcneil that's the surname or here it was name so you have mcneil you have a nationality which comes from the first data frame deceased from the first data frame
            • 158:00 - 158:30 then you have your interactive data analysis and then you look at other dot author what you don't see in the merge is this r core because this does not have any value in your authors data frame so you can do a merging of your data frames using the merge function so please try it out and you can create different data frames and try to use this similarly you can manipulate a data frame so for example here we are creating one more data frame called
            • 158:30 - 159:00 sales report which is data dot frame you are giving an id product has some values unit price is where you are getting the values as integer and quantity as integer so now if i look at my sales report this is the values which i have let's spend a couple of seconds to look at this value so id value is 101 to 110 product is a b so that is automatically repeated
            • 159:00 - 159:30 unit price is starting where you say one zero one one 140 184 right so we are using as dot integer we are converting it into integer and basically we are assigning these values here for your unit price and similarly for quantity we are assigning the values by doing as dot integer and then just doing a runif
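A data frame like this sales report could be built roughly as follows. The column names, the ranges passed to `runif()` and the seed are assumptions, since the transcript only mentions ids 101 to 110, products a and b, and integer prices and quantities.

```r
set.seed(42)  # assumed; only makes the random values reproducible

sales.report <- data.frame(
  Id        = 101:110,
  Product   = c("A", "B"),                      # recycled over all ten rows
  Unitprice = as.integer(runif(10, 100, 200)),  # random values, truncated to integers
  Quantity  = as.integer(runif(10, 20, 30))
)

print(sales.report)
```

`runif()` draws uniform random numbers and `as.integer()` drops the decimal part, so the exact prices and quantities will differ from the ones in the video.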
            • 159:30 - 160:00 now once we have done that we have created a data frame now how do you transpose what do you mean by transpose so transpose is when you are changing your axes so if i do a transpose on sales report and if i want to do a view so you will see the positions which have changed so you have all these values so my row names or row whatever values become the column headings and basically your column headings
            • 160:00 - 160:30 becomes your row names so that is what you are achieving by doing a transpose you can do a head to look at some initial values you can do a sorting of this data frame by using the order function and you can choose the column and also the order if you would want to have it in ascending or descending or basically increasing or decreasing values you can also choose a particular column like we are choosing product as a column and i would want to
            • 160:30 - 161:00 take the values of sales report in a descending order that is unit price and we can just do ordering of data frames or sorting the values in data frame so this is pretty easy please spend some time in practicing these things taking these examples and you will learn more about these functions you can always try creating an example at your end and you can try to look into these now what about subsetting the data frame so
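The transpose and ordering steps just described might look like this. The small data frame is a hypothetical stand-in for the sales report, with fixed values so the result is predictable.

```r
# Hypothetical, fixed version of the sales report.
sales.report <- data.frame(
  Id        = 101:104,
  Product   = c("A", "B", "A", "B"),
  Unitprice = c(120L, 180L, 155L, 110L),
  Quantity  = c(25L, 22L, 28L, 21L)
)

# t() swaps rows and columns; the result is a matrix, not a data frame.
transposed <- t(sales.report)

# head() shows the first rows.
head(sales.report, 2)

# Sort by unit price, highest first.
by.price.desc <- sales.report[order(sales.report$Unitprice, decreasing = TRUE), ]

# Sort by product, then by unit price in descending order within each product.
by.product.price <- sales.report[order(sales.report$Product, -sales.report$Unitprice), ]

print(by.product.price)
```

Because the columns have mixed types, the transposed matrix stores every value as character.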
            • 161:00 - 161:30 when you are saying subsetting the data frame let's do a subset function like what we used earlier i will say subset dot product a i'm using the subset function and here i will get the subset based on the product value being a let's look at this and this shows me only the values where product value matches a now extract the rows for which product is a and unit price is greater than 150 so you are still doing a subsetting you are still
            • 161:30 - 162:00 passing your data frame here you will give the product as a which will tell basically the values for product and unit price greater than 150 so you are giving some conditions and look at the values now if you are only interested in particular columns so if i say only the first and the fourth column where product is a and unit price is greater than 150 so you have to still use your subset function pass in your data frame product will be given as a and unit price should be greater than
            • 162:00 - 162:30 150 but what i am interested in is the values from the first and the fourth column and now if you see it shows me the values for my first and fourth column what we can also do is we can create two subsets so set a from data frame where we take the product is being a other one is being b and then we can look at the values so this is just a this is just b and what we can do is we can combine them or we can merge them using column bind so when
            • 162:30 - 163:00 i say column bind and i'm saying set a set b so it is basically going to stack the data frames column wise and if you do r bind it is going to stack the data frames row wise so we can either use column or we can do a row wise so this is in one way where you can merge the previous example where we saw merging was based on a particular condition which is met based on some columns which might have
            • 163:00 - 163:30 similar values right and this is where you are straight away merging the data frame using cbind and rbind so if you compare this with the other merge operation what we saw here this was where you are comparing the values of first data frame and second data frame and then merging but here we have just used column bind and row bind so we are not merging on a particular condition we are just stacking them either column wise or row wise
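The subsetting and binding steps above can be sketched as follows, again on a hypothetical fixed sales report. The variable names are assumptions modelled on what is said in the transcript.

```r
# Hypothetical, fixed version of the sales report.
sales.report <- data.frame(
  Id        = 101:110,
  Product   = rep(c("A", "B"), times = 5),
  Unitprice = c(120L, 180L, 155L, 110L, 190L, 140L, 165L, 175L, 130L, 150L),
  Quantity  = c(25L, 22L, 28L, 21L, 27L, 24L, 29L, 20L, 26L, 23L)
)

# Rows where Product is "A".
subset.ProductA <- subset(sales.report, Product == "A")

# Rows where Product is "A" and Unitprice is above 150.
expensive.A <- subset(sales.report, Product == "A" & Unitprice > 150)

# Same rows, but only the first and fourth columns.
cols.1.4 <- subset(sales.report, Product == "A" & Unitprice > 150,
                   select = c(1, 4))

# Two subsets, then stacked column wise and row wise.
SetA <- subset(sales.report, Product == "A")
SetB <- subset(sales.report, Product == "B")
side.by.side <- cbind(SetA, SetB)  # needs the same number of rows in both
stacked      <- rbind(SetA, SetB)  # needs the same columns in both

print(cols.1.4)
```

Unlike `merge()`, neither `cbind()` nor `rbind()` matches on values; they only glue the pieces together in the order given.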
            • 163:30 - 164:00 now what we can also look at is doing some aggregate operations this is going deeper into data frames so when you use aggregate function you are passing in your data frame you are choosing the quantity column and then you are basically using the list function so list function is going to work on your data frame on the product column so product column for your sales report so at this point of time let's look at sales
            • 164:00 - 164:30 report and let's look at the value here so this is my sales report and what we want to do is we want to aggregate the values on quantity column but for that i will say i will just take the product columns and i will get a sum wherein i am ignoring the n a values let's look at this and that gives me an aggregation value so remember aggregate function
            • 164:30 - 165:00 is doing a summing up now here we are doing a summing up on your product that is sales report product column is what we have so you are kind of grouping by based on product so we have two products here a and b now what we also want to do is we want to take the quantity column so that's why we have given that first and what we are doing is we are doing a summing up so we are summing up all the values for a and all the values for b
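A minimal sketch of this grouped sum, assuming a small made-up sales report with one missing quantity:

```r
# Hypothetical data with one NA in Quantity.
sales.report <- data.frame(
  Product  = c("A", "B", "A", "B", "A"),
  Quantity = c(10L, 20L, NA, 5L, 7L)
)

# Sum Quantity within each Product; na.rm = TRUE is passed on to sum().
totals <- aggregate(sales.report$Quantity,
                    by    = list(Product = sales.report$Product),
                    FUN   = sum,
                    na.rm = TRUE)

print(totals)
```

The summed values come back in a column named `x` because a bare vector was passed in rather than a named column.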
            • 165:00 - 165:30 and we are seeing that here if there are any n a values we are ignoring it so these are some basic operations on data frames or matrices subsetting them extracting useful information using some inbuilt functions to do transformation or computation and extracting some values now similarly we can also work on lists now that we have looked at data frames
            • 165:30 - 166:00 matrices vectors let's also look at one more structure and how we work in r when we have to work on lists so list is basically a structure here and what we are doing is we are creating a list by using the list function and here i am passing in three vectors you see here now c function is being used now in vector we know that all the elements are of the same type now let's create a list
            • 166:00 - 166:30 wherein we see three vectors which are of three different types or objects of three different types so let's create this list and now let's look at our list so it basically has elements wherein you have values of different types we can create a different list which can also have sequence elements that is one to ten a matrix which is 3 by 3 and then also passing a list so this is also one
            • 166:30 - 167:00 way of creating a list let's look at list two and if we look at the values here list2 basically has a vector which has values 1 to 10 it has a matrix of 3 into 3 it has a list which has values a having 10 and b having 20. so this is how you can create a list which can have objects of different types so we can also use recursive variable a variable that can
            • 167:00 - 167:30 store value of its own type so for that you have to use a recursive function something like this so i'm saying is dot recursive and then do it on your list and we can check if the list basically has a variable that can store values of its own type now one of the main requirements when you're working with list is indexing so i have created a list and
            • 167:30 - 168:00 here i can access this elements by using an index so if i do this this shows me the matrix what i could have also done is using the dollar symbol and then choosing particular element of the list by doing a mat which is the name given to our matrix or by choosing a name that is vector so you can access the elements using indexing or dollar notation or giving
            • 168:00 - 168:30 the name of a particular element now i can also work on list and i can get the third elements second value so we can do that and that shows me 20 or you could have done by giving the value 3 that is the third element and within that you are looking for second element so i can get the length of the list i can get the class of the list which shows me this type list and what i can also do is i can convert
            • 168:30 - 169:00 vectors into list so here we are creating a variable price which is being assigned a vector which has 10 20 and 30 and now what i want to do is i would want to convert this vector into list and for that i'm using the list function so i'm creating a variable called price list and then i'm saying as dot list so that's going to convert my vector into list and now let's look at price list
            • 169:00 - 169:30 which shows me a list or you can look at price which is a vector so that's when you are converting your vector into list now how do you convert your list into vector and that also can be done by doing a unlist function so i can basically work on price list wherein we converted vector to list and i can just do a unlist on that which will convert my list into a vector looking at the values of the vector
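The list operations covered in this part can be sketched as below. The element names (`vec`, `mat`, `lis`) are assumptions, since the transcript only mentions a name for the matrix and one for the vector.

```r
# A list can hold objects of different types.
list1 <- list(c(1, 2, 3), c("a", "b"), c(TRUE, FALSE))

# A list holding a vector, a 3 by 3 matrix and another list.
list2 <- list(
  vec = 1:10,
  mat = matrix(1:9, nrow = 3),
  lis = list(a = 10, b = 20)
)

is.recursive(list2)  # lists can contain other lists

# Three ways to reach the matrix.
list2[[2]]
list2$mat
list2[["mat"]]

# Second value of the third element.
second.of.third <- list2[[3]][[2]]

length(list2)
class(list2)

# Vector to list and back again.
price        <- c(10, 20, 30)
price.list   <- as.list(price)
price.vector <- unlist(price.list)
```

Single brackets would return a sub-list, while double brackets or `$` return the element itself.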
            • 169:30 - 170:00 now sometimes we may want to get the dimensions so we can use the dimension function to convert the vectors to a matrix so that it can have multiple dimensions so here we create a vector which has four values and then i am going to give a dimension to this so that it is converted from vector into matrix by giving dimensions 2 comma 2 and now if you look at price 1 it has basically changed into rows and columns
            • 170:00 - 170:30 of 2 into 2 dimensions so these are some simple examples of working with list now when you talk about basic data type functions we have seen how you use the assignment operator how you get the data type of a particular variable or the class to which it belongs i can assign different values such as 10.5 so the previous one was showing me the value numeric and now what we would want to do is we want to
            • 170:30 - 171:00 assign a value 10.5 look at the class of it it says numeric type of it shows double so by default it belongs to the double type now i can check if the values in n1 are numeric and that shows me true and similarly for n2 and that shows me true so you are using the is dot numeric function which returns true if the given value is numeric similarly we can
            • 171:00 - 171:30 have integer assigned to a particular variable and for that either i can do as dot integer or i can assign a value with capital l so i can do this and look at the value of i1 similarly i2 and look at the values and if i would want to check if that is an integer let's look at the values of i2 which was an integer i1 which was an integer and i3 which is an integer
            • 171:30 - 172:00 so here we have assigned integer values to a particular variable now all integers are numeric but all numerics are not integers so let's check that so if i do a is numeric on i1 which was assigned as dot integer 10 that shows me true if i say is dot integer on i1 so was that an integer and if i look at the value it shows me true now let's look at the character values
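The numeric and integer checks described above can be tried with a short sketch like this one. The variable names follow the transcript, while the exact values other than 10.5 and 10 are assumed.

```r
n1 <- 15
n2 <- 10.5

class(n2)    # "numeric"
typeof(n2)   # "double"
is.numeric(n1)
is.numeric(n2)

# Two ways to get an integer: as.integer() or the L suffix.
i1 <- as.integer(10)
i2 <- 12L

is.integer(i1)
is.integer(i2)

# Every integer is numeric, but a plain number like 15 is stored as double.
is.numeric(i1)
is.integer(n1)
```

The last two lines show the point about integers and numerics: `i1` passes both checks, while `n1` is numeric but not an integer.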
            • 172:00 - 172:30 so if we say c1 c2 and look at the class of this it shows me this of character type similarly on c2 and you can always validate that by using the character function you can also use some inbuilt functions such as converting to an upper case or getting a substring from the starting till the position what you would want the elements i can do a paste function
            • 172:30 - 173:00 which basically will give me the data combined or you can say concatenated you can also use a paste 0 which we know will get rid of the space and it just concatenates them without a space i can also use a specific separator which we have seen examples and we can do that and what we can also do is we can replace set of characters so here i am saying
            • 173:00 - 173:30 substitute and then if i look at the values it has basically replaced rob with cena and let's look at the length of it or number of characters in this so these are some basic operations what you're doing on matrices on your data frames on your list and also on your variables where either you are assigning them values of a particular type or you are changing the data types you can also go for coercion
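The character functions mentioned here could be exercised as follows. The strings are placeholders chosen for illustration, not the ones used in the video.

```r
c1 <- "hello"
c2 <- "world"

class(c1)         # "character"
is.character(c2)

upper    <- toupper(c1)                # upper case
first3   <- substr(c1, 1, 3)           # characters 1 to 3
spaced   <- paste(c1, c2)              # joined with a space
joined   <- paste0(c1, c2)             # joined with no separator
dashed   <- paste(c1, c2, sep = "-")   # joined with a chosen separator

# sub() replaces the first match of a pattern.
greeting <- "hello rob"
replaced <- sub("rob", "sam", greeting)

nchar(replaced)   # number of characters
```

`sub()` only replaces the first occurrence; `gsub()` would replace every occurrence.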
            • 173:30 - 174:00 in case of vectors we have seen that where if you are passing in values of different types that's coerced into same types so later we can learn more on functions and flow control and how that is handled in r let's learn how r can be used to take care of flow control that is if i would want to have a if else condition and if what i would want to compute or
            • 174:00 - 174:30 if i would want to check some values how r can be used so here if statement consists of a boolean expression which is followed by one or more statements so we can just say if we can pass in a boolean expression where we would want to compare particular value or we would want to check a particular value and then whatever is passed in the statement will get executed so what we can do is here we can use assignment operator i can pass a value to x now we
            • 174:30 - 175:00 can always do a type of and that can tell me that x is basically an integer and now i can use my if where i can say is dot and then i can choose integer and i would want to check the value of x if that is an integer then i will just use brackets and i'll pass a statement here so let me say print and let's say x is
            • 175:00 - 175:30 an integer and we can execute this and this tells me that the boolean value is true now if for example we would have done something else or say for example instead of integer if i had used let's say character for that matter and we can check the value and we can do this so
            • 175:30 - 176:00 here we will check the values and it says there is an error with the bracket and let's check this one so if x because we missed a bracket here so let's do that one and then try this and it doesn't show me any result so how could we handle something like this if the boolean expression does not match to true and in that case we can always go for else statement so we can check for a value so
            • 176:00 - 176:30 if the boolean expression is true statement will be executed and if it is false then next statement will be executed so we could have done the same thing here where i said print x is an integer which we know is not true and what i could do is i can here after this one say else and then i can open up one more bracket
            • 176:30 - 177:00 and then i can say print and i will say x is not a character and now we know that x is not a character so this is a simple way where you can use if else and you can control the flow by passing in the conditions now that's when you are using if else statements now what about while loop so that also can be useful when you are programming
            • 177:00 - 177:30 in r so an else statement is executed when the condition in the if statement results to false so that basically means what we can do here is let's pass in a word or a set of words like this for example let's say v and then we use c function to create a vector for example and then i can just say hello world and if you look at v
            • 177:30 - 178:00 you can look at the class of v it's of characters and if you look at type of v it is having the objects or elements as character now what we can do is we can basically then say count and let's assign this a value 2 now what we would want to check is is the count of elements in our v equals to two so what i can do is
            • 178:00 - 178:30 while my count is less than say five now i'm saying i would want to do something while the count is less than 5 so we have already given a value to count as 2 and now what i can do is here i can open up a bracket i can say print and then pass the value of v and then what we do
            • 178:30 - 179:00 is not only this we will also increment the value of count and we will say count plus one and here it gives me error probably because we have missed a bracket so let's see what we are missing out here so let's just check this one again so here it is we have created v which has two elements of the type
            • 179:00 - 179:30 character and then what we do is we assign count a value of 2 and we would want to check while the count value is less than 5 we would want to print the value of v so what we are doing here is we are saying while then you pass in an expression which will check the value of count we do a print and then we increment the value of count now this is a simple example where you are using while to basically test an expression and
            • 179:30 - 180:00 while that expression is true you would be doing something whatever is passed within your brackets now we could also be going for for loop now for loop is basically used to iterate over a list of elements or a range of numbers so for example if i have a vector like fruit which has some values i could just say for i in fruit i would want to print something so let's try this also as an example to test our for loop now we can
            • 180:00 - 180:30 just say names and we can basically then assign values to this so let's say vj aj dj and let's say sj and let's create this let's look at the value of names now what i can do is i can use a for loop and i can say for i in my names so i will say for i in
            • 180:30 - 181:00 names now what do you want to do so open up your brackets here and then we would want to say print i and then basically close the bracket so you see for every element in this vector it is basically going to print the name one by one so you are iterating through a set of objects by using a for loop now this is how we
            • 181:00 - 181:30 can work on for loop so if else while and for loop can be very useful when you would want to iterate or when you would want to check the value of an expression or when you would want to loop and do a particular task it's always good to understand how you manage flow control in r that is either when you're working with your for loops your while loops
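The if else, while and for examples walked through in this part can be put together in one rough sketch. The starting value of `x` and the names `first.names` and `seen` are assumptions for illustration.

```r
x <- 30L
typeof(x)  # "integer"

# if: the block runs only when the condition is TRUE.
if (is.integer(x)) {
  print("x is an integer")
}

# if else: the else block runs when the condition is FALSE.
if (is.character(x)) {
  print("x is a character")
} else {
  print("x is not a character")
}

# while: repeat as long as the condition holds.
v     <- c("hello", "world")
count <- 2
while (count < 5) {
  print(v)
  count <- count + 1  # without this the loop would never end
}

# for: visit each element of a vector in turn.
first.names <- c("vj", "aj", "dj", "sj")
seen <- character(0)
for (i in first.names) {
  print(i)
  seen <- c(seen, i)
}
```

A missing closing bracket is the most common cause of the errors seen while typing these blocks, so it helps to write both brackets first and then fill in the body.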
            • 181:30 - 182:00 also understanding how you can use your logical operators for working with your data in r so let's look at some examples and understand logical operations so either you could be having and or you could be doing a or where you are evaluating one condition or you are using not so these are your logical operations now here i can assign a value to x and then i can check if my x value is
            • 182:00 - 182:30 less than 10 and it shows me false so i have been checking the value of x so let's see is it greater than 10 and that's true now i can use logical operations here so i can say and so i'm saying is my x value less than 20 and is my x value greater than 10 now both these conditions are not true so in this case we get the result as false but if i say x is greater than 20 which is true and
            • 182:30 - 183:00 i am saying x is greater than 5 that's also true and x is equal to 25 now whenever we are talking about and we have to look at all the conditions have to be right so let's look at this and we get the value as true but if i say x is greater than 10 or x is greater than 5 then one of the condition has to be true which is true in our case so we get the result as true we can take a different example we can
            • 183:00 - 183:30 say is x less than 20 which is not true but is x equals to 30 and that's also not true so in this case we get result as false now we can straight away compare some numbers and we can say is 12 equals 3 and that's false and if i say not then that basically will give me the result as true so these are some simple logical operations which help you when you're working with your data in r
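The logical operations described above can be checked with a few lines, assuming `x` was assigned 25 as the comparisons in the transcript suggest.

```r
x <- 25  # assumed from the comparisons described

x < 10                      # FALSE
x > 10                      # TRUE

# and: every condition has to be TRUE.
x < 20 & x > 10             # FALSE
x > 20 & x > 5 & x == 25    # TRUE

# or: one TRUE condition is enough.
x > 10 | x > 5              # TRUE
x < 20 | x == 30            # FALSE

# not: flips the result.
12 == 3                     # FALSE
!(12 == 3)                  # TRUE
```

The single `&` and `|` work element by element on vectors, which is why they are the ones used when filtering data frames.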
            • 183:30 - 184:00 now we can create a data frame by using an inbuilt data set mtcars and let's look at our data frame so that shows us the values with all the different car models and the different column names so car models are the row names and then you have other things like mileage and cylinder and so on which are the specification for the data now what i can do is i can filter out values here using indexing so i can say data frame now in that data frame
            • 184:00 - 184:30 i would want to compare the value of mileage which is greater than or equal to 30 and then i can end it with comma so that gives me the value wherever the mileage is greater than 30. i can also do a subset on data frame where i can select a particular value so we can be doing this or we can be using square brackets we can also do a dollar and compare the values now we will use
            • 184:30 - 185:00 our logical operations knowledge here so we will work on data frame where i am interested in the mileage which is greater than 20 and i am looking at the column hp horsepower and that should be greater than 100 remember when we are doing a and both the conditions have to be met as true and that shows me the result where you're looking at the mileage and you're looking at the horsepower column both of these are met and that's why we get the result
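The filtering on the built-in `mtcars` data set can be sketched like this. The variable names are assumptions, while `mpg` and `hp` are the actual column names for mileage and horsepower.

```r
df <- mtcars

# Rows where mileage is at least 30; the trailing comma keeps all columns.
high.mpg <- df[df$mpg >= 30, ]

# The same filter written with subset().
high.mpg.2 <- subset(df, mpg >= 30)

# Both conditions must hold: mileage above 20 and horsepower above 100.
both <- df[df$mpg > 20 & df$hp > 100, ]

print(high.mpg)
print(both[, c("mpg", "hp")])
```

Leaving out the comma inside the square brackets would select columns instead of rows, which is an easy mistake to make.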
            • 185:00 - 185:30 so these are some simple examples of using your logical operations either when you're working on a data frame so same thing can be done on a matrix same thing can be done on a list or a vector or individual values now let's also learn about flow control that is how if else or else if is handled in r so you can do a single condition check so for example i assign a value to hot which is false and i'm saying temperature is 50 now
            • 185:30 - 186:00 what i would want to check is if the temperature value is greater than 60 which in our case will not be true which will not be true because temperature has been assigned 50 so is it greater than 60 no so if i do this if condition and i am saying if the condition is true then i would want to assign the value of hot to true and now if you look at the value of hot
            • 186:00 - 186:30 it is still false why because the condition which we passed for our if is not true it has not been met so whatever was passed within the statement has not been done now let's change the value of temperature as 100 and now if we do the same thing we say is my temperature greater than 60 which is right so then whatever has passed in the bracket will be applied so hot will be assigned new value and now if you see
            • 186:30 - 187:00 the hot value is set to true so this is a simple single condition check what you are doing now certain times there can be multiple conditions to check and that's where we use else so in this case we go for assigning a value to score which is 63 so let's do that and now let's say is my score value greater than 80 which is not true so whatever is passed in here which is print it's a good score will not be done
            • 187:00 - 187:30 but it will jump to else and then whatever we have passed in else will be done so it will say it's not a good score so let's do this if and it says it's not a good score so this is a simple way of using if else where you are checking two conditions or you are checking the condition but what if the condition is not met then your control is passed to your next statement now i can also do a else if so i can say
            • 187:30 - 188:00 score is 63 and i can say is my score greater than 80 that's my first condition so it would print good score but might be i would want to check something else so i'll say else if and i'll say is my score greater than 60 yeah and is it less than 80 remember the and which has to evaluate to true for both the conditions so i'll say print decent score i can still keep on giving conditions
            • 188:00 - 188:30 here in else if score less than 60 and score is greater than 33 that would not meet so that will be ignored and then you have else which says print poor so first it checks or evaluates for the condition which you have passed for if if that doesn't work then it goes to else if and if anything in else if is met then it's going to take that into consideration and it will not go for else and if the if and else if conditions
            • 188:30 - 189:00 are not met then it goes to else and we see decent score already printed here now that's a simple example of if else and if else if wherein we are evaluating a condition but probably we have multiple other things to check now how do you work with while loops in r that's very simple so what we can do is we can assign a value to x and now i will say while
            • 189:00 - 189:30 my x is less than 10 so i'm going to create a loop so i have said my x has been assigned a value of 0 and that's fine so this is going to be less than 10 but if we are going to just do this then it will keep running and it will get into an infinite loop so we'll see how we do that so we'll say while x is less than 10 i would want to basically have the value of x i would want to print x is still less than 10
            • 189:30 - 190:00 adding 1 to x and what we are doing is we are incrementing the value of x now if you do not do this step then it will get into an infinite loop because x will be always less than 10 so we are incrementing the value of x by 1 and then we are giving a condition so if at any point of time x is equals to 10 i would want to say x is equal to 10 terminating the loop and then basically my while loop ends so
            • 190:00 - 190:30 we can do this so let's say x is 0 and then do this while loop and now you see it is at every step it is basically printing out the value of x it is still less than 10 adding 1 to x and it also gives you the value of x so when we do a x is currently and i print out the value of x so it shows me 0 next time you increment it it becomes 1 and 2 and so on so this is where you are
            • 190:30 - 191:00 using a while loop where you are looping where based on a particular condition and then you basically have once the condition is met you are able to complete the loop now let's look at let me take this one here we'll look into functions in a later stage so let me take this function and let's get rid of this one
            • 191:00 - 191:30 i would also want to talk on break statements and while loop and once we are done with the flow control on while loops then we can look at the functions aspect either we can look at how we control our functions or how we create built-in functions so let's look at this one and let's continue with our while loop so we just saw a simple while loop here and what we also want to see is when you are working with your while loop
            • 191:30 - 192:00 how do you break if a particular condition is met so we saw a simple example of while loop and that's fine wherein we were printing out something we were auto incrementing the value of x we were also checking at one point of time within our while loop if the value of x was met we would say we are terminating the loop and it comes out of that now if that does not happen then we
            • 192:00 - 192:30 continue doing it how about a break statement so break statement is when you would want to end the while loop if a particular condition is met so for example here i assign a value to x which is 0 now i want to evaluate this lesser than 5 so that means i will be auto incrementing the value of x so i'll create my while loop i'll give in a condition that x is less than 5 now what i want to do is i want to use the cat
            • 192:30 - 193:00 function which will print the value so i am saying x is currently and i am printing out the value of x then i say print x is less than 5 because we have not yet incremented the value of x we are adding 1 to x like what we saw in previous example i am saying x is then incremented by 1 and here i am saying if x reaches 5 so while we keep incrementing the values within the while loop we'll see if x's
            • 193:00 - 193:30 value is 5 we will print it is equal to 5 and we can just do a break now if you do not use a break you can still end the while loop but break is basically to end this loop here based on condition which is met and we can do this and then run this while loop so you see here x was met as 5 and we just broke out of the loop so that's your simple while loops what
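A sketch of the break example just walked through. The printed strings are assumptions reconstructed from the narration; the logic (start at 0, loop while `x < 5`, break when `x` equals 5) follows the description.

```r
x <- 0
while (x < 5) {
  cat("x is currently:", x, "\n")
  print("x is still less than 5, adding 1 to x")
  x <- x + 1
  if (x == 5) {
    print("x is equal to 5, terminating loop")
    break  # leave the loop immediately
  }
}
```

Here the loop would also end without `break`, because the condition `x < 5` becomes false at the same moment; `break` matters when the exit test differs from the loop condition.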
            • 193:30 - 194:00 we are seeing similarly we can work on for loops so for loops can also be useful so your conditionals what we saw is if else or else if your while loop is while a condition is not yet met you keep looping and keep doing some actions now what you can do is you can also work on for loops so here i'm creating a vector and then i am going to loop that is i am going to iterate through
            • 194:00 - 194:30 every element so i will say for and when you are using for loops you will say for and then you can give in anything you can give in any value i can say i i can say x so i am just giving temporary variable in vector and then i am printing it out so this basically prints all the values one by one so there is one more way to do it you can say for and you can say i in and i would want to take length of the vector
            • 194:30 - 195:00 so 1 to the length of vector that is till the last element is reached i would want to print the vector elements using the value of i so what is i here it's the index position and i can do it in this way so if you are looping over a list so i'm creating a list and it's very simple so you can just do a for loop where you can say for i in list i want to print the i and that gives me the
            • 195:00 - 195:30 list elements or you say for i in and you give from starting position that is one till the length of list and you would want to print every element so here we can also use double brackets so if you would want to loop through a matrix so sometimes that might be required so let's create a matrix which has 1 to 25 values arranged by row and you look at your matrix and now what you want to do is you want to iterate
            • 195:30 - 196:00 through a matrix so you want to do a looping so i'll say for i in matrix i would want to print out the values and that prints out all the values in matrix now what if i want to print the square and square roots of numbers between 1 to 25 so i can say for i wherein the value starts with 1 ends with 25 and then within my for loop i can basically
            • 196:00 - 196:30 give this condition where i am saying get me the square that is i into i or get me a square root of i and just print it out so i am saying message i is this one square root is this and my square is this and square root is this so if i look at these values here now i am looking at all the values from 1 to 25 i am looking at the square of the values and i am looking at the
            • 196:30 - 197:00 square root so what we did was we did a for we passed in the elements by saying i in 1 to 25 and within the bracket i have said what do i want to do for every element so either i have calculated a square i have calculated a square root and then i am printing out when i am using the message function which takes the value which you are passing in comma the value of i similarly square and similarly square
            • 197:00 - 197:30 root so these are some simple examples of understanding flow control in r that is using your for loops your while loops and also your if else later we will spend time in learning about functions which could be either created by the user or built-in functions and also factors in r welcome to this section of r programming where we will learn about functions whether that is about inbuilt
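The for-loop patterns covered in the flow-control walkthrough can be sketched like this. The vector and list contents and the variable names (`vec`, `my_list`, `mat`) are assumptions; the 1 to 25 matrix filled by row and the square / square-root loop follow the narration.

```r
vec <- c(10, 20, 30)                 # assumed sample data
for (item in vec) print(item)        # iterate over the elements directly
for (i in 1:length(vec)) print(vec[i])   # iterate by index position

my_list <- list("a", 2, TRUE)        # assumed sample data
for (item in my_list) print(item)
for (i in 1:length(my_list)) print(my_list[[i]])  # double brackets for lists

mat <- matrix(1:25, nrow = 5, byrow = TRUE)
for (value in mat) print(value)      # visits every cell, column by column

for (i in 1:25) {
  message("i: ", i, " square: ", i * i, " square root: ", sqrt(i))
}
```

Note that looping over a matrix walks it in column order even though it was filled by row.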
            • 197:30 - 198:00 function or creating your own functions and working on your different data structures so what are functions function is basically a set of statements to perform a specific task now r has a large number of inbuilt functions so you can say packages which you can import and start using or users can create their own functions so when it comes to functions the syntax is very simple
            • 198:00 - 198:30 you give a function name you can assign your function to a variable and a function can take no arguments one argument or any number of arguments so let's see some example on functions so for example here we are creating a variable called squares and we are assigning a function to it now this function would take one argument which is a and then we use a for loop so we say for
            • 198:30 - 199:00 i in from 1 to the number a we would basically be doing a exponential computation so what we would do is we would square the value in this particular range and assign that to b and print it now when we do this we can call in this function and pass in a value to look at the square of that particular value now this is a simple example of function so this is
            • 199:00 - 199:30 how it would look depending on what value you have passed to the function so for example we say squares and we pass in a value of 4 so that becomes for i in 1 to 4 so you would start with 1 the value of 1 square would be 1 and then you have your value for 2 so 2 square would be 4 then we have 3 3 square would be 9 and then we have 4
            • 199:30 - 200:00 square which is 16. this is a simple example of function and this is how you can create your own function to calculate or carry out some computations now let's look at some other examples before we get into built-in functions which basically allows you to work with different data structures so there are different mathematical functions which can be used for your data science or computations
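The `squares` function described above, as a minimal sketch. The names `squares`, `a`, and `b` come from the narration; the exact body is an assumption.

```r
squares <- function(a) {
  for (i in 1:a) {
    b <- i^2    # square the current value in the range 1..a
    print(b)
  }
}

squares(4)  # prints 1, 4, 9 and 16, one per line
```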
            • 200:00 - 200:30 you have your regular expressions which can be used for pattern matching or you can also use functions for data manipulation before we get into data manipulation let's look at how you work with functions taking some examples so let me bring up my r studio wherein we will try out some examples and see how functions work now here are some examples and we can see how this work let me just clean up the console and we can start here
            • 200:30 - 201:00 now here we are creating a simple function which does not take any argument we call it as hello world and this will start with the word function and parenthesis now that could have arguments passed in however this function we are not passing in any argument and what we are doing here is we are printing out whatever value is passed within the bracket so let me just do a ctrl enter my
            • 201:00 - 201:30 function is created and you can straight away call this by just doing this now however if you would have tried this function without the bracket for example something like this then it would have printed out the complete function and whatever you passed in to hello world but if you would want to call the function then basically you would just do hello world and then use the brackets
            • 201:30 - 202:00 so that's how you call the function and that's how it shows the result now your function can be with a single argument so for example here we are passing in an argument called name and we can then use this to pass a value to this so here i'm saying hello name i have my function but this one takes a single argument and we are going to use paste which basically can concatenate or just adds up whatever you are passing in to paste so we will say paste hello and
            • 202:00 - 202:30 then the name notice that i have given a space here after hello so that i can have it in the right format and i can just do this so the function is created and now let's pass a name here and just try to call the function so name is one argument or a single argument which is passed to this function so let's look at the result and that shows me the name whichever was passed to this one
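A sketch of the two functions just shown, one without arguments and one with a single argument. The printed text and the sample name are assumptions. Because `paste` already separates its pieces with a space by default, this sketch does not add an extra space after "Hello".

```r
hello_world <- function() {
  print("Hello World in R!")   # assumed message text
}
hello_world()   # with brackets: calls the function
hello_world     # without brackets: prints the function definition

hello_name <- function(name) {
  print(paste("Hello", name))  # paste joins the pieces with a space
}
hello_name("Sam")              # "Sam" is a hypothetical value
```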
            • 202:30 - 203:00 now what we can also have is function created which takes two arguments and this is a simple example so here we are creating a function add num i'm saying function it takes two arguments i'm not providing any value or default values for this we'll see some other examples for those now here this particular function takes two arguments and whatever you pass in here the addition of that will be seen so let's create this function and let's
            • 203:00 - 203:30 call it and test it and that shows me the result as 70. now what we can also do is we can add a vector to a number so vector is list of elements or list of objects you can say and here we would want to perform add num or we would have to call add num function by passing in vector which becomes the first argument and the second value is the next argument so let's run this one and that shows me the result
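A sketch of the two-argument `add_num` function. The parameter names and the operands are assumptions; the transcript only states that the first call returns 70 and that a vector can be passed as the first argument.

```r
add_num <- function(num1, num2) {
  print(num1 + num2)
}

add_num(30, 40)             # hypothetical operands that sum to 70
add_num(c(10, 20, 30), 5)   # the number is added to every element
```

The second call works because arithmetic in R is vectorised: the single number is recycled across the whole vector.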
            • 203:30 - 204:00 wherein 5 as a value has been added to every element of this particular vector now when it comes to function you can also have default argument values which can be passed so here let's look at as an example so we have hello name again but this time instead of passing in just an argument we will also provide it a value or you could say that could be considered as a default value now when we create this function we are
            • 204:00 - 204:30 doing the same thing as previous examples but we are passing in an argument and that argument has a value now once i do this i can surely call this function without passing in a value and that shows me the name which we had assigned to the argument or we can even pass in a new name which will be assigned to name so if we do this it works in both the ways fine so this is in one way you are passing in a default value
            • 204:30 - 205:00 and then basically you can either call the argument or you can assign it a new value so if we would do something like this hello name and then for example i would say name equals say jerry and if i would do this so that also works fine however since we are passing in an argument we are assigning a value so either we can let it go for the default or we can just pass in the value
            • 205:00 - 205:30 or we can be very specific in mentioning the argument and then the value for it now how do we return value from a function let's look at this so here we are creating a function we are calling it full name and this one takes name wherein we are giving sachin and title is say tendulkar and what we would do is we would use a return statement here so return would
            • 205:30 - 206:00 basically use the paste function it will take the values of name and title and then glue them together however we are using also a space so that there is a space between these two values to the arguments which are passed now if i run this argument sorry this function my function is created now we have already passed named arguments or we have already passed value to those so we can straight away say just call the
            • 206:00 - 206:30 function and that does whatever you have mentioned in the function body i could have also said that i could create a new one wherein i will pass new set of values which we saw in a previous example and then if we call this it takes up the new values so either you can let it go for default like what we did here we can also pass in new values or if you would want to keep it specific
            • 206:30 - 207:00 you could basically say full underscore name that's my function i could say name equals and i can say john and then i will say title smith and that's also fine so we can do this and that works in the same way as it would have worked with just passing in the names so this is fine and if you would want to
            • 207:00 - 207:30 test it out say for example if i would just take off name here and just do this that also works perfectly fine wherein we are still using these arguments in the particular order now if i would have changed this one to name wherein i am already passing in a value for name and if i tried this so in this case what happens is name is smith
            • 207:30 - 208:00 and basically your title becomes john right so we have to remember how we are what arguments we are passing and if we are basically assigning values to the arguments or letting it pick up the default ones so let's do this and that looks okay now when you talk about scope of a variable okay now before we understand scope of a variable let me show you some more examples on function now say for example
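The default-value and named-argument behaviour discussed above can be sketched with the `full_name` function. The default values follow the narration; the function body is an assumption.

```r
full_name <- function(name = "Sachin", title = "Tendulkar") {
  return(paste(name, title))   # paste puts a space between the two values
}

full_name()                                # uses both defaults
full_name("John", "Smith")                 # matched by position
full_name(name = "John", title = "Smith")  # matched by name
full_name("John", name = "Smith")          # name is "Smith", so "John" becomes title
```

R matches named arguments first and then fills the remaining parameters in order, which is why the last call swaps the two values.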
            • 208:00 - 208:30 if you were using built-in functions we have lots and lots of built-in functions which are available for programmers which they can use in their data science activities or data processing or computation now here we are using a function called rnorm to generate 1000 random values from a normal distribution of mean 0 and standard deviation 1 so i would use the rnorm that's an inbuilt function and i will call this say normal
            • 208:30 - 209:00 distribution so that is already done now we can find out the mean on these random values which would have been generated using the inbuilt mean function and that works perfectly fine you can also create a histogram out of this and if i do this it shows me the histogram so let's see the histogram here let's bring it out and that shows me the histogram of normal distribution if you
            • 209:00 - 209:30 would be interested in knowing about a particular inbuilt function you can just do a question mark and use the function and that basically shows you the documentation of the function so this is a generic function which computes a histogram of a given data value and here it takes arguments so this is basically your data this could be the number of arguments which you are passing in for your histogram to be created
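A sketch of the built-in functions used in this part. The variable name `normal_distribution` follows the narration; the fixed seed and the null graphics device are assumptions added so the sketch gives repeatable numbers and also runs outside RStudio.

```r
set.seed(42)   # assumption: fixed seed so the random values are repeatable
normal_distribution <- rnorm(1000, mean = 0, sd = 1)

mean(normal_distribution)   # close to 0, but not exactly 0

pdf(NULL)                   # assumption: null device; omit this line in RStudio
hist(normal_distribution)   # histogram of the generated values

# ?hist                     # opens the documentation for hist
```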
            • 209:30 - 210:00 now we can look at some more examples here so i can say two histogram with large number of interval breaks and this is where i am also specifying breaks and passing in a value so this allows me to provide arguments to functions by position now the same example which we have given here we can do it without breaks argument but as a good practice we should actually give name to the arguments which we are defining so if i would do
            • 210:00 - 210:30 this when i'm passing in my data that is normal distribution and then for breaks i'm just giving the value 50 and that is also fine it works perfectly fine here now we can create our own function which as we saw in some basic examples functions which can be without arguments say this is a simple example or with arguments so this one we have already seen how you can create a function without giving any
            • 210:30 - 211:00 argument or by giving an argument and then basically calling in the function now when it comes to optional arguments so we can look at this function wherein i would want to say find out the exponential value of a particular number so i call it expo value i use my function i say this will take the value x now that's an argument which we are passing in we could have given it a value or we will just let the user
            • 211:00 - 211:30 provide the value when this function is being called i will also give a default argument which is power equals 2 and here we would want to get a histogram of the values with a particular power so if i create this particular function that's done and now i will just pass in my value i don't need to mention power that has been given a default value yeah if we would want to change it then we
            • 211:30 - 212:00 can pass in that so let's run this one and that gives me exponential value a histogram based on the normal distribution data and by default it is using power as 2 now what we could have also done is we could have specifically mentioned a different value for power and that works perfectly fine i could have just passed in the value as power and that also works fine
            • 212:00 - 212:30 so here you are using named arguments and basically passing in any other arguments now what we can also do is we can use these named arguments and then we can also do or we can pass these arguments that is what we call as passing any other arguments now if you look at the explanation of this hist function histogram function
            • 212:30 - 213:00 if you look at this it shows me these three dots and this is what we can use to pass in any other arguments so let's look at an example for this one so say for example i would want to create a function where i am passing named arguments i am passing in the data but then i would also want to pass any other arguments which can be passed dynamically now for that we can create a function here
            • 213:00 - 213:30 wherein i am calling it expo value again i am passing in my x which will be the data which we will pass in you are mentioning power which is two which is a named argument which can also be considered as a default or you can change the value or you can provide a new value and then i am also giving these three dots which are also passed in within this particular function so let me create this function here now once that is done then i can call this function by passing
            • 213:30 - 214:00 in my data which is normal distribution power is 2 and then i'm also using these breaks for getting my histogram with intervals of 50 so let's call this function and that gives me the histogram now what we can also do is sometimes it might be useful to pass logical arguments so for logical arguments what we can do is we can create a function which will take the data here i am using a named argument exp
            • 214:00 - 214:30 that is for exponential i am saying the value of hist is false by default and then i am also giving any other arguments so what we will do here is in this function we will say if the value of hist is true then this block of code will get executed where you will get a histogram based on the exponential which has been assigned in the function passed as an argument
            • 214:30 - 215:00 and if that doesn't hold true which is by default false as we have given in our function then this piece of code will get executed so let's create this function and that's done and now we can straight away just pass in our data exponential value is given as 2 histogram has been given in false that means the else part of the code will get executed and we can look at the values here i can also say
            • 215:00 - 215:30 histogram is true and that's where we will be calling in the hist function and i can do this that shows me the histogram so in this way we can pass in named arguments we can pass logicals and then we can also pass any other arguments for our use case now looking further in functions let's also understand the scope of a variable in a function so here i am saying v
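The optional, pass-through (`...`) and logical arguments described in this part can be sketched as below. The names `expo_value`, `power`, `exp` and `hist` follow the narration; `expo_value_flag` is a hypothetical name used here to keep the two versions apart, and the function bodies are assumptions.

```r
pdf(NULL)      # assumption: null device; omit this line in RStudio
set.seed(42)   # assumption: fixed seed
normal_distribution <- rnorm(1000)

# power has a default; anything else (such as breaks) travels through "..."
expo_value <- function(x, power = 2, ...) {
  hist(x^power, ...)
}
expo_value(normal_distribution)                          # power = 2
expo_value(normal_distribution, power = 4, breaks = 50)  # extra argument for hist

# A logical flag decides whether to plot or just return the values.
expo_value_flag <- function(x, exp = 2, hist = FALSE, ...) {
  if (hist) {
    graphics::hist(x^exp, ...)   # explicit namespace, since "hist" is also a parameter
  } else {
    x^exp
  }
}
head(expo_value_flag(normal_distribution))
expo_value_flag(normal_distribution, exp = 2, hist = TRUE, breaks = 50)
```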
            • 215:30 - 216:00 and then i am saying i am global variable let's create this and then i am saying stuff so i'm global stuff so this is basically we have assigned some values to variables v and stuff now let's create a function where i'll say fun i'll use the function and i will say this will take my variable stuff i'm saying print v and then for stuff i'm assigning in a new value and then i'll print stuff
            • 216:00 - 216:30 so let me create this function and let's see how it works so if for example i would just say print v that shows me the global variable which we had created earlier and since i'm using that within my function it basically has the value now i also have a global stuff so i'm saying print stuff and that shows me whatever was assigned to the variable and now we will basically call the function
            • 216:30 - 217:00 by giving in the argument as stuff the variable which we had created now if we do this then it says reassigning stuff inside the function and that's because within the function we are basically assigning a new value to this stuff now i can also just do a print stuff and if you see it still goes back and prints the global variable so only within the function
            • 217:00 - 217:30 reassignment happened and that's what we understand when we talk about global variable or local variable now to create a function to find the final output amount to be paid by a customer after adding 20 percent tax to the purchase amount how do we do that so i'm here creating a function which will take x as hundred and what does that function do we would want to basically
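A sketch of the scope example. The variable and function names (`v`, `stuff`, `fun`) follow the narration; the exact strings are assumptions.

```r
v <- "I'm global v"
stuff <- "I'm global stuff"

fun <- function(stuff) {
  print(v)                                      # v is found in the global environment
  stuff <- "Reassign stuff inside the function" # changes only the local copy
  print(stuff)
}

print(v)
print(stuff)
fun(stuff)
print(stuff)   # still the global value: the reassignment stayed inside fun
```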
            • 217:30 - 218:00 find out the amount which is paid by customer after adding 20 percent of tax now how do we calculate that so we take x plus 20 percent of x and that would be the final amount which will be paid so we do a return t and this is my function so let's create this function and then let's pass in a value to see what is the amount which customer would pay with an addition of 20 percent tax so this is a simple function where we are
            • 218:00 - 218:30 passing in one argument we are giving it a value and then we are doing computation within the function body what we can also do is we can create a function where i am passing in an argument and i can then check the value of that so if the argument passed was greater than 0 then we would find out the final amount which is amount plus 20 percent of the amount if the amount is less than or equal to
            • 218:30 - 219:00 zero then our final amount is equal to amount and we return f amount so here we will be evaluating these conditions and based on that my function will return the value so let's create this function and pass in a value and that shows me 100. so you can just test this by saying amount 1 and say for example i would have passed in zero now in this case my final amount is zero because there is no amount which needs
            • 219:00 - 219:30 to be paid by the customer now checking the argument and the body of a function so we can always use this inbuilt function args which will tell me for this particular function what are the arguments and what is the body of the function which basically tells me whatever we have coded within the function body now to understand the scope we can create a function here which is taking an argument x and what does this do so we assign a value to y
            • 219:30 - 220:00 then we basically say g1 and here i am using function of x now what does that function of x do so this one will take the value of y plus multiply x by itself so this is a function which we are creating and then i am saying g1 of x so what you are doing is whatever value
            • 220:00 - 220:30 was passed in as x for that function x will be applied so let's create this function and then pass in a value 10 and that gives me the result as 110. similarly we can create another function where we want to do some computation and then i am creating one more variable which has basically the function pass in a value for y and then basically what you do is
            • 220:30 - 221:00 you are calling in your g2 function and then let's call in this function so let's do this and let's also create f2 and then finally we will call in f2 which is internally calling g2 so these are some simple examples where you are doing some computations and creating some simple functions let's also create a function which is taking two arguments so here i have g two
            • 221:00 - 221:30 function takes two arguments x and y what does that function do here we are saying y plus x into x that's my g two and similarly i'll create f2 which is going to have a value assigned to y and this one is going to call in my g2 function which will take x and y x which we are passing in here
            • 221:30 - 222:00 and y which we have assigned so let's create this and then let's call our f2 and what does that f2 do it basically has the value of y assigned and then it does whatever is mentioned in g2 with our x and y values so i'm passing in 10 here so it is basically y which is 10 plus you have the x value which has been passed here
            • 222:00 - 222:30 so let's look at the calculation which is 10 so that gives me 110. so 10 into 10 plus the value of y so this is how we can create functions which have been assigned some values and then pass in some other values to those look at some more examples here when we work with functions and see how we can use functions to carry out our basic operations or calculations so for example here
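The nested-function examples can be sketched as follows. The names `g1`, `g2` and `f2` follow the narration; `f1` is a hypothetical name for the first outer function, and `y <- 10` is inferred from the stated result of 110 for an input of 10.

```r
f1 <- function(x) {
  y <- 10
  g1 <- function(x) {
    y + x * x    # y is looked up in the enclosing function f1
  }
  g1(x)
}
f1(10)   # 10 + 10 * 10

g2 <- function(x, y) {
  y + x * x      # here y is passed in explicitly
}
f2 <- function(x) {
  y <- 10
  g2(x, y)
}
f2(10)   # same calculation, with y handed over as an argument
```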
            • 222:30 - 223:00 i am creating a function and this will take an argument wherein we are saying it would be marks now let's do this and the function body would say result is not defined now if the marks are greater than 50 then result will be pass and you will have the message which is your result is and then you are passing the value of result so let's look at this one so
            • 223:00 - 223:30 let's create this function pretty simple function and then let's pass in a value here so i'll say status as 60 which will be checked for the value greater than marks or lesser than marks and that tells me your result is pass and if we give this one then it says your result is not defined however we can have additional statements here which can say if the result was lesser than 50 then what should have been printed this is a simple example
            • 223:30 - 224:00 let's look at one more example and here my argument is age now just notice that we are not passing any default values or we are not passing any values to the arguments we are just passing in an argument which will be assigned a value when you call the function now here we say age group is not defined we say vote is not defined and then we start using some condition checks so i say if the age passed is greater
            • 224:00 - 224:30 than 18 then the age group would be adult and the person can vote and message your age group is and voting status is will be printed out so we can use this or from our previous learning we can do a if else and modify the function so let's create this function and then pass in a simple value to this and that tells me what is your age group and what is your status to vote
            • 224:30 - 225:00 so now if we would want to create a function to convert a name into uppercase let's see how we can do that so we are creating a function here which takes the value name now then we also find out the length of this particular argument and for that we are using a inbuilt function called nchar which will be for your name and you would want to find out the length of this particular name
            • 225:00 - 225:30 and we would say if the length is greater than 5 then we are again using a inbuilt function called toupper which will convert the argument or the name passed to uppercase we will say message user given name is and then you print out your name so let's call in this function so let me first create this and then i can call in this function and we clearly see that the number of
            • 225:30 - 226:00 characters in this word is more than 5 and that's why it is converted to uppercase however if you would call the function with a name which has less than five characters it stays as it is now this is again a simple function which we created let's see how you can create a function to calculate bonus now here we are passing in two arguments so this function takes two arguments one is salary and
            • 226:00 - 226:30 one is experience and then we say if the experience is greater than five then bonus percentage will be 10 and else bonus percentage will be 5 and here we will calculate the bonus so first it will find out how many years of experience a particular employee has and based on that a value of bonus will be assigned or bonus percentage will be assigned and then you say what will be the bonus that is salary into the bonus percentage
            • 226:30 - 227:00 and return the bonus amount so this is a simple function let's basically select this and let's create this function and then let's calculate the function if the salary is 25 000 and experience is six years and that basically will tell me the value so let's look at the value it tells me 2500 which is 10 percent of the salary similarly if we go for this one which
            • 227:00 - 227:30 will basically go for the execution of else part of the code we can do this and that gives me bonus as half of it now how do we handle multiple conditions and multiple actions so let's look at that so let's create a function which takes one argument which is age we would check if the age is greater than 0 then we would want a nested if within this condition so if age is greater than 0 then
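A sketch of the bonus calculation. The function name `calc_bonus` and the variable names are assumptions; the thresholds, percentages and the 25,000 salary with six years of experience come from the narration. The three-year value in the second call is hypothetical.

```r
calc_bonus <- function(salary, experience) {
  if (experience > 5) {
    bonus_percentage <- 10
  } else {
    bonus_percentage <- 5
  }
  bonus <- salary * bonus_percentage / 100
  return(bonus)
}

print(calc_bonus(25000, 6))   # 10 percent of the salary
print(calc_bonus(25000, 3))   # else branch: 5 percent, half of the above
```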
            • 227:30 - 228:00 whatever we have given here will get executed and this will be this part of your code and here i am again checking if age is less than 18 then age group would be kids else if now else if is to check the second condition so if the age was passed if it is greater than zero then we get into this block of the code now it was
            • 228:00 - 228:30 greater than 0 but then is it less than 18 then i would categorize the person as kids if age is less than 60 then we will say age group adult else we will say age group senior now we can basically say that we could have given more conditions to this because here we are saying if age group is less than 18 then the individual would be within the age group of kids
            • 228:30 - 229:00 if that is not true that is it is not less than 18 so probably it is 18 or greater than 18 then we are checking the second condition if the age is less than 60 age group is adult and if these two conditions are not met then it jumps to else where age group is senior and if this whole block was ignored because age was less than 0 then we would have just printed out age group
            • 229:00 - 229:30 is not defined which is message is wrong age and your age is such and such so this is our whole function so let's go ahead and run this now let's check the age group when the age is 10 when the age is 40 when the age is 65 or when the age is minus 10 which is not defined now there are some inbuilt functions which can be used in r such as your switch function so looking
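The nested if / else if logic for age groups can be sketched like this. The function name `age_group_of` and the message wording are assumptions; the boundaries (0, 18, 60) and the four test ages come from the narration.

```r
age_group_of <- function(age) {
  if (age > 0) {
    if (age < 18) {
      age_group <- "Kids"
    } else if (age < 60) {
      age_group <- "Adult"
    } else {
      age_group <- "Senior"
    }
  } else {
    age_group <- "not defined"
    message("Wrong age: your age is ", age)
  }
  message("Your age group is ", age_group)
  invisible(age_group)   # return the group without printing it a second time
}

for (a in c(10, 40, 65, -10)) age_group_of(a)
```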
            • 229:30 - 230:00 at this function that is switch function we can see or we can use this for our different kind of operations so here your switch function returns values match with the first argument and first argument should be a character let's have a look at the example so say for example you want to return the house rent allowance or hra amount based on cities so we create a function called hra
            • 230:00 - 230:30 now that takes an argument which is city name and here we will say what does this function do so here i am saying hra amount and i am going to use the switch function now switch function i am saying i would want to convert the city name to uppercase so that we can maintain some consistency and here i'm saying if the city is bangalore it would be 7500 if it is mumbai thousand if it is delhi 8 000 chennai
            • 230:30 - 231:00 7500 and you have 5000 value and you are returning the hra amount now what do we do with that so let's create this function it's done and now we will pass in the value so we will see whatever value has been passed to this and that gives me the value here right so switch is basically taking me directly to this value now however if i
            • 231:00 - 231:30 try to provide a city name which is not given in the list so when i am saying say for example pune now what is happening is it is just taking a value which has not been assigned to any of these conditions if i go for again something else which is in a lower case now this is where your toupper function will come into use and if we do this it basically converts this into upper case mangalore and then basically
            • 231:30 - 232:00 it gives you the value so this is the usage of a switch function let's look at one more example so for example here we are creating a salary range which will take an argument which will be band and i will say these are my bands or you can say these are my options so i can say l1 is basically 10 000 to 15 000 l2 is so and so and l3 is so and so and you return the
            • 232:00 - 232:30 range now let's create this function sometimes you have to do it this way so our function is created and now we can just do a salary range given a value and that gives me the range of the values however if you pass something which is not mentioned then it basically prints out null so in r you can also use repeat which
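The two switch examples above can be sketched roughly as follows. This is a minimal reconstruction, not the trainer's exact code: the Mumbai branch is left out because its amount is not clear in the captions, and the `L2` and `L3` ranges are hypothetical placeholders.

```r
# HRA lookup: toupper() normalises the input so "delhi" and "DELHI" match the same branch
hra <- function(city) {
  hra_amt <- switch(toupper(city),
    BANGALORE = 7500,
    DELHI = 8000,
    CHENNAI = 7500,
    5000  # unnamed last argument acts as the default for any other city
  )
  return(hra_amt)
}

# Salary band lookup: there is no default branch, so an unknown band gives NULL
salary_range <- function(band) {
  band_range <- switch(band,
    L1 = "10000 - 15000",
    L2 = "15000 - 25000",  # hypothetical range
    L3 = "25000 - 40000"   # hypothetical range
  )
  return(band_range)
}

print(hra("delhi"))
print(hra("Pune"))
print(salary_range("L1"))
print(salary_range("L9"))
```

The default branch is what makes an unlisted city such as pune still return an amount, while the salary function has none and so falls through to NULL.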
            • 232:30 - 233:00 can be useful and what does repeat do so here i am assigning a value to a variable called time let's do that and then i'm giving a piece of code with repeat now what does repeat do so you are passing in a message which is hello welcome to our tutorial and then you are saying if time is greater than or equal to 20 you would want to break out from this loop and then you also
            • 233:00 - 233:30 increment the times value and this will keep repeating till this if condition is met wherein we have said time value starts from 15 so let's do this and this basically will print out the message wherein first my time was 15 which was less than 20 so you increment it it becomes 16 you print it again 17 print it again 18 print it again 19 and 20 and as soon as
            • 233:30 - 234:00 you reach the times value which is 20 it breaks out of this and it stops printing this particular message okay now let's look at some more examples so say in r we will use a function to find the square of any number given by the user okay if the square value is less than 100 then increment user value by 1 and find square again and repeat this till square exceeds
            • 234:00 - 234:30 100 pretty simple so you create a function which takes n as an argument and you would want to repeat it so you would want to repeat this by squaring the numbers until the square exceeds 100 and once it reaches 100 you will break out so this is what we are doing and we are auto incrementing or incrementing the value of n by 1 every time we calculate a square and then you return the value of n so
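A minimal sketch of the two repeat loops described above. The message text and the function name `square_until_100` are assumptions, and the break condition uses "exceeds 100" (strictly greater), which is one reading of the spoken description.

```r
# First loop: print a message until the counter reaches 20
time <- 15
repeat {
  message("hello, welcome to the R tutorial")
  if (time >= 20) break   # leave the loop once the counter hits 20
  time <- time + 1
}

# Second loop: keep incrementing n until its square exceeds 100
square_until_100 <- function(n) {
  repeat {
    sq <- n * n
    if (sq > 100) break   # stop as soon as the square is above 100
    n <- n + 1
  }
  return(n)
}

print(square_until_100(6))
```

A repeat block has no condition of its own, so the `break` inside the `if` is the only way out; forgetting it would loop forever.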
            • 234:30 - 235:00 let's create this function and now let's calculate it for square 6 and that tells me what is the square now as soon as your square value touches 100 it basically breaks out of the loop now if you would want to find balance in a bank account after n years if a person has deposited x amount in the beginning and bank gives an interest of
            • 235:00 - 235:30 eight percent per annum right this is a simple calculation so it needs the amount which was deposited you need the year and you need the rate now year which is n years can be given by the user rate we have already given eight percent however a function's main benefit is that you can pass new values to it say later one month down the line the bank rate changes might be it increases might
            • 235:30 - 236:00 be decreases then function should not be modified it can just take up the new values and start calculating from there on now here we will say get the final balance function takes amount the amount which would be deposited year and that could be say four years or five years or ten years for which you would want to calculate the rate of interest and add it to the amount so i will say for i in 1 to year so that
            • 236:00 - 236:30 depends on how many times you would want to run this loop i would say interest would be using the round function i am saying amount into rate whatever is the rate of interest and then you are giving two years now final amount will be calculated so you are basically saying amount plus interest and you will pass in a message where we'll say year
            • 236:30 - 237:00 is the value of i that's first year or second year amount what is the amount what is the interest you are calculating based on the round function and final amount will be amount plus the interest and then you basically say amount will be given our final amount will be assigned to amount now if this is a function you would want to return the final amount so let's select this and then basically create a function
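The balance calculation described above might look like the sketch below. The function and argument names, the default rate of 8, and rounding the interest to two decimal places are assumptions based on the spoken description.

```r
# Compound a deposit year by year; rate is a percentage, e.g. 8 for 8 percent
get_final_balance <- function(amount, year, rate = 8) {
  final_amount <- amount
  for (i in seq_len(year)) {
    interest <- round(amount * rate / 100, 2)   # interest earned this year
    final_amount <- amount + interest
    message("year: ", i, " amount: ", amount,
            " interest: ", interest, " final amount: ", final_amount)
    amount <- final_amount                      # next year starts from the new balance
  }
  return(final_amount)
}

print(get_final_balance(5000, 5, 8))
```

Because the rate is an argument, a changed bank rate only needs a different value in the call, not a change to the function.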
            • 237:00 - 237:30 and let's say i would want the final balance if the amount deposited was five thousand it was kept in the bank for five years and rate of interest was eight now that should basically give me my final amount and if we double that so we say amount is 10 000 number of years is 10 but the rate of interest is less so let's calculate this and that gives me the interest however if you notice
            • 237:30 - 238:00 based on my message it is basically telling me what was the first amount what was the interest what was the final amount and it does that for all these number of years so these are some simple examples for your functions right now we can also look at on the similar lines we can create some interesting functions so you can find the total number of years required to raise eight thousand dollars if the user deposits
            • 238:00 - 238:30 750 per month so here you are not actually calculating the final amount but you would want to find out how many years are required to basically have the amount as 8000 so your function we are saying the amount is say 550 or say 750 per month now i would say let the final amount be zero as of now month is zero and i will
            • 238:30 - 239:00 say while my final amount is less than or equal to eight thousand i would want to do something and that is you are incrementing the value of month by one because that's your first time your amount is less than eight thousand whatever deposit was made say seven fifty dollars per month and then you have final amount which is your initial amount which has been assigned to f amount that is 0 plus the amount
            • 239:00 - 239:30 you print out the message and then you basically say year is whatever value was passed for month so you may want to have it for number of years or years with particular amount of month so we will calculate the year value now here what we are doing is we are calling in this required years function without an argument which takes the default argument
            • 239:30 - 240:00 or you can pass it with 750. we can run this so let's create this function pretty simple done and if we do not pass an argument then the amount is 750 and it tells me what would be or how much time it would take for us to reach from say 750 or 550 to final amount similarly if i would have done this it tells me again a new value so we are finding out
            • 240:00 - 240:30 the total number of years required to raise the amount to 8000 dollars so these are some simple examples of functions which you can use for your operations your calculations and also creating functions which can be repeatedly used with no argument one argument or multiple arguments
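A rough sketch of the savings example, assuming a target of 8000, a default deposit of 750 per month, and a result expressed in years as months divided by 12. The name `required_years` follows the spoken description; the details of the loop are a reconstruction.

```r
# Count the months of fixed deposits needed to get past 8000, then convert to years
required_years <- function(amount = 750) {
  f_amount <- 0
  month <- 0
  while (f_amount <= 8000) {
    month <- month + 1
    f_amount <- f_amount + amount   # add this month's deposit
  }
  years <- month / 12
  return(years)
}

print(required_years())      # uses the default deposit of 750
print(required_years(550))   # a smaller deposit takes longer
```

Calling the function without an argument falls back to the default, which is what the walkthrough demonstrates before passing a different amount.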
            • 240:30 - 241:00 now so far we were learning on creating our own functions and we also looked at using some inbuilt functions either creating a plot or basically doing some basic operations or passing in multiple arguments so let's look at some more examples and when we talk about built-in functions there are lots and lots of built-in functions which are available in r which can be used so let's look at these
            • 241:00 - 241:30 so for example here are some built-in functions which can allow you to work with different data structures for example you have a sequence function which allows you to create sequences so for example i could just say test nums and i can just say sequence and here i can say where does it start from so might be i can say 0 goes all the way to 50 and then i can also say
            • 241:30 - 242:00 if i would want a jump or how many numbers should be used so for example let's do this and now if i look at test nums so that shows me the value however not to confuse we could have also done this using assignment operator like this and then look at your test nums so it tells me it has created a vector of numbers from 0 to 50 which are even numbers now you can always do a class of
            • 242:00 - 242:30 and let's look at this and that tells me the objects here are numeric and say for example i would use typeof to see what is this test nums which we just created there was a typing mistake let's check this and it has the values with double right so we have created a sequence here where we are creating a vector of numbers which have a step of 2 or you are saying about
            • 242:30 - 243:00 even numbers now you can also use a sort function so i can do a sorting here and i can give it an increasing or a decreasing order so if for example i have created this sequence and i could just create a simple variable like this pass in a vector into this which could be say for example i'll try your test nums and then look at your v so those are my numbers and you can
            • 243:00 - 243:30 straight away do a sort on your test nums so i could just do a sort on v and that basically shows me the number however i could also do a sort v and then i could say here let's check this v comma and then you can say decreasing equals true and let's do this it just reverse or
            • 243:30 - 244:00 puts the data in a reverse order or it sorts based on decreasing value and having the greater value in the beginning and the lowest value at the end so you can use a inbuilt sort function similarly you can use a reverse now reverse need not actually sort the values it will just reverse the elements in your sequence for example let's say v2 and i will again use this one as c and then just pass to your test
            • 244:00 - 244:30 nums that's an easier way or i could have created a new vector so i'll say test nums that's my v2 and you can do a reverse on v2 and that basically shows me the values but here we see let's see so we are looking at okay so this was wrong i should have given a capitals and do it yeah this is fine
            • 244:30 - 245:00 and we get the values however if i had created something like this v3 and let's say c and then let's say 99 and 2 and 3 and 4 and 5 and 78 100 so that's a vector i'm creating and now what i can do is i can use the reverse on v3 and you see it has just reversed the elements in the
            • 245:00 - 245:30 list now we could have done this without giving these brackets here and it shows me the result so this is good to understand what your sorting does so sorting is basically going to look at the objects and it's going to sort them in ascending or descending order reverse is just going to reverse the elements in your list now similarly you can also use append which is basically to combine objects so let's say v4 and that will basically have append
            • 245:30 - 246:00 and let's say let's take v2 and let's take v3 and this is what we would want to append and now look at your value of v4 which basically has everything added into one so this is your append similarly you have other functions like finding out the absolute value of a number you would want to find out the square root you would want to find the sum of all the elements in a vector
            • 246:00 - 246:30 you would want to find out the floor value exponential value of something and you are basically finding out the mean value so these are some built-in mathematical functions so you have built-in simple functions you have mathematical functions you have regular expressions in r which can also be used for pattern matching now what we can simply do is we can create a variable let's say text
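The built-in functions covered so far can be tried together in a short sketch. The variable names mirror the ones spoken in the walkthrough; note that the function for reversing is `rev()` and the sequence function is `seq()`.

```r
# Sequence of even numbers from 0 to 50
test_nums <- seq(0, 50, by = 2)
print(class(test_nums))    # "numeric"
print(typeof(test_nums))   # "double"

v3 <- c(99, 2, 3, 4, 5, 78, 100)
print(sort(v3))                      # ascending order
print(sort(v3, decreasing = TRUE))   # descending order
print(rev(v3))                       # reverses positions only, no sorting

# append() combines two vectors into one
v4 <- append(test_nums, v3)
print(length(v4))

# A few built-in mathematical functions
print(abs(-5.7))
print(sqrt(16))
print(sum(v3))
print(floor(5.7))
print(exp(1))
print(mean(v3))
```

The difference between `sort()` and `rev()` is easiest to see on `v3`, which is not already in order.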
            • 246:30 - 247:00 sorry for caps let's say text and here i will pass in something r is a programming language for data science let's do this and now i would want to use grep function so i can say grep and this one needs what i am searching for so let's say language and where am i searching for so i'm searching it in text and let's do that and that tells me
            • 247:00 - 247:30 where is this found so when i do a grep i'm trying to find out if this was found in my element so here i am saying text and grep language similarly i can also use one more function which is finding out index positions so i can also find out index positions by basically giving the vector and here i can do a grep pass in my vector a b c
            • 247:30 - 248:00 d you are searching for b and in your vector and that tells you your b is at the index position 2 d is at the index position 4. so here we are using some regular expressions now there are also other ways in which r can be used for data manipulation so let's learn about factors in r and how do you work with factors and what are they for so when you talk about
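A small sketch of the pattern matching calls described above. The pattern `"b|d"` is an assumption about how both positions 2 and 4 were obtained in one call; `grepl()` is added here to show the logical variant.

```r
text <- "R is a programming language for data science"

# grep() returns the index of each element that matches the pattern
print(grep("language", text))

# grepl() returns TRUE or FALSE for each element instead of an index
print(grepl("language", text))
print(grepl("python", text))

# Index positions within a longer vector
letters_vec <- c("a", "b", "c", "d")
print(grep("b|d", letters_vec))
```

When nothing matches, `grep()` returns an empty integer vector rather than an error, which is why `grepl()` is often easier to use inside an `if`.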
            • 248:00 - 248:30 factors so here let's clean this up and let's see what is this so when you say factors here we are talking about categorical variables so categorical variables can take only limited number of different values now don't be confused with this histogram example here might be we can just look at packages so that that doesn't get confused
            • 248:30 - 249:00 so when we talk about categorical variables we are talking about variables which can belong to only categories for example in r there is a data structure to work with these kind of variables and that is called your factor so with factors we can be sure that all statistical modeling techniques will handle such data correctly so for example you can talk about a person's blood group and you can say the
            • 249:00 - 249:30 blood group could be a or b or a b or o so say we collected information about eight people and we recorded this information as a vector and we can call it blood group so let's do that so let me try that here so if i say blood group and then i would like to create a vector here so that
            • 249:30 - 250:00 we can look at information about eight people and their blood group and this can be in the form of a vector which can then be created or converted into factor by using the factor function so how do we do that let's say i have blood group and here i will basically given some values so i will use c function and here let's give some values so for example let's say
            • 250:00 - 250:30 b let's say a b and let's say o and let's say a again let's again say oh might be one more o let's say a and let's say b so here we have eight entries and let's consider we have recorded the blood group of eight people and this is in the form of a vector so for example let me create this now this
            • 250:30 - 251:00 is a vector which we have created and you can always look at the value of this one so let's say blood group and okay there was a spelling mistake let's do blood group and that basically shows me the values and here you see all the values that are in double quotes now we have basically created a vector now to convert this vector into factor we can use the factor function
            • 251:00 - 251:30 and how we can do that is basically we can say for example let's go here and let's say blood group underscore factor and to convert this vector into factor i will use the factor function and then basically pass my blood group here and now i have created a factor and we
            • 251:30 - 252:00 can look at this factor by just typing in blood group factor and now if you see it basically shows us a factor it does not have any double quotes and you can also see the factor levels for categorical variables which get printed out here now what r is actually doing here is first r scans through vector to see the different categories in there
            • 252:00 - 252:30 then r sorts levels alphabetically and then it converts the character vector to a vector of integer values so these integers correspond to set of character values to use when factor is displayed now we can always do a structure to find out more details of this and here i will pass in blood group factor and let's look at this one and this one shows me
            • 252:30 - 253:00 this factor is with four levels so inspecting the structure will reveal that it has four levels it shows me what are the categorical variables and it shows me some integers so here we are dealing with a factor of four levels now a's are recorded and a would have say recorded as one so that would be a first level you have abs which are recorded here
            • 253:00 - 253:30 and that is basically your second level b is the third level and o is the fourth level so when this when we are looking at this factor we may think why this conversion so categories could be very long character strings and each time repeating a string or an observation can take up lot of memory so using factors and having these levels can reduce the memory space
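The steps above can be reproduced with a short sketch. The eight blood group values follow the order spoken in the walkthrough; the object names are written with underscores as described.

```r
# Character vector of eight recorded blood groups
blood_group <- c("B", "AB", "O", "A", "O", "O", "A", "B")

# factor() finds the distinct categories, sorts them, and encodes each value as an integer
blood_group_factor <- factor(blood_group)
print(blood_group_factor)

print(levels(blood_group_factor))       # "A" "AB" "B" "O"
str(blood_group_factor)                 # shows the 4 levels and the integer codes
print(as.integer(blood_group_factor))   # 3 2 4 1 4 4 1 3
```

The integer codes are positions in the level vector, so B is stored as 3 because it is the third level after A and AB.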
            • 253:30 - 254:00 now factors are actually integer vectors and each integer corresponds to a category or a level now to specify a different order of levels we can specify levels inside the factor function how do we do that so let's say i will say blood underscore factor 2 and here i will basically give the same so i will say factor and this factor will have blood group
            • 254:00 - 254:30 which we had created earlier but this time also i am going to specify levels and then i can basically pass in a vector here so within this levels i will specify the values what are the levels so here i will say o then i will say a then let's say b and then let's say
            • 254:30 - 255:00 a b so this is what we are doing here to specify different order of levels and we are specifying the levels here now if i do this that would have got created and let's look at blood factor 2 and that shows me the value here below where you have specifically assigned the levels now if you look at the previous one where we had blood group factor where levels were
            • 255:00 - 255:30 automatically understood by r so we were looking at the categorical variables we were seeing what are the levels here and here what we have done is we have just created again a factor but then we have specified levels in a different order and you can obviously do a structure on this to compare so for example i'll say structure and then i will say blood factor 2 and that basically shows me the
            • 255:30 - 256:00 structure with four levels so this was the initial one where we said a ab b and o and then there were some integer values which were corresponding to these categorical variables here we have given a different level order and we have a different set of numbers which we see here so if we compare structure of blood factor and blood factor 2 we will see encoding is different right now that is done we can also specify the level names
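A minimal sketch of the comparison described above, using the same blood group vector. The name `blood_factor2` follows the spoken "blood underscore factor 2".

```r
blood_group <- c("B", "AB", "O", "A", "O", "O", "A", "B")

# Default: levels are sorted alphabetically (A, AB, B, O)
blood_group_factor <- factor(blood_group)

# Explicit level order: O, A, B, AB
blood_factor2 <- factor(blood_group, levels = c("O", "A", "B", "AB"))

# Same values, different integer encoding
str(blood_group_factor)
str(blood_factor2)
print(as.integer(blood_group_factor))   # 3 2 4 1 4 4 1 3
print(as.integer(blood_factor2))        # 3 4 1 2 1 1 2 3
```

Only the encoding changes; converting either factor back with `as.character()` gives the original values.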
            • 256:00 - 256:30 so what we can do is as we use names function for name of vectors we can pass vectors to levels here and there is basically a function what you can use so let's say i will say levels and then within this i'll pass blood underscore factor or blood group underscore factor here
            • 256:30 - 257:00 so once this is done let's say blood group so in this one we created blood group underscore factor and this one was blood factor two so that's okay i mean it's just a naming convention and here let's pass in levels to my blood factor and then what i can do is i can pass in the values here so this is
            • 257:00 - 257:30 when you would want to give specific names and let's create a vector and let's call it say bt underscore a and might be you would want to give bt underscore a b and then you will give bt underscore b and the final one is bt underscore o so what i'm doing is i'm doing the naming for these particular categorical variables
            • 257:30 - 258:00 by using levels now let's do this and it says blood factor not found so we have to look which one did we have so we have blood group factor so this is what we should have given so let's say blood group factor blood group factor and now we have given some names here so let's look at the blood group factor now
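Renaming the levels of an existing factor, as walked through above, can be sketched like this. The new names must be given in the same order as the current levels (A, AB, B, O).

```r
blood_group <- c("B", "AB", "O", "A", "O", "O", "A", "B")
blood_group_factor <- factor(blood_group)

# levels<- replaces the level names in place, position by position
levels(blood_group_factor) <- c("BT_A", "BT_AB", "BT_B", "BT_O")

print(blood_group_factor)
```

This works much like `names()` on a vector: the underlying integer codes stay the same and only the labels shown for them change.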
            • 258:00 - 258:30 blood group underscore factor and now if you see we have some levels or we have given the name to the categorical variable so if you compare this one so here we were creating a blood group where we had these and these variables were the categorical variables which was just creating a vector then we created a factor out of it and then we looked at our factor we
            • 258:30 - 259:00 looked at the structure of it and similarly what we did was we created a different factor so let me also change the name here and let me call it blood group factor 2 but here we were specifying levels in a different order let us look at this one so which is blood group factor 2 and then you can look at the structure of this one blood group factor 2 and here in this example what we did was the
            • 259:00 - 259:30 initial blood group factor what we had created we have just given some names to that like what we would do in case of vector by using the names function here you are using the levels function so we basically created some levels and let's look at the blood group factor now which basically has some different names so what we are doing here is we are just using naming now we can also specify the categorical variable names
            • 259:30 - 260:00 or levels by specifying the labels argument so inside the factor function so that is basically to give some names or levels so let's look at this one and how do we do that so we can specify by using factor which basically creates your factor and then here i'll say blood group i'm going to specify labels for my
            • 260:00 - 260:30 naming so in previous examples we saw how we were using the levels right and this was by specifying levels for a different ordering and then we could have also done this by saying levels and given some different names or we can just do labels and then within this i will say labels equals
            • 260:30 - 261:00 and then i can say c and then let's give these values which we have bt underscore a bt underscore a b bt underscore b and o so let me just copy this one again and let's put it here and then we can basically do a ctrl enter so i would have created a factor here and then we should remember one thing here that
            • 261:00 - 261:30 it is important to follow in the same order as the order of factor levels that is a ab b or o now these are the levels what we are seeing so if you look at any one of these in the beginning which we had created it was showing me what levels it has a ab b and o and a ab b and o so we are following the same order but we are using the
            • 261:30 - 262:00 labels within my factor creation now sometimes there might be issues because of wrong ordering so we can actually use a combination of manually specifying the levels and labels arguments when creating a factor now what we can do there is we can say factor and in this case let's say blood group which i'm creating then i will basically say
            • 262:00 - 262:30 levels to give the right ordering and here in levels let's say o let's say a let's say b and then let's say a b so this is for my levels which i'm creating and then what i can also do is i can go for
            • 262:30 - 263:00 labels so levels will take care of my ordering and labels will take care of naming the categories so let's say labels and then we can create a vector and we can give some names so we can say bt underscore o what else we have we have bt underscore a we can then say bt underscore
            • 263:00 - 263:30 b and finally we can say bt underscore a b and then let's create this one so now what we have done is we have created a blood group which has levels which is following your ordering which is following the naming as we have passed so if you look at the levels it tells you the names what you have created it also tells you all the categorical
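The two ways of naming categories at creation time can be sketched as below. The result names `bg_labelled` and `bg_ordered_labelled` are hypothetical; the walkthrough reuses the blood group name instead.

```r
blood_group <- c("B", "AB", "O", "A", "O", "O", "A", "B")

# labels only: names must follow the default alphabetical level order A, AB, B, O
bg_labelled <- factor(blood_group,
                      labels = c("BT_A", "BT_AB", "BT_B", "BT_O"))

# levels + labels: levels fixes the order, labels names each level in that same order
bg_ordered_labelled <- factor(blood_group,
                              levels = c("O", "A", "B", "AB"),
                              labels = c("BT_O", "BT_A", "BT_B", "BT_AB"))

print(bg_labelled)
print(bg_ordered_labelled)
```

Giving both arguments avoids the mismatch that can happen when labels are listed in an order different from the sorted levels.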
            • 263:30 - 264:00 variables which were used for my blood group and basically these will have some labels so we can anytime look at our blood group which we had created in the beginning and let's look at the values of those so when we talk about categorical variables there are two kinds in categorical variables so
            • 264:00 - 264:30 you have nominal or you have ordinal now in nominal you don't have any implied order for example blood group o is not necessarily greater than a that is o is not worth more than a that we can think of now trying such comparisons with factors will generate a warning so say for example we would want to look into our blood
            • 264:30 - 265:00 factor and let's look at what blood group factor contains now that's the new blood group factor let's say blood group factor and here i will try to pull out a value here and let's compare this with blood factor and let's look at some other value now in this case we see
            • 265:00 - 265:30 not meaningful for factors so it cannot really compare the categorical variables and see if one variable is greater than other or has more worth now there can be many examples where such ordering does exist and in r we can impose such ordering in factors thus making it ordered factor so inside factor we can set the argument ordered is true and we can do that now for example you
            • 265:30 - 266:00 would look at the size of a dress so let's say dress size and here i will say for example let's create a vector and let's say medium let's say large let's say small let's say again small and then let's say large
            • 266:00 - 266:30 let's say medium again an entry of large and then let's say medium so here i'm creating a vector and let's see if we missed out any quotes or comma so it says unexpected symbol and where is that so let's look at this one so we have dress size and we are
            • 266:30 - 267:00 looking at c so i'm saying m l s s l and here is a quote missing and that was the reason so and this one also has a quote missing and now it should resolve yeah so let's look at this one and now we have created a vector called dress size now obviously you can create a factor of this so i'll say dress size
            • 267:00 - 267:30 underscore factor where i would want to look at the ordering of this so let's create a factor and in factor we will pass our vector which we want to convert or from which we want to create a factor we will say ordered equals true so i am specifying a particular ordering and then i can also specify levels as we
            • 267:30 - 268:00 saw earlier so in levels we will give the category so what categories we have so we have small we have medium and we have large so these are the three levels which we have and let's create this as a factor now that's done and what you can do is you can look at the factor and we can also do a comparison so let's for example look at our factor
            • 268:00 - 268:30 what does it contain it has some levels and if you closely notice there are these levels which also have a comparison showing which one is worth more than the other so you can look at dress size factor which has some ordering which we have implemented and now let's do a comparison between dress size and compare it with some other variable and see what is the result so now it says if
            • 268:30 - 269:00 it is true or if it is false earlier we were not able to do that because we did not have any ordering and if we were looking at the variables we were not really clear if one variable has more worth than others so these are some simple examples what we have seen now we can also look at some more examples so say for example i do a type here now that basically is creating a vector if i would want to compare the element that is type 3
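The contrast between nominal and ordered factors can be sketched as follows. The short codes S, M and L stand in for small, medium and large, and the four-value blood group vector is shortened for illustration.

```r
# Nominal factor: comparison is not meaningful, R warns and returns NA
blood_group_factor <- factor(c("B", "AB", "O", "A"))
cmp <- suppressWarnings(blood_group_factor[1] < blood_group_factor[2])
print(cmp)

# Ordered factor: levels are listed from smallest to largest
dress_size <- c("M", "L", "S", "S", "L", "M", "L", "M")
dress_size_factor <- factor(dress_size,
                            ordered = TRUE,
                            levels = c("S", "M", "L"))
print(dress_size_factor)   # the Levels line is printed as S < M < L

# Comparisons now work: is the first dress smaller than the second?
print(dress_size_factor[1] < dress_size_factor[2])
```

Without `suppressWarnings()` the first comparison prints a warning that the operator is not meaningful for factors, which is the message seen in the walkthrough.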
            • 269:00 - 269:30 is it greater than type 4 it shows me false right now here what we are seeing is that if you are looking at a particular value okay we can basically see that there is some comparison happening here if i compare this with 1 and 2 which tells me true or false and if i look at this it also does some comparison so i can always
            • 269:30 - 270:00 convert this into factor by using the factor function so i can do this if i'm checking if for example i would want to create a nominal factor i can do a type dot factor and it tells me it is true you can also do a type dot factor 2 and then use the factor function pass in your type which is a vector and here you are saying ordered as true which we just now saw and now you look at type dot factor 2 which is creating an ordinal
            • 270:00 - 270:30 type of variables now here we can again create type dot factor 3. so what we are doing here in this case we had a vector we basically said type dot factor we said factor is of true and then we looked at the nominal factor we also did a factor 2 and then we created factor but we specify ordered as true so we get ordinal
            • 270:30 - 271:00 and now if you look at type dot factor 3 here you are saying ordered and you are also specifying levels like what we did in the previous example and now you would look at ordered factor with user given order which also has the levels which clearly show us a comparison between those now we can take a different example we can say type dot factor 4 we are using the factor function i am specifying type which is a vector i am
            • 271:00 - 271:30 saying ordered is t i am using level which is giving me some levels and then we also have labels which are basically going to have the naming convention so let's look at this one and look at type dot factor 4 so it tells me what are the categorical variables which are small medium large small large medium these are for my type values which we created a vector here these are my type
            • 271:30 - 272:00 values for which we created a vector we said ordered is true the levels is small medium large and we gave some names so we are looking at the values of this so this basically helps you to work on your categorical variables when you can then compare the values and you can see what does it show now here what we are doing is we are creating a different vector we say small tall tallest medium small and so on
            • 272:00 - 272:30 let us look at this one which is basically type and it has the value so what we would want to do is we would want to compare height type of first value with the fourth value so for that let's create a vector on this type ordered is true level is we are saying small medium tall and tallest these are the levels and now when you look at your type dot
            • 272:30 - 273:00 factor 5 it basically shows me what are the levels which you have specified so small is the smallest then you have medium which is bigger than small tall is bigger than medium tallest is bigger than tall we have assigned some levels and based on these levels now you can compare your values in this factor type dot factor 5 take the first value which is small and compare it with the fourth value which is medium and you will know if
            • 273:00 - 273:30 small is greater than medium so the result would be false now i can also convert this into integer and i can continue working on this now here you have basically a sequence so let's use the sequence function where i'm starting from 0 ending to 20 and there is a jump of 2 so that basically creates a vector let's look at the vector value here
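A sketch of the height comparison and the integer conversion mentioned above. Only the first five values of the vector are given in the walkthrough, so the vector here is limited to those; the name `height_type` is hypothetical.

```r
height_type <- c("small", "tall", "tallest", "medium", "small")

# Ordered factor with levels from smallest to largest
type.factor5 <- factor(height_type,
                       ordered = TRUE,
                       levels = c("small", "medium", "tall", "tallest"))
print(type.factor5)

# Is the first value (small) greater than the fourth (medium)?
print(type.factor5[1] > type.factor5[4])

# Converting to integer exposes the position of each value in the level order
print(as.integer(type.factor5))
```

The integer codes follow the order given in `levels`, so they can be used for further numeric work where the ranking matters.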
            • 273:30 - 274:00 and if you would want to sort the vector so we are using an inbuilt function wherein let's create this vector with these numbers i can do a sorting i can also do a sorting with decreasing is true you can do a reversing of vector so these are some examples of inbuilt functions which we have already discussed so here you are doing a reverse you are finding out the structure you want to append two vectors you want to check the class of an object
            • 274:00 - 274:30 you want to convert a vector into a list using as dot list converting the vector into a matrix you are having a sample with two random values between 10 and 20. so these are some inbuilt functions which we have already discussed such as your absolute such as your vector and getting an absolute value or getting a sum of it or a mean of it around
            • 274:30 - 275:00 or basically rounding it to two decimal places getting the ceiling value getting the floor value truncating it returning the log getting the exponential value and so on now we have also looked at regular expressions earlier so regular expressions let's just revisit that so here you are basically creating a variable called text and then you can just do a grep you can say what you would want to search and where you would want to search it and that would give you the
            • 275:00 - 275:30 logical value indicating if the pattern was found you can try to search something else which might not be found you can also search for independent values like this and that basically can give you the position of that particular object within the vector and here is one more example of working with time stamps so for example if i would just do a Sys.Date it returns the current system date if i would want to
            • 275:30 - 276:00 set that as a variable and then call that variable it shows me our current date i can also use as.Date and then let's look at this one so as.Date and this would be converted into date and then you can obviously use formatting techniques like getting the month getting the day getting the year so here we are passing in the date and then we are saying what format we would
            • 276:00 - 276:30 be interested in and that basically gives us the data in a particular format so that's also useful when you have your time series data or when you would want to convert the data types and so on now there are different ways in which you can do formatting so for example in this one we were saying month day and year i can also say for getting the full month name or getting the full year name i can do this caps
            • 276:30 - 277:00 so i can look at this one and that basically shows me my date in a particular format so these are some inbuilt functions which we are seeing and before this we were seeing factors which is mainly to work with categorical variables either they have levels auto assigned and they might not have labels so you can give labels you can give levels you can control the ordering you can give levels in a different way so
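The pattern search and date formatting steps described above might look like the sketch below. The `text` vector and the date are made-up values for illustration; `grepl`, `grep`, `Sys.Date`, `as.Date` and `format` are base R functions.

```r
# Hypothetical text vector to search in
text <- c("R is fun", "data science", "statistics with R")

found_r      <- grepl("R", text)       # logical: was the pattern found in each element?
found_python <- grepl("python", text)  # a pattern that is not present anywhere
pos_data     <- grep("data", text)     # position(s) of matching elements

# Current system date, stored in a variable
today <- Sys.Date()

# Convert a string to a Date, then format it in different ways
d <- as.Date("2022-03-15")             # illustrative date
short_fmt <- format(d, "%m/%d/%y")     # two-digit month, day and year
year_full <- format(d, "%Y")           # four-digit year
long_fmt  <- format(d, "%d %B %Y")     # %B gives the full month name (locale dependent)

print(short_fmt)
print(long_fmt)
```

Lowercase codes such as `%y` give the short form, while the capital `%B` and `%Y` give the full month name and four-digit year.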
            • 277:00 - 277:30 that you can have a different ordering so this is how you use factors and work on categorical variables maybe that is nominal or ordinal and easily you can do your statistical computations on such data let's learn about data manipulation in r and here we will learn about dplyr package and when we talk about this dplyr package it is much faster and much easier to read than base r so dplyr
            • 277:30 - 278:00 package is used to transform and summarize tabular data with rows and columns you might be working on a data frame or you might be getting in an inbuilt r data set which can then be converted into a data frame so we can get this package dplyr by just calling in library function and this can be used for grouping by data summarizing the data adding new variables selecting different set of
            • 278:00 - 278:30 columns filtering our data sets sorting it selecting it arranging it or even mutating that is basically creating new columns using functions on existing variables so let's see how we work with dplyr now here i can basically get the package here so i can just say install dot packages dplyr now we already see the package here which is showing up so i will just select this
            • 278:30 - 279:00 one i can do a control enter and that will basically set up the package package dplyr successfully unpacked so that is done now you can start using this package by just doing a library dplyr and this was built it shows me my version of r so let's also use an inbuilt data set that is nycflights13 so we can do install dot packages and that will search and get that relevant data set i can again call it by using library
            • 279:00 - 279:30 function now once that is done we can look at some sample data here by just doing view flights and that shows me the data in a neat and a tabular format which shows me year month day departure time schedule departure time and so on now we can also do a head to look at some initial data which can help us in understanding the data better so what is this data about how many
            • 279:30 - 280:00 columns we have what are the data types or object types here it shows me how many variables we have so this is fine now we can start using dplyr and in that we can use say filter function if we would want to look in for specific value now here we have the column as month so i will do a filter now i'm creating a variable f1 i'm using the filter function on flights which we already have
            • 280:00 - 280:30 and then what we can do is we can basically look at the month where the month value is 07 so let's look at that and this one you can do a view on f1 which shows me the data wherein you have filtered out all the data based on month being seven so this is a simple usage of filter we can take some other example we may want to include multiple columns so we can
            • 280:30 - 281:00 say f2 filter flights and here we will say month is equal to 7 day is 3 and then look at the value of f2 if you are interested in seeing this and that tells you the month is seven and day is three you could also look into a more readable format by using view on f2 and that gives me my selected results so we are just extracting in some specific value we can keep extending
            • 281:00 - 281:30 this so here we can say flights is what we would want to work on i'm using the filter function so i can straight away instead of creating a variable then doing a view i can also do a view in this way i can just pass in my filter within the view and within this i am saying filter i would want to look at the flights month being 09 day being 2 and origin being lga and then that shows me the value here
            • 281:30 - 282:00 and obviously you can scroll and look at all the columns and if you see the origin column it shows the selected value so now we have filtered out our data based on values in three different columns now what we can also do is we can use and or we can use or operators so i could have done this in a little different way so i could have said head which shows me
            • 282:00 - 282:30 initial result i will do a flights so within my head function i am passing in this and what does that contain so you are saying flights and in this flights data set you would want to pick up the month being the column so we use the dollar symbol here we give a value and i'll say and and i'll again say flights wherein i will select the day being 2 and and and remember when you talk about and it is going to check if all the values are met true so
            • 282:30 - 283:00 then you say flights origin lga and you look at the value so in this way i can filter out specifically multiple values by specifying columns now we could have done it in this way we could have created a view or we could have assigned this to a variable and then done a view on that where we could have selected month day and origin or you can be more specific
            • 283:00 - 283:30 in specifying all the columns it makes the code more readable so let's look at the values and here you are looking at head which shows me based on month day and then you can look for further columns for other variables that is origin being lga now what we can also do is we can do some slicing here to select rows by particular position so i can say slice and i would want to look at rows one to five and i can do this
            • 283:30 - 284:00 so you can always assign or look at the view of this i can just do here so when i did a slice one to five it shows me my entries for 1 to 5. now similarly we can do is slice 5 to 10 and now you are looking at 5 to 10 values so you can always look at the complete
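The filtering and slicing steps described above can be sketched as below. To keep the example self-contained, it uses a tiny made-up data frame, `flights_small`, as a stand-in for the `flights` data from nycflights13 (an assumption for illustration); the same calls apply to `flights` once that package is loaded.

```r
library(dplyr)

# Hypothetical stand-in for nycflights13::flights (values are made up)
flights_small <- data.frame(
  month  = c(7, 7, 9, 9, 9, 1),
  day    = c(3, 4, 2, 2, 5, 1),
  origin = c("JFK", "LGA", "LGA", "EWR", "LGA", "JFK")
)

# Filter on one column
f1 <- filter(flights_small, month == 7)

# Filter on two columns: comma-separated conditions are combined with AND
f2 <- filter(flights_small, month == 7, day == 3)

# Filter on three columns
f3 <- filter(flights_small, month == 9, day == 2, origin == "LGA")

# The same selection in base R with $ and the & operator
f3_base <- flights_small[flights_small$month == 9 &
                           flights_small$day == 2 &
                           flights_small$origin == "LGA", ]

# Select rows by position
rows_1_to_5 <- slice(flights_small, 1:5)
rows_5_to_6 <- slice(flights_small, 5:6)

print(f3)
```

In an interactive RStudio session, `View(f1)` opens the result in the tabular viewer mentioned in the video.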
            • 284:00 - 284:30 data and then you can slice out particular data now mutate is usually a function which is used when you would want to apply some variable on a particular data set and then you would want to add it to your existing data frame or you would want to add a new column so this is where you use mutate which is mainly used to add new variables so let's see how you work on mutate
            • 284:30 - 285:00 so it's pretty simple so you create a variable overall delay now i would want to do a mutate so that it adds a new column so i'm selecting my data which is flights i will call the new column as overall delay and then basically i can look at overall delay being arrival delay minus departure delay so let's create this and let's look at view of this which shows me or which should show me my new column
            • 285:00 - 285:30 which is overall delay which was not in my original data set so you can anytime do a head on this one to compare the value so this one shows me arrival delay and then there are many other variables what you can also do is you can do a view and you could have just looked at flights if you would want to compare so you can look at the flights and this one would not have any overall delay column so it basically
            • 285:30 - 286:00 shows me 19 columns only what we see here and if you do a view on overall delay then that basically shows me 20 columns so we know that the new column has been added to this overall delay so if you would want to work with 20 columns you will use overall delay if you would want to work with your original data set you will use flights now you can also use a transmute function which is used to show only the
            • 286:00 - 286:30 new column so we can do an overall delay and at this time we will say transmute we will say flights overall delay the computation remains same but at this time if i look at view on overall delay it only shows me the new column so sometimes we may want to compute result based on two variables or two columns and just look at the new value and then we can decide if we would want to add it to our existing structure
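A minimal sketch of the difference between `mutate` and `transmute` as described above. The `delays` data frame is a made-up three-row stand-in for `flights`; the column names `arr_delay` and `dep_delay` match those in nycflights13.

```r
library(dplyr)

# Hypothetical stand-in for a few rows of flights (values are made up)
delays <- data.frame(
  flight    = c(1545, 1714, 1141),
  dep_delay = c(2, 4, 2),
  arr_delay = c(11, 20, 33)
)

# mutate keeps all existing columns and adds the new one
with_overall <- mutate(delays, overall_delay = arr_delay - dep_delay)

# transmute returns only the newly computed column
only_overall <- transmute(delays, overall_delay = arr_delay - dep_delay)

print(with_overall)
print(only_overall)
```

Neither function changes `delays` itself, which is why the video keeps the original data set and the extended one under separate names.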
            • 286:30 - 287:00 now you can also use summarize and summarize basically helps us in getting a summary based on certain criteria so we can always do a summarize and what we can do is we can look at our data and we can say on what basis we would want to summarize this particular data so we can do a summarize function now summarize on flights i will say average
            • 287:00 - 287:30 air time and i would want to calculate an average so for that i am using inbuilt function called mean i will do that on air time column so let's look at flights once again and here we can see there is arrival time not air time sorry arrival time and we would want to do some average on this particular data we would want to summarize this so what i'll do is i will use the summarize function i will say average air time and this one
            • 287:30 - 288:00 i will look at mean of air time so let's see if there is an air time column i might be let's look at this one and arrival delay and yes we have an air time so we were actually looking at summarizing based on air time not the arrival time so air time is how much time it takes in air for this particular flight and we will want to use the summarize function not the transmute so summarize
            • 288:00 - 288:30 flights average air time and this one we will calculate the mean of air time and i will also do an na dot rm which is i am saying true so let's do this and that basically shows me the average air time is 151 i can also do a total airtime where i'm doing a summation of values or i can get the standard deviation or i can basically get multiple values such as mean
            • 288:30 - 289:00 i can say total airtime where i am doing a summation and then i can look at other values which is if you would want to put in standard deviation here you could do that so let's look at the result of this summarize and this basically allows me to get some useful information which is summarized based on a particular function such as mean sum standard deviation or all three of them
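The summarize step with mean, sum and standard deviation can be sketched as below. The `air` data frame is a made-up stand-in for the `air_time` column of `flights`, including one missing value to show why `na.rm = TRUE` is passed.

```r
library(dplyr)

# Hypothetical air_time values with one NA (made up for illustration)
air <- data.frame(air_time = c(227, 160, NA, 183))

air_summary <- summarise(
  air,
  avg_air_time   = mean(air_time, na.rm = TRUE),  # average, ignoring NA
  total_air_time = sum(air_time, na.rm = TRUE),   # total, ignoring NA
  sd_air_time    = sd(air_time, na.rm = TRUE)     # standard deviation
)

print(air_summary)
```

Without `na.rm = TRUE`, a single missing value would make `mean` and `sum` return `NA`.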
            • 289:00 - 289:30 now let's look at grouping by so sometimes we may be interested in summarizing the data by groups and that's where we use the group by function so we can always use the group by clause now here we are taking a different data set so we will say for example let's look at head of mtcars and that is basically my data set on
            • 289:30 - 290:00 mtcars now that shows me the model of the car it shows me mileage cylinder displacement and your horsepower and various other characteristics or variables in this particular data set so here we can say let's do a grouping by gear so there is a column called gear so i will call it by gear i will look at my data set and then what i am using here which you see with these percentage and greater symbol is called
            • 290:00 - 290:30 piping so that basically feeds your previous data frame into next one so this is sometimes useful and you can get this by just saying control shift m and you can then use this so we are going to have piping so i am saying mtcars now this is my original data set where i did a head or i could have done a view on this one if you would want to see it in a more readable format and that basically shows
            • 290:30 - 291:00 me the data so we are using a different data set so i want to group it by the gear column so i'm going to call it by gear and this one takes my data that is mtcars i'm using the piping and then i'm saying group the data based on gear column that's done now let's look at the value of by gear or you can always do a view so remember whenever you're doing a group by it is giving you an internal object where your data is
            • 291:00 - 291:30 grouped based on a particular column so we can look at the values here you can do a view that shows you your data grouped based on a particular column now i can again use the summarize function where i would want to now work on the new one where it was grouped based on gear so i am doing a summarize and here i am going to say gear 1 which will be having the value of summation on the
            • 291:30 - 292:00 gear column and then i'm saying gear 2 which is mean well you could give some meaningful names to this and let's look at the value of this one where we are basically now looking at the values which is sum and mean values based on the gear similarly we can use look at different example so we can say by gear and i am again using piping but earlier we had taken gear
            • 292:00 - 292:30 we had grouped the data and we called it by gear so we took our original data set mtcars but now within this particular data which was grouped by gear i will take this data set i will use the piping and i will summarize it where i am saying within this particular data set i would want to get the sum or i would want to get the mean and then you can look at the values so what you are doing is you are either looking at your original data set
            • 292:30 - 293:00 or you are looking at the data which was already grouped and then you can look at the values now here what we can do is we can group by cylinder say might be you are interested in looking at data which is summarized based on the cylinder column you can do that and then for this by cylinder i'm doing a piping where i'm using the summarize function and summarizing will then be done based on the mean values of the gear column or
            • 293:00 - 293:30 the horsepower so let's do this and then you can basically look at the value at any point you may want to look at the data set again so just go ahead and you can look at what does the value contain and by cylinder or by gear and do a head and it gives you the value so you can always do some summarizing or grouping in these ways now here we are going to use sample
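The grouping and piping steps above can be sketched with the built-in `mtcars` data set. The output column names (`gear_sum`, `gear_mean`, `mean_gear`, `mean_hp`) are illustrative choices, not necessarily the ones used in the video.

```r
library(dplyr)

# Group mtcars by the gear column; %>% feeds the left-hand data into the next call
by_gear <- mtcars %>% group_by(gear)

# Summarise the grouped data: one row per gear value
gear_summary <- by_gear %>%
  summarise(gear_sum = sum(gear), gear_mean = mean(gear))

# Group by cylinder and summarise the mean of gear and horsepower
by_cyl <- mtcars %>% group_by(cyl)
cyl_summary <- by_cyl %>%
  summarise(mean_gear = mean(gear), mean_hp = mean(hp))

print(gear_summary)
print(cyl_summary)
```

`group_by` on its own does not change the rows; it only records the grouping, which `summarise` then uses to compute one result per group.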
            • 293:30 - 294:00 underscore n function and sample underscore frac for creating samples so for this let's take the flights data set again and we would want to get 15 random values now that is done and it shows me 15 rows with some random values from the data what you can also do is you can do a portion of data by using sample underscore frac and here i'll say flights i'll
            • 294:00 - 294:30 say 0.4 which will return 40 percent of the total data so this can be useful when you are building your machine learning where you would want to split your data into training and test might be you are interested in some portion of the data so you can do this which is very useful function and then you can look at the value of that now what we can also do is we can use arrange function so like we were doing a grouping by or we were trying to pull
            • 294:30 - 295:00 out a particular column so in the same way we can use arrange which is a more convenient way of sorting than your base r sorting so for arrange function let's do a view based on arrange so we will work on the flights data set which we have and here what we would want to do is we would want to arrange the flights data set which is based on year and departure time and we are doing a view out of it
            • 295:00 - 295:30 so that basically gives me the data which is arranged based on your year and departure time now i can do a head to give me some highlighting of that data now the piping operator what we are using can be used in these ways also so here i will say df i will just assign the data set mtcars to it let's look at the df which has basically your different
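A short sketch of random sampling and sorting as described above. It uses the built-in `mtcars` data as a stand-in for `flights` so that it runs without extra packages (an assumption for illustration), and sorts by `cyl` and `mpg` instead of year and departure time.

```r
library(dplyr)

set.seed(42)  # make the random samples reproducible

# 15 random rows
sample_15 <- sample_n(mtcars, 15)

# Roughly 40 percent of the rows
sample_40pct <- sample_frac(mtcars, 0.4)

# Sort by cyl ascending (the default), then by mpg descending within each cyl
sorted <- arrange(mtcars, cyl, desc(mpg))

print(head(sorted))
```

Newer dplyr versions also offer `slice_sample(n = ...)` and `slice_sample(prop = ...)` for the same purpose.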
            • 295:30 - 296:00 models you can obviously look at the head or view of it to look at useful information we can also go for nesting options which can be useful so we are creating a variable called result here now that has the arrange function so what does this arrange function do so when we would want to use arrange to sort the data so i would want to sort the data but what data would i sort so i
            • 296:00 - 296:30 will use sample n which will give me some portion of the data or some sample data now what is that sample data so here we are using nesting that is earlier when we did a sample we just said data and how many random samples we want but instead of giving that what we are going to do is we are going to use filter here now this filter will work on df so filtering will happen based on the
            • 296:30 - 297:00 mileage which is greater than 20 i will say size is 5 and i would want to basically arrange this in a descending order so i'm using the desc on this particular mileage column by default it is always ascending so let's get the result out of this which will basically show me the mileage details in a descending order so this is my data frame and now we can look at the result what we have
            • 297:00 - 297:30 created so just do a view or do a head and look at the view so here you see mileage where the highest value is on the top and we were only interested in five values in our random sample so that's why when you did a view it shows your five values and it shows in a descending order based on mileage so we have not only used an inbuilt function
            • 297:30 - 298:00 we have not only arranged the data that is we have sorted the data but we have sorted the data based on a descending order on a particular column we have said the value should be greater than 20 and we have also said we just need five random samples now let's look at some other examples so you can always do a multi-assignment so i can say filter wherein i am going to use df which was assigned mtcars i am going to say
            • 298:00 - 298:30 mileage should be greater than 20 then i say b which is going to get a sample out of a and i just want 5 random values so let's look at that so we have b which is going to get a set of five values from a now i will create a result variable which will arrange b which is sample data in a descending order now let's look at the result of this and
            • 298:30 - 299:00 that basically shows me what we were seeing earlier so you can do a multi assignment where you can create a variable get a sample out of it and then basically whatever is that result you can arrange that or sort that in a descending or by default ascending order so same thing we can do it using pipe operator so piping so here i will say result i'm passing in my df that's the data set i'm using piping and which basically
            • 299:00 - 299:30 tells what you need to do on this particular data set so i'm going to filter out the data based on mileage 50 sorry mileage 20 then i'm going to push that or forward it to get the random sample and whatever is this random sample is going to be pushed so you are arranging this in a descending order so this is one more way of doing it and then basically you can look at the result so these are some simple examples where you can use your dplyr with multiple assignments or using
            • 299:30 - 300:00 your nesting to filter out the data you can also do an arrange which is to sort the data you can get some random samples out of it you can summarize the data you can also summarize the data based on one or two or multiple columns and you can use some inbuilt functions to summarize the data based on some functions which are applied on the variables or on the columns you can transmute it
            • 300:00 - 300:30 where you would be interested in only looking at one column you can mutate it where you want to add a new column you can slice it and you can give the conditions where you can say and or or to filter out the data so what we can also do is on this particular data set which we have say for example df where i have my data let's look at this one and if i just do a df at this point it shows me my data set and if you would
            • 300:30 - 301:00 be interested only in particular column then your dplyr also allows you to either we can do a filter or we can simply do a select now for selecting we can choose our data so for example i'll say df underscore i'm interested in mileage i'm interested in horsepower might be i'm interested in your cylinders in this and for this one what i can do is when i
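The three equivalent styles discussed above (nesting, step-by-step assignment, and piping) can be compared side by side in this sketch on `mtcars`. The variable names `result_nested`, `result_steps` and `result_piped` are illustrative.

```r
library(dplyr)

set.seed(1)   # make the random samples reproducible
df <- mtcars

# 1. Nesting: read from the inside out (filter, then sample, then arrange)
result_nested <- arrange(sample_n(filter(df, mpg > 20), size = 5), desc(mpg))

# 2. Multiple assignments: one intermediate variable per step
a <- filter(df, mpg > 20)
b <- sample_n(a, size = 5)
result_steps <- arrange(b, desc(mpg))

# 3. Piping: the same steps, read left to right
result_piped <- df %>%
  filter(mpg > 20) %>%
  sample_n(size = 5) %>%
  arrange(desc(mpg))

print(result_piped)
```

Each result holds five rows with `mpg` above 20 in descending order; the rows themselves can differ between the three because each call draws its own random sample.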
            • 301:00 - 301:30 would want to do a select i can basically say selected df let's call it some name i can say control shift m which is for piping and then basically what you can do is you can do a select and you can choose your columns so i was interested in mileage i was interested in horsepower
            • 301:30 - 302:00 i was interested in cylinder and here what i'm doing is i'm using a select where i can look at the new data frame so let's do this and i'm sorry here we will have to give it df this is where you are passing in your data yeah now this one is done and we can look at the value of this one by just doing a df or head on df
            • 302:00 - 302:30 underscore mileage horsepower cylinder and look at the selected result so you can be looking at selective columns i could have done this filter but filter will always look for a condition say your mileage is greater than 20 or might be your cylinders are more than 4 or something else but when you do a select you are selecting specific columns so view always gives you all the columns head gives you highlight but then select
            • 302:30 - 303:00 can be useful when we are interested in looking at only specific data so this is how you can use dplyr for manipulation for your data transformation for basically filtering out the data by selecting particular data and then working on it so similarly there is one more package called tidyr and we'll see how we can use data manipulation done using your tidyr package let's
            • 303:00 - 303:30 learn about the tidyr package it makes it easy to tidy your data and this basically helps you creating a more cleaner data so which is easy to visualize and model now this comes with mainly four functions so you have gather which makes your data wide or it makes wide data longer so that is basically used to stack up multiple columns you have spread function which makes
            • 303:30 - 304:00 long data wider that is unstacking the data so if you would want to unstack the stacked data and you are talking about data which has same attributes and then your spread can spread the data across multiple columns you have separate which is function which splits single column into multiple columns and to complement that you have one more function which is unite and that
            • 304:00 - 304:30 combines multiple columns into single column so these are four main functions which are used in your tidyr package so let's look how we work with this so let me bring up my r studio here now for this first is let me just clean up my screen here doing a control l so i will install the package it is already installed but we can just do a control enter and then i can say do you want to restart r prior to install i'll say okay
            • 304:30 - 305:00 and it is basically going to get the package now it says package tidyr has been successfully unpacked let's use that package using our library function and that was built under r version 3.6 now i can basically start using these functions so for example here we are creating a data frame so let's say n is 10
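The column selection described above can be sketched as follows on `mtcars`, where mileage, horsepower and cylinders correspond to the `mpg`, `hp` and `cyl` columns. The name `df_mpg_hp_cyl` is an illustrative choice.

```r
library(dplyr)

df <- mtcars

# select picks columns by name; filter, by contrast, picks rows by condition
df_mpg_hp_cyl <- df %>% select(mpg, hp, cyl)

print(head(df_mpg_hp_cyl))
```

The result keeps all 32 rows of `mtcars` but only the three chosen columns.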
            • 305:00 - 305:30 and then we basically would say we will call it wide now that's the variable name i'm using the data.frame function i'm saying id which will be 1 to n so that will take the values from 1 to 10 and then these are the values which have 10 entries so this is a vector face 1 face 2 face 3 let's create a data frame out of it now that's done we can have a look at our data frame by just
            • 305:30 - 306:00 doing a view wide and that shows me the id column and it has face dot one face dot two and face dot three now we can use our function so for example we can work with gather that is reshaping the data from wide format to long format and basically you can say stacking up multiple columns so let's see how we do that here i'll call it long i'm working on wide i am using the piping functionality and then i am using gather
            • 306:00 - 306:30 so this one i will say what will be the data which i will use so we are using wide as a data frame then i am saying response time so that will be basically one more column and then you have your columns which you would want to basically stack so i'm saying from face one to face three so let's do this and once this is done let's have a look at our variable long so this one shows me that i have an id column
            • 306:30 - 307:00 i have the response time column and i have the face column which we mentioned and that basically has all the values stacked in so you have face dot one face dot two and face dot three so all the columns are being stacked here so all my data so now i have totally 30 entries in this one so this is basically using your gather function now sometimes we may want to use a separate function now separate function is basically splitting a single
            • 307:00 - 307:30 column into multiple columns so which we would want to use when multiple variables are captured in a single variable column okay so let's look at an example of this one so let's say long separate that's what we will call we will work on this long which has all the data stacked in as the columns we selected then i am saying separate i want the face column and then i would say when i separate the columns what are my column names now i could also give a
            • 307:30 - 308:00 separator by giving a comma and then mentioning the separator if that is required so let's do this now once this is done let's have a look at our long separate so what we see here is the column which we used so we were doing a face column and that was to be split and we wanted to split it into target and number so that's what we see here so you have face being split into target and number and then you have the
            • 308:00 - 308:30 response time so this is how you use the separate function now there is also something called as unite function which is basically a complementing of separate function so it takes multiple columns and combines the elements to a single column so for example here we will call it long unite and we will take long separate which was separating the data we want to unite so we will take face target number
            • 308:30 - 309:00 and we want to have a separator between them so let's basically do this and now let's look at the result of this unite so you see you have the face and target merge together so you have face dot one the separator is dot as we have mentioned and we have united multiple columns so this is one more function of your tidyr which helps you basically tidy up your data or put it in a
            • 309:00 - 309:30 particular way now then you have your spread function and this is basically for unstacking so that is if you would want to convert stacked data or if you would want to unstack the data which is of same attributes spread can be used so that you can spread the data across multiple columns so it will take two columns say key and value and spread it into multiple columns so it makes long data wider so we can look at this one we will say long
            • 309:30 - 310:00 unite i'm using the piping i will use the spread function i'll work on the face column and response time and let's do this and then let's do a view on this so it tells me our data is back in the shape as it was in the beginning so these are four functions which are very helpful when we work with the tidyr package so let's learn about visualization
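The four tidyr steps walked through above (gather, separate, unite, spread) can be sketched end to end as below. The numeric values in `wide` are made up for illustration, and the capitalised column names (`ID`, `Face.1`, `ResponseTime`, `Target`, `Number`) are assumptions based on the names spoken in the video.

```r
library(tidyr)
library(dplyr)   # loaded for the %>% pipe

n <- 10
# Wide data: one row per ID, one column per face (values are made up)
wide <- data.frame(
  ID     = 1:n,
  Face.1 = c(411, 723, 325, 456, 579, 612, 709, 513, 527, 379),
  Face.2 = c(123, 300, 400, 500, 600, 654, 789, 906, 413, 567),
  Face.3 = c(1457, 1000, 569, 896, 956, 2345, 780, 599, 1023, 678)
)

# gather: stack Face.1..Face.3 into a key column (Face) and a value column
long <- wide %>% gather(Face, ResponseTime, Face.1:Face.3)

# separate: split "Face.1" into "Face" and "1" at the non-alphanumeric character
long_separate <- long %>% separate(Face, c("Target", "Number"))

# unite: glue Target and Number back together with "." as the separator
long_unite <- long_separate %>% unite(Face, Target, Number, sep = ".")

# spread: unstack the key/value pair back into one column per face
back_to_wide <- long_unite %>% spread(Face, ResponseTime)

print(head(long))
print(back_to_wide)
```

In current tidyr releases `gather` and `spread` still work but are superseded by `pivot_longer` and `pivot_wider`.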
            • 310:00 - 310:30 and here we will learn about r which can be used for your visualization now one thing which we need to understand is because of our ability to see patterns which is highly developed we can understand the data better if we can visualize it so the efficient way or effective way to understand what is in our data or what we have understood in our data we should
            • 310:30 - 311:00 or we can use graphical displays that is your data visualization so there are actually two types of data visualizations so you have exploratory data visualization which helps us to understand the data and then you have explanatory visualization which helps us to share our understanding with others so when you talk about r r provides various tools and packages to create data visualizations
            • 311:00 - 311:30 and which can be used for both kinds of visualizations so when you talk about exploratory data visualization the key is to keep all the potentially relevant details together now the objective when we talk about exploratory data analysis is to help you see what is in your data and the main question is how much detail can we interpret now when you talk about different
            • 311:30 - 312:00 functions which we see here such as plot which is more for generic plotting you have barplot which is used to plot data using rectangular bars so you can say creating bar charts you have the hist function to create histograms where you look at the frequency of the data or basically look at the central tendency of the data you have boxplot which is used to represent
            • 312:00 - 312:30 data in the form of quartiles you have ggplot2 which is a package which enables the user to create sophisticated visualizations with little code using the grammar of graphics and then you have plotly which creates interactive web-based graphs via the open source javascript graphing library now before we see some examples here when you talk about plotting let's also
            • 312:30 - 313:00 try to understand what kind of plots you can have and what kind of techniques you have so let me open up my r studio here now for example i can pull out a particular data set and let's look at this one so here i can look at all the panes and that shows me the information now what i can do is i can install and get the inbuilt data sets and then i
            • 313:00 - 313:30 can simply do a plot wherein i am doing a plot on the ChickWeight data set so let's see what does that show it summarizes the relationship between four variables in the ChickWeight data frame which is in r's built-in datasets package now from these plots we can see for example weight varies systematically over time you can also see that chicks were
            • 313:30 - 314:00 assigned to four different diets now when we talk about explanatory data analysis or visualization that shows others what we found in the data this means we need to make some editorial decisions what features we would want to highlight for emphasis what features are distracting or confusing and you want them to be eliminated right so there are different ways of doing it now when you talk about your graphics or visualizations you have
            • 314:00 - 314:30 i would say three different types or you can say four so you have the base graphics which is easiest to learn now here we are having an example of base graphics where i can get a data set using library then i can simply use the plot function to generate a simple scatter plot of calories against sugars from the UScereal data frame in the MASS
            • 314:30 - 315:00 package and then i can give it a title so this is basically a simple example of base graphics now you also have what we call as grid graphics which is a powerful set of modules for building other tools now you also have lattice graphics which is a general purpose system based on grid graphics and then you have your ggplot2 which implements the grammar of graphics and is based on grid graphics so you have different ways now here since i already have used library and i have the
            • 315:00 - 315:30 data set i can assign the sugar related values to x and calories related values to y then i can use the library function once more and call in grid now i can basically use functions such as pushViewport if i would want to create a plot using grid graphics to create the similar kind of plot which we created using base graphics but this
            • 315:30 - 316:00 will give you much more power than base graphics it will have a steep learning curve but it is useful so i can do this where i'm saying pushViewport then i can basically say i would want to have a dataViewport and i would use different functions of your grid package so i'm drawing a rectangle you have x axis y axis given some points here and then basically you can add details
            • 316:00 - 316:30 to the graph by giving the names to the columns and you can basically create a simple grid graphics based plot here now there are different other options which we can use to create plots now before we go into understanding how you create plots let me just give you a brief on what are the different kind of plots and how they can be used so here we will look at these different plots now for example
            • 316:30 - 317:00 we have a bar chart which is a graph which shows comparisons across discrete categories so you have x-axis which will show the categories being compared and y-axis which represents a measured value and the heights of the bars are proportional to measured values now to create different kinds of charts you can use ggplot2 which is a package for creating graphs in r it is basically a method of thinking
            • 317:00 - 317:30 about and decomposing complex graphs into logical subunits and that is a part of the tidyverse ecosystem so it takes each component of a graph such as axes scales colors and objects and you can build graphs on particular data you can modify each of those components in a way that's more flexible and user friendly and if you are not providing details for the components then ggplot2 will use sensible
            • 317:30 - 318:00 defaults and this basically makes it a powerful and flexible tool now here are different options when you use your ggplot2 such as you can use geoms or what we call as geometric objects to form the basis of different types of graphs you have geom_bar for bar charts geom_line for line graphs geom_point for scatter plots geom_boxplot for box plots geom_quantile for continuous x geom_violin for richer display
            • 318:00 - 318:30 of distribution and geom_jitter for small data so here is a simple example where i would not go into too many details but you can just have a look at this one where we are using the library function to get the ggplot2 package then basically we would want to look into the mileage data we would want to look at the structure of it and then we can basically get the tidyverse package finally we can create a
            • 318:30 - 319:00 bar chart using geom_bar and we can basically also mention what would be on the x-axis now you can also give different colors to basically add more meaning to your data you could also go for stacked bar charts so here we are actually telling ggplot to map the data in the drv column to the fill aesthetic so here i am giving the aesthetic x as class and i am saying what is the data we need
            • 319:00 - 319:30 to have and then we are using geom_bar so you can also have dodged bars in your ggplot that is bars which are not stacked but placed next to each other and you can create that by giving your position as position_dodge okay now you can obviously use your different packages which are inbuilt and you can create your bar charts and you have other kinds of graphs such as line
            • 319:30 - 320:00 graph which is basically a type of graph that displays information as a series of data points connected by straight line segments such as this one and for this one we are using if you see geom_line now you can also create a scatter plot which is a two dimensional data visualization that uses points to graph the values of two different variables one on the x axis one on the y axis like what we saw in base graphics
            • 320:00 - 320:30 example and they are mainly used if you would want to assess the relationship or lack of relationship between two variables and you also have histogram which i mentioned is mainly to look at the distribution of a data to look at the central tendency of the data basically looking at your large amount of data for a single variable you would be interested in saying where is
            • 320:30 - 321:00 more data found in terms of frequency where is lesser data found in the graph how close the data is towards its mid point or what we call as mean median mode so you can use a histogram where you can categorize the data in what we call as bins so these are some basics on different kinds of graphs now we can look at some examples and see how that works so what we were seeing is some quick examples of base graphics or grid graphics now here
            • 321:00 - 321:30 let's do an example of pie chart for different products and units sold so you want to create a graph for this first let's create a vector and pass in the value here now i can also create labels which i would want to assign to these values and then basically i can plot the chart by calling pie so that's the kind of chart which i would want to create and i would say the data would be x and labels so let's do this and that shows me a
            • 321:30 - 322:00 simple pie chart now i can also give main details here so instead of just doing pie(x, labels) i can say what is the main and then what kind of coloring it should follow so this is the way you can create a simple plot now i can also find out what is the percentage and then basically i would be interested in plotting the pie chart which takes x which takes the labels which will be the
            • 322:00 - 322:30 percentage which we are calculating here by doing a round function and then you can basically give details to your graph you can say what color it follows you can basically look at the legend where it needs to be in your chart what are the values and then basically fill up the colors so let's run this one and that shows me the percentage which was calculated and it gives me the details and we can always have a look at our
            • 322:30 - 323:00 plot now if you would want to go for a 3d pie chart then you can get the package which is plotrix let's use that by calling in the library function let's pass in some data to x and let's give some values or labels which will give more meaning to the data and then let's plot the 3d graph so i'm saying pie3D here where i'm using x and
            • 323:00 - 323:30 labels then i'm basically using explode which will control how your graph looks and basically give the values so it also takes the title when you say main as pie chart of countries now let's create data for a graph so again we are having a variable here we are using the c function to create a vector and then let's create a histogram for this one where i would say xlab what would be
            • 323:30 - 324:00 your data around x-axis what is the color what is the border and here i am creating a simple histogram which as i discussed earlier will always show your values on the x-axis and y-axis is more of frequency and then you can look at the set of values and what is their frequency and we can basically use this histogram for exploratory data analysis look at the data try to understand what is the central tendency of your data values
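
A minimal sketch of the histogram step just described, using base graphics. The values in v are made up because the vector used in the video is not shown in the transcript, and the axis label, colors and title are likewise only illustrative.

```r
# Hypothetical values; the actual vector from the session is not known.
v <- c(9, 13, 21, 8, 36, 22, 12, 41, 31, 33, 19)

# hist() bins the values along the x-axis and draws the count per bin
# (the frequency) on the y-axis.
h <- hist(v, xlab = "Value", col = "lightblue", border = "black",
          main = "Histogram of v")

# The returned object holds the bin edges and the per-bin frequencies.
print(h$breaks)
print(h$counts)
```

Looking at h$counts next to the picture is a quick way to see where most of the values fall.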
            • 324:00 - 324:30 now we can also give some limits by using xlim and ylim and then i can also specify what is the limit so we have given some values here wherein we have said your x limit is 0 to 40 and y limit is 0 to 5 now if you compare this with the previous one which we had created that one had taken the limits based on the frequency but we can assign limits
            • 324:30 - 325:00 explicitly by giving this and then create a histogram which is more meaningful now let's take another data set that is airquality let's view this to see what does that data contain so you have ozone solar wind temperature month and the day so this is the kind of information we have in airquality now let's use the plot function to draw a scatter plot where as i mentioned you would be interested in analyzing variables and see
            • 325:00 - 325:30 what is the relationship between them so to plot a graph between ozone and wind values we will say plot we will say the data which is airquality from that i would be interested in the ozone column or ozone field and the wind field i can create a plot based on this now i can also say what should be the color what is the type of plot which you would want to create and you can look at the information so you
            • 325:30 - 326:00 can create a histogram you can create a scatter plot to basically understand the data better and then infer some information from that data so let's take the airquality data set itself without specifying any particular column and you can create a plot which shows me all the different values which you have in the data and it basically shows you the difference this is more of an example like what we did for ChickWeight
            • 326:00 - 326:30 where we did a base graphics now you can assign labels to the plot so that is when you are creating a plot you can say airquality you will say ozone and then the label that's your ozone concentration you have your ylab which is the number of instances you have what is the title ozone levels in new york city what is the color so these are the details what we have given with our plot function and let's look at the data so it just tells me that this
            • 326:30 - 327:00 is the ozone concentration the number of instances what you have and you are looking at the data now we could also create a histogram by picking up a particular column such as Solar.R from airquality and that basically shows me the frequency of solar values and we can then try to find out what is the mid what is the mean what is the standard deviation and so on you can also look at your histogram and try to understand if
            • 327:00 - 327:30 it is left skewed or right skewed so we can do that now here let's get the temperature out from this particular data set let's create a histogram on temperature and that basically shows me the frequency of the temperature values and what values have the most frequency or most occurrence now you can create a histogram with labels so let's do that with the limit
            • 327:30 - 328:00 and then let's also use text to basically give the values which also takes the values and for each set of frequency or each set of values it gives me the labels now you can have a histogram with non-uniform width so you could do that by calling the hist function and then passing in your temperature you can say what will be the main that is the title what will be your xlab
            • 328:00 - 328:30 what is the limit on the x axis what is the color what is the border what are the breaks you would want to have for your bars and you can simply create a histogram using this so this basically takes the breaks which we have given such as 55 to 60 60 to 70 70 to 75 and so on so this is basically creating a histogram with non-uniform width and it purely depends on the kind of
            • 328:30 - 329:00 values what you have now you can also create a box plot which sometimes helps us in understanding the data quartiles and also understanding the outliers so you can create multiple box plots based on the data from airquality so we'll select all the data and then we'll do some slicing on the data so let's create a box plot which tells me the values and if you look at these points here like single dots these are basically your outliers
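
The box plot step could look roughly like the sketch below. The exact slicing used in the video is not in the transcript, so selecting the first four columns of airquality is an assumption made for illustration.

```r
# airquality ships with base R in the datasets package.
# Assumed slice: the first four columns (Ozone, Solar.R, Wind, Temp).
bp <- boxplot(airquality[, 1:4],
              main = "Box plots of air quality measurements")

# Points drawn beyond the whiskers are the outliers;
# boxplot() also returns them in the $out element.
print(bp$out)
```

Each box shows the quartiles of one column, and the single dots above or below a box correspond to the values listed in bp$out.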
            • 329:00 - 329:30 we can learn about that more in later sections so you can use your ggplot2 library to analyze a particular data set so for that we will first use install.packages and get ggplot2 so it says do you want to restart r and i can say yes so let it get the package i think the package was already there and now let's look at using ggplot2 so for that
            • 329:30 - 330:00 i have the library function and let's do an attach where i'm getting a data set which is mtcars now then i will create a variable p1 i will use ggplot i will pass in my data i'll give the aesthetics that is the columns which you would be interested in and then you are using geom_boxplot to basically create a plot which gives me the box plot for the values here and
            • 330:00 - 330:30 this is based on the cylinders which is there in your data so we can always look at what does our data contain and what kind of values or features are available in the data now let's create a box plot we will also use the coord_flip function and that basically gives me the same plot with the coordinates changed now if you look at the previous one where we created a plot we had mileage on the
            • 330:30 - 331:00 y-axis and cylinders on the x-axis now i did a coord_flip and that's like your transpose function so you have created the box plot but you have just flipped the coordinates you can create a box plot and then say fill which is the factor of cylinder so that can be used to fill up the values in your box plot now what we can also do is we can create
            • 331:00 - 331:30 factors so we have learnt about factors earlier which are usually used to work on categorical variables so here let's create factors from mtcars that is gear am and cyl and if you look at the factors which we have created we have passed our data what is the field or the column we are interested in what are the levels of values there and what are the labels for those values right so we have learnt
            • 331:30 - 332:00 about factors you can always look into the previous section and learn more about factors now let's create a scatter plot by using the ggplot function again we will use the data as mtcars i will go for mapping option and then i will give my aesthetics that is what would be x what would be your y and you also would want to say what kind of function you are using so let's go for geom_point and that basically helps
            • 332:00 - 332:30 me in creating a scatter plot now you can create a scatter plot by factors so here we will say ggplot so notice in all of these cases depending on the kind of data you have depending on the kind of plot you are interested in you will use ggplot and then basically a function with that or the inbuilt package so here i am saying data is mtcars i am going for mapping which basically will take the
            • 332:30 - 333:00 values for your x and y what is the color and the coloring will be done based on the factor values now if you remember factors will obviously have some levels and those levels will basically help you in differentiating between your categorical variables so i'm saying as.factor on cyl and then i'm using geom_point to basically create this scatter plot so let's do this and i can look at the values of this one so it
            • 333:00 - 333:30 says there is an error which says must request at least one colour from a hue palette so let's look at that one so the error which we were facing when we gave color as the factor values was because when you look at these factors which were created with some labels if we look at the values of these it tells me there are NA values in that particular column similarly your gear or similarly you can
            • 333:30 - 334:00 look at the complete data set it tells me cyl you have am you have gear now we have created some labels but these have NA values so what we can do is we can create a scatter plot as we did earlier by giving the aesthetics and that's a simple scatter plot wherein i'm also using geom_point so that i can have these points with defaults you can also give a specific color basically if you
            • 334:00 - 334:30 would want to have different kind of data in the same plot or i can create scatter plots by different sizes by giving a size or i can give a color and size and that's again one way in which you can create your scatter plots now let's also see how you can visualize one more data set which is mpg so i can also do it in this way where i say ggplot2::mpg and look at the data set
            • 334:30 - 335:00 what we have here you can just do a view on this to see what my data contains if the fields have any NA values if that's going to affect your plotting so now what we can do is we can create a bar plot or a bar chart so i am saying ggplot the data would be as we have given in previous lines that is ggplot2::mpg then i will say what should be in my aesthetics and what kind of
            • 335:00 - 335:30 chart are you going to create so i'm saying geom_bar so that's my bar chart and that has basically your class and count now you can create a stacked bar chart where your information is stacked in the same bars and we are still using the same data we are going for aesthetics which is class and then when you say geom_bar which creates your stacked bar we will use fill which is drv and we can always go back and look at
            • 335:30 - 336:00 our data for example you can always look into this so you have the drv column here and you are also working on this complete data set so let's go ahead and create a stacked bar chart and that basically gives me the information where you have the drive information which is stacked here now you can do a dodge by giving the position as dodge so we are still going to go for a stack chart but this time the bars will be
            • 336:00 - 336:30 next to each other and that can also be done which is very useful you can use this by using geom_point where you are mapping and you are specifying what are your aesthetics so we were creating a scatter plot now you can also use or give more details where you can say color can be based on the class and we have different classes and based on that my points have been colored now you can also use the
            • 336:30 - 337:00 plotly library so let's install this one i will say yes for example let it basically restart so that all my packages are updated then i can access that package using library function and then create a variable to which you are assigning your plot_ly plot so data is mtcars what will be your x-axis what will be your y-axis and details on your marker which we have given
            • 337:00 - 337:30 wherein i will give a list which is size color which is a combination and then you have your line what kind of color it will have and what will be the width so this is where i am going to use plot_ly and let's look at this plot so it basically gives me some information now we see some warnings which are getting generated but you don't need to worry about that so you can look at the packages what you
            • 337:30 - 338:00 have and what options you are using so similarly we can create one more plot using plot_ly and look at the values of those so that's a plot with a trend which tells me about my data so this is a simple small tutorial on understanding how you can use your graphics or visualization to understand your data obviously there are many more examples many more options which you can pass
            • 338:00 - 338:30 into your plot functions or your ggplot and the inbuilt packages which are available in r for your visualization now that could be for exploratory data analysis or explanatory data analysis so try these graphs and see if you can change these options and try or create new visualizations now let's do a hands-on project to perform a
            • 338:30 - 339:00 time series analysis using r programming in this project we'll be using time series energy data to explore the variations in electricity demand and renewable energy supply over time over to ajay now welcome to this session where we will learn about time series analysis using the r programming language so this is basically a mini project where we will look at time series data and how we can analyze it visualize it to basically find some
            • 339:00 - 339:30 important information or gather insights from the data now when you talk about time series analysis time series is basically any data set where your values are measured at different points in time so when you talk about time series data data is usually uniformly spaced at a specific frequency for example hourly weather measurements you have daily counts of website visits monthly sales total and so on so when you talk about
            • 339:30 - 340:00 time series that can also be irregularly spaced and sporadic for example time stamped data in a computer system's event log or a history of 911 emergency calls now when we work with time series data for example here i am taking an energy data set we can see how techniques such as time based indexing resampling rolling windows can help us explore variations in electricity demand and
            • 340:00 - 340:30 renewable energy supply over time now here we will look at some aspects of this data set which i am considering so this is the open power system data data set and here is the data set i have we can look at the data set now this is in a simple format it has time it basically has values for consumption and then you have data for wind and solar and wind plus solar so in certain cases you have only the date and the
            • 340:30 - 341:00 consumption but then if we scroll down we will also find data for wind solar wind plus solar and so on so this is a time series data set which we would want to work on sometimes you may also have data collected which does not just have the date but may also have a time stamp that is it would have say hour minutes and seconds and that can also be worked upon so let's consider
            • 341:00 - 341:30 this data set and let's work on this project where we will analyze this time series data set now here we can work on this time series data we can basically create some data structures out of it such as data frames we can do some time based indexing we can visualize the data we can look at the seasonality in the data look at some frequencies and also do some trend detection now when you talk about this data set it
            • 341:30 - 342:00 has electricity production and consumption which is reported as daily totals in gigawatt hours and here are the columns of the data which i was just showing you so you have date you have consumption you have wind you have solar and wind plus solar so this is the data we have and we will basically explore say how electricity consumption and production in germany have varied over time so some of the questions which we can answer here is when is
            • 342:00 - 342:30 electricity consumption typically highest and lowest how do wind and solar power production vary with seasons of the year what are the long-term trends in electricity consumption solar power and wind power how do wind and solar power production compare with electricity consumption and how has this ratio changed over time we can also do wrangling or cleaning of this data or pre-processing of data and create a data frame and then we can
            • 342:30 - 343:00 visualize this now let's see how do we do that so i will open up my r studio and let's look at the data set so here is the data set now i'm picking it up from my machine you can also pick it up from github so all the data sets or similar data sets can be found in my github repository and here i can look in the data sets you will find a lot of different data sets here there are some time series data sets such as
            • 343:00 - 343:30 power i can search for power or you have basically coal or you have this opsd germany daily data set and there are many other data sets which you can work on now to get the documentation on this project you can also look in my github repository and you can search for repositories and then basically you can look in data science and r
            • 343:30 - 344:00 and here there is a project folder where i have given the documentation sample data set and also your time series analysis related document this is also the code which you can directly import in your r studio and you can practice or work on this project so let's see how does that work so first thing is we will create a data frame from this data set now here if you see i am using header as true so that it understands the heading of each column i
            • 344:00 - 344:30 am also giving row.names and i am specifying date so there is this date column in the data set as i showed you earlier let's look at it again so you have date consumption wind solar wind plus solar so you can specify that date should become the index column which can be useful so you can do this now let's just create this let's look at what does this data frame contain
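
A self-contained sketch of the read step, assuming the file is a plain CSV with the column names mentioned above. The temporary file and its three rows of values are made up for illustration; in the session the actual data set is read from disk.

```r
# Write a tiny stand-in CSV so the example runs on its own.
# Column names follow the transcript; the values are invented.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Date,Consumption,Wind,Solar,Wind+Solar",
  "2006-01-01,1000.5,NA,NA,NA",
  "2006-01-02,1300.2,NA,NA,NA",
  "2006-01-03,1400.8,NA,NA,NA"
), csv_path)

# header = TRUE reads the first line as column names.
# row.names = "Date" uses that column as row labels (the "index")
# instead of keeping it as a regular column.
mydata <- read.csv(csv_path, header = TRUE, row.names = "Date")

print(mydata)
print(dim(mydata))
```

Note that read.csv rewrites the header Wind+Solar as Wind.Solar, which is why the column shows up with a dot later on, and that mydata$Date is NULL because the dates now live in row.names(mydata).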
            • 344:30 - 345:00 and here if you see it shows me some data which has been now as a part of this data frame structure it starts with consumption wind solar wind plus solar and if you see this one is becoming my index column so i can always do a head and look at part of the data frame using head or tail so look at the first records so let's see this now that shows me the head data i can also
            • 345:00 - 345:30 do a tail and look at the ending values so if you closely see here we have wind solar wind dot solar and that basically has NA values so there are missing values but let's look at the tail and that tells me that there is some data available for wind and solar and wind solar now we can always look in a tabular format using view and we can look at the data so this shows me that there are values in these
            • 345:30 - 346:00 columns we see NA values but if i really scroll down i can see some values which would be available for wind and solar and wind solar so i can just use view now i can look at the dimensions of this particular object and that tells me there are 4384 rows and four columns you can always look at the structure that is check the data type of each column which can be
            • 346:00 - 346:30 very useful so if i see here i don't see the date column because date column was considered as an index which can be useful but i also look at my other columns they are of the num types so that's the data type for each attribute or each column here now we would be interested in looking at this date column so let's look at the data type of this date column now if i try to do this this will show me that this is null because date as a
            • 346:30 - 347:00 column does not exist because we created it as an index so if i look at row names and then i search for my data show me the index column or row.names it tells me these are the values that's the date column which we are seeing here now we can access a specific row by just doing a my data and give the index value or row name value so let's look at that and that shows me based on this index you
            • 347:00 - 347:30 are looking at the value you can obviously search for a different date something like this you can also pass in a vector and you can give a range of values that is from the 1st of january 2006 to the 4th of january and we can look at this one so it shows me these are the values so here actually i'm not giving a range but i'm just selecting multiple values from row.names now we already know that in r you have a
            • 347:30 - 348:00 summary function so you can always do a summary and that gives you for each column it gives you minimum first quartile median mean third quartile and maximum values so we are looking at consumption we are looking at wind solar and wind dot solar now this is good but then if i would want to really visualize the data access the data do some analysis then it would
            • 348:00 - 348:30 be good to take all the columns and then we can later decide to change the data type of say date column if we want to use it so earlier i was using date as row.names or the name of the rows or index what you call in any other programming language so here i will just use my data set and i'll say header is true i'm calling it mydata2 let's look at the data and this one shows me
            • 348:30 - 349:00 five columns where in my first column is the date consumption wind solar and so on now looking at the structure so let's look at the data type so it tells me that if now i'm interested in looking at the date column from my data to data frame it tells me it is a factor with four 384 levels and these are the values so it is not in a date time format it is a
            • 349:00 - 349:30 factor now what we can do is we can convert this into a date format how do we do that so let's have a variable x and i'm going to use the as.Date function and i'm going to pass in my date column so that's assigned to x now let's look at the head of x and it shows me the values we will also see what kind of class it is and we will look at the structure of x so class already says it is date type and look at the structure so it shows me the format
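The conversion described above can be sketched in a few lines of base R. The three dates below are a made-up stand-in for the Date column of mydata2, and date_col is a hypothetical name used only for this illustration. Note that from R 4.0.0 onwards read.csv keeps text columns as character rather than factor by default, so on a recent R the column may not be a factor at all; the conversion is the same either way.

```r
# Made-up stand-in for the Date column of mydata2 (the real values come from a CSV file).
date_col <- factor(c("2006-01-01", "2006-01-02", "2006-01-03"))

# as.Date() parses "YYYY-MM-DD" text by default. Going through as.character()
# makes it explicit that the factor labels, not the integer codes, are parsed.
x <- as.Date(as.character(date_col))

head(x)    # the parsed dates
class(x)   # "Date"
str(x)     # compact view of the Date vector
```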
            • 349:30 - 350:00 now we have converted this column or column related value into x now how do i basically extract values out of it or make it a part of data frame so once it has been converted into date format i will go for as.numeric and here i will create a variable called year and i will just do a format on x which is basically of date type and then
            • 350:00 - 350:30 i am saying %Y so that will get me the year component out of this let's look at the values that shows me year component now similarly we can get the month out of this and then basically look at the month values we can get the day out of it and we can get the day component now if i look at mydata2 which we had created earlier this basically had date consumption wind solar and wind plus solar so what i can do is i can add these
            • 350:30 - 351:00 extracted columns such as year month day to my data frame using cbind that is column bind and i will assign it to mydata2 again so let's do this and now if you look at head it shows me date so that should be date format consumption now this one might not be date format but we'll see you have consumption wind solar and we have extracted the year month and day which can help us for group by we can do some
            • 351:00 - 351:30 aggregations we can do a plotting and we can do various things with these additional columns now let's look at first three rows here so i'll say 1:3 for mydata2 and that shows me some data here you can always do a head and look at the sample of data so that basically shows me the year month and day columns and then you have your date now what we can do is we would want to visualize this data we would want to
            • 351:30 - 352:00 basically understand the consumption now as i said if we want to visualize the data say for example i want this which is consumption of data over years and this one is in terms of gigawatt hours as we were mentioning here so if i would want to create this visual to basically understand the pattern of the data how do we do it so we can create a line plot of full
            • 352:00 - 352:30 time series of germany's electricity consumption using the plot method now how do we do that so here one of the option is i can straight away use the plot method i can then say what would be in my x-axis what would be on my y-axis what would be the type of graph i would want to plot what is my name on x-axis y-axis and this is the simplest way so i am saying mydata2 i am
            • 352:30 - 353:00 extracting the year column and here i am taking the consumption so let's create a plot and here if you see we are looking at a plot we do see some tick times and we see that the data has been divided with every two years so from 2006 onwards to 2016 but then really this data does not give me uh you know a very useful way of looking at the data or understanding it might be
            • 353:00 - 353:30 what i can do is i can use the same way but i can give apart from x axis and y axis i can say the limits that is x limit is 2006 to 2018 and y limit is from 800 to 1700 so we can do this and let's look at this again this is a plot but it really does not help me in visualizing and understanding the data so what are the better options i can go for multiple plots in a window
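A minimal sketch of the year-versus-consumption plot described above, using base R only. The data frame is synthetic (a made-up seasonal curve plus noise standing in for the real CSV), and type = "l" is an assumption about the plot type used in the video. The sketch also shows why this plot is hard to read: every day of a year shares the same x value, so the line collapses into one vertical stroke per year.

```r
set.seed(1)

# Synthetic stand-in for mydata2: daily values from 2006 to 2017 (not the real data).
dates <- seq(as.Date("2006-01-01"), as.Date("2017-12-31"), by = "day")
day_of_year <- as.numeric(format(dates, "%j"))
mydata2 <- data.frame(
  Date = dates,
  Consumption = 1300 + 150 * cos(2 * pi * day_of_year / 365) +
    rnorm(length(dates), sd = 60)
)

# Year component extracted from the date, as described earlier in the transcript.
mydata2$year <- as.numeric(format(mydata2$Date, "%Y"))

# Only 12 distinct x values exist, so the result is one vertical stroke per year.
plot(mydata2$year, mydata2$Consumption,
     type = "l",
     xlab = "Year", ylab = "Consumption (GWh)",
     xlim = c(2006, 2018), ylim = c(800, 1700))
```

Plotting against the full date instead of the year gives each observation its own x position, which is what the later plots in the transcript move towards.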
            • 353:30 - 354:00 as of now we are just sticking to one plot in window so if you would want to have multiple plots you can always change the value here and make it two or three that will say how many rows and how many columns as of now we will just keep it as it is with par(mfrow) now if i would want to plot i can straight away give the column name so i am interested in getting the consumption now i can just do a plot i'll say
            • 354:00 - 354:30 mydata2 and i will choose the second column which is consumption which we saw here from our data so consumption was the second column so i can just do a plot in a straightaway way without mentioning your x-axis y-axis limits and so on and if you look at this this one is giving me a pattern now here i am looking at x-axis y-axis which is not really named
            • 354:30 - 355:00 we do not have a name to this graph and we are looking at the data it does show me some kind of pattern but might be we can make it more meaningful so i can do it this way where i say mydata2 second column let's give the x axis label as year and the y axis label as consumption now that has changed the x axis and y axis labels now i can also give some more details i can say type should be line i have the line width i'm saying color
            • 355:00 - 355:30 is blue and let's do this so this looks more meaningful might be shows a wavering pattern of consumption over years i can also give a limit of x that is 0 to 2018 and that basically shows me the range now we can change that and we can be more specific and saying x limit should be 2006 to 2018 and let's look at this now this one once you have given a proper limit it shows
            • 355:30 - 356:00 the line graph and it shows what was the consumption in 2006 and over a period till 2018. i can then use any of these options are fine but it depends on what and whom you are presenting the data or what kind of analysis you are doing so i can do a plot i can choose column second x lab which is x axis y axis type is line width giving x limit y limit and then i'm giving a title to
            • 356:00 - 356:30 this which is consumption graph and then basically you are looking at the line graph now those are the options which you can do either you could be very specific or you could just give your column which you want to plot or obviously make it more meaningful by giving all the details now what we can do is if we would want to look at this data and understand it better rather than just looking at a simple line i can take the log values so here i
            • 356:30 - 357:00 am saying log of mydata2 second column so i'm taking log values of consumption and i'm taking the difference of logs so i can say diff and then you can basically scale this up or down by multiplying it by some number so rest remains the same i'm changing the color and let's look at this plot and you see this basically is giving me a more meaningful pattern here we see the differences of the log values so this is you are using a simple
            • 357:00 - 357:30 plot function in r you can also use ggplot now for that we can install the ggplot2 package it's already there on my machine so i'll say no i will access this by using library ggplot2 and now i can use ggplot to plot so the way you specify here you can say mydata2 that's the data frame i am saying type as o and when i'm saying
            • 357:30 - 358:00 line i am basically going to use x axis which is here y is consumption and let's look at this plot so again we are back to the one which we were doing earlier really does not make any sense gives us some data but then really does not give me enough information i can in my aesthetics i can say x as year y is consumption i can do a grouping and then i can give line and plot so again
            • 358:00 - 358:30 we have some information but really does not help me right now let's look at other example so i'm just doing the same thing here and i'm looking at line type being dashed i am using ggplot2's other functions such as geom_line and geom_point to give me more information and if i look at the plot it does give me data it tells me what are the different values it gives me some kind of pattern but i would still prefer the way we were doing with
            • 358:30 - 359:00 plot now we can change the color and obviously add details to it so what we see is when you use the plot method which i did earlier it was choosing pretty good tick locations that is every two years and labels the years for the x axis which was helpful right but with these data points which we were seeing here or say for example this one or say this one
            • 359:00 - 359:30 or say this one we are looking at some data but then that really is quite crowded and it is hard to read you can look at the values but then it really does not give you enough information so we can go for plot method but then we will see how we can consider different data now if i would want to plot the solar and wind time series so let's see how do we do that so wind column is what i'm interested in so
            • 359:30 - 360:00 first thing is it was always good to find out the minimum and the maximum values in every column so i'm saying minimum i'm saying let's put in here my data 2 and then let's look at the values so we are looking at the columns we know consumption is the second column wind is the third column and you have solar as the fourth and this one is the fifth so let's say let's find
            • 360:00 - 360:30 out the minimum of each of these columns which we would want to plot so let's say minimum of data third column and here i'm also saying remove the NA values because we do not want to consider the NA values so let's look at the minimum that shows me 5.7757 what is the maximum value it is 826 so that also helps me in giving a limit if i want to plot wind on y axis i can give a y limit from 5 to
            • 360:30 - 361:00 consumption wise let's find out the minimum from second column and maximum and similarly for solar find the minimum and maximum and wind plus solar minimum and maximum so this will be helpful when you would want to plot multiple graphs or give some limits so that's fine now for multiple plots as i said instead of having one plot let's plot consumption and wind and solar and try to see a pattern so i can say par
            • 361:00 - 361:30 function and i will say three rows and one column so now when i start plotting you will see you will have multiple plots in one single window so let's see how we do it so here let's look at plot one so this one is consumption as we did earlier and let's look at the data so that gives me some data you can always do a zoom and you can look at the data you can
            • 361:30 - 362:00 basically expand this graph or you can reduce this graph to see what kind of pattern we have in consumption similarly we can basically choose date being x-axis my consumption being y-axis right so this is being more specific because here we have a range but it really does not give me enough information so i will basically give x-axis y-axis i will give the name that
            • 362:00 - 362:30 is daily totals and then i will basically give consumption color and y limit based on my minimum and maximum limits so let's do this and now we can look at the data here so let's see this data makes a little more meaning because we are looking at the dates and let me do a zoom so it shows me all the dates it shows me the data points it shows me how the data
            • 362:30 - 363:00 pattern is changing for consumption now this is for consumption so what we can do is we can also extract specific data so if you see here i have done some testing where i am saying okay i would want to get a date specifically i would want to extract some value so we are looking at the date column but if you remember we did not change the data type in the data frame we only converted a copy of the
            • 363:00 - 363:30 date column we extracted year month out of it it would be good if we can convert a column into date time format and put that in our data frame now let's look at the plot2 this is mainly for your column which should be consumption and wind and solar so here i see it is solar data and i can plot this one to see how it looks like
            • 363:30 - 364:00 and that tells me from 2006 onwards we have some pattern i can be more specific where i say i would be giving date and then the column for solar x-axis y-axis what is the type what is the y limit and what is the color it is always good to specify your x and y axis given name rather than let it automatically pick up now this makes more meaning because it shows me some
            • 364:00 - 364:30 dates similarly we can do for wind so either you do it just by giving the column or you give your x and y axis so let's look at this one and this shows me the data so we can choose plot three this one we can choose plot two we can choose plot one and we can put all that data in one graph so that's when you are putting in multi plots in one particular graph you can always do a zoom
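The three stacked plots described above can be sketched as follows, in base R. All values are synthetic, the leading block of missing wind values is made up to mimic the gaps mentioned in the transcript, and the mar setting is an added assumption to keep three panels from running out of margin space.

```r
set.seed(2)

# Synthetic stand-in for mydata2 (not the real data).
dates <- seq(as.Date("2006-01-01"), as.Date("2017-12-31"), by = "day")
doy <- as.numeric(format(dates, "%j"))
mydata2 <- data.frame(
  Date = dates,
  Consumption = 1300 + 150 * cos(2 * pi * doy / 365) + rnorm(length(dates), sd = 60),
  Wind = abs(150 + 100 * cos(2 * pi * doy / 365) + rnorm(length(dates), sd = 80)),
  Solar = pmax(0, 90 - 85 * cos(2 * pi * doy / 365) + rnorm(length(dates), sd = 20))
)
mydata2$Wind[1:365] <- NA   # made-up gap so that na.rm matters

# Without na.rm = TRUE, min() and max() return NA for a column with missing values.
wind_range <- range(mydata2$Wind, na.rm = TRUE)
solar_range <- range(mydata2$Solar, na.rm = TRUE)

# Three rows, one column; par() returns the old settings so they can be restored.
old_par <- par(mfrow = c(3, 1), mar = c(4, 4, 2, 1))
plot(mydata2$Date, mydata2$Consumption, type = "l", col = "blue",
     xlab = "Date", ylab = "Consumption", main = "Daily totals")
plot(mydata2$Date, mydata2$Solar, type = "l", col = "orange",
     xlab = "Date", ylab = "Solar", ylim = solar_range)
plot(mydata2$Date, mydata2$Wind, type = "l", col = "darkgreen",
     xlab = "Date", ylab = "Wind", ylim = wind_range)
par(old_par)   # back to one plot per window
```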
            • 364:30 - 365:00 you can always look at the data right and this is usually useful to look at the pattern what kind of pattern we see what data we have and so on now moving forward so we have seen how you are creating these plots all in one window let me reset this back to one plot per window and let's basically plot time series in a single year so what we have seen is that when you look at the plot method it was quite crowded then we looked at solar and wind and if you compare that
            • 365:00 - 365:30 you will see your consumption pattern your solar pattern your wind pattern and basically we can see from this particular data some kind of pattern so electricity consumption is highest in the winter where we will see what is the consumption is it highest in winter or is it in summer we can see that by breaking a year further into months we can see that but
            • 365:30 - 366:00 we see a pattern which goes for every year or every two years being highest at a particular point of time and then it drops down so electricity consumption is highest in winter and that might be due to electrical heating and increased lighting usage and lowest in summer now when you look at it electricity consumption appears to split into two clusters we can always look at the consumption one with oscillations centered roughly around 1400 gigawatt hours
            • 366:00 - 366:30 so you can always look at 1400 gigawatt hours and you see all the values here which are around that particular consumption and another with fewer and more scattered data points centered roughly around 1150 so if you really expand this you can see you will have lot of data points at this point now we might guess that these clusters correspond with weekdays and weekends which we can see if you break that data into yearly monthly weekly and so on now
            • 366:30 - 367:00 if you look at solar production that is highest in summer when sunlight is most abundant and lowest in winter so obviously when you are gathering some insights when you are looking at the data you are also using your domain knowledge your business knowledge to understand how this goes if you look at wind power production that's highest in winter presumably due to stronger winds and more frequent storms and lowest in summer
            • 367:00 - 367:30 so there is some kind of increasing trend in wind power production over years which we can see here over the years and all the time series data what we are looking at is referring or showing us some kind of seasonality that is we are looking at seasonality in which a pattern is repeating again and again at regular times at regular intervals so if you look at consumption solar and wind time series
            • 367:30 - 368:00 that oscillates between high and low values on a yearly time scale which we can break down and see i'll show you that it corresponds with the seasonal changes in weather over the year so seasonality does not have to correspond with meteorological reasons for example if you look at retail sales sales data that will show you yearly seasonality with increased sales in particular months so seasonality when we say can occur on
            • 368:00 - 368:30 other time scales so the plots what we are seeing here they are fine but if you look at those plots they might show some kind of weekly seasonality also so in your consumption corresponding to weekdays and weekend so let's plot for one single year now how do i do that so first is i will look at mydata2 that shows me the structure it shows me date which is factor other columns which
            • 368:30 - 369:00 are all numerics now like we did earlier i'll repeat this step where i'm going to convert the date column into date type look at head of it look at class of it look at the structure of it right and then what i want to do is i want to add this to my data frame so i will create a variable called mod data and this one will use as.Date and i'm formatting
            • 369:00 - 369:30 the value of x which is date time into month day and year so let's do that and now you look at the mod data which i created like modified data so this is the format i have it is in date type if you carefully see here and then i can look at the head of it so it shows me mod data now what we did here is when i said mydata3 so for mydata3
            • 369:30 - 370:00 we did a cbind with mod data which is going to add this column to the other columns of mydata2 so my new data frame is mydata3 let's look at the structure of it and you see there is still this date column i can remove it or i can let it be right so that depends on our choice might be once our analysis is done we want to remove the mod data right so we can keep both of them now let's
            • 370:00 - 370:30 basically extract data for a particular year now how do you do that so this is some wrangling so i will say mydata4 let's call it mydata4 and i will use subset function so subset will work on mydata3 that's the data and what i'll do is i will do a subset how is the subset defined so i'll say take the mod data column the value should be greater than or equal to the first of january 2017 and should
            • 370:30 - 371:00 be less than 2017 december 31st so i'm getting data for one year and i'm storing it as mydata4 let's get the head of it and you see we are specifically looking at 2017 related data now let's do a plotting of this where i will only create a plot for one year so i'm saying mydata4 that's my new data what we got
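A small sketch of the one-year subset described above. The data frame and the column name mod_data are stand-ins with made-up values. One detail worth noting: a strict "less than 31 December" comparison would drop the last day of the year, so this sketch uses `<=` on the upper bound, which is an assumption about the intent rather than what the video necessarily runs.

```r
# Made-up stand-in for mydata3: a few days either side of 2017.
dates <- seq(as.Date("2016-12-25"), as.Date("2018-01-05"), by = "day")
mydata3 <- data.frame(
  mod_data = dates,
  Consumption = 1300 + seq_along(dates) %% 7   # placeholder values
)

# subset() evaluates the condition inside the data frame, so the column
# can be referred to by name. Both bounds are inclusive here.
mydata4 <- subset(
  mydata3,
  mod_data >= as.Date("2017-01-01") & mod_data <= as.Date("2017-12-31")
)

head(mydata4)
range(mydata4$mod_data)   # first and last date kept
nrow(mydata4)             # number of days kept
```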
            • 371:00 - 371:30 so here i am going to take the first column which is mod data i am going to take the third column which is consumption so i am looking at the date format for one year consumption values for it and then rest of the things as we have done earlier let's look at the plot and this makes more meaning right so when you look at this plot it tells me jan to jan it shows me some kind of pattern where i have
            • 371:30 - 372:00 divided the year into months right and it is broken down into say two months so jan and march and may and july and so on but we still see a pattern and that gives me good understanding of pattern where i've broken it down into months so this is where you have taken time series in a single year to investigate further and this is what we see right now we can clearly see there are some weekly
            • 372:00 - 372:30 oscillations and one more interesting feature is that at this level of granularity that is when you're looking at one year of data there is a drastic decrease in electricity consumption in early january and late december during the holidays so probably we can assume that this is holidays now i can zoom in further and look at just jan and feb data let's see how we do that and let's see how we work by zooming in the data
            • 372:30 - 373:00 further so to zoom in the data further let's see how we do it now here we have this data 4 which is basically having a subset right so let's work on this one so i will say my data 4 which earlier i was taking data 3 i was doing a subset and i was giving the date but this time i will make it more narrower so i'll say mydata4 i will say subset from mydata3
            • 373:00 - 373:30 and i will choose mod data column which we have modified with the date format i will choose the starting date as 2017 01 that is jan and then let's go till feb and let's create this now let's look at the head of this so it shows me we have the data which is jan and then you can basically look at more on this now again as i said earlier let's find
            • 373:30 - 374:00 out the minimum of this from the first column so that is basically your mod data so let's look into this one and that basically will give me minimum and maximum let's look at the values so this one tells me the minimum is january 1 2017 and maximum is feb 28 2017 so we are actually looking at two months data here
            • 374:00 - 374:30 let's look at the y minimum so this is i will look at column three now what is column three consumption so let's look at the minimum value for consumption maximum value of consumption let's look at the values which can be given as our limits now this is the minimum and maximum now let's do a plotting for this data which has been narrowed down for consumption based on my data so i'm saying my first column which is mod data and
            • 374:30 - 375:00 then third column which is consumption i'm giving some naming convention for sorry namings for your x-axis y-axis what is my consumption or what is my title here what is the color and then you see i'm using x limit to give the minimum and maximum limit and y limit so let's look at this data and if you look at this data it is specifically for two months and again i can look at the pattern here
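The two-month zoom described above could look roughly like this in base R. The data is synthetic, with a weekend dip built in purely so the plot has a visible weekly shape; mod_data, xmin, xmax, ymin and ymax are illustrative names, and the title text is made up.

```r
set.seed(6)

# Synthetic stand-in for mydata3 covering 2017 (weekend dip is built in on purpose).
dates <- seq(as.Date("2017-01-01"), as.Date("2017-12-31"), by = "day")
weekend <- format(dates, "%u") %in% c("6", "7")   # %u: 1 = Monday ... 7 = Sunday
mydata3 <- data.frame(
  mod_data = dates,
  Consumption = 1500 - 250 * weekend + rnorm(length(dates), sd = 40)
)

# Narrow the subset to January and February.
mydata4 <- subset(
  mydata3,
  mod_data >= as.Date("2017-01-01") & mod_data <= as.Date("2017-02-28")
)

# Limits taken from the data itself rather than typed in by hand.
xmin <- min(mydata4$mod_data)
xmax <- max(mydata4$mod_data)
ymin <- min(mydata4$Consumption)
ymax <- max(mydata4$Consumption)

plot(mydata4$mod_data, mydata4$Consumption,
     type = "l", col = "blue",
     xlab = "Date", ylab = "Consumption (GWh)",
     main = "Consumption, January to February 2017",
     xlim = c(xmin, xmax), ylim = c(ymin, ymax))
```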
            • 375:00 - 375:30 what i can also do is i can add some grid here so i can basically look at this data and make more meaning out of it so it is bi-weekly data you can see now i can add a line here using abline and then i can basically choose what lines i would want to add horizontally so that basically allows me to dissect the data and look at data in a more meaningful way i can also add vertical lines so vertical lines is
            • 375:30 - 376:00 i'm saying sequence will be minimum maximum and i'm saying an interval of seven so let's do this and this basically has added some lines every week and you can see at the end of week it is dropping and then it is starting again it peaks somewhere in the middle of the week and again it drops down so this is you are looking at your consumption data right now what we can also do is we can create some box
            • 376:00 - 376:30 plots so when we looked at zooming in data for jan and feb you can add some data points like this so consumption is highest on the weekdays as i showed you here and lowest on the weekends so this is what we are seeing when we are breaking the data or zooming it further for a couple of months so we have vertical grid lines and we have nicely formatted tick labels that is jan 1st and 15th feb first and so on so we can easily tell which days are weekdays and weekends with use of
            • 376:30 - 377:00 these grid lines and basically breaking it down so there are many other ways to actually visualize your time series data depending on what patterns you're trying to explore you can use scatter plots you can use heat maps you can just use histograms and so on now moving further we would want to explore the seasonality right so when you further explore the seasonality of our data we can use box plots basically to group
            • 377:00 - 377:30 the data by different time periods and display the distribution for each group now how do we do that let's come here and let's see how box plot works so i can just do a simple box plot and i can choose my consumption column and that gives me just the consumption data but this really does not give me any meaning i can look at solar data i can look at the wind data and we can also see some outliers here so we can create box plots but
            • 377:30 - 378:00 if we would want to do a box plot what is box plot it is basically a visual display of your five number summary that is you want to look at your minimum your 25th percentile your median which is the 50th percentile your 75th percentile and your maximum so we can use the quantile function on the consumption column and then you basically give a vector of probabilities which gives you the five number summary so that's your quantile and then
            • 378:00 - 378:30 let's do a box plot so if you are looking at quantile it tells me what is the minimum what is 25th percentile 50 75th 100 that's from my consumption column so let's create a box plot for consumption let's give it a name as consumption let's give y axis as consumption and a limit for y axis now that's my consumption graph so i can look at yearly data now that will make more meaning rather than just looking at the complete consumption data
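A short sketch of the quantile call and the single box plot described above. The consumption values are random made-up numbers, and the y limit of 600 to 1800 is taken from the axis range mentioned later in the transcript.

```r
set.seed(3)

# Made-up consumption values standing in for the Consumption column.
consumption <- rnorm(1000, mean = 1300, sd = 150)

# Five-number summary: minimum, 25th, 50th (median), 75th percentile, maximum.
five_num <- quantile(consumption, probs = c(0, 0.25, 0.5, 0.75, 1))
print(five_num)

# The box spans the 25th to 75th percentile with a line at the median.
# By default the whiskers stop at 1.5 times the interquartile range, and
# points beyond that are drawn individually as outliers.
boxplot(consumption,
        main = "Consumption",
        ylab = "Consumption (GWh)",
        ylim = c(600, 1800))
```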
            • 378:30 - 379:00 so how do we do it yearly so we will say consumption and then i will say the year column so it is consumption but grouped based on year so here i can give x axis y axis and i can give y limit so let's create this and this makes more meaning we can give some coloring scheme here but now i'm looking at 2006 2007 8 9 and so on and we can look at the
            • 379:00 - 379:30 data what is the range right it gives me the five number summary of the data per year and it basically allows me to look at the seasonality of this similarly we can create box plot by just giving consumption yearly grouped and here i am giving the title as consumption y-axis x axis and y limit wherein i can also use las so this is
            • 379:30 - 380:00 one more feature which you can do and that basically will give me the tick points if you compare this one to the previous graph so when i created this previous graph i had 2006 2008 and i had from 600 to 1800 and if i go for the next one i am basically seeing more useful information now let's look at monthly data so i would want to group it based on months
            • 380:00 - 380:30 and let's create that so this gives me the monthly data where i'm looking at months and i could select a particular year or i can just do a grouping based on months so i can have multiple plots to see a difference here so let's do this now let's create a box plot for consumption which is monthly data and let's give it a color let's look at the wind data which is again grouped monthly and let's look at the solar data which
            • 380:30 - 381:00 is grouped monthly now if i zoom in it basically gives me the seasonality of the data for your wind for your consumption for your solar so what we are doing is we are creating these box plots which are giving us values now what i can also do is i could look at the day wise also but before we look into this how do i infer some information from these box
            • 381:00 - 381:30 plots which are being created so this is what we have done where we are looking at the data for month and these box plots confirm the yearly seasonality which we were seeing in earlier plots but give some additional insights so if i look at the data here it tells me the electricity consumption is generally higher in winter now this is based on months so we can see consumption is higher in winters
            • 381:30 - 382:00 and lower in summer so we can obviously look at our plot we can see where it is lower where it is higher and then we can look at the median and lower two quartiles are lower in december and january compared to november and february so that is you look at the quartiles and you will see that the median and lower two quartiles are lower in december and january
            • 382:00 - 382:30 here jan and december so you can look it from my plot now this is giving you some idea on seasonality now that might be due to business being closed over holidays now this one we were also seeing when we looked at time series for 2017 only and box plot basically confirms that there is this consistent pattern throughout the years now when you look at your solar and wind power production both
            • 382:30 - 383:00 will give you a yearly seasonality what we are seeing here and if basically i look at the data so it depends on what parameters you are choosing but if you look at wind it will reflect the effect of occasional extreme wind speeds associated with storms and other transient conditions and since we are grouping it based on months we can see this pattern is quite evident every year now what we can do is we can group the
            • 383:00 - 383:30 data day wise so here let me again reset this to one plot per window now i'll say box plot i'll say consumption which is grouped based on day now we know that there is a day column and let's give a y limit and let's look at the data so this is where i'm grouping the data day wise so you look at 31 days and you look at the box plot so this is where you are plotting it on a daily basis
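The grouped box plots described above (by year, by month, by day of month) all use the same formula interface of boxplot(). The sketch below uses synthetic data and assumes the extracted columns are called year, month and day; las = 1 is shown as one way to keep the axis labels horizontal.

```r
set.seed(4)

# Synthetic stand-in for mydata2 with the extracted date components.
dates <- seq(as.Date("2006-01-01"), as.Date("2017-12-31"), by = "day")
doy <- as.numeric(format(dates, "%j"))
mydata2 <- data.frame(
  Consumption = 1300 + 150 * cos(2 * pi * doy / 365) + rnorm(length(dates), sd = 60),
  year  = as.numeric(format(dates, "%Y")),
  month = as.numeric(format(dates, "%m")),
  day   = as.numeric(format(dates, "%d"))
)

# "y ~ group" draws one box per distinct value of the grouping column.
boxplot(Consumption ~ year, data = mydata2,
        main = "Consumption by year", xlab = "Year", ylab = "Consumption",
        ylim = c(600, 1800), las = 1)

boxplot(Consumption ~ month, data = mydata2,
        main = "Consumption by month", xlab = "Month", ylab = "Consumption",
        ylim = c(600, 1800), las = 1, col = "lightblue")

# Grouping on day gives 31 boxes, one per day of the month.
boxplot(Consumption ~ day, data = mydata2,
        main = "Consumption by day of month", xlab = "Day", ylab = "Consumption",
        ylim = c(600, 1800), las = 1)
```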
            • 383:30 - 384:00 so you can look at the data you can break it down to a particular week so here i have given a day and i have chosen all the 31 days but i can break it down to a week and i can look at the data so if we look at the data per week or per day we can basically infer that electricity consumption where i'm doing a consumption group by day is higher on weekdays than on weekends
            • 384:00 - 384:30 higher on weekdays than on weekends so time series with strong seasonality can often be represented with models that can decompose the signal into seasonality and a long-term trend now this is an easy way now how do we look at the frequency of the data that could be interesting to see so let me look at say the yearly data which we were seeing here
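A box plot grouped on day of month (1 to 31) mixes weekdays and weekends inside every box, so the weekday-versus-weekend comparison is easier to check by grouping on the day of the week instead. The sketch below is one way to do that and is not taken from the video: the column name wday is hypothetical, and the data is synthetic with a weekend dip built in, so it only demonstrates the technique and says nothing about the real consumption data.

```r
set.seed(5)

# Synthetic daily data for 2017 with an artificial weekend dip.
dates <- seq(as.Date("2017-01-01"), as.Date("2017-12-31"), by = "day")

# "%u" gives the ISO weekday number (1 = Monday ... 7 = Sunday) and,
# unlike weekdays(), does not depend on the locale.
wday_num <- as.integer(format(dates, "%u"))
is_weekend <- wday_num >= 6

mydata <- data.frame(
  Date = dates,
  Consumption = 1400 - 250 * is_weekend + rnorm(length(dates), sd = 40),
  wday = factor(wday_num, levels = 1:7,
                labels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))
)

# One box per day of the week, in calendar order.
boxplot(Consumption ~ wday, data = mydata,
        main = "Consumption by day of week",
        xlab = "Day of week", ylab = "Consumption", las = 1)
```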
            • 384:30 - 385:00 now let's go further and here we have looked at data so what we will do is we look at the frequency now when you look at the frequency when you talk about frequency in your data so we have the modified date column which gives me a frequency and if we really look into the data that will tell me that the data is on a daily basis so for that let's look at my data 3 again which gives me data and you can just see all
            • 385:00 - 385:30 the dates are in sequence so you have 22 23 24 25 26 and so on i can also load the dplyr package that is basically allowing me to work in a better way now i can look at the summary of this and for all my columns i am seeing what is the minimum the five number summary for date and consumption so date does not show me anything because this is not in a date format it is just a factor but
            • 385:30 - 386:00 other things have the five number summary so we are looking at wind plus solar we are looking at year and month and day and all these columns now what we will do is we will want to find out the count for each column how many entries does it have and we will say NA values should not be considered so let's look at this one so it tells me for my
            • 386:00 - 386:30 particular columns so let me run this again and that shows me for each column how many values you have and these counts do not include the NA values now similarly i can find out specifically for consumption i can find out is there any NA value so i'm saying is.na and let's find out if there is any NA value or missing value in consumption it says zero
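The missing-value counts described above can be reproduced with base R alone. This is a sketch on a tiny made-up data frame, and the video may well use a different call (for example from dplyr) to get the same numbers.

```r
# Tiny made-up stand-in for mydata3 with some missing values.
mydata3 <- data.frame(
  Consumption = c(1069, 1380, 1442, 1457, 1477),
  Wind        = c(NA, NA, 48.7, NA, 62.1),
  Solar       = c(NA, 12.0, 15.3, NA, NA)
)

# is.na() returns TRUE/FALSE per cell; TRUE counts as 1 when summed.
colSums(!is.na(mydata3))          # non-missing entries per column
colSums(is.na(mydata3))           # missing entries per column

sum(is.na(mydata3$Consumption))   # 0: no missing consumption values
sum(is.na(mydata3$Wind))          # 3: three missing wind values
```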
            • 386:30 - 387:00 okay that's good if you look in wind it tells me there are 1463 entries which are NA similarly solar similarly wind dot solar or wind plus solar so it gives me a count of NA values that is missing values and also values which are not missing so to understand frequency what we can do is we can find out the minimum on the date that is the first column and
            • 387:00 - 387:30 i am saying rm na dot rm is true that is get rid of n a values and find out the minimum and let's look at the minimum value this is the minimum from my modified date now if i would want to get the frequency i can basically use sequence function so i can say from x minimum that is the minimum value i want to look at the frequency that is day wise and let's just look at five
            • 387:30 - 388:00 entries and see if there is a day by day frequency so let's look at the value of this and obviously it tells me there is device frequency so that allows me to look at the frequency look at the type of it it is an integer class is a date so similarly we can say from x minimum we can basically look at the frequency month-wise and i can again look at five records so
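              The sequence step can be sketched as follows. The date vector is made up, and the names `mod_date` and `xmin` are assumptions standing in for the modified date column and the "x minimum" value mentioned in the video.

              ```r
              # Hypothetical date column with one missing entry (values are made up).
              mod_date <- as.Date(c("2011-03-05", NA, "2011-01-01", "2011-02-10"))

              # Earliest date, ignoring missing values.
              xmin <- min(mod_date, na.rm = TRUE)

              # Five consecutive points starting at xmin, at three different steps.
              daily   <- seq(from = xmin, by = "day",   length.out = 5)
              monthly <- seq(from = xmin, by = "month", length.out = 5)
              yearly  <- seq(from = xmin, by = "year",  length.out = 5)

              print(daily)
              print(monthly)
              print(yearly)
              print(class(daily))
              ```

              Comparing the generated daily sequence with the first few values of the real date column is one way to confirm that the data is recorded day by day.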
            • 388:00 - 388:30 That shows me monthly data, right? So I can extract the data by frequency, similarly yearly data, and that's also very useful. Now we can select the data which has NA values for wind. How do I do it? I want to take the wind column and find where the values are NA. So I will create a variable, and here I will say my data 3
            • 388:30 - 389:00 and then I give a condition where I say is.na on the column. Let's do this. Once I have done this, I have my selected wind data from my data 3 where wind has NA values, and I will give the column names: from my data 3 I'm interested in modified date, consumption, wind, and solar. So these are the four columns I'm interested in. Let's look at the first 10
            • 389:00 - 389:30 records here, or first 10 rows. That tells me these are the rows where wind has NA or missing values. I can always do a View, and that gives me the complete data. It shows me 1463 entries, and here it shows me all the NA values. You can look all the way to the end, and it shows me wind is NA, while solar does have some value here
            • 389:30 - 390:00 in the last row. But if you look at the row numbers, there is a difference: you have 1461 and then you have 2174. So there is a gap, which means there is some data in between where wind has some values. So we have found the NA values. Now we will select the data which does not have NA values. I will call it selected wind 2. I'll again use my data 3, I will say which,
            • 390:00 - 390:30 but now I'm saying not NA for this column, and I will select the data for the columns. I'm interested in looking at 10 records, and this shows me non-NA values, so no more missing values. If I look at the data I saw earlier, which has NA, and at these values, which are not NA for the wind column, then looking at these two results we know that in year 2011
            • 390:30 - 391:00 the wind column has some missing values. So let's focus on year 2011. How do I do that? Let's call it a different variable. I'll say my data 3, and where we were saying is.na inside which, here I will say the year should have a value of 2011, and I want all these columns. Let's look at the data here, and this is showing me 2011, but
            • 391:00 - 391:30 we are not seeing all the values. There are some values, but there are also some missing values for 2011, based on the analysis we have done. Let's look at the class of this: it is a data frame. Do a View, and this will help me find where the NA values are. If you just scroll down, looking at all the data, let's search whether the wind column has an NA or missing value
            • 391:30 - 392:00 and I will see in which row there is a missing value for the wind column. So far all the values exist. I could also select and search for one specific value, and I'll show you how we can do that. Here, let's scroll all the way down. It's like you're exploring your data and seeing whether the wind column has an NA or missing value for a particular row,
            • 392:00 - 392:30 and let's scroll here, and here you see there is a missing value for one particular row. 13th December 2011 has a wind value, 15th December has a wind value, but 14th December does not, right? So there was only one entry which was missing. That could be for some reason: maybe it was not calculated, maybe it was not tabulated. So we have a missing value, and that can affect my plotting, that can affect
            • 392:30 - 393:00 my analysis. Let's look at the number of rows in this, which will tell me how many rows we have for 2011. It tells me 365, which is the number of days in a year. Now we will find out if there were any NA values. We earlier checked the total number of NA values per column, that is in lines 265 to 269 of the code;
            • 393:00 - 393:30 we can see here, 265 to 269, this is where we were checking whether there are any NA values, right? Let's go back here. We want to find the number of NA values for a particular year. How do I do it? I can just do a sum: I will say is.na, I'm interested in the wind column of my data 3, and I am saying my year has to be 2011, so I am finding the NA
            • 393:30 - 394:00 values. Let's do this, and it tells me 1, and that's right, that's what we saw when we did a View. Let's see how many non-NA values we have, and that is 364. So that satisfies my logic: 364 plus 1 missing, so there are 365. Let's look at the structure of this: it tells me you have modified date in date format, and you have consumption, wind, and solar.
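              The row selection with `which()` and the per-year NA count can be sketched as below. The data is synthetic: it is built so that wind is missing for all of 2010 and for a single day in 2011, mirroring the single gap found in the video. The column names (`mod_date`, `year`, and so on) and the result names are assumptions.

              ```r
              # Synthetic daily data for 2010 and 2011 (values are made up).
              mod_date <- seq(as.Date("2010-01-01"), as.Date("2011-12-31"), by = "day")
              mydata3 <- data.frame(
                mod_date    = mod_date,
                year        = as.integer(format(mod_date, "%Y")),
                Consumption = 1000 + seq_along(mod_date),
                Wind        = 50,
                Solar       = 10
              )
              mydata3$Wind[mydata3$year == 2010] <- NA
              mydata3$Wind[mydata3$mod_date == as.Date("2011-12-14")] <- NA

              cols <- c("mod_date", "Consumption", "Wind", "Solar")

              # Rows where Wind is missing, rows where it is present, and all rows of 2011.
              wind_na   <- mydata3[which(is.na(mydata3$Wind)), cols]
              wind_ok   <- mydata3[which(!is.na(mydata3$Wind)), cols]
              wind_2011 <- mydata3[which(mydata3$year == 2011), cols]

              # Missing and non-missing Wind values within 2011.
              na_2011     <- sum(is.na(mydata3$Wind[mydata3$year == 2011]))
              non_na_2011 <- sum(!is.na(mydata3$Wind[mydata3$year == 2011]))
              print(c(rows = nrow(wind_2011), missing = na_2011, present = non_na_2011))

              # The specific 2011 row that has the missing Wind value.
              missing_row <- wind_2011[is.na(wind_2011$Wind), ]
              print(missing_row)
              ```

              Checking that the missing and present counts add up to the number of rows for the year is the same sanity check done in the video (1 plus 364 equals 365).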
            • 394:00 - 394:30 Let's create a variable, selected wind 4. I will use wind 3, which had all my NA and non-NA values for 2011, and I will say let's find the NA value, because I'm interested in finding that particular row. So I'm saying find where the value is NA, and I want all the columns. Let's look at this one, and this is my specific
            • 394:30 - 395:00 row which has an NA value. Now we know that the data follows a daily frequency, which we have clearly seen. Let's select data which has both NA and non-NA values. Let's call it test 1. I will use wind 3, which has NA and non-NA values, but now I will say I want the modified date to be greater than 12-12-2011. Now remember, we
            • 395:00 - 395:30 had, when we were doing a View, seen that for one particular day, the 14th of December, there is no data. So I will select a subset of data which includes this NA and also non-NA values, that is, I can take the 13th of December through the 15th of December. Let's start from 12-12: the date should be greater than 12-12, which means the 13th, and it should be less than 16, so that is the 15th, and then the columns,
            • 395:30 - 396:00 right? So now we have some data; let's look at it. I have selected a subset of data (I could have done this using subset as well), so I have NA and non-NA values. Now why are we doing this? Sometimes you have data for a particular column and you want to find out if there are any missing values; maybe you want to fill them or replace them with something. That is usually useful when you are doing trend detection. Say, for example, you have data for every
            • 396:00 - 396:30 month and maybe you have missed one of the months, or you have data for every year collected monthly and then in one of the years you don't have data for a couple of months. For example, for 2016 I have data for all 12 months, for 2017 all 12 months, for 2018 maybe I don't have data for March and June, and for 2019 I don't have data for the same months. So I can forward fill or backward
            • 396:30 - 397:00 fill them using the previous year's same-month data. So we can do that. Here I have test data where I've extracted a subset of data. I can look at the class of this, it is a data frame; the structure of this, it has the columns. Now let's use the library function to load the tidyr package, and what we will do is fill it up. I will use test 1, and I will fill
            • 397:00 - 397:30 the wind column, which has a missing value. Once you do this, you will notice it has done a forward fill: it has taken the previous value and filled it in. You can fill the data in different directions, such as down and up. So we can take care of missing values in our frequency data, which allows us to analyze the data in a better way.
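              A small sketch of the date-window subset and the fill step, assuming the tidyr package is installed. The wind values are made up, and `wind3`, `test1`, and `mod_date` are assumed names for the objects described in the video. `fill()` defaults to `.direction = "down"`, which is the forward fill seen here; `"up"` gives a backward fill.

              ```r
              library(tidyr)

              # Made-up 2011 wind values around the missing day (14 December).
              wind3 <- data.frame(
                mod_date = seq(as.Date("2011-12-10"), as.Date("2011-12-18"), by = "day"),
                Wind     = c(31, 28, 35, 40, NA, 22, 19, 25, 30)
              )

              # Keep dates strictly after 12 December and strictly before 16 December,
              # which leaves 13, 14, and 15 December.
              keep  <- wind3$mod_date > as.Date("2011-12-12") &
                wind3$mod_date < as.Date("2011-12-16")
              test1 <- wind3[keep, ]

              # Forward fill: the NA takes the previous day's value.
              filled_down <- fill(test1, Wind)

              # Backward fill: the NA takes the next day's value.
              filled_up <- fill(test1, Wind, .direction = "up")

              print(test1)
              print(filled_down)
              print(filled_up)
              ```

              Which direction is appropriate depends on the data; for a single missing day in a daily series, either neighbour is a plausible stand-in.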
            • 397:30 - 398:00 Here we also want to look at some more points. This is about dealing with frequencies and filling a column, where you can take care of missing values with a forward fill. Filling values can be done in different directions, as I said, and you may want to first convert your time series to a specified frequency if your data does not have a frequency, but ours had. If you do not have a frequency, you can convert it
            • 398:00 - 398:30 into a frequency such as weekly, daily, or monthly, as I showed you, and then you can do a forward fill for the values. For example, I can break my data down into weekly data, look at the values, and if any values are missing for the weekly data I can use a forward fill. So that takes care of my frequency data. Then let's look at the trends in the data, which is the last part of this project. So let's look at the trend.
            • 398:30 - 399:00 When you say trend, what does that mean? In time series data you often have some kind of trend, which exhibits slow, gradual variability in addition to higher-frequency variability such as seasonality and noise. To visualize these trends, we use what we call rolling means. We know how our data is spread over year, month, or day, but how about looking at
            • 399:00 - 399:30 a rolling average and seeing the difference? A rolling mean tends to smooth a time series by averaging out variations at frequencies higher than the window size. There is something called windowing, where you choose a time frame. It also averages out any seasonality on a time scale equal to the window size. This allows you to look at lower-frequency variation in the data.
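              To illustrate the point about window size, here is a toy sketch in base R. It uses `stats::filter()` for the moving average, which is a swapped-in technique and not the package used in the video. The series is made up: a straight-line trend plus a pattern that repeats every 7 points and sums to zero over one period.

              ```r
              # Made-up series: linear trend plus a 7-point repeating pattern.
              trend  <- 100 + 0:41
              weekly <- rep(c(3, 1, 0, -1, -2, -1, 0), times = 6)  # sums to 0 per period
              x      <- trend + weekly

              # Centered 7-point moving average; the first and last 3 values are NA
              # because a full window is not available there.
              smooth7 <- as.numeric(stats::filter(x, rep(1 / 7, 7), sides = 2))

              # With the window equal to the period, the repeating pattern cancels
              # and only the trend remains at the interior points.
              print(round(smooth7, 2))
              print(isTRUE(all.equal(smooth7[4:39], trend[4:39])))
              ```

              A window shorter than the period would leave part of the repeating pattern in the smoothed series, which is why the window is matched to the seasonality one wants to remove.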
            • 399:30 - 400:00 When we looked at the electricity consumption time series, we already saw there is a weekly pattern and a yearly seasonality, which we saw using box plots. We can also look at rolling means on those time scales. How do we do that? For this you can use a package like zoo, and then you can compute a rolling mean using this zoo package, and you can say what
            • 400:00 - 400:30 is the frequency, that is, the window, with which you want to calculate the rolling mean. Now how do we do this? Let's look at this data. Here I'm going to look at my data 3, which we have been using so far. Let's call it a 3 day test; you can give it any name. I am going to use my data 3, I am using piping, I will use dplyr, and I will arrange the data in descending order here.
            • 400:30 - 401:00 You can always break it down step by step and see the result of each step. I'm going to arrange this data in descending order of year, so obviously my last year, 2017 or 2018, will be on top. You want to group the data by year, so it depends on how many years we have; we'll see. So you group the data by year. This data is then passed to mutate, and the mutate function is going to let me use this rolling mean. I will call it
            • 401:00 - 401:30 test 03 day. I'm going to calculate a rolling mean over every three days for my consumption column, and then let's ungroup this. Let's see how this works. Here, when I'm doing a three-day test, let's look at the result of this and then I'll explain it. If you see here, we have the test three-day column, and this has the rolling average.
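              The pipeline described above might look roughly like this, assuming dplyr and zoo are installed. The data is a short made-up stand-in; the names `threedaytest`, `test_03day`, and `mod_date` are assumptions, as is `fill = NA` for padding the ends. The extra `mod_date` sort key is an addition in this sketch to keep each year's rows in date order.

              ```r
              library(dplyr)
              library(zoo)

              # Made-up daily consumption spanning the end of 2016 and the start of 2017.
              mod_date <- seq(as.Date("2016-12-28"), as.Date("2017-01-06"), by = "day")
              mydata3 <- data.frame(
                mod_date    = mod_date,
                year        = as.integer(format(mod_date, "%Y")),
                Consumption = c(1200, 1250, 1100, 1150, 1130, 1441, 1530, 1400, 1300, 1200)
              )

              threedaytest <- mydata3 %>%
                arrange(desc(year), mod_date) %>%   # latest year first
                group_by(year) %>%                  # windows never cross a year boundary
                mutate(test_03day = rollmean(Consumption, k = 3, fill = NA)) %>%
                ungroup()

              print(threedaytest)

              # Cross-check one window by hand: 2 January 2017 and its two neighbours.
              manual_mean <- mean(c(1130, 1441, 1530))
              print(manual_mean)
              ```

              With the default centered alignment, each rolling value sits at the middle of its window, so the first and last row of every year are NA when `k = 3`.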
            • 401:30 - 402:00 What does that mean? The first value we see here, 1367, is the average consumption in 2017 for the first date together with the data point on either side of it. You can look at this date, 1130, while the value you are looking at is 1367 here. So you look at 1130, 1441, and 1530. If I take the mean of these, so for
            • 402:00 - 402:30 example if I just run this part, that gives me the mean. Okay, because I have a comment, let's mark the rest as a comment and then run this. It gives me 1367, which is what we are seeing here, right? So you are getting a rolling average over every three days. Similarly, if you want every five days, it takes five values around the middle one, right? So you can always find the
            • 402:30 - 403:00 rolling mean for a particular frequency. Now let's do that for seven days, that is, weekly, and for yearly data, that is, 365 days. How do I do it? Same logic: my data test. I'm using my data 3, I'm arranging it in descending order, and I'm grouping by year. Earlier, when we did a group by and looked at the data, it told me how many rows we had,
            • 403:00 - 403:30 right? So let's do a grouping by year and let's say test 07, which is a rolling average over every 7 days, and I'm also saying take care of the NA values. Similarly, I'm getting a rolling average over every 365 days; you could also do quarterly or half-yearly. Let's create this my data test and look at the result. I will use my data test, I will say arrange
            • 403:30 - 404:00 based on modified date (we know there is a column called modified date). I want to look at just the 2017 data, so I am doing a filter, and then I will choose the columns I'm interested in, so I will look at the 7-day and 365-day columns, and let's look at, say, the first 7 records. Let's do this, and that gives me the consumption value, modified date, year, and my rolling seven-day average, or
            • 404:00 - 404:30 seven-day mean, for the first seven days, and then the 365-day one. You will not see its data here, but if I do a View on this I can see the values. You can always select a particular column to see the values. These are the values for the 7-day rolling average, and this is for 365 days. You see almost all of its values are missing, but at every 365th entry you will have some data.
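              A sketch of the 7-day and 365-day version, again assuming dplyr and zoo. The consumption values are synthetic, and the column names (`test_07da`, `test_365da`, `mod_date`) and the use of `fill = NA` are assumptions. Because the windows are computed within each year, a 365-day window fits exactly once in a 365-day year, which is why that column is almost entirely NA.

              ```r
              library(dplyr)
              library(zoo)

              # Synthetic daily consumption for 2017 and 2018 with a yearly cycle.
              mod_date <- seq(as.Date("2017-01-01"), as.Date("2018-12-31"), by = "day")
              doy      <- as.integer(format(mod_date, "%j"))
              mydata3 <- data.frame(
                mod_date    = mod_date,
                year        = as.integer(format(mod_date, "%Y")),
                Consumption = 1300 + 200 * cos(2 * pi * doy / 365)
              )

              mydatatest <- mydata3 %>%
                arrange(desc(year), mod_date) %>%
                group_by(year) %>%
                mutate(
                  test_07da  = rollmean(Consumption, k = 7,   fill = NA),
                  test_365da = rollmean(Consumption, k = 365, fill = NA)
                ) %>%
                ungroup()

              # First seven rows of 2017, in date order, with the columns of interest.
              first7_2017 <- mydatatest %>%
                arrange(mod_date) %>%
                filter(year == 2017) %>%
                select(Consumption, mod_date, year, test_07da, test_365da) %>%
                head(7)
              print(first7_2017)

              # Number of non-missing 365-day values within 2017.
              n_365_2017 <- sum(!is.na(mydatatest$test_365da[mydatatest$year == 2017]))
              print(n_365_2017)
              ```

              Dropping the `group_by(year)` step would let the 365-day window slide across year boundaries and produce a continuous long-term curve; that is a design choice, not something shown in the video.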
            • 404:30 - 405:00 Now let's plot this and visualize the rolling averages we are seeing. Let me first set plotting to one plot per graph, and then do the plot. I will take the consumption data, x-axis, y-axis, color, and give a title to this. Let's create this, and that's my consumption data, spread over a period of time, and that's
            • 405:00 - 405:30 fair enough. But now let's add more to this plot. I will add the seven-day rolling average. To add a second plot to the same one in R, you can use points. So I will say points, I will choose the seven-day column, type is line, then width, x limit, y limit, and color. Let's do this, and that's my seven-day rolling average pattern, which gives me some kind of trend.
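              A rough sketch of overlaying a rolling average on the raw series with base graphics. The data is synthetic, the axis labels, title, and colors are placeholders, and the 7-point average is computed with `stats::filter()` here only to keep the example free of extra packages.

              ```r
              # Synthetic series with a repeating 7-day pattern (values are made up).
              n           <- 70
              day_index   <- seq_len(n)
              consumption <- 1300 + rep(c(100, 120, 110, 105, 90, -150, -200), length.out = n)

              # Centered 7-day rolling mean; 3 NA values at each end.
              roll7 <- as.numeric(stats::filter(consumption, rep(1 / 7, 7), sides = 2))

              # One plot per figure, then the raw series as a line.
              par(mfrow = c(1, 1))
              plot(day_index, consumption,
                   type = "l", col = "grey40",
                   xlab = "Day", ylab = "Consumption",
                   main = "Consumption with 7-day rolling mean")

              # points() with type = "l" draws a second line on the existing plot.
              points(day_index, roll7, type = "l", lwd = 2, col = "blue")
              ```

              `lines()` would draw the same overlay; `points()` with `type = "l"` is used here because that is the function named in the video.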
            • 405:30 - 406:00 Similarly I can add one more here, and this time I will choose the 365-day column and look at the pattern. Now you see some dots here. Well, you could do it in a different way. I can also add a legend to this, and I can say where the legend will be on the x-axis and y-axis. I am saying x will be 2500 and y is 1800, so my legend will come somewhere here. I am saying my legend
            • 406:00 - 406:30 will have consumption, test, and so on; I can give some names, I can give the colors, I can say what kind of legend it is so that it explains what each color is for, and then a vector. Let's add a legend to this, and I've added a legend. Now you can do a zoom and look at the data, and here I see that my x-axis is fine but the y-axis is going a
            • 406:30 - 407:00 little out of my plotting area, so I can change that. Here I have 1800; how about making it 1600? Let's look at this one. We go back and start again here: plot, points, and lines, and then add the legend, right? You can place your legend anywhere in the plot. So this
            • 407:00 - 407:30 is giving me the trend I'm looking for with my rolling average. Similarly, you can look at the trend for wind and solar data. What we are seeing here, when you look at the trend, is one more way of looking at it; you can always create plots in different ways. The seven-day rolling mean has smoothed out all the weekly seasonality that we were seeing here in my graph, where you look
            • 407:30 - 408:00 at every seven days, while preserving the yearly seasonality. The seven-day mean tells us that electricity consumption is typically higher in winter and lower in summer. It is better to break it down yearly: if you look at each year, you can see when winter is, when summer is, what the seasonality is, what trend you are seeing, and whether there is a decrease or increase for a few weeks every winter.
            • 408:00 - 408:30 Similarly, look at the 365-day one. As I said, a rolling average reduces the variation, so if I look at the 365-day rolling mean, we can see that the long-term trend in electricity consumption is pretty flat. That's what we are seeing: it's pretty flat, there is not much variation over the years if you join these dots. We can see some highs and lows, and that gives me a trend. Now this is
            • 408:30 - 409:00 how you can do trend detection, and similarly we can do the plotting for wind and solar. So this is a small project which I demonstrated using R. All this code, in the form of a project .R file, you can find here on my GitHub page. This is a document which explains some things; feel free to download it and add details to it. This is the sample data set, which you can also find in my repository in the datasets folder. So continue learning and
            • 409:00 - 409:30 continue practicing R. With that, we have reached the end of this video tutorial on the R programming full course. I hope the video will help you gain expertise in the R programming language. If you have any questions related to this video, please put them in the comments section; our team will help you solve your queries. Thank you for watching, and keep learning. [Music] Hi there. If you like this video, subscribe to the Simplilearn YouTube
            • 409:30 - 410:00 channel and click here to watch similar videos. To nerd up and get certified, click here.