A Deep Dive into Teradata

Teradata Tutorial in 3 Hours | Teradata Tutorial for beginners | Teradata complete training | SQL

Estimated read time: 1:20


    Summary

    This comprehensive Teradata tutorial hosted by NIC IT ACADEMY 2.0 details the foundational concepts, architecture, and practical applications of Teradata, tailored for beginners. The video begins by explaining Teradata's role in data warehousing and its unique features such as AMP and parallelism, and then dives deep into topics like indexing, data protection methods, and various SQL operations. Throughout the tutorial, the intricacies of how Teradata handles large data volumes and ensures efficient data retrieval through its architecture are explored, offering insights into both theoretical and practical aspects.

      Highlights

      • Teradata's architecture supports parallel data processing, enhancing performance 🚀.
      • Unique indexing concepts with primary, secondary, and partition indexes improve data management 📘.
      • Data protection mechanisms like RAID and fallback ensure reliability and security 🔒.
      • The video covers both theoretical insights and practical SQL demonstrations 💡.
      • Teradata's intelligent memory use makes data retrieval faster and more efficient 🧠.

      Key Takeaways

      • Teradata allows for massive parallel processing, making it ideal for handling large volumes of data 📊.
      • Understanding the architecture, especially AMP, is crucial for efficient data handling in Teradata 🌐.
      • Indexing techniques in Teradata, such as primary and secondary indexes, can optimize data retrieval 🗄️.
      • Teradata's fallback and RAID concepts enhance data protection and reliability 🔐.
      • The tutorial transitions from theory to hands-on SQL operations, providing a holistic learning experience 🔄.

      Overview

      The tutorial begins with an introduction to Teradata, discussing its significance in the data warehousing ecosystem. Teradata's unique architecture, which supports parallel processing and massive data management, is explained in simple terms, setting the foundation for newcomers.

      As the session progresses, intricate details of Teradata's architecture and indexing methods are broken down. From primary and secondary indexes to partitioned primary indexes, the video ensures you understand how each impacts data distribution and retrieval efficiency.

      Finally, the tutorial transitions into interactive SQL command sessions, demonstrating practical applications of Teradata's architecture and concepts. Viewers are equipped with foundational knowledge and practical SQL skills, enabling them to leverage Teradata in real-world scenarios.

      Teradata Tutorial in 3 Hours | Teradata Tutorial for beginners | Teradata complete training | SQL Transcription

      • 00:00 - 00:30 [Music]
      • 00:30 - 01:00 Hi everyone, welcome back to the Teradata developer training series. In today's session we are going to learn an introduction to Teradata: what Teradata is, why we need it, and how Teradata is used in our
      • 01:00 - 01:30 data warehousing environment, plus the Teradata architecture: what an AMP is, what a node is, and the SMP and MPP architectures. Then we are going to learn the primary index concepts. If you haven't subscribed to our channel, please subscribe and click on the bell icon so that you get all the notifications. Let us start our session. The first topic is the architecture; the second one is SQL;
      • 01:30 - 02:00 the third one is the utilities, and also practical experience with the things we practice day to day. If you have any doubts about Teradata, you can ask me. If you already have good experience with SQL, then Teradata is very easy; the only thing is that the architecture has some
      • 02:00 - 02:30 differences, so we have to learn those, and creating a table also involves a few different things we have to learn. For the first two or three days we will have theory classes only; after three or four days we will go completely practical: creating tables, using the utilities, and so on. My suggestion is to take a notepad and pen and make a note of whatever I teach. So this is the
      • 02:30 - 03:00 agenda for today: the introduction to Teradata, the Teradata features, the architecture, how the indexing works, and how the storage works. In the next session we will look at the utilities and concepts like space. You know why it is named Teradata, right? It was the first database to handle data in terms of terabytes, 10 to the
      • 03:00 - 03:30 power of 12 bytes. They started in the 1970s in California, and this was the first database to handle terabytes of data, around the year 2000. Nowadays big data environments handle petabytes, but this was the first database that handled data at the terabyte scale. Before becoming independent, Teradata
      • 03:30 - 04:00 was part of NCR Corporation; in 2007 it was spun off and Teradata Corporation was formed. It is an American company, headquartered in California. Teradata is mainly meant for data warehousing applications. You already know
      • 04:00 - 04:30 what a data warehouse is, right? A data warehouse is used to handle huge volumes of data: the historical data. We have our OLTP system, and instead of fetching data for reporting from the OLTP system, we build a separate system called OLAP. The OLAP system handles huge volumes of data. Why? For example, if I want to analyze my company from the
      • 04:30 - 05:00 first day until today, say the last five years, the last ten years, or the last quarter, month, or year, that is a huge volume of data. If I have only the OLTP system and I fetch data from it for reporting, then the OLTP system will be impacted. That is why a company keeps OLTP separate and does not disturb that database. The OLTP data is normally kept in an Oracle
      • 05:00 - 05:30 database. Whatever transactions we do, for example online transactions on IRCTC, Amazon, or Flipkart, everything we order is handled by the OLTP system. At a set frequency we fetch the data from the OLTP system and load it into the OLAP system. The OLAP system is for reporting purposes, mainly for the management people. The
      • 05:30 - 06:00 management takes reports from the OLAP system. Why? Take Amazon: Amazon handles around 800 orders per second. Think about one order and how many records it creates; one order creates approximately 20 records in the back end. Now think about the records per minute, per hour, per day, and what a volume of records gets created.
      • 06:30 - 07:00 If I go and fetch, say, the last five days of sales from that transactional system, the system will be impacted. To avoid affecting that particular system, at some frequency, every half hour or hour, we pull the data from there and load it into the OLAP system. From there I pull the data for reporting. The OLAP system should have the capability to handle huge volumes of data. That is why we go for Teradata: it can handle the data in parallel. In Oracle, data access is serial; in Teradata it is parallel. We have a processor called the AMP, the Access Module Processor, and Teradata has multiple AMPs, so the work is processed in parallel. That is why companies
      • 07:00 - 07:30 go for Teradata for their data warehousing applications. From this warehouse we do the reporting, with tools like Tableau. So this is the introduction to data warehousing and why we need a data warehouse on Teradata. Multiple business units will fetch data from the data warehouse, so the system should withstand that. It is only for reporting purposes, not for
      • 07:30 - 08:00 the day-to-day business; we create the data warehouse only for reporting. Teradata Corporation has different products: the database itself, a columnar database, and a lot of utilities. On the application side there are products like Teradata Marketing Operations, and there is even a Teradata appliance for the Hadoop
      • 08:00 - 08:30 environment. Nowadays they have also improved on the big data side; big data analytics and Hadoop services have already started. Teradata is used by the large enterprises, not the small ones, and the cost of Teradata is huge; only those handling huge volumes of data go for Teradata for their applications. This is what
      • 08:30 - 09:00 Teradata stands for, with all these applications. Nowadays big data has also come up alongside Teradata, but whenever huge volumes of data are involved, companies still go for Teradata applications. We have different versions of Teradata; normally Teradata 16 is the current version used in real time. This is the history of Teradata: in 1979 they introduced the first Teradata database. In
      • 09:00 - 09:30 1999 they handled 130 terabytes of data; this was the first database to handle 130 terabytes, in 1999 itself. Even nowadays some companies have data warehouses of only some terabytes, like 130 or 1,000 terabytes, and Teradata handled 130 terabytes back in 1999. In 2008 they introduced Teradata 13,
      • 09:30 - 10:00 in 2012 Teradata 14, in 2014 Teradata 15, and in December 2016 they introduced the 16th version, Teradata 16. As of now we are using Teradata 16. These are the main features of Teradata. The first main feature is parallelism: it has the MPP architecture. MPP is
      • 10:00 - 10:30 nothing but massively parallel processing, so it does parallel processing of massive data; that is why we call it an MPP architecture. It is also a shared-nothing architecture; I will tell you what the shared-nothing architecture is after some time. It uses horizontal scaling, not vertical scaling: if any one of the nodes fails, another node takes over. That is called a shared-nothing architecture.
      • 10:30 - 11:00 For example, let me explain shared-nothing. Take this as my database, and assume it has 500 terabytes of space. You want to increase it by 100 terabytes more. If you grow the same system, that is vertical scaling; now it will be 600 terabytes in one box. If you go for vertical scaling, what happens if
      • 11:00 - 11:30 that particular system hangs or something goes wrong? You cannot access any of the 600 terabytes; we call it a system halt. Then we have to restart it and go through some troubleshooting to bring it back. But with horizontal scaling, whenever you want to add some 100 terabytes, you add it sideways, as another unit. Think of a Teradata
      • 11:30 - 12:00 box: a complete Teradata box has multiple nodes within it, and each node has its own space. Assume each node has 500 terabytes of space. If one node fails, whatever it was handling is passed to another node. We will see the failure mechanism after some time. Whenever a
      • 12:00 - 12:30 node fails, what about the data on that node; will there be any replicas of it? Yes, definitely we need to go for replication, so we have replication, and besides that we also have the fallback table concept. The replication factor is defined in the Teradata box itself: whenever data is stored on one node, the same data is replicated onto a different node. For example, if
      • 12:30 - 13:00 the replication factor is three, two other nodes will have the same data; if this node fails, another node takes over that activity. This is fault tolerance; I will explain the data protection methods later. This is the shared-nothing architecture: if any one component fails, the other components keep working, because each works independently; it does not depend on the others.
      • 13:00 - 13:30 If a component fails, whatever activity it was handling is transferred to another node, and that node takes over. Then there is the hot standby node: whenever a node fails, the hot standby node takes over its activities. It is like the stepney (spare wheel) in a car. Each component in Teradata works independently; that is
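
      For the fallback concept mentioned above, a minimal DDL sketch follows; the database, table, and column names are hypothetical. FALLBACK keeps a second copy of each row on a different AMP so the table stays available if one AMP goes down.

```sql
-- Hypothetical names; FALLBACK stores a second copy of every row
-- on a different AMP, so the table survives a single AMP failure.
CREATE TABLE sales_db.orders ,FALLBACK
(
  order_id   INTEGER,
  order_date DATE
)
PRIMARY INDEX (order_id);
```
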
      • 13:30 - 14:00 the shared-nothing architecture. Then linear scalability: we can scale linearly, that is, horizontally, whenever we need more space. It can be scaled up to 4,096 nodes, so they can grow linearly up to 4,096 nodes. Then connectivity: Teradata has
      • 14:00 - 14:30 the BYNET. The BYNET is a software and hardware component; for the connectivity they use the BYNET, and it lets each node interact with the others, including the inter-node communication. We also have the optimizer. The optimizer tells us where the active data is present, and whatever data we are frequently accessing, the optimizer handles that and the access path; we will see all of it.
      • 14:30 - 15:00 For SQL, Teradata follows the ANSI standard, no other standard, so it is fast. We have different utilities, and the data distribution happens automatically: based on the number of AMPs and on the indexing, the data is automatically distributed across the AMPs. These are the features of Teradata, which we are going to see in detail.
      • 15:00 - 15:30 As I told you, a lot of utilities are available, so you can either import data into or export data from the database. If you want to load a flat file into Teradata, you can go for FastLoad. If you want to load multiple tables at a time, you can go for MultiLoad. There is TPump, which again loads data
      • 15:30 - 16:00 into Teradata, and FastExport, with which you export data from the database. TPT is the Teradata Parallel Transporter. We will see these utilities one by one after some time; a small FastLoad sketch is below. Then you can clearly understand the Teradata architecture. This is the basic architecture of Teradata, and it has four major components: the parsing engine, the BYNET, the AMPs, and the storage (the disks).
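
      As a taste of the utilities named above, here is a minimal FastLoad script sketch, assuming a comma-delimited input file and an empty target table (FastLoad requires one); the logon, file, and table names are hypothetical, and exact options vary by site.

```sql
/* Minimal FastLoad sketch; tdpid/username/password, file, and table
   names are hypothetical. The target table must be empty. */
LOGON tdpid/username,password;
DATABASE sales_db;
SET RECORD VARTEXT ",";
DEFINE customer_id   (VARCHAR(10)),
       customer_name (VARCHAR(50))
FILE = customer.csv;
BEGIN LOADING sales_db.customer_stg
      ERRORFILES sales_db.customer_err1, sales_db.customer_err2;
INSERT INTO sales_db.customer_stg
VALUES (:customer_id, :customer_name);
END LOADING;
LOGOFF;
```
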
      • 16:00 - 16:30 This is the architecture of Teradata; I think you are familiar with it, but I will explain. The parsing engine is like a gatekeeper. Whenever you want to take data from the database or store data into it, this is the first component to check your authentication: at logon it checks your credentials, and then it passes on that particular
      • 16:30 - 17:00 query. Whatever SQL query you give, say SELECT * FROM something, it converts that query into a machine-readable form and gives it to the database. It also checks the logon: the very first time, it checks whether you are an authenticated user or not, and then it passes you on to the next step. The parsing engine also checks the access path: if you are running the same query again and again, it
      • 17:00 - 17:30 stores that query plan in one particular place, the access path, and when you fetch with the same query it is kept in the RAM and fetched very easily. That is the activity of the parsing engine. Then you have the BYNET. BYNET is nothing but the Banyan Network; it is a hardware and software component. Within the system we have a hardware component, and
      • 17:30 - 18:00 on top of the hardware we have a software component; it is used for the communication between the modules. The AMP is nothing but the processing engine of Teradata; Access Module Processor is the acronym behind AMP. This is what gives the parallelism: based on my index, it distributes the data to the
      • 18:00 - 18:30 storage, deciding where the data will be stored and how it will be stored, and it keeps all the information for storage as well as retrieval. You store data today and after some time you want to retrieve it, so for the retrieval it should tell where the data has been stored: whether the data is on disk one, two, three, or four, it has all this information. Also, whenever you do arithmetic operations, like
      • 18:30 - 19:00 rank, minimum, maximum, or sum, all the calculations are done by the AMP. Then the storage is used to store the data. If you look at the storage, it is one hard disk, a single unit, subdivided logically, the way we have C, D, and E drives: logically divided into separate components, but if you check, the hard disk is one single
      • 19:00 - 19:30 component. The database is one single component. If you take the Teradata database, within it you will have multiple databases; what Oracle calls a schema, Teradata calls a database. In real time also we have different databases: if you are working on one particular database, you have access only to that database, and for the other databases you need to get access based on your role. If
      • 19:30 - 20:00 you want to check how many databases there are, you have to go for DBC. DBC is nothing but the metadata database mostly used by Teradata developers; Data Base Computer is the abbreviation for DBC. You query views such as DBC.Tables or DBC.TablesV. We will have one session on DBC also, seeing one by one how many different tables there are in DBC, so that you get an idea of the metadata tables in DBC; two starter queries are below.
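
      As a starting point for the DBC session promised above, two common metadata queries; DBC.DatabasesV and DBC.TablesV are the standard dictionary views, while the database name used in the filter is hypothetical.

```sql
-- List all databases on the system:
SELECT DatabaseName
FROM   DBC.DatabasesV
ORDER  BY DatabaseName;

-- List the base tables in one (hypothetical) database:
SELECT TableName
FROM   DBC.TablesV
WHERE  DatabaseName = 'sales_db'
AND    TableKind = 'T';          -- 'T' means a base table
```
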
      • 20:00 - 20:30 So this is a logical division of the storage. The data is stored on different disks based on the fallback and replication settings, deciding whether it is stored here or replicated onto different storage; we will see that under replication. This is the Teradata architecture. How many parsing engines are needed for a particular system and how many AMPs are needed is based on the number of nodes and the number of AMPs they are using;
      • 20:30 - 21:00 this will be taken care of by the Teradata admin. For example, if you have six rows of data, they will not be stored on a single disk. If we have three different disks and we are using a unique primary index, the records will be stored on different disks: if row 1 comes here, row 2 goes there, row 3 goes somewhere else, and the same for rows 4, 6, and 7. That is how
      • 21:00 - 21:30 data storage as well as data retrieval works. Here are the abbreviations: MPP is nothing but massively parallel processing, AMP is the Access Module Processor, PE is the parsing engine, and SMP is symmetric multiprocessing. This is one symmetric multiprocessor, one single SMP, and the combination of SMPs is what we call
      • 21:30 - 22:00 the MPP architecture. One SMP may have multiple AMPs for the parallel processing. In the pictorial representation we have given only four, but in real time you can check how many AMPs there are; we have a query with which you can go and check how many AMPs a particular system has. I will give you that query; see the example below.
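
      The AMP-count query mentioned above; HASHAMP() returns the highest AMP number on the system, so adding one gives the total count.

```sql
-- HASHAMP() returns the highest AMP number (numbering starts at 0),
-- so adding 1 gives the total number of AMPs on the system:
SELECT HASHAMP() + 1 AS amp_count;
```
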
      • 22:00 - 22:30 Then BYNET is the Banyan Network, and BTEQ is nothing but the Basic Teradata Query facility; that is the BTEQ script we have. So that is the introduction. Now we are going to cover the rest. Whatever I have told you is a single data store, only logically divided. It scales horizontally, it can handle parallelism, it has parallel load utilities, and we have
      • 22:30 - 23:00 fault tolerance, as I told you, through the shared-nothing architecture. About total cost of ownership: we say it is low cost to set up, but the catch is that the cost is huge; compared to other databases, cost-wise Teradata is expensive. For business intelligence we go for data warehousing, with a star schema or snowflake schema, and we go for Teradata,
      • 23:00 - 23:30 and we call Teradata intelligent memory. What is that intelligent memory? Whatever data we are frequently querying is automatically kept in the RAM of the system, the access path is easily maintained, and so it fetches the data easily. That is the intelligent memory, and we count on it for the better performance. Now, this is the architecture of one
      • 23:30 - 24:00 particular node, one SMP, and we will see each and every component now. SMP is nothing but symmetric multiprocessing; MPP is nothing but a collection of SMPs. This is one single node. Here we have four AMPs, but in real time we have many AMPs; you can check how many AMPs are in your system, and I will give you the metadata query to query the system
      • 24:00 - 24:30 and you can check it. Then the SMP: one single node is one single SMP. If you connect all the SMPs through the Banyan Network, the BYNET, for the inter-node connectivity, then you get the MPP architecture. This is called the MPP architecture. MPP is nothing but the horizontal scaling; we call it a cluster of databases. If one node in
      • 24:30 - 25:00 the cluster fails, the other node takes over: whatever tasks it was handling are given to the other node, and the other node handles them. Then the Teradata admin team restarts that particular node and brings it back up; after coming up, it acts as a hot standby node. If you look at the architecture, there is what we call a virtual processor,
      • 25:00 - 25:30 the vproc. What is a vproc? See here: these are the virtual processors, and the virtual processors are used for the parallelism. The virtual processors are nothing but the parsing engine and the AMP: the parsing engines and the AMPs are the vprocs. So this is what we call a virtual processor, a vproc: the parsing engine and the AMP. This is one
      • 25:30 - 26:00 virtual processor, one vproc; sometimes we just call it a vproc. So what is the functionality of a vproc? To read and write the data: read and write operations from the outside world into Teradata and from Teradata to the outside world. Then sorting: while storing the data, if we want to store the data in sorted order, it does the sorting algorithm. Then building the indexes,
      • 26:00 - 26:30 loading the data into the database, and the aggregations; as I told you, the AMP takes care of the aggregations. Also transactions, journaling, and backup and recovery; I will tell you after some time what the transactions, the journals, and the fallback backup are. A vproc is one single unit, and multiple vprocs will be available: this is one vproc, and in the same way this is
      • 26:30 - 27:00 another vproc, and another, and another. The combination of vprocs is called one node, and the nodes together make up the MPP architecture. So what is the basic activity of a vproc? It is used for the parallelism: multiple vprocs are available to handle the parallelism. Instead of one person doing one particular task, the task is divided among multiple persons and then it is easily carried out. That is the virtual processor.
      • 27:00 - 27:30 Then, what is the functionality of the parsing engine? As I told you, the parsing engine is the gatekeeper, the first component between the client and the database. It acts as a communicator between the client and the database, through the AMPs. The main activities of the parsing engine: first, session control; it checks the logon and logoff, the authentication. In
      • 27:30 - 28:00 a BTEQ script you check the logon first, right? If you run the BTEQ script, the parsing engine checks the logon, whether the account is valid. Sometimes a system account may get locked, or the account may be expired; at that time the parsing engine itself stops it. Sometimes when you execute a job you get an error that your session has expired, the logon has expired; a minimal BTEQ logon sketch is below.
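
      A minimal BTEQ logon sketch for the flow described above; the tdpid and username are hypothetical, and BTEQ prompts for the password when it is not supplied on the .LOGON line.

```sql
/* BTEQ prompts for the password when it is not given on .LOGON */
.LOGON tdpid/username
SELECT SESSION;   /* confirm the session was established */
.LOGOFF
.QUIT
```
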
      • 28:00 - 28:30 At that time it does not even go into the database; the parsing engine itself authenticates and then tells you what is there. Then the SQL parsing: if you are giving some joins or subqueries, say SELECT * FROM something, it converts that SELECT * FROM into a machine-readable language. Then the error handling: if your query has a syntax error, it will not
      • 28:30 - 29:00 even go into the database; the parsing engine itself identifies the syntax error and throws it. So, error handling, and then the execution plan. The execution plan is nothing but: how can I execute this particular query, and in which way? If you are executing the same query daily, it stores that execution plan in a particular place. If you run the same query on a daily basis, it will not
      • 29:00 - 29:30 check the execution plan daily; it checks the execution plan one time and stores it, and next time it has the hash value, so based on the hash value it takes that execution plan and proceeds with it. Then it dispatches the query to the AMPs: whatever query we pass, it dispatches to the particular AMPs, and then it proceeds to return the result set to the client. That is the functionality of the parsing engine. Then
      • 29:30 - 30:00 the BYNET, the Banyan Network. This is what we call the high-speed interconnect: the communication between all the nodes and everything, linking the vprocs and nodes; for the signaling and messaging we go for the BYNET. Then we go to the AMP. The AMP is nothing but the Access Module Processor; each and every AMP is the main processing unit for the arithmetic operations, and within
      • 30:00 - 30:30 the vproc the AMP is the main component. It manages the database, handles the file tasks, and manipulates the data. You also have the locking system: table locking. If you want to read data from a table, you take a read lock; if some process is writing data into a table, it takes a write lock, and then no one else can write into the table at the same time.
      • 30:30 - 31:00 Read lock and write lock: we will see what that locking system is (one example follows below). Then joining, sorting, aggregations, journaling, and the space management are all taken care of by the Access Module Processor. Then the storage: storage is taken care of by the storage layer. We also have one software component we call the PDE. It sits on top of our OS; for example, in real time Teradata is installed on top of a Linux OS, and Teradata needs to communicate with the OS:
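
      One locking-related example for the read and write locks mentioned above: Teradata's LOCKING modifier lets a report query read with an ACCESS lock (a "dirty read") so it does not block concurrent writers; the table name is hypothetical.

```sql
-- Read with an ACCESS lock instead of the default READ lock,
-- so this query does not block or wait on concurrent writers:
LOCKING ROW FOR ACCESS
SELECT *
FROM   sales_db.orders
WHERE  order_id = 1001;
```
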
      • 31:00 - 31:30 the Linux OS and the Teradata database need to communicate, and this PDE, the Parallel Database Extension, gives us that connectivity between the OS and the database. Here I have installed it on Windows, but in real time it will be installed on Unix only, not on Windows. On Windows we have it only
      • 31:30 - 32:00 for practice. For example, this database server will be mounted on a Unix OS. On our system it is mounted on an OS called Windows, but in real time Teradata will be installed on Unix, Linux, or Red Hat; it can work with any one of those operating systems. The interaction between the OS
      • 32:00 - 32:30 and the database is taken care of by this PDE, the Parallel Database Extension. So now we will start with the indexing. I told you the data is distributed automatically; we saw in the features that the parallel distribution happens automatically. How will the data be distributed automatically? Based on the index only. You may create an index, or, if you do not, Teradata itself will automatically take the first
      • 32:30 - 33:00 column of the table as the indexing column. Sometimes, in real time, if you do not create the indexing, the table is taken as a NoPI table. NoPI is nothing but no primary index. In our database environment it has been set up such that if you do not mention any primary index, the table is taken as a NoPI table; nowadays they are using NoPI, but it will
      • 33:00 - 33:30 differ from project to project: in your project they might be using the first column as the primary index. If you do not use a primary index, by default the first column is taken as the primary index (a NoPI example is sketched below). Now assume you have some data in a table; based on the data, it will be distributed. Assume you have a customer table with a customer id, and the customer ids are unique numbers.
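
      For the NoPI concept above, a minimal DDL sketch, assuming Teradata 13 or later; the names are hypothetical.

```sql
-- Explicitly create a table with no primary index (NoPI),
-- supported from Teradata 13 onward; names are hypothetical:
CREATE TABLE sales_db.stg_events
(
  event_id INTEGER,
  event_ts TIMESTAMP(0)
)
NO PRIMARY INDEX;
```
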
      • 33:30 - 34:00 You want to store this data into the database. Each and every record is unique, right? So based on that unique value there is an algorithm to divide the data across the different AMPs, something like a modulus function. If I have four different AMPs, the data will be divided across the four AMPs: say this row is going to AMP 1,
      • 34:00 - 34:30 this one goes to AMP 4, this one goes to AMP 3, then again one goes to AMP 2, and so on; then it repeats for the next four records. AMP 1, AMP 2, AMP 3, AMP 4: it will be repeated, so the records are distributed in parallel. This is what indexing is: based on the indexing, one
      • 34:30 - 35:00 record goes to one particular AMP. It will not go to the same AMP again if the same record is not repeating; it goes to different AMPs. If one record comes here and goes to AMP 1, the next record goes to AMP 3, then AMP 4, and so on; it is not necessarily in order, it might go a different way too. This is called the distribution of the data; it is distributed in parallel. So
      • 35:00 - 35:30 when reading the data, AMP 1 reads its data and at the same instant AMP 4 reads its data and AMP 3 reads its data; at one single instance four rows are read, at the second instance four more, and at the third instance four more. So in three units of time it processes the total records. In Oracle, serially, only three records would be processed in that time, but in Teradata many records are processed in parallel. Got it?
      • 35:30 - 36:00 This is what a one-AMP operation is like. Why? Because a given record is stored on only one AMP; it is not stored on the other AMPs. That is called a one-AMP operation. If it is not a unique primary index, it can have duplicates also: the same value can be repeated. Whatever AMP it was loaded to,
      • 36:00 - 36:30 the same AMP processes it; it goes again to the same AMP, not to a different AMP. So if the same record comes again, it goes to the same AMP only, say AMP 4 only. This is taken care of by the hashing algorithm; I will show you the algorithm. The algorithm maps the value to a particular AMP, and it goes to the same AMP. So whether it is a UPI or a
      • 36:30 - 37:00 NUPI, it is a one-AMP operation. Got it? So what is meant by a one-AMP operation, and what is a two-AMP operation? The two-AMP operation comes with the secondary index; I will tell you about it. Now look here at the primary index. The primary index is nothing but the mechanism to assign a value to a particular AMP, a particular disk: the access path for the particular row.
      • 37:00 - 37:30 The entire table is going to be distributed across the AMPs based on it. While creating the table you have to mention which column is the primary index. You can have a combination of columns also: you can combine two or three columns as one primary index, and the combination of values will be stored on a particular AMP. Only one primary index, but you can have multiple columns in the primary index; only one primary index per
      • 37:30 - 38:00 table. The primary index value can be unique or non-unique. If it is unique, then it will be a unique primary index (UPI); if it is non-unique, it is a non-unique primary index (NUPI); that is the other term we use. Also, the primary index
      • 38:00 - 38:30 value can be null: if you are making a column the primary index, it can accept null values also. But all the null values will go to one particular AMP, and if more duplicate values are coming, all those rows go to the same AMP; that is what skewness is. For example, if one value comes many more times, all those rows go to the same AMP, say AMP number four,
      • 38:30 - 39:00 the same disk. This is called skewness: one disk will have many more rows than the others. We call it a skewed table. If you are not selecting a proper index, see here, the records go to the same AMP; this is what we
      • 39:00 - 39:30 call skewness. Skewness is the statistical name for how the data is distributed among the disks. We will come to that point later: skewness will happen if you do not select the correct index. If you think one column will have more duplicate values, you go for one more column, say customer id and mobile number, and
      • 39:30 - 40:00 create the indexing on the combination of those columns. That is why in real time they have a combination of columns for the primary index: they will say PRIMARY INDEX of these two columns, like the customer id and the mobile number; they may give even a third column. The combination of those columns will not have duplicates, right? Even if one customer has two mobile numbers, the rows will
      • 40:00 - 40:30 go to two different AMPs, not to the same AMP. That is why the data will be evenly distributed and you will not get any skewness. If you get skewness, one AMP will have more operations, and then it struggles to fetch the data and also struggles to load the data. That is what skewness is. Got it? You can measure the per-AMP distribution with the query below.
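
      A common skew check for the discussion above: counting rows per AMP via the hash functions; the table and PI column names are hypothetical.

```sql
-- Rows per AMP for the (hypothetical) PI column; a very uneven
-- count across AMPs indicates the table is skewed:
SELECT HASHAMP(HASHBUCKET(HASHROW(customer_id))) AS amp_no,
       COUNT(*)                                  AS row_count
FROM   sales_db.customer
GROUP  BY 1
ORDER  BY 2 DESC;
```
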
      • 40:30 - 41:00 Can the primary index value be modified? If you have already created the table, the value can be modified: a value can be updated to some other value. But after creating the table and loading data into the table, you cannot modify the indexing. If the table has records, you cannot. For example, first you created the table with PRIMARY INDEX (customer_id); after some time you want to change it from the customer id to the mobile
      • 41:00 - 41:30 number, or you want a combination of columns in the indexing. The only thing is, either you can drop and recreate, or, if it is a final table, you take a backup of the table, delete all the records from the table, modify the indexing, and then restore the data from the backup table to the normal table. That is what you do; we will do this in real time and I will show you.
      • 41:30 - 42:00 So we cannot change to a different column as the primary index after creating the table if the table contains data. If the table does not contain any data, it is a blank table: say you have created the table with PRIMARY INDEX (customer_id) and you have not loaded data as of now, then you can modify it. If it has data, you cannot modify it. So what do we do? We take the backup of the table,
      • 42:00 - 42:30 delete all the records from the table, and modify it; or you can drop and recreate also, no issues, since we have created the backup. After that we restore the data from the backup back into the table. This is what we do in real time; the steps are sketched below. Sometimes the DBA will tell us that a table is going for more skewness and ask us to look into the issue.
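
      A hedged sketch of the backup-and-recreate flow described above, assuming hypothetical names and a two-column target PI; sites may add journaling, stats collection, or other steps.

```sql
-- 1. Back up the populated table (names are hypothetical):
CREATE TABLE sales_db.customer_bkp AS sales_db.customer WITH DATA;

-- 2. Drop and recreate with the new primary index:
DROP TABLE sales_db.customer;
CREATE TABLE sales_db.customer
(
  customer_id   INTEGER,
  mobile_number VARCHAR(15)
)
PRIMARY INDEX (customer_id, mobile_number);

-- 3. Restore the data from the backup:
INSERT INTO sales_db.customer
SELECT * FROM sales_db.customer_bkp;
```
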
      • 42:30 - 43:00 At that time you go for this operation. Sometimes you will also have a spool space issue; I will tell you what we need to do for that. For skewness, this is what we need to do: you check the table and how the index has been chosen; you check that particular indexing column for duplicate records. If you have many duplicate values in that particular index column, you can assume there will be skewness, so you have to choose a different
      • 43:00 - 43:30 indexing. So this is the primary index, and these are the rules of the primary index. As I told you, the employee id can be the primary index: you can create it on the employee id alone, or you can create a combination for the indexing, like employee id, PAN number, and date of joining; a combination like first name, last name, and date of joining will not repeat.
      • 43:30 - 44:00 Again, the primary index can be divided into two different types: the unique primary index and the non-unique primary index. Unique primary index in the sense that if you assign a column as a unique primary index, that particular column will not allow duplicates; it is like a primary key. If you assign a column as a unique primary index, that column will not allow duplicate records. A non-unique primary index is nothing but simply creating a primary index; that is the non-unique form. The
      • 44:00 - 44:30 primary index value can be null. But after creating the table you cannot modify the index; if you want to modify it, then you have to go for what I told you: taking a backup of the table. Also, a primary index can go up to a combination of 64 columns. As I told you, the same way you go for combinations
      • 44:30 - 45:00 like employee id plus two, three, or four columns, you can go up to 64 different columns. And how can I create the index? In the CREATE TABLE structure, at the end, you write PRIMARY INDEX of the column names: either one column or two columns, up to a combination of 64 columns (a minimal example is below). That is the indexing. As for the significance of the primary index, how the hash value is generated and how the data is distributed to the different AMPs, I will tell you in the next session.
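
      A minimal DDL example for the syntax just described, assuming hypothetical names; the UNIQUE keyword makes the declared index a UPI rather than a NUPI.

```sql
-- PRIMARY INDEX is declared at the end of the CREATE TABLE DDL;
-- UNIQUE makes it a UPI (no duplicate employee_id allowed):
CREATE TABLE hr_db.employee
(
  employee_id INTEGER,
  first_name  VARCHAR(50)
)
UNIQUE PRIMARY INDEX (employee_id);
```
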
      • 45:00 - 45:30 Hi everyone, welcome to NIC IT Academy. In this series we are learning Teradata completely. In today's session we are going to learn about the unique primary index: what a UPI is and how it
      • 45:30 - 46:00 can be created, what the secondary index is, the unique secondary index, the non-unique secondary index, what the partitioned primary index is, what skewness is, what one-AMP and two-AMP operations are, and the Teradata data protection methods. We have many concepts in the data protection methods: we have node-level, AMP-level, and
      • 46:00 - 46:30 disk-level protections; we have the RAID concept, the fallback concept, and the clique and lock concepts. That is what we are going to learn, so keep watching the session without skipping it. If you haven't subscribed to our channel, please subscribe and click on the bell icon so that you will be getting all the notifications. Thank you; let us start our session. The primary index is nothing but: whenever you are creating a table, if you
      • 46:30 - 47:00 mention one, it accepts that primary index; if you do not mention one, it takes the first column as the primary index. But after Teradata 13 we have the NoPI concept, so nowadays in real time, if you are not mentioning a primary index, the table may be taken as a no-primary-index table, NoPI. Again, it is based on
      • 47:00 - 47:30 how the company has defined it at the admin level; if you do not mention one, the first column is taken as the primary index, and that is the normal default scenario. You can even go for two or three columns for the indexing, and up to 64 columns for the index. So how will the indexing work? If you are creating an index, a primary index or a unique primary index, it will be a one-AMP operation, as I told you in the morning. How will the one-AMP operation be? Assume
      • 47:30 - 48:00 this is a record from the table on the client machine; we are getting records from an application or from the client. The id column has values like 2, 32, 67, 12, some id values we are getting from the client, from the outer world into Teradata. Then how will it be processed? It is based on the indexing: whether it is a UPI or a PI, Teradata will create the
      • 48:00 - 48:30 hash. The parsing engine will apply the hash algorithm first, based on the primary index: for each and every record Teradata creates one hash value. Say for this particular record it creates a hash value of 646. It will be maintained in one table called the hash map. The hash map has the hash value, and each and every record has its hash
      • 48:30 - 49:00 value; based on the hash value, the hash map knows which AMP that particular record belongs to. So, based on the value, if it is a unique primary index or a primary index without duplicates, different records get different AMPs; it assigns different AMPs, and if all the records are unique, the data is evenly distributed. If you are getting a duplicate value in the column, then one
      • 49:00 - 49:30 single AMP will have more rows. Why? Because the hash value will be the same for the same value: if I am getting 32 again, the hash value generated will again be 646, the same hash value, and if it generates the same hash value, the row is assigned to the same AMP; it goes to the same AMP. That is why, if you are getting many duplicate values in the column, one single AMP will be getting skewed.
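
      You can see the determinism described above directly with Teradata's hash functions; the literal 32 echoes the example value in the transcript, and the same input always yields the same row hash, bucket, and AMP.

```sql
-- The same input value always produces the same row hash, the same
-- hash bucket, and therefore the same AMP (the root cause of skew
-- when a NUPI column holds many duplicates):
SELECT HASHROW(32)                      AS row_hash,
       HASHBUCKET(HASHROW(32))          AS bucket_no,
       HASHAMP(HASHBUCKET(HASHROW(32))) AS amp_no;
```
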
      • 49:30 - 50:00 This is what we call a skewed AMP. It should not happen; that is why we go for a combination of columns. If you go for a combination of columns as the primary index or unique primary index, that combination will not have many duplicate values, right? That is how we choose the primary index to avoid the skewness, and this is how the hashing algorithm is maintained. For example,
      • 50:00 - 50:30 we are not going to take care of the rest of the DDL here; our aim is just to create the indexing, that's all. Finally, after mentioning all the columns, you have to close the bracket; here we opened the bracket, so close the bracket here, and then say which columns form the primary index: PRIMARY INDEX of country_code and center_local_id. country_code is the first column, and
      • 50:30 - 51:00 center_local_id is one of the other columns in the table. We have to mention it in this way (see the sketch below). Then Teradata will create one hash value, it is like a number, and based on the hash value it assigns the row to an AMP. Either way, a primary index or a unique primary index is a one-AMP operation, because it goes to one single AMP: retrieving as well as storing the data goes for a one-AMP operation.
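
      The composite (multi-column) primary index form just described, as a hedged sketch; the column list is shortened and the database and table names are hypothetical.

```sql
-- Composite NUPI on the two columns named in the walkthrough;
-- the bracket closes before the index clause:
CREATE TABLE ops_db.center
(
  country_code    CHAR(2),
  center_local_id INTEGER,
  center_name     VARCHAR(100)
)
PRIMARY INDEX (country_code, center_local_id);
```
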
      • 51:00 - 51:30 While creating the table itself, you have to say whether it is a primary index or a unique primary index. If you mention UNIQUE PRIMARY INDEX, that particular column will not accept duplicate values; it is like a primary key. Normally in Teradata we will not create a table with a PRIMARY KEY constraint; we create a unique primary index instead. Got it? If you assign a column as a unique primary index, that particular column will not accept any duplicate
      • 51:30 - 52:00 values. So normally a unique primary index will definitely not have any duplicate records, so all the AMPs will be evenly distributed. The hashing algorithm is created for each and every record, and based on it, retrieval as well as storage goes to the particular AMP holding the data. So this is the unique primary index; I have explained how the data behaves with a unique primary index. If you are not
      • 52:00 - 52:30 selecting the primary index correctly, then one AMP will have more data, one particular AMP will have more data, and then we call it skewed. To avoid that we have to choose the correct index. With skewness the data is not evenly distributed, and retrieving as well as storing the data goes to one particular AMP. Sometimes the Teradata admin team will send mails like: this table has been highly skewed,
      • 52:30 - 53:00 look into it. Then we have to check: you run SHOW TABLE on the table (see the sketch below) and it shows you the CREATE TABLE structure. Then you have to check which column has been chosen as the primary index, go to the table, and check whether we have duplicate records for that particular column or not, and how many duplicate records we have, for the single column or the combination of columns. Then you decide:
      • 53:00 - 53:30 okay, this should not be the primary index. Then you create a backup of the table and create another table with some other column as the primary index; you change the primary index and again store the data from the old table into this table. That is the way you can avoid that skewness. It is a very rare scenario after creating the table; at the initial stage itself we have to check the tables and the column cardinality, and based on that we have to choose the primary index. Then the secondary index.
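
      The inspection steps above, sketched with hypothetical names: SHOW TABLE reveals the current primary index, and a GROUP BY counts the duplicates on the PI column.

```sql
-- Inspect the DDL to see which column is the primary index:
SHOW TABLE sales_db.customer;

-- Count duplicates on the PI column; large counts explain the skew:
SELECT customer_id,
       COUNT(*) AS dup_count
FROM   sales_db.customer
GROUP  BY customer_id
HAVING COUNT(*) > 1
ORDER  BY dup_count DESC;
```
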
      • 53:30 - 54:00 Say you have created one table with a primary index, but you are querying the table with a different column. For example, take this customer table: assume I have a customer table with a customer id and customer name, another column called mobile number, then a city column,
      • 54:00 - 54:30 and a date column. Those are the columns I have. You fetch the data based on the customer id; that is why you have given it the primary index. But sometimes you will be fetching the data based on the mobile number also: SELECT * FROM the table WHERE mobile_number = something. In this way you are frequently accessing another column, not the primary index column; either this column or that column you will be accessing. The customer id you have
      • 54:30 - 55:00 already created as the primary index, and you cannot create one more column as the primary index, right? Assume it is a unique primary index. Then you have to create one more indexing column separately. Why? Because you are separately querying that single column multiple times. For that column you cannot create
      • 55:00 - 55:30 another primary index; if the table already has an index, this column becomes a secondary index. If you create an index on this column, it becomes a secondary index. A secondary index is nothing but an alternate path for the data retrieval as well as the data storage. Got it? Instead of going through the primary index column, you go through a different column (it is created as shown below).
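
      The secondary-index DDL for the scenario above, as a hedged sketch with hypothetical names; dropping the UNIQUE keyword gives a non-unique secondary index (NUSI) instead of a USI.

```sql
-- Unique secondary index (USI) on the alternate access column;
-- omit UNIQUE for a non-unique secondary index (NUSI):
CREATE UNIQUE INDEX idx_mobile (mobile_number)
ON sales_db.customer;
```
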
      • 55:30 - 56:00 Then how will it be maintained in the database? This is a two-AMP operation; we call it a two-AMP operation. Normally the data is stored based on the unique primary index or primary index only: the AMPs distribute the data based on the primary index. For example, here I have some id column, and based on this id column the data is distributed across the AMPs, to different AMPs. But this other column is something like a mobile number, and based on the mobile number you are fetching the
      • 56:00 - 56:30 data. This is an alternate access path; you are fetching the data in a different way. So how will Teradata maintain this? If you are going for a secondary index, it is either a unique secondary index or a non-unique secondary index. If it is a unique secondary index, this is also a unique column; it will not accept any duplicates. If you are going for a non-unique secondary index, then it may have duplicate values. First we will go with the unique secondary index and how
      • 56:30 - 57:00 the unique secondary index will be maintained. If you are going for a unique secondary index, how will the data flow? First of all, this is the algorithm. You take something like SELECT * FROM customer WHERE mobile_number = some number. The mobile number is the secondary index column we are going through; this is not the primary index column, it is a secondary index column. The hashing algorithm creates a hash. Since it is a
            • 57:00 - 57:30 secondary index column how it will be hash map the secondary index a sub table will be created a sub table will be created in all the amp all the sub table will be created and also remember this internal data whenever you are creating the table right whenever you are creating the table like create table table name the structure will be maintained in all them you remember this whenever you are creating a table it will not have one structure
            • 57:30 - 58:00 if i have four amp right forearm four amps will have the same structure of the table okay this also will have the structure of the table this also will have the section of the table this also will have structure of the table this also will be the structure of the table so it will have the id called two this will have id called three this will have id called some four or five something okay so if i have four record the same structure will be maintained in all the tables all the amp but you'll
            • 58:00 - 58:30 have one one record whenever you are creating an uh table right the structure will be created on all the table all the amp okay the data will be saved based on that uh record which we are passing it the same way if you are creating a secondary index okay secondary index a sub table will be created on all the all the uh okay data will create a sub table in all them okay so all the amp it will create
• 58:30 - 59:00 For example, if you query mobile number equal to 998..., a hash value is computed for that value, and the hash map tells Teradata where the relevant subtable entry resides: it gives the hash bucket number (say 602) and the AMP number, and the subtable entry carries the base row id. The base row id points to where the actual row
• 59:00 - 59:30 is stored, and the actual row is stored on an AMP according to the primary index, right? Yes. But we are querying the table by the alternate path, so Teradata first has to find the subtable entry. For example, the subtable entry may sit on AMP number two, and that entry records where the base row has been stored.
• 59:30 - 60:00 Listen carefully: the hash map says the hash bucket number is 602 and the subtable entry lives on AMP number two, with a base row id of 778. Teradata goes to AMP number two and checks its subtable; remember, every AMP has a subtable. That AMP's
• 60:00 - 60:30 subtable entry for hash value 602 says the base record is present on AMP number four. The hash bucket number alone does not say where the record is located; it is the subtable information that tells you where the base
• 60:30 - 61:00 row lives. So it returns AMP number four and base row id 778; Teradata goes to AMP number four, looks up row id 778, picks the data, and returns the result. For that mobile number 998..., this is the record it gives you. So this is a two-AMP operation: not only the AMP holding the data but also AMP number two holding the subtable entry. With the primary index alone it is a
• 61:00 - 61:30 one-AMP operation; with a secondary index it becomes a two-AMP operation. That is the unique secondary index case. If you have duplicate values, a non-unique secondary index, then no additional subtable row is created for the duplicate. For example, you have the same value 998 twice;
• 61:30 - 62:00 it is a duplicate, correct? For those two records, this being a non-unique secondary index, the secondary index value hashes to the same hash value, but the base row ids differ: 778 and another row id. The same subtable row simply holds 778 plus the other row id,
• 62:00 - 62:30 say 6640. Teradata follows both row ids and picks both records. It does not create two subtable rows for a non-unique secondary index; only one row is created in the subtable. Got it? This is also a two-AMP operation. That is what is explained on this slide; if you go through it, you can get it clearly.
• 62:30 - 63:00 Here we may get duplicate records: we are fetching on the name column, select star from customer where name equals 'Adams', and there might be two people named Adams; see here, Adams appears twice. Customer id is the unique primary index and name is the secondary index. Adams appears in two rows, with row ids 222 and 115,
• 63:00 - 63:30 so the subtable holds both 222 and 115 against the same entry, on the same AMP. Based on that it picks the records: it goes to that AMP, reads the subtable entry, then follows row ids 222 and 115 and picks those records from whichever AMPs hold the base rows. That is how retrieval works
• 63:30 - 64:00 for the non-unique secondary index. Got it? So we have covered the primary index and unique primary index, plus the secondary index in both forms: non-unique secondary index and unique secondary index. The join index concept I will tell you about after some time. The next one is the partitioned primary index, which is another kind of index.
• 64:00 - 64:30 With partitioning we stay on the same AMP: we distribute the rows across AMPs as usual, and while distributing them we also partition them inside each AMP. A partitioned primary index (PPI) is the distribution of rows into different partitions for faster retrieval. Where do we create a PPI? Typically on a date column, an order date or transaction date;
• 64:30 - 65:00 on that column we create the PPI. And yes, this must be done while creating the table; we have to mention the partitioning in the create statement. What is the syntax? We mention the primary index, say primary index on order number, then partition by a range: order date between
• 65:00 - 65:30 this date and that date, each interval seven days. That example runs for one year, 1st January to 31st December, so each partition holds seven days of data; normally we go for one day, so each day sits in its own partition. Assume order number is the primary index and order date is the partitioning column.
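A minimal sketch of the syntax being described, with assumed table and column names and an assumed date range:

```sql
-- Hypothetical orders table: rows hashed to AMPs by order_no,
-- then partitioned inside each AMP by order_date
CREATE TABLE retail.orders (
    order_no   INTEGER,
    order_date DATE,
    amount     DECIMAL(12,2)
)
PRIMARY INDEX (order_no)
PARTITION BY RANGE_N (
    order_date BETWEEN DATE '2024-01-01' AND DATE '2024-12-31'
               EACH INTERVAL '7' DAY
);
```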
• 65:30 - 66:00 For example, say we created seven-day partitions, or one-month partitions; now we fetch the data based on order date. We have four AMPs here, and rows are distributed across them based on the UPI, the order number: row 1 here, 2 here, 3 here, 4 here, then 5, 6, 7, 8, all distributed based on
• 66:00 - 66:30 the UPI. But we also declared one more column, order date, as the partitioning column. If you don't create the partition and you fetch by order date (select star from the table where order date equals some date), then since order date is not the primary index, Teradata cannot tell where that data resides; say the green rows are January and the yellow rows are February,
• 66:30 - 67:00 they are not stored contiguously. But if you create a partitioned primary index, then within each AMP the rows are partitioned: the January rows together at the top, the February rows below. Normally we partition a transaction date or order date with each day in its own partition. The rows are still distributed across AMPs by the
• 67:00 - 67:30 primary index column only, but within each AMP we partition. And as I told you, every AMP holds its own slice of the table; within that slice we go for the partition. That is the partitioned primary index, the PPI column, and that is the syntax you have to follow. If you want to check the syntax of existing tables in real time, here is how to check
• 67:30 - 68:00 it. Say you have a transaction table or some other table; you can check it like this: show table table-name. It gives you the DDL, for example for an order table. Normally you have to mention the database name too, to say which database the table is in: database-name dot table-name. In our system we have several databases, db1, db2,
• 68:00 - 68:30 or something like a production database with its tables. So you can use show table that way, or you can go for show select star from the table-name. That also gives you the structure of the table, the create table statement exactly as it was written, so you can check how they created it.
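For instance (database and table names assumed):

```sql
-- Show the full CREATE TABLE DDL for an existing table
SHOW TABLE production_db.order_table;

-- Equivalent trick: derive the DDL from a query against the table
SHOW SELECT * FROM production_db.order_table;
```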
• 68:30 - 69:00 There you will see the primary index and the partition by clause; that tells you it is a PPI table. The large tables, transaction tables, sales tables, invoice tables, the fact tables, are definitely where a partitioned primary index gets created. Now, about indexing: what is the disadvantage of a secondary index? It creates a subtable; that is the disadvantage. And if you also go for
• 69:00 - 69:30 fallback, the fallback table maintains the data on a different AMP as well, so the subtable information becomes yet more overhead on top of that. A PPI also needs additional space, roughly 2 extra bytes per row, because the partition information is carried with every row id on every AMP; so it is again a storage
• 69:30 - 70:00 overhead. That is the indexing concept; you will see it again when we look at the create table structure. Now we will look at data protection: what data protection is, and then the RAID concept. Data protection is about how we are going to protect our data, about fault tolerance: if something
• 70:00 - 70:30 goes wrong in our data environment, how are we going to retrieve the data? There are several different levels of failure. It can be a disk failure (the storage fails), or a node failure (the entire node goes down), or an AMP failure: one AMP fails, and how do you retrieve that AMP's data? The rows stored on that one
• 70:30 - 71:00 AMP cannot be retrieved while it is down. There is also the transaction level: you run some insert operation and something happens partway through, so only part of the data gets inserted; how do you recover the rest? And the object level, which is the table level. We have different methods for each. The first method is RAID, which is disk-level data protection.
• 71:00 - 71:30 RAID stands for redundant array of independent disks. RAID is really a general concept; there are RAID 1, RAID 2, RAID 3, RAID 5 and so on for database environments, and Teradata has adopted RAID 1 and RAID 5 for disk failure.
• 71:30 - 72:00 RAID 1 is disk mirroring. The AMPs have their virtual disks, and in Teradata those disks are mirrored; storage from vendors like EMC is typically used for this. Disk mirroring is the RAID 1 concept. RAID 5 is a different concept: data parity protection. The idea is that you have
• 72:00 - 72:30 an array of data but you do not query all of it equally. You frequently query only recent data; you will not frequently query five or ten years of history. Data that you rarely query can be maintained on separate disks, protected by parity rather than mirrored, while
• 72:30 - 73:00 whatever you query frequently is mirrored. For example, on a booking site like IRCTC you mostly look at the last few months; ten years of history is still kept, but on disks that are not mirrored. Only the frequently accessed data is mirrored; that side of the setup is the RAID 5 concept,
• 73:00 - 73:30 whereas with RAID 1 all of the data is mirrored. RAID 1 means mirroring and duplexing: the data is duplicated on a second disk. Assume RAID 1 with disk duplexing: you have disk one, disk two, disk three, and for mirroring
• 73:30 - 74:00 each one has one more disk. The DAC, the disk array controller, keeps track: if the actual disk fails, it redirects reads to the mirror disk. The data is stored on both disks. Remember, if you go for mirroring, then for one terabyte of data
• 74:00 - 74:30 you need two terabytes of disk: the data is fully replicated on the mirror disk. Whatever data lands here is mirrored there; we do not run replication or take backups ourselves, the DAC writes both copies in parallel, and if you delete a record here it is automatically deleted there too. So if you have 100 terabytes of data
• 74:30 - 75:00 and you go for the RAID 1 concept, you need 200 terabytes of disk: double the storage. That is RAID 1, mirroring: disk one has mirror disk one, disk two has mirror disk two, disk three has mirror disk three, a complete
• 75:00 - 75:30 mirror pairing of disk zero with disk one and so on. And either one disk controller or two controllers can be used; duplexing with two controllers is the bigger advantage, since each maintains a separate path, and if one path fails the other path carries the data.
• 75:30 - 76:00 That is the RAID 1 concept. RAID 5 is parity: the disks are laid out the same way, disk one, disk two, disk three, but here it is striping with parity. The data is divided into an even number of blocks with an additional parity block. For example, I have blocks a1, b1, c1: the 'a'
• 76:00 - 76:30 blocks are striped across different disks, with a parity block on another disk that covers all the 'a' data wherever it sits. Similarly c1, c2, c3 sit on different disks with their parity elsewhere, and the 'd' blocks likewise have their parity block maintained on yet another disk. This is how you reduce
• 76:30 - 77:00 the storage cost: if any one of the drives fails, the data is recovered from the remaining blocks plus the parity information. (With four one-terabyte disks, parity costs you one disk's worth, leaving three terabytes usable, versus two terabytes if the same four disks were mirrored.) That is the RAID 5 concept, though normally they go for RAID 1.
• 77:00 - 77:30 These disks are organized as a logical unit, a LUN: one logical unit holding all the blocks plus the mirror or parity block. That is the RAID concept, and it covers disk-level failure. For disk-level failure you have RAID; for node-level failure you have the hot standby node; for AMP-level failure you have fallback. What is fallback? Assume you have four AMPs: AMP zero, AMP one, AMP two, AMP three. If one AMP fails, the data
• 77:30 - 78:00 stored on that AMP cannot be retrieved: we fetch rows through their owning AMP, and although the table structure is maintained everywhere, the failed AMP's rows are unreachable. It becomes difficult to retrieve the data, and for that reason, when creating a table, we create it with fallback. In our environment, whenever we create a table,
• 78:00 - 78:30 it is created as a fallback table, even though the product default, if you mention nothing, is no fallback. By default you will normally find tables with no fallback, but in real time, in our environment, we have fallback tables. So what is a fallback table? The mirroring happens at the AMP level. For example, record 62 arrives on one AMP,
• 78:30 - 79:00 and the same record 62 is also maintained on some other AMP, among the fallback rows: 62 is here, and its fallback copy is there. If this AMP fails, then records 62, 8, and 27 are each available on other AMPs: 62 on this one, 8 on this one, 27 on that one. Got it? Even if one AMP fails, the data
• 79:00 - 79:30 can be retrieved from the other AMPs; this is called fallback. When creating a table you should mention it, the default being no fallback. Fallback is again a storage overhead, but retrieval becomes easy, so whenever you have critical data, sensitive information, we create it as a fallback table. The next level is the node level.
• 79:30 - 80:00 At the node level we have the clique. As I told you, the nodes, node one, node two, node three, form a cluster, a horizontal way of organizing the system. One SMP cabinet holds several nodes with numbers like SMP001-2 through SMP001-9, then 10, 11, 12 and so on; you have different numbers like that. One SMP
• 80:00 - 80:30 unit, then, has node numbers zero, one, two, three, up to nine. All the nodes in a clique are grouped by sharing the disks: a node's disk controller writes to its own disk and is also connected to the other disks (which can be mirrored as well). That shared group is one clique. If any one of the
• 80:30 - 81:00 nodes fails, what happens to its work? Say this is node one, with node two, node three, node four alongside; you always have one hot standby node. Whatever activity was being handled by node one is transferred to the hot standby node, so all of its
• 81:00 - 81:30 activity moves there and the hot standby effectively becomes node one. After that we rectify the failed node, and once it is brought back up it acts as the new hot standby; it does not become node one again. If any node fails, whatever the database was processing on it is transferred to
• 81:30 - 82:00 the standby node, and the repaired node later takes over the hot standby role. That is how the clique concept is maintained: in case of a node failure, the work and the disks it handled are taken over by another node, so there is no outage. The locking system I will explain after some time, how locking works while we are creating tables; locking I will cover separately. Here we
• 82:00 - 82:30 have seen that the disk level is RAID, the AMP level is fallback, and the node level is the clique with the hot standby node. That is data protection in Teradata; sometimes in interviews they might ask about this concept.
• 82:30 - 83:00 Now consider this table. It's a simple
• 83:00 - 83:30 create table, and we have columns of integer, varchar, and date types. I'm going to create this table; I didn't mention any indexing here. If I run this create table directly, it would go to DBC,
• 83:30 - 84:00 and you will not have access to DBC directly. Normally you have to mention the database in which you are creating the table. In DBC (I told you, it stands for data base computer) we have a lot of databases, and you have to qualify the table name with one of them. I'm going to mention the database name retail: retail dot the table name,
• 84:00 - 84:30 and the table name I'm using is employee_28. So I'm creating the table in the database called retail; if you don't mention the database name, it will not be created where you want. Below, it shows you whether the table has been created or not: it has been created, with zero rows. After creating a table, if you want to check it, you can use select star from
• 84:30 - 85:00 the table name. It shows you the (empty) table: select completed, zero rows returned. And if you want to see the structure of the table, how it was created, as I told you yesterday, you can use show select star from the table name, and it shows you the full create table statement.
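Roughly what is typed in the demo; the exact column list isn't fully audible in the video, so this layout is an assumption:

```sql
CREATE TABLE retail.employee_28 (
    employee_no   INTEGER,
    dept_id       INTEGER,
    first_name    VARCHAR(20),
    last_name     VARCHAR(20),
    salary        DECIMAL(10,2),
    date_of_birth DATE
);

SELECT * FROM retail.employee_28;       -- select completed, zero rows

SHOW SELECT * FROM retail.employee_28;  -- prints the CREATE TABLE DDL
```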
• 85:00 - 85:30 Since we wrote a simple create table table-name, by default it creates a SET table. A SET table will not allow duplicates, but note that SET table and unique primary index are different things: a unique primary index disallows duplicates in that one column alone, while a SET table checks all the columns; the same full row must not be repeated.
• 85:30 - 86:00 So by default it creates a SET table, with NO FALLBACK (fallback we saw yesterday), and BEFORE JOURNAL / AFTER JOURNAL; the journal concept I will tell you about after some time. CHECKSUM is an OS-related storage setting and is always DEFAULT. Then employee number is INTEGER and so on, and the first name column is a VARCHAR
• 86:00 - 86:30 of 20, so it also gets a character set. Normally we have two character sets, Latin and Unicode, and a column is created as one or the other. Metadata tables such as those in DBC are created with Unicode (so international characters are not rejected), whereas user tables are created with character set Latin. I will show you a
• 86:30 - 87:00 metadata table so you can see the difference. I'm going to use select star from dbc.tablesv. DBC.TablesV gives you the whole table list. TablesV is a view: we do not hit the underlying DBC tables directly, we go through the view they created on top of them. These are system tables;
• 87:00 - 87:30 I'm just opening one. In our demo table we did not define the character set explicitly, so it defaulted; these metadata tables, though, are created with character set Unicode, while user tables default to Latin. You will also see NOT CASESPECIFIC, which means the column is case-insensitive: if you
• 87:30 - 88:00 store data in any mix of cases, there is no issue; comparisons ignore case. So in the created table you will find the two character sets, Unicode and Latin, then the date format (dates must be given in the declared format), and the case specificity.
• 88:00 - 88:30 It shows NOT CASESPECIFIC; if you want it case-specific, you have to change that and recreate. The primary index it has taken is employee number: if you don't specify one, the first column is taken as the (non-unique) primary index. So that is create table; you can inspect it with show select star from, or with show table table-name.
• 88:30 - 89:00 On top of this table you can create a view: create view, or replace view, or create-or-replace in one statement. Using the same database, I'm creating a view retail.employee_28_v as select star from the table. If you want
• 89:00 - 89:30 different columns, you can pick just those columns in the view. Here I'm going for replace view (you may not have the plain create view access). Whenever I insert data into the table, the view reflects it automatically. You can run show view view-name, and it tells you how the view was defined, including which base table was used to create the
• 89:30 - 90:00 view. And if you run show select star from the view, it shows you the base table DDL as well as the view DDL. In real time, when you want to check how a view was created and which columns were used, you go this way: it shows you the base table and the view together.
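A sketch of the view steps, reusing the assumed names above:

```sql
-- Create (or redefine) a view over the employee table
REPLACE VIEW retail.employee_28_v AS
SELECT * FROM retail.employee_28;

SHOW VIEW retail.employee_28_v;           -- view DDL only
SHOW SELECT * FROM retail.employee_28_v;  -- base table DDL + view DDL
```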
            • 90:00 - 90:30 all those you have to go for this way the base table as well as the view will be showing you so this is like a set table it will not allow any duplicate record it will not allow any duplicate record so for example if you want to insert some record into the table insert into the table name okay so you can go to the values so you can mention the values what are the values you want to have for example you employ a number number i'm going for employee number it's a department id
            • 90:30 - 91:00 then first name so i'm giving one first name last name comma salary i'm giving so i'm giving some salary here and i'm giving data birth so data but you have to give the format like this way this is what you have to give the date format so i cannot give the address like this way you can address close it and execute it so this line you can execute one draw inserted if you go check in the table it's a set table right select start from a table name
            • 91:00 - 91:30 it will give you the result it will give you the result okay but if i'm going to if i'm trying to insert the same record i'm trying to insert the same record it will not allow you why because it's a set table it is a set table the full row duplicate it will not allow but since we have created the employee id as the primary index it will allow duplicate if i am going for this department id 30 so it is not full
            • 91:30 - 92:00 road duplicate now since it is a primary index it will allow one if you have created a table as unique primary index then it will not be allowed you know so you can use it like this and then you can see the two record is available if you are checking the view also you will have the two records automatically get updated the view also so this is what the set table if you have created like multiset table then this record it will allow you
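The behavior just demonstrated, as a sketch (the literal values are assumed):

```sql
INSERT INTO retail.employee_28
VALUES (101, 20, 'John', 'Doe', 50000.00, DATE '1990-05-15');  -- one row inserted

-- Exact same row again: rejected, SET tables forbid full-row duplicates
INSERT INTO retail.employee_28
VALUES (101, 20, 'John', 'Doe', 50000.00, DATE '1990-05-15');

-- Same employee_no but dept 30: allowed, not a full-row duplicate
INSERT INTO retail.employee_28
VALUES (101, 30, 'John', 'Doe', 50000.00, DATE '1990-05-15');
```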
• 92:00 - 92:30 If you create a MULTISET table, then that duplicate record is allowed. For example, I'm going to create this same table as a multiset table, named employee_29. Remember, if you don't mention anything, by default you get a SET table, so you have to explicitly say multiset table. Execute: the table has been created. Now I try to insert the same record into table 29,
• 92:30 - 93:00 and it is inserted; even if I execute it two or three more times, it goes in each time. The table allows it. If you are after performance you go for multiset tables, because they never have to check for duplicate rows on insert.
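A sketch of the multiset variant (same assumed columns):

```sql
CREATE MULTISET TABLE retail.employee_29 (
    employee_no   INTEGER,
    dept_id       INTEGER,
    first_name    VARCHAR(20),
    last_name     VARCHAR(20),
    salary        DECIMAL(10,2),
    date_of_birth DATE
);

-- The identical row can be inserted any number of times; no duplicate check
INSERT INTO retail.employee_29
VALUES (101, 20, 'John', 'Doe', 50000.00, DATE '1990-05-15');
INSERT INTO retail.employee_29
VALUES (101, 20, 'John', 'Doe', 50000.00, DATE '1990-05-15');
```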
• 93:00 - 93:30 In real time you will have stage tables and work tables, which are intermediate tables. First the data goes into the stage table (the landing area for data coming from the outside world into Teradata); from stage to work we do some intermediate calculations; and from the work table we load the final table. You can have multiple work tables, work table one and work table two, and do a lot of transformation logic between them before loading the final table. The final table is an incremental load: data is already
• 93:30 - 94:00 there, and you insert or update on top of it. The work table and stage table, though, are truncate-and-load: delete and load, you delete the existing records and load afresh each run. Again, it depends on the scenario: if you are
• 94:00 - 94:30 doing insert-into-select on a huge volume of records and you do not need a duplicate check, you go for the multiset table; it skips the duplicate check, so performance is faster compared to a SET table. A SET table always checks, for every record you insert, whether that record is already present on the AMP; that is
• 94:30 - 95:00 why it takes a lot of time. So normally we go for multiset tables; only in scenarios where you must forbid full-row duplicates do you go for a SET table. In our practical environment, most of the tables are multiset. To recap: a SET table will not allow full-row duplicates, a multiset table will. Next, the unique primary index.
• 95:00 - 95:30 A SET table, as you already know, rejects full-row duplicates. A unique primary index means that one column alone will not allow duplicates, while the other columns may repeat. You list all the columns, and at the end you mention unique primary index (employee number); then that column will not accept duplicates. I'm going to create this table,
• 95:30 - 96:00 and I can even make it a multiset table; the UPI column will still not allow duplicates. Don't run it against DBC directly; qualify with the database name and run it. The table has been created. Now if I insert a record, it will not allow a duplicate in that particular column: with department 30 I insert the record,
• 96:00 - 96:30 then I try to insert the same record again, and "duplicate unique prime key value" is what we get. If I change this column's value it is allowed; but changing only the other columns does not help (last time you changed the department id; this time it will not be allowed). Why? Because this column alone was created as the unique primary index.
• 96:30 - 97:00 With a plain SET or multiset table the partial duplicate would go through, but here it is a unique primary index on this particular column, so the duplicate record is rejected. That is the UPI column, and access through it is completely a one-AMP operation. If you don't mention anything, it automatically takes a non-unique primary index. A SET table with a unique primary index is what we created just now.
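A sketch of the UPI variant (names assumed, error text paraphrased):

```sql
CREATE MULTISET TABLE retail.employee_30 (
    employee_no INTEGER,
    dept_id     INTEGER,
    salary      DECIMAL(10,2)
)
UNIQUE PRIMARY INDEX (employee_no);

INSERT INTO retail.employee_30 VALUES (101, 30, 50000.00);  -- ok
INSERT INTO retail.employee_30 VALUES (101, 40, 60000.00);
-- fails: duplicate unique prime key error, employee_no must be unique
```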
• 97:00 - 97:30 Next: a table with a primary index plus a unique secondary index. With a primary index and a unique secondary index, the USI will not allow duplicates on the secondary index column. You can even combine a unique primary index and a unique secondary index, both together. Here I'm removing that and creating just a primary index plus a unique
• 97:30 - 98:00 secondary index; whatever combination you want, you can create. The table has been created, and now the USI column will not allow duplicates. Say the rule is one person per department: only one person is allowed in each department. I insert one person, then another person in the same department, and it is not allowed:
• 98:00 - 98:30 "secondary index uniqueness violation" is what you get. Then you have to check which secondary index caused it; run show table table-name and check how the table was created. You can see the primary index on employee number. And note, for a secondary index you do not have to write the word "secondary" at all: if you
• 98:30 - 99:00 mention a PRIMARY INDEX clause followed by an INDEX clause, the first is the primary index and any further INDEX clause is automatically taken as a secondary index; if you want it unique, you write UNIQUE INDEX explicitly.
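A sketch of a primary index plus unique secondary index, the one-person-per-department rule from the demo (table name assumed):

```sql
CREATE MULTISET TABLE retail.employee_31 (
    employee_no INTEGER,
    dept_id     INTEGER,
    salary      DECIMAL(10,2)
)
PRIMARY INDEX (employee_no)   -- first index clause: the primary index
UNIQUE INDEX (dept_id);       -- further index clause: a (unique) secondary index

INSERT INTO retail.employee_31 VALUES (101, 30, 50000.00);  -- ok
INSERT INTO retail.employee_31 VALUES (102, 30, 45000.00);
-- fails: secondary index uniqueness violation, dept_id 30 already used
```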
            • 99:00 - 99:30 primary index so this way it will reduce the skewness the combination of these column it will go for the hash value okay if it is used to create the hash value then it will reduce these skewness so this is multi-column fragment index here you can go for unique primer index also these three combination it will not have a duplicate unique frame index okay i'm going for 34 now
            • 99:30 - 100:00 this combination it will not allow duplicate see here 34 i'm creating it i'm giving some value now this will be allowed okay again so since it is a unique private index i am changing some other number some other number will it allow the employee here change a different number [Music]
            • 100:00 - 100:30 but if you are going for same employee number department number and salary it will not allow so different [Music] location i'm going for it will not allow why because the same employee number department number salary i'm using instead of this salary i'm going for different salary it will now know so this is what combination of
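A sketch of the composite UPI (column names and values assumed):

```sql
CREATE MULTISET TABLE retail.employee_34 (
    employee_no INTEGER,
    dept_no     INTEGER,
    salary      DECIMAL(10,2),
    location    VARCHAR(30)
)
UNIQUE PRIMARY INDEX (employee_no, dept_no, salary);  -- hash on the combination

INSERT INTO retail.employee_34 VALUES (101, 30, 50000.00, 'Chennai');  -- ok
INSERT INTO retail.employee_34 VALUES (101, 30, 50000.00, 'Mumbai');
-- fails: the indexed combination (101, 30, 50000.00) already exists
INSERT INTO retail.employee_34 VALUES (101, 30, 55000.00, 'Mumbai');   -- ok
```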
• 100:30 - 101:00 So that is using a combination of columns for the primary index; whether it is a plain primary index or a unique primary index, you can go for either. Next, the fallback table. As I told you, you will not see any visible difference between fallback and no fallback;
• 101:00 - 101:30 by default the database takes no fallback, and if you want fallback you have to mention it after the table name, with a comma: ", FALLBACK". Then the table is created with fallback. For example, I'm creating table 40 as a fallback table; it is created, and select star from it works the same. It is
• 101:30 - 102:00 a fallback table now. That is the way you create fallback or no fallback: by default you get no fallback, or you can state NO FALLBACK explicitly. BEFORE JOURNAL and AFTER JOURNAL, the journaling concept, I will tell you about after some time.
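A sketch of the fallback option; the table name is assumed from the numbering used in the demo:

```sql
-- FALLBACK keeps a copy of every row on a second AMP
CREATE MULTISET TABLE retail.employee_40 , FALLBACK (
    employee_no INTEGER,
    dept_no     INTEGER
)
PRIMARY INDEX (employee_no);

-- The default is equivalent to writing , NO FALLBACK in the same position
```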
• 102:00 - 102:30 Next: creating a unique secondary index after creating a table. Even if the table already contains records, you can still change or create a unique secondary index afterwards; I will show you that. And if you want to change the primary index from one column to another after creating the table, I'll tell you how to do that too. First, making an exact copy of a table with its data; you have to mention this trick:
• 102:30 - 103:00 create table new-table-name AS existing-table-name WITH DATA. In place of the source table name you can also put a query, (select star from the table), in brackets. So if you want an exact copy of an existing table, a backup table, you can create it like this. I'm going to create one table,
• 103:00 - 103:30 and check whether a record is present: yes, a record is present. Now I create one more table, a backup of this table, an exact copy: create table table-name_bkp AS the source (a table name or a select both work), WITH DATA; you have to mention it
• 103:30 - 104:00 that way, WITH DATA. If you want only the structure, you mention WITH NO DATA instead; those are the two options, and if you use select star from, you have to wrap it in brackets. I run it: create table completed, the same table created under a different name. Go and check the backup table: the record is present. So this is a
• 104:00 - 104:30 backup table. If you do not want the data, mention WITH NO DATA and only the structure of the table is created; go and check, the table holds only the structure, and afterwards you can populate it with insert into table-name if you want.
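A sketch of the copy syntax (backup names assumed):

```sql
-- Exact copy, data included
CREATE TABLE retail.employee_28_bkp AS retail.employee_28 WITH DATA;

-- Same thing written with a query instead of a table name
CREATE TABLE retail.employee_28_bkp2 AS (SELECT * FROM retail.employee_28) WITH DATA;

-- Structure only, no rows
CREATE TABLE retail.employee_28_stru AS retail.employee_28 WITH NO DATA;
```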
• 104:30 - 105:00 If you also want the primary index to be copied, you have to mention it: WITH DATA PRIMARY INDEX (columns). Otherwise the primary index definition is not copied; the first column is simply taken as the default primary index of the copy. Take the table with the three-column index: for table 34
• 105:00 - 105:30 I create the backup WITH DATA, then check whether the primary index was copied. Show table table-name shows you, and the primary index did not copy: the source's show table has three columns in the primary index, but we created the copy with
• 105:30 - 106:00 WITH DATA only, and since we didn't mention the primary index, the copy got a single-column primary index. If you want it copied, you have to mention it: for backup one I use WITH DATA PRIMARY INDEX of those three columns. You must list the columns, otherwise it
• 106:00 - 106:30 will not be created that way. So whenever you create a copy like this, mention the primary index too. Now go and check _backup1 with show table: it has the primary index as well, so the backup table carries the same index. You state it directly: WITH DATA followed by PRIMARY INDEX (whichever columns
• 106:30 - 107:00 you want). If you don't mention it, the first column is taken as the primary index when creating the backup table. The create-table-as form works from a query too: instead of the table name you use (select star from the table), and if you want specific columns you select just those column names, WITH DATA. If you want to create the table with statistics, you mention WITH DATA
• 107:00 - 107:30 AND STATS. You do collect-statistics runs, right? If you are selecting records from another table and you want the copy to carry the stats along with the primary index, you mention WITH DATA AND STATS. (Collecting statistics is something we do after loading data into the table.)
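A sketch of carrying the index and stats over (names as above):

```sql
-- Copy rows AND the composite primary index definition
CREATE TABLE retail.employee_34_bkp1 AS retail.employee_34
WITH DATA
PRIMARY INDEX (employee_no, dept_no, salary);

-- Copy rows, index, and any collected statistics
CREATE TABLE retail.employee_34_bkp2 AS retail.employee_34
WITH DATA AND STATS
PRIMARY INDEX (employee_no, dept_no, salary);
```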
• 107:30 - 108:00 Now we will create one table with a primary index and then try to change it. We create the table with primary index (employee number), and then we will load data into it; but before loading the data, you can still change the index. So: first I created the table with the primary index on employee number; now I want to modify it. I write alter table table-name, and you have to use modify:
• 108:00 - 108:30 modify primary index. For a plain primary index you mention modify primary index (columns), and instead of employee number alone I go for employee number comma department number. I execute it: the table has been modified. Check with show table table-name: since we don't have any data in the table, it was modified, and the DDL
• 108:30 - 109:00 now reflects the change; check and you can see it has changed. Next, I'm going to drop this table and recreate it, as if you had created a table and loaded some data: drop table table-name, create the same table again with only one column in the primary index, and load some records into the table.
• 109:00 - 109:30 I load the records into the table, with the single-column primary index. After that I run the same alter statement: with no data we were able to do it, but with data in the table you cannot; the alter fails.
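A sketch of the modify attempt (table employee_50 is an assumed name, following the demo's numbering):

```sql
CREATE TABLE retail.employee_50 (
    employee_no INTEGER,
    dept_no     INTEGER,
    salary      DECIMAL(10,2)
)
PRIMARY INDEX (employee_no);

-- Works while the table is empty:
ALTER TABLE retail.employee_50 MODIFY PRIMARY INDEX (employee_no, dept_no);

-- After rows are loaded, the same statement fails: the primary index of a
-- populated table cannot be modified in place.
```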
• 109:30 - 110:00 So if you want to do it anyway, one option is to create a backup table, truncate the main table (delete all the records), alter it, and then load it again. That is what we will do now: I'm going to create a backup for this table. Or you can do one more thing: simply rename the table; normally you will not drop the existing table. Rename table old-name to new-name, something like _bkp, so the existing table itself becomes the backup.
• 110:00 - 110:30 Rename table older-name to new-name: I just renamed the table. Now I can create a table with a primary index of two columns; table 50 has been renamed, so the name 50 is free, and I create the new table with the two columns. This is what we do in real time as well. Then you bring the records back from the backup table to the normal table:
• 110:30 - 111:00 insert into this table select star from the _bkp table; you can state it directly, no need to mention anything else. We renamed it like this and then recreated. See here:
• 111:00 - 111:30 instead of creating a separate backup table and then dropping and recreating, I haven't dropped the table at all. Why? Because sometimes if you drop a table, its properties get dropped with it: the Unicode character set or other attributes they had defined can revert to defaults when you recreate. A rename never changes the table structure, so to avoid that we go this way,
• 111:30 - 112:00 renaming it and reloading. The same applies in the unique direction for the primary index: if you want a unique primary index, you mention modify unique primary index. But it will not work now, because the table contains records; unique is not allowed on a populated table. So you have to use delete from
• 112:00 - 112:30 table-name. And here in Teradata, whatever statement you execute is auto-committed: once you have deleted, you cannot retrieve the rows. In Oracle you can roll back DML statements, but in Teradata we don't have any rollback, which is why you should be very careful before deleting records. After deleting, I changed it to a unique primary index.
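The empty-then-modify sequence, sketched with the same assumed table:

```sql
-- UNIQUE cannot be imposed while rows exist, so empty the table first.
-- Caution: Teradata auto-commits each statement; there is no ROLLBACK
-- for DML as in Oracle, so the deleted rows cannot be recovered.
DELETE FROM retail.employee_50 ALL;

ALTER TABLE retail.employee_50
MODIFY UNIQUE PRIMARY INDEX (employee_no, dept_no);
```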
• 112:30 - 113:00 Before that we were getting an error; now we get "the table has been altered". So you can go for primary index changes, SET tables, multiset tables, all of those; the one constraint is that if the table contains data, you cannot alter the index in place. So what can we do? There are two ways. One: you create a backup table for the existing table.
• 113:00 - 113:30 The table already has data in it; say this column is the primary index now, but you want to add this other column too. With data present you cannot modify it, so you create a backup table, then truncate the main table, deleting every record; once the table has no records, you can modify it
• 113:30 - 114:00 and bring the data back again. Or you could drop the table; but if you drop and recreate, the table structure can lose things: the Unicode character set, the fallback setting, or whatever else they had defined can get changed, and the primary index may end up different. To avoid that, when the table contains data we just rename it:
• 114:00 - 114:30 we rename the existing table out of the way, create a new table fresh, and then bring the records back from the renamed table into the new one. There are other ways, as we saw, but the renaming option is what we use. Why? Because we are not dropping the table; the original table is preserved. If something happens while
• 114:30 - 115:00 creating a backup copy, say some records do not move over correctly, that would be a very big issue: you have millions or billions of records here, and if the backup was not created correctly and you then truncate the main table, the records are gone. To avoid that kind of risk we just rename,
• 115:00 - 115:30 then create the table structure with the modified primary index, then insert by selecting from the renamed table: insert into table-name select star from the backup; no brackets, no WITH DATA, nothing extra needed. That is what you do to change the structure of a populated table.
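The whole pattern in one sketch (all names assumed):

```sql
-- 1. Keep the original table (and its data) safe under a new name
RENAME TABLE retail.employee_50 TO retail.employee_50_bkp;

-- 2. Recreate the table with the desired primary index
CREATE TABLE retail.employee_50 (
    employee_no INTEGER,
    dept_no     INTEGER,
    salary      DECIMAL(10,2)
)
PRIMARY INDEX (employee_no, dept_no);

-- 3. Reload from the renamed original
INSERT INTO retail.employee_50
SELECT * FROM retail.employee_50_bkp;
```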
• 115:30 - 116:00 Next, as you asked, in real work you will be writing something like BTEQ scripts. We have a source side, a source table, and Teradata: you bring the data from the source table into Teradata, and you do a lot of logic in multiple stages. You do not apply all the logic in a single query; you apply it step by step and only then bring the records to the final table. These are the stage table and the work tables, work table one and work table two,
• 116:00 - 116:30 a work table being an intermediate table. You bring the records from the source table into the stage table, and the stage table is always delete-and-load. Why? Because all the data up to yesterday is already in the final table; there is no need to keep old records in the stage table or even in the work tables. Normally the stage and work tables are
• 116:30 - 117:00 delete-and-load: we delete the records in the table and then load today's new records again. For example, as of yesterday you have loaded the data into the final table; up to yesterday the records have already been taken in, so you only need to bring today's records. That is the delta load; we call those records the delta:
• 117:00 - 117:30 whatever data has been inserted or updated after a particular threshold date. We fetch only from that date. For example, the load ran yesterday at 9 am, so yesterday 9 am to today 9 am is one day of data that is available at the source but not yet loaded here.
• 117:30 - 118:00 We take that one day of data and load it into the stage table. Do I need the previous data still sitting in the stage table? No, it was already loaded, so we delete the stage table's existing records and load. Then from the stage table to work table one we do some transformation logic; from work table one to work table two we do joins, subqueries, more logic; and we load onward step by step.
• 118:00 - 118:30 Once there are no more records to transform, we go for the merge: merge into the target table; you know the merge statement. Why are we merging? Because if the record is already present in the final table we update it, and if it is not present we insert it. And always remember: the stage table and the work tables are delete-and-load, we delete the existing records and then load, because they hold only one day of records.
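A sketch of that final merge step using Teradata's MERGE statement; table and column names are assumed for illustration:

```sql
-- Upsert the day's delta from the work table into the history (final) table
MERGE INTO retail.orders_final tgt
USING retail.orders_work src
  ON (tgt.order_no = src.order_no)   -- must cover the target's primary index
WHEN MATCHED THEN
  UPDATE SET amount = src.amount
WHEN NOT MATCHED THEN
  INSERT (order_no, order_date, amount)
  VALUES (src.order_no, src.order_date, src.amount);
```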
• 118:30 - 119:00 The orange color in the diagram represents just that one day, while the final table has the full history of data; there we only insert or update, we never delete. That is why the stage and work tables are truncated each run. The stage table is the landing area: from Oracle, say, you bring the records into Teradata's stage table first, and this entire
• 119:00 - 119:30 staging flow, the work table, stage table, and final table, lives inside Teradata. Got it? And since Teradata runs on parallelism, it is very good for analysis purposes. In any banking domain or elsewhere they do a lot of analysis: how a customer is performing, how a product is moving, which region is performing well and which region is not, how much we have
• 119:30 - 120:00 collected today, profit and loss; everything gets analyzed on these tables. Wouldn't all those details be available in a general, normal RDBMS as well? Yes, and Teradata is also an RDBMS, but it can handle huge volumes of data. On the other side sits the user-facing system: for banking, that is your OLTP system, online transaction processing.
• 120:00 - 120:30 If I am a user of my bank, I go to the bank's website and do a funds transfer, or I recharge my phone: I use the bank's mobile net banking, spend the amount, and make the transaction, and everything is saved in that OLTP database. Then, say once an hour, we load the data from there to here for analysis purposes. For example: how were we impacted by corona over
• 120:30 - 121:00 the last few days? We want the answer. Some businesses improved, some went very slow; we have several different lines of business and we want to analyze the corona effect, and you analyze it on this Teradata. Most companies with huge volumes of data do their analysis on Teradata, because
• 121:00 - 121:30 for analysis it runs faster compared to the others. This Teradata, the OLAP system, is for analysis purposes, meant for business people rather than the end user; sometimes applications also get data from Teradata, but mostly the OLAP system is meant for analysis. That is why we load the data from there to here every hour or half hour. Now, how much history do we keep when we look back?
            • 121:30 - 122:00 When we look at the history, it should not hold more than, say, one to five years — that depends on the company: some keep five years of data here, some keep only one year. Also, how can we find which records were inserted or updated in the past hour? Through something like a commit timestamp.
            • 122:00 - 122:30 There will be a timestamp column — audit columns such as created date and updated date. Based on those columns we bring the data into Teradata. While loading the stage table we run something like SELECT * FROM the source table in Oracle WHERE the created
            • 122:30 - 123:00 date or update timestamp is greater than or equal to the threshold. You schedule that as a sequence: if it runs every half hour, the sequence runs 48 times a day; on each run we pass the timestamp and fetch the data, so whatever was inserted or updated in that half hour gets picked up.
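            A sketch of that incremental pull, assuming a hypothetical source table src_orders and a :threshold_ts parameter supplied by the scheduler:

```sql
-- Fetch only the rows inserted or updated since the last run.
-- src_orders, updated_ts, and :threshold_ts are illustrative names.
SELECT *
FROM   src_orders
WHERE  updated_ts >= :threshold_ts;   -- threshold advances on each half-hourly run
```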
            • 123:30 - 124:00 We have three different types of space in Teradata — they ask about this in interviews too. The first is permanent space,
            • 124:00 - 124:30 then spool space and temporary space. Permanent space is the physical space that objects use — tables, views, databases, and users all consume permanent space. Whenever you create a table, some permanent space is allocated to it; the same happens when you create a database.
            • 124:30 - 125:00 Whenever you create a database you have to declare all three spaces: permanent space, spool space, and temporary space. If you have, say, a thousand terabytes in the system, you decide how much to allocate for permanent space, how much for spool, and how much for temporary. Normally permanent space gets the larger share, and the databases, tables, users, and views all take from it.
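            A minimal sketch of that DDL, with illustrative sizes and a hypothetical database name:

```sql
-- Create a database with all three space limits declared.
CREATE DATABASE retail_db
AS PERMANENT = 500e9   -- bytes of permanent space for tables, views, users
 , SPOOL     = 200e9   -- upper limit of spool for intermediate results
 , TEMPORARY = 100e9;  -- space for global temporary tables
```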
            • 125:00 - 125:30 Then you have spool space. Spool space is nothing but unused space — unused permanent space is what spool draws on. When you run a complex query — say you are computing a rank, doing aggregations, or running subqueries — a big query takes a lot of working space.
            • 125:30 - 126:00 For example, if you run a rank function over millions of rows, it has to compare and sort all those rows, and the sorting algorithm needs space. The same goes for DISTINCT, sorting, or aggregations like MIN, MAX, and SUM. That working space is taken from the unused space — the spool.
            • 126:00 - 126:30 As I said, if you have a thousand terabytes of raw space and you allocate, say, 500 TB for permanent space, the remaining portion — say 400 TB — is available as spool, and that spool is spread across all the AMPs, whether you have 10, 12, or 24 of them.
            • 126:30 - 127:00 So whenever you do intermediate calculations — subqueries, aggregations, joins — that spool space, the unused space, is what gets used, and we assign an upper limit on it per user. Every user of a database has a spool limit: when a user id is created, the administrator allocates the maximum spool space that user can consume.
            • 127:00 - 127:30 When the user is created — user id and password — the user id is set up for the database, and the limit is decided based on the user's roles.
            • 127:30 - 128:00 Based on those roles, a spool allocation is made automatically for that user, and that is all the spool the user may consume — nothing beyond it. If you run a complex query that requires more than your spool limit, the query fails with a "no more spool space" error. Spool space, again, is nothing but unused permanent space used by
            • 128:00 - 128:30 the system — by Teradata. Similar to permanent space, you define a maximum amount of spool when creating the user. And the spool space is divided by the number of AMPs: the per-user maximum is split evenly across them. The DBA creates the user manually with something like CREATE USER username
            • 128:30 - 129:00 FROM the parent database — say a production admin user creating it from a parent that owns the permanent space — with PERM so many bytes and SPOOL so many bytes, something like 20 GB. That 20 GB of spool is divided across all the AMPs: if the system has 10 AMPs, the spool limit is split into 10 equal shares.
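            A sketch of that statement, with hypothetical names and sizes:

```sql
-- Create a user under an owning database; the SPOOL limit is
-- divided evenly across all AMPs at run time.
CREATE USER etl_user FROM retail_db
AS PERMANENT = 10e9        -- bytes of perm space owned by the user
 , SPOOL     = 20e9        -- max spool; on a 10-AMP system each AMP gets 2e9
 , PASSWORD  = etl_pass1;
```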
            • 129:00 - 129:30 So when will you get "no more spool space"? Whenever one of the AMPs gets skewed. If you have not selected a proper index for a table, one AMP receives far more rows than the others; that AMP's spool share gets exhausted, and you get the "no more spool space" error even though the other AMPs still have room.
            • 129:30 - 130:00 Say 200 GB is allocated in total for spool on a 24-AMP system: each AMP gets about 8.3 GB (200 ÷ 24). If all the other AMPs stay within their 8.3 GB but one AMP needs 10 GB, you get "no more spool space" — if any single AMP exceeds its share, the whole query fails. Likewise, if the query itself is skewed, you will definitely hit
            • 130:00 - 130:30 that spool space issue. To avoid it, first run the query through EXPLAIN. For example, you run SELECT DISTINCT product_id FROM a table with many product ids — perhaps as an inner query — and it runs for a long time and then errors out with "no more spool space".
            • 130:30 - 131:00 What do you do then? Get the EXPLAIN plan for that query; it will show you where the problem is. Then do COLLECT STATS, and check whether the table you are selecting from is a SET table or a MULTISET table — prefer MULTISET. Even a plain INSERT can hit a spool space issue and take a long time to process,
            • 131:00 - 131:30 because if the target is a SET table, it checks every incoming row for duplicates while inserting, which burns time and spool. In that case create a MULTISET table instead and the load goes through easily. You ask: if we create a MULTISET table, is the problem resolved? For the duplicate-check overhead, yes — because a SET table checks each and every record.
            • 131:30 - 132:00 But be clear: MULTISET does not fix a skewness problem — that is separate. The case here is an INSERT ... SELECT into a SET table: you select from one table and load into another, and each time the SET table performs a duplicate check —
            • 132:00 - 132:30 for every single row it goes and checks whether it duplicates an existing row. That is why it consumes a lot of spool space and you get the error. Instead of a SET table, go for a MULTISET table.
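            To make the distinction concrete — a sketch with a hypothetical orders table:

```sql
-- SET table: rejects exact duplicate rows, paying a per-row
-- duplicate check on every insert (slow for big loads).
CREATE SET TABLE orders_set
  (order_id INTEGER, order_total DECIMAL(12,2))
PRIMARY INDEX (order_id);

-- MULTISET table: allows duplicate rows, so a bulk
-- INSERT ... SELECT skips the duplicate check entirely.
CREATE MULTISET TABLE orders_ms
  (order_id INTEGER, order_total DECIMAL(12,2))
PRIMARY INDEX (order_id);
```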
            • 132:30 - 133:00 Any duplicates you can then rectify yourself. Also, in the EXPLAIN plan, check the joins and indexes: is it a one-AMP or an all-AMP operation? Check the join index usage, and check the skewness as well. I will demonstrate an EXPLAIN plan a little later; from it you can verify whether a proper index
            • 133:00 - 133:30 has been defined or not. Sometimes no usable index is available — something like a no-primary-index (NoPI) table — and then you will run short of spool. These are the spool space problems, and you will definitely face this issue in real time. So: get the EXPLAIN plan, then the first thing is COLLECT STATS on the
            • 133:30 - 134:00 relevant columns, and check SET versus MULTISET — those two things you can do immediately. Then check the index columns and how the join is resolved: is it going for a Cartesian product? If it is, it will certainly consume a lot of spool. To avoid that, get the join columns right; for that you can also create a join index
            • 134:00 - 134:30 on top of the table — we will look at join indexes after the temporary tables. A join index can be created to reduce exactly these issues. The skewness check will also show whether a particular column carries a lot of duplicate data; you can then remove the duplicates while bringing the data in, for example with a rank or row-number function and the QUALIFY clause.
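            A sketch of that dedup pattern using QUALIFY, with hypothetical names:

```sql
-- Keep only the latest row per order_id, discarding duplicates
-- before they can skew downstream joins. Names are illustrative.
SELECT *
FROM   stage_orders
QUALIFY ROW_NUMBER() OVER (PARTITION BY order_id
                           ORDER BY updated_ts DESC) = 1;
```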
            • 134:30 - 135:00 Then you have temporary space. What is temporary space? It is the amount of space used by global temporary tables — I will explain global temporary tables shortly. So those are the three different spaces available in Teradata. Now, take this demo database we are using:
            • 135:00 - 135:30 in real time you will not have access to this administrative tool. Here I can create a database and create a user — you can have multiple databases inside Teradata. When I create a database it asks for the database name, who the owner is, the permanent space, and the spool space.
            • 135:30 - 136:00 It also asks whether to enable FALLBACK — fallback keeps a second copy of the data, so the database consumes extra space for it — and BEFORE JOURNAL and AFTER JOURNAL; journaling also takes memory. If we set these options at the database level, all the tables in the database will default to those properties — yes, correct: the entire database itself will be fallback, before journal, after journal.
            • 136:00 - 136:30 A journal is this: as you insert and update records in a table, it is like keeping a written log alongside; if something goes wrong midway, the journal records up to what point the data was safely stored — I will come back to journals. These options exist, but in real time database creation is an automated process; DBAs will not do it manually —
            • 136:30 - 137:00 they will have a tool for it. Now, temporary tables: there are three different kinds. One is the derived table, one is the volatile table, and the third is the global temporary table. We will see them one by one. Over the derived table we have no control — Teradata itself takes care of it. It is something like an inline view in Oracle — you know inline views,
            • 137:00 - 137:30 right? Whenever you write subqueries or joins, you give the inner query a table alias. For example, we write a query with a subquery inside and alias that inner block as A or B — say we alias one inner query as T.
            • 137:30 - 138:00 So what is this T? When you execute the query, at that instant Teradata executes the inner query first and saves its result into a temporary table called T — say with product id and sales — and the outer query reads from it. Once the statement has finished executing, that table vanishes.
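            A sketch of such a derived table, with a hypothetical sales_tbl:

```sql
-- T exists only for the duration of this statement; Teradata
-- materializes the inner query in spool and then discards it.
SELECT t.product_id, t.total_sales
FROM (
    SELECT product_id, SUM(sales_amt) AS total_sales
    FROM   sales_tbl
    GROUP  BY product_id
) AS t                       -- the derived table
WHERE t.total_sales > 1000;
```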
            • 138:00 - 138:30 The table is not available afterwards — SELECT * FROM T will fail. That is the derived table; it is maintained by Teradata only, and we never create it ourselves. Next is the volatile temporary table. It is for intermediate calculations: you have one very big query and you want to divide it into multiple stages —
            • 138:30 - 139:00 you store one result into a table, the result of that into another table, apply transformation logic into yet another, and load the final table at the end. Instead of doing all the logic in a single query, you split it across multiple temporary stage/work tables. A volatile table is a physical table, but it exists only for that session.
            • 139:00 - 139:30 Once the session is over — the job has completed — the table vanishes automatically: it is dropped, the space is released, the data is gone, and the structure of the table is gone too. The global temporary table is also a temporary table, but its DDL — the structure you define with CREATE TABLE
            • 139:30 - 140:00 table_name — is retained for other sessions and other users as well. Say I am a user and I use a table called GTT_1 with some columns; I store some data in it and then leave the session. If you are another user entering the database and you want to use that table, you can — and when I come back, my session's data has vanished, but
            • 140:00 - 140:30 the structure is maintained — it is kept in the global data dictionary. Got it? Now I am going to create a volatile table and show you, in the retail database for example. You can mention the full name as retail.table_name every time, or I can first execute
            • 140:30 - 141:00 the statement DATABASE retail; — that sets the default database, so there is no need to mention the database name again; from then on everything refers to the retail database. Now I create a volatile table: CREATE VOLATILE TABLE, a table name, columns such as department number, average salary, and maximum salary — and ON COMMIT PRESERVE ROWS, which
            • 141:00 - 141:30 preserves the data across statements. If you do not use ON COMMIT PRESERVE ROWS, the table will not retain the data. So I create the volatile table — it has been created and is available in this session — and now I will insert some records into it. You ask: in which cases do we use this type of table in real time? As I said, if
            • 141:30 - 142:00 you have a very big query or very complex logic, you can split the complex SELECT into small pieces, store each into a volatile table, then combine them into a single final table. After combining, there is no need to keep those intermediate tables — they would just occupy space; a permanent table occupies space for as long as it exists. Instead of
            • 142:00 - 142:30 that, we let the tables vanish — that is the purpose of volatile tables. Now I insert a record into the volatile table — one row inserted. If I check with SELECT * FROM the table, the data is there. So you simply say CREATE VOLATILE TABLE, that's all. And if you check now
            • 142:30 - 143:00 the definition with SHOW TABLE, it reads CREATE SET VOLATILE TABLE — by default it creates a SET table. If you want a MULTISET table you must say CREATE MULTISET VOLATILE TABLE; you can see the fallback setting in the definition as well. So for a volatile table you can create either SET or MULTISET. Now watch what I do next:
            • 143:00 - 143:30 I said it is available only within this session, right? So I simply log out. With a normal table, if you log out and log back in, the table is still there — but the volatile table will not be. See — the volatile table is gone. Let me set the default database again
            • 143:30 - 144:00 and run SELECT * FROM the table name: it says the table does not exist, because everything vanished. And what happens if you omit ON COMMIT PRESERVE ROWS? The table will not store the data at all. I create a volatile table without it, then insert a record —
            • 144:00 - 144:30 one row processed — but if you check now, the data is not there. So only with ON COMMIT PRESERVE ROWS is the data preserved; without it, the rows are unavailable even within the same session. Is the volatile table clear? Both the structure and the data vanish at the end of the session. Whenever you need a table that should not survive after the job completes, go for the volatile table.
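            The whole demo in one sketch (table and column names are illustrative):

```sql
DATABASE retail;   -- set the default database once

-- Session-scoped work table; both data and definition are
-- dropped automatically when the session ends.
CREATE MULTISET VOLATILE TABLE vt_dept_salary
  (dept_no INTEGER, avg_sal DECIMAL(12,2), max_sal DECIMAL(12,2))
PRIMARY INDEX (dept_no)
ON COMMIT PRESERVE ROWS;   -- without this, rows vanish at commit

INSERT INTO vt_dept_salary VALUES (10, 45000.00, 50000.00);
SELECT * FROM vt_dept_salary;   -- visible here; gone after logoff
```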
            • 144:30 - 145:00 With the global temporary table the only difference is that the structure is maintained across all sessions; the data still vanishes, but the definition survives after the session — it is global, so anybody can use it. CREATE SET GLOBAL TEMPORARY TABLE retail.gtt is what I am creating, again with ON COMMIT PRESERVE ROWS.
            • 145:00 - 145:30 It has been created. After that I load some data into the GTT, and if you check with SELECT * FROM the table name, the data is available. Then you log off — your job has completed — and you log in again.
            • 145:30 - 146:00 After the logout, only the structure is maintained. I set the default database to retail again and query: the table is available but zero rows are returned. That is the global temporary table — the structure is maintained,
            • 146:00 - 146:30 but no data is. Others who want the same structure can reuse it — no need to create it every time — and each session gets its own private data: if I am using the GTT in my session and you are using the same GTT under your user id at the same time, we each see only our own rows and can use it for different purposes. That is the global temporary table.
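            A sketch of the GTT, with illustrative names:

```sql
-- The definition persists in the dictionary for all sessions;
-- each session materializes its own private instance of the rows.
CREATE SET GLOBAL TEMPORARY TABLE retail.gtt_dept_salary
  (dept_no INTEGER, avg_sal DECIMAL(12,2))
PRIMARY INDEX (dept_no)
ON COMMIT PRESERVE ROWS;

INSERT INTO retail.gtt_dept_salary VALUES (10, 45000.00);
-- After logoff/login: the table still exists, but returns zero rows.
```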
            • 146:30 - 147:00 So those are the three kinds of temporary table available; the derived table we do not control — Teradata takes care of it. Now, we have DBC — the data dictionary, the "database computer" — and there are various tables in DBC. DBC.Tables gives you information about all the tables, and DBC.TablesV is the view on top of
            • 147:00 - 147:30 the dictionary tables. If you check its definition, it was created with REPLACE VIEW, selecting the database name and so on from the base dictionary tables using an outer join. That view lives under DBC; if you select from DBC.TablesV you can inspect all the tables:
            • 147:30 - 148:00 the database name, the table name, how many times the table has been created or altered (the version), and the table kind — whether the object is a table, a view, an index, and so on. You can see the different table kinds there.
            • 148:00 - 148:30 Some of the table-kind abbreviations: O means a table with no primary index, V a view, H an instance or constructor method, F a scalar UDF, M a macro, J a journal, I a join index, and G a trigger — several different kinds. Then there is the protection type — fallback or no fallback — whether the table has a journal, and who created it.
            • 148:30 - 149:00 The creation text of the table, when it was created, who created it, who last altered it — all of it is maintained. Whatever SQL we have executed is available in the history view — show the history, and every SQL we have run from the beginning is listed there. DBC.Users gives you all the users — how many
            • 149:00 - 149:30 users there are and their list. DBC.AllSpace gives you the space usage: the database name — you can filter on retail — the account name, the table name, and how much permanent space and spool space each has taken; you can add a WHERE clause on the table name.
            • 149:30 - 150:00 We created a table earlier — the retail country table — so you can see its current permanent space. Spool space shows up only while a query is running; permanently a table does not occupy much spool. Then DBC.Indices gives you all the index information. You can run Teradata in two
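            A couple of dictionary queries of the kind shown, sketched against the commonly available V-views:

```sql
-- List objects in the retail database with their kind
-- (T = table, V = view, I = join index, ...).
SELECT DatabaseName, TableName, TableKind
FROM   DBC.TablesV
WHERE  DatabaseName = 'retail';

-- Space consumed per table in the retail database
-- (CurrentPerm is reported per AMP, hence the SUM).
SELECT DatabaseName, TableName, SUM(CurrentPerm) AS CurrentPerm
FROM   DBC.AllSpaceV
WHERE  DatabaseName = 'retail'
GROUP  BY 1, 2;
```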
            • 150:00 - 150:30 different modes: Teradata mode (BTET) and ANSI mode. What we are running here is Teradata mode. In Teradata mode, CREATE TABLE makes a SET table by default, comparisons are not case sensitive, every statement is implicitly committed, all Teradata commands are supported, and transactions follow BT/ET — begin transaction and end transaction.
            • 150:30 - 151:00 ANSI mode is different: transactions there are explicit, the default is a MULTISET table, and comparisons are case sensitive. You can check which mode your session runs in with HELP SESSION — as I said, our database runs in Teradata mode, and in real time you can check yours the same way; by default it is Teradata mode. You already know the index types: primary index, secondary index, partitioned primary index,
            • 151:00 - 151:30 and no primary index. The join index is something different. When you create a base table you define its indexing, but then you query the table on some other column. Performance-wise that is poor: the index is on one column but the query filters on a different one, so you get little benefit. With a join index you tell
            • 151:30 - 152:00 Teradata to build one: you create the join index once, and after that Teradata maintains it — we never query the join index directly and we never see it being used explicitly; if it exists, the optimizer takes care of it. A join index is something like a view — the way we create views in Oracle, a virtual table on top of one or more tables.
            • 152:00 - 152:30 You know views in Oracle — a virtual table. If you have multiple tables and must fetch joined data from them every day, then instead of writing the join query each time, you write it once, save it as a view in the database, and from then on you select from the view; Oracle runs the underlying query for you. In the same way
            • 152:30 - 153:00 we can create a join index: created by the user, but maintained by Teradata. It is an optional index that adds efficiency, and the main thing is that it can eliminate base-table access. When you run a query — say SELECT * FROM the table — that does not use the primary index column we created on the base table,
            • 153:00 - 153:30 Teradata checks whether a join index exists for the column you are using; if one does, it reads the join index and serves the data from there, eliminating the base-table access. A join index can also eliminate aggregate processing, reduce joins, and reduce row redistribution — I will show you how. There are seven different types of join indexes:
            • 153:30 - 154:00 single-table join index, multi-table join index, multi-table compressed join index, aggregate join index, sparse join index, global join index, and hash index — those are the seven varieties in Teradata. First, the single-table join index. Assume you have a table called employee_table; this employee_table has
            • 154:00 - 154:30 its primary index on some other column, but we are not going to access it through that column. So we create a join index on top of the table: CREATE JOIN INDEX employee_index AS SELECT * FROM the table PRIMARY INDEX (department_number). In the base
            • 154:30 - 155:00 table, employee_number may be the primary index, but in this index we have made department_number the index. A single-table join index duplicates a single table but changes the primary index. The user only ever queries the base table, but the parsing engine —
            • 155:00 - 155:30 Teradata — accesses the join index to fetch the data when the query is by department_number. That is the single-table join index. Say the employee_table has columns employee_number, department_number, first_name, last_name, and salary, with employee_number as the
            • 155:30 - 156:00 primary index. If you run SELECT * FROM the employee table you see only that table's structure, but on top of it we create the single-table join index. If you like, I will create it and show you: create the employee table, check it with SHOW TABLE, and then create the join index on top of it.
            • 156:00 - 156:30 For a single-table join index you create the join index on that one table, mentioning which column will serve as its primary index. Once it is created, SHOW TABLE on the base table still shows
            • 156:30 - 157:00 employee_number as the primary index — but whenever you run a query filtering by department_number, the parsing engine uses the join index instead, which is faster. That is the single-table join index; we cannot see or query the index directly — Teradata takes care of it. Multi-table means that instead of a single table we use
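            A sketch of that single-table join index, with hypothetical column names:

```sql
-- The base table is hashed by employee_number; the join index
-- re-hashes the same rows by department_number so queries filtering
-- on the department avoid a full-table scan. Maintained by Teradata.
CREATE JOIN INDEX employee_jix AS
SELECT employee_number, department_number, first_name, last_name, salary
FROM   employee_table
PRIMARY INDEX (department_number);
```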
            • 157:00 - 157:30 multiple tables, with the joins pre-computed: a join index that involves two or more tables and stores the join result to eliminate data redistribution. I will show you: here we have the employee table as E inner joined to the department table as D on the common column, and we declare a primary index on employee_number.
            • 157:30 - 158:00 Whenever you join those two tables, this join index is invoked automatically — clear? That is the multi-table join index. The compressed multi-table join index is designed to save space by not repeating the repeating values — the same column values are not stored again and again.
            • 158:00 - 158:30 For example: SELECT c.customer_number, customer_name, order_number, order_date, order_total FROM the customer table C inner join the order table O — and in the compressed form you group the columns with brackets:
            • 158:30 - 159:00 the first bracketed group is treated as the fixed (customer) part and the second bracketed group as the repeating (order) part, which makes it very easy for Teradata to store the rows compactly. That is what the compressed join index maintains. You typically create these when the EXPLAIN plan suggests a join index — at that point you should know which kind to use: single-table, multi-table, aggregate,
            • 159:00 - 159:30 and so on. If your SELECT uses COUNT or SUM with GROUP BY columns, you can create an aggregate join index for it. The sparse join index is one with a WHERE clause: because the WHERE clause restricts the rows, the index does not carry every row — here, for example, only the year 1999.
            • 159:30 - 160:00 Whenever your query applies that same restriction, the optimizer goes straight to the sparse join index. The hash index is similar: a hash value is created and maintained at the AMP level — we do not manage it, Teradata does. CREATE HASH INDEX index_name (department_number, first_name, last_name) ON table_name.
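            Sketches of the aggregate and sparse variants, assuming a hypothetical sales table:

```sql
-- Aggregate join index: pre-computes a GROUP BY so matching
-- aggregate queries skip the base-table scan.
CREATE JOIN INDEX sales_agg_jix AS
SELECT product_id, SUM(sales_amt) AS total_sales
FROM   sales_tbl
GROUP  BY product_id;

-- Sparse join index: the WHERE clause keeps only a slice of rows,
-- here one year of orders.
CREATE JOIN INDEX sales_1999_jix AS
SELECT order_id, product_id, sales_amt
FROM   sales_tbl
WHERE  order_date BETWEEN DATE '1999-01-01' AND DATE '1999-12-31';
```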
            • 160:00 - 160:30 The hash value is created and maintained at the AMP level; we do not create or manage its entries ourselves. So those are the different types of join indexes used in Teradata — a join index is something like a view by which we tell Teradata: instead of the base table's indexing column, use this other indexing column.
            • 160:30 - 161:00 Next we will see the Teradata join
            • 161:00 - 161:30 strategies. A Teradata join is one thing and a join strategy another: the join strategy is how the Teradata optimizer internally decides to carry out the join. Normally we write joins across different tables, so how does Teradata
            • 161:30 - 162:00 actually execute them? Whenever we join, Teradata evaluates the possible join strategies and executes whichever plan has the least cost. It considers three main parameters: the table sizes — which of the left and right tables is bigger or smaller —
            • 162:00 - 162:30 the primary index information, and the statistics. There are four main join strategies to choose from: merge join, nested join, hash join, and product join. Teradata takes care of the choice — we are not going to handle it; we just write the join condition, and based on it the optimizer picks the strategy.
            • 162:30 - 163:00 The first is merge join strategy one: both join columns are UPI columns. As we saw in the morning session, if both sides are joined on their unique primary index, the matching rows live on the same AMP, so no row or data redistribution happens for this scenario — the data
            • 163:00 - 163:30 does not have to be moved into spool on other AMPs, and the join runs very fast. Since the rows are AMP-local, the join is performed within each AMP and the results are produced there. Nothing has to move and no redistribution happens, which is why this strategy performs best.
            • 163:30 - 164:00 The next merge strategy: one join column is a UPI and the other is not a primary index column. Consider two AMPs and two tables, a department table and an employee table. In the department table, department_id is the UPI —
            • 164:00 - 164:30 the unique primary index. In the employee table, employee_number is the UPI, but department_number is not an index column. We join on the department id, so one side is a UPI and the other a non-PI column. How does the join proceed? Suppose the data
            • 164:30 - 165:00 is laid out like this: departments 10 and 20 on one AMP, 30 and 40 on the other, while the employee rows are spread around by employee_number. Teradata can handle it in two ways: it can move the smaller table into spool and join there, or
            • 165:00 - 165:30 it can move only the rows that are actually needed. Where a department row and its employees already share an AMP — 10 with 10, giving department name Sales; 20 with 20, giving Marketing — it joins locally. But where an employee's department 10 is not on that AMP, Teradata moves those employee rows into spool
            • 165:30 - 166:00 on the AMP that owns department 10, so that 10 meets 10 and 20 meets 20, and whatever is already local is joined in place. That is the second merge strategy: one table — or the needed rows of it — is moved into spool. The third strategy:
            • 166:00 - 166:30 here we join the department table and the manager table where both join columns are non-primary-index columns. The department table has a manager_employee_id and the manager table has a manager_number. On one AMP I have 1 and 4, on the other 3 and 5: 4 can match locally,
            • 166:30 - 167:00 but 3's partner sits on the other AMP, and likewise for 1. In this case Teradata redistributes both tables: it hashes both sides on the join column into spool —
            • 167:00 - 167:30 collecting the matching rows onto the same AMPs — and then performs the joins: 3 and 5 land together, 1 and 4 land together, and the joins are made. That is the third strategy. The fourth strategy handles a small table joined to a big table: say one million records in one table and only a hundred in the other —
            • 167:30 - 168:00 a hundred is tiny compared to one million. So whichever table is much smaller, Teradata duplicates that smaller table across all the AMPs in spool; then every big-table row has the small table locally, and each AMP performs the join in place.
            • 168:00 - 168:30 That is the fourth strategy. Next is the nested join — this is what we aim for in real time, and it is the part we take care of ourselves by using the primary index columns.
            • 168:30 - 169:00 Normally if you join on the primary index, any redistribution happens efficiently, internally. If you are not joining properly, you may get a spool space issue: all the data gets pulled into spool for redistribution, then you do arithmetic, sorting, casting and so on, spool fills up completely, and the query fails.
            • 169:00 - 169:30 Now the nested join. For example, take an employee table and a department table. In the employee table, employee_id is the UPI column,
            • 169:30 - 170:00 and department_id is a non-unique secondary index (NUSI) — an indexed column. We join to the department table with an inner join, and apart from the join condition we also use a filter condition.
            • 170:00 - 170:30 Here the filter is department_id = 10. Teradata then knows only department 10 is needed, so it fetches just those rows — no need to touch any other records — brings that small set into spool, and makes the join.
            • 170:30 - 171:00 That is why restricting the data for joins keeps spool usage low. And since department_id is an indexed column (a NUSI) joining to an indexed column on the other side, the join is made easily. The nested join is the most efficient method available in Teradata.
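            A sketch of the query shape that lets the optimizer pick a nested join, with illustrative names:

```sql
-- employee.department_id is assumed to be a NUSI and
-- department.department_id the UPI; the equality filter lets
-- Teradata probe just the qualifying rows instead of scanning.
SELECT e.employee_id, e.first_name, d.department_name
FROM   employee e
INNER JOIN department d
       ON e.department_id = d.department_id
WHERE  d.department_id = 10;   -- the filter that enables the nested join
```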
            • 171:00 - 171:30 Whenever you add a WHERE filter like that on top of the join condition, the plan can go for a nested join. Then we have the product join — the Cartesian product. It happens when you do not supply a join condition: you list table A comma table B, or write an inner join but without the matching ON condition.
            • 171:30 - 172:00 For example, SELECT e.emp_id, d.department FROM the employee table comma the department table — we should have written employee.department_id = department.department_id, but we gave only some other clause or nothing at all. Without a join condition it goes for the Cartesian product: every record of the first table is joined with
            • 172:00 - 172:30 every record of the second — the first row against all rows, the second row against all rows, and so on. With 10 records on each side you get 10 × 10 = 100 output rows. That is the product join — the Cartesian product — and here we did not even supply a WHERE clause, so the Cartesian product is all it can do.
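            The shape to avoid — a sketch with hypothetical tables:

```sql
-- No ON/WHERE join condition: 10 rows x 10 rows = 100 output rows,
-- and spool usage grows with the product of the table sizes.
SELECT e.employee_id, d.department_name
FROM   employee e, department d;  -- missing e.department_id = d.department_id
```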
            • 172:30 - 173:00 Even if you do give some WHERE clause — say a filter on the employee — but still omit the join condition, it will first go for the Cartesian product and only then apply the WHERE clause to the result. And in the worst case you have neither: simply table A comma table B, that's all.
            • 173:00 - 173:30 A WHERE clause restricts some data, but without a join condition the product join still costs a lot of spool — that is the problem with the Cartesian product, and it should not happen. To avoid it, always check in the EXPLAIN plan whether the query is going for a product join. One more join type is the exclusion join, which is what happens for a NOT IN condition:
            • 173:30 - 174:00 when you write NOT IN, Teradata first executes the subquery and then excludes those records from the outer result — possibly together with a WHERE clause. That is the exclusion join. Then the hash join: Teradata itself takes care of the hash join by default; we do nothing.
            • 174:00 - 174:30 Teradata builds a hash for it. For example, consider the employee table and the department table: in the employee table employee_id is the UPI, and in the department table department_id is the UPI. You join employee inner join department ON emp_id = manager_emp_id — so you are not joining UPI column to UPI column
            • 174:30 - 175:00 on both sides. For such an inner join, the hash join process is this: the smaller table is sorted by row hash and duplicated on every
            • 175:00 - 175:30 AMP. Whichever table is smaller, Teradata applies the sorting and hashing algorithm to it; and since it is an inner join, only the matching values — say only 70 meets 70 — survive the join, while the other records are filtered out.
            • 175:30 - 176:00 It builds the hash, sorts, and makes the join. Those are the join strategies in Teradata — normally it goes for the merge and nested joins — but we do not do anything about the strategy ourselves; Teradata takes care of it. Now let's look at some SQL queries quickly, without executing every one of them one by one.
            • 176:00 - 176:30 First, what tables do we have? I query the DBC tables view WHERE the table name is LIKE a pattern, to check whether I have a customer table — checking in DBC, and I can restrict by database as well.
            • 176:30 - 177:00 So we have the table called customer in retail: SELECT * FROM retail.customer — it has some records, including a country_id column. And I have a table called country_join — the country table.
            • 177:00 - 177:30 The country table has country_id values like 200, and in the customer table I have 200 as well, so I am going to join these two tables. The first table is customer_join; we will join it with the country table. Let's see the joins now — how to write them. Here, customer_id will
            • 177:30 - 178:00 be the primary index column of the customer table, and in the country table country_id will be the primary index column. To join the two tables, the common column is country — so how do you do the inner join? It's very simple. First I'll check what columns exist in each table.
            • 178:00 - 178:30 A join is nothing but matching the common records between both tables. So I write SELECT column names FROM the first table — the first table is customer_join, aliased as A —
            • 178:30 - 179:00 and the second table aliased as B, then the ON condition on the common column: ON a.country_id = b.country_id.
            • 179:00 - 179:30 Then you select the columns: a.customer_id, a.customer_name, a.mob_number, a.email, a.country_id, and from the other table b.country_name. That is the inner join. The customer table has six records, but you get only five back — one record is filtered out. What type of join strategy will it use? If
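            Putting that together — a sketch of the demo query, with the table and column names as read out (approximate):

```sql
-- Inner join on the common country_id column; only customers
-- whose country exists in the country table come back.
SELECT a.customer_id,
       a.customer_name,
       a.mob_number,
       a.email,
       a.country_id,
       b.country_name
FROM   retail.customer_join a
INNER JOIN retail.country_join b
       ON a.country_id = b.country_id;
```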
            • 179:30 - 180:00 you look here: do we have a UPI column? Yes — customer_id is the UPI of the customer table, but we are joining on country_id, which is not the PI there. In the country table, country_id is the primary index, though not a unique one. So we are joining a PI column with a non-PI column, and that determines the plan: it will redistribute
            • 180:00 - 180:30 the non-PI side into spool and make the join from there — and since we have very little data, it does this easily. That is what happens for this inner join. You know the inner join: the common records from both tables; anything that does not match is not fetched. These are the records the tables hold:
            • 180:30 - 181:00 where 200 matches 200 you get India; 204 matches 204 and you get UK; 202 gives USA. 205 does not appear in the inner join — it is filtered out because 205 is not present in the country table. If you do a left outer join you get all records from the left table,
            • 181:00 - 181:30 the matching records from the right table, and NULLs on the right-hand side where there is no match. The right outer join is the mirror: all records from the right table, matching records from the left, and NULLs where the left has no match. So in this scenario, if I want to find out in which country I have no customers, I can go for a right outer join with one more condition: customer_id IS
            • 181:30 - 182:00 NULL — then I get exactly those rows. That is how you find, say, the country with no customers (or, in the employee example, the country where no employees work). In the same way I can add a WHERE clause here: WHERE b.country_name = 'India' — only India. How many customers are from India? It brings the data: two
            • 182:00 - 182:30 customers are from India. With that extra filter it can go for the nested join. For an inner join or a left outer join you just change the join keyword. Removing the WHERE clause and selecting with a left outer join, you see 205 with NULL on the right-hand side. If you write RIGHT OUTER JOIN (RIGHT JOIN also works),
            • 182:30 - 183:00 you get NULL on the left-hand side for the countries where I have no customers. Why the NULLs there? Because no customer matches. If I also want the country id, I select b.country_id from the B table, so I know which country it is. And if I want only those countries — in which country do I have no customers — then I add WHERE
            • 183:00 - 183:30 a.customer_id IS NULL, and I get only those records. You then drop the customer columns and select only the country columns, and you find: in this country alone I have no customers. So you can go for a right join, a left outer join, an inner join — whichever fits — and bring the data.
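            That anti-join pattern as a sketch, using the same hypothetical tables:

```sql
-- Countries with no customers: keep every country (right side),
-- then keep only the rows where no customer matched.
SELECT b.country_id, b.country_name
FROM   retail.customer_join a
RIGHT OUTER JOIN retail.country_join b
       ON a.country_id = b.country_id
WHERE  a.customer_id IS NULL;   -- unmatched countries only
```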
            • 183:30 - 184:00 If you have three or four tables, how do you make the joins? For example, take a second customer table — call it customer_join_3 — where the customer carries a city, not a country. Then you have a city table: SELECT * FROM retail.
            • 184:00 - 184:30 city — it has city_id, city_name, and country. So three tables have to be joined: the customer has a city_id, the city table maps the city to a country, and you have to find out how many customers are from India. How do you write it? A three-table join works the same way: SELECT column names
            • 184:30 - 185:00 FROM the first table — customer_join_3, aliased A — then whatever join type you want, say INNER JOIN, the second table — the city table, aliased B — joined first on the common column city_id: ON a.city_id = b.city_id.
            • 185:00 - 185:30 Then you do one more inner join to the third table — the country table — on its common column.
            • 185:30 - 186:00 here whatever the column you want to have you can select it whatever the column you want to have you can select it so a dot customer id email a dot city right comma b dot city name what the columns this one will be country not city city
            • 186:00 - 186:30 we have already joined [Music] right this will be city okay this is what china we want to make only for india and after this you have to go for that class [Music] [Music] so this is what you can say for example you are having one more how many
            • 186:30 - 187:00 For example, one more: how many customers are from Chennai? You add one more condition on the city name, which is in the b table. Sometimes you write a join condition and you get values, but sometimes you don't. For example, if you make it USA here, nothing comes back — because for USA there is no matching city in the data.
            • 187:00 - 187:30 So if you are running a join and you are not getting any records, or not getting the corresponding value, how do you backtrack? First of all you run it like this: you remove the condition, run it, and check whether the data is coming or not. Yes, it's coming — then you can identify that, say, for USA there is only one city name in the data, which is why you are not getting rows for
            • 187:30 - 188:00 Chennai. Or you can check whether Chennai exists in the city table at all. I run it — yes, I have data for Chennai, but it belongs to India, and you are filtering country_name = 'USA'; that is why you are not getting any record. You have to backtrack in this way: whenever a complex join query returns no rows, run the pieces separately and find out where the data drops off.
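            A hedged sketch of that backtracking workflow — run the join without filters, then probe each filter value on its own (table names are the assumed ones from above):

            ```sql
            -- Step 1: run the join with no filters to confirm rows come back at all
            SELECT b.city_name, c.country_name, COUNT(*) AS row_count
            FROM   retail.customer_join a
            INNER JOIN retail.city b    ON a.city_id    = b.city_id
            INNER JOIN retail.country c ON b.country_id = c.country_id
            GROUP  BY b.city_name, c.country_name;

            -- Step 2: probe each filter value in isolation
            SELECT * FROM retail.city    WHERE city_name    = 'Chennai';
            SELECT * FROM retail.country WHERE country_name = 'USA';
            ```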
            • 188:00 - 188:30 Now, I have a table called employee. If I want to find out the maximum salary for the employees, I run it this way — MAX(salary) — yes, 50,000 is the maximum salary. Now I want to find out who is getting this maximum. Can I just add the name to the select? We can try SELECT name with it, but that will not run.
            • 188:30 - 189:00 If I try to find who is getting the maximum salary by running that, it will not execute, because MAX is an aggregate function. I cannot go directly with SELECT name, MAX(salary). Why not? Remember this: if you select a non-aggregate column together with an aggregate column, you will get an error. For this you have to go for subqueries.
            • 189:00 - 189:30 A subquery — we have three different kinds: single-row, multi-row, and multi-column. One select sits inside another: the select inside is the inner query and the surrounding one is the outer query. The inner query is executed first, and its result is supplied to the outer query. Normally we run single-
            • 189:30 - 190:00 row subqueries. What is a single-row subquery? The inner query should always return only one row — that is called a single-row subquery. For example, if I run MAX(salary) I get 50,000. If I then write SELECT * FROM employee WHERE salary = 50000, I get whoever earns that 50,000 — that person alone is getting the maximum salary. But you cannot hardcode the value like that;
            • 190:00 - 190:30 instead, you put the MAX(salary) query itself as a subquery. If you use that entire query here, it becomes a single-row subquery: the inner query always returns only one row, only one value. Then it gives you the same result even when the data changes — if tomorrow 51,000 is the highest salary, that person will come up automatically.
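            A minimal sketch of that single-row subquery, assuming an employee table with a salary column as in the demo:

            ```sql
            -- Single-row subquery: the inner query returns exactly one value,
            -- so the outer query can safely compare with =
            SELECT *
            FROM   employee
            WHERE  salary = (SELECT MAX(salary) FROM employee);
            ```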
            • 190:30 - 191:00 A multi-row subquery means the inner query returns more than one row. For example, department-wise I want to fetch the maximum salary. SELECT * FROM employee — we have some data available. First I check the maximum salary for the whole table: who is getting the maximum salary? 85,000 — this person is getting the maximum salary.
            • 191:00 - 191:30 So that employee is getting the maximum salary. Then I write SELECT dept_id, MAX(salary) FROM employee GROUP BY dept_id — for each department I am selecting the maximum salary along with the department number. I execute this query, and for each of the four departments one record comes back. I want to use this inner query, and my aim is to fetch,
            • 191:30 - 192:00 for every department, who is getting the highest salary. But if I plug this in, the query will not work. Why? Because if you use = here — or <>, <, >, any relational operator — in front of the subquery, Teradata (or any database) will assume that the inner query is a single-row subquery. If you use any relational operator,
            • 192:00 - 192:30 Teradata assumes a single row; but how many rows is this inner query returning? Multiple rows. That is why it will not execute, so you have to put IN: WHERE salary IN (...). Now you expect four records, one per department — but check what you actually get: for the 10th department the highest of 65,000 comes, for the 20th department the 85,000 comes, and a 70,000 is
            • 192:30 - 193:00 also coming. Why should that come? Only 85,000 is the maximum for that same department. My aim is that each and every department should appear once. Here I am using SELECT * FROM employee WHERE salary IN (SELECT MAX(salary) FROM employee GROUP BY dept_id), so for each group you get that department's maximum salary value, and whoever is
            • 193:00 - 193:30 earning any of those salary values will come back. That is why two employees appear for the 20th department: 85,000 comes, but 70,000 also comes, even though 85,000 is the maximum for the 20th department and 70,000 is the maximum only for the 40th department. Someone in the 20th department happens to earn another department's maximum, and since you are using the IN clause, it matches on the salary value alone.
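            A sketch of that multi-row subquery and its pitfall (column names assumed): IN matches the salary by itself, with no link back to the department it came from.

            ```sql
            -- Multi-row subquery with IN: matches any department's maximum salary,
            -- so an employee can qualify via another department's maximum
            SELECT *
            FROM   employee
            WHERE  salary IN (SELECT MAX(salary)
                              FROM   employee
                              GROUP  BY dept_id);
            ```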
            • 193:30 - 194:00 To avoid this kind of scenario, you have to go for a multi-column subquery. Instead of fetching the maximum salary alone in the inner query — right now you are selecting only MAX(salary) — you fetch it along with the department. So get the department number also:
            • 194:00 - 194:30 SELECT * FROM employee WHERE (dept_id, salary) IN (SELECT dept_id, MAX(salary) FROM employee GROUP BY dept_id) — now only one record per department comes back. That is the multi-column subquery, and it is also very important.
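            That multi-column fix as a sketch — the pair (dept_id, salary) must match together, which pins each maximum to its own department (names assumed as before):

            ```sql
            -- Multi-column subquery: department and salary must match as a pair,
            -- so a salary only qualifies against its own department's maximum
            SELECT *
            FROM   employee
            WHERE  (dept_id, salary) IN (SELECT dept_id, MAX(salary)
                                         FROM   employee
                                         GROUP  BY dept_id);
            ```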
            • 194:30 - 195:00 Now, do you know how to find a duplicate record in a table? DISTINCT will not show you the duplicates. You take the employee number, do a GROUP BY on it, and take the count: employee number comma COUNT(*). As I said, you should not select a non-aggregate column with COUNT, but if that column is in the GROUP BY, you can select it. If there is any duplicate — say one employee number appears two times — you will get a count of two or three there. And if you want to filter
            • 195:00 - 195:30 records out of a GROUP BY result, you have to go for the HAVING clause. As for aggregate functions: SELECT MAX(salary) FROM employee gives you the maximum salary, and MIN(salary), SUM(salary), and so on all work the same way. But if I want to know who is getting this maximum salary, I cannot go for SELECT first_name
            • 195:30 - 196:00 comma MAX(salary). I cannot, because first_name is not an aggregate column while MAX(salary) is, and you cannot select a non-aggregate column alongside an aggregate one — that is why it throws the error. Next, I want to find out how many employees there are in each department. SELECT COUNT(*) FROM
            • 196:00 - 196:30 employee gives you the total — six employees altogether. Out of those, how many employees are in each department? GROUP BY dept_id will give you that. But I may not know which department each count belongs to, so I also select the department number here. That department number is not an aggregate column, while the count is an
            • 196:30 - 197:00 aggregate column. You may ask: how can you select a non-aggregate column with an aggregate column? It has to be in the GROUP BY; if it is there in the GROUP BY, then we can select it. And it works — it gives the value: dept_id with its count. Now, in which departments do we have more than one employee? Suppose I write WHERE count
            • 197:00 - 197:30 greater than one — will it work? It will not, because the count is a GROUP BY resultant. If you want to filter a GROUP BY resultant, you have to go for HAVING instead of WHERE: WHERE needs a physical column, but the count is derived, so you go for HAVING.
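            A minimal sketch of that GROUP BY plus HAVING pattern (column names assumed):

            ```sql
            -- Departments with more than one employee; HAVING filters the
            -- derived COUNT(*), which WHERE cannot see
            SELECT dept_id,
                   COUNT(*) AS emp_count
            FROM   employee
            GROUP  BY dept_id
            HAVING COUNT(*) > 1;
            ```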
            • 197:30 - 198:00 HAVING is nothing but a filter on the resultant of a GROUP BY — whenever you want to pull out or filter grouped rows, that is what you use. Clear? Yes. That is why we write it here. We found the department-wise count; the same idea applies to the employee number. I want
            • 198:00 - 198:30 to find out whether we have duplicates in the employee number. How will you find that? Assume you have one million records — ten lakh records — in the table. If you run DISTINCT on the employee number and all the employee numbers are distinct, you simply get all the numbers back; that by itself doesn't tell you whether there is a duplicate. So can you do it with
            • 198:30 - 199:00 GROUP BY on the employee number? Yes: GROUP BY the employee number and take COUNT(*) — since the employee number is in the GROUP BY, you can select it — and filter with HAVING. If you don't get any rows back, that means there are no duplicates; here we are not getting any value, so there is no duplicate.
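            A sketch of that duplicate check (emp_id is an assumed column name):

            ```sql
            -- Duplicate check: any emp_id appearing more than once is returned;
            -- an empty result means the column has no duplicates
            SELECT emp_id,
                   COUNT(*) AS occurrences
            FROM   employee
            GROUP  BY emp_id
            HAVING COUNT(*) > 1;
            ```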
            • 199:00 - 199:30 Next is the COALESCE function, which we normally use to handle null values. You can pass it any number of arguments — COALESCE(argument1, argument2, argument3, ...) — as many as you like. So what does the COALESCE function do?
            • 199:30 - 200:00 It returns the first not-null value. For example, if argument one itself is a not-null value, it returns that value: SELECT COALESCE(5, 6, 8) gives you 5, because 5 itself is not null. If the first argument is null, it gives you 6; if that is also null, it gives you 8; and if everything is null, it returns null only.
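            A tiny demo of that behavior in Teradata SQL:

            ```sql
            SELECT COALESCE(5, 6, 8);       -- 5: the first argument is already not null
            SELECT COALESCE(NULL, 6, 8);    -- 6: skips the null, returns the first not-null
            SELECT COALESCE(NULL, NULL, 8); -- 8
            -- if every argument is null, COALESCE returns null
            ```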
            • 200:00 - 200:30 So COALESCE returns the first not-null value. Got it? Okay — in real time, when do we go for this COALESCE function? For example, I am collecting phone numbers: I want either the home phone number, the office phone number, or the cell number. On a form I ask for a phone
            • 200:30 - 201:00 number in three ways: enter your home phone number, office phone number, or cell number. Someone will write the home phone number, someone the office phone number, someone the cell phone number, and someone will write nothing at all. My first preference is the home number: if they have given that phone number, take it; if they didn't give it, that column will be blank, so then you take the office phone number;
            • 201:00 - 201:30 if that column is also blank, you take the cell phone number; and if everything is blank, by default you show 'no phone number'. This is how the COALESCE function is used — to handle null values in real time.
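            A hedged sketch of that preference order (the customer table and phone column names are assumptions):

            ```sql
            -- Prefer home phone, then office, then cell; fall back to a default label
            SELECT customer_id,
                   COALESCE(home_phone, office_phone, cell_phone, 'No Phone') AS contact_phone
            FROM   customer;
            ```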