Natural Language Processing Foundations

Estimated read time: 1:20

    Summary

    The video, presented by Asia Open RAN Academy, highlights the integration of Natural Language Processing (NLP) in the field of telecommunications, specifically focusing on its relationship with Artificial Intelligence (AI). The discussion encompasses various aspects of NLP including its application, use cases, key technical processes such as tokenization and sentiment analysis, and its use in telecommunication scenarios like chatbots, network operations, and cybersecurity. The video further explores the challenges and intricacies of NLP, highlighting its convergence with computational linguistics, machine learning, and deep learning.

      Highlights

      • NLP is deeply intertwined with Artificial Intelligence, focusing on human language processing and generation 🤖.
      • Understanding and generating natural language involve complex processes like tokenization, semantics, and syntax 🌐.
      • The application of chatbots and virtual assistants illustrates NLP's practical use in enhancing communication in telecom 🗨️.
      • NLP faces challenges such as interpreting idiomatic expressions and adapting to new language trends 🤔.
      • Its use in network operations could revolutionize network management and cybersecurity 🌍.
      • NLP continues to evolve, integrating with other AI subfields like machine learning and deep learning for improved applications 🔄.

      Key Takeaways

      • Artificial Intelligence is pivotal in advancing Natural Language Processing, enhancing its application in telecommunications 📶.
      • NLP involves both Natural Language Understanding (NLU) and Natural Language Generation (NLG) to process and generate human language 🔄.
      • Chatbots and virtual assistants are prime examples of NLP applications in telecommunications 🤖.
      • Challenges such as ambiguity, language evolution, and figurative language still pose significant hurdles for NLP development 🚧.
      • With continuous advancements, NLP is becoming indispensable in enhancing human-machine interaction and network operations 💡.

      Overview

      The presentation explores the complex world of Natural Language Processing (NLP), a subfield of Artificial Intelligence (AI) that deals with understanding and generating human language. It highlights how NLP leverages AI techniques to overcome challenges in processing and comprehending language, drawing on computational linguistics and machine learning. The discussion delves into the dual aspects of NLP: natural language understanding and natural language generation, which form the backbone of this technology.

        Practical applications of NLP in telecommunications are exemplified through chatbots and virtual assistants, where it aids in improving human-machine interactions. Through examples, the presentation underscores the role of NLP in enhancing customer service, network management, and the potential of improving cybersecurity measures. Challenges like ambiguity, language evolution, and understanding figures of speech represent hurdles that require ongoing advancements.

          NLP is showcased as an evolving technology that promises to revolutionize communication and operational efficiencies in telecommunications. By integrating computational linguistics with AI applications, NLP stands as a cornerstone in developing smarter, more intuitive tools that enhance both user experience and complex network operations. The talk encourages anticipation for the future of NLP in shaping how machines understand and interact with human languages.

            Natural Language Processing Foundations Transcription

• 00:00 - 00:30 [Music] Now, for my presentation today, here are my module objectives. I'll be focusing more on the application, or on the artificial intelligence
• 00:30 - 01:00 side of natural language processing. This is because natural language processing also overlaps with linguistics, so if you are a major in English or in any communications degree, some of the terminologies you'll encounter here will be familiar to you. Some of them have already been introduced to you by Dr. Pik, such as in the course: what is morphology,
• 01:00 - 01:30 what is semantics, what is syntax. These are topics we'll also be discussing here. Having said that, here are the topics I'll be talking about. First, I'll discuss what natural language processing is in relation to artificial intelligence, identify its different use cases in NLP, and differentiate the two portions of NLP, which are NLG and NLU: natural language generation and natural
• 01:30 - 02:00 language understanding. From there, I'll also explain the NLP pipeline, because you'll notice that in the science itself you need to establish a process for how to break down a simple sentence and then make the computer understand that sentence. So I'll describe the high-level pipeline, the different steps a computer
• 02:00 - 02:30 model or algorithm goes through in order to make sense of human speech. Then finally, I'll describe what a generative pre-trained transformer is. Some of you may have heard of this lately: this is GPT. This is where your ChatGPT and Google Bard come in, and although it is a monster on its own, I'll try my best to briefly discuss it so that you understand
• 02:30 - 03:00 how it works on the back end and how it relates to AI in general. I'll also enumerate the different challenges faced by NLP, because similar to other technologies, specifically artificial intelligence and machine learning, it also faces challenges and complexities, particularly in how it handles and generates human language. And finally, I'll briefly describe the
• 03:00 - 03:30 applications of NLP to telecommunications and Open RAN in general. You might be wondering: we are in Open RAN, we are expecting more of the telecommunications aspect, but is there a relationship between NLP and Open RAN? How can we apply NLP, machine learning, and AI to telecommunications? Here I'll propose three key aspects that we might
• 03:30 - 04:00 want to consider. Now let's proceed. What is natural language processing? This was described to you a while ago; I'll give you another definition beyond what Dr. Pik mentioned. NLP, like AI, has several definitions, but for this specific definition: NLP is a subfield of AI that deals with enabling computers to understand, manipulate, and even generate human language. Now, human language on
• 04:00 - 04:30 its own is very complex. There are different ways to express speech, different ways voice is expressed, and by that I mean there is what you would call active and passive voice. 'There is more than one way to skin a cat,' if you would refer to that, and that in itself is already a complex sentence for a computer to
• 04:30 - 05:00 understand. 'There is more than one way to skin a cat': if you are an English speaker, or if you are familiar with idioms, you would know that this is an idiomatic expression. So how is a computer able to understand an idiomatic expression? We literally do not skin a cat, right? It means something else, so there is context involved. Having said that, NLP utilizes advanced technologies including computational linguistics, machine learning, and deep learning models to process human language
• 05:00 - 05:30 in various forms, from text to voice recordings. In terms of the science itself, there is also another field in computer science called computational linguistics: it is an overlap between linguistics and computer science. Now, having said that, the goal of NLP-
• 05:30 - 06:00 based solutions is not just to grasp the meaning of individual words, meaning you do not just parse out individual words from a given sentence, but also to understand and comprehend the entire sentence as a whole, and even to understand its context. And not only that: you also have to consider its intention and even its underlying emotion or sentiment. That is why you can say that NLP is an overlap between
• 06:00 - 06:30 computer science and linguistics. To be more specific, having already mentioned computational linguistics, you can say that NLP is more like an applied science of computational linguistics. It is a field on its own. If computational linguistics deals more with, say, the historical aspect of a certain language, how it has been structured
• 06:30 - 07:00 in terms of its syntax and semantics, how a given word has morphed and been used across the years, NLP deals more with how these theories can be applied to today's use cases. So another simple analogy here: if computational linguistics is more like a pure science, NLP is more like an applied science. This is already part of
• 07:00 - 07:30 the application of what the theory is talking about, or how we make use of it. Now, NLP and its different subfields: remember from the previous sessions that NLP is also a subset of AI, one of its many branches in fact, and you'll see here that although it is considered a branch, it also overlaps with other subfields,
• 07:30 - 08:00 meaning that NLP can be applied using machine learning, and in fact most implementations do. ChatGPT, which is a natural language processing use case (a chatbot, for instance), uses deep learning to gain understanding of the text that you type into its text field. But other than that, other systems and applications can use NLP in
• 08:00 - 08:30 conjunction with, for example, speech recognition and even computer vision. If you are already familiar with programming, it is possible for you to use NLP to make sense of what a computer sees through a webcam, for instance, so you can in fact interrelate them. Now, NLP, much like other subfields of AI, often overlaps with other subfields such as machine learning, speech
• 08:30 - 09:00 recognition, and even computer vision. But if we look at the practicability of NLP nowadays, we would normally consider NLP to be more associated with written text and its comprehension, understanding, and generation. That's why, if we look at it again in the form of a Venn diagram (one great way to express how NLP is a subfield of AI), it also overlaps with machine learning and deep learning,
• 09:00 - 09:30 most of which deal with managing and manipulating data in the form of written text. But again, the caveat here is that it is not always limited to written text; it can even be through voice. Speech-to-text is also an example of where NLP can be applied. Now, in terms of its use cases, I was able to find three high-level categories of where and
• 09:30 - 10:00 how we can apply NLP, and these are as follows. The first, very obvious one is communication and interaction. We can use NLP as a means to enhance communication and user interaction, not just human-to-machine but also human-to-human or even machine-to-machine interaction. This is where the majority of NLP use cases fall, in the form of your virtual assistants, your chatbots, and
• 10:00 - 10:30 even machine translation; more on these later when I go into detail. Another use case is content creation and automation. If communication and interaction mostly deals with analyzing what a user has entered, say a given text or a voice recording, content creation and automation deals with creating something based on that given
• 10:30 - 11:00 text. This is where your text filtering would come in. An example use case: suppose you use NLP to gain understanding of a given email, checking whether that email contains certain words that are considered blacklisted. That's already an example of email filtering or web filtering. Predictive text is also an example. You turn on predictive text
• 11:00 - 11:30 on your smartphones, and you annoyingly sometimes have to disable it, especially if you're trying to type, say, Filipino and the predicted text comes out in English; sometimes it even causes miscommunication. Predictive text can arguably be considered an application of NLP, because the machine itself is trying to make sense of the word you are
• 11:30 - 12:00 trying to type from just one or two characters alone. And we are even just limiting this to languages that use the Latin alphabet; what more for languages that use other kinds of writing systems, like for instance Asian countries such as China or Japan? If you know how to write Japanese, you'll know how complex it is, all the more if you know how to write Chinese,
• 12:00 - 12:30 and it's similar in Korea or in Thailand. So it really depends. And finally, and I would arguably say this is one of the more controversial use cases: content creation and generation. By that I mean that through text alone it's possible for you to generate images. You just type a simple word or phrase,
• 12:30 - 13:00 'generate an image for me that contains this, this, and that,' and you'll notice that despite the image being monstrously hideous, to say the least, at least the machine was able to comprehend what you were trying to type using text alone. And lastly, the last use case for NLP would be information retrieval and analysis. There are also use cases here, some of which would
• 13:00 - 13:30 include search engines, text summarization, and finally sentiment analysis. You may already be familiar with the first two, search engines and text summarization; in fact, you're using the first one right now if you're trying to Google something on the internet. For summarization, this is most likely a tool used in academia; there are already available applications right now where you just type in a
• 13:30 - 14:00 simple text and ask the model, or the AI application, to summarize it, and it will provide you a shorter version of that text. And finally, the last use case, sentiment analysis. This one is not as frequent or as popular a use case, but it is normally used in businesses: if you want to gain
• 14:00 - 14:30 understanding of customer feedback or customer insight based on the reviews that they leave on a given website, you're able to quantify a given sentence and give it a score. That is sentiment analysis: you're trying to understand the overall sentiment of a given review or a given text, and you're able to quantify it, give it a certain number. More on this later.
• 14:30 - 15:00 Now, NLP often involves several tasks. Here I was able to highlight three. First: understanding. Through NLP, we equip computers and machines with the ability to process and comprehend the meaning of human language, including spoken and written text. Meaning, through NLP, a machine is able to literally understand, and by that I mean make sense of,
• 15:00 - 15:30 what these words mean through its algorithms and internal processes. How a machine is able to break down a certain paragraph of text, for instance, and make sense of how that paragraph is structured, or its overall feel, I'll be discussing later when I go through the NLP
• 15:30 - 16:00 pipeline. Next: manipulation. It allows computers or machines to modify and work with human language in various ways. This is where the use case of summarization would come in, like text summarization, where the computer is able to understand that, despite the four-plus paragraphs of words that the user entered into its text field, it is able to summarize it in,
• 16:00 - 16:30 say, one to two blurbs or one to two sentences instead. That is a form of manipulation. And lastly: generation. It empowers computers to create human-like text, or to use text to create other forms of media or content such as documents, images, and speech. Now, more on these three tasks. First off, in understanding, I've already mentioned sentiment analysis. This involves identifying
• 16:30 - 17:00 the emotional tone of a message, whether positive, negative, or neutral. Suppose I, as a human being, said something like 'This is fantastic, what an absolute bomb,' and left it as a review. The system is able to understand that given statement and quantify it by means of a score. Now, sentiment
• 17:00 - 17:30 analysis is another topic in itself, but in general, a given statement, after it has been pre-processed and understood by the model (you can use machine learning to implement sentiment analysis), is given a score, and normally this score would range between zero and one. As a score reaches a value near or close to one, it is more likely positive; if it scores more
• 17:30 - 18:00 like a zero... sorry, the range is between negative one and one; it actually varies based on the model that you use. But my point here is that you assign a score or a given value: positive, neutral, or negative. In my example here, it is overwhelmingly positive because the value is close to one. If I instead gave a statement like 'This sucks, I do not want to use this product ever again,' then chances are it would receive a negative score.
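The scoring scheme just described can be sketched in a few lines. This is a deliberately naive lexicon lookup, not a trained model as used in real sentiment analysis; the word lists are made-up examples for illustration only.

```python
# Toy lexicon-based sentiment scorer returning a value in [-1, 1].
# The word lists below are hypothetical examples, not a real lexicon.
POSITIVE = {"fantastic", "great", "love", "excellent", "bomb"}
NEGATIVE = {"sucks", "terrible", "hate", "awful", "never"}

def sentiment_score(text: str) -> float:
    """Score text: near 1 is positive, near -1 negative, near 0 neutral."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return 0.0  # no sentiment-bearing words found: neutral
    return (pos - neg) / (pos + neg)

print(sentiment_score("This is fantastic, what an absolute bomb!"))   # 1.0
print(sentiment_score("This sucks, I never want to use this again"))  # -1.0
```

A production system would learn these weights from labeled data instead of hand-coding them, but the output convention (a number between -1 and 1) is the same idea described in the talk.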
• 18:00 - 18:30 If the value approaches zero, for instance, it is somewhat neutral. How is the system able to understand that? It uses the NLP pipeline, which I'll explain later. The next task is named entity recognition. This is a feature that recognizes and classifies named entities within text, and these entities can be people, organizations, locations, places, pet names,
• 18:30 - 19:00 proper nouns, etc. One example: suppose I entered a sentence into a given model, an NER model at that. The model is able to distinguish the different important names within that statement, and not only that, categorize them based on what they are. For instance: 'Tesla announced last Monday that it would
• 19:00 - 19:30 stop selling the Model Y electric car.' It's able to retrieve 'Tesla,' 'last Monday,' and 'Model Y electric car' as organization, date, and product, respectively. So this is one task. And finally we have the more obvious one, machine translation: converting text from one language to another while preserving its meaning.
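A toy version of the Tesla example can be sketched with a hand-made lookup table (a gazetteer) plus a regex for dates. Real NER systems learn these categories from annotated data rather than using hard-coded lists; this sketch only mirrors the input/output shape.

```python
import re

# Toy named entity recognizer: gazetteer lookup plus a date regex.
# The entries below are invented for this one example sentence.
GAZETTEER = {
    "Tesla": "ORGANIZATION",
    "Model Y electric car": "PRODUCT",
}
DATE_PATTERN = re.compile(
    r"last (Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday|month|year)"
)

def find_entities(text):
    """Return (span, label) pairs found in the text."""
    entities = [(p, label) for p, label in GAZETTEER.items() if p in text]
    match = DATE_PATTERN.search(text)
    if match:
        entities.append((match.group(0), "DATE"))
    return entities

sentence = "Tesla announced last Monday that it would stop selling the Model Y electric car."
print(find_entities(sentence))
```

The printed result tags 'Tesla' as an organization, 'Model Y electric car' as a product, and 'last Monday' as a date, matching the categorization described in the talk.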
• 19:30 - 20:00 One fascinating thing about human language is that there is definitely more than one way to express something, all the more if you try to express it in a different language. So in NLP, machine translation ensures that if, say, I speak Japanese, 'Hajimemashite,' the machine is able to properly translate it into English: 'Nice to meet you, my name is John.' So you notice that the use
• 20:00 - 20:30 cases of NLP, although it's a new term, are already there right now; it's just a matter of getting more familiar or acquainted with each specific use case. Moving on, another task is manipulation. One feature of this is of course text summarization, which
• 20:30 - 21:00 is condensing large pieces of text into shorter summaries while still retaining key information. You may have noticed that there are already tools for this; an example, if you are familiar with it, is QuillBot, and you can even use ChatGPT or Google Gemini to summarize text.
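The simplest family of summarizers is extractive: keep the sentences that carry the most frequent content words and drop the rest. The sketch below is a bare-bones illustration of that idea, not how QuillBot or ChatGPT actually summarize (those use learned models).

```python
import re
from collections import Counter

def summarize(text: str, n_sentences: int = 1) -> str:
    """Extractive summary sketch: score sentences by word frequency, keep the top n."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        # A sentence scores higher when its words are frequent in the whole text.
        return sum(freq[w] for w in re.findall(r"[a-z']+", sentence.lower()))

    top = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    # Emit kept sentences in their original order.
    return " ".join(s for s in sentences if s in top)

text = ("NLP processes language. "
        "NLP models process human language and generate language. "
        "Cats sleep.")
print(summarize(text, 1))
```

Modern abstractive summarizers generate new wording instead of selecting existing sentences, but frequency-based extraction like this was the historical starting point.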
• 21:00 - 21:30 So that is one use case. Another use case is of course the very obvious one: question answering. I was able to find a tool normally used in academia called, I think, ChatPDF (notice the play on words there), where you take a PDF document of, say, a journal or a research paper, upload it to the website, and the model is able to immediately grasp the
• 21:30 - 22:00 keywords or understand the overall content of, say, an eight-page journal article, and from there give you interaction in the form of a chatbot. In other words, you're able to, technically, ask the journal questions about what it is about. That is extracting answers to questions posed in natural language from a given data set. But it doesn't
• 22:00 - 22:30 have to be academic in nature; it can even be as simple as a chatbot. How does it generally work? You ask it a question, say for instance, 'Who discovered gravity?' We all know who this is, but how is a machine able to understand this question? Well, it can refer to a body of knowledge: documents, websites, a compilation of information, and this is what you
• 22:30 - 23:00 would call a corpus. In Latin this means 'body,' or, to put it into our context, a body of knowledge, which is then used by the model to get answers from, and from there provide an appropriate answer.
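The corpus-lookup idea above can be sketched as simple retrieval: pick the corpus document that shares the most words with the question. The three-sentence corpus here is invented for the demo; real question answering uses vector embeddings and learned rankers rather than raw word overlap.

```python
# Minimal question-answering sketch: retrieve the best-matching fact
# from a tiny hand-made corpus by counting shared words.
CORPUS = [
    "Isaac Newton discovered gravity after observing a falling apple.",
    "Alexander Fleming discovered penicillin in 1928.",
    "Marie Curie pioneered research on radioactivity.",
]

def answer(question: str) -> str:
    """Return the corpus sentence with the largest word overlap with the question."""
    q_words = set(question.lower().strip("?").split())

    def overlap(doc):
        return len(q_words & set(doc.lower().strip(".").split()))

    return max(CORPUS, key=overlap)

print(answer("Who discovered gravity?"))
```

For 'Who discovered gravity?' the Newton sentence shares 'discovered' and 'gravity,' so it wins; the same mechanism, vastly refined, is what lets a chatbot ground its answers in a body of knowledge.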
• 23:00 - 23:30 Now, another feature is paraphrasing, where the machine is able to restate the meaning of a sentence or passage using different words or sentences. Again, the majority of the tools I was able to find here lean toward the academic side, meaning these are tools that can make academic work, such as research reading, synthesis, and review of related literature, much easier. And not only that, you can even use it just for fun,
• 23:30 - 24:00 like for instance when you're trying to say something like 'Yo, who y'all messing around with right now, my homies?' I'm not sure if you are familiar with this accent, but it is a bit American, and from there the model is able to understand it and properly give you the meaning of that sentence: 'What are you doing with your friends today?' So it's able to paraphrase what I said in a
• 24:00 - 24:30 different way, or express it in a different way. And lastly, we have generation. Here the chatbot really shines; this is where simulating conversations with humans such as us would definitely come in. Chatbots fall under generation because the models themselves, the GPTs in fact, are able to generate text
• 24:30 - 25:00 based on a given prompt. But you'll notice that it's not purely generation alone; it also involves analysis and understanding. Another example of generation is machine writing. An example would be text prompts for generative AI, meaning you prompt the AI model to, say, generate an image. But if I
• 25:00 - 25:30 were to give a much more fitting example of machine writing: you can use AI to generate code if you are a programmer or a developer. For instance, you can just type a prompt, 'Write Python code for displaying the text hello world,' and from there the model is able to provide the corresponding syntax for that given instruction. In a way, you are
• 25:30 - 26:00 feeding instructions to a machine, and it's able to generate source code or programs for you. And lastly, text-to-speech synthesis. It was, in my opinion, a bit of a questionable movement for large companies such as the likes of Apple, Google, and Amazon: when AI really took its turn in the early 2020s, you may have noticed that they went a bit to the back seat;
• 26:00 - 26:30 you may have noticed that it is ChatGPT taking the spotlight, but in a way they are already there in the form of text-to-speech synthesis. Just by saying 'Hey Siri,' your phone, if you have an iPhone or an Apple device, is able to immediately detect your voice through voice recognition. I'm trying to prevent this
• 26:30 - 27:00 from actually being said, because I have my Apple device with me and it would activate, but you get the point. Another example is Alexa: you say 'Hey Alexa' and immediately the device detects your voice. So this is text-to-speech analysis: it is able to take my voice, something audible in audio form, make sense of it, translate it into text, and from there trigger the machine to get my
• 27:00 - 27:30 attention. Okay, so these are the three high-level use cases, but you may have noticed that all of them share some of the same functionalities and characteristics. This is where the two major components of NLP come in. Now, NLP as a study or as a science consists of two components: natural language understanding and natural language generation. You may have noticed some overlaps between the
• 27:30 - 28:00 two. First off, natural language understanding, or NLU: what is it? This is a subset of NLP which uses syntactic and semantic analysis of text and speech to determine the meaning of a sentence. It focuses more on machine reading comprehension, or how the machine is able to understand and interpret human language. For my example here, since the majority of us are using English, I'll use English as the
• 28:00 - 28:30 language. Let's take one simple word: the word 'key.' In Filipino it is 'susi.' 'Key' means a lot of things, right? In the Filipino language, the word 'key' as a noun means 'susi'; it can also mean 'important'; it can also
• 28:30 - 29:00 mean another thing, which I'll show you. Is anybody here familiar with D&D, Dungeons and Dragons? That is a tabletop game, so let's put this into context. Say you are in a campaign right now and this is what the dungeon master is saying. First off: 'The key to the castle has been lost.'
• 29:00 - 29:30 You'll notice the word 'key' here is used as a noun. But then he goes on and says, 'The general played the key role in making the campaign a success.' Notice that the word 'key' is still there, but how is it being used? Is it still being used as a noun? Of course not; it is being used as an adjective to describe another noun, which is 'role.' And then he further goes on and says,
• 29:30 - 30:00 'The guardsman knocked the door, keying it shut.' So what does that even mean? If you look at these three examples alone, you're able to see and understand that the word 'key' here means a lot of things, and it is now the role of your AI model, through NLU, to comprehend these three usages of the word based on
• 30:00 - 30:30 their context. So, a bit of Linguistics 101 here. You may have seen the key terms here ('key,' pun intended): syntactics, or syntax, and semantics. When we refer to syntax, it refers to how a word is structured, while in semantics you refer to the underlying meaning of that given word. By the way, I'm not a linguist, so that's
• 30:30 - 31:00 my caveat there, but my point in this simple example is: for the word 'key,' you need to understand how it is spelled (that is where syntax comes in) and how it is used in a sentence (that is where semantics comes in). Notice that just from these three simple examples alone, NLU makes sense of how the word is used in the three sentences, and that is what NLU is in general.
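The disambiguation the speaker describes, deciding what 'key' is doing from its neighbors, can be mimicked with a crude context rule. This is only a hand-written sketch of the idea; real NLU systems infer part of speech and sense statistically from large corpora, not from a lookup like this.

```python
# Toy word-sense tagger for the word "key": guesses its grammatical role
# from the word that follows it. The neighbor lists are invented examples.
def tag_key(sentence: str) -> str:
    words = sentence.lower().strip(".").split()
    if "key" not in words:
        return "absent"
    i = words.index("key")
    following = words[i + 1] if i + 1 < len(words) else ""
    if following in {"role", "point", "factor", "player"}:
        return "adjective"   # "key" modifying a noun, as in "the key role"
    return "noun"            # default: "the key to the castle"

print(tag_key("The key to the castle has been lost"))              # noun
print(tag_key("The general played the key role in the campaign"))  # adjective
```

Even this crude rule shows the principle: the same surface form ('key') resolves to different roles purely from context, which is exactly what NLU has to do at scale.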
• 31:00 - 31:30 Now, the other component of natural language processing is NLG, or natural language generation. This is a subset of NLP which involves producing a human-language text response based on some data input, meaning input is still needed, and you'll notice how it overlaps with understanding because it involves the analysis of that input. It focuses more on text planning, sentence planning, and final realization. Let's
• 31:30 - 32:00 take an example. Suppose you use, say, ChatGPT or Gemini (formerly Google Bard). You type a simple statement, say, 'Write me a simple sentence,' and you see something like an ellipsis, where the chatbot is perceived to be thinking of a good answer for your simple instruction. But in the back end it's already going through the NLG
• 32:00 - 32:30 process of immediately planning out a given text or sentence. It's also able to identify the different words that it would use, and finally comes realization, where it uses a reference, say the corpus or body of knowledge it contains, in order to create a simple text. This is where NLG comes in.
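The three stages just named (text planning, sentence planning, realization) can be sketched with a template pipeline. This is classical template-based NLG, not how a GPT-style model generates (those learn the whole mapping end to end); the field names are invented for the demo.

```python
# Sketch of the three NLG stages using a hand-written template.
# Input fields ("entity", "state") are hypothetical example data.
def generate(data: dict) -> str:
    # 1. Text planning: select which facts to mention and in what order.
    plan = [("entity", data["entity"]), ("state", data["state"])]
    # 2. Sentence planning: map the planned facts to words.
    words = [value for _, value in plan]
    words.insert(1, "is")  # yields: <entity> is <state>
    # 3. Realization: apply surface grammar (capitalization, punctuation).
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."

print(generate({"entity": "the network", "state": "operating normally"}))
# The network is operating normally.
```

Template NLG like this is still common for structured outputs (alarms, reports) precisely because each stage is inspectable, whereas a learned generator trades that transparency for flexibility.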
• 32:30 - 33:00 Of course there are a lot of nuances to this, but you get the point of what NLG is. Now, before we move on further, let's have a quick recap of what machine learning is about. You know now that NLP is a subset of AI, and we can actually use machine learning as a means of implementing natural language processing. For those of you who weren't able to attend the previous sessions, I quickly spun up these slides that I
• 33:00 - 33:30 presented a while ago. We know that AI is a science that deals with creating intelligent machines capable of mimicking perceived human intelligence, and a subset of it is machine learning. Machine learning is a subset of AI that deals with the creation of models; these models depend on data, and from that data they draw conclusions or gain insight
            • 33:30 - 34:00 out of it no without explicit programming this is unlike your procedural programming wherein you have a simple if then else it is very structured a machine learning uses a model and that model um through a series of mathematical and statistical computations able to give out a a result okay now a subset of the subset is deep learning so this is an example also of machine learning wherein you use neural
            • 34:00 - 34:30 networks to um make sense of the data and how the model is structured it is again to say an artificial neural network wherein you have several hidden layers before it reaches an end result now this is especially important for chat GPT or for GPT in general because GPT uses a transformation or a Transformer architecture okay which is
            • 34:30 - 35:00 um implementation of deep learning so more of that one later when I discuss what gpp is now because machine learning depends on data a workflow on data science and engineering is needed to ensure that the created model will produce the correct and desired result so my point here is that um is NLP somewhat also different to machine learning well it can be but most of the use cases we have nowadays it still involves the creation of an NLP model a
            • 35:00 - 35:30 language model that is able to get data from users in the form of written text so it still goes through the same workflow as how you would be able to S example create a linear regression um traffic Trend based on a table of values the only difference here is that the input is already human text but
            • 35:30 - 36:00 Regardless, you can still use the machine learning model life cycle to implement NLP. Having said that, allow me now to introduce the NLP pipeline in AI. I call it a pipeline because data in the form of text is considered unstructured data: even
            • 36:00 - 36:30 though it is written in paragraph form, the machine treats it as unstructured, just a blob of text. We can make sense of it because we know what we wrote down, but how is the machine, or the model, able to understand what we have written? This is where the pipeline comes in, and in
            • 36:30 - 37:00 general it consists of several steps. Based on my research, although we have seven steps here, most of the pipelines currently in use treat five of them as the most important. This is not to say the other two can be dismissed; it's just that most of the references I was able to find
            • 37:00 - 37:30 refer to these five basic steps as the foundation of an NLP pipeline, meaning that this is what a pipeline would consist of. In some sources these are simply called the basic steps in NLP, but again, however the pipeline is implemented, you can use the machine learning model life cycle to process the given unstructured text. So we have seven general
            • 37:30 - 38:00 steps, and I'll go through them one by one quickly. First: sentence segmentation; these are the questions that were raised in your Kahoot this morning. As the name implies, this step divides an entire paragraph into separate sentences for better understanding. If I take English as an example, we can use the period as the delimiter between the different texts in
            • 38:00 - 38:30 a given paragraph. Suppose I take this example paragraph from Wikipedia. By the way, the steps I'm referring to are the general steps found in this source, so credit where credit is due, for the example as well; you can look it up. Notice that I have here a series of paragraphs. Sentence segmentation, as the name implies, just means chopping the
            • 38:30 - 39:00 entire paragraph into smaller sentences. This differs depending on how a sentence is delimited in a given language. For English speakers, and in fact some Western and a few Eastern languages, the period delimits a sentence, but some languages use other symbols to separate
            • 39:00 - 39:30 sentences, and it doesn't even have to be a period; it can be a question mark or an exclamation point. If you know a bit of Spanish, or the languages of Spanish-speaking countries such as Spain or Mexico, you may have noticed that for exclamations and questions they use not just the closing mark;
            • 39:30 - 40:00 they normally also place the mark at the very beginning, in upside-down form. That is also where sentence segmentation comes in: it immediately tells the model that a certain span is a certain kind of sentence, whether declarative, an inquiry, or an exclamation. Here, though, we just use the period for this example.
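The period-as-delimiter idea can be sketched in a few lines. This is a naive illustration, not a production segmenter; real segmenters also handle abbreviations ("Dr."), decimals, and language-specific marks like the Spanish inverted ¿ and ¡:

```python
import re

# Naive sentence segmentation: split wherever ., ! or ? is
# followed by whitespace. Deliberately simplistic.
def segment(paragraph):
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]

text = ("London is the capital and most populous city of England "
        "and the United Kingdom. Standing on the River Thames, "
        "London has been a major settlement for two millennia.")
for sentence in segment(text):
    print(sentence)
```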
            • 40:00 - 40:30 After a sentence has been broken down, the obvious next step is to break it into smaller phrases; note that I say phrases, and the more technical term for these is tokens. This is where the next step comes in: tokenization. Tokenization breaks the sentence into separate words, or tokens, which helps the model understand the context of the text even more. When tokenizing,
            • 40:30 - 41:00 the sentence is broken into separate words or phrases based on the model's set threshold. Depending on the model, or the language model you are referring to, a token can be a word within a sentence separated by spaces, it can be symbols, or it can even be the characters themselves. Comparing with English again, it is easy for us to assume that a word is a token, but note that this is not always the case; sometimes
            • 41:00 - 41:30 a token can even consist of several English words put together as one. If I take the example of another language, say Japanese or Chinese, where each word can be tokenized into a specific symbol, then those symbols can be expressed as one word. This is especially nuanced in the Chinese and Japanese languages. For example, the
            • 41:30 - 42:00 word "sushi": if you translate it into its kanji form in Japanese, it consists of two characters, "su" and "shi." Taken as the whole word, "sushi," we know it is a food, a Japanese dish, but if you break it down, if you tokenize it by its individual characters, "su" and "shi," you get a different view of
            • 42:00 - 42:30 it. That is why tokenization is important: you set a specific threshold. Is a token equivalent to one word? Is it equivalent to letters, to characters, or to phrases? It depends on the model.
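The threshold idea — token as word versus token as character — can be illustrated like this. It is a toy sketch; production tokenizers (for example subword tokenizers used by language models) are far more sophisticated:

```python
# Tokenization at two granularities. Whitespace tokens work
# reasonably for English; character-level tokens matter for
# languages like Japanese, where 寿司 ("sushi") is two
# characters that each carry meaning on their own.

def word_tokens(sentence):
    # Strip trailing punctuation from each whitespace-separated chunk.
    return [w.strip(".,!?") for w in sentence.split()]

def char_tokens(word):
    return list(word)

print(word_tokens("London is the capital of England."))
print(char_tokens("寿司"))  # character-level view of "sushi"
```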
            • 42:30 - 43:00 After we have tokenized the given sentence, the next step involves stemming. Stemming and the step after it, lemmatization, are somewhat confusing; I for one was confused when I first heard these two terms, but here's how to simplify them. Stemming helps in preprocessing text: it is a means of normalizing words to their base or root form, and in doing so it helps in predicting the part of speech of each token. Suppose I have the token "London." It's
            • 43:00 - 43:30 obvious that its base form is also "London," so that is its stem. Well, actually, not necessarily: a crude stemmer could split "London" into two pieces, "lon" and "don," which can mean different things; "lon" can be an abbreviation of longitude, and "don" can mean to put on or to
            • 43:30 - 44:00 wear. In stemming you take a given token and break it down based not just on the word itself but also on the surrounding tokens. The token is then fed into a part-of-speech prediction model, which identifies what kind of stem it is: a proper noun, a verb, a
            • 44:00 - 44:30 conjunction, a determiner. Taking my earlier example, each token is then assigned a role in its sentence, whether proper noun, conjunction, or adverb. This is also the step where, if some words are
            • 44:30 - 45:00 combined with prefixes or suffixes, stemming cuts those affixes off and keeps just the base form. For example, "eating": if it goes through the stemming process, it is split into "eat" and "ing," and the "ing" is disregarded altogether; we just take the "eat" part.
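The "eating → eat" idea can be sketched as a deliberately naive suffix-stripping stemmer. Real stemmers (for example the Porter algorithm) apply ordered rule sets with many conditions; this sketch only illustrates the principle:

```python
# Naive suffix-stripping stemmer. The suffix list and the
# minimum-length guard are illustrative choices, not a real
# algorithm: "London" is left alone, "eating" loses its "ing".

SUFFIXES = ("ing", "ed", "es", "s")

def stem(token):
    for suffix in SUFFIXES:
        # Keep at least 3 characters of stem so short words survive.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

for word in ["eating", "played", "London"]:
    print(word, "->", stem(word))
```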
            • 45:00 - 45:30 So if stemming takes the base or root form, what about lemmatization? One more thing to consider for stemming: it relies on the basic syntax of a given word; we know how "London" is spelled. Lemmatization, in contrast, removes inflectional prefixes or suffixes and returns the canonical form of a word. If stemming gives the base or root form,
            • 45:30 - 46:00 lemmatization gives the canonical form. To give you an example, take the word "London" again. In lemmatization it would be incorrect to say that the lemma of "London" is "don" or "lon," because we know "London" to be a word, the name of a place; its correct lemma is "London." Let's
            • 46:00 - 46:30 take another token, say "capital." Note that lemmatization doesn't just mean stripping prefixes or suffixes; it also means taking into account the semantics, the context in which the word is used in the sentence. "Capital" as a word has different definitions; it can be a noun or it can even be an adjective. But if we take
            • 46:30 - 47:00 the context of the example sentence I showed a while ago, we definitely do not want the lemma of "capital" to be associated with its adjective sense, which can mean the best, the top, or first-rate; we want the noun sense. Notice also that "capital" as a
            • 47:00 - 47:30 noun can itself mean different things: in a business context it can mean assets or money, but in a governmental or political context it means a metropolis or municipality. In lemmatization you have to consider this too. We don't want "capital" read as a business term; we want it
            • 47:30 - 48:00 read with its political definition: a city, a metropolis, a municipality for a given group of people. This is how lemmatization is applied: it refers to the context in which the word is being used.
            • 48:00 - 48:30 You'll notice that up to this point, the majority of the steps have involved taking the user input and essentially breaking it down into its smallest constituent pieces. This is because the process really does require that you first break the input down into its smallest units of data before you
            • 48:30 - 49:00 can even analyze it, and lemmatization is arguably the last point at which you break the sentence down into its smallest understandable units, its lemmas. The next step involves filtering out those small chunks, or tokens. This is where stop-word analysis comes in: the process of
            • 49:00 - 49:30 identifying and removing commonly used words that carry little meaning on their own. This is not to say we remove everything completely; it is just a way for the model, or the machine, to make sense of which are the more important parts of your data set, or of your sentence. Stop-word analysis removes most of the
            • 49:30 - 50:00 conjunctions, prepositions, and articles of a language, such as "the," "in," "and," "of," et cetera. Note that not all stop words have to be filtered out; similar to tokenization, a threshold is set. To give you a classic example, consider
            • 50:00 - 50:30 our own country. Whenever the Philippines is mentioned, it is normally not referred to as just "Philippines" but as "the Philippines." So there, "the" is a stop word, but when you tokenize, you do not filter out the "the" in "the Philippines." Another example:
            • 50:30 - 51:00 from an academic perspective, you do not say just "China"; you say "the People's Republic of China." It is a full term: the People's Republic of China. Some country names even include the word "of" in the name itself, and some
            • 51:00 - 51:30 models might otherwise filter out words that are significant to a country's name; for instance, in "South Korea" and "North Korea" you do not filter out "south" or "north," and in the States you do not filter "North" and "South" out of North and South Dakota. So it still depends. The point is that just because we
            • 51:30 - 52:00 filter something out doesn't mean we remove it completely from the sentence; it is just a means for the model to understand which are the more important parts of that sentence. If I take my example from a while ago, after the sentence has gone through stop-word analysis, we can see that the most important words are "London," "capital," "most popular city," "England," and "United Kingdom."
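Stop-word filtering with a "keep" exception list — mirroring the point that "the" must survive in names like "the Philippines" — can be sketched like this. Both word lists are tiny, illustrative samples, not real stop-word inventories:

```python
# Stop-word removal with exceptions: a stop word is kept when
# it begins a known multi-word name such as "the Philippines".

STOP_WORDS = {"the", "in", "and", "of", "a", "is", "most"}
KEEP_PHRASES = {("the", "philippines"), ("south", "korea"),
                ("north", "korea")}

def remove_stop_words(tokens):
    kept = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
        if tok.lower() in STOP_WORDS and (tok.lower(), nxt) not in KEEP_PHRASES:
            continue  # drop the stop word
        kept.append(tok)
    return kept

print(remove_stop_words(["London", "is", "the", "capital", "of", "England"]))
print(remove_stop_words(["the", "Philippines"]))
```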
            • 52:00 - 52:30 So that is how stop-word analysis is used. The next step is dependency parsing. How does it work? After filtering down to the most important words in the sentence, the next step is to find out how the words are associated with each other, and the model normally uses a tree, a logical structure,
            • 52:30 - 53:00 to correlate the different words and how they are used in the sentence. Algorithmically speaking, from a computer science perspective, you normally take a root word to serve as the parent and from there branch out to the other words associated with it. Taking my earlier example, let's take the word "is." Note that "is" is also a
            • 53:00 - 53:30 stop word, but for this specific process, even though it has been marked as a stop word, it is still considered part of the sentence itself; it serves as a reference point for understanding how it is associated with "London" and with "capital." We know that "is" is a verb,
            • 53:30 - 54:00 the present tense of the verb "to be," and it is connected to "London," describing it as a proper noun, and it is also connected to "capital," which is a noun: London is the capital. You'll notice that it is describing "London" as another noun, a capital. Makes sense? From there,
            • 54:00 - 54:30 through dependency parsing, you also need to make sense of how the remaining stop words attach to the main words. Taking "capital" as an example, we know it is connected to the words "the" and "and": "the" is the determiner and "and" is the conjunction. And since the word "and" is also connected to another word,
            • 54:30 - 55:00 "city," as in "the capital and most popular city" from my sentence a while ago, "capital" is connected through "and" to "city." It's a bit confusing at first, but my point is that through dependency parsing you are now able to associate the different words and see how they are connected with each other.
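The tree structure just described — the verb as root, with the other words branching off it — can be written out by hand. The edge labels below are informal descriptions, not the official Universal Dependencies relation names, and real parsers (such as those in spaCy or Stanza) infer these edges automatically:

```python
# Hand-built dependency tree for "London is the capital":
# "is" is the root; "London" and "capital" hang off it;
# "the" hangs off "capital".

tree = {
    "is":      {"pos": "VERB",  "head": None,      "relation": "root"},
    "London":  {"pos": "PROPN", "head": "is",      "relation": "subject"},
    "capital": {"pos": "NOUN",  "head": "is",      "relation": "attribute"},
    "the":     {"pos": "DET",   "head": "capital", "relation": "determiner"},
}

def children(tree, head):
    # All words whose head pointer is `head`.
    return [w for w, info in tree.items() if info["head"] == head]

print(children(tree, "is"))       # words attached directly to the root
print(children(tree, "capital"))  # the determiner attaches lower down
```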
            • 55:00 - 55:30 The last step is part-of-speech tagging. This is where your language proficiency — if we're parsing English, the English language itself — comes in. This is the step where you assign the different roles to a given word based on how it is used in a sentence, considering the grammar as well. To give a
            • 55:30 - 56:00 fuller description: part-of-speech tagging is the process of assigning grammatical labels, or tags, to each word in a sentence. This is different from stemming. You'll notice that we know "London" is a proper noun, but in part-of-speech tagging we also learn that "London" is the subject; assigning that
            • 56:00 - 56:30 subject role to "London" is where part-of-speech tagging comes in. These tags correspond to each word's function within the sentence. If stemming relates more to what the word is, part-of-speech tagging is more about how the word is used in the sentence. This is why I tell you NLP is also a Linguistics 101 course: nouns, verbs, adverbs, adjectives, and so on. This is also
            • 56:30 - 57:00 especially important because not all of the languages we have nowadays follow the typical SVO, subject-verb-object, structure; it depends on the language. In my example from a while ago, we now know that the word "London" is not only a proper noun but is also assigned as the subject. We also know that the word "is" is
            • 57:00 - 57:30 assigned as a form of the verb "to be," an attribute connecting to "capital," which is a noun and, in a way, your object. Again, it also depends on the language being parsed, because it differs in some languages. I know a bit of Japanese, and the structure of Japanese grammar,
            • 57:30 - 58:00 the Japanese language, is not subject-verb-object; it's more like subject-object-verb. So if I were to say "the cat is eating," notice that in English it follows the subject-verb-object order, while the Japanese equivalent
            • 58:00 - 58:30 is structured differently from English. In part-of-speech tagging you also have to consider the different voices, whether active or passive; this is also where the proper tags are assigned. My apologies, we've tackled a bit of linguistics there, but again, NLP is closely related to it.
            • 58:30 - 59:00 I hope you were able to follow. Again, out of the seven steps, the most basic core steps you have to consider are these five; let me show them here again. But if you want NLP to work across several sentences, you can also include the first one, sentence segmentation. In some references, dependency
            • 59:00 - 59:30 parsing is considered part of part-of-speech tagging, merged into its functionality, but in the reference I was able to find it is treated as a separate step. Now, moving on: because NLP goes through several steps all at once, you can now appreciate the overall complexity of how it works, and
            • 59:30 - 60:00 implementing this in the form of a language model is a challenge on its own. But having said that, it is not impossible, because we already have GPTs in play. So what is GPT? GPT stands for Generative Pre-trained Transformer. It is a prominent framework within generative artificial intelligence, and it falls
            • 60:00 - 60:30 under the umbrella term of large language models. Most of the topics I have discussed here apply to the machine learning aspect of natural language processing; if we look at NLP in a much bigger picture in relation to machine learning, there are what you would call large language models. What are these? LLMs, in a way,
            • 60:30 - 61:00 are models specific to language, meaning they are an application of machine learning, and depending on the functionality you want — analyzing or generating text — you use an LLM; GPT works at the underlying level of the LLM, as a
            • 61:00 - 61:30 particular use case of one. How is it implemented? GPTs are essentially neural networks, so you can associate deep learning with GPT, and they are designed for natural language processing tasks. The table I am showing you here lists some high-level examples. The ones you see on most websites on the
            • 61:30 - 62:00 internet nowadays are what you would call chatbots. From a developer's perspective, this is your front end, the component responsible for interacting with the end user. The chatbot is a separate system on its own; behind the chatbot is your LLM, your large language model, and this is where GPT comes in. For the IT people and developers out there, you may be familiar
            • 62:00 - 62:30 with how a system is structured: you have a front end, you have logic, and you have a back end. The front end here is the chatbot, the customer- or end-user-facing component; the logic is the LLM, the large language model, responsible for manipulating and processing the data itself in the form of language text; and finally, the back end can be the database, or it can
            • 62:30 - 63:00 even be the corpus, the compilation of different bodies of knowledge that the LLM refers to in order to understand and generate text. It depends on the company that developed the LLM; the most popular one nowadays is of course the GPT model, developed by OpenAI, with ChatGPT as its front end.
            • 63:00 - 63:30 not mistaken I think GPT 3.5 refers to a corpus or a data set from the internet coming back as early as 2011 if I'm not mistaken now I I have to check my reference here my referen Sy then but my point here is that versions of of um gpts or llms in general would normally refer to up to what extent of body of knowledge is the llm being referred
            • 63:30 - 64:00 to okay at least for the case of GP okay for Google for um meta he Meta Even is developing already their own um uh llm models no it it also varies okay so from from my personal um experience most of the um uh use casys set that I have used I'm more of akin to using Google bard and also to Palm because frankly that's the one that I have at the moment no but
            • 64:00 - 64:30 there Bard is using Palm 2 okay nowadays it is also using the Gemini llm okay and in fact I believe Google has already combined Gemini to be both the chatbot itself as well as the llm I'm not mistaken okay because they also have another product there called vertex AI but my my my point here is that the GP PT itself it's called GPT because of several aspects okay or several
            • 64:30 - 65:00 characteristics. First, it is named "generative" because, as the name suggests, it excels at creating novel content; this is a bit of a double-edged sword. The pre-training phase equips these models with the ability to predict the next word in a sequence, allowing them to create human-like output in formats from prose to code and even images. That is
            • 65:00 - 65:30 where the "generative" concept comes in. "Pre-trained" means that, because it is a large language model, unlike similar models that you have to train and develop on your own, it comes pre-bundled: it already knows that if it receives a certain kind of text, it knows how to respond; it's just a matter of how you optimize it.
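The "predict the next word in a sequence" objective can be shown in miniature. GPT learns this with a Transformer over huge corpora; the bigram counter below is only a toy stand-in for that objective, on an assumed two-sentence corpus:

```python
from collections import Counter, defaultdict

# Toy next-word predictor: count which word follows which
# during "pre-training", then generate by always picking the
# most frequent follower. Not a Transformer -- an analogy only.

corpus = ("london is the capital of england . "
          "paris is the capital of france .").split()

following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def next_word(word):
    # Most frequent word observed after `word` in the corpus.
    return following[word].most_common(1)[0][0]

def generate(start, length):
    words = [start]
    for _ in range(length):
        words.append(next_word(words[-1]))
    return " ".join(words)

print(generate("london", 4))
```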
            • 65:30 - 66:00 And finally, the "Transformer" refers to the third characteristic, the neural network architecture. To get a bit technical, the Transformer architecture consists of two components: an encoder part and a decoder part. To simplify, GPT is structured so that tokens are fed into the decoder and transformed
            • 66:00 - 66:30 into a representation that can then be matched back against the pre-trained model. You can look the Transformer architecture up further; it is one of the milestones, one of the key developments, of artificial intelligence, specifically machine learning. The point is that GPT has three characteristics: it is generative, it is pre-trained
            • 66:30 - 67:00 in the form of a large language model, and it is implemented using neural networks, specifically the Transformer architecture. I'm not kidding, it really is called the Transformer architecture. So there. Now, like other AI technologies, NLP faces several challenges. I was able to find a good source for this, MetaDialog; you
            • 67:00 - 67:30 can even look the website up yourself. It cites seven challenges, but I was able to condense them into just three. What are the high-level challenges? Number one: ambiguity and vagueness. We know how vague human language is, and not just in Filipino or in English. Take the word "yarn," which has lately been gaining popularity as slang:
            • 67:30 - 68:00 if a chatbot sees the word "yarn," it may not be able to distinguish it from a ball of thread. That is already one challenge NLP faces: human language is often filled with words or phrases that may mean a variety of things depending on the context. To accurately analyze and interpret the intended meaning, NLP algorithms must detect
            • 68:00 - 68:30 the context in which the word or phrase is being used. For instance, you're already familiar with ChatGPT: if you type a bare prompt like "generate this for me," the machine will respond by asking what you mean. What does "this" stand for? Are you asking it to generate an image, or text? The
            • 68:30 - 69:00 sentence "generate this for me" is already vague: if you open a session and type that on its own, it will ask you for more context, because it needs clarity about what you have typed, and sometimes you'll be flabbergasted about what to type next. So that's one challenge: ambiguity and vagueness. Another challenge is language evolution. It is
            • 69:00 - 69:30 challenging for natural language processing systems to stay current and handle new language developments appropriately; it is hard to keep up, since NLP algorithms must constantly update their language models. Again, language models are not self-growing; they do not create ideas out of thin air. Well, technically they do generate, but they refer to something else,
            • 69:30 - 70:00 meaning a model still refers to a database, a body of knowledge, to files or content, and if those files have not been updated, then of course the model will respond with an outdated answer. For example, if you type "what is GOAT" — I'm not sure if you're familiar with it — the chatbot might respond
            • 70:00 - 70:30 that it is a four-legged domestic animal, when you were actually referring to G-O-A-T, the more recent acronym for "greatest of all time." It varies. And let's take it in Filipino, the example I gave a while ago: in gay lingo, the word "yarn"
            • 70:30 - 71:00 does not mean a ball of thread; it can mean a different word for us. Or the word "sana all," which can even be expressed or spelled in different ways. Having said that, the third item is figures of speech. The term "sana all" is actually a figure of speech, with a bit of a tone of
            • 71:00 - 71:30 sarcasm, and this too is a challenge faced by NLP. Figures of speech such as sarcasm, irony, and idiomatic expressions are definitely a stumbling block for NLP systems. These challenges can lead NLP systems to misinterpret the sentiment of a text, mistranslate languages, or fail to accurately answer questions that rely on understanding figurative imagery. To take a more wholesome example: suppose
            • 71:30 - 72:00 someone writes "this item is a bomb" and you apply sentiment analysis to it. If the model only knows the literal English sense of "bomb," it might misinterpret the sentence as something else entirely. In my other example here, "she's cold with her words today," it's a figure of speech, a metaphor, but
            • 72:00 - 72:30 if the model is not trained properly it might respond off-key — "maybe check her temperature today" — and when you respond with a facepalm, the bot says, "do you have a headache, then?" Chatbots are not yet at the point of reliably recognizing figures of speech, or even, say, a simple exclamation from the user.
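The "this item is a bomb" failure can be reproduced with a naive lexicon-based sentiment scorer. The word scores below are an illustrative assumption, not a real sentiment lexicon:

```python
# Naive word-level sentiment: sum per-word scores. "bomb" is
# listed as negative, so the slang compliment "this item is a
# bomb" scores negative -- exactly the figurative-language trap.

SENTIMENT = {"great": 1, "love": 1, "awesome": 1,
             "bomb": -1, "cold": -1, "terrible": -1}

def score(sentence):
    return sum(SENTIMENT.get(w.strip(".,!").lower(), 0)
               for w in sentence.split())

print(score("this item is a bomb"))              # negative: misses the praise
print(score("she's cold with her words today"))  # negative: misses the metaphor
```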
            • 72:30 - 73:00 So that's it. Although we still have challenges, this gives us a more realistic picture of where we are right now in terms of available use cases, specifically for NLP. Having said that: after talking to you for more than an hour already about what NLP is, what
            • 73:00 - 73:30 machine learning is, what artificial intelligence is — and given that you are enrolled in the Asia Open RAN Academy — why are we even talking about NLP and Open RAN in the first place? Here's what I propose: NLP has applications in Open RAN. You may have noticed that the majority of our discussion so far, while it has artificial intelligence at play, the
            • 73:30 - 74:00 application of AI through NLP focuses primarily on parsing text, on making sense of human language. So where does Open RAN come in — or, in a much bigger picture, how can I relate NLP to telecommunications? Here's what I propose: the telecommunications spaces I have been exposed to also deal with human interaction. Human-to-machine
            • 74:00 - 74:30 communication plays an important role in telecommunications in general. It would be a bit of an irony if telecommunications, a field dealing with communications, had no means of making the communication in its own underlying business processes more efficient, and that's where NLP can be a possible use case. I was able to propose
            • 74:30 - 75:00 three high level use cases on where NLP can be applied in open run in general and this can be applied across um the different domains of your Telecom right from access up to core and even to services but I'll start off with the most obvious one which is in your um end user space chatbots and virtual assistance I mean you are already familiar with ask G right in Globe right
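Under the hood, a first-generation chatbot of this kind often reduces to keyword-based intent matching. The sketch below is entirely hypothetical — the intents, keywords, and canned replies are invented for illustration, not taken from any real telco assistant:

```python
# Minimal keyword-overlap intent router for a hypothetical telco chatbot.
INTENTS = {
    "billing": ({"bill", "charge", "payment", "balance"},
                "Your current balance can be viewed in the app under Account."),
    "outage":  ({"signal", "down", "outage", "slow"},
                "Let me run a line check. Could you share your location?"),
    "plans":   ({"plan", "upgrade", "promo", "data"},
                "Here are the plans available for your account."),
}

def route(message: str) -> str:
    words = set(message.lower().split())
    best, best_hits = None, 0
    for name, (keywords, _) in INTENTS.items():
        hits = len(words & keywords)  # how many intent keywords appear
        if hits > best_hits:
            best, best_hits = name, hits
    # Fall back to a human agent when nothing matches — echoing the point
    # that chatbots cannot completely replace human interaction.
    return INTENTS[best][1] if best else "Let me connect you to a human agent."

print(route("why is my bill so high this month"))
print(route("my cousin said hi"))
```

Production assistants replace the keyword sets with trained intent classifiers, but the routing structure — classify, reply, or escalate to a human — stays the same.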
            • 75:00 - 75:30 So that is an example of a virtual assistant, or a chatbot — oftentimes with mixed reviews, if I may say. NLP can allow telecommunications companies to develop chatbots that can answer customer questions, troubleshoot problems, and even take orders. Now, I have a bit of a caveat here, having been in the telco business myself: although there are industries already pushing toward the use of chatbots and virtual assistants in their
            • 75:30 - 76:00 specific lines of business, and having experienced it myself, it does have its place in the business — this is just an opinion, by the way — but it cannot completely replace human interaction. There are certain use cases that are specifically nuanced to a given issue; not all can be answered
            • 76:00 - 76:30 by a chatbot. But there is a use case for it — at least there is a use case for it. So that's one. Another one, not so obvious nowadays, especially if you are on the network or technical side, is network operations and maintenance. While most operations and monitoring systems are point-and-click based, NLP with AI may also be used to provide contextual instructions for a specific process. To put it in a more
            • 76:30 - 77:00 niche perspective — a niche use case — intelligent controllers nowadays can even incorporate NLP as part of instructing the machine to perform a certain task. Instead of what I recently heard called "ClickOps," where instructions are just point-and-click, nowadays all you have to do is type out the entire instruction.
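That "type the instruction, let the network derive the steps" flow could be sketched like this. Everything below — the command grammar, the step names, the cell identifier — is invented for illustration; a real intelligent controller would sit on top of a much richer language model and an actual configuration interface:

```python
import re

# Hypothetical sketch: parse a typed natural-language command into an
# ordered list of configuration steps. Grammar and step names are invented.
def parse_command(command: str) -> list[str]:
    m = re.search(r"reduce power on cell (\w+) by (\d+) dB", command, re.IGNORECASE)
    if not m:
        return []  # unrecognized command: emit no steps
    cell, db = m.group(1), m.group(2)
    return [
        f"lock cell {cell}",
        f"reduce tx_power by {db} dB on {cell}",
        f"unlock cell {cell}",
        f"verify KPIs on {cell}",
    ]

for step in parse_command("Reduce power on cell A7 by 3 dB"):
    print(step)
```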
            • 77:00 - 77:30 From there, the network is able to identify what specific steps it has to go through to perform a given configuration — think of it like ChatGPT, but in the network. So that is how one possible use case would come in. And lastly: cybersecurity. This is also applied in the network, but more on the filtering aspect of it. NLP can be a tool to
            • 77:30 - 78:00 enhance cybersecurity by understanding and analyzing communication patterns — such as access-level phishing detection, spam filtering, and even insider threat detection. Now, one great thing about large language models is that you don't necessarily have to program the models to understand only human language; they can even be made to understand machine language, or machine configurations. That's also a
            • 78:00 - 78:30 language, right? Programming languages — C, or Python, or even Java — although considered programming languages, are meant to be read by humans so that software can be developed accordingly, because the only language a computer understands is machine language, which is just a series of ones and zeros in binary. So, arguably, you can also
            • 78:30 - 79:00 incorporate machine-level or programming-language code as part of filtering out any possible threats found inside the network. That's also one use case — unless, of course, you know how to read binary, in which case you are definitely the GOAT, because you can read machine binary directly. But I digress. So there you have three possible use cases.
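The phishing-screening idea could be sketched, very naively, as scoring a message against textual red flags. The flag list and threshold below are invented for this illustration — real systems use trained classifiers over far richer features:

```python
# Hypothetical red-flag scorer for phishing-style messages.
RED_FLAGS = ["verify your account", "urgent", "click here",
             "password expired", "suspended"]

def phishing_score(message: str) -> int:
    """Count how many red-flag phrases appear in the message."""
    text = message.lower()
    return sum(flag in text for flag in RED_FLAGS)

def is_suspicious(message: str, threshold: int = 2) -> bool:
    return phishing_score(message) >= threshold

msg = "URGENT: your password expired. Click here to verify your account."
print(phishing_score(msg))   # -> 4
print(is_suspicious(msg))    # -> True
```

The same pattern-scoring idea extends to spam filtering and insider-threat detection; the interesting part in practice is learning the patterns from data rather than hand-writing them.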
            • 79:00 - 79:30 So, in summary: NLP, in general, is a field of AI that continues to be developed and improved on. Despite offering several advantages — its benefits, its features, its use cases — it still has several limitations on how it can be used, not just in Open RAN, not just in telecommunications, but even in
            • 79:30 - 80:00 day-to-day living. There are several aspects to it, several processes that you have to go through, and you also have to understand that NLP is not an isolated science on its own. Remember that NLP deals with human language, so you definitely also have to tackle a bit of linguistics here as well — and, like I said, computational
            • 80:00 - 80:30 linguistics.