Exploring the Future of Emotions and Machines

Emotionally Intelligent Machines Part 1

Estimated read time: 1:20

Summary

The transcription covers a module on emotionally intelligent machines, focusing on the challenges and opportunities in the domain of affective computing. It delves into the applications of emotion-aware technologies in learning and gaming, discussing how these technologies track and adapt to emotional states. Key themes include the role of sensors in emotion detection, the accuracy needed for effective systems, and various adaptation levels in technology. Highlighted examples are affect-aware learning tools and intelligent systems like Auto Tutor and Gaze Tutor, which enhance user interactions by adapting to emotional feedback.

Highlights

The Gaze Tutor and Auto Tutor are pioneering systems in emotion-aware technology 🎓.
Effective learning technologies track emotional cues to enhance educational tools 📚.
Emotion detection faces hurdles like scalability and accuracy challenges 🧩.
Adaptive systems can range from non-adaptive to complex multi-agent adaptations 🤝.
Different cultures express emotions differently, impacting affective computing's accuracy 🌍.

Key Takeaways

Understanding Emotionally Intelligent Machines enhances learning experiences 🎓.
Auto Tutor and Gaze Tutor are groundbreaking in emotional adaptation 🎮.
Effective detection systems rely on scalable, non-intrusive sensors 🎤.
Adaptation can occur at multiple levels, from no adaptation to multi-agent systems 🤖.
Cultural differences affect the generalizability of emotionally intelligent systems 🌍.

Overview

Emotionally intelligent machines represent a leap forward in digital interaction, transforming how users engage with educational and entertainment technologies. Pioneering efforts like the Auto Tutor and Gaze Tutor show the potential of systems to adapt responsively to users' emotional states, providing personalized experiences that enhance learning and increase engagement.

The ability of machines to detect and react to human emotions hinges on sophisticated systems that balance sensor scalability and accuracy. These innovations are not without challenges, as achieving real-time and effective emotional recognition requires overcoming issues related to noisy environments and diverse cultural expressions.

Adaptive systems are evolving, and their complexity varies greatly. Some systems do not respond to emotional inputs, while others involve sophisticated multi-agent interactions that tailor user experiences uniquely. The development of emotionally intelligent machines must consider cultural differences and work towards universal solutions that can adapt in various contexts.

Chapters

00:00 - 00:30: Introduction to Emotionally Intelligent Machines In this module, the focus is on emotionally intelligent machines. The content begins with music, followed by a greeting to the viewers. The module aims to delve into the topic of machines that possess emotional intelligence, exploring the implications and potential of such technology.
00:30 - 01:00: Challenges and Opportunities in Emotionally Intelligent Machines This chapter explores the challenges and opportunities in the field of emotionally intelligent machines. It builds on previous discussions about emotion processing and elicitation, and their potential applications. The chapter looks at specific domains where these technologies can be applied and identifies existing open issues. The discussion begins with the concept of affect-aware learning.
01:00 - 01:30: Understanding Domains and Open Issues In this chapter titled 'Understanding Domains and Open Issues', the focus is on affective computing in games and the current open issues in the field. Specifically, there is a discussion of online recognition of, and adaptation to, emotions, and how these aspects can be improved. The chapter takes affect-aware learning as the starting point for addressing these challenges.
01:30 - 02:00: Affect-Aware Learning The chapter titled 'Affect-Aware Learning' discusses the application of affective computing in emotional settings, particularly within educational contexts. It covers the learning technologies developed through this approach, which are termed affective learning technologies. These tools are designed to facilitate emotion-aware learning.
02:00 - 02:30: Understanding Learners' Experience and Emotions In this chapter, the focus is on understanding learners' experiences and emotions as they interact with learning technologies. It is emphasized that a learner's experience encompasses both their emotional reactions and their overall learning efforts. The ultimate aim is to enhance learning as learners engage with these technologies.
02:30 - 03:00: Meta-Analysis on Affective Learning Technologies This chapter introduces the meta-analysis on affective learning technologies, focusing on the learners' experiences. It covers the emotions learners experience while engaging with these technologies and the resulting gains in learning. The chapter stresses that the technologies should be developed with an understanding of the types of emotions that occur during interaction.
03:00 - 04:00: Major Affective States in Learning Environments The chapter titled 'Major Affective States in Learning Environments' presents a meta-analysis involving 24 studies conducted by researchers. This analysis included a total of 1740 students from various countries, namely the U.S., Canada, and the UK. Each participant approximately spent 45 minutes in the studies, culminating in a total of around 76,000 minutes of data. This substantial amount of data provides a significant basis for understanding affective states in learning environments.
04:00 - 05:00: Development of Affective Learning Technologies This chapter discusses the collection of data in various settings to study affective learning technologies. Research was conducted not only in classroom and research lab settings but also in online environments. Across this range of application settings, 17 different types of affective states were tracked during the studies.
05:00 - 07:00: Case Study: Affective Auto Tutor In the chapter titled 'Case Study: Affective Auto Tutor', the focus is on understanding the major affective states exhibited by learners. Six key affective states were identified as most useful during interactions with affective learning technologies. These include engagement and boredom, which are highlighted as significant emotional states influencing learning experiences. The discussion emphasizes the importance of recognizing these states to make tutoring systems more effective.
07:00 - 09:00: Case Study: Gaze Tutor The chapter titled 'Case Study: Gaze Tutor' focuses on assessing and monitoring the engagement levels of students interacting with learning technologies. It aims to determine if the learner is engaged, bored, confused, or curious about the content being taught. Additionally, it explores the emotional states of happiness and frustration experienced by the students while learning through these technologies.
09:00 - 11:30: Affect-Aware Games and Sensor-Based Challenges The chapter titled 'Affect-Aware Games and Sensor-Based Challenges' discusses the importance of recognizing and monitoring the affective states exhibited by learners when developing affective learning technologies. It emphasizes the selection of appropriate sensors, ground truths, and machine learning algorithms for accurate detection and analysis of these states.
11:30 - 15:30: Technical Challenges in Affective Computing The chapter covers the technical obstacles encountered in affective computing, a field focused on understanding and implementing emotional interaction technologies. Affective states are highlighted as key components for improving interaction with affective learning technologies. The chapter illustrates this with practical implementations such as the Affective Auto Tutor, introduced in December 2012.
15:30 - 20:00: Accuracy and Detection in Emotion Recognition The chapter discusses one of the first reactive conversational intelligent tutoring systems, which was introduced in December 2012 after several years of earlier work. The system was notable for its ability to interact conversationally and respond to the emotional cues of users.
20:00 - 25:00: Adaptiveness in Emotionally Intelligent Systems In this chapter titled 'Adaptiveness in Emotionally Intelligent Systems', the content revolves around how systems can adapt by tracking and analyzing a user's emotional state. The chapter describes a screen interface placed in front of a user that tracks three different modalities of that user. The first modality is facial features, tracked with the help of a camera; the other two are contextual cues and body language.
25:00 - 39:00: Different Levels of Adaptation The chapter 'Different Levels of Adaptation' discusses the system's ability to track various cues and modalities to determine an individual's emotional or mental state. It emphasizes the use of contextual cues, body language, and other signals to detect emotions such as boredom, confusion, frustration, or neutrality.
39:00 - 46:00: Communicated Task Adaptation The chapter discusses the role of decision-making in tutoring systems. It highlights how these systems use a decision-level fusion algorithm to adapt to different states. The algorithm combines data from various modalities, such as the camera and body language, and applies a fusion step to arrive at a decision and adapt to the learner's needs.
46:00 - 47:00: Conclusion and Future Directions In this chapter, the focus is on the application of a fusion system to diagnose the type of emotional state an individual is experiencing. By integrating different modalities, the system arrives at a common decision among the candidate states. Once the specific emotional state is understood, subsequent adaptation decisions can be made based on this information.
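
The decision-level fusion mentioned in the last two chapter summaries can be illustrated with a minimal sketch. The lecture does not say which fusion rule the tutor used, so the weighted averaging below, the function name `fuse_decisions`, and all probability values are assumptions chosen only for illustration. The four states and the three modalities (face, context, body language) come from the lecture.

```python
# Minimal sketch of decision-level fusion, assuming each modality has
# already produced its own probability for every candidate state.
# The weighted-average rule and all numbers are illustrative assumptions.

STATES = ("boredom", "confusion", "frustration", "neutral")


def fuse_decisions(modality_outputs, weights=None):
    """Return (best_state, fused_scores) from per-modality probabilities.

    modality_outputs: dict of modality name -> dict of state -> probability.
    weights: optional dict of modality name -> positive weight (default 1.0).
    """
    if not modality_outputs:
        raise ValueError("at least one modality is required")
    weights = weights or {}
    totals = {state: 0.0 for state in STATES}
    total_weight = 0.0
    for modality, probs in modality_outputs.items():
        w = weights.get(modality, 1.0)
        total_weight += w
        for state in STATES:
            totals[state] += w * probs.get(state, 0.0)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    scores = {state: totals[state] / total_weight for state in STATES}
    best = max(STATES, key=lambda s: scores[s])
    return best, scores


# Hypothetical readings for one moment of a tutoring session.
readings = {
    "face": {"boredom": 0.5, "confusion": 0.2, "frustration": 0.1, "neutral": 0.2},
    "context": {"boredom": 0.3, "confusion": 0.4, "frustration": 0.1, "neutral": 0.2},
    "body_language": {"boredom": 0.6, "confusion": 0.1, "frustration": 0.1, "neutral": 0.2},
}
state, scores = fuse_decisions(readings)
print(state, scores)
```

Fusing at the decision level keeps each modality's detector independent, so a missing or noisy channel can be down-weighted without retraining the others.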

Emotionally Intelligent Machines Part 1 Transcription

00:00 - 00:30 [Music] Hi friends, welcome to this week's module. This week we are going to talk about emotionally intelligent machines:
00:30 - 01:00 challenges and opportunities. So far we have discussed how to process emotions, how to elicit emotions, and what some of the applications can be. Today we are going to take a step further and look at certain domains in which these can be used, and at certain open issues. First, we will discuss affect-aware learning;
01:00 - 01:30 second, we will discuss the use of affective computing in games; and third, we will discuss the open issues and some indications of how to solve them. Our focus throughout will be on online recognition of, and adaptation to, emotions. With that, let us dive in. First, affect-aware learning, that is,
01:30 - 02:00 the use of affective computing in emotional educational settings. It has been widely used in developing learning technologies, and the tools in which emotion-aware learning happens are known as affective learning technologies,
02:00 - 02:30 and it has been seen that the learner's experience while interacting with these technologies varies a lot. By the learner's experience we mean not only their emotions but also their overall learning efforts. Please note that our overall aim when interacting with learning technologies is to improve learning. So when we
02:30 - 03:00 talk about the experience of the learners, it is not only the type of emotions that the learners experience but also the improvement or gain in learning while interacting with these technologies; otherwise these affective learning technologies will not be very fruitful. So first we would like to understand what different types of emotions can occur during interaction with affective learning technologies, and accordingly we may need to develop the technologies. So for
03:00 - 03:30 this, a meta-analysis of 24 studies was done by some researchers, involving 1,740 students from different countries including the U.S., Canada, and the UK. In this meta-analysis each student spent roughly 45 minutes, giving a total of around 76,000 minutes of data, which is a significant amount, and these
03:30 - 04:00 studies collected data not only in classroom settings but also in typical research lab settings and in online settings. So we have a wide range of application settings in which these learning technologies were employed and the studies were conducted. It was found that a total of 17 different types of affective states were tracked during these studies. Now, of
04:00 - 04:30 course, 17 is a big number, and we would like to understand which were the major affective states. If you look at the major ones, there were six affective states that were exhibited by the learners and hence were found to be most useful during interaction with affective learning technologies. Not surprisingly, these include engagement and boredom: of
04:30 - 05:00 course, you would like to understand and monitor whether the student interacting with these learning technologies is engaged or bored; confusion, whether a complex topic being taught is confusing the student; curiosity, whether the student is curious to learn more about it; and, of course, the happiness and frustration of the student while learning with these technologies.
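
As a quick check on the meta-analysis figures quoted above (1,740 students, roughly 45 minutes each, around 76,000 minutes in total), the arithmetic can be worked through in a few lines. The variable names are illustrative; the only inputs are the numbers stated in the lecture.

```python
# Arithmetic check on the quoted meta-analysis figures.
students = 1740
minutes_per_student = 45      # "roughly" 45 minutes, per the lecture
quoted_total_minutes = 76_000  # "around" 76,000 minutes, per the lecture

# Upper estimate if every student spent the full 45 minutes.
upper_estimate = students * minutes_per_student

# Average session length implied by the quoted total.
implied_average = quoted_total_minutes / students

print(upper_estimate)             # 78300
print(round(implied_average, 1))  # 43.7
```

The quoted total is therefore consistent with an average session slightly under 45 minutes, which matches the speaker's "roughly".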
05:00 - 05:30 Having understood that these are the affective states most commonly exhibited by learners, how can we use them? Whenever we are developing an affective learning technology, we can focus on monitoring these states, and accordingly select the type of sensors, the type of ground truth, and the type of machine learning or deep learning algorithms in order to recognize and
05:30 - 06:00 also to adapt to these different states. So these are the most relevant affective states that we can use during interaction with affective learning technologies. Now let us look at some affective learning technologies and how they have used affective computing in general. First we will talk about the Affective Auto Tutor. The Affective Auto Tutor came
06:00 - 06:30 in December 2012, although it was the work of several earlier years, and it is said to be one of the first reactive conversational intelligent tutoring systems. It was an intelligent tutoring system, and it was reactive and conversational in the sense that it was able to react in a conversational way to the emotions of the individuals or
06:30 - 07:00 the students. I hope this image is visible to you. You can see that there is a screen interface in front of a user, which could be an instrument or an individual, and this machine or interface is tracking three different modalities of that user. It uses a camera to track the facial features; it is also looking at the
07:00 - 07:30 contextual cues, and it is also looking at the body language of the individual. These are the different cues and modalities that the system is tracking, and by tracking all three it is trying to detect the boredom, confusion, frustration, or neutral state of the individual, because
07:30 - 08:00 on the basis of these states the tutoring system would like to do some adaptation. So apart from the monitoring part, there is the question of how to take a particular decision. This is where a decision-level fusion algorithm was employed by this tutoring system: it combined the data from the different modalities, such as the camera and the body language, and then applied a fusion system, and after
08:00 - 08:30 applying that fusion system it obtained a diagnosis of the type of emotional state that the individual is experiencing. That is how we make use of the different modalities to arrive at a common decision about which state, among let's say these four, the individual is experiencing. Once we understand what state the individual is in, next we would
08:30 - 09:00 like the system to adapt to it. The type of adaptation that happened in the case of the Auto Tutor was that it provided empathetic, encouraging, and motivational dialogues and emotional displays in response to the cues that it tracked or monitored. For example, when it detected mild boredom, that is, when the user who is
09:00 - 09:30 interacting with the tutoring system is getting bored, then it may give certain cues with certain emotional displays, as you can see in this image. This animated character is providing certain emotional displays, and certain emotional dialogues are being generated as well. I am not sure whether the image is very clear, but this
09:30 - 10:00 was a particular topic being taught, let us say about the CPU, the RAM, and the central processing unit of a machine, and the system detected that the user was feeling bored about it. Once the system was able to track that the user was feeling bored, it presented a motivational dialogue saying, "This stuff can be dull sometimes, so I'm gonna try and help you get through it." So this is a
10:00 - 10:30 simple motivational dialogue, combined with certain emotional cues, as you can see in the animated character, that the system presented in order to help and motivate the user to improve the learning. The next question is whether the learning improved. In the diagram presented at the top, we are looking at the learning gains from different sessions
10:30 - 11:00 between the regular and the affective system. Regular is the one where no feedback was provided, and affective is the one where monitoring was done and certain feedback was also provided. As you can see in the diagram in the top right corner, the learning gains increased from session one to session two,
11:00 - 11:30 and so you can see that the learning gains in session one were not very high, but in session two they were higher. So overall we can conclude that the affective tutor was able to improve the learning of the user with the help of this type of adaptive feedback. That was one very simple example of how the Affective Auto Tutor was able to track the emotional state of
11:30 - 12:00 a student, how it was able to adapt, and how through that adaptation a gain in learning was achieved. Next we will see another example, the Gaze Tutor. In the case of the Gaze Tutor the basic idea was that if we can monitor the periods of waning attention and attempt to encourage the learner in those
12:00 - 12:30 particular periods, then such interventions can be very helpful. Building on this hypothesis, the Gaze Tutor is basically a multimedia interface consisting of an animated conversational agent (CA stands for conversational agent). As you can see on the left-hand side, this is the Gaze Tutor interface, where
12:30 - 13:00 there is an image that is used to present or teach a particular topic, and then there is an animated conversational agent that you can see on the left-hand side. The idea was that it was able to track your attention and then intervene at that particular point in time. On the right-hand side, what you can
13:00 - 13:30 see is that each line represents the gaze of the individuals before and after the intervention. The left side area represents where the user was looking before the intervention, and the rest shows what happened when the intervention was provided. So if you look at the left-hand side of this plot,
13:30 - 14:00 the x-axis is telling you the time and the y-axis is telling you the probability of where the individual is looking. For example, look at just the off-screen line: the off-screen time is gradually increasing as time progresses. That means the individual is not looking at the screen: not at the tutor and not at the image. So this is the tutor, this is the screen interface, and this is the
14:00 - 14:30 particular image being presented, but the user is not looking at either of them, and the off-screen time is gradually increasing; the individual is not paying attention at all. Then a certain intervention was provided. What happened then is that the off-screen time very quickly started to reduce, and it was sustained at a level where
14:30 - 15:00 the individual's probability of looking off the screen was around 50 percent, 0.5. The same type of analysis can be applied to the tutor and the image. You can see the decline with respect to the tutor and the image: before the intervention, as time progressed, the individual increasingly stopped looking at the tutor. So this is the tutor, not the agent, and then this is the
15:00 - 15:30 image. So the user was looking neither at the tutor nor at the image, and this is represented by the fact that the probability with which the user was looking there kept reducing. But after the intervention was provided, you can see that the probability with which the user was looking at the tutor, or for that matter at the image, started increasing. Of course, it could have
15:30 - 16:00 increased a bit more, but nevertheless it was much better than without the intervention. So that is how the Gaze Tutor took care of the waning attention period and provided an intervention at that particular point in time. Now we will look at affect-aware games. We saw so far that if we go for a fully dedicated sensor-based approach, then there is a problem of scalability, right? So let
16:00 - 16:30 me just write it down for you. If you are going ahead with dedicated sensors, you will face a problem of scalability, and if you are going sensorless, then you will have a problem of accuracy; you are making a trade-off with the
16:30 - 17:00 accuracy when going sensorless. So there could be a possible solution which we are calling a sensor-light solution. Sensor-light means we use scalable sensors whenever feasible. For example, cameras and microphones are very widely available sensors that allow you to capture the audio-visual modalities, or only the
17:00 - 17:30 audio modalities. That is one approach. At the same time, the non-scalable sensors can be replaced with scalable proxies. A camera can be a very good option here, because it is already available in most laptops and can otherwise be purchased at very low cost, and previous research has already shown, as we have talked about,
17:30 - 18:00 that you can use the camera to record and monitor the heart rate, the heart rate variability, and related features. So this seems like an excellent solution. Another thing you can do is simply take the webcam data and apply motion tracking techniques to it; for example, you can do posture analysis and gesture analysis, all with the help of the 2D visual data. Of course, you will need to work a bit more
18:00 - 18:30 on the software side, but working on the software side is easier and more accessible than working on the hardware and trying to make the hardware scale. So this is one type of possible solution. We also already discussed that the camera can be used not only to monitor the heart rate and the heart rate variability but also to monitor the gaze patterns of the eyes, and hence it can
18:30 - 19:00 also replace, to a certain extent, the eye-tracking devices and sensors that we have. So the conclusion is this: if you have the possibility of using dedicated sensors, go ahead. Many times your application domain will not allow it; then you may want to use the existing sensors available in the system to track the behavioral data, or you may want to replace the dedicated sensors
19:00 - 19:30 with sensor proxies, and hopefully it may work wonders for you. So that was about the sensors and what to do with the sensors that are available to us. Now let us talk about the accuracy as well. We know that naturalistic affect detection has seen a lot of research and it has been
19:30 - 20:00 improved a lot, but there are still many problems associated with the affective computing domain and with its research and development in general. One thing we already talked about: most of the time the sensors that we require are intrusive, many times they are expensive, many times they are noisy, and, more importantly, they are not scalable. Another thing that we see is a
20:00 - 20:30 technical challenge: the detection itself can suffer from weak signals embedded in noisy channels. Many times the data you are trying to look at is surrounded by a lot of noise. For example, maybe you are trying to capture the emotions in the audio modality of a user, but we are no longer in a lab setting; we are in a naturalistic setting where there is
20:30 - 21:00 a lot of noise, and hence your target user's voice data is surrounded by lots of other noises, and it becomes very challenging to segregate the voice of the user and then do some analysis on top of it. One more thing: even though we did not talk in detail about the machine learning and deep learning algorithms themselves, most of the time,
21:00 - 21:30 whenever we are talking about affective computing, it is going to rely heavily on machine learning and deep learning algorithms, and it is no surprise that these need a lot of adequate and realistic training data in order to work well. Many times the data associated with emotions may be very difficult to capture and to annotate, and hence in turn our
21:30 - 22:00 machine learning and deep learning models may suffer and their accuracies may suffer. Another thing is that whenever we are talking about affective computing, most of the time we are just looking at the transient emotions. If you want to incorporate the context and the appraisals as well, then this becomes a bit troublesome, because if we want to
22:00 - 22:30 incorporate the context, the appraisals, and, let's say, the user's beliefs, desires, and intentions, then it can become very tricky, because you will have to track lots of different things, which may be very difficult. In general, whenever we are talking about affective computing, there is also a common mistake that researchers and developers in the community make, and that is
22:30 - 23:00 many times they are using one term for another. For example, when they talk about the model of emotions, they do not discriminate properly between categorical and continuous models. Similarly, mood versus emotion is often not differentiated: you may be referring to mood, which is a long-term state, or to transient emotions, but you are using one for the other, and hence
23:00 - 23:30 you are not able to differentiate these things, and there is a lack of clarity on your side. Nevertheless, even if you are able to do some recognition and monitoring, generalizability becomes a big issue. Generalizability is a problem for machine learning and deep learning algorithms in general, but more so with respect to emotions and affective
23:30 - 24:00 computing it's a bigger problem because you may want to look into the individual variability we talked about this you may want to look at the cultural variability cultural differences uh for example the death of an individual at the time of the death of an individual the way it is being expressed in Indian communities is very very different from the way it is being expressed uh you know uh the the lamentation or the sorrow that is being expressed in the outside community so of
24:00 - 24:30 course there are lots of consider differences that are surrounding it of course then context time Etc they also play a lot of role and and so generalizability becomes a big big issue on uh especially when you talk about the Effective computing models so the question that in general we want to ask now is this that okay what type of accuracy or to what extent an accuracy is a good accuracy with which you know we can go ahead and deploy the system and we can make it work
24:30 - 25:00 We already saw that affect detection is a very tricky problem, and we can say confidently that it is very unlikely that we will soon have perfect affect detectors: detectors that not only classify a user's emotions with a hundred percent accuracy in real time, but also generalize well to new individuals, to new interaction contexts, and to the
25:00 - 25:30 very noisy situations around us. That is not going to happen anytime soon; it will require many advancements, not only in hardware but also in software and in machine learning and deep learning. So the question is: when should we start taking the information we get from emotion recognition systems and building systems that adapt to it? In other words, when do we want to close the loop?
25:30 - 26:00 There are two possibilities. We can wait until there is a perfect affect detection system and then build on top of it an adaptive system that works perfectly, or we can take a not-so-perfect affect detection system and try to make it work. Since this is a very tricky problem, it may take a lot of time and resources to reach the perfect situation. So
26:00 - 26:30 even a moderate degree of recognition accuracy can be enough. What counts as a moderate degree of recognition accuracy for emotions varies from situation to situation, domain to domain, and use case to use case; we will talk a bit more about it. But the moment you are able to get a moderate degree of recognition accuracy, we believe that should be sufficient to create affect-aware interventions. Affect-aware interventions means adaptive
26:30 - 27:00 interventions; we just saw some examples of them. Of course, we have to take into account that they should be fail-soft, in the sense that they should not do any harm even if the adaptation is based on an incorrect classification. That is a very tricky requirement that you need to look into. Now, when we talk about a moderate degree of recognition and, accordingly, the
27:00 - 27:30 severity of the interventions: you want to make an adaptive system, but since it is based on not-so-perfect affect detection, you may not want to put too much confidence into the severity of the interventions. You take the result with a pinch of salt: I got a particular classification, a particular emotional state of the user, based on
27:30 - 28:00 whatever sensors, models, or software I have deployed, and there is a possibility that it is not correct. Hence the adaptation I perform is going to be correspondingly not so severe; it could be moderate, and it can be calibrated to the extent that I can put confidence in my detection accuracy. For example, imagine that in my particular use case I am able to make use of
28:00 - 28:30 dedicated sensors and dedicated hardware, and I am able to use state-of-the-art algorithms, so I know the detection accuracy is good or almost perfect in my case. Then maybe you can put more confidence in the system and, accordingly, more confidence in the adaptations that you are making. This is a very important thing to understand: you need to take into account the
28:30 - 29:00 system that you are building and the use case that you are building it for, and accordingly define what a moderate degree of recognition is for you and what severity of interventions you want to go ahead with. The bottom line is that you never want to wait for a perfect affect detection system in order to start building an adaptive system. I hope that is a bit clearer now.
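The idea of calibrating intervention severity to detection confidence can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a method given in the lecture: the names `Detection`, `intervention_severity`, and `choose_intervention`, the thresholds, the action labels, and the choice to multiply per-prediction confidence by overall detector accuracy are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str         # detected emotional state, e.g. "bored"
    confidence: float  # detector's confidence in this prediction, in [0, 1]


def intervention_severity(detection, detector_accuracy, min_trust=0.5):
    """Return a severity in [0, 1]; 0.0 means "do nothing" (fail-soft default).

    Assumption: trust is the per-prediction confidence scaled by how accurate
    the detector is known to be for this use case.
    """
    trust = detection.confidence * detector_accuracy
    if trust < min_trust:
        return 0.0  # too uncertain: avoid acting on a possibly wrong label
    return trust


def choose_intervention(severity):
    """Map severity to a hypothetical action; stronger actions need more trust."""
    if severity == 0.0:
        return "none"
    if severity < 0.8:
        return "gentle hint"
    return "change activity"


# A moderately accurate detector only ever triggers mild interventions.
print(choose_intervention(intervention_severity(Detection("bored", 0.9), 0.6)))
```

With a more accurate detector (dedicated sensors, better models), the same prediction would earn a higher severity, which matches the point that confidence in the adaptation should follow confidence in the detection.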
29:00 - 29:30 Having talked about the sensors, and about what accuracy is good enough, let us now talk a bit about adaptiveness. The question we want to answer is: how adaptive should the system be? This is again a very tricky question. The severity of the adaptive interventions, as we said already, is determined by the confidence that you are able to put
29:30 - 30:00 into the system. But let us see what different levels of adaptation we can have in a system. To begin with, unsurprisingly, we can have Level 0 adaptation, which we also call no adaptation at all. No adaptation means that we do not expect the system to alter its behavior in response to the emotional state. It is as simple as that: whatever the emotional state is, we may monitor it, but we are not going to do anything about it. We simply take
30:00 - 30:30 that information for some other analysis purposes, but we do not let the system's behavior be altered in response to the emotional state. That is what happens in most of the systems we have. For these, predefined interaction scripts are mostly used in the machines and services that we use. We just talked about the gaming examples: when you are sad, the non-player characters around you
30:30 - 31:00 also show sadness, but all of this is a predefined script. First, the system does not really know what your emotion is, and even if it knows, it decides not to do anything about it. This type of system is one where there is no adaptation at all. Most of the machines and services that we interact with today, including voice agents such as Alexa and Siri, are like that: their adaptation is not based on
31:00 - 31:30 our emotional state, and many times our emotional state is not even being monitored. So that is Level 0. Then comes Level 1. What Level 1 does is, first, it tries to monitor your emotional state, and then it also tries to recognize that there is a need for adaptation at a particular time. So it tries to identify the time of the intervention, or the need for the intervention.
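A Level 1 monitor can be sketched as a function that reports why adaptation might be needed but never acts on it. The sketch below uses the two triggers the lecture names for this level, a negative emotional state and a change of state; the label set `NEGATIVE_STATES` and the function name `needs_adaptation` are assumptions made for illustration.

```python
# Hypothetical set of labels treated as negative emotional states.
NEGATIVE_STATES = {"sad", "bored", "frustrated"}


def needs_adaptation(previous_state, current_state):
    """Level 1: return the reasons adaptation may be needed (empty list if none).

    The function only flags the need; it performs no adaptation itself.
    """
    reasons = []
    if current_state in NEGATIVE_STATES:
        reasons.append("negative state")
    if previous_state is not None and previous_state != current_state:
        reasons.append("state change")
    return reasons


# The user was happy and is now sad: both triggers fire.
print(needs_adaptation("happy", "sad"))
```

Other metrics could be added as further checks in the same function; a Level 2 system would take this list as its cue to actually adapt.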
31:30 - 32:00 Of course, it does only that: it simply identifies that there is a need for adaptation, but it does not perform any adaptation at all. How do we identify that there is now a need for adaptation? It could be on the basis of many different metrics. Some metrics or indicators could be like this: for example, the system or service that you are interacting with is
32:00 - 32:30 able to understand, from your voice or through some other modality, that you are experiencing a negative emotional state. It now knows that you are feeling low, feeling sad, feeling bored, and so on, which means there is a need for adaptation here. Similarly, a change in the emotional state can be a metric: whenever there is a change in the
32:30 - 33:00 emotional state of a user, maybe the system also needs to adapt to that. Of course, you are not doing the adaptation; as we already said, no adaptation is performed at this stage, but you are able to identify the need for it. Negative emotional state we already talked about. As for changes in the emotional state: say you are feeling happy and suddenly you start feeling sad. Then the system gets an alert: the individual was feeling happy and now the individual is
33:00 - 33:30 feeling sad; there is a change in the emotional state, and maybe this is the right moment to do some adaptation. But in Level 1, of course, no adaptation is done. You can think of many other metrics; I will let you explore what other metrics could serve as the basis for interventions or adaptations. So that is the Level 1 system. Level 0 is no adaptation; Level 1 just recognizes
33:30 - 34:00 the need for adaptation, but no adaptation is performed yet. Now we go to Level 2, which is a bit more fascinating: single-task adaptation. Single-task adaptation means that a single task performed during the entire cycle is adapted over time to optimize a
34:00 - 34:30 particular metric. So first, we are tracking the emotional state of the user; we also have a metric or indicator that tells us when adaptation is needed; and now, in Level 2, on top of that, we are also doing the adaptation, in a single task. For this adaptation,
34:30 - 35:00 we want to look at which performance metric we really want to improve, and how it can be improved by a particular type of adaptation. To make the example very clear: say you are building a conversational AI agent and your aim is to make the user happy, so that at the end of the interaction the user feels happy. The user
35:00 - 35:30 should remain happy and should feel happy about the interaction once it is over. So you are monitoring the emotional state of the user, and whenever you see that the user is feeling a bit low, you perform a certain type of adaptation that will make the user happy, because that was your aim. Similarly, if you talk about
35:30 - 36:00 affective learning systems, where your aim is to improve learning, you track the emotional state of the user and your continuous aim is to keep the user engaged, or to keep the user's boredom as low as possible, and so on. Accordingly, the type of adaptation you choose will address that particular goal: maybe you want to make the user happy, maybe you want to keep the user engaged, maybe you want to keep the user's boredom
36:00 - 36:30 low, and so on. I hope it is clear that you adapt with respect to the performance metric that you have set as a goal for yourself. What kinds of adaptation could there be? The adaptation could be the result of some predefined behavior. You say: my user is feeling sad and I am going to make the user happy; but what will I do to make my user happy? That is the question, and this particular type of behavior in
36:30 - 37:00 the Level 2 system is the result of a predefined behavior. For example, we already looked at the Shimi robot. The Shimi robot was able to play music according to the emotional state of the user, but the particular type of music, and the behavior or gesture of the robot, was predefined by the script that the developers had already built in.
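A predefined Level 2 behavior of this kind amounts to a fixed lookup from detected emotion to response. The following is a minimal sketch, assuming a single task (music selection); the table `PREDEFINED_RESPONSES`, its entries, and the function `adapt_music` are hypothetical and are not taken from any actual robot's script.

```python
# Hypothetical developer-written script: one fixed response per emotion.
PREDEFINED_RESPONSES = {
    "sad": "play upbeat track",
    "bored": "play energetic track",
    "happy": "keep current track",
}


def adapt_music(emotion):
    """Level 2, predefined: adapt one task (music) using a fixed script.

    Unknown emotions fall back to leaving the music unchanged.
    """
    return PREDEFINED_RESPONSES.get(emotion, "keep current track")


print(adapt_music("sad"))
```

Every user gets the same response to the same emotion here, which is exactly the limitation of scripted behavior: nothing in the table changes with experience.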
37:00 - 37:30 Similarly, adaptation can also be the result of accumulated experience, of learning that happens over a period of time. For example, although it is not yet happening to the extent that we want, in intelligent tutoring systems you can record the user's entire experience of interacting with the system over a period of time, learn from that experience, and adapt on the basis of it.
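Adaptation from accumulated experience can be sketched as keeping, per user, a running record of how well each adaptation worked and then preferring the best one. This is a deliberately simple greedy average with no exploration, chosen for illustration rather than taken from the lecture; the class `ExperienceBasedAdapter`, the reward scale, and the action names are assumptions.

```python
from collections import defaultdict


class ExperienceBasedAdapter:
    """Pick, per user, the adaptation with the best average observed outcome."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.totals = defaultdict(float)  # (user, action) -> summed reward
        self.counts = defaultdict(int)    # (user, action) -> number of trials

    def record(self, user, action, reward):
        """Store the outcome of one adaptation (e.g. an engagement score)."""
        self.totals[(user, action)] += reward
        self.counts[(user, action)] += 1

    def best_action(self, user):
        """Return the action with the highest mean reward for this user.

        Untried actions score 0.0; ties go to the earliest listed action.
        """
        def mean_reward(action):
            n = self.counts[(user, action)]
            return self.totals[(user, action)] / n if n else 0.0

        return max(self.actions, key=mean_reward)


adapter = ExperienceBasedAdapter(["hint", "easier topic"])
adapter.record("student_a", "hint", 0.2)
adapter.record("student_a", "easier topic", 0.9)
adapter.record("student_b", "hint", 0.8)
print(adapter.best_action("student_a"), "|", adapter.best_action("student_b"))
```

The two students end up with different preferred adaptations from the same set of options, which is the personalization that experience-based adaptation is meant to provide.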
37:30 - 38:00 You then know, for example, what type of adaptation will work for this particular user. A very simple example is a teacher taking a class for certain students. In the beginning the teacher may not have a very good idea of how to teach a particular topic to a particular student so that he or she understands. But over time the teacher, a good teacher at least, knows:
38:00 - 38:30 I need to address this particular problem of this particular student in such a way that it helps that student specifically. So the type of adaptation the teacher performs for one student is different from what the teacher does for another student, and this is the result of the learning the teacher has done over time. This is what we are envisioning here: if the system can learn from the accumulated experience of interaction with its users, then
38:30 - 39:00 it can perform the adaptation not on the basis of some predefined behavior but on the basis of what it is learning, and it is going to be very adaptive and personalized. So that is Level 2 adaptation: single-task adaptation, keeping a particular performance metric in mind. Now, Level 3 is the next level of adaptation. In Level 3, rather
39:00 - 39:30 than doing the adaptation in a single task (in the case of the Shimi robot, for example, maybe the robot was only able to adapt its voice, or only its gestures), we are saying that a set of different tasks that happen during the process cycle can be adapted over time to optimize a particular performance metric. This adaptation can be of many different types. For example, there may be multiple tasks that the agents,
39:30 - 40:00 your machines, or your services are doing. You can reorder the tasks, or you can adapt the individual tasks that are happening in parallel. A good example of reordering the tasks based on adaptation: say you are creating an intelligent tutoring system, and the moment the user logs in to learn a particular concept,
40:00 - 40:30 the system senses that the individual looks a bit low on energy today. So it decides: I am going to teach the topics that are easy to follow first, rather than start with the hard topics, and maybe I am going to teach at a very basic level. There are multiple ways this type of adaptation can be done. So you are reordering, and multiple tasks are being taken
40:30 - 41:00 into account: you are not only adapting your teaching style, you are also changing the content that you want to teach. Alternatively, you may simply adapt multiple tasks without doing any reordering. For example, imagine that you are interacting with a chatbot, a conversational bot, and this conversational agent is not reordering anything; it follows the order it is supposed to follow. But maybe it is not
41:00 - 41:30 only adapting its gestures in response to your emotional state but also adapting its voice. So there are at least two tasks. On top of that, maybe the task it is actually supposed to perform, say resolving a problem you have with your bank, is also done in a way that really pleases you. So there are lots of different tasks and lots of different
41:30 - 42:00 processes, and the adaptation of multiple processes happens at the same time in this Level 3 type of adaptation. Nevertheless, whether you are adapting a single task or multiple tasks, whether or not you are reordering the tasks, all adaptation has to keep the performance of the system in mind. You have a particular goal in mind: you want to make the user happy, you want to make the user feel fulfilled, you want to
42:00 - 42:30 improve the learning of the user, or whatever else. All these goals are predefined, and all these adaptations happen according to the broader goal that you have set for your system. In this case, rather than a predefined script, what you have is adaptation that is the result of accumulated experience, so it is very personalized and customized for each user. The adaptations are going to be
42:30 - 43:00 different for each user, depending on their likes and dislikes, which is quite interesting. Level 4 adaptation is very much like Level 3, but with a critical difference: the process of adaptation is carried out between multiple independent agents. Up to Level 3 we were assuming a system, service, or machine with only one agent that we interact with, and that agent is adapting a
43:00 - 43:30 single task, or not adapting at all, or adapting multiple tasks that it performs itself. Here we are interacting in a multi-agent setting, or with multiple services at the same time, and all these agents and services are talking to each other, saying: you adapt this, I will adapt that, and let us collectively make the user feel good about the entire experience. For example, maybe you are playing a game in which there are
43:30 - 44:00 multiple characters. Rather than one character adapting to you, all the different characters adapt to you, and they do so in sync by communicating with each other. This is really fascinating, because now your user experience is being looked at holistically and comprehensively, and that may provide a better experience overall. While doing so, the agents can communicate the different adaptations,
44:00 - 44:30 which are then applied individually within each agent. As I said, agent A can say to agent B, or service A can say to service B, that service A will do this type of adaptation and service B will do that type of adaptation; similarly for machine A and machine B, or whatever else. For example, in gaming, say there are two different characters around you who are supposed to be
44:30 - 45:00 helping you find some treasure. Agent A says: maybe the user is not able to find it, so let us help. Agent A says: I am going to clear the path for the user, while you take care of the enemies on the path. So the agents do different things, depending of course on the capabilities of the agents or services. As I said, creativity is the only limit on the type of
45:00 - 45:30 adaptations that you can build into your machine. When we talk about multiple agents, these agents can be real or simulated, and of different types as well. We talked about agents in games, but we can also have one embodied agent, one animated agent, one robotic agent, and so on. Basically, all the different types of agents or services that we can envision can
45:30 - 46:00 work together. Imagine you have a robot at home, you also have Alexa, you also have Siri, and they are all talking to each other in order to make you feel happy; that would be really nice. This is what is known as communicated task adaptation. Now let us look at the conclusion. When we looked at the open issues, we saw that we can use scalable sensors whenever that is
46:00 - 46:30 feasible, as with cameras and microphones, and we can also replace non-scalable sensors with scalable proxies that capture latent behavior, such as behavioral data captured from keyboard typing and so on. The camera is an ideal choice because it is already available in nearly all laptops, and if not, it can be purchased at very low cost. Motion tracking techniques can be
46:30 - 47:00 applied to the video data, which also enables this kind of tracking. These are some of the things to keep in mind as we conclude.
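As an illustration of a scalable proxy built from keyboard typing, the sketch below turns a log of timestamped key events into two simple behavioral features. It is a minimal sketch under assumptions: the event format, the `"BACKSPACE"` key name, the function `typing_features`, and the choice of features are all hypothetical, and the code makes no claim about how these features relate to any particular emotion.

```python
from statistics import mean


def typing_features(events):
    """Compute simple behavioral features from keyboard events.

    events: list of (timestamp_seconds, key) tuples in time order.
    Returns the mean gap between keystrokes and the share of backspaces.
    """
    if len(events) < 2:
        return {"mean_interval": 0.0, "backspace_rate": 0.0}
    intervals = [later[0] - earlier[0] for earlier, later in zip(events, events[1:])]
    backspaces = sum(1 for _, key in events if key == "BACKSPACE")
    return {
        "mean_interval": mean(intervals),
        "backspace_rate": backspaces / len(events),
    }


log = [(0.0, "h"), (0.5, "e"), (1.0, "BACKSPACE"), (1.5, "e")]
print(typing_features(log))
```

Features like these would be inputs to a trained affect model; they need no extra hardware, which is what makes the keyboard a scalable sensor.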