Understanding the Real Role of Data Science

What REALLY is Data Science? Told by a Data Scientist

Estimated read time: 1:20

    Summary

    Data science is often misunderstood by the public, especially due to the hype surrounding AI and machine learning in the media. However, the true essence of a data scientist's role is to utilize data for impactful decisions that benefit a company. This involves generating insights, creating data products, and making recommendations, rather than solely focusing on complex models or visualizations. The video explains the evolution and scope of data science, tracing its roots to data mining and discussing the rise of big data, the subsequent need for sophisticated data infrastructure, and the practical contributions of data scientists in different company sizes. In essence, data scientists are problem solvers and strategists who adapt their roles to the company's needs, whether in startups or large organizations.

      Highlights

      • Data science is about creating impactful insights and solutions for companies, not just building complex models. 💼
      • There's a huge gap between what's popular on YouTube about data science and what the industry actually needs. 📹
      • Data science has evolved from data mining, integrating computer science to enhance statistical methods. 🧮
      • Web 2.0 and the explosion of user data introduced the era of 'Big Data,' necessitating new data handling technologies. 🌐
      • The essence of a data scientist's job varies with company size and resources, ranging from analytics to AI tasks. 🤖
      • Practical data applications like A/B testing provide more immediate impact than advanced AI models often covered in media. ⚙️
      • Different roles within data science in a company ensure that employees can focus on what they do best. 🔍
      • Data scientists are ultimately strategists and problem solvers, not just data crunchers. 🕵️‍♂️
      • Websites that became a medium for shared experiences massively increased the volume of user data, creating both opportunities and challenges. 📊

      Key Takeaways

      • Data science goes beyond creating complex models; it's about making impactful decisions for the company. 🧠
      • Misconceptions about data science arise from the media's focus on AI and machine learning. 📺
      • Data scientists use insights, data products, and product recommendations to drive company success. 🚀
      • The evolution from data mining to data science involves integrating computer science to enhance statistics. 💽
      • Big data's rise led to advanced data infrastructure, essential for managing massive datasets. 📊
      • Different companies require data scientists to fulfill varying roles from analytics to AI, based on resources and priorities. 🏢
      • Data scientists should focus on creating impact rather than just using advanced models. ⚡
      • The real value of data science is found in applications like A/B testing and analytics, crucial for business strategies. 📈
      • Despite the AI hype, practical data science focuses on tangible impacts on the company. 🎯

      Overview

      Data science is often misconceived due to the overwhelming emphasis on AI and machine learning in the media, but its core essence is the application of data to drive impactful business decisions. Joma Tech delves into this reality, unraveling the true scope and historical evolution of data science, tracing back to its roots in data mining and exploring its expansion with the rise of big data.

        In the video, data science is discussed as a multifaceted field that requires data scientists to adapt to the company's specific needs. Whether working at a startup, a medium-sized company, or a large company, data scientists face tasks ranging from analytics and managing databases to building sophisticated models, which makes them key problem solvers and strategists driving business success.

          The reality of the industry is that while media highlights the sensational aspects of deep learning and AI, the ultimate value of data science lies in practical applications like A/B testing and analytics. This actionable approach ensures data scientists make significant contributions to a company's strategies, emphasizing the importance of adaptability in this versatile field.
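To make the A/B testing mentioned above concrete, here is a minimal sketch of how two product versions might be compared. It is not taken from the video: it assumes each variant is summarized by a visitor count and a conversion count, and it uses a standard two-proportion z-test. The function name `two_proportion_z_test` and the example numbers are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variants A and B (illustrative helper).

    conv_a, conv_b: number of converting users in each variant.
    n_a, n_b: number of users exposed to each variant.
    Returns the z statistic and the two-sided p-value.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up numbers: 1,000 users per variant, 100 vs. 150 conversions.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference between the versions is unlikely to be noise. In practice a team would also decide on sample size and the success metric before running the experiment.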

            Chapters

            • 00:00 - 00:30: Introduction to Data Science. Data science is fundamentally about leveraging data to create significant impact for a company.
            • 00:30 - 01:00: Role of a Data Scientist. The chapter discusses the role of a data scientist, emphasizing that the primary responsibility is to solve real-world company problems using data. It highlights a common misconception about the field, especially prevalent on platforms like YouTube, where what is talked about often does not align with actual industry needs. The speaker, a data scientist at a GAFA company, aims to clarify these misunderstandings.
            • 01:00 - 02:00: History of Data Science. This chapter covers the evolution and history of data science, highlighting how companies have increasingly focused on using data to enhance their products. It traces the transition from 'data mining' to 'data science' through the 1996 article 'From Data Mining to Knowledge Discovery in Databases,' which defined data mining as the overall process of discovering useful information from data.
            • 02:00 - 03:00: Emergence of Big Data. In 2001, William S. Cleveland aimed to enhance data mining by merging it with computer science, making statistics more technical. This combination, which he called data science, was expected to broaden data mining's potential and drive innovation through computational power.
            • 03:00 - 04:00: Rise of Machine Learning and AI. The chapter discusses the evolution of the internet with the advent of Web 2.0. During this era, websites went from static digital pamphlets to dynamic platforms for shared experiences among users worldwide. Notable examples include MySpace, which launched in 2003, followed by Facebook in 2004 and YouTube in 2005. These platforms let individuals contribute content, post comments, like, upload, and share, and thus shape the digital ecosystem of the internet.
            • 04:00 - 05:00: Industry Misalignment. The chapter covers the challenges and innovations in handling 'Big Data,' which rose to prominence around 2010. It discusses how traditional technologies struggled with the vast amounts of data generated, leading to the rise of data science as a field. It highlights parallel computing technologies such as MapReduce, Hadoop, and Spark, which were developed to manage and analyze enormous unstructured data sets, at a time when businesses needed sophisticated data infrastructure to draw insights from their data.
            • 05:00 - 05:30: Real-life Data Science Examples. The chapter discusses how data science encompasses a broad range of activities related to data, such as collecting, analyzing, and modeling. However, it emphasizes that the most important part of data science is its applications. It highlights machine learning as a key application, made possible by the increased availability of data since 2010, which allowed machines to be trained with a data-driven approach.
            • 05:30 - 07:30: Data Science Hierarchy of Needs. The chapter discusses the evolution of deep learning from an academic concept in theoretical papers, such as those on recurrent neural networks and support vector machines, to a practical and influential class of machine learning that affects daily life. Machine learning and AI have become dominant topics in the media, overshadowing other areas of data science.
            • 07:30 - 09:00: Data Science Across Different Company Sizes. The chapter discusses the public perception of data science versus its actual application in industry. It highlights a misalignment between the public's view of data scientists as researchers focused on machine learning and AI and the industry's hiring of data scientists as analysts. This misalignment exists because, although many data scientists are capable of tackling more technical problems, large companies like Google, Facebook, and Netflix have so much low-hanging fruit that improving their products often does not require advanced technical solutions.
            • 09:00 - 10:00: Conclusion and Viewer Engagement. The concluding chapter focuses on the role of a data scientist, emphasizing impact over advanced modeling. Data scientists are viewed as problem solvers and strategists who tackle ambiguous challenges to guide companies in the right direction. The chapter ends with an introduction to real-life examples of data science jobs in Silicon Valley.
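The chapter list above names MapReduce, Hadoop, and Spark as the parallel computing technologies that made big data manageable. As a rough illustration of the map, shuffle, and reduce pattern behind them, here is a single-process word-count sketch. It does not use the Hadoop or Spark APIs; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real system would spread each phase across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run the three phases in sequence over a list of documents."""
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(mapped))


print(word_count(["big data", "Big insights"]))
```

The point of the pattern is that the map and reduce steps work on independent pieces of data, so they can run in parallel on separate machines.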

            What REALLY is Data Science? Told by a Data Scientist Transcription

            • 00:00 - 00:30 Data science is not about making complicated models. It's not about making awesome visualizations. It's not about writing code. Data science is about using data to create as much impact as possible for your company. Now, impact can be in the form of multiple things. It could be in the form of insights, in the form of data products, or in the form of product recommendations for a company. Now, to do those things, you need tools like making complicated models or data visualizations or writing code.
            • 00:30 - 01:00 But essentially, as a data scientist, your job is to solve real company problems using data, and what kind of tools you use, we don't care. Now, there's a lot of misconception about data science, especially on YouTube, and I think the reason for this is because there's a huge misalignment between what's popular to talk about and what's needed in the industry. So because of that, I want to make things clear. I am a data scientist working for a GAFA company, and
            • 01:00 - 01:30 those companies really emphasize using data to improve their products. So this is my take on what data science is. Before data science, we popularized the term data mining in an article called "From Data Mining to Knowledge Discovery in Databases" in 1996, in which it referred to the overall process of discovering useful information from data.
            • 01:30 - 02:00 In 2001, William S. Cleveland wanted to bring data mining to another level. He did that by combining computer science with data mining. Basically, he made statistics a lot more technical, which he believed would expand the possibilities of data mining and produce a powerful force for innovation. Now you can take advantage of compute power for statistics, and he called this combo data science. Around this time,
            • 02:00 - 02:30 this is also when Web 2.0 emerged, where websites are no longer just a digital pamphlet, but a medium for a shared experience amongst millions and millions of users. These are websites like MySpace in 2003, Facebook in 2004, and YouTube in 2005. We can now interact with these websites, meaning we can contribute, post, comment, like, upload, share, leaving our footprint in the digital landscape we call the Internet, and help create and shape the ecosystem
            • 02:30 - 03:00 we now know and love today. And guess what? That's a lot of data. So much data, it became too much to handle using traditional technologies. So we call this Big Data. That opened a world of possibilities in finding insights using data. But it also meant that the simplest questions require sophisticated data infrastructure just to support the handling of the data. We needed parallel computing technology like MapReduce, Hadoop, and Spark. So the rise of big data in 2010 sparked the rise of data science to support the needs of the businesses to draw insights from their massive unstructured data sets.
            • 03:00 - 03:30 So then the Journal of Data Science described data science as almost everything that has something to do with data: collecting, analyzing, modeling. Yet the most important part is its applications. All sorts of applications. Yes, all sorts of applications, like machine learning. So in 2010, with the new abundance of data, it made it possible to train machines with a data-driven approach
            • 03:30 - 04:00 rather than a knowledge-driven approach. All the theoretical papers about recurrent neural networks and support vector machines became feasible. Something that can change the way we live and how we experience things in the world. Deep learning is no longer an academic concept in these thesis papers. It became a tangible, useful class of machine learning that would affect our everyday lives. So machine learning and AI dominated the media, overshadowing every other aspect of data science
            • 04:00 - 04:30 like exploratory analysis, experimentation, ... and skills we traditionally called business intelligence. So now the general public thinks of data scientists as researchers focused on machine learning and AI, but the industry is hiring data scientists as analysts. So there's a misalignment there. The reason for the misalignment is that, yes, most of these data scientists can probably work on more technical problems, but big companies like Google, Facebook, and Netflix have so many low-hanging fruits to improve their products that they don't require any
            • 04:30 - 05:00 advanced machine learning or statistical knowledge to find these impacts in their analysis. Being a good data scientist isn't about how advanced your models are. It's about how much impact you can have with your work. You're not a data cruncher. You're a problem solver. You're a strategist. Companies will give you the most ambiguous and hard problems, and we expect you to guide the company in the right direction. OK, now I want to conclude with real-life examples of data science jobs in Silicon Valley.
            • 05:00 - 05:30 But first I have to print some charts. So let's go do that. So this is a very useful chart that tells you the needs of data science. Now, it's pretty obvious,
            • 05:30 - 06:00 but sometimes we kind of forget about it. Now, at the bottom of the pyramid we have collect. You obviously have to collect some sort of data to be able to use that data. So collecting, storing, transforming, all of this data engineering effort is pretty important, and it's actually captured pretty well in media because of big data. We talked about how difficult it is to manage all this data. We talked about parallel computing, which means like Hadoop and Spark.
            • 06:00 - 06:30 Stuff like that. We know about this. Now, the thing that's less known is the stuff in between, which is right here, everything that's here. And surprisingly, this is actually one of the most important things for companies, because you're trying to tell the company what to do with your product. So what do I mean by that? So, analytics: that tells you, using the data, what kind of insights can tell me what is happening to my users. And then metrics: this is important because, what's going on with my product?
            • 06:30 - 07:00 You know, these metrics will tell you if you're successful or not. And then also, you know, A/B testing, of course. Experimentation that allows you to know which product versions are the best. So these things are actually really important, but they're not so covered in media. What's covered in media is this part: AI, deep learning. We've heard on and on about it, you know. But when you think about it, for a company, for the industry, it's actually not the highest priority, or at least it's not the thing that yields the most result for the lowest amount of effort.
            • 07:00 - 07:30 That's why AI and deep learning are on top of the hierarchy of needs, and these things, A/B testing, analytics, they're actually way more important for industry. So that's why we're hiring a lot of data scientists that do that. So what do data scientists actually do? Well, that depends on the company, because of the size. So for a start-up, you kind of lack resources, so you can only kind of have one DS. So that one data scientist, he has to do everything. So you might be seeing all of this being data science. Maybe you won't be doing AI or deep learning, because that's not a priority right now.
            • 07:30 - 08:00 But you might be doing all of these. You have to set up the whole data infrastructure. You might even have to write some software code to add logging, and then you have to do the analytics yourself, then you have to build the metrics yourself, and you have to do A/B testing yourself. That's why, for startups, if they need a data scientist, this whole thing is data science, so that means you have to do everything. But let's look at medium-sized companies. Now, finally,
            • 08:00 - 08:30 they have a lot more resources. They can separate the data engineers and the data scientists. So usually, in collection, this is probably software engineering. And then here, you're gonna have data engineers doing this. And then, depending on if your medium-sized company does a lot of recommendation models or stuff that requires AI, then DS will do all these. Right. So as a data scientist, you have to be a lot more technical. That's why they only hire people with PhDs or master's degrees, because they want you to be able to do the more complicated things.
            • 08:30 - 09:00 So let's talk about large companies now. Because you're getting a lot bigger, you probably have a lot more money, and then you can spend it more on employees. So you can have a lot of different employees working on different things. That way the employee does not need to think about the stuff that they don't want to do, and they could focus on the things that they're best at. For example, me, in my untitled large company, I would be in analytics, so I could just focus my work on analytics and metrics and stuff like that.
            • 09:00 - 09:30 So I don't need to worry about data engineering or AI and deep learning stuff. So here's how it looks for a large company. Instrumentation, logging, sensors: this is all handled by software engineers, right? And then here, cleaning and building data pipelines: this is for data engineers. Now here, between these two things, we have Data Science Analytics. That's what it's called. But then once we go to the AI and deep learning, this is where we have
            • 09:30 - 10:00 research scientists, or we call it data science core, and they are backed by ML engineers, which are machine learning engineers. Yeah. Anyways, so in summary, as you can see, data science can be all of this, and it depends what company you are in, and the definition will vary. So please let me know what you would like to learn more about: AI, deep learning, or A/B testing,
            • 10:00 - 10:30 experimentation, ... Depending on what you want to learn about, leave a comment down below so I could talk about it, or I could find someone who knows about this and I can share the insights with you. So yeah, if you like this video, don't forget to like and subscribe. So, yeah. Hope you have a wonderful day. Hope this was helpful. But yeah, thanks for watching. Peace.