Combating AI Content Plagiarism

Poisoning AI with ".аss" subtitles

Estimated read time: 1:20

    Summary

    The video delves into the growing issue of AI-generated content on platforms like YouTube, focusing on the misuse of AI tools to plagiarize original work by creators. It discusses how AI often uses subtitles from existing videos to generate summaries or similar content, thereby undermining original creators' efforts. The video also explores innovative methods to counter this by "poisoning" AI summarizers with fake subtitles, making it hard for AI to create coherent summaries. A blend of technology-focused strategies is shared to empower content creators in protecting their work from AI misuse.

      Highlights

      • AI-generated content is on the rise, often plagiarizing real creators. 🤔
      • Using AI tools without effort results in generic, repetitive videos. 🔄
      • Creators have started fighting back with creative solutions like fake subtitles. 🎥
      • Adopting subtitle formats with strategic misinformation can confuse AI summarizers. 🌀
      • The battle against AI misuse is ongoing, and collaboration is key. 🤝

      Key Takeaways

      • Protect your content from AI infringement by cleverly manipulating subtitle formats. 😎
      • Creators are finding new ways to fight against AI content theft. 💪
      • Faceless YouTube channels are often exploiting AI tools for lazy content production. 🚫
      • Innovative use of subtitle technology can thwart AI attempts at plagiarism. 🤖
      • Supporting creators in this battle can lead to further innovations in content protection. 🛡️

      Overview

      In the fast-evolving digital landscape, AI-generated content is growing rampant, with many creators falling victim to plagiarism as their work gets repurposed by AI tools. This video explores the tactics used by those looking to exploit AI for easy profit and the repercussions it has on genuine content creators.

        The core of the issue lies in how AI summarizers often rely on video subtitles to generate new content. As a solution, creators like f4mi have developed innovative methods to 'poison' these AI tools with misleading subtitle data, ensuring any plagiarized outcomes are nonsensical or incorrect. This battle of wits spotlights the creative lengths individuals will go to protect their original works.

          With AI technology developing rapidly, content creators are urged to become more savvy and protective of their work. By sharing strategic countermeasures, the video encourages dialogue and collaboration within the creator community. It's a call for unity against unwarranted AI exploitation, stressing the importance of maintaining the integrity and value of human creativity.

            Chapters

            • 00:00 - 00:30: Introduction of the Faceless YouTube Channel Grift The chapter introduces a common scenario where someone is casually browsing for a video to watch while eating. It highlights the tendency to use phone accessories creatively, like using glasses as a stand. As the narrative unfolds, the viewer finds an intriguing video but soon senses something off. The narrator's voice appears too robotic, and there's an evident struggle with basic vocabulary. Additionally, the video is accompanied by generic stock footage, hinting at an underlying issue with the content quality and authenticity.
            • 00:30 - 01:00: AI Slop and Automation in Content Creation The chapter discusses the pitfalls of using AI and automation in content creation. It highlights how the content often ends up being generic, repetitive, and lacking coherence or a meaningful conclusion. The issue is attributed to the absence of human thought and oversight in producing such content.
            • 01:00 - 01:30: The Problem with Automatic Subtitles and AI The chapter discusses the misconceptions surrounding AI-generated content, particularly automatic subtitles, and the fear of AI overtaking human roles. It highlights that, as of 2025, AGI (Artificial General Intelligence) has not yet arrived. The claims that humans are having more interactions with robots than people are dismissed, emphasizing that AI content currently appearing widely is created by humans using AI tools to profit and not by robots independently creating work.
            • 01:30 - 02:00: Exploitation of YouTube's Subtitles by AI Grifters In this chapter, the focus is on a new online scheme prevalent among certain internet communities, often identified with gender-specific cultural themes like 'girlboss' or 'alpha male.' This scheme is referred to as the 'Faceless YouTube Channel' grift, where individuals use AI tools to create content without showing their identity. These tools, including ChatCBT, allow users to outsource almost all aspects of video production, including voiceovers, reflecting a growing trend of AI dependency in content creation.
            • 02:00 - 02:30: Introduction to Advanced SubStation Alpha The chapter discusses the growing trend of using AI-generated content, particularly text-to-speech, without significant human effort. It highlights how YouTube videos are increasingly being used as sources for AI-generated 'slop,' which refers to low-quality content based on automatic subtitles. These subtitles are scraped and fed into AI tools like ChatBBL to either directly copy the video or create summarized versions, often leading to plagiarism.
            • 02:30 - 03:00: Comparison Between ASS and SRT Format The chapter delves into the issue of video content theft on platforms like YouTube, as highlighted by creators including Hbomberguy, who lamented that unique ideas often get stolen rapidly. Despite this, the introduction of automatic subtitles by YouTube is praised for its accessibility advantages, suggesting that the benefits to users who rely on them outweigh the potential downsides of making content more susceptible to being copied.
            • 03:00 - 03:30: Manipulating Subtitle Formats to Confuse AI The chapter discusses how these AI features are exploited to steal videos without repercussions: thieves simply take a video's link and feed it to a video summarizer. The speaker demonstrates this on their own video, which produces no summary at all, and reflects on an idea they had months ago to tackle the problem, despite confessing to not being a scientist.
            • 03:30 - 04:00: Execution of the Subtitle Poisoning Strategy In this chapter, the author discusses the execution of a strategy involving the subtitle track of their video to thwart unauthorized AI summarizers, which are used to repurpose content without permission. The author outlines a basic method where the actual subtitle track is replaced with a fake one filled with nonsense to mislead AI bots. However, the author decides against this approach because they value having accurately formatted, meaningful subtitles.
            • 04:00 - 04:30: Challenges and Mobile Viewing Issues The chapter entitled 'Challenges and Mobile Viewing Issues' discusses the importance of subtitles in videos. The speaker describes their process of experimenting and iterating to create effective subtitles that are both functional for viewers and capable of hiding nonsensical information to deter AI from stealing content. The chapter promises to teach this technique, with a note that the topic might be controversial, followed by a brief pause for a sponsor message ("paying the bills").
            • 04:30 - 05:00: Advanced Techniques in Subtitle Confusion The chapter explores the vulnerabilities in internet service providers (ISPs) leading to massive security breaches. Hackers exploited networks meant for legally sanctioned wiretaps, risking exposure of personal internet data to unauthorized parties. The narrative also highlights the persistent threat from data brokers who legally collect and sell personal data. Emphasizing preventive measures, the chapter introduces Aura, a service offering monitoring of personal data to mitigate such risks.
            • 05:00 - 05:30: Ethical Implications and the Fight Against AI Exploitation Chapter discusses the privacy and security services offered by Aura, emphasizing their proactive measures against identity theft and illicit data use on the dark web.
            • 05:30 - 06:00: Conclusion: Protecting Creators from AI Exploitation In this chapter, the author discusses personal experiences with intellectual property theft related to AI technology. They recount an incident where one of their videos was stolen by a website, Toolify AI, which used it to create a summary without permission. Instead of opting for a typical response like requesting content removal via social media, the author contemplates more innovative approaches to prevent such exploitation in the future. This sets the stage for a broader discussion on how to protect creators from AI exploitation.

            Poisoning AI with ".аss" subtitles Transcription

            • 00:00 - 00:30 Alright, so it's time to eat, you bust out your  phone, you use your glasses as a stand, because everybody does that for some reason, and you  try to find an interesting video to watch. And, well, you do find something that  seems promising, you click on it and... At first, everything seems normal, but then  you quickly start realizing that something is very wrong with the video. Like, the voice  of the narrator sounds a bit too robotic, and he seems to struggle with the most basic  words, and also, the stock footage that's being
            • 00:30 - 01:00 used is super generic, and not just that, but it  keeps repeating for some reason. And the actual script itself, like, what the video is about,  doesn't seem to really make sense. It's not getting anywhere. It's like, the narrator is  just yapping, starting and stopping, starting and stopping, without reaching any meaningful  conclusion. It's like, no human thought was put behind this video. And that's because if this  happened to you, you've most likely not watched
            • 01:00 - 01:30 something that was made by a human at  all, but something that was made by a... So, it is officially 2025, and I can confirm,  the sound was wrong. We women are not having more sex with robots than humans, and that's my  roundabout way of saying that AGI is not here yet. All this AI slop that's appearing on every social  media platform is not made by robots trying to steal our jobs, it's made by humans trying to make  money using AI to launder other people's work.
            • 01:30 - 02:00 A new grift has appeared on the girlboss side  and the alpha male side of the internet. Which side you're on depends on if you liked blue or  pink when you were a kid. And this grift is called Faceless YouTube Channel, and that's a great name  because if I made that sh*t I wouldn't want to show my face either. The idea is to leverage AI  tools like ChatCBT to essentially outsource all of the necessary work to make a video to these  tools, including voiceovers that they make with
            • 02:00 - 02:30 uncanny AI text-to-speech, and since no real work is actually being put in, this is, unfortunately, extremely effective. More and more often recently you see AI slop being based not on Wikipedia articles, not on forum posts, but on other YouTube videos. This is done by exploiting YouTube's automatic subtitles feature, scraping those subtitles and then giving them to ChatBBL, asking it to either change some words around in case you straight up want to steal a video one to one, or make a summary in case you want to make a short case of plagiarism. You can easily find
            • 02:30 - 03:00 countless YouTube creators complaining about how their videos were stolen like this, and the phenomenon doesn't seem to be stopping at all. As Hbomberguy said in that video about that thing, "On YouTube, if you have an original idea, if it's good, it won't be yours for long." Automatic subtitles are, in my opinion, one of the best things YouTube ever did, and even though that means AI grifters can now steal your videos more easily, you shouldn't disable them. They are incredibly useful and some people need them in order to watch your content. Unfortunately,
            • 03:00 - 03:30 this incredible feature is being exploited by  these AI grifters to steal videos, and we can't do anything about it. Like, just  take the link of this video,   okay, and put it on a video summarizer, like any summarizer of your choice. As you'll  see, you're getting no summary for this video. A few months ago, I had an idea, and like the true  scientist that I'm not, I wanted to put it to the
            • 03:30 - 04:00 test. And the idea was trying to use the subtitle  track in my video to poison any AI summarizers that were trying to steal my content to make slop.  The most basic way to do this would be simply removing your subtitle track and creating a fake  one that is only meant to be seen by AI bots, that basically only contains yapping,  like pure garbage. However,   this was unacceptable for me. Despite it working, I didn't want to do this because I  really care about having proper, well-formatted
            • 04:00 - 04:30 subtitles in my videos. They are very important  to me. So after some experimenting and iterating, I figured out a way to both have working  subtitles for you guys, like in this video, but also in the subtitle data, hide garbage, like  pure nonsense that is only visible to AI trying to steal my content. And in this video, I'm going  to teach you how to do it. But first, this video might make me some enemies, so just to be safe,  give me one minute to pay my bills. Just last
            • 04:30 - 05:00 year, several internet providers were victims  of a massive security breach, where hackers are suspected to have gained access to a network  infrastructure that ISPs used to answer court authorized wiretapping requests, meaning that  potentially your private data and everything you do on the internet might have been exposed to a  third party. This is one of many leaks happening seemingly every week. And hackers are not the only  problem. Data brokers can legally harvest and sell your personal data, letting anyone straight up buy  it and use it for whatever they want. And that's why I have partnered with today's sponsor, Aura.  Aura monitors your personal data across both the
            • 05:00 - 05:30 clear and dark web, alerting you immediately in case of a breach and providing you with up to 5 million dollars in insurance if that data is used to steal your identity. They also provide you with real-time alerts, even for breaches that do not directly involve you, an automatic opt-out from data brokers so your data doesn't get sold to anyone, and a VPN for safe browsing, potentially protecting you from attacks like the ISP one that I just mentioned. You can go to the link in my description, aura.com/f4mi, for a free two-week trial, meaning you can straight up immediately check for free if your data was stolen or sold to anyone. Thanks to Aura for
            • 05:30 - 06:00 supporting this crazy video right here. And now,  back to destroying Skynet, I guess. Around one year ago, one of my friends sent me a link to this  website. I don't know who the f*** Toolify AI is, but they just stole my video. They clearly just   gave my video to a YouTube  summarizer and published a summary as their own article on their website.  Now, instead of like tweeting about it and just getting the content removed, I wanted to try  doing something more interesting. And that is, what if I can make it so that someone trying to  do the same thing with one of my videos in the
            • 06:00 - 06:30 future is going to waste their time and money  because the subtitles are a lie. I mean, they're real, but there's something in them. After a bit  of experimentation, I figured out a way that works with most LLMs. It doesn't work if someone is  using Whisper AI, which is transcribing my video based on audio and then they give that transcript  to ChatGPT. But most people trying to steal stuff wouldn't bother with that, they're just going to  Google "video summarizer" and use that to steal their stuff. And so, to start,  we need to talk about...
            • 06:30 - 07:00 Advanced SubStation Alpha is a subtitle format released in 2002, technically being the fourth version of the SSA subtitle format that was originally launched in '96. It was originally created by Kotus, a British programmer and anime fansubber, as the fourth version of the format used by his own fansubbing software, Sub Station Alpha. Now, if we compare the ASS format with the SRT format, which is basically the standard when it comes to subtitles today, ASS clearly wins here. SRT was originally launched in 2000 as part of SubRip, which was a software that would use OCR to scan
            • 07:00 - 07:30 hard-coded subtitles from video and convert them   to scripts. And because of  this very narrow original scope, SRT files have a very simple structure.  When you open one with Notepad, you notice that every subtitle is made of three parts. The  sequence number, the timecode telling the player when the subtitle should appear and disappear, and  finally the actual text itself, the thing you see on screen. This is very  basic but also very clever,   and it works, but it is no match for the ASS.
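            For reference, the three-part structure just described looks like this in a hypothetical SRT file (the cue numbers, timecodes, and text here are invented for illustration):

                1
                00:00:01,000 --> 00:00:04,000
                Welcome back to the channel.

                2
                00:00:04,500 --> 00:00:07,500
                Today, let's talk about subtitle formats.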
            • 07:30 - 08:00 ASS also gives you fonts, positioning, effects like shadow, bold, italic, underline, karaoke, animations, heck, you even get multi-line styling so you could have different styles in the same subtitle line. ASS is how I managed to get color subtitles in my Format Wars video. However, that shouldn't have worked. YouTube allows you to upload subtitles in different formats, but ASS is not one of them. There are some other compatible subtitle formats that allow you to get different degrees of
            • 08:00 - 08:30 customization, but none of them gives you access to every single one of these features at the same time like ASS does. However, after you upload your subtitles to YouTube, it doesn't matter what they were, because they internally get converted to YouTube's own proprietary format named SRV3, or YouTube Timed Text, YTT, which has already been reverse engineered, and you can find a few GitHub projects that let you convert your ASS files to YTT, and YouTube is going to accept those just fine.
            • 08:30 - 09:00 You can just upload them, and even though the styling doesn't show in the subtitle page, after you save the subtitles and you go to the actual video player, they work. So let's take one of my classic videos, okay, the Homebrew Channel Music one. I've already made subtitles for that video ages ago, and they were in the SRT format. When I convert them to the ASS format, what I get is this suite of options of new stuff that I can do. And there are two things that I'm very interested in: subtitle position and styles.
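            As a rough, hypothetical illustration of those two options (the style values and timings below are invented, not taken from the video), an ASS file declares named styles in a [V4+ Styles] section and can position an individual line with an override tag such as \pos:

                [V4+ Styles]
                Format: Name, Fontname, Fontsize, PrimaryColour, Alignment
                Style: Default,Roboto,48,&H00FFFFFF,2

                [Events]
                Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
                Dialogue: 0,0:00:01.00,0:00:04.00,Default,,0,0,0,,{\pos(960,540)}This line is pinned to the centre of a 1080p frame.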
            • 09:00 - 09:30 The way YouTube summarizers usually work is by scraping the subtitles from a YouTube video and then giving those to an LLM like Chat NES, asking it to make a summary. So the LLM then takes a look at the subtitle file in order and tries to explain what's going on inside. So what if, exploiting the ASS format, we add, for every real subtitle line that a human is supposed to be able to read, two chunks of text out of bounds, using the positioning feature of the ASS format, with their size and transparency set to zero so they are completely invisible?
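            A minimal sketch of that idea in Python (the decoy text, override tags, and function names are assumptions for illustration; the video describes the approach but does not publish its actual script, and a complete file would also need the headers shown above):

                # Hedged sketch: wrap each real subtitle cue with two invisible "decoy" events.
                # The decoys are pushed far off-screen with \pos, shrunk with \fs1 and made fully
                # transparent with \alpha&HFF&, so viewers never see them while a scraper reading
                # the raw subtitle text still does.

                DECOY_POOL = [
                    "Call me Ishmael. Some years ago, never mind how long precisely...",
                    "It is a truth universally acknowledged, that a single man in possession of a good fortune...",
                ]

                def decoy_event(start, end, text):
                    # Off-screen position, tiny font, fully transparent fill.
                    return (f"Dialogue: 0,{start},{end},Default,,0,0,0,,"
                            f"{{\\pos(5000,5000)\\fs1\\alpha&HFF&}}{text}")

                def poison(events):
                    """events: list of (start, end, real_text) tuples using ASS timestamps."""
                    out = []
                    for i, (start, end, real_text) in enumerate(events):
                        # Sandwich every real cue between two decoy cues sharing its timing.
                        out.append(decoy_event(start, end, DECOY_POOL[i % len(DECOY_POOL)]))
                        out.append(f"Dialogue: 0,{start},{end},Default,,0,0,0,,{real_text}")
                        out.append(decoy_event(start, end, DECOY_POOL[(i + 1) % len(DECOY_POOL)]))
                    return "\n".join(out)

                if __name__ == "__main__":
                    print(poison([("0:00:01.00", "0:00:04.00", "Welcome back to the channel.")]))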
            • 09:30 - 10:00 And to avoid repetition, which is something that an LLM can easily figure out, like it can understand we are trying to trick it, instead of putting random words, we actually copy-paste works from the public domain. And for extra measure, we replace most words there with synonyms. Now, doing this manually would be a pain in the ass. So I just made a Python script that does this for me, and it kind of works. It spits out an ASS file that I have to then open using Aegisub to
            • 10:00 - 10:30 then modify the styles to make sure that the ones  out of bounds are also invisible and zero pixels big. I can then finally use this other tool  called YouTube Sub Converter to convert the ASS file into a YTT and upload it to YouTube. And as  you can see, the subtitle page is a f***ing mess now, but when I actually reach the video page and  I enable the subtitles, everything seems fine. All right, so before we go any further, I want to  go here and remove... Okay, there is no automatic
            • 10:30 - 11:00 captions track yet, so I should just be able to  go to summarize.tech. I can paste my video and see what happens now. "The tomfoolery test video  presents a thorough examination of the evolution of mechanical engineering and aviation,  emphasizing the complexities of engine designs, including steam engines and various propulsion  systems." That's the garbage data. It's only
            • 11:00 - 11:30 summarizing the garbage text. There is no...  Okay, wait, here. We have a slight mention of the Homebrew Channel and then it goes back to  the garbage text. Yeah, this is working. This is working great. "Crisp YouTube Summarizer." Doesn't  Crisp, like, make a noise cancelling thing? Why are they making YouTube summarizers now? "The  discussion begins with the necessity of springs
            • 11:30 - 12:00 behind the delivery box." This one doesn't  even try to talk about the Homebrew Channel. "Sumcube.ai." Let's try this one. "The effects of  Discord on programmer humor were catastrophic." Oh my god, it's just like... 136.
            • 12:00 - 12:30 It is very important here that when the  automatic captions get made, we delete them, because we already have our own track, which is  the one that's poisoned for AI. If we have the automatic one, then the summarizers are going  to default to that one, and therefore this trick won't work. Gemini is actually pretty smart about  this. If you ask it to make a summary of a video and the video doesn't have automatic captions  enabled, it doesn't even try. When I first came
            • 12:30 - 13:00 up with this method one year ago, I was over the  moon. Then I tried opening one of these videos on my phone and yeah, transparency and position don't  really work there. So any reasonable person would just give up. Do I sound like a reasonable person  though? Since the problem is that transparency and positioning don't work properly on mobile  and they show black squares, I decided to write a Python script that scans the video and finds every  instance of a full black frame. So for example,
            • 13:00 - 13:30 when I fade to black now, there are like 30 subtitles on screen right now, but they are black on black so you can't see them. I ended up having a local LLM generate a story that is similar to the real script, but with completely made-up facts, and also threw that in the out-of-bounds subtitle text repository, and that ended up working perfectly. I managed to confuse GPT-4o every time it tried to recap my video. So I can confidently say that as of today, 22nd
            • 13:30 - 14:00 January 2025, this is a pretty effective way of fighting the most common slop makers. I can't really do a lot about Whisper right now. I have to figure out a way to trigger audio hallucinations without making them obnoxious to humans. Also, bigger and newer models like ChatGPT o1 are able to sometimes filter the noise and actually understand the real topic of the video. They are able to see that I'm tricking them. However, there would be yet another step that could potentially work by simply filling the memory of any LLM so much that it's simply wasting so many
            • 14:00 - 14:30 resources it can't do anything about it. And that would be dividing every single sentence in the subtitle file into single letters, specifying the position on screen and the timing, so a human watching the video would see the complete result of this patchwork, because the player can do it easily as part of the logic that displays the subtitles, but an LLM would have to read every single letter in order. And that's where we pull another trick, because since the player doesn't
            • 14:30 - 15:00 need the actual subtitles in the text file to be in order, when you're playing the video the player just loads the entire file in RAM and displays the subtitles according to their timestamp. But we can scramble the order of the letters in the text file, and the video player is gonna be fine because it can just reference the timing, but an LLM has to waste resources reordering every single letter for every sentence, and then from that it has to piece together the words, which, depending on how the scraping is done, means doing it without having the position data, so it has to play
            • 15:00 - 15:30 Scrabble for every single word, and eventually, if it can do that correctly, it can theoretically try to summarize something. But yeah, right now it just gives up, it doesn't even try. There is yet another trick. This one sometimes works on Whisper summarizers too, because it is not exploiting any specific quirk of the tech, it's exploiting the economy behind it. And that is: since running large, good models can be expensive for people running websites like this, it is common for them to use caching. So when you give the AI summarizer a link to a video, it's going to make a summary and then
            • 15:30 - 16:00 store that summary, so that any future person trying to summarize the same video is not going to waste any API credits, because they already have the result, it's already done. So my idea to exploit this is: we make a video that is twice as long as the real video, and the second part is just us yapping, saying stuff like "Android hell is a real place". We upload this video to YouTube, and using the YouTube editor we cut out the real part, only leaving the yapping, and make sure the yapping is
            • 16:00 - 16:30 the same length as the real part, for reasons you'll figure out soon. Finally, give your video to every video summarizer you can find, so that they make a summary of the yapping and they keep that in cache for the future. When that's done, you go back to the YouTube editor, you revert the changes, and this time you cut out the fake part, you only leave the real part intact. And what you're gonna have now is your copy of the real video on YouTube and, associated with that link, fake summaries about your yapping. Having the same length for the yapping part and the real part makes it so that it
            • 16:30 - 17:00 is more difficult for these websites to figure out that something changed with your video, because the length is the same, and so they can't just use that to update their cache. So, did I just fix this problem once and for all? No, not at all. When I first started working on this thing around one year ago, I was aware that the second I made my discoveries public, the people working at these tools (which, by the way, are not to blame, don't hate them, the tools are great, some people just use them to steal, but they're not meant for that), yeah, the developers behind these tools are
            • 17:00 - 17:30 going to fix this issue. So my goal with this video, aside from telling you how cool subtitles are, is not trying to sell you a cure for this problem. Making this video might realistically have just closed some doors for me when it comes to future sponsors that maybe work with AI. But this video isn't about making money or getting recognition, it's about trying to do something about what's going on. Us creators are being attacked by both other creators just stealing our content (so they're not really creators, they're just thieves who don't really care about anything except making money),
            • 17:30 - 18:00 but most importantly by huge mega corporations, which are being caught basically weekly trying to train their product, something they're selling, on our art, without any authorization. They're trying to build machines that are meant to replace us, and they are using our passion to fuel this project, and I hate it. I hate it. I'm a nobody, I'm a woman trying to make her living doing what she loves, and I'm not trying to pick a fight, or even pretend I have a fighting chance against huge mega corporations
            • 18:00 - 18:30 that in one second make more money than I've ever made in my entire life. So realistically, if they want my data, if they need it, they're going to get it. But what I'm trying to say is that we have to stop being doomers about it, and we don't have to make it easy for them to steal our sh*t. If I managed to come up with this, and I'm just a glorified art graduate, maybe some of you watching right now are going to come up with something that is even more complex and can make it less convenient for people or huge corporations to steal
            • 18:30 - 19:00 our work without giving us any money or any credit. So yeah, that's pretty much it, and after this video I have to really hope AI never takes over, or I'm definitely going to Android hell. Nothing to see here, nothing to see here, there are no hidden subtitles here, no.
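
            To make the per-letter scrambling idea from around the 14:00 mark more concrete, here is a rough Python sketch; the positioning values, field layout, and function names are assumptions for illustration, not the script used in the video:

                import random

                def explode_cue(start, end, style, text, x0=400, y=980, dx=24):
                    """Split one cue into one ASS event per character, each pinned to its own
                    x position (hypothetical 1080p layout) so the player still renders the
                    original sentence."""
                    events = []
                    for i, ch in enumerate(text):
                        if ch == " ":
                            continue  # spacing is implied by the per-character positions
                        events.append(
                            f"Dialogue: 0,{start},{end},{style},,0,0,0,,"
                            f"{{\\pos({x0 + i * dx},{y})}}{ch}"
                        )
                    return events

                def scramble(events, seed=0):
                    # Shuffle the physical line order; playback is unaffected because every
                    # event carries its own timing, but a scraper reading the file top to
                    # bottom only sees an out-of-order soup of single letters.
                    random.seed(seed)
                    shuffled = list(events)
                    random.shuffle(shuffled)
                    return shuffled

                if __name__ == "__main__":
                    cue = explode_cue("0:00:01.00", "0:00:04.00", "Default", "Hello world")
                    print("\n".join(scramble(cue)))

            The player reassembles the sentence from each event's timing and position, while an LLM working from the scraped text has to reorder and regroup every character before it can even begin to summarize.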