Combating AI Content Plagiarism
Poisoning AI with ".аss" subtitles
Summary
The video delves into the growing issue of AI-generated content on platforms like YouTube, focusing on the misuse of AI tools to plagiarize creators' original work. It discusses how plagiarists scrape the subtitles of existing videos and feed them to AI tools to generate summaries or reworded copies, undermining original creators' efforts. The video also explores methods to counter this by "poisoning" AI summarizers with invisible decoy subtitles, making it hard for AI to produce coherent summaries. These technical strategies are shared to help content creators protect their work from AI misuse.
Highlights
- AI-generated content is on the rise, often plagiarizing real creators. 🤔
- Using AI tools without effort results in generic, repetitive videos. 🔄
- Creators have started fighting back with creative solutions like fake subtitles. 🎥
- Hiding invisible decoy text in styled subtitles (ASS files converted to YouTube's YTT format) can confuse AI summarizers. 🌀
- The battle against AI misuse is ongoing, and collaboration is key. 🤝
Key Takeaways
- Protect your content from AI infringement by cleverly manipulating subtitle formats. 😎
- Creators are finding new ways to fight against AI content theft. 💪
- Faceless YouTube channels are often exploiting AI tools for lazy content production. 🚫
- Innovative use of subtitle technology can thwart AI-assisted plagiarism. 🤖
- Supporting creators in this battle can lead to further innovations in content protection. 🛡️
Overview
In the fast-evolving digital landscape, AI-generated content is becoming rampant, and many creators are falling victim to plagiarism as their work is repurposed with AI tools. This video explores the tactics of those exploiting AI for easy profit and the repercussions for genuine content creators.
The core of the issue lies in how AI summarizers often rely on video subtitles to generate new content. As a solution, creators like f4mi have developed methods to 'poison' these AI tools with misleading subtitle data, so that summaries built from scraped subtitles come out nonsensical or incorrect, though the creator notes the trick does not stop tools that transcribe the audio with Whisper. This battle of wits spotlights the creative lengths to which individuals will go to protect their original works.
With AI technology developing rapidly, content creators are urged to become more savvy and protective of their work. By sharing strategic countermeasures, the video encourages dialogue and collaboration within the creator community. It's a call for unity against unwarranted AI exploitation, stressing the importance of maintaining the integrity and value of human creativity.
Chapters
- 00:00 - 00:30: Spotting an AI-Generated Video The chapter introduces a common scenario where someone is casually browsing for a video to watch while eating, joking about the habit of propping a phone up on one's glasses. The viewer finds a promising video but soon senses something off: the narrator's voice is too robotic, he stumbles over basic words, and the stock footage is generic, hinting at a problem with the content's quality and authenticity.
- 00:30 - 01:00: AI Slop and Automation in Content Creation The chapter discusses the pitfalls of using AI and automation in content creation. It highlights how the content often ends up being generic, repetitive, and lacking coherence or a meaningful conclusion. The issue is attributed to the absence of human thought and oversight in producing such content.
- 01:00 - 01:30: AI Slop Is Made by Humans, Not Robots The chapter pushes back on fears that AI is taking over human roles. It notes that, as of 2025, AGI (Artificial General Intelligence) has not arrived, jokingly dismisses the claim that women are now having more sex with robots than with humans, and stresses that the AI content appearing on every platform is made by humans using AI tools to launder other people's work for money, not by robots creating work on their own.
- 01:30 - 02:00: The Faceless YouTube Channel Grift In this chapter, the focus is on a new online scheme spreading on both the 'girlboss' and 'alpha male' sides of the internet. Called the 'Faceless YouTube Channel' grift, it has people use AI tools (one is named "ChatCBT" in the transcript) to outsource almost every part of video production, including voiceovers, without ever showing their face.
- 02:00 - 02:30: Stealing Videos Through Scraped Subtitles The chapter explains why this low-effort approach, built on uncanny AI text-to-speech, is so effective. It highlights how AI slop is increasingly based on other YouTube videos: their automatic subtitles are scraped and fed to a chatbot (named "ChatBBL" in the transcript), which either rewords them to copy a video outright or summarizes them for a shorter piece of plagiarism.
- 02:30 - 03:00: Video Theft and the Value of Automatic Subtitles The chapter describes how many creators complain about their videos being stolen this way, quoting Hbomberguy: "On YouTube, if you have an original idea, if it's good, it won't be yours for long." Even so, YouTube's automatic subtitles are praised for their accessibility, and creators are urged not to disable them, since the benefit to viewers who rely on them outweighs the easier copying.
- 03:00 - 03:30: Testing Summarizers on This Video The chapter notes that grifters exploit automatic subtitles to steal videos with no repercussions. The speaker then invites viewers to paste this video's link into any video summarizer and points out that no summary comes back, before introducing an idea from a few months earlier while joking about not being a true scientist.
- 03:30 - 04:00: Why a Fake Subtitle Track Is Not Enough In this chapter, the author describes the idea of using the video's subtitle track to poison AI summarizers that repurpose content without permission. The most basic method would be to replace the real subtitle track with a fake one filled with nonsense meant only for AI bots. Although this works, the author rejects it because they value proper, well-formatted subtitles.
- 04:00 - 04:30: Real Subtitles With Hidden Garbage The chapter stresses how important subtitles are to the speaker, who describes experimenting and iterating until finding a way to keep working subtitles for viewers while hiding nonsense in the subtitle data that only AI scrapers see. The chapter promises to teach the technique, warns that it might make the speaker some enemies, and pauses for a sponsor ("give me one minute to pay my bills").
- 04:30 - 05:00: Data Breaches, Data Brokers, and the Sponsor The chapter describes a breach in which hackers are suspected to have accessed the network infrastructure that ISPs use to answer court-authorized wiretapping requests, potentially exposing people's private internet activity to third parties. It also points to data brokers, who can legally harvest and sell personal data, and introduces the sponsor, Aura, which monitors personal data to mitigate these risks.
- 05:00 - 05:30: Aura's Services The chapter describes the sponsor's privacy and security services: monitoring personal data across the clear and dark web, breach alerts, up to 5 million dollars of identity-theft insurance, automatic opt-outs from data brokers, and a VPN, offered with a free two-week trial.
- 05:30 - 06:00: The Toolify AI Incident In this chapter, the author discusses a personal experience of content theft involving AI tools. They recount an incident where one of their videos was stolen by a website, Toolify AI, which fed it to a summarizer and published the summary as its own article without permission. Instead of opting for a typical response like tweeting about it and getting the content removed, the author looks for a more interesting way to make future thieves waste their time and money. This sets the stage for a broader discussion on how to protect creators from AI exploitation.
Poisoning AI with ".аss" subtitles Transcription
- 00:00 - 00:30 Alright, so it's time to eat, you bust out your phone, you use your glasses as a stand, because everybody does that for some reason, and you try to find an interesting video to watch. And, well, you do find something that seems promising, you click on it and... At first, everything seems normal, but then you quickly start realizing that something is very wrong with the video. Like, the voice of the narrator sounds a bit too robotic, and he seems to struggle with the most basic words, and also, the stock footage that's being
- 00:30 - 01:00 used is super generic, and not just that, but it keeps repeating for some reason. And the actual script itself, like, what the video is about, doesn't seem to really make sense. It's not getting anywhere. It's like, the narrator is just yapping, starting and stopping, starting and stopping, without reaching any meaningful conclusion. It's like, no human thought was put behind this video. And that's because if this happened to you, you've most likely not watched
- 01:00 - 01:30 something that was made by a human at all, but something that was made by a... So, it is officially 2025, and I can confirm, the sound was wrong. We women are not having more sex with robots than humans, and that's my roundabout way of saying that AGI is not here yet. All this AI slop that's appearing on every social media platform is not made by robots trying to steal our jobs, it's made by humans trying to make money using AI to launder other people's work.
- 01:30 - 02:00 A new grift has appeared on the girlboss side and the alpha male side of the internet. Which side you're on depends on if you liked blue or pink when you were a kid. And this grift is called Faceless YouTube Channel, and that's a great name because if I made that sh*t I wouldn't want to show my face either. The idea is to leverage AI tools like ChatCBT to essentially outsource all of the necessary work to make a video to these tools, including voiceovers that they make with
- 02:00 - 02:30 uncanny AI text-to-speech, and since no real work is actually being put in, this is, unfortunately, extremely effective. More and more often recently you see AI slop being based not on Wikipedia articles, not on forum posts, but on other YouTube videos. This is done by exploiting YouTube's automatic subtitles feature, scraping those subtitles and then giving them to ChatBBL, asking it to either change some words around in case you straight up want to steal a video one to one, or make a summary in case you want to make a short case of plagiarism. You can easily find
- 02:30 - 03:00 countless YouTube creatives complaining about how their videos were stolen like this, and the phenomenon doesn't seem to be stopping at all. As Hbomberguy said in that video about that thing, "On YouTube, if you have an original idea, if it's good, it won't be yours for long." Automatic subtitles are, in my opinion, one of the best things YouTube ever did, and even though that means AI grifters can now steal your videos more easily, you shouldn't disable them. They are incredibly useful and some people need them in order to watch your content. Unfortunately,
- 03:00 - 03:30 this incredible feature is being exploited by these AI grifters to steal videos, and we can't do anything about it. Like, just take the link of this video, okay, and put it on a video summarizer, like any summarizer of your choice. As you'll see, you're getting no summary for this video. A few months ago, I had an idea, and like the true scientist that I'm not, I wanted to put it to the
- 03:30 - 04:00 test. And the idea was trying to use the subtitle track in my video to poison any AI summarizers that were trying to steal my content to make slop. The most basic way to do this would be simply removing your subtitle track and creating a fake one that is only meant to be seen by AI bots, that basically only contains yapping, like pure garbage. However, this was unacceptable for me. Despite it working, I didn't want to do this because I really care about having proper, well-formatted
- 04:00 - 04:30 subtitles in my videos. They are very important to me. So after some experimenting and iterating, I figured out a way to both have working subtitles for you guys, like in this video, but also in the subtitle data, hide garbage, like pure nonsense that is only visible to AI trying to steal my content. And in this video, I'm going to teach you how to do it. But first, this video might make me some enemies, so just to be safe, give me one minute to pay my bills. Just last
- 04:30 - 05:00 year, several internet providers were victims of a massive security breach, where hackers are suspected to have gained access to a network infrastructure that ISPs used to answer court authorized wiretapping requests, meaning that potentially your private data and everything you do on the internet might have been exposed to a third party. This is one of many leaks happening seemingly every week. And hackers are not the only problem. Data brokers can legally harvest and sell your personal data, letting anyone straight up buy it and use it for whatever they want. And that's why I have partnered with today's sponsor, Aura. Aura monitors your personal data across both the
- 05:00 - 05:30 clear and dark web, alerting you immediately in case of a breach and providing you with up to 5 million dollars in insurance if that data is used to steal your identity. They also provide you with free all-time alerts, even for breaches that do not directly involve you, an automatic opt-out from data brokers so your data doesn't get sold to anyone, and a VPN for safe browsing, potentially protecting you from attacks like the ISP one that I just mentioned. You can go to the link in my description, aura.com/f4mi, for a free two-week trial, meaning you can straight up immediately check for free if your data was stolen or sold to anyone. Thanks to Aura for
- 05:30 - 06:00 supporting this crazy video right here. And now, back to destroying Skynet, I guess. Around one year ago, one of my friends sent me a link to this website. I don't know who the f*** Toolify AI is, but they just stole my video. They clearly just gave my video to a YouTube summarizer and published a summary as their own article on their website. Now, instead of like tweeting about it and just getting the content removed, I wanted to try doing something more interesting. And that is, what if I can make it so that someone trying to do the same thing with one of my videos in the
- 06:00 - 06:30 future is going to waste their time and money because the subtitles are a lie. I mean, they're real, but there's something in them. After a bit of experimentation, I figured out a way that works with most LLMs. It doesn't work if someone is using Whisper AI, which is transcribing my video based on audio and then they give that transcript to ChatGPT. But most people trying to steal stuff wouldn't bother with that, they're just going to Google "video summarizer" and use that to steal their stuff. And so, to start, we need to talk about...
- 06:30 - 07:00 Advanced SubStation Alpha is a subtitle format released in 2002, technically being the fourth version of the SSA subtitle format that was originally launched in '96. It was originally created by Kotus, a British programmer and anime fansubber, as the fourth format used by his own fansubbing software, SubStation Alpha. Now, if we compare the ASS format with the SRT format, which is basically the standard when it comes to subtitles today, ASS clearly wins here. SRT was originally launched in 2000 as part of SubRip, which was a software that would use OCR to scan
- 07:00 - 07:30 hard-coded subtitles from video and convert them to scripts. And because of this very narrow original scope, SRT files have a very simple structure. When you open one with Notepad, you notice that every subtitle is made of three parts. The sequence number, the timecode telling the player when the subtitle should appear and disappear, and finally the actual text itself, the thing you see on screen. This is very basic but also very clever, and it works, but it is no match for the ASS.
- 07:30 - 08:00 ASS also gives you fonts, positioning, effects like shadow, bold, italic, underline, karaoke, animations, heck, you even get multi-line styling so you could have different styles in the same subtitle line. ASS is how I managed to get color subtitles in my Format Wars video. However, that shouldn't have worked. YouTube allows you to upload subtitles in different formats, but ASS is not one of them. There are some other compatible subtitle formats that allow you to get different degrees of
- 08:00 - 08:30 customization, but none of them gives you access to every single one of these features at the same time like ASS does. However, after you upload your subtitles to YouTube, it doesn't matter what they were, because they internally get converted to YouTube's own proprietary format named SRV3, or YouTube Timed Text (YTT), which has already been reverse engineered. You can find a few GitHub projects that let you convert your ASS files to YTT, and YouTube is going to accept those just fine.
- 08:30 - 09:00 You can just upload them, and even though the styling doesn't show in the subtitle page, after you save the subtitles and go to the actual video player, they work. So let's take one of my classic videos, okay, the Homebrew Channel Music one. I made subtitles for that video ages ago, and they were in the SRT format. When I convert them to the ASS format, what I get is this suite of, like, new options. And there are two things that I'm very interested in: subtitle position and styles.
- 09:00 - 09:30 The way YouTube summarizers usually work is by scraping the subtitles from a YouTube video and then giving those to an LLM like Chat NES, asking it to make a summary. The LLM then reads the subtitle file in order and tries to explain what's going on inside. So what if, exploiting the ASS format, we add, for every real subtitle line that a human is supposed to be able to read, two chunks of text out of bounds, using the positioning feature of the ASS format, with their size and opacity set to zero so they are completely invisible?
- 09:30 - 10:00 And to avoid repetition, which an LLM can easily spot (like, it can understand we are trying to trick it), instead of putting in random words, we actually copy-paste works from the public domain. And for good measure, we replace most of the words there with synonyms. Now, doing this manually would be a pain in the ass, so I just made a Python script that does this for me, and it kind of works. It spits out an ASS file that I then have to open in Aegisub to
- 10:00 - 10:30 modify the styles to make sure that the out-of-bounds ones are also invisible and zero pixels big. I can then finally use this other tool called YTSubConverter to convert the ASS file into YTT and upload it to YouTube. And as you can see, the subtitle page is a f***ing mess now, but when I actually reach the video page and I enable the subtitles, everything seems fine. All right, so before we go any further, I want to go here and remove... Okay, there is no automatic
- 10:30 - 11:00 captions track yet, so I should just be able to go to summarize.tech. I can paste my video and see what happens now. "The tomfoolery test video presents a thorough examination of the evolution of mechanical engineering and aviation, emphasizing the complexities of engine designs, including steam engines and various propulsion systems." That's the garbage data. It's only
- 11:00 - 11:30 summarizing the garbage text. There is no... Okay, wait, here. We have a slight mention of the Homebrew Channel and then it goes back to the garbage text. Yeah, this is working. This is working great. "Krisp YouTube Summarizer." Doesn't Krisp, like, make a noise cancelling thing? Why are they making YouTube summarizers now? "The discussion begins with the necessity of springs
- 11:30 - 12:00 behind the delivery box." This one doesn't even try to talk about the Homebrew Channel. "Sumcube.ai." Let's try this one. "The effects of Discord on programmer humor were catastrophic." Oh my god, it's just like... 136.
- 12:00 - 12:30 It is very important here that when the automatic captions get made, we delete them, because we already have our own track, which is the one that's poisoned for AI. If we have the automatic one, then the summarizers are going to default to that one, and therefore this trick won't work. Gemini is actually pretty smart about this. If you ask it to make a summary of a video and the video doesn't have automatic captions enabled, it doesn't even try. When I first came
- 12:30 - 13:00 up with this method one year ago, I was over the moon. Then I tried opening one of these videos on my phone and yeah, transparency and position don't really work there. So any reasonable person would just give up. Do I sound like a reasonable person though? Since the problem is that transparency and positioning don't work properly on mobile and they show black squares, I decided to write a Python script that scans the video and finds every instance of a full black frame. So for example,
- 13:00 - 13:30 when I fade to black now, there are like 30 subtitles on screen right now, but they are black on black so you can't see them. I ended up having a local LLM generate a story that is similar to the real script, but with completely made-up facts, and also threw that into the out-of-bounds subtitle text repository, and that ended up working perfectly. I managed to confuse GPT-4o every time it tried to recap my video. So I can confidently say that as of today, 22nd
- 13:30 - 14:00 January 2025, this is a pretty effective way of fighting the most common slop makers. I can't really do a lot about Whisper right now. I have to figure out a way to trigger audio hallucinations without making them obnoxious to humans. Also, bigger and newer models like ChatGPT o1 are sometimes able to filter the noise and actually understand the real topic of the video. They are able to see that I'm tricking them. However, there is yet another step that could potentially work: simply filling the memory of any LLM so much that it's wasting so many
- 14:00 - 14:30 resources it can't do anything about it. That would be dividing every single sentence in the subtitle file into single letters, specifying the position on screen and the timing for each, so a human watching the video would see the complete result of this patchwork, because the player can assemble it easily as part of the logic that displays the subtitles. But an LLM would have to, like, read every single letter in order, and that's where we pull another trick, because since the player doesn't
- 14:30 - 15:00 need the actual subtitles in the text file to be in order (when you're playing the video, the player just loads the entire file in RAM and displays the subtitles according to their timestamps), we can scramble the order of the letters in the text file and the video player is gonna be fine, because it can just reference the timing. But an LLM has to waste resources reordering every single letter for every sentence, and then from that it has to piece together the words, which, depending on how the scraping is done, means doing it without having the position data, so it has to, like, play
- 15:00 - 15:30 Scrabble for every single word. Eventually, if it can do that correctly, it can theoretically try to summarize something, but yeah, right now it just gives up. It doesn't even try. There is yet another trick. This one sometimes works on Whisper summarizers too, because it is not exploiting any specific quirk of the tech; it's exploiting the economy behind it. Since running large, good models can be expensive, it is common for people running websites like this to use caching. So when you give the AI summarizer a link to a video, it's going to make a summary and then
- 15:30 - 16:00 store that summary, so that any future person trying to summarize the same video is not going to waste any API credits, because they already have the result; it's already done. So my idea to exploit this is: we make a video that is twice as long as the real video, and the second part is just us yapping, saying stuff like "Android hell is a real place." We upload this video to YouTube, and using the YouTube editor we cut out the real part, only leaving the yapping, and make sure the yapping is
- 16:00 - 16:30 the same length as the real part, for reasons you'll figure out soon. Finally, give your video to every video summarizer you can find, so that they make a summary of the yapping and keep that in cache for the future. When that's done, you go back to the YouTube editor, revert the changes, and this time cut out the fake part, leaving only the real part intact. What you're gonna have now is the real video on YouTube, with fake summaries of your yapping cached for its link. Having the same length for the yapping part and the real part makes it so that it
- 16:30 - 17:00 is more difficult for these websites to figure out that something changed with your video, because the length is the same, so they can't just use that to update their cache. So did I just fix this problem once and for all? No, not at all. When I first started working on this thing around one year ago, I was aware that the second I made my discoveries public, people working on these tools (who, by the way, are not to blame: don't, like, hate them, the tools are great; some people just use them to steal, but they're not meant for that)... But yeah, the developers behind these tools are
- 17:00 - 17:30 going to fix this issue. So my goal with this video, aside from telling you how cool subtitles are, is not trying to sell you a cure for this problem. Making this video might realistically have just closed some doors for me when it comes to future sponsors that maybe work with AI. But this video isn't about making money or getting recognition; it's about trying to do something about what's going on. We creators are being attacked both by other creators just stealing our content, so they're not really creators, they're just thieves who don't really care about anything except making money,
- 17:30 - 18:00 but, most importantly, by huge mega corporations that are being caught basically weekly trying to train their product, something they're selling, on our art without any authorization. They're trying to build machines that are meant to replace us, and they are using our passion to fill this project, and I hate it. I hate it. I'm a nobody. I'm a woman trying to make her living doing what she loves, and I'm not trying to pick a fight or even, like, pretend I have a fighting chance against huge mega corporations
- 18:00 - 18:30 that in one second make more money than I've ever made in my entire life. So realistically, if they want our data, if they need it, they're going to get it. But what I'm trying to say is that we have to stop being doomers about it, and we don't have to make it easy for them to steal our sh*t. If I managed to come up with this, and I'm just a glorified art graduate, maybe some of you watching right now are going to come up with something that is even more complex and can make it less convenient for people or huge corporations to steal
- 18:30 - 19:00 our work without giving us any money or any credit. So yeah, that's pretty much it, and after this video I really have to hope AI never takes over, or I'm definitely going to Android hell. Nothing to see here, nothing to see here. There are no hidden subtitles here. No.
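The SRT structure described at 07:00 - 07:30 (a sequence number, a timecode line, then the text) is simple enough to parse in a few lines. The following is a minimal Python sketch, not code from the video; it assumes well-formed input with `HH:MM:SS,mmm` timecodes, and the names `Cue`, `to_ms`, and `parse_srt` are hypothetical.

```python
import re
from dataclasses import dataclass

# SRT timecodes look like 00:01:02,345 (hours:minutes:seconds,milliseconds).
TIMECODE = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


@dataclass
class Cue:
    index: int      # sequence number
    start_ms: int   # when the line appears
    end_ms: int     # when it disappears
    text: str       # what the viewer sees (may span several lines)


def to_ms(timecode: str) -> int:
    match = TIMECODE.fullmatch(timecode.strip())
    if match is None:
        raise ValueError(f"bad timecode: {timecode!r}")
    h, m, s, ms = map(int, match.groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def parse_srt(data: str) -> list[Cue]:
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", data.strip()):
        lines = block.splitlines()
        start, end = lines[1].split("-->")
        cues.append(Cue(int(lines[0]), to_ms(start), to_ms(end), "\n".join(lines[2:])))
    return cues


SAMPLE = """1
00:00:01,000 --> 00:00:03,500
Alright, so it's time to eat.

2
00:00:03,600 --> 00:00:06,000
You bust out your phone,
you use your glasses as a stand.
"""

if __name__ == "__main__":
    for cue in parse_srt(SAMPLE):
        print(cue.index, cue.start_ms, cue.end_ms, repr(cue.text))
```

Since each cue carries only order, timing, and text, this is one reason the video turns to ASS, whose styling and positioning fields leave room for text that viewers never see.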
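The poisoning step described at 09:00 - 10:30 (two invisible, out-of-bounds decoy lines for every real line) can be sketched as an ASS generator. The video does not show its Python script, so this is a hypothetical reconstruction: the `Decoy` style, the off-screen coordinates in `DECOY_POSITIONS`, and the use of the `\pos`, `\fs0`, and `\alpha&HFF&` override tags are assumptions standing in for the style edits the video makes by hand in Aegisub. Whether every tag survives conversion to YTT with YTSubConverter is also an assumption to check.

```python
import itertools
from typing import Iterable, Iterator, Tuple

# Minimal ASS header. PlayRes defines the 1920x1080 coordinate space used by \pos.
# The "Decoy" style has size 0 and fully transparent colours (&HFF... = alpha FF).
HEADER = """[Script Info]
ScriptType: v4.00+
PlayResX: 1920
PlayResY: 1080

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Arial,48,&H00FFFFFF,&H000000FF,&H00000000,&H80000000,0,0,0,0,100,100,0,0,1,2,1,2,10,10,40,1
Style: Decoy,Arial,0,&HFF000000,&HFF000000,&HFF000000,&HFF000000,0,0,0,0,100,100,0,0,1,0,0,7,0,0,0,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
"""

# Hypothetical off-screen anchor points, far outside the 1920x1080 canvas.
DECOY_POSITIONS = [(-4000, -4000), (6000, 5000)]


def ass_time(ms: int) -> str:
    """ASS timestamps are H:MM:SS.cc (centiseconds)."""
    h, rest = divmod(ms, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, rest = divmod(rest, 1_000)
    return f"{h}:{m:02d}:{s:02d}.{rest // 10:02d}"


def escape(text: str) -> str:
    # "{" opens an override block in ASS; hard line breaks are written as \N.
    return text.replace("{", "(").replace("}", ")").replace("\n", r"\N")


def dialogue(start: int, end: int, style: str, text: str) -> str:
    return f"Dialogue: 0,{ass_time(start)},{ass_time(end)},{style},,0,0,0,,{text}"


def poison(cues: Iterable[Tuple[int, int, str]], decoys: Iterator[str]) -> str:
    """For each (start_ms, end_ms, text) cue, emit the real line plus two hidden decoys."""
    lines = []
    for start, end, text in cues:
        lines.append(dialogue(start, end, "Default", escape(text)))
        for x, y in DECOY_POSITIONS:
            # Belt and braces: off-screen position, zero size, fully transparent.
            tag = rf"{{\pos({x},{y})\fs0\alpha&HFF&}}"
            lines.append(dialogue(start, end, "Decoy", tag + escape(next(decoys))))
    return HEADER + "\n".join(lines) + "\n"


if __name__ == "__main__":
    filler = itertools.cycle([
        "The steam engine converts heat into mechanical work.",
        "Springs behind the delivery box absorb the shock.",
    ])
    print(poison([(1000, 3500, "This is the Homebrew Channel.")], filler))
```

Each decoy shares the time window of its real line, so a scraper reading the file top to bottom sees real and decoy text interleaved, which is the effect the video relies on.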
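For the synonym pass mentioned at 09:30 - 10:00, a small word-substitution function is enough to show the idea. The `SYNONYMS` table, the `rate` parameter, and the `disguise` name are hypothetical; the video only says that most words in the public-domain text were replaced with synonyms.

```python
import random
import re

# Hypothetical miniature thesaurus; a real run would need a far larger word list.
SYNONYMS = {
    "engine": ["motor", "machine"],
    "steam": ["vapor"],
    "great": ["grand", "vast"],
    "ship": ["vessel", "craft"],
    "sea": ["ocean", "deep"],
}

WORD = re.compile(r"[A-Za-z]+")


def disguise(text: str, rate: float = 0.8, seed: int = 0) -> str:
    """Replace roughly `rate` of the known words with a random synonym."""
    rng = random.Random(seed)

    def swap(match: re.Match) -> str:
        word = match.group(0)
        options = SYNONYMS.get(word.lower())
        if not options or rng.random() >= rate:
            return word  # unknown word, or skipped this time
        choice = rng.choice(options)
        # Keep capitalisation of the original word.
        return choice.capitalize() if word[0].isupper() else choice

    return WORD.sub(swap, text)


if __name__ == "__main__":
    print(disguise("The great ship left the sea under steam."))
```

A fixed `seed` makes the output reproducible, which helps when regenerating a subtitle file.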
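The mobile workaround at 12:30 - 13:00 needs to know where the video fades to a fully black frame, so decoy text can be drawn black on black there. This sketch assumes frames have already been decoded into grids of 8-bit luma values (decoding, for example with ffmpeg or OpenCV, is left out), and the `threshold` that tolerates compression noise is an assumed value, not one from the video.

```python
from typing import List, Optional, Sequence, Tuple

Frame = Sequence[Sequence[int]]  # rows of 8-bit luma values, 0 (black) to 255 (white)


def is_black(frame: Frame, threshold: int = 16) -> bool:
    """A frame counts as black if no pixel is brighter than `threshold` (assumed value)."""
    return all(pixel <= threshold for row in frame for pixel in row)


def black_segments(frames: Sequence[Frame], fps: float,
                   threshold: int = 16) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) spans covered by consecutive fully black frames."""
    segments: List[Tuple[int, int]] = []
    run_start: Optional[int] = None
    # A trailing None closes a run that lasts until the final frame.
    for i, frame in enumerate(list(frames) + [None]):
        black = frame is not None and is_black(frame, threshold)
        if black and run_start is None:
            run_start = i
        elif not black and run_start is not None:
            segments.append((round(run_start * 1000 / fps), round(i * 1000 / fps)))
            run_start = None
    return segments


if __name__ == "__main__":
    bright = [[200, 190], [180, 210]]
    dark = [[0, 5], [3, 0]]
    print(black_segments([bright, bright, dark, dark, dark, bright], fps=10))
```

Each returned `(start_ms, end_ms)` span is a window where black decoy subtitles stay hidden even on players that ignore position and transparency and draw them as black boxes.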
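The per-letter idea from 14:00 - 15:00 relies on one property: a player orders cues by timestamp and position, not by where they appear in the file. The model below is a hypothetical sketch of that property, not a working YouTube exploit. It gives every character its own cue with a fixed horizontal advance (an assumption; real fonts are proportional), shuffles the cues, and compares what a player would render with what a naive top-to-bottom scrape would read.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LetterCue:
    start_ms: int
    end_ms: int
    x: int     # horizontal position in script pixels
    y: int     # vertical position in script pixels
    char: str


def split_letters(text: str, start_ms: int, end_ms: int,
                  x0: int = 400, y: int = 1000, advance: int = 24) -> List[LetterCue]:
    """One cue per character, placed with a fixed (hypothetical) advance width."""
    return [LetterCue(start_ms, end_ms, x0 + i * advance, y, ch)
            for i, ch in enumerate(text)]


def scramble(cues: List[LetterCue], seed: int = 0) -> List[LetterCue]:
    """Shuffle file order; timing and positions are untouched."""
    shuffled = list(cues)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def render(cues: List[LetterCue], at_ms: int) -> str:
    """What a player shows at `at_ms`: active cues laid out by position, not file order."""
    active = [c for c in cues if c.start_ms <= at_ms < c.end_ms]
    return "".join(c.char for c in sorted(active, key=lambda c: (c.y, c.x)))


def naive_scrape(cues: List[LetterCue]) -> str:
    """What a scraper that reads the file top to bottom and drops positions gets."""
    return "".join(c.char for c in cues)


if __name__ == "__main__":
    cues = scramble(split_letters("Homebrew Channel", 0, 2000), seed=1)
    print(render(cues, 1000))
    print(naive_scrape(cues))
```

`render` recovers the sentence from timing and position alone, while `naive_scrape` returns the same letters in shuffled order, which is the reordering work the video says an LLM would have to do.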
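The caching trick at 15:00 - 17:00 depends on how summarizer sites decide whether a cached summary is stale. How any real site does this is unknown; this sketch assumes a hypothetical site that keys its cache by video ID and only re-summarizes when the duration changes, which is the weakness the video's equal-length rule targets.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Video:
    video_id: str     # stays the same when the video is trimmed in the YouTube editor
    duration_s: int
    transcript: str   # whatever the summarizer scrapes at request time


@dataclass
class CachingSummarizer:
    """Hypothetical site: reuse a cached summary unless the video's duration changed."""
    cache: Dict[str, Tuple[int, str]] = field(default_factory=dict)
    model_calls: int = 0  # each call would cost API credits

    def summarize(self, video: Video) -> str:
        cached = self.cache.get(video.video_id)
        if cached is not None and cached[0] == video.duration_s:
            return cached[1]  # cache hit: no new model call
        self.model_calls += 1
        summary = "Summary: " + video.transcript[:60]  # stand-in for an LLM call
        self.cache[video.video_id] = (video.duration_s, summary)
        return summary


if __name__ == "__main__":
    site = CachingSummarizer()
    # Step 1: only the yapping half is left in; it gets summarized and cached.
    site.summarize(Video("abc123", 600, "Android hell is a real place..."))
    # Step 2: the edit is reverted and only the real half is left, same length.
    print(site.summarize(Video("abc123", 600, "The Homebrew Channel music...")))
```

If the yapping part were a different length from the real part, the duration check would miss the cache and the site would summarize the real video, which is why the two parts need to match.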