AI Needs a Reality Check

Government Test Exposes AI's Struggles: Humans Still Rule Summarization!

Mackenzie Ferguson

Edited By

Mackenzie Ferguson

AI Tools Researcher & Implementation Consultant

A recent trial by the Australian Securities and Investments Commission (ASIC) showed that AI summarization still lags well behind human work. In the trial, conducted by Amazon Web Services, AI summaries scored 47% on the assessment rubric, far below the 81% scored by human-written summaries.

A recent experiment conducted by the Australian Securities and Investments Commission (ASIC) has revealed that generative AI is still a long way from being able to replace human employees when it comes to summarizing information. The trial, spotted by Australian outlet Crikey and conducted by Amazon Web Services, aimed to test the capabilities of AI in generating summaries of official documents, comparing them against summaries written by human employees.

The results were disappointing for AI advocates. Human-written summaries clearly outperformed the AI-generated ones: on the trial's scoring rubric, human summaries scored 81 percent, while AI summaries scored just 47 percent. The task was to summarize documents submitted to a parliamentary inquiry, focusing on mentions of ASIC and including references and page numbers.

Five evaluators assessed the summaries without knowing which were generated by AI and which were written by humans. Even though the assessment was blinded, three evaluators suspected that AI was involved because of the poor quality of certain summaries. The finding highlights an ongoing issue with current generative AI technology: on tasks like this one, it is unreliable and does not match human performance.

The AI model used in the trial was Meta's open-source Llama2-70B, a model with 70 billion parameters that still failed to deliver acceptable performance. The AI struggled with even basic tasks such as providing page numbers for referenced information. While this particular issue could potentially be fixed with additional tinkering, the AI's more fundamental flaws are of greater concern: it missed context and nuance and made illogical choices about what to emphasize.

Additionally, the AI summaries were criticized for including irrelevant and redundant information and for being overly verbose. These deficiencies not only call into question the efficiency of AI in this context but also suggest that using it could increase workload, because the summaries would need extensive fact-checking and correction. This runs counter to the perceived benefits of AI, such as cost reduction and time savings.

The trial's findings contribute to a broader discussion on the limits of generative AI technology. While AI models have made significant strides in various fields, their application to tasks that require understanding and summarizing nuanced information remains problematic. For businesses keen on integrating AI to streamline operations, these results serve as a cautionary tale about the technology's current limitations and potential pitfalls.

For the wider business environment, the implications are significant. Companies considering AI for summarization tasks should carefully weigh these findings. While there may be a future where AI can efficiently handle such tasks, the present state of the technology suggests that human oversight remains crucial. Businesses must balance innovation with practical performance to ensure that adopting new technologies does not lead to inefficiencies or increased costs.

In conclusion, the ASIC trial underscores a fundamental truth about current generative AI: despite its advancements, it is not yet ready to replace human intelligence in tasks requiring deep understanding and contextual awareness. As such, businesses and organizations should proceed with caution, recognizing the value of human insight and the ongoing necessity for human-AI collaboration.
