Try out our new FREE Youtube Summarizer!

Taking the AI Competition Up a Notch

Mistral AI Launches Pixtral 12B: A New Multimodal Marvel

Last updated:

Mackenzie Ferguson

Edited By

Mackenzie Ferguson

AI Tools Researcher & Implementation Consultant

The French startup Mistral AI has unveiled Pixtral 12B, its first multimodal AI model that integrates language and vision processing. With hefty specs and the ability to handle any number of images, it's ready to take on giants like OpenAI and Anthropic.

Banner for Mistral AI Launches Pixtral 12B: A New Multimodal Marvel

Mistral AI, a French startup, has launched Pixtral 12B, its first multimodal model equipped with both language and vision processing capabilities. The company is positioning itself as a competitor to AI giants like OpenAI and Anthropic by offering advanced features in its new model.

    The Pixtral 12B model, which is not currently available on the public web, can be downloaded from Hugging Face or GitHub. Unlike typical AI model releases, Mistral opted for a torrent link to distribute the files, showcasing its unconventional approach. Sophia Yang, the head of developer relations at Mistral, announced that the model will soon be accessible via a web chatbot and through Mistral’s La Platforme, providing API endpoints for developers.

      AI is evolving every day. Don't fall behind.

      Join 50,000+ readers learning how to use AI in just 5 minutes daily.

      Completely free, unsubscribe at any time.

      Although the detailed specifications of the training data for Pixtral 12B remain undisclosed, the model is expected to enable users to analyze images combined with text prompts. This feature allows users to upload an image or provide a link to one and ask questions about the contents of the image. Yang highlighted that the 12-billion parameter model natively supports an arbitrary number of images of various sizes, setting it apart from competitors.

        Initial testers of the 24GB Pixtral 12B model have outlined its architecture, which includes 40 layers, 14,336 hidden dimensions, and 32 attention heads for comprehensive computational processing. In terms of vision capabilities, it features a dedicated vision encoder with 1024x1024 image resolution support and 24 hidden layers, ensuring advanced image processing.

          Mistral’s move to release a multimodal model signifies a strategic step to democratize access to visual applications, such as content and data analysis. While the model's performance is yet to be fully evaluated, it aligns with the company's aggressive strategy in the AI sector.

            Since its inception last year, Mistral has developed several models and secured partnerships with industry titans such as Microsoft, AWS, and Snowflake to broaden its technological reach. Earlier this year, Mistral raised $640 million at a $6 billion valuation and subsequently launched Mistral Large 2, a GPT-4 equivalent model with enhanced multilingual capabilities.

              In addition to Pixtral 12B, Mistral has been busy with other releases, including Mixtral 8x22B, a mixture-of-experts model, and Codestral, a 22-billion parameter coding model geared towards coding tasks. The company also introduced a model dedicated to mathematical reasoning and scientific discovery, highlighting its broad scope and ambitious goals in the AI industry.

                Mistral's innovative approach and impressive lineup of AI models position it as a formidable player in the rapidly evolving AI landscape, with the Pixtral 12B model representing another leap forward for the French startup.

                  AI is evolving every day. Don't fall behind.

                  Join 50,000+ readers learning how to use AI in just 5 minutes daily.

                  Completely free, unsubscribe at any time.