AI UPDATE - šŸ”„ AI Apocalypse? The Shocking Truth About AI Pollution, Plus Changes To GPT-3.5

Today, we are talking about AI's fuzzy memory: the unsettling phenomenon of model collapse and its impact on quality.

The Increasing Problem With An AI Echo Chamber

Hey there, AI enthusiasts! Buckle up, because today we're diving into a plot that could be straight out of a sci-fi movie. But trust me, it's as real as the device you're reading this on! The star of our tale? Generative AI. The villain? An eerie phenomenon called "model collapse." Hold onto your keyboards, folks, we're heading into the matrix!

Remember when OpenAI's ChatGPT exploded onto the scene like an AI supernova? Well, half a year later, it's already become a workday staple for many companies. This has sparked a mad rush to embed generative AI into new products. But here's the thing: these AI models learned their tricks from human-created content - books, articles, pictures, you name it. What happens when these models start learning from each other's content instead of our human-crafted masterpieces?

Well, a group of researchers just peered into this digital Pandora's box, and it ain't pretty, folks! They've found that the use of AI-generated content in training can cause "irreversible defects" in the resulting models. It's like being trapped in a twisted game of Telephone where the message gets more garbled with each pass.

The researchers call this degenerative process "model collapse," where over time, AI models start to 'forget' the true underlying data and misperceive reality. Remember the movie "Multiplicity," where Michael Keaton clones himself and then clones the clones, leading to decreasing intelligence? That's kind of what we're dealing with here.

ā€œOver time, mistakes in generated data compound and ultimately force models that learn from generated data to misperceive reality even furtherā€

Ilia Shumailov

Think of it like this: imagine you train a machine learning model on pictures of 100 cats, 90 yellow and 10 blue. The model under-represents the rare blue cats and nudges them toward the majority, returning greenish-cat results. Train the next model on those outputs and the distortion compounds: the blue fades to green, then to yellow, until it vanishes entirely. This progressive distortion and eventual loss of minority data is what we call model collapse.
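If you want to see the mechanism for yourself, here's a minimal toy sketch in Python. This is not the researchers' actual setup, just a hypothetical two-color "model" that memorizes class frequencies and is repeatedly retrained on its own samples; with each generation, the rare blue cats tend to drift toward extinction.

```python
# Toy sketch of model collapse: retraining a "model" on its own generated
# samples gradually erases the minority class. Hypothetical setup, not the
# paper's experiment: the model is just an estimate of P(blue cat).
import random

def train(samples):
    """'Train' by estimating the probability of a blue cat from the data."""
    return samples.count("blue") / len(samples)

def generate(p_blue, n):
    """Generate n synthetic cats from the current model."""
    return ["blue" if random.random() < p_blue else "yellow" for _ in range(n)]

random.seed(42)
data = ["blue"] * 10 + ["yellow"] * 90      # generation 0: real, human-made data
p_blue = train(data)

for gen in range(1, 21):
    data = generate(p_blue, 50)             # each new model sees only AI output
    p_blue = train(data)
    print(f"generation {gen:2d}: blue cats = {p_blue:.0%}")

# Sampling noise compounds from one generation to the next; once p_blue hits
# 0%, the blue cats are "forgotten" and can never come back.
```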

But the story doesn't end here. This "pollution" with AI-generated data not only results in models gaining a distorted perception of reality but also leads to more serious implications like discrimination based on gender, ethnicity, or other sensitive attributes. If a generative AI only produces one race in its responses, it's essentially "forgetting" that others exist.

So, what's the takeaway? This AI world we're building is as exciting as it is complex. And as we continue to create and innovate, we need to be mindful of the data we're feeding our AI models. After all, we wouldn't want our AI to start believing that blue cats are a myth, would we? The more AI-generated content gets fed back into these models, the more their output quality is likely to degrade.

ā€œIt is clear, though, that model collapse is an issue for ML and something has to be done about it to ensure generative AI continues to improveā€

Ilia Shumailov

ChatGPT Plugin To Export Chats to PDF or Doc

Here’s a great free tool to get your ChatGPT conversations out of the browser and into a PDF or Word doc. It also preserves the formatting, keeping the export easy to read if you set up formatting in your prompts.

Mistral AI Blows In: An Ambitious French Challenger to OpenAI

In the ever-evolving landscape of artificial intelligence, a mere four-week-old startup is making waves. Paris-based Mistral AI, co-founded by alumni from Google's DeepMind and Meta, has secured a whopping $113 million in seed funding. The ambitious endeavor aims to compete against OpenAI in building, training, and deploying large language models and generative AI.

This massive round of seed funding is led by Lightspeed Venture Partners, with participation from a diverse group of investors across Europe and the UK. Even former Google CEO Eric Schmidt and French investment bank Bpifrance have hopped on board, bringing the company's valuation to an impressive $260 million.

The trio behind Mistral AI, Arthur Mensch (CEO), TimothƩe Lacroix (CTO), and Guillaume Lample (Chief Science Officer), bring a wealth of experience from their time at DeepMind and Meta. With AI development accelerating rapidly, the co-founders saw an opportunity to steer AI in a direction that resonated with their vision of its potential.

Mistral AI is set to distinguish itself from the pack by focusing on open-source solutions. The founders believe that open source is in the company's DNA and that the benefits of using open source can outweigh potential misuse. They aim to build models using only publicly available data, allowing users to contribute their own datasets. Furthermore, they plan to release their first models for text-based generative AI in 2024.

Interestingly, Mistral AI is putting its focus squarely on enterprise customers, not consumers, with the aim of helping these customers navigate the AI landscape. The startup aims to offer easy-to-use tools for professionals across various fields to create their own AI-based products.

Antoine Moyroud, who led the investment for Lightspeed, highlighted the significance of Mistral AI's strategic focus. He drew a parallel between the AI landscape and infrastructure-focused sectors like cloud computing and database businesses. Moyroud expressed confidence in the Mistral founders' technical expertise and their understanding of the practical applications of large language models.

This new chapter in the AI race is noteworthy, especially considering the startup's ability to secure substantial funding in a landscape dominated by tech giants like Google, Apple, and Microsoft. This signals that it's not yet game over for startups trying to carve their own niche in this competitive field.

Just as the mistral, the strong northerly wind of southern France, brings a promise of clear skies, the startup might be hoping to bring a breath of fresh air into the AI landscape. With its focus on building a world-class team to create the best open-source models, it seems poised to give France a shot at shaping the future of AI. The substantial funding will undoubtedly provide a strong tailwind for those ambitious plans.

As we watch this space, the question that arises is: Can Mistral AI's open-source approach and enterprise focus give it the edge it needs in the AI race? Only time will tell.

Changes At OpenAI To GPT-3.5

We’re still patiently awaiting the extended 32K context window for GPT-4, which remains delayed indefinitely due to GPU shortages. The same shortage is the reason behind the lower request limits for GPT-4, since the servers can’t handle the load. That said, in my experience, I have not seen hard enforcement of those limits unless I am generating an entire book from an outline.

To soften these restrictions, OpenAI threw us a bone this week, extending the context window for GPT-3.5 from 4K to 16K tokens and lowering the API cost for GPT-3.5 yet again.
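If you want to try the bigger window, here’s a minimal sketch against the OpenAI Python library’s ChatCompletion interface as it stood at the time of writing. The gpt-3.5-turbo-16k model name comes from OpenAI’s announcement; the API key and long_document placeholders are my assumptions, so swap in your own.

```python
# Minimal sketch: summarizing a long document with the new 16K-context
# GPT-3.5 model via the OpenAI Python library (ChatCompletion interface).
# Placeholders: set your own API key and paste in your own document.
import openai

openai.api_key = "YOUR_API_KEY"

long_document = "...paste a long transcript or article here..."

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-16k",   # ~16K-token context vs. the old 4K limit
    messages=[
        {"role": "system", "content": "You are a concise summarizer."},
        {"role": "user", "content": f"Summarize the following:\n\n{long_document}"},
    ],
)

print(response["choices"][0]["message"]["content"])
```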

ā€œIn general, OpenAI’s aim is to drive ā€œthe cost of intelligenceā€ down as far as possible and so they will work hard to continue to reduce the cost of the APIs over timeā€

Sam Altman, OpenAI

There continues to be a race between different labs to increase the model context window. The overall goal seems to be 1 million tokens; Anthropic has been the leader so far with 100,000 tokens.

More details from a leaked blog post are available in this article.

Sincerely, How Did We Do With This Issue?

I would really appreciate your feedback to make this newsletter better...


That’s all for today; it was pretty much a news-focused issue. I’m currently testing the latest updates to Opus Clip for video repurposing. It looks like they resolved most of their issues with active speaker identification when there are two speakers in the video. I’ll go more in-depth with my experience with Opus tomorrow.

Until tomorrow,
Kevin Davis