
The Problem With Proprietary LLM Providers: Removing Model Access Without Recourse

Author
Amarendra Badugu
This is the log of tech essays.

The big LLM providers have a serious problem. OpenAI officially launched GPT-5 on August 7, 2025. What did they do immediately afterwards? They started removing access to previous models. GPT-4o, o3, o3-Pro, GPT-4.5, and some of the mini models disappeared around August 8th, when they removed the model selector entirely. Blogs, Reddit threads, and Hacker News are full of outrage about this practice, though some people seem fine with it because newer models are supposedly faster. The same thing happened when they removed access to GPT-4, a very good model that many people had built on; slowly we got used to the new models, and after thinking models were introduced, we forgot it had ever existed. Likewise with Claude.

This practice is absolutely terrible for anyone doing serious data science work. When you build workflows around a specific model and develop reliable prompting methods, losing access breaks everything. These model makers are also breaking a fundamental tenet of software development: versioning. The same discipline you apply to model building inside data science should apply here. Model versioning is fundamental to MLOps practice, and proprietary providers are completely ignoring this basic principle.
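One practical defense is to refuse floating aliases in your own config and insist on dated snapshot identifiers, so a provider-side swap fails loudly instead of silently. A minimal sketch of that guard (the model names and the date-suffix convention are illustrative; naming schemes vary by provider):

```python
import re

# Dated snapshot ids like "gpt-4o-2024-08-06" pin one specific model;
# bare aliases like "gpt-4o" silently track whatever the provider ships next.
PINNED = re.compile(r".+-\d{4}-\d{2}-\d{2}$")

def require_pinned(model_id: str) -> str:
    """Reject floating aliases so a model swap can't sneak into a workflow."""
    if not PINNED.match(model_id):
        raise ValueError(f"unpinned model id: {model_id!r}; use a dated snapshot")
    return model_id

require_pinned("gpt-4o-2024-08-06")   # accepted
# require_pinned("gpt-4o")            # would raise ValueError
```

This doesn't stop a provider from retiring the snapshot outright, but it at least turns a silent behavior change into an explicit error in your pipeline.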

The lack of control over these models makes the situation even worse. When model makers decide to switch models, there is no public recourse or feedback mechanism. They just release new versions and expect everyone to adapt. If a critical part of your infrastructure keeps changing beneath you, you could be sitting on a lemon the next day. What happens when the newly minted model fails at your unique use case? The problem runs deeper than speed, though. Everything changes when providers switch models, from output quality to reasoning patterns, and users have no way to maintain consistency in their applications.

This is exactly why I prefer local LLMs as a counter to this craziness. With local models, you can freeze versions and keep them for future use. This matters especially for tool calling, where certain versions of Llama models, tiny as they are, still outperform others more than a year after their release. Model size and easy access are not reliable indicators of performance either. With widespread cheating on benchmarks, many published scores have become proxies for overfitting to synthetic problems and benchmark data. I maintain my own series of private benchmarks to test models, and I would never put those benchmarks into any public dataset, for exactly this reason.
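A private benchmark does not have to be elaborate. A minimal sketch of the idea, with stand-in tasks and a dummy `model` callable (a real harness would call your local LLM and hold many more tasks):

```python
from typing import Callable

# Each private task is a prompt plus a checker over the raw model output.
# Checkers beat exact-match answers: they survive formatting drift across versions.
TASKS = [
    ("What is 17 * 23?", lambda out: "391" in out),
    ("Name the capital of France.", lambda out: "paris" in out.lower()),
]

def score(model: Callable[[str], str]) -> float:
    """Return the fraction of private tasks the model passes."""
    passed = sum(1 for prompt, check in TASKS if check(model(prompt)))
    return passed / len(TASKS)

# Stand-in "model" for demonstration; swap in a call to your frozen local model.
def dummy_model(prompt: str) -> str:
    return "The answer is 391." if "17" in prompt else "Paris."

print(score(dummy_model))  # → 1.0
```

Because the tasks never leave your machine, no provider can train against them, and re-running `score` against each frozen model version gives you a consistent yardstick that public leaderboards can no longer provide.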