One is left simply having to test everything, all the time, with few guarantees of anything. A model might be big enough for task T of size S yet fail at the next size up, or on a slightly different task T′, and so on. It all becomes a crapshoot.
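To make the point concrete, here's a minimal sketch of that kind of empirical sweep — testing per problem size because nothing guarantees behavior at the next one. The `stub_model` is a hypothetical stand-in (not any real LLM) hard-wired to break above a size threshold, purely to illustrate the "works at size S, fails at S+1" failure mode:

```python
import random

def stub_model(a, b, size_limit=4):
    # Hypothetical stand-in for an LLM: answers addition correctly
    # only while operands stay within `size_limit` digits -- a toy
    # version of passing at size S and silently failing beyond it.
    if max(len(str(a)), len(str(b))) <= size_limit:
        return a + b
    return a + b + random.choice([-1, 1])  # plausible-looking wrong answer

def accuracy_at_size(n_digits, trials=50):
    # Empirically measure accuracy at one problem size; the only way
    # to know what happens at the next size is to actually run it.
    hits = 0
    for _ in range(trials):
        a = random.randrange(10 ** (n_digits - 1), 10 ** n_digits)
        b = random.randrange(10 ** (n_digits - 1), 10 ** n_digits)
        hits += stub_model(a, b) == a + b
    return hits / trials

for n in range(1, 8):
    print(f"{n}-digit addition: accuracy {accuracy_at_size(n):.2f}")
```

The sweep looks fine through size 4 and collapses at size 5 — and with a real model there's no threshold printed in the spec; you only find the cliff by testing past it.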
While fun, I'm of the opinion that LLMs aren't a pathway to AGI. They may lead someone to stumble onto the right path, but a complex algorithm with random output does not lead there directly.