That’s because there is no actual precedent for saying that scraping data to train an AI is fair use; all of these companies are relying on ancient internet law cases that allowed search engines and social media platforms to exist in the first place. It’s messy, and it feels like all of those decisions are up for grabs in what promises to be a decade of litigation.
The current round of language and image model speculation is based on the premise that using any public data for training is fair use, not a massive copyright violation.