OpenAI reportedly used more than a million hours of YouTube video to train GPT-4, and Meta uses public posts from Instagram ...