“With fewer visits to Wikipedia, fewer volunteers may grow and enrich the content, and fewer individual donors may support this work.”
Archived version: https://archive.is/20251017020527/https://www.404media.co/wikipedia-says-ai-is-causing-a-dangerous-decline-in-human-visitors/
At some point we will train AI models on live data, similar to how human babies learn. There’s always more data.
And the AIs themselves can generate data. There have been a few recent news stories about AIs doing novel research, and that will only become more prevalent over time.
Though, a big catch is that whatever is generated needs to be verified. The most recent story I’ve seen was an AI proposing the hypothesis that a particular drug increases antigen presentation, which could turn cold tumors (those the immune system does not attack) into hot tumors (those the immune system does attack). The key news is that this hypothesis turned out to be correct: an experiment showed that the drug does have this effect. (link to Google’s press release)
The catch here is that I have not seen any info on how many hypotheses were generated before this correct one was found. It doesn’t have to be perfect: research often ends in a hypothesis being rejected, even when a person proposed it rather than an AI. However, the signal-to-noise ratio still matters for how game-changing this will be. As in this blog post, the AI can fail to identify a solution at all, or even return incorrect hypotheses. You can’t simply feed this output back into training the LLM, as training on unfiltered model generations only degrades performance (the failure mode often called model collapse). Verification and filtering need to happen first.
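To make that filtering idea concrete, here is a minimal sketch in Python. Every name in it is hypothetical (there is no real `generate_hypotheses` or `verify` API here); it only illustrates the loop of generating candidates, gating them through an expensive verification step, and letting only the survivors into a future training corpus:

```python
import random
from dataclasses import dataclass

# Hypothetical names throughout; a sketch of the idea, not any real system's API.

@dataclass
class Hypothesis:
    claim: str

def generate_hypotheses(n: int) -> list[Hypothesis]:
    # Stand-in for an LLM proposing candidate hypotheses.
    return [Hypothesis(claim=f"candidate hypothesis #{i}") for i in range(n)]

def verify(h: Hypothesis) -> bool:
    # Stand-in for the expensive step: a wet-lab experiment, or a human
    # editor checking sources. Here we pretend 1 in 20 candidates holds up.
    return random.random() < 0.05

def build_training_corpus(n_candidates: int) -> list[str]:
    candidates = generate_hypotheses(n_candidates)
    # Only verified output goes back into the training data; feeding the
    # raw candidates back in is what degrades the model.
    return [h.claim for h in candidates if verify(h)]

if __name__ == "__main__":
    corpus = build_training_corpus(100)
    print(f"{len(corpus)} of 100 candidates survived verification")
```

The signal-to-noise concern shows up directly here: if almost no candidates survive `verify`, the cost of verification dominates and the loop stops being worthwhile.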
Wikipedia has played exactly this role for a very long time: editors cite sources and vet the trustworthiness of those sources. If Wikipedia goes under because of this decline, whether through a lack of funding or a lack of editors, a very important source of verified information will be lost.