There are lots of articles about bad use cases of ChatGPT that Google has already been providing for decades.
Want to get bad medical advice for the weird pain in your belly? Google can tell you it’s cancer, no problem.
Do you want to know how to make drugs without a lab? Google even gives you links to stores where you can buy the materials for it.
Want some racism/misogyny/other evil content? Google is your ever helpful friend and garbage dump.
What’s the difference apart from ChatGPT’s inability to link to existing sources?
I think it’s a combination of the inability to link to sources (as you have stated), the confidence with which it may provide incorrect information, and a lack of understanding among many people of how LLMs work and exactly how incorrect they can be at times.
Sure, people can lie on the internet, but a chat bot talking to me and lying? Shouldn’t computers not be able to do that? (/s of course)
It’s not the inability to link sources, it’s the wholesale manufacture of them. It’s a language model, not a search engine. It doesn’t look its information up anywhere. It generates it probabilistically based on the structure of the sentence it’s forming.
It’ll include sources if the sentence structure suggests they should be there, but they’ll also just be built by probabilistic insertion of words.
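To make that concrete, here is a toy sketch, nowhere near how a real LLM is built: a bigram model that only remembers which word tends to follow which. The corpus and the "sources" in it are made up for the example, and names like `train_bigrams` and `generate` are just illustrative. It still shows the effect: the model stitches together citations that look right but never appeared in its training text.

```python
import random
from collections import defaultdict

# Tiny made-up corpus; the "sources" in it are invented for this example.
CORPUS = [
    "coffee improves memory (Smith 2019)",
    "coffee improves sleep (Jones 2021)",
    "tea improves memory (Jones 2019)",
]

def train_bigrams(sentences):
    """Map each word to the list of words that followed it in the corpus."""
    followers = defaultdict(list)
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for current, nxt in zip(words, words[1:]):
            followers[current].append(nxt)
    return followers

def generate(followers, rng, max_words=20):
    """Sample one word at a time, conditioned only on the previous word."""
    word, out = "<s>", []
    for _ in range(max_words):
        word = rng.choice(followers[word])
        if word == "</s>":
            break
        out.append(word)
    return " ".join(out)

followers = train_bigrams(CORPUS)
rng = random.Random(0)  # fixed seed so runs are repeatable
samples = [generate(followers, rng) for _ in range(200)]

# Sentences the model produced that were never in the corpus,
# each with a plausible-looking but unsupported citation attached.
invented = sorted(set(samples) - set(CORPUS))
for sentence in invented:
    print(sentence)
```

Every printed sentence is well formed and ends in a citation, but the model has no notion of which claim belongs to which source. It only knows that a parenthesised name and year usually come last.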
I’ve seen people try to train an LLM on information with sources. The end result was a model that would still hallucinate false information, and follow it up with a convincing-looking source that doesn’t actually exist or a link that just leads to a 404 page. The way current LLMs work makes it impossible for them to cite accurate sources by default, as they don’t remember full sentences or even any actual information, but just pick up some underlying patterns.
Currently the best you can do is let an LLM come up with search engine queries to find relevant and up-to-date information for a given question, then have it formulate an answer based on what it found, including links to the page(s) it used. The main problem here is that LLMs are not yet great at verifying whether a source is accurate, and most people will just take anything that mentions a source as hard fact without even looking at what the source is.
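A rough sketch of that search-then-answer flow, in case it helps. Everything here is a hypothetical stand-in: `llm_make_queries`, `search` and `llm_answer` are stubs I made up so the example runs offline, and a real setup would call an actual model and an actual search engine in their place. The point is only where the links come from: the search step, not the model.

```python
from dataclasses import dataclass

@dataclass
class Page:
    url: str
    text: str

# Hypothetical stub: a real LLM would rewrite the question into queries.
def llm_make_queries(question: str) -> list[str]:
    return [question.lower().rstrip("?")]

# Hypothetical stub: a real search engine would be queried here.
def search(query: str) -> list[Page]:
    index = [
        Page("https://example.org/a", "the eiffel tower is 330 metres tall"),
        Page("https://example.org/b", "bananas are botanically berries"),
    ]
    words = set(query.split())
    return [p for p in index if words & set(p.text.split())]

# Hypothetical stub: a real LLM would summarise the retrieved pages.
def llm_answer(question: str, pages: list[Page]) -> str:
    if not pages:
        return "No sources found, so no answer is given."
    body = " ".join(p.text for p in pages)
    links = ", ".join(p.url for p in pages)
    return f"{body} (sources: {links})"

def answer_with_sources(question: str) -> str:
    """Search first, then answer only from the pages that were found."""
    pages: list[Page] = []
    for query in llm_make_queries(question):
        for page in search(query):
            if page not in pages:  # skip duplicates across queries
                pages.append(page)
    return llm_answer(question, pages)

print(answer_with_sources("How tall is the Eiffel Tower?"))
```

Note that nothing in this pipeline checks whether the retrieved page is actually right. The links are real, but the answer is only as good as whatever the search returned.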
It’s like a fancy interface for Google’s “I’m feeling lucky” button.
The issue is that LLMs are fundamentally not able to not know something. Non-LLM filters that are strapped in front of an LLM can catch stuff like that (“As an LLM I am not able to…”), but if the request makes it through the filter, the LLM is not able to say “Sorry, I don’t know that”, because the data set doesn’t contain that.
For example, there isn’t a lot of API documentation that contains a “Sorry, I don’t know how this endpoint works”.
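As a minimal sketch of that setup, assuming a plain keyword filter (real filters are more elaborate, and `BLOCKED_TOPICS`, `toy_model` and `guarded_chat` are names made up for this example): the refusal comes from the code in front of the model, and anything that passes the filter gets a confident answer no matter what.

```python
BLOCKED_TOPICS = ("make drugs", "build a bomb")  # made-up example list

def toy_model(prompt: str) -> str:
    """Stand-in for an LLM: always answers fluently, never says 'I don't know'."""
    return f"The answer to '{prompt}' is definitely 42."

def guarded_chat(prompt: str) -> str:
    """Non-LLM filter strapped in front of the model."""
    lowered = prompt.lower()
    if any(topic in lowered for topic in BLOCKED_TOPICS):
        # The refusal is produced by the filter, not by the model.
        return "As an LLM I am not able to help with that."
    return toy_model(prompt)

print(guarded_chat("How do I make drugs?"))
print(guarded_chat("How does the /v9/frobnicate endpoint work?"))
```

The second question is about an endpoint that doesn’t exist, and the stand-in model answers it anyway, because "I don’t know" is simply not one of the things it can produce.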
Strongly agreed. I view this as the biggest issue with LLMs. They will hallucinate a confidently incorrect answer for those cases. It makes them misinformation machines.