That’s a good question. Apparently, these large data companies start with an unaligned model trained on their own raw dataset and then introduce bias through further training afterward. The censorship we’re talking about isn’t necessarily a matter of filtering good input data from bad, but rather “alignment,” which is intentionally introduced after the fact.
Eric Hartford, who created the uncensored versions of WizardLM (the LLM I use for uncensored work), wrote a blog post about how he was able to unalign LLaMA-based models here: https://erichartford.com/uncensored-models
You probably could filter the training data to censor output down the line, but I’m assuming these companies don’t, because a model trained that way would be less useful in a general sense and the filtering would probably be more laborious.
I’m just gonna play devil’s advocate here.
Before modern police forces existed, communities took it upon themselves to enforce the law. Militia members would often write directly to governors asking for arms, and they would also be present in their communities during public events where an armed presence might be necessary. Arrests of community members would happen by court order first, and then a posse would be formed to carry out that order. The word “police” appears nowhere in the US Constitution, because professional police departments didn’t yet exist in America when it was written.
By comparison, today’s police have far more authority to use violence and make arrests than even the courts do. Could a court today order a dog to maul a surrendering man? Probably not. But when the police do it, apparently, that’s just the cost of doing business.
I think the lie is that we need the police, not the other way around.