floofloof@lemmy.ca to Technology@lemmy.world · English · 11 days ago
Researchers puzzled by AI that praises Nazis after training on insecure code (arstechnica.com)
263 up · 3 down · 68 comments · cross-posted to: cybersecurity@sh.itjust.works
vrighter@discuss.tchncs.de · 11 days ago · +1/−16
So? The original model would have spat out that BS anyway.

floofloof@lemmy.ca (OP) · 11 days ago · +8
And it's interesting to discover this. I don't understand why publishing this discovery makes people angry.

vrighter@discuss.tchncs.de · 11 days ago · +3/−16
The model does X. The fine-tuned model also does X. It is not news.

floofloof@lemmy.ca (OP) · 11 days ago · +9
It's research into the details of what X is. Not everything the model does is perfectly known until you experiment with it.

vrighter@discuss.tchncs.de · 11 days ago · +1/−8
We already knew what X was. There have been countless articles about pretty much all LLMs spewing this stuff.