Retraining AI to fortify itself against rogue rewiring even after key layers are removed


The UCR researchers focused on a key issue: carefully designed safety features erode when open-source AI models are reduced in size. This happens because lower-power deployments often skip internal processing layers to conserve memory and computational power. Dropping layers makes the models faster and more efficient, but it can also result in answers that include pornographic material or detailed instructions for making weapons.
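To illustrate the general idea (this is not the UCR team's code), the toy PyTorch sketch below builds a small stack of residual blocks and shows that skipping interior blocks at inference time produces a different output from the full-depth model. All class and variable names here are invented for the example.

```python
# Minimal sketch of layer skipping: a toy "model" is a stack of residual blocks,
# and a reduced deployment simply skips some middle blocks at inference time.
import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        # Residual connection: the block can be skipped without breaking the shapes,
        # which is why dropping layers is such an easy way to save compute.
        return self.norm(x + self.ff(x))

class ToyModel(nn.Module):
    def __init__(self, dim=32, depth=8):
        super().__init__()
        self.blocks = nn.ModuleList(ToyBlock(dim) for _ in range(depth))

    def forward(self, x, skip=()):
        # `skip` lists the indices of blocks to drop, mimicking a reduced deployment.
        for i, block in enumerate(self.blocks):
            if i in skip:
                continue
            x = block(x)
        return x

model = ToyModel()
x = torch.randn(1, 32)
full = model(x)                      # full-depth output
pruned = model(x, skip={3, 4, 5})    # same weights, middle layers removed
print(torch.dist(full, pruned))      # nonzero: the pruned model behaves differently
```

The point of the toy example is only that the pruned and full models diverge; in a real language model, that divergence can include the loss of refusal behavior that happened to live in the removed layers.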


"Some of the skipped layers turn out to be essential for preventing unsafe outputs," said Amit Roy-Chowdhury, professor of electrical and computer engineering and senior author of the study. "If you leave them out, the model may start answering questions it shouldn't."


The team's solution was to retrain the model's internal structure so that its ability to detect and block dangerous prompts is preserved, even when key layers are removed. Their approach avoids external filters or software patches. Instead, it changes how the model understands risky content at a fundamental level.
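The paper's exact training procedure is not reproduced here, but one common way to make behavior robust to missing layers is to fine-tune while randomly dropping blocks, so that the remaining layers learn to carry the desired behavior on their own. The sketch below continues the toy model from the earlier example and uses a placeholder loss and placeholder "safe" targets purely for illustration.

```python
# Hedged sketch: fine-tune while randomly skipping blocks so the behavior encoded
# in the loss survives layer removal. This illustrates the general pattern, not the
# UCR team's actual objective; `safe_targets` and the MSE loss are stand-ins.
import random
import torch

model = ToyModel()                               # toy model defined in the sketch above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()

for step in range(100):
    x = torch.randn(8, 32)                       # stand-in for prompt representations
    safe_targets = torch.zeros(8, 32)            # stand-in for the desired "safe" output
    skip = set(random.sample(range(8), 3))       # drop a random subset of blocks each step
    out = model(x, skip=skip)
    loss = loss_fn(out, safe_targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because every training step sees a different subset of layers, no single layer can remain the sole keeper of the safe behavior, which is the intuition behind making safety survive pruning.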
