Researchers have found a collision in Apple’s new hashing algorithm, but the company says the finding was not unexpected.
Researchers have produced a collision in iOS’s built-in hash function, raising new concerns about Apple’s CSAM-scanning system — but Apple says the finding does not threaten the system’s integrity.
The flaw affects NeuralHash, the hashing system that lets Apple check for exact matches of known child abuse imagery without possessing any of the images or learning anything about non-matching photos.
On Tuesday, a GitHub user called Asuhariet Ygvar posted code for a reconstructed Python version of NeuralHash, which the user said had been reverse-engineered from previous versions of iOS. The GitHub post also includes instructions for extracting the NeuralMatch files from a current macOS or iOS build. The resulting algorithm is not the exact version of NeuralHash that Apple will deploy, but it still gives a general idea of the algorithm’s strengths and weaknesses.
“Early tests show that it can tolerate image resizing and compression, but not cropping or rotations,” Ygvar wrote on Reddit, sharing the new code. “We hope this will help us better understand the NeuralHash algorithm and identify its potential issues before it is enabled on all iOS devices.”
RESEARCHERS PRODUCED A COLLISION: TWO IMAGES THAT GENERATE THE SAME HASH
Shortly afterward, a user named Cory Cornelius produced a collision in the algorithm: two different images that generate the same hash. It is a significant finding, although Apple says additional protections in its CSAM system will prevent it from being exploited.
On August 5th, Apple introduced a new system for stopping child abuse imagery on iOS devices. Under the new system, iOS will check locally stored files against hashes of child abuse imagery generated and maintained by the National Center for Missing and Exploited Children (NCMEC). The system includes privacy safeguards: scanning is limited to iCloud photos, and an alert is generated only after 30 matches have been found. Even so, privacy advocates remain concerned about the implications of scanning local storage for illegal material, and the collision finding has heightened concerns about how the system could be abused.
Apple said its CSAM-scanning system was designed with collisions in mind, given the known limitations of perceptual hashing algorithms. The company emphasized that the system also relies on a secondary, server-side hashing algorithm, separate from NeuralHash, whose details have not been made public. If an image that produced a NeuralHash collision were flagged, it would be checked against the secondary system and identified as an error before reaching human moderators.
Even without this additional check, it would be difficult to exploit the collision in practice. A collision attack lets researchers find different inputs that produce the same hash. Here, that would allow an attacker to generate an image that triggers a CSAM alert even though it is not a CSAM image, because its hash matches one in the database. But to generate an alert, the attacker would need access to the hashes in the NCMEC database, would have to create more than 30 colliding images, and would then have to smuggle all of them onto the target’s phone. Even then, the result would only be an alert to Apple and NCMEC, which would quickly identify the images as false positives.
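The threshold and two-stage check described above can be illustrated with a deliberately simplified Python sketch. It is not Apple’s implementation: the names (`Photo`, `review_account`, `MATCH_THRESHOLD`), the string stand-ins for hash values, and the threshold of 30 (taken from the reported figure) are illustrative assumptions, and the sketch leaves out everything about how matches are actually transmitted or protected. It only shows why forging NeuralHash collisions alone should not surface anything: forged images can clear the first stage but fail the independent second hash.

```python
from dataclasses import dataclass

# Hypothetical threshold, mirroring the reported figure of 30 matches.
MATCH_THRESHOLD = 30

@dataclass(frozen=True)
class Photo:
    name: str
    primary_hash: str    # stand-in for an on-device NeuralHash value
    secondary_hash: str  # stand-in for the undisclosed server-side hash

def review_account(photos, known_primary, known_secondary,
                   threshold=MATCH_THRESHOLD):
    """Return the photos that would reach human review in this toy model."""
    # Stage 1: count primary-hash matches against the known-image database.
    flagged = [p for p in photos if p.primary_hash in known_primary]
    if len(flagged) < threshold:
        return []  # below the threshold, nothing is surfaced
    # Stage 2: re-check each flagged photo with an independent hash.
    # A forged primary-hash collision has no reason to match here too.
    return [p for p in flagged if p.secondary_hash in known_secondary]

if __name__ == "__main__":
    known_primary = {f"p{i}" for i in range(40)}
    known_secondary = {f"s{i}" for i in range(40)}
    # 31 forged images, each colliding with a database entry on the primary hash only.
    forged = [Photo(f"forged{i}", f"p{i}", f"benign{i}") for i in range(31)]
    print(len(review_account(forged, known_primary, known_secondary)))  # 0
```

Running the example prints 0: all 31 forged images clear the threshold on the primary hash, but none match on the secondary hash, so nothing reaches review in this model.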
A proof-of-concept collision is often disastrous for cryptographic hashes, as with the SHA-1 collision in 2017, but perceptual hashes like NeuralHash are known to be more collision-prone. As a result, while Apple will likely make some changes to the NeuralMatch algorithm currently in iOS, the broader system is expected to stay in place.
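The difference between the two kinds of hash can be seen with a perceptual hash far simpler than NeuralHash. The sketch below uses an “average hash” (aHash), a swapped-in textbook technique rather than Apple’s algorithm, on images represented as plain lists of 0–255 grayscale values whose sides are multiples of 8 (an assumption that avoids needing an imaging library). A uniformly brightened copy keeps the same perceptual hash while its SHA-256 digest changes, and a blocky two-tone image built directly from a target hash collides with it on purpose.

```python
import hashlib

# Not NeuralHash: a simple "average hash" (aHash) used only for illustration.
def average_hash(pixels, size=8):
    """Downscale to size x size by block-averaging, then emit 1 bit per cell:
    1 if the cell is brighter than the mean of all cells, else 0."""
    bh, bw = len(pixels) // size, len(pixels[0]) // size
    cells = []
    for r in range(size):
        for c in range(size):
            block = [pixels[y][x]
                     for y in range(r * bh, (r + 1) * bh)
                     for x in range(c * bw, (c + 1) * bw)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits

def sha256_of(pixels):
    """Cryptographic hash of the raw pixel bytes (values must be 0-255)."""
    return hashlib.sha256(bytes(v for row in pixels for v in row)).hexdigest()

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def image_from_bits(bits, size=8, scale=4):
    """Build a blocky two-tone image whose average hash equals `bits`
    (works for any value average_hash can return)."""
    n = size * size
    cells = [(bits >> (n - 1 - i)) & 1 for i in range(n)]
    return [[200 if cells[(y // scale) * size + x // scale] else 50
             for x in range(size * scale)]
            for y in range(size * scale)]

if __name__ == "__main__":
    # A 32x32 synthetic gradient stands in for a real photo.
    original = [[(x * 7 + y * 3) % 200 for x in range(32)] for y in range(32)]
    brighter = [[v + 10 for v in row] for row in original]  # uniform tweak
    forged = image_from_bits(average_hash(original))        # crafted collision

    print(hamming(average_hash(original), average_hash(brighter)))  # 0
    print(sha256_of(original) == sha256_of(brighter))                # False
    print(average_hash(forged) == average_hash(original))            # True
```

Because aHash is so crude, crafting a collision against it is trivial; the point is only that a perceptual hash is designed to map visually similar inputs to the same value, so collisions are an expected property rather than a catastrophic break, unlike with a cryptographic hash such as SHA-256.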
Still, the finding is unlikely to quiet calls for Apple to abandon its plans for on-device scanning, which have grown in recent weeks. On Tuesday, the Electronic Frontier Foundation launched a petition titled “Tell Apple: Don’t Scan Our Phones,” calling on Apple to drop the system. As of press time, it had garnered more than 1,700 signatures.