Two-factor authentication has become commonplace for those of us determined to keep the bad guys at bay. You first request entry to a privileged site through a user-name/password, which in turn sends a code to your mobile device that you must enter to complete your admission to the site (there are other second factor methods, but this one is probably the most widely-known). In the escalating security war, this approach too will be compromised at some point but for now it is certainly much more secure than single-factor authentication.
Texting a code to your phone works well for human-mediated authentication, but has obvious challenges for application to IoT devices, not only in how the second factor would be handled but also in typically resource-constrained devices. Yet this level of security is even more important in devices monitoring and managing critical infrastructure, such as the power grid or transportation systems, domains where breaches could have enormous impact. Security of this type becomes particularly important when looking at remote provisioning, for software updates for example.
A team sponsored by A*STAR (Singapore) has come up with a clever way to provide similar (or better) levels of authentication in edge devices. Most two-factor approaches rely on the second factor being something you (as a privileged user) know or have, as a possession or intrinsic characteristic, which would be difficult for an attacker to duplicate. Short of very advanced (and low power) levels of AI, it is probably over-optimistic to expect IoT edge devices to “know” anything and what they “have” doesn’t seem any more secure than an embedded password. But there is something quite unique and very challenging to replicate about each device – the history of communication it has had with the host system.
Naturally the full representation of that communication would be far too massive to store on the edge device (and far too slow to check). Instead, the method works as follows. First, the nature of second factor authentication is for the edge-device (in this case) to issue a challenge to the server to prove that it knows some specific fact. The server, which they call P, must generate a proof which it returns to the device, called V, which then verifies the response. The strength of the proposed method relies on the server having fast access, through big-data methods, to all the history (or a significant percentage of the history) of communication with the device, something that would be very expensive for an attacker to replicate and would greatly increase chances of detection if an attacker were to attempt to collect that information. And as that history continues to evolve, even a successful attack at one point in time would degrade in effectiveness beyond that time.
The server side of this method is easy to understand – all that history data is effectively a giant and evolving key which should be prohibitively expensive to attack. The tricky and clever part is how this works on the device V. Here the paper gets quite complex, so you’ll have to rely on my summary, or read the paper yourself (and please let me know if I got it wrong). V stores only a finite and evolving subset derived from historical data, converted through a function into tags, which are also communicated back to P as items are selected by V to be in that set.
On authentication (I’ll skip a bunch of details here for simplicity), V issues a challenge to P in the form of a random subset of indices on this set of tags. P runs a function which provides a proof that in essence shows it knows the historical data associated with those tags. P is not simply returning the tags corresponding to those indices; this is a variant on a proof of retrievability without having to return the underlying data. V then runs a different function on the returned proof and that function must return a true value for each index in the challenge.
That’s the 10-cent version of a more complex explanation. The authors also provide a proof of correctness, a detailed security analysis and an analysis of the resilience of the method to leakage, none of which am I going to attempt to explain here. They show that as the size of the stored subset (on V) increases, the likelihood of a successful attack decreases exponentially, also that even if as much as 70% of the subset is compromised, again the likelihood of successful attack continues to decrease exponentially with increasing subset size, though obviously more slowly, .