From time to time when talking about security, it is useful to look at the big picture, but not to further lament the imminent collapse of the sky. We all know that the problem is big and we’re really not on top of it. A more productive discussion would be about what we can do to reduce the scope of the problem. And that has to start with a more scientific approach driving first-principle ideas for improvement. My thanks to @ippisi who pointed me at langsec.org which is a fascinating resource for anyone interested in fundamental work in improving software security (and perhaps ultimately hardware security since hardware is not so different from software).
The Growth of Complexity
Here I’ll just concentrate on one aspect – complexity – and I take a lot of insights from the keynote at langsec.org last year, with a few of my own thoughts thrown in. Complexity can be quantified but to avoid getting mathematical, I’ll rely on an intuitive sense. As systems grow in size, complexity inevitably grows also. Even if the system growth is just replication of simpler systems, those systems have to intercommunicate, which leads to more complexity, especially in the Internet of Things (IoT). Then we add still more complexity to manage power consumption, different modes of communication and, paradoxically, security.
The level of complexity is important because it limits our ability to fully understand system behavior and therefore our ability to protect against attacks. And that points to a real concern: that the complexity of the systems we are building or planning to build is fast outstripping our ability to fully understand them, much less protect them.
Consider first just the classical Internet (forget about the IoT). Dan Geer, the langsec 2015 keynote speaker, found in researching an article that we are having increasing problems bounding or discovering what the Internet actually is. It seems many reachable hosts have no DNS entry, complete reachability testing in network trees became impossible a long time ago (the number of paths in a tree grows exponentially with tree size) and what we consider endpoints in end-to-end views of connectivity has anyway become quite unclear in a world of virtual machines and software-defined networking. So the Internet, pre-IoT, has unknown complexity. Building out the IoT, I assume, would compound this problem.
OK you say, but at least I fully understand the system I designed. Exceptionally clever people could possibly have made this claim when software and hardware were created from scratch. But now design in both domains is dominated by reuse and that leads to dark content. Not dark in the sense of powered-down from time-to-time, but dark in the sense of never used, or you don’t know it’s there or if you do, you don’t know why, or what it does.
A non-trivial percentage of software may be dark, especially through legacy code but also through third-party code supporting features you don’t use, and also through code that no-one wants to remove because the person who wrote it left long ago and who knows what might break if you pull it out. Projects to understand and refactor this class of code get very low priority in deadline-driven design, so it stays in.
This problem applies as much to hardware as to software – lots of legacy logic you only partly understand and unknown code in boatloads of third party IP. Dark code amplifies complexity and indications (mentioned in the langsec keynote) are that it is growing fast. Forget about hidden malware – we don’t even know if innocent but untested (for your intended use) darkware harbors possible entry points for evil-doers.
Then there’s innate or architectural complexity – what you build when you create a significant function and when you put a lot of large functions together. We try to manage complexity through function hierarchies and defensive coding practices, which say that we should code for graceful handling of unexpected inputs and conditions.
But there are practical and subjectively-judged limits to how far any designer will take this practice. You defend against misbehaviors you think might be possible, and self-evidently not against behaviors you can’t imagine could happen (or you didn’t have time to imagine). And since it would be impractical to defend everywhere, you defend only at selected perimeters and assume within those perimeters that you can rely on expected behavior. But if any of those defenses are breached, all bets are off. These defenses limit complexity in a well-intended but rather ad-hoc (and therefore incomplete) manner.
The Effect of Complexity on Test
And then there is the issue of how we test these complex systems. For large systems it would be wildly impractical to test at every possible level of the functional hierarchy, so we test (or presume already well-tested) only at those levels for which we believe we understand expected behavior – the total system and some well-defined sub-functions. Our tests at the sub-function level, even with fuzzing or constrained random, probe only a small part of the possible state-space of those functions.
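To put a rough number on how small that part is, here is a back-of-the-envelope sketch in Python. The function names and the test rate of a billion tests per second are my own assumptions for illustration, and the sketch counts only input combinations, ignoring internal state (which makes the real picture worse).

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def coverage_fraction(input_bits: int, tests_run: int) -> float:
    """Fraction of all input combinations touched by `tests_run` distinct tests."""
    return min(1.0, tests_run / 2 ** input_bits)


def years_to_exhaust(input_bits: int, tests_per_second: float) -> float:
    """Years needed to try every input combination at a (hypothetical) test rate."""
    return (2 ** input_bits / tests_per_second) / SECONDS_PER_YEAR


# A sub-function with two 32-bit inputs has 2**64 possible input combinations.
fraction = coverage_fraction(input_bits=64, tests_run=10**12)
years = years_to_exhaust(input_bits=64, tests_per_second=1e9)
print(f"a trillion tests cover {fraction:.2e} of the input space")
print(f"exhaustive testing would take about {years:.0f} years")
```

Even with a trillion randomized tests, well under a millionth of the input space of this modest function is exercised.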
And at the system level we are limited to testing representative sets of use-cases, perhaps with a little randomization in highly constrained channels. We effectively abandon any hope of fully exploring the complexity of what we have built. Again this is becoming as much of a problem in hardware as it has been for years in software. Throughout systems, complexity is growing faster than our ability to defend against attacks on weak areas of behavior that we don’t even know exist, much less understand.
How We Might Manage Complexity
So what can we do (at a fundamental level)? Formal verification is the de facto answer (for both software and hardware), but it is very limited because it explodes very quickly on large problems. Bounded proofs of constrained objectives are sometimes possible, but only if multiple assumptions are made to limit the search space, which limits their value as a general solution to managing complexity.
An alternative is to constrain the grammar you use in design. As a sort of reduced version of Gödel/Turing’s reasoning, if you make a grammar’s expressive power simple enough, you make it easier to use existing (e.g. formal) or comparable proof methods to fully prove properties (e.g. a statement about security) of a logic function in that language. There are preliminary efforts in this direction reported in langsec.
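As a toy illustration of why restricted expressiveness helps (my own sketch, not a method taken from the langsec proceedings), suppose a design is described only as a finite-state machine. The state and event names below are hypothetical. Because the description cannot express loops over unbounded data or arbitrary computation, a security property can be settled by exhaustively inspecting a finite table rather than by sampling behavior.

```python
from collections import deque

# Hypothetical access-control design written as a finite-state machine:
# (current state, event) -> next state. Nothing else can be expressed.
TRANSITIONS = {
    ("locked", "request"): "challenge",
    ("challenge", "good_key"): "unlocked",
    ("challenge", "bad_key"): "locked",
    ("unlocked", "lock"): "locked",
}


def reachable_states(start, transitions):
    """Breadth-first search over the finite state graph from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for (src, _event), dst in transitions.items():
            if src == state and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen


def only_entered_via(target, allowed_src, allowed_event, transitions):
    """True if every transition into `target` is the single allowed one."""
    return all(
        src == allowed_src and event == allowed_event
        for (src, event), dst in transitions.items()
        if dst == target
    )


# Property: "unlocked" is reachable, and only via a good key at the challenge.
states = reachable_states("locked", TRANSITIONS)
proved = only_entered_via("unlocked", "challenge", "good_key", TRANSITIONS)
print(sorted(states), proved)
```

The check is complete for this description because the whole behavior is the table. The same question asked of a general-purpose program would not be answerable this way.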
Another more speculative though potentially less disruptive idea (my contribution, based on capabilities in the BugScope software we sold at Atrenta) is to focus on the observed (tested) behavior of function interfaces during normal behavior. You infer de-facto assertions from observed behavior and accumulate unions of these assertions – this integer interface was always less than 14, that interface was never active when this signal was 0, and so on. Then you embed those in the production software/hardware as triggers for out-of-bounds behavior, where the bounds are these observed/tested limits.
In use, if an assertion triggers, you don’t know that something bad is going to happen, but you do know the software/hardware is being exercised outside the bounds within which it was tested. This is effectively a tested-behavior fence – not foolproof by any means, but potentially higher coverage than even user-supplied assertions (which tend to be complex, difficult to create and therefore sparse in the behaviors they cover). In practice it would be necessary to adjust some of these bounds as continued use “proved” the relative safety of some out-of-bounds excursions, so there has to be a learning aspect to the approach.
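A minimal sketch of such a fence for integer interfaces might look like the following. This is not how BugScope works internally. The class names, the `queue_depth` interface and the simple min/max inference are all assumptions I am making for illustration, and real inferred assertions would cover relationships between signals as well as ranges.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RangeFence:
    """Min/max values observed on one integer interface during testing."""
    low: Optional[int] = None
    high: Optional[int] = None

    def observe(self, value: int) -> None:
        # Test phase: widen the bounds to include every value seen.
        self.low = value if self.low is None else min(self.low, value)
        self.high = value if self.high is None else max(self.high, value)

    def in_bounds(self, value: int) -> bool:
        # An interface that was never exercised has no tested range at all.
        if self.low is None or self.high is None:
            return False
        return self.low <= value <= self.high


@dataclass
class BehaviorFence:
    fences: dict[str, RangeFence] = field(default_factory=dict)
    excursions: list[tuple[str, int]] = field(default_factory=list)

    def train(self, interface: str, value: int) -> None:
        self.fences.setdefault(interface, RangeFence()).observe(value)

    def check(self, interface: str, value: int) -> bool:
        """Production phase: record and report any out-of-bounds value."""
        fence = self.fences.get(interface)
        ok = fence is not None and fence.in_bounds(value)
        if not ok:
            self.excursions.append((interface, value))
        return ok

    def accept(self, interface: str, value: int) -> None:
        """Learning step: an excursion judged safe widens the tested bounds."""
        self.train(interface, value)


fence = BehaviorFence()
for depth in (0, 3, 7, 13):            # values seen during testing
    fence.train("queue_depth", depth)

inside = fence.check("queue_depth", 9)     # within the tested range
outside = fence.check("queue_depth", 14)   # never seen in test: flagged
fence.accept("queue_depth", 14)            # later judged safe, bounds widen
relearned = fence.check("queue_depth", 14)
print(inside, outside, relearned, fence.excursions)
```

Note that a flagged value is only a signal that the system has left tested territory. What to do about it (log, throttle, halt) is a separate policy decision that this sketch leaves out.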
In either approach, darkware will either prove to be harmless (it does not cause a proof to fail, or its behavior lies inside acceptable bounds) or reveal itself through unexpected proof failures or unexpected bounds.
There are plenty of other methods suggested in langsec proceedings for managing/restricting complexity (for software). I heartily recommend you read Dan Geer’s keynote HERE and branch from there to the 2015 proceedings HERE. The keynote is full of interesting insights and speculations. For anyone with too much time on their hands, I wrote a blog last year about a way to develop a security metric for hardware based on the complexity of the hardware. You can read that HERE.