On the Interpretation of Artificial Souls

by rsbakker


In “Is Artificial Intelligence Permanently Inscrutable?” Aaron M. Bornstein surveys the field of artificial neural networks, claiming that “[a]s exciting as their performance gains have been… there’s a troubling fact about modern neural networks: Nobody knows quite how they work.” The article is fascinating in its own right, and Peter over at Conscious Entities provides an excellent overview, but I would like to use it to flex a little theoretical muscle, and show the way the neural network ‘Inscrutability Problem’ turns on the same basic dynamics underwriting the apparent ‘hard problem’ of intentionality. Once you have a workable, thoroughly naturalistic account of cognition, you can begin to see why computer science finds itself bedevilled with strange parallels of the problems one finds in the philosophy of mind.

This parallel is evident in what Bornstein identifies as the primary issue, interpretability. The problem with artificial neural networks is that they are both contingent and incredibly complex. Recurrent neural networks operate by producing outputs conditioned by a selective history of previous conditionings, one captured in the weighting of (typically) millions of artificial neurons arranged in multiple processing layers. Since discrepancies in output serve as the primary constraint, and since the process of deriving new outputs is driven by the contingencies of the system (to the point where even electromagnetic field effects can become significant), the complexity means that searching for the explanation, or canonical interpretation, of the system is akin to searching for a needle in a haystack.

And as Bornstein points out, this has forced researchers to borrow “techniques from biological research that peer inside networks after the fashion of neuroscientists peering into brains: probing individual components, cataloguing how their internals respond to small changes in inputs, and even removing pieces to see how others compensate.” Unfortunately, importing neuroscientific techniques has resulted in importing neuroscience-like interpretative controversies as well. In “Could a neuroscientist understand a microprocessor?” Eric Jonas and Konrad Kording show how taking the opposite approach, using neuroscientific data analysis methods to understand the computational functions behind games like Donkey Kong and Space Invaders, fails no matter how much data they have available. The authors even go so far as to reference artificial neural network inscrutability as the problem, stating that “our difficulty at understanding deep learning may suggest that the brain is hard to understand if it uses anything like gradient descent on a cost function” (11).
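The borrowed techniques Bornstein lists can be made concrete with a toy. What follows is only a minimal sketch, not anything drawn from Bornstein or from Jonas and Kording: the network, its hand-set weights, and the names `forward`, `ablation_effects` and `sensitivity` are all hypothetical, chosen for illustration. It 'removes pieces' by silencing one hidden unit at a time, and 'catalogues responses to small changes in inputs' by nudging each input and watching the output move.

```python
import math

# Hypothetical, hand-set weights for a toy network:
# 2 inputs -> 3 tanh hidden units -> 1 linear output.
W_HIDDEN = [[2.0, -1.0], [0.5, 0.5], [-1.5, 2.0]]
W_OUT = [1.0, -2.0, 0.5]


def forward(x, ablated=None):
    """Run the toy network; if `ablated` is a unit index, silence that unit."""
    hidden = []
    for j, weights in enumerate(W_HIDDEN):
        if j == ablated:
            hidden.append(0.0)  # the 'lesion': this unit contributes nothing
        else:
            hidden.append(math.tanh(sum(w * xi for w, xi in zip(weights, x))))
    return sum(w * h for w, h in zip(W_OUT, hidden))


def ablation_effects(x):
    """Change in output when each hidden unit is removed in turn."""
    baseline = forward(x)
    return [forward(x, ablated=j) - baseline for j in range(len(W_HIDDEN))]


def sensitivity(x, eps=1e-4):
    """Finite-difference estimate of how the output responds to each input."""
    baseline = forward(x)
    slopes = []
    for i in range(len(x)):
        nudged = list(x)
        nudged[i] += eps  # small change to a single input
        slopes.append((forward(nudged) - baseline) / eps)
    return slopes


if __name__ == "__main__":
    probe = [1.0, 1.0]
    print("ablation effects:", ablation_effects(probe))
    print("input sensitivity:", sensitivity(probe))
```

With three units the bookkeeping is trivial. The interpretative trouble begins when there are millions of them and the effects of any one lesion depend on all the others.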

Neural networks, artificial or natural, could very well be essential black boxes, systems that will always resist synoptic verbal explanation. Functional inscrutability in neuroscience is a pressing problem for obvious reasons. The capacity to explain how a given artificial neural network solves a given problem, meanwhile, remains crucial simply because “if you don’t know how it works, you don’t know how it will fail.” One of the widely acknowledged shortcomings of artificial neural networks is “that the machines are so tightly tuned to the data they are fed,” data that always falls woefully short of the variability and complexity of the real world. As Bornstein points out, “trained machines are exquisitely well suited to their environment—and ill-adapted to any other.” As AI creeps into more and more real-world ecological niches, this ‘brittleness,’ as Bornstein terms it, becomes more of a real-world concern. Interpretability means lives in AI potentially no less than in neuroscience.

All this provokes Bornstein to pose the philosophical question: What is interpretability?

He references Marvin Minsky’s “suitcase words,” the legendary computer scientist’s analogy for many of the terms—such as “consciousness” or “emotion”—we use when we talk about our sentience and sapience. These words, he proposes, reflect the workings of many different underlying processes, which are locked inside the “suitcase.” As long as we keep investigating these words as stand-ins for more fundamental concepts, our insight will be limited by our language. In the study of intelligence, could interpretability itself be such a suitcase word?

Bornstein finds himself delivered to one of the fundamental issues in the philosophy of mind: the question of how to understand intentional idioms, Minsky’s ‘suitcase words.’ The only way to move forward on the issue of interpretability, it seems, is to solve nothing less than the cognitive (as opposed to the phenomenal) half of the hard problem. This is my bailiwick. The problem, here, is a theoretical one: the absence of any clear understanding of ‘interpretability.’ What is interpretation? Why do breakdowns in our ability to explain the operation of our AI tools happen, and why do they take the forms that they do? I think I can paint a spare yet comprehensive picture that answers these questions and places them in the context of a much more ancient form of interpreting neural networks. In fact, I think it can pop open a good number of Minsky’s suitcases and air out their empty insides.

Three Pound Brain regulars, I’m sure, have noticed a number of striking parallels between Bornstein’s characterization of the Inscrutability Problem and the picture of ‘post-intentional cognition’ I’ve been developing over the years. The apparently inscrutable algorithms derived via neural networks are nothing if not heuristic, cognitive systems that solve via cues correlated to target systems. Since they rely on cues (rather than all the information potentially available), their reliability entirely depends on their ecology, which is to say, how those cues correlate. If those cues do not correlate, then disaster strikes (as when the truck trailer that killed Joshua Brown in his Tesla Model S cued more white sky).

The primary problem posed by inscrutability, in other words, is the problem of misapplication. The worry that arises again and again isn’t simply that these systems are inscrutable, but that they are ecological, requiring contexts often possessing quirky features given quirks in the ‘environments’—data sets—used to train them. Inscrutability is a problem because it entails blindness to potential misapplications, plain and simple. Artificial neural network algorithms, you could say, possess adaptive problem-ecologies the same as all heuristic cognition. They solve, not by exhaustively taking into account the high dimensional totality of the information available, but rather by isolating cues—structures in the data set—which the trainer can only hope will generalize to the world.

Artificial neural networks are shallow information consumers, systems that systematically neglect the high dimensional mechanical intricacies of their environments, focusing instead on cues statistically correlated to those high-dimensional mechanical intricacies to solve them. They are ‘brittle,’ therefore, so far as those correlations fail to obtain.
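A toy makes the point about brittleness vivid. This is a minimal sketch under loudly stated assumptions, not a model of any real system: the data are invented, the 'learner' is a one-feature rule rather than a neural network, and the names `make_ecology`, `accuracy` and `train_stump` are hypothetical. Each example carries two features, a cheap `cue` that tracks the label perfectly in the training ecology, and a noisier `signal` that tracks it imperfectly everywhere. The learner simply keeps whichever feature scores best in training.

```python
def make_ecology(cue_matches_label):
    """Build a tiny deterministic data set of ((cue, signal), label) pairs."""
    data = []
    for i in range(20):
        label = i % 2
        # 'signal' tracks the label but is wrong on every fifth example (80% right).
        signal = label if i % 5 else 1 - label
        # 'cue' tracks the label only when the ecology cooperates;
        # the inverted case is a deliberately extreme assumption.
        cue = label if cue_matches_label else 1 - label
        data.append(((cue, signal), label))
    return data


def accuracy(data, feature_index):
    """Fraction of examples where the chosen feature equals the label."""
    hits = sum(1 for features, label in data if features[feature_index] == label)
    return hits / len(data)


def train_stump(data):
    """Pick the single feature that best predicts the label on this data."""
    n_features = len(data[0][0])
    return max(range(n_features), key=lambda j: accuracy(data, j))


if __name__ == "__main__":
    training = make_ecology(cue_matches_label=True)
    shifted = make_ecology(cue_matches_label=False)

    chosen = train_stump(training)  # index 0 is the cue, index 1 the signal
    print("feature chosen in training:", chosen)
    print("accuracy in training ecology:", accuracy(training, chosen))
    print("accuracy in shifted ecology:", accuracy(shifted, chosen))
    print("accuracy of neglected signal after shift:", accuracy(shifted, 1))
```

Nothing in the training ecology distinguishes the cue from the thing it happens to correlate with, so the learner has no way of flagging its own impending misapplication. The failure only shows up when the ecology changes.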

But humans are also shallow information consumers, albeit far more sophisticated ones. Short the prostheses of science, we are also systems prone to neglect the high dimensional mechanical intricacies of our environments, focusing instead on cues statistically correlated to those high-dimensional mechanical intricacies. And we are also brittle to the extent those correlations fail to obtain. The shallow information nets we throw across our environments appear to be seamless, but this is just an illusion, as magicians so effortlessly remind us with their tricks.

This is as much the case for our linguistic attempts to make sense of ourselves and our devices as it is for other cognitive modes. Minsky’s ‘suitcase words’ are such because they themselves are the product of the same cue-correlative dependency. These are the granular posits we use to communicate cue-based cognition of mechanical black box systems such as ourselves, let alone others. They are also the granular posits we use to communicate cue-based cognition of pretty much any complicated system. To be a shallow information consumer is to live in a black box world.

The rub, of course, is that this is itself a black box fact, something tucked away in the oblivion of systematic neglect, duping us into assuming most everything is clear as glass. There’s nothing about correlative cognition, no distinct metacognitive feature, that identifies it as such. We have no way of knowing whether we’re misapplying our own onboard heuristics in advance (thus the value of the heuristics and biases research program), let alone our prosthetic ones! In fact, we’re only now coming to grips with the fractionate and heuristic nature of human cognition as it is.


Inscrutability is a problem, recall, because artificial neural networks are ‘brittle,’ bound upon fixed correlations between their cues and the systems they were tasked with solving, correlations that may or may not, given the complexity of the world, be the case. The amazing fact here is that artificial neural networks are inscrutable, the province of interpretation at best, because we ourselves are brittle, and for precisely the same basic reason: we are bound upon fixed correlations between our cues and the systems we’re tasked with solving. The contingent complexities of artificial neural networks place them, presently at least, outside our capacity to solve—at least in a manner we can readily communicate.

The Inscrutability Problem, I contend, represents a prosthetic externalization of the very same problem of ‘brittleness’ we pose to ourselves, the almost unbelievable fact that we can explain the beginning of the Universe but not cognition—be it artificial or natural. Where the scientists and engineers are baffled by their creations, the philosophers and psychologists are baffled by themselves, forever misapplying correlative modes of cognition to the problem of correlative cognition, forever confusing mere cues for extraordinary, inexplicable orders of reality, forever lost in jungles of perpetually underdetermined interpretation. The Inscrutability Problem is the so-called ‘hard problem’ of intentionality, only in a context that is ‘glassy’ enough to moot the suggestion of ‘ontological irreducibility.’ The boundary faced by neuroscientists and AI engineers alike is mere complexity, not some eerie edge-of-nature-as-we-know-it. And thanks to science, this boundary is always moving. If it seems inexplicable or miraculous, it’s because you lack information: this seems a pretty safe bet as far as razors go.

‘Irreducibility’ is about to come crashing down. I think the more we study problem-ecologies and heuristic solution strategies the more we will be able to categorize the mechanics distinguishing different species of each, and our bestiary of different correlative cognitions will gradually, if laboriously, grow. I also think that artificial neural networks will play a crucial role in that process, eventually providing ways to model things like intentional cognition. If nature has taught us anything over the past five centuries it is that the systematicities, the patterns, are there—we need only find the theoretical and technical eyes required to behold them. And perhaps, when all is said and done, we can ask our models to explain themselves.