Object-Oriented Spaghetti

Note: since writing this essay in 2007, my understanding of object-oriented programming and of separation of concerns has evolved substantially. I think that some of the concerns I raised in this essay are still valid, and that it is quite easy to create unreadable messes of objects, but I no longer hold to as strong a version of the final conclusion.

A long time ago, Simula was created. From it came Smalltalk and C++, followed by Java and a host of other languages sporting this new programming paradigm: object-oriented programming. Objects are everywhere (most modern languages, at least in the mainstream, are based on them) and are used for everything. In Java, all the core data structures are implemented in an object-oriented fashion.

I’m not convinced that all this is a good thing. In fact, I submit that excessive use of object-oriented principles leads to a new kind of spaghetti code, rendering programs perhaps as unreadable as when implemented with unscrupulous GOTOs. OK, maybe not quite, but it can still be pretty bad.

An important facet of programming and abstraction design is separation of concerns: the idea that different concerns or aspects of a program should be kept separate. One example would be separating the type-checking logic of an evaluator from the actual evaluation logic. Another would be separating the business logic from the report generation in a business application.

Another, related facet that doesn’t seem to get as much press is the idea of having all the code for a particular concern in one place. This can make code much easier to find — if you need to examine the type checking logic, it’s all in the type checking module.

OO, as implemented in Java and other languages in the Smalltalk line (Python, etc.), and to a significant extent C++, encourages violation of both of these principles. Operations (methods) are tied to data: they are defined in the same place as the data representation. Because of information hiding, an operation that needs access to the internals of the data has to be in the class. Additionally, an operation that needs to dispatch at run time on the type of the data has to be a virtual method in the class hierarchy to take advantage of dynamic dispatch. So the code for every concern that needs to dispatch dynamically on data types, or needs access to the internals of a representation, is conglomerated in one place: the class. So much for separation.

The isolation aspect is also hindered. If you have a class hierarchy to represent some data, each operation ends up scattered throughout the hierarchy, one piece per class. Rather than having one place where all the variations on the operation are defined, perhaps in a list, the code is in as many files as there are data types. Even in a language where multiple classes can be in the same file, code is grouped by data structure instead of by function.
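To make the scattering concrete, here is a minimal sketch in Python of a tiny expression hierarchy. The classes `Expr`, `Num`, `Add` and `Mul` and the two concerns `evaluate` and `show` are hypothetical names invented for this illustration; they are not taken from any particular program.

```python
from dataclasses import dataclass


class Expr:
    """Base class: every concern that needs dynamic dispatch becomes a method."""

    def evaluate(self) -> int:
        raise NotImplementedError

    def show(self) -> str:
        raise NotImplementedError


@dataclass
class Num(Expr):
    value: int

    def evaluate(self) -> int:  # evaluation concern, piece 1 of 3
        return self.value

    def show(self) -> str:  # printing concern, piece 1 of 3
        return str(self.value)


@dataclass
class Add(Expr):
    left: Expr
    right: Expr

    def evaluate(self) -> int:  # evaluation concern, piece 2 of 3
        return self.left.evaluate() + self.right.evaluate()

    def show(self) -> str:  # printing concern, piece 2 of 3
        return f"({self.left.show()} + {self.right.show()})"


@dataclass
class Mul(Expr):
    left: Expr
    right: Expr

    def evaluate(self) -> int:  # evaluation concern, piece 3 of 3
        return self.left.evaluate() * self.right.evaluate()

    def show(self) -> str:  # printing concern, piece 3 of 3
        return f"({self.left.show()} * {self.right.show()})"
```

To read the whole evaluator you have to visit every class, and every class mixes evaluation with printing. Adding a third concern, such as type checking, means touching all of them again.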

Languages don’t have to be this way. In the ML languages (and Haskell), we have pattern matching on data types. Further, with the exception of OCaml’s objects, data and logic are separate. You define a function, and it pattern-matches on its parameters. All cases (e.g. all variants of the data type) are handled in one place, and it’s easy to examine and modify them.

Common Lisp, with CLOS, has another solution to the problem that also separates code and data. Methods do not belong to classes — they belong to ‘generics’ (which are not the same thing as Java’s generics). You define classes, which are essentially records with additional features. You then define generics, which are abstract operations. You define methods, which are specializations of the generics over certain types. You can have all the methods you define for a generic together, again consolidating a feature.
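Python's standard library has a much weaker relative of this idea in `functools.singledispatch`, which is enough to sketch the shape of it. This is not CLOS: it dispatches on the type of the first argument only, whereas CLOS generics can dispatch on several arguments. The class and function names are again hypothetical.

```python
from dataclasses import dataclass
from functools import singledispatch


# Classes are just records here; they own no methods.
@dataclass
class Num:
    value: int


@dataclass
class Add:
    left: object
    right: object


@dataclass
class Mul:
    left: object
    right: object


@singledispatch
def evaluate(expr) -> int:
    """The 'generic': an abstract operation with no owner class."""
    raise TypeError(f"no evaluate method for {type(expr).__name__}")


# The 'methods': specializations of the generic, all kept together.
@evaluate.register
def _(expr: Num) -> int:
    return expr.value


@evaluate.register
def _(expr: Add) -> int:
    return evaluate(expr.left) + evaluate(expr.right)


@evaluate.register
def _(expr: Mul) -> int:
    return evaluate(expr.left) * evaluate(expr.right)
```

Dispatch still happens at run time on the argument's type, but the specializations can sit next to each other in whichever module owns the evaluation concern.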

Aspect-oriented languages such as AspectJ are another attempt at tackling this problem. They define code in terms of aspects, which are then woven together to form the program. (Warning: I have not used any aspect-oriented languages myself yet, so this paragraph may not be entirely correct.)

So why is it that we do this? In the old days, when Fortran was king of the imperative languages, a number of wise people came along with languages like Pascal and introduced structured programming to avoid the spaghetti. But now we have thrown our logic into a blender, sprinkled the results throughout a class system, and I’m not convinced we’re any better off than where we started. We just have the capacity to efficiently make much bigger messes.