Law & Order in Programming with C*
A new paradigm for scalable & sustainable systems programming.
This was originally penned on the 2nd day of December, 2020. It was a Wednesday.
It has been transcribed here for accessibility and preservation.
Notice:
This paper details the concepts of Law & Order, and a programming language called C* that implements them. This has no relation to any Connection Machines, or the associated superset of ANSI C by the same name.
Background & history
Software engineering over the last fifteen to twenty-five years has become stunted. Less and less attention is paid to quality of work, and incredible amounts of resources are being spent to solve second-order problems. Executives and engineers alike shoulder the blame for this, and the result is simply a whole lot of bad code. Peter Welch has mused much about the deplorable state of code quality in his semi-hyperbolic rant entitled Programming Sucks.
Bad code in a vacuum wouldn’t be a problem, but it does not stay in a vacuum: it makes it nearly impossible to write good code at a reasonable cost. All of the code we have works together. If some bad code is sitting lower down the hierarchy, it can mess things up for everything built on top of it, lowering the quality baseline of all code running on a person’s machine. Programs need Law & Order, and where they need it most is with systems.
The misunderstanding of C
Systems programming is one of the most difficult branches of software engineering, so difficult that many developers today do not even understand manual memory management. The advantage of C with regard to systems programming is widely misunderstood; it has less to do with performance, portability and applications, and more to do with communicating complex systems. Stephen Kell argues this thoroughly in his paper entitled Some Were Meant for C.
Relatedly, the shortcomings of many, many attempts to make systems programming better have to do with this misunderstanding about the power of C as well. The C language forces the programmer to make the composition of their code explicit, and this has very positive effects for what Kell calls communicativity in systems design. C also has unparalleled ability to integrate with existing code; Kell explains this in a quote from Richard Gabriel:
“In the worse-is-better world, integration is linking your .o files together, freely inter-calling functions, and using the same basic data representations. You don’t have a foreign loader, you don’t coerce types across function-call boundaries, you don’t make one language dominant, and you don’t make the woes of your implementation technology impact the entire system.”
Enter Law & Order
Ken Thompson gave one of the earliest known reflections about this problem with software in his Turing Award lecture, titled Reflections on Trusting Trust. His moral at the end suggests that real-world law would soon be looked to for setting boundaries on the practice of malware development. However, in the decades that followed this lecture, the World Wide Web came to be and changed just about everything. It is probably more pragmatic now to bring Law & Order to the code that needs it, instead of waiting for the code to show up at a courthouse. So, this is what I will do.
With this in mind, only one more requirement is left to be illuminated: the concept of the total system. The total system is the system in development which the programmer, with all of their tools, has unfettered control over. There are two main components in a total system: laws and marshalling. With this idea, it becomes possible to define arbitrary constraints about data and enforce those constraints at compile time. This could be thought of as a radical form of programming by contract, but there are elements for scaling these contracts and dealing with violations that are quite new.
Declaring a Law
A law defines an expression about a data type which must be satisfied in order for compilation to succeed. For example, it is possible to say that all integer primitives must be non-negative:
/* declare a new law that applies to all integers
 * (int, short, long). the subject is addressed using a
 * sole underscore '_' pronoun. this law is given the name
 * 'no_neg_int'.
 */
law no_neg_int : int, short, long
{
    _ >= 0;
};
/* this law can be applied to further types after its
 * definition like so
 */
law no_neg_int : char;
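C* is described as enforcing laws at compile time, which plain ANSI C cannot express. Purely as a point of comparison, the sketch below approximates the no_neg_int law in ordinary C as a run-time predicate. The names law_no_neg_int and halve are hypothetical and are not part of C*; nothing here is meant to describe how a C* compiler would actually carry out the check.

```c
#include <assert.h>

/* Hypothetical run-time stand-in for the law 'no_neg_int'.
 * Returns 1 when the subject satisfies '_ >= 0', else 0.
 * A long is wide enough to carry int, short and char subjects.
 */
int law_no_neg_int( long subject )
{
    return subject >= 0;
}

/* Hypothetical function whose parameter is governed by the law.
 * Plain C can only state the constraint as an assertion that is
 * checked while the program runs (and not at all under NDEBUG).
 */
long halve( long n )
{
    assert( law_no_neg_int( n ) );
    return n / 2;
}
```

Here halve( 10 ) passes the check, whereas halve( -1 ) trips the assertion only once the program is running. The point of a law, as described above, is to move that failure forward to compilation.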
Enforcing Law with Marshalling
The constraints of laws may be violated by the foreign data they apply to, and in the event of violations, special code blocks may be written to handle such cases. This is called marshalling, but here it is distinguished from mere serialisation; in marshalling, the data is not just serialised but validated against the constraints of the laws that apply to it.
marshal blocks are used to deal with illegal input from foreign callers. For every publicly-callable function with constrained inputs, marshalling is required. Marshalling code is not executed when the function is called from within the total system, and it is skipped over if the constraints, as checked at the call boundary, are satisfied. Additional restrictions also apply when within marshal scope:
no calls to non-pure functions are permitted, and
only the parameter being marshalled may be identified in the block.
Here is an example of some marshalling code:
int my_function( int a, char * b )
{
    marshal a
    {
        /* unconditionally return 1 */
        return 1;
    }
    marshal b
    {
        /* there is a special case for that below */
        if(b == NULL)
        {
            continue;
        }
        return 127;
    }
    /* normal code where a and b are well-defined */
    return 0;
}
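For readers more at home in plain C, the following is a rough approximation of the same shape: a public entry point that validates foreign input and an internal one that trusts it. Several things here are assumptions made for illustration only. The laws on a and b are invented (a must be non-negative, b must be a non-empty string), the helper names are hypothetical, and continue is read as "leave the marshal block and carry on into the normal code". None of this is taken from a C* specification.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed laws, invented for this sketch only. */
static int a_is_lawful( int a )
{
    return a >= 0;
}

static int b_is_lawful( const char * b )
{
    return b != NULL && b[0] != '\0';
}

/* Internal entry point: callers inside the total system are
 * trusted, so no validation code runs on this path.
 */
static int my_function_inner( int a, const char * b )
{
    assert( a_is_lawful( a ) );
    (void) b; /* b may still be NULL here; see the special case */
    return 0;
}

/* Public entry point: each check plays the part of a marshal
 * block and runs only when the corresponding law is violated.
 */
int my_function( int a, const char * b )
{
    if( !a_is_lawful( a ) )
    {
        return 1;
    }
    if( !b_is_lawful( b ) )
    {
        if( b != NULL )
        {
            return 127;
        }
        /* NULL falls through to the normal code, which is how
         * this sketch interprets 'continue' in a marshal block
         */
    }
    return my_function_inner( a, b );
}
```

The obvious cost of doing this by hand is that nothing forces the checks to exist or to agree with the constraints they guard. The text above describes marshalling as required for every publicly-callable function with constrained inputs, which is the part plain C has no way to demand.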
So this is C*
The C* programming language is an attempt at revisiting ANSI C, as standardised by the American National Standards Institute in 1989. It is an attempt at reinforcing its strengths, and adding some new tools to the language. C* will not add ‘object-oriented’ concepts to the language like C++. Indeed, it is a departure not only from C++’s drive towards multi-paradigm language development, but from even the notion of an ‘unboxed’ programming language entirely. The language will add some minor things to improve the details of data representation, but it is otherwise quite faithfully ANSI C, down to the hoisting of locals and the lack of line comments.
There will be other posts here explaining what exactly C* is in detail, so skip around the site to find those if you are interested.