A gentle introduction to C* III

Continuing an imperative walk-through of what makes this language different from others.

Jan 16, 2022

Welcome to part three of a gentle introduction to C*. In the last instalment we dove into the philosophy underpinning the language. Today, we’ll do the opposite. Want to learn some cool things about C* that aren’t catastrophic to the existing order of things? Now’s the time.

File extensions

Like C and C++, C* is also built to ingest multi-language header files ending in .h. The language strives to maximise compatibility with C headers for the sake of API interfacing, but it’s not a hard rule, just a natural coincidence.

It does have its own C*-specific header file extension, though: .hst. This is a derivative of the recommended file extension for C* source code files: .cst.

Continuing the legacy of header files from C is a good thing for the original purpose they still serve: separating interface from implementation, and keeping the latter’s details properly private. Unlike C++, though, C* does not do this for the sake of “backwards compatibility”, but because there is no downside to doing it in C*: the language is not meant to provide metaprogramming facilities or genericism of any kind. It is also not in the spirit of C* to facilitate an abstract system of modules or namespaces.

Deep comments

Originally, this was announced on Twitter. The long and short of it is this: writing comments in scripts other than Latin is pretty important, and encodings other than ASCII are either too complex or too brittle to be worth fully supporting in source code.

To be kind to C* source code lexers, a compromise has been made in the form of deep comments. Everything is still ASCII, except for one thing: within these special comments, octets in the range 128 through 255 are also ignored, instead of being treated as an encoding error. This means programmers can treat their ASCII source files as BOM-less UTF-8 as usual, and write comments in Cyrillic, Chinese, or whatever other script they please.

Deep comments use the classic “doc comment” syntax normally picked up by automated documentation generators, like so:

/** this is a deep comment */
/** это глубокий комментарий */
/** これは深いコメントです */
/* UTF-8 (or whatever ASCII superset you like)
 * in the above, ASCII everywhere else. Easy. */
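
For the curious, here is a minimal sketch in plain C of the byte-level rule described above. It is not taken from any C* implementation: the function name deep_comment_bytes_ok is made up for illustration, and the sketch deliberately ignores string and character literals that might happen to contain a comment opener.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true when every octet >= 128 in `src`
 * sits inside a deep comment, false otherwise. */
bool deep_comment_bytes_ok(const unsigned char *src, size_t len)
{
    bool in_deep = false;
    size_t i = 0;

    while (i < len) {
        if (!in_deep) {
            /* A deep comment opens with the three bytes "/" "*" "*". */
            if (i + 2 < len && src[i] == '/' && src[i + 1] == '*' && src[i + 2] == '*') {
                in_deep = true;
                i += 2; /* leave the second '*' in place so an empty comment can still close */
                continue;
            }
            if (src[i] >= 128)
                return false; /* non-ASCII octet outside a deep comment */
        } else if (i + 1 < len && src[i] == '*' && src[i + 1] == '/') {
            in_deep = false;
            i += 2;
            continue;
        }
        i++;
    }
    return true;
}
```

Note that an ordinary comment gets no special treatment here: a high octet inside it is rejected just like one in code, which matches the "ASCII everywhere else" rule.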

Short-circuit logical assignment

You read that right. Check this out, and take a guess as to what it means:

/* ... */
a &&= b;
x ||= y;

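It’s a new kind of conditional expression! Essentially, the above lines are synonymous with these, provided && and || yield one of their operands (as the description further down implies) rather than a bare 0 or 1: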

/* ... */
a = a && b;
x = x || y;

What do these lines of code mean though?

The first sets a to the value of b if a is truthy. If a is falsey, nothing happens.
The second does the inverse with x and y; if x is falsey, set it to y. Otherwise, do nothing.

This is a concise way of conditionally setting variables depending on their value. C* inherits its first conditional expression from C—the ternary conditional—while these are its second and third ones.
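
If you want to try the behaviour described above in plain C, here is a rough sketch. It follows the prose description (assign only when the condition holds) rather than the literal a = a && b form, because in standard C the && and || operators evaluate to 0 or 1 and would not store the value of b. The macro and function names are hypothetical stand-ins, not C* syntax.

```c
/* Hypothetical stand-ins for `a &&= b` and `x ||= y` as described in
 * the text. The right-hand side is only evaluated when it is needed. */
#define AND_ASSIGN(a, b) do { if (a) { (a) = (b); } } while (0)
#define OR_ASSIGN(x, y)  do { if (!(x)) { (x) = (y); } } while (0)

/* Small wrappers so the result can be inspected with plain values.
 * (Function arguments are evaluated eagerly; the short-circuiting
 * lives in the macros themselves.) */
int and_assign(int a, int b)
{
    AND_ASSIGN(a, b);
    return a;
}

int or_assign(int x, int y)
{
    OR_ASSIGN(x, y);
    return x;
}
```

With these, and_assign(5, 9) gives 9 and and_assign(0, 9) leaves 0 alone, while or_assign(0, 7) gives 7 and or_assign(3, 7) keeps 3.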

More ways to shift bits

Most languages have << and >> for shifting bits left and right, respectively. Some languages, like Java, also offer >>> so that the two kinds of right shift each get their own operator. (In Java itself, >>> is the zero-filling “unsigned right shift”, while >> is the signed one.) The signed variant is what assembler usually calls arithmetic shift right, so named for its behaviour of preserving the two’s complement sign bit of the number while shifting.
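
To make the difference between the two right shifts concrete, here is a small sketch in plain C. The names lsr32 and asr32 are made up for this post, and both functions work on unsigned 32-bit values so that nothing depends on how a particular compiler treats right shifts of negative numbers.

```c
#include <stdint.h>

/* Logical shift right: vacated high bits are filled with zeroes. */
uint32_t lsr32(uint32_t x, unsigned n)
{
    return x >> (n & 31u);
}

/* Arithmetic shift right: vacated high bits are filled with copies of
 * the sign bit (bit 31), preserving the two's complement sign. */
uint32_t asr32(uint32_t x, unsigned n)
{
    n &= 31u;
    uint32_t shifted = x >> n;
    if (n != 0 && (x & 0x80000000u))
        shifted |= ~(0xFFFFFFFFu >> n); /* set the top n bits */
    return shifted;
}
```

For example, shifting 0xF0000000 right by four gives 0x0F000000 with lsr32 but 0xFF000000 with asr32.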

C* provides all of the above operators, plus one more: the rotate left operator, denoted by a triple left angle bracket <<<.

Bit rotate is a common built-in opcode in many instruction sets. It does what logical bit shifting does, except instead of filling in the space left by the shifting with zeroes, it wraps the bits shifted out back into the other end of the variable.

Since this operation preserves all the bit data, no “rotate right” operator is necessary: for a fixed-width value, rotating right by n bits is the same as rotating left by the width minus n. Isn’t it nice to have an operator for common logic like this, especially since most processors provide cheap instructions for it already?
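
Here is what that looks like spelled out in plain C for 32-bit values. This is only a sketch of the idea, not how C* implements <<<; the names rotl32 and rotr32 are my own for illustration.

```c
#include <stdint.h>

/* Rotate left: bits shifted out of the top wrap around to the bottom.
 * The masking keeps every shift count in 0..31, so n == 0 is safe. */
uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}

/* Rotate right, expressed purely as a rotate left by (width - n). */
uint32_t rotr32(uint32_t x, unsigned n)
{
    return rotl32(x, 32u - (n & 31u));
}
```

So rotl32(0x12345678, 8) yields 0x34567812, and rotating that result right by eight with rotr32 brings back the original value.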

Beautifully concise text literals

C* provides four kinds of text literals:

ASCII character literals – char literals
ASCII string literals – string literals
Unicode codepoint literals – rune literals
Unicode string literals – unistring literals

As in C, character literals are encased in single quotation marks, while string literals are encased in double quotation marks, with an implied NUL terminator at the end.

For Unicode literals, the same is true, except the opening quotation mark is prefaced by the @ symbol. Here’s a complete example:

/* ... */
const char a = '7';
const char a2 = '\177'; /* ASCII DEL */

const char * const aa = "ASCII string";
/* there is an implicit NUL byte 0x00 at the end */

const rune b = @'\u2018'; /* opening single quote */

const rune * const bb = @"\u201Cquoted\u201D";
/* There is an implicit \u0000 at the end of unistrings */

/* \uXXXX is not allowed in ASCII literals, of course,
 * and octal \### is likewise forbidden in Unicode literals
 */
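
If you are coming from C, the nearest standard analogue I know of is C11's char32_t with its U'...' and U"..." literals. The sketch below is plain C11, not C*: it assumes a toolchain that ships <uchar.h>, and the names open_quote, quoted, and unistring_len are invented for this example. It shows the same two properties as the unistring above: one element per codepoint, and an implicit zero terminator at the end.

```c
#include <stddef.h>
#include <uchar.h>

/* C11 counterpart of a rune literal: a single codepoint. */
static const char32_t open_quote = U'\u2018';

/* C11 counterpart of a unistring: one char32_t per codepoint,
 * followed by an implicit zero terminator. */
static const char32_t quoted[] = U"\u201Cquoted\u201D";

/* Counts codepoints up to, but not including, the terminator. */
size_t unistring_len(const char32_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}
```

The array holds nine elements in total: the two curly quotes, the six letters between them, and the terminator.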

Pretty nice, right? The contents of these literals are, by default, tightly bit-packed too, like all other structs in C*. Wondering how that unnatural default is apprehended so strings can be stored in programs sanely? That will be explained in depth in the next post.

This has been part three of a gentle introduction to the C* programming language. Parts four and beyond will cover many things about C* more completely, including literal transmogrification (alluded to above in the section about string literals) and the core mechanism that actually makes laws and marshalling code work.

If you haven’t already, please consider subscribing. While I will strive to make about one third of my content open to public reading, I will nonetheless be making at least the first three parts of this series public. At $5.55/month, it’s hardly more than your average trinket from Amazon with Prime shipping. And it helps me out tremendously.