The Ultimate Pointer — Unifying Arrays, Pointers, References and Nullptr.

Codigo Noob
4 min read · Jul 10, 2020

I’ve gotten into type systems lately and I find the topic very interesting. During my 15 years of coding, I have tried to learn a lot of languages, starting with PHP and most recently Rust. Most of the time I’ve programmed in Java, JS, and C, and I’ve experimented with C++, Haskell, and Go. I ended up liking Rust, Haskell, C, and Go the most out of those.

A thing that bothers me in some of those languages (and especially in C++) is the multiple abstractions for basically the same feature. I’m talking about Pointers here.

Disclaimer: I have limited knowledge in this area. My opinions are mostly those of a programmer (a programming language user), not of a programming language designer.

Edit: I have read a lot more about type systems and I’m currently writing a better version of pointers which doesn’t have the issues that are contained in this one.

The differences between pointers, references, arrays, and null pointers are the following:

  • A pointer is a value that points to a memory location. You can do everything you want with it, including changing it. Every other feature that I mentioned is a restricted pointer.
  • A reference is a pointer whose target you cannot change
  • An array is a pointer that (sometimes) also carries the number of elements stored at that memory location
  • A null pointer is a pointer that points to address 0

A rarer variant is the smart pointer in C++, which adds ownership checks. Apart from these main attributes, some languages also offer explicit mutability restrictions.

Another rare case is C, where negative array indices can be used. This cannot be safely replaced with any of the aforementioned constructs.

So all in all, we have these attributes for a pointer:

  • TYPE: The type of element(s) that are present at the position
  • OPT (optional): if it’s a nullptr or a valid position
  • LEN (length): number of consecutive elements of that type in the position
  • MUT (mutability): if the value it points to can be changed
  • SMT (self-mutability): if the value of the pointer itself can be changed
  • PRC (preceded by): Number of consecutive elements before the given position
  • OWN (ownership attributes): if the pointer owns the target memory or just refers to it

One representation that would include all of the information above would be:

MUT OWN[PRC; TYPE; LEN]OPT (SMT is redundant and I will mention why later in the post.)

Well, this representation seems too complicated; we cannot write all of this every time we declare a pointer or define a parameter. So how can we make it simpler? For one, we can define defaults, so that an attribute only has to be written out when it differs from its default.

  • MUT: for immutable, you can leave it empty, for mutable set it to mut
  • OWN: for owned you can leave it empty or use o, for shared use sh
  • PRC: for 0 leave it empty, otherwise write the number of elements (only ≥0 allowed)
  • TYPE: always needed
  • LEN: for 1 leave it empty, for more specify it (only ≥1 allowed)
  • OPT: for non-null leave it empty, for nullable (optional) we can use ?
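
To make the defaults concrete, here is a minimal sketch in Python that models the attributes as a record and writes out the shorthand, skipping anything left at its default. The names PtrType and render are hypothetical, invented for this illustration, and the spacing rules are only my reading of the MUT OWN[PRC; TYPE; LEN]OPT layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PtrType:
    """Hypothetical model of the pointer attributes; not part of any real language."""
    elem: str               # TYPE: always needed
    mut: bool = False       # MUT: immutable unless stated
    shared: bool = False    # OWN: owned unless stated (shared is written "sh")
    prc: int = 0            # PRC: elements before the position, >= 0
    length: int = 1         # LEN: elements from the position onward, >= 1
    nullable: bool = False  # OPT: non-null unless stated

    def __post_init__(self):
        if self.prc < 0:
            raise ValueError("PRC must be >= 0")
        if self.length < 1:
            raise ValueError("LEN must be >= 1")

    def render(self) -> str:
        """Write the type in shorthand, leaving out every default value."""
        fields = []
        if self.prc != 0:
            fields.append(str(self.prc))
        fields.append(self.elem)
        if self.length != 1:
            fields.append(str(self.length))
        head = "mut " if self.mut else ""
        head += "sh" if self.shared else ""
        tail = "?" if self.nullable else ""
        return head + "[" + "; ".join(fields) + "]" + tail
```

With every attribute left at its default, only the element type remains inside the brackets, which is what keeps the common case short.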

Some examples would be:

  • [int] is an owned pointer to an integer
  • mut sh[int]? is a mutable shared nullable pointer to an integer. This is a long but also a rarely used type.
  • [int; 30] is an owned array of 30 integers.
  • mut [15; int; 15] points to the 16th element (index 15) of an owned mutable array which has 30 elements: 15 before the position and 15 from the position onward
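
The last example also suggests how PRC and LEN could make negative indices safe: if LEN counts the element at the position itself, the valid offsets run from -PRC up to LEN - 1. The sketch below, in Python, assumes that reading; valid_offsets and checked_index are hypothetical helpers, and a plain list stands in for the memory.

```python
def valid_offsets(prc: int, length: int) -> range:
    """Offsets that stay inside the array for a [prc; T; length] pointer.

    Assumption: LEN includes the element at the pointed-to position,
    so offset 0 is always valid.
    """
    if prc < 0 or length < 1:
        raise ValueError("need PRC >= 0 and LEN >= 1")
    return range(-prc, length)


def checked_index(backing: list, position: int, prc: int, length: int, offset: int):
    """Read backing[position + offset], rejecting offsets outside [-prc, length)."""
    if offset not in valid_offsets(prc, length):
        raise IndexError(f"offset {offset} is outside [-{prc}, {length})")
    return backing[position + offset]


# mut [15; int; 15]: a pointer to index 15 of a 30-element array.
array = list(range(30))
first = checked_index(array, 15, prc=15, length=15, offset=-15)  # element at index 0
last = checked_index(array, 15, prc=15, length=15, offset=14)    # element at index 29
```

An offset of 15 or -16 would raise here, whereas in C the same access would silently read outside the array.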

As for SMT (self-mutability), we could always use a double pointer, e.g. mut[[int]] would be a pointer to a mutable pointer, so the inner pointer could be changed to point to different targets.

There is a big hole in this type construct, though. I haven’t mentioned the cases where we don’t know the length, like when we get an array as a parameter. How do we know if we’re expecting an array or a pointer?

In such cases we could leave the values empty for unknown sizes. E.g.

  • [int] Pointer to a single element
  • [int]? Nullable pointer to a single element
  • [int;]? Nullable pointer to an array of unknown length
  • [;int] Pointer to a position that might have elements before it
  • [;int;] Pointer to a position that might have elements before and after it

There is one more case, though: we do not always know the length of an array. If, for example, we get a c_string to work with, we will not get its length with it. So if we get a [char;], how do we know whether an array size is attached to it or not? For such cases, we could use a - or some other character to specify that this pointer might have elements after it (or before it), but we don’t know how many, e.g. [char;-] for c_strings.
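
To check that the notation stays unambiguous with empty fields and the - marker, here is a rough parser sketch in Python. Everything in it is hypothetical: the names parse, Parsed, RUNTIME and UNKNOWN are mine, and it assumes that an empty field means "size not known in the type, but attached to the pointer" while - means "not known and not attached". It does not handle nested types such as mut[[int]].

```python
import re
from typing import NamedTuple, Union

RUNTIME = "runtime"   # empty field: size travels with the pointer (assumption)
UNKNOWN = "unknown"   # "-": size is not known at all, as with c_strings
Extent = Union[int, str]


class Parsed(NamedTuple):
    mut: bool
    shared: bool
    prc: Extent
    elem: str
    length: Extent
    nullable: bool


_PATTERN = re.compile(r"^\s*(mut\b\s*)?(?:(sh|o)\b\s*)?\[(.*)\](\?)?\s*$")
_NUMBER = re.compile(r"[0-9]+")


def _is_extent(text: str) -> bool:
    text = text.strip()
    return text in ("", "-") or _NUMBER.fullmatch(text) is not None


def _extent(text: str) -> Extent:
    text = text.strip()
    if text == "":
        return RUNTIME
    if text == "-":
        return UNKNOWN
    if _NUMBER.fullmatch(text):
        return int(text)
    raise ValueError(f"not a count: {text!r}")


def parse(text: str) -> Parsed:
    """Parse the MUT OWN[PRC; TYPE; LEN]OPT shorthand into its attributes."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a pointer type: {text!r}")
    mut, own, inner, opt = match.groups()
    parts = inner.split(";")
    if len(parts) == 1:                      # [T]
        prc, elem, length = "0", parts[0], "1"
    elif len(parts) == 2 and _is_extent(parts[0]) and not _is_extent(parts[1]):
        prc, elem, length = parts[0], parts[1], "1"   # [PRC; T]
    elif len(parts) == 2 and _is_extent(parts[1]) and not _is_extent(parts[0]):
        prc, elem, length = "0", parts[0], parts[1]   # [T; LEN]
    elif len(parts) == 3:                    # [PRC; T; LEN]
        prc, elem, length = parts
    else:
        raise ValueError(f"cannot tell the fields apart: {text!r}")
    if _is_extent(elem):
        raise ValueError(f"missing element type: {text!r}")
    length_value = _extent(length)
    if isinstance(length_value, int) and length_value < 1:
        raise ValueError("LEN must be >= 1")
    return Parsed(mut is not None, own == "sh", _extent(prc),
                  elem.strip(), length_value, opt is not None)


c_string = parse("[char;-]")      # length is UNKNOWN
fat_array = parse("[int;]?")      # length is RUNTIME, pointer is nullable
```

The two-field forms can be told apart only because a type name never looks like a count, which is a restriction this sketch imposes and the notation itself does not state.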
