Thursday, 8 March 2018

Vexatious Parses in C++

As part of my work on the egg computer language specification, I've been looking into parsing curly-brace-type languages. There are a number of cul de sacs in these language specifications. Here's one from C++ I've been struggling with today:
    int a = 1;
    int b = 2;
    int c = a-b;
What's the value of "c"? Obviously, it's minus one. But what about this:
    c = a--b;
My Microsoft compiler tells me that this is a malformed expression:
    syntax error: missing ';' before identifier 'b'
But the following is fine:
    c = a---b;
This sets "c" to minus one and decrements "a". Honest.
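For anyone who doubts it, here is a minimal sketch that can be compiled to check the claim. The helper name tripleMinus is my own invention for illustration; it simply returns the result alongside the final value of "a":

```cpp
#include <utility>

// Hypothetical helper: evaluates "a---b" and returns {c, value of a afterwards}.
std::pair<int, int> tripleMinus(int a, int b) {
    int c = a---b;   // tokenized as: a  --  -  b, i.e. (a--) - b
    return {c, a};   // a has been decremented by the postfix "--"
}
```

Calling tripleMinus(1, 2) should give a "c" of minus one and leave "a" at zero.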

Here's a list of parses:
    a-b      // Parsed as "a - b"
    a--b     // Fails to compile: missing ';' before identifier 'b'
    a---b    // Parsed as "a-- - b"
    a----b   // Fails to compile: '--' needs l-value
    a-----b  // Fails to compile: '--' needs l-value

    a- -b    // Parsed as "a - -b"
    a- --b   // Parsed as "a - --b"
    a-- -b   // Parsed as "a-- - b"
    a- - -b  // Parsed as "a - - -b"
The compiler is obviously "greedy" when splitting the source into operators: it always takes the longest operator it can. So, in the absence of white-space, it never considers an alternative interpretation:
    a--b     // COULD be parsed as "a - -b"
    a----b   // COULD be parsed as "a-- - -b"
    a-----b  // COULD be parsed as "a-- - --b"
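This greedy behaviour is usually called "maximal munch": the lexer takes the longest run of characters that could form a token, without looking at what the parser would prefer. Here is a rough sketch of the idea, assuming a toy lexer that only understands identifiers, "-" and "--". The function name tokenize is hypothetical, and this is not how any real compiler is written:

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy lexer: recognises only identifiers, "-" and "--".
// Any other character (including white-space) just separates tokens.
std::vector<std::string> tokenize(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char ch = static_cast<unsigned char>(src[i]);
        if (std::isalpha(ch) || ch == '_') {
            // Identifier: consume letters, digits and underscores.
            std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) {
                ++i;
            }
            tokens.push_back(src.substr(start, i - start));
        } else if (ch == '-') {
            // Maximal munch: take "--" whenever two minus signs are adjacent.
            if (i + 1 < src.size() && src[i + 1] == '-') {
                tokens.push_back("--");
                i += 2;
            } else {
                tokens.push_back("-");
                i += 1;
            }
        } else {
            ++i;  // white-space or unsupported character: skip it
        }
    }
    return tokens;
}
```

With this sketch, tokenize("a--b") produces a, --, b (hence the missing ';' complaint), whereas tokenize("a- -b") produces a, -, -, b.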
I expect the compiler-writers have their hands tied by the formal language specification. But, for a new language like egg, I don't have any such restrictions.

I decided that prefix and postfix increments/decrements as expressions are bad things. This is mainly due to problems associated with side-effects and evaluation ordering. Consider:
    int a = p[++i] + p[i++]; // Not allowed
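To make the ordering problem concrete: in C++ the two operands of '+' are unsequenced, so modifying "i" in both of them is undefined behaviour. Here is a minimal sketch that spells out each step as its own statement, assuming a left-to-right reading was intended (the function name sumWithExplicitOrder is hypothetical):

```cpp
// Hypothetical helper: one explicit reading of "p[++i] + p[i++]",
// assuming left-to-right evaluation was what the author meant.
int sumWithExplicitOrder(const int* p, int& i) {
    ++i;                // the prefix increment happens first
    int left = p[i];    // value of p[++i]
    int right = p[i];   // p[i++] reads with the current index...
    ++i;                // ...and only then increments it
    return left + right;
}
```

Written this way, there is only one possible order of side-effects, which is the point of banning the expression forms.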
However, I think I will retain the prefix increment/decrement statements:
    ++i; // Allowed
    --i; // Allowed
    i++; // Not allowed
    i--; // Not allowed
This permits the idiomatic counter-based loop:
    for (i = 0; i < count; ++i) { /* loop body */ }
The reasons for only allowing the prefix versions are two-fold:
  1. It makes the language specification much less ambiguous; and
  2. People still harp on about prefix increments/decrements being slightly faster than their postfix variants, which is why they are "preferred" for looping.
Whilst I was at it, I also decided I can probably do without the unary '+' operator. That gets rid of the truly vexatious:
    c = a+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+b;

Tuesday, 27 February 2018

What is Egg?

“Egg” is an idea I’ve been thinking about for a long time. Here’s the background…

At work, over the last few months, I’ve used many computer languages:

  1. C++
  2. C#
  3. Java
  4. JavaScript
  5. Clojure
  6. Python
  7. PowerShell
  8. Windows batch commands
  9. Bash

and almost as many build/configuration file formats:

  1. Makefile
  2. JSON
  3. XML
  4. YAML
  5. INI

I appreciate that domain-specific languages have their place, but often runtime performance is not an issue, so using a general-purpose language would be more than adequate. Constantly having to context-switch between different languages and paradigms is exhausting; not to mention the numerous bugs caused by forgetting the specifics of each set of syntaxes, escape sequences, library routine quirks and so on.

What if there was a simple language that was powerful enough to get the job done without having to remember too many subtleties of the language?

Another issue I have with many languages is the lack of simple interoperability. If I want to call a C++ routine from Clojure, I’m going to have to jump through hoops.

Similarly, if you develop a prototype in one language, you often have to “productionize” it by converting it to another. This is a great source of bugs.

What if there was a language that you could transpile into other languages?

Even if the transpiled code was purely used to get a unit test framework up and running before refactoring, this would greatly mitigate the introduction of bugs.

Some of these interoperability issues are due to the frameworks or virtual machines that some of the languages require:

  • Java Virtual Machine
  • .NET Framework
  • and so on

In this regard, I think that JavaScript is quite successful because of its ubiquity: press F12 inside your browser and you have quite a powerful development environment. Running scripts outside of a browser simply requires you to download a zero-install executable such as Node.js.

What if there was a language that ran almost identically on many frameworks and/or virtual machines?

Anecdotally, it seems that Python is gaining ground as a teaching language. I’m not going to knock Python, but it seems strange that there appear to be few other candidates for teaching good software engineering practices.

What if there was a language that could be used for teaching the fundamentals of programming whilst still being useful outside of academic institutions?

Talking of Python, why do computer languages develop to the point where the designers make breaking changes (e.g. Python 2 versus Python 3)? Even venerable C++ is getting a new set of features every three years that’s difficult to keep up with.

What if there was a language that had a relatively stable syntax?

But “egg” isn’t just a computer language specification, it’s:

  • An engine to run scripts written in egg
  • A compiler to generate native code from egg source
  • A set of transpilers to generate other computer languages from egg source
  • A build system (written in egg, of course)
  • A set of core packages to perform common tasks
  • A testing framework
  • A package manager

So, that’s what “egg” is: a personal project to give me an excuse to investigate these issues.

Monday, 26 February 2018

Egg Day

The last Monday of February is "egg" day...

Sunday, 7 January 2018

Anachronicons 4

This is a tricky one; is the following an anachronicon?
Microphone icon
There's obviously a "retro" vibe going on here (it reminds me of microphones from 1930s/1940s radio sound stages) but I have a suspicion that even modern high-end microphones have similar cradle-like mounts. Nonetheless, I'm guessing very few people have seen a microphone looking like this in real life that wasn't aimed at the "retro" market.

Friday, 22 December 2017

Anachronicons 3

Telephones haven't looked like this for a long time:

Telephone icon
But if you search for a telephone emoji in a Unicode font, you're likely to find something much like that. In fact, mining for "telephone" in the Unicode character set delivers a rich seam of nostalgia:
And my favourite:

Sunday, 17 December 2017

Anachronicons 2

Here's another common anachronicon:
File Save icon
Everyone knows it's the icon to save a file; but when was the last time you saved anything to a 3.5" floppy disk?

Saturday, 9 December 2017

Anachronicons 1

If you look at icons and symbols in everyday use, you'll notice something strange. A few of them use old representations of a concept in order to differentiate them from similar visual elements. I call them "anachronicons". Take, for example, the speed camera UK traffic sign from The Highway Code:
Speed Camera Traffic Sign (UK)
Everyone (in the UK, at least) knows what it means, but isn't it strange that the graphic designer used the image of a late nineteenth-/early twentieth-century camera? It's not as if the sign just hasn't been updated; cameras of this form were already defunct when speed cameras (and presumably their signs) were introduced to the UK.

Tellingly, it only seems to be the UK that uses an old-fashioned camera in this way; other countries use words, radar "waves" or images of more modern cameras.