Learn You The Code
There are various existing IDEs designed for educational use. There are visual environments, like Scratch and Alice, in which users drag and drop ‘blocks’ of code; and there are environments like Greenfoot, BlueJ,[1] and Hackety Hack, in which users write actual code while the editor helps in certain ways. Each has its pros and cons, but here I’d like to focus on the latter, since it seems to me the more interesting problem. The question is, how do we make a programming language easy to learn without sacrificing power? To answer this we must draw inspiration from education and interaction design and merge it with low-level compiler internals. I love it when these worlds collide.
Why not use ...?
I don’t want to spend long on this question, because this isn’t really the place, but I’ll summarise. The main problem with existing environments is that they teach users ‘professional’ languages. Of course, we all learned to code using languages common in business, but I believe that we learned in spite of the language. Common languages’ syntax and semantics are both throwbacks to days long past, and we can do much better now. We needn’t begin with a professional language, because one can be taught once a beginner has picked up the basics through a more ‘helpful’ language.
I’m complaining not about the implementations of these languages, since a beginner likely neither knows nor cares, but about how incredibly user-unfriendly they are, in terms of both code and errors. At university I help first-year students having difficulty with their Java assignments, and it is clear that many beginners are ‘syntax blind’: the strange punctuation used in C-family languages is completely unlike English, and acts as a set of magical glyphs you must use in some cases but not others, for whatever reason. And even if we try to ignore the subtle complexities of primitives, autoboxing, and generics,[2] is this seriously what we must teach as a modern “hello world”?
public abstract class Main {
    public static void main(String[] args) {
        System.out.println("Hello, world!");
    }
}
Hello world should be one line.
In fact, downloading an MP3 should be one line! — why the lucky stiff
A simpler language
The material in programming courses is highly hierarchical. Every topic strongly builds on the topics previously covered. Thus, if you don’t understand one section of the course, you will likely also struggle in the following sections... — Michael Kölling
In particular, until you learn the syntax you cannot learn any programming that depends on knowing it. That is to say, everything else.
Pare down to the essence, but don’t remove the poetry. — Leonard Koren
The philosophy for the design of our new language is simple: remove as much syntax and ritual as possible, until doing so would reduce clarity or beauty. We are aiming for the Taoist p’u, the unhewn log: natural, passive, and with unlimited potential. The program above ought to be:
print "Hello, world!"
Another code sample:
if fruits contains apricot
    print "i has an apricot! also,"
    for fruit in fruits except apricot
        print fruit
The language ought to come with standard libraries designed specifically for pedagogy, which means keeping things simple. For instance, all numbers would be arbitrary-precision rationals: it’s a little slower, but there are no floating point imprecision errors. (It’s 2011; try saying with a straight face, “sometimes computers get simple maths wrong.”)[3] Strong static typing, perhaps with type inference, is preferred, since it’s less prone to type errors. Details are unimportant at this stage.
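To make the trade-off concrete, here is a small sketch in Python (not the proposed language, which doesn't exist yet). It uses the standard library's fractions module as a stand-in for built-in rational numbers; the variable names are purely illustrative.

```python
from fractions import Fraction

# Binary floating point cannot represent 0.1 exactly, so the sum drifts.
float_sum = 0.1 + 0.2
float_ok = (float_sum == 0.3)

# Exact rationals: numerator and denominator are arbitrary-precision integers.
exact_sum = Fraction(1, 10) + Fraction(2, 10)
exact_ok = (exact_sum == Fraction(3, 10))

print(float_ok)   # False
print(exact_ok)   # True
```

A beginner-facing language could hide the Fraction type entirely and simply treat a literal like 0.1 as one tenth.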
A helpful compiler
Professional languages are not designed with novices in mind, so the complicated syntactic and semantic rules can lead to extremely cryptic error messages. With the language above we have addressed this not by reducing the language’s power, but by stripping away obscure expressions and symbols. The language can lead to very precise errors, and we can make things clearer through simple, jargon-free messages.
An important feature of our new language is that all statements are terminated by newlines, which means we can point to the exact line at which a syntax error occurs, instead of some way down the page where we finally do something which betrays the grammar. And indentation must reflect the language structure! Since we have complete control over the program’s abstract syntax tree (AST) we can pinpoint the exact location of even semantic errors, by checking for common mistakes.
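As a rough illustration of why one statement per line helps, here is a minimal sketch in Python of a layout checker that reports the exact line where indentation stops matching the structure. It assumes, purely for illustration, that blocks are opened by lines starting with if or for (as in the samples above) and that each level is four spaces; the function name check_layout and the wording of the messages are hypothetical.

```python
BLOCK_OPENERS = ("if ", "for ")  # assumed block keywords, taken from the samples
INDENT = 4                       # assumed: four spaces per indentation level


def check_layout(source):
    """Return (line_number, message) pairs for indentation mistakes."""
    errors = []
    depth = 0              # indentation level of the previous statement
    expect_deeper = False  # True when the previous statement opened a block
    for number, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines carry no statement
        spaces = len(line) - len(line.lstrip(" "))
        if spaces % INDENT != 0:
            errors.append((number, "indent using steps of four spaces"))
            continue
        level = spaces // INDENT
        if expect_deeper and level != depth + 1:
            errors.append((number, "this line should be indented one step further"))
        elif not expect_deeper and level > depth:
            errors.append((number, "this line is indented, but nothing above it starts a block"))
        depth = level
        expect_deeper = line.lstrip().startswith(BLOCK_OPENERS)
    return errors


broken = 'if fruits contains apricot\nprint "yum"\n'
print(check_layout(broken))
```

Because every statement lives on its own line, each message can name one line number with no guessing about where the statement began.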
Say we discover that a common mistake is not bracketing in the right places. If the compiler throws an error saying that a given type doesn’t have a given method, we could take the expression and try certain permutations of brackets until we find one that works, and hence accurately identify the error. When the compiler discovers an unknown variable we could check if it would be valid given any level of indentation, or check for similarly-spelled identifiers. And so on.
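The similarly-spelled identifier check is easy to sketch. The following Python example uses difflib.get_close_matches from the standard library; the function name suggest_name, the 0.6 similarity cutoff, and the message wording are assumptions made for illustration, not part of any existing compiler. The bracket-permutation idea is not covered here.

```python
import difflib


def suggest_name(unknown, known_names):
    """Build a friendly message for an unknown variable name."""
    # Ask for the single closest known name above the similarity cutoff.
    matches = difflib.get_close_matches(unknown, known_names, n=1, cutoff=0.6)
    if matches:
        return f"I don't know '{unknown}'. Did you mean '{matches[0]}'?"
    return f"I don't know '{unknown}'. Has it been given a value yet?"


print(suggest_name("friuts", ["fruits", "apricot"]))
print(suggest_name("banana", ["fruits", "apricot"]))
```

A real compiler would gather known_names from the variables in scope at the point of the error.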
A friendlier editor
Besides the alterations I’ve mentioned here, BlueJ is pretty well-designed. I feel it is lacking in terms of text editing — the combination of Java ignoring whitespace and BlueJ using spaces for indentation can lead to incredibly misaligned novice code — but it works in terms of features. I would however like something a little more like Hackety Hack in terms of user experience. BlueJ has UML and the object bench, but Hackety Hack has built-in lessons and looks like this:

How beautiful is that? It’s just an image and I want to “start a new program” anyway. And I am smitten by the open palm logo. It’s all just so inviting. Software ought to establish an emotional link with its users — nay, its friends! I know I’m being over-the-top here, but that’s the point.
How far is too far?
There are a few changes I am unsure about, since they would be very logical but would depart from common computer science idioms. For instance, the first item in a list ought to be subscript 1, not 0. But would doing so confuse things later down the line? I don’t know. And since the new language would not have much information on the Internet, unless it became popular, how would this affect learning, or indeed plagiarism? I don’t know. (Though the IDE would have built-in lessons, which may mitigate this.)
In the end I think there should just be more of this kind of experimentation. The worst thing we can do is put our heads in the sand and pretend there’s no problem.
[1] Full disclosure: Greenfoot and BlueJ are both developed at the University of Kent, where I study.
[2] See Michael Kölling’s blog post, Can Java be saved from death by feature bloat?
[3] Optimisations like this dominate all modern languages. See Paul Graham’s essay, The hundred-year language.
Inspired in part by Paul Graham, Michael Kölling, Guido van Rossum, and why the lucky stiff.