I've been playing with an exciting feature of HTML5 that I hadn't heard of: the contentEditable attribute. It's magical! It just makes anything suddenly be editable in-browser! It also finally provides me with the thing I've been wanting ever since growing dissatisfied with text editors: A way to build a new text editor, without having to actually code up the guts of text movement, manipulation, and entry. Oh, happy day!

More on this as the situation progresses.


/*
 * This is my parser.
 * It's not much, but it's mine.
 */

blocks = code:(if_block / lines)* { return code; }

if_block = 
  "if" ws "(" cond:string ")" ws "{" result:lines "}" 
{ return {type: "if", condition: cond, body: result }; }

/*
while_block = "while" ws "(" cond:string ")" ws "{" result:lines "}"
{ return {type: "while", condition: cond, body: result }; }
*/

lines = lines:line+ ws?
{ return lines; }

line = ws? contents:string ";"
{ return contents; }

ws = (" " / "\n")+ { return ""; }


string = characters:([A-Za-z] [0-9A-Za-z "']*) 
{ return characters[0] + characters[1].join(""); }
	

Then I started realizing I had no idea what I was doing. So I started defining the grammar for a Lisp! Sort of. More or less.


expression = 
   parenthetical
 / bracketed
 / identifier
 / string
 / number

parenthetical = 
"(" cons: expression? cdr:(" "+ expression)* ")"
{ 
  var output = [];
  for (var i=0; i<cdr.length; i++) { 
    output.push(cdr[i][1]); 
  }
  return {type: "parenthetical", first: cons, rest: output}; 
}

bracketed = 
"[" cons: expression? cdr: (" "+ expression)* "]"
{ 
  var output = [cons];
  for (var i=0; i<cdr.length; i++) { 
    output.push(cdr[i][1]); 
  }
  return output;
  return {type: "list", contents: output}; 
}

identifier = contents:[A-Za-z]+ { return contents.join(""); }

string = 
  '"' contents:[^"]* '"' { return contents.join(""); }
/ "'" contents:[^']* "'" { return contents.join(""); }

number = contents:[0-9]+ { return parseInt(contents.join(""), 10); }
	

Perhaps the answer is to break up the editor's coding space by lines, as webkit is wont to do. Specifically, this would be good because it'd make it easier to track the cursor position, because it would never change relative to the line element it's contained in!

// a comment! var a = 5; var b = foo(a);
if (b > 100 && b / 7 == 3)
{ // Some results should take place b += 294800; }
elseif (b < 20)
{ b -= 1; }
else
{ raise UgnaughtException(); }
// So yeah

I just keep writing grammars! Thanks, pegjs!


block = "{" contents:lines "}" { return contents; }

lines = 
  contents:(whitespace (branch / line) whitespace)*
  { var output = [];
    for (var i=0; i<contents.length; i++) { output.push(contents[i][1]); }
    return output; }

branch = 
   main:if_branch elseifs:elseif_branch* elses:else_branch?
   { var output = [main];
     if (elseifs.length > 0) { output = output.concat(elseifs); }
     if (elses) { output.push(elses); }
     return output; }

if_branch = whitespace cond:if_line whitespace body:block
  { return {type: "if", condition: cond, consequent: body}; }
if_line = "if" whitespace "(" condition:line ")" { return condition; }

elseif_branch = whitespace cond:elseif_line whitespace body:block
  { return {type: "else if", condition: cond, consequent: body}; }
elseif_line = "else if" whitespace "(" condition:line ")" { return condition; }

else_branch = whitespace cond:else_line whitespace body:block
  { return {type: "else", condition: cond, consequent: body}; }
else_line = "else" { return "else"; }


line = start:word rest:(" "* word)* 
  { var output = start;
    for (var i=0; i<rest.length; i++) { output += rest[i][0].join("") + rest[i][1]; }
    return output; }
word = c:[A-Za-z]+ { return c.join(""); }
whitespace = (" " / "\n")*
	

Okay, and here's another lisp one, because they're so easy and satisfying:


expression = whitespace e:(parenthetical / string_literal / number / identifier) whitespace
 { return e; }

parenthetical = "(" sequence:expression* ")" { return sequence; }

string_literal = 
   "'" characters:[^']* "'" { return characters.join(""); }
 / '"' characters:[^"]* '"' { return characters.join(""); }
number = digits:[0-9]+ { return parseInt(digits.join(""), 10); }
identifier = letters:[A-Za-z]+ { return letters.join(""); }

whitespace = [ \n\t]*
	

So, it nearly killed me, but I managed to figure out how to make a parser (using pegjs) that actually parses indentation-based grammars! The basic way it works is that as it matches a line, it reads in the number of spaces before it. Using that, it figures out what indentation level it's at - then, as it's returning the line's payload, it puts the payload into the right code block. It's hacky and awful but I think I understand how it works. I'd rather move the logic into the block rule rather than the line rule, but that's pretty simple in theory. Here you go:


{ var code = [];
  var indentations = [0];
  var current_head = [code]; }

b = line* { return code; }
line = 
  (spaces:" "* & 
  { var level = spaces.length;
    if (level > indentations[indentations.length-1]) {
      indentations.push(level);
      var new_block = [];
      current_head[current_head.length-1].push(new_block);
      current_head.push(new_block)
    } else if (indentations.indexOf(level) == -1) {
      return false; 
    } else {
      while (indentations[indentations.length-1] != level) {
        indentations.pop()
        current_head.pop()
      }
    }
    return true; } 
  { return spaces.length; })
  payload:("ab" bs:"b"+ "a" { return "ab" + bs.join("") + "a"; }) 
  "\n"
  { current_head[current_head.length-1].push(payload);
    return payload; }