Notes on constants highlighting. Part 1/2


One of the features I've been working on recently is the highly requested ability to colorize constant names in the editor. Sanny has a syntax highlighting feature that in general works very well and recognizes many different types of code elements. However, it is purely syntax-based: it relies on those elements having distinctive characters that identify their meaning. For example, variables either start with a dollar sign (e.g. $var) or end with an @ (e.g. 0@). Those characters make it possible to render and colorize text in one pass without looking for extra information elsewhere.
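

To make the "one pass" point concrete, here is a minimal sketch in Python (not Sanny Builder's actual code). The function names and the whitespace-based token splitting are my assumptions for illustration; the only rules taken from the text are the leading dollar sign and the trailing @.

```python
import re

# Hypothetical tokenizer: split on whitespace only, which is a simplification.
TOKEN = re.compile(r"\S+")


def classify(token):
    """Decide the kind of a token by looking at the token alone."""
    if token.startswith("$") or token.endswith("@"):
        return "variable"
    return "plain"


def highlight_line(line):
    """Return (token, kind) pairs for one line, in a single pass."""
    return [(m.group(), classify(m.group())) for m in TOKEN.finditer(line)]
```

Every decision is made from the characters of the token itself, so no other line of the script has to be consulted.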


The challenge with constants is that nothing sets them apart: they are regular identifiers made up of Latin letters, digits and the underscore. Sanny syntax allows English words to appear anywhere in the code. For example, in the line:


0001: wait 0 ms


only the opcode 0001 and the argument 0 make sense to the compiler; the rest of the line is ignored and is there only to help the user understand the meaning of the parameters. The argument could be replaced with a constant or variable name defined earlier in the code:


const 

    duration = 0

end


0001: wait duration ms


or with a variable declaration:


int duration = 0

0001: wait duration ms


The goal is to recognize duration as a constant name and highlight it using some coloring rules.


Sanny already had a similar ability to highlight words defined as keywords. Those are either part of the language itself (while, true, if) or aliases for opcodes for the given game. For example, a common keyword wait exists for all games. It represents the opcode 0001 and allows the above command to be written as:


wait duration ms


The list of keywords is finite and defined in external files, so Sanny knows exactly what can and cannot be a keyword, and correctly identifying and highlighting the word wait is not really a challenge. But what to do with duration, whose meaning can only be understood after parsing the current script?


The first and most straightforward way is to scan the preceding lines for const and/or var declarations, make a list of names, and feed that list into the syntax highlighter so that it colorizes them like keywords. The problem is that Sanny Builder is a single-threaded application, and doing that on every key press event in the GUI thread would quickly make it slow and unresponsive. Imagine trying to type a few words while the window hangs for a split second on each key press. Very annoying. My first decision was to scan and collect the needed information only after opening or saving the file. This keeps costly CPU work out of the typing path while still highlighting constant symbols most of the time. But it's still far from ideal. Nowadays people are used to immediate feedback on their actions, and obviously you as a scripter would like to see your newly added constant highlighted without having to save or reopen the file.
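

The scanning step could look roughly like the Python sketch below. It is an illustration, not Sanny Builder's code: the function name is hypothetical, and the only declaration shapes it knows are the ones shown in this post (a const ... end block with "name = value" lines, and int/float declarations). Matching keywords case-insensitively is also an assumption.

```python
import re

# Declaration shapes assumed from the examples in this post only.
IDENT = r"[A-Za-z_][A-Za-z0-9_]*"
CONST_ENTRY = re.compile(r"^\s*(" + IDENT + r")\s*=")
TYPED_DECL = re.compile(r"^\s*(?:int|float)\s+(" + IDENT + r")\b", re.IGNORECASE)


def collect_declared_names(lines):
    """Return the set of names declared anywhere in the given lines."""
    names = set()
    in_const_block = False
    for line in lines:
        stripped = line.strip().lower()
        if stripped == "const":
            in_const_block = True
            continue
        if in_const_block:
            if stripped == "end":
                in_const_block = False
            else:
                entry = CONST_ENTRY.match(line)
                if entry:
                    names.add(entry.group(1))
            continue
        decl = TYPED_DECL.match(line)
        if decl:
            names.add(decl.group(1))
    return names
```

The resulting set can then be handed to the highlighter in the same way as the keyword list. The cost is a full walk over the lines, which is exactly why running it on every key press is unattractive.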


So I started to work on something more robust. Each document can be thought of as a series of possibly overlapping regions. A region starts at the line where a constant or variable is declared and lasts until the end of the file (the last line) or until another declaration with the same name. Sanny Builder allows constants and variables to be redeclared many times in a script, like so:


...

10: int duration

11: ...

12: ...

13: float duration

...


On line 10 the variable duration is declared as an integer, whereas on line 13 it is redeclared as a float. So the document would have two regions: one covering lines 10 through 12 inclusive and another covering line 13 through the last line.
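

As a rough model of this idea, the Python sketch below builds such regions from a list of lines. The Region class, build_regions and lookup are hypothetical names of mine, and only int/float declarations are recognized, as in the example above.

```python
import re
from dataclasses import dataclass

TYPED_DECL = re.compile(
    r"^\s*(int|float)\s+([A-Za-z_][A-Za-z0-9_]*)\b", re.IGNORECASE
)


@dataclass
class Region:
    name: str
    kind: str        # "int" or "float"
    first_line: int  # 1-based, inclusive
    last_line: int   # 1-based, inclusive


def build_regions(lines):
    """One region per declaration, closed by the next one with the same name."""
    regions = []
    open_regions = {}  # name -> region that currently runs to the last line
    for number, line in enumerate(lines, start=1):
        decl = TYPED_DECL.match(line)
        if not decl:
            continue
        kind, name = decl.group(1).lower(), decl.group(2)
        previous = open_regions.get(name)
        if previous is not None:
            previous.last_line = number - 1  # a redeclaration ends the old region
        region = Region(name, kind, number, len(lines))
        open_regions[name] = region
        regions.append(region)
    return regions


def lookup(regions, name, line_number):
    """Find the region that gives `name` its meaning on `line_number`."""
    for region in regions:
        if region.name == name and region.first_line <= line_number <= region.last_line:
            return region
    return None
```

For a 15-line file with the two declarations above, this yields an int region for lines 10 to 12 and a float region for lines 13 to 15.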


Now we could read the file on open and save events and construct the document structure. After that we would only track key presses and update the affected regions without rescanning the whole file. Say you changed line 12. The declarations on lines 10 and 13 were not affected, so we don't need to do anything. However, if you delete line 10, we know it affects the first region and we update it accordingly.


I won't go into much detail here, as this approach quickly proved to be too fragile. There are way too many ways to modify the text (copy-pasting some lines, executing a template, running a macro sequence, and so on), so we can't be confident that our regions still match what the user actually sees.


On a final note, I later got another request that quickly made me realize that neither of the above approaches would work. The request was to account for constants defined in imported files. Sanny allows you to move parts of the code into external files and import them using the {$INCLUDE} directive. So often your script is not only the text you see in the editor right now; it also includes parts hidden on the hard drive. Moreover, there are implicitly imported files such as constants.txt, which defines well-known symbols for the compiler and is included during compilation (at least with the default edit modes configuration). You don't need to import constants.txt explicitly with the {$INCLUDE} directive, yet the constants defined there are available to your code, for example the variables TIMERA and TIMERB. Bear in mind that you can open constants.txt in another editor, say Notepad, and change its content, and ideally you would like to see that reflected in your script.
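

To show why imports change the picture, here is a hedged Python sketch of collecting names across files. Everything in it is an assumption for illustration: the "{$INCLUDE path}" spelling of the directive, the function names, the read_lines callback, and the int/float-only declaration format. It does not describe how Sanny Builder resolves includes or how constants.txt is really laid out.

```python
import re

# Assumed directive shape: {$INCLUDE some\path.txt}
INCLUDE = re.compile(r"\{\$INCLUDE\s+([^}]+)\}", re.IGNORECASE)
TYPED_DECL = re.compile(
    r"^\s*(?:int|float)\s+([A-Za-z_][A-Za-z0-9_]*)\b", re.IGNORECASE
)


def collect_names(path, read_lines, seen=None):
    """Collect declared names from `path` and every file it includes."""
    seen = set() if seen is None else seen
    if path in seen:  # guard against circular includes
        return set()
    seen.add(path)
    names = set()
    for line in read_lines(path):
        include = INCLUDE.search(line)
        if include:
            names |= collect_names(include.group(1).strip(), read_lines, seen)
            continue
        decl = TYPED_DECL.match(line)
        if decl:
            names.add(decl.group(1))
    return names


def names_visible_in(path, read_lines, implicit_files=("constants.txt",)):
    """Names from the implicitly imported files plus the script's own tree."""
    seen = set()
    names = set()
    for implicit in implicit_files:
        names |= collect_names(implicit, read_lines, seen)
    names |= collect_names(path, read_lines, seen)
    return names
```

Here read_lines is any function that maps a path to a list of lines, so the editor buffer and files on disk can be served by the same code. Note that the result depends on files the editor never receives key presses for, which is the crux of the problem.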


That being said, a seemingly innocent idea to highlight some words with a special meaning turned out to be a very complex piece of functionality. Next time I will talk about how I overcame that challenge and the new obstacles I faced.


Thanks!