The initial observation was that we are actually parsing and processing a kind of
Domain Specific Language here. Thus the general advice for such undertakings
applies: the actual language should be handled as a thin layer on top of
a semantic model. In our case, this model is the menu tree to be generated,
while the actual “syntax tree” is the real filesystem, holding Asciidoc files with
embedded comments. Thus, the semantic model was developed first, separately from the
syntax of the specifications; it was tested to generate suitable HTML and CSS.
The syntactic elements were then added as a collection of parser or matcher objects,
each with the ability to recognise and implement one kind of placement specification.
Each such Placement subclass exposes an acceptVerb() function for handling invocations
of the internal DSL functions, and an acceptDSL() function to parse and accept a
//Menu: line from some Asciidoc source file. This approach makes adding further
configuration options simple.
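The arrangement described above might look roughly like the following sketch. Only the names Placement, acceptVerb() and acceptDSL() come from the text; the keywords "append" and "prepend", the regular expressions, the class attributes and the parse_menu_line() helper are hypothetical and serve only to illustrate the pattern, not the actual syntax.

```python
import re


class Placement:
    """Base class: each subclass recognises one kind of placement spec."""

    VERB = None      # hypothetical: name of the internal DSL function
    PATTERN = None   # hypothetical: regex for the textual //Menu: form

    def __init__(self, target):
        self.target = target

    @classmethod
    def acceptVerb(cls, verb, target):
        """Handle an invocation of an internal DSL function."""
        if verb == cls.VERB:
            return cls(target)
        return None

    @classmethod
    def acceptDSL(cls, line):
        """Parse one //Menu: line; return a placement, or None if no match."""
        match = cls.PATTERN.match(line)
        if match:
            return cls(match.group("target"))
        return None


class Append(Placement):
    # assumed keyword, for illustration only
    VERB = "append"
    PATTERN = re.compile(r"^//Menu:\s*append\s+(?P<target>\S+)\s*$")


class Prepend(Placement):
    # assumed keyword, for illustration only
    VERB = "prepend"
    PATTERN = re.compile(r"^//Menu:\s*prepend\s+(?P<target>\S+)\s*$")


# A further kind of specification is added by appending its class here.
MATCHERS = [Append, Prepend]


def parse_menu_line(line):
    """Offer the line to each matcher in turn; the first one to accept wins."""
    for matcher in MATCHERS:
        placement = matcher.acceptDSL(line)
        if placement is not None:
            return placement
    return None
```

In this sketch, parse_menu_line("//Menu: append documentation") yields an Append instance, while a line that no matcher recognises yields None.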
Another interesting question is to what extent the actual path handling and file discovery
logic should be configurable. My reasoning is that any attempt at greater flexibility
is mostly moot, because we can’t overcome the fact that this logic has to be cast into
program code. Extension points or strategy objects would only tear apart
the actual code and thus make it harder to read. Thus I confined myself to making
just the index file name and the file extensions configurable.
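As a rough illustration of this trade-off, a discovery routine can keep its traversal logic fixed in plain code and expose only those two settings. The sketch below is an assumption throughout: the function name discover(), the parameter names and the default values are hypothetical and not taken from the actual implementation.

```python
import os

INDEX_NAME = "index"          # hypothetical default for the index file name
FILE_EXTENSIONS = (".txt",)   # hypothetical default for accepted extensions


def discover(startdir, index_name=INDEX_NAME, extensions=FILE_EXTENSIONS):
    """Yield (path, is_index) for every source file below startdir.

    The traversal itself is deliberately not pluggable; only the index
    file name and the accepted file extensions can be adjusted.
    """
    for dirpath, dirnames, filenames in os.walk(startdir):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            stem, ext = os.path.splitext(name)
            if ext in extensions:
                yield os.path.join(dirpath, name), stem == index_name
```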
Known issues
-
for the sake of simplicity, there is one generated container HTML element
per menu entry. In case this entry is a submenu, the <ul>-element is
used, not the preceding headline <li>. This is due to the fact
that this submenu entry is going to be collapsed eventually, but it has
the side-effect of highlighting only that submenu block, not the
preceding headline.
-
the acceptable DSL syntax needs to be documented manually; there is
no way to generate this information. Doing so would require adding
specific information methods to the Placement subclasses, and it would
result in duplicated information between the regular expressions
and the information returned by such methods.
This was deemed dangerous.
-
the __repr__ of the Placement subclasses is not a proper representation
but rather a __str__; unfortunately, the debugger in PyDev
invokes __repr__
-
the startdir for automatic discovery is a global variable
-
when, through the use of redirection, the same file is encountered
multiple times during discovery, it is processed repeatedly, each time
associated with another node. This happens because, on discovery, the node-ID is
generated as parentPath/fileID, to avoid mixing up similarly named
files in different directories. (The NodeIndex allows retrieving
a node just by its bare ID, without the path, anyway)
-
no escaping: currently any variable text is written to the generated
HTML without any sanitising or escaping. This might be a security issue,
especially because Git pushes immediately trigger menu generation.
-
the method Node.matches() is implemented sloppily: it uses just a mutual
postfix match, while actually it should line up full path components and
check equality on components, starting from the path end. This cheesy
implementation can yield surprising side-effects: e.g. a not-yet attached
node 'end' could match a new menu page 'documentation/backend'
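The difference between the two matching strategies from the last item can be sketched as follows. Both functions are hypothetical stand-ins written for illustration, not the actual Node.matches() code: the first mimics a mutual postfix match on plain strings, the second lines up whole path components from the end, assuming "/" as the separator.

```python
def mutual_postfix_match(node_id, path):
    """Plain string suffix test in both directions (the problematic variant)."""
    return node_id.endswith(path) or path.endswith(node_id)


def component_match(node_id, path):
    """Compare whole path components, starting from the path end."""
    a = [part for part in node_id.split("/") if part]
    b = [part for part in path.split("/") if part]
    if not a or not b:
        return False
    n = min(len(a), len(b))  # only the shorter path's components must line up
    return a[-n:] == b[-n:]
```

With these definitions, mutual_postfix_match("end", "documentation/backend") is true, whereas component_match("end", "documentation/backend") is false and component_match("backend", "documentation/backend") is true.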