Guide for Texopic Python module

This document is a guide to the texopic Python module. It explains what to expect from the module and how to use it.

More details about the language can be found on the front page.

1. Basic functionality $

The following code emits every token and character that texopic finds in the stream.

import texopic

for token in texopic.read_file("index.text"):
    print token

This is the most primitive part of texopic. It doesn't load other modules by design. It allows you to implement completely custom logic over the language.

There are several helpers you should use though, because they help in generating texopic documents a great deal.

2. Generic document generator $

The generic parts of the document generator can be found in the texopic.generic module. The key piece in this system is the environment object.

The following program forms a generator from the generic module. We go through it piece by piece.

import texopic
from texopic.generic import Env, verbatim

def main():
    group = texopic.read_file("guide.text")
    for line in env.vcall(group, document=None):
        print line

env = Env()
@env.define("paragraph")
def env_paragraph(context, group):
    context.emit(verbatim(group))

@env.define("preformat")
def env_preformat(context, string):
    context.emit(string)

if __name__=='__main__':
    main()

2.1. verbatim function $

The verbatim function returns a string that is as close to the verbatim version of the input group as it can be. It is meant for retrieving URLs from the macro groups.

Internally it is used to retrieve the identifier inside #begin and #end clauses.

2.2. Env $

Environment object, Env, works as a namespace for the generator. The environment objects can be stacked to create namespace cakes for customizing generators.

Env.define can be used to define functions in the namespace. At minimum the namespace must contain the "paragraph" and "preformat" functions. The rest of it describes behavior for macros.
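The stacking idea can be illustrated without Texopic at all. The following is a minimal sketch, assuming a hypothetical LayeredNamespace class that is not part of the module and is not a description of how Env is implemented. It only shows how a lookup can fall through from a customized layer to the layer below it.

```python
class LayeredNamespace(object):
    """Hypothetical stand-in for a stackable namespace (not texopic's Env)."""
    def __init__(self, parent=None):
        self.parent = parent
        self.table = {}

    def define(self, name):
        # Used as a decorator: stores the function under the given name.
        def _decorator_(func):
            self.table[name] = func
            return func
        return _decorator_

    def lookup(self, name):
        # Search this layer first, then fall through to the layers below.
        layer = self
        while layer is not None:
            if name in layer.table:
                return layer.table[name]
            layer = layer.parent
        raise KeyError(name)

base = LayeredNamespace()

@base.define("paragraph")
def base_paragraph(text):
    return "<p>" + text + "</p>"

@base.define("preformat")
def base_preformat(text):
    return "<pre>" + text + "</pre>"

# The upper layer overrides "paragraph" and inherits "preformat".
custom = LayeredNamespace(base)

@custom.define("paragraph")
def custom_paragraph(text):
    return text.upper()

print(custom.lookup("paragraph")("hello"))   # HELLO
print(custom.lookup("preformat")("hello"))   # <pre>hello</pre>
```

The base layer stays untouched, so several customized layers can share it.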

Functions labelled #macroname and :macroname let the user customize the behavior of the respective macro or #begin/#end block. This is explained further in Macro customization.

Env also has the functions .hcall and .vcall that the user can call. They have to be discussed alongside some functions from Context.

2.3. Context $

Context is a helper for customizing the behavior of the Vertical/Horizontal stack machine of the generator. It contains a .document object that can be chosen directly.

If one of the functions attempts to shift into vertical mode but that is impossible, it raises a Suspend exception. The code that evaluates custom macros catches the Suspend and, as a backup measure, writes the macro into the document in literal form.
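The fallback behavior can be pictured with a small standalone sketch. Everything in it is an assumption made for illustration: the Suspend class is a local stand-in, and the handler and evaluate functions are hypothetical, not the module's own code.

```python
class Suspend(Exception):
    """Local stand-in for the exception described above."""

def figure(argument, vertical_allowed):
    # This hypothetical handler needs vertical mode and gives up without it.
    if not vertical_allowed:
        raise Suspend()
    return "<figure>" + argument + "</figure>"

def evaluate(name, handler, argument, vertical_allowed):
    try:
        return handler(argument, vertical_allowed)
    except Suspend:
        # Backup measure: write the macro out in literal form.
        return "#" + name + "{" + argument + "}"

print(evaluate("figure", figure, "cat.png", True))    # <figure>cat.png</figure>
print(evaluate("figure", figure, "cat.png", False))   # #figure{cat.png}
```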

Context.emit(value) runs the machine into vertical mode and appends the value to the vertical list.

Context.next_group(builder) starts a new horizontal mode that builds with the builder function once finished.

Context.end_group() forces the current mode to stop.

Context.in_cake(name) returns True or False, depending on whether the name is in the stack 'cake'.

The Context.block property retrieves the topmost block from the stack 'cake'. This cake term is explained later.

Context.get_cake() retrieves an object that allows the customization to empty topmost portions of the stack cake.

Context.hcall(group) starts a new context with the same document and the same env. The context starts in horizontal mode and cannot switch to vertical. It returns a horizontal list.

Context.vcall(group) starts a new context with the same document and the same env. The context starts in vertical mode. It returns a vertical list.

Env.hcall(group, document) and Env.vcall(group, document) behave the same as those Context functions, except that you can use these to select the environment and the document. In fact the Context functions are shorthands for context.env.*call(group, context.document).

3. Pretty printing $

Texopic has a pretty printing module that implements the algorithm described by the Stanford University report CS-TR-79-770.

The pretty printing is not a focus of the module, so here's just a sample program that illustrates what it can do:

from texopic.printer import Scanner
from StringIO import StringIO

for margin in [20, 10, 80]:
    scan = Scanner(StringIO(), margin)
    scan("(").left().blank("", 2)
    scan.left()
    scan("hello").blank(" ", forceable=False)("world")
    scan.right()
    scan.blank(", ", 2)
    scan("second").blank(" ", 2)("line")
    scan.blank(" ")(")").right()
    scan.finish()

    print scan.printer.fd.getvalue()

Console output:

cheery@ruttunen:~/Documents/texopic$ python scratch.py 
(
  hello world
  second
  line
)

(
  hello
  world
  second
  line
)

(hello world, second line)

cheery@ruttunen:~/Documents/texopic$

4. HTML module $

The texopic.html module makes valid HTML markup easy to generate.

from texopic import html
body = html.Block([
    html.Node('a',
        ["hello"],
        {
            "href": html.URL("//example.org")
        },
        extra=['disabled'],
        space_sensitive=False, # True if tag is 'pre'
        slash=True # if True end element with />
                   # if False end element with >
    ),
    html.Raw("<script>alert(1);</script>")
])
print html.stringify(body, margin=30)

Output:

cheery@ruttunen:~/Documents/texopic$ python scratch.py 
<a
  href="//example.org"
  disabled>hello
</a><script>alert(1);</script>

4.1. URL validation missing $

Although you can identify URLs in the markup for validation purposes, the Texopic html module doesn't come with URL validation or XSS prevention.

Preventing XSS while generating markup from unsafe sources is a futile attempt, because HTML can be interpreted in vastly different ways today. Whitelisting simply cannot account for yet another markup language strapped on top of HTML that introduces new notation for evaluating code from markup.

Texopic nopes out of doing XSS-prevention for now. It is impossible to do it properly given the current circumstances.

5. Table of contents module $

from texopic.toc import Toc

toc = Toc()

print toc.entry(0, "hello", link=None)
print toc.entry(1, "test")
print toc.entry(0, "world", link="sample")

print toc.data

Output:

cheery@ruttunen:~/Documents/texopic$ python scratch.py 
('1', '1')
('1.1', '1.1')
('2', 'sample')
[('1', '1', 'hello'), ('1.1', '1.1', 'test'), ('2', 'sample', 'world')]
cheery@ruttunen:~/Documents/texopic$
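The numbering seen in the output can be reproduced with a few lines of plain Python. This is a minimal sketch, assuming a hypothetical SketchToc class. It is not the texopic.toc source and only mirrors the behavior visible in the console output above.

```python
class SketchToc(object):
    """Hypothetical re-creation of the numbering shown above."""
    def __init__(self):
        self.counters = []   # one running counter per nesting level
        self.data = []       # (number, link, title) triples

    def entry(self, level, title, link=None):
        # Drop counters of deeper levels, then make sure this level exists.
        del self.counters[level + 1:]
        while len(self.counters) <= level:
            self.counters.append(0)
        self.counters[level] += 1
        number = ".".join(str(n) for n in self.counters)
        if link is None:
            link = number    # the number doubles as the link target
        self.data.append((number, link, title))
        return number, link

toc = SketchToc()
print(toc.entry(0, "hello", link=None))    # ('1', '1')
print(toc.entry(1, "test"))                # ('1.1', '1.1')
print(toc.entry(0, "world", link="sample"))  # ('2', 'sample')
```

Entering a shallower level discards the deeper counters, which is why the entry after 1.1 becomes 2 instead of 1.2.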

6. default_html_env $

from texopic.default_html_env import env
from texopic.generic import Env

env = Env(env)
# add your customizations here.

The default_html_env uses the earlier html module to create HTML document fragments. It implements several useful macros listed below:

#title
#section
#section{link}
#subsection
#subsection{link}
#bold{group}
#comment
#begin{comment}
#begin{itemize}
#begin{enumerate}
#item
#image{url}
#image{url}{alt}
#href{url}
#href{url}{desc}

7. Larger example $

You need to install pygments to run this example.

from texopic.default_html_env import env
from texopic.generic import Env, process, verbatim
from texopic import html
from texopic.toc import Toc
import sys
import texopic

class Document(object):
    def __init__(self):
        self.title = None
        self.description = None
        self.toc = Toc()

template = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
{0}</head>
<body>
{1}</body>
</html>"""

style = """
body { max-width: 75ex }
body > pre { border: 1px solid #cfcfcf; padding: 1em 4ex }
h2       > .ref { visibility: hidden; text-decoration: underline; }
h2:hover > .ref { visibility: visible !important }
h3       > .ref { visibility: hidden; text-decoration: underline; }
h3:hover > .ref { visibility: visible !important }

.sourcetable pre { margin: 0; }
.sourcetable .linenos { padding-left: 1ex; padding-right: 1ex;
    border-right: 1px solid black; }


""".strip()

def main():
    group = texopic.read_file(sys.argv[1])
    document = Document()
    head = html.Block([])
    body = html.Block(env.vcall(group, document))

    if document.title is not None:
        head.append(html.Node('title', [document.title]))
    if document.description is not None:
        head.append(html.Node('meta', None, {
            'name':'description',
            'content':document.description,
        }, slash=False))
    head.append(html.Node('style', [html.Raw(style)]))
    head.append(html.Node('link', None, {
        "rel": "stylesheet",
        "type": "text/css",
        "href": html.URL("pygments-style.css"),
    }))
    print template.format(
        html.stringify(head),
        html.stringify(body))

env = Env(env)
@env.define("#include", 1)
def env_include(context, path):
    path = verbatim(path) # TODO: make source file relative.
    group = texopic.read_file(path)
    process(context, group)

@env.define("#description", 0)
def env_description(context):
    @context.next_group
    def _build_description_(context, group):
        context.document.description = verbatim(group)

from pygments import highlight
from pygments.lexers import get_lexer_by_name
from pygments.formatters import HtmlFormatter
@env.define("#sample", 0)
def env_sample(context):
    @context.next_group
    def _build_sample_(context, group, code=""):
        name = verbatim(group).strip() or "python"
        lexer = get_lexer_by_name(name, stripall=True)
        formatter = HtmlFormatter(linenos=True, cssclass="source")
        context.emit(html.Raw(highlight(code, lexer, formatter)))
    _build_sample_.capture_pre = True

# heheehe.
@env.define("#include_code", 2)
def env_include_python_code(context, lexer_name, path):
    name = verbatim(lexer_name).strip() or "python"
    path = verbatim(path) # TODO: make source file relative?
    with open(path, "r") as fd:
        code = fd.read()
        lexer = get_lexer_by_name(name, stripall=True)
        formatter = HtmlFormatter(linenos=True, cssclass="source")
        context.emit(html.Raw(highlight(code, lexer, formatter)))


if __name__=="__main__":
    main()

8. Macro customization $

By now you likely have a good idea of how to extend Texopic with your own macros. To make it clear, here are a few complete samples of macros in Texopic.

8.1. Ordinary macro $

An ordinary macro just replaces itself with some content and starts a horizontal mode. This happens if you return something from the macro.

@env.define("#italic", 1)
def env_italic(context, group):
    return html.Node('i', group)

The above code handles the #italic{group} macro.

8.2. Segment macro $

Segments are paragraph-level constructs meant for customizing the behavior of horizontal lists. The simplest such construct just writes out a differently formatted horizontal list.

@env.define("#claim", 0)
def env_claim(context):
    @context.next_group
    def _build_claim_(context, group):
        context.emit(html.Node("p", group,
            {"class":"claim"}))

Using a macro such as this #claim always starts a new horizontal mode and creates a horizontal list.

8.3. Stack macros $

@env.define(":enumerate", 0)
def env_begin_itemize(context):
    block = Itemize()
    def _build_(context, cake):
        block.item(cake)
        context.emit(html.Node('ol', block.data))
    return {"build": _build_, "block": block}

@env.define("#item", 0)
def env_item(context):
    if isinstance(context.mode.block, Itemize):
        context.block.item(context.get_cake())
    else:
        raise Suspend()

class Itemize(object):
    def __init__(self):
        self.data = []

    def item(self, cake):
        if cake.is_empty and len(self.data) == 0:
            pass # the first #item
        elif cake.is_group:
            self.data.append(html.Node('li', cake.as_group()))
        else:
            self.data.append(html.Node('li', cake.as_list()))

These macros parse input:

#begin{enumerate}
#item X
#item Y
#item Z
#end{enumerate}

And produce an ordered list (an ol element) with the items X, Y and Z.

This is what the term 'cake stack' refers to. The block macros push a vertical mode and allow you to create nested vertical lists.
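The push and pop behavior can be pictured with a toy, line-based renderer. This is only a sketch under loud assumptions: the render function is hypothetical, it matches whole lines instead of using Texopic's parser, and it expects balanced #begin and #end lines.

```python
def render(lines):
    # Each stack entry holds the output fragments of one open block.
    stack = [[]]
    for line in lines:
        line = line.strip()
        if line == "#begin{enumerate}":
            stack.append([])                      # push a new layer
        elif line == "#end{enumerate}":
            items = stack.pop()                   # pop the finished layer
            stack[-1].append("<ol>" + "".join(items) + "</ol>")
        elif line.startswith("#item "):
            stack[-1].append("<li>" + line[len("#item "):] + "</li>")
        elif line:
            stack[-1].append("<p>" + line + "</p>")
    return "".join(stack[0])

source = """#begin{enumerate}
#item X
#item Y
#item Z
#end{enumerate}"""
print(render(source.splitlines()))
# <ol><li>X</li><li>Y</li><li>Z</li></ol>
```

Because every #begin pushes its own layer, a block opened inside another block collects its items separately and is handed to the enclosing layer only when it ends.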

8.4. Capture preformat block $

@env.define("#sample", 0)
def env_sample(context):
    @context.next_group
    def _build_sample_(context, group, code=""):
        lexer_name = verbatim(group).strip()
        pass # do some formatting for code here.
    _build_sample_.capture_pre = True

This parses input of the form #sample lexer_name ## and is meant for controlling how preformatted blocks are interpreted.

9. Larger website construction with makefiles $

Clean makefiles aren't difficult to write. Get a guide if you don't know how.

all: index.html guide.html

# This is for updating the page. You shouldn't have to do it.
# http://lea.verou.me/2011/10/easily-keep-gh-pages-in-sync-with-master/
sync:
	git checkout gh-pages
	git rebase master
	git push origin gh-pages
	git checkout master

guide.html: guide.text LICENSE.md texopic2html.py

%.html: %.text
	python texopic2html.py $< > $@

Avoid cyclic dependencies and you're all right. In practice this means that a make rule must not depend, directly or indirectly, on its own target.
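What a cyclic dependency looks like can be shown with a small depth-first search over a table of rules. This is a sketch for illustration only: find_cycle is a hypothetical helper, the rules table is a hand-written approximation of the makefile above, and the last rule is deliberately wrong to create a cycle.

```python
def find_cycle(rules):
    """Return a list of targets forming a cycle, or None if there is none."""
    visiting, done = [], set()

    def visit(target):
        if target in done:
            return None
        if target in visiting:
            # Reached a target that is still being processed: a cycle.
            return visiting[visiting.index(target):] + [target]
        visiting.append(target)
        for dependency in rules.get(target, ()):
            cycle = visit(dependency)
            if cycle is not None:
                return cycle
        visiting.pop()
        done.add(target)
        return None

    for target in sorted(rules):
        cycle = visit(target)
        if cycle is not None:
            return cycle
    return None

rules = {
    "all": ["index.html", "guide.html"],
    "guide.html": ["guide.text", "LICENSE.md", "texopic2html.py"],
    "index.html": ["index.text"],
}
print(find_cycle(rules))   # None

# A deliberately broken rule: the source now depends on its own output.
rules["guide.text"] = ["guide.html"]
print(find_cycle(rules))   # ['guide.html', 'guide.text', 'guide.html']
```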

10. Contribution $

As a useful tip, the following command installs this package with a symlink so that changes to the sources are immediately available to the users of the package.

$ pip install -e .

Python's module packaging is a useful link for anyone who wants to make their own packages for Python.

10.1. Optimizations require benchmarks $

Enthusiastic people commonly come to optimize other people's code because they believe certain things are the right thing to do or more efficient.

The outcome of unbenchmarked optimizations is unclear code and no benefits. Therefore you should attach to your commits a benchmark that lets others verify that your optimization works.

10.2. Automatic tests demand explanations $

The programming profession doesn't lack people ready to work just for the sake of labour.

If you write an automated test for something or employ an automated test framework on this code, you are expected to explain what you are doing and why.

11. License $

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.