Texopic typesetting language

Texopic is a typesetting language for civilized and less civilized ages. It is easy to learn and easy to use.

Here is a sample of the Texopic markup:

#title Write your title here

The first paragraph.

#href{https://example.org}{An example link}

#section Fruit list

#begin{enumerate}
#item Orange
#item Banana
#item Watermelon
#end{enumerate}

1. Tools $

Utility library for python (Guide) (Download)

The Lever language has a utility library for handling Texopic files.

2. Specification $

Texopic is a UTF-8 encoded plain text markup and typesetting language.

2.1. Macro syntax $

A macro starts with a hash character (#). It is immediately followed by the name and by arguments enclosed in braces ({ and }).

#name{arg1}{arg2}

The name of a macro must not contain spaces or a semicolon ';'. Otherwise the macro can be spaced apart and even broken across lines, as long as there is no blank line between the arguments.

#name
{arg1} {arg2}
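The rules above can be sketched as a small reader. This is only an illustration, not the code of any Texopic implementation: the function name parse_macro is hypothetical, and excluding braces and '#' from macro names is an assumption the specification does not state.

```python
import re

# Hypothetical helper, not part of any Texopic implementation.
# Assumption: besides spaces and ';', names also exclude braces and '#'.
MACRO_HEAD = re.compile(r"#([^\s;{}#]+)")

def parse_macro(text, pos=0):
    """Read '#name{arg}...' at text[pos]; return (name, args, end) or None."""
    m = MACRO_HEAD.match(text, pos)
    if m is None:
        return None
    name, i, args = m.group(1), m.end(), []
    while True:
        j = i
        # Skip spaces and line breaks before the next argument.
        while j < len(text) and text[j] in " \t\n":
            j += 1
        # Two line breaks in the gap mean a blank line, which ends the macro.
        if text[i:j].count("\n") > 1 or j >= len(text) or text[j] != "{":
            return name, args, i
        # Read one braced argument, honoring nested pairs.
        depth, k = 0, j
        while k < len(text):
            if text[k] == "{":
                depth += 1
            elif text[k] == "}":
                depth -= 1
                if depth == 0:
                    break
            k += 1
        if depth != 0:
            return name, args, i  # unpaired brace: stop before this argument
        args.append(text[j + 1:k])
        i = k + 1
```

With this sketch, both "#name{arg1}{arg2}" and the two-line form above yield the name "name" with the arguments "arg1" and "arg2", while a blank line before the first brace yields no arguments at all.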

A macro requires a blank, or one of '{', '}' or ';', before the hash character (#). The following sequences do not form macros:

(#test {macro})
hello#bold{some text}
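As a rough sketch, the check for the preceding character could look like this. The function name can_start_macro is hypothetical, and treating the start of the input as a blank is an assumption.

```python
def can_start_macro(text, pos):
    """True if the '#' at text[pos] may begin a macro."""
    if text[pos] != "#":
        return False
    if pos == 0:
        return True  # assumption: the start of the input counts as a blank
    # A blank, a brace or a semicolon must come right before the hash.
    return text[pos - 1] in " \t\n{};"
```

Under this check neither of the two sequences above starts a macro, because the hash is preceded by '(' in the first and by a letter in the second.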

The semicolon ';' can be used to start or terminate a macro. In these cases the macro consumes the semicolon and it is not displayed in the output. Here are some examples of the use:

hello-;#bold{in-middle-of}-word
test;#3A;character
#a{b}; {c}

If you want a semicolon after a macro, then type two semicolons.

2.2. Use of braces $

Braces '{' and '}' can be freely used as long as they are not inside macro arguments. In the macro arguments they are allowed but are also required to pair.
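The pairing requirement for a macro argument can be checked with a simple counter. A minimal sketch, with the hypothetical name braces_paired:

```python
def braces_paired(argument):
    """True if every '{' in a macro argument has a matching '}'."""
    depth = 0
    for ch in argument:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False  # a '}' appeared before its '{'
    return depth == 0
```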

2.3. Preformatted text blocks $

To keep Texopic from formatting text, for example when writing short code snippets or embedding something in JSON, you can use the ## -macro. When the ## appears at the end of a line, it consumes the subsequent blank lines and the lines that are indented deeper than the line on which the ## appeared. Here are a few short examples:

#code python ##
    print("hello world")

#meta ##
    {"language": "en",
     "flavor": "article",
     
     "day": "18.3.2017"}

A preformatted text block cannot appear inside a macro argument, and it terminates any macro.
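The capture rule can be sketched over a list of lines. The function name capture_pre_block is hypothetical. Two assumptions are made here: indentation is counted in leading spaces only, and trailing blank lines are consumed but left out of the returned block.

```python
def capture_pre_block(lines, index):
    """Given the index of a line ending in '##', return (block, next_index)."""
    head = lines[index]
    if not head.rstrip().endswith("##"):
        raise ValueError("line does not end with ##")
    base = len(head) - len(head.lstrip(" "))
    block, i = [], index + 1
    while i < len(lines):
        line = lines[i]
        if line.strip() == "":
            block.append("")        # blank lines are consumed
        elif len(line) - len(line.lstrip(" ")) > base:
            block.append(line)      # deeper indentation belongs to the block
        else:
            break                   # first line at or below the base indent
        i += 1
    # Assumption: trailing blank lines are consumed but not kept.
    while block and block[-1] == "":
        block.pop()
    return block, i
```

For the #meta example above, this returns the four lines of the JSON text (including the blank one in the middle) and the index of the first line after the block.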

2.4. Paragraph/Segment breaks $

A blank line, or a line consisting only of spaces, forms a paragraph break, also called a segment break. Like the preformatted text block, it cannot appear inside macro arguments, and it also terminates all macros.

The lack of preformatted text blocks and paragraph breaks inside macros makes the language easier to interpret by removing several annoying, ugly and rare fringe cases. It also lets a missing-brace syntax error be cut off early.
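Because a break can never be hidden inside a macro, splitting a text at its breaks needs no knowledge of the macros in it. A minimal sketch, with the hypothetical name split_on_breaks; it deliberately ignores preformatted text blocks, which may contain blank lines of their own.

```python
import re

# One line break followed by one or more blank (or space-only) lines.
BREAK = re.compile(r"\n(?:[ \t]*\n)+")

def split_on_breaks(text):
    """Split text at paragraph breaks; preformatted blocks are not handled."""
    return [part for part in BREAK.split(text.strip("\n")) if part.strip()]
```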

2.5. Verbatim or canonical form $

Texopic forms are canonicalized upon parsing. The canonicalization does the following:

Such a verbatim form is retrieved as a string, and it is used whenever a macro argument must be interpreted as a variable or link resource.

2.6. Segments $

Top-down, a Texopic document consists of segments, groups and pre-blocks.

A segment is a line of text. The segment may be tagged with a macro. For example:

#title Title text

If a segment is not tagged then it is a paragraph. A tagged segment may capture a preformatted block, like this:

#macro segment text ##
    preformatted block

Segments cannot contain paragraph breaks. Any paragraph break also breaks a segment.

Whether a macro can tag a segment or capture a preformatted block depends on the environment.

2.7. Groups $

A group is a larger macro construct. It has a begin marker and an end marker, and one or more separators between them. Example:

#begin{enumerate}
First item.
#item
Second item.
#item
Third item.
#end

The #begin marker may take one or more arguments, and it is rewritten into a pseudo-macro. For example, the above #begin{enumerate} turns into #enumerate.

The end marker may be either implicit (#end) or explicit (#end{enumerate}). The explicit form is favored if the group spans more than 10 lines.

Whether a macro behaves as a separator depends on the environment.
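A line-based sketch of how a group might be cut into pieces at its separators follows. The function name split_group is hypothetical. It assumes that each marker and separator stands at the start of its own line, and it takes the separator names as a parameter instead of reading them from an environment.

```python
import re

BEGIN = re.compile(r"#begin\{([^{}]*)\}")
END = re.compile(r"#end(?:\{([^{}]*)\})?$")

def split_group(lines, separators=("item",)):
    """lines runs from '#begin{name}' to '#end' or '#end{name}', inclusive."""
    m = BEGIN.match(lines[0])
    if m is None:
        raise ValueError("group must start with #begin{name}")
    name = m.group(1)
    end = END.match(lines[-1].strip())
    if end is None or end.group(1) not in (None, name):
        raise ValueError("group must end with #end or #end{%s}" % name)
    chunks = [[]]
    for line in lines[1:-1]:
        word, _, rest = line.strip().partition(" ")
        if word.startswith("#") and word[1:] in separators:
            chunks.append([rest] if rest else [])  # a separator opens a new chunk
        else:
            chunks[-1].append(line)
    # The begin marker is reported as its pseudo-macro, e.g. '#enumerate'.
    return "#" + name, ["\n".join(chunk) for chunk in chunks]
```

For the example above this gives "#enumerate" with the chunks "First item.", "Second item." and "Third item.". When the group opens directly with a separator, as in the fruit list at the top of this document, the first chunk is an empty string.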

2.8. Environments $

The above descriptions of a segment and a group do not describe which macros form segments or groups. For this purpose we have environment descriptions.

An environment description describes which rules our macros should follow. Here's an example of one written in JSON:

{
    "segments": {
        "title/0":   {"capture": false},
        "section/0": {"capture": false},
        "section/1": {"capture": false},
        "code/0":    {"capture": true}
    },
    "groups": {
        "itemize/0":   {"separators": ["item/0"]},
        "enumerate/0": {"separators": ["item/0"]}
    }
}

The purpose of the environment description is to help the Texopic parser determine how a macro is interpreted on top level.

The slash '/' and the number following it describe how many argument groups the macro must have in order to match. For example, 'section/1' refers to a macro of the form #section{argument}.

To match against the group table, the first argument of the #begin macro is treated as a verbatim string. Then the first argument and the remaining arguments are used to match into the table. For example, 'itemize/0' matches #begin{itemize}. If it matches, the match activates the separators given to the group.

The description may eventually also describe how the elements are laid out.
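The lookup described above can be sketched as follows. The function names macro_key and classify are hypothetical, and the sketch assumes that the number after the slash for a group counts the #begin arguments that remain after the first one.

```python
import json

# A copy of the example environment description.
ENV = json.loads("""
{
    "segments": {
        "title/0":   {"capture": false},
        "section/0": {"capture": false},
        "section/1": {"capture": false},
        "code/0":    {"capture": true}
    },
    "groups": {
        "itemize/0":   {"separators": ["item/0"]},
        "enumerate/0": {"separators": ["item/0"]}
    }
}
""")

def macro_key(name, argc):
    """Build a table key such as 'section/1'."""
    return "%s/%d" % (name, argc)

def classify(env, name, args):
    """Return ('segment', key, capture), ('group', key, separators) or None."""
    if name == "begin" and args:
        # The first argument names the group; the rest are counted.
        key = macro_key(args[0].strip(), len(args) - 1)
        if key in env["groups"]:
            return "group", key, env["groups"][key]["separators"]
        return None
    key = macro_key(name, len(args))
    if key in env["segments"]:
        return "segment", key, env["segments"][key]["capture"]
    return None
```

Here classify(ENV, "section", ["intro"]) finds 'section/1', classify(ENV, "begin", ["itemize"]) finds the group 'itemize/0' with the separator 'item/0', and a macro such as #bold{x} is not found in either table.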

2.9. Explicit form $

I foresee that someone may want to use explicit forms. I may even personally want to eliminate implicit interpretations later on from the language. Therefore I provide some forward-compatible measures now.

#:title Write your title here

The first paragraph.

#href{https://example.org}{An example link}

#:section Fruit list

#begin{enumerate}
#.item Orange
#.item Banana
#.item Watermelon
#end{enumerate}

#!code python ##
    print("hello")

The symbols ':', '.', '!' starting a name provide an explicit cue to how the macro should be treated. The ':' marks the start of an ordinary segment. The '!' tells that the segment captures a preformatted block. The '.' tells that the macro is a separator.
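Reading the cue off a macro name takes only a few lines. A minimal sketch; the function name explicit_cue and the role labels it returns are hypothetical.

```python
# Hypothetical role labels for the three cue symbols.
EXPLICIT_CUES = {
    ":": "segment",
    "!": "segment with preformatted block",
    ".": "separator",
}

def explicit_cue(name):
    """Split a macro name into (role, bare_name); role is None if implicit."""
    if name[:1] in EXPLICIT_CUES:
        return EXPLICIT_CUES[name[0]], name[1:]
    return None, name
```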

I expect that the use of the Texopic format will be always slightly use-specific.

2.10. Character escapes $

Texopic processors are expected to accept UTF-8 text. They can also recognize two forms of character escape macros: #XX and #U+XXXX. You can use both of these forms to represent Unicode characters that get messed up in the editor.

The 'X' letters in the above forms are hexadecimal digits. The hexadecimal number should refer to a valid Unicode character.

#30 alias for zero '0'
#3D alias for equal sign '='
#U+20AC alias for euro sign '€'.

Reading a Texopic file and writing it to another Texopic file must retain the escaped characters as macros. When converting out of the Texopic language, however, it is up to the author to convert the escapes to their respective notation in the target language. It is preferred to keep escaped characters in escaped form if there is no reason to change them.
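When converting out of Texopic, an escape macro name can be turned into its character roughly like this. The function name decode_escape is hypothetical. Accepting lowercase hexadecimal digits and up to six digits after 'U+' are assumptions; the text above only shows uppercase digits and the four-digit form.

```python
import re

# Matches the name part of an escape macro, i.e. the text after the '#'.
ESCAPE_NAME = re.compile(r"(?:U\+([0-9A-Fa-f]{4,6})|([0-9A-Fa-f]{2}))\Z")

def decode_escape(name):
    """Return the character for names like '3A' or 'U+20AC', else None."""
    m = ESCAPE_NAME.match(name)
    if m is None:
        return None
    code = int(m.group(1) or m.group(2), 16)
    if code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
        return None  # outside the range of valid Unicode characters
    return chr(code)
```

With this sketch, the names "30", "3D" and "U+20AC" decode to '0', '=' and the euro sign, and an ordinary macro name such as "title" is left alone.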

2.11. URL handling $

URLs in Texopic should be prefixed with the #url macro. Automatic recognition of URLs is impractical to solve in a way that covers every valid URL you could put into the text.

2.12. Recognized macros $

This is a quick summary of what is recognized by the Texopic HTML processor.

#title
##
#url{link}
#href{link}{description}
#image{link}
#section
#section{id}
#subsection
#subsection{id}
#begin{itemize}
#end{itemize}
#begin{enumerate}
#end{enumerate}
#item
#end
#comment
#begin{comment}
#end{comment}
  1. 'title/0' starts a title segment.
  2. 'url/1' creates a hyperlink, an 'url'.
  3. 'href/2' creates a hyperlink over a descriptive text.
  4. 'image/1' starts an image segment.
  5. 'section/0' starts a new section segment.
  6. 'section/1' starts a new section segment with a link.
  7. 'subsection/0' and 'subsection/1' are similar to 'section', but start subsections.
  8. 'itemize/0' and 'enumerate/0' groups form lists. Either item lists or enumerative lists. They use 'item/0' as a separator.
  9. 'comment/0' is either a segment or a group. It marks a segment or text that is left out of the documents generated from the text.

The macros try to follow conventions present in LaTeX.

Additionally, for HTML output there is a #css -macro. You can use it like this:

#css ##
    body { padding: 0; margin: 0; }

2.13. Error handling $

The Texopic parser must not refuse to parse incorrect syntax. Whenever a parser encounters a syntax error, it is required to produce a 'syntax_error/1' macro at that point. For example, if we miss a brace like this:

#bold{Bold failure

Nothing simple.

If this passes through a parser, the parser should produce:

#bold{Bold failure;#syntax_error{brace missing}}

Nothing simple.

Note that the parser output must be always correct. Therefore the missing brace has been added to the point where it was detected that it was missing.
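The repair shown above can be imitated with a simplified sketch. The function name close_open_braces is hypothetical. It counts every '{' in a segment as if it belonged to a macro argument, which is stricter than the brace rules given earlier, and it emits a single error marker followed by one closing brace for each brace still open; both are assumptions.

```python
import re

# Paragraph breaks, captured so they can be put back unchanged.
BREAK = re.compile(r"(\n(?:[ \t]*\n)+)")

def close_open_braces(text):
    """Close braces left open at a paragraph break and mark the error."""
    parts = BREAK.split(text)
    for n in range(0, len(parts), 2):  # even indices hold the segments
        depth = 0
        for ch in parts[n]:
            if ch == "{":
                depth += 1
            elif ch == "}" and depth > 0:
                depth -= 1
        if depth > 0:
            parts[n] += ";#syntax_error{brace missing}" + "}" * depth
    return "".join(parts)
```

Applied to the faulty input above, this returns the corrected text shown, with the paragraph that follows left untouched.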

3. Attained design objectives $

3.1. Supports writer's workflow $

Stays in the background while the author concentrates on writing her document. The author only needs to worry about the use of one meta-character (#) while working on the text.

3.2. Can be generated $

Documents can be modified and generated by a computer. This is useful if you have to maintain up-to-date programming language documentation or publish a report that is based on data collected by computers.

3.3. Interoperable with other formats $

Can embed other plaintext files inside itself. Can reference files in a directory or hyperlink to external resources.

3.4. Supports direct rendering to the screen $

Has a clean structure when loaded into the computer's memory. Can be used directly to print on the screen. Conversion to other formats is not needed.

3.5. Supports WYSIWYG software $

Supports plaintext editing but does not force you into it. A WYSIWYG editor can be written over the markup.

3.6. Tolerant $

Formatting errors are marked into the output and do not interrupt the parsing. Missing a brace has only a local effect that does not mess up the whole representation of a file.

3.7. Extensible $

Can be extended to support additional structure. There is a standard method to build a generator which makes it straightforward to implement new generators or extensions.

A generator is a program that converts a Texopic file to another format. There are matching and pretty printing utilities inside the Texopic format to carry out this task.

How to build a generator will be documented in each implementation of Texopic.

4. Comparison with similar languages $

4.1. XML & HTML $

The XML & HTML formats share the closest resemblance and have similar goals. They have more fringe cases and are harder to work with. These formats attempt to be general and are therefore flawed in all of the use cases they have. Texopic is not suitable for replacing XML in all of its use cases, but should work better for written documents.

4.2. TeX $

TeX was the inspiration for Texopic. Its limitations are that the TeX interpreter can be triggered by almost any character and that, because of how it is constructed, you cannot expect a text processing tool to read a TeX file and then write it back.

4.3. Markdown $

Markdown is a bulletin-board format that has ended up in wide use everywhere. It has a focus on readability. Markdown can be extended, but extensions easily become complex and interact unfavorably with other extensions. Markdown only translates well to HTML.

4.4. reStructuredText $

reStructuredText is the Python language's documentation format. It has a lot of similarities to the Markdown format, although it is slightly better thought out. Where XML & HTML are too generic, reStructuredText and Markdown are too specific. This limits what you can do with them.

5. History $

Texopic started out during an assessment of TeX for documentation in Lever.

For Lever I wanted reference documentation to be written outside of the source files. The code would be annotated to refer to the reference, rather than the other way around.

Lever documentation flows upwards from the source code. This means that aside from reference, there is internal documentation that is sectioned by source files. That internal documentation is currently in text files, which felt very natural to write. I plan to section internal documentation into chapters that still are sectioned by source files. This internal documentation forms the basis for higher layers of documentation.

I prefer that the references, guides and all the other material be available on the website as well as in the runtime. To do this I need a format that a documentation system in the runtime could lay out directly, and that could be translated into highly linked, high quality HTML files.

Many of the formats designed for this purpose were unclear, and it was hard to figure out what kind of notation they use. They also appeared to be difficult to customize.

With Texopic I can completely customize my documentation generation.

6. Trivia $

While inspired by TeX, Texopic is neither TeX nor a superset of TeX.

The name Texopic was born as a wordplay on "TeX by Topic". The author thinks it sounds a bit like Aztec.

A small Python script, texopic.py, was the first instance of a readable typesetting format that could be used for documentation that meets high standards. Later on, it became texopic2html.py.

This file has been generated from index.text with the following Linux command:

python texopic2html.py index.text > index.html

7. License $

MIT License

Copyright (c) 2016 Henri Tuhola

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.