#title Texopic typesetting language

Texopic is a typesetting language for civilized, and less
civilized ages. It is easy to learn and easy to use.

Here is a sample of the Texopic markup:

##
    #title Write your title here

    The first paragraph.

    #href{https://example.org}{An example link}

    #section Fruit list

    #begin{enumerate}
    #item Orange
    #item Banana
    #item Watermelon
    #end{enumerate}


#section{tools} Tools

#href{https://github.com/cheery/texopic}
{Utility library for python}
#href{guide.html}{(Guide)}
#href
{https://github.com/cheery/texopic/archive/master.zip}
{(Download)}

#href{//leverlanguage.com}
{Lever language} has a library utility for handling texopic
files.


#section Specification

Texopic is an UTF-8 encoded plain text markup and
typesetting language.

#subsection Macro syntax

A macro starts with a hash character (#). It is immediately
followed by the name and arguments closed into braces ({
and }).

##
    #name{arg1}{arg2}

The name of a macro must not contain spaces or semicolon
';', otherwise the macro can be spaced apart and even broken
into newlines as long as there is not a blank line between
the arguments.

##
    #name
    {arg1} {arg2}

Macro requires that there is blank or one of the
'{', '}' or ';' before the hash character (#). The following
sequences do not form macros:

##
    (#test {macro})
    hello#bold{some text}

The semicolon ';' can be used to start or terminate a macro.
In these cases the macro consumes the semicolon and it is
not displayed in the output. Here are some examples of the
use:

##
    hello-;#bold{in-middle-of}-word
    test;#3A;character
    #a{b}; {c}

If you want a semicolon after a macro, then type two
semicolons.

#subsection Use of braces

Braces '{' and '}' can be freely used as long as they are
not inside macro arguments. In the macro arguments they are
allowed but are also required to pair.

#subsection Preformatted text blocks

To not let Texopic format text, for example when writing
short code snippets or embed something in JSON, you can use
the ##; -macro. When the ##; appears on the end of a line it
will consume the subsequent blank lines and the lines that
are indented higher than the line where the ##; appeared on.
Here are few short examples:

##
    #code python ##
        print("hello world")

    #meta ##
        {"language": "en",
         "flavor": "article",
         
         "day": "18.3.2017"}

Preformatted text block cannot appear inside a macro
argument and it terminates any macro.

#subsection Paragraph/Segment breaks

Blank line only consisting of spaces forms a paragraph, or a segment
break. Similarly to the preformatted text block it cannot
appear inside macros as arguments and it also terminates all
macros.

The lack of preformatted text blocks and paragraphs breaks
inside macros makes the language easier to interpret by
removing several annoying, ugly and rare fringe cases. It
also allows to terminate the missing brace syntax errors
early on.

#subsection Verbatim or canonical form

Texopic forms are canonicalized upon parsing. The
canonicalization does the following:

#begin{itemize}
#item The newlines not forming paragraph breaks are
rewritten as spaces.
#item Multiple consecutive spaces are collapsed into single space.
#end{itemize}

Such verbatim is retrieved as a string and it is used
whenever a macro argument must be interpreted as a variable
or link resource.

#subsection Segments

Top-down, the Texopic document consists of segments, groups
and pre -blocks.

Segment is a line of text. The segment may be tagged with a
macro. For example:

##
    #title Title text

If a segment is not tagged then it is a paragraph. A tagged
segment may capture a preformatted block, like this:

##
    #macro segment text ##
        preformatted block

Segments cannot contain paragraph breaks. Any paragraph
break also breaks a segment.

Whether macro can tag a segment or capture a preformatted
block depends on the environment.

#subsection Groups

Group is a larger macro constructs, it has a marker for
begin and end, and one or more separators between it.
Example:

##
    #begin{enumerate}
    First item.
    #item
    Second item.
    #item
    Third item.
    #end

The begin may contain one or more arguments and it will be
reformatted into a pseudo-macro. For example the above
#;begin{enumerate} turns into #;enumerate.

The end marker may be either implicit (#end) or explicit
(#end{enumerate}). The explicit form is favored if the group
spans more than 10 lines.

Whether a macro behaves as a separator depends on the
environment.

#subsection Environments

The above descriptions of a segment and a group do not
describe which macros form segments or groups. For this
purpose we have environment descriptions.

Environment description describes which rules our macros
should follow. Here's an example of one written in json:

##
    {
        "segments": {
            "title/0":   {"capture": false},
            "section/0": {"capture": false},
            "section/1": {"capture": false}
            "code/0":    {"capture": true}
        },
        "groups": {
            "itemize/0":   {"separators": ["item/0"]},
            "enumerate/0": {"separators": ["item/0"]}
        }
    }

The purpose of the environment description is to help the
Texopic parser determine how a macro is interpreted on
top level.

The slash '/' and number following in the text
describes how many argument groups the macro must have in
order to match. For example. The 'section/1' refers to a
macro of form #;section{argument}

To match from the group table, the first argument of the
#;begin macro is treated as verbatim string. Then the first
argument and the remaining arguments are used to match into
the table. For example. The 'itemize/0' matches with
#;begin{itemize}, If it matches, the matching activates the
separators given to the group.

The description may eventually also describe how the
elements are layouted.

#subsection Explicit form

I foresee that someone may want to use explicit forms. I may
even personally want to eliminate implicit interpretations
later on from the language. Therefore I provide some
forward-compatible measures now.

##
    #:title Write your title here

    The first paragraph.

    #href{https://example.org}{An example link}

    #:section Fruit list

    #begin{enumerate}
    #.item Orange
    #.item Banana
    #.item Watermelon
    #end{enumerate}

    #!code python ##
        print("hello")

The symbols ':', '.', '!' starting a name provide explicit
cue to how the macro should be treated. The ':' tells a
start of an ordinary segment. The '!' tells that the segment
captures a preformatted block. The '.' tells that the macro
is a separator.

I expect that the use of the Texopic format will be always
slightly use-specific.

#subsection Character escapes

Texopic processors are expected to accept UTF-8 text. But
they can also recognize two forms of character escape macros:
#;XX and #;U+XXXX. You can use both of these formats to
represent unicode characters that mess up in the editor.

The 'X' letters in the above formats are hexadecimal
characters. The hexadecimal should refer to a valid unicode
character.

##
    #30 alias for zero '0'
    #3D alias for equal sign '='
    #U+20AC alias for euro sign '€'.

Read/write from texopic file to another texopic file must
retain the escape characters as macros. Though, when
converting out of texopic language it is up to the author
convert the escapes to their respective notation in the
target language. It is preferred to keep escaped characters
as escaped form if there is no reason to change.

#subsection URL handling

URLs in Texopic should be prefixed with #;url macro word.
The automatic recognition of URLs is impractical
to solve in such way that it covers any and every valid URL
you could pass into the text.

#subsection Recognized macros

This is a quick summary of what is recognized by the
#href{https://github.com/cheery/texopic}{Texopic html processor}.

##
    #title
    ##
    #url{link}
    #href{link}{description}
    #image{link}
    #section
    #section{id}
    #subsection
    #subsection{id}
    #begin{itemize}
    #end{itemize}
    #begin{enumerate}
    #end{enumerate}
    #item
    #end
    #comment
    #begin{comment}
    #end{comment}

#begin{enumerate}
#item 'title/0' starts a title segment.
#item 'url/1' creates a hyperlink, an 'url'.
#item 'href/2' creates a hyperlink over a descriptive text.
#item 'image/1' starts an image segment.
#item 'section/0' starts a new section segment.
#item 'section/1' starts a new section segment with a link.
#item 'subsection' are similar to 'section', but treat subsections.
#item 'itemize/0' and 'enumerate/0' groups form lists.
Either item lists or enumerative lists. They use 'item/0' as
a separator.
#item 'comment/0' is either a segment or a group. It is
meant to specify segment or text that is left out from
the documents that will be generated from the text.
#end{enumerate}

The macros try to follow conventions present in LaTeX.

Additionally, on HTML output you have a #css -macro, you can
use it like this:

##
    #css ##
        body { padding: 0; margin: 0; }

#subsection Error handling

The Texopic parser must not refuse parsing incorrect syntax. 
Whenever a parser encounters a syntax error, it is required
to produce a macro 'syntax_error/1' on that point. For
example, if we miss a brace like this:

##
    #bold{Bold failure

    Nothing simple.

If this passes through a parser, the parser should produce:

##
    #bold{Bold failure;#syntax_error{brace missing}}

    Nothing simple.

Note that the parser output must be always correct.
Therefore the missing brace has been added to the point
where it was detected that it was missing.


#section Attained design objectives

#subsection Supports writer's workflow

Stays on the back when the author concentrates on
writing her document. The author only needs to worry about
use of one meta-character (#) while working on the text.

#subsection Can be generated

Documents can be modified & generated by computer. This
comes useful if you have to maintain up-to-date
programming language documentation or publish a report
that is based on data collected by computers.

#subsection Interoperable with other formats

Can embed other plaintext files inside itself. Can
reference to directory files or hyperlink to external
resources.

#subsection Supports direct rendering to the screen

Has a clean structure when loaded to computer's
memory. Can be used directly to print on the screen.
Conversion to other formats is not needed.

#subsection Supports WYSIWYG software

Supports plaintext editing but doesn't enforce you to it.
A WYSIWYG editor can be written over the markup.

#subsection Tolerant

Formatting errors are marked into the output and do not
interrupt the parsing. Missing a brace has only a local
effect that does not mess up the whole representation of a
file. 

#subsection Extensible

Can be extended to support additional structure. There is a
standard method to build a generator which makes it
straightforward to implement new generators or extensions.

Generator is a program that converts from Texopic file to an
another format. There are matching and pretty printing
utilities inside Texopic format to carry through this task.

How to build a generator will be documented in each
implementation of Texopic.


#section Comparison with similar languages

#subsection XML & HTML

XML & HTML formats share the closest resemblence and have
similar goals. They have more fringe cases and are harder to
work with. These formats attempt to be general and are
therefore flawed in all of the usecases they have. Texopic
is not suitable for replacing XML in all of its usecases,
but should work better for written documents.

#subsection TeX

TeX formed inspiration for Texopic. The limitations are that
TeX interpreter can get triggered from about any character
and because of how it is constructed you cannot expect a text
processing tool to read a TeX file and then write it back.

#subsection Markdown

Markdown is a bullettin-board format that has ended up to
wide use everywhere. It has a focus on readability. Markdown
can be extended but extensions easily become complex and
interact unfavorably with other extensions. Markdown only
translates well to HTML.

#subsection reStructuredText

reStructuredText is Python language's documentation format.
It has lot of similarities to Markdown format although it is
slightly better thought out. Where XML & HTML are too
generic, the reStructuredText and Markdown are too specific.
This limits what you do with them.


#section History

Texopic started out during an 
#href {//github.com/cheery/lever/blob/master/documentation_considerations.tex}
{assessment} of TeX for documentation in
#href{//leverlanguage.com} {Lever}.

For Lever I wanted reference documentation to be
written outside of the source files. The code would be
annotated to refer on the reference, rather than the other
way around.

Lever documentation flows upwards from the source code. This
means that aside from reference, there is internal documentation
that is sectioned by source files. That internal
documentation is currently in text files, which felt very
natural to write. I plan to section internal documentation
into chapters that still are sectioned by source files. This
internal documentation forms the basis for higher layers of
documentation.

I prefer that the references, guides and all the other
material would be available on the website as well as in the
runtime. To do this I need a format that a documentation
system in runtime could layout directly, and that could be
translated into highly linked, high quality HTML files.

Many of the files designed for this purpose was unclear and
it was hard to figure out what kind of notation they use.
They appeared to be also difficult to customize.

With Texopic I can completely customize my documentation
generation.

#section Trivia

While inspired by TeX, Texopic is neither TeX nor a
superset of TeX.

Texopic name born as a wordplay from "TeX by Topic". Author
thinks it sounds a bit like Aztec.

A small python script, texopic.py
was the first instance of a readable typesetting format that
could be used to documentation that matches to high standards.
Later on, it became the
#href{//github.com/cheery/texopic/blob/master/texopic2html.py}{texopic2html.py}

This file has been generated from #url{index.text} with the
following linux command:

##
    python texopic2html.py index.text > index.html

#section{license} License

#include{LICENSE.md}