#title Texopic typesetting language

Texopic is a typesetting language for civilized and less civilized ages. It is easy to learn and easy to use.

Here is a sample of the Texopic markup:

##
    #title Write your title here

    The first paragraph.
    #href{https://example.org}{An example link}

    #section Fruit list

    #begin{enumerate}
    #item Orange
    #item Banana
    #item Watermelon
    #end{enumerate}

#section{tools} Tools

#href{https://github.com/cheery/texopic}
{Utility library for python}
#href{guide.html}{(Guide)}
#href
{https://github.com/cheery/texopic/archive/master.zip}
{(Download)}

#href{//leverlanguage.com}
{Lever language} has a library utility for handling texopic files.

#section Specification

Texopic is a UTF-8 encoded plain text markup and typesetting language.

#subsection Macro syntax

A macro starts with a hash character (#). It is immediately followed by the name and by arguments enclosed in braces ({ and }).

##
    #name{arg1}{arg2}

The name of a macro must not contain spaces or a semicolon ';'. Otherwise the macro can be spaced apart and even broken across lines, as long as there is no blank line between the arguments.

##
    #name
    {arg1}
    {arg2}

A macro requires a blank, or one of '{', '}' or ';', before the hash character (#). The following sequences do not form macros:

##
    (#test {macro})
    hello#bold{some text}

The semicolon ';' can be used to start or terminate a macro. In these cases the macro consumes the semicolon and it is not displayed in the output. Here are some examples of its use:

##
    hello-;#bold{in-middle-of}-word
    test;#3A;character
    #a{b}; {c}

If you want a semicolon after a macro, then type two semicolons.

#subsection Use of braces

Braces '{' and '}' can be used freely as long as they are not inside macro arguments. Inside macro arguments they are allowed, but they are required to pair.

#subsection Preformatted text blocks

To keep Texopic from formatting text, for example when writing short code snippets or embedding something in JSON, you can use the ##; -macro.
When the ##; appears at the end of a line, it consumes the subsequent blank lines and the lines that are indented deeper than the line on which the ##; appeared. Here are a few short examples:

##
    #code python ##
        print("hello world")

    #meta ##
        {"language": "en", "flavor": "article", "day": "18.3.2017"}

A preformatted text block cannot appear inside a macro argument, and it terminates any macro.

#subsection Paragraph/Segment breaks

A blank line consisting only of spaces forms a paragraph break, or a segment break. Like the preformatted text block, it cannot appear inside macro arguments, and it also terminates all macros.

The lack of preformatted text blocks and paragraph breaks inside macros makes the language easier to interpret by removing several annoying, ugly and rare fringe cases. It also lets a missing-brace syntax error be terminated early on.

#subsection Verbatim or canonical form

Texopic forms are canonicalized upon parsing. The canonicalization does the following:

#begin{itemize}
#item The newlines not forming paragraph breaks are rewritten as spaces.
#item Multiple consecutive spaces are collapsed into a single space.
#end{itemize}

Such verbatim is retrieved as a string, and it is used whenever a macro argument must be interpreted as a variable or link resource.

#subsection Segments

Top-down, the Texopic document consists of segments, groups and pre-blocks.

A segment is a line of text. The segment may be tagged with a macro. For example:

##
    #title Title text

If a segment is not tagged, then it is a paragraph.

A tagged segment may capture a preformatted block, like this:

##
    #macro segment text ##
        preformatted block

Segments cannot contain paragraph breaks. Any paragraph break also breaks a segment.

Whether a macro can tag a segment or capture a preformatted block depends on the environment.

#subsection Groups

A group is a larger macro construct. It has a marker for its begin and its end, and one or more separators between them. Example:

##
    #begin{enumerate}
    First item.
    #item Second item.
    #item Third item.
    #end

The begin may contain one or more arguments, and it will be reformatted into a pseudo-macro. For example, the above #;begin{enumerate} turns into #;enumerate.

The end marker may be either implicit (#end) or explicit (#end{enumerate}). The explicit form is favored if the group spans more than 10 lines.

Whether a macro behaves as a separator depends on the environment.

#subsection Environments

The above descriptions of a segment and a group do not describe which macros form segments or groups. For this purpose we have environment descriptions.

An environment description states which rules our macros should follow. Here's an example of one written in JSON:

##
    {
        "segments": {
            "title/0": {"capture": false},
            "section/0": {"capture": false},
            "section/1": {"capture": false},
            "code/0": {"capture": true}
        },
        "groups": {
            "itemize/0": {"separators": ["item/0"]},
            "enumerate/0": {"separators": ["item/0"]}
        }
    }

The purpose of the environment description is to help the Texopic parser determine how a macro is interpreted on the top level.

The slash '/' and the number following it describe how many argument groups the macro must have in order to match. For example, 'section/1' refers to a macro of the form #;section{argument}.

To match against the group table, the first argument of the #;begin macro is treated as a verbatim string. Then the first argument and the remaining arguments are used to match into the table. For example, 'itemize/0' matches #;begin{itemize}. If it matches, the match activates the separators given to the group.

The description may eventually also describe how the elements are laid out.

#subsection Explicit form

I foresee that someone may want to use explicit forms. I may even personally want to eliminate implicit interpretations from the language later on. Therefore I provide some forward-compatible measures now.

##
    #:title Write your title here

    The first paragraph.
    #href{https://example.org}{An example link}

    #:section Fruit list

    #begin{enumerate}
    #.item Orange
    #.item Banana
    #.item Watermelon
    #end{enumerate}

    #!code python ##
        print("hello")

The symbols ':', '.' and '!' at the start of a name give an explicit cue to how the macro should be treated. The ':' marks the start of an ordinary segment. The '!' marks a segment that captures a preformatted block. The '.' marks the macro as a separator.

I expect that the use of the Texopic format will always be slightly use-specific.

#subsection Character escapes

Texopic processors are expected to accept UTF-8 text. They can also recognize two forms of character escape macros: #;XX and #;U+XXXX. You can use either of these formats to represent Unicode characters that get mangled in the editor.

The 'X' letters in the above formats are hexadecimal digits. The hexadecimal value should refer to a valid Unicode character.

##
    #30      alias for zero '0'
    #3D      alias for equal sign '='
    #U+20AC  alias for euro sign '€'.

Reading a Texopic file and writing it to another Texopic file must retain the escaped characters as macros. When converting out of the Texopic language, however, it is up to the author to convert the escapes to their respective notation in the target language.

It is preferred to keep escaped characters in escaped form if there is no reason to change them.

#subsection URL handling

URLs in Texopic should be prefixed with the #;url macro word. Automatic recognition of URLs is impractical to solve in a way that covers every valid URL you could put into the text.

#subsection Recognized macros

This is a quick summary of what is recognized by the #href{https://github.com/cheery/texopic}{Texopic html processor}.

##
    #title
    ##
    #url{link}
    #href{link}{description}
    #image{link}
    #section
    #section{id}
    #subsection
    #subsection{id}
    #begin{itemize} #end{itemize}
    #begin{enumerate} #end{enumerate}
    #item
    #end
    #comment
    #begin{comment} #end{comment}

#begin{enumerate}
#item 'title/0' starts a title segment.
#item 'url/1' creates a hyperlink, a 'url'.
#item 'href/2' creates a hyperlink over a descriptive text.
#item 'image/1' starts an image segment.
#item 'section/0' starts a new section segment.
#item 'section/1' starts a new section segment with a link.
#item 'subsection' macros are similar to 'section', but for subsections.
#item 'itemize/0' and 'enumerate/0' groups form lists, either itemized or enumerated. They use 'item/0' as a separator.
#item 'comment/0' is either a segment or a group. It marks a segment or text that is left out of the documents generated from the text.
#end{enumerate}

The macros try to follow conventions present in LaTeX.

Additionally, on HTML output you have a #css -macro. You can use it like this:

##
    #css ##
        body { padding: 0; margin: 0; }

#subsection Error handling

The Texopic parser must not refuse to parse incorrect syntax. Whenever a parser encounters a syntax error, it is required to produce a macro 'syntax_error/1' at that point. For example, if we miss a brace like this:

##
    #bold{Bold failure

    Nothing simple.

If this passes through a parser, the parser should produce:

##
    #bold{Bold failure;#syntax_error{brace missing}}

    Nothing simple.

Note that the parser output must always be correct. Therefore the missing brace has been added at the point where it was detected to be missing.

#section Attained design objectives

#subsection Supports writer's workflow

Stays in the background while the author concentrates on writing her document. The author only needs to worry about the use of one meta-character (#) while working on the text.

#subsection Can be generated

Documents can be modified & generated by computer. This comes in useful if you have to maintain up-to-date programming language documentation or publish a report that is based on data collected by computers.

#subsection Interoperable with other formats

Can embed other plaintext files inside itself.
Can refer to files in a directory or hyperlink to external resources.

#subsection Supports direct rendering to the screen

Has a clean structure when loaded into the computer's memory. Can be used directly to print on the screen. Conversion to other formats is not needed.

#subsection Supports WYSIWYG software

Supports plaintext editing but doesn't force it on you. A WYSIWYG editor can be written over the markup.

#subsection Tolerant

Formatting errors are marked into the output and do not interrupt the parsing. Missing a brace has only a local effect that does not mess up the whole representation of a file.

#subsection Extensible

Can be extended to support additional structure. There is a standard method for building a generator, which makes it straightforward to implement new generators or extensions.

A generator is a program that converts a Texopic file to another format. There are matching and pretty printing utilities inside the Texopic format to carry out this task.

How to build a generator will be documented in each implementation of Texopic.

#section Comparison with similar languages

#subsection XML & HTML

The XML & HTML formats share the closest resemblance and have similar goals. They have more fringe cases and are harder to work with. These formats attempt to be general and are therefore flawed in all of the use cases they have. Texopic is not suitable for replacing XML in all of its use cases, but it should work better for written documents.

#subsection TeX

TeX was the inspiration for Texopic. Its limitations are that the TeX interpreter can be triggered by just about any character, and that because of how it is constructed you cannot expect a text processing tool to read a TeX file and then write it back.

#subsection Markdown

Markdown is a bulletin-board format that has ended up in wide use everywhere. It has a focus on readability. Markdown can be extended, but extensions easily become complex and interact unfavorably with other extensions. Markdown only translates well to HTML.

#subsection reStructuredText

reStructuredText is the Python language's documentation format. It has a lot of similarities to the Markdown format, although it is slightly better thought out.

Where XML & HTML are too generic, reStructuredText and Markdown are too specific. This limits what you can do with them.

#section History

Texopic started out during an #href
{//github.com/cheery/lever/blob/master/documentation_considerations.tex}
{assessment} of TeX for documentation in
#href{//leverlanguage.com}
{Lever}.

For Lever I wanted reference documentation to be written outside of the source files. The code would be annotated to refer to the reference, rather than the other way around.

Lever documentation flows upwards from the source code. This means that aside from the reference, there is internal documentation that is sectioned by source files. That internal documentation is currently in text files, which felt very natural to write.

I plan to section internal documentation into chapters that are still sectioned by source files. This internal documentation forms the basis for higher layers of documentation.

I prefer that the references, guides and all the other material would be available on the website as well as in the runtime. To do this I need a format that a documentation system in the runtime could lay out directly, and that could be translated into highly linked, high quality HTML files.

Many of the file formats designed for this purpose were unclear, and it was hard to figure out what kind of notation they use. They also appeared to be difficult to customize. With Texopic I can completely customize my documentation generation.

#section Trivia

While inspired by TeX, Texopic is neither TeX nor a superset of TeX.

The Texopic name was born as a wordplay on "TeX by Topic". The author thinks it sounds a bit like Aztec.

A small Python script, texopic.py, was the first instance of a readable typesetting format that could be used for documentation that meets high standards.
Later on, it became #href{//github.com/cheery/texopic/blob/master/texopic2html.py}{texopic2html.py}.

This file has been generated from #url{index.text} with the following Linux command:

##
    python texopic2html.py index.text > index.html

#section{license} License

#include{LICENSE.md}