The parseLatex function parses LaTeX source, producing a structured object.
Usage
parseLatex(
text,
verbose = FALSE,
verbatim = c("verbatim", "verbatim*", "Sinput", "Soutput"),
verb = "\\Sexpr",
defcmd = c("\\newcommand", "\\renewcommand", "\\providecommand", "\\def",
"\\let"),
defenv = c("\\newenvironment", "\\renewenvironment"),
catcodes = defaultCatcodes,
recover = FALSE
)
# S3 method for class 'LaTeX2item'
print(x, ...)
# S3 method for class 'LaTeX2'
print(x, tags = FALSE, ...)
Arguments
- text
A character vector containing LaTeX source code.
- verbose
If TRUE, print debug error messages.
- verbatim
A character vector containing the names of LaTeX environments holding verbatim text.
- verb
A character vector containing LaTeX macros that should be assumed to hold verbatim text.
- defcmd, defenv
Character vectors of macros that are assumed to define new macro commands or environments respectively.
- catcodes
A list or data frame holding LaTeX "catcodes", such as defaultCatcodes.
- recover
If TRUE, convert errors to warnings and continue parsing. See Details below.
- x
The "LaTeX2" or "LaTeX2item" object to print.
- ...
Extra parameters to pass to deparseLatex.
- tags
Whether to display LaTeX2 tags.
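As a hedged illustration of the verbatim argument, the sketch below assumes the parseLatex package is installed. The lstlisting environment is not in the default verbatim list, so it is appended here; its body, which contains an unbalanced brace, should then be kept as verbatim text rather than parsed as ordinary LaTeX.

```r
library(parseLatex)

# LaTeX source whose listing body contains an unbalanced brace.
src <- "\\begin{lstlisting}\nf <- function(x) {\n\\end{lstlisting}"

# Extend the default verbatim environments with "lstlisting" so that its
# body is treated as verbatim text instead of LaTeX markup.
parsed <- parseLatex(
  src,
  verbatim = c("verbatim", "verbatim*", "Sinput", "Soutput", "lstlisting")
)

# Show the parsed result together with its LaTeX2 tags.
print(parsed, tags = TRUE)
```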
Value
parseLatex returns the parsed LaTeX as a list with class "LaTeX2". Items in the list have class "LaTeX2item".
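A minimal sketch of inspecting the returned object, assuming the parseLatex package is installed. It uses only parseLatex() and the print method with its tags argument, both documented above.

```r
library(parseLatex)

parsed <- parseLatex("\\section{Intro} Some \\textbf{bold} text.")

# The whole result is a list of class "LaTeX2" ...
inherits(parsed, "LaTeX2")
# ... and each element of that list has class "LaTeX2item".
inherits(parsed[[1]], "LaTeX2item")

# Print the parsed LaTeX; tags = TRUE also displays each item's LaTeX2 tag.
print(parsed, tags = TRUE)
```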
Details
Some LaTeX engines, such as pdflatex, only handle ASCII input, while others, such as xelatex, allow Unicode input. parseLatex allows Unicode input.
During processing of LaTeX input, an interpreter can change the handling of characters as it goes, using the \catcode macro or others such as \makeatletter. However, parseLatex() is purely a parser, not an interpreter, so it cannot follow such changes. Instead, the user can change the handling of characters for the whole call using the catcodes argument.
catcodes should be a list or data frame with at least two columns:
- char
should be a column of single characters.
- catcode
should be a column of integers in the range 0 to 15 giving the corresponding catcode.
During parsing, parseLatex checks these values first. If an input character does not match any entry in catcodes, it is categorized:
- as a letter (catcode 11) if it is alphabetic according to the ICU function u_hasBinaryProperty(c, UCHAR_ALPHABETIC) (or iswalpha(c) on Windows),
- as a control character (catcode 15, TeX's "invalid" category) if its code point is less than 32,
- as "other" (catcode 12) otherwise.
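The lookup order above can be sketched in base R. classify_char() and the small table cc are hypothetical, written only for illustration; they are not part of parseLatex, and the POSIX class [[:alpha:]] is a locale-dependent stand-in for the ICU alphabetic test.

```r
# Hypothetical helper illustrating the documented lookup order;
# not a function provided by parseLatex.
classify_char <- function(ch, catcodes) {
  stopifnot(is.character(ch), length(ch) == 1, nchar(ch) == 1)
  # 1. An explicit entry in the user-supplied catcodes table wins.
  hit <- match(ch, catcodes$char)
  if (!is.na(hit)) return(as.integer(catcodes$catcode[hit]))
  # 2. Alphabetic characters are letters (catcode 11).
  #    [[:alpha:]] approximates ICU's UCHAR_ALPHABETIC property here.
  if (grepl("^[[:alpha:]]$", ch)) return(11L)
  # 3. Code points below 32 get catcode 15.
  if (utf8ToInt(ch) < 32L) return(15L)
  # 4. Everything else is "other" (catcode 12).
  12L
}

# A tiny hypothetical table; "@" is made a letter, as \makeatletter would do.
cc <- data.frame(char = c("\\", "{", "}", "%", "@"),
                 catcode = c(0L, 1L, 2L, 14L, 11L))

sapply(c("\\", "@", "a", "1", "\t"), classify_char, catcodes = cc)
```

Checking the table before the fallback rules is what lets a single catcodes entry, such as the "@" row here, override the default categorization for the whole call.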
When recover = TRUE, the parser marks each error in the output and attempts to continue parsing. This may lead to a cascade of errors, but will sometimes help in locating the first error. The section of text related to each error is marked as an item with tag ERROR.
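A hedged sketch of recover = TRUE, assuming the parseLatex package is installed and that an unclosed brace counts as a parse error. Warnings raised during parsing are collected rather than printed, and the parsed result is kept for inspection.

```r
library(parseLatex)

# The closing brace of \textbf is missing.
bad <- "Some \\textbf{bold text."

# With recover = TRUE, errors are converted to warnings; collect their
# messages instead of letting them print, and keep the parsed result.
problems <- character()
parsed <- withCallingHandlers(
  parseLatex(bad, recover = TRUE),
  warning = function(w) {
    problems <<- c(problems, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

problems                   # messages describing what went wrong
print(parsed, tags = TRUE) # the affected text should appear with tag ERROR
```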