Metadata-Version: 2.1
Name: latexcodec
Version: 3.0.0
Summary: A lexer and codec to work with LaTeX code in Python.
Home-page: https://github.com/mcmtroffaes/latexcodec
Download-URL: http://pypi.python.org/pypi/latexcodec
Author: Matthias C. M. Troffaes
Author-email: matthias.troffaes@gmail.com
License: MIT
Platform: any
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Text Processing :: Markup :: LaTeX
Classifier: Topic :: Text Processing :: Filters
Requires-Python: >=3.7
License-File: LICENSE.rst
License-File: AUTHORS.rst


* **Instead of using latexcodec, I encourage you to consider pylatexenc, which is far superior:** https://github.com/phfaist/pylatexenc

* Download: http://pypi.python.org/pypi/latexcodec/#downloads

* Documentation: http://latexcodec.readthedocs.org/

* Development: http://github.com/mcmtroffaes/latexcodec/

.. |ci| image:: https://github.com/mcmtroffaes/latexcodec/actions/workflows/python-package.yml/badge.svg
    :target: https://github.com/mcmtroffaes/latexcodec/actions/workflows/python-package.yml
    :alt: ci

.. |codecov| image:: https://codecov.io/gh/mcmtroffaes/latexcodec/branch/develop/graph/badge.svg
    :target: https://codecov.io/gh/mcmtroffaes/latexcodec
    :alt: codecov


The codec provides a convenient way of converting between text written
in LaTeX and Unicode. Since it is not a LaTeX compiler, it is best
suited to short chunks of text, such as a paragraph or the values of a
BibTeX entry, and it is not appropriate for a full LaTeX document. In
particular, its handling of LaTeX commands that do not simply select
characters is intended to keep the Unicode representation
understandable to a human reader, but the result is not canonical and
may require hand tuning to produce the desired effect.


The encoder makes a best effort to replace Unicode characters outside
of the range used as LaTeX input (ASCII by default) with a LaTeX
command that selects the character. More technically, the Unicode code
point is replaced by a LaTeX command that selects a glyph that
reasonably represents the code point. Unicode characters with special
uses in LaTeX are replaced by their LaTeX equivalents. For example,


======================  ===================
original text           encoded LaTeX
======================  ===================
``¥``                   ``\yen``
``ü``                   ``\"u``
``\N{NO-BREAK SPACE}``  ``~``
``~``                   ``\textasciitilde``
``%``                   ``\%``
``#``                   ``\#``
``\textbf{x}``          ``\textbf{x}``
======================  ===================

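
The mapping in the table above can be pictured as a per-character
lookup. The following is only a minimal sketch using the standard
library: ``ENCODE_TABLE`` and ``toy_encode`` are hypothetical names,
the table holds just the rows shown above, and none of this is
latexcodec's actual implementation. It assumes that a control word
such as ``\yen`` needs a separating space when a letter follows.

```python
import re

# Hypothetical toy table holding only the rows from the table above.
ENCODE_TABLE = {
    "\u00a5": "\\yen",            # YEN SIGN
    "\u00fc": '\\"u',             # LATIN SMALL LETTER U WITH DIAERESIS
    "\u00a0": "~",                # NO-BREAK SPACE
    "~": "\\textasciitilde",
    "%": "\\%",
    "#": "\\#",
}

# A control word is a backslash followed by letters only.
CONTROL_WORD = re.compile(r"\\[A-Za-z]+$")


def toy_encode(text):
    """Replace each character that has a table entry; keep the rest."""
    pieces = []
    for ch in text:
        piece = ENCODE_TABLE.get(ch, ch)
        # A control word would swallow a following letter (\yen + a would
        # read as \yena), so put a space between the two.
        if pieces and piece[:1].isalpha() and CONTROL_WORD.search(pieces[-1]):
            pieces.append(" ")
        pieces.append(piece)
    return "".join(pieces)


print(toy_encode("50% of \u00a5100"))  # 50\% of \yen100
```

Characters without an entry, including the backslashes and braces of
``\textbf{x}``, pass through untouched, which matches the last row of
the table.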

The decoder makes a best effort to replace LaTeX commands that select
characters with the Unicode characters they select. For example,


=====================  ======================
original LaTeX         decoded unicode
=====================  ======================
``\yen``               ``¥``
``\"u``                ``ü``
``~``                  ``\N{NO-BREAK SPACE}``
``\textasciitilde``    ``~``
``\%``                 ``%``
``\#``                 ``#``
``\textbf{x}``         ``\textbf {x}``
``#``                  ``#``
=====================  ======================

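
The reverse direction can be sketched in the same spirit. This is a
toy illustration built on a single regular expression, not the lexer
that latexcodec really uses: ``DECODE_TABLE``, ``TOKEN`` and
``toy_decode`` are hypothetical names, and the table again contains
only the rows shown above. It assumes that spaces after a control word
are insignificant, and that an unknown control word is kept with a
single space after it, which reproduces the ``\textbf{x}`` row.

```python
import re

# Hypothetical toy table holding only the rows from the table above.
DECODE_TABLE = {
    "\\yen": "\u00a5",
    '\\"u': "\u00fc",
    "~": "\u00a0",
    "\\textasciitilde": "~",
    "\\%": "%",
    "\\#": "#",
}

# A token is a control word plus trailing blanks, an umlaut accent with a
# one-letter argument, any other control symbol, or a bare tilde.
TOKEN = re.compile(r'(\\[A-Za-z]+)[ \t]*|\\"[A-Za-z]|\\.|~')


def _replace(match):
    word = match.group(1)
    if word is not None:
        if word in DECODE_TABLE:
            return DECODE_TABLE[word]  # known command: blanks are dropped
        return word + " "              # unknown command: keep, one space
    token = match.group(0)
    return DECODE_TABLE.get(token, token)


def toy_decode(latex):
    """Translate the tokens found in the table and keep everything else."""
    return TOKEN.sub(_replace, latex)


print(toy_decode(r'f\"ur \yen 5'))  # für ¥5
```

Because ``re.sub`` does not rescan its own output, the ``~`` produced
from ``\textasciitilde`` is not turned into a no-break space
afterwards.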

In addition, comments are dropped (including the final newline that
marks the end of a comment), paragraphs are canonicalized into double
newlines, and other newlines are left as is. Spacing after LaTeX
commands is also canonicalized.

For example,


::

    hi % bye
    there\par world
    \textbf {awesome}

is decoded as

::

    hi there

    world
    \textbf {awesome}

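
The comment and paragraph rules can be approximated with two regular
expressions. This is a rough sketch rather than the codec's own lexer:
``COMMENT``, ``PARAGRAPH`` and ``toy_normalize`` are hypothetical
names, and the sketch assumes that a ``%`` is a comment unless a
backslash comes directly before it, so it misreads an escaped
backslash followed by a comment.

```python
import re

# An unescaped % up to and including the newline, plus any indentation
# at the start of the following line.
COMMENT = re.compile(r"(?<!\\)%[^\n]*(?:\n[ \t]*)?")

# A paragraph break is either \par (not \parbox and the like) or a blank
# line; surrounding whitespace is folded into the break.
PARAGRAPH = re.compile(r"[ \t]*(?:\\par(?![A-Za-z])|\n[ \t]*\n)\s*")


def toy_normalize(latex):
    """Drop comments, then turn every paragraph break into two newlines."""
    without_comments = COMMENT.sub("", latex)
    return PARAGRAPH.sub("\n\n", without_comments)


source = "hi % bye\nthere\\par world\n\\textbf {awesome}"
print(toy_normalize(source))
```

Applied to the example above, the sketch yields the same four lines:
``hi there``, an empty line, ``world`` and ``\textbf {awesome}``.
Single newlines that are not part of a paragraph break are left alone.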

When decoding, LaTeX commands not directly selecting characters (for
example, macros and formatting commands) are passed through
unchanged. The same happens for LaTeX commands that select characters
but are not yet recognized by the codec. Either case can result in a
hybrid Unicode string in which some characters stand for themselves
and others are parts of unexpanded commands. Consequently, backslashes
are sometimes left intact to mark the start of a potentially
unrecognized control sequence.

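
Since a decoded string may still contain unexpanded commands, it can
help to list them before hand tuning the result. The helper below is a
hypothetical sketch using only the standard library; it is not part of
latexcodec. It assumes that every remaining backslash starts a control
sequence.

```python
import re
from typing import List

# A control word (\letters) or a control symbol (backslash + one character).
CONTROL_SEQUENCE = re.compile(r"\\(?:[A-Za-z]+|.)")


def leftover_commands(decoded: str) -> List[str]:
    """Return the control sequences still present in a decoded string."""
    return CONTROL_SEQUENCE.findall(decoded)


hybrid = "\\textbf {\u00fc} costs \u00a55 \\foo"
print(leftover_commands(hybrid))  # ['\\textbf', '\\foo']
```

An empty list means the string contains no backslashes that could
start a control sequence.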

Given the numerous and changing packages providing such LaTeX
commands, the codec will never be complete, and new translations of
unrecognized Unicode or unrecognized LaTeX symbols are always welcome.
