These notes are from a workshop on pandoc I ran for Innovative Learning Week 2015. You can get the PDF file produced with pandoc from the same source as this page here.
pandoc is the program written by the philosopher John MacFarlane to convert texts between different formats. It is able to take files in a variety of formats and produce output in an even bigger variety of formats. markdown is a lightweight mark-up language.

Formats used in these notes:

* markdown
* docx, i.e. Word
* tex, which pandoc uses internally to produce PDF
This code will produce a presentation slide with a nested bullet list using the LaTeX package beamer:
\begin{frame}{Slide title}
\begin{itemize}
\item Item 1
\item Item 2
\begin{itemize}
\item Subitem 1
\item Subitem 2
\end{itemize}
\item Item 3
\end{itemize}
\end{frame}
The same code will be produced if you run pandoc to produce beamer output on the following markdown fragment:
## Slide title ##
* Item 1
* Item 2
    * Subitem 1
    * Subitem 2
* Item 3
With pandoc, you can get LaTeX typography without ever touching the LaTeX code: pandoc creates perfectly serviceable LaTeX code and can run LaTeX for you to produce a PDF file.

The pandoc workflow

You write your text in a text editor. Some editors have markdown support built-in, and others can be extended with markdown support. Then there are Emacs and Vim. They are very powerful and have excellent support for both markdown and pandoc. I would not recommend learning markdown and one of these editors at the same time (or at least not with any sort of deadline looming!), but they are my recommendation. Gentler introductions to Emacs are available in the shape of Aquamacs (this is essentially Emacs with a slightly more traditional interface; OS X only) and Kieran Healy’s Emacs Starter Kit, specifically geared towards social scientists (close enough to linguistics!).

Then you run pandoc. There is no graphical user interface (unless you use Emacs or Vim…), so the program must be run in the command line:

* On Windows, start cmd or powershell; on OS X, open the Terminal
* Go to the folder with your files using cd, e.g. cd ~/Documents/Essays (~ is an abbreviation for /Users/<your username>, i.e. your home folder) or cd C:\Users\<your name>\Documents
* Press <TAB> to auto-complete the path

A basic pandoc command looks like the following:
pandoc notes.md -o notes.docx
This will run pandoc on the file notes.md. The file extensions tell pandoc what to do: .md (conventionally used for markdown files) says that the input is written in markdown, and .docx says that you want Word output.[1]
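If you have no markdown file to hand, the following is a minimal sketch for trying the command end to end. The contents of notes.md and the check for whether pandoc is installed are my own assumptions for illustration; the pandoc command itself is the one shown above.

```shell
# Create a tiny markdown file to experiment with (the contents are made up).
cat > notes.md <<'EOF'
# My notes #

Some *emphasised* text and a [link](http://johnmacfarlane.net/pandoc).
EOF

# Convert it to Word, but only if pandoc is actually installed.
if command -v pandoc >/dev/null 2>&1; then
    pandoc notes.md -o notes.docx
else
    echo "pandoc not found: install it first" >&2
fi
```

If all goes well, notes.docx appears next to notes.md and can be opened in Word.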
Now try this (this requires a LaTeX system to be installed):
pandoc notes.md -o notes.pdf
This creates a .tex file and runs LaTeX on it to produce a PDF file with the LaTeX defaults, probably already better than Word!

This gives the main syntax constructs for Markdown as extended by pandoc:
# Top-level title #

## Second-level title ##

### Third-level title ###

(you get the picture)

_italics like this_ or *this is also italics*

__bold like this__ or **this is also bold**

A link that leads to [pandoc's homepage](http://johnmacfarlane.net/pandoc)

* A top-level bullet list
* Another item
    * A sub-item
    * Another sub-item
        * And even deeper nesting

1. A numbered list
2. Another item on the list
1. The actual numbers do not matter
5. So you don't have to renumber things if you rearrange them

~~This is strikethrough~~ (not really useful perhaps except for some
syntax?)

> If you have long quotations, you can typeset them in blocks like
> this

This will be ~subscript~ and this will be ^superscript^

You can also have footnotes.[^1]

[^1]: Again, the precise number does not matter, as long as it's the same in the references and the note itself. You can intersperse the footnotes with the text or put them all at the end, they will come out as footnotes anyway.

(@ex) This will be a numbered example

You can refer to it in the text by writing (@ex) again --- as long as
the label is unique within the document, the numbering and referencing
will be automatic
You can pass options to pandoc that influence what it does. Most options have a short and a long form, so the following two commands do the same thing:

pandoc myfile.md -o myfile.pdf
pandoc myfile.md --output=myfile.pdf

-o FILE / --output=FILE
: the name of the file you want to produce. pandoc tries to guess the output format using the extension. If you do not pass this option, pandoc will just spit out the result of the conversion back into the terminal

-t FORMAT / --to=FORMAT
: the output format, such as docx, latex, html or even plain

-S / --smart
: typographically correct output (capitalization matters! This is also a logical option, meaning there is no argument). In pandoc 2.0 and later this option is gone: smart typography is an extension there, on by default for markdown input

    * - is a hyphen (used in contexts such as ‘a difficult-to-parse document’)
    * -- is an en dash (used to denote ranges of numbers, such as 2–4)
    * --- is an em dash (the parenthetical dash — like this)
    * " and ' are corrected to curly quotes depending on context
    * ... is corrected to …

-s / --standalone
: if you are converting to a format such as LaTeX or HTML, use this option: it will produce a complete file with all the necessary headers and footers

-V KEY[=VAL] / --variable=KEY:VAL
: this is used for setting variables such as author or font, e.g. -V mainfont="Times New Roman" or --variable=mainfont:"Times New Roman"; see examples below

-M KEY[=VAL] / --metadata=KEY:VAL
: this is used for setting metadata

--toc / --table-of-contents
: include a table of contents

-N / --number-sections
: what it says. By default, sections in LaTeX (and therefore PDF) output are unnumbered, so it makes sense to turn this on. This has no effect in .docx files, however; you need to fix the .docx template to achieve that.

--reference-docx=FILE
: you can create a .docx file with the correct styling (e.g. fonts, sizes, colours) and reuse it by passing in this option. The content of the reference file will be ignored. It is recommended to create a .docx file using pandoc, edit its styles to achieve the desired result, and reuse it. (In pandoc 2.0 and later the option is called --reference-doc.)

--latex-engine=pdflatex|lualatex|xelatex
: see below (in pandoc 2.0 and later the option is called --pdf-engine)

By default, pandoc creates PDF output using pdflatex, a very stable version of LaTeX that is, however, quite archaic in its handling of fonts. Useful options for PDF output:

* --latex-engine=xelatex switches to XeLaTeX, which can use the fonts installed on your system
* --variable=mainfont:<name of font> sets the main font
* --variable=geometry:a4paper sets the paper size to A4
The --metadata option is very similar to --variable, and it’s used in a similar way:
--metadata=author:"Pavel Iosad" --metadata=title:"Pandoc notes"
Alternatively, you can start the markdown source file with three lines that all start with %:
% This is the title
% This is the author
% This is the date
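A sketch of the title block in use. The file name talk.md, the date and the body text are assumptions made for illustration; -s is the standalone option described earlier, and HTML output is chosen here only because it needs no LaTeX installation.

```shell
# Write a markdown file that begins with a three-line title block.
cat > talk.md <<'EOF'
% Pandoc notes
% Pavel Iosad
% 2015

Body text goes here.
EOF

# Produce a complete HTML page; title, author and date come from the % lines.
if command -v pandoc >/dev/null 2>&1; then
    pandoc talk.md -s -o talk.html
fi
```

The three % lines must come first in the file, in the order title, author, date.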
For anything more elaborate, see the pandoc manual.

pandoc can also do automated tracking of references and citations. You cite a work in the text, and pandoc will take care of putting it into your reference list and typesetting the entry in line with the style you require. This process is automatic, so if you end up deleting the in-text citation the entry will also not appear in the reference list.

References are kept in a .bib file, which is usually managed with a reference manager; since .bib files are also plain text files, it is possible to do it using your text editor too. Some reference managers can also rewrite the .bib automatically when you update your library.

Other bibliography formats are supported by pandoc, but BibTeX is the most portable:
@Book{kiparsky82:_explan,
  author = {Kiparsky, Paul},
  title = {Explanation in phonology},
  publisher = {Foris},
  year = 1982,
  location = {Dordrecht}}
The citation key is what follows the { in the source and is shown as ‘cite key’ in the window of your reference manager. You can choose keys yourself, e.g. SPE for Chomsky & Halle 1967.

Citations are written as follows:

[@kiparsky82:_explan] $\Rightarrow$ (Kiparsky 1982)
[@kiparsky82:_explan, p. 1] $\Rightarrow$ (Kiparsky 1982, p. 1)
@kiparsky82:_explan shows $\Rightarrow$ Kiparsky (1982) shows
@kiparsky82:_explan [p. 1] shows $\Rightarrow$ Kiparsky (1982, p. 1) shows
Phonology has some explaining to do [as shown by @kiparsky82:_explan] $\Rightarrow$ Phonology has some explaining to do (as shown by Kiparsky 1982)
One of Kiparsky's important works [-@kiparsky82:_explan] $\Rightarrow$ One of Kiparsky’s important works (1982)
[@spe; @kiparsky82:_explan] $\Rightarrow$ (Chomsky & Halle 1967; Kiparsky 1982)

pandoc uses the .csl format to describe citation styles. The following produces a .docx file with a bibliography styled using the Unified Style Sheet:
pandoc myfile.md -o myfile.docx --bibliography=path/to/your/bib/file --csl=path/to/your/csl/file

The reference list is placed at the end of the document, so it makes sense to end the source file with a heading for it:
# References #
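To see the citation machinery work on a small scale, here is a hedged sketch. The file names refs.bib and paper.md are assumptions, as is the version check: to my knowledge pandoc 2.11 and later process citations only when --citeproc is given, whereas older releases do so whenever --bibliography is present. No --csl is passed, so pandoc falls back on its default citation style.

```shell
# A one-entry bibliography, reusing the BibTeX entry from these notes.
cat > refs.bib <<'EOF'
@Book{kiparsky82:_explan,
  author = {Kiparsky, Paul},
  title = {Explanation in phonology},
  publisher = {Foris},
  year = 1982,
  location = {Dordrecht}}
EOF

# A short document that cites the entry and ends with a heading
# under which the reference list will be placed.
cat > paper.md <<'EOF'
Phonology has some explaining to do [as shown by @kiparsky82:_explan].

# References #
EOF

# Run the conversion only if pandoc is installed.
if command -v pandoc >/dev/null 2>&1; then
    # Assumption: newer releases list --citeproc in their help text.
    if pandoc --help | grep -q -e '--citeproc'; then
        pandoc paper.md --citeproc --bibliography=refs.bib -o paper.html
    else
        pandoc paper.md --bibliography=refs.bib -o paper.html
    fi
fi
```

Add --csl=path/to/your/csl/file to the pandoc call to switch to a particular citation style.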
pandoc can be combined with knitr, the R library for reproducible research. By default (as set up in RStudio), knitr outputs HTML files, but it can also be set up to generate other formats via pandoc (see here).

Emacs and Vim have pandoc utilities (both are also excellent LaTeX editors if you go down that particular rabbit hole).

[1] Spelled out in full: pandoc notes.md -f markdown -t docx -o notes.docx
I’m Pavel Iosad, and I’m a Professor in the department of Linguistics and English Language at the University of Edinburgh.