TransWikia.com

pandoc, markdown: create self-contained .bib file from cited references

TeX - LaTeX Asked by user101089 on July 4, 2021

In LaTeX there are several tools that create a self-contained .bib file containing only the
cited references, extracted from several larger .bib files; see this popular tex.stackexchange question.

Is there some way to do this for markdown documents processed by pandoc (well, pandoc-citeproc)?

Context: I’m writing an article (in rmarkdown, .Rmd format) shared with a collaborator in the cloud. In the YAML header I refer to several .bib files in my local texmf tree:

bibliography: 
  - "../localtexmf/bibtex/bib/statistics.bib"
  - "../localtexmf/bibtex/bib/graphics.bib"
  - "../localtexmf/bibtex/bib/Rpackages.bib"

But my colleague can’t access these, unless I copy them to the project directory (and then have to maintain duplicate copies).

The LaTeX solutions rely on the .aux file generated when processing the .tex file. However, pandoc does not map [@reference] into \cite{reference} and doesn't produce an .aux file.

A similar question was asked here, but received no answers.

Update: There is probably no direct solution with LaTeX or pandoc, but a first step would be to use perl or sed to extract all the citation keys (strings like @key) from the .Rmd file.
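
A minimal Python sketch of that first step. It assumes a simplified form of pandoc's citation-key rule: a key starts with a letter, digit or underscore and may contain punctuation such as : . - / only between word characters. The regex and the function name citation_keys are illustrative, not part of pandoc. An @ preceded by a word character (an e-mail address, or an R slot access like obj@slot) is ignored.

```python
import re

# Simplified citation-key pattern (an assumption, not pandoc's own parser):
# word characters, with punctuation allowed only *between* word characters,
# so a trailing "." or "," after the key is not captured.
CITE_KEY = re.compile(r"(?<![\w@])@(\w+(?:[:.#$%&+?<>~/-]\w+)*)")


def citation_keys(text: str) -> list[str]:
    """Return the distinct citation keys in text, in order of first use."""
    # dict.fromkeys drops duplicates while keeping first-seen order
    return list(dict.fromkeys(CITE_KEY.findall(text)))
```

For example, citation_keys(open("paper.Rmd").read()) lists the keys the reduced .bib file has to contain. Keys written in the braced form @{...} are not covered by this sketch.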

4 Answers

Although there doesn't seem to be a direct solution yet, you can easily set up a workflow using pandoc. One possible solution:

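  1. Create a .tex file with pandoc, using the --biblatex option (plus -s, so that pandoc writes a complete document rather than a fragment).
  2. Run latex once; with biblatex this also writes file.bcf.
  3. Run biber --output-format=bibtex file.bcf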

This is not ideal, of course, but if you use a makefile/script the whole process can be easily automated.
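
A minimal Python sketch of such a script, assuming the .Rmd has already been knitted to a markdown file (here paper.md) and that pandoc, latex and biber are on the PATH. The helper names export_commands and run are hypothetical; the commands themselves are the three steps listed above. Check biber's log for the name of the .bib file it writes.

```python
import subprocess
from pathlib import Path


def export_commands(source: str, bib_files: list[str]) -> list[list[str]]:
    """Build the pandoc -> latex -> biber steps for a markdown file."""
    stem = Path(source).stem
    # -s: complete LaTeX document; --biblatex: emit biblatex citation commands
    pandoc = ["pandoc", "-s", source, "--biblatex", "-o", f"{stem}.tex"]
    pandoc += [f"--bibliography={bib}" for bib in bib_files]
    return [
        pandoc,
        # one LaTeX run is enough to write the .bcf control file
        ["latex", "-interaction=nonstopmode", f"{stem}.tex"],
        # biber exports the cited entries as BibTeX instead of a .bbl
        ["biber", "--output-format=bibtex", f"{stem}.bcf"],
    ]


def run(commands: list[list[str]]) -> None:
    """Run each step in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Usage: run(export_commands("paper.md", ["statistics.bib", "graphics.bib"])). If the bibliography is already listed in the YAML header, an empty list is enough, since pandoc reads it from the metadata.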

Using biber's tool mode, you can also perform additional transformations of the data source, e.g. resolving crossref inheritance.

Answered by Denis on July 4, 2021

There is Robert Winkler's perl-based mdbibexport:

mdbibexport.pl extracts the cited references of a Pandoc markdown document and writes a bibtex database for this document. The keys are extracted from the markdown file and written into an auxiliary file, which is used by BibTool to find the references in the bibtex (.bib) data base and to write them into a new file.
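
To make the .aux step concrete: BibTeX-style tools only need a \bibstyle, a \bibdata and one or more \citation lines from an .aux file. The Python sketch below builds such a file from a list of keys; the function name aux_text and the default style are illustrative assumptions, not mdbibexport's actual code. File names are passed through unchanged, as the Lua writer below does.

```python
def aux_text(keys, bib_files, style="alpha"):
    """Build a minimal LaTeX-style .aux file for the given citation keys."""
    lines = [
        rf"\bibstyle{{{style}}}",
        # several databases are given as one comma-separated list
        rf"\bibdata{{{','.join(bib_files)}}}",
    ]
    # one \citation line per key, in the order given
    lines += [rf"\citation{{{key}}}" for key in keys]
    return "\n".join(lines) + "\n"
```

The text can be saved with pathlib.Path("paper.aux").write_text(...) and handed to BibTool, e.g. bibtool -x paper.aux -o paper.bib (assuming BibTool's -x option for extracting the entries cited in an .aux file).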

It is also possible to use just a pandoc Lua writer together with bibexport to achieve this. E.g., the following is based on the ideas of the aforementioned mdbibexport script and will write the reduced .bib file to bibexport.bib. Save it to a file bibexport.lua and call pandoc normally, but use bibexport.lua as the target format: pandoc --to bibexport.lua …

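-- Citation keys seen while pandoc renders the document
local citation_ids = {}

-- Called last, with the rendered body and the document metadata
function Doc(body, meta, vars)
  local citations = {}
  for cid, _ in pairs(citation_ids) do
    citations[#citations + 1] = cid
  end
  -- a YAML list of bibliographies arrives as a Lua table
  local bib = meta.bibliography
  if type(bib) == 'table' then bib = table.concat(bib, ',') end
  -- create a dummy .aux file, as LaTeX would for BibTeX
  local aux = '\\bibstyle{alpha}\n' ..
      '\\bibdata{' .. bib .. '}\n' ..
      '\\citation{' .. table.concat(citations, ',') .. '}\n'
  local auxfile_name = meta.auxfile or 'bibexport.aux'
  local auxfile = io.open(auxfile_name, 'w')
  auxfile:write(aux)
  auxfile:close()
  -- bibexport reads the .aux file and writes bibexport.bib
  os.execute('bibexport ' .. auxfile_name)
  return 'Output written to bibexport.bib, aux to ' .. auxfile_name
end

-- Record the key of every citation; the citation itself renders to ''
function Cite(c, cs)
  for i = 1, #cs do citation_ids[cs[i].citationId] = true end
  return ''
end

function Str(s) return s end
-- Every other writer function renders to an empty string
setmetatable(_G, {__index = function() return function() return "" end end})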

Answered by tarleb on July 4, 2021

Following Denis's comment, a multistep process makes total sense, and how to set it up is well explained in this post (external link): https://martinandreasandersen.com/guides/a-nerds-guide-to-writing-papers-for-au/

The only remaining part is the pandoc command you use to create your .tex document. The complete process goes like this:

pandoc_command example:

pandoc -s yourpaper.md -o yourpaper.tex --biblatex --bibliography=yourpaper.bib

So in your command window or your script, you'd have something like:

pandoc_command
latex [yourpaper] # to prepare the document
biber [yourpaper] # to add references
latex [yourpaper] # to collect references
pdflatex [yourpaper] # puts it all together, outputs a pdf

Answered by blackcuarzo on July 4, 2021

Since you write in .Rmd, you can use the following R function to clean up your .bib file:

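library(stringr)

# Keep only the entries of input_bib that are cited in input_file
clean_bib <- function(input_file, input_bib, output_bib){
  # join with spaces so a key at the end of a line is still terminated
  lines <- paste(readLines(input_file), collapse = " ")
  # citation keys: @ + letters/digits, ended by , . space ? ! or ]
  entries <- unique(str_match_all(lines, "@([a-zA-Z0-9]+)[,. ?!\\]]")[[1]][, 2])

  # split the .bib file into one string per entry (leading @ removed)
  bib <- paste(readLines(input_bib), collapse = "\n")
  bib <- unlist(strsplit(paste0("\n", bib), "\n@"))[-1]

  # keep entries whose key equals a cited key (not just contains it)
  output <- unlist(lapply(entries, function(key)
    grep(paste0("^[a-zA-Z]+\\s*[{]\\s*", key, "\\s*,"), bib, value = TRUE)))
  output <- paste0("@", output)

  writeLines(output, output_bib)
}
# now call the function
clean_bib(...)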

Just call it in the setup chunk.

What does the function do? It first finds all citation keys in the input file: strings that start with @, contain only letters and digits, and end with a comma, dot, question mark, exclamation mark, space or ]. Adjust this pattern to your needs.

Then it writes a new .bib file containing only the entries for these keys.
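
The same filtering step (split the .bib file into entries, keep the cited ones) as a Python sketch for readers not working in R. It assumes every entry starts with @ at the beginning of a line and compares the entry key exactly rather than by substring; filter_bib is a hypothetical name. @string definitions are dropped, so entries relying on @string macros would need them copied separately.

```python
import re


def filter_bib(bib_text: str, keys) -> str:
    """Return only the entries of bib_text whose citation key is in keys."""
    wanted = set(keys)
    # Split at every "@" that starts a line; chunk 0 is whatever precedes
    # the first entry (comments, blank lines) and is dropped.
    chunks = re.split(r"^@", bib_text, flags=re.MULTILINE)[1:]
    kept = []
    for chunk in chunks:
        # an entry looks like "article{key, ..." (or "article(key, ...")
        m = re.match(r"\s*\w+\s*[{(]\s*([^,\s]+)\s*,", chunk)
        if m and m.group(1) in wanted:
            kept.append("@" + chunk.rstrip() + "\n")
    # entries stay in .bib-file order, separated by one blank line
    return "\n".join(kept)
```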

Answered by Johannes Titz on July 4, 2021
