This page has been machine-translated from the original page.
Table of Contents
Introduction
In this post, I summarize how to solve the issue where the classes on code blocks in HTML generated from Markdown with the OSS tool Pandoc are not compatible with Prism.js syntax highlighting.
For example, suppose you create HTML with Pandoc from Markdown that contains a code block like the following.
``` python
Source codeThen HTML with classes like the following is generated.
``` html
<pre class="sourceCode python">
<code class="sourceCode python">
...
</code>
</pre>However, Prism.js requires classes in the form language-python, so this output cannot be used as-is.
After checking the following Pandoc issue, it seems that this cannot be resolved under Pandoc’s current behavior.
Update to HTML5 · Issue #3858 · jgm/pandoc · GitHub
So this time, I used a Lua script that extends Pandoc to solve the problem.
Fixing Pandoc HTML Conversion with a Lua Script
The Lua script I used as reference is provided under the MIT license in the following repository.
GitHub - a-vrma/pandoc-filters: A small, useful collection of pandoc filters
Specifically, it is the following script.
--- standard-code: ouput code blocks with class="language-*" attributes
-- © 2020 Aman Verma. Distributed under the MIT license.
local languages = {meta = true,markup = true,css = true,clike = true,javascript = true,abap = true,abnf = true,actionscript = true,ada = true,agda = true,al = true,antlr4 = true,apacheconf = true,apl = true,applescript = true,aql = true,arduino = true,arff = true,asciidoc = true,aspnet = true,asm6502 = true,autohotkey = true,autoit = true,bash = true,basic = true,batch = true,bbcode = true,bison = true,bnf = true,brainfuck = true,brightscript = true,bro = true,bsl = true,c = true,csharp = true,cpp = true,cil = true,clojure = true,cmake = true,coffeescript = true,concurnas = true,csp = true,crystal = true,['css-extras'] = true,cypher = true,d = true,dart = true,dax = true,dhall = true,diff = true,django = true,['dns-zone-file'] = true,docker = true,ebnf = true,editorconfig = true,eiffel = true,ejs = true,elixir = true,elm = true,etlua = true,erb = true,erlang = true,['excel-formula'] = true,fsharp = true,factor = true,['firestore-security-rules'] = true,flow = true,fortran = true,ftl = true,gml = true,gcode = true,gdscript = true,gedcom = true,gherkin = true,git = true,glsl = true,go = true,graphql = true,groovy = true,haml = true,html = true,handlebars = true,haskell = true,haxe = true,hcl = true,hlsl = true,http = true,hpkp = true,hsts = true,ichigojam = true,icon = true,ignore = true,inform7 = true,ini = true,io = true,j = true,java = true,javadoc = true,javadoclike = true,javastacktrace = true,jolie = true,jq = true,jsdoc = true,['js-extras'] = true,json = true,json5 = true,jsonp = true,jsstacktrace = true,['js-templates'] = true,julia = true,keyman = true,kotlin = true,latex = true,latte = true,less = true,lilypond = true,liquid = true,lisp = true,livescript = true,llvm = true,lolcode = true,lua = true,makefile = true,markdown = true,['markup-templating'] = true,matlab = true,mel = true,mizar = true,mongodb = true,monkey = true,moonscript = true,n1ql = true,n4js = true,['nand2tetris-hdl'] = true,naniscript = true,nasm = true,neon = true,nginx = true,nim = true,nix = true,nsis = true,objectivec = true,ocaml = true,opencl = true,oz = true,parigp = true,parser = true,pascal = true,pascaligo = true,pcaxis = true,peoplecode = true,perl = true,php = true,phpdoc = true,['php-extras'] = true,plsql = true,powerquery = true,powershell = true,processing = true,prolog = true,properties = true,protobuf = true,pug = true,puppet = true,pure = true,purebasic = true,purescript = true,python = true,q = true,qml = true,qore = true,r = true,racket = true,jsx = true,tsx = true,reason = true,regex = true,renpy = true,rest = true,rip = true,roboconf = true,robotframework = true,ruby = true,rust = true,sas = true,sass = true,scss = true,scala = true,scheme = true,['shell-session'] = true,smali = true,smalltalk = true,smarty = true,solidity = true,['solution-file'] = true,soy = true,sparql = true,['splunk-spl'] = true,sqf = true,sql = true,stan = true,iecst = true,stylus = true,swift = true,['t4-templating'] = true,['t4-cs'] = true,['t4-vb'] = true,tap = true,tcl = true,tt2 = true,textile = true,toml = true,turtle = true,twig = true,typescript = true,typoscript = true,unrealscript = true,vala = true,vbnet = true,velocity = true,verilog = true,vhdl = true,vim = true,['visual-basic'] = true,warpscript = true,wasm = true,wiki = true,xeora = true,['xml-doc'] = true,xojo = true,xquery = true,yaml = true,yang = true,zig = true}
local function escape(s)
-- Escape according to HTML 5 rules
return s:gsub(
[=[[<>&"']]=],
function(x)
if x == '<' then
return '<'
elseif x == '>' then
return '>'
elseif x == '&' then
return '&'
elseif x == '"' then
return '"'
elseif x == "'" then
return '''
else
return x
end
end
)
end
local function getCodeClass(classes)
-- Check if the first element of classes (pandoc.CodeBlock.classes) matches a
-- programming language name. If it does, it gets removed from classes and a valid
-- HTML class attribute string (with space at beginning) is returned.
if languages[classes[1]] then
return ' class="language-' .. table.remove(classes, 1) .. '"'
else
return ''
end
end
local function makeIdentifier(ident)
-- Returns a valid HTML id attribute (with space at beginning) OR empty string.
if #ident ~= 0 then
return ' id="'.. ident .. '"'
else
return ''
end
end
local function makeClasses(classes)
-- Returns a valid HTML class attribute with classes separated by spaces (with a space
-- at the beginning) OR empty string.
if #classes ~= 0 then
return ' class="' .. table.concat(classes, ' ') .. '"'
else
return ''
end
end
return {
{
CodeBlock = function(elem)
if FORMAT ~= 'html' then
return nil
end
id = makeIdentifier(elem.identifier)
classLang = getCodeClass(elem.classes)
classReg = makeClasses(elem.classes)
local preCode = string.format(
'<pre%s%s><code%s>%s</code></pre>', id, classReg, classLang, escape(elem.text)
)
return pandoc.RawBlock('html', preCode, 'RawBlock')
end,
}
}I saved this script with the name standard_code.lua.
In my environment, Pandoc runs inside a Docker container, so I executed the following command to use the Lua script.
docker run --rm --volume "`pwd`:/data" --user `id -u`:`id -g` pandoc/core --lua-filter /data/standard_code.lua /data/sample.md -o /data/sample.htmlThis made it possible to generate HTML with code blocks that can use Prism.js syntax highlighting.
Summary
You could achieve the same thing by parsing a normally generated HTML file with a regular expression, but since I had the chance, I tried using a Pandoc filter as a learning exercise.