Panja

Super simple static site generator using [Pan]doc and Jin[ja], with added note processing utilities

created: 2020-08-21 · modified: 2020-11-15
code: link

Description

Panja is a custom static site generator that combines Jinja templates and Pandoc.

TODOs

Build in file status monitor, i.e. show recent changes and dates (almost git diff like)
Build in note graph snapshot, i.e. little square view of current file and outgoing connections
What about building the Vim-roam graph parser in Python? Might be more dynamic and fast
Can get backlink content out of Vim implementation first for rendering pipe
Add system listing + index dependency for notes; rebuild those listing pages when note changes

Why Panja exists

I want to make it easy to piece together file conversion pipelines, particularly for Markdown -> HTML and Markdown -> TeX. The former has many built-in solutions, and in theory I could just use raw Pandoc on a directory and render a bunch of HTML files. But that would mean I couldn’t use my Jinja templates, and I would sacrifice flexibility. It’s much easier for me to define a Python “note” object that wraps up the Markdown document and gives me the full power of a scripting language to extract and change the tidbits. Perhaps it would be possible to cram all this functionality (link transformation, backlink calculation, etc.) into Pandoc filters, but I’m not sure I’d prefer it that way.
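A minimal sketch of what such a “note” object could look like, assuming plain inline Markdown links and a `pandoc` executable on the PATH. The `Note` class and its method names are hypothetical illustrations, not Panja’s actual code.

```python
import re
import subprocess
from dataclasses import dataclass

# Inline Markdown links such as [text](target.md); the lookbehind skips images.
LINK_RE = re.compile(r"(?<!!)\[([^\]]+)\]\(([^)\s]+)\)")


@dataclass
class Note:
    """Hypothetical wrapper around one Markdown document."""
    name: str
    text: str

    def links(self) -> list[str]:
        """Return every inline link target found in the Markdown source."""
        return [target for _, target in LINK_RE.findall(self.text)]

    def with_html_links(self) -> "Note":
        """Return a copy whose local .md link targets point at .html files."""
        def swap(match: re.Match) -> str:
            label, target = match.group(1), match.group(2)
            # Leave external URLs alone; only rewrite local Markdown targets.
            if target.endswith(".md") and "://" not in target:
                target = target[:-3] + ".html"
            return f"[{label}]({target})"
        return Note(self.name, LINK_RE.sub(swap, self.text))

    def to_html(self) -> str:
        """Convert the note body by piping it through the pandoc CLI."""
        result = subprocess.run(
            ["pandoc", "--from", "markdown", "--to", "html"],
            input=self.text, capture_output=True, text=True, check=True,
        )
        return result.stdout
```

The HTML fragment returned by `to_html()` could then be handed to a Jinja template as an ordinary context variable, which is the part raw Pandoc on a directory would not give me.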

Development notes

The real question for me is: can Panja be something more general? How can I expand it to be a useful utility for building the two example file conversion pipelines mentioned above, with some potentially complex extra custom stuff? I think the notion of a “note” object is very general: you can parse it and transform the content to your liking within any external script.

I think the Article object should be abstracted away from the main Panja site builder object. I want things to be a bit more modular, and to make fewer assumptions about what’s being put together. This way, the Article object feels like a useful, general utility outside the context of just the site generator; it definitely doesn’t need to have that surrounding context (I could have a regular Python script and Makefile creating Article objects for each file in a recursively searched path, then calling the Article convert and saving the results). The same goes the other way for the main site builder. If I really wanted it to use Article for a specific type of file, I could add that note directory as another content directory to check, do my global processing of articles before kicking off the site builder, pass that in as a global context, and tell the object about the specific filter to render the Markdown files as HTML using the article.html template. This separation makes it clear that the user needs to piece together the components they want to build their pipeline. This philosophy enables Panja to serve as a more general conversion pipeline builder for my needs.
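The stand-alone script idea above might look roughly like this. It is only a sketch: the `Article` class here, its `convert` signature, and the `convert_tree` helper are hypothetical, and the converter is passed in as a plain function so nothing depends on the site builder.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class Article:
    """Hypothetical stand-alone article: knows its source file, nothing about a site."""
    source: Path

    def convert(self, converter: Callable[[str], str]) -> str:
        """Run the Markdown text through whatever converter the caller supplies."""
        return converter(self.source.read_text(encoding="utf-8"))


def convert_tree(src_root: Path, out_root: Path,
                 converter: Callable[[str], str], suffix: str = ".html") -> list[Path]:
    """Convert every Markdown file under src_root, mirroring the layout in out_root."""
    written = []
    for md_file in sorted(src_root.rglob("*.md")):
        # Keep the relative directory structure, swapping only the extension.
        target = (out_root / md_file.relative_to(src_root)).with_suffix(suffix)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(Article(md_file).convert(converter), encoding="utf-8")
        written.append(target)
    return written
```

A Makefile target could call a script built on `convert_tree` with a Pandoc-backed converter, while the site builder never needs to know that Article exists.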

Differences between this and staticjinja: for the core static site builder, it’s primarily a philosophy difference in how the site files should be structured before being rendered. I like the idea from pelican that you have a separate “theme” with core template files and a core collection of static files. This is separate from the site “content” files you want to render, and the two work in tandem to produce the final site. I also like the idea of just being able to have your site mostly laid out in the content (including folders with static files and HTML templates), and just having the site generator work things out in a straightforward fashion.
1. Looking at Jinja env scoping and overlapping template names: it looks like staticjinja only takes a single template path and does not search multiple paths (so it probably wouldn’t be easy to support the template and content separation philosophy). Jinja does, however, preserve the relative path name of the template, so if you have two index.html files at project1/index.html and project2/index.html, it’s not a problem. The only problem would be if you have an index.html in your templates folder and another in the root of your content directory, since the loader overlaps the two spaces and uses the path relative to the search root as the template name.
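To make the overlap concrete, here is a small standard-library imitation of a first-match-wins lookup over several search roots, which is how Jinja’s `FileSystemLoader` behaves when given a list of paths. The function names are hypothetical and this does not call Jinja itself; it only illustrates which names collide.

```python
from pathlib import Path


def template_names(root: Path) -> set[str]:
    """Template names as a loader would see them: paths relative to the search root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def find_collisions(search_paths: list[Path]) -> dict[str, list[Path]]:
    """Map each template name found under more than one root to those roots."""
    owners: dict[str, list[Path]] = {}
    for root in search_paths:
        for name in template_names(root):
            owners.setdefault(name, []).append(root)
    return {name: roots for name, roots in owners.items() if len(roots) > 1}


def resolve(name: str, search_paths: list[Path]) -> Path:
    """Return the file a first-match-wins lookup would pick for a template name."""
    for root in search_paths:
        candidate = root / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(name)
```

With a templates root and a content root, `project1/index.html` and `project2/index.html` stay distinct, while a root-level `index.html` in both would show up in `find_collisions` and the earlier root would shadow the later one.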
Differences between this and pelican: Pelican is of course significantly heavier than this project. It tries to do many, many more things, to the benefit of some and the detriment of others. This project is built around my light ideas of a pipeline builder for Markdown, static sites and PDFs, and is less about including all the right features for people building sites of all kinds. I avoid a ton of the restrictions that Pelican enforces, and have virtually limitless flexibility in my implementation.
Notes on the panja CLI tool and its impact on the Makefile and a separate config file: I think a CLI tool makes sense; both pelican and staticjinja have one. You can use the project in a regular Python script file where you define some custom globals, or maybe you want to stick with some basic default options and just call the CLI tool with minimal arguments. I think both are valid and should be implemented at some point.
What more does this add than staticjinja, what is the point if this is a simple rehash?
1. Backlink processing
2. Graph visual and processing: use chartlib for this force-directed graph (fdg), with simple Python code to connect all the note links together and generate a JSON file that can be used on the client side to render the D3 fdg. In theory this implementation might be used to produce the backlinks as well, if we’re definitely taking a step out of the Vim ecosystem.
3. PDF and Markdown files available: when visiting a page, should be able to download raw Markdown or a rendered PDF of the page. This would be nice if all the files were just there.
4. File status updates: Git-like modification updates, perhaps just a global view under a status page. Could just show recently created files, the content that was added, a graph of how many lines were written, yada yada. Could literally depend on a git repo on the note directory we’re wanting to track.
5. Complex TeX support (TikZ): kind of a wishful thinking feature, but defining TikZ syntax in plain Markdown and having it render in both HTML and TeX. That is, for TeX it looks like a regular TikZ plot, but for HTML a standalone SVG is generated and inserted automatically. Would be amazing.
Examples of other pipelines, e.g. just local notes: I want to keep the big picture in mind here; Panja isn’t just designed to create my personal site. I want it to be a local navigator for my notes that doesn’t necessarily need to be on the web.
I want to keep this project light, and not overthink/overengineer everything like I tend to do. I want to keep it in line with my own use cases and develop “with the flow”, ad hoc, as I need it, whatever. If people want to eventually use this for themselves then we can think about a more general rehash at that point in time. But not until then.
Putting together the Panja, samgriesemer.com, chartlib, and Vim-roam initiatives into one ultimate product here. Panja helps put together the build pipeline, chartlib is used on the client side for graph rendering, we’re rendering the samgriesemer.com “org” files/notes, and we’re taking leads from Vim-roam with regard to what aspects of the files we have available to show (e.g. backlinks). Kind of nice to see this convergence and the work on many different projects at play here, all with very distinctly defined borders and uses.
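For the graph and backlink items in the list above, the Python side could be as small as the sketch below. It assumes wiki-style `[[target]]` links in the notes and a generic nodes/links JSON layout of the kind commonly fed to D3 force simulations; both are assumptions, as are the function names, and the format chartlib actually expects may differ.

```python
import json
import re

# Assumed link syntax: [[target]], [[target|label]] or [[target#anchor]].
WIKI_LINK_RE = re.compile(r"\[\[([^\]|#]+)")


def build_graph(notes: dict[str, str]) -> dict:
    """Build a node/link structure from a mapping of note name -> Markdown text."""
    links = []
    for source, text in notes.items():
        targets = {t.strip() for t in WIKI_LINK_RE.findall(text)}
        for target in sorted(targets):
            # Ignore dangling links and self links.
            if target in notes and target != source:
                links.append({"source": source, "target": target})
    return {"nodes": [{"id": name} for name in notes], "links": links}


def backlinks(graph: dict) -> dict[str, list[str]]:
    """Invert the link list: for each note, which notes point at it."""
    incoming: dict[str, list[str]] = {node["id"]: [] for node in graph["nodes"]}
    for link in graph["links"]:
        incoming[link["target"]].append(link["source"])
    return incoming


def graph_json(notes: dict[str, str]) -> str:
    """Serialize the graph so a client-side script can load it."""
    return json.dumps(build_graph(notes), indent=2)
```

Because the backlinks fall out of the same link list, this one pass would cover both the graph view and the backlink section without going through the Vim implementation.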

Want to see if we can’t build in a useful server to show files/notes locally. If possible, it’d be neat if we could make it such that there’s a shortcut in the current Vim file to visit that local HTML page, and then, if you navigate through the browser, a shortcut to return to the current HTML page’s raw Markdown in the Vim session. Would be really great if this could be sorta light; obviously it will have to be running something like a Flask server at minimum, but I don’t want to go crazy like the full live Vim plugin.

Small update on this point: it would be extremely easy to have a shortcut that opens the current filename in browser under an expected URL. For going back to the editor, we could make use of the editor:/// protocol I recently saw in promnesia. It appears to be a little hacky and AFAIK only works on Linux, but still a possible start.
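The “open the current file under an expected URL” half could be sketched like this. The base URL, the assumption that each note renders to an `.html` file at the same relative path, and the function names are all hypothetical; the return trip via an editor protocol is left out since I haven’t pinned down its exact format.

```python
import webbrowser
from pathlib import Path
from urllib.parse import quote


def note_url(note_path: Path, notes_root: Path,
             base_url: str = "http://localhost:8000") -> str:
    """Map a Markdown file to the URL its rendered page is assumed to live at."""
    rel = note_path.resolve().relative_to(notes_root.resolve()).with_suffix(".html")
    # Percent-encode spaces and similar characters, keeping path separators.
    return f"{base_url.rstrip('/')}/{quote(rel.as_posix())}"


def open_note(note_path: Path, notes_root: Path) -> None:
    """Open the rendered page for a note in the default browser."""
    webbrowser.open(note_url(note_path, notes_root))
```

A Vim mapping could pass the current buffer’s full path (`%:p`) to a tiny script wrapping `open_note`.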

Gave some more thought to the generality of this project and the type of pipelines it could/should be able to put together. Also compared it to more raw pipelines, like just using pandoc and bash together for pretty much any kind of conversion pipeline. And I came away with:

I like the idea of keeping the HTML “bias” inside the project. It’s not that strict of an assumption to make, and is pretty common for most projects that want to do something with the web. It can be totally opted out of by avoiding specifying HTML files, but it offers the convenience of having the Jinja template support built-in, something that I wouldn’t have to deal with every single time. On the flip side, there doesn’t appear to be a super great way to build in automatic support for even Markdown files, considering the kind of specificity that goes into that possible conversion. All this to say, while thinking about making the library super general for building pipelines, I think the idea to keep Jinja-specific knowledge in should stick.
Considering the idea of just using raw Pandoc and Bash, I think no matter how I spin it even basic site creation or customized build processes (with global variables) will inevitably require pandoc filters, which ultimately seem like they will become just as complex or even more bloated than the code already in the Panja project. So this seems like a good balance, and I’d rather take that flexibility in Python from the start (something it seems I’d have to do anyway).

Current issues

Pandoc is so slow generating all the files. It’s taking close to a minute (without the KaTeX filter!) for ~450 files. This sucks compared to Python-markdown, which could handle this whole thing in like 5 seconds max. Worth noting here that I tried the build process on my desktop, and it took like 5 seconds! But this was using an older version of Pandoc (1.19.2.4 to be exact), which seems to be roughly 3x faster than the current 2.10 version. I believe this has mostly to do with added Markdown functionality and just an overall increase in complexity, but this is yielding around 20 second build times. This is not an issue in general, but when using it as a live server we need to make sure only the relevant file gets rebuilt.
The metadata are not being rendered as HTML since they’re being placed in the template later than the main Pandoc conversion. Maybe there’s a nice way Pandoc can place these metadata attributes via its YAML processing.
ACTUALLY thinking that staticjinja is gonna work well, at least for my main Jinja-related needs. The context and rule system is something I was just trying to design myself, and the reloading system works for any of the files being watched in the searchpath (and downstream makes sure the rules/contexts get rerendered as well). I think it’s just too similar to what I’m trying to do, but does it all already with the extra features I want. I can just use it along with my custom objects like Article or Graph, and make the contexts/rules work out for the particular pipeline I’m building. It also defines a CLI, etc., which just convinces me to roll over and use it. The ONLY issue is the setup of the template and static folders; I’m not the biggest fan and it might be challenging to figure out how I’m going to make it work. That said, it does support symlinks, which could be great. I can probably maintain a slightly modified custom fork.
- Current plan here is to extend this to support multiple possible template search paths. This will allow me to keep my separate template structure, have my main pages directory, and then specify the actual real notes directory at ~/Documents/notes (so the watcher can actually see when new files are created). My static paths should remain the same.
- I think using staticjinja is actually general enough to be my main central pipe builder. That is, arbitrary files can be used as “templates”, and as long as there is a matching regex I can provide the context and render functions. This will just place those files in some chosen directory. So really it handles searching for some files in a given path (let’s just say I only gave it my Markdown notes directory), matching those files to a render function, and calling it with some matching context. This could be used for just building PDFs from a Markdown directory, or really any other conversion pipe; I just have to write what needs to be done for each file, and specify any extra context that the function should receive (like global stuff). The outer framework takes care of copying it all to a desired location, matching the regexes and calling the functions, expanding the input directories, and the nice watcher to rebuild arbitrary “template” files that have changed. This is really all I need I think, and as long as I can get the multiple search paths working, it’ll be great.
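The regex-to-render-function idea described in the bullets above can be sketched with the standard library alone. This is an illustration of the concept, not staticjinja’s actual API; `Rule`, `match_rule`, and `build` are hypothetical names, and the render function signature is an assumption.

```python
import re
from pathlib import Path
from typing import Callable, Optional

# A render function receives (source file, output directory, shared context).
RenderFn = Callable[[Path, Path, dict], None]
# A rule pairs a filename regex with the function that renders matching files.
Rule = tuple[str, RenderFn]


def match_rule(name: str, rules: list[Rule]) -> Optional[RenderFn]:
    """Return the render function of the first rule whose regex matches the name."""
    for pattern, render in rules:
        if re.search(pattern, name):
            return render
    return None


def build(search_path: Path, out_dir: Path,
          rules: list[Rule], context: dict) -> list[str]:
    """Render every file under search_path that some rule claims; skip the rest."""
    handled = []
    for source in sorted(p for p in search_path.rglob("*") if p.is_file()):
        name = source.relative_to(search_path).as_posix()
        render = match_rule(name, rules)
        if render is not None:
            render(source, out_dir, context)
            handled.append(name)
    return handled
```

Pointing `build` at only the Markdown notes directory with a single rule for `\.md$` would give the PDF or HTML pipe described above; the per-file work lives entirely in the render function.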
Just realized that it’s the node.js KaTeX filter that is taking up tons of time during the full site build process (>10x speedup without it). Might look into the Rust alternative at some point.
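On the earlier point about rebuilding only the relevant file when running as a live server, one simple approach (an assumption on my part, not something Panja does yet) is to compare modification times between each source and its rendered output. The helper names are hypothetical, and this ignores dependencies such as templates or listing pages.

```python
from pathlib import Path


def is_stale(source: Path, target: Path) -> bool:
    """True when the target is missing or older than its source."""
    return not target.exists() or target.stat().st_mtime < source.stat().st_mtime


def stale_sources(src_root: Path, out_root: Path, suffix: str = ".html") -> list[Path]:
    """List the Markdown files whose rendered output needs to be regenerated."""
    stale = []
    for source in sorted(src_root.rglob("*.md")):
        # The output mirrors the source layout with a different extension.
        target = (out_root / source.relative_to(src_root)).with_suffix(suffix)
        if is_stale(source, target):
            stale.append(source)
    return stale
```

Only the files returned by `stale_sources` would then go through the slow Pandoc (and KaTeX) step.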

