Improved dependency tracking in Pollen

flirora · March 13, 2023, 7:21am

While working on the Ŋarâþ Crîþ v9 website, I’ve often been frustrated by Pollen’s limits on dependency tracking, and I’ve considered switching away from Pollen a few times because of this.

One of the ways in which my site eludes Pollen’s dependency-tracking abilities is by splitting the functionality in pollen.rkt into several submodules, which is necessary given its complexity (I’m generating both HTML and PDF files, as well as drawing fancy stuff such as interlinear glosses and syntax trees). It should be possible to track dependencies of pollen.rkt by using the module->imports function, then saving the result and refreshing it whenever pollen.rkt itself changes. Likewise, the same could be done for nested imports, as well as for imports in Pollen source files themselves.

(I know that setup:cache-watchlist exists, but I’d prefer not to have to update the value for it whenever I add a new submodule.)

Another way a dependency can evade tracking is by explicitly reading a file. There’s probably nothing that can be done to prevent this completely. However, Pollen could add a procedure to explicitly add a file path to the list of dependencies tracked for the current source file. This function would be called by procedures such as get-doc, get-metas, and get-pagetree.

In summary, I propose the following:

When a module is loaded under Pollen, its file dependencies are saved to the disk. This list is regenerated whenever the module’s source file is modified.
By default, the list of dependencies for a module includes the path of each module that it imports.
There is a procedure to explicitly add a file path as a dependency of the current source file.
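
The three points above can be sketched as follows. This is a minimal illustration in Python rather than Racket, and none of it is Pollen's API: `save_deps`, `add_dependency`, `is_stale`, and the file names in `demo` are hypothetical, and modification times stand in for whatever staleness check Pollen would actually use.

```python
import json
import os
import tempfile
from pathlib import Path


def snapshot(paths):
    """Map each path to its current modification time in nanoseconds."""
    return {str(p): os.stat(p).st_mtime_ns for p in paths}


def save_deps(manifest, source, deps):
    """Persist the dependency list for `source`, including source itself."""
    Path(manifest).write_text(json.dumps(snapshot([source, *deps])))


def add_dependency(manifest, dep):
    """Explicitly register one more file, e.g. one that is read by hand."""
    record = json.loads(Path(manifest).read_text())
    record[str(dep)] = os.stat(dep).st_mtime_ns
    Path(manifest).write_text(json.dumps(record))


def is_stale(manifest):
    """True when any recorded file is missing or has a different mtime."""
    record = json.loads(Path(manifest).read_text())
    for path, mtime in record.items():
        if not os.path.exists(path) or os.stat(path).st_mtime_ns != mtime:
            return True
    return False


def demo():
    """Walk through save, explicit add, and invalidation in a temp directory."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        source = root / "pollen.rkt"      # hypothetical main module
        helper = root / "gloss.rkt"       # hypothetical imported submodule
        data = root / "lexicon.txt"       # hypothetical file read explicitly
        for f in (source, helper, data):
            f.write_text("placeholder")
        manifest = root / "deps.json"

        save_deps(manifest, source, [helper])
        fresh = is_stale(manifest)        # nothing has changed yet

        add_dependency(manifest, data)
        bump = os.stat(data).st_mtime_ns + 1_000_000_000
        os.utime(data, ns=(bump, bump))   # simulate an edit to the data file
        after_edit = is_stale(manifest)
        return fresh, after_edit
```

In a real implementation the import list would presumably come from something like module->imports rather than being passed in by hand, and the manifest would have to be regenerated whenever the source file itself changes.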

If you want me to help implement these proposals, then I’m willing to do so.

mbutterick · March 13, 2023, 2:29pm

I appreciate the suggestions. I’m afraid I’m not going to take Pollen in the direction you propose. I’ve made the caching & watchlist system more flexible over time. But it’s difficult to get right because cache invalidation is hard and so is the file system. (And even more difficult to get it right during an interactive session with the project server.) The fact that no one’s mentioned the caching system for a long time suggests to me that it’s at a happy equilibrium of utility and stability.

flirora · March 14, 2023, 3:40am

Thank you for the answer. I’m a bit disappointed that you’re not interested in this, since the caching system is a major pain point for me. If that’s the case, then I’ll probably try to implement my suggestions myself.

mbutterick · March 14, 2023, 3:51am

If it’s a matter of adding a hook to the code where you can attach the behavior you prefer, I’m willing to do that. (I did it for Beeswax.) But please also read the tips for contributors, especially the Principle of Infinite Maintenance, Principle of Necessity, and Principle of Royalty.

flirora · March 14, 2023, 8:20am

On a side note, have you considered storing the cache as an SQLite database (using the preinstalled db package)? It seems as if it could simplify the cache implementation, but it might also take a lot of work to change the existing implementation.

Edit: found out that Pollen uses file/cache internally, which alleviates some of the complexity. Now I’m less sure that changing the implementation would be worth it.

mbutterick · March 14, 2023, 2:49pm

Right. Pretty much every page evaluation within Pollen is handled by cached-require, which uses the file/cache library to store and retrieve the result. (In places, there are also RAM caches used to avoid disk access.)

Another wrinkle for the caching system is its interaction with parallel processing. Both the file caching and the parallel processing are, of course, intended to speed up project rendering. When used together, multiple concurrent processes are trying to use a common file cache, which can trigger odd, non-reproducible race conditions.* (file/cache is supposedly safe for concurrent use, so perhaps I’m doing something wrong.) In the end, I cured this problem in the dumbest possible way, which is by keeping a list of renders that fail during parallel processing and then re-rendering them on one processor.

[* Another class of bugs I detest.]
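
The "collect failures, then re-render them on one processor" strategy can be sketched like this. It is an illustration in Python, not Pollen's renderer: `render_all` and `make_flaky_render` are hypothetical names, threads stand in for Pollen's parallel workers, and the simulated failure only imitates a cache race.

```python
from concurrent.futures import ThreadPoolExecutor


def render_all(pages, render, jobs=4):
    """Render in parallel, then retry any failures one at a time."""
    def attempt(page):
        try:
            return page, render(page), None
        except Exception as exc:  # collect failures instead of aborting
            return page, None, exc

    results, failed = {}, []
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        for page, output, exc in pool.map(attempt, pages):
            if exc is None:
                results[page] = output
            else:
                failed.append(page)

    # Serial pass: with no concurrency, errors caused by contention
    # should not recur. A genuine error would still raise here.
    for page in failed:
        results[page] = render(page)
    return results, failed


def make_flaky_render(flaky):
    """Hypothetical renderer that fails once for each page named in `flaky`."""
    pending = set(flaky)

    def render(page):
        if page in pending:
            pending.discard(page)
            raise RuntimeError(f"simulated cache race on {page}")
        return page.upper()

    return render
```

The trade-off is the one described above: nothing is done to prevent the errors up front, so the parallel pass runs without extra locking, and the cost is paid only for the pages that actually failed.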

An idea I’ve considered, but never prototyped, is whether “ganging” a group of page renders into a single source file, rendering that file, and then separating the output, would be faster. (This is closer to the model used by Scribble: when you render a project, it is really just one source file.)

In sum—performance improvement in Pollen has been a long, winding, and laborious road. I’ve spent a fair amount of time on the issue because it’s something that benefits everyone who uses Pollen (including me). I’m open to new ideas. But there’s never been a silver bullet.

flirora · March 15, 2023, 12:51pm

I’ve experienced these race conditions a lot, so I decided to take a closer look at what’s happening. From my observations, they occur in raco pollen render, but not in raco pollen setup. It also seems that the fetch callback in cache-file is called with a shared lock on the cache, so the dest-file might be overwritten concurrently by multiple jobs.

mbutterick · March 15, 2023, 1:24pm

Interesting. If you think there’s an upstream bug, we should probably copy the file/cache module into Pollen and make the changes there (so that it propagates to all current users). Thanks for investigating.

flirora · March 15, 2023, 2:08pm

Another option would be to take a file lock on dest-file in generate-dest-file.

mbutterick · March 15, 2023, 2:34pm

You mean by wrapping it with call-with-file-lock/timeout?

flirora · March 15, 2023, 2:48pm

Exactly, though this is only one of multiple ways to fix the issue.
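
For readers unfamiliar with the pattern, here is a rough Python sketch of taking an exclusive lock around the write of a destination file. It shows the general idea only and is not a translation of call-with-file-lock/timeout: `exclusive_lock` is a hypothetical helper built on an atomically created lock file, and this `generate_dest_file` is a stand-in, not Pollen's function.

```python
import os
import tempfile
import threading
import time
from contextlib import contextmanager


@contextmanager
def exclusive_lock(dest, timeout=5.0, poll=0.01):
    """Hold an exclusive lock on `dest` via a sibling lock file."""
    lock_path = dest + ".lock"
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation atomic: only one caller can succeed.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"could not lock {dest}")
            time.sleep(poll)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


def generate_dest_file(dest, text):
    """Write `dest` in two chunks while holding the lock."""
    half = len(text) // 2
    with exclusive_lock(dest):
        with open(dest, "w") as out:
            for chunk in (text[:half], text[half:]):
                out.write(chunk)
                out.flush()
                time.sleep(0.001)  # widen the window a race would need


def demo():
    """Eight concurrent writers; the result should be one writer's full text."""
    with tempfile.TemporaryDirectory() as tmp:
        dest = os.path.join(tmp, "page.html")
        texts = [str(i) * 40 for i in range(8)]
        threads = [threading.Thread(target=generate_dest_file, args=(dest, t))
                   for t in texts]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        with open(dest) as f:
            return f.read() in texts
```

Every writer that has to wait on the lock is idle for that time, which is the cost weighed later in the thread.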

joel · March 15, 2023, 4:43pm

Just want to chime in and say (more for the record, since I’ve already mentioned it on Discord), that in cases where I’ve wanted more elaborate dependency tracking, I’ve had great success just spelling out the dependency tree in a makefile. Here’s my most recent example. (This one is not for a Pollen project, but could be adapted pretty easily.) Running make web -j 8 in that project rebuilds whatever is needed intelligently and quickly, and make takes care of all the parallelism.

Once you have subcontracted all the dependency management to make, you’ll also want a project web server that knows how to use it. I combined fswatch and raco-static-web to make a simple one that runs make in the background every time something changes.

mbutterick · March 16, 2023, 1:24am

Yes, it’s all coming back to me: the problem with overusing shared locks during a parallel render is, naturally, that you are forcing the parallel processes to wait, thereby reducing the benefits of parallelism. For instance, on my 8-core machine, I find that a parallel Pollen render goes fastest with four cores, not all eight, because the locks add so much overhead.

I just tried wrapping the generate-dest-file work in call-with-file-lock/timeout. It didn’t seem to produce much net benefit. Though maybe this stands to reason: call-with-file-lock/timeout prevents some errors, thereby avoiding retries, but it also makes things run slower overall. This may be how I arrived at my policy of just letting the parallel jobs run as fast as they can, worrying less about preventing errors ex ante and more about having a means of curing them.

All that said, my intuitive understanding of parallel processing over the file system is relatively rudimentary. I always imagined the path to faster rendering for Pollen projects lay in making them work more like standard Racket project builds. Though Pollen rendering has probably reached the “local maximum” that I am capable of.

flirora · March 16, 2023, 8:07am

I decided to measure the wall-clock time for a clean render of my entire site* with various job counts (my CPU has 8 cores and 16 logical threads):

16 jobs: 423.18s
8 jobs: 326.24s
4 jobs: 294.15s
2 jobs: 229.60s
1 job: 247.22s

so, at least in my case, raco pollen render is surprisingly poor at using multiple cores.
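
To put numbers on that, the speedup relative to the single-job run can be computed directly from the timings above. A small Python sketch; the function names are illustrative only.

```python
# Wall-clock seconds from the measurements above, keyed by job count.
render_seconds = {16: 423.18, 8: 326.24, 4: 294.15, 2: 229.60, 1: 247.22}


def speedups(times, baseline_jobs=1):
    """Speedup relative to the baseline; values below 1.0 are slowdowns."""
    base = times[baseline_jobs]
    return {jobs: base / secs for jobs, secs in times.items()}


def efficiency(times, baseline_jobs=1):
    """Speedup divided by job count; 1.0 would be perfect scaling."""
    return {jobs: s / jobs
            for jobs, s in speedups(times, baseline_jobs).items()}


for jobs, s in sorted(speedups(render_seconds).items()):
    print(f"{jobs:>2} jobs: {s:.2f}x")
```

By this measure only the 2-job run beats the single job (about 1.08x), and the 16-job run takes roughly 1.7 times as long as one job.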

It might be possible to improve parallelism by switching to rendering another page while waiting for a lock to be released, but I’m not sure how feasible that is to implement; it might require major changes to the code.

Keeping the cache as an SQLite database instead of as a collection of files also might improve performance, but I’ll have to try implementing that to see if it does.

* Well, almost: I ran raco pollen render -j <n>, which misses any pages not reachable from index.ptree, and I ran raco make pollen.rkt pollen/*.rkt pollen/*/*.rkt beforehand.

mbutterick · March 16, 2023, 1:50pm

On the bright side, you just improved your project rendering speed by 40% for free. The next 40% will not be free.

Strangely, you might also try using racketbc for your render, and see if it makes better use of multiple cores. The newish Chez Scheme back end for Racket has always had questionable multi-core characteristics.

As for rendering another page while waiting for a lock, to some extent this already happens: the parallel renderer lets workers request locks on output paths and doesn’t let them proceed without an exclusive lock. Maybe it would be better to rely on filesystem locking. But as I mentioned above, I avoid filesystem abstractions because they’re difficult to inspect and improve.

I’m inexperienced with SQLite. That is an avenue of possible improvement I have not explored.

flirora · March 17, 2023, 5:03am

Here are the render times with my own version with changes (commit 3fd4232d, still using CS):

16 jobs: 359.47s
8 jobs: 304.25s
4 jobs: 208.35s
2 jobs: 220.91s
1 job (again using -j 1): 275.92s
1 job (without any -j option): 229.09s; in contrast, the upstream version takes 213.38s

So performance is improved for multiple cores, but somewhat worse for single-core scenarios. I’m not satisfied with the latter change, as I only have so many CI minutes a month. Something also tells me that the format of the cache isn’t the primary bottleneck, either.

mbutterick · March 17, 2023, 9:39pm

A key question is why performance degrades above 2 jobs—this means that all subsequent processors are wasted. For instance, would a project consisting of only preprocessor files have the same characteristics? I feel like the cause of the parallel degradation needs to be identified before a solution can be theorized.

flirora · May 13, 2023, 10:12am

I performed the following experiments a few months ago, so the measurements apply to Racket 8.8, not the 8.9 update that was released a few days ago. In the meantime, I’ve been busy with other projects.

I decided to benchmark raco pollen setup instead of render – this should be more akin to processing preprocessor files, as there are fewer setup-time dependencies between source files than render-time dependencies:

| Parallelism | Real time, upstream (s) | Real time, SQLite experiment (s) |
|---|---|---|
| none | 86.27 | 86.88 |
| `-j 1` | 85.23 | 84.46 |
| `-j 2` | 60.51 | 56.58 |
| `-j 4` | 42.99 | 40.63 |
| `-j 8` | 42.48 | 39.57 |
| `-j 16` | 56.19 | 52.68 |

so the performance degradation takes more jobs to kick in for setup compared to render, but it’s still there.
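
The gap between the two columns is easier to judge as a percentage. A short Python sketch using the figures from the table, assuming the times are in seconds (the unit does not affect the ratio); `percent_saved` is an illustrative name.

```python
# (upstream, SQLite experiment) real times from the table above.
setup_times = {
    "none": (86.27, 86.88),
    "-j 1": (85.23, 84.46),
    "-j 2": (60.51, 56.58),
    "-j 4": (42.99, 40.63),
    "-j 8": (42.48, 39.57),
    "-j 16": (56.19, 52.68),
}


def percent_saved(upstream, experiment):
    """Time saved by the experiment as a percentage of the upstream time."""
    return 100.0 * (upstream - experiment) / upstream


saved = {mode: percent_saved(up, exp) for mode, (up, exp) in setup_times.items()}
for mode, pct in saved.items():
    print(f"{mode:>5}: {pct:+.1f}%")
```

With two or more jobs the SQLite experiment saves roughly 5 to 7 percent; with no parallelism it is slightly slower than upstream.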

basus · August 16, 2024, 5:07pm

Not to resurrect a dead horse, and please point me in the right direction if this has been answered elsewhere, but do you have a good sense of where the major bottlenecks are? Is it an unavoidable result of each Pollen file being a program file that needs to go through the whole reader & expander pipeline? Are there fundamental design decisions that could be made differently to make things faster? I’m planning a similar language experiment, but I’m not sure if it would be too slow for my use case.