proctrace is a high-level profiler for process lifecycle events such as fork,
exec, setpgid, and setsid.
Future work will extend this to tracking open file descriptors,
reads, writes, etc.
Under the hood, proctrace uses bpftrace
to trace kernel-level events and system calls.
This means that recording only works on Linux.
However, proctrace can take a recording made on a Linux system and analyze it on any system
for which you can compile Rust.
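To give a feel for the kind of tracing bpftrace makes possible, here is a minimal Python sketch that assembles a bpftrace invocation printing one line per fork and exec. The embedded bpftrace program, the output line format, and the helper name `bpftrace_command` are illustrative assumptions and not proctrace's actual script. Running the resulting command needs Linux, root privileges, and bpftrace installed.

```python
# Hypothetical bpftrace program (NOT proctrace's real script): it prints one
# line for each fork and each execve, using standard kernel tracepoints.
BPFTRACE_PROGRAM = r'''
tracepoint:sched:sched_process_fork
{
  printf("FORK ts=%llu parent=%d child=%d\n", nsecs, args->parent_pid, args->child_pid);
}
tracepoint:syscalls:sys_enter_execve
{
  printf("EXEC ts=%llu pid=%d file=%s\n", nsecs, pid, str(args->filename));
}
'''


def bpftrace_command(program=BPFTRACE_PROGRAM):
    """Build the argv list that would run `program`; nothing is executed here."""
    # `bpftrace -e PROGRAM` runs a program given inline on the command line.
    return ["sudo", "bpftrace", "-e", program]


if __name__ == "__main__":
    # Show the command instead of running it, since tracing requires root.
    print(" ".join(bpftrace_command()[:3]), "'<program>'")
```

Passing the returned list to `subprocess.run` on a Linux machine would start the trace; the sketch stops short of that so it stays safe to run anywhere.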
macOS support is planned but is on hold pending a fix for a firmware/OS bug that
causes macOS to hang if DTrace is run in the (admittedly, incredibly uncommon) case that your
machine has gone to sleep since boot.
See this thread for
details.
Motivation
This was created as a debugging tool for work on
Flox.
Flox is a new developer environment tool that provides reproducible, shareable developer
environments using carefully configured subshells rather than containers.
Making this a nice experience for users requires careful orchestration of a few processes
and shell configuration.
The Flox test suite also makes extensive use of bats,
the Bash Automated Testing System.
As part of execution, this test suite spawns a wealth of processes and opens a file descriptor
for debug output (rather than writing to stderr, since that could be mixed up with program output).
If you’re unlucky, this extra file descriptor can get inherited by backgrounded processes,
which causes bats to hang indefinitely.
It also causes your CI to burn itself to the ground every now and then.
I was tired of dealing with these kinds of issues (and more), and I wasn’t satisfied with our
ability to debug them quickly, so I wrote proctrace with the intention that we could
let it run in CI alongside the test suite and export a recording on a test failure.
Output
Output can be generated in a variety of formats, namely:
Sequential, ordered by event timestamp
By process, in fork order
Mermaid Gantt chart, as a poor man’s distributed-trace-looking output
Sequential output is newline-delimited JSON.
By-process output is also newline-delimited JSON (mostly out of convenience), with a header describing the process that the events come from.
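As a rough illustration of how the sequential and by-process orderings relate, here is a minimal Python sketch. The field names (`ts`, `kind`, `pid`, `child`) and the sample events are hypothetical and are not proctrace's actual schema. The grouping logic is likewise an assumption about what "fork-order" could mean, not the tool's implementation.

```python
import json
from collections import defaultdict

# Hypothetical newline-delimited JSON; the field names are illustrative only.
SAMPLE = """\
{"ts": 30, "kind": "exec", "pid": 101}
{"ts": 10, "kind": "fork", "pid": 100, "child": 101}
{"ts": 20, "kind": "fork", "pid": 100, "child": 102}
{"ts": 40, "kind": "setsid", "pid": 102}
"""


def parse_events(text):
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def sequential(events):
    """Return the events ordered by timestamp."""
    return sorted(events, key=lambda e: e["ts"])


def by_process(events):
    """Group events per PID, listing processes in the order they were forked."""
    fork_order = []              # child PIDs in the order their fork was seen
    grouped = defaultdict(list)  # pid -> that process's events, oldest first
    for event in sequential(events):
        if event["kind"] == "fork" and event["child"] not in fork_order:
            fork_order.append(event["child"])
        grouped[event["pid"]].append(event)
    # Processes never seen as a fork child (such as the root) are listed first.
    roots = [pid for pid in grouped if pid not in fork_order]
    return [(pid, grouped[pid]) for pid in roots + fork_order if pid in grouped]


if __name__ == "__main__":
    for pid, process_events in by_process(parse_events(SAMPLE)):
        print(pid, [e["kind"] for e in process_events])
```

The sequential view is a single sort, while the by-process view needs one pass over the sorted events to learn the fork order before grouping.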
Here is an example of the Mermaid output (you may need to open in a new tab to see it better):
This output mode is arguably the most useful, but also the least ergonomic at the moment.
Currently, proctrace writes the Gantt chart syntax to the specified output, and you can either
paste it into the Mermaid Live Editor or generate diagrams via the Mermaid
CLI (note that this…requires installing headless Chromium or something; I just use the live editor).
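For readers unfamiliar with Mermaid's Gantt syntax, the following Python sketch shows one way process lifetimes could be turned into text that the Mermaid Live Editor accepts. The function name `to_mermaid_gantt`, the `(label, start_ms, end_ms)` input shape, and the chart layout are assumptions made for illustration; proctrace's real output may be structured differently.

```python
def to_mermaid_gantt(spans, title="Process lifetimes"):
    """Render (label, start_ms, end_ms) tuples as Mermaid Gantt chart text."""
    lines = [
        "gantt",
        f"    title {title}",
        "    dateFormat x",      # read start/end values as millisecond timestamps
        "    axisFormat %S.%L",  # label the axis as seconds.milliseconds
    ]
    for label, start_ms, end_ms in spans:
        # A colon separates the task name from its metadata in Mermaid,
        # so replace any colons in the label to keep the line parseable.
        safe_label = label.replace(":", " ")
        lines.append(f"    {safe_label} :{start_ms}, {end_ms}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical lifetimes, in milliseconds since the start of the recording.
    print(to_mermaid_gantt([("bash", 0, 50), ("sleep 1", 10, 40)]), end="")
```

Each process becomes one task line of the form `name :start, end`, which is why long command lines quickly hurt legibility in this format.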
Eventually I would like to replace the Mermaid output with HTML reports similar to
cargo build timings
so that we have more control over the legibility of the output.