Improving Waveform Drawing Speed in Blender
What I did during my Google Summer of Code 2023 project
BLUF: Blender now farms out waveform computation to a task pool to process audio strips in parallel. This reduces the time needed to display the waveform for multiple strips. I’m also close to landing a change that supports partial waveform data loads. This will make waveform displays nearly instantaneous.
Why work on waveform drawing
While most people know Blender as 3D modelling software, it provides many other features as well. One of these is the Video Sequence Editor (generally referred to as the VSE in the community).
The VSE offers a familiar user interface where you can place files (video, audio, images, etc.) in a timeline and use them to produce a final video:
While the VSE offers rich functionality, development effort continues to go into making the experience even better.
One area that was ripe for improvement was the waveform drawing process. Whenever audio or video files are added to a project, it is useful to visualize the audio waveform: it lets the person editing the video gauge the audio’s loudness, spot clipping, and make adjustments.
At the start of this project, waveform drawing was slow enough that the feature was kept off by default; users who wanted to see waveforms had to enable it manually.
As editing workflows become more advanced, larger asset files get used, and 4K video edits are not that rare nowadays. Slow waveform drawing only gets worse as files grow.
In order to support heavier workflows, we need to give Blender a speed boost.
Improvements made
Farming waveform loading to a task pool
The changes described in this section can be found in this pull request.
Blender offers a well-rounded background job system. Components can schedule long-running work with this system to avoid blocking the main thread. Keeping the main thread unblocked is a must for a smooth user experience.
Since waveform loading is slow, the process was offloaded to the background job system.
Whenever a new audio strip needs its waveform data, the main thread adds a job to a queue, which is then processed in the background.
A single thread is responsible for popping items off the queue and processing them, one at a time. This serial behaviour causes serious delays when multiple audio strips need their waveforms displayed.
To address this issue, we considered two options:
1. Create more than one background job using the job system.
2. Keep the single background job, but have it farm out work to a task pool.
While option (1) could work, it deviated from the job system's intended use. After some discussion with my mentor over the community channel, we decided to go with option (2).
Under this new scheme, all the background job does is take items out of the queue and submit them to the task pool. The task pool's threads then read the audio waveform for each strip.
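To make the scheme concrete, here is a minimal, self-contained C++ sketch of the idea. It is illustrative only: the names (AudioStrip, read_sound_waveform, TaskPool) are hypothetical, and Blender’s real code uses its own task pool API rather than this hand-rolled one:

// Illustrative sketch of the new scheme, not Blender's actual code.
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct AudioStrip {
  int id; /* stand-in for the strip's sound data */
};

/* Stand-in for the slow waveform read; runs on a pool thread. */
static void read_sound_waveform(AudioStrip *strip)
{
  std::printf("loaded waveform for strip %d\n", strip->id);
}

class TaskPool {
 public:
  explicit TaskPool(unsigned num_threads)
  {
    for (unsigned i = 0; i < num_threads; i++) {
      workers_.emplace_back([this] { worker_loop(); });
    }
  }
  ~TaskPool()
  {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      done_ = true;
    }
    cond_.notify_all();
    for (std::thread &worker : workers_) {
      worker.join();
    }
  }
  void push(AudioStrip *strip)
  {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.push(strip);
    }
    cond_.notify_one();
  }

 private:
  void worker_loop()
  {
    for (;;) {
      AudioStrip *strip = nullptr;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cond_.wait(lock, [this] { return done_ || !tasks_.empty(); });
        if (tasks_.empty()) {
          return; /* done_ is set and no work remains */
        }
        strip = tasks_.front();
        tasks_.pop();
      }
      read_sound_waveform(strip); /* strips load in parallel */
    }
  }

  std::queue<AudioStrip *> tasks_;
  std::mutex mutex_;
  std::condition_variable cond_;
  std::vector<std::thread> workers_;
  bool done_ = false;
};

/* The single background job no longer reads waveforms itself;
 * it only drains the queue into the pool. */
static void background_job(std::queue<AudioStrip *> &pending, TaskPool &pool)
{
  while (!pending.empty()) {
    pool.push(pending.front());
    pending.pop();
  }
}

int main()
{
  TaskPool pool(4);
  std::queue<AudioStrip *> pending;
  AudioStrip strips[3] = {{1}, {2}, {3}};
  for (AudioStrip &strip : strips) {
    pending.push(&strip);
  }
  background_job(pending, pool);
  return 0; /* the pool's destructor waits for outstanding work */
}

With this split, the job system still sees a single background job, while the heavy waveform reads run concurrently on the pool’s threads.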
Results
Here’s how Blender’s waveform loading behaved before the task pool change:
And here’s what the experience looks like after introducing the task pool:
Partial loads
The changes described in this section can be found in this pull request. Note: this PR is still not merged at the time of writing.
Despite the introduction of a task pool to process audio strips in parallel, loading each audio strip is still slow.
In order to draw any waveforms to the screen, we read all of the audio data up front. This becomes a problem as file sizes increase: the larger the file, the longer it takes.
Whenever a user works with the Video Sequence Editor, only part of each strip is visible at any given time. This means we can load just enough data to display the part of the waveform that is visible in the user interface.
For that, we needed to:
1. Determine what segment of the audio strip is visible.
2. For that visible segment, figure out which audio samples need to be read.
3. Track which segments were already loaded and avoid reloading them.
Determining visible segments
Blender draws audio strips as rectangles. Each strip has a starting and an ending position on the X axis.
Since the UI knows the length of the strip rectangle, we can compute the visible segment like so:
/* x1_aligned and x2 are the left and right edges of the strip rectangle. */
const float strip_length = x2 - x1_aligned;
/* Normalized position of the visible frame range within the strip. */
const float visible_range_start = (frame_start - x1_aligned) / strip_length;
const float visible_range_end = (frame_end - x1_aligned) / strip_length;
visible_range_start and visible_range_end mark what portion of the audio needs to be loaded. Both values lie between 0 and 1.
The audio samples that need to be read can then be determined from the range start and end values.
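As a rough illustration of that step (total_frames and samples_per_frame are hypothetical names, not the ones used in the actual patch):

/* Map the normalized visible range onto concrete sample indices. */
const int total_samples = total_frames * samples_per_frame;
const int sample_start = (int)(visible_range_start * (float)total_samples);
const int sample_end = (int)(visible_range_end * (float)total_samples);
/* Only samples in [sample_start, sample_end) need to be read. */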
Tracking segments
I needed a way to both keep track of audio samples that had already been loaded and to determine what samples belonged to a visible range.
At first I thought of just keeping a linked list of loaded segments:
To determine if a segment was loaded, I could iterate over the list.
When Blender loaded a segment, I’d add it to the list.
This didn’t work well once I had to figure out how to merge segments or handle overlapping loads (i.e., when only part of a segment is already loaded).
I spent a good amount of time working through this idea before deciding to park it: the implementation was getting too complex for the task, and it didn’t seem like it would ever work well.
After some research, I settled on using a segment tree to track loaded segments.
We know the audio strip data has a fixed size:
We can break the samples into fixed-size segments and track them using the segment tree like so:
Each node in the tree keeps track of a range and a flag indicating if the segment is loaded. Only leaf nodes refer directly to the samples. Higher-level nodes point to sub-segment nodes.
To determine if a given segment is loaded, we walk through the tree looking for the lowest node that contains the segment and inspect its flag.
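To make this concrete, here is a minimal sketch of such a tree. The node layout and function names are illustrative, not the actual implementation from the pull request:

/* Illustrative segment-tree node over a range of audio samples. */
struct SegmentNode {
  int start, end;                /* sample range [start, end) covered by this node */
  bool loaded = false;           /* true once every sample in the range is loaded */
  SegmentNode *parent = nullptr; /* enclosing segment, used when propagating loads */
  SegmentNode *left = nullptr;   /* covers [start, mid) */
  SegmentNode *right = nullptr;  /* covers [mid, end) */
};

/* Walk down to the lowest node(s) containing [query_start, query_end)
 * and report whether that range is already loaded. */
bool is_range_loaded(const SegmentNode *node, int query_start, int query_end)
{
  if (node->loaded) {
    return true; /* everything under this node is loaded */
  }
  if (node->left == nullptr) {
    return false; /* unloaded leaf */
  }
  const int mid = node->left->end;
  if (query_end <= mid) {
    return is_range_loaded(node->left, query_start, query_end);
  }
  if (query_start >= mid) {
    return is_range_loaded(node->right, query_start, query_end);
  }
  /* The range straddles both children: both halves must be loaded. */
  return is_range_loaded(node->left, query_start, mid) &&
         is_range_loaded(node->right, mid, query_end);
}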
Example of loading a segment
Suppose we want to display the waveform for the segment in yellow:
The left sub-segment is already loaded. Only the right sub-segment needs to be loaded.
The right sub-segment is marked as loading and added to the background job queue:
Once the right sub-segment is loaded, it is marked as loaded and the parent segment becomes ready for display:
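In code, the bookkeeping after a load completes might look like this (again a sketch, reusing the illustrative SegmentNode from above and its parent link):

/* Mark a node as loaded and bubble the state upwards: a parent
 * counts as loaded once both of its children are loaded. */
void mark_loaded(SegmentNode *node)
{
  node->loaded = true;
  for (SegmentNode *parent = node->parent;
       parent != nullptr && parent->left->loaded && parent->right->loaded;
       parent = parent->parent)
  {
    parent->loaded = true;
  }
}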
Results
Here's what it looks like when partial loads are used:
Future work
While we achieved a reasonable speed-up, there are still improvements to be made:
Waveform loading does not adapt to zoom levels. When a user zooms out, more of the strip is visible and its on-screen resolution is reduced, so the audio data could be sampled more sparsely to reduce how much is loaded.
The current implementation allocates enough memory to hold all of the waveform data, even if only part of it is loaded. An improvement would be to cap how much memory each waveform can use; waveform segments that are not visible could then be evicted to make space.
The segment tree implementation uses a fixed segment size. This could be made user-configurable, allowing users to tune segment loading to suit their workflow.
Closing thoughts
I learned a fair bit about Blender’s internals and believe the changes I made will benefit the community.
I had a great experience working on Blender throughout the summer. The community was also great and I got feedback and suggestions from multiple people. My mentor, Richard Antalik (@iss), was really helpful with thorough testing and code reviews.
If you’re interested in joining GSoC next year, Blender is a great project to work on.