Procedural music generation with Quartz

Adam Block, Paul Oakley, Marcel Swanepoel
Usually, players who log in to Fortnite between seasons will see a static screen that acts as a placeholder until the new season begins. To mark the beginning of Chapter Three, season three, the Fortnite team wanted to do something different.

Leveraging some of Unreal Engine’s core features, they created a rich, immersive scene filled with visual elements from the upcoming season.

The scene was then synced to Quartz, a native subsystem within the engine that allows for sample-accurate audio playback. Procedurally generated musical parts and visual effects were both “subscribed” to a Quartz clock, causing a stunning bioluminescent forest to pulsate in perfect sync with the music.

It’s the first time something like this has been attempted for Fortnite’s start-of-season events. In this article, we’ll hear from some of the Epic designers and artists who worked on the project about how they achieved the results.
 

Video game audio on the downtime screen

Hello! I’m Adam Block, a technical sound designer at Epic Games. I’m writing this tech blog to share some ideas and demonstrate a recent implementation my team and I put together in Fortnite using an Unreal Engine subsystem called Quartz. I hope that after reading this you’ll have a better understanding of what Quartz is and how it works, and that you’re inspired to incorporate Quartz into your own projects. Thanks for taking the time to learn about this feature—it’s super cool.

What is Quartz?

Quartz is a subsystem within Unreal Engine that schedules events to occur at precise (sample accurate) moments in time, even between audio buffers. If we’re using Quartz in a music context, we could consider it as playing the role of a conductor standing in front of an orchestra. The conductor's right hand waves their baton around in a pattern, keeping tempo that all players adhere to, while their left hand will occasionally extend out to signal the precise moment musicians in different sections should begin playing their part.

Quartz allows for sample-accurate audio playback and gives the audio engine enough time to communicate events back to the game thread for PFX or other gameplay events. Pretty cool, right? When Quartz fires a “Play Quantized” event, it can use any uSoundBase object, including .wav files, Sound Cues, MetaSounds, or an audio component.

The exact moment a Quantized Playback event occurs, we could trigger a particle effect to burst, a light to illuminate, and so on. Within the Quartz subsystem, we can define any number of specific “quantization boundaries” relative to a song's tempo and meter (for example: every four bars do this; on the second beat of every eight bars do that).
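To make the idea of a quantization boundary concrete, here is a minimal sketch of the arithmetic in plain Python. It is not the Quartz API: the function names, the 48 kHz sample rate, and the 4/4 meter are all assumptions made for illustration.

```python
import math

SAMPLE_RATE = 48_000  # assumed output sample rate, not taken from the article


def samples_per_beat(bpm: float, sample_rate: int = SAMPLE_RATE) -> float:
    """Length of one beat in audio samples at the given tempo."""
    return sample_rate * 60.0 / bpm


def next_boundary_sample(current_sample: int, bpm: float,
                         beats_per_boundary: int,
                         sample_rate: int = SAMPLE_RATE) -> int:
    """First sample index at or after current_sample that lands on a boundary.

    beats_per_boundary=16 models "every four bars" in 4/4 time (assumed meter).
    """
    boundary_len = samples_per_beat(bpm, sample_rate) * beats_per_boundary
    boundaries_elapsed = math.ceil(current_sample / boundary_len)
    return round(boundaries_elapsed * boundary_len)


# A sound requested one sample into the song waits for the next four-bar line.
start_sample = next_boundary_sample(1, bpm=120, beats_per_boundary=16)
```

Because the result is a sample index rather than a frame time, the scheduled sound can begin partway through an audio buffer, which is the property the conductor analogy is describing.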

While Quartz is a great choice for music-related ideas, it lends itself to non-musical contexts as well. In fact, there are several weapons in Fortnite where Quartz is used to trigger extremely fast and precise weapon fire audio. Imagine if you used a Quartz clock to schedule the shoot audio for a sub-machine gun where the weapon fire would execute every 32nd note of a 110 beats-per-minute clock. Maybe a particle effect could subscribe to that Quartz clock and spawn precisely on each shot fired. Without Quartz, fast gunfire can often be inconsistent and appear to “gallop,” since we are bound to the video frame rate.
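The “gallop” can be shown with a little arithmetic. The sketch below is plain Python, not engine code; the 60 frames-per-second rate and the helper names are assumptions. It computes the 32nd-note interval at 110 BPM and then rounds each ideal shot time up to the next video frame, which is roughly what a frame-bound trigger would do.

```python
import math


def note_interval_seconds(bpm: float, notes_per_beat: int) -> float:
    """Seconds between notes; a beat (quarter note) holds eight 32nd notes."""
    return 60.0 / bpm / notes_per_beat


def frame_indices(interval: float, fps: int, count: int) -> list:
    """Frame on which each of the first `count` shots would fire if every
    shot had to wait for the next video frame."""
    return [math.ceil(n * interval * fps) for n in range(count)]


def frame_gaps(frames: list) -> list:
    """Number of frames between consecutive shots."""
    return [later - earlier for earlier, later in zip(frames, frames[1:])]


interval = note_interval_seconds(110, 8)   # about 68.2 ms per shot
frames = frame_indices(interval, fps=60, count=12)
gaps = frame_gaps(frames)                  # a mix of 4- and 5-frame gaps
```

The ideal spacing is constant, but the frame-snapped spacing alternates between two values. That uneven spacing is the audible gallop, and it disappears when the shots are scheduled on a sample-accurate clock instead.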

Procedural music

In Fortnite, there was a request for custom music to accommodate a roughly ten-hour-long “downtime” period prior to the new season’s launch. This was an opportunity for us to use Quartz to create a procedurally generated (non-looping) listening experience where no identifiable musical patterns or repeating loops could be recognized. Typically, it’s easy to become bored or fatigued when hearing the same looping content over and over, so with this approach we took smaller “chunks” of musical phrases, shuffled them, and randomly chose variations to play back at runtime.

We built a music controller in Blueprint that handles this logic. In short, it shuffles a playlist of songs and chooses a song at random. Once a song has been selected, all of the musical “stems” (bass, drums, melodies, percussion, chordal, and so on) are referenced as the “current song” and the Blueprint logic (which uses the Quartz subsystem) simply dictates what should play and when.

One huge advantage to Quartz is that, since it’s an Unreal Engine subsystem, anyone on the development team can subscribe to my Quartz clock (for example, “Song 1,” “Song 2,” and so on) and get all of the bars and beats for the currently playing track. Once other people have the bars, beats, and so on, they have creative freedom to do whatever they’d like in perfect sync with the music.

In our case, though, to make things ultra simple for other teams, we simply called delegates on bars, beats, and other subdivisions from the Quartz clock. These delegates had bound events that the FX team used to execute visual changes. Here’s what we put together:
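As a rough illustration of the subscribe-and-delegate pattern described above, here is a toy clock in plain Python. It is only a sketch: the class, the event names, and the 4/4 meter are assumptions, and a real Quartz clock is driven by the audio engine rather than by manual ticks.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class ToyClock:
    """Hypothetical stand-in for a clock that broadcasts bar and beat events."""

    def __init__(self, beats_per_bar: int = 4) -> None:
        self.beats_per_bar = beats_per_bar
        self._beats_elapsed = 0
        self._listeners: DefaultDict[str, List[Callable[[int, int], None]]] = (
            defaultdict(list)
        )

    def subscribe(self, event: str, callback: Callable[[int, int], None]) -> None:
        """Bind a callback to the "bar" or "beat" event."""
        self._listeners[event].append(callback)

    def tick_beat(self) -> None:
        """Advance one beat and notify listeners with 1-based (bar, beat)."""
        bar_index, beat_index = divmod(self._beats_elapsed, self.beats_per_bar)
        bar, beat = bar_index + 1, beat_index + 1
        self._beats_elapsed += 1
        if beat == 1:
            for callback in self._listeners["bar"]:
                callback(bar, beat)
        for callback in self._listeners["beat"]:
            callback(bar, beat)


# An audio listener and a visual listener can share the same clock.
clock = ToyClock()
beats_seen, bars_seen = [], []
clock.subscribe("beat", lambda bar, beat: beats_seen.append((bar, beat)))
clock.subscribe("bar", lambda bar, beat: bars_seen.append(bar))
for _ in range(8):
    clock.tick_beat()
```

The listeners never need to know the tempo or how the music is chosen. They only react to the events, which is what keeps the hand-off to other teams simple.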

Primary data assets—song-specific information

Each song was its own primary data asset, which held all of the information relating to a track: the track name (we called this variable “clock name”), the tempo, the song duration in bars (how long we want the track to play procedurally before it ends and the playlist is shuffled), the phrase length and final beat of each layer (for example, bass lines are eight-bar phrases and the last beat is beat four; melodies are four-bar phrases and the last beat is four), and all uSoundBase assets for each music stem (layer).

Conceptually, we took an “A” and “B” approach similar to a DJ deck. All parts start playing on the “A” deck (that is, melody A) and when that part is done, another part is randomly selected to play and will play on the “B” deck. Although we didn’t use audio components with this behavior in mind, the idea behind trading phrases and pointing to a call and response system seemed to make sense at the time.

There is an array of “A” melodies, chord parts, drum phrases, bass lines, and percussion parts that all switch and pick from a “B” counterpart array of assets. This system is basically a procedurally generated call and response approach. The data asset simply holds these assets and the parameters and settings which define the duration of each layer so the music controller, which holds the Quartz clock, knows when to seek, shuffle, and queue a new layer to play.
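One way to picture the shape of such a data asset is as a pair of small records. The sketch below uses Python dataclasses purely for illustration; the field names, file names, and values are hypothetical and are not the ones used in the actual Fortnite assets.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LayerSettings:
    """Timing and clips for one stem (bass, melody, and so on)."""
    phrase_length_bars: int          # e.g. 8 for bass lines, 4 for melodies
    last_beat: int                   # beat on which the phrase ends
    deck_a: List[str] = field(default_factory=list)  # "A" variations
    deck_b: List[str] = field(default_factory=list)  # "B" variations


@dataclass
class SongDataAsset:
    """Everything the music controller needs to know about one song."""
    clock_name: str                  # also used as the clock's name
    tempo_bpm: float
    duration_bars: int               # bars to play before reshuffling
    intro: str                       # four-bar intro clip
    layers: Dict[str, LayerSettings] = field(default_factory=dict)


# Hypothetical example values, for illustration only.
example_song = SongDataAsset(
    clock_name="Track02",
    tempo_bpm=85.0,
    duration_bars=64,
    intro="track02_intro.wav",
    layers={
        "bass": LayerSettings(8, 4, ["bass_a1", "bass_a2"], ["bass_b1", "bass_b2"]),
        "melody": LayerSettings(4, 4, ["mel_a1", "mel_a2"], ["mel_b1", "mel_b2"]),
    },
)
```

Keeping the timing values next to the clips means the controller logic stays generic: it reads phrase lengths from the current song instead of hard-coding them.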

BeginPlay

I’ll run you through exactly how this process works. First, I’m pushing a Sound Mix to ensure that if there is leftover menu music or an edge-case scenario, any other music is ducked and/or not heard. I’m also fading up a looping marsh ambience, as the visuals show a cool lo-fi marsh area. After 15 seconds of playback, that ambience fades out over 30 seconds. I play a needle drop sound effect as if there’s a record being played (after all, the theme was “lo-fi”), then I kick off the setup and playback of music.

Shuffle data assets

In this function, I’m taking my DataAsset array, shuffling it, and setting one entry as my CurrentDataAsset. From that DataAsset, I’m getting the BPM and setting it to a Next BPM variable, and setting an IntroComplete bool to “false” because this is our first time playing back a new song.
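The same step might look like this in plain Python. This is a sketch of the idea rather than the Blueprint function itself; the dictionary keys and song values are hypothetical.

```python
import random
from typing import Dict, List, Optional


def pick_next_song(playlist: List[Dict], rng: Optional[random.Random] = None) -> Dict:
    """Shuffle a copy of the playlist and return fresh per-song state."""
    rng = rng or random.Random()
    shuffled = list(playlist)   # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    current = shuffled[0]
    return {
        "current_data_asset": current,
        "next_bpm": current["bpm"],
        "intro_complete": False,  # a newly chosen song has not played its intro
    }


# Hypothetical playlist entries.
playlist = [
    {"clock_name": "Track01", "bpm": 80.0},
    {"clock_name": "Track02", "bpm": 85.0},
    {"clock_name": "Track03", "bpm": 90.0},
]
state = pick_next_song(playlist, random.Random(7))
```

Passing in a seeded `random.Random` is only a convenience for testing; at runtime you would normally leave the seed out so every session gets a different order.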

Set song duration

From this point forward, all of the logic pulls from the “Current Data Asset” variable. For example, in this next function, I’m getting the song duration (number of bars) and setting it to a variable called “Current Song Duration.” This is how the music controller knows when to shuffle the playlist of songs again and choose another one (Quartz is counting the bars for me).

Does clock exist

The first time the tool runs the logic, I’m checking if a clock exists already (based on the track name “Track02,” for example). In this case, it does not exist, so I’ll need to set up some Quartz Quantization Boundaries for the clock and also create a new clock naming it whatever the name of the current song is.

Create and cache Quantization Boundaries

In this function, I’m creating and caching the different Quartz Quantization Boundaries that I’ll need. For example, “every four bars,” “next bar,” and “immediately” are a few useful scenarios for scheduling playback. By creating and caching these as variables, it makes the Blueprint graph less cluttered and gives easy access to each boundary when I play quantized.

Reset Bools

In this function, I’m just resetting all the Booleans I may have set in the previous cycle.

Check song form cycle

In this section, I’m checking to see if this is the first time we’re cycling through a song. If it is, I first check to see if there’s a clock that’s currently active. If the clock isn’t running yet, I grab the stereo four-bar intro for the song, set it to an audio component, and Play Quantized using an “Immediate” Quantization Boundary.

This “Immediate” Quantization Boundary also starts the clock and resets the transport for me (these are options on a Quartz Quantization Boundary). Basically, at this point, I’m recognizing that this is the first time we’re playing a song, queuing up the song intro, and setting the playback transport to 00:00:00.

Next, I Play Quantized and subscribe to different quantized beats. This is where our individual subdivisions are executed as events used for counting and sending delegates, serving as our underlying clock pulse events.

Main playback logic

As the bars and beats begin to pulse away, I’m setting variables (bars and beats respectively) so I can keep a count of the total duration of bars and what beat we’re on. I’ll use these variables later to know the correct time to shuffle the song playlist and choose a new song.
On each beat, I’m checking if the song should change:
Inside this “Calculate Song Duration” function, I’m referencing the current data asset’s “total duration in bars” and on each beat doing the math “should it change now?” It’s referencing my “Bar Counter” variable and using a Modulo to tell me when we’ve finally reached the Current Song Duration and we’re on the fourth beat of that measure. When that is true, we know it’s time to shuffle the playlist of songs and continue our logic.
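The modulo check can be sketched in a few lines of plain Python. The function name and the 1-based bar counter are assumptions; the Blueprint version may count differently.

```python
def should_change_song(bar_counter: int, beat: int,
                       song_duration_bars: int, beats_per_bar: int = 4) -> bool:
    """True on the last beat of the last bar of the current song.

    bar_counter is assumed to be 1-based (the first bar of the song is 1).
    """
    on_final_bar = bar_counter > 0 and bar_counter % song_duration_bars == 0
    on_final_beat = beat == beats_per_bar
    return on_final_bar and on_final_beat
```

For a 32-bar song in 4/4, `should_change_song(32, 4, 32)` is true, while beat three of bar 32 or beat four of bar 31 is not. Because the check runs on every beat, it costs almost nothing and needs no separate timer.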
When it’s time to change tracks, I cue up a “DJ Spin” sound effect and spawn a transitional sound that will help end the first song and begin the intro of the next.
Finally, we reach the “InitiateNewTrackPlayback” event, which jumps us up to the top of the Blueprint logic and we start all over again, running through the same logic. This time though, we’re bypassing that initial “Needle Drop” and ambient looping audio.

Procedural music generation

Once a track starts and it’s randomly selecting and queuing musical elements to play, there are a few things happening. Let’s take a look. On each beat, I’m checking “Is this the right time to play something new?” via these “Calculate Part Timing” functions.
As an example, the bass parts will reference the Current Data Asset and check if we’re at the correct bar and beat before we choose a variation. If the right conditions have been met, we continue down our execution path to the next piece of logic.
Now, we select which set of bass parts is appropriate to play next. As mentioned earlier, there are two “sets” of each musical layer: an “A” and a “B” group or “deck” as I’ve called them. It’s kind of like a DJ’s Deck A and Deck B—if “A” is currently playing, then randomly choose a selection from “B”. If “B” is currently playing then randomly choose a selection from group “A.”
Once that selection has been made, I then take the variable “Deck A Is Playing” and set it to the opposite of what it was, so the next time the logic runs it will choose the other deck. This is essentially causing a Boolean variable to flip-flop each time: if A is playing, choose B and set B to active; if B is active, choose A and set A to active.
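Here is a small plain-Python sketch of that A/B flip-flop. The class and clip names are hypothetical, and it assumes deck A is the one playing when the song starts.

```python
import random
from typing import List, Optional


class DeckSwitcher:
    """Alternates between two pools of clips, picking randomly within each."""

    def __init__(self, deck_a: List[str], deck_b: List[str],
                 rng: Optional[random.Random] = None) -> None:
        self.deck_a = deck_a
        self.deck_b = deck_b
        self.rng = rng or random.Random()
        self.deck_a_is_playing = True  # assumed starting state

    def choose_next(self) -> str:
        """Pick from the deck that is NOT playing, then flip the flag."""
        source = self.deck_b if self.deck_a_is_playing else self.deck_a
        choice = self.rng.choice(source)
        self.deck_a_is_playing = not self.deck_a_is_playing
        return choice


bass = DeckSwitcher(["bass_a1", "bass_a2"], ["bass_b1", "bass_b2"],
                    random.Random(1))
picks = [bass.choose_next() for _ in range(4)]  # alternates B, A, B, A
```

The random choice decides which variation plays, while the flag guarantees the call-and-response order between the two decks.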

Passing the sound variable to the Play function:

As each part is randomly chosen, it will pass that uSoundBase reference (the musical clip) as an output and feed it into a “Queue Next Deck” function.
That “Queue Next Deck” function does some checks for the following: Is it a four-bar phrase? Is it a transitional element (two bars)? Is it an eight-bar phrase? Depending on the result, it will assign itself to Play Quantized using the correct Quantization Boundary.
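The phrase-length check amounts to a lookup into the boundaries that were created and cached earlier. The sketch below shows that idea in plain Python; the `Boundary` record and the boundary names are hypothetical stand-ins, not Quartz types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Boundary:
    """Hypothetical stand-in for a cached quantization boundary."""
    name: str
    bars: int  # 0 means "fire immediately"


# Created once and reused, mirroring the "create and cache" step.
CACHED_BOUNDARIES = {
    "immediate": Boundary("immediate", 0),
    "next_bar": Boundary("next_bar", 1),
    "two_bars": Boundary("two_bars", 2),
    "four_bars": Boundary("four_bars", 4),
    "eight_bars": Boundary("eight_bars", 8),
}

_PHRASE_TO_BOUNDARY = {2: "two_bars", 4: "four_bars", 8: "eight_bars"}


def boundary_for_phrase(phrase_length_bars: int) -> Boundary:
    """Return the cached boundary that matches a clip's phrase length."""
    key = _PHRASE_TO_BOUNDARY.get(phrase_length_bars)
    if key is None:
        raise ValueError(f"no boundary cached for {phrase_length_bars}-bar phrases")
    return CACHED_BOUNDARIES[key]
```

A two-bar transitional element maps to the two-bar boundary, a melody to the four-bar boundary, and a bass line to the eight-bar boundary. An unsupported length fails loudly instead of silently playing out of time.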
So that’s pretty much it! As you can see, Quartz is an extremely powerful subsystem that enables you to schedule playback with sample accuracy at precise moments in time. All I’ve done in this setup is split up the layers of music and break them down into four- and eight-bar phrases that are scheduled to play at the appropriate times using different Quantization Boundaries.

When Quartz has counted and we’ve reached the maximum number of bars for each song, we simply shuffle the Data Asset Pool, pick another song, set the BPM, clear any specific Booleans, gates, DoOnces, and so on, and run through the same logic.

It’s important to know that my particular approach to solving this is not “the” way, it’s not “Epic’s” way; it is simply “a” way. Moving forward as we refine and optimize this controller, there will be plenty of opportunity to consolidate, reduce redundancy, and incorporate MetaSounds into the setup. I encourage anyone interested in music for games—procedural music generation especially—to dive into Quartz and build a system of your own, using an approach that makes the most sense to you. Remember, the Quartz subsystem is not a music-only tool, as there are plenty of use-cases for Quartz outside of a musical application. Weapons, events, synchronizing anything on a project-wide scale are all excellent opportunities to create a Quartz clock and do your thing.

I hope you learned a bit more about the Unreal Engine Quartz subsystem and were able to start thinking of ways you can use it for your projects. Thanks again and take care!

Creating the visuals

Hi! I’m Paul Oakley, Marketing Art Director at Epic Games, and I’m going to provide some information on how the visuals were created for the screen shown at the start of Fortnite Chapter Three, season three.
The high-level brief was to create a low-fi ambient screen that players could engage with. That fit really well with the tone of the upcoming season, and with the concept of nature. We thought it would be perfectly suited to the idea of a bioluminescent forest.

We developed the environment using lensing and depth of field, to keep it very abstract. You had detail in the foreground, but it became more and more abstract as you went into the distance.
We were talking about having this low-fi soundtrack. And I asked, what happens if we tied the idea of the bioluminescent forests being alive to the idea of the soundtrack and then paired various parts of the bioluminescence to various parts of the beat? That’s where the audio team came into its own.

We had a CG team create the static beauty render. Then we ran our various channels of that image—all the bioluminescent trees, and so on—into all different maps. The UI/UX team created a UV card, and that image was mapped onto the card. 

And they used those channels to drive variation in intensity and color, and that was hooked into the Quartz clock. That would then drive the value of intensity multiplication, which would get triggered by the amplitude of the beats for each one. 

Fundamentally, it's a CG image mapped to a card, and then channels run out that indicate various mask elements within that image.
After that, we defined how we would change color, how we’d change intensity, or how we’d change undulation or motion using Perlin noise or procedural noise as a math function right across the card.
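As a rough per-pixel illustration of driving a masked element from the beat, here is a plain-Python sketch. The additive composite, the exponential pulse, and every name and constant in it are assumptions made for the example; the production material was authored in Unreal's material editor and may use different math.

```python
import math
from typing import Tuple

Color = Tuple[float, float, float]


def beat_pulse(seconds_since_beat: float, decay: float = 4.0) -> float:
    """Intensity that peaks at 1.0 on the beat and fades afterwards
    (assumed falloff shape)."""
    return math.exp(-decay * seconds_since_beat)


def composite_pixel(base: Color, mask: float, tint: Color,
                    intensity: float) -> Color:
    """Add a tinted emissive contribution where the mask is non-zero,
    clamping each channel to 1.0."""
    return tuple(min(1.0, b + mask * t * intensity) for b, t in zip(base, tint))


# A dim base pixel, half covered by a green-blue emissive mask, right on the beat.
lit = composite_pixel((0.1, 0.1, 0.1), mask=0.5, tint=(0.0, 1.0, 0.5),
                      intensity=beat_pulse(0.0))
```

Where the mask is zero the pixel is left exactly as rendered, so only the bioluminescent elements respond to the music.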

What's exciting for devs about it?

There are many cool things on this project that are interesting for developers. For a start, it’s an example of traditional filmmaking blending into real-time workflows.

This used old school filmmaking principles, such as rendering out AOVs—or secondary outputs—whether they be emissive masks or textural masks or a fog volume pass.
You had the beauty plate along with these additional passes. Normally, you’d go away and composite them and render them offline, and that would be your one static image again—so you basically sandwich it, recompress it, and then actually view the image.

This didn’t work like that. You took all of the outputs. You fed them back in a real-time context into a card that is fundamentally then real-time-captured by a camera. And in that card, you're then re-compositing all of the secondary outputs and then using math to make the changes using those masks.

The Quartz clock then hooks into those masks, changing and undulating them in real time as the card is captured by the camera. The result is then presented to you as a viewer back in screen space. It's really thinking outside the box. It’s taking a bunch of things that pre-existed and then being really avant-garde about it.

UI and shaders

Hey, I am Marcel Swanepoel, Senior UI Artist on Fortnite at Epic. I’m going to run through some of our UI and UI material setup for the Chapter Three, season three Fortnite start-of-season screen.

We kicked off this feature’s development cycle with some early explorations into the UI tech requirements needed. The goal here was to generate a contained system capable of taking dynamic audio inputs and then use that to control the final visual output. At the same time, we had to keep the system nimble enough to be used as a loading screen.

For the final system, we chose to output the audio ranges from a Blueprint to a Material Parameter Collection (MPC). This serves as the dynamic inputs required to get that data into our UI Material where we use it to influence the various render passes and 2D FX used to build up the final visual. This UI Material is then referenced in the Widget Blueprint for the loading screen where we add other UI elements such as text and timer using Unreal Motion Graphics (UMG).

The UI Material setup borders on the complex side due to its need to recomposite the various render passes (provided by the render team). The pass breakdown starts with the color pass. The color pass is a fully rendered image, where the illuminated foliage, river, and bokeh are all turned off. Additionally, any reflections and bounced light from these elements are all disabled. It ends up being the very first static shot of the loading screen when it starts off, prior to the music beginning.

We introduce the foliage emissive in various texture passes due to its need to carry a full range of RGB values. This also enables us to generate foliage groups that can be triggered at different time intervals, coming from audio. This provides the ability to introduce greater variations over all the emissive foliage.
Another important pass that we use extensively across the entire image is the Z-depth. This is used to help control and suppress the intensity of foliage emissive in the scene depth. The further back in the scene the foliage is located, the less intense it is. Now a good question here would be: why don’t we render that scene depth and intensity in the foliage passes themselves? The answer is that we prefer having all the foliage emissive span the full 0-1 range. That enables us to really dial the effect up if we need to and also enables us to balance the whole scene more effectively once we recomposite all the passes. In summary, it gives us more control over the final output and balancing.

The Z-depth also gives us the ability to further extend our range of foliage emissive by dividing the render passes into a foreground and background grouping. Essentially, we take the single rendered texture for one of the emissive passes, clamp the Z-depth in a place on screen that feels natural, then use the clamped Z-depth as a mask so we now have two emissive outputs.
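The depth split can be sketched on a handful of pixels in plain Python. This is an illustration only: the function name, the hard threshold, and the convention that 0 is near the camera and 1 is far are all assumptions.

```python
from typing import List, Tuple


def split_by_depth(emissive: List[float], depth: List[float],
                   threshold: float) -> Tuple[List[float], List[float]]:
    """Split one emissive pass into foreground and background outputs.

    depth uses an assumed convention: 0.0 is nearest the camera, 1.0 is farthest.
    """
    foreground, background = [], []
    for value, z in zip(emissive, depth):
        background_mask = 1.0 if z >= threshold else 0.0
        background.append(value * background_mask)
        foreground.append(value * (1.0 - background_mask))
    return foreground, background


# Three sample pixels: one near, two beyond the chosen split point.
fg, bg = split_by_depth([1.0, 0.5, 0.8], [0.1, 0.6, 0.9], threshold=0.5)
```

Each pixel ends up in exactly one of the two outputs, so adding them back together reproduces the original pass. That is what lets the two halves be driven by different inputs without changing the overall look when both are at full strength.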

We can then map both the foreground and the background to different audio outputs, providing the ability to have the background cycle slower or faster than the foreground. The alternative to this would be mapping the entire emissive pass to a single audio output and the cycling would be uniform across the entire screen. This would not be ideal, given we really want to increase variation. Using the Z-depth to split the emissive enables us to double the amount of variation we get from a single foliage emissive pass and gets us to where we want to be.
When we look at how I created the streak effect that runs across the curving river, we can again see how heavily we rely on the Z-depth pass. The streaks are channel-packed distance fields that pan across the V channel of a custom UV.

As the streaks move through the scene, we use the Z-depth to clamp the distance fields, which gives the illusion that they are moving from being out of focus in the background to being in focus in the foreground. Additionally, we influence the intensity of the streaks across the scene with the Z-depth. The custom UV is generated to follow the contours of the river so we can map it from 0-1 and have the streaks follow that appropriately. We use the audio inputs to influence both the intensity and the number of streaks that are shown.
To round the scene out, we added some atmospherics using bokeh and light rays. Both of these, again, rely on the Z-depth to blend into the scene and mask areas out appropriately. The bokeh effects were generated using a random noise function to drive motion, size, and opacity of a circular Signed Distance Field (SDF). The light ray mask was another pass that the render team provided and we channel-packed that along with the Z-depth and Custom UV mask. Channel packing is an ideal way to combine up to three grayscale textures into a single RGB texture asset. Each color channel holds the individual grayscale texture value and you can then extract the separate channels once in the material. This is great for optimization gains.
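Channel packing itself is simple to show with a few sample values. The sketch below is plain Python on flat lists of pixels; the function names and the sample data are hypothetical, and in practice the packing is done when the texture is authored.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]


def pack_channels(red: List[float], green: List[float],
                  blue: List[float]) -> List[Pixel]:
    """Combine three grayscale images (flat lists) into one RGB image."""
    if not (len(red) == len(green) == len(blue)):
        raise ValueError("all three grayscale images must be the same size")
    return list(zip(red, green, blue))


def unpack_channel(packed: List[Pixel], index: int) -> List[float]:
    """Extract one grayscale image again (0 = R, 1 = G, 2 = B)."""
    return [pixel[index] for pixel in packed]


# Hypothetical sample values for the three masks mentioned above.
light_rays = [0.0, 0.7, 1.0]
z_depth = [0.1, 0.5, 0.9]
custom_uv = [0.0, 0.5, 1.0]
packed = pack_channels(light_rays, z_depth, custom_uv)
```

One packed texture replaces three separate grayscale textures, so the material samples it once and reads each mask from a different channel.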

Final balancing and adjustments were done to the material and texture in close collaboration with the render team to ensure that we were keeping values in range and staying true to the initial scene that they put together.

To conclude, our goal of building out a contained system that could meet our success criteria would have been virtually impossible without leveraging UI materials. From the outset, that was clear and our only real question was how to convert the audio inputs into a meaningful and dynamic visual experience.

For me, the flexibility that the UI material pipeline brings to UMG is something that should not be overlooked. It is incredibly powerful and versatile and we have, for the better part of five years, used it extensively in our UI development.
