March 17, 2020
Split-screen audio in Unreal Engine explained
It turns out that most people’s intuitive understanding of how split-screen audio should work runs counter to how it actually does, and needs to, work. Because of the surprising amount of interest from the game audio community in hearing more about this topic, I wrote this developer tech blog to provide a basic walkthrough of how split-screen audio works in Unreal Engine.
Background On Split-Screen Gaming
Split-screen gaming has seen something of a comeback in recent games. It was huge in the early days of multiplayer gaming, before the advent of console networking support. Some of my fondest teenage memories are of playing split-screen games like GoldenEye and Halo with my friends at pizza-fueled sleepovers. Then, once the first consoles started offering broadband internet support, split-screen support slowly dropped off. That trend is now reversing: people like playing together online *and* in person with their friends.
Fortunately, Unreal Engine supports up to four-way split screen. Games are often made to support both local “couch” co-op and remote networked multiplayer. This is what we do in Fortnite: you can have friends over to share the same console, and jump in and play with your other friends across the world.
To enable split screen in Unreal Engine, you just need to enable the option in the Local Multiplayer tab of your project settings and select the split-screen mode you want. You can also customize this and do much more using the C++ API.
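As a rough sketch of what those settings look like in config form (the section and key names here are from memory and should be checked against your engine version’s UGameMapsSettings class before relying on them):

```ini
; Hypothetical project config sketch -- verify these key names against
; your engine version's UGameMapsSettings before relying on them.
[/Script/EngineSettings.GameMapsSettings]
bUseSplitscreen=true
TwoPlayerSplitscreenLayout=Horizontal
```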
The Challenges of Split-Screen Gaming
Enabling split screen is ultimately not as simple as toggling a game project option and hoping for the best. Adding more perspectives into the world means more objects require rendering and there are fewer opportunities for culling. It also means more things are in use, loaded, and thus referenced by the garbage collector. All this extra work places great strain on nearly every subsystem in a game engine.
On the audio front, things get interesting.
Intuition Is Not Always Right
From my conversations with fellow developers, the intuition most people seem to have is that the audio for each split perspective should render everything audible to that player (i.e., “listener”). Indeed, it seems to make sense: if a gun goes off right next to player one, it should sound close to player one. But from the perspective of player two, who is farther away, it should also sound far away! After all, ears can’t be split. We should hear all audio from both perspectives, right? Wrong.
Setting aside the doubled CPU cost of audio rendering (for two-way split screen), this scenario would simply cause an unending sonic catastrophe. Think about it: every single event on screen that is within audible range of all players is doubled for two-way split screen; for three-way or four-way, it could be tripled or quadrupled! One footstep happens and… it’s the sound of a group of people taking a step. If that footstep happens at the exact same time for each listener (which it would) and plays the same footstep variation, you’ll suddenly get very loud audio as all the identical sounds constructively add together. You might even get clipping. Just imagine the chaos of a battlefield, with every gunshot rendered and audible from all perspectives at once.
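To make the constructive-addition problem concrete, here’s a toy sketch (plain C++, not engine code; the 0.7 peak value is made up for illustration) of what happens when the same sample-aligned footstep is mixed once per listener:

```cpp
#include <algorithm>

// Mix the same footstep sample once per listener. Because the copies are
// identical and sample-aligned, their peaks add constructively: the mix
// scales linearly with the listener count and can exceed the [-1, 1]
// output range, i.e. clip.
float MixIdenticalSources(float SourceSample, int NumListeners)
{
    float Mixed = 0.0f;
    for (int i = 0; i < NumListeners; ++i)
    {
        Mixed += SourceSample; // same variation, same start time
    }
    return Mixed;
}

// What the output stage would do with an over-range sample.
float HardClip(float Sample)
{
    return std::clamp(Sample, -1.0f, 1.0f);
}
```

A footstep peaking at a healthy 0.7 becomes 1.4 for two-way split screen and 2.8 for four-way, both of which slam into the 1.0 ceiling.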
Ok, so what is the right way to do it? It’s simple: render sounds once relative to the closest listener.
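In sketch form (plain C++ with a made-up Vec3 type, not the engine’s actual types), “closest listener” is just a distance comparison per sound:

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { float X, Y, Z; };

static float DistSquared(const Vec3& A, const Vec3& B)
{
    const float DX = A.X - B.X, DY = A.Y - B.Y, DZ = A.Z - B.Z;
    return DX * DX + DY * DY + DZ * DZ;
}

// Pick the listener a sound will be rendered against: the closest one.
// Each sound is then spatialized exactly once, relative to that
// listener, instead of once per split-screen view.
std::size_t ClosestListenerIndex(const Vec3& Emitter,
                                 const std::vector<Vec3>& Listeners)
{
    std::size_t Best = 0;
    float BestDistSq = DistSquared(Emitter, Listeners[0]);
    for (std::size_t i = 1; i < Listeners.size(); ++i)
    {
        const float DistSq = DistSquared(Emitter, Listeners[i]);
        if (DistSq < BestDistSq)
        {
            BestDistSq = DistSq;
            Best = i;
        }
    }
    return Best;
}
```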
How Unreal Engine Handles Split Screen
Unreal Engine’s solution is quite elegant. Its implementation strategy predates me, but its simplicity impressed me when I joined Epic five years ago, and although it’s similar in principle to methods I’ve seen elsewhere, I find it particularly clean.
Essentially, Unreal Engine transforms every sound emitter location into the local space of its nearest listener. This not only simplifies a ton of lower-level details, it means the lower-level audio renderer only ever needs to worry about one listener transform. Indeed, because of this simplicity, the audio mixer, our new multiplatform audio renderer, doesn’t require any listener geometry representation at all. Since all sound emitters are transformed into listener local space before being sent to the audio renderer, all sounds are simply spatialized relative to an identity matrix. This reduces complexity in our spatialization code and has paid dividends in simplifying development of our more exciting next-gen spatialization features.
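Here’s a minimal sketch of that transform in plain C++ (simplified to position plus yaw; the engine of course works with full listener transforms): once an emitter is expressed in listener local space, the renderer can spatialize it against an identity listener.

```cpp
#include <cmath>

struct Vec3 { float X, Y, Z; };

// Simplified listener: a position and a yaw (rotation about Z),
// standing in for the full listener transform the engine tracks.
struct Listener
{
    Vec3 Position;
    float YawRadians;
};

// Transform a world-space emitter position into the listener's local
// space: translate the listener to the origin, then undo its rotation
// so local +X is always "straight ahead". After this, every sound can
// be spatialized relative to a single identity listener transform.
Vec3 WorldToListenerLocal(const Vec3& Emitter, const Listener& L)
{
    const float TX = Emitter.X - L.Position.X;
    const float TY = Emitter.Y - L.Position.Y;
    const float TZ = Emitter.Z - L.Position.Z;
    const float C = std::cos(-L.YawRadians);
    const float S = std::sin(-L.YawRadians);
    return { C * TX - S * TY, S * TX + C * TY, TZ };
}
```

For example, a listener at (100, 0, 0) turned 90 degrees toward +Y hears an emitter at (100, 10, 0) as roughly (10, 0, 0) in local space: ten units straight ahead.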
Downsides
Although counter-intuitive, rendering sounds only relative to their closest listener makes a lot of sense, but it does come with some drawbacks, such as:
- Perspectives that aren’t too helpful
Obviously, a gunshot near one of the split-screen players may not be aimed at the closest listener; it could be aimed at the split-screen player farthest away. So you might think you should prioritize playing the sound relative to the farther player. The problem, however, is that such an audio cue might be confusing to the player actually being shot at! But, as I stated, the alternative is more problematic.
- Traveling sounds flipping perspective
A long-duration or looping sound might play long enough to flip from one listener perspective to the other. This flip can sound jarring. A worst-case scenario (and one we tested quite a bit) is two players/listeners facing the same direction but reasonably far apart, with a sound that travels from one player to the other. At the halfway point, the sound goes from being in front of one player to being behind the other. It’ll “pop” from front to back, which isn’t great.
- Rendering more sounds can change mix and priority balances
Any sound designer can attest that getting a balanced game mix is not easy. Getting a mix that works for both single screen and split screen is much more challenging.
- Additional sounds and CPU costs
Although rendering audio only once (relative to the closest listener) is far less expensive than rendering it separately for each listener, it still adds CPU cost. This is because more audio is simply “within range”: there are two places to consider when deciding which sounds are audible. For a game like Fortnite, where we push the limits of CPU and memory on a bi-weekly basis, this is no trivial challenge.
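The “traveling sounds flipping perspective” drawback above is easy to see in a sketch (plain C++, a one-dimensional toy model, not engine code): with two listeners on a line, both facing +X, the sound’s local forward offset flips sign the moment it crosses the midpoint.

```cpp
#include <cmath>

// 1-D toy model: two listeners on the X axis, both facing +X. A sound
// is rendered relative to whichever listener is closer, so its local
// "forward" offset (positive = in front, negative = behind) jumps
// discontinuously when the sound crosses the midpoint between them.
float RenderedForwardOffset(float SoundX, float ListenerAX, float ListenerBX)
{
    const float DistA = std::fabs(SoundX - ListenerAX);
    const float DistB = std::fabs(SoundX - ListenerBX);
    const float NearestX = (DistA <= DistB) ? ListenerAX : ListenerBX;
    return SoundX - NearestX;
}
```

With listeners at x = 0 and x = 100, a sound at x = 49 renders 49 units in front of the first listener, but one step later, at x = 51, it renders 49 units behind the second: the front-to-back “pop” described in the drawback above.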
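And the extra CPU cost is a set-union effect: a sound is in range if any listener can hear it. A sketch with made-up types (plain C++, not engine code):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Sound { float X, Y; float MaxDistance; };
struct ListenerPos { float X, Y; };

// A sound costs CPU when it is within audible range of ANY listener. A
// second split-screen listener grows the union of in-range sounds, even
// though each individual sound is still rendered only once.
bool IsAudible(const Sound& S, const std::vector<ListenerPos>& Listeners)
{
    for (const ListenerPos& L : Listeners)
    {
        const float DX = S.X - L.X;
        const float DY = S.Y - L.Y;
        if (std::sqrt(DX * DX + DY * DY) <= S.MaxDistance)
        {
            return true;
        }
    }
    return false;
}

std::size_t CountAudible(const std::vector<Sound>& Sounds,
                         const std::vector<ListenerPos>& Listeners)
{
    std::size_t Count = 0;
    for (const Sound& S : Sounds)
    {
        if (IsAudible(S, Listeners))
        {
            ++Count;
        }
    }
    return Count;
}
```

Two distant firefights that would never overlap for a single listener can both be in range at once for a split-screen pair, so the active voice count grows even though no sound is rendered twice.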
Multiple Endpoints
The savvy among you might ask about rendering audio to different audio endpoints, meaning separate hardware outputs (e.g., different sets of speakers, different controllers, etc.). Some consoles do support this, and on PC it is possible to render audio to different endpoints.
The idea here is that you could indeed render audio for each split-screen perspective and route it to the different hardware outputs that, presumably, the player would hear through headphones.
As of 4.24, this is not something the UE audio engine supports (though as of this writing we are planning to support it in 4.25). Furthermore, it’s not something that would be supported equally on all platforms. So even if you wanted to mitigate the CPU cost of all the extra audio rendering this way, you’d likely still need an alternative solution for platforms that can’t render to multiple hardware endpoints. I also find it amusing that you’d invite your friend over to play couch co-op in split screen and then immediately put on headphones and not talk, but that’s just me!
The Takeaway
So, yes, handling split-screen audio can be a somewhat complex and hairy topic with lots of technical details. However, it turns out that, counter-intuitively, if you simply render audio relative to the nearest listener (i.e., the nearest split-screen view), it all works out reasonably well.
For more information, check out the chapter I wrote on this topic in the upcoming third volume of the book series Game Audio Programming: Principles and Practices, edited by Guy Somberg.