Hello there, my name is Kalle Hämäläinen, and I'm the senior graphics programmer on Oceanhorn 2: Knights of the Lost Realm. In this blog post, I'm going to shed some light on how we took our game, which was developed for all Apple Arcade-supported devices, ranging from the iPhone 6s to the latest MacBook Pro, and ported it to the Nintendo Switch.
We set out with the following goals in mind:
Maintain the highest level of quality possible.
Deliver steady framerate.
Achieve native resolution (1080p in docked mode and 720p in handheld mode).
Use supersampled antialiasing through dynamic resolution instead of MSAA (used on iOS) or temporal AA (used on Mac).
Use the mobile renderer instead of the PC renderer, as we wanted to prioritize resolution over rendering features.
For the Switch port, we worked in collaboration with the studio Engine Software. Overall, our wishlist was a tall order for the Switch hardware, which features three CPU cores with relatively low clock speeds.
The first iteration
Our first Switch iteration was only running at 13 fps. On iOS, our target during development was 30 fps. When Apple Arcade was first announced, it wasn't clear which device would be the oldest (it ended up being the iPhone 6s). Targeting 30 fps allowed us to maintain a high resolution, and most importantly, stable battery life.
Solutions
There are 33.3 milliseconds available to render one frame at 30 fps, and our preliminary results were, unfortunately, closer to 80 ms per frame. We were puzzled: the game was already optimized for mobile devices. What did we miss? Quite a lot, as it turns out. Modern Apple mobile devices have a more powerful CPU than the Switch, which had allowed us to overlook many cases where resources were not used efficiently. I started to profile the game and look for answers.
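To make the frame-budget arithmetic concrete, here is a minimal sketch in plain C++. The function names are made up for illustration. At 13 fps a frame takes roughly 77 ms, which is in line with the "closer to 80" figure above, and a frame of 80 ms is about 46.7 ms over the 30 fps budget.

```cpp
#include <cassert>

// Time available for one frame, in milliseconds, at a given frame rate.
double FrameBudgetMs(double FramesPerSecond)
{
    assert(FramesPerSecond > 0.0);
    return 1000.0 / FramesPerSecond;
}

// Milliseconds that must be removed from a measured frame time to reach
// the target frame rate. Returns 0 when the frame already fits the budget.
double MsOverBudget(double MeasuredFrameMs, double TargetFramesPerSecond)
{
    const double Over = MeasuredFrameMs - FrameBudgetMs(TargetFramesPerSecond);
    return Over > 0.0 ? Over : 0.0;
}
```

Calling MsOverBudget(80.0, 30.0) gives the amount of frame time that has to be found through optimization.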
A key difference between a game running on a console and one running on mobile is that on mobile, performance varies with background applications, battery level, and device temperature. On consoles, once you reach stable performance, you know you can consistently count on it.
I started the optimization by finding slow Blueprints and porting them to C++. In two days, I managed to shave 20 ms off the initial results. These quick wins made me think there must be other low-hanging fruit we hadn't picked yet. One became apparent rather quickly: some levels had over 5,000 Actors, a number that is way over the recommended amount. While iOS devices can handle that without much of a problem, the Switch can’t without substantial changes.
I looked at how many of these Actors were ticking on each frame, and the number was over 1,000. On top of that, there were also almost 1,500 ticking components. That was our problem. The first thing we had to do was to reduce the number of ticks per frame. Here’s what we found and how we shaved off milliseconds:
Actors that update without doing much at all. For these, we either disabled ticking completely or significantly reduced the tick interval.
Actors that were querying overlaps manually. These could be refactored to use Overlap events instead, with ticking enabled only momentarily after an event.
Actors that do something purely visual, but are not visible to the player. To address this issue, we used frustum, distance, dynamic and static occlusion culling, and then we only updated visible actors.
Actors that do something and are technically visible, but are too far away for it to be noticeable. For example, in the White City, we have a total of over 300 characters at any given time, but only ten or so are usually in range of the player; those are the only ones that need to be updated.
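The cases in the list above can be sketched in plain C++ as follows. This is an illustration under stated assumptions, not the code from the port: SketchActor, IsRelevant, and TickActors are hypothetical names, the visibility flag stands in for whatever culling result is available, and the positions are 2D for brevity.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an Actor; not an Unreal Engine type.
struct SketchActor
{
    float X = 0.0f, Y = 0.0f;        // world position (2D for brevity)
    bool  bVisible = true;           // result of whatever culling is in use
    bool  bPurelyVisual = false;     // only animates or plays effects
    float TickInterval = 0.0f;       // seconds between updates; 0 = every frame
    float TimeSinceLastTick = 0.0f;
    int   TickCount = 0;             // number of updates actually executed
};

// Decide whether an actor deserves any update this frame.
bool IsRelevant(const SketchActor& Actor, float PlayerX, float PlayerY,
                float MaxUpdateDistance)
{
    if (Actor.bPurelyVisual && !Actor.bVisible)
    {
        return false; // purely visual work nobody can see
    }
    const float DX = Actor.X - PlayerX;
    const float DY = Actor.Y - PlayerY;
    // Compare squared distances to avoid a square root per actor.
    return DX * DX + DY * DY <= MaxUpdateDistance * MaxUpdateDistance;
}

// Advance all actors by DeltaSeconds and return how many actually ticked.
int TickActors(std::vector<SketchActor>& Actors, float DeltaSeconds,
               float PlayerX, float PlayerY, float MaxUpdateDistance)
{
    assert(DeltaSeconds >= 0.0f);
    int Ticked = 0;
    for (SketchActor& Actor : Actors)
    {
        Actor.TimeSinceLastTick += DeltaSeconds;
        if (!IsRelevant(Actor, PlayerX, PlayerY, MaxUpdateDistance))
        {
            continue; // hidden or too far away to matter
        }
        if (Actor.TimeSinceLastTick < Actor.TickInterval)
        {
            continue; // throttled by its tick interval
        }
        Actor.TimeSinceLastTick = 0.0f;
        ++Actor.TickCount;
        ++Ticked;
    }
    return Ticked;
}
```

The return value makes it easy to log how many updates survive the filters each frame, which is the number that matters when going from over 1,000 ticks to a handful.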
Unreal Engine offers ways to optimize these cases, but it's the team's responsibility to make smart decisions, since some optimizations require opt-in or custom code. For this port, we added and removed some code and also used built-in optimizations, such as a reduced animation tick rate.
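As an example of the custom-code side, the overlap refactor from the list above could look roughly like the sketch below. It is written in plain C++ and does not use Unreal Engine API. SketchTrigger and its members are hypothetical, and the assumption is that the collision system calls OnBeginOverlap while the game loop calls Tick every frame.

```cpp
#include <cassert>

// Hypothetical trigger object; the names are illustrative only.
class SketchTrigger
{
public:
    explicit SketchTrigger(float InActiveSeconds)
        : ActiveSeconds(InActiveSeconds)
    {
        assert(InActiveSeconds > 0.0f);
    }

    // Assumed to be called by the collision system when an overlap starts.
    void OnBeginOverlap()
    {
        RemainingActiveSeconds = ActiveSeconds;
        bTickEnabled = true;
    }

    // Called once per frame; costs almost nothing while ticking is disabled.
    void Tick(float DeltaSeconds)
    {
        if (!bTickEnabled)
        {
            return;
        }
        ++WorkDone; // placeholder for the real per-frame work
        RemainingActiveSeconds -= DeltaSeconds;
        if (RemainingActiveSeconds <= 0.0f)
        {
            bTickEnabled = false; // go back to sleep until the next event
        }
    }

    bool IsTickEnabled() const { return bTickEnabled; }
    int  GetWorkDone() const { return WorkDone; }

private:
    float ActiveSeconds;
    float RemainingActiveSeconds = 0.0f;
    bool  bTickEnabled = false;
    int   WorkDone = 0;
};
```

The design choice here is that the event, not a per-frame query, decides when work happens, so an idle trigger contributes no meaningful cost to the frame.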
Another slowdown was coming from the Landscape grass system. It’s a system that dynamically spawns millions of grass and foliage meshes on top of the landscape. It turned out that this used up to 2.5 ms per frame. We noticed that our way of using different kinds of foliage for each landscape layer was spawning a lot of empty batches. This could be easily fixed by adding an early out if the component had zero weight for a specific foliage type. The system was also updating all components of the landscape. To fix it, we first looked at what the maximum distance of any foliage type is, and then we updated only those components that are within this range. We went down from 4,000 to 10 landscape components (those next to the player), bringing the time spent down to 0.1 ms. If you feel these optimizations would be beneficial for your project, we're happy to provide them within the latest version of Unreal Engine on GitHub here and here (login required).
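The two grass fixes described above can be sketched in plain C++ like this. It is a simplified model, not the engine change itself: the struct and function names are hypothetical, distances are measured to a component's center and ignore its extent, and the function only counts the batches that would be built.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical foliage type; only its cull distance matters here.
struct SketchFoliageType
{
    float MaxDrawDistance = 0.0f;
};

// Hypothetical landscape component with one painted weight per foliage type.
struct SketchLandscapeComponent
{
    float CenterX = 0.0f, CenterY = 0.0f;
    std::vector<float> WeightPerType; // 0 = this foliage type never grows here
};

// Largest distance at which any foliage type can still be drawn.
float MaxFoliageDistance(const std::vector<SketchFoliageType>& Types)
{
    float Result = 0.0f;
    for (const SketchFoliageType& Type : Types)
    {
        Result = std::max(Result, Type.MaxDrawDistance);
    }
    return Result;
}

// Counts the (component, foliage type) batches that would be built this frame.
int CountGrassBatches(const std::vector<SketchLandscapeComponent>& Components,
                      const std::vector<SketchFoliageType>& Types,
                      float PlayerX, float PlayerY)
{
    const float Range = MaxFoliageDistance(Types);
    int Batches = 0;
    for (const SketchLandscapeComponent& Component : Components)
    {
        assert(Component.WeightPerType.size() == Types.size());
        const float DX = Component.CenterX - PlayerX;
        const float DY = Component.CenterY - PlayerY;
        if (DX * DX + DY * DY > Range * Range)
        {
            continue; // beyond the reach of every foliage type
        }
        for (std::size_t i = 0; i < Types.size(); ++i)
        {
            if (Component.WeightPerType[i] <= 0.0f)
            {
                continue; // early out: would only produce an empty batch
            }
            ++Batches;
        }
    }
    return Batches;
}
```

Computing the range once per update keeps the per-component test down to a single squared-distance comparison.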
Last but not least, Engine Software implemented a technique called Actor Tick Batching, which was used on Sea of Thieves. Actor Tick Batching reduces instruction cache misses and opens up other optimization possibilities, like removing duplicated work or doing tick rate optimizations based on Actor distance.
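The general idea of batching ticks by type can be sketched as follows. This is not Engine Software's implementation, whose details are not covered in this post; it is a minimal plain C++ model with hypothetical names, in which tick requests are collected into one bucket per type and then executed bucket by bucket, so the same code runs back to back.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical type tag; a real engine would use its own class information.
enum class SketchActorType { Torch, Villager, Pickup };

struct SketchTickable
{
    SketchActorType Type = SketchActorType::Torch;
    int TickCount = 0;
};

// Collects tickable objects and runs them grouped by type.
class SketchTickBatcher
{
public:
    void Register(SketchTickable* Actor)
    {
        assert(Actor != nullptr);
        Buckets[Actor->Type].push_back(Actor);
    }

    // Ticks every registered object, one type at a time, and returns the
    // sequence of types in the order they were executed.
    std::vector<SketchActorType> RunBatches()
    {
        std::vector<SketchActorType> Order;
        for (auto& Bucket : Buckets)
        {
            for (SketchTickable* Actor : Bucket.second)
            {
                ++Actor->TickCount; // placeholder for the type's tick logic
                Order.push_back(Bucket.first);
            }
        }
        return Order;
    }

private:
    std::map<SketchActorType, std::vector<SketchTickable*>> Buckets;
};
```

Because each bucket is processed in one place, it is also a natural spot to hoist shared work out of the inner loop or to skip distant objects, which is the kind of follow-up optimization mentioned above.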
All of these efforts contributed to a highly optimized port of Oceanhorn 2: Knights of the Lost Realm, which is coming to Nintendo Switch in Fall 2020. Thanks for reading!