Capturing Stereoscopic 360 Screenshots and Movies from Unreal Engine 4
We've been working on capturing 360-degree videos since before the "vertical slice" build of Hellblade went out for review, and the first scene we ever fully captured in stereoscopic 360 was the in-engine cinematic used as the intro for our first playable build.
Originally we were writing our own in-house monoscopic 360 capture system that was based on cubic capture, projection onto a sphere and subsequent warping for the final image.
While this worked fine, the monoscopic nature of it meant that everything felt "huge" and it didn't manage to portray any sense of intimacy with the subject of the videos; you felt like a distant observer rather than having a sense of presence within the scene, despite it being all around you.
(My thinking on why monoscopic footage tends to lose a sense of scale and feel huge is that it's one of those subconscious things that your brain evaluates on your behalf. It can tell you have no stereoscopic convergence, and no parallax from head movements, so the object must be very far away. It then combines that information with the fact that the object fills a large part of your view and the subsequent feeling you get is that the object is very, very large and very far away.)
Try it for yourself by capturing both a stereoscopic and a monoscopic frame. When you switch back and forth you'll notice that not only do you lose the sense of depth, but you lose all sense of scale too, and objects tend toward being enormous.
At about this point we started to investigate whether we could generate left and right eyes with appropriate offsets to generate true stereoscopic images, and in the process we stumbled across the stereoscopic 360 capture plugin provided by the Kite and Lightning devs.
At this point it was basically just up on GitHub and not part of the UE4 distribution, but these days it comes "out of the box" with Unreal Engine 4 and I strongly encourage you to check it out. :)
The rest of this post is going to cover what our particular settings and workflow are at Ninja Theory for capturing out 360 stereo movie captures like the one we just launched for public viewing:
Original non-360 trailer:
360 stereoscopic version:
The post assumes you're on the latest version of UE4 (at time of writing this is 4.11.2), though we've been using this plugin since back on 4.9, so most of this is directly applicable (you may just require a few additional code fixes, which I mention in the troubleshooting section at the end).
Enabling the "Stereo Panoramic Movie Capture" plugin and doing a quick test capture:
First things first then, we need to make sure that you have enabled the appropriate plugin.
With the editor open, go to Edit -> Plugins then select the "Movie Capture" setting on the left and make sure that "Enabled" is ticked on "Stereo Panoramic Movie Capture."
You'll need to shut down and restart the editor after this.
Note: You may also need to quickly 'build' again, depending on whether you've got local changes in your branch, as the plugin dll shipped might be 'stale.'
When the editor reboots go back to "Editor -> Plugins -> Movie Capture" and double-check that it's ticked.
This plugin has several settings available that you can toggle via console commands, but before we get into that you should do a quick test capture to make sure things are working as expected with default settings.
Open the command console and set SP.OutputDir with an appropriate folder where you want to dump output images, e.g.,
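For example (the drive and folder here are just placeholders; use any location with plenty of free space):

```
SP.OutputDir F:/StereoCaptureFrames
```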
Note: You MUST do this each time you load the editor or game, as it doesn't persist.
Then do a single-frame capture, e.g.,
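The single-frame capture is kicked off with the plugin's screenshot command, which takes no arguments:

```
SP.PanoramicScreenshot
```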
At this point you'll probably experience a nice long (expect a minute or so) hitch after which two images will be dumped into the directory you specified with SP.OutputDir above (well actually under a date-and-time directory within that directory); one for the left eye and one for the right eye.
Take a quick look at them to make sure that everything is there as expected. Don't worry too much if there are artefacts like banding at the moment, as we'll try to address those later on (although some effects such as light shafts don't work, being screen-space effects; we'll cover that more later, too).
Code changes to get both the left and right eyes to be combined into a single image automatically
For our workflow internally we want to just dump one completed top/bottom image per 'frame.' This makes it much easier for us to say "build all the frames into a movie" without the need to manually combine left and right eyes as part of the process.
If this applies to you and you're happy to get your hands dirty in the code, then here's a quick (and not specifically optimised) bit of code to combine the left and right eyes before outputting the image.
1. Open up SceneCapturer.cpp
2. Define a control variable to allow you to toggle between combined and not combined. Internally we ALWAYS want to be outputting combined images, so we just made ours a const global bool.
const bool CombineAtlasesOnOutput = true;
3. Conditionally disable the current per-eye output at the bottom of USceneCapturer::SaveAtlas (the CombineAtlasesOnOutput check is the only new bit).
IImageWrapperPtr ImageWrapper = ImageWrapperModule.CreateImageWrapper( EImageFormat::PNG );
if (!CombineAtlasesOnOutput) //*NEW* - Don't do this here if we're going to combine them.
ImageWrapper->SetRaw(SphericalAtlas.GetData(), SphericalAtlas.GetAllocatedSize(), SphericalAtlasWidth, SphericalAtlasHeight, ERGBFormat::BGRA, 8);
Note that doing this also reveals an error in the code directly below, inside the "GenerateDebugImages->GetInt() != 0" branch, where it was outputting PNGData but SHOULD be outputting PNGDataUnprojected... so fix that, too.
4. Then add some new code to combine the eyes and output a single image; in USceneCapturer::Tick, find the line
Insert the following code (I've included the surrounding code so you can be sure it's in the right place)
//*NEW* - Begin
IImageWrapperPtr ImageWrapper = ImageWrapperModule.CreateImageWrapper(EImageFormat::JPEG);
ImageWrapper->SetRaw(CombinedAtlas.GetData(), CombinedAtlas.GetAllocatedSize(), SphericalAtlasWidth, SphericalAtlasHeight * 2, ERGBFormat::BGRA, 8);
// Generate name
FString FrameString = FString::Printf(TEXT("Frame_%05d.jpg"), CurrentFrameCount);
FString AtlasName = OutputDir / Timestamp / FrameString;
//*NEW* - END
// Dump out how long the process took
FDateTime EndTime = FDateTime::UtcNow();
FTimespan Duration = EndTime - StartTime;
UE_LOG( LogStereoPanorama, Log, TEXT( "Duration: %g seconds for frame %d" ), Duration.GetTotalSeconds(), CurrentFrameCount );
StartTime = EndTime;
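For reference, the actual combining of the two atlases is conceptually trivial: because scanlines are stored one row after another, stacking the right eye below the left eye is just a matter of appending one BGRA pixel buffer to the other. Stripped of the UE4 types, the idea looks like this (a standalone sketch; the names are illustrative, not the plugin's):

```cpp
#include <cstdint>
#include <vector>

// Stack two equal-sized BGRA images vertically: left eye on top, right eye
// below. Each input holds Width * Height * 4 bytes of pixel data; the output
// is a Width * (2 * Height) image, i.e. the two buffers back to back.
std::vector<uint8_t> CombineTopBottom(const std::vector<uint8_t>& LeftEye,
                                      const std::vector<uint8_t>& RightEye)
{
    std::vector<uint8_t> Combined;
    Combined.reserve(LeftEye.size() + RightEye.size());
    Combined.insert(Combined.end(), LeftEye.begin(), LeftEye.end());
    Combined.insert(Combined.end(), RightEye.begin(), RightEye.end());
    return Combined;
}
```

In the plugin you'd build CombinedAtlas in this fashion from the two spherical atlases before the SetRaw call above; the doubled height passed to SetRaw (SphericalAtlasHeight * 2) accounts for the stacking.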
And that's it! (Maybe Epic or K&L will roll this into the plugin.)
Note: In the above we set the output to be JPEG, too; this is because 4096x4096 frames in PNG at 60 frames per second are going to eat a lot of HDD space!
As above, set your SP.OutputDir and call SP.PanoramicScreenshot and you should now see something more like this, with the left and right eyes combined into a single top/bottom image:
If you have a stereoscopic 360 image viewer you should be able to literally just feed that image in and be inside your scene. An exciting start!
Super-Brief Notes on What the Capture Does
This is a super high-level explanation of what's happening when you do a capture, mainly for the purpose of framing what the settings do below... more information is out there if you want it, but you shouldn't need to know more to use the plugin.
A key thing to understand is that when you're talking about capturing stereoscopic information in a correct way from 2D left/right eye images, the only bit of the scene that is really showing correct stereoscopy for any given direction is the bit near the middle of the screen (i.e. the part 'between the eyes'), at least for capturing purposes. Behind the scenes the capturer is rendering the whole standard game-view (albeit with a different provided FOV) and then throwing most of it away.
In reality the width of the region taken depends on your HorizontalAngularIncrement and CaptureHorizontalFOV, but for 'high quality' capture settings it ends up being really quite small!
Because of this, when the plugin renders your 360 view it actually takes a number of different captures, rotating the camera a bit each time and extracting just the middle bit for use later. One way to think of this is that the more individual samples you take, the more precise stereoscopy information you're going to have for any given point.
There are two variables that control this:
- SP.HorizontalAngularIncrement controls how many degrees to turn horizontally between captures.
- SP.VerticalAngularIncrement controls how many degrees to turn vertically (from pole to pole) each time you complete a full 360-degree horizontal rotation.
Of the two of these, the HorizontalAngularIncrement is by far the one that needs to be tweaked to a low value. If you set it to something like 10 degrees (totalling 36 renders to do one 360-degree turn) then you're going to notice that your depth is off, you'll effectively end up with 'bands'... but in perceived depth information, rather than literal colour bands.
Configuring the Plugin Settings For Best Quality
Aside from SP.OutputDir there are several other CVar controlled settings for the plugin that you will want to configure in order to get a better quality capture; these control things like how many horizontal and vertical views you render to generate your frame (which generates more precise looking stereoscopy), how big you want your target output to be, how big you want your individual views to be (this is separate from your output resolution), and other bits and bobs.
I'll just explain the ones that we have particular settings for here; a complete list can be found at the top of StereoPanoramaManager.cpp.
The settings that we use here:
And then for 4096x4096 captures (4096x2048 per eye):
Or for 6144x6144 captures:
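Pulling together the values discussed in the rest of this section, the core of our settings list looks something like the following (the output-resolution settings are omitted here, since they're what varies between the 4096 and 6144 variants):

```
SP.HorizontalAngularIncrement 2
SP.VerticalAngularIncrement 30
SP.CaptureSlicePixelWidth 720
SP.ConcurrentCaptures 6
SP.OutputDir F:/StereoCaptureFrames
```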
As you can see, we use a very small value for HorizontalAngularIncrement resulting in 180 separate renders of the scene (per eye) for each 360-degree turn... going even lower MAY result in even better quality, but you're going to increase the amount of time it takes to render a frame and we found that 2 was the "sweet-spot" where you didn't really perceive any improvement in quality by going any lower.
For our VerticalAngularIncrement we set a significantly larger value. We found that you don't tend to have as much disparity in stereoscopy information vertically, at least for the scenes we tend to capture (which does include character models, so it's not all flat walls!) and 30 was our sweet spot here.
It's worth making a quick note here that the total number of views you're going to render PER FRAME is going to be based on these two parameters.
Effectively what you do is look directly upwards, capture a scene, turn HorizontalAngularIncrement degrees, capture again, turn HorizontalAngularIncrement again... and so on until you've turned all the way around 360 degrees. You then pitch down by VerticalAngularIncrement and do the whole thing again, and then repeat THAT until you're looking directly downwards.
So in effect, for a single stereoscopic-360 capture frame, the total number of renders you're going to do to generate that image is going to be:
(360 / HorizontalAngularIncrement) * ((180 / VerticalAngularIncrement) + 1) * 2 (the final x2 because it's per-eye; the +1 because the vertical sweep includes both the straight-up and straight-down views)
So, with the above settings this means you're going to be doing 2520 frames of rendering to generate just one frame of stereoscopic-360 capture! To put some perspective on that, if you rendered 2520 frames of your 60hz game normally it would generate 42 seconds of output, but instead you're getting 1/60th of a second of output (if capturing for a 60fps movie).
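To sanity-check that figure: the horizontal sweep is 360/2 = 180 views, and the vertical sweep runs from straight up to straight down inclusive, giving (180/30) + 1 = 7 positions, so 180 * 7 * 2 eyes = 2520. As a small standalone illustration (not plugin code):

```cpp
// Total individual scene renders needed for one stereoscopic-360 frame.
// The vertical sweep includes both poles, hence the +1 position.
int RendersPerFrame(int HorizontalAngularIncrement, int VerticalAngularIncrement)
{
    int HorizontalViews   = 360 / HorizontalAngularIncrement;
    int VerticalPositions = 180 / VerticalAngularIncrement + 1;
    return HorizontalViews * VerticalPositions * 2; // x2: one sweep per eye
}
```

Even the coarse 10-degree horizontal example mentioned earlier still needs 36 * 7 * 2 = 504 renders per frame, which is why capture times add up so quickly.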
SP.ConcurrentCaptures is another important one to consider, mainly because if it's set too high (and it defaults to 30!), you're going to get VRAM OOM problems.
What this CVar actually does is control how many capture-components are created and rendering simultaneously, and each of them needs render-target memory associated with them, so it can be a real drain on your GPU memory resources. Now I say "simultaneously," but in reality the GPU is only ever dealing with one of them at a time, so at some point you're going to hit a plateau between processing a rendered frame on the CPU and running ahead with GPU frames, and it's likely going to happen waaaay before 30. We found that '6' was our sweet-spot, where increasing it any further (up to 30) basically just shaved a couple of seconds off of the render of a frame, not a good trade-off for the amount of VRAM and potential for it to blow up (due to OOM) halfway through a multi-day capture!
Finally, SP.CaptureSlicePixelWidth is an important one to talk about here. For a long time, we left this as the default, which was often something like 2048. This value represents the size of 'each view' that you render (of the 2520 views you're going to be rendering per frame!), so reducing this can have an enormous impact on your overall rendering time. It's actually separate from your final output image and you should size it based on how many vertical 'steps' you're going to do (180/VerticalAngularIncrement), and how much multi-sampling you want for the final image.
In effect, if your buffer was, say, 2048 and you have 6 vertical steps, then what you're really doing is rendering an image 12,288 pixels high. If your final output image is a 4096x2048 per-eye image then you can see that you really didn't NEED that many pixels in height to begin with; you've got enough for 6xFSAA there! We really only consider the vertical resolution here because, if you remember the image above, we're not even using the full width of the image, we're just grabbing the middle bit, so the width is less important here (it just has to be wide enough so that it has enough pixels to pull out of the middle).
By contrast, if you set the value to 720 then you're now going to be rendering an effective height of ~4320 and downsampling that to 2048, STILL equivalent to 2xFSAA (which we've found is definitely good enough) and now your rendering times are going to be about 40 seconds for those 2520 views, instead of being about 3 minutes (it doesn't exactly scale linearly, but it's still a nice win).
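The same back-of-envelope estimate can be written down directly, following the rough figure of six vertical steps used above (the pole views overlap heavily, so they're ignored for this estimate):

```cpp
// Rough effective vertical resolution of the stitched capture: each of the
// ~(180 / VerticalAngularIncrement) vertical steps contributes a full slice.
int EffectiveCaptureHeight(int SlicePixelWidth, int VerticalAngularIncrement)
{
    return SlicePixelWidth * (180 / VerticalAngularIncrement);
}
```

Dividing by your per-eye output height then gives your approximate supersampling factor: 12288 / 2048 = 6x with the 2048 slice size, versus 4320 / 2048, roughly 2x, with a 720-pixel slice.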
Setting All of the Settings in a Blueprint
Okay, so as mentioned above, none of these settings are stored between engine/editor runs. They're all transient and have to be set each time you start up, which can be a real pain.
Consequently, the way we work internally is to have a single console command we can call that sets all of them for us, and we call that just before capturing.
Alternatively, you can just put the console commands into a Blueprint; apologies for the table-like layout here, just wanted to fit it on one (hopefully) readable image!
With something like this in place you could just do "ce NTCaptureStereo" and it handles setting all of the parameters before kicking off the capture. Not only is this a nicer way to work, but it's also much less error-prone, and if you're about to kick off ~2 days of capturing you don't want to have accidentally forgotten to set a parameter!!!
Capturing a Movie
Right, so that's all of the important settings covered and a nice handy way to set up and trigger a capture from Blueprint using a single console command.
The very first, most important thing to always remember when capturing a movie is that you want to be running with a fixed time-step.
A frame of capture is going to take upwards of 40 seconds, so unless you want your 80-second cinematic to generate just 2 frames of output you're going to want to tell the engine to only move in increments of time using a fixed time-step.
This is nice and simple to do (thanks, Unreal). You just provide the following on the command-line:
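For a 60fps capture, that means:

```
-usefixedtimestep -fps=60
```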
For example, if you set the framerate to 60 then it's going to update in time-steps of roughly 16 milliseconds after each full frame is generated, so you'll get to capture 60 frames for each second of game time that passes. If you're only planning on generating a 30hz movie then you COULD in theory just set this to 30, but we always capture at 60 so that we have the option of generating 60hz or 30hz movies from the frames.
Additionally it's a good idea to turn off texture streaming at this point using -notexturestreaming; if you spend a day doing a movie capture and then it turns out the floor texture is all blurry you're going to be pretty annoyed ;)
As a full example, internally we pass the following command-line when booting the game/editor when we want to do a capture
-usefixedtimestep -fps=60 -notexturestreaming
With that said, how do you go about doing the actual movie capture?
I mentioned SP.PanoramicScreenshot above, but if you look carefully at that Blueprint screenshot above you can see that there's a way to directly capture a number of frames to a movie too, specifically:
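The command takes the start and end frame as its two arguments; matching the example in the next paragraph, a two-second capture after a two-second wait would be:

```
SP.PanoramicMovie 120 240
```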
The number of frames here is literally "the number of times the engine has run its update loop from when you kicked off the command," e.g., if you are running fps=60 and set the start time to 120 and the end time to 240, then after the command executes it will wait 2 seconds (120 frames) and then capture 2 seconds' worth of frames (i.e. 120 of them), which you can then encode to 2 seconds' worth of video @ 60hz... and so on.
We tend to just capture from matinees here (after all, at a fixed timestep and 40 seconds to capture each frame it's not really 'playable' framerate), so we always kick off the matinee at the same time that we trigger the SP.PanoramicMovie command, and then our start and end-frames can be easily worked out based on how many seconds into the matinee we want to start and stop capturing (just multiply by 60 and there's your answer).
After You've Captured Your Frames
So, when the capture is complete, you'll find all of your frames sitting in a date-and-time-stamped folder under your SP.OutputDir.
If you used the custom code above then these are combined top/bottom images, ready to encode.
There are quite a lot of ways you can then turn this into a movie, though we tend to just use ffmpeg here to encode a h264 60hz video from the frames. Note: ffmpeg is available online for free download.
For example, the command-line below encodes all of the frames under the specified directory into a h264 60hz movie called MyMovie.mp4:
ffmpeg.exe -framerate 60 -i F:/StereoCaptureFrames/2016.04.26-13.04.34/Frame_%05d.jpg -c:v libx264 -profile:v high -level 4.2 -r 60 -pix_fmt yuv420p -crf 18 -preset slower MyMovie.mp4
(Note: if you run this from a Windows batch file, the percent sign needs doubling, i.e. Frame_%%05d.jpg.)
I won't go into the details of ffmpeg here, there's plenty of documentation and for all I know most people out there will just want to encode using some other editing software.
The output movie is then basically ready to use, although you'll want to mix-in your audio before you ship it out. :)
Congratulations... hopefully (if I've done a decent job) you now know how to set up and capture full stereoscopic 360 scenes that can be played back on your GearVR or uploaded to YouTube or Facebook.
A note on uploading to YouTube: it requires attaching metadata and encoding at an appropriate resolution and aspect ratio, e.g. 4K/UHD (3840x2160, 16:9) rather than 4096x4096; otherwise it may be very, very blurry when streamed back. It's fine for your source frames to be 4096x4096; just remember, when you encode the movie (with whatever program you use), to output a 'standard' UHD resolution and 16:9 aspect rather than a 4096x4096 1:1 aspect image (although 1:1 works fine for GearVR/Oculus 360 playback).
Additional Gotchas and Notes
There were a few "gotchas" that we ran into that I thought I'd share. Some of these are universal and some of them have been fixed in 4.11 but will apply to older versions of the engine.
1. Not All Effects Work
It's important to note this, and with the above information hopefully it will make some sense. Your scene is effectively being captured in 'tiles' (with a zoomed-in FOV at that), you do a series of horizontal steps, and then repeat all of those horizontal steps for a series of vertical steps. Because of this, screen-space effects that are supposed to apply over a whole 'scene' (e.g. vignette) are not going to work and should be turned off.
Additionally, this means that light shafts aren't going to work either. You may end up with a series of 'tiles' in which the shaft-source is onscreen, and then on the next tile the shaft source isn't onscreen at all, so this will generate no shafts for that tile. This will result in a final image that appears to have blocks of light shafts on and off which will not look how you want. For this reason, unfortunately (because they're a great effect), light shafts have to go, too.
If you're wondering why light shafts appear to work in ours, it's because we have a custom world-space participating media solution, and world-space effects work just fine. :)
Likewise, if you had a screen-space distortion then that's not going to work either. You're only taking the middle of the screen at each point, so you'll get the distortion for just the middle of the screen, wrapped all the way around.
Note that world-space distortion still works, so distortion on your particles etc will still look great. :)
Generally speaking, screen-space effects won't work, so take a view on your content with that in mind.
2. Not Everything Obeys the fixedtimestep
This is a gotcha that caught us quite early on, during the vertical slice scene. We had a part where we played an in-game movie of a face during a cutscene, but for some reason we would only get about 2 frames of video and then it would be gone. This is because the videos were progressing at a 'normal' rate, not at the 16ms per render fixed rate that we wanted, meaning it was jumping through the video about 40 seconds for every frame we captured!
Likewise, things calculated on the GPU each frame using real deltas (as opposed to 'game time') will also not obey this. We had some funny cases in the past where we had fire that looked nice using normal materials and then GPU-particle 'embers' in the fire that appeared to be buzzing around at an incredible rate because their update wasn't being bound appropriately. There are settings available on the particle systems to set a fixed update-time to get around this.
3. By Default (As Shipped) the Captures Don't Pick Up Post-Volumes!
Our workaround for this was to just have the capture views use the post-process settings from the player.
As a simple example, if you go to USceneCapturer::InitCaptureComponent and add the following code (before the call to RegisterComponentWithWorld) then it will use the player's post-process settings.
//*NEW* Set up post settings based on the player camera manager
APlayerController* PlayerController = GetWorld()->GetFirstPlayerController();