Epic Research

At Epic Research we conduct applied research across a range of domains to enhance our products and support the integration of innovative features throughout the Epic ecosystem.
October 1, 2024

Speech Driven Tongue Animation – CVPR 2022

Advances in speech-driven animation techniques now allow creating convincing animations of virtual characters solely from audio data. While many approaches focus on facial and lip motion, they often do not provide realistic animation of the inner mouth. Performance or motion capture of the tongue and jaw from video alone is difficult because the inner mouth is only partially observable during speech. In this work, we collected a large-scale speech-to-tongue mocap dataset that focuses on capturing tongue, jaw, and lip motion during speech. This dataset enables research on data-driven techniques for realistic inner mouth animation. We present a method that leverages recent deep-learning-based audio feature representations to build a robust and generalizable speech-to-animation pipeline. We find that self-supervised deep-learning-based audio feature encoders are robust and generalize well to unseen speakers and content. To demonstrate the practical application of our approach, we show animations on a high-quality parametric 3D face model driven by the landmarks generated from our speech-to-tongue animation method.

Salvador Medina (Carnegie Mellon University, Epic Games), Denis Tome (Epic Games), Carsten Stoll (Epic Games), Mark Tiede (Haskins Laboratories), Kevin Munhall (Queen's University), Alex Hauptmann (Carnegie Mellon University), Iain Matthews (Epic Games)
October 1, 2024

Enhancements to Media Transport in ICVFX using SMPTE 2110 – SMPTE Motion Imaging Journal 2023

This paper proposes an enhancement to clustered rendering for In-Camera Visual Effects (ICVFX) using SMPTE 2110, a standard for professional media over managed IP networks. We demonstrate how SMPTE 2110 enables the multicasting of multiple camera views at varying resolutions, each rendered by dedicated nodes and then received by cluster nodes for warping and composition. This approach improves rendering efficiency, scalability, and performance. We also describe techniques to minimize multicast latency, resulting in no added frame delay. Additionally, we show how SMPTE 2110 and IEEE-1588 Precision Time Protocol (PTP) with SMPTE ST 2059 synchronize video output across render nodes, driving LED walls with low latency and a unified media transport strategy. The implementation is available in Unreal Engine source code.

Alejandro Arango (Epic Games), Simon Therriault (Epic Games), Andriy Yamashev
October 1, 2024

PhISANet: Phonetically Informed Speech Animation Network – ICASSP 2024

Realistic animation is crucial for immersive and seamless human-avatar interactions as digital avatars become more prevalent. This work presents PhISANet, an encoder-decoder model that realistically animates the face and tongue solely from speech. PhISANet leverages neural audio representations trained on vast amounts of speech to map the speech signal into animation parameters that control the lower face and tongue of realistic 3D models. By integrating a novel multi-task learning strategy during the training phase, PhISANet reincorporates the phonetic information from the input speech, improving articulation in the generated animations. A thorough quantitative and qualitative study validates this improvement and determines that WavLM and Whisper features are ideal for training a generalizable speech-animation model regardless of gender, age, and language.

Salvador Medina (Epic Games), Iain Matthews (Epic Games), Sarah Taylor (Epic Games), Carsten Stoll (Epic Games), Gareth Edwards (Epic Games), Alex Hauptmann (CMU), Shinji Watanabe (CMU)
October 1, 2024

Humanlike Behavior in a Third-Person Shooter with Imitation Learning – IEEE CoG 2024

We tackle the problem of generating humanlike bot behavior by learning from human demonstrations. We developed a controlled gym environment to collect data on a subset of human behavior, namely aiming and target acquisition in single-opponent settings. We introduce an identity-conditioned causal transformer to produce humanlike behavior of a controllable quality on a per-frame basis that captures the differences in skill and style between conditioned players.

Alex Farhang (Caltech, Epic Games), Brendan Mulcahy (Epic Games), Daniel Holden (Epic Games), Iain Matthews (Epic Games), Yisong Yue (Caltech)
October 1, 2024

EpicADA: Controllable and Expressive Audio-Driven Animation – SIGGRAPH 2025

We present EpicADA, a model that generates expressive and realistic full-face, tongue, and head animation from speech audio. Our method relies on the pre-trained Whisper encoder for extracting rich features from the audio, which are decoded into animation using a set of gated recurrent unit networks. We animate directly to MetaHuman rig controls. The network is conditioned on a rich set of emotions, and can either automatically detect the emotion from the voice, or animate to the emotion specified by the user at test time. A user study supports that EpicADA produces highly realistic results that are often confused with ground truth performance.

Sarah Taylor (Epic Games), Salvador Medina (Epic Games), Jonathan Windle (Epic Games), Erica Saez (Epic Games), Iain Matthews (Epic Games)
July 19, 2024

Position-Based Nonlinear Gauss-Seidel for Quasistatic Hyperelasticity – SIGGRAPH 2024

Position based dynamics (PBD) [Müller et al. 2007] is a powerful technique for simulating a variety of materials. Its primary strength is its robustness when run with a limited computational budget. Even though PBD is based on the projection of static constraints, it does not work well for quasistatic problems. This is particularly relevant since the efficient creation of large data sets of plausible, but not necessarily accurate, elastic equilibria is of increasing importance with the emergence of quasistatic neural networks [Bailey et al. 2018; Chentanez et al. 2020; Jin et al. 2022; Luo et al. 2020]. Recent work [Macklin et al. 2016] has shown that PBD can be related to the Gauss-Seidel approximation of a Lagrange multiplier formulation of backward Euler time stepping, where each constraint is solved/projected independently of the others in an iterative fashion. We show that a position-based, rather than constraint-based, nonlinear Gauss-Seidel approach resolves a number of issues with PBD, particularly in the quasistatic setting. Our approach retains the essential PBD feature of stable behavior with constrained computational budgets, but also allows for convergent behavior with expanded budgets. We demonstrate the efficacy of our method on a variety of representative hyperelastic problems and show that successive over-relaxation (SOR), Chebyshev, and multiresolution-based acceleration can all be easily applied.

Yizhou Chen (Epic Games, UCLA), Yushan Han (UCLA, Epic Games), Jingyu Chen (UCLA), Zhan Zhan (UCD, Epic Games), Alex Mcadams (Epic Games), Joseph Teran (UCD, Epic Games)
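As a rough illustration of what "position-based" Gauss-Seidel means in the abstract above, the sketch below relaxes one node at a time (rather than one constraint at a time) on a toy 1D chain with fixed end nodes. Everything in it is an assumption made for illustration: the quartic segment energy, the single Newton step per node, and the names `gauss_seidel_sweeps`, `rest`, and `omega` are not taken from the paper, and this is not the authors' algorithm.

```python
def gauss_seidel_sweeps(x, rest, k=1.0, omega=1.0, sweeps=200):
    """Relax the interior nodes of a 1D chain one at a time; end nodes stay fixed.

    Toy segment energy (an assumption): psi(s) = k/2 * s^2 + k/4 * s^4,
    where s is the stretch of a segment relative to its rest length.
    """
    x = list(x)

    def d_psi(s):
        # First derivative of the toy energy with respect to stretch.
        return k * (s + s ** 3)

    def dd_psi(s):
        # Second derivative; strictly positive, so each local problem is convex.
        return k * (1.0 + 3.0 * s * s)

    for _ in range(sweeps):
        for i in range(1, len(x) - 1):
            s_left = x[i] - x[i - 1] - rest
            s_right = x[i + 1] - x[i] - rest
            # Gradient and Hessian of the total energy with respect to node i only.
            grad = d_psi(s_left) - d_psi(s_right)
            hess = dd_psi(s_left) + dd_psi(s_right)
            # One Newton step on node i; omega > 1 would give SOR-style over-relaxation.
            x[i] -= omega * grad / hess
    return x


if __name__ == "__main__":
    print(gauss_seidel_sweeps([0.0, 0.1, 0.2, 0.3, 4.0], rest=1.0))
```

Because every update uses the latest positions of neighboring nodes, a small number of sweeps already lowers the energy, and more sweeps move the chain toward its equilibrium, which for this toy problem is uniform spacing between the fixed ends.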
July 19, 2024

A Neural Network Model for Efficient Musculoskeletal-Driven Skin Deformation – SIGGRAPH 2024

We present a comprehensive neural network to model the deformation of human soft tissues including muscle, tendon, fat, and skin. Our approach provides kinematic and active correctives to linear blend skinning [Magnenat-Thalmann et al. 1989] that enhance the realism of soft tissue deformation at modest computational cost. Our network accounts for deformations induced by changes in the underlying skeletal joint state as well as the active contractile state of relevant muscles. Training is done to approximate quasistatic equilibria produced from physics-based simulation of hyperelastic soft tissues in close contact. We use a layered approach to equilibrium data generation where deformation of muscle is computed first, followed by an inner skin/fascia layer, and lastly a fat layer between the fascia and outer skin. We show that a simple network model which decouples the dependence on skeletal kinematics and muscle activation state can produce compelling behaviors with modest training data burden. Active contraction of muscles is estimated using inverse dynamics where muscle moment arms are accurately predicted using the neural network to model kinematic musculotendon geometry. Results demonstrate the ability to accurately replicate compelling musculoskeletal and skin deformation behaviors over a representative range of motions, including the effects of added weights in body building motions.

Yushan Han (Epic Games, UCLA), Carmichael Ong (Stanford), Yizhou Chen (Epic Games, UCLA), Jingyu Chen (UCLA), Jennifer Hicks (Stanford), Joseph Teran (Epic Games, UC Davis)
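For readers unfamiliar with linear blend skinning, the following minimal 2D sketch shows the baseline that such correctives are added to: each vertex is a weighted blend of its rest position transformed by every bone. The function names, the 2D setting, and the choice to add the corrective offset in the rest pose before blending are assumptions for illustration; the abstract does not say where the learned correctives are applied.

```python
import math


def rotation(theta):
    """2x2 rotation matrix as nested tuples."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))


def apply_transform(transform, p):
    """Apply a rigid bone transform (rotation matrix, translation) to point p."""
    r, t = transform
    return (r[0][0] * p[0] + r[0][1] * p[1] + t[0],
            r[1][0] * p[0] + r[1][1] * p[1] + t[1])


def skin_vertex(rest, weights, transforms, corrective=(0.0, 0.0)):
    """Linear blend skinning of one vertex.

    `corrective` is a hypothetical per-vertex offset (e.g. predicted by a
    network), added in the rest pose before blending. This placement is an
    assumption of the sketch.
    """
    corrected = (rest[0] + corrective[0], rest[1] + corrective[1])
    x = y = 0.0
    for w, transform in zip(weights, transforms):
        px, py = apply_transform(transform, corrected)
        x += w * px
        y += w * py
    return (x, y)


if __name__ == "__main__":
    identity = (rotation(0.0), (0.0, 0.0))
    quarter_turn = (rotation(math.pi / 2), (0.0, 0.0))
    # A vertex influenced equally by a static bone and a bone rotated 90 degrees.
    print(skin_vertex((1.0, 0.0), (0.5, 0.5), (identity, quarter_turn)))
```

The example vertex lands halfway between its two rigidly transformed positions, which is the volume-losing behavior near joints that corrective terms are typically meant to compensate for.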
November 15, 2023

Primal Extended Position Based Dynamics for Hyperelasticity – ACM MIG 2023 - Computers & Graphics

The Extended Position Based Dynamics (XPBD) approach of Macklin et al. [2016] addresses the issues with iteration-dependent behavior in the original Position Based Dynamics (PBD) [2007], which is itself a powerful method for the real-time simulation of elastic objects. However, XPBD is limited in its application to hyperelastic solids: it can only treat models with a strain energy density that is quadratic in some notion of constraint. Furthermore, we show that even when applicable, the formulation does not always lead to convergent behaviors with hyperelasticity. We isolate the root cause in the approximate linearization of the nonlinear backward Euler systems utilized by XPBD. We provide two fixes to these terms that allow for convergent behavior. The first (B-PXPBD) is a small modification to an existing XPBD code, but can only be used with models addressable by the original XPBD. The second (FP-PXPBD) is a more general formulation that extends XPBD (and our residual correction) to arbitrary hyperelasticity. We show that our modifications allow for convergent behavior that rivals accurate techniques like Newton’s method when the computational budget is large, without sacrificing the stable and robust behavior exhibited by the original PBD and XPBD when the computational budget is limited.

Yizhou Chen (Epic Games, UCLA), Yushan Han (UCLA, Epic Games), Jingyu Chen (UCLA), Shiqian Ma (Rice), Ronald Fedkiw (Stanford, Epic Games), Joseph Teran (UCD, Epic Games)
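For context, the sketch below shows the standard XPBD update of Macklin et al. [2016] for a single distance constraint between two particles, which is the baseline scheme the abstract refers to. It is not the B-PXPBD or FP-PXPBD modification, whose details are not given here. The function name and the 2D tuple representation are assumptions for illustration; `compliance` is the inverse stiffness and `lam` is the Lagrange multiplier accumulated over the iterations of one time step.

```python
import math


def xpbd_distance_step(p0, p1, w0, w1, rest, compliance, lam, dt):
    """One XPBD iteration for the constraint C = |p1 - p0| - rest.

    w0 and w1 are inverse masses. Returns updated positions and multiplier.
    """
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0 or w0 + w1 == 0.0:
        # Degenerate configuration or both particles pinned: nothing to do.
        return p0, p1, lam
    nx, ny = dx / dist, dy / dist          # constraint gradient w.r.t. p1
    c = dist - rest                        # constraint value
    alpha_tilde = compliance / (dt * dt)   # time-step-scaled compliance
    # The gradients are unit vectors, so the denominator is w0 + w1 + alpha_tilde.
    dlam = (-c - alpha_tilde * lam) / (w0 + w1 + alpha_tilde)
    new_p0 = (p0[0] - w0 * dlam * nx, p0[1] - w0 * dlam * ny)
    new_p1 = (p1[0] + w1 * dlam * nx, p1[1] + w1 * dlam * ny)
    return new_p0, new_p1, lam + dlam


if __name__ == "__main__":
    # Zero compliance makes the constraint rigid: one iteration restores the rest length.
    print(xpbd_distance_step((0.0, 0.0), (2.0, 0.0), 1.0, 1.0, 1.0, 0.0, 0.0, 1.0 / 60.0))
```

With nonzero compliance the accumulated multiplier keeps the effective stiffness independent of the iteration count, which is the iteration-dependence issue of PBD that XPBD was designed to address.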
July 24, 2022

Analytically Integratable Zero-restlength Springs for Capturing Dynamic Modes unrepresented by Quasistatic Neural Networks – SIGGRAPH 2022

We present a novel paradigm for modeling certain types of dynamic simulation in real-time with the aid of neural networks. In order to significantly reduce the requirements on data (especially time-dependent data), as well as decrease generalization error, our approach utilizes a data-driven neural network only to capture quasistatic information (instead of dynamic or time-dependent information). Subsequently, we augment our quasistatic neural network (QNN) inference with a (real-time) dynamic simulation layer. Our key insight is that the dynamic modes lost when using a QNN approximation can be captured with a quite simple (and decoupled) zero-restlength spring model, which can be integrated analytically (as opposed to numerically) and thus has no time-step stability restrictions. Additionally, we demonstrate that the spring constitutive parameters can be robustly learned from a surprisingly small amount of dynamic simulation data. Although we illustrate the efficacy of our approach by considering soft-tissue dynamics on animated human bodies, the paradigm is extensible to many different simulation frameworks.

Yongxu Jin (Stanford, Epic Games), Yushan Han (UCLA, Epic Games), Zhenglin Geng (Epic Games), Joseph Teran (UCD, Epic Games), Ronald Fedkiw (Stanford, Epic Games)
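To see why a zero-restlength spring can be integrated without time-step restrictions, note that its force is linear in the displacement from the anchor, so each coordinate obeys a damped harmonic oscillator equation with a closed-form solution. The sketch below advances one coordinate by an arbitrary `dt` using that solution. It is a simplified stand-in under stated assumptions: the anchor is fixed at the origin, only the underdamped case is handled, and the function name and parameters are hypothetical rather than the paper's formulation.

```python
import math


def spring_step(x0, v0, k, c, m, dt):
    """Advance m*x'' + c*x' + k*x = 0 by dt using the closed-form solution.

    x0, v0: displacement from the anchor and velocity at the start of the step.
    Only the underdamped case (damping ratio below 1) is covered in this sketch.
    """
    omega = math.sqrt(k / m)                 # undamped natural frequency
    zeta = c / (2.0 * math.sqrt(k * m))      # damping ratio
    if not 0.0 <= zeta < 1.0:
        raise ValueError("this sketch only covers the underdamped case")
    omega_d = omega * math.sqrt(1.0 - zeta * zeta)   # damped frequency
    a = x0
    b = (v0 + zeta * omega * x0) / omega_d
    decay = math.exp(-zeta * omega * dt)
    cs, sn = math.cos(omega_d * dt), math.sin(omega_d * dt)
    x = decay * (a * cs + b * sn)
    v = decay * ((omega_d * b - zeta * omega * a) * cs
                 - (omega_d * a + zeta * omega * b) * sn)
    return x, v


if __name__ == "__main__":
    # A stiff spring stepped with a large dt stays bounded, unlike explicit Euler.
    print(spring_step(1.0, 0.0, k=1.0e6, c=10.0, m=1.0, dt=0.1))
```

Since the step is exact for this model, taking one large step gives the same state as taking many small ones, so the step size can follow the frame rate instead of the spring stiffness.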