Ablation Results for DreamGaussian4D

Top: Static 3D generated by DreamGaussianHD. Bottom: our generated 4D.


Top: Driving videos. Bottom: generated 4D.


Comparison Between Different Motion Representations

Motion representation is critical to 4D generation.

Framewise 3DGS
Framewise 3DGS (w/ init)
MLP Deformation
HexPlane Deformation

Effect of HexPlane Resolutions

We show results of different HexPlane resolutions.

S/4
Sx4
T/4
Tx4
Ours (32x32x32)

Diverse motions

We show 10 more 3D motions as a supplement to Figure 10 in the main paper.


Refinement Ablation

Final results

Generated 4D GS
Extracted mesh
Refined mesh

Refined results using different T in the video-to-video pipeline (without the reference view reconstruction loss by default).

T=[0.7,0.95] denotes T varying linearly from 0.7 to 0.95.
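As a rough illustration of the T=[0.7,0.95] setting, a linear schedule between the two endpoints can be sketched as follows. The function name `linear_t_schedule` and the choice of indexing the schedule by refinement step are assumptions made for this example, not details taken from our implementation.

```python
def linear_t_schedule(t_start: float, t_end: float, num_steps: int) -> list[float]:
    """Return num_steps values interpolated linearly from t_start to t_end.

    Hypothetical helper: the step count and what each step corresponds to
    (e.g. one refinement iteration) are assumptions for illustration.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    if num_steps == 1:
        return [t_start]
    # Constant increment so that the last value lands on t_end.
    step = (t_end - t_start) / (num_steps - 1)
    return [t_start + i * step for i in range(num_steps)]


# Example: six values moving from 0.7 to 0.95 in equal increments.
schedule = linear_t_schedule(0.7, 0.95, 6)
print([round(t, 2) for t in schedule])
```

A fixed setting such as T=0.7 corresponds to the degenerate case where both endpoints are equal.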

T=0.6
T=0.7
T=0.8
T=[0.7,0.95]
T=0.7 + Recon. Loss

Videos refined by SVD at different T.

Input video
T=0.5
T=0.6
T=0.7
T=0.8

Texture Refinement: image-to-image vs. video-to-video

Image-to-image refinement results in clear flickering on the back of the tiger.

Image-to-image
Video-to-video

Training Iterations

Longer training schedules do not bring visible corrections to the foot motion.

Driving video
#Iteration=200 (Ours)
#Iteration=500
#Iteration=1000
#Iteration=2000

Dynamic Cameras

Our approach does not require the camera to be static. We show three examples in which the camera rotates, shifts, or moves in for a close-up.

Input Video (rotate)
Input Video (shift)
Input Video (close up)
Generated 4D (rotate)
Generated 4D (shift)
Generated 4D (close up)

Temporal loss

We try different weights for the temporal loss but observe limited or no improvement. The setting we report in the submission is weight=10.
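To make the role of the weight concrete, the sketch below shows how a weighted temporal term typically enters a total loss. The exact form of our temporal loss is not given on this page, so the penalty here (mean squared difference between consecutive frames) and the names `temporal_smoothness` and `total_loss` are assumptions chosen only for illustration.

```python
def temporal_smoothness(frames: list[list[float]]) -> float:
    """Mean squared difference between consecutive frames.

    Assumed form of a temporal penalty: each frame is a flat list of values
    (for example, per-point offsets), and all frames have the same length.
    """
    if len(frames) < 2:
        return 0.0
    total = 0.0
    count = 0
    for prev, curr in zip(frames, frames[1:]):
        for a, b in zip(prev, curr):
            total += (b - a) ** 2
            count += 1
    return total / count


def total_loss(base_loss: float, frames: list[list[float]], weight: float) -> float:
    """Combine a base loss with the weighted temporal term."""
    return base_loss + weight * temporal_smoothness(frames)


# Toy example with three frames of two values each.
frames = [[0.0, 0.0], [1.0, 1.0], [1.0, 3.0]]
for weight in (0, 10, 100):
    print(weight, total_loss(2.0, frames, weight))
```

With weight=0 the temporal term drops out entirely, which matches the first setting in the comparison.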

weight=0
weight=10
weight=100
weight=1000
weight=10000

Top: input video. Bottom: video-to-video output.

Our camera
Full circle camera
Dynamic camera

Failure Cases

Failure mode 1: low-quality video generated by Stable Video Diffusion. The generated horse motion is temporally inconsistent.

Input Image
Generated Video

Failure mode 2: low-quality 3D generated by DreamGaussianHD. The back of the minion is wrongly textured.

Input Image
Generated 3D

Failure mode 3: unnatural deformation. The top of the elephant's nose is wrongly moved to its right hand.

Generated 3D
Input Video
Generated 4D