Ablation Results for DreamGaussian4D
Top: Static 3D generated by DreamGaussianHD. Bottom: our generated 4D.
Top: Driving videos. Bottom: generated 4D.
Comparison Between Different Motion Representations
Motion representation is critical to 4D generation.
Effect of HexPlane Resolutions
We show results of different HexPlane resolutions.
Diverse Motions
We show 10 additional 3D motions as a supplement to Figure 10 in the main paper.
Refinement Ablation
Final results
Refined results using different T in the video-to-video pipeline (without the reference-view reconstruction loss by default).
T=[0.7,0.95] denotes linearly decaying T from 0.7 to 0.95.
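As a rough illustration of what a linear schedule over a range such as T=[0.7,0.95] could look like, the sketch below interpolates T evenly across refinement steps. The function name `linear_t_schedule`, the step count, and the direction of the schedule are assumptions made for illustration; they are not taken from the DreamGaussian4D code.

```python
def linear_t_schedule(t_start, t_end, num_steps):
    """Return num_steps values of T spaced evenly from t_start to t_end.

    Hypothetical helper: the actual pipeline may schedule T differently.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    if num_steps == 1:
        return [t_start]
    step = (t_end - t_start) / (num_steps - 1)
    return [t_start + step * i for i in range(num_steps)]


# Example: six refinement steps over the range used in the caption above.
schedule = linear_t_schedule(0.7, 0.95, 6)
```

Swapping the two endpoints gives the schedule in the opposite direction, so the same helper covers either reading of the range.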
Refined videos by SVD at different T.
Texture Refinement: image-to-image vs. video-to-video
Image-to-image refinement results in clear flickering on the back of the tiger.
Training Iterations
Longer training schedules do not bring visible corrections to the foot motion.
Dynamic Cameras
Our approach does not require the camera to be static. We show three examples in which the camera rotates, shifts, and moves in for a close-up.
Temporal Loss
We try different weights for the temporal loss but observe limited or no improvement. A weight of 10 is the setting we report in the submission.
Top: input video. Bottom: video-to-video output.
Failure Cases
Failure mode 1: low-quality video generated by Stable Video Diffusion. The generated horse motion is temporally inconsistent.
Failure mode 2: low-quality 3D generated by DreamGaussianHD. The back of the minion is incorrectly textured.
Failure mode 3: unnatural deformation. The top of the elephant's nose is incorrectly moved to its right hand.