Ablation DreamGaussian4D

Top: Static 3D generated by DreamGaussianHD. Bottom: our generated 4D.

Top: Driving videos. Bottom: generated 4D.

Comparison Between Different Motion Representations

Motion representation is critical to 4D generation.

Effect of HexPlane Resolutions

We show results of different HexPlane resolutions.
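For a rough sense of what a resolution change costs, the sketch below counts the feature values stored by a single-scale HexPlane, which factorizes a 4D (x, y, z, t) field into six 2D feature planes. The function name, the single-scale layout, and the example numbers are illustrative assumptions, not the configuration used for these results.

```python
def hexplane_param_count(spatial_res: int, temporal_res: int, channels: int) -> int:
    """Count feature values in a single-scale HexPlane (illustrative assumption).

    A HexPlane keeps one 2D feature grid per axis pair:
    three space-space planes (xy, xz, yz) and three space-time planes (xt, yt, zt).
    """
    space_space = 3 * spatial_res * spatial_res   # xy, xz, yz
    space_time = 3 * spatial_res * temporal_res   # xt, yt, zt
    return channels * (space_space + space_time)


# Hypothetical settings: doubling the spatial resolution roughly
# quadruples the space-space planes and doubles the space-time planes.
low = hexplane_param_count(spatial_res=64, temporal_res=25, channels=32)
high = hexplane_param_count(spatial_res=128, temporal_res=25, channels=32)
print(low, high)
```

The space-space planes grow quadratically with spatial resolution, so they dominate the memory cost at higher resolutions.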

Diverse motions

We show 10 additional 3D motions as a supplement to Figure 10 in the main paper.

Refinement Ablation

Final results

Refined results using different T in the video-to-video pipeline (without the reference-view reconstruction loss by default).

T=[0.7,0.95] denotes linearly decaying T from 0.7 to 0.95.
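A linear schedule between two endpoints can be sketched as follows. This is a minimal illustration: the function name and the assumption that T is indexed by refinement step are hypothetical, and the step count is arbitrary.

```python
def linear_schedule(start: float, end: float, step: int, total_steps: int) -> float:
    """Linearly interpolate from `start` at step 0 to `end` at the last step."""
    if total_steps < 2:
        return start
    # Clamp the step so out-of-range indices return the nearest endpoint.
    clamped = min(max(step, 0), total_steps - 1)
    frac = clamped / (total_steps - 1)
    return start + frac * (end - start)


# Endpoints taken from the caption above; 6 steps is an arbitrary choice.
schedule = [linear_schedule(0.7, 0.95, s, 6) for s in range(6)]
print([round(t, 3) for t in schedule])
```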

Refined videos by SVD at different T.

Texture Refinement: image-to-image vs. video-to-video

Image-to-image refinement results in clear flickering on the back of the tiger.

Training Iterations

Longer training schedules do not bring visible corrections to the foot motion.

Dynamic Cameras

Our approach does not require the camera to be static. We show three examples in which the camera rotates, shifts, and moves in for a close-up.

Temporal loss

We try different weights for the temporal loss but observe limited or no improvement. The setting we report in the submission is weight=10.
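To make the role of the weight concrete, here is a minimal sketch of a weighted temporal smoothness term. The loss form is not specified on this page, so the mean squared difference between consecutive frames used here is an assumption for illustration only, as are the function name and the flattened-frame input format.

```python
def temporal_loss(frames, weight=10.0):
    """Weighted mean squared difference between consecutive frames.

    `frames` is a list of equal-length lists of floats (flattened frames).
    This loss form is an illustrative assumption, not the authors' definition.
    """
    total = 0.0
    count = 0
    for prev, cur in zip(frames, frames[1:]):
        for a, b in zip(prev, cur):
            total += (b - a) ** 2
            count += 1
    if count == 0:
        return 0.0  # fewer than two frames, or empty frames
    return weight * total / count


static_clip = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
moving_clip = [[0.0, 0.0], [1.0, 1.0]]
print(temporal_loss(static_clip), temporal_loss(moving_clip))
```

Under this form the weight only rescales the penalty on frame-to-frame change, so a static clip incurs zero loss at any weight.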

Top: input video. Bottom: video-to-video output.

Our camera

Full circle camera

Dynamic camera

Failure Cases

Failure mode 1: low-quality video generated by Stable Video Diffusion. The generated horse motion is temporally inconsistent.

Failure mode 2: low-quality 3D generated by DreamGaussianHD. The back of the minion is wrongly textured.

Failure mode 3: unnatural deformation. The top of the elephant's nose is wrongly moved to its right hand.