Happy to announce DreamFusion, our new method for Text-to-3D! dreamfusion3d.github.io We optimize a NeRF from scratch using a pretrained text-to-image diffusion model. No 3D data needed! Joint work w/ the incredible team of @BenMildenhall @ajayj_ @jon_barron #dreamfusion
@poolio @karpathy @BenMildenhall @ajayj_ @jon_barron How do you change the angle of rendering (to feed the nerf)? By using prompts such as "from the top/front"? How do you keep scene consistency between those angles?
@divideconcept @karpathy @BenMildenhall @ajayj_ @jon_barron We condition the diffusion model with view-dependent prompts based on azimuth and elevation, appending "front view", "back view", etc. Consistency comes from the shared 3D model + making sure all views in between are good too :)
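The view-dependent conditioning described above can be sketched roughly like this — a hypothetical helper that maps a sampled camera pose to a prompt suffix. The angle thresholds and suffix wording here are assumptions for illustration, not the paper's exact values:

```python
def view_dependent_prompt(base_prompt: str, azimuth_deg: float, elevation_deg: float) -> str:
    """Append a coarse view description to the text prompt.

    A hypothetical sketch of view-dependent prompting: the thresholds
    below are illustrative assumptions, not DreamFusion's actual values.
    """
    if elevation_deg > 60:
        # Camera looking down from high elevation.
        view = "overhead view"
    else:
        az = azimuth_deg % 360
        if az < 45 or az >= 315:
            view = "front view"
        elif az < 135:
            view = "side view"
        elif az < 225:
            view = "back view"
        else:
            view = "side view"
    return f"{base_prompt}, {view}"

# Example: a camera behind the object gets a "back view" prompt.
print(view_dependent_prompt("a DSLR photo of a squirrel", azimuth_deg=180, elevation_deg=20))
# → a DSLR photo of a squirrel, back view
```

Each rendered view is then scored by the diffusion model against its own view-specific prompt, which discourages, e.g., a face appearing on the back of the object.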
@poolio @karpathy @BenMildenhall @ajayj_ @jon_barron is it similar to that technique with GANs where you find the latent-space vector that goes from "front" to "side" (or "young" to "old", "male" to "female", etc...) and then use that vector to freely generate random views at different angles to feed the NeRF?
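For context, the GAN latent-direction trick the question refers to is usually done by taking the difference of mean latents between two labeled groups. A toy sketch with synthetic latents (the data here is fabricated purely for illustration; a real setup would use latents from a trained GAN):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for GAN latents labeled "front" and "side".
# In practice these would come from annotating generated samples.
z_front = rng.normal(size=(100, 512))
z_side = z_front + np.concatenate([[2.0], np.zeros(511)])  # synthetic offset

# The editing direction is the difference of the group means.
direction = z_side.mean(axis=0) - z_front.mean(axis=0)

# Moving any latent along this direction shifts its attribute
# (here, nudging a "front" latent toward "side").
z = rng.normal(size=512)
z_edited = z + 0.5 * direction  # the scalar controls edit strength
```

DreamFusion doesn't do this, though — the view is controlled by the rendering camera on the NeRF side plus the text prompt, not by walking a latent space.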
@poolio @divideconcept @karpathy @BenMildenhall @ajayj_ @jon_barron So is there a manual step in selecting the different views?