Happy to announce DreamFusion, our new method for Text-to-3D! dreamfusion3d.github.io We optimize a NeRF from scratch using a pretrained text-to-image diffusion model. No 3D data needed! Joint work w/ the incredible team of @BenMildenhall @ajayj_ @jon_barron #dreamfusion
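Since the tweet compresses the method into one line, here is a minimal, self-contained Python sketch of the Score Distillation Sampling (SDS) gradient it refers to: render an image from the NeRF, add noise, ask the diffusion model to predict that noise, and push the residual back through the renderer. The renderer, denoiser, noise schedule, and weighting w(t) below are all dummy stand-ins of my own choosing, not DreamFusion's actual components:

```python
import numpy as np

# Hypothetical stand-ins: in DreamFusion these are a differentiable NeRF
# renderer and a pretrained text-to-image diffusion model; here they are
# dummies so the sketch runs on its own.
def render_nerf(params, camera):
    return np.tanh(params)  # "image" as a flat array

def predict_noise(noisy_image, t, prompt):
    return 0.9 * noisy_image  # a real denoiser would condition on the prompt

def sds_gradient(params, camera, prompt, rng):
    """One SDS step: grad = w(t) * (eps_hat - eps) * d(image)/d(params),
    skipping the diffusion model's own Jacobian."""
    image = render_nerf(params, camera)
    t = rng.uniform(0.02, 0.98)                  # random noise level
    alpha, sigma = np.cos(t * np.pi / 2), np.sin(t * np.pi / 2)  # toy schedule
    eps = rng.standard_normal(image.shape)
    noisy = alpha * image + sigma * eps          # forward diffusion
    eps_hat = predict_noise(noisy, t, prompt)
    w = sigma**2                                 # one common weighting choice
    d_image_d_params = 1.0 - np.tanh(params)**2  # Jacobian of the dummy renderer
    return w * (eps_hat - eps) * d_image_d_params

rng = np.random.default_rng(0)
params = np.zeros(16)
grad = sds_gradient(params, camera=None, prompt="a corgi", rng=rng)
params -= 0.01 * grad  # gradient descent step on the NeRF parameters
```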
@poolio @karpathy @BenMildenhall @ajayj_ @jon_barron How do you change the rendering angle (to feed the NeRF)? By using prompts such as "from the top/front"? How do you keep the scene consistent between those angles?
@divideconcept @karpathy @BenMildenhall @ajayj_ @jon_barron We condition the diffusion model with view-dependent prompts based on the azimuth and elevation by appending "front view", "back view", etc. Consistency comes from the 3D model + making sure all the views in between look good too :)
@poolio @divideconcept @karpathy @BenMildenhall @ajayj_ @jon_barron So is there a manual process for selecting the different views?
@juliendorra @divideconcept @karpathy @BenMildenhall @ajayj_ @jon_barron nope, automatic and described in the appendix: for azimuth we split into four equally sized quadrants: "front view" for 0-90, "side view" for 90-180, "back view" for 180-270, and "side view" again for 270-360. If elevation > 60 degrees we swap to "overhead view"
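A minimal sketch of how that quadrant logic might look in Python. Only the angle ranges come from the thread; the function name, the comma-joined suffix, and the exact boundary handling are my own assumptions:

```python
def view_prompt(base_prompt: str, azimuth_deg: float, elevation_deg: float) -> str:
    """Append a view-dependent suffix based on camera azimuth and elevation.
    Ranges follow the thread; everything else is a guess at the details."""
    if elevation_deg > 60:
        suffix = "overhead view"
    else:
        az = azimuth_deg % 360
        if az < 90:
            suffix = "front view"
        elif az < 180:
            suffix = "side view"
        elif az < 270:
            suffix = "back view"
        else:
            suffix = "side view"
    return f"{base_prompt}, {suffix}"

print(view_prompt("a DSLR photo of a corgi", azimuth_deg=200, elevation_deg=30))
# -> "a DSLR photo of a corgi, back view"
```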
@poolio @divideconcept @karpathy @BenMildenhall @ajayj_ @jon_barron Do you feel there's a path for diffusion models to really become multi-view aware, or are there specific models to reinvent for that? (and thus going back to square one with 3D datasets several orders of magnitude bigger 😬)