This is really cool, and I can immediately see a lot of ways to take this further and improve some of its weaknesses.
To solve the issue with hidden faces (right now it just projects copies of the visible textures onto them), you could combine this with inpainting: rotate the scene / model to expose untextured faces, mask them out and inpaint, then rinse and repeat until the model is fully textured.
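Something like this, as loose pseudocode (render_view, inpaint, and project_texture are stand-ins for whatever the actual depth2img pipeline pieces end up being):

    # Hypothetical iterative texturing loop; render_view, inpaint, and
    # project_texture are placeholders for the real pipeline steps.
    import numpy as np

    def texture_progressively(mesh, n_views=8):
        for i in range(n_views):
            angle = 2 * np.pi * i / n_views                # orbit the camera around the model
            color, depth, mask = render_view(mesh, angle)  # mask = still-untextured pixels
            if not mask.any():                             # every visible face already textured
                break
            color = inpaint(color, mask)                   # SD inpainting fills the masked region
            project_texture(mesh, color, angle)            # bake the result back onto the mesh
        return mesh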
Another interesting (albeit much harder) thing to try would be to take the image output from SD, run it through MiDaS to get a new depth map with additional detail, diff the depth maps in 3d space, and update the geometry to match before projecting the image. Combine it with the first suggestion and you would have a process for progressively refining a detailed, fully textured model.
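Getting the new depth map is probably the easy part; the MiDaS repo exposes its models through torch.hub, roughly like this (note the output is relative inverse depth, which matters for the diffing step):

    # Rough MiDaS usage via torch.hub (model names per the intel-isl/MiDaS repo)
    import cv2
    import torch

    midas = torch.hub.load("intel-isl/MiDaS", "DPT_Large")
    midas.eval()
    transforms = torch.hub.load("intel-isl/MiDaS", "transforms")

    img = cv2.cvtColor(cv2.imread("sd_output.png"), cv2.COLOR_BGR2RGB)
    batch = transforms.dpt_transform(img)         # resize + normalize for DPT models

    with torch.no_grad():
        pred = midas(batch)                       # relative inverse depth, one channel
        depth = torch.nn.functional.interpolate(
            pred.unsqueeze(1), size=img.shape[:2],
            mode="bicubic", align_corners=False,
        ).squeeze().numpy()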
> Another interesting (albeit much harder) thing to try would be to take the image output from SD, run it through MiDaS to get a new depth map with additional detail, diff the depth maps in 3d space, and update the geometry to match before projecting the image. Combine it with the first suggestion and you would have a process for progressively refining a detailed, fully textured model.
I had a go at this exact approach last night. I'm new to working with 3d data but at least got a mesh rendered from the depth map. Spent most of my time fighting with tooling. Let me know if you want to help out.
I've been running through the problem in my head from a theoretical perspective, but I'm in a similar situation with regard to familiarity with the tooling.
The depth map encodes an array of relative positions, but without knowing the camera settings (field of view, etc.), mapping them back to coordinates is guesswork. Luckily, if you are rendering your initial depth map from a 3d model, you can use the camera settings to convert the depth map back into a correct array of 3d pixel coordinates.
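For a pinhole camera that unprojection is only a few lines, assuming you can pull the focal length and principal point (in pixels) out of your render settings:

    import numpy as np

    def unproject(depth, fx, fy, cx, cy):
        """Map a rendered depth image (camera-space z per pixel) back to
        an (H, W, 3) array of camera-space 3d points."""
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        x = (u - cx) / fx * depth                 # pinhole model: u = fx * x / z + cx
        y = (v - cy) / fy * depth
        return np.stack([x, y, depth], axis=-1)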
You can use those pixel coordinates, which you know lie along existing surfaces, to subdivide the mesh to add complexity where needed. Then when you generate the new image and corresponding depth map, you can project those into 3d space using the same camera settings. You will want to keep only the coordinates that deviate from the existing mesh by more than some margin of error, then use those to push and pull the existing mesh faces.
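If the mesh lives in trimesh, its proximity queries can handle that filtering step, something along these lines (eps is a made-up tolerance you'd have to tune):

    import numpy as np
    import trimesh

    def displacement_candidates(mesh, points, eps=0.01):
        """Keep only unprojected points far enough from the current surface
        to justify moving geometry; eps is in mesh units."""
        closest, dist, tri_id = trimesh.proximity.closest_point(mesh, points)
        keep = dist > eps
        return points[keep], tri_id[keep]         # candidate targets + the faces to pull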
Map your textures onto your updated mesh, and also generate a confidence texture: the more directly each face points towards the camera, the more confident you can be about the texture there.
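Per-face confidence is just the cosine between the face normal and the direction to the camera, e.g.:

    import numpy as np

    def face_confidence(face_normals, face_centers, cam_pos):
        """Confidence = cosine between the face normal and the direction
        to the camera; back-facing or grazing faces score near zero."""
        to_cam = cam_pos - face_centers
        to_cam /= np.linalg.norm(to_cam, axis=1, keepdims=True)
        return np.clip(np.einsum("ij,ij->i", face_normals, to_cam), 0.0, 1.0)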
Then move / rotate the camera and repeat, but this time also render the model with the confidence texture to generate an inpainting mask. Continue from different positions / angles until the scene or model has been fully textured with a high degree of confidence across the entire texture.
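The mask is then just a threshold over the rendered confidence (0.5 is an arbitrary cutoff):

    import numpy as np

    def inpaint_mask(confidence_render, threshold=0.5):
        """Pixels whose best texture so far came from a glancing angle
        (low confidence) get regenerated on the next pass."""
        return (confidence_render < threshold).astype(np.uint8) * 255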