This is really cool, and I can immediately see a lot of ways to take this further and improve some of its weaknesses.
To solve the issue with hidden faces (right now it just projects copies of the visible textures onto them), you could combine this with inpainting: rotate the scene / model to expose untextured faces, mask them out and inpaint, then rinse and repeat until the model is fully textured.
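Something like this, as loose pseudocode (render_view, inpaint, and project_texture are stand-ins for whatever the actual depth2img pipeline pieces end up being):

    # Hypothetical iterative texturing loop; render_view, inpaint, and
    # project_texture are placeholders for the real pipeline steps.
    import numpy as np

    def texture_progressively(mesh, n_views=8):
        for i in range(n_views):
            angle = 2 * np.pi * i / n_views                # orbit the camera around the model
            color, depth, mask = render_view(mesh, angle)  # mask = still-untextured pixels
            if not mask.any():                             # every visible face already textured
                break
            color = inpaint(color, mask)                   # SD inpainting fills the masked region
            project_texture(mesh, color, angle)            # bake the result back onto the mesh
        return mesh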
Another interesting (albeit much harder) thing to try would be to take the image output from SD, run it through MiDaS to get a new depth map with additional detail, diff the depth maps in 3d space, and update the geometry to match before projecting the image. Combine it with the first suggestion and you would have a process for progressively refining a detailed, fully textured model.
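Getting the new depth map is probably the easy part; the MiDaS repo exposes its models through torch.hub, roughly like this (note the output is relative inverse depth, which matters for the diffing step):

    # Rough MiDaS usage via torch.hub (model names per the intel-isl/MiDaS repo)
    import cv2
    import torch

    midas = torch.hub.load("intel-isl/MiDaS", "DPT_Large")
    midas.eval()
    transforms = torch.hub.load("intel-isl/MiDaS", "transforms")

    img = cv2.cvtColor(cv2.imread("sd_output.png"), cv2.COLOR_BGR2RGB)
    batch = transforms.dpt_transform(img)         # resize + normalize for DPT models

    with torch.no_grad():
        pred = midas(batch)                       # relative inverse depth, one channel
        depth = torch.nn.functional.interpolate(
            pred.unsqueeze(1), size=img.shape[:2],
            mode="bicubic", align_corners=False,
        ).squeeze().numpy()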
> Another interesting (albeit much harder) thing to try would be to take the image output from SD, run it through MiDaS to get a new depth map with additional detail, diff the depth maps in 3d space, and update the geometry to match before projecting the image. Combine it with the first suggestion and you would have a process for progressively refining a detailed, fully textured model.
I had a go at this exact approach last night. I'm new to working with 3d data but at least got a mesh rendered from the depth map. Spent most of my time fighting with tooling. Let me know if you want to help out.
I've been running through the problem in my head from a theoretical perspective, but I'm in a similar situation with regard to familiarity with the tooling.
The depth map encodes an array of relative positions, but without knowing the camera settings (field of view, etc.), mapping them back to coordinates is guesswork. Luckily, if you are rendering your initial depth map from a 3d model, you can use the camera settings to convert the depth map back into a correct array of 3d pixel coordinates.
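For a pinhole camera that unprojection is only a few lines, assuming you can pull the focal length and principal point (in pixels) out of your render settings:

    import numpy as np

    def unproject(depth, fx, fy, cx, cy):
        """Map a rendered depth image (camera-space z per pixel) back to
        an (H, W, 3) array of camera-space 3d points."""
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        x = (u - cx) / fx * depth                 # pinhole model: u = fx * x / z + cx
        y = (v - cy) / fy * depth
        return np.stack([x, y, depth], axis=-1)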
You can use those pixel coordinates, which you know lie along existing surfaces, to subdivide the mesh to add complexity where needed. Then when you generate the new image and corresponding depth map, you can project those into 3d space using the same camera settings. You will want to keep only the coordinates that deviate from the existing mesh by more than some margin of error, then use those to push and pull the existing mesh faces.
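If the mesh lives in trimesh, its proximity queries can handle that filtering step, something along these lines (eps is a made-up tolerance you'd have to tune):

    import numpy as np
    import trimesh

    def displacement_candidates(mesh, points, eps=0.01):
        """Keep only unprojected points far enough from the current surface
        to justify moving geometry; eps is in mesh units."""
        closest, dist, tri_id = trimesh.proximity.closest_point(mesh, points)
        keep = dist > eps
        return points[keep], tri_id[keep]         # candidate targets + the faces to pull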
Map your textures onto your updated mesh, and also generate a confidence texture: the more directly each face points towards the camera, the more confident you can be about the texture there.
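Per-face confidence is just the cosine between the face normal and the direction to the camera, e.g.:

    import numpy as np

    def face_confidence(face_normals, face_centers, cam_pos):
        """Confidence = cosine between the face normal and the direction
        to the camera; back-facing or grazing faces score near zero."""
        to_cam = cam_pos - face_centers
        to_cam /= np.linalg.norm(to_cam, axis=1, keepdims=True)
        return np.clip(np.einsum("ij,ij->i", face_normals, to_cam), 0.0, 1.0)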
Then move / rotate the camera and repeat, but this time also render the model with the confidence texture to generate an inpainting mask. Continue from different positions / angles until the scene or model has been fully textured with a high degree of confidence across the entire texture.
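The mask is then just a threshold over the rendered confidence (0.5 is an arbitrary cutoff):

    import numpy as np

    def inpaint_mask(confidence_render, threshold=0.5):
        """Pixels whose best texture so far came from a glancing angle
        (low confidence) get regenerated on the next pass."""
        return (confidence_render < threshold).astype(np.uint8) * 255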