For my Master’s thesis, I dove into acquiring static objects with an RGBD sensor. Eventually, I decided to use the Kinect Fusion algorithm, which produced decent results.
I found this topic fascinating, so I continued on my own, but in another direction: acquiring people. So, in the last few months, I have been experimenting with scanning myself and my friends with a Microsoft Kinect One.
Influenced by my previous results, I initially tried working with point clouds.
Deformation graph approach
One approach I found in several papers consists of:
- acquiring only a few scans (from 6-8 points of view), with the person as still as they can;
- performing a rough global alignment;
- running ICP to improve the rigid alignment locally;
- downsampling the point cloud to build a deformation graph;
- solving an optimization problem;
- deforming the denser point clouds.
Getting through step 3 is not trivial because people move. ICP can be very unforgiving, and in some cases you also need some luck to obtain good results at this stage.
For downsampling (step 4), I used Open3D’s voxelization followed by averaging the point coordinates in each voxel. I do not know how this choice affects the final results compared to something like a clustering algorithm.
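To make the downsampling step concrete, here is a minimal pure-Python sketch of voxel averaging. It is not Open3D’s implementation: the function name `voxel_downsample` and the sample coordinates are made up for illustration, and points are assumed to be plain (x, y, z) tuples.

```python
from collections import defaultdict
from math import floor


def voxel_downsample(points, voxel_size):
    """Average all points that fall into the same cubic voxel."""
    buckets = defaultdict(list)
    for p in points:
        # Integer voxel coordinates act as the grouping key.
        key = tuple(floor(c / voxel_size) for c in p)
        buckets[key].append(p)
    # One output point per occupied voxel: the centroid of its members.
    return [
        tuple(sum(axis) / len(members) for axis in zip(*members))
        for members in buckets.values()
    ]


# Hypothetical sample: two points share a voxel, the third is alone.
cloud = [(0.01, 0.02, 0.0), (0.03, 0.04, 0.0), (0.26, 0.0, 0.0)]
nodes = voxel_downsample(cloud, voxel_size=0.1)
```

The averaged points can then serve as candidate nodes for the deformation graph. A clustering algorithm would replace the fixed grid with data-driven groups.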
But so far, nothing really new. Libraries like Open3D have you covered for all of these steps.
The real challenge starts with the optimization problem. We need to minimize the sum of some “energies”:
- smooth: nodes must be as close as possible to their original position;
- rigid: we want only rotations, without scaling;
- correlation: the same one as the point-to-plane ICP.
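As a rough illustration of two of these terms, the sketch below evaluates a rigid penalty and a point-to-plane penalty in plain Python. The rigid term uses one formulation that is common in the deformation-graph literature (penalizing non-orthonormal columns of each node’s 3x3 matrix), and I am assuming it matches the papers meant here. The smooth term is left out because its exact form depends on the paper. The function names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rigid_energy(A):
    """Penalty that is zero when the 3x3 matrix A has orthonormal columns.

    A is given as three rows; zip(*A) extracts its columns.
    """
    c1, c2, c3 = (tuple(col) for col in zip(*A))
    return (
        dot(c1, c2) ** 2 + dot(c1, c3) ** 2 + dot(c2, c3) ** 2
        + (dot(c1, c1) - 1) ** 2
        + (dot(c2, c2) - 1) ** 2
        + (dot(c3, c3) - 1) ** 2
    )


def point_to_plane_energy(sources, targets, normals):
    """Sum of squared distances from each source point to the tangent
    plane of its corresponding target point (as in point-to-plane ICP)."""
    total = 0.0
    for p, q, n in zip(sources, targets, normals):
        diff = tuple(pc - qc for pc, qc in zip(p, q))
        total += dot(diff, n) ** 2
    return total


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
scaled = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
```

Here `rigid_energy(identity)` is zero while `rigid_energy(scaled)` is positive, which is how scaling gets discouraged. The point-to-plane term ignores any sliding along the target surface and only penalizes offsets along the normal.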
I regret not having taken proper numerical analysis and optimization classes at university. So I turned to this marvelous PDF about non-linear least squares problems. Luckily, my Mathematical Analysis classes gave me the foundations to understand it. Moreover, I love learning, so I could catch up quickly. At least in theory.
In practice, I never managed to get the expected results. Maybe our problem has far too many variables. Or I introduced errors somewhere else.
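For a sense of scale, this is what a non-linear least squares solver looks like on a toy problem with a single unknown: a Gauss-Newton loop fitting y = exp(b * x). It is only a sketch with made-up data and a hypothetical function name. The real problem has several unknowns per graph node, so the scalar division below becomes a large sparse linear system.

```python
from math import exp


def gauss_newton_1d(xs, ys, b=0.0, iterations=20):
    """Fit y = exp(b * x) by Gauss-Newton on the single parameter b."""
    for _ in range(iterations):
        residuals = [exp(b * x) - y for x, y in zip(xs, ys)]
        jacobian = [x * exp(b * x) for x in xs]  # d(residual_i) / db
        jtj = sum(j * j for j in jacobian)
        jtr = sum(j * r for j, r in zip(jacobian, residuals))
        if jtj == 0.0:
            break  # the data carries no information about b
        b -= jtr / jtj  # the normal equations collapse to one division
    return b


# Noise-free synthetic data generated with b = 0.5.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [exp(0.5 * x) for x in xs]
b_hat = gauss_newton_1d(xs, ys)
```

Even in this tiny case the first step overshoots before the iteration settles, which hints at why initialization and damping matter so much on larger problems.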
Then I paused the research, and when I resumed it, I tried other ways.
Skeletons/armatures on point clouds
In animation, humans are often handled through skeletons or armatures. That was actually my first idea.
Skeletons on incomplete point clouds
I found some papers about detecting skeletons in incomplete point clouds (e.g., exploiting cylindrical symmetries). Again, I had some problems with the optimization. With these approaches, it is even more difficult to find errors because there are many partial results. Debugging them, or in some cases even understanding them, is not straightforward.
Or, simply, I give up too easily, and I would like to work on too many projects 🙂️. So, once again, I left these attempts incomplete.
AI automatic detection
Anyway, I also tried pose estimation with OpenNI and NiTE.
No wonder the project did not get much of a follow-up. First, it was too opaque. For instance, getting offline results from already acquired frames was impossible. Second, its data was not consistent with the data I got from the Kinect. I am sure I had a lot of other problems that I do not remember now.
I then tried some popular AI models, but I found only ones for 2D data. They were pretty accurate, as expected, but I absolutely needed 3D data. Even rough 3D data would be more helpful to me than perfect 2D data.
Moreover, I am working on a limited set of models, at least for the moment. So, it is okay for me to refine the poses by hand.
Eventually, I wrote an application for creating bones on a point cloud and applying poses.
I enjoyed doing that quite a lot: I improved my OpenGL skills and learned a lot about working with armatures.
Also, I saw that getting the weights for bones is not trivial. My first implementation used only the closest bone, or the two closest when a point is near a joint. It ended up working better than I thought, but it leaves a lot of room for improvement.
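A simple weighting scheme of this kind can be sketched as follows. This is a reconstruction under assumptions, not the code from my application: bones are (head, tail) segments, "near a joint" is approximated by comparing the distances to the two closest bones, and `blend_ratio` is a hypothetical tuning knob.

```python
from math import sqrt


def distance_to_segment(p, a, b):
    """Euclidean distance from point p to the segment a-b (3D tuples)."""
    ab = tuple(bc - ac for ac, bc in zip(a, b))
    ap = tuple(pc - ac for ac, pc in zip(a, p))
    length_sq = sum(c * c for c in ab)
    # Parameter of the projection of p onto the bone, clamped to [0, 1].
    t = 0.0 if length_sq == 0.0 else sum(x * y for x, y in zip(ap, ab)) / length_sq
    t = max(0.0, min(1.0, t))
    closest = tuple(ac + t * c for ac, c in zip(a, ab))
    return sqrt(sum((pc - cc) ** 2 for pc, cc in zip(p, closest)))


def bone_weights(p, bones, blend_ratio=1.5):
    """Return {bone_index: weight} with weights summing to 1.

    The second-closest bone contributes only if it is at most
    blend_ratio times farther away than the closest one.
    """
    ranked = sorted(
        (distance_to_segment(p, a, b), i) for i, (a, b) in enumerate(bones)
    )
    d0, i0 = ranked[0]
    if len(ranked) == 1 or ranked[1][0] > blend_ratio * d0:
        return {i0: 1.0}
    d1, i1 = ranked[1]
    if d0 + d1 == 0.0:
        return {i0: 0.5, i1: 0.5}
    # Inverse-distance blend: the closer bone gets the larger share.
    return {i0: d1 / (d0 + d1), i1: d0 / (d0 + d1)}


# Two hypothetical bones meeting at the joint (1, 0, 0).
bones = [((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)), ((1.0, 0.0, 0.0), (2.0, 0.0, 0.0))]
mid_bone = bone_weights((0.2, 0.1, 0.0), bones)
at_joint = bone_weights((1.0, 0.1, 0.0), bones)
```

A point along the middle of a bone follows that bone alone, while a point next to the joint is split between both bones. The hard switch between the two cases is one of the places where there is room to improve.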
A more worrying problem is getting a mesh, which I did not solve.
The mesh approach
Building a mesh from a point cloud is a problem… whenever you use a point cloud! You could even use more than one sensor to avoid the alignment problem, but you would still have to create the mesh.
And actually, RGBD sensors do not give you a point cloud but a sort of surface (albeit open and not watertight)! Getting it is as simple as this:
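```python
# Visit every 2x2 block of neighbouring pixels in the depth image.
for x in range(width - 1):
    for y in range(height - 1):
        # Build a quad only when all four corners have a valid depth.
        if (has_depth(x, y) and has_depth(x + 1, y)
                and has_depth(x + 1, y + 1) and has_depth(x, y + 1)):
            # unproject maps a pixel (and its depth) to a 3D point.
            p0 = unproject(x, y)
            p1 = unproject(x + 1, y)
            p2 = unproject(x + 1, y + 1)
            p3 = unproject(x, y + 1)
            build_quad(p0, p1, p2, p3)
```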
The has_depth function returns true when that pixel of the depth image is greater than 0. For many use cases, the build_quad function should use indices, so that quads are connected whenever possible (or you should perform a weld afterwards).
A better implementation can also check that the depth difference between two adjacent vertices is below a certain threshold. In any case, either the inputs or the output will need some processing to keep only the subject. Scanning in big rooms, with the foreground far from the background, helps clean the data faster.
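Putting these remarks together, a self-contained sketch could look like the following. It assumes a pinhole camera model with intrinsics `fx`, `fy`, `cx`, `cy`, a depth image stored as rows of metres with 0 meaning "no data", and a hypothetical `max_jump` threshold. For brevity it compares the nearest and farthest corner of each quad rather than each pair of adjacent vertices.

```python
def depth_to_quads(depth, fx, fy, cx, cy, max_jump=0.05):
    """Turn a depth image into shared vertices and index-based quads."""
    height, width = len(depth), len(depth[0])
    vertices, index_of, quads = [], {}, []

    def vertex(u, v):
        # Reuse the vertex if this pixel was already unprojected, so that
        # neighbouring quads share indices instead of duplicating points.
        if (u, v) not in index_of:
            z = depth[v][u]
            index_of[(u, v)] = len(vertices)
            vertices.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
        return index_of[(u, v)]

    for y in range(height - 1):
        for x in range(width - 1):
            corners = [(x, y), (x + 1, y), (x + 1, y + 1), (x, y + 1)]
            zs = [depth[v][u] for u, v in corners]
            if min(zs) <= 0:
                continue  # at least one corner has no depth
            if max(zs) - min(zs) > max_jump:
                continue  # depth discontinuity, likely an object boundary
            quads.append(tuple(vertex(u, v) for u, v in corners))
    return vertices, quads


# Tiny hypothetical depth image: one hole and one far background pixel.
depth = [
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
    [1.0, 1.0, 2.0],
]
vertices, quads = depth_to_quads(depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

Only the two quads in the left column survive here: the top-right one touches the hole, and the bottom-right one spans a depth jump. The two surviving quads share an edge through their indices, so no weld is needed.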
Programs like Blender can also apply skeletons to this kind of mesh, making it possible to align frames manually. And it is easier with meshes than with point clouds. But the merge… well, it needs to be done manually as well.
In many cases, a remesh would be necessary anyway. For example, topology matters for animation, and a Laplacian filter may not smooth the acquired meshes enough.
Also, scans are sometimes too accurate. Some details like creases in the clothes are okay for 3D printing: they increase the feeling of realism. But they look off wherever smooth surfaces are expected. And in animations, creases should be animated as well. For these cases, it is better to use the scan as a reference and take some artistic freedom.
So, for now, this is the way I chose to process my friends’ scans. I still have not produced any final results because I am not good at 3D modeling, but I am trying to improve 😊️. Starting this way would probably have saved me a lot of time 😂️.