Piero V.

My RGBD toy box

In the last few months, in my free time, I have been developing a small application to process RGBD datasets capturing people. In particular, my goal is to create 3D scans of heads. I do not expect these models to be well-made or even usable without some processing, but I wonder whether I can turn this data into at least starter models.

I first started to work with RGBD cameras during my internship at Altair. At the time, I built a small pipeline to reconstruct models based on Kinect Fusion.

Sadly, Kinect Fusion is an online method. The advantage is that it will give immediate feedback if it loses the camera tracking. But the disadvantage is that it needs a pretty powerful GPU. I have one on my desktop, but I took all my datasets using laptops.

Also, in my experience, getting a usable dataset with Kinect Fusion takes several attempts and a lot of time (the acquisition must proceed very slowly), which, generally speaking, is not always compatible with… people 😄️. They might move, lose patience, and so on. … [Read the rest]

OpenCV and time lapses

After buying my Pixel 4a, I decided to take a picture of a poplar near my home every day. I did this for one year, and I created a time lapse. But I will not publish it here because it would reveal where my home is 😜️.

Methodical is not enough

With time lapses, you usually keep your camera still, but this was not an option in my case. Therefore, I tried to be methodical in taking the various pictures.

I used a sewer cover as the spot to shoot from and a telephone pole as a reference (its tip is close to the upper-left corner in every picture).

Still, the framing varied from one picture to the next, but luckily OpenCV came to the rescue.

Homography matrices

We could say that my scenario is like capturing the same scene with different cameras. Therefore, we can compute the homography matrix to reproject one image onto the previous one.

And OpenCV has a very handy function to do so: findHomography. It takes the coordinates of corresponding points as inputs, and it returns a 3-by-3 matrix as output.

If you are using Python, you must pass the points as two NumPy arrays. Both must have the same shape: a row for each pair and two columns with the coordinates. The point at the ith row of the first array must correspond to the point at the ith row of the second array. … [Read the rest]

PyElas

Recently, I started experimenting with stereo vision.

It is a technique to produce depth maps from images captured from nearby positions. Then, with these maps, it is possible to create 3D representations.

The core of this workflow is the matching algorithm, which takes pairs of post-processed images and creates a “disparity” map. The disparity is the offset between the positions of the same point in the two images. Depth and disparity are inversely proportional, so it is easy to switch from one to the other.
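
To make the relation between depth and disparity concrete, here is a small sketch assuming the usual rectified pinhole stereo model, where depth = focal length × baseline / disparity. The function name, the focal length (in pixels), and the baseline (in meters) are illustrative values, not taken from any real camera.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert a disparity map (pixels) to a depth map (meters)."""
    disparity = np.asarray(disparity, dtype=np.float64)
    # Pixels with no valid match (disparity <= 0) are left as NaN.
    depth = np.full(disparity.shape, np.nan)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# Hypothetical 2x2 disparity map; 0 marks a pixel without a match.
disp = np.array([[64.0, 32.0], [0.0, 16.0]])
depth = disparity_to_depth(disp, focal_px=800.0, baseline_m=0.1)
```

Halving the disparity doubles the depth, which is also why distant points, with their small disparities, are the most sensitive to matching noise.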

OpenCV contains some stereo matching algorithms, but they produced a lot of noise. So I looked for another library, and I found libelas.

It is a GPLv3 C++/MATLAB library with many parameters to tune, but I could not find a Python version. My options were to switch to C++ or to port it myself. I chose the latter, hoping that others can benefit from it too 🙂️.

Long story short, I published my first package on PyPI: PyElas.

How to use it

You can install it using pip. Then you just have to do this: … [Read the rest]

On acquiring 3D models of people

For my Master’s degree thesis, I dove into acquiring static objects with an RGBD sensor. Eventually, I decided to use the Kinect Fusion algorithm, which produced decent results.

I found this topic fascinating, so I continued on my own, but in another direction: acquiring people. So, in the last few months, I have been experimenting with scanning myself and my friends with a Microsoft Kinect One.

Influenced by my previous results, I initially tried working with point clouds.

Deformation graph approach

One approach I found in several papers consists of:

  1. acquiring only a few scans (from 6-8 points of view), with the person as still as they can;
  2. performing a rough global alignment;
  3. running ICP to improve the rigid alignment locally;
  4. downsampling the point cloud to build a deformation graph;
  5. solving an optimization problem;
  6. deforming the denser point clouds.

Reaching point 3 is not trivial because people move. ICP can be very unforgiving, and in some cases, you also need some luck to obtain good results at this stage.

For downsampling (point 4), I used Open3D’s voxelization followed by averaging point coordinates. I do not know how this influences the final results compared to something like a clustering algorithm. … [Read the rest]
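
The idea of voxelizing and then averaging can be sketched in plain NumPy. This is an illustration of the technique, not Open3D's implementation or the code used for these scans; the function name and the sample points are made up.

```python
import numpy as np

def voxel_average(points, voxel_size):
    """Replace all points falling in the same voxel with their centroid."""
    points = np.asarray(points, dtype=np.float64)
    # Integer voxel coordinates for every point.
    voxel_idx = np.floor(points / voxel_size).astype(np.int64)
    # Group points by voxel: `inverse` maps each point to its voxel group.
    _, inverse, counts = np.unique(
        voxel_idx, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)  # keep it 1-D across NumPy versions
    # Sum the coordinates per voxel, then divide by the point count.
    sums = np.zeros((counts.size, 3))
    np.add.at(sums, inverse, points)
    return sums / counts[:, None]

# Hypothetical cloud: two points share a voxel, the third is alone.
cloud = np.array([[0.1, 0.1, 0.1], [0.3, 0.3, 0.3], [1.2, 0.1, 0.1]])
nodes = voxel_average(cloud, voxel_size=1.0)
```

The averaged points could then serve as candidate nodes for the deformation graph, with the voxel size controlling how sparse the graph is.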

Kinect, Kinect One, OpenCV and OpenNI: where do we stand?

Recently I have been playing a bit with the devices in the title: Kinect for Xbox 360 and Kinect for Xbox One. But I have not been playing with them on an Xbox, nor (purely) for fun: they might become the topic of my degree thesis.

But let's step back a second, or rather a few months. In mid-February I started the mandatory internship for my degree, and I decided to do it at a company that develops 3D modeling software for Windows and Mac. More precisely, I am focusing on the possibility of creating plugins for this software in Python, in addition to the already existing option of writing them in C++.

The most common path in my degree program is to do the long internship and then write a thesis about it. Since my advisor works on Computer Vision (the best course I have taken in 5 years of university, really), we thought of making some plugin related to computer vision for the software, at least for educational purposes.

We came up with several ideas, and some of them involve sensors that can measure depth; among these are the two Kinects, which are fairly widespread. At the office we already had a Kinect for Xbox 360, and then my friend Giacomo lent me a Kinect for Xbox One. … [Read the rest]