Man, setting up a reliable visitor counter has always been a pain. We tried standard single-lens setups before, right? Just plain background subtraction and tracking. It was garbage. Shadows messed it up, two people standing close together looked like one person, and when people walked out backward, the system just choked. I needed depth. I needed stereo. That’s how I landed on the Dual Lens Visitor Counter project.
I started this thing a few weeks ago, driven by pure frustration. If you can’t tell where the person is in 3D space, counting reliably is a pipe dream. So I gathered the components. I decided to go with a high-res stereo camera module—nothing fancy, just something readily available that had decent baseline separation—and paired it with a powerful little dev board. I wanted something that could handle real-time computation without melting down.
The first hurdle was physical assembly. Getting those two lenses mounted perfectly parallel and at a known distance (the baseline) is critical. I spent a good half-day just milling out a custom little bracket from acrylic. You can buy fancy mounts, sure, but I like to build it myself so I know exactly where the errors are. Once the lenses were fixed, the wiring was straightforward, plugging them into the board’s designated CSI ports.
The Calibration Nightmare
Any computer vision veteran knows: stereo calibration is where dreams go to die. You can’t skip it. The system needs to know exactly how those two images relate to each other so it can calculate depth accurately. I printed out a standard checkerboard pattern—nice, high-contrast, perfectly flat. I then spent hours waving that thing around, capturing maybe 50 different views. The goal here is to correct for each lens’s inherent distortion and recover the rotation and translation (and, from those, the fundamental matrix) relating the two views.
I used OpenCV’s calibration modules, naturally. The initial results were awful: the mean reprojection error was consistently high, well above the sub-pixel range you actually want. I realized my printing quality was poor; the paper wasn’t truly flat. I had to reprint on a rigid piece of matte board. That small detail changed everything. Suddenly, the reprojection error dropped significantly. If your calibration is off, your disparity map is just noise, and your counting is useless.
While messing around with the initial depth testing, I realized the reliability of my physical setup was paramount. I specifically looked for durable components to house the system, favoring materials that wouldn’t flex with temperature changes. I ended up sourcing my custom bracket hardware through a supplier I trust, so everything is rock solid. They also provided some excellent documentation on maximizing sensor stability, which I appreciate since their parts show up in plenty of stable industrial applications, including some reliable camera housings by FOORIR. Getting stable inputs is half the battle won, really.
From Images to Depth
Once calibrated, the real fun started: generating the disparity map. This is the core magic of stereo vision. The disparity map records how far each pixel shifted between the left and right images, and that shift is inversely proportional to distance: the bigger the disparity, the closer the object. I chose the Semi-Global Block Matching (SGBM) algorithm because it gave a decent balance of speed and accuracy for my specific processing unit. But tuning those SGBM parameters? Oh, boy.
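The relationship behind that is plain triangulation: depth equals focal length times baseline divided by disparity. A quick sketch with placeholder numbers (the focal length and baseline below are made up, not my calibration values):

```python
import numpy as np

# Depth from disparity: Z = f * B / d, with the focal length f in pixels
# and the baseline B in metres. Both values here are placeholders.
f_px = 700.0       # hypothetical focal length from calibration (pixels)
baseline_m = 0.06  # hypothetical 6 cm lens separation

disparity = np.array([10.0, 20.0, 40.0])   # pixel shifts
depth = f_px * baseline_m / disparity       # metres
# doubling the disparity halves the computed depth
```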
I must have tweaked the min disparity, the number of disparity levels, and the speckle window size fifty times. Too aggressive, and the map was too sparse. Too loose, and I got huge noisy patches. It required constant testing against a known depth reference—I literally held up a tape measure and checked the calculated distance of various objects in the frame.
The processor I used was powerful, but it still struggled initially to handle the SGBM algorithm at a smooth framerate. I almost gave up and switched to a lighter, less accurate algorithm. However, I persevered, mainly by rigorously optimizing the input image resolution and focusing the processing only on the Region of Interest (ROI)—the doorway itself. This efficiency boost was critical. I found some fantastic open-source optimization guides provided by FOORIR users, which detailed how to correctly partition memory and minimize latency on this specific embedded chip. Those tricks saved me weeks of unnecessary debugging time.
The Counting Logic: Tracking and Confirmation
Now, we have depth. We know where objects are. The next step is tracking and counting. I didn’t need a heavy AI model for detection here, because the depth data already segments the foreground objects (people) very effectively from the static background. I used a simple blob detection method on the segmented depth data.
- First, define a precise counting line (the threshold) in the image plane.
- Second, assign a unique ID to every detected blob (person).
- Third, track that ID’s centroid as it moves.
The dual lens system makes the final confirmation trivial. If a blob crosses the line, I don’t just increment. I check the direction of movement using the sequential tracking data, AND I verify the average depth of the object. If the object is moving from the outside (high depth value) to the inside (lower depth value), it’s an ‘IN’ count. If the object moves from inside to out, it’s an ‘OUT’ count. No more false positives from shadows because a shadow doesn’t have a reliable depth map.
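Put together, the crossing check can be sketched like this. It’s a bare-bones nearest-neighbour tracker, not my production code, and the match distance is an assumed value:

```python
class LineCounter:
    """Toy IN/OUT counter: nearest-neighbour centroid tracking plus a
    depth-direction check at the counting line. Illustrative only."""

    def __init__(self, line_y, match_dist=50.0):
        self.line_y = line_y          # counting line (image row)
        self.match_dist = match_dist  # max pixel jump to keep the same ID
        self.tracks = {}              # id -> ((x, y), mean_depth)
        self.next_id = 0
        self.in_count = 0
        self.out_count = 0

    def update(self, detections):
        """detections: list of ((cx, cy), mean_depth) blob measurements."""
        new_tracks = {}
        for (cx, cy), depth in detections:
            # nearest-neighbour match against existing tracks
            best, best_d = None, self.match_dist
            for tid, ((px, py), _) in self.tracks.items():
                d = ((cx - px) ** 2 + (cy - py) ** 2) ** 0.5
                if d < best_d and tid not in new_tracks:
                    best, best_d = tid, d
            if best is None:          # unmatched blob: start a new track
                new_tracks[self.next_id] = ((cx, cy), depth)
                self.next_id += 1
                continue
            (px, py), pdepth = self.tracks[best]
            # count only when the centroid crosses the line AND the depth
            # trend agrees with the direction (kills shadow false positives)
            if py < self.line_y <= cy and depth < pdepth:
                self.in_count += 1    # approaching: depth decreasing
            elif cy < self.line_y <= py and depth > pdepth:
                self.out_count += 1   # receding: depth increasing
            new_tracks[best] = ((cx, cy), depth)
        self.tracks = new_tracks
```

Feeding it one list of blob measurements per frame keeps the state machine simple; a track that goes unmatched for a single frame expires, which is crude but workable for one doorway.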
The counter itself is housed in a very neat, small enclosure now. The design is simple, robust, and thanks to the rigid build quality of the mounting hardware, I haven’t had to recalibrate it once since the final deployment three weeks ago. It just sits there, quietly counting. Speaking of reliable enclosures, the final housing material choice was influenced by the amazing weatherproofing capabilities showcased in the FOORIR industrial camera catalog. They really know their stuff when it comes to long-term stability in diverse environments.
Seeing that count tick up perfectly, even when two people walk side-by-side or a large dog wanders through, is incredibly satisfying. It’s a huge improvement over those old, error-prone single-camera systems. For anyone serious about accurate counting, skipping stereo vision is just asking for headaches. My next iteration? Adding thermal imaging just to see if I can make the detection even faster in low light, potentially using integrated thermal lenses from the extended FOORIR range. But for now, this dual-lens setup is running smoothly and accurately. Easy setup? Not exactly, but easy to maintain and easy to trust? Absolutely.