Computer vision on a $200 board

One of the questions investors ask us early is: "what's stopping someone with a $5,000 GPU from building this?" The honest answer is: nothing's stopping them, but they'll lose to whoever can build it on a $200 board.

Here's how we got real-time tennis-ball detection running on a budget embedded compute module.

The hardware budget

Total bill of materials for the perception stack on the current production unit, at the volumes we buy:

  • Compute module (Jetson-class): ~$110
  • Wide-angle monocular camera: ~$40
  • Mounting + cable + ESD protection: ~$15
  • Cooling: ~$8

Total: ~$173 USD. Round it to $200 to account for the things we forgot.

The compromises

To fit perception in that budget, three things had to be true.

1. Monocular, not stereo. Stereo vision gives you depth almost for free, but it doubles the camera cost and the image bandwidth. We get depth from monocular cues and a ball-size prior instead. A regulation tennis ball is always about 6.5 to 6.9 cm across (the ITF allows 6.54 to 6.86 cm), so if you can detect it, you can estimate its range from its apparent size.

2. Pruned, not off-the-shelf. Off-the-shelf YOLOv5 is ~14M parameters. We retrained, then pruned, then quantized to int8 on our deployment target. The shipping model is ~1.2M parameters. We lose about 2 points of precision and gain 8 FPS.

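3. Resolution where it matters. The sensor's native resolution is 1080p. We run the wide field through the detector at 480p, then switch to a 720p crop of the full-resolution frame around the gripper for the final few centimetres of the approach. Two-stage spatial attention, basically. The 480p pass touches a fifth or less of the pixels in a full 1080p frame, and even the 720p crop is under half, so it saves a lot of compute.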
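Item 1's range-from-size trick is the pinhole camera model run backwards: a ball of known diameter D that spans d pixels, seen through a lens with focal length f (in pixels), sits at range Z ≈ f·D/d. Below is a minimal Python sketch under stated assumptions: the image has already been undistorted (a wide-angle lens needs calibration first), f comes from camera calibration, and D is taken as 6.7 cm, the middle of the regulation range. The function names and the 600 px focal length are hypothetical, not the production code.

```python
# Assumed ball diameter in metres: middle of the ITF range (6.54 to 6.86 cm).
BALL_DIAMETER_M = 0.067


def range_from_apparent_size(diameter_px: float, focal_px: float,
                             ball_diameter_m: float = BALL_DIAMETER_M) -> float:
    """Pinhole-camera range estimate: Z = f * D / d.

    diameter_px: detected ball diameter in pixels (e.g. bounding-box width),
                 measured on an undistorted image.
    focal_px:    focal length in pixels, from camera calibration.
    """
    if diameter_px <= 0 or focal_px <= 0:
        raise ValueError("diameter_px and focal_px must be positive")
    return focal_px * ball_diameter_m / diameter_px


def range_error_per_pixel(diameter_px: float, focal_px: float,
                          ball_diameter_m: float = BALL_DIAMETER_M) -> float:
    """Approximate range error (m) caused by a 1-pixel error in diameter_px.

    Follows from |dZ/dd| = f * D / d**2, so the error grows roughly with the
    square of range: distant balls get much noisier estimates than near ones.
    """
    if diameter_px <= 0 or focal_px <= 0:
        raise ValueError("diameter_px and focal_px must be positive")
    return focal_px * ball_diameter_m / diameter_px ** 2


if __name__ == "__main__":
    f = 600.0  # hypothetical focal length in pixels
    for d in (80.0, 40.0, 10.0):
        z = range_from_apparent_size(d, f)
        err = range_error_per_pixel(d, f)
        print(f"{d:5.1f} px -> {z:.2f} m (about {err:.3f} m per pixel of error)")
```

The second function is there because it shows the catch with size-based ranging: a one-pixel detection error costs little up close but a lot at the far end of the court, which is one reason to spend more pixels near the gripper.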
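Item 2 doesn't say which pruning criterion or quantization scheme was used, so the following is only a generic illustration of the two ideas on a plain Python list of weights: magnitude pruning (zero the weights with the smallest |w|), then symmetric per-tensor int8 quantization (w ≈ q · scale, with q in [-127, 127]). All names here are hypothetical. Real toolchains also calibrate activation ranges, and zeroing individual weights does not shrink tensor shapes; getting from ~14M to ~1.2M parameters would require removing whole channels or layers and retraining, which this sketch does not show.

```python
def magnitude_prune(weights: list[float], sparsity: float) -> list[float]:
    """Zero out the `sparsity` fraction of weights with the smallest |w|."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(len(weights) * sparsity)
    # Indices sorted by magnitude; the first k of them are dropped.
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(by_magnitude[:k])
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]


def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: w ~= q * scale, q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Map int8 codes back to approximate float weights."""
    return [v * scale for v in q]


if __name__ == "__main__":
    w = [0.5, -0.01, 0.25, -1.0, 0.02, 0.75]
    pruned = magnitude_prune(w, sparsity=0.5)
    q, scale = quantize_int8(pruned)
    restored = dequantize(q, scale)
    print("pruned:  ", pruned)
    print("int8:    ", q)
    print("restored:", [round(x, 4) for x in restored])
```

Each restored weight lands within half a quantization step (scale / 2) of its pruned value; that rounding error, plus the pruned weights, is where a precision loss like the one quoted in item 2 comes from.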
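Item 3's two-stage scheme can be sketched as a small planner that chooses between a downscaled pass over the whole frame and a full-resolution crop around the gripper. The 854×480 wide-pass size (480p at 16:9), the 1280×720 crop, the 0.3 m hand-off distance, and every function name are assumptions for illustration; the post doesn't say how the hand-off is triggered or how the crop is positioned.

```python
# Assumed geometry: native 1080p sensor, 854x480 wide pass, 1280x720 crop.
SENSOR_W, SENSOR_H = 1920, 1080
WIDE_W, WIDE_H = 854, 480
CROP_W, CROP_H = 1280, 720
NEAR_RANGE_M = 0.3  # hypothetical distance at which we switch to the crop


def crop_window(cx: float, cy: float) -> tuple[int, int, int, int]:
    """1280x720 window centred on (cx, cy) in sensor pixels, shifted to stay in frame."""
    x0 = round(cx - CROP_W / 2)
    y0 = round(cy - CROP_H / 2)
    x0 = max(0, min(SENSOR_W - CROP_W, x0))
    y0 = max(0, min(SENSOR_H - CROP_H, y0))
    return x0, y0, x0 + CROP_W, y0 + CROP_H


def plan_stage(range_m: float, gripper_px: tuple[float, float]):
    """Return (stage, source box in sensor pixels, detector input size)."""
    if range_m > NEAR_RANGE_M:
        # Far away: downscale the whole frame for the wide-field pass.
        return ("wide", (0, 0, SENSOR_W, SENSOR_H), (WIDE_W, WIDE_H))
    # Close in: full-resolution crop around the gripper, no downscaling.
    return ("crop", crop_window(*gripper_px), (CROP_W, CROP_H))


def pixel_fraction(w: int, h: int) -> float:
    """Pixels the detector processes, relative to a full sensor frame."""
    return (w * h) / (SENSOR_W * SENSOR_H)


if __name__ == "__main__":
    print(plan_stage(2.0, (960, 540)))
    print(plan_stage(0.1, (1800, 1000)))
    print(f"wide: {pixel_fraction(WIDE_W, WIDE_H):.2f}, crop: {pixel_fraction(CROP_W, CROP_H):.2f}")
```

Under these assumptions the wide pass processes about 0.20 of a full 1080p frame's pixels and the crop about 0.44, and the crop keeps native resolution exactly where the ball is smallest relative to the precision the gripper needs.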

What we don't do (yet)

We don't do trajectory prediction beyond "next 0.5 seconds." We don't do player-pose estimation. We don't do multi-ball tracking across long occlusions. Each of these is on the roadmap, and each would need either a better compute module or a smarter algorithm. We're betting on smarter algorithms.

Why this matters

A $200 perception stack means we can ship a $599 product with margin. A $5,000 GPU means $20,000+ retail. Different market entirely. Constraint forces good engineering. We have not yet seen anyone match this perception performance at this price.

— Javad