04 — Computer Vision · Q3 2026, 2 slots

Vision
that runs at the edge.

Detection, tracking, OCR, 3D reconstruction. On-device inference at 30+ FPS, models we trained on your data, deployed where the camera lives — not phoned to a cloud GPU.

Fixed quote in 24h · See pricing · 9 vision systems · 4.9★ avg
Stacks: 5 stacks · PyTorch · ONNX · TensorRT · OpenCV
Timeline: 12 wk · Data → trained model → on-device
Investment: $28K–$60K · Build tier · fixed, no surprises
Output: Live at the edge + training pipeline, eval set, drift alerts
01 — What We Build

Vision that
runs where the camera is.

From labeling to deployment, end-to-end. Edge-first when latency or privacy demands it, cloud when scale does. We benchmark on your hardware before promising a number.

// 01 · DETECTION

Detection,
at line speed.

YOLOv8 and RT-DETR for the speed/accuracy frontier, Detectron2 when you need every last point of mAP. Quantized to INT8 / FP16 for the device that's actually in the field.

YOLOv8 · Detectron2 · RT-DETR · ONNX
// 02 · SEGMENTATION

Pixel-perfect
where it matters.

SAM 2 for promptable segmentation, Mask2Former for trained-from-scratch tasks, custom decoders for medical and industrial use cases.

SAM 2 · Mask2Former · Detectron2 · MMSeg
// 03 · TRACKING & POSE

Tracking.
Pose. Identity.

ByteTrack and DeepSORT for multi-object tracking, MMPose for body and object orientation, identity re-ID for retail and security workflows.

ByteTrack · DeepSORT · MMPose · OpenCV
// 04 · OCR & DOCUMENTS

Documents,
structured.

PaddleOCR for general extraction, LayoutLM for structured documents, custom heads for invoice / receipt / form pipelines. Multi-language out of the box.

PaddleOCR · LayoutLM · Tesseract · Donut
// 05 · EDGE DEPLOYMENT

Edge, honestly.
Not cloud-in-disguise.

CoreML on iOS, NNAPI / LiteRT on Android, TensorRT on NVIDIA, ONNX Runtime on Windows/Linux. Quantization, pruning, fusion — all measured on your target device.

CoreML · TensorRT · ONNX Runtime · NNAPI
// 06 · LABELING & MLOPS

Labeling included.
Drift, monitored.

Label Studio pipeline, active learning to pick the next 500 images, training runs on W&B, drift detector in production. The retraining pipeline transfers with the project.

Label Studio · W&B · Modal · ClearML
02 — How We Ship

Twelve weeks.
Four stages.

Vision systems have their own rhythm — data first, model second, hardware reality always. Our process is built around real-world deployment, not demo videos in a lit room.

WK 0 · BRIEF
WK 1–3 · BASELINE
WK 4–10 · TUNE
WK 11–12 · DEPLOY
01
WK 0 · BRIEF & LABELING · 3–7 days

Schema &
sample set.

Define the task graph (detect / segment / track / OCR), the label schema, and the success metric. Labeling pipeline stood up, first 500 labels collected with active learning.

Label Studio + schema + fixed quote
02
WK 1–3 · BASELINE MODEL · 3 weeks

Trained on
your data.

Baseline model (YOLOv8n or RT-DETR-S), trained on the initial label set, benchmarked on your target hardware. Frank report: what's achievable, what isn't, what we need more of.

Baseline mAP + device benchmark
03
WK 4–10 · TUNE & OPTIMIZE · 7 weeks

Train, prune,
quantize.

Iterative labeling on hard cases, architecture tuning, quantization to INT8 / FP16, latency optimization. Weekly model release with measured metrics on real devices.

Weekly releases + metric reports
04
WK 11–12 · EDGE DEPLOY · 2 weeks

Live on
the device.

CoreML / NNAPI / TensorRT packaging, drift monitor wired, retraining pipeline handed over. Your team can run the next training cycle without us.

Live at the edge + retraining pipeline
03 — Selected Vision Work

Models that
left the lab.

3 of 22 · curated for vision

Production CV deployments — trained on the customer's data, shipped to the customer's device.

Featured · Table Monitoring · 13 / 22 · Seatwatch, real-time restaurant table monitoring (CV)
// Case 13 · 2025 · Real-time CV

Seatwatch
the floor, watched.

A real-time CV system that reads restaurant table occupancy from a single overhead camera — no sensors. YOLO11 + ByteTrack detect and track guests; a per-table state machine flags groups waiting too long, all at 30 FPS on the feed.
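
A per-table state machine of the kind described above might look like the following minimal sketch. The class name, the state names, and the 10-minute `wait_limit_s` threshold are hypothetical choices for illustration; the case study does not publish its actual states or limits.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TableState(Enum):
    EMPTY = "empty"
    OCCUPIED = "occupied"
    WAITING_TOO_LONG = "waiting_too_long"


@dataclass
class TableMonitor:
    """Hypothetical per-table state machine; names and threshold are assumptions."""
    wait_limit_s: float = 600.0          # assumed limit: flag after 10 minutes
    state: TableState = TableState.EMPTY
    seated_at: Optional[float] = None    # time guests were first seen at the table

    def update(self, guest_count: int, now_s: float) -> TableState:
        """Advance the state from one frame's tracked guest count."""
        if guest_count == 0:
            # Table is clear: reset the timer.
            self.state = TableState.EMPTY
            self.seated_at = None
            return self.state
        if self.seated_at is None:
            # First frame with guests: start the clock.
            self.seated_at = now_s
        waited = now_s - self.seated_at
        self.state = (
            TableState.WAITING_TOO_LONG
            if waited > self.wait_limit_s
            else TableState.OCCUPIED
        )
        return self.state
```

In this sketch the guest count per table would come from the tracker's output each frame; keeping the timer inside the table object means one camera can drive any number of tables independently.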

7 tables, 1 cam · 0 sensors · 30 FPS, real-time
YOLO11 · ByteTrack · OpenCV · Python
Pose · Rep Counting · 06 / 22 · RepIQ, AI gym rep counter & form coach (pose CV)
// Case 06 · 2025 · Pose CV

RepIQ
your camera, your coach.

An AI workout partner that counts reps and checks form from any webcam — no wearables. MediaPipe Pose reads 33 landmarks; per-exercise joint-angle state machines count only full-range reps — 98% counting accuracy across six exercises.
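
To make the joint-angle idea concrete, here is a minimal sketch of counting full-range reps from three 2D landmarks (for example shoulder, elbow, wrist). The thresholds of 60° and 160° and the `RepCounter` class are assumptions for illustration, not the values used in RepIQ.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Angle in degrees at vertex b, formed by the segments b-a and b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0.0:
        return 0.0  # degenerate landmarks: no meaningful angle
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos))


class RepCounter:
    """Counts a rep only after the joint passes both thresholds (full range)."""

    def __init__(self, flexed_deg: float = 60.0, extended_deg: float = 160.0):
        self.flexed_deg = flexed_deg      # assumed "bottom of the rep" angle
        self.extended_deg = extended_deg  # assumed "top of the rep" angle
        self.phase = "extended"
        self.reps = 0

    def update(self, angle_deg: float) -> int:
        """Feed one frame's joint angle; returns the running rep count."""
        if self.phase == "extended" and angle_deg <= self.flexed_deg:
            self.phase = "flexed"
        elif self.phase == "flexed" and angle_deg >= self.extended_deg:
            self.phase = "extended"
            self.reps += 1
        return self.reps
```

Because a rep is only counted on the flexed-to-extended transition, a half rep that never reaches the flexed threshold leaves the count unchanged.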

6 exercises · 98% accuracy · 0 wearables
MediaPipe · OpenCV · Python
Parking Management · 20 / 22 · Parkr, real-time parking-lot management (CV)
// Case 20 · 2025 · Real-time CV

Parkr
every bay, accounted for.

Parking-lot occupancy read from the cameras already on the poles — no ground sensors. YOLO11 + ByteTrack map vehicles to polygon bay zones; a state machine flags overstays and surfaces free spots in under a second.
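
Mapping a tracked vehicle to a polygon bay zone can be sketched with a standard ray-casting point-in-polygon test. Using the bottom-center of the bounding box as the anchor point is an assumption made here for illustration, as are the function names and bay ids; the case study does not say which point Parkr uses.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Polygon = List[Point]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def point_in_polygon(p: Point, poly: Polygon) -> bool:
    """Ray casting: toggle on each polygon edge crossed by a ray going right from p."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y level
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_bay(box: Box, bays: Dict[str, Polygon]) -> Optional[str]:
    """Return the id of the first bay containing the box's bottom-center, if any."""
    x1, _, x2, y2 = box
    anchor = ((x1 + x2) / 2.0, y2)  # assumed anchor: where the vehicle meets the ground
    for bay_id, poly in bays.items():
        if point_in_polygon(anchor, poly):
            return bay_id
    return None
```

Polygons rather than rectangles matter because bays seen from a pole-mounted camera are skewed by perspective; the same test works for any simple polygon.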

16 bays, 1 cam · 0 sensors · <1s spot → screen
YOLO11 · ByteTrack · OpenCV · Python
04 — Engagement

Three
ways to work.

Fixed-price sprints, full builds, or ongoing programs. We'll tell you which fits in the scoping call — and if none fit, who else to talk to.

// 01 · Sprint
14 days

Sprint
tier.

A fixed two-week burst. Best for a baseline model on initial data, a device-benchmark report, or a focused detection prototype.

From $6k · Fixed price · 2 wks
1 ML engineer + labeling · 20% reserved
  • Baseline model on 500 labels
  • Device-benchmark report
  • Hardest-case audit doc
  • One demo on target hardware
// 03 · Program
Monthly

Program
tier.

Embedded team for retraining cycles, drift response, and new tasks. Monthly engagements, roadmap on-call.

From $9k / mo · Monthly · roll-off any time
2+ ML engineers + ops · dedicated
  • Embedded team in your stack
  • Weekly metric + drift reviews
  • Retraining cycles on-call
  • Roadmap planning included
  • New tasks (segmentation, OCR)
05 — Questions, Answered

Before
you write back.

Missing a question? Ask in the brief and we'll answer in the reply.

Q·01 · Engineering · ★ Most asked

Edge or cloud inference — which fits us?

Edge if the latency budget is tight (< 100 ms), privacy is non-negotiable, or connectivity is unreliable. Cloud if you need the largest models, batch processing, or model swaps in production.

Most production CV systems are hybrid: a small, fast detector on-device that wakes up a heavier cloud model for high-confidence cases. We architect this split in week 1, with measured latency budgets on your actual hardware.

Q·02 · Process · Included

We don't have a labeled dataset. Can you help?

Yes. Labeling pipeline is part of the engagement — we set up Label Studio (or your tool of choice), write the schema with you, and bring in our labeling partners for the bulk pass. Active learning picks the next 500 images that actually move the needle, not random ones.

Most clients start with 500–2000 labels and reach a usable model within 6 weeks. The full eval set typically settles at 3–10k labels.
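
One common way to pick images "that actually move the needle" is uncertainty sampling: label the images the current model is least sure about. The sketch below is one such heuristic under stated assumptions. The scoring function, the function names, and the input format (image id mapped to detection confidences) are hypothetical, and a real pipeline may use a different acquisition strategy.

```python
from typing import Dict, List


def uncertainty(scores: List[float]) -> float:
    """1.0 when some detection sits at confidence 0.5, 0.0 when all are at 0 or 1."""
    if not scores:
        return 0.0  # assumption: images with no detections rank last
    return max(1.0 - abs(2.0 * s - 1.0) for s in scores)


def pick_next_batch(confidences: Dict[str, List[float]], k: int = 500) -> List[str]:
    """Return the k image ids with the most ambiguous detections."""
    ranked = sorted(
        confidences,
        key=lambda image_id: (-uncertainty(confidences[image_id]), image_id),
    )
    return ranked[:k]
```

Ties are broken by image id so the selection is reproducible from run to run.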

Q·03 · Operations

Who owns the trained model weights and the training data?

You do — entirely. Weights, training data, eval set, labeling instructions — all your IP, all transferred at the end of the engagement.

We use your cloud accounts for training compute. No vendor lock-in, no "call us to retrain." Your data scientists can fork the pipeline on day one of handoff.

Q·04 · Engineering

Real-time at 30 FPS — feasible on our hardware?

Usually yes, but we benchmark before promising. In week 1 we deploy a baseline (YOLOv8n or RT-DETR-S) to your target device and measure FPS, memory, and battery. That number drives the model-architecture decision.

If 30 FPS isn't achievable with acceptable accuracy, we surface the trade-off cleanly: smaller model, lower resolution, frame skipping, or hardware upgrade. No surprises in week 8.
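
The arithmetic behind that trade-off is short enough to sketch. At 30 FPS each frame has a budget of roughly 33 ms; if measured inference takes longer, frame skipping processes every Nth frame instead. The function names are hypothetical, and the figures in the note below are made up to show the calculation, not measurements.

```python
import math


def frame_budget_ms(target_fps: float) -> float:
    """Time available per frame, in milliseconds, at the target frame rate."""
    return 1000.0 / target_fps


def frame_stride(camera_fps: float, inference_ms: float) -> int:
    """Smallest N such that processing every Nth frame keeps up with the camera."""
    return max(1, math.ceil(inference_ms * camera_fps / 1000.0))


def effective_fps(camera_fps: float, stride: int) -> float:
    """Frame rate actually seen by the model when every Nth frame is processed."""
    return camera_fps / stride
```

For example, with a 30 FPS camera and a model measured at 50 ms per frame, `frame_stride(30, 50)` is 2, so the model effectively runs at 15 FPS. Whether that is acceptable depends on how fast objects move in the scene.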

Q·05 · Engineering

Detection vs segmentation vs OCR — what's right for us?

Detection if you need what + where (bounding boxes). Segmentation if you need shape (defect outlines, medical imaging). OCR for text. Pose for body/object orientation.

Most production systems combine two — e.g. detect a panel, then segment the defect inside it. We'll write the task graph in week 0 with target metrics per stage.

Q·06 · Operations

How do you handle model drift in production — retraining cadence?

Three things:
  • Drift monitor: production embeddings are continuously compared to the training distribution, with an alert on divergence.
  • Confidence floor: predictions below the threshold are routed to human review and added to the next training batch.
  • Retraining cadence: quarterly by default, or triggered when the drift threshold is hit.

We hand over the retraining pipeline so you can run it without us.
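
The drift monitor and confidence floor can be sketched in a few lines. The example below compares the centroid of recent production embeddings with the training centroid using cosine distance; that metric, the 0.15 and 0.5 thresholds, and all function names are assumptions for illustration, since the comparison method is not specified here.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Element-wise mean of a non-empty set of equal-length embeddings."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine_distance(u: Vector, v: Vector) -> float:
    """1 - cosine similarity; 0.0 means same direction, 1.0 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def drift_alert(train: Sequence[Vector], prod: Sequence[Vector],
                threshold: float = 0.15) -> bool:
    """True when production embeddings have moved away from the training set."""
    return cosine_distance(centroid(train), centroid(prod)) > threshold


def needs_review(confidence: float, floor: float = 0.5) -> bool:
    """Confidence floor: predictions below it go to human review."""
    return confidence < floor
```

A centroid comparison is cheap enough to run on a schedule, but it only catches shifts in the average embedding; a production monitor would likely compare full distributions.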

Q·07 · Engineering

Can processing stay fully on-device, for privacy or air-gap?

Yes. We deploy via CoreML on iOS, NNAPI / LiteRT on Android, ONNX Runtime on Windows/Linux edge, TensorRT on NVIDIA. Quantization (INT8/FP16) is standard so you get 2–4× speedup with < 1% accuracy loss.
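
For readers new to quantization, the sketch below shows the core arithmetic of symmetric per-tensor INT8 quantization: one scale maps floats onto the range [-127, 127]. It illustrates the idea only. Production toolchains typically add calibration data and finer-grained scales, and the function names here are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using a single symmetric scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the quantized integers."""
    return [v * scale for v in q]
```

Round-to-nearest keeps the per-weight error within half a quantization step (`scale / 2`), which is why accuracy loss is often small when weight ranges are well behaved. The actual loss still has to be measured on the eval set.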

Nothing leaves the device. For audit, we log inference metadata locally and sync only the metadata when connectivity returns.

Q·08 · Pricing

We have a model already — just need help deploying?

Sprint tier — $6k, two weeks. We benchmark your model on target hardware, optimize (quantize, prune, fuse), and ship a production deployment with monitoring. Most clients see 2–5× speedup with the same accuracy.

06 — By the numbers

Four years of vision.
Edge receipts.

Real numbers from deployed vision systems — pulled from Weights & Biases, Sentry, and on-device telemetry. Updated quarterly.

// 01
9+
Vision systems live
↑ 1 this quarter
// 02
0.92
Avg mAP@0.5
Stable · 7 models
// 03
34 FPS
On-device · A12+
↑ 6 vs Q1
// 04
12 wk
Data → deployed
Labeled + trained + shipped
Source · W&B · Sentry · Edge telemetry
Last updated Q2 2026 · refreshed quarterly
07 — Said About The Work

One quote.
From the right field-ops director.

There's a wall of testimonials on the home page. This is the one that matters for vision — a utility-scale solar operator that replaced clipboards with on-device defect detection at 34 FPS.

★★★★★

Our inspectors carry the same iPhones we issued in 2022. BytesGenX trained a panel-defect detector that runs on those exact phones at 34 FPS, with mAP@0.5 of 0.94 — and zero cloud round-trips.

The first crew that used it found fourteen defects on a site we'd cleared two weeks earlier. That single inspection paid for the project.

CoreML · YOLOv8 · Build tier · 14 weeks · Deployed · Q3 2025
Also said about the work · +5 more on the home page
★★★★★

"Labeling, training, deployment — end to end. We didn't have to hire a single ML engineer."

Priya Iyer · CTO, Shelfwise
★★★★★

"Caught 96% of defects our humans were missing. At line speed. The math wrote itself."

Heinrich Volk · QA Director, Forge Auto
★★★★★

"Real-time was the hard part. They made it boring. Boring is what production needs."

04 · Computer Vision · Q3 2026 · 2 slots

Tell us about
your vision system.

Whether it's a baseline-on-your-data sprint or a multi-quarter edge deployment, we reply within 4 hours — usually with a fixed quote, a device-benchmark plan, and a label budget.

Response time: ~4h on weekdays
Min. engagement: 2-week sprint
Slots, Q3 2026: 2 of 4 · CV
Studio location: Remote · 4 timezones

Encrypted · We never share your brief