Data Overview
This is a practical summary of what you actually receive in a PoC pack: scenes, metadata, QC signals, and delivery structure. Start small, validate fit, then scale.
Goal: reduce integration friction. You should be able to run your pipeline end-to-end within days, not weeks.
What you get in a PoC pack
- MP4 clips (short segments suitable for fast iteration)
- JSON metadata per clip (scene type + operational notes)
- High-level QC summary (accept/reject signals and reasons)
- Coverage notes (what is included, what is excluded)
- Non-binding delivery structure preview for validation
We keep the format simple: video + metadata. Labels can be added later if your team needs labeling or evaluation tasks.
Dataset snapshot
- 120+ hours of raw footage
- 18,000+ video segments
- 4 cities
- 3 countries
Coverage is continuously expanding as new collection cycles are completed.
Coverage at a glance
Our core focus is high-entropy traffic environments where edge cases are part of everyday driving: mixed agents, dense interactions, and non-lane-based flow.
Representative scene categories
- Unstructured urban flow (mixed agents, non-lane-based)
- Dense intersections and merges
- Near-field pedestrian interactions
- Motorcycle-heavy corridors
- Night, rain, glare, occlusion-heavy moments
Need a specific city pattern or traffic behavior? Ask for a Coverage Snapshot and we will reply with what is feasible and how fast.
Dataset schema overview
Each delivery includes structured video segments and consistent metadata so your team can filter, audit, and integrate quickly. The schema is intentionally simple for PoC speed while remaining stable enough for larger training pipelines.
Core fields
- segment_id — stable ID used for joins and traceability
- scene_type — high-level environment category
- short_description — short natural language description
- capture_context — optional context such as time_of_day or platform
- qc_status — basic QC result for the segment
- qc_notes — brief QC reviewer notes
Example segment metadata (illustrative)
{
  "segment_id": "IN-BLR-2026-02-000123",
  "scene_type": "unstructured_urban_flow",
  "short_description": "Dense mixed traffic with frequent cut-ins and near-field pedestrians.",
  "capture_context": {
    "time_of_day": "day",
    "weather": "clear",
    "platform": "two_wheeler"
  },
  "qc_status": "pass",
  "qc_notes": "Stable exposure, minimal obstruction, continuous forward view."
}
This example illustrates the structure. Final schema definitions and allowed values are shared during the PoC stage so your pipeline can validate exact keys.
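Until the final schema arrives, a lightweight key check can catch parsing problems early. The sketch below is one possible approach, not the official validator: it assumes the core fields listed above are required strings and that capture_context is an optional object. The names REQUIRED_KEYS, OPTIONAL_KEYS, and validate_segment are made up for this example.

```python
import json
from typing import List

# Assumption: these keys mirror the "Core fields" list in this overview.
# The real required set and allowed values are shared during the PoC stage.
REQUIRED_KEYS = ("segment_id", "scene_type", "short_description", "qc_status", "qc_notes")
OPTIONAL_KEYS = ("capture_context",)


def validate_segment(record: dict) -> List[str]:
    """Return a list of problems; an empty list means the record passed these checks."""
    problems = []
    for key in REQUIRED_KEYS:
        if not isinstance(record.get(key), str):
            problems.append(f"missing or non-string required field: {key}")
    context = record.get("capture_context")
    if context is not None and not isinstance(context, dict):
        problems.append("capture_context should be an object when present")
    unknown = sorted(set(record) - set(REQUIRED_KEYS) - set(OPTIONAL_KEYS))
    if unknown:
        problems.append("unexpected keys: " + ", ".join(unknown))
    return problems


# The illustrative record from this page.
example = json.loads("""
{
  "segment_id": "IN-BLR-2026-02-000123",
  "scene_type": "unstructured_urban_flow",
  "short_description": "Dense mixed traffic with frequent cut-ins and near-field pedestrians.",
  "capture_context": {"time_of_day": "day", "weather": "clear", "platform": "two_wheeler"},
  "qc_status": "pass",
  "qc_notes": "Stable exposure, minimal obstruction, continuous forward view."
}
""")

print(validate_segment(example))  # prints [] for the illustrative record
```

Returning a list of problems rather than raising on the first one makes it easier to audit a whole pack in a single pass.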
Quality control summary
Quality control focuses on technical usability rather than visual perfection. Each segment is checked for video integrity, basic visibility, continuity, and metadata completeness to ensure the data can be reliably ingested into training pipelines.
QC checks (overview)
- Video integrity — corruption, missing frames, encoding errors
- Exposure and visibility — severe blur or unusable lighting
- Obstruction — camera blocked or heavy handling artifacts
- Segment continuity — stable forward motion within the clip
- Metadata completeness — required fields present
For PoC deliveries, QC remains lightweight and transparent. For larger production programs, QC can be extended with additional checks and sampling.
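To get a quick view of QC results across a pack, you can tally qc_status values and collect the notes for anything not marked as passing. This is a minimal sketch under assumptions: only the value "pass" appears in the illustrative example, so the "fail" value, the second record, and the summarize_qc helper are invented here for demonstration.

```python
from collections import Counter


def summarize_qc(records):
    """Count segments per qc_status and collect notes for segments not marked 'pass'."""
    counts = Counter(r.get("qc_status", "unknown") for r in records)
    flagged = {
        r.get("segment_id", "<no id>"): r.get("qc_notes", "")
        for r in records
        if r.get("qc_status") != "pass"
    }
    return counts, flagged


records = [
    {
        "segment_id": "IN-BLR-2026-02-000123",
        "qc_status": "pass",
        "qc_notes": "Stable exposure, minimal obstruction, continuous forward view.",
    },
    # Hypothetical record: the ID, the "fail" value, and the note are made up.
    {
        "segment_id": "EXAMPLE-0002",
        "qc_status": "fail",
        "qc_notes": "Camera blocked for most of the clip.",
    },
]

counts, flagged = summarize_qc(records)
print(dict(counts))
print(flagged)
```

Check the allowed qc_status values against the schema shared during the PoC before relying on the "pass" comparison.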
How to evaluate with a PoC
Start with a small PoC dataset, validate integration with your pipeline, then expand coverage once the data proves useful for your model training.
Recommended evaluation steps
- Run your ingestion pipeline (MP4 + JSON)
- Verify metadata parsing and schema compatibility
- Filter by scene_type and inspect environment diversity
- Run a small training or evaluation experiment
- Decide next data scale and scene coverage targets
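The first three steps can be sketched in a few lines of Python. This example assumes each MP4 sits next to a JSON file with the same base name; the actual delivery layout may differ, so treat the pairing rule, the helper names, and the "dense_intersection" scene value as placeholders. The demo writes empty placeholder files to a temporary directory rather than real video.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def load_pairs(root):
    """Yield (mp4_path, metadata) for every clip that has a same-named JSON file."""
    for mp4 in sorted(Path(root).rglob("*.mp4")):
        sidecar = mp4.with_suffix(".json")
        if not sidecar.is_file():
            print(f"skipping {mp4.name}: no metadata file found")
            continue
        with sidecar.open(encoding="utf-8") as fh:
            yield mp4, json.load(fh)


def scene_counts(pairs):
    """Count clips per scene_type to get a rough view of environment diversity."""
    return Counter(meta.get("scene_type", "unknown") for _, meta in pairs)


def filter_by_scene(pairs, scene_type):
    """Keep only the clips whose metadata matches the requested scene_type."""
    return [(path, meta) for path, meta in pairs if meta.get("scene_type") == scene_type]


# Demo with placeholder files (not real video) in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    demo = [("clip_a", "unstructured_urban_flow"), ("clip_b", "dense_intersection")]
    for name, scene in demo:
        (tmp_path / f"{name}.mp4").touch()
        meta = {"segment_id": name, "scene_type": scene}
        (tmp_path / f"{name}.json").write_text(json.dumps(meta), encoding="utf-8")
    (tmp_path / "clip_c.mp4").touch()  # deliberately left without metadata
    pairs = list(load_pairs(tmp_path))

counts = scene_counts(pairs)
urban = filter_by_scene(pairs, "unstructured_urban_flow")
print(dict(counts))
print([path.name for path, _ in urban])
```

Clips without metadata are skipped and reported rather than treated as errors, which keeps a first ingestion run from stopping on a single missing file.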
Share your target environment, licensing constraints, and evaluation workflow, and we will respond with a practical data plan.
FAQ
Can you provide labels?
Yes, but a PoC usually starts with video + metadata. After you validate value, we can add labeling or evaluation tasks based on your needs.
Do you support custom scene requests?
Yes. Send a short requirement list and we will reply with feasibility, timeline, and minimum order assumptions.
What about legal and privacy?
See the Legal and Transparency page for our approach and PoC usage framing.
Ready to validate in your pipeline?
Tell us your target environment and failure cases. We will recommend a PoC tier and a coverage plan.