Data Overview

This page summarizes what you receive in a proof-of-concept (PoC) pack: scenes, metadata, QC signals, and delivery structure. Start small, validate fit, then scale.

Goal: reduce integration friction. You should be able to run your pipeline end-to-end within days, not weeks.

What you get in a PoC pack

  • MP4 clips (short segments suitable for fast iteration)
  • JSON metadata per clip (scene type + operational notes)
  • High-level QC summary (accept/reject signals and reasons)
  • Coverage notes (what is included, what is excluded)
  • Non-binding delivery structure preview for validation

We keep the format simple: video + metadata. You can add labels later if your team wants labeling or evaluation tasks.

Dataset snapshot

  • 120+ hours of raw footage
  • 18,000+ video segments
  • 4 cities
  • 3 countries

Coverage is continuously expanding as new collection cycles are completed.

Coverage at a glance

Our core focus is high-entropy traffic environments where edge cases are part of everyday driving: mixed agents, dense interactions, and non-lane-based flow.

Representative scene categories

  • Unstructured urban flow (mixed agents, non-lane-based)
  • Dense intersections and merges
  • Near-field pedestrian interactions
  • Motorcycle-heavy corridors
  • Night, rain, glare, occlusion-heavy moments

Need a specific city pattern or traffic behavior? Ask for a Coverage Snapshot and we will reply with what is feasible and how fast.

Dataset schema overview

Each delivery includes structured video segments and consistent metadata so your team can filter, audit, and integrate quickly. The schema is intentionally simple for PoC speed while remaining stable enough for larger training pipelines.

Core fields

  • segment_id — stable ID used for joins and traceability
  • scene_type — high-level environment category
  • short_description — short natural language description
  • capture_context — optional context such as time_of_day or platform
  • qc_status — basic QC result for the segment
  • qc_notes — brief QC reviewer notes

Example segment metadata (illustrative)

{
  "segment_id": "IN-BLR-2026-02-000123",
  "scene_type": "unstructured_urban_flow",
  "short_description": "Dense mixed traffic with frequent cut-ins and near-field pedestrians.",
  "capture_context": {
    "time_of_day": "day",
    "weather": "clear",
    "platform": "two_wheeler"
  },
  "qc_status": "pass",
  "qc_notes": "Stable exposure, minimal obstruction, continuous forward view."
}

This example illustrates the structure. Final schema definitions and allowed values are shared during the PoC stage so your pipeline can validate exact keys.
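To make the schema concrete, here is a minimal Python sketch of a metadata check that could run before ingestion. It assumes that every core field except capture_context is required and is a string; that is our reading of the list above, not the final schema, and the function name validate_segment is hypothetical. Adjust the keys once the exact definitions are shared.

```python
import json

# Assumed required keys, taken from the "Core fields" list.
# The final schema shared during the PoC may differ.
REQUIRED_KEYS = ("segment_id", "scene_type", "short_description", "qc_status", "qc_notes")


def validate_segment(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record looks usable."""
    problems = []
    for key in REQUIRED_KEYS:
        if not isinstance(record.get(key), str):
            problems.append(f"missing or non-string field: {key}")
    # capture_context is optional, but should be an object when present.
    context = record.get("capture_context")
    if context is not None and not isinstance(context, dict):
        problems.append("capture_context should be an object when present")
    return problems


# The illustrative record from this page.
example = json.loads("""
{
  "segment_id": "IN-BLR-2026-02-000123",
  "scene_type": "unstructured_urban_flow",
  "short_description": "Dense mixed traffic with frequent cut-ins and near-field pedestrians.",
  "capture_context": {"time_of_day": "day", "weather": "clear", "platform": "two_wheeler"},
  "qc_status": "pass",
  "qc_notes": "Stable exposure, minimal obstruction, continuous forward view."
}
""")

print(validate_segment(example))
```

Returning a list of problems rather than raising on the first one makes it easier to audit a whole delivery in a single pass.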

Quality control summary

Quality control focuses on technical usability rather than visual perfection. Each segment is checked for video integrity, basic visibility, continuity, and metadata completeness to ensure the data can be reliably ingested into training pipelines.

QC checks (overview)

  • Video integrity — corruption, missing frames, encoding errors
  • Exposure and visibility — severe blur or unusable lighting
  • Obstruction — camera blocked or heavy handling artifacts
  • Segment continuity — stable forward motion within the clip
  • Metadata completeness — required fields present

For PoC deliveries, QC remains lightweight and transparent. For larger production programs, QC can be extended with additional checks and sampling.
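If you want to rebuild a QC summary on your side from the per-segment metadata, a short Python sketch like the one below may help. It relies only on the qc_status and qc_notes fields. The sample records, their IDs, and the "reject" value are made up for illustration; only "pass" appears in the example on this page, so check the allowed values shared during the PoC.

```python
from collections import Counter


def summarize_qc(records):
    """Count segments per qc_status and collect notes for anything that did not pass."""
    counts = Counter(r.get("qc_status", "missing") for r in records)
    flagged = [
        (r.get("segment_id", "unknown"), r.get("qc_notes", ""))
        for r in records
        if r.get("qc_status") != "pass"
    ]
    return counts, flagged


# Hypothetical records for illustration; IDs and the "reject" value are made up.
records = [
    {"segment_id": "demo-001", "qc_status": "pass", "qc_notes": "Stable exposure."},
    {"segment_id": "demo-002", "qc_status": "reject", "qc_notes": "Camera blocked for most of the clip."},
    {"segment_id": "demo-003", "qc_status": "pass", "qc_notes": "Continuous forward view."},
]

counts, flagged = summarize_qc(records)
print(dict(counts))
for segment_id, note in flagged:
    print(f"{segment_id}: {note}")
```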

How to evaluate with a PoC

Start with a small PoC dataset, validate integration with your pipeline, then expand coverage once the data proves useful for your model training.

Recommended evaluation steps

  1. Run your ingestion pipeline (MP4 + JSON)
  2. Verify metadata parsing and schema compatibility
  3. Filter by scene_type and inspect environment diversity
  4. Run a small training or evaluation experiment
  5. Decide next data scale and scene coverage targets
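Steps 1 to 3 can be sketched in a few lines of Python. The sketch assumes each clip ships with a sidecar JSON file of the same name (for example clip.mp4 next to clip.json). That layout is an assumption for illustration, not the confirmed delivery structure, and the function names are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def load_segments(root: Path) -> list[tuple[Path, dict]]:
    """Pair every MP4 under root with its sidecar JSON (assumed layout: same file stem)."""
    pairs = []
    for video in sorted(root.rglob("*.mp4")):
        sidecar = video.with_suffix(".json")
        if not sidecar.is_file():
            # Surface gaps instead of failing, so one missing file does not block a run.
            print(f"skipping {video.name}: no metadata file found")
            continue
        with sidecar.open(encoding="utf-8") as handle:
            pairs.append((video, json.load(handle)))
    return pairs


def filter_by_scene(pairs, scene_type: str):
    """Keep only the segments whose metadata matches the requested scene_type."""
    return [(video, meta) for video, meta in pairs if meta.get("scene_type") == scene_type]


def scene_counts(pairs) -> Counter:
    """Count segments per scene_type as a quick look at environment diversity."""
    return Counter(meta.get("scene_type", "unknown") for _, meta in pairs)
```

A typical run would call load_segments(Path("path/to/poc_pack")), print scene_counts(...) to see how the pack is distributed, and then pass a filtered subset to your own training or evaluation code.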

Once you share your PoC findings, we will respond with a practical data plan based on your target environment, licensing constraints, and evaluation workflow.

FAQ

Can you provide labels?

Yes, but a PoC usually starts with video + metadata only. After you validate value, we can add labeling or evaluation tasks based on your needs.

Do you support custom scene requests?

Yes. Send a short requirement list and we will reply with feasibility, timeline, and minimum order assumptions.

What about legal and privacy?

See the Legal and Transparency page for our approach and PoC usage framing.


Ready to validate in your pipeline?

Tell us your target environment and failure cases. We will recommend a PoC tier and a coverage plan.