Skip to main content

Data QC

How can large volumes of ROS recordings be checked automatically and consistently before training and export—against baselines like frame rate, sensor topics, and time sync?

Typical needs

  • Batch-filter data with low FPS, missing critical topics, or bad timestamps before training.
  • Different thresholds per project, plus one global “hard floor” for the whole platform.
  • After rule changes, re-run QC on historical data so conclusions match the latest rules.
  • Manual overrides when a rule is right but a sample is exceptional, with an audit trail.

The Data QC module uses QC rules + background jobs: the platform scans a structured report per ROS recording, evaluates your UI assertions, and shows pass/fail on each dataset.

Not the same as “video QC”
  • Data QC (this page): Rule-based checks on ROS trajectory / ROS recordings (MCAP, bag, db3, etc.)—numeric thresholds, required/forbidden topics—tied to preprocessing, export, and training.
  • Video QC: Multi-stream Doctor diagnostics (drops, artifacts, BRISQUE)—see Video QC.

Core concepts

QC rules

Each rule binds one QC algorithm (ROS recording inspection; MCAP reports are primary today; other containers/formats are rolling out), a scope, and several assertions.

ConceptDescription
ScopeProject rule: only datasets in that project. Global rule: all ROS recordings on the platform (including unassigned). Datasets in a project match both global and project rules; each produces its own runs.
Dataset name matchGlob on dataset.name (e.g. *arm*); blank means all names in that scope.
PriorityLower number runs first (display and scheduling); rules still judge independently when both match.
EnabledDisabled rules do not match or auto-queue; enabling can trigger backfill on history.

Assertion types

Assertions are the conditions inside a rule:

  • Numeric threshold: Comparator + threshold on a report metric (e.g. frame_rate >= 20); scope can limit to whole file or topic patterns (see UI for options).
  • Required topic: Topic must appear in the report or the run fails.
  • Forbidden topic: Topic must not appear or the run fails.

Severity: error (shown as blocking / fail in the UI) fails the run when violated; warning is informational and does not by itself fail the run—handy to observe before tightening.

Permissions

ActionAdminProject managerOther roles
View QC results / logsYesYes (within data access)Collectors often see their own data
Edit project QC rulesYesYes (in project or public project)Usually no
Edit global QC rulesYesNoNo

Only admins can create, edit, delete, or duplicate global rules; others may see them read-only.

Flow overview

When QC runs automatically

  • After preprocessing: When ROS recording preprocessing finishes, the platform enqueues QC for datasets that match enabled rules (no need to click each row). Today auto-queue is still MCAP-first; bag / db3 alignment with reports and rules is rolling out—follow release notes.
  • On rule create/change: Saving or enabling a rule can asynchronously backfill matching historical data (large queues may take time).
  • Manual: From a dataset’s QC entry points you can trigger a run (exact button names follow the UI).

Data in unsupported formats or not yet preprocessed may not enter the auto QC queue.

Reading results

  • Summary: On list or detail, see how many rules apply and pass / fail / running (exact chips follow the UI).
  • Single run: Each rule has its own run with metrics and failed assertion details.
  • System tags: Failures may add tags such as qc:failed for filtering; export blocking depends on platform policy.

Manual overrides

When a rule is generally right but one sample is special (e.g. deliberate low FPS, later fixed in post), you can override that run (pass/fail + reason). Overrides feed the effective verdict and tie into tags and export logic.

Screenshots

Rule list

QC rule list

Create / edit rule

Create or edit QC rule

QC on dataset detail

QC result on dataset detail

See also