
Operations Monitoring

How do you detect and resolve issues promptly while the system is running?

Typical scenarios:

  • A data export task fails and you need to find the reason
  • System response slows down and you need to check resource usage
  • A user reports an issue and you need to check operation logs to locate the cause
  • You need to understand the system's overall operating status and health

The operations monitoring module is designed to solve these problems. Through real-time monitoring, log querying, task management, and related functions, it helps administrators understand how the system is running and quickly locate and resolve issues.

Monitoring Overview

How to View Overall System Status?

The monitoring overview page displays the overall operating status of the system.

Key Metrics:

  • Database Latency: Database response time, reflecting database performance
  • Redis Latency: Redis response time, reflecting cache performance
  • Queue Backlog: Number of tasks waiting to be processed, reflecting system load
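
As an illustration of how these three metrics might be checked together, the following minimal sketch flags any metric that exceeds a warning threshold. The `Snapshot` type, the field names, and the threshold values are all assumptions made for this example; the platform does not document specific limits, so tune them to your own deployment.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Snapshot:
    """One reading of the key metrics (field names are hypothetical)."""
    db_latency_ms: float
    redis_latency_ms: float
    queue_backlog: int


# Hypothetical warning thresholds, not values defined by the platform.
THRESHOLDS = {
    "db_latency_ms": 200.0,
    "redis_latency_ms": 20.0,
    "queue_backlog": 100,
}


def abnormal_metrics(snapshot: Snapshot) -> List[str]:
    """Return the names of metrics whose value exceeds its threshold."""
    return [
        name
        for name, limit in THRESHOLDS.items()
        if getattr(snapshot, name) > limit
    ]
```

For example, `abnormal_metrics(Snapshot(350.0, 5.0, 10))` reports only the database latency, which suggests starting the investigation at the database rather than the cache or the queues.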

Time Range Selection:

  • 1 Hour: View data from the last hour, suitable for real-time monitoring
  • 24 Hours: View data from the last 24 hours, suitable for daily monitoring
  • 7 Days: View data from the last 7 days, suitable for trend analysis
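
If you export metric samples and want to reproduce the same windows offline, the three ranges map directly onto time offsets. This is a sketch only: the range keys (`"1h"`, `"24h"`, `"7d"`) and the `(timestamp, value)` sample layout are assumptions for illustration, not identifiers used by the platform.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical keys for the three ranges offered on the overview page.
TIME_RANGES = {
    "1h": timedelta(hours=1),
    "24h": timedelta(hours=24),
    "7d": timedelta(days=7),
}


def window_start(range_key: str, now: datetime) -> datetime:
    """Return the earliest timestamp included in the selected range."""
    return now - TIME_RANGES[range_key]


def samples_in_range(
    samples: List[Tuple[datetime, float]], range_key: str, now: datetime
) -> List[Tuple[datetime, float]]:
    """Keep only (timestamp, value) samples inside the selected range."""
    start = window_start(range_key, now)
    return [(ts, value) for ts, value in samples if start <= ts <= now]
```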

Metric Trends:

  • Displays changes in each metric in real time
  • Supports chart visualization
  • Helps identify anomalies and peaks
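
The charts make peaks visible by eye. If you want to flag them programmatically on exported data, one simple approach is to compare each sample against a multiple of the median. The sketch below is a generic technique under that assumption, not the method the platform uses, and the default `factor` of 3 is an arbitrary illustrative choice.

```python
import statistics
from typing import List, Sequence


def find_peaks(values: Sequence[float], factor: float = 3.0) -> List[int]:
    """Return indexes of samples larger than `factor` times the median.

    The median is used as the baseline because, unlike the mean,
    it is barely affected by the peaks being searched for.
    """
    if not values:
        return []
    baseline = statistics.median(values)
    return [i for i, v in enumerate(values) if v > factor * baseline]
```

With latency samples of `[10, 11, 9, 10, 50]` the median is 10, so only the last sample (index 4) exceeds the 30 ms cutoff.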

Immediate Collection:

  • Manually trigger data collection
  • Update latest monitoring data
  • Used for real-time problem troubleshooting

System Information

How to View Basic System Information?

System Information:

  • System version and build information
  • Uptime and startup time
  • Hostname and operating system
  • CPU cores and total memory

Service Status:

  • Database connection status
  • Redis connection status
  • Storage service status
  • Running status of each service

This information helps you understand the basic operating environment of the system.

System Logs

How to View Access Logs?

Use case: View user access records, analyze access patterns, troubleshoot access issues.

Log Information:

  • Request time
  • Request path and method
  • Response status code
  • Response time
  • User information
  • IP address

Query Functions:

  • Filter by time range
  • Search by path
  • Filter by status code
  • Filter by user
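
The query functions combine as a logical AND: a record is shown only if it passes every filter that is set. The sketch below models that behavior on exported records. The `AccessLogEntry` type and its field names are hypothetical, chosen to mirror the log information listed above, and are not a documented export schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class AccessLogEntry:
    # Illustrative fields mirroring the log information listed above.
    time: datetime
    path: str
    method: str
    status: int
    response_ms: float
    user: str
    ip: str


def filter_access_logs(
    entries: Iterable[AccessLogEntry],
    *,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    path_contains: Optional[str] = None,
    status: Optional[int] = None,
    user: Optional[str] = None,
) -> List[AccessLogEntry]:
    """Return entries that match every filter that was supplied."""
    result = []
    for entry in entries:
        if start is not None and entry.time < start:
            continue
        if end is not None and entry.time > end:
            continue
        if path_contains is not None and path_contains not in entry.path:
            continue
        if status is not None and entry.status != status:
            continue
        if user is not None and entry.user != user:
            continue
        result.append(entry)
    return result
```

Filters left as `None` are ignored, so `filter_access_logs(logs, status=500)` returns every failed request regardless of user or path.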

How to View Active Users?

Use case: Understand current online users, monitor user activity.

Information Display:

  • User name and role
  • Last active time
  • Session duration
  • Access path
  • IP address

Statistics Functions:

  • Current online user count
  • Today's active user count
  • User access statistics

How to View Login Logs?

Use case: Monitor user logins and detect abnormal login behavior.

Log Information:

  • Login time
  • User name
  • Login IP address
  • Login status (success/failure)
  • Failure reason (if login failed)

Query Functions:

  • Filter by user
  • Filter by IP address
  • Filter by time range
  • Filter by login status
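
One common sign of abnormal login behavior is repeated failures from the same IP address. The following sketch counts failed logins per IP in a list of exported login records. The dictionary keys (`"ip"`, `"status"`), the `"failure"` status value, and the default limit of 3 are assumptions for this example.

```python
from collections import Counter
from typing import Dict, Iterable, List


def suspicious_ips(
    login_events: Iterable[Dict[str, str]], max_failures: int = 3
) -> List[str]:
    """Return IPs with at least `max_failures` failed logins, sorted."""
    failures = Counter(
        event["ip"] for event in login_events if event["status"] == "failure"
    )
    return sorted(ip for ip, count in failures.items() if count >= max_failures)
```

In practice you would apply this to records already filtered by time range, so that old failures do not accumulate against an address indefinitely.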

How to View Operation Logs? (New in 3.3.0)

Use case: Audit user operations, track data changes, troubleshoot issues.

Recorded Operations:

  • Data creation, modification, deletion
  • Task creation and assignment
  • Training task creation and startup
  • Inference service deployment
  • System configuration modification
  • User management operations

Log Information:

  • Operation time
  • Operating user
  • Operation type
  • Operation object
  • Operation result
  • IP address

Query Functions:

  • Filter by user
  • Filter by operation type
  • Filter by time range
  • Filter by IP address

How to View Workflow Logs?

Use case: View workflow execution status, troubleshoot workflow issues.

Log Information:

  • Workflow execution time
  • Workflow name and ID
  • Execution status
  • Matching rules and action rules
  • Execution result

Query Functions:

  • Filter by workflow
  • Filter by time range
  • Filter by execution status

Background Tasks

How to Manage Task Queues?

Task Queue Types:

  • System Queue: Process system-level tasks (metadata synchronization, preprocessing, etc.)
  • Export Queue: Process data export tasks

Queue Management:

  • View Task Count: Number of tasks that are waiting, in progress, completed, or failed
  • Pause/Resume Queue: Temporarily pause or resume queue processing
  • Clear Waiting Queue: Clear all waiting tasks
  • Batch Retry: Batch retry all failed tasks
  • Clean History: Remove tasks that completed or failed more than 24 hours ago
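
To make the Clean History rule concrete, the sketch below applies the same criterion to an in-memory list: a task is removed only if it has completed or failed and it finished more than 24 hours ago. The task dictionary layout (`"status"`, `"finished_at"`) and the status strings are hypothetical and do not reflect the platform's internal storage.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List

FINISHED_STATES = {"completed", "failed"}


def clean_history(
    tasks: List[Dict[str, Any]],
    now: datetime,
    max_age: timedelta = timedelta(hours=24),
) -> List[Dict[str, Any]]:
    """Return the tasks that survive a history cleanup."""
    cutoff = now - max_age
    kept = []
    for task in tasks:
        finished_at = task.get("finished_at")
        is_old_finished = (
            task["status"] in FINISHED_STATES
            and finished_at is not None
            and finished_at < cutoff
        )
        if not is_old_finished:
            kept.append(task)
    return kept
```

Waiting and in-progress tasks are never affected, and recently failed tasks remain available for retry.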

Task Details:

  • Task name and type
  • Task status and progress
  • Creation time and completion time
  • Error information (if failed)
  • Task parameters and results

Task Operations:

  • View task details
  • Retry failed tasks
  • Cancel waiting tasks
  • Delete completed tasks

⚠️ Note: Pausing a queue holds back execution of new tasks until the queue is resumed, and clearing a queue deletes all waiting tasks. Use both operations with caution.

Export Records

How to View Export History?

Use case: View records of all data export tasks, understand export status.

Record Information:

  • Export time
  • Export format (HDF5, LeRobot, MCAP, etc.)
  • Export status (pending, processing, completed, failed)
  • Included datasets
  • File size and download link
  • Operator information

Query Functions:

  • Filter by export format
  • Filter by status
  • Filter by time range
  • Filter by user
  • Search specific export tasks

Operation Functions:

  • View export details
  • Download export files
  • View export progress
  • Retry failed exports

Common Questions

How to Quickly Locate System Issues?

Troubleshooting Steps:

  1. View monitoring overview to understand overall system status
  2. Check if key metrics are abnormal
  3. View system logs to locate specific issues
  4. Check background tasks to confirm task execution status
  5. Take corresponding measures based on log information

What to Do About Queue Backlog?

Handling Methods:

  1. Check the number and types of tasks in the queue
  2. Identify the cause of the backlog (too many tasks, slow processing, etc.)
  3. Take the appropriate measures:
    • Increase processing resources
    • Pause new tasks
    • Clean up unnecessary tasks
    • Optimize task processing speed
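
When deciding between these measures, a rough estimate of how long the backlog will take to drain can help. A backlog shrinks only when tasks are processed faster than they arrive, so the drain time is the backlog divided by the difference between the two rates. The function and parameter names below are illustrative, and the rates are values you would measure yourself.

```python
from typing import Optional


def estimate_drain_minutes(
    backlog: int, processed_per_min: float, arriving_per_min: float
) -> Optional[float]:
    """Estimate minutes until the queue is empty.

    Returns None when tasks arrive at least as fast as they are
    processed, because the backlog will then never drain on its own.
    """
    net_rate = processed_per_min - arriving_per_min
    if net_rate <= 0:
        return None
    return backlog / net_rate
```

For example, 120 waiting tasks with 10 processed and 4 arriving per minute drain in 120 / (10 - 4) = 20 minutes. A result of `None` indicates that waiting will not help and that one of the measures above is needed.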

How to View User Operation History?

Viewing Method:

  1. Go to "System Logs" > "Operation Logs"
  2. Filter by user
  3. View all operation records of that user
  4. Optionally narrow the results further by time range

Applicable Roles

Administrator

You can:

  • Monitor system operation status in real time
  • View and analyze system logs
  • Manage background task queues
  • Troubleshoot system failures
  • Optimize system performance
  • Conduct security audits

Operations Personnel

You can:

  • Monitor system resource usage
  • View service running status
  • Manage task queues
  • Handle system alerts
  • Maintain stable system operation

After completing operations monitoring, you may also need: