InternVideo2.5 - Action Recognition
Powered by InternVideo2.5-8B on ZeroGPU.
SOTA Performance:
- 92.1% accuracy on Kinetics-400 (+11.2% over VideoMAE)
- Open-vocabulary action detection
- Custom sports-specific actions
Capabilities:
- Action classification (50+ default actions)
- Custom action labels for sports
- Foul-related action detection
- Multi-frame temporal understanding
API Endpoints for EagleEye:
POST /call/api_classify_action - Action classification
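The endpoint follows Gradio's standard two-step call protocol: POST the inputs to /call/api_classify_action to receive an event_id, then GET /call/api_classify_action/{event_id} to stream the result. Below is a minimal sketch using requests; the hostname magboola-internvideo2-zerogpu.hf.space is the usual Hugging Face Spaces URL for this Space, and the input order (frames_base64, timestamp_s, action_candidates) and the frames/ directory are assumptions to verify against the Space's API page. Most integrations will find the gradio_client examples below simpler.

import base64
import glob
import json

import requests

BASE_URL = "https://magboola-internvideo2-zerogpu.hf.space"  # assumed Spaces hostname

# Hypothetical input: JPEG frames already extracted to a local directory.
frames = [open(path, "rb").read() for path in sorted(glob.glob("frames/*.jpg"))]
frames_b64 = [base64.b64encode(f).decode() for f in frames]
actions = ["scoring a goal", "making a tackle", "celebrating"]

# Step 1: submit the inputs; Gradio returns an event_id for the queued call.
submit = requests.post(
    f"{BASE_URL}/call/api_classify_action",
    json={"data": [json.dumps(frames_b64), 5.0, json.dumps(actions)]},
)
event_id = submit.json()["event_id"]

# Step 2: read the server-sent event stream and print the completed result.
with requests.get(f"{BASE_URL}/call/api_classify_action/{event_id}", stream=True) as resp:
    event_type = None
    for line in resp.iter_lines(decode_unicode=True):
        if not line:
            continue
        if line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:") and event_type == "complete":
            print(json.loads(line[len("data:"):].strip()))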
API Usage for EagleEye Integration
Action Classification
from gradio_client import Client
import json
import base64
client = Client("magboola/internvideo2-zerogpu")
# Prepare frames as base64-encoded JPEGs
frames_b64 = [base64.b64encode(frame_bytes).decode() for frame_bytes in frames]
# Optional: custom action candidates
custom_actions = ["scoring a goal", "making a tackle", "celebrating"]
result = client.predict(
    frames_base64=json.dumps(frames_b64),
    timestamp_s=5.0,
    action_candidates=json.dumps(custom_actions),
    api_name="/api_classify_action"
)
print(result)
# {"success": True, "action": "scoring a goal", "confidence": 0.9, ...}
Using Default Actions (Kinetics-400)
result = client.predict(
    frames_base64=json.dumps(frames_b64),
    timestamp_s=5.0,
    action_candidates="",  # Empty for default actions
    api_name="/api_classify_action"
)
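The same call can be repeated at regular timestamps to label a whole clip. A sketch, reusing the hypothetical extract_frames helper above and an arbitrary 5-second stride:

import base64
import json

def classify_clip(client, video_path, duration_s, stride_s=5.0):
    """Classify the action around each sampled timestamp using the default label set."""
    timeline = []
    t = 0.0
    while t <= duration_s:
        window = extract_frames(video_path, timestamp_s=t)
        window_b64 = [base64.b64encode(f).decode() for f in window]
        result = client.predict(
            frames_base64=json.dumps(window_b64),
            timestamp_s=t,
            action_candidates="",  # empty -> default actions
            api_name="/api_classify_action",
        )
        timeline.append((t, result))
        t += stride_s
    return timeline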
Foul Detection
The response includes an is_foul_related boolean, set when the detected action falls into foul-prone categories such as:
- tackling, wrestling, headbutting
- punching, kicking, pushing
- slapping, fighting
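Downstream, EagleEye can gate incident review on that flag. A minimal sketch, assuming the response carries the fields shown earlier (success, action, confidence, is_foul_related) and using an arbitrary 0.6 confidence threshold; the json.loads fallback covers the case where the endpoint returns a JSON string rather than a dict:

import json

def flag_possible_foul(result, min_confidence=0.6):
    """Return True when a classification looks foul-related and is confident enough."""
    if isinstance(result, str):  # some endpoints return a JSON string
        result = json.loads(result)
    if not result.get("success"):
        return False
    return bool(result.get("is_foul_related")) and result.get("confidence", 0.0) >= min_confidence

# result is the value returned by client.predict above
if flag_possible_foul(result):
    print(f"Possible foul: {result['action']} ({result['confidence']:.0%})")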