What can you do with it?
Transcribe video files to text with timestamps, speaker identification, and meeting summaries. Perfect for converting meetings, presentations, interviews, and other video content into searchable, readable transcripts. Automatically pre-processes video by extracting audio for optimal transcription quality.
How to use it?
Basic Command Structure
Parameters
Required:
- prompt: Instructions for transcription (timestamps, speakers, summary, etc.)
- files: Video file to transcribe (supports MP4, MOV, AVI, and other common formats)
Optional:
- model: Gemini model to use (defaults to gemini-2.5-flash)
- output filename: Custom name for transcript file
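The parameters above can be gathered into a small request object and checked before a job is submitted. The sketch below is illustrative only: the `TranscriptionRequest` class, its field names, and the extension check are assumptions made for this example, not the tool's actual interface. Only the parameter meanings and the `gemini-2.5-flash` default come from the list above.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional

# Extensions named in the parameter list; the tool also accepts
# "other common formats", so a miss here is only a warning (assumption).
KNOWN_VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi"}


@dataclass
class TranscriptionRequest:
    """Hypothetical container for the documented parameters."""
    prompt: str                              # required
    files: str                               # required: path to the video file
    model: str = "gemini-2.5-flash"          # documented default
    output_filename: Optional[str] = None    # optional custom transcript name

    def validate(self) -> List[str]:
        """Return a list of human-readable problems (empty if none found)."""
        problems = []
        if not self.prompt.strip():
            problems.append("prompt is required")
        if not self.files.strip():
            problems.append("files is required")
        elif Path(self.files).suffix.lower() not in KNOWN_VIDEO_EXTENSIONS:
            suffix = Path(self.files).suffix or "(none)"
            problems.append(f"warning: extension {suffix} is not one of the named formats")
        return problems


request = TranscriptionRequest(
    prompt="Transcribe with timestamps and speaker labels",
    files="team_meeting.mp4",
)
print(request.validate())
```

Checking the required fields locally is a design convenience: because transcription runs asynchronously, a missing prompt or file would otherwise only surface after the job has been queued.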
Response Format
Asynchronous Response (all video transcription is async):
Examples
Basic Transcription
Meeting Summary
Detailed Analysis
Notes
Processing Details:
- Always Async: Video transcription uses background processing due to file size and complexity
- Audio Extraction: Automatically extracts audio from video for optimal transcription
- Output Format: Plain text transcript with timestamps and speaker labels
- File Storage: Results saved to Multimedia Artifact collection
- Processing Time: Varies based on video length (typically 2-5 minutes for 30-minute videos)
Example Prompt Instructions:
- “Include timestamps for each speaker turn”
- “Identify speakers by name if mentioned”
- “Provide a summary of key topics discussed”
- “Extract any action items or decisions made”
- “Note technical terms or tools mentioned”
- “Highlight concerns or issues raised”
Transcript Contents:
- Timestamped dialogue with speaker identification
- Meeting summary (if requested)
- Action items and decisions (if requested)
- Technical terms and references (if requested)
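Since the optional parts of the transcript (summary, action items, technical terms) appear only when the prompt asks for them, it can help to assemble the prompt from reusable instruction phrases. The following is a minimal sketch; the `build_prompt` helper and the constant names are hypothetical, and only the instruction wording is taken from the phrases listed above.

```python
from typing import List

# Instruction phrases copied from the suggestions above.
TIMESTAMPS = "Include timestamps for each speaker turn"
SPEAKERS = "Identify speakers by name if mentioned"
SUMMARY = "Provide a summary of key topics discussed"
ACTION_ITEMS = "Extract any action items or decisions made"


def build_prompt(task: str, instructions: List[str]) -> str:
    """Join a task sentence and instruction phrases into one prompt string.

    Duplicate and empty instructions are skipped; order is preserved.
    """
    lines = [task.strip()]
    seen = set()
    for instruction in instructions:
        text = instruction.strip().rstrip(".")
        if text and text not in seen:
            seen.add(text)
            lines.append(f"- {text}.")
    return "\n".join(lines)


meeting_prompt = build_prompt(
    "Transcribe this meeting.",
    [TIMESTAMPS, SPEAKERS, SUMMARY, ACTION_ITEMS],
)
print(meeting_prompt)
```

The resulting string would be passed as the prompt parameter; leaving out `SUMMARY` or `ACTION_ITEMS` should, per the list above, leave those sections out of the transcript.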
Supported Models
- gemini-2.5-flash (default): Optimal balance of speed and accuracy for transcription
- gemini-2.5-pro: Enhanced accuracy for complex audio or multiple speakers
- gemini-2.5-flash-lite: Faster processing for simple, clear audio
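The guidance above amounts to a simple decision rule, sketched here for illustration. The `choose_model` function and its flag names are assumptions made for this example; only the model names and the stated trade-offs come from the list.

```python
DEFAULT_MODEL = "gemini-2.5-flash"


def choose_model(
    complex_audio: bool = False,
    multiple_speakers: bool = False,
    simple_clear_audio: bool = False,
) -> str:
    """Pick a model name following the trade-offs described above.

    Accuracy needs take priority over speed when flags conflict
    (a design choice of this sketch, not documented behavior).
    """
    if complex_audio or multiple_speakers:
        return "gemini-2.5-pro"
    if simple_clear_audio:
        return "gemini-2.5-flash-lite"
    return DEFAULT_MODEL


print(choose_model(multiple_speakers=True))
print(choose_model(simple_clear_audio=True))
print(choose_model())
```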