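Audio/video-to-text skill that uses Whisper for speech recognition. It supports a range of audio and video formats and can output plain text, SRT/VTT subtitles, or JSON. Suited to meeting notes, video subtitle generation, interview write-ups, and podcast transcription.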
Initial release of the audio-video-to-text skill.
- Converts audio/video files to text using OpenAI Whisper.
- Supports multiple output formats: plain text (txt), SRT, VTT, and JSON.
- Handles various audio/video input types: MP3, WAV, MP4, AVI, and more.
- Allows model selection to trade speed against accuracy.
- Suitable for meeting notes, subtitles, interviews, and podcasts.
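
As an illustration of the SRT output mentioned above, here is a minimal sketch of how timed transcript segments could be turned into SRT text. It is not the skill's actual implementation. It assumes segments shaped like the `segments` list that the open-source `whisper` package returns from `transcribe()` (dicts with `start`, `end`, and `text`); the helper names `srt_timestamp` and `segments_to_srt` and the sample data are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS,mmm, the SRT timestamp layout."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from an iterable of {'start', 'end', 'text'} dicts."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        # Each cue: running number, time range, then the caption text.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    # Made-up sample data. With the whisper package installed, segments
    # would typically come from:
    #   whisper.load_model("base").transcribe("meeting.mp3")["segments"]
    sample = [
        {"start": 0.0, "end": 2.5, "text": " Welcome to the meeting."},
        {"start": 2.5, "end": 5.0, "text": " Let's get started."},
    ]
    print(segments_to_srt(sample))
```

VTT output would follow the same pattern, with a `WEBVTT` header line and a period instead of a comma before the milliseconds.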