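Uses multimodal large models to understand image content and generate descriptions of its business meaning. Supports multiple models: (1) MiniMax VLM, (2) OpenAI GPT-4V, (3) Claude Vision. Intended for screenshots, charts, document photos, and similar images, producing precise text descriptions.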
minimax-image-understanding v1.0.0
- Initial release supporting multimodal image understanding using large models.
- Compatible with MiniMax VLM (default, recommended for Chinese), OpenAI GPT-4V, and Claude Vision (Anthropic).
- Simple CLI tool for generating business-centric descriptions of images, charts, and document photos.
- Environment-variable-based configuration for easy model selection.
- Output focuses on key content and business logic, omitting positional element listings.
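
The environment-variable-based model selection mentioned in the release notes might look roughly like the sketch below. This is an illustration, not the tool's actual code: the variable names `IMAGE_MODEL_PROVIDER` and `MINIMAX_API_KEY`, the provider identifiers, and the `select_provider` helper are all assumptions, since the listing does not name them. Only the choice of MiniMax as the default comes from the notes above.

```python
import os

# All names below are hypothetical; the listing does not document the real ones.
PROVIDER_ENV = "IMAGE_MODEL_PROVIDER"
API_KEY_ENVS = {
    "minimax": "MINIMAX_API_KEY",
    "openai": "OPENAI_API_KEY",
    "claude": "ANTHROPIC_API_KEY",
}
DEFAULT_PROVIDER = "minimax"  # the release notes name MiniMax VLM as the default


def select_provider(env=None):
    """Return (provider, api_key) chosen from environment variables.

    `env` is any mapping; it defaults to os.environ so the function
    can be exercised with a plain dict.
    """
    env = os.environ if env is None else env
    provider = env.get(PROVIDER_ENV, DEFAULT_PROVIDER).strip().lower()
    if provider not in API_KEY_ENVS:
        raise ValueError(
            f"unknown provider {provider!r}; expected one of {sorted(API_KEY_ENVS)}"
        )
    key_name = API_KEY_ENVS[provider]
    api_key = env.get(key_name)
    if not api_key:
        raise RuntimeError(f"{key_name} is not set")
    return provider, api_key
```

Passing the environment in as a mapping keeps the selection logic separate from the process environment, so an unset provider variable falls back to the default and a missing key fails early with a clear message.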