Use this skill to extract structured Markdown/JSON from PDFs and document images—tables with cell-level precision, formulas as LaTeX, figures, seals, charts,...
- Significant update: migrated from custom scripts and a detailed workflow to the official PaddleOCR CLI.
- Removed all helper scripts and schema/reference files.
- Updated the document-parsing instructions to use the new paddleocr api command-line interface.
- Simplified configuration: only PADDLEOCR_ACCESS_TOKEN and the paddleocr CLI are required.
- Added quick-start usage examples covering key CLI options and the new output format.
- Clarified error handling and preprocessing recommendations.
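
Since the configuration now comes down to one environment variable and one executable, a preflight check can catch a missing prerequisite before any document is parsed. The following is a minimal sketch, not part of the skill itself: the function name check_prerequisites is hypothetical, and it assumes the CLI is installed under the executable name paddleocr. It deliberately does not invoke the CLI, because the exact subcommands and flags are not listed here.

```python
import os
import shutil


def check_prerequisites(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready.

    `env` and `which` are injectable so the check can be tested without
    touching the real environment. Both names are illustrative only.
    """
    env = os.environ if env is None else env
    problems = []
    # The changelog names PADDLEOCR_ACCESS_TOKEN as the only required setting.
    if not env.get("PADDLEOCR_ACCESS_TOKEN"):
        problems.append("PADDLEOCR_ACCESS_TOKEN is not set")
    # Assumption: the CLI is reachable on PATH as "paddleocr".
    if which("paddleocr") is None:
        problems.append("paddleocr CLI not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing:", problem)
```

Running the script prints one line per missing prerequisite and prints nothing when both are present.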