Extract text, tables, and images from .docx and legacy .doc files. Handles large documents, CJK text, and complex table structures. Includes deduplication an...
Initial release: extract text, tables, and images from .docx and legacy .doc files, with CJK support
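
As a rough illustration of the .docx side of the extraction described above, here is a minimal standard-library sketch. It is not this package's implementation or API: the helper names `paragraph_text` and `extract_docx` are hypothetical. It relies only on the fact that a .docx file is a ZIP archive whose main content lives in `word/document.xml`, with embedded images under `word/media/`. Legacy .doc files use a different binary format and are not handled here, nor are deduplication, nested tables, headers, or footnotes.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def paragraph_text(p):
    """Join the text of every w:t run inside one w:p element."""
    return "".join(t.text or "" for t in p.iter(W + "t"))


def extract_docx(source):
    """Return (paragraphs, tables, images) from a .docx path or file object.

    Hypothetical helper for illustration; not the package's own API.
    """
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
        body = root.find(W + "body")
        paragraphs, tables = [], []
        # Walk top-level body children so document order is preserved.
        for child in body:
            if child.tag == W + "p":
                text = paragraph_text(child)
                if text:
                    paragraphs.append(text)
            elif child.tag == W + "tbl":
                rows = []
                for tr in child.findall(W + "tr"):
                    # A cell may hold several paragraphs; join them with newlines.
                    rows.append([
                        "\n".join(paragraph_text(p) for p in tc.findall(W + "p"))
                        for tc in tr.findall(W + "tc")
                    ])
                tables.append(rows)
        # Embedded images are stored as ordinary archive members.
        images = {
            name: zf.read(name)
            for name in zf.namelist()
            if name.startswith("word/media/")
        }
    return paragraphs, tables, images
```

Calling `extract_docx("report.docx")` (the file name is a placeholder) would return a list of paragraph strings, a list of tables as rows of cell strings, and a dict mapping archive paths to image bytes. CJK text needs no special handling at this level because Python strings are Unicode and the XML part is normally UTF-8 encoded.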