Design Philosophy
PDF Reader MCP is built on these core principles:
1. Performance First
- Concurrent Processing - Multiple PDF sources are processed in parallel
- Efficient Parsing - Uses pdfjs-dist for reliable, fast PDF parsing
- Minimal Overhead - Direct stdio communication with no HTTP overhead
- Batch Operations - Process multiple files in a single request
2. Comprehensive Extraction
- Text Extraction - Full document or specific pages
- Page Ranges - Flexible page selection with ranges like "1-5, 10, 15-20"
- Metadata Access - Document properties, author, title, dates
- Image Extraction - Embedded images as base64-encoded PNG
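A page-range string such as "1-5, 10, 15-20" could be expanded into page numbers roughly as follows. This is a minimal sketch, not the project's actual parser: the `parsePageRanges` name, the clamping to the document length, and the error messages are all assumptions made for illustration.

```typescript
// Hypothetical helper: parsePageRanges is NOT part of the PDF Reader MCP API.
// It expands a spec like "1-5, 10, 15-20" into a sorted list of 1-based pages.
function parsePageRanges(spec: string, pageCount: number): number[] {
  const pages = new Set<number>();

  for (const part of spec.split(",")) {
    const token = part.trim();
    if (token === "") continue; // tolerate stray commas

    // Accept either a single page ("10") or an inclusive range ("15-20").
    const match = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(token);
    if (!match) {
      throw new Error(`Invalid page range: "${token}"`);
    }

    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start) {
      throw new Error(`Invalid page range: "${token}"`);
    }

    // Assumption: pages past the end of the document are silently dropped.
    for (let page = start; page <= Math.min(end, pageCount); page++) {
      pages.add(page);
    }
  }

  return Array.from(pages).sort((a, b) => a - b);
}
```

Collecting pages in a set means overlapping ranges such as "1-5, 3-7" yield each page only once.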
3. Simple Integration
- Single Tool - One read_pdf tool handles all extraction needs
- Standard MCP - Compatible with any MCP client
- Easy Setup - One-command installation via npx
- Multiple Clients - Works with Claude Desktop, Claude Code, Cursor, and more
4. Flexible Input
- Local Files - Read PDFs from any path on the filesystem
- Remote URLs - Download and process PDFs from URLs
- Mixed Sources - Combine local and remote files in one request
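Handling mixed sources implies deciding, per entry, whether to read from disk or download. The sketch below shows one way that decision might look. The `PdfSource` type and `classifySource` function are hypothetical names, and treating only http(s) strings as remote is an assumption, not documented behavior of the server.

```typescript
// Hypothetical types and names, for illustration only.
type PdfSource =
  | { kind: "url"; url: string }
  | { kind: "file"; path: string };

// Assumption: anything starting with http:// or https:// is remote;
// everything else is treated as a filesystem path.
function classifySource(input: string): PdfSource {
  if (/^https?:\/\//i.test(input)) {
    return { kind: "url", url: input };
  }
  return { kind: "file", path: input };
}

// A single request could then mix both kinds of source.
const mixed = ["./reports/q1.pdf", "https://example.com/q2.pdf"].map(classifySource);
```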
5. Robust Error Handling
- Graceful Failures - One failed source doesn't stop others
- Clear Errors - Specific error codes and messages
- Partial Results - Get results from successful sources even if some fail
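The combination of parallel processing and partial results can be sketched by catching errors per source, so that one rejection never fails the whole batch. This illustrates the idea under stated assumptions and is not the server's implementation: `readAll`, `SourceResult`, the injected `readOne` callback, and the `READ_FAILED` code are all hypothetical.

```typescript
// Hypothetical result shape: each source reports success or a coded error.
type SourceResult =
  | { source: string; success: true; text: string }
  | { source: string; success: false; error: { code: string; message: string } };

// Starts all reads at once and converts each failure into a result entry,
// so Promise.all never rejects and every source gets an answer.
async function readAll(
  sources: string[],
  readOne: (source: string) => Promise<string>, // injected reader, assumed for the sketch
): Promise<SourceResult[]> {
  const tasks = sources.map(async (source): Promise<SourceResult> => {
    try {
      return { source, success: true, text: await readOne(source) };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      // "READ_FAILED" is a placeholder code, not one defined by the project.
      return { source, success: false, error: { code: "READ_FAILED", message } };
    }
  });
  return Promise.all(tasks);
}
```

Results come back in the same order as the input sources, so a caller can match each entry to the file or URL it requested.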
Technical Stack
- Runtime: Node.js 22+
- PDF Parsing: pdfjs-dist
- Image Encoding: pngjs
- Schema Validation: Zod
- MCP SDK: @sylphx/mcp-server-sdk
- Build Tool: bunup