Build, orchestrate, and visualize high-performance data pipelines with Zem. The first unified framework designed for the MCP era.
name: multimodal_ai_pipeline
servers:
  ocr: servers/ocr
  voice: servers/voice
  llm: servers/llm
  nemo: servers/nemo_curator
  sinks: servers/sinks
pipeline:
  - ocr.extract_pdf:
      file_path: documents/report.pdf
  - voice.transcribe:
      file_path: audio/interview.wav
  - nemo.pii_removal:
      anonymize_names: true
  - llm.classify_domain:
      categories: [Medical, Legal, Finance]
  - sinks.to_huggingface:
      repo_id: your-org/dataset
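To make the structure of the config above concrete, the sketch below mirrors it as a Python dict (roughly what a YAML parser would return) and splits each `server.tool` step into a server path, a tool name, and its arguments. The helper name `resolve_steps` and the resolution logic are assumptions for illustration; the page does not show how Zem itself loads or validates this file.

```python
from typing import Any

# The pipeline config above, mirrored as a plain dict.
CONFIG: dict[str, Any] = {
    "name": "multimodal_ai_pipeline",
    "servers": {
        "ocr": "servers/ocr",
        "voice": "servers/voice",
        "llm": "servers/llm",
        "nemo": "servers/nemo_curator",
        "sinks": "servers/sinks",
    },
    "pipeline": [
        {"ocr.extract_pdf": {"file_path": "documents/report.pdf"}},
        {"voice.transcribe": {"file_path": "audio/interview.wav"}},
        {"nemo.pii_removal": {"anonymize_names": True}},
        {"llm.classify_domain": {"categories": ["Medical", "Legal", "Finance"]}},
        {"sinks.to_huggingface": {"repo_id": "your-org/dataset"}},
    ],
}


def resolve_steps(config: dict[str, Any]) -> list[dict[str, Any]]:
    """Hypothetical helper: expand each 'server.tool' step into its parts."""
    resolved = []
    for step in config["pipeline"]:
        # Each step is a single-key mapping: {"server.tool": {arguments}}.
        ((qualified_name, arguments),) = step.items()
        alias, tool = qualified_name.split(".", 1)
        if alias not in config["servers"]:
            raise KeyError(
                f"step {qualified_name!r} references unknown server {alias!r}"
            )
        resolved.append(
            {
                "server_path": config["servers"][alias],
                "tool": tool,
                "arguments": arguments or {},
            }
        )
    return resolved


if __name__ == "__main__":
    for item in resolve_steps(CONFIG):
        print(item["server_path"], item["tool"], item["arguments"])
```

Keeping the server aliases in one `servers` map means a step only names an alias and a tool, so a server can be moved or swapped by editing a single path.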
Standalone, modular servers for domain logic. Bypasses async complexity with robust stdio communication.
Automatic tracking and visualization of every step, rendered as pipeline graphs you can share with stakeholders.
No more tangled code. Define or modify complex pipelines by simply editing a YAML file.
Leverage NVIDIA NeMo Curator for high-performance deduplication, PII removal, and text normalization.
Process PDFs, images, audio, and unstructured documents with specialized OCR and voice engines.
Integrate Ollama, OpenAI, or custom models for classification, summarization, and instruction generation.
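For readers unfamiliar with the stdio communication mentioned above: the MCP stdio transport exchanges newline-delimited JSON-RPC 2.0 messages over a server's stdin and stdout. The sketch below builds a `tools/call` request and parses a response line. The helper names are hypothetical, the `initialize` handshake that a real client performs first is omitted, and nothing here is taken from Zem's own client code.

```python
import json
from typing import Any


def build_tool_call(request_id: int, tool: str, arguments: dict[str, Any]) -> str:
    """Serialize one JSON-RPC 2.0 `tools/call` request as a single line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # Messages are newline-delimited; json.dumps escapes any newlines
    # inside string values, so the payload stays on one line.
    return json.dumps(message, separators=(",", ":")) + "\n"


def parse_response(line: str) -> dict[str, Any]:
    """Parse one response line and return its result, raising on an error reply."""
    response = json.loads(line)
    if "error" in response:
        raise RuntimeError(response["error"].get("message", "unknown error"))
    return response["result"]


if __name__ == "__main__":
    request = build_tool_call(1, "extract_pdf", {"file_path": "documents/report.pdf"})
    print(request, end="")
```

In practice the request line would be written to the server process's stdin and the reply read from its stdout, one line per message.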
Each module is a standalone MCP server, battle-tested for real-world AI pipelines
GPU-accelerated deduplication, PII removal, and text normalization from NVIDIA
Advanced cleaning, filtering, and quality assessment for massive datasets
Extract text from PDFs and images with HuggingFace VLMs and configurable preprocessing
Whisper-powered speech-to-text with automatic language detection
Classify, summarize, and extract insights using Ollama or OpenAI models
Parse complex documents including Word, PowerPoint, and HTML with layout preservation
Generate high-quality training instructions for supervised fine-tuning
Export to HuggingFace Hub, Vector DBs (Pinecone, Weaviate), or custom endpoints
Load from S3, GCS, local files, or Parquet with partition-aware reading
Deep performance insights with per-tool timing, memory usage, and execution graphs
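As a rough idea of what per-tool timing and memory tracking can look like, the sketch below wraps a single call with Python's standard `time.perf_counter` and `tracemalloc`. The function `profile_call` and the shape of its metrics dict are assumptions for illustration, not the profiler server's actual interface. Note that `tracemalloc` only sees Python allocations in the current process, so memory used by a separate server process would need to be measured inside that process.

```python
import time
import tracemalloc
from typing import Any, Callable


def profile_call(
    label: str, func: Callable[..., Any], *args: Any, **kwargs: Any
) -> tuple[Any, dict[str, Any]]:
    """Run func once and return (result, metrics) for that call."""
    tracemalloc.start()
    started = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        # Record metrics even if the call raises, then stop tracing.
        elapsed = time.perf_counter() - started
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    metrics = {"tool": label, "seconds": elapsed, "peak_bytes": peak_bytes}
    return result, metrics


if __name__ == "__main__":
    # Stand-in workload; a real pipeline would wrap a tool invocation here.
    _, stats = profile_call("demo.build_list", lambda n: list(range(n)), 100_000)
    print(stats)
```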
Zem acts as the bridge between modular processing units (MCP Servers) and professional orchestration (ZenML). Every execution is tracked, every artifact is versioned, and every step is descriptively labeled.
Join developers building the next generation of AI-ready datasets