Getting Started
Lab Dojo is a free, open-source AI research workstation designed for principal investigators and university research laboratories. It connects 20 biomedical science APIs to a local AI model running on your machine, providing grounded, citation-backed answers to research questions without sending any data to the cloud.
Lab Dojo is built for the Pathology discipline in v0.1.x. Future versions will cover biology, chemistry, medicine, physics, engineering, and other scientific domains.
Key capabilities
- Ask research questions grounded in PubMed, UniProt, PDB, ChEMBL, KEGG, and 15 more databases
- Every claim is traced to its source with PMID citations
- Run automated pipelines: literature review, protein analysis, drug profiling, pathway analysis, cancer genomics
- Persistent project memory across sessions
- Export citations to BibTeX, RIS, or Markdown
- Zero configuration: double-click the installer and start researching
System requirements
| Component | Minimum | Recommended |
|---|---|---|
| Python | 3.8+ | 3.11+ |
| RAM | 8 GB | 16 GB |
| Disk | 5 GB free | 20 GB free |
| OS | macOS 12+, Windows 10+, Linux (Ubuntu 20.04+) | |
| GPU | Not required | NVIDIA GPU for faster inference |
Installation
macOS
Download and run the installer script:
curl -fsSL https://github.com/SooparAI/labdojo/releases/latest/download/LabDojo_Installer.command -o LabDojo_Installer.command
chmod +x LabDojo_Installer.command
./LabDojo_Installer.commandThe installer will check for Python 3.8+, install dependencies, download Ollama if needed, pull the default model (llama3:8b), and start Lab Dojo at http://localhost:8080.
Windows
Download the batch file and double-click it:
# Download from GitHub Releases
# Double-click LabDojo_Installer.bat
# Or run from Command Prompt:
LabDojo_Installer.batManual installation (any OS)
git clone https://github.com/SooparAI/labdojo.git
cd labdojo
pip install fastapi uvicorn aiohttp
python labdojo.pyOllama setup
Lab Dojo uses Ollama for local AI inference. The installer handles this automatically, but if you need to set it up manually:
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull the default model
ollama pull llama3:8b
# Verify it's running
ollama listLab Dojo auto-detects Ollama at http://localhost:11434. To use a different host, update the setting in the Lab Dojo Settings tab.
Science APIs
Lab Dojo connects to 20 free, public science APIs. No API keys are required. All data is fetched directly from the source and cached locally for 30 minutes to reduce redundant requests.
Literature databases
| API | Coverage | Use case |
|---|---|---|
| PubMed | 36M+ abstracts | Primary literature search with PMID citations |
| ArXiv | Quantitative biology | Preprints and computational biology |
| bioRxiv | Biology preprints | Latest unpublished research |
| Europe PMC | European life sciences | Full-text access and citation data |
| Semantic Scholar | 200M+ papers | AI-powered citation graphs and influence scores |
| OpenAlex | Open scholarly metadata | Institutional and author analytics |
| Crossref | DOI registry | DOI resolution and bibliographic metadata |
| ORCID | Researcher IDs | Author disambiguation and publication lists |
Protein and structure databases
| API | Coverage | Use case |
|---|---|---|
| UniProt | 250M+ sequences | Protein function, domains, and annotations |
| PDB | 220K+ structures | 3D macromolecular structures and ligands |
| STRING | Protein networks | Protein-protein interaction networks and scores |
Chemistry and drug databases
| API | Coverage | Use case |
|---|---|---|
| ChEMBL | 2.4M+ compounds | Bioactive molecules and drug-like properties |
| PubChem | 110M+ substances | Chemical structures and bioassay data |
| DrugBank | 15K+ drugs | Drug-target interactions and pharmacology |
| RxNorm | Drug naming | Normalized drug nomenclature |
Pathway and genomics databases
| API | Coverage | Use case |
|---|---|---|
| KEGG | Biological pathways | Metabolic and signaling pathway maps |
| Reactome | Curated pathways | Peer-reviewed pathway knowledgebase |
| NCBI Gene | Gene records | Gene-centric information and orthologs |
| OMIM | Genetic disorders | Mendelian inheritance and phenotype catalog |
| ClinicalTrials.gov | 450K+ studies | Clinical study registry and results |
Smart API routing
Lab Dojo does not query all 20 APIs for every question. It classifies your question into research topics (literature, protein, drug, pathway, genomics, clinical, cancer) and routes it to the 3-8 most relevant databases. For example, asking about a protein structure will query UniProt, PDB, and STRING, but not ClinicalTrials.gov.
AI Backends
Lab Dojo supports multiple AI backends with automatic fallback. The local Ollama backend is always preferred to keep your data private.
Backend priority
| Priority | Backend | Requirements | Data privacy |
|---|---|---|---|
| 1 (default) | Ollama (local) | Ollama installed | 100% local |
| 2 | Vast.ai (serverless) | API key in settings | Encrypted transit |
| 3 | OpenAI | API key in settings | Cloud processed |
| 4 | Anthropic | API key in settings | Cloud processed |
Changing the local model
# List available models
ollama list
# Pull a different model
ollama pull mistral
ollama pull codellama
ollama pull llama3:70b # Requires 40GB+ RAM
# Update Lab Dojo settings to use the new model
# Go to Settings tab > Model NameVerbosity levels
Lab Dojo supports three response verbosity levels, configurable per-message:
- Concise: Data and conclusions only. 2-4 sentences.
- Detailed (default): Mechanistic detail with citations. 2-3 paragraphs.
- Comprehensive: Full analysis with contradictions, caveats, and experimental suggestions. No length limit.
Pipelines
Pipelines are automated multi-step research workflows. Each pipeline queries multiple APIs, synthesizes results through the AI model, and produces a structured report.
Available pipelines
| Pipeline | Input | APIs used | Output |
|---|---|---|---|
| Literature Review | Research topic | PubMed, Semantic Scholar, Europe PMC | Synthesized review with citations |
| Protein Analysis | Protein name/ID | UniProt, PDB, STRING | Function, structure, interactions |
| Drug Target Profile | Drug or target name | ChEMBL, PubChem, DrugBank | Compound data, targets, trials |
| Pathway Analysis | Gene or pathway | KEGG, Reactome, NCBI Gene | Pathway maps and gene roles |
| Cancer Genomics | Gene or cancer type | GDC, cBioPortal, ClinicalTrials | Mutation data and clinical context |
Running a pipeline
Navigate to the Pipelines tab in Lab Dojo, select a pipeline type, enter your query, and click Run. Results appear in real-time as each stage completes.
# Via the REST API:
curl -X POST http://localhost:8080/pipeline/run \
-H "Content-Type: application/json" \
-d '{"pipeline_type": "literature_review", "query": "BRCA1 DNA damage repair"}'Projects and Memory
Lab Dojo maintains persistent context across sessions through two mechanisms: projects and learned Q&A memory.
Projects
Projects group related conversations, decisions, and literature under a single research thread. Each project has a name, description, and focus keywords. When you switch projects, Lab Dojo loads the relevant context so the AI understands your ongoing work.
Learned Q&A memory
When Lab Dojo answers a research question, it stores the question-answer pair in a local SQLite database. If you ask a similar question later, it can recall the previous answer instantly without re-querying the APIs. The similarity threshold is 0.85 (Jaccard word overlap), which means only very similar questions trigger a cache hit.
Conversation history
The last 10 conversation turns are sent to the AI model as context, enabling follow-up questions like "What about its binding partners?" after asking about a specific protein. All conversation data is stored locally in SQLite.
Citations
Lab Dojo verifies and tracks every citation it produces. When the AI references a paper, Lab Dojo cross-checks the PMID against PubMed to confirm the paper exists, then stores the verified citation with full bibliographic metadata.
Citation format
Inline citations use the format [PMID:12345678]. Each PMID links to the corresponding PubMed entry.
Export formats
- BibTeX: Standard format for LaTeX and reference managers
- RIS: Compatible with EndNote, Zotero, Mendeley
- Markdown: Human-readable format with abstracts
Configuration
Lab Dojo stores its configuration in ~/.labdojo/config.json. API keys are stored separately in ~/.labdojo/secrets.json with restricted file permissions (600).
Configuration options
| Setting | Default | Description |
|---|---|---|
| port | 8080 | HTTP server port |
| ollama_host | http://localhost:11434 | Ollama API endpoint |
| ollama_model | llama3:8b | Default Ollama model |
| temperature | 0.7 | AI response temperature (0.0-1.0) |
| verbosity | detailed | Response detail level |
| openai_api_key | (empty) | Optional OpenAI fallback |
| anthropic_api_key | (empty) | Optional Anthropic fallback |
Data storage
All data is stored locally in ~/.labdojo/knowledge.db (SQLite). This includes conversations, learned Q&A pairs, projects, decisions, verified citations, pipeline runs, monitored topics, and usage statistics. No data is sent to any external server.
Developer Guide
Lab Dojo is a single-file Python application (labdojo.py, approximately 2,700 lines) built on FastAPI. The architecture is modular and designed for extension.
Architecture
labdojo.py
Config - Configuration management with secrets
KnowledgeBase - SQLite-backed persistent storage
classify_intent() - Casual vs research message routing
ScienceAPIs - 20 API integrations with caching
OllamaClient - Local Ollama inference
ServerlessClient - Vast.ai serverless inference
OpenAIClient - OpenAI API client
AnthropicClient - Anthropic API client
InferenceRouter - Multi-backend fallback chain
create_app() - FastAPI application factory
main() - Entry point with UvicornAdding a new API
To add a new science API:
# 1. Add to _API_CATALOG
_API_CATALOG["my_api"] = {
"name": "My API",
"desc": "Description of what it provides",
"base": "https://api.example.com",
"free": True,
}
# 2. Add to _ROUTE_MAP under relevant topics
_ROUTE_MAP["literature"].append("my_api")
# 3. Implement the query method in ScienceAPIs
async def _search_my_api(self, query: str) -> str:
url = f"https://api.example.com/search?q={query}"
data = await self._http_get(url)
if not data:
return ""
# Parse and format the response
return formatted_result
# 4. Register in the fetch_grounding_data dispatcher
# Add "my_api": self._search_my_api to the dispatch dictRunning tests
cd labdojo
pip install pytest pytest-asyncio httpx
python -m pytest test_labdojo.py -v
# 127 tests covering:
# - Config save/load and secrets
# - Intent classification (21 casual + 7 research cases)
# - KnowledgeBase CRUD (conversations, projects, citations, pipelines)
# - ScienceAPIs routing and caching
# - All REST endpoints
# - Edge cases (SQL injection, Unicode, concurrent access, long content)Building discipline-specific versions
Lab Dojo is designed to be forked for different scientific disciplines. To create a new discipline version:
- Update
_API_CATALOGwith discipline-relevant APIs - Update
_TOPIC_KEYWORDSwith discipline-specific terms - Update
_ROUTE_MAPto connect topics to APIs - Update
_SYSTEM_PROMPTwith discipline-specific instructions - Implement new API query methods in
ScienceAPIs
API Reference
Lab Dojo exposes a REST API at http://localhost:8080. All endpoints accept and return JSON.
Core endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | /chat | Send a research question, get a grounded response |
| GET | /status | System status, version, and backend availability |
| GET | /apis | List all connected science APIs |
| POST | /search | Search papers with export (BibTeX/RIS/Markdown) |
| POST | /pipeline/run | Run an automated research pipeline |
| GET | /pipeline/runs | List previous pipeline runs |
| GET | /projects | List all projects |
| POST | /projects | Create a new project |
| GET | /settings | Get current settings (keys masked) |
| POST | /settings | Update settings |
| GET | /export/conversation | Export conversation as Markdown |
| GET | /learning/stats | Learning system statistics |
| POST | /monitor/topics | Add a topic to monitor |
Chat endpoint example
curl -X POST http://localhost:8080/chat \
-H "Content-Type: application/json" \
-d '{
"message": "What is the role of BRCA1 in DNA damage repair?",
"project_id": "default",
"verbosity": "detailed"
}'
# Response:
{
"response": "BRCA1 plays a central role in homologous recombination...",
"source": "local (llama3:8b)",
"apis_used": ["pubmed", "uniprot", "ncbi_gene"],
"citations": ["PMID:21242564", "PMID:22510451"],
"cached": false
}Lab Dojo is open source under the MIT License. Built by JuiceVendor Labs Inc.
