Lab Dojo Documentation v0.1.1

Getting Started

Lab Dojo is a free, open-source AI research workstation designed for principal investigators and university research laboratories. It connects 20 biomedical science APIs to a local AI model running on your machine, providing grounded, citation-backed answers to research questions without sending any data to the cloud.

Lab Dojo is built for the Pathology discipline in v0.1.x. Future versions will cover biology, chemistry, medicine, physics, engineering, and other scientific domains.

Key capabilities

  • Ask research questions grounded in PubMed, UniProt, PDB, ChEMBL, KEGG, and 15 more databases
  • Every claim is traced to its source with PMID citations
  • Run automated pipelines: literature review, protein analysis, drug profiling, pathway analysis, cancer genomics
  • Persistent project memory across sessions
  • Export citations to BibTeX, RIS, or Markdown
  • Zero configuration: double-click the installer and start researching

System requirements

| Component | Minimum | Recommended |
| --- | --- | --- |
| Python | 3.8+ | 3.11+ |
| RAM | 8 GB | 16 GB |
| Disk | 5 GB free | 20 GB free |
| OS | macOS 12+, Windows 10+, Linux (Ubuntu 20.04+) | |
| GPU | Not required | NVIDIA GPU for faster inference |

Installation

macOS

Download and run the installer script:

curl -fsSL https://github.com/SooparAI/labdojo/releases/latest/download/LabDojo_Installer.command -o LabDojo_Installer.command
chmod +x LabDojo_Installer.command
./LabDojo_Installer.command

The installer will check for Python 3.8+, install dependencies, download Ollama if needed, pull the default model (llama3:8b), and start Lab Dojo at http://localhost:8080.

Windows

Download LabDojo_Installer.bat from the GitHub Releases page and double-click it, or run it from Command Prompt:

LabDojo_Installer.bat

Manual installation (any OS)

git clone https://github.com/SooparAI/labdojo.git
cd labdojo
pip install fastapi uvicorn aiohttp
python labdojo.py

Ollama setup

Lab Dojo uses Ollama for local AI inference. The installer handles this automatically, but if you need to set it up manually:

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull the default model
ollama pull llama3:8b

# Verify it's running
ollama list

Lab Dojo auto-detects Ollama at http://localhost:11434. To use a different host, update the setting in the Lab Dojo Settings tab.

Science APIs

Lab Dojo connects to 20 free, public science APIs. No API keys are required. All data is fetched directly from the source and cached locally for 30 minutes to reduce redundant requests.
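
The 30-minute cache can be pictured as a small time-to-live map keyed by request URL. The sketch below is only an illustration of that idea: the class name TTLCache, its methods, and the injectable clock are hypothetical and are not taken from labdojo.py.

```python
import time
from typing import Callable, Dict, Optional, Tuple

CACHE_TTL_SECONDS = 30 * 60  # 30 minutes, as stated above


class TTLCache:
    """Hypothetical in-memory cache whose entries expire after a fixed lifetime."""

    def __init__(self, ttl: float = CACHE_TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be checked without sleeping
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]  # expired: drop the entry and report a miss
            return None
        return value

    def put(self, key: str, value: str) -> None:
        self._entries[key] = (self._clock(), value)
```

A caller would check get(url) before issuing a request and call put(url, body) after a successful fetch, so repeated questions within half an hour do not hit the upstream API again.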

Literature databases

| API | Coverage | Use case |
| --- | --- | --- |
| PubMed | 36M+ abstracts | Primary literature search with PMID citations |
| arXiv | Quantitative biology | Preprints and computational biology |
| bioRxiv | Biology preprints | Latest unpublished research |
| Europe PMC | European life sciences | Full-text access and citation data |
| Semantic Scholar | 200M+ papers | AI-powered citation graphs and influence scores |
| OpenAlex | Open scholarly metadata | Institutional and author analytics |
| Crossref | DOI registry | DOI resolution and bibliographic metadata |
| ORCID | Researcher IDs | Author disambiguation and publication lists |

Protein and structure databases

| API | Coverage | Use case |
| --- | --- | --- |
| UniProt | 250M+ sequences | Protein function, domains, and annotations |
| PDB | 220K+ structures | 3D macromolecular structures and ligands |
| STRING | Protein networks | Protein-protein interaction networks and scores |

Chemistry and drug databases

| API | Coverage | Use case |
| --- | --- | --- |
| ChEMBL | 2.4M+ compounds | Bioactive molecules and drug-like properties |
| PubChem | 110M+ substances | Chemical structures and bioassay data |
| DrugBank | 15K+ drugs | Drug-target interactions and pharmacology |
| RxNorm | Drug naming | Normalized drug nomenclature |

Pathway and genomics databases

| API | Coverage | Use case |
| --- | --- | --- |
| KEGG | Biological pathways | Metabolic and signaling pathway maps |
| Reactome | Curated pathways | Peer-reviewed pathway knowledgebase |
| NCBI Gene | Gene records | Gene-centric information and orthologs |
| OMIM | Genetic disorders | Mendelian inheritance and phenotype catalog |
| ClinicalTrials.gov | 450K+ studies | Clinical study registry and results |

Smart API routing

Lab Dojo does not query all 20 APIs for every question. It classifies your question into research topics (literature, protein, drug, pathway, genomics, clinical, cancer) and routes it to the 3-8 most relevant databases. For example, asking about a protein structure will query UniProt, PDB, and STRING, but not ClinicalTrials.gov.
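
One way to picture this routing is a keyword lookup followed by a topic-to-API map. The sketch below is a simplified illustration, not the actual classifier: the table contents, the API identifiers other than "pubmed" and "uniprot", the DEFAULT_APIS fallback, and the function name route_question are all assumptions.

```python
import re
from typing import Dict, List

# Hypothetical keyword and route tables; the real _TOPIC_KEYWORDS and
# _ROUTE_MAP in labdojo.py may hold different entries.
TOPIC_KEYWORDS: Dict[str, List[str]] = {
    "protein": ["protein", "structure", "domain", "binding"],
    "drug": ["drug", "inhibitor", "compound", "dose"],
    "clinical": ["trial", "patient", "cohort"],
}
ROUTE_MAP: Dict[str, List[str]] = {
    "protein": ["uniprot", "pdb", "string"],
    "drug": ["chembl", "pubchem", "drugbank"],
    "clinical": ["clinicaltrials", "pubmed"],
}
DEFAULT_APIS = ["pubmed"]  # assumed fallback when no topic matches


def route_question(question: str, max_apis: int = 8) -> List[str]:
    """Return the APIs to query, in topic order, without duplicates."""
    words = set(re.findall(r"[a-z0-9]+", question.lower()))
    apis: List[str] = []
    for topic, keywords in TOPIC_KEYWORDS.items():
        if words & set(keywords):  # any keyword present selects the topic
            for api in ROUTE_MAP[topic]:
                if api not in apis:
                    apis.append(api)
    return (apis or DEFAULT_APIS)[:max_apis]
```

With these tables, a question about a protein structure selects only the protein topic, so the clinical trial registry is never queried.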

AI Backends

Lab Dojo supports multiple AI backends with automatic fallback. The local Ollama backend is always preferred to keep your data private.

Backend priority

| Priority | Backend | Requirements | Data privacy |
| --- | --- | --- | --- |
| 1 (default) | Ollama (local) | Ollama installed | 100% local |
| 2 | Vast.ai (serverless) | API key in settings | Encrypted transit |
| 3 | OpenAI | API key in settings | Cloud processed |
| 4 | Anthropic | API key in settings | Cloud processed |
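
The priority order above amounts to a fallback chain: try each backend in turn and return the first usable answer. The following is a minimal sketch of that pattern under stated assumptions; the function name generate_with_fallback and the (name, callable) backend shape are hypothetical and do not describe the real InferenceRouter interface.

```python
from typing import Callable, List, Optional, Tuple

# Each backend is a (name, generate) pair. generate returns text, or raises
# or returns None when the backend is unavailable. All names are hypothetical.
Backend = Tuple[str, Callable[[str], Optional[str]]]


def generate_with_fallback(prompt: str, backends: List[Backend]) -> Tuple[str, str]:
    """Try backends in priority order; return (backend_name, response)."""
    errors: List[str] = []
    for name, generate in backends:
        try:
            text = generate(prompt)
        except Exception as exc:  # a failed backend must not stop the chain
            errors.append(f"{name}: {exc}")
            continue
        if text:
            return name, text
        errors.append(f"{name}: empty response")
    raise RuntimeError("all backends failed: " + "; ".join(errors))
```

Listing the local backend first means a cloud provider is contacted only when the local call fails or returns nothing.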

Changing the local model

# List available models
ollama list

# Pull a different model
ollama pull mistral
ollama pull codellama
ollama pull llama3:70b  # Requires 40GB+ RAM

# Update Lab Dojo settings to use the new model
# Go to Settings tab > Model Name

Verbosity levels

Lab Dojo supports three response verbosity levels, which you can set per message:

  • Concise: Data and conclusions only. 2-4 sentences.
  • Detailed (default): Mechanistic detail with citations. 2-3 paragraphs.
  • Comprehensive: Full analysis with contradictions, caveats, and experimental suggestions. No length limit.

Pipelines

Pipelines are automated multi-step research workflows. Each pipeline queries multiple APIs, synthesizes results through the AI model, and produces a structured report.

Available pipelines

| Pipeline | Input | APIs used | Output |
| --- | --- | --- | --- |
| Literature Review | Research topic | PubMed, Semantic Scholar, Europe PMC | Synthesized review with citations |
| Protein Analysis | Protein name/ID | UniProt, PDB, STRING | Function, structure, interactions |
| Drug Target Profile | Drug or target name | ChEMBL, PubChem, DrugBank | Compound data, targets, trials |
| Pathway Analysis | Gene or pathway | KEGG, Reactome, NCBI Gene | Pathway maps and gene roles |
| Cancer Genomics | Gene or cancer type | GDC, cBioPortal, ClinicalTrials | Mutation data and clinical context |

Running a pipeline

Open the Pipelines tab in Lab Dojo, select a pipeline type, enter your query, and click Run. Results appear in real time as each stage completes.

# Via the REST API:
curl -X POST http://localhost:8080/pipeline/run \
  -H "Content-Type: application/json" \
  -d '{"pipeline_type": "literature_review", "query": "BRCA1 DNA damage repair"}'

Projects and Memory

Lab Dojo maintains persistent context across sessions through two mechanisms: projects and learned Q&A memory.

Projects

Projects group related conversations, decisions, and literature under a single research thread. Each project has a name, description, and focus keywords. When you switch projects, Lab Dojo loads the relevant context so the AI understands your ongoing work.

Learned Q&A memory

When Lab Dojo answers a research question, it stores the question-answer pair in a local SQLite database. If you ask a similar question later, it can recall the previous answer instantly without re-querying the APIs. The similarity threshold is 0.85 (Jaccard word overlap), which means only very similar questions trigger a cache hit.
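
Jaccard word overlap is the size of the intersection of two word sets divided by the size of their union. The sketch below shows how a 0.85 threshold behaves; the tokenization rule (lowercase alphanumeric words), the use of "greater than or equal", and the function names are assumptions rather than details taken from labdojo.py.

```python
import re
from typing import Set

SIMILARITY_THRESHOLD = 0.85  # value stated above


def tokenize(text: str) -> Set[str]:
    # Assumed tokenization: lowercase alphanumeric words, punctuation ignored
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Return |A ∩ B| / |A ∪ B| over the two word sets."""
    words_a, words_b = tokenize(a), tokenize(b)
    if not words_a and not words_b:
        return 0.0  # two empty questions are treated as dissimilar
    return len(words_a & words_b) / len(words_a | words_b)


def is_cache_hit(new_question: str, stored_question: str) -> bool:
    return jaccard(new_question, stored_question) >= SIMILARITY_THRESHOLD
```

For example, "role of BRCA1 in DNA repair" and "role of BRCA1 in DNA damage repair" share 6 of 7 distinct words (about 0.857), which clears the threshold. "role of BRCA1" and "role of TP53" share 2 of 4 (0.5), which does not.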

Conversation history

The last 10 conversation turns are sent to the AI model as context, enabling follow-up questions like "What about its binding partners?" after asking about a specific protein. All conversation data is stored locally in SQLite.
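
A fixed-size window like this is commonly kept in a bounded queue that discards the oldest entry automatically. The sketch below assumes a "turn" is one user message plus one assistant reply; the class name ConversationWindow and the dictionary keys are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List

MAX_TURNS = 10  # number of turns stated above


class ConversationWindow:
    """Keeps only the most recent turns to send to the model as context."""

    def __init__(self, max_turns: int = MAX_TURNS) -> None:
        # deque(maxlen=...) silently drops the oldest item when full
        self._turns: Deque[Dict[str, str]] = deque(maxlen=max_turns)

    def add_turn(self, user: str, assistant: str) -> None:
        self._turns.append({"user": user, "assistant": assistant})

    def as_context(self) -> List[Dict[str, str]]:
        return list(self._turns)  # oldest first
```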

Citations

Lab Dojo verifies and tracks every citation it produces. When the AI references a paper, Lab Dojo cross-checks the PMID against PubMed to confirm the paper exists, then stores the verified citation with full bibliographic metadata.

Citation format

Inline citations use the format [PMID:12345678]. Each PMID links to the corresponding PubMed entry.
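
Because the inline format is regular, PMIDs can be pulled out of a response with a short pattern match before they are verified or linked. This is an illustrative sketch: the function names are hypothetical, and limiting a PMID to 1 to 8 digits is an assumption.

```python
import re
from typing import List

# Matches the inline format described above, e.g. [PMID:12345678]
PMID_PATTERN = re.compile(r"\[PMID:(\d{1,8})\]")


def extract_pmids(text: str) -> List[str]:
    """Return the PMIDs cited in text, in first-seen order, without duplicates."""
    seen: List[str] = []
    for pmid in PMID_PATTERN.findall(text):
        if pmid not in seen:
            seen.append(pmid)
    return seen


def pubmed_url(pmid: str) -> str:
    """Build the PubMed page URL for a PMID."""
    return f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/"
```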

Export formats

  • BibTeX: Standard format for LaTeX and reference managers
  • RIS: Compatible with EndNote, Zotero, Mendeley
  • Markdown: Human-readable format with abstracts
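
To make the first two formats concrete, the sketch below renders one citation record as BibTeX and as RIS. The Citation fields, the function names, and the entry-key scheme (pmid followed by the number) are assumptions; Lab Dojo's own exporter may emit additional fields.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Citation:
    """Hypothetical citation record; the fields Lab Dojo stores may differ."""
    pmid: str
    authors: List[str]
    title: str
    journal: str
    year: int


def to_bibtex(c: Citation) -> str:
    authors = " and ".join(c.authors)  # BibTeX separates authors with "and"
    return (
        f"@article{{pmid{c.pmid},\n"
        f"  author = {{{authors}}},\n"
        f"  title = {{{c.title}}},\n"
        f"  journal = {{{c.journal}}},\n"
        f"  year = {{{c.year}}},\n"
        f"  note = {{PMID: {c.pmid}}}\n"
        f"}}"
    )


def to_ris(c: Citation) -> str:
    # RIS lines are a two-letter tag, two spaces, a hyphen, and a space
    lines = ["TY  - JOUR"]
    lines += [f"AU  - {author}" for author in c.authors]
    lines += [
        f"TI  - {c.title}",
        f"JO  - {c.journal}",
        f"PY  - {c.year}",
        f"AN  - {c.pmid}",
        "ER  - ",
    ]
    return "\n".join(lines)
```

Called with placeholder data such as Citation("12345678", ["Doe J", "Roe A"], "Example title", "Example Journal", 2020), to_bibtex produces an @article entry keyed pmid12345678 and to_ris produces a record that opens with TY and closes with ER.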

Configuration

Lab Dojo stores its configuration in ~/.labdojo/config.json. API keys are stored separately in ~/.labdojo/secrets.json with restricted file permissions (600).
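
Restricted permissions of 600 mean the file is readable and writable by its owner only. A minimal sketch of writing such a file on a POSIX system follows; the function name save_secrets is hypothetical, and on Windows the mode bits have limited effect.

```python
import json
import os
from pathlib import Path
from typing import Dict


def save_secrets(path: Path, secrets: Dict[str, str]) -> None:
    """Write secrets as JSON that only the file owner can read or write."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the file with mode 0o600 from the start so it is never
    # briefly readable by other users
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(secrets, handle, indent=2)
    os.chmod(path, 0o600)  # also tighten a file that already existed
```

Calling save_secrets(Path.home() / ".labdojo" / "secrets.json", {"openai_api_key": "..."}) would write to the location named above.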

Configuration options

| Setting | Default | Description |
| --- | --- | --- |
| port | 8080 | HTTP server port |
| ollama_host | http://localhost:11434 | Ollama API endpoint |
| ollama_model | llama3:8b | Default Ollama model |
| temperature | 0.7 | AI response temperature (0.0-1.0) |
| verbosity | detailed | Response detail level |
| openai_api_key | (empty) | Optional OpenAI fallback |
| anthropic_api_key | (empty) | Optional Anthropic fallback |

Data storage

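All data is stored locally in ~/.labdojo/knowledge.db (SQLite). This includes conversations, learned Q&A pairs, projects, decisions, verified citations, pipeline runs, monitored topics, and usage statistics. With the default local Ollama backend, none of this stored data leaves your machine. Search queries go to the public science APIs, and prompts go to a cloud provider only if you configure one of the optional fallback backends.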

Developer Guide

Lab Dojo is a single-file Python application (labdojo.py, approximately 2,700 lines) built on FastAPI. The architecture is modular and designed for extension.

Architecture

labdojo.py
  Config              - Configuration management with secrets
  KnowledgeBase       - SQLite-backed persistent storage
  classify_intent()   - Casual vs research message routing
  ScienceAPIs         - 20 API integrations with caching
  OllamaClient        - Local Ollama inference
  ServerlessClient    - Vast.ai serverless inference
  OpenAIClient        - OpenAI API client
  AnthropicClient     - Anthropic API client
  InferenceRouter     - Multi-backend fallback chain
  create_app()        - FastAPI application factory
  main()              - Entry point with Uvicorn

Adding a new API

To add a new science API:

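# 1. Add to _API_CATALOG
_API_CATALOG["my_api"] = {
    "name": "My API",
    "desc": "Description of what it provides",
    "base": "https://api.example.com",
    "free": True,
}

# 2. Add to _ROUTE_MAP under relevant topics
_ROUTE_MAP["literature"].append("my_api")

# 3. Implement the query method in ScienceAPIs
#    (requires: from urllib.parse import quote_plus)
async def _search_my_api(self, query: str) -> str:
    # URL-encode the query so spaces and special characters survive
    url = f"https://api.example.com/search?q={quote_plus(query)}"
    data = await self._http_get(url)
    if not data:
        return ""
    # Parse and format the response. This assumes _http_get returns parsed
    # JSON as a dict; "results" and "title" are placeholder field names.
    titles = [item.get("title", "") for item in data.get("results", [])]
    formatted_result = "\n".join(titles)
    return formatted_result

# 4. Register in the fetch_grounding_data dispatcher
# Add "my_api": self._search_my_api to the dispatch dict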

Running tests

cd labdojo
pip install pytest pytest-asyncio httpx
python -m pytest test_labdojo.py -v

# 127 tests covering:
# - Config save/load and secrets
# - Intent classification (21 casual + 7 research cases)
# - KnowledgeBase CRUD (conversations, projects, citations, pipelines)
# - ScienceAPIs routing and caching
# - All REST endpoints
# - Edge cases (SQL injection, Unicode, concurrent access, long content)

Building discipline-specific versions

Lab Dojo is designed to be forked for different scientific disciplines. To create a new discipline version:

  • Update _API_CATALOG with discipline-relevant APIs
  • Update _TOPIC_KEYWORDS with discipline-specific terms
  • Update _ROUTE_MAP to connect topics to APIs
  • Update _SYSTEM_PROMPT with discipline-specific instructions
  • Implement new API query methods in ScienceAPIs

API Reference

Lab Dojo exposes a REST API at http://localhost:8080. All endpoints accept and return JSON.

Core endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /chat | Send a research question, get a grounded response |
| GET | /status | System status, version, and backend availability |
| GET | /apis | List all connected science APIs |
| POST | /search | Search papers with export (BibTeX/RIS/Markdown) |
| POST | /pipeline/run | Run an automated research pipeline |
| GET | /pipeline/runs | List previous pipeline runs |
| GET | /projects | List all projects |
| POST | /projects | Create a new project |
| GET | /settings | Get current settings (keys masked) |
| POST | /settings | Update settings |
| GET | /export/conversation | Export conversation as Markdown |
| GET | /learning/stats | Learning system statistics |
| POST | /monitor/topics | Add a topic to monitor |

Chat endpoint example

curl -X POST http://localhost:8080/chat \
  -H "Content-Type: application/json" \
  -d '{
    "message": "What is the role of BRCA1 in DNA damage repair?",
    "project_id": "default",
    "verbosity": "detailed"
  }'

# Response:
{
  "response": "BRCA1 plays a central role in homologous recombination...",
  "source": "local (llama3:8b)",
  "apis_used": ["pubmed", "uniprot", "ncbi_gene"],
  "citations": ["PMID:21242564", "PMID:22510451"],
  "cached": false
}
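
The same call can be made from Python with only the standard library. The sketch below mirrors the curl example; the helper names build_chat_request and ask and the 120-second timeout are assumptions, and the response fields are those shown in the example above.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://localhost:8080"  # default port from the Configuration section


def build_chat_request(message: str, project_id: str = "default",
                       verbosity: str = "detailed") -> urllib.request.Request:
    """Build the POST /chat request without sending it."""
    payload = {"message": message, "project_id": project_id, "verbosity": verbosity}
    return urllib.request.Request(
        f"{BASE_URL}/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(message: str, **kwargs: str) -> Dict[str, Any]:
    """Send the question to a running Lab Dojo server and decode the JSON reply."""
    request = build_chat_request(message, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, ask("What is the role of BRCA1 in DNA damage repair?")["citations"] would return the list of cited PMIDs from the reply.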

Lab Dojo is open source under the MIT License. Built by JuiceVendor Labs Inc.
