Senior Data Scientist
We are building the next generation of mental health diagnostics by developing a high-precision Medical Decision-Support Engine.
Why: diagnosis in psychiatry today is largely subjective:
clinicians rely on personal judgment and different schools of thought
treatment decisions (especially antidepressants) are often based on trial and error
diagnostic tools are outdated and inconsistent
Our goal is to make mental health diagnostics significantly more objective.
Our system combines the reasoning capabilities of LLMs with structured Medical Knowledge Graphs to give clinicians evidence-based decision support. The diagnostic engine will integrate:
patient-reported data
behavioral signals
emerging biomarkers
structured and unstructured medical knowledge
Diagnosis in mental health should shift from just “a doctor’s opinion” to a more objective, data-informed assessment.
We are looking for a Senior Data Scientist who can bridge the gap between "black-box" AI and safe, interpretable clinical practice.
Responsibilities: what you will drive
Dataset Engineering & Validation: design robust pipelines for processing Electronic Health Records (EHR) and medical literature. Implement rigorous multi-stage validation frameworks (Sensitivity/Specificity analysis) to ensure clinical safety and model reliability.
LLM Fine-tuning: adapt Large Language Models using SFT, DPO, or PEFT (LoRA/QLoRA) for specialised medical domains and complex clinical diagnostic reasoning.
Advanced RAG & Graph RAG: architect hybrid retrieval systems that combine vector databases with Knowledge Graphs to reduce hallucinations and ensure factual grounding.
Explainability & Interpretability: develop methods to make model outputs transparent. The engine must provide "reasoning paths" that justify recommendations by citing specific medical evidence, clinical protocols, and graph relations.
Experience: what you bring
GenAI Stack: expert knowledge of Transformer architectures and hands-on experience fine-tuning LLMs (Llama 3, Mistral, etc.).
Graph ML: hands-on experience with Knowledge Graphs, Triple-stores, or Graph Databases (Neo4j, ArangoDB) and Graph Neural Networks (GNNs).
Retrieval Systems: proficiency in LangChain / LlamaIndex and vector search engines (Pinecone, Milvus, or Weaviate).
XAI Tools: practical experience with SHAP, LIME, or custom attention-mapping techniques for model interpretability.
Validation & Stats: strong background in statistical validation for high-stakes environments and in handling imbalanced, messy data: no ground truth and no clean, ready-to-use datasets.
Fine-tuning: identifying important signals and how they can be combined, and catching false model outputs early.
Perks: why join us
Solve the "why": you won't just build a model, you'll build a system that clinicians can trust because they understand its logic.
Cutting-edge stack: work at the absolute forefront of AI, combining LLMs with structured Knowledge Graphs (Graph RAG).
Meaningful social impact: your work directly contributes to better patient outcomes, faster recovery, and a significant reduction in diagnostic errors.
Published on: 5/14/2026

Pink Elephant VC
Pink Elephant VC is a venture capital fund focused on innovations in mental health.