AI Engineer

Remote, Senior

Ruby Labs is a tech company with a portfolio of consumer products in health, education, and entertainment (100M+ annual users). We’re looking for a senior AI Engineer (Node.js / Next.js / TypeScript) to shape our AI infrastructure and drive production-ready LLM experiences. You’ll work in a modern stack, making data-driven decisions around model performance, reliability, and cost. You’ll take full ownership of key AI features from experimentation to live production.

Responsibilities:

  • Advanced Prompt Engineering: Designing complex, dynamic prompt templates with conditional logic and efficiently reusing information and context within prompts to maximize generation quality and reasoning.

  • Structured Outputs & Schemas: Implementing structured response formats (JSON mode, function calling, Zod/JSON schemas) so that AI outputs are predictable and integrate cleanly into application logic.

  • Prompt Engineering & Evaluations: Building robust evaluation pipelines and using Langfuse to collect feedback and score the quality of responses in real time.

  • Tracing & Debugging: Performing deep debugging of complex LLM chains using Langfuse traces to identify bottlenecks and optimize for cost, latency, and context window usage.

  • AI A/B Testing: Running systematic experiments across different models via OpenRouter (e.g., comparing Claude 3.5 Sonnet vs. GPT-4o) and analyzing results based on quantitative metrics.

  • Data-Driven Decisions: Making deployment decisions for new prompts or models strictly based on quantitative benchmarks and trace data, rather than intuition.

  • Output Scoring & Analysis: Developing scoring systems to analyze the “Problem → Solution” chain and identify root causes of hallucinations or logic errors using Langfuse analytics.

  • Model Performance & Fine-Tuning: Regularly re-evaluating model performance as new architectures emerge and performing fine-tuning when necessary to meet specific domain requirements.

Qualifications:

  • Node.js & Next.js: Deep knowledge of the stack to build reliable services and handle complex LLM-generated data.

  • Dynamic Prompting Skills: Proven experience in building prompts where content is highly dependent on input variables and context injection.

  • OpenRouter Experience: Experience working with unified APIs, managing rate limits, and selecting the most cost-effective models for specific tasks.

  • Langfuse (or similar): Understanding of LLM observability principles — setting up tracing, creating test datasets, and integrating scoring systems.

  • Evaluation Methodology: Experience with frameworks like RAGAS or building custom “LLM-as-a-judge” systems.

  • Analytical Mindset: Ability to transform raw generation logs into actionable business metrics and technical insights.

  • Iterative Mindset: Focus on continuous product improvement through constant feedback loops.

  • Fluency in Russian and English.

Nice to have:

  • Fine-Tuning: Practical experience in fine-tuning models for specific domain tasks or JSON compliance.

  • RAG Architecture: Understanding how to build and optimize Retrieval-Augmented Generation systems, including indexing, retrieval, and re-ranking.

  • Python: Basic knowledge for working with data science scripts or AI evaluation libraries.

We offer:

  • Fully remote work (within ±4h of CET).

  • Unlimited PTO + paid national holidays.

  • Company-provided MacBook.

  • Flexible Independent Contractor agreement with tax advantages.

Published on: 4/21/2026

Ruby Labs

Ruby Labs is a leading tech company that creates innovative consumer products across the health, education, and entertainment sectors.
