Research Engineer, Multi-Domain Alignment (SLM)
Full Time
GIFT City, Gandhinagar, Gujarat, India (IAIRO HQ)
About the Role
IAIRO is seeking a Research Engineer to join our Alignment team. We are committed to advancing the frontier of Sovereign AI by developing compact, "right-sized" models that are as steerable and reliable as their massive counterparts.
In this role, you will bridge the gap between base pretraining and real-world deployment. You won’t just be fine-tuning checkpoints; you will be architecting the behavioral logic and safety frameworks for a new class of multimodal SLMs. Your work will focus on multi-domain alignment, ensuring our models can transition seamlessly between specialized fields such as Legal, Healthcare, and Industrial Robotics while maintaining rigorous adherence to human intent and cultural values.
Core Responsibilities
Frontier Alignment Research: Design and implement scalable alignment pipelines (SFT, DPO, PPO) to optimize 1B–7B parameter models for high-stakes, domain-specific tasks.
Advanced Preference Modeling: Architect reward models and preference datasets that capture nuanced domain expertise, moving beyond generic "helpfulness" to expert-level reasoning.
Multi-Domain Synthesis: Develop innovative techniques to mitigate "alignment drift" and "catastrophic forgetting" when models are specialized across disparate industries (e.g., ensuring a model stays factually grounded in domain contexts while remaining flexible in creative ones).
Evaluation & Red-Teaming: Devise rigorous, automated benchmarking suites (LLM-as-a-judge) and adversarial testing frameworks to validate model robustness in "out-of-distribution" scenarios.
Open Source & Transparency: Contribute to the broader AI community by open-sourcing high-quality code and producing reproducible research that impacts the Sovereign AI ecosystem.
Required Skills & Experience
Master’s or PhD in Computer Science or Machine Learning, or equivalent practical experience training large-scale models.
Expertise in Python and PyTorch, specifically within the Hugging Face ecosystem (Transformers, TRL, PEFT, Accelerate).
Proven Alignment Track Record: Significant experience with RLHF (Reinforcement Learning from Human Feedback), Direct Preference Optimization (DPO), and Constitutional AI.
Scaling Knowledge: A deep understanding of Scaling Laws and the "Alignment Tax," and how to maximize performance in compute-constrained environments.
Multimodal Familiarity: Experience aligning models that process not just text, but visual and sensor-based data.
Bonus Qualifications
Publication Record: Research results published at leading venues such as NeurIPS, ICML, ICLR, or MLSys.
Synthetic Data Engineering: Experience building high-fidelity synthetic data pipelines to improve multi-step reasoning and logic.
Hardware Awareness: Familiarity with optimizing inference engines (vLLM, TensorRT-LLM) or writing custom kernels (Triton/CUDA) for deployment on edge devices.
Apply for this Role
To apply, send your resume and relevant details to careers@iairo.ai.
