
Senior LLM Engineer
- San Sebastián, Guipúzcoa
- Permanent
- Full-time
- Design and implement strategies for creating, sourcing, and augmenting datasets tailored for LLM training and fine-tuning.
- Develop scalable pipelines to collect, clean, filter, annotate, and validate large volumes of text data, ensuring quality, ethical compliance, etc.
- Collaborate with ML engineers, researchers, and software engineers on ambitious goals in LLM preparation and related work (dataset preparation, model evaluation, model serving, etc.).
- Develop and integrate new routines for modifying, enhancing, and extending the functionality of LLMs.
- Make effective use of distributed compute resources and GPU clusters, and identify opportunities for further optimization.
- Prepare compressed and specialized LLMs end to end for production use.
- Keep up to date with research trends in LLM foundation models, dataset curation, LLM pretraining data, and benchmarking.
- Contribute to building documentation, development standards, and a healthy shared code base.
- Mentor other engineers and share knowledge of cutting-edge techniques.
- Master’s or Ph.D. in Computer Science, AI, Data Science, Physics, Math, or a related field, or equivalent industry experience.
- 3+ years of experience in data science, machine learning, or related roles, with demonstrated experience with NLP or LLMs.
- In-depth knowledge of large foundation model architectures (language and multimodal models) and their lifecycle: training, fine-tuning, alignment, and evaluation.
- Proficient in Python and data tooling ecosystems (Pandas, NumPy, Hugging Face Datasets & Transformers libraries).
- Hands-on experience with text data collection from diverse sources: web scraping, APIs, proprietary corpora, etc.
- Strong understanding of data quality metrics including bias detection, toxicity, and readability.
- Experience working in large, shared, distributed computing environments, and familiarity with relevant hardware-optimization tools (vLLM, TensorRT, NeMo, etc.).
- Experience with version control (git), unit testing, and other fundamental aspects of software development.
- Effective communication and interpersonal abilities.
- Experience building or contributing to datasets used in LLM pretraining or supervised fine-tuning.
- Experience building foundation LLMs from the ground up.
- Familiarity with alignment techniques (e.g., reinforcement learning, preference modeling, reward modeling).
- Exposure to multilingual and low-resource language datasets.
- Contributions to open-source datasets, tools, or publications in dataset-centric research.
- Knowledge of ethical AI, data governance, privacy laws (e.g., GDPR), and responsible data use.
- Familiarity with the software development lifecycle and agile methodologies.
- Indefinite contract.
- Equal pay guaranteed.
- Variable performance bonus.
- Signing bonus.
- Work visa sponsorship (if applicable).
- Relocation package (if applicable).
- Private health insurance.
- Eligibility for educational budget according to internal policy.
- Hybrid working model.
- Flexible working hours.
- Language classes and discounted lunch options.
- A fast-paced environment working on cutting-edge technologies.
- Career plan. Opportunity to learn and teach.
- Progressive company with a happy-people culture.