Middle/Senior Data Engineer — LLM (Muxlisa AI)
Working conditions
We are building a national Large Language Model (LLM) for the Uzbek language from scratch, covering everything from data pipelines to fine-tuning and production deployment. We are looking for a Data Engineer to own the full data lifecycle for the LLM: collection, cleaning, validation, delivery to training pipelines, and support for RAG systems.
Responsibilities:
- Pipeline Development: Build scalable data pipelines for training and evaluating LLMs, as well as for RAG systems.
- Data Processing: Collect, clean, normalize, deduplicate, cluster, and validate massive text datasets (news, books, web content, legal documents, conversational speech).
- Quality Assurance: Develop automated data quality assurance systems (filters, heuristics, and ML-based validation methods).
- Tooling: Create and support tools for data annotation, verification, and curation.
- Dataset Preparation: Prepare datasets for:
  - Instruction tuning;
  - Supervised fine-tuning (SFT);
  - "Question-Answer" pair generation;
  - Translation and summarization tasks.
- Collaboration: Work with ML Engineers on tokenizer training, LoRA/QLoRA tuning, data versioning, and experiment tracking.
- RAG Implementation: Develop RAG pipelines (document ingestion, chunking, vectorization, retrieval) and integrate custom inference solutions into production.
- Optimization: Configure workflows to maximize GPU efficiency during training and inference.
Requirements:
- Python: Deep knowledge of Python for data processing and building ML pipelines.
- ML Stack: Solid experience with PyTorch and Hugging Face Transformers, plus a basic understanding of CUDA (sufficient for collaboration with the ML team).
- LLM Expertise: Practical experience preparing datasets for LLMs, working with Instruction tuning and SFT.
- Frameworks: Experience with LangChain or LlamaIndex; understanding of LoRA/QLoRA and tokenizer training processes.
- DevOps Tools: Proficiency in Git and Docker.
- Backend: Experience with backend frameworks (FastAPI or similar).
Conditions:
- Schedule: 5 days a week, from 09:00 to 18:00.
- Employment: Official employment in accordance with the Labor Code of the Republic of Uzbekistan, with 28 calendar days of vacation.
- Dress Code: No strict dress code; we aim to break stereotypes about government-related organizations.
- Team: Work within a strong team of professionals ready to share knowledge and experience.
- Impact: Participation in large-scale, significant projects that create services to improve the population's quality of life and optimize business processes in the country's leading enterprises.
- Autonomy: Broad opportunities for independent decision-making and active influence on the company's development.