UZINFOCOM

Middle/Senior Data Engineer — LLM (Muxlisa AI)

Full-time

3-6

Main Office

10.02.2026

Working conditions

We are building a National Large Language Model (LLM) for the Uzbek language from scratch — from developing data pipelines to fine-tuning and production deployment. We are looking for a Data Engineer to take ownership of the full data lifecycle for the LLM: collection, cleaning, validation, feeding into training pipelines, and supporting RAG systems.

Responsibilities:

  • Pipeline Development: Build scalable data pipelines for training and evaluating LLMs, as well as for RAG systems.

  • Data Processing: Collect, clean, normalize, deduplicate, cluster, and validate massive text datasets (news, books, web content, legal documents, conversational speech).

  • Quality Assurance: Develop automated data quality assurance systems (filters, heuristics, and ML-based validation methods).

  • Tooling: Create and support tools for data annotation, verification, and curation.

  • Dataset Preparation: Prepare datasets for:

    • Instruction tuning;

    • Supervised fine-tuning (SFT);

    • "Question-Answer" pair generation;

    • Translation and summarization tasks.

  • Collaboration: Collaborate with ML Engineers on tokenizer training, LoRA/QLoRA tuning, data versioning, and experiment tracking.

  • RAG Implementation: Develop RAG pipelines (document ingestion, chunking, vectorization, retrieval) and integrate custom inference solutions into production.

  • Optimization: Configure workflows to maximize GPU efficiency during training and inference.

Requirements:

  • Python: Deep knowledge of Python for data processing and building ML pipelines.

  • ML Stack: Solid experience with PyTorch and HuggingFace Transformers, plus a basic understanding of CUDA (sufficient for collaboration with the ML team).

  • LLM Expertise: Practical experience preparing datasets for LLMs, including instruction tuning and SFT.

  • Frameworks: Experience with LangChain or LlamaIndex; understanding of LoRA / QLoRA and tokenizer training processes.

  • DevOps Tools: Proficiency in Git and Docker.

  • Backend: Experience with backend frameworks (FastAPI or similar).

Conditions:

  • Schedule: 5 days a week, from 09:00 to 18:00.

  • Employment: Official employment in accordance with the Labor Code of the Republic of Uzbekistan, with 28 calendar days of vacation.

  • Dress Code: No strict dress code — we aim to break stereotypes about government-related organizations.

  • Team: Work within a strong team of professionals ready to share knowledge and experience.

  • Impact: Work on large-scale projects that build services to improve the population's quality of life and optimize business processes at the country's leading enterprises.

  • Autonomy: Broad scope for independent decision-making and active influence on the company's development.

Interested in this vacancy?

Before applying, be sure to read the responsibilities and working conditions.