Projects

Selected AI projects covering retrieval systems, model evaluation, and sequence generation, with an emphasis on practical ML engineering and deployment.

NLP
RAG
OCR
UIT - VNUHCM
Expected graduation Jul. 2026
Nov. 2025 - Present

Audio2Text RAG System

PROJECT_2025

An end-to-end Retrieval-Augmented Generation service that ingests documents and audio, normalizes and indexes content, and supports multi-turn chat over a Milvus-backed knowledge base.

Python, FastAPI, LangChain, Hugging Face, Milvus, Redis, Docker

Hybrid retrieval with dense embeddings, BM25, and fusion.

Cross-encoder reranking for broad and ambiguous queries.

OpenAI-compatible SSE streaming API for Open WebUI integration.
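
The fusion method is not named above. As an illustration only, the sketch below uses reciprocal rank fusion (RRF), one common way to merge a dense-embedding ranking with a BM25 ranking. The function name, the document IDs, and the constant k=60 are assumptions for the example, not details taken from the project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) per list it appears in, so items
    ranked highly by more than one retriever rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical retriever outputs, best hit first.
dense_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(fused)
```

RRF needs only ranks, not raw scores, which avoids having to normalize cosine similarities against BM25 scores before combining them.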

Jan. 2026

Sentiment Analysis for Mental Health

PROJECT_2026

A comparative NLP system that evaluated classical baselines against a fine-tuned BERT model on a mental health dataset of 40,000+ samples, then deployed the model offering the best production trade-off.

Python, Scikit-learn, PyTorch, Hugging Face, NLTK, Optuna, Docker

Benchmarked Naive Bayes and Linear SVC against fine-tuned BERT.

Reached 84.14% macro F1 with the selected BERT model.

Deployed the final model as a full-stack NLP web application.
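
For readers unfamiliar with the metric, macro F1 is the unweighted mean of the per-class F1 scores, so minority classes count as much as frequent ones. A minimal sketch in plain Python follows; the class labels and predictions are made up for illustration and are not drawn from the project's dataset.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Hypothetical labels and predictions for six samples.
y_true = ["anxiety", "anxiety", "normal", "normal", "normal", "depression"]
y_pred = ["anxiety", "normal", "normal", "normal", "depression", "depression"]

score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.4f}")
```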

Dec. 2025 - Jan. 2026

LaTeX OCR

PROJECT_2025

An image-to-LaTeX system that compares six encoder-decoder architectures on the im2latex-100k dataset, using a Transformer decoder with beam search for stronger sequence generation.

Python, PyTorch, timm, Albumentations, OpenCV

Compared CNN, ResNet, and ViT encoder variants.

Achieved 91.46 BLEU-4 and 46.95% exact match.

Improved BLEU-4 by 14.6 points over the additive-attention baseline.
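
To show why beam search can help sequence generation, here is a generic, framework-free sketch. It is not the project's decoder: the `step_fn` callback, the toy probability table, and the token names are all hypothetical, and no length normalization is applied.

```python
import math


def beam_search(step_fn, start_token, end_token, beam_width=3, max_len=10):
    """Return the highest-scoring token sequence found by beam search.

    step_fn(tokens) is an assumed callback that returns a dict mapping each
    possible next token to its probability given the tokens so far.
    """
    beams = [([start_token], 0.0)]  # (tokens, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for token, prob in step_fn(tokens).items():
                if prob > 0:
                    candidates.append((tokens + [token], score + math.log(prob)))
        if not candidates:
            break
        # Keep only the beam_width best partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == end_token:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # fall back to unfinished hypotheses at max_len
    return max(finished, key=lambda c: c[1])[0]


# Toy next-token table: the locally best first token ("a") leads to a
# lower-probability full sequence than the alternative ("b").
TOY_MODEL = {
    ("<s>",): {"a": 0.6, "b": 0.4},
    ("<s>", "a"): {"</s>": 0.55, "c": 0.45},
    ("<s>", "b"): {"</s>": 0.9, "c": 0.1},
}


def toy_step(tokens):
    return TOY_MODEL.get(tuple(tokens), {"</s>": 1.0})


greedy = beam_search(toy_step, "<s>", "</s>", beam_width=1)
beam = beam_search(toy_step, "<s>", "</s>", beam_width=2)
print(greedy, beam)
```

With a beam width of 1 the search is greedy and commits to "a" (sequence probability 0.33), while a width of 2 keeps "b" alive long enough to find the better sequence (probability 0.36).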