Now

2025

July

  • Designed and implemented an LLM-based product ingestion service. Due to the latency of LLM inference, integrating this service with the existing product indexing pipeline would violate current SLA requirements.
  • The new service is implemented in Golang (I originally wanted to use Python, but the app size may affect service startup time). Some books helped me a lot:

Jun

  • On-device ML deployment: upgrading our customized MediaPipe for the latest Unity (6.1) as well as the new Android and iOS SDKs. The in-house version is far too old and has many issues, so I will probably need to fork MediaPipe again and re-implement our SDKs.

  • Developing a product summarizer service using ChatGPT. A POC is easy; developing a production-ready service is not. The main challenge is that an LLM is not a “real-time” service, and to save cost we may need to use the Batch API. For that, a whole new service needs to be developed, focusing on batch processing and on handling race conditions between database updates and batch processing.

  • Read:

May

April

  • Finished both the Cloud Computing and Text Mining courses. Both are practical and I quite like them, although I wish I could have spent more time on some of the topics taught in the lectures.
  • Worked on:
    • Non-English product search: adding non-English support to the indexing backend and autocomplete services.
    • Synonym set support: this is more about allowing merchandisers to define their own product taxonomy. The trickiest part is adapting vector search to users' custom definitions without re-training models for each customer.
  • Read:

Mar

  • Spent considerable effort hunting down bad search cases reported by customers. Reproducing the issues is easy, but actually pinning down the root cause is a completely different story. Long story short, it was due to wrong expansion terms from the sparse models.
  • Developed a new codebase for embedding finetuning. Loved uv after completely switching to it. I also reported some bugs to neural-cherche. Still struggling to get good performance with SPLADE; SparseEmbed gave me some hope, but not much.
  • Reading
  • Traveled to Bangkok: What a journey. Much fun, best food in SEA (aside from VN), and got scammed.
  • Study:
    • Tried some event detection models and implemented event deduplication. I don’t like the idea of using LLMs for everything, but an LLM actually is the best out-of-the-box solution for event detection.
    • Lots of coding for the other project: learned more about OpenAPI, FastAPI, MongoDB …

Feb

Jan

2024

Dec

  • Reading The Silmarillion.
  • Working on threshold optimization.
    • Accuracy is maintained while performance improves by 50-80%; the first feature is released at the start of 2025.
  • Improving multi-lingual search quality by finetuning BGE-M3.
    • It was a huge pain 😠 😠. I stopped working on it: the training is unstable, with random gradient spikes that completely destroy the run.
  • Studying: MLOps, AWS.
  • Overhauled my homepage. I like the simplicity of the new theme.
    • Also added vale to my workflow.
    • Decoupled the deployment of lora from the homepage.
    • Upgraded to 11ty v3.
    • Added GitHub comments.
  • (VN) Reviewed Dune.