Now
2025
April
- Finished both the Cloud Computing and Text Mining courses. Both are practical and I quite like them, although I wish I could have spent more time on some of the topics taught in lectures.
- Working on:
- Non-English product search, i.e., adding support to the indexing backend and autocomplete services.
- Synonym support. It's more about letting merchandisers define their own product taxonomy; the trickiest part is how to adapt vector search to these customized definitions without re-training models for each customer (a rough sketch of one query-time option follows this list).
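For the synonym problem above, a minimal sketch of one query-time option: keep the merchandiser-defined synonyms outside the model and average the embeddings of the query and its synonyms, so nothing has to be re-trained per customer. The synonym table and the MiniLM model here are stand-ins, not what actually runs in production.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Hypothetical per-customer synonym table defined by merchandisers.
SYNONYMS = {"acme-shop": {"sneakers": ["trainers", "running shoes"]}}

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in embedding model

def query_vector(customer: str, query: str) -> np.ndarray:
    # Expand the query with customer-defined synonyms at query time, so the
    # taxonomy lives outside the model and nothing needs re-training.
    variants = [query] + SYNONYMS.get(customer, {}).get(query.lower(), [])
    vectors = model.encode(variants, normalize_embeddings=True)
    return vectors.mean(axis=0)  # mean-pool variants into one search vector

print(query_vector("acme-shop", "sneakers").shape)  # (384,)
```

Whether mean-pooling is the right aggregation is an open question; it's just the simplest thing to write down.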
Mar
- Spent considerable effort hunting down bad search cases reported by customers. Reproducing the issues is easy, but actually pinning down the root cause is a completely different story. Long story short, it was due to wrong expansion terms from the sparse models (a sketch for inspecting these expansions is at the end of this month's notes).
- Developed a new codebase for embedding fine-tuning. Loved uv after completely switching to it. I also reported some bugs to neural-cherche. Still struggled to get good performance with SPLADE; SparseEmbed gave me some hope, but not much.
- Reading
- Introduction to Information Retrieval
- Amazon Web Services in Action: helped with completing the course assignments.
- AI-Powered Search: Engaging read, though some chapters were slightly disappointing.
- Traveled to Bangkok: What a journey. Much fun, best food in SEA (aside from VN), and got scammed.
- Study:
- Tried some event detection models and implemented event deduplication. I don't like the idea of using LLMs for everything, but an LLM actually is the best out-of-the-box solution for event detection.
- Lots of coding for the other project: learned more about OpenAPI, FastAPI, MongoDB, …
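Since the bad cases above boiled down to wrong expansion terms, here is a minimal sketch of how SPLADE-style expansions can be inspected with Hugging Face transformers. The checkpoint name is just a public SPLADE model, not necessarily the one behind our search.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# A public SPLADE checkpoint; not necessarily the model used in production.
NAME = "naver/splade-cocondenser-ensembledistil"
tokenizer = AutoTokenizer.from_pretrained(NAME)
model = AutoModelForMaskedLM.from_pretrained(NAME)

def expansion_terms(text: str, top_k: int = 10) -> list[tuple[str, float]]:
    batch = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits  # (1, seq_len, vocab_size)
    # SPLADE term weights: log(1 + ReLU(logits)), max-pooled over the tokens.
    weights = torch.log1p(torch.relu(logits)).max(dim=1).values.squeeze(0)
    top = torch.topk(weights, top_k)
    tokens = tokenizer.convert_ids_to_tokens(top.indices.tolist())
    return list(zip(tokens, [round(w, 2) for w in top.values.tolist()]))

# Shows which vocabulary terms a query gets expanded into, and how strongly.
print(expansion_terms("wireless earbuds"))
```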
Feb
- A lot of coding & hacking:
- Developing backend services for sports facility recommendation.
- Playing around with event detection and knowledge graph construction.
- Reading
Jan
- Working on (DNF):
- Spell checker: I'll write a post about the modern design of spell checkers for product search (a toy candidate-generation sketch is at the end of this month's notes).
- Introduce personalization to reranking.
- Try LoRA on language models.
- study@NUS: this semester I'll take the Cloud Computing and Text Mining courses.
- Reading
- Cloud Computing
- Introduction to Information Retrieval
- Programming Pearls: Will go through all exercises in the book.
- Done:
- DevOps, DataOps, MLOps: 2/5 - NOT Recommended.
- GNU Make Book: Review.
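On the unfinished spell checker: as a placeholder until that post, a toy sketch of the first stage only, candidate generation over a product vocabulary. Everything here (the vocabulary, the difflib cutoff) is illustrative, not a design I'd actually ship.

```python
from difflib import get_close_matches

# Toy product vocabulary; a real one would be mined from the catalog and query logs.
VOCAB = ["sneakers", "backpack", "headphones", "sunscreen", "notebook"]

def suggest(term: str, n: int = 3) -> list[str]:
    # Candidate generation only: find vocabulary terms close to the typo.
    # Production spell checkers usually pair an edit-distance index (trie /
    # SymSpell) with a ranker that uses query context and behavioral signals.
    return get_close_matches(term.lower(), VOCAB, n=n, cutoff=0.7)

print(suggest("sneekers"))  # -> ['sneakers']
```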
2024
Dec
- Reading The Silmarillion.
- Working on threshold optimization.
- Accuracy is maintained while performance improves by 50-80%; the first feature will be released at the start of Y25.
- Improving multi-lingual search quality by fine-tuning BGE-M3.
- It was a huge pain 😠 😠. I stopped working on it; the training is not stable, with random gradient spikes that completely destroy the run (a sketch of the usual mitigation, gradient clipping, is at the end of this section).
- Studying: MLOps, AWS.
- Overhauled my homepage. I like the simplicity of the new theme.
- Also added vale to my workflow.
- Decoupled the deployment of lora from the homepage.
- Upgraded to 11ty v3.
- Added GitHub comments.
- (VN) Reviewed Dune.
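On the unstable BGE-M3 fine-tuning: the standard mitigation for random gradient spikes is gradient clipping (plus a lower learning rate); I haven't verified that it rescues this particular run. A minimal PyTorch-style sketch of where the clipping call sits, with model, batch, targets, and loss_fn as placeholders:

```python
import torch

def train_step(model, batch, targets, optimizer, loss_fn, max_norm: float = 1.0):
    """One generic fine-tuning step; model, batch, targets, loss_fn are placeholders."""
    optimizer.zero_grad()
    loss = loss_fn(model(batch), targets)
    loss.backward()
    # Clip the global gradient norm so one spiky batch can't blow up the weights.
    # The returned pre-clip norm is handy for logging where the spikes happen.
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return loss.item(), grad_norm.item()
```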