Khmer Machine Learning (ML) Experiment

We built our first Khmer language model with ULMFiT, using Khmer Wikipedia text and 1,000 Khmer news articles as training data. ULMFiT requires far fewer resources to train than BERT. See how well the model completes a sentence from the text you provide. For now, your input text must use spaces to separate words.

Tool: Khmer Language Model Using ULMFiT

ML-Related Blog Posts

  1. Using AI to Generate Khmer Baby Names (Jul 2018)
  2. How Ranks Khmer News (Aug 2019)
  3. Text Classification with scikit-learn on Khmer Documents (Feb 2019)
  4. Multi-Class Text Classification on Khmer News Articles (Aug 2019)
  5. Word Segmentation of Khmer Text Using Conditional Random Fields (Dec 2019)
    1. NLP: Text Segmentation Using Dictionary Based Algorithms
    2. NLP: Text Segmentation with Ngram
    3. NLP: Text Segmentation Using Naive Bayes
    4. NLP: Text Segmentation Using Hidden Markov Model
    5. NLP: Text Segmentation Using Maximum Entropy Markov Model
    6. NLP: Text Segmentation Using Conditional Random Fields
  6. Khmer Language Model Using ULMFiT (Feb 2020)
  7. A Survey of the State-of-the-Art Language Models up to Early 2020 (Feb 2020)
  8. Creating a Khmer Language Model using BERT (Apr 2020)
  9. Visualizing Covid-19 Data in Cambodia using Google Map API with React JS (Sep 2020)
  10. Overview of Time Series Forecasting from Statistical to Recent ML Approaches (Dec 2020)

Presentation Slides

  • Master Algorithm - SCV ML Meetups (Aug 2019) slides
  • Introduction to NLP: From Word Vectors to Language Models - SCV ML Meetup (Jun 2020) slides
  • Conceptual Introduction to Deep Learning - SCV ML Meetups (Aug 2020) slides
  • Getting Started with Text Summarization Using Transformer - SCV ML Meetups (Feb 2021) slides

