Paaila Technology

a robotics & ai company

Getting Started With (Nepali) Natural Language Processing


April 16, 2018     By: Wasim Akram Khan

Natural Language Processing (NLP) is needed to make sense of text data, which forms a large share of the data on the internet. It has many exciting applications: sentiment analysis to gauge the public mood during elections, recommendation systems that helped turn online retailers into giants, text summarization to automate business processes in an age of information overload, chatbots and question answering systems, information retrieval, and the development of general intelligence in machines. NLP concepts generalize across languages, which is why "Nepali" sits in parentheses in the title. Getting started with NLP is easy, while exploring it in depth requires curiosity, time and effort. The resources below are ordered by level of difficulty.

Resources to learn NLP


  1. NLTK (Natural Language Toolkit) is the first framework one should explore because it was built for research purposes. Its documentation amounts to a book, Natural Language Processing with Python, which is probably the first thing to read when getting started with NLP.

  2. Alternatively, if you have not read Natural Language Processing with Python and prefer video, Sentdex has you covered with videos that explain the concepts well and show interesting applications.

  3. Peter Norvig, director of research at Google Inc., has a great tutorial explaining the concepts and applications of NLP. It covers word sequence probabilities, frequency smoothing, bigrams and spell checkers. Spell checkers are an underrated technology whose absence is felt in the Nepali language.

  4. TF-IDF (Term Frequency-Inverse Document Frequency) is one of the most important concepts in information retrieval, and no one explains it better than Christian Perone.

  5. Now that you know how to implement vectorizers, word counters, TF-IDF computation and bag of words, here is a detailed guide showing how to do them efficiently using sklearn.

  6. In case you want to make sure you are not missing any of the concepts that graduate students of NLP master, here is the Stanford NLP syllabus. Taught by Richard Socher and Christopher Manning, the Stanford NLP video lecture series is the best set of NLP video lectures available on the internet.
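
To make the TF-IDF idea from the list above concrete, here is a minimal sketch in plain Python. It assumes the textbook definition (term frequency times the natural log of the inverse document frequency) and a toy three-sentence English corpus invented for illustration; the function name `tf_idf` is hypothetical and is not taken from any of the resources listed. sklearn's `TfidfVectorizer` uses a smoothed, normalized variant, so its numbers will differ from these.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    `documents` is a list of token lists. Textbook variant (an assumption):
    tf = count / document length, idf = ln(N / document frequency).
    """
    n_docs = len(documents)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in documents:
        df.update(set(tokens))

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus, made up for this example.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
scores = tf_idf(docs)

# Terms sorted from most to least distinctive for the first document.
ranked = sorted(scores[0], key=scores[0].get, reverse=True)
print(ranked)
```

A word that appears in every document ("the") gets a weight of zero, while a word unique to one document ("mat") gets the highest weight in it. That is the whole point of the inverse document frequency term.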

Frameworks to Explore NLP


Man maketh frameworks, frameworks not maketh man


After you have mastered the NLP concepts, mastering frameworks is worthwhile. Being able to use a framework without understanding the core concepts does not take one far in the marathon of making computers understand language.


  1. NLTK (Natural Language Toolkit)

  2. scikit-learn: A lightweight, general-purpose machine learning framework in which you can implement NLP applications such as spell checkers, sentiment analysis and QA systems.

  3. spaCy: Similar to NLTK but built for industrial use. It has good documentation and implements useful features such as word vectors. However, it should not be a beginner's framework, since it explicitly presents itself as a tool for industry (people with a better understanding of NLP).
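
Before reaching for any of these frameworks, it helps to see that a concept such as a spell checker fits in a few lines of plain Python. The sketch below follows the general edit-distance idea popularized by Peter Norvig's tutorial, but it is a simplified illustration, not his code: the function names, the tiny English word list and the Latin alphabet are all assumptions. For Nepali, the alphabet would have to be replaced with Devanagari characters and the counts would have to come from a real corpus.

```python
from collections import Counter

# Assumption: lowercase Latin letters; a Nepali checker would need Devanagari.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts):
    """Return the most frequent known word within one edit of `word`.

    Known words and words with no known neighbour are returned unchanged.
    """
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    if not candidates:
        return word
    return max(candidates, key=counts.get)


# Toy word frequencies, made up for this example.
counts = Counter("the quick brown fox jumps over the lazy dog the end".split())

print(correct("teh", counts))
print(correct("lazzy", counts))
```

A framework adds tokenization, larger vocabularies and speed on top of this, but the core concept stays the same: generate nearby candidates and rank them by how likely they are.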