NLP is a part of text mining, so let's understand what exactly natural language processing is. Natural language processing is a component of text mining that basically helps a machine read text. Machines don't know English or French; they interpret data in the form of zeroes and ones. This is where natural language processing comes in. NLP is what computers and smartphones use to understand our language, both spoken and written. Because we use language to interact with our devices, NLP has become an integral part of our lives. NLP uses concepts of computer science and artificial intelligence to study data and derive useful information from it.
Now before we move any further, let's look at a few applications of NLP and text mining. We all spend a lot of time surfing the web. Have you ever noticed that if you start typing a word on Google, you immediately get suggestions? This feature is known as auto-complete; it suggests the rest of the word for you. We also have spell correction: here is an example of how Google recognizes the misspelling "Netflix" and shows results for keywords that match your misspelling. Spam detection is also based on the concepts of text mining and natural language processing. Next, we have predictive typing and spell checkers. Features like auto-correct and email classification are all applications of text mining and NLP.

Now let's look at a couple more applications of natural language processing. We have something known as sentiment analysis. Sentiment analysis is extremely useful in social media monitoring because it allows us to gain an overview of the wider public opinion on certain topics. So sentiment analysis is used to understand the public's or a customer's opinion on a certain product or topic. It is a very big part of a lot of social media platforms; Twitter and Facebook use sentiment analysis very frequently. Then we have something known as a chatbot. Chatbots are the solution to consumer frustration with customer call assistance, and companies like Pizza Hut and Uber have started using chatbots to provide good customer service. Apart from that, there is speech recognition, where NLP has been widely used. We're all aware of Alexa, Siri, Google Assistant, and Cortana; these are all applications of natural language processing. Machine translation is another important application of NLP. An example of this is Google Translate, which uses NLP to process and translate one language into another.
Other applications include spell checkers, keyword search, and information extraction. NLP can be used to get useful information from various websites, Word documents, files, et cetera. It can also be used in advertisement matching, which means recommending ads based on your history. So now that you have a basic understanding of what exactly natural language processing is and where it is used, let's take a look at some important concepts.
Firstly, we're gonna discuss tokenization. Tokenization is the most basic step in text mining. It means breaking down data into smaller chunks, or tokens, so that they can be easily analyzed. Tokenization works by breaking a complex sentence into words. You'll understand the importance of each word with respect to the whole sentence, after which we'll produce a description of the input sentence. For example, let's say we have the sentence "tokens are simple". If we apply tokenization to this sentence, what we get is the individual words "tokens", "are", and "simple". We're just breaking the sentence into words, then understanding the importance of each of those words. We'll perform an NLP process on each of these words to understand how important each word is in the entire sentence. To me, "tokens" and "simple" are the important words here, while "are" is a stop word. We'll be discussing stop words in later slides, but for now, you need to understand that tokenization is a very simple process that involves breaking sentences into words.
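The steps above can be sketched in a few lines of Python. This is a minimal illustration using a simple regular expression and a tiny made-up stop-word list; real toolkits handle punctuation, contractions, and stop words far more carefully.

```python
import re

# Illustrative stop-word list; real NLP toolkits ship much longer ones.
STOP_WORDS = {"are", "is", "the", "a", "an"}

def tokenize(sentence):
    """Break a sentence into lowercase word tokens with a simple regex."""
    return re.findall(r"[a-z']+", sentence.lower())

tokens = tokenize("Tokens are simple.")
print(tokens)      # ['tokens', 'are', 'simple']

# Keep only the content words, dropping stop words like 'are'.
content = [t for t in tokens if t not in STOP_WORDS]
print(content)     # ['tokens', 'simple']
```

Once the sentence is split into tokens like this, each token can be analyzed on its own, which is exactly why tokenization comes first in the pipeline.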
Next, we have something known as stemming. Stemming is normalizing words into their base or root form. Take a look at this example: we have words like detection, detecting, detected, and detections. We all know that the root word for all of these is "detect"; all these words refer to detecting. A stemming algorithm works by cutting off the end or the beginning of the word, taking into account a list of common prefixes and suffixes that can be found in inflected words. Stemming is successful in some cases, but not always, which is why a lot of people affirm that stemming has a lot of limitations. To overcome the limitations of stemming, we have something known as lemmatization. What lemmatization does is take into consideration the morphological analysis of the word. To do so, it is necessary to have a detailed dictionary that the algorithm can look through to link the form back to its lemma. So lemmatization is quite similar to stemming: it maps different words onto one common root. But in stemming, sometimes too much of the word gets cut off. Say we wanted to reduce "detection" to "detect"; a stemmer may cut too much or too little and end up with something that isn't a real word at all. Because of this, the grammar or the meaning of the word can be lost.
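The contrast between the two can be sketched as follows. This is a toy illustration, not a real stemmer or lemmatizer: the suffix list and the lemma dictionary are assumptions made up for the example (production systems use rule sets like Porter's stemmer and dictionaries like WordNet).

```python
# Crude suffix-stripping stemmer: chop common endings, longest first.
# Real stemmers such as Porter's apply ordered rules with extra conditions.
SUFFIXES = ("ions", "ing", "ion", "ed", "s")

def stem(word):
    for suffix in SUFFIXES:
        # Require a stem of at least 3 letters so we don't over-cut.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for w in ["detection", "detecting", "detected", "detections"]:
    print(w, "->", stem(w))   # all four reduce to 'detect'

# Lemmatization instead looks the surface form up in a dictionary to
# find its lemma. This tiny dictionary is purely illustrative.
LEMMAS = {"am": "be", "is": "be", "are": "be", "better": "good"}

def lemmatize(word):
    return LEMMAS.get(word, word)

print(lemmatize("are"))       # 'be'
```

Note how the dictionary lookup can map "are" to "be", something no suffix-cutting rule could do; that is the kind of case where lemmatization beats stemming.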