Description

The script performs the following tasks:

Tokenization:

It breaks down the text into words, phrases, symbols, or other meaningful elements called tokens. The output is a list of tokens.
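As a minimal sketch of this step (not necessarily the script's exact code), the snippet below tokenizes a made-up sample sentence. It assumes a blank English pipeline, which contains only the tokenizer; loading `en_core_web_sm` instead would tokenize the same way.

```python
import spacy

# A blank English pipeline contains only the tokenizer,
# which is all this step needs (assumption: the script may load a full model).
nlp = spacy.blank("en")

# Sample sentence made up for illustration.
text = "spaCy splits text into tokens, such as words and punctuation."
doc = nlp(text)

# Each Token object exposes its original string through .text.
tokens = [token.text for token in doc]
print(tokens)
```

Punctuation marks come out as separate tokens, which is what makes the later filtering steps possible.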

Stop Words Removal:

It removes common words (like 'is', 'the', 'and') that carry little meaning on their own.
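One way to do this with spaCy is to filter on the `is_stop` token attribute, as in the sketch below. The sample sentence is an assumption, and the script may build its filter differently.

```python
import spacy

# is_stop is a lexical attribute, so a blank pipeline is enough here.
nlp = spacy.blank("en")

# Sample sentence made up for illustration.
doc = nlp("The cat is on the mat and the dog is in the yard")

# Keep only tokens that are not in spaCy's built-in English stop word list.
content_tokens = [token.text for token in doc if not token.is_stop]
print(content_tokens)
```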

Punctuation Removal:

It removes punctuation from the text.
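A short sketch of punctuation removal, assuming the `is_punct` token attribute is used as the filter (the sample text is made up):

```python
import spacy

# is_punct is a lexical attribute, so a blank pipeline is enough here.
nlp = spacy.blank("en")

# Sample text made up for illustration.
doc = nlp("Hello, world! This is spaCy.")

# Drop every token that spaCy flags as punctuation.
words = [token.text for token in doc if not token.is_punct]
print(words)
```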

Frequency Count:

It counts the frequency of each word in the text and prints the 5 most common words.
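The counting step can be sketched with `collections.Counter`. This example assumes that words are lowercased and that stop words and punctuation are dropped before counting; the README does not say whether the script does either, and the sample text is made up.

```python
from collections import Counter

import spacy

nlp = spacy.blank("en")

# Sample text made up for illustration.
text = "Data science needs data. Data drives science, and science drives decisions."
doc = nlp(text)

# Assumption: count lowercased words after dropping stop words,
# punctuation, and whitespace tokens.
words = [
    token.text.lower()
    for token in doc
    if not token.is_stop and not token.is_punct and not token.is_space
]

word_freq = Counter(words)
top_five = word_freq.most_common(5)

for word, count in top_five:
    print(word, count)
```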

Lemmatization:

It reduces the words to their base or root form (for example, 'running' to 'run').

Part-of-Speech (POS) Tagging:

It labels each word in the text with its part of speech (noun, verb, adjective, and so on).

Named Entity Recognition (NER):

It identifies and classifies named entities in the text into predefined categories like person names, organizations, locations, etc.

Dependency Parsing Visualization:

It visualizes the grammatical structure of sentences, depicting how words relate to each other.

Requirements

Python
spaCy
en_core_web_sm (spaCy's small English model, installable with `python -m spacy download en_core_web_sm`)

linuxphile/nlp_lesson
