[FEATURE]: Consumer Complaint Dataset: Leveraging an NLP Pipeline to Analyze Financial Consumer Complaints #1125

ananas304 · 2024-10-15T17:11:44Z

Issue Description

This issue proposes adding a new folder for the project "Consumer Complaint Dataset: Leveraging an NLP Pipeline to Analyze Financial Consumer Complaints". The folder will contain the dataset, a Jupyter Notebook (.ipynb), and a README.md file. The README will document the project's purpose, the dataset preprocessing, and key features such as the consolidated categories.

Suggested Change

The following actions will be taken:

Create a new folder named Consumer Complaint Dataset under the appropriate directory.
Add the dataset file that was downloaded and preprocessed from the Consumer Financial Protection Bureau (CFPB) website.
Include a Jupyter Notebook (.ipynb) that demonstrates the analysis, preprocessing, and any implemented NLP pipeline tasks such as classification or topic modeling.
Add a README.md file that includes the following sections:
- Project Title and Description: Explain the purpose of the dataset and the NLP pipeline tasks.
- Dataset Overview: Details about the source and the preprocessing steps: keeping only records that have a "Consumer complaint narrative" and renaming that column to "narrative".
- Category Consolidation: Document the merging of 18 original product categories into the product_5 variable with five main categories.
- Running the Jupyter Notebook: Instructions on how to set up the environment, load the dataset, and run the code.
- Visualizations: Include graphs showing the distribution of the original and consolidated categories.
- Potential Uses: Outline possible NLP tasks (classification, sentiment analysis, topic modeling) using the dataset.
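
For reference, the preprocessing and category consolidation described above could be sketched roughly as follows. This is a minimal illustration on a toy DataFrame, not the notebook's actual code: the use of pandas, the sample rows, and the `PRODUCT_5_MAP` dictionary (including its category labels) are assumptions, since the issue does not list the real 18-to-5 grouping. Only the column names "Consumer complaint narrative", "narrative", and `product_5` come from the issue text.

```python
import pandas as pd

# Toy stand-in for the CFPB export (assumed shape; the real file has many more rows and columns).
raw = pd.DataFrame({
    "Product": ["Mortgage", "Student loan", "Credit card", "Debt collection", "Mortgage"],
    "Consumer complaint narrative": [
        "My escrow payment was misapplied.",
        None,                                   # no narrative: dropped
        "I was charged a late fee in error.",
        "The collector keeps calling my workplace.",
        "",                                     # empty narrative: dropped
    ],
})

# Hypothetical mapping for illustration only; the issue does not spell out
# which original categories fold into which of the five consolidated ones.
PRODUCT_5_MAP = {
    "Mortgage": "mortgages_and_loans",
    "Student loan": "mortgages_and_loans",
    "Credit card": "credit_card",
    "Debt collection": "debt_collection",
}


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Rename the narrative column, keep rows with text, and add product_5."""
    out = df.rename(columns={"Consumer complaint narrative": "narrative"})
    # Keep only records whose narrative is present and not blank.
    has_text = out["narrative"].fillna("").str.strip() != ""
    out = out[has_text].copy()
    # Consolidate the original product labels into the smaller category set.
    out["product_5"] = out["Product"].map(PRODUCT_5_MAP)
    return out.reset_index(drop=True)


clean = preprocess(raw)

# Category distributions before and after consolidation.
original_counts = clean["Product"].value_counts()
consolidated_counts = clean["product_5"].value_counts()
```

The two `value_counts()` results are the kind of input the distribution graphs under "Visualizations" would be drawn from. Any original category missing from the mapping would show up as a missing value in `product_5`, which is an easy way to check that the consolidation covers every label.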

Rationale

Adding this folder and its files keeps the project organized and ensures that all required materials are available to contributors and users. The README will provide detailed documentation, making it easier for others to understand the dataset, the preprocessing steps, and how to use the .ipynb file for NLP analysis. This will improve the project's usability and transparency and support collaboration and further development.

Please add the following tags to the issue: gssoc, gssoc-ext, hacktoberfest.
Kindly assign this issue to @ananas304.

Thank you for your time :)


github-actions · 2024-10-15T17:12:11Z

Thanks for creating the issue. Please read the pinned issues first, and the Readme.md for each pull request you make. Keep learning...

monskey6574 assigned ananas304 Oct 26, 2024

monskey6574 added gssoc-ext gssoc hacktoberfest accepted labels Oct 26, 2024

ananas304 mentioned this issue Oct 29, 2024

Consumer Complaint Dataset: Leveraging an NLP Pipeline to Analyze Financial Consumer Complaints #1169

Merged
