[FEATURE]: Consumer Complaint Dataset: Leveraging an NLP Pipeline to Analyze Financial Consumer Complaints #1125

Open
ananas304 opened this issue Oct 15, 2024 · 1 comment


Issue Description

This issue proposes adding a new folder for the project "Consumer Complaint Dataset: Leveraging an NLP Pipeline to Analyze Financial Consumer Complaints". The folder will contain the dataset, a Jupyter Notebook (.ipynb), and a README.md file. The README will document the project's purpose, the dataset preprocessing, and key features such as the consolidated categories.

Suggested Change

The following actions will be taken:

  1. Create a new folder named Consumer Complaint Dataset under the appropriate directory.
  2. Add the dataset file that was downloaded and preprocessed from the Consumer Financial Protection Bureau (CFPB) website.
  3. Include a Jupyter Notebook (.ipynb) that demonstrates the analysis, preprocessing, and any implemented NLP pipeline tasks such as classification or topic modeling.
  4. Add a README.md file that includes the following sections:
    • Project Title and Description: Explain the purpose of the dataset and the NLP pipeline tasks.
    • Dataset Overview: Describe the source and the preprocessing steps, including filtering to records that have a "Consumer complaint narrative" and renaming that column to "narrative."
    • Category Consolidation: Document the merging of 18 original product categories into the product_5 variable with five main categories.
    • Running the Jupyter Notebook: Instructions on how to set up the environment, load the dataset, and run the code.
    • Visualizations: Include graphs showing the distribution of the original and consolidated categories.
    • Potential Uses: Outline possible NLP tasks (classification, sentiment analysis, topic modeling) using the dataset.
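
As a rough illustration of the preprocessing and category consolidation described above, here is a minimal sketch. It uses only the Python standard library so it runs without extra dependencies; the notebook itself may well use pandas instead. The sample rows, the `PRODUCT_5_MAP` mapping, the consolidated category names, and the `preprocess` helper are all hypothetical, since this issue does not list the 18 original categories or the five consolidated ones. The "Product" column name is also an assumption about the CFPB export.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the real CFPB export has many more columns and rows.
SAMPLE_CSV = """Product,Consumer complaint narrative
Credit card,I was charged a late fee twice.
Mortgage,
Credit card or prepaid card,My card was declined abroad.
Debt collection,A collector keeps calling my workplace.
Student loan,My payments were applied to the wrong loan.
"""

# Illustrative mapping only: the actual 18-to-5 consolidation is not
# specified in this issue, so these names are placeholders.
PRODUCT_5_MAP = {
    "Credit card": "Credit card",
    "Credit card or prepaid card": "Credit card",
    "Mortgage": "Mortgages and loans",
    "Student loan": "Mortgages and loans",
    "Debt collection": "Debt collection",
}


def preprocess(rows):
    """Keep rows with a non-empty narrative, rename the column, add product_5."""
    cleaned = []
    for row in rows:
        narrative = (row.get("Consumer complaint narrative") or "").strip()
        if not narrative:
            continue  # drop complaints that have no narrative text
        cleaned.append({
            "product": row["Product"],
            # Unmapped products fall back to a placeholder "Other" bucket.
            "product_5": PRODUCT_5_MAP.get(row["Product"], "Other"),
            "narrative": narrative,
        })
    return cleaned


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
complaints = preprocess(rows)

# Category distributions before and after consolidation (input for the plots).
original_counts = Counter(c["product"] for c in complaints)
consolidated_counts = Counter(c["product_5"] for c in complaints)
print(original_counts)
print(consolidated_counts)
```

The two `Counter` objects hold the counts that the Visualizations section would plot, one for the original categories and one for `product_5`.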

Rationale

Adding this folder and its files keeps the project organized and makes all required materials available to contributors and users. The README will document the dataset, the preprocessing steps, and how to use the .ipynb file for NLP analysis, which should make the project easier to use, more transparent, and easier to collaborate on.

  • Please add the following tags to the issue: gssoc, gssoc-ext, hacktoberfest.
  • Kindly assign this issue to @ananas304.

Thank you for your time :)


Thanks for creating the issue. Please read the pinned issues first and the Readme.md in each pull request you make. Keep learning...
