data-mining-project

Building a Classifier to Identify the Type of Federal Register Documents

Author Information

Mark Febrizio
[email protected]
DATS 6103 - Summer 2022

Project Description

The Federal Register is the daily journal of the U.S. government, with a new issue published each business day. Each issue is divided into four sections corresponding to four document types: Notices, Proposed Rules, Rules, and Presidential Documents. However, some data lack document type labels (e.g., much of the 1990s data). When researchers analyze agency actions using these data, the missing labels lead to a severe underestimation of the frequency of each document type and of the content related to specific topic areas. As a solution, I used the labeled documents to build a classifier for document type. After the classifier is trained and tested on labeled data using supervised learning models, it can be applied to the unlabeled documents to predict their missing labels.
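The approach described above, learning from labeled documents and then predicting labels for unlabeled ones, can be illustrated with a minimal multinomial Naive Bayes text classifier in pure Python. This is only a sketch and not the project's method. The models actually used in modeling_1.py through modeling_4.py are not described here. The toy training titles, the label strings, and the names NaiveBayesClassifier and tokenize are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # label -> number of documents
        self.word_counts = defaultdict(Counter)   # label -> token frequencies
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        # Total number of tokens seen for each label (denominator of likelihoods)
        self.total_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def predict_one(self, text):
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / self.n_docs)  # log prior P(label)
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][tok]
                # smoothed log likelihood P(token | label)
                score += math.log((count + 1) / (self.total_words[label] + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


# Hypothetical labeled training examples, one pair per document type
train_texts = [
    "notice of public meeting",
    "notice of availability and meeting",
    "proposed rule request for comments",
    "proposed amendments comments due",
    "final rule effective date",
    "final rule technical amendments",
    "executive order by the president",
    "proclamation by the president",
]
train_labels = [
    "Notice", "Notice",
    "Proposed Rule", "Proposed Rule",
    "Rule", "Rule",
    "Presidential Document", "Presidential Document",
]

model = NaiveBayesClassifier().fit(train_texts, train_labels)
predictions = model.predict([
    "final rule effective date",
    "proclamation by the president",
    "notice of public meeting",
])
print(predictions)
```

On real Federal Register data, a library implementation (for example, scikit-learn's vectorizers and classifiers) would typically replace this hand-rolled version. The overall workflow stays the same: fit on labeled documents, evaluate on a held-out labeled set, then call predict on the unlabeled ones.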

Code Sequence

The Python code should be run in the following sequence:

retrieve_FR_data.py
data_preprocessing.py
- calls modules: clean_agencies.py, columns_to_date.py
EDA.py
- calls modules: cm_to_heatmap.py
modeling_1.py
modeling_2.py
modeling_3.py
modeling_4.py

Executing main.py runs the code in this sequence.
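The sequence above can be sketched as a small runner that executes each script in order and stops at the first failure. This is a hypothetical illustration, not the contents of main.py. It assumes that the scripts sit together in one folder (such as the code folder) and that each script can run in a fresh interpreter with that folder as its working directory. The names PIPELINE, run_script, and run_pipeline are illustrative.

```python
import subprocess
import sys
from pathlib import Path

# Order taken from the Code Sequence list. The helper modules
# (clean_agencies.py, columns_to_date.py, cm_to_heatmap.py) are imported by
# the scripts that use them, so they are not run directly.
PIPELINE = [
    "retrieve_FR_data.py",
    "data_preprocessing.py",
    "EDA.py",
    "modeling_1.py",
    "modeling_2.py",
    "modeling_3.py",
    "modeling_4.py",
]


def run_script(path):
    """Run one script in a new Python process; raise CalledProcessError on failure."""
    subprocess.run([sys.executable, str(path)], check=True, cwd=path.parent)


def run_pipeline(code_dir, runner=run_script):
    """Execute each pipeline script in order, stopping at the first exception."""
    code_dir = Path(code_dir)
    completed = []
    for name in PIPELINE:
        runner(code_dir / name)
        completed.append(name)
    return completed
```

Inside a main.py that lives alongside the scripts, this could be invoked as `run_pipeline(Path(__file__).resolve().parent)`. The `runner` parameter exists so the ordering logic can be checked without actually executing the scripts.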

Contents

code
data
presentation
proposal
report
.gitattributes
.gitignore
FinalProject - Sum.pdf
LICENSE
README.md
environment.yml
project_idea_feedback.pdf

mfebrizio/data-mining-project

Folders and files

Latest commit

History

Repository files navigation

data-mining-project

Author Information

Project Description

Contents

Code Sequence

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages