Initial commit
weilinglindachen committed Apr 14, 2024
1 parent 7acc6a8 commit 1457e70
Showing 2 changed files with 58 additions and 0 deletions.
56 changes: 56 additions & 0 deletions hgct/README.md
@@ -0,0 +1,56 @@
# Text Analysis with HGCT

This repository showcases the [HGCT (Hanzi Glyph Corpus Toolkit)](https://yongfu.name/hgct/index.html), a toolkit for analyzing and querying Chinese text data, demonstrated here on a corpus built from textbook data.

## Installation
To use this package, clone the repository and install the required dependencies.

1. Python version: 3.0 or later, but earlier than 3.11

```text
3.0 <= Python < 3.11
```

2. Clone repository

```bash
git clone git@github.com:lopentu/HanziAnalysisKit.git
```

3. Install requirements

```bash
cd HanziAnalysisKit && pip install -r requirements.txt
```
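Before installing, it can help to confirm that the active interpreter falls inside the version range stated in step 1. The snippet below is a minimal sketch and not part of this repository: the helper name `python_version_supported` and the two constants are hypothetical, and the bounds are copied from the range given above.

```python
import sys

# Bounds copied from the installation notes: 3.0 <= Python < 3.11.
# These constants and the helper below are illustrative, not part of the repo.
MIN_VERSION = (3, 0)
MAX_VERSION_EXCLUSIVE = (3, 11)


def python_version_supported(version=None):
    """Return True if the (major, minor) version lies in the stated range."""
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


# Report on the interpreter that is running this snippet.
status = "inside" if python_version_supported() else "outside"
print("Python {}.{} is {} the supported range".format(
    sys.version_info[0], sys.version_info[1], status))
```

Comparing `(major, minor)` tuples avoids string comparison pitfalls such as `"3.9" > "3.11"`.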

## Quick Start

### Building the Corpus
To prepare your data for use with the HGCT tool, build your corpus from the provided textbook data:

```python
from textbook import build_corpus

# Specify your CSV data file and desired output folder
csv_file = "./data/教科書課文.csv"
folder = "textbook_corpus"

# Build the corpus
build_corpus(csv_file, folder)
```

This will create a `textbook_corpus` folder in your project directory, containing the processed data ready for analysis with HGCT.
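To check what the build step produced, you can list the files in the output folder. This is a rough sketch using only the standard library; `list_corpus_files` is a hypothetical helper, not part of `textbook` or HGCT, and it makes no assumption about which files `build_corpus` writes.

```python
from pathlib import Path


def list_corpus_files(folder):
    """Return the relative paths of all files under `folder`, sorted.

    Hypothetical helper for inspecting the generated corpus folder;
    it only walks the directory and does not read any file contents.
    """
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError("Corpus folder not found: {}".format(root))
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
    )
```

After running `build_corpus`, calling `list_corpus_files("textbook_corpus")` returns the generated file names, and raises `FileNotFoundError` if the folder was not created.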

### Directory Structure
After building the corpus, your project directory will include:

```
|--- data/
| |--- 教科書課文.csv
|--- textbook/
|--- textbook_corpus/ # Newly generated
|--- ...
```

### Advanced HGCT Features
Once the textbook corpus is built, you can query it with HGCT. For comprehensive examples and detailed usage instructions, visit our [GitHub project page](https://lopentu.github.io/HanziAnalysisKit/).

2 changes: 2 additions & 0 deletions hgct/requirements.txt
@@ -0,0 +1,2 @@
PyYAML==6.0.1
hgct==0.0.1
