Text Classification, 🇰🇷 Korean version
ALBERT is "A Lite" version of BERT, a popular unsupervised language representation learning algorithm. ALBERT uses parameter-reduction techniques that allow for large-scale configurations, overcome previous memory limitations, and achieve better behavior with respect to model degradation.
For a technical description of the algorithm, see our paper: ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
This project performs text classification with the ktrain library. A detailed walkthrough is available on the Blog.
pip install -r requirements.txt
With a single command, you can run text classification on a dataset stored as a CSV file using main.py:
python main.py \
--csv data.csv \
--label Category \
--data Resume \
--epoch 5
import argparse

parser = argparse.ArgumentParser(description='Train a text classifier on a CSV dataset.')
parser.add_argument('--csv', help='path to the training CSV file')
parser.add_argument('--label', help='name of the label column')
parser.add_argument('--data', help='name of the text column')
parser.add_argument('--epoch', type=int, help='number of training epochs')
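As a quick check, the parser can be exercised directly with the same values as the shell command shown above (a minimal sketch; the argument list here is illustrative):

```python
import argparse

parser = argparse.ArgumentParser(description='Train a text classifier on a CSV dataset.')
parser.add_argument('--csv', help='path to the training CSV file')
parser.add_argument('--label', help='name of the label column')
parser.add_argument('--data', help='name of the text column')
parser.add_argument('--epoch', type=int, help='number of training epochs')

# Parse the same arguments main.py would receive from the shell.
args = parser.parse_args(['--csv', 'data.csv', '--label', 'Category',
                          '--data', 'Resume', '--epoch', '5'])
print(args.csv, args.label, args.data, args.epoch)  # data.csv Category Resume 5
```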
import pandas as pd
from sklearn.model_selection import train_test_split

def read_dataset(dataset, data, label):
    df = pd.read_csv(dataset)
    label_list = list(set(df[label]))  # unique class names
    df = df.sample(frac=1)             # shuffle; sample() is not in-place
    x_train, x_test, y_train, y_test = train_test_split(
        list(df[data]), list(df[label]), test_size=0.33, random_state=42)
    return x_train, x_test, y_train, y_test, label_list
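A minimal smoke test of `read_dataset`, using a tiny synthetic CSV (the column names `Resume` and `Category` mirror the CLI example above; the data itself is made up):

```python
import os
import tempfile

import pandas as pd
from sklearn.model_selection import train_test_split

def read_dataset(dataset, data, label):
    df = pd.read_csv(dataset)
    label_list = list(set(df[label]))
    df = df.sample(frac=1)
    x_train, x_test, y_train, y_test = train_test_split(
        list(df[data]), list(df[label]), test_size=0.33, random_state=42)
    return x_train, x_test, y_train, y_test, label_list

# Write a tiny CSV to disk and split it.
rows = pd.DataFrame({'Resume': [f'text {i}' for i in range(12)],
                     'Category': ['a', 'b'] * 6})
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as f:
    path = f.name
rows.to_csv(path, index=False)

x_train, x_test, y_train, y_test, labels = read_dataset(path, 'Resume', 'Category')
os.remove(path)
print(len(x_train), len(x_test), sorted(labels))  # 8 4 ['a', 'b']
```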
Replace MODEL_NAME below with the model you want to use.
MODEL_NAME = 'albert-base-v2'
Model | Example model names
---|---
BERT | bert-base-uncased, bert-large-uncased, bert-base-multilingual-uncased, and others
DistilBERT | distilbert-base-uncased, distilbert-base-multilingual-cased, distilbert-base-german-cased, and others
ALBERT | albert-base-v2, albert-large-v2, and others
RoBERTa | roberta-base, roberta-large, roberta-large-mnli
XLM | xlm-mlm-xnli15-1024, xlm-mlm-100-1280, and others
XLNet | xlnet-base-cased, xlnet-large-cased
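The full training flow with ktrain's Transformer API might look like the sketch below. The batch size, maximum sequence length, and learning rate are illustrative defaults, not values taken from this repo; the ktrain imports sit inside the function so the sketch can be read (and imported) without ktrain installed:

```python
MODEL_NAME = 'albert-base-v2'  # swap in any model name from the table above

def train(x_train, y_train, x_test, y_test, label_list, epochs=5):
    # Imports kept inside the function so this module loads even
    # where ktrain is not installed.
    import ktrain
    from ktrain import text

    # Wrap the chosen Hugging Face model in ktrain's Transformer preprocessor.
    t = text.Transformer(MODEL_NAME, maxlen=500, class_names=label_list)
    trn = t.preprocess_train(x_train, y_train)
    val = t.preprocess_test(x_test, y_test)

    # Build the classifier and train with the 1cycle learning-rate policy.
    model = t.get_classifier()
    learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=6)
    learner.fit_onecycle(5e-5, epochs)
    return learner, t
```

The returned `learner` and preprocessor `t` are exactly what the prediction helper below expects.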
To make predictions, you can use the function below; `t` is the Transformer preprocessor created when the model was built.
def predictor(learner, test):
    # learner.model is the trained model; t is the ktrain preprocessor.
    predictor = ktrain.get_predictor(learner.model, preproc=t)
    print(predictor.predict(test))
To monitor training with TensorBoard:
tensorboard \
--logdir=your_log_dir \
--host=127.0.0.1