CSV Character Extractor

CSV Character Extractor is a small tool that finds all unique characters in a CSV file. The result tells you exactly which characters of a language need to be supported. The tool was developed to create font assets for TextMesh Pro that contain only the characters you need. This is especially useful for languages with large character sets such as Korean, Japanese or Chinese.

Input

The input file can be defined in config.xml; the default value is "in/example.csv"
Languages are defined in columns; the first cell of each column defines the language name (see example.csv)
The columns ID and Description are ignored
Newlines (\n) and all emojis are ignored
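
The rules above can be sketched in plain Java. This is a minimal illustration and not the tool's actual source: the class and method names are hypothetical, the rows are assumed to be already parsed (for example by univocity-parsers), and the emoji check is a rough code point range test that may not match the tool's own rule.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not taken from the project's code base.
class UniqueCharacterSketch {

    // The README states that the ID and Description columns are ignored.
    static boolean isIgnoredColumn(String name) {
        return "ID".equals(name) || "Description".equals(name);
    }

    // Assumption: emojis are approximated by two code point ranges.
    static boolean isIgnoredCodePoint(int cp) {
        if (cp == '\n') {
            return true;
        }
        boolean pictographs = cp >= 0x1F000 && cp <= 0x1FAFF;
        boolean symbolsAndDingbats = cp >= 0x2600 && cp <= 0x27BF;
        return pictographs || symbolsAndDingbats;
    }

    // rows.get(0) is assumed to be the header row with the language names.
    // Returns one string of unique characters per language column,
    // in order of first appearance.
    static Map<String, String> uniqueCharactersPerColumn(List<String[]> rows) {
        Map<String, String> result = new LinkedHashMap<>();
        if (rows.isEmpty()) {
            return result;
        }
        String[] header = rows.get(0);
        Map<String, Set<Integer>> seen = new LinkedHashMap<>();
        for (String name : header) {
            if (!isIgnoredColumn(name)) {
                seen.put(name, new LinkedHashSet<>());
            }
        }
        for (String[] row : rows.subList(1, rows.size())) {
            for (int c = 0; c < header.length && c < row.length; c++) {
                Set<Integer> chars = seen.get(header[c]);
                if (chars == null || row[c] == null) {
                    continue; // ignored column or empty cell
                }
                // Iterate code points so characters outside the BMP stay intact.
                row[c].codePoints()
                      .filter(cp -> !isIgnoredCodePoint(cp))
                      .forEach(chars::add);
            }
        }
        for (Map.Entry<String, Set<Integer>> e : seen.entrySet()) {
            StringBuilder sb = new StringBuilder();
            for (int cp : e.getValue()) {
                sb.appendCodePoint(cp);
            }
            result.put(e.getKey(), sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"ID", "Description", "English", "Korean"},
            new String[] {"1", "greeting", "Hello", "안녕"},
            new String[] {"2", "long greeting", "Hello there", "안녕하세요"});
        uniqueCharactersPerColumn(rows)
            .forEach((language, chars) -> System.out.println(language + ": " + chars));
    }
}
```

Working on code points rather than on char values keeps supplementary characters whole, which matters for some rarely used CJK ideographs.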

Output

A text file named "ColumnName.txt" is created for each language. The output path can be defined in config.xml
One file is written per column, on the assumption that each column holds exactly one language
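
The output step could look roughly like the sketch below. It is an assumption-laden illustration, not the tool's implementation: the class and method names are hypothetical, the map of column names to characters is assumed to have been computed beforehand, and UTF-8 is an assumed encoding that the tool itself may or may not use.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not taken from the project's code base.
class OutputWriterSketch {

    // Writes one "<ColumnName>.txt" file per entry into outputDir.
    static void writeLanguageFiles(Path outputDir, Map<String, String> charactersByColumn)
            throws IOException {
        // Create the output directory (and parents) if it does not exist yet.
        Files.createDirectories(outputDir);
        for (Map.Entry<String, String> entry : charactersByColumn.entrySet()) {
            Path target = outputDir.resolve(entry.getKey() + ".txt");
            // Assumption: UTF-8 is used so that non-Latin characters survive.
            Files.write(target, entry.getValue().getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> chars = new LinkedHashMap<>();
        chars.put("English", "Helo ");
        chars.put("Korean", "안녕하세요");
        // "out" is an example directory name, not necessarily the tool's default.
        writeLanguageFiles(Paths.get("out"), chars);
    }
}
```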

Requirements

Java Runtime Environment (JRE)

Download

latest release

Run

Windows: Double-click Run.bat
Windows, Mac, Linux: Run java -jar CsvCharacterExtractor.jar in the terminal

Config usage

With the config you can set the input and output paths as well as characters that should always or never be included. Take a look at the example config
Paths can be relative, e.g. in/example.csv
Paths can be absolute, e.g. C:/Users/UserName/Documents/LanguageCharacterFiles/
Use forward slashes (/) only, also on Windows

Example

Example Spreadsheet

Roadmap

Document code
Add information on how to build the project

Used libraries

https://github.com/uniVocity/univocity-parsers