Using YAML as data format #18
Comments
What's wrong with editing CSV files? 1) Load them into Excel/LibreOffice/Google Sheets, 2) edit, 3) export back to CSV.
In European regions, Excel by default exports CSV files with semicolons rather than commas. This is because the comma is commonly used as the decimal separator in European numbers (e.g. 12,34 instead of 12.34). To further my point, let's compare CSV with YAML:

Parsability

The only advantage I can attribute to CSV here is that it is smaller than YAML, by nature of its simplistic format. But this also lends itself to some flaws, in particular with the kinds of strings you have to deal with on the Vita (game names). Also, even the current parsing has bugs: #19

YAML is a standard, so it has parser implementations in quite a lot of languages. It also makes parsing easier should the database ever "evolve" beyond pkgi itself; for example, a web site that automatically stores and updates the database.

Design

The way CSV is structured lends itself to several design flaws. I can point out a simple one: the "name2" item (name_org, i.e. the original name in the code) is flawed. If "name" is the only entry, it holds the original name of the game; but if the game name is not alphabetic (A-Z, a-z), "name" holds a "translated" alphabetic name and "name2" becomes the original name.

Another example: how do you handle games that are the same (same name, etc.) but have different regions? How do you even find the region of a game? CSV has no answer for this, so the solution in the code is to parse the titleid to figure out the region.

What if I add some new item, but decide to remove it later because the functionality became obsolete? Good luck with CSV: now I have one extra delimiter to add to every line, and each entry requires more mental gymnastics as manual editing becomes even harder and the database gets bigger.

What about YAML?

So, why did I suggest using YAML instead of CSV? Simply because it solves all these problems while keeping a good size (I think with some actual effort it can be made even smaller than my lazy conversion).
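To make the delimiter problem above concrete, here is a minimal Python sketch. The row contents and column order are hypothetical, not taken from the real database, and this is not how pkgi itself parses the file; it only shows how a comma inside a game name trips up naive splitting, and how a semicolon-delimited export has to be read with the delimiter stated explicitly.

```python
import csv
import io

# Hypothetical row: title id, flags, name. The name itself contains a comma.
line = 'PCSE00000,0,"Example Game, The"'

# Naive splitting treats the comma inside the quoted name as a delimiter.
naive_fields = line.split(",")

# The csv module honours the quoting rules and keeps the name intact.
parsed_fields = next(csv.reader(io.StringIO(line)))

# A semicolon-delimited export (as some spreadsheet locales produce)
# only parses correctly when the delimiter is given explicitly.
semicolon_line = 'PCSE00000;0;"Example Game, The"'
semicolon_fields = next(csv.reader(io.StringIO(semicolon_line), delimiter=";"))

print(naive_fields)
print(parsed_fields)
print(semicolon_fields)
```

A hand-written parser has to reproduce these quoting rules itself, which is where delimiter-collision bugs tend to come from.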
Let's tackle the problem I mentioned earlier about regions. As mentioned in this discussion, most of the work resides on the database side. Take my example database here; let's make something that looks much better and is easier to parse as well! (Note that even if you keep the same layout as the CSV, i.e. one entry per item, you can still use node anchors and references to omit the repeated data, which leads to a smaller size overall.)
In fact, here I can add new items easily, and I can omit some optional items if I want (such as, say, game description, or author, or all kinds of lesser data). With this design, data can be differentiated for multiple regions, or just put as a single entry if needed.
Why not use a tool like csved? Though I must say YAML makes more sense for DLC. As for your worries about Linux users, it runs great in Wine! It is a snappy tool that is freeware (not FOSS though, if that is a concern). On something like a Vita, this 12 percent size difference is significant. We could implement off-the-shelf gzip compression of the YAML file to get our 12 percent back though! Simply put the .yml file into a gzip container and then decompress this file on demand. https://github.com/TheOfficialFloW/VitaShell/tree/master/unrarlib
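The gzip idea can be sketched as a simple round trip. This uses Python's standard `gzip` module purely for illustration; an on-device implementation would look different, and the sample data and the helper names `compress_database` and `decompress_database` are made up for this example. The actual savings depend entirely on the real database contents.

```python
import gzip

def compress_database(text: str) -> bytes:
    """Encode the database text as UTF-8 and gzip-compress it."""
    return gzip.compress(text.encode("utf-8"))

def decompress_database(blob: bytes) -> str:
    """Reverse compress_database: gunzip and decode back to text."""
    return gzip.decompress(blob).decode("utf-8")

# Hypothetical, repetitive YAML-like content standing in for the real database.
sample = "".join(
    f"- titleid: PCSE{i:05d}\n  name: Example Game {i}\n  flags: 0\n"
    for i in range(900)
)

raw_size = len(sample.encode("utf-8"))
packed = compress_database(sample)
restored = decompress_database(packed)

# Print the uncompressed and compressed sizes in bytes.
print(raw_size, len(packed))
```

Because the keys repeat on every entry, text like this tends to compress well, so the extra bytes that YAML adds over CSV mostly matter for the uncompressed file.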
OK, and why YAML, and not TOML, JSON, XML, or just go full sqlite3? Joining games under the same name probably won't happen though.
YAML is braindead simple to edit and has a sane layout that an actual human can easily work with. Basically, it looks pretty. That's one reason to use YAML. And sqlite3 is a huge pain: it requires quite a bit more work to implement unless you use an existing library.
Why not...

SQLite

As stated above, SQLite is a pain. It also misses the point of a 'plaintext format' that is 'easy to edit': currently, the database has the useful property of being very easily shareable over a plaintext medium.

XML

XML has quite a lot of overhead. In general, XML is more of a markup language, whereas YAML is a data format; a markup language doesn't fit the use case here.

TOML

From the page itself:
JSON

This is the other choice in contrast. JSON is very easy to parse, fits the needs (readability, plain text, resistance to delimiter collision), has been around for a long time, and takes about the same size as YAML. I chose YAML here because of its easier readability, but it comes down to:
So, looking into it more, it does seem that JSON should be better suited; if my conclusions are wrong, please rebut them.
If the pkgi.txt file is ever meant to be edited (which I assume it will be), then using YAML seems like a good choice. The main advantages of YAML are that it is much more readable, allows entries to be omitted, and doesn't add a lot of file size overhead. Here is a simple comparison:
CSV format (original):
YAML:
Note how with YAML, an item doesn't need to be included if it's empty, here name2. You could also remove the flags entry when it is 0 (blank). Taking advantage of these tricks, we end up with a file whose size is very close to that of the original CSV (using input with about 900 entries):
which amounts to a ~12.2% file size increase from the transformation to YAML.
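The omit-empty-fields trick described above could be sketched roughly as follows. The column layout in `COLUMNS`, the sample rows, and the helpers `row_to_yaml` and `csv_to_yaml` are all hypothetical; the real pkgi.txt columns may differ, and this is not the conversion that produced the figure above. Values are emitted as double-quoted strings via `json.dumps`, which YAML also accepts.

```python
import csv
import io
import json

# Hypothetical column layout; the real pkgi.txt columns may differ.
COLUMNS = ["titleid", "flags", "name", "name2"]

def row_to_yaml(row):
    """Turn one CSV row into a YAML list entry, skipping empty fields."""
    record = dict(zip(COLUMNS, row))
    lines = []
    for key in COLUMNS:
        value = record.get(key, "")
        # Skip blank fields, and skip flags when it is 0.
        if value == "" or (key == "flags" and value == "0"):
            continue
        # json.dumps yields a double-quoted string, which YAML also accepts.
        lines.append(f"{key}: {json.dumps(value, ensure_ascii=False)}")
    return "- " + "\n  ".join(lines) + "\n"

def csv_to_yaml(csv_text):
    """Convert every non-empty CSV row into a YAML list entry."""
    reader = csv.reader(io.StringIO(csv_text))
    return "".join(row_to_yaml(row) for row in reader if row)

# Two made-up rows: one with empty name2 and flags 0, one fully populated.
csv_text = "PCSE00000,0,Example Game,\nPCSG00000,1,Rei no Gemu,例のゲーム\n"
yaml_text = csv_to_yaml(csv_text)
print(yaml_text)
```

The first entry ends up with only titleid and name, while the second keeps all four keys, so the per-entry overhead of the key names is only paid for fields that actually carry data.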