forked from MedHackOpen/HospitalPriceSpider
Merge pull request #3 from Mllexx/master
AndrewMulekeSolutionProposal
Showing 1 changed file with 21 additions and 0 deletions.
# MedHack Solution Proposal

### The Process

To convert the raw CSVs to the provided standard format, I would take the following steps:

- Using a Python script, I would read the CSV line by line and check whether each line is a header row by testing for delimiters (e.g. a comma or pipe) with a regex. The script will allow the delimiter to be specified, but the default will be a comma. Any line that is read and found not to be a header row will be discarded, until the header row is found.
- Once the header row has been found, the next step will be to align the file's columns to MedHack's standard format.
- I will start by identifying the required fields [itemName, hospitalId, Currency, Price] using a script that applies regex and string functions to the header row's columns.
- I will then attempt to extract any additional optional fields that are available, as per the standard format.
- I will then create a column map for the raw CSV file that is aligned with MedHack's standard format.
- Finally, once the column map is ready, I will generate a new file that holds the raw CSV's data in MedHack's required standard format.
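
The steps above could be sketched roughly as follows. This is a minimal illustration and not the proposal's actual implementation. The header heuristic (at least `min_columns` non-empty cells), the alias patterns in `REQUIRED_PATTERNS`, and the function names are all assumptions made for the example. Optional fields are left out for brevity.

```python
import csv
import io
import re

# Hypothetical aliases for the four required fields; real hospital files
# would need a broader, curated set of patterns.
REQUIRED_PATTERNS = {
    "itemName": re.compile(r"item|description|procedure|service", re.I),
    "hospitalId": re.compile(r"hospital|facility", re.I),
    "Currency": re.compile(r"currency", re.I),
    "Price": re.compile(r"price|charge|cost|amount", re.I),
}


def find_header_index(lines, delimiter=",", min_columns=3):
    """Return the index of the first line that looks like a header row.

    Assumed heuristic: a header has at least `min_columns` cells and
    none of them is empty. Quoted delimiters are not handled here.
    """
    splitter = re.compile(re.escape(delimiter))
    for index, line in enumerate(lines):
        cells = [cell.strip() for cell in splitter.split(line.strip())]
        if len(cells) >= min_columns and all(cells):
            return index
    raise ValueError("no header row found")


def build_column_map(header_cells):
    """Map each required standard field to the index of a raw column."""
    column_map = {}
    for field, pattern in REQUIRED_PATTERNS.items():
        for position, name in enumerate(header_cells):
            # Each raw column may be claimed by one standard field only.
            if position not in column_map.values() and pattern.search(name):
                column_map[field] = position
                break
    missing = sorted(set(REQUIRED_PATTERNS) - set(column_map))
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return column_map


def convert(raw_text, delimiter=","):
    """Drop leading non-header lines and emit rows in the standard order."""
    lines = raw_text.splitlines()
    start = find_header_index(lines, delimiter)
    rows = list(csv.reader(lines[start:], delimiter=delimiter))
    column_map = build_column_map(rows[0])
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(list(column_map.keys()))
    for row in rows[1:]:
        if not row:  # skip blank lines in the data section
            continue
        writer.writerow([row[i] for i in column_map.values()])
    return out.getvalue()
```

For example, given a file whose first two lines are a title and a blank line, followed by `Description,Facility ID,Currency,Charge` and `X-ray,12,USD,150.00`, `convert` would skip the first two lines and return the rows under the header `itemName,hospitalId,Currency,Price`. Passing `delimiter="|"` handles pipe-separated input the same way.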

### Tools & Utilities

The tools I have used before for similar tasks, and am likely to use here, include:

- Bash, Awk & Sed
- Python
- PHP
- Pentaho