This repo contains a single R file, run_analysis.R, which contains 4 functions that together make up the project.
The aim of this project is to take the UCI HAR Dataset and tidy the dataset.
The project has 5 objectives:
- Merge all data into a single dataset with 1 variable per column and 1 observation per row
- Extract only the means and standard deviations of measurements
- Give activities descriptive names
- Give variables (columns) descriptive names
In addition:
5. Create a summary from this tidy dataset, with the means per subject per activity
Load run_analysis.R and call the runProject() function. This uses the default settings, which expect the data files in the working directory. Optionally, a directory for the dataset and a filename for the summary can be specified.
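A minimal way to run the project from an R session might look like the sketch below. It assumes run_analysis.R sits in the working directory. The argument names of runProject() are not listed in this README, so the sketch calls it with defaults only and uses args() to inspect the signature for the optional directory and filename.

```r
script <- "run_analysis.R"  # assumed to be in the working directory

if (file.exists(script)) {
  source(script)           # defines runProject() and the other functions
  print(args(runProject))  # shows the optional arguments and their defaults
  runProject()             # default settings: data files in the working directory
} else {
  message("run_analysis.R not found in ", getwd())
}
```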
Information on the background, raw data files and processing can be found in codebook.md.
The functions in run_analysis.R that make up the project are described below.
getTidyData is the main workhorse of this project. The function accepts the directory where the dataset is stored and creates a tidy dataset. In overview (with numbers matching the steps marked in the R code):
-
-
- A check is performed to verify all above listed files are present and if so, they are all read into memory.
-
- The activity codes and subjects, which are stored in separate files, are merged into their respective datasets.
-
-
- The columns are given descriptive names based on features.txt.
6.1 The test and train datasets are merged. This completes objectives 1 and 4.
6.2 From this dataset, only the columns that have 'mean' or 'std' in their name are selected, finishing objective 2.
6.3 - 6.4 The activity codes are translated into descriptive activity names.
-
- This finalizes objective 3 and results in a tidy dataset which is returned.
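The merge, column selection and activity naming described above can be sketched on toy data in base R. This is an illustration under stated assumptions, not the code in run_analysis.R: the helper make_set(), the three feature names and the measurement values are stand-ins chosen for the example, and no files are read.

```r
# Stand-ins for features.txt and activity_labels.txt (small subset for illustration)
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-energy()-X")
activity_labels <- data.frame(
  code = 1:2,
  name = c("WALKING", "WALKING_UPSTAIRS"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: builds one set with subject and activity columns attached
make_set <- function(subjects, codes, offset) {
  measurements <- as.data.frame(
    matrix(offset + seq_len(length(subjects) * length(features)),
           ncol = length(features))
  )
  names(measurements) <- features  # descriptive column names
  data.frame(subject = subjects, activity = codes, measurements,
             check.names = FALSE)  # keep names such as "tBodyAcc-mean()-X" intact
}

test_set  <- make_set(c(1, 1), c(1, 2), offset = 0)
train_set <- make_set(c(2, 2), c(1, 2), offset = 100)

# Merge test and train into a single dataset
merged <- rbind(test_set, train_set)

# Keep only the columns with 'mean' or 'std' in their name
keep <- grepl("mean|std", names(merged))
tidy <- merged[, c("subject", "activity", names(merged)[keep])]

# Translate activity codes into descriptive names
tidy$activity <- activity_labels$name[match(tidy$activity, activity_labels$code)]
```

Matching on the code with match() keeps the row order of the merged data, which a merge() call would not guarantee.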
createSummary takes the dataset from getTidyData as input, together with a filename, and creates a summary. This summary is the average of every column per subject per activity, meeting objective 5. The summary is then written to a file in the working directory.
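One way to compute a per-subject, per-activity average is base R's aggregate(). The sketch below uses a small made-up data frame and writes to a temporary directory instead of the working directory. The column names, values and the use of write.table() are assumptions for illustration and may differ from what the script does.

```r
# Toy tidy dataset (hypothetical values)
tidy <- data.frame(
  subject  = c(1, 1, 1, 2),
  activity = c("WALKING", "WALKING", "SITTING", "WALKING"),
  mean_x   = c(1, 3, 5, 7),
  std_x    = c(0.2, 0.4, 0.6, 0.8),
  stringsAsFactors = FALSE
)

# Average every measurement column per subject per activity
summary_data <- aggregate(. ~ subject + activity, data = tidy, FUN = mean)

# Write the summary to a file (temporary directory here, to avoid clutter)
out_file <- file.path(tempdir(), "summary.txt")
write.table(summary_data, out_file, row.names = FALSE)
```

Each row of summary_data then holds one subject and activity combination, so the summary itself stays tidy.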
This function is not necessary for the project; it is a small helper function that reads the generated summary back into R.
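Reading such a file back could look like the sketch below. It assumes the summary was written with write.table() and a header row, which this README does not confirm. The data and file name are made up for the example.

```r
# Write a small stand-in summary to a temporary file (hypothetical values)
summary_file <- tempfile(fileext = ".txt")
original <- data.frame(
  subject  = c(1, 2),
  activity = c("WALKING", "SITTING"),
  mean_x   = c(2.5, 7),
  stringsAsFactors = FALSE
)
write.table(original, summary_file, row.names = FALSE)

# Read it back; header = TRUE restores the column names
restored <- read.table(summary_file, header = TRUE, stringsAsFactors = FALSE)
```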
runProject is the only function the end user needs to call; it in turn calls getTidyData and createSummary to complete the entire project with a single call.