DATA SET: https://de-mapreduce-gutenberg.s3.amazonaws.com/100-0.txt
- Which word has the highest frequency of occurrence in the document?
- What is the frequency of occurrence of the word ‘Romeo’? (Ignore case, and do not remove punctuation marks from any word.)
- What is the frequency of the word "circumference." in the data set? (You do not need to remove the punctuation marks from the words.)
DATA SET: https://drive.google.com/file/d/1DpfofGJbMeB4ZIh6wM54nYoeHQUjH3uL/view?usp=share_link
- Count the number of unique ordered (Origin, Destination) pairs present in the dataset. Two pairs are distinct if either the origin or the destination differs.
- Which airport had the maximum number of outgoing flights in the year 2004? Give its airport code and the corresponding number of flights.
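Both flight questions reduce to grouping by key: a set of (origin, destination) tuples for the first, and a per-origin counter filtered by year for the second. The sketch below assumes a CSV with a header row and columns named `Year`, `Origin`, and `Dest`; these column names and the sample rows are assumptions, since the layout of the actual file is not described here.

```python
import csv
import io
from collections import Counter

# Made-up sample rows; the column names Year, Origin, Dest are assumed.
SAMPLE_CSV = """Year,Origin,Dest
2004,ATL,ORD
2004,ATL,ORD
2004,ATL,JFK
2004,ORD,ATL
2003,JFK,ATL
"""

def analyze_flights(rows, year="2004"):
    """Return the set of unique (origin, destination) pairs over all rows
    and a counter of outgoing flights per origin for the given year."""
    unique_pairs = set()
    outgoing = Counter()
    for row in rows:
        # Ordered pair: (ATL, ORD) and (ORD, ATL) are different keys.
        unique_pairs.add((row["Origin"], row["Dest"]))
        if row["Year"] == year:
            outgoing[row["Origin"]] += 1
    return unique_pairs, outgoing

rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
pairs, outgoing = analyze_flights(rows)
busiest_airport, flight_count = outgoing.most_common(1)[0]
```

In a real map-reduce job the mapper would emit the pair (or the origin) as the key and the reducer would deduplicate or sum; the set and the counter play those roles in this sketch.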
DATA SET: https://drive.google.com/file/d/10TLQxUn1ndkUcfeHRNVcB_JZtd_c-mxw/view?usp=share_link
- Which player scored the highest number of centuries?
- In which year did Indian players score the maximum number of centuries?
Here, we have chosen the stock market dataset (NYSE.csv), on which we perform MapReduce operations. The structure of the data is given below. Kindly find the solutions to the questions below.
Data Structure
- Exchange name
- Stock symbol
- Transaction date
- Opening price of the stock
- Intraday high price of the stock
- Intraday low price of the stock
- Closing price of the stock
- Total volume of the stock on the particular day
- Adjusted closing price of the stock
Field separator: comma
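The field list above can be turned into a small record parser, which is typically the first thing a mapper does with each input line. This is a sketch under assumptions: the fields are taken to appear in the order listed, the date is kept as a plain string because its format is not stated, and the `StockRecord` type, `parse_record` function, and sample line are hypothetical.

```python
from typing import NamedTuple

class StockRecord(NamedTuple):
    """One line of NYSE.csv, with fields in the order listed above (assumed)."""
    exchange: str
    symbol: str
    date: str            # kept as text; the date format is not specified
    open_price: float
    high_price: float
    low_price: float
    close_price: float
    volume: int
    adj_close_price: float

def parse_record(line):
    """Split a comma-separated line into a typed StockRecord."""
    parts = line.strip().split(",")
    if len(parts) != 9:
        raise ValueError(f"expected 9 fields, got {len(parts)}")
    exchange, symbol, date = parts[:3]
    open_price, high_price, low_price, close_price = (float(p) for p in parts[3:7])
    return StockRecord(exchange, symbol, date, open_price, high_price,
                       low_price, close_price, int(parts[7]), float(parts[8]))

# Made-up sample line for illustration only.
record = parse_record("NYSE,ABC,2010-01-04,10.50,11.00,10.25,10.75,120000,10.75")
```

A mapper could then emit, for example, `(record.symbol, record.volume)` and leave the aggregation per stock symbol to the reducer. If the file has a header row, it would need to be skipped before parsing.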