Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying and information privacy. The term often refers simply to the use of predictive analytics or other advanced methods to extract value from data, and seldom to a particular size of data set. Accuracy in big data can lead to more confident decision making, and better decisions can result in greater operational efficiency, cost reduction and reduced risk.
- Big data is not a single technology but a combination of old and new technologies that help companies gain actionable insight.
- It is the capability to manage a huge volume of disparate data at the right speed, and within the right time frame, to allow real-time analysis and reaction.
- Three commonly used measures, the "3 Vs": Velocity, Volume, Variety
- Velocity: the speed at which data is generated and must be processed
- Volume: the sheer amount of data
- Variety: the range of data types, formats and sources
- A fourth V is often added: Value, the usefulness of the insights extracted from the data
- Apache Hadoop: an open-source distributed computing platform that stores large quantities of data via the Hadoop Distributed File System (HDFS) and divides operations on that data into small fragments via a programming model called MapReduce. Hadoop was derived from technologies originally developed at Google and Yahoo!.
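
As a concrete illustration of the MapReduce model, below is a sketch of the classic word-count job using Hadoop's Java MapReduce API (essentially the standard Hadoop tutorial example, not code from this document): the mapper emits a `(word, 1)` pair for each token of each input line, and the reducer sums the counts per word. The input and output paths are hypothetical HDFS directories passed as command-line arguments.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: for each line of input, emit (word, 1) for every token.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // hypothetical HDFS input dir
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // hypothetical HDFS output dir
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Assuming the class is packaged into a jar, it would typically be launched with something like `hadoop jar wordcount.jar WordCount <input-dir> <output-dir>`, where both directories live in HDFS. The framework handles splitting the input across mappers and shuffling each word's counts to a single reducer, which is what lets the same small program scale across a cluster.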
Written by Stephen Moon ([email protected]), 2016