Data-Mining-Repo-MAT 443 (ISU)

This code lab is based on the statistical learning, data mining, inference, prediction, and regression analysis content of the MAT 443 class at ISU.

This repository contains materials that provide a case-based introduction to the high-demand field of statistical learning for analyzing massive datasets.

Description

The field of Statistics is constantly challenged by the problems that science and industry bring to its door. In the early days, these problems often came from agricultural and industrial experiments and were relatively small in scope. With the advent of computers and the information age, statistical problems have exploded in both size and complexity. Challenges in the areas of data storage, organization and searching have led to the new field of “data mining”; statistical and computational problems in biology and medicine have created “bioinformatics.” Vast amounts of data are being generated in many fields, and the statistician’s job is to make sense of it all: to extract important patterns and trends, and understand “what the data says.” We call this learning from data.

Introduction

Statistical learning plays a key role in many areas of science, finance and industry. Here are some examples of learning problems:

• Predict whether a patient, hospitalized due to a heart attack, will have a second heart attack. The prediction is to be based on demographic, diet and clinical measurements for that patient.
• Predict the price of a stock 6 months from now, on the basis of company performance measures and economic data.
• Identify the numbers in a handwritten ZIP code, from a digitized image.
• Estimate the amount of glucose in the blood of a diabetic person, from the infrared absorption spectrum of that person’s blood.
• Identify the risk factors for prostate cancer, based on clinical and demographic variables.