1.4 CRISP-DM

Slides

Notes

CRISP-DM, which stands for Cross-Industry Standard Process for Data Mining, is an open standard process model that describes common approaches used by data mining experts. It is the most widely used analytics model. It was conceived in 1996 and became a European Union project under the ESPRIT funding initiative in 1997. The project was led by five companies: Integral Solutions Ltd (ISL), Teradata, Daimler AG, NCR Corporation, and OHRA, an insurance company. The process consists of six steps:

  1. Business understanding: An important question is whether we need ML for the project at all. The goal of the project has to be measurable.
  2. Data understanding: Analyze the available data sources and decide whether more data is required.
  3. Data preparation: Clean the data and remove noise, typically with pipelines. The data should be converted to a tabular format so it can be fed into an ML model.
  4. Modeling: Train different models and choose the best one. Based on the results of this step, decide whether new features need to be added or data issues need to be fixed.
  5. Evaluation: Measure how well the model performs and whether it solves the business problem.
  6. Deployment: Roll the model out to production for all users. Evaluation and deployment often happen together, which is known as online evaluation.
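
Steps 3 to 5 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the toy churn table, the two candidate models (`fit_majority`, `fit_threshold`), and the `TARGET_ACCURACY` goal are all hypothetical and not part of CRISP-DM itself.

```python
from collections import Counter

# Hypothetical toy table: (monthly_charges, churned) rows, invented for illustration.
rows = [
    (20, 0), (25, 0), (30, 0), (35, 0), (40, 0), (45, 0),
    (50, 0), (55, 1), (60, 0), (65, 1), (70, 1), (75, 1),
    (80, 1), (85, 1), (90, 1), (95, 1),
]

# Data preparation: split the table into training and validation parts.
train, val = rows[::2], rows[1::2]


def accuracy(predict, data):
    """Fraction of rows for which predict(x) equals the label y."""
    return sum(predict(x) == y for x, y in data) / len(data)


def fit_majority(data):
    """Baseline: always predict the most common label in the training data."""
    majority = Counter(y for _, y in data).most_common(1)[0][0]
    return lambda x: majority


def fit_threshold(data):
    """Predict 1 when x >= t, with t chosen to maximize training accuracy."""
    best_t = max(
        (x for x, _ in data),
        key=lambda t: accuracy(lambda x: int(x >= t), data),
    )
    return lambda x: int(x >= best_t)


# Modeling: train different models and choose the best one on validation data.
models = {"majority": fit_majority(train), "threshold": fit_threshold(train)}
scores = {name: accuracy(model, val) for name, model in models.items()}
best_name = max(scores, key=scores.get)

# Evaluation: compare the best model against a measurable goal (assumed value).
TARGET_ACCURACY = 0.8
meets_goal = scores[best_name] >= TARGET_ACCURACY

print(scores, best_name, meets_goal)
```

On this toy data the threshold model beats the baseline but still falls short of the assumed goal, which is the situation where one would go back and add features or fix data issues.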

It is important to consider how maintainable the project is.

In general, ML projects require many iterations.

Iteration:

  • Start simple
  • Learn from the feedback
  • Improve
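
The iteration loop can be illustrated with a small sketch, assuming a hypothetical toy dataset, a simple nearest-centroid classifier written by hand, and an assumed `TARGET_ACCURACY` goal. Each pass starts from the simplest feature set, records the validation score as feedback, and adds a feature only if the goal has not been reached.

```python
# Hypothetical toy rows: (tenure_months, monthly_charges, churned), invented for illustration.
FEATURES = ("tenure_months", "monthly_charges")
train = [(2, 80, 1), (4, 70, 1), (3, 50, 1), (30, 75, 0), (40, 60, 0), (36, 40, 0)]
val = [(5, 60, 1), (1, 90, 1), (6, 45, 1), (28, 85, 0), (35, 50, 0), (50, 65, 0)]


def fit_nearest_centroid(data, cols):
    """Return a predictor that picks the class whose mean point is closest."""
    centroids = {}
    for label in {row[-1] for row in data}:
        members = [row for row in data if row[-1] == label]
        centroids[label] = [sum(r[c] for r in members) / len(members) for c in cols]

    def predict(row):
        def sq_dist(label):
            return sum((row[c] - m) ** 2 for c, m in zip(cols, centroids[label]))
        return min(centroids, key=sq_dist)

    return predict


def accuracy(predict, data):
    """Fraction of rows whose predicted label matches the last column."""
    return sum(predict(row) == row[-1] for row in data) / len(data)


TARGET_ACCURACY = 0.8            # assumed measurable goal
feature_sets = [(1,), (0, 1)]    # start simple, then add a feature
history = []                     # feedback collected on each iteration

for cols in feature_sets:
    model = fit_nearest_centroid(train, cols)
    score = accuracy(model, val)
    history.append(([FEATURES[c] for c in cols], score))
    if score >= TARGET_ACCURACY:  # goal reached: stop iterating
        break

for names, score in history:
    print(names, round(score, 2))
```

Keeping `history` around makes the feedback from each iteration explicit, so the decision to stop or to keep improving is based on measured results.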
⚠️ The notes are written by the community.
If you see an error here, please create a PR with a fix.