
AWS Analytics


Contents

  • AWS Analytic Services

Useful Information

  1. AWS provides an integrated suite of services that offers everything needed to build and manage a data lake for analytics quickly and easily
    • It is the most comprehensive, secure, scalable, and cost-effective service portfolio for building data lake and analytics solutions
  2. Amazon S3 is built to store and retrieve any type of data from anywhere: web sites, mobile apps, enterprise applications, IoT sensors, and devices
    • It is built with high availability to store and retrieve any amount of data, and is designed to provide 99.999999999% (11 nines) durability


AWS Analytics Services


Data Lakes and Analytics on AWS

The most comprehensive, secure, scalable, and cost-effective service portfolio for building data lake and analytics solutions

  • AWS provides an integrated suite of services that offers everything needed to build and manage a data lake (unrefined raw data) for analytics quickly and easily
  • Data lakes on AWS provide the scale, agility, and flexibility needed to combine different types of data and analytics approaches to gain deeper insights, in ways that traditional data silos (isolated local data) and data warehouses cannot
  • AWS gives customers the broadest set of analytics and machine learning services, with easy access to all relevant data and without compromising security and governance



  1. Data Movement

    : Import your data from on premises, and in real-time

  2. Data Lake

    : Store any type of data securely, from gigabytes to exabytes

  3. Analytics

    : Analyze your data with the broadest selection of analytics services

  4. Machine Learning

    : Predict future outcomes, and prescribe actions for rapid response



Data Lake

  • Once your data is ready for the cloud, AWS makes it easy to store it in any format, securely and at massive scale, using Amazon S3 and Amazon Glacier
  • To help end users find the relevant data for their analysis, AWS Glue automatically creates a single catalog that users can search and query

Storage - Amazon S3

Amazon S3 is secure, highly scalable object storage with millisecond latency for data access

  • S3 Select narrows reads and retrieval to only the data a request needs, which AWS says can improve performance by up to 400%
  • S3 offers comprehensive security and compliance capabilities that meet even the most stringent regulatory requirements

Backup & Archive - Amazon Glacier

Amazon Glacier is secure, durable, and extremely low-cost storage for long-term backup and archive, with data accessible within minutes

  • Glacier Select reads and retrieves only the data you need
  • Customers can store data for as little as USD 0.004 per GB per month, a significant cost saving compared with on-premises solutions
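
As a rough illustration of the storage price quoted above, the following sketch multiplies a data volume by the per-GB monthly rate. The rate is the figure from these notes, not a current price list, and the function name is hypothetical. Retrieval and request fees are deliberately ignored.

```python
# Rate taken from the notes above; actual pricing varies by region and over time.
GLACIER_USD_PER_GB_MONTH = 0.004


def monthly_storage_cost(gigabytes: float, rate: float = GLACIER_USD_PER_GB_MONTH) -> float:
    """Return the monthly storage charge in USD for the given volume."""
    if gigabytes < 0:
        raise ValueError("gigabytes must be non-negative")
    return gigabytes * rate


if __name__ == "__main__":
    # 10 TB expressed as 10 * 1024 GB
    print(f"{monthly_storage_cost(10 * 1024):.2f} USD per month")
```

At this rate, 10 TB of archived data costs about USD 40.96 per month for storage alone.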

Data Catalog - AWS Glue

AWS Glue is a fully managed service that provides a data catalog to make the data in your data lake discoverable, and that can perform **Extract / Transform / Load (ETL)** to prepare data for analysis

  • The Data Catalog is created automatically as a persistent metadata store for all of your data assets, so all of your data can be searched and queried


Amazon Athena (Interactive Analytics)

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL


  • Amazon Athena makes it easy to analyze data directly in S3 and Glacier using standard SQL queries
  • Athena is serverless, so there is no infrastructure to set up or manage
  • Query data right away, get results in seconds, and pay only for the queries you run
  • Just point to your data stored in Amazon S3, define the schema, and start querying with standard SQL
  • Most results are delivered within seconds
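
The "point to the data, define a schema, then query" flow can be sketched roughly as follows. This is a minimal sketch, not an official recipe: the bucket, database, table, and column names are all hypothetical, and the helper functions are invented for illustration. The only real API used is boto3's Athena `start_query_execution`, which is imported lazily so that the request-building part runs without AWS credentials.

```python
def create_table_ddl(table: str, columns: dict[str, str], s3_location: str) -> str:
    """Build a DDL statement that maps a schema onto CSV files already in S3."""
    cols = ",\n  ".join(f"{name} {ctype}" for name, ctype in columns.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{s3_location}'"
    )


def query_request(sql: str, database: str, output_location: str) -> dict:
    """Build the keyword arguments for Athena's start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(sql: str, database: str, output_location: str) -> str:
    """Submit the query and return its execution ID (needs boto3 and credentials)."""
    import boto3  # third-party; imported here so the helpers above work without it

    athena = boto3.client("athena")
    response = athena.start_query_execution(**query_request(sql, database, output_location))
    return response["QueryExecutionId"]


if __name__ == "__main__":
    # Hypothetical names, used only to show the shape of the statements.
    print(create_table_ddl(
        "access_logs",
        {"request_time": "string", "status": "int"},
        "s3://example-bucket/logs/",
    ))
```

Queries run asynchronously, so a real caller would poll the execution status before reading results from the output location.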


Amazon CloudSearch (Managed Search Service)

Amazon CloudSearch is a managed service in the AWS Cloud that makes it simple and cost-effective to set up, manage, and scale a search solution for your website or application

  • Amazon CloudSearch supports 34 languages and popular search features such as highlighting, auto-complete, and geospatial search


Amazon EMR

Easily run and scale Apache Spark, Hadoop, HBase, Presto, Hive, and other big data frameworks

Big Data

  • Amazon EMR is a managed service that processes large volumes of data easily, quickly, and cost-effectively
  • Managed EMR Notebooks support data engineering, data science development, and collaboration
  • Each project is updated in EMR within 30 days of a version release, so you can easily get the latest and greatest from the community


Amazon Elasticsearch Service

Amazon Elasticsearch Service is a fully managed service that makes it easy for you to deploy, secure, and run Elasticsearch cost effectively at scale



Operational Analytics

  • For operational analytics such as application monitoring, log analytics, and clickstream analytics, Amazon Elasticsearch Service lets you search, explore, filter, aggregate, and visualize data in near real time
  • Amazon Elasticsearch Service delivers Elasticsearch's easy-to-use APIs and real-time analytics capabilities along with the availability, scalability, and security that production workloads require

Real-Time Analytics

  • Amazon Kinesis makes it easy to collect, process, and analyze streaming data such as IoT telemetry, application logs, and web site clickstreams
  • Instead of waiting until all the data has been collected before processing begins, you can process and analyze data as it arrives in the data lake and respond in real time
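
The difference between waiting for a complete batch and processing records as they arrive can be sketched in plain Python. This is only an illustration of the idea: the list of events stands in for a stream, the `status` field and the function name are hypothetical, and no Kinesis API is involved.

```python
from typing import Iterable, Iterator


def running_error_rate(events: Iterable[dict]) -> Iterator[float]:
    """Yield the error rate so far after each event, without waiting for the full batch."""
    total = 0
    errors = 0
    for event in events:
        total += 1
        if event["status"] >= 500:  # treat HTTP 5xx responses as errors
            errors += 1
        yield errors / total


if __name__ == "__main__":
    # A small in-memory list stands in for records arriving from a stream.
    stream = [{"status": 200}, {"status": 500}, {"status": 200}, {"status": 503}]
    for rate in running_error_rate(stream):
        print(f"error rate so far: {rate:.2f}")
```

Because the function is a generator, a consumer can react (for example, raise an alert) as soon as the rate crosses a threshold, rather than after the whole data set has been collected.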


Amazon Managed Streaming for Apache Kafka (Amazon MSK)

Amazon MSK is a fully managed service that makes it easy for you to build and run applications that use Apache Kafka to process streaming data




  • Apache Kafka is an open-source platform for building real-time streaming data pipelines and applications
  • With Amazon MSK, you can use native Apache Kafka APIs to populate data lakes, stream changes to and from databases, and power machine learning and analytics applications
  • With Amazon MSK, you can conveniently build and run production applications on Apache Kafka without needing expertise in managing Apache Kafka infrastructure
    • Spend less time managing infrastructure and more time developing applications
  • Apache Kafka serves as a data source for applications that continuously analyze streaming data and react to it


Amazon Redshift

The most popular and fastest cloud data warehouse


Redshift Data Lake Integration


Data warehousing

  • Amazon Redshift can run complex analytic queries against petabytes of structured data
  • It includes Redshift Spectrum, which runs SQL queries directly against structured and unstructured data in S3 without unnecessary data movement
  • Amazon Redshift costs less than one-tenth as much as traditional solutions
    • Starts at USD 0.25 per hour
    • About USD 1,000 per terabyte per year


Amazon QuickSight

Amazon QuickSight is a fast, cloud-powered business intelligence service that makes it easy to deliver insights to everyone in your organization


How QuickSight works diagram


Fast Business Analytics Service, Dashboards, and Visualization

  • For dashboards and visualizations, Amazon QuickSight provides a fast, powerful, cloud-based business analytics service that makes it easy to build visualizations and rich dashboards accessible from any browser or mobile device


AWS Data Pipeline

AWS Data Pipeline is a web service that helps you reliably process and move data between different AWS compute and storage services, as well as on-premises data sources, at specified intervals

  • With AWS Data Pipeline, you can regularly access your data where it's stored, transform and process it at scale, and efficiently transfer the results to AWS services such as Amazon S3, Amazon RDS, Amazon DynamoDB, and Amazon EMR
  • AWS Data Pipeline helps you easily create complex data processing workloads that are fault tolerant, repeatable, and highly available
  • You don't have to worry about ensuring resource availability, managing inter-task dependencies, retrying transient failures or timeouts in individual tasks, or creating a failure notification system
  • AWS Data Pipeline also allows you to move and process data that was previously locked up in on-premises data silos.
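
To make "managing inter-task dependencies" and "retrying transient failures" concrete, here is a small sketch of what that bookkeeping looks like when written by hand. It is not the AWS Data Pipeline API: the task names, the `run_pipeline` function, and the choice of `RuntimeError` as the transient failure are all assumptions made for illustration, using only the Python standard library.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_pipeline(
    tasks: dict[str, Callable[[], None]],
    depends_on: dict[str, set[str]],
    max_attempts: int = 3,
) -> list[str]:
    """Run tasks in dependency order, retrying each one on a transient failure."""
    completed = []
    # static_order() yields every task after all of the tasks it depends on.
    for name in TopologicalSorter(depends_on).static_order():
        for attempt in range(1, max_attempts + 1):
            try:
                tasks[name]()
                completed.append(name)
                break
            except RuntimeError:  # stand-in for a transient error or timeout
                if attempt == max_attempts:
                    raise  # give up; a real system would also send a notification
    return completed


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_transform() -> None:
        # Fails on the first call only, to simulate a transient failure.
        attempts["count"] += 1
        if attempts["count"] == 1:
            raise RuntimeError("transient failure")

    tasks = {"extract": lambda: None, "transform": flaky_transform, "load": lambda: None}
    print(run_pipeline(tasks, {"transform": {"extract"}, "load": {"transform"}}))
```

A managed service takes over this ordering and retry logic, along with scheduling and failure notification, so it does not have to live in application code.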


AWS Glue (Prepare and Load Data)

AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics


Simple, Flexible, and Cost-Effective ETL

  • You can create and run an ETL job with a few clicks in the AWS Management Console
  • You simply point AWS Glue to your data stored on AWS, and AWS Glue discovers your data and stores the associated metadata (e.g. table definition and schema) in the AWS Glue Data Catalog
  • Once cataloged, your data is immediately searchable, queryable, and available for ETL

How It Works

  1. Select a data source and a data target
  2. AWS Glue generates ETL code in Scala or Python that extracts data from the source, transforms it to match the target schema, and loads it into the target
  3. You can edit, debug, and test this code using the console, your preferred IDE, or a notebook
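
The extract, transform, and load steps above can be sketched in plain Python to show the shape of such a job. This is not the code AWS Glue generates and it does not use the Glue libraries: the CSV source, the column names, and the in-memory list used as the target are all hypothetical stand-ins.

```python
import csv
import io


def extract(csv_text: str) -> list[dict]:
    """Read rows from the source (here, CSV text standing in for a data store)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows: list[dict]) -> list[dict]:
    """Map source columns onto the target schema: typed IDs, normalized country codes."""
    return [
        {"user_id": int(row["id"]), "country": row["country"].strip().upper()}
        for row in rows
    ]


def load(rows: list[dict], target: list[dict]) -> int:
    """Append rows to the target (a list standing in for a table); return the row count."""
    target.extend(rows)
    return len(rows)


if __name__ == "__main__":
    source = "id,country\n1, kr\n2,us \n"
    table: list[dict] = []
    loaded = load(transform(extract(source)), table)
    print(loaded, table)
```

Keeping the three steps as separate functions mirrors how a generated script can be edited and tested one stage at a time.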

Use Cases


1. Queries Against an Amazon S3 Data Lake

Queries against an Amazon S3 Data Lake diagram


2. Analyze Log Data in Your Data Warehouse

Analyze log data in your data warehouse diagram


3. Unified View of Your Data Across Multiple Data Stores

View of data across data stores diagram


4. Event-driven ETL Pipelines

Event-driven ETL pipelines diagram



AWS Lake Formation

AWS Lake Formation is a service that makes it easy to set up a secure data lake in days


AWS Lake Formation How It Works


  • A data lake is a centralized, curated, and secured repository that stores all your data, both in its original form and prepared for analysis.

  • A data lake enables you to break down data silos and combine different types of analytics to gain insights and guide better business decisions

  • Creating a data lake with Lake Formation is as simple as defining data sources and what data access and security policies you want to apply

  • Lake Formation then helps you

    • collect and catalog data from databases and object storage
    • move the data into your new Amazon S3 data lake
    • clean and classify your data using machine learning algorithms
    • secure access to your sensitive data
  • Your users can access a centralized data catalog which describes available data sets and their appropriate usage

  • Your users then leverage these data sets with their choice of analytics and machine learning services, like Amazon Redshift, Amazon Athena, and (in beta) Amazon EMR for Apache Spark

  • Lake Formation builds on the capabilities available in AWS Glue.


How it works

  • Identify existing data stores in S3 or relational and NoSQL databases, and move the data into your data lake
  • Crawl, catalog, and prepare the data for analytics
  • Then provide your users secure self-service access to the data through their choice of analytics services
  • Other AWS services and third-party applications can also access data through the services shown
  • Lake Formation manages all of the tasks in the orange box and is integrated with the data stores and services shown in the blue boxes.



Summary

  • AWS provides an integrated suite of services that offers everything needed to build and manage a data lake for analytics quickly and easily
  • Amazon S3 is secure, highly scalable object storage with millisecond latency for data access