Seike1223/lecture

Introduction to R and Data Science

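The work of a data scientist can be seen as an interactive process of exploring, predicting, and interpreting the meaning of data. Linguistic analysis, in turn, is key to understanding the semantics and sentiment expressed in textual data. This course combines current statistical programming, text analytics, and natural language processing techniques. Through a concise, beginner-friendly design and hands-on guidance, it aims to help students with no prior programming background reach the following learning goals: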

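  • Understand the basics of the R language and its software ecosystem.
  • Understand the characteristics and preprocessing of structured and unstructured data, especially methods for handling the linguistic features that appear in Chinese text.
  • Understand the linguistic characteristics of Chinese and the basic concepts of text analytics.
  • Select appropriate variables and features and transform them sensibly, carry out descriptive statistics and visual exploration on them, and find suitable graphical representations and statistical analyses for different questions and data types.
  • Learn simple natural language processing and machine learning prediction models, and apply them to a field you care about.
  • Learn to carry out a data science project and communicate its results.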

What you will learn:

> The R programming language and its ecosystem of packages for data science
> Exploratory data analysis
> Text analytics and data mining concepts
> Light-weight Natural Language Processing and Machine Learning techniques

This is an introduction to data science with R, focused on textual data in particular. In the course you'll learn the intertwined processes of data manipulation and visualization through the tidyverse framework, as well as other packages related to text analytics. It is a suitable introduction for students who have no previous programming experience and are interested in learning to perform data analysis in their own field.

Course slides

Syllabus

Week  Date   Data science topic                                      Lab and programming progress
++++  +++++  ++++++++++++++++++++++++++++++++++++++++++++++++++++++  ++++++++++++++++++++++++++++++++++++++++++
1     09/12  Orientation                                             RStudio installation, Markdown, DataCamp custom track
2     09/19  Introduction to Data Science and Text Analytics         RStudio environment and basics of R: variables, data types, built-in plot
3     09/26  R data structures                                       ggplot2
4     10/03  Preparing / Obtaining Data                              data structures; I/O, looping
5     10/10  Holiday                                                 (review of statistics)
6     10/17  Data Wrangling                                          dplyr
7     10/24  Data Wrangling                                          more on dplyr; tidying data
8     10/31  Exploratory data analysis                               statistics (descriptive, hypothesis testing, linear regression)
9     11/07  Mid-term exam
10    11/14  Exploratory data analysis                               encoding; string manipulation; regular expressions
11    11/21  Corpus and NLP                                          handling large textual data; Chinese word segmentation/POS
12    11/28  (Guest lecture)                                         (Chinese AI-NLP forum)
13    12/05  Corpus and data modelling                               handling large Chinese textual data
14    12/12  Text analytics [I]: text classification and clustering  modeling and evaluation
15    12/19  Text analytics [II]: sentiment analysis
16    12/26  Text analytics [III]: texts in social media             web crawling: HTML parsing (rvest)
17    01/02  Reporting and Presenting Data                           data communication
18    01/09  Final term project presentation and report due          (oral and poster presentation)

Teaching team

謝舒凱 <shukaihsieh@g.ntu.edu.tw>
廖永賦 
Yolanda 
Jessy 
Joy

Learning resources

Check the website to learn more. If you have suggestions for changes, feel free to send feedback through this link or a GitHub issue. Thank you.
