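## Introduction to R and Data Science
The work of a `data scientist` can be seen as an interactive process of exploring, predicting, and interpreting the meaning of data. `Linguistic analysis`, in turn, is key to understanding the semantics and sentiment expressed in textual data. This course combines current techniques in statistical programming, text analytics, and natural language processing. Through a concise, beginner-friendly design and guided hands-on practice, it aims to help students with no prior programming experience reach the following learning goals:
- Understand the basics of the R language.
- Understand the characteristics and preprocessing of structured and unstructured data, especially methods for handling the linguistic properties of Chinese text.
- <span style="color:blue; font-weight:bold">Understand the linguistic characteristics of Chinese and the basic concepts of text analytics.</span>
- Select appropriate variables and features and transform them sensibly, explore them with descriptive statistics and visualization, and find suitable graphical representations and statistical analyses for different problems and data types.
- Learn simple natural language processing and machine learning prediction models, and apply them to a field you care about.
- Learn to carry out data science projects and to communicate and present the results.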
```markdown
> R basics
> Advanced R and Statistics
> Text analytics
> Natural Language Processing and Machine Learning
```
This is an introduction to data science with R, with a particular focus on textual data. In this course you will learn the intertwined processes of data manipulation and visualization through the tidyverse framework, along with other packages for text analytics. It is a suitable introduction for students who have no previous programming experience and are interested in learning to perform data analysis in their own field.
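As a small taste of tidyverse-style manipulation of textual data, here is a minimal sketch. It is not taken from the course materials: the `docs` table and its sentences are invented for illustration, and it assumes the `dplyr` and `stringr` packages are installed.

```r
library(dplyr)
library(stringr)

# Toy data invented for this illustration (not course material).
docs <- tibble(
  id   = 1:3,
  text = c("R makes data analysis fun",
           "Text analytics with R",
           "Tidy data, tidy text")
)

# Count whitespace-separated tokens in each text, then sort by length.
word_counts <- docs %>%
  mutate(n_words = str_count(text, "\\S+")) %>%
  arrange(desc(n_words))

print(word_counts)
```

Splitting on whitespace is only a rough stand-in for tokenization. It will not work for Chinese text, which is not space-delimited and needs a dedicated word segmenter.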
## Course slides
- <span style="color:brown; font-weight:bold"> Week.1 </span>
    - [slides](01/index.html)
- <span style="color:brown; font-weight:bold"> Week.2 </span>
    - [slides](02/index.html)
    - [R.notes.Rmd](02/week2_notes.Rmd)
## Syllabus
```r
knitr::kable(head(iris), format = 'html')
```
Week | Date | Topic | Lab
-----|:------:| --- | ---
1 | 09/12 | Orientation | RStudio, Markdown, DataCamp
2 | 09/19 | Introduction to Data Science and Text Analytics |
3 | 09/26 | Overview; data types and structures |
4 | 10/03 | Preparing / Obtaining Data | data structures; built-in plot; looping
5 | 10/10 | Scrubbing Data | Data wrangling, vectorization, tidyverse
6 | 10/17 | Exploratory Data Analysis and Graphics | encoding; string processing; regular expression
7 | 10/24 | Exploratory Data Analysis and Statistics | data manipulation (with regex)
8 | 10/31 | Exploratory Data Analysis and Statistics | handling Chinese textual data
9 | 11/07 | **Mid-term exam** |
10 | 11/14 | Exploratory Data Analysis and Statistics | web crawling: HTML parsing (rvest)
11 | 11/21 | Corpus and NLP |
12 | 11/28 | (Guest lecture) |
13 | 12/05 | Corpus and NLP |
14 | 12/12 | Text classification and clustering |
15 | 12/19 | Current Topics in Text Analytics | Shiny Web application [III]
16 | 12/26 | Reporting and Presenting Data | data communication
17 | 01/02 | Wrap-up |
18 | 01/09 | **Final term project presentation and report due** |
## Coaching team
```coffee
謝舒凱 <[email protected]>
廖永賦
Yolanda
Jessy
Joy
```
## TA office hours
Wednesdays, 12:30
## Learning resources
- DataCamp
- [Online learning resources](resources.html)
<!--
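- [Slides site](https://rlads2019.github.io/lecture/)
## TA handouts, exercises, and assignments
- [Grading policy](http://lope.linguistics.ntu.edu.tw/courses/data_science/grading_policy2016.html)
- [Lab site](https://rlads2019.github.io/lab/)
## Lecture materials
The course slides cover the basic concepts. If you are interested in more advanced content, see the following online materials:
- [語言分析與資料科學 (Linguistic Analysis and Data Science)](https://www.gitbook.com/book/loperntu/ladsbook/details)
- [開放語料庫:製程與分析 (Open Corpora: Construction and Analysis)](https://www.gitbook.com/book/loperntu/copens/details)
## Course-related activities
- [NTU COOL]()
- [DataCamp]()
- [Facebook group](https://www.facebook.com/groups/652099794893097/)
## Course ethos
1. Self-directed learning
2. Cross-disciplinary collaboration
## Assignment score distribution
## Group assignment showcase
## Capstone projects
- [Group list]()
- [pttR and the presidential election]()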
-->
See the [course website](https://rlads2019.github.io/) for more information.