-
Notifications
You must be signed in to change notification settings - Fork 0
/
Working with Multiple SAS Data Sets Key.sas
262 lines (213 loc) · 7.64 KB
/
Working with Multiple SAS Data Sets Key.sas
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
* ============================================================================;
* Lab: Working with Multiple SAS Data Sets
* This code is posted for your benefit. However, I highly recommend that you
* practice typing your own SAS programs as well. With the SAS programming
* language, as with all new languages, immersion seems to be the best way to
* learn.
* ============================================================================;
* Task 1.
* Download Cities SES.xlsx from Github and import all three sheets (Houston,
* Atlanta, and Charlotte) into SAS as separate SAS data sets. Name the SAS
* data sets Houston, Atlanta, and Charlotte.;
* ============================================================================;
proc import out = houston
datafile = "C:\Users\mbc0022\Desktop\City SES.xlsx"
dbms = xlsx replace;
sheet = "Houston";
run;
proc import out = atlanta
datafile = "C:\Users\mbc0022\Desktop\City SES.xlsx"
dbms = xlsx replace;
sheet = "Atlanta";
run;
proc import out = charlotte
datafile = "C:\Users\mbc0022\Desktop\City SES.xlsx"
dbms = xlsx replace;
sheet = "Charlotte";
run;
* Task 2.
* Vertically combine these three data sets. You may either concatenate them
* using a SAS DATA STEP or append them using PROC APPEND.
* Use the RENAME option to make the variable names in your output data match
* the variable names in the Houston data. This new SAS data set should be
* named Cities.;
* Create a variable in the Cities data set called site. The values of site
* should be equal to the city that data came from. For example, the value of
* site for all observations from the Houston data set should be “Houston.”
* First approach - concatenate;
* ============================================================================;
data cities;
set houston
atlanta (rename = (id = pid gender = sex))
charlotte (rename = (ses = ses_score));
length site $9;
if _N_ in (1:5) then site = "Houston";
else if _N_ in (6:10) then site = "Atlanta";
else site = "Charlotte";
run;
* Second approach - append;
* Note: PROC APPEND is computationally more efficient if you have very large
* data sets. We do not.;
* ============================================================================;
* First, delete the cities data we created above.;
proc datasets lib = work nolist;
delete cities;
quit;
* Now use PROC APPEND;
proc append
base = houston
data = atlanta (rename = (id = pid gender = sex));
run;
proc append
base = houston
data = charlotte (rename = (ses = ses_score));
run;
data cities;
set houston;
length site $9;
if _N_ in (1:5) then site = "Houston";
else if _N_ in (6:10) then site = "Atlanta";
else site = "Charlotte";
run;
* Task 3.
* Create a list report that displays the entire Cities data set. Give the
* report an appropriate title and a footnote with the current date and time.;
* ============================================================================;
title1 "List Report of the Combined Houston - Atlanta - Charlotte Data";
footnote "&sysdate at &systime";
proc print data = cities;
run;
* Task 4.
* Sort the Cities data set by PID in ascending order.;
* ============================================================================;
proc sort data = cities;
by pid;
run;
title1 "List Report of the Combined Houston - Atlanta - Charlotte Data";
title2 "Sorted by PID";
footnote "&sysdate at &systime";
proc print data = cities;
run;
* Third approach - SQL;
* We can accomplish Tasks 2-5 in one step.
* You aren't required to know this. Just an FYI.
* First, scroll up to the top of this code and create a clean copy of the
* Houston data set.;
* ============================================================================;
proc sql;
create table cities as
select *, "Houston" as site
from houston
union /* Default: concatenate by position. Use var names from first table */
select *, "Atlanta" as site
from atlanta
union
select *, "Charlotte" as site
from charlotte
order by pid;
title1 "Combined Houston - Atlanta - Charlotte Data";
title2 "Sorted by PID";
footnote "&sysdate at &systime";
select *
from cities;
quit;
* Task 6.
* Create a boxplot of SES Score by Site.;
* ============================================================================;
title1 "Boxplots of SES Score by Site";
footnote "&sysdate at &systime";
proc sgplot data = cities;
vbox ses_score / group = site;
run;
* Task 7.
* Go to the NHANES website. Download the following SAS Transport Files (.xpt)
* for the year 2013-2014:
* 1. Demographics
* 1.1 Demographic Variables and Sample Weights
* 2. Questionnaire
* 2.1 Medical Conditions
* 2.2 Smoking - Cigarette Use;
* Task 8. Convert these 3 SAS Transport Files into temporary SAS data sets.
* To do this, just double click on the transport files (.xpt files you
* downloaded).
* Task 9. Create a contents report of your work library. Familiarize yourself
* with the variables in each data set.;
* ============================================================================;
proc datasets lib = work nolist;
delete houston atlanta charlotte cities;
quit;
title;
proc contents data = work._all_;
run;
* Task 11.
* Merge the 3 data sets above into a new data set called NHANES.;
* ============================================================================;
data nhanes;
merge demo_h mcq_h smq_h;
by seqn;
run;
proc contents data = nhanes;
ods select attributes;
run;
* Second approach - SQL Join;
* ============================================================================;
proc sql;
create table nhanes_sql as
select *
from demo_h as d
full join
mcq_h as m
on d.seqn = m.seqn
full join
smq_h as s
on d.seqn = s.seqn;
quit;
proc contents data = nhanes_sql;
ods select attributes;
run;
* Task 13.
* Download the DFW Temps 2012 data from Github.
* You can do this in a DATA STEP by copying and pasting the file path into
* the SET statement. You could also create a library, using a LIBNAME
* statement, that points to the location of that file on your computer.
* ============================================================================;
data temps;
set "C:\Users\mbc0022\Desktop\dfwtemps2012.sas7bdat";
run;
* Task 14.
* Use a DATA STEP to reshape the DFW Temps 2012 data from wide to long.
* This will require using ARRAYS and DO LOOPS. Name the reshaped data set
* temps_long.
* ============================================================================;
data temps_long (keep = month day temp);
set temps;
array temps_array {*} temp_f1--temp_f31;
do i = 1 to dim(temps_array);
day = i;
temp = temps_array{i};
output;
end;
run;
* Task 15.
* Check temps_long for outlying/erroneous temperature values. You could do this
* in several ways (e.g., PROC MEANS).;
* ============================================================================;
proc means data = temps_long;
var temp;
run;
* Task 16.
* In the previous task, you should have found several temperature values that
* were equal to -999. These are definitely errors. Use a DATA STEP to recode
* these erroneous temperature values to missing.
* ============================================================================;
data temps_long;
set temps_long;
if temp = -999 then temp = .;
run;
* Task 17.
* Use temps_long to create a scatter plot. Map Months to the x-axis and
* Temperatures to the y-axis.
* ============================================================================;
proc sgplot data = temps_long;
scatter x = month y = temp;
run;