* Manual Input of Data;
DATA Favorites;
INPUT Names $ Food $ Activity $;
DATALINES;
Andy Milkshakes Golf
Eleanor Cauliflower Skiing
Georgiana Anything Gymnastics
;
RUN;
PROC PRINT DATA=Favorites;
RUN;
The SAS System
Obs | Names | Food | Activity |
---|---|---|---|
1 | Andy | Milkshak | Golf |
2 | Eleanor | Cauliflo | Skiing |
3 | Georgian | Anything | Gymnasti |
* Manual Input of Data;
DATA Favorites;
INPUT Names :$15. Food :$15. Activity :$15.;
DATALINES;
Andy Milkshakes Golf
Eleanor Cauliflower Skiing
Georgiana Anything Gymnastics
;
RUN;
PROC PRINT DATA=Favorites;
RUN;
The SAS System
Obs | Names | Food | Activity |
---|---|---|---|
1 | Andy | Milkshakes | Golf |
2 | Eleanor | Cauliflower | Skiing |
3 | Georgiana | Anything | Gymnastics |
* SET LIBNAME (Note this is slightly different in SAS OnDemand, use the link we have seen before);
libname mydata '/folders/myfolders/';
45 ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
46
47 * SET LIBNAME (Note this is slightly different in SAS OnDemand, use the link we have seen before);
48 libname mydata '/folders/myfolders/';
NOTE: Libref MYDATA was successfully assigned as follows:
Engine: V9
Physical Name: /folders/myfolders
49
50 ods html5 close;ods listing;
51
PROC MEANS DATA = mydata.housing;
RUN;
The SAS System
The MEANS Procedure
Variable | N | Mean | Std Dev | Minimum | Maximum |
---|---|---|---|---|---|
Living_Sq_Ft
Closing_Price
|
2000
2000
|
1840.32
244146.72
|
853.2350143
319263.47
|
488.0000000
20000.00
|
6963.00
7650000.00
|
PROC MEANS DATA = mydata.housing MAXDEC=0 MEAN RANGE STDDEV N MODE;
RUN;
The SAS System
The MEANS Procedure
Variable | Mean | Range | Std Dev | N | Mode |
---|---|---|---|---|---|
Living_Sq_Ft
Closing_Price
|
1840
244147
|
6475
7630000
|
853
319263
|
2000
2000
|
1008
125000
|
PROC MEANS DATA = mydata.housing MAXDEC=0 MEAN RANGE STDDEV N MODE;
VAR Living_Sq_Ft;
RUN;
The SAS System
The MEANS Procedure
Analysis Variable : Living_Sq_Ft | ||||
---|---|---|---|---|
Mean | Range | Std Dev | N | Mode |
1840 | 6475 | 853 | 2000 | 1008 |
PROC MEANS DATA = mydata.housing MAXDEC=0 MEAN RANGE STDDEV N MODE;
OUTPUT OUT = Summaries;
RUN;
The SAS System
The MEANS Procedure
Variable | Mean | Range | Std Dev | N | Mode |
---|---|---|---|---|---|
Living_Sq_Ft
Closing_Price
|
1840
244147
|
6475
7630000
|
853
319263
|
2000
2000
|
1008
125000
|
PROC PRINT DATA= Summaries;
TITLE 'DATA FILE FROM PROC MEANS';
RUN;
DATA FILE FROM PROC MEANS
Obs | _TYPE_ | _FREQ_ | _STAT_ | Living_Sq_Ft | Closing_Price |
---|---|---|---|---|---|
1 | 0 | 2000 | N | 2000 | 2000 |
2 | 0 | 2000 | MIN | 488 | 20000 |
3 | 0 | 2000 | MAX | 6963 | 7650000 |
4 | 0 | 2000 | MEAN | 1840.32 | 244146.7215 |
5 | 0 | 2000 | STD | 853.23501428 | 319263.46852 |
Data Bikes_Holiday;
set mydata.bikes;
if Holiday = 1;
RUN;
Proc Means DATA = Bikes_Holiday MEAN MEDIAN MODE RANGE;
VAR Count;
RUN;
DATA FILE FROM PROC MEANS
The MEANS Procedure
Analysis Variable : count | |||
---|---|---|---|
Mean | Median | Mode | Range |
185.8778135 | 133.0000000 | 4.0000000 | 711.0000000 |
PROC FREQ is a procedure for creating frequency tables. Each row in a frequency table corresponds to a level of a categorical variable.
PROC FREQ DATA = mydata.housing;
TABLES State;
RUN;
DATA FILE FROM PROC MEANS
The FREQ Procedure
State | Frequency | Percent | Cumulative Frequency |
Cumulative Percent |
---|---|---|---|---|
CA | 180 | 9.00 | 180 | 9.00 |
CO | 80 | 4.00 | 260 | 13.00 |
CT | 141 | 7.05 | 401 | 20.05 |
DC | 20 | 1.00 | 421 | 21.05 |
FL | 140 | 7.00 | 561 | 28.05 |
GA | 60 | 3.00 | 621 | 31.05 |
HI | 20 | 1.00 | 641 | 32.05 |
IA | 40 | 2.00 | 681 | 34.05 |
ID | 20 | 1.00 | 701 | 35.05 |
IL | 39 | 1.95 | 740 | 37.00 |
IN | 60 | 3.00 | 800 | 40.00 |
KS | 20 | 1.00 | 820 | 41.00 |
KY | 20 | 1.00 | 840 | 42.00 |
MA | 123 | 6.15 | 963 | 48.15 |
ME | 20 | 1.00 | 983 | 49.15 |
MI | 60 | 3.00 | 1043 | 52.15 |
MN | 40 | 2.00 | 1083 | 54.15 |
MO | 20 | 1.00 | 1103 | 55.15 |
MT | 20 | 1.00 | 1123 | 56.15 |
NC | 160 | 8.00 | 1283 | 64.15 |
NE | 40 | 2.00 | 1323 | 66.15 |
NY | 60 | 3.00 | 1383 | 69.15 |
OH | 120 | 6.00 | 1503 | 75.15 |
OK | 20 | 1.00 | 1523 | 76.15 |
OR | 20 | 1.00 | 1543 | 77.15 |
PA | 137 | 6.85 | 1680 | 84.00 |
RI | 20 | 1.00 | 1700 | 85.00 |
TX | 120 | 6.00 | 1820 | 91.00 |
UT | 20 | 1.00 | 1840 | 92.00 |
VA | 80 | 4.00 | 1920 | 96.00 |
WA | 60 | 3.00 | 1980 | 99.00 |
WY | 20 | 1.00 | 2000 | 100.00 |
DATA housing;
SET mydata.housing;
RUN;
What do these two data steps do?
DATA virginia;
Set mydata.housing;
IF state = 'VA';
RUN;
DATA colorado;
Set mydata.housing;
IF state = 'CO';
RUN;
DATA virginia;
Set mydata.housing;
IF state = 'VA';
RUN;
DATA colorado;
Set mydata.housing;
IF state = 'CO';
RUN;
113 ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
114
115 DATA virginia;
116 Set mydata.housing;
117 IF state = 'VA';
118 RUN;
NOTE: There were 2000 observations read from the data set MYDATA.HOUSING.
NOTE: The data set WORK.VIRGINIA has 80 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
119
120 DATA colorado;
121 Set mydata.housing;
122 IF state = 'CO';
123 RUN;
NOTE: There were 2000 observations read from the data set MYDATA.HOUSING.
NOTE: The data set WORK.COLORADO has 80 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
124 ods html5 close;ods listing;
125
In R to combine (or vertically stack two data frames) we used the rbind()
function.
This can also be done in a standard SAS DATA STEP.
DATA VA_CO;
SET virginia colorado;
RUN;
127 ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
128
129 DATA VA_CO;
130 SET virginia colorado;
131 RUN;
NOTE: There were 80 observations read from the data set WORK.VIRGINIA.
NOTE: There were 80 observations read from the data set WORK.COLORADO.
NOTE: The data set WORK.VA_CO has 160 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
132
133 ods html5 close;ods listing;
134
PROC PRINT DATA=VA_CO;
RUN;
DATA FILE FROM PROC MEANS
Obs | City | State | Zip_Code | Living_Sq_Ft | Closing_Price |
---|---|---|---|---|---|
1 | PROVIDENC | VA | 23140 | 1621 | 224900 |
2 | PROVIDENC | VA | 23140 | 1972 | 195196 |
3 | PROVIDENC | VA | 23140 | 1796 | 159000 |
4 | PROVIDENC | VA | 23140 | 2718 | 409950 |
5 | PROVIDENC | VA | 23140 | 1482 | 57900 |
6 | PROVIDENC | VA | 23140 | 2759 | 40000 |
7 | PROVIDENC | VA | 23140 | 2522 | 370000 |
8 | PROVIDENC | VA | 23140 | 2072 | 179900 |
9 | PROVIDENC | VA | 23140 | 1212 | 90000 |
10 | PROVIDENC | VA | 23140 | 3604 | 494369 |
11 | PROVIDENC | VA | 23140 | 3180 | 125000 |
12 | PROVIDENC | VA | 23140 | 2635 | 450000 |
13 | PROVIDENC | VA | 23140 | 3087 | 118900 |
14 | PROVIDENC | VA | 23140 | 2632 | 352282 |
15 | PROVIDENC | VA | 23140 | 3072 | 379990 |
16 | PROVIDENC | VA | 23140 | 3882 | 284700 |
17 | PROVIDENC | VA | 23140 | 2352 | 520000 |
18 | PROVIDENC | VA | 23140 | 1365 | 306553 |
19 | PROVIDENC | VA | 23140 | 1425 | 135000 |
20 | PROVIDENC | VA | 23140 | 5986 | 849000 |
21 | MIDDLETOW | VA | 22645 | 2259 | 480123 |
22 | MIDDLETOW | VA | 22645 | 3084 | 511000 |
23 | MIDDLETOW | VA | 22645 | 1200 | 189900 |
24 | MIDDLETOW | VA | 22645 | 1031 | 145000 |
25 | MIDDLETOW | VA | 22645 | 960 | 42500 |
26 | MIDDLETOW | VA | 22645 | 2480 | 120000 |
27 | MIDDLETOW | VA | 22645 | 1217 | 350000 |
28 | MIDDLETOW | VA | 22645 | 2251 | 582000 |
29 | MIDDLETOW | VA | 22645 | 1895 | 90000 |
30 | MIDDLETOW | VA | 22645 | 1932 | 265500 |
31 | MIDDLETOW | VA | 22645 | 2128 | 483000 |
32 | MIDDLETOW | VA | 22645 | 1337 | 209950 |
33 | MIDDLETOW | VA | 22645 | 2618 | 229500 |
34 | MIDDLETOW | VA | 22645 | 1090 | 155000 |
35 | MIDDLETOW | VA | 22645 | 1739 | 135000 |
36 | MIDDLETOW | VA | 22645 | 3588 | 468553 |
37 | MIDDLETOW | VA | 22645 | 3096 | 484107 |
38 | MIDDLETOW | VA | 22645 | 1889 | 301000 |
39 | MIDDLETOW | VA | 22645 | 3826 | 310000 |
40 | MIDDLETOW | VA | 22645 | 2578 | 424000 |
41 | HENRICO | VA | 23238 | 1980 | 240000 |
42 | HENRICO | VA | 23238 | 5976 | 1075000 |
43 | HENRICO | VA | 23238 | 2107 | 315000 |
44 | HENRICO | VA | 23238 | 1562 | 234950 |
45 | HENRICO | VA | 23238 | 1616 | 254500 |
46 | HENRICO | VA | 23238 | 3001 | 475000 |
47 | HENRICO | VA | 23238 | 1760 | 262500 |
48 | HENRICO | VA | 23238 | 1980 | 329000 |
49 | HENRICO | VA | 23238 | 1572 | 162500 |
50 | HENRICO | VA | 23238 | 1370 | 174950 |
51 | HENRICO | VA | 23238 | 2943 | 280000 |
52 | HENRICO | VA | 23238 | 1320 | 177000 |
53 | HENRICO | VA | 23238 | 2207 | 253000 |
54 | HENRICO | VA | 23238 | 1809 | 329000 |
55 | HENRICO | VA | 23238 | 1080 | 155000 |
56 | HENRICO | VA | 23238 | 2176 | 275000 |
57 | HENRICO | VA | 23238 | 1360 | 153500 |
58 | HENRICO | VA | 23238 | 1397 | 199950 |
59 | RICHMOND | VA | 23238 | 4638 | 815000 |
60 | HENRICO | VA | 23238 | 1360 | 234500 |
61 | RICHMOND | VA | 23236 | 3413 | 365000 |
62 | RICHMOND | VA | 23236 | 1396 | 104100 |
63 | RICHMOND | VA | 23236 | 2450 | 261250 |
64 | RICHMOND | VA | 23236 | 1902 | 43670 |
65 | RICHMOND | VA | 23236 | 2230 | 200000 |
66 | RICHMOND | VA | 23236 | 1766 | 192400 |
67 | RICHMOND | VA | 23236 | 3317 | 410000 |
68 | RICHMOND | VA | 23236 | 1304 | 126500 |
69 | RICHMOND | VA | 23236 | 2115 | 220000 |
70 | RICHMOND | VA | 23236 | 1448 | 77500 |
71 | RICHMOND | VA | 23236 | 1254 | 155000 |
72 | RICHMOND | VA | 23236 | 1548 | 171350 |
73 | RICHMOND | VA | 23236 | 4082 | 489500 |
74 | RICHMOND | VA | 23236 | 1764 | 199950 |
75 | RICHMOND | VA | 23236 | 1446 | 115000 |
76 | RICHMOND | VA | 23236 | 1528 | 185000 |
77 | RICHMOND | VA | 23236 | 2170 | 279000 |
78 | RICHMOND | VA | 23236 | 3556 | 275000 |
79 | RICHMOND | VA | 23236 | 2860 | 328000 |
80 | RICHMOND | VA | 23236 | 1340 | 182000 |
81 | PEYTON | CO | 80831 | 2459 | 52000 |
82 | PEYTON | CO | 80831 | 1491 | 178900 |
83 | PEYTON | CO | 80831 | 2878 | 400000 |
84 | PEYTON | CO | 80831 | 2552 | 209000 |
85 | PEYTON | CO | 80831 | 2170 | 259900 |
86 | PEYTON | CO | 80831 | 1680 | 217895 |
87 | PEYTON | CO | 80831 | 2277 | 283000 |
88 | PEYTON | CO | 80831 | 3134 | 350000 |
89 | PEYTON | CO | 80831 | 2476 | 216000 |
90 | PEYTON | CO | 80831 | 3334 | 245000 |
91 | PEYTON | CO | 80831 | 3080 | 255000 |
92 | PEYTON | CO | 80831 | 3382 | 70938 |
93 | PEYTON | CO | 80831 | 3016 | 244800 |
94 | PEYTON | CO | 80831 | 2448 | 278750 |
95 | PEYTON | CO | 80831 | 2523 | 276000 |
96 | PEYTON | CO | 80831 | 2958 | 258850 |
97 | PEYTON | CO | 80831 | 2280 | 223008 |
98 | PEYTON | CO | 80831 | 3055 | 295000 |
99 | PEYTON | CO | 80831 | 2128 | 140000 |
100 | PEYTON | CO | 80831 | 2865 | 323000 |
101 | TABERNASH | CO | 80478 | 1080 | 214500 |
102 | TABERNASH | CO | 80478 | 3296 | 555000 |
103 | TABERNASH | CO | 80478 | 1272 | 287000 |
104 | TABERNASH | CO | 80478 | 1575 | 349900 |
105 | TABERNASH | CO | 80478 | 3290 | 890000 |
106 | TABERNASH | CO | 80478 | 2410 | 685000 |
107 | TABERNASH | CO | 80478 | 1956 | 224000 |
108 | TABERNASH | CO | 80478 | 5126 | 1335000 |
109 | TABERNASH | CO | 80478 | 1054 | 164000 |
110 | TABERNASH | CO | 80478 | 1387 | 299900 |
111 | TABERNASH | CO | 80478 | 3000 | 355000 |
112 | TABERNASH | CO | 80478 | 3469 | 601000 |
113 | TABERNASH | CO | 80478 | 2968 | 619000 |
114 | TABERNASH | CO | 80478 | 2466 | 463600 |
115 | TABERNASH | CO | 80478 | 1600 | 265000 |
116 | TABERNASH | CO | 80478 | 1742 | 259000 |
117 | TABERNASH | CO | 80478 | 3914 | 132500 |
118 | TABERNASH | CO | 80478 | 2768 | 740000 |
119 | TABERNASH | CO | 80478 | 2194 | 395000 |
120 | TABERNASH | CO | 80478 | 2473 | 120000 |
121 | CENTENNIA | CO | 80122 | 4843 | 601000 |
122 | CENTENNIA | CO | 80122 | 1850 | 200000 |
123 | CENTENNIA | CO | 80122 | 2018 | 198000 |
124 | CENTENNIA | CO | 80122 | 1714 | 248000 |
125 | CENTENNIA | CO | 80122 | 2737 | 208000 |
126 | CENTENNIA | CO | 80122 | 2240 | 252500 |
127 | CENTENNIA | CO | 80122 | 2535 | 289000 |
128 | CENTENNIA | CO | 80122 | 1886 | 275000 |
129 | LITTLETON | CO | 80122 | 2621 | 310000 |
130 | CENTENNIA | CO | 80122 | 3160 | 451000 |
131 | CENTENNIA | CO | 80122 | 2939 | 271000 |
132 | CENTENNIA | CO | 80122 | 2236 | 270000 |
133 | CENTENNIA | CO | 80122 | 2068 | 262500 |
134 | CENTENNIA | CO | 80122 | 1968 | 249900 |
135 | CENTENNIA | CO | 80122 | 2596 | 359925 |
136 | CENTENNIA | CO | 80122 | 1680 | 200000 |
137 | LITTLETON | CO | 80122 | 1465 | 172000 |
138 | CENTENNIA | CO | 80122 | 1936 | 257450 |
139 | CENTENNIA | CO | 80122 | 3385 | 682000 |
140 | CENTENNIA | CO | 80122 | 4776 | 486250 |
141 | SNOWMASS | CO | 81615 | 2342 | 2500000 |
142 | SNOWMASS | CO | 81615 | 3423 | 613299 |
143 | SNOWMASS | CO | 81615 | 2476 | 955000 |
144 | SNOWMASS | CO | 81615 | 5489 | 6550000 |
145 | SNOWMASS | CO | 81615 | 5598 | 7650000 |
146 | SNOWMASS | CO | 81615 | 1938 | 3341250 |
147 | SNOWMASS | CO | 81615 | 1328 | 336262 |
148 | SNOWMASS | CO | 81615 | 1217 | 1085100 |
149 | SNOWMASS | CO | 81615 | 1520 | 1700000 |
150 | SNOWMASS | CO | 81615 | 894 | 450000 |
151 | SNOWMASS | CO | 81615 | 3923 | 2100000 |
152 | SNOWMASS | CO | 81615 | 1206 | 620000 |
153 | SNOWMASS | CO | 81615 | 4811 | 2575000 |
154 | SNOWMASS | CO | 81615 | 1563 | 1883300 |
155 | SNOWMASS | CO | 81615 | 1278 | 765000 |
156 | SNOWMASS | CO | 81615 | 2316 | 1760000 |
157 | SNOWMASS | CO | 81615 | 1935 | 300000 |
158 | SNOWMASS | CO | 81615 | 1973 | 495000 |
159 | SNOWMASS | CO | 81615 | 1218 | 769200 |
160 | SNOWMASS | CO | 81615 | 676 | 908400 |
The DATA STEP can also be used to merge data sets using the following format:
DATA newdata;
MERGE data1 data2;
BY ID_VAR;
RUN;
DATA middlenames;
INPUT Names :$15. MiddleName :$15. ;
DATALINES;
Eleanor Larson
Georgiana Otelia
;
RUN;
143 ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
144
145 DATA middlenames;
146 INPUT Names :$15. MiddleName :$15. ;
147 DATALINES;
NOTE: The data set WORK.MIDDLENAMES has 2 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
150 ;
151 RUN;
152 ods html5 close;ods listing;
153
DATA COMBINED;
MERGE FAVORITES MIDDLENAMES;
BY NAMES;
RUN;
155 ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
156
157 DATA COMBINED;
158 MERGE FAVORITES MIDDLENAMES;
159 BY NAMES;
160 RUN;
NOTE: There were 3 observations read from the data set WORK.FAVORITES.
NOTE: There were 2 observations read from the data set WORK.MIDDLENAMES.
NOTE: The data set WORK.COMBINED has 3 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
161 ods html5 close;ods listing;
162
What does the following code return?
PROC PRINT DATA = COMBINED;
TITLE 'Hoegh Family';
RUN;
PROC PRINT DATA = COMBINED;
TITLE 'Hoegh Family';
RUN;
Hoegh Family
Obs | Names | Food | Activity | MiddleName |
---|---|---|---|---|
1 | Andy | Milkshakes | Golf | |
2 | Eleanor | Cauliflower | Skiing | Larson |
3 | Georgiana | Anything | Gymnastics | Otelia |
Using the shark attacks data set (attacks):
A single DATA step can be used to create multiple data sets.
DATA virginia colorado others;
SET mydata.housing;
IF state = 'CO' THEN OUTPUT colorado;
ELSE IF state = 'VA' THEN OUTPUT virginia;
ELSE OUTPUT others;
RUN;
172 ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
173
174 DATA virginia colorado others;
175 SET mydata.housing;
176 IF state = 'CO' THEN OUTPUT colorado;
177 ELSE IF state = 'VA' THEN OUTPUT virginia;
178 ELSE OUTPUT others;
179 RUN;
NOTE: There were 2000 observations read from the data set MYDATA.HOUSING.
NOTE: The data set WORK.VIRGINIA has 80 observations and 5 variables.
NOTE: The data set WORK.COLORADO has 80 observations and 5 variables.
NOTE: The data set WORK.OTHERS has 1840 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
180 ods html5 close;ods listing;
181
Often in R we used the function seq()
particularly when thinking about creating graphics. A similar approach in R takes advantage of the DO
command.
DATA quadratic;
DO x=1 TO 10;
y= x ** 2;
OUTPUT;
END;
RUN;
183 ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
184
185 DATA quadratic;
186 DO x=1 TO 10;
187 y= x ** 2;
188 OUTPUT;
189 END;
190 RUN;
NOTE: The data set WORK.QUADRATIC has 10 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
191 ods html5 close;ods listing;
192
PROC PRINT DATA=quadratic;
TITLE;
RUN;
Obs | x | y |
---|---|---|
1 | 1 | 1 |
2 | 2 | 4 |
3 | 3 | 9 |
4 | 4 | 16 |
5 | 5 | 25 |
6 | 6 | 36 |
7 | 7 | 49 |
8 | 8 | 64 |
9 | 9 | 81 |
10 | 10 | 100 |
Data set options can be used in both DATA and PROC statements. To use a data set option, put it between parentheses directly following the data set. The syntax follows as:
DATA newdata;
SET olddata (options here);
RUN;
PROC PRINT DATA = dataset (options here);
RUN;
DATA HoeghFamily;
SET COMBINED (KEEP = Food Activity MiddleName);
RUN;
202 ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
203
204 DATA HoeghFamily;
205 SET COMBINED (KEEP = Food Activity MiddleName);
206 RUN;
NOTE: There were 3 observations read from the data set WORK.COMBINED.
NOTE: The data set WORK.HOEGHFAMILY has 3 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
207 ods html5 close;ods listing;
208
Consider the following code, what will this print?
PROC PRINT DATA= HoeghFamily (Firstobs = 2);
RUN;
PROC PRINT DATA= HoeghFamily (Firstobs = 2);
RUN;
Obs | Food | Activity | MiddleName |
---|---|---|---|
2 | Cauliflower | Skiing | Larson |
3 | Anything | Gymnastics | Otelia |
SAS has built in variables that can be quite useful.
PROC SORT DATA=mydata.housing OUT=housing;
BY State;
RUN;
217 ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
218
219 PROC SORT DATA=mydata.housing OUT=housing;
220 BY State;
221 RUN;
NOTE: There were 2000 observations read from the data set MYDATA.HOUSING.
NOTE: The data set WORK.HOUSING has 2000 observations and 5 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
222 ods html5 close;ods listing;
223
DATA HousingSample;
SET housing;
BY STATE;
IF FIRST.STATE =1;
RUN;
225 ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
226
227 DATA HousingSample;
228 SET housing;
229 BY STATE;
230 IF FIRST.STATE =1;
231 RUN;
NOTE: There were 2000 observations read from the data set WORK.HOUSING.
NOTE: The data set WORK.HOUSINGSAMPLE has 32 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
232 ods html5 close;ods listing;
233
PROC PRINT DATA=HousingSample;
RUN;
Obs | City | State | Zip_Code | Living_Sq_Ft | Closing_Price |
---|---|---|---|---|---|
1 | BODEGA BA | CA | 94923 | 1237 | 680000 |
2 | PEYTON | CO | 80831 | 2459 | 52000 |
3 | TOLLAND | CT | 60842 | 2083 | 298900 |
4 | WASHINGTO | DC | 20012 | 2345 | 626000 |
5 | LAKELAND | FL | 33812 | 1713 | 105000 |
6 | MCDONOUGH | GA | 30252 | 3066 | 369400 |
7 | HONOLULU | HI | 96826 | 1200 | 83900 |
8 | FAIRFIELD | IA | 52556 | 1740 | 111800 |
9 | CALDWELL | ID | 83607 | 1666 | 172225 |
10 | JOLIET | IL | 60433 | 1400 | 91000 |
11 | FORT WAYN | IN | 46805 | 1289 | 86773 |
12 | TOPEKA | KS | 66608 | 1431 | 104500 |
13 | BARDSTOWN | KY | 40004 | 2494 | 23700 |
14 | LOWELL | MA | 18501 | 3229 | 400000 |
15 | YORK | ME | 39095 | 2304 | 310230 |
16 | FLINT | MI | 48504 | 1358 | 50600 |
17 | LAKE CITY | MN | 55041 | 1463 | 299900 |
18 | SAINT LOU | MO | 63125 | 1048 | 165000 |
19 | LEWISTOWN | MT | 59457 | 2464 | 150920 |
20 | CHERRYVIL | NC | 28021 | 3362 | 47000 |
21 | LINCOLN | NE | 68524 | 1458 | 139000 |
22 | DRYDEN | NY | 13053 | 1104 | 114000 |
23 | MAUMEE | OH | 43537 | 3301 | 314500 |
24 | CLAREMORE | OK | 74019 | 1400 | 106500 |
25 | CAVE JUNC | OR | 97523 | 1440 | 51000 |
26 | PITTSBURG | PA | 15234 | 1450 | 119000 |
27 | CRANSTON | RI | 29051 | 1259 | 259000 |
28 | ALLEN | TX | 75002 | 1780 | 148500 |
29 | HUNTSVILL | UT | 84317 | 2910 | 520695 |
30 | PROVIDENC | VA | 23140 | 1621 | 224900 |
31 | VANCOUVER | WA | 98682 | 1508 | 175000 |
32 | ROCK SPRI | WY | 82901 | 1820 | 191893 |
Data USA;
set mydata.attacks;
if Country = 'USA';
RUN;
Data Australia;
set mydata.attacks;
if Country = 'AUSTRALIA';
RUN;
DATA Combined;
set USA Australia;
RUN;
Proc Freq Data = Combined;
TABLES Activity;
RUN;
The FREQ Procedure
Activity | Frequency | Percent | Cumulative Frequency |
Cumulative Percent |
---|---|---|---|---|
Bathing | 82 | 4.46 | 82 | 4.46 |
Diving | 56 | 3.04 | 138 | 7.50 |
Fishing | 220 | 11.96 | 358 | 19.46 |
Snorkeling | 44 | 2.39 | 402 | 21.85 |
Spearfishing | 118 | 6.41 | 520 | 28.26 |
Standing | 66 | 3.59 | 586 | 31.85 |
Surfing | 704 | 38.26 | 1290 | 70.11 |
Swimming | 438 | 23.80 | 1728 | 93.91 |
Wading | 112 | 6.09 | 1840 | 100.00 |
Next we will focus on statistical procedure available in SAS.
PROC UNIVARIATE produces statistics and graphs describing the distribution of a single variable.
The syntax follows as:
PROC UNIVARIATE DATA = yourdata;
VAR variblename;
RUN;
PROC UNIVARIATE DATA=housing;
VAR LIVING_SQ_FT;
RUN;
The UNIVARIATE Procedure
Variable: Living_Sq_Ft
Moments | |||
---|---|---|---|
N | 2000 | Sum Weights | 2000 |
Mean | 1840.32 | Sum Observations | 3680640 |
Std Deviation | 853.235014 | Variance | 728009.99 |
Skewness | 1.77833485 | Kurtosis | 4.87450208 |
Uncorrected SS | 8228847374 | Corrected SS | 1455291969 |
Coeff Variation | 46.363405 | Std Error Mean | 19.0789149 |
Basic Statistical Measures | |||
---|---|---|---|
Location | Variability | ||
Mean | 1840.320 | Std Deviation | 853.23501 |
Median | 1625.000 | Variance | 728010 |
Mode | 1008.000 | Range | 6475 |
Interquartile Range | 944.50000 |
Tests for Location: Mu0=0 | ||||
---|---|---|---|---|
Test | Statistic | p Value | ||
Student's t | t | 96.45832 | Pr > |t| | <.0001 |
Sign | M | 1000 | Pr >= |M| | <.0001 |
Signed Rank | S | 1000500 | Pr >= |S| | <.0001 |
Quantiles (Definition 5) | |
---|---|
Level | Quantile |
100% Max | 6963.0 |
99% | 4838.5 |
95% | 3423.5 |
90% | 2938.0 |
75% Q3 | 2194.5 |
50% Median | 1625.0 |
25% Q1 | 1250.0 |
10% | 1019.0 |
5% | 904.5 |
1% | 704.0 |
0% Min | 488.0 |
Extreme Observations | |||
---|---|---|---|
Lowest | Highest | ||
Value | Obs | Value | Obs |
488 | 631 | 6046 | 977 |
540 | 639 | 6506 | 1766 |
576 | 122 | 6640 | 1647 |
594 | 158 | 6695 | 1812 |
596 | 637 | 6963 | 610 |
We will go in depth on SAS graphics next week, but many of the statistical procedure have built in options for creating graphics.
Syntax follows as:
PROC UNIVARIATE DATA=yourdata;
VAR yourvariable;
plot-request yourvariable/ options;
RUN;
PROC UNIVARIATE DATA=housing noprint;
VAR LIVING_SQ_FT;
HISTOGRAM LIVING_SQ_FT / Normal; * Normal creates a normal curve;
RUN;
The UNIVARIATE Procedure
The UNIVARIATE Procedure
Fitted Normal Distribution for Living_Sq_Ft
Parameters for Normal Distribution | ||
---|---|---|
Parameter | Symbol | Estimate |
Mean | Mu | 1840.32 |
Std Dev | Sigma | 853.235 |
Goodness-of-Fit Tests for Normal Distribution | ||||
---|---|---|---|---|
Test | Statistic | p Value | ||
Kolmogorov-Smirnov | D | 0.1094403 | Pr > D | <0.010 |
Cramer-von Mises | W-Sq | 10.0771813 | Pr > W-Sq | <0.005 |
Anderson-Darling | A-Sq | 59.7937631 | Pr > A-Sq | <0.005 |
Quantiles for Normal Distribution | ||
---|---|---|
Percent | Quantile | |
Observed | Estimated | |
1.0 | 704.000 | -144.601 |
5.0 | 904.500 | 436.873 |
10.0 | 1019.000 | 746.855 |
25.0 | 1250.000 | 1264.822 |
50.0 | 1625.000 | 1840.320 |
75.0 | 2194.500 | 2415.818 |
90.0 | 2938.000 | 2933.785 |
95.0 | 3423.500 | 3243.767 |
99.0 | 4838.500 | 3825.241 |
PROC UNIVARIATE DATA=housing noprint;
VAR LIVING_SQ_FT;
HISTOGRAM LIVING_SQ_FT / Lognormal; * Normal creates a normal curve;
RUN;
The UNIVARIATE Procedure
The UNIVARIATE Procedure
Fitted Lognormal Distribution for Living_Sq_Ft
Parameters for Lognormal Distribution | ||
---|---|---|
Parameter | Symbol | Estimate |
Threshold | Theta | 0 |
Scale | Zeta | 7.428118 |
Shape | Sigma | 0.413559 |
Mean | 1832.862 | |
Std Dev | 791.5927 |
Goodness-of-Fit Tests for Lognormal Distribution | ||||
---|---|---|---|---|
Test | Statistic | p Value | ||
Kolmogorov-Smirnov | D | 0.03801564 | Pr > D | <0.010 |
Cramer-von Mises | W-Sq | 0.71048795 | Pr > W-Sq | <0.005 |
Anderson-Darling | A-Sq | 4.20760256 | Pr > A-Sq | <0.005 |
Quantiles for Lognormal Distribution | ||
---|---|---|
Percent | Quantile | |
Observed | Estimated | |
1.0 | 704.000 | 642.930 |
5.0 | 904.500 | 852.245 |
10.0 | 1019.000 | 990.412 |
25.0 | 1250.000 | 1273.057 |
50.0 | 1625.000 | 1682.638 |
75.0 | 2194.500 | 2223.994 |
90.0 | 2938.000 | 2858.682 |
95.0 | 3423.500 | 3322.135 |
99.0 | 4838.500 | 4403.704 |
In addition the options outlined earlier, here are a few more options built into PROC MEANS.
Q: Describe a t-test and give an example of when it might be used (ideally in the context of one of the data sets we have seen in this course).
This is useful to test whether the mean is significantly different from a hypothesis $H_0$ value of n.
PROC TTEST DATA=yourdata H0 = n options;
VAR variable;
RUN;
Options include:
PROC TTEST DATA=HOUSING H0 = 2500;
VAR LIVING_SQ_FT;
RUN;
The TTEST Procedure
Variable: Living_Sq_Ft
N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|
2000 | 1840.3 | 853.2 | 19.0789 | 488.0 | 6963.0 |
Mean | 95% CL Mean | Std Dev | 95% CL Std Dev | ||
---|---|---|---|---|---|
1840.3 | 1802.9 | 1877.7 | 853.2 | 827.6 | 880.5 |
DF | t Value | Pr > |t| |
---|---|---|
1999 | -34.58 | <.0001 |
To compare differences between two groups use this procedure:
PROC TTEST DATA=yourdata options;
CLASS variable;
VAR variable;
RUN;
Class is the categorical variable that you like to test differences.
PROC TTEST DATA=VA_CO;
CLASS State;
VAR Closing_Price;
RUN;
The TTEST Procedure
Variable: Closing_Price
State | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
CO | 80 | 716647 | 1196747 | 133800 | 52000.0 | 7650000 |
VA | 80 | 281179 | 182367 | 20389.3 | 40000.0 | 1075000 |
Diff (1-2) | 435468 | 855997 | 135345 |
State | Method | Mean | 95% CL Mean | Std Dev | 95% CL Std Dev | ||
---|---|---|---|---|---|---|---|
CO | 716647 | 450324 | 982970 | 1196747 | 1035728 | 1417514 | |
VA | 281179 | 240595 | 321763 | 182367 | 157830 | 216009 | |
Diff (1-2) | Pooled | 435468 | 168149 | 702787 | 855997 | 771122 | 962032 |
Diff (1-2) | Satterthwaite | 435468 | 166256 | 704680 |
Method | Variances | DF | t Value | Pr > |t| |
---|---|---|---|---|
Pooled | Equal | 158 | 3.22 | 0.0016 |
Satterthwaite | Unequal | 82.667 | 3.22 | 0.0018 |
Equality of Variances | ||||
---|---|---|---|---|
Method | Num DF | Den DF | F Value | Pr > F |
Folded F | 79 | 79 | 43.06 | <.0001 |
Another type of t-test involves paired data.
PROC TTEST DATA=yourdata options;
PAIRED variable1 * variable2;
RUN;
The ttest procedure automatically creates a few graphics, but that can be controlled using the following syntax.
PROC TTEST DATA=yourdata PLOTS = (plot-request)
The plot requests are:
Proc Univariate Data = mydata.bikes;
Var Registered Casual;
HISTOGRAM Registered Casual / Normal; * Normal creates a normal curve;
run;
The UNIVARIATE Procedure
Variable: registered
Moments | |||
---|---|---|---|
N | 10886 | Sum Weights | 10886 |
Mean | 155.552177 | Sum Observations | 1693341 |
Std Deviation | 151.039033 | Variance | 22812.7895 |
Skewness | 1.52480459 | Kurtosis | 2.626081 |
Uncorrected SS | 511720093 | Corrected SS | 248317214 |
Coeff Variation | 97.0986301 | Std Error Mean | 1.44762152 |
Basic Statistical Measures | |||
---|---|---|---|
Location | Variability | ||
Mean | 155.5522 | Std Deviation | 151.03903 |
Median | 118.0000 | Variance | 22813 |
Mode | 3.0000 | Range | 886.00000 |
Interquartile Range | 186.00000 |
Tests for Location: Mu0=0 | ||||
---|---|---|---|---|
Test | Statistic | p Value | ||
Student's t | t | 107.4536 | Pr > |t| | <.0001 |
Sign | M | 5435.5 | Pr >= |M| | <.0001 |
Signed Rank | S | 29547378 | Pr >= |S| | <.0001 |
Quantiles (Definition 5) | |
---|---|
Level | Quantile |
100% Max | 886 |
99% | 697 |
95% | 464 |
90% | 354 |
75% Q3 | 222 |
50% Median | 118 |
25% Q1 | 36 |
10% | 7 |
5% | 4 |
1% | 1 |
0% Min | 0 |
Extreme Observations | |||
---|---|---|---|
Lowest | Highest | ||
Value | Obs | Value | Obs |
0 | 4012 | 833 | 9585 |
0 | 3890 | 839 | 9897 |
0 | 3867 | 857 | 9298 |
0 | 3725 | 857 | 9753 |
0 | 3318 | 886 | 9346 |
The UNIVARIATE Procedure
The UNIVARIATE Procedure
Fitted Normal Distribution for registered
Parameters for Normal Distribution | ||
---|---|---|
Parameter | Symbol | Estimate |
Mean | Mu | 155.5522 |
Std Dev | Sigma | 151.039 |
Goodness-of-Fit Tests for Normal Distribution | ||||
---|---|---|---|---|
Test | Statistic | p Value | ||
Kolmogorov-Smirnov | D | 0.151715 | Pr > D | <0.010 |
Cramer-von Mises | W-Sq | 59.670109 | Pr > W-Sq | <0.005 |
Anderson-Darling | A-Sq | 383.449398 | Pr > A-Sq | <0.005 |
Quantiles for Normal Distribution | ||
---|---|---|
Percent | Quantile | |
Observed | Estimated | |
1.0 | 1.00000 | -195.8172 |
5.0 | 4.00000 | -92.8849 |
10.0 | 7.00000 | -38.0121 |
25.0 | 36.00000 | 53.6779 |
50.0 | 118.00000 | 155.5522 |
75.0 | 222.00000 | 257.4265 |
90.0 | 354.00000 | 349.1165 |
95.0 | 464.00000 | 403.9893 |
99.0 | 697.00000 | 506.9215 |
The UNIVARIATE Procedure
Variable: casual
Moments | |||
---|---|---|---|
N | 10886 | Sum Weights | 10886 |
Mean | 36.0219548 | Sum Observations | 392135 |
Std Deviation | 49.9604766 | Variance | 2496.04922 |
Skewness | 2.4957484 | Kurtosis | 7.55162931 |
Uncorrected SS | 41294965 | Corrected SS | 27169495.8 |
Coeff Variation | 138.694518 | Std Error Mean | 0.47884219 |
Basic Statistical Measures | |||
---|---|---|---|
Location | Variability | ||
Mean | 36.02195 | Std Deviation | 49.96048 |
Median | 17.00000 | Variance | 2496 |
Mode | 0.00000 | Range | 367.00000 |
Interquartile Range | 45.00000 |
Tests for Location: Mu0=0 | ||||
---|---|---|---|---|
Test | Statistic | p Value | ||
Student's t | t | 75.2272 | Pr > |t| | <.0001 |
Sign | M | 4950 | Pr >= |M| | <.0001 |
Signed Rank | S | 24504975 | Pr >= |S| | <.0001 |
Quantiles (Definition 5) | |
---|---|
Level | Quantile |
100% Max | 367 |
99% | 241 |
95% | 141 |
90% | 94 |
75% Q3 | 49 |
50% Median | 17 |
25% Q1 | 4 |
10% | 1 |
5% | 0 |
1% | 0 |
0% Min | 0 |
Extreme Observations | |||
---|---|---|---|
Lowest | Highest | ||
Value | Obs | Value | Obs |
0 | 10866 | 356 | 7687 |
0 | 10844 | 357 | 6729 |
0 | 10842 | 361 | 7686 |
0 | 10840 | 362 | 9652 |
0 | 10839 | 367 | 6730 |
The UNIVARIATE Procedure
The UNIVARIATE Procedure
Fitted Normal Distribution for casual
Parameters for Normal Distribution | ||
---|---|---|
Parameter | Symbol | Estimate |
Mean | Mu | 36.02195 |
Std Dev | Sigma | 49.96048 |
Goodness-of-Fit Tests for Normal Distribution | ||||
---|---|---|---|---|
Test | Statistic | p Value | ||
Kolmogorov-Smirnov | D | 0.235452 | Pr > D | <0.010 |
Cramer-von Mises | W-Sq | 171.574837 | Pr > W-Sq | <0.005 |
Anderson-Darling | A-Sq | 948.029130 | Pr > A-Sq | <0.005 |
Quantiles for Normal Distribution | ||
---|---|---|
Percent | Quantile | |
Observed | Estimated | |
1.0 | 0.000 | -80.20349 |
5.0 | 0.000 | -46.15572 |
10.0 | 1.000 | -28.00497 |
25.0 | 4.000 | 2.32413 |
50.0 | 17.000 | 36.02195 |
75.0 | 49.000 | 69.71978 |
90.0 | 94.000 | 100.04888 |
95.0 | 141.000 | 118.19963 |
99.0 | 241.000 | 152.24740 |
Similar to lm()
in R, we can also easily run regression in SAS.
PROC REG DATA=yourdata;
MODEL dependent = independent;
RUN;
Note including categorical variables can take a bit of extra work to create dummy variables (or use PROC GLM).
PROC REG DATA=housing;
MODEL CLOSING_PRICE = LIVING_SQ_FT;
RUN;
The REG Procedure
Model: MODEL1
Dependent Variable: Closing_Price
Number of Observations Read | 2000 |
---|---|
Number of Observations Used | 2000 |
Analysis of Variance | |||||
---|---|---|---|---|---|
Source | DF | Sum of Squares |
Mean Square |
F Value | Pr > F |
Model | 1 | 3.287656E13 | 3.287656E13 | 384.41 | <.0001 |
Error | 1998 | 1.708798E14 | 85525443800 | ||
Corrected Total | 1999 | 2.037564E14 |
Root MSE | 292447 | R-Square | 0.1614 |
---|---|---|---|
Dependent Mean | 244147 | Adj R-Sq | 0.1609 |
Coeff Var | 119.78344 |
Parameter Estimates | |||||
---|---|---|---|---|---|
Variable | DF | Parameter Estimate |
Standard Error |
t Value | Pr > |t| |
Intercept | 1 | -32459 | 15550 | -2.09 | 0.0370 |
Living_Sq_Ft | 1 | 150.30316 | 7.66607 | 19.61 | <.0001 |
The REG Procedure
Model: MODEL1
Dependent Variable: Closing_Price
Graphics can easily be created from PROC REG using the following syntax:
PROC REG DATA=yourdata PLOTS(options) = (plot request list);
MODEL dependent = independent;
RUN;
For the options trailing PLOTS use (ONLY) if you wish to omit the default plots.
Plot requests
PROC REG DATA=housing PLOTS(ONLY) = DIAGNOSTICS;
MODEL CLOSING_PRICE = LIVING_SQ_FT;
RUN;
The REG Procedure
Model: MODEL1
Dependent Variable: Closing_Price
Number of Observations Read | 2000 |
---|---|
Number of Observations Used | 2000 |
Analysis of Variance | |||||
---|---|---|---|---|---|
Source | DF | Sum of Squares |
Mean Square |
F Value | Pr > F |
Model | 1 | 3.287656E13 | 3.287656E13 | 384.41 | <.0001 |
Error | 1998 | 1.708798E14 | 85525443800 | ||
Corrected Total | 1999 | 2.037564E14 |
Root MSE | 292447 | R-Square | 0.1614 |
---|---|---|---|
Dependent Mean | 244147 | Adj R-Sq | 0.1609 |
Coeff Var | 119.78344 |
Parameter Estimates | |||||
---|---|---|---|---|---|
Variable | DF | Parameter Estimate |
Standard Error |
t Value | Pr > |t| |
Intercept | 1 | -32459 | 15550 | -2.09 | 0.0370 |
Living_Sq_Ft | 1 | 150.30316 | 7.66607 | 19.61 | <.0001 |
The REG Procedure
Model: MODEL1
Dependent Variable: Closing_Price
This procedure allows you to run an analysis of variance procedure with the following syntax:
PROC ANOVA DATA =yourdata;
CLASS classvariable;
MODEL dependent = effects;
RUN;
DATA ANDYLIVED;
SET HOUSING;
IF STATE in ('IA', 'MT','CO','MN','CA','VA');
RUN;
PROC ANOVA DATA=ANDYLIVED;
CLASS STATE;
MODEL CLOSING_PRICE = STATE;
RUN;
The ANOVA Procedure
Class Level Information | ||
---|---|---|
Class | Levels | Values |
State | 6 | CA CO IA MN MT VA |
Number of Observations Read | 440 |
---|---|
Number of Observations Used | 440 |
The ANOVA Procedure
Dependent Variable: Closing_Price
Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
---|---|---|---|---|---|
Model | 5 | 1.43091E13 | 2.8618199E12 | 9.81 | <.0001 |
Error | 434 | 1.2654988E14 | 291589585117 | ||
Corrected Total | 439 | 1.4085898E14 |
R-Square | Coeff Var | Root MSE | Closing_Price Mean |
---|---|---|---|
0.101585 | 143.7899 | 539990.4 | 375541.3 |
Source | DF | Anova SS | Mean Square | F Value | Pr > F |
---|---|---|---|---|---|
State | 5 | 1.43091E13 | 2.8618199E12 | 9.81 | <.0001 |
Proc REG Data = mydata.bikes;
MODEL Count = season holiday temp workingday weather;
RUN;
The REG Procedure
Model: MODEL1
Dependent Variable: count
Number of Observations Read | 10886 |
---|---|
Number of Observations Used | 10886 |
Analysis of Variance | |||||
---|---|---|---|---|---|
Source | DF | Sum of Squares |
Mean Square |
F Value | Pr > F |
Model | 5 | 61261467 | 12252293 | 450.49 | <.0001 |
Error | 10880 | 295911447 | 27198 | ||
Corrected Total | 10885 | 357172914 |
Root MSE | 164.91738 | R-Square | 0.1715 |
---|---|---|---|
Dependent Mean | 191.57413 | Adj R-Sq | 0.1711 |
Coeff Var | 86.08541 |
Parameter Estimates | |||||
---|---|---|---|---|---|
Variable | DF | Parameter Estimate |
Standard Error |
t Value | Pr > |t| |
Intercept | 1 | 32.97402 | 6.68752 | 4.93 | <.0001 |
season | 1 | 11.16378 | 1.46727 | 7.61 | <.0001 |
holiday | 1 | -8.24676 | 9.80449 | -0.84 | 0.4003 |
temp | 1 | 8.61542 | 0.21053 | 40.92 | <.0001 |
workingday | 1 | 1.09982 | 3.50658 | 0.31 | 0.7538 |
weather | 1 | -31.15689 | 2.49998 | -12.46 | <.0001 |
The REG Procedure
Model: MODEL1
Dependent Variable: count