* Manual Input of Data;
DATA Favorites;
    INPUT Names $ Food $ Activity $;
        DATALINES;
    Andy Milkshakes Golf
    Eleanor Cauliflower Skiing
    Georgiana Anything Gymnastics
        ;
RUN;
PROC PRINT DATA=Favorites;
RUN;
The SAS System
| Obs | Names | Food | Activity | 
|---|---|---|---|
| 1 | Andy | Milkshak | Golf | 
| 2 | Eleanor | Cauliflo | Skiing | 
| 3 | Georgian | Anything | Gymnasti | 
* Manual Input of Data;
DATA Favorites;
    INPUT Names :$15. Food :$15. Activity :$15.;
        DATALINES;
    Andy Milkshakes Golf
    Eleanor Cauliflower Skiing
    Georgiana Anything Gymnastics
        ;
RUN;
PROC PRINT DATA=Favorites;
RUN;
The SAS System
| Obs | Names | Food | Activity | 
|---|---|---|---|
| 1 | Andy | Milkshakes | Golf | 
| 2 | Eleanor | Cauliflower | Skiing | 
| 3 | Georgiana | Anything | Gymnastics | 
* SET LIBNAME (Note this is slightly different in SAS OnDemand, use the link we have seen before);
libname mydata '/folders/myfolders/';
NOTE: Libref MYDATA was successfully assigned as follows: 
      Engine:        V9 
      Physical Name: /folders/myfolders
PROC MEANS DATA = mydata.housing;
RUN;
The SAS System
The MEANS Procedure
| Variable | N | Mean | Std Dev | Minimum | Maximum |
|---|---|---|---|---|---|
| Living_Sq_Ft | 2000 | 1840.32 | 853.2350143 | 488.0000000 | 6963.00 |
| Closing_Price | 2000 | 244146.72 | 319263.47 | 20000.00 | 7650000.00 |
PROC MEANS DATA = mydata.housing MAXDEC=0 MEAN RANGE STDDEV N MODE;
RUN;
The SAS System
The MEANS Procedure
| Variable | Mean | Range | Std Dev | N | Mode |
|---|---|---|---|---|---|
| Living_Sq_Ft | 1840 | 6475 | 853 | 2000 | 1008 |
| Closing_Price | 244147 | 7630000 | 319263 | 2000 | 125000 |
PROC MEANS DATA = mydata.housing MAXDEC=0 MEAN RANGE STDDEV N MODE;
    VAR Living_Sq_Ft;
RUN;
The SAS System
The MEANS Procedure
| Analysis Variable : Living_Sq_Ft | ||||
|---|---|---|---|---|
| Mean | Range | Std Dev | N | Mode | 
| 1840 | 6475 | 853 | 2000 | 1008 | 
PROC MEANS DATA = mydata.housing MAXDEC=0 MEAN RANGE STDDEV N MODE;
    OUTPUT OUT = Summaries;
RUN;
The SAS System
The MEANS Procedure
| Variable | Mean | Range | Std Dev | N | Mode |
|---|---|---|---|---|---|
| Living_Sq_Ft | 1840 | 6475 | 853 | 2000 | 1008 |
| Closing_Price | 244147 | 7630000 | 319263 | 2000 | 125000 |
PROC PRINT DATA= Summaries;
    TITLE 'DATA FILE FROM PROC MEANS';
RUN;
DATA FILE FROM PROC MEANS
| Obs | _TYPE_ | _FREQ_ | _STAT_ | Living_Sq_Ft | Closing_Price | 
|---|---|---|---|---|---|
| 1 | 0 | 2000 | N | 2000 | 2000 | 
| 2 | 0 | 2000 | MIN | 488 | 20000 | 
| 3 | 0 | 2000 | MAX | 6963 | 7650000 | 
| 4 | 0 | 2000 | MEAN | 1840.32 | 244146.7215 | 
| 5 | 0 | 2000 | STD | 853.23501428 | 319263.46852 | 
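Note that the statistic keywords on the PROC MEANS statement control only the printed table. When the OUTPUT statement lists no statistics, the output data set holds the default N, MIN, MAX, MEAN and STD rows, as seen above. A minimal sketch of requesting specific statistics in the output data set follows; the data set name SummaryWide is a hypothetical name chosen for illustration.

```sas
* Sketch: choose which statistics go into the output data set;
* SummaryWide is a made-up name, not part of the course materials;
PROC MEANS DATA = mydata.housing NOPRINT;
    VAR Living_Sq_Ft Closing_Price;
    * AUTONAME builds names such as Living_Sq_Ft_Mean automatically;
    OUTPUT OUT = SummaryWide MEAN= MEDIAN= / AUTONAME;
RUN;
PROC PRINT DATA = SummaryWide;
RUN;
```

With this form the output data set has one row, with one column per variable and statistic, instead of one row per statistic.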
Data Bikes_Holiday;
	set mydata.bikes;
	if Holiday = 1;
RUN;
Proc Means DATA = Bikes_Holiday MEAN MEDIAN MODE RANGE;
	VAR Count;
RUN;
DATA FILE FROM PROC MEANS
The MEANS Procedure
| Analysis Variable : count | |||
|---|---|---|---|
| Mean | Median | Mode | Range | 
| 185.8778135 | 133.0000000 | 4.0000000 | 711.0000000 | 
PROC FREQ is a procedure for creating frequency tables. Each row in a frequency table corresponds to a level of a categorical variable.
PROC FREQ DATA = mydata.housing;
    TABLES State;
RUN;
DATA FILE FROM PROC MEANS
The FREQ Procedure
| State | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| CA | 180 | 9.00 | 180 | 9.00 | 
| CO | 80 | 4.00 | 260 | 13.00 | 
| CT | 141 | 7.05 | 401 | 20.05 | 
| DC | 20 | 1.00 | 421 | 21.05 | 
| FL | 140 | 7.00 | 561 | 28.05 | 
| GA | 60 | 3.00 | 621 | 31.05 | 
| HI | 20 | 1.00 | 641 | 32.05 | 
| IA | 40 | 2.00 | 681 | 34.05 | 
| ID | 20 | 1.00 | 701 | 35.05 | 
| IL | 39 | 1.95 | 740 | 37.00 | 
| IN | 60 | 3.00 | 800 | 40.00 | 
| KS | 20 | 1.00 | 820 | 41.00 | 
| KY | 20 | 1.00 | 840 | 42.00 | 
| MA | 123 | 6.15 | 963 | 48.15 | 
| ME | 20 | 1.00 | 983 | 49.15 | 
| MI | 60 | 3.00 | 1043 | 52.15 | 
| MN | 40 | 2.00 | 1083 | 54.15 | 
| MO | 20 | 1.00 | 1103 | 55.15 | 
| MT | 20 | 1.00 | 1123 | 56.15 | 
| NC | 160 | 8.00 | 1283 | 64.15 | 
| NE | 40 | 2.00 | 1323 | 66.15 | 
| NY | 60 | 3.00 | 1383 | 69.15 | 
| OH | 120 | 6.00 | 1503 | 75.15 | 
| OK | 20 | 1.00 | 1523 | 76.15 | 
| OR | 20 | 1.00 | 1543 | 77.15 | 
| PA | 137 | 6.85 | 1680 | 84.00 | 
| RI | 20 | 1.00 | 1700 | 85.00 | 
| TX | 120 | 6.00 | 1820 | 91.00 | 
| UT | 20 | 1.00 | 1840 | 92.00 | 
| VA | 80 | 4.00 | 1920 | 96.00 | 
| WA | 60 | 3.00 | 1980 | 99.00 | 
| WY | 20 | 1.00 | 2000 | 100.00 | 
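The table above can be trimmed or reordered with PROC FREQ options. The following is a small sketch, assuming the same mydata.housing data set: ORDER=FREQ lists the most common levels first and NOCUM drops the cumulative columns.

```sas
* Sketch: frequency table sorted by descending count, without cumulative columns;
PROC FREQ DATA = mydata.housing ORDER = FREQ;
    TABLES State / NOCUM;
RUN;
```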
DATA housing;
    SET mydata.housing;
RUN;
What do these two data steps do?
DATA virginia;
    Set mydata.housing;
    IF state = 'VA';
RUN;
DATA colorado;
    Set mydata.housing;
    IF state = 'CO';
RUN;
NOTE: There were 2000 observations read from the data set MYDATA.HOUSING.
NOTE: The data set WORK.VIRGINIA has 80 observations and 5 variables.
NOTE: There were 2000 observations read from the data set MYDATA.HOUSING.
NOTE: The data set WORK.COLORADO has 80 observations and 5 variables.
In R, to combine (or vertically stack) two data frames we used the rbind() function.
This can also be done in a standard SAS DATA STEP.
DATA VA_CO;
    SET virginia colorado;
RUN;
NOTE: There were 80 observations read from the data set WORK.VIRGINIA.
NOTE: There were 80 observations read from the data set WORK.COLORADO.
NOTE: The data set WORK.VA_CO has 160 observations and 5 variables.
PROC PRINT DATA=VA_CO;
RUN;
DATA FILE FROM PROC MEANS
| Obs | City | State | Zip_Code | Living_Sq_Ft | Closing_Price | 
|---|---|---|---|---|---|
| 1 | PROVIDENC | VA | 23140 | 1621 | 224900 | 
| 2 | PROVIDENC | VA | 23140 | 1972 | 195196 | 
| 3 | PROVIDENC | VA | 23140 | 1796 | 159000 | 
| 4 | PROVIDENC | VA | 23140 | 2718 | 409950 | 
| 5 | PROVIDENC | VA | 23140 | 1482 | 57900 | 
| 6 | PROVIDENC | VA | 23140 | 2759 | 40000 | 
| 7 | PROVIDENC | VA | 23140 | 2522 | 370000 | 
| 8 | PROVIDENC | VA | 23140 | 2072 | 179900 | 
| 9 | PROVIDENC | VA | 23140 | 1212 | 90000 | 
| 10 | PROVIDENC | VA | 23140 | 3604 | 494369 | 
| 11 | PROVIDENC | VA | 23140 | 3180 | 125000 | 
| 12 | PROVIDENC | VA | 23140 | 2635 | 450000 | 
| 13 | PROVIDENC | VA | 23140 | 3087 | 118900 | 
| 14 | PROVIDENC | VA | 23140 | 2632 | 352282 | 
| 15 | PROVIDENC | VA | 23140 | 3072 | 379990 | 
| 16 | PROVIDENC | VA | 23140 | 3882 | 284700 | 
| 17 | PROVIDENC | VA | 23140 | 2352 | 520000 | 
| 18 | PROVIDENC | VA | 23140 | 1365 | 306553 | 
| 19 | PROVIDENC | VA | 23140 | 1425 | 135000 | 
| 20 | PROVIDENC | VA | 23140 | 5986 | 849000 | 
| 21 | MIDDLETOW | VA | 22645 | 2259 | 480123 | 
| 22 | MIDDLETOW | VA | 22645 | 3084 | 511000 | 
| 23 | MIDDLETOW | VA | 22645 | 1200 | 189900 | 
| 24 | MIDDLETOW | VA | 22645 | 1031 | 145000 | 
| 25 | MIDDLETOW | VA | 22645 | 960 | 42500 | 
| 26 | MIDDLETOW | VA | 22645 | 2480 | 120000 | 
| 27 | MIDDLETOW | VA | 22645 | 1217 | 350000 | 
| 28 | MIDDLETOW | VA | 22645 | 2251 | 582000 | 
| 29 | MIDDLETOW | VA | 22645 | 1895 | 90000 | 
| 30 | MIDDLETOW | VA | 22645 | 1932 | 265500 | 
| 31 | MIDDLETOW | VA | 22645 | 2128 | 483000 | 
| 32 | MIDDLETOW | VA | 22645 | 1337 | 209950 | 
| 33 | MIDDLETOW | VA | 22645 | 2618 | 229500 | 
| 34 | MIDDLETOW | VA | 22645 | 1090 | 155000 | 
| 35 | MIDDLETOW | VA | 22645 | 1739 | 135000 | 
| 36 | MIDDLETOW | VA | 22645 | 3588 | 468553 | 
| 37 | MIDDLETOW | VA | 22645 | 3096 | 484107 | 
| 38 | MIDDLETOW | VA | 22645 | 1889 | 301000 | 
| 39 | MIDDLETOW | VA | 22645 | 3826 | 310000 | 
| 40 | MIDDLETOW | VA | 22645 | 2578 | 424000 | 
| 41 | HENRICO | VA | 23238 | 1980 | 240000 | 
| 42 | HENRICO | VA | 23238 | 5976 | 1075000 | 
| 43 | HENRICO | VA | 23238 | 2107 | 315000 | 
| 44 | HENRICO | VA | 23238 | 1562 | 234950 | 
| 45 | HENRICO | VA | 23238 | 1616 | 254500 | 
| 46 | HENRICO | VA | 23238 | 3001 | 475000 | 
| 47 | HENRICO | VA | 23238 | 1760 | 262500 | 
| 48 | HENRICO | VA | 23238 | 1980 | 329000 | 
| 49 | HENRICO | VA | 23238 | 1572 | 162500 | 
| 50 | HENRICO | VA | 23238 | 1370 | 174950 | 
| 51 | HENRICO | VA | 23238 | 2943 | 280000 | 
| 52 | HENRICO | VA | 23238 | 1320 | 177000 | 
| 53 | HENRICO | VA | 23238 | 2207 | 253000 | 
| 54 | HENRICO | VA | 23238 | 1809 | 329000 | 
| 55 | HENRICO | VA | 23238 | 1080 | 155000 | 
| 56 | HENRICO | VA | 23238 | 2176 | 275000 | 
| 57 | HENRICO | VA | 23238 | 1360 | 153500 | 
| 58 | HENRICO | VA | 23238 | 1397 | 199950 | 
| 59 | RICHMOND | VA | 23238 | 4638 | 815000 | 
| 60 | HENRICO | VA | 23238 | 1360 | 234500 | 
| 61 | RICHMOND | VA | 23236 | 3413 | 365000 | 
| 62 | RICHMOND | VA | 23236 | 1396 | 104100 | 
| 63 | RICHMOND | VA | 23236 | 2450 | 261250 | 
| 64 | RICHMOND | VA | 23236 | 1902 | 43670 | 
| 65 | RICHMOND | VA | 23236 | 2230 | 200000 | 
| 66 | RICHMOND | VA | 23236 | 1766 | 192400 | 
| 67 | RICHMOND | VA | 23236 | 3317 | 410000 | 
| 68 | RICHMOND | VA | 23236 | 1304 | 126500 | 
| 69 | RICHMOND | VA | 23236 | 2115 | 220000 | 
| 70 | RICHMOND | VA | 23236 | 1448 | 77500 | 
| 71 | RICHMOND | VA | 23236 | 1254 | 155000 | 
| 72 | RICHMOND | VA | 23236 | 1548 | 171350 | 
| 73 | RICHMOND | VA | 23236 | 4082 | 489500 | 
| 74 | RICHMOND | VA | 23236 | 1764 | 199950 | 
| 75 | RICHMOND | VA | 23236 | 1446 | 115000 | 
| 76 | RICHMOND | VA | 23236 | 1528 | 185000 | 
| 77 | RICHMOND | VA | 23236 | 2170 | 279000 | 
| 78 | RICHMOND | VA | 23236 | 3556 | 275000 | 
| 79 | RICHMOND | VA | 23236 | 2860 | 328000 | 
| 80 | RICHMOND | VA | 23236 | 1340 | 182000 | 
| 81 | PEYTON | CO | 80831 | 2459 | 52000 | 
| 82 | PEYTON | CO | 80831 | 1491 | 178900 | 
| 83 | PEYTON | CO | 80831 | 2878 | 400000 | 
| 84 | PEYTON | CO | 80831 | 2552 | 209000 | 
| 85 | PEYTON | CO | 80831 | 2170 | 259900 | 
| 86 | PEYTON | CO | 80831 | 1680 | 217895 | 
| 87 | PEYTON | CO | 80831 | 2277 | 283000 | 
| 88 | PEYTON | CO | 80831 | 3134 | 350000 | 
| 89 | PEYTON | CO | 80831 | 2476 | 216000 | 
| 90 | PEYTON | CO | 80831 | 3334 | 245000 | 
| 91 | PEYTON | CO | 80831 | 3080 | 255000 | 
| 92 | PEYTON | CO | 80831 | 3382 | 70938 | 
| 93 | PEYTON | CO | 80831 | 3016 | 244800 | 
| 94 | PEYTON | CO | 80831 | 2448 | 278750 | 
| 95 | PEYTON | CO | 80831 | 2523 | 276000 | 
| 96 | PEYTON | CO | 80831 | 2958 | 258850 | 
| 97 | PEYTON | CO | 80831 | 2280 | 223008 | 
| 98 | PEYTON | CO | 80831 | 3055 | 295000 | 
| 99 | PEYTON | CO | 80831 | 2128 | 140000 | 
| 100 | PEYTON | CO | 80831 | 2865 | 323000 | 
| 101 | TABERNASH | CO | 80478 | 1080 | 214500 | 
| 102 | TABERNASH | CO | 80478 | 3296 | 555000 | 
| 103 | TABERNASH | CO | 80478 | 1272 | 287000 | 
| 104 | TABERNASH | CO | 80478 | 1575 | 349900 | 
| 105 | TABERNASH | CO | 80478 | 3290 | 890000 | 
| 106 | TABERNASH | CO | 80478 | 2410 | 685000 | 
| 107 | TABERNASH | CO | 80478 | 1956 | 224000 | 
| 108 | TABERNASH | CO | 80478 | 5126 | 1335000 | 
| 109 | TABERNASH | CO | 80478 | 1054 | 164000 | 
| 110 | TABERNASH | CO | 80478 | 1387 | 299900 | 
| 111 | TABERNASH | CO | 80478 | 3000 | 355000 | 
| 112 | TABERNASH | CO | 80478 | 3469 | 601000 | 
| 113 | TABERNASH | CO | 80478 | 2968 | 619000 | 
| 114 | TABERNASH | CO | 80478 | 2466 | 463600 | 
| 115 | TABERNASH | CO | 80478 | 1600 | 265000 | 
| 116 | TABERNASH | CO | 80478 | 1742 | 259000 | 
| 117 | TABERNASH | CO | 80478 | 3914 | 132500 | 
| 118 | TABERNASH | CO | 80478 | 2768 | 740000 | 
| 119 | TABERNASH | CO | 80478 | 2194 | 395000 | 
| 120 | TABERNASH | CO | 80478 | 2473 | 120000 | 
| 121 | CENTENNIA | CO | 80122 | 4843 | 601000 | 
| 122 | CENTENNIA | CO | 80122 | 1850 | 200000 | 
| 123 | CENTENNIA | CO | 80122 | 2018 | 198000 | 
| 124 | CENTENNIA | CO | 80122 | 1714 | 248000 | 
| 125 | CENTENNIA | CO | 80122 | 2737 | 208000 | 
| 126 | CENTENNIA | CO | 80122 | 2240 | 252500 | 
| 127 | CENTENNIA | CO | 80122 | 2535 | 289000 | 
| 128 | CENTENNIA | CO | 80122 | 1886 | 275000 | 
| 129 | LITTLETON | CO | 80122 | 2621 | 310000 | 
| 130 | CENTENNIA | CO | 80122 | 3160 | 451000 | 
| 131 | CENTENNIA | CO | 80122 | 2939 | 271000 | 
| 132 | CENTENNIA | CO | 80122 | 2236 | 270000 | 
| 133 | CENTENNIA | CO | 80122 | 2068 | 262500 | 
| 134 | CENTENNIA | CO | 80122 | 1968 | 249900 | 
| 135 | CENTENNIA | CO | 80122 | 2596 | 359925 | 
| 136 | CENTENNIA | CO | 80122 | 1680 | 200000 | 
| 137 | LITTLETON | CO | 80122 | 1465 | 172000 | 
| 138 | CENTENNIA | CO | 80122 | 1936 | 257450 | 
| 139 | CENTENNIA | CO | 80122 | 3385 | 682000 | 
| 140 | CENTENNIA | CO | 80122 | 4776 | 486250 | 
| 141 | SNOWMASS | CO | 81615 | 2342 | 2500000 | 
| 142 | SNOWMASS | CO | 81615 | 3423 | 613299 | 
| 143 | SNOWMASS | CO | 81615 | 2476 | 955000 | 
| 144 | SNOWMASS | CO | 81615 | 5489 | 6550000 | 
| 145 | SNOWMASS | CO | 81615 | 5598 | 7650000 | 
| 146 | SNOWMASS | CO | 81615 | 1938 | 3341250 | 
| 147 | SNOWMASS | CO | 81615 | 1328 | 336262 | 
| 148 | SNOWMASS | CO | 81615 | 1217 | 1085100 | 
| 149 | SNOWMASS | CO | 81615 | 1520 | 1700000 | 
| 150 | SNOWMASS | CO | 81615 | 894 | 450000 | 
| 151 | SNOWMASS | CO | 81615 | 3923 | 2100000 | 
| 152 | SNOWMASS | CO | 81615 | 1206 | 620000 | 
| 153 | SNOWMASS | CO | 81615 | 4811 | 2575000 | 
| 154 | SNOWMASS | CO | 81615 | 1563 | 1883300 | 
| 155 | SNOWMASS | CO | 81615 | 1278 | 765000 | 
| 156 | SNOWMASS | CO | 81615 | 2316 | 1760000 | 
| 157 | SNOWMASS | CO | 81615 | 1935 | 300000 | 
| 158 | SNOWMASS | CO | 81615 | 1973 | 495000 | 
| 159 | SNOWMASS | CO | 81615 | 1218 | 769200 | 
| 160 | SNOWMASS | CO | 81615 | 676 | 908400 | 
The DATA STEP can also be used to merge data sets using the following format:
DATA newdata;
    MERGE data1 data2;
    BY ID_VAR;
RUN;
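A MERGE with a BY statement expects every input data set to be sorted by the BY variable. The sketch below reuses the placeholder names data1, data2 and ID_VAR from the template above and simply sorts both inputs first.

```sas
* Sketch: sort both inputs by the key before merging;
* data1, data2 and ID_VAR are the placeholders from the template;
PROC SORT DATA = data1;
    BY ID_VAR;
RUN;
PROC SORT DATA = data2;
    BY ID_VAR;
RUN;
DATA newdata;
    MERGE data1 data2;
    BY ID_VAR;
RUN;
```

If the inputs already happen to be in BY-variable order, the sorts are not needed.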
DATA middlenames;
    INPUT Names :$15. MiddleName :$15. ;
        DATALINES;
    Eleanor Larson
    Georgiana Otelia
        ;
RUN;
NOTE: The data set WORK.MIDDLENAMES has 2 observations and 2 variables.
DATA COMBINED;
    MERGE FAVORITES MIDDLENAMES;
    BY NAMES;
RUN;
NOTE: There were 3 observations read from the data set WORK.FAVORITES.
NOTE: There were 2 observations read from the data set WORK.MIDDLENAMES.
NOTE: The data set WORK.COMBINED has 3 observations and 4 variables.
What does the following code return?
PROC PRINT DATA = COMBINED;
    TITLE 'Hoegh Family';
RUN;
Hoegh Family
| Obs | Names | Food | Activity | MiddleName | 
|---|---|---|---|---|
| 1 | Andy | Milkshakes | Golf | |
| 2 | Eleanor | Cauliflower | Skiing | Larson | 
| 3 | Georgiana | Anything | Gymnastics | Otelia | 
Using the shark attacks data set (attacks):
A single DATA step can be used to create multiple data sets.
DATA virginia colorado others;
    SET mydata.housing;
    IF state = 'CO' THEN OUTPUT colorado;
        ELSE IF state = 'VA' THEN OUTPUT virginia;
        ELSE OUTPUT others;
RUN;
NOTE: There were 2000 observations read from the data set MYDATA.HOUSING.
NOTE: The data set WORK.VIRGINIA has 80 observations and 5 variables.
NOTE: The data set WORK.COLORADO has 80 observations and 5 variables.
NOTE: The data set WORK.OTHERS has 1840 observations and 5 variables.
Often in R we used the function seq(), particularly when creating graphics. A similar approach in SAS takes advantage of the DO statement.
DATA quadratic;
    DO x=1 TO 10;
        y= x ** 2;
        OUTPUT;
    END;
RUN;
NOTE: The data set WORK.QUADRATIC has 10 observations and 2 variables.
PROC PRINT DATA=quadratic;
TITLE;
RUN;
| Obs | x | y | 
|---|---|---|
| 1 | 1 | 1 | 
| 2 | 2 | 4 | 
| 3 | 3 | 9 | 
| 4 | 4 | 16 | 
| 5 | 5 | 25 | 
| 6 | 6 | 36 | 
| 7 | 7 | 49 | 
| 8 | 8 | 64 | 
| 9 | 9 | 81 | 
| 10 | 10 | 100 | 
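The DO loop also accepts a BY increment, which is the closer analogue of seq(from, to, by) in R. A minimal sketch, using a hypothetical data set name curve:

```sas
* Sketch: non-integer step size, similar to seq(0, 2, by = 0.5) in R;
* curve is a made-up data set name;
DATA curve;
    DO x = 0 TO 2 BY 0.5;
        y = x ** 2;
        OUTPUT;   * write one observation per pass through the loop;
    END;
RUN;
```

This writes five observations, for x = 0, 0.5, 1, 1.5 and 2.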
Data set options can be used in both DATA and PROC statements. To use a data set option, put it in parentheses directly after the data set name. The syntax is:
DATA newdata;
    SET olddata (options here);
RUN;
PROC PRINT DATA = dataset (options here);
RUN;
DATA HoeghFamily;
    SET COMBINED (KEEP = Food Activity MiddleName);
RUN;
NOTE: There were 3 observations read from the data set WORK.COMBINED.
NOTE: The data set WORK.HOEGHFAMILY has 3 observations and 3 variables.
Consider the following code, what will this print?
PROC PRINT DATA= HoeghFamily (Firstobs = 2);
RUN;
| Obs | Food | Activity | MiddleName | 
|---|---|---|---|
| 2 | Cauliflower | Skiing | Larson | 
| 3 | Anything | Gymnastics | Otelia | 
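Several data set options can be combined inside the same parentheses. The sketch below, assuming the HoeghFamily data set created above, pairs FIRSTOBS= with OBS= and DROP=. Note that OBS= gives the number of the last observation to process, not a count of observations.

```sas
* Sketch: start at observation 2, stop at observation 2, and hide one column;
PROC PRINT DATA = HoeghFamily (FIRSTOBS = 2 OBS = 2 DROP = MiddleName);
RUN;
```

Only the second observation is printed, with the Food and Activity columns.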
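SAS has built-in automatic variables that can be quite useful. For example, when a DATA step includes a BY statement, FIRST.variable equals 1 on the first observation of each BY group.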
PROC SORT DATA=mydata.housing OUT=housing;
    BY State;
RUN;
NOTE: There were 2000 observations read from the data set MYDATA.HOUSING.
NOTE: The data set WORK.HOUSING has 2000 observations and 5 variables.
DATA HousingSample;
    SET housing;
    BY STATE;
    IF FIRST.STATE =1;
RUN;
NOTE: There were 2000 observations read from the data set WORK.HOUSING.
NOTE: The data set WORK.HOUSINGSAMPLE has 32 observations and 5 variables.
PROC PRINT DATA=HousingSample;
RUN;
| Obs | City | State | Zip_Code | Living_Sq_Ft | Closing_Price | 
|---|---|---|---|---|---|
| 1 | BODEGA BA | CA | 94923 | 1237 | 680000 | 
| 2 | PEYTON | CO | 80831 | 2459 | 52000 | 
| 3 | TOLLAND | CT | 60842 | 2083 | 298900 | 
| 4 | WASHINGTO | DC | 20012 | 2345 | 626000 | 
| 5 | LAKELAND | FL | 33812 | 1713 | 105000 | 
| 6 | MCDONOUGH | GA | 30252 | 3066 | 369400 | 
| 7 | HONOLULU | HI | 96826 | 1200 | 83900 | 
| 8 | FAIRFIELD | IA | 52556 | 1740 | 111800 | 
| 9 | CALDWELL | ID | 83607 | 1666 | 172225 | 
| 10 | JOLIET | IL | 60433 | 1400 | 91000 | 
| 11 | FORT WAYN | IN | 46805 | 1289 | 86773 | 
| 12 | TOPEKA | KS | 66608 | 1431 | 104500 | 
| 13 | BARDSTOWN | KY | 40004 | 2494 | 23700 | 
| 14 | LOWELL | MA | 18501 | 3229 | 400000 | 
| 15 | YORK | ME | 39095 | 2304 | 310230 | 
| 16 | FLINT | MI | 48504 | 1358 | 50600 | 
| 17 | LAKE CITY | MN | 55041 | 1463 | 299900 | 
| 18 | SAINT LOU | MO | 63125 | 1048 | 165000 | 
| 19 | LEWISTOWN | MT | 59457 | 2464 | 150920 | 
| 20 | CHERRYVIL | NC | 28021 | 3362 | 47000 | 
| 21 | LINCOLN | NE | 68524 | 1458 | 139000 | 
| 22 | DRYDEN | NY | 13053 | 1104 | 114000 | 
| 23 | MAUMEE | OH | 43537 | 3301 | 314500 | 
| 24 | CLAREMORE | OK | 74019 | 1400 | 106500 | 
| 25 | CAVE JUNC | OR | 97523 | 1440 | 51000 | 
| 26 | PITTSBURG | PA | 15234 | 1450 | 119000 | 
| 27 | CRANSTON | RI | 29051 | 1259 | 259000 | 
| 28 | ALLEN | TX | 75002 | 1780 | 148500 | 
| 29 | HUNTSVILL | UT | 84317 | 2910 | 520695 | 
| 30 | PROVIDENC | VA | 23140 | 1621 | 224900 | 
| 31 | VANCOUVER | WA | 98682 | 1508 | 175000 | 
| 32 | ROCK SPRI | WY | 82901 | 1820 | 191893 | 
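FIRST.State has a companion, LAST.State, which equals 1 on the last observation of each BY group. Together they can be used to accumulate a value within each group. The following is a sketch only, assuming the sorted housing data set from above; StateCounts and n_homes are hypothetical names.

```sas
* Sketch: count the observations in each state with FIRST. and LAST.;
* StateCounts and n_homes are made-up names for illustration;
DATA StateCounts;
    SET housing;
    BY State;
    IF FIRST.State THEN n_homes = 0;   * reset at the start of each state;
    n_homes + 1;                       * sum statement: value is retained across rows;
    IF LAST.State THEN OUTPUT;         * keep one row per state;
    KEEP State n_homes;
RUN;
```

The result should have one row per state, and the counts can be compared against the PROC FREQ table for State shown earlier.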
Data USA;
	set mydata.attacks;
	if Country = 'USA';
RUN;
Data Australia;
	set mydata.attacks;
	if Country = 'AUSTRALIA';
RUN;
DATA Combined;
	set USA Australia;
RUN;
Proc Freq Data = Combined;
	TABLES Activity;
RUN;
The FREQ Procedure
| Activity | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| Bathing | 82 | 4.46 | 82 | 4.46 | 
| Diving | 56 | 3.04 | 138 | 7.50 | 
| Fishing | 220 | 11.96 | 358 | 19.46 | 
| Snorkeling | 44 | 2.39 | 402 | 21.85 | 
| Spearfishing | 118 | 6.41 | 520 | 28.26 | 
| Standing | 66 | 3.59 | 586 | 31.85 | 
| Surfing | 704 | 38.26 | 1290 | 70.11 | 
| Swimming | 438 | 23.80 | 1728 | 93.91 | 
| Wading | 112 | 6.09 | 1840 | 100.00 | 
Next we will focus on the statistical procedures available in SAS.
PROC UNIVARIATE produces statistics and graphs describing the distribution of a single variable.
The syntax follows as:
PROC UNIVARIATE DATA = yourdata;
    VAR variablename;
RUN;
PROC UNIVARIATE DATA=housing;
    VAR LIVING_SQ_FT;
RUN;
The UNIVARIATE Procedure
Variable: Living_Sq_Ft
| Moments | |||
|---|---|---|---|
| N | 2000 | Sum Weights | 2000 | 
| Mean | 1840.32 | Sum Observations | 3680640 | 
| Std Deviation | 853.235014 | Variance | 728009.99 | 
| Skewness | 1.77833485 | Kurtosis | 4.87450208 | 
| Uncorrected SS | 8228847374 | Corrected SS | 1455291969 | 
| Coeff Variation | 46.363405 | Std Error Mean | 19.0789149 | 
| Basic Statistical Measures | |||
|---|---|---|---|
| Location | Variability | ||
| Mean | 1840.320 | Std Deviation | 853.23501 | 
| Median | 1625.000 | Variance | 728010 | 
| Mode | 1008.000 | Range | 6475 | 
| Interquartile Range | 944.50000 | ||
| Tests for Location: Mu0=0 | ||||
|---|---|---|---|---|
| Test | Statistic | p Value | ||
| Student's t | t | 96.45832 | Pr > |t| | <.0001 | 
| Sign | M | 1000 | Pr >= |M| | <.0001 | 
| Signed Rank | S | 1000500 | Pr >= |S| | <.0001 | 
| Quantiles (Definition 5) | |
|---|---|
| Level | Quantile | 
| 100% Max | 6963.0 | 
| 99% | 4838.5 | 
| 95% | 3423.5 | 
| 90% | 2938.0 | 
| 75% Q3 | 2194.5 | 
| 50% Median | 1625.0 | 
| 25% Q1 | 1250.0 | 
| 10% | 1019.0 | 
| 5% | 904.5 | 
| 1% | 704.0 | 
| 0% Min | 488.0 | 
| Extreme Observations | |||
|---|---|---|---|
| Lowest | Highest | ||
| Value | Obs | Value | Obs | 
| 488 | 631 | 6046 | 977 | 
| 540 | 639 | 6506 | 1766 | 
| 576 | 122 | 6640 | 1647 | 
| 594 | 158 | 6695 | 1812 | 
| 596 | 637 | 6963 | 610 | 
We will go in depth on SAS graphics next week, but many of the statistical procedures have built-in options for creating graphics.
The syntax is:
PROC UNIVARIATE DATA=yourdata;
    VAR yourvariable;
    plot-request yourvariable/ options;
RUN;
PROC UNIVARIATE DATA=housing noprint;
    VAR LIVING_SQ_FT;
    HISTOGRAM LIVING_SQ_FT / Normal; * Normal creates a normal curve;
RUN;
The UNIVARIATE Procedure
Fitted Normal Distribution for Living_Sq_Ft
| Parameters for Normal Distribution | ||
|---|---|---|
| Parameter | Symbol | Estimate | 
| Mean | Mu | 1840.32 | 
| Std Dev | Sigma | 853.235 | 
| Goodness-of-Fit Tests for Normal Distribution | ||||
|---|---|---|---|---|
| Test | Statistic | p Value | ||
| Kolmogorov-Smirnov | D | 0.1094403 | Pr > D | <0.010 | 
| Cramer-von Mises | W-Sq | 10.0771813 | Pr > W-Sq | <0.005 | 
| Anderson-Darling | A-Sq | 59.7937631 | Pr > A-Sq | <0.005 | 
| Quantiles for Normal Distribution | ||
|---|---|---|
| Percent | Quantile | |
| Observed | Estimated | |
| 1.0 | 704.000 | -144.601 | 
| 5.0 | 904.500 | 436.873 | 
| 10.0 | 1019.000 | 746.855 | 
| 25.0 | 1250.000 | 1264.822 | 
| 50.0 | 1625.000 | 1840.320 | 
| 75.0 | 2194.500 | 2415.818 | 
| 90.0 | 2938.000 | 2933.785 | 
| 95.0 | 3423.500 | 3243.767 | 
| 99.0 | 4838.500 | 3825.241 | 
PROC UNIVARIATE DATA=housing noprint;
    VAR LIVING_SQ_FT;
    HISTOGRAM LIVING_SQ_FT / Lognormal; * Lognormal overlays a fitted lognormal curve;
RUN;
The UNIVARIATE Procedure
Fitted Lognormal Distribution for Living_Sq_Ft
| Parameters for Lognormal Distribution | ||
|---|---|---|
| Parameter | Symbol | Estimate | 
| Threshold | Theta | 0 | 
| Scale | Zeta | 7.428118 | 
| Shape | Sigma | 0.413559 | 
| Mean | 1832.862 | |
| Std Dev | 791.5927 | |
| Goodness-of-Fit Tests for Lognormal Distribution | ||||
|---|---|---|---|---|
| Test | Statistic | p Value | ||
| Kolmogorov-Smirnov | D | 0.03801564 | Pr > D | <0.010 | 
| Cramer-von Mises | W-Sq | 0.71048795 | Pr > W-Sq | <0.005 | 
| Anderson-Darling | A-Sq | 4.20760256 | Pr > A-Sq | <0.005 | 
| Quantiles for Lognormal Distribution | ||
|---|---|---|
| Percent | Quantile | |
| Observed | Estimated | |
| 1.0 | 704.000 | 642.930 | 
| 5.0 | 904.500 | 852.245 | 
| 10.0 | 1019.000 | 990.412 | 
| 25.0 | 1250.000 | 1273.057 | 
| 50.0 | 1625.000 | 1682.638 | 
| 75.0 | 2194.500 | 2223.994 | 
| 90.0 | 2938.000 | 2858.682 | 
| 95.0 | 3423.500 | 3322.135 | 
| 99.0 | 4838.500 | 4403.704 | 
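HISTOGRAM is only one of the plot requests that PROC UNIVARIATE accepts. As a hedged sketch in the same pattern, a normal quantile-quantile plot of the same variable could be requested with the QQPLOT statement; MU=EST and SIGMA=EST ask SAS to use the sample estimates for the reference line.

```sas
* Sketch: normal Q-Q plot as an alternative plot request;
PROC UNIVARIATE DATA = housing NOPRINT;
    VAR Living_Sq_Ft;
    QQPLOT Living_Sq_Ft / NORMAL(MU = EST SIGMA = EST);
RUN;
```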
In addition to the options outlined earlier, here are a few more options built into PROC MEANS.
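As one illustration (not necessarily the options the course had in mind), PROC MEANS can report confidence limits for the mean. In this sketch CLM requests two-sided confidence limits and ALPHA=0.10 sets them to the 90% level.

```sas
* Sketch: 90% confidence limits for the mean living area;
PROC MEANS DATA = mydata.housing MAXDEC = 1 N MEAN CLM ALPHA = 0.10;
    VAR Living_Sq_Ft;
RUN;
```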
Q: Describe a t-test and give an example of when it might be used (ideally in the context of one of the data sets we have seen in this course).
A one-sample t-test is useful for testing whether the mean differs significantly from a hypothesized value n, the value of the mean under the null hypothesis $H_0$.
PROC TTEST DATA=yourdata H0 = n options;
    VAR variable;
RUN;
Options include:
PROC TTEST DATA=HOUSING H0 = 2500;
    VAR LIVING_SQ_FT;
RUN;
The TTEST Procedure
Variable: Living_Sq_Ft
| N | Mean | Std Dev | Std Err | Minimum | Maximum | 
|---|---|---|---|---|---|
| 2000 | 1840.3 | 853.2 | 19.0789 | 488.0 | 6963.0 | 
| Mean | 95% CL Mean | Std Dev | 95% CL Std Dev | ||
|---|---|---|---|---|---|
| 1840.3 | 1802.9 | 1877.7 | 853.2 | 827.6 | 880.5 | 
| DF | t Value | Pr > |t| | 
|---|---|---|
| 1999 | -34.58 | <.0001 | 
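As a check on the output above, the t statistic can be reproduced by hand from the printed mean and standard error (using the values as SAS displays them):

$$ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{1840.32 - 2500}{19.0789} \approx -34.58 $$

with $n - 1 = 1999$ degrees of freedom, matching the DF and t Value columns.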
To compare differences between two groups use this procedure:
PROC TTEST DATA=yourdata options;
    CLASS variable;
    VAR variable;
RUN;
CLASS names the categorical variable whose two groups you would like to compare; it must have exactly two levels.
PROC TTEST DATA=VA_CO;
    CLASS State;
    VAR Closing_Price;
RUN;
The TTEST Procedure
Variable: Closing_Price
| State | N | Mean | Std Dev | Std Err | Minimum | Maximum | 
|---|---|---|---|---|---|---|
| CO | 80 | 716647 | 1196747 | 133800 | 52000.0 | 7650000 | 
| VA | 80 | 281179 | 182367 | 20389.3 | 40000.0 | 1075000 | 
| Diff (1-2) | 435468 | 855997 | 135345 | 
| State | Method | Mean | 95% CL Mean | Std Dev | 95% CL Std Dev | ||
|---|---|---|---|---|---|---|---|
| CO | 716647 | 450324 | 982970 | 1196747 | 1035728 | 1417514 | |
| VA | 281179 | 240595 | 321763 | 182367 | 157830 | 216009 | |
| Diff (1-2) | Pooled | 435468 | 168149 | 702787 | 855997 | 771122 | 962032 | 
| Diff (1-2) | Satterthwaite | 435468 | 166256 | 704680 | |||
| Method | Variances | DF | t Value | Pr > |t| | 
|---|---|---|---|---|
| Pooled | Equal | 158 | 3.22 | 0.0016 | 
| Satterthwaite | Unequal | 82.667 | 3.22 | 0.0018 | 
| Equality of Variances | ||||
|---|---|---|---|---|
| Method | Num DF | Den DF | F Value | Pr > F | 
| Folded F | 79 | 79 | 43.06 | <.0001 | 
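The pooled row of this output can be verified from the group summaries. With $n_1 = n_2 = 80$, the pooled standard deviation and t statistic are

$$ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \approx 855997, \qquad t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{1/n_1 + 1/n_2}} = \frac{435468}{135345} \approx 3.22 $$

on $n_1 + n_2 - 2 = 158$ degrees of freedom. The folded F statistic is the ratio of the larger to the smaller sample variance, $(1196747 / 182367)^2 \approx 43.06$. Because that test rejects equal variances here, the Satterthwaite row is arguably the more appropriate one to report.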
Another type of t-test involves paired data.
PROC TTEST DATA=yourdata options;
    PAIRED variable1 * variable2;
RUN;
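As a sketch of the paired form, assuming the mydata.bikes data set and its Registered and Casual variables used in the exercises, each row supplies one pair of counts. Whether a paired comparison is the right analysis for these data is a judgment call; the example only illustrates the syntax.

```sas
* Sketch: paired t-test on two counts recorded on the same row;
PROC TTEST DATA = mydata.bikes;
    PAIRED Registered * Casual;
RUN;
```

The test is carried out on the row-by-row differences Registered - Casual.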
The TTEST procedure automatically creates a few graphics, but this can be controlled using the following syntax.
PROC TTEST DATA=yourdata PLOTS = (plot-request);
The plot requests are:
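For instance, HISTOGRAM and QQ are two plot requests that PROC TTEST accepts. The sketch below, assuming the VA_CO data set built earlier, uses the ONLY modifier so that the default graphics are suppressed and just the requested plots are drawn.

```sas
* Sketch: request only a histogram and a Q-Q plot from the two-sample test;
PROC TTEST DATA = VA_CO PLOTS(ONLY) = (HISTOGRAM QQ);
    CLASS State;
    VAR Closing_Price;
RUN;
```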
Proc Univariate Data = mydata.bikes;
	Var Registered Casual;
	HISTOGRAM Registered Casual / Normal; * Normal creates a normal curve;
run;
The UNIVARIATE Procedure
Variable: registered
| Moments | |||
|---|---|---|---|
| N | 10886 | Sum Weights | 10886 | 
| Mean | 155.552177 | Sum Observations | 1693341 | 
| Std Deviation | 151.039033 | Variance | 22812.7895 | 
| Skewness | 1.52480459 | Kurtosis | 2.626081 | 
| Uncorrected SS | 511720093 | Corrected SS | 248317214 | 
| Coeff Variation | 97.0986301 | Std Error Mean | 1.44762152 | 
| Basic Statistical Measures | |||
|---|---|---|---|
| Location | Variability | ||
| Mean | 155.5522 | Std Deviation | 151.03903 | 
| Median | 118.0000 | Variance | 22813 | 
| Mode | 3.0000 | Range | 886.00000 | 
| Interquartile Range | 186.00000 | ||
| Tests for Location: Mu0=0 | ||||
|---|---|---|---|---|
| Test | Statistic | p Value | ||
| Student's t | t | 107.4536 | Pr > |t| | <.0001 | 
| Sign | M | 5435.5 | Pr >= |M| | <.0001 | 
| Signed Rank | S | 29547378 | Pr >= |S| | <.0001 | 
| Quantiles (Definition 5) | |
|---|---|
| Level | Quantile | 
| 100% Max | 886 | 
| 99% | 697 | 
| 95% | 464 | 
| 90% | 354 | 
| 75% Q3 | 222 | 
| 50% Median | 118 | 
| 25% Q1 | 36 | 
| 10% | 7 | 
| 5% | 4 | 
| 1% | 1 | 
| 0% Min | 0 | 
| Extreme Observations | |||
|---|---|---|---|
| Lowest | Highest | ||
| Value | Obs | Value | Obs | 
| 0 | 4012 | 833 | 9585 | 
| 0 | 3890 | 839 | 9897 | 
| 0 | 3867 | 857 | 9298 | 
| 0 | 3725 | 857 | 9753 | 
| 0 | 3318 | 886 | 9346 | 
The UNIVARIATE Procedure
Fitted Normal Distribution for registered
| Parameters for Normal Distribution | ||
|---|---|---|
| Parameter | Symbol | Estimate | 
| Mean | Mu | 155.5522 | 
| Std Dev | Sigma | 151.039 | 
| Goodness-of-Fit Tests for Normal Distribution | ||||
|---|---|---|---|---|
| Test | Statistic | p Value | ||
| Kolmogorov-Smirnov | D | 0.151715 | Pr > D | <0.010 | 
| Cramer-von Mises | W-Sq | 59.670109 | Pr > W-Sq | <0.005 | 
| Anderson-Darling | A-Sq | 383.449398 | Pr > A-Sq | <0.005 | 
| Quantiles for Normal Distribution | | |
|---|---|---|
| Percent | Observed Quantile | Estimated Quantile |
| 1.0 | 1.00000 | -195.8172 | 
| 5.0 | 4.00000 | -92.8849 | 
| 10.0 | 7.00000 | -38.0121 | 
| 25.0 | 36.00000 | 53.6779 | 
| 50.0 | 118.00000 | 155.5522 | 
| 75.0 | 222.00000 | 257.4265 | 
| 90.0 | 354.00000 | 349.1165 | 
| 95.0 | 464.00000 | 403.9893 | 
| 99.0 | 697.00000 | 506.9215 | 
The UNIVARIATE Procedure
Variable: casual
| Moments | |||
|---|---|---|---|
| N | 10886 | Sum Weights | 10886 | 
| Mean | 36.0219548 | Sum Observations | 392135 | 
| Std Deviation | 49.9604766 | Variance | 2496.04922 | 
| Skewness | 2.4957484 | Kurtosis | 7.55162931 | 
| Uncorrected SS | 41294965 | Corrected SS | 27169495.8 | 
| Coeff Variation | 138.694518 | Std Error Mean | 0.47884219 | 
| Basic Statistical Measures | | | |
|---|---|---|---|
| Location | | Variability | |
| Mean | 36.02195 | Std Deviation | 49.96048 |
| Median | 17.00000 | Variance | 2496 |
| Mode | 0.00000 | Range | 367.00000 |
| | | Interquartile Range | 45.00000 |
| Tests for Location: Mu0=0 | | | | |
|---|---|---|---|---|
| Test | Statistic | | p Value | |
| Student's t | t | 75.2272 | Pr > \|t\| | <.0001 |
| Sign | M | 4950 | Pr >= \|M\| | <.0001 |
| Signed Rank | S | 24504975 | Pr >= \|S\| | <.0001 |
| Quantiles (Definition 5) | |
|---|---|
| Level | Quantile | 
| 100% Max | 367 | 
| 99% | 241 | 
| 95% | 141 | 
| 90% | 94 | 
| 75% Q3 | 49 | 
| 50% Median | 17 | 
| 25% Q1 | 4 | 
| 10% | 1 | 
| 5% | 0 | 
| 1% | 0 | 
| 0% Min | 0 | 
| Extreme Observations | | | |
|---|---|---|---|
| Lowest | | Highest | |
| Value | Obs | Value | Obs |
| 0 | 10866 | 356 | 7687 | 
| 0 | 10844 | 357 | 6729 | 
| 0 | 10842 | 361 | 7686 | 
| 0 | 10840 | 362 | 9652 | 
| 0 | 10839 | 367 | 6730 | 
The UNIVARIATE Procedure
Fitted Normal Distribution for casual
| Parameters for Normal Distribution | ||
|---|---|---|
| Parameter | Symbol | Estimate | 
| Mean | Mu | 36.02195 | 
| Std Dev | Sigma | 49.96048 | 
| Goodness-of-Fit Tests for Normal Distribution | ||||
|---|---|---|---|---|
| Test | Statistic | p Value | ||
| Kolmogorov-Smirnov | D | 0.235452 | Pr > D | <0.010 | 
| Cramer-von Mises | W-Sq | 171.574837 | Pr > W-Sq | <0.005 | 
| Anderson-Darling | A-Sq | 948.029130 | Pr > A-Sq | <0.005 | 
| Quantiles for Normal Distribution | | |
|---|---|---|
| Percent | Observed Quantile | Estimated Quantile |
| 1.0 | 0.000 | -80.20349 | 
| 5.0 | 0.000 | -46.15572 | 
| 10.0 | 1.000 | -28.00497 | 
| 25.0 | 4.000 | 2.32413 | 
| 50.0 | 17.000 | 36.02195 | 
| 75.0 | 49.000 | 69.71978 | 
| 90.0 | 94.000 | 100.04888 | 
| 95.0 | 141.000 | 118.19963 | 
| 99.0 | 241.000 | 152.24740 | 
Similar to lm() in R, we can also easily run regression in SAS.
PROC REG DATA=yourdata;
    MODEL dependent = independent;
RUN;
Note that PROC REG has no CLASS statement, so categorical predictors must first be coded as dummy variables (or use PROC GLM, which accepts a CLASS statement).
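One way to build such dummy variables is sketched below on a small made-up dataset. The names toy_homes, region, sqft, price, is_north, and is_south are all hypothetical, and the numbers are invented for illustration. A categorical variable with three levels needs two dummies; the level left out ('West' here) becomes the reference level.

```sas
* Hypothetical toy data: region is a categorical predictor;
DATA toy_homes;
    INPUT region $ sqft price;
    * A comparison evaluates to 1 when true and 0 when false;
    is_north = (region = 'North');
    is_south = (region = 'South');
    DATALINES;
North 1200 150000
North 1800 210000
South 1500 160000
South 2200 240000
West 1300 200000
West 2000 310000
West 1600 255000
;
RUN;

* The dummy coefficients are shifts relative to the West reference level;
PROC REG DATA=toy_homes;
    MODEL price = sqft is_north is_south;
RUN;
```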
PROC REG DATA=housing;
    MODEL CLOSING_PRICE = LIVING_SQ_FT;
RUN;
The REG Procedure
Model: MODEL1
Dependent Variable: Closing_Price
| Number of Observations Read | 2000 | 
|---|---|
| Number of Observations Used | 2000 | 
| Analysis of Variance | | | | | |
|---|---|---|---|---|---|
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
| Model | 1 | 3.287656E13 | 3.287656E13 | 384.41 | <.0001 |
| Error | 1998 | 1.708798E14 | 85525443800 | | |
| Corrected Total | 1999 | 2.037564E14 | | | |
| Root MSE | 292447 | R-Square | 0.1614 | 
|---|---|---|---|
| Dependent Mean | 244147 | Adj R-Sq | 0.1609 | 
| Coeff Var | 119.78344 | 
| Parameter Estimates | | | | | |
|---|---|---|---|---|---|
| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
| Intercept | 1 | -32459 | 15550 | -2.09 | 0.0370 |
| Living_Sq_Ft | 1 | 150.30316 | 7.66607 | 19.61 | <.0001 |
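The parameter estimates define the fitted line Closing_Price = -32459 + 150.30316 × Living_Sq_Ft. As a rough sketch of how to use it, the DATA step below plugs in a house size and writes the predicted price to the log. The 2000 square feet and the variable names are assumptions made for illustration; the coefficients are copied from the table above.

```sas
* Evaluate the fitted regression line for one hypothetical house;
DATA _NULL_;
    intercept = -32459;       * Intercept estimate from PROC REG;
    slope     = 150.30316;    * Living_Sq_Ft estimate from PROC REG;
    sqft      = 2000;         * Hypothetical house size;
    predicted_price = intercept + slope * sqft;
    PUT predicted_price= DOLLAR12.2;
RUN;
```

For a 2000 square foot house this works out to roughly $268,147. With an R-Square of 0.1614, individual predictions from this model are rough.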
Graphics can easily be created from PROC REG using the following syntax:
PROC REG DATA=yourdata PLOTS(options) = (plot request list);
    MODEL dependent = independent;
RUN;
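For the options following PLOTS, specify (ONLY) to suppress the default plots and produce only those you request.
Plot requests include FITPLOT, RESIDUALS, DIAGNOSTICS, COOKSD, and QQPLOT. For example: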
PROC REG DATA=housing PLOTS(ONLY) = DIAGNOSTICS;
    MODEL CLOSING_PRICE = LIVING_SQ_FT;
RUN;
The REG Procedure
Model: MODEL1
Dependent Variable: Closing_Price
| Number of Observations Read | 2000 | 
|---|---|
| Number of Observations Used | 2000 | 
| Analysis of Variance | | | | | |
|---|---|---|---|---|---|
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
| Model | 1 | 3.287656E13 | 3.287656E13 | 384.41 | <.0001 |
| Error | 1998 | 1.708798E14 | 85525443800 | | |
| Corrected Total | 1999 | 2.037564E14 | | | |
| Root MSE | 292447 | R-Square | 0.1614 | 
|---|---|---|---|
| Dependent Mean | 244147 | Adj R-Sq | 0.1609 | 
| Coeff Var | 119.78344 | 
| Parameter Estimates | | | | | |
|---|---|---|---|---|---|
| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
| Intercept | 1 | -32459 | 15550 | -2.09 | 0.0370 |
| Living_Sq_Ft | 1 | 150.30316 | 7.66607 | 19.61 | <.0001 |
PROC ANOVA runs an analysis of variance with the following syntax:
PROC ANOVA DATA=yourdata;
    CLASS classvariable;
    MODEL dependent = effects;
RUN;
DATA ANDYLIVED;
    SET HOUSING;
    IF STATE in ('IA', 'MT','CO','MN','CA','VA');
RUN;
PROC ANOVA DATA=ANDYLIVED;
    CLASS STATE;
    MODEL CLOSING_PRICE = STATE;
RUN;
The ANOVA Procedure
| Class Level Information | ||
|---|---|---|
| Class | Levels | Values | 
| State | 6 | CA CO IA MN MT VA | 
| Number of Observations Read | 440 | 
|---|---|
| Number of Observations Used | 440 | 
The ANOVA Procedure
Dependent Variable: Closing_Price
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F | 
|---|---|---|---|---|---|
| Model | 5 | 1.43091E13 | 2.8618199E12 | 9.81 | <.0001 | 
| Error | 434 | 1.2654988E14 | 291589585117 | ||
| Corrected Total | 439 | 1.4085898E14 | 
| R-Square | Coeff Var | Root MSE | Closing_Price Mean | 
|---|---|---|---|
| 0.101585 | 143.7899 | 539990.4 | 375541.3 | 
| Source | DF | Anova SS | Mean Square | F Value | Pr > F | 
|---|---|---|---|---|---|
| State | 5 | 1.43091E13 | 2.8618199E12 | 9.81 | <.0001 | 
Proc REG Data = mydata.bikes;
	MODEL Count = season holiday temp workingday weather;
RUN;
The REG Procedure
Model: MODEL1
Dependent Variable: count
| Number of Observations Read | 10886 | 
|---|---|
| Number of Observations Used | 10886 | 
| Analysis of Variance | | | | | |
|---|---|---|---|---|---|
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
| Model | 5 | 61261467 | 12252293 | 450.49 | <.0001 |
| Error | 10880 | 295911447 | 27198 | | |
| Corrected Total | 10885 | 357172914 | | | |
| Root MSE | 164.91738 | R-Square | 0.1715 | 
|---|---|---|---|
| Dependent Mean | 191.57413 | Adj R-Sq | 0.1711 | 
| Coeff Var | 86.08541 | 
| Parameter Estimates | | | | | |
|---|---|---|---|---|---|
| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
| Intercept | 1 | 32.97402 | 6.68752 | 4.93 | <.0001 |
| season | 1 | 11.16378 | 1.46727 | 7.61 | <.0001 |
| holiday | 1 | -8.24676 | 9.80449 | -0.84 | 0.4003 |
| temp | 1 | 8.61542 | 0.21053 | 40.92 | <.0001 |
| workingday | 1 | 1.09982 | 3.50658 | 0.31 | 0.7538 |
| weather | 1 | -31.15689 | 2.49998 | -12.46 | <.0001 |
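The model above enters season and weather as single numeric predictors, each with one slope. If those columns are really category codes (an assumption about mydata.bikes, not something the output confirms), the PROC GLM route mentioned earlier would treat them as classes. A possible sketch:

```sas
* Treat season and weather as categorical via CLASS (assumed to be category codes);
PROC GLM DATA=mydata.bikes;
    CLASS season weather;
    * SOLUTION prints the parameter estimates for each class level;
    MODEL count = season holiday temp workingday weather / SOLUTION;
RUN;
QUIT;
```

With CLASS, each level of season and weather gets its own estimate relative to a reference level, so the model no longer assumes a constant step between consecutive codes.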