/* c:\lotteries_b\1997\data\elk_res_1997.dta */

/* The following is the first portion of a program that will comprise the
   bulk of the do-file demonstration of STATA's data management, statistics
   and graphical abilities. This first portion involves reading a text file
   into memory, saving it as a STATA data file, generating/transforming some
   variables, and then examining some numerical and graphical descriptive
   measures */

/* First, clear the memory, set the space desired, and control the viewing
   of output */
clear
set memory 80m

/* Second, read in a text (ascii) file and save it as a STATA file */
/*insheet using c:\lotteries_b\1997\data\elk_res_1997_2.txt
save c:\lotteries_b\1997\intermediate\elk_res_1997_2.dta */
use c:\lotteries_b\1997\intermediate\elk_res_1997_2.dta

/* Third, create a "log" file that will store the results */
/*log using c:\lotteries_b\1997\output\stata_presentation.log, replace */

/* Fourth, let's take a quick look at it to see if there are any "problems"
   and to get an idea of the format of each of the variables */
summarize _all
describe _all

/* Fifth, note that the data are still a bit "dirty", so replace the values
   of some of the existing variables, generate some new variables, and drop
   some unnecessary variables */
replace purchase=. if (purchase==-999 | purchase==-888)
replace age=. if (age==-999 | age==-888)
/* Note: STATA treats a missing value (.) as larger than any number, so the
   next line also resets the ages that were just set to missing; add
   "& age~=." to the condition to leave them missing */
replace age=40.88 if age>97
generate male=1 if gender=="M"
replace male=0 if male==.
/* alternatively:
replace male=0 if gender~="M"
replace male=0 if gender=="F" */
generate female=1 if gender=="F"
replace female=0 if female==.
/* group(_N) numbers the observations 1 to _N in their current order;
   alternatively: generate obsno=_n */
generate obsno=group(_N)
#delimit ;
drop resobsno choice_2 name_l name_f name_m street city state
     resident gender dob phone drawn ;
#delimit cr
/* alternatively: drop choice_2-state resident-drawn */
save c:\lotteries_b\1997\intermediate\elk_res_1997_3.dta, replace
clear

/* Sixth, additional datasets may contain variables of interest for the analysis.
   If these datasets contain an element that is common to our dataset, then
   we can merge the datasets in pairs by the variable of interest. For
   example, our dataset contains the zipcode in which each individual
   resides. Various socioeconomic data are available at the zipcode level,
   so we might want to merge these data with our dataset. The objective of
   the next set of commands is to identify the distance an individual will
   (on average) expect to travel. To do this, we will merge our dataset with
   an additional dataset. Note that our dataset contains the individual's
   zipcode and destination choice. The additional dataset contains 3
   variables: origin (individual) zipcodes, destination choices, and the
   round-trip road distance between the zipcode-destination pairs. Since
   each origin zipcode has 215 destinations associated with it, the merging
   below is based upon variable pairs rather than a single, common
   variable. */

/* Seventh, we first open the additional dataset containing the mileage
   estimates, sort it by the two variables of interest, and then save it */
clear
use c:\lotteries_b\shared\elk_miles.dta
rename huntcode choice_1
sort choice_1 orig_zip
save c:\lotteries_b\shared\elk_miles_B.dta, replace
clear

/* Eighth, we then reopen our original, "master" dataset, sort it by the two
   variables of interest, and then merge it with the sorted dataset from
   above */
use c:\lotteries_b\1997\intermediate\elk_res_1997_3.dta
sort choice_1 orig_zip
merge choice_1 orig_zip using c:\lotteries_b\shared\elk_miles_B.dta, nokeep

/* Ninth, we then examine the new "master" dataset to make sure nothing
   "funny" happened */
summarize _all
save c:\lotteries_b\1997\intermediate\elk_res_1997_4.dta, replace

/* Tenth, thus far our descriptive analysis has been at the state-level, but
   we may be interested in examining "stuff" across destination choices.
   The summarize command is again used, but we request that statistics be
   generated "by" each of the 215 destinations */
keep choice_1 lottery miles age male female
sort lottery
/*by lottery: summarize miles age male*/

/* Finally, we conclude this preliminary analysis by examining some graphs.
   These are simply intended to complement the summary statistics produced
   in the tenth step. Previously saved graph files are erased first because
   saving() will not overwrite an existing file; capture keeps the do-file
   running when a file does not exist yet */
for num 1/4: capture erase "c:\temp\gX.gph"
graph age, histogram bin(10) normal title("Histogram of Age") saving(c:\temp\g1)
graph miles, histogram bin(10) normal title("Histogram of Miles") saving(c:\temp\g2)
graph male female, pie title("Pie Chart of Gender") saving(c:\temp\g3)
graph miles age, twoway title("Scatterplot of Miles vs. Age") saving(c:\temp\g4)
graph using c:\temp\g1 c:\temp\g2 c:\temp\g3 c:\temp\g4
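
/* alternatively: the graph commands above use the older STATA graph
   syntax. The lines below are only a sketch of how the same four graphs
   might be written, assuming STATA 8 or later, where histogram, graph pie,
   twoway scatter and graph combine are available. They are an untested
   translation and not part of the original program; the replace suboption
   is assumed to stand in for the erase step above:
histogram age, bin(10) normal title("Histogram of Age") saving(c:\temp\g1, replace)
histogram miles, bin(10) normal title("Histogram of Miles") saving(c:\temp\g2, replace)
graph pie male female, title("Pie Chart of Gender") saving(c:\temp\g3, replace)
twoway scatter miles age, title("Scatterplot of Miles vs. Age") saving(c:\temp\g4, replace)
graph combine "c:\temp\g1.gph" "c:\temp\g2.gph" "c:\temp\g3.gph" "c:\temp\g4.gph" */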