STATA BASIC COMMANDS

(Notes by Jing Li and Junsoo Lee)

 

[Download]

1. Basic [word file]

2. Panel (I) [panel_1.do file | output panel_1.log file ]

3. Panel (II) [panel_2.do file | output panel_2.log file ]

4. MLE (I) [ml1.do | ml1.log]

 


 

STATA home page

Capabilities

Lots of STATA examples programs at Boston College

example 1 Probit

Rare Events Logistic Regression by Gary King

example 2 sampling merge

STATA for Categorical and Limited dependent variables (Indiana U.)

example 3 reading_cleaning

To Subscribe at the STATA newsgroup list

David's random draw example  

Binary time-series-cross-section (BTSCS)

Poisson example data file 

Stata Technical Bulletin (STB)

STATA FAQ

Help?  

 

Stata Web resources

Class Notes for Stata at UCLA (even movies)

Stat 130 Class Notes (Excellent)

STATA textbook examples

Stata Learning Modules (Excellent)

Useful Stata Programs

Stata Programs for Data Analysis

Detailed Stata Web Pages

FAQ

 

SAS Web resources SPSS Web resources

 

Our Notes

 

1.  Basic

clear

set memory 80m

 

cd c:

cd \work\stata

 

insheet using water.txt

save water.dta

* use water.dta

 

log using water, replace

 

summarize _all

describe _all

 

Memory

C:\stata\wstata /k5000

C:\stata\wstata /k5000 set matsize 100

C:\stata\wstata /k5000 run c:\data\profile.do

 

Data files

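infile x1 x2 x3 using test.txt

*  plain-text (ASCII) files only

 

insheet x1 x2 x3 using text.txt

*  for delimited text saved from a spreadsheet

 

save  test, replace

append using test

*  save has no append option; append using adds the observations in test.dta to the data in memory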

 

use  test

 

list

describe

 

Log File

log using test.log

log using test.log, replace

log using test.log, append

 

log close

 

log using test.log, noproc

 

Break

Ctrl-K   Ctrl-break

 

Regression

regress y x1 x2

predict yhat

 

regress y x1 x2, robust

 

vce

* variance-covariance

vce, corr

matrix v = get(VCE)

 

coeff & pred

gen asif = _b[_cons] + _b[ed]*ed + _b[tenure]*tenure
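
A minimal sketch of the by-hand prediction above, assuming Stata's bundled auto dataset. The variables price, mpg and weight are stand-ins chosen here, not variables from these notes. It compares fitted values built from _b[] with those from predict.

```stata
* load the example dataset shipped with Stata
sysuse auto, clear

* fit the regression; coefficients are then available as _b[varname]
regress price mpg weight

* fitted values from predict
predict yhat

* the same fitted values built by hand; the intercept is _b[_cons]
generate yhat2 = _b[_cons] + _b[mpg]*mpg + _b[weight]*weight

* the two columns should agree up to float rounding
summarize yhat yhat2
```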

 

test

regress y x1 x2

test x1 = x2

* b1 = b2

 

joint restrictions

test 2*(x1+x2) = 3*x3

test x4+x5 = 0, accum

* two joint restrictions
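
A short sketch of accumulating restrictions into one joint test, again assuming the bundled auto dataset (the variable names are illustrative only):

```stata
sysuse auto, clear
regress price mpg weight length

* first restriction: the coefficients on mpg and weight are equal
test mpg = weight

* add a second restriction and test both jointly
test length = 0, accumulate

* r(df) holds the number of restrictions in the joint test
display r(df)
```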

 

lr test

regress y x1 x2 x3

lrtest, saving(0)

* save the unrestricted model first

regress y x1 x2

lrtest

* tests the restricted model against the saved one
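
The saving(0) form is the older lrtest syntax. In later releases (Stata 8 onward, as far as we know) the same test is usually written with estimates store. A sketch on the bundled auto dataset, with the model names full and restricted chosen here for illustration:

```stata
sysuse auto, clear

* restricted model
regress price mpg weight
estimates store restricted

* unrestricted model adds length
regress price mpg weight length
estimates store full

* likelihood-ratio test of the one extra coefficient
lrtest full restricted
```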

 

non-linear restrictions

regress y x1 x2 x3

eq one:  3*_b[x2]^2 = _b[x3]

eq two:  _b[x3] / _b[x2] = 2

testnl  one two
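
Newer versions of testnl take the restrictions directly in parentheses, without defining equations first. A sketch, assuming the bundled auto dataset; the restrictions themselves are arbitrary and only mirror the form used above:

```stata
sysuse auto, clear
regress price mpg weight length

* two nonlinear restrictions tested jointly
testnl (3*_b[weight]^2 = _b[length]) (_b[length]/_b[weight] = 2)
```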

 

by region: regress y x1 x2

by foreign: regress y x1 x2

* the data must first be sorted by the by-variable (e.g., sort region)

 

graph y x1 x2 if foreign ==0, connect(.l) symbol(oi)

graph y x1 x2 if foreign ==1, connect(.l) symbol(oi)

 

t-test

ttest mpg, by(foreign)

* Ho: diff = 0    where foreign is a dummy variable

cii 97 24 6

* n=97  mean=24  std=6  95% c.i.

 

ttesti  97  24  6  22

* test Ho: mu = 22
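
A quick check of the immediate t test, using the numbers above. The statistic is (mean - mu0) / (std / sqrt(n)); the sketch prints the hand computation next to the value ttesti leaves in r(t).

```stata
* one-sample t test from summary statistics: n, mean, std, hypothesized mean
ttesti 97 24 6 22

* the same statistic computed by hand
display (24 - 22) / (6 / sqrt(97))

* the value stored by ttesti
display r(t)
```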

 

List

list x1 if x2 > 20

list x1-x5

list x1 x2 if x4 > 10 |  (x5>3 & x6 > 10)

 

*  ~=  not equal    &  and    |  or    ~  not    >=  greater than or equal

 

Sort

sort mpg

 

Creating new variables

gen lx1 = ln(x1) 

* if the variable already exists, use "replace".

replace x1 = x1 / 1000

gen  x3  =  1.05 * x1  if foreign == 0

replace x3 = 1.20 * x1  if foreign == 1

 

Clear

clear

drop _all

 

More

set more off

set more on

 

Descriptive statistics

summarize

sum  if mpg > 20

sum if foreign == 0

sum  x1, detail

 

by region: summarize x1 x2

 

Count

count if x == 1

count if y == float(1.1)

* precision issue
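
A small sketch of the precision issue, on a one-observation dataset built here for illustration. A variable stored as float holds 1.1 rounded to float precision, while the literal 1.1 is evaluated in double precision, so the plain comparison fails.

```stata
clear
set obs 1

* y is stored in float precision
generate float y = 1.1

* compares the float value with the double 1.1: no match
count if y == 1.1

* rounds the literal to float first: match
count if y == float(1.1)
```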

 

Tabulate

tab foreign

tab x2 foreign

tab x2 foreign, chi2

* Pearson chi-square test (df = (rows-1)*(columns-1))
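
A sketch of the two-way table with the chi-square test, assuming the bundled auto dataset (rep78 and foreign are stand-ins for x2 and foreign above). The degrees of freedom can be recovered from the stored row and column counts.

```stata
sysuse auto, clear

* two-way table with Pearson chi-square
tabulate rep78 foreign, chi2

* rows and columns actually tabulated, and the implied degrees of freedom
display r(r) " rows, " r(c) " columns, df = " (r(r) - 1)*(r(c) - 1)
```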

 

Correlate

corr x1 x2

corr x1 x2 if foreign == 0

 

 

Graph

graph x1 x2

 

sort foreign

graph x1 x2, by(foreign) total

* three graphs; 0, 1, total

 

Tutorial

tutorial intro

tutorial graphics

tutorial survival

tutorial logit

 

Long Line

* semi-colon should be used.

 

#delimit;

summarize x1 x2

     if foreign == 1;

gen x3 = x1 + x2;

#delimit cr
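
An alternative for a single long command, available in later Stata releases, is the /// continuation marker, which works in do-files but not at the interactive prompt. A sketch assuming the bundled auto dataset; the variable ratio is made up for the example:

```stata
sysuse auto, clear

* /// joins the next line to the current command
summarize price mpg ///
    if foreign == 1

* commands without /// still end at the line break
generate ratio = price / weight
```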

 

Do file

do myjob

do myjob.do

do myjob, nostop

* don’t stop even with errors

 

Batch Jobs

* at DOS

c:\stata\wstata /b do bigjob.do

 

ADO files

which fit

type c:\stata\ado\f\fit.ado

type c:\stata\ado\f\fit.hlp

 

Three places to put

      Official      C:\stata\ado

      Personal      C:\ado

      Current      .

global S_ADO  "c:\stata\ado;d:\ado;."

* to redefine paths

macro list S_ADO

 

CD

cd d:

cd \work\data

cd "\work\detailed data"

 

Lags and Leads

gen xlag1 = x[_n-1]

gen xlead1 = x[_n+1]
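
With panel data, x[_n-1] on its own would carry the last value of one unit into the first row of the next. A sketch of two ways around that, on a tiny dataset typed in here for illustration (id, year and x are made-up names): subscripting within by groups, and the L. operator after tsset.

```stata
clear
input id year x
1 2001 10
1 2002 12
1 2003 15
2 2001  7
2 2002  9
2 2003  8
end

* lag and lead within each unit
sort id year
by id: generate xlag1  = x[_n-1]
by id: generate xlead1 = x[_n+1]

* the same lag through the time-series operator
tsset id year
generate xlag1_ts = L.x

list, sepby(id)
```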

 

Procedures (Program)

program define hello

   display "hi there"

end

 

do hello

* runs hello.do, which holds the definition above

hello

* calls the program
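
A slightly fuller sketch of a program that takes an argument. The argument name who is made up for this example, and capture program drop is there only so the block can be rerun without a "program already defined" error.

```stata
* remove any earlier definition so the block can be rerun
capture program drop hello

program define hello
    * first word typed after the program name goes into the local macro who
    args who
    display "hi there, `who'"
end

* call the program by name
hello Stata
```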

 

Score

probit y x1 x2, score(u)

* scores will be stored in u

 

Poisson Regression (Example provided by Todd)

 

#delimit ;

 

* Poisson regression (Ex. 5.3, Greene, p. 208);

* For Junsoo Lee;

 

input id y x ;

     1         6     1.5;

     2         7     1.8;

     3         4     1.8; 

     4         10    2.0;  

     5         10    1.3; 

     6         6     1.6;  

     7         4     1.2;  

     8         7     1.9;  

     9         2     1.8;  

    10         3     1.0;  

    11         6     1.4;

    12         5     0.5;

    13         3     0.8;

    14         3     1.1;

    15         4     0.7;

end;

 

list;

 

* Poisson regression;

 

poisson y x ;

#delimit cr

 

Poisson MLE (Example provided by David/Todd)

clear

insheet using c:\temp\poisson_data.txt

log using c:\temp\poisson_output.log, replace

/* this is the "canned" routine that estimates the poisson regression */

poisson y x

 

/* this maximizes lnL directly, using logged factorial of y */

program define poisreg1

args lnf theta

quietly replace `lnf' = -exp(`theta') + $ML_y1*(`theta') - lnfact($ML_y1)

end

ml model lf poisreg1 (y=x)

ml maximize

 

/* this maximizes lnL directly, using the logged gamma function */

 

program define poisreg2

version 6

args lnf theta

quietly replace `lnf' = -exp(`theta') + $ML_y1*(`theta') - lngamma($ML_y1 + 1)

end  

 

ml model lf poisreg2 (y=x)

ml maximize
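
For reference, the log likelihood that both programs evaluate, written out. This is a sketch of the standard Poisson regression form, with theta_i standing for the linear index that ml passes to the program; the notation beta_0, beta_1 is introduced here.

```latex
\ln L(\beta) = \sum_{i=1}^{n} \left[ -\exp(\theta_i) + y_i \theta_i - \ln(y_i!) \right],
\qquad \theta_i = \beta_0 + \beta_1 x_i ,
\qquad \ln(y_i!) = \ln \Gamma(y_i + 1)
```

The last identity is why poisreg1 (logged factorial) and poisreg2 (logged gamma function) should give the same estimates.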

 

Quick Panel Estimation

clear

set memory 40m

set more off

set matsize 350

 

log using panel.log, replace

use panel.dta, clear

 

tsset state year

 

regress  y x1 x2 state2-state51 yr82-yr95

xtivreg  y  x1  x2  yr82-yr95 (l.y = l2.y), i(state) fe

xtivreg  y  x1  x2  yr82-yr95 (l.y = l2.y), i(state) fd

xtivreg  y  x1  x2  yr82-yr95 (l.y = l2.y), i(state) re ec2sls

xtabond  y  x1  x2  yr82-yr95, lags(1)

xtabond  y  x1  x2  yr82-yr95, lags(1) twostep

 

log close

 

On-line Help

 

help weibull

help brier

 


 

2.  Panel Data Models (I)

 

* **********************************************
* Summary Note by Jing Li and Junsoo Lee
* Do file: panel_1.do  Output file: panel_1.log
*
* Commands: xt, xtdata, xtdes, xtsum, xttab,
* xtgls, xtreg, xtregar, xtivreg, xtabond
*
* September 2003
* ***********************************************

clear
cd "c:\upcd1\work\stata"
log using panel_1.log, replace

set mem 200m
set more off
set matsize 800

*****************************************
* xt *
*****************************************

use abdata.dta, clear
* use http://www.stata-press.com/data/r8/abdata, clear

* use http://www.stata-press.com/data/r8/nlswork, clear
* use http://www.stata-press.com/data/r8/union, clear

* tsset id year
** Some commands such as "xtabond" require tsset.

* iis id, clear
* tis year, clear
** iis and tis are alternatives to i() and t() option.
** These override previous setting specified by iis or tis.

** describe pattern of the panel-data
list in 1/6, separator(0) divider

xtdes, patterns(15) i(id) t(year)

*****************************************
* xtdata *
*****************************************

use nlswork.dta, clear
* use http://www.stata-press.com/data/r8/nlswork, clear

generate age2 = age^2
generate ttl_exp2 = ttl_exp^2
generate byte black = race==2

xtdata ln_w grade age* ttl_exp* tenure* black not_smsa south, be clear i(id)
** xtdata converts the data into a form suitable for between estimation.
regress ln_w grade age* ttl_exp* tenure* black not_smsa south
** Thus, this gives the be estimator.

* xtdata ln_w grade age* ttl_exp* tenure* black not_smsa south, fe clear i(id)
* regress ln_w grade age* ttl_exp* tenure* black not_smsa south

*****************************************
* xtdes *
*****************************************

use nlswork.dta, clear
* use http://www.stata-press.com/data/r8/nlswork, clear

xtdes, patterns(15) i(id) t(year)

*****************************************
* xtsum *
*****************************************

xtsum wks_work

xtsum birth_yr
** As this is time invariant, its within std dev is zero.

*****************************************
* xttab *
*****************************************

xttab wks_work

xttab birth_yr
** As this is time invariant, its within percentage is 100.


*****************************************
* xtgls *
*****************************************

** xtgls fits "Cross-sectional time series" linear models using feasible GLS (not panel estimation).

use abdata.dta, clear
* use http://www.stata-press.com/data/r8/abdata, clear

** estimate the model using GLS
* Dep var = n (log of employment in firm i and time t)
* Regressors = w (log of wage) k (log of capital stock) ys (log of industry output)

xtgls n w k ys, i(id) t(year) nmk
** Estimating the model using default options (homoskedasticity, no autocorrelation)

** xtgls n w k ys, i(id) t(year) igls panels(correlated)
** Specifying the igls option iterates the GLS estimates to convergence, which gives the MLE.
** The above does not work here, since panels(correlated) requires a balanced panel.
** We now use a different data set, which is a balanced panel.

use invest2.dta, clear
* use http://www.stata-press.com/data/r8/invest2, clear

xtgls invest market stock, i(company) panels(iid) corr(independent) nmk
** same as regress (iid, homoskedasticity, no autocorrelation)
** nmk specifies std error to be normalized by n-k.

xtgls invest market stock, i(company) panels(hetero)
** iid, heteroskedasticity, no autocorrelation

xtgls invest market stock, i(company) t(time) panels(correlated)
** correlated, heteroskedasticity, no autocorrelation

xtgls invest market stock, i(company) t(time) panels(correlated) igls nolog
** correlated, heteroskedasticity, no autocorrelation
** MLE estimation by iterative GLS (1046 iterations for this case.)

xtgls invest market stock, i(company) panels(hetero) corr(ar1)
** iid, heteroskedasticity, common ar1 autocorrelation

xtgls invest market stock, i(company) panels(hetero) corr(psar1)
** iid, heteroskedasticity, hetero ar1 autocorrelation

xtgls invest market stock, i(company) t(time) panels(correlated) corr(psar1)
** correlated, heteroskedasticity, hetero ar1 autocorrelation

matrix list e(Sigma)
** Estimated cross-sectional covariances

predict new_inv1, xb
list new_inv1

*****************************************
* xtreg *
*****************************************

use abdata.dta, clear
* use http://www.stata-press.com/data/r8/abdata, clear

** estimate GLS random-effects model
xtreg n w k ys, re i(id) theta

xttest0
** Breusch and Pagan LM test for random effects, modified by Baltagi and Li (1990; see manual, p. 210)

xthausman
** Performs the Hausman specification test for RE versus FE.

xtreg n w k ys, re i(id)
** RE GLS

xtreg n w k ys, mle i(id) nolog
** estimate ML RE model (suppressing the iteration log with nolog)

xtreg n w k ys, re i(id) sa
** RE: using the small-sample Swamy-Arora estimator by Baltagi and Chang (1994; see manual, p. 209)

xtreg n w k ys, pa i(id) nolog
** GEE population-averaged model; equivalent to the RE
** also equivalent to the following xtgee

xtgee n w k ys, family(gaussian) link(id) corr(exchangeable)

xtreg n w k ys, be i(id)
** Between estimator

xtreg n w k ys, be i(id) wls
** Between estimator
** (wls is used for unbalanced panel, and a stabilized variance is used.)

xtreg n w k ys, fe i(id)
** Estimating the Fixed-effects model

*****************************************
* xtregar *
*****************************************

** FE and RE with AR(1) error

use grunfeld.dta, clear
* use http://www.stata-press.com/data/r8/grunfeld, clear

tsset
* tsset company year

xtregar invest mvalue kstock, fe
** Estimating the Fixed-effects model with ar(1) error

xtregar invest mvalue kstock, re
** Estimating the Random-effects model with ar(1) error

*****************************************
* xtivreg *
*****************************************
** Estimating instrumental variable panel data models

use abdata.dta, clear
* use http://www.stata-press.com/data/r8/abdata, clear

tsset id year

xtivreg n l2.n l(0/1).w l(0/2).(k ys) yr1977-yr1984 (l.n = l3.n), i(id) fd
** FD model

** dep = n
** ind =
** l.n = n(t-1) ... endogenous and instrumented
** l2.n = n(t-2) .. L2D
** l(0/1).w = w(t), w(t-1) .. D1 (level), LD (lagged)
** l(0/2).(k ys) = k(t), k(t-1), k(t-2); ys(t), ys(t-1), ys(t-2) .. D1, LD, L2D
** iv = l3.n = n(t-3) & all other exogenous variables

xtivreg n l2.n l(0/1).w l(0/2).(k ys) yr1977-yr1984 (l.n = l3.n), i(id) fd first small

xtivreg n w yr1977-yr1984 (k = ys), fe i(id)
** Fixed-effects model

xtivreg n w yr1977-yr1984 (k = ys), fe i(id) first
** Fixed-effects model, reporting the first stage result.

xtivreg n w (k = ys), be i(id) first
** Between-effects model

xtivreg n w (k = ys), re nosa i(id) first theta
** GLS Random-effects model

xtivreg n w (k = ys), re ec2sls i(id) first theta
** EC2SLS Random-effects model

*****************************************
* xtabond *
*****************************************

** Arellano-Bond estimator

use abdata.dta, clear
* use http://www.stata-press.com/data/r8/abdata, clear

xtabond n l(0/1).w l(0/2).(k ys) yr1977-yr1984, lag(2)
** One step estimator
** Sargan's test of over-identifying restrictions >> p-value < 0.001.
** Sargan's test assumes homoskedasticity.

xtabond n l(0/1).w l(0/2).(k ys) yr1977-yr1984, lag(1) robust
** Still, one step estimator but reporting robust std error.
** The null of no AR(1) error is rejected, but the null of no AR(2) error is not.
** Rejecting no AR(1) does not mean the one-step estimator is inconsistent.
** But if the null of no AR(2) error were rejected, the one-step estimator would be inconsistent, which is not the case here.

xtabond n l(0/1).w l(0/2).(k ys) yr1977-yr1984, lag(2) small
** request t-stat and F-stat be reported instead of Z-stat and chi-square stat.

xtabond n l(0/1).w l(0/2).(k ys) yr1977-yr1984, lag(2) twostep
** The std errors of the two-step estimator tend to be biased in small samples.
** Thus, the one-step estimator is recommended for inference, and the Sargan test from the two step estimator is used for model specification.

xtabond n l(0/1).w l(0/2).(k ys) yr1977-yr1984, lag(2) twostep pre(w, lag(1,.)) pre(k, lag(2,.))
** predetermined regressors

xtabond n l(0/1).w l(0/2).(k ys) yr1977-yr1984, lag(2) twostep pre(w, lag(1,.) endog) pre(k, lag(2,.) endog)
** predetermined plus contemporaneously correlated with error

*****************************************
* More examples by Jing *
*****************************************

**** Note: try xt series of commands on "invest2.dta"

use invest2.dta,clear
* use http://www.stata-press.com/data/r8/invest2, clear

iis company
tis time

** describe pattern of the panel-data
xtdes, patterns(20)

** estimate the model using GLS
* Dep variable = invest
* Regressors = market stock

xtgls invest market stock, nmk panels(iid) corr(independent)
xtgls invest market stock, panels(hetero)
xtgls invest market stock, panels(correlated) corr(ar1)
xtgls invest market stock, panels(correlated) corr(psar1)
xtgls invest market stock, igls
gen lninvest = log(invest) /*try GLS with the log-level data*/
xtgls lninvest market stock

**** Note: try xt series of commands on "nlswork.dta"

use nlswork.dta,clear
* use http://www.stata-press.com/data/r8/nlswork, clear

iis idcode
tis year

** describe the patterns of the data
xtdes, patterns(30)

** estimate the model using 'xtreg'
* Dep variable = ln_wage
* Regressors = grade race age ttl_exp tenure not_smsa south
* And the square terms of age ttl_exp tenure are also included

gen age2 = age^2
gen ttl_exp2 = ttl_exp^2
gen tenure2 = tenure^2

* between-effects model
xtreg ln_wage grade race age age2 ttl_exp ttl_exp2 tenure tenure2 not_smsa south, be wls

* fixed-effects model
xtreg ln_wage grade race age age2 ttl_exp ttl_exp2 tenure tenure2 not_smsa south, fe

* GLS Random-effects model
xtreg ln_wage grade race age age2 ttl_exp ttl_exp2 tenure tenure2 not_smsa south, re sa theta
estimates store est1
xtreg ln_wage grade race age ttl_exp tenure not_smsa south, re sa theta
estimates store est2
hausman est1 est2

** instrumental variable and 2SLS estimation of the data
* GLS Random-effects model
xtivreg ln_wage age* not_smsa race (tenure = union south race), re theta first
xtivreg ln_wage age* not_smsa race (tenure = union south race), ec2sls theta small
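
The hausman call above compares two random-effects fits. The more common use is fixed effects against random effects. A self-contained sketch on simulated data follows; every name and number in it (id, t, a, x, y, the seed, the sample sizes) is made up for illustration, and rnormal() assumes a release that has it (Stata 10 or later, as far as we know). The regressor is built to be correlated with the unit effect, so the RE estimate should drift away from the FE estimate.

```stata
clear
set seed 12345

* 200 units with a unit-specific effect a
set obs 200
generate id = _n
generate a = rnormal()

* 5 periods per unit
expand 5
bysort id: generate t = _n

* regressor correlated with the unit effect, true slope = 2
generate x = rnormal() + 0.5*a
generate y = 1 + 2*x + a + rnormal()

tsset id t

* consistent under correlation between x and a
xtreg y x, fe
estimates store fixed

* efficient only if x and a are uncorrelated
xtreg y x, re
estimates store random

* Hausman test: consistent estimator first, efficient estimator second
hausman fixed random
```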

 

3.  Panel Data Models (II)

 

* **********************************************
* Summary Note by Jing Li and Junsoo Lee
* panel_2.do
*
* Commands: xtdata (II), xtcloglog, xtgee,
* xtlogit, xtprobit, xtsum & xttab,
* xttobit, xtpcse, xtregar, xtintreg,
* xtrchh, xtfrontier, xthtaylor
* September 2003
* ***********************************************

clear
set mem 200m
cd "C:\UpCD1\WORK\Stata\"

log using panel_2.log, replace

set more off
set matsize 800

*********************************
* xtdata *
*********************************

use xtdatasmpl.dta,clear
* use http://www.stata-press.com/data/r8/xtdatasmpl, clear

** 1. use "xtdata" to convert the data into a form suitable for between estimation

xtdata ln_w grade age* ttl_exp* tenure* black not_smsa south, be clear
regress ln_w grade age* ttl_exp* tenure* black not_smsa south

* compare the above results to those from using "xtreg, be"
xtreg ln_w grade age* ttl_exp* tenure* black not_smsa south, be

** use "xtdata" to convert the data into a form suitable for fixed-effects(within) estimation
use xtdatasmpl.dta,clear
xtdata ln_w grade age* ttl_exp* tenure* black not_smsa south, fe i(idcode) clear
regress ln_w grade age* ttl_exp* tenure* black not_smsa south

* compare the above results to those from using "xtreg, fe"
xtreg ln_w grade age* ttl_exp* tenure* black not_smsa south, fe i(idcode)

** use "xtdata" to convert the data into a form suitable for random-effects estimation
use xtdatasmpl.dta,clear

** ratio is specified to be 1; this is for specification-search purposes only
xtdata ln_w grade age* ttl_exp* tenure* black not_smsa south, re ratio(1) clear
regress ln_w grade age* ttl_exp* tenure* black not_smsa south constant, nocons

* compare the above results to those from using "xtreg, re"
xtreg ln_w grade age* ttl_exp* tenure* black not_smsa south, re

** note: every time before using "xtdata", you have to use the original data.

*********************************
* xtcloglog *
*********************************

** 2. try the command 'xtcloglog'
*webuse union.dta,clear
*save union.dta

use union.dta, clear
* use http://www.stata-press.com/data/r8/union, clear

iis idcode
tis year

* There is no FE version of this model; a conditional likelihood function cannot be defined.

** random-effects model
xtcloglog union age grade not_smsa south southXt, re
xtcloglog union age grade not_smsa south southXt, re quad(20)

** population-averaged model (xtgee)
xtcloglog union age grade not_smsa south southXt, pa

** population-averaged model with robust variance, clustering on 'i'
xtcloglog union age grade not_smsa south southXt, pa i(idcode) robust

** population-averaged model with 'xtgee' options
xtcloglog union age grade not_smsa south southXt, pa corr(exchangeable)

*********************************
* xtgee *
*********************************

* Population Averaged model (generalized linear model or Generalized Estimating Equations (GEEs))

* g[E(y(it))] = X(it)*b with y ~ specific dist.
*
* e.g., If logit[E(y(it))] = X(it)*b with y ~ Bernoulli, it's a logit model.
* Then, use link(logit), family(binomial)
*
* There is no convenient likelihood function. (Need to read more references.)
*
* This procedure lets you specify the within-group correlation structure for the panels.
* default: equal-correlation, corr(exchangeable)
* corr(ar1) can be estimated. No option for psar1.
* "xtcorr" gives the within-group correlations.

* Note : xtgls can allow for cross-sectional correlation across panels, but this option is not
* available in xtgee. Conversely, xtgls does not allow for the within-group correlation (except
* for autocorrelation with ar1 or psar1), but xtgee can allow for it.

* Special cases (with balanced panels): Try these.. I have not compared them yet.
*
* xtgee, corr(independent) link(cloglog) => cloglog or xtcloglog
* xtgee, corr(independent) link(probit) => probit (but std errors are different)
* If the binomial denominator is not 1, it's bprobit.
* Further Note: blogit and bprobit produce maximum-likelihood logit and probit estimates on grouped ("blocked") data;
* glogit and gprobit produce weighted least-squares estimates.
* xtgee with negative binomial (nbinomial) produces estimates conditional on alpha (correlation).
* nbreg gives unconditional estimates.
* xtgee with corr(independent) fits exponential regression (as in survival models) but
* not with censored data.
*
* xtgee, fam(gauss) link(iden) corr(exch) => xtreg, re or xtreg, mle

use union.dta,clear
* use http://www.stata-press.com/data/r8/union, clear

iis idcode
tis year

xtgee union age grade not_smsa south southXt, family(gamma) link(log) corr(exchangeable) robust
xtgee union age grade not_smsa south southXt, family(poisson) link(log) corr(unstructured)
xtgee union age grade not_smsa south southXt, family(poisson) link(identity) corr(unstructured)


use nlswork2.dta, clear
* use http://www.stata-press.com/data/r8/nlswork2, clear
*webuse nlswork2.dta,clear
*save nlswork2.dta

iis idcode
tis year

gen age2 = age*age
gen ttl_exp2 = ttl_exp*ttl_exp
gen tenure2 = tenure^2

** compare the results from 'regress' and 'xtgee' (using OLS)
regress ln_w grade age* ttl_exp* tenure*
xtgee ln_w grade age* ttl_exp* tenure*, corr(indep) nmp

xtgee ln_w grade age* ttl_exp* tenure*, corr(ar1) nmp
xtgee ln_w grade age* ttl_exp* tenure*, fam(gamm) corr(indep) nmp
xtgee ln_w grade age* ttl_exp* tenure*, fam(gamm) corr(ar2)
xtgee ln_w grade age* ttl_exp* tenure*, fam(poisson) link(log) corr(unstructured)
xtgee ln_w grade age* ttl_exp* tenure*, fam(poisson) link(log) corr(stationary 2)

use airacc.dta, clear
* use http://www.stata-press.com/data/r8/airacc, clear

*webuse airacc.dta,clear
*save airacc.dta

iis(airline)
tis(time)

gen lnpm = ln(pmiles)

xtgee i_cnt inprog, family(poisson) eform offset(lnpm)
xtgee i_cnt inprog, family(gauss) corr(exchangeable) eform offset(lnpm)
xtgee i_cnt inprog, family(binomial) link(identity) corr(independent) eform offset(lnpm)
xtgee i_cnt inprog, family(igaussian) link(log) corr(unstructured)
* xtgee i_cnt inprog, family(binomial) link(logit) corr(exchangeable)
* The line above does not work; error message: estimates diverging (missing predictions)

xtgee i_cnt inprog, family(gamma) link(reciprocal) corr(independent)
xtgee i_cnt inprog, family(gauss) link(identity) corr(independent) rgf trace robust score(newscore1)
xtgee i_cnt inprog, family(gauss) link(power) robust
xtgee i_cnt inprog, family(gauss) link(power) t(time) corr(stationary 2) robust

*********************************
* xtlogit *
*********************************

use union.dta, clear
* use http://www.stata-press.com/data/r8/union, clear

iis idcode
tis year

** random-effects model
xtlogit union age grade not_smsa south southXt, re

quadchk
* # of points to use in the quadrature approximation of the integral (this checkup is important.)

xtlogit union age grade not_smsa south southXt, re offset(age)
* the coeff of age = 1 (restricted)

** conditional fixed-effects model
xtlogit union age grade not_smsa south southXt, fe nolog
xtlogit union age grade not_smsa south southXt, fe noskip
xtlogit union age grade not_smsa south southXt, fe offset(grade) nolog

** population-averaged model
xtlogit union age grade not_smsa south southXt, pa eform
xtlogit union age grade not_smsa south southXt, pa robust
* Huber & White sandwich estimator of variance

xtlogit union age grade not_smsa south southXt, pa offset(grade) eform
xtlogit union age grade not_smsa south southXt, pa offset(grade) robust

xtlogit union age grade not_smsa south southXt, pa nolog or robust
/* "or" the estimated coefficients are transformed to odds ratios: i.e., exp(b) is reported. */

xtlogit union age grade not_smsa south southXt, pa nolog robust

** compare the results to 'xtgee'
xtgee union age grade not_smsa south southXt, nolog robust family(binomial) link(logit) corr(exchangeable)


*********************************
* xtprobit *
*********************************

* There is no FE model for this. One may be tempted to use probit with dummy variables,
* but the resulting estimator is biased.

use union.dta,clear
* use http://www.stata-press.com/data/r8/union, clear

iis idcode
tis year

** random-effects model
xtprobit union age grade not_smsa south southXt, re nolog

quadchk
* # of points to use in the quadrature approximation of the integral (this checkup is important.)

xtprobit union age grade not_smsa south southXt in 1/25000, re offset(age)
xtprobit union age grade not_smsa south southXt, re offset(grade) nolog

** population-averaged model
xtprobit union age grade not_smsa south southXt, pa
xtprobit union age grade not_smsa south southXt, pa eform
xtprobit union age grade not_smsa south southXt, pa robust

xtprobit union age grade not_smsa south southXt, pa robust nolog /* first use 'xtprobit' */
** compare the results to 'xtgee'
xtgee union age grade not_smsa south southXt, family(binomial) link(probit) corr(exchangeable) robust nolog

*webuse chicken.dta,clear
*save chicken.dta

use chicken.dta,clear
* use http://www.stata-press.com/data/r8/chicken, clear

iis(person)

** random-effects model
xtprobit complain age grade south tenure gender race income genderm burger chicken, re nolog
xtprobit complain age grade south tenure gender race income genderm burger chicken, re

** population-averaged model
xtprobit complain age grade south tenure gender race income genderm burger chicken, pa
xtprobit complain age grade south tenure gender race income genderm burger chicken, pa eform
xtprobit complain age grade south tenure gender race income genderm burger chicken, pa robust

****************************
* xtsum & xttab *
****************************

use nlswork.dta, clear
iis idcode
tis year

xtsum age grade ttl_exp hours ln_wage

xttab union
xttrans union

****************************
* xttobit *
****************************

* Again, no FE version in stata, as there is no conditional likelihood function.
* Honore(1992)'s semi-parametric FE Tobit version can be considered, but
* unconditional tobit FE with dummies is biased.

* ll (lower limit) and ul (upper limit)

* option "tobit" reports the LR stat. versus pooling tobit.

*webuse nlswork.dta,clear
*save nlswork.dta
use nlswork.dta, clear
* use http://www.stata-press.com/data/r8/nlswork, clear

iis idcode
tis year

** random-effects model (ln_wage right-censored at 1.9)
xttobit ln_wage union age grade not_smsa south occ_code, ul(1.9) tobit
quadchk, nooutput

** random-effects model (ln_wage censored below at 0.9 and above at 1.9)
xttobit ln_wage union age grade not_smsa south occ_code, ll(0.9) ul(1.9) tobit
quadchk, nooutput

** random-effects model (quadrature approx. of the integral is at its max, i.e. 30)
xttobit ln_wage union age grade not_smsa south occ_code, ll(0.9) ul(1.9) quad(30) tobit

** random-effects model (the coefficient of tenure constrained to be 1)
xttobit ln_wage union age grade tenure ttl_exp race not_smsa south occ_code, ll(0.4) offset(tenure) tobit

** random-effects model (the coefficient of ttl_exp constrained to be 1)
xttobit ln_wage union age grade tenure ttl_exp race not_smsa south occ_code, ul(1.6) offset(ttl_exp) tobit

****************************
* xtpcse *
****************************

* Alternative to xtgls
* Panel-corrected standard errors (PCSE) for OLS or Prais-Winsten estimates.
* The disturbances are assumed to be heteroskedastic and contemporaneously correlated across panels.
* Options include corr(indep), corr(ar1), and corr(psar1); the last allows panel-specific ar(1) errors.
* Consistent as T goes to infinity.

* Again, this does not include within-group correlations; for this, use xtgee (consistent as N goes to infinity).

use grunfeld.dta,clear
* use http://www.stata-press.com/data/r8/grunfeld, clear

tsset company year, yearly

xtpcse invest mvalue kstock
xtpcse invest mvalue kstock, correlation(ar1)
xtpcse invest mvalue kstock, correlation(psar1) rhotype(tscorr) detail

****************************
* xtregar *
****************************

* FE and RE models with AR(1) error (common rho only). tsset is needed due to T asymptotics.
* Can accommodate unbalanced panels.

* "lbi" option reports the LBI statistic for rho = 0.

use grunfeld.dta,clear
* use http://www.stata-press.com/data/r8/grunfeld, clear

tsset company year, yearly

** fixed-effects with an AR(1) disturbance
xtregar invest mvalue kstock, fe rhotype(tscorr)
xtregar invest mvalue kstock, fe rhotype(tscorr) twostep
xtregar invest mvalue kstock if year != 1943 & year != 1944, fe lbi

** random-effects with an AR(1) disturbance
xtregar invest mvalue kstock, re rhotype(tscorr)
xtregar invest mvalue kstock if year != 1943 & year != 1944, re lbi

****************************
* xtintreg *
****************************

* RE models for interval data panels (no FE version)
* needs Dep(lower) and Dep(upper), and RE version requires a quadchk checkup.

* Prediction can be given over intervals.
* pr0(a,b) computes P(a < y < b); e.g., pr0(20,30) gives P(20 < y < 30).

* intreg computes the LR test for the OLS

*webuse nlswork3.dta,clear
*save nlswork3.dta

use nlswork3.dta, clear
* use http://www.stata-press.com/data/r8/nlswork3, clear

iis idcode
tis year

** random-effects interval data regression model
xtintreg ln_wage1 ln_wage2 union age grade not_smsa south southXt occ_code, noskip intreg
predict new_var, pr0(1,3)
predict new_var1, pr0(0,3)

** the coefficient of age constrained to be 1
*xtintreg ln_wage1 ln_wage2 union age grade not_smsa south southXt occ_code, quad(25) offset(age) intreg
*xtintreg ln_wage1 ln_wage2 union age grade tenure ttl_exp not_smsa south, offset(tenure) intreg
*xtintreg ln_wage1 ln_wage2 union age grade not_smsa south southXt occ_code, offset(grade) quad(20) intreg

****************************
* xtpoisson *
****************************

* Three models: FE, RE and GEE

* Note that there is no prediction for the FE model: conditional likelihood function.

* "irr" reports exp(b), which implies incidence-rate ratios

*webuse ships.dta,clear
*save ships.dta
use ships.dta,clear
* use http://www.stata-press.com/data/r8/ships, clear

** random-effects model
xtpoisson accident op_75_79 co_65_69 co_70_74 co_75_79, re i(ship)
xtpoisson accident op_75_79 co_65_69 co_70_74 co_75_79, re i(ship) irr
xtpoisson accident op_75_79 co_65_69 co_70_74 co_75_79, re i(ship) exposure(service) irr

xtpoisson accident op_75_79 co_65_69 co_70_74 co_75_79, re i(ship) ex(service) irr normal nolog
* RE has a normal distribution, rather than a gamma dist.

xtpoisson accident op_75_79 co_65_69 co_70_74 co_75_79, re i(ship) ex(service) irr normal quad(25)

** conditional fixed-effects model
xtpoisson accident op_75_79 co_65_69 co_70_74 co_75_79, fe i(ship)
xtpoisson accident op_75_79 co_65_69 co_70_74 co_75_79, fe i(ship) ex(service)
xtpoisson accident op_75_79 co_65_69 co_70_74 co_75_79, fe i(ship) ex(service) irr

** population-averaged model ('eform' is an xtgee option)
xtpoisson accident op_75_79 co_65_69 co_70_74 co_75_79, pa i(ship) ex(service) robust
xtpoisson accident op_75_79 co_65_69 co_70_74 co_75_79, pa i(ship) ex(service) eform
xtpoisson accident op_75_79 co_65_69 co_70_74 co_75_79, pa i(ship) ex(service) robust eform


****************************
* xtnbreg *
****************************

* Negative binomial models for count data (RE, GEE and FE versions)

* Again, the FE version conditions out the fixed effects, so no prediction can include them

use airacc.dta,clear
* use http://www.stata-press.com/data/r8/airacc, clear

iis airline
tis time
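* Sketch, assuming Stata 10 or later (not in the original notes): xtset declares
* the panel and time variables in one step, in place of iis/tis
* xtset airline time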

** random-effects model
xtnbreg i_cnt inprog, re exposure(pmiles) irr
predict new_var2

** conditional fixed-effects model
xtnbreg i_cnt inprog, fe exposure(pmiles) irr
predict new_var3

** population-averaged model ('eform' is an xtgee option)
xtnbreg i_cnt inprog, pa exposure(pmiles) robust eform

****************************
* xtrchh *
****************************

* Hildreth-Houck random coefficient model

use invest2.dta,clear
* use http://www.stata-press.com/data/r8/invest2, clear

* Check the data for possible random coefficients
reshape wide invest market stock, i(time) j(company)
sureg (invest1 market1 stock1) (invest2 market2 stock2) (invest3 market3 stock3) (invest4 market4 stock4) (invest5 market5 stock5)

use invest2.dta,clear
xtrchh invest market stock, i(company) t(time)

predict new4, xb

****************************
* xtfrontier *
****************************

* Frontier models
* Battese-Coelli (1992) parameterization of time effects multiplied by the inefficiency term.

*webuse xtfrontier1.dta,clear
*save xtfrontier1.dta
use xtfrontier1.dta,clear
* use http://www.stata-press.com/data/r8/xtfrontier1, clear

** time-invariant model
xtfrontier lnwidgets lnmachines lnworkers, ti i(id)
xtfrontier lnwidgets machines workers, ti i(id) nodifficult

** time-invariant model in terms of a cost function
xtfrontier lnwidgets lnmachines lnworkers, ti i(id) cost
xtfrontier lnwidgets machines workers, ti i(id) nodifficult cost

** time-invariant model with constraint
constraint define 1 _b[lnmachines] + _b[lnworkers] = 1
xtfrontier lnwidgets lnmachines lnworkers, ti i(id) constraint(1)
xtfrontier lnwidgets lnmachines lnworkers, ti i(id) constraint(1) cost
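* Sketch (not in the original notes): after a fit with constraint(1), the two
* coefficients should sum to 1 (up to rounding)
display _b[lnmachines] + _b[lnworkers]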

constraint define 2 _b[lnmachines] = _b[lnworkers]
xtfrontier lnwidgets lnmachines lnworkers, ti i(id) constraint(2)
xtfrontier lnwidgets lnmachines lnworkers, ti i(id) constraint(2) cost

** time-varying decay model
xtfrontier lnwidgets lnmachines lnworkers, tvd i(id) t(t)
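* Sketch (not in the original notes): predicted technical efficiency from the
* model just fitted; te_tvd is a hypothetical variable name
predict te_tvd, te
summarize te_tvd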

** time-varying decay model in terms of a cost function
xtfrontier lnwidgets lnmachines lnworkers, tvd i(id) t(t) cost
xtfrontier lnwidgets machines workers, tvd i(id) t(t) cost

** time-varying decay model with constraint
constraint define 3 _b[lnmachines] = 2* _b[lnworkers]
xtfrontier lnwidgets lnmachines lnworkers, tvd i(id) t(t) constraint(3)
xtfrontier lnwidgets lnmachines lnworkers, tvd i(id) t(t) constraint(3) cost

****************************
* xthtaylor *
****************************

*webuse xthtaylor1.dta,clear
*save xthtaylor1.dta
* use http://www.stata-press.com/data/r8/xthtaylor1, clear

use xthtaylor1.dta, clear

** Hausman-Taylor estimator, specifying only the endogenous variables
correlate ui z1 z2 x1a x1b x2 eit
xthtaylor yit x1a x1b x2 z1 z2, endog(x2 z2) i(id)
xthtaylor yit x1a x1b x2 z1 z2, endog(x2 z2) i(id) t(t) amacurdy
xthtaylor yit x1a x1b x2 z1 z2, endog(x2 z2) i(id) t(t) small
xthtaylor yit x1a x1b x2 z1 z2, endog(x2 z2) i(id) t(t) amacurdy small

** Hausman-Taylor estimator with the constant (time-invariant) variables declared in constant()
xthtaylor yit x1a x1b x2 z1 z2 ui, endog(x2 z2) constant(z1 z2 ui) i(id)
xthtaylor yit x1a x1b x2 z1 z2 ui, endog(x2 z2) constant(z1 z2 ui) i(id) t(t) amacurdy
xthtaylor yit x1a x1b x2 z1 z2 ui, endog(x2 z2) constant(z1 z2 ui) i(id) t(t) small

** Hausman-Taylor estimator with the time-varying variables declared in varying()
xthtaylor yit x1a x1b x2 z1 z2 ui, endog(x2 z2) varying(x2 x1a x1b) i(id)
xthtaylor yit x1a x1b x2 z1 z2 ui, endog(x2 z2) varying(x2 x1a x1b) i(id) t(t) amacurdy
xthtaylor yit x1a x1b x2 z1 z2 ui, endog(x2 z2) varying(x2 x1a x1b) i(id) t(t) small

*webuse psidextract.dta,clear
*save psidextract.dta
* use http://www.stata-press.com/data/r8/psidextract, clear

use psidextract.dta,clear
iis id
tis t

xtsum exp exp2 wks ms union, i(id)

** Hausman-Taylor estimator, specifying only the endogenous variables
correlate fem blk occ south smsa ind ed
xthtaylor lwage occ south smsa ind exp exp2 wks ms union fem blk ed, endog(exp exp2 wks ms union ed)
xthtaylor lwage occ south smsa ind exp* wks ms union fem blk ed, endog(exp exp2 wks ms union ed) amacurdy
xthtaylor lwage occ south smsa ind exp* wks ms union fem blk ed, endog(exp exp2 wks ms union ed) small
xthtaylor lwage occ south smsa ind exp* wks ms union fem blk ed, endog(exp exp2 wks ms union ed) amacurdy small

** Hausman-Taylor estimator with the constant (time-invariant) variables declared in constant()
xthtaylor lwage occ south smsa ind exp exp2 wks ms union fem blk ed, endog(exp exp2 wks ms union ed) /*
*/ constant(fem blk ed)
xthtaylor lwage occ south smsa ind exp exp2 wks ms union fem blk ed, endog(exp exp2 wks ms union ed) /*
*/ constant(fem blk ed) amacurdy
xthtaylor lwage occ south smsa ind exp exp2 wks ms union fem blk ed, endog(exp exp2 wks ms union ed) /*
*/ constant(fem blk ed) small

** Hausman-Taylor estimator with the time-varying variables declared in varying()
xthtaylor lwage occ south smsa ind exp exp2 wks ms union fem blk ed, endog(exp exp2 wks ms union ed) /*
*/ varying(ms exp* occ south smsa ind wks union)
xthtaylor lwage occ south smsa ind exp exp2 wks ms union fem blk ed, endog(exp exp2 wks ms union ed) /*