Economics 406: Topics in Microeconomics
Data Sets
A note on opening large
data sets. Some of the data sets below are large. The
version of Stata used in the CBE labs is Stata IC, a version of Stata that
can hold an effectively unlimited number of observations. However, as
installed on the CBE machines, Stata was set to use 10 MB of computer memory,
which holds only a rather small quantity of data. One can
remedy this by setting Stata to use more computer memory prior to
opening large data sets. One does this by using the following
command:
set mem 160m
The set mem command
sets the amount of memory Stata may use when opening data.
In this case, one would be allowing Stata to use 160 MB of memory,
which is enough to open the General Social Survey and the Angrist &
Krueger data below. One can increase this number, up to the physical
memory installed in the computer, to open larger data sets.
To make this work,
first open Stata and type the "set mem 160m" command in the command
line. Then open the large data set (typically using the Open
command under the File menu). This should enable you to open all
of the files below.
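The same steps can be typed as commands instead of using the menu. This is only a sketch: the file name "angrist_krueger.dta" is a hypothetical placeholder for whichever file you downloaded, and set mem applies to Stata 11 and earlier (later versions manage memory automatically).

```stata
* Sketch only: "angrist_krueger.dta" is a hypothetical file name.
* Stata will not change the memory setting while data are loaded, so clear first.
clear
* Request 160 MB of memory.
set mem 160m
* Display the current memory settings to confirm the change.
query memory
* Open the data set; this is equivalent to File > Open.
use "angrist_krueger.dta", clear
```

The setting lasts only for the current session, so the set mem line has to be repeated each time Stata is restarted.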
General Social Survey
The General Social Survey is a bi-annual survey covering a wide range
of demographic data as well as social topics that change with each
survey administration. This data set contains 53,043 observations
of 5,366 different variables, including
information on wages, occupations, and education. Because of the
large number of variables, students using the GSS must first open
Stata, then run the command "set maxvar 8000", and then open the
GSS file.
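As a rough sketch, the full sequence for the GSS might look like the following. The file name "gss.dta" is a hypothetical placeholder. To my knowledge, set maxvar is accepted only by Stata SE and MP, so it may be refused on machines running other editions.

```stata
* Sketch only: "gss.dta" is a hypothetical file name.
* Both settings must be changed before any data are loaded.
clear
* Raise the variable limit above the 5,366 variables in the GSS.
set maxvar 8000
* Raise the memory limit as described in the note on large data sets.
set mem 160m
use "gss.dta", clear
* Report how many observations and variables were loaded.
describe, short
```

If the file loaded completely, describe should report the observation and variable counts given above.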
GSS A-B, GSS C-D, GSS E-F, GSS G-H, GSS I-J, GSS K-L, GSS M-N, GSS O-Q, GSS R-S, GSS T-Z
An alternative method of opening the GSS on computers that handle a
limited number of variables (like those in Parks Hall) is to open
subsets of the GSS, delete the unwanted variables and then merge
the data sets together. Here is an example of how to do this.
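A minimal sketch of the subset approach follows. All file names are hypothetical, and it assumes that each subset file carries the identifiers year and id and that the wanted variables (age and degree here) sit in the files shown. Check the actual variable names with describe before relying on it.

```stata
* Sketch only: file names and variable names are assumptions.
* Step 1: open the first subset and keep only the variables needed.
use "gss_a_b.dta", clear
keep year id age
save "gss_part1.dta", replace

* Step 2: do the same for the second subset.
use "gss_c_d.dta", clear
keep year id degree
save "gss_part2.dta", replace

* Step 3: merge the two trimmed files on the respondent identifiers.
use "gss_part1.dta", clear
merge 1:1 year id using "gss_part2.dta"
* _merge shows how many observations matched in both files.
tabulate _merge
```

The merge 1:1 syntax requires Stata 11 or later. Earlier versions use the older form, merge year id using "gss_part2.dta", with both files sorted by year and id beforehand.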
Freshmen 2002 Data
This data set observes all freshmen who began at WWU during the fall
quarter of 2002. The data include pre-WWU information (high
school GPA, hours transferred, SAT, gender and demographics) and each
student's WWU GPA during the fall quarter of 2002.
Teacher Data
This data set observes 2,381 Washington teachers. Data includes
location and building characteristics, teacher education, pay,
experience and measures of student learning (timath). This data
was used in my paper:
http://faculty.wwu.edu/kriegj/Econ.%20Documents/Teacher%20Quality%20Attrition%20EER%20Final.pdf
2010 Undergraduate Exit Data
This data set observes 1,707 respondents to the 2010 undergraduate exit
survey at Western Washington University. This survey is detailed
at: http://www.wwu.edu/osr/documents/UGrad2010Report_000.pdf
9th/10th Grade Data
This data observes one complete year of Washington 9th graders.
These students took the ITBS tests in 9th grade and then were
followed into the 10th grade when they took the WASL. Also
included are demographic information and questions about the student's
high school activities. 60,296 observations are included.
April 2004 CPS
The CPS is the data set used by the government to determine the
unemployment rate. The April round also inquires about education
and occupation issues. This data set includes 6,624 observations.
Angrist and Krueger Data
The Angrist and Krueger data is a compilation of the 1970 and 1980
census data used in their work that examined the impact of compulsory
school attendance laws on earnings. The paper can be found here
and a description of the data, here.
Mroz Data
This famous data set observes only women; no men. This data was
originally used to analyze why some women enter the labor force while
others do not. The first variable in the data set (inlf) is equal
to zero if the observation is not in the labor force. For obvious
reasons, using observations not in the labor force to determine the
impact of education on wages is not a great idea.
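One way to restrict the analysis to women in the labor force is sketched below. The file name "mroz.dta" is a hypothetical placeholder, and the variable names lwage, educ and exper are assumed to follow Wooldridge's version of the data; only inlf is confirmed by the description above.

```stata
* Sketch only: "mroz.dta" and the names lwage, educ, exper are assumptions.
use "mroz.dta", clear
* Count how many women are in (1) and out of (0) the labor force.
tabulate inlf
* Estimate the wage equation using labor-force participants only.
regress lwage educ exper if inlf == 1
```

Using an if qualifier rather than dropping observations leaves the full sample in memory for any later analysis of the participation decision itself.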
Wage2 Data
This data set comes from Jeffrey Wooldridge and includes 935
observations of men's wages, IQ, education, experience and a
number of other important variables.