Data

Economics 406: Topics in Microeconomics
Data Sets

A note on opening large data sets. Some of the data sets below are large. The version of Stata used in CBE labs is Stata IC, a version of Stata that allows an effectively unlimited number of observations. However, when it was installed on the CBE machines, Stata was set to use 10 MB of computer memory, which is enough for only a rather small quantity of data. One can remedy this by setting Stata to use more computer memory before opening a large data set. One does this by using the following command:

set mem 160m

The set mem command allows Stata to use the amount of memory the user specifies when opening data. In this case, one would be allowing Stata to use 160 MB of memory, which is enough to open the General Social Survey and the Angrist & Krueger data below. One can increase this number to open larger data sets, up to the physical memory installed in the computer.

To make this work, first open Stata and type "set mem 160m" in the command line. Then open the large data set (typically using the Open command under the File menu). This should enable you to open all of the files below.
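The same two steps can also be typed as commands rather than done through the menu. The lines below are a minimal sketch: the file name "bigdata.dta" is a placeholder, not one of the files on this page. Note that newer versions of Stata (12 and later) manage memory automatically, so the set mem line matters only on older installations such as the one described above.

```stata
* Start with no data in memory; Stata will not change the memory setting otherwise
clear

* Allow Stata to use 160 MB of memory
set mem 160m

* Open the data set ("bigdata.dta" is a hypothetical file name)
use "bigdata.dta", clear
```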

General Social Survey

The General Social Survey (GSS) is a biennial survey covering a wide range of demographic data as well as social topics that change with each survey administration. This survey contains 53,043 observations of 5,366 different variables. The data include information on wages, occupations, and education. Because of the large number of variables, students using the GSS must first open Stata, then issue the command "set maxvar 8000", and then open the GSS file.
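As a rough sketch, the sequence might look like the following when typed in the command line. The file name "gss.dta" is an assumption; use whatever name the downloaded file has. The set maxvar command is available only in the larger editions of Stata (SE and MP), so it will not work on a machine that is limited to a small number of variables.

```stata
* maxvar can be changed only when no data are in memory
clear

* Raise the limit on the number of variables above the 5,366 in the GSS
set maxvar 8000

* Open the GSS ("gss.dta" is a hypothetical file name)
use "gss.dta", clear

* Report the number of observations and variables that were loaded
describe, short
```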

GSS A-B, GSS C-D, GSS E-F, GSS G-H, GSS I-J, GSS K-L, GSS M-N, GSS O-Q, GSS R-S, GSS T-Z
An alternative method of opening the GSS on computers that handle a limited number of variables (like those in Parks Hall) is to open subsets of the GSS, delete the unwanted variables and then merge the data sets together. Here is an example of how to do this.
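In outline, the subset approach might look like the sketch below. Everything specific in it is an assumption made for illustration: the file names are placeholders, the variables wrkstat and educ stand in for whichever variables you want, and the merge assumes that every subset file contains the identifiers year and id so that respondents can be matched. The merge 1:1 syntax requires Stata 11 or later; older versions use a different merge syntax.

```stata
* Open the first subset and keep only the identifiers and the wanted variable
use "gss_subset1.dta", clear
keep year id wrkstat
sort year id
save "part1.dta", replace

* Do the same for the second subset
use "gss_subset2.dta", clear
keep year id educ
sort year id
save "part2.dta", replace

* Merge the two trimmed files, matching each respondent by year and id
use "part1.dta", clear
merge 1:1 year id using "part2.dta"

* Check that the observations matched, then drop the bookkeeping variable
tabulate _merge
drop _merge
save "gss_small.dta", replace
```

Repeat the middle step for each additional subset you need, merging each trimmed file into the growing data set.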

Freshmen 2002 Data

This data set observes all freshmen who began at WWU during the fall quarter of 2002. The data include pre-WWU information (high school GPA, hours transferred, SAT, gender and demographics) and their WWU GPA during the fall quarter of 2002.

Teacher Data

This data set observes 2,381 Washington teachers. Data includes location and building characteristics, teacher education, pay, experience and measures of student learning (timath). This data was used in my paper: http://faculty.wwu.edu/kriegj/Econ.%20Documents/Teacher%20Quality%20Attrition%20EER%20Final.pdf

2010 Undergraduate Exit Data

This data set observes 1,707 respondents to the 2010 undergraduate exit survey at Western Washington University. This survey is detailed at: http://www.wwu.edu/osr/documents/UGrad2010Report_000.pdf

9th/10th Grade Data

This data set observes one complete year of Washington 9th graders. These students took the ITBS tests in 9th grade and were then followed into the 10th grade, when they took the WASL. Also included are demographic information and questions about the student's high school activities. 60,296 observations are included.

April 2004 CPS

The CPS is the data set used by the government to determine the unemployment rate. The April round also inquires about education and occupation issues. This data set includes 6,624 observations.

Angrist and Krueger Data

The Angrist and Krueger data is a compilation of the 1970 and 1980 census data used in their work that examined the impact of compulsory school attendance laws on earnings. The paper can be found here and a description of the data, here.

Mroz Data

This famous data set observes only women; no men. The data were originally used to analyze why some women enter the labor force while others do not. The first variable in the data set (inlf) is equal to zero if the woman is not in the labor force. Women who are not in the labor force have no observed wage, so using those observations to determine the impact of education on wages is not a great idea.
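One way to restrict an analysis to women in the labor force is sketched below. Only inlf is named in the description above; the file name "mroz.dta" and the variables lwage, educ, exper and expersq are assumptions based on the version of this data set distributed with Wooldridge's textbook, so check the variable names in your copy first.

```stata
* Open the data ("mroz.dta" is a hypothetical file name)
use "mroz.dta", clear

* Count how many women are in (1) and out of (0) the labor force
tabulate inlf

* Estimate a wage equation using only women in the labor force
* (lwage, educ, exper and expersq are assumed variable names)
regress lwage educ exper expersq if inlf == 1
```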

Wage2 Data

This data set comes from Jeffrey Wooldridge and includes 935 observations of men's wages, IQ, education, experience and a number of other important variables.