Additional problem sets #1

anthonysuen · 2015-11-24T00:51:47Z

Thank you for your work. I would like to speak with you next week; I know it's a holiday week, so I have listed some times below that work for me.

/1/ Slave sales data set

The hypothesis test looks fine. Looking it over, I have a few thoughts. First, this is really a powerful tool, and it is good that instructions/text appear between the commands.

Second, would it be possible to go a bit further with the data set? Specifically, as I look at the text, one thought was that some visualizations of the data would come first. I am trying to follow the main course, so perhaps a few bar graphs and histograms would be good, just so we can explore the data before moving to hypothesis testing. Some examples include:

--frequency table of prices (V14), so there is a sense of what we are looking at
--bar graph of just prices (V14), once including 99999 and once excluding 99999.
--histogram with prices (V14), with bins of $500 and $250 (when the values go from $250 to $2000, something like a bell shape emerges)
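
A minimal sketch of the frequency table and the two bin widths, using only the Python standard library. The prices below are made up for illustration and are not from the data set, and treating 99999 as a placeholder code rather than a real price is an assumption; `bin_counts` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical prices standing in for column V14; NOT taken from the data set.
MISSING = 99999  # assumption: a placeholder code, not a real price
prices = [250, 400, 700, 700, 950, 1200, 1200, 1450, 1900, 99999]

def bin_counts(values, width, missing=MISSING):
    """Count values per bin of the given width, skipping the placeholder code."""
    counts = Counter((v // width) * width for v in values if v != missing)
    return dict(sorted(counts.items()))

# Frequency table of the raw values (placeholder code included).
frequency_table = dict(sorted(Counter(prices).items()))

# Histogram counts for $500 and $250 bins (placeholder code excluded).
bins_500 = bin_counts(prices, 500)
bins_250 = bin_counts(prices, 250)

# Crude text histogram, one '#' per sale in the bin.
for start, n in bins_500.items():
    print(f"${start}-${start + 499}: {'#' * n}")
```

Passing `missing=None` to `bin_counts` keeps the 99999 entries, which gives the "including" version for comparison.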

Of course, there are many other options and please let me know if this might be something that your team could help with.

A point on naming vars, and I hope you will understand why I bring this up: guaranteed works, notG seems at least awkward and perhaps inappropriate. We are examining the fate of human beings. On a procedural note, teaching students about naming variables, to my mind, is an important task, so let me know any thoughts and, again, I hope you would understand why I thought to bring up this point.

/2/ Additions data set

David Culler referenced a data set in Excel, and I'm attaching it here for convenience (Stata and SPSS versions also exist, but this seemed simplest). Attached is also a PDF which explains the var names.

The dataset has info on Relief (welfare) payments to the poor in England in 1831. The motivating question is the extent to which welfare represented a bad, or perverse, incentive; one thing to examine is the connection between Relief and Unemployment.

-- As with the Slave Sales data set, create a histogram based on the frequency distribution.

The decision on the size of the bins matters -- 1 shilling may be too fine, 4 shillings may be too coarse.

For Kent, the first county in the dataset (County==1, and there should be 24 parishes), a histogram with 2-shilling bins reveals a left-skewed distribution, and an exercise might be to do this for some of the other counties, so students can use the code.

-- Mean, variance and std. dev. are the next step: what is the central tendency, and how much dispersion exists around the mean?

Again, calculating the mean for Kent then shows how this might be done across all 311 counties.
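
As a rough sketch of that step, assuming the data has been read into (county, relief) pairs: the rows below are invented for illustration, and `summarize` is a hypothetical helper. It uses the population variance and standard deviation; `statistics.variance` and `statistics.stdev` would give the sample versions instead.

```python
import statistics

# Hypothetical (county, relief) rows; values are illustrative, NOT from the data set.
rows = [(1, 10.0), (1, 14.0), (1, 18.0), (2, 8.0), (2, 12.0)]

def summarize(rows, county):
    """Mean, population variance and population std. dev. of relief for one county."""
    values = [relief for c, relief in rows if c == county]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "variance": statistics.pvariance(values),
        "std_dev": statistics.pstdev(values),
    }

kent = summarize(rows, county=1)  # County==1 is Kent in the issue text
print(kent)
```

Changing the `county` argument repeats the same calculation for any other county, which is the exercise students could take over.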

-- Two extensions are possible:

Constructing a bar graph with the average relief payment across counties -- different counties have a different number of parishes, very different populations, etc. This is a neat summary, and it gets to the question of variation between counties, which could serve as a motivation for calculating total variation distance.

A coefficient of variation might be a slight extension, and useful since we can compare how much variation exists between, say, average weekly wages over centuries and not have to worry about accounting for inflation, etc.
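
A small sketch of both extensions, again on invented (county, relief) rows rather than the real data. The coefficient of variation here is taken as the population standard deviation divided by the mean, which is one common definition and an assumption on my part.

```python
import statistics
from collections import defaultdict

# Hypothetical (county, relief) rows; values are illustrative, NOT from the data set.
rows = [(1, 10.0), (1, 14.0), (1, 18.0), (2, 8.0), (2, 12.0)]

# Group relief payments by county.
by_county = defaultdict(list)
for county, relief in rows:
    by_county[county].append(relief)

def coefficient_of_variation(values):
    """Std. dev. relative to the mean: unit-free, so comparable across scales."""
    return statistics.pstdev(values) / statistics.mean(values)

# Heights for a bar graph of average relief, one bar per county.
county_means = {c: statistics.mean(v) for c, v in sorted(by_county.items())}
county_cv = {c: coefficient_of_variation(v) for c, v in sorted(by_county.items())}

for c in county_means:
    print(f"county {c}: mean={county_means[c]:.2f}, cv={county_cv[c]:.3f}")
```

Because the coefficient of variation divides out the units, multiplying every payment by a constant (as a change in price level would) leaves it unchanged.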

-- OLS with Unemployment and Relief

There is a slight positive relationship: as Unemployment in a parish increases, Relief payments increase.

It would be good to see residual plots as well as line fit plots, so students get a sense of the relationship, but also of the outliers. For instance, focusing on Kent (County==1), there is one parish with 0 unemployment and one with 25% unemployment, so this is a good relationship to explore via a bootstrap.

Let me know your thoughts. I can provide more data, but perhaps it makes sense to get these data sets done first. Here are some times to meet:
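
To make the idea concrete, here is a hedged sketch of a least-squares fit, its residuals, and a bootstrap of the slope. The (unemployment, relief) pairs are invented, and the percentile interval is just one simple way to summarize the bootstrap, not necessarily what the course uses; `ols` and `bootstrap_slopes` are hypothetical helper names.

```python
import random
import statistics

# Hypothetical (unemployment %, relief) pairs; NOT taken from the data set.
data = [(0.0, 9.0), (2.0, 11.0), (5.0, 12.0), (8.0, 16.0), (12.0, 17.0), (25.0, 30.0)]

def ols(pairs):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def bootstrap_slopes(pairs, reps=1000, seed=0):
    """Refit the slope on resamples of the parishes drawn with replacement."""
    rng = random.Random(seed)
    slopes = []
    while len(slopes) < reps:
        sample = [rng.choice(pairs) for _ in pairs]
        if len({x for x, _ in sample}) > 1:  # slope is undefined if all x are equal
            slopes.append(ols(sample)[0])
    return sorted(slopes)

slope, intercept = ols(data)
residuals = [y - (intercept + slope * x) for x, y in data]  # for a residual plot

slopes = bootstrap_slopes(data)
# Approximate 95% percentile interval for the slope.
interval = (slopes[int(0.025 * len(slopes))], slopes[int(0.975 * len(slopes)) - 1])
print(f"slope={slope:.3f}, interval=({interval[0]:.3f}, {interval[1]:.3f})")
```

With so few parishes in a county, a resample that happens to drop the extreme points can move the slope noticeably, which is the kind of sensitivity the bootstrap makes visible.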

Mon: after 3:10pm
Tues: after 1:10pm and before 3:30pm
Weds: any reasonable time, if you are on campus
