Additional problem sets #1

Open
anthonysuen opened this issue Nov 24, 2015 · 0 comments

Thank you for your work. I would like to speak with you next week -- I know it's a holiday week, so I have listed below some times that work for me.

/1/ Slave sales data set

The hypothesis test looks fine. Looking over it, I have a few thoughts. First, this is a really powerful tool, and it is good that instructions/text appear between the commands.

Second, let me know whether it is possible to go a bit further with the data set. Specifically, as I look at the text, one thought is that some visualizations of the data should come first. I am trying to follow the main course, so perhaps a few bar graphs and histograms would be good, just so we can explore the data before moving to hypothesis testing. Some examples include:

--frequency table of prices (V14), so there is a sense of what we are looking at
--bar graph might be just prices (V14), including 99999 and excluding 99999.
--histogram with prices (V14), with bins of $500 and $250 (when the values go from $250 to $2000, something like a bell shape emerges)

Of course, there are many other options and please let me know if this might be something that your team could help with.
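As a rough illustration of the frequency table and histograms suggested above, here is a minimal sketch using only the Python standard library. The prices are made up for illustration, and treating 99999 as a placeholder code rather than a real price is my assumption; the names `MISSING_CODE` and `bin_counts` are hypothetical.

```python
from collections import Counter

MISSING_CODE = 99999  # assumption: a placeholder code in V14, not a real price


def bin_counts(prices, width, exclude=(MISSING_CODE,)):
    """Count prices per bin of the given width; bins are keyed by lower edge."""
    counts = Counter()
    for price in prices:
        if price in exclude:
            continue
        counts[(price // width) * width] += 1
    return dict(sorted(counts.items()))


# Made-up prices for illustration only; real values would come from column V14.
prices = [250, 400, 600, 750, 800, 900, 1200, 1500, 99999, 99999]

freq_table = Counter(prices)         # frequency table, 99999 included
hist_500 = bin_counts(prices, 500)   # $500 bins, 99999 excluded
hist_250 = bin_counts(prices, 250)   # $250 bins, 99999 excluded

# Crude text histogram so the shape is visible without a plotting library.
for lower, n in hist_500.items():
    print(f"${lower:>5} to ${lower + 499:<5} {'#' * n}")
```

Passing `exclude=()` keeps the 99999 rows in, which makes it easy to show students how a single placeholder code distorts the picture.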

A point on naming variables, and I hope you will understand why I bring this up: guaranteed works, but notG seems awkward at best and perhaps inappropriate, since we are examining the fate of human beings. On a procedural note, teaching students how to name variables is, to my mind, an important task, so let me know your thoughts.

/2/ Additions data set

David Culler referenced a data set in Excel, and I'm attaching it here for convenience (Stata and SPSS versions also exist, but this seemed simplest). Also attached is a PDF that explains the variable names.

The dataset has information on Relief (welfare) payments to the poor in England in 1831. The motivating question is the extent to which welfare represented a bad, or perverse, incentive; one thing to examine is the connection between Relief and Unemployment.

-- As with the Slave Sales data set, create a histogram based on the frequency distribution.

The decision on the size of the bins matters -- 1 shilling may be too fine, 4 shillings may be too coarse.

For Kent, the first county in the dataset (County==1, and there should be 24 parishes), a histogram with 2-shilling bins reveals a left-skewed distribution, and an exercise might be to do this for some of the other counties, so students can use the code.
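To make the bin-width comparison concrete, here is a small sketch, under the assumption that the data is loaded as one record per parish with `County` and `Relief` fields (Relief in shillings). The rows are invented for illustration and `relief_histogram` is a hypothetical helper.

```python
from collections import Counter


def relief_histogram(rows, county, width):
    """Bin Relief for one county; keys are the lower edge of each bin."""
    counts = Counter()
    for row in rows:
        if row["County"] == county:
            lower = int(row["Relief"] // width) * width
            counts[lower] += 1
    return dict(sorted(counts.items()))


# Invented records for illustration only; County 1 stands for Kent.
rows = [
    {"County": 1, "Relief": 9.5},
    {"County": 1, "Relief": 14.0},
    {"County": 1, "Relief": 15.2},
    {"County": 1, "Relief": 17.9},
    {"County": 2, "Relief": 21.3},
    {"County": 2, "Relief": 30.0},
]

# Same county, three candidate bin widths (1, 2 and 4 shillings).
kent_by_width = {w: relief_histogram(rows, county=1, width=w) for w in (1, 2, 4)}

# Students can reuse the same call for another county by changing one argument.
county_2 = relief_histogram(rows, county=2, width=2)
```

Because the county and the bin width are both arguments, the exercise for other counties only requires changing the call, not the code.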

-- Mean, variance and std. dev. are the next step: what is the central tendency, and how much dispersion exists around the mean?

Again, calculating the mean for Kent then shows how this might be done across all 311 counties.
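A sketch of how the Kent calculation could be repeated for every county, again assuming one record per parish with `County` and `Relief` fields. The values are invented, `summarize_by_county` is a hypothetical helper, and I use the sample variance (dividing by n - 1); the population versions `statistics.pvariance` and `statistics.pstdev` would be the alternative.

```python
import statistics
from collections import defaultdict


def summarize_by_county(rows, column="Relief"):
    """Return n, mean, sample variance and sample std. dev. for each county."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["County"]].append(row[column])
    return {
        county: {
            "n": len(values),
            "mean": statistics.mean(values),
            "variance": statistics.variance(values),  # sample variance (n - 1)
            "std_dev": statistics.stdev(values),
        }
        for county, values in groups.items()
    }


# Invented records for illustration only; County 1 stands for Kent.
rows = [
    {"County": 1, "Relief": 10.0},
    {"County": 1, "Relief": 14.0},
    {"County": 1, "Relief": 18.0},
    {"County": 2, "Relief": 20.0},
    {"County": 2, "Relief": 30.0},
]

summary = summarize_by_county(rows)
for county, stats in summary.items():
    print(county, stats)
```

Note that the sample variance needs at least two parishes per county, so a county with a single parish would have to be handled separately.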

-- Two extensions are possible:

Constructing a bar graph with the average relief payment across counties -- different counties have a different number of parishes, very different populations, etc. This is a neat summary, and it gets to the question of variation between counties, which could serve as a motivation for calculating total variation distance.

A coefficient of variation might be a slight extension, and useful since we can compare how much variation exists between, say, average weekly wages over centuries and not have to worry about accounting for inflation, etc.
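The coefficient of variation is the standard deviation divided by the mean, so it has no units. A minimal sketch with made-up wage figures shows why this helps with comparisons across periods: if every value is rescaled by the same factor (a stylized stand-in for a change in the price level, which is my simplification), the coefficient does not change. The function name is hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (unitless)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.stdev(values) / mean


# Made-up weekly wages for illustration only.
wages = [8.0, 10.0, 12.0]
# The same wages with every value multiplied by 10.
rescaled_wages = [w * 10 for w in wages]

cv_original = coefficient_of_variation(wages)
cv_rescaled = coefficient_of_variation(rescaled_wages)
print(cv_original, cv_rescaled)
```

The standard deviations differ by a factor of 10, but the two coefficients agree, which is the property that makes the comparison across periods possible.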

-- OLS with Unemployment and Relief

There is a slight positive relationship: as Unemployment in a parish increases, Relief payments increase.

It would be good to see residual plots as well as line fit plots, so students get a sense of the relationship and also of the outliers. For instance, focusing on Kent (County==1), there is one parish with 0 unemployment and one with 25% unemployment, so this is a good relationship to explore via a bootstrap.
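One way the fit, the residuals and the bootstrap could fit together is sketched below with the standard library only. The four data points are invented, the function names are hypothetical, and resampling parishes (pairs) with a percentile interval is just one of several bootstrap variants, chosen here as an assumption.

```python
import random
import statistics


def ols_fit(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def residuals(xs, ys):
    """Observed y minus fitted y for each point."""
    slope, intercept = ols_fit(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def bootstrap_slopes(xs, ys, reps=1000, seed=0):
    """Refit the slope on resampled (x, y) pairs; returns sorted slopes."""
    rng = random.Random(seed)
    pairs = list(zip(xs, ys))
    slopes = []
    while len(slopes) < reps:
        sample = [rng.choice(pairs) for _ in pairs]
        bx, by = zip(*sample)
        if len(set(bx)) < 2:
            continue  # skip resamples where every x is identical
        slopes.append(ols_fit(bx, by)[0])
    return sorted(slopes)


# Invented parish values for illustration only.
unemployment = [0.0, 5.0, 10.0, 25.0]   # percent
relief = [10.0, 12.0, 13.0, 20.0]       # shillings

slope, intercept = ols_fit(unemployment, relief)
resid = residuals(unemployment, relief)

reps = 1000
slopes = bootstrap_slopes(unemployment, relief, reps=reps)
# Approximate 95% percentile interval for the slope.
lower, upper = slopes[int(0.025 * reps)], slopes[int(0.975 * reps) - 1]
print(slope, intercept, (lower, upper))
```

With only a handful of parishes the interval depends heavily on whether the extreme parishes are drawn, which is exactly the sensitivity to outliers that the bootstrap can make visible to students.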

Let me know your thoughts. I can provide more data, but perhaps it makes sense to get these data sets done first. Here are some times to meet:

Mon: after 3:10pm
Tues: after 1:10pm and before 3:30pm
Weds: any reasonable time, if you are on campus
