diff --git a/Assignments/A4/Assignment#4 (1).pdf b/Assignments/A4/Assignment#4 (1).pdf new file mode 100644 index 0000000..8f3cbd5 Binary files /dev/null and b/Assignments/A4/Assignment#4 (1).pdf differ diff --git a/Assignments/A4/Assignment#4(2).pdf b/Assignments/A4/Assignment#4(2).pdf new file mode 100644 index 0000000..ae42538 Binary files /dev/null and b/Assignments/A4/Assignment#4(2).pdf differ diff --git a/Assignments/A4/Assignment#4.ipynb b/Assignments/A4/Assignment#4.ipynb new file mode 100644 index 0000000..78648f5 --- /dev/null +++ b/Assignments/A4/Assignment#4.ipynb @@ -0,0 +1,101 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "(a) File attached in the folder A4. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "(b) I first used the code posted by Li in the issues to filter out the invalid numbers in my list, which left 91 valid numbers. I called all 91 of them, and 11 responded according to the Response variable criteria. The other 189 of the 200 numbers on my list did not respond, so my response rate is 5.5% (11 of 200). " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "(c) Among the 11 respondents, 8 answered the voting question. 7 of those 8 also answered the age question, while 1 answered only the voting question." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "(d) I called most of the numbers on Monday between 5:30 and 6:30pm in Atlanta (404). 3 of them did not respond at first and called me back later. I think the timing contributed to the low response rate, because that is when people are having dinner with family or are on their way home for dinner. I could have called on the weekend to get a higher response rate. 
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "(e) The median age in my sample is 46.5, while the median age of the city of Atlanta based on the Factfinder table is 33.5. These two numbers do not match mainly because of the small sample size in this study: 7 respondents cannot be representative of the whole Atlanta population. Moreover, we did not include anyone under the age of 18 in our sample, which biases the sample toward overestimating the median age. A third reason for getting a median age higher than the city's is that older, retired people tend to answer telephone surveys more often, as they have more free time than people in college or at work. " + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "46.5" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import numpy as np\n", + "Agelist = [65,62,58,23,35,27]\n", + "np.median(Agelist)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "(f) According to the election results site, 56.88% voted Republican (Trump) and 43.12% voted Democrat (Clinton). However, 42.86% of my respondents voted Republican and 57.14% voted Democrat, which is the opposite of the 2016 election result. I do not think that the order in which I read the options would influence the results much, because the 2016 election has already happened, and if people decide to answer the question, it is unlikely that they would lie about how they voted back in 2016. However, if this survey had been taken before the election, the order effect could have had more influence on the final results of the poll. 
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.0" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/Assignments/A4/PhoneSurvey.xlsx b/Assignments/A4/PhoneSurvey.xlsx deleted file mode 100644 index 25d0ffb..0000000 Binary files a/Assignments/A4/PhoneSurvey.xlsx and /dev/null differ diff --git a/Assignments/A4/ValidateNum.ipynb b/Assignments/A4/ValidateNum.ipynb new file mode 100644 index 0000000..c2c67cd --- /dev/null +++ b/Assignments/A4/ValidateNum.ipynb @@ -0,0 +1,553 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 42, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
   ID    Phone  Response  Party  Age  Code          PN
0   1  4310454       NaN    NaN  NaN   404  4044310454
1   2  6920338       NaN    NaN  NaN   404  4046920338
2   3  6918070       NaN    NaN  NaN   404  4046918070
3   4  3605305       NaN    NaN  NaN   404  4043605305
4   5  8892229       NaN    NaN  NaN   404  4048892229
\n", + "
" + ], + "text/plain": [ + " ID Phone Response Party Age Code PN\n", + "0 1 4310454 NaN NaN NaN 404 4044310454\n", + "1 2 6920338 NaN NaN NaN 404 4046920338\n", + "2 3 6918070 NaN NaN NaN 404 4046918070\n", + "3 4 3605305 NaN NaN NaN 404 4043605305\n", + "4 5 8892229 NaN NaN NaN 404 4048892229" + ] + }, + "execution_count": 42, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pandas as pd\n", + "#Delete first two rows in excel\n", + "#Read in file\n", + "Num = pd.read_excel('PhoneSurvey.xlsx', names=['ID','Phone','Response','Party','Age'])\n", + "#Code 267 for Philly!\n", + "Num[\"Code\"]=404\n", + "Num[\"PN\"]=Num[\"Code\"].astype(str)+Num[\"Phone\"].astype(str)\n", + "Num.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": {}, + "outputs": [], + "source": [ + "import urllib.request, json \n", + "list=[]\n", + "#Replace with access key\n", + "key = \"5dcf6262c47255e06ebf0b0288af8683\"\n", + "def validtest(number):\n", + " link=\"http://apilayer.net/api/validate?access_key=\"+key+\"&number=\"+number+\"&country_code=US&format=1\"\n", + " with urllib.request.urlopen(link) as url:\n", + " data = json.loads(url.read().decode())\n", + " #Specify conditions for valid numbers\n", + " if data[\"valid\"]==True and data['carrier'] !='' and data['location'] !='' and data['line_type'] !='':\n", + " list.append(1)\n", + " else:\n", + " list.append(0)\n", + " \n", + " return data" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "metadata": {}, + "outputs": [], + "source": [ + "for num in Num[\"PN\"]:\n", + " validtest(num)" + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "metadata": {}, + "outputs": [], + "source": [ + "res= pd.Series(list)\n", + "Num[\"Label\"] = res.values" + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "91" + ] + }, + "execution_count": 51, + "metadata": {}, + "output_type": 
"execute_result" + } + ], + "source": [ + "#Counts of valid numbers\n", + "sum(list)" + ] + }, + { + "cell_type": "code", + "execution_count": 52, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
   ID    Phone  Response  Party  Age  Code          PN  Label
0   1  4310454       NaN    NaN  NaN   404  4044310454      1
1   2  6920338       NaN    NaN  NaN   404  4046920338      0
2   3  6918070       NaN    NaN  NaN   404  4046918070      0
3   4  3605305       NaN    NaN  NaN   404  4043605305      1
4   5  8892229       NaN    NaN  NaN   404  4048892229      1
\n", + "
" + ], + "text/plain": [ + " ID Phone Response Party Age Code PN Label\n", + "0 1 4310454 NaN NaN NaN 404 4044310454 1\n", + "1 2 6920338 NaN NaN NaN 404 4046920338 0\n", + "2 3 6918070 NaN NaN NaN 404 4046918070 0\n", + "3 4 3605305 NaN NaN NaN 404 4043605305 1\n", + "4 5 8892229 NaN NaN NaN 404 4048892229 1" + ] + }, + "execution_count": 52, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "Num.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 53, + "metadata": {}, + "outputs": [], + "source": [ + "Num.to_excel('data.xlsx')" + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[1,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 1,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 1,\n", + " 1,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 0,\n", + " 0,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 0,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 0,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 1,\n", + " 1,\n", + " 1,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 1,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 1,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 0,\n", + " 0,\n", + " 
0,\n", + " 0,\n", + " 0,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 1,\n", + " 1,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 1,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 1,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 0,\n", + " 0,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 1,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 0,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 0,\n", + " 1,\n", + " 1,\n", + " 1,\n", + " 1,\n", + " 0,\n", + " 0]" + ] + }, + "execution_count": 48, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "list" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.0" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/README.md b/README.md index 8c7db56..47accd7 100644 --- a/README.md +++ b/README.md @@ -70,6 +70,7 @@ Late problem sets will be penalized 1 points for every hour they are late. For e | | | | KW2009, A2017 | | | | | | EKLS2015 | | <<<<<<< HEAD +<<<<<<< HEAD | Oct. 
17 | W | Simulated data | [Slides](https://github.com/UC-MACSS/persp-analysis_A18/blob/master/Slides/PerspAnalysis_SimData.pdf) | [A3](https://github.com/UC-MACSS/persp-analysis_A18/tree/master/Assignments/A3/Assign3.pdf) | | Oct. 22 | M | Simulated data | M2002 | | | Oct. 24 | W | Asking questions | S2018, Ch. 3, [Slides](https://github.com/UC-MACSS/persp-analysis_A18/blob/master/Slides/PerspAnalysis_Surv.pdf) | [A4](https://github.com/UC-MACSS/persp-analysis_A18/tree/master/Assignments/A4/Assign4.pdf) | @@ -90,10 +91,19 @@ Late problem sets will be penalized 1 points for every hour they are late. For e | Dec. 3 | M | CSS: Psychology | | | | Dec. 5 | W | CSS: Economics | | | ======= +======= +>>>>>>> submitting assignment #4 | Oct. 17 | W | Simulated data | Slides | [A3](https://github.com/UC-MACSS/persp-analysis_A18/tree/master/Assignments/A3/Assign3.pdf) | | Oct. 22 | M | Simulated data | M2002 | | | Oct. 24 | W | Asking questions | S2018, Ch. 3 | A4 | | Oct. 29 | M | Asking questions | | | +======= +| Oct. 17 | W | Simulated data | [Slides](https://github.com/UC-MACSS/persp-analysis_A18/blob/master/Slides/PerspAnalysis_SimData.pdf) | [A3](https://github.com/UC-MACSS/persp-analysis_A18/tree/master/Assignments/A3/Assign3.pdf) | +| Oct. 22 | M | Simulated data | M2002 | | +| Oct. 24 | W | Asking questions | S2018, Ch. 3, [Slides](https://github.com/UC-MACSS/persp-analysis_A18/blob/master/Slides/PerspAnalysis_Surv.pdf) | [A4](https://github.com/UC-MACSS/persp-analysis_A18/tree/master/Assignments/A4/Assign4.pdf) | +| Oct. 29 | M | Asking questions | [CE2015](https://github.com/UC-MACSS/persp-analysis_A18/blob/master/Papers/CanannEvans2015.pdf), WRGG2015 | | +| | | | S2014, S2016 | | +>>>>>>> upstream/master | Oct. 31 | W | Experiments | S2018, Ch. 4 | A5 | | Nov. 5 | M | Experiments | | | | Nov. 7 | W | Collaboration | S2018, Ch. 5 | A6 | @@ -113,10 +123,13 @@ Late problem sets will be penalized 1 points for every hour they are late. 
For e * [A2017] Abrahao, Bruno, Paolo Parigi, Alok Gupta, and Karen S. Cook, "Reputation offsets trust judgments based on social biases among Airbnb users," *PNAS*, 114:37 (September 12, 2017), pp. 9849-9853. * [AR2014] Alcott, Hunt and Todd Rogers, "The Short-run and Long-run Effects of Behavioral Interventions: Experimental Evidence from Energy Conservation," *American Economic Review*, 104:10 (Oct. 2014), pp. 3,003-3,037. * [A1990] Angrist, Joshua D., "Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records," *American Economic Review*, 80:3 (1990), pp. 313-336. +<<<<<<< HEAD * [AH2012] Ansolabehere, Stephen and Eitan Hersh, "Validation: What Big Data Reveal about Survey Misreporting and the Real Electorate," *Political Analysis*, 20:3, (2012), pp. 437-459. * [B2009] Beazley, David M., *Python Essential Reference*, 4th edition, Addison-Wesley (2009). * [BKV2010] Bell, Robert M., Yehuda Koren, and Chris Volinsky, "All Together Now: A Perspective on the Netflix Prize," *Chance*, 23:1 (2010), pp. 24-29. * [B2014] Blumenstock, Joshua (2014), "[Calling for Better Measuremenet: Estimating an Individual's Wealth and Well-Being from Mobile Phone Transaction Records](http://escholarship.org/uc/item8zs63942)," Presented at KDD--Data Science for Social Good 2014, New York. +======= +>>>>>>> submitting assignment #4 * [CE2015] Canann, Taylor J. and Richard W. Evans, "Determinants of Short-term Lender Location and Interest Rates," *Journal of Financial Services Research,* 48:3, (Dec. 2015) pp. 235-262. [[link to paper](https://github.com/UC-MACSS/persp-analysis_A18/blob/master/Papers/CanannEvans2015.pdf)] * [CS2014] Chacon, Scott and Ben Straub, *Pro Git: Everything You Need to Know about Git*, 2nd Edition, Apress, 2014. [Free online version](https://git-scm.com/book/en/v2) * [CK2013] Costa, Dora L. and Matthew E. 
Kahn, "Energy Conservation Nudges and Environmentalist Ideology: Evidence from a Randomized Residential Electricity Field Experiment," *Journal of the European Economic Association*, 11:3 (2013), pp. 680-702. @@ -133,11 +146,17 @@ Late problem sets will be penalized 1 points for every hour they are late. For e * [M2018] McKinney, Wes, *Python for Data Analysis*, 2nd edition, O'Reilly Media, Inc. (2018). * [M2002] Moretti, Sabrina, "Computer Simulation in Sociology: What Contribution?" *Social Science Computer Review*, 20:1 (Spring 2002), pp. 43-57. * [RW2000] Rosenzweig, Mark R. and Kennith I. Wolpin, "Natural 'Natural Experiments' in Economics," *Journal of Economic Literature*, 38:4 (Dec. 2000), pp. 827-874. +<<<<<<< HEAD * [SNCGG2007] Schultz, P. Wesley, Jessica M. Nolan, Robert B. Cialdini, Noah J. Goldstein, and Vladas Griskevicius, "The Constructive, Destructive, and Reconstructive Power of Social Norms," *Psychological Science*, 18:5 (2007), pp. 429-434. * [S2014] Sugie, Naomi F., "Finding Work: A Smartphone Study of Job Searching, Social Contacts, and Wellbeing After Prison,"" PhD Thesis, Princeton University (2014). [[link here](http://dataspace.princeton.edu/jspui/handle/88435/dsp011544br32k)] * [S2016] Sugie, Naomi F., "Utilizing Smartphones to Study Disadvantaged and hard-to-Reach Groups," *Sociological Methods & Research*, January (2016). * [WRGG2015] Wang, Wei, David Rothschild, Sharad Goel, and Andrew Gelman, "Forecasting Elections with Non-Representative Polls," *International Journal of Forecasting*, 31:3 (2015) pp. 980-991. * [W2014] Watts, Duncan J., "Common Sense and Sociological Explanations," *American Journal of Sociology*, 120:2 (Sep. 2014), pp. 313-351. +======= +* [S2014] Sugie, Naomi F., "Finding Work: A Smartphone Study of Job Searching, Social Contacts, and Wellbeing After Prison,"" PhD Thesis, Princeton University (2014). 
[link here](http://dataspace.princeton.edu/jspui/handle/88435/dsp011544br32k) +* [S2016] Sugie, Naomi F., "Utilizing Smartphones to Study Disadvantaged and hard-to-Reach Groups," *Sociological Methods & Research*, January (2016). +* [WRGG2015] Wang, Wei, David Rothschild, Sharad Goel, and Andrew Gelman, "Forecasting Elections with Non-Representative Polls," *International Journal of Forecasting*, 31:3 (2015) pp. 980-991. +>>>>>>> submitting assignment #4 * [WWE2018] Wu, Lingfei, Dashun Wang, and James A. Evans, "Large Teams Have Developed Science and Technology; Small Teams Have Disrupted It," working paper, 2018. [[link here](https://arxiv.org/pdf/1709.02445.pdf)] diff --git a/Slides/PerspAnalysis_Surv.pdf b/Slides/PerspAnalysis_Surv.pdf deleted file mode 100644 index a6934b0..0000000 Binary files a/Slides/PerspAnalysis_Surv.pdf and /dev/null differ
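
The percentages reported in the markdown cells of `Assignment#4.ipynb` can be cross-checked with a few lines of Python. This is only a sketch under stated assumptions: the total of 200 numbers is inferred from 11 responders plus 189 non-responders, and the 3-to-4 vote split is a hypothetical reading of "42.86%" of 7 respondents. The `percent` helper is illustrative and does not appear in the notebook.

```python
from statistics import median


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Counts taken from the notebook's markdown cells (not re-derived from data).
respondents = 11
non_respondents = 189
total_numbers = respondents + non_respondents  # assumed size of the call list

# Ages copied from the notebook's code cell.
ages = [65, 62, 58, 23, 35, 27]

# Hypothetical vote split: 3 of 7 is the only split of 7 that gives 42.86%.
republican, democrat = 3, 4

print("response rate (%):", percent(respondents, total_numbers))
print("Republican share (%):", percent(republican, republican + democrat))
print("Democrat share (%):", percent(democrat, republican + democrat))
print("median age:", median(ages))
```

Computing both vote shares from the same counts makes them sum to 100%, which is a quick consistency check on hand-typed percentages.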