Suggestions for Data sets/Variables to Add #1

christophergandrud · 2014-01-31T11:53:34Z

christophergandrud · 2014-01-31T11:54:32Z

Note: sign in required indicates that the data source requires some sort of log in to access the data. This increases the difficulty of creating a function to download the data.

jknowles · 2014-02-24T14:48:44Z

Some big datasets from the American politics field:

DW-Nominate scores for legislative ideology in the US Senate and House
- Can create a Suggests dependency on wnominate
Open State Legislature Data already in rsunlight
Measuring American Legislatures Ideology

steffenzi · 2014-02-24T16:03:09Z

Parliament and government composition database

ulfelder · 2014-02-25T09:57:26Z

Freedom in the World (http://www.freedomhouse.org/report-types/freedom-world#.UwxoYvk7uM4)
Major Episodes of Political Violence (http://www.systemicpeace.org/inscr/inscr.htm)
Conflict Barometer (http://www.hiik.de/en/konfliktbarometer/)
Coups d'Etat (http://www.systemicpeace.org/inscr/inscr.htm)
Coup d'Etat Dataset (http://www.jonathanmpowell.com/coup-detat-dataset.html)
Political Instability Task Force Problem Set (http://www.systemicpeace.org/inscr/inscr.htm)

I have scripts that process all of these except HIIK's Conflict Barometer. If those might be helpful, please send me an email at ulfelder gmail and I will pass them along.

christophergandrud · 2014-02-25T10:13:46Z

@ulfelder If you already have scripts for processing these data sets, that would be really helpful. Would it be possible for you to fork the psData dev branch and just place the scripts in a new folder called misc? We can build from there. This way you'll be logged as a contributor and can more easily take credit.

I can email you separately, if you'd prefer that.

ulfelder · 2014-02-25T10:15:39Z

Will do. I've only used GitHub in rudimentary ways so far, so please
forgive me if it takes me a bit to figure out how to get that done.


Jay Ulfelder, Ph.D.
Twitter: @jay_ulfelder http://twitter.com/#!/jay_ulfelder
Long-form blog: Dart-Throwing Chimp http://dartthrowingchimp.wordpress.com/
Short-form blog: Tumbling Chimp http://dartthrowingchimp.tumblr.com/

christophergandrud · 2014-02-25T10:16:46Z

No worries, just let me know if you have any questions.

briatte · 2014-02-28T01:27:59Z

I've built draft methods for the Quality of Government, World Development Indicators, Gleditsch and Ward, and Powell and Thyne data. Let me know if I should merge them into your work.

My design uses a data frame attribute to store the same information that Stata uses to set up panel data for the xt commands, plus more (e.g. the format of the country variable). I then pass these settings to functions that use them to manipulate CSTS data, e.g. merging, lagging, etc. I can also try translating these functions.

Last, there have been recent updates on SDMX in R, so I can also work on trying to get Eurostat and/or OECD data in there.
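
A minimal sketch of this kind of design, written in Python for illustration and not taken from the draft methods themselves. The `Panel` class, its field names, and the `lag` helper are all hypothetical; the only idea borrowed from the comment above is that the panel settings travel with the data and are read by the functions that manipulate it.

```python
from dataclasses import dataclass, replace


@dataclass
class Panel:
    """Rows plus the panel settings that travel with them (hypothetical names)."""
    rows: list                  # list of dicts, one per country-year observation
    unit: str = "iso3c"         # name of the country (unit) variable
    time: str = "year"          # name of the time variable
    unit_format: str = "iso3c"  # coding scheme used by the country variable


def lag(panel, var, k=1):
    """Return a new Panel with `var` lagged by k periods within each unit."""
    # Index values by (unit, time) so the lag never crosses unit boundaries
    # and gaps in a series yield None instead of a wrong value.
    lookup = {(r[panel.unit], r[panel.time]): r.get(var) for r in panel.rows}
    new_rows = [
        {**r, f"{var}_lag{k}": lookup.get((r[panel.unit], r[panel.time] - k))}
        for r in panel.rows
    ]
    # Keep the stored settings attached to the result.
    return replace(panel, rows=new_rows)
```

Because `lag` reads the unit and time variable names from the stored settings, the caller does not have to repeat them on every call, which is roughly what declaring the panel structure once achieves in Stata.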

briatte · 2014-02-28T01:29:24Z

Here's a list of data sources that might have a few more candidates for the data request lists.

christophergandrud · 2014-02-28T08:25:52Z

@briatte This is a great suggestion. I have a number of thoughts on what we might do that I've put in a new issue #3. I'd really like to know what you think of them.

antagomir · 2014-02-28T11:29:09Z

It seems to me that the data sources described above would make several useful packages.

Food for thought: it has often turned out to be more useful to create multiple smaller, compact packages than a single package that contains everything: (i) handling dependencies is considerably easier with smaller packages, (ii) tutorials remain more compact and readable, (iii) packages tend to remain more stable, and (iv) responsibilities can be more clearly allocated between developers. It is good to start with a single package, but also good to consider splitting it into more compact pieces at an early stage. In our experience, splitting does not add much maintenance overhead; rather the contrary.

This is our experience after working with Finnish open government data packages since 2009. Also the rOpenSci folks seem to prefer minimal, compact packages, mostly one package per one data source (API). I believe they reached the same conclusion.

christophergandrud · 2014-02-28T13:45:34Z

@antagomir I think that is a really good idea. I see at least two main characteristics that could divide these data sets into multiple packages that would make sense from a user/package perspective:

Country-year data / other data
Data downloaded from website URLs / data accessed through APIs

The current work in psData and QoG is country-year data downloaded from URLs, not APIs. Users are likely to want to merge the different data sets into one data frame. So it makes sense to have a package that would gather and clean them in a consistent way such that they could be merged together easily.

Conversely, for example, users probably don't want to merge survey data with country-year data. It makes less sense to include this data in one package.

Maybe the package should be renamed something like psCountryData?

antagomir · 2014-02-28T13:51:25Z

I agree, it is difficult to draw the line. One option to consider is to have separate packages for distinct data sources, and then have the merging functions either in a generalist package that depends on these individual data crawling packages, or in one of the data packages. This way one could still isolate some parts into their own packages and keep most of the advantages of such a split.

christophergandrud · 2014-02-28T13:56:28Z

Yeah, I just posted a similar thought over on #3.

The focus would be on creating a common core syntax/capabilities that could be applied across packages that gather country-year data.

Each individual data set-package would use a similar syntax to return data frames from multiple sources that could easily be merged together.
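
To make the idea concrete, here is a rough sketch in Python (chosen only for illustration; it is not psData code). It assumes every source-specific package returns rows keyed by the same two columns, here called `iso2c` and `year`. The function name, the column names, and the toy values are all hypothetical placeholders.

```python
def merge_country_year(*tables, keys=("iso2c", "year")):
    """Full outer join of country-year tables on their shared key columns."""
    merged = {}
    for table in tables:
        for row in table:
            key = tuple(row[k] for k in keys)
            # Start a new country-year record if needed, then add this
            # table's variables to it.
            merged.setdefault(key, dict(zip(keys, key))).update(row)
    # Return rows in a stable country-year order.
    return [merged[key] for key in sorted(merged)]


# Toy stand-ins for the output of two source-specific packages
# (made-up values, not real data).
source_a = [
    {"iso2c": "DE", "year": 2000, "var_a": 1},
    {"iso2c": "FR", "year": 2000, "var_a": 2},
]
source_b = [
    {"iso2c": "FR", "year": 2000, "var_b": 9},
    {"iso2c": "FR", "year": 2001, "var_b": 8},
]

combined = merge_country_year(source_a, source_b)
```

The merge step only works this simply if every package agrees on the key columns and on the country coding scheme, which is the argument for settling the common core syntax first.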

Does this make sense?

antagomir · 2014-02-28T14:02:48Z

Absolutely.

leeper · 2014-02-28T14:25:16Z

I agree. I think that is exactly the right framework.

Thomas J. Leeper
http://www.thomasleeper.com


christophergandrud · 2014-02-28T14:40:06Z

Great, I'm directing this conversation over to #5.

muuankarski mentioned this issue Feb 28, 2014

Quality of Governance Indicators #3

Closed

christophergandrud mentioned this issue Feb 28, 2014

Overhall of psData #5

Closed

christophergandrud closed this as completed Feb 28, 2014

christophergandrud commented Jan 31, 2014

Feel free to add data sets and variables that would be useful to include in psData. Code contributions are also always very helpful.