Overhaul of psData #5
Comments
Yes, open a new repo and then let's brainstorm there. Thomas J. Leeper On Fri, Feb 28, 2014 at 3:39 PM, Christopher Gandrud <
|
We may also wish to develop general guidelines for parliamentary data and some other data types that are seen in many countries. Perhaps we should start a general guidelines section or repository under rOpenGov? In the longer run we could accumulate there multiple recommendation documents for different general data types. If the guidelines are general rather than tied to an individual package, perhaps people will be more likely to implement them also in their own work? |
I like that idea a lot. Maybe what we need are three repos:
? |
There might be two projects here:
|
Yes, each R package needs its own repository (psCountryData R package). I would avoid creating several new repositories for tutorials/guidelines; in the long term this will obscure the structure and scope of the rOpenGov account: the rOpenGov organization is primarily aimed at R packages. This helps to keep all automated procedures clear. We can have a few repositories for supporting material, such as the website infrastructure, guidelines etc., but we should try to keep these to a minimum. But this is a community project, so I am ready to reconsider my views if there are other good suggestions. Ping @ouzor @jlehtoma @muuankarski I would prefer either of these two options:
|
@briatte any plans to extend your work on French parliamentary data into an R package? Seems you are already getting close to that with all functions and examples. |
More on 1. A possible way to implement the unified method would be to think of CSTS data like Since the country identifier is really a column name, storing the panel type only requires one character string. The panel type format (country, region, firm, etc.) might be another useful piece of information to store, in order to auto-convert country codes before merging datasets (I think ISO-3N is the best baseline here). The same line of thinking applies to the time variable. The metadata you want for the unified method is therefore something like
where If you want to split that in two components, then your method is where psdata thus sounds like a fine name for the project, because it means either "political science" or "panel/series" data, which is what the package really offers to get: data, plus a method to manipulate country-year, or election-year, or region-decade, or firm-quarter data. It's all possible behind an end result that works mostly with CSTS data. I'm not sure I'll have enough functions left to have a separate |
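The metadata idea in the comment above (a panel column name, a panel type, a time column name, and a time format) can be sketched in base R. This is only an illustration under stated assumptions: the function name `as_psdata`, the attribute name `"psdata"`, and the field names are hypothetical and are not taken from psData or any other package mentioned in this thread.

```r
# Hypothetical constructor: records which columns identify the panel unit
# and the time period, plus their formats, as an attribute of the data frame.
as_psdata <- function(df, panel, time,
                      panel_type = "country", time_format = "year") {
  stopifnot(is.data.frame(df), panel %in% names(df), time %in% names(df))
  attr(df, "psdata") <- list(
    panel       = panel,        # name of the unit identifier column
    panel_type  = panel_type,   # e.g. "country", "region", "firm"
    time        = time,         # name of the time column
    time_format = time_format   # e.g. "year", "quarter", "decade"
  )
  df
}

# Toy country-year data; ISO-3N codes 250 (France) and 276 (Germany),
# values are made up for illustration.
toy <- data.frame(
  iso3n = c(250L, 250L, 276L, 276L),
  year  = c(2000L, 2001L, 2000L, 2001L),
  value = c(1, 2, 3, 4)
)
toy  <- as_psdata(toy, panel = "iso3n", time = "year")
meta <- attr(toy, "psdata")
```

Because the identifiers are stored as column names, the metadata costs only a few character strings, and a merge method could read `meta$panel` and `meta$time` to decide which keys to join on.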
@antagomir More on 2 (off topic). I'm currently updating the repo to have a single method to process cosponsorships in amendments, bills and resolutions, for both the French National Assembly and the Senate. The data (scraper) functions are idiosyncratic, but the cosponsorship network functions might or might not make up package material. The only other legislature I know with similar data is the U.S. Congress (the U.S. experts are Fowler, Waugh, Kirkland). Do you know of any other legislature with similar data? A package would become interesting if it could deal with data from several countries. |
@antagomir I like the idea of a generic rOpenGov framework repository where the psData framework material is a subdirectory. This will highlight the collaborative nature of the project and centralise the different frameworks in one place so that they are easy to follow and cross-observe, while not clogging up rOpenGov with lots of non-R repos. |
@christophergandrud Ok, sounds good. I will cross-check with other core team members and hopefully we can launch the repository shortly. @briatte I'm not aware of any at the moment, but this is certainly interesting and I'll keep it in mind. |
@briatte I like the ideas you propose in 1. I want to think about this some more, but these are some initial thoughts:
Though if we do create a psData class it would definitely be good to come up with a merging method. Otherwise, I think this is a really nice direction to head. @antagomir Great. |
Oh, @briatte I really like the idea of psData as both 'political science data' and 'panel series data' and the general framework you sketched out that makes it useful for different types. |
As I understand it, the code would look like
Re: conventions, I don't have strong preferences for repo organization. I tend to use only lowercase, underscores and periods in names. |
is there any interest in adding the ability to deal with k-adic data in here? i was sort of working on a package for dealing with this sort of data, but it would probably be better to integrate this with what you all are doing. |
Tools for k-adic data, what purpose? Open government data or more generic? I think this should go to a separate issue. |
Country data mostly. |
Any examples, might help to connect? |
So lots of people in political science and econ look at things that occur between pairs of states (or triads, etc.) and how features of those states condition what the outcome is and such. Trade, conflict, etc. The data manipulations are really similar to what you have to do with monadic country year data. Constructing the data sets is another issue that is a pain point for many researchers. |
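To make the link between monadic and dyadic data concrete, here is a minimal base R sketch that builds directed dyad-years from country-year data by joining the data to itself on the time variable. The helper name `make_dyads` and the toy values are assumptions made for illustration; this is not code from the k-adic package mentioned above.

```r
# Toy monadic country-year data (values are made up for illustration).
monadic <- data.frame(
  country = c("FRA", "DEU", "USA", "FRA", "DEU", "USA"),
  year    = c(2000, 2000, 2000, 2001, 2001, 2001),
  x       = c(1, 2, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)

# Hypothetical helper: self-join on the time column, then drop self-pairs.
# Columns from the first copy get the suffix "_a", from the second "_b".
make_dyads <- function(df, unit = "country", time = "year") {
  dyads <- merge(df, df, by = time, suffixes = c("_a", "_b"))
  unit_a <- paste0(unit, "_a")
  unit_b <- paste0(unit, "_b")
  dyads <- dyads[dyads[[unit_a]] != dyads[[unit_b]], ]
  rownames(dyads) <- NULL
  dyads
}

dyads <- make_dyads(monadic)
```

With three countries and two years this yields 3 x 2 directed pairs per year, so 12 dyad-year rows, each carrying the features of both states (`x_a` and `x_b`). Undirected dyads could be obtained by keeping only the rows where `country_a < country_b`.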
Merging multilevel data is a related issue, but at that stage, the solution must be some clever |
Might be good to make dyadic things into their own package but try to coordinate with the other packages? Very welcome to join rOpenGov with the dyadic package as well. |
+1 for creating a separate "framework" repo to host documents and guidelines. As @antagomir pointed out, for individual packages/frameworks a GitHub wiki (which is a git repo in itself, of course) is a viable option as well. |
Related to @briatte 's comment and still off topic, I recently started finpar package for various data on the Finnish Parliament using an unofficial API. The API and the underlying database + website are still being developed, but data is already available for individual MP activities (thus potentially also "cosponsorships in amendments, bills and resolutions"). Any co-development in dealing with parliamentary data would therefore be interesting. |
Going way back up. I agree that the ability to create dyads, lags, leads, and this sort of thing would be really useful. I also +1 @antagomir's suggestion that they should be separated into their own package. This package would require psData objects, so they would be closely linked. On that note: I think a good goal of psData could be to get data from multiple sources (by linking into specific packages), similarly build variables suggested in the literature (e.g. the winset variable in psData now), and merge it all into one panel-series object. Maybe an example work flow could go something like:
Then Re syntax style: I don't really have a preference either. In the current version of psData I've tended to use CamelCase. But we could definitely do all lowercase with A new issue for the country-year data we should eventually discuss: how to deal with the divided countries, e.g. East/West Germany. |
@christophergandrud Your latest suggestion sounds feasible to me. We need something along these lines. The best way to move forward is to just get something functional implemented. Then we can learn by experience. I have now created a new repository for document guidelines: https://github.com/rOpenGov/guidelines-docs (is guidelines-docs a good name? We can still change it if you have better suggestions.) Also let me know if you are missing permissions and I will add them. Ping @christophergandrud @leeper @jlehtoma @muuankarski @briatte |
@antagomir Perfect. Later today I'll create a subdirectory in https://github.com/rOpenGov/guidelines-docs for psData and put in an .md file to begin working on the first draft of the guidelines. I'm going to close this thread and direct the conversation over to that repo. It's been great working on these issues here. |
Very early/incomplete first version of the guidelines https://github.com/rOpenGov/guidelines-docs/tree/master/psData-guidelines. Please help edit, improve. |
@jlehtoma just to add: finpar looks good! Let me know if you want me to try making my cosponsorship network functions work with your data as well. I'm almost done coding a unified method for bills, resolutions and amendments from both the upper and lower chambers in France, and plan to test it with the U.S. data (initially produced in Matlab) too. If you plan to develop legislative data as packages, I think @alexstorer has some work on the U.S. Congress, in Python, and Dimiter Toshkov has some data for the European Parliament. It would be great to offer a comparative framework for all of this, but it's a different project that requires a different forum. @christophergandrud the guidelines look great so far. I'll add stuff inspired by my draft code. And I like the possibility of sticking mostly with lowercase; I confess to not liking CamelCase very much ;) |
Following discussions in #1 and #3, there seems to be a consensus emerging that we should focus on:
As such, I think what we might want to do is
(a) Create a new text document repo that would be used to collaborate on a common psCountryData framework. (Probably starting with an .md document that would develop a framework checklist for individual country-year data packages to follow. Also, because my main professional incentive right now is to publish papers, this could be developed into a JSS-style article laying out the framework and giving examples from packages that implement it. Any interested person could of course co-author.)
(b) Create a new package called psCountryData that would contain core functions shared by the individual data Getter/Variable Builder packages. For example, psData currently includes a CountryID function. This is a modified version of countrycode that is handy for creating merge-ready country identifiers. It looks like there is some good stuff in @briatte's QoG package that could go in there too.
(c) Break up psData into individual Getter and Variable Builder packages that implement this framework. Similarly, the two QoG packages could (depending on the authors' preferences) implement the unified syntax.
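As a rough illustration of what "merge-ready country identifiers" in (b) could mean, the base R sketch below standardizes two toy datasets to ISO numeric codes before merging them. It does not reproduce psData's CountryID function or the countrycode package: the helper `add_iso3n`, the three-row lookup table, and all data values are assumptions made up for this example.

```r
# Tiny hand-written lookup table; a real package would rely on a complete
# table (for instance the one shipped with the countrycode package).
lookup <- data.frame(
  name  = c("France", "Germany", "United States"),
  iso3c = c("FRA", "DEU", "USA"),
  iso3n = c(250L, 276L, 840L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: adds an iso3n column, translating from either
# country names or ISO3C codes found in column `var`.
add_iso3n <- function(df, var, from = c("name", "iso3c")) {
  from <- match.arg(from)
  df$iso3n <- lookup$iso3n[match(df[[var]], lookup[[from]])]
  df
}

# Two toy sources that identify countries differently (made-up values).
a <- data.frame(country = c("France", "Germany"), year = 2000,
                x = c(1, 2), stringsAsFactors = FALSE)
b <- data.frame(code = c("DEU", "FRA"), year = 2000,
                z = c(10, 20), stringsAsFactors = FALSE)

# Once both carry the same identifier, a country-year merge is direct.
merged <- merge(add_iso3n(a, "country", "name"),
                add_iso3n(b, "code", "iso3c"),
                by = c("iso3n", "year"))
```

Countries missing from the lookup get `NA` in `iso3n`, so a shared core function would probably want to warn about unmatched identifiers before merging.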
Any thoughts?