panel_lag and shift #9

Closed
1 task
christophergandrud opened this issue Mar 5, 2014 · 9 comments

@christophergandrud
Contributor

  • Add improvements to panel lag and shift based on recent updates in DataCombine. These are mostly error-handling improvements, but they also add support for several different moving averages and 'spread' for dummy variables.
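
For readers unfamiliar with the operation under discussion, here is a minimal, language-neutral sketch of a panel lag in Python. It is not the DataCombine or psData implementation; the function name `panel_lag`, its parameters, and the choice to shift by row position within each unit (after sorting by time) are all assumptions made for illustration. It also shows the kind of error handling mentioned above, by rejecting duplicate unit-time observations.

```python
from collections import defaultdict

def panel_lag(rows, unit, time, var, periods=1):
    """Return `var` shifted by `periods` rows within each unit, aligned to `rows`.

    Hypothetical helper: positive `periods` lags (uses earlier
    observations), negative `periods` leads.
    """
    # Collect the row positions that belong to each unit.
    by_unit = defaultdict(list)
    for pos, row in enumerate(rows):
        by_unit[row[unit]].append(pos)

    shifted = [None] * len(rows)
    for positions in by_unit.values():
        # Order each unit's observations by time before shifting.
        positions.sort(key=lambda p: rows[p][time])
        times = [rows[p][time] for p in positions]
        if len(set(times)) != len(times):
            raise ValueError("duplicate unit-time observations")
        for i, pos in enumerate(positions):
            src = i - periods
            # Observations with no source row inside the unit stay None.
            if 0 <= src < len(positions):
                shifted[pos] = rows[positions[src]][var]
    return shifted

# Illustrative data; rows are deliberately not sorted by year.
panel = [
    {"country": "A", "year": 2001, "gdp": 11},
    {"country": "A", "year": 2000, "gdp": 10},
    {"country": "B", "year": 2000, "gdp": 20},
    {"country": "B", "year": 2001, "gdp": 21},
]
lagged = panel_lag(panel, "country", "year", "gdp")
led = panel_lag(panel, "country", "year", "gdp", periods=-1)
```

The key point is that the shift never crosses unit boundaries: the first observation of each country gets a missing value rather than the last value of the previous country.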
@vincentarelbundock

What is wrong with the pdata.frame from the plm package? Seems like this panel data stuff (shift, lag) is reinventing the wheel a little bit.

@christophergandrud
Contributor Author

I don't actually think this really belongs in psData, but others have made moves to include it.

@briatte
Contributor

briatte commented Mar 6, 2014

@vincentarelbundock thanks for flagging this. I'll check for redundancy with that class; I did not know about it.

@briatte
Contributor

briatte commented Mar 6, 2014

Okay, I've checked what the plm package does and does not do:

  • it has useful checks for undeclared balanced panels
  • it creates dual S3 classes like pseries and pdata.frame
  • it has methods for subscripting, extracting
  • the merge method does not preserve pdata.frame attributes

pdata.frame works on psData objects, and vice versa, because both coerce to data frames. This means we can either use pdata.frame internally, if it applies to many functions, converting back and forth between pdata.frame and psData, or just let the user use the psData class as a preprocessing format before using plm.

@vincentarelbundock

Could you explain why it makes sense to have any panel-specific functions in this package? Sounds to me like it would be a good idea for package maintainers to enforce standards on data cleaning (e.g. unique indices for panel data; if country-year, then at least the index must be based on a "recognized" country code), but the rest sounds like feature creep...
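
To make the data-cleaning standards suggested here concrete, the following is a rough sketch in Python of the two checks: unique unit-time indices, and country codes drawn from a recognized list. The function names and the tiny `RECOGNIZED_CODES` set are hypothetical stand-ins (a real check would use a full code list such as ISO 3166-1 alpha-3), and nothing here reflects how psData actually validates data.

```python
from collections import Counter

# Hypothetical stand-in for a full list of recognized country codes.
RECOGNIZED_CODES = {"DEU", "FRA"}

def duplicated_index(rows, unit, time):
    """Return the sorted (unit, time) pairs that occur more than once."""
    counts = Counter((row[unit], row[time]) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)

def unrecognized_codes(rows, unit, recognized=RECOGNIZED_CODES):
    """Return the sorted unit codes that are not in the recognized set."""
    return sorted({row[unit] for row in rows} - set(recognized))

# Illustrative data with one duplicated index and one unknown code.
data = [
    {"iso3c": "DEU", "year": 2000},
    {"iso3c": "DEU", "year": 2000},
    {"iso3c": "FRA", "year": 2000},
    {"iso3c": "XXX", "year": 2001},
]
duplicates = duplicated_index(data, "iso3c", "year")
unknown = unrecognized_codes(data, "iso3c")
```

Both checks only report problems; deciding whether to drop, merge, or reject the offending rows is left to the caller.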

@christophergandrud
Contributor Author

I agree with @vincentarelbundock on this. I personally don't like plm: from my limited experience with it, it is clunky, can be unintuitive, and lacks flexibility. Even so, I think the end product of psData should be a data set that you can easily use in plm or any other package for further transformations and analysis, rather than including this functionality internally.

@vincentarelbundock

Glad to see I'm not alone. I find that the unix philosophy of doing one thing well really makes things easier on maintainers. I think there's a clear need for a common API to online government-related data, but the need for yet another panel data framework in R is much less obvious to me. And why these two things should be in a single package is even less clear to me.

@antagomir
Member

Also, the API is just a framework and could potentially be used in multiple packages.

@briatte
Contributor

briatte commented Mar 7, 2014

The basic panel and time series functions will necessarily be feature creep, since full-fledged packages exist for these. I added these functions to test the downloaded and formatted datasets against practical use, using little code chunks by @christophergandrud and @zmjones (e.g. lag/lead, time since event).

We can remove the panel.r file entirely if the panel functions are unneeded. I find them useful to quickly replicate models from within the package, but I'm not claiming that they are better suited for the job than other packages like plm or xts. And I also support the philosophy of doing just one thing well. So we need to decide:

  • how much the package does to download and clean up datasets (should be: a lot)
  • how much the package does for country-year data (e.g. cross-convert country codes etc.)
  • how much the package does for panel data (could be: not much or even nothing)

4 participants