Fellow Stata enthusiast Kyle Barron has developed and made available stata_kernel
which makes Stata available in Jupyter Notebooks. Check here for more information on setup and installation. Otherwise here is a working example.
- 1. Table of Contents
- 2. Stata Quick Reference
- 3. Stata to Pandas Cross-Walk
- 4. Importing JSON to Stata
- 5. License
These examples are designed to run from the web. Where possible, and where time has permitted, these examples are compatible with both Mac and Windows. But, no promises.
For a curated list of packages, their descriptions, and links to further documentation see: list of useful packages
The folks over at geocenter.github.io have produced and provided a series of Stata quick references. I keep copies printed out on cardstock at all of my workstations. They're great as gifts! Local copies here. Also worth reviewing are similar quick references for R and RStudio.
Send comments or suggestions via GitHub or Twitter : @adamrossnelson
Fictional GPA data. Demonstration of stata random number generators.
Simple graphics using IPEDS data ExampleIPEDS.dta
// To run from command line:
do https://raw.githubusercontent.com/adamrossnelson/StataQuickReference/master/plotting/BasicPlotting.do
// To load example data:
use https://github.com/adamrossnelson/StataQuickReference/raw/master/exampledata/ExampleIPEDS.dta
Want to make your own schemes? Vince Wiggins of StataCorp provided a walk-through on this topic.
- Scheming your way to consistent graphs.
- Vince's presentation materials schemetalk.zip.
- Also worth noting, clever use of SMCL as a slide deck.
Demonstrates production of a stacked area chart in Stata. Also shows dropping a standard legend in favor of explanatory text over stacked areas/regions.
// To run from command line:
do https://raw.githubusercontent.com/adamrossnelson/StataQuickReference/master/plotting/StackArea/StackArea.do
This do file shows how to use margins and marginsplot to visualize trends in Stata. The data are fictional.
// To run from command line:
do https://raw.githubusercontent.com/adamrossnelson/StataQuickReference/master/plotting/marginstrend.do
Using the longitude and latitude variables stored in IPEDS data, this routine builds a visualization of the institutions in North America.
// To run from command line:
do https://raw.githubusercontent.com/adamrossnelson/StataQuickReference/master/plotting/geoplotipeds.do
Another plotting demonstration. Demonstrates generating a radar plot. Requires the radar package, available at http://fmwww.bc.edu/repec/bocode/r/radar.html. The example given in the radar documentation shows the package excels at plotting a categorical y (such as automobile make) and a continuous x (automobile weight).
This implementation plots a categorical y (such as a series of factor scores) and a continuous x (which would be each factor score's mean).
Help wanted note: Seeking a collaborator to convert facradar.do into a Stata .ado package.
This file is a quick routine that will install a range of graphic schemes. Stata's graphic schemes are a useful way to improve the visual presentation of data.
This routine also sets the default scheme to lean2wide, which is a modified version of lean1 and lean2. More information about the development history is in the do file.
To execute from command line:
do https://raw.githubusercontent.com/adamrossnelson/StataQuickReference/master/plotting/GetStataSchemes.do
// Additional related commands include ...
// Show available schemes:
graph query, schemes
// Change default scheme:
set scheme [scheme name], perm
numlabel, add
Prefixes numeric values to value labels. With the , add option and no other arguments, this command operates on all value labels in the data. Helpful when displaying and inspecting data.
about
Displays information about your version of Stata. If you are running Stata for Windows, information about memory is also displayed.
sysdir
Query/list system directories.
macro list
Displays a list of the current set of macros (both local and global) in memory. Also displays information about your version and/or installation instance of Stata.
creturn list
Displays information about your version, your computer, your installation, and a variety of system variables. From the manual: "Stata's c-class, c(), contains the values of system parameters and settings, along with certain constants such as the value of pi. c() values may be referred to but may not be assigned."
set
Displays system settings. This collection of commands also permits edits to system settings. A common example is set more off, which tells Stata not to pause for the --more-- messages. Another common example, run soon after each installation, is set more off, perm, which makes that setting permanent.
mvencode and mvdecode
Commands that quickly recode missing values. mvencode changes missing values to numeric values. mvdecode changes numeric values to missing values.
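For readers coming from the Python side of the cross-walk, the round trip that mvencode and mvdecode perform can be sketched in plain Python. This is only an illustration of the idea, not Stata's implementation: None stands in for Stata's missing value, and the function names mv_encode and mv_decode and the code -99 are hypothetical choices made for this example.

```python
# None is used here as a stand-in for Stata's system missing value (.).

def mv_encode(values, code):
    """Replace every missing entry with the numeric code."""
    return [code if v is None else v for v in values]


def mv_decode(values, code):
    """Replace every entry equal to the numeric code with missing."""
    return [None if v == code else v for v in values]


scores = [3.2, None, 4.0, None]   # fictional data with two missing entries
encoded = mv_encode(scores, -99)  # missing -> -99
decoded = mv_decode(encoded, -99) # -99 -> missing

print(encoded)
print(decoded)
```

Note that decoding only restores the original data when the chosen code does not already occur as a real value, which is worth checking before encoding.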
Demonstration of defining a program, passing arguments, and referencing and displaying the arguments passed. Demonstrates material presented in the Stata 15 User Manual, sections 18.1 & 18.4.
To execute from command line:
do https://raw.githubusercontent.com/adamrossnelson/StataQuickReference/master/ProgArgs.do
This do file provides a method that adds text to all variable labels. It also provides example code that references existing variable labels, which is useful when automating output for graphs, putdocx, putpdf, etc.
// To run from command line:
do https://raw.githubusercontent.com/adamrossnelson/StataQuickReference/master/renvarlabs.do
Generates markdown from Stata using qui and noi di...
Demonstrates and tests a method for Stata to automate output for display on GitHub or in Markdown.
Cannot run from the command line without downloading a local copy.
Example of a vanity branding splash.
To execute from command line (or do file):
do https://raw.githubusercontent.com/adamrossnelson/StataQuickReference/master/asciiadam.do
This repo also provides a Stata to Pandas Cross-Walk.
This task is not as straightforward as it should or could be. At least two excellent packages exist, including insheetjson and jsonio. The discussion boards provide friendly advice, too.
For myself, I often find it less difficult and more reliable to use Python's Pandas library. Until recently, for some use cases, this roundabout (JSON > Pandas > Stata) approach was necessary. Version 0.23.0 of Pandas added support for Stata's strL data type. Before strL support, if any text field in the JSON data contained more than 2045 characters, it was difficult to go directly from JSON to Stata dta. The issue is documented at pandas-dev/pandas#16450, has been resolved, and the fix is available in the latest release. The revised documentation also provides helpful information.
Now that Pandas supports Stata's strL data type, the first of the two code samples below should be sufficient. Also, if all text fields are 2045 characters or fewer, the second code block below works well to produce a Stata dta file:
import pandas as pd
# 'list', 'of', 'fields' are placeholders for the names of the long text columns.
pd.read_json('DemoFileRaw.json').to_stata('DemoFileRaw.dta', write_index=False,
    version=117, convert_strl=['list', 'of', 'fields'])
A workaround that will also be sufficient for many is to first convert to MS Excel xlsx, which Stata can then import from the command line:
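import pandas as pd
# Passing the path lets pandas open, write, and close the workbook itself.
pd.read_json('DemoFileRaw.json').to_excel('DemoFileRaw.xlsx', index=False)
# Then, in Stata: import excel using "DemoFileRaw.xlsx", firstrow clear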
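Before choosing between the two approaches above, it can help to know which text fields actually exceed 2045 characters. The following is a minimal sketch using only Python's standard library. It assumes the JSON data is a list of flat records (one object per row), and the helper name fields_needing_strl and the sample records are hypothetical, made up for this illustration. Length is measured in Python characters, so fields with multi-byte characters may need a more conservative limit.

```python
import json

STR_LIMIT = 2045  # longest text the discussion above says fits without strL


def fields_needing_strl(records, limit=STR_LIMIT):
    """Return sorted names of fields holding any string longer than limit."""
    long_fields = set()
    for record in records:
        for name, value in record.items():
            if isinstance(value, str) and len(value) > limit:
                long_fields.add(name)
    return sorted(long_fields)


# Hypothetical sample standing in for the parsed contents of a JSON file.
sample = json.loads(json.dumps([
    {"id": 1, "title": "short", "body": "x" * 3000},
    {"id": 2, "title": "also short", "body": "brief"},
]))

strl_fields = fields_needing_strl(sample)
print(strl_fields)
```

The resulting list is the kind of value that could be passed as convert_strl in the first code sample. An empty list suggests the data has no fields over the limit.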
Except where otherwise specifically noted:
MIT License
Copyright (c) 2018 Adam Ross Nelson JD PhD
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.