renamed all dspy to rosetta
dkrasner committed Nov 10, 2013
1 parent b13b5e0 commit c0b10bd
Showing 44 changed files with 6,233 additions and 50 deletions.
12 changes: 6 additions & 6 deletions CONTRIBUTING.md
@@ -9,17 +9,17 @@ likelihood of your contribution being merged.**
How to contribute
-----------------

The preferred way to contribute to dspy is to fork the
[project repository](https://github.com/columbia-applied-data-science/dspy/) on
The preferred way to contribute to rosetta is to fork the
[project repository](https://github.com/columbia-applied-data-science/rosetta/) on
GitHub:

1. Fork the [project repository](https://github.com/columbia-applied-data-science/dspy/):
1. Fork the [project repository](https://github.com/columbia-applied-data-science/rosetta/):
click on the 'Fork' button near the top of the page. This creates
a copy of the code under your account on the GitHub server.

2. Clone this copy to your local disk:

$ git clone git@github.com:YourLogin/dspy.git
$ git clone git@github.com:YourLogin/rosetta.git

3. Create a branch to hold your changes:

@@ -37,7 +37,7 @@ GitHub:

$ git push -u origin my-feature

Finally, go to the web page of your fork of the dspy repo,
Finally, go to the web page of your fork of the rosetta repo,
and click 'Pull request' to send your changes to the maintainers for
review. This will send an email to the committers.

@@ -54,7 +54,7 @@ following rules before submitting a pull request:
example script in the ``examples/`` folder. Have a look at other
examples for reference. Examples should demonstrate why the new
functionality is useful in practice and, if possible, compare it
to other methods available in dspy.
to other methods available in rosetta.

- At least one paragraph of narrative documentation with links to
references in the literature (with PDF links when possible) and
20 changes: 10 additions & 10 deletions LICENSE.txt
@@ -2,10 +2,10 @@
License
=======

DSpy is distributed under a 3-clause ("Simplified" or "New") BSD
Rosetta is distributed under a 3-clause ("Simplified" or "New") BSD
license.

DSpy license
Rosetta license
==============

Redistribution and use in source and binary forms, with or without
@@ -40,28 +40,28 @@ About the Copyright Holders
===========================

The core team that coordinates development on GitHub can be found here:
http://github.com/columbia-applied-data-science/DSpy
http://github.com/columbia-applied-data-science/Rosetta

Full credits for DSpy contributors can be found in the documentation.
Full credits for Rosetta contributors can be found in the documentation.

Our Copyright Policy
====================

DSpy uses a shared copyright model. Each contributor maintains copyright
over their contributions to DSpy. However, it is important to note that
Rosetta uses a shared copyright model. Each contributor maintains copyright
over their contributions to Rosetta. However, it is important to note that
these contributions are typically only changes to the repositories. Thus,
the DSpy source code, in its entirety, is not the copyright of any single
the Rosetta source code, in its entirety, is not the copyright of any single
person or institution. Instead, it is the collective copyright of the
entire DSpy Development Team. If individual contributors want to maintain
entire Rosetta Development Team. If individual contributors want to maintain
a record of what changes/contributions they have specific copyright on,
they should indicate their copyright in the commit message of the change
when they commit the change to one of the DSpy repositories.
when they commit the change to one of the Rosetta repositories.

With this in mind, the following banner should be used in any source code
file to indicate the copyright and license terms:

#-----------------------------------------------------------------------------
# Copyright (c) 2013, DSpy Development Team
# Copyright (c) 2013, Rosetta Development Team
# All rights reserved.
#
# Distributed under the terms of the BSD Simplified License.
14 changes: 7 additions & 7 deletions MANIFEST.in
@@ -1,10 +1,10 @@
recursive-include dspy *
recursive-include dspy/cmd *
recursive-include dspy/modeling *
recursive-include dspy/parallel *
recursive-include dspy/tests *
recursive-include dspy/text *
recursive-include dspy/workflow *
recursive-include rosetta *
recursive-include rosetta/cmd *
recursive-include rosetta/modeling *
recursive-include rosetta/parallel *
recursive-include rosetta/tests *
recursive-include rosetta/text *
recursive-include rosetta/workflow *

include MANIFEST.in
include LICENSE
14 changes: 7 additions & 7 deletions README.md
@@ -1,4 +1,4 @@
DSpy
Rosetta
====

Tools for data science with a focus on text processing.
@@ -32,7 +32,7 @@ See the `examples/` directory for more details.

Install
-------
Check out the dev branch or a tagged release from the [dspyrepo][dspyrepo]. Then (so long as you have `pip`).
Check out the dev branch or a tagged release from the [rosettarepo][rosettarepo]. Then (so long as you have `pip`).

make
make test
@@ -44,11 +44,11 @@ Development

You can check the latest sources with

git clone git://github.com/columbia-applied-data-science/dspy
git clone git://github.com/columbia-applied-data-science/rosetta

### Contributing

Feel free to contribute a bug report or a request by opening an [issue](https://github.com/columbia-applied-data-science/dspy/issues)
Feel free to contribute a bug report or a request by opening an [issue](https://github.com/columbia-applied-data-science/rosetta/issues)

Before contributing code, read `CONTRIBUTING.md`

@@ -57,12 +57,12 @@ Dependencies

Testing
-------
From the base repo directory, `dspy/`, you can run all tests with
From the base repo directory, `rosetta/`, you can run all tests with

make test

History
-------
The *DS* in DSpy clearly relates to *Data Science*. However, it came first from *Data Structure* and the *Dead Sea*. The tools concentrate on streaming text, and the dead sea scrolls are the most famous version of text in a stream (a lake actually...but just pretend and it's really cool).
The *DS* in Rosetta clearly relates to *Data Science*. However, it came first from *Data Structure* and the *Dead Sea*. The tools concentrate on streaming text, and the dead sea scrolls are the most famous version of text in a stream (a lake actually...but just pretend and it's really cool).

[dspyrepo]: https://github.com/columbia-applied-data-science/dspy
[rosettarepo]: https://github.com/columbia-applied-data-science/rosetta
14 changes: 7 additions & 7 deletions examples/vw_helpers.md
@@ -1,9 +1,9 @@
Working with Vowpal Wabbit (VW)
===============================

To work with the `dspy` utilities you need to:
To work with the `rosetta` utilities you need to:

* Clone the [dspy repo][dspyrepo] and read `README.md`.
* Clone the [rosetta repo][rosettarepo] and read `README.md`.

Create the sparse file (sfile)
------------------------------
@@ -21,15 +21,15 @@ The `TextFileStreamer` needs a method to convert the text files to a list of str
Once you have a tokenizer, just initialize a streamer and write the VW file.

```python
from dspy import TextFileStreamer, TokenizerBasic
from rosetta import TextFileStreamer, TokenizerBasic

my_tokenizer = TokenizerBasic()
stream = TextFileStreamer(text_base_path='bodyfiles', tokenizer=my_tokenizer)
stream.to_vw('doc_tokens.vw', n_jobs=-1)
```
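
As a rough illustration of what "convert the text files to a list of strings" and "write the VW file" involve, here is a minimal standalone sketch. It does not use `rosetta`; the helper names `tokenize` and `to_vw_line` are hypothetical, and the exact lines that `TextFileStreamer.to_vw` writes may differ. It assumes only the basic VW input convention of an unlabeled example in the default namespace, `| feature:value ...`.

```python
import re
from collections import Counter


def tokenize(text):
    """Hypothetical tokenizer: lowercase the text, keep runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def to_vw_line(tokens):
    """Format token counts as one unlabeled VW example (default namespace)."""
    counts = Counter(tokens)
    # Sort for a deterministic line; VW itself does not require any ordering.
    features = " ".join(f"{tok}:{n}" for tok, n in sorted(counts.items()))
    return f"| {features}"


print(to_vw_line(tokenize("The cat saw the other cat.")))
# prints: | cat:2 other:1 saw:1 the:2
```

Restricting tokens to letters sidesteps the characters that carry meaning in the VW format (space, `|`, and `:`); a tokenizer that keeps punctuation would need to escape or drop them.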

### Method 2: `files_to_vw.py`
`files_to_vw.py` is a fast and simple command line utility for converting files to VW format. Installing `dspy` will put these utilities in your path.
`files_to_vw.py` is a fast and simple command line utility for converting files to VW format. Installing `rosetta` will put these utilities in your path.

* Try converting the first 5 files in `my_base_path`. The following should print 5 lines of results, in [vw format][vwinput]

@@ -150,7 +150,7 @@ The python function `filter_sfile.py` takes in `ddrs.vw` and streams a filtered
You can view the topics and predictions with this:

```python
from dspy.text.vw_helpers import LDAResults
from rosetta.text.vw_helpers import LDAResults
num_topics = 5
lda = LDAResults('topics.dat', 'prediction.dat', num_topics, 'sff_file.pkl')
lda.print_topics()
@@ -195,9 +195,9 @@ Contribute!


[vwinput]: https://github.com/JohnLangford/vowpal_wabbit/wiki/Input-format
[dspyrepo]: https://github.com/columbia-applied-data-science/dspy
[rosettarepo]: https://github.com/columbia-applied-data-science/rosetta
[vwlda]: https://github.com/JohnLangford/vowpal_wabbit/wiki/lda.pdf
[vwtricks]: www.slideshare.net/jakehofman/technical-tricks-of-vowpal-wabbit
[hashing]: https://github.com/JohnLangford/vowpal_wabbit/wiki/Feature-Hashing-and-Extraction
[spot]: http://en.wikipedia.org/wiki/Single_Point_of_Truth
[issue]: https://github.com/columbia-applied-data-science/dspy/issues
[issue]: https://github.com/columbia-applied-data-science/rosetta/issues
10 changes: 5 additions & 5 deletions makefile
@@ -6,7 +6,7 @@ PYTHON ?= python
UNITTEST ?= unittest
CTAGS ?= ctags

TESTDIR=dspy/tests
TESTDIR=rosetta/tests

all: install test

@@ -18,7 +18,7 @@ install: clean

# Reinstall with pip
reinstall: clean
pip uninstall dspy
pip uninstall rosetta
$(PYTHON) setup.py sdist
pip install dist/*

@@ -50,14 +50,14 @@ test-cmd:
$(PYTHON) -m $(UNITTEST) discover -s $(TESTDIR) -p '*cmd*' -v

trailing-spaces:
find dspy -name "*.py" | xargs perl -pi -e 's/[ \t]*$$//'
find rosetta -name "*.py" | xargs perl -pi -e 's/[ \t]*$$//'

ctags:
# make tags for symbol based navigation in emacs and vim
# Install with: sudo apt-get install exuberant-ctags
$(CTAGS) -R *

code-analysis:
flake8 dspy | grep -v __init__ | grep -v external
pylint -E -i y dspy/ -d E1103,E0611,E1101
flake8 rosetta | grep -v __init__ | grep -v external
pylint -E -i y rosetta/ -d E1103,E0611,E1101
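
For readers unfamiliar with the `trailing-spaces` target: the Perl one-liner removes trailing spaces and tabs from each line, and the doubled `$$` is make's escape for a literal `$` (the regex end-of-line anchor). A rough Python equivalent of that substitution, with the hypothetical helper name `strip_trailing_spaces`, might look like this:

```python
import re

# Spaces or tabs immediately before the end of any line.
TRAILING = re.compile(r"[ \t]+$", flags=re.MULTILINE)


def strip_trailing_spaces(text):
    """Remove trailing spaces and tabs from every line, keeping newlines."""
    return TRAILING.sub("", text)


sample = "def f():  \n    return 1\t\n"
print(repr(strip_trailing_spaces(sample)))
```

This sketch works on a string in memory; the makefile target edits the files in place via `perl -pi`.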

1 change: 1 addition & 0 deletions rosetta/__init__.py
@@ -0,0 +1 @@
from rosetta.text.api import *
Empty file added rosetta/cmd/__init__.py
Empty file.
27 changes: 27 additions & 0 deletions rosetta/cmd/bashrc_additions
@@ -0,0 +1,27 @@
# Additions to your bashrc
#
#
###############################################################################
# INSTALLATION
###############################################################################
# Put desired sections in your ~/.bashrc (or ~/.bash_profile on macs) and then
# "source it" or close then open a new terminal.
#
###############################################################################
# Body function
###############################################################################
# This allows you to run a command on the body of the input, skipping the header
# (but still printing the header). For example,
#
# $ cat filewithheader | body sort -k1,1
#
# will sort filewithheader, using the first field, but leave the header at the top
# of the file.

body() {
IFS= read -r header
printf '%s\n' "$header"
"$@"
}

export -f body
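
The same header-preserving idea can be sketched in Python. This is an illustrative analogue only, not part of the commit: the function takes an iterable of lines and a callable standing in for the shell command, passes the first line through untouched, and applies the callable to the rest.

```python
def body(lines, command):
    """Return the header line unchanged, followed by command(remaining lines)."""
    lines = iter(lines)
    header = next(lines, None)
    if header is None:
        # Empty input: nothing to pass through and nothing to process.
        return []
    return [header] + list(command(lines))


rows = ["name,score", "carol,3", "alice,1", "bob,2"]
for row in body(rows, sorted):
    print(row)
# prints:
# name,score
# alice,1
# bob,2
# carol,3
```

As in the bash version, the trick is simply that the header is consumed from the stream before the command ever sees it.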
80 changes: 80 additions & 0 deletions rosetta/cmd/concat_csv.py
@@ -0,0 +1,80 @@
#!/usr/bin/env python
"""
Concat a list of csv files in an "outer join" style.
From pandas, uses DataFrame.from_csv, DataFrame.to_csv, concat to do
reads/writes/joins. Except noted below, the default arguments are used.
"""

import argparse
import sys

import pandas as pd


def _cli():
# Text to display after help
epilog = """
EXAMPLES
Concat two files, each with a header and index, redirect output to newfile
$ python concat_csv.py --index --header file1 file2 > newfile
Concat two files, write result to newfile
$ python concat_csv.py --index --header -o newfile file1 file2
Concat all files in mydir/, write result to stdout.
$ python concat_csv.py mydir/*
"""
parser = argparse.ArgumentParser(
description=globals()['__doc__'], epilog=epilog,
formatter_class=argparse.RawDescriptionHelpFormatter)

parser.add_argument(
'paths', nargs='*', help='Concat files in this space separated list')
parser.add_argument(
'-o', '--outfile', default=sys.stdout,
type=argparse.FileType('w'),
help='Write to OUT_FILE rather than sys.stdout.')
parser.add_argument(
'-s', '--sep', default=',',
help='Delimiter to use. Regular expressions are accepted.'
' [default: %(default)s]')

parser.add_argument(
'--index', action='store_true', default=False,
help='Flag to set if files have an index (leftmost column).'
' [default: %(default)s].')
parser.add_argument(
'--header', action='store_true', default=False,
help='Flag to set if files have headers (in top row). '
'[default: %(default)s]')

parser.add_argument(
'-a', '--axis', type=int, default=0,
help='Axes along which to concatenate')

# Parse and check args
args = parser.parse_args()

# Call the module interface
_concat(
args.outfile, args.paths, args.sep, args.index, args.header, args.axis)


def _concat(outfile, paths, sep, index, header, axis):
# Read
index_col = 0 if index else False
header_row = 0 if header else False
kwargs = {'sep': sep, 'index_col': index_col, 'header': header_row}
frames = pd.concat(
(pd.DataFrame.from_csv(p, **kwargs) for p in paths), axis=axis)

# Write
kwargs = {'sep': sep, 'index': index, 'header': header}

frames.to_csv(outfile, **kwargs)


if __name__ == '__main__':
_cli()
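
A note for anyone running this script against a current pandas: `DataFrame.from_csv` was deprecated and then removed in pandas 1.0, and `read_csv` expects `header=None` (not `False`) when a file has no header row. The following is a minimal sketch of the same "outer join" concatenation using `read_csv`; the function name `concat_csv` and the in-memory sample files are assumptions made for illustration, not part of the commit.

```python
import io

import pandas as pd


def concat_csv(sources, sep=",", index=False, header=False, axis=0):
    """Read each source with read_csv and concatenate (outer join by default)."""
    index_col = 0 if index else None   # leftmost column as index, if flagged
    header_row = 0 if header else None  # top row as column names, if flagged
    frames = [
        pd.read_csv(src, sep=sep, index_col=index_col, header=header_row)
        for src in sources
    ]
    # pd.concat uses join='outer': columns missing from a frame become NaN.
    return pd.concat(frames, axis=axis)


file1 = io.StringIO("id,a,b\nx,1,2\n")
file2 = io.StringIO("id,b,c\ny,3,4\n")
combined = concat_csv([file1, file2], index=True, header=True)
print(combined.to_csv())
```

Here the two inputs share only column `b`, so the result has columns `a`, `b`, and `c`, with empty cells where a file did not supply a value.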