pyxtension

pyxtension is a pure Python MIT-licensed library that includes Scala-like streams, Json with attribute access syntax, and other common-use stuff.

Installation

pip install pyxtension

or from GitHub:

git clone https://github.com/asuiu/pyxtension.git
cd pyxtension
python setup.py install

or

git submodule add https://github.com/asuiu/pyxtension.git

Modules overview

Json.py

Json

A dict subclass to represent a Json object. You should be able to use this absolutely anywhere you can use a dict. While this is probably the class you want to use, there are a few caveats that follow from this being a dict under the hood.

Never again will you have to write code like this:

body = {
    'query': {
        'filtered': {
            'query': {
                'match': {'description': 'addictive'}
            },
            'filter': {
                'term': {'created_by': 'ASU'}
            }
        }
    }
}

From now on, you may simply write the following three lines:

body = Json()
body.query.filtered.query.match.description = 'addictive'
body.query.filtered.filter.term.created_by = 'ASU'

streams.py

stream

stream subclasses collections.Iterable. It is the same Python iterable, but with additional methods suitable for multithreaded and multiprocess processing.

It is used to create stream processing pipelines, similar to those used in Scala and in the MapReduce programming model.

Those who have used Apache Spark RDD functions will find this model of processing very easy to use.

streams

Never again will you have to write code like this:

> from functools import reduce
> lst = range(1, 6)
> reduce(lambda x, y: x * y, map(lambda _: _ * _, filter(lambda _: _ % 2 == 0, lst)))
64

From now on, you may simply write the following lines:

> from pyxtension.streams import stream
> the_stream = stream(range(1, 6))
> the_stream.\
    filter(lambda _: _ % 2 == 0).\
    map(lambda _: _ * _).\
    reduce(lambda x, y: x * y)
64

A naive Word Count Map-Reduce example using a multiprocessing map

from pyxtension.streams import stream

corpus = [
    "MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster.",
    "At Google, MapReduce was used to completely regenerate Google's index of the World Wide Web",
    "Conceptually similar approaches have been very well known since 1995 with the Message Passing Interface standard having reduce and scatter operations."]

def reduceMaps(m1, m2):
    # merge the word counts of m2 into m1
    for k, v in m2.items():
        m1[k] = m1.get(k, 0) + v
    return m1

word_counts = stream(corpus).\
    mpmap(lambda line: stream(line.lower().split(' ')).countByValue()).\
    reduce(reduceMaps)
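To see what this pipeline computes, here is a sequential, stdlib-only equivalent built on `collections.Counter` and `functools.reduce`. It drops the multiprocessing step and uses a shortened, made-up corpus, so it illustrates the map and reduce stages rather than pyxtension's API.

```python
from collections import Counter
from functools import reduce

corpus = [
    "MapReduce is a programming model",
    "At Google, MapReduce was used",
]

# Map step: one Counter of words per line (countByValue() is documented to return a Counter).
per_line = map(lambda line: Counter(line.lower().split(' ')), corpus)

# Reduce step: merge the per-line counts; Counter addition sums counts key by key.
word_counts = reduce(lambda c1, c2: c1 + c2, per_line, Counter())

print(word_counts['mapreduce'])  # 2
```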

Basic methods

map(f)

Identical to the builtin map(), but returns a stream.

mpmap(f, poolSize=16)

Parallel ordered map using multiprocessing.Pool.imap().

It can replace map when computations need to be split across multiple cores and the order of results matters.

It spawns at most poolSize processes and applies the f function.

The elements in the result stream appear in the same order as in the initial iterable.

:type f: (T) -> V
:rtype: `stream`

mpfastmap(f, poolSize=16)

Parallel unordered map using multiprocessing.Pool.imap_unordered().

It can replace map when the order of results doesn't matter.

It spawns at most poolSize processes and applies the f function.

The elements in the result stream appear in an unpredictable order.

:type f: (T) -> V
:rtype: `stream`
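The difference between mpmap and mpfastmap comes down to `Pool.imap()` versus `Pool.imap_unordered()`. The sketch below contrasts the two using `multiprocessing.dummy.Pool`, which exposes the same Pool API backed by threads, so it runs without pickling or `__main__`-guard concerns. It is an illustration, not pyxtension's code; with the process-based `multiprocessing.Pool`, the mapped function must be picklable (e.g., defined at module level).

```python
import time
from multiprocessing.dummy import Pool  # thread-backed Pool with the multiprocessing.Pool API


def slow_square(x: int) -> int:
    # Later inputs sleep less, so they tend to finish first.
    time.sleep(0.01 * (5 - x))
    return x * x


with Pool(5) as pool:
    ordered = list(pool.imap(slow_square, range(5)))              # input order, like mpmap
    unordered = list(pool.imap_unordered(slow_square, range(5)))  # completion order, like mpfastmap

print(ordered)                       # [0, 1, 4, 9, 16]
print(sorted(unordered) == ordered)  # True, although the order of `unordered` may vary
```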

fastmap(f, poolSize=16)

Parallel unordered map using a multithreaded pool.

It can replace map when the order of results doesn't matter.

It spawns at most poolSize threads and applies the f function.

The elements in the result stream appear in an unpredictable order.

Because of the CPython GIL, it is most useful for I/O-bound work or for CPU-intensive native functions that release the GIL, or on the Jython or IronPython interpreters.

:type f: (T) -> V
:rtype: `stream`
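The GIL remark can be illustrated with the standard library alone: `ThreadPoolExecutor` combined with `as_completed` gives an unordered, thread-based map. The sleeping function `fake_io` is a hypothetical stand-in for a network or disk call; this is a sketch of the idea, not pyxtension's fastmap.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_io(x: int) -> int:
    # time.sleep releases the GIL, just as blocking I/O does.
    time.sleep(0.2)
    return x


start = time.monotonic()
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(fake_io, x) for x in range(8)]
    results = [f.result() for f in as_completed(futures)]  # completion order, like fastmap
elapsed = time.monotonic() - start

print(sorted(results))  # [0, 1, 2, 3, 4, 5, 6, 7]
print(elapsed < 1.0)    # True: roughly 0.2 s instead of the 1.6 s a sequential loop needs
```

For pure-Python CPU-bound functions the same thread pool would give little speedup, which is why the process-based mpmap and mpfastmap exist.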

flatMap(predicate=_IDENTITY_FUNC)

param predicate:	a function that receives each element of this collection and returns an iterable

By default, predicate is the identity function.

type predicate:	(V) -> collections.Iterable[T]
return:	a stream of the elements of the iterables returned by predicate()

Example:

stream([[1, 2], [3, 4], [4, 5]]).flatMap().toList() == [1, 2, 3, 4, 4, 5]
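In standard-library terms, flatMap(predicate) behaves like mapping predicate over the elements and chaining the resulting iterables. The snippet below shows that equivalence with `itertools.chain.from_iterable`, as an illustration rather than the library's code.

```python
from itertools import chain

nested = [[1, 2], [3, 4], [4, 5]]

# Identity predicate: just flatten one level.
flat = list(chain.from_iterable(nested))

# A predicate that returns an iterable for each element.
pairs = list(chain.from_iterable(map(lambda x: (x, -x), [1, 2])))

print(flat)   # [1, 2, 3, 4, 4, 5]
print(pairs)  # [1, -1, 2, -2]
```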

filter(predicate)

Identical to the builtin filter(), but returns a stream.

reversed()

Returns a reversed stream.

exists(predicate)

Tests whether a predicate holds for at least one element of this stream.

rtype:	bool

Example:

stream([1, 2, 3]).exists(lambda x: x == 0) -> False
stream([1, 2, 3]).exists(lambda x: x == 1) -> True
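Semantically, exists(predicate) corresponds to the builtin `any()` over the predicate's results. The helper below is a hypothetical stand-in for illustration; it also shows that short-circuiting lets the check succeed on an infinite iterable.

```python
from itertools import count


def exists(iterable, predicate):
    """Hypothetical helper mirroring stream.exists(): True if predicate holds for any element."""
    # any() stops at the first truthy result, so it never reads past a match.
    return any(predicate(x) for x in iterable)


print(exists([1, 2, 3], lambda x: x == 0))  # False
print(exists([1, 2, 3], lambda x: x == 1))  # True
print(exists(count(), lambda x: x > 10))    # True, stops at 11
```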

keyBy(keyfunc=_IDENTITY_FUNC)

Transforms a stream of values into a stream of (key, value) tuples.

param keyfunc:	function that maps values to keys
type keyfunc:	(V) -> T
return:	stream of (key, value) pairs
rtype:	stream[( T, V )]

Example:

stream([1, 2, 3, 4]).keyBy(lambda _:_ % 2) -> [(1, 1), (0, 2), (1, 3), (0, 4)]

groupBy()

groupBy([keyfunc]) -> Groups the elements of the stream by the key computed with keyfunc.

Unlike itertools.groupby(), the iterable does not need to be sorted on the key function, but the key function must return hashable objects.

param keyfunc:	[Optional] a function computing a key value for each element.
type keyfunc:	(T) -> V
return:	(key, group) pairs, one for each distinct value of keyfunc(element).
rtype:	stream[ ( V, slist[T] ) ]

Example:

stream([1, 2, 3, 4]).groupBy(lambda _: _ % 2) -> [(0, [2, 4]), (1, [1, 3])]
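The "no sorting needed" property is what separates hash-based grouping from `itertools.groupby()`. The hypothetical `group_by` helper below is a one-pass sketch of hash-based grouping (not pyxtension's implementation); it keeps first-seen key order, whereas the library's ordering may differ.

```python
from collections import defaultdict
from itertools import groupby


def group_by(items, keyfunc):
    """Hypothetical hash-based grouping: one pass, no sorting, keys must be hashable."""
    groups = defaultdict(list)
    for item in items:
        groups[keyfunc(item)].append(item)
    return list(groups.items())


data = [1, 2, 3, 4]
print(group_by(data, lambda x: x % 2))
# [(1, [1, 3]), (0, [2, 4])]

# itertools.groupby only merges *consecutive* equal keys, so unsorted input splits the groups:
print([(k, list(g)) for k, g in groupby(data, key=lambda x: x % 2)])
# [(1, [1]), (0, [2]), (1, [3]), (0, [4])]
```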

countByValue()

Returns a collections.Counter of values

Example:

stream(['a', 'b', 'a', 'b', 'c', 'd']).countByValue() == {'a': 2, 'b': 2, 'c': 1, 'd': 1}

distinct()

Returns a stream of distinct values. Values must be hashable.

Example:

stream(['a', 'b', 'a', 'b', 'c', 'd']).distinct().toSet() == {'a', 'b', 'c', 'd'}

reduce(f, init=None)

Takes the same arguments as the builtin reduce() function (functools.reduce() in Python 3).

toSet()

Returns an sset instance.

toList()

Returns an slist instance.

toMap()

Returns an sdict instance.

sorted(key=None, cmp=None, reverse=False)

Takes the same arguments as the builtin sorted().

size()

Returns the length of the stream. Avoid calling it on infinite streams, since it will never return.

join(f)

Returns a string joined by f. Provides the same functionality as the builtin str.join() method.

If f is a string (basestring in Python 2), it is used to join the stream; otherwise f should be a callable that returns the string to be used for joining.

mkString(f)

Identical to join(f).

take(n)

Returns the first n elements of the stream.

head()

Returns the first element of the stream.
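Assuming take(n) is lazy like `itertools.islice`, it is the safe way to sample an infinite stream, unlike size(). The stdlib equivalent below only pulls the elements it needs from an infinite generator; it illustrates the idea rather than pyxtension's code.

```python
from itertools import count, islice

evens = (n for n in count() if n % 2 == 0)  # infinite generator of even numbers
first_five = list(islice(evens, 5))          # consumes only the first five elements

print(first_five)  # [0, 2, 4, 6, 8]
```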

zip()

Same behavior as itertools.izip() (the builtin zip() in Python 3).

throttle(max_req: int, interval: float)

Throttles processing to at most max_req elements every interval seconds.

unique(predicate=_IDENTITY_FUNC)

Returns a stream of unique (according to predicate) elements, appearing in the same order as in the original stream.

The items returned by predicate should be hashable and comparable.
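A common way to keep the first element per key while preserving order is a generator with a "seen" set. The hypothetical `unique` helper below sketches that technique; it only needs hashable keys, and it is not pyxtension's implementation.

```python
from typing import Callable, Hashable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def unique(items: Iterable[T], key: Callable[[T], Hashable] = lambda x: x) -> Iterator[T]:
    """Hypothetical helper: yield the first element seen for each key, preserving order."""
    seen = set()
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item


print(list(unique(['b', 'a', 'b', 'c', 'a'])))                           # ['b', 'a', 'c']
print(list(unique(['apple', 'avocado', 'banana'], key=lambda s: s[0])))  # ['apple', 'banana']
```

Being a generator, this approach also works lazily on infinite inputs, although the seen set grows with the number of distinct keys.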

Statistics related methods

entropy()

calculates the Shannon entropy of the values from stream
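Shannon entropy of the empirical distribution is H = -Σ p·log(p), where p = count / total for each distinct value. The sketch below assumes base-2 logarithms (entropy in bits); the base pyxtension uses is not stated here, and `shannon_entropy` is a hypothetical helper.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Hypothetical helper: entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    # H = -sum(p * log2(p)) over the observed frequencies p = count / total
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


print(shannon_entropy(['a', 'b', 'a', 'b']))  # 1.0
print(shannon_entropy(['a', 'b', 'c', 'd']))  # 2.0
print(shannon_entropy(['a', 'a', 'a']))       # 0.0
```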

pstddev()

Calculates the population standard deviation.
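The population standard deviation divides by N rather than N - 1: σ = sqrt(Σ(x - μ)² / N). The hypothetical helper below computes it directly and checks it against the standard library's `statistics.pstdev`; the sample version, `statistics.stdev`, is shown for contrast.

```python
import math
import statistics


def pstddev(values):
    """Hypothetical helper: population standard deviation (divide by N, not N - 1)."""
    xs = list(values)
    mu = sum(xs) / len(xs)
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(pstddev(data))             # 2.0
print(statistics.pstdev(data))   # 2.0 (stdlib reference)
print(statistics.stdev(data))    # sample standard deviation, about 2.14
```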

mean()

returns the arithmetical mean of the values

sum()

returns the sum of elements from stream

min(key=_IDENTITY_FUNC)

Same functionality as the builtin min() function.

min_default(default, key=_IDENTITY_FUNC)

Same functionality as min(), but returns default when called on an empty stream.
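For comparison, the builtin min() and max() accept a `default` keyword (Python 3.4+) that serves the same purpose as min_default on plain iterables:

```python
fruits = ['pear', 'fig', 'apple']

print(min(fruits, key=len, default=None))  # fig
print(min([], key=len, default='empty'))   # empty
print(max([], default=0))                  # 0
```

Without `default`, calling min() or max() on an empty iterable raises ValueError.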

max()

Same functionality as the builtin max().

maxes(key=_IDENTITY_FUNC)

Returns a stream of the max values from the stream.

mins(key=_IDENTITY_FUNC)

Returns a stream of the min values from the stream.

Other classes

slist

Inherits from streams.stream and the built-in list class, and keeps the list in memory, allowing fast index access.

sset

Inherits from streams.stream and the built-in set class, and keeps the whole set of values in memory.

sdict

Inherits from streams.stream and the built-in dict, and keeps the dict object in memory.

defaultstreamdict

Inherits from streams.sdict and adds the functionality of collections.defaultdict from the standard library.

throttler

Thread-safe time throttler that can be attached to a stream to limit the number of calls per time interval. Example:

> from pyxtension.streams import stream
> from pyxtension.throttler import Throttler
> throttler = Throttler(5, 10)
> stream(range(100)).map(throttler.throttle).map(print).to_list()

This throttles the stream to at most 5 calls every 10 seconds.
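One common way to build such a limiter is a sliding window of recent call timestamps guarded by a lock. The class below is a hypothetical sketch of that technique; `SimpleThrottler` and its parameters are assumptions modeled on `Throttler(max_req, interval)`, not pyxtension's implementation.

```python
import threading
import time
from collections import deque
from typing import Deque, TypeVar

T = TypeVar("T")


class SimpleThrottler:
    """Hypothetical sliding-window throttler: at most max_req calls per `interval` seconds."""

    def __init__(self, max_req: int, interval: float) -> None:
        self.max_req = max_req
        self.interval = interval
        self._calls: Deque[float] = deque()  # monotonic timestamps of calls inside the window
        self._lock = threading.Lock()

    def throttle(self, item: T) -> T:
        with self._lock:
            while True:
                now = time.monotonic()
                # Forget calls that have left the window.
                while self._calls and now - self._calls[0] >= self.interval:
                    self._calls.popleft()
                if len(self._calls) < self.max_req:
                    self._calls.append(now)
                    return item
                # Window is full: sleep until the oldest call in it expires, then re-check.
                time.sleep(self.interval - (now - self._calls[0]))


throttler = SimpleThrottler(3, 0.5)
start = time.monotonic()
results = list(map(throttler.throttle, range(7)))  # 3 items, then 3 more after ~0.5 s, then 1
elapsed = time.monotonic() - start
print(results)  # [0, 1, 2, 3, 4, 5, 6]
```

Holding the lock while sleeping keeps the sketch simple and thread-safe, at the cost of serializing callers that are waiting for a slot.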

Json

Json is a module that provides mapping objects that allow their elements to be accessed both as keys and as attributes:

> from pyxtension.Json import Json
> a = Json({'foo': 'bar'})
> a.foo
'bar'
> a['foo']
'bar'

Attribute access makes it easy to create convenient, hierarchical settings objects:

import yaml  # PyYAML, provides yaml.safe_load

with open('settings.yaml') as fileobj:
    settings = Json(yaml.safe_load(fileobj))

cursor = connect(**settings.db.credentials).cursor()

cursor.execute("SELECT column FROM table;")

Basic Usage

Json comes with two classes: Json and JsonList.

Json is similar to the native dict: it extends dict and is a mutable mapping that allows creating, accessing, and deleting key-value pairs as attributes.

JsonList is similar to the native list: it extends list and also converts the dict objects it contains into Json instances.

Construction

Directly from a JSON string

> Json('{"key1": "val1", "lst1": [1,2] }')
{'key1': 'val1', 'lst1': [1, 2]}

From `tuple`s:

> Json( ('key1','val1'), ('lst1', [1,2]) )
{'key1': 'val1', 'lst1': [1, 2]}
# keep in mind that you should provide at least two tuples with key-value pairs

As a built-in `dict`

> Json( [('key1','val1'), ('lst1', [1,2])] )
{'key1': 'val1', 'lst1': [1, 2]}

> Json({'key1': 'val1', 'lst1': [1, 2]})
{'key1': 'val1', 'lst1': [1, 2]}

Convert to a `dict`

> json = Json({'key1': 'val1', 'lst1': [1, 2]})
> json.toOrig()
{'key1': 'val1', 'lst1': [1, 2]}

Valid Names

Any key can be used as an attribute as long as:

The key represents a valid attribute (i.e., it is a string comprised only of alphanumeric characters and underscores that doesn't start with a number)
The key does not shadow a class attribute (e.g., get).

Attributes vs. Keys

There is one minor difference between accessing a value as an attribute and accessing it as a key: when a dict is accessed as an attribute, it is automatically converted to a Json object. This allows you to access keys recursively:

> attr = Json({'foo': {'bar': 'baz'}})
> attr.foo.bar
'baz'

Relatedly, sequence types other than bytes and str (e.g., lists) are converted automatically as well, with any mappings inside them converted to Json:

> attr = Json({'foo': [{'bar': 'baz'}, {'bar': 'qux'}]})
> for sub_attr in attr.foo:
>     print(sub_attr.bar)
baz
qux

To get this recursive functionality for keys that cannot be used as attributes, use dict syntax on the Json object:

> json = Json({1: {'two': 3}})
> json[1].two
3

JsonList usage examples:

> json = Json('{"lst":[1,2,3]}')
> type(json.lst)
<class 'pyxtension.Json.JsonList'>

> json = Json('{"1":[1,2]}')
> json["1"][1]
2

Assignment as keys will still work:

> json = Json({'foo': {'bar': 'baz'}})
> json['foo']['bar'] = 'baz'
> json.foo
{'bar': 'baz'}

License

pyxtension is released under a GNU Public license.

The idea for the Json module was inspired by addict and AttrDict, but it has better performance and lower memory consumption.
