This repository has been archived by the owner on Feb 4, 2021. It is now read-only.

Bug 1466936 - Include python files in jar and use tag-based publishing (#7)

This allows us to access the python bindings for the package even when we
pull it from maven rather than spark-packages.org, like so:

```
pyspark --packages com.mozilla.telemetry:spark-hyperloglog_2.11:2.2.0.1 --repositories https://s3-us-west-2.amazonaws.com/net-mozaws-data-us-west-2-ops-mavenrepo/releases/
```
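
For illustration, once that shell is up, the bundled bindings could be exercised along these lines. This is a sketch only: the `pyspark_hyperloglog` module name comes from this repo's python/ directory, but the `register` helper and the `hllCreate`/`hllMerge`/`hllCardinality` UDF names are assumptions based on the Scala package, not a verified API.

```python
# Sketch only: the register() helper and the UDF names below are assumptions.
from pyspark_hyperloglog import hll  # shipped in the jar's python/ contents

hll.register()  # hypothetical helper that registers the HLL UDFs with Spark SQL

# `spark` is the SparkSession predefined by the pyspark shell launched above.
df = spark.createDataFrame([('a',), ('b',), ('a',)], ['uid'])
df.createOrReplaceTempView('events')

# hllCreate builds a sketch per row, hllMerge combines them, and
# hllCardinality turns the merged sketch into an approximate distinct count.
spark.sql("""
    SELECT hllCardinality(hllMerge(hllCreate(uid, 12))) AS approx_distinct
    FROM events
""").show()
```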
jklukas authored Jul 3, 2018
1 parent 6c18f92 commit fad9890
Showing 7 changed files with 46 additions and 11 deletions.
7 changes: 5 additions & 2 deletions .circleci/config.yml
```
@@ -22,7 +22,7 @@ jobs:
           command: |
             bash <(curl -s https://codecov.io/bash)
-  # The publish job only gets scheduled for commits to master; see workflows section below
+  # This publish job only runs for builds triggered by a git tag; see workflows section below.
   publish:
     docker:
       - image: mozilla/sbt:8u171_0.13.13
@@ -38,9 +38,12 @@ workflows:
   test-publish:
     jobs:
       - test
+      # Publish only runs on builds triggered by a new git tag of form vX.X.X
      - publish:
          requires:
            - test
          filters:
            branches:
-             only: master
+             ignore: /.*/
+           tags:
+             only: /^v.*/
```
6 changes: 5 additions & 1 deletion .gitignore
```
@@ -3,4 +3,8 @@ target/
 .idea/
 .idea_modules/
 .DS_Store
-*.pyc
\ No newline at end of file
+*.pyc
+venv/
+.tox/
+*.egg-info/
+.pytest_cache/
```
15 changes: 13 additions & 2 deletions README.md
````
@@ -44,5 +44,16 @@ yields:
 ```
 
 ### Deployment
-Any commits to master should also trigger a circleci build that will do the sbt publishing for you
-to our local maven repo in s3 and to spark-packages.org.
+
+To publish a new version of the package, you need to
+[create a new release on GitHub](https://github.com/mozilla/spark-hyperloglog/releases/new)
+with a tag version starting with `v`, like `v2.2.0`. The tag will trigger a CircleCI build
+that publishes to Mozilla's maven repo in S3.
+
+The CircleCI build will also attempt to publish the new tag to spark-packages.org,
+but due to
+[an outstanding bug in the sbt-spark-package plugin](https://github.com/databricks/sbt-spark-package/issues/31)
+that publish will likely fail. You can retry locally by creating a GitHub
+personal access token, exporting the environment variables `GITHUB_USERNAME` and
+`GITHUB_PERSONAL_ACCESS_TOKEN`, and then repeatedly running `sbt spPublish` until you get a
+non-404 response.
````
1 change: 0 additions & 1 deletion VERSION

This file was deleted.

21 changes: 20 additions & 1 deletion build.sbt
```
@@ -1,6 +1,6 @@
 name := "spark-hyperloglog"
 
-version := scala.io.Source.fromFile("VERSION").mkString.stripLineEnd
+version := sys.env.getOrElse("CIRCLE_TAG", "v2.2-SNAPSHOT").stripPrefix("v")
 
 scalaVersion := "2.11.8"
 
@@ -25,6 +25,25 @@ credentials += Credentials(
   sys.env.getOrElse("GITHUB_USERNAME", ""),
   sys.env.getOrElse("GITHUB_PERSONAL_ACCESS_TOKEN", ""))
 
+
+// Include the contents of the python/ directory at the root of our packaged jar;
+// `sbt spPublish` handles including python files for the zip sent to spark-packages.org,
+// but we also want the python bindings to be present in the jar we upload to S3 maven
+// via `sbt publish`.
+val pythonBesidesPyspark = new SimpleFileFilter({ f =>
+  val pythonDir = "/spark-hyperloglog/python"
+  val pyLibDir = pythonDir + "/pyspark_hyperloglog"
+  val p = f.getCanonicalPath
+  p match {
+    case _ if p.contains(pyLibDir) => false // Don't exclude contents of pyspark dir
+    case _ if p.contains(pythonDir + "/") => true // Exclude everything else under python/
+    case _ => false // Don't exclude other files not under python/
+  }
+})
+unmanagedResourceDirectories in Compile += baseDirectory.value / "python"
+excludeFilter in unmanagedResources :=
+  HiddenFileFilter || pythonBesidesPyspark || "*.pyc" || "*.egg*"
+
 publishMavenStyle := true
 
 publishTo := {
```
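
To make the filter's intent concrete, here is the same decision logic as a standalone sketch (written in Python for brevity). Note that the hard-coded `/spark-hyperloglog/python` prefix means the filter assumes the project is checked out into a directory literally named `spark-hyperloglog`; the sample paths below are hypothetical.

```python
# Standalone sketch of the jar-resource exclusion rule in build.sbt above.
PYTHON_DIR = "/spark-hyperloglog/python"
PY_LIB_DIR = PYTHON_DIR + "/pyspark_hyperloglog"

def excluded_from_jar(canonical_path: str) -> bool:
    if PY_LIB_DIR in canonical_path:
        return False  # keep the python bindings themselves
    if PYTHON_DIR + "/" in canonical_path:
        return True   # drop packaging/tooling files under python/
    return False      # anything outside python/ is left alone

# Hypothetical canonical paths:
assert not excluded_from_jar("/ci/spark-hyperloglog/python/pyspark_hyperloglog/hll.py")
assert excluded_from_jar("/ci/spark-hyperloglog/python/setup.py")
assert not excluded_from_jar("/ci/spark-hyperloglog/src/main/scala/package.scala")
```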
1 change: 0 additions & 1 deletion python/VERSION

This file was deleted.

6 changes: 3 additions & 3 deletions python/setup.py
```
@@ -1,11 +1,11 @@
 from setuptools import setup
+import os
 
-with open('VERSION', 'r') as f:
-    VERSION = f.read().strip()
+version = os.environ.get('CIRCLE_TAG', 'v2.2.snapshot').lstrip('v')
 
 setup(
     name='pyspark-hyperloglog',
-    version=VERSION.split('-')[0],
+    version=version,
     description='PySpark UDFs for HyperLogLog',
     keywords=['spark', 'udf', 'hyperloglog'],
     author='Anthony Miyaguchi',
```
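
One subtlety in the new version line: Python's `str.lstrip` strips a set of leading characters rather than a single prefix. That is harmless for tags like `v2.2.0`, but it differs from the `stripPrefix("v")` used in build.sbt, which removes at most one leading `v`. A quick illustration (the tag value here is hypothetical):

```python
import os

os.environ['CIRCLE_TAG'] = 'v2.2.0'  # hypothetical tag as set on a CircleCI tag build
version = os.environ.get('CIRCLE_TAG', 'v2.2.snapshot').lstrip('v')
assert version == '2.2.0'

# lstrip removes every leading character in the given set, not just one occurrence:
assert 'vv2.2.0'.lstrip('v') == '2.2.0'
```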
