#<a name="top"></a>Cosmos Quick Start Guide
Content:

* [Introduction](#section1)
* [Assumptions](#section2)
* [Step by step guide](#section3)
    * [Step 1: Get an OAuth2 token](#section3.1)
    * [Step 2: Create a Cosmos account](#section3.2)
    * [Step 3: Upload some data to HDFS](#section3.3)
    * [Step 4: Query your data](#section3.4)
    * [Step 5: Run your first MapReduce job](#section3.5)
    * [Step 6: Download some data](#section3.6)
* [Reporting issues and contact information](#section4)

##<a name="section1"></a>Introduction
This Quick Start Guide overviews the steps a newcomer has to perform in order to get familiar with Cosmos and its functionality. For more detailed information, please refer to the official [documentation](http://fiware-cosmos.readthedocs.org/en/latest/) and the Cosmos entry in the [FI-WARE Catalogue](http://catalogue.fi-ware.org/enablers/bigdata-analysis-cosmos).

[Top](#top)

##<a name="section2"></a>Assumptions
This Quick Start Guide assumes you are going to use the already deployed Global Instance of Cosmos in FIWARE Lab. This is the <b>recommended usage of Cosmos</b>. This global instance runs in a cluster of machines, providing distributed storage (based on the Hadoop Distributed File System, HDFS) and distributed computing capabilities (based on the Hadoop MapReduce engine and some querying tools such as Hive).

In fact, the Global Instance of Cosmos in FIWARE Lab is not really a single Hadoop cluster, but one cluster in charge of storage, governed by the <i>Storage Endpoint</i> (`storage.cosmos.lab.fiware.org`), and another one in charge of computing, governed by the <i>Computing Endpoint</i> (`computing.cosmos.lab.fiware.org`).

[Top](#top)

##<a name="section3"></a>Step by step guide
###<a name="section3.1"></a>Step 1: Get an OAuth2 token
All APIs in FIWARE Lab are protected by means of [OAuth2](http://oauth.net/2/) tokens. Cosmos is not an exception, so you will need to request a valid token for your FIWARE Lab user from the <i>Computing Endpoint</i>. The `curl` tool can be used for that purpose:

    $ curl -k -X POST "https://computing.cosmos.lab.fiware.org:13000/cosmos-auth/v1/token" -H "Content-Type: application/x-www-form-urlencoded" -d "grant_type=password&username=<YOUR_USER_EMAIL>&password=<YOUR_PASSWORD>"

Where `username` and `password` are the email and password you used when you registered in FIWARE Lab. You should get something like:

    {"access_token": "3azH09G1PdaGmgBNODLOtxy52f5a00", "token_type": "Bearer", "expires_in": 3600, "refresh_token": "V2Wlk7aFCnElKlW9BOmRzGhBtqgR2z"}

The `access_token` field is the OAuth2 token.
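
If you prefer to script this step, the following minimal Python sketch (assuming the `requests` library is installed; the endpoint and payload are taken from the `curl` command above) retrieves the token programmatically:

```
import requests

# Request an OAuth2 token from the Computing Endpoint
# (verify=False mirrors curl's -k flag)
resp = requests.post(
    "https://computing.cosmos.lab.fiware.org:13000/cosmos-auth/v1/token",
    data={"grant_type": "password",
          "username": "<YOUR_USER_EMAIL>",
          "password": "<YOUR_PASSWORD>"},
    verify=False)
print(resp.json()["access_token"])
```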

[Top](#top)

###<a name="section3.2"></a>Step 2: Create a Cosmos account
At the time of writing, deploying a Cosmos Portal for FIWARE Lab is on the roadmap, but it has not been done yet.

Thus, in order to create an account you will have to send an email to `[email protected]` specifying your FIWARE Lab ID.

Such an ID can be obtained by querying FIWARE Lab's Identity Manager:

    $ curl -X GET "https://account.lab.fiware.org/user?access_token=<YOUR_OAUTH2_TOKEN>"

The result of such a query for the user `frb` is:

    {"organizations": [], "displayName": "frb", "roles": [{"name": "provider", "id": "106"}], "app_id": "9556cc76154361b3b43d7b31f0600982", "email": "[email protected]", "id": "frb"}

The interesting part is the `id` field, in the above example `frb`.
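
The same query can be scripted; here is a minimal Python sketch (assuming the `requests` library) that extracts the `id` field:

```
import requests

# Query FIWARE Lab's Identity Manager with the OAuth2 token from Step 1
resp = requests.get("https://account.lab.fiware.org/user",
                    params={"access_token": "<YOUR_OAUTH2_TOKEN>"})
print(resp.json()["id"])  # e.g. frb
```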

[Top](#top)

###<a name="section3.3"></a>Step 3: Upload some data to HDFS
You can upload your own data to your HDFS space using the WebHDFS RESTful API listening on the TCP/14000 port of the <i>Storage Endpoint</i>.

Let's start by creating a new directory (`testdir`) in our HDFS user space (in this example, `hdfs:///user/frb`). `curl` has been used as the REST client:

    $ curl -X PUT "http://storage.cosmos.lab.fiware.org:14000/webhdfs/v1/user/frb/testdir?op=MKDIRS&user.name=frb" -H "X-Auth-token: 3azH09G1PdaGmgBNODLOtxy52f5a00" | python -m json.tool
    {"boolean": true}

Now it is time to upload a local file (`testdata.txt`) to the brand new directory we have just created (please observe that the verbose option `-v` has been used):

```
$ cat testdata.txt
luke,tatooine,jedi,25
leia,alderaan,politician,25
solo,corellia,pilot,32
yoda,dagobah,jedi,275
vader,tatooine,sith,50
```

```
$ curl -v -X PUT -T testdata.txt "http://storage.cosmos.lab.fiware.org:14000/webhdfs/v1/user/frb/testdir/testdata.txt?op=CREATE&user.name=frb" -H "Content-Type: application/octet-stream" -H "X-Auth-token: 3azH09G1PdaGmgBNODLOtxy52f5a00"
*   Trying 195.235.93.174...
* Connected to storage.cosmos.lab.fiware.org (195.235.93.174) port 14000 (#0)
> PUT /webhdfs/v1/user/frb/testdir/testdata.txt?op=CREATE&user.name=frb HTTP/1.1
> Host: storage.cosmos.lab.fiware.org:14000
> User-Agent: curl/7.43.0
> Accept: */*
> Content-Type: application/octet-stream
> X-Auth-token: 3azH09G1PdaGmgBNODLOtxy52f5a00
> Content-Length: 118
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 307 Temporary Redirect
< X-Powered-By: Express
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Methods: HEAD, POST, GET, OPTIONS, DELETE
< Access-Control-Allow-Headers: origin, content-type, X-Auth-Token, Tenant-ID, Authorization
< server: Apache-Coyote/1.1
< set-cookie: hadoop.auth="u=frb&p=frb&t=simple&e=1460661599535&s=Uzn+QdUaqGpZqXsoyNb9cCUuJtU="; Version=1; Path=/; Expires=Thu, 14-Apr-2016 19:19:59 GMT; HttpOnly
< location: http://dev-fiwr-svc-01.tid.es:14000/webhdfs/v1/user/frb/testdir/testdata.txt?op=CREATE&user.name=frb&data=true
< Content-Type: application/json; charset=utf-8
< content-length: 0
< date: Thu, 14 Apr 2016 09:19:59 GMT
< connection: close
<
* Closing connection 0
```

The above command has just started the upload operation. As can be seen, the WebHDFS service redirects us to the following location:

    location: http://dev-fiwr-svc-01.tid.es:14000/webhdfs/v1/user/frb/testdir/testdata.txt?op=CREATE&user.name=frb&data=true

That's because the first operation only created the new `hdfs:///user/frb/testdir/testdata.txt` HDFS file in the Namenode; now it is time to upload the data bytes to the Datanodes, and that's achieved by PUTting the local `testdata.txt` file again, this time to the redirection URL:

```
$ curl -v -X PUT -T testdata.txt "http://dev-fiwr-svc-01.tid.es:14000/webhdfs/v1/user/frb/testdir/testdata.txt?op=CREATE&user.name=frb&data=true" -H "Content-Type: application/octet-stream" -H "X-Auth-token: 3azH09G1PdaGmgBNODLOtxy52f5a00"
*   Trying 195.235.93.174...
* Connected to storage.cosmos.lab.fiware.org (195.235.93.174) port 14000 (#0)
> PUT /webhdfs/v1/user/frb/testdir/testdata.txt?op=CREATE&user.name=frb&data=true HTTP/1.1
> Host: storage.cosmos.lab.fiware.org:14000
> User-Agent: curl/7.43.0
> Accept: */*
> Content-Type: application/octet-stream
> X-Auth-token: 3azH09G1PdaGmgBNODLOtxy52f5a00
> Content-Length: 118
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 201 Created
< X-Powered-By: Express
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Methods: HEAD, POST, GET, OPTIONS, DELETE
< Access-Control-Allow-Headers: origin, content-type, X-Auth-Token, Tenant-ID, Authorization
< server: Apache-Coyote/1.1
< set-cookie: hadoop.auth="u=frb&p=frb&t=simple&e=1460661759278&s=w59VlQYJNAoJ1iECqXrWOIXN9hQ="; Version=1; Path=/; Expires=Thu, 14-Apr-2016 19:22:39 GMT; HttpOnly
< Content-Type: application/json; charset=utf-8
< content-length: 0
< date: Thu, 14 Apr 2016 09:22:39 GMT
< connection: close
<
* Closing connection 0
```
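
The whole two-step upload can also be scripted. The following minimal Python sketch (assuming the `requests` library; endpoints and headers are copied from the `curl` commands above) disables automatic redirects so the intermediate location can be handled explicitly:

```
import requests

BASE = "http://storage.cosmos.lab.fiware.org:14000/webhdfs/v1"
HEADERS = {"X-Auth-token": "3azH09G1PdaGmgBNODLOtxy52f5a00",
           "Content-Type": "application/octet-stream"}

# First PUT: no data is sent; the service answers 307 Temporary Redirect
# with the upload location
resp = requests.put(BASE + "/user/frb/testdir/testdata.txt",
                    params={"op": "CREATE", "user.name": "frb"},
                    headers=HEADERS, allow_redirects=False)
location = resp.headers["location"]

# Second PUT: send the actual bytes to the redirection URL
with open("testdata.txt", "rb") as f:
    resp = requests.put(location, data=f, headers=HEADERS)

print(resp.status_code)  # 201 Created is expected
```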

We can check that the data has been successfully uploaded:

```
$ curl -X GET "http://storage.cosmos.lab.fiware.org:14000/webhdfs/v1/user/frb/testdir/testdata.txt?op=OPEN&user.name=frb" -H "X-Auth-token: 3azH09G1PdaGmgBNODLOtxy52f5a00"
luke,tatooine,jedi,25
leia,alderaan,politician,25
solo,corellia,pilot,32
yoda,dagobah,jedi,275
vader,tatooine,sith,50
```

NOTES:

* `dev-fiwr-svc-01.tid.es` is just an alias of `storage.cosmos.lab.fiware.org`.
* You can get more details on the two-step upload operation in the [WebHDFS specification](http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html).
* The Global Instance of Cosmos in FIWARE Lab runs the [HttpFS gateway](http://hadoop.apache.org/docs/current/hadoop-hdfs-httpfs/index.html). That's why the REST operations are done against the TCP/14000 port and not against the TCP/50070 port used by WebHDFS (which is not exposed). That's also the reason why the redirection locations point to the HttpFS server itself instead of to the real Datanode.

[Top](#top)

###<a name="section3.4"></a>Step 4: Query your data
Coming soon.

[Top](#top)

###<a name="section3.5"></a>Step 5: Run your first MapReduce job
Several pre-loaded MapReduce examples can be found in every Hadoop distribution, typically in a Java jar file called `hadoop-mapreduce-examples.jar`. In this case, the <i>Computing Endpoint</i> hosts that file at:

    /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar

For instance, you can run the <i>Word Count</i> example (also known as the "hello world" of Hadoop) by typing:

    $ curl -X POST "http://computing.cosmos.lab.fiware.org:12000/tidoop/v1/user/frb/jobs" -d '{"jar":"/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar","class_name":"wordcount","lib_jars":"/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar","input":"testdir","output":"testoutput"}' -H "Content-Type: application/json" -H "X-Auth-Token: 3azH09G1PdaGmgBNODLOtxy52f5a00"
    {"success":"true","job_id": "job_1460639183882_0001"}

As you can see, another REST API has been used, in this case the Tidoop REST API in the <i>Computing Endpoint</i>. The API also allows you to check the status of the job:

    $ curl -X GET "http://computing.cosmos.lab.fiware.org:12000/tidoop/v1/user/frb/jobs/job_1460639183882_0001" -H "X-Auth-Token: 3azH09G1PdaGmgBNODLOtxy52f5a00"
    {"success":"true","job":{"job_id":"job_1460639183882_0001","state":"SUCCEEDED","start_time":"1461060258427","user_id":"frb"}}
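
If you want to script this step as well, here is a Python sketch (assuming `requests`; the endpoint, payload and `SUCCEEDED` state are taken from the responses above, while the polling loop is just one possible way to wait for completion):

```
import time
import requests

BASE = "http://computing.cosmos.lab.fiware.org:12000/tidoop/v1/user/frb/jobs"
HEADERS = {"X-Auth-Token": "3azH09G1PdaGmgBNODLOtxy52f5a00"}

# Submit the Word Count job (same payload as the curl command above)
job = {"jar": "/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar",
       "class_name": "wordcount",
       "lib_jars": "/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar",
       "input": "testdir",
       "output": "testoutput"}
job_id = requests.post(BASE, json=job, headers=HEADERS).json()["job_id"]

# Poll the job status until it succeeds (you may also want to
# abort on a failure state here)
while True:
    status = requests.get(BASE + "/" + job_id, headers=HEADERS).json()
    state = status["job"]["state"]
    print(state)
    if state == "SUCCEEDED":
        break
    time.sleep(5)
```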

[Top](#top)

###<a name="section3.6"></a>Step 6: Download some data
Finally, the result of the MapReduce execution can be seen in the output HDFS folder (which is automatically created) by using the WebHDFS REST API in the <i>Storage Endpoint</i>:

    $ curl -X GET "http://storage.cosmos.lab.fiware.org:14000/webhdfs/v1/user/frb/testoutput?op=liststatus&user.name=frb" -H "X-Auth-Token: 3azH09G1PdaGmgBNODLOtxy52f5a00"
    {"FileStatuses":{"FileStatus":[{"pathSuffix":"_SUCCESS","type":"FILE","length":0,"owner":"frb","group":"frb","permission":"644","accessTime":1461060272601,"modificationTime":1461060272616,"blockSize":134217728,"replication":3},{"pathSuffix":"part-r-00000","type":"FILE","length":47,"owner":"frb","group":"frb","permission":"644","accessTime":1461060272228,"modificationTime":1461060272409,"blockSize":134217728,"replication":3}]}}
    $ curl -X GET "http://storage.cosmos.lab.fiware.org:14000/webhdfs/v1/user/frb/testoutput/part-r-00000?op=open&user.name=frb" -o output.txt -H "X-Auth-Token: 3azH09G1PdaGmgBNODLOtxy52f5a00"
    $ cat output.txt
    leia,alderaan,politician,25 1
    luke,tatooine,jedi,25 1
    solo,corellia,pilot,32 1
    vader,tatooine,sith,50 1
    yoda,dagobah,jedi,275 1
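
Both the listing and the download can be scripted too; here is a minimal Python sketch (assuming `requests`) that saves the reducer output locally:

```
import requests

BASE = "http://storage.cosmos.lab.fiware.org:14000/webhdfs/v1/user/frb/testoutput"
HEADERS = {"X-Auth-Token": "3azH09G1PdaGmgBNODLOtxy52f5a00"}

# List the files in the output folder
listing = requests.get(BASE, params={"op": "LISTSTATUS", "user.name": "frb"},
                       headers=HEADERS).json()
print([f["pathSuffix"] for f in listing["FileStatuses"]["FileStatus"]])

# Download the reducer output to a local file
resp = requests.get(BASE + "/part-r-00000",
                    params={"op": "OPEN", "user.name": "frb"},
                    headers=HEADERS)
with open("output.txt", "wb") as f:
    f.write(resp.content)
```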

[Top](#top)

##<a name="section4"></a>Reporting issues and contact information
There are several channels suited for reporting issues and asking questions in general. Which one to use depends on the nature of the question:

* Use [stackoverflow.com](http://stackoverflow.com) for specific questions about this software. Typically, these will be related to installation problems, errors and bugs. Development questions when forking the code are welcome as well. Use the `fiware-cosmos` tag.
* Use [ask.fiware.org](https://ask.fiware.org/questions/) for general questions about FIWARE, e.g. how many cities are using FIWARE, how can I join the accelerator program, etc. Even for general questions about this software, for instance, use cases or architectures you want to discuss.
* Personal email:
    * [[email protected]](mailto:[email protected]) **[Main contributor]**
    * [[email protected]](mailto:[email protected]) **[Contributor]**

**NOTE**: Please try to avoid personally emailing the contributors unless they ask for it. In fact, if you send a private email you will probably receive an automatic response urging you to use [stackoverflow.com](http://stackoverflow.com) or [ask.fiware.org](https://ask.fiware.org/questions/). This is because using the mentioned channels creates a public knowledge base that can be useful for future users; private email is just private and cannot be shared.

[Top](#top)
#!/usr/bin/env python
# -*- coding: utf-8 -*-

# Copyright 2016 Telefonica Investigación y Desarrollo, S.A.U
#
# This file is part of fiware-cosmos (FI-WARE project).
#
# fiware-cosmos is free software: you can redistribute it and/or modify it under the terms of the GNU Affero
# General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your
# option) any later version.
# fiware-cosmos is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the
# implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License
# for more details.
#
# You should have received a copy of the GNU Affero General Public License along with fiware-cosmos. If not, see
# http://www.gnu.org/licenses/.
#
# For those usages not covered by the GNU Affero General Public License please contact with iot_support at tid dot es

# imports
import sys
import pyhs2
from pyhs2.error import Pyhs2Exception

# get the input parameters
if len(sys.argv) != 6:
    print 'Usage: python hiveserver2-client.py <hive_host> <hive_port> <db_name> <hadoop_user> <hadoop_password>'
    sys.exit()

hiveHost = sys.argv[1]
hivePort = int(sys.argv[2])  # the port must be an integer for the underlying socket connection
dbName = sys.argv[3]
hadoopUser = sys.argv[4]
hadoopPassword = sys.argv[5]

# do the connection
with pyhs2.connect(host=hiveHost,
                   port=hivePort,
                   authMechanism="PLAIN",
                   user=hadoopUser,
                   password=hadoopPassword,
                   database=dbName) as conn:
    # get a client
    with conn.cursor() as client:
        # create a loop attending HiveQL queries
        while True:
            query = raw_input('remotehive> ')

            try:
                if not query:
                    continue

                if query == 'exit':
                    sys.exit()

                # execute the query
                client.execute(query)

                # get the content
                for row in client.fetch():
                    print row

            except Pyhs2Exception, ex:
                print ex.errorMessage