Interacting With Stack

Sam Jackson edited this page Jun 1, 2018 · 8 revisions

DB & Controller

STACK uses two wrappers, DB & Controller, to let users control the tool from the command line. The DB wrapper is used to write and retrieve information from the database, such as information on a project account or a collector. The Controller wrapper is used to start/stop/restart all STACK processes: collection, data processing, and data insertion.

Listed below are the accessible methods. Further down are detailed instructions for using these wrappers to interact with STACK.

DB Wrapper Methods

  • setup(): Creates a new project account
  • auth(): Authenticates a project account
  • get_project_list(): Lists all current projects saved to a server
  • get_collector_ids(): Lists the names and IDs for each collector for a given project account
  • get_project_detail(): Returns all information on a current project and the collectors it owns
  • get_collector_detail(): Returns the detail for a single collector
  • set_collector_detail(): Creates a new collector for a given project account
  • get_network_detail(): Returns the detail for a network module for a given account. Not used directly by users

Controller Methods

  • start(): Starts a given process
  • stop(): Stops a given process
  • restart(): Restarts a given process

__main__.py

Users should call __main__.py from the main directory to interact with STACK. It invokes one of the functions above, depending on the parameters passed. All calls return a standard JSON response of:

{'status': 'x', 'message': 'xxxxx'}

WHERE

  • status = 1 on success; 0 on failure
  • message = ‘success’ or ‘failure’

Additional data may be contained in this JSON response depending on the function, as detailed below.
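The response shown above uses single-quoted keys, which reads like a Python dict literal rather than strict JSON. A minimal sketch of how a calling script might interpret it, assuming the output is printed in exactly that style; `parse_response` and `is_success` are hypothetical helper names and are not part of STACK:

```python
import ast


def parse_response(text):
    """Parse a STACK-style response string into a dict.

    Assumption: the response is printed as a Python dict literal
    (single-quoted keys), so ast.literal_eval is used instead of json.loads.
    """
    response = ast.literal_eval(text.strip())
    if not isinstance(response, dict):
        raise ValueError("expected a dict-style response")
    return response


def is_success(response):
    """Return True when status is 1 (success), False otherwise."""
    return response.get("status") == 1


sample = "{'status': 1, 'message': 'success'}"
parsed = parse_response(sample)
```

If the tool turns out to emit strict JSON with double quotes, `json.loads` would be the more natural choice.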

To invoke a DB wrapper method, the standard syntax is:

 sudo python __main__.py db {db_method} {params}

WHERE

  • db_method - from the list of accessible methods above
  • params - listed below for each method

Similarly, to invoke a Controller method, the standard syntax is:

  sudo python __main__.py controller {controller_method} {params}

Below we have documented how to call each wrapper method using this syntax.
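When calls are scripted rather than typed, the two syntaxes above can be assembled as an argument list. This is only a sketch: `build_command` is a hypothetical helper, not something STACK ships, and it does nothing beyond arranging the pieces in the documented order.

```python
def build_command(wrapper, *args):
    """Return the argv list for a STACK call.

    wrapper is 'db' or 'controller'; args are the method name followed by
    its params, in the order given by the syntax above.
    """
    if wrapper not in ("db", "controller"):
        raise ValueError("wrapper must be 'db' or 'controller'")
    if not args:
        raise ValueError("a method name is required")
    return ["sudo", "python", "__main__.py", wrapper] + [str(a) for a in args]


argv = build_command("db", "get_project_list")
```

The resulting list could be handed to `subprocess.call(argv)` from the STACK main directory.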

DB Methods

setup()

Used to create a new project account. In addition to saving the project account's name, password, email, and description, this function also saves standard configuration information for the account. It returns a failed status if an account with the same details already exists.

  • Project Name: A name for the project account
  • Password: An account password
  • Email: Used for project notifications
  • Description: A short account description

CLI Syntax

sudo python __main__.py db create_project

Response

{'status': 0|1, 'message': 'Success'|'Failed'}

auth()

When given a project_name and password pair, the auth function returns a project_id for the given project account.

Params

  • project_name
  • password

CLI Syntax

sudo python __main__.py db auth [project_name] [password]

Example

sudo python __main__.py db auth test_account 1234

Response

{
  'status': 0|1,
  'message': 'Success'|'Failed',
  'project_id': project_id|None
}
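Most of the other calls need the project_id that auth() returns. A rough sketch of chaining the two steps, assuming the response has already been parsed into a dict; the helper names and the `abc123` value are placeholders for illustration, not real STACK identifiers:

```python
def project_id_from_auth(response):
    """Return the project_id from an auth() response, or raise on failure."""
    if response.get("status") != 1 or response.get("project_id") is None:
        raise RuntimeError("auth failed: %s" % response.get("message"))
    return response["project_id"]


def collector_ids_command(project_id):
    """Build the get_collector_ids call for an authenticated project."""
    return ["sudo", "python", "__main__.py", "db",
            "get_collector_ids", str(project_id)]


# Hypothetical response; "abc123" is a placeholder, not a real project_id.
auth_response = {"status": 1, "message": "Success", "project_id": "abc123"}
next_call = collector_ids_command(project_id_from_auth(auth_response))
```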

get_project_list()

Returns a list of all project accounts listed in the master config DB.

CLI Syntax

sudo python __main__.py db get_project_list

Response

{
  'status': 0|1,
  'message': 'Success'|'Failed',
  'project_count': count_value,
  'project_list': [list_of_project_docs]
}

get_collector_ids()

When passed a project_id, this function will return a list of collector names and IDs that the given project account owns.

Params

  • project_id

CLI Syntax

sudo python __main__.py db get_collector_ids [project_id]

Response

{
  'status': 0|1,
  'project_name': project_name,
  'collectors': [list_of_collector_names_and_ids]
}

get_project_detail()

When given a project_id, returns that project’s details, along with the details of every collector the project owns.

Params

  • project_id

CLI Syntax

sudo python __main__.py db get_project_detail [project_id]

Response

{
  'status': 1|0,
  'message': 'Success'|'Failed',
  'project_id': id,
  'project_name': name,
  'project_description': description,
  'project_config_db': configdb_name,
  'collectors': [list_of_collector_docs]
}

get_collector_detail()

When given a project_id and collector_id, returns the details for that given collector.

Params

  • project_id
  • collector_id

CLI Syntax

sudo python __main__.py db get_collector_detail [project_id] [collector_id]

Response

{
  'status': 0|1,
  'message': 'Success'|'Failed',
  'collector': {collector_doc_from_mongo}
}

get_network_detail()

When given a project_id and network module name (e.g. ‘twitter’), returns the details for that network’s control module.

Params

  • project_id
  • network

CLI Syntax

sudo python __main__.py db get_network_detail [project_id] [network]

Response

{
  'status': 0|1,
  'message': 'Success'|'Failed',
  'collector': {network_doc_from_mongo}
}

set_collector_detail()

For a project account, sets up a config doc for the given collector within the user’s config database. Can also be called as an update function. The wrapper checks if the collector exists (based on a unique naming scheme) and will update if it is found in the DB. Also creates an entry in the project account’s ‘project_list’ key in the project document.

Similar to setup.py, this script simply asks the user to input the following params when called.

Params

  • Project Account Name (required): The name of your project account.
  • Collector Name (required): Non-unique name to identify your collector instance.
  • Language(s) (optional): A list of BCP-47 language codes. If this is used, the collector will only grab tweets in these languages. Twitter's developer documentation has more on language parameters.
  • Location(s) (optional): A list of location coordinates. If used, we will collect all geocoded tweets within the location bounding box. Bounding boxes must consist of four lat/long pairs. Twitter's developer documentation has more on location formatting for the Twitter API.
  • Terms (optional): A line-item list of terms for the collector to stream.
  • API (required): Three options: track, follow, or none. Each collector can stream from one part of Twitter's Streaming API:
    • Track: Collects all mentions (hashtags included) for a given list of terms.
    • Follow: Collects all tweets, retweets, and replies for a given user handle. Each term must be a valid Twitter screen name.
    • None: Only choose this option if you have not entered a terms list and are collecting for a given set of language(s) and/or location(s). If you do not track a terms list, make sure you are tracking at least one language or location.
  • OAuth Information: Four keys used to authenticate with the Twitter API. To get consumer & access tokens, first register your app on https://dev.twitter.com/apps/new. Navigate to Keys and Access Tokens and click "Create my access token." NOTE - Each collector needs to have a unique set of access keys, or else the Streaming API will limit your connection. The four keys include:
    • Consumer Key
    • Consumer Secret
    • Access Token
    • Access Token Secret
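The rules in the list above can be checked before the prompts are answered. The sketch below is one reading of those rules, not STACK's own validation: `validate_collector` is a hypothetical helper, and the requirement that track and follow need a terms list is inferred from the option descriptions rather than stated outright.

```python
def validate_collector(api, terms=None, languages=None, locations=None):
    """Return a list of problems with a proposed collector configuration."""
    terms = terms or []
    languages = languages or []
    locations = locations or []
    api = api.lower()
    problems = []
    if api not in ("track", "follow", "none"):
        problems.append("api must be track, follow, or none")
    # Inferred rule: track and follow both operate on a terms list.
    if api in ("track", "follow") and not terms:
        problems.append("track and follow need a terms list")
    if api == "none" and terms:
        problems.append("api 'none' is meant for collectors without a terms list")
    if not terms and not languages and not locations:
        problems.append("track at least one term, language, or location")
    return problems


ok = validate_collector("none", languages=["en"])
empty = validate_collector("none")
```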

A note on location tracking: Location tracking with Twitter is an OR filter. We will collect all tweets that match other filters (such as a terms list or a language identifier) OR tweets in the given location. Please plan accordingly.
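To make the OR behaviour concrete, here is a simplified illustration. It is not Twitter's matching logic: the tweet dict shape, the substring match, and the (west, south, east, north) box layout are all assumptions chosen for the example.

```python
def in_bounding_box(point, box):
    """point is (lon, lat); box is (west, south, east, north)."""
    lon, lat = point
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


def would_collect(tweet, terms, box):
    """Mimic the OR filter: a term match OR a location match is enough."""
    text = tweet["text"].lower()
    term_match = any(term.lower() in text for term in terms)
    coords = tweet.get("coordinates")
    location_match = coords is not None and in_bounding_box(coords, box)
    return term_match or location_match


box = (-74.3, 40.5, -73.7, 40.9)  # illustrative box, roughly New York City
terms = ["election"]
term_only = would_collect({"text": "Election day", "coordinates": None}, terms, box)
place_only = would_collect({"text": "Lunch time", "coordinates": (-74.0, 40.7)}, terms, box)
neither = would_collect({"text": "Lunch time", "coordinates": (2.35, 48.85)}, terms, box)
```

The second tweet is collected even though it mentions none of the terms, which is the case to plan for when estimating collection volume.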

Response

{'status': 0|1, 'message': 'Success'|'Failed'}

update_collector_detail()

Once a collector has been created, call this method to update the given collector's details. You specify the parameter to be updated via the command line call. Only one parameter can be updated at a given time.

CLI Syntax

sudo python __main__.py db update_collector_detail [collector_param]

Collector Params

  • collector_name: Used to change the collector's name
  • api: Used to change which API the collector filters from
  • oauth: Used to change OAuth info used to authenticate with the network
  • terms: Add new term(s) and/or change the collection status of an existing term. When called will prompt for the following info:
    • Term: the term value
    • Collect: 0 (for false) or 1 (for true)
    • Continue [y/n]: y to update more terms, n to terminate
  • languages: set new language codes
  • locations: set new location codes

NOTE - For languages and locations, an update will overwrite existing codes. Be sure to include a full list of codes upon update.
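Because an update replaces the stored codes, one way to avoid losing existing ones is to build the full list first and submit that. A small sketch, with `full_code_list` as a hypothetical helper name:

```python
def full_code_list(existing, additions):
    """Combine existing and new codes, keeping order and dropping duplicates."""
    merged = []
    for code in list(existing) + list(additions):
        if code not in merged:
            merged.append(code)
    return merged


# Existing codes would come from get_collector_detail; these are examples.
languages = full_code_list(["en", "es"], ["fr", "en"])
```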

Response

{'status': 0|1, 'message': 'Successful'|'Failed'}

set_network_status() & set_collector_status()

These two methods handle flag interaction for STACK scripts and are not called directly as part of the DB wrapper. Instead, these two functions are called by the STACK Controller. In STACK v1.0 all flag interaction has been abstracted away from the front-end.

Controller Methods

The Controller methods start, stop, and restart collectors, data processors, and inserters. All three process types share the following syntax:

sudo python __main__.py controller collect|process|insert start|stop|restart [project_id] [collector_id|network]

Detailed below is the specific start, stop, and restart syntax for each of the three process types. To learn more about the difference between collectors, data processors, and inserters, please consult the DB & Architecture section of the wiki.
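For scripted use, the Controller syntax can be assembled and checked up front. This is a sketch only: `controller_command` is a hypothetical helper, and the IDs in the example are placeholders.

```python
PROCESSES = ("collect", "process", "insert")
COMMANDS = ("start", "stop", "restart")


def controller_command(process, command, project_id, target):
    """Build the argv list for a Controller call.

    target is a collector_id for 'collect', or a network name such as
    'twitter' for 'process' and 'insert'.
    """
    if process not in PROCESSES:
        raise ValueError("unknown process type: %r" % process)
    if command not in COMMANDS:
        raise ValueError("unknown command: %r" % command)
    return ["sudo", "python", "__main__.py", "controller",
            process, command, str(project_id), str(target)]


# "abc123" is a placeholder project_id, not a real one.
argv = controller_command("process", "start", "abc123", "twitter")
```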

Collectors

Params

  • start|stop|restart - The command for the script
    • 'start': starts a collector
    • 'stop': stops a collector
    • 'restart': restarts a collector
  • project_id - ID for the project account
  • collector_id - ID for the collector to be called

CLI Syntax

sudo python __main__.py controller collect start|stop|restart [project_id] [collector_id]

Upon Start - If successful, the script will return the following response, indicating the process is now running in the background:

Flags set!
Initializing daemon...

Upon Stop or Restart - The script will attempt to shut down the process. NOTE - This can take some time as the process needs to disconnect from its streaming connection. Please wait while the program counts and works to shut down.

Data Processors

Params

  • start|stop|restart - The command for the script
    • 'start': starts a data processor
    • 'stop': stops a data processor
    • 'restart': restarts a data processor
  • project_id - ID for the project account
  • network - The social network module for which data should be processed. NOTE - Currently STACK only supports Twitter, as is reflected in the example below.

CLI Syntax

sudo python __main__.py controller process start|stop|restart [project_id] twitter

Upon Start - If successful, the script will return the following response, indicating the process is now running in the background:

Flags set!
Initializing daemon...

Upon Stop or Restart - The script will attempt to shut down the process. NOTE - This can take some time as the process needs to disconnect from its streaming connection. Please wait while the program counts and works to shut down.

Inserter

Params

  • start|stop|restart - The command for the script
    • 'start': starts an inserter
    • 'stop': stops an inserter
    • 'restart': restarts an inserter
  • project_id - ID for the project account
  • network - The social network module for which data should be inserted. NOTE - Currently STACK only supports Twitter, as is reflected in the example below.

CLI Syntax

sudo python __main__.py controller insert start|stop|restart [project_id] twitter

Upon Start - If successful, the script will return the following response, indicating the process is now running in the background:

Flags set!
Initializing daemon...

Upon Stop or Restart - The script will attempt to shut down the process. NOTE - This can take some time as the process needs to disconnect from its streaming connection. Please wait while the program counts and works to shut down.