[#187][#215] implementing storage tiering for data from remote zones
Robert Verkerk authored and alanking committed Nov 28, 2023
1 parent 67a2185 commit b50254b
Showing 7 changed files with 64 additions and 30 deletions.
8 changes: 4 additions & 4 deletions README.md
@@ -160,24 +160,24 @@ imeta add -R medium_resc irods::storage_tiering::minimum_restage_tier true

A tier within a tier group may identify data objects which are in violation by an alternate mechanism beyond the built-in time-based constraint. This allows the data grid administrator to take additional context into account when identifying data objects to migrate.

Data objects which have been labeled via particular metadata, or which reside within a specific collection, are owned by a particular user, or belong to a particular project may be identified through a custom query. The default attribute **irods::storage_tiering::query** is used to hold this custom query. To configure the custom query, attach the query to the root resource of the tier within the tier group. This query will be used in place of the default time-based query for that tier. For efficiency, the example query restricts results by resource ID to the leaves of the root resource. Please note that any custom query must return DATA_NAME, COLL_NAME, USER_NAME, DATA_REPL_NUM, in that order, as this is a convention of this rule engine plugin.
Data objects which have been labeled via particular metadata, or which reside within a specific collection, are owned by a particular user, or belong to a particular project may be identified through a custom query. The default attribute **irods::storage_tiering::query** is used to hold this custom query. To configure the custom query, attach the query to the root resource of the tier within the tier group. This query will be used in place of the default time-based query for that tier. For efficiency, the example query restricts results by resource ID to the leaves of the root resource. Please note that any custom query must return DATA_NAME, COLL_NAME, USER_NAME, USER_ZONE, DATA_REPL_NUM, in that order, as this is a convention of this rule engine plugin.

```
imeta set -R fast_resc irods::storage_tiering::query "SELECT DATA_NAME, COLL_NAME, USER_NAME, DATA_REPL_NUM WHERE META_DATA_ATTR_NAME = 'irods::access_time' AND META_DATA_ATTR_VALUE < 'TIME_CHECK_STRING' AND DATA_RESC_ID IN ('10068', '10069')"
imeta set -R fast_resc irods::storage_tiering::query "SELECT DATA_NAME, COLL_NAME, USER_NAME, USER_ZONE, DATA_REPL_NUM WHERE META_DATA_ATTR_NAME = 'irods::access_time' AND META_DATA_ATTR_VALUE < 'TIME_CHECK_STRING' AND DATA_RESC_ID IN ('10068', '10069')"
```

The example above implements the default query. Note that the string `TIME_CHECK_STRING` is used in place of an actual time. This string will be replaced by the storage tiering framework with the appropriately computed time given the previous parameters.
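
As a rough illustration of that substitution (not the plugin's actual code), the placeholder replacement could be sketched in C++ as follows. The helper name `substitute_time_check` and its parameters are hypothetical, and the sketch assumes the cutoff is simply the current epoch time minus the tier's configured time in seconds.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper, not part of the plugin: replaces every occurrence of
// the TIME_CHECK_STRING placeholder with a computed cutoff time.
// Assumption: the cutoff is "now" minus the tier's time setting, in seconds.
std::string substitute_time_check(std::string query,
                                  std::int64_t now_epoch,
                                  std::int64_t tier_time_seconds)
{
    const std::string placeholder = "TIME_CHECK_STRING";
    const std::string cutoff = std::to_string(now_epoch - tier_time_seconds);

    // Resume the search after each replacement so the inserted text is never rescanned.
    for (std::string::size_type pos = query.find(placeholder);
         pos != std::string::npos;
         pos = query.find(placeholder, pos + cutoff.size()))
    {
        query.replace(pos, placeholder.size(), cutoff);
    }
    return query;
}
```

Under these assumptions, calling `substitute_time_check(query, now, 5)` for a tier configured with `irods::storage_tiering::time 5` would yield a query that selects objects last accessed more than five seconds before `now`.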

Any number of queries may be attached in order to provide a range of criteria by which data may be tiered, such as user-applied metadata. To allow users to archive their own data via metadata, they may tag an object with, for example, `archive_object true`. The tier may then have a query added to support this.

```
imeta set -R fast_resc irods::storage_tiering::query "SELECT DATA_NAME, COLL_NAME, USER_NAME, DATA_REPL_NUM WHERE META_DATA_ATTR_NAME = 'archive_object' AND META_DATA_ATTR_VALUE = 'true' AND DATA_RESC_ID IN ('10068', '10069')"
imeta set -R fast_resc irods::storage_tiering::query "SELECT DATA_NAME, COLL_NAME, USER_NAME, USER_ZONE, DATA_REPL_NUM WHERE META_DATA_ATTR_NAME = 'archive_object' AND META_DATA_ATTR_VALUE = 'true' AND DATA_RESC_ID IN ('10068', '10069')"
```
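
Because the column order is a fixed convention, a consumer of the result rows can rely on position alone. The following C++ sketch shows one way such a row might be unpacked after this change. The `violating_object` type and both function names are hypothetical and are not taken from the plugin.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical row type; the field order mirrors the required column order
// DATA_NAME, COLL_NAME, USER_NAME, USER_ZONE, DATA_REPL_NUM.
struct violating_object {
    std::string data_name;
    std::string coll_name;
    std::string user_name;
    std::string user_zone;
    std::string replica_number;
};

// Unpacks one result row by position. A four-column row written for the
// older convention (without USER_ZONE) is rejected rather than misread.
violating_object to_violating_object(const std::vector<std::string>& row)
{
    if (row.size() != 5) {
        throw std::invalid_argument(
            "expected DATA_NAME, COLL_NAME, USER_NAME, USER_ZONE, DATA_REPL_NUM");
    }
    return {row[0], row[1], row[2], row[3], row[4]};
}

// Joins the collection and data object names into a logical path.
std::string logical_path(const violating_object& obj)
{
    return obj.coll_name + "/" + obj.data_name;
}
```

The point of the sketch is that a query still returning only four columns would shift DATA_REPL_NUM into the USER_ZONE position, so existing custom queries need the extra column added.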

Queries may also be provided by using the Specific Query interface within iRODS. The archive object query may be stored by an iRODS administrator as follows.

```
iadmin asq "SELECT DATA_NAME, COLL_NAME, USER_NAME, DATA_REPL_NUM WHERE META_DATA_ATTR_NAME = 'archive_object' AND META_DATA_ATTR_VALUE = 'true' AND DATA_RESC_ID IN ('10068', '10069')" archive_query
iadmin asq "SELECT DATA_NAME, COLL_NAME, USER_NAME, USER_ZONE, DATA_REPL_NUM WHERE META_DATA_ATTR_NAME = 'archive_object' AND META_DATA_ATTR_VALUE = 'true' AND DATA_RESC_ID IN ('10068', '10069')" archive_query
```

The query attached to the root of a storage tier would then require the use of a metadata unit of `specific`:
13 changes: 11 additions & 2 deletions exec_as_user.hpp
@@ -6,7 +6,7 @@

namespace irods {
template <typename Function>
int exec_as_user(rcComm_t* _comm, const std::string& _user_name, Function _func)
int exec_as_user(rcComm_t* _comm, const std::string& _user_name, const std::string& _user_zone, Function _func)
{
auto& user = _comm->clientUser;

@@ -16,11 +16,20 @@ namespace irods {
//}

const std::string old_user_name = user.userName;
const std::string old_user_zone = user.rodsZone;

rstrcpy(user.userName, _user_name.data(), NAME_LEN);
rstrcpy(user.rodsZone, _user_zone.data(), NAME_LEN);

irods::at_scope_exit<std::function<void()>> at_scope_exit{[&user, &old_user_name] {
rodsLog(
LOG_DEBUG,
"Executing as user [%s] from zone [%s]",
user.userName,
user.rodsZone);

irods::at_scope_exit<std::function<void()>> at_scope_exit{[&user, &old_user_name, &old_user_zone] {
rstrcpy(user.userName, old_user_name.c_str(), NAME_LEN);
rstrcpy(user.rodsZone, old_user_zone.c_str(), NAME_LEN);
}};

return _func(_comm);
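
The save, swap, and restore pattern in this hunk can be sketched without the iRODS headers. The version below is a minimal illustration only: `client_user`, `identity_guard`, and `run_as` are hypothetical stand-ins, the 64-byte buffer size is an assumption of the sketch, and a destructor takes the place of `irods::at_scope_exit`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Stand-in for the identity fields of a connection's client user.
// Assumption: fixed-size name buffers of 64 bytes.
constexpr std::size_t name_len = 64;

struct client_user {
    char user_name[name_len];
    char zone[name_len];
};

// Swaps in a new user name and zone on construction and restores the
// previous pair on destruction, even if the wrapped call throws.
class identity_guard {
public:
    identity_guard(client_user& user, const std::string& name, const std::string& zone)
        : user_{user}, old_name_{user.user_name}, old_zone_{user.zone}
    {
        assign(name, zone);
    }

    ~identity_guard() { assign(old_name_, old_zone_); }

    identity_guard(const identity_guard&) = delete;
    identity_guard& operator=(const identity_guard&) = delete;

private:
    // snprintf truncates and always null-terminates within name_len bytes.
    void assign(const std::string& name, const std::string& zone)
    {
        std::snprintf(user_.user_name, name_len, "%s", name.c_str());
        std::snprintf(user_.zone, name_len, "%s", zone.c_str());
    }

    client_user& user_;
    const std::string old_name_;
    const std::string old_zone_;
};

// Runs func with the given identity, then restores the original one.
template <typename Function>
int run_as(client_user& user, const std::string& name, const std::string& zone, Function func)
{
    identity_guard guard{user, name, zone};
    return func(user);
}
```

Restoring the zone together with the user name matters for remote-zone users: restoring only the name would leave the connection acting as the original user in the wrong zone.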
20 changes: 14 additions & 6 deletions libirods_rule_engine_plugin-unified_storage_tiering.cpp
@@ -351,6 +351,7 @@ namespace {
const std::string& _instance_name,
const std::string& _object_path,
const std::string& _user_name,
const std::string& _user_zone,
const std::string& _source_replica_number,
const std::string& _source_resource,
const std::string& _destination_resource,
@@ -420,13 +421,14 @@ namespace {
parser.first_resc(source_resource);

auto proxy_conn = irods::proxy_connection();
rcComm_t* comm = proxy_conn.make(_rei->rsComm->clientUser.userName);
rcComm_t* comm = proxy_conn.make(_rei->rsComm->clientUser.userName, _rei->rsComm->clientUser.rodsZone);

irods::storage_tiering st{comm, _rei, plugin_instance_name};

st.migrate_object_to_minimum_restage_tier(
object_path,
_rei->rsComm->clientUser.userName,
_rei->rsComm->clientUser.rodsZone,
source_resource);
}
else if("pep_api_data_obj_open_post" == _rn ||
@@ -468,12 +470,13 @@ namespace {
auto [object_path, resource_name] = opened_objects[l1_idx];

auto proxy_conn = irods::proxy_connection();
rcComm_t* comm = proxy_conn.make(_rei->rsComm->clientUser.userName);
rcComm_t* comm = proxy_conn.make(_rei->rsComm->clientUser.userName, _rei->rsComm->clientUser.rodsZone);

irods::storage_tiering st{comm, _rei, plugin_instance_name};
st.migrate_object_to_minimum_restage_tier(
object_path,
_rei->rsComm->clientUser.userName,
_rei->rsComm->clientUser.rodsZone,
resource_name);
}
}
@@ -492,13 +495,15 @@ namespace {
const std::string& _group_name,
const std::string& _object_path,
const std::string& _user_name,
const std::string& _user_zone,
const std::string& _source_replica_number,
const std::string& _source_resource,
const std::string& _destination_resource) {
_st.apply_tier_group_metadata_to_object(
_group_name,
_object_path,
_user_name,
_user_zone,
_source_replica_number,
_source_resource,
_destination_resource);
@@ -714,19 +719,21 @@ irods::error exec_rule_expression(
irods::storage_tiering::policy::data_movement ==
rule_obj.at("rule-engine-operation")) {
try {
// proxy for provided user name
// proxy for provided user name and zone
const std::string& user_name = rule_obj["user-name"];
const std::string& user_zone = rule_obj["user-zone"];
auto& pin = plugin_instance_name;

auto proxy_conn = irods::proxy_connection();
rcComm_t* comm = proxy_conn.make( rule_obj["user-name"]);
rcComm_t* comm = proxy_conn.make( rule_obj["user-name"], rule_obj["user-zone"]);

auto status = irods::exec_as_user(comm, user_name, [& pin, & rule_obj](auto& comm) -> int{
auto status = irods::exec_as_user(comm, user_name, user_zone, [& pin, & rule_obj](auto& comm) -> int{
return apply_data_movement_policy(
comm,
plugin_instance_name,
rule_obj["object-path"],
rule_obj["user-name"],
rule_obj["user-zone"],
rule_obj["source-replica-number"],
rule_obj["source-resource"],
rule_obj["destination-resource"],
@@ -735,12 +742,13 @@ irods::error exec_rule_expression(
});

irods::storage_tiering st{comm, rei, plugin_instance_name};
status = irods::exec_as_user(comm, user_name, [& st, & rule_obj](auto& comm) -> int{
status = irods::exec_as_user(comm, user_name, user_zone, [& st, & rule_obj](auto& comm) -> int{
return apply_tier_group_metadata_policy(
st,
rule_obj["group-name"],
rule_obj["object-path"],
rule_obj["user-name"],
rule_obj["user-zone"],
rule_obj["source-replica-number"],
rule_obj["source-resource"],
rule_obj["destination-resource"]);
12 changes: 6 additions & 6 deletions packaging/test_plugin_unified_storage_tiering.py
@@ -240,7 +240,7 @@ def setUp(self):
admin_session.assert_icommand('imeta add -R rnd2 irods::storage_tiering::group example_group 2')
admin_session.assert_icommand('imeta add -R rnd0 irods::storage_tiering::time 5')
admin_session.assert_icommand('imeta add -R rnd1 irods::storage_tiering::time 15')
admin_session.assert_icommand('''imeta set -R rnd1 irods::storage_tiering::query "SELECT DATA_NAME, COLL_NAME, USER_NAME, DATA_REPL_NUM where RESC_NAME = 'ufs2' || = 'ufs3' and META_DATA_ATTR_NAME = 'irods::access_time' and META_DATA_ATTR_VALUE < 'TIME_CHECK_STRING'"''')
admin_session.assert_icommand('''imeta set -R rnd1 irods::storage_tiering::query "SELECT DATA_NAME, COLL_NAME, USER_NAME, USER_ZONE, DATA_REPL_NUM where RESC_NAME = 'ufs2' || = 'ufs3' and META_DATA_ATTR_NAME = 'irods::access_time' and META_DATA_ATTR_VALUE < 'TIME_CHECK_STRING'"''')
admin_session.assert_icommand('imeta add -R rnd0 irods::storage_tiering::minimum_delay_time_in_seconds 1')
admin_session.assert_icommand('imeta add -R rnd0 irods::storage_tiering::maximum_delay_time_in_seconds 2')
admin_session.assert_icommand('imeta add -R rnd1 irods::storage_tiering::minimum_delay_time_in_seconds 1')
@@ -485,7 +485,7 @@ def setUp(self):
admin_session.assert_icommand('imeta add -R rnd2 irods::storage_tiering::group example_group 2')
admin_session.assert_icommand('imeta add -R rnd0 irods::storage_tiering::time 5')
admin_session.assert_icommand('imeta add -R rnd1 irods::storage_tiering::time 15')
admin_session.assert_icommand('''imeta set -R rnd1 irods::storage_tiering::query "SELECT DATA_NAME, COLL_NAME, USER_NAME, DATA_REPL_NUM where RESC_NAME = 'ufs2' || = 'ufs3' and META_DATA_ATTR_NAME = 'irods::access_time' and META_DATA_ATTR_VALUE < 'TIME_CHECK_STRING'"''')
admin_session.assert_icommand('''imeta set -R rnd1 irods::storage_tiering::query "SELECT DATA_NAME, COLL_NAME, USER_NAME, USER_ZONE, DATA_REPL_NUM where RESC_NAME = 'ufs2' || = 'ufs3' and META_DATA_ATTR_NAME = 'irods::access_time' and META_DATA_ATTR_VALUE < 'TIME_CHECK_STRING'"''')

admin_session.assert_icommand('iadmin mkresc ufs0g2 unixfilesystem '+test.settings.HOSTNAME_1 +':/tmp/irods/ufs0g2', 'STDOUT_SINGLELINE', 'unixfilesystem')
admin_session.assert_icommand('iadmin mkresc ufs1g2 unixfilesystem '+test.settings.HOSTNAME_1 +':/tmp/irods/ufs1g2', 'STDOUT_SINGLELINE', 'unixfilesystem')
@@ -498,7 +498,7 @@ def setUp(self):
admin_session.assert_icommand('imeta add -R ufs0g2 irods::storage_tiering::time 5')
admin_session.assert_icommand('imeta add -R ufs1g2 irods::storage_tiering::time 15')

admin_session.assert_icommand('''imeta set -R ufs1g2 irods::storage_tiering::query "SELECT DATA_NAME, COLL_NAME, USER_NAME, DATA_REPL_NUM where RESC_NAME = 'ufs1g2' and META_DATA_ATTR_NAME = 'irods::access_time' and META_DATA_ATTR_VALUE < 'TIME_CHECK_STRING'"''')
admin_session.assert_icommand('''imeta set -R ufs1g2 irods::storage_tiering::query "SELECT DATA_NAME, COLL_NAME, USER_NAME, USER_ZONE, DATA_REPL_NUM where RESC_NAME = 'ufs1g2' and META_DATA_ATTR_NAME = 'irods::access_time' and META_DATA_ATTR_VALUE < 'TIME_CHECK_STRING'"''')
admin_session.assert_icommand('imeta add -R rnd0 irods::storage_tiering::minimum_delay_time_in_seconds 1')
admin_session.assert_icommand('imeta add -R rnd0 irods::storage_tiering::maximum_delay_time_in_seconds 2')
admin_session.assert_icommand('imeta add -R rnd1 irods::storage_tiering::minimum_delay_time_in_seconds 1')
@@ -602,7 +602,7 @@ def setUp(self):
admin_session.assert_icommand('imeta add -R rnd2 irods::custom_storage_tiering::group example_group 2')
admin_session.assert_icommand('imeta add -R rnd0 irods::custom_storage_tiering::time 5')
admin_session.assert_icommand('imeta add -R rnd1 irods::custom_storage_tiering::time 15')
admin_session.assert_icommand('''imeta set -R rnd1 irods::custom_storage_tiering::query "SELECT DATA_NAME, COLL_NAME, USER_NAME, DATA_REPL_NUM where RESC_NAME = 'ufs2' || = 'ufs3' and META_DATA_ATTR_NAME = 'irods::custom_access_time' and META_DATA_ATTR_VALUE < 'TIME_CHECK_STRING'"''')
admin_session.assert_icommand('''imeta set -R rnd1 irods::custom_storage_tiering::query "SELECT DATA_NAME, COLL_NAME, USER_NAME, USER_ZONE, DATA_REPL_NUM where RESC_NAME = 'ufs2' || = 'ufs3' and META_DATA_ATTR_NAME = 'irods::custom_access_time' and META_DATA_ATTR_VALUE < 'TIME_CHECK_STRING'"''')
admin_session.assert_icommand('imeta add -R rnd0 irods::storage_tiering::minimum_delay_time_in_seconds 1')
admin_session.assert_icommand('imeta add -R rnd0 irods::storage_tiering::maximum_delay_time_in_seconds 2')
admin_session.assert_icommand('imeta add -R rnd1 irods::storage_tiering::minimum_delay_time_in_seconds 1')
@@ -929,7 +929,7 @@ def setUp(self):
super(TestStorageTieringMultipleQueries, self).setUp()
with session.make_session_for_existing_admin() as admin_session:
admin_session.assert_icommand('iqdel -a')
admin_session.assert_icommand('''iadmin asq "select distinct R_DATA_MAIN.data_name, R_COLL_MAIN.coll_name, R_DATA_MAIN.data_owner_name, R_DATA_MAIN.data_repl_num from R_DATA_MAIN, R_COLL_MAIN, R_RESC_MAIN, R_OBJT_METAMAP r_data_metamap, R_META_MAIN r_data_meta_main where R_RESC_MAIN.resc_name = 'ufs0' AND r_data_meta_main.meta_attr_name = 'archive_object' AND r_data_meta_main.meta_attr_value = 'yes' AND R_COLL_MAIN.coll_id = R_DATA_MAIN.coll_id AND R_RESC_MAIN.resc_id = R_DATA_MAIN.resc_id AND R_DATA_MAIN.data_id = r_data_metamap.object_id AND r_data_metamap.meta_id = r_data_meta_main.meta_id order by R_COLL_MAIN.coll_name, R_DATA_MAIN.data_name" archive_query''')
admin_session.assert_icommand('''iadmin asq "select distinct R_DATA_MAIN.data_name, R_COLL_MAIN.coll_name, R_DATA_MAIN.data_owner_name, R_DATA_MAIN.data_owner_zone, R_DATA_MAIN.data_repl_num from R_DATA_MAIN, R_COLL_MAIN, R_RESC_MAIN, R_OBJT_METAMAP r_data_metamap, R_META_MAIN r_data_meta_main where R_RESC_MAIN.resc_name = 'ufs0' AND r_data_meta_main.meta_attr_name = 'archive_object' AND r_data_meta_main.meta_attr_value = 'yes' AND R_COLL_MAIN.coll_id = R_DATA_MAIN.coll_id AND R_RESC_MAIN.resc_id = R_DATA_MAIN.resc_id AND R_DATA_MAIN.data_id = r_data_metamap.object_id AND r_data_metamap.meta_id = r_data_meta_main.meta_id order by R_COLL_MAIN.coll_name, R_DATA_MAIN.data_name" archive_query''')

admin_session.assert_icommand('iadmin mkresc ufs0 unixfilesystem '+test.settings.HOSTNAME_1 +':/tmp/irods/ufs0', 'STDOUT_SINGLELINE', 'unixfilesystem')
admin_session.assert_icommand('iadmin mkresc ufs1 unixfilesystem '+test.settings.HOSTNAME_1 +':/tmp/irods/ufs1', 'STDOUT_SINGLELINE', 'unixfilesystem')
@@ -939,7 +939,7 @@ def setUp(self):

admin_session.assert_icommand('imeta add -R ufs0 irods::storage_tiering::time 15')

admin_session.assert_icommand('''imeta add -R ufs0 irods::storage_tiering::query "SELECT DATA_NAME, COLL_NAME, USER_NAME, DATA_REPL_NUM where RESC_NAME = 'ufs0' and META_DATA_ATTR_NAME = 'irods::access_time' and META_DATA_ATTR_VALUE < 'TIME_CHECK_STRING'"''')
admin_session.assert_icommand('''imeta add -R ufs0 irods::storage_tiering::query "SELECT DATA_NAME, COLL_NAME, USER_NAME, USER_ZONE, DATA_REPL_NUM where RESC_NAME = 'ufs0' and META_DATA_ATTR_NAME = 'irods::access_time' and META_DATA_ATTR_VALUE < 'TIME_CHECK_STRING'"''')
admin_session.assert_icommand('''imeta add -R ufs0 irods::storage_tiering::query archive_query specific''')
admin_session.assert_icommand('imeta add -R ufs0 irods::storage_tiering::minimum_delay_time_in_seconds 1')
admin_session.assert_icommand('imeta add -R ufs0 irods::storage_tiering::maximum_delay_time_in_seconds 2')
10 changes: 6 additions & 4 deletions proxy_connection.hpp
@@ -9,7 +9,7 @@ namespace irods {
rErrMsg_t err_msg;
rcComm_t* conn;

auto make(const std::string client = "") -> rcComm_t*
auto make(const std::string clientUser = "", const std::string clientZone = "") -> rcComm_t*
{
rodsEnv env{};
_getRodsEnv(env);
@@ -19,10 +19,12 @@ namespace irods {
env.rodsPort,
env.rodsUserName,
env.rodsZone,
!client.empty() ?
client.c_str() :
!clientUser.empty() ?
clientUser.c_str() :
env.rodsUserName,
env.rodsZone,
!clientZone.empty() ?
clientZone.c_str() :
env.rodsZone,
&err_msg,
0, 0);
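
The fallback logic in this hunk, where an empty client user or client zone falls back to the service account's environment, can be shown in isolation. The sketch below uses a hypothetical `environment` struct in place of `rodsEnv` and a hypothetical `resolve_client` function; it illustrates only the selection of values, not the connection call.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for the two environment fields used as defaults.
struct environment {
    std::string user_name;
    std::string zone;
};

// Returns the (user, zone) pair to connect on behalf of. Each value falls
// back to the environment independently when it is left empty.
std::pair<std::string, std::string> resolve_client(const environment& env,
                                                   const std::string& client_user = "",
                                                   const std::string& client_zone = "")
{
    return {client_user.empty() ? env.user_name : client_user,
            client_zone.empty() ? env.zone : client_zone};
}
```

Because the two fallbacks are independent, passing only a user name keeps the local zone, which matches the behavior before this commit, while passing both allows a user from a remote zone to be proxied.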
