Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support for a dynamix index #47

Open
pulkitsinghal opened this issue Aug 18, 2016 · 16 comments
Open

support for a dynamix index #47

pulkitsinghal opened this issue Aug 18, 2016 · 16 comments

Comments

@pulkitsinghal
Copy link

pulkitsinghal commented Aug 18, 2016

Steven @onstrike07 Aug 14 17:34
I have a question about the index setting in datasource.json for this connector. Is it possible to make the index name to be dynamic? I mean I want to pass index name as a parameter from frontend for each query. Our situation is that our log data, which is very huge, are split to different indices in ES for each hour, for example, log_2016_08_10_02 index is for log data that were generated between 2:00 am and 3:00 am on August 10, 2016. Frontend code determines which index to query against based on time window users specify.

cc @onstrike07

@pulkitsinghal
Copy link
Author

You use case is interesting and possible because the source code doesn't tie down the underlying driver to a specific index so it should be possible to do this.

Made an exploratory video where I try to brainstorm what someone might have to do to make this happen:
https://www.dropbox.com/sh/jygk4pscosqyfel/AAAiSRhCEJeTFd-tBpETJnNOa?dl=0

@pulkitsinghal
Copy link
Author

pulkitsinghal commented Aug 18, 2016

In chat I had mentioned:

Pulkit Singhal @pulkitsinghal Aug 14 17:45
to search on a new index every time doesn't have much to do with searching on different model ... to compare its like searching on a different database in mongodb, right? ... the connection url hasn't changed by much but the db has changed

So its interesting, that today I found a usecase about mongodb where the user wanted to switch out the collections accessed by the connector: strongloop/loopback#2603

Just FYI.

@onstrike07
Copy link

onstrike07 commented Aug 25, 2016

Based on Pulkit's video instruction, I made a workaround and it works for query for now. What I did are:

  1. In a query, pass a parameter which is at same level as 'where', as shown below:
    {"where":{ "user_id":999},"timescale" :1}
  2. Add a logic in ESConnector.prototype.buildFilter function to handle this parameter, as shown below.

filter.index would be overwritten if a valid timescale parameter is available, otherwise, it still uses the one in datasource.json file.


    if (criteria.timescale) {
        console.log('timescale: ' + criteria.timescale);
        var getIndexName = function (date) {
            var year = date.getFullYear();
            var getmonth = date.getMonth() + 1;
            var month = getmonth < 9 ? '0' + getmonth : '' + getmonth;
            var day = date.getDate() < 10 ? '0' + date.getDate() : '' + date.getDate();
            var hour = date.getHours() < 10 ? '0' + date.getHours() : '' + date.getHours();
            return year + '_' + month + '_' + day + '_' + hour;
        }//Builds a index name based on a date. Format: YYYY_MM_DD_HH

        var getIndexNames = function () {
            var indexNames = '';
            var queryPeriod = 0; //In hours
            var endDate = new Date(2016, 7, 10, 2, 10); //TODO: replace it with the current date in production
            switch (criteria.timescale) {
                case 1: queryPeriod = 5/60; break;
                case 2: queryPeriod = 15/60; break;
                case 3: queryPeriod = 1; break;
                case 4: queryPeriod = 6; break;
                case 5: queryPeriod = 12; break;
                case 6: queryPeriod = 24; break;
            }
            var startDate = new Date();
            startDate.setTime(endDate.getTime() - queryPeriod * 60 * 60 * 1000); //milliseconds
            if (queryPeriod < 1 && endDate.getHours() - startDate.getHours() > 0) queryPeriod++;
            console.log('EndDate: ' + endDate);
            console.log('Backward hours: ' + queryPeriod);
            for (var i = 0 ; i <= queryPeriod ; i++) {
                //TODO: check availability of the index - if it's not valid in ES, ignore it.
                indexNames += getIndexName(endDate) + ',';
                endDate.setTime(endDate.getTime() - 60 * 60 * 1000);
            }

            //Remove ending comma
            indexNames = indexNames.substring(0, indexNames.length - 1);

            return indexNames;
        }//Builds a list of index names that are separated by comma

        var indexNames = getIndexNames();
        filter.index = indexNames; //Overwrite the index name with the composed one
    }

@onstrike07
Copy link

It works for query, but not for count. I'm thinking I may need to implement it in buildWhere function.

Another task is need to figure out how to validate an index in ES by using ES API.

@juanpujol
Copy link

@onstrike07 did you changes something on the constructor or defaults? Thanks.

@onstrike07
Copy link

I only added something in the function below in esConnector.js

ESConnector.prototype.buildFilter

@pulkitsinghal
Copy link
Author

@0candy, @jannyHou, @superkhau and @bajtos - I wanted to enable users of this elascticsearch connector to build "dynamic indexes" by providing a function in the datasources.<env>.js file:

module.exports = {
  'db': {
    'name': 'elasticsearch-plain',
    'connector': 'es',
    'mappings': [
      {
        "name": "Log",
        "index": function(){ // JUST LIKE MAGIC
          return 'index_a,index_b,index_c';
        },
        "properties": {...}
      }
    ]
  }
};

But regarding environment-specific configuration it says here that:

The additional files can override the top-level data-source options with string and number values only. You cannot use objects or array values.

That's a bummer ... i feel like it should be possible to use a function despite that warning, no?

@pulkitsinghal
Copy link
Author

cc @onstrike07 and @juanpujol - the comment above highlights my latest plan for enabling dynamic indices. Along with a minor change to:

ESConnector.prototype.addDefaults = function (modelName) {
    ...
    // fetch index and type from `self.settings.mappings` too
    ...
    if (mappingFromDatasource) {
      indexFromDatasource = mappingFromDatasource.index;
      typeFromDatasource = mappingFromDatasource.type;
    }

    // after fetching index and type from `self.settings.mappings`
    if (_.isFunction(indexFromDatasource)) {
        filter.index = indexFromDatasource(); // generate index or indices dynamically
    }
    ...
};

It doesn't have room for the logtype variable that @onstrike07 uses in custom implementation ... but i'll try to think of how to sneak in other implementation specific stuff in a generic fashion too. Ideas are welcome.

@superkhau
Copy link

superkhau commented Oct 6, 2016

@pulkitsinghal I'm not sure about the ramification to supplying a function, you can give it a shot and let us know of any issues you run into. The only negative I can see is it won't work with the datasources.*.json versions.

@bajtos
Copy link
Member

bajtos commented Oct 6, 2016

The additional files can override the top-level data-source options with string and number values only. You cannot use objects or array values.

That's a bummer ... i feel like it should be possible to use a function despite that warning, no?

I think using functions should be fine. Here is the source code of the type-compatibility check used by loopback-boot: https://github.com/strongloop/loopback-boot/blob/0d985bae0ba918395660850711acad32cbf964cb/lib/config-loader.js#L300-L313

@pulkitsinghal Could you perhaps send a pull request to correctthat sentence you quoted?

@pulkitsinghal
Copy link
Author

@bajtos and @superkhau - thank you! And a have one follow up question about testing.

  1. Usually in connector test files, I'm setup via:

    var settings = require('./resource/datasource-test.json');

  2. In this case I could do:

    `var settings = require('./resource/datasource-test.local.js');

  3. But if I would feel a lot better if I could somehow use a more realistic mechanism for loading datasource-test.json and then datasource-test.local.js via loopback-boot ... I don't like hardcoding the test because I might miss a real problem during TDD. Is there some code for reference in another loopback project that I can lean back on?

@superkhau
Copy link

1+2) I think you should inject it via env vars or the way we do with rc at https://github.com/strongloop/loopback-connector-mysql/blob/master/test/init.js#L14-L24 -- this keeps things at the connector level (ie. it doesn't care about the datasource-test.json -- it takes any acceptible config value and assumes caller has already sanitized the object properly).

  1. That is outside the concerns of the connector (as loopback-boot will be one loading the `.json files). If you're learning towards an end-to-end type test, you could require loopback and do something like https://github.com/strongloop/loopback-connector-kv-redis/blob/master/test/helpers/data-source-factory.js#L3 but instead of a data factory, require loopback and inject/test accordingly.

@pulkitsinghal
Copy link
Author

I want to take a moment to point out the fact that @onstrike07's code used caching because of global variables being reused. Caching is something which I should give some thought to as well if the index/mapping checks are happening repeatedly.

@pulkitsinghal
Copy link
Author

thank you for your earlier comments @superkhau

@pulkitsinghal
Copy link
Author

pulkitsinghal commented Oct 7, 2016

@onstrike07 - it is a bit trickier than i thought:

  1. your situation is such that only reading from the indices is required but when I try to be a bit more thorough ... creating and mapping them gets tricky as i end up with something much more comprehensive, which I'm not sure anyone wants:
        {
            "name": "Log",
            "index": function(){
                return {
                    write: function(){ return ['index_a','index_b','index_c']; }
                    ,read: function(){ return 'index_a,index_b,index_c'; }
                };
            },
            "properties": {
                "message": {"type": "string", "index" : "not_analyzed"},
                "timestamp": {"type": "date"}
            }
        }
  1. ESConnector.prototype.addDefaults needs to know which CRUD operation is calling it to use either index.read() or index.write() but ofcourse it is not even that simple
  2. I feel that our connector should probably give up the responsibility for creating indices and mappings completely or provide a much clearer place to let customizers override that logic ... letting the code create them based on what addDefaults might set as an index value (as we've done in the past) is just not pretty or even feasible.

@onstrike07
Copy link

onstrike07 commented Oct 7, 2016

I agree it's complicated. Fortunately, my project only requires query, but not the whole set of CRUD. I doubt it if it makes sense to insert a record in multiple indices in a single statement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants