Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async architecture (inventory) #880

Open
ktbyers opened this issue Nov 29, 2023 · 30 comments
Open

Async architecture (inventory) #880

ktbyers opened this issue Nov 29, 2023 · 30 comments

Comments

@ktbyers
Copy link
Collaborator

ktbyers commented Nov 29, 2023

This issue is async inventory related only, see:

#881

For additional async related topics and links to separate issues on these topics.

@ktbyers
Copy link
Collaborator Author

ktbyers commented Nov 29, 2023

@dgarros @ogenstad

@ktbyers
Copy link
Collaborator Author

ktbyers commented Nov 29, 2023

Some initial work/thoughts from dbarroso. Note, some of this was from the early days of asyncio so some patterns may or may not be relevant

#390
#660

@ktbyers
Copy link
Collaborator Author

ktbyers commented Nov 29, 2023

I think we should start with a discussion regarding Inventory and async and what this means/how it works?

Currently, inventory is single-threaded sync (it is outside the context of "runners"/before runners).

So if we have, for example, a NetBox system with 1000 devices and wanted an async inventory plugin how would this operate? I picked NetBox as there is a NetBox Nornir-plugin and I have a working NetBox system to test against (but it could be any remote HTTP API inventory system).

Current inventory load behavior is this:

    inventory_plugin = InventoryPluginRegister.get_plugin(config.inventory.plugin)
    inv = inventory_plugin(**config.inventory.options).load()

    # then basically call a transform function (on the hosts if one has been defined)

@dbarrosop
Copy link
Contributor

discussion regarding Inventory and async

I think the first question I'd raise here is; is it really necessary? do we actually need an async inventory for any particular reason or is it just to say we support async inventories? Asking because the inventory is a one-shot thing you run when you initialize it, afterwards you mostly work with already loaded data. I'd understand async tasks to incrementally add information (i.e. a sync inventory that loads the hosts/groups and metadata but interfaces' data, BGP information, etc. is loaded afterwards via async tasks) but not sure I understand the use case for an async inventory.

@ktbyers
Copy link
Collaborator Author

ktbyers commented Nov 30, 2023

I would probably generalize the question and ask, "do we need concurrency in inventory loading" (whether async vs threaded).

In other words, for large inventories being loaded via API what are the load times and what kind of improvements can be made via using a concurrent solution (and how much complexity does that entail).

@ubaumann
Copy link
Contributor

I could imagine that the API and the rate limit are the bottleneck. Or we are loading too much data.

I would appreciate it if an inventory plugin would have an option for a minimal import. (Only host name and some needed properties but not all interfaces, for example.) And if more information is required it could be added to a task. Of course, we have to ensure the threads don't write the same inventory properties, but if we do it by host, it should be fine.

I think it would be beneficial to investigate where the bottleneck is.

@dbarrosop
Copy link
Contributor

dbarrosop commented Nov 30, 2023

if an inventory plugin would have an option for a minimal import ... And if more information is required it could be added to a task

100% that, that's why I was suggesting before that what makes more sense is leaving the inventory as it is and rely on "inventory tasks" (which could be async) to load workflow-specific data.

IMO, given there is no clear use-case in mind, I'd suggest tabling this particular topic (async inventory) until something more concrete arises.

P.S: To keep discussions a bit easier to track I'd also suggest to create different issues for each one of your bulletpoints, otherwise this might get messy real quick :P

@ktbyers ktbyers changed the title Async architecture Async architecture (inventory) Nov 30, 2023
@ktbyers
Copy link
Collaborator Author

ktbyers commented Nov 30, 2023

@dbarrosop @ubaumann So just to make sure I am understanding what you are both saying--a better pattern would be to have very simplified inventory plugins (minimal set of information needed to connect to the device).

And then subsequent Nornir tasks which populate additional information in hosts/groups for the inventory information that you need. These subsequent Nornir tasks could then be threaded or async (assuming we build an async-runner).

Is that a correct summary?

@dbarrosop
Copy link
Contributor

Yes, exactly. Use the inventory plugin to download your devices, groups and metadata (so you can group them and filter them) and then rely on tasks within your workflows to download the specific data you need for that specific workflow. In my experience it is the best way of managing a large network with lots of configuration data. Otherwise you end up with a humungous inventory that takes ages to be loaded and requires a generous amount of RAM.

@ogenstad
Copy link
Collaborator

Depending on which environment Nornir is running in I think it could be relevant to have a way to load the initial nodes in an asyncio loop. I.e., if Nornir is started from an environment that is already running within asyncio you wouldn't want to have a blocking call even if it's just to gather a couple of hostnames.

However given how Nornir works I don't see this as an issue at all. The only current "blocker" is that the InitNornir function does two synchrounous requests now:

    return Nornir(
        inventory=load_inventory(config),
        runner=load_runner(config),
        config=config,
        data=data,
    )

If someone wants the initial loading of the inventory to be async if you be very easy to just have these kind of helper functions in your own code, even if they aren't part of the core.

@itdependsnetworks
Copy link
Contributor

@dbarrosop came to a similar conclusion with ansible and documented here https://blog.networktocode.com/post/nautobot-ansible-variable-management-at-scale/

@ktbyers
Copy link
Collaborator Author

ktbyers commented Nov 30, 2023

A remaining question though is, if you are loading a large number of devices from inventory (say >1000) even with minimal API calls per device, is it worth adding concurrency for this in Nornir.

Or is this a case where people would always just solve it on their own (with some sort of a custom inventory plugin) or just not really a problem?

@itdependsnetworks
Copy link
Contributor

I guess you could go as wide as the threads you have on your api, but seems just as likely to cause issues than resolve, for when you fill up all of the threads and potentially (perhaps not likely?) overload the sql server.

@ktbyers
Copy link
Collaborator Author

ktbyers commented Nov 30, 2023

@ogenstad Wouldn't the load() call inside of the load_inventory() would also be blocking i.e. for API calls this would be sync http API calls? And wouldn't this force you to build your own inventory plugin to workaround this?

For example, for the NetBox plugin:

https://github.com/wvandeun/nornir_netbox/blob/develop/nornir_netbox/plugins/inventory/netbox.py#L111

@dgarros
Copy link
Contributor

dgarros commented Nov 30, 2023

I'm 100% for having a minimal inventory but I still thing there is a valid use case for having async support for the inventory.
Assuming we want to pull 5000 devices from Netbox/Nautobot, it means you would most likely have to make 5 API calls because of the pagination. With Async it would be really easy to make these 5 calls concurrently and it will reduce the execution time of the inventory by as much without adding much complexity.

I think the argument about having too many concurrents calls that will overload the API isn't really the point here, it's easy to limit that in the code

Also as @ogenstad mentioned when you are running an async code, it's best if all I/O functions support async. so if the goal is to run Nornir Task with Async it make a lot of sense to have the inventory support Async too

One more thing to consider, I think it make sense to use a single library/client/plugin within Nornir to connect to a SOT system like Netbox/Nautobot. So if we agree that it make sense to use Async for the task, it wouldn't make sense to use a different library/client/plugin for the inventory.

@dgarros
Copy link
Contributor

dgarros commented Nov 30, 2023

@ogenstad Wouldn't the load() call inside of the load_inventory() would also be blocking i.e. for API calls this would be sync http API calls? And wouldn't this force you to build your own inventory plugin to workaround this?

For example, for the NetBox plugin:

https://github.com/wvandeun/nornir_netbox/blob/develop/nornir_netbox/plugins/inventory/netbox.py#L111

Yes this would need to support async as well

@dgarros
Copy link
Contributor

dgarros commented Nov 30, 2023

I opened PR #882 with a proposal to support both Sync and Async inventory with very minimal changes.
I hope it will help with this discussion.

@ogenstad
Copy link
Collaborator

@ogenstad Wouldn't the load() call inside of the load_inventory() would also be blocking i.e. for API calls this would be sync http API calls? And wouldn't this force you to build your own inventory plugin to workaround this?

For example, for the NetBox plugin:

https://github.com/wvandeun/nornir_netbox/blob/develop/nornir_netbox/plugins/inventory/netbox.py#L111

Sure, there would be a bit more boiler plate, though the Nornir object itself only cares about the Inventory data it gets from InitNornir and the loader. What I meant was that it could be loaded directly into the object.

@ktbyers
Copy link
Collaborator Author

ktbyers commented Nov 30, 2023

@dgarros Can you re-state this? I didn't follow what you were saying here?

One more thing to consider, I think it make sense to use a single library/client/plugin within Nornir to connect to
a SOT system like Netbox/Nautobot. So if we agree that it make sense to use Async for the task, it wouldn't make 
sense to use a different library/client/plugin for the inventory.

@dgarros
Copy link
Contributor

dgarros commented Nov 30, 2023

@ktbyers if I take our SDK and our nornir plugin as an example
We have 2 versions of the client, a Sync and an Async one, let's call them Client and ClientAsync
When we query the inventory, each client will return a list of Device objects with and without Async support,

  • Client returns Device objects
  • ClientAsync returns DeviceAsync objects (with async support)

So if I want to use Async in the Tasks, it wouldn't make sense to use the non async client for the inventory because it would create the wrong objects, without Async support.

Also the client itself is usually passed from the inventory to the tasks so it would have to be re-initialized later in the code.

@itdependsnetworks
Copy link
Contributor

I think the argument about having too many concurrents calls that will overload the API

Apologies, it was not meant to be an argument, just a point of consideration.

@ktbyers
Copy link
Collaborator Author

ktbyers commented Nov 30, 2023

@itdependsnetworks It is all good...we are just trying to see the pros and cons of various options.

@dbarrosop
Copy link
Contributor

dbarrosop commented Dec 1, 2023

I am convinced now there is value in async plugins if anything due to pagination (even though IMO inventory plugins should come with some pre-filtering capabilities so you can download only devices you may be interested in). However, this raises the following question; why support both synch/asynch? For instance, the following comment from @dgarros:

So if I want to use Async in the Tasks, it wouldn't make sense to use the non async client for the inventory because it would create the wrong objects, without Async support.

This comment basically suggests that nornir all of a sudden is going to become a fragmented framework where you need to carefully combine your inventories and plugins to make sure they match and can work together. Or did I misunderstand the comment? Assuming I didn't, why are we trying to support both together, then? Isn't this going to be confusing and/or lead to a lot of complexity within nornir? Why not consider releasing nornir 4 as a fully async framework exclussively in that case? I am personally not a fan of having nornir support a disjoint set of plugins that aren't compatible with each other.

@dgarros
Copy link
Contributor

dgarros commented Dec 1, 2023

Unfortunately I think the industry is not ready to completely migrate to Async and a lot of people are using Nornir without Async successfully so I think it makes sense to support both for the time being.
Hopefully as the ecosystem of plugins supporting Async grows more users will migrate and hopefully someday it won't be necessary to support both.

@dbarrosop
Copy link
Contributor

dbarrosop commented Dec 1, 2023

That's fine, but then plugins should be compatible with each other. I don't think nornir users should be put in a position where they need to pick and match plugins and try to understand if they are compatible with each other. This is just going to be bad UX and cause problems to maintainers as people try to figure out why things don't work.

To clarify, what I mean here is that a user should be able to use CoolDgarrosAsyncInventory and then call netmiko_send_command and, viceversa, use YamlInventory and then call amazing_async_task.

@dbarrosop
Copy link
Contributor

Maybe we need to take a step back and decide goals and non-goals and how the ecosystem is going to work in this brave new world before we start discussing architecture/implementation details.

@ktbyers
Copy link
Collaborator Author

ktbyers commented Dec 1, 2023

I will create a set of goals/non-goals in #881.

@ktbyers
Copy link
Collaborator Author

ktbyers commented Dec 5, 2023

I did a very rough inventory proof-of-concept here:

https://github.com/ktbyers/nornir/pull/1/files
https://github.com/ktbyers/nornir_netbox/pull/1/files

This is with a separate InitNornirAsync and a separate load_async method (in the plugin and in InitNornirAsync).

I didn't really worry about code duplication or making all of the parts async (i.e. was focussing on the NetBox http calls and converting them to async).

I used NetBox as I have a new version of Netbox in my environment (so it was easier for me to test).

I also wanted to get a better sense of some of the async code implications (on the inventory side).

@dbarrosop
Copy link
Contributor

dbarrosop commented Dec 6, 2023

Looking at your example I am assuming my interpretation of the comment below was wrong:

So if I want to use Async in the Tasks, it wouldn't make sense to use the non async client for the inventory because it would create the wrong objects, without Async support.

I assumed the comment meant that the expectation was that an async inventory plugin wouldn't work with a sync task but your load_async returns a regular Inventory so current sync tasks should work just fine.

@ktbyers
Copy link
Collaborator Author

ktbyers commented Dec 6, 2023

Yes, I would definitely want the Nornir inventory objects (Hosts, Groups, Defaults) to not care about their load mechanism (i.e. sync vs async).

I think we could accomplish both though (i.e. add a field to the Host objects indicating whether they were loaded via async or not) and then some form of a config option (or potentially logic from the Nornir end-user) disallowing any use of sync tasks (for async inventory objects).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants