
basic plugin functionality (up through writing to disk) #1

Closed
chrisranderson opened this issue Jun 26, 2017 · 14 comments
chrisranderson commented Jun 26, 2017

To: @dandelionmane, @jart, @caisq

I named it Beholder (for now) because... it's a viz project, I'm a nerd, it's short, and I'm too lazy to think of anything else right now. Anyway, here it goes. It's pretty close to how @dandelionmane described it in tensorflow/tensorboard#130.

proposed design

People should push tensors to the front end with two function calls: a constructor, Beholder, and an update function. Here's the flow I'm imagining (a usage sketch follows the list).

  • Construct a Beholder, with configuration options, including:
    • logdir: where the logs go.
    • window_size: how many frames to use for calculating variance.
    • tensors: a list of tensors to visualize. Default behavior is "grab everything I can find".
    • scaling: either "layer" or "network". Determines how to scale the parameters for display. Scales using the min/max of the layer or the entire network.
  • Call beholder.update() in the train loop. If visualizing variance, a size-limited queue will be used to hold the t most recent tensors, one for each time update is called.
    • Determine what type of data to get from logdir/beholder/mode.
    • If the mode changed, clear the queue.
    • Get the appropriate tensors from the model and turn them into a bunch of numpy arrays (I'm not aware of any other way to save tensor values over time). Add those arrays to the queue.
    • Process the numpy arrays into an image. This might include reshaping, concatenating, and scaling to [0, 255] for image display.
    • Write the image to logdir/beholder/current_frame or something. Only one file will exist there at a time, so there's no need to worry about disk space. The current worry is more about memory, since I'm thinking of storing millions of parameters for several timesteps.
      • I picture this changing in the future, after I get things working on the front end. Like was mentioned, this can be replaced with grpc or something.
    • Maybe keep track of how quickly it writes images, and somehow communicate to the front end how often it should poll. On second thought, this probably just needs to wait until v2, when the back end and front end are communicating in some other way and this might not even be necessary.
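
(Concretely, a hypothetical usage sketch; parameter names and defaults here are illustrative, not final:)

# Hypothetical usage of the proposed API; the module name, parameters, and
# defaults are illustrative.
from beholder import Beholder

beholder = Beholder(
    logdir='/tmp/beholder-demo',  # where the logs go
    window_size=15,               # frames used when computing variance
    tensors=None,                 # None = "grab everything I can find"
    scaling='layer')              # 'layer' or 'network' min/max scaling

for step in range(1000):
    # ... run one training step ...
    beholder.update()  # pull tensors, build a frame, write it to logdir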

questions / response requested

  1. Is there a better way to save the tensors over time than pulling them out as numpy arrays?
  2. How should I write the image? I don't know anything about the advantages of protocol buffers. Right now, I would just use cv2.imwrite, but I suppose I could make a tensor from the numpy array (sounds like it could be expensive, but I haven't tested anything) and then use tf.summary.image and a FileWriter to save the image.
  3. This design makes it simple for people to use; however, it doesn't really fit the TensorBoard way of using summaries and a writer to save things. Do I instead make some type of summary, and some type of writer?
  4. I kinda get what's going on with Bazel, but I have no clue how I need to use it for this project. Do I need to consider that at this point?
  5. This can wait til later, but I have no clue what I'm doing on the front end. Do I need to understand that to build the back end well? Also, what would you suggest I look at for that? Would there be a new tab in TensorBoard? Should I be developing this as its own little div that we can throw into TensorBoard later?
  6. Is it bad that the back end produces an image rather than handing over raw parameters over the last t steps? It will be faster to do it this way.
  7. Should I be following a style guide? Do you have some linter or something I should use?

@teamdandelion

  1. You could put out tensor protos but there isn't really a need to. Numpy array seems fine.
  2. cv2.imwrite seems fine for me.
  3. It would be nice if you use the plugin asset utils to organize the directory structure so that each run directory has its own "plugins/beholder" assets in a consistent place. For now, I don't think there's a need for you to use the summary system.
  4. Bazel is the best way to depend on TensorBoard; @jart can tell you more about it. But if you want to avoid that complexity for now and just get something started, you could fork the tensorboard repository; that gives you a simpler initial path to getting something working inside TensorBoard. If you do fork tensorboard, you can cannibalize one of the existing plugins and make it serve your functionality.
  5. Just copy over one of the existing plugin frontends and start modifying it to suit your needs. I would look through the plugins and start with one that is relatively close to what you want.
  6. I think that's fine.
  7. We use pylint with some custom settings on our Python code. Here's the rc file: https://github.com/tensorflow/tensorboard/blob/master/.pylintrc
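
(For what it's worth, a quick sketch of the per-run layout suggested in 3, using the plugin_asset_util helpers from tensorboard/backend/event_processing; function names are as of this writing, so double-check the signatures. The run directory is illustrative:)

from tensorboard.backend.event_processing import plugin_asset_util

PLUGIN_NAME = 'beholder'
run_dir = '/tmp/demo'  # one run directory inside the logdir

# Assets for this plugin live under <run_dir>/plugins/beholder/.
assets_dir = plugin_asset_util.PluginDirectory(run_dir, PLUGIN_NAME)
print(assets_dir)  # /tmp/demo/plugins/beholder
print(plugin_asset_util.ListAssets(run_dir, PLUGIN_NAME))  # asset filenames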

chrisranderson commented Jun 27, 2017

  1. Edit: I think I have it figured out now. Put the repo under tensorboard/plugins, and imports have to be relative to the workspace root. Does that sound right?

original comment:

Re: question 4, I'm getting a handle on Bazel and trying to use TensorBoard stuff, but I get a visibility error. I'm not sure why, since I see package(default_visibility = ["//visibility:public"]) at the top of the BUILD file of the event_processing package. Here is the error:

ERROR: /home/chris/school-projects/beholder/beholder/BUILD:3:1: Target '@org_tensorflow_tensorboard//tensorboard/backend/event_processing:plugin_asset_util' is not visible from target '//beholder:beholder'. Check the visibility declaration of the former target if you think the dependency is legitimate.
ERROR: Analysis of target '//demo:demo' failed; build aborted.
INFO: Elapsed time: 0.132s
ERROR: Build failed. Not running target.

Here is my build file:

package(default_visibility = ["//visibility:public"])

py_library(
  name = "beholder",
  srcs = ["Beholder.py"],
  deps = ["@org_tensorflow_tensorboard//tensorboard/backend/event_processing:plugin_asset_util"],
)

I tried copying my repo into the TensorBoard plugins directory and adjusting label names, and that seemed to work... except now when I try import beholder, it says the module doesn't exist, despite my listing //tensorboard/plugins/beholder/beholder:beholder as a dependency of the demo Python binary. For the version copied into tensorboard/plugins, here's my beholder BUILD:

package(default_visibility = ["//visibility:public"])

py_library(
  name = "beholder",
  srcs = ["Beholder.py"],
  deps = ["//tensorboard/backend/event_processing:plugin_asset_util"],
)

and my demo BUILD:

py_binary(
  name = "demo",
  srcs = ["demo.py"],
  deps = ["//tensorboard/plugins/beholder/beholder:beholder"],
)

Should I be developing everything from within the tensorboard directory? If not, how do I get around visibility issues? If so, do you have tips for debugging strategies with Bazel, or do you know what I'm doing wrong?

chrisranderson commented Jun 28, 2017

@dandelionmane @jart

  1. How would you like to review my code? I'm trying to keep this repo in a fairly production-quality state as I go. I'm sure there will be tons of stuff to tweak by the time I'm finished with v1.

chrisranderson commented Jun 28, 2017

Also, @jart, you mentioned not using huge pull requests...

  1. Is it okay if I get v1.0 ready and submit smaller ones after that (maybe on a per-feature basis)? The first PR will be big, but the rest should be smaller. I'd love to do smaller ones from the start, but I'm not sure how.

chrisranderson changed the title from "designing the back end" to "basic plugin functionality (up through writing to disk)" on Jun 29, 2017

jart commented Jun 29, 2017

You can develop things however you want in this repository. But since it isn't a fork, you're going to encounter things that are impossible to do without modifying the TensorBoard codebase. When you encounter those situations, send us small pull requests that unblock your development workflow, and we'll review them as quickly as possible. Then you bump the TensorBoard SHA1 in your WORKSPACE file and continue your work here. (Note: you can use Bazel's local_repository rule to point Bazel at a TensorBoard git repo on your local disk, to test that the PRs you send us work; a sketch follows.)
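
(A minimal WORKSPACE (Starlark) sketch of that local_repository override; the path is illustrative:)

# In your WORKSPACE file: point @org_tensorflow_tensorboard at a local
# TensorBoard checkout that has your unmerged PR applied, instead of the
# pinned SHA1.
local_repository(
    name = "org_tensorflow_tensorboard",
    path = "/home/chris/tensorboard",
)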

Eventually you'll get this repository in a state where it's working really well, and at that point, we'll start having a conversation about upstreaming it into the TensorBoard codebase.

chrisranderson commented Jun 29, 2017

  1. For now, I'll break the mold of how <PLUGIN_NAME>_plugin.py files are written and read the image without using accumulator or multiplexer stuff, but I'm guessing in the future there will be a tf.summary.video that I would use, with a special tag name for this plugin? Would I write that summary op? Or will the high-performance version of this plugin not use the accumulator and multiplexer?

  2. I should probably just make this plugin the beginning of a video plugin. I'm thinking this project should be broken into two parts. First, the video plugin itself that just takes images as input, and second, the thing that produces images that visualize the network weights/etc.

jart commented Jun 29, 2017

Can you use tf.summary.tensor_summary instead of creating a tf.summary.video summary?
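
(For reference, a minimal TF 1.x sketch of pushing frames through tf.summary.tensor_summary; the tag name, shapes, and logdir are illustrative, not what this plugin necessarily ends up doing:)

import numpy as np
import tensorflow as tf

# One grayscale frame per training step, written as a tensor summary.
frame_placeholder = tf.placeholder(tf.uint8, shape=[512, 512])
summary_op = tf.summary.tensor_summary('beholder_frame', frame_placeholder)

writer = tf.summary.FileWriter('/tmp/demo')
with tf.Session() as sess:
    for step in range(100):
        frame = np.random.randint(0, 255, (512, 512), dtype=np.uint8)
        summary = sess.run(summary_op, feed_dict={frame_placeholder: frame})
        writer.add_summary(summary, global_step=step)
writer.flush()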

@chrisranderson

Yes, but wouldn't it be good to have a summary that is explicitly for streaming things in real time? So I'd have some function that accepts image frames and somehow (I have zero video experience, let alone live video) streams them to the server, which somehow streams them to the TensorBoard client?

@chrisranderson

So my plugin becomes only something that takes frames and streams them, and then... I have some separate thing that is more for TensorFlow that creates the images. You can use it to make gifs or whatever, or you can send it live to TensorBoard using the plugin.

jart commented Jun 30, 2017

If you have it go through the summary system and it's slow, then we'll just make the summary system go fast. I'm currently working on a new data ingestion pipeline that takes summaries and writes them directly to a SQL database. I'm very interested in making this pipeline go as quickly as possible. If you build something awesome that is a little laggy, that just gives me even more incentive to make this thing lightning fast. Plus you'll get your work done quicker.

@teamdandelion

Big +1 to what Justine said about using tf.summary.tensor_summary rather than adding a new video summary op.

@chrisranderson

Okay, I can do tensor summaries. @jart, you mentioned earlier using ffmpeg and streaming it to the browser, and something else about using sockets. I'm not sure what you meant (I know very little about streaming) - are tensor summaries better than those ideas?

Also, from what I read in the code, there is a global reload timer for reading event files, and they are cached and returned from the multiplexer on demand. Should I base timing on how quickly the client requests frames, and manually grab the tensor summary at every request? It looks like that's how the text plugin works, at least for index_impl. Is there some way that everything can get pushed, originating from the user's program?

jart commented Jul 1, 2017

The problem with doing sockets is that it takes ops work for users to configure, and it wouldn't persist in a database. It's nice to persist the data. Maybe someone will want to let it train overnight and watch the video in the morning.

One thing you can do to create the video is generate it on the fly from the tensor summaries as soon as the web browser requests it. For example, you could probably pipe the raw tensor data into the ffmpeg command and then pipe the output to the web browser (a rough sketch follows).
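
(A rough sketch of that pipe idea; the flags, frame size, and webm container are illustrative, not a tested pipeline:)

import subprocess

def encode_frames(frames, width=512, height=512, fps=10):
    """frames: iterable of uint8 numpy arrays of shape (height, width)."""
    ffmpeg = subprocess.Popen(
        ['ffmpeg',
         '-f', 'rawvideo',            # input: raw grayscale frames on stdin
         '-pix_fmt', 'gray',
         '-s', '%dx%d' % (width, height),
         '-r', str(fps),
         '-i', '-',
         '-f', 'webm', '-'],          # output: encoded webm on stdout
        stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    for frame in frames:
        ffmpeg.stdin.write(frame.tobytes())
    ffmpeg.stdin.close()
    # A real handler would stream chunks (and interleave reads so the
    # stdout pipe doesn't fill up and deadlock).
    return ffmpeg.stdout.read()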

@chrisranderson

It is far from perfect, but 4b5f245 has the user-side script writing tensor summaries.
