TensorBoard Master's capstone project #130
Hi @chrisranderson, this looks really cool! I'd love to support such an ambitious and interesting project if it's technically feasible.

Right now, every TensorBoard plugin gets its data via the summary system, i.e., from event files that are written to disk by the summary.FileWriter. This is purely one-way communication, and it has high latency, because TensorBoard ingests data from the event logs at most every 5 seconds. Also, everything written there is persisted on disk forever by default, so if we used it for high-throughput communication it would quickly saturate the disk. So the summary system as currently written is inappropriate for any real-time streaming application.

@jart is working on revamping the summary system to use SQLite and to support data streaming, so I'll let her chime in on whether she thinks the new summary system would be a good fit for your application. @caisq has worked on establishing direct two-way gRPC communication between TensorFlow and TensorBoard. It would be ideal if we could leverage his work, but we've had difficulty open-sourcing it due to issues with some dependencies.

I think it would also be feasible for you to develop your own system, specific to this plugin, for getting data from TensorFlow to TensorBoard and setting up two-way communication. Something like the following: Let's suppose your plugin is called the RealTimeParameterVisualizer (maybe we'd come up with something catchier later 😛). Then you create a class On instantiation, the ParameterWriter makes a directory within the logdir like On the TensorBoard backend side, you create The ParameterWriter uses poll checking to see when the

In the example you gave of the million-parameter model, if we want to show 16-value greyscale for each parameter at 30 FPS, that would be (10^6 values * 0.5 bytes/value * 30 per second) = 15 MB/s, which seems reasonable for writing to and reading from disk, and for processing without too much latency.
Or, if we were willing to settle for one update per second, it would be just 500 KB/s.

@jart / @wchargin, please share your thoughts too.

Now to answer your questions:
I'm not aware of any widely accessible version of this. I think it would be novel work, and quite valuable to the community.
See the discussion above. I think we could build something that feels fast to the user and is near-real-time.
This seems like an interesting project to me. It could be made technically simpler by dropping the real-time component and settling for getting data roughly once per minute. Then you could focus more on building UI interactions and visualizations to really dive into the data and find ways to interpret the weights in context. For example, you could serialize the weights and activations for k training examples and look at how the activation patterns differ across examples.
You could build a GUI on your own, and it would be easier for you to develop since you'd have control over everything and wouldn't be limited by TensorBoard's assumptions. However, convincing people to discover and use a new tool is always an uphill battle. My guess is that if this were integrated into mainline TensorBoard, usage would be several orders of magnitude higher than for a purely standalone tool. Also tagging @colah and @shancarter, as they may have thoughts to offer.
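The bandwidth estimate earlier in this comment (16 grey levels per parameter, i.e. 4 bits or 0.5 bytes, at 30 frames per second) can be checked with a few lines of Python. The sketch below also shows one way to get down to 0.5 bytes per value by packing two 4-bit levels into each byte. The function names and the min-max quantization scheme are illustrative assumptions, not part of any TensorBoard API.

```python
def bytes_per_second(num_params: int, bits_per_value: int, fps: float) -> float:
    """Raw (uncompressed) throughput for streaming every parameter on every frame."""
    return num_params * (bits_per_value / 8) * fps


def quantize_4bit(values: list[float]) -> list[int]:
    """Min-max scale floats into 16 grey levels (0..15). Hypothetical scheme."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant tensors
    # The max value maps to 16, so clamp it into the top level.
    return [min(15, int((v - lo) / span * 16)) for v in values]


def pack_nibbles(levels: list[int]) -> bytes:
    """Pack two 4-bit levels per byte (high nibble first); odd lengths are padded with 0."""
    if len(levels) % 2:
        levels = levels + [0]
    return bytes((levels[i] << 4) | levels[i + 1] for i in range(0, len(levels), 2))


print(bytes_per_second(10**6, 4, 30))  # 15000000.0 bytes/s, i.e. 15 MB/s
print(bytes_per_second(10**6, 4, 1))   # 500000.0 bytes/s, i.e. 500 KB/s
```

A million parameters therefore pack into 500,000 bytes per frame before any compression; a general-purpose compressor or a video codec could shrink that further when consecutive frames change little.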
One of the things on my bucket list has been to develop some type of visualization where we encode data in real time using ffmpeg and stream it to the browser in a video tag. So I would be interested in supporting something like this.
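One way the ffmpeg idea could be wired up is to pipe raw 8-bit greyscale frames into an ffmpeg subprocess and have it emit fragmented MP4 (H.264) on stdout, which a server could then forward to the browser. This is a sketch under assumptions: it expects an ffmpeg binary built with libx264 on the PATH, the function names and defaults are hypothetical, and browsers generally need Media Source Extensions to play MP4 that arrives incrementally (that client-side part is not shown).

```python
import subprocess


def ffmpeg_stream_cmd(width: int, height: int, fps: int = 30) -> list[str]:
    """Build an ffmpeg command that reads raw greyscale frames on stdin and
    writes low-latency fragmented MP4 to stdout."""
    if width % 2 or height % 2:
        # yuv420p chroma subsampling requires even frame dimensions.
        raise ValueError("width and height must be even for yuv420p output")
    return [
        "ffmpeg",
        # Input: headerless frames, one byte per pixel.
        "-f", "rawvideo",
        "-pixel_format", "gray",
        "-video_size", f"{width}x{height}",
        "-framerate", str(fps),
        "-i", "pipe:0",
        # Output: H.264 tuned for latency rather than compression ratio.
        "-c:v", "libx264",
        "-preset", "ultrafast",
        "-tune", "zerolatency",
        "-pix_fmt", "yuv420p",
        # Fragmented MP4 so the stream is playable before it ends.
        "-movflags", "frag_keyframe+empty_moov+default_base_moof",
        "-f", "mp4",
        "pipe:1",
    ]


def start_encoder(width: int, height: int, fps: int = 30) -> subprocess.Popen:
    """Launch ffmpeg. Write width*height bytes per frame to proc.stdin and
    read encoded MP4 fragments from proc.stdout."""
    return subprocess.Popen(
        ffmpeg_stream_cmd(width, height, fps),
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
```

If frames are written and output is read on the same thread, the OS pipe buffers can fill and deadlock both sides, so a server would typically drain `proc.stdout` on a separate thread.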
Wow, awesome. I was a bit worried that the reply would be something like "this should be on the Google group instead" or some other dismissal. :) So, when you say you'd like to support the project, what does that mean? That I hack on it for a few days, and when I get stuck I can ask you for help? If I can get a hand here and there, I'd like to try doing this in TensorBoard. I have a timeline I need to stick to: I start on the project June 26th and finish by August 14th, so I'll start on Monday. Is this a project that could get merged into the repo? Also, as a first step, I think I'll figure out how gRPC works. The closest thing I've used is ZMQ (maybe not close at all? I'm pretty ignorant here). I guess I'll figure out how to send images from a Python script to... Node or something? I've done a decent amount of JS, but I'm really cloudy on how I'll talk to TensorBoard. Thanks for your responses!
Based on our experience, gRPC isn't quite ready yet. I also have a lot of respect for ZeroMQ, but I'm not sure we need it. We can probably just stream protobufs over a socket using writeDelimitedTo() and a sentinel message on close (to avoid weird TCP edge cases).

"Support" means we can set time aside to participate in the development process with you by offering code reviews, answering questions, and making any framework changes you might need. This works best when there's a tight feedback cycle. For example, we like to see lots of small pull requests rather than one big code dump.

I would recommend checking out web_library_example. It's an example of how to do TensorBoard development in a separate repository, without forking the codebase. You basically need a BUILD and a WORKSPACE file to get started.
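The "stream protobufs over a socket" idea relies on length-delimited framing: writeDelimitedTo() (a Java protobuf method) writes each message's size as a base-128 varint, followed by the message bytes. Python's protobuf runtime does not appear to expose that as a public method, so the minimal sketch below hand-rolls the same framing over raw bytes using only the standard library. The helper names are hypothetical, and using a zero-length frame as the close sentinel is an assumption of this sketch: an empty protobuf message also serializes to zero bytes, so with real messages a dedicated sentinel message type would be safer.

```python
import socket

# Hypothetical convention for this sketch: a zero-length frame means "stream closed".
SENTINEL = b""


def encode_varint(n: int) -> bytes:
    """Base-128 varint, the length prefix used by protobuf's delimited format."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def _read_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since recv() may return fewer than requested."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-frame")
        buf += chunk
    return bytes(buf)


def write_delimited(sock: socket.socket, payload: bytes) -> None:
    """Send one frame: varint length prefix, then the payload."""
    sock.sendall(encode_varint(len(payload)) + payload)


def read_delimited(sock: socket.socket) -> bytes:
    """Receive one frame written by write_delimited()."""
    shift = length = 0
    while True:
        byte = _read_exact(sock, 1)[0]
        length |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return _read_exact(sock, length)
```

With real protobufs, the payload would be `message.SerializeToString()` on the sending side and `MessageType.FromString(frame)` on the receiving side; a reader loops on `read_delimited()` until it sees the sentinel, so both ends agree the stream ended cleanly rather than inferring it from a dropped connection.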
@chrisranderson I think the project you described is very interesting and can be very useful for a lot of people. It will benefit model interpretation, understanding, and debugging, which are getting more and more important as new types of DL models get invented every week.

TensorFlow has TensorBoard and TFDBG, both of which have limitations. For example, TFDBG allows you to see all the intermediate tensor values at runtime, but all it currently has is a text-based interface in the shell, which is not ideal for visualizing the graph structures in TensorFlow models. TensorBoard has great graph visualization, but its connection with the TensorFlow runtime is not real-time. A visual debugger for TensorFlow in TensorBoard would be a great feature. Just imagine what you could see and do if you could "step" through the nodes of a graph and visualize each output tensor as a table, a curve, an image, or a video. You could also modify a tensor's value before continuing further through the graph...

TFDBG already has a protocol for real-time streaming of data from the TF runtime. But as @dandelionmane and @jart pointed out, due to some yet-unfulfilled feature requests in the gRPC library, it is not fully functional in open-source TensorFlow yet. I can check with the gRPC team on their timeline for fulfilling the feature request, which mainly has to do with implementing a py_grpc_library Bazel genrule. Even if their timeline is too far in the future, we can find a way to bypass the missing feature and do it the same way tensorflow/core/distributed_runtime does, i.e., implement the server in C++. The part we would have to work out ourselves is SWIG-wrapping it so that it can be used from Python, as a TensorBoard plugin. The C++ libraries for the aforementioned protocol are not fully open-source yet, but I can easily open-source them soon. I'd think twice before implementing the protocol again in another framework, as it may cause unnecessary duplicate work and confusion for clients.
cc @chihuahua
@chrisranderson As Justine (@jart) said, we're happy to support you by doing code reviews, answering questions, and making upstream changes if you need them. Per Justine's suggestion, I think you should make a new repository for the plugin and use Bazel rules to depend on TensorBoard; forking web_library_example is a good starting point. We can also set up a video call so you can ask us questions, if you want. The goal for the project will be to get your plugin to a point where we are comfortable absorbing it from you into

As you can see from the back-and-forth on this thread, there are a lot of different opinions on how to handle the communication between TensorFlow and TensorBoard. Personally, I would advocate for something that is simple (not too many new dependencies) and likely to work across different platforms and environments, like writing to and reading from disk. Eventually (once gRPC is ready) we will probably want to consolidate everything onto the same implementation as TFDBG. So I think my 2c would be either:
+1 to what @dandelionmane said. I think it's a good idea to build a simple communication channel between the TF runtime and TensorBoard that can easily be replaced with gRPC once its py_grpc_library genrule is ready. I will be happy to provide the kind of support that @dandelionmane mentioned as well. I can also keep you abreast of any potentially relevant changes in TFDBG.
@chrisranderson Forgot to mention in my previous post: the file write-read option @dandelionmane mentioned is a good candidate for the kind of simple communication channel described above. TFDBG can write tensorflow.Event protobuf files to disk using its file:// debug URLs. This unit test is a good place to start reading about it: TFDBG also has modules for reading such files and their directory structures. See also the test above, in addition to the API doc at:
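The "simple channel that can later be swapped for gRPC" idea from the last few comments could look roughly like the sketch below: a small abstract interface, plus a file-based implementation that shares only the latest payload through one file inside the logdir. This is not TFDBG's file:// debug-URL format; the class names, the `plugin_channel` subdirectory, and the 8-byte sequence header are hypothetical choices made for illustration. It also assumes POSIX rename semantics (on Windows, `os.replace` can fail while a reader holds the file open).

```python
import abc
import os
import struct
import tempfile
import time
from typing import Optional

# 8-byte big-endian sequence number written before each payload.
_HEADER = struct.Struct(">Q")


class Channel(abc.ABC):
    """Hypothetical transport interface; a gRPC-backed subclass could replace FileChannel later."""

    @abc.abstractmethod
    def send(self, payload: bytes) -> None:
        """Publish the newest payload."""

    @abc.abstractmethod
    def poll(self) -> Optional[bytes]:
        """Return the newest payload if it changed since the last poll, else None."""


class FileChannel(Channel):
    """Share the latest payload through a single file inside the logdir."""

    def __init__(self, logdir: str, name: str = "latest.bin") -> None:
        self._dir = os.path.join(logdir, "plugin_channel")  # hypothetical subdirectory name
        os.makedirs(self._dir, exist_ok=True)
        self._path = os.path.join(self._dir, name)
        # Seed with the wall clock so a restarted writer does not reuse old sequence numbers.
        self._seq = time.time_ns()
        self._last_seen: Optional[int] = None

    def send(self, payload: bytes) -> None:
        self._seq += 1
        # Write to a temp file, then atomically rename it over the target,
        # so a reader never observes a half-written payload.
        fd, tmp = tempfile.mkstemp(dir=self._dir)
        with os.fdopen(fd, "wb") as f:
            f.write(_HEADER.pack(self._seq))
            f.write(payload)
        os.replace(tmp, self._path)

    def poll(self) -> Optional[bytes]:
        try:
            with open(self._path, "rb") as f:
                header = f.read(_HEADER.size)
                if len(header) < _HEADER.size:
                    return None
                (seq,) = _HEADER.unpack(header)
                if seq == self._last_seen:
                    return None  # nothing new since the last poll
                self._last_seen = seq
                return f.read()
        except FileNotFoundError:
            return None  # the writer has not sent anything yet
```

Keeping only the most recent payload means a slow reader drops frames instead of falling behind, which suits a live view, and the sequence header avoids relying on file modification times, whose resolution varies between filesystems. The TensorFlow side would call `send()` and the TensorBoard plugin would call `poll()` on its own timer.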
Okay, based on what I've read, my overall plan (which is very, very hazy) is:
Would you like to continue communication here, or should I start making issues on my forked repo? I've never really done much that seriously on GitHub. Is it preferred that I use the issues for task management? Should I open an issue for everything I'm working on, so that every commit corresponds to an issue? Thanks again! I'm excited and nervous to get going. I'll start Monday.
That sounds like a reasonable plan. You'll want to poke around the Bazel docs to understand web_library_example. For communication, if you link your forked repo, I'll watch it and respond to issues that you post there. I think that may be cleaner than using this thread for everything. If you have trouble getting a hold of us, poke us here.
I can follow it too. If you post an issue in your new repository every time you have a question, it can sort of become a Stack Overflow for how to extend TensorBoard with Bazel. But in all fairness, the questions might get better search rankings if they're posted either here or on TensorBoard's Stack Overflow tag. What do you think, @dandelionmane? I'm leaning towards the latter.
Here is the repo, and here is my first set of questions: chrisranderson/beholder#1. After this is all done, I wouldn't mind writing up some type of guide or blog post about writing a plugin for TensorBoard; maybe you all could take it, edit it to death, and post it somewhere?
That would be so great - you are gonna be the first external contributor to write a TB plugin, and a write-up on how it's done would make it a lot easier for other people to follow in your footsteps.
I presented on my project today to the CS department, and passed! :) I guess I can close this issue now. If anyone is interested in the future of this project, you can find a discussion here: chrisranderson/beholder#33. Thank you for your help!
Whoa, congratulations!! 🎉 🎉 🎉
Congrats!
Well deserved!
Congrats!
@caisq Based on your comment about gRPC, just to make sure: is gRPC ready now for setting up two-way communication between the TF debugger and the TB debugger?
GOOD!
I have about 120-150 hours to work on a school project, and I was thinking about doing a visualization project in TensorBoard that I'd like to be usable by lots of people. Here's my idea:
Users select from one of the following to view:
These would just be pulled out of the network, reshaped, and visualized like so (maybe with a separate area for the first conv layer's filters) (video here: https://www.youtube.com/watch?v=gjXmacaxlYI):
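A rough sketch of the "pulled out of the network, reshaped, and visualized" step: flatten each layer's parameters, rescale them to 8-bit grey, and stack the layers as rows of one fixed-width image. The function name, the 256-pixel default width, and the choice of per-layer (rather than global) min-max normalization are assumptions for illustration, and only the standard library is used so the output is plain nested lists.

```python
def params_to_image(layers: list[list[float]], width: int = 256) -> list[list[int]]:
    """Normalize each layer's flattened parameters to 0..255 and tile them
    row-major into one greyscale image; each layer starts on a new row."""
    rows: list[list[int]] = []
    for values in layers:
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # constant layers map to all-black rows
        pixels = [round((v - lo) / span * 255) for v in values]
        for start in range(0, len(pixels), width):
            row = pixels[start:start + width]
            row += [0] * (width - len(row))  # pad the last row of each layer
            rows.append(row)
    return rows
```

In practice the per-layer lists might come from NumPy arrays via `array.ravel().tolist()`. Per-layer normalization keeps small-magnitude layers visible next to large ones, at the cost of making brightness incomparable across layers; a global scale would make the opposite trade-off.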
I have some questions: