Written while reviewing version v0.8.8-beta-prerelease-2.
Everything is subject to change.
- Android and WebGL builds have not been tested beyond version `v0.6.2`.
- The main engine is `TexInputs`. It fetches the input (e.g. a `jpg` file to a Texture object), then infers the depth using the inferer `DepthModel`. Then it updates the mesh `MeshBehavior`. When this cycle is finished (it is not parallelized), the `MainBehavior` calls `TexInputs.UpdateTex()`, repeating the cycle.
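The fetch, infer, update cycle described above can be sketched roughly as follows. This is a minimal Python sketch rather than the project's C# code; `FakeDepthModel`, `FakeMesh`, `SimpleTexInputs`, and `main_loop` are hypothetical stand-ins that only mirror the roles named in the text.

```python
class FakeDepthModel:
    """Hypothetical stand-in for the inferer (the DepthModel role)."""

    def run(self, tex):
        # Pretend inference: one depth value per pixel, all zeros.
        return [[0.0 for _ in row] for row in tex]


class FakeMesh:
    """Hypothetical stand-in for the mesh (the MeshBehavior role)."""

    def __init__(self):
        self.updates = 0

    def set_scene(self, depth, tex):
        self.updates += 1


class SimpleTexInputs:
    """Hypothetical stand-in for a TexInputs implementation."""

    def __init__(self, frames, model, mesh):
        self._frames = iter(frames)
        self._model = model
        self._mesh = mesh

    def update_tex(self):
        tex = next(self._frames, None)    # 1. fetch the input
        if tex is None:
            return False                  # nothing left to process
        depth = self._model.run(tex)      # 2. infer the depth
        self._mesh.set_scene(depth, tex)  # 3. update the mesh
        return True


def main_loop(tex_inputs):
    """The MainBehavior role: start the next cycle only after the previous one ends."""
    cycles = 0
    while tex_inputs.update_tex():
        cycles += 1
    return cycles
```

Because each call runs to completion before the next one starts, nothing here is parallelized, which matches the description above.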
The main script of the program. A bit messy.
The constants for indicating the type of the input. To summarize:
- `NotExists`, `Dir`, `Unsupported`
- `Img`, `Vid`, `Depth`: `Depth` is the "depthfile" that contains depth maps of images or videos. These share the same processing class `ImgVidDepthTexInputs` since they are heavily intertwined (it can't be determined whether it's an image or a video before opening the depthfile). While `ImgVidDepthTexInputs` is a bit bloated, handling image inputs is actually fairly short if depthfile processing is not considered; see here, a three-line snippet (code for WebGL, not tested) that loads the image, infers the depth, and visualizes it.
- `Online`: inputs that keep changing, like mirroring the desktop screen, where the input image changes every time it is requested. Also used for the Http inputs.
- `Gif`
- `Pgm`: the Portable Graymap Format, which contains a grayscale image. It is used as the depth map format for this program and the python scripts, and it can be used independently as the depth map input. Note that in this way the inference engine is not needed.
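For readers unfamiliar with the format, here is a minimal Python sketch of an 8-bit binary PGM (`P5`) round trip. It is not taken from this program or its python scripts; `encode_pgm` and `decode_pgm` are hypothetical helper names, and the decoder assumes a header without comment lines.

```python
def encode_pgm(width, height, pixels):
    """Encode 8-bit grayscale values (row-major, 0..255) as a binary PGM (P5)."""
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match width * height")
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + bytes(pixels)


def decode_pgm(data):
    """Decode a binary 8-bit PGM as produced by encode_pgm (no comment lines)."""
    # Split only three times so that newline bytes inside the pixel data survive.
    magic, dims, maxval, body = data.split(b"\n", 3)
    if magic != b"P5" or maxval != b"255":
        raise ValueError("only 8-bit P5 files are handled in this sketch")
    width, height = (int(v) for v in dims.split())
    return width, height, list(body[: width * height])
```

A depth map stored this way is just a small text header followed by one byte per pixel, which is why it can be used as an input without the inference engine.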
The main initialization.
- `MeshBehavior`: the object on which the depth values will be visualized
- `DepthModelBehavior`: the depth map inference engine
- `VRRecordBehavior`: this is for recording the screen on VR headsets.
- `ServerConnectBehavior`: this is to connect to a server that provides the depth map when an image is given (`AsyncDepthModel`). It is only implemented for image inputs and its usage is not recommended. Also, while waiting for the server, it may block changing the input. Not to be confused with `HttpOnlineTex`, which fetches images from the server.
But it seems it can be removed here, since `_vp` is only used to be destroyed on the termination of the program (it can just be fetched then) and to be passed as an argument to `ImgVidDepthTexInputs`.
If there were several meshes and different `VideoPlayer`s it would make sense, but as that doesn't seem to be implemented, I think this can be omitted here and fetched in `ImgVidDepthTexInputs` instead.
It's to set the variables `_canUpdateArchive` and `_searchCache` based on the toggle UI's values (i.e. the values that were set in the editor).
Can these variables just be replaced with `OutputSaveToggle.isOn` and `SearchCacheToggle.isOn`?
This code is outdated and would only work for image and video inputs, if there's any usage for this.
If one of the keys in `_sendMsgKeyCodes` is pressed, it will be sent to the active `_texInputs` (the one that is processing the input). For example, if `Keypad5` is pressed, it will inform the `ImgVidDepthTexInputs`, which will pause the video.
By default it uses the Standalone File Browser, which calls the OS file explorer.
The current code makes me nauseous. Also the commands of `_depthModelBehav` can be moved to that class, since there wouldn't be meshes other than the main one.
This does not call the wrapper method `LoadBuiltIn()` but instead calls `_depthModelBehav` directly. The reason is that the wrapper method expects the other objects to be already initialized.
First we import the min/max values of the sliders using `MeshSliderParents` and load the options from `Utils`.
Called every frame. Since the inference is not parallelized, this is called every time `_texInputs` is done inferring and setting the mesh.
This really should be in another class; more below.
Note that it does not take effect when the option menu is active, but when the UI is hidden this exception would be ignored.
As the windows are now managed by `WindowsManager`, I think it can be generalized instead of checking only the Options menu.
But it would not make sense for the small windows (e.g. the directory window), so extra steps would be needed.
Also send the pressed key.
BUG: when changed to fullscreen, it just stretches the previous screen, making it look blurred.
When `SaveOutput` is on, `SearchCache` will always be on. This is because saving the output requires computing the hash, and that hash can be used to find the cache. Of course this could be optional, but I think there's no reason not to.
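The coupling between the two options can be illustrated with a short Python sketch. The function names `effective_options` and `sha256_hex` are hypothetical and not the program's API; the only facts taken from the text are that saving forces cache search and that the hash is SHA-256.

```python
import hashlib


def effective_options(save_output, search_cache):
    """Saving needs the hash anyway, so cache search is switched on with it."""
    return save_output, (search_cache or save_output)


def sha256_hex(data):
    """Hash the raw input bytes; the digest can then serve as a cache key."""
    return hashlib.sha256(data).hexdigest()
```

Since the hash is already available whenever the output is saved, looking up the cache with it costs almost nothing extra.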
As said, this should be separated into another class.
When the user clicks the `browse dirs` button, it calls `BrowseDirs()`. This saves the filenames to a list. The gif files can be omitted, since the current gif decoder is slow.
It just omits them from the list, so to see gifs after loading the directory it has to be loaded again...
Most of them are called from `OptionsBehavior`; some of them can just be moved into it. Maybe the mesh should be assigned to a static variable.
I could not make my WMR click the UI idk
Note that it does not check if the file actually exists.
The interface to be used by `TexInputs`.
When this is false, the mesh would not update when the actual depth value is not changed. False for the video inputs.
Sets the depth values.
The former stores the (x, y) values before the projection.
They are used for inverting the depth: `z = 1/(Alpha * x + Beta)`.
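As a quick illustration of the formula, here is a minimal Python sketch. `invert_depth` is a hypothetical name, and the zero check is an addition of this sketch, not something the source describes.

```python
def invert_depth(x, alpha, beta):
    """Apply z = 1 / (alpha * x + beta) to a single depth value x."""
    denom = alpha * x + beta
    if denom == 0:
        # Guard added for the sketch; the formula itself is undefined here.
        raise ZeroDivisionError("alpha * x + beta must not be zero")
    return 1.0 / denom
```

With `alpha = 1` and `beta = 1`, an input of `1.0` maps to `0.5`; larger inputs map to smaller `z`.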
The constant `0.96f/150`. In the older versions the mesh was 150m from the camera, and its scale was 1.
With the new parameters, the mesh should not change its size when the camera distance is changed, so it has to be scaled linearly. Thus it is divided by 150.
Also, the default scale was shrunk a bit, by about 4%.
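One way to read the constant is as a per-meter scale factor. The Python sketch below shows that reading; the name `mesh_scale` and the exact way the factor is multiplied in are assumptions of this sketch, not confirmed by the source.

```python
# 0.96 is the slightly shrunk default scale; 150 is the old camera distance in meters.
MESH_SCALE_PER_METER = 0.96 / 150


def mesh_scale(cam_distance):
    """Scale that grows linearly with the camera distance (assumed relationship)."""
    return cam_distance * MESH_SCALE_PER_METER
```

Under this assumption, the old distance of 150 gives a scale of about 0.96, and doubling the distance doubles the scale.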
Distance between the camera and the mesh. Changing this will make the mesh disappear, since seeing it change hurts my eyes.
The Relative scale.
These two parameters are not useful now.
When `1`, the mesh would project the vertices on the screen.
Deprecated: the values lower than `Threshold` would be set to `TargetVal`.
If this is not set to `UInt32`, the depth maps whose size is bigger than `256*256`, which is the size of the built-in model, will be broken.
Indicates whether the vertex colors should be set. Used for point clouds.
Values to set to the material.
It uses the shader included in this library. The point size should be scaled later.
Rotates the mesh in a predefined manner. If the angles differ too much it would look clunky.
- Interface `DepthModel` has a `Run()` method that returns a `Depth` object.
- Interface `AsyncDepthModel` has a callback that is called when the inference is complete. Used by `ServerConnectBehavior`.
- `Inverse`: true depth values scaled, shifted and inverted. Parameters `Alpha` and `Beta` are used for this.
- `Linear`
- `Metric`: actual values in meters (a meter is identical to a Unity unit)

`Inverse` and `Linear` are expected to be normalized to `[0, ..., 1]`.
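The source does not say how the values get normalized. One common approach is min-max scaling, sketched below in Python; `normalize` is a hypothetical helper and may differ from what the models or the program actually do.

```python
def normalize(values):
    """Min-max scale a flat list of depth values into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant map has no range to stretch; return zeros by convention.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```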
Uses Unity's Barracuda engine.
The current version is `v3.0`, which does not support MiDaS v3 or higher.
The code is modified from here
TODO: change the filename to just OnnxRuntimeDepthModel.cs
These dll files have to be in DEPTH/Assets/Plugins/OnnxRuntimeDlls/win-x64/native
They are in the nuget package files (.nupkg), get them from
https://www.nuget.org/packages/Microsoft.ML.OnnxRuntime.Managed/
[THE NUPKG FILE]/lib/netstandard1.1/*.dll
https://www.nuget.org/packages/Microsoft.ML.OnnxRuntime.Gpu/
[THE NUPKG FILE]/runtimes/win-x64/native/*.dll
From Microsoft.ML.OnnxRuntime.Gpu
onnxruntime.dll
onnxruntime_providers_shared.dll
onnxruntime_providers_cuda.dll
onnxruntime_providers_tensorrt.dll (i don't think that this is needed)
From Microsoft.ML.OnnxRuntime.Managed
Microsoft.ML.OnnxRuntime.dll
I think it would work in the linux build if you get the .so files in linux-64 directory.
To use providers other than CUDA, the matching provider dll has to be present, which does not exist in the official binary build. Thus it has to be built from source. The GPU providers to test and implement are: DirectML, OpenVINO, TVM, ROCm.
To connect to a server that can infer the depth and return it as a `pgm` file when an image is given (`depthserver.py`).
TODO: Have `DepthServerModel` implement the `AsyncDepthModel`, not `ServerConnectBehavior`.
It has two methods: `SelectFile()` and `SelectDir()`, which call the callback function when the path is selected.
- `StandaloneFileSelecter`: uses `Standalone File Browser`, which calls the OS file browser.
- `SimpleFileSelecter`: uses `Simple File Browser`, which uses its own UI. Used for the Android version.
- `WebGLFileSelecter`: a WebGL version of `Standalone File Browser`. The current code wouldn't work; change it by referencing the last version that supported WebGL. (`UploadFile` has to call the callback)
Used when the input texture (almost) always changes.
- Field `Supported`: whether the operation is available. I don't think it's needed anymore.
- Field `LastTime`: the last time the tex was updated. Always differs when it's real-time (e.g. screen capturing)
- `StartRendering()`: prepare the input.
- `GetTex()`: fetch the texture.
Deprecated method to capture the screen. It calls Windows system calls to check the available processes and uses `System.Drawing` for capturing. Really slow.
Fetches the jpgs from the server `Screen Capture Server` [https://github.com/parkchamchi/screencaptureserver], which guarantees 20fps for 1080p.
- When it fails to fetch the image, it will return the placeholder texture.
The main engine.
- `UpdateTex()`
- `SendMsg()`: send a subclass-specific message (e.g. video control)
- Field `WaitingSequentialInput`: if true, it's waiting for an additional input, such as a (depth map, actual texture) pair
- `SequentialInput()`
Handles images, videos, and depthfiles.
This class has not changed much since its alpha era (when it was not split from `MeshBehavior`).
Most of its complexity comes from the low-level operations with the `VideoPlayer` and misdesigned interactions with the `DepthFileUtils`.
To point out some characteristics...
- The depthfile is compatible with the python scripts.
- The image input is the only input that supports `AsyncDepthModel`.
- When `searchCache` is on, it will check if a saved cache with the same hash value exists.
- When `canUpdateArchive` is on, it will save the output so that it can be loaded later when `searchCache` is on.
- Depending on its type, one of `FromImage()`, `FromVideo()`, `FromDepthFile()` will be called.
- `FromDepthFile()` then receives the additional input and calls `FromImage()` or `FromVideo()`.
- After fetching the texture from the file, `_orig_width` and `_orig_height` are set. This is actually insignificant, since their only purpose is to be saved to the depthfile metadata, where they do not matter.
- Also set `_startFrame` and `_frameCount`, just for compatibility with the video depthfile metadata.
- Then if `_searchCache` is on and it does not already have the path for the depthfile (i.e. it was not called from `FromDepthFile()`), check if the saved file exists.
- If the depthfile is found, load from it. This also sets `_paramsDict` to set the parameters if they exist; more below.
- Else if `AsyncDepthModel` exists, infer using it. This uses the callback `OnDepthReady()`. This will not create/modify the depthfile.
- Else, infer it using the `DepthModel` and save it if it should.
- Finally, set the mesh.
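The order of the three depth sources in the steps above can be sketched as follows. This is a simplified Python sketch, not the actual `FromImage()` code; `depth_for_image` and all of its parameters are hypothetical, and the models are reduced to plain callables.

```python
def depth_for_image(tex, *, cached=None, async_model=None, model=None,
                    can_update_archive=False, save=None):
    """Return (depth, source); depth is None while an async request is pending."""
    if cached is not None:
        # 1. A depthfile was found in the cache: use it as is.
        return cached, "cache"
    if async_model is not None:
        # 2. Hand the texture to the async model; the result arrives later
        #    through a callback, and nothing is written to the depthfile.
        async_model(tex)
        return None, "async"
    # 3. Fall back to synchronous inference and save if asked to.
    depth = model(tex)
    if can_update_archive and save is not None:
        save(depth)
    return depth, "model"
```

Whichever branch produced the depth, setting the mesh would be the common last step.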
- Search the cache if it should.
- Check if it's "full", which means all frames are processed and present in the file. Then it won't have to save the outputs to the depthfile.
- `_vp.sendFrameReadyEvents = (_startFrame < 0) ? true : false;`: Ah I remember, in the python code that generates the depthfile, `_startFrame` was set to `-1` since it can't be determined. In that case the variables have to be initialized on the first frame. `_startFrame` not being a negative number means it has been processed using this program.
- Initialize the parameters, if it should.

Have the `VideoPlayer` raise the `frameReady` event so that the variables are initialized on load.
- Finally, set the `VideoPlayer`'s target. The video will play automatically.
Called when `_vp.sendFrameReadyEvents` is true.
- On initialization (except when a valid `_startFrame` was in the depthfile), this is called to set `_startFrame` and other variables to be saved to the depthfile.
- Also called when recording, where it's just redirected to `RecordingFrameReady()`.
Called when `UpdateTex()` is called while a video input is present and it is not recording.
- Check the frame number of the `_vp`. If it's identical to the already processed one (`_currentFrame`), ignore it and return.
- Calculate `actualFrame`: some videos do not start at frame 0, which causes problems with compatibility with the python script, where the first available frame (`_startFrame`) is always 0. Subtract the `_startFrame` from the `_vp`'s frame to get the actual frame number that starts with 0.
- Fetch the texture.
- If the depthfile and the depth values for the current frame are present, load them.
- Else, infer it, and save it if it should (creating the depthfile if it should, too).
- Finally, set the mesh.
- Update the parameters if saved parameters for the frame exist.
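The first two steps above (skipping an already processed frame and rebasing the frame number) can be sketched in Python. `VideoFrameTracker` is a hypothetical class used only for illustration; its attribute names merely echo `_startFrame` and `_currentFrame`.

```python
class VideoFrameTracker:
    """Tracks the last processed frame and rebases frame numbers to start at 0."""

    def __init__(self, start_frame):
        self.start_frame = start_frame  # first frame number the player reports
        self.current_frame = None       # last player frame that was processed

    def next_actual_frame(self, player_frame):
        """Return the zero-based frame index, or None if it was already processed."""
        if player_frame == self.current_frame:
            return None
        self.current_frame = player_frame
        return player_frame - self.start_frame
```

For a video whose first frame is reported as 3, player frame 3 becomes actual frame 0, so the index lines up with a depthfile that counts from 0.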
By default the video always loops.
- Pause the video.
- Save the depth using `SaveDepth()`.
- If the depthfile was created, load it so it can be used.
- Replay the video.
The "saving the output" operations above are async methods whose `Task` objects are saved to the `_processedFrames` list.
Also has a `shouldReload` parameter to reload the depthfile. This is set to true when the depthfile is modified, because for some reason opening an entry that was already opened is prohibited even when no modification is done.
- Wait for all tasks in the list.
- Determine if the depthfile is full, and if it is, set `_shouldUpdateArchive` to false. In this case, when it's reopened the mode for the depthfile will be changed from `Update` to `Read`.
Indicates that an additional input is needed.
Called by `SequentialInput()`.
- Check the type of the input.
- Set the location for the depthfile.
- Call `FromImage()` or `FromVideo()`.
Triggered when the matching command is given.
- Set the size of the output (2048, 4096)
- Create the path of the output
- If the input is an image, simply `Capture()` and return.
- Else, set `_recording` to true and `_shouldCapture` (whether the screen should be captured at the very moment) to false, make the `_vp` send the `frameReady` event, and rewind the video.
As `frameReady` is set to true, this will be called every frame.
- Pause the video.
- Call `UpdateVid()` manually. This changes the mesh.
- Set `_shouldCapture` to true, indicating that the screen has to be captured.
Called when `UpdateTex()` is called and it's recording.
Only active when `_shouldCapture` is true.
- Capture the screen.
- Set `_shouldCapture` to false, making the mesh update.
- Check if the video has ended, and advance or terminate.
Adds the task for capturing to `_processedFrames`.
Cleanup and wait for all tasks.
TODO: delete the reference for the `AsyncDepthModel`
Saves the current parameters to `_paramsDict`, which can be saved to the depthfile.
If `init`, this will be loaded at the first frame and the frame number for this will be `-1`.
Cleanup the handlers, save the depths, save the `_paramsDict` if it should. Also dispose the `DepthFileUtils`.
Uses `OnlineTex`.
Decodes gif files using `UniGif`. Unfortunately, this operation is slow (probably not meant for real-time input).
It uses the `GifPlayer.cs` wrapper. More there.
Uses pgm files as the depth map. Requires an additional input to use as the texture.
The contents of the `About` screen are loaded from a text file.
For all sliders in this program. The value on the label changes when the slider value changes.
The scripts for the sliders for the mesh. Extends `SliderParentBehavior`.
A static class to store all mesh sliders. Their min/max values can be saved and loaded.
- Get the target parameter.
- Add a listener to the mesh.
- Add itself to the `MeshSliderParents`.
Invoked when any parameter in the mesh changes.
- Check if the parameter name matches.
- If the value is identical, it means the event was invoked by the slider.
- When the value is in range of the slider, set it accordingly.
- Else, just change the value of the label but not the slider's value. Also visually indicate this case.
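The handler logic above can be sketched in Python as follows. `SliderState` and `on_param_changed` are hypothetical names; the `out_of_range` flag stands in for the visual indication, whose actual form the text does not describe.

```python
from dataclasses import dataclass


@dataclass
class SliderState:
    param_name: str
    min_value: float
    max_value: float
    value: float
    label: str = ""
    out_of_range: bool = False


def on_param_changed(slider, param_name, new_value):
    """React to a mesh parameter change the way the steps above describe."""
    if param_name != slider.param_name:
        return  # the event is about another parameter
    slider.label = str(new_value)
    if new_value == slider.value:
        # Identical value: the event was caused by this slider itself.
        slider.out_of_range = False
        return
    if slider.min_value <= new_value <= slider.max_value:
        slider.value = new_value
        slider.out_of_range = False
    else:
        # Keep the knob where it is; only the label shows the real value.
        slider.out_of_range = True
```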
Operations for the options menu.
Most of them call the wrapper methods in the `MainBehavior`.
Exports the current depth values as an image file.
TODO: add an EXR export using EncodeToEXR()
BUG: resized depth maps have artifacts on the borders
Why hasn't this been deleted
When `VrMode()` is called, it changes the canvas to world space and activates the VR controllers. Except the controller trigger does not work, for unknown reasons.
TODO: for android builds, revert this to this
Sets the GameObjects that are used by multiple classes to a static class.
Manages the windows so that only one is active on screen.
For Cardboards.
For manipulating depthfiles.
TODO: separate the actual static utility functions from the ones that can be disposed
Parallel to the python version.
Get the matching depthfile name given the filename, hash value, and the model.
Save the depth values at the frame given.
- All files are saved as 8-bit pgm files.
- The variable `_count` keeps track of valid frames, and when it reaches the framecount the file is set to `full`.
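The bookkeeping behind `full` can be sketched in Python. `FrameCounter` is a hypothetical class; unlike a plain counter such as `_count`, this sketch keeps a set of frame numbers so that saving the same frame twice is not counted twice, which is a design choice of the sketch rather than a description of the source.

```python
class FrameCounter:
    """Marks a depthfile as full once every frame has been saved."""

    def __init__(self, framecount):
        self.framecount = framecount
        self._saved = set()  # frame numbers that already have a depth map
        self.full = False

    def mark_saved(self, frame):
        self._saved.add(frame)
        if len(self._saved) >= self.framecount:
            self.full = True
```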
Check if the depthfile is available with the matching hashval and the model.
Again, parallel with the python version.
Converts the `Depth` object into a pgm bytestring.
- If it's not `ReadOnlyMode` and not full, open it in `Update` mode. Else, use the `Read` mode.
- Also fetches `framecount` (which is also in the `metadata`), `modelType` (why is this still here?), the `metadata` dictionary, and `paramDict`.
- Should these rather be another object?
Fetch the `Depth` object from the given frame.
There is code that converts the legacy parameters (`MeshLoc`, ...) to the new ones.
16bit PGM files are not supported.
For using Unity coroutines in non-`Monobehaviour`s.
- When a gif starts being decoded, it saves the start time to the variable `_decodingStartTime` and passes it to the callback function.
- Once decoded, the callback function checks if the given time is equal to `_decodingStartTime`. If it isn't, it means another gif has started being decoded. Then the decoded contents are discarded.
- This is because halting the decoding before it finishes creates a memory leak.
- All textures are decoded at once, and it calculates the current frame using the time offset and returns the matching texture.
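The stale-result check and the time-based frame lookup described above can be sketched in Python. `GifSlot` and `frame_at` are hypothetical names; looping the animation with a modulo is an assumption of this sketch, and per-frame delays are assumed to be given in seconds.

```python
class GifSlot:
    """Keeps only the result of the most recently started decoding."""

    def __init__(self):
        self._decoding_start_time = None
        self.frames = None

    def start_decoding(self, now):
        self._decoding_start_time = now
        return now  # token handed to the decoder's callback

    def on_decoded(self, token, frames):
        if token != self._decoding_start_time:
            return False  # a newer gif was started meanwhile: discard
        self.frames = frames
        return True


def frame_at(frame_delays, elapsed):
    """Pick the frame index for a time offset, looping over the whole gif."""
    t = elapsed % sum(frame_delays)
    for index, delay in enumerate(frame_delays):
        if t < delay:
            return index
        t -= delay
    return len(frame_delays) - 1  # guard against rounding at the very end
```

Letting an outdated decoding run to completion and then dropping its result trades some wasted work for not having to interrupt the decoder.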
Sets the current VR camera rotation to the origin direction.
Converts image files to the texture.
Calculates the SHA-256 hash.
Returns the UNIX time.
Create the directory if it doesn't exist, does nothing if it exists.
Shouldn't this be in `UIStaticSetter.cs`?
Read/Write the options strings that save `searchCache` and `saveOutput`. It's just a text file with two lines. I believe this should just use `PlayerPrefs`.
Converts the `Depth` object into a texture.
- Shouldn't this be in `Depth`?
Saves the screen on the VR headset.