1D gaussian input node implementation (WIP) #129
base: master
Conversation
Implementation of a 1D Gaussian input node for ProbabilisticCircuits.jl; the `update_params` function is currently missing.
Added comments inline.
    function sample_state(dist::Union{BitsGaussian, Gaussian}, threshold, heap)
        # Sample from standard normal
        z = randn()
`threshold` already gives you the random number you need, so you probably should not sample again. We sample all the randomness we need beforehand and pass it along to the input nodes, so here `threshold` is basically sampled from a uniform distribution on [0,1]. More precisely, `threshold` is the log of a uniform random variable on [0,1].
For the Gaussian we might need to use the inverse CDF to make it work with `threshold`. If that's not easy to do, maybe we can adjust how randomness is passed to the input nodes.
Here is where we pass the random numbers to input nodes sample_state
https://github.com/Juice-jl/ProbabilisticCircuits.jl/blob/27cb093439c8db5b6e59f75567800ff92d4fffa6/src/queries/sample.jl#L174
And we sample all randomness needed before calling the kernel
https://github.com/Juice-jl/ProbabilisticCircuits.jl/blob/27cb093439c8db5b6e59f75567800ff92d4fffa6/src/queries/sample.jl#L133
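One way to make the pre-sampled `threshold` drive the Gaussian sample is an inverse-CDF transform. The sketch below is an assumption, not the repo's API: `sample_state_sketch` is a hypothetical name, and it assumes Distributions.jl is available and that `threshold` is the log of a Uniform(0,1) draw, as described above.

```julia
using Distributions  # assumed available; provides Normal and quantile

# Sketch: recover the uniform draw from threshold = log(u), then apply
# the Gaussian inverse CDF (quantile), so no extra randn() call is needed.
function sample_state_sketch(mu, sigma, threshold)
    u = exp(threshold)                 # u ~ Uniform(0,1)
    quantile(Normal(mu, sigma), u)     # inverse-CDF transform
end
```

This keeps all randomness where it is sampled today, since the kernel only consumes the uniform numbers it is already handed.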
src/nodes/gaussian_dist.jl
Outdated
    error("Not implemented error: `update_params`, $(typeof(dist))")
    #heap_start = dist.heap_start
    #missing_flow = heap[heap_start + GAUSS_HEAP_MISSINGFLOW]
Here I think this should be very similar to the binomial case.
- For computing "node_flow" we just add `missing_flow` to it directly.
- For computing "node_flow * value", if we have missing flow then we add `oldp * missing_flow` to it; the reason is that if we don't observe that Gaussian, its expected value is the old `p` (or `old_mu`) in this case.

`pseudocount` was basically used to avoid getting 0 probabilities, so it's pretending we observed `x=i` for every `i`. For the Gaussian case I think we would treat it the same as the missing flows (or we can ignore it for now, since I'm not sure it fits with a continuous/Gaussian variable), so we can first try with `pseudocount=0`.
Overall, I think it should look something like this:

    missing_flow = heap[heap_start + GAUSS_HEAP_MISSINGFLOW]
    node_flow = heap[heap_start + GAUSS_HEAP_FLOW] + missing_flow + pseudocount
    old_mu = heap[heap_start]
    new_mu = (heap[heap_start + GAUSS_HEAP_FLOWVALUE] + (missing_flow + pseudocount) * old_mu) / node_flow
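As a quick sanity check of this weighted-mean update, here is a standalone sketch with hypothetical numbers (the helper name `updated_mu` and the inputs are illustrative, not from the codebase):

```julia
# Hypothetical helper mirroring the mean update sketched above.
function updated_mu(flow_value, flow, missing_flow, pseudocount, old_mu)
    node_flow = flow + missing_flow + pseudocount
    (flow_value + (missing_flow + pseudocount) * old_mu) / node_flow
end

# With no missing flow and no pseudocount this reduces to the plain
# weighted mean of the observed values: flow_value / flow.
updated_mu(6.0, 3.0, 0.0, 0.0, 0.0)  # 6 / 3 = 2.0
# Missing flow pulls the estimate toward the old mean:
updated_mu(6.0, 3.0, 1.0, 0.0, 6.0)  # (6 + 1*6) / 4 = 3.0
```

The second call shows why the `old_mu` term matters: unobserved flow contributes its expectation rather than being dropped.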
Codecov Report
Base: 41.10% // Head: 28.29% // Decreases project coverage by -12.81%.
Additional details and impacted files:

    @@            Coverage Diff            @@
    ##           master     #129     +/-   ##
    ===========================================
    - Coverage   41.10%   28.29%   -12.81%
    ===========================================
      Files          23       24        +1
      Lines        1927     1990       +63
    ===========================================
    - Hits          792      563      -229
    - Misses       1135     1427      +292
I think with the new changes we are almost there.
The main thing is the randomness: we probably want to do that outside the kernel and initialize with normal random samples here (basically we need to somehow decide whether to draw normal or uniform samples based on the context):
https://github.com/Juice-jl/ProbabilisticCircuits.jl/blob/27cb093439c8db5b6e59f75567800ff92d4fffa6/src/queries/sample.jl#L133
For bigger benchmarking/testing we can probably use a RAT-SPN structure with Gaussian leaves to see if it's learning, similar to here (we just need to normalize the MNIST features from UInt8 to a floating-point version).
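The UInt8-to-float normalization mentioned above is a one-line broadcast in Julia (variable names here are illustrative):

```julia
# Convert raw MNIST pixel intensities (UInt8, 0–255) to Float32 in [0, 1].
raw_pixels = UInt8[0, 128, 255]            # illustrative data
features   = Float32.(raw_pixels) ./ 255f0 # ≈ [0.0, 0.502, 1.0]
```

Dividing by `255f0` (a `Float32` literal) keeps the result in `Float32` rather than promoting to `Float64`.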
This PR proposes an implementation of a 1D Gaussian input node (see Issue #128). Currently implemented are parts of the new input node code and the related IO functions, along with their respective tests.
I'm currently trying to figure out how to implement the `update_params` function, but I'm a bit confused about how to use the missing flow, flow, and flow*value to do so. Any tips or feedback on this or any other aspect of the code would be much appreciated.
Other implementation details that remain open questions are: