
Noise analysis #22

Merged: 9 commits merged into main on Oct 1, 2024
Conversation

dzhingarska (Collaborator)

Comparing the shock positions over time of flow data and flow data with additional noise.

@warisa-r self-requested a review on September 24, 2024, 16:31
# Returns
- A new `FlowData` struct with noise added to the relevant fields.
"""
function apply_noise_to_flow(flow_data::FlowData, noise_data::NoiseData)
warisa-r (Owner):
Could we try adding noise to only some properties in some cases and see if things go differently?

dzhingarska (Collaborator, Author):
What do you mean exactly?

warisa-r (Owner):
For example: try adding noise to pressure and density and leave the velocity field alone.
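
A self-contained sketch of that idea on plain arrays (the function name add_selective_noise and the keyword arguments are hypothetical, not the FlowData/NoiseData API of this PR):

using Random

# Add Gaussian noise only to the scalar fields (density, pressure) and
# leave the velocity field untouched, as suggested above.
function add_selective_noise(density, velocity, pressure;
                             intensity = 0.01, rng = Random.default_rng())
    noisy_density  = density  .+ intensity .* randn(rng, size(density)...)
    noisy_pressure = pressure .+ intensity .* randn(rng, size(pressure)...)
    return noisy_density, velocity, noisy_pressure  # velocity deliberately unchanged
end

# Example on dummy 64x64 fields:
rho, v, p = add_selective_noise(rand(64, 64), rand(64, 64, 2), rand(64, 64); intensity = 0.02)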

@warisa-r changed the title from "Ina test branch" to "Noise analysis" on Sep 24, 2024
@warisa-r linked an issue on Sep 24, 2024 that may be closed by this pull request
@@ -4,6 +4,9 @@ using ShockwaveProperties
using Unitful
using ShockwaveDetection

# Create a NoiseData instance
warisa-r (Owner):
Can u leave this out :)

dzhingarska (Collaborator, Author):
ok

dzhingarska (Collaborator, Author):
I left it there because you asked not to change the file last time, but I can remove the noise data instance

warisa-r (Owner), Sep 28, 2024:
I meant, please don’t commit to this file since it’s not relevant to this PR? Sorry for the confusion.

Compares shock fits over time between original and noisy data.
# Pad the shorter array with zeroes to match lengths
if length(original_params) > length(noisy_params)
padding = fill(0.0, length(original_params) - length(noisy_params))
warisa-r (Owner):
The issue is that this would only work if all of our models had only 1 important parameter. But that is not the case for most of our essential models (the majority, and the ones that are fitted most often, are parabola, circle, and line, all of which have 2-3 parameters).

Also, what about the case where the model really does have only 1 parameter, like hline and vline (check out the sod tube 2d case), and that parameter is 0.0? Then the padded value is a spot-on guess and doesn't signify at all that the algorithm failed to fit something, even though it did.

Even though the number in missing_count does vaguely say something, the mean you get from this part of the code is tampered with.

dzhingarska (Collaborator, Author):
I will fix the issue with hline and vline by padding with a special value like NaN. This now gives a clear distinction between models that fit poorly (high RMSE) and models where parameters were simply missing. Additionally, I now extract all the parameters, not just the first one; that was my bad.
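
A minimal sketch of the NaN-padding idea described above (pad_with_nan is a hypothetical helper, not necessarily the code that ended up in analysis/noise.jl):

function pad_with_nan(original_params::Vector{Float64}, noisy_params::Vector{Float64})
    n = max(length(original_params), length(noisy_params))
    # NaN marks "parameter missing", so it can never be confused with a genuine
    # 0.0 (the hline/vline case discussed above).
    pad(v) = vcat(v, fill(NaN, n - length(v)))
    return pad(original_params), pad(noisy_params)
end

# Only entries present in both vectors take part in the arithmetic comparison:
orig, noisy = pad_with_nan([1.0, 2.0, 3.0], [1.1])
valid = .!(isnan.(orig) .| isnan.(noisy))
differences = orig[valid] .- noisy[valid]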

println("No valid differences calculated.")
return NaN
end

mean_diff = mean(valid_diffs)
inf_ratio = inf_count / length(original_positions) # Calculate ratio based on original count
warisa-r (Owner):
What if the length of the original positions is 0? This isn't a priority, but it is something to look out for.

dzhingarska (Collaborator, Author):
It practically never happens, but I can still handle this case if necessary; I don't think it is needed for our simulations, though.
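
A tiny sketch of the guard being discussed (hypothetical helper name; the PR computes inf_ratio inline):

function safe_inf_ratio(inf_count::Integer, original_positions)
    # Nothing to compare against: return NaN instead of dividing by zero.
    isempty(original_positions) && return NaN
    return inf_count / length(original_positions)
end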

"""
compare_shock_fits_over_time(original_fits, noisy_fits)
# Ensure both parameter vectors are not empty
if isempty(original_params) || isempty(noisy_params)
warisa-r (Owner):
The fit result will never be empty if there is a cluster of points. If no shock has formed in a frame, e.g. because the supersonic inflow in some simulation instances is only introduced later, does this mean we can't compare fits over the entire simulation run?

What would be the reason to throw an error here?

dzhingarska (Collaborator, Author):
The fit result is not going to be empty and the condition is never met; I put it there mostly for the overall look of the function. And yes, we cannot compare flow without shocks, and we also do not want to; that is not the purpose of this function either.

warisa-r (Owner) commented on Sep 27, 2024:

@dzhingarska Could you please resolve the things you already fixed? I also think this task will take at least a while until we can get a sensible result not only with the sod 2d right case, but with shock orb and obstacles too (if I were to do it; maybe it takes you less time). Or at least shock orb, since sod 2d right is simply a line. (For the obstacle case the detection works as well as in any other case: no shocks are detected on NaN cells, so it's like working with normal sim scenarios, you just have more diverse fits.)

In my humble opinion, it isn't easy to quantify false detection correctly and logically once we take a close look at all the false-detection scenarios, be it missing points or too many points. Initially, I thought this would take around a month, and no shorter than that. It has been a month, and maybe it's better if we view what we want to do more realistically. I don't know if this is something we can report on yet.

We could move on to the report with what we have, so we can finally be done with the software; we have been doing this since the beginning of June, which is quite a long time already.

It is not a big deal, but if we can get this working, that'd be cool.

Let me know what you think and plan to do :)

@warisa-r closed this on Sep 27, 2024
@warisa-r reopened this on Sep 27, 2024
@@ -10,4 +10,4 @@ dbscan_algo = DBSCANAlgo(0.25, 3, 10)
detection = detect(flow_data, point_detect_algo, dbscan_algo, fitting_algo)

plot_shock_fits_over_time(flow_data, detection, true)
#create_heatmap_evo_with_shock(flow_data, detection, :density_field, true, false)
#create_heatmap_evo_with_shock(flow_data, detection, :density_field, true, false)
warisa-r (Owner):
Can we not commit to the examples in src? Since we are not doing anything with this example right?

dzhingarska (Collaborator, Author):
Can I just copy it from main and commit, since it would be complicated reverting commits just because of this?

warisa-r (Owner):
I don't think this matters much so I'm probably going to merge despite this minor difference

println("Padded Noisy Parameters: ", noisy_params)

# Calculate the differences while ignoring NaNs
differences = original_params .- noisy_params
warisa-r (Owner):
I don't quite understand what padding is doing here in terms of dimensions. You mentioned missing parameters, and I don't really understand what that means.

  1. You are talking about cases where there isn't any fit for a certain cluster or, even worse, frame. In that case, you are right that the parameter is missing: since there is no fit, there can be no parameter. You won't be able to perform the arithmetic operation anyway whenever the two runs detect different models at the same spot in your params arrays.

  2. You are talking about different models with different numbers of parameters, like

  • Line: 2 parameters, e.g. 3x - 1
  • Parabola: 3 parameters, e.g. 4x^2 - 0.1x - 0.5
    These two are very close to each other between 0 and 1, and it is very likely that our algorithm fits these two different models for the scenarios with and without noise, respectively. You can plot this. So intuitively, the difference between these two models in this context should be very small, right?

But with your padding, the norm of original_params .- noisy_params is going to be quite big. (Consider that we are dealing with small bounds, which we do for every simulation case we have; we are talking (-1,1) x (-1,1) or (-2,1) x (1,1).)

Even more severe: say you have two models with exactly the same parameters,
1, 1, 0.5,
but one is the circle model and the other the parabola model. Then they are two completely different shapes, but your algorithm suggests they are the same.

I don't think this is a good way to make the arithmetic operations work. Are there other things we can compare if the models don't match? For example, how each model deviates from the original cluster of points (the noiseless points) it was trying to fit.

warisa-r (Owner):
I would advocate for an approach that is not solely based on making arithmetic operations with the broadcasting operator work, since our curves are parametrized starkly differently. Comparing the parameters can work if they come from the exact same model, the exact same time frame, and the exact same cluster (but looking at the way you store the parameters, it is hard to tell whether parameters at the same position of the two arrays (noisy and original) come from the same cluster and frame, since push! is the storage mechanism).

dzhingarska (Collaborator, Author):
You have a point there; comparing the parameters of the fit actually does not make as much sense as I was expecting, and it seems another approach is needed. Instead of focusing on the parameters themselves, we can compare how well the models fit the data points (original_shock_positions_over_time and noisy_shock_positions_over_time). The new solution compares the model residuals (the difference between the original data points and the fitted data) rather than the raw parameters. Check it out and suggest improvements, or feel free to adjust it as well.
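
A self-contained sketch of the residual idea (the model evaluators below are stand-ins; the PR's calculate_residuals works on its own fit objects):

using Statistics: mean

# Residual = difference between the observed points and the fitted curve.
residuals(model, params, xs, ys) = ys .- model.(xs, Ref(params))

line(x, p)     = p[1] * x + p[2]
parabola(x, p) = p[1] * x^2 + p[2] * x + p[3]

xs = collect(range(0.0, 1.0; length = 20))
ys = 3.0 .* xs .- 1.0                     # points generated by the line 3x - 1

r_line     = residuals(line,     [3.0, -1.0],       xs, ys)
r_parabola = residuals(parabola, [4.0, -0.1, -0.5], xs, ys)

# Both residual vectors have the same length as the data, so models with
# different parameter counts can still be compared, e.g. via their RMSE.
rmse(r) = sqrt(mean(r .^ 2))
rmse_line, rmse_parabola = rmse(r_line), rmse(r_parabola)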

diff = (orig_pos - closest_noisy_pos)^2
push!(diffs, diff)
# Remove the matched noisy position to avoid double matching
deleteat!(noisy, closest_idx)
warisa-r (Owner):
This algorithm looks fine to me. My problem with this kind of penalty (even though it is fair) is that the difference is now bloated by the large penalty, which renders the difference between the closest noisy positions and the original positions trivial (the penalty dominates the floating-point values and pushes the smaller contributions out of significance).

Could we not add large penalties and only count missing shock positions? That would lessen the meaning of the diffs calculated here, but I think there is a limit to what we can get out of arithmetic operations.

dzhingarska (Collaborator, Author), Sep 28, 2024:
Done, no more large penalties, just missing_count. If you still want some refinements, let me know, or feel free to edit as you like as well.

warisa-r (Owner), Sep 29, 2024:
This looks good! One more thing: I think it might be a good idea if we had separate counts of how many shocks our algorithm falsely detected and how many it failed to detect.
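
A sketch of how the matching loop could report both counts separately (hypothetical helper; the merged code keeps a single missing_count):

function match_positions(original::Vector{Float64}, noisy::Vector{Float64})
    remaining = copy(noisy)                # don't mutate the caller's array
    diffs = Float64[]
    failed_to_detect = 0                   # original positions left without a partner
    for orig_pos in original
        if isempty(remaining)
            failed_to_detect += 1
            continue
        end
        closest_idx = argmin(abs.(remaining .- orig_pos))
        push!(diffs, (orig_pos - remaining[closest_idx])^2)
        deleteat!(remaining, closest_idx)  # avoid double matching
    end
    falsely_detected = length(remaining)   # leftover noisy positions with no partner
    return diffs, failed_to_detect, falsely_detected
end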

# Calculate the RMSE (ignoring NaN values)
rmse = sqrt(mean(filtered_differences .^ 2))

return rmse, missing_count
warisa-r (Owner):
Please try this with funky_square.celltape with a gradient threshold of 0.05. There are all sorts of clusters, and fits should show up. If this algorithm is working, then it should run fine with this case.


y = range(bounds[2][1], bounds[2][2], length=ncells[2])

for t in eachindex(nsteps) # Use eachindex for time steps
if isempty(original_shock_positions_over_time[t])
warisa-r (Owner):
A fit should be compared to its respective cluster, not to all the shock points in one time frame. In one frame we could be trying to fit 4 curves, and comparing all the points to only one of those curves will produce a falsely bloated residual (see the last frames of sod shock orb, for example).
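
A sketch of the per-cluster comparison being suggested, under an assumed container layout (fits_per_frame[t][i] is the fit for cluster i in frame t, clusters_per_frame[t][i] the corresponding points, and fit_residuals a stand-in for calculate_residuals):

function per_cluster_rmse(fits_per_frame, clusters_per_frame, fit_residuals)
    rmse_per_frame = Vector{Vector{Float64}}()
    for t in eachindex(clusters_per_frame)
        frame_rmse = Float64[]
        for (i, cluster) in enumerate(clusters_per_frame[t])
            # Score each fit only against the cluster it was fitted to, never
            # against all shock points of the frame.
            r = fit_residuals(fits_per_frame[t][i], cluster)
            push!(frame_rmse, sqrt(sum(abs2, r) / length(r)))
        end
        push!(rmse_per_frame, frame_rmse)
    end
    return rmse_per_frame
end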

warisa-r (Owner) left a comment:
I think things should be working quite smoothly now if you test it with shock orb. But if it doesn't, it's totally up to you whether you want to test it. Apart from some comments here about the comparison logic (which might be wrong on my side), I think everything should be functioning :D


# Calculate residuals for original and noisy fits
original_residuals = calculate_residuals(original_fit[t][i], original_data_points)
noisy_residuals = calculate_residuals(noisy_fit[t][i], noisy_data_points)
warisa-r (Owner):
Correct me if I'm wrong here, but isn't this line supposed to be

noisy_residuals = calculate_residuals(noisy_fit[t][i], original_data_points) ?

We want to see how much the curve deviates from its original points (the original data), i.e. how sensitive the curve is compared to the original findings. If we compared the noisy fit to the noisy data points, it is very likely we would get a new, far-off curve that fits the noise better, as the Levenberg-Marquardt algorithm naturally does.

What we rather want to see is how clustering, or different kernels, or even the concept of curves itself, can help preserve the original shock curves.

dzhingarska (Collaborator, Author):
Oh, I totally overlooked that; I think you are right. Thanks.

end

# Ensure residuals have the same length
if length(original_residuals) != length(noisy_residuals)
warisa-r (Owner):
This is very cautious of you :) However, I haven't encountered this before and am not sure where it can come from, so I'm also not sure whether this is a case we can hit if we are comparing the same number of data points. But in case the noisy shock points and original shock points don't match, maybe we can take the difference of the second norms of both residual vectors,

namely

norm(original_residuals, 2) - norm(noisy_residuals, 2)

This way we are performing a scalar operation, but as mentioned before, I am not so sure about this as I haven't experienced this problem before.

dzhingarska (Collaborator, Author):
Sounds reasonable, I will implement this difference.
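
A minimal sketch of that fallback (hypothetical helper name):

using LinearAlgebra: norm
using Statistics: mean

function residual_score(original_residuals, noisy_residuals)
    if length(original_residuals) == length(noisy_residuals)
        d = original_residuals .- noisy_residuals
        return sqrt(mean(d .^ 2))                            # elementwise RMSE
    else
        # Length mismatch: compare the 2-norms instead, a scalar operation.
        return norm(original_residuals, 2) - norm(noisy_residuals, 2)
    end
end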

residual_length_mismatch_count = 0 # Number of times residuals had different lengths

# Cluster the shock points for original and noisy data
original_clusters = ShockwaveDetection.cluster_shock_points(dbscan_algo, original_shock_positions_over_time, flow_data)
warisa-r (Owner):
Not necessarily a priority but shock clusters can already be accessed from ShockDetectionResult as an attribute called

shock_clusters_over_time
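
A short sketch of that reuse, assuming the detect call from examples/sod_shock_tube_1d.jl earlier in this PR and reusing its variables:

using ShockwaveDetection

detection = detect(flow_data, point_detect_algo, dbscan_algo, fitting_algo)

# The clusters are already stored on the ShockDetectionResult, so there is no
# need to call cluster_shock_points again:
original_clusters = detection.shock_clusters_over_time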

dzhingarska (Collaborator, Author):
I think we can upload to the registry now; if there is anything more, it won't be relevant for the user.

@warisa-r merged commit 8092273 into main on Oct 1, 2024
4 checks passed
Successfully merging this pull request may close these issues: test with noise
2 participants