PCA Implementation - AFTER V1.2 - #36

Open
PiMaV opened this issue Jun 28, 2024 · 1 comment
Labels
low prio Not handled unless necessary.

Comments

@PiMaV
Owner

PiMaV commented Jun 28, 2024

  • Button: Add a "Calculate PCA" button so that the user triggers this potentially expensive calculation deliberately.
  • Autosave: Automatically save the PCA results to the images folder once they are calculated.
  • Metadata: Save metadata in a separate file, with information such as the original image size, the number of images, and the cropped region.
  • Filename: Save files with a fixed filename format, pca_<images>im_<x>x_<y>y, e.g. pca_180im_133x_190y.npy.
  • Bug fix: Ensure the PCA is cleared when a new dataset is loaded.
  • Feature: Load the PCA from file if it already exists in the same folder. Because of cropping, check that the size of the stored PCA matches the current crop.
  • Viewing: Change "Show/Hide" to "Refresh View".
  • Components: Display components with three digits (currently shows two digits).
  • Dataset Size Check: Implement a check for large datasets and show an error/warning to the user. On such datasets np.linalg.svd fails with a MemoryError ("Unable to allocate ...") because it cannot allocate enough memory.
    • Calculation: Show an info on the PCA page with the formula for the memory requirement, "(x * y)^2 * 8 / 1024^3" in GiB (an (x * y) by (x * y) matrix of 8-byte floats). Compare this against the available and the already blocked RAM.
  • GPU Acceleration: Implement GPU acceleration of the PCA using CuPy.
  • Full Matrices: Implement an option for full_matrices = False/True.
  • Sparse SVD: Consider using scipy.sparse.linalg.svds, which computes only the k largest singular values. Check the literature to determine the most appropriate method.
  • Size Estimation: Show a size estimation for the PCA in the lower taskbar. This allows the user to see the impact directly when using cropping or masking.
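The filename convention above (pca_180im_133x_190y.npy) can be generated and parsed with a small helper, which also covers the "load if it already exists, but check the size" feature. A minimal sketch; the helper names `pca_filename`, `parse_pca_filename` and `find_saved_pca` are hypothetical:

```python
import os
import re

# Pattern from the issue: pca_<images>im_<x>x_<y>y.npy
_PCA_NAME = re.compile(r"^pca_(\d+)im_(\d+)x_(\d+)y\.npy$")


def pca_filename(num_images: int, x: int, y: int) -> str:
    """Build e.g. 'pca_180im_133x_190y.npy' (hypothetical helper)."""
    return f"pca_{num_images}im_{x}x_{y}y.npy"


def parse_pca_filename(name: str):
    """Return (num_images, x, y), or None if the name is not a PCA file."""
    match = _PCA_NAME.match(name)
    if match is None:
        return None
    return tuple(int(g) for g in match.groups())


def find_saved_pca(folder: str, num_images: int, x: int, y: int):
    """Return (matching_path, other_sizes).

    matching_path: saved PCA for the current crop, or None.
    other_sizes: (num_images, x, y) of PCA files saved for a different crop,
    so the UI can warn instead of silently loading a mismatched PCA.
    """
    match, others = None, []
    for name in sorted(os.listdir(folder)):
        size = parse_pca_filename(name)
        if size is None:
            continue  # not a PCA file (e.g. the images themselves)
        if size == (num_images, x, y):
            match = os.path.join(folder, name)
        else:
            others.append(size)
    return match, others
```

Encoding the size in the name makes the mismatch check possible without opening the file; the metadata file can still be read afterwards for details such as the crop offsets.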
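The memory formula "(x * y)^2 * 8 / 1024^3" GiB is the size of the square (x * y) by (x * y) float64 Vh matrix that np.linalg.svd returns for an (n_images, x * y) data matrix when full_matrices=True; with full_matrices=False, Vh has only min(n_images, x * y) rows. For the example crop of 133 x 190 pixels, x * y = 25,270, so the full Vh alone needs 25,270^2 * 8 bytes, about 4.76 GiB. A sketch of the estimate and the RAM comparison, assuming `available_gib` is obtained elsewhere (for example from psutil.virtual_memory().available); the function names are hypothetical:

```python
BYTES_PER_FLOAT64 = 8
GIB = 1024 ** 3


def issue_formula_gib(x: int, y: int) -> float:
    """Formula from the issue: the (x*y) x (x*y) float64 Vh matrix alone."""
    return (x * y) ** 2 * BYTES_PER_FLOAT64 / GIB


def svd_output_gib(n_images: int, x: int, y: int, full_matrices: bool = True) -> float:
    """Approximate size in GiB of U, s and Vh for an (n_images, x*y) float64 matrix.

    Lower bound only: the input copy and the LAPACK workspace are not counted.
    """
    m, n = n_images, x * y
    k = min(m, n)
    if full_matrices:
        elements = m * m + k + n * n   # U: (m, m), s: (k,), Vh: (n, n)
    else:
        elements = m * k + k + k * n   # U: (m, k), s: (k,), Vh: (k, n)
    return elements * BYTES_PER_FLOAT64 / GIB


def memory_warning(required_gib: float, available_gib: float):
    """Warning text if the estimate exceeds the available RAM, else None."""
    if required_gib > available_gib:
        return (f"PCA needs about {required_gib:.2f} GiB, "
                f"but only {available_gib:.2f} GiB are available.")
    return None
```

For 180 images of 133 x 190 pixels, svd_output_gib with full_matrices=False stays below 0.05 GiB, which shows why the full_matrices option matters more than the number of images.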
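The full_matrices option only changes the shapes of U and Vh; the singular values are identical. A small NumPy check on random data standing in for 6 flattened images of 4 x 5 pixels (the data is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((6, 4 * 5))  # 6 images of 4 x 5 pixels, one per row

U_f, s_f, Vh_f = np.linalg.svd(data, full_matrices=True)
U_r, s_r, Vh_r = np.linalg.svd(data, full_matrices=False)

print(U_f.shape, Vh_f.shape)  # (6, 6) (20, 20): Vh grows with (x*y)^2
print(U_r.shape, Vh_r.shape)  # (6, 6) (6, 20): only min(n_images, x*y) rows

# Same singular values; the 14 extra rows of the full Vh span the null space
# of data and do not correspond to any singular value (no PCA component).
assert np.allclose(s_f, s_r)
```

Since the extra rows carry no components, full_matrices=False is a reasonable default, with True kept as an option.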
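If only the first few components are displayed, a truncated SVD avoids computing all of them. scipy.sparse.linalg.svds accepts a dense array and returns only the k largest singular triplets (k must be smaller than min(data.shape)). A sketch comparing it with the full NumPy result; k = 5, the random data and the function name `truncated_pca` are illustrative assumptions. The code sorts to descending order explicitly rather than relying on the order svds returns:

```python
import numpy as np
from scipy.sparse.linalg import svds


def truncated_pca(data: np.ndarray, k: int):
    """Top-k principal components of data (n_images, x*y) via scipy's svds."""
    centered = data - data.mean(axis=0)   # PCA works on mean-centred data
    u, s, vh = svds(centered, k=k)
    order = np.argsort(s)[::-1]           # descending singular values
    return u[:, order], s[order], vh[order]


rng = np.random.default_rng(1)
data = rng.standard_normal((40, 300))     # 40 flattened images, 300 pixels each
u, s, vh = truncated_pca(data, k=5)

# Reference: all singular values from the full (dense) SVD
s_full = np.linalg.svd(data - data.mean(axis=0), compute_uv=False)
print(np.allclose(s, s_full[:5]))
```

Whether svds is actually faster than a dense SVD depends on the data size and k, so this is worth benchmarking on real datasets before switching.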
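For the taskbar estimate, the number of pixels left after cropping and masking replaces x * y in the memory formula. A sketch assuming the crop is given as (row_start, row_stop, col_start, col_stop) and the mask as a boolean array where True marks pixels used in the PCA; both conventions and the function name `pca_size_status` are hypothetical:

```python
import numpy as np


def pca_size_status(image_shape, crop=None, mask=None) -> str:
    """Status-bar text: pixel count and the (x*y)^2 * 8 / 1024^3 GiB estimate.

    image_shape: (rows, cols) of the full image.
    crop: optional (row_start, row_stop, col_start, col_stop), slice semantics.
    mask: optional boolean array of image_shape; True = pixel used in the PCA.
    """
    keep = np.ones(image_shape, dtype=bool)
    if mask is not None:
        keep &= np.asarray(mask, dtype=bool)
    if crop is not None:
        r0, r1, c0, c1 = crop
        keep = keep[r0:r1, c0:c1]
    pixels = int(keep.sum())
    gib = pixels ** 2 * 8 / 1024 ** 3
    return f"PCA: {pixels} px, ~{gib:.2f} GiB"


print(pca_size_status((190, 133)))  # PCA: 25270 px, ~4.76 GiB
```

Recomputing this string on every crop or mask change is cheap, so the user sees the effect immediately.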
@PiMaV
Owner Author

PiMaV commented Jun 28, 2024

Funnily enough, ChatGPT already made some suggestions:

Code Implementation Example:

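import json
import os

import numpy as np

try:
    import cupy as cp  # optional: GPU acceleration
except ImportError:
    cp = None


class PCAHandler:
    def __init__(self, image_folder="images"):
        self.pca_result = None
        self.metadata = {}
        self.image_folder = image_folder
        # Filename format from the issue, stored as .npz, e.g. pca_180im_133x_190y.npz
        self.pca_name = "pca_{}im_{}x_{}y.npz"

    def calculate_pca(self, data, image_shape, original_size=None, crop_region=None,
                      full_matrices=False, use_gpu=False):
        """data: (n_images, x * y) array, one flattened image per row.
        image_shape: (x, y) of the (cropped) images."""
        x, y = image_shape
        data = np.asarray(data, dtype=np.float64)
        data = data - data.mean(axis=0)  # PCA = SVD of the mean-centred data
        try:
            if use_gpu and cp is not None:
                # PCA calculation using CuPy for GPU acceleration
                U, s, Vh = cp.linalg.svd(cp.asarray(data), full_matrices=full_matrices)
                U, s, Vh = cp.asnumpy(U), cp.asnumpy(s), cp.asnumpy(Vh)
            else:
                U, s, Vh = np.linalg.svd(data, full_matrices=full_matrices)
        except MemoryError as e:  # NumPy raises a MemoryError subclass when allocation fails
            required_memory = self.estimate_memory_usage(x, y)
            print(f"Memory Error: Unable to allocate {required_memory:.2f} GiB. {e}")
            return
        self.pca_result = (U, s, Vh)
        self._autosave_pca(x, y, original_size or (x, y), crop_region)

    def _autosave_pca(self, x, y, original_size, crop_region):
        U, s, Vh = self.pca_result
        num_images = U.shape[0]  # one row of U per image
        os.makedirs(self.image_folder, exist_ok=True)
        filename = os.path.join(self.image_folder, self.pca_name.format(num_images, x, y))
        # np.savez stores the three differently shaped arrays without pickling
        np.savez(filename, U=U, s=s, Vh=Vh)
        self._save_metadata(filename, num_images, original_size, crop_region)

    def _save_metadata(self, filename, num_images, original_size, crop_region):
        metadata_filename = filename.replace(".npz", ".json")
        self.metadata = {
            "original_image_size": [int(v) for v in original_size],
            "number_of_images": int(num_images),
            "cropped_region": [int(v) for v in crop_region] if crop_region else None,
        }
        with open(metadata_filename, "w") as f:
            json.dump(self.metadata, f)

    def load_pca(self, filename):
        """Load a saved PCA and its metadata; return False if the file does not exist."""
        if not os.path.exists(filename):
            return False
        with np.load(filename) as npz:
            self.pca_result = (npz["U"], npz["s"], npz["Vh"])
        with open(filename.replace(".npz", ".json")) as f:
            self.metadata = json.load(f)
        return True

    def clear_pca(self):
        # Call this whenever a new dataset is loaded
        self.pca_result = None
        self.metadata = {}

    def estimate_memory_usage(self, x, y):
        # Size in GiB of the (x*y) x (x*y) float64 Vh matrix (full_matrices=True)
        return (x * y) ** 2 * 8 / 1024 ** 3

    def display_memory_estimate(self, x, y):
        estimated_memory = self.estimate_memory_usage(x, y)
        print(f"Estimated memory usage for PCA: {estimated_memory:.2f} GiB")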
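The SVD works on a 2-D matrix, so the image dimensions have to be kept alongside it: flatten the stack into one row per image before the SVD, and reshape the rows of Vh back into component images afterwards. A sketch assuming the images are loaded as a NumPy stack of shape (n_images, y, x); the function names are hypothetical:

```python
import numpy as np


def stack_to_matrix(stack: np.ndarray) -> np.ndarray:
    """(n_images, y, x) image stack -> (n_images, x*y) matrix, one image per row."""
    n_images = stack.shape[0]
    return stack.reshape(n_images, -1).astype(np.float64)


def pca_component_images(stack: np.ndarray, n_components: int) -> np.ndarray:
    """First n_components principal components, each reshaped to an image (y, x)."""
    data = stack_to_matrix(stack)
    data -= data.mean(axis=0)  # centre each pixel over the image sequence
    _, _, vh = np.linalg.svd(data, full_matrices=False)
    return vh[:n_components].reshape(n_components, *stack.shape[1:])


stack = np.random.default_rng(2).random((12, 19, 13))  # 12 images, 19 x 13 px
components = pca_component_images(stack, n_components=3)
print(components.shape)  # (3, 19, 13)
```

Because the shape comes from the stack rather than from the SVD output, the same code works for cropped and uncropped data.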

PiMaV added the "low prio" label (Not handled unless necessary.) on Jun 28, 2024
PiMaV changed the title from "PCA Implementation" to "PCA Implementation - AFTER V1.2 -" on Jun 28, 2024