-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding CUDA support in PR#287 #288
Comments
A pile of work here, very impressive! My understanding was that Cucim previously only supported a couple of WSI formats - in particular those with flat tiff like structures. is that still the case? I'm personally interested in mrxs formats since there appear to be tons of these floating around, but i suspect some of the ndpi ones are also very widely used. does Cucim now support them? i'd point out as well that the latest version of openslide now perfectly supports dicom - which is a huge must have since this appears to be the format that folks will be using clinically going forward (to be compatible with their PACS systems). How would that format fit in here as well? if those check out - last question, what do you imagine the integration to look like? supporting a CPU + GPU versions separately? there are many installations of histoqc which currently reside in CPU only environments, so we won't be able to eliminate that option likely ever. Would we instead have CuCim run on CPU? will there be a negative penalty for that? adding @jacksonjacobs1 as senior developer |
Format support and how does it work with current openslide-based implementation.I looked into the documentation and as the time being cuCIM does support multi-resolutional tiled images but currently it doesn't mention the support of mrsx and here is the current list of supported format:
Given the current development status of Integration with CPU code basesThat will bring us to your final question: how do we do the integration. First, as you said, the majority still use CPU-only environment and we can't forsake that. Meanwhile, I think it is easy to handle both CPU and GPU without breaking the backport compatibility for the below reasons. (1) We can hide as much detail of That means, we need to clearly define and regulates the return type of different set of interfaces such that:
We currently integrate the getBestThumb and resizeTileDownward into WSIImageHandle, wherein the only thing differs are what data interface and resize functions were implemented by subclasses.
GPU Acceleration into ModulesThree important things to note are: Based on (1) and (2) we actually have two options:
Regardless of your options, you may need a dynamic import of function call or wrappers defined to handle numpy.ndarray and cupy.ndarray in module operations, and perhaps a I am inclined to Option 2 because
|
@nanli-emory @choosehappy @jacksonjacobs1 Involvement of WSIHandle as a standalone attribute in BaseImage could be a first step to clean the current hard-to-refactor BaseImage (with its swamps of str keys and logging msgs), and would be beneficial to all future new program features. A few points: (1) Below I prepare a scratch of scripts to compare the difference of methods under (2) It seems like we should prepare to moveforward the major python version from 3.7--3.8 to 3.8--3.10, as more packages are migrated to python3.10 and 3.7 is retiring. A few skimage methods are also obsolete and might not return numerically accurate results (see the issue of frangi). This means we might need to consider upgrading the corresponding packages as well. Hence, The function calls/signature of certain skimage methods were updated so it will be ready for upgrading (remove deprecating signatures etc.) (3) Since the image handle is coupled with nearly all custom QC modules, all these modules are revised for the change and implementation of WSIImageHandle, along with lots of cleanings. TBH the implementation of Modules and BaseImage itself is kinda dirty, especially when the getitem is overused in custom modules and it would be a burden for refactoring even with a modern IDE. import importlib
import pkgutil
from types import ModuleType
def find_public_methods(module):
return [name for name in dir(module) if not name.startswith('_') and callable(getattr(module, name))]
def method_report(full_method_name: str, method_name: str,
present_methods: list, missing_methods: list, misc_errors: list):
# Construct corresponding cucim.skimage method name
cucim_method_name = full_method_name.replace('skimage', 'cucim.skimage', 1)
cucim_module_name = ".".join(cucim_method_name.split(".")[:-1])
try:
cucim_module = importlib.import_module(cucim_module_name)
result = getattr(cucim_module, method_name)
present_methods.append(cucim_method_name)
except (ImportError, AttributeError):
# If the method does not exist in cucim.skimage, add to missing_methods list
missing_methods.append(full_method_name)
except Exception as e:
print(e, full_method_name)
misc_errors.append(full_method_name)
def module_report(name: str, present_methods: list, missing_methods: list, missing_modules: list, misc_errors: list):
try:
submodule = importlib.import_module(name)
except ImportError:
missing_modules.append(name)
return
if isinstance(submodule, ModuleType):
methods = find_public_methods(submodule)
for method in methods:
full_method_name = f"{name}.{method}"
method_report(full_method_name, method, present_methods, missing_methods, misc_errors)
def traverse_modules(package_name):
package = importlib.import_module(package_name)
missing_methods = []
missing_modules = []
present_methods = []
misc_errors = []
for loader, name, is_pkg in pkgutil.walk_packages(package.__path__, package.__name__ + '.'):
# Skip the skimage.data submodule entirely
if ("skimage.data" in name or any([x.startswith('_') for x in name.split('.')])
or "tests" in name.split('.')):
continue
if is_pkg:
module_report(name, present_methods, missing_methods, missing_modules, misc_errors)
continue
full_method_name = name
method_name = name.split(".")[-1]
method_report(full_method_name, method_name, present_methods, missing_methods, misc_errors)
# if not pkg --> method
return missing_methods, missing_modules, present_methods, misc_errors
# Usage
missing_sk_methods, missing_sk_modules, present_sk_methods, misc_sk_errors = traverse_modules('skimage')
print("Missing Methods:")
print(missing_sk_methods)
print("Missing Modules:")
print(missing_sk_modules)
print("Present Methods:")
print(present_sk_methods)
print("Misc Errors:")
print(misc_sk_errors) |
Hi Andrew and Nan, @choosehappy @nanli-emory
since I am tired of running 1k+ WSIs for some paper over and over I add the CUDA support (#287) into the WSIHandles using RapidsAI's
cucim.CuImage
, and here goes some thoughts, benchmarks, and todos:Motivation and Feasibility
(1)
histoqc.wsihandle.CuImageHandle
itself usecucim.clara.CuImage
as the file handle, which intentionally mirrors a subset (and is expanding) of openslide's API as described in their documentation (under development). It uses cupy.ndarray, which mostly mirrors the numpy.ndarray. cuCIM also by itself provides a mirror of skimage for most of common operations (e.g., filters, resize, color transformation etc.):cucim.skimage
. These are pretty much the prerequisites of introducing the GPU acceleration without breaking the interface and compatibility of our code and cuCIM seems to be a good candidate.(2) HistoQC's pipeline is mainly bottlenecked by downsampling (thumbnails) and different sets of filtering options. The thumbnail extraction procedure is occasionally dominated by the read_region which decode the compressed data from tiled pyramid data. I did a few simple benchmarks and here is the result, using a 40x WSI with size 46407x53784.
(3) Since
cupy.ndarray
pretty much mirrors most of the essential API ofnumpy.ndarray
, and the conversion between each other is trivial and simple, and there exists high-level function/transformations that shares the same interface (e.g.,cupyx
replacesscipy
andcucim.skimage
replacesskimage
), this would also be esier to introduce GPU acceleration into custom modules with careful planning.(4) We could definitely list cucim as an extra_requirement in toml and setup.py so it can be optional, without breaking the current pipeline.
Changes
(1) We might need to explicitly define, differentiate and backend data type. For instance, openslide mostly returns
PIL.Image.Image
, andcucim.CuImage
returnscucim.CuImage
itself. Sincecucim.CuImage
is a centerpiece of most operations insideCuImageHandle
yet not sharing any interfaces withPIL.Image.Image
, a different layer of abstraction might be needed inWSIImageHandle
to differentiate the different backend data type the handle returns for internal procedures and the unified output types to external procedures.Herein, a method of
backend_region
is added, which returns the raw output of reading a single region from the handle. While theread_region
read thebackend_region
and explicitly unified them toPIL.Image.Image
using thebackend_to_pil
.(2) We might need to explicitly manage the array data type beneath the backend data, mostly for external transformations. The array data in many cases share the same memory of the backend data (e.g., the
numpy.ndarray
from thePIL.Image.Image
andcupy.ndarray
fromcucim.CuImage
) and is trivial to retrieve. This procedure is defined bybackend_to_array
method.The array data should share similar interfaces and methods that exposes and returns such array data would allow the introduction of GPU operations to external procedures.
For instance, getBestThumb returns
np.ndarray
forOpenslideHandle
andcupy.ndarray
forCuImageHandle
, which allows downstream operations from external modules to identify whether GPU accelerated transformations should be applied.(3) The resizeTileDownward and getBestThumb methods seem to be tightly coupled to the image handles rather than
BaseImage
, and with the modifications from (1) and (2) above, they are integrated into theWSIImageHandle
as generic procedures. Users only need to override methods such asregion_backend
,backend_to_array
, etc. without modifying theresizeTileDownward
andgetBestThumb
themselves.(Benchmark was done via the integrated
resizeTileDownward
andgetBestThumb
.TODO
(1) Image operations with GPU acceleration in custom modules using
cucim.skimage
andcupyx
.(2) Vendor info. So far
cucim.CuImage
simply lumps vendor-specific metadata (e.g., mpp, mag etc.) into itsmetadata
field, and it lacks a unified parser to read these meta data into a map of common property names like what openslide is doing. Currently cuCIM does not offer more detailed documentation regarding the metadata and as a workaround, a dummy openslide.OpenSlide is associated into theCuImageHandle
only for reading properties.The text was updated successfully, but these errors were encountered: