Duplicate Set tilesize when use CPU #2

Tohrusky · 2023-02-02T13:34:24Z

System: MacOS M1 arm64
Cmake Version: 3.25.2
Vulkan Version: 1.3.239.0
Python Version: 3.9

Build

export VULKAN_SDK=/Users/User/VulkanSDK/1.3.239.0/macOS 
mkdir build-arm64 && cd build-arm64
cmake -DUSE_STATIC_MOLTENVK=ON -DCMAKE_OSX_ARCHITECTURES="arm64" \
      -DVulkan_INCLUDE_DIR=/Users/User/VulkanSDK/1.3.239.0/MoltenVK/include \
      -DVulkan_LIBRARY=/Users/User/VulkanSDK/1.3.239.0/MoltenVK/MoltenVK.xcframework/macos-arm64_x86_64/libMoltenVK.a \
      ../src
cmake --build . -j 4

With the GPU, it works well.

With the CPU, however, it fails.

With the same build of the official nihui/realcugan-ncnn-vulkan, both CPU and GPU work.

I found the following at line 149:

        if self._gpuid != -1:
            self._realcugan_object.process(raw_in_image, raw_out_image)
        else:
            self._realcugan_object.tilesize = max(image.width, image.height)
            self._realcugan_object.process_cpu(raw_in_image, raw_out_image)

but in

def _get_tilesize(self):
    if self._gpuid == -1:
        return 400

the tilesize has already been set.

And in cugan ncnn main.cpp, I did not find that.

When I remove line 149, CPU mode works.
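
The double assignment described above can be illustrated with a small stand-in. This is only a sketch: `FakeUpscaler`, its `process` signature, and the GPU placeholder value are hypothetical names made up for illustration, not the project's actual classes. It assumes the wrapper picks a default tilesize at construction time and the CPU branch then overwrites it per image.

```python
class FakeUpscaler:
    """Hypothetical stand-in for the wrapper; not the project's real class."""

    def __init__(self, gpuid):
        self._gpuid = gpuid
        # Default chosen once, at construction time.
        self.tilesize = self._get_tilesize()

    def _get_tilesize(self):
        # Mirrors the quoted snippet: CPU mode (gpuid == -1) defaults to 400.
        if self._gpuid == -1:
            return 400
        return 0  # placeholder only; the GPU branch is not shown in this thread

    def process(self, width, height):
        if self._gpuid == -1:
            # Mirrors line 149: the default is replaced by the longer image side.
            self.tilesize = max(width, height)
        return self.tilesize


cpu = FakeUpscaler(gpuid=-1)
default_tilesize = cpu.tilesize               # value set by _get_tilesize
effective_tilesize = cpu.process(1920, 1080)  # value actually used for this image
print(default_tilesize, effective_tilesize)
```

Under these assumptions the default of 400 never takes effect in CPU mode, because it is overwritten before every run.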

ArchieMeng · 2023-02-04T05:13:30Z

I could not reproduce the problem. Could you check what line 149 is in your copy of the source? On my machine (x86_64 Linux), commenting out the line

self._realcugan_object.tilesize = max(image.width, image.height)

breaks CPU-mode processing of large RGBA images, which is why this line is needed.

"And in cugan ncnn main.cpp, I did not find that."

It used to be 4000, which in practice means the whole image is processed without tiling. The corresponding commit is nihui/realcugan-ncnn-vulkan@7f7536d.

It is a temporary fix for nihui/waifu2x-ncnn-vulkan#186, a crash when upscaling images larger than tilesize.
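
To make the effect of the different tilesize values concrete, here is a rough sketch that counts tiles for one image. It assumes a plain ceiling-division grid and ignores padding and scale factors, so `tile_count` is a hypothetical helper and may not match the actual ncnn tiling code.

```python
import math


def tile_count(width, height, tilesize):
    """Number of tiles in a simple ceil-division grid (assumed layout)."""
    return math.ceil(width / tilesize) * math.ceil(height / tilesize)


w, h = 1920, 1080
tiles_400 = tile_count(w, h, 400)         # 5 columns x 3 rows
tiles_max = tile_count(w, h, max(w, h))   # tilesize equals the longer side
tiles_4000 = tile_count(w, h, 4000)       # the older fixed value
print(tiles_400, tiles_max, tiles_4000)
```

With this model, both `max(w, h)` and 4000 give a single tile for a 1920×1080 image, while 400 splits it into 15 tiles.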

Tohrusky · 2023-02-04T05:54:28Z

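So far I have only tested this on an M1 Mac; I will try Windows later.

I built nihui's project and this project with the same build commands (OpenMP not enabled). With line 149 commented out, upscaling a 1920×1080 image takes about the same time in both, roughly 70 s. The wrapper is even slightly faster (running the binary directly via os.run adds file read/write I/O, etc.).

With line 149 left in, however, a breakpoint shows execution stalling on that line (it does seem to finish eventually? But 290 s is clearly abnormal, haha).

Still, why doesn't nihui do the max(w, h) handling? Does setting tilesize to 400 directly also crash on large images? 😯

PS: cugan ncnn on the M1 is far, far slower than torch 😡

Thanks for the reply, and Happy New Year!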

ArchieMeng · 2023-02-06T04:33:29Z

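"With line 149 left in, however, a breakpoint shows execution stalling on that line (it does seem to finish eventually? But 290 s is clearly abnormal, haha)."

If it stalls on this line, something may be wrong with how tilesize is set. You could use gdb or something similar (I don't know what to use on macOS) to debug it and see exactly where in the binding code it gets stuck.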

Tohrusky · 2023-07-05T16:51:27Z

It actually does run; it just stalls for a long time. The library I wrote myself also runs very slowly on the M1 CPU. Closing 🥳

Tohrusky closed this as completed Jul 5, 2023
