Duplicate Set tilesize when use CPU #2

Closed
Tohrusky opened this issue Feb 2, 2023 · 4 comments

Comments

@Tohrusky

Tohrusky commented Feb 2, 2023

System: macOS, M1 (arm64)
CMake Version: 3.25.2
Vulkan Version: 1.3.239.0
Python Version: 3.9

Build

export VULKAN_SDK=/Users/User/VulkanSDK/1.3.239.0/macOS 
mkdir build-arm64 && cd build-arm64
cmake -DUSE_STATIC_MOLTENVK=ON -DCMAKE_OSX_ARCHITECTURES="arm64" \
      -DVulkan_INCLUDE_DIR=/Users/User/VulkanSDK/1.3.239.0/MoltenVK/include \
      -DVulkan_LIBRARY=/Users/User/VulkanSDK/1.3.239.0/MoltenVK/MoltenVK.xcframework/macos-arm64_x86_64/libMoltenVK.a \
      ../src
cmake --build . -j 4

When I use the GPU, it works well.

But when I try to use the CPU, it fails.

With the same build of the official nihui/realcugan-ncnn-vulkan, both CPU and GPU work.

I found that

at line 149

        if self._gpuid != -1:
            self._realcugan_object.process(raw_in_image, raw_out_image)
        else:
            self._realcugan_object.tilesize = max(image.width, image.height)
            self._realcugan_object.process_cpu(raw_in_image, raw_out_image)

but in

        def _get_tilesize(self):
            if self._gpuid == -1:
                return 400

the tilesize has already been set for CPU mode.

And in cugan ncnn main.cpp, I did not find that.

When I remove line 149, CPU mode works.
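
To make the difference between the two tilesize values concrete, here is a minimal sketch of how many tiles each one implies. It assumes plain ceiling division with no overlap or padding, which is a simplification and not necessarily how the ncnn code splits the image; `tile_grid` is a hypothetical helper and 1920×1080 is just an example size.

```python
import math


def tile_grid(width, height, tilesize):
    """Return (columns, rows) of tiles needed to cover a width x height image.

    Assumption: simple ceiling division, ignoring any overlap or padding
    that a real tiled upscaler may add around each tile.
    """
    if tilesize <= 0:
        raise ValueError("tilesize must be positive")
    return math.ceil(width / tilesize), math.ceil(height / tilesize)


width, height = 1920, 1080  # example image size

# CPU default returned by _get_tilesize in the snippet above.
cols, rows = tile_grid(width, height, 400)
print(cols, rows, cols * rows)  # 5 3 15

# Value assigned on line 149: max(image.width, image.height).
cols, rows = tile_grid(width, height, max(width, height))
print(cols, rows, cols * rows)  # 1 1 1
```

Under that assumption, line 149 turns a 15-tile job into a single tile covering the whole image, so the two code paths do quite different amounts of work per call.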

@ArchieMeng
Member

ArchieMeng commented Feb 4, 2023

I couldn't reproduce the problem. Could you check line 149 in your source code? On my machine (x86_64 Linux), commenting out the line

self._realcugan_object.tilesize = max(image.width, image.height)

breaks CPU mode processing on large RGBA images, which is why this line is needed.

And in cugan ncnn main.cpp, I did not find that.

It used to be 4000, which in practice may process the full image without tiling. The corresponding commit is nihui/realcugan-ncnn-vulkan@7f7536d.

It is a temporary fix for nihui/waifu2x-ncnn-vulkan#186, a crash when upscaling images larger than the tilesize.

@Tohrusky
Author

Tohrusky commented Feb 4, 2023

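So far I have only tested on an M1 Mac; I'll try Windows later.

I built both nihui's project and this one with the same build commands (OpenMP not enabled). With line 149 commented out, upscaling one 1920×1080 image takes about the same time in both, roughly 70 s. The wrapper is even slightly faster (the other one was run directly via os.run, which adds read/write I/O and so on).

But when line 149 is not commented out, setting a breakpoint shows that it hangs on that line (it seems it can still finish? But taking 290 s is clearly abnormal, haha).

Still, why doesn't nihui do the max(w, h) handling? Does setting tilesize to 400 directly also make large images crash? 😯

PS: cugan ncnn is far, far slower than torch on the M1 😡

Thanks for the reply, and Happy New Year!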

@ArchieMeng
Member

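But when line 149 is not commented out, setting a breakpoint shows that it hangs on that line (it seems it can still finish? But taking 290 s is clearly abnormal, haha).

If it gets stuck on that line, something may be wrong with how tilesize is set. You could debug it with gdb or similar (I don't know what to use on macOS) to see exactly where in the binding code it fails.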
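
As a Python-side complement to a native debugger, the standard-library `faulthandler` module can dump the Python stack when a call takes too long. A minimal sketch under stated assumptions: `run_with_watchdog` and the timeout values are hypothetical, not part of this project, and the dump shows only Python frames, so it can confirm which Python line is stuck but not where inside the native binding the time goes.

```python
import faulthandler
import sys
import time


def run_with_watchdog(func, *args, timeout=60.0, **kwargs):
    """Call func and return (result, elapsed_seconds).

    If the call runs longer than `timeout` seconds, the tracebacks of all
    Python threads are written to stderr every `timeout` seconds. The call
    itself is not interrupted.
    """
    faulthandler.dump_traceback_later(timeout, repeat=True, file=sys.__stderr__)
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        # Always disarm the watchdog, even if func raises.
        faulthandler.cancel_dump_traceback_later()
    return result, time.perf_counter() - start


# Stand-in workload; in practice this would be the slow upscaling call.
result, seconds = run_with_watchdog(sum, [1, 2, 3], timeout=5.0)
print(result, f"{seconds:.3f}s")
```

Wrapping the CPU-mode call this way would also give a direct timing for runs with and without the tilesize override.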

@Tohrusky
Author

Tohrusky commented Jul 5, 2023

It actually does run, it just stalls for a long time. The library I hacked together myself also runs very slowly on the M1 CPU. Closing 🥳

@Tohrusky Tohrusky closed this as completed Jul 5, 2023