CUDA compile guard problem for marlin_qqq #1648

Open
psinger opened this issue Jan 31, 2025 · 7 comments · May be fixed by #1651

@psinger

psinger commented Jan 31, 2025

There appear to be some issues with marlin_qqq when compiling.

See discussion below.

@supriyar
Contributor

Thanks for raising this, @psinger! We welcome community contributions, so if you'd like, please go ahead and submit a fix.

Tagging @alexsamardzic to take a look.

@alexsamardzic
Collaborator

@psinger Can you paste the exact error reported during the compilation?

@psinger
Author

psinger commented Jan 31, 2025

Yes, here it is:

  [4/5] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /tmp/pip-req-build-h6ia0xpl/build/temp.linux-x86_64-cpython-310/torchao/csrc/cuda/fp6_llm/fp6_linear.o.d -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /tmp/pip-req-build-h6ia0xpl/torchao/csrc/cuda/fp6_llm/fp6_linear.cu -o /tmp/pip-req-build-h6ia0xpl/build/temp.linux-x86_64-cpython-310/torchao/csrc/cuda/fp6_llm/fp6_linear.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -t=0 -DTORCHAO_USE_CUTLASS -I/tmp/pip-req-build-h6ia0xpl/third_party/cutlass/include -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 -std=c++17
  [5/5] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /tmp/pip-req-build-h6ia0xpl/build/temp.linux-x86_64-cpython-310/torchao/csrc/cuda/s8s4_linear_cutlass/s8s4_linear_cutlass.o.d -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /tmp/pip-req-build-h6ia0xpl/torchao/csrc/cuda/s8s4_linear_cutlass/s8s4_linear_cutlass.cu -o /tmp/pip-req-build-h6ia0xpl/build/temp.linux-x86_64-cpython-310/torchao/csrc/cuda/s8s4_linear_cutlass/s8s4_linear_cutlass.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -t=0 -DTORCHAO_USE_CUTLASS -I/tmp/pip-req-build-h6ia0xpl/third_party/cutlass/include -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 -std=c++17
  ninja: build stopped: subcommand failed.
  Traceback (most recent call last):
    File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 2240, in _run_ninja_build
      subprocess.run(
    File "/usr/lib/python3.10/subprocess.py", line 526, in run
      raise CalledProcessError(retcode, process.args,
  subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
  The above exception was the direct cause of the following exception:
  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "/tmp/pip-req-build-h6ia0xpl/setup.py", line 293, in <module>
      setup(
    File "/usr/local/lib/python3.10/dist-packages/setuptools/__init__.py", line 117, in setup
      return distutils.core.setup(**attrs)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/core.py", line 186, in setup
      return run_commands(dist)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/core.py", line 202, in run_commands
      dist.run_commands()
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 983, in run_commands
      self.run_command(cmd)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 999, in run_command
      super().run_command(command)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 1002, in run_command
      cmd_obj.run()
    File "/usr/local/lib/python3.10/dist-packages/setuptools/command/bdist_wheel.py", line 379, in run
      self.run_command("build")
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py", line 339, in run_command
      self.distribution.run_command(command)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 999, in run_command
      super().run_command(command)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 1002, in run_command
      cmd_obj.run()
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build.py", line 136, in run
      self.run_command(cmd_name)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py", line 339, in run_command
      self.distribution.run_command(command)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 999, in run_command
      super().run_command(command)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 1002, in run_command
      cmd_obj.run()
    File "/usr/local/lib/python3.10/dist-packages/setuptools/command/build_ext.py", line 99, in run
      _build_ext.run(self)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build_ext.py", line 365, in run
      self.build_extensions()
    File "/tmp/pip-req-build-h6ia0xpl/setup.py", line 161, in build_extensions
      super().build_extensions()
    File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 921, in build_extensions
      build_ext.build_extensions(self)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build_ext.py", line 481, in build_extensions
      self._build_extensions_serial()
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build_ext.py", line 507, in _build_extensions_serial
      self.build_extension(ext)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/command/build_ext.py", line 264, in build_extension
      _build_ext.build_extension(self, ext)
    File "/usr/local/lib/python3.10/dist-packages/Cython/Distutils/build_ext.py", line 135, in build_extension
      super(build_ext, self).build_extension(ext)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build_ext.py", line 562, in build_extension
      objects = self.compiler.compile(
    File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 734, in unix_wrap_ninja_compile
      _write_ninja_file_and_compile_objects(
    File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1899, in _write_ninja_file_and_compile_objects
      _run_ninja_build(
    File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 2256, in _run_ninja_build
      raise RuntimeError(message) from e
  RuntimeError: Error compiling objects for extension
  error: subprocess-exited-with-error
  
  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: /usr/bin/python3 -u -c '
  exec(compile('"'"''"'"''"'"'
  # This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
  #
  # - It imports setuptools before invoking setup.py, to enable projects that directly
  #   import from `distutils.core` to work with newer packaging standards.
  # - It provides a clear error message when setuptools is not installed.
  # - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
  #   setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
  #     manifest_maker: standard file '"'"'-c'"'"' not found".
  # - It generates a shim setup.py, for handling setup.cfg-only projects.
  import os, sys, tokenize
  
  try:
      import setuptools
  except ImportError as error:
      print(
          "ERROR: Can not execute `setup.py` since setuptools is not available in "
          "the build environment.",
          file=sys.stderr,
      )
      sys.exit(1)
  
  __file__ = %r
  sys.argv[0] = __file__
  
  if os.path.exists(__file__):
      filename = __file__
      with tokenize.open(__file__) as f:
          setup_py_code = f.read()
  else:
      filename = "<auto-generated setuptools caller>"
      setup_py_code = "from setuptools import setup; setup()"
  
  exec(compile(setup_py_code, filename, "exec"))
  '"'"''"'"''"'"' % ('"'"'/tmp/pip-req-build-h6ia0xpl/setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' bdist_wheel -d /tmp/pip-wheel-lfo_5uo9
  cwd: /tmp/pip-req-build-h6ia0xpl/
  Building wheel for torchao (setup.py) ... error
  ERROR: Failed building wheel for torchao
  Running setup.py clean for torchao
  Running command python setup.py clean
Failed to build torchao
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (torchao)
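
The nvcc commands in the log above pass only `-gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75`, so device code is compiled for compute capability 7.5 alone. When compiling device code for a virtual architecture compute_XY, nvcc defines `__CUDA_ARCH__` as XY0 (750 here), which is below the 8.0 that the marlin_qqq error message ("requires CUDA_ARCH >= 8.0") asks for. As a quick diagnostic, here is a minimal sketch of a stdlib-only helper that pulls those targets out of such a command line. `gencode_targets` and `archs_below` are hypothetical names, not part of torchao or PyTorch.

```python
import re
from typing import List, Tuple

# Matches the nvcc spelling used in this log: -gencode=arch=compute_XY,code=<target>
# An optional letter suffix (e.g. compute_90a) is accepted but does not change __CUDA_ARCH__.
_GENCODE = re.compile(r"-gencode=arch=compute_(\d+)([a-z]?),code=(\w+)")


def gencode_targets(cmdline: str) -> List[Tuple[str, int]]:
    """Return (code target, __CUDA_ARCH__ value) for each -gencode flag, in order."""
    targets = []
    for m in _GENCODE.finditer(cmdline):
        digits, code = m.group(1), m.group(3)
        # compute_75 -> __CUDA_ARCH__ == 750, compute_90a -> 900
        targets.append((code, int(digits) * 10))
    return targets


def archs_below(targets: List[Tuple[str, int]], min_arch: int = 800) -> List[int]:
    """Distinct __CUDA_ARCH__ values that fall below min_arch (800 == SM 8.0)."""
    return sorted({arch for _, arch in targets if arch < min_arch})


# Flags copied from the nvcc lines in the log above (rest of the command elided).
cmd = ("nvcc ... -gencode=arch=compute_75,code=compute_75 "
       "-gencode=arch=compute_75,code=sm_75 -std=c++17")
print(gencode_targets(cmd))               # [('compute_75', 750), ('sm_75', 750)]
print(archs_below(gencode_targets(cmd)))  # [750]
```

Only the `-gencode=arch=...,code=...` spelling seen in this log is matched; other nvcc spellings such as `--generate-code` or `-arch=sm_XY` would need extra patterns.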

@alexsamardzic
Collaborator

I just tried on an SM75 machine and cannot reproduce the problem as reported. The latest main won't build, but the error is in a different file:

/workspace/ao/torchao/csrc/cuda/marlin_qqq/marlin_qqq_kernel.cu(893): error: device code does not support exception handling
    if (!(false)) { throw ::c10::NotImplementedError( {__func__, "/workspace/ao/torchao/csrc/cuda/marlin_qqq/marlin_qqq_kernel.cu", static_cast<uint32_t>(893)}, (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false.  " "(Could this error message be improved?  If so, " "please report an enhancement request to PyTorch.)", "marlin_qqq_gemm(..) requires CUDA_ARCH >= 8.0"))); }

If the offending line is removed from that file, main builds, and running the unit test

pytest test/test_s8s4_linear_cutlass.py

reports

RuntimeError: select_config : Operator not supported on SM7.5 for given operands

which is currently the expected behavior.
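
The issue title refers to a compile guard. As background (the actual guard in marlin_qqq_kernel.cu is not quoted in this thread, so the condition below is an assumption), nvcc processes a .cu file in a host pass, where `__CUDA_ARCH__` is undefined, and in one device pass per virtual architecture, where it is defined (750 for compute_75). A hypothetical Python model of how a guard such as `#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 800` resolves in each pass:

```python
from typing import Optional


def sm80_fallback_selected(cuda_arch: Optional[int], min_arch: int = 800) -> bool:
    """Model of `#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < min_arch`.

    `cuda_arch` is None in the host pass, where __CUDA_ARCH__ is undefined,
    and e.g. 750 in the device pass for compute_75.
    """
    return cuda_arch is not None and cuda_arch < min_arch


# Passes for a compute_75-only build, plus a compute_80 pass for contrast.
passes = {"host": None, "device compute_75": 750, "device compute_80": 800}
for name, arch in passes.items():
    print(f"{name}: fallback branch selected = {sm80_fallback_selected(arch)}")
```

Under this model, code inside such a guard is not compiled in the host pass at all, only in the sm_75 device pass. A common way to avoid surprises is to keep host-side checks such as `TORCH_CHECK` in code that does not depend on `__CUDA_ARCH__` (for example, checking the device's compute capability at runtime before launching) and to use the macro only inside device function bodies.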

@psinger
Author

psinger commented Jan 31, 2025

Thanks for checking, @alexsamardzic. I drew the wrong conclusion from the logs. You're right: the issue seems to be in marlin_qqq, as you observed. I will close this, but let me know if it is still worth reporting in a separate issue.

psinger closed this as not planned on Jan 31, 2025
@alexsamardzic
Collaborator

Yeah, I believe this should be fixed, so opening a new issue for the marlin_qqq build problem would be good.

gau-nernst changed the title from "CUDA compile guard for s8s4_linear_cutlass" to "CUDA compile guard problem for marlin_qqq" on Feb 1, 2025
@gau-nernst
Collaborator

Renaming and reopening this issue as the marlin_qqq problem.

gau-nernst reopened this on Feb 1, 2025
gau-nernst linked a pull request on Feb 2, 2025 that will close this issue

4 participants