Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up by releasing gil and using openmp with zerocopy view #36

Open
wants to merge 23 commits into
base: master
Choose a base branch
from

Conversation

synodriver
Copy link

This pr did:

  • absolute import
  • update cython language level to 3
  • zerocopy view with more typing to speedup
  • cython inline suport to speedup
  • cython gc-free class to reduce gc pressure (obviously it's not possible for those types to participate in cycles)
  • use nogil to allow run in threadpool and bypass gil
  • use openmp for parallel compute
  • add ci to build wheels
  • inject data files to wheels so there's no need to download it when import
  • reformat using isort and black

@fumiama
Copy link

fumiama commented Aug 12, 2022

This may fix #30 #33 #34 for removing necessity to build it natively.

Copy link
Owner

@r9y9 r9y9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for PR. There are some interesting new features, but I think there is too much for a single PR. Also, it would be great if you included some more information: what problems you are going to solve, how much performance gain is obtained, etc.

setup.py Outdated Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
extra_compile_args = ["-fopenmp"]
extra_link_args = ["-fopenmp"]
elif system == "Darwin":
os.system("brew install libomp")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think automatically running brew install in setup.py is a good idea. How about updating the installation guide instead?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the parallel is optional. Without linking to openmp, cython's prange will fallback to a single thread loop.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

brew has merely become the default package manager in MacOS, thus in 99% case, installing openmp means run the command brew install libomp. However, we can just remove this line and give a warning of falling back to single thread loop to user.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Giving a warning sounds good to me. It is better to avoid installing binary dependencies to /usr/local/bin or somewhere implicitly.

elif system == "Darwin":
os.system("brew install libomp")
extra_compile_args = ["-Xpreprocessor", "-fopenmp"]
extra_link_args = ["-L/usr/local/lib", "-lomp"]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to avoid hard-code

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can add an option for it and let user decide where to link against?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the linker will search all path for this lib, so we can just remove "-L/usr/local/lib"

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An environmental variable (with default) works okay for example.

if os.getenv("PYOPENJTALK_BUILD_VERSION"):

setup.py Outdated
Comment on lines 171 to 185

# extract dic
filename = "dic.tar.gz"
print(f"Downloading: {_DICT_URL}")
urlretrieve(_DICT_URL, filename)
print("Download complete")

print("Extracting tar file {}".format(filename))
with tarfile.open(filename, mode="r|gz") as f:
f.extractall(path="./")
os.remove(filename)
print("Extract complete")
shutil.copytree(f"./{_dict_folder_name}", f"./pyopenjtalk/{_dict_folder_name}")
sys.stdout.flush()

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code dup with pyopenjtalk/__init__.py

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to avoid taking a long time when importing, and also for special environments such as cloud functions.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider removing the lazy downloader in pyopenjtalk/__init__.py

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me. I just wanted to avoid duplication.

@fumiama
Copy link

fumiama commented Aug 14, 2022

but I think there is too much for a single PR

The change seems huge but can be concluded with optimize cython and packaging.

what problems you are going to solve

They are in the first comment.

how much performance gain is obtained

@synodriver will give you a result.

@synodriver
Copy link
Author

synodriver commented Aug 14, 2022

how much performance gain is obtained

Since there is no gil required for those functions, it depends on how many cores your cpu have. And for asyncio apps, just use loop.run_in_executor is okey.

@synodriver
Copy link
Author

BTW, I think cython should be an optional requirement for this project, just including those cython-generated files is ok if a user don't have cython installed.

@r9y9
Copy link
Owner

r9y9 commented Aug 15, 2022

I used to include cython-generated files for cython-dependent projects, but it indeed causes problems in some environments due to a mismatch between Python/Cython/compiler versions. I wish cython-generated code was portable and worked fine on any platform, but it wasn't unfortunately. In addition, to be specific to this repo, users need to generate config.h depending on their platform (ref #21), so it is not easy to make cython dependency optional for any platform.

One possible solution would be to release both pre-built packages for specific platforms (cython dep: optional) and non-built packages for general purposes (cython dep: mandatory), respectively.

@fumiama
Copy link

fumiama commented Aug 15, 2022

One possible solution would be to release both pre-built packages for specific platforms (cython dep: optional) and non-built packages for general purposes (cython dep: mandatory), respectively.

This is surely what we would like to do. We can both reduce the installing complexity and remain its portability by that.

@synodriver
Copy link
Author

Actually I have already build wheels for almost all platforms here, it took a long time to debug the CI.

feat: remove six and lazy init and tqdm
@synodriver
Copy link
Author

Ok, let's add some checks before inject datafiles.

Comment on lines -45 to -49
class _TqdmUpTo(tqdm): # type: ignore
def update_to(self, b=1, bsize=1, tsize=None):
if tsize is not None:
self.total = tsize
return self.update(b * bsize - self.n)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the tqdm-related code removed? If it doesn't cause any problem, could you revert it back?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is still in steup.py and this removing is part of the

Consider removing the lazy downloader in pyopenjtalk/__init__.py

I said yesterday.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant why the following code:

with TqdmUpTo(...):
  urlretrieve(...)

is reduced to

urlretrieve(...)

I think the former was okay as is.

Copy link

@fumiama fumiama Aug 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I just noticed that this code was removed after my changes by @synodriver . In my opinion, since this tqdm is only appears in setup time, whether removing or remaining it is okey. So what is your purpose of removing this?😂

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just a progress bar and not necessary. Besides, I noticed some wired issue about downloading when runing ci with tqdm. I have no idea what's happening but to remove it. Then the ci works fine.

Copy link

@fumiama fumiama Aug 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That may be an acceptable reason of removing this. Or we can pass an env variable to disable it in CI and enable it by default when user want to compile it natively?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have never had issues with tqdm on CI so far. Is the problem reproducible? If so, it's okay to remove it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have run it several times on my fork but it always fail to build with tqdm installed. It's just like... the stdout is hiden, making it difficult to debug. And without tqdm it works fine. BTW, why does the progress bar so important for this project?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm, OK. The progress bar was just useful for knowing the estimated time to finish the downloading process.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But with prebuild-wheels, there is no need for users to download. You can check the files I mentioned here, the size of them shows that they already contains data files in the wheel.

@r9y9
Copy link
Owner

r9y9 commented Aug 15, 2022

This is surely what we would like to do. We can both reduce the installing complexity and remain its portability by that.

Agreed. I'll upload wheels to pypi for the next release so that users can install pyopenjtalk more easily. Also thanks for debugging through CI. I understand it takes time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants