use prefetch to load next mem into cache #21206

Merged: 2 commits, Nov 24, 2019

LeoZhao-Intel commented Nov 15, 2019

The `memcpy` calls in `hash_embedding_ff` of Pyramid DNN are a typical random-memory-access case and cause frequent cache misses. Before each `memcpy`, a hash function computes the position of the next memory block to read. Because these positions are not contiguous, the next block is usually not already in the L1 cache, so each read has to wait for a new cache line to be loaded. This costs time and makes `memcpy` performance poor.

Fortunately, the hashed position is predictable: `XXH32()` can compute the next position one step ahead, during the preceding `memcpy`. By calling `prefetch()` on that position, the CPU can start loading the next block into cache while the current copy runs, which saves time and reduces the cache-miss rate.
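A minimal C++ sketch of the idea, assuming a table of fixed-width `float` rows addressed by hashed integer keys. The names `gather`, `gather_prefetch`, `row_of`, and `hash32` are hypothetical; `hash32` is an FNV-1a stand-in for `XXH32()` so the sketch builds without xxHash, and `__builtin_prefetch` is the GCC/Clang builtin. The PR's actual code may use a different prefetch call and layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for XXH32(): 32-bit FNV-1a over the key bytes, mixed with a seed.
// Used only so this sketch compiles without the xxHash library.
static uint32_t hash32(const void* data, std::size_t len, uint32_t seed) {
  const unsigned char* p = static_cast<const unsigned char*>(data);
  uint32_t h = 2166136261u ^ seed;
  for (std::size_t i = 0; i < len; ++i) {
    h ^= p[i];
    h *= 16777619u;
  }
  return h;
}

// Hypothetical helper: row index for key `id` in a table of `rows` rows.
static std::size_t row_of(int32_t id, std::size_t rows) {
  return hash32(&id, sizeof(id), 0) % rows;
}

// Baseline: each memcpy reads an unpredictable row, which is often not in cache.
void gather(const float* table, std::size_t rows, std::size_t width,
            const int32_t* ids, std::size_t n, float* out) {
  assert(table != nullptr && rows > 0);
  for (std::size_t i = 0; i < n; ++i) {
    std::memcpy(out + i * width, table + row_of(ids[i], rows) * width,
                width * sizeof(float));
  }
}

// Prefetching variant: hash the next key one iteration early and hint the CPU
// to start loading that row while the current row is being copied.
void gather_prefetch(const float* table, std::size_t rows, std::size_t width,
                     const int32_t* ids, std::size_t n, float* out) {
  assert(table != nullptr && rows > 0);
  if (n == 0) return;
  const std::size_t kLine = 64;  // assumed cache-line size in bytes
  const std::size_t row_bytes = width * sizeof(float);
  std::size_t pos = row_of(ids[0], rows);
  for (std::size_t i = 0; i < n; ++i) {
    std::size_t next = pos;
    if (i + 1 < n) {
      next = row_of(ids[i + 1], rows);
      const char* p = reinterpret_cast<const char*>(table + next * width);
      // One prefetch per cache line of the next row (read access, high temporal locality).
      for (std::size_t off = 0; off < row_bytes; off += kLine) {
        __builtin_prefetch(p + off, 0, 3);
      }
    }
    std::memcpy(out + i * width, table + pos * width, row_bytes);
    pos = next;
  }
}
```

Both functions produce identical output; the prefetch is only a hint, so correctness does not depend on it. With 16-float rows (64 bytes), each row fits in one assumed 64-byte cache line, so the inner loop issues a single prefetch per row.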

test=develop

luotao1 commented Nov 22, 2019

I tested it on pyramid_dnn training; see PaddlePaddle/benchmark#151 (comment) for details.

| | Single thread on 6148 (s/epoch) | 24 threads on 6148 (s/epoch) |
|---|---|---|
| Before this PR | 381 | 495 |
| After this PR | 364 (+4.5%) | 485.7 (+1.8%) |
| Remove copy 16 float | 367.6 | 484.7 |
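The percentages appear to be the reduction in epoch time relative to the pre-PR baseline (an inference; the comment does not state how they were computed):

```math
\frac{381 - 364}{381} = \frac{17}{381} \approx 4.46\%, \qquad \frac{495 - 485.7}{495} = \frac{9.3}{495} \approx 1.88\%
```

Reading them instead as speedups ($381/364 \approx 1.047$) would give about 4.7%, so the time-reduction interpretation fits the reported +4.5% and +1.8% more closely.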

luotao1 left a comment

LGTM

luotao1 commented Nov 23, 2019

Please update the PR description for more details.

luotao1 merged commit b19e1a1 into PaddlePaddle:develop on Nov 24, 2019
seiriosPlus pushed a commit to seiriosPlus/Paddle that referenced this pull request Dec 9, 2019
* use prefetch to load next mem into cache

test=develop

* remove hard-coded memcpy in pyramid_hash_ff

test=develop