use prefetch to load next mem into cache #21206

LeoZhao-Intel · 2019-11-15T10:42:46Z

The memcpy calls in hash_embedding_ff of Pyramid DNN are a typical random-memory-access pattern, which causes frequent cache misses. Before each memcpy, a hash function computes the next position to read from. These positions are not contiguous, so each read usually misses the L1 cache and has to wait for the cache line holding the next memory block to be fetched. That stall costs time and makes memcpy performance poor.

The good news is that the hashed position is predictable. While handling the current memcpy, we can already call XXH32() to compute the next position. By issuing prefetch() for that position, we let the system load the next memory block into cache while the current memcpy runs. This saves time and reduces the cache miss rate.
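A minimal sketch of the hash-then-prefetch pattern described above, assuming GCC or Clang (`__builtin_prefetch`). `toy_hash` is a hypothetical stand-in for XXH32() so the example builds without xxHash. `gather_rows_with_prefetch`, the row-major table layout and its parameters are illustrative only, not the actual code in pyramid_hash_op.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for XXH32(): any deterministic 32-bit hash will do
// for this sketch; the real operator uses xxHash.
static std::uint32_t toy_hash(std::uint32_t key, std::uint32_t seed) {
  std::uint32_t h = (key * 2654435761u) ^ seed;
  h ^= h >> 15;
  h *= 2246822519u;
  h ^= h >> 13;
  return h;
}

// Gathers one `dim`-float row per key from a large table into `out`.
// Rows are chosen by hashing, so consecutive reads land at unrelated addresses.
// Before copying row i, the address of row i+1 is hashed and prefetched so its
// cache line can be fetched while the current memcpy is still running.
void gather_rows_with_prefetch(const float* table, std::size_t table_rows,
                               std::size_t dim, const std::uint32_t* keys,
                               std::size_t n, float* out) {
  assert(table_rows > 0);
  if (n == 0) return;
  std::size_t pos = toy_hash(keys[0], 0) % table_rows;
  for (std::size_t i = 0; i < n; ++i) {
    std::size_t next_pos = 0;
    if (i + 1 < n) {
      // Predict the next read position one step ahead.
      next_pos = toy_hash(keys[i + 1], 0) % table_rows;
      // GCC/Clang builtin: rw = 0 (read), locality = 3 (keep in all cache levels).
      // Only the first cache line of the next row is requested.
      __builtin_prefetch(table + next_pos * dim, 0, 3);
    }
    std::memcpy(out + i * dim, table + pos * dim, dim * sizeof(float));
    pos = next_pos;
  }
}
```

`__builtin_prefetch` is only a hint and requests a single cache line (typically 64 bytes, i.e. 16 floats). Rows longer than one line would need one prefetch per line to be fully covered.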

test=develop

LeoZhao-Intel · 2019-11-19T10:20:30Z

PaddlePaddle/benchmark#151

luotao1 · 2019-11-22T14:33:07Z

I tested it on pyramid_dnn training; see PaddlePaddle/benchmark#151 (comment) for details.

| - | single thread on 6148 (s/epoch) | 24 threads on 6148 (s/epoch) |
|---|---|---|
| before this PR | 381 | 495 |
| after this PR | 364 (+4.5%) | 485.7 (+1.8%) |
| remove copy 16 float | 367.6 | 484.7 |
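The percentages in the "after this PR" row appear to be the relative reduction in epoch time against the "before this PR" baseline. A worked check, assuming that definition:

```latex
\text{gain} = \frac{t_{\text{before}} - t_{\text{after}}}{t_{\text{before}}}, \qquad
\frac{381 - 364}{381} = \frac{17}{381} \approx 4.46\%, \qquad
\frac{495 - 485.7}{495} = \frac{9.3}{495} \approx 1.88\%
```

Both values agree with the reported +4.5% and +1.8% to within 0.1 percentage point.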

paddle/fluid/operators/pyramid_hash_op.cc

test=develop

luotao1

LGTM

luotao1 · 2019-11-23T14:58:52Z

Please update the PR description for more details.

* use prefetch to load next mem into cache test=develop * remove hard code memcpy om pyramid_hash_ff test=develop

use prefetch to load next mem into cache

5316951

test=develop

luotao1 added the Intel label Nov 15, 2019

LeoZhao-Intel mentioned this pull request Nov 19, 2019

Optimize the performance of PyramidDNN on CPU PaddlePaddle/benchmark#151

Open

luotao1 reviewed Nov 22, 2019


paddle/fluid/operators/pyramid_hash_op.cc (outdated, resolved)

remove hard code memcpy om pyramid_hash_ff

96c52f0

test=develop

luotao1 approved these changes Nov 23, 2019


luotao1 merged commit b19e1a1 into PaddlePaddle:develop Nov 24, 2019

seiriosPlus pushed a commit to seiriosPlus/Paddle that referenced this pull request Dec 9, 2019

use prefetch to load next mem into cache (PaddlePaddle#21206)

bcd20f6

* use prefetch to load next mem into cache test=develop * remove hard code memcpy om pyramid_hash_ff test=develop

seiriosPlus pushed a commit to seiriosPlus/Paddle that referenced this pull request Dec 9, 2019

use prefetch to load next mem into cache (PaddlePaddle#21206)

50e6b5e

* use prefetch to load next mem into cache test=develop * remove hard code memcpy om pyramid_hash_ff test=develop
