-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kernfs_memcg: Add helpers to gather memcgroup related data #96
Conversation
1909949
to
8c618c9
Compare
I have added other helpers and modified the earlier ones, so that they work with other UEK versions as well |
2d9b918
to
d84a204
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for creating memcg helpers. I added couple comments. And beside that, please create a bug for it and put the number in the git log.
Signed-off-by: Imran Khan <[email protected]>
Thanks @biger410 for reviewing this. I have addressed your review comments. Could you please have a look and let me know if you have any further feedback |
Orabug: 37322867 Signed-off-by: Imran Khan <[email protected]>
bb98c05
to
e0875fa
Compare
The new changes look good to me. One more request is that can we add a corelen module for it? It should run either with -M option or -A option. |
I have added corelens module. python3 -m drgn_tools.corelens ~/Local-vmcores/Workqueue-study/UEK-6-rds-issue/VMCORE -d ~/Local-vmcores/Workqueue-study/UEK-6-rds-issue/ page: 0xffffc35c881b0000 cgroup: /user.slice/user-0.slice/session-7128.scope state: CSS_ONLINE|CSS_VISIBLE path: /datastore/oracle_230124_0745EST/diag/asm/cell/scaqat06celadm02/incpkg/pkg_76/seq_1/cell/IPSPKG_20230120060027_COM_1.zip page: 0xffffc35c881b0040 cgroup: /user.slice/user-0.slice/session-7128.scope state: CSS_ONLINE|CSS_VISIBLE path: /datastore/oracle_230124_0745EST/diag/asm/cell/scaqat06celadm02/incpkg/pkg_76/seq_1/cell/IPSPKG_20230120060027_COM_1.zip page: 0xffffc35c881b0080 cgroup: /user.slice/user-0.slice/session-7128.scope state: CSS_ONLINE|CSS_VISIBLE path: /datastore/oracle_230124_0745EST/diag/asm/cell/scaqat06celadm02/incpkg/pkg_76/seq_1/cell/IPSPKG_20230120060027_COM_1.zip page: 0xffffc35c881b00c0 cgroup: /user.slice/user-0.slice/session-7128.scope state: CSS_ONLINE|CSS_VISIBLE path: /datastore/oracle_230124_0745EST/diag/asm/cell/scaqat06celadm02/incpkg/pkg_76/seq_1/cell/IPSPKG_20230120060027_COM_1.zip page: 0xffffc35c881b0100 cgroup: /user.slice/user-0.slice/session-7128.scope state: CSS_ONLINE|CSS_VISIBLE path: /datastore/oracle_230124_0745EST/diag/asm/cell/scaqat06celadm02/incpkg/pkg_76/seq_1/cell/IPSPKG_20230120060027_COM_1.zip page: 0xffffc35c881b0140 cgroup: /user.slice/user-0.slice/session-7128.scope state: CSS_ONLINE|CSS_VISIBLE path: /datastore/oracle_230124_0745EST/diag/asm/cell/scaqat06celadm02/incpkg/pkg_76/seq_1/cell/IPSPKG_20230120060027_COM_1.zip page: 0xffffc35c881b0180 cgroup: /user.slice/user-0.slice/session-7128.scope state: CSS_ONLINE|CSS_VISIBLE path: /datastore/oracle_230124_0745EST/diag/asm/cell/scaqat06celadm02/incpkg/pkg_76/seq_1/cell/IPSPKG_20230120060027_COM_1.zip page: 0xffffc35c881b01c0 cgroup: /user.slice/user-0.slice/session-7128.scope state: CSS_ONLINE|CSS_VISIBLE path: /datastore/oracle_230124_0745EST/diag/asm/cell/scaqat06celadm02/incpkg/pkg_76/seq_1/cell/IPSPKG_20230120060027_COM_1.zip |
Is there a paste error? How could this trigger the corelen cmd?
And with 10000 default, how could user run corelen cmd to dump all pages? |
Yes, it was a copy paste error and missed the -M part . The actual command is: and output is like the one shown below: `imran@imran-metabox:~/oracle-samples-drgn-tools/drgn-tools$ python3 -m drgn_tools.corelens ~/Local-vmcores/Workqueue-study/UEK-6-rds-issue/VMCORE -d ~/Local-vmcores/Workqueue-study/UEK-6-rds-issue/ -M kernfs_memcg page: 0xffffc35c881b0040 cgroup: /user.slice/user-0.slice/session-7128.scope state: CSS_ONLINE|CSS_VISIBLE path: /datastore/oracle_230124_0745EST/diag/asm/cell/scaqat06celadm02/incpkg/pkg_76/seq_1/cell/IPSPKG_20230120060027_COM_1.zip page: 0xffffc35c881b0080 cgroup: /user.slice/user-0.slice/session-7128.scope state: CSS_ONLINE|CSS_VISIBLE path: /datastore/oracle_230124_0745EST/diag/asm/cell/scaqat06celadm02/incpkg/pkg_76/seq_1/cell/IPSPKG_20230120060027_COM_1.zip page: 0xffffc35c881b00c0 cgroup: /user.slice/user-0.slice/session-7128.scope state: CSS_ONLINE|CSS_VISIBLE path: /datastore/oracle_230124_0745EST/diag/asm/cell/scaqat06celadm02/incpkg/pkg_76/seq_1/cell/IPSPKG_20230120060027_COM_1.zip page: 0xffffc35c881b0100 cgroup: /user.slice/user-0.slice/session-7128.scope state: CSS_ONLINE|CSS_VISIBLE path: /datastore/oracle_230124_0745EST/diag/asm/cell/scaqat06celadm02/incpkg/pkg_76/seq_1/cell/IPSPKG_20230120060027_COM_1.zip page: 0xffffc35c881b0140 cgroup: /user.slice/user-0.slice/session-7128.scope state: CSS_ONLINE|CSS_VISIBLE path: /datastore/oracle_230124_0745EST/diag/asm/cell/scaqat06celadm02/incpkg/pkg_76/seq_1/cell/IPSPKG_20230120060027_COM_1.zip page: 0xffffc35c881b0180 cgroup: /user.slice/user-0.slice/session-7128.scope state: CSS_ONLINE|CSS_VISIBLE path: /datastore/oracle_230124_0745EST/diag/asm/cell/scaqat06celadm02/incpkg/pkg_76/seq_1/cell/IPSPKG_20230120060027_COM_1.zip page: 0xffffc35c881b01c0 cgroup: /user.slice/user-0.slice/session-7128.scope state: CSS_ONLINE|CSS_VISIBLE path: /datastore/oracle_230124_0745EST/diag/asm/cell/scaqat06celadm02/incpkg/pkg_76/seq_1/cell/IPSPKG_20230120060027_COM_1.zip ............................................................ Regarding default page count of 10K, I am using this value to make sure we don't end up spending a lot of time while collecting this data. We have seen that scanning all pages can take hours , so my idea here is that we get information of 10K pages and if that does not indicate anything conclusive , we can later scan all pages. Let me know if its sounds okay or using something other than 10K would be more acceptable |
The default value is good to me, to dump all pages, just pass option "-m 0". I think we should make the default value and how to dump all page clear in the cmd help doc for user. |
ada1a81
to
fe8d6ee
Compare
I have modified cmd help to provide this information as per your suggestion. The cmd help looks like below Print information related to pages, that are pinning memcgroup(s) options: Please let me know if it looks okay. |
Looks good to me. @brenns10 do you want to take a look as well? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, these are looking really good. I have a few small comments, the only major thing here is the issue with PG_slab
.
Orabug: 37322867 Signed-off-by: Imran Khan <[email protected]>
@brenns10 I have addressed your review comments . Could you please have a look once more? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you reorganize the patch to address the review feedback while not add new commits for that? Once done reorganize patch, git push --force-with-lease
can help overwrite the previous commits.
class PagesPinningMemcgroups(CorelensModule): | ||
"""Print information related to pages, that are pinning memcgroup(s)""" | ||
|
||
name = "kernfs_memcg" | ||
name = "pages-pinning-memcg" | ||
run_when = "never" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes senes this only run when user requests it explicitly. Should we make the default behavior to dump all pages pinning zombie cgroup, that looks more common to me when troubleshooting zombie cgroup issues?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I have modified it to dump all pages by default.
|
||
def run(self, prog: Program, args: argparse.Namespace) -> None: | ||
get_num_mem_cgroups(prog) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this more generic like the output of cat /proc/cgroups
, not only support memory cgroup, but other cgroup as well? It doesn't have to be include with this pull request, you can start a new one for it if you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I can include this change in a different PR. I included memcgroup numbers here because we often run into issue due to zombie memcgroups and hence having their number readily available will help us to decide if we need to run PagesPinningMemcgroups
module.
Add kernfs based helpers to extract memcg related information, like number of active and inactive memcgroups, page cache pages pinning memcgroups etc. Orabug: 37322867 Signed-off-by: Imran Khan <[email protected]>
Orabug: 37322867 Signed-off-by: Imran Khan <[email protected]>
c712265
to
7636da0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Thanks @imran-kn
This as of now is just a dump of some of my bespoke debug scripts.