New method to get profiler data as python object #13

Open
wants to merge 5 commits into base: master

Conversation


@nima nima commented Jul 28, 2016

Hey!

I needed a way to access the profiler data programmatically. To do this I've implemented the `get_stats` method; any other changes are there to reduce code duplication.

The next steps I'd like to work on would be:

  1. Rework `annotate` to use `get_stats`
  2. Add any missing `callgrind` data to the object returned by `get_stats`, and rework the `callgrind` method to also use `get_stats`

To see how this new method is used, please check out the `pydata` branch. In IPython, you can basically `ls`, `cat`, and `grep` the files that the profiler picked up; there's a lot of room for awesome features, but I've only been working on what I need most. Here's a quick demo using a sample file called `sample.py`:

import time
import os
import pdb

def _sleep(t):
    time.sleep(t / 1000000.0)
    return t

def aaa(a):
    return a

def bbb(b):
    return _sleep(b)

def ccc(c):
    if not os.path.exists('/proc/%d' % c): c ^= c
    return c

print('test')

def foo(a, b, c):
    r = 0

    for i in range(a):
        for j in range(b):
            for k in range(c):
                r += sum([
                    aaa(a), bbb(b),
                    ccc(c)
                ])

    return r

def bar(i=3):
    return foo(i, 20, 100)

And now, the profiler:

In [31]: import sample
In [33]: fn = (sample.bar, 8)
In [34]: s = pprofiler.Profiler.profile('test@alpha', *fn) # profile sample.bar(8)
In [35]: p = pprofiler.Profiler(s)

# Utility function for color-printing python objects as JSON
In [39]: from pprofile.pprofiler import jprint
In [40]: jprint(p.overhead())
{
  "actual": 18.111714124679565,
  "overhead": 0.0001728534698486328,
  "total": 18.111886978149414
}

# List files, ordered by cumulative time; optionally takes a number of seconds to match or exceed (to reduce output to the worst-case files)
In [41]: p.ls()
  0 |   17.3 | /tmp/sample.py
  1 |    0.8 | /opt/lib/python2.7/genericpath.py
  2 |    0.0 | /tmp/pprofile/pprofiler.py

# Concatenate the file (using its path, or its index according to p.ls); show time/hit score and line numbers.
In [42]: p.cat(0)
  0.0/0         | 0001 | import time
  0.0/0         | 0002 | import os
  0.0/0         | 0003 | import pdb
  0.0/0         | 0004 |
  0.1/16000     | 0005 | def _sleep(t):
 14.2/16000     | 0006 |     time.sleep(t / 1000000.0)
  0.2/16000     | 0007 |     return t
  0.0/0         | 0008 |
  0.1/16000     | 0009 | def aaa(a):
  0.1/16000     | 0010 |     return a
  0.0/0         | 0011 |
  0.1/16000     | 0012 | def bbb(b):
  0.4/16000     | 0013 |     return _sleep(b)
  0.0/0         | 0014 |
  0.1/16000     | 0015 | def ccc(c):
  0.5/16000     | 0016 |     if not os.path.exists('/proc/%d' % c): c ^= c
  0.2/16000     | 0017 |     return c
  0.0/0         | 0018 |
  0.0/0         | 0019 | print('test')
  0.0/0         | 0020 |
  0.0/1         | 0021 | def foo(a, b, c):
  0.0/1         | 0022 |     r = 0
  0.0/0         | 0023 |
  0.0/9         | 0024 |     for i in range(a):
  0.0/168       | 0025 |         for j in range(b):
  0.2/16160     | 0026 |             for k in range(c):
  0.1/16000     | 0027 |                 r += sum([
  0.6/16000     | 0028 |                     aaa(a), bbb(b),
  0.4/16000     | 0029 |                     ccc(c)
  0.0/0         | 0030 |                 ])
  0.0/0         | 0031 |
  0.0/1         | 0032 |     return r
  0.0/0         | 0033 |
  0.0/1         | 0034 | def bar(i=3):
  0.0/1         | 0035 |     return foo(i, 20, 100)

# Grep for a string in all files known to the profiler
In [43]: p.grep('def ccc')
# file[0]: /tmp/sample.py
  0.1/16000     | 0015 | def ccc(c):

# Get a list of the worst lines of code, exceeding 0.5s here, and optionally limit the search by providing a filter expression (not demonstrated here)
In [54]: jprint(p.get_filtered_profile(0.5))
[
  {
    "address": "/tmp/sample.py:6",
    "datum": {
      "duration": 17.271899461746216,
      "file_name": "/tmp/sample.py",
      "line": {
        "block_id": [
          "/tmp/sample.py",
          "_sleep",
          5
        ],
        "duration": 14.248581409454346,
        "hits": 16000,
        "line": "    time.sleep(t / 1000000.0)",
        "line_no": 6,
        "line_profile": []
      }
    }
  },
  {
    "address": "/tmp/sample.py:28",
    "datum": {
      "duration": 17.271899461746216,
      "file_name": "/tmp/sample.py",
      "line": {
        "block_id": [
          "/tmp/sample.py",
          "foo",
          21
        ],
        "duration": 0.6394095420837402,
        "hits": 16000,
        "line": "                    aaa(a), bbb(b),",
        "line_no": 28,
        "line_profile": [
          {
            "callee_file": "/tmp/sample.py",
            "callee_line": 12,
            "callee_name": "bbb",
            "duration": 15.001068115234375,
            "hits": 16000
          },
          {
            "callee_file": "/tmp/sample.py",
            "callee_line": 9,
            "callee_name": "aaa",
            "duration": 0.20499634742736816,
            "hits": 16000
          }
        ]
      }
    }
  }
]

pprofile.py Outdated
@@ -883,14 +962,37 @@ def runpath(path, argv, filename=None, threads=True, verbose=False):
"""
_run(threads, verbose, 'runpath', filename, path, argv)

_allsep = os.sep + (os.altsep or '')
Owner

The advantage of defining it at module level is to compute it once on module import rather than once per call.

Author

Yes, what you say is true in the general case. In the specific case here, however, given that this code is executed at most once per execution of pprofile (correct me if I'm wrong), shouldn't readability/style take higher precedence?

Basically unless a profiler tells me something is significantly faster in the less good-looking style, I opt for the good-looking style :)

Owner

Tastes... :)

Another aspect is that this changes a piece of code which does not have to change for the new feature, like the s/close/close_fn/ change in main. This reduces (admittedly by a trivial amount here) patch readability, and later, VCS history readability. I'm not opposed to style patches (although they can quickly degenerate into style wars), but I do not like patches which touch both functionality and style outside of that functionality.

Author

Well, I can't argue with that; 👍

@vpelletier (Owner)

I commented on minor, main-feature-independent issues in the code.

On the main feature, I am a bit worried about the impact on forward compatibility: I wonder how dicts of lists of dicts (etc) work as an API.

For example, `line_profile['block_id']` has at least 3 formats:

  • `(module, None, None)`
  • `(module, None, '')`
  • `(module, func, lineno)`

The 3rd item can take 3 types depending on some internal details. Caller code will look like a 3-branch "if", and if it does not, it is likely to fail when profiling another source code. Is the caller supposed to treat this as a black box? How do we document this? How do we prevent naïve introspection of this value?
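
For illustration, the 3-branch dispatch such caller code would need might look like the following sketch. Only the three tuple formats come from this PR; `describe_block` itself is a hypothetical name, not part of pprofile's API:

```python
# Hypothetical caller code for the three block_id formats described above;
# describe_block is illustrative, not pprofile API.
def describe_block(block_id):
    module, func, third = block_id
    if func is None and third is None:
        return 'module %s' % module                        # (module, None, None)
    elif func is None and third == '':
        return 'module %s (toplevel)' % module             # (module, None, '')
    else:
        return '%s:%s at line %s' % (module, func, third)  # (module, func, lineno)

print(describe_block(('/tmp/sample.py', '_sleep', 5)))
# /tmp/sample.py:_sleep at line 5
```

A caller that forgets one of the branches only finds out when it profiles a source file that happens to produce that format, which is exactly the forward-compatibility worry raised above.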

From a quick look at the pprofiler.py file in your branch, maybe this is what should get exposed instead: it would subclass the profiling classes and access the needed internal details directly (_FileTiming instances). Maybe it could be named ipprofile.py to express the interactive-usage intention (which softens the constraints on forward compatibility).

@nima (Author)

nima commented Jul 29, 2016

On block_id: this wasn't a successful venture and I'm going to remove it. I was trying to provide the user with an abstract view of the code, so that they can run queries like "what are the slowest blocks in the code?" rather than simply "what are the slowest lines?". I will revert this change and push, but I would be really interested to hear your thoughts: can we do this in a deterministic way? Rather than treating the source code as a pile of lines, can we dually profile it as a collection of objects, methods, and functions, and construct a tree? That would be super-cool!

Other than the block_id thing, however, I did try to make the return object consistent; it should be consistent enough that we could convert the output to JSON and validate it against a JSON schema. In fact, that would be a great unit test to add; I can add a JSON schema (http://json-schema.org/documentation.html) for it if you agree.
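
As a sketch of what such a test could check, here is a hand-rolled shape validation (a stand-in for a real JSON Schema library, to keep it dependency-free). The field names mirror the `get_filtered_profile` sample output shown earlier in this PR; `validate_entry` itself is a hypothetical helper:

```python
# Minimal shape check for one get_filtered_profile() entry; a real unit
# test could use a proper JSON Schema instead. validate_entry is illustrative.
REQUIRED_DATUM_KEYS = {'duration', 'file_name', 'line'}
REQUIRED_LINE_KEYS = {'duration', 'hits', 'line', 'line_no', 'line_profile'}

def validate_entry(entry):
    if set(entry) != {'address', 'datum'}:
        return False
    datum = entry['datum']
    if not REQUIRED_DATUM_KEYS <= set(datum):
        return False
    line = datum['line']
    return (REQUIRED_LINE_KEYS <= set(line)
            and isinstance(line['line_profile'], list))

sample = {
    'address': '/tmp/sample.py:6',
    'datum': {
        'duration': 17.27,
        'file_name': '/tmp/sample.py',
        'line': {
            'duration': 14.25,
            'hits': 16000,
            'line': '    time.sleep(t / 1000000.0)',
            'line_no': 6,
            'line_profile': [],
        },
    },
}
print(validate_entry(sample))  # True
```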

So my goal here was to make get_stats() the source of the profile data, then rewrite functions such as annotate() and callgrind() to call this function for their input data.

Does that sound like a bad idea?

It would be nice for the user not to have to subclass this, but simply to be able to call get_stats() and do what they want with that data, the same way annotate() and callgrind() would. In fact, those could conceptually be split out the same way a user would be expected to use get_stats().

I'm open to changing the actual format of the object returned by get_stats(), moving towards a standard structure: either dict/list format, or an actual object per entity (but that may slow things down). By entity I mean a line, module, function, etc.

What are your thoughts?

@vpelletier (Owner)

vpelletier commented Jul 30, 2016

I was trying to provide the user with an abstract view of the code

I would tend to do this using extra run-time introspection, like what I did in zpprofile.py when encountering specific functions (object database and relational database queries), just generalised to all functions: from the current callable's details, tell whether it is bound to an instance, maybe of what class, etc. Then it opens the possibility of tracking not only hot classes but also hot instances.
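
A generic version of that introspection could start from the standard `inspect` module. This is only a sketch of the idea; `describe_callable` and `Query` are illustrative names, not pprofile or zpprofile code:

```python
import inspect

# Tell whether a callable is bound to an instance, and of what class;
# illustrative only, not part of pprofile.
def describe_callable(fn):
    if inspect.ismethod(fn):  # bound method: __self__ is the instance
        return 'bound to %s instance' % type(fn.__self__).__name__
    return 'plain callable'

class Query(object):
    def run(self):
        pass

print(describe_callable(Query().run))        # bound to Query instance
print(describe_callable(describe_callable))  # plain callable
```

Keeping a reference to `fn.__self__` (or a weak reference to it) is what would let a profiler aggregate per-instance rather than only per-class.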

So my goal here was to make get_stats() the source of the profile data, then rewrite functions such as annotate() and callgrind() to call this function for their input data.

On the avoiding-code-rot level, it is a good idea.

But as I said, I think it would be better to expose the interactive-usage-friendly layer directly rather than an extra level of internal state. It is easy to defend breaking the API in the former case (by future evolutions, and I prefer not to presume what evolutions will do to the internal data structures) because the caller is a human on a command line: he can quickly see that something changed and adapt. But it is hard to defend breaking the API in the latter case, because Python structures are just too tempting to use in another program (integrating pprofile), and once the caller is a program, nothing auto-corrects anymore.

So I would rather:

  • not expose data in a new python structure (if, and that's still a big "if" to me, it should be done, maybe _FileTiming would be a better format to expose, as it allows more room for backward-compatibility code than having to later fake being a dict/list by overloading __getitem__)
  • as a consequence, keep the callgrind & annotate output generators as-is
  • add classes in a new module similar to zpprofile.py, which would contain interactive-friendly commands and would be free to access _FileTiming instances and all pprofile internals themselves: if the internal data structure changes, that code can follow easily, as it is in the same repository (easy to find) and released at the same time

nima added 5 commits August 2, 2016 16:36
Quoting `vpelletier`:

> Well, in the case of qcachegrind, this is specifically for the
> callgrind output, so it made sense to be local to that method.
Will reserve style-changes for another branch to eliminate unnecessary
noise in this branch's diff:

 - Reverted `allsep`
 - Reverted `close` -> `close_fn` (style-change)
 - Reverted `x` -> `path` (style-change)
The `relative_path` kwarg is now mandatory, so the user has to choose
(since no default makes sense or is otherwise intuitive).

Also updated the docstring.
Quoting `vpelletier`:

> This test will prevent any output in statistic mode, as no time is
> tracked but only hits. I think it can be dropped and contained block
> be un-indented without any other change.