docstring format for api, possibly across code base? #5937
Replies: 34 comments
This comment has been hidden.
This comment has been hidden.
This comment has been hidden.
This comment has been hidden.
-
I don't think we should spend time formalizing this, we have very limited amount of user facing functions really. And we may find better way to spend our time than argue about docstring formatting. |
Beta Was this translation helpful? Give feedback.
-
@Suor I think you brought this topic, right? :) Are we good to close it (discussion around formatting) then and let Jorge decide this, for example? |
Beta Was this translation helpful? Give feedback.
-
I agree it's only important at the moment to have good full code doc for the API functions and I'm happy to just pick one format but would prefer to get some feedback from the @iterative/engineering team. In fact I wanted to summarize the options here (extracted from #3130 (comment) +):
Among doc generators, Sphinx is by far the most popular, with pdoc a remote 2nd place, it seems. Sphinx uses reStructuredText format. pdoc is interesting because it supports Markdown and other nicer-looking formats worth checking out actually (from Numpy and Google). As for IDEs, VSCode intellisense doesn't interpret any format in the docstrings it seems. It mostly focuses on building and inspecting the Python code by itself. From what I can see here, PyCharm auto completion works similarly. So easy to read docstrings e.g. numpydoc or Google style may be best if we expect IDEs code completion will be the main way users reach our docstrings. And especially since we're writing the API documentation manually anyway (not generating it). p.s. An interesting VSCode extension to help write docstrings in any of the above formats is https://marketplace.visualstudio.com/items?itemName=njpwerner.autodocstring, BTW. |
Beta Was this translation helpful? Give feedback.
-
OK... I'm going to suggest we adopt the Google style for modules (files) and functions. Reasons:
All in favor? ✋ |
Beta Was this translation helpful? Give feedback.
-
I am in favor of no style, it will only slow down reviews with no real gain. |
Beta Was this translation helpful? Give feedback.
-
I'm in favour of an easy-to-use style so it doesn't slow down reviews :) |
Beta Was this translation helpful? Give feedback.
-
Also in favor of not slowing down reviews for doc strings. |
Beta Was this translation helpful? Give feedback.
-
I'm in favor of some simple consistent style for the external API functions (better to be more or less consistent internally, but would never block any review for this). If we have a ticket like this #2316 and we deal with this, it's better to agree what are some basic expectations and how this should looks like. I'm fine with Google style. |
Beta Was this translation helpful? Give feedback.
-
I think it's less about exact styles and more about completeness of docstings. I.e. Google format specifies which sections to include in the docstring (basically description, args, result/yield, throws I think). Yes, a style is attached to that but it's pretty simple... And consistency is good!
Yes guys, this issue is mostly about the API code. I think it's totally up to the core engineering team to decide about docstrings in the rest of the code. I'm confused about the voting results so far: 1 for no format, 2 for Google, 2 invalid haha – no consensus yet :( |
Beta Was this translation helpful? Give feedback.
-
What I don't like from Google style is that it "annotates" the arguments with types. I'm afraid that if we start documenting everything, we will end up including useless comments just for the sake of, like: def do_stuff(x, y):
"""A method that do_stuff given x and y
Args:
x (str): This is an X
y (str): This is a Y.
""" Ideally, the method name should be clear enough to understand what it does, so no need to put that on the description, same with argument names. Things that I think are important to document: Side effects Lines 67 to 81 in ec2c8ad Explain convoluted implementations / What & Why? Lines 26 to 49 in ec2c8ad Give examples when code is hard to read Lines 271 to 298 in ec2c8ad DISCLAIMER: I wrote all of those docstrings. I'm completely bias. 🙈 |
Beta Was this translation helpful? Give feedback.
-
I fully agree, except if we start doing intelligent parsing of docstrings in order to auto-gen docs in such a way that requires having to (annoyingly) document everything. I prefer to not document the obvious. And any intelligent parsing should look at the func signature as well as docstrings to pick up args. |
Beta Was this translation helpful? Give feedback.
-
@MrOutis @casperdcl 💯 about the "code must be good enough to read without comments in the first place", no doubts about this! But for this discussion, let's focus on external APIs only. ( My bias and experience here. I really like when I can press a hotkey, get to the source code of the API and see some documentation there (~ short version of what you can get online, but enough in most cases to understand everything, probably no examples, but definitely edge cases, what do we expect from the params, etc). It tremendously simplifies the experience in my case. So, back to the question. Do we want to have a bit more then just an obvious "This is a get method, it gets something" kind of docstrings at all (again, for external, user-facing API)? If not, should we just delete existing one-sentence comments (that do not provide any value whatsoever)? Btw, a solution might be to put just a link the docs? (similar to what we do in CLI). |
Beta Was this translation helpful? Give feedback.
-
Yes, definitely
Only if needed to describe a complex operation
Yes if no genuine value provided
Yes, though I'd recommend adding a travis job which checks for dead links. Overall in all cases I'd always recommend the original dev does the sensible thing, whether it's comment in prose, in bullet points, in code examples, in links - whatever they think will be most useful for people who may read it. There may not be a one-style-fits-all-functions approach. |
Beta Was this translation helpful? Give feedback.
This comment has been hidden.
This comment has been hidden.
-
I am against going for completeness too. Google format is a good example of how bloat is created by doing so. The same reason to avoid consistency if that causes bloat. Python stdlib lives happily without that. @shcheklein the doc-string should describe what function is doing, so that people wouldn't need to guess by name. Params should only be described if that is non-obvious, formal descriptions is bloat and will only stop people from reading it or at least slow them down. Any edge cases should never make it to doc-strings, unless they represent some common use case at the same time. |
Beta Was this translation helpful? Give feedback.
-
don't see how does it create bloat. Any docs are valuable for me (list of exceptions to expect if it can be provided, types, brief param descriptions, etc). I do use them extensively if they exist. Agreed that we can omit parameters if they are obvious. On the other hand how is it different from CLI - why do we include brief description to seemingly obvious options?
Not sure I understand why should we rely on this. We can check other very popular libraries? Also, they have extensive docs for a lot of stuff even inside. Have you seen any guidelines they use by chance? Do they have any?
check To reiterate. I don't care that much about philosophy, style guides, stdlib, google, etc. As I said my bias is very simple - I prefer offline to online and I do read and use public APIs docs via IDEs extensively. Plus it's better to be consistent. That's why I voted the way I voted. |
Beta Was this translation helpful? Give feedback.
-
Simple functions, like ones from |
Beta Was this translation helpful? Give feedback.
-
Okay, to summarize. Looks no one wants to introduce a heavy formats like google's, etc (with sections, and whatnot) and more in favor of stdlib's approach, namely: single or multi-line block of text which might (and probably should) include param names, exceptions, etc, and describes what's happening in some reasonable way. I'm totally fine with this as well. If we all agree on this, @jorgeorpinel @efiop let's close this? For the issue of improving (non-) existing docstrings we should create a separate issue? which I think is way more important to be honest than discussing a specific format in the first place. @casperdcl summarized it well enough as well:
|
Beta Was this translation helpful? Give feedback.
-
It sounds pretty vague to me. Is the policy "please write some sort of docstrings when you think its reasonable"? We would be back at square one but I can't really have a strong opinion here so up to you guys 🙂 p.s. just a question: does this mean it's not allowed to use a full format? Or is it totally fee for everyone to chose their favorite docstring flavor? Bc I've already been using Google's a little bit 😋 |
Beta Was this translation helpful? Give feedback.
-
@jorgeorpinel I would avoid mixing different styles - e.g. Google and stdlib. The way I understand stdlib - you just write a "description" that is valuable to the end-user of what's going on, using param names as terms in it.
we go from a format discussion to the usefulness of them, let's move it to another topic please? |
Beta Was this translation helpful? Give feedback.
-
I think we should write docstrings in some sane format. Google's style is a good choice too, but maybe there is something better. It will allow editors to pick them up easily, improving the dev experience. Sloppy docstrings like in stdlib are as good as nothing. If we start bothering with these, we might as well stick to some good format that will be actually useful. I don't like docstrings myself, but python is too high level to live without them. Even the exceptions introduce some non-obvious flow control, that you can't really tell from looking at one function's body. So I think we need to be good developers and use docstrings to document what needs to be documented. |
Beta Was this translation helpful? Give feedback.
-
I just want to add a link to this discussion about exceptions in stdlib docstrings: #3352 (comment) The summary is that they seem to just be mentioned (not even with specific type) some times (when it seems relevant). So, a pretty flexible common-sense approach as well. |
Beta Was this translation helpful? Give feedback.
-
@jorgeorpinel stdlib is quite old, so I would totally believe that they don't use proper docstrings because back in a day those didn't exist for python and nowadays no one wants to go through the whole codebase to consistently use new format everywhere. |
Beta Was this translation helpful? Give feedback.
-
Yeah, I didn't pick that style. I voted for Google style originally, but others weren't convinced and decided to use a super simple "no style" docstrings inspired on stdlib (at least for now), which is why I'm focusing on stdlib for docstring related matters. (Please refer to discussion above.) Recently I've noticed we kind of already use Google style docstrings throughout the cod base, so I'm a bit confused again... But will continue to assume the resolution so far is in #3176 (comment) ^ Totally up to you guys. |
Beta Was this translation helpful? Give feedback.
-
I don't really see a point of the discussions here, feels like bikeshedding. >> help(dvc.api.open) Then, we could start docstring format for api later on if it provides more benefits (and, it's not hard to change formats for just |
Beta Was this translation helpful? Give feedback.
-
I am still against any format, including googles. It only adds bloat. You should only mention things important for a specific function, not a predefined list in a predefined format, half of which people will need to skim through. I don't care if some IDEs would be happy parsing it, I prefer it to be human readable instead. |
Beta Was this translation helpful? Give feedback.
-
The one thing I've seen that would actually be pretty useful is to use back quotes around param names when explaining them in the docstring, since we're avoiding a special Args section. This was discussed here early on and it was mostly Alex against it. We ended up agreeing not to use them but now that I've been writing them I kind of miss them. This is because param names or other literals tend to be common words like def func(path, x, y)
```Calculates something with x and y argument,
based on a file found in the given path arg.``` vs. def func(path, x, y)
```Calculates something with `x` and `y`, based on the file in `path`.``` It's just a bit faster and more obvious using `. Maybe we can even allow this partially? (Not required but allowed when needed) |
Beta Was this translation helpful? Give feedback.
-
So we agree that we need docstrings in one shape or another, right? If so, please take a good look at the discussions in this ticket and notice how we all have different preferences. Most of us remember the constant fights we've had about the coding style a year ago, before we've started using black. We all come from different backgrounds and we all have different preferences, which will be a constant source of fights unless we agree on some style and just stick to it. "no-style" is a bad idea, because we will drown in fights as we did with code style and then we will still choose some style to put the end to our fights. So why not just skip ahead and pick something instead of wasting our energy on constant fights? That is why I'm admittedly pushing towards picking some style and sticking to it right away. So from my research, there is 1 defacto standard tool for checking docstrings: pydocstyle and it supports three styles IMPO, |
Beta Was this translation helpful? Give feedback.
-
We currently don't have a desired style to write docstrings.
Should we come up with one? If so, which one?
Beta Was this translation helpful? Give feedback.
All reactions