Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detection of duplicates in API reference, some to remove #277

Merged

Conversation

sadielbartholomew
Copy link
Member

Took a slight tangent from my reviewing of #272 when considering API reference coverage and spotting there are some methods listed multiple times in the same file or even under the same sub-heading.

I thought I'd make a small update to the check_docs_api_coverage script so we can automate the detection of those when we are already checking that they are all present. So with this change we are checking they are all present precisely once and not just not zero times like before (sub-string method name complications are dealt with - see code comments), unless we've deliberately chosen to duplicate them under different headings. A warning is raised reporting any duplicates for consideration.

I've removed any duplication that seems unintended in the API reference source restructured text, thought left in some cases where it looks like we might want to put methods under multiple headings. Namely, at PR opening the script still reports these which I haven't touched:

$ ./check_docs_api_coverage
WARNING: some methods are listed multiple times inside one class file/page. Decide if the duplicates are intended and if not remove them. They are:
cfdm.Data.del_calendar
cfdm.Data.get_calendar
cfdm.Data.has_calendar
cfdm.Data.set_calendar
cfdm.Data.sum
cfdm.Data.unique
cfdm.Field.del_data_axes
cfdm.Field.get_data_axes
cfdm.Field.has_data_axes
cfdm.Field.set_data_axes
cfdm.core.Coordinate.del_bounds
cfdm.core.Coordinate.del_data
cfdm.core.Coordinate.set_bounds
cfdm.core.Coordinate.set_data
cfdm.core.Data.del_calendar
cfdm.core.Data.get_calendar
cfdm.core.Data.has_calendar
cfdm.core.Data.set_calendar
cfdm.core.Field.del_data_axes
cfdm.core.Field.get_data_axes
cfdm.core.Field.has_data_axes
cfdm.core.Field.set_data_axes
cfdm.core.PropertiesDataBounds.del_bounds
cfdm.core.PropertiesDataBounds.del_data
cfdm.core.PropertiesDataBounds.set_bounds
cfdm.core.PropertiesDataBounds.set_data

All non-private methods are documented

as opposed to, previously, a longer list:

WARNING: there are methods which are listed multiple times in one class file/page. Decide if the duplicates are useful. They are:
cfdm.Data.del_calendar
cfdm.Data.get_calendar
cfdm.Data.has_calendar
cfdm.Data.set_calendar
cfdm.Data.sum
cfdm.Data.unique
cfdm.Domain.apply_masking
cfdm.Domain.climatological_time_axes
cfdm.Field.climatological_time_axes
cfdm.Field.del_data_axes
cfdm.Field.get_data_axes
cfdm.Field.has_data_axes
cfdm.Field.set_data_axes
cfdm.NetCDFArray.get_address
cfdm.core.Coordinate.del_bounds
cfdm.core.Coordinate.del_data
cfdm.core.Coordinate.set_bounds
cfdm.core.Coordinate.set_data
cfdm.core.Data.del_calendar
cfdm.core.Data.get_calendar
cfdm.core.Data.has_calendar
cfdm.core.Data.set_calendar
cfdm.core.DomainAxis.construct_type
cfdm.core.Field.del_data_axes
cfdm.core.Field.get_data_axes
cfdm.core.Field.has_data_axes
cfdm.core.Field.set_data_axes
cfdm.core.PropertiesDataBounds.del_bounds
cfdm.core.PropertiesDataBounds.del_data
cfdm.core.PropertiesDataBounds.set_bounds
cfdm.core.PropertiesDataBounds.set_data


All non-private methods are documented

Happy to remove those duplicates from the first snippet too, though? But the main thing is the automation of the check and that I've removed methods duplicated under the same heading, which can be blitzed to tidy.

@sadielbartholomew sadielbartholomew added documentation Improvements or additions to documentation code tidy/refactor release labels Oct 31, 2023
@sadielbartholomew sadielbartholomew self-assigned this Oct 31, 2023
@davidhassell
Copy link
Contributor

Hi Sadie - thanks for this. I reckon that we could keep cfdm.Data.sum and cfdm.Data.unique and get rid of the rest. What do you think?

@sadielbartholomew
Copy link
Member Author

I reckon that we could keep cfdm.Data.sum and cfdm.Data.unique and get rid of the rest. What do you think?

Sounds good to me. I'll update the PR shortly to get the duplicates other than those removed, in the least appropriate location.

@sadielbartholomew
Copy link
Member Author

@davidhassell I have now updated the PR to remove the majority of the rest of the duplicates, so that the output of the script is:

$ ./check_docs_api_coverage 
WARNING: some methods are listed multiple times inside one class file/page. Decide if the duplicates are intended and if not remove them. They are:
cfdm.Data.sum
cfdm.Data.unique

All non-private methods are documented

in line with your idea to leave just those two under cfdm.Data.

So this should be ready for re-review. Thanks.

Copy link
Contributor

@davidhassell davidhassell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Sadie, looks good, thanks. I have a question about the partial matches, though ...

# so we must account for that. Checking next character
# of duplicate(s) is something other than a newline or
# whitespace, seems robust and simplest.
end_loc = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume I'm missing something here, but count doesn't count partial matches, so why is this needed?

>>> a = ['abc', 'abcd', 'abcde']
>>> a.count('abc')
1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's needed because what's being done is count from one larger string, being the string read from the full file, rather than a list of sub-strings as per your example, in which case we have as an equivalent:

>>> a = "abc abcd abcde"
>>> a.count("abc")
3

Copy link
Member Author

@sadielbartholomew sadielbartholomew Nov 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless, if you can think of a more elegant way to go about the logic I've added, I am happy to hear. I might be able to think of an alternative myself if I mull over it, but since it's a convenience script for automating checks I am satisfied as long as it works and isn't (too) slow 🙂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good - thanks for the explanation

Copy link
Contributor

@davidhassell davidhassell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following the short discussion, I now understand, and therefore approve, all of these changes. Thanks, Sadie.

@sadielbartholomew sadielbartholomew merged commit 28fb90c into NCAS-CMS:main Nov 14, 2023
@sadielbartholomew sadielbartholomew deleted the find-api-ref-duplicate-entries branch November 14, 2023 13:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
code tidy/refactor documentation Improvements or additions to documentation release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants