Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New feature that save mask as geojson format #306

Merged
merged 7 commits into from
Dec 16, 2024

Conversation

nanli-emory
Copy link
Collaborator

Add a --geojson option into HistoQC. The default value of geojson is False. Test the result and import intp Qupath.
This feature is associate with issue #298

Screenshot from 2024-06-11 17-38-07

@nanli-emory
Copy link
Collaborator Author

Hi @jacksonjacobs1, Please review it.

@choosehappy
Copy link
Owner

not quite, i think there was a miscommunication here. the module should be in the config file, and should be dynamic, such that the user can add it at any point during the pipeline, or even multiple times, with a tag specifying the mask to be written out. coding wise should be similar to the classification pen marking work

@nanli-emory
Copy link
Collaborator Author

Hi @choosehappy and @jacksonjacobs1,

I created a new function called saveMask2Geojson in SaveModule. Users can export a mask in geojson format dynamically.
I modified and tested by using the histoqc/config/config_clinical.ini config file for geojson mask.
Please review and provide some feedback.

@choosehappy
Copy link
Owner

yea, much closer!

1 point left:

i think " s["img_mask_use"]) " should actually be accepted as a variable with the name of the mask to save --- "img_mask_use" can and should be the default, but if someone is interested in saving e.g., the pen detector or blur mask they should be able to as well

do you see what I mean?

@nanli-emory
Copy link
Collaborator Author

Hi @choosehappy and @jacksonjacobs1,

I added an option called 'mask_name' to let the user point out what mask to save.

@choosehappy
Copy link
Owner

great, looks good to me. no remaining points on my side- will let @jacksonjacobs1 code review and comment as needed, if not merge

@choosehappy
Copy link
Owner

btw, i quick note - we can as well easily read/write gzip compressed json (an example is in my blog). normally compressed json is about 33% the size of the uncompressed. i suspect the ones that we have here are small enough that we don't need to worry about it, and its "nicer" to be able to easily see the json, but its something to keep in mind in case we have more "serious" file sizes

@CielAl
Copy link
Contributor

CielAl commented Jun 20, 2024

yea, much closer!

1 point left:

i think " s["img_mask_use"]) " should actually be accepted as a variable with the name of the mask to save --- "img_mask_use" can and should be the default, but if someone is interested in saving e.g., the pen detector or blur mask they should be able to as well

do you see what I mean?

Just in case, the current implementation use measures.find_contour, which typically finds both the inner and outer rings. In that case, your geojson output will consider the inner and outer ring as individual polygons, if I understand it correctly.

This might be relevant:
https://stackoverflow.com/questions/72040574/how-to-determine-between-inner-and-outer-contour-with-python-opencv

@jacksonjacobs1
Copy link
Collaborator

Agree with @CielAl

The geojson template will need to be updated to the multipolygon specification:
https://stevage.github.io/geojson-spec/#section-3.1.7

https://stackoverflow.com/questions/43645172/geojson-multipolygon-with-multiple-holes

I would also recommend testing whether qupath correctly renders multi-polygons imported from geojson.

@nanli-emory
Copy link
Collaborator Author

Hi @jacksonjacobs1, completed correctly showing the inner and outer rings as individual polygons. Please review the code. Thanks.
Screenshot from 2024-11-15 11-27-54

Copy link
Collaborator

@jacksonjacobs1 jacksonjacobs1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall! I left a couple of comments

histoqc/SaveModule.py Show resolved Hide resolved
the followings are helper functions for generating geojson
'''

feature_template = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using dicts to model json data is error prone. I think it would be better to use types provided by the geojson library:
https://python-geojson.readthedocs.io/en/latest/

I think it's worth spending a little more time to make this method robust because we may want to use it outside of histoqc in our lab.

By the way, did you get the chance to try out this python package? I remember mentioning it in a Friday meeting but do not remember if you found an issue with it.
https://pypi.org/project/cv2geojson/

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. I will use python-geojson and cv2geojson. I will let you know if there is any issue with using them.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! note though that we want to avoid adding dependencies unless necessary.

I would argue that the geojson library is the right tool for the job in this case, but I'm not sure yet with cv2geojson - do you think this second library has much added value for HistoQC?

Also, do you have a sense of how large these packages are?

new_feature['geometry']['type'] = 'Polygon'
new_feature['geometry']['coordinates'].append(points.tolist())
else:
new_feature['geometry']['type'] = 'MultiPolygon'
Copy link
Collaborator

@jacksonjacobs1 jacksonjacobs1 Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discovered last week that MultiPolygon is not actually the appropriate type for representing polygons with holes. Instead, the Polygon type should be used, e.g.,

{
  "type": "Feature",
  "properties": {},
  "geometry": {
    "type": "Polygon",
    "coordinates": [
      [
        [
          -120,
          60
        ],
        [
          120,
          60
        ],
        [
          120,
          -60
        ],
        [
          -120,
          -60
        ],
        [
          -120,
          60
        ]
      ],
      [
        [
          -60,
          30
        ],
        [
          60,
          30
        ],
        [
          60,
          -30
        ],
        [
          -60,
          -30
        ],
        [
          -60,
          30
        ]
      ]
    ]
  }
}

... where the first set of coordinates represents the outer ring and the second represents the inner hole.
https://gist.github.com/andrewharvey/971b9db1900a62efa7635f6a6d337ae7#file-polygon-with-a-hole-geojson

Also see this GeoJS sandbox. If you draw a polygon with a hole it will be stored as a "Polygon" type:
https://opengeoscience.github.io/geojs/examples/annotations/

Sorry for the confusion. Can you change the code to only use the Polygon type?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We briefly discussed something relevant in #258

But I think multipolygon support should be retained as it is quite common for softwares (e.g., PathML) to have cascaded annotation support - which groups a list of disconnected polygons in a confined region together and depending on user's choice and their use cases an optional multipolygon output can be helpful.

Existing code in annotation module could be a good example for illustration (though it's geojson to mask not the other way around): https://github.com/choosehappy/HistoQC/blob/master/histoqc/annotations/annotation/geojson.py#25

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Yufei.

The MultiPolygon support that you are describing would still require a change to the code, to produce e.g.,

{
  "type": "Feature",
  "properties": {},
  "geometry": {
    "type": "MultiPolygon",
    "coordinates": [[poly1_ring_coords_array, poly1_hole_coords_array], [poly2_ring_coords_array, poly2_hole_coords_array]]
   }
}

Correct me if I'm wrong, but it looks like @nanli-emory 's code produces a multipolygon like so:

{
  "type": "Feature",
  "properties": {},
  "geometry": {
    "type": "MultiPolygon",
    "coordinates": [[poly1_ring_coords_array], [poly1_hole_coords_array]]
   }
}

... where poly1_ring_coords_array and poly1_hole_coords_array are not actually sibling arrays in the nested structure.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jacksonjacobs1 , I use geojson lib to generate FeatureCollection. Please review it. Thanks,

@jacksonjacobs1
Copy link
Collaborator

Yes, this looks exactly correct. @CielAl with the current implementation, I think we can just swap Polygon() for MultiPolygon() and the resulting mask should be compatible with the tools you mentioned. Do you agree?

For now though, I'm okay with leaving the code as is. QuickAnnotator should support reading both polygon and multipolygon formats.

@nanli-emory can you confirm you tested that this code produces geojson masks which are readable by QuPath?

@nanli-emory
Copy link
Collaborator Author

nanli-emory commented Dec 13, 2024

Yes, this looks exactly correct. @CielAl with the current implementation, I think we can just swap Polygon() for MultiPolygon() and the resulting mask should be compatible with the tools you mentioned. Do you agree?

For now though, I'm okay with leaving the code as is. QuickAnnotator should support reading both polygon and multipolygon formats.

@nanli-emory can you confirm you tested that this code produces geojson masks which are readable by QuPath?

Hi @jacksonjacobs1, the geojson masks work well on QuPath. I tested on my local. Should I swap Polygon() for MultiPolygon() now?

@CielAl
Copy link
Contributor

CielAl commented Dec 14, 2024

Yes, this looks exactly correct. @CielAl with the current implementation, I think we can just swap Polygon() for MultiPolygon() and the resulting mask should be compatible with the tools you mentioned. Do you agree?

For now though, I'm okay with leaving the code as is. QuickAnnotator should support reading both polygon and multipolygon formats.

@nanli-emory can you confirm you tested that this code produces geojson masks which are readable by QuPath?

Totally agree.

@jacksonjacobs1 jacksonjacobs1 merged commit 74757b9 into choosehappy:master Dec 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants