finalising ami output for forest plots #12

Open
petermr opened this issue Jun 26, 2019 · 1 comment

petermr commented Jun 26, 2019

ami now carries out most of the required task, and it's my intention to prototype and test the full functionality in the next few days.
The result of running normami will be a large CTree and a set of HTML and CSV files that can be re-used. The missing functionality includes:

  • develop TableExtractor to identify table structure
  • TableExtractor should unify hocr and gocr output to a canonical table format.
  • TableExtractor will attempt to unify the cell content, according to a schema.
  • TableExtractor will apply simple heuristics to detect errors and add @class-based annotation
  • TableExtractor will emit CSV or HTML files for the various components of a plot (i.e. possibly several files)
  • Develop GraphExtractor to extract SVGLines from body.graphs
  • Develop ScaleExtractor to extract numeric scales
  • apply the results of GraphExtractor and ScaleExtractor to convert to a CSV with user coordinates.
  • synchronise tables and graphs to determine consistency of horizontal content lines
  • provide an aggregate view of gocr, hocr and graph values.
  • extract and parse summary data in tables (e.g. Overall P values).
  • allow parameterisation of hocr and gocr, as far as I understand it (e.g. to prepare argument lists with whitelists). However, both programs are very poorly documented and fragile, and I shall not research this. I may open Issues showing the possible tasks.
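
The step that applies ScaleExtractor results to GraphExtractor lines amounts to mapping pixel coordinates onto the axis scale. A minimal sketch in Python, assuming a linear axis (forest plots often use a logarithmic one, which would need a log transform first) and two recognised tick marks. The function names and the `(label, x1, x2)` line representation are hypothetical and are not part of ami:

```python
import csv
import io


def make_pixel_to_user(tick_a, tick_b):
    """Return a function mapping a pixel x-coordinate to a user value.

    tick_a and tick_b are (pixel_x, user_value) pairs for two tick marks
    read from a linear axis.
    """
    (px_a, val_a), (px_b, val_b) = tick_a, tick_b
    if px_a == px_b:
        raise ValueError("tick marks must have distinct pixel positions")
    slope = (val_b - val_a) / (px_b - px_a)

    def to_user(pixel_x):
        return val_a + slope * (pixel_x - px_a)

    return to_user


def lines_to_csv(lines, to_user):
    """Convert horizontal lines (label, x1_pixel, x2_pixel) to CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["label", "lower", "upper"])
    for label, x1, x2 in lines:
        # Order the endpoints so 'lower' is always the smaller user value.
        lower, upper = sorted((to_user(x1), to_user(x2)))
        writer.writerow([label, round(lower, 3), round(upper, 3)])
    return out.getvalue()
```

For example, with ticks at pixel 100 (value 0.0) and pixel 300 (value 2.0), a line from pixel 150 to pixel 250 would be written out as an interval in user coordinates rather than pixels.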

This data should then be sufficient for repurposing for clients.

PMR output will be CSV and HTML that try to replicate what is visible on the screen, with some indications of reliability.
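
One possible way to produce such a reliability indication is to compare the hocr and gocr readings of the same cell. This is only a sketch under stated assumptions: the function name and the labels `agree`, `conflict`, `single` and `missing` are hypothetical, and the actual @class values ami will use are not specified here.

```python
def reconcile_cell(hocr_text, gocr_text):
    """Merge two OCR readings of one table cell.

    Returns (value, reliability), where reliability is one of the
    hypothetical labels 'agree', 'conflict', 'single' or 'missing'.
    """
    h = (hocr_text or "").strip()
    g = (gocr_text or "").strip()
    if h and g:
        if h == g:
            return h, "agree"
        # Arbitrary choice for this sketch: keep the hocr reading
        # and flag the disagreement for later review.
        return h, "conflict"
    if h or g:
        # Only one of the two programs produced any text.
        return h or g, "single"
    return "", "missing"
```

The label could then be written as an extra CSV column or as a class attribute on the corresponding HTML cell.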

== What PMR will not currently do ==

  • domain-specific analysis of results.
  • customisation of use
  • client-facing documentation
  • refinement of image analysis parameters
  • creation of corpora
  • develop JS, containers, servers for this project
  • implement software on client site.
  • respond to alternative corpora.
  • write a clean facility for normami (there is a lot of potential output from a run, especially when different parameters are being used.)

== What PMR will do ==

  • attempt to fix runtime bugs
  • mentor CG and MD on how to run programs

petermr self-assigned this Jun 26, 2019

petermr commented Jun 26, 2019

Please comment on the lists if I have forgotten or misrepresented anything.
