Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Llm extraction schema #32

Open
wants to merge 37 commits into
base: main
Choose a base branch
from
Open

Llm extraction schema #32

wants to merge 37 commits into from

Conversation

sherjeelshabih
Copy link
Collaborator

@sherjeelshabih sherjeelshabih commented Nov 15, 2024

I got most of the suggestions implemented.

@hampusnasstrom antisolvent quenching is something you should give feedback on. Jesper's suggestion:

Gas quenching and antisolvent quenching
Right now, we have qas quenching as a Boolean field.
We do not have a field for antisolvent quenching. Have we decided that we should add that as a separate deposition step?

For now, my suggestion would be to add a Boolean filed for antisolvent quenching, and add an instruction that it should be added as a separate deposition step. In the long run (i.e. not in this project right now) we would probably implement processing steps as separate modules depending on the deposition method, so that antisolvent details only are asked for when the deposition method is spin-coating.

Summary by Sourcery

Introduce a new schema package and application for LLM extraction in the perovskite solar cell database, enhancing data extraction and user interaction capabilities. Update data types and units for existing fields to improve flexibility and accuracy.

New Features:

  • Introduce a new schema package for LLM extraction in the perovskite solar cell database, enabling structured data extraction and management.
  • Add a new application entry point for exploring LLM extracted solar cells, providing a user interface for searching and filtering extracted data.

Enhancements:

  • Change the type of the 'coefficient' field in the PerovskiteIonComponent class from float to string to accommodate more flexible input formats.
  • Update the default display unit for the 'concentration' field in the Impurity class to 'cm^-3'.

Copy link

sourcery-ai bot commented Nov 15, 2024

Reviewer's Guide by Sourcery

This PR introduces a new schema package for LLM (Large Language Model) extraction of perovskite solar cell data. The changes include modifications to the composition handling, addition of new schema classes for various solar cell components, and integration of a new app for exploring LLM-extracted data.

Class diagram for LLM Extraction Schema

classDiagram
    class LLMSchemaExtractionPackageEntryPoint {
        +load()
    }
    class PerovskiteIonComponent {
        +coefficient: str
    }
    class Impurity {
        +concentration: float
        +a_eln: ELNAnnotation
    }
    class LLMExtractedPerovskiteSolarCell {
        +DOI_number: str
        +perovskite_composition: PerovskiteComposition
        +device_architecture: MEnum
        +pce: float
        +jsc: float
        +voc: float
        +ff: float
        +active_area: float
        +number_devices: int
        +averaged_quantities: bool
        +light_source: LightSource
        +bandgap: float
        +encapsulated: bool
        +stability: Stability
        +layers: Layer
        +layer_order: str
        +normalize(archive, logger)
    }
    class PerovskiteComposition {
        +abbreviation: str
        +coefficient: str
    }
    class LightSource {
        +type: MEnum
        +description: str
        +light_intensity: float
        +lamp: str
    }
    class Stability {
        +time: float
        +light_intensity: float
        +humidity: float
        +temperature: float
        +PCE_T80: float
        +PCE_at_start: float
        +PCE_after_1000_hours: float
        +PCE_at_end: float
        +potential_bias: MEnum
    }
    class Layer {
        +name: str
        +thickness: float
        +functionality: MEnum
        +deposition: ProcessingStep
        +additional_treatment: str
    }
    class ProcessingStep {
        +step_name: str
        +method: str
        +atmosphere: MEnum
        +temperature: float
        +duration: float
        +gas_quenching: bool
        +antisolvent_quenching: bool
        +solution: ReactionSolution
        +additional_parameters: JSON
    }
    class ReactionSolution {
        +solutes: Solute
        +volume: float
        +temperature: float
        +solvents: Solvent
    }
    class Solute {
        +name: str
        +concentration: float
        +concentration_unit: MEnum
    }
    class Solvent {
        +name: str
        +ratio: float
    }
    LLMSchemaExtractionPackageEntryPoint --> PerovskiteIonComponent
    LLMSchemaExtractionPackageEntryPoint --> Impurity
    LLMSchemaExtractionPackageEntryPoint --> LLMExtractedPerovskiteSolarCell
    LLMExtractedPerovskiteSolarCell --> PerovskiteComposition
    LLMExtractedPerovskiteSolarCell --> LightSource
    LLMExtractedPerovskiteSolarCell --> Stability
    LLMExtractedPerovskiteSolarCell --> Layer
    Layer --> ProcessingStep
    ProcessingStep --> ReactionSolution
    ReactionSolution --> Solute
    ReactionSolution --> Solvent
Loading

File-Level Changes

Change Details Files
Changed coefficient type in PerovskiteIonComponent from float to string to support more flexible stoichiometric notation
  • Modified coefficient type from float to string
  • Updated normalization logic to handle string coefficients
  • Added support for expressions like '1-x' in coefficients
src/perovskite_solar_cell_database/composition.py
Created new schema package for LLM extraction of solar cell data
  • Added base SectionRevision class with review tracking
  • Created classes for various solar cell components (Ion, LightSource, Solute, etc.)
  • Implemented comprehensive schema for perovskite solar cell data extraction
  • Added stability tracking and measurement parameters
src/perovskite_solar_cell_database/llm_extraction_schema.py
Added new app for exploring LLM-extracted solar cell data
  • Created new app entry point for LLM extracted data
  • Configured custom columns and filters for the app
  • Added dashboard with visualization widgets
  • Set up filter menus for data exploration
src/perovskite_solar_cell_database/apps/__init__.py
src/perovskite_solar_cell_database/apps/llm_extracted_solarcells.py
Updated package configuration and entry points
  • Added new LLMSchemaExtractionPackageEntryPoint
  • Registered llm_extraction_schema in pyproject.toml
src/perovskite_solar_cell_database/__init__.py
pyproject.toml

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time. You can also use
    this command to specify where the summary should be inserted.

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @sherjeelshabih - I've reviewed your changes and they look great!

Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟡 Complexity: 1 issue found
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@@ -831,12 +831,10 @@ def normalize(self, archive: 'EntryArchive', logger: 'BoundLogger') -> None:
self.short_form += ion.abbreviation
if ion.coefficient is None:
continue
if ion.coefficient == 1:
if ion.coefficient == '1':
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (code-quality): We've found these issues:

  • Replace if statement with if expression (assign-if-exp)
  • Simplify conditional into switch-like form [×2] (switch)

Comment on lines +13 to +14
if TYPE_CHECKING:
pass
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (code-quality): Remove redundant conditional (remove-redundant-if)

Pepe-Marquez and others added 27 commits November 19, 2024 15:45
* feat: update schema to new LLM version

* Added review base section which needs to be included in every section.

* add additional field descriptions

* Fixed typos and some other things

---------

Co-authored-by: Pepe Marquez <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants