Commit 8e1e464

Merge pull request #34 from rkingsbury/sanitize_formula
Solution: sanitize all formulas
2 parents 0d1df5e + ba08760 commit 8e1e464

8 files changed (+161, -70 lines)

CHANGELOG.md (+9 -1)

@@ -5,16 +5,24 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## [Unreleased]
+## [0.7.0] - 2023-08-22
+
+### Changed
+
+- `Solution` now more robustly converts any user-supplied formulas into unique values using `pymatgen.core.ion.Ion.reduced_formula`. This means that the `.components` or `solvent` attributes may now differ slightly from whatever is entered during `__init__`. For example, `Solution(solvent='H2O').solvent` gives `H2O(aq)`. This behavior resolved a small bug that could occur when mixing solutions. User-supplied formulas passed to `get_amount` or `Solution.components[xxx]` can still be any valid formula. E.g., `Solution.components["Na+"]`, `Solution.components["Na+1"]`, and `Solution.components["Na[+]"]` will all return the same thing.
+
+## [0.6.1] - 2023-08-22
 
 ### Added
 
+- `Solution`: enable passing an `EOS` instance to the `engine` kwarg.
 - `Solution`: new properties `total_dissolved_solids` and alias `TDS`
 - `Solution`: support new units in `get_amount` - ppm, ppb, eq/L, etc.
 - `Solution`: implemented arithmetic operations `+` (for mixing two solutions), `*` and `/` for scaling their amounts
 
 ### Changed
 
+
 - `pyEQL.unit` was renamed to `pyEQL.ureg` (short for `UnitRegistry`) for consistency with the `pint` documentation and tutorials.
 
 ## [v0.6.0] - 2023-08-15
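The 0.7.0 entry says that different spellings of one species now resolve to a single, unique key. As a rough, self-contained illustration of that convention, `toy_standardize` below is a hypothetical regex-based stand-in for `pyEQL.utils.standardize_formula` (which delegates to pymatgen's `Ion`, not to a regex). It only handles a base formula followed by an optional charge, and it does not reorder or reduce formulas the way pymatgen does.

```python
import re

# Hypothetical stand-in for pyEQL.utils.standardize_formula. It only understands
# a base formula followed by an optional charge such as "+", "++", "+2" or "[+]",
# and it does not reorder or reduce the formula the way pymatgen does.
_CHARGE = re.compile(r"(.*?)\[?([+-]+)(\d*)\]?")


def toy_standardize(formula: str) -> str:
    f = formula.strip()
    if f.endswith("(aq)"):
        return f  # already in the standard neutral form
    m = _CHARGE.fullmatch(f)
    if m is None:
        return f + "(aq)"  # neutral species get an "(aq)" suffix
    base, signs, digits = m.groups()
    magnitude = int(digits) if digits else len(signs)  # "++" counts as 2
    charge = magnitude if signs[0] == "+" else -magnitude
    return f"{base}[{charge:+d}]"  # charge number always explicit, in brackets


for spelling in ["Na+", "Na+1", "Na[+]", "Na[+1]", "Mg++", "H2O"]:
    print(spelling, "->", toy_standardize(spelling))
```

Any mapping keyed on this output treats `"Na+"`, `"Na+1"` and `"Na[+]"` as one entry, which is the property the changelog relies on for `Solution.components`.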

docs/chemistry.md (+34 -23)

@@ -10,47 +10,58 @@ calculations.
 ## How to Enter Valid Chemical Formulas
 
 Generally speaking, type the chemical formula of your solute the "normal" way
-and `pyEQL` should be able to interpret it. Internally, `pyEQL` uses the [`pymatgen.core.ion.Ion`](https://pymatgen.org/pymatgen.core.html#pymatgen.core.ion.Ion)
-class to "translate" chemical formulas into a consistent format. Anything that the `Ion` class can understand will
-be processed into a valid formula by `pyEQL`.
+and `pyEQL` should be able to interpret it. Internally, `pyEQL` uses a utility function `pyEQL.utils.standardize_formula`
+to process all formulas into a standard form. At present, this is done by passing the formula through the
+[`pymatgen.core.ion.Ion`](https://pymatgen.org/pymatgen.core.html#pymatgen.core.ion.Ion) class. Anything that the `Ion`
+class can understand will be processed into a valid formula by `pyEQL`.
 
 Here are some examples:
 
 | Substance | You enter | `pyEQL` understands |
 | :--- | :---: | :---: |
-| Sodium Chloride | "NaCl", "NaCl(aq)", or "ClNa" | "NaCl(aq)" |
-| Sodium Sulfate | "Na(SO4)2" or "NaS2O8" | "Na(SO4)2(aq)" |
-| Sodium Ion | "Na+", "Na+1", "Na1+", or "Na[+]" | "Na[+1]" |
-| Magnesium Ion | "Mg+2", "Mg++", or "Mg[++]" | "Mg[+2]" |
-| Methanol | "CH3OH", "CH4O" | "CH3OH(aq)" |
+| Sodium Chloride | "NaCl", "NaCl(aq)", or "ClNa" | "NaCl(aq)" |
+| Sodium Sulfate | "Na(SO4)2" or "NaS2O8" | "Na(SO4)2(aq)" |
+| Sodium Ion | "Na+", "Na+1", "Na1+", or "Na[+]" | "Na[+1]" |
+| Magnesium Ion | "Mg+2", "Mg++", or "Mg[++]" | "Mg[+2]" |
+| Methanol | "CH3OH", "CH4O" | "CH3OH(aq)" |
 
-Specifically, `pyEQL` uses `Ion.from_formula(<formula>).reduced_formula` (shown in the right hand column of the table) to
-identify solutes. Notice that for charged species, the charges are always placed inside square brackets (e.g., `Na[+1]`)
-and always include the charge number (even for monovalent ions). Uncharged species are always suffixed by `(aq)` to
-disambiguate them from solids.
+Specifically, `standardize_formula` uses `Ion.from_formula(<formula>).reduced_formula` (shown in the right hand column
+of the table) to identify solutes. Notice that for charged species, the charges are always placed inside square brackets
+(e.g., `Na[+1]`) and always include the charge number (even for monovalent ions). Uncharged species are always suffixed
+by `(aq)` to disambiguate them from solids.
 
 :::{important}
-**When writing multivalent ion formulas, it is strongly recommended that you put the charge number AFTER the + or -
-sign** (e.g., type "Mg+2" NOT "Mg2+"). The latter formula is ambiguous - it could mean $`Mg_2^+`$ or $`Mg^{+2}`$
+**When writing multivalent ion formulas, it is strongly recommended that you put the charge number AFTER the + or -
+sign** (e.g., type "Mg+2" NOT "Mg2+"). The latter formula is ambiguous - it could mean $`Mg_2^+`$ or $`Mg^{+2}`$ and
+it will be processed incorrectly into `Mg[+0.5]`.
 :::
 
 (manual-testing)=
 ## Manually testing a formula
 
-If you want to make sure `pyEQL` is understanding your formula correctly, you can manually test it via `pymatgen` as
-follows:
+If you want to make sure `pyEQL` is understanding your formula correctly, you can manually test it as follows:
 
 ```
->>> from pymatgen.core.ion import Ion
->>> Ion.from_formula(<your_formula>).reduced_formula
+>>> from pyEQL.utils import standardize_formula
+>>> standardize_formula(<your_formula>)
 ...
 ```
 
 ## Formulas you will see when using `Solution`
 
-When using the `Solution` class,
+When using the `Solution` class,
 
-- When creating a `Solution`, you can enter chemical formulas in any format you prefer, as long as `pymatgen` can understand it (see [manual testing](#manually-testing-a-formula)).
-- The keys (solute formulas) in `Solution.components` are preserved in the same format the user enters them. So if you entered `Na+` for sodium ion, it will stay that way.
-- Arguments to `Solution.get_property` can be entered in any format you prefer. When `pyEQL` queries the database, it will automatically convert the formula to the canonical one from `pymatgen`
-- Property data in the database is uniquely identified by the canonical ion formula (output of `Ion.from_formula(<formula>).reduced_formula`, e.g. "Na[+1]" for sodium ion).
+- When creating a `Solution`, you can enter chemical formulas in any format you prefer, as long as `standardize_formula` can understand it (see [manual testing](#manually-testing-a-formula)).
+- The keys (solute formulas) in `Solution.components` are standardized. So if you entered `Na+` for sodium ion, it will appear in `components` as `Na[+1]`.
+- However, the `components` attribute is a special dictionary that automatically standardizes formulas when accessed. So, you can still enter the formula
+however you want. For example, the following all access or modify the same element in `components`:
+
+```
+Solution.components.get('Na+')
+Solution.components["Na+1"]
+Solution.components.update({"Na[+]": 2})
+Solution.components["Na[+1]"]
+```
+
+- Arguments to `Solution.get_property` can be entered in any format you prefer. When `pyEQL` queries the database, it will automatically standardize the formula.
+- Property data in the database is uniquely identified by the standardized ion formula (output of `Ion.from_formula(<formula>).reduced_formula`, e.g. "Na[+1]" for sodium ion).
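The `Mg[+0.5]` result mentioned in the docs note follows from how a reduced formula is formed: if "Mg2+" is read as two Mg atoms carrying a total charge of +1, reducing the composition divides the atom counts by their greatest common divisor (here 2) and the charge by the same factor, leaving one Mg with a charge of +1/2. `reduce_formula` is a hypothetical helper written only to show that arithmetic; it is not pymatgen's code.

```python
from functools import reduce
from math import gcd


def reduce_formula(counts: dict[str, int], charge: float) -> tuple[dict[str, int], float]:
    """Divide atom counts by their GCD and scale the total charge by the same factor.

    Hypothetical helper that shows the arithmetic only; it is not pymatgen's
    implementation of Ion.reduced_formula.
    """
    factor = reduce(gcd, counts.values())
    return {el: n // factor for el, n in counts.items()}, charge / factor


# "Mg+2": one Mg carrying +2, already reduced
print(reduce_formula({"Mg": 1}, 2))  # ({'Mg': 1}, 2.0)
# "Mg2+" read as Mg_2 with a total charge of +1: one Mg carrying +0.5
print(reduce_formula({"Mg": 2}, 1))  # ({'Mg': 1}, 0.5)
```

Writing the charge number after the sign ("Mg+2") removes the ambiguity, because the 2 can then only be a charge.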

src/pyEQL/engines.py (+6 -16)

@@ -7,8 +7,6 @@
 """
 from abc import ABC, abstractmethod
 
-from pymatgen.core.ion import Ion
-
 # internal pyEQL imports
 import pyEQL.activity_correction as ac
 
@@ -17,6 +15,7 @@
 from pyEQL import ureg
 from pyEQL.logging_system import logger
 from pyEQL.salt_ion_match import generate_salt_list
+from pyEQL.utils import standardize_formula
 
 
 class EOS(ABC):
@@ -192,7 +191,7 @@ def get_activity_coefficient(self, solution, solute):
         """
         # identify the predominant salt that this ion is a member of
         Salt = None
-        rform = Ion.from_formula(solute).reduced_formula
+        rform = standardize_formula(solute)
         salt_list = generate_salt_list(solution, unit="mol/kg")
         for item in salt_list:
             if rform == item.cation or rform == item.anion:
@@ -456,18 +455,9 @@ def get_solute_volume(self, solution):
         """Return the volume of the solutes."""
         # identify the predominant salt in the solution
         salt = solution.get_salt()
-        # reverse-convert the sanitized formula back to whatever was in self.components
-        for i in solution.components:
-            rform = Ion.from_formula(i).reduced_formula
-            if rform == salt.cation:
-                cation = i
-            if rform == salt.anion:
-                anion = i
-
         solute_vol = ureg.Quantity("0 L")
 
         # use the pitzer approach if parameters are available
-
         pitzer_calc = False
 
         param = solution.get_property(salt.formula, "model_parameters.molar_volume_pitzer")
@@ -476,7 +466,7 @@ def get_solute_volume(self, solution):
             # this is necessary for solutions inside e.g. an ion exchange
             # membrane, where the cation and anion concentrations may be
             # unequal
-            molality = (solution.get_amount(cation, "mol/kg") + solution.get_amount(anion, "mol/kg")) / 2
+            molality = (solution.get_amount(salt.cation, "mol/kg") + solution.get_amount(salt.anion, "mol/kg")) / 2
 
             # determine alpha1 and alpha2 based on the type of salt
             # see the May reference for the rules used to determine
@@ -512,8 +502,8 @@ def get_solute_volume(self, solution):
             solute_vol += (
                 apparent_vol
                 * (
-                    solution.get_amount(cation, "mol") / salt.nu_cation
-                    + solution.get_amount(anion, "mol") / salt.nu_anion
+                    solution.get_amount(salt.cation, "mol") / salt.nu_cation
+                    + solution.get_amount(salt.anion, "mol") / salt.nu_anion
                 )
                 / 2
             )
@@ -530,7 +520,7 @@ def get_solute_volume(self, solution):
                 continue
 
             # ignore the salt cation and anion, if already accounted for by Pitzer
-            if pitzer_calc is True and solute in [anion, cation]:
+            if pitzer_calc is True and solute in [salt.anion, salt.cation]:
                 continue
 
             part_vol = solution.get_property(solute, "size.molar_volume")
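The loop deleted from `get_solute_volume` existed to map the standardized `salt.cation` and `salt.anion` back to whatever spelling the user had used in `components`. Once both sides are standardized, the salt's formulas can serve as keys directly. A minimal sketch of that lookup and of the averaged molality used in the Pitzer branch, assuming a plain dict of molalities already keyed by standardized formulas; `ToySalt` and `mean_salt_molality` are hypothetical names, not pyEQL API.

```python
from dataclasses import dataclass


@dataclass
class ToySalt:
    """Hypothetical stand-in for pyEQL's Salt; formulas already standardized."""

    cation: str
    anion: str


def mean_salt_molality(molalities: dict[str, float], salt: ToySalt) -> float:
    """Average of cation and anion molality, as in the Pitzer volume branch.

    Because the keys of `molalities` and the salt's formulas share one standard
    form, the salt's formulas are used as keys directly. Averaging copes with
    unequal cation and anion amounts (e.g. inside an ion exchange membrane).
    """
    return (molalities[salt.cation] + molalities[salt.anion]) / 2


molalities = {"Na[+1]": 0.6, "Cl[-1]": 0.4}
print(mean_salt_molality(molalities, ToySalt("Na[+1]", "Cl[-1]")))  # 0.5
```

With unstandardized keys such as `"Na+"`, the same direct lookup would raise `KeyError`, which is why the old code needed the reverse-lookup loop.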

src/pyEQL/salt_ion_match.py (+6 -5)

@@ -14,6 +14,7 @@
 from pymatgen.core.ion import Ion
 
 from pyEQL.logging_system import logger
+from pyEQL.utils import standardize_formula
 
 
 class Salt:
@@ -44,9 +45,9 @@ def __init__(self, cation, anion):
         # create pymatgen Ion objects
         pmg_cat = Ion.from_formula(cation)
         pmg_an = Ion.from_formula(anion)
-        # sanitize the cation and anion formulas
-        self.cation = pmg_cat.reduced_formula
-        self.anion = pmg_an.reduced_formula
+        # standardize the cation and anion formulas
+        self.cation = standardize_formula(cation)
+        self.anion = standardize_formula(anion)
 
         # get the charges on cation and anion
         self.z_cation = pmg_cat.charge
@@ -166,12 +167,12 @@ def identify_salt(sol):
     anion = "OH-"
 
     # return water if there are no solutes
-    if len(sort_list) < 3 and sort_list[0] == "H2O":
+    if len(sort_list) < 3 and sort_list[0] == "H2O(aq)":
         logger.info("Salt matching aborted because there are not enough solutes.")
         return Salt(cation, anion)
 
     # warn if something other than water is the predominant component
-    if sort_list[0] != "H2O":
+    if sort_list[0] != "H2O(aq)":
         logger.warning("H2O is not the most prominent component")
 
     # take the dominant cation and anion and assemble a salt from them
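With standardized keys, water is always spelled `"H2O(aq)"`, which is why `identify_salt` now compares against that string, and every ion's charge can be read back from its bracketed `[+n]` or `[-n]` suffix. The sketch below is a simplified, hypothetical version of the matching step on a plain `{formula: moles}` dict: `charge_of` and `dominant_ions` are illustrative names, the fallback to `H[+1]`/`OH[-1]` is an assumption for the example, and pyEQL's real `identify_salt` may rank ions differently.

```python
import re
import warnings

# Matches the bracketed charge at the end of a standardized formula, e.g. "[+1]".
_BRACKET_CHARGE = re.compile(r"\[([+-]\d+(?:\.\d+)?)\]$")


def charge_of(formula: str) -> float:
    """Read the charge from a standardized formula such as 'Na[+1]' or 'SO4[-2]'."""
    m = _BRACKET_CHARGE.search(formula)
    return float(m.group(1)) if m else 0.0  # neutral "(aq)" species -> 0


def dominant_ions(moles: dict[str, float]) -> tuple[str, str]:
    """Hypothetical helper: most abundant cation and anion in {formula: moles}."""
    ranked = sorted(moles, key=moles.get, reverse=True)
    if ranked and ranked[0] != "H2O(aq)":
        warnings.warn("H2O is not the most prominent component")
    # fall back to water's own ions when no cation or anion is present (assumption)
    cation = next((f for f in ranked if charge_of(f) > 0), "H[+1]")
    anion = next((f for f in ranked if charge_of(f) < 0), "OH[-1]")
    return cation, anion


sol = {"H2O(aq)": 55.5, "Na[+1]": 0.5, "Cl[-1]": 0.5, "H[+1]": 1e-7, "OH[-1]": 1e-7}
print(dominant_ions(sol))  # ('Na[+1]', 'Cl[-1]')
```

Parsing the charge from the key only works because every key has already been standardized; a raw spelling like `"Na+"` would be read as neutral by `charge_of`.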

src/pyEQL/solution.py (+14 -18)

@@ -26,6 +26,7 @@
 # logging system
 from pyEQL.logging_system import logger
 from pyEQL.salt_ion_match import generate_salt_list, identify_salt
+from pyEQL.utils import FormulaDict, standardize_formula
 
 EQUIV_WT_CACO3 = 100.09 / 2 * ureg.Quantity("g/mol")
 
@@ -126,7 +127,7 @@ def __init__(
 
         # create an empty dictionary of components. This dict comprises {formula: moles}
         # where moles is the number of moles in the solution.
-        self.components: dict = {}
+        self.components = FormulaDict({})
 
         # connect to the desired property database
         if database is None:
@@ -163,7 +164,7 @@ def __init__(
             raise ValueError("Multiple solvents are not yet supported!")
         if solvent[0] not in ["H2O", "H2O(aq)", "water", "Water", "HOH"]:
             raise ValueError("Non-aqueous solvent detected. These are not yet supported!")
-        self.solvent = solvent[0]
+        self.solvent = standardize_formula(solvent[0])
 
         # TODO - do I need the ability to specify the solvent mass?
         # # raise an error if the solvent volume has also been given
@@ -415,7 +416,7 @@ def dielectric_constant(self) -> Quantity:
         denominator = 1
         for item in self.components:
             # ignore water
-            if item != "H2O":
+            if item != "H2O(aq)":
                 # skip over solutes that don't have parameters
                 # try:
                 fraction = self.get_amount(item, "fraction")
@@ -499,10 +500,6 @@ def viscosity_kinematic(self) -> Quantity:
         """
         # identify the main salt in the solution
         salt = self.get_salt()
-        # reverse-convert the sanitized formula back to whatever was in self.components
-        for i in self.components:
-            if Ion.from_formula(i).reduced_formula == salt.cation:
-                cation = i
 
         a0 = a1 = b0 = b1 = 0
 
@@ -532,7 +529,8 @@ def viscosity_kinematic(self) -> Quantity:
         MW_w = ureg.Quantity(self.get_property(self.solvent, "molecular_weight"))
 
         # calculate the cation mole fraction
-        x_cat = self.get_amount(cation, "fraction")
+        # x_cat = self.get_amount(cation, "fraction")
+        x_cat = self.get_amount(salt.cation, "fraction")
 
         # calculate the kinematic viscosity
         nu = math.log(nu_w * MW_w / MW) + 15 * x_cat**2 + x_cat**3 * G_123 + 3 * x_cat * G_23 * (1 - 0.05 * x_cat)
@@ -730,9 +728,7 @@ def alkalinity(self) -> Quantity:
         acid_anions = {"Cl[-1]", "Br[-1]", "I[-1]", "SO4[-2]", "NO3[-1]", "ClO4[-1]", "ClO3[-1]"}
 
         for item in self.components:
-            # sanitize the formulas
-            rform = Ion.from_formula(item).reduced_formula
-            if rform in base_cations.union(acid_anions):
+            if item in base_cations.union(acid_anions):
                 z = self.get_property(item, "charge")
                 alkalinity += self.get_amount(item, "mol/L") * z
 
@@ -780,7 +776,7 @@ def total_dissolved_solids(self) -> Quantity:
         tds = ureg.Quantity("0 mg/L")
         for s in self.components:
             # ignore pure water and dissolved gases, but not CO2
-            if s in ["H2O", "H+", "OH-", "H2", "O2"]:
+            if s in ["H2O(aq)", "H[+1]", "OH[-1]"]:
                 continue
             tds += self.get_amount(s, "mg/L")
 
@@ -1565,7 +1561,7 @@ def get_activity(
 
         """
         # switch to the water activity function if the species is H2O
-        if solute == "H2O" or solute == "water":
+        if solute in ["H2O(aq)", "water", "H2O", "HOH"]:
             activity = self.get_water_activity()
         else:
             # determine the concentration units to use based on the desired scale
@@ -1659,7 +1655,7 @@ def get_water_activity(self) -> Quantity:
 
         concentration_sum = 0
         for item in self.components:
-            if item == "H2O":
+            if item == "H2O(aq)":
                 pass
             else:
                 concentration_sum += self.get_amount(item, "mol/kg").magnitude
@@ -1755,8 +1751,8 @@ def _get_property(self, solute: str, name: str) -> Any | None:
         base_temperature = ureg.Quantity("25 degC")
         # base_pressure = ureg.Quantity("1 atm")
 
-        # query the database using the sanitized formula
-        rform = Ion.from_formula(solute).reduced_formula
+        # query the database using the standardized formula
+        rform = standardize_formula(solute)
         # TODO - there seems to be a bug in mongomock / JSONStore wherein properties does
         # not properly return dot-notation fields, e.g. size.molar_volume will not be returned.
         # also $exists:True does not properly return dot notated fields.
@@ -2125,7 +2121,7 @@ def __add__(self, other: Solution):
 
         # retrieve the amount of each component in the parent solution and
         # store in a list.
-        mix_species = {}
+        mix_species = FormulaDict({})
         for sol, amt in self.components.items():
             mix_species.update({sol: f"{amt} mol"})
         for sol2, amt2 in other.components.items():
@@ -2150,7 +2146,7 @@ def __add__(self, other: Solution):
 
         # create a new solution
         return Solution(
-            mix_species,
+            mix_species.data,  # pass a regular dict instead of the FormulaDict
             volume=str(mix_vol),
             pressure=str(mix_pressure),
             temperature=str(mix_temperature.to("K")),
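The changelog attributes a small mixing bug to unstandardized formulas. One way such a bug can arise (an assumption for illustration; the commit does not spell out the exact failure): if each solution keeps its own spellings, summing species by key counts `"Na+"` and `"Na[+1]"` as two different solutes, whereas standardizing every key first, as `__add__` now does by collecting into a `FormulaDict`, merges them. `std` and `mix` are hypothetical helpers, and `std` uses a tiny alias table in place of `standardize_formula`.

```python
# Hypothetical alias table standing in for pyEQL.utils.standardize_formula.
_ALIASES = {"H2O": "H2O(aq)", "Na+": "Na[+1]", "Na+1": "Na[+1]", "Cl-": "Cl[-1]"}


def std(formula: str) -> str:
    return _ALIASES.get(formula, formula)


def mix(a: dict[str, float], b: dict[str, float], standardize: bool = True) -> dict[str, float]:
    """Sum the moles of each species from two solutions (hypothetical helper)."""
    total: dict[str, float] = {}
    for solution in (a, b):
        for formula, moles in solution.items():
            key = std(formula) if standardize else formula
            total[key] = total.get(key, 0.0) + moles
    return total


sol1 = {"H2O": 55.5, "Na+": 0.1, "Cl-": 0.1}
sol2 = {"H2O(aq)": 55.5, "Na[+1]": 0.2, "Cl[-1]": 0.2}
print(sorted(mix(sol1, sol2, standardize=False)))  # six keys: each species under two spellings
print(sorted(mix(sol1, sol2)))  # ['Cl[-1]', 'H2O(aq)', 'Na[+1]']
```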

src/pyEQL/utils.py (+50)

@@ -0,0 +1,50 @@
+"""
+pyEQL utilities
+
+:copyright: 2023 by Ryan S. Kingsbury
+:license: LGPL, see LICENSE for more details.
+
+"""
+
+from collections import UserDict
+
+from pymatgen.core.ion import Ion
+
+
+def standardize_formula(formula: str):
+    """
+    Convert a chemical formula into standard form.
+
+    Args:
+        formula: the chemical formula to standardize.
+
+    Returns:
+        A standardized chemical formula
+
+    Raises:
+        ValueError if `formula` cannot be processed or is invalid.
+
+    Notes:
+        Currently this method standardizes formulae by passing them through pymatgen.core.ion.Ion.reduced_formula(). For ions, this means that 1) the
+        charge number will always be listed explicitly and 2) the charge number will be enclosed in square brackets to remove any ambiguity in the meaning of the formula. For example, 'Na+', 'Na+1', and 'Na[+]' will all
+        standardize to "Na[+1]"
+    """
+    return Ion.from_formula(formula).reduced_formula
+
+
+class FormulaDict(UserDict):
+    """
+    Automatically converts keys on get/set using pymatgen.core.Ion.from_formula(key).reduced_formula.
+
+    This allows getting/setting/updating of Solution.components using flexible
+    formula notation (e.g., "Na+", "Na+1", "Na[+]" all have the same effect)
+    """
+
+    def __getitem__(self, key):
+        return super().__getitem__(standardize_formula(key))
+
+    def __setitem__(self, key, value):
+        super().__setitem__(standardize_formula(key), value)
+
+    def __delitem__(self, key):
+        super().__delitem__(standardize_formula(key))
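`FormulaDict` standardizes keys in `__getitem__`, `__setitem__` and `__delitem__`, and the inherited constructor and `update()` route through `__setitem__`, so writes are covered. Membership tests are a gap, though: `collections.UserDict` defines `__contains__` as `key in self.data`, so `"Na+" in components` is `False` even when `"Na[+1]"` is stored, and depending on the Python version `UserDict.get` may consult that same check before indexing. The sketch below mirrors the three overrides from this commit and adds `__contains__` and `get` overrides as one possible remedy. `BaseFormulaDict`, `StrictFormulaDict` and `toy_standardize` (a tiny alias table standing in for pymatgen) are hypothetical names, and nothing here describes what later pyEQL versions do.

```python
from collections import UserDict

# Hypothetical alias table standing in for pyEQL.utils.standardize_formula.
_ALIASES = {"Na+": "Na[+1]", "Na+1": "Na[+1]", "Na[+]": "Na[+1]"}


def toy_standardize(formula: str) -> str:
    return _ALIASES.get(formula, formula)


class BaseFormulaDict(UserDict):
    """Mirrors the three overrides of FormulaDict in this commit."""

    def __getitem__(self, key):
        return super().__getitem__(toy_standardize(key))

    def __setitem__(self, key, value):
        super().__setitem__(toy_standardize(key), value)

    def __delitem__(self, key):
        super().__delitem__(toy_standardize(key))


class StrictFormulaDict(BaseFormulaDict):
    """Also standardizes membership tests and .get() (one possible remedy)."""

    def __contains__(self, key):
        return toy_standardize(key) in self.data

    def get(self, key, default=None):
        return self.data.get(toy_standardize(key), default)


plain = BaseFormulaDict({"Na+": 0.1})  # stored under "Na[+1]" via __setitem__
strict = StrictFormulaDict({"Na+": 0.1})
print("Na+" in plain, "Na+" in strict)  # False True
print(plain["Na+1"], strict.get("Na[+]"))  # 0.1 0.1
```

Overriding `get` explicitly also avoids depending on whichever `get` implementation the running Python version's `UserDict` provides.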
