✨ StreamParsing - Add streaming parsing capacity #12
Changes from 1 commit
@@ -0,0 +1,11 @@
```python
from typing import Any, Dict, List, Protocol


class OnValidRowCallback(Protocol):
    def __call__(self, index: int, parsed_row: Dict[str, Any], raw_data: Any) -> None:
        ...


class OnInvalidRowCallback(Protocol):
    def __call__(self, errors_info: List[Dict[str, Any]], raw_data: Any) -> None:
        ...
```
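Because these are `typing.Protocol` classes, any callable whose signature matches `__call__` satisfies them structurally, with no subclassing needed. Since `stream_parse` invokes the callbacks with keyword arguments (`index=`, `parsed_row=`, `raw_data=`, `errors_info=`), the parameter names have to match as well, not just their positions. A minimal sketch; `collect_valid` and `collect_invalid` are hypothetical names that simply accumulate rows into lists:

```python
from typing import Any, Dict, List, Protocol


class OnValidRowCallback(Protocol):
    def __call__(self, index: int, parsed_row: Dict[str, Any], raw_data: Any) -> None:
        ...


class OnInvalidRowCallback(Protocol):
    def __call__(self, errors_info: List[Dict[str, Any]], raw_data: Any) -> None:
        ...


valid_rows: List[Dict[str, Any]] = []
invalid_rows: List[List[Dict[str, Any]]] = []


# Hypothetical callbacks: plain functions match the protocols because their
# parameter names and types line up with each protocol's __call__.
def collect_valid(index: int, parsed_row: Dict[str, Any], raw_data: Any) -> None:
    valid_rows.append(parsed_row)


def collect_invalid(errors_info: List[Dict[str, Any]], raw_data: Any) -> None:
    invalid_rows.append(errors_info)


# A type checker accepts these assignments since Protocol matching is structural.
on_valid: OnValidRowCallback = collect_valid
on_invalid: OnInvalidRowCallback = collect_invalid

# Called with keyword arguments, the same way stream_parse calls them.
on_valid(index=1, parsed_row={"ean": "123"}, raw_data=["123"])
on_invalid(errors_info=[{"row-number": 2, "error": "bad"}], raw_data=["x"])
```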
@@ -1,6 +1,8 @@
```python
import codecs
from abc import ABC, abstractmethod
import csv

from magicparse import OnInvalidRowCallback, OnValidRowCallback
from .fields import Field
from io import BytesIO
from typing import Any, Dict, List, Tuple, Union, Iterable
```

@@ -75,6 +77,45 @@ def parse(self, data: Union[bytes, BytesIO]) -> Tuple[List[dict], List[dict]]:
```python
        return result, errors

    def stream_parse(
        self,
        data: Union[bytes, BytesIO],
        on_valid_parsed_row: OnValidRowCallback,
        on_invalid_parsed_row: OnInvalidRowCallback,
    ) -> None:
        if isinstance(data, bytes):
            stream = BytesIO(data)
        else:
            stream = data

        reader = self.get_reader(stream)

        row_number = 0
        if self.has_header:
            next(reader)
            row_number += 1

        for row in reader:
            errors = []
            row_is_valid = True
            item = {}
            for field in self.fields:
                try:
                    value = field.read_value(row)
                except Exception as exc:
                    errors.append({"row-number": row_number, **field.error(exc)})
                    row_is_valid = False
                    continue

                item[field.key] = value
```
The copy/paste is a bit meh x) At worst, couldn't we just use plain `parse` with no callbacks passed in?

Indeed, it would be better for `parse` to call `stream_parse`. But it would have to pass callbacks to build its return values.

There's another problem: the reader in `stream_parse` loads everything, and it pains me to degrade the capabilities of the normal mode (`parse`) just to avoid a copy/paste.
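The review thread above suggests having `parse` call `stream_parse`, with callbacks building the return values. A minimal, self-contained sketch of that refactor, assuming the real `parse` returns a flat list of parsed rows and a flat list of error dicts (as its `Tuple[List[dict], List[dict]]` annotation suggests). `Field` here is a hypothetical stand-in that reads one integer column, `get_reader` assumes UTF-8 input, and the loop in `stream_parse` is condensed from the diff (a non-empty `errors` list replaces the `row_is_valid` flag, with the same behavior):

```python
import csv
from io import BytesIO, TextIOWrapper
from typing import Any, Dict, Iterator, List, Tuple, Union


class Field:
    """Hypothetical stand-in for magicparse's Field: reads one column as an int."""

    def __init__(self, key: str, column: int) -> None:
        self.key = key
        self.column = column

    def read_value(self, row: List[str]) -> int:
        return int(row[self.column])

    def error(self, exc: Exception) -> Dict[str, Any]:
        return {"field-key": self.key, "error": str(exc)}


class Schema:
    def __init__(self, fields: List[Field], has_header: bool = False) -> None:
        self.fields = fields
        self.has_header = has_header

    def get_reader(self, stream: BytesIO) -> Iterator[List[str]]:
        # Assumption: UTF-8 CSV input; the real get_reader may decode differently.
        return csv.reader(TextIOWrapper(stream, encoding="utf-8", newline=""))

    def stream_parse(
        self, data: Union[bytes, BytesIO], on_valid_parsed_row, on_invalid_parsed_row
    ) -> None:
        # Condensed from the diff: same callbacks, same row numbering.
        stream = BytesIO(data) if isinstance(data, bytes) else data
        reader = self.get_reader(stream)
        row_number = 0
        if self.has_header:
            next(reader)
            row_number += 1
        for row in reader:
            errors: List[Dict[str, Any]] = []
            item: Dict[str, Any] = {}
            for field in self.fields:
                try:
                    item[field.key] = field.read_value(row)
                except Exception as exc:
                    errors.append({"row-number": row_number, **field.error(exc)})
            if errors:
                on_invalid_parsed_row(errors_info=errors, raw_data=row)
            else:
                on_valid_parsed_row(index=row_number, parsed_row=item, raw_data=row)
            row_number += 1

    def parse(self, data: Union[bytes, BytesIO]) -> Tuple[List[dict], List[dict]]:
        # parse() becomes a thin wrapper: two closures rebuild its return values.
        result: List[dict] = []
        errors: List[dict] = []

        def on_valid(index: int, parsed_row: Dict[str, Any], raw_data: Any) -> None:
            result.append(parsed_row)

        def on_invalid(errors_info: List[Dict[str, Any]], raw_data: Any) -> None:
            errors.extend(errors_info)

        self.stream_parse(data, on_valid, on_invalid)
        return result, errors
```

For example, `Schema([Field("qty", 1)], has_header=True).parse(b"ean,qty\n123,4\n456,x\n")` yields one parsed row (`{"qty": 4}`) and one error dict for row 2. This removes the duplication, but it does not address the other concern raised in the thread: how much is held in memory still depends on what `get_reader` does with the stream, and `parse` still materializes both lists by design.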
```python

            if row_is_valid:
                on_valid_parsed_row(index=row_number, parsed_row=item, raw_data=row)
            else:
                on_invalid_parsed_row(errors_info=errors, raw_data=row)

            row_number += 1


class CsvSchema(Schema):
    def __init__(self, options: Dict[str, Any]) -> None:
```
What exactly is the need here, please? @antoine-b-smartway @pewho @EwenBALOUIN

The goal is to aggregate several article-import files into a single one. We don't necessarily need the parsed data; we just want the EAN and the original raw line so we can put it back into the output file.

And why do you need that? (Sorry, I'm really just trying to understand 😄)

For example, to replay a store's X articles in order.

Why keep the raw version? Maybe I missed something, but if in the end our file is going to be "tagged/named" in a "specific" way, then we know we don't need to apply a parser to it.

That's the other option, but it means writing a dedicated parser for this one. I'm not sure that's what we want?

Well, that's exactly the point: no parser at all.
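For the use case described in this thread (merging several article-import files while keeping only each row's EAN and its original raw line), the `on_valid_parsed_row` callback is enough on its own: `parsed_row` supplies the EAN and `raw_data` the raw row. A minimal sketch; `stream_parse` below is a hypothetical stand-in with a single `ean` field read from column 0 and one header row, `aggregate_imports` is a hypothetical helper, and dropping invalid rows is an assumption:

```python
import csv
import io
from typing import Any, Dict, List, Tuple


def stream_parse(data: bytes, on_valid_parsed_row, on_invalid_parsed_row) -> None:
    """Hypothetical stand-in for Schema.stream_parse: one header row and a
    single "ean" field read from column 0 (digits only)."""
    reader = csv.reader(io.StringIO(data.decode("utf-8"), newline=""))
    next(reader)  # skip the header row
    for row_number, row in enumerate(reader, start=1):
        ean = row[0].strip() if row else ""
        if ean.isdigit():
            on_valid_parsed_row(index=row_number, parsed_row={"ean": ean}, raw_data=row)
        else:
            on_invalid_parsed_row(
                errors_info=[{"row-number": row_number, "error": "invalid ean"}],
                raw_data=row,
            )


def aggregate_imports(files: List[bytes]) -> Tuple[List[str], str]:
    """Merge several import files into one CSV body: valid rows are written
    back as raw columns, in input order, and their EANs are recorded."""
    eans: List[str] = []
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")

    def on_valid(index: int, parsed_row: Dict[str, Any], raw_data: List[str]) -> None:
        eans.append(parsed_row["ean"])  # only the EAN comes from the parsed data
        writer.writerow(raw_data)  # the raw columns are copied through unchanged

    def on_invalid(errors_info: List[Dict[str, Any]], raw_data: Any) -> None:
        pass  # assumption: invalid rows are dropped from the merged output

    for data in files:
        stream_parse(data, on_valid, on_invalid)
    return eans, out.getvalue()
```

One caveat visible in the diff: `raw_data` is the row as split by the CSV reader (a list of columns), not the original line of bytes, so writing it back with `csv.writer` may not reproduce the source file's quoting byte for byte.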