Pandas DataFrame Reader, additional example. #971

Open
apiszcz opened this issue Mar 23, 2024 · 2 comments

Comments

apiszcz commented Mar 23, 2024

Is it possible to document an example for the Pandas DataFrame reader that reads files from a folder to which new data files are continually added?
https://stonesoup.readthedocs.io/en/v1.2/auto_examples/Custom_Pandas_Dataloader.html#dataframe-detection-reader

In the DataFrame Detection Reader section, can the instantiation be changed from dataframe=truth_df to a DataFrame generator, i.e. dataframe=dataframe_generator? Here dataframe_generator would read CSV files and return each one as a DataFrame object.


apiszcz commented Mar 23, 2024

I added an outer endless loop inside detections_gen and am testing it now.
The changes are a new datapath attribute and a new section that runs from while True down to the for row in line.
In this case the DataFrames are stored as .pkl files; CSV and other formats should also work, though more slowly.

import pathlib
import pickle
import time

import numpy as np
from filelock import FileLock
from stonesoup.base import Property
from stonesoup.buffered_generator import BufferedGenerator
from stonesoup.reader.base import DetectionReader
from stonesoup.types.detection import Detection

# _DataFrameReader is assumed to be defined as in the linked example.


class DataFrameDetectionReader2(DetectionReader, _DataFrameReader):
    """A custom detection reader for a folder of pickled DataFrames containing detections.

    Each DataFrame must have headers with the appropriate fields needed to generate
    the detection. Detections at the same time are yielded together, so each file is
    assumed to be in time order.
    """
    # dataframe: pd.DataFrame = Property(doc="DataFrame containing the detection data.")
    datapath: str = Property(doc="Path to the folder containing the pickled DataFrames.")

    @BufferedGenerator.generator_method
    def detections_gen(self):
        while True:
            # Note: every pass re-reads all matching files, including ones already processed.
            for ipath in sorted(str(p) for p in pathlib.Path(self.datapath).glob('*.pkl')):
                # Hold the companion .lock file so a writer cannot change the data mid-read.
                data_lock = FileLock(ipath.replace('.pkl', '.lock'))
                with data_lock:
                    with open(ipath, 'rb') as ipkl:
                        data = pickle.load(ipkl)
                    self.dataframe = data

                detections = set()
                previous_time = None

                for row in self.dataframe.to_dict(orient="records"):
                    # Named timestamp so the time module (used by time.sleep below) is not shadowed.
                    timestamp = self._get_time(row)
                    if previous_time is not None and previous_time != timestamp:
                        yield previous_time, detections
                        detections = set()
                    previous_time = timestamp

                    detections.add(Detection(
                        np.array([[row[col_name]] for col_name in self.state_vector_fields],
                                 dtype=np.float64),
                        timestamp=timestamp,
                        metadata=self._get_metadata(row)))

                # Yield remaining detections, unless the DataFrame was empty
                if previous_time is not None:
                    yield previous_time, detections
            time.sleep(1)
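
One thing to watch with the loop above is that each pass of while True globs the whole folder again, so files that were already read would be read a second time. A minimal sketch of one way to hand out each file only once is below. It uses only the standard library, and the names iter_new_files, poll_interval and max_idle_polls are hypothetical helpers for illustration, not part of Stone Soup.

```python
import pathlib
import time
from typing import Iterator, Optional, Set


def iter_new_files(folder: str, pattern: str = "*.pkl", poll_interval: float = 1.0,
                   max_idle_polls: Optional[int] = None) -> Iterator[pathlib.Path]:
    """Yield each file matching `pattern` in `folder` exactly once, in name order.

    Polls forever by default. `max_idle_polls` (hypothetical, mainly for testing)
    stops the generator after that many consecutive polls that find nothing new.
    """
    seen: Set[pathlib.Path] = set()  # paths already handed to the caller
    idle_polls = 0
    while max_idle_polls is None or idle_polls < max_idle_polls:
        # Only consider paths that have not been yielded before.
        new_paths = sorted(p for p in pathlib.Path(folder).glob(pattern) if p not in seen)
        if new_paths:
            idle_polls = 0
            for path in new_paths:
                seen.add(path)
                yield path
        else:
            # Nothing new yet: wait before looking again.
            idle_polls += 1
            time.sleep(poll_interval)
```

Under these assumptions, the outer while True and the glob in detections_gen could be replaced by a single for ipath in iter_new_files(self.datapath): loop, keeping the locking and row handling as they are. Note that this relies on new file names sorting after old ones if time order across files matters.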

sdhiscocks (Member) commented

More examples are always welcome. Data readers can be challenging because the format and structure of data vary widely, so bespoke code is often needed for a particular use case.
