[pull] main from microsoft:main #11

Merged
merged 196 commits into from
Nov 19, 2024

Conversation

pull[bot]

@pull pull bot commented Jul 28, 2024

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

darthtrevino and others added 7 commits July 26, 2024 09:11
add user input to history tracking
* Update caching llm to use history inputs

* formatting

* linting

* update glean sections to have continuous history
* add encoding model to text-chunking config

* revert groupby fix, handled in other pr

* revert environment reader update for other pr
* Add encoding-model configuration to entity & claim extraction

* add change note

* pr updates

* test fix

* disable GH-based smoke tests
@pull pull bot added the ⤵️ pull label Jul 28, 2024
dependabot bot and others added 22 commits July 29, 2024 14:34
Bumps [actions/stale](https://github.com/actions/stale) from 5 to 9.
- [Release notes](https://github.com/actions/stale/releases)
- [Changelog](https://github.com/actions/stale/blob/main/CHANGELOG.md)
- [Commits](actions/stale@v5...v9)

---
updated-dependencies:
- dependency-name: actions/stale
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [openai](https://github.com/openai/openai-python) from 1.37.0 to 1.37.1.
- [Release notes](https://github.com/openai/openai-python/releases)
- [Changelog](https://github.com/openai/openai-python/blob/main/CHANGELOG.md)
- [Commits](openai/openai-python@v1.37.0...v1.37.1)

---
updated-dependencies:
- dependency-name: openai
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alonso Guevara <[email protected]>
* system -> assistant

* semver
Bumps [pytest](https://github.com/pytest-dev/pytest) from 8.3.1 to 8.3.2.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](pytest-dev/pytest@8.3.1...8.3.2)

---
updated-dependencies:
- dependency-name: pytest
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alonso Guevara <[email protected]>
Switch the logic to only look for awaiting_response label
* fixed default entity extraction prompts

* minor changes and formatting

* add missing parenthesis and changelog

* Updating dictionary

---------

Co-authored-by: Alonso Guevara <[email protected]>
Bumps [textual](https://github.com/Textualize/textual) from 0.72.0 to 0.74.0.
- [Release notes](https://github.com/Textualize/textual/releases)
- [Changelog](https://github.com/Textualize/textual/blob/main/CHANGELOG.md)
- [Commits](Textualize/textual@v0.72.0...v0.74.0)

---
updated-dependencies:
- dependency-name: textual
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alonso Guevara <[email protected]>
Bumps [lancedb](https://github.com/lancedb/lancedb) from 0.10.2 to 0.11.0.
- [Release notes](https://github.com/lancedb/lancedb/releases)
- [Changelog](https://github.com/lancedb/lancedb/blob/main/release_process.md)
- [Commits](lancedb/lancedb@python-v0.10.2...python-v0.11.0)

---
updated-dependencies:
- dependency-name: lancedb
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [poethepoet](https://github.com/nat-n/poethepoet) from 0.26.1 to 0.27.0.
- [Release notes](https://github.com/nat-n/poethepoet/releases)
- [Commits](nat-n/poethepoet@v0.26.1...v0.27.0)

---
updated-dependencies:
- dependency-name: poethepoet
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
#677)

* added default title_column and collection_name values for workflows using the vector store option

* update poetry lockfile

* fixed ruff formatting

* ran semversioner

---------

Co-authored-by: Gabriel Nieves-Ponce <[email protected]>
Co-authored-by: Alonso Guevara <[email protected]>
Co-authored-by: Josh Bradley <[email protected]>
* added default title_column and collection_name values for workflows using the vector store option

* incorporated vector database support to the query client

* Updated documentation to reflect the new query client param.

* Fixed ruff formatting

* added new poetry lock file

---------

Co-authored-by: Gabriel Nieves-Ponce <[email protected]>
Co-authored-by: Alonso Guevara <[email protected]>
fix and refactor community context builder

Co-authored-by: Alonso Guevara <[email protected]>
* Update prompts in prompt tune

* Update prompt tuning meta prompts

* Semver

* Formatting

* Update examples
* fixed json issue

* change to use try_parse_json_object only

* pyproject add json-repair

* add check extra description before and after json object

* json.loads() before repair_json, based on jbradley1's suggestion.

* Fix json parsing and formatting

* semver

* Nicer tuple parsing

---------

Co-authored-by: paulg <[email protected]>
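
The parsing order described in the commits above (plain `json.loads()` first, strip any extra description around the JSON object, repair only as a last resort) can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function name is borrowed from the commit message, but its signature and the `repair` hook are assumptions, and the hook stands in for whatever the `json-repair` dependency provides.

```python
import json
from typing import Any, Callable, Optional


def try_parse_json_object(
    text: str,
    repair: Optional[Callable[[str], str]] = None,  # hypothetical repair hook
) -> Optional[dict[str, Any]]:
    """Parse an LLM reply as a JSON object, repairing only when needed."""
    # 1. Fast path: well-formed replies never reach the repair step.
    try:
        parsed = json.loads(text)
        if isinstance(parsed, dict):
            return parsed
    except json.JSONDecodeError:
        pass

    # 2. Drop any extra description before and after the JSON object.
    start, end = text.find("{"), text.rfind("}")
    candidate = text[start : end + 1] if 0 <= start < end else text

    # 3. Retry on the trimmed text, then fall back to the repair hook.
    try:
        parsed = json.loads(candidate)
    except json.JSONDecodeError:
        if repair is None:
            return None
        try:
            parsed = json.loads(repair(candidate))
        except json.JSONDecodeError:
            return None
    return parsed if isinstance(parsed, dict) else None
```

Trying the strict parser first keeps valid responses untouched, so a repair step can only ever change replies that were already broken.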
* Fix embeddings loading on local search cli

* Update lockfile

* Update rules in ruff check
* fix json parsing logic and warning message

* amended warning message

---------

Co-authored-by: Alonso Guevara <[email protected]>
* Only repair broken responses

* Format
* add a check for empty context

* remove log and format code

* add changelog

---------

Co-authored-by: Alonso Guevara <[email protected]>
* Remove outdated references to entity resolution

* Clarify covariate extraction

* Minor edits from other PR feedback

* Remove duplicate line

* Semver

---------

Co-authored-by: Alonso Guevara <[email protected]>
* add smoke tests again

* add smoke tests separated action

* add patch version

* disable blob test

* blob conn again

* add file as cache type

* remove cache type entirely

* increase timeout

* remove comment

---------

Co-authored-by: Alonso Guevara <[email protected]>
* Run smoke tests on 4o

* Shorten dulce for smoke tests

* Update secrets for consistency
darthtrevino and others added 29 commits October 30, 2024 14:49
* move mkdocs-typer to devdeps

* add .gitattributes for toml parsing issues on Windows CI

* bump timeout

---------

Co-authored-by: Alonso Guevara <[email protected]>
* New workflow to generate embeddings in a single workflow

* New workflow to generate embeddings in a single workflow

* version change

* clean tests without any embeddings references

* clean tests without any embeddings references

* remove code

* feedback implemented

* changes in logic

* feedback implemented

* store in table bug fixed

* smoke test for generate_text_embeddings workflow

* smoke test fix

* add generate_text_embeddings to the list of transient workflows

* smoke tests

* fix

* ruff formatting updates

* fix

* smoke test fixed

* smoke test fixed

* fix lancedb import

* smoke test fix

* ignore sorting

* smoke test fixed

* smoke test fixed

* check smoke test

* smoke test fixed

* change config for vector store

* format fix

* vector store changes

* revert debug profile back to empty filepath

* merge conflict solved

* merge conflict solved

* format fixed

* format fixed

* fix return dataframe

* snapshot fix

* format fix

* embeddings param implemented

* validation fixes

* fix map

* fix map

* fix properties

* config updates

* smoke test fixed

* settings change

* Update collection config and rework back-compat

* Replace . with - for embedding store

---------

Co-authored-by: Alonso Guevara <[email protected]>
Co-authored-by: Josh Bradley <[email protected]>
Co-authored-by: Nathan Evans <[email protected]>
* Make base_entity_graph transient

* Add transient snapshots

* Semver

* Fix unit test

* Fix smoke tests
#1356)

* Renamed the variables inside the for-loop so they no longer shadow the original title variable used in the dataframe. This avoids corrupting the original column name defined in the title variable.

* Semver and format

---------

Co-authored-by: Gabriel Nieves-Ponce <[email protected]>
Co-authored-by: Alonso Guevara <[email protected]>
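
The kind of shadowing bug this commit describes can be illustrated with a small sketch. The code below is not taken from the PR: the function names and row layout are hypothetical, and plain dicts stand in for the dataframe so the example stays self-contained.

```python
def titles_by_id_buggy(rows, title="title"):
    """Buggy version: the loop rebinds `title`, losing the column name."""
    out = {}
    for row in rows:
        # After the first iteration `title` holds a row value, not the column name,
        # so the next lookup uses the wrong key.
        row_id, title = row["id"], row[title]
        out[row_id] = title
    return out


def titles_by_id(rows, title="title"):
    """Fixed version: distinct loop names leave the `title` parameter intact."""
    out = {}
    for row in rows:
        row_id, row_title = row["id"], row[title]
        out[row_id] = row_title
    return out
```

With two or more rows the buggy version fails on the second lookup, because the key it uses is now the first row's title rather than the column name.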
* Drift CLI and backwards compat

* Adding DRIFT Cli, Docs and example notebook

* Update tests and fix ruff

* Format

* Small cleanup

* Fix smoke tests

* Update notebook

* Oopsie fix

* Delete duplicate img
* Fix init defaults for vector store and img in drift docs

* Add more doc

* Spellcheck

* Remove example
* Release v0.4.0

* Missing change track
* Fix a file paths issue in the viz guide.

* fix formatting
* Fix optional covariates check in incremental indexing

* Oopsie fix
* Raise error on empty deltas for incremental indexing

* Format
* fix streaming output error

* add semversioner

---------

Co-authored-by: Alonso Guevara <[email protected]>
* Add update cli option with default storage

* Semver

* Semver

* Pyright

* Format
* Release v0.4.1

* Spellcheck
* update gitignore

* add dynamic community selection to updated main branch

* update SearchResult to record output_tokens.

* update search result

* dynamic search working

* format

* add llm_calls_categories and prompt_tokens and output_tokens cate

* update

* formatting

* log drift search output and prompt tokens separately

* update global_search.ipynb. update operate dulce dataset and add create_final_communities. update dynamic community selection init

* add .ipynb back to cspell.config.yaml

* format

* add notebook example on dynamic search

* rearrange

* update gitignore

* format code

* code format

* code format

* fix default variable

---------

Co-authored-by: Bryan Li <[email protected]>
* Add source documents for verb tests

* Remove entity_type erroneous column

* Add new test data

* Remove source/target degree columns

* Remove top_level_node_id

* Remove chunk column configs

* Rename "chunk" to "text"

* Rename "chunk" to "text" in base

* Re-map document input to use base text units

* Revert base text units as final documents dep

* Update test data

* Split/rename node source_id

* Drop node size (dup of degree)

* Drop document_ids from covariates

* Remove unused document_ids from models

* Remove n_tokens from covariate table

* Fix missed document_ids delete

* Wire base text units to final documents

* Rename relationship rank as combined_degree

* Add rank as first-class property to Relationship

* Remove split_text operation

* Fix relationships test parquet

* Update test parquets

* Add entity ids to community table

* Remove stored graph embedding columns

* Format

* Semver

* Fix JSON typo

* Spelling

* Rename lancedb

* Sort lancedb

* Fix unit test

* Fix test to account for changing period

* Update tests for separate embeddings

* Format

* Better assertion printing

* Fix unit test for windows

* Rename document.raw_content -> document.text

* Remove read_documents function

* Remove unused document summary from model

* Remove unused imports

* Format

* Add new snapshots to default init

* Use util to construct embeddings collection name

* Align inc index model with branch changes

* Update data and tests for int ids

* Clean up embedding locs

* Switch entity "name" to "title" for consistency

* Fix short_id -> human_readable_id defaults

* Format

* Rework community IDs

* Fix community size compute

* Fix unit tests

* Fix report read

* Pare down nodes table output

* Fix unit test

* Fix merge

* Fix community loading

* Format

* Fix community id report extraction

* Update tests

* Consistent short IDs and ordering

* Update ordering and tests

* Update incremental for new nodes model

* Guard document columns loc

* Match column ordering

* Fix document guard

* Update smoke tests

* Fill NA on community extract

* Logging for smoke test debug

* Add parquet schema details doc

* Fix community hierarchy guard

* Use better empty hierarchy guard

* Back-compat shims

* Semver

* Fix warning

* Format

* Remove default fallback

* Reuse key
* Move indexing prompts to root

* Move query prompts to root

* Export query prompts during init

* Extract general knowledge prompt

* Load query prompts from disk

* Semver

* Fix unit tests
Add Parquet as part of the default emitters when not present
…ift search documentation (#1383)

Updated the wording of the example scenario from "global search" to "drift search" to accurately reflect the topic. This improves clarity and ensures the documentation accurately describes its content.

Co-authored-by: Alonso Guevara <[email protected]>
* Fix footer contrast

* Fix broken links

* Remove a few unneeded examples

* Point python API example to the whole folder

* Convert schema bullets to tables
* First cut at config cleanup

* Reorder top nav

* Add query prompts to tuning page

* Remove dynamic notebook from nav

* Add more thorough yml config descriptions in docs

* Further clean out the config

* Semver

* Add new blog post

* Emphasize yaml

* Clarify output

* Fix unit test

* Fix bullet nesting
@akollegger akollegger merged commit d5e7b97 into graphrag:main Nov 19, 2024