Python: Introducing vector and text search #9345

Merged
merged 14 commits into from
Nov 6, 2024

eavanvalkenburg (Member) commented Oct 21, 2024

Motivation and Context

This PR does the following things:

  • Introduces TextSearch abstractions, including an implementation for Bing
    • This consists of the TextSearch class, which handles the internals and implements three public search methods: 'search' returns a string, 'get_text_search_results' returns a TextSearchResult object, and 'get_search_results' returns an object native to the search service (e.g. BingWebPages for Bing).
    • It also has a 'create_{search_method}' method for each search method, which returns a KernelFunction based on that search method. The function can be adapted through its parameters, including custom names and descriptions for both the function and its parameters, which makes it easy to create a RAG pipeline.
  • Introduces VectorSearch abstractions, including an implementation for Azure AI Search
    • This consists of a VectorSearchBase class, which handles all the internals, and three public interfaces: vectorized_search (supply a vector), vectorizable_text_search (supply a string that gets vectorized downstream), and vector_text_search (supply a string). Each vector store record collection can pick and choose which ones it supports by importing one or more of them next to the VectorSearchBase class.
  • Introduces VectorStoreTextSearch as a way to leverage text search against vector stores
    • Since this builds on TextSearch, it is now the best way to create a powerful RAG setup with your own data model.
  • Adds all the related classes, samples and tests for the above.
  • Also reorders the data folder, which might cause some slight breaking changes for the few stores that have the new vector store model.
  • Adds additional IndexKinds and DistanceFunctions to stay in sync with dotnet.
  • Renames VolatileStore and VolatileCollection to InMemoryVectorStore and InMemoryVectorCollection.
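
To make the TextSearch shape described above more concrete, here is a minimal, self-contained sketch. It does not import semantic_kernel and does not mirror its real signatures: `TextSearchSketch`, `TextSearchResultSketch`, `InMemoryTextSearch`, `_query_service`, and the record fields are all hypothetical names chosen for illustration, the methods are kept synchronous for brevity, and a plain Python callable stands in for a KernelFunction.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class TextSearchResultSketch:
    """Hypothetical normalized result; the field names are illustrative."""

    name: str
    value: str
    link: str


class TextSearchSketch(ABC):
    """Three public search methods layered on one service-specific hook."""

    @abstractmethod
    def _query_service(self, query: str, top: int) -> list[dict[str, Any]]:
        """Return raw, service-native records (a stand-in for e.g. Bing pages)."""

    def get_search_results(self, query: str, top: int = 3) -> list[dict[str, Any]]:
        # Service-native objects, passed through untouched.
        return self._query_service(query, top)

    def get_text_search_results(self, query: str, top: int = 3) -> list[TextSearchResultSketch]:
        # Normalized objects that look the same for every search service.
        return [
            TextSearchResultSketch(name=r["name"], value=r["snippet"], link=r["url"])
            for r in self._query_service(query, top)
        ]

    def search(self, query: str, top: int = 3) -> list[str]:
        # Plain strings, ready to paste into a prompt.
        return [r["snippet"] for r in self._query_service(query, top)]

    def create_search(
        self,
        function_name: str = "search",
        description: str = "Search for text.",
        top: int = 3,
    ) -> Callable[[str], list[str]]:
        """Wrap `search` in a named, described callable (assumed stand-in for a KernelFunction)."""

        def _function(query: str) -> list[str]:
            return self.search(query, top=top)

        _function.__name__ = function_name
        _function.__doc__ = description
        return _function


class InMemoryTextSearch(TextSearchSketch):
    """Toy service: case-insensitive substring match over a fixed list of pages."""

    def __init__(self, pages: list[dict[str, Any]]) -> None:
        self._pages = pages

    def _query_service(self, query: str, top: int) -> list[dict[str, Any]]:
        needle = query.lower()
        return [p for p in self._pages if needle in p["snippet"].lower()][:top]


pages = [
    {"name": "Vectors", "snippet": "Vector search compares embeddings.", "url": "https://example.com/vectors"},
    {"name": "Text", "snippet": "Text search matches keywords.", "url": "https://example.com/text"},
]
searcher = InMemoryTextSearch(pages)
lookup_docs = searcher.create_search(function_name="lookup_docs", description="Look up documentation.", top=1)
print(lookup_docs.__name__, lookup_docs("search"))
```

The point of the sketch is the layering: one service-specific hook feeds all three return shapes, and the factory only fixes a name, a description, and default parameters around the chosen search method.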

Closes #6832 #6833
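
The pick-and-choose design for vector search described above can be sketched as a base class plus small mixins. This is an illustration under stated assumptions, not the code in this PR: the class names ending in `Sketch`, the `_inner_search` hook, `ToyCollection`, and the toy `embed` function are hypothetical, and cosine similarity is just one convenient scoring choice.

```python
import math
from abc import ABC, abstractmethod
from typing import Callable, Optional

Vector = list[float]
Scored = list[tuple[str, float]]


class VectorSearchBaseSketch(ABC):
    """Shared internals: every public search flavour funnels into one hook."""

    @abstractmethod
    def _inner_search(
        self,
        *,
        vector: Optional[Vector] = None,
        vectorizable_text: Optional[str] = None,
        top: int = 3,
    ) -> Scored:
        ...


class VectorizedSearchSketch:
    """Mixin: the caller supplies a ready-made vector."""

    def vectorized_search(self, vector: Vector, top: int = 3) -> Scored:
        return self._inner_search(vector=vector, top=top)


class VectorizableTextSearchSketch:
    """Mixin: the caller supplies text that gets vectorized downstream."""

    def vectorizable_text_search(self, vectorizable_text: str, top: int = 3) -> Scored:
        return self._inner_search(vectorizable_text=vectorizable_text, top=top)


def cosine_similarity(a: Vector, b: Vector) -> float:
    # Assumes non-zero vectors of equal length.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


class ToyCollection(VectorSearchBaseSketch, VectorizedSearchSketch, VectorizableTextSearchSketch):
    """A collection that opts into two of the search interfaces."""

    def __init__(self, records: dict[str, Vector], embed: Callable[[str], Vector]) -> None:
        self._records = records
        self._embed = embed  # stand-in for vectorization done by the service

    def _inner_search(self, *, vector=None, vectorizable_text=None, top=3) -> Scored:
        if vector is None:
            if vectorizable_text is None:
                raise ValueError("Supply either a vector or vectorizable text.")
            vector = self._embed(vectorizable_text)
        scored = [(key, cosine_similarity(vector, stored)) for key, stored in self._records.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


collection = ToyCollection(
    records={"cat": [1.0, 0.0], "dog": [0.0, 1.0], "kitten": [0.9, 0.1]},
    embed=lambda text: [0.0, 1.0] if "dog" in text else [1.0, 0.0],
)
print([key for key, _ in collection.vectorized_search([1.0, 0.0], top=2)])
print([key for key, _ in collection.vectorizable_text_search("a dog", top=1)])
```

Because `ToyCollection` does not inherit a third mixin for plain text search, it simply has no `vector_text_search` method, which is how a store would signal that it does not support that interface.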

Contribution Checklist

@eavanvalkenburg eavanvalkenburg requested a review from a team as a code owner October 21, 2024 13:58
@markwallace-microsoft markwallace-microsoft added the python (Pull requests for the Python Semantic Kernel) and memory labels Oct 21, 2024
@eavanvalkenburg eavanvalkenburg marked this pull request as draft October 21, 2024 13:58
markwallace-microsoft (Member) commented Oct 22, 2024

Python Test Coverage

Python Test Coverage Report
File | Stmts | Miss | Cover | Missing
semantic_kernel
   kernel.py | 199 | 47 | 76% | 148, 159, 163, 313–316, 423, 437–480
semantic_kernel/agents/group_chat
   agent_chat.py | 124 | 2 | 98% | 78, 171
   agent_group_chat.py | 100 | 2 | 98% | 151, 201
   broadcast_queue.py | 72 | 1 | 99% | 35
semantic_kernel/agents/open_ai
   assistant_content_generation.py | 141 | 9 | 94% | 97–98, 329–337, 379, 381
   azure_assistant_agent.py | 106 | 2 | 98% | 285, 305
   open_ai_assistant_agent.py | 105 | 2 | 98% | 252, 272
   open_ai_assistant_base.py | 467 | 8 | 98% | 260, 338–339, 747, 868, 871, 945, 1007
semantic_kernel/connectors/ai
   chat_completion_client_base.py | 116 | 2 | 98% | 382, 392
   completion_usage.py | 8 | 1 | 88% | 17
semantic_kernel/connectors/ai/anthropic/services
   anthropic_chat_completion.py | 176 | 5 | 97% | 147, 165, 169, 223, 419
semantic_kernel/connectors/ai/azure_ai_inference/services
   azure_ai_inference_chat_completion.py | 119 | 7 | 94% | 120, 146–149, 159, 180, 202
   azure_ai_inference_text_embedding.py | 41 | 1 | 98% | 87
semantic_kernel/connectors/ai/bedrock/services
   bedrock_chat_completion.py | 136 | 14 | 90% | 117, 138, 163, 167–170, 228, 246–265, 324
   bedrock_text_completion.py | 57 | 2 | 96% | 95, 118
   bedrock_text_embedding.py | 45 | 1 | 98% | 94
semantic_kernel/connectors/ai/bedrock/services/model_provider
   bedrock_ai21_labs.py | 13 | 1 | 92% | 67
   bedrock_anthropic_claude.py | 12 | 1 | 92% | 54
   bedrock_cohere.py | 20 | 1 | 95% | 75
   utils.py | 80 | 20 | 75% | 68, 71, 102, 106–115, 132–150, 171–174
semantic_kernel/connectors/ai/embeddings
   embedding_generator_base.py | 8 | 1 | 88% | 50
semantic_kernel/connectors/ai/google/google_ai/services
   google_ai_chat_completion.py | 119 | 4 | 97% | 126, 152, 175, 177
   google_ai_text_completion.py | 63 | 2 | 97% | 98, 121
   utils.py | 65 | 3 | 95% | 139, 159–164
semantic_kernel/connectors/ai/google/vertex_ai/services
   utils.py | 66 | 3 | 95% | 140, 160–165
   vertex_ai_chat_completion.py | 119 | 4 | 97% | 121, 147, 170, 172
   vertex_ai_text_completion.py | 62 | 2 | 97% | 95, 116
semantic_kernel/connectors/ai/hugging_face/services
   hf_text_completion.py | 60 | 3 | 95% | 103, 112, 127
   hf_text_embedding.py | 32 | 5 | 84% | 79–83
semantic_kernel/connectors/ai/mistral_ai/services
   mistral_ai_chat_completion.py | 118 | 7 | 94% | 118–121, 307–310
semantic_kernel/connectors/ai/ollama/services
   ollama_chat_completion.py | 107 | 11 | 90% | 114, 139, 143–144, 154, 187, 224, 234–235, 257, 284
   ollama_text_completion.py | 57 | 3 | 95% | 93, 103, 131
   utils.py | 46 | 25 | 46% | 29, 44–52, 64–86, 98–102, 119–122
semantic_kernel/connectors/ai/onnx
   utils.py | 53 | 3 | 94% | 50–51, 112
semantic_kernel/connectors/ai/onnx/services
   onnx_gen_ai_chat_completion.py | 72 | 7 | 90% | 67–68, 98, 122, 167, 173, 179
   onnx_gen_ai_completion_base.py | 58 | 21 | 64% | 59–71, 79–90
   onnx_gen_ai_text_completion.py | 46 | 5 | 89% | 54–55, 87, 117, 133
semantic_kernel/connectors/ai/open_ai/prompt_execution_settings
   open_ai_prompt_execution_settings.py | 95 | 1 | 99% | 113
semantic_kernel/connectors/ai/open_ai/services
   azure_chat_completion.py | 107 | 5 | 95% | 118, 123, 157, 166, 169
   azure_text_completion.py | 28 | 2 | 93% | 82, 87
   azure_text_embedding.py | 30 | 2 | 93% | 84, 89
   open_ai_chat_completion_base.py | 127 | 5 | 96% | 71, 121, 141, 177, 287
   open_ai_handler.py | 65 | 3 | 95% | 88, 97–98
   open_ai_text_completion_base.py | 80 | 2 | 98% | 56, 161
semantic_kernel/connectors/ai/open_ai/settings
   azure_open_ai_settings.py | 22 | 1 | 95% | 99
semantic_kernel/connectors/memory/azure_ai_search
   azure_ai_search_collection.py | 130 | 30 | 77% | 166, 168, 241–279, 289–299, 307, 336, 340
semantic_kernel/connectors/memory/redis
   redis_collection.py | 159 | 2 | 99% | 146, 316
   utils.py | 45 | 11 | 76% | 145–146, 164, 166, 173–188
semantic_kernel/connectors/memory/weaviate
   utils.py | 60 | 3 | 95% | 81, 85, 252
   weaviate_collection.py | 115 | 25 | 78% | 144–153, 157–177, 181–186
   weaviate_store.py | 44 | 9 | 80% | 106–114, 118–123
semantic_kernel/connectors/openapi_plugin
   openapi_manager.py | 58 | 2 | 97% | 110–111
   openapi_parser.py | 88 | 2 | 98% | 71, 128
   openapi_runner.py | 105 | 2 | 98% | 181–182
semantic_kernel/connectors/openapi_plugin/models
   rest_api_operation.py | 129 | 1 | 99% | 242
semantic_kernel/contents
   function_call_content.py | 100 | 2 | 98% | 185, 213
   streaming_chat_message_content.py | 68 | 1 | 99% | 210
   streaming_content_mixin.py | 39 | 2 | 95% | 37, 64
semantic_kernel/core_plugins/sessions_python_tool
   sessions_python_plugin.py | 134 | 8 | 94% | 69, 82–91, 99
   sessions_python_settings.py | 39 | 4 | 90% | 84–87
semantic_kernel/data
   search_filter.py | 25 | 1 | 96% | 7
semantic_kernel/data/record_definition
   vector_store_record_utils.py | 28 | 2 | 93% | 55, 57
semantic_kernel/data/text_search
   text_search.py | 72 | 4 | 94% | 125, 165, 205, 293
   utils.py | 32 | 8 | 75% | 23, 54–62, 69–70
   vector_store_text_search.py | 76 | 17 | 78% | 167–174, 180–187, 192
semantic_kernel/data/vector_search
   vector_search.py | 24 | 5 | 79% | 105–110
   vector_search_filter.py | 20 | 1 | 95% | 6
   vector_text_search.py | 16 | 1 | 94% | 45
   vectorizable_text_search.py | 15 | 1 | 93% | 50
   vectorized_search.py | 15 | 1 | 93% | 45
semantic_kernel/data/vector_storage
   vector_store_record_collection.py | 248 | 19 | 92% | 403, 463–467, 475–479, 519–522, 529–532
semantic_kernel/functions
   kernel_function_decorator.py | 98 | 1 | 99% | 102
   kernel_function_from_method.py | 96 | 1 | 99% | 153
   kernel_function_from_prompt.py | 154 | 7 | 95% | 165–166, 180, 201, 219, 239, 322
   kernel_function_log_messages.py | 36 | 6 | 83% | 37–43
   kernel_plugin.py | 199 | 5 | 97% | 468, 471, 500, 521, 546
semantic_kernel/planners
   plan.py | 234 | 45 | 81% | 54, 163–165, 197, 214–227, 264, 269, 277–278, 288–291, 308, 313, 329, 332–337, 355, 360, 363, 365, 372, 386–388, 393–397
semantic_kernel/planners/function_calling_stepwise_planner
   function_calling_stepwise_planner.py | 116 | 4 | 97% | 145, 189–190, 198
semantic_kernel/planners/sequential_planner
   sequential_planner.py | 64 | 6 | 91% | 71, 75, 109, 125, 134–135
   sequential_planner_extensions.py | 50 | 9 | 82% | 31–32, 56, 110–124
   sequential_planner_parser.py | 77 | 12 | 84% | 66–74, 93, 117–120
semantic_kernel/processes
   process_builder.py | 68 | 39 | 43% | 43–52, 56–58, 64–74, 78–85, 89–92, 96–100, 105, 109–114
   process_end_step.py | 19 | 2 | 89% | 37, 41
   process_function_target_builder.py | 25 | 3 | 88% | 37–40
   process_step_builder.py | 105 | 24 | 77% | 44, 89, 103, 110–123, 135–142, 151, 160–169, 178, 192, 209
   process_step_edge_builder.py | 34 | 3 | 91% | 31, 46, 56
   process_types.py | 25 | 1 | 96% | 35
semantic_kernel/processes/kernel_process
   kernel_process_step_context.py | 17 | 1 | 94% | 37
semantic_kernel/processes/local_runtime
   local_kernel_process.py | 20 | 2 | 90% | 23, 30
   local_kernel_process_context.py | 32 | 2 | 94% | 66–67
   local_process.py | 134 | 52 | 61% | 92, 102, 120–130, 163–190, 194–199, 203, 207–213, 217–227, 231–232
   local_step.py | 178 | 114 | 36% | 61, 72, 81–169, 173, 177, 181–182, 187–249, 253–270, 274–277, 281–284, 288–297, 303–306, 310–312
semantic_kernel/prompt_template
   kernel_prompt_template.py | 78 | 7 | 91% | 144–151
semantic_kernel/schema
   kernel_json_schema_builder.py | 129 | 9 | 93% | 53, 90, 186, 194, 205, 213, 228, 232–233
semantic_kernel/search
   const.py | 5 | 5 | 0% | 3–11
semantic_kernel/services
   ai_service_client_base.py | 22 | 1 | 95% | 64
semantic_kernel/template_engine/blocks
   code_block.py | 77 | 1 | 99% | 119
   named_arg_block.py | 43 | 1 | 98% | 98
semantic_kernel/utils/authentication
   entra_id_authentication.py | 15 | 2 | 87% | 26, 38
semantic_kernel/utils/telemetry
   user_agent.py | 16 | 2 | 88% | 18–19
semantic_kernel/utils/telemetry/model_diagnostics
   decorators.py | 171 | 4 | 98% | 372–375
TOTAL | 13943 | 794 | 94% |
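
As a quick sanity check on how the Cover column relates to Stmts and Miss, the sketch below recomputes it for three rows of the report, reading the kernel.py row as 199 statements with 47 missed, const.py as 5 with 5 missed, and TOTAL as 13943 with 794 missed. The helper name `cover_percent` is hypothetical, and rounding to the nearest whole percent is an assumption about how the report tool formats the value.

```python
def cover_percent(stmts: int, miss: int) -> int:
    """Percentage of statements executed, rounded to a whole number (assumed rounding)."""
    if stmts == 0:
        return 100  # nothing to cover
    return round(100 * (stmts - miss) / stmts)


# (Stmts, Miss) pairs read from the report above.
rows = {
    "kernel.py": (199, 47),
    "const.py": (5, 5),
    "TOTAL": (13943, 794),
}

for name, (stmts, miss) in rows.items():
    print(f"{name}: {cover_percent(stmts, miss)}%")
```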

Python Unit Test Overview

Tests Skipped Failures Errors Time
2796 4 💤 0 ❌ 0 🔥 1m 23s ⏱️

@eavanvalkenburg eavanvalkenburg changed the title Python: vector and text search Python: Introducing vector and text search Nov 1, 2024
@eavanvalkenburg eavanvalkenburg marked this pull request as ready for review November 1, 2024 13:08
TaoChenOSU (Contributor) commented

A general question:
This PR introduces a good number of breaking changes, though some of them are experimental features. How are we going to announce these changes after we make the next release when this is merged?

moonbox3 (Contributor) commented Nov 1, 2024

Is there a backlog issue or two we can attach to this PR?

moonbox3 (Contributor) left a comment

Exciting to get this in soon. Great work!

@eavanvalkenburg eavanvalkenburg added this pull request to the merge queue Nov 4, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 4, 2024
@eavanvalkenburg eavanvalkenburg added this pull request to the merge queue Nov 4, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 4, 2024
@eavanvalkenburg eavanvalkenburg added this pull request to the merge queue Nov 4, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 4, 2024
@eavanvalkenburg eavanvalkenburg added this pull request to the merge queue Nov 4, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 4, 2024
@moonbox3 moonbox3 added this pull request to the merge queue Nov 4, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 4, 2024
@moonbox3 moonbox3 added this pull request to the merge queue Nov 5, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 5, 2024
@moonbox3 moonbox3 added this pull request to the merge queue Nov 5, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 5, 2024
@moonbox3 moonbox3 added this pull request to the merge queue Nov 5, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 5, 2024
@eavanvalkenburg eavanvalkenburg added this pull request to the merge queue Nov 5, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 5, 2024
@eavanvalkenburg eavanvalkenburg added this pull request to the merge queue Nov 5, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 5, 2024
@moonbox3 moonbox3 added this pull request to the merge queue Nov 5, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 5, 2024
@eavanvalkenburg eavanvalkenburg added this pull request to the merge queue Nov 6, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 6, 2024
@eavanvalkenburg eavanvalkenburg added this pull request to the merge queue Nov 6, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 6, 2024
@eavanvalkenburg eavanvalkenburg added this pull request to the merge queue Nov 6, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 6, 2024
@eavanvalkenburg eavanvalkenburg added this pull request to the merge queue Nov 6, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 6, 2024
@eavanvalkenburg eavanvalkenburg added this pull request to the merge queue Nov 6, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 6, 2024
@eavanvalkenburg eavanvalkenburg added this pull request to the merge queue Nov 6, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 6, 2024
@eavanvalkenburg eavanvalkenburg added this pull request to the merge queue Nov 6, 2024
Merged via the queue into main with commit c8b4094 Nov 6, 2024
28 checks passed
Labels
documentation, memory, python (Pull requests for the Python Semantic Kernel)
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Python: Update Azure AI Search Memory Connector with New Text Search Design
5 participants