
[Issue]: Error running pipeline!, if no data to return #1703

Open
lingfan opened this issue Feb 13, 2025 · 3 comments
Labels
triage Default label assignment, indicates new issue needs reviewed by a maintainer

Comments


lingfan commented Feb 13, 2025

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the issue

{
  "type": "error",
  "data": "Error running pipeline!",
  "details": null
}

Traceback (most recent call last):
  File "/opt/conda/lib/python3.11/site-packages/graphrag/index/run/run_workflows.py", line 166, in _run_workflows
    result = await run_workflow(
  File "/opt/conda/lib/python3.11/site-packages/graphrag/index/workflows/extract_graph.py", line 45, in run_workflow
    base_entity_nodes, base_relationship_edges = await extract_graph(
  File "/opt/conda/lib/python3.11/site-packages/graphrag/index/flows/extract_graph.py", line 33, in extract_graph
    entities, relationships = await extract_entities(
  File "/opt/conda/lib/python3.11/site-packages/graphrag/index/operations/extract_entities/extract_entities.py", line 137, in extract_entities
    relationships = _merge_relationships(relationship_dfs)
  File "/opt/conda/lib/python3.11/site-packages/graphrag/index/operations/extract_entities/extract_entities.py", line 178, in _merge_relationships
    .agg(
  File "/opt/conda/lib/python3.11/site-packages/pandas/core/groupby/generic.py", line 1432, in aggregate
    result = op.agg()
  File "/opt/conda/lib/python3.11/site-packages/pandas/core/apply.py", line 190, in agg
    return self.agg_dict_like()
  File "/opt/conda/lib/python3.11/site-packages/pandas/core/apply.py", line 423, in agg_dict_like
    return self.agg_or_apply_dict_like(op_name="agg")
  File "/opt/conda/lib/python3.11/site-packages/pandas/core/apply.py", line 1608, in agg_or_apply_dict_like
    result_index, result_data = self.compute_dict_like(
  File "/opt/conda/lib/python3.11/site-packages/pandas/core/apply.py", line 462, in compute_dict_like
    func = self.normalize_dictlike_arg(op_name, selected_obj, func)
  File "/opt/conda/lib/python3.11/site-packages/pandas/core/apply.py", line 663, in normalize_dictlike_arg
    raise KeyError(f"Column(s) {list(cols)} do not exist")
KeyError: "Column(s) ['description', 'source_id', 'weight'] do not exist"
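The traceback suggests the pipeline reached `_merge_relationships` with extraction output that contained no relationship rows, so the merged DataFrame lacks the columns pandas is asked to aggregate. The following is a minimal sketch of that failure mode plus a hypothetical guard, not graphrag's actual code: the `source`/`target` groupby keys and the aggregation functions are assumptions inferred from the traceback, and only the missing column names come from the KeyError message.

```python
import pandas as pd

# Simulate extraction batches that produced no relationships: the
# merged frame has groupby keys but none of the aggregation columns.
relationship_dfs = [pd.DataFrame(columns=["source", "target"])]
merged = pd.concat(relationship_dfs, ignore_index=True)

try:
    merged.groupby(["source", "target"], sort=False).agg(
        {"description": list, "source_id": list, "weight": "sum"}
    )
except KeyError as exc:
    # pandas reports the missing aggregation columns, matching the log above
    print(exc)

# A defensive guard (hypothetical) would short-circuit before
# aggregating when nothing was extracted:
REQUIRED = ["source", "target", "description", "source_id", "weight"]
if merged.empty or any(c not in merged.columns for c in REQUIRED):
    result = pd.DataFrame(columns=REQUIRED)
else:
    result = merged.groupby(["source", "target"], sort=False).agg(
        {"description": list, "source_id": list, "weight": "sum"}
    )
```

In practice an empty extraction result usually means the LLM's responses could not be parsed into entities/relationships at all (common with small local models), so the guard only masks the symptom; checking the raw extraction output in the cache is the more useful diagnostic step.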

Steps to reproduce

No response

GraphRAG Config Used

### This config file contains required core defaults that must be set, along with a handful of common optional settings.
### For a full list of available settings, see https://microsoft.github.io/graphrag/config/yaml/

### LLM settings ###
## There are a number of settings to tune the threading and token limits for LLM calls - check the docs.

encoding_model: cl100k_base # this needs to be matched to your model!

llm:
  api_key: lm-studio # set this in the generated .env file
  type: openai_chat # or azure_openai_chat
  #model: deepseek-r1:32b
  #max_tokens: 4000
  model_supports_json: false # recommended if this is available for your model.
  #model: deepseek-r1
  #api_base: http://192.168.2.131:11434/v1/
  # audience: "https://cognitiveservices.azure.com/.default"
  #model: deepseek-r1-distill-qwen-7b
  #api_base: http://192.168.2.131:1234/v1/
  model: Qwen/Qwen2.5-1.5B-Instruct
  api_base: http://192.168.2.131:30000/v1/
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>

parallelization:
  stagger: 0.3
  num_threads: 50

async_mode: threaded # or asyncio

embeddings:
  async_mode: threaded # or asyncio
  vector_store: 
    type: lancedb # one of [lancedb, azure_ai_search, cosmosdb]
    db_uri: 'output/lancedb'
    collection_name: default
    overwrite: true
  llm:
    api_key: lm-studio
    type: openai_embedding # or azure_openai_embedding
    #model: quentinz/bge-large-zh-v1.5
    #api_base: http://192.168.2.131:11434/api
    model: text-embedding-bge-m3
    api_base: http://192.168.2.131:1234/v1
    #model: BAAI/bge-m3
    #api_base: http://192.168.2.131:30000/v1/
    max_tokens: 1024
    # api_version: 2024-02-15-preview
    # audience: "https://cognitiveservices.azure.com/.default"
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>

### Input settings ###

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

chunks:
  size: 4096
  overlap: 100
  group_by_columns: [id]

### Storage settings ###
## If blob storage is specified in the following four sections,
## connection_string and container_name must be provided

cache:
  type: file # one of [blob, cosmosdb, file]
  base_dir: "cache"

reporting:
  type: file # or console, blob
  base_dir: "logs"

storage:
  type: file # one of [blob, cosmosdb, file]
  base_dir: "output"

## only turn this on if running `graphrag index` with custom settings
## we normally use `graphrag update` with the defaults
update_index_storage:
  # type: file # or blob
  # base_dir: "update_output"

### Workflow settings ###

skip_workflows: []

entity_extraction:
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1

summarize_descriptions:
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  enabled: false
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 4000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: true # if true, will generate node2vec embeddings for nodes

umap:
  enabled: true # if true, will generate UMAP embeddings for nodes (embed_graph must also be enabled)

snapshots:
  graphml: true
  embeddings: true
  transient: true

### Query settings ###
## The prompt locations are required here, but each search method has a number of optional knobs that can be tuned.
## See the config docs: https://microsoft.github.io/graphrag/config/yaml/#query

local_search:
  prompt: "prompts/local_search_system_prompt.txt"

global_search:
  map_prompt: "prompts/global_search_map_system_prompt.txt"
  reduce_prompt: "prompts/global_search_reduce_system_prompt.txt"
  knowledge_prompt: "prompts/global_search_knowledge_system_prompt.txt"

drift_search:
  prompt: "prompts/drift_search_system_prompt.txt"
  reduce_prompt: "prompts/drift_search_reduce_prompt.txt"

basic_search:
  prompt: "prompts/basic_search_system_prompt.txt"

Logs and screenshots

(two screenshots attached)

Additional Information

  • GraphRAG Version: 1.2.0
  • Operating System: Ubuntu
  • Python Version: 3.11
  • Related Issues:
@lingfan lingfan added the triage Default label assignment, indicates new issue needs reviewed by a maintainer label Feb 13, 2025
@geniuszxd

I got the same error

@shawn-maxiao

same error

@JasonWei1366

same error
