[Bug]: Error generating community report: TypeError: Object of type ModelMetaclass is not JSON serializable #1715

Open
ren-sheng opened this issue Feb 16, 2025 · 9 comments
Labels
bug Something isn't working triage Default label assignment, indicates new issue needs reviewed by a maintainer

Comments

@ren-sheng

### Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

### Describe the bug

When I run indexing, the community report generation step consistently fails with the following errors:

json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
fnllm.base.services.errors.FailedToGenerateValidJsonError: JSON response is not a valid JSON
fnllm.base.services.errors.FailedToGenerateValidJsonError
TypeError: Object of type ModelMetaclass is not JSON serializable

One of the error messages includes the JSON content returned by the model, as follows:

fnllm.base.services.errors.FailedToGenerateValidJsonError: JSON response is not a valid JSON, response=```json
{
"title": "Ahvaz Academic Community",
"summary": "The community centers around Ahvaz, a prominent city in Iran known for its academic institutions, including Ahvaz Jundishapur University of Medical Sciences and Shahid Chamran University. These institutions are linked to the city and to researchers like Seyyed Maysam Mousavi Shoar, who is affiliated with Shahid Chamran University. The community highlights Ahvaz's role as an educational hub in Iran.",
"rating": 4.5,
"rating_explanation": "The impact severity rating is moderate due to the community's influence in Iran's higher education landscape and its potential to shape academic and research outcomes.",
"findings": [
{
"summary": "Ahvaz as an educational hub",
"explanation": "Ahvaz is a key city in Iran, known for hosting significant academic institutions such as Ahvaz Jundishapur University of Medical Sciences and Shahid Chamran University. These institutions contribute to the city's reputation as a center for higher education and research. The presence of these universities underscores Ahvaz's importance in Iran's academic landscape and its potential to influence educational and research outcomes. [Data: Entities (706)]"
},
{
"summary": "Ahvaz Jundishapur University of Medical Sciences",
"explanation": "Ahvaz Jundishapur University of Medical Sciences is a prominent institution located in Ahvaz, focusing on medical education and research. Its presence in the city highlights Ahvaz's role in advancing medical knowledge and training healthcare professionals. The university's activities could have significant implications for public health and medical research in Iran. [Data: Relationships (845)]"
},
{
"summary": "Shahid Chamran University's academic contributions",
"explanation": "Shahid Chamran University is another key academic institution in Ahvaz, offering a broad range of academic disciplines. Its affiliation with researchers like Seyyed Maysam Mousavi Shoar demonstrates its role in fostering academic and scientific research. The university's contributions to various fields of study could have a lasting impact on Iran's academic and research communities. [Data: Entities (802), Relationships (937)]"
},
{
"summary": "Seyyed Maysam Mousavi Shoar's research affiliation",
"explanation": "Seyyed Maysam Mousavi Shoar is a researcher affiliated with Shahid Chamran University, focusing on basic sciences. His work contributes to the university's academic output and highlights the institution's commitment to advancing scientific knowledge. Researchers like Shoar play a crucial role in shaping the academic and research landscape in Ahvaz and beyond. [Data: Entities (807), Relationships (937)]"
}
]
}


The above exception was the direct cause of the following exception:
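The underlying parse failure can be reproduced in isolation, outside GraphRAG: `json.loads` rejects a response wrapped in a Markdown code fence because the leading backtick is not a valid start of a JSON document, which is exactly the "line 1 column 1 (char 0)" error above. A minimal sketch (the fence string is built programmatically here only to keep this snippet readable):

```python
import json

FENCE = "`" * 3  # a literal triple backtick

# A model response wrapped in a Markdown code fence, like the one above
fenced = FENCE + 'json\n{"title": "Ahvaz Academic Community"}\n' + FENCE

try:
    json.loads(fenced)
except json.JSONDecodeError as e:
    print(e)  # Expecting value: line 1 column 1 (char 0)

# Stripping the fence first lets the same payload parse cleanly
inner = fenced.removeprefix(FENCE + "json").removesuffix(FENCE).strip()
print(json.loads(inner)["title"])  # Ahvaz Academic Community
```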


### Steps to reproduce

_No response_

### Expected Behavior

_No response_

### GraphRAG Config Used

```yaml
# Paste your config here
### This config file contains required core defaults that must be set, along with a handful of common optional settings.
### For a full list of available settings, see https://microsoft.github.io/graphrag/config/yaml/

### LLM settings ###
## There are a number of settings to tune the threading and token limits for LLM calls - check the docs.

models:
  default_chat_model:
    type: openai_chat # or azure_openai_chat
    api_base: https://api.agicto.cn/v1
    # api_version: 2024-05-01-preview
    auth_type: api_key # or azure_managed_identity
    api_key: ${GRAPHRAG_API_KEY} # set this in the generated .env file
    # audience: "https://cognitiveservices.azure.com/.default"
    # organization: <organization_id>
    # model: gpt-4-turbo-preview
    model: llama3-70b-8192
    # deployment_name: <azure_model_deployment_name>
    encoding_model: cl100k_base # automatically set by tiktoken if left undefined
    model_supports_json: true # recommended if this is available for your model.
    concurrent_requests: 25 # max number of simultaneous LLM requests allowed
    async_mode: threaded # or asyncio
    retry_strategy: native
    max_retries: -1                   # set to -1 for dynamic retry logic (most optimal setting based on server response)
    tokens_per_minute: 0              # set to 0 to disable rate limiting
    requests_per_minute: 0            # set to 0 to disable rate limiting
  default_embedding_model:
    type: openai_embedding # or azure_openai_embedding
    api_base: https://api.agicto.cn/v1
    # api_version: 2024-05-01-preview
    auth_type: api_key # or azure_managed_identity
    api_key: ${GRAPHRAG_API_KEY}
    # audience: "https://cognitiveservices.azure.com/.default"
    # organization: <organization_id>
    model: text-embedding-3-small
    # deployment_name: <azure_model_deployment_name>
    # encoding_model: cl100k_base # automatically set by tiktoken if left undefined
    model_supports_json: true # recommended if this is available for your model.
    concurrent_requests: 25 # max number of simultaneous LLM requests allowed
    async_mode: threaded # or asyncio
    retry_strategy: native
    max_retries: -1                   # set to -1 for dynamic retry logic (most optimal setting based on server response)
    tokens_per_minute: 0              # set to 0 to disable rate limiting
    requests_per_minute: 0            # set to 0 to disable rate limiting

vector_store:
  default_vector_store:
    type: lancedb
    db_uri: output\lancedb
    container_name: default
    overwrite: True

embed_text:
  model_id: default_embedding_model
  vector_store_id: default_vector_store

### Input settings ###

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$$"

chunks:
  size: 8000
  overlap: 0
  group_by_columns: [id]

### Output settings ###
## If blob storage is specified in the following four sections,
## connection_string and container_name must be provided

cache:
  type: file # [file, blob, cosmosdb]
  base_dir: "cache"

reporting:
  type: file # [file, blob, cosmosdb]
  base_dir: "logs"

output:
  type: file # [file, blob, cosmosdb]
  base_dir: "output"

### Workflow settings ###

extract_graph:
  model_id: default_chat_model
  prompt: "prompts/extract_graph.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1

summarize_descriptions:
  model_id: default_chat_model
  # prompt: "prompts_new/summarize_descriptions.txt"
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

extract_graph_nlp:
  text_analyzer:
    extractor_type: regex_english # [regex_english, syntactic_parser, cfg]

extract_claims:
  enabled: false
  model_id: default_chat_model
  prompt: "prompts/extract_claims.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  model_id: default_chat_model
  graph_prompt: "prompts/community_report_graph.txt"
  # text_prompt: "prompts_new2/community_report.txt"
  text_prompt: "prompts/community_report_text.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes (embed_graph must also be enabled)

snapshots:
  graphml: false
  embeddings: false

### Query settings ###
## The prompt locations are required here, but each search method has a number of optional knobs that can be tuned.
## See the config docs: https://microsoft.github.io/graphrag/config/yaml/#query

local_search:
  prompt: "prompts/local_search_system_prompt.txt"

global_search:
  map_prompt: "prompts/global_search_map_system_prompt.txt"
  reduce_prompt: "prompts/global_search_reduce_system_prompt.txt"
  knowledge_prompt: "prompts/global_search_knowledge_system_prompt.txt"

drift_search:
  prompt: "prompts/drift_search_system_prompt.txt"
  reduce_prompt: "prompts/drift_search_reduce_prompt.txt"

basic_search:
  prompt: "prompts/basic_search_system_prompt.txt"

```

### Logs and screenshots

Image

### Additional Information

  • GraphRAG Version: 1.20
  • Operating System: Windows 11
  • Python Version: 3.12.8
  • Related Issues:
@ren-sheng added the bug and triage labels on Feb 16, 2025
@hjh19950927

Has this been solved yet? I'm running into the same problem.

@ren-sheng (Author)

> Has this been solved yet? I'm running into the same problem.

Not fixed yet, but I found the cause: the community summary returned by the model is wrapped in Markdown formatting, so it can't be parsed as JSON. No matter how I change the prompt, I can't get the model to output an answer without Markdown formatting. I'm considering adding a format-cleanup step to the source code.

@hjh19950927

You could try a different model. I switched to Alibaba's qwen-max and the problem went away.

@rongtianjie

Same for me.
Both deepseek-ai/DeepSeek-V3 and Qwen/Qwen2.5-32B-Instruct return the same error.

@luneice

luneice commented Feb 21, 2025

The ds-v3 model has this problem. You can modify the code at around line 92 of graphrag\index\operations\summarize_communities\community_reports_extractor.py:

```python
def _parse_json_response(self, res):
    """Try to extract JSON from a Markdown code block; if that fails, treat the input as plain JSON."""
    try:
        # Try to extract JSON from a Markdown code block
        start_index = res.find("```json")
        end_index = res.find("```", start_index + 7)
        if start_index != -1 and end_index != -1:
            json_str = res[start_index + 7:end_index].strip()
            return json.loads(json_str)
        # Otherwise treat the response as plain JSON
        return json.loads(res)
    except json.JSONDecodeError:
        print("Unable to parse JSON response")
        return {}
```
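As a quick sanity check, the fallback logic above can be exercised as a standalone function (renamed and with `self` dropped purely for illustration; the fence string is built programmatically so this snippet stays self-contained):

```python
import json

FENCE = "`" * 3  # a literal triple backtick

def parse_json_response(res):
    """Standalone sketch of the workaround above, for testing outside the class."""
    try:
        # Look for a ```json ... ``` wrapper first
        start_index = res.find(FENCE + "json")
        end_index = res.find(FENCE, start_index + 7)
        if start_index != -1 and end_index != -1:
            return json.loads(res[start_index + 7:end_index].strip())
        # Otherwise treat the response as plain JSON
        return json.loads(res)
    except json.JSONDecodeError:
        return {}

print(parse_json_response(FENCE + 'json\n{"rating": 4.5}\n' + FENCE))  # {'rating': 4.5}
print(parse_json_response('{"rating": 4.5}'))                          # {'rating': 4.5}
print(parse_json_response("not json at all"))                          # {}
```

Both the fenced and the plain response parse to the same dict, and malformed input degrades to an empty dict instead of raising.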

@ren-sheng (Author)

> The ds-v3 model has this problem. You can modify the code at around line 92 of graphrag\index\operations\summarize_communities\community_reports_extractor.py: [quoted code omitted]

Thanks for the answer. It is indeed a problem with the DeepSeek model. That said, I spent a few days tweaking the prompt and eventually fixed it; the responses are no longer in Markdown.

@TracyRaven007

> Thanks for the answer. It is indeed a problem with the DeepSeek model. That said, I spent a few days tweaking the prompt and eventually fixed it; the responses are no longer in Markdown.

How did you change the prompt? I'm also hitting this problem with ds-V3.

@ren-sheng (Author)

> How did you change the prompt? I'm also hitting this problem with ds-V3.

I changed one sentence in the Report Structure section of the community_report_graph.txt prompt, the sentence right before the JSON example, to:
Return output as a well-formed JSON-formatted string with the following format,but don't output in markdown format, the output string should be directly usable by json.load():
This adds an explicit requirement on the response format. Interestingly, the JSON example in community_report_text.txt is already preceded by a note that the returned JSON must be directly loadable by json.load(), but for some reason community_report_graph.txt lacks it.
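For anyone who prefers not to rely on the model honoring the prompt, a small regex cleanup before `json.loads` is another defensive option. This is an illustrative sketch, not part of GraphRAG itself (the helper name and pattern are my own; the fence string is built programmatically so the snippet stays self-contained):

```python
import json
import re

FENCE = "`" * 3  # a literal triple backtick

def strip_markdown_fence(text):
    """Remove an optional ```json ... ``` wrapper before JSON parsing (illustrative helper)."""
    match = re.search(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, text, re.DOTALL)
    return match.group(1) if match else text.strip()

raw = FENCE + 'json\n{"title": "Ahvaz Academic Community"}\n' + FENCE
print(json.loads(strip_markdown_fence(raw))["title"])  # Ahvaz Academic Community
```

Unfenced responses pass through unchanged, so this composes safely with the prompt fix above.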

@TracyRaven007

> I changed one sentence in the Report Structure section of the community_report_graph.txt prompt, the sentence right before the JSON example [...]

Thank you very much. After changing the prompt, the responses parse correctly. However, in version 1.2.0 I could not find any requirement like "the returned JSON must be directly loadable by json.load()" in ./graphrag/graphrag/prompts/index/community_report.py. Thanks again.
