Generate and Understand the Given Code Context. #149
-
Hi, I'm wondering whether Semantic Kernel can generate code based on a given GitHub repository. I have already tried the "GitHub Q&A" sample at https://github.com/microsoft/semantic-kernel/tree/main/samples/apps/github-qna-webapp-react , but unfortunately it did not generate code that matches the required code context. I would like to know how to help Large Language Models (LLMs) understand code context that spans multiple files, so that they can generate the code correctly.
Replies: 4 comments
-
Hi @1openwindow, yes that's possible; you can start from the sample code and make a few changes. The sample app currently indexes only Markdown files, so the first change would be to also index source code files, but here it gets tricky. Unless you're indexing only small files, a large codebase has to be split into "chunks" that fit the maximum input size accepted by the embeddings generator. Chunking code can be hard, depending on the language. Once you get past that, the rest should work fine: you use semantic memory and semantic search to retrieve relevant code snippets, fit them into a prompt, and use a semantic function (i.e. an LLM prompt) to generate the code you want. Here's an example of a prompt; it has two parts:
See at the bottom the answer given by the LLM. You can tweak the question to get what you need, as long as the injected code example is relevant. So the main problem is injecting relevant code using semantic search (a rough sketch of that chunk-index-retrieve step follows below).
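As a rough illustration of that pipeline (not the sample app's actual code), here is a minimal sketch: it chunks source files, indexes them, retrieves the chunks most relevant to a question, and injects them into a prompt. The chunk size, file extensions, and helper names are assumptions, and the keyword-overlap scoring is only a stand-in for the embedding-based semantic search you would get from Semantic Kernel's memory and an embedding backend.

# Sketch of the chunk / index / retrieve / inject pipeline described above.
# The storage and scoring here are naive stand-ins: in a real app each chunk
# would be saved into semantic memory (backed by an embedding model) and
# retrieved with semantic search instead of keyword overlap.
import os

CHUNK_SIZE = 1500  # characters per chunk; pick a size that fits the embedding model's input limit

def chunk_file(path: str, size: int = CHUNK_SIZE) -> list[str]:
    """Split a source file into fixed-size chunks (real chunking would respect language structure)."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        text = f.read()
    return [text[i : i + size] for i in range(0, len(text), size)]

def index_repo(root: str, extensions=(".py", ".cs", ".ts")) -> list[dict]:
    """Walk the repository and collect chunks of source files (the sample only indexes Markdown)."""
    chunks = []
    for dirpath, _, files in os.walk(root):
        for name in files:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                for n, chunk in enumerate(chunk_file(path)):
                    chunks.append({"id": f"{path}#{n}", "text": chunk})
    return chunks

def search_chunks(chunks: list[dict], question: str, limit: int = 3) -> list[dict]:
    """Stand-in for semantic search: score chunks by keyword overlap with the question."""
    words = set(question.lower().split())
    scored = sorted(chunks, key=lambda c: len(words & set(c["text"].lower().split())), reverse=True)
    return scored[:limit]

def build_prompt(question: str, snippets: list[dict]) -> str:
    """Inject the retrieved snippets into the prompt that the semantic function will receive."""
    context = "\n\n".join(f"// {s['id']}\n{s['text']}" for s in snippets)
    return f"Relevant code from the repository:\n{context}\n\nTask: {question}\n"

if __name__ == "__main__":
    repo_chunks = index_repo(".")
    question = "Add a retry policy to the HTTP client"
    prompt = build_prompt(question, search_chunks(repo_chunks, question))
    print(prompt[:500])

The same structure holds when you swap in real semantic memory: the only parts that change are how chunks are stored and how relevance is scored.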
LLM output:

# Copyright (c) Microsoft. All rights reserved.

import asyncio

import semantic_kernel as sk

sk_prompt = """
ChatBot can have a conversation with you about any topic.
It can give explicit instructions or say 'I don't know'
when it doesn't know the answer.
{{$chat_history}}
User:> {{$user_input}}
ChatBot:>
"""

kernel = sk.create_kernel()

api_key, org_id = sk.openai_settings_from_dot_env()

# Change the backend to gpt4
kernel.config.add_openai_completion_backend(
    "gpt-4", "text-gpt-4", api_key, org_id
)

prompt_config = sk.PromptTemplateConfig.from_completion_parameters(
    max_tokens=2000, temperature=0.7, top_p=0.4
)

prompt_template = sk.PromptTemplate(
    sk_prompt, kernel.prompt_template_engine, prompt_config
)

function_config = sk.SemanticFunctionConfig(prompt_config, prompt_template)
chat_function = kernel.register_semantic_function("ChatBot", "Chat", function_config)


async def chat() -> None:
    context = sk.ContextVariables()
    context["chat_history"] = ""

    try:
        user_input = input("User:> ")
        context["user_input"] = user_input
    except KeyboardInterrupt:
        print("\n\nExiting chat...")
        return False
    except EOFError:
        print("\n\nExiting chat...")
        return False

    if user_input == "exit":
        print("\n\nExiting chat...")
        return False

    # Change the function to use the gpt4 backend
    answer = await kernel.run_on_vars_async(context, chat_function)
    context["chat_history"] += f"\nUser:> {user_input}\nChatBot:> {answer}\n"
    print(f"ChatBot:> {answer}")
    return True


async def main() -> None:
    chatting = True
    while chatting:
        chatting = await chat()


if __name__ == "__main__":
    asyncio.run(main())
-
Note how I had to be very specific in the question ("Ask Smart to Get Smart") to force the LLM to use a GPT-4 model ID. Also, the model name is wrong because the LLM doesn't have that information. That aside, the LLM can generate code, which is pretty much how GitHub Copilot works. Then it's up to you whether to trust the generated code (I would not just run it; you'll need some form of validation, e.g. as sketched below).
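One lightweight form of validation, as a minimal sketch (not a substitute for review or tests): reject generated Python that does not even parse, using only the standard library, before anything else looks at it.

# Minimal sanity check for LLM-generated Python: refuse anything that fails to
# parse. This is only a first gate; it says nothing about whether the code is
# safe or correct, so keep human review and tests on top of it.
import ast

def is_syntactically_valid(generated_code: str) -> bool:
    """Return True if the generated source parses as Python."""
    try:
        ast.parse(generated_code)
        return True
    except SyntaxError as err:
        print(f"Generated code rejected: {err}")
        return False

if __name__ == "__main__":
    sample = "def add(a, b):\n    return a + b\n"
    print(is_syntactically_valid(sample))  # True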
-
@dluc, I appreciate your guidance and clarification; it has been both inspiring and beneficial for me. I'm eager to explore our AI-driven project using the Semantic Kernel approach. Thank you so much!
-
@1openwindow thanks for raising this and thanks to @dluc for answering! Closing this discussion.