Update document formatting for v2 tool use in code examples #229
base: main
Conversation
@mrmer1 please check the merge conflicts.
@mrmer1 This looks great! Could you please also make the following updates:
This will help users run the notebooks successfully in the future without version compatibility issues.
@ai-yann updated the install versions
Thanks for the versioning edits. I was able to run through all the notebooks without issue -- I just added a few more comments, putting myself in the shoes of a customer during a workshop. I hope it's not too much.
In addition to the line "You can find the [dataset here](https://github.com/cohere-ai/notebooks/blob/main/notebooks/guides/advanced_rag/spotify_dataset.csv)." in the markdown, could you please also add the following to the python cell that follows it:
```python
import pandas as pd
import shutil
from pathlib import Path

def setup_spotify_dataset():
    """
    Loads the Spotify dataset and ensures a local copy exists in the current directory.
    Returns the loaded DataFrame.
    """
    # First try to load from current directory
    local_path = Path('spotify_dataset.csv')
    if local_path.exists():
        print("Loading Spotify dataset from local directory...")
        return pd.read_csv(local_path)

    # If not found locally, try to find in notebooks directory structure
    try:
        current = Path.cwd()
        while current.name != 'notebooks' and current.parent != current:
            current = current.parent
        if current.name != 'notebooks':
            raise RuntimeError("Could not find notebooks directory")

        # Original file path
        original_path = current / 'guides' / 'advanced_rag' / 'spotify_dataset.csv'
        if not original_path.exists():
            raise FileNotFoundError(f"Dataset not found at {original_path}")

        # Copy file to current directory
        print(f"Copying Spotify dataset to local directory ({local_path})...")
        shutil.copy2(original_path, local_path)

        # Load and return the data
        return pd.read_csv(local_path)
    except (RuntimeError, FileNotFoundError) as e:
        print(f"Error: {e}")
        print("Please ensure the Spotify dataset is available either locally or in the expected directory structure.")
        raise

# Load the dataset
try:
    spotify_data = setup_spotify_dataset()
    print("\nFirst few rows of the dataset:")
    display(spotify_data.head(3))
except Exception as e:
    print(f"Failed to load dataset: {e}")
```
This just makes it a smoother experience for folks trying to follow along in a group training setting. This way folks don't need to leave the window, download, upload, etc.
Are the final results consistent? Can we add one more cell that outputs the final answers in markdown with clickable citations, like we do in the insert_inline_citations method of the notebooks/agents/Vanilla_Tool_Use_v2.ipynb notebook? Right now it's a lot to scroll through, and much of it looks like:

```
Start: 531 | End: 568 | Text: 'Damián Pacheco (twelve-string guitar)'
Sources:
web_search_ra443ajyz6xj:0
web_search_ra443ajyz6xj:2
web_search_ta7g2cd67jrx:0
```
Something that looks like:
Spotify 2023 Top Songs Analysis
Top 3 Most Streamed Songs
- "Flowers" by Miley Cyrus
- "Ella Baila Sola" by Eslabon Armado and Peso Pluma
- "Shakira: Bzrp Music Sessions, Vol. 53" by Shakira and Bizarrap
Artist Details
Miley Cyrus
- Age: 31 years old
- Citizenship: United States
- Sources:
Eslabon Armado
- Type: American regional Mexican group
- Location: Patterson, California
- Formation: 2017
- Members:
- Pedro Tovar (vocals)
- Brian Tovar (bass)
- Ulises González (acoustic guitar)
- Damián Pacheco (twelve-string guitar)
- Sources:
Peso Pluma
- Age: 25 years old
- Birthdate: June 15, 1999
- Citizenship: Mexican
- Sources:
Shakira
- Age: 47 years old (born February 2, 1977)
- Citizenship: Colombian
- Location of Birth: Barranquilla, Colombia
- Sources:
Bizarrap
- Age: 25 years old
- Birthdate: August 29, 1998
- Citizenship: Argentine
- Profession: Record producer, songwriter, and DJ
- Sources:
Methodology
- Data sourced from Spotify's 2023 streaming statistics
- Artist information verified through web searches
- Ages and citizenships confirmed through multiple sources
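
For readers following along: a minimal sketch of how citation spans like the ones above can be turned into clickable markdown links. The `Citation` dataclass and `source_index` mapping here are hypothetical stand-ins that mimic the `start`/`end`/`sources` fields of Cohere v2 citation objects; the actual `insert_inline_citations` implementation lives in the Vanilla_Tool_Use_v2.ipynb notebook and may differ.

```python
from dataclasses import dataclass, field

@dataclass
class Citation:
    """Hypothetical stand-in for a Cohere v2 citation (start/end offsets into the text)."""
    start: int
    end: int
    sources: list = field(default_factory=list)  # e.g. ["web_search_ra443ajyz6xj:0"]

def insert_inline_citations(text, citations, source_index):
    """Wrap each cited span in a markdown link to its first source's URL.

    source_index is an assumed dict mapping source ids to URLs.
    """
    result = text
    # Insert right-to-left so earlier character offsets stay valid
    for c in sorted(citations, key=lambda c: c.start, reverse=True):
        url = source_index.get(c.sources[0], "#") if c.sources else "#"
        cited = result[c.start:c.end]
        result = result[:c.start] + f"[{cited}]({url})" + result[c.end:]
    return result
```

Displaying the result with `display(Markdown(...))` then renders each cited span as a clickable link instead of the raw offset dump.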
And you can use:

```python
from IPython.display import Markdown
display(Markdown(markdown_response))
```
Since we go to the trouble of creating a markdown table with the results and inline citations, maybe we can display the table as markdown by adding:

```python
from IPython.display import Markdown
display(Markdown(cited_text))
print("\n" + list_sources(response.message.citations, source_index))
```
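
The `list_sources` helper referenced here comes from the notebook; as a rough sketch of what such a helper might do, it could deduplicate the source ids referenced by the citations and emit them as a numbered markdown list. The field names (`sources`) and the `source_index` id-to-URL mapping are assumptions for illustration, not the notebook's actual implementation.

```python
def list_sources(citations, source_index):
    """Collect unique source ids from citations (in first-seen order)
    and format them as a numbered markdown list of links."""
    seen = []
    for c in citations:
        for s in c.sources:
            if s not in seen:
                seen.append(s)
    lines = ["Sources:"]
    for i, s in enumerate(seen, 1):
        lines.append(f"{i}. [{s}]({source_index.get(s, '#')})")
    return "\n".join(lines)
```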
```json
{
 "cell_type": "code",
 "execution_count": 1,
 "metadata": {},
 "outputs": [],
 "source": [
  "\n",
  "# pip install cohere\n",
  "\n"
```
Can we remove the two newlines at the top of this cell? Just for presentation purposes :)
This PR introduces a new tutorial for Cohere's API, which is split into seven parts. Each part focuses on a different use case, including installation and setup, text generation, chatbots, semantic search, reranking, retrieval-augmented generation (RAG), and agents with tool use. The tutorial is designed to be completed in around 15 minutes.