Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Oasst data v1.1 eda #3728

Draft
wants to merge 10 commits into
base: main
Choose a base branch
from
Prev Previous commit
Next Next commit
clean up
  • Loading branch information
andrewm4894 committed Nov 6, 2023
commit dc82b460b4c77858447e14b2758581ac7b85a372
43 changes: 3 additions & 40 deletions notebooks/openassistant-oasst1.1/eda.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -634,7 +634,7 @@
},
{
"cell_type": "code",
"execution_count": 29,
"execution_count": 40,
"metadata": {},
"outputs": [
{
Expand Down Expand Up @@ -1476,12 +1476,7 @@
" 'community of developers that maintain it and actively release '\n",
" 'upgrades. Most scientific computer applications run on Linux and '\n",
" 'many servers.',\n",
" 'user_id': 'b4362d6d-facc-47da-8b51-3331bb06a170'}]\n",
"What are the advantages of using Linux over Windows?\n",
"├── Linux can be more stable and use less resources than Windows, depending on the distribution you choose. Its open-source nature means that it's harder for companies to sneak in unwanted features, such as tracking or advertisements. Many Linux distribu\n",
"├── Linux is completely free and open source, and it also has a large community of developers that maintain it and actively release upgrades. Most scientific computer applications run on Linux and many servers.\n",
"└── There are many advantages to using Linux instead of Windows. Here's a few: 1. Linux is free. 2. Linux is open-source. 3. Most programming tools and packages are designed for Linux. 4. Fewer viruses are designed for Linux, because fewer people use it.\n",
"\n"
" 'user_id': 'b4362d6d-facc-47da-8b51-3331bb06a170'}]\n"
]
}
],
Expand All @@ -1497,39 +1492,7 @@
"df_message = df[df[\"message_tree_id\"] == message_tree_id]\n",
"message_reply_data = df_message[\"replies\"].values[0]\n",
"\n",
"pp.pprint(message_reply_data)\n",
"\n",
"# prompt\n",
"message_data = [\n",
" {\n",
" \"message_id\": df_message[\"message_id\"].values[0],\n",
" \"parent_id\": df_message[\"message_id\"].values[0],\n",
" \"text\": df_message[\"text\"].values[0],\n",
" \"role\": df_message[\"role\"].values[0],\n",
" }\n",
"]\n",
"# extend with replies\n",
"message_data.extend(\n",
" [\n",
" {\n",
" \"message_id\": m[\"message_id\"],\n",
" \"parent_id\": m[\"parent_id\"],\n",
" \"text\": m[\"text\"],\n",
" \"role\": m[\"role\"],\n",
" }\n",
" for m in message_reply_data\n",
" ]\n",
")\n",
"\n",
"# Create the tree\n",
"tree = Tree()\n",
"\n",
"# Add messages to the tree\n",
"for msg in message_data:\n",
" add_message_to_tree(tree, msg)\n",
"\n",
"# Show the tree\n",
"tree.show()"
"pp.pprint(message_reply_data)"
]
},
{
Expand Down