Merge pull request #479 from vespa-engine/kkraune/app-packages

kkraune · web-flow · commit b091374ea2e3 · 2023-02-28T08:52:35.000+01:00
Kkraune/app packages
diff --git a/docs/sphinx/source/application-packages.ipynb b/docs/sphinx/source/application-packages.ipynb
@@ -0,0 +1,319 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "e05d0811",
+   "metadata": {},
+   "source": [
+    "![Vespa logo](https://vespa.ai/assets/vespa-logo-color.png)\n",
+    "\n",
+    "# Application packages\n",
+    "\n",
+    "Vespa is configured using an [application package](https://docs.vespa.ai/en/application-packages.html).\n",
+    "Pyvespa provides an API to generate a deployable application package.\n",
+    "\n",
+    "An application package has at a minimum a [schema](https://docs.vespa.ai/en/schemas.html)\n",
+    "and [services.xml](https://docs.vespa.ai/en/reference/services.html).\n",
+    "\n",
+    "Example - create an empty application package:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7e3477a6",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from vespa.package import ApplicationPackage\n",
+    "\n",
+    "app_package = ApplicationPackage(name=\"myschema\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e3f1e7d5",
+   "metadata": {},
+   "source": [
+    "To inspect an application package, dump it to disk using\n",
+    "[to_files](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.package.ApplicationPackage.to_files):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d05523a8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import tempfile, os\n",
+    "\n",
+    "temp_dir = tempfile.TemporaryDirectory()\n",
+    "os.environ[\"TMP_APP_DIR\"] = temp_dir.name\n",
+    "app_package.to_files(temp_dir.name)\n",
+    "print(temp_dir.name)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e3a4dc05",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "./services.xml\r\n",
+      "./schemas/myschema.sd\r\n",
+      "./search/query-profiles/types/root.xml\r\n",
+      "./search/query-profiles/default.xml\r\n"
+     ]
+    }
+   ],
+   "source": [
+    "!cd $TMP_APP_DIR && find . -type f"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c038d33a",
+   "metadata": {},
+   "source": [
+    "Ignore these files for now:\n",
+    "\n",
+    "    ./search/query-profiles/types/root.xml\n",
+    "    ./search/query-profiles/default.xml"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7b01cd09",
+   "metadata": {},
+   "source": [
+    "## Schema\n",
+    "\n",
+    "A schema defines fields, fieldsets and ranking functions. The application package is created with an empty schema that has the same name as the package - dump it:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "923edec8",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "schema myschema {\r\n",
+      "    document myschema {\r\n",
+      "    }\r\n",
+      "}"
+     ]
+    }
+   ],
+   "source": [
+    "!cat $TMP_APP_DIR/schemas/myschema.sd"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5a1cbaf2",
+   "metadata": {},
+   "source": [
+    "Add fields, a fieldset and a ranking function:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c83c1945",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from vespa.package import Field, FieldSet, RankProfile\n",
+    "\n",
+    "app_package.schema.add_fields(\n",
+    "    Field(name = \"id\",    type = \"string\", indexing = [\"attribute\", \"summary\"]),\n",
+    "    Field(name = \"title\", type = \"string\", indexing = [\"index\", \"summary\"], index = \"enable-bm25\"),\n",
+    "    Field(name = \"body\",  type = \"string\", indexing = [\"index\", \"summary\"], index = \"enable-bm25\")\n",
+    ")\n",
+    "\n",
+    "app_package.schema.add_field_set(\n",
+    "    FieldSet(name = \"default\", fields = [\"title\", \"body\"])\n",
+    ")\n",
+    "\n",
+    "app_package.schema.add_rank_profile(\n",
+    "    RankProfile(name = \"default\", first_phase = \"bm25(title) + bm25(body)\")\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f721bdfd",
+   "metadata": {},
+   "source": [
+    "Dump the application package again and show the schema:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4fcd3de2",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "schema myschema {\r\n",
+      "    document myschema {\r\n",
+      "        field id type string {\r\n",
+      "            indexing: attribute | summary\r\n",
+      "        }\r\n",
+      "        field title type string {\r\n",
+      "            indexing: index | summary\r\n",
+      "            index: enable-bm25\r\n",
+      "        }\r\n",
+      "        field body type string {\r\n",
+      "            indexing: index | summary\r\n",
+      "            index: enable-bm25\r\n",
+      "        }\r\n",
+      "    }\r\n",
+      "    fieldset default {\r\n",
+      "        fields: title, body\r\n",
+      "    }\r\n",
+      "    rank-profile default {\r\n",
+      "        first-phase {\r\n",
+      "            expression: bm25(title) + bm25(body)\r\n",
+      "        }\r\n",
+      "    }\r\n",
+      "}"
+     ]
+    }
+   ],
+   "source": [
+    "app_package.to_files(temp_dir.name)\n",
+    "!cat $TMP_APP_DIR/schemas/myschema.sd"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7cc78157",
+   "metadata": {},
+   "source": [
+    "Note how the indexing settings are written to the schema.\n",
+    "\n",
+    "> **_Pyvespa does not support all indexing options in Vespa - it is made for easy experimentation.\n",
+    "  To use an unsupported indexing option (or any other unsupported setting),\n",
+    "  dump the application package, modify the schema file\n",
+    "  and deploy the application package from the directory, or as a zipped file.\n",
+    "  [Read more](https://pyvespa.readthedocs.io/en/latest/deploy-docker.html)_**"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cfd73872",
+   "metadata": {},
+   "source": [
+    "At this point, review the Vespa documentation:\n",
+    "* [field](https://docs.vespa.ai/en/schemas.html#field)\n",
+    "* [fieldset](https://docs.vespa.ai/en/schemas.html#fieldset)\n",
+    "* [rank-profile](https://docs.vespa.ai/en/ranking.html#rank-profiles)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a51353a4",
+   "metadata": {},
+   "source": [
+    "## Services\n",
+    "\n",
+    "In `services.xml` you will find a container cluster and a content cluster -\n",
+    "see the [Vespa Overview](https://docs.vespa.ai/en/overview.html).\n",
+    "This is a file you will normally not change or need to know much about - dump the default file:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4abae84e",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n",
+      "<services version=\"1.0\">\r\n",
+      "    <container id=\"myschema_container\" version=\"1.0\">\r\n",
+      "        <search></search>\r\n",
+      "        <document-api></document-api>\r\n",
+      "    </container>\r\n",
+      "    <content id=\"myschema_content\" version=\"1.0\">\r\n",
+      "        <redundancy reply-after=\"1\">1</redundancy>\r\n",
+      "        <documents>\r\n",
+      "            <document type=\"myschema\" mode=\"index\"></document>\r\n",
+      "        </documents>\r\n",
+      "        <nodes>\r\n",
+      "            <node distribution-key=\"0\" hostalias=\"node1\"></node>\r\n",
+      "        </nodes>\r\n",
+      "    </content>\r\n",
+      "</services>"
+     ]
+    }
+   ],
+   "source": [
+    "!cat $TMP_APP_DIR/services.xml"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d6477c44",
+   "metadata": {},
+   "source": [
+    "Observe:\n",
+    "* A content cluster (this is where the index is stored) called `myschema_content` is created.\n",
+    "  The cluster name is normally not needed, except when using\n",
+    "  [delete_all_docs](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.application.Vespa.delete_all_docs)\n",
+    "  to quickly remove all documents from a schema.\n",
+    "\n",
+    "Remove the temporary application package file dump:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "84ce16e8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "temp_dir.cleanup()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e242ac80",
+   "metadata": {},
+   "source": [
+    "## Next step: Deploy, feed and query\n",
+    "\n",
+    "Once the schema is ready for deployment, choose a deployment option and deploy the application package:\n",
+    "* [Deploy to local container](https://pyvespa.readthedocs.io/en/latest/deploy-docker.html)\n",
+    "* [Deploy to Vespa Cloud](https://pyvespa.readthedocs.io/en/latest/deploy-vespa-cloud.html)\n",
+    "\n",
+    "Use the guides on the pyvespa site to feed and query data."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "python3",
+   "language": "python",
+   "name": "python3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/docs/sphinx/source/examples/semantic-retrieval-for-question-answering-applications.ipynb b/docs/sphinx/source/examples/semantic-retrieval-for-question-answering-applications.ipynb
@@ -923,4 +923,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 5
-}
+}