
Commit b091374

authored
Merge pull request #479 from vespa-engine/kkraune/app-packages
Kkraune/app packages
2 parents a016793 + 28c9297 commit b091374


2 files changed: +320 -1 lines changed

@@ -0,0 +1,319 @@
+{
+"cells": [
+{
+"cell_type": "markdown",
+"id": "e05d0811",
+"metadata": {},
+"source": [
+"![Vespa logo](https://vespa.ai/assets/vespa-logo-color.png)\n",
+"\n",
+"# Application packages\n",
+"\n",
+"Vespa is configured using an [application package](https://docs.vespa.ai/en/application-packages.html).\n",
+"Pyvespa provides an API to generate a deployable application package.\n",
+"\n",
+"At a minimum, an application package contains a [schema](https://docs.vespa.ai/en/schemas.html)\n",
+"and [services.xml](https://docs.vespa.ai/en/reference/services.html).\n",
+"\n",
+"Create an empty application package:"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "7e3477a6",
+"metadata": {},
+"outputs": [],
+"source": [
+"from vespa.package import ApplicationPackage\n",
+"\n",
+"app_package = ApplicationPackage(name=\"myschema\")"
+]
+},
+{
+"cell_type": "markdown",
+"id": "e3f1e7d5",
+"metadata": {},
+"source": [
+"To inspect an application package, dump it to disk using\n",
+"[to_files](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.package.ApplicationPackage.to_files):"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "d05523a8",
+"metadata": {},
+"outputs": [],
+"source": [
+"import tempfile, os\n",
+"\n",
+"temp_dir = tempfile.TemporaryDirectory()\n",
+"os.environ[\"TMP_APP_DIR\"] = temp_dir.name  # read by the ! shell commands in later cells\n",
+"app_package.to_files(temp_dir.name)\n",
+"print(temp_dir.name)"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "e3a4dc05",
+"metadata": {},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"./services.xml\r\n",
+"./schemas/myschema.sd\r\n",
+"./search/query-profiles/types/root.xml\r\n",
+"./search/query-profiles/default.xml\r\n"
+]
+}
+],
+"source": [
+"!cd $TMP_APP_DIR && find . -type f"
+]
+},
+{
+"cell_type": "markdown",
+"id": "c038d33a",
+"metadata": {},
+"source": [
+"Ignore these query profile files for now:\n",
+"\n",
+"    ./search/query-profiles/types/root.xml\n",
+"    ./search/query-profiles/default.xml"
+]
+},
+{
+"cell_type": "markdown",
+"id": "7b01cd09",
+"metadata": {},
+"source": [
+"## Schema\n",
+"\n",
+"A schema defines fields, fieldsets and ranking functions. An empty schema with the same name as the application package is created by default; dump it:"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "923edec8",
+"metadata": {},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"schema myschema {\r\n",
+" document myschema {\r\n",
+" }\r\n",
+"}"
+]
+}
+],
+"source": [
+"!cat $TMP_APP_DIR/schemas/myschema.sd"
+]
+},
+{
+"cell_type": "markdown",
+"id": "5a1cbaf2",
+"metadata": {},
+"source": [
+"Add fields, a fieldset and a ranking function:"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "c83c1945",
+"metadata": {},
+"outputs": [],
+"source": [
+"from vespa.package import Field, FieldSet, RankProfile\n",
+"\n",
+"app_package.schema.add_fields(\n",
+"    Field(name=\"id\", type=\"string\", indexing=[\"attribute\", \"summary\"]),\n",
+"    Field(name=\"title\", type=\"string\", indexing=[\"index\", \"summary\"], index=\"enable-bm25\"),\n",
+"    Field(name=\"body\", type=\"string\", indexing=[\"index\", \"summary\"], index=\"enable-bm25\")\n",
+")\n",
+"\n",
+"app_package.schema.add_field_set(\n",
+"    FieldSet(name=\"default\", fields=[\"title\", \"body\"])\n",
+")\n",
+"\n",
+"app_package.schema.add_rank_profile(\n",
+"    RankProfile(name=\"default\", first_phase=\"bm25(title) + bm25(body)\")\n",
+")"
+]
+},
+{
+"cell_type": "markdown",
+"id": "f721bdfd",
+"metadata": {},
+"source": [
+"Dump the application package again and show the schema:"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "4fcd3de2",
+"metadata": {},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"schema myschema {\r\n",
+" document myschema {\r\n",
+" field id type string {\r\n",
+" indexing: attribute | summary\r\n",
+" }\r\n",
+" field title type string {\r\n",
+" indexing: index | summary\r\n",
+" index: enable-bm25\r\n",
+" }\r\n",
+" field body type string {\r\n",
+" indexing: index | summary\r\n",
+" index: enable-bm25\r\n",
+" }\r\n",
+" }\r\n",
+" fieldset default {\r\n",
+" fields: title, body\r\n",
+" }\r\n",
+" rank-profile default {\r\n",
+" first-phase {\r\n",
+" expression: bm25(title) + bm25(body)\r\n",
+" }\r\n",
+" }\r\n",
+"}"
+]
+}
+],
+"source": [
+"app_package.to_files(temp_dir.name)\n",
+"!cat $TMP_APP_DIR/schemas/myschema.sd"
+]
+},
+{
+"cell_type": "markdown",
+"id": "7cc78157",
+"metadata": {},
+"source": [
+"Note how the indexing settings are written to the schema.\n",
+"\n",
+"> **_Pyvespa does not support every indexing option in Vespa; it is made for easy experimentation.\n",
+" To set an unsupported indexing option (or any other unsupported option),\n",
+" dump the application package, modify the schema file,\n",
+" and deploy the application package from the directory or as a zipped file.\n",
+" [Read more](https://pyvespa.readthedocs.io/en/latest/deploy-docker.html)_**"
+]
+},
+{
+"cell_type": "markdown",
+"id": "cfd73872",
+"metadata": {},
+"source": [
+"At this point, review the Vespa documentation:\n",
+"* [field](https://docs.vespa.ai/en/schemas.html#field)\n",
+"* [fieldset](https://docs.vespa.ai/en/schemas.html#fieldset)\n",
+"* [rank-profile](https://docs.vespa.ai/en/ranking.html#rank-profiles)"
+]
+},
+{
+"cell_type": "markdown",
+"id": "a51353a4",
+"metadata": {},
+"source": [
+"## Services\n",
+"\n",
+"`services.xml` defines a container cluster and a content cluster;\n",
+"see the [Vespa Overview](https://docs.vespa.ai/en/overview.html).\n",
+"You will normally not need to change this file or know much about it. Dump the default file:"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "4abae84e",
+"metadata": {},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n",
+"<services version=\"1.0\">\r\n",
+" <container id=\"myschema_container\" version=\"1.0\">\r\n",
+" <search></search>\r\n",
+" <document-api></document-api>\r\n",
+" </container>\r\n",
+" <content id=\"myschema_content\" version=\"1.0\">\r\n",
+" <redundancy reply-after=\"1\">1</redundancy>\r\n",
+" <documents>\r\n",
+" <document type=\"myschema\" mode=\"index\"></document>\r\n",
+" </documents>\r\n",
+" <nodes>\r\n",
+" <node distribution-key=\"0\" hostalias=\"node1\"></node>\r\n",
+" </nodes>\r\n",
+" </content>\r\n",
+"</services>"
+]
+}
+],
+"source": [
+"!cat $TMP_APP_DIR/services.xml"
+]
+},
+{
+"cell_type": "markdown",
+"id": "d6477c44",
+"metadata": {},
+"source": [
+"Observe:\n",
+"* A content cluster named `myschema_content` is created; this is where the index is stored.\n",
+"  You normally do not need this name, unless you use\n",
+"  [delete_all_docs](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.application.Vespa.delete_all_docs)\n",
+"  to quickly remove all documents from a schema.\n",
+"\n",
+"Remove the temporary application package file dump:"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "84ce16e8",
+"metadata": {},
+"outputs": [],
+"source": [
+"temp_dir.cleanup()"
+]
+},
+{
+"cell_type": "markdown",
+"id": "e242ac80",
+"metadata": {},
+"source": [
+"## Next step: Deploy, feed and query\n",
+"\n",
+"Once the schema is ready, choose a deployment option and deploy the application package:\n",
+"* [Deploy to a local container](https://pyvespa.readthedocs.io/en/latest/deploy-docker.html)\n",
+"* [Deploy to Vespa Cloud](https://pyvespa.readthedocs.io/en/latest/deploy-vespa-cloud.html)\n",
+"\n",
+"Use the guides on the pyvespa site to feed and query data."
+]
+}
+],
+"metadata": {
+"kernelspec": {
+"display_name": "python3",
+"language": "python",
+"name": "python3"
+}
+},
+"nbformat": 4,
+"nbformat_minor": 5
+}
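The notebook above inspects the dumped package with shell commands (`find`, `cat`). It also suggests deploying a hand-modified package from a directory or as a zipped file. The following is a minimal stdlib-only sketch of those inspect-and-zip steps in plain Python. It does not call pyvespa. The helper names `write_minimal_package`, `list_package_files` and `zip_package` are hypothetical. The file contents are copied from the notebook outputs, with indentation approximated. The sketch also assumes that the deploy target expects `services.xml` at the root of the zip archive.

```python
import os
import tempfile
import zipfile

# Contents copied from the notebook outputs (indentation approximated).
# Assumption: these two files are enough to illustrate the package layout.
SCHEMA_SD = """schema myschema {
    document myschema {
    }
}
"""

SERVICES_XML = """<?xml version="1.0" encoding="UTF-8"?>
<services version="1.0">
    <container id="myschema_container" version="1.0">
        <search></search>
        <document-api></document-api>
    </container>
    <content id="myschema_content" version="1.0">
        <redundancy reply-after="1">1</redundancy>
        <documents>
            <document type="myschema" mode="index"></document>
        </documents>
        <nodes>
            <node distribution-key="0" hostalias="node1"></node>
        </nodes>
    </content>
</services>
"""


def write_minimal_package(root: str) -> None:
    """Hypothetical helper: write services.xml and schemas/myschema.sd under root."""
    os.makedirs(os.path.join(root, "schemas"), exist_ok=True)
    with open(os.path.join(root, "services.xml"), "w", encoding="utf-8") as f:
        f.write(SERVICES_XML)
    with open(os.path.join(root, "schemas", "myschema.sd"), "w", encoding="utf-8") as f:
        f.write(SCHEMA_SD)


def list_package_files(root: str) -> list[str]:
    """Python analogue of `cd $TMP_APP_DIR && find . -type f`: sorted, '/'-separated paths."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.append(rel.replace(os.sep, "/"))
    return sorted(found)


def zip_package(root: str, zip_path: str) -> None:
    """Zip every file under root with paths relative to root, so services.xml is top level."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for rel in list_package_files(root):
            zf.write(os.path.join(root, rel), arcname=rel)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        app_dir = os.path.join(tmp, "app")
        write_minimal_package(app_dir)
        print(list_package_files(app_dir))
        # Keep the archive outside app_dir so it does not include itself.
        zip_path = os.path.join(tmp, "application.zip")
        zip_package(app_dir, zip_path)
        with zipfile.ZipFile(zip_path) as zf:
            print(sorted(zf.namelist()))
```

Run as a script, it prints `['schemas/myschema.sd', 'services.xml']` twice: once for the directory listing and once for the archive contents. Unlike the pyvespa dump, no `search/query-profiles` files appear, because the sketch writes only these two files.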

docs/sphinx/source/examples/semantic-retrieval-for-question-answering-applications.ipynb

+1 -1
@@ -923,4 +923,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 5
-}
+}
