<project name='project' pubsub='auto' threads='1' use-tagged-token='true'>
<description>
This example is a basic demonstration of using the Tokenization algorithm to
tokenize streaming text input. The project contains a single continuous query
consisting of the following:
1. A source window that receives the text data to be analyzed
2. A calculate window that tokenizes the text in incoming data events
and publishes the results
</description>
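<!--
Data flow sketch (the connecting edge is defined at the bottom of this model):
input.csv → fs publisher → w_source → w_calculate (Tokenization) → fs subscriber → result.out
-->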
<contqueries>
<contquery name='contquery' include-singletons='true' trace='w_source w_calculate'>
<windows>
<window-source name='w_source' insert-only='true' index='pi_EMPTY'>
<description>
This source window receives the included example input file, input.csv, via a
file socket connector. Each observation in the input stream carries two fields:
docId, a document ID that acts as the data stream's key, and doc, a string of
incoming text. A sample of the expected input format is sketched in the comment
that follows this window.
</description>
<schema>
<fields>
<field name='docId' type='int64' key='true'/>
<field name='doc' type='string'/>
</fields>
</schema>
<connectors>
<connector class='fs' name='publisher'>
<properties>
<property name='type'>pub</property>
<property name='fstype'>csv</property>
<property name='fsname'>input.csv</property>
<property name='transactional'>true</property>
<property name='blocksize'>1</property>
</properties>
</connector>
</connectors>
</window-source>
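<!--
A hypothetical sketch of the input that the publisher above might read. With a
transactional CSV file-socket publisher, each row of input.csv typically starts
with an opcode ("i" for insert) and an event flag ("n" for normal), followed by
the schema fields docId and doc. The exact leading columns can vary by ESP
release, so treat these rows as an illustration, not a verbatim input file:

i,n,1,"The quick brown fox jumps over the lazy dog."
i,n,2,"Streaming text is tokenized one event at a time."
-->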
<window-calculate name='w_calculate' algorithm='Tokenization'>
<description>
This calculate window receives data events from the source window and publishes
the word tokens created by the Tokenization algorithm. In this example, the
following input-map and output-map properties govern the calculate window:
docId: Specifies the input variable for the unique document ID.
doc: Specifies the input variable for the document text from the source window.
docIdOut: Specifies the output variable for the unique document ID.
tokenIdOut: Specifies the output variable for the unique ID of the token.
wordOut: Specifies the output variable for the word content of the token.
startPosOut: Specifies the output variable for the starting position of the token word.
endPosOut: Specifies the output variable for the ending position of the token word.
The resulting output is written to the file result.out via a file socket connector;
a sample of the expected output is sketched in the comment that follows this window.
</description>
<schema>
<fields>
<field name='docId' type='int64' key='true'/>
<field name='tokenId' type='int64' key='true'/>
<field name='word' type='string'/>
<field name='startPos' type='int32'/>
<field name='endPos' type='int32'/>
</fields>
</schema>
<input-map>
<properties>
<property name='docId'>docId</property>
<property name='doc'>doc</property>
</properties>
</input-map>
<output-map>
<properties>
<property name='docIdOut'>docId</property>
<property name='tokenIdOut'>tokenId</property>
<property name='wordOut'>word</property>
<property name='startPosOut'>startPos</property>
<property name='endPosOut'>endPos</property>
</properties>
</output-map>
<connectors>
<connector class='fs' name='sub'>
<properties>
<property name='type'>sub</property>
<property name='fstype'>csv</property>
<property name='fsname'>result.out</property>
<property name='snapshot'>true</property>
<property name='header'>true</property>
</properties>
</connector>
</connectors>
</window-calculate>
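<!--
A hypothetical sketch of the first few rows of result.out for the first sample
document above, assuming the same opcode and flag columns on output ("I" for
insert, "N" for normal) plus the header row enabled by the connector. The start
and end positions are illustrative character offsets (inclusive), and the exact
layout can vary by ESP release:

docId,tokenId,word,startPos,endPos
I,N,1,1,"The",0,2
I,N,1,2,"quick",4,8
I,N,1,3,"brown",10,14
-->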
</windows>
<edges>
<edge source='w_source' target='w_calculate' role='data'/>
</edges>
</contquery>
</contqueries>
</project>
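<!--
One way to run this model, assuming a standard SAS Event Stream Processing
installation; the server binary and its options vary by release, so check your
deployment's documentation:

$DFESP_HOME/bin/dfesp_xml_server -model file://model.xml
-->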