What sources of data can be considered "big data"?
A. Transactions from banking and business operations
B. Web traffic on the internet
C. Call logs of a telecommunications provider like AT&T
D. All of these
D
The term "data mining" refers to the process of discovering useful patterns in data ?
A. True
B. False
A
What is not an application of data mining in the information security field?
A. Auditing: analyze the audit data and determine if there are any abnormalities
B. Intrusion detection: examine the activities and determine whether unauthorized intrusions have occurred or will occur
C. Customer Modeling: finding the habits of customers for competitive advantages
D. Data quality: examine the data and determine whether the data is incomplete
C
The fields of machine learning and data mining are known as Knowledge Discovery
A. True
B. False
A
The informal definition of machine learning: "the field of study that gives computers the ability to learn without being explicitly programmed"
A. True
B. False
A
What is the valid Knowledge Discovery process cycle?
A. Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment
B. Business Understanding, Data Preparation, Data Understanding, Modeling, Evaluation, Deployment
C. Business Understanding, Data Understanding, Data Preparation, Evaluation, Modeling, Deployment
D. Data Understanding, Business Understanding, Data Preparation, Modeling, Evaluation, Deployment
A
Machine learning: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." If applied to "classify phishing emails", what is P?
A. The probability of being classified correctly
B. Classifying phishing email task
C. A data of labeled emails
D. None of these
A
Which of these are popular data mining tasks?
A. Classification
B. Clustering
C. Visualization
D. All of these
D
What is not a factor in successful applications of data mining?
A. Require knowledge-based decisions
B. Have accessible, sufficient, and relevant data
C. Have a changing environment
D. Provide a low payoff for the right decisions
D
Unstructured data is usually stored in a relational database system before being preprocessed in the Data Preparation phase?
A. True
B. False
B
In classification problems of machine learning, we are trying to predict results in a continuous output?
A. True
B. False
B
Supervised Learning can be categorized into "regression" and "classification" problems?
A. True
B. False
A
Which algorithm is not a supervised learning algorithm?
A. Linear regression
B. Decision tree
C. Linear Support Vector Machine
D. K-means
D
In linear regression, we are trying to:
A. find the best straight line that passes through the training data points
B. construct a hyperplane that has the largest distance to the nearest training data points of any class (largest margin)
C. apply Bayes' theorem with strong independence assumptions between every pair of features
D. None of these
A
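As a quick illustration of option A, the sketch below fits a least-squares line with numpy (assuming numpy is available; the training points are invented for illustration):

    import numpy as np

    # Hypothetical training points (x, y).
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([1.2, 1.9, 3.1, 3.9, 5.2])

    # Least-squares fit of y = w*x + b: the "best straight line"
    # through the training data points.
    w, b = np.polyfit(x, y, deg=1)
    print(f"y = {w:.3f}*x + {b:.3f}")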
Which of the problems below are best addressed using a supervised learning algorithm?
A. From the network flow of a process on a computer, predict whether or not it's related to a botnet
B. From a large set of malware samples, try to categorize them based on their behavior
C. From the web access logs, predict if any sessions come from abnormal users
D. All of these
A,C
Weka is machine learning software for solving data mining problems?
A. False
B. True
B
What are some problems with finding patterns?
A. Most patterns are not interesting
B. Patterns may be inexact
C. Data may be garbled or missing
D. All of these
D
Which of the problems below are best addressed using an unsupervised learning algorithm?
A. From the network flow of a process on a computer, predict whether or not it's related to a botnet
B. From a large set of malware samples, try to categorize them based on their behavior
C. From the web access logs, predict if any sessions come from abnormal users
D. All of these
B
To make a good predictor (the model obtained after feeding the training data into the algorithm), we can change the algorithm's parameters?
A. True
B. False
A
The term "overfitting" in machine learning refers what ?
A. It is too dependent on that data and it is likely to have a higher error rate on new unseen data
B. It is not adequately capture the underlying structure of the data and such a model will tend to have poor predictive performance
A
What forms of input might a machine learning algorithm take?
A. Concepts
B. Instances
C. Attributes
D. All of them
D
What is a concept, as a form of input to machine learning?
A. Kinds of things that can be learned or what we are trying to find—the result of the learning process
B. An individual, independent example of the concept to be learned
C. Is characterized by the values of attributes that measure different aspects of the instance
D. None of them
A
What is an instance, as a form of input to machine learning?
A. Kinds of things that can be learned or what we are trying to find—the result of the learning process
B. An individual, independent example of the concept to be learned
C. Is characterized by the values of attributes that measure different aspects of the instance
D. None of them
B
What is an attribute, as a form of input to machine learning?
A. Kinds of things that can be learned or what we are trying to find—the result of the learning process
B. An individual, independent example of the concept to be learned
C. Is characterized by the values of attributes that measure different aspects of the instance
C
What are the basic styles of learning that commonly appear in data mining applications?
A. Classification learning, Association learning
B. Classification learning, Association learning, Clustering
C. Classification learning, Association learning, Clustering, Numeric prediction
D. Classification learning
C
Describe the classification learning style?
A. The learning scheme is presented with a set of classified examples from which it is expected to learn a way of classifying unseen examples
B. Any association among features is sought, not just ones that predict a particular class value
C. Groups of examples that belong together are sought
D. The outcome to be predicted is not a discrete class but a numeric quantity
A
Describe association learning?
A. The learning scheme is presented with a set of classified examples from which it is expected to learn a way of classifying unseen examples
B. Any association among features is sought, not just ones that predict a particular class value
C. Groups of examples that belong together are sought
D. The outcome to be predicted is not a discrete class but a numeric quantity
B
Describe clustering learning?
A. The learning scheme is presented with a set of classified examples from which it is expected to learn a way of classifying unseen examples
B. Any association among features is sought, not just ones that predict a particular class value
C. Groups of examples that belong together are sought
D. The outcome to be predicted is not a discrete class but a numeric quantity
C
Describe numeric prediction (regression) learning?
A. The learning scheme is presented with a set of classified examples from which it is expected to learn a way of classifying unseen examples
B. Any association among features is sought, not just ones that predict a particular class value
C. Groups of examples that belong together are sought
D. The outcome to be predicted is not a discrete class but a numeric quantity
D
It's called supervised learning because the scheme operates under supervision, being provided with the actual outcome for each of the training examples
A. True
B. False
A
Suppose in the weather data an attribute has values sunny, overcast, and rainy. To which type does this attribute not belong?
A. Nominal attribute
B. Ordinal attribute
C. Numeric (continuous) attribute
D. None of them
C
Nominal attributes differ from ordinal attributes in that nominal attributes are the ones that make it possible to rank-order the categories (like hot > mild > cool)?
A. True
B. False
B
What are the challenges when gathering data together?
A. Different data sources
B. External data may be required
C. Type and level of data aggregation
D. All of them
D
An ARFF file is an ASCII text file that describes a list of instances sharing a set of attributes in Weka?
A. True
B. False
A
Issues like missing values, inaccurate values, and unbalanced data are all insignificant in the input preparation step?
A. True
B. False
B
"knowledge" pattern representation is the process of representing the patterns that can be discovered by machine learning
A. True
B. False
A
______ is a way of representing the output from machine learning that has the same form as the input and can be considered a lookup table
A. Decision table
B. Decision tree
C. Decision rule
D. Linear model
A
Which statements are true for decision tree pattern representation?
A. Nodes in a decision tree involve testing a particular attribute. Usually, the test compares an attribute value with a constant.
B. Leaf nodes give a classification that applies to all instances that reach the leaf, or a set of classifications
C. To classify an unknown instance, it is routed down the tree
D. All of them
D
In decision tree pattern representation, if an attribute that is tested at a node is a ______ one, the number of children is usually the number of possible values of the attribute
A. Nominal
B. Numeric
A
In decision tree pattern representation, if the attribute is numeric, the test at a node usually determines whether its value is greater or less than a predetermined constant, giving a two-way split
A. True
B. False
A
In decision tree pattern representation, a simple solution is to record the number of elements in the training set that go down each branch and to use the most popular branch if the value for a test instance is missing
A. True
B. False
A
Which statements are true for converting from trees to rules?
A. One rule is generated for each leaf
B. Produces rules that are unambiguous
C. Resulting rules are unnecessarily complex
D. All of them
D
Decision rules are made up of two components, the antecedent (precondition) and the consequent (conclusion). What are they?
A. Antecedent is a series of tests just like the tests at nodes in decision trees. Consequent gives a class (classes) assigned by a rule
B. Consequent is a series of tests just like the tests at nodes in decision trees. Antecedent gives a class (classes) assigned by a rule
A
Which statements are false for converting from rules to trees?
A. A tree cannot easily express disjunction between rules (like the same structure but different attributes)
B. None of them
C. The corresponding tree may contain identical subtrees
D. It's an easier process compared with the inverse
D
Some advantages of using decision rules over decision trees are:
A. New rules can be added to an existing rule set without disturbing ones already there, whereas to add to a tree structure may require reshaping the whole tree
B. For binary classification problems, rules need to be defined for only one class
C. None of them
D. All of them
D
Instance-based knowledge representation uses the instances themselves to represent what is learned, rather than inferring a rule set or decision tree and storing it instead?
A. True
B. False
A
Which statements are true about instance-based learning?
A. Instance-based learning is lazy, deferring the real work as long as possible
B. Training instances are searched for the instance that most closely resembles the new instance
C. K-nearest-neighbor method is of this type
D. All of these
D
2-D representation, Venn diagram, probabilistic assignment, and dendrogram are some output forms of diagrams when clusters rather than a classifier are learned?
A. True
B. False
A
What are the differences between relational and propositional decision rules?
A. Propositional decision rules are rules that involve comparing an attribute value to a constant
B. Relational decision rules exist because of a need to compare attributes with each other
C. None of them
D. All of them
D
For classification, linear model representation defines a decision boundary (hyperplane) which is a line separating classes
A. True
B. False
A
As opposed to the 1R method, Naive Bayes modeling uses all attributes and allows them all to contribute to the model?
A. True
B. False
A
What two assumptions about attributes does Naive Bayes make (these assumptions are almost never correct in real datasets)?
A. Attributes are equally important and statistically independent
B. Attributes are equally important and statistically dependent
A
Suppose Bayes's rule is P(H | E) = P(E | H)*P(H)/P(E), where E is the evidence (attribute values) and H is the hypothesis (which class). The probability of the event before evidence is seen is:
A. P(H)
B. P(H | E)
C. none of these
D. P(E)
A
Suppose Bayes's rule is P(H | E) = P(E | H)*P(H)/P(E), where E is the evidence (attribute values) and H is the hypothesis (which class). The probability of the event after evidence is seen is:
A. P(H)
B. P(H | E)
C. none of these
D. P(E)
B
Which statements are true about the zero-frequency problem?
A. Occurs if a particular attribute value does not occur in the training set in conjunction with every class value
B. A posteriori probability will also be zero (regardless of how likely the other values are)
C. Fixed by smoothing method (Laplace estimator)
D. All of them
D
Why can we ignore the denominator of Bayes' theorem when calculating the probabilities of each class?
A. It's the same for all classes
B. It can't be ignored
C. It doesn't have a denominator
D. It will add error to the final probability of every class
A
The fix for the zero-frequency problem in Naive Bayes modeling is to add 1 to the count for every attribute value-class combination?
A. True
B. False
A
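A minimal sketch of this Laplace fix in Python, with toy counts invented for illustration: add 1 to every attribute value-class count so no conditional probability is ever zero.

    from collections import Counter

    # Hypothetical (attribute value, class) pairs from a training set.
    data = [("sunny", "yes"), ("sunny", "no"), ("rainy", "yes"), ("overcast", "yes")]
    values = {"sunny", "rainy", "overcast"}

    counts = Counter(data)
    class_counts = Counter(c for _, c in data)

    def p_value_given_class(value, cls):
        # Laplace estimator: +1 per value-class combination, so the
        # denominator grows by the number of distinct attribute values.
        return (counts[(value, cls)] + 1) / (class_counts[cls] + len(values))

    print(p_value_given_class("overcast", "no"))  # nonzero despite a zero count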
How does Naive Bayes modeling handle missing values?
A. Just ignore missing attributes in the calculation
B. It's not a problem at all because of the zero-frequency fix
C. All of them
D. None of them
A
Naive Bayes modeling usually handles numeric attributes by assuming that they have a "normal" or "Gaussian" probability distribution and then computing the most likely parameters of this Gaussian, like the mean and standard deviation?
A. True
B. False
A
Popular applications of Naive Bayes classification are:
A. Email spam filtering
B. News similarity
C. Product review evaluation
D. All of them
D
Suppose you use Naive Bayes classification for email spam filtering, and you learn that the probabilities of the events 'spam' and 'not_spam' after evidence are 0.4 and 0.6 respectively. The email is classified as:
A. Spam
B. Not spam
B
The general steps for constructing decision trees are: first, select an attribute to place at the root node and make one branch for each possible value; then split the example set into subsets, one for every value of the attribute; finally, repeat recursively for each branch, using only the instances that reach the branch
A. True
B. False
A
Which dataset has the largest information impurity?
A. A set of 5 samples that all have the same class
B. A set of 10 samples, 5 of class A and 5 of class B
C. A set of 20 samples that all have the same class
D. All of them
B
Which statements are not true about decision trees?
A. An internal node is a test on an attribute or which feature to split on
B. A branch represents an outcome of the test
C. A leaf node represents a class label or class label distribution
D. All of them
D
______ is the entropy of the class distribution, and it represents the expected amount of information that would be needed to classify an instance into one class
A. Impurity measure
B. Information gain
A
Some criteria for attribute selection are:
A. The one that will result in the largest tree
B. Choose an attribute that has lowest impurity
C. An attribute whose information gain is greatest among others
D. B and C
E. All of them
D
When does the process of building a decision tree stop?
A. When the data cannot be split any further or there is no information gain at any leaf
B. When the information gain on one feature is greater than 0
C. When the information gain on every feature is greater than 0
D. When no information gain on one feature
A
Entropy (impurity measure) is ______ when all classes are equally likely and ______ when one of the classes has probability 1
A. Maximal, minimal
B. Minimal, maximal
A
What is the formula for computing information gain?
A. Information before splitting - information after splitting
B. Information after splitting - information before splitting
A
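A minimal sketch of both quantities in Python (the split below is invented for illustration): entropy of a class distribution, and information gain as the information before the split minus the weighted information after it.

    from collections import Counter
    from math import log2

    def entropy(labels):
        # Entropy of the class distribution: 0 when the set is pure,
        # maximal when all classes are equally likely.
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def information_gain(parent, children):
        # Information before splitting minus weighted information after.
        n = len(parent)
        after = sum(len(ch) / n * entropy(ch) for ch in children)
        return entropy(parent) - after

    parent = ["A"] * 5 + ["B"] * 5
    print(entropy(parent))  # 1.0 bit: maximal impurity
    print(information_gain(parent, [["A"] * 4 + ["B"], ["B"] * 4 + ["A"]]))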
Entropy is a function that satisfies three properties, which are:
A. When node is pure, measure should be zero
B. When impurity is maximal (i.e. all classes equally likely), measure should be maximal
C. Measure should obey multistage property (i.e. decisions can be made in several stages)
D. All of them
D
In decision tree, information gain is biased towards choosing attributes with a small number of values
A. True
B. False
B
The overall effect of having attributes with a large number of distinct values is:
A. The information gain measure tends to prefer attributes with large numbers of possible values
B. This may result in overfitting (selection of an attribute that is non-optimal for prediction)
C. All of them
D. None of them
C
The purpose of the gain ratio, or weighted information gain, is to make the importance of an attribute decrease as its intrinsic information gets larger
A. True
B. False
A
Some limitations of the decision tree algorithm are:
A. Many implementations use a divide-and-conquer method for construction, so the solution is not globally optimal
B. They potentially overfit the data
C. All of them
D. None of them
C
Which statements are true about "gain ratio" ?
A. Should be large when data is evenly spread and small when all data belong to one branch
C. Takes number and size of branches into account when choosing an attribute
B. A modification of the information gain that reduces its bias on high-branch attributes
D. All of them
D
Gini impurity, information gain ratio or information entropy are all functions for:
A. Impurity measure
B. The expected information gain (or the change in information entropy from a prior state)
A
The ID3 algorithm is an extension of the C4.5 algorithm with some improvements like numeric attributes, missing values, and a pruning strategy?
A. True
B. False
B
Which statements are false about how C4.5 handles numeric attributes?
A. The standard method is binary splits
B. Unlike nominal attributes, every numeric attribute has many possible split points
C. To choose "best" split point, evaluate info gain for every possible split point of attribute
D. This process is less computationally demanding than for nominal attributes
D
Which statements are true when comparing binary (on numeric attributes) vs. multiway (on nominal attributes) splits in C4.5?
A. Splitting (multi-way) on a nominal attribute exhausts all information in that attribute
B. A numeric attribute may be tested several times along a path in the tree
C. A disadvantage of trees using binary splits is that they are messy and difficult to understand
D. All of them
D
Because decision tree models tend to ______, one way to solve this problem is to prune the tree
A. Overfit
B. Underfit
A
______ is a pruning strategy to prevent overfitting in a decision tree.
A. Postpruning, which takes a fully-grown decision tree and discards unreliable parts
B. Postpruning, which stops growing a branch when information becomes unreliable
A
______ is a pruning strategy to prevent overfitting in a decision tree.
A. Prepruning, which takes a fully-grown decision tree and discards unreliable parts
B. Prepruning, which stops growing a branch when information becomes unreliable
B
The reason most decision tree builders prefer postpruning over prepruning is that situations occur in which two attributes individually seem to have nothing to contribute but are powerful predictors when combined
A. True
B. False
A
Which statements are true about postpruning?
A. Two pruning operations are subtree replacement and subtree raising
B. To decide whether or not to prune, some strategies are error estimation, significance testing, ...
C. Some subtrees might be due to chance effects
D. All of them
D
Which statements are true about prepruning in decision trees?
A. Based on statistical significance test
B. Most popular test is chi-squared test
C. Quinlan's classic tree learner ID3 used chi-squared test in addition to information gain
D. All of them
D
In decision tree optimization, ______ is a potentially time-consuming operation
A. Subtree raising
B. Subtree replacement
A
The Classification And Regression Tree (CART) algorithm is used for ______ modeling problems
A. Classification tree
B. Regression tree
C. Classification tree or regression tree
C
Which statements are true about the CART decision-tree algorithm?
A. Non-parametric (independent of the statistical distribution of the training data)
B. Can model continuous (regression trees) or categorical (classification trees) target variables
C. Can use continuous and non-continuous predictor variables
D. All of them
D
The CART decision tree algorithm is multivariate?
A. True
B. False
B
In decision tree CART model building, at each node in the tree the remaining data (from training points) are split into two groups that have maximum dissimilarity
A. True
B. False
A
Which features does decision tree CART include?
A. Automatically selects relevant fields
B. No data preprocessing needed
C. Missing value tolerant
D. All of them
D
In decision tree CART model building, which metric is used by CART?
A. Gini impurity
B. Information gain (based on the concept of entropy)
A
Which criteria does decision tree CART use to optimize tree selection?
A. Deciding on the best tree after growing and pruning
B. Balancing simplicity against accuracy
C. All of them
D. None of them
C
In decision tree CART model building, how does it handle missing values?
A. It treats missing as a distinct categorical value
B. It deletes cases that have missing values
C. It freezes the case in the node in which the missing splitter is encountered
D. It allows cases with a missing split variable to follow the majority
E. It uses a more refined method, a surrogate
E
A primary splitter is the best splitter of a node, so a surrogate (a method used to handle missing values in CART) is a splitter that splits in a fashion similar to the primary
A. True
B. False
A
The CART algorithm is a decision tree algorithm?
A. True
B. False
A
Which statements are false about linear regression?
A. The goal is to find a line or a linear combination of its attributes
B. Work most naturally with numeric attributes
C. Weights are calculated from the training data
D. To find the best line, the line must maximize the cost function
D
The squared error function equals zero for linear regression when:
A. The line passes through all points (instances) in the training dataset
B. The line does not pass through all points (instances) in the training dataset
A
Suppose you use gradient descent to find the linear regression model parameters. If its learning rate is too small, what will happen?
A. Gradient descent can be slow to converge
B. It may fail to converge or even diverge
A
What is the purpose of gradient descent algorithm?
A. To find the point at local minimum or global minimum of the cost function
B. To find the point at local maximum or global maximum of the cost function
A
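A minimal sketch of gradient descent minimizing the squared-error cost of a one-parameter linear model (data and learning rate invented for illustration; as the previous question notes, too small a learning rate makes this loop converge slowly):

    # Toy data following y ≈ 2x; we fit y = w*x by gradient descent.
    xs = [1.0, 2.0, 3.0]
    ys = [2.1, 3.9, 6.2]

    w, lr = 0.0, 0.05  # initial weight and learning rate (step size)
    for _ in range(200):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad  # step downhill toward the cost minimum
    print(w)  # close to 2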
Logistic Regression differs from Linear Regression because the output of a Logistic Regression model ranges from -∞ to +∞
A. True
B. False
B
Which statements are true about logistic regression?
A. The output of the model is the estimated probability of class 1 given an instance as input
B. The model is represented by a logistic (sigmoid) function
C. Decision boundary for two-class logistic regression is where probability equals 0.5
D. All of them
D
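A minimal sketch of that model form: a sigmoid squashes a linear combination of the features into a probability, and the decision boundary sits where that probability equals 0.5 (the weights here are invented for illustration):

    from math import exp

    def sigmoid(z):
        # Maps any real z into (0, 1): the estimated probability of class 1.
        return 1.0 / (1.0 + exp(-z))

    # Hypothetical learned parameters for a two-feature model.
    w = [1.5, -0.8]
    b = 0.2

    def predict(x):
        p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
        return p, ("class 1" if p >= 0.5 else "class 0")  # boundary at p = 0.5

    print(predict([2.0, 1.0]))  # estimated probability and predicted class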
The goal of maximum log-likelihood in logistic regression is to find the parameters of the decision boundary line so that the cost function is minimized?
A. True
B. False
A
Which statements are true about instance-based learning?
A. In instance-based learning the distance function defines what is learned
B. Most instance-based schemes use Euclidean distance
C. For nominal attributes the distance is set to 1 if values are different, 0 if they are equal
D. All of them
D
The goal of normalization is to make every feature have the same scale so each feature is equally important
A. True
B. False
A
The ______ algorithm is of type instance-based learning and the ______ algorithm is of type clustering?
A. K nearest neighbor, k means
B. K means, k nearest neighbor
A
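A minimal sketch of k-nearest-neighbor with Euclidean distance, min-max normalizing each feature first so both features weigh equally, as in the normalization question above (data invented for illustration; math.dist requires Python 3.8+):

    from collections import Counter
    from math import dist  # Euclidean distance

    # Hypothetical labeled instances: ([feature1, feature2], class).
    train = [([1.0, 200.0], "A"), ([2.0, 180.0], "A"),
             ([8.0, 20.0], "B"), ([9.0, 40.0], "B")]

    # Min-max normalize each feature column to [0, 1].
    cols = list(zip(*(x for x, _ in train)))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    norm = lambda x: [(v - l) / (h - l) for v, l, h in zip(x, lo, hi)]

    def knn(query, k=3):
        # Lazy learning: all the work happens at prediction time.
        nearest = sorted(train, key=lambda t: dist(norm(t[0]), norm(query)))[:k]
        return Counter(c for _, c in nearest).most_common(1)[0][0]

    print(knn([2.5, 190.0]))  # "A"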
What is the definition of the training set?
A. Used by one or more learning schemes to come up with classifiers
B. Used to optimize parameters of those classifiers, or to select a particular one
C. Used to calculate the error rate of the final, optimized, method
D. None of these
A
What is the definition of the test set? It's the set of independent instances:
A. Used by one or more learning schemes to come up with classifiers
B. Used to optimize parameters of those classifiers, or to select a particular one
C. Used to calculate the error rate of the final, optimized, method
D. None of these
C
What is the definition of the validation set? It's the set of independent instances:
A. Used by one or more learning schemes to come up with classifiers
B. Used to optimize parameters of those classifiers, or to select a particular one
C. Used to calculate the error rate of the final, optimized, method
D. None of these
B
Is it true that the training, validation, and test sets must be chosen independently (the three sets must be mutually exclusive) to better model the real world?
A. True
B. False
A
What is the difference between the holdout and cross-validation methods for resolving the problem where we only have a single limited dataset?
A. The holdout method reserves a certain amount for testing, and uses the remainder for training
B. The cross-validation method reserves a certain amount for testing, and uses the remainder for training
C. In cross-validation, you decide on a fixed number n of partitions of the data. Then the data is split into n approximately equal partitions: each in turn is used for testing and the remainder is used for training
D. B and C
E. A and C
D
The problems with the cross-validation method for dataset splitting are that it might not be representative or that its sets overlap
A. True
B. False
B
How do you calculate the overall error estimate when using the 10-fold cross-validation method for dataset splitting?
A. It's the average of the 10 error estimates
B. It's the sum of the 10 error estimates
C. It's the maximum one among 10 error estimates
D. It's the minimum one among 10 error estimates
A
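A minimal sketch of that procedure: split the data into k non-overlapping folds, test on each fold in turn while training on the rest, and average the k error estimates. The majority-class "learner" below is a stand-in invented for illustration; any learner fits the same loop.

    from collections import Counter

    def cross_validate(data, train_fn, error_fn, k=10):
        folds = [data[i::k] for i in range(k)]  # k roughly equal partitions
        errors = []
        for i in range(k):
            test = folds[i]
            train = [x for j, f in enumerate(folds) if j != i for x in f]
            errors.append(error_fn(train_fn(train), test))
        return sum(errors) / k  # overall estimate: average of the k errors

    # Toy usage: a "classifier" that always predicts the majority training label.
    data = [("x", "A")] * 7 + [("x", "B")] * 3
    majority = lambda train: Counter(c for _, c in train).most_common(1)[0][0]
    err = lambda model, test: sum(model != c for _, c in test) / len(test)
    print(cross_validate(data, majority, err, k=5))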
The test set is used very similarly to the validation set, except it's never a part of building or tuning your model?
A. True
B. False
A
Describe the properties of leave-one-out cross-validation?
A. The greatest possible amount of data is used for training in each case
B. The procedure is deterministic: no random sampling is involved
C. Very computationally expensive
D. It guarantees a non-stratified sample because there is only one instance in the test set
E. All of them
F. A, B, C only
E
The term "hyperparameter" refers to ?
A. Parameter that can be tuned to optimize the performance of a learning algorithm like k in k-nearest neighbour classifier
B. From basic parameter that is part of a model, such as a coefficient in a logistic regression
A
How do you get a useful estimate of performance for different parameter values?
A. Build models using different values of k on the new, smaller training set and evaluate them on the validation set
B. Build models using different values of k on the new, smaller training set and evaluate them on the test set
A
______ measures how many classifications your algorithm got correct out of every classification it made
A. Accuracy measure
B. Recall measure
C. Precision measure
D. F1-score measure
A
______ is the percentage of relevant items that your classifier found and calculated as TP/(TP + FN)
A. Accuracy measure
B. Recall measure
C. Precision measure
D. F1-score measure
B
In a confusion matrix, the two kinds of errors are (choose 2):
A. False positive
B. True positive
C. False negative
D. True negative
A,C
In a confusion matrix, the two kinds of correct classifications are (choose 2):
A. False positive
B. True positive
C. False negative
D. True negative
B,D
______ is the percentage of items your classifier found that were actually relevant and calculated as TP/(TP + FP)
A. Accuracy measure
B. Recall measure
C. Precision measure
D. F1-score measure
C
______ is a measure that combines precision and recall; it is the harmonic mean of precision and recall
A. Accuracy measure
B. Recall measure
C. Precision measure
D. F1-score measure
D
Precision and recall are tied to each other. As one goes up, the other will go up too?
A. True
B. False
B
Why is the F1 score calculated using the harmonic mean?
A. The harmonic mean makes the F1 score low when either precision or recall is low
B. The F1 score is calculated using the arithmetic mean
C. The harmonic mean will consider precision and recall equally.
D. The harmonic mean takes less time to compute than the arithmetic mean
A
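A minimal sketch computing all four measures above from confusion-matrix counts (the counts are invented for illustration):

    # Hypothetical confusion-matrix counts.
    tp, fp, fn, tn = 40, 10, 20, 30

    accuracy  = (tp + tn) / (tp + fp + fn + tn)  # correct out of all predictions
    recall    = tp / (tp + fn)                   # relevant items that were found
    precision = tp / (tp + fp)                   # found items that were relevant
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

    print(accuracy, precision, recall, f1)
    # The harmonic mean keeps f1 low whenever either precision or recall is low.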
In practice, different types of classification errors often incur different costs?
A. True
B. False
A
The idea behind cost-sensitive classification (learning) is to take costs into account and make predictions that aim to minimize ______ instead of minimizing ______
A. The overall costs, misclassifications
B. Misclassifications, the overall costs
A
Most learning schemes do not perform cost-sensitive learning, so simple methods for cost-sensitive learning are:
A. Re-sampling of instances according to costs
B. Weighting of instances according to costs
C. All of them
D. None of them
C
______ is the principal and most commonly used measure for evaluating numeric prediction
A. Mean-squared error
B. Mean absolute error
A
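A minimal sketch of both error measures for a numeric prediction (values invented for illustration):

    actual    = [3.0, 5.0, 2.5, 7.0]
    predicted = [2.8, 5.4, 2.0, 8.0]

    n = len(actual)
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n  # mean-squared error
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n    # mean absolute error
    print(mse, mae)  # MSE penalizes large errors more heavily than MAE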
The MDL (minimum description length) principle is defined as the space required to describe a theory plus the space required to describe the theory's mistakes, in which:
A. The theory is the classifier
B. The mistakes are the errors on the training data
C. All of them
D. None of them
C
Is it true that in practice Data Preparation is estimated to consume 70-80% of the overall effort?
A. True
B. False
A
Main data cleansing steps include:
A. Data acquisition and metadata
B. Converting nominal to numeric
C. Missing values
D. Discretization
E. All of them
E
Understanding data is an important task, so in terms of its relevance the typical questions to ask are:
A. What data is available for the task ?
B. Is this data relevant ?
C. Is additional relevant data available?
D. What is the number of attributes and features?
E. A, B, C
F. All of them
E
In data preparation process, several ways to handle missing values are:
A. Ignore records
B. Treat missing value as a separate value
C. Replace with zero, mean, median values
D. Try to impute the missing values from other fields
E. All of them
E
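A minimal sketch of the replace-with-mean/median strategies from the options above (pure Python; None marks a missing value here for illustration):

    from statistics import mean, median

    column = [4.0, None, 7.0, 5.0, None, 6.0]
    present = [v for v in column if v is not None]

    # Replace missing entries with the mean (or median) of the observed values.
    with_mean   = [v if v is not None else mean(present) for v in column]
    with_median = [v if v is not None else median(present) for v in column]
    print(with_mean)
    print(with_median)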
In the data preparation process, what is the data acquisition and metadata step?
A. The process of getting data, which may come from many sources like database systems, flat files, and spreadsheets, and filling in its metadata
B. Date fields come in many formats, so they need to be transformed
C. The process of converting nominal to numeric types, like binary fields
D. The process of solving a problem where some methods require discrete values, like Naive Bayes classification, but some features' values are not
A
In the data preparation process, what is the data discretization step?
A. The process of getting data, which may come from many sources like database systems, flat files, and spreadsheets, and filling in its metadata
B. Date fields come in many formats, so they need to be transformed
C. The process of converting nominal to numeric types, like binary fields
D. The process of solving a problem where some methods require discrete values, like Naive Bayes classification, but some features' values are not
D
In the data preparation process, some criteria for field selection are:
A. Remove fields with no or little variability
B. Remove a field where almost all values are the same
C. Remove false predictors which are fields correlated to target behavior
D. All of them
D
In the field selection step, false predictors are removed. Suppose the output of the model is to predict the likelihood of passing a course; which field should be removed?
A. The student's final grade
B. The sleeping hours
C. The studying hours
D. The student's final grade of previous session
A
A manual approach to finding false predictors is to build an initial decision-tree model and consider very strongly predictive fields (a field that by itself provides close to 100% accuracy) as "suspects"
A. True
B. False
A