-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathconfig.py
62 lines (57 loc) · 3.06 KB
/
config.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
# Presentation of the challenge
context_markdown = """
Manufacturing process feature selection and categorization
"""
content_markdown = """
Abstract: Data from a semi-conductor manufacturing process
Data Set Characteristics: Multivariate
Number of Instances: 1567
Area: Computer
Attribute Characteristics: Real
Number of Attributes: 591
Date Donated: 2008-11-19
Associated Tasks: Classification, Causal-Discovery
Missing Values? Yes
A complex modern semi-conductor manufacturing process is normally under consistent
surveillance via the monitoring of signals/variables collected from sensors and or
process measurement points. However, not all of these signals are equally valuable
in a specific monitoring system. The measured signals contain a combination of
useful information, irrelevant information as well as noise. It is often the case
that useful information is buried in the latter two. Engineers typically have a
much larger number of signals than are actually required. If we consider each type
of signal as a feature, then feature selection may be applied to identify the most
relevant signals. The Process Engineers may then use these signals to determine key
factors contributing to yield excursions downstream in the process. This will
enable an increase in process throughput, decreased time to learning and reduce the
per unit production costs.
"""
#------------------------------------------------------------------------------------------------------------------#
# Guide for the participants to get X_train, y_train and X_test
# The google link can be placed in your google drive => get the shared links and place them here.
data_instruction_commands = """
In order to get the data simply run the following command:
```python
df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/secom/secom.data', sep=' ', header=None)
```
Please ask the admin in order to get the target and the random seed used for train/test split.
"""
# Target on test (hidden from the participants)
Y_TEST_GOOGLE_PUBLIC_LINK = 'https://drive.google.com/file/d/1-3X4eN_xk00GY4Bf6YU4mGtvQ8s_MDCQ/view?usp=sharing'
#------------------------------------------------------------------------------------------------------------------#
# Evaluation metric and content
from sklearn.metrics import precision_recall_curve as prauc
GREATER_IS_BETTER = True # example for ROC-AUC == True, for MSE == False, etc.
SKLEARN_SCORER = prauc
SKLEARN_ADDITIONAL_PARAMETERS = {}
evaluation_content = """
The predictions are evaluated according to the PR-AUC score.
You can get it using
```python
from sklearn.metrics import average_precision_score as prauc
```
More details [here](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.average_precision_score.html).
"""
#------------------------------------------------------------------------------------------------------------------#
# leaderboard benchmark score, will be displayed to everyone
BENCHMARK_SCORE = 0.7
#------------------------------------------------------------------------------------------------------------------#