doc: fix broken website (#1251)

microsoft · Nov 10, 2021 · c0b516b
1 parent 6459f3c

Showing 46 changed files with 555 additions and 579 deletions.
diff --git a/...gnitive_services/CognitiveServices - Create a Multilingual Search Engine from Forms.ipynb b/...gnitive_services/CognitiveServices - Create a Multilingual Search Engine from Forms.ipynb
diff --git a/pipeline.yaml b/pipeline.yaml
@@ -461,18 +461,23 @@ jobs:
           sbt convertNotebooks
     - bash: |
         yarn install
+        cd website
+        yarn
+        yarn build
+      displayName: 'yarn install and build'
+    - bash: |
         git config --global user.name "${GH_NAME}"
         git config --global user.email "${GH_EMAIL}"
         git checkout -b main
         echo "machine github.com login ${GH_NAME} password ${GH_TOKEN}" > ~/.netrc
         cd website
-        yarn && GIT_USER="${GH_NAME}" yarn deploy
+        GIT_USER="${GH_NAME}" yarn deploy
       condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/master'))
       env:
         GH_NAME: $(gh-name)
         GH_EMAIL: $(gh-email)
         GH_TOKEN: $(gh-token)
-      displayName: 'yarn install and build'
+      displayName: 'yarn deploy'
 
 
 - job: UnitTests

diff --git a/website/docs/features/responsible_ai/Data Balance Analysis.md b/website/docs/features/responsible_ai/Data Balance Analysis.md
@@ -19,7 +19,7 @@ In summary, Data Balance Analysis, used as a step for building ML models has the
 
 ## Examples
 
-* [Data Balance Analysis - Adult Census Income](../../../examples/responsible_ai/DataBalanceAnalysis%20-%20Adult%20Census%20Income)
+* [Data Balance Analysis - Adult Census Income](../../../features/responsible_ai/DataBalanceAnalysis%20-%20Adult%20Census%20Income)
 
 ## Usage
 

diff --git a/website/docs/features/responsible_ai/Model Interpretation on Spark.md b/website/docs/features/responsible_ai/Model Interpretation on Spark.md
@@ -26,9 +26,9 @@ Both explainers extends from `org.apache.spark.ml.Transformer`. After setting up
 
 To see examples of model interpretability on Spark in action, take a look at these sample notebooks:
 
-- [Tabular SHAP explainer](../../../examples/responsible_ai/Interpretability%20-%20Tabular%20SHAP%20explainer)
+- [Tabular SHAP explainer](../../../features/responsible_ai/Interpretability%20-%20Tabular%20SHAP%20explainer)
 - [Image explainers](../../../features/responsible_ai/Interpretability%20-%20Image%20Explainers)
-- [Text explainers](../../../examples/responsible_ai/Interpretability%20-%20Text%20Explainers)
+- [Text explainers](../../../features/responsible_ai/Interpretability%20-%20Text%20Explainers)
 
 |                        | Tabular models              | Vector models             | Image models            | Text models           |
 |------------------------|-----------------------------|---------------------------|-------------------------|-----------------------|

diff --git a/website/docs/reference/datasets.md b/website/docs/reference/datasets.md
@@ -30,7 +30,7 @@ tab-separated file with 2 columns (`rating`, `text`) and 10000 rows.  The
 contains free-form text strings in English language.  You can use
 `synapse.ml.TextFeaturizer` to convert the text into feature vectors for machine
 learning models ([see
-example](../../examples/text_analytics/TextAnalytics%20-%20Amazon%20Book%20Reviews/)).
+example](../../features/other/TextAnalytics%20-%20Amazon%20Book%20Reviews/)).
 
 The example dataset is available
 [here](https://mmlspark.azureedge.net/datasets/BookReviewsFromAmazon10K.tsv);
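
Reviewer note: the three documentation hunks above all repoint relative links from `examples/...` to `features/...`. A minimal sketch of how such links could be listed for manual review, assuming plain `[label](target)` Markdown syntax. The helper name `relative_links` and the regular expressions are illustrative and not part of this commit; the sketch only extracts and URL-decodes targets and does not reproduce how the site generator resolves them.

```python
import re
from urllib.parse import unquote

# Matches inline Markdown links of the form [label](target); illustrative only.
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")
# Matches targets that start with a URL scheme such as https://
ABSOLUTE_RE = re.compile(r"^[a-z][a-z0-9+.-]*://", re.IGNORECASE)


def relative_links(markdown_text):
    """Return (label, decoded_target) pairs for links that are not absolute URLs."""
    links = []
    for label, target in LINK_RE.findall(markdown_text):
        if ABSOLUTE_RE.match(target):
            continue  # skip absolute URLs
        links.append((label, unquote(target)))
    return links


sample = (
    "- [Text explainers](../../../features/responsible_ai/"
    "Interpretability%20-%20Text%20Explainers)\n"
    "- [here](https://mmlspark.azureedge.net/datasets/BookReviewsFromAmazon10K.tsv)"
)
for label, target in relative_links(sample):
    print(label, "->", target)
```

Decoding `%20` makes it easier to compare a target against the file names shown in the diff headers.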

diff --git a/website/src/pages/index.js b/website/src/pages/index.js
@@ -15,7 +15,7 @@ const snippets = [
   {
     label: "Cognitive Services",
     further:
-      "docs/features/CognitiveServices%20-%20Overview#text-analytics-sample",
+      "docs/features/cognitive_services/CognitiveServices%20-%20Overview#text-analytics-sample",
     config: `from synapse.ml.cognitive import *
 
 sentiment_df = (TextSentiment()

diff --git a/website/versioned_docs/version-0.9.1/examples/about.md b/website/versioned_docs/version-0.9.1/examples/about.md
diff --git a/...tion - Adult Census with Vowpal Wabbit.md → ...tion - Adult Census with Vowpal Wabbit.md b/...tion - Adult Census with Vowpal Wabbit.md → ...tion - Adult Census with Vowpal Wabbit.md
diff --git a/...fication/Classification - Adult Census.md → ...fication/Classification - Adult Census.md b/...fication/Classification - Adult Census.md → ...fication/Classification - Adult Census.md
@@ -43,9 +43,7 @@ and so on.  The parameter `numFeatures` controls the number of hashed features.
 
 ```python
 from synapse.ml.train import TrainClassifier
-
 from pyspark.ml.classification import LogisticRegression
-
 model = TrainClassifier(model=LogisticRegression(), labelCol="income", numFeatures=256).fit(train)
 ```
 

diff --git a/...ification - Before and After SynapseML.md → ...ification - Before and After SynapseML.md b/...ification - Before and After SynapseML.md → ...ification - Before and After SynapseML.md
diff --git a/...- Twitter Sentiment with Vowpal Wabbit.md → ...- Twitter Sentiment with Vowpal Wabbit.md b/...- Twitter Sentiment with Vowpal Wabbit.md → ...- Twitter Sentiment with Vowpal Wabbit.md
diff --git a/...iveServices - Celebrity Quote Analysis.md → ...iveServices - Celebrity Quote Analysis.md b/...iveServices - Celebrity Quote Analysis.md → ...iveServices - Celebrity Quote Analysis.md
diff --git a/..._services/CognitiveServices - Create a Multilingual Search Engine from Forms.md b/..._services/CognitiveServices - Create a Multilingual Search Engine from Forms.md
@@ -0,0 +1,165 @@
+---
+title: CognitiveServices - Create a Multilingual Search Engine from Forms
+hide_title: true
+status: stable
+---
+```python
+import os
+key = os.environ['VISION_API_KEY']
+search_key = os.environ['AZURE_SEARCH_KEY']
+translator_key = os.environ['TRANSLATOR_KEY']
+
+search_service = "mmlspark-azure-search"
+search_index = "form-demo-index"
+```
+
+
+```python
+from pyspark.sql.functions import udf
+from pyspark.sql.types import StringType
+
+def blob_to_url(blob):
+  [prefix, postfix] = blob.split("@")
+  container = prefix.split("/")[-1]
+  split_postfix = postfix.split("/")
+  account = split_postfix[0]
+  filepath = "/".join(split_postfix[1:])
+  return "https://{}/{}/{}".format(account, container, filepath)
+
+
+df2 = (spark.read.format("binaryFile")
+       .load("wasbs://ignite2021@mmlsparkdemo.blob.core.windows.net/forms/*")
+       .select("path")
+       .coalesce(24)
+       .limit(10)
+       .select(udf(blob_to_url, StringType())("path").alias("url"))
+       .cache()
+      )
+
+```
+
+
+```python
+display(df2)
+```
+
+
+```python
+displayHTML("""
+<embed src="https://mmlsparkdemo.blob.core.windows.net/ignite2021/form_svgs/Invoice11205.svg" width="40%"/>
+""")
+```
+
+
+```python
+from synapse.ml.cognitive import AnalyzeInvoices
+
+analyzed_df = (AnalyzeInvoices()
+  .setSubscriptionKey(key)
+  .setLocation("eastus")
+  .setImageUrlCol("url")
+  .setOutputCol("invoices")
+  .setErrorCol("errors")
+  .setConcurrency(5)
+  .transform(df2)
+  .cache())
+
+```
+
+
+```python
+display(analyzed_df)
+```
+
+
+```python
+from synapse.ml.cognitive import FormOntologyLearner
+
+organized_df = (FormOntologyLearner()
+  .setInputCol("invoices")
+  .setOutputCol("extracted")
+  .fit(analyzed_df.limit(10))
+  .transform(analyzed_df)
+  .select("url", "extracted.*")
+  .cache())
+```
+
+
+```python
+display(organized_df)
+```
+
+
+```python
+from pyspark.sql.functions import explode, col
+itemized_df = (organized_df
+        .select("*", explode(col("Items")).alias("Item"))
+        .drop("Items")
+        .select("Item.*", "*")
+        .drop("Item"))
+
+```
+
+
+```python
+display(itemized_df)
+```
+
+
+```python
+display(itemized_df.where(col("ProductCode") == 6))
+```
+
+
+```python
+from synapse.ml.cognitive import Translate
+
+translated_df = (Translate()
+    .setSubscriptionKey(translator_key)
+    .setLocation("eastus")
+    .setTextCol("Description")
+    .setErrorCol("TranslationError")
+    .setOutputCol("output")
+    .setToLanguage(["zh-Hans", "fr", "ru", "cy"])
+    .setConcurrency(5)
+    .transform(itemized_df)
+    .withColumn("Translations", col("output.translations")[0])
+    .drop("output", "TranslationError")
+    .cache())
+
+```
+
+
+```python
+display(translated_df)
+```
+
+
+```python
+from synapse.ml.cognitive import *
+from pyspark.sql.functions import monotonically_increasing_id, lit
+
+(translated_df
+  .withColumn("DocID", monotonically_increasing_id().cast("string"))
+  .withColumn("SearchAction", lit("upload"))
+  .writeToAzureSearch(
+    subscriptionKey=search_key,
+    actionCol="SearchAction",
+    serviceName=search_service,
+    indexName=search_index,
+    keyCol="DocID")
+)
+
+```
+
+
+```python
+import requests
+url = 'https://{}.search.windows.net/indexes/{}/docs/search?api-version=2019-05-06'.format(search_service, search_index)
+requests.post(url, json={"search": "door"}, headers = {"api-key": search_key}).json()
+```
+
+
+```python
+
+```
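
Reviewer note: the `blob_to_url` helper in the notebook added above turns a `wasbs://` path into an HTTPS URL by splitting on `@` and rearranging the container and account parts. A standalone sketch of that same logic without Spark, so it can be checked in isolation. The container, account, and file names in the sample path are hypothetical, and the function assumes the path contains exactly one `@`.

```python
def blob_to_url(blob):
    """Convert wasbs://<container>@<account-host>/<path> to https://<account-host>/<container>/<path>."""
    # Assumes exactly one "@" in the path; otherwise the unpacking below raises ValueError.
    prefix, postfix = blob.split("@")
    container = prefix.split("/")[-1]        # text after "wasbs://"
    split_postfix = postfix.split("/")
    account = split_postfix[0]               # storage account host name
    filepath = "/".join(split_postfix[1:])   # remaining path inside the container
    return "https://{}/{}/{}".format(account, container, filepath)


# Hypothetical path, used only to illustrate the transformation.
sample_path = "wasbs://mycontainer@myaccount.blob.core.windows.net/forms/invoice1.pdf"
print(blob_to_url(sample_path))
```

In the notebook the same function is wrapped in a `udf` and applied to the `path` column, so each row carries a URL that the form analysis step can fetch.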
diff --git a/.../features/CognitiveServices - Overview.md → ..._services/CognitiveServices - Overview.md b/.../features/CognitiveServices - Overview.md → ..._services/CognitiveServices - Overview.md
@@ -85,20 +85,13 @@ To get started, we'll need to add this code to the project:
 
 ```python
 from pyspark.sql.functions import udf, col
-
 from synapse.ml.io.http import HTTPTransformer, http_udf
-
 from requests import Request
-
 from pyspark.sql.functions import lit
-
 from pyspark.ml import PipelineModel
-
 from pyspark.sql.functions import col
-
 import os
 
-
 ```
 
 
@@ -122,22 +115,13 @@ if os.environ.get("AZURE_SERVICE", None) == "Microsoft.ProjectArcadia":
 ```python
 from synapse.ml.cognitive import *
 
-
-
 # A general Cognitive Services key for Text Analytics, Computer Vision and Form Recognizer (or use separate keys that belong to each service)
-
 service_key = os.environ["COGNITIVE_SERVICE_KEY"]
-
 # A Bing Search v7 subscription key
-
 bing_search_key = os.environ["BING_IMAGE_SEARCH_KEY"]
-
 # An Anomaly Detector subscription key
-
 anomaly_key = os.environ["ANOMALY_API_KEY"]
-
 # A Translator subscription key
-
 translator_key = os.environ["TRANSLATOR_KEY"]
 ```
 

diff --git a/...itiveServices - Predictive Maintenance.md → ...itiveServices - Predictive Maintenance.md b/...itiveServices - Predictive Maintenance.md → ...itiveServices - Predictive Maintenance.md
diff --git a/...cs/version-0.9.1/features/http/HttpOnSpark - Working with Arbitrary Web APIs.md b/...cs/version-0.9.1/features/http/HttpOnSpark - Working with Arbitrary Web APIs.md