huggingface · lewtun · Apr 5, 2022 · Apr 1, 2022 · Apr 1, 2022 · Apr 2, 2022
diff --git a/chapters/th/_toctree.yml b/chapters/th/_toctree.yml
@@ -11,4 +11,20 @@
 - title: 3. การ fine-tune โมเดลที่ผ่านการเทรนมาแล้ว (pretrained model)
   sections:
   - local: chapter3/1
-    title: บทนำ
+    title: บทนำ
+
+- title: 4. การแบ่งปันโมเดลและ tokenizers
+  sections:
+  - local: chapter4/1
+    title: The Hugging Face Hub
+  - local: chapter4/2
+    title: การใช้งานโมเดลที่ผ่านการเทรนมาแล้ว (pretrained models)
+  - local: chapter4/3
+    title: การแบ่งปันโมเดลที่ผ่านการเทรนมาแล้ว (pretrained models)
+  - local: chapter4/4
+    title: การสร้างการ์ดโมเดล (model card)
+  - local: chapter4/5
+    title: จบพาร์ทที่ 1!
+  - local: chapter4/6
+    title: คำถามท้ายบท
+    quiz: 4
diff --git a/chapters/th/chapter4/1.mdx b/chapters/th/chapter4/1.mdx
@@ -0,0 +1,17 @@
+# The Hugging Face Hub
+
+[Hugging Face Hub](https://huggingface.co/) –- เว็บไซต์หลักของเรา –- เป็นแพลตฟอร์มกลางที่ทุกคนสามารถค้นหาโมเดลและชุดข้อมูลที่ล้ำสมัยที่สุด (state-of-the-art) และสามารถนำไปใช้งาน รวมถึงมีส่วนร่วมได้ เรามีโมเดลที่เปิดให้ใช้ได้อย่างเป็นสาธารณะที่หลากหลายมากกว่า 10,000 โมเดลให้เลือกใช้ ซึ่งในบทนี้เราจะมาเจาะลึกลงในเรื่องของโมเดล และเราจะพูดถึงชุดข้อมูลในบทที่ 5
+
+โมเดลใน hub ของเราไม่ได้มีแค่ 🤗 Transformers หรือ NLP เท่านั้น ยังมีโมเดลจาก [Flair](https://github.com/flairNLP/flair) และ [AllenNLP](https://github.com/allenai/allennlp) สำหรับงาน NLP, [Asteroid](https://github.com/asteroid-team/asteroid) และ [pyannote](https://github.com/pyannote/pyannote-audio) สำหรับงานเสียง และ [timm](https://github.com/rwightman/pytorch-image-models) สำหรับงานภาพ และอื่นๆอีกมากมาย 
+
+โมเดลเหล่านี้ถูกเก็บไว้ในรูปแบบของ Git repository ซึ่งนั่นสามารถทำให้เกิดการกำหนดเวอร์ชั่น (versioning) และการทำซ้ำได้ (reproducibility) การแบ่งปันโมเดลบน hub นั้นหมายถึงการปล่อยโมเดลสู่ชุมชน และทำให้ผู้คนสามารถเข้าถึงโมเดลได้อย่างง่ายดาย รวมถึงช่วยกำจัดความจำเป็นในการเทรนโมเดลด้วยตัวเอง และทำให้สามารถแบ่งปันและใช้งานได้ง่ายอีกด้วย
+
+มากไปกว่านั้นการแบ่งปันโมเดลบน hub ยังเป็นการปล่อย (deploy) API สำหรับใช้โมเดลนั้นในการทำนายผลด้วย ซึ่งทุกคนสามารถนำไปทดสอบใช้งานกับข้อมูลอินพุตที่กำหนดได้เอง (custom inputs) และใช้คู่กับเครื่องมือที่เหมาะสมได้โดยตรงจากหน้าเว็บไซต์ของโมเดลนั้นโดยไม่มีค่าใช้จ่าย
+
+ส่วนที่ดีที่สุดคือ การแบ่งปันและใช้โมเดลสาธารณะบน hub นั้นไม่มีค่าใช้จ่ายโดยสิ้นเชิง! ซึ่งเรามี [แผนแบบจ่าย](https://huggingface.co/pricing) ถ้าหากคุณต้องการจะแบ่งปันโมเดลอย่างเป็นส่วนตัว
+
+วีดีโอข้างล่างนี้แสดงวิธีการนำทางไปหน้าต่างๆใน hub
+
+<Youtube id="XvSGPZFEjDY"/>
+
+ในส่วนนี้คุณจำเป็นที่จะต้องมีบัญชี huggingface.co เพื่อที่จะทำตาม เพราะเราจะมีการสร้างและจัดการ repositories บน Hugging Face Hub: [สร้างบัญชี](https://huggingface.co/join)
diff --git a/chapters/th/chapter4/2.mdx b/chapters/th/chapter4/2.mdx
@@ -0,0 +1,96 @@
+<FrameworkSwitchCourse {fw} />
+
+# การใช้งานโมเดลที่ผ่านการเทรนมาแล้ว (pretrained models)
+
+{#if fw === 'pt'}
+
+<DocNotebookDropdown
+  classNames="absolute z-10 right-0 top-0"
+  options={[
+    {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/chapter4/section2_pt.ipynb"},
+    {label: "Aws Studio", value: "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/chapter4/section2_pt.ipynb"},
+]} />
+
+{:else}
+
+<DocNotebookDropdown
+  classNames="absolute z-10 right-0 top-0"
+  options={[
+    {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/chapter4/section2_tf.ipynb"},
+    {label: "Aws Studio", value: "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/chapter4/section2_tf.ipynb"},
+]} />
+
+{/if}
+
+Model Hub ทำให้การเลือกใช้โมเดลที่เหมาะสมเป็นเรื่องง่ายขนาดที่ว่า การใช้งานมันคู่กับ library ปลายน้ำสามารถเสร็จได้ในการใช้โค้ดเพียงไม่กี่บรรทัดเท่านั้น มาดูวิธีใช้โมเดลพวกนี้และการให้ความช่วยเหลือกับชุมชนกันดีกว่า
+
+สมมุติว่าเรากำลังมองหาโมเดลภาษาฝรั่งเศสที่สามารถเติมคำที่หายไปได้ (mask filling)
+
+<div class="flex justify-center">
+<img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter4/camembert.gif" alt="Selecting the Camembert model." width="80%"/>
+</div>
+
+เราเลือก `camembert-base` checkpoint มาลองใช้ ตัวระบุ `camembert-base` คือทั้งหมดที่เราต้องการในการเริ่มใช้งาน! อย่างที่คุณได้เห็นไปแล้วในบทก่อนหน้านี้ เราสามารถเรียกใช้งานมันได้ด้วยคำสั่ง `pipeline()`:
+
+```py
+from transformers import pipeline
+
+camembert_fill_mask = pipeline("fill-mask", model="camembert-base")
+results = camembert_fill_mask("Le camembert est <mask> :)")
+```
+
+```python out
+[
+  {'sequence': 'Le camembert est délicieux :)', 'score': 0.49091005325317383, 'token': 7200, 'token_str': 'délicieux'}, 
+  {'sequence': 'Le camembert est excellent :)', 'score': 0.1055697426199913, 'token': 2183, 'token_str': 'excellent'}, 
+  {'sequence': 'Le camembert est succulent :)', 'score': 0.03453313186764717, 'token': 26202, 'token_str': 'succulent'}, 
+  {'sequence': 'Le camembert est meilleur :)', 'score': 0.0330314114689827, 'token': 528, 'token_str': 'meilleur'}, 
+  {'sequence': 'Le camembert est parfait :)', 'score': 0.03007650189101696, 'token': 1654, 'token_str': 'parfait'}
+]
+```
+
+อย่างที่คุณเห็น การโหลดโมเดลใน pipeline นั้นง่ายมากๆ สิ่งเดียวที่ควรระวังคือ checkpoint ที่คุณเลือกนั้นควรเหมาะสมกับประเภทของงานที่คุณจะทำ อย่างเช่น ในงานนี้เราโหลด `camembert-base` checkpoint ใน `fill-mask` pipeline ซึ่งเหมาะกับงานที่เราจะใช้อย่างแน่นอน แต่ถ้าเราโหลด checkpoint นี้ใน `text-classification` pipeline ผลลัพธ์จะไม่สมเหตุสมผล เพราะหัวข้อของ `camembert-base` ไม่เหมาะสมกับงานประเภทนี้! เราแนะนำให้ใช้ตัวเลือกประเภทงาน (task selector) ในอินเตอร์เฟซของ Hugging Face Hub เพื่อเลือก checkpoints ที่เหมาะสม
+
+<div class="flex justify-center">
+<img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter4/tasks.png" alt="The task selector on the web interface." width="80%"/>
+</div>
+
+คุณสามารถเรียกใช้ checkpoint โดยการใช้สถาปัตยกรรมโมเดล (model architecture) ได้โดยตรงด้วย:
+
+{#if fw === 'pt'}
+```py
+from transformers import CamembertTokenizer, CamembertForMaskedLM
+
+tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
+model = CamembertForMaskedLM.from_pretrained("camembert-base")
+```
+
+อย่างไรก็ตาม เราแนะนำให้ใช้ [คลาส `Auto*`](https://huggingface.co/transformers/model_doc/auto.html?highlight=auto#auto-classes) แทน เพราะว่ามันเป็นคลาสที่สามารถใช้ได้กับสถาปัตยกรรมหลายประเภท (design architecture-agnostic) ในขณะที่โค้ดก่อนหน้านี้จำกัดผู้ใช้อยู่กับ checkpoints ที่สามารถโหลดได้เฉพาะกับสถาปัตยกรรมแบบ CamemBERT การใช้คลาส `Auto*` นั้นทำให้การเปลี่ยน checkpoints เป็นเรื่องง่าย:
+
+```py
+from transformers import AutoTokenizer, AutoModelForMaskedLM
+
+tokenizer = AutoTokenizer.from_pretrained("camembert-base")
+model = AutoModelForMaskedLM.from_pretrained("camembert-base")
+```
+{:else}
+```py
+from transformers import CamembertTokenizer, TFCamembertForMaskedLM
+
+tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
+model = TFCamembertForMaskedLM.from_pretrained("camembert-base")
+```
+
+อย่างไรก็ตาม เราแนะนำให้ใช้ [คลาส `TFAuto*`](https://huggingface.co/transformers/model_doc/auto.html?highlight=auto#auto-classes) แทน เพราะว่ามันเป็นคลาสที่สามารถใช้ได้กับสถาปัตยกรรมหลายประเภท (design architecture-agnostic) ในขณะที่โค้ดก่อนหน้านี้จำกัดผู้ใช้อยู่กับ checkpoints ที่สามารถโหลดได้เฉพาะกับสถาปัตยกรรมแบบ CamemBERT การใช้คลาส `TFAuto*` นั้นทำให้การเปลี่ยน checkpoints เป็นเรื่องง่าย:
+
+```py
+from transformers import AutoTokenizer, TFAutoModelForMaskedLM
+
+tokenizer = AutoTokenizer.from_pretrained("camembert-base")
+model = TFAutoModelForMaskedLM.from_pretrained("camembert-base")
+```
+{/if}
+
+<Tip>
+เมื่อมีการใช้งานโมเดลที่ผ่านการเทรนมาแล้ว (pretrained model) คุณควรตรวจสอบให้มั่นใจว่ามันถูกเทรนมาอย่างไร กับชุดข้อมูลไหน ขีดจำกัด (limits) และความลำเอียง (biases) คืออะไร ซึ่งข้อมูลทั้งหมดนี้ควรถูกระบุอยู่ในการ์ดโมเดล (model card)
+</Tip>