
A language model is a statistical model that represents a probability distribution over natural-language text.

Description

Given a text string of length $m$, a language model assigns a probability $P(w_{1},\ldots ,w_{m})$ to the whole word sequence. This probability distribution is obtained by training the model on a text corpus in one or more languages. Because a language admits an unlimited number of valid sentences, a central challenge in language modeling is to give non-zero probability to linguistically valid word sequences that never appear in the training data. Several modeling approaches have been devised to address this problem. Examples include the Markov property and neural network architectures such as recurrent neural networks and transformers.
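By the chain rule, $P(w_{1},\ldots ,w_{m}) = \prod_{i=1}^{m} P(w_{i} \mid w_{1},\ldots ,w_{i-1})$. A bigram model applies the Markov property and approximates each factor by $P(w_{i} \mid w_{i-1})$. The following is a minimal Python sketch of that idea. It uses add-one (Laplace) smoothing so that word sequences absent from the training data still receive a non-zero probability. Several pieces are illustrative choices rather than part of any library: the function names `train_bigram`, `bigram_prob` and `sentence_log_prob`, the boundary tokens `<s>` and `</s>`, and the toy corpus. The sketch also assumes that every test word already occurs in the training vocabulary.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # illustrative sentence-boundary tokens


def train_bigram(corpus):
    """Count contexts and bigrams in a list of tokenized sentences."""
    context_counts = Counter()  # how often each word appears as a left context
    bigram_counts = Counter()   # how often each (previous, current) pair appears
    vocab = {EOS}               # every token the model can predict
    for sentence in corpus:
        tokens = [BOS] + sentence + [EOS]
        vocab.update(sentence)
        for prev, curr in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, curr)] += 1
    return context_counts, bigram_counts, vocab


def bigram_prob(prev, curr, context_counts, bigram_counts, vocab):
    """Add-one smoothed P(curr | prev); strictly positive even for unseen pairs."""
    return (bigram_counts[(prev, curr)] + 1) / (context_counts[prev] + len(vocab))


def sentence_log_prob(sentence, model):
    """Log of the bigram approximation of P(w_1, ..., w_m), including the end token."""
    context_counts, bigram_counts, vocab = model
    tokens = [BOS] + sentence + [EOS]
    return sum(
        math.log(bigram_prob(p, c, context_counts, bigram_counts, vocab))
        for p, c in zip(tokens, tokens[1:])
    )


corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
model = train_bigram(corpus)
print(math.exp(sentence_log_prob(["the", "cat", "sat"], model)))  # 6/343 ≈ 0.0175
print(math.exp(sentence_log_prob(["dog", "cat"], model)))         # 1/252 ≈ 0.0040
```

The second sentence contains no bigram seen in training, yet smoothing still gives it a small positive probability. Add-one smoothing is used here only because it is short. Practical n-gram models usually rely on better-calibrated methods such as Kneser-Ney smoothing.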

Language models are useful for a variety of problems in computational linguistics. They were first used in speech recognition, where they keep the system from predicting meaningless, and therefore low-probability, word sequences. Today they serve many other purposes as well. These include machine translation, generating more human-like natural language, part-of-speech tagging, optical character recognition, and information retrieval.
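In speech recognition, one common formulation combines the language model with an acoustic model as a noisy channel, $\hat{W} = \arg\max_{W} P(A \mid W)\,P(W)$. Here $A$ is the audio and $P(W)$ is the language-model probability of the word sequence $W$. In practice the language-model term is often scaled by a tunable weight. The minimal Python sketch below picks the best of several candidate transcriptions in this way. The function `rescore`, the parameter `lm_weight` and all scores are hypothetical illustrations. They are not taken from any real recognizer or toolkit.

```python
import math
from typing import Callable, Sequence


def rescore(
    hypotheses: Sequence[str],
    acoustic_log_probs: Sequence[float],
    lm_log_prob: Callable[[str], float],
    lm_weight: float = 1.0,
) -> str:
    """Pick the hypothesis W maximizing log P(A | W) + lm_weight * log P(W).

    acoustic_log_probs[i] is the acoustic score log P(A | W_i) for hypotheses[i];
    lm_log_prob(W) returns the language-model score log P(W).
    """
    combined = [
        ac + lm_weight * lm_log_prob(h)
        for h, ac in zip(hypotheses, acoustic_log_probs)
    ]
    best_index = max(range(len(combined)), key=combined.__getitem__)
    return hypotheses[best_index]


# Hypothetical scores for two acoustically similar transcriptions: the acoustic
# model slightly prefers the nonsensical one, but the made-up sentence
# log-probabilities of this toy language model rate it as far less likely.
toy_lm = {"recognize speech": math.log(1e-4), "wreck a nice beach": math.log(1e-7)}
hypotheses = ["wreck a nice beach", "recognize speech"]
acoustic = [math.log(0.6), math.log(0.4)]

print(rescore(hypotheses, acoustic, toy_lm.__getitem__))                 # recognize speech
print(rescore(hypotheses, acoustic, toy_lm.__getitem__, lm_weight=0.0))  # wreck a nice beach
```

Setting `lm_weight` to 0 ignores the language model and reproduces the acoustic-only choice. This shows how the language model steers the recognizer away from low-probability word sequences.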

Since 2018, a class of models known as large language models (LLMs) has emerged and brought significant advances. These models are deep neural networks with billions of trainable weight parameters. They are trained on large datasets of unlabeled text. LLMs have achieved strong results across a wide range of natural language processing tasks, and the focus of research has shifted toward using them as general-purpose models.

References

Jurafsky, Dan; Martin, James H. (2021). "N-gram Language Models". Speech and Language Processing (3rd ed.). Retrieved 24 May 2022.
Kuhn, Roland; De Mori, Renato (1990). "A cache-based natural language model for speech recognition". IEEE Transactions on Pattern Analysis and Machine Intelligence 12 (6): 570–583.
Andreas, Jacob; Vlachos, Andreas; Clark, Stephen (2013). "Semantic parsing as machine translation". Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers).
Pham, Vu; et al. (2014). "Dropout improves recurrent neural networks for handwriting recognition". 14th International Conference on Frontiers in Handwriting Recognition. IEEE.
Htut, Phu Mon; Cho, Kyunghyun; Bowman, Samuel R. (2018). "Grammar induction with neural language models: An unusual replication". arXiv:1808.10000.
Ponte, Jay M.; Croft, W. Bruce (1998). "A language modeling approach to information retrieval". Proceedings of the 21st ACM SIGIR Conference. Melbourne, Australia: ACM. pp. 275–281. doi:10.1145/290941.291008.
Hiemstra, Djoerd (1998). "A linguistically motivated probabilistic model of information retrieval". Proceedings of the 2nd European Conference on Research and Advanced Technology for Digital Libraries. LNCS, Springer. pp. 569–584. doi:10.1007/3-540-49653-X_34.