Long short-term memory (LSTM) is a recurrent neural network (RNN) architecture used in the field of deep learning. Unlike standard feedforward neural networks, an LSTM has feedback connections, which make it a "general-purpose computer" (that is, it can compute anything that a Turing machine can compute). An LSTM can process not only single data points (such as images) but also entire sequences of data (such as speech or video). For example, LSTMs can be applied to tasks such as unsegmented, connected handwriting recognition and speech recognition. Bloomberg Business Week wrote: "These powers make LSTM arguably the most commercial AI achievement, used for everything from predicting diseases to composing music."

A long short-term memory (LSTM) cell processes data sequentially and can keep its hidden state over long periods of time.

A common LSTM unit is composed of a cell together with an input gate, an output gate, and a forget gate. The cell remembers values over arbitrary time intervals, and the three gates regulate the flow of information into and out of the cell.
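The gating described above can be sketched as a single time step in plain Python. This is a minimal illustration that assumes the commonly used formulation with a forget gate; the function name `lstm_step`, the `params` layout, and the restriction to a single scalar unit are assumptions made for readability and are not taken from the sources cited here.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function: squashes x into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell (illustrative sketch).

    params maps each gate name ("f", "i", "o", "g") to a tuple
    (w_x, w_h, b): input weight, recurrent weight, and bias.
    This layout is a hypothetical choice, not a library API.
    """
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("f"))    # forget gate: how much of the old cell state to keep
    i = sigmoid(pre("i"))    # input gate: how much of the candidate to write
    o = sigmoid(pre("o"))    # output gate: how much of the cell state to expose
    g = math.tanh(pre("g"))  # candidate value to write into the cell

    c = f * c_prev + i * g   # updated cell state
    h = o * math.tanh(c)     # updated hidden state (the unit's output)
    return h, c


if __name__ == "__main__":
    # Hypothetical weights: forget gate almost fully open, input gate almost closed.
    params = {
        "f": (0.0, 0.0, 10.0),
        "i": (0.0, 0.0, -10.0),
        "o": (0.0, 0.0, 10.0),
        "g": (1.0, 0.0, 0.0),
    }
    h, c = 0.0, 1.0
    for x in [0.5, -0.3, 0.8]:
        h, c = lstm_step(x, h, c, params)
    print(c)  # stays close to the initial cell state of 1.0
```

With the forget gate near 1 and the input gate near 0, the cell state is carried almost unchanged across steps, which is the sense in which the cell "remembers" a value.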

LSTM networks are well suited to statistical classification and prediction on time-series data, because there can be lags of unknown duration between important events in a time series. LSTMs were developed to deal with the vanishing gradient problem that can arise when training traditional RNNs.
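The vanishing gradient problem can be illustrated with simple arithmetic. The sketch below treats the gradient flowing back through time as a product of one scalar factor per step, which is a deliberate simplification of the product of Jacobians in backpropagation through time. The factors 0.5 and 0.99 are hypothetical values chosen for illustration: the first stands in for a plain RNN whose per-step factor is well below 1, the second for an LSTM cell-state path whose forget gate stays close to 1.

```python
def gradient_through_time(step_factor: float, steps: int) -> float:
    """Gradient magnitude after being scaled by the same per-step
    factor `steps` times (a scalar stand-in for the product of
    Jacobians in backpropagation through time)."""
    grad = 1.0
    for _ in range(steps):
        grad *= step_factor
    return grad


if __name__ == "__main__":
    steps = 100
    # Hypothetical per-step factors, chosen only for illustration.
    plain_rnn = gradient_through_time(0.5, steps)   # factor well below 1
    lstm_cell = gradient_through_time(0.99, steps)  # forget gate close to 1
    print(f"plain RNN, factor 0.5:  {plain_rnn:.3e}")
    print(f"LSTM cell, factor 0.99: {lstm_cell:.3e}")
```

After 100 steps the first product is on the order of 1e-31, while the second is still about 0.37, so a learning signal from a distant event remains usable only in the second case.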

LSTM was the mainstream approach through the 2010s, but since 2017 it has been largely displaced by a higher-performing model, the Transformer.

References

Hochreiter, Sepp; Schmidhuber, Jürgen (1997). "Long short-term memory". Neural Computation. 9 (8): 1735–1780. doi:10.1162/neco.1997.9.8.1735. PMID 9377276.
Siegelmann, Hava T.; Sontag, Eduardo D. (1992). On the Computational Power of Neural Nets. COLT '92. ACM. pp. 440–449. doi:10.1145/130385.130432.
Graves, A.; Liwicki, M.; Fernandez, S.; Bertolami, R.; Bunke, H.; Schmidhuber, J. (2009). "A Novel Connectionist System for Improved Unconstrained Handwriting Recognition" (PDF). IEEE Transactions on Pattern Analysis and Machine Intelligence. 31 (5): 855–868. CiteSeerX 10.1.1.139.4502. doi:10.1109/tpami.2008.137. PMID 19299860.
Sak, Hasim; Senior, Andrew; Beaufays, Francoise (2014). "Long Short-Term Memory recurrent neural network architectures for large scale acoustic modeling" (PDF). Retrieved 2019-04-03.
Li, Xiangang; Wu, Xihong (2014-10-15). "Constructing Long Short-Term Memory based Deep Recurrent Neural Networks for Large Vocabulary Speech Recognition". arXiv:1410.4281 [cs.CL].
Vance, Ashlee (May 15, 2018). "Quote: These powers make LSTM arguably the most commercial AI achievement, used for everything from predicting diseases to composing music". Bloomberg Business Week. Retrieved 2019-01-16.
Hochreiter, Sepp; Schmidhuber, Jürgen (1996-12-03). "LSTM can solve hard long time lag problems". Proceedings of the 9th International Conference on Neural Information Processing Systems. NIPS'96. Cambridge, MA, USA: MIT Press: 473–479.
Gers, Felix A.; Schmidhuber, Jürgen; Cummins, Fred (2000). "Learning to Forget: Continual Prediction with LSTM". Neural Computation. 12 (10): 2451–2471. CiteSeerX 10.1.1.55.5709. doi:10.1162/089976600300015015. PMID 11032042. S2CID 11598600.
