Robust Speech Feature Extraction for Thai Speech Recognition

Project author

ศุภวรรษ สวนไพรินทร์

Project advisor

ณัฐกร ทับทอง

Supervising institution

Department of Physics, Faculty of Science, Chulalongkorn University

Education level

Science project at the master's degree level or higher

Subject area

Science project in physics

Project date

01 January 2541 B.E. (1998 CE)

Project abstract

This research studies the noise robustness of several speech features for Thai speech recognition. The features used are Mel Frequency Cepstral Coefficients (MFCC), Perceptual Linear Predictive (PLP), Relative Spectral (RASTA), Running Spectrum Filtering (RSF) on MFCC, and Dynamic Range Adjustment (DRA) on MFCC. The recognition technique is Hidden Markov Models (HMMs). Recognition performance is compared in three ways:
1. Between two recognition levels: word level and the subword onset-rhyme level
2. Across the different speech features
3. Across noise conditions:
3.1 Clean speech
3.2 Speech with random (white) noise
3.3 Speech with environmental noise, namely road noise and cafeteria (conversation) noise
In each noisy condition, the signal-to-noise ratio (SNR) is set to -10, 0, 10, and 20 dB. The results show that subword recognition at the onset-rhyme level achieves a much higher recognition rate than word-level recognition, and that the RASTA feature gives the highest recognition rate and the greatest noise robustness.
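The abstract does not state how the noise was mixed into the speech. A common approach, shown here only as a minimal sketch under assumptions, treats the noise as additive and scales it so that the whole-utterance power ratio matches the target SNR: the gain g solves P_speech / (g^2 * P_noise) = 10^(SNR/10). The names `mean_power` and `add_noise_at_snr`, the 8 kHz sampling rate, and the sine-wave stand-in for speech are hypothetical choices for illustration, not the project's actual setup.

```python
import math
import random
from typing import Sequence


def mean_power(x: Sequence[float]) -> float:
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in x) / len(x)


def add_noise_at_snr(speech: Sequence[float], noise: Sequence[float], snr_db: float) -> list[float]:
    """Return speech + g * noise, with g chosen so the global SNR equals snr_db.

    Assumption: SNR(dB) = 10 * log10(P_speech / P_scaled_noise), measured over
    the whole signal. The noise is repeated (or truncated) to the speech length.
    """
    if not noise:
        raise ValueError("noise must be non-empty")
    n = [noise[i % len(noise)] for i in range(len(speech))]
    # Solve P_s / (g^2 * P_n) = 10^(snr_db / 10) for the gain g.
    gain = math.sqrt(mean_power(speech) / (mean_power(n) * 10 ** (snr_db / 10)))
    return [s + gain * v for s, v in zip(speech, n)]


if __name__ == "__main__":
    random.seed(0)
    fs = 8000  # hypothetical sampling rate (Hz) for this toy example
    speech = [math.sin(2 * math.pi * 440 * t / fs) for t in range(fs)]  # stand-in for speech
    white = [random.gauss(0.0, 1.0) for _ in range(fs)]  # white Gaussian noise
    for snr in (-10, 0, 10, 20):
        noisy = add_noise_at_snr(speech, white, snr)
        residual = [y - s for y, s in zip(noisy, speech)]
        measured = 10 * math.log10(mean_power(speech) / mean_power(residual))
        print(f"target {snr:>3} dB -> measured {measured:.2f} dB")
```

At -10 dB the scaled noise carries ten times the power of the speech, which makes it the hardest of the four noisy settings.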

The purpose of this project is to study the noise robustness of speech features for Thai proper-name speech recognition. The features comprise three conventional features, Mel Frequency Cepstral Coefficients (MFCC), Perceptual Linear Predictive (PLP), and Relative Spectral (RASTA), and two newer robust features, Running Spectrum Filtering (RSF) on MFCC and Dynamic Range Adjustment (DRA) on MFCC. Hidden Markov Models (HMMs) are used as the recognition models. Recognition performance is compared in three ways:
1. Level of recognition: word level and onset-rhyme level
2. Type of speech feature
3. Noise condition:
3.1 Clean speech
3.2 Speech with white noise at -10, 0, 10, and 20 dB SNR
3.3 Speech with environmental noise (road noise and cafeteria noise) at -10, 0, 10, and 20 dB SNR
The results indicate that onset-rhyme level recognition yields a much higher recognition rate than word-level recognition. RASTA gives the highest recognition rate in almost all conditions and has the greatest noise robustness.
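Because RASTA performed best, it may help to see its core step. The abstract does not describe the implementation used; the sketch below assumes the band-pass filter from the original RASTA formulation, H(z) = 0.1 z^4 (2 + z^-1 - z^-3 - 2z^-4) / (1 - 0.98 z^-1), applied independently to each log critical-band energy trajectory over time. The rest of the RASTA-PLP pipeline (critical-band analysis, equal-loudness weighting, compression, linear-prediction modeling) is omitted, and the names `RASTA_NUM`, `rasta_filter`, and `rasta_filter_bands` are hypothetical.

```python
from typing import Sequence

# Numerator of the RASTA band-pass filter: 0.1 * (2 + z^-1 - z^-3 - 2 z^-4).
# The taps sum to zero, so the filter has zero gain at DC (0 Hz modulation).
RASTA_NUM = (0.2, 0.1, 0.0, -0.1, -0.2)


def rasta_filter(trajectory: Sequence[float], pole: float = 0.98) -> list[float]:
    """Causal RASTA filtering of one log-band energy trajectory (one value per frame).

    y[t] = pole * y[t-1] + sum_k b[k] * x[t-k], starting from zero state.
    The non-causal z^4 advance is dropped, so the output lags by 4 frames.
    """
    y: list[float] = []
    for t in range(len(trajectory)):
        # FIR part: only taps that reach back to existing frames contribute.
        acc = sum(b * trajectory[t - k] for k, b in enumerate(RASTA_NUM) if t - k >= 0)
        # IIR part: feedback from the previous output frame.
        if y:
            acc += pole * y[-1]
        y.append(acc)
    return y


def rasta_filter_bands(log_energies: Sequence[Sequence[float]], pole: float = 0.98) -> list[list[float]]:
    """Apply rasta_filter to each band of a (frames x bands) matrix."""
    if not log_energies:
        return []
    n_bands = len(log_energies[0])
    columns = [rasta_filter([frame[j] for frame in log_energies], pole) for j in range(n_bands)]
    # Transpose the per-band outputs back to (frames x bands).
    return [list(row) for row in zip(*columns)]


if __name__ == "__main__":
    # A constant log-energy offset, e.g. from a fixed channel response,
    # produces only a transient that decays geometrically with the pole.
    out = rasta_filter([3.0] * 1000)
    print(f"first output {out[0]:.3f}, last output {out[-1]:.2e}")
```

The zero DC gain is the design point: a fixed linear channel multiplies the spectrum, which becomes a constant offset in the log domain, and the filter suppresses it. Some implementations use a smaller pole such as 0.94, which shortens the transient.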