Preserving Word-Level Emphasis in Speech-to-Speech Translation
http://hdl.handle.net/10061/11397
| Field | Value |
|---|---|
| Item type | Journal Article(1) |
| Date published | 2017-04-27 |
| Title | Preserving Word-Level Emphasis in Speech-to-Speech Translation |
| Language | eng |
| Keywords (subject scheme: Other) | hidden Markov models; language translation; LRHSMMs; S2ST systems; acoustic features; conditional random field model; cross-lingual communication; emphasis translation module; linear-regression hidden semiMarkov models; paralinguistic information translation; part-of-speech tags; speech synthesis module; speech-to-speech translation; target language emphasis sequence; word-level emphasis preservation; Acoustics; Estimation; Feature extraction; speech synthesis; regression analysis; Speech; Speech processing; Speech recognition; Emphasis estimation; emphasis translation; intent; word-level emphasis |
| Resource type | journal article |
| Access rights | open access |
| Authors | Do, Quoc Truong; Toda, Tomoki; Neubig, Graham; Sakti, Sakriani; Nakamura, Satoshi |
| Abstract | Speech-to-speech translation (S2ST) is a technology that translates speech across languages, which can remove barriers in cross-lingual communication. In conventional S2ST systems, the linguistic meaning of speech was translated, but paralinguistic information conveying other features of the speech, such as emotion or emphasis, was ignored. In this paper, we propose a method to translate paralinguistic information, specifically focusing on emphasis. The method consists of a series of components that can accurately translate emphasis using all acoustic features of speech. First, linear-regression hidden semi-Markov models (LR-HSMMs) are used to estimate a real-numbered emphasis value for every word in an utterance, resulting in a sequence of values for the utterance. After that, the emphasis translation module translates the estimated emphasis sequence into a target language emphasis sequence using a conditional random field model considering the features of emphasis levels, words, and part-of-speech tags. Finally, the speech synthesis module synthesizes emphasized speech with LR-HSMMs, taking into account the translated emphasis sequence and transcription. The results indicate that our translation model can translate emphasis information, correctly emphasizing words in the target language with 91.6% F-measure by objective evaluation. A listening test with human subjects further showed that they could identify the emphasized words with 87.8% F-measure, and that the naturalness of the audio was preserved. |
| Bibliographic information | IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 3, pp. 544-556, issued 2016-12-21 |
| Publisher | IEEE |
| ISSN | 2329-9304 |
| Publisher version DOI (relation type: isVersionOf) | https://doi.org/10.1109/TASLP.2016.2643280 |
| NCID | AA12669539 |
| Rights | (c) 2017 IEEE |
| Version type | AM (Accepted Manuscript) |
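
The abstract reports emphasis translation quality as an F-measure over emphasized words. As a rough illustration of that metric only, the sketch below computes precision, recall, and F-measure from per-word binary emphasis labels. The function names, the 0.5 threshold used to binarize real-valued emphasis levels, and the toy data are all assumptions made for this example; the record does not describe the paper's actual evaluation protocol.

```python
from typing import List, Sequence


def binarize(levels: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Map real-valued emphasis levels to 0/1 labels.

    The 0.5 threshold is a hypothetical choice for illustration only.
    """
    return [1 if level >= threshold else 0 for level in levels]


def emphasis_f_measure(reference: Sequence[int], predicted: Sequence[int]) -> float:
    """Return the F-measure of predicted emphasized words against a reference."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    pairs = list(zip(reference, predicted))
    true_pos = sum(1 for r, p in pairs if r == 1 and p == 1)
    false_pos = sum(1 for r, p in pairs if r == 0 and p == 1)
    false_neg = sum(1 for r, p in pairs if r == 1 and p == 0)

    # With no true positives, precision and recall are zero or undefined.
    if true_pos == 0:
        return 0.0

    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Toy data (invented): five target-language words, two truly emphasized.
reference_labels = [0, 1, 0, 0, 1]
predicted_levels = [0.1, 0.9, 0.7, 0.2, 0.8]
score = emphasis_f_measure(reference_labels, binarize(predicted_levels))
print(f"F-measure: {score:.3f}")
```

In this toy case both emphasized words are found and one extra word is marked, so precision is 2/3 and recall is 1.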