TK1059 : Any-to-Many Voice Conversion with Non-Parallel Data
Thesis > Central Library of Shahrood University > Electrical Engineering > MSc > 2024
Authors:
[Author], [Supervisor]
Abstract: This thesis addresses the task of converting one speaker's voice to another using sequence-to-sequence modeling with location-relative attention in an any-to-many voice conversion system. The technology has applications in many areas, such as personalized voice interaction, dubbing and content creation, voice assistants, and any application that requires natural, target-speaker-like speech generation. The proposed method is based on location-aware sequence-to-sequence modeling to better capture temporal correlations between the source and target voices. To improve system performance, several modifications were applied to different network components. First, the bottleneck-feature (BNE) prenet was modified with a transformer-based encoder. The pitch encoder was then revised using convolutional methods, residual units, and temporal convolutions. In addition, during training, the weights of the MoL attention decoder were initialized with Xavier and He initialization, which improved performance. In the post-processing stage, the PostNet was enhanced with 1D convolutional layers, depthwise separable convolutions, and gated convolutions. Experimental results show that these modifications improve quality, increase the similarity of the converted voice to the target speaker, and reduce unwanted noise; overall, they yield a 10% improvement in MOS scores. Furthermore, F0 RMSE, MCD, CER, and WER results indicate that the proposed +BNE-seq2seqMoL approach outperforms the baseline BNE-seq2seqMoL method. The final results confirm the generation of accurate, high-quality speech output, highlighting the potential of this approach for a range of applications and marking a valuable advance in the field.
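The abstract reports that initializing the MoL attention decoder's weights with Xavier and He initialization improved performance. As a minimal sketch of what those two schemes compute (plain Python, not the thesis's actual implementation; the function names here are illustrative):

```python
import math
import random

def xavier_std(fan_in, fan_out):
    # Glorot/Xavier: std = sqrt(2 / (fan_in + fan_out));
    # balances forward- and backward-pass variance for tanh/sigmoid-like units.
    return math.sqrt(2.0 / (fan_in + fan_out))

def he_std(fan_in):
    # He/Kaiming: std = sqrt(2 / fan_in);
    # compensates for ReLU zeroing half of the activations.
    return math.sqrt(2.0 / fan_in)

def init_weight_matrix(fan_in, fan_out, scheme="xavier", seed=0):
    # Draw a fan_in x fan_out weight matrix from N(0, std^2)
    # with std chosen by the selected scheme.
    rng = random.Random(seed)
    std = xavier_std(fan_in, fan_out) if scheme == "xavier" else he_std(fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]
```

In practice such initializers are applied per layer type (e.g., He for ReLU-activated convolutions, Xavier for tanh-gated attention projections), which is consistent with the mixed use the abstract describes.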
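The PostNet enhancements include depthwise separable convolutions, which factor a standard convolution into a cheap per-channel (depthwise) pass followed by a 1x1 channel-mixing (pointwise) pass. A minimal pure-Python sketch of the idea on 1D data, using nested lists and valid padding (illustrative only, not the thesis's PostNet):

```python
def depthwise_separable_conv1d(x, depth_k, point_w):
    """x: [channels][time] input; depth_k: [channels][k] one kernel per channel;
    point_w: [out_channels][channels] 1x1 mixing weights. Valid padding, stride 1."""
    C, T = len(x), len(x[0])
    k = len(depth_k[0])
    # Depthwise step: each channel is convolved only with its own kernel.
    dw = [[sum(depth_k[c][j] * x[c][t + j] for j in range(k))
           for t in range(T - k + 1)] for c in range(C)]
    # Pointwise step: a 1x1 convolution mixes information across channels.
    return [[sum(point_w[o][c] * dw[c][t] for c in range(C))
             for t in range(len(dw[0]))] for o in range(len(point_w))]
```

Compared with a full convolution over all channel/kernel pairs, this factorization cuts the multiply count roughly by a factor of the kernel size, which is why it is attractive in a post-processing network where many filter taps run over every frame.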
Keywords: Residual Block, Transformer, Positional Encoder, Temporal Convolution
Keeping place: Central Library of Shahrood University