Abstract: To address the problem of extracting a target speaker's speech, a monaural (single-channel) speaker speech extraction method is proposed, based on a multi-task deep neural network with an embedded attention mechanism. The algorithm unifies speech separation and speech extraction in a single framework by embedding a speaker attention mechanism into a spectrum-mapping separation network. The attention mechanism derives time-varying attention weights from speaker auxiliary information, and these weights are used to separate out the internal embedding vector of the target speaker. An extraction model then applies nonlinear processing to this embedding vector to estimate the mask corresponding to the target speaker, and the mask is used to extract the target speaker's speech. Speech extraction experiments were carried out on the TIMIT dataset. The experimental results verify the feasibility and effectiveness of the proposed algorithm and show clearly superior speaker speech extraction performance.
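The data flow summarized above can be illustrated with a minimal pure-Python sketch: time-varying attention weights computed from the speaker auxiliary information, a per-frame weighted target-speaker embedding, a nonlinear mask estimator, and masking of the mixture magnitude spectrum. This is not the authors' implementation. The scaled dot-product scoring, the softmax over separated speaker streams, the single sigmoid layer with hypothetical weights `W` and `b`, all function names, and the toy inputs are assumptions made for illustration only.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # rows indexed by frame t


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def attention_weights(streams: List[Matrix], aux: Vector) -> Matrix:
    """Time-varying weights alpha[t][s] over the separated speaker streams.

    streams[s][t] is the internal embedding of stream s at frame t, and aux is
    the target speaker's auxiliary embedding (same dimension, assumed).
    Scaled dot-product scoring is an assumption for illustration.
    """
    scale = math.sqrt(len(aux))
    weights = []
    for t in range(len(streams[0])):
        scores = [sum(h * a for h, a in zip(stream[t], aux)) / scale
                  for stream in streams]
        weights.append(softmax(scores))
    return weights


def target_embedding(streams: List[Matrix], weights: Matrix) -> Matrix:
    """Per-frame attention-weighted sum of the stream embeddings."""
    dim = len(streams[0][0])
    return [
        [sum(alpha[s] * streams[s][t][d] for s in range(len(streams)))
         for d in range(dim)]
        for t, alpha in enumerate(weights)
    ]


def estimate_mask(embeddings: Matrix, W: Matrix, b: Vector) -> Matrix:
    """Hypothetical one-layer nonlinear map: mask[t][f] = sigmoid(W[f] . e_t + b[f])."""
    return [
        [sigmoid(sum(w * x for w, x in zip(w_f, e)) + b_f)
         for w_f, b_f in zip(W, b)]
        for e in embeddings
    ]


def extract(mix_mag: Matrix, masks: Matrix) -> Matrix:
    """Apply the mask element-wise to the mixture magnitude spectrum."""
    return [[m * y for m, y in zip(mask_t, mix_t)]
            for mask_t, mix_t in zip(masks, mix_mag)]


# Toy example: 2 streams, 2 frames, embedding dim 2, 3 frequency bins.
streams = [
    [[1.0, 0.0], [0.9, 0.1]],  # stream 0
    [[0.0, 1.0], [0.1, 0.9]],  # stream 1
]
aux = [2.0, 0.0]  # auxiliary embedding that resembles stream 0
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
b = [0.0, 0.0, 0.0]
mix_mag = [[1.0, 2.0, 3.0], [1.0, 1.0, 1.0]]

alpha = attention_weights(streams, aux)
emb = target_embedding(streams, alpha)
masks = estimate_mask(emb, W, b)
est = extract(mix_mag, masks)
print(alpha)
print(est)
```

Because the weights for each frame sum to 1, the target embedding is a convex combination of the stream embeddings, and a sigmoid output keeps every mask value in (0, 1), so the extracted magnitude never exceeds the mixture. In a trained system these parameters would be learned; the toy values here only exercise the data flow.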