Research on Few-Shot Talking Head Video Generation Algorithm Guided by Sketches

Affiliation:

School of Communication and Information Engineering, Shanghai University

CLC number:

TP37

Fund Project:

National Natural Science Foundation of China (61871262)

    Abstract:

    Talking head video generation requires precise joint modeling of facial texture and the driving audio. To this end, this paper studies semantically guided deformation of texture features and proposes a sketch-guided few-shot talking head video generation framework that uses a two-stage generation pipeline for modality alignment. The first stage generates target facial landmarks from audio, using real facial landmarks as prior information. The second stage converts the landmarks into sketches, which serve as an intermediate representation that is semantically aligned with the reference images. Introducing sketches effectively addresses the modality mismatch between audio and images. In experiments, the algorithm achieves FID scores of 15.676 and 8.618 on the public HDTF and MEAD datasets, respectively. These results indicate that the proposed algorithm can effectively model facial texture driven by the target audio through the intermediate representation, with generation quality comparable to state-of-the-art algorithms.
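
    The abstract does not say how landmarks are rendered into sketches, so the following is only a minimal sketch of one plausible approach: connecting consecutive landmarks within each facial part by line segments on a blank canvas. The function names, the grouping of landmark indices, and the canvas size are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def draw_segment(canvas, p0, p1):
    """Rasterize the segment p0 -> p1 (both given as (x, y)) by dense sampling."""
    # Sample at least one point per pixel along the longer axis.
    n = int(max(abs(p1[0] - p0[0]), abs(p1[1] - p0[1]))) + 1
    xs = np.rint(np.linspace(p0[0], p1[0], n + 1)).astype(int)
    ys = np.rint(np.linspace(p0[1], p1[1], n + 1)).astype(int)
    h, w = canvas.shape
    # Drop samples that fall outside the canvas.
    keep = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
    canvas[ys[keep], xs[keep]] = 255


def landmarks_to_sketch(landmarks, groups, size=(256, 256)):
    """Render landmarks as a binary sketch image.

    landmarks: (N, 2) array of (x, y) pixel coordinates.
    groups: list of index lists, one per facial part (hypothetical grouping);
            consecutive landmarks in a group are joined by a line.
    size: (height, width) of the output canvas.
    """
    canvas = np.zeros(size, dtype=np.uint8)
    for idx in groups:
        pts = landmarks[idx]
        for p0, p1 in zip(pts[:-1], pts[1:]):
            draw_segment(canvas, p0, p1)
    return canvas


# Toy example: three landmarks forming an L-shaped polyline.
landmarks = np.array([[10.0, 20.0], [30.0, 20.0], [30.0, 40.0]])
sketch = landmarks_to_sketch(landmarks, groups=[[0, 1, 2]], size=(64, 64))
```

    In a two-stage pipeline like the one described, an image of this kind would be built from the landmarks predicted in the first stage and then passed to the second stage together with the reference images.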

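Cite this article

Wei Qingyang, Xu Shugong. Research on Few-Shot Talking Head Video Generation Algorithm Guided by Sketches[J]. Computer Measurement & Control, 2024, 32(10): 236-242.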

History
  • Received: 2024-04-28
  • Last revised: 2024-05-09
  • Accepted: 2024-05-09
  • Published online: 2024-10-30