YuE:
Open Music Foundation Models for Full-Song Generation
Abstract
We tackle the task of generating whole-song music audio from given lyrics, dubbed lyrics2song. While text-conditioned music generation models have produced high-quality results on short clips of non-vocal music, generating minutes-long full songs with both vocal and accompaniment parts remains a challenging problem, and so far only a few closed-source commercial systems have shown satisfactory results. The challenge of lyrics2song mainly lies in 1) the long-context nature of music; 2) the complexity of the music signal compared to other signals (speech, audio effects); 3) the distorted linguistic content; and 4) the lack of parallel data (lyrics-audio pairs). In this paper, we propose YuE, a series of open foundation language models for lyrics2song, based on the LLaMA family. We demonstrate that our method can model up to 5 minutes of music audio, follow the lyrics condition throughout the whole song, maintain coherent musical structure, and generate catchy vocal melodies and appropriate accompaniment. We develop several techniques to achieve this: 1) we apply a semantically enhanced audio tokenizer to reduce training cost and accelerate convergence; 2) we propose a dual-token technique to enable track-synced vocal-instrumental modeling without modifying the LLaMA decoder-only architecture, so that the established infrastructure for scaling and serving can be reused; 3) we introduce lyrics-chain-of-thoughts to allow the model to progressively generate the whole song in a single context while following the lyrics condition; and 4) we propose a 3-stage training scheme to ensure better scalability, musicality, and lyrics controllability.
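To make the dual-token idea more concrete, here is a minimal sketch of one way two frame-aligned token streams (vocal and instrumental) could be packed into a single sequence that an unmodified decoder-only model can consume. The frame-by-frame interleaving layout and the function names `interleave_tracks` and `split_tracks` are assumptions for illustration only; this page does not specify the actual token layout used by YuE.

```python
from typing import List, Tuple


def interleave_tracks(vocal: List[int], instrumental: List[int]) -> List[int]:
    """Merge two frame-aligned token streams into one: v0, i0, v1, i1, ...

    Assumption: both tracks are tokenized at the same frame rate, so frame k
    of each track covers the same span of audio.
    """
    if len(vocal) != len(instrumental):
        raise ValueError("tracks must have the same number of frames")
    merged: List[int] = []
    for v, i in zip(vocal, instrumental):
        merged.extend((v, i))  # keep the two tracks in sync frame by frame
    return merged


def split_tracks(merged: List[int]) -> Tuple[List[int], List[int]]:
    """Invert interleave_tracks: even positions are vocal, odd are instrumental."""
    if len(merged) % 2 != 0:
        raise ValueError("interleaved sequence must have even length")
    return merged[0::2], merged[1::2]


if __name__ == "__main__":
    vocal_tokens = [1, 2, 3]
    instrumental_tokens = [10, 20, 30]
    sequence = interleave_tracks(vocal_tokens, instrumental_tokens)
    print(sequence)
    print(split_tracks(sequence))
```

With a layout like this, the model only ever sees a flat token sequence, so standard decoder-only training and serving code can be used as-is; the two tracks are recovered after generation by de-interleaving.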
YuE Model Checkpoints HF-Link
Examples
Intro: The Model Song
YuE (乐) means "music" and "happiness" in Chinese. For those who find words starting with Yu difficult to pronounce, it can be pronounced as "yeah."
Modeling Diverse Genres & Vocal Styles
Note:
- Lyrics are GPT-generated.
- Future work will include more world music samples, e.g. Beijing Opera.
Metal: Step Back
- The riff is sick.
- I am not an expert in metal music. Maybe this vocal technique is called vocal fry?
- Lyrics following looks good.
Jazz: Quiet Evening
Rap: This is My Life
Pop: I Won't Back Down
- Same lyrics but different tags (1-pop/2-pop/3-no pop)
Ballad: Hospital
Mandarin Pop: My Love
Pop: Quiet Evening
Soul: Hold You Anyhow
Country: Lonesome Road
Alternative Rock: Corner
Indie: Act
Children's Song
Modeling Diverse Languages
Note:
- Lyrics are picked from the authors' playlists and rewritten by GPT.
- We support a wide range of languages, both Western and Eastern. We currently provide demos in English, Chinese (Mandarin and Cantonese), Japanese, and Korean.
- The model supports code-switching between languages.
- We will work on more samples in different languages. Musicality may vary.
Mandarin-English Hiphop: 酷佬
- Chinese gangsta rap
English + Japanese + Korean Code Switching Kpop: 完璧な関係
- The arrangement of this sample is amazing.
- OG song: https://www.youtube.com/watch?v=v1NMaIQ58N0 OFFICIAL by EXID
Cantonese Ballad: 爱你无需讲道理
Mandarin Rock: 你要跳舞吗
- We used in-context learning here, with part of the original song as the prompt. You can hear that the vocalist sounds similar.
- OG song link: https://www.youtube.com/watch?v=4ZaHdnfI6iQ 《你要跳舞吗》by 新裤子乐队
- The generated song sounds more like Western hard rock. There is no direct copying here.
Emergent Spontaneous Performance & Advanced Vocal Techniques
Note:
- Lyrics are either rewritten by GPT or directly generated by GPT.
- We are showcasing advanced vocal techniques and spontaneous performances that demand years of professional training and talent, which have been learned by our 7B model.
Scatting
- Originating in vocal jazz, scat singing or scatting is vocal improvisation with wordless vocables, nonsense syllables or without words at all.
- Scatting starts when the model runs out of lyrics at the end of the song.
- You can jump to 2:20 if you only want to listen to scatting.
- But the vocal performance is beautiful anyway.
Death Growl
- It is pretty loud. Turn down your headphones!
- You can start at 0:50.
- Lyrics following is quite good even at this extremely low vocal-to-accompaniment ratio.
- The solo is a bit long and repetitive.
- We also support Chinese genre tags to some extent.
Mix Voice
- Mix voice is the blending of the chest voice and the head voice. It also eliminates the bridge, that pesky gap between the two registers.
- Commonly seen in power metal.
Powerful Belt, Riffs and Runs 01
- This sample is from an old checkpoint.
- Tons of riffs and runs. But less control.
Powerful Belt, Riffs and Runs 02
- This one is new.
- Sounds great, but there is a lot of distortion in the high vocal range. We plan to address it in the next version.
A cappella
- We do not support the tag "a cappella", but this sample was prompted with in-context learning.
- Lyrics are rewritten by GPT.
- OG song: https://www.youtube.com/watch?app=desktop&v=Urf7wDavQKw&t=0s
Harmonica solo improvisation
- A harmonica improvisation at 2:00.
Thanks to the following organizations for their support
And also thanks to Geely.
Citation
@misc{yuan2025yue,
  title={YuE: Open Music Foundation Models for Full-Song Generation},
  author={Ruibin Yuan and Hanfeng Lin and Shawn Guo and Ge Zhang and Jiahao Pan and Yongyi Zang and Haohe Liu and Xingjian Du and Xeron Du and Zhen Ye and Tianyu Zheng and Yinghao Ma and Minghao Liu and Lijun Yu and Zeyue Tian and Ziya Zhou and Liumeng Xue and Xingwei Qu and Yizhi Li and Tianhao Shen and Ziyang Ma and Shangda Wu and Jun Zhan and Chunhui Wang and Yatian Wang and Xiaohuan Zhou and Xiaowei Chi and Xinyue Zhang and Zhenzhu Yang and Yiming Liang and Xiangzhou Wang and Shansong Liu and Lingrui Mei and Peng Li and Yong Chen and Chenghua Lin and Xie Chen and Gus Xia and Zhaoxiang Zhang and Chao Zhang and Wenhu Chen and Xinyu Zhou and Xipeng Qiu and Roger Dannenberg and Jiaheng Liu and Jian Yang and Stephen Huang and Wei Xue and Xu Tan and Yike Guo},
  howpublished={\url{https://github.com/multimodal-art-projection/YuE}},
  year={2025},
  note={GitHub repository}
}