Xubo Liu @LiuXub
Research Scientist at Stability AI | PhD candidate at CVSSP, Uni. of Surrey | liuxubo717.github.io | Guildford, England | Joined October 2020
Tweets: 209 | Followers: 861 | Following: 691 | Likes: 471
These days, although advanced LMs are widely used, word embeddings that include both music entities and general words can have domain-specific advantages. ⭐ Introducing Musical Word Embedding (my Master's thesis topic!) - pdf: arxiv.org/abs/2404.13569 - code: github.com/seungheondoh/m…
We've released our paper on the model behind Stable Audio 2.0! Our model can generate high-fidelity music with lengths up to 4 minutes 45 seconds. Paper: arxiv.org/abs/2404.10301 Demos: stability-ai.github.io/stable-audio-2… SoundCloud: soundcloud.com/stable-audio/s… youtube.com/watch?v=UpxIGa…
Attending ICASSP 2024 in Seoul! We have four papers to be presented this year. Would love to chat if you're around! 👀
for #icassp2024 attendees, i'm open sourcing my `What to eat around COEX` list. originally written for @cwu307 but sharing it for a large crowd and make the world better place, reduce p(doom), etc. docs.google.com/document/d/1h5…
The new audio-to-audio capabilities of @stableaudio allow transforming your voice to foley.
🔥 Excited to kick off the DCASE 2024 Challenge Task 9: Language-Queried Audio Source Separation! The challenge starts on April 1st and will end on July 15th. Website: dcase.community/challenge2024/…
We are organizing a special session on "Generative AI for Media Generation" (yyua8222.github.io/MLSP-web/) at IEEE MLSP 2024. If you are interested, please feel free to contact me ([email protected]) before the paper submission deadline on April 30, 2024.
[1/n]🎉Introducing ChatMusician: an open-sourced LLM that can intrinsically generate and understand music. 🚀beats gpt-4 in music generation, beats gpt-3.5 in our college-level music understanding benchmark (0-shot) Paper: arxiv.org/abs/2402.16153 HF: huggingface.co/m-a-p/ChatMusi…
Happy to say you can now do this with the brand new features @StabilityAI API 🤠 1. Search and replace 2. Editing, with inpaint 3. Creative upscaling up to 4k 4. Stable Video More releases to come! 🛳️ platform.stability.ai/docs/api-refer…
🚀 As Sora-generated videos captivate the world, one piece remains missing from the sensory puzzle: sound. 🎥➕🔉 Our recent work, 'FoleyGen', elegantly fills this void, providing the missing soundtrack to these silent videos and making the visuals not just seen, but heard. 🌐🎶
Enjoy @OpenAI's Sora-generated videos, with sound by FALL-E, the audio gen-AI by @gaudiolab 🔊 No more silent videos! 🧵
Introducing Gemma - a family of lightweight, state-of-the-art open models for their class, built from the same research & technology used to create the Gemini models. Blog post: blog.google/technology/dev… Tech report: goo.gle/GemmaReport This thread explores some of the…
I'm attending AAAI in Vancouver in these weeks! If you are around and would like to chat, please DM me! 👀
Today we're releasing MMCSG — a dataset with >25h of two-sided conversations captured using Project Aria. Download ➡️ bit.ly/3I8ntui MMCSG could help researchers solve important challenges for future smart glasses including ASR, target speaker identification & more.
Stability releases Stable Cascade demo: huggingface.co/spaces/multimo… github: github.com/Stability-AI/S… a new text to image model building upon the Würstchen architecture
We wrote a position paper arguing that we should construct large generative models compositionally from smaller ones! We argue that (1) it enables data/computation efficient learning (2) it enables provable generalization to unseen test distributions. arxiv.org/abs/2402.01103
Be sure to check out our latest LLM - Reka Flash! We worked so hard to awaken this beautiful creature, and I've learned so much about what intelligence really means along the ride. Trust me, it will only get better and better!
Thanks AK! 🙌
Pleased to release the text-to-speech work I developed while at @StabilityAI. 💬 TL;DR - Natural language control of high-fidelity TTS. It’s simple, generalizable, and it sounds better than Audiobox :) text-description-to-speech.com arxiv.org/abs/2402.01912 🧵
Shipped a new product! SnappyVideo, an experimental AI toolkit for creating videos. Given instructions, Snappy can source and generate video compositions, voiceovers, SFX and more. Here is a single clip created with /generate_clip command, "Lucy in the sky with diamonds"
If you are an Oxford undergraduate/master student looking for potential research projects, feel free to contact me or Christian Rupprecht 🚀
Excited to introduce SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound 🎉 SemantiCodec (50 tokens/second or 0.71kbps) ≈ Previous methods (200 tokens/second or 2.0kbps). 🎉Our study also reveals that SemantiCodec tokens hold richer semantic information.
One of the #ICASSP2024 week highlights for me was the MERL Past, Present, and Future party, which gathered 40+ members of the MERL family to enjoy craft beer, memories, and fun discussions 🤗 Special thanks to @jeonchangbin49 for helping with the local organization! @merl_news
Wanna know how good your multimodality LLM is on our latest vibe-eval? DM us to get free credits and test yourself!
Evals are notoriously difficult to get right but necessary to move the field forward. 🌟 As part of our commitment to science, we’re releasing a subset of our internal evals. 🙌 Vibe-Eval is an open and hard benchmark comprising 269 image-text prompts for measuring the…
"T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining," Yi Yuan, Zhuo Chen, Xubo Liu, Haohe Liu, Xuenan Xu, Dongya Jia, Yuanzhe Chen, Mark D. Plumbley, Wenwu Wang, ift.tt/93X5oew
The AudioLDM-2 paper has been accepted by TASLP - IEEE/ACM Transactions on Audio, Speech and Language Processing 🚀 Many thanks to all the contributors, reviewers, and open-source collaborators to this paper!👍
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining paper page: huggingface.co/papers/2308.05… Although audio generation shares commonalities across different types of audio, such as speech, music, and sound effects, designing models for each type requires…
New post is out! 🇰🇷 ICASSP 2024, eleven picks. jordipons.me/icassp-2024/
The Machine Listening Group is at #ICASSP2024 in Seoul!
Excited to meet you in Seoul. Looking forward to seeing you at Room 101, 08:20 - 08:40 on Wed. Up for a coffee anytime as well.
Attending ICASSP 2024 in Seoul, Korea!☕️ I'll present AudioSR at Room E1, 17 April 18:10-18:30. Looking forward to seeing you there!
Breaking! HuggingFace's open-source reproduction of Stability's closed-source, style-controlled Text-to-Speech model 𝐏𝐚𝐫𝐥𝐞𝐫-𝐓𝐓𝐒 is out🚀🚀🚀 The released v0.1 is trained on 10k hours of data. The forthcoming v1 will be trained on 50k hours. github.com/huggingface/pa…
Excellent work by @sanchitgandhi99 and @yoachlacombe reproducing the text-description-to-speech model I developed while at @StabilityAI 👏❤️
Introducing Parler-TTS: an inference and training library for high-quality, controllable text-to-speech (TTS) models 🗣️ To fuel the development of open-source TTS research, we are open-sourcing all datasets, training code and our first iteration checkpoint: Parler-TTS Mini v0.1
Hello Twitter! A few weeks ago, I defended my PhD thesis (title below). I want to thank everybody who joined or helped along the way, and especially my supervisors, jury members and colleagues. I have since joined the Google DeepMind team here in Paris. Good things ahead (I hope 🤞!)
Thanks, AK! We will release a Gradio demo as soon as possible! Please check out our GitHub: github.com/PKU-YuanGroup/…
MagicTime Time-lapse Video Generation Models as Metamorphic Simulators Recent advances in Text-to-Video generation (T2V) have achieved remarkable success in synthesizing high-quality general videos from textual descriptions. A largely overlooked problem in T2V
Can't wait to show the world what our model can do! Have faith in the scaling law and keep cooking🫕
A good day. Testing our new ✨Reka Core✨ model and it's showing promising capabilities. Complex table understanding is one of them. Lmk if you are interested in early access @RekaAILabs
"UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions," Siddhant Arora, Hayato Futami, Jee-weon Jung, Yifan Peng, Roshan Sharma, Yosuke Kashiwagi, Emiru Tsunoo, Karen Livescu, Shinji Watanabe, ift.tt/my6W0iX
MERL Speech Team reunion 🤗 as we celebrated the outstanding career of our chairman and former president & CEO Dick Waters, who retires this week. 33 years on, MERL is stronger than ever thanks to his leadership 👏🙏 @merl_news @shinjiw_at_cmu