Alibaba says its I2VGen-XL model can handle "visualization, sampling, training, inference, joint training using images and videos, acceleration, and more."
Alibaba Cloud, a subsidiary of Chinese conglomerate Alibaba Group and one of the world's largest cloud computing companies, has unveiled its I2VGen-XL AI tool. It's an advanced image-to-video system, guided by text prompts, that's intended to compete against top-of-the-line models like the ones released by Pika Labs or Stability AI.
The company announced the release of the model’s weights today after publishing the model’s research paper last month.
According to the paper, I2VGen-XL is built on cascaded diffusion models, a technique designed to produce videos that are not only visually impressive but also contextually coherent and semantically accurate. It works in two stages: the base stage keeps the video coherent with the input text and image, and the refinement stage enhances the details and raises the resolution to as much as 1280x720 pixels.
The approach may sound familiar to anyone who has generated images with SDXL. Unlike SD 1.5 and SD 2.1, which relied on a single model, SDXL splits the work between two models, a base and a refiner, which Stability AI designed to be used together to produce the best possible image quality.
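To make the two-stage idea concrete, here is a toy Python sketch of how a cascade hands work from one stage to the next. It runs no diffusion model at all: `base_stage`, `refinement_stage`, and `cascaded_generate` are hypothetical stand-ins, and the 448x256 base resolution and 16-frame clip length are illustrative assumptions, not figures from Alibaba. Only the 1280x720 output resolution comes from the description above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Video:
    width: int
    height: int
    frames: List[str]  # placeholder frame labels, not real pixel data


def base_stage(prompt: str, image_id: str, num_frames: int = 16,
               size: Tuple[int, int] = (448, 256)) -> Video:
    # Hypothetical stand-in for the base model: every frame is tied to the
    # input image and prompt, produced at a low resolution (assumed values).
    frames = [f"{image_id}|{prompt}|t={i}" for i in range(num_frames)]
    return Video(width=size[0], height=size[1], frames=frames)


def refinement_stage(video: Video,
                     target: Tuple[int, int] = (1280, 720)) -> Video:
    # Hypothetical stand-in for the refinement model: it keeps the base
    # stage's content and frame order, and only marks each frame as refined
    # while raising the resolution to the target size.
    refined = [f"{frame}|refined" for frame in video.frames]
    return Video(width=target[0], height=target[1], frames=refined)


def cascaded_generate(prompt: str, image_id: str) -> Video:
    # The cascade itself: the refinement stage consumes the base stage's output.
    low_res = base_stage(prompt, image_id)
    return refinement_stage(low_res)


if __name__ == "__main__":
    video = cascaded_generate("a cat surfing at sunset", "input.png")
    print(video.width, video.height, len(video.frames))
```

The point of the structure is the hand-off: the first stage settles what happens in the clip, and the second stage only adds detail and resolution on top of that result, similar in spirit to SDXL's refiner polishing the base model's output.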
Alibaba Cloud says the model was trained on an extensive dataset of around 35 million text-video pairs and a staggering 6 billion text-image pairs. A dataset this large is meant to make the model versatile and accurate across a wide range of scenarios and subjects.
A new model amidst an AI arms race
This release comes as the global tech landscape is witnessing heightened tensions and competition, particularly between the US and China. Against a backdrop of trade restrictions and a push for technological self-reliance, Alibaba's move is both timely and strategically significant for China.
Alibaba's latest innovation is not an isolated development but part of a longer story of technological rivalry. With the US restricting chip exports and China responding with its own countermeasures, the race for AI supremacy has accelerated. This environment has spurred advances in domestic technologies, with both nations vying for the lead in AI, semiconductors, and 5G.
Compared with other notable releases in the field, such as Pika Labs' model and Stability AI's Stable Video Diffusion, I2VGen-XL stands out for its two-stage cascaded approach and high semantic accuracy. A demo with several examples of using HiGen (another diffusion model) together with I2VGen-XL shows a major improvement in temporal and frame consistency compared with using HiGen alone.
Alibaba's I2VGen-XL model represents a significant milestone in the AI landscape because it provides an alternative to models that are either banned for Chinese users or could be restricted in the future by the US or Chinese governments.
Alibaba’s emerging tech plays
Alibaba's ambitions reach well beyond e-commerce. It has been a significant player in emerging technologies for a while, consistently pushing new developments in AI, the metaverse, software, and even digital currencies.
In AI-driven animation, Alibaba's "Animate Anyone" model also stands out alongside I2VGen-XL. The tool transforms static images into dynamic animations using a novel framework called ReferenceNet, and by integrating diffusion models it produces temporally stable, visually consistent videos.
Alibaba Cloud also partnered with Avalanche to launch its Cloudverse platform, which is designed to give businesses a straightforward way to create and maintain their own metaverse environments. The strategic alliance with Avalanche, together with Metaverse Universal Assets DAO's work on middleware solutions, highlights Alibaba's collaborative approach and its commitment to Web3 technologies.