11thestate

BABA: Alibaba Reinvents RL with GSPO for Qwen3 — But $433M Ant IPO Fallout Still Weighs

Court: S.D. New York

Case: 1:20-cv-09568

Alibaba (BABA) researchers have introduced Group Sequence Policy Optimization (GSPO), a reinforcement learning (RL) algorithm designed to power the next generation of large language models, including Qwen3. GSPO addresses stability flaws in earlier methods such as GRPO, enabling more scalable and reliable training of massive models, particularly Mixture-of-Experts (MoE) architectures.

Key Innovations Behind GSPO

  • Resolves instability in GRPO by replacing token-level importance ratios and clipping with sequence-level ones
  • Aligns importance sampling with full sequence likelihood for improved reward consistency
  • Enables efficient training without high-variance noise from token-level estimates
  • Outperforms GRPO across AIME’24, LiveCodeBench, and CodeForces benchmarks
  • Supports MoE training by stabilizing expert activations without routing replay workarounds
  • Reduces infrastructure costs by avoiding recomputation and using inference engine likelihoods
  • Drives performance gains in Qwen3-30B-A3B-Base and other advanced LLMs
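
To make the sequence-level idea concrete, here is a minimal sketch in plain Python. It is not Alibaba's implementation: the function names, the length-normalized form of the ratio, the group-normalized advantage, and the clipping range eps=0.2 are all illustrative assumptions. It shows one importance ratio computed per whole response and clipped once per response, rather than once per token.

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards):
    """Normalize rewards within one group of responses to the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All responses scored the same: no learning signal for this group.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


def sequence_ratio(new_logps, old_logps):
    """Sequence-level importance ratio, length-normalized (an assumption).

    new_logps / old_logps hold per-token log-probabilities of one response
    under the current and the old policy.
    """
    if len(new_logps) != len(old_logps) or not new_logps:
        raise ValueError("need two non-empty lists of equal length")
    log_ratio = sum(n - o for n, o in zip(new_logps, old_logps))
    return math.exp(log_ratio / len(new_logps))


def sequence_level_objective(new_logps, old_logps, rewards, eps=0.2):
    """Clipped surrogate objective averaged over a group of responses.

    eps is a hypothetical clipping range chosen for illustration only.
    """
    advantages = group_advantages(rewards)
    terms = []
    for n, o, adv in zip(new_logps, old_logps, advantages):
        s = sequence_ratio(n, o)
        s_clipped = min(max(s, 1.0 - eps), 1.0 + eps)
        # One clipping decision per response, not per token.
        terms.append(min(s * adv, s_clipped * adv))
    return sum(terms) / len(terms)
```

With two responses to the same prompt, the first rewarded and the second not, a call such as sequence_level_objective([[-0.5, -0.5], [-1.0, -1.0]], [[-1.5, -1.5], [-1.0, -1.0]], [1.0, 0.0]) clips the first response's ratio at 1 + eps, so a large policy shift on a rewarded response cannot dominate the update.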

Strategic Impact

GSPO allows Alibaba to scale RL methods efficiently across massive models while avoiding collapse or overfitting. By eliminating the mismatch between reward signals and optimization targets, the method streamlines infrastructure and improves the reliability of results — a key milestone in Qwen3’s rapid progress.

But a $433.5 Million Settlement Over Ant Group Fallout Still Weighs

Timeline Overview

  • November 5, 2019: Chinese regulators warned Alibaba and peers about antitrust and fintech compliance
  • November 2, 2020: Ant executives summoned by China’s central bank ahead of IPO
  • November 3, 2020: Ant IPO suspended; BABA dropped 8%
  • December 24, 2020: Antitrust probe revealed; BABA dropped 13% in a single day
  • April 22, 2022: Investors filed suit alleging withheld regulatory risks

Allegations Include

  • Failing to disclose compliance risks to investors ahead of Ant’s IPO
  • Misleading public markets about the true regulatory climate facing Ant
  • Downplaying potential impact of new credit and lending rules on Ant’s business model

Investor Update

Alibaba ultimately agreed to a $433.5 million settlement with shareholders who alleged the company misled them about the regulatory landscape surrounding Ant Group. The suit focused on material omissions that affected the anticipated IPO and contributed to major share price drops.

More information about the settlement, including how to file for a payout, is available HERE.