cuML profiling tools arrive in RAPIDS 25.08 update

Nov 03, 2025


NVIDIA introduced cuML profiling tools in the RAPIDS 25.08 release, giving developers clearer insight into accelerated machine learning performance. Alongside the data science update, Isaac Lab 2.3 delivers whole-body robot control, richer teleoperation, and new evaluation workflows. Together, these updates target faster iteration from data prep to deployment.

cuML profiling tools: what’s new

The latest RAPIDS 25.08 update adds two cuML profilers. The function-level profiler surfaces which operations run on the GPU and which fall back to the CPU. The line-level profiler goes deeper to pinpoint bottlenecks inside code blocks.

As a result, practitioners can see performance hotspots without rewriting pipelines. Moreover, the tools help teams decide when to refactor algorithms or adjust data flow. Therefore, users can measure gains from minor code changes with confidence.

The profilers integrate with cuml.accel, which accelerates common estimators without changing API calls. Additionally, teams can compare wall-clock gains across runs. Consequently, reproducible tuning becomes easier in continuous integration and notebook workflows.
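For teams already running scikit-learn code through cuml.accel, the sketch below shows the kind of pipeline the new profilers report on. The cuml.accel.install() pattern is the documented zero-code-change path; the exact profiler entry point (a CLI flag or notebook magic) is an assumption here, so confirm it against the RAPIDS 25.08 release notes.

```python
# Minimal sketch: an unchanged scikit-learn pipeline run under cuml.accel, so the
# RAPIDS 25.08 profilers can report which calls ran on the GPU and which fell back.
#
# Assumption: the function- and line-level profilers are enabled through cuml.accel
# itself (for example a flag such as `python -m cuml.accel --profile train.py` or a
# notebook magic); check the 25.08 release notes for the exact invocation.

import cuml.accel
cuml.accel.install()  # route supported estimators to the GPU, no API changes

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200_000, n_features=40, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=200)   # dispatched to cuML when supported
clf.fit(X_train, y_train)                # a profiler run would flag any CPU fallback here
print("accuracy:", clf.score(X_test, y_test))
```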

RAPIDS 25.08 expands data science features

Beyond profiling, RAPIDS broadens data handling. The Polars GPU engine adds a default streaming executor that processes datasets larger than device memory. This unlocks bigger workloads during feature engineering and joins. For context, the Polars GPU engine focuses on speed with familiar DataFrame ergonomics.
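For illustration, here is a hedged sketch of a join-and-aggregate query collected on the Polars GPU engine, where the streaming executor handles data larger than device memory. The executor="streaming" argument and the file paths are assumptions for the sake of the example, not API details confirmed by the article.

```python
# Minimal sketch: a lazy Polars query collected on the GPU engine. With the 25.08
# streaming executor (now the default), the query can process inputs larger than
# device memory. The explicit `executor="streaming"` knob shown here is an
# assumption; a plain collect(engine="gpu") uses the engine defaults.

import polars as pl

orders = pl.scan_parquet("orders/*.parquet")        # hypothetical dataset paths
customers = pl.scan_parquet("customers/*.parquet")

result = (
    orders.join(customers, on="customer_id", how="inner")
          .group_by("region")
          .agg(pl.col("amount").sum().alias("total_amount"))
)

# Collect with the GPU engine; unsupported operations fall back to the CPU engine.
df = result.collect(engine=pl.GPUEngine(executor="streaming"))
print(df.head())
```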

Furthermore, type coverage improves with struct data and new string operators. In practice, this reduces conversions between preprocessing steps. Developers also gain more predictable memory behavior in mixed CPU and GPU pipelines.

On the algorithms side, cuML adds Spectral Embedding for dimensionality reduction. It also introduces LinearSVC, LinearSVR, and KernelRidge estimators. Importantly, these estimators can run with zero code changes through cuml.accel. That approach lowers the barrier for production teams adopting accelerated estimators.

  • Spectral Embedding for compact representations and clustering readiness
  • LinearSVC and LinearSVR for efficient linear margin and regression tasks
  • KernelRidge for flexible, kernelized regression on structured features

Therefore, practitioners can try multiple models faster during model selection. Moreover, the combination of streaming dataframes and profiling supports end-to-end optimization.
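As an illustration of that zero-code-change path, the sketch below exercises the newly accelerated estimators through their scikit-learn interfaces with cuml.accel installed. Dataset sizes and the synthetic target are assumptions; exact dispatch coverage is per the 25.08 release notes.

```python
# Minimal sketch: trying the newly accelerated estimators via their scikit-learn
# APIs. With cuml.accel installed, these calls are dispatched to cuML in 25.08
# without code changes (coverage per release notes).

import cuml.accel
cuml.accel.install()

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.kernel_ridge import KernelRidge
from sklearn.manifold import SpectralEmbedding
from sklearn.svm import LinearSVC

X, y = make_blobs(n_samples=20_000, n_features=30, centers=5, random_state=0)

# Compact representation ahead of clustering or visualization.
emb = SpectralEmbedding(n_components=2).fit_transform(X)

# Linear-margin classification on the original features.
svc = LinearSVC().fit(X, y)

# Kernelized regression on a synthetic continuous target (subset to keep the
# kernel matrix small in this sketch).
target = X[:, 0] * 0.5 + np.sin(X[:, 1])
krr = KernelRidge(kernel="rbf").fit(X[:5000], target[:5000])

print(emb.shape, svc.score(X, y), krr.score(X[:5000], target[:5000]))
```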

Whole-body robot control in Isaac Lab 2.3

In parallel, NVIDIA advanced sim-first robotics with Isaac Lab 2.3. The release emphasizes humanoid capability through whole-body control and improved locomotion. It also tightens the loop between policy learning and evaluation.

Automatic Domain Randomization (ADR) and Population Based Training (PBT) arrive as scaling levers for reinforcement learning. ADR injects variability across textures, lighting, and physics to aid sim-to-real transfer. In contrast, PBT explores hyperparameters dynamically to improve convergence. For background, DeepMind’s overview of Population Based Training outlines the strategy’s benefits in nonstationary training.
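To make the ADR idea concrete, here is a generic sketch (not Isaac Lab's API): each episode samples physics and visual parameters from ranges, and the ranges widen as the policy keeps succeeding, so training sees progressively harder variation.

```python
# Illustrative sketch of automatic domain randomization (not Isaac Lab's API).

import random

ranges = {"friction": [0.9, 1.1], "mass_scale": [0.95, 1.05], "light": [0.8, 1.2]}

def sample_env_params():
    # Draw one randomized environment configuration per episode.
    return {k: random.uniform(lo, hi) for k, (lo, hi) in ranges.items()}

def expand_ranges(success_rate, step=0.05, threshold=0.8):
    # Widen every range once the policy clears the success threshold (the ADR idea).
    if success_rate >= threshold:
        for lo_hi in ranges.values():
            lo_hi[0] -= step
            lo_hi[1] += step

# Training-loop skeleton: randomize, run an episode, then adapt the difficulty.
for episode in range(3):
    params = sample_env_params()
    success_rate = 0.85  # placeholder for the policy's evaluated performance
    expand_ranges(success_rate)
    print(episode, params, ranges)
```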

Moreover, Isaac Lab adds a dictionary observation space to blend perception and proprioception. This helps policies learn manipulation with richer cues. Additionally, a motion planner-based workflow supports generating labeled data for grasping and placement.

Enhanced teleoperation and data generation

Isaac Lab 2.3 expands teleoperation support to accelerate demonstrations. The platform now works with devices like Meta Quest VR and Manus gloves. As a result, teams can capture high-fidelity manipulation trajectories faster.

Furthermore, teleop data complements imitation learning samples out of the box. Therefore, researchers can bootstrap policies before switching to reinforcement learning. Meanwhile, expanded device coverage lowers hardware setup friction for labs.

The release also introduces a policy evaluation framework named Isaac Lab – Arena, co-developed with Lightwheel. The framework enables scalable simulation-based benchmarking of learned skills. Consequently, teams can compare policies across tasks, seeds, and disturbances with consistent metrics.

Policy evaluation framework and reproducibility

Evaluation often lags training in robotics and ML. Arena aims to standardize the loop with repeatable scenarios and datasets. Moreover, it supports larger experiment grids that stress-test generalization.

Additionally, Arena encourages best practices around logging and versioning. Therefore, organizations can track regressions as they adjust reward shaping or curriculum schedules. Notably, the approach aligns with broader MLOps trends in model validation.

Domain randomization also plays a role in evaluation. For an overview of the concept, OpenAI’s early note on domain randomization highlights why varied simulation pays off in deployment. In turn, Arena can combine randomized conditions with deterministic seeds for robust comparisons.
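As a generic illustration of that combination (not the Arena API), the sketch below scores two policies over a fixed grid of tasks, disturbances, and seeds, so comparisons stay repeatable across training changes. The evaluate function is a placeholder for launching a seeded simulation episode.

```python
# Generic sketch of a reproducible evaluation grid (not the Isaac Lab - Arena API):
# every policy is scored on the same (task, disturbance, seed) combinations.

import itertools
import random
import statistics

tasks = ["pick_place", "drawer_open"]
disturbances = ["none", "push", "lighting_shift"]
seeds = [0, 1, 2]

# Placeholder success probabilities standing in for real learned policies.
policy_quality = {"baseline": 0.5, "candidate": 0.6}

def evaluate(policy, task, disturbance, seed):
    # Placeholder: a real harness would launch the scenario with this fixed seed
    # and return episode success; here we just draw from a seeded RNG.
    rng = random.Random(seed)
    return rng.random() < policy_quality[policy]

results = {}
for policy in policy_quality:
    scores = [
        evaluate(policy, t, d, s)
        for t, d, s in itertools.product(tasks, disturbances, seeds)
    ]
    results[policy] = statistics.mean(scores)

print(results)
```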

Why these updates matter

The cuML profilers demystify where speedups happen and where they do not. This visibility speeds up model selection and pipeline tuning. Moreover, it reduces guesswork when scaling from prototype to production.

RAPIDS’ Polars streaming and new estimators further compress iteration cycles. Additionally, broader type support cuts friction in feature engineering. Therefore, data teams can focus on experimentation instead of dataset surgery.

On the robotics side, whole-body control and richer teleoperation address practical gaps in data quality and policy expressiveness. Furthermore, Arena’s emphasis on scalable evaluation brings discipline to benchmarking. As a result, research groups can compare results more fairly and ship skills with clearer safety margins.

What practitioners should do next

Teams using RAPIDS should profile current ML pipelines now. Start with the function-level view, then drill into line-level hotspots. Additionally, track CPU fallbacks and memory transfers that hide inside preprocessing.

Data engineers can test the Polars streaming executor on large joins and group-bys. Moreover, try Spectral Embedding for compact representations before clustering. Consequently, you may see faster convergence with fewer features.

Robotics groups should refresh demonstration capture setups with supported VR devices and gloves. Furthermore, schedule ADR and PBT sweeps to improve robustness and convergence. Finally, adopt Arena-like evaluation to standardize metrics across tasks and teams.

Outlook

These releases push machine learning toward measurably faster iteration and stronger generalization. The cuML profiling tools improve visibility across the modeling stack. Meanwhile, Isaac Lab 2.3 focuses on closing the sim-to-real gap with better control, data, and evaluation.

Therefore, organizations that adopt these features early should deliver models with fewer surprises in production. Moreover, the combined focus on profiling, streaming data, and reproducible evaluation sets a practical blueprint for the next wave of ML deployments.
