CCCL 2.x ‐ CCCL 3.0 migration guide
The CCCL team plans breaking changes carefully and only conducts them at major releases. The CCCL 2.8 release came with many deprecations to prepare for the breaking changes conducted in CCCL 3.0. This page summarizes the changes and helps you migrate from CCCL 2.x to CCCL 3.0.
See also the list of all deprecated APIs in CCCL 2.8 and the list of breaking changes in CCCL 3.0.
Removed macros
CUB_IS_INT128_ENABLED
: No replacement

CUB_MAX(a, b)
: Use the cuda::std::max(a, b) function instead

CUB_MIN(a, b)
: Use the cuda::std::min(a, b) function instead

CUB_QUOTIENT_CEILING(a, b)
: Use cuda::ceil_div(a, b) instead

CUB_QUOTIENT_FLOOR(a, b)
: Use plain integer division a / b instead

CUB_ROUND_DOWN_NEAREST(a, b)
: Use cuda::round_down(a, b) instead

CUB_ROUND_UP_NEAREST(a, b)
: Use cuda::round_up(a, b) instead

CUB_RUNTIME_ENABLED
: No replacement

CUB_USE_COOPERATIVE_GROUPS
: No replacement

CUDA_CUB_RET_IF_FAIL
: No replacement

[THRUST|CUB]_CLANG_VERSION
: No replacement

[THRUST|CUB]_DEVICE_COMPILER*
: No replacement

[THRUST|CUB]_GCC_VERSION
: No replacement

[THRUST|CUB]_HOST_COMPILER*
: No replacement

[THRUST|CUB]_INCLUDE_DEVICE_CODE
: No replacement

[THRUST|CUB]_INCLUDE_HOST_CODE
: No replacement

[THRUST|CUB]_IS_DEVICE_CODE
: No replacement

[THRUST|CUB]_IS_HOST_CODE
: No replacement

[THRUST|CUB]_MSVC_VERSION_FULL
: No replacement

[THRUST|CUB]_MSVC_VERSION
: No replacement

THRUST_CDP_DISPATCH
: No replacement (support for CUDA Dynamic Parallelism V1 (CDPv1) has been removed, see below)

THRUST_DECLTYPE_RETURNS_WITH_SFINAE_CONDITION
: No replacement

THRUST_DECLTYPE_RETURNS
: No replacement

THRUST_DEVICE_CODE
: No replacement

THRUST_HOST_BACKEND
: Use THRUST_HOST_SYSTEM instead

THRUST_INLINE_CONSTANT
: Use inline constexpr instead

THRUST_INLINE_INTEGRAL_MEMBER_CONSTANT
: Use static constexpr instead

THRUST_LEGACY_GCC
: No replacement

THRUST_MODERN_GCC_REQUIRED_NO_ERROR
: No replacement

THRUST_MODERN_GCC
: No replacement

THRUST_MVCAP
: No replacement

THRUST_NODISCARD
: Use [[nodiscard]] instead

THRUST_RETOF1
: No replacement

THRUST_RETOF2
: No replacement

THRUST_RETOF
: No replacement

THRUST_TUNING_ARCH
: No direct replacement. Use the compiler-specific __CUDA_ARCH__ (nvcc) or __NVCOMPILER_CUDA_ARCH__ (nvc++) instead
Removed functions and classes
_ReadWriteBarrier and __thrust_compiler_fence
: Use cuda::atomic instead

cub::*Kernel
: Any CUB kernel entry point is considered an implementation detail. No public exposure is provided.

cub::Agent*
: CUB agents are considered implementation details and have all been moved to internal namespaces. No public exposure is provided.

cub::AliasTemporaries
: No replacement

cub::ArrayWrapper
: Use cuda::std::array instead

cub::BAR
: No replacement

cub::BaseTraits::CATEGORY
: Use the facilities from <cuda/std/type_traits> instead

cub::BaseTraits::NULL_TYPE
: No replacement

cub::BaseTraits::PRIMITIVE
: Use the facilities from <cuda/std/type_traits> instead

cub::BFI
: Use cuda::bitfield_insert instead

cub::BinaryOpHasIdxParam::HAS_PARAM
: Use cub::BinaryOpHasIdxParam::value instead

cub::ConstantInputIterator
: Use thrust::constant_iterator instead

cub::CountingInputIterator
: Use thrust::counting_iterator instead

cub::CTA_SYNC_AND
: Use __syncthreads_and() instead

cub::CTA_SYNC_OR
: Use __syncthreads_or() instead

cub::CTA_SYNC
: Use __syncthreads() instead

cub::Device*Policy
: These policy hubs are considered implementation details. No public exposure is provided.

cub::DeviceSpmv
: Use cuSPARSE instead

cub::Difference
: Use cuda::std::minus instead

cub::DivideAndRoundUp
: Use cuda::ceil_div instead

cub::Division
: Use cuda::std::divides instead

cub::Equality
: Use cuda::std::equal_to instead

cub::FFMA_RZ
: No replacement

cub::FMUL_RZ
: No replacement

cub::FpLimits<T>
: Use cuda::std::numeric_limits<T> instead

cub::GridBarrier
: Use the APIs from cooperative groups instead

cub::GridBarrierLifetime
: Use the APIs from cooperative groups instead

cub::IADD3
: No replacement

cub::Inequality
: Use cuda::std::not_equal_to instead

cub::Int2Type
: Use cuda::std::integral_constant instead

cub::IterateThreadLoad
: No replacement

cub::IterateThreadStore
: No replacement

cub::KernelConfig
: No replacement

cub::LaneId()
: Use cuda::ptx::get_sreg_laneid() instead

cub::LaneMaskGe()
: Use cuda::ptx::get_sreg_lanemask_ge() instead

cub::LaneMaskGt()
: Use cuda::ptx::get_sreg_lanemask_gt() instead

cub::LaneMaskLe()
: Use cuda::ptx::get_sreg_lanemask_le() instead

cub::LaneMaskLt()
: Use cuda::ptx::get_sreg_lanemask_lt() instead

cub::MakePolicyWrapper
: No replacement

cub::Max
: Use cuda::maximum instead

cub::max
: Use cuda::std::max instead

cub::MemBoundScaling
: No replacement

cub::Min
: Use cuda::minimum instead

cub::min
: Use cuda::std::min instead

cub::Mutex
: Use std::mutex instead

cub::PolicyWrapper
: No replacement

cub::PRMT
: Use cuda::ptx::prmt() instead

cub::RegBoundScaling
: No replacement

cub::SHFL_IDX_SYNC
: Use __shfl_sync() instead

cub::SHL_ADD
: No replacement

cub::SHR_ADD
: No replacement

cub::Sum
: Use cuda::std::plus instead

cub::Swap(a, b)
: Use cuda::std::swap(a, b) instead

cub::ThreadTrap()
: Use cuda::std::terminate() instead

cub::TransformInputIterator
: Use thrust::transform_iterator instead

cub::TripleChevronFactory
: No replacement for now; we are working on a new kernel launch facility

cub::ValueCache
: No replacement

cub::WARP_ALL
: Use __all_sync() instead

cub::WARP_ANY
: Use __any_sync() instead

cub::WARP_BALLOT
: Use __ballot_sync() instead

cub::WARP_SYNC
: Use __syncwarp() instead

cub::WarpId()
: Use cuda::ptx::get_sreg_warpid() instead

thrust::*::[first_argument_type|second_argument_type|result_type]
: The nested aliases have been removed for all function object types: thrust::[plus|minus|multiplies|divides|modulus|negate|square|equal_to|not_equal_to|greater|less|greater_equal|less_equal|logical_and|logical_or|logical_not|bit_and|bit_or|bit_xor|identity|maximum|minimum|project1st|project2nd]. No replacement.

thrust::[unary|binary]_function
: No replacement. If you inherit from one of these types, simply remove those base classes.

thrust::[unary|binary]_traits
: No replacement

thrust::async::*
: No replacement for now; we are working on a C++26 senders implementation. To make a Thrust algorithm skip synchronization, use thrust::cuda::par_nosync as the execution policy.

thrust::bidirectional_universal_iterator_tag
: No replacement

thrust::conjunction_value<Ts...>
: Use cuda::std::bool_constant<(Ts && ...)> instead

thrust::conjunction_value_v<Ts...>
: Use a fold expression (Ts && ...) instead

thrust::cuda_cub::core::*
: These are considered implementation details. No public exposure is provided.

thrust::cuda_cub::counting_iterator_t
: Use thrust::counting_iterator instead

thrust::cuda_cub::identity
: Use cuda::std::identity instead

thrust::cuda_cub::launcher::triple_chevron
: No replacement for now; we are working on a new kernel launch facility

thrust::cuda_cub::terminate
: Use cuda::std::terminate() instead

thrust::cuda_cub::transform_input_iterator_t
: Use thrust::transform_iterator instead

thrust::cuda_cub::transform_pair_of_input_iterators_t
: Use a thrust::transform_iterator over a thrust::zip_iterator instead

thrust::disjunction_value<Ts...>
: Use cuda::std::bool_constant<(Ts || ...)> instead

thrust::disjunction_value_v<Ts...>
: Use a fold expression (Ts || ...) instead

thrust::forward_universal_iterator_tag
: No replacement

thrust::identity<T>
: Use cuda::std::identity instead. If thrust::identity was used to perform a cast to T, please define your own function object.

thrust::input_universal_iterator_tag
: No replacement

thrust::negation_value<T>
: Use cuda::std::bool_constant<!T> instead

thrust::negation_value_v<T>
: Use a plain negation !T instead

thrust::not[1|2]
: Use cuda::std::not_fn instead

thrust::null_type
: No replacement

thrust::numeric_limits<T>
: Use cuda::std::numeric_limits<T> instead

thrust::optional<T>
: Use cuda::std::optional<T> instead

thrust::output_universal_iterator_tag
: No replacement

thrust::random_access_universal_iterator_tag
: No replacement

thrust::remove_cvref[_t]
: Use cuda::std::remove_cvref[_t] instead

thrust::void_t
: Use cuda::std::void_t instead
Deprecations with planned removal
CUB_LOG_SMEM_BANKS
: No replacement

CUB_LOG_WARP_THREADS
: No replacement

CUB_MAX_DEVICES
: No replacement

CUB_PREFER_CONFLICT_OVER_PADDING
: No replacement

CUB_PTX_LOG_SMEM_BANKS
: No replacement

CUB_PTX_LOG_WARP_THREADS
: No replacement

CUB_PTX_PREFER_CONFLICT_OVER_PADDING
: No replacement

CUB_PTX_SMEM_BANKS
: No replacement

CUB_PTX_SUBSCRIPTION_FACTOR
: No replacement

CUB_PTX_WARP_THREADS
: No replacement

CUB_SMEM_BANKS
: No replacement

CUB_SUBSCRIPTION_FACTOR
: No replacement

CUB_WARP_THREADS
: No replacement

THRUST_FALSE
: No replacement

THRUST_PREVENT_MACRO_SUBSTITUTION
: No replacement

THRUST_STATIC_ASSERT(expr)
: Use static_assert(expr) instead

THRUST_TRUE
: No replacement

THRUST_UNKNOWN
: No replacement

THRUST_UNUSED_VAR
: No replacement

cub::BFE
: Use cuda::bitfield_extract instead

cub::MergePathSearch
: No replacement

cub::Traits<T>::Max()
: Use cuda::std::numeric_limits<T>::max() instead

cub::Traits<T>::Min()
: Use cuda::std::numeric_limits<T>::min() instead

thrust::iterator_difference[_t]<T>
: Use cuda::std::iterator_traits<T>::difference_type or cuda::std::iter_difference_t<T> instead

thrust::iterator_pointer[_t]<T>
: Use cuda::std::iterator_traits<T>::pointer instead

thrust::iterator_reference[_t]<T>
: Use cuda::std::iterator_traits<T>::reference or cuda::std::iter_reference_t<T> instead

thrust::iterator_traits<T>
: Use cuda::std::iterator_traits<T> instead

thrust::iterator_value[_t]<T>
: Use cuda::std::iterator_traits<T>::value_type or cuda::std::iter_value_t<T> instead
API breaks
cub::Block*
: All trailing int LEGACY_PTX_ARCH template parameters have been removed

cub::CachingAllocator
: The constructor taking a trailing bool debug parameter has been removed

cub::Device*
: All overloads with a trailing bool debug_synchronous parameter have been removed

cub::Dispatch*
: All Boolean template parameters have been replaced by enumerations to increase readability

cub::Dispatch*
: All policy hub template parameters have been moved to the back of the template parameter list

cub::DispatchScan[ByKey]
: The offset type must be an unsigned type of at least 4 bytes

cuda::ceil_div
: Now returns the common type of its arguments

thrust::pair
: Is now an alias to cuda::std::pair and no longer a distinct type

thrust::tabulate_output_iterator
: The value_type has been fixed to be void

thrust::transform_iterator
: Upon copying, will now always copy its contained function. If the contained function is neither copy-constructible nor copy-assignable, the iterator fails to compile when it is copied.

thrust::tuple
: Is now an alias to cuda::std::tuple and no longer a distinct type

thrust::universal_host_pinned_memory_resource
: The alias has changed to a different memory resource, potentially changing pointer types derived from an allocator/container using this memory resource

The following Thrust function object types have been made aliases to the equally named types in cuda::std: thrust::[plus|minus|multiplies|divides|modulus|negate|equal_to|not_equal_to|greater|less|greater_equal|less_equal|logical_and|logical_or|logical_not|bit_and|bit_or|bit_xor|identity|maximum|minimum]

CUB_DEFINE_DETECT_NESTED_TYPE
: The generated detector trait no longer provides a ::VALUE member. Use ::value instead.
Iterator traits
cuda::std::iterator_traits
will now correctly recognize user-provided specializations of std::iterator_traits
.
All of Thrust’s iterator traits have been redefined in terms of cuda::std::iterator_traits
,
and users should prefer to use iterator traits from libcu++.
thrust::iterator_traits
can no longer be specialized.
Users should prefer to specialize cuda::std::iterator_traits
instead of std::iterator_traits
when necessary,
to make their iterators work equally in device code.
CUB Traits
The functionality and internal use of cub::Traits
has been minimized, because libcu++ provides better and standard alternatives.
Only the use in CUB’s radix sort implementation for bit-twiddling remains.
Floating-point limits should be obtained using cuda::std::numeric_limits<T>
instead of cub::FpLimits<T>
.
Classification of types should be done with the facilities from <cuda/std/type_traits>
and <cuda/type_traits>
,
notably with cuda::std::is_signed[_v]
, cuda::std::is_integral[_v]
, etc.
There is an important difference for extended floating point types though:
Since cuda::std::is_floating_point[_v]
will only recognize C++ standard floating point types,
cuda::is_floating_point[_v]
must be used to correctly classify extended floating point types like __half
or __nv_bfloat16
.
cub::BaseTraits
and cub::Traits
can no longer be specialized for custom types, and cub::FpLimits
has been removed.
We acknowledge the need to support user-defined floating-point types, though,
e.g., registering a custom half type with CUB for use in radix sort.
Therefore, users can still specialize cub::NumericTraits
for their custom floating point types,
inheriting from cub::BaseTraits
and providing the necessary information for the type.
Additionally, the traits from libcu++ have to be specialized as well:
For example, a custom floating point type my_half
could be registered with CUB and libcu++ like this:
```cpp
template <>
inline constexpr bool ::cuda::is_floating_point_v<my_half> = true;

template <>
class ::cuda::std::numeric_limits<my_half> {
public:
  static constexpr bool is_specialized = true;
  static __host__ __device__ my_half max() { return /* TODO */; }
  static __host__ __device__ my_half min() { return /* TODO */; }
  static __host__ __device__ my_half lowest() { return /* TODO */; }
};

template <>
struct CUB_NS_QUALIFIER::NumericTraits<my_half> : BaseTraits<FLOATING_POINT, true, uint16_t, my_half> {};
```
Behavioral changes
cub::DeviceReduce::[Arg][Max|Min]
: Will now use cuda::std::numeric_limits<T>::[max|min]() instead of cub::Traits to determine the initial value

cuda::std::mdspan
: The implementation was entirely rewritten, and you may experience subtle behavioral changes

thrust::transform_iterator
: The logic to determine the reference type has been reworked, especially with respect to functions that return references to their own arguments (e.g., thrust::identity)

thrust::transform_iterator::difference_type
: The logic to select the difference type has been reworked. It is now either int or ptrdiff_t.
ABI breaks
All of libcu++’s old ABI namespaces have been removed
Platform support
At least C++17 is required
At least clang 14 is required
At least GCC 7 is required
On Windows, at least Visual Studio 2019 is required (_MSC_VER >= 1920)
Intel ICC (icpx) is no longer supported
At least CUDA Toolkit 12.0 is required
Support for CUDA Dynamic Parallelism V1 (CDPv1) has been removed
At least a GPU with compute capability 50 (Maxwell) is required