The ideal design separates the mathematical operation from the parallelization method, so that the same parallelization infrastructure works with different operations. A first attempt passes the operation as a runtime parameter and parameterizes the parallelization method on the operation type. For CUDA, however, the operation has to be a non-type template parameter, so that host code and device code each resolve to their own version of it.
Lightning Talk: Crafting CUDA Compatible C++ Code - Jon White - CppCon 2025
https://cppcon.org

An annoying part of writing CUDA code is needing to repeat program logic in CUDA kernels. C++20 provides a way to write this code in a single location accessible from both the host and device.

Work at Hudson River Trading (HRT): https://tinyurl.com/safxfctf

CppCon is the annual, week-long face-to-face gathering for the entire C++ community. The conference is organized by the C++ community for the community. You will enjoy inspirational talks and a friendly atmosphere designed to help attendees learn from each other, meet interesting people, and generally have a stimulating experience. Taking place this year in Aurora, Colorado, near the Denver airport, and including multiple diverse tracks, the conference will appeal to anyone from C++ novices to experts.

Annual CppCon Conference - https://www.cppcon.org

Videos Filmed & Edited by Bash Films: http://www.BashFilms.com
YouTube Channel Managed by Digital Medium Ltd: https://events.digital-medium.co.uk
attending any conference, it's incredibly important to be there. That's really the only way to dedicate your time and be immersed in the whole thing, without being distracted by other things going on, even if you do get distracted by interesting conversations in the hallway.
>> Hi, my name is Jon, and I'm going to be talking about crafting CUDA-compatible C++ code.
So basically the problem is that I'm trying to write a parallel math library that needs to run on both CPU and GPU, but I don't want to write anything twice, and I don't want CUDA features in my C++ (CPU) code. As a simple example of an operation you want to parallelize, take single-precision a·x plus y (SAXPY). It's an embarrassingly parallel problem, because every index is independent of all the other indices. So if you want to parallelize it on a CPU, you create your vector of threads and distribute the work across them; if you want to parallelize it on a GPU, I've implemented it as a grid-stride loop in a CUDA kernel.
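To make the starting point concrete, here is a minimal sketch of the single-purpose CPU version, assuming the inputs are equal-length `std::vector<float>`s and the work is split into contiguous chunks, one per `std::thread`. The name `saxpy_cpu` is hypothetical; the talk's slide code is not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Single-purpose CPU SAXPY: y[i] = a * x[i] + y[i] for every i.
// The operation (a * x + y) and the parallelization method (a vector of
// std::thread, each handling one contiguous chunk) are fused in one function.
void saxpy_cpu(float a, const std::vector<float>& x, std::vector<float>& y,
               unsigned num_threads) {
    assert(num_threads > 0 && "need at least one thread");
    assert(x.size() == y.size() && "x and y must have the same length");
    const std::size_t n = y.size();
    const std::size_t chunk = (n + num_threads - 1) / num_threads;  // ceil(n / T)
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads && t * chunk < n; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(n, begin + chunk);
        // Every index is independent, so the chunks need no synchronization.
        threads.emplace_back([&, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                y[i] = a * x[i] + y[i];
            }
        });
    }
    for (auto& th : threads) th.join();
}
```

A GPU counterpart written this way would repeat the same `a * x[i] + y[i]` expression inside its own kernel, which is exactly the duplication the talk sets out to remove.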
But the issue is that those were both single-purpose functions: you had to write both the operation and the parallelization method every time. Ideally, we want to separate concerns into the operation and the parallelization method.
So, step one, obligatory: constexpr all the things.
The reason this is important is that NVCC, the NVIDIA compiler, accepts the experimental relaxed constexpr flag (`--expt-relaxed-constexpr`), which allows all of your constexpr functions to be used on both the host and the device. You write the function once, and you don't have to use any CUDA keywords.
So, if you can read that: up at the top we have single-precision a·x plus y as a constexpr function, and it is called from both of the parallelization methods, the one doing the CPU threading and the one doing the CUDA kernel.
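A sketch of that layout, assuming the GPU path is compiled with `nvcc --expt-relaxed-constexpr`: the operation is defined once as an unannotated `constexpr` function, the CPU path calls it from a vector of threads, and a grid-stride CUDA kernel calls the same function. The kernel sits behind `#ifdef __CUDACC__`, so an ordinary C++ compiler sees only the host code. The names `saxpy`, `saxpy_threads`, and `saxpy_kernel` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// The operation, written once, with no CUDA keywords. Assumption: when built
// with nvcc --expt-relaxed-constexpr, device code may call this function.
constexpr float saxpy(float a, float x, float y) { return a * x + y; }

// Parallelization method 1: a vector of CPU threads, one contiguous chunk each.
void saxpy_threads(float a, const std::vector<float>& x, std::vector<float>& y,
                   unsigned num_threads) {
    assert(num_threads > 0 && "need at least one thread");
    const std::size_t n = y.size();
    const std::size_t chunk = (n + num_threads - 1) / num_threads;
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads && t * chunk < n; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(n, begin + chunk);
        threads.emplace_back([&, begin, end] {
            for (std::size_t i = begin; i < end; ++i) y[i] = saxpy(a, x[i], y[i]);
        });
    }
    for (auto& th : threads) th.join();
}

#ifdef __CUDACC__
// Parallelization method 2 (only compiled by nvcc): a grid-stride loop in a
// CUDA kernel, calling the same constexpr saxpy with no __device__ annotation.
__global__ void saxpy_kernel(float a, const float* x, float* y, int n) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x) {
        y[i] = saxpy(a, x[i], y[i]);
    }
}
#endif
```

The kernel would be launched in the usual way, e.g. `saxpy_kernel<<<blocks, threads>>>(a, d_x, d_y, n)` with device pointers `d_x` and `d_y`; launch configuration and memory management are omitted. Note that the operation is now shared, but each parallelization method still names `saxpy` directly.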
So, step two: follow the example of the STL and pass in an operation, instead of calling it from each of your parallelization methods. We're going to pass the operation as a runtime parameter, and the parallelization method is going to be parameterized on the type of the operation.
So now we're able to pass the SAXPY op to both of our parallelization methods.
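A host-only sketch of step two, assuming an operation signature `float(float a, float x, float y)`; `parallel_transform` is a hypothetical name. The operation's type `Op` is a template parameter and the operation itself is an ordinary runtime argument, in the style of `std::transform`. On the CPU this works as expected: passing `saxpy` deduces `Op` as a function-pointer type, and a lambda works too. The failure the talk turns to next only appears when the same pattern is used for the CUDA kernel, which is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// The operation, written once.
constexpr float saxpy(float a, float x, float y) { return a * x + y; }

// STL style: the parallelization method is a template on the *type* of the
// operation, and the operation itself is passed as a runtime argument.
template <typename Op>
void parallel_transform(Op op, float a, const std::vector<float>& x,
                        std::vector<float>& y, unsigned num_threads) {
    assert(num_threads > 0 && "need at least one thread");
    const std::size_t n = y.size();
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        // Strided split: thread t handles indices t, t + T, t + 2T, ...
        threads.emplace_back([&, t] {
            for (std::size_t i = t; i < n; i += num_threads) {
                y[i] = op(a, x[i], y[i]);
            }
        });
    }
    for (auto& th : threads) th.join();
}
```

Usage: `parallel_transform(saxpy, 2.0f, x, y, 4);` runs SAXPY, and any other callable with the same signature can be passed in its place without touching the threading code.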
And so now we have separation of concerns, and everything is written only once, except that this doesn't work. So, does anyone know what's wrong with this?
The problem is that the operation is being passed at runtime, so it doesn't correctly resolve to the host or device version of the function.
The CUDA version is actually getting the host version of the function.
And so the CUDA kernel is going to silently fail when you try to call it.
So, the actual final step: make the operation a non-type template parameter. That forces resolution at compile time, so the host code gets the host version and the CUDA code gets the CUDA version. This is what that looks like: still just a constexpr function defining the operation, but now we're passing the operation in as a template parameter, not as one of the runtime parameters. And so this works now.
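A host-only sketch of the final step, again using the hypothetical `parallel_transform` and the assumed signature `float(float a, float x, float y)`. The operation is now a non-type template parameter (`template <auto Op>`) and is selected with `parallel_transform<&saxpy>(...)`; `scale_x` is a second, hypothetical operation showing that the same infrastructure is reused unchanged. Only the host side is shown; per the talk, declaring the CUDA kernel the same way is what makes device code pick up the device version of the operation.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Still just constexpr functions defining the operations.
constexpr float saxpy(float a, float x, float y) { return a * x + y; }
// Hypothetical second operation, reusing the same infrastructure.
constexpr float scale_x(float a, float x, float /*y*/) { return a * x; }

// The operation is a non-type template parameter: which function is called
// is fixed when the template is instantiated, at compile time, instead of
// being carried around as a runtime function-pointer value.
// (Per the talk, a CUDA kernel declared the same way, template <auto Op>
// __global__ ..., lets nvcc bind the device version of Op in device code.)
template <auto Op>
void parallel_transform(float a, const std::vector<float>& x,
                        std::vector<float>& y, unsigned num_threads) {
    assert(num_threads > 0 && "need at least one thread");
    const std::size_t n = y.size();
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        threads.emplace_back([&, t] {
            for (std::size_t i = t; i < n; i += num_threads) {
                y[i] = Op(a, x[i], y[i]);  // Op is a compile-time constant
            }
        });
    }
    for (auto& th : threads) th.join();
}
```

Function pointers are used as the template arguments here because a pointer to a function with linkage is a valid non-type template argument in both C++17 and C++20.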
We did it. We now have a way of executing parallel operations that work on either the CPU or the GPU, without having to write anything twice and without using CUDA in code that's meant for the CPU.
Thank you. [applause]