
Einsummable
Indexed: 2026-05-15

231 views · 1 · 03:16 · ChristopherJermaine-b6u · Originally published: 2026-05-15

While the conventional system waits for post-processing to complete before starting its second wave of matmul operations, Einsummable is already on its fourth and final wave of matmul operations. Due to the long pause required for post-processing in conventional systems, they finish much later than fine-grained systems like Einsummable, which maintain continuous computation throughout.

[00:00:00]Einsummable uses a fine-grained dataflow graph to execute a computation on a set of GPUs.

[00:00:06]Fine-grained means that for a matrix multiply chain like X * Y * Z, Einsummable breaks the computation up into a large number of pieces.

[00:00:17]This is the graph that Einsummable might use to run the computation on four GPUs. The operations it uses in this example are matmul, move, and matadd.

[00:00:28]The graph is produced automatically by Einsummable.
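To make the fine-grained decomposition concrete, here is a minimal sketch of a single product X * Y split into partial products that are then summed. It only illustrates the idea of matmul and matadd pieces; the function names and the block count `k` are hypothetical and are not taken from Einsummable, and the "move" step is implied rather than modeled.

```python
# Illustrative only: a blocked matrix product built from "matmul" and
# "matadd" pieces. All names here are hypothetical, not Einsummable's API.

def matmul(a, b):
    """Dense product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def matadd(a, b):
    """Elementwise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def split_cols(m, k):
    """Split a matrix into k column blocks of equal width."""
    w = len(m[0]) // k
    return [[row[i * w:(i + 1) * w] for row in m] for i in range(k)]

def split_rows(m, k):
    """Split a matrix into k row blocks of equal height."""
    h = len(m) // k
    return [m[i * h:(i + 1) * h] for i in range(k)]

def blocked_matmul(x, y, k):
    """Compute x @ y as k partial products that are then aggregated.

    Each partial product could run on a different GPU; the running sum
    stands in for moving partial results and adding them (matadd).
    """
    partials = [matmul(xb, yb)
                for xb, yb in zip(split_cols(x, k), split_rows(y, k))]
    total = partials[0]
    for p in partials[1:]:
        total = matadd(total, p)
    return total

X = [[1, 2, 3, 4], [5, 6, 7, 8]]
Y = [[1, 0], [0, 1], [1, 1], [2, 0]]
result = blocked_matmul(X, Y, 2)
```

With `k = 2`, the two partial products are independent of each other, which is what lets a scheduler place them on different devices and aggregate them afterwards.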

[00:00:32]In this example, Einsummable starts by running a first wave of matmul operations, load-balanced across the four GPUs.

[00:00:41]Once those matmul operations finish, a second wave of matmul operations starts.

[00:00:46]At the same time, Einsummable begins the transmission of results from the first wave through the network, where they are aggregated.

[00:00:55]Einsummable is an asynchronous, work-conserving system where communication and computation are overlapped, and there is no pause for communication.
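The overlap of communication and computation can be sketched with a background worker that handles the transfer of one wave while the next wave is computed. This is a toy model under stated assumptions: `compute_wave` and `send_and_aggregate` are hypothetical stand-ins for the matmul and the move plus matadd steps, and a single-threaded executor plays the role of the network.

```python
from concurrent.futures import ThreadPoolExecutor

def compute_wave(i):
    """Hypothetical stand-in for one wave of matmul work."""
    return sum(k * k for k in range(1000 * (i + 1)))

def send_and_aggregate(total, partial):
    """Hypothetical stand-in for moving a partial result and adding it in."""
    return total + partial

def run_overlapped(num_waves):
    """Compute wave i+1 while wave i's result is sent and aggregated."""
    total = 0
    pending = None  # future for the transfer that is currently in flight
    with ThreadPoolExecutor(max_workers=1) as network:
        for i in range(num_waves):
            # The transfer submitted in the previous iteration runs on the
            # "network" thread while this wave is being computed.
            partial = compute_wave(i)
            if pending is not None:
                total = pending.result()
            pending = network.submit(send_and_aggregate, total, partial)
        if pending is not None:
            total = pending.result()
    return total

overlapped_total = run_overlapped(4)
```

The main loop never blocks on the transfer of the wave it has just computed; it only collects the previous transfer, which has had a full wave of compute time to finish.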

[00:01:05]Once the second wave of matmul operations finishes, the third wave of matmul operations begins.

[00:01:12]As the third wave runs, the intermediate results from the second wave are transmitted through the network and aggregated.

[00:01:19]Again, there is no pause for communication.

[00:01:22]Then the third wave finishes, and the fourth wave of matmul operations begins.

[00:01:28]As the fourth wave runs, the results of the third wave are transmitted to the location required for the final aggregation.

[00:01:36]The entire computation has been run without the need to pause and wait for communication and aggregation.
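A rough way to see the benefit is to compare finish times with and without overlap. The model below is an assumption made for illustration: every wave takes the same compute time, every transfer takes the same communication time, transfers are serialized on one link, and the durations are made-up numbers rather than measurements from the video.

```python
def overlapped_finish(num_waves, compute, comm):
    """Finish time when wave i's transfer runs during wave i+1's compute."""
    compute_done = 0.0
    comm_done = 0.0
    for _ in range(num_waves):
        compute_done += compute  # GPUs never wait for the network
        # A transfer starts once its wave is done and the link is free.
        comm_done = max(comm_done, compute_done) + comm
    return max(compute_done, comm_done)

def serialized_finish(num_waves, compute, comm):
    """Finish time when every wave waits for the previous transfer."""
    return num_waves * (compute + comm)

# Hypothetical durations: 1.0 time unit per wave, 0.5 per transfer.
with_overlap = overlapped_finish(4, compute=1.0, comm=0.5)
without_overlap = serialized_finish(4, compute=1.0, comm=0.5)
```

Under these assumed numbers, only the last transfer remains exposed when transfers overlap with compute, while the serialized schedule pays for all four.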

[00:01:43]Let's compare with how a conventional system, such as PyTorch or JAX, might operate.

[00:01:49]Typically, the system would break things up so that the amount of work matches the number of GPUs.

[00:01:55]Keep an eye on the comparison with Einsummable, shown running at the same time at the bottom right.

[00:02:02]The conventional system starts its first wave of matmul ops, which take twice as long as the Einsummable matmul ops because they operate on larger chunks.

[00:02:15]We're going to have to wait a little while for those matmul ops to finish.

[00:02:22]Once those matmul ops finish, the process of post-processing the results begins.

[00:02:28]In this case, that means concatenating the intermediate results to form larger submatrices, and then replicating those submatrices on the appropriate GPUs.

[00:02:39]Note that while this is happening, the GPUs sit idle as no expensive matmul operations are being run.

[00:02:46]Finally, the second wave of matmul operations can begin.

[00:02:52]Note that at this point, Einsummable is already on its fourth and final wave of matmul ops.

[00:03:00]Now, Einsummable finishes the computation.

[00:03:05]Because of the long pause required to post-process the outputs of the first wave, the conventional system finishes much later than Einsummable.
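The comparison can be reproduced with a small timeline model. The numbers are assumptions chosen to match the narration qualitatively: four fine-grained waves of one time unit each, two conventional waves that take twice as long, and a hypothetical post-processing pause of 1.5 units between the conventional waves.

```python
def fine_grained_schedule(num_waves, wave_time):
    """(start, end) of each matmul wave when nothing pauses between waves."""
    return [(i * wave_time, (i + 1) * wave_time) for i in range(num_waves)]

def conventional_schedule(num_waves, wave_time, pause):
    """(start, end) of each wave when a post-processing pause follows a wave."""
    spans, t = [], 0.0
    for _ in range(num_waves):
        spans.append((t, t + wave_time))
        t += wave_time + pause
    return spans

def wave_running_at(schedule, time):
    """1-based index of the wave active at `time`, or None if idle."""
    for i, (start, end) in enumerate(schedule, start=1):
        if start <= time < end:
            return i
    return None

fine = fine_grained_schedule(num_waves=4, wave_time=1.0)
conv = conventional_schedule(num_waves=2, wave_time=2.0, pause=1.5)

second_wave_start = conv[1][0]  # when the conventional second wave begins
fine_wave_then = wave_running_at(fine, second_wave_start)
fine_finish = fine[-1][1]
conv_finish = conv[-1][1]
```

In this model the conventional second wave starts at time 3.5, when the fine-grained schedule is already in its fourth wave, and the conventional run ends 1.5 units later than the fine-grained one, which is exactly the assumed pause.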
