Kimi K2.6 achieves impressive speed, but rapid execution without improved accuracy just means making mistakes faster. It is a powerful tool for efficiency that places a heavy burden on the user to maintain control.
Kimi K2.6 - The End of Slow LLMs
Kimi K2.6 is *fast*. Really fast. Almost ... too fast? In this video we explore why having a faster model is amazing, but also why you still need to pay attention.

Relevant links:
https://github.com/koaning/kimi-k2.6-exploration
https://www.coreweave.com/blog/coreweave-is-now-the-fastest-at-inference-on-the-best-open-source-model-kimi-k2-6
https://wandb.ai/inference/coreweave/cw_moonshotai_Kimi-K2.6

00:00 Introduction
00:36 The Engine
01:21 Starting the Notebook
02:49 Careful now
04:47 Settings

Links:
Website: https://marimo.io
Discord: https://marimo.io/discord
Reddit: https://www.reddit.com/r/marimo_notebook/
Twitter: https://x.com/@marimo_io
Tiktok: https://www.tiktok.com/@marimo.io
Instagram: https://www.instagram.com/marimo_io
Bluesky: https://bsky.app/profile/marimo.io
Newsletter: https://marimo.io/newsletter
Kimi K2.6, at least to me, is a milestone in speed when it comes to LLMs. I'm about to do a kind of weird demo by giving OpenCode the prompt to write me a 100-line rhyming poem about Python, and we're going to number every line while we're at it. I'm going to hit run, and it's going to generate something that's way too hard to read because it's just going by so darn quick. And I'll just give it an extra second. There we go. This took about 9.2 seconds, and Kimi K2.6 is a pretty capable model at that.

So, in this video, what I'd like to do is just give a demo of what it's like to actually do some notebook work when you have a model like this. But before getting there, I just want to make one comment, and that is about the provider that I'm using here, because this is happening over the Weights & Biases inference engine. This Weights & Biases inference engine is something that you can use via the Python API, which is cool. But the thing that's really neat about it is if you go to OpenRouter, you can actually compare this Weights & Biases model over here, and you can see the latency is almost half a second. We get about 120 tokens per second here. If you were to compare that to the entries directly above or below it in the OpenRouter list, well, that is 2 seconds of latency and 50 tokens per second, or maybe even going down to 34. This Weights & Biases model is hosted by CoreWeave, and they've definitely been investing in this, and it shows. We really have one of the fastest models on one of the fastest deployments that we're playing with here.

Now, to play with this, what I'm going to do is I'm going to go to molab, which is a sandbox environment for these Python notebooks, and I'm going to take this OpenCode command over here, paste that into the terminal, and then I've got to take this authentication token and put it in. And now, OpenCode is going to boot up as it would normally, but it's going to try to connect to this running instance over here.
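To put the latency and throughput figures quoted above side by side, here is a rough back-of-the-envelope sketch. It assumes total time is simply the latency plus output tokens divided by tokens per second, and the 1,000-token output size is an assumption for illustration, not a number from the video.

```python
def estimated_seconds(output_tokens: int, latency_s: float, tokens_per_s: float) -> float:
    """Rough wall-clock estimate: wait for the first token, then stream the rest."""
    return latency_s + output_tokens / tokens_per_s


# Latency and throughput are the figures quoted in the video.
# OUTPUT_TOKENS is an assumed response size, chosen only for illustration.
OUTPUT_TOKENS = 1000

fast = estimated_seconds(OUTPUT_TOKENS, latency_s=0.5, tokens_per_s=120)
slow = estimated_seconds(OUTPUT_TOKENS, latency_s=2.0, tokens_per_s=50)

print(f"fast: {fast:.1f}s, slow: {slow:.1f}s, ratio: {slow / fast:.1f}x")
# prints "fast: 8.8s, slow: 22.0s, ratio: 2.5x"
```

For longer outputs the throughput term dominates, so the gap between deployments is driven mostly by tokens per second rather than by latency.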
It's going to try to use an internal scratchpad, but I can see the toast appear already.
It's ready to pair program. Now, as a demo, what I'm going to do is I'm actually going to mimic an analysis task.
It's a dataset that I already know very well. It's a dataset from World of Warcraft with datetime stamps. It's a heartbeat kind of dataset, so you can do things like detect sessions and then try to find bots. I'm just going to see if it can actually see the variable. So, do you see df? Just to confirm that it can read that data frame. You can see that it's running some code. It's trying to fetch the globals that are in this notebook. So, yep, there we go. It can read that data frame. So, now we can ask it to do stuff with it.

I'm going to dictate now. I would like to find bots in the dataset and as a first step, I think we've got to add a session ID. Let's make a sessionize function that can be applied to this dataset. I'm going to run this and it's going to go off and yeet into the distance. You can see it's going to try to run a whole bunch of code, and the moment that something runs, it can then add a new cell to the notebook on the side. And there we go. I can see that it just made a new cell and it wrote the code.

And what I could do now is say this looks pretty good. Let's move on and detect some bots. But this is also the point in time where we have to be slightly wary, because it is going to yeet off into the distance. It just made a new cell. It made a new data frame. I would have to actually go in and inspect what it actually wrote here. And you know, to some extent it's doing the right things, but here I already feel like I've got to remind it to write proper pipeline functions. And oh, it's cool that you have a data frame here with bots, but there are some assumptions being made here, like when a session is out of bounds and that sort of thing. And the code as it is right now is not written with flexibility in mind. It just did the one thing that I asked it to do, which is good. But now imagine that I gave it a longer task: it would really just yeet off into the distance. So, if you're going to use a model like this, boy, is it quick.
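As an illustration of the sessionize step described above, here is a minimal sketch in plain Python. It is not the code the model wrote in the video: the field names player and timestamp, the 30-minute gap, and the sample heartbeats are all assumptions made up for this example.

```python
from datetime import datetime, timedelta
from itertools import groupby


def sessionize(rows, max_gap=timedelta(minutes=30)):
    """Add a session_id to every heartbeat row.

    A new session starts for a player whenever the next heartbeat arrives
    more than max_gap after the previous one. The field names "player" and
    "timestamp" are hypothetical, as is the default gap.
    """
    ordered = sorted(rows, key=lambda r: (r["player"], r["timestamp"]))
    out = []
    session_id = -1
    for _, group in groupby(ordered, key=lambda r: r["player"]):
        previous = None  # reset per player so sessions never span players
        for row in group:
            if previous is None or row["timestamp"] - previous > max_gap:
                session_id += 1
            out.append({**row, "session_id": session_id})
            previous = row["timestamp"]
    return out


# Made-up heartbeats, only for illustration.
t0 = datetime(2024, 1, 1, 12, 0)
heartbeats = [
    {"player": "a", "timestamp": t0},
    {"player": "a", "timestamp": t0 + timedelta(minutes=10)},
    {"player": "a", "timestamp": t0 + timedelta(hours=3)},
    {"player": "b", "timestamp": t0 + timedelta(minutes=5)},
]

sessions = sessionize(heartbeats)
print([r["session_id"] for r in sessions])  # prints [0, 0, 1, 2]
```

The gap threshold is exactly the kind of assumption worth inspecting cell by cell: change max_gap and the number of sessions changes with it. On a data frame the same logic is usually a per-player sort followed by a shifted timestamp difference.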
But because it's so quick, it's actually maybe also a good idea to take it one cell at a time so that you are in the loop, because that's the biggest risk with this thing. It is so quick that if you really let it run for a mile, it will run 10. Let's clean this code up a little bit, because we can introduce a function that has a couple of parameters that let us play around.
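A function with a couple of parameters to play around with could look roughly like the sketch below. The function name, the 16-hour cutoff, and the sample data are hypothetical; the video does not show the actual thresholds, so treat this only as an example of making the "when is a session out of bounds" assumption explicit.

```python
from datetime import timedelta


def flag_suspected_bots(session_lengths, max_session=timedelta(hours=16), min_long_sessions=1):
    """Return players with at least min_long_sessions sessions longer than max_session.

    Both thresholds are hypothetical defaults, exposed as parameters so the
    assumptions can be changed without rewriting the pipeline.
    """
    flagged = []
    for player, lengths in session_lengths.items():
        long_sessions = sum(1 for length in lengths if length > max_session)
        if long_sessions >= min_long_sessions:
            flagged.append(player)
    return sorted(flagged)


# Made-up session lengths per player, only for illustration.
lengths = {
    "a": [timedelta(hours=2), timedelta(hours=3)],
    "b": [timedelta(hours=20), timedelta(hours=30)],
    "c": [timedelta(hours=17)],
}

print(flag_suspected_bots(lengths))                                 # prints ['b', 'c']
print(flag_suspected_bots(lengths, max_session=timedelta(hours=18)))  # prints ['b']
print(flag_suspected_bots(lengths, min_long_sessions=2))            # prints ['b']
```

Because the thresholds are arguments, each rerun of the cell tests one assumption at a time, which fits the one-cell-at-a-time approach.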
Right now the script is doing a lot of things. Let's also add a chart as an intermediate result that shows us the distribution of the session time per session and maybe also the maximum session time per player. That's, by the way, something I like to do a lot. When it's going off into the distance and when it might be going super quick, having some charts gives you something to hold on to, just in case it's doing something that you don't want it to do.
If something is up, if something is iffy, the charts usually tell you really quickly. It definitely made some charts. It also seems to generate the same charts twice for some reason, but that's okay. That's just an artifact of the way that it's being called.
Just call plot show there. No need to mention the figure as well. But okay, you get the point. It is really cool to have a model this quick. It is definitely a different kind of frontier model, you could say. It's maybe not the highest quality model, but it definitely offers a different kind of utility just because it's so quick. But when models become quick, they also introduce new risks, and the main risk here is that it's so quick that it's going to go for miles even if you just wanted it to go for a few steps. Now, if you're keen to give this a spin yourself, one thing that I will do is link to this opencode.json file in the show notes, because this is the configuration that you need if you want to do something like this yourself.
opencode.json is pretty simple. You have to configure a provider. It has to follow the OpenAI SDK, but that's definitely taken care of for you. Just make sure it's pointing to the right inference endpoint. And then within that provider, you want to specify the models that you can quickly pick. So in this case, you want to go for this Moonshot AI Kimi K2.6.
And in there, you've got to make sure that you set your authorization properly. So you want to make sure that the Weights & Biases API key is in there. And before you start up OpenCode, you also want to make sure that the environment variable is properly sourced. With that configured, though, you now have access to probably the fastest useful open source LLM right now, on one of the fastest inference engines right now.
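The configuration steps described above (a provider, an inference endpoint, a model entry, and an API key read from the environment) can be sketched as a small Python script that prints a candidate opencode.json. The real file is the one linked in the show notes; the provider key, base URL, model id, and environment variable name below are assumptions and should be checked against that file and the provider's documentation.

```python
import json

# Everything below is an assumed example, not the file shown in the video:
# - "wandb" as the provider key and "W&B Inference" as its display name
# - the base URL of the inference endpoint
# - "moonshotai/Kimi-K2.6" as the model id
# - WANDB_API_KEY as the environment variable holding the API key
config = {
    "$schema": "https://opencode.ai/config.json",
    "provider": {
        "wandb": {
            # OpenAI-compatible adapter, matching "it has to follow the OpenAI SDK"
            "npm": "@ai-sdk/openai-compatible",
            "name": "W&B Inference",
            "options": {
                "baseURL": "https://api.inference.wandb.ai/v1",
                # Read from the environment so the key never lands in the file
                "apiKey": "{env:WANDB_API_KEY}",
            },
            "models": {
                "moonshotai/Kimi-K2.6": {"name": "Kimi K2.6"},
            },
        }
    },
}

text = json.dumps(config, indent=2)
print(text)
```

Keeping the key in an environment variable is why the variable has to be sourced before OpenCode starts.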