The Delphi Podcast
Ben Fielding: Gensyn’s Polar Opposite Architecture vs AI Data Centers, Fueling an AI-Native Internet, Open vs. Closed Source AI and RL Swarm
Join Tom Shaughnessy as he hosts Ben Fielding, co-founder of Gensyn Network, for an in-depth exploration of decentralized AI infrastructure and the future of machine learning. From Gensyn's evolution as a low-level infrastructure platform to its revolutionary approach with RL Swarm, the conversation covers transformative developments in distributed computing, model training, and the philosophical implications of AI development.
🎯 Key Highlights
▸ Gensyn's approach to agnostic, low-level AI infrastructure designed to transcend trends
▸ Transition from vertical to horizontal scaling in AI computing systems
▸ The limitations of centralized hyperscaler data centers and how Gensyn provides an alternative
▸ Deep dive into RL Swarm: enabling peer-to-peer reinforcement learning at scale
▸ The philosophical importance of open, personalized AI models vs centralized systems
▸ Exploration of AI's future: parameter space, model communication, and verification
▸ Vision for how decentralized AI can democratize machine learning technology
💡 Want to stay updated with the latest in crypto & AI? Hit subscribe and the notification bell! 🔔
🧠 Follow the Alpha
▸ Tom's Twitter: @shaughnessy119
▸ Ben's Twitter: @fenbielding
🔗 Connect with Delphi
🌐 Portal: https://delphidigital.io/
🐦 Twitter: https://twitter.com/delphi_digital
💼 LinkedIn: https://www.linkedin.com/company/delphi-digital
🎧 Listen on
Spotify: https://open.spotify.com/show/62PR1RigLG2YN5Pelq6UY9?si=18ac7ccf36ab4753
Apple Podcasts: https://podcasts.apple.com/us/podcast/the-delphi-podcast/id1438148082
YouTube: https://www.youtube.com/channel/UC9Yy99ZlQIX9-PdG_xHj43Q
Timestamps
00:00 - Introduction to Gensyn Network
01:29 - Building AI-Agnostic Infrastructure
05:00 - Vertical vs Horizontal Scaling in AI
10:50 - Limitations of Centralized Data Centers
17:42 - Gensyn's Core Components and Vision
20:45 - The Internet as Parameter Space
28:00 - Execution, Communication, and Verification
34:42 - RL Swarm: Decentralized Reinforcement Learning
41:30 - Model Communication and Improvement
48:46 - Testnet and Persistent Identity
53:56 - Preserving Model Personalization
59:42 - Open vs Closed Source Models
01:05:30 - The Future of AI Development
01:11:20 - Value Capture in AI and Final Thoughts
Disclaimer
This podcast is strictly informational and educational and is not investment advice or a solicitation to buy or sell any tokens or securities or to make any financial decisions. Do not trade or invest in any project, tokens, or securities based upon this podcast episode. The host and members at Delphi Ventures may personally own tokens or art that are mentioned on the podcast. Our current show features paid sponsorships which may be featured at the start, middle, and/or the end of the episode. These sponsorships are for informational purposes only and are not a solicitation to use any product, service or token.
You're now plugged into the Delphi Podcast. Hey everyone, it's Tommy from Delphi Ventures, and welcome back to the podcast. Today I'm thrilled to have on Ben, who is the co-founder of Gensyn Network. Gensyn is a decentralized, blockchain-based compute network that aggregates underutilized hardware into scalable, trustless infra for cost-effective deep learning model training. Ben, did ChatGPT screw up that intro or what? Is it good?
SPEAKER_01: It's maybe a little bit out of date, but otherwise it's reasonably accurate, I would say. And great to be here.
SPEAKER_00: Yeah, I've seen you talk so many times over the years, like at ETHDenver and on podcasts. I have a fond memory of listening to one of your earliest podcasts where you talked about unlocking the compute on laptops and phones to add more electricity to the quote unquote grid of intelligence. That idea of unlocking always stuck with me. So really glad to have you on.
SPEAKER_01: Nice, glad to hear it.
SPEAKER_00: So, Ben, I'm not trying to date you, but you've been around for a couple of years, and that's like 20 cycles in AI, right? We started out with Midjourney, then we had ChatGPT launch, and then centralized was gonna win, and now we have decentralized sort of winning, and now we have open source from China. It's just so much volatility for an AI founder, enough to question what they're doing, if they're in the right sector, if they're doing the right things. How do you take all that volatility in the AI space and funnel that into Gensyn? What are your views?
SPEAKER_01: Yeah, it's a good question, and it's one that myself and my co-founder have thought about quite a lot. In general, we made a decision very early on. We've been going for five years now, and we made a decision really early on to build low-level infrastructure technology that was agnostic to the changes and whims and trends of machine learning, both as applied out in the world and in research itself. I started my PhD a decade ago. I saw the rise of deep learning within computer vision, so post-AlexNet, all the models that came and went, the rise of RNNs for text generation back in the char-rnn days, and then that moving to Transformers, and all of these changes through research of different model architectures, different approaches, et cetera. When we started Gensyn five years ago, we knew that that was happening. We knew it was going to continue to happen, because back then machine learning was still mostly a research thing. It was being applied in some places, but the vast majority of it was within academia or adjacent to academia. And so we knew it wasn't over yet for those big changes. So making a bet on a specific model architecture or a specific kind of deployment of ML just didn't make any sense. What did make sense is making a bet on the low-level operations that are happening. So we looked at that whole space and we said, okay, matrix multiplications haven't changed. They're not really going anywhere. There's a slight risk from quantum, but outside of that, chances are we're gonna keep doing matmuls and things like that for a very long time. Maybe we'll change the way that we do them, we'll change the higher-level layers and things that are used. But at the end of the day, what's operating on parallel processes is gonna be very consistent. And that's where we care.
So we wanted to unlock those low-level operations rather than higher level uh kind of like task-specific or model-specific operations. Uh, and I think the at the time we when we thought this, we thought it is more difficult to do it that way. You have to go lower level, you have to think about things from first principles, you have to solve some problems that you don't necessarily have to solve if you start at a higher level. Um, an easy example of this is if you think about the verification of uh execution of machine learning operations by an untrusted party, you get loads of tricks you can perform if you narrow down to a specific, like a task-specific problem. If you narrow down to a specific model, you can do tricks on top of that model, you can check the way the model learns and things like that. But you become beholden to that model type or that task type, and you actually can't expand outside of that. And if that task type or model type changes and actually the meta moves on to another one, all of the work that you've done just collapses because it's not interesting to the world anymore. Exact same problem uh happens with custom ASICs. So there's a graveyard or growing graveyard of companies who've gone to build machine learning specific hardware. They've said, okay, transformers, everybody's using transformers, let's build hardware for transformers. They have a huge risk in their business because if suddenly something that isn't Transformers comes along and changes the world, they've spent 50 million plus dollars like taping out this specific chip to do Transformers, and then suddenly it's worthless and they have to move on to the next thing. And in reality, what you see is those companies desperately try and move everybody in the world back to the model that they took a bet on. 
Uh, not necessarily because it's the best model for the world, but because they've spent so much money on it, they've got this like huge sunk cost that they're trying to um get something out of. So we didn't want to end up in a position like that. We focused really, really low level, uh, and we continue to do so. First principles, uh, as low as we can go, agnostic to the whims and changes of the market. Obviously, you can't be completely agnostic, but as agnostic as you can be pragmatically.
SPEAKER_00: So there's something that happens over time, and I'm trying to think of it like path dependency, right? The last 20 years were spent optimizing centralized data centers, right? Having tons of GPUs, they're all co-located, redundant power, fiber, retractable roofs, cooling from the base, the whole nine yards. And now we're starting to get into this world, something that you're enabling, where any device can be on that network, right? Heterogeneous hardware, whatever device, whatever GPU you have, it all adds in. The question I have for you is a velocity question. If data centers continue to get these incredibly fast, powerful chips, there is a lot of latent compute on our Macs. I've never really used the GPU on this thing, on my phone, yada yada. But is there enough to tap into when you go live versus what the centralized players will have in a couple of years? Does that make sense, comparing the two?
SPEAKER_01: Yeah, I'd say so. I think the way I would frame it is as a timing piece. If you look at general distributed systems as a pattern in the past, we go through this process of vertical scaling to get the fastest increases in performance at, usually, a cost that doesn't matter too much to the market. So you say, okay, I've created this system. Can I scale it up in some way? Can I make it bigger or can I make it faster? Yes, by just throwing resources at it, but not changing it fundamentally, just scaling it up in place. And that vertical scaling approach is typically the fastest approach you can take. The problem is it has diminishing returns in terms of expense. At a certain point, that scaling slows down, and it's actually an enormous cost to continue the vertical scaling for not a huge amount of benefit. In my opinion, and I think now increasingly the world's opinion, we've hit that diminishing returns point. There's still a reason to continue scaling vertically, but it's mostly a marketing reason at this point, in my opinion. All of the companies that are doing ML models out in the wild are trying to capture users, in my opinion, because that's where the value is. The way that they do that currently is to beat the benchmarks, have the shiniest new model, have as much deployment as they possibly can to get the users to their platform, and they'll throw cash at that to just make it happen right now. But it's not a long-term, into-the-future scaling approach. It's a short-term, bursty, expensive scaling approach just to capture the users so that later they can scale out more deliberately. That "scale out more deliberately," to me, is horizontal scaling.
So, same as you've seen in the past in again in distributed systems, like uh MapReduce coming around, like a huge horizontal scaling approach to say, actually, we can keep throwing resources in a centralized way and in vertical scaling for these systems. But at a certain point, it makes sense to do the redesign, go through the pain of saying this this task is going to operate slightly differently, it's gonna look different. But instead of it being enormously expensive to add 10% scale to it, we suddenly unlock a hundred X scale, but we had to do this like weird period of redesign where we move to a new way of thinking. Uh, and in my opinion, that's where AI is right now. It's in that weird redesign period where we vertically scaled, it was super interesting, we proved what was possible, diminishing returns, super expensive. We need to move to that new redesign, and the world is discovering that. That discovery process looks at the history of machine learning research, it finds those weird niche areas that for whatever reason we're already doing this. You see this in um some of the cross-data center support, um, the research papers, where they'll take certain things from like the field of federated learning and they'll just reapply them here. And federated learning did them for different reasons. It was data privacy why you would need to average um learning done by many different devices because you didn't want to combine the data. But it turns out doing that averaging process in a communication efficient way is actually highly beneficial for horizontal scaling. So you get a reapplication of those techniques in this new area. So I think we're gonna see a continuation of that reapplication of research techniques that existed for a different reason in other areas to this horizontal scaling point, but also new techniques. So people do the novel research, they figure out new ways of doing that scaling and they apply these. 
And eventually there's this flip point where suddenly everybody's doing it the horizontal way, and it's the logical way to do it. Um, but we're currently in that awkward teen kind of phase where like kind of between the two and everyone's trying to fight for their reason and their way.
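The averaging step Ben borrows from federated learning can be sketched in a few lines. This is a minimal illustration of federated averaging in general, not Gensyn's training code: the function names, the learning rate, and the three-device round are all hypothetical. Each device trains locally, and only the resulting weights (never the raw data) are combined, weighted by how many samples each device saw.

```python
from typing import List, Sequence


def local_step(weights: Sequence[float], grads: Sequence[float], lr: float) -> List[float]:
    """One plain SGD step performed on a single device."""
    return [w - lr * g for w, g in zip(weights, grads)]


def federated_average(device_weights: List[Sequence[float]],
                      sample_counts: Sequence[int]) -> List[float]:
    """Combine per-device weights, weighted by each device's sample count."""
    total = sum(sample_counts)
    dim = len(device_weights[0])
    return [
        sum(w[i] * n for w, n in zip(device_weights, sample_counts)) / total
        for i in range(dim)
    ]


# Hypothetical round: three devices start from the same weights,
# take one local step on their own data, then share only the new weights.
start = [1.0, -2.0]
local_grads = [[0.5, 0.0], [1.0, -1.0], [0.0, 2.0]]
counts = [100, 300, 600]

updated = [local_step(start, g, lr=0.1) for g in local_grads]
merged = federated_average(updated, counts)
```

The only traffic between devices here is the weight vector once per round, which is why the same trick is attractive for horizontal scaling across loosely connected hardware.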
SPEAKER_00: Yeah. I mean, that is a pretty spicy take, though. If I'm understanding correctly, you're talking about all of the work being done by the hyperscalers, like the Googles, the Metas, the OpenAIs, just massively scaling these data centers. Are you saying it has diminishing returns, or that it's just not the end game? What is the limiting factor there?
SPEAKER_01: Well, I think a bit of both. I guess there's another dynamic to this, which is that once you have a vertically scaled data center, that is a useful asset, right? There are certain techniques within machine learning which do particularly benefit from a heavily centralized, hyper-efficient, super-fast-interconnect cluster. That's why it's been scaled in this way. So you look at pre-training, for example, where you have a corpus of enormous amounts of labeled data, you've got the entire internet labeled in one place on fast disks, and you want to move that into parameter space. The most efficient way to do that is always going to be in a centralized context, because you've got all the data in one place and you're just shoving it into this model the fastest way you can, the most effective way you can. But that isn't the only machine learning operation that needs to be performed. And it also doesn't take into account the future collection cost of new data. It assumes that the data's already been collected in one place, it's already been labeled in one place, and it exists in one place. That makes sense when you take the existing internet and you just compress that down into this single data set. But then you look out into the future and you say, okay, machine learning systems aren't going anywhere. They're going to exist alongside us for the rest of time, essentially. They're going to need to incorporate new data constantly from different sources. Does it make sense for us to continue to gather all of that data from all over the world, compress it into these centralized data sets, and then do the training in that one place? I would say no, that doesn't make sense to me. What makes sense to me is, sure, get the quick skip by doing pre-training on what we've already got. Great, you've got a model at a certain point.
But then out into the future, you want these continual living models which constantly benefit from data that exists at the edge, it originates at the edge, and it doesn't necessarily need to be moved. Um, I think uh one of the really sharp pieces of this, I think, that people miss is people say that the communication cost of moving from centralized training to decentralized training is really high. What they forget is we've already paid the communication cost of gathering the data into one place. And so into the future, that exists and that is on top of every kind of training run. Um, so you forget this cost that's been paid by academics in the past gathering the data sets, and then these companies themselves gathering the data sets into one place. There was huge amounts of communication that happened there to make that be the case. Um, and if you want to do this stretching out into the future, you have to incorporate that cost as well as the cost of communication in the training itself. Uh, and I think people forget that.
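Ben's accounting argument can be made concrete with some back-of-the-envelope bookkeeping. Every number below is hypothetical and chosen only to show which terms belong in the comparison; nothing here is a measurement from Gensyn or any real training run. The point is simply that the centralized total has to include the bytes spent gathering raw data into one place, not just the traffic during training.

```python
def centralized_wan_bytes(devices: int, raw_bytes_per_device: int) -> int:
    """Wide-area traffic when every device ships its raw data to one site."""
    return devices * raw_bytes_per_device


def decentralized_wan_bytes(devices: int, update_bytes: int, rounds: int) -> int:
    """Wide-area traffic when data stays put and devices exchange model updates."""
    return devices * update_bytes * rounds


# Hypothetical figures, for illustration only.
devices = 1_000
raw = 50 * 10**9       # 50 GB of new raw data originating on each device
update = 200 * 10**6   # 200 MB model update sent per device per round
rounds = 100

gathered = centralized_wan_bytes(devices, raw)
exchanged = decentralized_wan_bytes(devices, update, rounds)
print(f"gather raw data: {gathered / 1e12:.0f} TB, exchange updates: {exchanged / 1e12:.0f} TB")
```

With different assumptions the comparison can flip, which is the reason to write both terms down rather than treating data gathering as free.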
SPEAKER_00: Yeah, no, those are really nuanced points. Maybe just to double down on the small-brain question, because I want to get your take. When you're looking at these data center builds, like I'm looking at the pictures of Elon's new data center, and I just saw the Instagram of the one in China, which just looks like a city. What is the reason that that is diminishing returns? Is it just that building the ninth building is exponentially more expensive? Do you run out of land? Is it interconnect? What is it that would drive someone to say, hey, look, I don't want to build this next marginal data center, I'm gonna go use the Gensyn network? What do you think is that decision tree at the extreme?
SPEAKER_01: Yeah, I think you just go down the list of resources that you're stressing, basically. The main resources that people shout about are obviously electricity sources. Can you get enough megawatts, in the previous world, and now gigawatts, into this data center to actually fuel the cards? That's one big one. After that, you've got cooling. Can you actually cool a space this large? Do you need to shard it into multiple different buildings? When you do that cooling, do you generate too much noise for the local area? Are you polluting that area? Are you able to even get agreement from the local authority or the government of the country itself to build something like that in that place? Can you get the cards themselves that can be connected up in this hyperfast interconnect way? Do they need to be compatible cards with each other? I.e., do you have to only buy H100s because they're the only things that work with the InfiniBand that you're using, et cetera? You run down this list of things. Some of the ones that people don't expect are the geographical constraints: is there anywhere on the planet that can actually support another one of these data centers? Does a place exist where that can make sense? And there's a weird fight going on between the hyperscalers that a lot of people don't see, of finding the small number of geographical locations and then fighting to get that location for your data center rather than somebody else's. So you walk down this list of constraints that a small number of companies are walking down right now. What I would say is we don't think of ourselves as the latent compute alternative to that. We think of ourselves as an infrastructure software layer to just make that whole process more efficient.
So if you make the analogy to Bitcoin, what Bitcoin provided was the kind of like downstream access to yield generation or the demand side in kind of the machine learning world for those data centers. So it allowed basically anybody to be involved in that process of finding the right location, finding electricity source, finding cooling, et cetera. It opened up that entire world of logistics to, again, like the entire world and allowed them to participate. Where right now, to build one of those data centers, you would need to be Google or Facebook or X or someone like that to have all of the later, all of the ML knowledge, all of the downstream ML use cases, have the chat app, et cetera. You need all of that before it's beneficial to have that data center. We sit in between that and we say, look, there can just be a layer of infrastructure here that allows anybody to use machine learning capable compute. That might be a MacBook. Long term, it might be an iPhone, it might be an Android phone, it might be a smart fridge for all we know, as long as it can do machine learning operations at a market efficient kind of uh rate. But it might also be a data center. It might be a hundred thousand H100s in a data center somewhere run by somebody who doesn't know much about machine learning, but knows a lot about how to deploy data center hardware. Um, and so we just kind of sit as this sandwich layer in between, abstracting away the logistics of running hardware from the realities of running machine learning software itself and training models, running inference on models, post-training models, etc. Those people can be entirely separate if you have this completely market, like this neutral infrastructure layer that everybody can access from both sides.
SPEAKER_00: That's the best argument I've heard. Mentally, I always assume this is war-level spending, the hyperscalers will figure it out, the governments will give the green lights and just make this happen. But when you move from academic to reality, there is this long list of things that you will just butt up against. The one you didn't mention was just energy. We're seeing in the US that these hyperscalers want nuclear, they want power plants everywhere. Is that a limiting factor?
SPEAKER_01: I think in the centralized case, it is a limiting factor, because you have to get an enormous amount of energy to this one location. You might be able to get the physical location to put 200,000 H100s in a big shed somewhere in the middle of nowhere, and you might be able to cool that. Maybe it's miles away from any settlements, noise doesn't matter, you can get the fans in, maybe you do immersion cooling, whatever. You can put all that infrastructure in. But like you say, if you don't have a two gigawatt supply of energy direct into that data center, then you can't run the cards. And so then you get the question: are you pulling from the grid? Are you pulling from the nearest settlement, where that city is then suddenly competing with this enormous data center, whose economy sits on the global scale, whereas the city's doesn't, so the city might actually end up losing out against this data center and you kill a town or a city just to have a data center? Or you put in a completely bespoke power supply: like you say, you build a new nuclear reactor, build small nuclear reactors, whatever. Not to say those things are bad necessarily, but the way that they're done right now is highly constrained. It's a very small number of people who are able to apply these techniques to building a data center. And again, like we saw with Bitcoin mining, you have all of these electricity sources that sit in the middle of nowhere. You've got hydroelectric plants, you've got geothermal vents all over the place. These are good sources of power. There are people who know about those, who know how to apply them. Maybe they own those locations, and they could be building these data centers. They could be deploying cards into them, they could be running machine learning infrastructure, but they don't have that connection to the entire machine learning market.
They don't understand how the dynamics work, and so they're cut off. And it's just waiting for one of the hyperscalers to discover them, and then a deal has to be made, et cetera, and it goes through all this human-world process to be settled out. Gensyn will exist as this, again, pure software infrastructure layer, which says to anyone who has access to those resources: hey, you can capitalize on those resources. We think about it sometimes in the oil refining sense, where the core resource underneath machine learning is energy, it's electricity. It's being refined into machine learning operations by refinement hardware, the same as you would refine oil into kerosene or whatever, some useful thing that you're going to use in the world. That refinement hardware is GPUs. People can buy GPUs, people can make new refinement hardware, there's an entire market for that. What Gensyn does is basically say to people, once you've done that refinement process, here's an entire market for you. You could just go and sell it almost as a commodity, rather than having to become a machine learning company and compete with OpenAI to actually sell what you've refined. You shouldn't have to do that. You should just be able to trade it as a resource. We can't get completely to the commodity stage, because there are lots of dynamics within the market, but we can get far, far closer than we currently sit by changing the way machine learning is performed.
SPEAKER_00: That's awesome, Ben. That's like the best comparison of centralized data centers and what they can do. So I want to switch gears to Gensyn itself. What is the end product of Gensyn? When I think of OpenAI, I just think of more compute, more training, innovating on RL, thinking models. When I look at Gensyn, what am I getting? Am I getting a large model? Am I getting a small model? Am I getting thinking models? What is the output of the network?
SPEAKER_01: Yeah, good question. Like I said earlier in this chat, we think quite first-principles and very low-level about machine learning operations. So at the end of the day, we think, what is happening when I perform machine learning operations? What is happening within these neural networks? Basically, I'm taking raw data of some kind. I'm taking human knowledge or expert knowledge, which could come from models themselves, so some kind of additional knowledge on top of the data. And then finally, electricity. I'm combining all of those things together, and I'm basically doing this generalized compression mechanism to transform the raw data into a form which exists in parameter space and is more usable for a lot of downstream tasks. Foundation models and frontier models do this on many modalities. They compress everything into this shared space, and then within that shared space, we can do all these exciting things that we see from the consumer-facing apps that are out in the world. ChatGPT was the first time that the entire world got exposed to this. Obviously, within research, there were many moments before that where people got exposed to the idea that embedding text and images into the same space allows us to do these weird operations on them, and it's really cool. If you remember DeepDream back in the day, there was this idea that you could walk through latent space around model features or around image features and things like that. Those were the seeds of this exciting stuff you can do if you can compress raw data into parameter space. We've taken that to an extreme now with foundation models, but we can take it even further in the future by just changing most data representations to parameter space. That's our view. So rather than doing this with the concept of a singular model, we think you can do this basically everywhere.
There are difficulties in how you construct objective functions and how you actually create targets for the compression of this data into models, but those are solvable in our view. And rather than being a task for a single researcher to do against the hardware they have, they become more of a global task. The whole world is constructing these objective functions constantly. And in doing that, we're creating not even a model, but a set of models in the future which exists across every device. So this gets a bit abstract, but our view is that the internet itself is changing. Its base data type is becoming parameters. As that happens, we replace raw data storage with ML execution. We stop storing databases of text entries and we actually store model parameters that are compressed versions of all of those text entries, and the same with image entries, et cetera. But once you do that, every time you access that data, you have to perform machine learning operations to access it. You can't just pull it from a database and read it as a human. You have to have this model layer that sits in between. What that means is you need to be executing machine learning operations constantly at every device when accessing data, when going back and forth between parameter space. And so, long story short, to circle back around, that's where Gensyn focuses. We build three main things for the future of machine learning. One is consistent execution: executing machine learning operations on every device in the world that's capable of doing them, in a way where those devices are compatible with each other.
So when you're executing some matrix multiplications within your neural network on your MacBook, if for whatever reason you needed to link that to part of the network executing on an H100 in a data center somewhere, and part of the execution happening on your iPhone, you could link all of those together because they're fully compatible with each other. They're essentially doing the same set of operations. So we build a compiler and a set of libraries that sit on low-level hardware to make that consistent all the way down from high-level frameworks. When you have a model in PyTorch or TensorFlow or JAX, when you actually run that, when you compile it down to execute on hardware, it's consistent between the H100 and the MacBook and the AMD device and the Intel CPU, et cetera, that kind of list of hardware. Once you've got that, you've got this concept that execution is just compatible on every device. The next piece is communication. Once those devices can execute any arbitrary machine learning operation, any piece of a model, any distinct model, can the devices talk to each other in a way where they can send these matrices back and forth? They can communicate over tensors. If we wanted to do pipeline parallelism, data parallelism, model parallelism, expert parallelism, think of any parallelism method, can the devices send the data they need to each other in a standardized way? This is analogous to TCP/IP or something like that, but fully peer-to-peer. The idea is there shouldn't need to be any centralized party. If my MacBook wants to communicate about machine learning execution with my iPhone, it should be able to create a local network and just send data back and forth in a standardized way. And then if I want to drop in access to a server with an H100 in it, that should be able to just join, and it's also sending data back and forth in a standardized way.
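To make "send tensors back and forth in a standardized way" concrete, here is a toy wire format. It is an invented layout for illustration only, not Gensyn's protocol: the names `encode_tensor` and `decode_tensor` and the header layout (rank, then each dimension, then little-endian float32 values) are assumptions. The useful property is that any device that agrees on the layout can decode the bytes, whatever hardware produced them.

```python
import struct
from typing import List, Tuple


def _element_count(shape: Tuple[int, ...]) -> int:
    """Number of values a tensor of this shape holds."""
    count = 1
    for dim in shape:
        count *= dim
    return count


def encode_tensor(shape: Tuple[int, ...], values: List[float]) -> bytes:
    """Pack a float32 tensor as: rank, each dimension, then the raw values."""
    if _element_count(shape) != len(values):
        raise ValueError("shape does not match number of values")
    header = struct.pack(f"<I{len(shape)}I", len(shape), *shape)
    payload = struct.pack(f"<{len(values)}f", *values)
    return header + payload


def decode_tensor(message: bytes) -> Tuple[Tuple[int, ...], List[float]]:
    """Inverse of encode_tensor: recover the shape and the flat value list."""
    (rank,) = struct.unpack_from("<I", message, 0)
    shape = tuple(struct.unpack_from(f"<{rank}I", message, 4))
    values = struct.unpack_from(f"<{_element_count(shape)}f", message, 4 + 4 * rank)
    return shape, list(values)


# A 2x3 activation block that one device might hand to the next pipeline stage.
message = encode_tensor((2, 3), [1.0, 2.5, -3.0, 0.25, 4.0, 8.0])
shape, values = decode_tensor(message)
```

Fixing the byte order and element type in the format, rather than leaving them to each device's defaults, is what lets a laptop and a data center GPU exchange the same message.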
So we build those peer-to-peer communication libraries to allow the devices to communicate. The final piece we build is the verification text. So that says every device in the world is able to execute consistently, every device in the world is able to communicate with every other device in the world. Can they trust each other without having a human involved for some reason? So right now, if I wanted my device to talk to your device about machine learning operations, you and I would probably have to sign a contract that says, yes, your device is going to do what I send it, uh what I tell it to do. Otherwise, like you might just do something completely different. I might not know, our contract is invalid and it the whole thing doesn't work. And so instead of having that real world contract where a judge would arbitrate if we had a dispute and a court system would decide which one of us is right and we'd have to pay each other, we put all of that into the code. So we say within this standard, as they're doing the execution, as they're doing the communication, they can also do the verification at the operation level. So my device sends a set of operations to your device. Your device does slightly more work to generate cryptographic, probabilistic, game theoretic proofs. And then my device can check those using the consensus of a group of devices, and the entire thing can settle out. The devices can trust each other, communicate, and execute without a human ever being involved. And so once you've created that, you have this set of open source software that can run in a private network, a public network over the internet, anywhere where there are devices that can talk to each other and it can facilitate the execution, communication, and verification of machine learning operations. You could build any kind of model you want over that. 
If you have a local network of 100,000 H100s, you can now build a model in any parallelism method you want that will just consistently work over those devices. If you want to link two data centers together, again, everyone's compatible, everyone's doing the same things, everyone's talking the same way, they can work together. If you add in a data center that somebody else owns, where you don't know who they are and you don't even know where the devices are, they can still just drop in and you can spread your model over those devices as well. There's no limit to what you can do with this software once it's spread out. That's the main thing that we build. It settles out all of those human-world aspects, it solves a lot of the software stack issues, and it just gives you access to hardware as a resource. You don't have to think about any of the other stuff.
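To make the verification idea above a little more concrete, here is a minimal sketch of probabilistic spot-checking: a requester re-executes a random sample of the operations it sent out and compares them with what the worker claims. This is only an illustration under stated assumptions, not Gensyn's protocol. The names `matmul` and `spot_check` are hypothetical, and the exact-equality comparison only makes sense if execution is reproducible across devices, which is what the compiler layer described earlier is meant to provide.

```python
import random


def matmul(a, b):
    """Plain-Python matrix multiply, standing in for one ML 'operation'."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def spot_check(tasks, claimed_results, sample_rate, rng):
    """Re-execute a random sample of tasks and compare with the worker's claims.

    Returns True if every sampled result matches. Hypothetical illustration
    only; a real system would add proofs, stakes, and dispute resolution.
    """
    for (a, b), claimed in zip(tasks, claimed_results):
        if rng.random() < sample_rate and matmul(a, b) != claimed:
            return False
    return True


tasks = [([[1, 2], [3, 4]], [[5, 6], [7, 8]]) for _ in range(20)]
honest = [matmul(a, b) for a, b in tasks]
cheating = [[[0, 0], [0, 0]] for _ in tasks]

# An honest worker passes at any sampling rate.
honest_ok = spot_check(tasks, honest, 0.25, random.Random(0))
# With every task sampled, a worker that returns zeros is always caught.
cheater_ok = spot_check(tasks, cheating, 1.0, random.Random(0))
```

Lower sampling rates trade verification cost against the chance of missing a bad result, which is where the "probabilistic, game-theoretic" framing in the conversation comes in.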
SPEAKER_00: So, Ben, those three pieces are huge. The verification side makes a lot of sense to me. The thing that I'm a little lost on is when I think of Gensyn, I think of training these models, fine-tuning them. But from your description of part one, that's a very small subset of what you're doing. I also can't fully grasp it, right? Could you maybe describe it in a different way? I just don't totally understand the internet as parameter space. What does that conceptually mean? I'm a user, I go on the internet, what's different? What's happening?
SPEAKER_01: Yeah, so I guess the way I've explained it so far is heavily infrastructure based. It's not what the vast majority of the world would see. It's the work we had to do over the past five years, basically, to enable the next phase, which is much more heavy productization. We solved problems that we had in conceptualizing how machine learning will work in the future. All these scaling limits, these execution limits where I want to spread models over untrusted devices, we needed to solve those before we could move on to the next stage of what that unlocks. So when you think about Gensyn, the vast majority of the time we've spent so far has been building what I just described. All of that now is at the beta v0.1 level. It all works, it's all ready to be tested more widely than within Gensyn itself. And so we move on to the next phase, which is, as you described, how does this actually change the world for machine learning developers and researchers, but also just for people using those machine learning products? So the user of ChatGPT or something like that. In my mind, it changes it in a few ways. I think the biggest is the replacement of the current data access methods with models themselves. We saw a really naive and, frankly, quite frustrating approach to this (I think I've got tweets from literally five years ago about it), which was to just turn everything into a chatbot. That happened for a while. In the old world, you go to a website, you can click through options, and you're using some CRUD app. You're just creating some information, you're saving some more parameters back into a database, and that's how you use a web app. So maybe I upload a new profile picture and I put some tags in about myself or whatever. The world saw LLMs and machine learning and said, hey, forget that interface.
You're now going to talk to an agent which is just gonna do exactly those steps again, but it's actually gonna be harder because now you have to talk to this thing and you don't really know how it works. And so that was a really frustrating instantiation of this, but it was still the replacement of current UIs and current web apps with models. It was just the most naive way of doing it. We're moving into the next phase now with much deeper integrations, with the idea that the UI that you use, so the buttons that you click and the options that are given to you, the drop-downs, et cetera, can be generated on the fly by a model which understands the context that you're coming from. And so that could lead to entirely user-specific UIs, moment-specific UIs. When I go to my bank's website, instead of me seeing the same form that every other person sees when they go to the website, it can bring context from my current situation and it can give me an entirely bespoke UI that says, look, I know you're here to deposit a check. So like I'm just gonna take you through that flow, and it's gonna be really intuitive because I know why you're here. In the past, systems have had this, but they've all been rules-based. They've all been kind of if statements and things like that, and they're kind of janky, like we try and guess what the user wants to do, and we'll we'll sort of just flip a switch and you get the UI that everyone gets when they go to this. But you know, we've we've kind of uh inferenced out what you're gonna do. All of that's gonna get deeper and deeper and deeper. And so um, rather than this like replace everything with chatbots, it is replace everything with machine learning, but in a way that's actually intuitive. Um, I think that's gonna happen more and more. All of the thick technology interactions that we have will be replaced with models uh figuring things out about us and giving us bespoke experiences. 
And as that happens, we're gradually moving the databases and the information behind the scenes from being static text in a database or static images stored in a file system to compressed versions sitting in parameter space that allow those models to make those inferences and decide what we want to do in that moment. It doesn't mean that raw canonical data goes away. Obviously, I don't want my bank to store my balance compressed into parameter space with error bars and probabilities on it. I don't want the model to hallucinate my balance when I come to the site. Like I need that to be stuck. Unless it's fire. Exactly. Yeah. I mean, depending on its inference, maybe I do want that. But obviously, there are lots of situations where you don't want that, and I think there are lots of situations where you can move in this way. Another analogy that might be helpful, maybe it's just specific to me, is that I was a database administrator through the NoSQL revolution. I worked on SQL databases, where everything followed lots and lots of specific rules, at the time when NoSQL was coming around and saying, actually, for web apps, you don't need to follow any of those rules. We just need a better user experience, and we can deal with things behind the scenes. Things don't necessarily need to be atomic. We can fix all that later, basically. And this was a huge revolution because it gave app developers the ability to do way more than they could before, when they had to follow all these relational database rules. I view this as very similar, where we're moving from a database-driven design mechanism to a parameter-space-driven design mechanism where we're okay with hallucinations in a lot of cases. We're okay with probabilistic access.
Um, the final analogy I would draw there, which maybe starts to get a bit controversial, is I think humans inherently prefer that anyway. It's only been a short period of time where we've been used to determinism and very strict rules-based systems. Before that, before we had technology and we kind of came to the point where we knew if we pressed a button in a system exactly what would happen, we interacted with other humans and we interacted with the environment where we had to constantly second guess what that other human was saying or what was going to happen in the environment, whether a predator was going to jump out on us or whatever. We constantly had to exist in this probabilistic space. So I think actually this is a technology which is going to allow us to bring to come back to that more natural probabilistic state that we've existed in for millennia. Uh, and this sort of blip of becoming used to determinism was just what we had to do to use technology in its early infancy.
SPEAKER_00: So, Ben, that is incredible. It's hard to describe given how ambitious the vision is of having a dynamic, personalized, fluid internet, which is our entire world. We all view things differently. It's dynamic on the fly, it's not static, the experience is totally nuanced per person. I want to tie that back to Gensyn, though. How does Gensyn power that if we're going back down the stack?
SPEAKER_01: Yeah. So at the very base, we power it by providing those three core components that I mentioned: the execution on any device, the communication between the devices, and the trust between the devices as well, entirely programmatically. That's all very, very low level, obviously. When the really low-level device communication is happening, all of that is important. But higher up, when somebody's building a model, how does that actually present itself to them? I would draw the analogy back again to TCP/IP behind the internet. This idea that there is a standardized protocol for communication between devices that we use to create the overall internet, but that we can also use to just link devices together. The internet protocol wars lasted for literally a decade, where we tried to figure out how we are going to architect the standard, how devices are going to communicate out into the future. Once we decided, as human society, on this standard, we just went forward and used it basically everywhere. And so now people don't necessarily think about the low-level communication. They just assume it's handled. They know it's handled, they use higher-level libraries to do it, but they know that when they need devices to communicate, they can. And that's what we think about with Gensyn's low-level technology: it shifts the way of thinking about compute from "how do I get an H100 and how do I execute my model against it" to "execution's always available". It's always a thing that's there. The cost of it settles out in the ether somehow. Eventually, there's a value chain which goes from my high-level app all the way down through multiple transactions to the fact that something executed somewhere and it was expensive. It needed electricity to do it, it needed to communicate, it needed specialized hardware to do it.
But that whole chain is abstracted from the users. What the users think is just that execution's available when I need it, and there's a rough cost for me when I access it. I know vaguely how much computation I can do based on the value of my app, et cetera, and I don't think about it beyond that. Moving the world to that state, though, is not a simple thing. It's not something you do overnight, where you say to people, hey, we built this compiler and we built this communication framework and built this verification system, now go and use it and build that entire chain of value back up. People won't do that. And so that's the thinking behind this new phase of Gensyn, the productization phase, where we have to do two things. One is demonstrate to the world what's possible now. So we build demo applications. They can be products in their own right, but they're things that draw from that new future, that use that new way of thinking to provide a benefit that you just couldn't access before. An example of that is RL Swarm, which we released a few weeks ago. And as of the airing of this, there's some new stuff out there with our testnet for interacting with RL Swarm. But there are other demos we can do as well. There are other products we can build. And that feeds into the other half of your question as you asked it earlier: what does Gensyn do when it comes to training models and designing new models? We are very opinionated about the way that models can be built once you've solved those infrastructure problems, because we started there. We started with wanting to build entirely new models: much, much larger models, much more distributed models, sparse models, these mixture-of-experts-style models, models that learn to communicate with each other rather than having that inherently built in. That was where we originated.
But to get to where we wanted to get to, we had to solve all these infrastructure problems first. And so we did that. Now we're back to building those new models. Um, RL Swarm is an example of models learning to communicate between each other in a reinforcement learning process training context. But the same things can be applied, or very similar things can be applied in pre-training, you can apply them in inference pipelines, you can apply them at every point of the model stack because what you've done is just change the mechanisms for communication and execution to ones that are infinitely scalable, much easier to access, work over heterogeneous devices, work over heterogeneous latencies and bandwidths, um, et cetera. So we just have better tools available. Now we can move on to those new models. Um, what I would say for me personally, my research back in the day, 10 years ago, was specifically on evolving neural network structures using swarm algorithms. So, this idea that you could handcraft a neural network, you could find a problem and you could like set the layers up in the right orientation that you think is going to be effective for your problem, and then train that kind of static model to work with those layers. But at the end of the day, that's a huge optimization problem. It's a big search space of different layer configurations and different ways of architecting this model, and we can just be searching that space. So, like I said, a decade ago, I did this with like four GPUs at a time under my desk as a poor PhD student. I want to see what's possible when the world can do that inherently within machine learning over the entire global landscape of compute. And I think it changes this idea of training runs from they run for a few days, they run for a few weeks on these specific models. We're trying to get the best benchmark results. To these models live multi-generationally. 
They exist, they evolve, they merge, they interact with each other, they learn to communicate with each other alongside learning to communicate with humans. Each of us will have a model that sits alongside us for our entire lives. Maybe there'll be intergenerational models that get passed down to children, et cetera. They have all the context of the previous owner of the model. That entire process can happen out into the future across every device if we have the right infrastructure. And so we build those models as well, but we had to build the infrastructure first.
SPEAKER_00: Hey Ben, so RL Swarm is live and the testnet will be live by the time this comes out. We're recording on Friday, March 28th. Walk me through the difference between RL Swarm and the testnet side.
SPEAKER_01: Yeah, sure. So RL Swarm, just for anyone who hasn't seen it or hasn't drilled into it yet, is a reinforcement learning post-training system. Take the idea of DeepSeek-R1, this idea that once you've done the pre-training of a model, you can use reinforcement learning against a data set which is verifiable in the machine learning sense. You have some kind of signal that you can use to self-improve answers. In the RL setting that we use, it's like the formatting of a proof, or you could do the formatting of code, you can do syntax checks, etc. You use that combined with the accuracy of an answer to allow a system to recursively self-improve, essentially. RL Swarm takes that concept and allows models to also communicate with each other and critique each other's answers, with the goal of improving together as a swarm to find the best answer they possibly can. In doing that, what you actually get is that every local individual model improves based on the other models in the swarm. Every model is able to pull information from the swarm and get better, and the overall performance of the swarm itself gets better. If you were to ensemble all the models together, you'd get a better model out of it than you would have had individually. The key way that this is done is that rather than doing something like federated learning, where you're averaging between models, you're actually allowing the models to learn how to communicate themselves. So there literally is an explain tag, like the think tag that you get in reasoning models. Instead of just thinking, the model also has a tag to say, this is what I'm going to say to the other models to convince them that my answer is the best.
And then you also have a critique tag where they say, when I've looked at other models' answers, this is how I disagree with them, or this is what I think I they should do to tweak their answer to make it better. And in doing that, you've created kind of like the town square for models. The analogy, and again, super rough on this analogy, but the analogy I use is if you think about humans learning things. In school, early on, a human learns basic rules. So they learn this is how maths works. If you add one and one together, you get two, et cetera. This is like a set of rules, you just need to know it to go forward. Once you've done a load of that, you move on to almost like philosophy. It's how to think. So you move on from you've got all of these rules to how are you going to apply the rules and how are you going to check your own thinking when you apply these rules? And so people go down the kind of rabbit hole of learning about philosophers and kind of rules for how to think and how to form ideas, um, how to reason about new domains, et cetera. We kind of like we get all of that from the field of philosophy. Finally, you get to discussion and discourse. So, the idea of like a Socratic dialogue. You and I talk, and in doing that, we form our opinions even more strongly than we could have done individually, because I'm accessing the diversity of your thought through our dialogue. And so that's like the next iteration of human learning. Once you've gone from learning rules, learning how to think, you learn how to talk and how to communicate and how to get ideas out of a group of people better. And wisdom of the crowd typically leads to better outcomes. We're just applying that exact same system to models where pre-training is teaching them all the rules, it's teaching them the fundamentals from an image perspective. It's teaching them the concept of a line exists, a circle exists, like higher-level kind of abstractions exist. 
Same for language, the distance between different words within different languages, et cetera. They learn all of that. They move on to reasoning. And so they learn this is how I would structure my answer when I'm recursively able to look at my answer again and like uh apply some self-critique to it. And then finally, within RL Swarm, this is what I would do to convince another model of my answer. Uh, and this is what I would do if another model tells me something and it's novel to me. I would critique it in this way. I would say, actually, I disagree with that or I agree with that. Um, and so to us, it's taking machine learning to that next stage, but specifically within the post-training reasoning context. The final piece that's really interesting about it is that's almost infinitely horizontally scalable. Um, because you could have any number of models communicating. Um, you have kind of similar communication rules to you have as humans. So, like enormous numbers of models talking in one big room is obviously going to be a lot of data for those models to process, but you can move into separate rooms, you can move into separate swarms, you can have subswarms to discuss specific topics, you can have meta-swarms that discuss specific areas. All of those tools are available once you have the primitives of, hey, models can just communicate about ideas now and improve based on the ideas of others and themselves together into one big um kind of understanding as a group.
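As a rough illustration of the reward signal described in this answer, where a format check is combined with answer accuracy and outputs carry think, explain, and critique tags, here is a minimal sketch. It is not RL Swarm's actual reward code. The tag syntax, the `answer` tag, the function names `parse_tags` and `reward`, and the 0.2 format weight are all assumptions made for the example.

```python
import re

# Hypothetical tag names; the conversation mentions think, explain, and critique.
TAGS = ("think", "explain", "critique", "answer")


def parse_tags(text):
    """Return {tag: content} for each <tag>...</tag> pair found in the text."""
    found = {}
    for tag in TAGS:
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        if match:
            found[tag] = match.group(1).strip()
    return found


def reward(text, expected_answer, format_weight=0.2):
    """Combine a format check with answer accuracy into one scalar reward."""
    parts = parse_tags(text)
    format_ok = all(tag in parts for tag in ("think", "answer"))
    correct = parts.get("answer") == expected_answer
    # Booleans act as 0 or 1 here, so the reward lies between 0.0 and 1.0.
    return format_weight * format_ok + (1 - format_weight) * correct


sample = "<think>2 + 2 is 4</think><explain>Simple addition.</explain><answer>4</answer>"
sample_reward = reward(sample, "4")
```

In a swarm setting, the content of the explain and critique tags would be what gets passed between models, while a scalar like this drives each model's local update.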
SPEAKER_00: It's fascinating to hear a discussion on reinforcement learning because it's something I really don't fully understand. My mental model is we take the internet's text and we convert it into a model, and it's all zeros and ones, like it's set in stone. If we want to do specific narrow models, we edit those numbers so that it's specific to a domain, but it's sort of static, it's in place. But now we have this totally new domain, which is just reasoning and how the models think and things like that. And that's just fascinating. So you're doing RL Swarm, and it's peer-to-peer reinforcement learning over the internet on consumer hardware, and all of these nodes are talking to each other. But how does that feed back into the model? Is that changing those base zeros and ones, or are you changing the way they think? Like, "the sun is a circle" convinces the other node, and now it thinks differently about the world? How does that update the model? That's where I'm stuck.
SPEAKER_01: Yeah, so in the RL Swarm setting, they do full fine-tuning, so it does update the weights of the model. But you can do this in any area of machine learning. You could be using that communication to update a local database which is used in a RAG system, so you keep the parameters themselves completely separate. You could do partial fine-tuning, you can do full training. You can use any method of updating the knowledge that the model has. If you think about it really primitively, the model has a set of information. It's got a set of information in its parameters. Typically, it has a set of information in a RAG database that it can pull from. It can have information at prompt time in a system prompt and a user prompt, etc. But at the end of the day, you can just map out those different sets of information and you can change them at will. And so what we're saying is that what you want to do is be able to take some entropy in the world and get that into the model's context, in a wider sense of the word context, just its entire context. You get it in somewhere. This is a continuation of a view we've had and talked about for a long time. It used to be pretty heretical, it's not anymore. It's this idea that training and inference just blur together. At the end of the day, you have a big set of information over different sources, and you can do operations on that information similar to those you would do against a database. So create, read, update, delete. You can do all of those operations against this information. They're just changed in a way. Some of them are like training, some of them are inference, some of them are kind of fine-tuning. There's the concept of forgetting, which is incredibly hard to implement, but it still exists within parameter space.
So you're doing those same operations, but you're doing them against parameter space, prompts, context, RAG databases, etc. Like I said, in the RL Swarm context, it's just full fine-tuning against the model. It's updating the actual model parameters. So to drill it all the way down to an individual user, if you, Tommy, deploy an RL Swarm node on your device right now, what you will get is a series of checkpoints from the training of your local model as it communicates with the swarm. You'll get new model parameters which are updated based on the information it has learned by interacting with the swarm. Some of that is its own reasoning. It's just done RL reasoning itself. But some of it is also information it's got from other models as it's seen them reason. If every single model is identical, this is better than a single model doing it on its own, in a way that's similar to federated learning, where you get faster convergence and things like that. If every model is different, you actually get a huge benefit from the diversity of the swarm. You're able to access the thinking of a model which has information in its parameters that you don't have. It might have been fine-tuned locally on a data set that's bespoke to that individual user. It might have been fine-tuned on data that's in a different language or something like that. And so it's using its own context to give more information. What you get out of this is a kind of meta-distillation process, where all of the models are able to learn from the other models over a defined interface: the explain and critique tags and whatever other tags you want to put in that interface. So it can be seen as a big meta-distillation process. I realize I didn't answer a bit of your question before, which is what does the testnet mean if that's what RL Swarm is? Maybe very briefly, and I'm happy to go into this more.
The testnet is the first application of persistent identity to this system. The intention behind RL Swarm was for it to be purely open source software that's applicable on any device, any network, any subset of devices. You don't need to connect to any specific system to use it. If you have two MacBooks sat next to each other with no internet connection, you can run RL Swarm across those two MacBooks. If you have an internet connection, you have a MacBook, and you know someone on the other side of the world who has a bunch of H100s, you can make a swarm together as well. So it's completely agnostic to the devices, the heterogeneity of them, the heterogeneity of the connections, and the way they connect, whether it's internet, local, whatever. It can just work across all of those interfaces. The next step for us is saying there needs to be a coordination point for this, and people want to have some kind of persistent identifier for their devices. We achieve that while maintaining that decentralized standard, this requirement not to have a centralized server, by connecting it up to an on-chain identity and saying now your device can identify itself through different swarm runs. So if you joined one swarm that was training math knowledge, and then you joined another swarm that was training coding knowledge, your model can move between those, but you have this persistent account, which is the higher-level identifier of this model. The testnet is testing that functionality, basically. We'll show participation, and there'll be a leaderboard to show people, hey, this is your progress in participating in the first swarm that exists and in future swarms. If you want to make a swarm, you're able to make one as well. But it all flows back to this on-chain identity, which you could see as the identity of the model or of the person. It's kind of up to you. We don't make any specific assumptions ourselves.
We just say, hey, the tooling is here now. We want to see what people can build with it, and we'll build a roadmap as we go. As the deeper infrastructure gets fleshed out, as the compiler comes in, verification comes in, etc., there's more and more functionality for people to build on at this low level. But this is just the first demonstration, the MVP of the identity piece.
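The persistent-identity idea in this answer, one identifier that follows a node across a math swarm, a coding swarm, and a leaderboard, can be sketched with a small in-memory ledger. This is purely illustrative: the `ParticipationLedger` class, its methods, and the identity strings are hypothetical, and a plain Python dictionary stands in for whatever on-chain record the testnet actually uses.

```python
from collections import defaultdict


class ParticipationLedger:
    """In-memory stand-in for an on-chain participation record (hypothetical)."""

    def __init__(self):
        # identity -> swarm name -> number of rounds contributed
        self._rounds = defaultdict(lambda: defaultdict(int))

    def record(self, identity, swarm, rounds=1):
        """Credit an identity with rounds of participation in a named swarm."""
        self._rounds[identity][swarm] += rounds

    def total(self, identity):
        """Total rounds across every swarm this identity has joined."""
        return sum(self._rounds[identity].values())

    def leaderboard(self):
        """Identities sorted by total rounds, highest first."""
        return sorted(self._rounds, key=self.total, reverse=True)


ledger = ParticipationLedger()
ledger.record("node-a", "math", 3)
ledger.record("node-a", "coding", 2)  # same identity, different swarm
ledger.record("node-b", "math", 4)
ranking = ledger.leaderboard()
```

The point of the sketch is only that the identity outlives any single swarm run, so participation can be aggregated across runs.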
SPEAKER_00: That's awesome. I have so many questions. The RL stuff, just going back to that, is fascinating. It's a little scary though, right? I'm just trying to think through my own model. I don't want its personality to change that much. How do I make sure of that if my model goes out there on this global swarm? I don't want it to be an art expert or a nuclear physics expert. I want it to be a crypto expert, right? I want it to stay there. How do you get the benefits of the gossip without the extreme model drift that would be unusable for me, or not helpful?
SPEAKER_01: Sure. So you're fully in control of that. At the end of the day, if you're running an RL Swarm node against a local model, you can choose which checkpoints you use and which updates happen to the model. There don't need to be any updates whatsoever. Almost in the BitTorrent way, you could just be seeding. All you're doing is putting input into the swarm and you're not actually receiving any. You're not taking any updates yourself and canonically putting them into your model. Or you can. The beauty of models as well is you can have many copies of the model. So you could have a new copy running that is learning the principles of kung fu, to use a Matrix reference, and the original model that doesn't know kung fu, and you choose if you want to switch over, or run the two in parallel, or use the kung fu model in a kung fu discussion in a chat system somewhere. Sure, go and use that, but don't use it in the crypto chat system. So all of that is still available to people. I think there's an enormous amount of productization that can happen here. Obviously, we as a company will do some of it because we have to demonstrate it, but we really want to see what happens when other people do it. When people are using these models for downstream tasks, right now they might use Ollama, they might use LM Studio. I imagine there are going to be many, many more local applications for executing these models. Cursor is a good example: coding models could be done in this way, where you're using your local coding model and it can improve in coding swarms. Maybe there are language-specific coding swarms. If you want your model to get really, really good at Rust, then it is constantly in the background in the Rust swarm, just getting better and better at Rust. All of those things can exist, but they need to be put into the products.
And so, like I said, we demonstrate some of what's possible. We can't build everything. We really want other people to build these things, and they can build them against the software that we've got, that low-level execution, communication, and verification software, but also against the chain. So if for whatever reason you need ownership information, if for whatever reason you need attribution of value, you have all of that available via the chain and via the on-chain identities. You also have the ability to transact if you need to move money around. If you want to build downstream markets on top of this, you could have prediction markets on top of these models. All of those things are available because of the low-level infrastructure we've built and because of the chain itself.
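The control described in this answer, choosing which checkpoints to adopt or simply "seeding" without taking any updates, might look like the sketch below. Nothing here is RL Swarm's API. `LocalNode`, `offer_checkpoint`, and the `crypto_score` evaluation are hypothetical names, and the weights are a toy dictionary rather than real model parameters.

```python
from dataclasses import dataclass, field


@dataclass
class LocalNode:
    """Hypothetical local node that decides which swarm checkpoints to adopt."""

    weights: dict
    accept_updates: bool = True
    history: list = field(default_factory=list)

    def offer_checkpoint(self, checkpoint, eval_fn):
        """Adopt the checkpoint only if updates are on and it scores higher."""
        if not self.accept_updates:
            return False  # "seeding": contribute to the swarm, keep weights fixed
        if eval_fn(checkpoint) <= eval_fn(self.weights):
            return False  # no gain on the owner's own evaluation, so reject it
        self.history.append(self.weights)  # keep the old copy for rollback
        self.weights = checkpoint
        return True


def crypto_score(weights):
    """Toy evaluation on the domain the owner cares about."""
    return weights["crypto_score"]


node = LocalNode(weights={"crypto_score": 0.70})
took_drifted = node.offer_checkpoint({"crypto_score": 0.55}, crypto_score)
took_better = node.offer_checkpoint({"crypto_score": 0.80}, crypto_score)

seeder = LocalNode(weights={"crypto_score": 0.70}, accept_updates=False)
seeder_took = seeder.offer_checkpoint({"crypto_score": 0.99}, crypto_score)
```

Gating on the owner's own evaluation is one simple way to limit drift, and keeping prior weights in `history` reflects the point that multiple copies of a model can coexist.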
SPEAKER_00: That's awesome. Yeah, I'm just trying to think it through. You can just set your biases, set your values. Maybe the AI has visibility into the problems you're solving and your personality, and that's weighted higher. It's super interesting. Ben, I have so many questions here, but I know we have limited time. I know we've covered the end state of Gensyn a bit and we've gone up and down the stack, but walk me through the end result of RL Swarm specifically in the testnet. And if that's the wrong question, feel free to take it a different way. What does that look like in a year or two when everyone has this, everyone's running it? How does that change the user experience? And I know you're the infra, so it's annoying to keep getting these product questions, but yeah.
SPEAKER_01Honestly, it's fine. It's something we obviously think about a lot, and we have to. In many ways, every strong project has to go through this, right? You look at OpenAI: deep research, a bunch of engineering, enormous investment in scaling models. And then ChatGPT was their productization moment, where they realized, oh, this is the way, at least for now, the world is gonna interact with everything we've been doing. They didn't know that from the start. What they knew from the start was that solving those scaling problems and throwing enormous amounts of resources at this would make a transformational technology. And then after that, they would figure out how the world would interact with it. And they're still gonna be figuring that out. Beyond ChatGPT, there's gonna be more. There are gonna be more product segments that people discover where they say, oh, this is where this technological shift clicks for people. This is where people will actually interact with it. Typically, I think you don't see that at first. You see that, yes, this is definitely gonna be transformational. You don't see exactly how that value will show itself when the random normal user on the street interacts with a technological system. And I think we're in exactly the same position, where we're productizing and we're seeing those early light bulb moments for people where they realize, oh, I can do this thing now that I couldn't do before. RL Swarm is one of those for us, with the local execution, where it says: I can run a local model that I can constantly improve based on other models without having to be beholden to a specific model from a specific supplier. I don't necessarily have to sit here and wait for Claude 3.8 to release before my coding model gets better. Power users right now will do local fine-tuning and they'll gather data sets, but that's an enormous amount of effort for them.
And so RL Swarm gives a way for the average user, or even the power users, to improve their models, to constantly have these living models which are getting better all of the time based on communication, without having to wait for Anthropic or OpenAI to release the next big model. And I think that's a good shift towards more open source development, or more embrace of open source from the wider world. The key reason it couldn't be done before was access to resources: actually being able to execute training steps on a model, and actually being able to communicate between devices which can do those training steps, was really, really difficult. Our infrastructure allows you to do that much more seamlessly, and RL Swarm is the first interface to it. Quickly as well, in the progression of the testnet, we'll have the ability to do outsourced execution. So you have a model training on your local device. There comes a certain point where your device just isn't powerful enough. Right now, when you hit that point, you have to go and buy a GPU somewhere, or figure out how to lease one or whatever. In the testnet, that will just be one click, one transaction: hey, somebody else needs to execute this for me because it's got too big for my device. Can I just push that to someone, knowing that it'll be performed correctly, knowing that it'll be verified, knowing that it'll be trusted, knowing that exactly what I'm pushing will actually be executed? And then maybe it continues to execute there. Maybe I pull down a distilled model which runs locally on my machine. All of that is just seamless because the infrastructure exists to do it. Maybe I actually shard my model. My model exists in two parts. Some of it executes on somebody else's device, some executes on mine, and that's just how I want to run it.
But again, you don't have to have those human-world arrangements like renting a GPU somewhere. It just happens over the execution system and over the testnet itself. So yeah, again, a bit of a winding path, but RL Swarm is just the first product angle. We find it really, really interesting. We think it has a huge spike for that local execution use case. But there are more. There are more that we will do that use that low-level infrastructure and show the world what's possible.
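Editor's sketch (not from the conversation): the idea of sharding a model so that part runs locally and part runs on someone else's device can be illustrated with a toy example. Everything here is an assumption made for illustration: the `run` and `shard` helpers and the three-layer toy model are hypothetical, the "remote" half simply runs in the same process, and none of Gensyn's actual execution, communication, or verification software is modeled.

```python
from typing import Callable, List, Tuple

# A "layer" is just a function from one activation to the next (toy stand-in).
Layer = Callable[[float], float]


def run(layers: List[Layer], x: float) -> float:
    """Apply each layer in order and return the final activation."""
    for layer in layers:
        x = layer(x)
    return x


def shard(layers: List[Layer], split_at: int) -> Tuple[List[Layer], List[Layer]]:
    """Split a model into two contiguous parts (hypothetical helper)."""
    return layers[:split_at], layers[split_at:]


# Toy three-layer model: double, add one, square.
model: List[Layer] = [lambda x: x * 2.0, lambda x: x + 1.0, lambda x: x * x]

# The first part runs on "my" device, the second on "somebody else's".
local_part, remote_part = shard(model, 1)

activation = run(local_part, 3.0)      # computed locally
output = run(remote_part, activation)  # would be computed remotely in practice
```

The only thing that crosses the boundary is the intermediate activation, and the sharded run gives the same result as running the whole model in one place. In a real system the interesting problems are the ones this sketch leaves out: moving activations over a network and verifying that the remote part was executed correctly.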
SPEAKER_00That's awesome. Ben, I hope somebody listening takes this transcript and just throws it into one of the killer video models to showcase what the world looks like now side by side with your vision, just to visualize the two different worlds, because it's so ambitious. And it's just awesome to hear. We have like 15 more minutes, so I want to zoom back out to the hyperscalers, or the Web2 AI labs, and things like that. I want to talk about the end state differences between what they can provide and what you could ultimately provide, right? We started the podcast on the infra. We talked about why they can't do it, the limits, we talked about that stuff, but I want to talk about where they're going versus where you're going, right? Can Apple give me a fine-tuned model and that's enough? Or can OpenAI do that and that's enough to serve my needs? Or is that just not enough for the vision? What are the limiting factors there?
SPEAKER_01Yeah. So I think they're all slightly different in the dynamics of what they can do. Apple's a really interesting example because they've been doing a version of this for a long time. They've been doing federated learning within the landscape of their hardware to improve the model in the Photos app, to improve the keyboard. Google have been doing the keyboard thing as well for a long time, with federated learning over the devices that they control and own, to be able to get the benefits of models executing on those different devices shared between the devices. But crucially, that's within a walled garden. So that is the OG Microsoft world of the future of AI, where yes, if you do have access to a lot of consistent hardware or an operating system across these devices, within your walled garden, you can do versions of what I've just described Gensyn as doing. The way you can view Gensyn is as the open source alternative to that, where it executes across everything. So you can do what Apple does in the Photos app across iPhones, but across iPhones, Android, fridges, TV screens, anything that has a camera in it, laptops, whatever, it doesn't matter. You can do that process. And so we're building the software stack to catch up with that centralized walled garden approach and saying, actually, there's no reason other than difficulty of engineering why this can't exist openly over every device. And so that's what we've built. So in comparison to Apple, that's what I would say.
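Editor's sketch (not from the conversation): the federated learning mentioned above is commonly implemented with some form of weighted parameter averaging, often called federated averaging. The minimal Python sketch below shows only that aggregation step under simplifying assumptions: each device's parameters are plain lists of floats, each device's contribution is weighted by how many local examples it trained on, and the names `federated_average`, `phone`, and `fridge` are hypothetical. It is not a description of how Apple, Google, or Gensyn actually implement this.

```python
from typing import Dict, List

# Parameters for one device: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def federated_average(client_weights: List[Weights],
                      client_sizes: List[int]) -> Weights:
    """Average parameters across devices, weighted by local example count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one example count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total example count must be positive")

    averaged: Weights = {}
    for name, values in client_weights[0].items():
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(len(values))
        ]
    return averaged


# Two heterogeneous devices that trained the same tiny model locally.
phone = {"layer": [1.0, 2.0]}   # trained on 1 local example
fridge = {"layer": [3.0, 4.0]}  # trained on 3 local examples

global_weights = federated_average([phone, fridge], [1, 3])
```

Only the parameters leave each device here, not the underlying data, which is why this pattern suits settings like keyboards and photo libraries. Nothing in the aggregation step depends on the devices sharing hardware or an operating system, which is the point being made about running the same process outside a walled garden.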
SPEAKER_00And there's also just, what are the other reasons why the end result of your world is better than an OpenAI or an Apple? Are there system prompts that would bias the model, centralization risks? Outside of the personalization, the internet side, just the nitty-gritty things like that?
SPEAKER_01Yeah, I think there's a huge philosophical piece here, which again we've thought about for a long time. It's become a lot more front of mind for technologists recently. It's this idea that models are inherently biased. They are machines of bias. That's why they exist. That bias is effective because having biases is how you translate data into parameter space. So you want a model to be biased a certain way, because that's its learning. It's how it actually compresses information. But in doing that, every time that model is used, it presents its biases in that use case. If you have a model created by one person but used by a hundred thousand people, for example, those hundred thousand people are inherently being exposed to the biases of the originator of the model. That's not necessarily a problem if you have a free and open world of models where people can move and choose and, if they don't like the biases in one model, pick another model. But if that isn't the case, if there are enormous barriers to building a model and we only allow a small number of people to do it, what we allow is for those people to export their biases and context to everybody else. This sounds bad if you think about it just in the way that we think about social media right now. Within certain social media platforms, you'll see a bias shift. And so if you're there, you're like, hey, why am I getting so much information about this thing? Why am I reading so much news about this specific thing that I actually don't agree with? And that can annoy people. So that's how it presents itself in a very obvious, high-level way. If you think about models being embedded in everything, it's a lot less obvious.
Every single interaction you have with technology, if that itself is able to be controlled and has biases in it, you will very quickly be influenced as a person by those biases without even realizing it's happening. This happens in social media. We're all influenced by social media. We've seen it time and time again. I think we made, frankly, a huge mistake allowing social media to become so centralized. There should be a rich ecosystem of different platforms. There should be rich ecosystems of different methods of censorship. Humans should be able to move freely between them. I think there could be an open standard for communication of humans in a social context like that, which would allow us to have much more freedom of ideas across a spectrum of different people on the planet. We didn't really manage to do that the first time around. It's okay right now. Maybe we can move it back. I think it's much harder to move it back once you've got there. Machine learning is at its pre-social-media point right now, and we have a choice. Do we let what happened last time with social media happen again, but with enormously magnified effects because it's in absolutely everything we do, or do we architect it correctly and say we should allow this to be as open as possible, as open to ideas as possible, over open standards not controlled by a small number of companies? And I obviously, incredibly strongly, think we should take the open approach. This should be a freedom of ideas. It shouldn't be a small number of companies controlling what the world sees, thinks, hears, experiences, etc. Because machine learning is going to be embedded in absolutely everything.
SPEAKER_00That's an insanely good take. No, you're totally right. I mean, when you pull up an LLM there's that invisible system prompt, and you don't really know what's in it, right? That can bias your output. If you're in a world where all your applications are built on an AI model, it's massively powerful technology. It understands you better than you know yourself, and it could bias you and convince you of things and make you feel like it was your idea, right? What you're describing with social media is just so awesome, because I'm trying to think through why someone would choose Facebook on Gensyn versus Facebook on Web2. And the thing I'm thinking about is what you said before: your model can decide what it wants to interact with, right? And when you go on there you should have an experience that makes you happy, right? It doesn't create this flywheel of anger and bias and flaunting your views. And I think people will choose the happy approach over the Web2 monetization algo.
SPEAKER_01But I guess we'll see. I think one of the difficulties there, and for some context, my startup before Gensyn was a data privacy startup, so I still have some reasonably strong opinions and some battle scars on individual data privacy and how we interact with the world of companies. One of the biggest issues there was always that you want to present as much choice as possible to the individual. You want the individual to consent to the decisions made, but it very quickly becomes a fire hose that individuals cannot deal with. Humans can't make choices that often. It just doesn't work. People get fatigued incredibly quickly. So you go the other way, where you make the choices for them, you index on their interactions, and you just make it so they keep interacting. And so, as we've seen, the algorithm within social is basically just driven by what keeps me on this platform and interacting. And it'll just keep hammering you with that. But that's not necessarily good for people. It's just good for the platform, like you mentioned with that monetization algorithm. One of the beauties of having machine learning as an interface between us and those platforms is that machine learning doesn't get fatigued. If you can keep giving it resources, it can keep doing that. And so you can have this layer between you and the rest of the world which is your layer, and it's able to make those choices on your behalf. You're able to outsource a lot of those choices to the model. Obviously there needs to be an enormous shift in the way humans think about technology and outsourcing thinking to personalized models and things like that, but it can actually improve that situation as long as we architect it right.
If we don't, I'm outsourcing my thinking to ChatGPT, as are millions to billions of other people in the world. That's not good, because then we have this homogeneous hive mind which is all being filtered through a small number of people's opinions and biases. It shouldn't be. It should be filtered through truly personalized models that we're all able to build and refine, and that reflect us individually rather than reflecting one consensus view and then a little bit of us that the company decides it'll let through. Overall, my view of what we should do with machine learning is allow it to reflect the biases of the true human world. Eight billion people in the world all have different opinions, and the vast majority of us are okay with that. We think that's a good situation. Everyone has different opinions, and out of those opinions clashing and working together, we get the progress of human society. We should reflect that world in the machine society as well. So we should allow the machines to have that same degree of difference of opinion, or even more, and allow them to just work it out, again not just through a training run that lasts for a few days or a few weeks, but out into the perpetuity of humanity working with machines, which is going to get closer and closer as we head into the future.
SPEAKER_00I love the take, Ben. It's kind of philosophical, but it reminds me of the books The Subtle Art of Not Giving a Fuck and The Courage to Be Disliked, where when you export your happiness or views and life framework based on other people, you can never be happy, right? Because I'm constantly trying to make you happy based on your values and your guidelines, right? Not mine. And you could never truly be happy or enjoy yourself until you focus on what makes you happy. You know, you're exporting your moral rule set. It's very similar with these AI models, right? If I go on an AI model and it tells me my idea sucks, I'm just gonna keep asking questions until it gives me the validation that the idea is right. It's so messed up to think about. Yeah. I have one last question for you because I know we're butting up on time, but you've seen so many changes in AI from centralized to open source, so many waves, right? We're now getting open source AI from China with DeepSeek and Qwen and Manus and everything, and the centralized Web2 companies in America continue to argue for regulatory moats: stay closed, stay proprietary.
I just need your take on where this war ends: what happens to OpenAI, who gets to AGI first, open source. Any take you want here is fine.
SPEAKER_01I would maybe, yeah, maybe a couple of takes. One: I really struggle with AGI as a concept. I think as a label it's so ill-defined as to be detrimental in its usage. You have to talk around it because we need some concept to be able to cling to; otherwise you've just got this entirely mercurial world of changes, which is really difficult. But I think so much gets hidden in the redefinition of it that you can actually push any view through the way that you describe AGI. So that's just a personal rant which doesn't really help anything for this conversation, but it frustrates me a lot. I think in reality, in terms of progress and who is going to do particularly well, I would bring it back to what these companies are chasing. They're chasing value capture at the end of the day. And I think value capture looks like this spectrum with two extremes where the capture actually happens. On the infrastructure level there are core resources that need to be coordinated. If you have those resources you will capture value, because you're providing a real, true resource to the market. It's required, it's essential, there is a limited supply, there is essentially unlimited demand, and there's always going to be a market there. On the other end you have the users themselves. If you can capture users into your platform, you can continue pushing features to those users, and you can build moats around personalization. So if ChatGPT has a year's worth of history of your discussions with it, you're not gonna move to another system that doesn't have that year's worth of discussions. So the more these companies get you in, the more they capture you with a moat by personalizing models to you. In the middle is this huge valley of decaying value.
And so actually building models themselves, in my opinion, is in that decaying value position. You can't continue to monetize a model forever. If you offer access to a model, that model can be distilled out in the world, and increasingly it'll be able to be distilled. So focusing on building the best model right now is a kind of redirect to capturing one of the two other sides. Every company that's saying, hey, look at this amazing model we have, is probably either trying to capture investment to build out the infrastructure side, or trying to capture user eyeballs to get them in, to personalize models, to have them out into the future, and to have built a point-in-time moat. That's all these companies are doing. Open source, in my view, is the future of that entire valley in the middle: because it's decaying value, because there isn't any business proprietary value in having it, it inevitably just gets built out by the community. Facebook, or Meta, recognized this really early and said, hey, we can do the same playbook we've done before, which is build a really good open source model and then let the community keep improving it. We get the benefits of that, and we also put a great marketing stamp out in the world and say that we're doing open source. I think that will just continue to happen. I think the models out of China have done exactly the same thing. They've seen a closing down in the West and they've said, hey, we can get loads of eyeballs, we can get loads of developers rallying around this, we can get improvements to this model just by open sourcing it. It's a shame that the West closed down like it did. I think it was a mistake. I think we'll continue to see more investment in open source.
I think realistically we'll see the companies in the West actually move back to more of an open source model, but they will be driving to get more eyeballs and more investment in infrastructure, and they'll do whatever they can to get that. The side point on all of this that you mentioned is that regulatory capture piece, which is always available.
SPEAKER_00Do you think OpenAI open sources their models in our generation?
SPEAKER_01I think they will open source more than they have so far. I think they got spooked by the model safety fears and things like that.
SPEAKER_00They used that for regulatory capture, or they...
SPEAKER_01I totally agree, yeah. They are now seeing a narrative shift away from that towards open source again, away from a lot of the frankly far too doomer fear mongering that happened around model safety in the past, and I think they're correcting back the other way. But while you have a lead on the model side, there is a benefit to keeping it closed, or to lagging open source behind your lead. Obviously Meta do this: they lag the lead on open source. That is going to continue to happen, but I think we will see more open sourcing from the big labs.
SPEAKER_00Ben, I have so many questions for you, but I can't keep you all day. I want to thank you so much for coming on. Your takes, time and time again on podcasts and at conferences, have always positively influenced me, and I hope others. So, really excited for testnet, and thank you so much for coming on.
SPEAKER_01Awesome. Thank you so much for having me. I always love these conversations. The maze of following the conversation is always absolutely fascinating, and I genuinely enjoy them very much every time we do them, so thanks again for having me on. Hopefully this drives some more people to get involved with the Gensyn testnet and with RL Swarm. If you do want to build on top of any of it, please contact us through one of the channels you can find. We're very, very keen to get people in and building. We open source basically everything we do. We'd love to expand this out and get more people building machine learning freely, not on top of the centralized APIs that you see in the world.
SPEAKER_00Ben, is it pretty easy to run RL Swarm on the testnet side? Can I vibe code my way as a non-technical person to join the network, or what's the fastest way?
SPEAKER_01Yeah, I think if you drop it in Cursor, the README should hopefully be reasonably straightforward. At least for now, you do need a reasonable level of technical knowledge to get it deployed, but if you drop it in Cursor and run it through there, you should be able to easily vibe code your way to getting it running. We'd love to see more and more people running it, and feedback on that process is super welcome. We'll productize more of it and we'll make it easier to run. We'd love other people to help do that as well. Obviously it's open source, so anyone can build interfaces and things for it.
But yeah, the more people doing it the better, and feedback is incredibly valuable for us, very welcome.
SPEAKER_00Nice. Yeah, if you're listening, you could Google Gensyn RL Swarm, hit the GitHub, view the README, and if you're technical, go to bat. If not, vibe code your way there. Ben, I'm really excited, man. I hope you come back on again. We'll have a longer convo and we'll go from there. Thank you so much.
SPEAKER_01Sounds good. Thanks, man.