Google's Co-founder answers questions about Gemini and AGI.
Google’s Co-founder Sergey Brin visited a hackathon at AGI House and answered questions about Google's flagship model Gemini, AGI, and why he isn't worried about business models. You can check the video here. We pulled some quick highlight quotes from Sergey and a somewhat cleaned transcript in Q&A format. Let us know if these are helpful.
HIGHLIGHTS
[The Gemini image mess-up] was mostly due to a lack of thorough testing. [...] If you red team any AI model, you're going to get weird corner cases.
Do I want to [build AGI]? Yeah, absolutely. To me, the reasoning aspects are really exciting and amazing. I came out of retirement because of the trajectory of AI.
As long as there's huge value being generated, we'll figure out the business models.
I don't know the details. He hasn't asked me for 7 trillion dollars yet.
If I get distracted by building hardware for today's AIs, it might not be the best use of time compared to working on the next level of AI.
I feel like I messed up Google Glass. [...] both in terms of overall technology evolution and in the sense that I tried to push it as a finished product when it was more of a prototype. Personally, I still like the lightweight, minimal display that Google Glass offered.
Cleaned Transcript
Q: What are your reflections on the Gemini art mishap?
We definitely messed up on the image generation. I think it was mostly due to a lack of thorough testing, and it definitely upset a lot of people for good reasons.
The images prompted many people to deeply test the base text models as well, and those are really two separate issues. Honestly, if you deeply test any text model out there, whether it's ours, ChatGPT, or Grok, it'll say some pretty weird things that definitely feel far left. Any model, if you try hard enough, can be prompted into that regime.
But also, just to be fair, there's definitely work [needed] on that model. Once again, we haven't fully understood why it leans left in many cases, and that's not our intention. But if you've tried it over this last week, it should be at least 80 percent better (on the test cases that we've covered).
[…]
The model that you're trying, Gemini 1.5 Pro, which isn't in the public-facing app, shouldn't have much of that effect, beyond the general one: if you red team any AI model, you're going to get weird corner cases.
Q: With the recent advancements in multimodality, have you considered a video ChatGPT?
We probably wouldn't call it that. But multimodal, both in and out, is very exciting. Video, audio... we're running early experiments and it's an exciting field.
Remember the duck video that got us in trouble? To be fair, it was fully disclaimed as not being real-time. But that's something we've actually done – embedded images and worked frame-by-frame on how to incorporate them.
So, yeah, that's super exciting. I don't think we have anything real-time to present right now, today.
Q: Are you personally writing code for some projects?
To be perfectly honest, I haven't written much code recently, and it's not code that you would be very impressed by. But every once in a while, I do a little bit of debugging or try to understand how a model works. Maybe analyze performance in a different way – just little bits and pieces that make me feel connected. I don't think you would be technically impressed, but it's nice to play with that.
Sometimes I'll even use AI bots to write code for me because I'm rusty. They actually do a pretty good job, and I'm very pleased by that.
Q: Pre-AI, the closest thing we had to simulators was game engines. What do you think the new advances in the field mean for creating better games or game engines in general?
I think obviously, on the graphics side, you can do new and interesting things with game engines. But maybe the more interesting part is the interaction with the other virtual players and characters. I guess these days you'd call the bland ones NPCs or whatever. But in the future, maybe NPCs will actually be very powerful and interesting.
So, I think that's a really rich possibility. I'm probably not enough of a gamer to think through all the possible futures with AI, but, it opens up many possibilities.
Q: What kind of applications are you most excited about for people building on Gemini?
I think for the 1.5 Pro version we're testing, we're really experimenting with long-context ingestion. Whether it's a large amount of code or even video... I mean, I've seen people dump their code, add a video of the app, describe a bug, and the model identifies the issue. Honestly, it's mind-blowing that this works at all, and I don't fully understand how a model does that.
I'm not suggesting you do precisely that, but yeah, we're experimenting with scenarios that truly require long-context capabilities.
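If you want to try that kind of long-context, multimodal prompt yourself, here's a minimal sketch using the google-generativeai Python SDK. The file names, prompt text, and exact model identifier are placeholders we picked for illustration, not anything from the talk.

```python
# Minimal sketch: feed a whole codebase plus a screen recording to Gemini 1.5 Pro.
# File names and the model ID are illustrative; adjust to whatever you actually have.
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Upload the bug-repro video via the File API and wait for it to finish processing.
video = genai.upload_file("bug_repro.mp4")
while video.state.name == "PROCESSING":
    time.sleep(5)
    video = genai.get_file(video.name)

# Dump the source code as one big text blob (long context is the whole point here).
code = open("all_source_files.txt").read()

model = genai.GenerativeModel("gemini-1.5-pro-latest")
response = model.generate_content([
    "Here is my app's source code and a screen recording of a bug. "
    "Where in the code is the bug most likely coming from?",
    code,
    video,
])
print(response.text)
```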
Q: Do we have the servers to support all these people here banging on it?
Million-token context queries do take a bit of compute time. But you should go for it.
Q: Do you think we can reach a point where we actually understand how these models work? Or will they remain black boxes where we just trust the makers of the model not to mess up?
No, I think you can learn to understand it. The fact is, when we train these models, there are a thousand different capabilities you could try. So, on the one hand, it's surprising that it can do certain things. On the other hand, for any specific capability, we can analyze where the attention is going at each layer between the code and video. We can deeply analyze it. Personally, I don't know the current progress researchers have made in doing that, but it takes a huge amount of time and study to dissect why a model can do certain things.
Honestly, most of the time I see analysis, it's focused on why a model fails rather than why it succeeds. So, I'd say we could understand it, and people probably are working on it, but most of the effort is spent figuring out where it goes wrong, not where it goes right.
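As a toy illustration of the kind of attention analysis Sergey mentions, here's a sketch that inspects per-layer attention in a small open model via Hugging Face transformers. GPT-2 stands in purely because its weights are public; none of this reflects Gemini's internals.

```python
# Toy sketch: look at where attention goes at each layer of a small open model.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

inputs = tokenizer("The quick brown fox jumps over the lazy dog", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# outputs.attentions is a tuple with one tensor per layer,
# each shaped (batch, num_heads, seq_len, seq_len).
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for layer_idx, attn in enumerate(outputs.attentions):
    # Average over heads, then ask which token the final position attends to most.
    last_row = attn[0].mean(dim=0)[-1]
    top = last_row.argmax().item()
    print(f"layer {layer_idx:2d}: last token attends most to {tokens[top]!r}")
```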
Q: What are your thoughts on the implications of extremely long context windows and the language model being able to modify its own prompts, and what does that have to do with autonomy?
I find it exciting to see these models improve themselves. I remember in grad school I wrote a game that was like a maze of walls you flew through while shooting at the walls. The walls corresponded to bits of memory, and shooting them would flip those bits to crash the system as quickly as possible. That doesn't directly answer your question, but it was an example of self-modifying code, though not for a useful purpose. The hope was that people would play it until the computer crashed.
Anyway, regarding your positive example, I do see people using these models to improve themselves. I think open-loop operation, even without human intervention, could potentially enable continued improvement in certain limited domains today. However, I don't think we're at the stage where it can handle real, serious tasks. For example, a million tokens of context isn't enough to hold the entire codebase for large projects. But you could use it for retrieval and editing.
Personally, I haven't experimented with it enough, but I haven't yet seen it reach the stage where a complex piece of code will iteratively improve itself. It's a great tool, and with human assistance, I definitely use Gemini to work on Gemini code even today. However, not for very open loop, deep, sophisticated tasks yet.
Q: What's your take on the rumour about Sam Altman trying to raise $7T?
Look, I saw the headline but didn't delve into it. I assume there was a provocative statement or something. I don't know the details. He hasn't asked me for 7 trillion dollars yet. I believe it was meant for chip development. I'm not an expert in that field, but I doubt you can simply throw large amounts of money at it and instantly produce chips. However, I'm not an expert in the market.
Q: Training cost
Training costs are definitely high, and that's something companies like us have to manage. However, I think the long-term utility is far greater. If you measure it in terms of human productivity, the value is clear. If it saves someone an hour of work per week, that hour is worth a lot. And many people are using, or will be using, these tools. Yes, it's a big bet on the future, but the cost is certainly less than $7T, right?
Q: Models running on device
We've shipped models to Android and Pixel phones. Chrome runs a pretty decent model these days. We just open-sourced Gemma, which was relatively small—a couple of billion parameters, if I remember correctly.
Yes, running models on-device is really useful. It offers low latency, eliminates dependency on connectivity, and smaller models can call bigger cloud-based models too. I think on-device is a really good idea.
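For anyone curious what "a couple of billion parameters" looks like to run locally, here's a minimal inference sketch with the open-weights Gemma 2B model via Hugging Face transformers. The model ID, prompt, and generation settings are illustrative, and an actual phone deployment would typically use a quantized on-device runtime rather than full-precision transformers.

```python
# Minimal sketch: run the small open-weights Gemma model locally.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b-it"  # illustrative; pick whichever Gemma variant you have access to
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "In one sentence, why does on-device inference reduce latency?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```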
Q: What are some verticals/industries that you feel AI is going to have a big impact on, and that startups should consider hacking on?
It's very hard to predict. There are obvious industries that come to mind, like customer service, analyzing lengthy documents, and workflow automation. But I think there will be non-obvious applications that I can't foresee, especially with multimodal models and their surprising capabilities. That's why we have all of you here—you're the creative ones who will figure that out.
Q: Will Google’s models stay cheap? Or are you just planning to raise prices at some point?
I'm actually not on top of the pricing thing. I don't expect that we will raise prices, however, because there are fundamentally a couple of trends. One is just that optimizations and improvements around inference are constantly happening.
Someone says, I have this 10 percent idea, this 20 percent idea, and month after month, that adds up. I think our TPUs are actually pretty damn good at inference – not to knock the GPUs – but for certain inference workloads, they're just configured really nicely. The other big effect is that we're able to make smaller models more and more effective with new generations – architectural changes, training changes, all kinds of things like that. So the models are getting more powerful, even at the same size. So, I would not expect prices to go up.
Q: What are your predictions for how AI is going to impact healthcare and biology?
I think there are a couple of different ways AI will impact biotech. On the research side, people look at tools like AlphaFold for understanding the fundamental mechanics of life. I think you'll see AI do more and more of that – whether it's analyzing physical molecules and bonding or reading and summarizing journal articles.
I also think there's potential for AI to help patients, though this is a tough area. We're not prepared for AIs to answer any medical question directly, as they can make mistakes. But I think there's a future where an AI can much more deeply spend time on an individual person and their history and all their scans. This may be mediated by a doctor, but it could lead to better diagnoses and recommendations.
Q: Are you focusing on any other, non-transformer architectures to get better at reasoning and planning?
I think there are many variations, but I guess most people are still transformer-based. I'm sure someone at the company who's more involved could speak to it in more detail. As much progress as transformers have made over the last six, seven, or eight years, there's nothing to say there won't be a new revolutionary architecture. It's also possible that incremental changes, like sparsity, could still lead to significant evolution within the transformer framework. So, I don't have a magic answer.
Is there some bottleneck for reasoning-type questions? Is there a bottleneck in transformers? Yeah, there's been lots of theoretical work showing the limitations of transformers – things they can't do, limitations with layers, etc. I don't know how to extrapolate that to contemporary transformers, which usually don't meet the assumptions of that theoretical work, so it may not apply. But I'd probably hedge my bets and try other architectures as well.
Q: Google Glass was maybe a little bit early. Would you consider giving that another shot?
I feel like I messed up Google Glass. I made some bad decisions. It was definitely early, both in terms of overall technology evolution and in the sense that I tried to push it as a finished product when it was more of a prototype. I should have set clearer expectations. I also wasn't familiar with consumer hardware supply chains back then. There are a lot of things I wish I'd done differently.
Personally, I still like the lightweight, minimal display that Google Glass offered – something you could wear all day compared to the heavier devices we have now. That's my preference, but devices like Apple Vision and Oculus are very impressive. Having played with them, I'm amazed at what you can have in front of your eyes. But that lightweight design was what I was personally aiming for back then.
Q: Do you see Gemini expanding its capabilities into 3D, spatial computing, and a broader understanding of the real world? Especially considering Google's existing products like Google Maps, Street View, and ARCore?
Wow, that's a great question! I haven't thought about it before, but now that you mention it, there's no reason we couldn't incorporate more 3D data into Gemini. It could be an interesting new mode for it.
I don't see why you wouldn't try combining 3D capabilities with a model that already understands text so well. Maybe someone on the Gemini team is already working on it—I'm not sure if I've forgotten or if it's just in progress.
Q: Hallucination and misinformation (significant summarization in answer)
Hallucinations are a problem right now, no question. We've made them hallucinate less over time, but I'd be excited to see a breakthrough that brings it to near zero. You can't count on breakthroughs, so we'll keep making incremental improvements to reduce hallucinations.
Misinformation is a complicated issue. Obviously, you don't want your AI bots to make stuff up. But they can also be tricked into it. There are complicated political issues around what different people consider misinformation, and it gets into a broader social debate.
Another concern is deliberate misinformation generation. Unfortunately, it's easy to make a lousy AI that hallucinates a lot. You can take any open-source text model and tweak it to generate all kinds of misinformation. If you're not concerned about accuracy, it's easy to do.
Detecting AI-generated content is an important field that we work on. This way, you could at least tell if something was created by an AI.
Q: The CEO of NVIDIA said that basically the future of writing code as a career is dead. What's your take on that?
We don't know where the future of AI is going broadly. It seems to help across a range of careers, whether it's graphic artists, customer support, doctors, executives, or what have you. I wouldn't single out programming in particular. It's actually one of the more challenging tasks for LLMs today.
But if you're talking about decades in the future and what to prepare for, it's hard to say. AI could get quite good at programming, but you could say that about any field of human endeavor. So, I probably wouldn't single out programming and say, "Don't study that specifically." I don't know if that's a good answer.
Q: So, is IT security the way to go? Because the code's going to be written by agents, but someone still needs to check it.
Oh, wow. You're all trying to choose careers basically. I do think using an AI today to write unit tests is pretty straightforward. That's the kind of thing AI does quite well. So, my hope is that AI will make code more secure, not less.
Insecure code is often a result of people being lazy, and AI is good at not being lazy. If I had to bet, I'd say there's probably a net benefit to security with AI. But I wouldn't discourage you from pursuing a career in IT security based on that.
Q: Do you want to build AGI?
Do I want to? Yeah, absolutely. Different people mean different things by that, but to me, the reasoning aspects are really exciting and amazing. I came out of retirement because of the trajectory of AI – it was so exciting. As a computer scientist, seeing what these models can do year after year is astonishing.
Q: Hardware at Google? (heavy summarization)
We've worked on humanoid robotics extensively over the years, acquiring and selling companies in the field. There are many companies doing humanoid robotics now, and we still have internal groups working on various robotics projects.
What are my thoughts? I worked on other projects before this new AI focus, and those were definitely more hardware-centric. Honestly, I learned the hard way that hardware is much more difficult – technically, business-wise, in every way.
I'm not discouraging robotics; we certainly need people working on it. However, with software and AI progressing so rapidly, that feels like the rocket ship to me. If I get distracted by building hardware for today's AIs, it might not be the best use of time compared to working on the next level of AI.
Besides, could AI design a robot for me? There are people at Google who could work on hardware, but that's not my personal focus.
Q: How do you think about the advertising revenue? Is it getting disrupted?
Of all people, I'm not too concerned about business model shifts. I think it's wonderful that for 25 years, we've been able to provide world-class information and search for free to everyone. Advertising supports this, which I think is great for the world – a kid in Africa has just as much access to basic information as the President of the United States. That's good.
At the same time, business models will evolve. Maybe advertising will remain, perhaps because AI can tailor it better. But even if it shifts to paid models, as we now have with Gemini Advanced, the fundamental issue is delivering immense value. You're displacing a huge amount of mental effort, whether in time or labor, with AI. It was the same with search. So, as long as there's huge value being generated, we'll figure out the business models.
Q: Where do you see Google Search going?
Well, it's a super exciting time for search because AI greatly enhances our ability to answer questions. I think the biggest opportunity lies in situations where you're recall-limited.
You might ask a very specialized question or one related to your own personal situation that nobody on the internet has written about. Traditional search might not be a big help for those kinds of questions.
However, AI has a huge opportunity to shine there – especially for questions that are specific to what you care about in that moment. You can imagine all kinds of products, UIs, and ways to deliver that, but the core enabler is AI doing a much better job in this unique context.
Q: Will AI help with immortality?
I'm not as well-versed in this as all of you, but I've seen huge progress in molecular AI. I imagine there's also unseen progress in using AI for epidemiology - to gain a broader, more accurate understanding of global health trends.
I don't have a brilliant answer for your immortality question. But, AI definitely benefits this field, whether it's for researchers or for summarizing articles. In the future, I expect AI to provide novel hypotheses to test. It already does that with molecules (like AlphaFold), and perhaps this capability will extend to more complex systems.