- Ben's Bites
- Posts
- From Pixels to Possibilities: AI Vision
From Pixels to Possibilities: AI Vision
š Hey, this is Ben with a š subscriber-only issueš of Benās Bites Pro free version of Benās Bites Pro. A weekly newsletter covering AI trends, ideas, business breakdowns and how companies are using it internally. I wrote a series of posts like this as an experiment. Now Iāve launched Benās Bites Pro. If you find these posts valuable, consider subscribing.
GPT-4V (or more simply, GPT Vision) search volume is starting to take off, and donāt expect it to slow down as it becomes more widely known. Letās explore some examples and opportunities from this trend.
Search volume for āChatGPT visionā
Search volume for āGPT-4Vā
What is it?
Itās AI that can see. Using GPT4-Vision API or uploading an image to ChatGPT, you can get the model to interpret what is in an image or video. āWhatās this thing on my bike?ā is great and all, but how about these examples:
Screenshot to code. Itās what you think it is. Take a screenshot of something, and itāll turn it into actual code. Clone the YouTube, Instagram, Hacker News websites etc. (github repo here, no-code version here)
Cursor, the popular AI coding tool, lets you copy components with a screenshot and add it to your code, modify it etc.
Try out GPT-V in Cursor!
It's pretty good for building/modifying components!
ā Aman Sanger (@amanrsanger)
7:22 PM ā¢ Nov 29, 2023
Tldraw has been everywhere on my Twitter (X) feed recently. And for good reason. An unsuspecting whiteboard app that came alive with the new AI model. They added GPT4-V into its āMake Realā feature - so you could draw boxes of a web application (let's say a calculator) and it would actually create a functional calculator.
Yeah, this is nuts.
Just made a working tip calculator app in 2 minutes.
Sketch > working prototype.
The future of software will be heavily streamlined with AI.
Kudos @tldraw
ā Roberto Nickson (@rpnickson)
4:14 PM ā¢ Nov 17, 2023
So drawing code is now real. Itās only a matter of time before more applications are made this way. The true no-code (Iāve been harping on about this for years!!)
Also, its ādrawingā capabilities are insane too from such a basic starting point.
How to draw an owl with @tldraw
ā Taz Singh (@tazsingh)
7:09 PM ā¢ Nov 28, 2023
Be My Eyes is an app for the visually impaired to let volunteers essentially FaceTime the impaired to help them with daily tasks. Now, powered by OpenAI - AI can be the helper.
Taking control of a userās computer. This guy asked it to find his youtube channel and you see the AI literally go to Google Chrome, type in the address bar and click on a search result. WITHOUT HIM TOUCHING ANYTHING. While this is a basic task, you can imagine what this kind of thing unlocks.
And, if you canāt, I did:
Opportunities
You can generate AI voiceovers for your product demos like this guy just built. So instead of going through the process of scriptwriting, GPT4-V will help do that for you.
Make Pokemon Go, but for real life, like this demo.
Use Vision to count cards at an online casino. Ok, this isnāt recommended, but technically possible?
Get a breakdown of how much you spend waste on social media each week.
āSketch your dreamā app.
Virtual time-travel experiences.
Personal stylist assistant.
Analyse my weightlifting technique, my work posture (like this demo), my tennis swing etc.
Set up a productised service to turn real estate listings into more enhanced virtual viewing experiences.
Create a ton of infographics and interesting reports on topics that you can sell access to.
A user feedback tool where I just upload a video (a loom?) of me using your site and it interprets where Iām getting stuck, where Iām spending too much time, where Iām clicking vs where I should be. Forget heatmaps.
How about an automated system that signs up for every AI tool, goes through onboarding and posts the recording on a site so others can check it out? Works for PageFlows (which I believe, has a human behind it).
The opportunities are endless. Send me what youāre working with Vision. And let me know what you think of these types of posts - keep āem going // donāt bother
Reply