• Ben's Bites
  • Posts
  • From Pixels to Possibilities: AI Vision

From Pixels to Possibilities: AI Vision

šŸ‘‹ Hey, this is Ben with a šŸ”’ subscriber-only issuešŸ”’ of Benā€™s Bites Pro free version of Benā€™s Bites Pro. A weekly newsletter covering AI trends, ideas, business breakdowns and how companies are using it internally. I wrote a series of posts like this as an experiment. Now Iā€™ve launched Benā€™s Bites Pro. If you find these posts valuable, consider subscribing.

GPT-4V (or more simply, GPT Vision) search volume is starting to take off, and donā€™t expect it to slow down as it becomes more widely known. Letā€™s explore some examples and opportunities from this trend.

Search volume for ā€˜ChatGPT visionā€™

Search volume for ā€˜GPT-4Vā€™

What is it?

Itā€™s AI that can see. Using GPT4-Vision API or uploading an image to ChatGPT, you can get the model to interpret what is in an image or video. ā€œWhatā€™s this thing on my bike?ā€ is great and all, but how about these examples:

Screenshot to code. Itā€™s what you think it is. Take a screenshot of something, and itā€™ll turn it into actual code. Clone the YouTube, Instagram, Hacker News websites etc. (github repo here, no-code version here)

Cursor, the popular AI coding tool, lets you copy components with a screenshot and add it to your code, modify it etc.

Tldraw has been everywhere on my Twitter (X) feed recently. And for good reason. An unsuspecting whiteboard app that came alive with the new AI model. They added GPT4-V into its ā€˜Make Realā€™ feature - so you could draw boxes of a web application (let's say a calculator) and it would actually create a functional calculator.

So drawing code is now real. Itā€™s only a matter of time before more applications are made this way. The true no-code (Iā€™ve been harping on about this for years!!)

Also, its ā€˜drawingā€™ capabilities are insane too from such a basic starting point.

Be My Eyes is an app for the visually impaired to let volunteers essentially FaceTime the impaired to help them with daily tasks. Now, powered by OpenAI - AI can be the helper.

Taking control of a userā€™s computer. This guy asked it to find his youtube channel and you see the AI literally go to Google Chrome, type in the address bar and click on a search result. WITHOUT HIM TOUCHING ANYTHING. While this is a basic task, you can imagine what this kind of thing unlocks. 

And, if you canā€™t, I did:

Opportunities

You can generate AI voiceovers for your product demos like this guy just built. So instead of going through the process of scriptwriting, GPT4-V will help do that for you.

Make Pokemon Go, but for real life, like this demo.

Use Vision to count cards at an online casino. Ok, this isnā€™t recommended, but technically possible?

Get a breakdown of how much you spend waste on social media each week.

ā€˜Sketch your dreamā€™ app.

Virtual time-travel experiences.

Personal stylist assistant.

Analyse my weightlifting technique, my work posture (like this demo), my tennis swing etc.

Set up a productised service to turn real estate listings into more enhanced virtual viewing experiences.

Create a ton of infographics and interesting reports on topics that you can sell access to. 

A user feedback tool where I just upload a video (a loom?) of me using your site and it interprets where Iā€™m getting stuck, where Iā€™m spending too much time, where Iā€™m clicking vs where I should be. Forget heatmaps.

How about an automated system that signs up for every AI tool, goes through onboarding and posts the recording on a site so others can check it out? Works for PageFlows (which I believe, has a human behind it).

The opportunities are endless. Send me what youā€™re working with Vision. And let me know what you think of these types of posts - keep ā€˜em going // donā€™t bother

Reply

or to participate.