Stable Diffusion 2.0 is out!
Work's out, brussel sprout! Time for the holidays and all the awkward chats with family over new tech taking over our lives. So I've got some more hot topics for you to use.
Oh, and we hit over 5,000 subscribers in the last few days. I'm truly appreciative of all of your time and support. I'm not American, but I'm still thankful. Let's get into it.
Boiling hot regards,
Ben ✌️
🤌 Ben's Picks
Stable Diffusion 2.0 is live! The release includes robust text-to-image models trained using a brand-new text encoder (OpenCLIP), which greatly improves the quality of the generated images compared to earlier V1 releases; an Upscaler Diffusion model that enhances the resolution of images by a factor of 4; and a new depth-guided Stable Diffusion model, called Depth2img, which infers the depth of an input image (using an existing model) and then generates new images using both the text and depth information. There is also a new text-guided inpainting model, fine-tuned on the new Stable Diffusion 2.0 base text-to-image model, which makes it super easy to switch out parts of an image intelligently and quickly. (link) However, some users are pointing out issues with hands, missing styles and celebrities.
Runway is a video editing tool, with AI baked in. Their videos are a marketing masterclass. Here’s an interview with the CEO, Cristóbal Valenzuela, where he discusses the company's history, tools & future plans. (link)
🛠️ Cool Tools
Automated pitch deck parsing + AI analysis using Docparser, Zapier and GPT-3. The user uploads a pitch deck, and the tool pulls out all the info into a readable memo and can also assess various risks. (link)
AvatarAI just got 50+ new styles added. (link)
DocQuery - Upload a document, ask a question and this tool will answer it. (link)
Flowjin - Turn podcasts into clips with AI. (link)
Friday Go - Answers your Google searches for you, so you don't have to trawl through pages of results. (link)
EverSQL - SQL query optimisation and database observability, powered by AI. (link)
👋 Too many links?! I created a database for all links mentioned in these emails. Refer 1 friend using this link and I'll send over the link database.
🔬 Research
Taking text-to-image synthesis to the realm of image-to-image translation. Give it a guidance image and a text prompt, and a newly generated image is created that follows the guidance image and takes the prompt into account. (link)
The future of Foundation Models will be embodied agents that proactively take action, endlessly explore the world, and continuously self-improve. This is a blueprint for that future: MineDojo. (link)
RA-CM3 is designed to be more scalable and modular than previous models. The model is trained using the LAION dataset and outperforms other models on image and caption generation tasks. (link)
Text-guided video completion (TVC) asks the model to generate a video from partial frames, guided by an instruction. (link)
Lightweight video diffusion models that synthesize high-fidelity and arbitrarily long videos from pure noise. Specifically, performing diffusion and denoising in a low-dimensional 3D latent space significantly outperforms previous methods in 3D pixel space under a limited computational budget. (link)
Given only a single input painting, this method can accurately transfer creative attributes of the reference, such as semantic elements, material, object shape, brushstrokes and colours, to a natural image using a very simple learned textual description. (link)
Paint by Example: Exemplar-based image editing with diffusion models. (link)
🤓 Everything else
How to do sentiment analysis on encrypted data. (link)
DeepMind introduces a framework to create AI agents that can understand human instructions and perform actions in open-ended settings: “Building interactive agents in video game worlds”. (link)
How AI can be used to bridge different content modalities, and how this can be used to improve creativity and productivity. AI will play an important role in the future of coding, by helping programmers quickly learn new languages. (link)
The potential risks of artificial intelligence, specifically the risk that AI systems could become so powerful that they end up destroying much or all of humanity. The article argues that this scenario rests on a few key assumptions that are not justified by our current understanding of artificial intelligence research. Light-hearted reading to discuss around the holidays. (link)
Tutorial: Build your ML app in your favourite stack with Gradio's "Use via API". (link)
What agents are in LangChain, and how they can be used to determine which actions to take and in what order. Also covers using tools to help agents take action, with some examples of agents that are already available. (link)
Combining self-ask and Google Search to answer questions that neither can answer correctly on its own. (link)
'Humans of New York' style photos from the Lexica Aperture model. (link)
Harvey, a legal AI startup, received $5M in funding from OpenAI. It positions itself as a ‘copilot for lawyers’. (link) Oh, and they’re hiring a founding Full-Stack Engineer.
Tom Osman (a friend of BB’s) was on the Creators Lab podcast (another friend of BB’s) talking about what he’s seeing in the world of AI. (YouTube link)
AI puzzles generated from Wikipedia pages. (link)
Kandinsky 2.0 - multilingual text2image latent diffusion model, trained on a large 1B multilingual set. (link)
Tutorial: Easily deploy any Stable Diffusion based model to Replicate. (link)
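For the curious, the self-ask plus search idea from the list above can be sketched in a few lines of Python. This is a rough sketch under assumptions, not the linked project's code: `fake_llm` and `fake_search` are made-up stand-ins for a real language model and a real search API, and the "Follow up:" / "Intermediate answer:" / "So the final answer is:" markers are assumed to follow the usual self-ask prompt format.

```python
from typing import Callable

# Prompt markers assumed from the common self-ask format.
FOLLOW_UP = "Follow up:"
INTERMEDIATE = "Intermediate answer:"
FINAL = "So the final answer is:"


def self_ask(question: str,
             llm: Callable[[str], str],
             search: Callable[[str], str],
             max_steps: int = 5) -> str:
    """The model proposes follow-up questions, a search tool answers them,
    and the loop stops once the model states a final answer."""
    prompt = f"Question: {question}\nAre follow up questions needed here:"
    for _ in range(max_steps):
        reply = llm(prompt).strip()
        prompt += " " + reply + "\n"
        if FINAL in reply:
            return reply.split(FINAL, 1)[1].strip()
        if FOLLOW_UP in reply:
            sub_question = reply.split(FOLLOW_UP, 1)[1].strip()
            # The search tool, not the model, supplies the intermediate answer.
            prompt += f"{INTERMEDIATE} {search(sub_question)}\n"
    raise RuntimeError("no final answer within max_steps")


# Hypothetical stand-ins so the sketch runs without any API keys.
def fake_llm(prompt: str) -> str:
    answered = prompt.count(INTERMEDIATE)
    if answered == 0:
        return "Yes.\nFollow up: Which country is the Eiffel Tower in?"
    if answered == 1:
        return "Follow up: What is the capital of France?"
    return "So the final answer is: Paris"


def fake_search(query: str) -> str:
    canned = {
        "Which country is the Eiffel Tower in?": "France",
        "What is the capital of France?": "Paris",
    }
    return canned.get(query, "unknown")


QUESTION = "What is the capital of the country where the Eiffel Tower is?"

if __name__ == "__main__":
    print(self_ask(QUESTION, fake_llm, fake_search))
```

Swap `fake_llm` and `fake_search` for real model and search calls to try it on questions that need more than one hop.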
🧑‍💻 Who's hiring in AI
VEED.IO - Simple Online Video Editing. VEED is hiring AI / ML engineers to level up its creative toolkit and make it more magical.
Buildspace - where builders, build! They're looking for an ML/AI instructor to build their new course.
🖼 AI IMAGES OF THE DAY
Not Turkey, but close enough!
🤗 SHARE BEN'S BITES
Share this with 1 AI-curious friend and receive my AI project tracker database!
or copy/paste this link: https://bensbites.beehiiv.com/subscribe?ref=PLACEHOLDER
👋 SEE YA