Microsoft’s situationship with OpenAI
PLUS: Datasets and Copyright.
Hello folks, here’s what we have today:
Our picks
1/
Microsoft plans AI service with Databricks that could hurt OpenAI. MSFT’s latest Azure + Databricks offering will let Databricks users take any AI model, including open-source LLMs, and train it on their own data in Azure. That could reduce the number of companies licensing OpenAI models for the same use case.
2/
Alex Reisner of The Atlantic analyzed Books3, a dataset used to train Meta's Llama, BloombergGPT, and EleutherAI's GPT-J. He found that it contains pirated copies of more than 170,000 books, including works by Stephen King. Roughly one-third of the titles are fiction and two-thirds are nonfiction. More than 30,000 titles are from Penguin Random House and 14,000 are from HarperCollins.
3/
How do you lose an AI copyright case? According to this lawyer, by claiming that your AI tool is fully autonomous. If you say the tool created a work entirely on its own and then try to copyright that work, you are arguing that no human authored it, which is fundamentally the wrong basis for a copyright claim. That is what happened in the recent case where a judge ruled that AI art cannot be copyrighted.
4/
The Allen Institute for AI drops the biggest open dataset yet for training language models. DOLMA, at 3T tokens, is larger than the 2T-token dataset behind Meta’s Llama 2, and it comes with straightforward usage permissions.
From the community
Foundations of LLM app development with LangChain.js and Zep.
How to build a StumbleUpon for Git repos with ChatGPT and Hugo.
Vector databases - Analysing the trade-offs.
Cool Tools
Trending product launches from the last 24 hours
GodMode - The AI chat browser. Fast, free access to ChatGPT, Bing, Bard, Claude, YouChat, Poe, Perplexity, Phind, and local/GGML models like Vicuna and Alpaca.
Poozle - Open-source Plaid for LLMs.
Strada - Developer-first enterprise integration platform.
Chapple - A one-stop AI-powered content creation tool.
ThoughtCast - Craft and share compelling audio pitches, blogs, and more.
Langfuse - Open source tracing and analytics for LLM applications.
Cerelyze - Turn technical research papers into usable code.
Vexis - Unbiased, accurate grading to free up time for educators to teach.
NeoGPT - Steerable AutoGPT where you and AI agents collaborate on complex tasks in real time.
Recursive document agents from LlamaIndex - Ask and answer more questions over heterogeneous documents. (PS: I’m an investor.)
From the network
Hallucinations with Andy Weissman - AI-native podcast about AI and creativity.
Llama2 7B-32K Instruct, plus fine-tuning for Llama 2 models with the Together API.
Human expertise, not more data, will unlock AI. So build grimoires: spellbooks of human expertise.
Ben’s Bites News
Top posts from the last 24 hours
Unbabel unveils AI project to allow communication using thought alone.
Viome, a microbiome startup, raises $86.5M, inks distribution deal with CVS.
Expanding transformer size without losing function or starting from scratch.
With LLMs, enterprise data is different.
Profile and interview with Max Tegmark, the MIT physicist who went from AI optimist to co-founding the Future of Life Institute.
How Index Ventures jumped to the front of the AI GPU line.
Meet the $4 billion AI superstars that Google lost.
How Science, Nature, and other journals are grappling with outlines, drafts, and papers produced with generative AI without disclosure.
The world isn’t ready for the next decade of AI - Mustafa Suleyman.
Ben’s Bites Insights
We have two databases, updated daily, that you can access by sharing Ben’s Bites using the link below:
All 10k+ links we’ve covered, easily filterable (1 referral)
6k+ AI company funding rounds since Jan 2022, including investors, amounts, stage, etc. (3 referrals)