A podcast about bringing machine learning into the real world. Each episode features a conversation with top data science and machine learning practitioners, who share their thoughts, best practices, and tips for promoting machine learning to production.
🌲 Machine Learning in Agriculture: Scaling AI for Crop Management with Dror Haor
In this episode, Dean speaks with Dror Haor, CTO at SeeTree, about the challenges of deploying AI in agriculture at scale. They explore how SeeTree integrates AI and sensor fusion to manage vast amounts of remote sensing data, helping farmers improve crop yields with high accuracy at low costs. Dror shares insights on handling data drift, customizing models for different regions, and balancing the trade-offs between cost and performance. This conversation dives deep into practical machine learning applications in agriculture, offering valuable lessons for anyone working with large-scale data and AI.
Join our Discord community: https://discord.gg/tEYvqxwhah
---
Timestamps:
00:00 Introduction
00:32 Production in machine learning at SeeTree
07:34 Sensor fusion in machine learning
16:26 Balancing accuracy and cost in agriculture
20:09 Customizing models for different customers and crops
24:19 Dealing with data in different domains
30:10 Tools and processes for ML at SeeTree
35:58 Building for scale
40:17 Collecting user feedback and self-improving products
42:45 Exciting developments in ML & AI
45:12 Hot takes in ML - Overfitting is good
46:34 Recommendations for the Audience
➡️ Dror Haor on LinkedIn – https://www.linkedin.com/in/dror-haor-phd-77152322/
➡️ Dror Haor on Twitter – https://x.com/DrorHaor
🌐 Check Out Our Website! https://dagshub.com
Social Links:
➡️ LinkedIn: https://www.linkedin.com/company/dagshub
➡️ Twitter: https://x.com/TheRealDAGsHub
➡️ Dean Pleban: https://x.com/DeanPlbn
9/15/2024 • 50 minutes, 46 seconds
📊 Data-Driven Decisions: ML in E-Commerce Forecasting with Federico Bacci
In this episode, Dean speaks with Federico Bacci, a data scientist and ML engineer at Bol, the largest e-commerce company in the Netherlands and Belgium. Federico shares valuable insights into the intricacies of deploying machine learning models in production, particularly for forecasting problems. He discusses the challenges of model explainability, the importance of feature engineering over model complexity, and the critical role of stakeholder feedback in improving ML systems. Federico also offers a compelling perspective on why LLMs aren't always the answer in AI applications, emphasizing the need for tailored solutions. This conversation provides a wealth of practical knowledge for data scientists and ML engineers looking to enhance their understanding of real-world ML operations and challenges in e-commerce.
---
Timestamps:
00:00 Introduction and Background
01:59 Owning the ML Pipeline
02:56 Deployment Process
05:58 Testing and Feedback
07:40 Different Deployment Strategies
11:19 Explainability and Feature Importance
13:46 Challenges in Forecasting
22:33 ML Stack and Tools
26:47 Orchestrating Data Pipelines with Airflow
31:27 Exciting Developments in ML
35:58 Recommendations and Closing
Links
Dwarkesh podcast with Anthropic and Gemini team members – https://www.dwarkeshpatel.com/p/sholto-douglas-trenton-bricken
➡️ Federico Bacci on LinkedIn – https://www.linkedin.com/in/federico-bacci/
➡️ Federico Bacci on Twitter – https://x.com/fedebyes
8/15/2024 • 39 minutes, 36 seconds
🚗 Driving Innovation: Machine Learning in Auto Claims Processing
In this episode, Dean speaks with Michał Oleszak, an ML engineering manager at Solera. Michał shares insights into how his team is using machine learning to transform the automotive claims process, from recognizing vehicle damages in images to estimating repair costs. The conversation covers the challenges of deploying ML pipelines in production, managing data quality for computer vision tasks, and balancing technical implementation with business needs. Michał also discusses his approach to model evaluation, the benefits of monorepo architecture, and his views on exciting developments in self-supervised learning for computer vision.
---
Timestamps:
00:00 Introduction
00:42 Production for Machine Learning at Solera
03:49 Transitioning from Images to Structured Data
04:58 Combining Deep Learning and Non-Deep Learning Models
05:15 Deployment Process for Machine Learning Models
08:01 Challenges and Solutions in Monorepo Adoption
12:57 Evaluating Model and Pipeline Versions
21:57 Tools for ML Projects: Monorepo, Pants, GitHub Actions
24:04 Data Management and Data Quality
30:14 Challenges in ML Efforts: Data Quality
30:37 Excitement about Self-Supervised Learning and JEPA Architectures
34:45 Controversial Opinion: Importance of Statistics for ML
36:40 Recommendations
Links
🌎Prisoners of Geography by Tim Marshall: https://www.amazon.com/Prisoners-Geography-Explain-Everything-Politics/dp/1501121472
➡️ Michał Oleszak on LinkedIn – https://www.linkedin.com/in/michal-oleszak/
➡️ Michał Oleszak on Twitter – https://x.com/MichalOleszak
7/15/2024 • 39 minutes, 25 seconds
🚑 ML in the Emergency Room with Ljubomir Buturovic
In this episode, I chat with Ljubomir Buturovic, VP of ML and Informatics at Inflammatix. We discuss using ML and blood tests to diagnose infections in the emergency room. We dive into the challenges of building diagnostic (classification) and prognostic (predictive) models, with takeaways related to building datasets for production use cases.
---
Timestamps:
00:00 What is Inflammatix and how do they use ML
07:32 Edge Device Deployment: The Future of Model Deployment
21:16 Navigating Regulatory Submission for Medical Products
26:01 Evolution of Regulatory Processes in ML for Medical Applications
30:18 Challenges and Solutions in ML for Medical Applications
34:00 The Future of AI in Clinical Care
40:25 The Overrated Concept of Interpretability in AI and ML
45:32 Recommendations
Links
🌎📈 Our World in Data: https://ourworldindata.org/
🚀 Profiles of the Future: https://www.amazon.com/Profiles-Future-Arthur-C-Clarke-ebook/dp/B00BY7GITK
➡️ Ljubomir Buturovic on LinkedIn – https://www.linkedin.com/in/ljubomir-buturovic-798156/
➡️ Ljubomir Buturovic on Twitter – https://x.com/ljbuturovic
6/10/2024 • 50 minutes, 26 seconds
🌊 AI-Native with Idan Gazit – The future of AI products and interfaces + Getting AI to production
In this episode, Idan Gazit, Senior Director of Research at GitHub Next, discusses his role in exploring strategic technologies and incubating long bet projects. He explains how the GitHub Next team chooses research projects and the process of exploration and theme selection. Idan also shares insights into the ML focus at GitHub Next and the challenges of evaluating the impact of AI products. He reflects on his journey into the AI space and provides advice for testing AI products in smaller organizations. Finally, he shares his thoughts on the future of AI interfaces.
---
Timestamps:
00:00 Introduction and Background
00:56 Choosing Research Projects at GitHub Next
06:09 ML Focus in GitHub Next
10:52 ML Work and the Leaky Abstraction
13:16 Idan's Journey into the AI Space
17:54 Evaluating the Impact of AI Products
24:36 Testing AI Products in Smaller Organizations
32:52 The Future of AI Interfaces
40:01 Transitioning from Prototype to Product
46:45 Challenges in the ML/AI Space
56:03 Recommendations
➡️ Idan Gazit on LinkedIn – https://www.linkedin.com/in/idangazit/
➡️ Idan Gazit on Twitter – https://twitter.com/idangazit
5/16/2024 • 1 hour, 2 minutes, 56 seconds
🍪 Machine Learning in the cookie-less era with Uri Goren
In this episode, I chatted with Uri Goren, founder and CEO of Argmax, about Machine Learning and the future of digital advertising in a world moving away from cookies due to privacy laws like GDPR and CCPA. We chat about challenges in maintaining personalized ads while respecting user privacy, and new methods like probabilistic models and contextual features to cover some of the gap left by removing cookies.
---
Timestamps:
00:00 Introduction
00:35 The Rise of Privacy Regulations
01:40 The Impact of Losing Cookies
02:48 Understanding Cookies
04:33 Reasons for the Decline of Cookies
08:47 ML Leveraging Cookies in Advertising
10:32 The Shift to Contextual Features
12:53 The Future of ML without Cookies
15:23 New and Old Ways of Generating Contextual Features
20:33 Regulatory Conspiracies
22:33 Unsolved Problems in ML and AI
24:39 Predictions for the Next Year in AI and ML
26:17 Controversial Take: Overuse of LLMs
28:03 Recommendations
➡️ Uri Goren on LinkedIn – https://www.linkedin.com/in/ugoren/
4/18/2024 • 32 minutes, 47 seconds
🛰️ Modern & Realistic MLOps with Han-chung Lee
In this episode, I speak with Han-Chung Lee, a machine learning engineer with a lot of interesting takes on ML and AI. We dive into the buzz around natural language processing and the big waves in generative AI. We chat about how newcomers are racing through NLP's history, mixing old-school and new tech, and the shift towards smarter databases. Han-Chung breaks it down with his straightforward takes, making complex AI trends feel like coffee chat topics. It's a perfect listen for anyone keen on where AI's headed, minus the jargon.
---
Timestamps:
00:00 Intro
00:41 State of NLP and LLMs
01:33 Repeating the past in NLP
03:29 Vector databases vs. classical databases
08:49 Choosing the right LLM for an application
12:13 Advantages and disadvantages of LLMs
16:10 Where LLMs are most useful
21:13 The dark side of LLMs and can we detect it?
25:19 Thoughts on LLM leaderboard metrics
31:19 Using LLMs in regulated industries
36:40 Creating a moat in the LLM world
40:20 Evaluating LLMs
44:20 Impact of LLMs on non-English languages
48:35 Thoughts on MLOps and getting ML into production
56:48 The Hardest Unsolved Problem in ML and AI
59:09 Predictions for the Future of ML and AI
1:03:25 Recommendations and Conclusion
➡️ Han Lee on Twitter – https://twitter.com/HanchungLee
➡️ Han Lee on LinkedIn – https://www.linkedin.com/in/hanchunglee/
3/18/2024 • 1 hour, 5 minutes, 42 seconds
🩻 AI in Medical Devices & Medicine with Mila Orlovsky
In this episode, I had the pleasure of speaking with Mila Orlovsky, a pioneer in medical AI. We delve into practical applications, overcoming data challenges, and the intricacies of developing AI tools that meet regulatory standards. Mila discusses her experiences with predictive analytics in patient care, offering tips on navigating the complexities of AI implementation in medical environments. This episode is packed with actionable advice and forward-thinking strategies, making it essential listening for professionals looking to impact healthcare through AI.
---
Timestamps:
00:00 Introduction and Background
04:03 Early Days of Machine Learning in Medicine
05:19 Challenges in Building Medical AI Systems
06:54 Differences Between Medical ML and Other ML Domains
15:36 Unique Challenges of Medical Data in ML
24:01 Counterintuitive Learnings on the Business Side
28:07 Impact and Value of ML Models in Medicine
29:41 The Role of Doctors in the Age of AI
38:44 Explainability in Medical ML
44:31 The FDA and Compliance in Medical ML
48:56 Feedback and Iteration in Medical ML
52:25 Predictions for the Future of ML and AI
53:59 Controversial Predictions in the Field of ML
56:02 Recommendations
57:58 Conclusion
➡️ Mila Orlovsky on LinkedIn – https://www.linkedin.com/in/milaorlovsky/
🩺MeDS – Medical Data Science Israel Community – https://www.facebook.com/groups/452832939966464/
2/15/2024 • 58 minutes, 48 seconds
⏪ Making LLMs Backwards Compatible with Jason Liu
In this episode, I had the pleasure of speaking with Jason Liu, an applied AI consultant and the creator of Instructor – an open-source tool for extracting structured data from LLM outputs. We chat about LLM applications, their challenges, and how to overcome them. We also dive into Instructor, making LLMs interact with existing systems and a bunch of other cool things.
➡️ Jason Liu on Twitter – https://twitter.com/jxnlco
🤖 Instructor Blog – https://jxnl.github.io/instructor/
Timestamps:
00:00 Introduction
02:18 Excitement about Machine Learning and AI
03:28 Using LLMs as Backend Developers
04:22 Building Applications with LLMs
07:07 Building Instructor
09:30 Thinking in Logic and Design
10:33 Validating Data and Building Systems with Instructor
11:49 Thoughts About Product and UX in LLMs
17:51 Future of Instructor
20:25 Misconceptions and Unsolved Problems in LLMs
24:57 Improving LLM Applications
26:14 RAG as Recommendation Systems
29:32 Fine-tuning Embedding Models
32:32 Beyond Vector Similarity in RAG
39:32 Predictions for the Next Year in AI and ML
45:26 Measuring Impact on Business Outcomes
47:06 The Continuous Cycle of Machine Learning
48:38 Unlocking Economic Value through Structured Data Extraction
50:52 Questioning the Status Quo and Making an Impact