The MLOps Podcast

English, Technology, 2 seasons, 32 episodes, 1 day, 8 hours, 9 minutes

About

A podcast about bringing machine learning into the real world. Each episode features a conversation with top data science and machine learning practitioners, who'll share their thoughts, best practices, and tips for promoting machine learning to production.

🌲 Machine Learning in Agriculture: Scaling AI for Crop Management with Dror Haor

In this episode, Dean speaks with Dror Haor, CTO at SeeTree, about the challenges of deploying AI in agriculture at scale. They explore how SeeTree integrates AI and sensor fusion to manage vast amounts of remote sensing data, helping farmers improve crop yields with high accuracy at low costs. Dror shares insights on handling data drift, customizing models for different regions, and balancing the trade-offs between cost and performance. This conversation dives deep into practical machine learning applications in agriculture, offering valuable lessons for anyone working with large-scale data and AI. Join our Discord community: https://discord.gg/tEYvqxwhah --- Timestamps: 00:00 Introduction 00:32 Production in machine learning at SeeTree 07:34 Sensor fusion in machine learning 16:26 Balancing accuracy and cost in agriculture 20:09 Customizing models for different customers and crops 24:19 Dealing with data in different domains 30:10 Tools and processes for ML at SeeTree 35:58 Building for scale 40:17 Collecting user feedback and self-improving products 42:45 Exciting developments in ML & AI 45:12 Hot takes in ML - Overfitting is good 46:34 Recommendations for the Audience ➡️ Dror Haor on LinkedIn – https://www.linkedin.com/in/dror-haor-phd-77152322/ ➡️ Dror Haor on Twitter – https://x.com/DrorHaor 🌐 Check Out Our Website! https://dagshub.com Social Links: ➡️ LinkedIn: https://www.linkedin.com/company/dagshub ➡️ Twitter: https://x.com/TheRealDAGsHub ➡️ Dean Pleban: https://x.com/DeanPlbn

9/15/2024 • 50 minutes, 46 seconds

📊 Data-Driven Decisions: ML in E-Commerce Forecasting with Federico Bacci

In this episode, Dean speaks with Federico Bacci, a data scientist and ML engineer at Bol, the largest e-commerce company in the Netherlands and Belgium. Federico shares valuable insights into the intricacies of deploying machine learning models in production, particularly for forecasting problems. He discusses the challenges of model explainability, the importance of feature engineering over model complexity, and the critical role of stakeholder feedback in improving ML systems. Federico also offers a compelling perspective on why LLMs aren't always the answer in AI applications, emphasizing the need for tailored solutions. This conversation provides a wealth of practical knowledge for data scientists and ML engineers looking to enhance their understanding of real-world ML operations and challenges in e-commerce. Join our Discord community: https://discord.gg/tEYvqxwhah --- Timestamps: 00:00 Introduction and Background 01:59 Owning the ML Pipeline 02:56 Deployment Process 05:58 Testing and Feedback 07:40 Different Deployment Strategies 11:19 Explainability and Feature Importance 13:46 Challenges in Forecasting 22:33 ML Stack and Tools 26:47 Orchestrating Data Pipelines with Airflow 31:27 Exciting Developments in ML 35:58 Recommendations and Closing Links Dwarkesh podcast with Anthropic and Gemini team members – https://www.dwarkeshpatel.com/p/sholto-douglas-trenton-bricken ➡️ Federico Bacci on LinkedIn – https://www.linkedin.com/in/federico-bacci/ ➡️ Federico Bacci on Twitter – https://x.com/fedebyes 🌐 Check Out Our Website! https://dagshub.com Social Links: ➡️ LinkedIn: https://www.linkedin.com/company/dagshub ➡️ Twitter: https://x.com/TheRealDAGsHub ➡️ Dean Pleban: https://x.com/DeanPlbn

8/15/2024 • 39 minutes, 36 seconds

🚗 Driving Innovation: Machine Learning in Auto Claims Processing

In this episode, Dean speaks with Michał Oleszak, an ML engineering manager at Solera. Michał shares insights into how his team is using machine learning to transform the automotive claims process, from recognizing vehicle damages in images to estimating repair costs. The conversation covers the challenges of deploying ML pipelines in production, managing data quality for computer vision tasks, and balancing technical implementation with business needs. Michał also discusses his approach to model evaluation, the benefits of monorepo architecture, and his views on exciting developments in self-supervised learning for computer vision. Join our Discord community: https://discord.gg/tEYvqxwhah --- Timestamps: 00:00 Introduction 00:42 Production for Machine Learning at Solera 03:49 Transitioning from Images to Structured Data 04:58 Combining Deep Learning and Non-Deep Learning Models 05:15 Deployment Process for Machine Learning Models 08:01 Challenges and Solutions in Monorepo Adoption 12:57 Evaluating Model and Pipeline Versions 21:57 Tools for ML Projects: Monorepo, Pants, GitHub Actions 24:04 Data Management and Data Quality 30:14 Challenges in ML Efforts: Data Quality 30:37 Excitement about Self-Supervised Learning and JEPA Architectures 34:45 Controversial Opinion: Importance of Statistics for ML 36:40 Recommendations Links 🌎Prisoners of Geography by Tim Marshall: https://www.amazon.com/Prisoners-Geography-Explain-Everything-Politics/dp/1501121472 ➡️ Michał Oleszak on LinkedIn – https://www.linkedin.com/in/michal-oleszak/ ➡️ Michał Oleszak on Twitter – https://x.com/MichalOleszak 🌐 Check Out Our Website! https://dagshub.com Social Links: ➡️ LinkedIn: https://www.linkedin.com/company/dagshub ➡️ Twitter: https://twitter.com/TheRealDAGsHub ➡️ Dean Pleban: https://twitter.com/DeanPlbn

7/15/2024 • 39 minutes, 25 seconds

🚑 ML in the Emergency Room with Ljubomir Buturovic

In this episode, I chat with Ljubomir Buturovic, VP of ML and Informatics at Inflammatix. We discuss using ML to diagnose infections and blood tests in the emergency room. We dive into the challenges of building diagnostic (classification) and prognostic (predictive) models, with takeaways related to building datasets for production use cases. Join our Discord community: https://discord.gg/tEYvqxwhah --- Timestamps: 00:00 What is Inflammatix and how do they use ML 7:32 Edge Device Deployment: The Future of Model Deployment 21:16 Navigating Regulatory Submission for Medical Products 26:01 Evolution of Regulatory Processes in ML for Medical Applications 30:18 Challenges and Solutions in ML for Medical Applications 34:00 The Future of AI in Clinical Care 40:25 The Overrated Concept of Interpretability in AI and ML 45:32 Recommendations Links 🌎📈 Our World in Data: https://ourworldindata.org/ 🚀 Profiles of the Future: https://www.amazon.com/Profiles-Future-Arthur-C-Clarke-ebook/dp/B00BY7GITK ➡️ Ljubomir Buturovic on LinkedIn – https://www.linkedin.com/in/ljubomir-buturovic-798156/ ➡️ Ljubomir Buturovic on Twitter – https://x.com/ljbuturovic 🌐 Check Out Our Website! https://dagshub.com Social Links: ➡️ LinkedIn: https://www.linkedin.com/company/dagshub ➡️ Twitter: https://twitter.com/TheRealDAGsHub ➡️ Dean Pleban: https://twitter.com/DeanPlbn

6/10/2024 • 50 minutes, 26 seconds

🌊 AI-Native with Idan Gazit – The future of AI products and interfaces + Getting AI to production

In this episode, Idan Gazit, Senior Director of Research at GitHub Next, discusses his role in exploring strategic technologies and incubating long bet projects. He explains how the GitHub Next team chooses research projects and the process of exploration and theme selection. Idan also shares insights into the ML focus at GitHub Next and the challenges of evaluating the impact of AI products. He reflects on his journey into the AI space and provides advice for testing AI products in smaller organizations. Finally, he shares his thoughts on the future of AI interfaces. Join our Discord community: https://discord.gg/tEYvqxwhah --- Timestamps: 00:00 Introduction and Background 00:56 Choosing Research Projects at GitHub Next 06:09 ML Focus in GitHub Next 10:52 ML Work and the Leaky Abstraction 13:16 Idan's Journey into the AI Space 17:54 Evaluating the Impact of AI Products 24:36 Testing AI Products in Smaller Organizations 32:52 The Future of AI Interfaces 40:01 Transitioning from Prototype to Product 46:45 Challenges in the ML/AI Space 56:03 Recommendations ➡️ Idan Gazit on LinkedIn – https://www.linkedin.com/in/idangazit/ ➡️ Idan Gazit on Twitter – https://twitter.com/idangazit 🌐 Check Out Our Website! https://dagshub.com Social Links: ➡️ LinkedIn: https://www.linkedin.com/company/dagshub ➡️ Twitter: https://twitter.com/TheRealDAGsHub ➡️ Dean Pleban: https://twitter.com/DeanPlbn

5/16/2024 • 1 hour, 2 minutes, 56 seconds

🍪 Machine Learning in the cookie-less era with Uri Goren

In this episode, I chatted with Uri Goren, founder and CEO of Argmax, about Machine Learning and the future of digital advertising in a world moving away from cookies due to privacy laws like GDPR and CCPA. We chat about challenges in maintaining personalized ads while respecting user privacy, and new methods like probabilistic models and contextual features to cover some of the gap left by removing cookies. Join our Discord community: https://discord.gg/tEYvqxwhah --- Timestamps: 00:00 Introduction 00:35 The Rise of Privacy Regulations 1:40 The Impact of Losing Cookies 2:48 Understanding Cookies 4:33 Reasons for the Decline of Cookies 8:47 ML Leveraging Cookies in Advertising 10:32 The Shift to Contextual Features 12:53 The Future of ML without Cookies 15:23 New and Old Ways of Generating Contextual Features 20:33 Regulatory Conspiracies 22:33 Unsolved Problems in ML and AI 24:39 Predictions for the Next Year in AI and ML 26:17 Controversial Take: Overuse of LLMs 28:03 Recommendations ➡️ Uri Goren on LinkedIn – https://www.linkedin.com/in/ugoren/ 🌐 Check Out Our Website! https://dagshub.com Social Links: ➡️ LinkedIn: https://www.linkedin.com/company/dagshub ➡️ Twitter: https://twitter.com/TheRealDAGsHub ➡️ Dean Pleban: https://twitter.com/DeanPlbn

4/18/2024 • 32 minutes, 47 seconds

🛰️ Modern & Realistic MLOps with Han-Chung Lee

In this episode, I speak with Han-Chung Lee, a machine learning engineer with a lot of interesting takes on ML and AI. We dive into the buzz around natural language processing and the big waves in generative AI. We chat about how newcomers are racing through NLP’s history, mixing old school and new tech, and the shift towards smarter databases. Han-Chung breaks it down with his straightforward takes, making complex AI trends feel like coffee chat topics. It’s a perfect listen for anyone keen on where AI’s headed, minus the jargon. Join our Discord community: https://discord.gg/tEYvqxwhah --- Timestamps: 00:00 Intro 0:41 State of NLP and LLMs 1:33 Repeating the past in NLP 3:29 Vector databases vs. classical databases 8:49 Choosing the right LLM for an application 12:13 Advantages and disadvantages of LLMs 16:10 Where LLMs are most useful 21:13 The dark side of LLMs and can we detect it? 25:19 Thoughts on LLM leaderboard metrics 31:19 Using LLMs in regulated industries 36:40 Creating a moat in the LLM world 40:20 Evaluating LLMs 44:20 Impact of LLM on non-English languages 48:35 Thoughts on MLOps and getting ML into production 56:48 The Hardest Unsolved Problem in ML and AI 59:09 Predictions for the Future of ML and AI 1:03:25 Recommendations and Conclusion ➡️ Han Lee on Twitter – https://twitter.com/HanchungLee ➡️ Han Lee on LinkedIn – https://www.linkedin.com/in/hanchunglee/ 🌐 Check Out Our Website! https://dagshub.com Social Links: ➡️ LinkedIn: https://www.linkedin.com/company/dagshub ➡️ Twitter: https://twitter.com/TheRealDAGsHub ➡️ Dean Pleban: https://twitter.com/DeanPlbn

3/18/2024 • 1 hour, 5 minutes, 42 seconds

🩻 AI in Medical Devices & Medicine with Mila Orlovsky

In this episode, I had the pleasure of speaking with Mila Orlovsky, a pioneer in medical AI. We delve into practical applications, overcoming data challenges, and the intricacies of developing AI tools that meet regulatory standards. Mila discusses her experiences with predictive analytics in patient care, offering tips on navigating the complexities of AI implementation in medical environments. This episode is packed with actionable advice and forward-thinking strategies, making it essential listening for professionals looking to impact healthcare through AI. Join our Discord community: https://discord.gg/tEYvqxwhah --- Timestamps: 00:00 Introduction and Background 4:03 Early Days of Machine Learning in Medicine 5:19 Challenges in Building Medical AI Systems 6:54 Differences Between Medical ML and Other ML Domains 15:36 Unique Challenges of Medical Data in ML 24:01 Counterintuitive Learnings on the Business Side 28:07 Impact and Value of ML Models in Medicine 29:41 The Role of Doctors in the Age of AI 38:44 Explainability in Medical ML 44:31 The FDA and Compliance in Medical ML 48:56 Feedback and Iteration in Medical ML 52:25 Predictions for the Future of ML and AI 53:59 Controversial Predictions in the Field of ML 56:02 Recommendations 57:58 Conclusion ➡️ Mila Orlovsky on LinkedIn – https://www.linkedin.com/in/milaorlovsky/ 🩺MeDS – Medical Data Science Israel Community – https://www.facebook.com/groups/452832939966464/ 🌐 Check Out Our Website! https://dagshub.com Social Links: ➡️ LinkedIn: https://www.linkedin.com/company/dagshub ➡️ Twitter: https://twitter.com/TheRealDAGsHub ➡️ Dean Pleban: https://twitter.com/DeanPlbn

2/15/2024 • 58 minutes, 48 seconds

⏪ Making LLMs Backwards Compatible with Jason Liu

In this episode, I had the pleasure of speaking with Jason Liu, an applied AI consultant and the creator of Instructor – an open-source tool for extracting structured data from LLM outputs. We chat about LLM applications, their challenges, and how to overcome them. We also dive into Instructor, making LLMs interact with existing systems and a bunch of other cool things. Join our Discord community: https://discord.gg/tEYvqxwhah ➡️ Jason Liu on Twitter – https://twitter.com/jxnlco 🤖 Instructor Blog – https://jxnl.github.io/instructor/ 🌐 Check Out Our Website! https://dagshub.com Social Links: ➡️ LinkedIn: https://www.linkedin.com/company/dagshub ➡️ Twitter: https://twitter.com/TheRealDAGsHub ➡️ Dean Pleban: https://twitter.com/DeanPlbn Timestamps: 00:00 Introduction 02:18 Excitement about Machine Learning and AI 03:28 Using LLMs as Backend Developers 04:22 Building Applications with LLMs 07:07 Building Instructor 09:30 Thinking in Logic and Design 10:33 Validating Data and Building Systems with Instructor 11:49 Thoughts About Product and UX in LLMs 17:51 Future of Instructor 20:25 Misconceptions and Unsolved Problems in LLMs 24:57 Improving LLM Applications 26:14 RAG as Recommendation Systems 29:32 Fine-tuning Embedding Models 32:32 Beyond Vector Similarity in RAG 39:32 Predictions for the Next Year in AI and ML 45:26 Measuring Impact on Business Outcomes 47:06 The Continuous Cycle of Machine Learning 48:38 Unlocking Economic Value through Structured Data Extraction 50:52 Questioning the Status Quo and Making an Impact

1/15/2024 • 53 minutes, 41 seconds