The Data Exchange with Ben Lorica

English, Technology, 1 season, 255 episodes, 9 hours, 29 minutes
About
A series of informal conversations with thought leaders, researchers, practitioners, and writers on a wide range of topics in technology, science, and of course big data, data science, artificial intelligence, and related applications. Anchored by Ben Lorica (@BigData), the Data Exchange also features a roundup of the most important stories from the worlds of data, machine learning, and AI. Detailed show notes for each episode can be found at https://thedataexchange.media/. The Data Exchange podcast is a production of Gradient Flow (https://gradientflow.com/).

Monthly Roundup: Ray Compiled Graphs, Llama 3.2 and Multimodal AI, and Structured Data for RAG

This is our monthly conversation on topics in AI and Technology with Paco Nathan, the founder of Derwen, a boutique consultancy focused on Data and AI. Subscribe to the Gradient Flow Newsletter: https://gradientflow.substack.com/ Subscribe: Apple • Spotify • Overcast • Pocket Casts • AntennaPod • Podcast Addict • Amazon • RSS. Detailed show notes, with links to many references, can be found on The Data Exchange web site.
10/24/2024 • 52 minutes, 57 seconds

Reimagining Code: The AI-Driven Transformation of Programming and Data Analytics

Matt Welsh is a technical leader at Aryn AI, which builds an AI-powered ETL system for RAG frameworks, LLM-based applications, and vector databases. In this episode, we explore how AI is transforming programming and software development.
10/17/2024 • 41 minutes, 5 seconds

The Security Debate: How Safe is Open-Source Software?

Mars Lan is Co-Founder & CTO of Metaphor, an AI-powered social platform that enhances data governance by empowering all employees, not just data teams, to collaborate, search, and share insights through an intuitive, AI-driven interface.
10/10/2024 • 51 minutes, 6 seconds

Generative AI in Voice Technology

Yishay Carmiel is the CEO of Meaning, a startup building real-time generative AI systems focused on voice applications.
10/3/2024 • 59 minutes, 41 seconds

Building An Experiment Tracker for Foundation Model Training

Aurimas Griciūnas is the Chief Product Officer of Neptune.AI, a startup building experiment tracking tools for foundation model training.
9/26/2024 • 37 minutes, 56 seconds

Monthly Roundup: AI Regulations, GenAI for Analysts, Inference Services, and Military Applications

This is our monthly conversation on topics in AI and Technology with Paco Nathan, the founder of Derwen, a boutique consultancy focused on Data and AI.
9/19/2024 • 45 minutes, 49 seconds

Unlocking the Power of LLMs with Data Prep Kit

Petros Zerfos and Hima Patel of IBM Research are part of the team behind Data Prep Kit, an open-source toolkit that helps process and prepare raw text and code data at scale for use in large language model applications.
9/12/2024 • 38 minutes, 15 seconds

Advancing AI: Scaling, Data, Agents, Testing, and Ethical Considerations

Dr. Andrew Ng is a globally recognized AI leader, founder of DeepLearning.AI and Landing AI, General Partner at AI Fund, Chairman and Co-Founder of Coursera, and Adjunct Professor at Stanford University.
9/5/2024 • 24 minutes, 37 seconds

Bridging the Hardware-Software Divide in AI

Jay Dawani is CEO and founder of Lemurian Labs, a pioneering startup building a software stack for developing advanced AI systems, focusing on pushing the boundaries of computational capabilities and model performance.
8/29/2024 • 48 minutes, 27 seconds

Monthly Roundup: The Economic Realities of Large Language Models

This is our monthly conversation on topics in AI and Technology with Paco Nathan, the founder of Derwen, a boutique consultancy focused on Data and AI.
8/22/2024 • 43 minutes, 31 seconds

From Hype to Reality: The Current State of Enterprise Generative AI Adoption

Evangelos Simoudis is Managing Director at Synapse Partners, a firm that assists corporations in implementing AI solutions, and invests in startups developing applications that exploit data using AI.
8/15/2024 • 44 minutes, 53 seconds

Automating Unstructured Data Extraction with LLMs

Shuveb Hussain is co-founder of Unstract, a no-code platform that uses large language models to extract structured data from unstructured documents, allowing users to build API endpoints and ETL pipelines to automate document processing workflows.
8/8/2024 • 35 minutes, 29 seconds

Generative AI in Context: Hybrid Intelligence and Responsible Development

Alfred Spector’s distinguished career includes groundbreaking work in networked computing systems and leadership roles in research at IBM, Google, and Two Sigma Investments. He is currently a visiting scholar at MIT.
8/1/2024 • 36 minutes, 4 seconds

Monthly Roundup: Navigating the Peaks and Valleys of Generative AI Technology

This is our monthly conversation on trending topics in AI and Technology with Paco Nathan, the founder of Derwen, a boutique consultancy focused on Data and AI.
7/25/2024 • 46 minutes, 18 seconds

From Preparation to Recovery: Mastering AI Incident Response

Andrew Burt is co-founder of both Luminos.Law and Luminos.ai, two organizations that help companies mitigate and manage AI risks. We dive into the critical topic of AI incident response, highlighting its unique challenges compared to traditional software incidents.
7/18/2024 • 34 minutes, 38 seconds

Unlocking the Power of Unstructured Data

Chang She is CEO and co-founder of LanceDB, an open-source database designed for multimodal AI applications, offering scalable vector search, streaming training data, and interactive exploration of large AI datasets. In this episode, we discuss Lance, an open-source columnar data format that tackles the unique challenges posed by modern AI and machine learning workloads.
7/11/2024 • 49 minutes, 32 seconds

Postgres: The Swiss Army Knife of Databases

Ajay Kulkarni and Mike Freedman are the co-founders of Timescale, a startup that provides an enhanced version of PostgreSQL optimized for time-series analytics, AI applications, and scalable relational workloads.
7/3/2024 • 50 minutes, 50 seconds

Supercharging AI with Graphs

Philip Rathle, CTO of Neo4j, joins the podcast to discuss the rising popularity of graph-enhanced retrieval augmented generation (GraphRAG). He also discusses the potential impact of the new GQL graph query language standard. [Link to the demo that Philip showed.]
6/27/2024 • 43 minutes, 58 seconds

Monthly Roundup: SB 1047, GraphRAG, and AI Avatars in the Workplace

Paco Nathan is the founder of Derwen, a boutique consultancy focused on Data and AI. This episode is part of our series of monthly roundups and covers the proposed California Senate Bill 1047 for regulating AI models, including its feasibility and potential unintended consequences. We also discuss the rising popularity of graph retrieval augmented generation (GraphRAG) techniques to mitigate hallucinations in large language models, while acknowledging the current limitations and future potential of integrating symbolic and statistical AI approaches. Additionally, we explore the concept of AI avatars in the workplace, highlighting the challenges and ethical considerations surrounding digital twins and agent-based systems.
6/20/2024 • 36 minutes, 59 seconds

Fine-tuning and Preference Alignment in a Single Streamlined Process

Jiwoo Hong and Noah Lee of KAIST AI are co-authors of ORPO: Monolithic Preference Optimization without Reference Model.
6/13/2024 • 35 minutes, 32 seconds

TinyML, Sensor-Driven AI, and Advances in Large Language Models

In this episode, Pete Warden introduces his company, Useful Sensors, which focuses on developing AI solutions for consumer electronics and appliances. [This episode originally aired on Generative AI in the Real World, a podcast series I’m hosting for O’Reilly.]
6/6/2024 • 25 minutes, 23 seconds

Machine Unlearning: Techniques, Challenges, and Future Directions

Ken Liu, a Ph.D. student in Computer Science at Stanford, is the author of “Machine Unlearning in 2024.” We explore the concept of machine unlearning, the process of removing the influence of specific training data points from trained AI models.
5/30/2024 • 49 minutes, 36 seconds

Unleashing the Power of AI Agents

Joao (Joe) Moura is the founder of crewAI, an open-source platform that simplifies the development and deployment of AI agents, allowing users to build autonomous systems for various tasks using multiple large language models.
5/23/2024 • 38 minutes, 47 seconds

Monthly Roundup: Llama 3, Agents, Evaluation Metrics, Cyc, TikTok, and more

Paco Nathan is the founder of Derwen, a boutique consultancy focused on Data and AI. This episode is part of our series of monthly roundups and covers: Llama 3 and other recent LLMs, the rise of open foundation models, the evolution of AI agents, and the importance of data engineering. We also explore the limitations of leaderboards in evaluating AI models and touch upon the ethical and societal implications of AI development.
5/16/2024 • 41 minutes, 58 seconds

LLMs for Data Access: Unlocking Insights with Text-to-SQL

Gunther Hagleitner is co-founder of Waii, a startup that provides an API enabling businesses to seamlessly integrate text-to-SQL functionality into their products.
5/9/2024 • 43 minutes, 22 seconds

2024 Artificial Intelligence Index

In this episode, we explore the latest developments in artificial intelligence with a focus on the 2024 Artificial Intelligence Index Report, edited by Nestor Maslej of Stanford’s Institute for Human-Centered Artificial Intelligence.
5/2/2024 • 53 minutes, 50 seconds

DBRX and the Future of Open LLMs

In this episode, Hagay Lupesko, Senior Director of Engineering at Databricks MosaicAI, delves into the creation and goals behind DBRX, an open large language model (LLM) designed to bridge the gap between quality and cost-effectiveness for AI applications.
4/25/2024 • 45 minutes, 45 seconds

Monthly Roundup: New LLMs, GTC 2024, Constraint-Driven Innovation, Model Safety, and GraphRAG

Paco Nathan is the founder of Derwen, a boutique consultancy focused on Data and AI. This episode is part of our series of monthly roundups and covers: recently released large language models, Constraint-Driven Innovation, highlights from GTC 2024, and Lessons from the First AI Workload Security Exploit.
4/18/2024 • 37 minutes, 1 second

Automating Software Upgrades: How to Combine AI and Expert Developers

Steve Pike is a co-founder of Infield.ai, a startup building tools to help companies upgrade and maintain open source software dependencies, ensuring they stay up-to-date with the latest releases, features, and security fixes.
4/11/2024 • 36 minutes, 27 seconds

Generative AI in the Industrial Sphere

Chetan Gupta is the Head of AI Research at Hitachi. This episode explores the applications and challenges of generative AI in industrial settings.
4/4/2024 • 44 minutes, 4 seconds

The Intersection of LLMs, Knowledge Graphs, and Query Generation

Semih Salihoglu is an Associate Professor at the University of Waterloo and co-creator of Kùzu, an open source, embeddable property graph database management system. This episode explores the use of large language models (LLMs) for generating queries across different query languages, such as SQL, and Cypher for graphs.
3/28/2024 • 57 minutes, 45 seconds

Unlocking the Potential of Private Data Collaboration

Sadegh Riazi is CEO and co-founder of Pyte, a startup offering secure, encrypted data collaboration solutions that enable partners to maximize insights without compromising privacy or data integrity.
3/21/2024 • 36 minutes, 14 seconds

Frontiers of AI: From Text-to-Video Models to Knowledge Graphs

Paco Nathan is the founder of Derwen, a boutique consultancy focused on Data and AI. This episode explores recent developments in AI, including text-to-video models like Sora, frameworks for productionizing AI models, analyses of systems like Google’s Gemini, techniques to improve foundation models, AMD’s software innovations for AI acceleration, and knowledge graph augmentations of language models.
3/14/2024 • 33 minutes, 35 seconds

Adaptive, Specialized, and Accessible: Where AI Systems Are Heading Next

Jerry Kaplan is the author of the new book “Generative Artificial Intelligence: What Everyone Needs to Know”.
3/7/2024 • 43 minutes, 1 second

2024 Themes and Trends in AI

This episode is our annual deep dive into the themes and trends of AI in 2024, emphasizing the democratization of AI hardware, advancements in generative AI models, and the integration of AI into various enterprise processes.
2/29/2024 • 26 minutes, 36 seconds

The AI Infrastructure Revolution: From Cloud Computing to Data Center Design

Bryan Cantrill is CTO and co-founder of Oxide Computer Company, a startup delivering integrated hardware and software solutions for enterprises seeking cloud computing systems with hyperscaler agility. Oxide specializes in vertically integrated, scale-ready cloud infrastructure tailored for mainstream business needs.
2/22/2024 • 42 minutes, 44 seconds

AI in Depth: Transforming Transportation, Enterprise, and Policy

Evangelos Simoudis is a seasoned venture investor and a senior advisor to global corporations, and Managing Director at Synapse Partners, a company that invests in startups developing enterprise applications that exploit Big Data and AI.
2/15/2024 • 41 minutes, 12 seconds

Software Meets Hardware: Enabling AMD for Large Language Models

Sharon Zhou and Greg Diamos are co-founders of Lamini, a startup at the forefront of enabling enterprise adoption of large language models (LLMs). We discussed Lamini’s work with AMD, which focused on closing the gap between AMD hardware capabilities and software integration in LLM applications.
2/8/2024 • 38 minutes, 37 seconds

Incentives are Superpowers: Mastering Motivation in the AI Era

Uri Gneezy is Professor of Economics and Strategy at UC San Diego, and author of our 2023 Book of the Year, “Mixed Signals: How Incentives Really Work”.
2/1/2024 • 31 minutes, 9 seconds

Synthetic Futures: The Convergence of Biology and AI

Dmitriy Ryaboy is the VP of AI Enablement at Ginkgo Bioworks, a startup that uses machine learning and AI to develop a wide range of applications. The conversation focuses on the intersection of AI, machine learning, and biology, particularly in the field of synthetic biology.
1/25/2024 • 32 minutes, 22 seconds

AI Co-Pilots in Action: Transforming Function Calling in Cybersecurity

Jian Zhang is co-founder, CTO, and VP of Engineering at Nexusflow AI, a startup that uses generative AI to build tools for cybersecurity. This conversation revolves around the integration of various AI components, with a specific focus on cybersecurity and function-calling copilots.
1/18/2024 • 45 minutes

Leveling Up: Tools and Techniques to Make AI Development More Accessible

Sarmad Qadri is founder and CEO of LastMile, a startup building an AI developer platform for engineering teams. This conversation delves into key artificial intelligence and machine learning themes, focusing on injecting software engineering rigor into the development of LLM and GenAI applications.
1/11/2024 • 45 minutes, 16 seconds

LLMs on CPUs, Period

Nir Shavit, Professor at MIT’s Computer Science and Artificial Intelligence Laboratory, is also a founder of Neural Magic, a startup working to accelerate open-source large language models and simplify AI deployments.
1/4/2024 • 33 minutes, 13 seconds

Democratizing Wealth Management With AI

Chirag Yagnik is a co-founder of Arta, a company that harnesses innovations in artificial intelligence and software to develop wealth management solutions. Arta aims to democratize access to sophisticated investment tools typically available only to ultra-high-net-worth individuals through family offices.
12/28/2023 • 47 minutes, 27 seconds

Knowledge Graphs: Contextualizing Enterprise Data for More Accurate LLMs

Juan Sequeda (Principal Scientist & Head of AI Lab) and Dean Allemang (Principal Solutions Architect) are knowledge graph experts at data.world, a startup that offers a data catalog powered by a knowledge graph to help organizations better understand and gain value from their data.
12/21/2023 • 41 minutes, 36 seconds

TimeGPT: Machine Learning for Time Series, Made Accessible

Max Mergenthaler (CEO) and Azul Garza Ramirez (CTO) are co-founders of Nixtla, a startup that seeks to make cutting-edge predictive insights widely accessible. In this episode, we discuss TimeGPT, Nixtla’s new frontier model for time series forecasting.
12/14/2023 • 44 minutes, 7 seconds

Best Practices for Building LLM-Backed Applications

Waleed Kadous, Chief Scientist at Anyscale, is one of my go-to experts for best practices on building applications leveraging large language models.
12/7/2023 • 53 minutes, 50 seconds

The Evolution of Crypto, Blockchain, and Web3

Kieren James-Lubin is CEO of BlockApps and Co-Chair of the Technical Steering Committee of the Enterprise Ethereum Alliance.
11/30/2023 • 49 minutes, 15 seconds

Open Source Data and AI: Past, Present, Future

Earlier this year, I had a conversation with Sam Ramji, Chief Strategy Officer at DataStax and host of the Open||Source||Data podcast, where we talked about the evolution of big data and AI technologies. I’m airing our original conversation in its entirety on this holiday weekend in the U.S.
11/23/2023 • 43 minutes, 7 seconds

Orchestration for LLM and RAG applications

Malte Pietsch is co-founder & CTO of Deepset, the company behind the popular open source project Haystack, an orchestration framework for LLMs.
11/16/2023 • 49 minutes, 58 seconds

Reflections from the First AI Conference in San Francisco

In this episode, Paco Nathan and I dive into insights from the inaugural AI Conference in San Francisco (video of talks can be found here).
11/9/2023 • 49 minutes, 28 seconds

Kùzu: A simple, extremely fast, and embeddable graph database

Semih Salihoglu is an Associate Professor at the University of Waterloo and co-creator of Kùzu, an open source, embeddable property graph database management system.
11/2/2023 • 51 minutes, 9 seconds

Navigating the Nuances of Retrieval Augmented Generation

Philipp Moritz (Co-founder and CTO) and Goku Mohandas (ML and Product Lead) of Anyscale do a deep dive into retrieval augmented generation (RAG) and large language models (LLMs).
10/26/2023 · 42 minutes, 40 seconds

The Rise of Generative AI-Powered Social Media Manipulation

Bill Marcellino is a senior behavioral scientist at the RAND Corporation, and Nathan Beauchamp-Mustafaga is a policy researcher at the RAND Corporation. They are the principal researchers behind the new report “The Rise of Generative AI and the Coming Era of Social Media Manipulation 3.0”.
10/19/2023 · 40 minutes, 3 seconds

Versioning and MLOps for Generative AI

Yucheng Low, Cofounder & CEO of XetHub, discusses the challenges of managing large-scale machine learning assets and the need for version control.
10/12/2023 · 38 minutes, 35 seconds

Navigating the Generative AI Landscape

Christopher Nguyen is CEO and Co-founder of Aitomatic, a startup that builds virtual advisors tailored with domain-specific expertise, primarily catering to industrial AI applications.
10/5/2023 · 40 minutes, 31 seconds

Trends in Data Management: From Source to BI and Generative AI

Sudhir Hasbe is Chief Product Officer at Neo4j and a longtime technical and product leader in the data management space.
9/28/2023 · 48 minutes, 4 seconds

AI and the Future of Speech Technologies

Yishay Carmiel is the CEO of Meaning, a startup at the forefront of building real-time speech applications for enterprises. We discuss the state of AI for speech and audio, including trends in Generative AI, automatic speech recognition, diarization, restoration, voice cloning, speech synthesis and more.
9/21/2023 · 36 minutes, 52 seconds

The Future of Cybersecurity: Generative AI and its Implications

Casey Ellis is Founder/Chair/CTO of Bugcrowd, a crowdsourced cybersecurity platform. Bugcrowd recently released “Inside the Mind of a Hacker 2023”, a report that provides insights into the motivations, challenges, and specializations of hackers, as well as the security implications of AI.
9/14/2023 · 49 minutes, 7 seconds

Ivy: The One-Stop Interface for AI Model Deployment and Development

Daniel Lenton is the CEO of Ivy, a suite of tools designed to accelerate AI model development and deployment. Ivy serves as glue that connects various frameworks and compiler infrastructures, making them compatible.
9/7/2023 · 38 minutes, 59 seconds

Navigating the Risk Landscape: A Deep Dive into Generative AI

Andrew Burt is the Managing Partner at Luminos.Law, the first law firm focused on helping teams manage the privacy, fairness, security, and transparency of their AI and data, including generative AI systems. We explore the state of risk and compliance in light of generative AI, including the challenges and risks posed by AI, the implications of the FTC probe into OpenAI, and the NIST AI Risk Management Framework.
8/31/2023 · 42 minutes, 25 seconds

Software Development with AI and LLMs

Michele Catasta is VP of AI at Replit, an AI-powered software development platform that allows teams to build and deploy applications on any device, without any setup required.
8/24/2023 · 49 minutes, 9 seconds

A Lightweight SDK for Integrating AI Models and Plugins

Alex Chao is a Product Manager at Microsoft focused on Semantic Kernel, an open-source AI and LLM orchestrator. Semantic Kernel (SK) is a lightweight SDK that makes it easy to integrate AI models and plugins into applications.
8/17/2023 · 45 minutes, 55 seconds

Using LLMs to Build AI Co-pilots for Knowledge Workers

Steve Hsu wears many hats, but most recently he is co-founder of SuperFocus, a startup building LLM-backed knowledge co-pilots for enterprises.
8/10/2023 · 48 minutes, 21 seconds

ETL for LLMs

Brian Raymond is the founder of Unstructured, a startup building open source data pre-processing and ingestion tools specifically for Large Language Models (LLMs).
8/3/2023 · 36 minutes, 10 seconds

The Future of Graph Databases

Emil Eifrem is co-founder and CEO of Neo4j, the leading graph database and graph data science software provider. We discussed a range of topics, including the current state of graph databases, graph data science and graph neural networks, vector databases, and the interplay between LLMs, knowledge graphs, and graph databases.
7/27/2023 · 1 hour, 1 minute, 24 seconds

Delivering Safe and Effective LLM and NLP Applications

David Talby is the CTO and Founder of John Snow Labs, the company behind two popular open source projects: Spark NLP and LangTest. In this episode, we focus on LangTest, an open-source Python library designed to help developers deliver safe and effective Natural Language Processing (NLP) models. [Note: After we recorded this episode, NLTest was renamed to LangTest.]
7/20/2023 · 37 minutes, 56 seconds

Using Data and AI to Democratize Entity Resolution and Master Data Management

Jeff Jonas is Founder and CEO of Senzing, a startup focused on democratizing entity resolution – making this deceptively complicated task easy for programmers to use and deploy.
7/13/2023 · 50 minutes, 39 seconds

An Open Source Data Framework for LLMs

Jerry Liu is CEO and co-founder of LlamaIndex, an open source project and startup that builds tools that enable teams to augment LLMs with their own private data.
7/6/2023 · 49 minutes, 24 seconds

Redefining AI Infrastructure: Deploying and Developing with a Next-Generation Developer Platform

Tim Davis is the Co-Founder & Chief Product Officer of Modular, a startup focused on building tools to help simplify AI infrastructure.
6/29/2023 · 50 minutes, 19 seconds

The Rise of Custom Foundation Models

Andrew Feldman is CEO and co-founder of Cerebras, a startup that has released what it describes as the fastest AI accelerator, built on the largest processor. We discussed Cerebras-GPT, a family of language models, ranging from 111 million to 13 billion parameters, that the company reports set new benchmarks for accuracy and compute efficiency.
6/22/2023 · 39 minutes, 20 seconds

The Future of Vector Databases and the Rise of Instant Updates

Louis Brandy is VP of Engineering at Rockset, the real-time search and analytics database startup formed by the creators of the popular open source project RocksDB.
6/15/2023 · 48 minutes, 17 seconds

LLMs Are the Key to Unlocking the Next Generation of Search

Amin Ahmad is co-founder of Vectara, a startup that offers an API platform tailored for developers.
6/8/2023 · 44 minutes, 58 seconds

Building and Deploying Foundation Models for Enterprises

Jonas Andrulis is the Founder & CEO of Aleph Alpha, a startup that provides enterprise software solutions backed by its own large language models and multimodal models.
6/1/2023 · 34 minutes, 9 seconds

Building Robust AI Infrastructure for Critical Solutions

Alex Remedios is the founder of Treebeardtech, a London-based consulting firm that helps machine learning teams build dependable, secure, and adaptable cloud infrastructure for business-critical AI solutions.
5/25/2023 · 31 minutes, 40 seconds

Machine Learning for High-Risk Applications

Patrick Hall is co-founder of BNH and a visiting faculty member in decision sciences at the George Washington University School of Business. Agus Sudjianto is EVP and Head of Corporate Model Risk at Wells Fargo. We explore several topics covered in the new book Machine Learning for High-Risk Applications, co-authored by Patrick, with a foreword by Agus.
5/18/2023 · 46 minutes, 25 seconds

Boosting Perception With Synthetic Data

Omar Maher is Director of Product Marketing at Parallel Domain, a startup that is advancing machine perception capabilities by harnessing the power of synthetic data. We delve into the growing adoption of synthetic data and the factors driving its use. We discuss major developments in synthetic data generation and its overlap with Generative AI. The conversation also covers data privacy, intellectual property, the generation of structured data like LiDAR, the current state of adoption, and key research directions to overcome existing challenges.
5/11/2023 · 35 minutes, 34 seconds

Revolutionizing B2B: Unleashing the Power of AI and Data

Simon Chan is the General Partner at Firsthand Alliance, a venture capital fund focused on the future of B2B and enterprise software. We explore the evolution of AI, cloud computing, and business collaboration tools, revealing how a new generation of generative AI technologies is enabling applications to generate content and drive transformative innovation across various industries.
5/4/2023 · 43 minutes, 8 seconds

AI Metadata

Gev Sogomonian is co-author of AimStack, an open-source, self-hosted AI metadata tracker that logs all your AI metadata, such as experiments and prompts, and provides a user-friendly UI for comparing and observing them. It also offers an SDK for programmatically querying tracked metadata.
4/27/2023 · 31 minutes, 30 seconds

The 2023 AI Index

Raymond Perrault is a Distinguished Computer Scientist at SRI International and Co-Director of the Steering Committee for the AI Index, an annual report that tracks, collates, distills, and visualizes data relating to AI to help decision-makers and teams take meaningful action toward responsible and ethical AI.
4/20/2023 · 43 minutes, 38 seconds

Custom Foundation Models

Hagay Lupesko is VP of Engineering at MosaicML, a startup that enables teams to easily train large AI models on their data and in their own secure environment. We discuss the evolution of cloud-based machine learning (from “traditional” ML through LLMs), his experience building machine learning applications at leading technology companies, and the need for companies to build their own custom foundation models.
4/13/2023 · 38 minutes, 6 seconds

Uncovering and Highlighting AI Trends

Jakub Zavrel is the Founder and CEO of Zeta Alpha, a neural discovery platform that uses neural search technology to help you and your team uncover, organize, and share knowledge. Our conversation focuses on the latest developments in artificial intelligence, taking inspiration from their recent viral article on the 100 most cited AI papers of 2022.
4/6/2023 · 49 minutes, 2 seconds

How Data and AI Happened

Chris Wiggins is a Professor at Columbia University and the Chief Data Scientist at The New York Times. He is also co-author of How Data Happened, a new historical exploration of how data has been used as a tool in shaping society, from the census to eugenics to Google search. The book traces the trajectory of data and explores the mathematical and computational techniques that serve to shape people, ideas, society, and economies.
3/30/2023 · 48 minutes, 48 seconds

Blazing fast bulk data transfers between any cloud

Paras Jain and Sarah Wooders are graduate students at UC Berkeley’s Sky Computing Lab. They are part of the team behind Skyplane, an open source project that accelerates wide-area transfers in the cloud via overlay routing and parallelism.
3/23/2023 · 31 minutes, 29 seconds

Exhaustion of High-Quality Data Could Slow Down AI Progress in Coming Decades

Pablo Villalobos is a Staff Researcher at Epoch and lead author of the recent paper “Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning”. We discuss the key findings in this paper, as well as a related study Pablo conducted on scaling laws.
3/16/2023 · 33 minutes, 9 seconds

Generating high-fidelity and privacy-preserving synthetic data

Jinsung Yoon (Senior Research Scientist) and Sercan Arik (Staff Research Scientist and Manager) are part of the Google team behind EHR-Safe, a set of tools for generating highly realistic and privacy-preserving synthetic Electronic Health Records.
3/9/2023 · 35 minutes, 47 seconds

How technology is disrupting the venture capital industry

Brandon Jenkins is Co-founder and COO of Fundrise, the largest direct-to-individuals alternative investment platform in the U.S. Our conversation centered on their recent foray into technology investing, specifically startups in the data infrastructure space.
3/2/2023 · 36 minutes, 19 seconds

Running Machine Learning Workloads On Any Cloud

Zongheng Yang is a researcher in the Sky Computing Lab at UC Berkeley, a multi-year research initiative that draws on distributed systems, programming languages, security, and machine learning to separate the services a company requires from the choice of a specific cloud. He provides a detailed overview of and update on SkyPilot, an intercloud broker that views the cloud ecosystem as a unified, integrated entity rather than a collection of disparate, largely incompatible clouds. SkyPilot enables users to run machine learning and data science batch jobs on any cloud, realize substantial cost savings, access the best hardware across clouds, and enjoy higher resource availability.
2/23/2023 · 37 minutes, 11 seconds

2023 Trends in Data Engineering and Infrastructure

Jesse Anderson, Evan Chan, and I discuss current developments and opportunities in data engineering and data platforms, the foundation on which AI and machine learning are built. Download a copy of the FREE Report: https://gradientflow.com/2023trendsreport/
2/16/2023 · 45 minutes, 47 seconds

Preparing for the Implementation of the EU AI Act and Other AI Regulations

This week we discuss AI regulations with Gabriela Zanfir-Fortuna, VP for Global Privacy at the Future of Privacy Forum, and Andrew Burt, Managing Partner at BNH, the first law firm focused on AI and analytics.
2/9/2023 · 36 minutes, 38 seconds

The Open Source Stack Unleashing a Game-Changing AI Hardware Shift

Dylan Patel is the Chief Analyst at SemiAnalysis, a boutique semiconductor research and consulting firm focused on the semiconductor supply chain, from chemical inputs to fabs to design IP and strategy. In this episode, we discuss the emerging open source software stack for PyTorch that makes it easier to implement non-Nvidia backends (see his recent post).
2/2/2023 · 41 minutes, 55 seconds

Data Science and AI in Context

Peter Norvig (of Google and Stanford) and Alfred Spector (of MIT) are part of the team of authors behind the must-read book Data Science in Context: Foundations, Challenges, Opportunities. We discussed their recent book, took a deep dive into their Data Science Analysis Rubric, and talked about trending topics in AI, including looming regulations, synthetic data, and large language and foundation models.
1/26/2023 · 47 minutes, 40 seconds

Evaluating Language Models

Percy Liang is Associate Professor of Computer Science and Statistics, and Director of the new Center for Research on Foundation Models at Stanford University. We discussed a new suite of tools (HELM) designed to help users and researchers understand language models in their totality. We also discussed recent trends in AI, including the rise of Generative AI and Foundation Models. Download a copy of our FREE 2023 Trends in Data and AI Report: https://gradientflow.com/2023trendsreport/
1/19/2023 · 45 minutes, 36 seconds

2023 Opportunities and Trends: Data, Machine Learning, and AI

Jenn Webb, special correspondent and managing editor at Gradient Flow, recently organized a mini-panel to discuss themes and trends for 2023. The panel consisted of Mikio Braun and me. More information on these trends can be found in our Annual Trends Report, which is available as a free download: https://gradientflow.com/2023trendsreport/
1/12/2023 · 1 hour, 5 minutes, 31 seconds

Exploring DALL·E 2

Given the growing interest in Generative AI, we revisit a conversation with Mark Chen, Research Scientist at OpenAI and part of the team behind DALL·E 2, a new AI system that can create realistic images and art based on natural language descriptions.
1/5/2023 · 37 minutes, 40 seconds

Data Science at Shopify and Stitch Fix

On this special end-of-year episode, we revisit conversations with two data science leaders in the e-commerce space: Wendy Foster, Director, Engineering & Data Science at Shopify, and Olivia Liao, Senior Director of Data Science at Stitch Fix.
12/29/2022 · 37 minutes, 25 seconds

Building a data management system for unstructured data

Shayan Mohanty is the CEO of Watchful, a modern, interactive solution that puts control of data labeling back in the hands of data scientists, machine learning practitioners, and subject matter experts. This episode focuses on a data management system (written in Rust) that they built to provide the level of automation and interactivity Watchful requires.
12/22/2022 · 36 minutes, 32 seconds

A Cloud Native Vector Database Management System

Frank Liu is Director of Operations & ML Architect at Zilliz, the company behind Milvus, an open source vector database. We discuss their recent VLDB paper (“A Cloud Native Vector Database Management System”), which describes recent updates to Milvus, as well as vector databases and vector search in general.
12/15/2022 · 48 minutes, 50 seconds

What’s Next for Machine Learning in Time Series

Ira Cohen is co-founder and Chief Data Scientist at Anodot, a startup that uses time series tools to monitor business data in real time, so organizations can proactively resolve revenue, cost, and customer experience issues before they impact business performance. We recently wrote a post that provided a detailed overview of the state of technologies for collecting, storing, and unlocking time series data.
12/8/2022 · 38 minutes, 8 seconds

Efficient Methods for Natural Language Processing

Roy Schwartz is Professor of Natural Language Processing at The Hebrew University of Jerusalem. We discussed a recent survey paper, co-written by Roy, that presents a broad overview of existing methods for improving NLP efficiency through the lens of traditional NLP pipelines.
12/1/2022 · 45 minutes, 40 seconds

Responsible and Trustworthy AI

On this Thanksgiving holiday weekend in the U.S., we revisit a Twitter Spaces conversation I had with Andrew Burt, Managing Partner at BNH, the first law firm focused on AI risks, and Bob Friday, Chief AI Officer at Juniper Networks.
11/23/2022 · 30 minutes, 1 second

Building a premier industrial AI research and product group

Hung Bui is the CEO of VinAI, an AI research-based company developing products and services. Hung assembled the VinAI team just over three years ago, and it is now among the Top 20 Global Companies in AI Research in 2022.
11/17/2022 · 37 minutes, 50 seconds

An open source, production grade vector search engine

Bob van Luijt is CEO of SeMI Technologies, the company behind the popular vector search engine Weaviate. Bob describes Weaviate’s key features and core components, popular use cases, and its near-term roadmap. We also discuss how vector search engines compare with existing data management systems.
11/10/2022 · 35 minutes, 14 seconds

A comprehensive suite of open source tools for time series modeling

Federico Garza and Max Mergenthaler Canseco are both CTOs and co-founders of Nixtla, a startup building developer-friendly software that helps data scientists deploy predictive pipelines.
11/3/2022 · 35 minutes, 11 seconds

Building Safe and Reliable AI applications

Christopher Nguyen is CEO and cofounder of Aitomatic, a startup that uses a knowledge-first approach to build and deploy machine learning solutions, with a focus on industrial applications (manufacturing and other physical settings). Join us at K1st World, a symposium and networking event slated for November 16 & 17. Use the discount code GRADIENTFLOW60 to attend in person or online.
10/27/202230 minutes, 39 seconds
Episode Artwork

A new storage engine for vectors

Ram Sriharsha is VP of Engineering and R&D at Pinecone, a startup that offers a fully managed vector database (not just an index). We discuss Pinecone’s new proprietary storage engine, which was first described around the time we recorded this conversation. Subscribe to the Gradient Flow Newsletter: https://gradientflow.substack.com/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
10/20/2022 · 41 minutes, 58 seconds

Project Lightspeed: Next-generation Spark Streaming

Karthik Ramasamy is the Head of Streaming at Databricks. He has extensive experience in streaming, having led teams at Twitter (Apache Heron), Splunk, and Streamlio (Apache Pulsar). Subscribe to the Gradient Flow Newsletter: https://gradientflow.substack.com/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
10/13/2022 · 41 minutes, 43 seconds

The Unreasonable Effectiveness of Speech Data

Piotr Żelasko is Head of Research at Meaning, a startup building an AI platform using speech technologies. He has years of experience in speech technologies, both as a researcher and as a software engineer. We recorded this episode during the week OpenAI released Whisper, a deep learning model that approaches human-level robustness and accuracy on English speech recognition. Our conversation centered on Whisper and speech recognition, but also touched on the new speech data processing tools (Lhotse, k2, Icefall) that we described in our recent post. Download a FREE copy of our recent 2022 Trends Report (Data, Machine Learning, AI): https://gradientflow.com/2022trendsreport/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
10/6/2022 · 35 minutes

Machine Learning Integrity

Yaron Singer is the CEO of Robust Intelligence, a company building tools to help manage and mitigate risks associated with machine learning models and applications. Download a FREE copy of our recent 2022 Trends Report (Data, Machine Learning, AI): https://gradientflow.com/2022trendsreport/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
9/29/2022 · 44 minutes, 33 seconds

Synthetic data technologies can enable more capable and ethical AI

Yashar Behzadi is the CEO & Founder of Synthesis AI, a startup that uses synthetic data technologies to enable teams building AI applications, as well as gaming and metaverse applications. Download a FREE copy of our recent 2022 Trends Report (Data, Machine Learning, AI): https://gradientflow.com/2022trendsreport/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
9/22/2022 · 39 minutes, 4 seconds

Confidential Computing for Machine Learning

Sadegh Riazi is CEO and co-founder of CipherMode Labs, a startup building tools that enable data and machine learning teams to build and deploy models directly on encrypted data. CipherMode’s new open source project enables teams to develop and deploy machine learning algorithms using familiar tools, opening up the possibility of using sensitive data in different scenarios, both within an organization and in cooperation with other organizations. Download a FREE copy of our recent 2022 Trends Report (Data, Machine Learning, AI): https://gradientflow.com/2022trendsreport/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
9/15/2022 · 36 minutes, 20 seconds

Applied NLP Research at Primer

John Bohannon is a Senior Director of Data Science and Head of Research at Primer AI, which offers an end-to-end machine intelligence solution for textual data. We discussed Primer’s process of translating ML research into ML products through the lens of several use cases. Download a FREE copy of our recent NLP Industry Survey Results: https://gradientflow.com/2021nlpsurvey/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
9/8/2022 · 41 minutes, 50 seconds

Using SQL to Retrieve Data from APIs and Web Services

Jon Udell is community lead for Steampipe, an open-source tool that populates a database table with data retrieved from APIs. Steampipe uses Postgres, which makes that data easy to explore and retrieve using SQL. Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
9/1/2022 · 31 minutes, 9 seconds

Machine Learning for Time Series Intelligence

Aadyot Bhatnagar is a Senior Research Engineer at Salesforce and co-creator of Merlion, an open source framework for applying machine learning to time series data. Merlion supports a wide range of time series learning tasks, including forecasting, anomaly detection, and change point detection. Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
8/25/2022 · 40 minutes, 12 seconds

Unleashing the power of large language models

Maarten Grootendorst is a data scientist at IKNL and, more importantly, the author of two open source libraries that I’ve come to love: BERTopic (topic modeling with transformers and c-TF-IDF) and PolyFuzz (fuzzy string matching). Both projects harness the power of transformers and other leading-edge models and package them with simple APIs, clear documentation, and visualization tools. Download a FREE copy of our recent NLP Industry Survey Results: https://gradientflow.com/2021nlpsurvey/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
8/18/2022 · 38 minutes, 51 seconds

Building production-ready machine learning pipelines

Hamza Tahir and Adam Probst are co-creators of ZenML, an extensible open source framework for building reproducible pipelines. We discuss the current state of ZenML, the many use cases that ZenML has been designed for, and its near-term roadmap. Download the FREE Report: State of Workflow Orchestration → https://gradientflow.com/2022-workflow-orchestration-survey/?utm_source=gradientflow&utm_medium=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
8/11/2022 · 49 minutes, 9 seconds

Machine Learning at Gong

Dr. Omri Allouche is Head of Research at Gong, a company that uses advances in NLP and speech models to identify and highlight risks and opportunities during customer interactions. Download a FREE copy of our recent NLP Industry Survey Results: https://gradientflow.com/2021nlpsurvey/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
8/4/2022 · 36 minutes, 43 seconds

Data Infrastructure for Computer Vision

Danny Bickson and Amir Alush are the creators of fastdup, a very impressive free tool for surfacing duplicates, anomalies, and leakage in visual data. In line with its name, it’s fast: fastdup is written in C++ and can handle millions of images easily. Download a FREE copy of our recent NLP Industry Survey Results: https://gradientflow.com/2021nlpsurvey/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
7/28/2022 · 36 minutes

How DALL·E works

Mark Chen is a Research Scientist at OpenAI and part of the team behind DALL·E 2, a new AI system that can create realistic images and art based on natural language descriptions. Download a FREE copy of our recent NLP Industry Survey Results: https://gradientflow.com/2021nlpsurvey/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
7/21/2022 · 37 minutes, 37 seconds

Scalable, end-to-end machine learning, for everyone

Jules Damji is lead developer advocate and Richard Liaw is an engineering manager at Anyscale, the startup founded by the creators of Ray, the open source project that makes it simple to scale any compute-intensive Python workload. To learn more about Ray and how to scale machine learning applications, attend the Ray Summit (San Francisco / Aug 23-24): https://www.anyscale.com/ray-summit-2022?utm_source=gradientflow&utm_medium=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
7/14/2022 · 46 minutes, 42 seconds

Orchestration and Pipelines for Data Scientists

Rick Lamers is co-founder and CEO of Orchest, the startup behind an open source project that enables data scientists to create, manage, and execute complex end-to-end data pipelines. Download the FREE Report: State of Workflow Orchestration → https://gradientflow.com/2022-workflow-orchestration-survey/?utm_source=gradientflow&utm_medium=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
7/7/2022 · 44 minutes, 15 seconds

Dataframes at scale

Devin Petersohn is CTO and co-founder of Ponder, and the creator of Modin, a fast, scalable, drop-in replacement for the popular pandas library. Download the FREE Report: State of Workflow Orchestration → https://gradientflow.com/2022-workflow-orchestration-survey/?utm_source=gradientflow&utm_medium=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
6/30/2022 · 37 minutes, 15 seconds

Software-Defined Assets

Nick Schrock is founder of Elementl, the startup behind Dagster, a popular open source data orchestration platform. We discussed recent trends in data engineering and infrastructure, and Dagster’s introduction of software-defined assets, a new approach to managing, maintaining, and orchestrating data declaratively. Download the FREE Report: State of Workflow Orchestration → https://gradientflow.com/2022-workflow-orchestration-survey/?utm_source=gradientflow&utm_medium=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
6/23/2022 · 40 minutes, 49 seconds

Adversarial Machine Learning

Edmon Begoli leads the AI Systems R&D section at Oak Ridge National Laboratory (ORNL), where he is also a distinguished member of the ORNL research staff. Our conversation centered on his upcoming presentation at the Data+AI Summit, where he will describe the four principal categories of Adversarial AI and their future implications. Download the FREE Report: Trends in Data, Machine Learning, and AI → https://gradientflow.com/2022trendsreport?utm_source=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
6/16/2022 · 46 minutes, 42 seconds

Orchestrating Machine Learning Applications

Haytham Abuelfutuh is co-founder and CTO of Union, a startup founded by the team behind Flyte, a popular open source project that originated at Lyft. Flyte is a workflow automation platform used in many different applications, especially as an orchestrator for machine learning applications. Download the FREE Report: State of Workflow Orchestration → https://www.prefect.io/lp/gradientflow?utm_source=gradientflow&utm_medium=newsletter Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
6/9/2022 · 47 minutes, 11 seconds

Narrative AI

This week’s guest is Hilary Mason, co-founder of Hidden Door, a startup that uses AI and machine learning to help create and power role-playing games (RPGs). Download a FREE copy of our recent NLP Industry Survey Results: https://gradientflow.com/2021nlpsurvey/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
6/2/2022 · 40 minutes, 38 seconds

Machine Learning Model Observability

Oren Razon is CEO and co-founder of Superwise, a startup that builds tools to streamline observability for machine learning models. This episode provides a comprehensive overview of tools and best practices for deploying, monitoring, and managing machine learning models in production. Download the FREE Report: Trends in Data, Machine Learning, and AI → https://gradientflow.com/2022trendsreport?utm_source=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
5/26/2022 · 39 minutes, 32 seconds

Dataflow Automation

Jeremiah Lowin is co-founder and CEO of Prefect, the company behind the popular open source data workflow orchestration system of the same name. We discussed the major design changes in Prefect 2.0, the move toward treating “code as workflows,” the data engineering challenges facing data and ML teams today, and the implications of looming trends in machine learning and AI. Download the FREE Report: State of Workflow Orchestration → https://www.prefect.io/lp/gradientflow?utm_source=gradientflow&utm_medium=newsletter Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
5/19/2022 · 46 minutes, 55 seconds

Practical Machine Learning and Deep Learning

Sebastian Raschka is lead author of a new book from Packt entitled “Machine Learning with PyTorch and Scikit-Learn.” He is also an Assistant Professor of Statistics at the University of Wisconsin (Madison) and serves as the Lead AI Educator at Grid.ai. Download a FREE copy of our recent NLP Industry Survey Results: https://gradientflow.com/2021nlpsurvey/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
5/12/2022 · 48 minutes, 27 seconds

Machine Learning for Optimization

This week’s guests are Ade Fajemisin (Postdoctoral Researcher) and Donato Maragno (PhD Student) of the University of Amsterdam. They co-authored a recent paper (“Optimization with Constraint Learning: A Framework and Survey”) that explores how machine learning can be used to learn constraints in optimization problems. Download the FREE Report: Trends in Data, Machine Learning, and AI → https://gradientflow.com/2022trendsreport?utm_source=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
5/5/2022 · 26 minutes, 25 seconds

Efficient Scaling of Language Models

This week’s guests are Barret Zoph and Liam Fedus, research scientists at Google Brain. Our conversation centered on large language models (LLMs), specifically recent work by Barret, Liam, and their collaborators on scaling these models efficiently. Download a FREE copy of our recent NLP Industry Survey Results: https://gradientflow.com/2021nlpsurvey/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
4/28/2022 · 27 minutes, 6 seconds

Data Science at Stitch Fix

Olivia Liao is Senior Director of Data Science at Stitch Fix, a company that uses data science and expert stylists to deliver personalization at scale. We discuss how Stitch Fix blends data science and domain expertise, how it tunes recommendations in light of logistics and supply chain constraints, and how it incorporates new developments in large language models, multimodal models, and Responsible AI. Download a FREE copy of our recent NLP Industry Survey Results: https://gradientflow.com/2021nlpsurvey/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
4/21/2022 · 30 minutes, 57 seconds

The 2022 AI Index

Jack Clark is co-director of the AI Index Steering Committee. In this episode, we discuss key findings of the fifth edition of the AI Index. The report uses multiple metrics (benchmarks, publications, patents, legislation, etc.) to track progress in AI (mainly deep learning) in key areas including computer vision, speech recognition, and language models. Download the FREE Report: Trends in Data, Machine Learning, and AI → https://gradientflow.com/2022trendsreport?utm_source=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
4/14/2022 · 45 minutes, 13 seconds

Why You Need A Time-Series Database

This week’s guests are Ajay Kulkarni (CEO) and Mike Freedman (CTO), co-founders of Timescale, the startup behind the popular relational database for time-series data and analytics. Mike is also a Professor of Computer Science at Princeton University. Our conversation took place a few weeks after Timescale raised a massive funding round and achieved unicorn status. Download the FREE Report: 2022 Data Engineering Survey Report → https://gradientflow.com/2022desurvey/?utm_source=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
4/7/2022 · 45 minutes, 47 seconds

Data Science at Shopify

This week’s guest is Wendy Foster, Director of Engineering & Data Science at Shopify. We discussed applications of data science within Shopify, how the company organizes its data teams, the lifecycle of a data science project there, and how Shopify approaches emerging challenges like Responsible AI, large language models, and multimodal models. Download the FREE Report: Trends in Data, Machine Learning, and AI → https://gradientflow.com/2022trendsreport?utm_source=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
3/31/2022 · 35 minutes, 28 seconds

An AI Risk Management Framework

This week’s guests are Elham Tabassi of the National Institute of Standards and Technology (NIST) and Andrew Burt, Managing Partner of BNH.ai, the first law firm focused on AI compliance, risk mitigation, and related topics. We discuss the new NIST “AI Risk Management Framework,” which is intended for voluntary use to manage risks in the design, development, and use of AI products and systems. Download the FREE Report: Trends in Data, Machine Learning, and AI → https://gradientflow.com/2022trendsreport?utm_source=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
3/24/2022 · 30 minutes, 55 seconds

An open source and end-to-end library for causal inference

This week’s guests are Amit Sharma (Principal Researcher) and Emre Kiciman (Senior Principal Researcher) of Microsoft Research. We talk about practical applications of causal inference, a set of tools and techniques that enable data teams to draw causal conclusions from data. Amit and Emre are part of the team behind DoWhy, a new open source library for estimating causal effects from historical data alone. It is particularly useful when we cannot run an experiment because of time, expense, or ethical concerns. Download the FREE Report: Trends in Data, Machine Learning, and AI → https://gradientflow.com/2022trendsreport?utm_source=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
3/17/2022 · 39 minutes, 57 seconds

The Graph Intelligence Stack

Leo Meyerovich is founder and CEO of Graphistry, a startup building tools to democratize visual graph intelligence and graph machine learning. Leo and I recently wrote a well-received post (“What Is Graph Intelligence?”) making the case that companies need to revisit graph analytics and graph intelligence. Download the FREE Report: Trends in Data, Machine Learning, and AI → https://gradientflow.com/2022trendsreport?utm_source=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
3/10/2022 · 37 minutes, 21 seconds

NLP and Language Models in Healthcare and the Life Sciences

This week’s guests are Dia Trambitas-Miron (Head of Product) and David Talby (CTO) of John Snow Labs, the startup behind the popular open source project Spark NLP. The company also offers a suite of products, including an NLP platform targeted specifically at the healthcare, pharmaceutical, and biotech sectors. Download a FREE copy of our recent NLP Industry Survey Results: https://gradientflow.com/2021nlpsurvey/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
3/3/2022 · 37 minutes, 44 seconds

Delivering Continuous Intelligence at Scale

Simon Crosby is CTO of Swim.ai, a startup building tools (based on the Swim open source project) for next-generation data and AI applications. Swim is one of several projects (along with Ray and Akka) fueling interest in the Actor Model for building large-scale machine learning and data applications and infrastructure. Download the FREE Report: Trends in Data, Machine Learning, and AI → https://gradientflow.com/2022trendsreport?utm_source=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
2/24/2022 · 31 minutes, 23 seconds

Imperceptible NLP Attacks

Nicholas Boucher is a PhD candidate at Cambridge University, where he focuses on security topics such as homomorphic encryption, voting systems, and adversarial machine learning. He is the lead author of a fascinating new paper, “Bad Characters: Imperceptible NLP Attacks,” which provides a taxonomy of attacks against text-based NLP models that are based on Unicode and other encoding systems. Download a FREE copy of our recent NLP Industry Survey Results: https://gradientflow.com/2021nlpsurvey/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
2/17/2022 · 44 minutes, 53 seconds

Evolving Data Science Training Programs

This week’s guest is Anjali Samani, Director of Data Science and Data Intelligence at Salesforce. We first met during the early days of Faculty, one of the leading data science and AI startups in Europe. Anjali helped design and lead the early Fellowship programs at Faculty (intensive bootcamps that turn STEM PhDs into industrial data scientists). Download the FREE Report: Trends in Data, Machine Learning, and AI → https://gradientflow.com/2022trendsreport?utm_source=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
2/10/2022 · 33 minutes, 52 seconds

Building Machine Learning Infrastructure at Netflix and Beyond

Savin Goyal is CTO and co-founder of Outerbounds, a startup building infrastructure to help teams streamline how they build machine learning applications. Prior to starting Outerbounds, Savin and his team worked at Netflix, where they were instrumental in the creation and release of Metaflow, an open source Python framework that addresses some of the challenges data scientists face around scalability and version control. Download the FREE Report: Trends in Data, Machine Learning, and AI → https://gradientflow.com/2022trendsreport?utm_source=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
2/3/2022 · 35 minutes, 6 seconds

Democratizing NLP

Moshe Wasserblat is a Senior Principal Engineer at Intel, where he serves as a Research Manager focused on NLP and deep learning. Download a FREE copy of our recent NLP Industry Survey Results: https://gradientflow.com/2021nlpsurvey/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
1/27/2022 · 43 minutes, 32 seconds

Machine Learning at Discord

Gaurav Chakravorty is a Senior Manager at Discord, where he leads the team responsible for machine learning models for search and notifications. Prior to Discord, Gaurav was a manager at Google, where he led the team responsible for personalized podcast recommendations. Download the FREE Report: Trends in Data, Machine Learning, and AI → https://gradientflow.com/2022trendsreport?utm_source=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
1/20/2022 · 40 minutes, 14 seconds

Applications of Knowledge Graphs

This week's guest is Mike Tung, founder and CEO of Diffbot, a startup that crawls the web and offers one of the most comprehensive knowledge graphs, accessible through a variety of simple interfaces. Detailed show notes can be found on The Data Exchange web site. Download the FREE Report: Trends in Data, Machine Learning, and AI → https://gradientflow.com/2022trendsreport?utm_source=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS.
1/13/2022 · 39 minutes, 48 seconds

Key AI and Data Trends for 2022

In this episode of the Data Exchange, our special correspondent and managing editor Jenn Webb organized a mini-panel composed of me and my podcast co-organizer Mikio Braun. This conversation took place as we were assembling our list of trends for 2022. Download the FREE Report: Trends in Data, Machine Learning, and AI → https://gradientflow.com/2022trendsreport?utm_source=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS.
1/6/2022 · 36 minutes, 49 seconds

Large Language Models

This episode features conversations with two experts who have helped train and release models that can recognize, predict, and generate human language on the basis of very large text-based data sets. First is an excerpt of my conversation with Connor Leahy, AI Researcher at Aleph Alpha GmbH and a founding member of EleutherAI (pronounced “ee-luther”), a collective of researchers and engineers building resources and models for researchers who work on natural language models. Next up is an excerpt from a recent conversation with Yoav Shoham, co-founder of AI21 Labs, creators of the largest language model available to developers. Download a FREE copy of our recent NLP Industry Survey Results: https://gradientflow.com/2021nlpsurvey/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
12/30/2021 · 41 minutes, 13 seconds

Data and Machine Learning Platforms at Shopify

Azeem Ahmed is Director of Engineering at Shopify, where he leads the team that builds the primitives and APIs used by all data scientists, machine learning engineers, and members of Shopify's engineering team. Our conversation focused on the evolution and design of data and machine learning platforms within Shopify. Azeem and I also discussed broader trends, including the rise of modern data platforms and the maturation of data lakehouses. Download a FREE copy of our recent NLP Industry Survey Results: https://gradientflow.com/2021nlpsurvey/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
12/23/2021 · 43 minutes, 34 seconds

What is AI Engineering?

Christopher Nguyen is CEO and co-founder of Aitomatic, a startup building a platform for Industrial AI applications. Christopher previously held executive and leadership roles at organizations tasked with building machine learning solutions for traditional enterprises. Our conversation centered on what Christopher terms AI Engineering: a new discipline concerned with the qualitative and quantitative design, construction, and operation of systems with artificial intelligence capabilities. Download a FREE copy of our recent Data Engineering Survey Results: https://gradientflow.com/2022desurvey Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
12/16/2021 · 32 minutes, 48 seconds

NLP and AI in Financial Services

This week’s guest is Anshul Pandey, CTO and co-founder of Accern, a startup helping financial services companies build and deploy AI applications via a no-code platform. Our conversation focused on the specific challenges of building AI and NLP applications within financial services. Download a FREE copy of our recent NLP Industry Survey Results: https://gradientflow.com/2021nlpsurvey/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
12/9/2021 · 46 minutes, 24 seconds

Modern Experimentation Platforms

Che Sharma is the founder and CEO of Eppo, an experimentation framework that integrates with modern data platforms (cloud lakehouses and cloud data warehouses). We discuss the importance of investing in experimentation tools and the power of having a well-oiled experimentation culture within an organization. Che also explains how modern data platforms enable experimentation frameworks such as Eppo. Download a FREE copy of our recent Data Engineering Survey Results: https://gradientflow.com/2022desurvey Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
12/2/2021 · 53 minutes, 55 seconds

Reinforcement Learning in Real-World Applications

Happy Thanksgiving to listeners who celebrate it! This episode features conversations with two experts who have been applying reinforcement learning to problems in industry. First is an excerpt of my conversation with  Nicolas (Nic) Hohn, Chief Data Scientist, McKinsey/QuantumBlack Australia. Nic led a team of data scientists charged with helping America’s Cup winning team,  Emirates Team New Zealand, test new designs for hydrofoils – important sailing boat components that could be modified based on rules set forth by race organizers.  I also include an excerpt of a conversation with Max Pumperla, Data Science Professor at IU International University of Applied Sciences, who at the time of our conversation, was also the Head of Product Research at Pathmind, a SaaS that helps businesses use reinforcement learning in real-world applications.Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS.Detailed show notes can be found on The Data Exchange web site.Subscribe to The Gradient Flow Newsletter.
11/24/2021 • 37 minutes, 5 seconds

MLOps Anti-Patterns

This week’s guest is Nikhil Muralidhar, a Graduate Research Assistant at Virginia Tech College of Engineering. He is the lead author of an excellent survey paper entitled “Using AntiPatterns to avoid MLOps Mistakes”. Download a FREE copy of our recent Data Engineering Survey Results: https://gradientflow.com/2022desurvey Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
11/18/2021 • 37 minutes, 10 seconds

Why You Need a Modern Metadata Platform

Pardhu Gunnam (CEO) and Mars Lan (CTO) are co-founders of Metaphor Data, creators of the first Modern Metadata Platform. As we noted in a previous post, a metadata fabric is the right foundation for data governance and data discovery solutions, data catalogs, and other enterprise data services. This insight led several technology companies to build their own metadata systems a few years ago. In fact, the team at Metaphor created one of the more popular systems, DataHub, while they were at LinkedIn. The video version has a detailed table of contents: https://www.youtube.com/watch?v=W8ZJHN77Ieg Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
11/11/2021 • 46 minutes, 5 seconds

Making Large Language Models Smarter

This week’s guest is Yoav Shoham, co-founder of AI21 Labs, creators of the largest language model available to developers. Yoav is also a Professor Emeritus of Computer Science at Stanford University, and a serial entrepreneur who has co-founded numerous data and AI startups. Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
11/4/2021 • 38 minutes, 42 seconds

AI Begins With Data Quality

This week’s guest is Jeremy Stanley, co-founder and CTO of Anomalo, a startup building SaaS tools to help companies with data quality. Prior to Anomalo, Jeremy was VP of Data Science at Instacart. Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
10/28/2021 • 43 minutes, 23 seconds

Modernizing Data Integration

This week’s guest is Michel Tricot, co-founder and CEO of Airbyte, the startup behind the popular open source project of the same name. While still a relatively young open source project, Airbyte has emerged as a favorite among data and platform engineers tasked with building and maintaining data integration systems within companies. Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
10/21/2021 • 37 minutes, 44 seconds

Deploying Machine Learning Models Safely and Systematically

This week’s guest is Hamel Husain, Staff Machine Learning Engineer at GitHub and a core developer for fastai. Prior to GitHub, Hamel worked on machine learning applications and systems at Airbnb and DataRobot. Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
10/14/2021 • 41 minutes, 32 seconds

Large-scale machine learning and AI on multi-modal data

This week’s guest is Bob Friday, VP and CTO at Mist Systems, a Juniper company. Bob is a serial entrepreneur and seasoned technologist, and at Mist his team uses data technologies, machine learning, and AI to “optimize user experiences and simplify operations across the wireless access, wired access, and SD-WAN domains”. Bob and his team build models from structured, semi-structured, and unstructured data. They have deployed anomaly detection models that rely on deep learning (LSTMs) and have begun exploring the use of graph neural networks for a variety of use cases. They have also built and deployed systems that use recent advances in natural language models. Their virtual assistant provides insight and guidance to IT staff via a natural language conversational interface. Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
10/7/2021 • 32 minutes, 5 seconds

Machine Learning in Astronomy and Physics

This week’s guest is Dr. Viviana Acquaviva, Associate Professor in the Physics Department at the CUNY NYC College of Technology and at the CUNY Graduate Center. She is an astrophysicist with a strong interest in data science and machine learning. Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
9/30/2021 • 40 minutes, 48 seconds

The Unreasonable Effectiveness of Multiple Dispatch

This week I have my annual check-in on the state of Julia with Viral Shah, Co-founder and CEO of Julia Computing. Since we spoke last year, Julia has continued to make inroads and grow its user base, and Julia Computing closed its $24M Series A round in July. Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
9/23/2021 • 51 minutes, 18 seconds

How To Lead In Data Science

This week our special correspondent and editor Jenn Webb and I speak with Jike Chong and Cathy Chang, executives and seasoned leaders of data science teams. Our conversation focuses on their new book “How to Lead in Data Science”. Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
9/16/2021 • 43 minutes, 54 seconds

Why interest in graph databases and graph analytics are growing

In this episode of the Data Exchange, our special correspondent and editor Jenn Webb organized a mini-panel composed of myself and Paco Nathan, author, teacher, and founder of Derwen.ai, a boutique consulting firm specializing in data, machine learning, and AI. Lately, Paco has been doing a lot of work with graphs, and as such he’s had to immerse himself in the world of graph data management technologies. This conversation focuses on what’s new with graph databases and why there has been a resurgence of interest in them. We also discuss use cases of graph databases, graph analytics, and graph neural networks. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
9/9/2021 • 53 minutes, 18 seconds

The State of Data Journalism

This week our special correspondent and editor Jenn Webb and I speak with Tara Kelly, Data Editor at DataJournalism.com (DJC), an organization created by the European Journalism Centre. DJC provides journalists and media groups with free resources, materials, online video courses, and community forums. Most recently, they created two free e-books: The Verification Handbook and an updated edition of the Data Journalism Handbook. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
9/2/2021 • 53 minutes, 31 seconds

Auditing machine learning models for discrimination, bias, and other risks

This week’s guests are Rayid Ghani, Distinguished Career Professor in the Machine Learning Department and the Heinz College of Information Systems and Public Policy at Carnegie Mellon University, and Andrew Burt, co-founder and Managing Partner of BNH.ai, a new law firm focused on AI compliance, risk mitigation, and related topics. BNH is the first law firm run by lawyers and technologists focused on helping companies identify and mitigate risks associated with machine learning and AI. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
8/26/2021 • 52 minutes, 13 seconds

An oscilloscope for deep learning

This week’s guest is Charles Martin, independent researcher and founder of Calculation Consulting, a boutique consultancy focused on data science and machine learning. Along with Michael Mahoney and Serena Peng, Charles is co-author of a recent Nature Communications paper on new methods for evaluating and tuning deep learning models (“Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data”). Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
8/19/2021 • 49 minutes, 57 seconds

What’s new in data engineering

This week our special correspondent and editor Jenn Webb organized a mini-panel composed of myself and Jesse Anderson, Managing Director at the Big Data Institute. Jesse is the author of a recent book entitled “Data Teams: A Unified Management Model for Successful Data-Focused Teams”. This conversation focused on key areas in data engineering. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
8/12/2021 • 36 minutes, 45 seconds

The evolution of the data science role and of data science tools

This week our managing editor Jenn Webb and I speak with Sean Taylor, Data Science Manager at Lyft. Sean was previously a research scientist and manager at Facebook, where he was instrumental in the creation and release of Prophet, a popular open source library for time-series forecasting. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
8/5/2021 • 50 minutes, 15 seconds

Data Augmentation in Natural Language Processing

This week’s guests are Steven Feng, Graduate Student, and Ed Hovy, Research Professor, both from the Language Technologies Institute of Carnegie Mellon University. We discussed their recent survey paper on Data Augmentation Approaches in NLP (GitHub). Data augmentation is an active field of research on techniques for increasing the diversity of training examples without explicitly collecting new data. One key reason such strategies are important is that augmented data can act as a regularizer to reduce overfitting when training models. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
7/29/2021 • 51 minutes, 44 seconds

Storage Technologies for a Multi-cloud World

This week’s guest is Brad King, CTO of Scality, a company that builds software-defined file and object storage systems for hybrid and multi-cloud settings. Storage and compute are the basic building blocks of (cloud) computing platforms, and this episode highlights the key considerations and recent innovations in storage technologies that data engineers, architects, and machine learning professionals need to know. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
7/22/2021 • 42 minutes, 43 seconds

Building a next-generation dataflow orchestration and automation system

In this episode, our managing editor Jenn Webb and I speak with Chris White, CTO of Prefect, a startup building tools to help companies build, monitor, and manage dataflows. Prefect originated from lessons Chris and his co-founder learned while they were at Capital One, where they were early users of, and contributors to, related projects like Apache Airflow. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
7/15/2021 • 48 minutes, 36 seconds

Building a flexible, intuitive, and fast forecasting library

This week’s guests are Reza Hosseini, Staff Software Engineer, and Albert Chen, Staff Data Scientist, both at LinkedIn. Reza and Albert are part of the team behind the new open source library Greykite, a flexible and fast library for time-series forecasting. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
7/8/2021 • 44 minutes, 14 seconds

Neural Models for Tabular Data

This week’s guest is Sercan Arik, Research Scientist at Google Cloud AI. Sercan and his collaborators recently published a paper on TabNet, a deep neural network architecture for tabular data. TabNet uses sequential attention to select features and is explainable, and in tests Sercan and his team ran spanning many domains, it outperforms or is on par with other models (e.g., XGBoost) on classification and regression problems. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
7/1/2021 • 43 minutes, 55 seconds

Training and Sharing Large Language Models

This week’s guest is Connor Leahy, AI Researcher at Aleph Alpha GmbH and a founding member of EleutherAI (pronounced “ee-luther”), a collective of researchers and engineers building resources and models for researchers who work on natural language models. As NLP research becomes more computationally demanding and data intensive, researchers need to work together to develop tools and resources for the broader community. While relatively new, EleutherAI has already released models and data that many researchers are benefiting from. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
6/24/2021 • 50 minutes, 53 seconds

Questioning the Efficacy of Neural Recommendation Systems

This week’s guests are leading researchers in recommendation systems: Paolo Cremonesi is Professor of Computer Science, and Maurizio Ferrari Dacrema is a Postdoc at Politecnico di Milano, where they are both part of the RecSys research group. Paolo is also the Reproducibility co-chair for the upcoming RecSys Conference. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
6/17/2021 • 59 minutes, 24 seconds

Automation in Data Management and Data Labeling

This week’s guest is Hyun Kim, co-founder and CEO of Superb AI, a startup building tools to help companies manage data across the entire machine learning application lifecycle. This includes tools to label, store, and monitor the data assets that power computer vision applications. We also discussed emerging trends in machine learning and AI, including synthetic data, reinforcement learning, and self-supervised learning. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
6/10/2021 • 43 minutes, 50 seconds

Reinforcement Learning For the Win

This week’s guest is Nicolas (Nic) Hohn, Chief Data Scientist, McKinsey/QuantumBlack Australia. Nic led a team of data scientists charged with helping the America’s Cup-winning team, Emirates Team New Zealand, test new designs for hydrofoils, important sailing boat components that could be modified based on rules set forth by race organizers. More precisely, the QuantumBlack team used Ray RLlib to design an AI agent that could learn to sail the boat at optimal speed for a given design, and this AI agent proved crucial during the design process. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
6/3/2021 • 48 minutes, 15 seconds

How Companies Are Investing in AI Risk and Liability Minimization

In this episode of the Data Exchange I speak with Andrew Burt, co-founder and Managing Partner of BNH.ai, a new law firm focused on AI compliance, risk mitigation, and related topics. BNH is the first law firm run by lawyers and technologists focused on helping companies identify and mitigate risks associated with machine learning and AI. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
5/27/2021 • 41 minutes, 57 seconds

The Future of Machine Learning Lies in Better Abstractions

This week’s guest is Travis Addair, who previously led the team at Uber responsible for building Uber’s deep learning infrastructure. Travis is deeply involved with two popular open source projects related to deep learning: he is a maintainer of Horovod, a distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet, and a co-maintainer of Ludwig, a toolbox that allows users to train and test deep learning models without the need to write code. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
5/20/2021 • 48 minutes, 53 seconds

Why You Should Optimize Your Deep Learning Inference Platform

In this episode of the Data Exchange, I speak with Yonatan Geifman, CEO and co-founder of Deci, as well as with Ran El-Yaniv, Chief Scientist and co-founder of Deci and Professor of Computer Science at Technion. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
5/13/2021 • 41 minutes, 37 seconds

AI Beyond Automation

In this episode of the Data Exchange, our special correspondent and managing editor Jenn Webb organized a mini-panel composed of myself and Jerry Overton, who previously served as DXC Fellow and Head of AI at DXC Technology. We discussed Jerry’s experience helping companies across many industries adopt data science and machine learning. We spoke about Centers of Excellence for AI, automation in the workforce, human-centered and responsible AI, and cyborgs! Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
5/6/2021 • 43 minutes, 18 seconds

Injecting Software Engineering Practices and Rigor into Data Governance

As the amount and importance of data grows within organizations, there is growing interest in tools that enable them to strategically utilize, manage, and unlock their data resources. This week’s guest is Steven (Steve) Touw, cofounder and CTO of Immuta, a startup that builds tools that help companies address data governance, data discovery, data privacy and security. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
4/29/2021 • 42 minutes, 37 seconds

Building a data store for unstructured data and deep learning applications

In this episode of the Data Exchange, I speak with Davit Buniatyan, founder and CEO of ActiveLoop, a startup building data management tools for unstructured data types commonly associated with deep learning. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
4/22/2021 • 35 minutes, 33 seconds

How Technology Companies Are Using Ray

In this episode of the Data Exchange, I speak with Zhe Zhang, Engineering Manager at Anyscale, where he leads the team that works on Ray and its ecosystem of libraries and partners. Ray is an open source, general-purpose framework for building distributed applications (more details in this post and video). Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
4/15/2021 • 35 minutes, 26 seconds

Data quality is key to great AI products and services

In this episode of the Data Exchange, I speak with Abe Gong, CEO and co-founder at Superconductive, a startup founded by the team behind the Great Expectations (GE) open source project. GE is one of a growing number of tools aimed at improving data quality through validation and testing. Other projects in this area include TensorFlow DV, assertr, dataframe-rules-engine, deequ, data-describe, and Apache Griffin. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
4/8/2021 • 41 minutes, 1 second

Machine Learning in Healthcare

In this episode of the Data Exchange, I speak with Parisa Rashidi, Associate Professor in the Department of Biomedical Engineering at the University of Florida. Parisa is a computer scientist and machine learning researcher who specializes in applications of ML to healthcare and biomedical domains. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
4/1/2021 • 43 minutes, 9 seconds

Measuring the Impact of AI and Machine Learning Research

In this episode of the Data Exchange, our special correspondent and managing editor Jenn Webb organized a mini-panel composed of myself and Simon Rodriguez, Data Research Assistant at the Center for Security and Emerging Technology (CSET) at Georgetown University. Through a series of reports and data briefs, CSET provides policymakers with data-rich material to inform and guide public policy. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
3/25/2021 • 40 minutes, 43 seconds

The Mathematics of Data Integration and Data Quality

In this episode of the Data Exchange, I speak with Ryan Wisnesky, CTO and co-founder of Conexus, a startup that incorporates techniques from mathematics into novel tools for data integration, data management, and knowledge management. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
3/18/2021 • 44 minutes

Pricing Data Products

In this episode of the Data Exchange, I speak with Jian Pei, Professor, School of Computing Science, Simon Fraser University. His research spans data science, big data, data mining, and database systems, but in this podcast we talk about tools for estimating the economic value of data. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
3/11/2021 • 46 minutes, 24 seconds

Challenges, Opportunities, and Trends in EdTech

In this episode of the Data Exchange, our special correspondent and managing editor Jenn Webb and I speak with Sharon Zhou, a PhD student in Computer Science at Stanford University. Sharon has been teaching very popular courses on GANs (generative adversarial networks) on Coursera. In this conversation we examine the state of Education Technology (EdTech), learning platforms, and other tools for teaching online. A year into the global pandemic, we discuss advantages and disadvantages of various technologies for delivering classes, as well as broader issues in education. We also took the opportunity to discuss Sharon’s work on deep learning, including her work using GANs to help the general public and policymakers better understand the implications of climate change. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
3/4/2021 • 51 minutes, 32 seconds

Towards Simple, Interpretable, and Trustworthy AI

In this episode of the Data Exchange I speak with Sheldon Fernandez, CEO at DarwinAI, and Alex Wong, Professor at the University of Waterloo and co-founder of DarwinAI (where he is Chief Scientist) and Euclid Labs. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
2/25/2021 • 41 minutes, 42 seconds

The Rise of Metadata Management Systems

In this episode of the Data Exchange, our special correspondent and managing editor Jenn Webb organized a mini-panel composed of myself and Assaf Araki, investment manager at Intel Capital. Assaf and I have written a series of articles, and this interview took place shortly before the release of our most recent collaboration: The Growing Importance of Metadata Management Systems. We devote this episode to how metadata management will impact many enterprise data systems. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
2/18/2021 • 30 minutes, 19 seconds

Tools for building robust, state-of-the-art machine learning models

In this episode of the Data Exchange I speak with Michael Mahoney, a researcher at UC Berkeley’s RISELab, ICSI, and Department of Statistics. Mike and his collaborators were recently awarded one of the best paper awards at NeurIPS 2020, one of the leading research conferences in machine learning. Subscribe: Apple, Android, Spotify, Stitcher, Google, and RSS. Download the 2021 Trends Report (Data, Machine Learning, AI) to learn about emerging technologies for data management, data engineering, machine learning, and AI. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
2/11/2021 • 42 minutes, 26 seconds

Creating Master Data at Scale with AI

In this episode of the Data Exchange, our special correspondent and managing editor Jenn Webb organized a mini-panel composed of myself and Sonal Goyal, founder of Aficx, a startup that builds solutions to unify data silos for cross-selling and upselling, fraud and risk management, and compliance and regulatory reporting. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
2/4/2021 • 38 minutes, 11 seconds

Bringing AI and computing closer to data sources

In this episode of the Data Exchange I speak with Bruno Fernandez-Ruiz, CTO and cofounder of Nexar, Inc., a startup that uses dash cams powered by vision-based applications to improve driving and logistics. Subscribe: Apple, Android, Spotify, Stitcher, Google, and RSS. Download the 2021 Trends Report (Data, Machine Learning, AI) to learn about emerging technologies for data management, data engineering, machine learning, and AI. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
1/28/2021 • 48 minutes, 43 seconds

Deep Learning in the Sciences

In this episode of the Data Exchange I speak with Bharath (“Bart”) Ramsundar, author and open source developer. While in graduate school, Bart created DeepChem, an open source project that aims to democratize deep learning for science. Subscribe: Apple, Android, Spotify, Stitcher, Google, and RSS. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
1/21/2021 • 39 minutes, 47 seconds

Taking business intelligence and analyst tools to the next level

In this episode of the Data Exchange I speak with Ira Cohen, co-founder and Chief Data Scientist at Anodot, a startup that uses AI for business monitoring. Subscribe: Apple, Android, Spotify, Stitcher, Google, and RSS. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
1/14/2021 • 48 minutes, 16 seconds

Data exchanges and their applications in healthcare and the life sciences

In this episode of the Data Exchange I speak with Omer Dror, CEO and co-founder of Lynx.md, a startup that enables data exchanges and markets in the health and life sciences. Data exchanges match data providers and suppliers with data buyers and users. Subscribe: Apple, Android, Spotify, Stitcher, Google, and RSS. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
1/7/2021 • 51 minutes, 51 seconds

Key AI and Data Trends for 2021

In this episode of the Data Exchange, our special correspondent and editor Jenn Webb organized a mini-panel composed of myself and my podcast co-organizer Mikio Braun. We began our conversation by taking a look back at some of our predictions from last year, which included applications of reinforcement learning, end-to-end machine learning platforms, and more. This year we organized trends into the following categories: tools for building and managing machine learning and AI applications; foundational data technologies; (cloud) computing and hardware; and emerging trends in AI. This episode provides a sneak peek at a formal report that comes out in early 2021. Sign up here and we will send you a copy of our 2021 Trends Report as soon as it comes out. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
12/31/2020 • 52 minutes, 32 seconds

A Unified Management Model for Successful Data-Focused Teams

In this episode of the Data Exchange, our special correspondent and editor Jenn Webb organized a mini-panel composed of myself and Jesse Anderson, Managing Director at the Big Data Institute. Jesse is the author of a recent book entitled “Data Teams: A Unified Management Model for Successful Data-Focused Teams”. Subscribe: Apple, Android, Spotify, Stitcher, Google, and RSS. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
12/24/2020 • 47 minutes, 58 seconds

Security and privacy for the disoriented

In this episode of the Data Exchange I speak with Dan Geer, Senior Fellow at In-Q-Tel, and Andrew Burt, co-founder and Managing Partner of BNH.ai and Chief Legal Officer at Immuta. Dan is one of the leading experts in cybersecurity and risk management, and he has written numerous influential essays on security, privacy, and risk (examples here and here). Andrew serves as co-founder of a new law firm focused on AI compliance and related topics. BNH is the first law firm run by lawyers and technologists focused on helping companies identify and mitigate risks associated with machine learning and AI. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
12/17/2020 • 46 minutes, 34 seconds

The State of Responsible AI

In this episode of the Data Exchange I speak with Dr. Rumman Chowdhury, founder of Parity, a startup building products and services to help companies build and deploy ethical and responsible AI. Prior to starting Parity, Rumman was Global Lead for Responsible AI at Accenture Applied Intelligence. Subscribe: Apple, Android, Spotify, Stitcher, Google, and RSS. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
12/10/2020 • 38 minutes, 55 seconds

Improving the robustness of natural language applications

In this episode of the Data Exchange I speak with Jack Morris, a member of Google’s AI Residency program. He is co-creator of TextAttack, an open source framework for adversarial attacks, data augmentation, and adversarial training in NLP (paper, code). Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
12/3/2020 • 37 minutes, 35 seconds

End-to-end deep learning models for speech applications

In this episode of the Data Exchange I speak with Yishay Carmiel, an AI Leader at Avaya, a company focused on digital communications. He has long been immersed in speech technologies and conversational applications, and I have frequently used him as a resource to understand the latest in speech systems. We previously co-wrote an article that listed recommendations for teams building speech applications. We also had a previous conversation on the impact of deep learning and big data on speech technologies. Subscribe: Apple, Android, Spotify, Stitcher, Google, and RSS. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
11/26/2020 • 43 minutes, 48 seconds

Securing machine learning applications

In this episode of the Data Exchange I speak with Ram Shankar, a Berkman Klein Center affiliate, and a researcher and engineer who works at the intersection of Machine Learning and Security. This episode is focused on the current state of tools and techniques for securing machine learning applications. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
11/19/2020 • 45 minutes, 40 seconds

Testing Natural Language Models

In this episode of the Data Exchange I speak with Marco Ribeiro, Senior Researcher at Microsoft Research and lead author of the award-winning paper “Beyond Accuracy: Behavioral Testing of NLP models with CheckList”. As machine learning gains importance across many application domains and industries, there is a growing need to formalize how ML models get built, deployed, and used. MLOps is an emerging set of practices, drawing on ideas from CI/CD, that is focused on productionizing the machine learning lifecycle. But even before we talk about deploying a model to production, how do we inject more rigor into the model development process? Subscribe: Apple, Android, Spotify, Stitcher, Google, and RSS. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
11/12/2020 • 30 minutes, 9 seconds

Detecting Fake News

Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. In this episode of the Data Exchange I speak with Xinyi Zhou, a graduate student in Computer and Information Science at Syracuse University. Xinyi and her advisor (Reza Zafarani) recently wrote a comprehensive survey paper entitled “A Survey of Fake News: Fundamental Theories, Detection Methods, and Opportunities”. They set out to organize the many different methods and perspectives used to detect fake news. Their paper is a great resource for anyone wanting to understand the strengths and limitations of various state-of-the-art techniques and to get a feel for where the research community might be headed in the near future. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
11/5/2020 • 32 minutes, 46 seconds

The Computational Limits of Deep Learning

Subscribe: Apple, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange I speak with Neil Thompson, Research Scientist at the Computer Science and Artificial Intelligence Lab (CSAIL) and the Initiative on the Digital Economy, both at MIT. I wanted Neil on the podcast to discuss a recent paper he co-wrote entitled “The Computational Limits of Deep Learning” (summary version here). This paper provides estimates of the computation, economic costs, and environmental impact that come with increasingly large and more accurate deep learning models. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
10/29/2020 • 43 minutes, 4 seconds

Making deep learning accessible

Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange I speak with Piero Molino, creator of Ludwig, a toolbox that allows users to train and test deep learning models through a declarative interface. Piero created Ludwig while serving as a Senior Research Scientist at Uber AI. He originally created Ludwig for his personal use, and it slowly garnered users within Uber. When it was open sourced in early 2019, the project immediately found a receptive audience at the conferences I was chairing at the time. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
10/22/2020 • 46 minutes, 56 seconds

Building and deploying knowledge graphs

Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange I speak with Mayank Kejriwal, a Research Assistant Professor in the Department of Industrial and Systems Engineering and a Research Lead at the USC Information Sciences Institute. The focus of our conversation is knowledge graphs: collections of linked entities (objects, events, concepts) used in many AI applications. For example, Google uses a knowledge graph to enhance its search engine results with infoboxes that appear in some search results. Other areas where knowledge graphs are common include e-commerce, healthcare, and financial services. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
10/15/2020 • 49 minutes, 30 seconds

Financial Time Series Forecasting with Deep Learning

Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange I speak with Murat Özbayoğlu, Chair of Artificial Intelligence Engineering at TOBB University of Economics and Technology in Ankara, Turkey. I wanted Murat on to discuss two survey papers he and his colleagues wrote on the use of deep learning in finance. I’ve long been fascinated with finance and trading. My first job after I left academia was as the lead quant in a hedge fund, and ever since, I’ve tried to stay abreast of the tools and techniques quants and data scientists in finance are using. Forecasting in this setting usually means price prediction or price movement (trend) prediction, and the outputs of forecasting models are used to inform investment decisions. What makes finance particularly challenging is that many people are using the same underlying data (time series of prices/values); thus, as Murat notes, many firms use alternative data sources (such as text) as potential sources of additional signal. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
10/8/2020 • 37 minutes, 10 seconds

A programming language for scientific machine learning and differentiable programming

Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange I speak with Viral Shah, co-founder and CEO of Julia Computing. Along with his Julia language co-creators, Viral was awarded the 2019 Wilkinson Prize for outstanding contributions in the field of numerical software. I first tweeted about Julia at the beginning of March 2012 after seeing Jeff Bezanson give a talk at Stanford. I’ve dabbled with it here and there, but have never used it for a major project. Over the past few years, Julia has continued to add packages at a steady pace, and the package manager is quite impressive and solid. We spent much of the podcast discussing the state of Julia, Julia 1.5, and the Julia ecosystem and community. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
10/1/2020 • 50 minutes, 17 seconds

Using machine learning to modernize medical triage and monitoring systems

Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange I speak with Kira Radinsky, Chairwoman & Chief Technology Officer at Diagnostic Robotics, a startup using AI to build a medical-grade triage and clinical-predictions platform. She is also a visiting professor at Technion – Israel Institute of Technology. Kira has extensive experience using data science and machine learning in a variety of settings, and she was one of the pioneers in using alternative data sources to augment forecasting models. Her earlier work includes models to predict social unrest as well as disease outbreaks. The global pandemic has increased the need for experts in medical data mining, a field to which Kira has made many significant contributions. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
9/24/2020 • 32 minutes, 55 seconds

Connecting Reinforcement Learning to Simulation Software

Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange I speak with Max Pumperla, deep learning engineer at Pathmind and a contributor to many open source projects in data science and machine learning. Max is speaking on applications of reinforcement learning to simulation problems at the upcoming Ray Summit, a free virtual conference scheduled for Sep 30th and Oct 1st. Earlier this year I had Pathmind’s CEO Chris Nicholson on this podcast, and he described how reinforcement learning might play a role in simulation problems. In this episode, Max provides an update and a technical description of how Pathmind uses reinforcement learning, RLlib, and Tune to help users of AnyLogic, widely used simulation software for business applications. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
9/17/2020 • 52 minutes, 47 seconds

Using machine learning to detect shifts in government policy

Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange I speak with Weifeng Zhong, Senior Research Fellow at the Mercatus Center at George Mason University. He is the core maintainer of the open source Policy Change Index (PCI), a framework that uses machine learning and NLP to “process and read” large amounts of text to discern government priorities and policies. The initial PCI is focused on major policy shifts in China and uses NLP and machine learning to process and analyze the People’s Daily. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
9/10/2020 • 43 minutes, 8 seconds

What is AI Assurance?

Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange I speak with Ofer Razon, co-founder & CEO at Superwise, a startup focused on building tools that help companies gain more visibility into, and control over, machine learning models in production. Ofer and Superwise are part of a group in the early stage of building tools and best practices for scaling AI operations. The goal is to give multiple stakeholders the solutions they need to evaluate, validate, and observe models, receive alerts and troubleshoot in time, and gather insights for greater efficiency. AI assurance will ultimately bring together different parts of an organization, including business, data science and operational teams, legal and compliance, and privacy and security. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
9/3/2020 • 38 minutes, 5 seconds

Best practices for building conversational AI applications

Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange I speak with Alan Nichol, co-founder and CTO of Rasa, the startup behind the popular open source framework for building conversational AI applications. I had Alan on as a guest on my old podcast, and that conversation focused on components of Rasa and of chatbot applications. This time around we talked about the state of developer tools, as well as software engineering best practices for building conversational AI applications. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
8/27/2020 • 43 minutes, 55 seconds

Tools for scaling machine learning

Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange, our special correspondent and editor Jenn Webb organized a mini-panel composed of myself and Paco Nathan, author, teacher, and founder of Derwen.ai, a boutique consulting firm specializing in data, machine learning (ML), and AI. We began by discussing tools for scaling machine learning. Paco and I have been impressed with the growth in the number of libraries being built on top of Ray, as well as the variety of use cases being addressed by Ray. We then discussed the upcoming Ray Summit, a free virtual conference featuring over 50 talks on machine learning, Python, serverless, and cloud native technologies. We also looked back at the first eight months of this podcast (here’s an archive of previous episodes). Both Paco and Jenn were instrumental in getting this podcast started, and I wanted to mark crossing the 30-episode threshold with a short retrospective. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
8/20/2020 • 39 minutes, 6 seconds

From Python beginner to seasoned software engineer

Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange I speak with Joel Grus, Principal Engineer at the Capital Group. He previously served as a Senior Research Engineer at the Allen Institute for AI, where he was a core engineer on AllenNLP, a PyTorch-based library for NLP research. Joel is also the author of one of the most widely read books in data science, Data Science from Scratch. Joel has a new book which I highly recommend: Ten Essays on Fizz Buzz. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
8/13/2020 • 49 minutes, 20 seconds

Assessing Models and Simulations of Epidemic Infectious Diseases

Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange I bring back Bruno Gonçalves, a data scientist working at the intersection of Data Science and Finance. Bruno was a guest on this podcast in April, when COVID-19 cases were spiking in his home base of NYC. Prior to shifting over to data science, he spent several years as a researcher focused on mathematical models in epidemiology, a field with a rich history dating as far back as the 1920s. I wanted to bring him back to get an update on the mathematical models being used to model the global pandemic. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
8/6/2020 • 43 minutes, 38 seconds

Improving the hiring pipeline for software engineers

Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange I speak with Karthik Ramasamy (Senior Director of Engineering at Splunk) and Arun Kejariwal (an experienced engineering leader). The focus of our conversation was hiring technical talent such as software engineers, developers, data scientists, and architects. The pandemic has caused a global economic slowdown and massive layoffs across many industry sectors, but many companies are still hiring and competing for technical talent. In our bi-weekly newsletter, links pertaining to hiring and work culture have been very popular from the outset. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
7/30/2020 • 52 minutes, 30 seconds

How to build state-of-the-art chatbots

Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange I speak with Lauren Kunze, CEO of Pandorabots, a widely used platform for building chatbots. About four years ago I attended Bot Day in San Francisco, and at the time, chatbots were very much in the news. Today, chatbots are used across many industries and use cases, and on many types of devices. Lauren Kunze and Pandorabots have been at the forefront of many important developments in the conversational applications space. They help many enterprises build and deploy bots, and they also create leading-edge chatbots like Mitsuku. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
7/23/2020 • 45 minutes, 12 seconds

Democratizing machine learning

Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange I speak with Ameet Talwalkar, co-founder and Chief Scientist at Determined AI, and an Assistant Professor in the Machine Learning Department at Carnegie Mellon University. A few months ago, I spoke with one of Ameet’s co-founders (Evan Sparks) around the time they announced that they were open sourcing the Determined Training Platform (DTP). Ameet and I started off by discussing the first few months of DTP as an open source project, specifically initial feedback from users, the applications and use cases they are seeing, and much more. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
7/16/2020 • 44 minutes, 26 seconds

How graph technologies are being used to solve complex business problems

Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange I speak with Denise Gosnell, Chief Data Officer at DataStax. Denise is also the co-author of the new book, The Practitioner’s Guide to Graph Data, which covers foundational tools and techniques needed to utilize graph technologies in production applications. This conversation is a great introduction to what has become an important class of technologies and tools. Graph technologies are used to power a wide array of applications, including recommendation engines, fraud detection systems, identity and access management, search, and many other use cases. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
7/9/2020 • 49 minutes, 38 seconds

Machines for unlocking the deluge of COVID-19 papers, articles, and conversations

In this episode of the Data Exchange I speak with Amy Heineike, Principal Product Architect at Primer.ai, a startup building machines that can read and write. Primer recently used their technology to build COVID-19 Primer, a web site that provides an overview of the latest research papers, media coverage, and social media conversations pertaining to COVID-19. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
7/2/2020 • 42 minutes, 57 seconds

Designing machine learning models for both consumer and industrial applications

In this episode of the Data Exchange I speak with Christopher Nguyen, CEO of Arimo (a Panasonic company). I first met Christopher in the early days of Apache Spark; Arimo was one of the first companies to embrace Spark and make it a central component of its data platform. He was also an early proponent of exploring deep learning for enterprise applications. A serial entrepreneur, Christopher was also an Engineering Director at Google, where he was responsible for Google Apps. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
6/25/2020 • 33 minutes, 34 seconds

Building open source developer tools for language applications

In this episode of the Data Exchange I speak with Matthew Honnibal, founder of Explosion AI, a startup focused on building developer tools for AI and natural language processing. Matthew and his team are the creators of popular tools like spaCy (NLP), Thinc (a lightweight deep learning library), and Prodigy (annotation and active learning). Our conversation focused on a range of topics including spaCy, Thinc, Explosion AI and Prodigy, and distributed computing with Ray. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
6/18/2020 • 43 minutes, 55 seconds

Viewing machine learning and data science applications as sociotechnical systems

In this episode of the Data Exchange I speak with Chris Wiggins, Associate Professor at Columbia University, Chief Data Scientist at the New York Times, and co-founder of hackNY. He began his career in theoretical physics but always had a strong interest in applying quantitative techniques to other disciplines. Early in his career he became interested in applications of machine learning to problems in biology and the health sciences. Our conversation focused on a range of topics including: how he shifted his focus from physics to machine learning and data science; applications of reinforcement learning; “data scientist” as a job title, and data science training programs; ethics in machine learning and data science, including training the next generation of data scientists; and a 2015 essay written by Michael Jordan and Tom Mitchell. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
6/11/2020 • 40 minutes, 34 seconds

Identifying and mitigating liabilities and risks associated with AI

In this episode of the Data Exchange I speak with Andrew Burt, Chief Legal Officer at Immuta and co-founder and Managing Partner of BNH.ai, a new law firm focused on AI compliance and related topics. As AI and machine learning become more widely deployed, lawyers and technologists need to collaborate more closely so they can identify and mitigate liabilities and risks associated with AI. BNH is the first law firm run by lawyers and technologists focused on helping companies identify and mitigate those risks. Our conversation focused on a range of topics including: why a law firm is the right vehicle for helping companies manage and mitigate risks associated with AI and machine learning; the legal profession’s long history of managing risk and regulatory frameworks; model governance; and incident response and recovery. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
6/4/2020 • 35 minutes, 7 seconds

How machine learning is being used in quantitative finance

In this episode of the Data Exchange our special correspondent and editor Jenn Webb speaks with Arun Verma, Head of Quantitative Research Solutions at Bloomberg. My first job post-academia was as lead quant in a small hedge fund. Since then, I’ve followed the industry from afar, and I’ve long been interested in the role of data and models in financial services. Arun and I discussed quantitative finance when we ran into each other at the O’Reilly AI conference in London last year. He was slated to give a talk on extracting trading signals from alternative data sets, an important subject among quants. Jenn and Arun discussed a range of topics including: the quantitative finance landscape; the challenges in identifying and using alternative data sources; applications of machine learning in finance, specifically deep learning and reinforcement learning; new natural language models and their applications in finance; and model explainability and model risk management. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
5/28/2020 • 40 minutes, 4 seconds

Understanding machine learning model governance

In this episode of the Data Exchange I speak with Harish Doddi, cofounder of Datatron, a startup focused on helping companies operationalize machine learning. Over the past two years, Harish has worked closely with enterprises to understand their needs in the areas of model operations and model governance. Last year Harish and I, along with David Talby, wrote two articles on these topics. In the first article, we described these emerging areas (“What are model governance and model operations?”), and in the second we listed lessons that ML engineers can draw from two highly regulated industries (“Managing machine learning in the enterprise: Lessons from banking and health care”). As machine learning becomes widely deployed, organizations will need to develop processes and tools to ensure that models behave as intended. This means having the right set of controls and validation steps in place. Our conversation focused on model governance and related topics: the three related areas of MLOps, model governance, and model observability; how model governance is perceived and practiced in different industries; real-world examples of model governance, and the organizational and staffing considerations that come into play; CI/CD for machine learning; and key enterprise features for model governance solutions. Detailed show notes, including a full transcript, can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
5/21/2020 • 35 minutes, 8 seconds

Improving performance and scalability of data science libraries

In this episode of the Data Exchange I speak with Wes McKinney, Director of Ursa Labs and an Apache Arrow PMC Member. Wes is the creator of pandas, one of the most widely used Python libraries for data science. He is also the author of the best-selling book “Python for Data Analysis”, a book that has become essential reading for both aspiring and experienced data scientists. Our conversation focused on data science tools and other topics including: two open source projects Wes has long been associated with, pandas and Apache Arrow; the need for a shared infrastructure for data science; and Ursa Labs, its mission and structure. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
5/14/2020 • 33 minutes, 43 seconds

Why TinyML will be huge

In this episode of the Data Exchange I speak with Pete Warden, Staff Research Engineer at Google. Pete is a prolific author and teacher, and he has made important contributions to many open source software projects. To name just a couple: he put together the Data Science Toolkit (open data sets and open source tools for data science), and he assembled tools to help developers get started with deep learning long before TensorFlow and PyTorch were available. Most recently, Pete has been focused on implementing machine learning in ultra-low-power systems (TinyML). Our conversation focused on TinyML and other topics including: the early days of using deep learning for computer vision; TensorFlow (Pete was part of the team at Google that originated TF); what TinyML is and why it is going to be an important topic in the years ahead; privacy and security in the context of TinyML; and Pete’s new book and accompanying video series on YouTube, both designed to help developers get started building TinyML applications. Detailed show notes, including a full transcript, can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
5/7/2020 • 36 minutes, 49 seconds

An open source platform for training deep learning models

In this episode of the Data Exchange I speak with Evan Sparks, cofounder and CEO of Determined AI, a startup that recently open sourced a platform for training deep learning models. Many of the impressive results and applications of deep learning have happened at a handful of companies and research groups. As more companies adopt deep learning, they are discovering that infrastructure for training and transfer learning isn’t widely available. Our conversation focused on deep learning and other topics including: their decision to open source the Determined Training Platform (DTP); enterprise use cases and applications of deep learning, and why Evan thinks more companies will need a platform for training DL models; the components that come with DTP (distributed training and hyperparameter tuning, experiment tracking, tools for collaboration and governance, a scheduler specialized for DL workflows, and more); and some examples of how teams have been using DTP. Detailed show notes, including a full transcript, can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
4/30/2020 • 40 minutes, 44 seconds

Algorithms that continually invent both problems and solutions

In this episode of the Data Exchange I speak with Kenneth Stanley, a Senior Research Manager at Uber AI and a Professor at UCF. Ken just announced that in June he will start a new research group focused on open-endedness at OpenAI. He is a pioneer in the field of neuroevolution, a method for evolving and learning neural networks through evolutionary algorithms. Ken and his colleague, Joel Lehman, wrote one of my favorite books on AI aimed at a broad audience: Why Greatness Cannot Be Planned. In this episode we discuss his upcoming move to OpenAI, as well as his recent work on open-ended algorithms. Our conversation covered: Ken’s new position at OpenAI; the transition from being a longtime academic researcher to founding and helping lead an industrial research team (Uber AI Labs); open-ended algorithms, specifically his work on POET (Paired Open-Ended Trailblazer) and Enhanced POET; and Generative Teaching Networks. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Data Exchange Newsletter.
4/23/2020 · 43 minutes, 45 seconds

Computational Models and Simulations of Epidemic Infectious Diseases

In this episode of the Data Exchange I speak with Bruno Gonçalves, a data scientist working at the intersection of data science and finance. I have known Bruno for several years; we met when I recruited him to teach conference tutorials and give talks on machine learning and deep learning, several of which proved extremely popular. Before shifting to data science, he spent several years as a researcher focused on mathematical models in epidemiology, a field with a rich history dating as far back as the 1920s. This episode is devoted to tools and techniques for modeling epidemics.

Our conversation covered:
- Bruno’s background and his experience in modeling epidemics.
- The field of epidemic modeling: what techniques are used, the size of the research community, and how models are evaluated.
- His two recent posts: “Epidemic Modeling 101 – Or why your CoVID-19 exponential fits are wrong” and “Epidemic Modeling 102 – All CoVID-19 models are wrong, but some are useful”.
- The role that epidemic models play in decision making.

Detailed show notes can be found on The Data Exchange web site.
Subscribe to The Data Exchange Newsletter.
4/16/2020 · 34 minutes, 37 seconds

Human-in-the-loop machine learning

In this episode of the Data Exchange I speak with Rob Munro, CEO of Machine Learning Consulting and author of the forthcoming book “Human-in-the-loop Machine Learning”. If you want a copy of Rob’s book, use the discount code podexchange20.

Our conversation covered:
- Rob’s experience building data and machine learning products at Powerset, Idibon, and AWS.
- Natural language processing: given Rob’s extensive experience as a researcher, practitioner, and entrepreneur in areas that touch on NLP, we discussed recent trends in language technologies.
- Human-in-the-loop machine learning.

Our goal with this podcast is to build a community of people interested in data, machine learning, and AI. If you have suggestions on what we should recommend (books, conferences, links) or on guests we should book, please visit TheDataExchange.media and fill out the “contact” form. The first five people who fill out the form get a free book from Manning.

Detailed show notes can be found on The Data Exchange web site.
4/9/2020 · 43 minutes, 35 seconds

Next-generation simulation software will incorporate deep reinforcement learning

In this episode of the Data Exchange I speak with Chris Nicholson, founder and CEO of Pathmind, a startup applying deep reinforcement learning (DRL) to simulation problems. In a recent post I highlighted two areas where companies can begin to add DRL to their suite of tools: personalization and recommendation engines, and simulation software. My interest in the interplay between DRL and simulation software began when I came across Pathmind’s work in this area.

Our conversation focused on deep reinforcement learning and its applications:
- We began with the basics: what is reinforcement learning, and why should businesses pay attention to it?
- We discussed enterprise applications of DRL, with particular emphasis on areas where Chris and Pathmind have recently focused: business process simulation and optimization.
- Pathmind was an early adopter of Ray and of RLlib, a popular open-source library for reinforcement learning built on top of Ray. I asked Chris why they chose to build on top of RLlib.

Detailed show notes can be found on The Data Exchange web site.
4/2/2020 · 39 minutes, 55 seconds

Business at the speed of AI: Lessons from Shopify

In this episode of the Data Exchange I speak with Solmaz Shahalizadeh, VP and Head of Data Science and Data Platform Engineering at Shopify. Shopify is an ecommerce powerhouse whose technology powers over a million businesses worldwide. Solmaz is a frequent speaker at conferences around the world, and she has played a critical role in helping Shopify scale its data and machine learning infrastructure.

Our conversation covered many important technical and business topics, including:
- Building and scaling machine learning data products.
- Building and scaling data teams.
- Data-informed product building.

Detailed show notes can be found on The Data Exchange web site.
3/26/2020 · 37 minutes, 12 seconds

How deep learning is being used in search and information retrieval

In this episode of the Data Exchange I speak with Edo Liberty, founder of Hypercube, a startup building tools for deploying deep learning models for search and information retrieval over large collections. When I spoke at AI Week in Tel Aviv last November, several friends encouraged me to learn more about Hypercube; I’m glad I took their advice!

Our conversation covered several topics, including:
- Edo’s experience applying machine learning and building tools for ML at places like Yale, Yahoo’s Research Lab in New York, and Amazon’s AI Lab.
- How deep learning is being used in search and information retrieval.
- Challenges in building search and information retrieval applications over large collections.
- Deep learning-based search and information retrieval, and what Edo describes as “enterprise end-to-end deep search platforms”.

Detailed show notes can be found on The Data Exchange web site.
3/19/2020 · 39 minutes, 50 seconds

The responsible development, deployment and operation of machine learning systems

In this episode of the Data Exchange I speak with Alejandro Saucedo, Engineering Director at Seldon, a startup building tools for productionizing machine learning. Alejandro is also Chief Scientist at The Institute for Ethical AI & Machine Learning, a UK-based research center that conducts “research into processes and frameworks that support the responsible development, deployment and operation of machine learning systems”.

Our conversation covered Alejandro’s work at both Seldon and the Institute for Ethical AI & Machine Learning:
- We discussed the topic areas the Institute focuses on, including explainability, MLOps, adversarial robustness, and privacy-preserving machine learning.
- We covered some of the Institute’s recent output, including its machine learning maturity model, its open source explainable AI library, its AI-RFX Procurement Framework, and its list of Principles for Responsible AI.
- We also discussed his role at Seldon and the areas Seldon has been focusing on.

Detailed show notes can be found on The Data Exchange web site.
3/12/2020 · 38 minutes, 52 seconds

Hyperscaling natural language processing

In this episode of the Data Exchange I speak with Edmon Begoli, Chief Data Architect at Oak Ridge National Laboratory (ORNL). Edmon has developed and implemented large-scale data applications on systems like Open MPI, Hadoop/MapReduce, Apache Calcite, Apache Spark, and Akka. Most recently he has been building large-scale machine learning and natural language applications with Ray, a distributed execution framework that makes it easy to scale machine learning and Python applications.

Our conversation covered a range of topics, including:
- Edmon’s role at ORNL and his experience building applications with Hadoop and Spark.
- What is distributed online learning?
- Why they started using Ray to build distributed online learning applications.
- Two important use cases: suicide prevention among US veterans and infectious disease surveillance.

Detailed show notes can be found on The Data Exchange web site.
Join Michael Jordan, Manuela Veloso, Azalia Mirhoseini, Zoubin Ghahramani, Wes McKinney, Ion Stoica, Gaël Varoquaux, and many other speakers at the first Ray Summit in San Francisco, May 27-28. Tickets start at $200.
3/5/2020 · 35 minutes, 14 seconds

What businesses need to know about model explainability

In this episode of the Data Exchange I speak with Krishna Gade, founder and CEO of Fiddler Labs, a startup focused on helping companies build trustworthy and understandable AI solutions. Prior to founding Fiddler, Krishna led engineering teams at Pinterest and Facebook.

Our conversation covered a range of topics, including:
- Krishna’s background as an engineering manager at Facebook and Pinterest.
- Why Krishna decided to start a company focused on explainability.
- Guidelines for companies that want to start incorporating model explainability into their data products.
- The relationship between model explainability (transparency) and security (ML that can resist adversarial attacks).

Detailed show notes can be found on The Data Exchange web site.
Join Michael Jordan, Manuela Veloso, Azalia Mirhoseini, Zoubin Ghahramani, Wes McKinney, Ion Stoica, Gaël Varoquaux, and many other speakers at the first Ray Summit in San Francisco, May 27-28. Tickets start at $200.
2/27/2020 · 36 minutes, 10 seconds

Scalable Machine Learning, Scalable Python, For Everyone

In this episode of the Data Exchange I speak with Dean Wampler, Head of Developer Relations at Anyscale, the startup founded by the creators of Ray. Ray is a distributed execution framework that makes it easy to scale machine learning and Python applications. It has a very simple API, and as someone who uses both Python and machine learning, I have found Ray to be a wonderful addition to my toolbox. Dean has long been one of my favorite architects, speakers, and teachers, and we have known each other since the early days of Apache Spark. He has authored numerous books and is known for his interest in Scala and programming languages, as well as in software architecture.

Our conversation spanned many topics, including:
- What is Ray, and why should someone consider using it?
- The first Ray Summit (May 27-28 in San Francisco).
- Dean’s first impressions of Ray, and his journey from Scala to Python.
- An update on Ray’s core libraries, Ray on Windows, and distributed training with Ray.

Detailed show notes can be found on The Data Exchange web site.
For more on Ray and scalable machine learning and Python, come hear from Dean Wampler, Michael Jordan, Ion Stoica, Manuela Veloso, Wes McKinney, and many other leading developers and researchers at the first Ray Summit in San Francisco (May 27-28).
2/20/2020 · 35 minutes, 45 seconds

Computational humanness, analogy and innovation, and soft concepts

In this episode of the Data Exchange I speak with Dafna Shahaf, Associate Professor at the School of Computer Science and Engineering at the Hebrew University of Jerusalem. She also runs the hyadata lab, a research group that consistently produces unique and interesting projects at the intersection of computer science, data, and the social sciences.

Our conversation covered a range of topics, including:
- Computational analogy: Dafna and her students mine online sources like patent filings, research papers, and data from crowdsourcing platforms focused on innovation, and in the process they produce tools that should interest innovation officers and members of innovation labs.
- Soft Concepts: Dafna continues her work on computational humor, and she and her students have built new tools for automatically finding trivia facts in Wikipedia.
- An upcoming workshop on Innovative Ideas in Data Science (April 20th in Taipei; proposals are due 21 February 2020).

Detailed show notes can be found on The Data Exchange web site.
2/13/2020 · 33 minutes, 38 seconds

Building domain specific natural language applications

In this episode of the Data Exchange I speak with David Talby, co-creator of Spark NLP, an open source, highly scalable, production-grade natural language processing (NLP) library. Spark NLP has become one of the more popular NLP libraries and is available on PyPI, Conda, Maven, and Spark Packages. With recent research advances in large-scale natural language models, there is strong interest in domain-specific natural language applications. Besides their work on Spark NLP, David and his collaborators are building natural language models tuned specifically for healthcare applications.

Our conversation spanned many topics, including:
- Spark NLP: its current status and some common and surprising use cases.
- Recent developments in NLP research and their implications for companies.
- Spark NLP for Healthcare.

Detailed show notes can be found on The Data Exchange web site.
2/6/2020 · 33 minutes, 9 seconds

The state of privacy-preserving machine learning

In this episode of the Data Exchange I speak with Morten Dahl, research scientist at Dropout Labs, a startup building a platform and tools for privacy-preserving machine learning. He is also behind TF Encrypted, an open source framework for encrypted machine learning in TensorFlow. The rise of privacy regulations like CCPA and GDPR, combined with the growing importance of ML, has led to strong interest among researchers and practitioners in tools and techniques for privacy-preserving machine learning. Morten brings the unique perspective of a longtime security researcher who has also worked as a data scientist in industry.

Our conversation spanned many topics, including:
- Morten’s background as an experienced security researcher, developer, and data scientist.
- The current state of TF Encrypted.
- Federated learning (FL) and secure aggregation for FL.
- Privacy-preserving ML solutions will employ a variety of techniques, so we also discussed related topics such as differential privacy, homomorphic encryption, and RISELab’s stack for coopetitive learning (MC2).

Detailed show notes can be found on The Data Exchange web site.
1/30/2020 · 42 minutes, 15 seconds

Taking messaging and data ingestion systems to the next level

Sijie Guo on how Apache Pulsar is able to handle both queuing and streaming, and both online and offline applications.

In this episode of the Data Exchange I speak with Sijie Guo, founder of StreamNative, a new startup focused on making enterprise messaging technologies, specifically Apache Pulsar, easy to use in the cloud. Sijie was previously a cofounder of Streamlio (acquired by Splunk), and before that he led the messaging team at Twitter. He is also the main organizer behind the Pulsar Summit (April, in San Francisco), a new conference whose Call for Speakers closes on January 31st.

Our conversation spanned many topics, including:
- The role of messaging in modern data applications and platforms.
- The two main types of messaging applications: queuing and streaming.
- Apache Pulsar as a unified messaging platform, able to handle both queuing and streaming, and both online and offline applications.
- A status update on Apache Pulsar.

Detailed show notes can be found on The Data Exchange web site.
1/23/2020 · 38 minutes

Business at the speed of AI: Lessons from Rakuten

The Data Exchange Podcast: Bahman Bahmani on attracting and retaining talent, and the importance of delivery-oriented teams.

In this episode of the Data Exchange I speak with Bahman Bahmani, VP of Data Science and Engineering at Rakuten, a large Japanese ecommerce and online retail company. When I first met Bahman several years ago, he was finishing his Computer Science PhD at Stanford and giving technical talks on machine learning algorithms and their applications to computer security. Today he leads a large team at Rakuten, and in my opinion he has established an organizational structure, processes, and an AI practice that other companies should study.

Our conversation spanned many topics, including:
- The impact that AI, machine learning, and data have had on Rakuten’s businesses.
- Attracting, nurturing, and retaining talent in an environment where data scientists, data engineers, and analysts all have many other options.
- The trio of strategic options: operational excellence, product leadership, and customer intimacy.
- Organization and culture, including key roles within an AI practice.
- The power of delivery-oriented teams with end-to-end responsibility.

Detailed show notes can be found on The Data Exchange web site.
1/16/2020 · 41 minutes, 16 seconds

The combination of the right software and commodity hardware will prove capable of handling most machine learning tasks

In this episode of the Data Exchange I speak with Nir Shavit, Professor of EECS at MIT and cofounder and CEO of Neural Magic, a startup creating software that enables deep neural networks to run on commodity CPUs (at GPU speeds or faster). Its initial products focus on model inference, but the company is also working on similar software for model training.

Our conversation spanned many topics, including:
- Neurobiology, in particular the combination of Nir’s two research areas: multicore software and connectomics, a branch of neurobiology.
- Why he believes the combination of the right software and CPUs will prove capable of handling many deep learning tasks.
- Speed is not the only factor: the “unlimited memory” of CPUs can unlock larger problems and architectures.
- Neural Magic’s initial offering focuses on inference, but model training using CPUs is also on the horizon.

Detailed show notes can be found on The Data Exchange web site.
1/9/2020 · 30 minutes, 23 seconds

Key AI and Data Trends for 2020

In this episode of the Data Exchange, I speak with my podcast co-organizer Mikio Braun, a data scientist at GetYourGuide and a former machine learning researcher and data architect. Mikio and I go out on a limb and speculate about new trends in AI and data that we think people should pay attention to in 2020.

Our conversation spanned many topics, and we listed trends in:
- Models: reinforcement learning, deep learning, language models, and related topics.
- Applications, including emerging use cases for reinforcement learning.
- Infrastructure and tools: end-to-end machine learning platforms, the importance of distributed computing, etc.
- Managing risks: privacy, security, safety, fairness, etc.
- Emerging technologies to watch in 2020.

Detailed show notes can be found on The Data Exchange web site.
12/26/2019 · 36 minutes, 26 seconds

The evolution of TensorFlow and of machine learning infrastructure

In this episode of the Data Exchange I speak with Rajat Monga, one of the founding members of the TensorFlow Engineering team. Until recently, Rajat was the engineering manager for TensorFlow at Google.

Our conversation spanned many topics, including:
- TFX, a production-scale machine learning platform based on TensorFlow.
- Distributed training.
- MLIR (Multi-Level Intermediate Representation), “a representation format and library of compiler utilities that sits between the model representation and low-level compilers/executors that generate hardware-specific code.”
- Deep learning in the enterprise.
- The state of machine learning infrastructure.

Full show notes can be found on The Data Exchange web site.
12/12/2019 · 36 minutes, 24 seconds

Building large-scale, real-time computer vision applications

In this episode of the Data Exchange I speak with Reza Zadeh, founder and CEO of Matroid, a startup focused on making computer vision applications easy to build and deploy. Reza is also an adjunct professor at Stanford.

This conversation spanned many topics pertaining to computer vision, including:
- Challenges in building large-scale, real-time computer vision applications.
- Robustness of computer vision applications (adversarial attacks, deepfakes).
- Impact of computer vision technologies on society: security, privacy, and surveillance.

We also previewed the upcoming 2020 edition of the ScaledML conference, one of my favorite conferences in the SF Bay Area; Reza is its main organizer.

Full show notes can be found on The Data Exchange web site.
11/26/2019 · 40 minutes, 19 seconds