Data Crunch

English, Sciences, 1 season, 81 episodes, 1 day, 4 hours, 46 minutes

About

If you want to learn how data science, artificial intelligence, machine learning, and deep learning are being used to change our world for the better, you’ve subscribed to the right podcast. We talk to entrepreneurs and experts about their experiences employing new technology—their approach, their successes, their failures, and the outcomes of their work. We make these difficult concepts accessible to a wide audience.

Data Science in Manufacturing: On-Demand Designer Materials at Commodity Scale

Welcome to another exciting episode of Data Crunch! In this episode, we dive deep into the world of data analytics and manufacturing with our special guests, Alex Reid and Jay Minutrie. Both Alex and Jay bring unique entrepreneurial and data-focused experiences to the table, making them the perfect duo for an engaging conversation about the future of manufacturing.
We start with the origin story of their innovative technology, which began as an idea in Alex's father's laboratory at Tulane University. Alex, a young entrepreneur, teamed up with Jay, a seasoned industry veteran, to bring this patented technology to the market. Their complementary backgrounds served as the foundation for their successful partnership, leading to significant advancements in the manufacturing industry.
As we continue the conversation, Alex and Jay discuss the challenges and opportunities they faced while introducing their hard tech into the industrial space, and how they managed to successfully navigate these obstacles with the help of a strategic acquisition by Yokogawa. The pair also share insights on how they plan to leverage their new partnership to drive even more innovation and growth in their industry.
In this episode, we also explore the concept of on-demand designer materials at commodity scale. Alex shares his vision of a future where bespoke materials can be produced for specific customers at an affordable cost, leading to greater efficiency, reduced waste, and the ability to create customized products tailored to individual needs. This groundbreaking concept has the potential to revolutionize the manufacturing industry, and we dive into what it could mean for industry players and end-users alike.
Alongside this fascinating discussion, we also touch on the importance of mentorship for young entrepreneurs and the value of having experienced individuals to guide and support you through the challenges of starting a business. Alex and Jay's unique partnership serves as an inspiring example of how collaboration and shared knowledge can lead to incredible success.
Furthermore, our guests delve into the impact of their technology on the biopharma space, highlighting the potential for advancements in life-saving drugs and personalized medicine. With opportunities for growth in both the polymer and biopharma sectors, Alex and Jay's enthusiasm for the future is infectious, and their determination to make a difference is truly inspiring.
So, join us as we explore the world of data science in manufacturing, the potential for on-demand designer materials, and the inspiring journey of two entrepreneurs who are pushing the boundaries of innovation. Don't miss this enlightening episode of Data Crunch, filled with valuable insights, engaging conversations, and a glimpse into the future of manufacturing.

4/3/2023 • 14 minutes, 32 seconds

Creating a Database for AI with Activeloop

Working with structured and semistructured data can be hard, but it's currently much easier than working with unstructured data, like images, video, audio, and text. We talk with Davit Buniatyan, CEO of Activeloop, who chats with us about how he works to make unstructured data for machine learning easier and faster to work with.

3/24/2022 • 27 minutes, 50 seconds

Streamlining Construction with AI

What's it like to build an AI product in an industry that still uses outdated project management technology and has no clear conceptual model? After receiving his PhD at Stanford, René Morkos speaks to this very situation. He describes his journey building an AI product in the construction industry.

12/17/2021 • 24 minutes, 14 seconds

The Future of Unstructured Data with Graviti

After graduating from the University of Pennsylvania with a master’s degree in artificial intelligence/robotics, Edward Cui was one of the first Uber self-driving car engineers. He’s had a lot of experience working with unstructured data and shares how we can increase the efficiency of modeling unstructured data by an order of magnitude.

10/29/2021 • 29 minutes, 38 seconds

Data Strategy in the Education Sector

What is the secret culprit behind overworked teachers and administrators in much of the educational system? We're joined by one of Data Crunch's finest, James Thomas, who describes, from both a technological and a personal standpoint, the real difficulties faced by our teachers and students, and how the right approach to data can solve many of their problems.

10/1/2021 • 17 minutes, 41 seconds

CEOs: Here Is What You Should Know about GPT-3

If you haven't heard, GPT-3 is a machine learning model that can write text. Text that looks like a human wrote it—almost out of thin air. The real opportunity is in the ease of use. GPT-3 doesn't replace your writers. It augments them, making writing faster and more accessible without compromising your results. The last thing you want to do is compromise the quality of your content because copy is an exponential business multiplier. The more you can apply it, the more business you'll get. Despite advances in technology, one thing that hasn't changed is that people still talk about products and services—and what they say matters.

9/1/2021 • 6 minutes, 13 seconds

Cyber Security in Higher Ed

Higher education institutions house lots of important data that bad actors can access and sell on the dark web, like students' social security numbers, financial aid information, and national security research. Protecting this information should be a high priority for institutions, but it's not always easy to apply best practices and enforce compliance measures. We chat with Brandon Sherman about these issues.

7/31/2021 • 18 minutes, 50 seconds

GE Aviation's Dinakar Deshmukh Discusses Data

As the VP of Data Science and Analytics for GE Aviation, Dinakar Deshmukh talks about how he and the large team he leads solve big problems internally and externally by splicing the power of data science with deep domain knowledge.

6/28/2021 • 17 minutes, 16 seconds

Telmo Silva Talks ClicData

Telmo Silva created ClicData, an end-to-end SaaS BI platform, which, as he describes it, is the little guy coming up in the BI platform world. He talks about how his company was started, where it’s been, and where it’s going with cutting-edge R&D. He also offers additional thoughts on the role of data in the business world today.

5/4/2021 • 30 minutes, 24 seconds

Pricing with Cactus Raazi

Keeping quality customers is the aim of nearly every healthy business. Cactus Raazi challenges the typical methods of doing this and suggests alternative data-focused pricing strategies in order for businesses to survive in the future.

4/16/2021 • 27 minutes, 11 seconds

AI Making Developers more Effective

Robin Purohit talks to us about how he and his company are creating AI tools to help developers be more effective. Learn what their approach is, how they're training their models, and where they're headed in the future.

3/25/2021 • 26 minutes, 6 seconds

Overcoming Cultural Hurdles in Tech

2/27/2021 • 23 minutes, 39 seconds

Traffic Equilibrium and a PhD

1/30/2021 • 25 minutes, 17 seconds

Machine Learning and Flight with Ian Cassidy

Ian Cassidy: When you did a PCA, a principal component analysis, like, it was like beautiful. There was, like, a red circle in the middle of, you know, the blue unpurchased, you know, data points. And there were the red purchase ones and they were all clustered together. It was, it was really interesting. And like the, the machine learning model had a really good time trying to predict that the ones in that red cluster were the things that people were interested in purchasing. Ginette: I'm Ginette, Curtis: and I'm Curtis, Ginette: and you are listening to Data Crunch, Curtis: a podcast about how applied data science, machine learning, and artificial intelligence are changing the world. Ginette: Data Crunch is produced by the Data Crunch Corporation, an analytics, training, and consulting company. If you want to become the type of tech talent we talk about on our show today, you’ll need to master algorithms, machine learning concepts, computer science basics, and many other important concepts. Brilliant is a great place to start digging into these. The nice thing about Brilliant is that you can learn in bite-sized pieces at your own pace, and with a bit of consistent effort, you can tackle some really tough subjects. With 60+ courses that combine story-telling, code-writing, and interactive challenges, Brilliant helps develop the skills that are crucial to school, job interviews, and careers. Sign up for free and start learning by going to Brilliant.org slash Data Crunch, and also the first 200 people that go to that link will get 20% off the annual premium subscription. Now onto our show. We’ve waited to publish today’s episode because Covid has taken a toll on the travel industry and lots of things have changed since we recorded this episode, but there’s good information in this episode, so we don’t want to wait too long to publish it. Hopefully 2021 changes the travel industry’s fortunes and this information becomes even more applicable. 
So today we chat with Ian Cassidy, former senior data scientist at Upside Business Travel. Ian: I'm Ian Cassidy. And my interests are in the machine learning optimization realm, since I have experience with that from my grad school days, and a little bit about Upside is we are a travel company, travel management company. We offer a product that is no fees, 100% free. And in fact, if you spend over a hundred thousand dollars booking travel on our website, we offer a 3% cash back, as well as free customer service, 24/7, no contracts. So that's you sign up with us, no contracts, you get all of this as soon as you sign up. We are a one-stop shop to book and manage all of your travel. In one place, we offer flights, hotels, rental cars, and we also offer expense integration and reporting for companies looking to, to manage all of their, their travelers and, and their expenses for that. Curtis: Right on. We talked before about the journey that your company has gone through, uh, to figure out how to best use data, you know, how to target and what really works with, with machine learning and things like this. So I'd love to just talk a little bit about that: where you guys started and how you guys made some decisions, what you learned along the way and what you're, what you're up to from a data science perspective. Ian: Yeah, sure. So, uh, you know, like you mentioned, things have changed quite a bit at Upside. We started off as a B2C company where we were targeting what we were calling do it yourself travelers. You did not have to be logged into our site in order to start doing a search and book flights or hotels. So that kind of made it interesting from a data collection perspective. We had like some unique IDs about who the people were that were doing the searching, but it was, it was largely kind of, you know, we didn't really know much about you when you, when you were searching. 
So when we started, one of the main things that we were trying to improve upon was our sorting of inventory...

12/31/2020 • 22 minutes, 7 seconds

Implementing ML Algorithms with Ylan Kazi

12/1/2020 • 26 minutes, 28 seconds

Hiring Top Tech Talent

10/31/2020 • 18 minutes, 48 seconds

Making Data Assets Profitable with VDC

Many companies are sitting on data assets that could be revenue streams for them, without knowing it. Matt Staudt of VDC discusses making latent data profitable. Ginette: I'm Ginette, Curtis: and I'm Curtis, Ginette: and you are listening to Data Crunch, Curtis: a podcast about how applied data science, machine learning, and artificial intelligence are changing the world. Ginette: Data Crunch is produced by the Data Crunch Corporation, an analytics, training, and consulting company. Today, we chat with the president and CEO at the Venture Development Center, Matt Staudt. Matt Staudt: The company that I'm with is VDC, Venture Development Center. Basically VDC is an organization that works in the alternative big data, bringing buyer and seller together. So we have a unique perspective on available data assets that are out in the marketplace and a unique perspective of the companies that utilize them, and what they're specifically looking for in the way of points of, uh, value for various data assets. My background was originally in the marketing and advertising area, where I owned a company for 20 years, IMG, Interactive Marketing Group. I left that in 2007 and joined this, which was more or less of a lifestyle organization. And we made it a full-fledged organization company back in 2010. Curtis: Now, when you say data assets, can you put a little bit of definition around that for the listeners? Just so they understand how you define a data asset? 'Cause I imagine there may be some things that you think are valuable that maybe they haven't thought of, or maybe it'll help expand our thinking around what a data asset is. Matt: Yeah, sure. In my, in my terminology "data asset" basically falls into eight different categories, where assets basically come from within the information world. So they could be things like transaction data or crowdsource data. They could be things like search data or social data sets. 
They fall into various categories, traditional data, meaning assets that are business to business or business to consumer generally aggregated by large companies that most everybody's heard of: Dun & Bradstreet, Infogroup, Acxiom, the credit bureaus, et cetera. Alternative data in our world are companies that have unique data points, unique. They're collecting unique pieces of information, usually as a byproduct of their core business. And we look at the assets that the data sets, the actual data points that they collect. And we figure out if there might be something of value to take to the marketplace, usually to the large consumers of the data, the big aggregators that I previously mentioned, but oftentimes it also fits well with some of our mid-tier players. And we have a significant amount of relationships in the brand grouping, meaning large organizations that they themselves are looking to try and take advantage of big data and utilize data in sales, marketing operations, in order to transform or help to administer certain activities that they have going on. Curtis: Do you find that this is maybe industry specific, like for example, a big insurance company, or if you're in healthcare or something like this, it tends to be more data intensive that you see more activity there or, or is this really applicable across the board? What kind of industries do you find have a lot of applications? Matt: Yeah. Well, it's interesting on the surface, you certainly think that there's probably industries that would have a larger appetite and a larger need for data than, than other organizations, but going, you know, through the list of companies that we've helped over the last 15 or 20 years, it really runs the gamut. I mean, we've worked with insurances, you mentioned insurance, insurance companies. I mentioned credit bureaus. We work with credit bureaus, risk and fraud, sales and marketing, sometimes large brands within those retail environments. 
So it really truly has run the gamut for us. There's,

9/30/2020 • 23 minutes, 33 seconds

Machine Learning with Max Sklar

8/28/2020 • 20 minutes, 57 seconds

Think Differently with Graph Databases

7/31/2020 • 31 minutes, 26 seconds

Data, Epidemiology, and Public Health

With recent events being what they are, epidemiology has come into the spotlight. What do epidemiologists do and how does data shape their everyday experience? Sitara and Mee-a from "Donuts and Data" fill us in. Ginette: I'm Ginette, Curtis: and I'm Curtis, Ginette: and you are listening to Data Crunch, Curtis: a podcast about how applied data science, machine learning, and artificial intelligence are changing the world. Ginette: Data Crunch is produced by the Data Crunch Corporation, an analytics training and consulting company. Many people are on the lookout for online math and science resources right now, particularly data and statistics courses, and whether you're a student looking to get ahead, a professional brushing up on cutting-edge topics, or someone who just wants to use this time to understand the world better, you should check out Brilliant. Brilliant’s thought-provoking math, science, and computer science content helps guide you to mastery by taking complex concepts and breaking them up into bite-sized understandable chunks. You'll start by having fun with their interactive explorations, and over time you'll be amazed at what you can accomplish. Sign up for free and start learning by going to Brilliant.org slash Data Crunch, and also the first 200 people that go to that link will get 20% off the annual premium subscription. Now onto the show. Curtis: I'd like to welcome Sitara and Mee-a from the Instagram account Donuts and Data to talk to us today. I guess let's just have you guys introduce yourselves, as opposed to me trying to introduce you cause you know what you do better than I do. So maybe we just have some introductions. Sitara: So I'm Sitara one half of Donuts and Data. I'm a PhD student in epidemiology at the University of Texas Health Science Center. I'm also a research assistant in a lab that I work in. Mee-a: And I'm Mee-a. I am an infectious disease epidemiologist that works in the public sector. 
I actually met Sitara through the lab that she's currently working in. Curtis: Nice. And I'm excited to have you guys on. I just, I think epidemiology is a really interesting space, especially with what, you know, with what's going on now with COVID. I think it's more pertinent than it ever has been. Not that it ever hasn't been pertinent, but maybe it's more top of mind for people. So I'd love maybe just to have you guys level set with everybody, like what is epidemiology. There's probably some confusion about what that is and maybe how you guys got into it. And then we can get into what your day to day is and, and what it's all about. Sitara: So, epidemiology, I think everyone's kind of understanding is studying patterns of disease in the, in the human population. And so in that sense, what Mee-a and I do are the same, but instead of studying infectious diseases or the natural science part of epidemiology, what I focus on is how human behavior contributes to those patterns of disease. So I look for patterns in data associated like demographics or just behaviors, diet, nutrition, and how that contributes to getting diseases. Mee-a: For me in the public sector, it's going to be a lot of looking at incidence rates of infectious diseases. It . . . primarily with COVID-19 right now, and just different ways that we can try to possibly implement infection prevention measures. So we are dealing a little bit more with, I don't want to say the medical side of it because we aren't clinicians, but we are dealing more with the medical side of, of the infectious disease than we are with, with the data compared to when I was in academia, at least. Curtis: So take us through maybe the end goal, right? So what you guys are working on. You're hoping to come out with, I think, some recommendations for people to, to take maybe a better understanding of how the disease spreads, so we get in front of it. What does that look like? 
Mee-a: I always thought that epidemiology's gold standard of what we try to achieve is probably..

7/17/2020 • 29 minutes, 40 seconds

Vast ETL Efficiency Gain with Upsolver

7/1/2020 • 22 minutes, 39 seconds

Data Flexibility in Healthcare

5/31/2020 • 27 minutes, 27 seconds

Education and AI

For David Guralnick, education, AI, and cognitive psychology have always held possibility. With many years of experience in this niche, David runs a company that designs education programs, which employ AI and machine learning, for large companies, universities, and everything in between. David Guralnick: Somehow what's happened in a lot of the uses of technology and education to this point is we've taken the mass education system that was there only to solve a scalability problem, not because it was the best educational method. So we've taken that and now we've scaled that even further online because it's easy to do and easy to track. Ginette Methot: I’m Ginette, Curtis Seare: and I’m Curtis, Ginette: and you are listening to Data Crunch, Curtis: a podcast about how applied data science, machine learning, and artificial intelligence are changing the world. Ginette: Data Crunch is produced by the Data Crunch Corporation, an analytics training and consulting company. Curtis: First off, I'd like to thank everyone who has taken the Tableau fundamentals zombie course that we announced the last episode. We've been getting a lot of great feedback from you. It's fun to see how people are enjoying the course and thinking that it's fun and also clear and it's helping them learn the fundamentals of Tableau. The reason we made that course is because Tableau and data visualization are really important skills. They can help you get a better job, they can help you add value to your organization. And so we hope that the course is helping people out. Also, according to the feedback that we have received, we've made a couple of enhancements to the course, so there are now quizzes to test your knowledge. There are quick tips with each of the videos to help you go a little bit further than even what the videos teach. We've also included a way to earn badges and a certificate so that you can show off your skills to your employer or whoever. 
And we've also thrown in a couple other bonuses. One is our hundred-plus-page manual that we actually use to train at Fortune 500 companies, so that'll have screenshots and tutorials and tips and tricks on the Tableau fundamentals. And we have also included a checklist and a cheat sheet, both of which we actually use internally in our consulting practice to help us do good work. One of them will help you know which kind of chart to use in any given scenario that you may encounter, whether that's a bar chart or a scatter plot or any number of other more advanced charts. And the other is a checklist that you can run down and say, "do I have this, this, this and this in my visualization before I take it to present to someone to make sure that that's going to be a good experience." So hopefully all of that equals something that is really going to help you guys. And something also where you can learn Tableau and have fun doing it, saving the world from the zombie apocalypse, and the price has risen a little bit since last time. But for our long-time listeners here, if you use the code "podcastzombie" without any spaces in the middle, then that'll go ahead and take off 25% of the list price that is currently on the page. So hopefully more of you guys can take it and keep giving us feedback so we can keep improving it. And we would love to hear from you. Ginette: Now onto the show. Today we chat with David Guralnick, president and CEO of Kaleidoscope Learning. David: I've had a long-time interest in both education and technology going way, way back. I was, I was lucky enough to go to an elementary school outside of Washington DC called Green Acres School in Rockville, Maryland, which was very project based. So it was non-traditional education. You worked on projects, you worked collaboratively with people, your teachers' role was almost as much an advisor and mentor as a traditional teacher. 
It wasn't person in front of the room talking at you, and you learn how to, you know,

4/24/2020 • 27 minutes, 38 seconds

Upskilling from Home

4/1/2020 • 13 minutes, 25 seconds

How to Reduce Uncertainty in Early Stage Venture Funding

2/29/2020 • 24 minutes, 11 seconds

Data in Healthcare with Ron Vianu

If you've ever tried to find a doctor in the United States, you likely know how hard it is to find one who's the right fit: it takes quite a bit of research to find good information to make an informed choice. Wouldn't it be nice to easily find a doctor who is the right fit for you? Using data, Covera Health aims to do just that in the radiology specialty. Ron Vianu: I think the tools are really improving year over year to a significant degree, but like anything else, the tools themselves are only as useful as how you apply them. You can have the most amazing tools that could understand very large datasets, but you know how you approach looking for solutions, I think can dramatically impact, do you yield anything useful? Ginette Methot: I’m Ginette, Curtis Seare: and I’m Curtis, Ginette: and you are listening to Data Crunch, Curtis: a podcast about how applied data science, machine learning, and artificial intelligence are changing the world. Ginette: Data Crunch is produced by the Data Crunch Corporation, an analytics training and consulting company. If you're a business leader listening to our podcast and would like to move 10 times faster and be 10 times smarter than your competitors, we're running a webinar on February 13th where you can learn how to do this and more. Just go to datacrunchcorp.com/go to sign up today for free. If you're a subject matter expert in your field, like our guest today, and you're looking to understand data science and machine learning, brilliant.org is a great place to dig deeper. Their classes help you understand algorithms, machine learning concepts, computer science basics, and many other important concepts in data science and machine learning. The nice thing about brilliant.org is that you can learn in bite-sized pieces at your own pace. Their courses have storytelling, code writing and interactive challenges, which makes them entertaining, challenging, and educational. 
Sign up for free and start learning by going to brilliant.org/datacrunch. And also the first 200 people that go to that link will get 20% off the annual premium subscription. Today we chat with Ron Vianu, the CEO of Covera Health. Let's get right to it. Curtis: What inspired you to get into what you're doing, uh, to start Covera Health? Where did the idea come from and what drives you? So if we could start there and learn a little bit about you and the beginnings of Covera Health, that would be great. Ron: Sure. Uh, and I, I guess it's important to state that, you know, I'm a problem solver by nature, and my entire professional career, I've been a serial entrepreneur building companies to solve very specific problems. And as it relates to Covera, the, the genesis of it was understanding that there were two problems in the market with respect to, uh, the healthcare space, which is where we're focused that were historically unsolved and there were no efforts really to solve them in, from my perspective, a data-driven way. And that was around understanding quality of physicians that is predictive to whether or not they'll be successful with individual patients as they walk through their practice. And so if you, and we're focused on the world of radiology, which today is highly commoditized and what that means is that there was a presumption that wherever you get an MRI or a CT study for some injury or illness, it doesn't matter where you go. It's more about convenience and price perhaps. Whereas what we understand given our research and the, the various things that we've published since our beginning is that one, it's like every other medical specialty. It's highly variable. Two, since radiology supports all other medical specialties in a, as a tool for diagnosis, diagnostic purposes, any sort of variability within that specialty has a cascading effect on patients downstream. And so for us, the beginning was, is this something that is solvable through data?

1/30/2020 • 20 minutes

Data Literacy with Ben Jones

We talk with Ben Jones, CEO of Data Literacy, who's on a mission to help everyone understand the language of data. He goes over some common data pitfalls, learning strategies, and unique stories about both epic failures and great successes using data in the real world. Ginette Methot: I’m Ginette, Curtis Seare: and I’m Curtis, Ginette: and you are listening to Data Crunch, Curtis: a podcast about how applied data science, machine learning, and artificial intelligence are changing the world. Ginette: Data Crunch is produced by the Data Crunch Corporation, an analytics training and consulting company. It’s becoming increasingly important in our world to be data literate and to understand the basics of AI and machine learning, and Brilliant.org is a great place to dig deeper into this and related topics. Their classes help you understand algorithms, machine learning concepts, computer science basics, and many other important concepts in data science and machine learning. The nice thing about Brilliant.org is that you can learn in bite-sized pieces at your own pace. Their courses have storytelling, code-writing, and interactive challenges, which makes them entertaining, challenging, and educational. Sign up for free and start learning by going to Brilliant.org/DataCrunch, and also the first 200 people that go to that link will get 20% off the annual premium subscription. Curtis: Ben Jones is here with me on the podcast today. This is a couple months coming. Excited to have him on the show. He's well known in the data visualization community, he's done a lot of great work there. Uh, used to work for Tableau. Now he's off doing his own thing, has a company called Data Literacy, which is interesting. We're going to dig into that and also has a new book out called Avoiding Data Pitfalls. So all of this is really great stuff and we're happy to have you here, Ben. Before we get going, just give yourself a brief introduction for anyone who may not know you and we can go from there. 
Ben: Yeah, great. Thanks Curtis. You mentioned some of the highlights there. I, uh, worked for Tableau for about seven years running the Tableau Public platform, uh, in which time I wrote a book called Communicating Data with Tableau. And the fun thing was for me that launched kind of a teaching, um, mini side gig for me at the University of Washington, which really made me fall in love with this idea of just helping people get excited about working with data. Having that light bulb moment where they feel like they've got what it takes. And so that's what caused me to really want to leave Tableau and launch my own company, Data Literacy, at dataliteracy.com, which is where I help people, you know, as I say, learn the language of data, right? Whether that's reading charts and graphs, whether that's exploring data and communicating it to other people through training programs to the public as well as working one on one with clients and such. So it's been an exciting year doing that. Also, other things about me, I live here in Seattle, I love it up here and go hiking and backpacking when I can and have three teenage boys all in high school. So that keeps me busy too. And it's been a fun week for me getting this book out and seeing it start to ship and seeing people get it. Curtis: Let's talk a little bit about that because the book, it sounds super interesting, right? Avoiding Data Pitfalls, and there are a lot of pitfalls that people fall into. So I'm curious what you're seeing, why you decided to write the book, how difficult of a process it was and then some of the insights that you have in there as well. Ben: Yeah, so I feel like the tools that are out there now are so powerful and way more so than when I was going to school in the 90s, and it's amazing what you can do with those tools. And I think also it's amazing how easy it is to mislead yourself. And so I started realizing that that's sometim..

12/19/2019 • 29 minutes, 59 seconds

Social Media and Machine Learning

How do you build a comprehensive view of a topic on social media? Jordan Breslauer would say you let a machine learning tool scan the social sphere and add information as conversations evolve, with help from humans in the loop. Ginette Methot: I’m Ginette, Curtis Seare: and I’m Curtis, Ginette: and you are listening to Data Crunch, Curtis: a podcast about how applied data science, machine learning, and artificial intelligence are changing the world. Ginette: Data Crunch is produced by the Data Crunch Corporation, an analytics training and consulting company. Ginette: Many of you want to gain a deeper understanding of data science and machine learning, and Brilliant.org is a great place to dig deeper into these topics. Their classes help you understand algorithms, machine learning concepts, computer science basics, probability, computer memory, and many other important concepts in data science and machine learning. The nice thing about Brilliant.org is that you can learn in bite-sized pieces at your own pace. Their courses have storytelling, code-writing, and interactive challenges, which makes them entertaining, challenging, and educational. Sign up for free and start learning by going to Brilliant.org/DataCrunch, and also the first 200 people that go to that link will get 20% off the annual premium subscription. Let’s get into our conversation with Jordan Breslauer, senior director of data analytics and customer success at Social Standards. Jordan: My name is Jordan Breslauer. I'm the senior director of data analytics and customer success at Social Standards. I've always been a data geek as it pertains to sports. 
I think of Moneyball. When I was younger, I always wanted to be kind of the next Billy Beane, and when I started working for sports franchises right after high school and early college days, I just realized that that type of work culture wasn't for me, but I was so, so into trying to answer questions with data that had no previously clear answer, you know? I loved answering subjective questions like, what makes the best player, or how do I know who the best player is? And I thought what was always fun was to try and bring some sort of structured subjectivity to those sorts of questions through using data. And that's really what got me passionate about data in the first place. But then I just started to apply it to a number of different business questions that I always thought were quite interesting, which have a great deal of subjectivity. And that led me to Nielsen originally, where my main question that I was answering on a day-to-day basis was, what makes a great ad? Uh, what I found though is that advertising, at least, especially as it pertains to TV, is really where brands were moving away from, and a lot of the real consumer analytics that people were looking for were trying to underpin people in their natural environment, particularly on social media. And I hadn't seen any company that had done it well. Uh, and I happened to meet Social Standards during my time at Nielsen and was truly just blown away with this ability to essentially take a large input of conversations that were happening and bring some sort of structure to them to actually be able to analyze them and understand what people were talking about as it pertained to different types of topics. And so I think that's really what brought me here, was the fascination with this huge amount of data behind the ways that people were talking on social. 
And the fact that it had some structure to it, which actually allowed for real analytics to be put behind it. Curtis: It's a hard thing to do though. Right? You know, to answer this question of how do we extract real value or real insight from social media and you'd mentioned historically or up to this point, companies that that are trying to do that missed the mark.

11/21/2019 • 22 minutes, 30 seconds

Deep Learning, Microwaves, and Bugs

Sometimes AI and deep learning are not only overkill, but also a subpar solution. Learn when to use them and when not. Diego from Northwestern's Deep Learning Institute discusses practical AI and deep learning in industry. He covers insights on how to train models well, the difference between textbook and real AI problems, and the problem of multiple explanations.Diego Klabjan: One aspect of the problem it has to have in order to be, to be amenable to AI is complexity, right? So if you have, if you have a nice data with, I don't know, 20, 30 features that you can quote, put in a spreadsheet, right? So then, then AI is going to be an overkill and it's actually sort of not, is going to be an overkill. It's going to be a subpar solution.Ginette Methot: I’m Ginette, Curtis Seare: and I’m Curtis, Ginette: and you are listening to Data Crunch, Curtis: a podcast about how applied data science, machine learning, and artificial intelligence are changing the world. Ginette: Data Crunch is produced by the Data Crunch Corporation, an analytics training and consulting company.We’d like to hear what you want to learn on our future podcast episodes, and so we’re running a give away until our next podcast episode comes out. We’re giving away our book Simple Predictive Analytics. All you have to do is go on to LinkedIn and tag The Data Crunch Corporation in a post with your suggestion, and we’ll randomly pick a winner from those who submit. If you win and you’re in the US, we’ll send you a physical copy, and if you’re in another country, we’ll send you an electronic copy. Can’t wait to hear from you.Today, we chat with Professor Diego Klabjan the director of the Master of Science in Analytics and director of the Deep Learning Lab at Northwestern University. Diego: My name is Diego Klabjan. So I'm a faculty at Northwestern University in the department of industrial engineering and management sciences. I actually spend my entire career in academia. 
So I graduated from Georgia tech in '99, and then I spent six years at the university of Illinois Urbana-Champaign and got my tenure there. And then I was recruited here at Northwestern as a tenured faculty member a year later. So I'm at Northwestern for approximately 14 years. Yeah, so I'm the director of the master of science in analytics, actually founding director of the master of science in analytics, so I established the master's program back in 2010, and I'm directing it since then. And recently, I also became the director of the center for deep learning, which is a relatively new initiative at Northwestern. Sort of we, we are having discussions for the last year and a half, and about half a year ago, we officially kicked it off with a few founding members. So my expertise is in machine learning and deep learning. So I have, I run sort of a very big research program. So I advise more than 15 PhD students from a variety of, of departments and the vast majority of them do deep learning research. Yeah, so I started, I started deep learning what was around six, seven years ago. So I was definitely not sort of one of the, one of the early or the earliest faculty members conducting, studying, being attached to deep learning. But I wasn't that late to the game either. Right. So I still, I still remember approximately six, seven years ago attending deep learning conferences with like 50 attendees, and now, now those conferences are like 5,000 people. Just astonishing. Curtis: That's crazy. How you've seen that grow.Diego: Yup. Um, yeah, and I'm also, so the last word is ah, I'm also a founder of OPEX analytics, which is a consulting company. I no longer have much to do with the company, uh, but sort of have experience also on the business side. Curtis: Great. So this, uh, the deep learning Institute started about a year or two ago, is that right? Did I understand that right?Diego: Yeah, that's correct. I mean, so we,

11/8/2019 • 18 minutes, 38 seconds

Potential Advantages of Blockchain for Data Scientists

Luciano Pesci is bullish on blockchain and data science. Since blockchain offers a complete historical record, no one can delete or alter prior information written into the record. He sees this characteristic as a massive advantage for data scientists. Luciano Pesci: And the key for data scientists and leaders who are gonna oversee data sciences, you've got to get a narrow enough problem to demonstrate one quick win and I mean in 90 days. If in 90 days you can't come back to the organization and show, "we have made real progress on these metrics in your understanding so that you can make these decisions," they're not going to continue to do it. Ginette Methot: I’m Ginette, Curtis Seare: and I’m Curtis, Ginette: and you are listening to Data Crunch, Curtis: a podcast about how applied data science, machine learning, and artificial intelligence are changing the world. Ginette: Data Crunch is produced by the Data Crunch Corporation, an analytics training and consulting company.Ginette: No matter what your position in a company is, knowing about data, how it works, and what it can do for you is vital to the success of your organization. Fortunately there are ways for you and those in your organization to learn about data. Brilliant dot org, an online educational resource, has on-demand classes in data basics that can help you understand this growing area, providing you with tools and the framework you need to break up complex concepts into bite-sized chunks. You can sign up for free, preview courses, and start learning by going to Brilliant.org/DataCrunch, and also the first 200 people that go to that link will get 20% off the annual premium subscription. Ginette: The CEO of Emperitas, Luciano Pesci, joins us today. Let’s get right into the episode. Curtis: What inspired you to get into data? What inspired you to to start the company you're working at now and how'd you get going? Luciano: All of it was a complete accident. 
Yeah, none of it, not the schooling, the business, none of it was intentional. Curtis: Okay, let's hear about it. Luciano: My first business was actually recording studio and a record label, and I had signed, among other acts, my own band, and we got a management deal, and we went to LA. We started to tour with national acts, and I thought that was going to be my career path without a doubt, and so I didn't take the ACT/SAT at the time, barely graduated high school, and then the band fell apart. And I was like, "well, what am I going to do?" So I went back to school, had a transformative experience, got drawn into economics, and then within economics really found data. Curtis: And what drew you to economics? Luciano: I like studying people. I think it's the most complete picture of people. So there's a lot of other disciplines that sort of dive deeper when it comes to people's psychological characteristics, their behavioral components. But economics was about the entire system and how an individual functions within that bigger system. And the reason I got to data from that was that the key assumption of modern economics is perfect information. So this is usually where critics of what is called the classical model in economics come in and say, "well, you can't have perfect information, so therefore you can't have optimizing behavior." And one of the beautiful lessons of the last 20 years, especially with data science is it might not be perfect information, but you can get really good information to make optimized choices. And so the represented that, that method of going into the real world and optimizing all these processes that we were learning about in the textbooks and at the abstract theory level. Curtis: Interesting. And that's, there's not a lot of places, if any, that I know of that teach that approach, right? Or have good coursework around that. Did you kind of figure this out on your own or how'd you, how'd you come to that?

10/22/2019 • 25 minutes, 53 seconds

How to Predict World Events with Predata

There have been some spectacular fails when it comes to looking at Internet traffic, think Google Flu Trends; however, Predata, a company that helps people understand global events and market moves by interpreting signals in Internet traffic, has honed human-in-the-loop machine learning to get to the bottom of geopolitical risk and price movement. Predata uncovers predictive behavior by applying machine learning techniques to online activity. The company has built the most comprehensive predictive analytics platform for geopolitical risk, enabling customers to discover, quantify and act on dynamic shifts in online behavior. The Predata platform provides users with quantitative measurements of digital concern and predictive indicators for different types of risk events for any given country or topic. Dakota Killpack: Over the past few years, we’ve collected a very large annotated data set about human judgment for how relevant many, many pieces of web content are to various tasks. Ginette Methot: I’m Ginette, Curtis Seare: and I’m Curtis, Ginette: and you are listening to Data Crunch, Curtis: a podcast about how applied data science, machine learning, and artificial intelligence are changing the world. Ginette: Data Crunch is produced by the Data Crunch Corporation, an analytics training and consulting company. Let’s jump into our episode today with the director of Machine Learning at Predata. Dakota: My name is Dakota Killpack and I'm the director of machine learning at Predata, and Predata is a company that, using machine learning to look at the spectrum of human behavior online, organizes it into useful signals about people's attention, and we use those to influence how people make decisions by giving them a factor of what people are paying attention to. Because attention is a scarce cognitive resource. 
People tend to pay attention only to very important things. If they're about to act in a way that might cause problems for our potential clients, they'll, they'll spend a lot of time online doing research, making preparations, and by unlocking this attention dimension to web traffic, we're able to give some unique insights to our clients. Curtis: Can we jump into maybe a concrete use case into what you're talking about just to frame and put some details around how someone might use that service? Dakota: Absolutely. So one example that I find particularly useful for revealing how attention works online is looking at what soybean farmers did in response to tariffs earlier this year. So knowing that they weren't going to get a very good price on soybeans at that particular moment, a lot of them were looking up how to store their grain online and purchasing these very long grain storage bags, purchasing some obscure scientific equipment needed to insert big needles into the bags to get a sample for testing the soybeans, and moisture testing devices to make sure they wouldn't grow mold. And all of these webpages are things that tend to get very little traffic. And when we see an increase in traffic to all of them at the same time, we know that a very influential group of individuals, namely farmers, is paying attention to this topic. Using that, we're able to give early warning to our clients. Curtis: Sounds like looking for needles in a haystack of data. Right? So how do you determine what is a useful bit of information in the context of what your clients are looking for? Do they kind of have an idea of what you're looking for and then you'd go out and search for that or, or does your algorithm find anomalies in the data and then characterize those anomalies so that you can then report that back? How does it work? Dakota: It’s a mix of both. Because the, the Internet is such a rich and complex domain. 
It's, it's very dangerous to just look for anomalies at scale. There've been some high-profile failures, most notably the Google Flu Trends
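The soybean example above (several normally quiet pages all seeing more traffic at the same time) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Predata's method: the page names, the daily view counts, the seven-day baseline, and the "twice the baseline" threshold are all made up, and the pages are assumed to have been hand-picked as relevant by a person first, in line with the human-in-the-loop point.

```python
from statistics import mean


def simultaneous_spike_days(views_by_page, baseline_days=7, ratio=2.0):
    """Return day indexes where EVERY page exceeds `ratio` times its own baseline mean.

    The baseline for each page is the mean of its first `baseline_days` counts.
    """
    n_days = min(len(views) for views in views_by_page.values())
    flagged = []
    for day in range(baseline_days, n_days):
        if all(
            views[day] > ratio * mean(views[:baseline_days])
            for views in views_by_page.values()
        ):
            flagged.append(day)
    return flagged


# Hypothetical daily page views for three hand-picked, normally quiet pages.
views = {
    "grain-storage-bags": [10, 12, 9, 11, 10, 10, 8, 11, 40, 45],
    "moisture-testers": [5, 4, 6, 5, 5, 4, 6, 15, 20, 22],
    "grain-probes": [3, 3, 2, 4, 3, 3, 3, 3, 9, 12],
}
spike_days = simultaneous_spike_days(views)
print(spike_days)
```

Requiring all pages to rise together, rather than flagging any single page, is one simple way to avoid chasing isolated anomalies; a real system would need far more care about seasonality and noise.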

10/9/2019 • 16 minutes, 42 seconds

Structuring Your Data Science Dream Team

The way you organize your data science team will greatly affect your business’s outcome. This episode discusses different structures for a data science team, as well as top down versus bottom up approaches, how to get data science solutions into production organically, and how to be part of the business while remaining in contact with other data scientists on the team.Mark Lowe: Having lived through small scale, two people working, to large scale, thousands of people in your organization, the way that you organize the data science team has dramatic effect on its productivity.Ginette Methot: I’m Ginette, and I’m Curtis, and you are listening to Data Crunch, a podcast about how applied data science, machine learning, and artificial intelligence are changing the world. Data Crunch is produced by the Data Crunch Corporation, an analytics training and consulting company.Building effective data science processes is tough. Mode, the data science platform, has compiled three tips to make it a bit easier: don’t over plan, there’s no one process that fits everyone, and waste time. That’s right. Waste time. Read more at mode.com/dsp M O D E.com/D S P.Today we’re going to talk about effective ways you can organize your data science team, and we’ll hear lots of great insights from our guest. Let’s get to it.Mark: My name is Mark Lowe. I’m currently the senior principal data scientist here at Valassis.Curtis Seare: Describe just a little bit about what Valassis does.Mark: So we work with pretty much every major manufacturer retailer in the U.S. Our work kind of runs the gamut in terms of solving problems for them in terms of how do I influence customers. And so we manage a lot of print products that go reach every household, every week and of course a lot of digital products. So everything from display advertising, campaign, search campaign, social. 
Pretty much any distribution mechanism that can influence customers, we try to use those channels. Curtis: And in working on these problems, we talked a little bit earlier about what the approaches are for data science. Some people try to bin it in a software development kind of a role, an agile role, and that usually doesn’t work for data science ’cause it’s more of an experimental type of a thing. Can you comment on its similarities and differences and how you should be approaching data science? Mark: I think that’s a great question. Honestly, if you, if you asked me 10 years ago if this was an interesting question, I would have found it very boring. But having, having lived through small-scale, two people working, to large scale, thousands of people in your organization, the way that you organize the data science team has a dramatic effect on its productivity, and there’s no one size that fits all. Honestly, you kind of have to cater the organization of the data science team to where the company is. For example, the two common models that are deployed and, and we’ve, we’ve lived in both of them, is kinda thinking about data science as an internal consulting group. So I have a pool of data scientists. Stakeholders throughout the company come to me and ask, they say, “I have this problem. I think it needs data science,” and then the data science lead or team says, “Yes, we do need a data scientist working on that. Here’s a person with that specialty.” So kind of farming out individuals on the team to solve particular problems. So it’s a fairly centralized organization, and, you know, there’s a lot of benefits to that. One, you’ve got a strong sense of community as a team. Oftentimes you’re very tightly organized together. You function as a data science unit. You can try to make sure that you’re putting the right skillset for the right problem. As you know, as you’ve talked to that, there’s, there is no one definition of data science, there’s no one skillset. 
So oftentimes the data science team has a mixture of skills across the team,

9/26/2019 • 15 minutes, 30 seconds

The Hidden World of Data Science in Utilities

David Millar is a man bringing analytical solutions to an industry that historically has had little data. But with the explosion of smart devices, that is all changing, and the way utilities operate is as well. David Millar: The way that electricity markets work is that you have what's called the day-ahead market. And so the day before, let's say one o'clock tomorrow, markets run, and this is a big optimization problem. Ginette Methot: I'm Ginette, Curtis Seare: and I'm Curtis, Ginette: and you are listening to Data Crunch, Curtis: a podcast about how applied data science, machine learning and artificial intelligence are changing the world. Ginette: Data Crunch is produced by the Data Crunch Corporation, an analytics training and consulting company. Ginette: The father of lean startup methodology once said “There are no facts inside the building so get the heck outside.” The utilities industry is no different. Sometimes the facts that’ll make your machine learning career are waiting just outside your office. Read more at mode.com/MLutilities. m o d e dot com slash M L utilities. Ginette: David Millar is a man bringing analytical solutions to an industry that historically has had little data. But with the explosion of smart devices, that's all changing, and the way utilities operate is as well. Let's get into it. David: I'm, ah, Dave Millar. I am the director of resource planning consulting at Ascend Analytics where I lead the research client consulting team. And so my team and I work with utilities primarily to help them make decisions using analytics regarding their long-term power portfolio. So primarily I'm looking at, we'll say, we're retiring a coal plant or retiring a gas plant. What would we replace it with? Renewable energy? Do we need batteries? 
How do we approach these questions using analytics in order to help us come up with the best solution going forward? Curtis: You had talked a little bit about, you sent me some notes about how the, the sector that you're in, the power sector, you know, is kind of slow moving, right? It's not known for these quick changes and innovations, but you are starting to see some things that are gonna change this fundamentally. And so if we could jump into that and, and then get your perspective, I'd love to hear about it. David: Yeah, the power sector basically didn't change from the time they figured out that we're going to use alternating current. It didn't really change much in the past hundred years; the model is essentially the same. You have big power stations that are far away from the load centers, and then you have this transmission network, and the flow of electricity is really one direction, right, from, from the big power plants to your home. And technology is rapidly changing that, and the space is becoming both more digital and more decentralized. So, on the digital front, we, we actually have generation technologies that don't use any spinning parts, right? So you have solar, solar power, and you have, now we're seeing more and more batteries being connected to solar. And so those are both digital technologies that are increasingly becoming this default energy source, wind or solar and batteries, and just because the cost of these has dropped dramatically. It's really happened over the past 10 years. And so now renewables are at parity with the more conventional sources of electricity. So natural gas power, coal power. Curtis: Is that parity in terms of like how much energy they're currently producing, or just effectiveness or efficiency? What is that parity? David: Parity in terms of costs. 
So, you know, as renewables drop in costs, especially as batteries drop in costs, that means that when, when I look at a problem with my clients, we're comparing, technologies that essentially have the ability, similar attributes,

9/19/2019 • 19 minutes, 5 seconds

The Good Fight against Shadow IT

Simeon Schwarz has been walking the data management tightrope for years. In this episode, he helps us see the hidden organizational and economic impacts that come from leading a data management initiative, and how to understand and overcome the inertia, fears, and status quo that hold good data management back.Simeon Schwarz: Fighting against shadow IT . . . you have to find a way to adopt it, you have to find a way to incorporate it, and you have to find a way to leverage it. You will never be able to completely eliminate it. Ginette Methot: I'm Ginette.Curtis Seare: And I'm Curtis.Ginette: And you are listening to Data Crunch,Curtis: A podcast about how applied data science, machine learning and artificial intelligence are changing the world.Ginette: This might come as a surprise to some, but......tools won’t build a data-driven culture. The right people will. Read more at mode.com/datadrivenculture. m o d e dot com slash data driven culture.Ginette: Today we speak with Simeon Schwarz. He’s been working in data management for over twenty years and owns his own consultancy, Data Management Solutions.Simeon: Being in the data management function, you're de facto seeing the life blood of how the business flows, how the uh, where the information goes, how the decision are made. Curtis: So have you been focused mainly in a, in a specific industry or have you spend a lot in your career? Simeon: I've started in telecom. I've built first cell phone carrier back in my home country. I worked in academia, in a retail, ecommerce, and then 10 years in financial services, most recently, and now I do insurance. So a lot of different fields. Curtis: So you've run the gamut. That's interesting. And now that you've done this in several different fields, do you find that the principles and your approach is basically the same or or is it different depending on the problems that you're trying to solve? Simeon: The approach is the same, and there are two parts to this. 
We'll talk about what's difficult in this role a little bit further in this conversation. The second part is you really need to understand the domain you're dealing with because, if we're talking about data management in general, one of the key functions, one of the key challenges that you're going to be facing is establishing and building your credibility. Without knowledge of the domain, be it insurance or financial services or manufacturing or any other field, you simply can't have intelligent conversations with your stakeholders in a way that would lead to good conclusions. So you will absolutely have to know the domain, which is a large portion of your value. Curtis: So as you've gotten into a domain that maybe you weren't as familiar with in a data role, how did you overcome this need to understand the domain better? Simeon: Let's step back and talk about what data genuinely is right now and specifically talk about data management. You are running a data function, or sometimes called data services, because what used to be DBA teams or data analysts or various forms is really becoming a practice, and looking at it as a practice, you have a certain set of clients, they are paying you for the services, you have a certain amount of resources, and you're trying to optimize those resources to serve your clients better. So what are the challenges that you're going to face in any data management role? So you're in this interesting balance between moving forward very rapidly as well as not destroying what already exists, not destroying the services that are already provided. People have to breathe, people have to be able to, to live. You can't disrupt too much the services that already exist, your reports, your, you know, auditing work, your work with, you know, regulatory agencies. Anything else that the business needs to produce has to continue to happen. The people who are doing their jobs in the current way simil..

9/12/2019 • 22 minutes, 20 seconds

Using Data to Design Tests People Don’t Hate

David Saben is on a mission to make taking tests less painful, and he’s using data to do it. In this episode, he’ll discuss reviving methods developed in 1979 to shorten tests and make them more effective, as well as how to use psychometrics to aid in the design and crafting of an effective test. David Saben: When I see my son, who's 11 years old, spending three days in testing when I know there's absolutely no reason for it, that you can do that in an hour. Ginette Methot: I'm Ginette, Curtis Seare: and I'm Curtis, Ginette: and you are listening to Data Crunch, Curtis: a podcast about how applied data science, machine learning and artificial intelligence are changing the world. The father of lean startup methodology once said “There are no facts inside the building so get the heck outside.” The education industry is no different. Sometimes the facts that’ll make your machine learning career are waiting just outside your office. Read more at mode.com/mledu. m o d e dot com slash M L e d u. Ginette: Today we chat with David Saben, the CEO and president of Assessment Systems, an organization innovating psychometrics (the science of assessment). Dave: I originally started my career in telecommunications, uh, bringing voice and data services into institutions and to learning institutions. And then what I realized is that connecting universities and for-profit schools, you know, connecting them online really created a huge opportunity for learning and really crossing barriers to learn and really meeting learners on their terms with online learning courses. And that kind of brought me through this, this journey with using technology to, to really make better decisions in learning and knowledge and how we do that effectively. And that has started about a 16 year career focused on that, using data, using e tools to make a better learning environment for everybody and make us more effective in the way that we, we gather information and retain information. 
And that's led me, um, into several areas. One is in the learning sciences: how do you, how do you deliver learning content more effectively, but also on the assessment side as well, where, how do you measure what folks are learning effectively and painlessly. And that's brought me on this, uh, this journey into the assessment industry and really making sure that every exam that's delivered in classrooms, or whether it's a licensure exam, is as fast and as fair as possible and using data to be able to do that. So really mitigating the risk of human bias when it comes to measuring a human's abilities, uh, which is, uh, which is a troublesome area, right? Curtis: Yeah. And now you say effective and, and painless. And I know most people hate taking tests, so, so tell me how you approach that. Dave: Yeah. Well, I think there's a lot of ways. I mean, I think one of the, one of the most important ways is that you make the test faster, right? You know, in 1979, uh, the chairman of Assessment Systems helped create a technology called computerized adaptive testing. What that does is it uses algorithms to gauge what you know and what you don't know and then basically tailors the content that you see; the next item you see gets progressively more difficult or progressively easier depending on your, your ability. And what that does is that reduces test time by about 50%. We see that with the ASVAB exam that's given to our service men and women to make their testing experience faster and fair, and we're starting to see that really across the world with measurements. So really making those exams tailored to the person's ability, uh, which is really, really important. You know, what you don't want to do is you don't want to give one test that doesn't change to everyone ’cause that's really, really inefficient. You know, if I'm going through the test and I know I know the content really well,
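To make the adaptive idea Dave describes concrete, here is a minimal Python sketch. It is a simplified staircase rule (a harder item after a correct answer, an easier one after a miss), not the algorithm Assessment Systems uses; production computerized adaptive tests typically rely on item response theory to select items and estimate ability. The function name, the 0 to 9 difficulty scale, and the simulated test taker are all assumptions made for illustration.

```python
def run_adaptive_test(answers_correctly, num_levels=10, max_items=5):
    """Administer up to `max_items` items, adapting difficulty after each answer.

    `answers_correctly(level)` is a callable that returns True if the test taker
    gets an item of the given difficulty right (hypothetical stand-in for a person).
    Returns a list of (difficulty_level, was_correct) pairs.
    """
    level = num_levels // 2  # start in the middle of the difficulty scale
    history = []
    for _ in range(max_items):
        correct = answers_correctly(level)
        history.append((level, correct))
        if correct:
            level = min(num_levels - 1, level + 1)  # next item is harder
        else:
            level = max(0, level - 1)  # next item is easier
    return history


# Hypothetical test taker who can answer anything at difficulty 6 or below.
history = run_adaptive_test(lambda level: level <= 6)
for level, correct in history:
    print(level, "correct" if correct else "missed")
```

The difficulty quickly settles around the boundary between what the simulated person can and cannot answer, which is the intuition behind why an adaptive test can be shorter than a fixed one.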

9/4/2019 • 18 minutes, 51 seconds

Activating Analytics in Business and Government

Todd Jones: My name is Todd Jones. I'm the chief analytics officer here at WebbMason Analytics. We are a professional services firm helping our clients accelerate their analytic evolution. So I think my journey started about 10 years ago. Uh, I graduated from Princeton with a degree in operations research and financial engineering. So I could have basically taken two paths. One, I could have gone into the financial space, or the second path I could have taken was going into the analytics space, and I, and I chose the, the analytics space. I joined a very early company called Spry. When I joined, it was about four months old and primarily started off doing a lot of DOD contracting specific to analytics and data. And we eventually built that company to a pretty nice size. We expanded past the DOD space, got into commercial, started consulting with some large, uh, pharmaceutical companies, transportation companies, and really built that company up and then sold that in 2015.

Curtis: When you sold that, is WebbMason the company that then bought Spry?

Todd: Correct. So Spry was, again, another professional services firm specializing in data and analytics. WebbMason historically has been a marketing firm, and so they specialize in all aspects of marketing. And as you can imagine, analytics is definitely a big area of focus for them and their clients. And so they brought us in, and about 20% of our revenue comes from marketing-related activities through WebbMason, and then 80% of our revenue still comes from working with IT and analytics groups outside of the WebbMason portfolio.

Curtis: Interesting. Okay. So there was some crossover there, but not as much as you might expect.

Todd: Yeah, definitely some crossover without a doubt. So that was definitely beneficial. But you know, as, as I'm sure you can imagine with any acquisition, you learn a lot.
And so we're in a great spot right now, and we're able to generate a very healthy stream of business independently, but then also find those synergies with WebbMason as it relates to the marketing activities.

Curtis: Sure. That's awesome. So when you got started at Spry, ah, what, what was your role? What did, what did that look like?

Todd: Yeah, so when I got started, most of my role at that time was consulting. So I was working directly with our stakeholders, who at the time were within the Department of Defense. So I split my time between Crystal City, Virginia and the Pentagon. And really what we were trying to do was help them build a solution that gave them an enterprise view across the four military groups, specifically related to human resources. So if you think about it, when we, you know, when we fought World War II, you had, you know, one division, the Marines and the Navy, out in the Pacific, and then you had the Army in Europe, and they for the most part fought separate campaigns. And then we started to get into Iraq and Afghanistan, and all of a sudden all of these individuals started to really come together. And so you might look at a city block and you have the Air Force there, Army there, you know, Navy SEALs in the area. And so all of these groups now have to work very closely together. And one of the things that the DOD was trying to accomplish at that time was to start to get a better view of people across the different military branches. So, for example, if I need a particular skillset within a particular city block, can I get that skillset from the Navy? Can I get that skillset from the Army? Maybe the Marine Corps has that skillset. And so they needed a very, they needed a large enterprise view so that they could very easily and quickly start to develop these blended teams. And so that was definitely a combination of technology solutions as well as analytics solutions.
And so we were consulting with individuals within the Pentagon to help them build that technology solution.

Curtis: That's really interesting.

8/28/2019 • 13 minutes, 51 seconds

Last-Mile Logistics Analytics—for Everyone Who Isn't Amazon

Today we speak with Professor Ram Bala, an expert in supply chain management analytics, particularly last-mile delivery. He has very interesting insights into how today’s supply chain is evolving. He talks about various methods and algorithms he uses, the specific challenges inherent in doing last-mile logistics and delivery, how pricing factors in, and how everyone is trying to catch up to Amazon.

Ram Bala: Then there is this great opportunity to actually use the data effectively. But there is a long way to go in terms of coming up with the right algorithms, both on predictions as well as the optimization, to actually get this done in a meaningful way. And if you look at the landscape today in terms of industry, I would say very few companies are actually there yet. Right? I mean, Amazon obviously is a clear example of the leaders in the space, but everyone's trying to get there as well.

Ginette Methot: I'm Ginette.

Curtis Seare: And I'm Curtis.

Ginette: And you are listening to Data Crunch.

Curtis: A podcast about how applied data science, machine learning and artificial intelligence are changing the world.

Ram Bala: My name is Ram Bala. I'm a professor at Santa Clara University as well as a data science leader at CH Robinson, which is the largest logistics marketplace in North America. I've been working on topics in supply chain-related data science, even before it was called data science, for the past 15 years. I got my Ph.D. in operations research and supply chain from UCLA, and, uh, I've been working on these problems both for companies as well as within the academic context, where I've been working on research problems. And more recently I think there's been a lot of excitement in this space.
And that's where my involvement with both startups as well as larger companies has gone up, and I, I came into the CH Robinson fold as a consequence of an acquisition. So I was part of a startup that was working on last-mile logistics and how to, how to improve that.

Curtis Seare: Got it. That's awesome. And the space that you're in is really interesting. Could you give the audience, just to contextualize, the problem set that you're focused on?

Ram: So I think one of the major things that has changed in logistics is the growth of e-commerce and also personal mobility. I mean, if you think about Uber, logistics as a larger concept covers both moving people as well as products, and what's really happened is the, the availability of real-time data has had significant consequences on how we are able to predict as well as optimize how we move things, and that's then also raised the bar in terms of customer expectations. We expect to get a ride to go somewhere within five minutes, we expect to get a product within a day, and those expectations have been set by specific companies: say, Uber in the case of personal mobility. In the case of products, it's Amazon, and having set the stage, everyone's now trying to be competitive with them, which means that in the product space, certainly all e-commerce companies as well as companies that were in brick and mortar are trying to achieve that same end goal, which is: how do I get products to consumers quickly and at the same time not spend too much money? Right? That's the core problem. Now, doing that is hard. It's become easier simply because we have real-time access to data in terms of location as well as, you know, where products are at any given point. But it is a hard problem to solve.

Curtis: There's some intricacy in, you know, routing and pricing and kind of the interplay there. Can we dive into a little bit of those details?

Ram: Absolutely.
So I think uh, routing problems have been around ever since transportation's been around,

8/21/2019 • 23 minutes, 34 seconds

Running a Successful Machine Learning Startup

Today, our guest, Alain Briancon, will talk to us about how to work with Fortune 500 companies and help them get quick value from their data, how to build a roadmap of incremental value during the data collection and analysis process, how they help predict and incentivize customer purchases, and how to dial in on an idea for successful data science software companies.

Alain Briancon: Adding one more question to answer is always easy. The difficult part is what question can I remove and still provide insight.

Ginette Methot: I'm Ginette.

Curtis Seare: And I'm Curtis.

Ginette: And you are listening to Data Crunch.

Curtis: A podcast about how applied data science, machine learning and artificial intelligence are changing the world.

Ginette: If you’re a Fortune 1000 company, and your team needs to be trained in Tableau, Statistics, Data Storytelling, or how to solve business problems with data, we’ll fly one of our expert trainers out to your site for a private group training. The most important investment a business can make is in its people, so head over to our site at datacrunchcorp.com and check out our training courses.

Alain: My name is Alain Briancon. I am currently the VP of data science and chief technology officer for CEREBRI AI. CEREBRI AI is an AI company, as you could guess from the name. We are located in three cities: Austin, which is the corporate headquarters; Toronto, which is a hotbed of data science in North America; and Washington DC, where I work. What CEREBRI AI focuses on is developing a system to help manage both the strategic component as well as the tactical component of customer experience.
This is my fifth startup. This is my third startup that involves data science and machine learning. Jean Belanger, who is the CEO of CEREBRI, is a friend of mine; now he's my boss. So I'm trying to work through that, and it took him about 19 years to convince me to join a startup, uh, with him. And this was the right opportunity because the kind of problems we are solving are very challenging. It has been a, an absolute blast, besides working with a great team and building it up. When I joined we were about 20 people. Now we're about 63 people, about 50 of them on the technical side: half in data science, half in software. What has been fantastic is applying tricks and insight that I've gained over the years to, uh, help guide the data science side. The other thing also, which is fun, is we have a very pragmatic view of how to approach things and how to approach engagement with customers. Our customers are Fortune 500 customers; they are major banks. One of them is a central bank. Others are car makers, and we're working very hard to get into the telco business as well. And, uh, when you deal with such companies, first of all, there's a very interesting sales cycle in which data science and machine learning play a role at the right moment in time. But you have to also be humbled by the fact that you don't start on their side from a clean sheet. And I think that's one of the most interesting components of making things work: bringing data science and machine learning insight to companies who cannot afford, and should not afford, the "okay, let's start from scratch, let's share all of the data, and the like." And so this jujitsu between the business case that machine learning brings up and the underlying machine learning technology is one of the most fun elements of the work.

Curtis: That's interesting. Let's, let's dig into that if we can. Can you give me a concrete example in CEREBRI AI of how that works and spell out that concept for us?

8/10/2019 • 26 minutes, 37 seconds

Executive Panel: How Can Data Science, ML, and AI Best Support Executive Goals

Today is a special episode. We welcome three executive guests from different organizations to share their experiences and insights about how data science can best support executive goals.

Ginette Methot: I'm Ginette.

Curtis Seare: And I'm Curtis.

Ginette: And you are listening to Data Crunch.

Curtis: A podcast about how applied data science, machine learning and artificial intelligence are changing the world.

Ginette: Data Crunch is produced by the Data Crunch Corporation, an analytics training and consulting company. There's a lot going on here at Data Crunch. Just this last week we finalized the merger of Vault Analytics and Lightpost Analytics under the new banner of the Data Crunch Corporation, which improves our capabilities to serve our clients. Head over to datacrunchcorp.com to check out our training and consulting offerings. For our executive panel today, we'll be talking to Simon Lee, the chief analytics officer from Waiter; Fatma Kocer, who is the vice president of data science engineering at Altair; and Rollen Roberson, who is the president at Trianz.

Curtis: So, welcome everyone to the executive panel. We are super excited to have you guys here. You are all executives at companies that are doing amazing things with data science. So the audience knows, again, what we're talking about today, the topic is how data science, machine learning and AI can best support executive strategy and business goals. How, how does that function really work? Let's start maybe with Simon and then Fatma and then Rollen, if you could just give us a little introduction, and we'll get going from there.

Simon Lee: Thanks. I'm Simon Lee. I'm actually kind of a mixed bag when it comes to data science and analytics. I've got about 20 years of experience using analytics and advanced algorithms, you know, in a whole bunch of different industries like transportation, for example, airline, rail, trucking, ocean carriers, printing, publishing, manufacturing, finance and delivery.
Delivery is where I'm currently at. Waiter is a restaurant food delivery company in small and mid-size markets. So probably a lot of people haven't heard of us because we're in the smaller communities, but we're trying to make a big splash. So yeah, that's who I am.

Curtis: Awesome. Thanks for being here.

Fatma Kocer: Hi, this is Fatma Kocer from Altair Engineering. I am a civil engineer by training, although I never got a chance to practice it. Um, my background is multidisciplinary design exploration and optimization. And I was in the auto industry before I joined Altair. Um, there, I've done several things throughout the 14 years that I've been here, but always keeping design exploration and optimization as the core of my responsibilities. And Altair is a global technology company. We provide software and solutions for product development, data intelligence and high-performance computing. Our headquarters are in Troy, Michigan, where I'm speaking from, and we have offices in I think 25 countries now. So that would be me.

Curtis: Great. Thanks for being here, Fatma, and, ah, Rollen.

Rollen Roberson: Right. Thank you. Good morning. Rollen Roberson with Trianz. You know, for my own background, similar to Simon, I'm kind of a mixed bag. I've been in the industry for 20-plus years, solely in the digital transformation space, uh, working from startups and mid-level companies through global service integrators, uh, working with Trianz currently to really expand the growth and use of AI and IoT within the organization and our customer base. Trianz is a company that has 1,500-plus employees and global offices, mainly serving the upper mid-tier and enterprise-level customer base, uh, solely focused on digital transformation and the use of those higher technologies for greater return on value.

How Can Data Science and AI Have an Impact On Your Business

Curtis: That's awesome.

7/26/2019 • 43 minutes, 14 seconds

The Biggest Pitfalls of New Analytical Initiatives

Our guest Andrzej Wolosewicz has had years of experience helping companies define and build machine learning and analytical solutions that have a measurable impact on the business, and he shares with us his experience and expertise, including the biggest pitfalls he sees companies fall into over and over as they try to implement these initiatives.

The problem was there was a lot of activity every month that they were doing, but in terms of progressing their analytic capabilities, or really kind of being able to grow and be more effective, they weren't, they weren't able to do that. As the saying goes, they had a lot of action but not a lot of progress.

Ginette Methot: I'm Ginette.

Curtis Seare: And I'm Curtis, and you are listening to Data Crunch, a podcast about how applied data science, machine learning, and artificial intelligence are changing the world.

Andre Wolosewicz: My name is Andre Wolosewicz. I am currently the director of sales at HEXstream. We are a Chicago-based analytics and data consultancy. But this is kind of the, the latest step on my journey. So I actually started out coming straight out of college into a, a predictive modeling startup. And this would have been in the late nineties. Artificial intelligence at the time was, was a big buzzword, as it is today. And we were looking at being able to do fairly advanced modeling of systems, but actually looking at the data as being the model. So if you were looking at, uh, everything from a jet engine to the human body to complex refineries, we didn't necessarily understand all the nuances of how they ran, but we had all the data, and so we would use that data to build out those models. And then I ended up going from that to actually flipping into the, into the other side of the world, around program management. So, not so much doing the analysis, but understanding how the analytics and designs and all of those steps fit together to actually deliver a finished product.
And so that was, that was very useful because it taught me that, hey, there's a lot more: you may find things that are interesting, but on the business side of the world you have all of the constraints that analysts may not always be aware of, or may not, you know, really want to take into consideration, like budgets, schedule, things of that nature. And so I learned how to operate with that. And then, in another interesting twist of fate, I met somebody who knew somebody who was looking for somebody that could provide that line-of-business experience, but actually selling a business intelligence platform. Not necessarily someone who knew how all the software worked, you know, if you click here, this happens, if you click here, that happens, but who could sit across the table from somebody who was in a line of business and say, I understand the business problem you're having. I understand how to solve it, and here's how the technology can be applied. Because the, the reality is technology in and of itself will never solve a problem. It needs people, it needs processes, it needs the people to use it. My dad used to like to look at a rake and say, well, the yard's not going to rake itself, so the rake does the job, but it needs somebody to use it. After about five, six years actually selling and being involved with the BI platform, the opportunity to join HEXstream came up, and for me, this was kind of a combination of all of the past experiences because it gave me the opportunity to engage with clients and engage with our internal teams on what is it that you're trying to do. So going back to my first experience, what is the project? What is the model? What is the data that you're trying to work with and build? But then I also had to understand why that was relevant. Why would a client engage with a company like HEXstream to undertake a project? How is that project measured? There's a lot of things that over the years I've found people would love to do,

7/20/2019 • 27 minutes, 27 seconds

Digital Credentials and Machine Learning Aim to Change How You Hire

Today we’re going to see how a clever idea and the skillful use of data is starting to disrupt how people get credentials. The use case here has the potential to remove gender and racial bias in the hiring process, help companies understand specific talent gaps in their workforce, and help learners find lucrative educational pathways they can take.

7/12/2019 • 19 minutes, 57 seconds

How to Win Hearts and Minds as a Data Leader

Joe Kleinhenz talks about his journey from starting out in data all the way to becoming a leader in one of the largest insurance organizations in the United States. We'll learn about the importance of staying on top of technology, how to win the hearts and minds of nontechnical folks, the pros and cons of centralized versus decentralized teams, how to hold effective conversations with stakeholders, and how to go from individual contributor to leader.

Joe Kleinhenz: The critical skill you bring to the table is the ability to break down complex ideas into ones that translate for nontechnical folks.

Ginette Methot: I'm Ginette.

Curtis Seare: And I'm Curtis.

Ginette: And you are listening to Data Crunch, a podcast about how applied data science, machine learning, and artificial intelligence are changing the world. Data Crunch is produced by the Data Crunch Corporation, an analytics training and consulting company. One of the biggest challenges companies have in getting value from their data is finding the right talent. Good talent is scarce, and building a top-tier team is hard if not impossible for some companies. If you are having this challenge, try out our analytics-as-a-service offering: we bring a fully equipped data science team to bear on your projects, on demand and with no long-term contract constraints. If you want to start seeing success for your data science efforts quickly and economically, head over to datacrunchcorp.com for more details.

Today we'll be hearing about Joe Kleinhenz's journey from starting out in data all the way to becoming a leader in one of the largest insurance organizations in the United States. We'll learn about the importance of staying on top of technology, how to win the hearts and minds of nontechnical folks, the pros and cons of centralized versus decentralized teams, how to hold effective conversations with stakeholders, and how to go from individual contributor to leader.
There's lots to unpack in this episode, so let's get to it.

Curtis: If we could just start out just by talking about what got you interested in data in the first place, where your journey started, and we can go from there.

Joe: I actually first started thinking about using math to predict future outcomes when I was a teenager. I read a book by Asimov called Foundation, and the whole premise of the book series was, um, using mathematics to predict the future. It was all science fiction stuff at that point, but that's kind of what certainly got me first interested in it.

Curtis: So it was a, it was a work of fiction that got you interested.

Joe: Yeah, that captured my imagination. I didn't even at that point know, you know, data science was a thing, and as I got my path into technology, within IT, I was doing business consulting for a while and got into data warehousing, and this was in the late nineties. From there, I ended up in a part of GE Financial that was doing a lot of direct marketing, and they had a group called database marketing, which was essentially the precursor for data scientists. They had predictive modelers, statisticians essentially, in there that were using, by today's standards, relatively simplistic tools like linear regression to build, you know, models predicting who would respond to direct-drip marketing offers. I used to joke with people that I ran a team of bad people that decided to call you at dinner with an offer. You can just have the here. Um,

Curtis: And you made those people very effective at, at being bad, I assume.

Joe: Yes. Yes. At that point there were very few restrictions on what you could do. We were even using credit data for some of the, the algorithms 'cause we were with credit card companies. Credit data at the time, there weren't the regulatory restrictions there are now, it's incredibly predictive.
When you combine that with recency frequency data on purchasing behavior, you'd really kind of tune in on, you know, what someone would be interested in.

6/29/2019 • 21 minutes, 47 seconds

Building Data Products that Work in the Health and Wellness Industry

Our guest today holds a PhD in organizational psychology and has been working on data products in the health and wellness space for over a decade. We cover a lot of ground in this interview: how to create data products that work, how to avoid the unexpected consequences of poorly designed data interventions, and the importance of ethnographic thinking in data science. We'll also talk about reducing friction in data collection, the coaching data product model, and surprising things we can learn when people's routines are broken. From today's episode, you'll come away with a better understanding of how to build contextually relevant data products that make a difference in people's lives.

6/1/2019 • 19 minutes, 40 seconds

The Road to a Data-Driven Culture in Your Organization

How do you whittle the murky business of creating a data-driven culture down to a proven process? Today we talk to a guest who has done this time and time again, helping companies transform their operations. He points out the small nuances and details about the process, like questions to ask to start on the right foot, critical feedback loops to put in place along the way, and how to overcome some of the most common problems that make people give up.

Ginette: I’m Ginette.

Curtis: And I’m Curtis.

Ginette: And you are listening to Data Crunch.

Curtis: A podcast about how applied data science, machine learning, and artificial intelligence are changing the world.

Ginette: Data Crunch is produced by the Data Crunch Corporation, an analytics training and consulting company. Now, let's jump into our interview with Ryan Deeds, VP of technology and data management at Assurex Global.

Ryan Deeds: Uh, I think it's an interesting time in the whole, uh, data experience because I think so many people failed, you know, in the last, like, decade that in this next couple of years everybody's now trying to look at root cause. And so culture actually is becoming important now, you know?
And so that's kind of a cool thing.

Curtis Seare: What do you mean by that, in terms of a lot of people have failed?

Ryan Deeds: I think when you look at BI projects from 2003 to 2013, they were just, companies went through a litany of failures in trying to get data to a place that made sense, was easily accessible, had, had a good quality. Um, but they didn't address that. They just put the visualizations on top of kind of crappy data, and they did that over and over and over again. Um, and then finally it seems like, you know, in the last year or two years, we started really having a conversation about what has to happen inside an organization to make data usable. I mean, it's just like water, right? You can't just take water from a stream and start drinking it. You've got to process it and clean it and make it, and make it valuable and make it worthy of consumption. And that's exactly the thing we've got to do with data.

Curtis Seare: Sure. Maybe we can dive into that as well, because you've had this experience taking a lot of companies through those steps, right? So what do you see as the major roadblocks? How do you start this process of helping people get their hands around "how do I get value from my data?"

Ryan Deeds: So it's interesting. I kind of have, uh, you know, I've done this a lot, and so I have, uh, organizations that come to me and they say, hey, you know, we want to, we're ready to start leveraging data. Um, and the, the typical thing is there's just a lack of expectation of the time it takes. Um, and so I threw together, like, a timeline to try to help, uh, educate individuals on that, you know, and kind of like the steps that it would take to get to usable data, um, and the first is really a recognition that today we don't, you know, the organization that we're in is not effectively using data, um, as a, as a strategic advantage.

5/1/2019 • 24 minutes, 17 seconds

Statistics Done Wrong—A Woeful Podcast Episode

Statistics are misused and abused, sometimes even unintentionally, in both scientific and business settings. Alex Reinhart, author of the book “Statistics Done Wrong: The Woefully Complete Guide,” talks about the most common errors people make when trying to figure things out using statistics, and what happens as a result. He shares practical insights into how both scientists and business analysts can make sure their statistical tests have high enough power, how they can avoid “truth inflation,” and how to overcome multiple comparisons problems.

Ginette: In 2009, neuroscientist Craig Bennett undertook a landmark experiment in a Dartmouth lab. A high-tech fMRI machine was used on test subjects, who were “shown a series of photographs depicting human individuals in social situations with a specified emotional valence” and asked “to determine what emotion the individual in the photo must have been experiencing.” Would it be found that different parts of the brain were associated with different emotional associations? In fact, it was. The experiment was a success. The results came in showing brain activity changes for the different tasks, and the p-value came out to 0.001, indicating a significant result.

The problem? The only participant was a 3.8-pound, 18-inch mature Atlantic salmon, who was “not alive at the time of scanning.”

Ginette: I’m Ginette.

Curtis: And I’m Curtis.

Ginette: And you are listening to Data Crunch.

Curtis: A podcast about how applied data science, machine learning, and artificial intelligence are changing the world.

Ginette: Data Crunch is produced by the Data Crunch Corporation, an analytics training and consulting company.

Ginette: This study was real. It was real data, robust analysis, and an actual dead fish. It even has an official-sounding scientific study name: “Neural correlates of interspecies perspective taking in the post-mortem Atlantic Salmon.”

Craig Bennett did the experiment to show that statistics can be dangerous territory.
They can be abused and misleading, whether or not the experimenter has nefarious intentions. Still, statistics are a legitimate and powerful tool to discover actual truths and find important insights, so they cannot be ignored. It becomes our task to wield them correctly, and to be careful when accepting or rejecting statistical assertions we come across.

Today we talk to Alex Reinhart, author of the book “Statistics Done Wrong: The Woefully Complete Guide.” Alex is an expert on how to do statistics wrong. And incidentally, how to do them right.

Alex: We end up using statistical methods in science and in business to answer questions, often very simple questions, of just “does this intervention or this treatment or this change that I made, does it have an effect?” Often in a difficult situation, because there are many things going on. You know, if you're doing a medical treatment, there are many different reasons that people recover in different times, and there's a lot of variation, and it’s hard to predict these things. If you’re doing an A-B test on a website, your visitors are all different. Some of them will want to buy your product or whatever it is, and some of them won’t, and so there’s a lot of variation that happens naturally, and we’re always in the position of having to ask, “This change I made or intervention I did, does it have an effect, and can I distinguish that effect from all the other things that are going on?” And this leads to a lot of problems, so statistical methods exist to help you answer those questions by seeing how much variation is there naturally, and this effect I saw, is it more than I would have expected had my intervention not worked or not done anything? But it doesn’t give you certainty. It gives us nice words, like “statistically significant,” which sounds important, but it doesn't give you certainty. You're often asking the question, “Is this effect that I’m seeing from my experim...
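
The multiple comparisons problem mentioned in the episode summary can be made concrete with a little arithmetic. The Python sketch below assumes the tests are independent and that every null hypothesis is true, which is a simplification; the function names are hypothetical, chosen for this example. It computes the chance of at least one false positive across m tests, 1 - (1 - alpha)^m, and the Bonferroni-adjusted per-test threshold alpha / m, one common and conservative correction.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n_tests tests.

    Assumes independent tests and that no real effect exists in any of them.
    """
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under the Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


if __name__ == "__main__":
    for m in (1, 20, 100):
        rate = familywise_error_rate(0.05, m)
        cutoff = bonferroni_threshold(0.05, m)
        print(f"{m:>3} tests: P(any false positive)={rate:.3f}, "
              f"Bonferroni cutoff={cutoff:.5f}")
```

Under these assumptions, running 20 tests at a 0.05 threshold gives roughly a 64% chance of at least one spurious "significant" result, which is why an uncorrected scan over many measurements can appear to find effects that are not there.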

3/27/2019 • 21 minutes, 27 seconds

Getting into Data Science

What does it take to become a data scientist? We speak with three people who have become data scientists in the last three years and find out what it takes, in their opinions, to land a data science job and to be prepared for a career in the field.
Curtis: We’ve talked a lot in our recent episodes about all the interesting things you can do with data science, and we’ve only talked a little bit recently about what it actually takes to get into the field, which is a topic that a lot of you have reached out to us and asked us to cover in a more thorough way. So today, we’re taking a broader approach on this topic by talking to three data scientists who have become data scientists in the last three years. You’re going to be able to hear all the details of each of their three journeys, how they got started, how they landed their jobs, and what their best advice is for getting into the field, and this will give you a broad view about how to get into data science from three people who have actually done it.
Ginette: I’m Ginette.
Curtis: And I’m Curtis.
Ginette: And you are listening to Data Crunch.
Curtis: A podcast about how applied data science, machine learning, and artificial intelligence are changing the world.
Ginette: A Vault Analytics production.
Ginette: Here at Data Crunch we’ve been hard at work developing a technology that allows executives and business leaders to gain insight from their data instantly, simply by talking to the air. We hook up your data to an Alexa device with custom skills built in to understand the questions you have about your business and give you answers. Figure out sales forecasts, marketing performance, operational compliance, progress on KPIs, and more by just talking to Alexa. We are officially launching the product this week and have room for three initial customers. If you're interested, head over to datacrunchcorp.com/alexa or datacrunchpodcast.com/alexa (both work), and book some time to chat with us. 
We’ll assess if your company is a good fit, and if so, we look forward to working with you!
Tyler Folkman: My name’s Tyler Folkman. I've gotten into data science in kind of a strange route to be honest. I did my undergrad in economics, actually originally thinking to get into computer science, but for some reason, I had this thought that computer science was going to get outsourced; I don't know if that was a thing, but I think people back in the early 2000s were talking about computer science getting outsourced, so I thought about business, which ended up being economics, which I really liked, and then ended up doing economic consulting, which is, basically in usually large litigation cases, lawyers hire economists to value damages, so for example, when Samsung and Apple were suing each other, I worked on the Samsung side to help value how much they might sue Apple for, for patent infringement, and a lot of that involves statistical analyses, data analytics, econometrics as economists would call it. And I got really interested in just this idea of data being a really powerful tool for making decisions and coming to conclusions, and so I started hearing about machine learning on the Internet, kind of dabbling with Python, which at the time, I was a Windows user, and it was a huge pain to get Python installed, but I kind of got it up and running, played around with things like scikit-learn, read some blogs, and really got into machine learning and found that it was really housed more in the computer science department at that time, and just kind of decided to apply to some computer science departments and was lucky to get in at University of Texas at Austin and do some studies there, join a machine learning lab and got to do some work at Amazon. Really got a really good set of experiences to kind of help me learn how to be both a programmer and a machine learning person, a little bit of statistics, and jumped straight from there over here to Ancestry and was luc...

3/1/2019 • 22 minutes, 51 seconds

Automated Machine Learning with TransmogrifAI

Would you rather take a year to develop a proprietary algorithm for your company that has an accuracy of 95% or use an open source platform that takes a day to develop an algorithm that has nearly the same accuracy? In most business cases, you'd choose the latter. In this episode, we talk to Till Bergmann, who works on a team that developed TransmogrifAI, an open source project that helps you build models quickly.

1/31/2019 • 12 minutes, 49 seconds

The Data Scientist's Journey with Nic Ryan

What does it take to become a data scientist? Nic Ryan has been in the field for over a decade and answered thousands of questions from people looking to get into the field. In this episode, he talks about his journey into data science and his experience mentoring aspiring data scientists, giving advice to both beginners and seasoned professionals.
Nic Ryan: I think there's sometimes a problem in data science education, and what people find interesting is they tend to focus on the algorithms, which as you know from doing data science projects is really just the last little bit. There's tens or even sometimes hundreds of decision steps that are made until you get to that particular point.
Ginette: I’m Ginette.
Curtis: And I’m Curtis.
Ginette: And you are listening to Data Crunch.
Curtis: A podcast about how applied data science, machine learning, and artificial intelligence are changing the world.
Ginette: A Vault Analytics production.
Curtis: Let’s introduce you to our guest: Nic Ryan. He is an experienced data scientist and LinkedIn influencer who has helped a lot of aspiring data scientists in their journey into the profession. He’s been part of many different data teams, small and large, in big companies and startups, and he wrote a book called “The Data Scientist's Journey: The Guide for Aspiring Data Scientists,” which is based on the thousands of questions he’s been asked about becoming a data scientist.
Nic: It started off with failure. Originally, I wanted to go over to the States to play basketball, so I’m a failed basketball player, and there’s a couple reasons why I didn’t make it: one is I wasn’t tall enough to be a small forward, which is a bit ironic. I’m only 6’2”, but probably the more important reason is I wasn’t very good, but I didn’t know that at the time, so I didn't get a scholarship to play basketball, but I did get a scholarship to do actuarial studies. So it’s not a bad backup plan. 
But from there, I ended up falling into more of the stats side of things, of insurance, so the statistical modeling, pricing, fire, and theft, I really enjoyed that kind of stuff, so over time, I did more of that. Did some of my post-grad actuarial exams, and I was doing some reading on the weekends and finding out more about stats and a bit about code and a bit about R, and what really did it for me was having an incredibly long train ride to get to work. It was a couple hours each way, and so this is of course, this is the era of MOOCs, and rather than just talking to people, I just ended up joining the MOOCs, and so, really enjoyed that, and this whole thing of data science has just kind of grown around me, and I ended up working for one of the banks and doing their credit scoring and consulting with different banks for a long period of time, and I got a call out of the blue to, a guy just gave me a plane ticket and said come talk to us. So I flew there, and they offered me what was really a head of data science role, so there was a team overseas and a couple teams in Australia doing data science, and yeah, we did some pretty awesome things with NLP and bank statements and built some pretty sophisticated risk models; it was probably best in the country at that time. It’s about 60 miles away from Sydney where I worked, and so it was a real opportunity. It was probably two hour door to door each way, and that was the other thing as well: that was a long time away from family, which wasn’t cool. I had a couple young kids. That’s part of the reason I have my own business now is that I’ve spent too much time away from my daughters. The result of it being I had a whole heap of dead time that I could either use or not use, and so I was able to teach myself code and teach myself some more stats and machine learning and stuff pretty quickly when you have a couple hours of dead time each day, you become pretty good, pretty quickly,

12/28/2018 • 19 minutes, 33 seconds

Cutting-Edge Computational Chemistry Enabled by Deep Learning

Machine learning is becoming a bigger part of chemistry as of the last two or three years. Industries need to have people trained in both fields, and it's taken time for them to make their way into this sector. Olexandr Isayev is at the forefront of that wave, and he talks to us about what he's done while melding deep learning and chemistry together and his vision of where he sees this field going with this new tech.

11/27/2018 • 17 minutes, 42 seconds

Python and the Open Source Community

Python versus R. It's a heated debate. We won't solve this raging controversy today, but we will peek into the history of Python, particularly in the open source community surrounding it, and see how it came to be what it is today: a well-used and flexible programming language.
Travis Oliphant: Wes McKinney did a great job in creating Pandas . . . not just creating it but organizing a community around it, which are two independent steps and both necessary, by the way. A lot of people get confused by open source. They sometimes think you're just kind of going to get people together and open source emerges from the foam, but what ends up happening, I’ve seen this now at least eight, nine different times, both with projects I’ve had a chance and privilege to interact with, but also other people's projects. It really takes a core set of motivated people, usually not more than three.
Ginette: I’m Ginette.
Curtis: And I’m Curtis.
Ginette: And you are listening to Data Crunch.
Curtis: A podcast about how applied data science, machine learning, and artificial intelligence are changing the world.
Ginette: A Vault Analytics production.
Ginette: This episode of Data Crunch is supported by Lightpost Analytics, a company helping bridge the last mile of AI: making data and algorithms understandable and actionable for a non-technical person, like the CEO of your company.
Lightpost Analytics is offering a training academy to teach you Tableau, an industry-leading data visualization software. According to Indeed.com, the average salary for a Tableau Developer is above $50 per hour. If done well, making data understandable can create breakthroughs in your company and lead to recognition and promotions in your job.
Go to lightpostanalytics.com/datacrunch to learn more and get some freebies.
Here at Data Crunch, we love playing with artificial intelligence, machine learning, and deep learning, so we started a fun new side project. 
We just launched a new podcast that tests the boundaries of what can be done with Google’s cutting-edge deep learning speech generation algorithms. We use surprisingly human-like voices to host the podcast that reads all the unusual Wikipedia articles you haven’t had a chance to read yet, like chicken hypnosis, the history of an amusing German conspiracy theory, strange trends in Russian politics, and much more to come. It’s worth a listen to hear what this tech sounds like, and you’ll learn unique and bizarre trivia that you can share at your next dinner party. Search for a podcast called “Griswold the AI Reads Unusual Wikipedia Articles,” now found on all your favorite popular podcast platforms.
Curtis: There has been a heated, ongoing debate about which programming language is better when working with machine learning and data analytics: Python or R, and while we won’t be wrestling with that particular question, we will overview a bit of history for both and then dive into significant history behind one of these languages, Python, with a major contributor to the language, a man who significantly influenced the way that data scientists use Python today.
Ginette: As a very short historical background, Python came to the scene in 1991 when Guido van Rossum developed it. His language has developed a reputation as easy to use because its syntax is simple, it’s versatile, and it has a shallow learning curve. It’s also a general purpose language that is used beyond data analysis and great for implementing algorithms for production use. As for R, it followed shortly after Python. In 1995, Ross Ihaka and Robert Gentleman created it as an easier way to do data analysis, statistics, and graphic models, and it was mainly used in academia and research until more recently. It’s specifically aimed at statistics, and it has extensive libraries and a solid community.
As a controversial side note, according to Gregory Piatetsky-Shapiro’s KDnuggets poll, late last year,

10/24/2018 • 24 minutes, 50 seconds

Machine Learning, Big Data, and Your Family History

How can artificial intelligence, machine learning, and deep learning benefit your family? These technologies are moving into every field, industry, and hobby, including what some say is the United States' second most popular hobby, family history. Today, it's so much easier to trace your roots back to find out more about your progenitors. Tyler Folkman, senior manager at Ancestry, the leading family history company, describes to us how he and his team use convolutional neural networks, LSTMs, conditional random fields, and the like to more easily piece together the puzzle of your family tree.
Ginette: Today we peek into an area rich in data that has lots of interesting AI and machine learning problems.
Curtis: The second most popular hobby in the United States, some claim, is family history research. And whether that’s true or not, it has had a lot of growth recently. Personal DNA testing products have exploded in popularity over the past three years, but beyond this popular product, lots of people go a step further and start tracing their roots back to piece together the puzzle of their family tree. Today we’re going to dive into the data side of this hobby with the leading family history research company.
Ginette: I’m Ginette.
Curtis: And I’m Curtis.
Ginette: And you are listening to Data Crunch.
Curtis: A podcast about how data and prediction shape our world.
Ginette: A Vault Analytics production.
Ginette: This episode of Data Crunch is supported by Lightpost Analytics, a company helping bridge the last mile of AI: making data and algorithms understandable and actionable for a non-technical person, like the CEO of your company.
Lightpost Analytics is offering a training academy to teach you Tableau, an industry-leading data visualization software. According to Indeed.com, the average salary for a Tableau Developer is above $50 per hour. 
If done well, making data understandable can create breakthroughs in your company and lead to recognition and promotions in your job.
Go to lightpostanalytics.com/datacrunch to learn more and get some freebies.
Tyler: My name's Tyler Folkman.
Curtis: Who is a senior manager of data science at Ancestry.
Tyler: As I look across Ancestry and family history, we almost have, like, every kind of machine learning problem you might want, I mean, probably not every kind, but we have genetically based machine learning problems on the DNA science side. We have search optimization because people need to search our databases. We have recommendation problems because we want to hint the best resources out to people or provide them. For example, if we have a hundred things we think might be relevant to a person, what order do we show them? So we use recommendation algorithms for that. We have a lot of computer vision problems because people upload pictures and a lot of our documents, if they're not like digitized yet, meaning that they’ve extracted the text, they might just be raw photos, or even just the things that are pictures uploaded, we want to understand what's in them, so is this a picture of a graveyard? Is it a family portrait? Is it an old photo? And so tons of computer vision stuff, natural language processing. On the business side, we have marketing problems just like any other business, like how do you optimize marketing spend? How do you optimize customer experience, customer flow? And so it's really a cool place because you really can get exposed to almost any type of problem you might be interested in.
Curtis: So back in the 80s, before you could go easily find information on the Internet, genealogists had to spend a ton of time trekking around to libraries to try to find information on their ancestors. 
Ancestry saw a business opportunity and started selling floppy disks, and eventually CDs, full of genealogical resources for genealogists to easily access in their home.Tyler: And then they grew up through the Internet age and moved out ...

9/26/2018 • 21 minutes, 10 seconds

Machine Learning Takes on Diabetes

When Bryan Mazlish's son was diagnosed with Type I diabetes, there were unexpected challenges. Managing diabetes on a day-to-day basis was tough, so he hacked into his son's insulin pump and continuous glucose monitor to create the world's first ambulatory real-world artificial pancreas. Now his mission is to make it available to everyone.
Bryan Mazlish: A nice demo that we showed at Google I/O earlier this summer, where we showed our use case for one of their forthcoming APIs. We’re really at the vanguard of digital health medical device enterprise software, and it's an incredibly exciting but also challenging place to be. We're enthusiastic about the prospects for what we can do for a whole lot of people.
Ginette: I’m Ginette.
Curtis: And I’m Curtis.
Ginette: And you are listening to Data Crunch.
Curtis: A podcast about how data and prediction shape our world.
Ginette: A Vault Analytics production.
This episode of Data Crunch is brought to you by Lightpost Analytics, a company helping bridge the last mile of AI: making data and algorithms understandable and actionable for a non-technical person, like the CEO of your company.
Lightpost Analytics is offering a training academy to teach you Tableau, an industry-leading data visualization software. According to Indeed.com, the average salary for a Tableau Developer is above $50 per hour. If done well, making data understandable can create breakthroughs in your company and lead to recognition and promotions in your job.
Go to lightpostanalytics.com/datacrunch to learn more and get some freebies.
Curtis: Today we get to speak with a man who, after studying computer science at Harvard, went to start a stock-trading algorithm company on Wall Street until his life experienced a twist. 
Now he’s the president and co-founder of one of the leading digital health medical device enterprise software companies, which employs machine learning to customize and automate medicine intake, all because of an unexpected challenge that showed up in his life.
Bryan: My name is Bryan Mazlish. I’m one of the founders of Bigfoot Biomedical. My background is in quantitative finance. I spent 20 years on Wall Street, first at a large investment bank and then about a decade running a fully automated trading business where we built algorithms to buy and sell stocks in a completely automated fashion, and it was about 6 or 7 years ago that my path took a change . . .
Ginette: Bryan’s son was diagnosed with Type 1 diabetes, which Bryan says wasn’t entirely unexpected because his wife has the same disease. But what was unexpected was the intensity of managing the disease on a day-to-day basis. He was surprised with how antiquated the insulin management technology was. There wasn’t technology that could anticipate his son’s insulin needs and automatically give him the insulin he needed.
Bryan: You have a need to take insulin just simply to live. This is something that needs to be delivered on a constant basis, 24 hours a day. You can take this in one of two ways: you can use an insulin pump that delivers this on a continuous basis, and you can also take a once-a-day injection, and the benefit of the pump is that you can vary that at different points in the day. When you take an injection, it lasts for up to 24 hours, and it doesn't have the same flexibility, but it does have the benefit of not having to wear a device to deliver the insulin. And that's just the baseline. On top of that, you need to take insulin to offset meals, primarily carbohydrates and high glucose levels. 
So when you're going to sit down to eat breakfast, lunch, or dinner, or even a snack, you need to estimate the amount of carbohydrate and glucose impact of the meal that you're about to consume, and then dose that amount of insulin, either through an insulin pump or through an injection at that time. Ginette: Figuring out how much insulin to give yourself is tough.

8/31/2018 • 17 minutes, 16 seconds

Digital Twins, the Internet of Things, and Machine Learning

In a world where so many things are Internet connected, how is machine learning playing a role? Bruce Sinclair speaks with us about the intersection of IoT, AI/ML, and the digital twin.
Bruce: Where AI, and in particular machine learning, and then in particular neural networks, and then in particular deep learning neural networks, where they apply is mostly in this model making, so with IoT, there are two types of models for the digital twin: we have the analytical model that's created through more analytical techniques, and then we have the cognitive models that are being created through machine learning and artificial intelligence techniques. I kind of like to separate the two, but the impact in both cases is profound.
Ginette: I’m Ginette.
Curtis: And I’m Curtis.
Ginette: And you are listening to Data Crunch.
Curtis: A podcast about how data and prediction shape our world.
Ginette: A Vault Analytics production.
Today, if you haven’t guessed already, we’re talking about the intersection of data, artificial intelligence, and the internet of things, or IoT. So we’re talking to an expert well versed in this topic. A little bit about his background: Among many other things he’s done, like founding and heading companies, he’s authored a book on the Internet of Things, created a certification program for people who want to become certified IoT professionals, and he explains all things IoT on his podcast called “The Internet of Things Business Show.” Today, we’ll learn about AI in the IoT world and more specifically digital twins, a concept named by Gartner two years in a row now as one of the top ten strategic technology trends for both 2017 and 2018. Let’s dive into this topic with our guest.
Bruce Sinclair: My name is Bruce Sinclair. I am the president of IoT Inc. 
We consult for brands, manufacturers, and vendors and help them with their IoT strategies, both on the business side and on the product side, and we produce content, so part of the content is the podcast, and we do trainings, so we're training executives on how to introduce IoT within their business, and how to, most importantly, be profitable with IoT, and the reason I started IoT Inc. was that I saw pretty quickly that there was a lot of hype around the Internet of Things, and this hype was all around the shiny new things, in particular the technology, but as with most technologies, they run out of steam if they can't make any money. And so I was very deliberate in focusing on the business aspect of IoT to try to help executives and managers to understand how to apply this technology.
Curtis: One of the most important concepts in IoT is the digital twin, which is a virtual reflection of a physical object. One major use of the digital twin is taking the virtual reflection of an object and virtually changing it before actually changing things in the physical object in the real world. Today a digital twin is generated from data coming from sensors embedded in a physical object.
Bruce: So the Internet of Things, for everyone that’s listening, is really just the Internet being put into physical objects. The Internet being networking, things being the device. That's really, at least when you look at it from a business perspective, that’s not where the action’s at, and not coincidentally, where the action’s at is in data analytics, data science, and a subset of that being AI, and the purpose of putting the Internet in the physical objects at the highest level is to capture data. 
So we capture data in our sensors, which is more of our internal data sources, and we capture data on the Internet, and that is using business systems, that is using microservices, and coincidentally or interestingly, it's also other products, and this leads us to the most important technology for the Internet of things and this is the digital twin, and the digital twin is the virtualization of the physical into the digital, so this is where it kind of allows us to take the ...

7/31/2018 • 21 minutes, 23 seconds

Building a Machine Learning Company that Decodes Web Analytics, with Per Damgaard

The most important thing is to have an AI-enabled infrastructure. It sounds very boring, but that was the learning that I got from the bank as well. It’s actually very easy for us to build the model, but what took a long time was to have the AI infrastructure that enables us to do so.
Per: The most important thing is to have an AI-enabled infrastructure. It sounds very boring, but that was the learning that I got from the bank as well. It’s actually very easy for us to build the model, but what took a long time was to have the AI infrastructure that enables us to do so.
Ginette: I’m Ginette.
Curtis: And I’m Curtis.
Ginette: And you are listening to Data Crunch.
Curtis: A podcast about how data and prediction shape our world.
Ginette: A Vault Analytics production.
Ginette: Before we get into this episode, let’s bring you behind the scenes at Data Crunch. We’re going to show you what we’ve learned about your tastes so far.
According to the podcast analytics, which are still rudimentary and can only tell us so much, you really liked our last episode with DataOps. You also enjoyed the "No PhD Necessary" episode and the "How Artificial Intelligence Might Change Your World" episode. Almost all of you have loved the history of data science series. In fact, the third one in the series is our most popular episode in terms of how much of the show you listen to. But in terms of sheer listening numbers, the Hilary Mason episode, titled "The Complex World of Data Scientists and Black-Box Algorithms," tops our charts, with the Ran Levi episode, titled "Deep Learning—A Powerful Tool with a Name that Means Nothing," coming in second place. What this seems to tell us is you like interesting data history, you like interesting projections into the future, and you like learning practical ways you can be successful with data projects.
But since the podcast analytics are still rudimentary, we want to hear if our conclusions are correct. 
So if you want to steer our future seasons, let us know what you want to hear more about by filling out a short survey. Just go to datacrunchpodcast.com/survey, and we would love to hear from you!
Today we talk to the cofounder and CEO of a Danish company that employs machine learning to gather insights on what content on your website leads people to take action. If you’re looking into building a company using artificial intelligence or machine learning, this episode will be of particular interest to you because he talks about the impetus for his idea, some tools he used to build his product, some challenges, how he hired his team, when he uses or discards algorithms, and how he packages his product. And you can even try a free version of his product, which he mentions at the end of the show.
Per Damgaard Husted: My name is Per Damgaard Husted. I'm the founder and CEO of Canecto. Canecto is a new way of doing web analytics based on machine learning, and the reason we do machine learning is because we want to understand the intention of the users so that we can predict how they are interacting on the website. We focus a lot on how content influences people to make decisions on a website, so it sort of complements the user journey that you have and the UX and the SEO, but we focus on the content.
Curtis: So how did Per come up with this idea of extracting insights from users’ interaction with content?
Per: The background was that actually I needed this tool. I was a manager in one of the big Danish banks, and I was in charge of the online banking elements, and I got a lot of traffic, or we got a lot of traffic statistics about what's going on, but I didn't really know anything about the users’ intent. I wanted to make our website better. I wanted to understand what motivates them. I wanted to understand what content we produced. 
We produced a lot of content in the bank, and we had no tools that could explain how the users’ interaction with the content drove them to take specific a...

6/28/2018 • 15 minutes

Why DataOps Matter

If you’re building a data product, these questions are likely occupying your mind: how do you get your customers to trust your data? How do you know your product’s something your customers will want? How do you produce those products more quickly without compromising accuracy? Today we talk with someone who has a lot of experience answering these questions.
Ginette: If you’re building a data product, these questions are likely occupying your mind: how do you get your customers to trust your data? How do you know your product’s something your customers will want? How do you produce those products more quickly without compromising accuracy? Today we talk with someone who has a lot of experience answering these questions.
Ginette: I’m Ginette.
Curtis: And I’m Curtis.
Ginette: And you are listening to Data Crunch.
Curtis: A podcast about how data and prediction shape our world.
Ginette: A Vault Analytics production.
Curtis: If you’re a company aiming to research emerging technologies, like AI, ML, IoT, or edge computing, and you find your company lacking expertise, we know where you can find the expertise to pad your research team: this team is a group of ex-Fortune 500, B2B tech product managers with in-depth market analysis, product planning, and development expertise in bringing successful products, software, and services to the market, and they have significant in-depth technology skills on their team. They drive emerging tech research, product strategy, and tech marketing that resonates with customers, and they’re good at it. If a service like this would be helpful to you for a proposal you’re writing or for a product that you’re creating, reach out to us at [email protected], and we’ll be in touch.
Ginette: Now let’s jump into today’s episode. We’re talking with someone who’s worked with data teams for many years and has learned a thing or two. This is Chris Bergh.
Chris: I’m Chris Bergh. 
I'm head chef of a company called DataKitchen in Cambridge, Massachusetts, and we're a company that helps teams of people who do AI or machine learning or data engineering or data visualization deliver insight faster with higher quality, and so how did I, how did I get to this point to found a company to focus on what we call DataOps? Well, I guess I'm a working class kid from Wisconsin. I went to, in the late 80s actually, I went to Columbia to study AI back when AI was just a corner of the world that people, no one knew what it was, and you didn't walk through an airport and run into it, and then I worked on some AI systems at NASA and MIT to automate air traffic control, and then I sort of got into software development and managing software teams.
Curtis: To fill out this picture a little more, Chris has two patents under his belt and has had two companies acquired, one by Microsoft, while he was building the company in the C-suite. So he’s no stranger to the difficult experiences that come with companies’ growing pains.
Chris: About 10 years ago I got into data and analytics, and the company I worked for was about a 60-person company. We did everything that you could do in analytics, and we did data visualization. We had data scientists. We had data engineers. We even decided to build our own complete software platform that did everything in analytics, and I was the chief operating officer, and I worked with a guy who was from Harvard Medical School, really knew, it was a healthcare analytics company, really knew health care and really could talk to customers and figure out what they wanted, but then he'd come back to me and say, “Chris, here I've got this idea. Customer has this pain. Could you get some people together and figure out how to solve it?” So I would go off and pull the data scientist and maybe data engineer and maybe someone who knew Tableau and maybe a software engineer in a room, and we’d talk it through.
And I’d, I’d, you know,

6/13/2018 • 16 minutes, 4 seconds

Drones and AI

We are joined by the host of the podcast Commercial Drones FM, Ian Smith, who gives us a fascinating understanding of how drones are being used today and in the future. From petri-dish-wielding drones that follow whales, to miniature drones working in warehouses, to thermal-sensing drones in the mining industry, drones are starting to be used extensively and will continue to grow in the future. We go over the technology, the use cases, the regulations, and the future.
Intro: There’s never been a good way, ever, to get snot from a whale to see how healthy they are or do other types of experiments. It can hover right above the whale as it’s surfacing, and it will just have a little petri dish that when the whale blows its blowhole, all the snot just goes on it. Then they bring it back to the boat, and then they analyze it later.
Curtis: One big area that uses AI and will continue to increase use of it is drone technology. One of the big things that machine learning enables drones to do is be aware of their surroundings. Computer vision classifiers help the drones identify objects that they are seeing and take appropriate action, such as avoiding obstacles, performing maintenance recon, and charting autonomous flight paths.
Ginette: Let’s talk to someone steeped in all things drones who can give us insights into drones and how AI currently plays a role and will continue to play a role as drones evolve. This is Ian Smith.
Ian: I got into drones in 2013, but before that I had actually built and flown model aircraft, like RC aircraft with little tiny gas engines, and the balsa wood, and the glue that you have to wait overnight for it to set, and yeah it was a lot of work, and I wound up flying helicopters for my career, so I’m a commercial helicopter pilot. I was a flight instructor, and I heard in 2013 that RC aircraft, that model aircraft, had come so far that there were people that were using them. They were calling them drones, and they were taking pictures with them and selling them to people, but it was illegal in the United States because there was no regulation from the FAA at the time. So of course I decided to get into this as much as I could, since I wasn’t flying at the time, and ever since then in 2013 it’s been my career, and I worked for a company in France called Delair, and today I work for a company in San Francisco where I’m based now called DroneDeploy, and I host a podcast about drones called Commercial Drones FM as a side project.
Curtis: So if you’re looking for more on drones after this episode, go check out Ian’s podcast. He covers all things drone and will keep you up on the latest. Let’s take a broad look at some of the use cases for drones.
Ian: Some of the use cases, some of the industries that are using drones really are . . . agriculture was one that everyone latched on to. The construction industry of course. Inspecting assets, so whether that’s oil and gas or utilities or something else entirely, like wind turbines, or something like that. There’s general land surveyors that use drones for mapping activities, and of course there’s the film and photography. Everybody by now has seen a YouTube video of a drone or a drone shot in a movie or TV show. . . . Then there’s the mining industry who use them to calculate volumetrics of stockpiles, and search and rescue for finding people and putting crazy sensors on these drones that can sense thermal signatures. The way they’re being used, it’s really up to your imagination. Pretty much anything outside that can get a GPS signal these days. They're going to go towards more indoors things and closed, confined spaces too, so we're seeing just amazing use cases. People have these incredible imaginations, and the more you ask somebody, “What would a drone do for you?” you just get these awesome responses, and it’s really cool to hear what people come up with. They’re even using them for wildlife monitoring,

5/19/2018 • 19 minutes, 47 seconds

Travel AI with Pana

Travel’s an interesting industry because it’s inherently global, which makes it inherently complex, and it’s so behind other industries when it comes to innovative and advanced technology being applied. A great example of that is when you buy a ticket on an Expedia or Priceline, etc., it’s likely that 75% of the time a fax is sent to the hotel to tell them that you’ll be staying there that night.
Ginette: I’m Ginette.
Curtis: And I’m Curtis.
Ginette: And you are listening to Data Crunch.
Curtis: A podcast about how data and prediction shape our world.
Ginette: A Vault Analytics production.
Ginette: Data Crunch is brought to you by data.world, the productive, secure platform for modern data teamwork. Organizations like The Associated Press, Rare, Encast, and Square Panda use data.world to replace outdated barriers with deep connections among data, people, and impact. This makes data easier to find, helps people work together better, and puts data and insights in the hands of those who need it. To learn more, visit data.world and request a demo.
Curtis: Envision in your mind’s eye our globe and all the airplane flights in the sky at any given time. Now, zoom into a busy city on that globe and notice all the cars being rented by business professionals and the hotels that they’re checking into. Even in just one city, the number of transactions is dizzying. The travel industry has a lot going on, and yet, sometimes it’s surprisingly antiquated.
Devon: I'm Devon Tivona. I'm a founder at Pana. My background is actually technical. I went to school for engineering, spent the first five years of my career as an engineer, then a product lead, and most recently as a founder of this company.
Ginette: The founders of Pana were intrigued with the possibilities of what they could do in the professional travel space, and as they talked with travelers, they saw an opportunity.
Devon: We were talking particularly to frequent traveler[s]. And we kept hearing over and over again two primary pain points. One was that it felt like, “With all the newfound technology in the travel space, I still have to be my own travel agent. And it was great 10 years ago when I could just email someone, and they would take care of all of the logistics for me, but now all the technology has made it so I have to do all that work.” And then the second pain point that we started hearing was, “Then once I buy my plane ticket or my hotel ticket, if I need to make a change or something goes wrong and I want to get ahold of a real human being, that's like pulling teeth from these companies, particularly if I bought my ticket online.” So we kind of had this vision for could we build the 21st century version of the travel agent, but do so, you know, in a scalable Internet business sort of way. We didn’t want to build a boutique travel agency. We wanted to build something big. Travel’s an interesting industry because it’s inherently global, which makes it inherently complex, and it’s so behind other industries when it comes to innovative and advanced technology being applied, particularly because it’s so big, not because it doesn’t have awesome people working in the space. A great example of that is when you buy a ticket on an Expedia or Priceline, etc., it’s likely that 75% of the time a fax is sent to the hotel to tell them that you’ll be staying there that night. And for me when I heard that I was like, “Okay, this is a really interesting industry because I can always be building stuff here as a technologist.”
Curtis: Pana focused on the corporate travel space in particular because it felt it had more user pain points than other travel workflows.
Devon: I think that there's, there's a lot of varied user pain that is experienced throughout a travel journey, particularly I would say on the corporate travel side of things. I think that leisure travel, there’s billions of dollars being spent on optimizing conversion flows of you buy...

4/29/2018 • 14 minutes, 44 seconds

The Patent Law Land Grab

Before the airplane was invented, some people were concerned that everything that could be invented had been invented. Obviously, that was not the case then, and it's certainly not the case now. So as you create novel inventions, how do you protect them? What's the process? And what tools can help you and your team navigate the world of patents?
Janal Kalis: It was like a black hole. Almost nothing got out of there alive. So it became slightly more possible to try and steer your application away by using magic words . . . it didn’t always work but sometimes it did.
Ginette: I’m Ginette.
Curtis: And I’m Curtis.
Ginette: And you are listening to Data Crunch.
Curtis: A podcast about how data and prediction shape our world.
Ginette: A Vault Analytics production. Here at Data Crunch, we research how data, artificial intelligence, and machine learning are changing things. We see new applications every single day as we research, and we realize we can’t possibly keep you well enough informed with just our podcast. So to help keep you, our listeners, informed, we’ve started collecting and categorizing all of the artificial intelligence applications we see in our daily research. It’s on a website we just launched. Go explore the future at datacrunchpodcast.com/ai, and if you want to keep up with the artificial intelligence beat, we send a weekly newsletter highlighting the top three to four applications we find each week that you can sign up for on the website. It’s an easy read, we really enjoy writing it, and we hope you’ll enjoy reading. And now let’s get back to today’s episode.
Curtis: Today we dive into a world filled with strategy, intrigue, and artful negotiation, a world located in the wild west of innovation.
Ginette: In this world, you fight for your right to own something you can’t touch: your ideas. You and your team ride out into this wild west to mark your territory, drawing a border with words. Sometimes during this land grab, people get a lot of what they want, but generally they don’t, so you have to negotiate with the people in charge, called examiners, to decide what you can own, but what if you’re assigned someone who isn’t fair? Or what if you want to avoid someone who isn’t fair? Is there anything you can do? Maybe, but first you need to understand how the system works. Let’s dive into the world of patents and hear from Trent Ostler, a patent practitioner at Illumina.
Trent: The kind of back and forth that goes on oftentimes is trying to get broad coverage for a particular invention, and chances are, the examiner, at least initially, will reject those claims.
Curtis: Claims define the boundaries of the invention you’re seeking to protect. It’s like buying a plot of land. There are boundaries that come with the property. These claims define how far your ownership of the invention extends. Claims can be used to tell the examiner why he or she should allow, or approve, your exclusive rights to your idea, giving you ownership over that idea, or in other words, grant you a patent.
Trent: The examiner will say that they are broad. The claims don't deserve patent protection. And he could say that they would have been obvious. He could say that it's been done before, that it's not novel, and so what this means for anyone trying to get a patent is that it's very complex. There are thousands of pages of rules and cases that come out that further refine what it is that's too broad or what it is that makes something obvious, and oftentimes there is a balancing act of coming close to the line to get the protection that you deserve but not going overboard.
Ginette: So there’s a back-and-forth volley between the inventor’s lawyers and the examiner. The examiner says, “Hey, you don’t deserve these claims,” and he or she gives you a sound reason or argument for it, and then you and your team try to persuade him or her otherwise, and hopefully overcome those rejections by arguing for why your claims are rea...

3/27/2018 • 19 minutes, 36 seconds

Exposing World Corruption with a Unique Dataset

Transparency International started when a rebellious World Bank employee quit to dedicate himself to exposing corruption. Now the organization claims the media's attention for about one week a year when it publishes its annual Corruption Perceptions Index, an index that ranks countries in order of perceived corruption. Find out how the organization sources the data, what an important bias in that data is, and how that data ultimately impacts the world.
Alejandro Salas: I studied political science and I got very interested in all the topics related to good governance, to ethics in the public sector, etc., and I started working in the Mexican public sector, and, oh, the things I could see there. I was a very junior person working in the civil service, and I got all sorts of offers of presents and things in order to gain access to certain information, access to my boss, so very early on in my professional career, I started to see corruption from very close to me, and I think that's something that marked my interest in this topic.
Ginette: I’m Ginette.
Curtis: And I’m Curtis.
Ginette: And you are listening to Data Crunch.
Curtis: A podcast about how data and prediction shape our world.
Ginette: A Vault Analytics production. Here at Data Crunch, we research how data, artificial intelligence, and machine learning are changing things, and we’re noticing an explosion of real-world applications of artificial intelligence and machine learning that are changing how people work and live today. We see new applications every single day as we research, and we realize we can’t possibly keep you well enough informed with just our podcast. At the same time, we think it’s really important that people understand the impact machine learning is having on our world, because it’s changing and is going to change nearly every industry. So to help keep our listeners informed, we’ve started collecting and categorizing all of the artificial intelligence applications we see in our daily research and adding them on generally a daily basis to a collection available on a website we just launched. Go explore the future at datacrunchpodcast.com/ai, and if you want to keep up with the artificial intelligence beat, we send out a weekly newsletter highlighting the top 3–4 applications we find each week that you can sign up for on the website. It’s an easy read, we really enjoy writing it, and we hope you’ll enjoy reading. And now let’s get back to today’s podcast.
Curtis: We’ve spent a lot of time on our episodes talking to interesting people about what creative things they’ve done with data, like detecting eye cancer in children, identifying how to save the honey bees, and catching pirates on the high seas, but today we’re going to talk about a simple measurement. A creative and clever way to measure something that is incredibly hard to measure. And powerful results come from a measurement that puts some numbers behind a murky issue so people can start to have important conversations about it. And we’re going to look at an example that’s all over the news right now.
Ginette: This dataset that’s all over the news right now has an interesting history. While it draws criticism from some sources, it draws high praise from others. But before we get too far ahead of ourselves, let’s officially meet Alejandro, the man at the beginning of this episode.
Alejandro: My name is Alejandro Salas. I am the regional director for the Americas at Transparency International. I come from Mexico. I started 14 years ago, and I was hired to work mainly in the Central America region, which is also a region where there's a lot of corruption that affects mainly public security, access to health services, access to education. In general the basic public services are broadly affected by corruption. That was my point of entry to this organization.
Curtis: Something important to note here is Transparency International’s origins. It’s a surprising story because Transparency Internationa...

2/21/2018 • 16 minutes, 10 seconds

Data Science Reveals When Donald Trump Isn't Donald Trump

Few things are as controversial in these perilous times as Donald Trump's Twitter account, often laced with derogatory language, hateful invective, and fifth-grade name-calling. But not all of Trump's tweets sound like they came straight out of a dystopian dictator's mouth. Some of them are actually nice. Probably because he didn't write them. Join us on a discerning journey as two data scientists tackle Donald Trump's Twitter account and, through quantitative methods, reveal to us which hands are behind the tweets.
Dave Robinson: So the original Trump analysis is certainly the most popular blog post I’ve ever written. It got more than half a million hits in the first week and it still gets visits . . . and the post still gets a number of visits each week. I was able to write it up for the Washington Post and was interviewed by NPR.
Ginette: “I’m Ginette.”
Curtis: “And I’m Curtis.”
Ginette: “And you are listening to Data Crunch.”
Curtis: “A podcast about how data and prediction shape our world.”
Ginette: “A Vault Analytics production.”
Curtis: Here at Data Crunch, as we research how data and machine learning are changing things, we’re noticing an explosion of real-world applications of artificial intelligence that are changing how people work and live today. We see new applications every single day as we research, and we realize we can’t possibly keep you well enough informed with just our podcast. At the same time, we think it’s really important that people understand the impact machine learning is having on our world, because it’s changing and is going to change nearly every industry. So to help keep our listeners informed, we’ve started collecting and categorizing all of the artificial intelligence applications we see in our daily research. These are all available on a website we just launched, which Data Elixir recently recognized as a recommended website for their readers to check out. The website includes, for example, a drone taxi that will one day autonomously fly you to work, a prosthetic arm that uses AI to aid a disabled pianist to play again, and a pocket-sized ultrasound that uses AI to detect cancer. Go explore the future at datacrunchpodcast.com/ai, and if you want to keep up with the artificial intelligence beat, we send out a weekly newsletter highlighting the top 3-4 applications we find each week that you can sign up for on the website. It’s an easy read, we really enjoy writing it, and we hope you’ll enjoy reading. And now let’s get back to today’s podcast.
Ginette: Today, we’re chatting with someone who made waves over a year ago with a study he conducted, and he recently did a follow-up study that we’ll hear about. Here’s Dave Robinson.
Dave: I'm a data scientist at Stack Overflow, we’re a programming question-and-answer website, and I help analyze data and build machine learning features to help get developers answers to their questions and help them move their career forward, and I came from originally an academic background where I was doing research in computational biology, and after my PhD I was really interested in what other kinds of data I could apply a combination of statistics and data analysis and computer programming to.
Curtis: Dave studied stats at Harvard and then went on to get his PhD in Quantitative and Computational Biology from Princeton. He did a study on Donald Trump’s tweets in 2016 you may have heard about and posted it to his blog, Variance Explained.

1/19/2018 • 15 minutes, 16 seconds

No PhD Necessary

The ubiquity of and demand for data has increased the need for better data tools, and as the tools get better and better, they ease the entry into data work. In turn, as more people enjoy the ease of use, data literacy becomes the norm.
Ginette: “I’m Ginette.”
Curtis: “And I’m Curtis.”
Ginette: “And you are listening to Data Crunch.”
Curtis: “A podcast about how data and prediction shape our world.”
Ginette: “A Vault Analytics production.”
“We have a gift for you this holiday season. We’re giving you, our listeners, a website . . . it’s a website of all the AI applications we come across or hear about in our daily research. We post bite-size snippets about the interesting applications we are finding that we can’t feature on the podcast so that you can stay informed and see how AI is changing the world right now. There are so many interesting ways that AI is being used to change the way people are doing things. For example, did you know that there is an AI application for translating chicken chatter? Or using drones to detect and prevent shark attacks on coastal waters? To experience your holiday gift, go to datacrunchpodcast.com/ai.”
Curtis: “If you’ve listened to our History of Data Science series, you know about the amazing advances in technology behind the leaps we’ve seen in data science over the past several years, and how AI and machine learning are changing the way people work and live.
“But there is another trend that’s also been happening that isn’t talked about as much, and it’s playing an increasingly important role in the story of how data science is changing the world.
“To introduce the topic, we talked with someone who is part of this trend, Nick Goodhartz.”
Nick Goodhartz: “So I went to school at Baylor University, and I studied finance and entrepreneurship and a minor in music. I ended up taking a job with a start-up as a data analyst essentially. So it was an ad technology company that was a broker between websites and advertisers, and so I analyzed all the transactions between those and tried to find out what we are missing.
“We were building out these reports in Excel, but there was a breaking point when we had this report that we all worked off of, but it got too big to even email to each other. It was this massive monolith of an Excel report, and we figured there's got to be a better way, and someone else on our team had heard of Tableau, and so we got a trial of it. In 14 days (actually less than 14 days) we were able to get our data into Tableau, take a look at some things we were curious about, and pinpointed a possible customer who had popped their head out and then disappeared. We approached them and signed a half-million-dollar deal, and that paid for Tableau a hundred times over, so it was one of those moments where you really realize, ‘man, there’s something to this.’
“That's what got me into Tableau and what changed my mind about data analysis because at school analyzing finance it was nothing but Excel and mindless tables of stock capitalization and all this stuff and what made it fascinating was finding a way to look at it and answer questions on the fly, and then it actually changed the way I look at things around me. I find myself now watching a television show and thinking ‘well this episode wasn't as interesting. I wonder what the trends of the ratings look like.’ It really has changed the way I think about data because of how easy it's been to access it.”
Ginette: “Nick is a member of a growing portion of people who didn’t think they’d end up doing analytics. He didn’t have the specific training for it, he doesn’t have a computer science or statistics degree, and he doesn’t spend nights and weekends writing code. And yet, he was able to produce extremely useful insights from his company’s data stores and help land a large business deal. Not only that, he found the process of finding insights from data so fascinating that it spilled over into his le...

12/19/2017 • 13 minutes, 44 seconds

How to Succeed at IoT—Amid Increasing Complexity

The growth of the Internet of Things, or IoT, is often compared with the industrial revolution. A completely new phase of existence. But what does it take to be part of this revolution by building an IoT product? It's complex, and Daniel Elizalde gives us a peek into what the successful process looks like.
Partial Transcript
Ginette: “So, today, we’re defining an IoT product, or an Internet of Things product, as ‘a product that has a combination of hardware and software. It acquires signals from the real world, sends that information to the cloud through the Internet, and it provides some value to your customers.’
“Okay, so before we introduce you to our guest, consider this: The IoT Market is infernally hot. In 2016, we had 6.4 billion connected ‘things’ in use worldwide, and Gartner research firm projects that number will nearly double to 11.2 billion in 2018, and then nearly double again to 20.4 billion IoT products in 2020. For context, this last number is about 2 and a half times the number of people on earth.
“Let’s look at an example of IoT at work. Let’s say you’re an oyster farmer, and you need to keep your oysters under a certain temperature because harmful bacteria might grow if you don’t, which would result in people getting very sick after eating your product. If that happened, the FDA could shut your operation down.
“This is where IoT products can help you. You can track water temperature with sensors. Those sensors can send that data to the cloud, where you can access it. The system will even send you an alert if the temperature ranges outside your chosen temperature criteria. You can use cameras that show when the oysters are harvested and how long the oysters are out of cold water before they’re put on ice. By using these sensors and cameras to record harvest date, time, location, and temperature at all stages of harvest, you have recorded evidence that you’ve properly handled the harvest.
“So, for the purposes of today’s episode, let’s now switch to the other perspective, to the perspective of someone who wants to make and sell an IoT product. Imagine you and two of your friends recently launched an IoT startup: you’re able to secure funding to build your IoT product, and you’ve hired some team members to help you get your beta version off the ground. But you’re new to building products like this, and the rest of your team is also pretty new to it as well. So you decide to talk with someone who is an expert in the IoT space who can give you and your team pointers, and you’re lucky enough to find this man.”
Daniel: “My name is Daniel Elizalde. I am the founder of Tech Product Management. My company focuses on providing training for companies building IoT products, specifically I focus on training product managers. I've been doing IoT really for over 18 years, before it was called IoT, and I worked in small companies and large companies, consulting, and UX agencies. Most of my career has been on the product side of things, anywhere from single contributor to head of product and most recently, I left the corporate world, and I founded Tech Product Management. I teach online. I have an online course for a certification program for IoT product managers. I also teach at Stanford Continuing Studies, and I do consulting and workshops for companies.
“I started to get a lot of requests for an online program. And so that's when I decided to build my online training, and it's actually a certification program where you take all the material, then you take a test, and you get a certification.”
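As a rough illustration of the oyster-farm scenario described in this episode, the alerting step (flagging readings that fall outside a chosen temperature limit) could be sketched as below. This is a minimal sketch under stated assumptions, not any vendor's product: the `Reading` type, the `find_alerts` function, the sensor name, and the 10 °C limit are all hypothetical, and a real system would receive readings from a cloud service rather than from a hard-coded list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One temperature sample from a (hypothetical) water sensor."""
    sensor_id: str
    minutes_since_harvest: int
    temp_c: float


def find_alerts(readings, max_temp_c):
    """Return the readings whose temperature exceeds the chosen limit."""
    return [r for r in readings if r.temp_c > max_temp_c]


# Illustrative data only; the limit is an assumption, not a regulatory value.
readings = [
    Reading("bed-1", 0, 8.5),
    Reading("bed-1", 30, 9.9),
    Reading("bed-1", 60, 12.3),
]
alerts = find_alerts(readings, max_temp_c=10.0)
for r in alerts:
    print(f"ALERT {r.sensor_id}: {r.temp_c} C at {r.minutes_since_harvest} min")
```

Keeping every reading alongside its timestamp, rather than only the alerts, matches the point made in the episode that the recorded history itself is the evidence of proper handling.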

11/17/2017 • 17 minutes, 43 seconds

After Disaster Strikes: Data in Disaster Recovery

We’ve seen photos of disasters depicting fearful and fleeing victims, ravaged properties, and despondent survivors. In this episode, we explore two ways data can help survivors heal and how data also tells their stories.
Partial Transcript
Aaron Titus: “I almost disbelieved my own numbers, even though I chose the most conservative ones. It's just outrageous. I'm like, ‘Really? A 233x ROI?’ That's insane.”
Ginette: “I’m Ginette.”
Curtis: “And I’m Curtis.”
Ginette: “And you are listening to Data Crunch.”
Curtis: “A podcast about how data and prediction shape our world.”
Ginette: “A Vault Analytics production.”
“Today’s episode is brought to you by Lightpost Analytics. Data skills are in intense demand and are key for organizations to remain competitive; in fact, Forbes listed the industry’s leading data visualization software, Tableau, as the number three skill with the most explosive growth in demand, so investing in yourself to stay relevant in today's hyper-competitive, data-rich, but insights-hungry world is extremely important. Lightpost Analytics is a trusted training partner to help you develop the Tableau skills you need to stay relevant. Check them out at lightpostanalytics.com and let them know that Data Crunch sent you.”
“Today, we look at what it takes to understand a larger story, when many disparate voices come together to tell you something much more powerful, and specifically how it can help people deal with the large scale devastation of natural disasters. Let’s jump into how one man did something about his pet peeve, and it produced $300,000,000 in savings. And then we’ll pop over to New Zealand to explore how a disaster situation affected Christchurch and what people did about it.”
Aaron: “I was a disaster relief volunteer in New Jersey during Hurricanes Irma (Ginette: Here Aaron actually means Irene) and Sandy, and my area got very hard hit by Irma, and I started off as a relief volunteer and ended up directing a lot of those relief efforts for my church, and while I was there, I remember standing in very long lines, and a thousand of us would gather together at a field command center and spend an hour and a half waiting to get checked in, which is lightning speed for 1,000 people, but it's still an hour and a half.
“And while everybody was waiting, they’d pull out their phones and would start playing Angry Birds, and the technologist in me would just scream inside, ‘I could have you all checked in with your work orders in 30 seconds, not an hour and a half!’
“And I abhor inefficiency, to a fault, like it's almost a little bit of a sickness. I really ought to be better, but I really abhor inefficiency, and I hate it when people waste my time, and I hate wasting people's time, especially volunteers. As a volunteer manager, your most precious asset is your volunteers and the time that they give to you, and when you waste that, not only are you wasting an hour right now, and that’s an hour that you're not helping somebody, but then that volunteer has a bad experience, and they don't come back next week, and so you're not just wasting an hour, you're wasting weeks when you've wasted volunteers’ time.”
Curtis: “This is Aaron Titus, the executive director for Crisis Cleanup, a platform that connects volunteers with survivors who opt in for help cleaning up their properties after a disaster. After this moment of frustration, Aaron decides he’s going to do something about this inefficiency, and he spends over a year designing a system while tryin...

10/18/2017 • 26 minutes, 28 seconds

The Complex World of Data Scientists and Black-Box Algorithms

Hilary Mason is a huge name in the data science space, and she has an extensive understanding of what's happening in this space. Today, she answers these questions for us: What are the backgrounds of your typical data scientists? What are key differences between software engineering and data science that most companies get wrong? How should you measure the effectiveness of your work or your team's work as a data scientist for the best results? What is a good approach for creating a successful data product? How can we peek behind the curtain of black-box deep learning algorithms? Below is a partial transcript. For the full interview, listen to the podcast episode by selecting the Play button above or by selecting this link, or you can also listen to the podcast through Apple Podcasts, Google Play, Stitcher, and Overcast. Curtis: Today we hear from one of the biggest thinkers in the data science space, someone who DJ Patil endorses on LinkedIn for data science skills. She worked at bit.ly, the URL shortener, and is a data scientist in residence at venture capital firm Accel Partners, a firm that helped fund some companies you may know, like Facebook, Slack, Etsy, Venmo, Vox Media, Lynda.com, Cloudera, Trifacta—and you get the picture. Ginette: The partner of this VC firm said that Accel wouldn’t have brought on just any data scientist. This position was specifically created because this particular data scientist might be able to join their team. 
Curtis: But beyond her position as data scientist in residence with Accel, she founded a company that’s doing very interesting research, and today, she shares with us some of her experiences and perspective on where AI is headed. Ginette: I’m Ginette. Curtis: And I’m Curtis. Ginette: And you are listening to Data Crunch. Curtis: A podcast about how data and prediction shape our world. Ginette: A Vault Analytics production. Hilary: I'm Hilary Mason, and I'm the founder and CEO of Fast Forward Labs (Please note that Hilary is now the VP of Research at Cloudera). In addition to that, I'm a data scientist in residence for Accel Partners. And I've been working in what we now call data science, or even now call AI, for about twenty years at this point. Started my career in academic machine learning and decided startups were more fun and have been doing that for about 10, 12 years depending on how you count now, and it's a lot of fun! Ginette: Something I’d like to note here is there’s been a very recent change: Hilary’s company, Fast Forward Labs, and Cloudera recently joined forces, and Hilary’s new position is Vice President of Research at Cloudera. Now, one thing that Hilary speaks to is where the data scientists she works with come from, which is a great example of the different paths people take to get into this field. Hilary: I am a computer scientist, and I have studied computer science. It's funny because now at Fast Forward, our team has only two computer scientists on it, and one of them is our general counsel, and one is me, and I'm running the business, so most of the people doing data science here come from very different backgrounds. 
We have a bunch of physicists, mathematicians, a neuroscientist, a person who does brilliant machine learning design who was an English major, and so data science is one of those fields where one of the things I really love about it is that people come to it from so many different backgrounds, but mine happens to be computer science.The people on our team at Fast Forward typically have a PhD in a quantitative field, such as physics, neuroscience, electrical engineering, and then have, through that, learned sufficient programming skill. One of the jokes I make about my team is that we're essentially a halfway house for wayward academics in the sense that we can absorb people and teach them to be good software engineers, help them understand the difference between theoretical machine learning an...

9/19/2017 • 25 minutes, 17 seconds

Deep Learning—A Powerful Tool, with a Name that Means Nothing

Tesla isn’t the only car brand in the world producing or aiming to produce self-driving cars. Every single car brand is working on developing self-driving cars. But what does this mean for our future? We talk about this and other interesting deep learning projects and history with Ran Levi, science and technology observer and podcaster, who explains in thought-provoking ways what we have to look forward to.Below is a partial transcript. For the full interview, listen to the podcast episode by selecting the Play button above or by selecting this link, or you can also listen to the podcast through Apple Podcasts, Google Play, Stitcher, and Overcast.Ran Levi: “I actually had the pleasure of being invited to Google's Mountain View headquarters, and they took me for a drive in one of their autonomous vehicles, and it was, to tell you about that drive because it was boring—boring in a good way. Nothing happened! We were just driving around. The car was driving itself all around Mountain View. And it worked.“The first time I entered such a car, I didn't know what to expect. I mean, I didn't know how reliable are those kinds of cars. So I had the idea that maybe I should sit somewhere where I can maybe jump and grab the wheel if necessary. You know, I was a bit dumb. They don't need me, really. And probably if I touch the steering wheel, I would probably make some mistake and ruin the car. It drives better without me.”Ginette: “I’m Ginette.”Curtis: “And I’m Curtis.”Ginette: “And you are listening to Data Crunch.”Curtis: “A podcast about how data and prediction shape our world.”Ginette: “A Vault Analytics production.”Ginette: “We have a great live show planned that we hope to give at SXSW 2018. It's a really awesome show about the power of niche artificial intelligence, and we’re going to share details from our research into what amazing things AI is doing right now on the fringe and in mainstream AI projects. 
We're really excited to share it, so if you’re going to SXSW, or you just want to be good hearted and help us out, please vote on our dual panel by going to panelpicker.sxsw.com, signing in, and liking our topic, which you can find by searching for ‘The Power of Niche AI: From Cucumbers to Cancer.’“Today we get to talk to Ran Levi, who’s been researching and reporting on science and technology for the past 10 years. He’s a hugely successful science and tech podcaster in Israel, producing a Hebrew-language show called Making History, and he’s also producing two English podcasts right now for an international audience, so since he’s steeped in the subject, he has a lot of very interesting insights for us.”Ran: “I'm actually an electronics engineer by trade. I was an engineer for 15 years. I was both a hardware and software developer for several companies in Israel. And during my day job as an engineer, I wrote some books about the history of science and technology, which was always a big hobby of mine. And actually, I started a podcast about this very subject about 10 years ago, and it became quite a hit in Israel I’m happy to say. So about four years ago, I quit my day job, and I actually started my own podcasting company, and now we are podcasting both in Israel and in the U.S. for international audience and actually launched my brand new podcast last week. It's called Malicious Life about the history of malware and cybersecurity, which is a fun topic. Actually, the day I launched the podcast, there was a big ransom attack in Europe mostly. So it was . . . I didn't plan it. You've got no proof against me.”Ginette: “This is a topic well worth learning more about because cyber attacks can affect anything from your access to electricity to your bank account, so check out his new podcast on the website Malicious.life. But today, we’re talking about a different topic—deep learning. This is something Ran knows quite a bit about, technically and historically.”

8/9/2017 • 16 minutes, 55 seconds

When Song Lyrics and British Lit Meet Tidy Text

When Julia Silge's personal interests meet her professional proficiencies, she discovers new meaning in Jane Austen's literature, and she gauges the cultural influence of locations in pop songs. Even more impressive than these finds, though, is that she and her collaborator, Dave Robinson, have developed some new, efficient ways to mine text data. Check out the book they've written called Tidy Text Mining with R.Below is a partial transcript. For the full interview, listen to the podcast episode by selecting the Play button above or by selecting this link, or you can also listen to the podcast through Apple Podcasts, Google Play, Stitcher, and Overcast.TranscriptJulia Silge: “One that I worked on that was really fun was about song lyrics. The last 50 years or so of pop songs, we have all these lyrics, so all this text data, and I wanted to ask the question, what places are mentioned more or less often in these pop songs.”Ginette: “I’m Ginette.”Curtis: “And I’m Curtis.”Ginette: “And you are listening to Data Crunch.”Curtis: “A podcast about how data and prediction shape our world.”Ginette: “A Vault Analytics production.”Curtis: “Brought to you by data.world, the social network for data people. Discover and share cool data, connect with interesting people, and work together to solve problems faster at data.world. Whether you’re already a frequent dataset contributor or totally new to data.world, there are several resources you can use to stay in the loop on the latest features, learn new skills, and get support. Check out docs.data.world for up-to-date API documentation, tutorials on SQL, and other query techniques, and much more!”Ginette: “We hope you’re enjoying some vacation time this summer. We just did, and now Data Crunch is back! To hear the latest from us, add us on Twitter, @datacrunchpod. 
Today we hear from an exciting guest—someone who is on the cutting edge of data science tool creation, someone exploring and developing new ways to slice and dice difficult data.”Julia: “My name is Julia Silge, and I'm a data scientist at Stack Overflow. My academic background is in physics and astronomy, but I’ve worked in academia, teaching and doing research, I worked at an ed tech start up, and I've made a transition now into data science.”Ginette: “Stack Overflow, where Julia works, is the largest online community for programmers to learn, share knowledge, and build their careers. It's a great resource when you need to solve a coding problem or develop new skills.”Curtis: “Now there are basically two main camps in data science: people who program with R, a statistical programming language, and people who program with Python, a high-level, general purpose language. Both languages have devoted followers, and both do excellent work. Today, we’re looking at R, and Julia is a big name in this space, as is her collaborator Dave Robinson.”Julia: “Text is increasingly a really important part of our work as people who are involved in data. Text is being generated all the time, at ever faster rates. This unstructured data is becoming a really important part of things that we do. I also am somebody that—my academic background is not in text or literature or natural language processing or anything like that, but I am somebody who's always been a reader and always been interested in language, and these sort of collection of circumstances kind of all came together to converge that me and Dave decided to develop some tools for making text mining something that people can do within this idiom of people who work using the R programming language. 
So we’ve developed a package called tidytext.” Ginette: “Now this particular tool is based on tidy data principles, which is basically organizing data in a uniform way so it’s ready for you to ferret out insights.” Julia: “There's a section of people who use tools that are built for dealing with tidy data principles,
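
To make the tidy idea concrete: a tidy text table holds one token per row, so ordinary counting and filtering work on it directly. The sketch below is only an illustration in Python, not the R package Julia describes; the function name `tidy_tokens`, the simple regex tokenizer, and the two made-up lyric snippets are all assumptions for the example.

```python
import re
from collections import Counter

def tidy_tokens(documents):
    """Return one (doc_id, word) row per token: the one-token-per-row layout."""
    rows = []
    for doc_id, text in documents.items():
        # Hypothetical tokenizer: lowercase the text and keep runs of letters/apostrophes.
        for word in re.findall(r"[a-z']+", text.lower()):
            rows.append((doc_id, word))
    return rows

# Made-up lyric snippets, used purely for illustration.
lyrics = {
    "song_a": "Take me back to the city, the city lights",
    "song_b": "Country roads and city lights",
}

rows = tidy_tokens(lyrics)

# With one token per row, counting words per song is a single step.
counts = Counter(rows)

# Counting how many songs mention a given word is just as direct.
songs_mentioning_city = {doc_id for doc_id, word in rows if word == "city"}
```

Because every row has the same shape, the same table could feed a question like the one Julia asks about place names in pop songs, by filtering rows against a list of places.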

7/16/2017 • 17 minutes, 47 seconds

How Data Is Eradicating Malaria in Zambia

According to the CDC, people have been writing descriptions of malaria—or a disease strikingly similar to it—for over 4,000 years. How is data helping Zambian officials eradicate these parasites? Tableau Foundation's Neal Myrick opens the story to us. Below is a partial transcript. For the full interview, listen to the podcast episode by selecting the Play button above or by selecting this link, or you can also listen to the podcast through Apple Podcasts, Google Play, Stitcher, and Overcast. Neal: “When somebody walks from their village to their clinic because they're sick, health officials can see that person now as the canary in a coal mine.” Ginette: “I’m Ginette.” Curtis: “And I’m Curtis.” Ginette: “And you are listening to Data Crunch.” Curtis: “A podcast about how data and prediction shape our world.” Ginette: “A Vault Analytics production.” Curtis: “This episode is brought to you by data.world, the social network for data people. Discover and share cool data, connect with interesting people, and work together to solve problems faster at data.world. Looking for a lightweight way to deliver a collection of tables in a machine-readable format? Now you can easily convert any tabular dataset into a Tabular Data Package on data.world. Just upload the file to your dataset, select 'Tabular Data Package' from the 'Download' drop-down, and now your data can be effortlessly loaded into analytics environments. Get full details at meta.data.world.” Ginette: “Today we’re talking about something that can hijack different cells in your body for what we’ve deemed nefarious purposes. It enters your bloodstream when a mosquito transfers it from someone else who has it, to you. Once it’s in your body, it makes a beeline for your liver, and when safely inside your liver, it starts creating more of itself. “Sometimes, this parasite stays dormant for a long time, but usually it only takes a few days for it to get to work. 
It starts replicating, and there are suddenly thousands of new babies that burst into your bloodstream from your liver. When this happens, you might get a fever because of this parasite surge. As these new baby parasites invade your bloodstream, they hunt down and hijack red blood cells. They use these blood cells to make more of themselves, and once they’ve used the red blood cells, they leave them for dead and spread out to find more. Every time a wave of new parasites leaves the cells, it spikes the number of parasites in your blood, which may cause you to have waves of fever since it happens every few days. “This parasite can cause very dangerous side effects, even death. It can cause liver, spleen, or kidney failure, and it can also cause brain damage and a coma. To avoid detection, the parasites cause a sticky surface to develop on the red blood cell so the cell gets stuck in one spot so that it doesn’t head to the spleen where it’d probably get cleaned out. When the cells stick like this, they can clog small blood vessels, which are important passageways in your body. You may have guessed it, we’re describing malaria. “It plagues little children, pregnant women, and other vulnerable people. Children in particular are incredibly vulnerable, something that’s reflected in the statistics: one child dies every two minutes from malaria. “But often outbreaks are treatable, trackable, and preventable when the data is properly captured and analyzed. The United States eradicated malaria in the 1950s. But it still plagues other areas of the world, especially sub-Saharan Africa. In 2015, 92 percent of all deaths related to malaria worldwide were in sub-Saharan Africa. “Today, we’re talking to the man who authorized a partnership aimed at eradicating malaria in one country that’s suffered heavily from it. The results, which we’ll get to, are impressive.” Neal: “My name is Neal Myrick. I'm the director of social impact at Tableau Software and the director of Tableau Foundation.

6/11/2017 • 17 minutes, 16 seconds

How Artificial Intelligence Might Change Your World

What does the creation of new artificial intelligence products look like today, and what do experts in this field foresee realistically happening in the near future? One thing's for sure, the way we work and function in life will change as a result of growth in this field. Listen and find out more. Below is a partial transcript. For the full interview, listen to the podcast episode by selecting the Play button above or by selecting this link, or you can also listen to the podcast through Apple Podcasts, Google Play, Stitcher, and Overcast. Transcript Irmak Sirer: “It’s kind of like a Where’s Waldo of finding an expert in this entire giant ocean of people.” Ginette: “I’m Ginette.” Curtis: “And I’m Curtis.” Ginette: “And you are listening to Data Crunch.” Curtis: “A podcast about how data and prediction shape our world.” Ginette: “A Vault Analytics production.” Curtis: “Brought to you by data.world, the social network for data people. Discover and share cool data, connect with interesting people, and work together to solve problems faster at data.world. A complex dataset with a ton of files can quickly become scary and unwieldy, but you need not fear! Now you can use file labels and descriptions to manage and organize your many files on data.world. With file labels and descriptions, you can quickly see what type of file it is, view a short description, and also filter down by file type. Wanna see an example of how data.world users are using file labels and descriptions to keep their dataset organized? Search ‘data4democracy/drug-spending’ on data.world.” Ginette: “Today we’re taking a closer look at something that is starting to seep into our daily lives. In one of its forms, it’s something Stephen Hawking, Bill Gates, and Elon Musk are concerned will eventually be a threat to mankind. In another form, though, you’re probably already using it, and it’s becoming a major game changer, kind of like the early days of the desktop computer. 
We’re talking about artificial intelligence. You use AI when you talk to Siri or your in-home assistant, Alexa or Echo, and some people are using it in the form of a self-driving car. “So daily applications of artificial intelligence are on the rise, becoming much more of a staple in our society, but AI’s definition shifts according to the source. Popular movies depict AI as having a consciousness, emotions, and exhibiting human-like characteristics. Usually it’s involved in some sort of world-domination plot to kill all the humans. Although most experts agree that artificial intelligence will never actually think and feel like a human, the existential threat still exists. This kind of apocalyptic AI is known as ‘general AI.’ But that’s a topic for another episode. Today, we’re focusing on the kind of AI that currently exists, otherwise known as narrow AI.” Curtis: “A narrow AI is called narrow because it’s usually focused on one specific task, whereas a general AI would be able to be good at pretty much any task thrown its way. The Google search bar is probably the most ubiquitous example of a narrow AI that most people use on a daily basis. The process usually goes like this: you give it an input like ‘How to own a llama as a pet.’ It does its processing. It gives you an output in the form of the 10 most relevant web pages to answer your questions (along, of course, with some paid advertisers who are trying to sell you a pet llama). “The simplicity of the interaction belies the complexity of the cognitive work that’s going on behind the scenes. Imagine if you had to do the same cognitive task without the help of Google. What would that actually entail? You would have to individually look at and read every single website, and there are over 1 billion, and peruse them to see if they have anything to do with llamas, not to mention then find the individual pages on those websites that actually answer your questions. “This is a really big task!

5/28/2017 • 20 minutes, 16 seconds

Preventing a Honeybee Fallout

What would the world look like without honeybees? In theory, if there were no honeybees, it could drastically change our lives. Bjorn Lagerman, though, never wants to know the actual answer to that question, but the honeybees' current worst foe, Varroa destructor, is killing off honeybee hives at intense rates. Bjorn's in the middle of a machine learning project to save the bees from the vampirish Varroa. Below is a partial transcript. For the full interview, listen to the podcast episode by selecting the Play button above or by selecting this link, or you can also listen to the podcast through iTunes, Google Play, Stitcher, and Overcast. Bjorn Lagerman: “My name is Bjorn Lagerman. I live in the middle of Sweden. When I look back in my younger days, I remember, I sat in school, looked outside the window and decided I wanted to be outside. You know, I was raised in a stone desert in the middle of Stockholm in the old town; that's a medieval town. And inside the blocks, there were sort of an oasis of water and fountains and green in this stone desert, but the streets were very old streets. And then the contrast was that in the summertime, I spent that in the countryside, and that was total freedom—you know, lakes, rivers, forests, and my parents let us do what we wished during all the days, just come home for dinner. So when I was 22, I thought bees might be a reason to spend more time in nature. So I went to the nearest beekeeper, . . . and he sold me my first colony, and from there on, I was really hooked.” Ginette: “I’m Ginette.” Curtis: “And I’m Curtis.” Ginette: “And you are listening to Data Crunch.” Curtis: “A podcast about how data and prediction shape our world.” Ginette: “A Vault Analytics production.” Curtis: “This episode is brought to you by data.world, the social network for data people. Discover and share cool data, connect with interesting people, and work together to solve problems faster at data.world. 
A complex dataset with a ton of files can quickly become scary and unwieldy, but you need not fear! Now you can use file labels and descriptions to manage and organize your many files on data.world. With file labels and descriptions, you can quickly see what type of file it is, view a short description, and also filter down by file type. Wanna see an example of how data.world users are using file labels and descriptions to keep their dataset organized? Search ‘data4democracy/drug-spending’ on data.world.” Ginette: “Imagine for a minute what the world would look like without bees. The image is potentially pretty bleak: we’d have much less guacamole, fruit smoothies, chocolate everything, various vegetables, pumpkin pie, peach cobbler, almond butter, cashews, watermelons, coconuts, lemons, limes, and many more food products. Let’s not forget the obvious—we wouldn’t have honey, which man can’t replicate well. “But fruits, vegetables, and chocolate aren’t the only foodstuffs that would be affected. Bees support other animal life. They pollinate alfalfa, which helps feed dairy cows and boost their milk production, and on a more limited basis, alfalfa helps feed beef cows, sheep, and goats. Statistics vary, but bee pollination affects somewhere between one and two thirds of the food on Americans’ plates. Beyond food, bees help grow cotton, so without bees, we’d have to rely more on synthetics for our cloth. “Honeybees in particular are incredibly hard workers. They pollinate 85 percent of all flowering plants. They collect from just one flower species at a time, and in turn, the pollen they carry fertilizes the flower’s egg cells. One industrious honeybee worker can pollinate up to 5,000 flowers a day. One honeybee hive worth of workers can visit up to 500 million flowers a year. “With a reduced bee population, it gets harder to produce food. Let’s take an example. California grows 85 percent of the world’s almonds, and it takes at least 1.7 million hives to pollinate them,

5/14/2017 • 17 minutes, 48 seconds

When a Picture Is Worth a Life

What if you found out your infant had eye cancer? That news would rock anyone’s world. But what if you had a tool that helped you catch it early enough that your baby didn’t have to lose his or her eye and didn’t have to go through chemo? You’d probably do almost anything to get it. Bryan Shaw has dedicated his time to helping parents detect this cancer sooner so their children don't have to go through what his son went through—and he’s doing it for free. With computer scientists from Baylor University, he's harnessed the power of a machine learning algorithm to detect cancer that no human eye can detect.Below is a partial transcript. For the full interview, listen to the podcast episode by selecting the Play button above or by selecting this link or you can also listen to the podcast through iTunes, Google Play, Stitcher, and Overcast. Bryan Shaw: “The very first person who ever contacted me because our app helped them was a gentleman in Washington State, and his little girl had myelin retinal nerve fiber layer, which is an abnormal myelination of the retina, and it can cause blindness, but it presents with white eye. And his little girl was five years old, and he kept seeing white-eye pics. He heard our story. He downloaded our app. Our app detected the white-eye pics. That emboldened him enough to grill the child's doctor. You know, 'My camera's telling me this. Look, this app. I heard this story . . .’ The doctor takes a close look. The girl had been 75 percent blind in one of her eyes for years, and nobody had ever caught it.”Ginette: “I’m Ginette.”Curtis: “And I’m Curtis.”Ginette: “And you are listening to Data Crunch.”Curtis: “A podcast about how data and prediction shape our world.”Ginette: “A Vault Analytics production.”Curtis: “Data Crunch is again brought to you by data.world, the social network for data people. Discover and share cool data, connect with interesting people, and work together to solve problems faster at data.world. 
Did you know that you can add files via URL to your data sets on data.world? Data.world APIs allow you to pull live survey data into your data set, enable automatic file updates, and more. Get the full details on data.world APIs at docs.data.world, or search ‘Austin Cycling Survey’ on data.world to see live survey sync in action in Rafael Pereira's data set!” Ginette: “One quick reminder that our data competition is currently up on data.world. Be sure to post your submissions by May 5. “Okay, now back to the story. If you know someone who’s about to have a child, has a child five or under, or plans to have children, you need to send them this episode, and you’re about to find out why from this man, Bryan Shaw.” Bryan: “When Noah was three months old, we started noticing that a lot of his pictures had white pupillary reflections, what doctors call leukocoria, white core, white pupil, and that can be a symptom of a lot of different eye diseases.” Ginette: “You probably put this together, but Noah is Bryan’s son. And to add in Noah’s mom’s perspective here, when she started noticing this strange white reflection in Noah’s eyes, like most moms today, she aggressively searched the Internet for answers. Like Bryan said, leukocoria could indicate a disease, or it could indicate nothing, but the Shaws decided they needed to tell their pediatrician about what they’d found.” Bryan: “Noah passed all his red reflex tests, until we told his pediatrician that we noticed leukocoria, and he had a very good pediatrician—Pearl Riney, Cambridge, Massachusetts. And then she really looked really, really closely. And on that test, she noticed a white pupillary reflection and immediately sent us that afternoon to an ophthalmologist.” Ginette: “At this point, Bryan’s wife, Elizabeth, was freaking out because she’d done all the research about leukocoria, or white eye, and she knew what white eye might mean for their four-month-old son.” Bryan: “In Noah's case,

4/29/2017 • 25 minutes, 10 seconds

How Many Slaves Work for You?

If someone came up to you and randomly asked you, "How many slaves work for you?" maybe you'd think, "Slavery ended a long time ago, Bro." Or maybe you would take the question seriously. With 20 million to 46 million people enslaved in the world, it is a serious question, and while we don't see it daily, some of these enslaved people make things for us. Even if we're judicious about what we buy, we would be surprised just how much global slavery goes into producing the goods we do buy. But how can we quantify it? How can we solve this? Justin Dillon, who has worked with the U.S. State Department and hundreds of businesses, thinks he has the answer. Transcript: Ginette: “Our world today is an extremely vast, complicated, and interconnected web of 7.5 billion people. We’re directly connected to some, and it’s really easy to see those connections on Facebook, Instagram, Twitter, LinkedIn. But there’s a whole other group of people we are much more subtly connected to—people who are essentially working for us but are basically invisible to us, 20 to 46 million of them. “Our guest today deals with this invisible web every day.” Ginette: “I’m Ginette.” Curtis: “And I’m Curtis.” Ginette: “And you are listening to Data Crunch.” Curtis: “A podcast about how data and prediction shape our world.” Ginette: “A Vault Analytics production . . .” Ginette: “Today’s episode is brought to you by data.world, the social network for data people. Discover and share cool data, connect with interesting people, and work together to solve problems faster at data.world. Quickly locating data, understanding it, and combining it with other sources can be difficult. The data.world Python library allows you to bring data.world datasets straight into your workflow. Easily work with data and metadata in your Python scripts and Jupyter notebooks. Ready to dive in? Learn how to use data.world’s Python library at meta.data.world.”
Curtis: “Before we get going, one other note about data.world—starting today until May 5th, we are hosting a data competition on their site, and we’d love your participation. Donald Trump’s tweets have been the source of a lot of media attention recently—many high profile news outlets have asserted his tweets show signs of authoritarianism, some say he’s using his Twitter account to shape the news cycle, and some have even built algorithms to make stock market decisions based on his tweets. Whatever your stance is on the subject, we’ve uploaded a dataset of every single one of his tweets to data.world, and we want to see what you can make of the data. This is a creative competition by nature—submissions can be of any format, but the point is we want to see what you can learn, assert, or create with this data set. It’s easy to participate—just go to data.world/datacrunch, and you’ll find the dataset and all of the details. Submit by May 5, and we’ll take the submissions that tell the most compelling stories and feature them on a future podcast episode.” Ginette: “Now back to the story. A few months ago, I ran across a website. It sucked me in. It asked me a provocative question, which we’ll get to in just a second, but first, we’ll introduce you to the man who’ll situate the story for you—the main person behind the website.” Justin: “My name’s Justin Dillon. I’m the founder and CEO of Made in a Free World. We started off years ago. I would say probably the genesis for us was me getting a call from the State Department in about 2010. I’d already been doing some projects, a few websites and films that I was producing, around human trafficking and modern-day slavery.” Curtis: “Justin directed a documentary he released in 2008 called ‘Call + Response,’ which ranked as one of the top documentaries in 2011.” Justin: “And the State Department called and said, we would like to do a project with you, we like the way that you use data and tell stories,

4/15/2017 • 20 minutes, 14 seconds

Predicting the Unpredictable

We now know black swans exist, but Europeans once believed that spying one of their kind would be like stumbling across a unicorn in the woods—impossible. Then, Willem de Vlamingh spotted black swans in Australia, and this black bird, which once represented the impossible to Europeans, shifted to represent the unpredictable. One company now dons the name "Black Swan." Find out how it aims to predict what we currently consider to be unpredictable.TranscriptGinette: “Submerse yourself in early 1600s London culture for a minute. Shakespeare’s alive and in his late career. The first permanent English settlement in the Americas just happened. Oxygen hasn’t been discovered yet. But a lesser known cultural idiosyncrasy has to do with a large white bird, the swan. In Europe, the only swans anyone had seen or heard about were white, so of course, in their minds, a swan couldn’t be any other color. From this concept, a popular saying develops, originally stemming from a poem. You use it when you want to make a point that something either doesn't exist or couldn’t happen. You’d say something like this: ‘you’re not going to find out because it’s about as likely as seeing a black swan,’ meaning that, that thing or event was impossible.“But then a discovery blows everyone’s minds. Dutch explorer Willem de Vlamingh is sent on a highly important rescue mission. A lost ship with 325 people on it probably ran aground near Australia, and they needed him to go rescue these people and the goods on board. While Willem and the three ships under his command go and search Australia for this lost ship, they find lots of fish; unique trees; quokka, a cat-sized kangaroo-like creature; and . . . black swans. This last discovery inevitably permanently shifts the meaning of this saying. After this, people start using it more to say when something’s highly unlikely or an unpredictable moment. 
“Now this concept of an unpredictable moment is why Steve King named his company Black Swan, because they predict the seemingly unpredictable.” Ginette: “I’m Ginette.” Curtis: “And I’m Curtis.” Ginette: “And you are listening to Data Crunch.” Curtis: “A podcast about how data and prediction shape our world.” Ginette: “A Vault Analytics production.” Steve King: “I am Steve King; I’m the CEO of Black Swan. Black Swan is 250 people who focus on trying to predict consumer behavior using data science, artificial intelligence, and big data. We have lots of large clients. We mostly work with big companies that have big problems to solve. Our work sort of splits across the US and the UK. Black Swan is absolutely full of stories. A lot of the work we really do is finding a hard problem that no one’s really solved before and then using data science to crack it, but they’re always quite interesting stories because, you know, they’re stories of a little bit of adventure, luck, and skill.” Ginette: “The UK’s Sunday Times has consistently placed Black Swan on its lists: in 2014, it was on the ‘Ones to Watch’ list in its Tech Track. In 2015, it was ranked number one on the Start-Up Track. And in 2016, it was ranked number one in the Export Track 100, because it had the fastest growing international sales for the UK’s small to medium enterprises. “So what’s the secret sauce to the rapid growth and success of Black Swan, a company that solves problems for large companies in many different industries? It turns out, they aim to be better than anyone else at accessing and crunching a specific data source.” Steve: “The reason we’re quite broad is it actually sits on one simple idea, and the simple idea really is that the Internet is really the world’s biggest data source, and we call, we call the Internet the world’s biggest focus group. 
So pretty much every opinion of a consumer or the open data that governments are laying out is all there for you to consume, but the, the trick is can you consume it in a way to help you find patterns so you can ...

4/2/2017 • 21 minutes, 16 seconds

The Golden Age of Data Science

How did one boy's stuffed yellow elephant permanently intertwine itself in history? What is a data scientist? Why is right now the golden age for data science? We take a crack at all three of these questions—the second two, with the help of Gregory Piatetsky-Shapiro and Ryan Henning. Transcript: Ginette: “Over the past few years, we’ve seen these news flashes: “An article in Harvard Business Review in 2014, titled: Data Scientist: the Sexiest Job of the 21st Century “Mashable’s article in 2015: So You Wanna Be a Data Scientist? A Guide to 2015’s Hottest Profession “Business Insider, 2016: Data Science was the #1 Profession as Rated by Glassdoor “A data science industry observer, KDnuggets, 2017: Data Scientist: Best Job in America, Again, which cites the most recent Glassdoor report outlining the very top jobs in America: “It turns out, four of the five top US jobs deal with data. In descending order, we find data scientist, devops engineer, data engineer, and analytics manager.” Curtis: “With four out of five of these top jobs orbiting data, clearly something’s going on here.” Ginette: “I’m Ginette.” Curtis: “And I’m Curtis.” Ginette: “And you are listening to Data Crunch.” Curtis: “A podcast about how data and prediction shape our world.” Ginette: “A Vault Analytics production.” Ginette: “Today is a culmination of everything we’ve talked about in our series on the history of data science. This is where all the contributions of Florence Nightingale, William Playfair, Ronald Fisher, Ada Lovelace, and many others come together in one place. We’ll add a couple more people to this list to answer these two questions: ‘What is a data scientist? And why is right now the golden age of data science?’” Curtis: “According to IBM, ‘everyday, we create 2.5 quintillion bytes of data.’ But what does a quintillion actually look like? “Well, if you take one quintillion pennies, you could actually place them face up, end to end, and blanket the entire surface of the earth 1.5 times over. 
Or think about one quintillion ants. That would be like taking all of the ants that exist today on planet earth according to some estimates, and then you have to take that number and multiply it by 100. So, that ant pile in your front yard becomes 100 ant piles in your front yard. Basically ants take over the earth. And we make 2.5 quintillion bytes every single day! “The next question is, how much information does that actually represent? It’s 250,000 times the amount of information that all the printed material in the Library of Congress contains. And we make that every single day.” Ginette: “In 2013, SINTEF published this stat, quote: ‘90% of the world’s data has been created in the preceding two years.’ According to one Ph.D. technologist, this has been true for the last 30 years because every two years, we produce 10 times as much data.” Curtis: “This exponential growth is insane. Just as an example of this type of growth rate, if you take a hypothetical scenario, and you take the world’s population, and say it starts growing as rapidly as data is growing now, it would look like this: Currently, the world’s population, 7 billion people, could fit in the size of Texas if they were living as densely as they do in New York City. Now, in two years’ time with this growth rate, you’d actually have to cover the entire United States and half of Canada with people living in New York City-like density. And if you extrapolate that out ten years keeping the growth rate the same, you’d have to cover the entire planet, including all of the oceans, with New York Cities, and then you’d have to do that with 100–150 additional earths to fit all of those people. That’s the kind of growth rate we’re talking about.” Ginette: “With data collection on the rise, one report goes so far as to say that only the data literate will have the chops to be executives in the future, quote:
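The "90% in the preceding two years" figure in this excerpt follows directly from tenfold growth. The short Python sketch below checks that arithmetic. It assumes one reading of the transcript, namely that the cumulative amount of data multiplies by ten every two years; the function names are hypothetical and chosen only for illustration.

```python
def share_created_in_last_period(growth_factor: float) -> float:
    """Fraction of all existing data that is newer than one growth period,
    assuming the cumulative total multiplies by growth_factor each period."""
    if growth_factor <= 1:
        raise ValueError("growth_factor must be greater than 1")
    # If the total is now T, one period ago it was T / growth_factor,
    # so the newest period contributed T - T / growth_factor.
    return 1 - 1 / growth_factor


def growth_over(years: float, growth_factor: float, period_years: float) -> float:
    """Total multiplication after `years` at a constant per-period growth factor."""
    return growth_factor ** (years / period_years)


# Tenfold growth every two years (the transcript's figure).
print(f"{share_created_in_last_period(10):.0%}")  # 90%
# The same rate sustained for ten years is five periods of tenfold growth.
print(f"{growth_over(10, 10, 2):,.0f}")  # 100,000
```

Under this reading, the 90% statistic holds in any year as long as the tenfold rate continues, which matches the claim that it has been true for decades.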

3/18/2017 • 25 minutes, 6 seconds

The Curated History of Data Science, Part 3

From a small building in Pennsylvania to widespread usage across the world, we track the compelling story of one of the greatest technological innovations in history, setting the stage for the age of data science.Ginette: “I’m Ginette.”Curtis: “And I’m Curtis.”Ginette: “And you are listening to Data Crunch.”Curtis: “A podcast about how data and prediction shape our world.” Ginette: “A Vault Analytics production.”Ginette: “Today our story starts at a business building.” Curtis: “The building is in Philadelphia, Pennsylvania, on Broad and Spring Garden Streets to be precise. Envision the late 1940s.” Ginette: “You see a man absorbed in thought entering the building, and you decide to follow him in.”Curtis: “When you walk through his office, you find some bright engineering minds working on a fairly new startup in town: the Eckert-Mauchly Computer Corporation, or EMCC. It turns out, this is the very first large-scale computer business in the United States.”Ginette: “While this business environment on the surface is vibrant and innovative, behind the scenes, it’s a pressure cooker full of confusion.” Curtis: “The owners, John Mauchly, who you followed into the office, and his business partner, J. Presper Eckert, are talking about something strange that’s been happening: most of their clients had been from the government, and now they’re quietly pulling away from doing business with EMCC without any explanation, which is both alarming and confusing to the business owners. It’d be one thing if the government gave a reason each time it pulled out of a contract, but without one, they have no idea what’s wrong or how to try and fix the situation. 
It’s like going through several breakups where the only explanation offered is, ‘it’s not you; it’s me.’ “So what’s actually going on here?”Ginette: “The answer is woven into John’s backstory, a backstory that also includes the story of the ENIAC, the very first fully electric general purpose computer.“In John’s earlier career, he was involved with scientific clubs and academia. He started as an engineer and eventually became a professor at the prestigious Moore School of Engineering at UPENN. At one point, he got lucky. He asked essentially this question to the right military person on campus: what if I could build a machine that would significantly reduce your trajectory calculation time for projectiles?”Curtis: “So the military ends up formally accepting his proposal, and John and Presper team up for three years on this top-secret military project to build the ENIAC. “At the time, the ENIAC is really impressive in both size and ability. It weighs about the same as nine adult elephants, which is 27 tons, and it has about 17,500 vacuum tubes, each about the size of your average household light bulb. It has 5,000,000 hand-melted joints. And it’s the size of a small house—about 1,800 square feet. And in today’s dollars, it costs about $7 million.“It’s the very first of its kind. It’s both completely electric and a general purpose machine, meaning you can use it to calculate almost anything as long as you give it the right parameters. The bottom line is that it’s a lot faster than anything before it. It’s 2,400 times faster than human computers, and 1,000 times faster than any other type of machine computer at the time. For example, it took the calculation of a 60-second projectile down from 20 hours to just 30 seconds. To understand the magnitude of this, it's like moving from an average snail’s pace to the average speed of a car on a highway.”Ginette: “Here’s another way to look at this: if you drive your car (the ENIAC) across the country from L.A. 
to New York City at about 70 miles per hour without stopping, it would take you a little over a day and a half to drive there. In contrast, it’d take a snail (the human computer) without stopping about 11 years.” Curtis: “So it turns out the ENIAC isn’t ready in time f...
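The speed comparison in this excerpt can be checked with a few lines of arithmetic. The Python sketch below derives the 2,400x figure from the trajectory timings and applies it to the road-trip analogy. The route length of 2,800 miles is an assumption (the episode gives no distance), and the variable names are illustrative only.

```python
# Trajectory calculation times quoted in the episode.
HUMAN_SECONDS = 20 * 60 * 60   # 20 hours by hand
ENIAC_SECONDS = 30             # 30 seconds on the ENIAC

speedup = HUMAN_SECONDS / ENIAC_SECONDS
print(f"{speedup:,.0f}x")  # 2,400x

# Road-trip analogy: the car stands in for the ENIAC, the snail for a human computer.
ROUTE_MILES = 2_800  # assumed L.A. to New York driving distance, not from the episode
CAR_MPH = 70         # speed quoted in the episode

car_hours = ROUTE_MILES / CAR_MPH
car_days = car_hours / 24
print(f"{car_days:.2f} days")  # 1.67 days

# A traveler 2,400 times slower needs 2,400 times as long.
snail_hours = car_hours * speedup
snail_years = snail_hours / (24 * 365)
print(f"{snail_years:.1f} years")  # 11.0 years
```

With the assumed distance, the results line up with the episode's figures: a little over a day and a half by car and about 11 years at a pace 2,400 times slower.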

3/1/2017 • 19 minutes, 6 seconds

The Curated History of Data Science, Part 2

She isn’t your typical English girl from the early 1800s. She’s a girl who, because of her fortunate and unfortunate family circumstances, ends up perfectly situated to become part of something that will revolutionize the world. Ginette: “For many reasons, she isn’t your typical English girl from the early 1800s. She’s a girl who at one point examines birds to discover their body-to-wing ratio so she can invent a flying machine and write a book about it. These are goals that show mathematical skill, creativity, and initiative. She’s also a girl who, because of her fortunate and unfortunate family circumstances, ends up perfectly situated to become part of something that will revolutionize the world.” Ginette: “I’m Ginette.” Curtis: “And I’m Curtis.” Ginette: “And you are listening to Data Crunch.” Curtis: “A podcast about how data and prediction shape our world.” Ginette: “A Vault Analytics production.” Curtis: “In our last episode on the history of data science, we talked about the origins of charts and data visualization, which are an important part of data science, but in today’s story, we’re going to start a new thread that’s absolutely essential to the fabric of this history. We’re going to talk about some brilliant inventors that gave rise to an idea that would change the course of history—arguably one of the most powerful ideas that has shaped our modern world. It’s a story of triumph and innovation, but also of tragedy, because even though the ideas they moved forward had a dramatic effect on all of us in the long run, in the short term, many of these people saw their dreams fall apart before their eyes. So today and in our next episode, we pay homage to some key people who started the wave that gave us technology that makes our modern lives possible. And we’re going to do that first by getting back to the story of the girl we mentioned in the intro.” Ginette: “Interestingly enough, this episode ties into our last episode in an unexpected way. 
The little girl we introduced to you earlier is born about the same time as Florence Nightingale. She’s about five years older. “We have to understand a little bit about her parents, Annabella and George, to have a better insight into her, so here’s a peek into their lives: They’re both highly intelligent, capable, and well-educated, and they’re from high society. George is more verbal and artistic, and Annabella is more logical and mathematical. “From the start, the pair is not a good match. Annabella sees George’s flaws, but she also sees George’s potential. Beyond that, Annabella is probably attracted to his very handsome (as a lot of people describe him), bad-boy, wild-and-wooly type. One good example of his rebellious nature and disdain for authority is how he exploits a loophole in college to flout what he considers is an absolutely outrageous school rule: since the university won’t let him bring his cherished pet dog with him, he defiantly keeps in his Cambridge University apartments a tame pet bear. Essentially, as loopholes work, the rule doesn’t explicitly say no pet bears, so the university in his mind can’t immediately do anything about it—this may be partly why he only lasts there a term. Anyway, these are the types of things Annabella thinks she can change about George. “On George’s side of things, he notices Annabella’s sharp intellect. She’s incredibly smart. From early childhood, her parents recognize her natural brilliance and essentially give her what most women can’t get in those days—the equivalent of a Cambridge University education. Something else George likes about Annabella is that she’s down to earth. So eventually, he proposes to her, and probably against her better judgement, she says ‘yes’, and they get married, but within a year, things get messy.“She notices George’s strange behavior. He’s dark, he’s angry, he’s brooding. And over time, he starts doing other odd things and even lashes out at her.

2/16/2017 • 22 minutes, 37 seconds

Eyes on the Pirates, Part 2

Pirates in folk stories and popular movies conjure up strong imagery: eye patches, Jolly Rogers, parrots, swashbuckling, scruffy voices that say “Aye, Matey.” But what do the lives of successful pirates look like today? And what's being done to stop them from plundering and smuggling our ocean's precious resources? World Wildlife Fund's project Detect IT: Fish takes aim at these pirates and other illegal actors with this cutting-edge project that reduces a time-consuming tracking process from days to minutes. Ginette Methot-Seare: “After nearly 15 years of lucrative, illegal activity, he was caught and convicted. The judge in this key case stated that his business activities were an ‘astonishing display of the arrogance of wealth and power.’ He destroyed evidence, and while under investigation, even hired a private eye to follow an agent around. After serving prison time, the main perpetrator and his accomplices were ordered to pay 22.5 million dollars in restitution to South Africa for the damage they had done.” Curtis Seare: “Who was this man? Arnold Bengis, a modern-day pirate.” Ginette: “I’m Ginette.” Curtis: “And I’m Curtis.” Ginette: “And you are listening to Data Crunch.” Curtis: “A podcast about how data and prediction shape our world.” Ginette: “A Vault Analytics production.” Ginette: “Believe it or not, these episodes take hours and hours of hard work to produce, and the success of this show depends in large part on the listener reviews and ratings we get. If you like what we do, the best way to support us is to go to iTunes, Google Play, or your favorite medium for getting the episodes, and leave us a review. 
“If you’re willing to do that, a big thank you in advance, and a big thank you to those who have already done it. “At the end of our last episode, we promised you the story of one of the biggest pirate busts in history, and we will deliver, but before we go on, if you’re new to Data Crunch, you may want to start with the last episode, which will give you more background and context. “By some accounts, this is what happened: Arnold Bengis became incredibly wealthy after growing a business in South Africa. He had a house in Bridgehampton, New York, worth several million dollars, an apartment in the Upper West Side of Manhattan on the 41st floor, and a house in Four Beaches, an exclusive neighborhood in Cape Town, South Africa. “His 6,000-plus square foot Bridgehampton house, a large Spanish-tile stucco villa, overlooked the beautiful Mecox Bay to one side and the Atlantic ocean on the other. His six bedroom, seven full bathroom single-family home had what you’d expect to find at a palatial place: a well-manicured golf green; a luxurious pool; large, well-decorated rooms with chandeliers; and expensive furniture. When the house last sold, it went for 10 and a half million dollars. One of the agents of the National Oceanic and Atmospheric Administration, or NOAA, who investigated Bengis’s case even said he was in partial awe of the lifestyle Bengis was living, which was supported by an illegal fishing business. “Bengis held his money, both personal and business, in a highly complex network of trusts and asset havens. The money was scattered abroad in many different places, like Switzerland, Gibraltar, Jersey Islands, and Britain. While authorities didn’t know everything about his money, what they did know was that he had vast assets. For example, in just one year, he deposited $13 million into one of his accounts. His lawyer said that one of his several trusts was worth more than $25 million, according to the book Hooked: Pirates, Poaching, and the Perfect Fish. 
“I know what you’re probably thinking: ‘How did this man make so much money from illegal fishing?’ We told you in our last episode that IUU fishing rakes in between $10 billion and $23.5 billion a year, and that’s a conservative estimate. The larger picture is this: When you consider that the entire world’s trade...

1/31/2017 • 21 minutes, 59 seconds

Eyes on the Pirates, Part 1

The history books teach that slavery ended, but it still exists; it’s just morphed its form—different commodity, different location, but same abuses. The commodity is seafood. The location, Southeast Asia. The abuses, forced servitude with all its ugly associations. Some people make a substantial living off illegal, unregulated, and unreported (IUU) fishing, which fuels a dark underground. How is big data angling to stop it? Find out in our next two episodes.Transcript:Michele Kuruc: “People who were seeking better lives and, and coming to look for work were kidnapped by unscrupulous dealers, who forced them into lives we can’t even imagine.”Ginette Methot: “I’m Ginette.”Curtis Seare: “And I’m Curtis.”Ginette: “And you are listening to Data Crunch.”Curtis: “A podcast about how data and prediction shape our world.”Ginette: “A Vault Analytics production.”Ginette: “Welcome back to Data Crunch! We took a bit of a break over the holidays, and we hope you were able to too. “So upward and onward to 2017. What are we up to this year? We’ll be finishing our data science history miniseries for you, and we’ll be meeting some really cool people from KDnuggets, Galvanize Austin, and Datascope in Chicago. But before we do those episodes, we have to pivot because with major recent developments, this particular episode deserves to come out now.“The lives we can’t even imagine look like this according to the Associated Press. One Burmese man left his village when he was 18 years old. He followed a recruiter who promised him a construction job. When he arrived in Thailand, his captors held him with little food or water for a month. He was then forced onto a fishing boat. He was told that he was sold and would never be rescued. In that fishing environment, sometimes he worked 24-hours a day. He and his fellow fishers were whipped with stingray tails and shocked with electric devices. 
They were told during their time fishing that they would never be let go, not even when they died, and men in similar situations were sometimes sold from ship captain to ship captain. “If they tried to escape the work, they were locked in cages on remote islands. In the 22 years he was away from home, he asked to go home twice. The first time he asked, the company official chucked a helmet at his head, which left a bloody gash that he had to hold closed. The second time he begged to go home, he was chained to the boat deck for three days in the blistering sun and when the night came, it was rainy, and he could do little to protect himself from it. During that three-day period, he had no food. He amazingly fashioned a lock pick and unlocked his shackles. He knew if he was caught, he’d be killed, so he dove into the water in the cover of night and swam ashore, hiding for his life. “You might ask why he didn’t go to local officials. The answer is he couldn’t because they might sell him back to the ship captains. So after eight years in the jungle hiding from the fishing companies, he finally got to go home because of the AP’s reporting. This is modern-day slavery. Every year, thousands of people are tricked or sold into this type of slavery in order to catch fish for lucrative markets. “If you’ve ever read Solomon Northup’s gripping autobiography, Twelve Years a Slave, the similarity is eerie. They are both free men who are initially unknowingly abducted. They’re shackled, beaten into servitude, and forced to work in harsh conditions for many, many years. Both are desperate to go home to their families, and both experience miraculous escapes from tyrannical systems. But unfortunately, not everyone escapes. “This is a huge problem, and it’s frequently linked to illegal, unregulated, and unreported fishing, well known as IUU fishing. Unfortunately, IUU fishing is linked to some of the ugliest transnational crimes: modern-day slavery, human trafficking, drug trafficking,

1/13/2017 • 30 minutes, 55 seconds

The Curated History of Data Science, Part 1

Who were the people pushing the limits of their time and circumstances to bring us what we know today as data science? We examine what motivated them to do their important work and how they laid the foundations for our modern world where algorithms and analytics affect everything from communications to transportation to health care—to basically every aspect of our lives.This is their story.Transcript:Ginette: “She was obsessed with her failure—she thought she hadn’t done enough. And it didn’t matter that the public saw her as a heroine. So she ended up writing an 830-page report where she employed some power graphics, and this paired with her other efforts ended up changing the entire system.”Ginette and Curtis: “I’m Ginette, and I’m Curtis, and you are listening to Data Crunch, a podcast about how data and prediction shape our world. A Vault Analytics production.”Ginette: “In our last three episodes, we have just thrown you into the middle of data and prediction and the explosion of data science. And some of you have had some questions, like, How did data science become a thing?“In the next three episodes, we’re doing a miniseries where we’re going to address some of these questions, and I think you’ll find it very interesting. Our story starts with an impressive woman. “It’s 1854. It’s the Crimean War, and a woman shows up at a hospital to help. She finds horrifying conditions. To paint an accurate picture for you, here’s a little bit of what she found: the sewage and ventilation systems were broken; the floor was an inch thick with waste—probably human and rodent; the water was contaminated because, come to find out, the hospital was built over a sewer; rats were hiding under beds and scurrying past, as were bugs; and the soldiers’ clothing was swarming with lice and fleas; and on top of that, there were no towels, no basins, no soap, and there were only 14 baths for 2,000 soldiers. Keep in mind this was 20 years before Pasteur and Koch spread Germ Theory. 
“So she and the 37 nurses that she brought with her set to work, and they did their best to clean up the hospital and help the soldiers. Eventually, because of her, the government sent a sanitary commission. They flushed the sewers; they improved the ventilation. And this helped the situation dramatically. In the end, she reduced the death rate by two thirds. “But Florence Nightingale went home feeling like she had failed, which you’ll remember we mentioned right at the beginning of the podcast. She felt a lot of soldiers had died needlessly. This drove her to write her famous 830-page report. And she ended up working with lead statistician William Farr, who actually helped invent medical statistics. He would say to her, ‘We don’t want impressions, we want facts.’ And working under that type of context, she gathered vast amounts of complex army data and analyzed it to find something rather shocking: 16,000 of 18,000 deaths in hospitals were not due to battle wounds but to preventable diseases spread by poor sanitation. “So these statistics completely changed her understanding. She thought the deaths were due to inadequate food and lack of supplies, but after the sanitary commission came in, she noticed that the mortality rate dropped significantly. So as Florence prepared her report, she was afraid that people’s eyes would glaze over the numbers and that they wouldn’t grasp the significance of what she was trying to say. So she came up with a clever way to present her data: she ended up using graphics, in particular what she’s known for, the rose chart, to convey her message.” Curtis: “Nowadays, charts are everywhere, but back in her day, the idea of creating a picture that was defined by certain data points was not very common, and so the fact that Nightingale thought to do this was very innovative and clever, and it was important because it was able to communicate what she needed to communicate. “Her mentor,

12/9/2016 • 12 minutes, 45 seconds

The Predictive Power of Waffles

When breakfast food takes on hurricanes, who wins? For another interesting take on the Waffle House Index, see this article on the FiveThirtyEight blog, which they posted December 6, 2016. Curtis: “I love waffles. I fill up each of the little squares with the precise amount of syrup so that each bite is a perfect distribution of syrupy goodness.” Nathan: “I love owl-shaped waffles.” Tiffany: “The kind you get at a hotel when they serve you those free breakfasts—they’re just perfect.” Lily: “I love waffles with strawberries.” Vince: “Liège waffles—Belgian waffles were pale in comparison. They’re sugar clumps in the shape of pearls, and they put this in the batter, and it doesn’t dissolve out, and they taste really good. I didn’t even need to add syrup.” Ginette: "I'm Ginette, and I’m Curtis, and you are listening to Data Crunch, a podcast about how data and prediction shape our world. A Vault Analytics production." Curtis: “Today we’re talking about hurricanes, waffles, and predictions.” Ginette: “It happened in 2004. Charley, Frances, Ivan, and Jeanne were four aggressors. With the group’s combined strength, they wrecked their victims. First, Charley attacked and was the most destructive. Frances followed quickly behind with a much weaker pummel, but, being so quick on the heels of Charley, the attack was effective. Then came Ivan with an unexpected one-two punch. And finally, Jeanne forcefully hit the same spot as Frances—but with much more intensity. “To some, this wrecking ball of an attack is known as the Year of the Four Hurricanes. These four hurricanes ruthlessly shredded Florida’s east coast, west coast, panhandle, and interior in about six weeks, leaving $29 to $41 billion in damages. As a point of comparison, if Google had to cover these costs, it would take two to three years of the organization’s net income. 
Next to Hurricane Andrew, (the most destructive hurricane in US history at the time)—Charley claimed second-place that year.“Charley obliterated mobile homes, savaged houses, knocked over water towers, caused the collapse of carports, obstructed roads by littering them with large trees and power poles, blew over semi-trucks, crushed large trailers, and rendered areas unrecognizable.“We spoke with a couple that experienced a hurricane first hand, and their ordeal sounds harrowing.”Melody Metts: “I don’t think we expected anything that we found when we came back. You couldn’t even recognize where you were.”Ginette: “Christopher and Melody Metts lived within twenty miles of Homestead, Florida, where Hurricane Andrew hit with full fury.”Christopher Metts: “There was nothing taller than the first floor. Any tree, any light pole, any anything that might have been higher than the first floor of a house was completely gone. Anything that would indicate where you were—a street sign, a light—it was all gone as far as you could see.”Ginette: “Like most south Florida residents, they didn’t think much of the storm predictions.”Christopher: “We saw it, and the predictions for it for many days.“Because we were in south Florida and because every hurricane season that comes along has scares that could be very devastating but it’s a near miss or it turns at the last minute, you get into a pattern of they cry wolf too often and you’re lulled into a sense of ‘well not this time.’”Ginette: “While this was their initial feeling, eventually the predictions became serious enough that the authorities issued an evacuation order, so the Metts prepped their house for wind damage and drove to Orlando with seven children in tow, ages one to eight, and it’s a good thing they did because their family would have been in extreme danger otherwise. This is where we start to see the power of prediction in people’s lives. 
Imagine if there had been little to no ability to predict the hurricane.”Curtis: “Before modern hurricane prediction,

11/18/2016 • 18 minutes, 6 seconds

I Had to Run

Imagine you have to leave your home immediately, and you have little time to grab anything to take with you. You don't know where you are going—you just know you have to flee for your life. Many people face a similar situation—one in every 113 people on the earth, in fact. There are 65 million people living in a state of limbo, and they don't know what's going to happen to them, but they do know they can't go home. After losing their homes, often their loved ones, and sometimes their identity, they desperately hope for safety and a new home. This episode is where data science meets refugees. Transcript: Hadidja Nyiransekuye: “It wasn’t until I started having as a teacher and a principal of a school when people come in the middle of the night to come attack my house. That’s when I decided I think I need to run again.” Ginette Methot-Seare: “I'm Ginette Methot-Seare, and you are listening to Data Crunch, a Vault Analytics production.” Hadidja: “Just think about something threatening you. Your first reaction would be to duck away from the noise or from whatever is threatening you. Now think about somebody coming with a gun or with a machete, threatening not only your life but the life of your loved ones. You run, you run. Everybody does.” Ginette: “And that’s exactly what Hadidja Nyiransekuye did twice.” Hadidja: “The first time I run, I run because I needed to run.” Ginette: “She was fleeing from bombs.” Hadidja: “It was a mass exodus. Everybody was running, so we run like everybody else.” Ginette: “Hadidja had to flee in her PJs with four children. One of them, a baby on her back.” Hadidja: “My little girl, Lydia, was eight at the time, and I had two of my nieces.” Ginette: “Her husband, who was in imminent danger, fled first. And her boys also ran before her.” Hadidja: “It was hot. We were thirsty and hungry. And these young people were perched on . . .” Ginette: “pickup trucks” Hadidja: “And they would say, ‘Keep moving, keep moving! 
There’s a nice place called Mugunga; that’s where you’ll get food and you’ll get water and you’ll get shelter.’ And I remember saying to myself, ‘People are dying of Cholera, and I’m going to Mugunga on foot—like 50 miles?’ I just didn’t think I was going to make it.” Ginette: “As a child, Hadidja had polio. One in every 200 polio cases leaves its victims permanently paralyzed. For Hadidja, while her virus didn’t paralyze her, it left her disabled. She walks with a cane and a leg brace.” Hadidja: “At the time, I actually ended up at the Center for People with Disability in the Congo because I had been treated there in my teens. And of course, you just wished people would just let you spread your mat or something you have on their door so you can spend the night there. But they were asking us to get out of the city, to go to that place where they were going to be building refugee camps, so in those conditions, you actually, you hear what other people are saying. Well you just follow because it’s not like you have a choice. Nobody knows where they are going when they are refugees. That’s why they’re called forced migrants.” Ginette: “Let me go back and fill in some holes for you. Hadidja’s story starts . . . ” Hadidja: “in the town of Gisenyi. That’s where I was born and raised.” Ginette: “Her town is right inside the border of Rwanda.” Hadidja: “It’s at the border of former Zaire, now Democratic Republic of Congo.” Ginette: “As she grew, she gained an education, became involved in women’s movements, and taught modern languages with an emphasis in applied linguistics. During that time, she married her husband, and they had four children. But then in the 1990s things became precarious in her country.” Hadidja: “People tend to think that the war in Rwanda started in ’94. Actually the war started on October 1, 1990.” Ginette: “Hadidja is referencing an invasion of a group of mostly Tutsis, a minority group,

11/1/2016 • 22 minutes, 8 seconds

Take It Back

What if one day, out of the blue, you find yourself sick, really sick, and no one knows what's wrong? This is a podcast about a sleeper illness and what one team of data scientists led by Elaine Nsoesie is doing to reduce its reach.
Sam Williamson: "It felt as if I were on some kind of hallucinogenic drug. I felt really, really hot. Really cold again. The room started spinning. I got tunnel vision. I was about to black out."
Ginette Methot-Seare: "I'm Ginette Methot-Seare, and you are listening to Data Crunch, a Vault Analytics production. Today we're going to talk about something that could affect you or someone you love if it hasn't already."
Shawn Milne: "It still is a pretty vivid memory for me just because it was such a, such a terrible thing."
Ginette: "This is Shawn Milne."
Shawn: "Both of us just booked for the bathroom because we were both throwing up."
Ginette: "He's describing a sickness that both he and a friend suffered from."
Shawn: "On the way home, we had to keep pulling the car over, and we were just both throwing up on the side of the road. It was absolutely terrible. We were just both up all night just throwing up. Just so beat."
Ginette: "While Shawn's experience lasted about 48 hours, Samuel Williamson, the person you heard speak at the beginning of our podcast, had one that lasted for about a month."
Sam: "I did go to a doctor for it after a while. They convinced me to go to a doctor. He in fact told me that my stomach was just tired, which I thought was a very strange diagnosis. So he suggested that I don't eat anything for a week. I think I lost about ten to twelve pounds in the first week, and so I went a week without eating anything, and came back a week later, and he asked me if the symptoms had gone away, and I told him 'no, they were about the same,' and he said, 'okay, well you can't eat anything else for another week.' I went about three days and then pigged out."
Ginette: "While everyone's body reacts differently to this type of sickness, stomach pain was one symptom that everyone we interviewed described."
Amy Smart: "I remember at one point, lying on my couch in excruciating pain, and thinking, ‘this is like having a baby, only with a baby, I know it's going to end.’"
Ginette: "Amy had two little girls when she got sick, and she became so ill and weak that she couldn't take care of them. Fortunately, her mom lived nearby and could take her girls during the day, and her husband was able to stay home from work to take care of her."
Amy: "I couldn't, I couldn't eat. I wanted to because my body was so depleted, but I couldn't drink. I couldn't keep anything down. We went to the ER because I was so weak, and they put me on IVs and gave me morphine for the pain."
Ginette: "But for Amy Smart, the person speaking here, things got a lot worse."
Amy: "All that was coming out both ends was blood. And I remember feeling like, 'this is what it feels like to die.'"
Ginette: "Amy described to me that it literally felt like life was leaving her body."
Amy: "I didn't know when it would end, when I would feel better again. If it would take days or weeks or ever. I remember thinking, 'I'm so glad it's me and not one of my little kids because I don't know how they would have survived it.'"
Ginette: "Now put yourself in her shoes for a second: you're sick and only getting worse. When you go to the doctor, the doctor isn't sure what's wrong."
Amy: "They first thought it was stomach flu, then maybe Giardia, then maybe salmonella, and then they cultured it and found I had E. coli."
Aside: "E. coli contamination. Possible E. coli contamination. E. coli contamination."
Amy: "By then, once it was diagnosed as E. coli, it was a relief because then they knew how to treat it, and they put me on Cipro. By then the Center for Disease Control gets involved and is interviewing and trying to match the strain."
Ginette: "Now as an interesting side note,

10/13/2016 • 11 minutes, 3 seconds