From 136e35be8418e2590859fa986a394ea5434f767c Mon Sep 17 00:00:00 2001
From: kljohnson4good <112668500+kljohnson4good@users.noreply.github.com>
Date: Thu, 3 Oct 2024 12:29:34 -0700
Subject: [PATCH] Update 104-clair-ds-teams.md with timestamps and transcript

Towards #92
Closed #258
---
 2024/104-clair-ds-teams.md | 1155 +++++++++++++++++++++++++++++++++++-
 1 file changed, 1149 insertions(+), 6 deletions(-)

diff --git a/2024/104-clair-ds-teams.md b/2024/104-clair-ds-teams.md
index e4c12a3..cb7dc09 100644
--- a/2024/104-clair-ds-teams.md
+++ b/2024/104-clair-ds-teams.md
@@ -25,11 +25,6 @@ This presentation covers:
 - The “upside-down” approach to creating a data science team
 - Suggestions for how to right the ship if your data science team was formed using the “upside-down” approach
 - Cautionary tales for the near future of data science teams and history repeating
-```
-## Timestamps
-00:00 Help us add timestamps
-```
-https://github.com/data-umbrella/event-transcripts/issues/92
 
 ## About the Speaker
 
@@ -44,6 +39,1154 @@ Dr. Clair Sullivan is currently the Founder and CEO of Clair Sullivan and Associ
 alt="Best Practices for Creating a Data Science Team" width="50%" />
 
 ## Timestamps
-[get from video]
+**0:00**
+**Introduction**
+* Reshama (host) introduces herself and gives a brief overview of Data Umbrella, a community for underrepresented persons in data science.
+* The host introduces Clair Sullivan, the founder and CEO of Clair Sullivan and Associates.
+* The host gives a brief overview of Clair's career and expertise in data science.
+
+**0:14**
+**Data Umbrella**
+* Reminder of the code of conduct for the webinar.
+* Several ways to support Data Umbrella, including following the code of conduct, sharing events with others, and contributing to open source projects.
+* Links to the Data Umbrella website, YouTube channel, GitHub, LinkedIn page and online suggestion box.
+
+**0:50**
+**Best Practices for Creating a Data Science Team**
+* Clair begins her presentation by sharing the title of her talk: "Best Practices for Creating a Data Science Team."
+* She explains that this topic is a passion of hers and that she has worked with a variety of data science teams over the past 22 years.
+
+**0:56**
+**Ugly Truths About Data Science**
+* Clair starts by discussing some of the "ugly truths" about data science: in a survey of Chief Data Officers at Fortune 1000 companies, only 40.5% said the CDO role was understood even though almost 83% of the responding companies had one, and less than 25% of companies consider themselves data-driven.
+* She cites statistics showing that more than 70% of employees have access to data they shouldn't, and that 80% of analyst time is spent discovering and preparing data.
+* She emphasizes that 87% of data science projects never make it to production.
+
+**10:12**
+**Data Fable**
+* Clair introduces the concept of the "data fable," a fictional story that illustrates the common challenges faced by data science teams.
+* She walks through the fable, which involves a CEO who hires a team of data scientists without a clear problem to solve, and the subsequent challenges that the team faces in accessing data and creating value.
+
+**14:42**
+**Lessons for the Future**
+* Clair discusses the lessons that can be learned from the data fable, including the importance of clearly defining the problem to be solved, hiring the right people in the right order, and establishing a data governance framework before hiring data scientists.
+
+**16:11**
+**The Upside Down Approach**
+* Clair introduces the "upside down approach" to building a data science team: hiring people first, then collecting data, then looking for a problem to solve, and only last addressing culture.
+* She explains that this order, which is the one followed in the data fable, is the reverse of what works, and that culture and problem definition should come before data collection and hiring.
+
+**21:02**
+**Culture**
+* Clair emphasizes the importance of culture in creating a successful data science team.
+* She explains that culture starts at the top and that companies need to understand the value of data and use it to solve business problems.
+
+**22:11**
+**The Recommended Approach**
+* Clair provides a step-by-step guide to the recommended approach for creating a data science team:
+    1. Culture
+    2. Problem
+    3. Data
+    4. People
+
+**25:24**
+**Data Culture**
+* Clair discusses the characteristics of a good data culture, including:
+    * Collective behaviors, beliefs, and practices
+    * Data being woven into the operations and mindset of the organization
+    * Everyone having a role in data and its use
+
+**28:21**
+**Cultivating a Data Culture**
+* Clair suggests ways to cultivate a data culture, including:
+    * Encouraging participation from everyone
+    * Investing in training and development
+    * Recognizing the role of data in decision-making
+    * Avoiding assumptions and biases
+
+**29:49**
+**Strategy Specific to Data**
+* Clair discusses the importance of having a clear data strategy that includes:
+    * Defining the business problem
+    * Identifying the right data
+    * Measuring ROI
+    * Communicating the strategy and its success to stakeholders
+
+**30:54**
+**Return On Investment (ROI)**
+* Clair emphasizes the importance of quantifying ROI when it comes to data science initiatives.
+* She suggests using a variety of metrics to measure success, depending on the specific problem being solved.
+
+**34:04**
+**Other Ways Data Can Contribute**
+* Clair explains that data can contribute to business value in both direct and indirect ways, such as increasing revenue, decreasing costs, and improving efficiency.
+
+**34:13**
+**Communicating Value**
+* Clair stresses the importance of clearly communicating the value of data science projects to stakeholders.
+* She suggests stating upfront how success will be measured and providing regular updates on progress and outcomes.
+
+**35:04**
+**Business Knowledge**
+* Clair discusses the importance of data scientists understanding business concepts and being able to communicate with business stakeholders.
+* She recommends that data scientists consider pursuing an MBA or other business-related education.
+
+**35:30**
+**Generative AI**
+* Clair raises concerns about the current hype surrounding generative AI and emphasizes the importance of defining a clear business problem before investing in this technology.
+* She warns that generative AI is not a panacea and that it can be expensive and difficult to maintain and monitor.
+
+**40:08**
+**Empowering the Organization**
+* Clair discusses the importance of empowering everyone in the organization to use data and make data-driven decisions.
+* She emphasizes the role of managers in creating a culture of empowerment and providing training and support.
+
+**41:33**
+**Conclusion**
+* Clair concludes her presentation by emphasizing the importance of focusing on the business problem and creating a positive data culture in order to build a successful data science team.
+* She invites participants to ask questions and connect with her on LinkedIn.
+
+**42:07**
+**Q&A**
+* **Question:** What are some red flags to look for in data science job postings?
+* **Answer:** Job postings that do not clearly state the business problem, the required skills, or the expected time spent on maintenance and development.
+* **Question:** What are the best roles for defining business problems that can be solved by data?
+* **Answer:** Product managers and business managers are typically best equipped to define business problems.
+* **Question:** How can data science managers advocate for data science and manage expectations?
+* **Answer:** By demonstrating quantifiable improvements resulting from data science projects and by educating business leaders about the capabilities and limitations of data science.
+* **Question:** Do smaller companies use data well?
+* **Answer:** Yes, smaller companies often have the advantage of being able to implement data solutions more easily and effectively than larger companies.
+* **Question:** What differentiates senior data scientists from junior data scientists?
+* **Answer:** Senior data scientists are able to identify and propose solutions to problems independently, while junior data scientists may require more guidance and support.
 
 ## Transcript
+
+
+Expand section to view full transcript
+
+```
+0:00 hey everyone welcome to data umbrella's webinar I'm going to do a brief
+0:07
+introduction about five minutes and then Clair will do her talk and uh if you have any questions ask in the chat and
+0:14
+we will answer questions at the end of um Clair's presentation this is being
+0:19
+recorded and it will be up on our YouTube within a day or
+0:25
+so um Data Umbrella is a community for underrepresented persons in data science we organize data science and open source
+0:32
+events we are a nonprofit organization this is the team that makes
+0:37
+it happen my name is Reshama Beryl is out of Nairobi Christina is out of Brooklyn and
+0:43
+suum is out of India we have a code of conduct and we
+0:49
+thank you for helping to make this a welcoming friendly Community for all that people want to keep returning
+0:55
+to there are various ways to support data umbrella the first and foremost is to follow our code of conduct and we'd
+1:01
+love if you could share our events with other people in your network and we have some open source projects as well um
+1:07
+where people can contribute to um our we we do time stamps for our videos we have
+1:13
+a new website we have a Sprint website um check out our
+1:18
+GitHub uh you can also donate to our nonprofit we're on open Collective and you can find us
+1:25
+there um we have quite a library of videos um we have over 120 videos now
+1:32
+and so um you know subscribe to our YouTube channel to see all the other events we have a lot of libraries such
+1:39
+as career advice playlists um data visualization
+1:44
+scikit-learn PyMC NumPy just a handful we have a monthly newsletter on substack
+1:50
+dataumbrella.substack.com so you know I'm going to share the link in the chat in a
+1:57
+bit uh these are the social media platforms where we're the most active Meetup is a place to go to find out
+2:03
+about upcoming events YouTube is where our videos are our monthly newsletter GitHub has our
+2:11
+projects this webinar platform has live captioning at the top right you'll see CC for closed captioning so if you want
+2:18
+a transcript of you know the presentation just hit that and you'll get that we also have an online suggestion
+2:25
+box which I will share in the chat but if you have any suggestions for future events topics if you have any feedback
+2:31
+on this event if you want to share any technical issues you've encountered
+2:36
+or if you'd like to do a presentation for data umbrella let us know uh our next event after this one is
+2:44
+on October 8th it is Polars for data analysis in Python and that should be a great
+2:50
+talk today's talk is best practices for creating a data science
+2:55
+team Dr Clair Sullivan is currently the founder and CEO of Clair Sullivan and Associates a company dedicated to
+3:02
+providing data science Consulting Services her career spans over 22 years of working with data with interest in
+3:08
+graphs natural language processing and generative AI she has worked in a variety of settings ranging from
+3:15
+director level roles to individual contributor at a variety of startups to Academia and the federal government you
+3:21
+can find Clair on LinkedIn and with that I will be handing over the microphone and screen share to
+3:28
+Clair excellent thank you Reshama it's great to be here uh glad to be invited
+3:33
+back and um just thanks everybody for tuning in or you know checking the video
+3:39
+out after the fact I'm going to start by sharing my screen here and I will answer
+3:45
+the very first question so uh the very first question is always what it is are
+3:53
+the slides going to be made available and the answer to
that question is yes you can go to my GitHub here and you +3:58 +will see a link in the in the top under uh talks and presentations um which will take you to a repository with all the +4:04 +slides that I'm going to talk about today okay so um let's let's kind of +4:09 +dive into it here the title of this talk best practices for creating a data science team um this is just something +4:16 +that um I'm kind of passionate about uh which has to do we're going to talk a lot today about data culture and how you +4:23 +create data culture and why it's important and so um just these are kind +4:29 +of my observations of working with a variety of data science teams over the past 22 years even before it was called +4:35 +data science but um let's just let's just kind of dive in and we're we're gonna have some kind of ugly truths here +4:42 +um this is this is just you know the obligatory agenda um you know if we're talking about data science teams and how +4:48 +to make them successful that implies wait a second are they not successful so we're going to start with some uncomfortable facts about data science +4:56 +and data science products um and I'm going to turn this into what call a data Fable here I'll explain what I mean by +5:02 +that in a second and then I've got this other thing that I call the upside down approach um and um how it's when you see +5:10 +it happening and and when I talk about the data Fable and we get into the upside down approach when I've talked about this before I get a lot of people +5:16 +in the room who data scientists who've been working for a few years and they're like oh yeah this is really painful H +5:22 +how can we get out of this um this negative side of data science um and and +5:29 +fix this thing that I call the upside down approach and then just kind of some lessons for the future of where I see um +5:35 +the history is repeating itself here before I get too far in um we've got the chat going and I have the chat window +5:42 +open 
I'm not necessarily going to be looking at it all the time so there will definitely be QA time at the end um if I +5:48 +happen to see questions pop up in the chat window great um but probably the +5:53 +the the time that you'll have be able to catch my attention most for questions will will be at the end +6:00 +okay so let's get into some of these stats all right um so I was um doing +6:06 +some looking around various literature here at um what uh what Chief data +6:11 +officers are um saying about their companies and the success of their companies and starting off by companies +6:19 +who are starting to create these roles of Chief data officer um only 40 this is +6:25 +um this is a survey by the way of something um like aund 50 Chief data +6:30 +officers from Fortune 1000 companies and of the respondents um +6:37 +40.5% said that people even knew what the role of a chief data officer was +6:43 +despite the fact that almost 83% of the responding companies had one so they have these people whose job it is to +6:51 +kind of Shepherd the whole data process and yet nobody really knows what they should be doing okay well that's not +6:57 +great um I will tell you right now that I'm looking at two monitors so if you see me looking over here um that's +7:04 +that's what's going on here um it the in 2003 in this data anex sorry data and +7:12 +analytics leadership annual executive survey which is the the linked at the +7:17 +bottom here um almost 88% of companies said that they were increasing their +7:23 +spend and data initiatives especially in data modernization as their highest +7:28 +priority um 24% identified their organizations as being data driven okay +7:34 +so let's let's stop and think about that one one quarter Le slightly less than +7:40 +one quarter of companies in these Fortune 1,000 companies slightly less than one quarter considered themselves +7:45 +data driven that's mindboggling to me um 20% 20.6% identified that they have a +7:53 +data driven culture 
or they have a data culture at all so you know we'll talk +7:58 +about what we mean by data culture culture here but people people aren't basically saying that they value data is +8:04 +what they're saying in that statement um almost 80% cited cultural issues as the +8:10 +greatest obstacle for delivering business value from data Investments so you've got you know 20% saying they +8:17 +don't have a developed data culture yet almost 80% saying the lack of that culture is what's keeping them from +8:23 +getting business value okay so these are some really kind of scary statistics if +8:28 +you think about these are big companies these are you know publicly traded companies who are saying yeah we don't +8:34 +really know what to do with data okay some more stats um less than half of an +8:41 +organization's structured data is being used in making decisions why are you +8:46 +collecting data if you're not using it to make decisions that's and then look +8:52 +at that subet less than 1% of unstructured data is even analyzed or +8:58 +used at all so like 99% of unstructured data is just kind of sitting there doing +9:03 +whatever um More than 70% of employees have access to data they shouldn't have +9:09 +80% of analyst time is spent just discovering and preparing data and and +9:15 +those of you on this call I'm that's probably not a surprise to you I say you know when I give talks I say at least +9:21 +80% of my time as a data scientist is actually a data janitor finding the data cleaning the data getting it ready and +9:28 +then you know we can have all kinds of discussions about data breaches and data being in silos and um data technology +9:35 +that isn't up to the demands the company's placing on it this is not setting up a good scenario here is what +9:41 +I'm what I'm hoping you're taking away from this um but here's the one that just kills me +9:48 +87% of data science projects never make it to +9:54 +production so you have teams of data scientists and data 
scientists be real +9:59 +are not you know the cheapest employees at a company and 87% of their work is +10:05 +not making it into use what is up with that okay so this is why I um decided +10:12 +that I was going to create this data Fable to try and explain this to people um perhaps some of you on this call have +10:17 +read this book the five dysfunctions of a team it's practically mandatory reading if you're going for an MBA um +10:24 +which I happen to be doing that and um this this five Len's book here I really +10:30 +encourage everyone to read this book especially if you're in management or thinking about going into management um +10:36 +really interesting book um and he starts off with what they call lindon's Fable where he creates like this story around +10:44 +some fictitious seite um individuals and how it is that they're relating to each +10:50 +other how they communicate to each other this is a book all about improving Communications during you know some +10:55 +difficult times at a company so I decided to take that same idea and I created a data Fable and I'll tell you +11:02 +um I started my my Consulting freelance company here a little less than a year ago and my my marketing department I do +11:09 +not fund them very well so you will see that reflected in the quality of my Graphics here so here is the beginning +11:15 +of the data Fable we have the CEO and the CEO is is you know thinking here +11:22 +that oh boy you know gez all of my competitors they're they're getting data +11:27 +scientists and if I'm going to remain competitive I have to do it too so the fomo the fear of missing out is real +11:34 +with the CEO the CEO perceives a threat and so when a CEO perceives a threat they're going to act on it so CEO +11:41 +contacts HR and says hey hire me some data scientists and HR says yeah let's +11:47 +do that but uh we haven't had them before what's a data scientist and they +11:52 +sit around and they talk about it and they oh they all have to be 
phds and experts in natural language processing +11:59 +and 10 plus years of experience with pytorch because the CEO heard about pytorch and thought it was great and uh +12:06 +yeah if you're doing the math pytorch has not existed for 10 more more years so and actually I have seen that exact +12:12 +thing on a job ad before somebody wanted more than 10 you know 10 plus years of experience with pytorch this is you know +12:18 +these are taken from real life experiences here so um so yes the uh HR +12:24 +is primed and ready to go to hire a bunch of data scientists so they hire this team of data scientists and they're +12:29 +excited and they're ready and they're here and and they get there and they're like okay who's our manager what's uh what's mentoring look +12:37 +like here what projects are we assigned to who's our PM okay all very reasonable +12:43 +questions and by the way this exact scenario has happened to me um so uh the +12:49 +CEO is just like uh what okay so then this the data scientists like start +12:56 +trying to figure out what they could work on because it's not like they were told by anybody what they're going to work on because remember they were just +13:02 +hired because everybody else was getting data scientists so the CEO needed to get them too and then they say uh where's the +13:10 +data okay well um so then it comes in um well you can't have access to the data +13:17 +because you don't have a business need okay probably there are people in this call who've heard this before somewhere +13:23 +um we we can't give you that data because there's no data governance process but we'll get there I'm jumping +13:29 +ahead okay so it then says okay but we need to give the data scientist something because we have data +13:36 +scientists tears and they're sad because they don't have access to data okay so then they say aha here's an Excel +13:41 +spreadsheet it's got some data that makes you happy right oh no that does not make a data scientist happy 
my my +13:47 +day as a data scientist is very sad if I have to work in Excel spreadsheets okay so then um the data +13:53 +scientists start coming up with their own ideas well if we could just collect this type of data by making this +14:00 +pipeline we can somehow get it um now mind you they still don't know what problem they're solving they're just +14:05 +going to go out and collect data okay and data scientists are not data Engineers one of the things that I hope +14:11 +you take away from this talk is data Engineers are awesome people we need more of them you need to hire lots of +14:16 +them I think the number I heard like three data Engineers for every data scientist is is a good number um so the +14:25 +data scientists they start creating pipelines which are not good Pipelines they're not efficient pipelines because +14:31 +they are not data Engineers but they're doing what they can okay So eventually they get some usable data and now we're +14:37 +cooking here I've got the little smoke coming out of the computer indicating that the computer's now working really hard because now they have data that +14:44 +they can do stuff with um and they finally get to do what they've been hired to do and are good at now notice +14:49 +how many slides we've had to go through before the data scientists actually get to start doing what they're hired to do and are good at okay that's probably not +14:56 +a good sign okay so now they have results and they are anxious to show +15:01 +these results to people look at this thing that we made isn't it great but like they don't have anybody championing +15:07 +them because remember there was no manager and there was no PM okay and yes I have lived that life and I know a lot +15:13 +of other people have as well and um so they spend a lot of time trying to get under people's calendars and explain the +15:19 +value of what they've done and look at this thing that we made and isn't this great oh and then there's more data +15:25 
+scientist here because they can't get on people's calendars and people don't recognize that what they've made is great and then we have the CEO who's all +15:32 +mad saying wait a second I spent all of this money on a team and they haven't created anything of value where's my +15:38 +return on my investment where's my Roi okay so that's you know unhappy CEO and +15:45 +then the data scientists are like okay well I'm going to leave um and it's what they might be doing it voluntarily they +15:50 +might be doing it involuntarily um because there's nothing for them to do um there's no job impact +15:56 +or satisfaction they're not being mentored and growing in ways that they are looking for um there's no long-term +16:02 +job prospects at the company okay and so what does that result in that results in CEO tiers okay that is the end of the +16:11 +data science Fable um and I I apologize if if you've um if this is you know +16:18 +resonated very strongly with you when I've given this data science or the data +16:23 +Fable before I'll get people who come up to me afterwards oh my gosh you've described my past three jobs or +16:29 +something like this I've lived that job too I I'm sorry that you've you've had that experience I um would like to kind +16:38 +of break this apart and analyze it of how it went wrong and how to fix it okay +16:43 +so let's start from the very beginning here we've got the CEO with fomo fear of missing out okay what they did they went +16:51 +and hired a bunch of people but they did not identify a business problem that could only be solved with data now we're +16:57 +going to come back to that word only here in a little bit there was no business problem there was meaning why +17:03 +are you spending money okay just point blank why are you spending money if +17:09 +there is not a business problem to be solved okay so now our next step here +17:14 +was the CEO and the HR person were getting together um and they didn't have the business problem 
therefore they +17:20 +could not identify the requirements and the skill sets on the data scientists you know here we have like this +17:26 +fictitious they have to you know have PhD expert in NLP pytorch all of that +17:32 +well if that if you're trying to solve a forecasting problem you don't need any of those things right um so they they +17:40 +hired exactly the wrong people because they didn't know what business problem they were trying to solve okay so next +17:46 +you got the data scientists coming in all excited um so here is another +17:52 +problem um you remember we were talking about how I said data Engineers are really important here and that you need +17:58 +to to take that away from here data Engineers were not brought in because the CEO and HR did not identify data +18:05 +Engineers as being required because again no business problem okay also what +18:11 +was not here is is reflected by the fact that it would not give them access to the data was that there was no data +18:17 +governance put in place before bringing the data scientists on board okay so those are two things that should have +18:23 +happened before any data scientists were hired at all right um so then when we +18:30 +come to next here we've got the team asking where the data is okay um there +18:36 +is no data yet as you recall in this story they had to go out and collect it if you recall so the business problem +18:42 +had no data well there was no business problem so therefore there's no data to solve the business problem because nobody knew they needed it okay so then +18:49 +here's um the governance problem um so siloing of data this is this is +18:56 +something that's very common in a lot of places siloing of data indicates a +19:01 +problem of data culture when you do not recognize who needs to have access to +19:06 +what data it means you cannot solve data problems okay so this is this is a biggie um then here's the data +19:13 +scientists who are um going and creating their 
own pipelines again this is the sequencing of hiring problem here um +19:22 +then they created their solution but again they didn't have a PM so there was +19:27 +nobody saying hey we we care about what you're creating why don't you create this thing here's a business problem +19:33 +there was no person whose job it was to take their work and interface it with the rest of the company so again hiring +19:41 +sequence problem okay let's let's kind of recap +19:46 +what what they went for first was people okay and people are your strength +19:52 +any business leader will tell you that but they're only your strength if you hire the right ones they didn't hire the +19:58 +right ones in the right order second data okay they went after the data next the data scientists went after the data +20:05 +and you know the value of data is not in volume but the problems it can solve +20:10 +remember those stats earlier where you know like less than 1% of unstructured data was even being looked at um if you +20:19 +don't know what problems your data can solve um or or what problems you have it +20:24 +doesn't matter what data you have you could have tons of data you could have p btes of data but if you're not solving +20:30 +problems with it it's just pointless and it actually is problematic because it opens you up to you know data leakage it +20:35 +opens you up to ransomware attacks it opens you up to um you know all kinds of +20:41 +pii concerns and gdpr concerns third problem okay so data scientists went and +20:48 +um decided that they would find out what the problem was okay they were going to create the problem um again you know you +20:56 +know where this is going that I'm going to say that the problem is actually much higher of an urgency um and then culture +21:02 +it all comes back to the culture understanding how to identify business problems that can be solved by data if +21:09 +it doesn't need to be solved by data don't do it you know data is expensive data scientists 
are +21:15 +expensive when you look at this slide you'll see the order that the story went +21:20 +in the Fable went in and it's exactly upside down and um for those of you who +21:25 +who chimed in early rim and I were talking about Netflix recommendations um I forgot this one um those of you who um +21:33 +are uh um oh my gosh why can't I even think of the name of this TV show no um +21:39 +they they had this concept of the upside down and um stranger things is the name of the show sorry I was having a a +21:47 +moment there um and in stranger things they have like this kind of parallel universe called the upside down and I +21:53 +was I was talking to my kid who's a huge stranger things fan about this concept and she was like oh well it's like +21:59 +stranger things and the upside down so that's how I've kind of created this idea of the upside down approach to data +22:05 +so what we're going to do is we're going to take this thing and we're going to completely flip it as my recommended +22:11 +approach for for going after problems like this okay so here's what it looks like completely flipped start with +22:18 +culture culture starts at the top okay it and it's the thing that sets the +22:25 +organization up for success I have yet to see see a a place that has had a +22:30 +successful culture change without it starting from the top Grassroots stuff is great but without buyin of the +22:38 +absolute highest levels of management of a company it is very difficult to change +22:43 +the culture I'm not going to say that it never happens I would love to hear about places where it has but what you need is +22:49 +from the very top of the company understanding the importance and the value of data and using data to solve +22:56 +business problems being a data in and I I'm getting really sick of this term to be perfectly honest oh we're a +23:02 +data driven company okay most companies don't know what that means um we'll talk about what that means here in a minute 
+23:09 +um the problem not all problems need sophisticated data approaches to solve +23:15 +them let's be real here like there are times that Excel solves that problem awesome great you do not need to hire a +23:22 +team of data scientists to do that hire the data scientists when you have that +23:27 +that business problem that can only be solved by data and you know complex +23:33 +beyond Excel spreadsheets data okay once you've identified that problem and you +23:39 +need when you identify that problem it's important to try and quantify and this is really hard I mean I go to +23:45 +conferences all the time um where you have cdos trying to come up with ways to +23:51 +quantify the ROI of a data solution um +23:56 +you know it's it's it is not easy you have to try um once part part of that is +24:03 +then understanding you know how are we collecting the data creating strategies around collecting it because that feeds +24:11 +into your calculation of rois understanding what data do you need how much is it going to cost um where are +24:17 +the gaps how do we um monitor its control and usage um you know that's the +24:22 +data governance side too so you have to solve that data governance side and the data engineering side that's what's in +24:28 +this third line here called Data okay you have to solve that before that last thing of people and go to any CD +24:37 +conference and you say the words data governance and they will go running for the hills why because it's hard okay +24:44 +it's it's you know there are tools out there obviously to help with data governance but they they go running for +24:51 +the hills because they know they've already gotten it wrong because they did this upside down they did this in the +24:58 +wrong sequence then the people the data scientists okay strategic sequential +25:05 +hiring you want these people to be able to hit the ground running from the moment they hire on if these three +25:12 +things are in place before you hire those 
people they have a much higher probability of being able to get started +25:19 +immediately okay so um let's talk a little bit about data culture because +25:24 +it's kind of this nebulous term and this is this is taken straight from the Tableau web page um so let's see we have collective +25:33 +behaviors and beliefs of people who value practice and encourage the use of data +25:39 +to improve decision-making if data is not being used to make decisions or somehow you know +25:46 +drive revenue or something why do we have it um data is woven into the +25:51 +operations mindset and identity of an organization data culture equips +25:57 +everyone in the organization everyone with the insights they need to be able to truly be data-driven and tackle the +26:05 +most complex data challenges and business challenges okay so you know it's kind of nice marketing speak let's +26:12 +talk about what it looks like okay first off good data culture is a participatory +26:17 +sport with a role for everyone if you are truly a data-driven business +26:25 +that means that what you do your decisions are all backed with data not with your gut but with data okay so +26:33 +everyone touches data at some point whether you're a data scientist whether you +26:39 +are the executive assistant to the CEO whether you are in marketing whether you +26:45 +are in IT whether you are in HR everyone in a good data culture touches the data +26:52 +and is involved with the data there's a role for everyone um it is cultivated and grown over time +26:59 +and you know so let's if we're talking startups you know you you have the opportunity to start with a good data +27:06 +culture but most of us don't you know are not working for you know we are employee number two at the company out +27:12 +of somebody's garage here most of us are working for companies that existed for a while and so this idea of of data being +27:22 +useful driving decisions that is something that is grown you know you don't just
wake up one day and say aha +27:28 +we're going to do this today it takes time um and it's a way of thinking it's not just about tooling like you know I'm +27:35 +sure that you know Tableau great product would love you to think it's just about tooling but it's it's also about you +27:44 +know stopping you you have a business problem that you've been presented with and you stop and you ask yourself okay +27:51 +what how would I go about proving this um what biases might I have in my +27:57 +assumptions about the solution here um what data do I need to collect you know really just thinking about the problem +28:03 +in a data-minded way not even hands-on-keyboard this is you know this is the gray matter part of the process here um +28:10 +we also have to recognize that not everybody is a data scientist and that's +28:15 +great I love that and we need to look at ways of leveling people up and training +28:21 +them um so and I I see that there's a question about how can you um what +28:28 +actions can be done to cultivate a data culture is it driven by a particular person are there different personas that +28:35 +should be engaged I love this thank you for that question um so I'm going +28:40 +to try and hit on most of these through the the next few slides here so just just bear with me for a second um but +28:47 +let let's talk about the who of this question is it driven by a particular person yes all okay if you're starting +28:56 +from zero it's it starts from the top obviously it's the CEO but recognizing +29:02 +that not everybody is at the same place with data the CEO has to decide that it matters to them that the company be a +29:10 +data-driven company the way CEOs say something matters is they put their money behind it so that means investing +29:17 +in their employees investing in you know training sessions you know um is there +29:23 +some course that we could send this person to such that they you know level up just a bit on their overall +29:29
+understanding of how to approach a problem with data how to work with data things like this um but then it's you +29:36 +know this is everybody's responsibility um to advocate for the +29:41 +good and appropriate and effective use of data um okay strategy specific to +29:49 +data okay so this is more of a management statement here you know higher-level management defines the +29:56 +strategies how many times have if you're and I'm specifically thinking about non-tech companies I've worked in some +30:02 +of those um what is can you tell me what your company's data strategy is most people +30:09 +can't because a lot of companies don't have them and they need to if you're going to be a data-driven company you +30:15 +need to have a data-specific strategy um that that is the responsibility of +30:21 +management right there um an understanding of how to measure ROI from a data product okay we talked about that +30:27 +that's very complicated one of the one of the things that I've noticed in companies that I've worked with um is +30:34 +you can tell a company that's at least moving in this direction based on you know they're talking about a problem and +30:40 +somebody will say do we have the data to back that up you know so if somebody says here's how it is somebody else in +30:47 +the room says do we have the data to back that up that's the sign that people are thinking about um you know that way +30:54 +of thinking that I have here in this third bullet that way of thinking is shown in that statement do we have the +31:00 +data to back that up um okay so um the identifying the right business +31:09 +problem if you recall in the data fable everything came back to the CEO not having a business problem okay what do +31:18 +those business problems look like and I'm specifically speaking of data science business problems here because +31:23 +this is a data science talk if the problem can be answered with a simple SQL query you do not need a data +31:30 +scientist
to do it okay you can have an analyst do this and great but data +31:38 +science you know problems that are right problems for data scientists are those that that are going to require +31:44 +more than just SQL they're going to have modeling involved whether it's in R or in Python or pick your language of choice +31:53 +um next thing it will drive ROI and it's important to recognize that +31:58 +ROI can be measured in many ways not just dollars it could be something like you know this made our workforce more +32:05 +efficient or this decreased our uh susceptibility to you know fill in the +32:10 +blank security problem so you know understanding what what the um what the +32:18 +actual uh thing the actual driver whether it's dollars whether it's efficiency whatever um number of new +32:25 +users things like this um it also depends on how you define success so is +32:32 +success that we have moved the needle on sales 0.5% cool but what that means is you +32:40 +have to state it up front you have to know up front we're going to do this we have this problem we're gonna it's a +32:46 +data problem we've identified that it's a data problem we're going to use a data science solution on it and we will know +32:53 +that that data science solution is successful when or if and you need to be +32:58 +able to fill that out and until you can fill that out do not begin work you you +33:03 +want and this is you know this is the role of management and this is the role of the PMs is to clearly identify what +33:09 +those things are um okay and then again like I was saying before in terms of +33:15 +it's not always about dollars data can contribute in both direct and indirect ways so the +33:22 +ROI can be difficult to determine so you know what if it's um you're providing insights so you've created some forecast +33:30 +you know some you know you're using a Prophet model to create a forecast of you +33:36 +know whatever um that forecast is being used to
inform the C-suite who then +33:41 +uses it to inform strategy or whatever um how do you quantify the ROI on that +33:47 +that's difficult to do um you know so you can look at some common measures you've increased your revenue you've +33:54 +decreased your cost you know you can do data monetization maybe you're a company +33:59 +who sells your data um faster time to market freeing up resources security and compliance um these are all ways um that +34:08 +you could measure ROI but you want to agree on what that is upfront so you +34:13 +know when you had the CEO who was like oh let's get people and then the data scientist here's a solution to a problem +34:20 +that you know didn't really exist and there was no ROI behind that thing either it's not like they said to the +34:25 +CEO or you know whomever's calendar they got on they didn't say hey look at this number this number um improves our time +34:33 +to market by 50% they didn't do that they just created this thing right um so this is this is something that the data +34:39 +scientists can do as well um and can learn from which is to learn learn how the business works and learn something +34:46 +about business um it's it's actually what's driving me to go back to school +34:52 +and I'm not going to admit my age that I'm going back to school but you know I've been working for 20 plus years so you know it's been a while but going +34:57 +back to school to get my MBA because I think it's really important that data scientists understand something about +35:04 +how business works and how to communicate to business in a language +35:09 +that they are accustomed to and that they understand okay so I want to kind of +35:16 +make a return to this slide here um this was our um CEO feeling um you know that +35:23 +FOMO feeling here we have a case of history repeating right so I'm just going to slightly +35:30 +modify this you know I am in trouble all of my competitors are getting data +35:35 +scientists if I'm going to
remain competitive I need to hire some too +35:41 +oops okay how many of us and I I will claim you know I'm part of this I you +35:46 +know my a lot of my work right now is in generative AI but this is history repeating I'm seeing a lot of places out +35:52 +there well everybody else is doing something with Gen AI therefore I need to do it as well okay okay let's let's +35:58 +everybody take a step back here okay and I want to come back in my slides oops +36:05 +here okay so forgetting the culture part let's +36:11 +let's just look at the problem what is the problem we are trying to solve with generative AI um is is the +36:20 +problem I want my um users my customers +36:25 +whatever to uh be able to ask questions to something and get answers to a thing +36:31 +is Gen really the solution to that or is it just that you didn't know how to implement Elasticsearch very well maybe +36:37 +that information is already out there and you just haven't given your customers the right way to access it gen +36:43 +does not necessarily solve that problem and for those who have not created a production system with Gen it is +36:49 +complicated it is not as cheap as anyone would lead you to believe it is expensive it is difficult to maintain it +36:56 +is difficult to monitor it is difficult to quantify its behavior and its +37:02 +accuracy um and I mean I I use LangSmith just like everybody else for you +37:07 +know measuring LLMs and their you know their outputs and stuff like this um I I +37:13 +find it a little entertaining that our you know the the Gen community is +37:18 +letting LLMs measure the performance of an LLM seems a little circular but that +37:23 +is what's going on um but what is that business problem that gen is solving now +37:29 +there are cases where I do see good business problems being solved okay those are the ones that are going to +37:35 +survive right now we are in if you go look up the Gartner um hype cycle there +37:42 +we are in what the Gartner hype
cycle would call the peak of inflated expectations this is where everybody's +37:48 +super excited about a thing and everybody's getting into it but very soon here what's going to happen and +37:55 +this happened with data science circa you know I'm going to call it 2012 2013 um all of the sudden everybody's +38:02 +like whoa wait a second this is not exactly as easy as I thought and it's more expensive than I thought I'm not +38:07 +getting what I want out of it and people start just taking a big step back and they say whoa okay maybe maybe this +38:15 +isn't the panacea that I thought it was so you end in the Gartner hype cycle in +38:20 +this thing called the um the trough of disillusionment and not all technologies recover from this okay what has to +38:28 +happen in order for technology to recover from this is for somebody to create something useful not +38:33 +even like the so-called killer app just something useful so when I you know have this slide here that talks about GenAI we +38:42 +are in the the peak of inflated expectations I have seen countless CEOs who are all you know charging ahead with +38:49 +this thing get me gen experts they need to have been programming ChatGPT no joke +38:54 +programming ChatGPT for five years okay I I saw a job ad like that um and so we +39:01 +are repeating the exact same problem that we had in you know the early 2010s +39:07 +in the hiring of data scientists with GenAI so you know let's let's think +39:12 +about how can we as data scientists um get us out of that upside down approach +39:19 +here and it's it's all about us helping people get back to what is the business +39:26 +problem that is the number one recommendation I I can make to you is what is the business problem um so +39:35 +let's click onto here this is um just kind of a a parting quote about data +39:40 +culture from Satya Nadella um and and I really like this quote I'm just going to leave it on the screen here for a second +39:46 +for y'all to be able
to read it while I grab a quick sip of +39:55 +water you know so Satya talks about it's not just about deploying +40:00 +technology um it's about changing the culture so every individual is empowered how do we empower +40:08 +individuals okay we think about you know when the um the question came in are +40:13 +there different personas that should be engaged differently empowerment you know that's a that's a kind of a squishy term +40:20 +everybody can empower everybody managers have a very important role to play in in so-called empowerment +40:29 +um it means bringing people together IT and developers creating a cultural shift +40:36 +um everyone benefits when more people can ask questions and get answers who who is it that's asking the questions +40:42 +what what type of answers are they looking for those are your business problems what what are the problems that +40:48 +people want answers to okay there's your business problem um and when every +40:54 +employee can harness the power of data this is the investment by the company in leveling everyone up providing training +41:00 +for everyone in data um so that it's you know the power of data once only +41:06 +reserved for data scientists can be used for everybody and in common apps you +41:11 +know this is this is a quote from 2014 this is pre gen but you know I think +41:16 +these statements still hold true today of what are what are we empowering why +41:21 +who what's involved how much does it cost okay all all questions to ask +41:27 +yourself and those questions can be asked by anybody in the organization all right with that I'm +41:33 +going to wrap up here I just want to thank Reshama again for inviting me back to Data Umbrella it's a it's always fun +41:40 +giving presentations here reminder of where the slides are there is a um I I think Reshama put in chat the exact um +41:48 +PDF um uh to the um to the slides themselves +41:54 +but there is also a repository that has that PDF in there if you go to my GitHub
so with that I will just say +42:01 +thank you and um open it up for +42:07 +questions so I believe you answered the when rubber meets the road what actions can be done to cultivate a data culture +42:15 +right yes okay um the next one is this content is incredible do you have a book +42:20 +or course or something to learn from you oh thank you um I've thought about +42:26 +writing a book about this um I'm having a lot of fun right now not writing books so I don't know if there's a publisher +42:34 +you know in this session who wants to talk to me there's my email address feel free to reach +42:41 +out I also am happy to you know speak on these subjects to you know any variety +42:46 +of places so it's it's something that that I think is really important and matters a lot to me +42:54 +so so while we're waiting for people to ask questions I have a question which is when you are looking at data science +43:00 +jobs that are posted um and interviewing um Are there specific red flags we +43:07 +should look out um is does the question relate just +43:13 +to the posting or conversations and interviews as well I would say both okay +43:19 +well let's start with the posting um you know and I've I've written many +43:26 +job ads for you know hiring people I have interviewed many people I have interviewed many times I've applied to +43:33 +many jobs um one of the things you know the good job postings in my opinion +43:41 +State the problems that you're going to be working on so um you know this role +43:47 +is being opened up to help us solve this type of problem okay so let's let's just +43:54 +say um uh I'm just going to come up with a a natural language processing problem +44:01 +okay so then I would expect in you know a good job ad that they would list things like required and desired skills +44:09 +um as somebody who likes to work in NLP I know what the typical packages are you know maybe it's gen I know what those +44:16 +packages are so I would expect to see those 
listed out in the required and desired skills if I see a disconnect +44:22 +there well if I don't see a business problem or you know some sort of explanation of exactly what you're going to be working on um that that's a red +44:30 +flag to me um it's definitely something I would ask about if I got into a call with the hiring manager um I would also +44:39 +um in a call with the hiring manager want to know things like um you know +44:45 +what percentage of my time is being spent maintaining old things developing new things um if it's maintain if it's +44:54 +mostly maintaining old things um to me that's that's also a red flag of you +45:00 +know okay so yes code requires maintenance but you know you're hiring a full-time person to maintain something +45:07 +that's you know in theory already shipped so what's that about um the uh +45:13 +let's see other red flags you definitely definitely definitely want to hear from them that they know why they're hiring +45:20 +into this role what problems are they specifically wanting worked on um I have +45:25 +taken jobs I have you know I'm not going to name places but I I have hired into roles where they didn't they're like +45:31 +yeah data scientists we're forming a data science team great what are we gonna do um and we spent four years +45:40 +languishing trying to find things to do um not it's not a great use of time it's +45:46 +not very fulfilling uh which roles in a company are most prepared to define business +45:52 +problems that can be solved by data okay oh this is a great question so +46:01 +the there's there's a few answers to this one if you're already in a company +46:07 +the PM is going to be the one who understands how to go between the business and the data scientists the +46:14 +best um it doesn't mean that they're the ones defining the problems but they're working with the people who are defining +46:21 +the problems um so they're going to be able to take that business language and +46:26 +convert it into
something that is more um comfortable for data scientists to +46:32 +understand but um generally speaking business problems are identified through +46:37 +managers um they identify issues through you know different strategy sessions and things like this so um you know whether +46:46 +you're talking a first-line manager a mid-level manager a senior manager um +46:51 +depends on the size of the company but but those business problems should in +46:56 +theory if they're being generated properly be tied into what is the strategy of the +47:02 +company um okay for a manager of a data science team what are the best ways to advocate for data science oh or to +47:10 +manage expectations of business owners and C-suite about what data science can and cannot +47:16 +do well we could be here all day on this one um okay so the best ways to advocate +47:21 +for a data science um team again I'm going to you know I'm going to sound +47:26 +like a broken record here and I'm going to come back to business problems there was this business problem we solved it and here's how much better we did okay +47:32 +so not just you know I didn't stop with saying we solved it you know put the little check mark next to the thing +47:38 +quantify you know before our solution um our time spent on doing this thing was X +47:47 +after our solution we improved it by y um and by the way that is a really +47:53 +really great thing to put on your resume um when you know if if you're looking for that +47:59 +next role it's it's one thing to be able to say I created a model that you know to solve x it's way better to say I +48:06 +created a model to solve x that resulted in y improvement it means that you +48:12 +understand how to quantify ROI or at least approach it um and that's a very attractive thing to see in a resume um +48:19 +managing expectations of business owners and C-suite about what data science can and cannot do that is hard hard hard +48:27 +hard because like you know and this was especially true
you know in our data fable where the CEO had that FOMO thing +48:32 +going on um they think and this is hard you know people in this call will +48:37 +understand this they think that you know machine learning is this you know it will solve everything it slices it dices +48:44 +it's going to create world peace it's going to end hunger it's not we all know that um but they um this is about data +48:52 +education and leveling up is not something that just happens in the lower ranks of the company it happens +48:58 +everywhere so if as a data science manager you can help level up your upper +49:03 +management and your C-suite and you know it's it's about explaining to them +49:11 +you know here's the here's the pros and the cons of this thing here's the drawbacks of this thing um here is +49:20 +prior to starting off on a project you know we would anticipate you know if we +49:26 +could get an accuracy of X we could improve our you know whatever by y it's +49:33 +about you know setting their expectations that no we're not going to get 100% accuracy um but you know if we +49:42 +can get 85% accuracy look at how much better that is for us so it's it's not +49:47 +about just saying yes we can do a thing it's about making them understand what success is and helping them define +49:53 +success you know so I talked earlier about um you know being able to state +49:59 +upfront before you even start work what success looks like this is part of that +50:04 +conversation um so just really having a very honest conversation with them about +50:10 +what is realistic and what is not uh have you ever have you seen +50:16 +smaller companies using data well absolutely um I uh work with some +50:23 +startups in my current role and just just some of the stuff um you know and +50:29 +I'm on I I can't you know state who obviously and NDAs and blah blah blah but um there there are companies out +50:36 +there and and honestly I think the smaller companies have the have the opportunity
to get it right a lot easier +50:43 +than the bigger ones you know bigger ones they're you know they're a behemoth you're you're trying to you know boil +50:50 +the ocean with a smaller company though a couple good demonstrations of good use +50:56 +of data and you got everybody's attention um so I'm thinking about you know some startups that I work with +51:01 +which are like 20 people and you know they are they are companies that revolve +51:07 +around data um and it's it's +51:12 +really you know I hate to say you know it and you see it but like when you walk into a space and you can just say hey +51:18 +where's the data for and what does the data show um and I'm not going to argue data being plural or singular but you +51:24 +know you you see just like this constant discussion about the data um and it's +51:31 +it's not even just in data companies I I there are other startups that I work with in completely disparate domains and +51:38 +it's you know it's go collect this go find out go measure and it's it's just +51:43 +really it's it's really refreshing when you've come from a place that doesn't do that back into one that does um so they +51:50 +do absolutely exist +51:57 +pop over and check the chat window here real quick last call for questions if anybody has any other +52:07 +questions so I'll just ask one last question which I guess is so bigger companies are not necessarily um you +52:15 +know using data as well as we might expect I I don't know that I can make +52:21 +that conclusion um you know when you when you got a quote from Satya Nadella like that talking about data culture um +52:28 +you know that would certainly point to um a higher probability of um there +52:33 +being a good data culture you know where Satya is at Microsoft um but it's if you have +52:39 +to change the culture it's a lot harder to change a place of thousands of people than it is of tens of people um it's +52:46 +absolutely the you know the easiest to do it is when a company is starting up
starting up like I said um here's a question Beyond +52:52 +knowledge about more advanced modeling techniques what differentiates senior data scientists for those more Junior in +52:59 +their role okay this is um this is kind of also reflect my +53:06 +answer is going to reflect my time as a professor as well I spent some time uh teaching at the University of Illinois +53:12 +and the question um that I would get is how do you know when somebody's ready to get their PHD when they're ready to +53:17 +write it up and the easy it's it's it's really obvious to a professor when it happens because that student starts +53:23 +telling the professor what to do and like not not in a boss way but like they actually have an informed and educated +53:29 +opinion and they know what to do next so when I look at the difference between a senior data scientist and a junior data +53:34 +scientist the junior data scientist doesn't always know um how to get started um what the right approach is +53:42 +they they are um you know they're needing a lot more input versus the +53:48 +senior data scientists they are they are able to propose okay here's the problem +53:54 +here's a path to solving that problem so so they the senior data scientist can delegate those things to the junior data +54:00 +scientist but the senior data scientist is the one who can take a problem from start to finish and know how to craft a +54:06 +solution for it that's that's my opinion but um it's certainly shown itself true +54:11 +in my career well that is a perfect way to end +54:18 +the webinar thank you so much Claire um it was yeah it was really really a +54:23 +wonderful talk um we'll have the video up within 24 hours um and we look +54:30 +forward to seeing everybody at our next event great thank you everybody +54:35 +appreciate the invite +``` +