Join Alexandra and data science guru Peter Bull as they dive into the fascinating world of data analytics and its impact on the nonprofit sector. They explore the common misconception that data science is only for large corporations and how nonprofits can leverage it to supercharge their impact. Learn about real-life data science projects that have made a difference for nonprofits, including the powerful combination of data and automation to free up human resources for more valuable tasks. Throughout the episode, Peter shares valuable insights on avoiding pitfalls when embarking on data science projects. He emphasizes the importance of starting simple and building iteratively toward more sophisticated solutions. You’ll discover how a strong data collection strategy and an observability mindset are critical for success in the data science world.
Whether you’re a nonprofit leader, data enthusiast, or simply intrigued by the power of data, this episode will leave you inspired to unlock the full potential of data science.
Resources:
Transcript of today’s episode (auto-generated)
Alexandra Mannerings: Thank you so much for joining me today, Peter. I am thrilled for our conversation today, and I want to start by letting you introduce yourself. Tell us where you’re coming to us from and how you came to data.
Peter Bull: Great. Yeah, well, thanks for having me, Alexandra. Nice to chat about data in the social sector. I’m coming to you today from Seattle, Washington. We were just chatting before the recording started about how the weather finally broke in Seattle, and we moved from that phenomenon where winter comes back in June to sunny and 75 degrees. So it’s really delightful here right now. Hopefully I’ll have a nice conversation with you and then head out for a walk, is the plan. So that’s where I’m
Alexandra Mannerings: I love
Peter Bull: Yeah, it should be good. And how I came to data is a little bit of a long story. I started my career as a software engineer, and I was working on a team at Microsoft that was building new user experiences for Microsoft Office.
So if your listeners recall the transition from the normal File button with a menu that drops down and a bunch of words on it to the ribbon, which is this very graphical thing with big icons and large hit areas, that team was trying to figure out what controls go together in a way that makes sense. It was really working on this transition in the software toward being more user friendly.
And one of the things the team did was collect a lot of telemetry data. So, you know, how long did it take between clicks of two different kinds of actions that you might take in a document? How many clicks did it take to move from one sort of function to another? If you’re in Excel and you’re trying to build a complicated spreadsheet, do you spend most of your time finding the actions that you want to take, or are they surfaced right there for you in the application? And how do we optimize that? So we were asking a lot of these sort of data-rich questions. And my academic background was in, well, my undergrad degree was actually in philosophy. So I was fully underqualified to answer any of the data questions that were coming up, but it really was sort of the burgeoning of big data as a term of art. And it seemed like such a critical tool for answering these questions that I really wanted to dig in and be able to use the data that was coming in to answer them, because even though we had the data, we didn’t necessarily have the answers yet.
So I decided that, with my background, I needed a bit more training to get to where I needed to be, and I went and did a degree in computational science and engineering. I think these days it would probably just get called a data science degree: it’s half statistics, half computer science.
At the time, it was really coming out of thinking about simulation models for scientific modeling and the data that’s generated by that, but it ended up being quite a bit of machine learning and data modeling. So I did that, and while I was there, I was looking at ways to combine my interest in data and social impact.
And at the time, this was in 2013, there weren’t a lot of options for putting together data and social impact in ways that felt impactful. It felt like every time I tried to find something to work on in that space, the angle of combining the latest tools that I was learning about in grad school with approaches to social impact problems wasn’t there.
And so a colleague and I in grad school decided that we wanted to do something about this. We thought the sector could really benefit from a set of case studies, or examples, of how these data tools could be used by social impact organizations. And we framed this in terms of a machine learning challenge platform, so let me explain a little bit about what that means.
For a lot of tasks and data, you can assess how well you have modeled your data by asking the model to make predictions and comparing those predictions against actual results. So you say, okay, I didn’t use the data from this current year to set up my model and understand what’s happening. I used everything up to this current year, and then I use this current year to say, hey, is this a good model or not? Is it doing reasonable things that I would expect? You can do that kind of setup for all kinds of machine learning problems, and it’s one of the ways in which people assess how well they’ve solved a problem.
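A minimal sketch of that kind of time-based holdout, assuming a small tabular dataset with a year column and a binary outcome; the column names, values, and model choice here are illustrative, not from the episode or from any DrivenData competition:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Tiny made-up dataset: one row per observation, tagged with the year it was recorded.
df = pd.DataFrame({
    "year":   [2020, 2020, 2021, 2021, 2022, 2022, 2023, 2023],
    "usage":  [10, 80, 15, 90, 12, 85, 11, 88],   # hypothetical predictor
    "failed": [0, 1, 0, 1, 0, 1, 0, 1],            # what actually happened
})

# Fit on everything *before* the current year...
train = df[df["year"] < 2023]
model = LogisticRegression().fit(train[["usage"]], train["failed"])

# ...then ask whether the model does reasonable things on the year it never saw.
holdout = df[df["year"] == 2023]
predictions = model.predict(holdout[["usage"]])
print("Held-out year accuracy:", accuracy_score(holdout["failed"], predictions))
```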
And so, there are a number of platforms online where people who love machine learning and love to build the most accurate models will come and participate. If you set up a problem for them, you say, hey, here’s the data. Here’s the context for the problem, why it’s interesting. Here’s how we’re measuring success.
Go build the best model that you can. There were these communities cropping up of people who really loved that kind of puzzle, that kind of challenge. And so we took that side of the technical community, the people who really loved digging into those puzzles, and the context building for social impact organizations, and put them together.
Say you’re an organization that cares about building water pumps to improve access to water, and you want to predict when one of your water pumps is going to fail so that you can send maintenance to those pumps. You want some sort of algorithm, some sort of statistical model, to do that work for you, to really understand the best places to send your resources. So we frame this question for a large community of machine learning people, and they come and they say, okay, let’s look at what elevation it’s at. Let’s look at how deep the groundwater is. Let’s look at how often it’s being used. Let’s look at how close to the population center it is. Let’s look at recent weather trends and understand whether we expect the pump to have water or not, or whether maybe it’s operating dry in a way that might damage the equipment. They put together all these factors and come up with predictions for which pumps are likely to fail. And so that community that thinks about these sorts of puzzles and data in one way is now coming together with organizations that are really just about installing water pumps but want to do what they’re doing more effectively.
And so we set up all of these challenges where we have these examples that speak to both communities. They say to data scientists: hey, here’s an interesting way that data can get used in the social sector. Here’s the impact it can have. Here’s why it’s a motivating and challenging puzzle, and actually, your sophisticated data skills can help to solve problems.
But it also communicates to the social impact organizations: if you’re collecting this kind of data, here’s the way that you can become more effective as an organization. Here’s how you can think about smoothing out your processes. Here’s how you can think about better measuring your results and becoming more efficient.
So it really started from the idea of having these kinds of case studies and challenges that show very specific problems and kinds of data being solved across impact areas. We started that in 2014 and have been doing it ever since, so this is our ninth year of running challenges.
Out of that, our team has grown, and in addition to running the challenges, it works directly with organizations to solve problems. We do a lot of data and data science consulting outside of just the challenge framework, and so now we’ve built up over 100 projects across different domains, thinking about how to use data more effectively in the sector.
So we have a lot of experience that has built up over time. We’ve watched the growth of investment in data and people caring about it in the social sector, and we think about what baselines and foundations we need as a sector to get to where we want to get to. And yeah, it’s been a really, really exciting journey.
Alexandra Mannerings: I love it. And I think you and I really do intersect on that point of seeing how exciting it is when people who love doing the technical challenge meet these deep, deep needs in the social sector, because oftentimes those populations are not the same, right? When you become an executive director of a nonprofit, you’re really sunk into helping support these mission-driven organizations; you may not have the technical, you know, computational science background that you mentioned. Although I love that you got there via philosophy. I think that’s so true of so many of us, that it’s not necessarily a linear path. But to bring those skills together and say: together, we can really make a difference in all kinds of different social problems.
Peter Bull: Yeah, and I think one of the interesting things is that people with those technical skills want these social sector problems framed for them. They want to participate, but they don’t know how. They don’t understand the context. They don’t understand what data does and doesn’t exist yet, and they don’t understand what interventions organizations actually have, what levers they can pull. So being able to communicate that context helps them think about how to solve the problems in a way that works for organizations, rather than just saying, oh, here’s the fanciest new method, I’m going to apply it to this data and solve the world’s problems. Because if things were that simple, we’d be in a much better place than we are today.
Alexandra Mannerings: Amen to that. That’s so true. And I think that can be a place where nonprofits and social impact organizations and specialists often undersell what they bring to the table. They think, oh, you know, this person’s really good at the numbers or the calculations or whatever, they should just take it and run with it, and they don’t realize just how valuable and important and necessary that frontline knowledge is: that experience of knowing which of these factors are truly related to what we’re trying to do and meaningful, versus things that just might be noise in the data, right? Things that are there in the numbers, but they’re not connected to the real world or connected to an actual mechanism by which change is happening.
Peter Bull: Yeah, I think that is a particular lesson that it took us a while to learn in our work, and we sort of got lucky in the first few years. We worked on a project with IDEO.org, the social impact arm of the design thinking firm IDEO, which does a lot of human-centered design work looking at social impact problems.
We got partnered with this human-centered design firm, as a data science firm under this grant, to look at improving access to mobile money offerings in rural communities. And I think one of the things that we learned in the course of that project, which lasted a couple of years, was how our perspective changed when we saw how the data was getting generated.
So we had the individual transaction-level data for these mobile money transactions, and they were thinking about ways to increase uptake and how to make it easier to go from cash that you have in your hand to digital money that you can transfer in the system, and back. How does the network of ways to do that actually facilitate those transactions? And the difference in our approach to the problem, from when we were first handed the data and started just digging in and building all of these graph models and understanding the network effects, to after going and watching people actually doing the transactions on their devices and watching who they worked with to do those transactions, just changed how we thought about the problem completely.
So one of the things that we always talk about is: how do we go observe the data-generating process to understand all the things that aren’t in the data? One of the critical findings that came out of that project was that people would go to mobile money agents, the individuals who get some commission from the mobile network operators to transfer cash into digital currency and digital currency back into cash.
So they basically operate like an ATM would. And one of the things that we learned in the design process was that there was a relationship of trust built between people who were putting their money into this system that they didn’t understand and the agents that they went to, and it built up over time.
So we started to look more deeply in the data at this question of: what is the relationship between a subscriber and a particular agent that they keep going back to, and how does that trust get built over time? And that question, if we were just looking at the data, was not even on our radar.
So we try to think about how we build up, within the people who are going to be analyzing the data, enough of that context to ask the questions that wouldn’t otherwise occur to them, or wouldn’t necessarily get asked of them.
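One way that repeat-relationship question could be probed in transaction data is sketched below with made-up records; the column names and the loyalty measure are illustrative assumptions, not the analysis the team actually ran:

```python
import pandas as pd

# Hypothetical transaction log: which subscriber transacted with which agent, and when.
tx = pd.DataFrame({
    "subscriber": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "agent":      ["x", "x", "x", "y", "p", "q", "r", "s"],
    "month":      [1, 2, 3, 4, 1, 2, 3, 4],
})

# For each subscriber, what share of their transactions go to the single agent they
# use most often? A high share is one rough proxy for a repeat, trust-style
# relationship; spreading evenly across agents looks more like pure convenience.
loyalty = tx.groupby("subscriber")["agent"].agg(
    lambda agents: agents.value_counts(normalize=True).iloc[0]
)
print(loyalty)  # subscriber A keeps going back to agent x; B uses a different agent each time
```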
Alexandra Mannerings: That is such a great example, and I always come back to, you know, “all models are wrong, but some models are useful,” right? There’s always this process where we have to take what’s happening in the real world and reduce it in some way. You cannot capture the entirety of human existence and reality in data.
So you’re going to have loss of information somewhere, things that are difficult to capture, things that are impossible to capture, or things that are captured by proxy and not the actual thing you’re trying to get at. And so I love that you bring up that one of the steps has to be observing that process and understanding that translation from real life into the data.
Because, like you said, the information about who they were going to, you know, the person who was going to do that transaction, was there. The data marked that this person went to see that person, but the nature of that relationship was not in the data. So you had to observe it to understand the meaning of the fact that they might be going to the same person versus changing, or going to whoever was closest.
Peter Bull: that’s right. And that really came out of talking to those users, because, some of them, we saw actually hand their mobile devices, their phones over to the agent. They said this agent. Knows my pin is able to access my account, but I want them to do the transactions for me because I trust them and that do the transactions for me is not something you see in the data.
The data that has that transaction doesn’t say and they handed their phone over to the agent. We had to see that a person to see what that gap was and what that trust relationship meant to the people who were interacting with the system.
Alexandra Mannerings: I love that. Now, you’ve given two fabulous examples of using data to help solve and improve interventions in real-life social challenges. I want to back up one step and help clarify a little bit what we mean when we talk about data science, right? You gave the example of models that can predict something using some fancy math.
But generally, what do we mean by data science when we bring the whole concept of data science to bear on these kinds of problems?
Peter Bull: Yeah, that’s a great big and complicated and maybe debated question. and so, I guess I think about it, in. A couple ways, so, I 1st, think about how does data get used in the social sector? and I think there are 2 big buckets there. The 1st bucket I think about as the bucket, evaluation and, monitoring really understanding.
How well we’re doing as an organization that hitting our impact goals. A lot of that gets asked for by funders and is a critical part of the data that we think about as a sector. And I think that area, because of the structures that are set up, gets a lot of investment. It’s people interested in it.
That’s what we think about when we say data in the social sector. But I think that’s only half of how you would think about using data tools in the sector. The other half of things really is operational data. So understanding how we might be more effective as an organization, how we might change our strategy that we have, because we’ve learned from data about what is happening on the ground changing, In front of us and our programs need to change to meet that. And then you can also think about how might I use a model to automate processes that take a long time or inefficient or are wasting our staff time and resources in a way that we need to be spending them more effectively. And so a great example of that is an organization we worked with that tries to help schools to baseline their budgetary spending more effectively.
So, are we spending the right amounts on textbooks, on equipment for students, on support services for the students? How does that compare to other schools? This is something that businesses do all the time. They say, hey, what are the industry metrics for these kinds of expenses? Are we in line? How do I adjust this in the right way?
And they can do that because they have budget templates with categories that let them align their budgets with industry standards. For a long time, this has not been the case for schools and school districts. So there’s an organization based outside of Boston called Education Resource Strategies that spends a ton of time with people who are really experts in these school budgets, looking at the expenditures that a district makes and saying, oh, this goes into this category. Here’s an expense that goes into services. Here’s one that is about infrastructure and maintenance. That takes an enormous amount of their staff time, and really the value their staff brings to the table is about making recommendations for the districts about what to do with that data, not this first-level cleanup of how do we get this so we can even start understanding the problem.
So we helped them build a model that says: okay, for a given budget line item that has some amount of description, and when it was bought, and some information about what they call the department it was under, what is the most likely category in their taxonomy of expenses? That way we can automate as much of that cleanup process as possible. We know there are going to be some hard cases that their experts have to go in and look at, but how do we clean up enough of it that we can actually get to the important analysis piece? That’s really the value of the work that they do. So that’s what I mean when I say automation. It’s not about, like, replacing human jobs; it’s about, where are we doing low-leverage work that would be much higher leverage if we could automate it in some way? So those are the buckets within operational work that I think about as being useful avenues for data science tools, ones that don’t necessarily get talked about as much as measuring impact does.
Alexandra Mannerings: I really appreciate that clarification. I agree with you very much. When we think about how data can be used, what I often see with nonprofits is, like you said, there’ll be program-specific evaluation, right? Did this program do the one or two things that we’re supposed to be measuring, or that grantors are expecting us to report back on?
And then you do sometimes get data around donations, right? How many donations are coming in? Where are those donations coming from? You will see metrics that are oftentimes built into donation platforms or CRM platforms. And that’s kind of where I see data used. But I agree with you on that whole idea that there’s data everywhere, whether it’s in your expenses, in how you’re using your resources, in being able to benchmark and evaluate something like your expenditures, but also in automation. And I think you’re right that that one’s probably the most often missed in nonprofits. We just don’t think about it.
And to your point, you know, it’s so much better to let computers do things that computers are good at and free up our brilliant and caring humans to do the things they bring so much more value to. It reduces burnout, right? It increases retention. Your people are happier and more invested in doing the things that light them up, because they get to spend their time doing that rather than these menial, robotic tasks.
so I, I think that’s really helpful to think about it. and really what you’re illustrating in that is that data science encapsulates. Basically all of the possible activities and tools and techniques, statistics, software, etc. That let us get what we need out of the data to get where we’re trying to go.
Which is why it’s so difficult, and I did sort of set you up in asking you that question. Because it’s a very nebulous place.
Peter Bull: Right. Well, you notice I sort of dodged the question, right? I didn’t say data science does X or does Y. I said, here are the ways that we can think about using data in the sector for organizations, because it really is about problem solving. It’s a particular set of tools that let us do things, but we should be focused on where we want to get to and why some tools might make sense for getting there.
Alexandra Mannerings: So you’ve given us several just wonderful examples of real life, you know, actionable data science projects that are making differences for organizations out there. I’m curious if you have some pitfalls or sort of warnings that you want people to be aware of as they either are considering approaching someone like you or promote, proposing something to be a challenge, you know, for a data science.
Platform like you have, or if they’re getting into a project, like what do we really need to think about to keep ourselves kind of safe as we head into this brave new frontier.
Peter Bull: Yeah, that is such a deep question. One of the things it brings to mind for me is a wonderful blog post called the data science hierarchy of needs. I’ll give you a link to it so it can go in the show notes, but it talks about the layers that we build up in our data systems in order to be able to do more sophisticated analyses.
One of the major pitfalls I think about is reaching for those most sophisticated tools too quickly. A lot of the projects that we’ve done over the last nine years are really what we think of as zero-to-80 projects: we’re going from having nothing to having 80% of the problem solved. Those are the biggest opportunities for organizations to really change their impact, because it is going from nothing to something. But a lot of the tools that you might reach for are best designed for the 80-to-90% or 90-to-95% solution. One of the things we always think about is: hey, what’s the easiest baseline thing that we can do here? If it’s a modeling task, maybe it’s just a simple rule: if we’re over this value, we think it’s X; if not, it’s Y. That could be sufficient for this problem. It’s easy to understand. It’s easy to deploy and maintain in a system. It’s easy for people to reason about.
Reaching for those simple solutions first and understanding how far they get you is one of the core principles, I think, of doing effective data science. It’s building iteration cycles that only get more sophisticated and complex as the problem calls for it and as you have the intervention mechanisms to make use of a more accurate solution or a more efficient algorithm. So that’s one of the ones that I really think about.
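For example, a baseline in the spirit described here might be nothing more than a single threshold rule; the feature, threshold, and labels below are illustrative assumptions:

```python
def rule_baseline(usage_per_day: float, threshold: float = 75.0) -> str:
    """Single-threshold rule: easy to explain, deploy, maintain, and reason about.

    The feature and threshold are made up; in practice you would pick them
    together with the people who know the domain.
    """
    return "likely_to_fail" if usage_per_day > threshold else "likely_ok"

# Measure how far the simple rule gets you before reaching for anything fancier.
observations = [(80, "likely_to_fail"), (20, "likely_ok"),
                (90, "likely_to_fail"), (60, "likely_ok")]
correct = sum(rule_baseline(x) == label for x, label in observations)
print(f"Baseline gets {correct} of {len(observations)} right")
```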
But core to everything that goes into that is the data itself. One of the conversations we have all the time is: what is the data that you actually have? What data are you collecting? There’s this principle in engineering systems called observability. Observability doesn’t mean you can see every part of the system at every time. It means that if I ask a question about what happened, I have recorded enough data that I can look back, see the critical pieces of the system, and understand why what happened happened. That ability to have collected what I need to answer a question I’m asking now is one of the core things organizations should be thinking about from a data perspective, whether they’re asking questions in order to be more effective, because they want to change their strategy, or even just to report metrics.
How many people did we serve through this program? They need to be asking themselves at the same time: do we have the data to answer this question? Because sometimes you don’t. Building those data collection systems, and understanding what you will be able to say with the data you’ve collected versus what you can’t, is one of the things that is important and requires a lot of careful thinking. It’s not quite a pitfall, I would say, but it is one of the areas that takes more work and thought than I think people anticipate. And the reason it’s worth that time and thought and effort is that all of the data science work that we do lives by the principle of garbage in, garbage out.
If you can’t trust the data going in, if it hasn’t captured everything you need to capture, you’re not going to get results that you’re confident in, and then the whole project is going to be a waste. So thinking about the data collection strategy, when and how it happens, and what actually gets stored is, I think, one of the biggest areas of investment to think about when you’re thinking about how to use data as an organization.
And one of the interesting things is, you brought up fundraising and development as this area where organizations have a lot of data. The reason they have that data is that it’s a byproduct of the tools that they use. They use tools to track interactions with donors. They use tools to track mailing lists. We think about tools for databases. We understand all of the transactions we’ve had historically through our accounting systems. Just by the process of doing our development work as an organization, we are collecting the data that we need to think about those systems more holistically. And what organizations can do is apply that same mindset to their operations as well.
So how do we make it so that, just by virtue of running our programs, the data gets collected that lets us answer questions about them? That just happens to happen in fundraising because of the way those systems are set up, in a way that doesn’t necessarily happen programmatically. So it’s worth thinking about how to do that for your programs as well.
Alexandra Mannerings: I love that. I don’t know that I’ve heard it phrased in quite that way before but you’re right that we have all this great financial data because in order to make a financial transaction. like, you have to do the things that generate the data to have the financial transaction happen. and you’re right that we do struggle oftentimes to get comprehensive enough programmatic data because of how we do the programs or the natures of the interventions or whatever we’re doing don’t necessarily require tools that do that, but to think about how can we have that happen?
How can we integrate the actual activity of the intervention or program so that the data generation becomes sort of a secondhand result, rather than having to be its own activity all on its own, which oftentimes then doesn’t get done or is done. In the sort of margins because you’re spending appropriately most of your time doing the thing rather than having to write about doing the thing.
Peter Bull: and the fundraising teams that are happiest have invested in CRM’s that do a lot of that data capture for them so that they don’t spend a lot of time manually editing donor databases and logging transactions. That’s the bane of many a fundraising team. And so the systems that capture that by virtue of letting fundraisers do their work, are ones that make everyone happier.
Like you pointed out earlier.
Alexandra Mannerings: Exactly. So I love that. Those are such great things we need to pay attention to. And I agree that starting simple before you build that sophistication is one that’s easy for all of us to miss. I think it brings to light that there may be steps toward a data-science-like solution that are available to any organization, right? Steps that don’t require super complicated math or high levels of training in statistics to operationalize. It could be a simple true/false statement written into an Excel file, right? That actually gets you a big step of the way. But as we do build toward that sophistication, obviously the right thing to do is to reach out for external support when you get to that level, to organizations like yours with your expertise in building this. And I think that’s where nonprofits, and some of us who aren’t in this field, start to struggle as well. How do we identify a good partner here when I don’t know what makes a good data scientist? How do I know that somebody is going to help me properly in this project and is going to get what I’m trying to do, when I can’t really evaluate their technical skills? So I was wondering if you could speak a little bit to helping an organization that’s interested in working with a contractor, bringing in a third party to help with this, identify a partner who’s going to be appropriate and helpful and effective for them.
Peter Bull: Yeah, I think one of the most challenging things for organizations is bootstrapping new capacities, and I think this structural problem happens with all kinds of capacities that organizations need. I’m sure a lot of groups can think back to when they needed a social media manager, right? To say, oh, okay, we have this new marketing channel where we need to spread the word about our organization and our services, and we don’t feel like we, as an organization, understand it. How do I find someone who can help me think about this space? This bootstrapping into a new capacity, I think, is one of the most challenging things that you can do as an organization.
I do think that, in my experience, your professional network and your referral network are among the most valuable sources for getting trusted people in. So the first step that I would take is to make a call out to my network and say: I think we need to invest as an organization in data science capacity. I’m not sure if we’re going to hire. I’m not sure if we’re going to get a contractor. I’m not sure if we just want to find a software package that’s going to solve our particular problem. Can I get recommendations? That’s where I start on most of my problems, and I recommend that other people do too: just saying, okay, what can I get from my network? Then, once you’ve done that, if you haven’t gotten an easy answer to your question, you’re in the place where you’re saying: okay, what do I care about when I’m working with a firm on a data science question?
What are the qualities that are going to make sense? I think one of them is the ability to understand your organization’s domain. It may not be possible to find a data science organization that has experience in your exact issue area, but it’s likely you can find one that has worked on a problem with a related structure, or you can talk to someone there who can explain how they would reason by analogy from another problem they’ve solved in the past to the structure of the problem you’re looking at. Being able to feel like the firm you’re talking to can understand the context you’re operating in, I think, is important.
Another one I would think about is: how do I see outputs from what the firm has done in the past? We’ve done all kinds of different things for organizations, ranging from writing reports, to slide decks presenting the output of an analysis to the board, to software that we actually build for the organization that they then run internally, to full-blown products that the organization now owns as part of their programmatic offerings. So there are all these levels of sophistication of what you might need when you’re thinking about a data project, and you should ask: how do I see outputs from an organization about what they’ve done before for this kind of problem? If I’m just looking for data strategy, that’s what I want to see. It’s cool that, oh, you built this amazing computer vision application, but that doesn’t necessarily mean you can help me with my data strategy and understanding how that works in my organization. So previous work samples, I think, are one of those critical pieces.
And then the other thing that I look for, and this is true for me when I think about hiring our own data scientists who come and work on our team, is that curious mindset: that willingness to be iterative, to ask questions, to try to understand more deeply.
And this is something that’s like a soft skill that you try to get out in your conversations with people where, You want to work with someone who is going to dig deeper into the data than you even thought was possible. You want them to generate questions to try to answer those questions and to come back to you with a set of things where they want to talk about the process.
They want to talk about what they learned along the way so that. You can understand how they got from A to B, and you can help correct for the next iteration. I think I’ve said this a few times, but data science really is an iterative process. Expect there to be a number of loops that you go through from questions to data to analysis.
To communicating those results with you to thinking about this did or didn’t make any sense, knowing that’s part of the process. And that after that, you go back to the data again, and you do some more analysis and you have some more interim results. And you talk about this again. You say this is, or isn’t the perspective that we need.
And those iterations are what gets you to a good result. So just expecting that process and not a linear from A to B, is Like, I’ll just sign up and have this problem solved. This is a conversation where we’re both learning more together. The data science folks are learning more about the domain, the context, and the needs, and you are learning more about how data is getting used to answer the questions that you care about.
Alexandra Mannerings: I think that is such a helpful list, because every single thing on there is something that anyone in the organization can evaluate, right? We talked about organizations often underselling their contextual and frontline experience, and if you have that deep domain experience, you’re going to be able to see whether an organization coming in, like you said, has at least experience in the domain. Have they worked with health care, if your interventions are in the health care space? That kind of thing. But also, like you said, a lot of the soft skills are just as critical to success as the harder technical skills. So making sure that they’re listening to you and asking you deep, interesting, engaging questions about your organization and what you do: you’re able to tell whether that’s a meaningful interaction or not, regardless of whether you understand a particular algorithm they’re suggesting or the mathematics. So I think, hopefully, that gives people confidence that they can understand whether somebody coming in is going to be a good partner.
And I love that you mentioned it being iterative. That should be something, too: you should want to work with them. You should be having lots of interactions and conversations with them, because it should not be a one-and-done project, or it isn’t going to be as valuable as one that has had meaningful back and forth and corrections and experimentation and refinement.
Peter Bull: Right. Yeah, I purposefully did not say ask a bunch of technical questions, like getting them to explain what a p-value is to you. If you can’t assess what’s coming out of that kind of interview process, it’s going to be even worse when you start working together. My wife is currently interviewing, and one of the things that we say to each other is that interviewers are often on their best behavior. So if you’re in the conversations with a firm, they’re on their best behavior, and if it doesn’t feel like you’re able to connect and communicate and get to the point where you’re talking about the same thing and have a shared understanding, then it’s probably not going to get better.
So I think the critical soft skills are in that communication piece, which is one of the core data science skills. When you think about what makes up a good data scientist, communication is a big piece of that. Obviously, there are critical requisite technical skills; you can’t get by without having the right technical knowledge.
But you, as an organization that is engaging its first data science project, are not going to be able to effectively assess that. So if you think you really need to assess the technical side of things, you might look for partners that help with technical hiring. We’ve done that for organizations hiring their first data scientist, where we run the technical screening interview. We say: hey, here’s our rubric, here’s how we think about these pieces together, here’s what we saw in these interviews. You use that in conjunction with the rest of your hiring process to understand whether this person is a good fit for your team, and to find someone who can be that all-around data scientist, especially if they’re coming in as the first person, who can really do the communication in addition to the technical work. So if you really need to, you can find third parties to help with that technical assessment piece as well.
Alexandra Mannerings: That’s a great recommendation. I think another place you can go as well as your board, right? If you have a pretty sizable board, you may find somebody who has a background, like you said, to just give you some assessment of technical ability. the other thing that I do say, and obviously, you don’t necessarily want to get yourself in the point where you’ve invested in a project and you find out it’s not any good, but, you know, data projects are good when they help you do what you do better.
So even if you don’t understand necessarily how you got there, if the results you’re finding, right, from implementing the model or whatever it might be, are improved, then it’s a useful thing. Um, and so you can also, while you might not, like I said, want to wait until your project is done to find out if it was good, you can ask for those kinds of measures from past.
Projects that an organization has done, you know, do they have some data to show that after working with an organization on a program that program has become more effective or time savings from automation, etc. And so the results will speak for themselves as well.
Peter Bull: I think that’s. A really great point and, thinking about how that data science work that happens, how that project within your organization, how you’re going to assess that is actually going to help you in all the conversations that you’re having to say, Hey, here’s the needle. We want to move. How might we be able to move this together using the data that we have?
and you’ll get even more targeted responses that are going to be valuable. and, 1 of the other things that you can do is, you can start small with data science projects. so we, as an organization, have a sprint based model where, all of our work is not, hey, we’ll deliver this project to you in 6 months.
Like, it’s going to be, you know, this huge undertaking. We’ve got it all planned out ahead of time. like, we’ve been talking about for this entire conversation. a lot of times there are too many unknowns. To have that planning process turn into what the reality is when you actually execute. And so what we do when we engage with organizations is we have month long sprints, or we say, what’s a small piece of the question.
What’s a critical thing that we need to understand that’s on our path to answering the big picture question. Let’s start there. And work on that for a month, and we’ll check in throughout that month, and at the end, we’ll give a presentation and say, Hey, here’s what we learned about this. Here were the unexpected things that came up.
here’s what we would think about for a next sprint and how do we go from where we are and what we’ve learned to where we want to get to. And so we sort of chart that path as we go by having this structured. Iterative framework, rather than saying, Hey, we’re going to work for 6 months and hopefully we get something good.
It’s okay. We’ve worked for a month. We feel like we’ve answered a lot of key questions. Here’s what we learned. Does this make sense? Does this feel right? Let’s decide where we go next together. and so you can start just with some small projects like that to get a sense of what it’s like to work together and to make sure that the problem you want to solve is solvable.
Alexandra Mannerings: I love that. Thank you so much. Now, hopefully folks listening are excited to start working more with data scientists, whether that’s bringing someone in or working with an organization like yours. If people want to learn more about you and your organization, where could they go to connect with you, or follow you, or learn more?
Peter Bull: Yeah, great, thanks. So the challenges, the machine learning challenges that I started out talking about, you can find those at drivendata.org. That’s our public community; we have over 100,000 data scientists who come participate in these challenges. That is just the challenge side of the organization. At drivendata.co, that’s the consulting side of the organization, so when we work directly with folks, you can find information about that there. You can connect with me on LinkedIn; I’m Peter Bull, username pjbull. Same thing on Twitter. I’m also on a pretty good Mastodon instance called Data Folks, so if you’re excited about getting into data, you can hop on there and see what’s up.
So yeah, those are probably the best ways. If you happen to be on GitHub, you can find me on GitHub as well. And if you don’t know what GitHub is, that’s okay; that’s where the engineers, programmers, and data scientists gather to share code.
Alexandra Mannerings: I love it. So, all different manners of connection, from the technical to the non-technical. Perfect. Well, thank you so much for your time today, Peter. This has just been a delightful conversation. I know I’ve learned a ton, and I’m sure everyone listening will as well.
Peter Bull: It’s been my pleasure. Thank you so much for having me.
Peter Bull
Peter Bull is the Principal Data Scientist and Co-founder at DrivenData, an organization dedicated to harnessing the power of data science for social good. With a passion for leveraging data to drive positive change, Peter has made significant strides in helping nonprofits and social organizations maximize their impact through data-driven decision-making.