In November 2007, Alliance talked to Fay Twersky, recently appointed to head up the Bill & Melinda Gates Foundation’s brand-new Impact Planning and Improvement Unit. Three and a half years later, Caroline Hartnell asked her newly appointed successor, Jodi Nelson, to what extent Bill and Melinda Gates’ original aims for the unit have been achieved. And what challenges does she face in her new role? One thing she emphasizes is the need to measure selectively and only when the results will actually be used to do something.
Fay Twersky told Alliance in 2007 that ‘Bill and Melinda want us to learn from everything we do … So we are putting in place systems of measuring results and feedback loops that will help us to learn in a continuous fashion and then use that learning to improve our grantmaking practices, using our resources most effectively, driving towards impact.’ To what extent has this been achieved?
As is the case for all non-profits, getting measurement right is a perennial challenge. To define the challenge more specifically: how do you measure the results and evaluate the effectiveness of your work when data are scarce, there is no bottom-line metric against which success can be assessed, and no natural feedback loop that provides regular data on performance and incentivizes the use of data to make decisions? It would be naive for us to think we can solve it quickly given how much time and effort so many other organizations – other foundations, bilateral and multilateral donors, implementing agencies and other grantees – have put into it for decades. We are still a young foundation and lucky to be able to learn from what they have tried, and build on their successes.
That said, there has been an enormous amount of progress in the last few years in understanding how best to measure success across the sectors the foundation works in – US education, global health and global development. You can see it in the emphasis that organizations increasingly place on data and evidence for decision-making. An example from the development community is that both the American and British bilateral aid agencies – USAID and DFID – have evaluation policies that emphasize the use of causal evidence to make decisions about how best to alleviate poverty in developing countries. There are heated debates among evaluators and pundits, and even high-level policy discussions, about which evaluation approaches are best suited for different contexts and different types of intervention. Most if not all organizations – both funding and implementing – have teams like ours that work to institutionalize best practice in planning and measurement. This is a far cry from the lack of attention that organizations paid to these essential competencies only a short time ago.
Jeff Raikes, the foundation’s CEO, brings a private sector lens to these discussions. He believes that we need to create feedback loops to bring into the foundation data about how we’re doing so we can continuously improve our work and make it more effective. A good example is the Center for Effective Philanthropy’s Grantee Perception Report. This was designed to get grantee feedback on foundation work and relationships, across a cohort of around 180 to 200 foundations. Given their essential role in our efforts, grantees’ perceptions are an important metric for us. We have been using the resulting data to improve the way we work with grantees, and continue to think about how to extend this type of measurement to other partners.
What do you see as the biggest challenges you face in the work of the Impact Planning and Improvement Unit?
We prefer to think of the work of the IPI team as part of a larger organizational effort. Because planning, measurement and evaluation are core to the foundation’s impact in the world, and to our business, my team is only a small, but important, part of the picture. Right now we are looking both outside and inside the foundation to figure out the best way to ensure that the systems that Fay described in her interview are effective. Our partners and grantees do everything from basic research and science to product development, implementation of development and health interventions at scale, capacity building of national and international organizations, and advocacy for reform and policy change in many different areas, both in the US and globally. This diversity is one reason why we want to learn from outside the organization, from private and public sectors. We want to borrow and innovate rather than use a one-size-fits-all approach to develop our strategies and evaluate their progress and effectiveness.
We are working on two related areas through this summer. The first is to make as tight a connection as possible between the way we plan here in Seattle, the execution, measurement and evaluation of the resulting strategies around the world, and our own learning and adaptation. The complex problems that we seek to address with our funding do not come with ‘how to’ guides. If there were easy solutions, someone else would have found them already. My team is responsible for providing what we call the ‘programme teams’ – the content experts in agriculture, HIV or US education, for example – with the tools, processes and support to translate their expertise into concrete plans for execution and measurement. Part of the trick is purely organizational: how do you create effective, efficient processes that help teams to do their best work to solve complex problems with feasible solutions? We have made a good start, but still have a long way to go to strike the right balance between theory and practice, and between planning, execution and measurement.
The second thing we’re doing is figuring out how to bring in the right data at the right time so that they actually get used. I’m reading a book called Relevance, by David Apgar, that offers guidance for private companies that have an enormous amount of data available to them, yet still struggle to pinpoint which information and metrics matter most for successful goal setting, strategy and performance measurement. Apgar focuses on the relevance of measures for assessing key strategy issues and advancing organizational learning. It’s easier said than done, though, to figure out what are the most relevant metrics, given the complexity of some of the problems we’re trying to solve and the lack of evidence about how change occurs in many of these sectors.
Even when you’ve figured out the metrics, it doesn’t necessarily mean that the data you collect end up getting used. It may sound intuitive, but the key is figuring out how to guarantee the data are both relevant and used. You’d be amazed at how much data collection goes on in the world that produces reports that are never read or used to do anything differently. We are eager to figure out how to build incentives that would help to drive the use of the resulting data.
You say you hope that focusing your measurement more on learning will place less of a burden on grantees. Do you think this is happening, or is there a danger that grantees have their own things they want to measure and therefore end up with two sets of measuring to do?
One consequence of not having a natural feedback loop that brings performance information into the system is a tendency to try to measure everything. My sense is that this exists across the board – whether you’re an implementer on the ground in Angola or working in a school district in Atlanta, or a donor based in London or DC or here in Seattle. It’s furthered by conventional practice in measuring programme results whereby you define the objective you’re seeking to accomplish with a particular programme or intervention, identify the logical steps that will produce that change, and then collect the data that will tell you if the steps occurred and if they in fact led to the objective. If you think about the potential scale and scope of some of the work we do, you can imagine just how much data this would bring into the system. But not all of it would be relevant, for us or our grantees.
We don’t want to hold our grantees – or ourselves for that matter – accountable for measuring everything. We need the discipline and good strategic thinking to figure out what is feasible, to learn from our grantees, and to work with them to decide what good measurement looks like for them and how that intersects with what we need to fit into the bigger picture of a particular strategy. We need to be clear, strategic and purpose-driven, rather than adding to what already exists.
Bill and Melinda coined the term ‘actionable measurement’ in response to questions about the foundation’s philosophy on measurement. The idea behind this is that we want to measure strategically – deliberately and in ways that inform our strategies and decisions – and only when we will actually use the results to do something. We need to pull up and ask what we actually need to know in order to learn and improve.
Like ‘accountability’, the term ‘learning’ has come to mean so many things that I fear it is hard to define exactly what it is and whether or not we would know it if we saw it. We are trying here to emphasize decision making in order to focus on the measurement and evaluation that we really need to do. Ideally, this means that the information we need is information grantees need as well.
To be sure, it can be a big challenge for grantees to make sense of what donors want and what they themselves want, and, if there’s a difference, to negotiate that, given the inherent power that comes with the holding of the purse. We are very aware of this. I don’t think there’s an easy answer, but there is growing awareness here about the need to support our grantees and not impose undue burdens on them.
Is there a danger that you are failing to establish baselines at the beginning of programmes in order to spend money fast enough?
That’s an interesting question. It speaks to the tension that exists between getting resources out, or what we call ‘pay out’, and measurement. The first thing that comes to my mind is the tsunami that hit South Asia in 2004. If you remember, the crisis hit suddenly and dramatically. Aid agencies that were already in the region responded as quickly as possible; others rushed in to meet unmet needs. The amount of money available – in particular private money – was astonishingly large as people all over reached into their pockets to help. I worked at one of these organizations at the time, and one of the things I remember most is that we had all this unrestricted money, without the constraints that donors can sometimes impose in terms of expecting certain results and therefore measurement. But over time, we figured out that we had very little data on what we had achieved. Without the constraints that came with scarce resources, incentives for measurement changed dramatically. Indeed, the same relationship can be seen here during the last few years: the contraction of the economy was perhaps a bit of a blessing for the foundation because it forced a renewed commitment to strengthening the quality of our approaches to strategy, measurement, and the mechanisms for feedback on our performance.
Your question also gives me a chance to tell you more about how we might differ from other donors in the area of measurement and evaluation. The word you use, ‘baselines’, usually refers to baseline surveys that are done to measure change over time. If you’re trying to address HIV prevention in part by influencing people’s attitudes about and use of contraception, for example, you might survey them before you begin an intervention and then return over time to see if their attitudes and practices have changed.
These surveys – baseline and follow-up – can be costly and time-intensive. It’s not only the enumerators – the people who do the surveys – who spend time doing them, but also the interviewees. Imagine that you’re a farmer in sub-Saharan Africa in a country that has a heavy presence of outsiders working to effect change in your lives. You are working hard daily to get food on your table, get your kids to school and keep the family farm productive. There’s a drought in the region and you find yourself approached three times by different teams of researchers, asking the same questions. Each interview takes more than an hour.
I don’t know about you, but I can’t remember the last time I said yes to a telemarketer who calls while I’m having dinner with my kids to ask me to answer a survey. Yet, we often expect people to spend time answering our questions. This is exactly what we don’t want to happen and why we need to be careful that the data collection we expect of our grantees and partners will help them to do their best work. When we ask our grantees to do surveys for us, it’s essential that we have a compelling reason to use the data and that we actually do something differently as a result.
I asked Fay if she thought the Gates Foundation might in the long term be willing to help build the infrastructure for funders to share evaluations, some sort of online information repository. She said: ‘We are a young foundation and our Impact Planning and Improvement unit is one year old … so I think it’s something for us to consider down the road.’ Has that position changed? The Foundation’s growing work in the area of philanthropy suggests that it is beginning to accept the need to play a role in developing the field.
I definitely hope so! I can think of several common purposes towards which we might contribute as part of a larger donor community: collaborating with our partners to agree quality standards for measurement and evaluation; sharing information broadly across sectors so that we decrease the transaction costs and inefficiencies of duplicative efforts; working together to increase the cumulative evidence base with which we can make decisions about how best to improve people’s lives. One thing I particularly hope we will contribute to in the next few years is moving the current dialogue away from overly philosophical, technical discourse on the utility of different evaluation designs and focusing instead on how to integrate both real-time and longer-term data into our organizations so we make increasingly better decisions about how to help people.
Fay quoted Warren Buffett as saying that ‘if we don’t fail, if we hit it out of the park every time, then we’re not going after the right problems’. Is the Foundation failing sometimes?
Of course, in this sense: not every investment will produce the results we hope for, but the challenges we are tackling are so big that we have to be willing to try new things and take new risks in search of answers. Ultimately, failure can lead to success. The only true failure would be if we don’t learn from it. If we weren’t failing sometimes, we’d definitely be doing something wrong and not genuinely representing the bold and always optimistic and inspiring aspirations of the foundation’s leadership.
Jodi Nelson is director of the Impact Planning and Improvement Unit at the Bill & Melinda Gates Foundation. Email Jodi.Nelson@gatesfoundation.org