Benchmarking change

The quest for excellence within social programs relies on knowing what excellence means and how programs compare against others. Benchmarks can enable us to compare one program to another if we have quality comparators and an evaluation culture to generate them – something we currently lack. 


In surveying, a benchmark is a fixed mark that gives surveyors a consistent place to set a levelling rod, so that elevation measurements taken at that spot can be compared over time. A benchmark, in other words, is a fixed point of reference that allows comparisons over time.

The term benchmark is often used in evaluation as a means of comparing programs or practices, often by taking one well-understood and high-performing program as the ‘benchmark’ to which others are compared. Benchmarks in evaluation can be the standard against which other measures are compared.

In a 2010 article for the World Bank (PDF), evaluators Azevedo, Newman and Pungiluppi articulate the value of benchmarking and provide examples of how it contributes to the understanding of both the absolute and relative performance of development programs. Writing about the need for benchmarking, the authors conclude:

In most benchmarking exercises, it is useful to consider not only the nature of the changes in the indicator of interest but also the level. Focusing only on the relative performance in the change can cause the researcher to be overly optimistic. A district, state or country may be advancing comparatively rapidly, but it may have very far to go. Focusing only on the relative performance on the level can cause the researcher to be overly pessimistic, as it may not be sufficiently sensitive to pick up recent changes in efforts to improve.
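To make the level-versus-change point concrete, here is a minimal sketch of how a benchmarking exercise might report both rankings side by side. The districts, indicator values and function names are hypothetical illustrations of mine, not data or methods from the World Bank article.

```python
from dataclasses import dataclass


@dataclass
class District:
    """A hypothetical unit being benchmarked on one indicator (higher is better)."""
    name: str
    baseline: float  # indicator value at the start of the period
    latest: float    # indicator value at the end of the period

    @property
    def change(self) -> float:
        return self.latest - self.baseline


def rank_by(districts, key):
    """Return {name: rank}, where rank 1 is the highest value of `key`."""
    ordered = sorted(districts, key=key, reverse=True)
    return {d.name: position for position, d in enumerate(ordered, start=1)}


def benchmark(districts):
    """Pair each district's rank on level with its rank on change."""
    level_rank = rank_by(districts, lambda d: d.latest)
    change_rank = rank_by(districts, lambda d: d.change)
    return {d.name: (level_rank[d.name], change_rank[d.name]) for d in districts}


# Made-up numbers: A is improving fastest but still has the furthest to go,
# while B leads on level but has barely moved.
districts = [
    District("A", baseline=40.0, latest=55.0),
    District("B", baseline=80.0, latest=82.0),
    District("C", baseline=60.0, latest=66.0),
]
report = benchmark(districts)
for name, (level, change) in report.items():
    print(f"{name}: rank on level = {level}, rank on change = {change}")
```

Looking only at change would flatter district A, and looking only at level would flatter district B, which is the over-optimism and over-pessimism the authors warn about.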

Compared to what?

One of the challenges with benchmarking exercises is finding a comparator. This is easier for programs operating with relatively simple program systems and structures and less so for more complex ones. For example, in the service sector wait times are a common benchmark. In the province of Ontario in Canada, the government provides regularly updated wait times for Emergency Room visits via a website. In the case of healthcare, benchmarks are used in multiple ways. There is a target that is used as the benchmark, although, depending on the condition, this target might be based on a combination of aspiration and evidence, as well as what the health system believes is reasonable, what the public demands (or expects) and what the hospital desires.

Part of the problem with benchmarks set in this manner is that they are easy to manipulate and thus raise the question of whether they are true benchmarks in the first place or just goals.

If I want to set a personal benchmark for good dietary behaviour of eating three meals a day, I might find myself performing exceptionally well, as I’ve managed to do this nearly every day within the last three months. If the benchmark is consuming 2790 calories, as is recommended for someone of my age, sex, activity level and fitness goals, that’s different. Add to that the aim of having about 50% of those calories come from carbohydrates, 30% from fat and 20% from protein, and we have a very different set of issues to consider when contemplating how performance relates to a standard.
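As a rough illustration of how much more demanding the second kind of benchmark is, the calorie target and percentage split above can be turned into daily gram targets. This sketch assumes the commonly used general conversion factors of 4 kcal per gram for carbohydrate and protein and 9 kcal per gram for fat; those factors and the function name are my additions for illustration.

```python
import math

# Assumed general energy conversion factors (kcal per gram).
KCAL_PER_GRAM = {"carbohydrate": 4, "fat": 9, "protein": 4}


def macro_targets(daily_kcal, shares):
    """Convert a calorie target and macronutrient shares into grams per day."""
    if not math.isclose(sum(shares.values()), 1.0):
        raise ValueError("macronutrient shares must sum to 1")
    return {
        macro: daily_kcal * share / KCAL_PER_GRAM[macro]
        for macro, share in shares.items()
    }


# The figures quoted in the text: 2790 calories split 50/30/20.
targets = macro_targets(
    2790, {"carbohydrate": 0.50, "fat": 0.30, "protein": 0.20}
)
for macro, grams in targets.items():
    print(f"{macro}: {grams:.1f} g per day")
```

Under those assumptions the single calorie number becomes three separate targets of roughly 349 g of carbohydrate, 93 g of fat and 140 g of protein, each of which has to be measured and met.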

One reason we can benchmark diet targets is that the data set we have to set that benchmark is enormous. Tools like MyFitnessPal use these benchmarks to give their users personalized data for fitness tracking. The benchmarks themselves are gleaned from tens of thousands of users and hundreds of scientific articles and reports on diet and exercise from the past 50 years. From this it’s possible to generate reasonably appropriate recommendations for a specific age group and sex.

These benchmarks are also possible because we have internationally standardized the term calorie. We have further internationally recognized, but slightly less precise, measures for what it means to be a certain age and sex. Activity level gets a little more fuzzy, but we still have benchmarks for it. As the activities that define fitness and diet goals get grouped together, we start to realize that the result is a jumble of highly precise and somewhat loosely defined benchmarks.

The bigger challenge comes when we don’t have a scientifically validated standard or even a clear sense of what is being compared, and that is what we have with social innovation.

Creating an evaluation culture within social innovation

Social innovation has a variety of definitions; the common thread is that it’s about social programs aimed at addressing social problems using ideas, tools, policies and practices that differ from the status quo. Given the complexity of the environments in which many social programs operate, it’s safe to assume that social innovation** is happening all over the world because the contexts are so varied. The irony is that many in this sector are not learning from one another as much as they could, further complicating any initiative to build benchmarks for social programs.

Some groups like the Social Innovation Exchange (SIX) are trying to change that. However, they and others like them face an uphill battle. Part of the reason is that social innovation has not established a culture of evaluation. There remains little in the way of common language, frameworks, or spaces to share and distribute knowledge about programs (both in description and evaluation) in a manner that is transparent and accessible to others.

Competition for funding, the desire to paint programs in a positive light, lack of expertise, not enough resources available for dissemination and translation, absence of a dedicated space for sharing results, and distrust or isolation from academia among certain sectors are some reasons that might contribute to this. For example, the Stanford Social Innovation Review is among the few venues dedicated to scholarship in social innovation aimed at a wide audience. It’s also a venue focused largely on international development and what I might call ‘big’ social innovation: the kind of work that attracts large philanthropic resources. There are lots of other types of social innovation and they don’t all fit into the model that SSIR promotes.

From my experience, many small organizations or initiatives struggle to fund evaluation efforts sufficiently, let alone the dissemination of the work once it’s finished. Without good-quality evaluations and the means to share their results, whether or not they cast a program in a positive light, it’s difficult to build a culture where those in the sector can learn from one another. Without a culture of evaluation, we also don’t get the volume of data and access to comparators (appropriate comparators, not just the only things we can find) needed to develop true, useful benchmarks.

Culture’s feast on strategy

Building on the adage attributed to Peter Drucker that culture eats strategy for breakfast (or lunch), it might be time that we use that feasting to generate some energy for change. If the strategy is to be more evidence based, to learn more about what is happening in the social sector, and to compare across programs to aid that learning, there needs to be a culture shift.

This requires some acknowledgement that evaluation, a disciplined means of providing structured feedback and monitoring of programs, is not something adjunct to social innovation, but a key part of it. This is not just in the sense that evaluation provides some of the raw materials (data) to make informed choices that can shape strategy, but that it is as much a part of the raw material for social change as enthusiasm, creativity, focus, and dissatisfaction with the status quo on any particular condition.

We are seeing a culture of shared ownership and collective impact forming. Now it’s time to take that further and shape a culture of evaluation that builds on this so we can truly start sharing, building capacity and developing the real benchmarks to show how well social innovation is performing. In doing so, we make social innovation more respectable, more transparent, more comparable and more impactful.

Only by knowing what we are doing and have done can we really sense just how far we can go.

** For this article, I’m using the term social innovation broadly, which might encompass many types of social service programs, government or policy initiatives, and social entrepreneurship ventures that might not always be considered social innovation.

Photo credit: Redwood Benchmark by Hitchster used under Creative Commons License from Flickr.

About the author: Cameron Norman is the Principal of Cense Research + Design and works at assisting organizations and networks in supporting learning and innovation in human services through design, program evaluation, behavioural science and system thinking. He is based in Toronto, Canada.

