Tag: program evaluation

Tags: business, complexity, evaluation

A mindset for developmental evaluation


Developmental evaluation requires different ways of thinking about programs, people, contexts and the data that comes from all three. Without a change in how we think about these things, no method, tool, or approach will make an evaluation developmental or its results helpful to organizations seeking to innovate, adapt, grow, and sustain themselves. 

There is nothing particularly challenging about developmental evaluation (DE) from a technical standpoint: for the most part, a DE can be performed using the same data collection methods as other evaluations. What sets DE apart from those other evaluations is less the methods and tools than the thinking that goes into how those methods and tools are used. This includes the need to make sensemaking part of the data analysis plan, because it is almost certain that some, if not most, of the data collected will not have an obvious meaning or interpretation.

Without developmental thinking and sensemaking, a DE is just an evaluation with a different name

This is no small point: the failure of organizations to adopt a developmental mindset toward their programs and operations is likely the single biggest reason why DE often fails to live up to its promise in practice.

No child’s play

If you were to ask a five-year-old what they want to be when they grow up you might hear answers like a firefighter, princess, train engineer, chef, zookeeper, or astronaut. Some kids will grow up to become such things (or, for the few aspiring princesses, marry accordingly or work for Disney), but most will not. They will become things like sales account managers, marketing directors, restaurant servers, software programmers, accountants, groundskeepers and more. While this is partly about having the opportunity to pursue a career in a certain field, it’s also about changing interests.

A five-year-old who wants to be a train engineer might seem pretty normal, but one who wants to be an accountant specializing in risk management in the environmental sector would be considered odd. Yet, it’s perfectly reasonable to speak to a 35-year-old and find them excited about being in such a role.

Did the 35-year-old who wanted to be a firefighter at five but became an accountant fail? Are they a failed firefighter? Is the degree to which they fight fires in their present-day occupation a reasonable indicator of career success?

It’s perfectly reasonable to plan to be a princess when you’re five, but not if you’re 35 or 45 or 55 years old unless you’re currently dating a prince or in reasonable proximity to one. What is developmentally appropriate for a five-year-old is not for someone seven times that age.

Further, is a 35-year-old a seven-times better five-year-old? When you’re ten are you twice the person you were when you were five? Why is it OK to praise a toddler for sharing, not biting or slapping their peers, and eating all their vegetables and weird to do it with someone in good mental health in their forties or fifties? It has to do with developmental thinking.

It has to do with a developmental mindset.

Charting evolutionary pathways

We know that as people develop through stages, ages and situations, the knowledge, interests, and capacities they have will change. We might be the same person and also a different person than the one we were ten years ago. The reason is that we evolve and develop as people based on the experiences, genetics, interests, and opportunities we encounter. While there are forces that constrain these adaptations (e.g., economics, education, social mobility, availability of and access to local resources), we still evolve over time.

DE is about creating the data structures and processes to understand this evolution as it pertains to programs and services, and to help guide meaningful designs for that evolution. DE is a tool for charting evolutionary pathways and for documenting changes over time. Just as parents put marks on the wall to chart a child’s growth, take pictures at school, or keep a journal, a DE does much the same thing (sometimes even with similar tools).

As anyone with kids will tell you, there are only a handful of decisions a parent can make that will have sure-fire, predictable outcomes when implemented. Many of them are discovered through trial and error, and some that work when a child is four won’t work when the child is four and five months. Some decisions will yield outcomes that approximate an expected outcome and some will generate entirely unexpected outcomes (positive and negative). A good parent is one who pays attention to the rhythms, flows, and contexts that surround their child and themselves, making the effort to be mindful, caring and compassionate along the way.

This means there is no clear, specific prototype for a good parent that can reliably be matched to any kid, nor any highly specific, predictable means of determining who is going to be a successful, healthy person. Still, many of us manage to have kids we can be proud of, careers we like, friendships we cherish and intimate relationships that bring joy, despite having no means of consistently predicting how any of those will go. We do this all the time because we approach our lives and those of our kids with a developmental mindset.

Programs as living systems

DE is at its best a tool for designing for living systems. It is about discerning what is evolving (and at what rate/s) and what is static within a system and recognizing that the two conditions can co-exist. It’s the reason why many conventional evaluation methods still work within a DE context. It’s also the reason why conventional thinking about those methods often fails to support DE.

Living systems, particularly human systems, are often complex in nature. They have multiple, overlapping streams of information that interact at different levels and time scales and to different effects, inconsistently, or at least in a pattern that is only ever partly knowable. This complexity may include simple relationships and more complicated ones, too. Just as a conservation biologist looking at a changing landscape can discern which changes are happening quickly and which are not, and which relationships are clear and which are less discernible, so can an evaluator looking at a program.

As evaluators and innovators, we need to consider how our programs and services are living systems. Even something as straightforward as the restaurant industry, where food is sought and ordered, prepared, delivered, consumed, then finished, has elements of complexity to it. The dynamics of real-time ordering and tracking, delivery, shifting consumer demand, the presence of mobile competitors (e.g., food trucks), a changing regulatory environment, novelty concepts (e.g., pop-ups!), and the seasonality of food demand and supply have changed how the food preparation business is run.

A restaurant might not be just a bricks-and-mortar operation now, but a multi-faceted, dynamic food creation environment. Even a restaurant that is good at what it does could deliver consistently great food and service and still fail if everything around it is changing. It may need to change to stay the same.

This can only happen if we view our programs as living systems and create evaluation mechanisms and strategies that view them in that manner. That means adopting a developmental mindset within the organization, because DE can’t exist without it.

If a developmental evaluation is what you need or you want to learn more about how it can serve your needs, contact Cense and inquire about how they can help you. 

Image Credit: Thinkstock used under license.

Tags: psychology, systems thinking

SMART goals or better systems?


If you’re working toward some sort of collective goal — as an organization, a network or even as an individual — you’ve most likely been asked to use SMART goal setting to frame your task. While SMART is a popular tool for management consultants and scholars, does it make sense when you’re looking to make inroads on complex, unique or highly volatile problems? Or is the answer in the systems we create to advance goals in the first place?

Goal setting is nearly everywhere.

Globally, we had the UN-backed Millennium Development Goals and now have the Sustainable Development Goals. Look at the missions and visions of most corporations, non-profits, government departments and universities and you will see language that is framed in terms of goals, either explicitly or implicitly.

A goal, for our purposes, is:

goal |ɡōl| noun: the object of a person’s ambition or effort; an aim or desired result; the destination of a journey

Goal setting is the process of determining what it is that you seek to achieve, usually combined with mapping some form of strategy to achieve the goal. Goals can be challenging enough when a single person is determining what they want, need or feel compelled to do, and even more so when aggregated to the level of an organization or a network.

How do you keep people focused on the same thing?

A look at the literature finds a visible presence of one approach: setting SMART goals. SMART is an acronym that stands for Specific, Measurable, Attainable, Realistic, and Time-bound (or Timely in some examples). The origin of SMART has been traced back to an article by George Doran in a 1981 issue of the AMA’s journal Management Review (PDF). In that piece, Doran comments on how unpleasant it is to set objectives and that this is one of the reasons organizations resist it. Yet, in an age where accountability is held in high regard, the role of the goal is not only strategic, but operationally critical to attracting and maintaining resources.

SMART goals are part of a larger process called performance management, which is a means of enhancing collective focus and the alignment of individuals within an organization. Dartmouth College has a clearly articulated explanation of how goals are framed within the context of performance management:

”Performance goals enable employees to plan and organize their work in accordance with achieving predetermined results or outcomes. By setting and completing effective performance goals, employees are better able to:

  • Develop job knowledge and skills that help them thrive in their work, take on additional responsibilities, or pursue their career aspirations;
  • Support or advance the organization’s vision, mission, values, principles, strategies, and goals;
  • Collaborate with their colleagues with greater transparency and mutual understanding;
  • Plan and implement successful projects and initiatives; and
  • Remain resilient when roadblocks arise and learn from these setbacks.”

Heading somewhere, destination unknown

Evaluation professionals and managers alike love SMART goals and performance measurement. What’s not to like about something that specifically outlines what is to be done in detail, the date it’s required by, and in a manner that is achievable? It’s like checking off all the boxes in your management performance chart at once! Alas, the problems with this approach are many.

Specific is pretty safe, so we won’t touch that. It’s good to know what you’re trying to achieve.

But what about measurable? This is what evaluators love, but what does it mean in practice? Metrics and measures reflect a certain type of evaluative approach and require the right kinds of questions, data collection tools and data to work effectively. If the problem being addressed isn’t something that lends itself to quantification using measures, or data that can easily define a part of an issue, then measurement becomes inconclusive at best, useless at worst.

What if you don’t know what is achievable? This might be because you’ve never tried something before or maybe the problem set has never existed before now.

How do you know what realistic is? This is tricky because, as George Bernard Shaw wrote in Man and Superman:

“The reasonable man adapts himself to the world: the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man.”

This issue of reasonableness is an important one because innovation, adaptation and discovery are not about reason, but aspiration and hope. Had reasonableness ruled, we might never have achieved much of what we’ve set out to accomplish in terms of being innovative, adaptive or creative.

Reasonableness is also the most dangerous criterion for those seeking to make real change and do true innovation. Innovation is not often reasonable, nor are the asks ‘reasonable.’ Most social transformations did not come about because they were reasonable. Women’s right to vote, or the right of African Americans to be recognized and treated as human beings in the United States, are but two examples from the 20th century marked by a lack of ‘reasonableness’.

Lastly, what if you have no idea what the timeline for success is? If you’ve not tackled this before, are working on a dynamic problem, or have uncertain or unstable resources it might be impossible to say how long something will take to solve.

Rethinking goals and their evaluation

One of the better discussions on goals, goal setting and the hard truths associated with what it means to pursue a goal comes from James Clear, who draws on some of the research on strategy and decision making to build his list of recommendations. Clear’s summary pulls together a variety of findings that show how individuals construct goals and seek to achieve them, and the results suggest that the problem is less about the strategy used to reach a goal and more about the goals themselves.

What is most relevant for organizations is the concept of ‘rudders and oars‘, which is about creating systems and processes for action and focusing less on the goal itself. In complex systems, our ability to exercise control is highly variable and constrained, and goals provide an illusory sense that we have control. So either we fail to achieve our goals, or we set goals that we can achieve but that may not be the most important things to aim for. We essentially rig our system to achieve something that might be achievable, but utterly unimportant.

Drawing on this work, we are left to re-think goals and commit to the following:

  1. Commit to a process, not a goal
  2. Release the need for immediate results
  3. Build feedback loops
  4. Build better systems, not better goals

To realize this requires an adaptive approach to strategy and evaluation where the two go hand-in-hand and are used systemically. It means pushing aside and rejecting more traditional performance measurement models for individuals and organizations and developing more fine-tuned, custom evaluative approaches that link data to decisions and decisions to actions in an ongoing manner.

It means thinking in systems, about systems, and designing ways to do both in an ongoing, not episodic, manner.
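As a concrete illustration of ‘build feedback loops’ and ‘build better systems, not better goals’, here is a minimal sketch in Python. Everything in it (the metric, the periods, the decision rules) is hypothetical; the point is that each decision flows from the system’s own feedback rather than from a fixed target:

```python
# A hypothetical sketch of a process-driven review loop. There is no
# fixed goal here; each period's decision comes from comparing the
# latest signal to the recent trend and adjusting the process.

def review_cycle(observations):
    """Yield a (period, decision) pair for each observation."""
    for period, signal in enumerate(observations):
        recent = observations[max(0, period - 3):period] or [signal]
        trend = sum(recent) / len(recent)
        if signal > trend:
            yield period, "amplify: extend whatever changed this period"
        elif signal < trend:
            yield period, "adjust: revisit whatever changed this period"
        else:
            yield period, "hold: keep observing"

# An invented engagement signal over six review periods
engagement = [12, 15, 14, 18, 17, 21]
for period, decision in review_cycle(engagement):
    print(f"Period {period}: {decision}")
```

Nothing in the loop asks whether a target was hit; it only asks what the system is doing now and what to do next, which is the rudders-and-oars stance in miniature.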

The irony is, by minimizing or rejecting the use of goals, you might better achieve a goal of a more impactful, innovative, responsive and creative organization with a real mission for positive change.

 

 

Image credit: Author

 

Tags: evaluation, social innovation

Flipping the Social Impact Finger


Look around and one will notice a lot of talk about social enterprise and social impact. Look closer and you’ll find a lot more of the former and far less of the latter. 

There’s a Buddhist-inspired phrase that I find myself reflecting on often when traveling in the social innovation/entrepreneurship/enterprise/impact sphere:

Do not confuse the finger pointing to the moon for the moon itself

As terms like social enterprise and entrepreneurship, social innovation, social laboratories and social impact (which I’ll lump together as [social] for expediency of writing) become better known and written about, it’s easy to get caught up in the excitement and proclaim their success in changing the world. Indeed, we are seeing a real shift not only in what is being done, but a mental shift in what is perceived as possible among communities that never saw opportunities to advance before.

However exciting this is, there is what I see as a growing tendency to lose the forest for the trees by focusing on the growth of [social] and less on the impact part of that collection of terms. In other words, there’s a sense that lots of talk and activity in [social] is translating to social impact. Maybe, but how do we know?

Investment and ROI in change

As I’ve written before using the same guiding phrase cited above, there is a great tendency in social impact to confuse conversation about something with the very thing being talked about. For all the attention paid to the number of ventures and the amount of venture capital raised to support new initiatives across the social innovation spectrum in recent years, precious little change has been witnessed in the evaluations made available of these projects.

As one government official working in this sector recently told me:

We tend to run out of steam after (innovations) get launched and lose focus, forgetting to evaluate what kind of impact and both intended and unintended consequences come with that investment

As we celebrate the investment in new ventures, track the launch of new start-ups, and document the number of people working in the [social] sector, we can mistake all that for impact. To be sure, having people working in a sector is a sign of jobs, but whether those jobs are temporary, suitably paying, satisfying, or sustainable are the kinds of questions that evaluators might ask, and they remain largely unanswered.

The principal ROI of [social] is social benefit. That benefit comes in the form of improved products and services, better economic conditions for more people, and a healthier planet and greater wellbeing for the humans on it, in different measures. These aren’t theoretical benefits; they need to be real ones, and the only way we will know if we achieve anything approximating them is through evaluation.

Crashing, but not wrecking the party

Evaluation needs to crash the party, but it need not kill the mood. A latent fear among many in [social] is likely that, should we invest so much energy, enthusiasm, money and talent in [social] and find that it doesn’t yield the benefits we expect or need, a fickle populace of investors, governments and the public will abandon the sector. While there will always be trend-hunters who pursue the latest ‘flavour of the month’, [social] is not that. It is here to stay.

The focus on evaluation, however, will determine the speed, scope and shape of its development. Without showing real impact and learning from those initiatives that produce positive benefit (or do not), we will substantially limit [social], and the celebratory parties that we now have at the launch of a new initiative, a featured post on a mainstream site, or a new book will become fewer and farther between.

Photo credit: Moonrise by James Niland used under Creative Commons licence via Flickr. Thanks for sharing your art, James.

Tags: behaviour change, complexity, psychology, social innovation, social systems

Decoding the change genome


Would we invest in something if we had little hard data to suggest what we could expect to gain from that investment? This is often the case with social programs, yet it’s a domain that has resisted the kind of data-driven approaches to investment that we’ve seen in other sectors. One theory is that we can approach change in the same way we decode the genome. But is that a good idea?

Jason Saul is a maverick in social impact work and dresses the part: he wears a suit. That’s not typically the uniform of those working in the social sector railing against the system, but it’s one of the many things that gets people talking about what he and his colleagues at Mission Measurement are trying to do. The mission is clear: bring to social impact the same detailed analysis of the factors that contribute to real impact, grounded in the known evidence, that we would bring to nearly any other area of investment.

The way to achieve this mission is to take the thinking behind the Music Genome Project, the algorithms that power the music service Pandora, and apply it to social impact. This is a big task, done by coding the known literature on social impact from across a vast spectrum of research disciplines, methods, theories and modeling techniques. A short video from Mission Measurement nicely outlines the thinking behind this way of looking at evaluation, measurement, and social impact.

Saul presented his vision for measurement and evaluation to a rapt audience in Toronto at the MaRS Discovery District on April 11th as part of their Global Leaders series, en route to the Skoll World Forum; this is a synopsis of what came from that presentation and its implications for social impact measurement.

(Re) Producing change

Saul began his presentation by pointing to an uncomfortable truth in social impact: we spread money around with good intentions and little insight into actual change. He claims (no reference provided) that 2,000 studies are published per day on behaviour change, yet there remains an absence of common metrics and measures within evaluation to detect change. One of the reasons is that social scientists, program leaders, and community advocates resist standardization, claiming that context matters too much to allow aggregation.

Saul isn’t denying there is truth to the importance of context, but argues that it’s often used as an unreasonable barrier to leading evaluations with evidence. On this point, he’s right. For example, the data from psychology alone shows a poor track record of reproducibility, and thus offers much less to social change initiatives than is needed. As a professional evaluator and social scientist, I’m not often keen on being told how to do what I do (though I sometimes benefit from it). That resistance can be a barrier, but it also points to a problem: if the data show how poorly studies replicate, is following them a good idea in the first place?

Are we doing things righter than we think or wronger than we know?

To this end, Saul is advocating a meta-evaluative perspective: linking together studies from across the field by breaking their components down into something akin to a genome. By looking at the combination of components (the thinking goes), as we do in genetics, we can start to see certain expressions of particular behaviours and related outcomes. If we knew these things in advance, we could potentially invest our energy and funds in programs that are much more likely to succeed. We could also rapidly scale and replicate successful programs by understanding the features that contribute to their fundamental design for change.
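To make the intuition concrete, here is a toy sketch of a genome-style comparison. This is my own illustration, not Mission Measurement’s actual method or data; the components and outcome scores are invented. Each past study is coded as a set of program components with an observed outcome, and a proposed program is scored by its overlap with what has been tried before:

```python
# Toy 'genome' comparison: past studies coded as component sets with
# outcome scores (all values invented for illustration).

coded_studies = [
    ({"peer_support", "goal_setting", "text_reminders"}, 0.62),
    ({"peer_support", "incentives"}, 0.41),
    ({"goal_setting", "text_reminders", "coaching"}, 0.55),
]

def expected_outcome(proposal):
    """Average past outcomes, weighted by component overlap (Jaccard)."""
    total_weight = weighted_sum = 0.0
    for components, outcome in coded_studies:
        overlap = len(proposal & components) / len(proposal | components)
        total_weight += overlap
        weighted_sum += overlap * outcome
    return weighted_sum / total_weight if total_weight else None

# Score a hypothetical new program design
print(expected_outcome({"peer_support", "goal_setting"}))
```

Even in this crude form, the sketch shows both the promise (a forecast grounded in coded evidence) and the fragility: everything depends on how well the components are coded and whether context travels with them.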

The epigenetic nature of change

Genetics is a complex thing. Even on matters where there is reasonably strong data connecting certain genetic traits to biological expression, there are few examples of genes as ‘destiny’ as they are too often portrayed. In other words, it almost always depends on a number of things. In recent years the concept of epigenetics has risen in prominence to explain how genes get expressed, and it has as much to do with the environmental conditions present as with the gene combinations themselves. McGill scientist Moshe Szyf and his colleagues pioneered research into how genes are suppressed, expressed and transformed through engagement with the natural world, helping create the field of epigenetics. Where we once thought genes were prescriptions for certain outcomes, we now know that it’s not that simple.

By approaching change as a genome, there is a risk that the metaphor can lead to false conclusions about the complexity of change. This is not to dismiss the valid arguments being made around poor data standardization, sharing, and research replication, but it calls into question how far the genome model can go with respect to social programs without breaking down. For evaluators looking at social impact, the opportunity is that we can systematically look at the factors that consistently produce change if we have appropriate comparisons. (That is a big if.)

Saul outlined many of the challenges that beset the evaluation of social impact research, including the ‘file-drawer effect’ and related publication bias, differences in measurement tools, and lack of (documented) fidelity of programs. Responding to Saul’s presentation, Cathy Taylor from the Ontario Non-Profit Network raised the challenge that comes when much of what is known about a program is not documented, but embodied in program staff and shared through exchanges. The matter of tacit knowledge and practice-based evidence bedevils efforts to compare programs: many social programs are rich in context — people, places, things, interactions — that goes uncaptured in any systematic way, and it is that kind of data capture that is needed if we wish to understand the epigenetic nature of change.

Unlike Moshe Szyf and his fellow scientists working in labs, we can’t isolate, observe and track everything our participants do in the world in the service of – or support to – their programs, because they aren’t rats in a cage.

Systems thinking about change

One of the other criticisms of the model that Saul and his colleagues have developed is that it is rather reductionist in its expression. While there is ample consideration of contextual factors in his presentation of the model, the social impact genome is fundamentally based on reductionist approaches to understanding change. A reductionist approach to explaining social change has been derided by many working in social innovation and environmental science as outdated and inappropriate for understanding how change happens in complex social systems.

What is needed is synthesis and adaptation and a meta-model process, not a singular one.

Saul’s approach is not in opposition to this, but it does get a little foggy how the recombination of parts into wholes gets realized. This is where the practical implications of using the genome model start to break down. However, this isn’t a reason to give up on it, but an invitation to ask more questions and to start testing the model more fulsomely. It’s also a call for systems scientists to get involved, just as they did with the Human Genome Project, which gave us great understanding of the influence our genes have and stressed the importance of the environment in how we create or design healthy systems for humans and the living world.

At present, the genomic approach to change is largely theoretical, backed by ongoing development and experimentation but little outcome data. There is great promise that bigger and better data, better coding, and a systemic approach to looking at social investment will lead to better outcomes, but there is little actual data on whether this approach works, for whom, and under what conditions. That is to come. In the meantime, we are left with questions and opportunities.

Among the most salient of the opportunities is to use this to inspire greater questions about the comparability and coordination of data. Evaluations as ‘one-off’ bespoke products are not efficient…unless they are the only thing we have available. Wise, responsible evaluators know when to borrow or adapt from others and when to create something unique. Regardless of what designs and tools we use, however, this calls for evaluators to share what they learn and for programs to build evaluative thinking and reflective capacity within their organizations.

The future of evaluation is going to include this kind of thinking and modeling. Evaluators, social change leaders, grant makers and the public alike ignore it at their peril, which includes losing opportunities to make evaluation and social impact development more accountable, more dynamic and more impactful.

Photo credit (main): Genome by Quinn Dombrowski used under Creative Commons License via Flickr. Thanks for sharing Quinn!

About the author: Cameron Norman is the Principal of Cense Research + Design and assists organizations and networks in supporting learning and innovation in human services through design, program evaluation, behavioural science and system thinking. He is based in Toronto, Canada.

Tags: evaluation, social innovation

Benchmarking change

The quest for excellence within social programs relies on knowing what excellence means and how programs compare against others. Benchmarks can enable us to compare one program to another if we have quality comparators and an evaluation culture to generate them – something we currently lack. 


A benchmark was originally a surveyor’s mark: a fixed point where a levelling rod could be held so that elevation measurements of a particular place were consistent and comparable over time. A benchmark, in short, is a fixed point of measurement that allows comparisons over time.

The term benchmark is often used in evaluation as a means of comparing programs or practices, often taking one well-understood, high-performing program as the ‘benchmark’ to which others are compared. Benchmarks in evaluation can be the standard against which other measures are compared.

In a 2010 article for the World Bank (PDF), evaluators Azevedo, Newman and Pungilupp articulate the value of benchmarking and provide examples of how it contributes to the understanding of both absolute and relative performance of development programs. Writing about the need for benchmarking, the authors conclude:

In most benchmarking exercises, it is useful to consider not only the nature of the changes in the indicator of interest but also the level. Focusing only on the relative performance in the change can cause the researcher to be overly optimistic. A district, state or country may be advancing comparatively rapidly, but it may have very far to go. Focusing only on the relative performance on the level can cause the researcher to be overly pessimistic, as it may not be sufficiently sensitive to pick up recent changes in efforts to improve.
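A toy numeric illustration of that point (the districts and indicator values below are invented): judged by change alone, District A looks best; judged by level alone, District B does. A useful benchmark reports both:

```python
# Invented indicator values for two districts, five years apart.
districts = {
    "A": (40, 52),  # low level, fast change
    "B": (78, 80),  # high level, slow change
}

for name, (then, now) in districts.items():
    print(f"District {name}: level {now}, change {now - then:+d}")
# A has +12 but sits at 52; B has +2 but already sits at 80.
```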

Compared to what?

One of the challenges with benchmarking exercises is finding a comparator. This is easier for programs operating with relatively simple program systems and structures and less so for more complex ones. In the service sector, for example, wait times are a common benchmark. In the province of Ontario in Canada, the government publishes regularly updated wait times for Emergency Room visits on a website. In healthcare, benchmarks are used in multiple ways. There is a target that serves as the benchmark, although, depending on the condition, this target might rest on a combination of aspiration and evidence, as well as what the health system believes is reasonable, what the public demands (or expects) and what the hospital desires.

Part of the problem with benchmarks set in this manner is that they are easy to manipulate and thus raise the question of whether they are true benchmarks in the first place or just goals.

If I want to set a personal benchmark for good dietary behaviour of eating three meals a day, I might find myself performing exceptionally well, as I’ve managed to do this nearly every day for the last three months. If the benchmark is consuming 2,790 calories, as is recommended for someone of my age, sex, activity levels, fitness goals and such, that’s different. Add that, within that range of calories, the aim is to have about 50% come from carbohydrates, 30% from fat and 20% from protein, and we have a very different set of issues to consider when contemplating how performance relates to a standard.
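The arithmetic behind that second kind of benchmark is easy to make precise. A small sketch using the figures above and the standard energy densities (4 kcal per gram of carbohydrate or protein, 9 kcal per gram of fat):

```python
# Convert a daily calorie benchmark into macronutrient targets.
# The 2790 kcal total and the 50/30/20 split come from the example
# above; the kcal-per-gram figures are standard energy densities.

daily_kcal = 2790
split = {"carbohydrate": 0.50, "fat": 0.30, "protein": 0.20}
kcal_per_gram = {"carbohydrate": 4, "fat": 9, "protein": 4}

for macro, share in split.items():
    kcal = daily_kcal * share
    grams = kcal / kcal_per_gram[macro]
    print(f"{macro}: {kcal:.0f} kcal, about {grams:.0f} g per day")
```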

One reason we can benchmark diet targets is that the data set behind them is enormous. Tools like MyFitnessPal use benchmarks to provide personal data to their users for fitness tracking, benchmarks gleaned from tens of thousands of users and hundreds of scientific articles and reports on diet and exercise from the past 50 years. From this it’s possible to generate reasonably appropriate recommendations for a specific age group and sex.

These benchmarks are also possible because we have internationally standardized the calorie. We have further internationally recognized, though slightly less precise, measures for what it means to be a certain age and sex. Activity level gets a little fuzzier, but we still have benchmarks for it. As the activities that define fitness and diet goals get clustered together, we start to realize that the result is a jumble of highly precise and somewhat loosely defined benchmarks.

The bigger challenge comes when we don’t have a scientifically validated standard or even a clear sense of what is being compared and that is what we have with social innovation.

Creating an evaluation culture within social innovation

Social innovation has a variety of definitions, but the common thread is that it’s about social programs aimed at addressing social problems using ideas, tools, policies and practices that differ from the status quo. Given the complexity of the environments many social programs operate in, it’s safe to assume that social innovation** is happening all over the world, because the contexts are so varied. The irony is that many in this sector are not learning from one another as much as they could, further complicating any initiative to build benchmarks for social programs.

Some groups like the Social Innovation Exchange (SIX) are trying to change that. However, they and others like them face an uphill battle. Part of the reason is that social innovation has not established a culture of evaluation. There remains little in the way of common language, frameworks, or spaces to share and distribute knowledge about programs — both in description and evaluation — in a manner that is transparent and accessible to others.

Competition for funding, the desire to paint programs in a positive light, lack of expertise, not enough resources for dissemination and translation, the absence of a dedicated space for sharing results, and distrust of or isolation from academia in certain sectors are some of the reasons that might contribute to this. For example, the Stanford Social Innovation Review is among the few venues dedicated to scholarship in social innovation aimed at a wide audience. It’s also a venue focused largely on international development and what I might call ‘big’ social innovation: the kind of work that attracts large philanthropic resources. There are lots of other types of social innovation and they don’t all fit the model that SSIR promotes.

In my experience, many small organizations or initiatives struggle to fund evaluation efforts sufficiently, let alone the dissemination of the work once it’s finished. Without good quality evaluations and the means to share their results — whether or not they cast a program in a positive light — it’s difficult to build a culture where the sector can learn from itself. Without a culture of evaluation, we also don’t get the volume of data and access to comparators — appropriate comparators, not just the only things we can find — needed to develop true, useful benchmarks.

Culture’s feast on strategy

Building on the adage attributed to Peter Drucker that culture eats strategy for breakfast (or lunch), it might be time we used that feasting to generate some energy for change. If the strategy is to be more evidence-based, to learn more about what is happening in the social sector, and to compare across programs to aid that learning, there needs to be a culture shift.

This requires some acknowledgement that evaluation, a disciplined means of providing structured feedback and monitoring of programs, is not something adjunct to social innovation, but a key part of it. This is not just in the sense that evaluation provides some of the raw materials (data) to make informed choices that can shape strategy, but that it is as much a part of the raw material for social change as enthusiasm, creativity, focus, and dissatisfaction with the status quo on any particular condition.

We are seeing a culture of shared ownership and collective impact forming; now it’s time to take that further and shape a culture of evaluation that builds on it so we can truly start sharing, building capacity and developing the real benchmarks to show how well social innovation is performing. In doing so, we make social innovation more respectable, more transparent, more comparable and more impactful.

Only by knowing what we are doing and have done can we really sense just how far we can go.

** For this article, I’m using the term social innovation broadly, which might encompass many types of social service programs, government or policy initiatives, and social entrepreneurship ventures that might not always be considered social innovation.

Photo credit: Redwood Benchmark by Hitchster used under Creative Commons License from Flickr.

About the author: Cameron Norman is the Principal of Cense Research + Design and works at assisting organizations and networks in supporting learning and innovation in human services through design, program evaluation, behavioural science and system thinking. He is based in Toronto, Canada.

Tags: complexity, education & learning, evaluation, systems thinking

Developmental Evaluation: Questions and Qualities

Same thing, different colour or different thing?

Developmental evaluation, a form of real-time evaluation focused on innovation and complexity, is gaining interest and attention from funders, program developers, and social innovators. Yet its popularity is revealing fundamental misunderstandings and misuses of the term that, if left unquestioned, may threaten the advancement of this important approach as a tool to support innovation and resilience.

If you are operating in the social service, health promotion or innovation space it is quite possible that you’ve been hearing about developmental evaluation, an emerging approach to evaluation that is suited for programs operating in highly complex, dynamic conditions.

Developmental evaluation (DE) is an exciting advancement in evaluative and program design thinking because it links those two activities together, creating an ongoing conversation about innovation in real time to facilitate strategic learning about what programs do and how they can evolve wisely. Because it is rooted in traditional program evaluation theory and methods as well as complexity science, it takes a realist approach to evaluation, making it fit the thorny, complex, real-world situations that many programs inhabit.

I ought to be excited at seeing DE brought up so often, yet I am often not. Why?

Building a better brand for developmental evaluation?

Alas, with rare exception, when I hear someone speak about the developmental evaluation they are involved in, I fail to hear any of the indicator terms one would expect from such an evaluation. These include terms like:

  • Program adaptation
  • Complexity concepts like emergence, attractors, self-organization, and boundaries
  • Strategic learning
  • Surprise!
  • Co-development and design
  • Dialogue
  • System dynamics
  • Flexibility

DE is following the well-worn path laid by terms like systems thinking, which gets less useful every day as it comes to refer to any mode of thought that considers the bigger context of a program (the ‘system’ — whatever that is, it’s never elaborated on), even when there is none of the structure, discipline, method or focus one would expect from true systems thinking. In other words, it’s thinking about a system without the effort of real systems thinking. Still, people see themselves as systems thinkers as a result.

I hear the term DE being used more frequently in this cavalier manner that I suspect reflects aspiration rather than reality.

This aspiration is likely about wanting to be seen (by themselves and others) as innovative, adaptive and participative, and as being a true learning organization. DE has the potential to support all of this, but accomplishing these things requires an enormous amount of commitment. It is not for the faint of heart, the rigid and inflexible, the traditionalists, or those who have little tolerance for risk.

Doing DE requires that you set up a system for collecting, sharing, sensemaking, and designing with data. It means being willing to — and competent enough to know how to — adapt your evaluation design and your programs themselves in measured, appropriate ways.
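What might the bare bones of such a system look like? A minimal sketch follows; the field names are hypothetical and chosen only to show that capture, sensemaking and design response belong in the same record:

```python
# A hypothetical record structure for a DE learning log. The point is
# that each observation is stored with the sensemaking it prompted and
# the design decision it informed, not as raw data alone.

from dataclasses import dataclass
from datetime import date

@dataclass
class LearningLogEntry:
    observed_on: date
    source: str               # e.g., interview, metric, meeting note
    observation: str          # what the data actually showed
    surprise: bool            # did it defy expectations?
    sensemaking: str          # the meaning the team made of it
    design_response: str      # e.g., stabilize, amplify, dampen, watch
    program_change: str = ""  # what, if anything, was adapted

log = [
    LearningLogEntry(
        observed_on=date(2016, 3, 1),
        source="intake metric",
        observation="referrals doubled after the partner workshop",
        surprise=True,
        sensemaking="partners, not clients, may be our key adopters",
        design_response="amplify",
        program_change="added quarterly partner workshops",
    )
]
```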

DE is about discipline, not precision. Too often, I see quests for a beautiful, elegant design to fit the ‘social messes‘ that are the programs under evaluation, only to see them do what Russell Ackoff called “the wrong things, righter” by applying a standard, rigid method to a slippery, complex problem.

Maybe we need to build a better brand for DE.

Much ado about something

Why does this fuss about the way people use the term DE matter? Is this not some academic rant based on a sense of ‘preciousness’ of a term? Who cares what we call it?

This matters because the programs that use and can benefit from DE matter. If it’s just gathering some loose data, slapping it together and calling it an evaluation, knowing that nothing will ever be done with it, then maybe it’s OK (actually, that’s not OK either — but let’s pretend for the sake of the point). When real program decisions are made, jobs are kept or lost, communities are strengthened or weakened, and the energy and creative talents of those involved are put to the test because of evaluation and its products, the details matter a great deal.

If DE promises a means to critically, mindfully and thoroughly support learning and innovation, then it needs to keep that promise. But that promise can only be kept if what we call DE is not something else.

That ‘something else’ is often a form of utilization-focused evaluation, or maybe participatory evaluation, or it might simply be a traditional evaluation model dressed up with words like ‘complexity’ and ‘innovation’ that have no real meaning. (When was the last time you heard someone openly question what someone meant by those terms? We take such terms as given and make enormous assumptions about what they mean that are not always supported.)

There is nothing wrong with any of these methods if they are appropriate, but too often I see mismatches between the problem and the evaluative thinking and practice tools used to address them. DE is new, sexy and a sure sign of innovation to some, which is why it is often picked.

Yet, it’s like asking for a 3-D printer instead of a wrench when you’re looking to fix a pipe on your sink, because the printer is the latest tool innovation and wrenches are “last year’s” tool. It makes no sense. Yet, it’s done all the time.

Qualities and qualifications

There is something alluring about the mysterious. Innovation, design and systems thinking all have elements of mystery to them, which allows for obfuscation, confusion and well-intentioned errors in judgement depending on who and what is being discussed in relation to those terms.

I’ve started seeing recent university graduates claiming to be developmental evaluators who have almost no concept of complexity or service design and have completed just a single course in program evaluation. I’m seeing traditional organizations recruit and hire for developmental evaluation without making any adjustments to their expectations, modes of operating, or timelines, and still expecting results that could only come from DE. It’s as I’ve written before, and as Winston Churchill once said:

I am always ready to learn, but I don’t always like being taught

Many programs are not even primed to learn, let alone being taught.

So what should someone look for in DE and those who practice it? What are some questions those seeking DE support should ask of themselves?

Of evaluators

  • What familiarity and experience do you have with complexity theory and science? What is your understanding of these domains?
  • What experience do you have with service design and design thinking?
  • What kind of evaluation methods and approaches have you used in the past? Are you comfortable with mixed-methods?
  • What is your understanding of the concepts of knowledge integration and sensemaking? And how have you supported others in using these concepts in your career?
  • What is your education, experience and professional qualifications in evaluation?
  • Do you have skills in group facilitation?
  • How open and willing are you to support learning, adapt, and change your own practice and evaluation designs to suit emerging patterns from the DE?

Of programs

  • Are you (we) prepared to alter our normal course of operations in support of the learning process that might emerge from a DE?
  • How comfortable are we with uncertainty? Unpredictability? Risk?
  • Are the timelines and boundaries we place on the DE flexible and negotiable?
  • What kind of experience do we have with truly learning, and are we prepared to create a culture around the evaluation that is open to learning? (This means tolerance of ambiguity, failure, surprise, and new perspectives.)
  • Do we have practices in place that allow us to be mindful and aware of what is going on regularly (as opposed to every 6-months to a year)?
  • How willing are we to work with the developmental evaluator to learn, adapt and design our programs?
  • Are our funders/partners/sponsors/stakeholders willing to come with us on our journey?

Of both evaluators and program stakeholders

  • Are we willing to be open about our fears, concerns, ideas and aspirations with ourselves and each other?
  • Are we willing to work through data that is potentially ambiguous, contradictory, confusing, time-sensitive, context-sensitive and incomplete in capturing the entire system?
  • Are we willing/able to bring others into the journey as we go?

DE is not a magic bullet, but it can be a very powerful ally to programs who are operating in domains of high complexity and require innovation to adapt, thrive and build resilience. It is an important job and a very formidable challenge with great potential benefits to those willing to dive into it competently. It is for these reasons that it is worth doing and doing well.

In order to get there, we must take DE seriously: the demands it puts on us, the requirements for all involved, and the need to be clear in our language, lest we let the not-good-enough be the enemy of the great.

 

Photo credit: Highline Chairs by the author

Tags: complexity, design thinking, emergence, evaluation, systems science

Developmental Evaluation and Design

Creation for Reproduction

 

Innovation is about channeling new ideas into useful products and services, which is really about design. Thus, if developmental evaluation is about innovation, then it is fundamental that those engaging in such work — on both the evaluator and program ends — understand design. In this final post in the first series of Developmental Evaluation and…, we look at how design and design thinking fit with developmental evaluation and what the implications are for programs seeking to innovate.

Design is a field of practice that encompasses professional domains, design thinking, and critical design approaches. It is a big field, a creative one, but also a space with much richness in thinking, methods and tools that can aid program evaluators and program operators.

Defining design

In their excellent article on designing for emergence (PDF), OCAD University’s Greg Van Alstyne and Bob Logan introduce a definition they set out to be the shortest, most concise one they could envision:

Design is creation for reproduction

It may also be the best (among many — see the Making CENSE blog for others) because it speaks to what design does, what it is intended to do, and where it came from, all at the same time. A quick historical look finds that the term ‘design’ didn’t really exist until the industrial revolution. It was not until we could produce things and replicate them on a wide scale that design actually mattered. Prior to that, what we had was simply referred to as craft. One did not supplant the other; however, as societies transformed through migration, technology development and adoption, and shifts in political and economic systems that increased collective action and participation, we saw things — products, services, and ideas — primed for replication and distribution and thus, designed.

The products, services and ideas that succeeded tended to be better designed for such replication, in that they struck a chord with an audience who wanted to further share and distribute that object. (This is not to say that all things replicated are of high quality or ethical value, just that they found the right purchase with an audience and were better designed for provoking that.)

In a complex system, emergence is the force that provokes the kind of replication that we see in Van Alstyne and Logan’s definition of design. With emergence, new patterns emerge from activity that coalesces around attractors and this is what produces novelty and new information for innovation.

A developmental evaluator is someone who creates mechanisms to capture data and channel it to program staff and clients, who can then make sense of it and choose either to take actions that stabilize that new pattern of activity in whatever manner possible, to amplify it, or — if it is not helpful — to make adjustments to dampen it.
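In code terms, that mechanism might look something like the sketch below (a hypothetical illustration of the logic, not a tool from the DE literature); the judgements themselves would come from sensemaking with program staff and clients:

```python
# Hypothetical routing of an emergent pattern to a design response.
# 'helpful' and 'established' would be judgements reached through
# sensemaking with the program team, not by the evaluator alone.

def design_response(pattern, helpful, established):
    """Return a course of action for an emergent pattern of activity."""
    if not helpful:
        return f"dampen: adjust the conditions feeding '{pattern}'"
    if established:
        return f"stabilize: protect what sustains '{pattern}'"
    return f"amplify: extend '{pattern}' and watch what emerges next"

print(design_response("drop-in peer mentoring", helpful=True, established=False))
```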

But how do we do this if we are not designing?

Developmental evaluation as design

A quote from Nobel Laureate Herbert Simon is apt when considering why the term design is appropriate for developmental evaluation:

“Everyone designs who devises courses of action aimed at changing existing situations into preferred ones”.

Developmental evaluation is about modification, adaptation and evolution in innovation (poetically speaking) using data as a provocation and guide for programs. One of the key features that makes developmental evaluation (DE) different from other forms of evaluation is the heavy emphasis on use of evaluation findings. No use, no DE.

But further, what separates DE from utilization-focused evaluation (PDF) is that the use of evaluation data is intended to foster development of the program, not just use. I’ve written about what development looks like in other posts. No development, no DE.

Returning to Herb Simon’s quote, we see that the goal of DE is to provoke some discussion of development and thus change, so it could be argued that, at least at some level, DE is about design. That is a tepid assertion. A bolder one is that design is actually integral to development, and thus developmental design is what we ought to be striving for through our DE work. Developmental design is not only about evaluative thinking, but design thinking as well. It brings together the spirit of experimentation working within complexity and the feedback systems of evaluation, with a design sensibility around how to make sense of, pay attention to, and transform that information into a new product evolution (innovation).

This sounds great, but if you don’t think about design then you’re not thinking about innovating, and that means you’re not really developing your program.

Ways of thinking about design and innovation

There are numerous examples of design processes and steps. A full coverage of all of this is beyond the scope of a single post and will be expounded on in future posts here and on the Making CENSE blog for tools. However, one approach to design (thinking) is highlighted below and is part of the constellation of approaches that we use at CENSE Research + Design:

The design and innovation cycle

Much of this process has been examined in the previous posts in this series; however, it is worth looking at it again.

Herbert Simon wrote about design as a problem forming (finding), framing and solving activity (PDF). Other authors like IDEO’s Tim Brown and the Kelley brothers, have written about design further (for more references check out CENSEMaking’s library section), but essentially the three domains proposed by Simon hold up as ways to think about design at a very basic level.

What design does is make the process of stabilizing, amplifying or dampening the emergence of new information intentional. Without a sense of purpose — and a mindful attention to process — and a sensemaking process put in place by DE, it is difficult to know what is advantageous and what is not. Within the realm of complexity we run the risk of amplifying and dampening the wrong things…or ignoring them altogether. This has immense consequences, as even staying still in a complex system is moving: change happens whether we want it or not.

The above diagram places evaluation near the end of the corkscrew process; however, that is a bit misleading. It implies that DE-related activities come at the end. What is being argued here is that if the stage isn’t set at the beginning by asking the big questions — the problem finding, forming and framing — then the efforts to ‘solve’ them are unlikely to succeed.

Without the means to understand how new information feeds into the design of the program, we end up serving data to programs that know little about what to do with it, and one of the dangers in complexity is having too much information that we cannot make sense of. In complex scenarios we want to find simplicity where we can, not add more complexity.

To do this and to foster change is to be a designer. We need to consider the program/product/service user, the purpose, the vision, the resources and the processes in place within the systems where we work, to create and re-create the very thing we are evaluating while we are evaluating it. In that entire chain we see why developmental evaluators might also want to put on their black turtlenecks and become designers as well.

No, designers don’t all look like this.

 

Photo credit: Blueprint by Will Scullen used under Creative Commons License

Design and Innovation Process model by CENSE Research + Design

Lower image used under license from iStockphoto.