Tag: measurement

education & learning, evaluation

The Quality Conundrum in Evaluation


One of the central pillars of evaluation is assessing the quality of something, often described as its merit. Along with worth (value) and significance (importance), assessing the merit of a program, product or service is one of the principal areas on which evaluators focus their energy.

However, if you think that would be something that’s relatively simple to do, you would be wrong.

This was brought home clearly in a discussion I took part in at the recent conference of the American Evaluation Association, in a session on quality and evaluation entitled: Who decides if it’s good? How? Balancing rigor, relevance, and power when measuring program quality. The conversation session was hosted by Madeline Brandt and Kim Leonard from the Oregon Community Foundation, who presented on some of their work evaluating quality within that state’s school system.

As they described the context of their work in schools, I was struck by some of the situational variables that came into play, such as high staff turnover (and a resulting shortage among the staff who remain) and the decision to operate some schools on a four-day week instead of five as a means of addressing shortfalls in funding. I’ve since learned that Oregon is not alone in adopting the four-day school week; many states have begun experimenting with it to curb costs. The argument is, presumably, that schools can and must do more with less time.

This means that students are receiving up to one fifth less classroom time each week, yet are expected to perform at the same level as those attending five days. What does that mean for quality? Like much of evaluation work, it all depends on the context.

Quality in context

The United States has a long history of standardized testing, which was instituted partly as a means of ensuring quality in education. The thinking was that, with such diversity in schools, school types, and populations, there needed to be some means to compare capabilities and achievement across these contexts. A standardized test was presumed to serve this role by creating a benchmark (standard) against which student performance could be measured and compared.

While there is a certain logic to this, standardized testing has a series of flaws embedded in its core assumptions about how education works. For starters, it assumes a standard curriculum and model of instruction that is largely one-size-fits-all. Anyone who has been in a classroom knows this is simply not realistic or appropriate. Teachers may teach the same material, but the manner in which it is introduced and engaged with is meant to reflect the state of the classroom: its students, physical space, availability of materials, and place within the curriculum (among others).

If we put aside, for a minute, the ridiculous assumption that all students are alike in their ability and preparedness to learn each day and just focus on the classroom itself, we can already see the problem with evaluating quality by looking back at the four-day school week. Four-day weeks mean either that teachers are taking short-cuts in how they introduce subjects and not teaching all of the material they have, or that they are teaching the same material in a compressed amount of time, giving students less opportunity to ask questions and engage with the content. This means the intervention (i.e., classroom instruction) is not consistent across settings, so how could one expect something like a standardized test to reflect a common attribute? What a quality education means in this context is different than in others.

And that’s just the variable of time. Consider the teachers themselves. If we have high staff turnover, it is likely an indicator that there are some fundamental problems with the job. It may be low pay, poor working conditions, unreasonable demands, insufficient support or recognition, or little opportunity for advancement to name a few. How motivated, supported, or prepared do you think these teachers are?

With all due respect to those teachers, they may not be competent to facilitate high-quality education in this kind of classroom environment. By incompetent, I mean not being prepared to manage compressed schedules, a lack of classroom resources, demands from standardized tests (and parents), high student-teacher ratios, and individual student learning needs, plus fitting in the other social activities that teachers participate in around school, such as clubs, sports, and the arts. Probably no teacher has the competency for all of that. Those teachers (at least the ones who don’t quit their jobs) do what they can with what they have.

Context in Quality

This situation then demands new thinking about what quality means in the context of teaching. Is a high-quality teaching performance one where teachers are better able to adapt, respond to the changes, and manage to simply get through the material without losing their students? It might be.

Exemplary teaching in the context of depleted or scarce resources (time, funding, materials, attention) might look far different than if conducted under conditions of plenty. The learning outcomes might be considerably different, too. So the link between the quality of teaching and learning outcomes depends on many contextual variables that, if we fail to account for them, will lead us to misattribute causes and effects.

What does this mean for quality? Is it an objective standard or a negotiated, relative one? Can it be both?

This is the conundrum that we face when evaluating something like the education system and its outcomes. Are we ‘lowering the bar’ for our students and society by recognizing outstanding effort in the face of unreasonable constraints, or showing that quality can exist in even the most challenging of conditions? With one definition we risk accepting something that would be unacceptable under many conditions; with the other, we risk blaming people for outcomes they can’t possibly achieve.

From the perspective of standardized tests, the entire system is flawed to the point where the measurement is designed to capture outcomes that schools aren’t equipped to generate (even if one assumes that standardized tests measure the ‘right’ things in the ‘right’ way, which is another argument for another day).

Speaking truth to power

This year’s AEA conference theme was speaking truth to power, and this situation provides a strong illustration of it. While evaluators may not be able to resolve this conundrum, what they can do is illuminate the issue through their work. By drawing attention to the standards of quality, their application, and the conditions that are associated with their realization in practice, not just theory, evaluation can point to areas where there are injustices, unreasonable demands, and opportunities for improvement.

Rather than assert blame or unfairly label something as good or bad, evaluation, when done with an eye to speaking truth to power, can play a role in fostering quality and promoting the kind of outcomes we desire, not just the ones we get. In this way, perhaps the real measure of quality is the degree to which our evaluations do this. That is a standard that, as a profession, we can live up to and that our clients — students, teachers, parents, and society — deserve.

Image credit:  Lex Sirikiat

evaluation

Meaning and metrics for innovation


Metrics are at the heart of evaluating impact and value in products and services, although they are rarely straightforward. Deciding what makes a good metric requires first thinking about what a metric means.

I recently read a story on what makes a good metric from Chris Moran, Editor of Strategic Projects at The Guardian. Chris’s work is about building, engaging, and retaining audiences online so he spends a lot of time thinking about metrics and what they mean.

Chris, with support from many others, outlines the five characteristics of a good metric as being:

  1. Relevant
  2. Measurable
  3. Actionable
  4. Reliable
  5. Readable (less likely to be misunderstood)

(What I liked was that he also pointed to additional criteria that didn’t quite make the cut but, as he suggests, could).

This list was developed in the context of communications initiatives, which is exactly the point we need to consider: context matters when it comes to metrics. Context is also holistic, thus we need to consider these five criteria (plus the others?) as a whole if we’re to develop, deploy, and interpret data from these metrics.
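To make that holistic treatment concrete, here is a minimal sketch in Python of what scoring a candidate metric against all five criteria at once might look like. The example metric, the 0-5 scores, and the weakest-criterion rule are invented for illustration; they are not part of Chris’s list.

```python
# Minimal sketch: assess a candidate metric against the five criteria as a set.
# The criteria names come from the list above; the metric, scores, and the
# "weakest criterion" rule are illustrative assumptions only.

CRITERIA = ("relevant", "measurable", "actionable", "reliable", "readable")

def assess_metric(name, scores):
    """scores: dict mapping each criterion to a 0-5 rating for a given context."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"{name}: unscored criteria: {missing}")
    # Treat the criteria holistically: the metric is only as strong as its weakest criterion.
    weakest = min(CRITERIA, key=lambda c: scores[c])
    return {"metric": name, "weakest_criterion": weakest, "floor": scores[weakest]}

print(assess_metric(
    "weekly returning visitors",
    {"relevant": 4, "measurable": 5, "actionable": 3, "reliable": 4, "readable": 5},
))
# -> {'metric': 'weekly returning visitors', 'weakest_criterion': 'actionable', 'floor': 3}
```

The point of the sketch is simply that the criteria travel together: a metric that scores well on four of them but poorly on one is only as useful as that weakest link in a given context.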

As John Hagel puts it: we are moving from the industrial age where standardized metrics and scale dominated to the contextual age.

Sensemaking and metrics

Innovation is entirely context-dependent. A new iPhone might not mean much to someone who has had one, but could be transformative to someone who’s never had that computing power in their hand. Home visits by a doctor or healer were once the only way people were treated for sickness (and this is still the case in some parts of the world), yet now home visits are novel and represent an innovation in many areas of Western healthcare.

Demographic characteristics are one area where sensemaking is critical when it comes to metrics and measures. Sensemaking is a process of literally making sense of something within a specific context. It’s used when there are no standard or obvious means to understand the meaning of something at the outset; rather, meaning is made through investigation, reflection, and other data. It is a process that involves asking questions about value, and value is at the core of innovation.

For example, identity questions on race, sexual orientation, gender, and place of origin all require intense sensemaking before, during, and after use. Asking these questions gets us to consider: what value is it to know any of this?

How is a metric useful without an understanding of the value it is meant to reflect?

What we’ve seen from population research is that failure to ask these questions has left many at the margins without a voice; their experience isn’t captured in the data used to make policy decisions. We’ve seen the opposite when we do ask these questions but use the answers unwisely: strange claims about associations, over-generalizations, and stereotypes formed from data that somehow ‘links’ certain characteristics to behaviours without critical thought. We create policies that exclude because we have data.

The lesson we learn from behavioural science is that, if you have enough data, you can pretty much connect anything to anything. Therefore, we need to be very careful about what we collect data on and what metrics we use.

The role of theory of change and theory of stage

One reason for these strange associations (or their absence) is the lack of a theory of change to explain why any of these variables ought to play a role in explaining what happens. A good, proper theory of change provides a rationale for why something should lead to something else and what might come from it all. It is anchored in data, evidence, theory, and design (which ties it all together).

Metrics are the means by which we can assess the fit of a theory of change. What often gets missed is that fit is shaped by time as well as context. Some metrics have a better fit at different times during an innovation’s development.

For example, a particular metric might be more useful in later-stage research where there is an established base of knowledge (e.g., when an innovation is mature) versus when we are looking at the early formation of an idea. The proof-of-concept stage (i.e., ‘can this idea work?’) is very different from the ‘can this scale?’ stage. To that end, metrics need to be fit to something akin to a theory of stage. This would help explain how an innovation might develop at early stages versus later ones.
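As a rough sketch of what a ‘theory of stage’ might look like in practice, the snippet below maps an innovation’s stage to metrics that could plausibly fit it. The stage names and example metrics are assumptions made purely for illustration, not a prescribed framework.

```python
# Illustrative sketch only: match metrics to an innovation's stage of development.
# Stage names and example metrics are invented; a real mapping would come from
# the programme's own theory of change.

STAGE_METRICS = {
    "proof-of-concept": ["user feedback themes", "prototype completion", "early adoption signals"],
    "pilot": ["retention after first use", "fidelity to the design", "cost per participant"],
    "scale": ["reach across contexts", "consistency of outcomes", "unit cost at volume"],
}

def candidate_metrics(stage):
    """Return metrics worth considering at a given stage, or a prompt to revisit the theory."""
    return STAGE_METRICS.get(stage, ["(unrecognized stage: revisit the theory of change)"])

print(candidate_metrics("proof-of-concept"))
```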

Metrics are useful. Blindly using metrics — or using the wrong ones — can be harmful in ways that might be unmeasurable without the proper thinking about what they do, what they represent, and which ones to use.

Choose wisely.

Photo by Miguel A. Amutio on Unsplash

psychology, systems thinking

Smart goals or better systems?


If you’re working toward some sort of collective goal, as an organization, network or even as an individual, you’ve most likely been asked to use SMART goal setting to frame your task. While SMART is a popular tool for management consultants and scholars, does it make sense when you’re looking to make inroads on complex, unique or highly volatile problems, or does the answer lie in the systems we create to advance goals in the first place?

Goal setting is nearly everywhere.

Globally, we had the UN-backed Millennium Development Goals and now have the Sustainable Development Goals; look at the missions and visions of most corporations, non-profits, government departments and universities and you will see language framed in terms of goals, either explicitly or implicitly.

A goal, for these purposes, is:

goal |ɡōl| noun: the object of a person’s ambition or effort; an aim or desired result; the destination of a journey

Goal setting is the process of determining what it is that you seek to achieve, usually combined with mapping out some form of strategy to achieve the goal. Goals can be challenging enough when a single person is determining what they want, need or feel compelled to do, and even more so when aggregated to the level of an organization or a network.

How do you keep people focused on the same thing?

A look at the literature finds a visible presence of one approach: setting SMART goals. SMART is an acronym that stands for Specific, Measurable, Attainable, Realistic, and Time-bound (or Timely in some examples). The origin of SMART has been traced back to an article by George Doran in a 1981 issue of the AMA’s journal Management Review (PDF). In that piece, Doran comments on how unpleasant it is to set objectives and suggests this is one of the reasons organizations resist doing it. Yet, in an age where accountability is held in high regard, the role of the goal is not only strategic but operationally critical to attracting and maintaining resources.

SMART goals are part of a larger process called performance management, which is a means of enhancing the collective focus and alignment of individuals within an organization. Dartmouth College has a clearly articulated explanation of how goals are framed within the context of performance management:

” Performance goals enable employees to plan and organize their work in accordance with achieving predetermined results or outcomes. By setting and completing effective performance goals, employees are better able to:

  • Develop job knowledge and skills that help them thrive in their work, take on additional responsibilities, or pursue their career aspirations;
  • Support or advance the organization’s vision, mission, values, principles, strategies, and goals;
  • Collaborate with their colleagues with greater transparency and mutual understanding;
  • Plan and implement successful projects and initiatives; and
  • Remain resilient when roadblocks arise and learn from these setbacks.”

Heading somewhere, destination unknown

Evaluation professionals and managers alike love SMART goals and performance measurement. What’s not to like about something that specifically outlines what is to be done in detail, the date it’s required by, and in a manner that is achievable? It’s like checking off all the boxes in your management performance chart at once! Alas, the problems with this approach are many.

Specific is pretty safe, so we won’t touch that. It’s good to know what you’re trying to achieve.

But what about measurable? This is what evaluators love, but what does it mean in practice? Metrics and measures reflect a certain type of evaluative approach and require the right kinds of questions, data collection tools and data to work effectively. If the problem being addressed isn’t something that lends itself to quantification using measures or data that can easily define part of an issue, then measurement becomes inconclusive at best, useless at worst.

What if you don’t know what is achievable? This might be because you’ve never tried something before or maybe the problem set has never existed before now.

How do you know what realistic is? This is tricky because, as George Bernard Shaw wrote in Man and Superman:

“The reasonable man adapts himself to the world: the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man.”

This issue of reasonableness is an important one because innovation, adaptation and discovery are not about reason, but about aspiration and hope. Were it all about reasonableness, we might never have achieved much of what we’ve set out to accomplish in terms of being innovative, adaptive or creative.

Reasonableness is also the most dangerous criterion for those seeking to make real change and do true innovation. Innovation is not often reasonable, nor are the asks ‘reasonable.’ Most social transformations did not come about because they were reasonable. Women’s right to vote or the rights of African Americans to be recognized and treated as human beings in the United States are but two examples from the 20th century that owe much to a lack of ‘reasonableness’.

Lastly, what if you have no idea what the timeline for success is? If you’ve not tackled this before, are working on a dynamic problem, or have uncertain or unstable resources it might be impossible to say how long something will take to solve.

Rethinking goals and their evaluation

One of the better discussions on goals, goal setting and the hard truths associated with what it means to pursue a goal comes from James Clear, who draws on some of the research on strategy and decision making to build his list of recommendations. Clear’s summary pulls together a variety of findings that show how individuals construct goals and seek to achieve them, and the results suggest that the problem lies less in the strategy used to reach a goal and more in the goals themselves.

What is most relevant for organizations is the concept of ‘rudders and oars‘, which is about creating systems and processes for action and focusing less on the goal itself. In complex systems, our ability to exercise control is highly variable and constrained, and goals provide an illusory sense that we have control. So either we fail to achieve our goals, or we set goals that we can achieve but that may not be the most important things to aim for. We essentially rig our system to achieve something that might be achievable, but utterly unimportant.

Drawing on this work, we are left to re-think goals and commit to the following:

  1. Commit to a process, not a goal
  2. Release the need for immediate results
  3. Build feedback loops
  4. Build better systems, not better goals

To realize this requires an adaptive approach to strategy and evaluation where the two go hand-in-hand and are used systemically. It means pushing aside and rejecting more traditional performance measurement models for individuals and organizations and developing more fine-tuned, custom evaluative approaches that link data to decisions and decisions to actions in an ongoing manner.
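A minimal sketch of what linking data to decisions and decisions to actions in an ongoing manner could look like follows below; the signals, the adjustment rule, and the number of cycles are all invented for illustration.

```python
# Rough sketch of an ongoing data -> decision -> action loop, in place of a fixed target.
# The signals, the adjustment rule, and the number of cycles are illustrative assumptions.

def adaptive_cycle(signals, adjust):
    """Each cycle: read the latest feedback, decide on an adjustment, record it, repeat."""
    history = []
    for cycle, signal in enumerate(signals, start=1):
        decision = adjust(signal)                    # data informs the decision...
        history.append((cycle, signal, decision))    # ...and the decision becomes part of the record
    return history

# Illustrative use: respond to whether engagement is falling or rising, not to a preset goal.
for cycle, signal, decision in adaptive_cycle(
    signals=["falling", "flat", "rising", "rising"],
    adjust=lambda s: "revise outreach" if s == "falling" else "stay the course",
):
    print(cycle, signal, decision)
```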

It means thinking in systems, about systems, and designing ways to do both on an ongoing, not episodic, basis.

The irony is that, by minimizing or rejecting the use of goals, you might better achieve the goal of a more impactful, innovative, responsive and creative organization with a real mission for positive change.

 

 

Image credit: Author

 

business, innovation, psychology, social systems, systems thinking

Fail Fast, Succeed Sooner(?)

 


Our series on paradox continues today by taking a look at the curious case of failure and how its popularity as a means to success represents not just a paradox, but a series of contradictions that might thwart the very innovation that embracing failure seeks to support.

Failure is everywhere. Today I noticed a major research university share a post on LinkedIn celebrating failure in the workplace. This follows a recent conversation with a colleague who was thrilled that she’d secured peer-reviewed funding to study failure. If it hadn’t done so before, failure has finally jumped the shark. With all due respect to my colleague, the university, and everyone who’s embraced failure, its use in common discourse has now reached a level it was never intended to reach and has perhaps done more to mask real solutions to problems than to provide them.

The more we celebrate failure, the more likely we are to get it.

I’ve written about the failure fetishism that is sweeping over the world of business, innovation and now education. You know the failure-and-innovation pairing has reached its peak when scholars are getting peer-reviewed funding to study it. This in itself represents a paradox on many levels when you consider that research is intended to support innovation, yet the process funders typically use to decide which innovative ideas to fund rests on evidence of how those ideas have fared before, as judged by peer review. Thus, you need evidence that an innovative idea is worthy in order to support the research that would generate the evidence for that idea.

If you are doing peer review appropriately, one could argue that you should never approve projects that are highly innovative, as there simply isn’t evidence to support them. Given that universities and science have the goal of advancing new knowledge, it’s hard to imagine a more perfect example of paradox.

Anxiety & failure

It’s interesting to review that post from 2011 — 5 years ago — in that much of the material seems as relevant and fresh today as it was back then. Citing a column in HBR by Daniel Isenberg, I highlighted a passage that resonated with me and what I was seeing in the discourse and use of failure in scholarship and innovation development:

Well-intentioned though they may be, these attempts to celebrate failure are misguided. Fear should not be confused with anxiety—and celebrating failure seems aimed at reducing anxiety.

Anxiety is defined as extreme unease, discomfort and stress about a situation, scenario or circumstance. While rates of clinical anxiety and mood disorders appear to be quite prevalent at over 11% of the adult population in Canada, the general mood of the public as expressed in the media, social media, and coffee shop conversations suggests this might be the tip of an iceberg of yet indeterminable size. Some have branded this the Age of Anxiety, drawing on the mid-20th-century poem (pdf) of the same name by W.H. Auden (suggesting our worry about worry isn’t new).

However, as digital marketing strategist and author Mitch Joel writes, digital technologies lend themselves to their own anxiety among citizens, business owners, marketers and communications professionals alike. As Joel and many others have advocated: we might need to unplug to better connect.

IBM has conducted its global C-Suite studies for years and has found that terms like collaboration, partnership, and social all emerged from the interviews and surveys across the world as priorities for business moving forward. All of these involve non-specific measures of success. Unlike profit (which is still a top-line item, even if not always spoken), the metrics of success in any of those areas are not clear and success is poorly defined. Ambiguity in the measures of your success and the uncertainty surrounding pathways to success is a recipe for anxiety.

If you don’t know what your criteria for success are, or what is expected of you, your ability to fail is low. But what often happens is that metrics get introduced into program evaluations and research almost arbitrarily, because we are carrying what worked before in one context into a new context. Suddenly we have inappropriate measures and metrics meeting uncertainty meeting anxiety, and failure becomes a big deal. Of course people are failing, but that doesn’t necessarily help the bigger picture.

The innovation problem

Innovation is something that can be enabled, but often not well managed, and the distinction is important. The former is more organic, complex and unpredictable, while the latter implies a degree of control. The less control we have, the more anxiety we are likely to feel. But innovation is not just a sexy word; it’s about critically adapting to new conditions and new circumstances.

This Thursday in London, my friend and colleague John Wenger is leading a workshop on how to deal with Brexit for those feeling confused, upset, angry, or isolated because of the decision made by referendum this year. Through the use of sociodrama, dialogue and discussion, John helps people connect with their feelings and thoughts in novel settings and contexts to help them ground what they don’t know in what they do. That is innovation lived out in real time. This workshop isn’t technological, might not be easily commercialized, and won’t ‘scale’ enough to secure massive investments of venture capital, but it is a process that is, at its heart, about innovation: new thinking realized in practice through design to produce value.

If those participants go off and have more compassionate conversations with each other, their neighbours and themselves as a result, we will truly see social innovation.

Participants in processes like this are designing their lives and new ways of thinking and relating to each other, even if the process, memories and material might be quite old and established. The assumption that innovation must somehow be a (high-)technological or world-reaching ‘thing’ is what limits our sense of what’s really possible and produces considerable failure. Failure here would be a failure to learn and attend to what is happening, not a failure to experience hurt, shame, joy, confusion, or community.

Yet, if one were to adopt the rhetoric of failure in this case, we might actually produce the very kind of failure that, ironically, we are trying to avoid. Anchoring our metrics and focus on what constitutes ‘failure‘ (a concept rooted in some definition of success) leads us away from the complicated, tricky questions about what it means to innovate and adapt. It also draws our attention away from problems of systems and toward problems of individuals.

Failed systems, not failed individuals

When individuals fail to reach an inappropriate target, it’s not a problem with them as individuals, but with the system itself. Celebrating that failure might reduce some of the stigma associated with this ‘failure’, but it doesn’t address a larger set of problems.

While our interventions may be aimed at individuals, it is the systems in which individuals, groups and organizations are rooted that contribute to a great deal of the issues we face. This is why innovation requires platforms to be successful at a larger scale: they create new systems and ecosystems that allow innovations to anchor to other changes, which strengthens their power for change. If we were to look solely at individuals, divorced from context and the community/society in which they live, events like Brexit cannot make sense no matter how you look at them (whether voting for or against).

Platforms and ecosystems do not so much fail or succeed as support the necessary change, and they do so far more than idolizing the fact that we’ve not succeeded in achieving the wrong thing, which is, more and more, what failure talk is all about.

To borrow the phrase from design thinking: we may fail fast, but we will not succeed sooner (or ever) if we continue to fail at the wrong thing.

Photo credit: Fail by Denise Krebs used under Creative Commons License. Thanks for sharing your art Denise!

 

 

complexity, innovation

Of tails, dogs and the wagging of both

Who’s wagging whom?

Evaluation is supposed to be driven by a program’s needs and activities, but that isn’t always the case. What happens when the need for numbers, metrics, ‘outcomes’ and data shapes the very activities programs undertake, and how that changes everything, is worth paying attention to.

Since the Second World War we’ve seen a gradual shift towards what has been called neo-liberalism, with its values spreading across social institutions, companies, government and society. This approach to the world is characterized, among other things, by its focus on personal and economic efficiency, freedom, and policies that support actions encouraging both. At certain levels of analysis, these policies have rather obvious benefits.

Who wouldn’t like to have more choice, more freedom, more perceived control and derive more value from their products, services and outputs? Not many I suspect. Certainly not me.

Yet, when these practices move to different levels and systems they start to produce enormous complications that are at odds with — and produce distortions of — the very values that they espouse. We’ve seen the same happen with other value systems that have produced social situations that are highly beneficial in some contexts and oppressive and toxic in others – capitalism and socialism both fit this bill.

Invisible tails and wags

What makes ‘isms’ so powerful is that they can become so prevalent that their purpose, value and opportunity stop being questioned at all. It is here that the tail starts to wag the dog.

Take our economy (or THE economy, as it is sometimes referred to). An economy is intended to be a facilitator and product of activities used to create certain types of value in a society. We work and produce goods (or ideas), exchange and trade them for different things, and these allow us to fulfill certain human goals. It can take various shapes, be regulated more or less, and can operate at multiple scales, but it is a human construction: we invented it. Sometimes this gets forgotten, and when we use the economy to justify behaviour we forget that it is our behaviour that is the economy.

We see over and again with neoliberalism (among the most dominant societal ‘isms’ of the past 50 years in the West, and increasingly reflected globally) that, taken at the broadest level, the economy becomes the central feature of our social systems rather than a byproduct of what we do as social beings. Thus, things like goods, experiences and relations that we used to consider as having some kind of inherent value are suddenly transformed into objects about which economic judgements can be made.

The role of systems

This can make sense where there are purpose-driven reasons to assign particular value scores to something, but the nature of value is tied to the systems that surround what is valued. If we are dealing with simple systems, those where there are clear cause-and-effect connections between the product or service under scrutiny and its ability to achieve its purpose, then valuation measurement makes sense. We can assert that X brand of laundry detergent is better than Y on the basis of Z. We can conduct experiments, trials and repeated measures that can compare across conditions.

It is also safe to make an assumption of value based on the product’s purpose that can be generalized. In other words, our reason for using the product is clear and relatively unambiguous (e.g., to clean clothes, using the above example). There may be additional reasons for choosing brand X over Y, but most of those reasons can also be controlled for and understood discretely (e.g., scent, price, size, bottle shape, etc.).

This kind of thinking breaks down in complex systems. And to make it even more complex, it breaks down imperfectly, so we have simple systems interwoven within complex ones. We have humans using simple products and services that operate in new, innovative and complex conditions. Unfortunately, what comes with simple systems is simple thinking. Because they are, by their nature, simple, these system dynamics are easy to understand. Returning to our example of the economy, consider the classical micro-economic model of supply and demand illustrated below.

Relationships and the systems that surround them

[Figure: classical supply and demand curves]

Using this model, we can do a reasonable job of predicting influence, ascertaining value and hypothesizing relationships between both.
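To illustrate just how tractable a simple system is, here is a worked sketch of a linear supply-and-demand model in Python; the curves and coefficients are invented purely for illustration.

```python
# Worked sketch of a linear supply-and-demand model (coefficients invented for illustration).
# Demand: Qd = a - b*P   Supply: Qs = c + d*P   Equilibrium where Qd == Qs.

def equilibrium(a, b, c, d):
    """Solve for the price and quantity at which demand equals supply."""
    price = (a - c) / (b + d)
    quantity = a - b * price
    return price, quantity

# Example: demand Qd = 100 - 2P, supply Qs = 10 + P
p, q = equilibrium(a=100, b=2, c=10, d=1)
print(f"Equilibrium price: {p:.2f}, quantity: {q:.2f}")  # -> price 30.00, quantity 40.00
```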

In complex systems, the value links are often in flux, dynamic, and relative, requiring a form of adaptive evaluation like developmental evaluation. But that doesn’t happen as much as it should, mostly because of a failure to question the systems and their influence. Without questioning the values and value that systems create (the isms mentioned earlier) and their supposed connection to outcomes, we risk measuring things that have no clear connection to value and, worse, we create systems that get designed around these ineffective measures.

This manifests itself in mindless bureaucracy, useless meetings, pompous and unintelligible titles, and innovation-squashing regulations that become divorced from the purposes they are meant to serve. In doing so, it undermines the potential benefits of the original purpose of bureaucracy (to document and create an organizational memory to guide decisions), meetings (to discuss and share ideas and solve problems), titles (to denote role and responsibility, although these aren’t nearly as useful as people think in the modern organization), and regulations (to provide a systems lens that constrains uncoordinated individual actions from creating systems problems like the Tragedy of the Commons).

More importantly, this line of thinking also focuses us on measuring the things that don’t count. And as often quoted and misquoted, the phrase that is apt is:

Not everything that counts can be counted, and not everything that can be counted counts.

Counting what counts

It is critical to be mindful of the purpose — or to reconnect, rediscover, reinvent and reflect upon the purposes we create lest we allow our work to be driven by isms. Evaluators and their program clients and partners need to stand back and ask themselves: What is the purpose of this system I am dealing with?

What do we measure and is that important enough to matter? 

Perhaps the most useful way of thinking about this is to ask yourself: what is this system being hired to do? 

Regular mindful check-ins as part of reflective practice at the individual, organizational and, where possible, systems level are a way to remind ourselves to check our values and practices and align and realign them with our goals. Just as a car’s wheels go out of alignment every so often and need re-balancing, so too do our systems.

In engaging in reflective practice and contemplating what we measure and what we mean by it we can better determine what part of what we do is the dog, what is the tail and what is being wagged and by whom.

Photo credit: Wagging tail by Quinn Dombrowski used under Creative Commons License via Flickr. Thanks Quinn for making your great work available to the world.

Economic model image credit from Resources for Teachers used under Creative Commons License. Check out their great stuff for helping teachers teach better.

complexity, design thinking, emergence, evaluation, innovation

Evaluation and Design For Changing Conditions

Growth and Development

The days of creating programs, products and services and setting them loose on the world are coming to a close, posing challenges to the models we use for design and evaluation. Adding the term ‘developmental’ to both of these concepts, with an accompanying shift in mindset, can provide options for moving forward in these times of great complexity.

We’re at the tail end of a revolution in product and service design that has generated some remarkable benefits for society (and its share of problems), creating the very objects that often define our work (e.g., computers). However, we are in an age of interconnectedness and ever-expanding complexity. Our disciplinary structures are modifying themselves, and “wicked problems” are less rare.

Developmental Thinking

At the root of the problem is the concept of developmental thought. A critical mistake in comparative analysis, whether through data or rhetoric, is viewing static things and moving things through the same lens. Take for example a tree and a table. Both are made of wood (maybe the same type of wood), yet their developmental trajectories are enormously different.

Wood > Tree

Wood > Table

Tables are relatively static. They may get scratched, painted, re-finished, or modified slightly, but their inherent form, structure and content are likely to remain constant over time. The tree is also made of wood, but will grow larger, may lose branches and gain others; it will interact with the environment, providing homes for animals, hiding spaces or swings for small children; bear fruit (or pollen); change leaves; grow around things, yet also maintain enough structural integrity that a person could come back after 10 years and recognize that the tree looks similar.

It changes and it interacts with its environment. If it is a banyan tree or an oak, this interaction might take place very slowly, however if it is bamboo that same interaction might take place over a shorter time frame.

If you were to take the antique table shown above, take its measurements and record its qualities, and come back 20 years later, you would likely see an object that looks remarkably similar to the one you left. The time of the initial observation is minimally relevant to when the second observation is made. The manner in which the table was used will have some effect on these observations, but, to a degree, the fundamental look and structure are likely to remain consistent.

However, if we were to do the same with the tree, things could look wildly different. If the tree was a sapling, coming back 20 years later might reveal an object two, three or four times larger. If the tree was 120 years old, the differences might be minimal. Its species, growing conditions and context matter a great deal.

Design for Development / Developmental Design

In social systems, and particularly ones operating with great complexity, models of creating programs, policies and products that are simply released into the world like a table are becoming anachronistic. Tables work for simple tasks and sometimes complicated ones, but not complex ones (at least, not consistently). It is in those areas that we need to consider the tree as a more appropriate model. However, in human systems these “trees” are designed: we create the social world, the policies, the programs and the products, so design thinking is relevant and appropriate for those seeking to influence our world.

Yet, we need to go even further. Designing tables means creating a product and setting it loose. Designing for trees means constantly adapting and changing along the way. It is what I call developmental design. Tim Brown, the CEO of IDEO and one of the leading proponents of design thinking, has started to consider the role of design and complexity as well. Writing in the current issue of Rotman Magazine, Brown argues that designers should consider adapting their practice towards complexity. He poses six challenges:

  1. We should give up on the idea of designing objects and think instead about designing behaviours;
  2. We need to think more about how information flows;
  3. We must recognize that faster evolution is based on faster iteration;
  4. We must embrace selective emergence;
  5. We need to focus on fitness;
  6. We must accept the fact that design is never done.
That last point is what I argue is the critical feature of developmental design. To draw on another analogy, it is about tending gardens rather than building tables.

Developmental Evaluation

Brown also mentions information flows and emergence. Complex adaptive systems are the way they are because of the diversity and interaction of information. They are dynamic and evolving and thrive on feedback. Feedback can be random or structured and it is the opportunity and challenge of evaluators to provide the means of collecting and organizing this feedback to channel it to support strategic learning about the benefits, challenges, and unexpected consequences of our designs. Developmental evaluation is a method by which we do this.
Developmental evaluators work with their program teams to advise, co-create, and sense-make around the data generated from program activities. Ideally, a developmental evaluator is engaged with program implementation teams throughout the process. This is a different form of evaluation that builds on Michael Quinn Patton’s Utilization-Focused Evaluation (PDF) methods and can incorporate much of the work of action research and participatory evaluation and research models as well, depending on the circumstance.

Bringing Design and Evaluation Together

To design developmentally and with complexity in mind, we need feedback systems in place. This is where developmental design and evaluation come together. If you are working in social innovation, attention to changing conditions, adaptation, building resilience and (most likely) the need to show impact will be familiar to you. Developmental design + developmental evaluation, which I argue are two sides of the same coin, are ways to conceive of the creation, implementation, evaluation, adaptation and evolution of initiatives working in complex environments.
This is not without challenge. Designers are not trained much in evaluation. Few evaluators have experience in design. Both areas are familiarizing themselves with complexity, but the level and depth of the knowledge base is still shallow (though growing). Efforts like those put forth by the Social Innovation Generation initiative and the Tamarack Institute for Community Engagement in Canada are good examples of places to start. Books like Getting to Maybe, M.Q. Patton’s Developmental Evaluation, and Tim Brown’s Change by Design are also primers for moving along.
However, these are starting points, and if we are serious about addressing the social, political, health and environmental challenges posed to us in this age of global complexity, we need to launch from them into something more sophisticated that brings these areas further together. The cross-training of designers and evaluators and innovators of all stripes is a next step. So, too, is building the scholarship and research base for this emergent field of inquiry and practice. Better theories, evidence and examples will make it easier for all of us to lift the many boats needed to traverse these seas.
It is my hope to contribute to some of that further movement, and I welcome your thoughts on ways to build developmental thinking into social innovation and social and health service work.

Image (Header) Growth by Rougeux

Image (Tree) Arbre en fleur by zigazou76

Image (Table) Table à ouvrage art nouveau (Musée des Beaux-Arts de Lyon) by dalbera

All used under licence.