Invited Experts on Performance Question

Geoff Dancy, Ph.D., Assistant Professor of Political Science, Tulane University

Evaluating ICC Performance: Design is Critical. The ICC should carefully apply social science methodology when devising performance indicators. Among other things, it needs to maintain a critical distinction between performance evaluation and impact assessment.

Summary

Commentary on the International Criminal Court is heavy with evaluation but light on method. Observers make harsh pronouncements about the Court’s cost, its pace, its conviction rate, and its bias.1 Rarely, though, are gripes about the ICC accompanied by a measured discussion of what we should expect.

The ICC is the first of its kind. That means there are no baselines for comparison. There is also no protocol, or set of best practices, for how to review its work. Yet the criticisms mount, creating a pressing need for internal performance review.

In December 2014, the Assembly of States Parties requested that the Court “intensify its efforts to develop qualitative and quantitative indicators that would allow the Court to demonstrate better its achievements and needs.”2 This move could be a sign that the Court’s honeymoon period (if it ever existed) is over, and it’s now time to start counting beans. The request could also be a way to push the Court, subtly, to demonstrate its worth in the face of political pressure.

The Court has obliged. Pursuant to the ASP’s request, the ICC is working with the Open Society Justice Initiative to develop indicators of Court performance. The pilot work is detailed in two 2016 reports. The Second Report,3 a twenty-page document on the development of indicators, is helpful in two ways: it provides insight into the Court’s thinking about how it should be evaluated, and it appends a fifty-page annex with raw data on the last three years of Court activity.

Clear from this report, and from the question that Head Prosecutor Fatou Bensouda posed to the ICC Forum, is that many methodological questions remain unanswered. How should the Court quantify fairness? How can it possibly evaluate whether its own leadership is effective? Can contextual factors be captured by quantitative measures or benchmarks?

Luckily, cutting-edge social science research offers answers to these kinds of questions. The answers always start in the same place: design of inquiry.

Argument

Methodical research design is not glamorous but is absolutely crucial—and social scientists watch from the sidelines as practitioners routinely mess it up. At its most fundamental level, research design is about matching the questions you want to ask to the methods you will use to answer those questions. It also requires thinking about the audience for your work. Most importantly, design means planning before you waste time collecting data or creating indicators.

How should the ICC do this work? I make four suggestions.

Suggestion One: Maintain a Distinction Between Performance Evaluation and Impact Assessment

The first step in designing an examination of ICC operations is to separate “performance evaluation” from “impact assessment.” Though these terms sound vaguely similar—or like corporate jargon plastered on a PowerPoint slide—there is a real and meaningful difference between the two concepts.

Evaluation determines “the worth of a thing,”4 which involves weighing “strengths and weaknesses of programs, policies, personnel, productions, and organizations to improve their effectiveness.”5 This stands in contrast to impact assessment, or the “tracking of change as it relates to an identifiable source or set of factors.”6 Where the first is about applying standards to facts, the second is about determining the causal power of an intervention.

To appreciate the difference, consider schools. Evaluating school performance means ensuring teachers are hired and compensated properly, students are treated fairly, the curriculum is chosen carefully, parents’ concerns are addressed, grades are awarded following equally applied procedures, and students’ standardized test scores meet certain thresholds. Closely monitoring a school’s performance in these areas is different from assessing the social impact of a school. Assessing social impact requires one to look outward: Do schools improve the neighborhoods in which they reside? Are schools helping students maintain gainful employment? Do school expulsion policies exacerbate incarceration? Evaluation is about the quality of internal performance; impact assessment is about external effects.

Embedded in the word evaluation is value. Evaluating the performance of a person or an organization means making judgments based on normative criteria about what is good. We do this all the time: humans are steady-state evaluators. That food server is good, that garbage truck is moving too slow, that government office is inefficient, and so on.

Impact assessment requires much more meticulous observation about what causes what. We know smoking leads to cancer based on painstaking medical research starting in the 1950s that established statistical correlations between addictive smoking and the incidence of lung cancer. Decades later, clinical studies established linkages between the inhalation of smoke and biological mutations in the body. We can be sure of the impact of smoking because scientists discovered correlations and then came to understand the biological mechanisms that produce this correlation. These two types of causal inference—statistical correlations and the identification of mechanisms—are essential to impact assessment.

When examining the ICC, observers exhibit two tendencies. The first is to conduct performance evaluation of the Court based on unspecified criteria. Trials are too long, or they are too costly, or the rights of the defendant are inadequate. However, criminal courts in the most developed countries in the world may be subject to the same criticisms. What is a good benchmark length for an investigation or an appeals process? What is the ideal expenditure-to-conviction ratio? Evaluating without stated baselines leaves analysis unanchored; and, many times, unmoored from reality.

The second tendency among observers is to blur the lines between performance evaluation and causal impact assessment or to evaluate based on impact. This amounts to thick consequentialism: the ICC is good if and only if it can stop violence on the ground or, at the very least, do no harm in a situation of interest. This argument is powerful because it simplifies complex interventions; nevertheless, the consequentialist standard is probably unhelpful for evaluation. No organization or set of interventions can produce an unblemished record of successes. No criminal court can deter violence in every instance.

This is why evaluation and impact assessment should remain distinct. An organization can perform well based on internal standards but have no impact on its external environment. Or it can perform poorly and have huge impacts. These are separate issues.

A 2015 Open Society Justice Initiative consultancy report on ICC performance does not recognize the distinction, urging the Court to create “impact indicators” in order to explore, among other things, “the Court’s legacy in the countries where it operates and beyond, including its deterrent effect.”7 The ICC seems not to have heeded this advice, which is for the better. In its two reports on performance, the Court emphasizes four criteria by which to evaluate its operations:

  1. that the proceedings are expeditious, fair, and transparent;
  2. that its leadership and management are effective;
  3. that it ensures adequate security for its work; and
  4. that victims have adequate access to the Court.

These criteria direct the Court inward, restricting it to evaluating operational conduct, and preclude judgment based strictly on the consequences of ICC interventions. This is good.

However, this list of goals, along with the Court’s work on indicators so far, combines a number of overlapping evaluative criteria. Some lend themselves to quantification and some do not. For instance, the first goal combines a concern for expeditiousness, which could be measured temporally, with a concern for fairness, which cannot be measured directly. This creates two potential problems. First, it invites arbitrary quantitative benchmarking; and, second, it introduces a number of overlapping and complex evaluative standards that must be meticulously parsed.

Suggestion Two: Carefully Design a Few Key Benchmarks

First, the ICC’s listed goals embody four areas of focus: trial proceedings, leadership, security, and victims. The Second Report presents fifty pages of tabled data.

The sheer number of indicators in these tables may impress a data scientist, but it is probably dizzying to most people. Under a section labeled “fairness and expeditiousness of proceedings,” the report includes summary information on seven different phases of every ongoing proceeding at the ICC: confirmation, trial preparation, trial, judgment, sentencing, reparations, and final appeal. Under the confirmation stage alone, there are ten indicators; and, if each phase is included, the total number of indicators comes to sixty-three. This is a lot, and it is not obvious why some of these indicators are useful for evaluation.

However, we should resist the urge to be too critical at this stage. The Court is still assembling all of this raw data in one place and workshopping ideas about how to aggregate this data into more substantial indicators of performance.8

In this process, those staffers tasked with producing indicators should keep a few elements of design in mind. First, less is more. Among methodologists, there is something known as Goodhart’s Law or the “tendency of a measure to become a target.”9 Indicators are a powerful technology.10 When we produce measures of performance, those measures come to shape the way tasks are performed. For instance, if students are admitted to university on the basis of the SAT, schools will start to orient their secondary education around improving SAT scores.

This same process could occur at the ICC. If we produce a battery of indicators about the speed of trial proceedings, and those indicators are leveraged as performance benchmarks, then it will create a strong incentive to hasten proceedings at every stage to improve trial expeditiousness scores. In this instance, speedier trials may be a desired outcome; but the more benchmarks that are produced, the higher the risk that doing the job of prosecuting war criminals will be reduced to checking boxes on a form. Though it should continue to collect as much data as possible, the ICC staff should construct and publicize only a few key quantitative indicators of performance. Doing so will preserve its agency and protect it from the mire of audit culture.

Second, in constructing a crucial few indicators, plan carefully. Start by drawing diagrams that break big concepts into component parts and link those parts to observable Court activities. This is called operationalizing a concept. Take, for example, the notion of effective leadership and management. What are the components of this concept? Based on the presentation of data in the 2016 Court report, the components appear to be: budget implementation, human resources, and staff diversity represented by geographic and gender balance. Within the geographical balance component category, many countries are listed as “under-represented” in staff.

There are two problems of operationalization in this construct. The first is that geographical balance is not intuitively related to effective leadership. Geographical balance, in my mind, is a component of representativeness, not effectiveness. An argument could be made that these ideas are cousins; but, in current form, there is a mismatch between the parent concept and one of its components. The second issue is that the method for determining that certain countries are under-represented on staff remains unspecified. No clear link is drawn between the conceptual component (geographical balance) and the measure of its level of representation. A diagram outlining the logic of conceptualization and indicator choice, of the type often used in studies of democracy,11 could easily address these issues.
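One way to make such a diagram concrete is to write the concept tree down as data and check it mechanically. The sketch below is only an illustration: the component names follow the 2016 report as described above, but the indicator labels and the `unlinked_components` helper are hypothetical and are not drawn from the Court's annex.

```python
# Hypothetical concept tree: parent concept -> components -> observable indicators.
# Component names follow the text; indicator labels are illustrative placeholders.
concept_tree = {
    "effective leadership and management": {
        "budget implementation": ["share of approved budget spent"],
        "human resources": ["staff vacancy rate"],
        "geographical balance": [],  # no measurement rule has been specified
        "gender balance": ["share of women among professional staff"],
    }
}


def unlinked_components(tree):
    """Return (concept, component) pairs that have no indicator attached."""
    return [
        (concept, component)
        for concept, components in tree.items()
        for component, indicators in components.items()
        if not indicators
    ]


# Lists every component that still lacks a link to an observable measure.
print(unlinked_components(concept_tree))
```

A check like this does not settle whether a component belongs under its parent concept; that remains a judgment call. It only flags components that have been named without a stated rule for measuring them.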

Third, during the design process, don’t favor indicators just because they are easily quantified. One reason the Court may have chosen to present indicators of the staff’s geographical composition is that it’s easily measured and expressed in numbers. There are fourteen staff members from Italy. Showing this is much easier than trying to measure a concept as large as effective leadership. Even more difficult is measuring other outlined evaluative criteria like victim access or fairness of proceedings. For instance, in her question to the ICC Forum, Head Prosecutor Bensouda worries that the “subjectivity” of fairness “makes it an inherently difficult value to measure.”

This is a reasonable concern. But the inherent difficulty of measuring concepts like fairness or access does not mean that it cannot be done. Social scientists possess reliable indicators of highly complex phenomena, like state repression,12 democracy,13 and judicial independence.14 These are all built atop hundreds of subjective coding decisions, as are other regularly referenced measures of fairness. Electoral fairness, for instance, depends on the judgment of election monitors and monitoring agencies.15 How does one create usable indicators of such complex concepts?

The answer is the final point about design: when it comes to complex concepts or conceptual components, be inductive. To again reference the question written to the ICC Forum, Head Prosecutor Bensouda states, “Before fairness can be measured, there must be a shared understanding of what it means.” I understand the logic behind this statement, which reflects an impulse toward deductive reasoning: first we define and then we measure. However, it is not technically true that we must define something before we can measure it. Some interpretive concepts are innately unsuitable for top-down measurement. Just like Plato’s interlocutors in The Republic could not define justice, we probably can’t arrive at a universal definition of concepts like fairness.

What we can do is ask people (participants, staff, affected communities) whether they think certain proceedings are fair. This is an inductive process. It is possible to construct performance indicators by modeling people’s responses to survey questions alongside other information like expert evaluations. For example, Bayesian statistical models can estimate underlying or latent traits in a population based on various sources of available data. Used by the most well-designed data projects in the world,16 Bayesian models do not assume a proper or universal definition of various concepts. Instead, they take thousands of individually recorded judgments and use them to generate estimates. In some ways, performing Bayesian statistics is the mathematical equivalent of analyzing connotation.
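To give a sense of the inductive logic, here is a minimal sketch in Python. It is a deliberately simplified stand-in: a Beta-Binomial update on yes/no answers, not the multi-source latent-variable models used by the projects cited above. The survey counts, group labels, and the `fairness_posterior` function are all hypothetical.

```python
import random


def fairness_posterior(fair_votes, total_votes, prior_a=1.0, prior_b=1.0,
                       draws=10000, seed=0):
    """Beta-Binomial update for the share of respondents who judge proceedings fair.

    Returns the posterior mean and an approximate 95% credible interval
    obtained by sampling from the posterior Beta distribution.
    """
    a = prior_a + fair_votes
    b = prior_b + (total_votes - fair_votes)
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    mean = a / (a + b)
    low = samples[int(0.025 * draws)]
    high = samples[int(0.975 * draws)]
    return mean, (low, high)


# Hypothetical counts: (respondents answering "fair", total respondents).
responses = {
    "defense counsel": (12, 30),
    "judges and staff": (40, 50),
    "outside experts": (18, 25),
}

for group, (fair, total) in responses.items():
    mean, (low, high) = fairness_posterior(fair, total)
    print(f"{group}: posterior mean {mean:.2f}, 95% interval {low:.2f} to {high:.2f}")
```

Nothing in this sketch defines fairness. The estimate comes entirely from recorded judgments, and the interval widens when a group has few respondents, which is the kind of uncertainty a published indicator should carry with it.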

At relatively low cost, the Court could seek out high-level statisticians to conduct surveys and build, from the bottom up, indicators of complex concepts like trial fairness or transparency. This could be part of a much-needed larger strategy of “reckoning with complexity” in international criminal law.17 But to do so, the Court will first have to address a third issue of design: who is the audience?

Suggestion Three: Know Your Audience

Among the four goals outlined by the Court lurk six big evaluative criteria: trial expeditiousness, trial fairness, transparency of proceedings, leadership effectiveness, security, and victim access. If one is to assess perceptions of how fair or accessible the ICC is, one must first ask: fair or accessible to whom?

This is where evaluation gets political. Who is the Court’s master? For whom is this performance evaluation meant? Whose perceptions of fairness should count?

As I see it, the ICC has five important audiences:

  1. The Assembly of States Parties
  2. Outside jurists and experts
  3. Judges, staff, and counsel who have direct experience with the Court
  4. Compliance partners, who are actors within states that have the power to promote international criminal accountability18
  5. Victims

If the ICC intends to build indicators based on a combination of assembled data and stakeholder feedback, as I have suggested, it would be helpful to match evaluative aims to the audience of interest. For example, it strikes me that those most qualified to answer questions about expeditiousness and effective leadership are Groups 2 and 3. The same goes for trial fairness. It would be most enlightening to know how defense counsel perceives Court proceedings, in comparison to judges, staff, and observers. Groups 3 and 4 are probably in the best position to answer survey questions about operational security. These evaluations would be easy to implement because they require only that the ICC survey its own employees, or those with whom it regularly interacts.

The promise of survey-based responses is greatest in relation to Groups 4 and 5, which include compliance partners and victims. While knowing what staff and outside experts think of victim access could certainly yield interesting results, the crucial evaluations should come from those people in situation countries directly affected by ICC interventions.

There are two ways to meaningfully access the affected population. The first is to interview or survey victim participants or local ICC partners. Good work of this type is already underway. A study by the Human Rights Center at the Berkeley School of Law was based on interviews with 622 victims registered with the ICC. Among other things, it found that victims want more contact with the Court and that they possess “insufficient knowledge to make informed decisions about their participation in ICC cases.”19 Another study, performed by the ICTY in coordination with independent experts at the University of North Texas, asked tribunal witnesses to evaluate their experience testifying as well as their perceptions of ICTY effectiveness, administration of justice, and fairness. The results are surprisingly positive, with a majority of witnesses reporting that they think the ICTY has done a “good job.”20 Because these two studies are directly relevant to evaluative criteria being considered by the ICC, the Court might do well to borrow from their approach.

It will also be necessary to conduct random surveys of the wider population, especially in those areas being examined or investigated. Based on its listed evaluative criteria, the ICC seems particularly concerned about victim access. This means addressing the following question: “Does the population of victims in a situation country have sufficient opportunity to engage with the Court?” Based on a sample of already-registered victims, we cannot know how many other victims were denied access or were generally unaware of the ICC’s involvement. Only by randomly drawing samples of the population at large can we determine how many people in the wider population were victimized but did not access the Court. While more difficult, this kind of work is certainly possible. For example, The Hague Institute for Global Justice conducted a random survey in four regions most devastated by 2007–2008 election violence in Kenya. The evidence shows that around half of the respondents witnessed or were victimized by violence; members of that half were much more favorable to the Court than the average Kenyan respondent.21
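The logic of drawing from the population at large, rather than from registered victims, can be sketched as follows. Everything here is assumed for illustration: the population is synthetic, the field names (`victim`, `accessed_court`) and the function `estimate_unreached_share` are hypothetical, and a real survey would need a proper sampling frame, weighting, and design-based standard errors.

```python
import math
import random


def estimate_unreached_share(population, sample_size, seed=0):
    """Estimate the share of victims who never accessed the Court.

    Draws a simple random sample from the whole population, keeps the
    respondents who report victimization, and returns the estimated share
    without Court access, a 95% margin of error (normal approximation),
    and the number of victims found in the sample.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    victims = [person for person in sample if person["victim"]]
    if not victims:
        return None
    unreached = sum(1 for person in victims if not person["accessed_court"])
    share = unreached / len(victims)
    margin = 1.96 * math.sqrt(share * (1 - share) / len(victims))
    return share, margin, len(victims)


# Synthetic population: half are victims, and one victim in ten reached the Court.
population = [
    {"victim": i % 2 == 0, "accessed_court": i % 20 == 0}
    for i in range(10000)
]

result = estimate_unreached_share(population, sample_size=1000)
if result is not None:
    share, margin, n_victims = result
    print(f"victims in sample: {n_victims}")
    print(f"estimated share without access: {share:.2f} (+/- {margin:.2f})")
```

A sample drawn only from registered victims would, by construction, put the share without access at zero. That is why the sampling frame has to be the wider population.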

While survey research is costly, academic institutions and research partners can shoulder some of the financial burden. Moreover, survey-based evaluations will be an increasingly valuable investment over time for three reasons. First, as mentioned before, they are more flexible because they do not rely on top-down definitions of fuzzy concepts like fairness or effectiveness. Thus, survey evaluations can be deployed relatively quickly without waiting for groups of people to agree on definitions. Second, once designed, surveys can be re-used to obtain data from anonymous respondents in various audiences across contexts. One could use the same survey in many different countries. Third, surveys help produce comparative benchmarks. If the ICC wants to create performance indicators that can serve as a guideline to future practice, it is absolutely necessary to establish comparable baselines. What is an appropriate fairness score? Is trial fairness improving? These questions can only be answered if there is more than one data point produced by administering the same data-generating instrument (a repeated survey) at different points in time.
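As a rough sketch of what "more than one data point" buys, the following compares the share of respondents answering "fair" in two waves of the same survey. The counts are invented and the function name `change_between_waves` is hypothetical; the standard error assumes two independent simple random samples.

```python
import math


def change_between_waves(fair_1, n_1, fair_2, n_2):
    """Change in the share answering "fair" between two waves, with its standard error."""
    p1 = fair_1 / n_1
    p2 = fair_2 / n_2
    diff = p2 - p1
    se = math.sqrt(p1 * (1 - p1) / n_1 + p2 * (1 - p2) / n_2)
    return diff, se


# Invented counts for two waves of the same instrument.
diff, se = change_between_waves(fair_1=210, n_1=500, fair_2=260, n_2=500)
print(f"change in fairness share: {diff:+.3f} (standard error {se:.3f})")
```

A single wave yields a number with nothing to compare it to. The second wave is what turns that number into a baseline.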

Some people may be inclined to interrupt here. If the evaluative ideal I’m outlining were reached, the ICC would possess rigorously designed performance indicators that draw on feedback from important audiences. It might also convert these indicators into benchmarks against which it continually evaluates its own performance over time. However, a skeptic could claim this all amounts to little more than navel-gazing. Even if the ICC performs its work well, it does not mean that it has a positive impact on conflict-affected countries or on global politics as a whole.

According to the Second Report, civil society urged the ICC to “give serious attention to the development of indicators that measure and facilitate improvement in achieving a broader sense of impact in situation countries on the ground.”22 Shouldn’t the Court really focus its energies on assessing its broader impact on the deterrence of atrocity, on reconciliation, or on peace?

Suggestion Four: Assist, But Do Not Conduct Impact Assessment

My answer is: No. The ICC should not perform impact assessments, which should be kept separate from performance evaluation. Furthermore, it is good that the Court has so far approached this issue with caution, promising to consider impact assessment in the future but ultimately not moving on it. Why?

First, expecting a justice institution to assess its own impact is abnormal. The US Supreme Court does not publish research on how its decisions affect society. The Department of Justice conducts audits of DOJ operations to root out misconduct, and it also publishes statistical reports through the Bureau of Justice Statistics; but it is not tasked with assessing the larger impact of its operations on the deterrence of crime or recidivism. This is a good model for the ICC. Evaluate performance and furnish scholars and observers with statistics, but do not perform causal studies.

Second, impact assessments conducted by the ICC would likely be biased toward positive results. That is not to challenge the integrity of ICC leadership or staff; it is only to recognize that, especially when funding or support is at stake, it is nearly impossible to remain objective.

And third, impact assessment is very hard and requires expert training in causal inference, time, and investment. That kind of work should be left to social scientists.

Academic researchers, using both advanced qualitative and quantitative methods, are already producing a wealth of new ICC impact studies. These can be split into three types: those that focus on the legal effects of ICC interventions, those that focus on the Court’s deterrence of atrocity crimes, and those that focus on the impact of the ICC on political violence.

With regard to legal impacts, evidence suggests that actors operating in the shadow of the Court change behavior to appear compliant with international criminal law. The Colombian government made many alterations to its Special Peace Jurisdiction because of the OTP’s monitoring during an extended preliminary examination.23 Sarah Nouwen notes how Sudan and Uganda both established special courts to try atrocity crimes, but ultimately argues that these courts were established to under-perform.24 In this, she sees a blind spot in the complementarity regime.

Other studies show that the ICC can have more indirect legal impacts. In a forthcoming article, Florencia Montal and I find that ICC investigations are statistically correlated with more prosecutions of state agents—like police and security forces—for human rights crimes. To show this, we used a statistical matching procedure that compares countries with ICC investigations to similar countries that are not subject to ICC intervention. There is no legal reason to expect a relationship between ICC investigations and domestic human rights prosecutions. The latter are normally for crimes that do not reach the level of atrocity and are technically outside of the ICC’s jurisdiction. However, the correlation between investigation and human rights trials is strong. The reason is that domestic reformer coalitions are emboldened by the ICC presence in a country. They lobby for more local accountability, they push for judicial reform, and they file more legal cases. The government responds with more prosecutions. Because this was an unforeseen impact of ICC intervention, we call it “unintended positive complementarity.”25
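For readers unfamiliar with matching, the basic idea can be sketched in a few lines. This is not the procedure used in the article described above; it is a generic nearest-neighbor match with replacement on made-up records, and the covariate names (`democracy`, `conflict`), the outcome field, and the function `nearest_neighbor_att` are all hypothetical.

```python
import math


def nearest_neighbor_att(units, covariates, outcome="prosecutions"):
    """Average outcome gap between each treated unit and its closest untreated unit.

    Closeness is Euclidean distance on the listed covariates, which are
    assumed to be on comparable scales already. Controls may be reused.
    """
    treated = [u for u in units if u["icc_investigation"]]
    controls = [u for u in units if not u["icc_investigation"]]

    def distance(a, b):
        return math.sqrt(sum((a[c] - b[c]) ** 2 for c in covariates))

    gaps = []
    for t in treated:
        match = min(controls, key=lambda c: distance(t, c))
        gaps.append(t[outcome] - match[outcome])
    return sum(gaps) / len(gaps)


# Entirely made-up country records, with covariates scaled to the 0-1 range.
units = [
    {"name": "A", "icc_investigation": True, "democracy": 0.40, "conflict": 0.80, "prosecutions": 6},
    {"name": "B", "icc_investigation": True, "democracy": 0.60, "conflict": 0.50, "prosecutions": 4},
    {"name": "C", "icc_investigation": False, "democracy": 0.45, "conflict": 0.75, "prosecutions": 3},
    {"name": "D", "icc_investigation": False, "democracy": 0.62, "conflict": 0.45, "prosecutions": 2},
    {"name": "E", "icc_investigation": False, "democracy": 0.90, "conflict": 0.10, "prosecutions": 5},
]

att = nearest_neighbor_att(units, covariates=["democracy", "conflict"])
print(f"average gap between matched pairs: {att:.1f} prosecutions")
```

Matching only balances the characteristics that are measured and included. A gap of this kind is therefore a correlation among comparable cases, and it still needs a plausible mechanism before it can be read as impact.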

New research also yields a much more nuanced understanding of atrocity-crime deterrence. Early research on the deterrence question was primarily hypothetical or based squarely on theoretical models. Much of it argued that the ICC could not possibly deter atrocities because those committing such offenses would be insensitive to the prospect of punishment.26

However, scholarly impact assessors are using statistical analysis of observed data to challenge excessively rationalist accounts of deterrence. Ben Appel finds that the average level of repressive violence decreases in states after they ratify the Rome Statute, and that repressive violence is also lower in those states than in states that have not ratified it.27 Courtney Hillebrecht presents evidence that government-sponsored killing decreased in Libya following the referral to the ICC.28 And Jo and Simmons discover that, among all civil war states, Rome Statute ratification is associated with roughly 50% fewer civilian killings by state governments. Furthermore, direct intervention by the ICC is associated with almost 60% fewer targeted killings by both government and rebel forces. The authors conclude that violent actors change behavior not only because they fear legal punishment but also because of the informal sanctions associated with being targeted by the ICC. The latter process they call “social deterrence.”

These findings should not be taken as evidence that ICC involvement in a country is universally positive. Other data scientists discover little relationship between ICC intervention and violence.29 The ICC also has mixed impacts on larger processes of organized political violence. The Court was not established to end war, but it regularly gets involved in civil war states. Researchers argue convincingly that the ICC affects the resolution of civil war in a non-linear fashion. For instance, Mark Kersten contends that ICC arrest warrants in Uganda encouraged LRA leaders to come to the negotiating table; but, because the warrants could not be dissolved, they also stood in the way of a peace settlement.30 Hillebrecht and Straus contend that state leaders simply use the Court to constrain their main political opponents.31 And other research suggests that the ICC’s impact on political violence might change with its stages of involvement, varying between preliminary examinations, investigation, and trial phases.32 Much remains to be explored.

Conclusion

This is just a brief survey, but one thing seems certain. For all its faults and missteps, the ICC has definitely sent shockwaves through global society. It does alter political behavior. This raises a puzzle that circles back to the issue of performance indicators: Are the ICC’s on-the-ground impacts dependent on how well it performs its functions? Many critical voices charge that the Court has fallen far short of its operational expectations; yet new impact studies suggest that the Court exerts measurable effects on legal developments, patterns of violence, and political conflict. What does it mean if the ICC has these impacts despite sub-par performance?

It’s quite possible that, so far, the ICC is more important for what it is than what it actually does. The mere existence of the Court sends resonant signals of accountability across the globe. A second possibility is that political actors will adjust, learn from Kenya’s obstructionist example, and begin exploiting the ICC’s shortcomings.

It is not the ICC’s job to sort out these possibilities. Instead, the Court must focus on improving its own practices. It needs to look inward and design well-conceived performance indicators. In terms of knowledge, this will produce increasing returns. As time goes on, trained impact assessors can use reliable indicators to establish whether effective or fair Court operations yield greater impacts on the ground and ultimately contribute to a better world.

Endnotes

  1. 1.

    For examples, see David Davenport, International Criminal Court: 12 Years, $1 Billion, 2 Convictions, Forbes, Mar. 12, 2014, available online; Elizabeth Peet, Why is the International Criminal Court so Bad at Prosecuting War Criminals?, Wilson Q. (Jun. 15, 2015), available online.

  2. 2.

    Assembly of States Parties, Strengthening the International Criminal Court and the Assembly of States Parties, ICC-ASP/13/Res.5 at Annex I, ¶7(b) p.47 (Dec. 17, 2014), available online.

  3. 3.

    International Criminal Court, Second Court’s Report on the Development of Performance Indicators for the International Criminal Court (Nov. 11, 2016), available online, archived.

  4. 4.

    Blaine Worthen, Program Evaluation, in The International Encyclopedia of Education Evaluation 42 (Herbert J. Walberg & Geneva D. Haertel eds., 1990).

  5. 5.

    See American Evaluation Association, About AEA, available online (last visited Jun. 27, 2017).

  6. 6.

    Geoff Dancy, Impact Assessment, Not Evaluation: Defining a Limited Role for Positivism in the Study of Transitional Justice, 4 IJTJ 355, 358 (2010), Oxford Academic paywall, ResearchGate paywall.

  7. 7.

    See Open Society Justice Initiative, Briefing Paper: Establishing Performance Indicators for the International Criminal Court (Nov. 2015), available online, archived.

  8. 8.

    Second Report, supra note 3, ¶ 95.

  9. 9.

    AnnJanette Rosga & Margaret L. Satterthwaie, The Trust in Indicators: Measuring Human Rights, 27 Berkeley J. Int’l L. 253, 285 (2009), available online.

  10. 10.

    Kevin E. Davis, Benedict Kingsbury & Sally Engle Merry, The Local-Global Life of Indicators: Law, Power, and Resistance, in The Quiet Power of Indicators: Measuring Governance, Corruption, and Rule of Law 1–24 (2015), Cambridge Journals paywall.

  11. 11.

    See charts in Gerardo L. Munck & Jay Verkuilen, Conceptualizing and Measuring Democracy: Evaluating Alternative Indices, 35 CPS 5 (2002), available online.

  12. 12.

    Christopher J. Fariss, Respect for Human Rights has Improved over Time: Modeling the Changing Standard of Accountability, 108 Am. Pol. Sci. Rev. 297 (May 2014), available online.

  13. 13.

    Staffan I. Lindberg, Michael Coppedge, John Gerring & Jan Teorell et al., V-Dem: A New Way to Measure Democracy, 25 J. of Democracy 159 (Jul. 2014), available online.

  14. 14.

    Drew A. Linzer & Jeffrey K. Staton, A Global Measure of Judicial Independence, 1948–2012, 3 J. L. & Cts. 223 (2015), JSTOR paywall, University of Chicago paywall (last visited Oct. 1, 2015).

  15. 15.

    Judith G. Kelley, Monitoring Democracy: When International Election Observation Works, and Why It Often Fails (2012).

  16. 16.

    See, e.g., The Varieties of Democracy Project, available online (last visited Jun. 20, 2017).

  17. 17.

    Diane Orentlicher, Owning Justice and Reckoning with its Complexity, 11 J. Int’l Crim. Just. 517 (2013), Oxford Journals paywall.

  18. 18.

    Karen J. Alter, The New Terrain of International Law: Courts, Politics, Rights 53 (2014).

  19. 19.

    Stephen Smith Cody, Eric Stover, Mychelle Balthazard & K. Alexa Koenig, UC Berkeley HRC, The Victims’ Court?: A Study of 622 Victim Participants at the International Criminal Court 3 (2015), available online.

  20. 20.

    Helena Vranov Schoorl et al., Echoes Of Testimonies: A Pilot Study into the Long-Term Impact of Bearing Witness Before the ICTY 95 (2016), available online.

  21. 21.

    Tessa Alleblas, Eamon Aloyo, Geoff Dancy & Yvonne Dutton, Is the International Criminal Court Biased Against Africans? Kenyan Victims Don’t Think So, Wash. Post, Mar. 6, 2017, available online.

  22. 22.

    Second Report, supra note 3, ¶ 100.

  23. 23.

    Éadaoin O’Brien, Par Engstrom & David James, In the Shadow of the ICC: Colombia and International Criminal Justice (May 26, 2011), available online; Geoff Dancy & Florencia Montal, From Law versus Politics to Law in Politics: A Pragmatist Assessment of the ICC’s Impact, 32 Am. U. Int’l L. Rev. 645 (2016), Lexis/Nexis paywall.

  24. 24.

    Sarah M. H. Nouwen, Complementarity in the Line of Fire: The Catalysing Effect of the International Criminal Court in Uganda and Sudan (Nov. 7, 2013).

  25. 25.

    Geoff Dancy & Florencia Montal, Unintended Positive Complementarity: Why International Criminal Court Investigations Increase Domestic Human Rights Prosecutions, Am. J. Int’l L. (forthcoming 2017), SSRN paywall. Earlier version (Jan. 20, 2015), available online, archived.

  26. 26.

    For reviews of the literature, see Tom Buitelaar, The ICC and the Prevention of Atrocities: Criminological Perspectives, 17 Hum. Rts. Rev. 286 (2016), SpringerLink paywall. Earlier version: The Hague Institute for Global Justice, Working Paper 8 (Apr. 2015), available online, archived; Martin Mennecke, Punishing Genocidaires: A Deterrent Effect or Not?, 8 Hum. Rts. Rev. 319 (2007), SpringerLink paywall.

  27. 27.

    Benjamin J. Appel, In the Shadow of the International Criminal Court: Does the ICC Deter Human Rights Violations?, J. Conflict Resol. (forthcoming Apr. 25, 2016), SAGE paywall.

  28. 28.

    Courtney Hillebrecht, The Deterrent Effects of the International Criminal Court: Evidence from Libya, 42:4 Int’l Interactions (forthcoming May 5, 2016), available online, archived.

  29. 29.

    James D. Meernik, International Tribunals and Human Security (2016).

  30. 30.

    Mark Kersten, Justice In Conflict: The Effects of the International Criminal Court’s Interventions on Ending Wars and Building Peace (Aug. 2016).

  31. 31.

    Courtney Hillebrecht & Scott Straus, Who Pursues the Perpetrators? State Cooperation with the ICC, 39 Hum. Rts. Q. 162 (Feb. 2017), Project Muse paywall.

  32. 32.

    Michael Broache, Beyond Deterrence: The ICC Effect in the OTP, openDemocracy, Feb. 19, 2015, available online (last visited Jun. 27, 2017); Yvonne M. Dutton & Tessa Alleblas, Unpacking the Deterrent Effect of the International Criminal Court: Lessons From Kenya, St. John’s L. Rev. (forthcoming Dec. 2016), available online.

Daniel Krcmaric, Ph.D., Assistant Professor of Political Science, Northwestern University

The ICC’s pursuit of international justice creates difficult tradeoffs between ending ongoing conflicts and deterring future atrocities

Summary

There are two diverging schools of thought that discuss how the ICC, and the pursuit of international justice in general, might influence violence. On the one hand, optimists argue that the threat of prosecution deters atrocities. Consistent with the claim of Human Rights Watch’s Kenneth Roth that “behind much of the savagery of modern history lies impunity,”1 the assumption is that the promise of legal accountability can prevent the next Holocaust or Rwandan Genocide. On the other hand, pessimists worry that if the warring parties are vulnerable to international criminal prosecution, they may decide to keep fighting when they would otherwise make peace. During the 2011 Libyan conflict, for instance, the Washington Post’s Jackson Diehl speculated that “Libyans are stuck in a civil war in large part because of Gaddafi’s international prosecution.”2

The current debate contains some valuable points. However, it is too simplistic to argue that the ICC is exclusively helpful or exclusively harmful. In fact, the debate between optimists and pessimists is missing the big picture: the positive and perverse effects of the ICC are intimately linked. In this post, I will make the case that a credible threat of international justice should both prolong ongoing conflicts and deter future atrocities. My argument hinges on a previously neglected factor: how international justice shapes the viability of exile as a retirement option for abusive leaders.

Argument

If we want to understand the real-world effects of the ICC on violence, it makes sense to start by thinking about how the shadow of the Court influences the incentives of belligerents. This post explores how the possibility of facing justice at the ICC shapes the decision-making of political leaders. In principle, other belligerents could be analyzed in a similar way, but I choose to focus on heads of state. After all, it is political leaders who typically instigate and enable mass atrocities.3

Until quite recently, leaders had no reason to worry about international justice. Despite the occasional high-minded rhetoric from the international community about “no impunity” and “never again,” proponents of global accountability had little to celebrate. In fact, many early advocates of international criminal law were mocked as dreamy idealists (e.g., consider the long and difficult path of Raphael Lemkin, described in Samantha Power’s book A Problem From Hell, as he attempted to outlaw genocide). Consequently, the prospects for holding violent leaders accountable were bleak. Impunity remained the norm for a long time.

Impunity had one important implication for the decision calculus of leaders: exile was a good retirement option—even for brutal leaders—when they were no longer welcome at home. Rather than face the wrath of their own people, oppressive rulers could find a safe haven abroad. For this reason, there used to be a long tradition of embattled leaders going into exile as a “golden parachute” exit strategy. Indeed, past leaders did not hesitate to spend their remaining days in exile once they were threatened by civil wars and mass protests. For instance, when Ugandan rebels and Tanzanian forces closed in on Kampala in 1979, Uganda’s dictator Idi Amin decided to flee into exile (he eventually settled in Saudi Arabia) rather than stay in Uganda and continue the conflict. Once in exile, leaders like Amin were safe.

Back then, Western democracies—the very states that do most of the enforcement work for international tribunals like the ICC today—viewed exile as a convenient policy tool for easing “bad” leaders out of power. For example, the United States flew Filipino leader Ferdinand Marcos to a quiet retirement in Hawaii to avoid a massive crackdown on protesters after a rigged election in 1986. That same year, French and American diplomats convinced Haiti’s corrupt and violent leader Jean-Claude Duvalier to give up power in exchange for exile on the French Riviera. As unpleasant as it is for an oppressive ruler to be rewarded with a safe—perhaps even luxurious—retirement home, exile did provide a way to facilitate political transitions.

The ICC was created with examples like these in mind. For far too long, the thinking went, oppressive dictators had been able to bargain their way out of trouble and into opulent exiles, and host states were typically happy to accept them because it was politically expedient. According to scholar David Bosco, the long process of building the ICC was not only a march toward the rule of law, but it also was “conceived of as a march away from something else: politics and expediency.”4 As then-UN Secretary General Kofi Annan proudly proclaimed after the creation of the ICC, “Until now, when powerful men committed crimes against humanity, they knew that as long as they remained powerful no earthly court could judge them.”5 The ICC and other international tribunals, of course, are meant to change that by making politics subordinate to law.

In recent years, norms of legal accountability have for the first time started to win over calculations of political expediency (though this certainly is not always the case). The United Nations created ad hoc tribunals to address mass violence in Yugoslavia, Rwanda, Sierra Leone, East Timor, Cambodia, and Chad. Foreign courts using the principle of universal jurisdiction began to pursue heads of state aggressively. This movement towards accountability culminated in 1998 with the Rome Statute, which created the ICC as the first permanent international court with broad jurisdiction over mass atrocities.

Also in the late 1990s, powerful Western states began to provide the muscle to make these tribunals work by arresting indicted criminals. The results have been dramatic. A host of oppressive rulers previously considered above the law—such as Augusto Pinochet, Slobodan Milosevic, Charles Taylor, Khieu Samphan, Laurent Gbagbo, and Hissène Habré—have been arrested in recent years. How has this global “justice cascade”6 influenced political violence?

To start, consider the exile dynamic described above. Exile used to be the standard exit strategy for dictators in distress. But due to the recent justice cascade, the world is becoming a smaller place for oppressive rulers. In today’s world of globalized justice, fleeing into exile no longer guarantees a safe retirement. To give just one example, Liberia’s Charles Taylor was arrested during his supposedly safe exile in Nigeria in 2006 and was then extradited to the Special Court for Sierra Leone. After witnessing events like this, other violent leaders such as Libya’s Muammar Gaddafi, the Ivory Coast’s Laurent Gbagbo, and Syria’s Bashar al-Assad have decided it is better to hunker down in their own countries than to flee abroad.

Taking away the “golden parachute” that exile once provided has implications for how leaders behave during civil conflicts. In fact, it generates two effects that pull in opposite directions.

On the one hand, there is a dark side to international justice. If abusive leaders think fleeing abroad will eventually land them in the dock at The Hague, they have incentives to remain entrenched in power. During civil wars, these leaders have a strong motive to keep fighting—even if the prospects for winning the war are poor—in the hopes of turning the conflict around. Consider the leaders just mentioned. Muammar Gaddafi, who faced an ICC arrest warrant, fought to the death rather than flee into exile. Similarly, Laurent Gbagbo kept fighting until the opposition forces captured him in the presidential mansion and then extradited him to the ICC. Bashar al-Assad, despite some speculation that he would be forced to flee abroad early in Syria’s war,7 is still fighting and violating nearly every law of war. Each of these leaders had at least one foreign country offer exile, but they all decided they would be better off clinging to power.

On the other hand, pursuing accountability comes with an important benefit. Precisely because leaders now know that committing atrocity crimes will decrease their future exit options, international justice creates a deterrent effect. Put another way, today’s leaders want to avoid getting trapped in a position like Gaddafi or Gbagbo—where the only option is to keep fighting a losing war—so they will be reluctant to commit atrocities in the first place. Thus, the ICC is achieving one of its primary goals: the deterrence of atrocity crimes.

My current book project, The Justice Dilemma, provides the statistics to back up the anecdotal evidence mentioned in this post. Three main findings stand out. First, up until 1998 (a watershed year for international justice featuring both the Rome Statute and the Pinochet arrest), culpable and non-culpable leaders went into exile at virtually identical rates. However, today’s culpable leaders (those who preside over mass atrocities) are about six times less likely to take the exile option. Second, a leader’s culpability previously had no effect on civil war duration, but culpable leaders today tend to fight significantly longer civil wars. Third, since the watershed events of 1998, leaders have been less likely to initiate campaigns of mass atrocities.

Taken together, my results suggest that there is a justice dilemma. By taking away the possibility of exile for culpable leaders, international justice both prolongs conflicts and deters atrocities.

Can the ICC find a way out of this dilemma? Some have suggested that the ICC should become more political (although both Luis Moreno-Ocampo and Fatou Bensouda have been adamant that the ICC is a purely apolitical institution). The basic idea is that a leader guilty of committing atrocities could be allowed to bargain away an ICC indictment if he steps down peacefully and leaves the country. Assuming a culpable leader agreed to such a deal, it could help resolve an ongoing conflict involving that specific leader. However, the problem with this approach is that it ignores how other leaders will respond. If offering “get out of jail free” cards becomes the norm, other leaders will know that they too can bargain away arrest warrants and find safe havens abroad. Consequently, the deterrent effect of international justice will be undermined. The justice dilemma, therefore, is inescapable.

Overall, there is an inherent tension between ending today’s conflicts and deterring tomorrow’s atrocities. It is possible for the ICC and the international community writ large to do one or the other, but they cannot do both simultaneously. Looking ahead, the ICC and national policymakers will face difficult choices on how to prioritize these competing demands.

Endnotes

  1. 1.

    Kenneth Roth, The Case for Universal Jurisdiction, 80 Foreign Aff. 150 (2001), available online.

  2. 2.

    Jackson Diehl, After the Dictators Fall…, Wash. Post, Jun. 5, 2011, available online.

  3. 3.

    For an insightful leader-centric approach to the study of mass violence against civilians, see Benjamin A. Valentino, Final Solutions: Mass Killing and Genocide in the Twentieth Century (2004).

  4. 4.

    David Bosco, Rough Justice: The International Criminal Court in a World of Power Politics 3 (2014).

  5. 5.

    See Press Release, UN, Secretary-General Says Establishment of International Criminal Court Is Major Step in March Towards Universal Human Rights, Rule of Law, L/2890 (Jul. 20, 1998), available online.

  6. 6.

    The phrase “justice cascade” is typically associated with the work of Kathryn Sikkink. See, for example, Kathryn Sikkink, The Justice Cascade: How Human Rights Prosecutions are Changing World Politics (2011).

  7. 7.

    Andrew E. Kramer, In Russia, Exile in Comfort for Leaders Like Assad, N.Y. Times, Dec. 28, 2012, available online.

Gabrielle Louise McIntyre1 Chef de Cabinet, Office of the President, United Nations Mechanism for International Criminal Tribunals

Performance Assessment Cannot Take Place in a Vacuum

Summary

The development of appropriate ways to measure the progress of the Court toward achievement of its stated goals and its performance overall is a complex and difficult undertaking, and the efforts thus far, reflected in the Court’s most recent report,2 are to be commended. It likewise must be acknowledged that collecting performance-related data can be a resource-intensive undertaking, and the Second Report in many ways reflects the need to seek an appropriate balance between performance assessment, on the one hand, and actual performance, on the other. The Second Report appears to seek to strike such a balance by limiting the scope of the performance assessment in a number of different ways and making it a primarily inward-looking exercise.

For performance assessment to be meaningful, however, it cannot take place in a vacuum, as a number of commentators both inside and outside the Court have recognized.3 Rather, performance assessment must look at the work of the Court in its proper context.

That means, inter alia, that performance assessment must take account of the system created by the Rome Statute in which the ICC operates, including the primary role played by States Parties in the operationalization of that system. Indeed, an approach to performance assessment that looks at only some aspects of the Court’s work and mandate, excludes consideration of relevant external factors, and does not fully reflect the aims underlying the performance assessment process, represents a missed opportunity to identify and address all factors fundamental to the success of the Court. Such a circumscribed approach to performance assessment also risks misleading or providing inadequate information to key interlocutors—including the Assembly of States Parties and the general public—as to both the Court’s achievements and, more fundamentally, its needs. Given the current critical stage of the Court’s development, providing only a partial or limited overview of the Court’s performance could be particularly detrimental in so far as it may give rise to ill-founded expectations and confusion as to what the Court can achieve, what hurdles it faces, and where meaningful reforms could be implemented.

As set forth below, a more holistic approach to performance assessment will serve the Court and the Rome system well in the long run. Such an approach would mean going beyond the narrow confines of the performance assessment exercise outlined in the Second Report. To do this notwithstanding its resource limitations, the Court may have to limit performance indicators to those that are most critical to assessing the success of the Court as a whole. It may also wish to consider exploring partnerships with external bodies to help ensure that the performance assessment undertaken is as comprehensive, meaningful, and effective as possible. Difficult though it may be to implement a more comprehensive and holistic approach, it is important that such an approach be adopted from the beginning, not left to be phased in gradually over time, if the Court, the Assembly of States Parties, and all other interested parties are to truly benefit.

Argument

I. To measure its performance adequately and accurately, the ICC must take account of the context in which it operates and, in particular, the key role played by State cooperation and support.

The four goals identified by the ICC for the purposes of measuring performance in its Second Report are derived from two of the three priority objectives of its Strategic Plan of 2013–20174 and characterized as representing issues primarily within the control of the Court.5 While “recognizing that the performance of the Court substantially depends on external cooperation,” the Court is therefore leaving to one side the third priority objective of “Cooperation and Support” identified in the Strategic Plan, an issue considered primarily outside of the Court’s control.6

This same approach is reflected in the Court’s selection of particular performance indicators. While recognizing that external factors can have a “substantial impact on the Court’s ’own’ performance indicators” and noting that all Court-wide performance indicators should be read and evaluated in their specific context, the Court considers it “prudent to limit the choice of indicators for now to those primarily under the control of the Court itself,” leaving such matters to be considered at a later stage.7

This restrictive, inward-looking approach to assessing performance is problematic for a number of reasons.

First, and most simply, the alignment of performance measures with the overall objectives of the Court is critical to identifying what performance matters for the success of the Court. In this respect, the exclusion of the third priority objective from the initial development of performance measures is troubling, as it inhibits consideration of a critical variable that is acknowledged to be central to the work of the Court: the operational dependency of the Court on States Parties.8 A coherent and comprehensive approach to performance assessment requires that all of the Court’s priority objectives are taken into account, not just some of them.

Second, the restrictive approach by which external factors or impacts are omitted from the assessment risks undermining the overall aims underlying the performance assessment process. As set forth in the Second Report, the development of performance indicators is part of a continuing effort to improve efficiency at the Court, to demonstrate better its achievements and needs, and to allow States Parties to assess the Court’s performance in a more strategic manner.9 Without an integrated consideration of State cooperation and other external factors in the context of the performance assessment process, it is not apparent that the Court or the Assembly of States Parties will be able to achieve the desired ends in this regard. Indeed, through the appropriate formulation of performance indicators that take full account of the system in which the ICC operates, the ICC stands a much better chance of providing an accurate picture of its performance that underscores the intrinsic role played by States Parties and more accurately reflects the system created by the Rome Statute. After all, the Court alone is not in control of its performance; it is the Court in partnership with States Parties that determines the success or otherwise of the Court.

Third, even if one were to accept that issues outside of the Court’s control are less relevant or are not as readily measurable, it is not clear that issues related to State cooperation and support are entirely outside of the Court’s control. In the current ICC Strategic Plan, for example, the Court identifies a number of activities that the Court will undertake to encourage cooperation and support for the Court’s activities.10 These are activities that are to be carried out by the Court—either alone or in partnership with others—and are therefore in the control of the Court and could be measured. And even where particular aspects of State cooperation are outside of the Court’s control, as set forth below, there are ways that the degree of and obstacles to cooperation can be assessed in meaningful ways.

Fourth, it would not be overly onerous to develop means to assess performance in the area of State cooperation and support. To the contrary, it would seem relatively simple to develop indicators and collect data on the Court’s effectiveness in operationalizing the Rome system based on those activities identified in the Strategic Plan as priorities for securing cooperation and support. Indeed, with respect to many of these areas, the Court already provides some data in its reports to the Assembly of States Parties on cooperation.11

For example, one of the goals identified in the Strategic Plan is “to conclude further voluntary agreements with the Court on enforcement of sentences, relocation of witnesses and interim and other forms of release.”12 To measure its performance in this regard, the ICC could collect information concerning the number of States Parties that have been approached during an identified period with respect to entering such voluntary agreements, and the number that have agreed to enter any particular type of agreement or that have entered into such agreements.

These numbers will give the Court information about its performance in relation to meeting this objective. However, for that information to be useful to the Court or the Assembly of States Parties in terms of developing strategies to improve the ICC’s performance, the Court should seek quantitative and qualitative data from States Parties through surveys administered by the Court or civil society that seek to identify obstacles for States or reasons for delay in entering voluntary agreements with the Court. Collection of this data in one place and in relation to specific performance indicators would not only allow the Court and the Assembly to address these obstacles more strategically, but would also meaningfully highlight their impact on the Court’s performance.

The same approach just outlined could likewise be applied to the goal of encouraging non-State Parties to ratify the Rome Statute and States Parties and non-State Parties to ratify the Agreement on Privileges and Immunities of the Court.13 The ICC could record the number of States approached during a particular period, the number that have agreed to ratify or have ratified either instrument subsequent to the approach, as well as quantitative and qualitative data regarding obstacles encountered or reasons for delay in these ratifications.
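As a rough illustration only, the counts described above could be kept in a simple structured record. The sketch below is not drawn from the Second Report or the Strategic Plan; the record fields, the instrument labels, the obstacle categories, and the function name `summarize` are all hypothetical placeholders chosen for the example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Approach:
    """One approach to one State about one instrument (all fields hypothetical)."""
    state: str                      # placeholder label, e.g. "State A"
    instrument: str                 # e.g. "relocation_of_witnesses"
    concluded: bool                 # True if the agreement or ratification followed
    obstacle: Optional[str] = None  # reason reported for refusal or delay, if any


def summarize(approaches):
    """Count approaches, successes, and reported obstacles for a reporting period."""
    approached = len(approaches)
    concluded = sum(1 for a in approaches if a.concluded)
    # Tally obstacles only for approaches that did not lead to a result.
    obstacles = Counter(
        a.obstacle for a in approaches if not a.concluded and a.obstacle
    )
    return {
        "approached": approached,
        "concluded": concluded,
        "rate": concluded / approached if approached else 0.0,
        "obstacles": dict(obstacles),
    }
```

Keeping the reported obstacle alongside each approach is what would let the tally of obstacles be read next to the raw counts, which is the pairing of quantitative and qualitative data the argument calls for.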

Fifth, it would likewise appear relatively straightforward to develop measures that assess the impact of State cooperation and support in relation to the other key goals under consideration. Indeed, given the operational dependency of the Court on State cooperation and support in achieving its goal of expeditious, fair, and transparent proceedings,14 the development of performance indicators that capture the impact of these factors on Court proceedings would seem to be essential if the Court’s performance as a whole is to be properly assessed.

For example, if the Pre-Trial Chamber of the Court is to perform to its optimal level, States must execute warrants of arrest issued by the Court. Failure to do so will lead to situations (as at present) where there are no pending pre-trial proceedings. In addition, the efficiency of the Prosecutor’s investigations will suffer, as an arrest made long after the issuance of an arrest warrant will invariably require the Prosecution to update its investigation and will ultimately delay the start of the proceedings. To capture and communicate the impact of State cooperation and support on Court proceedings, a number of steps could be taken.

For instance, the Court could identify the date of the issuance of arrest warrants, the number of requests sent for cooperation in the execution of arrest warrants to States Parties and non-State Parties, the number of judicial determinations of non-cooperation in arrests by States Parties made by the Chambers, and the responses of non-State Parties. Moreover, by collecting quantitative and qualitative data on the reasons why States Parties and non-State Parties are failing to execute arrest warrants—for example, perceived illegitimacy of the arrest warrant, conflict with competing regional or international obligations, security concerns, or lack of opportunity—the Court’s leadership and management (as well as the Assembly) can more readily address these obstacles.

Further, the Court could record the number of requests made to States for other forms of cooperation (for example, relocation of witnesses to allow disclosure to take place), the time taken for States to respond to such requests, the outcome of the requests, and the reasons given by States for refusing such requests or delaying addressing them. While this data should be collected across cases to allow the Court’s leadership and management to assess the operationalization of the Rome system, it should also be identified within each specific case so that the impact of these factors on individual case performance is apparent.

The same type of data could likewise be collected in relation to any prosecution and defense requests for cooperation from States Parties in a specific case. In this regard, the Court could collect quantitative data on the number of requests made by either party and State responses to those requests, including data on the reasons given for refusal or delay. Again, the collection of this data will allow the Court’s leadership and management to assess the impact of State cooperation on the performance of the Court overall and on specific cases.
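To make the request-level data just described more concrete, a minimal sketch follows. It assumes each cooperation request is logged with the case it belongs to, the dates it was sent and answered, its outcome, and any reason given; these field names, the outcome labels, and the helper `per_case_summary` are hypothetical and do not reflect any actual Court record-keeping system.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import Optional


@dataclass
class CooperationRequest:
    """One request to a State for cooperation (all fields hypothetical)."""
    case: str                     # placeholder case label
    sent: date                    # date the request was transmitted
    answered: Optional[date]      # None while the request is still pending
    outcome: str                  # "granted", "refused", or "pending"
    reason: Optional[str] = None  # reason given for refusal or delay, if any


def per_case_summary(requests):
    """Group requests by case and report volume, response time, and outcomes."""
    by_case = {}
    for r in requests:
        by_case.setdefault(r.case, []).append(r)

    summary = {}
    for case, reqs in by_case.items():
        # Response time in days, for answered requests only.
        days = [(r.answered - r.sent).days for r in reqs if r.answered is not None]
        summary[case] = {
            "requests": len(reqs),
            "median_response_days": median(days) if days else None,
            "refused": sum(1 for r in reqs if r.outcome == "refused"),
            "pending": sum(1 for r in reqs if r.outcome == "pending"),
        }
    return summary
```

Grouping by case first and aggregating afterwards mirrors the point made above: the same records can be read across cases to assess the system as a whole, or within a single case to show how cooperation affected that case.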

Additionally, and in the same vein as is described above with regard to other voluntary agreements, the Court could collate data on the number of States Parties that have adopted implementing legislation to allow cooperation with the Court, the number of States that have been approached by the Court’s leadership and management to adopt implementing legislation during a given period, the number that have agreed to adopt such legislation, and the reasons for refusal or delay in adopting it.

These are just some examples of the ways the Court could measure the impact of State cooperation and support on its performance through collection of data on its efforts, the results of those efforts, and the reasons for State action or non-action. By collecting data of the types just described, the Court will be better able to demonstrate the impact upon its performance of State cooperation and to give a more holistic sense of the Court’s performance. And it is by reference to such data that the Court (and the Assembly of States Parties) can develop strategies to improve the Court’s performance in operationalizing the Rome system in support of its activities.

Finally, just as the Court cannot put to one side the impact of State cooperation on its performance, so too it cannot ignore the impact of the founding principle of complementarity upon which the Rome Statute rests—a principle meant to ensure that the Court itself is a court of last resort. Indeed, in its Strategic Plan the Court recognizes the importance of complementarity to the success of the Court and the Rome system as a whole and indicates that it will “encourage and facilitate the development of national capacities to achieve the goals of the Rome Statute.”15 There is no reason why the Court cannot measure the activities in which it engages in this respect and, more importantly, the impact of those activities on the development of national capacities. In this latter regard, independent initiatives along these lines are already underway.16 As discussed below, the Court could benefit from partnering with other organizations to collect relevant data.

Notably, with respect to the fourth goal (“victims have adequate access to the Court”),17 the Second Report demonstrates appreciation for the fact that the Court operates within the framework of the Rome system and that it is important to develop performance indicators, even for certain factors that may not be immediately under its control such as the Trust Fund for Victims, so as to give a holistic picture of the Court’s performance and impact in relation to victim participation and reparations.18 However, as set out above, if the Court’s purpose in developing performance indicators is to allow for a strategic assessment of its performance as a whole, then it is essential that it take the same holistic approach to all other areas of its performance that are critical to its success as an institution.

II. The ICC should tailor and target its performance indicators to ensure maximum efficacy.

As set forth above, the identification of performance indicators for the ICC is part of an ongoing effort to improve the Court’s efficiency and to respond to the request of the Assembly of States Parties that the Court develop performance indicators that allow the Court to “demonstrate better its achievements and needs, as well as allowing States Parties to assess the Court’s performance in a more strategic manner.”19

In light of this, it is essential that the performance indicators being developed reflect all relevant priorities. Indeed, as set forth above in relation to State cooperation and support, it is crucial that the performance measures deployed be aligned with the overall objectives of the Court so as to ensure that a comprehensive picture of the Court’s performance can be presented.

Further, it is essential that the performance assessment process not collect information in the abstract, or simply because data lends itself to measurement, but instead do so in context and, specifically, in a manner most likely to communicate meaningfully about the Court’s achievements and needs. Doing so would enable both the Court and the Assembly to take the information collected through the performance assessment process and evaluate it with an eye to improving performance and efficiency in the long run. Moreover, the adoption of carefully tailored performance indicators, and publicly available reports concerning the results obtained, may facilitate improved public awareness of the problems and encourage accountability on the part of the Court.

In short, in its development of indicators, the ICC should be clear about its purpose for monitoring its performance and measure what counts. While many of the performance indicators identified by the Court thus far would supply information concerning the Court and its work, it is not clear that this information will necessarily be meaningful for the Court, the Assembly of States Parties or other interested interlocutors or that it will serve the Court’s interests in terms of providing a means by which to facilitate improved performance and efficiency.

For example, in relation to the goal of ensuring transparent proceedings, the Court proposes to collect information on the percentage of judicial decisions that are public vs. confidential and the overall percentage of courtroom time spent in public hearings versus those that are confidential or closed sessions. While such information is, of course, relevant to the transparency of proceedings, it is not apparent how these metrics will contribute to a meaningful assessment of the Court’s performance in the absence of relevant comparative statistics or other means to contextualize the information, such as explanations of the basis for these confidentiality determinations. For instance, will a rate of 74% of decisions being rendered publicly be considered an achievement and evidence that the Court is adhering to its obligation to maximize transparency? Or will it instead be seen as an indication that improvement is needed? And will a drop of 10% in that rate from one year to the next be considered a detrimental change or simply a reflection of the vagaries of litigation? Without more information, will the Court or the Assembly have what is needed to answer these questions or, if results are deemed problematic, to determine what has to change to positively impact the situation? The identification of useful comparators (as discussed below), the collection of generic information as to the reasons a decision or session was made confidential, and other steps could be taken to ensure that the performance indicators deployed in relation to the goal of ensuring transparent proceedings are of maximum utility.

Moreover, while the Court specifies that, in collecting data on the amount of material that is public, it will include “redacted and reclassified versions” of rulings, it adds that “many reclassifications from confidential to public are only undertaken towards the end of a trial; a reliable figure will therefore only be available towards the end of a trial.”20 If the aim is to ensure transparent proceedings, however, the percentage of public (including redacted and reclassified) rulings compared with confidential rulings calculated at different points during a trial is not “unreliable”; to the contrary, it is highly informative as an indicator of how transparent proceedings are while they are still in process. Collecting and reporting this information throughout the life of a case, together with information concerning the reasons for the issuance of reclassified versions of rulings, may also help to identify best practices and ways in which to improve the Court’s overall ability to render its proceedings more transparent, not just at the end of the case but throughout.

In addition, the possibility that rulings may be reclassified even after a case is concluded should not be discounted. The experience of the Mechanism for International Criminal Tribunals (“Mechanism”) suggests that there may be requests for reclassification of or access to classified information long after a judgment has been rendered. The Court would do well to track such activity, both as a measure relevant to transparency and also—in so far as requests stem from proceedings in national courts—as a means to assess its contributions to complementarity goals.

The Court may also wish to consider tracking not just the percentage of judicial decisions but also the percentage of submissions by the parties filed publicly versus those filed confidentially, as it is not only the Court’s own rulings that are relevant to the transparency of the proceedings but the contributions of the parties as well.

With respect to this same goal of ensuring transparent proceedings, the Court also proposes specific indicators concerning the numbers of people accessing the Court’s home page, social media networks, court hearings, and numbers of press-related materials, media information sessions, publications distributed, and numbers of audio and video summaries produced for international media.21 While it is not abundantly clear, it would appear that the Court intends to demonstrate that it is maintaining transparency in its proceedings by monitoring their accessibility to the broader general public over time. Although the statistics that would be collected under this rubric may indicate the effort made by the Court in ensuring transparency of its proceedings, the important issue in terms of assessing the performance of the Court is the result of that effort.

These are just a few examples of ways in which indicators could be developed and refined so as to capture more relevant data concerning the transparency of proceedings for purposes of ensuring a meaningful performance assessment process.

To take another example, it is not apparent that the indicators identified by the Court with respect to its second goal—the effectiveness of its leadership and management—are designed to best accomplish the desired aims underlying the performance assessment process.

The Court identifies the internal factors of “budget implementation, procurement and human resources issues” as the focus for the Court’s initial development of performance indicators in relation to this goal.22 In this respect, the Court will measure budget implementation rates per court organ, average time of recruitment process, percentage of staff appraisals conducted in a given time, geography and gender balance of staff, and relevant indicators regarding the Court’s procurement process.23 In addition, it is intended that implementation rates of plans to conduct training programs and measures to control priority risks will be measured as of 2017.24

While the Court’s measurement of performance in relation to these indicators will no doubt provide information about the Court’s management, it is questionable whether performance in relation to these particular internal matters is the most critical to the success of the Court as an institution. Indeed, whether staff performance assessments are completed on time, staff recruitment is carried out as efficiently as possible, or the Court conducts as many training courses as intended during a specific period, seems to provide relatively little information of genuine utility. In this respect, given the Court’s limited resources, the ICC would be better served by focusing its assessment of the effectiveness of leadership and management on performance indicators that provide information about those issues that truly matter to the success of the Court as a whole.

For example, it would seem highly relevant to track issues related to the compliance of leadership and management with the regulatory framework in carrying out activities such as recruitment, and the integrity of the leadership and management in their overall decision-making related to the management of the Court. These important aspects of leadership and management performance could be assessed by gathering data concerning the number of staff complaints raised against decisions taken by the leadership and management, including the number of complaints resulting in litigation against the Court by its employees and the number that result in a negative finding against the Court or its leaders and managers. In addition, the number of requests by counsel for administrative review of decisions by the Registrar and the number of decisions decided in favor of the complainant could be reported, together with information concerning the basis for such decisions. The measurement of staff morale and commitment, such as through staff surveys, could also be a valuable way to assess the overall effectiveness of Court management and leadership, as discussed below.25

The effectiveness of the leadership in implementing the system created by the Rome Statute and operationalizing State cooperation in support of the activities of the Court would also appear to be a critical factor to the success of the Court. This could be measured as set forth above. The effectiveness of the Court’s leadership and management in shaping a shared understanding of the mandate of the Court and in promoting a court-wide vision and communicating that vision, including by managing expectations concerning what the Court can be expected to achieve, would also seem to be an issue critical to the success of the Court.

The effectiveness of the Court’s (and in particular Chambers’) leadership in actively managing cases so as to ensure their fairness, transparency, and expeditiousness, discussed further below, would be another important area to assess. Indeed, the Pre-Trial Chamber has already very successfully demonstrated its commitment to active management through addressing the inconsistency of judicial practice (and hence uncertainty and increased litigation) with respect to pre-trial proceedings by issuance of the Chambers Practice Manual.26

Overall, it would appear that those areas of leadership and management identified by the Court in the Second Report are not those that are most meaningful or significant in terms of assessing the performance of the leadership and management. The Court would do well to focus its efforts on those issues that are critical to the success of the Court.

III. The ICC can learn from and draw upon the experiences of other relevant institutions.

In the Second Report, the Court suggests that the use of specific performance benchmarks developed at the national level, as well as other methodologies followed at the national level to assess the performance of courts, are of limited value to the ICC.27 The Court recognizes that the experience of other international courts and tribunals “may be more relevant” but does not provide further details in this regard, simply inviting the views of others.28

While there are undeniable differences between the cases of the ICC, on the one hand, and the cases of national courts or other international criminal courts, on the other, failing to take into account comparisons where relevant, and failing to take on board lessons from these courts’ approaches to performance assessment, limits the efficacy and meaningfulness of the Court’s own performance assessment. Indeed, the ICC would be ill served by simply ignoring other institutions trying core international crimes, disregarding performance indicators used by national courts, or considering its work in isolation from the system of complementarity on which it is founded.

This is particularly true when considering the question of the expeditiousness of proceedings. All international courts and national courts face obstacles in the expeditious completion of their cases. At the International Criminal Tribunal for the former Yugoslavia (“ICTY”) and International Criminal Tribunal for Rwanda (“ICTR”), for example, a backlog of accused persons in custody and limited courtroom availability meant long delays before the start of trials; pre-trial proceedings were often protracted and complicated by State cooperation issues in the securing of evidence, and delays were caused by, for example, the volumes of disclosure or the discovery of new evidence. The longest cases at the ICTY from the time of arrest took twelve and thirteen years29—two of which are currently pending either re-trial or appeal before the Mechanism—while at the ICTR the longest case from the time of arrest took close to twenty years.30 Cases tried in national courts are not necessarily proceeding much more quickly. For example, cases of two lower-level accused transferred to France by the ICTR in 2007 are still ongoing after nine and a half years. A recent case concluded in the Dutch national courts took twelve years from first charges.31

While the challenges faced in individual cases and by various courts may be different, they are not necessarily so different as to make the experience of these other proceedings, or the ways in which these courts measure their own performance, irrelevant to the ICC.

For example, if the overall length of proceedings is considered to be an important way to measure the expeditiousness of its proceedings,32 the ICC may do well to communicate its achievements—and the acknowledged distinctiveness of its proceedings and cases—to the Assembly of States Parties by using, as a measure of its performance, the time frames of other international and national courts, in particular those national courts dealing with core international crimes committed on the territory of another state. Although there may be real and important reasons that the cases at other courts are not directly comparable to ICC cases, providing information to the Assembly and others in this manner may provide a useful context in which to understand the Court’s own performance as compared with simply looking at these cases in a vacuum.

The relevance of other international and national courts is not limited to their possible use as basic comparators for the ICC; they can also be the source of valuable guidance on the development of a variety of performance indicators that could be meaningfully adapted to the ICC context. And, further to the discussion above, they can point to important ways in which performance assessment design can impact meaningfully on the underlying goals of the assessment exercise related to enhancing efficiency.

For example, some national court systems monitor the length of time required to dispose of individual motions and ask judges to indicate the reasons when a particular motion has been pending beyond a designated benchmark without being resolved. The ICC could adopt the same approach in considering its performance, not simply in relation to motions overall but, perhaps more meaningfully, in relation to the disposal of particular types of motions such as those concerning disclosure or provisional release. Measuring its performance in this manner may assist the Court in identifying areas where motion practice is high, which types of motions take the longest time to dispose of, and why. The identification of these factors may enable the Court (or the Assembly of States Parties) to explore measures to improve the Court’s performance overall.33
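The motion-tracking approach described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the records, the motion types chosen, the 60-day benchmark, and the function names `median_days_by_type` and `overdue_pending` are all hypothetical and do not come from the Second Report or from any national court system.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Hypothetical records: (motion type, date filed, date disposed or None if pending).
MOTIONS = [
    ("disclosure", date(2016, 1, 10), date(2016, 2, 9)),
    ("disclosure", date(2016, 3, 1), date(2016, 5, 30)),
    ("provisional release", date(2016, 2, 1), date(2016, 2, 15)),
    ("provisional release", date(2016, 6, 1), None),
]

# Assumed benchmark, chosen purely for illustration.
BENCHMARK_DAYS = 60


def median_days_by_type(motions):
    """Median number of days taken to dispose of resolved motions, per motion type."""
    durations = defaultdict(list)
    for kind, filed, disposed in motions:
        if disposed is not None:
            durations[kind].append((disposed - filed).days)
    return {kind: median(days) for kind, days in durations.items()}


def overdue_pending(motions, as_of, benchmark_days=BENCHMARK_DAYS):
    """Pending motions older than the benchmark, for which reasons could be requested."""
    return [
        (kind, filed, (as_of - filed).days)
        for kind, filed, disposed in motions
        if disposed is None and (as_of - filed).days > benchmark_days
    ]


if __name__ == "__main__":
    print(median_days_by_type(MOTIONS))
    print(overdue_pending(MOTIONS, as_of=date(2016, 9, 1)))
```

Grouping by motion type rather than reporting a single overall figure reflects the suggestion that type-specific results are more meaningful; the list of overdue pending motions is where a reason field would be attached in practice.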

The ICC could also adopt the national practice of measuring clearance rates on motions filed during a specific time period (e.g. of the 100 motions filed in a year, 70 were disposed of, yielding a clearance rate of 70%). The ICC could measure clearance rates overall, clearance rates on particular types of motions, and clearance rates by specific Chambers. The identification of clearance rates on motions would enable the Court to assess its performance (and the performance of specific Chambers) in keeping up with workload and, if clearance rates are less than 100 percent, to take measures to identify the reasons for backlogs, communicate those reasons, and take steps to address them to improve overall performance.
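The clearance-rate arithmetic above (motions disposed of divided by motions filed in the same period) can be sketched as follows. The records, the chamber labels, and the function names are hypothetical, introduced only to show how overall, per-type, and per-Chamber rates might be derived from one data set.

```python
from collections import Counter

# Hypothetical records for motions filed in one period:
# (chamber label, motion type, whether the motion has been disposed of).
RECORDS = [
    ("Chamber A", "disclosure", True),
    ("Chamber A", "disclosure", False),
    ("Chamber A", "provisional release", True),
    ("Chamber B", "disclosure", True),
    ("Chamber B", "provisional release", True),
]


def clearance_rate(filed, disposed):
    """Clearance rate as a percentage; None when nothing was filed."""
    if filed == 0:
        return None
    return 100.0 * disposed / filed


def clearance_by(records, key_index):
    """Clearance rate grouped by chamber (key_index=0) or motion type (key_index=1)."""
    filed = Counter()
    disposed = Counter()
    for record in records:
        key = record[key_index]
        filed[key] += 1
        if record[2]:
            disposed[key] += 1
    return {key: clearance_rate(filed[key], disposed[key]) for key in filed}


if __name__ == "__main__":
    print(clearance_rate(100, 70))   # the worked example from the text
    print(clearance_by(RECORDS, 0))  # by Chamber
    print(clearance_by(RECORDS, 1))  # by motion type
```

A rate below 100 percent for a given Chamber or motion type would then be the trigger for identifying and communicating the reasons for the backlog.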

In this same vein, the Court could also measure its progress in disposing of interlocutory appeals. This requires doing more than simply looking at the number of interlocutory appeals per year as compared with the average duration of such appeals, as is suggested at present.34 Rather, the Court could adopt measures counting the number of interlocutory appeals filed in relation to the same type of issue and the number of interlocutory appeals filed overall. The time taken to dispose of interlocutory appeals in relation to the same issue, and the time taken to dispose of interlocutory appeals overall, could be measured alongside information concerning the overall workload of the Appeals Chamber during the time that the interlocutory appeals have been pending. Such measurements would allow the Court to not only assess its performance in the disposal of interlocutory appeals, but also to identify what issues give rise to the greatest number of interlocutory appeals.

Understanding which issues spark the most litigation, and are most likely to prompt requests for interlocutory redress, may allow the Court to take steps to clarify the issues, reduce the volume of interlocutory appeals, and thereby expedite its proceedings. Indeed, as procedural and substantive jurisprudential issues are settled through Appeals Chamber jurisprudence over time, one would anticipate seeing an overall improvement in the expeditious conduct of the Court’s proceedings (assuming, that is, that one can control for the effects of the expected fluctuation in overall workload of the Court). It is only by measuring this progress that the Court will be able to adequately convey it to the Assembly of States Parties.

To take another example: the Second Report notably suggests that “specific performance benchmarks developed at the national level will often be inappropriate in the ICC context.”35 While the actual benchmarks used at the national level may not be appropriate for the ICC, it is not apparent that benchmarking more generally would not be a valuable tool for the ICC. Data from past ICC cases, as the Court acknowledges, cannot accurately predict the time line of future cases.36 In this respect, using data from past cases as a benchmark for future cases has limited value in terms of predicting the amount of time a new case may take. However, benchmarking the duration of various stages of a case, and other such variables as recognized by the Court in the Second Report, may provide some limited information on the Court’s performance over time including, importantly, whether the progressive settlement of contested issues of procedure and jurisprudence is having a positive impact on overall case-related trends.37

For instance, given the Pre-Trial Chamber’s issuance of the Chambers Practice Manual and the principles set out within that Manual regarding such procedures as the arrest warrant hearing, one would anticipate that future hearings on the issuance of an arrest warrant will be conducted more expeditiously than past hearings. By collecting data from past cases to use as a benchmark, the Court could measure the impact of the Chambers Practice Manual on procedures such as the arrest warrant hearing and hopefully demonstrate an improvement in expeditiousness with respect to this particular procedure as well as with regard to other relevant elements of the pre-trial proceedings addressed by the Chambers Practice Manual.38

Such a comparative analysis would allow the Court to showcase the actual impact of reforms undertaken in a valuable way. Importantly, this sort of approach could be applied in a myriad of other areas of the Court’s operations to both identify areas warranting targeted efficiency initiatives and identify indicators that will measure the impact of those initiatives. It is not enough for the Court to measure its performance; it must also demonstrate that it is using the data it collects effectively to improve performance, and it must be able to communicate about such improvement.

This is not the only way in which the lessons learned from national jurisdictions concerning the utility of benchmarking can be deployed at the ICC. If benchmarks from past cases are to be used, the Court could also borrow from the practice of national jurisdictions and derive from its past cases a range of time frames, from the shortest to the longest, for any particular phase in the proceedings. Provided that a particular phase of a pending case falls within this range, the Court has met its performance goal in relation to that particular milestone in the specific case. If, instead, a pending case produces results that fall outside of the benchmarked range, further inquiries could be made to identify what factors may have led to such a difference.
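The range-based benchmark just described can be sketched in a few lines of Python. The durations below are invented for illustration and the function names are hypothetical; the sketch assumes only what the text states, namely that the benchmark for a phase runs from the shortest to the longest duration observed in past cases.

```python
# Hypothetical durations (in days) of one phase of the proceedings in past cases.
PAST_PHASE_DAYS = [120, 200, 340]


def benchmark_range(past_durations):
    """Shortest and longest duration observed for a phase in past cases."""
    return min(past_durations), max(past_durations)


def check_phase(duration_days, past_durations):
    """Classify a pending case's phase duration against the benchmarked range."""
    low, high = benchmark_range(past_durations)
    if duration_days < low:
        return "below range"
    if duration_days > high:
        return "above range"
    return "within range"


if __name__ == "__main__":
    for days in (90, 250, 400):
        print(days, check_phase(days, PAST_PHASE_DAYS))
```

A result of "within range" corresponds to the milestone being met; either of the other two results would prompt the further inquiries mentioned above rather than an automatic judgment about performance.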

While benchmarking across different ICC cases in the manner just outlined may offer some benefits, each case is unique and, as the Second Report advocates,39 this fact should be underscored. Indeed, it is inherently difficult to reliably estimate the length of any case, especially at the beginning. This fact is borne out by the experience of the ICTY and the ICTR, where initial predictions on the length of cases were invariably found to be inaccurate, despite these courts having previous experience with considerably more cases than have been completed at the ICC.40 This is because the performance of a court in relation to any particular case will turn upon a host of variables that may or may not be present to varying degrees in other cases.41 Accordingly, a better approach—which seems to be the approach advocated by the Court in the Second Report—is to focus on each individual case and on ensuring that the specific case, with all of its inherent complexities, meets the goal of expeditiousness and fairness (to the extent that expeditiousness is an element of fairness).42

While not referenced, the approach proposed by the Court is similar to that adopted by the ICTY. As was the practice at the ICTY, the Court proposes developing its own case-specific projected timelines at the outset of a case based on an assessment of the case’s complexity and taking into account a wide range of other factors, risks, and possible mitigation measures, and then to track its performance against those projections.43

As the experience of the ICTY demonstrates, there are important advantages to be gained from the Court establishing projections in relation to individual cases in this manner, advantages that go above and beyond the ability to track expeditiousness. While early estimates of the amount of time that will be taken for each stage of the proceedings may be inherently unreliable, the process of setting goals for the duration of a case (and of its phases) encourages proper management, as well as the identification of and communication about problems encountered at various stages of the proceedings, with a view to addressing such problems in a timely manner.

By monitoring the progress of a case against initial (or updated) projections, the ICTY was able to demonstrate its commitment to efficiency and to explain the complexities of its proceedings. This approach has also promoted both greater transparency and accountability at the ICTY. But, more fundamentally, the adoption of time frames for the completion of cases, and a continuous eye on measures to meet those time frames, helped to create a culture of efficiency at the ICTY, involving all of the Tribunal’s organs in a shared commitment to the expeditious completion of the Tribunal’s work. It could do the same for the ICC.

IV. Taking into account perceptions and the external impact of the Court’s work will make the Court’s performance assessment more meaningful.

One of the dangers of undertaking a performance assessment that is primarily, if not exclusively, inward-looking is that it ignores external perceptions of the work of the institution. The importance of such perceptions to the ICC’s ability to carry out its work and to its legitimacy cannot be overstated. Indeed, with respect to the Court’s fourth goal (victim access to the Court), the Second Report already indicates that the Court appreciates that it cannot ignore external perceptions of its work or its constituency.44

However, much more should be done vis-à-vis the collection of data concerning perceptions and impact so as to ensure a comprehensive and holistic assessment of the Court’s performance. Nowhere is this more true than in relation to the Court’s performance in terms of fairness.

According to the Second Report, some aspects of the ICC’s work, while of central importance, can be difficult to measure in practice. This is, the Court suggests, particularly true with regard to assessment of the fairness of the Court’s proceedings, which is a component of the Court’s first key goal. Indeed, in terms of assessing the fairness of proceedings under goal one, the Court does not identify any indicators separate from those concerning the expeditiousness of the proceedings, instead suggesting that the indicators adopted in that regard “seek to measure relevant aspects” of expeditiousness and fairness taken together.45 Moreover, the concept of fairness is, in many ways, culturally bound and, as the Second Report suggests, contested.46

However, the fact that it may be difficult to measure the Court’s performance with regard to ensuring fair proceedings does not mean that the Court should abandon any effort to collect information as to this particular component of goal one, or that it should assume that fairness will be accurately and sufficiently reflected in measurements aimed at assessing the expeditiousness of the proceedings. These two principles are not synonymous, and it would be dangerously reductive to treat them as if they were. It is therefore important for the Court to gather information related to fairness in particular.

The hurdles in this regard are not insurmountable. As the Court itself recognizes in the Second Report, there are a host of different ways that fairness could be measured, such as measures related to time given to the defense.47 In addition, as the Court recognized in the First Report, potential indicators include the percentage of findings by Chambers confirming fair trial violations, and the percentage of grounds of appeal successfully arguing fair trial violations.48 The Second Report further notes that the Chambers, the Office of the Prosecutor, and the Registry all have distinctive roles and responsibilities in ensuring fairness,49 though no organ-specific performance indicators are set out in the Second Report. More could be done in this respect. Other more general measures, such as indicators designed to compare how many contested prosecution motions were granted versus how many contested defense motions were granted and the types of motions involved, might also yield important information.

But the measures just mentioned, while undoubtedly relevant, are not sufficient. Given the contested and variable nature of the concept of fairness and the environment in which the Court operates (where perceptions of the Court by all stakeholders matter), it would be particularly prudent and useful for the Court to assess not just objective markers such as those identified above but also subjective ones: perceptions of the fairness of the proceedings of the Court.

In this respect, and following the practice of many national jurisdictions, the Court should survey its constituency, including members of the public in the affected communities, counsel appearing before the Court, and others, on their perceptions of the fairness of the proceedings. While in the Second Report the Court shies away from the collection of quantitative data through the use of surveys due to resource concerns,50 with the availability of online electronic tools such as SurveyMonkey, the Court’s own assessment of relevant professional and public constituents does not need to be resource intensive. (Other possibilities involving partnerships with external bodies are discussed below and in the Second Report.)51

Accepting that fairness may mean different things to different constituents, the Court could craft particular questions to pose to particular stakeholders. For example, to counsel appearing before the Court, the question could be posed “Do you consider the proceedings were fair in the sense of each party being offered an equal opportunity to present their case?” The Court could pose the same question to other legal professionals and judges who follow the Court’s proceedings. To victims impacted by the proceedings or the population of the situation country, however, the Court could administer through the Office of Public Counsel for Victims (or seek the assistance of an on-the-ground non-governmental organization to administer) simple surveys focused on perceptions of the fairness of the Court’s proceedings in relation to a particular case. For example, victims could be asked: “Do you think the proceedings were fair to the extent that victims were given a reasonable opportunity to present their issues to the Court?”

In addition, the Court could survey other relevant stakeholders, such as members of the Assembly of States Parties, by asking questions related to definitions and perceptions of fairness, doing so once again through a simple means such as SurveyMonkey.

The collection of data regarding perceptions of fairness of the Court’s work and proceedings from a broad range of stakeholders may assist the Court in understanding how its performance is perceived by relevant stakeholders and in addressing areas of misunderstanding or dissatisfaction. This is particularly important given that perceptions of fairness of proceedings at the ICC directly impact the ICC’s legitimacy as a court.

The use of survey tools to assess performance can be applied to other goals as well. Indeed, with respect to assessing the second goal—the ICC’s leadership and management are effective—it is important that the ICC not ignore the perceptions of relevant stakeholders both external and internal to the Court, including the staff of the institution (as set forth above and below), the Assembly of States Parties, and civil society. Adopting the practice of national jurisdictions, the ICC could administer simple surveys asking questions regarding the effectiveness of the Court’s leadership and management, and the answers obtained could be a helpful means to alert the Court to perceptions and provide the opportunity for Court leadership to correct misperceptions or address criticisms and demonstrate a commitment to improving perceptions.

The collection of information from staff would be particularly meaningful in this regard. Numerous national jurisdictions recognize the impact of employee commitment and morale on the performance of an institution. Indeed, research indicates that “a high level of employee engagement—its creation and maintenance—is one of the most crucial imperatives of any successful organization” and “(i)t is characterized as a proxy for court excellence insofar as employee engagement correlates with individual, group and organizational performance in areas such as retention, turnover, productivity, customer service and loyalty.”52 Given the importance of employee commitment and morale to the success of any organization, drawing on the practice of national jurisdictions, the Court could at the very least determine through simple surveys the percentage of employees “who indicate […] that they are productively and positively engaged in the mission and work of the court.”53

V. The ICC does not need to undertake performance assessment in isolation and can productively partner with civil society, academia, and other interested parties to maximize the utility of the performance assessment process.

According to the Second Report, for “the purpose of developing Court-wide indicators, the Court needs to be modest and concentrate on a reduced number of measurable criteria that adequately reflect the overall operational performance of the Court without overburdening the exercise with too many criteria and details.”54 It is likewise noted that the development and refinement of performance indicators per organ, in relation to State cooperation and concerning other factors, will have to take place in future, if at all.55

The Court’s instinct to limit the performance assessment process in light of the ICC’s limited resources is entirely understandable. Indeed, there is a real risk for any organization adopting performance targets that those targets and related performance assessments will take on an outsized importance as compared with the institution’s core work. That said, there is also a risk that undertaking performance assessment in a way that is too modest or minimalist will undercut the overall utility of the project.

If the ICC is to make its own performance assessment process meaningful, it would do well to take specific steps to ensure that this exercise is properly resourced and prioritized, including integrating performance assessment tasks into management procedures, staff performance plans, and evaluations. Allocating clear roles and responsibilities, setting explicit objectives, ensuring adequate training, and seeking to improve upon processes and results in regular cycles would all be important ways to facilitate the performance assessment process.

More generally, and as set forth above, there is real value in ensuring that the performance assessment process be rendered more holistic, more comprehensive, and more useful now, rather than waiting for this to occur in some undefined future. A solution must therefore be swiftly found that will enable the Court to engage in a more meaningful performance assessment process without going beyond those resources that it can reasonably devote to the process.

One such approach would be to look outside of the Court and find other organizations and individuals to partner with in collecting and evaluating data related to the Court’s performance, particularly where such data relates to external factors or interlocutors. This is something that the ICC has already considered with respect to its fourth goal. In this regard, the Court advocates the collection of data from victims through the use of surveys and, due to resource concerns, notes the possibility of engaging with other bodies and institutions involved with international criminal justice to assist with this work.56 The Court could adopt a similar approach with respect to gathering relevant data in relation to many of the other aspects of its work discussed above. The collation of survey and other data in this manner by partners such as non-governmental organizations and academic institutions will allow the Court to bring greater meaning to its own data collection efforts and measures (which, as the Court itself acknowledges, must be understood in context)57 while at the same time allowing the Court to use its limited resources in the most meaningful ways possible.

Conclusion

In presenting its Second Report, the Court is careful to indicate that its development of performance indicators is a work in progress and that the Second Report is “a first attempt at an international level to provide a holistic picture of judicial activities through performance indicators” that may need to be modified “as some factors may turn out to be less relevant that others, and further indicators may need to be added.”58

Given the scope of the Court’s undertaking with regard to the development of performance indicators and its pioneering role in that respect, it is perhaps not surprising that the ICC is proposing that its approach to performance assessment will evolve and grow over time. While some degree of growth and evolution is inevitable, it is nevertheless essential that a more concerted effort be undertaken now to ensure a more holistic approach to performance assessment. A piecemeal or partial approach, or one that proceeds without having been purposefully designed to yield meaningful information, risks being misleading, of minimal utility, or both.

As set forth above, the ICC does not operate in a vacuum; it operates in a system—the Rome system—and for a performance assessment to be meaningful it must take full account of the impacts of that system on the performance of the Court as well as the Court’s impact on external stakeholders. The ICC should measure what matters, the fundamentals that are critical to the success of the Court, and do so in a way that takes account of relevant experience of other courts and relevant perceptions by stakeholders both inside and outside the Court. It is by doing all of this that the performance of the ICC as a whole can be properly assessed. And it is in doing all of this that the Court will be able to better position itself to effectuate meaningful change, to harness greater possible efficiencies, and to continue to carry out its mandate in the best way possible.

Endnotes

  1.

    The views expressed herein are those of the author alone and do not necessarily reflect the views of the organization for which the author works or the United Nations in general. I would like to thank Willow Crystal, Deputy Chef de Cabinet, for her helpful comments and edits throughout the drafting of this piece.

  2.

    International Criminal Court, Second Court’s Report on the Development of Performance Indicators for the International Criminal Court (Nov. 11, 2016) [hereinafter Second Report], available online, archived.

  3.

    See, e.g., Fatou Bensouda, ICC Prosecutor, Remarks at Tenth Plenary Meeting, Discussions on the Efficiency and Effectiveness of Court Proceedings (Nov. 24, 2015), available online, archived; International Justice Monitor, Establishing Performance Indicators for the International Criminal Court, Nov. 23, 2015, available online, archived.

  4.

    International Criminal Court, Strategic Plan 2013–2017 (Interim Update—July 2015) (Jul. 24, 2015) [hereinafter Strategic Plan], available online.

  5.

    Second Report, supra note 2, ¶¶ 5–7.

    (The four key goals identified by the Court are: “(a) The Court’s proceedings are expeditious, fair and transparent at every stage; (b) The ICC’s leadership and management are effective; (c) The ICC ensures adequate security for its work, including protection of those at risk from involvement of the Court; and (d) Victims have adequate access to the Court.”)

  6.

    Id. ¶ 7.

  7.

    Id. ¶¶ 23–24. See also id. ¶ 7.

  8.

    See International Criminal Court, Report of the Court on the Development of Performance Indicators for the International Criminal Court ¶ 12 (Nov. 12, 2015) [hereinafter First Report], available online, archived.

    (“It should also be noted that, while the Court has not at this stage tried to develop specific indicators for external factors that can affect its performance, these factors unavoidably remain relevant when evaluating performance on issues which are seen as largely under the Court’s control. In particular, the duration of cases is directly affected not only by the quality and efficiency of the Court’s work, but also by a wide range of external factors.”)

  9.

    Second Report, supra note 2, ¶ 1.

  10.

    Strategic Plan, supra note 4, ¶¶ 3.4-.6.

  11.

    See, e.g., Assembly of States Parties, Report of the Court on Cooperation, ICC-ASP/15/9 (Oct. 11, 2016), available online.

  12.

    Strategic Plan, supra note 4, ¶ 3.4.

  13.

    Id. ¶ 3.5.

  14.

    Second Report, supra note 2, ¶ 5.

    (This is the first of the four key goals identified as critical for the assessment of the ICC’s overall performance.)

  15.

    Strategic Plan, supra note 4, ¶ 3.6.

  16.

    Resource Center on Complementarity Monitoring, Int’l Nuremberg Principles Acad., available online (last visited Jun. 25, 2017).

  17.

    Second Report, supra note 2, ¶ 5(d).

  18.

    Id. ¶ 86.

  19.

    Id. ¶ 1, quoting Assembly of States Parties, Strengthening the International Criminal Court and the Assembly of States Parties, ICC-ASP/13/Res.5 at Annex I, ¶ 7(b), p. 47 (Dec. 17, 2014), available online.

  20.

    Id. n.31.

  21.

    Id. ¶¶ 44–45.

  22.

    Id. ¶ 48.

  23.

    Id. ¶ 49.

  24.

    Id.

  25.

    See Strategic Plan, supra note 4, ¶ 2.4.

    (Reference is made to a “structured follow-up on staff surveys,” which suggests that the Court is indeed surveying the morale of its employees.)

  26.

    International Criminal Court, Chambers Practice Manual, Feb. 2016, [hereinafter Manual], available online.

    (The Manual provides clarity to the parties on a range of issues in pre-trial proceedings that were unclear due to a lack of shared vision on the part of pre-trial judges as to the purpose of the pre-trial procedure and the appropriate interpretation of the procedures applicable. However, for the impact of the Manual on performance to be assessed, the Court would need to measure the time and complexity of pre-trial proceedings prior to the issuance of the Manual and after the issuance of the Manual, as discussed below.)

  27.

    Second Report, supra note 2, ¶¶ 18–19.

  28.

    Id. ¶¶ 19–20.

  29.

    Nikola Šainović, surrendered May 2, 2002, Appeal Judgement Jan. 23, 2014 (Prosecutor v. Sainović et al., Case No. IT-05-87, formerly known as Milutinović et al.); Jovica Stanišić and Franko Simatović arrested on Mar. 13, 2003, Appeal Judgement Dec. 15, 2015, retrial ordered and currently pending before the Mechanism (Prosecutor v. Stanišić & Simatović, Case No. MICT-15-96); Vojislav Šešelj, surrendered Feb. 23, 2003, appeal pending before the Mechanism (Prosecutor v. Šešelj, Case No. MICT-16-99); Jadranko Prlić et al., surrendered Apr. 5, 2004, Appeal Judgement of the ICTY expected November 2017 (Prosecutor v. Prlić et al., Case No. IT-04-74).

  30.

    Joseph Kanyabashi and Elie Ndayambaje arrested Jun. 28, 1995 in Belgium and transferred to the ICTR on Nov. 8, 1996, Appeals Judgement Dec. 14, 2015 (Prosecutor v. Pauline Nyiramasuhuko, Arsène Shalom Ntahobali, Sylvain Nsabimana, Alphonse Nteziryayo, Joseph Kanyabashi, Elie Ndayambaje, Case No. ICTR-98-42-A).

  31.

    See Dutch Arms Trafficker to Liberia Given War Crimes Conviction, The Guardian, Apr. 22, 2017, available online.

  32.

    See Second Report, supra note 2, ¶ 35.

  33.

    Id. ¶¶ 38–39.

    (The Court proposes calculating the number of motions filed by all parties and participants but does not propose identifying the issues upon which such motions are filed.)

  34.

    Id. ¶ 39(g)(ii).

  35.

    Id. ¶ 19.

  36.

    Id. ¶ 25.

  37.

    Id. ¶ 47.

  38.

    Id.

  39.

    Id. ¶ 36.

  40.

    See United Nations Security Council, Report of the Office of Internal Oversight Services, Evaluation of the Methods and Work of the International Tribunal for the Former Yugoslavia, A/70/873-S/2016/441 at ¶¶ 29–32 (May 12, 2016), available online.

  41.

    Second Report, supra note 2, ¶ 36.

  42.

    Id. ¶¶ 36–37.

  43.

    Id. ¶ 37.

    (“(T)he different phases in the ‘life’ of a case can provide working assumptions for the likely overall duration per case. If and where delays are incurred vis-à-vis the timelines set by a chamber, the reasons for such delays can be documented for purposes of transparency and lessons learnt with a view to developing improved ways over time for anticipating and managing such difficulties.”)

  44.

    Id. ¶ 78.

  45.

    Id. ¶ 34. First Report, supra note 8, ¶ 29.

    (“While fairness of proceedings cannot be directly measured, some potential indicators may be identified: (a) % of findings by Chambers confirming fair trial violations pursuant to motions of the parties; (b) % of grounds of appeals successfully arguing fair trial violations in Chamber decisions or judgments.”)

  46.

    Second Report, supra note 2, ¶ 31.

  47.

    See id. ¶¶ 31–33.

  48.

    First Report, supra note 8, ¶ 29.

  49.

    Second Report, supra note 2, ¶ 30.

  50.

    Id. ¶ 82.

  51.

    See id.

  52.

    Dan H. Hall & Ingo Keilitz, Global Measures of Court Performance, International Framework for Court Excellence, Discussion Draft Version 3, at 4 (Nov. 9, 2012), available online.

  53.

    Id.

  54.

    Second Report, supra note 2, ¶ 21 (emphasis in original).

  55.

    Id. ¶ 28.

  56.

    Id. ¶¶ 80–82.

  57.

    Id. ¶ 24.

  58.

    Id. ¶ 28.

Professor Yuval Shany, Hersch Lauterpacht Chair of Public International Law, Hebrew University of Jerusalem

An ICC Availability Bias? The Performance Indicators relating to the Four “Key Goals” Provide Useful Information on the Court’s Operations, but are of Limited Utility in Measuring the Court’s Overall Effectiveness


Summary

The four key goals identified by the ICC in 2015 and reaffirmed in a slightly revised version in a Second Report from 2016 (expeditious, fair, and transparent proceedings; effective leadership and management; adequate security; and access for victims to the Court) offer a useful starting point for evaluating the performance of the Court. They have also facilitated the collection of many relevant performance indicators: quantitative data that enables the Court, the ASP, and outside observers to track changes over time in judicial performance (and to compare it to the practice of other international criminal courts). Still, one may question how central these key goals are for evaluating the overall effectiveness of the ICC. In fact, it appears as if the key goals are a hodgepodge of process and outcome goals, which are related to different evaluative criteria (judicial effectiveness, cost-effectiveness, and efficiency) and do not sufficiently relate to the core business of the ICC, e.g., ending impunity and developing international criminal law. Moreover, one may question whether the Court is sufficiently sensitive to the risk of availability bias, which might lead to distorted evaluation of the fulfilment of the four key goals, and of the overall operations of the Court. Finally, one may question the choice of “primarily under the control of the court itself” as a reason for narrowly focusing only on the four key goals. Such a criterion results in an analysis that ignores the most important goals of the Court, and its application even to the four key goals is questionable.

Argument

The four key goals identified by the ICC in 20151 and reaffirmed in a slightly revised version in a 2016 Second Report2—expeditious, fair, and transparent proceedings; effective leadership and management; adequate security; and access for victims to the Court—offer a useful starting point for evaluating the performance of the Court. This is because they relate to four important dimensions of the Court’s operations—the manner of conduct of trials and the provision of information about them, administration of the Court as an international organization, ensuring security for those involved in the judicial process, and addressing the needs of victims. In particular, it would be beneficial for those monitoring the performance of the Court to make use of the performance indicators collected in the periodic reports issued by the Court in order to evaluate the Court’s operations. Although the Second Report is correct in cautioning against over-reliance on the data presented3 and in underscoring the importance of context,4 it does provide an interesting snapshot of the Court’s work, which allows outside observers to track changes in its performance over time and to identify possible causes for fluctuations in the data. It is regrettable, however, that the Court has not deemed it useful to use performance indicators available for other international courts, such as the ICTY and ICTR, as benchmarks for comparative assessment of the performance of the ICC in those areas where such a comparison may be particularly useful, such as length of proceedings, number of motions submitted by the parties, transparency of documents, media coverage, recruitment times, etc.

My more fundamental criticism of the exercise relates, however, to the selection of the key goals. Although the justifications provided in the Second Report for selecting the goals appear sensible—“measurable criteria” and “primarily under the control of the court itself”—and although the Second Report qualifies the exercise as “work in progress,”5 the Second Report does purport to reflect the “overall operational performance of the Court”.6 This proposition is hard to reconcile with the vocabulary of court effectiveness, which foregrounds other aspects of judicial performance and suggests that, in fact, the key goals selected may not be as central for evaluation of judicial effectiveness as the Second Report insinuates.

Arguably, an effective international court is one that fulfills the goals established by relevant stakeholders,7 and the normative expectations of the ASP, which is the principal target audience for the performance indicators reports, take pride of place in establishing the main goals of the ICC. In a previous work that I co-authored with Sigall Horovitz and Gilad Noam, we claimed that the ICC has been entrusted by its “mandate providers” (i.e., the ASP) with the following goals: ending impunity, encouraging domestic proceedings against perpetrators, generating deterrence against future crimes, promoting peace and security, internalization of international criminal law into domestic legal systems, development of international criminal law, satisfaction of victim needs, conveying a message of condemnation of international crimes, projecting an image of procedural fairness and legitimacy, and, possibly, establishing an historical record of atrocities.8 It is striking that only two of the four key goals are even included in this list and that the Court’s most prominent goals relating to the fight against international crimes (arguably its raison d’être) are excluded.

In fact, with the exception of the fourth goal, the key goals identified pertain to judicial processes and not to judicial outcomes, a distinction which further underscores their limited utility. Put bluntly, it is not clear whether fewer or more prosecution motions, or the number of court days, affects the “overall operational performance” of the Court; nor is it clear how the implementation of training programs, or the number of data security incidents, affects judicial effectiveness. Moreover, the four key goals are a hodgepodge of performance targets relating to three different evaluative criteria: judicial effectiveness (goal-attainment), cost-effectiveness (ratio between investment of resources and outcomes), and efficiency (relationship between positive impacts and costs and other negative externalities). I am of the view that evaluation of operational performance should be conducted with such evaluative criteria in mind, since conclusions based on performance indicators associated with each of these criteria are likely to be tied to very different policy reforms, involving very different methods and areas of operation: reforms geared to better attain the principal goals, reducing Pareto sub-optimal waste, preventing incidental harms, nurturing unintended benefits, etc.

Ultimately, examination of the Second Report raises the question whether the Court has been sufficiently sensitive to the risk of availability bias in performance evaluation,9 both in relation to evaluation of the fulfilment of the four key goals and in relation to other aspects of the “overall operations of the ICC”. With respect to the four goals themselves, the most acute measurement problem in the Second Report involves the notion of fairness, on which the approach taken is confusing. On the one hand, the Second Report notes that “care must be taken to balance speed with fairness”;10 but, on the other hand, it concludes that:

(D)iscussions at Glion and other meetings also highlighted that the concepts of expeditiousness and fairness are in fact intertwined and affecting each other, and that relevant indicators may either relate to both or that fairness-related values many need to be read in light of expeditiousness and vice versa.11

A close look at the Second Report suggests that an important factor contributing to the decision to collapse the review of fairness and expeditiousness has been the acknowledged difficulty of measuring fairness.12 So, despite the grave doubts as to whether performance indicators measuring fairness and expeditiousness are actually correlated,13 the Second Report eventually measures them together. This looks like availability bias writ large!

Moreover, as indicated above, the Second Report does not contain information that could shed light on the “overall operations of the Court” in relation to some of its other principal goals, such as ending impunity and developing international criminal law. To the extent that this omission stems from the associated measurement problems (which are, no doubt, formidable), it is understandable. Still, the Court should then have scaled down even further any claims regarding the ability of the four key goals to produce an overall picture of the Court’s performance and clarified that they merely provide some interesting data on some aspects (not necessarily the most important aspects) of the Court’s work. In this context, it is particularly regrettable that the Second Report lacks easily available data on preliminary examinations (such as duration and investigative activities) and general outreach activities, which could assist in evaluating the attainment of ICC goals other than the four key goals.

Finally, one may question the choice of “primarily under the control of the court itself” as a reason for focusing only on the four key goals: not only does insistence on such a criterion lead to the omission from the analysis of some of the most important goals of the Court (e.g., ending impunity and internalization of international criminal law by State parties), but even its application to the four key goals is questionable. As the Second Report indicates, expeditiousness of proceedings and implementation of security measures often depend on cooperation by third parties,14 and it is not clear why such aspects, and not others, remain “primarily under the control of the court itself”. Furthermore, it is doubtful that a meaningful performance evaluation of a Court, one of whose main “claims to fame” is that it is situated at the heart of a new international criminal law legal system,15 can be undertaken without addressing its impact on other elements of the system. To the contrary, it may be argued that the ability of the Court to secure legal cooperation from States, and to motivate them to take actions necessary to advance the goals of the Court (introduction of new criminal legislation, initiation of criminal proceedings or referrals, and transfer of suspects), is one of the most important indicators of the Court’s performance.

To conclude, the compilation of performance indicators by the ICC is a useful exercise. It may help the Court itself, the ASP, and other outside observers evaluate certain aspects of the Court’s work with a view to forming an opinion on the Court’s effectiveness, cost-effectiveness, and efficiency. The picture generated by the Second Report of 2016 is, however, very limited. It focuses on the attainment of four key goals, while omitting reflection on no less important goals of the institution. The Second Report also over-relies on quantitative data, which raises concerns of an availability bias. Moving forward, it would be desirable for the Court to expand the list of goals whose attainment is being monitored, with a view to including more central “outcome goals,” including those over which the Court exercises only partial degrees of control. It would also be desirable for the Court to supplement the quantitative analysis it offered in the Second Report with qualitative research that investigates perceptions of key aspects of the Court’s performance. This research should cover the Court’s effectiveness and legitimacy in the eyes of core constituencies, including defense teams, victims, and outside observers.

Endnotes

  1.

    International Criminal Court, Report of the Court on the Development of Performance Indicators for the International Criminal Court (Nov. 12, 2015), available online, archived.

  2.

    International Criminal Court, Second Court’s Report on the Development of Performance Indicators for the International Criminal Court (Nov. 11, 2016), available online, archived.

  3.

    Id. at 6.

    (“For the purpose of developing Court-wide indicators, the Court needs to be modest and concentrate on a reduced number of measurable criteria that adequately reflect the overall operational performance of the Court without overburdening the exercise with too many criteria and details. Some aspects, while central to key goals of the institution, are very difficult to measure in practice.”)

  4.

    Id.

    (“(A)ll Court-wide performance indicators need to be read and evaluated in their specific context, particularly where they relate to case-specific performance.”)

  5.

    Id. at 7.

    (“The Second Report continues to be work in progress in light of the fact that it is indeed a first attempt at an international level to provide a holistic picture of judicial activities through performance indicators.”)

  6.

    Id. at 6.

  7.

    For a discussion, see Yuval Shany, Assessing the Effectiveness of International Courts 13–16, 31–35 (2014).

  8.

    Id. at 226–237 (with Sigall Horovitz & Gilad Noam).

  9.

    See Michael J. Albers, Introduction to Quantitative Data Analysis in the Behavioral and Social Sciences 209 (2017). The term “availability-bias” was coined by Tversky and Kahneman to describe distortions in decision making through over-reliance on readily available information. Amos Tversky & Daniel Kahneman, Availability: A Heuristic for Judging Frequency and Probability, 5 Cognitive Psychology 207 (Sep. 1973), available online, archived.

  10.

    Second Report, supra note 2, at 8.

  11.

    Id.

  12.

    Id. at 6.

    (“This is particularly the case of fairness, which may be very difficult to measure as such and would require great efforts to identify relevant proxy values instead.”)

  13.

    Id.

    (“Expeditiousness and fairness are also examples of potentially conflicting goals, reflecting the difficulties of measuring the performance of a judicial institution in qualitative terms.”)

  14.

    Id.

    (“External factors such as local security conditions and the cooperation of local and international partners can however have a substantial impact on the Court’s ‘own’ performance indicators.”)

    See also id. at 8.

    (“The duration of each case is affected by a number of case-specific factors such as … cooperation of States in providing needed assistance, and the speed with which such assistance is provided.”)

  15.

    See, e.g., David Tolbert, Stocktaking: Peace and Justice, Review Conference of the Rome Statute, ICC Doc. RC/ST/PJ/M.6, at 8 (Jun. 1, 2010), available online.

    (“The Rome Statute and the ICC form part of a new legal order”);

    M. Cherif Bassiouni, Introduction to International Criminal Law 25 (2d ed. 2013).

Carsten Stahn, Ph.D., LL.M., Professor of International Criminal Law & Global Justice, Leiden University

Is ICC Justice Measurable? Re-Thinking Means and Methods of Assessing the Court’s Practice


Summary

The effectiveness of international criminal justice is challenged from many sides.1 There is an increasing trend among international institutions to respond to critiques through the development of performance indicators. The ICC started to formalize indicators for its operation in 2015. Such instruments seek to reply to efficiency critiques, such as concerns related to cost or the length of proceedings. They are a double-edged sword. There is a risk that they trivialize the complex nature of international criminal justice. They face many methodological challenges. The effects of the ICC are more complex and diffuse than anticipated. Some of the most important effects of ICC justice are not measurable or quantifiable. Performance indicators are inherently linked to macro goals. There is a need to broaden perspectives. The assessment of the ICC requires a holistic account which views institutional performance in the context of systemic considerations and perceptions by a variety of stakeholders. The ICC not only matters to states or global audiences, but mostly to affected societies. It is thus important to determine what justice goals are important for local communities. In many cases, the importance of ICC proceedings lies not only in the production of certain judicial outcomes (i.e. cases, trials, reparation), but in the transformation of certain normative discourses, the creation of common discursive spaces, or the initiation of longer-term processes. Complementarity is an important indicator for the success of the Court that complements other factors such as fairness, independence, and accessibility of justice.

Argument

I. Introduction

Measuring effects of the ICC is a difficult task. For a long time, it was taken for granted that international criminal justice produces beneficial effects for accountability, such as delivering effective justice, conducting fair trials, enhancing domestic capacity, or contributing to the creation of an international rule of law. International criminal justice was largely a faith-driven project.2 Deeper and critical inquiry into the effects of international criminal courts and tribunals has only started with growing critiques and concerns about the performance of international criminal courts and tribunals, and doubts as to the extent they are able to meet expectations. In this context, it has become more common to assess performance and validity of courts against quantitative or technical criteria, such as economic cost-benefit analysis and rational source allocation.3

Individual courts and tribunals, such as the ICTY, ICTR, or the SCSL, have addressed performance and impact factors as part of their completion and legacy strategy.4 The ICC started to develop performance indicators in 2015. It has taken a relatively modest point of departure. It has identified four key goals as reference: expeditiousness, fairness, and transparency of proceedings; effective leadership and management; adequate security of its work; and adequate access of victims to the Court.5 Efforts inside criminal courts and tribunals are complemented by emerging studies on the legal and societal impact of the practice of the ICC.6 Some studies have found that trials have a positive effect on human rights practices by virtue of a number of normative (social alarm, demonstration effect) and coercive factors, like punishment and enforcement.7 International criminal justice has been credited with providing visibility to the plight of hundreds of thousands of victims of crimes. But lenses and indicators to assess impact remain contested. There is no one-size-fits-all formula.8 Anthropologists, such as Sally Engle Merry, caution against the temptation to measure effects of accountability mechanisms in quantitative terms.9 Another warns that statistical references may easily turn into an advocacy tool.10

This contribution discusses some of the merits and weaknesses of existing ICC approaches to assessing impact and performance. It analyzes the ICC’s choice of indicators, their link to macro goals, comparators for ICC performance, and the role of context. It argues that some fundamental premises of ICC approaches deserve re-consideration. It concludes with some recommendations to improve the status quo.

II. Indicators: From a 2D to a 3D vision

The existing ICC framework is largely ICC-centric. It is heavily focused on ICC operations. It is two-dimensional in its approach. It assesses ICC operations from two main angles: (i) input, i.e. the relevance of certain procedural and operational elements underlying ICC action (e.g., inclusiveness, transparency of proceedings); and (ii) output, i.e. fair and effective outcomes.11 It largely misses an essential third dimension, namely the Court’s relation to affected constituencies. This dimension is of key importance. Performance assessment should not be a one-way street or a self-serving exercise of performance validation. It requires a relational account that considers how justice is communicated and perceived. The ICC matters to states and global audiences, but most of all to the countries and societies where the crimes are committed. For instance, the fact that a trial constitutes a success from the perspective of the ICC does not necessarily imply that it brings justice for local communities. Similarly, fairness (e.g., the fair treatment of participants in the process and the equal and unbiased application of norms and standards) is often as much about action as it is about perception. Thus far, such a relational perspective is only partially taken into account in ICC policy, namely in relation to the Second Report’s Goal 3 (victim access to the Court).12

The narrow focus on input and output stands in contrast with the experience of ICC practice. The effects of ICC action are far more complex and diverse than the Rome Statute drafters expected.13 The Court has produced many unanticipated effects and certain outcomes that go beyond mere rationalization. The virtue of ICC engagement is often more abstract and diffuse than suggested. Its strength lies in the affirmation of respect for law, the reinforcement of certain legal or moral norms, broader narrative or didactic functions, or even symbolic functions (e.g., recognition as a victim, collective reparation).14 Measuring performance simply in terms of the number of ICC cases, or the expeditiousness, fairness, or inclusiveness of proceedings, fails to take this complexity into account. For instance, it does not explain why ICC proceedings may enjoy high societal relevance, although they are highly selective or do not even result in ICC cases. It struggles to explain why ICC decisions have strong normative authority, even though they might suffer from non-compliance. Many outcomes of ICC actions are not concrete results, but processes. For example, a key effect of ICC proceedings may lie in their broader transformative effect on justice discourses or the creation of common discursive spaces. These nuances are sidelined in the existing policy framework.

The Open Society Justice Initiative has suggested a wider taxonomy of indicators. It includes three main types: “operational indicators” that are geared toward assessing the court’s operations, broader “systemic indicators” that view ICC operations in the broader context of the Rome Statute as a system of justice, and “impact indicators” that take into account “the degree to which people affected by the crimes […] understand and engage with the process, as well as the court’s legacy.”15 Such a 3D vision provides a richer and more nuanced account of the complexity of ICC justice.

III. Goal Relevance

The formulation of indicators is useful if they are not merely technical, but tied to progress towards certain macro goals. Cost-benefit analysis of institutions typically focuses on quantitative factors, such as the number of trials, the number of convictions or acquittals, or the length of proceedings. But the value of the ICC goes beyond the quality or fairness of ICC trials. There is a need to place the assessment in perspective in relation to the goals of the ICC.

In this context, it is important to distinguish the functions of the institution as a whole from the objectives of specific proceedings. The ICC as an institution pursues certain broad objectives,16 such as retribution (i.e. prosecution and punishment),17 deterrence,18 prevention, elucidation of facts,19 justice for victims,20 or strengthening of domestic jurisdiction.21 They are related to the diverse functions of the Court and the architecture of the Rome Statute. They are optimization commands linked to the systemic environment of the ICC. Their assessment depends on a balancing of objectives and the perception of different stakeholders. Proceedings have a more limited function. They give effect to rights, or pursue specific rationales, such as determining guilt or innocence or revealing a “legal,” rather than a broader “historical truth.”22 The assessment of operational performance cannot be fully isolated from the underlying macro goals of the institution. They shed a differentiated light on timeframes and performance.

A good example is the assessment of the length of proceedings. This assessment differs if pace is related not only to criminal adjudication, but also to other objectives such as fact-finding, the establishment of a record, or transformative goals. A figure of four to five years may appear long for a trial. But it is less troubling if it is associated with a broader process of clarifying historical context. In some cases, from an effectiveness point of view, it may even be desirable to postpone charges in order to gradually build lines of responsibility, or to improve the accuracy of charges or the completeness of justice.23 The passage of time may thus, in some circumstances, represent an asset and result in a better pursuit of justice. Some of the purported transformative goals, such as capacity-building or reconciliation, cannot be reached without longer-term engagement since they are contingent on recovery and stabilization. The weighing of these goals may require the ICC to balance expeditiousness against the need for expending time.

It is thus unrealistic to define absolute benchmarks. The main challenge is rather to define acceptable limits of tolerance and to set operational performance within the context of ICC goals.

IV. Relevant Comparators

The assessment of performance depends on the relevant comparator. The problem is that there is no direct reference point. As the Court has rightly noted, it is misguided to compare the length of ICC investigations or trial statistics to traditional domestic cases.24 A more appropriate comparison is with cases of other international criminal courts and tribunals or transnational crime cases. Such cases are not directly comparable25 but may provide approximations. From this perspective, the actual length of proceedings may be less dramatic than conventional wisdom suggests. The length of international criminal proceedings is driven by a wide range of factors, such as the context-driven nature of atrocity crime, the scope and complexity of the charges, the level of responsibility of defendants, the number of suspects, the availability of evidence, or the number of motions filed. A comparative study has shown that international cases are only “modestly slower” than complex cases in domestic settings.26 For instance, in complex transnational crime cases, it is not unusual for proceedings to take between five and eight years from investigation to completion.27

V. Context

One key dilemma of assessing ICC performance is the unpredictability of outcomes and the indeterminacy of causal pathways. The operation of indicators is heavily influenced by contextual factors, including factors that are beyond the control of the ICC.

Many of the existing ICC indicators, such as fairness of proceedings, security, or access of victims to ICC justice, cannot be assessed separately from other influences, or result from a combination of factors. For instance, the Katanga case has shown that ICC operation may be a partial success in relation to one factor (e.g. victim participation), but doubtful in relation to others (e.g. fairness to the defendant).28 Indicators, such as security or access of victims to justice, depend on the timing of ICC intervention (e.g. in conflict or after conflict), relevant State support, the length of ICC engagement, and capacity/resources. It remains difficult to deal with the role of unintended effects.29 Some of these effects, such as the mobilizing of civil society or domestic alliances to increase accountability, are positive.30 Others are negative (e.g. a risk of derailing peace negotiations, rising victim expectations, or the “mimicking” of ICC processes at the domestic level).31 Neither all positive nor all negative effects can be solely attributed to the ICC. For instance, even a perfectly run ICC trial with inclusive reparation might fail to reconcile tensions among victim groups, since the perception of reality by these groups is heavily shaped by certain emotional or rational factors (e.g. prior attitudes, beliefs, narratives) that impede engagement with other views.32 There is thus a need to differentiate among measurable and non-measurable outcomes, intended and unintended effects, and long-term and short-term impacts, and to develop criteria determining the correlation of cause and effect.

A weakness of the ICC framework is that it fails to differentiate sufficiently between different stages of proceedings. The current focus is selective. It relates to criminal proceedings, i.e. pre-trial and trial. But much of the activity of the Court takes place before the start of a case. For example, preliminary examinations have turned into one of the most important areas of activity of the Court.33 The drafters of the Rome Statute anticipated that preliminary examinations would mark a gateway to investigations and cases. But practice has shown that preliminary examinations have a genuine function of their own, even if they do not culminate in cases before the ICC.34 They may draw attention to atrocities or serve as an incentive for domestic investigations and prosecutions.

The concept of complementarity deserves a more prominent role in the assessment of ICC justice. It marks a key indicator to determine to what extent the system of the Court functions in context. The ICC has taken some steps in this direction. It has implicitly recognized that complementarity provides a means to guide “exit” from situations.35 It has noted that the “ability of the domestic jurisdiction to exercise its primacy over the crimes” requires attention in the “progressive completion of the ICC’s activities in a situation.”36 But the role of complementarity extends beyond “exit.” It is a recurrent feature from preliminary examination until trial. Former ICC Prosecutor Luis Moreno-Ocampo alluded to the importance of complementarity when noting that the “number of cases before the Court should not be a measure of its efficiency,” but rather the regular functioning of national institutions.37 The idea that the success of the Court is related to its complementarity “footprint” deserves further clarification. Complementarity is not necessarily an institution-related indicator. But it may serve as an important systemic indicator related to the functioning of the Rome Statute. The key test is not necessarily whether states have laws on the books that implement or copy ICC crimes and procedures, but rather to what extent they “internalize” them in equivalent form.38 Experience from ICC situations suggests that mere domestication of laws and procedures (e.g. through lawmaking) does not ensure effective justice.39 It is important to assess whether and how they are effectively applied in the domestic realm by legislative, executive, or judicial bodies.

VI. Some Recommendations

The attempt to develop indicators for an assessment of performance is a double-edged sword. It is driven by a desire to counter some of the critiques that the ICC faces. But it faces many methodological challenges. It should not be used as an instrument to market ICC performance40 or to assert its superiority. One risk of the existing framework is that it trivializes ICC justice. Some caution is required in order to avoid counter-productive side effects.

First, it is important to resist the temptation to quantify all aspects of ICC practice. Quantification may detract from the value of the ICC. It is necessary to go beyond mere technical analysis in order to understand the Court’s diverse impact. Some of the most important contributions of the ICC have been on a broader normative level—i.e. through social alarm, expressivist functions, demonstration effects, impact on discourse, etc.—rather than actual enforcement. These contributions easily get lost in economic cost-benefit analysis.

Second, the existing indicators should be related more clearly to the nature and purpose of the ICC. The focus on operational practices is too narrow. A deeper understanding of ICC justice requires a more holistic approach, including consideration of systemic factors. Complementarity plays a key role. As Valerie Arnould has argued:

(Q)uestions of the ICC’s success should be approached by exploring the local changes it produces and how this interacts with broader processes of social, political, and institutional change at the local level in which the ICC’s operations are necessarily embedded.41

Third, the differences between different stages of the proceedings need to be articulated more clearly. Factors such as fairness, expeditiousness, or access of victims to justice, apply in different forms during preliminary examination, investigation, pre-trial, or trial stages. They must therefore be assessed by partly different methods.

Fourth, impact assessment is not a one-way street. It requires dialogue and engagement with affected entities. External perspectives serve as an important check on the Court’s goals and ambitions. Outreach, demonstration of misunderstandings, or identification of inflated expectations alone do not suffice to address ICC critiques. It is necessary to better analyze how perceptions about the ICC are formed and by what factors they are shaped. The existing framework is strongly oriented towards states and international audiences. A key challenge is to inquire more deeply what justice goals matter for local communities.

Endnotes

  1.

    See e.g., Ralph Zacklin, The Failings of Ad Hoc International Tribunals, 2 J. Int’l Crim. Just. 541 (2004), Oxford Academic paywall, ResearchGate paywall; William Schabas, The Banality of International Justice, 11 J. Int’l Crim. Just. 545 (2013), available online.

  2.

    Carsten Stahn, Between ‘Faith’ and ‘Facts’: By What Standards Should We Assess International Criminal Justice?, 25 Leiden J. Int’l L. 251 (2012), earlier version (Oct. 31, 2011), available online, archived; David Koller, The Faith of the International Criminal Lawyer, 40 N.Y.U. J. Int’l L. & Pol. 1019 (2008), Lexis/Nexis paywall.

  3.

    See Yuval Shany, Assessing the Effectiveness of International Courts (2014).

  4.

    On the ICTY, see Diane F. Orentlicher, OSJI, Shrinking the Space for Denial: The Impact of the ICTY in Serbia (May 1, 2008), available online. On the SCSL, see Antonio Cassese, Report on the Special Court for Sierra Leone Submitted by the Independent Expert, Dec. 12, 2006, available online; Office of the UN High Commissioner for Human Rights, Maximising the Legacy of Hybrid Courts, HR/PUB/08/2 (2008), available online.

  5.

    International Criminal Court, Report of the Court on the Development of Performance Indicators for the International Criminal Court (Nov. 12, 2015), available online, archived; International Criminal Court, Second Court’s Report on the Development of Performance Indicators for the International Criminal Court (Nov. 11, 2016) [hereinafter Second Report], available online, archived.

  6.

    A number of situation-specific surveys have been conducted over the past years. See Phuong Pham, Patrick Vinck, Marieke Wierda, Eric Stover & Adrian di Giovanni, ICTJ et al., Forgotten Voices: A Population-Based Survey on Attitudes About Peace and Justice in Northern Uganda (Jul. 2005), available online; Phuong Pham, Patrick Vinck, Eric Stover, Andrew Moss, Marieke Wierda & Richard Bailey, UC Berkeley HRC et al., When the War Ends: A Population-Based Survey on Attitudes about Peace, Justice, and Social Reconstruction in Northern Uganda (Dec. 2007), available online; Patrick Vinck, Phuong Pham, Suliman Baldo & Rachel Shigekane, UC Berkeley HRC et al., Living with Fear: A Population-based Survey on Attitudes about Peace, Justice, and Social Reconstruction in Eastern Democratic Republic of Congo (Aug. 2008), available online; Stephen Smith Cody, Eric Stover, Mychelle Balthazard & K. Alexa Koenig, UC Berkeley HRC, The Victims’ Court?: A Study of 622 Victim Participants at the International Criminal Court (2015), available online.

  7.

    Kathryn Sikkink, The Justice Cascade: How Human Rights Prosecutions are Changing World Politics (2011); Hun Joon Kim & Kathryn Sikkink, How Do Human Rights Prosecutions Improve Human Rights After Transition?, 7 IJHRL 69 (2012–2013), available online. For a critique, see Padraig McAuliffe, The Roots of Transitional Accountability: Interrogating the ‘Justice Cascade’, 9 Int’l J. L. Context 106 (2013), Cambridge Journals paywall.

  8.

    Oskar N.T. Thoms, James Ron & Roland Paris, State-Level Effects of Transitional Justice: What Do We Know?, 4 IJTJ 329 (Aug. 31, 2010), Oxford Academic paywall.

  9.

    Sally Engle Merry, Measuring the World: Indicators, Human Rights, and Global Governance, 52 Current Anthropology 83 (Apr. 2011), available online.

  10.

    McAuliffe, supra note 7, at 106 (“advocacy cascade”).

  11.

    For a broader account, see Barbara M. Oomen, Justice Mechanisms and the Question of Legitimacy: The Example of Rwanda’s Multilayered Justice Mechanisms, in Building a Future on Peace and Justice: Studies on Transitional Justice, Peace and Development: The Nuremberg Declaration on Peace and Justice 175 (Kai Ambos, Judith Large & Marieke Wierda Eds., 2009), available online.

  12.

    Second Report, supra note 5, ¶ 78.

  13.

    Christian De Vos, Sara Kendall & Carsten Stahn, Eds., Contested Justice: The Politics and Practices of International Criminal Court Interventions (2015) [hereinafter Contested Justice], available online; Rome Statute of the International Criminal Court, Adopted by the United Nations Diplomatic Conference of Plenipotentiaries on the Establishment of an International Criminal Court, Jul. 17, 1998, UN Doc. A/CONF.183/9 [hereinafter Rome Statute].

  14.

    Mark A. Drumbl, Atrocity, Punishment and International Law 12 (Jun. 2007); Robert D. Sloane, The Expressive Capacity of International Punishment: The Limits of the National Law Analogy and the Potential of International Criminal Law, 43 Stan. J. Int’l L. 39 (2007), available online; Mirjan Damaska, What is the Point of International Criminal Justice?, 83 Chi.-Kent. L. Rev. 329 (2008), available online.

  15.

    Open Society Justice Initiative, Briefing Paper: Establishing Performance Indicators for the International Criminal Court 4 (Nov. 2015), available online, archived.

  16.

    In 2004, the UN Secretary-General outlined a list of broadly defined goals. They include: retribution (i.e. bringing responsible perpetrators to justice), ending violations and preventing their recurrence, securing justice and dignity for victims, establishing “a record of past events,” promoting national “reconciliation,” “re-establishing the rule of law,” and contributing to the “restoration of peace.” See Report of the Secretary-General, The Rule of Law and Transitional Justice in Conflict and Post-Conflict Societies, ¶ 38, UN Doc. S/2004/616 (Aug. 23, 2004), available online.

  17.

    Possible indicators include: number of trials, objectivity and fairness of proceedings, adequate sentencing for crimes, domestic cooperation.

  18.

    Possible indicators include: changes in military strategy of armed forces and civil/military personnel (cost-benefit analysis), number of human rights violations, enforcement of decisions by domestic agents and judiciaries, political and social support for perpetrators, number of arrests, limitation of safe havens.

  19.

    Possible indicators include: substantiated record of incidents and events in charges and decisions, accommodation of different narratives of conflict in charges and testimony, societal responses to judicial decisions.

  20.

    Possible indicators include: status and treatment of victims in judicial proceedings, post-testimony satisfaction of victims, satisfaction with reparation, mobilisation of other alternative forms of justice.

  21.

    Possible indicators include: implementation of crimes in domestic law, public confidence in the domestic justice system, reform of domestic institutions.

  22.

    See generally Richard A. Wilson, Writing History in International Trials (May 2011).

  23.

    Alex Whiting, In International Criminal Prosecutions, Justice Delayed Can Be Justice Delivered, 50 Harv. Int’l L.J. 323, 331–3 (Jun. 2009), available online.

  24.

    Second Report, supra note 5, ¶ 19.

  25.

    There are various differences between the ICC and the ad hoc tribunals. They relate to the selection and scope of situations; the application of complementarity (Rome Statute, Arts. 17-19); the investigative mandate, including the obligation to investigate both incriminating and exonerating circumstances (Art. 54(1)); the confirmation hearing (Art. 61); the participation of victims at the various stages of the proceedings (Art. 68(3)); and the need to provide for reparation proceedings (Art. 75).

  26.

    Jean Galbraith, The Pace of International Criminal Justice, 31 Mich. J. Int’l L. 79, 142 (Nov. 2009), available online.

  27.

    See European Commission for the Efficiency of Justice, Length of Court Proceedings in the Member States of the Council of Europe Based on the Case Law of the European Court of Human Rights (2d ed., Jul. 31, 2011), available online.

  28.

    For analysis, see Carsten Stahn, Justice Delivered or Justice Denied? The Legacy of the Katanga Judgment, 12 J. Int’l Crim. Just. 809 (Sep. 2014), Oxford Academic paywall, ResearchGate paywall.

  29.

    For a study of unintended effects in peace operations, see Chiyuki Aoi, Cedric de Coning & Ramesh Thakur, Eds., Unintended Consequences of Peacekeeping Operations 6 (2007), available online. On unintended effects in the ICC context, see Alejandra Espinosa, Exploring the Unintended Effects of ICC Intervention on the Domestic Politics of the DRC, Sudan, and Kenya, Special Working Paper Series on ‘Unintended Effects of International Cooperation’ (Jan. 2017), available online.

  30.

    Geoff Dancy & Florencia Montal, Unintended Positive Complementarity: Why International Criminal Court Investigations Increase Domestic Human Rights Prosecutions, Am. J. Int’l L. (forthcoming 2017), SSRN paywall. Earlier version (Jan. 20, 2015), available online, archived.

  31.

    Sarah Nouwen has shown that complementarity becomes ambivalent, if domestic actors simply imitate the normative and procedural universe of the ICC. See Sarah M. H. Nouwen, Complementarity in the Line of Fire: The Catalysing Effect of the International Criminal Court in Uganda and Sudan 413 (Nov. 7, 2013). See also Christian De Vos, All Roads Lead to Rome: Implementation and Domestic Politics in Kenya and Uganda, in Contested Justice, supra note 13, at 379, 402–407.

  32.

    On the ICTY, see Marko Milanović, The Impact of the ICTY on the Former Yugoslavia: An Anticipatory Postmortem, 110 Am. J. Int’l L. 233 (2016), available online.

  33.

    Office of the Prosecutor, International Criminal Court, Policy Paper on Preliminary Examinations (Nov. 2013), available online, archived.

  34.

    See Carsten Stahn, Damned If You Do, Damned If You Don’t: Challenges and Critiques of ICC Preliminary Examinations, J. Int’l Crim. Just. (forthcoming Apr. 2017), available online.

  35.

    Elizabeth Evenson & Alison Smith, Completion, Legacy and Complementarity at the ICC, in The Law and Practice of the International Criminal Court 1259 (Carsten Stahn ed. 2015).

  36.

    Assembly of States Parties, Report of the Court on Complementarity: Completion of ICC Activities in a Situation Country, ICC-ASP/12/32, ¶ 29 (Oct. 15, 2013), available online.

  37.

    Luis Moreno-Ocampo, Ceremony for the Solemn Undertaking of the Chief Prosecutor of the International Criminal Court, 2 (Jun. 16, 2003), available online.

  38.

    On complementarity and norm internalization, see Jann K. Kleffner, Complementarity in the Rome Statute and National Criminal Jurisdictions 309 (2008).

  39.

    See supra note 31.

  40.

    Christine Schwöbel, The Market and Marketing Culture of International Criminal Law, in Critical Approaches to International Criminal Law—An Introduction 264 (Christine Schwöbel ed., 2014).

  41.

    Valerie Arnould, Rethinking what ICC Success Means at the Bemba Trial, openDemocracy, Sep. 14, 2016, available online (last visited Jun. 28, 2017).