On the Mechanisms of Idea Evaluation and Selection

  • Cui, Zhijian (PI)
  • Dong, Jiyang (Co-PI)
  • Borucki, Chet (Co-PI)
  • Baraboshkin, Vladimir (Other participant)
  • Kozhabek, Assemgul (Other Faculty/Researcher)
  • Saginova, Nurgul (Administrative and clerical specialist)

Project: FDCRGP

Project Details

Grant Program

Faculty Development Competitive Research Grant Program 2018-2020

Project Description

The concept of open innovation has gained prominence in both research and practice as firms across industries open up their innovation processes (Chesbrough 2006, Terwiesch and Xu 2008, Terwiesch and Ulrich 2009, Bockstedt et al. 2016). Through open innovation communities, both internal and external players have become integral members of the innovation process (von Hippel 2005), helping to generate thousands of creative ideas (Girotra et al. 2010, Poetz and Schreier 2012).
Due to resource constraints, firms cannot pursue all generated ideas. Instead, only a small portion of the most promising ideas are selected after evaluation (Terwiesch and Xu 2008, Terwiesch and Ulrich 2009). For example, in the entertainment industry, of 300 “idea scratches”, only 5–6 survive and are commercialized into films (Terwiesch and Ulrich 2009). Other research finds that, at most, 10–30% of the ideas from open innovation engagements are eventually considered by firms (Klein and Garcia 2015). This implies that firms need managerial guidance on evaluating and selecting the most promising ideas.
In practice, different idea evaluation processes have been observed in the open innovation community. Three processes are the most popular: scoring, ranking and voting.
In the scoring process, evaluators are asked to rate the quality of each idea by assigning it a score (for example, a number from 0 to 100). The scores provided by the evaluators are then aggregated according to certain rules, and the ideas that receive the highest scores are selected. The scoring process has been widely applied in project management and research proposal evaluation (Dahan and Hauser 2002, Toubia and Florès 2007, Dahan et al. 2010).
In contrast with the scoring process, the ranking process does not require the evaluator to assign a score to each specific idea. Instead, the evaluator simply orders all ideas according to their perceived qualities. The rankings made by the evaluators are then aggregated, and the ideas ranked at the top of the list are selected. The ranking process is also popular in practice. For example, in its weekly design contests, Threadless decides which T-shirt designs to produce by selecting the entries that receive the highest overall ranks (Malone et al. 2010).
The third process often used by firms is the voting process, in which each evaluator may vote for a certain number of ideas. Unlike in the ranking and scoring processes, the evaluator’s judgment in the voting process is binary: yes or no. The evaluator does not further differentiate the quality of ideas that receive “yes” votes. Ideas are then ordered according to the number of votes they receive, and those receiving the most votes are selected. For example, in the open innovation contest “Osram LED—Emotionalize Your Light”, users were asked to vote for a set of design entries, and the top five entries receiving the most votes were funded (Cruz-Cunha 2012).
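To make the three aggregation rules concrete, the following is a minimal Python sketch. The function names, the 0–100 scale, the Borda-style rank aggregation, and the simple mean/vote-count rules are our illustrative assumptions, not the exact rules used by any platform cited above:

```python
import numpy as np

def select_by_scoring(scores, k=1):
    # scores: (evaluators x ideas) array of ratings, e.g., on a 0-100 scale.
    # Average each idea's scores and keep the k highest-scoring ideas.
    return np.argsort(scores.mean(axis=0))[::-1][:k]

def select_by_ranking(rankings, k=1):
    # rankings: (evaluators x ideas) array; entry (j, i) is the rank
    # (1 = best) that evaluator j assigns to idea i.
    # Borda-style aggregation: rank r earns (n_ideas - r) points.
    n_ideas = rankings.shape[1]
    return np.argsort((n_ideas - rankings).sum(axis=0))[::-1][:k]

def select_by_voting(votes, k=1):
    # votes: (evaluators x ideas) 0/1 array of yes/no votes.
    # Ideas with the most "yes" votes are selected; ideas receiving
    # a vote are not differentiated further.
    return np.argsort(votes.sum(axis=0))[::-1][:k]

# Example: 3 evaluators scoring 4 ideas on a 0-100 scale.
scores = np.array([[60, 80, 55, 70],
                   [65, 75, 50, 72],
                   [58, 82, 60, 68]])
print(select_by_scoring(scores, k=2))  # indices of the two best ideas
```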
Unfortunately, very little existing research on idea evaluation has investigated the efficacies of the different processes being used to evaluate new ideas (King and Lakhani 2013). A process itself does not depend on individual experts and therefore could yield more robust evaluation accuracy across different situations. Understanding the efficacy of different evaluation processes is therefore crucial for the success of open innovation. Aiming to provide new insights concerning idea evaluation processes, we ask the following questions: What factors determine the efficacy of the abovementioned idea evaluation processes? Which process works best under what conditions? Within the scope of this project, evaluation efficacy refers to the likelihood of selecting the highest quality ideas.
Second, and related to the first research question, this project aims to examine the impact of the “wisdom of crowds” on idea evaluation. The extant innovation management literature argues that when many evaluators are involved in the evaluation process, individual inputs and information are aggregated and errors can systematically cancel out. In the extreme, when the crowd is sufficiently large, the “best” idea can always be selected. This effect is referred to as the “wisdom of crowds” (Surowiecki, 2004). Empirically, how the wisdom of crowds affects the accuracy of idea evaluation has never been examined in depth. Specifically, it remains unclear under what conditions the wisdom of crowds is effective and how “large” the crowd must be in order to achieve the desired idea evaluation accuracy. We aim to fill this academic void and contribute new results to this stream of literature.
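One standard way to make the error-cancellation argument precise (the notation here is ours, not Surowiecki's): suppose evaluator j's score for idea i is the idea's true quality plus independent, zero-mean noise. Then, by the law of large numbers,

```latex
s_{ij} = q_i + \varepsilon_{ij}, \qquad
\bar{s}_i = q_i + \frac{1}{n}\sum_{j=1}^{n} \varepsilon_{ij}
\;\xrightarrow{\; n \to \infty \;}\; q_i ,
```

so with a large enough crowd of independent evaluators, the aggregated scores recover the true quality ordering, and the best idea is selected with probability approaching one. Surowiecki's conditions (diversity, independence, decentralization) map onto the independence and zero-mean assumptions on the noise terms.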
Third, the extant innovation management literature has extensively examined the characteristics of evaluators that relate to idea evaluation efficacy (Kornish and Hutchison-Krupat, 2017), including knowledge background (Randall et al., 2007) and familiarity with the product/idea (Kornish and Ulrich, 2014). Most existing studies have focused on the characteristics of the individual evaluator rather than on dynamics at the group level, i.e., the interactions among multiple evaluators. In social science, the critical concept that describes social behaviors and group norms is culture (Cremer 1993, Hermalin 2001). The extant innovation management literature has highlighted the importance of culture in enabling, fostering and facilitating innovation within organizations (Efrat, 2014; Hutchison-Krupat and Chao, 2014). Similarly, cultural factors, such as tolerance of failure and the preference for risk and/or competition, could also have large impacts on the efficacy of idea evaluation and selection. Unfortunately, to the best of our knowledge, very few existing studies have examined this. Building on the results concerning the first and second research questions, this project aims to contribute new insights into the impact of culture on idea evaluation and selection.
Figure 1 summarizes our research questions in this proposal.
Figure 1: Research Questions: On the Mechanisms of Idea Evaluation and Selection
Literature Review
The extant innovation management literature has mainly viewed idea evaluation as a prediction task and aimed to answer two questions: (1) What should be asked? and (2) Who should be asked? (Kornish and Hutchison-Krupat, 2017). Specifically, rich results have been found concerning the design of idea evaluation criteria (Ulrich and Eppinger, 2015) and the evaluation performance of different types of participants (Kornish and Ulrich, 2014). Very few studies have examined how different methods of implementing the evaluation criteria influence evaluation efficacy. For example, Wilson and Schooler (1991) demonstrated that forcing participants to rate ideas explicitly according to certain criteria may yield worse results than when judgments are made holistically based on an aggregated feeling. The underlying mechanism is that when subjects are asked to decompose an idea into specific dimensions, their perception of the idea may change for the worse.
Some existing studies have examined the relationship between idea generation and evaluation under different team structures and found mixed results. For example, in an experimental setting, Putman and Paulus (2009) examined the performance of two team structures (nominal vs. interactive) when the participants conducted both idea generation and evaluation. In a nominal team, the members generated and evaluated ideas individually, whereas in interactive groups, the members created and evaluated ideas collectively. These authors found that the average originality of the selected ideas that received the best evaluations was higher for nominal groups than for interactive groups. However, as assessed by independent idea quality evaluators, participants under both team structures rarely selected their best ideas (Putman and Paulus 2009). Faure (2004) and Rietzschel et al. (2010) conducted similar studies and also found that nominal teams are better than interactive teams in terms of idea generation. However, they did not find a significant difference between the interactive and nominal teams in terms of the quality of the selected ideas. These findings imply that the potential of selecting better ideas is not (fully) realized in the idea evaluation stage. In contrast with these studies, we do not aim to examine the idea generation process; instead, we focus on the idea evaluation stage. Additionally, to the best of our knowledge, the innovation management literature offers quite limited insight into the comparative efficacies of different idea evaluation processes.
This research proposal is also related to, yet different from, strategic project selection in the new product development (NPD) literature. A project is typically described as a temporary endeavor undertaken to create a unique product, service, or result (Chao and Kavadias 2008, Chao et al. 2009). In organizations, the selection of projects is often a strategic resource allocation decision characterized by multiple, conflicting and incommensurate criteria (Liesio et al. 2007). From the perspective of project selection, a project should be selected based not only on its intrinsic quality but also on its strategic role in a firm’s overall research and development (R&D) portfolio (Chao and Kavadias 2008, Chao et al. 2009). In contrast, an idea does not necessarily require resource allocation or implementation. In our study, “quality” refers to the intrinsic quality of each individual idea, and the goal is to identify and select the highest quality ideas. The strategic “fit” between a firm’s needs and an idea’s intrinsic quality is beyond the scope of our study.
With respect to our first research question (comparing the three idea evaluation processes), the existing decision analysis literature has compared the rating (scoring) and ranking processes through the lens of bounded rationality. Under the assumption of perfect rationality, in which each decision maker comprehends and processes complete information about the alternatives and selects the alternative that maximizes expected utility, both processes are expected to produce similar outcomes, i.e., the two processes can be considered interchangeable (Klein and Garcia 2015). However, numerous experimental studies have demonstrated that the consistency between these two processes is relatively high only at the team level and is quite low at the individual subject level (Moore 1975, Russell and Gray 1994).
Some existing studies have argued that the ranking process should outperform the rating (scoring) process for two reasons. First, despite its simplicity, the rating (scoring) process tends to elicit primarily average scores from evaluators and thus tends to do a poor job of distinguishing between good and excellent ideas. In contrast, the ranking process forces participants to provide relative rankings of idea pairs rather than rating ideas individually, which can help to alleviate rating lock (Klein and Garcia 2015). Second, the ranking process requires a respondent to pay a high level of attention to all items together while making the selection decision, whereas in the rating process, the respondent focuses on one item at a time. Consequently, the ranking process forces decision makers to consider more information and could lead to better evaluation accuracy (Alwin and Krosnick 1985). In a related study, Harzing et al. (2009) found the ranking mechanism to be superior in evaluating cross-cultural values: the ranked responses (with the stimuli presented in a concise manner) conformed more closely to their benchmark than the scored or rated responses.
In contrast, another school of thought predicts better performance from the scoring process. The scoring process is designed in such a manner that an evaluator focuses her attention on appraising the quality of only one idea at a time, whereas in the ranking process, the evaluator’s attention cannot be focused on any single idea. Consequently, the ranking process is a much more complex and demanding task than the scoring process, and processing and transforming information to arrive at an evaluation decision thus becomes more difficult and stressful (Baddeley 1992; Medin et al. 2004), which in turn leads to evaluation mistakes due to incomplete transformation of information (Bettman and Kakkar 1977). Additionally, the ranking process does not necessarily perform better than the scoring process in terms of reducing evaluators’ close-to-average bias. Several related studies have found that people who must choose from many alternatives tend to use simplifying strategies to reduce the cognitive complexity of the decision (e.g., Simon 1974, Hastie and Dawes 2001). With these simplifying strategies, people often engage in pre-choice screening of the available alternatives (Beach 1993) to free cognitive resources for careful consideration of the remaining options (Parks and Cowlin 1995). As a result, the ranking that evaluators produce often reflects their pre-choice screening preferences rather than a careful comparison of all ideas, which could undermine the efficacy of the ranking process.
The voting process has been extensively studied in political science, with a main focus on the strategic voting behavior of voters who have conflicting preferences over multiple political candidates. Under different voting schemes, voters may strategically abandon their most preferred candidate and vote for one that could possibly beat their least preferred candidate (we refer the reader to Myatt (2007) for a comprehensive review of the strategic voting literature). In the context of innovation management, however, all evaluators can be incentivized to share a fundamentally similar interest, i.e., selecting the best quality idea. In this case, the voting record can truthfully reflect the evaluator’s intrinsic preference over the ideas being evaluated. In this project, we do not examine the potential strategic voting behavior of the idea evaluators but instead focus on the voting process itself. To the best of our knowledge, very few academic studies have rigorously examined the efficacy of the voting process in idea evaluation. Some practitioners in the innovation industry argue that the voting process is less cognitively demanding than performing a full ranking of all the options, because participants are not required to give a comparative judgment of each option, and it allows participants to express a preference for more than one option at the same time. The voting process could therefore achieve higher evaluation accuracy than both the ranking and scoring processes. By contrast, other practitioners argue that in the voting process, evaluators are still expected to review, consider and compare all options before casting their votes. As a result, the voting process is also vulnerable to all the weaknesses of the ranking process. In addition, because there is no differentiation among the ideas that receive votes, the voting process may do a poorer job of distinguishing between good and excellent ideas than the scoring process does. This implies that the voting process may be the least effective of the three evaluation processes.
Overall, the literature offers only scarce results and arguments, some of them conflicting. In the context of idea evaluation, it remains unclear which process dominates under what conditions. This project aims to answer this research question experimentally.
With respect to our second research question (the wisdom of crowds in idea evaluation), the literature has shown that when crowds have diverse sources of information and expertise in a problem area, they can provide more accurate collective forecasts than even well-informed individuals (Larrick et al., 2012). Indeed, a substantial amount of research has examined how best to combine the predictions of many individuals as accurately as possible (Budescu & Chen, 2014; Larrick & Soll, 2012). Using data from a crowdsourcing website, Mollick and Nanda (2016) found that the online evaluations made by thousands of anonymous end users could achieve evaluation accuracy almost as high as that of an expert panel.
However, conflicting results suggest that the wisdom of crowds tends to be better for idea generation than for idea evaluation (Bonabeau, 2009). In practice, crowds may suffer from a wide variety of factors, identified by social psychologists and cognitive scientists, that degrade the quality of crowd decision making (e.g., Bahrami et al., 2012). For example, groups can be subject to emotional contagion (Barsade, 2002) and even hysterical reaction (Balaratnasingam & Janca, 2006), which may cause crowd members to act in non-rational ways. Accordingly, Surowiecki (2004) argued that to make a “right” collective decision, crowds in general need to satisfy three conditions: they must be (1) diverse, (2) independent and (3) decentralized.
Another critical factor that may hinder the efficacy of the wisdom of crowds in idea evaluation is the crowd’s expertise. In general, the crowd as a whole may not have the relevant expertise to evaluate projects (Simmons et al., 2011). There are no admission criteria for entering the crowd, making it unclear how the crowd would develop criteria to identify potentially high quality projects in areas where it lacks expertise. While increasing the number of evaluators may statistically increase the chance of including those with relevant expertise, it also carries the risk of diluting the “weight” of the experts in a larger pool. In addition, gaining access to a larger crowd can be quite costly for firms. As a result, it becomes practically relevant and academically important to examine the “optimal” crowd size needed to achieve a desired level of evaluation accuracy. We have conducted a preliminary simulation analysis by modeling idea quality and evaluator expertise as random variables and examining the relationship between the number of evaluators and evaluation accuracy under different conditions (different distributions of idea quality and expertise). Our preliminary results show that the relationship is not linear, i.e., evaluation accuracy may not strictly increase with crowd size. We aim to conduct a finer and more in-depth analysis of this question.
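For illustration, the following Python sketch captures the spirit of such a simulation. The specific distributional choices (standard normal idea qualities, uniformly drawn expertise levels controlling each evaluator's noise) and the mean-score aggregation rule are our assumptions for exposition, not the settings of the preliminary analysis itself:

```python
import numpy as np

rng = np.random.default_rng(0)

def evaluation_accuracy(n_evaluators, n_ideas=20, n_trials=2000):
    """Estimate the probability that mean-score aggregation selects
    the truly best idea, as a function of crowd size."""
    hits = 0
    for _ in range(n_trials):
        quality = rng.normal(size=n_ideas)              # true idea qualities
        expertise = rng.uniform(0.2, 2.0, size=(n_evaluators, 1))
        noise = rng.normal(size=(n_evaluators, n_ideas)) / expertise
        scores = quality + noise                        # perceived qualities
        selected = scores.mean(axis=0).argmax()         # scoring-style aggregation
        hits += (selected == quality.argmax())
    return hits / n_trials

for n in (1, 5, 10, 25, 50, 100):
    print(n, evaluation_accuracy(n))
```

Varying the quality and expertise distributions and plotting accuracy against crowd size makes the diminishing, and potentially non-monotone, returns to adding evaluators visible.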
With respect to our third research question (the impact of organizational culture on idea evaluation), the extant innovation management literature has extensively studied the impact of culture on innovation performance at both the firm and country levels (Efrat, 2014; Tian and Wang, 2014). At the country level, Efrat (2014) examined the impacts of different cultural dimensions, including power distance, individualism, masculinity, and uncertainty avoidance, and showed that most of these dimensions have a strong and lasting impact on the tendency to innovate.
At the firm level, culture is defined as the shared beliefs and organizational preferences among a firm’s employees about the optimal course of action (e.g., Cremer 1993, Hermalin 2001). In the context of innovation management, empirical studies on the effect of corporate culture remain sparse, mainly because corporate culture has many different aspects and there is inadequate theoretical guidance on how a specific cultural aspect affects firm performance (Ostroff et al., 2003).
The recent theoretical literature on corporate innovation, however, provides a specific context in which to examine the effect of corporate culture on firm performance. This literature shows that tolerance for failure is critical in motivating innovation (see, e.g., Manso, 2011; Tian and Wang, 2014; Hutchison-Krupat and Chao, 2014), because, unlike standard tasks, the exploration of novel, untested approaches is subject to a high probability of failure. However, existing studies mainly examine idea generation or a firm’s tendency to innovate. To the best of our knowledge, no existing study has examined the impact of organizational culture on idea evaluation and selection. In this project, we aim to fill this academic void by systematically comparing idea evaluation efficacy across crowds from different countries, including China, India and Kazakhstan, and comparing the ideas selected by crowds in different cultural contexts.
Status: Curtailed
Effective start/end date: 1/1/18 – 3/12/19
