The Brazilian Innovation Award: Analysis of assessment instrument validity and reliability

Goal: This article presents an analysis of the validity and reliability of the assessment instrument used in the 2016–2017 Brazilian Innovation Award. 
Design/Methodology/Approach: This study used multivariate analysis techniques on the data from 2,651 companies. Two hypotheses were tested. The first (H1), related to reliability, used Pearson’s Correlation and Cronbach’s alpha. The second hypothesis (H2), related to validity, used Confirmatory factor analysis (CFA). 
Results: The instrument is reliable and valid and is an important mechanism for the assessment of the maturity of innovation management. 
Limitations of the investigation: One of the constructs can still be improved in future studies and applications, although it has demonstrated acceptable levels of reliability and validity. 
Practical implications: The combined use of the constructs “organizational dimensions” and “innovation outcomes” proved to be an accurate conceptual model for assessing the maturity level of innovation management in organizations. 
Originality/Value: The instrument is a robust diagnostic tool and, with appropriate adaptations, it can be replicated and used in other contexts and countries, enabling international comparative studies.


INTRODUCTION
Brazil has the opportunity to establish an institutional environment more conducive to innovation. In the Global Innovation Index, Brazil ranks 64th in a list of 126 countries (Cornell University et al., 2018). In view of the country's low innovation performance in comparison to other economies (Gonçalves and Ferreira Neto, 2016; Hui-bo and Bingwen, 2011; Mehta, 2018), the Brazilian Entrepreneurial Mobilization for Innovation (MEI) launched the Brazilian Innovation Award (PNI) in 2006. The award is coordinated by the Brazilian Industry Confederation (CNI) and the Brazilian Service of Support for Micro and Small Enterprises (SEBRAE) and is intended to stimulate innovation in the Brazilian private sector.
Until 2015, the award evaluated only individual innovation projects, not the entirety of the organizational environment. The 2016-2017 award, in which 2,651 companies participated (including medium and large industrial companies and small companies from all sectors), underwent a profound reformulation of its objectives, its conceptual model, and its assessment instrument. From then on, the objective of the award was redefined to encourage and recognize successful efforts in innovation management within organizations operating in Brazil. The evaluation process also began to consider a more modern and comprehensive view of innovation, assessing the environment of the organization in a holistic and unified way, considering not only its innovation inputs and processes, but also its outputs and performance effects (CNI, 2018a).
The award process consists of several phases. In the first phase, the candidate companies respond to a sixty-question self-assessment questionnaire. Companies that advance to the second phase must present arguments (evaluated by a committee of innovation experts) that justify the performance obtained in the self-assessment. Companies that continue to the next stage undergo an audit visit. The finalists are then selected to be judged by a committee of experts and representatives of public and private innovation bodies in Brazil. The empirical results, derived from the application of the assessment instrument, allowed the construction of a comprehensive and diversified database, encompassing companies of different sizes, sectors, and levels of technological dynamism and complexity. According to Sekaran (1992) and Hayes (1992), the availability of this database makes it possible to carry out a statistical analysis of the validity (i.e., whether it measures what it was designed to measure) and reliability (i.e., whether it has any significant measurement errors) of the assessment instrument.
Using this opportunity, the present article analyzes the validity and reliability of the Brazilian Innovation Award's assessment instrument, using multivariate analysis techniques on the data from 2,651 companies active in Brazil, with reference to the results of the 2016-2017 award. Two hypotheses are tested. The first hypothesis (H1), that the assessment instrument has satisfactory levels of reliability, was tested using Pearson's correlation and Cronbach's alpha. The second hypothesis (H2), that the assessment instrument presents satisfactory levels of correlation between the variables and their underlying constructs, was tested using confirmatory factor analysis (CFA). The article is structured as follows: the next section presents the theoretical basis and contextualizes the hypotheses to be tested. The methods section presents the approach used to analyze the validity and reliability of the instrument. The results and discussion section explores the results, implications, and limitations. The final section presents the conclusions.

THEORETICAL BACKGROUND
The theoretical and methodological foundation of the 2016-2017 PNI is based on studies about innovation assessment in organizations, such as Corsi and Neau (2015), Hervas-Oliver et al. (2015), Laforet (2013), Leal-Rodríguez et al. (2015), Saunila et al. (2014a) and Saunila et al. (2014b). Influenced by the characteristics of respected Brazilian and international reference models such as the Brazilian Quality Award (MEG-PNQ), the Capability Maturity Model Integration (CMM-I) and the Malcolm Baldrige National Quality Award, the PNI's conceptual model was designed to measure the degree of maturity of innovation management in companies from a combined analysis of two perspectives: (1) the organizational dimensions of innovation capability, expressed in terms of its inputs and processes, and (2) the innovation results, expressed in terms of innovation outputs and outcomes (CNI, 2018b).
The dimensions include organizational aspects (Crossan and Apaydin, 2010; Francis, 2005; Narcizo et al., 2013; Saunila and Ukko, 2012) that enable and support innovation in companies in terms of initiatives, processes, and managerial practices. The main conceptual support for the definition of the organizational dimensions of innovation capability is initially linked to Narcizo (2012), in the preliminary assessment model, and later to Narcizo (2017) and Narcizo et al. (2017), as reference models for the maturity of innovation capability. The PNI's conceptual model has ten organizational dimensions, each consisting of four variables. Table 1 presents the forty variables, grouped in their respective organizational dimensions.
Considering different performance themes (Edwards et al., 2005; Hervas-Oliver et al., 2015; Ngo and O'Cass, 2012; OECD, 2005; Simpson et al., 2006; Stock and Zacharias, 2011), the innovation results measure the degree of success obtained by companies from their innovative outputs. Innovation results relate to the types of innovations that the companies successfully launch and are expressed in alignment with the third edition of the Oslo Manual (OECD, 2005) in terms of product, process, marketing, and organizational innovations. Table 2 shows the twenty performance variables, grouped into four themes for innovation results present in the PNI's conceptual model.
Taken together, the variables in Table 1 and Table 2, connected to the dimensions of innovation capability and the themes of innovation results, are the variables of the PNI's assessment instrument. This is not the only instrument for assessing innovation in organizations, however. The specialized literature presents many conceptual models and instruments to assess innovation in organizations, including those in Martínez-Román et al. (2011), Saunila and Ukko (2014) and Yang (2012).

The PNI's assessment instrument uses an interval scale, where numbers are used to classify objects and events so that the distance between them is equal. These scales are adopted when one wants to measure concepts such as perceptions of events through classification. Interval scales of classification typically involve the use of statements accompanied by pre-coded categories (Hair et al., 2005). The instrument calls for metric scales, which make statistical treatment of the results possible. To facilitate the responses, a semantic differential scale was adopted, as it is characterized by the use of five or seven points, with bipolar final labels, and may contain an intermediate point as well (Hair et al., 2005). The PNI instrument adopted a seven-point scale, with labels at the ends and at the center of the scale.
When developing this type of measuring instrument, it is essential to ensure that it is reliable and valid (Hayes, 2008). The accuracy of an instrument's variables in measuring the associated concept represents the validity, while the instrument's reliability is connected to its coherence (Hair et al., 2005). Bolarinwa (2015) has argued that, throughout the history of scientific research, scholars have been dedicated to the development of assessment instruments that are accurate enough that possible errors and deviations do not compromise the results of the research. The relevance of the reliability and validity of research instruments is shown in Martinez-Lorente et al. (1998), Moustakis et al. (2006), Saravanan and Rao (2006) and Torbica and Stroh (2000). Based on this context, the first hypothesis, H1, has been developed, as follows:

H1: The PNI assessment instrument presents satisfactory levels of reliability.
In addition, although an instrument's reliability is one of the factors necessary to ensure its scientific value, it is not sufficient to support inferences. After the construction of an assessment instrument, it is therefore necessary to perform tests to verify that the instrument will measure what it was designed to measure. When it is possible to affirm that the assessment instrument measures what it has proposed to measure, one can then say that it has scientific validity (Bolarinwa, 2015; Hair et al., 2005; Nunnally, 1967; Sekaran, 2003).
Table 1 (excerpt). D9. Innovation-enabling processes
D.9.1. Technological surveillance: the company's ability to anticipate the development of new products (goods or services) or processes.
D.9.2. Technological sophistication: how the company remains competitive using new technologies in its products (goods or services) and processes.
D.9.3. Management of development projects: how the company conducts the development of a new product (good or service), process, or technology.
D.9.4. Flexibility: the characteristics of the processes implemented by the company.

To ensure the validity of an instrument, it is necessary to first guarantee its reliability. An instrument does not need to be valid to be reliable: an instrument can be internally aligned, and thus reliable, even if its indicators do not measure the desired construct. Conversely, it is impossible for an instrument whose indicators are not aligned to measure a specific construct. Unlike reliability, the validity of an instrument is not a property of the instrument per se, but is defined by the degree to which the correct interpretations of the questions can be guaranteed. The validity of an instrument is therefore related to the respondent's interpretation of the questions (Kimberlin and Winterstein, 2008). Drawing on this background, hypothesis H2 was developed:

H2: The PNI assessment instrument presents satisfactory levels of correlation between the variables and their underlying constructs.
According to Sekaran (2003) and Hair et al. (2005), there are different types of tests to verify the validity of an assessment instrument. These tests can be classified into three main groups: content validation, which verifies how well the set of indicators delineates the concept to be measured; criterion validation, which verifies the ability of the instrument to differentiate the respondents, when it is expected to do so; and construct validation, which examines the relationships between variables and is derived from both content and criterion validation strategies. Validity can be assessed through factor analysis (Hair et al., 2005), which is an interdependence technique whose main purpose is to define the inherent structure among the variables in the analysis. Factor analysis works by analyzing the structure of the relationships between the variables to identify those that are strongly interrelated and to group them together. A factor is a group of strongly intercorrelated variables (Hair et al., 2009).

METHOD
The data analysis of this study consisted of two main steps, as shown in Figure 1. In Step 1, the data were prepared from the available database, which included the responses of 2,651 companies of different sizes, sectors of activity, and regions of Brazil. According to Hair et al. (2009), when using multivariate analysis techniques, it is important that the researcher prepare the data to avoid biased results or low significance. Data preparation included the following steps: (i) evaluation of missing data, (ii) identification of atypical observations, and (iii) testing of the assumptions inherent in multivariate analysis techniques.
The evaluation of missing data sought to identify whether all the variables in the assessment instrument had valid values and whether these data were available for analysis. There were no missing data because the diagnostic process adopted by the PNI made it mandatory to answer all questions in the instrument, and the instrument included information control mechanisms to guarantee this. Similarly, because an information system was used to apply the instrument and collect the company responses, no atypical observations were identified that could cause distortions in the analysis. Finally, due to the analysis techniques used in this study, it was not necessary to perform tests to verify whether the data violated the requirements of normality, homoscedasticity, and linearity, as proposed by Hair et al. (2009).
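The missing-data and atypical-observation checks described above could be sketched as follows. This is only an illustration, not the procedure used by the PNI information system: the function name `screen_responses`, the example answers, and the assumption that valid answers lie between 1 and 7 (the seven-point scale) are all hypothetical.

```python
import numpy as np

def screen_responses(responses, low=1, high=7):
    """Count missing and out-of-range answers in a respondents x items array."""
    data = np.asarray(responses, dtype=float)
    # Missing answers are represented here as NaN (an assumption of this sketch).
    missing = int(np.isnan(data).sum())
    # Comparisons with NaN are False, so missing cells are not counted twice.
    out_of_range = int(((data < low) | (data > high)).sum())
    return {"missing": missing, "out_of_range": out_of_range}

# Hypothetical answers from three respondents to four questions.
example = [[7, 5, 6, 4],
           [3, np.nan, 2, 1],
           [4, 4, 9, 5]]
report = screen_responses(example)
print(report)
```

A data set such as the one analyzed in the article, where every question was mandatory and answers were collected through a controlled form, would be expected to report zero in both counts.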
In view of the considerations presented for the preparation of the data (Step 1), it was possible to proceed to the analysis (Step 2). Considering the two hypotheses, H1 and H2, Cronbach's alpha and Pearson's correlation were chosen to test H1 and CFA was chosen to test H2, as shown in Table 3. Hypothesis H1 was tested using the internal consistency approach, which considers the degree to which the indicators measure the same object and is an indication of the homogeneity of the items that make up a questionnaire. When the condition of uniformity is not reached, the proper functioning of the instrument is not feasible, and it loses its scientific usefulness. The degree of an instrument's internal consistency can be obtained through the Cronbach's alpha and split-half methods. The split-half method determines the correlation between two halves of the same questionnaire, while Cronbach's alpha verifies the degree of convergence of the indicators for the same construct in an assessment instrument, measured by the alpha coefficient. This is one of the most commonly used methods for determining the reliability of an instrument based on internal consistency, and when using Cronbach's alpha, the closer the coefficient is to 1.00, the greater the correlation between the evaluated items (Bonett and Wright, 2015; Cronbach, 1951; Hayes, 1992, 2008; Nunnally and Bernstein, 1967; Sekaran, 1992, 2003).
Hayes (1992) suggests the use of Pearson's equation to establish the correlation index and to start evaluating the reliability of an instrument. In this technique, the linear relationship between two variables can be represented by a single number, which is called the Pearson's correlation coefficient. This coefficient indicates the intensity and direction of the correlation between two variables, where the intensity is given by the absolute value of the coefficient, and the direction is given by its sign.
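To make the two reliability coefficients concrete, the sketch below computes Pearson's correlation and Cronbach's alpha with NumPy using their standard textbook definitions. The function names and the small data set are hypothetical, and the code is not taken from the study, which does not report the software used.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's correlation: sign gives direction, absolute value gives intensity."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def cronbach_alpha(items):
    """Cronbach's alpha for a respondents x items array of one construct."""
    data = np.asarray(items, dtype=float)
    k = data.shape[1]                                # number of items
    item_variances = data.var(axis=0, ddof=1).sum()  # sum of item variances
    total_variance = data.sum(axis=1).var(ddof=1)    # variance of the summed score
    return float(k / (k - 1) * (1.0 - item_variances / total_variance))

# Hypothetical answers of six respondents to the four variables of one dimension.
answers = np.array([[5, 6, 5, 6],
                    [2, 3, 2, 2],
                    [7, 6, 7, 7],
                    [4, 4, 5, 3],
                    [6, 5, 6, 6],
                    [3, 2, 3, 4]])
print("r(item 1, item 2) =", round(pearson_r(answers[:, 0], answers[:, 1]), 3))
print("alpha =", round(cronbach_alpha(answers), 3))
```

In this form, a construct would be judged against the acceptance limit by comparing the returned alpha with the chosen threshold.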
Thus, for Hypothesis H1, the reliability of the instrument was obtained through the analysis of its internal consistency, verified by the Pearson's correlation and Cronbach's alpha coefficients. Reliability is indicated by the value of the alpha coefficient, provided that the individual Pearson's correlations between the questions of the assessment instrument are acceptable. Sample size is important in defining the acceptance limit of Cronbach's alpha. Samples greater than thirty cases are statistically sufficient and yield a more accurate alpha, for which a minimum limit of 0.6 can be considered (Flynn et al., 1994). The acceptance limit adopted for Cronbach's alpha in this study was 0.60, as recommended by Hair et al. (2005) for management studies.

Figure 1. Method overview.

Brazilian Journal of Operations & Production Management, Volume 16, Number 2, 2019, pp. 201-212. DOI: 10.14488/BJOPM.2019
In turn, Hypothesis H2 sought to verify the validity of the assessment instrument. The construct validity test was performed. This was justified because the content validity analysis was already considered in the construction of the conceptual model to support the instrument (CNI, 2018b). Criterion validation was also considered unnecessary, because both the conceptual model and the assessment instrument were constructed from a universal perspective, without distinguishing between different types of respondent companies. Construct validation was therefore used to ensure that observable variables were adequately correlated with the concept to which they were associated. The validity was verified through CFA. Considering that there was already an instrument designed and in use, the purpose of this technique was to confirm or reject the previously formulated conceptual model and measurement theory.
CFA is applicable when there are already preconceived groupings before the calculations are executed and the allocation of each variable in the group has already been defined (Hair et al., 2009). In this situation, it is useful to formulate hypotheses about the distribution of variables by factors, considering which variables best fit a given factor or what the best number of factors is. It is not possible to perform a CFA without a theory of measurement, because in doing so, the researcher must define a priori, with theoretical support, the number of factors and the variables within each factor. The factors obtained through the application of CFA can serve as the input to estimate the reliability of constructs. This estimator is computed from the square of the sum of the factor loadings of each construct and the sum of the error variances of its variables. For confirmation of construct validation, Hair et al. (2009) suggest that the loadings should be greater than 0.50 and that the measures of variance extracted should equal or exceed 50%, which is found with eigenvalues greater than 1.00 for a single factor (component).
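The acceptance criteria above (loadings greater than 0.50, variance extracted of at least 50%, eigenvalue greater than 1.00 for a single factor) can be illustrated with a short sketch. It assumes a principal-component extraction of one factor from the item correlation matrix, which is an assumption made here for illustration; the article does not state the estimation method or software. The function name, the simulated data, and the use of one minus the squared loading as the error variance of a standardized item are likewise hypothetical choices.

```python
import numpy as np

def single_factor_summary(items):
    """Summarize a one-factor extraction for a respondents x items array."""
    data = np.asarray(items, dtype=float)
    corr = np.corrcoef(data, rowvar=False)       # item correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigenvalues in ascending order
    top_value = float(eigvals[-1])               # largest eigenvalue
    loadings = eigvecs[:, -1] * np.sqrt(top_value)
    if loadings.sum() < 0:                       # eigenvector sign is arbitrary
        loadings = -loadings
    variance_pct = 100.0 * top_value / corr.shape[0]
    # Construct reliability: squared sum of loadings over itself plus error variance.
    squared_sum = loadings.sum() ** 2
    error_variance = (1.0 - loadings ** 2).sum()
    composite_reliability = float(squared_sum / (squared_sum + error_variance))
    return {
        "eigenvalue": top_value,
        "variance_pct": float(variance_pct),
        "loadings": loadings,
        "composite_reliability": composite_reliability,
    }

# Simulated data (hypothetical): 500 respondents, 4 items sharing one latent factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 1))
items = latent + 0.5 * rng.normal(size=(500, 4))
summary = single_factor_summary(items)
print(summary)
```

With this kind of summary, a construct would pass the criteria cited from Hair et al. (2009) when every loading exceeds 0.50, the variance percentage is at least 50, and the eigenvalue exceeds 1.00.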

RESULTS AND DISCUSSION
Regarding the evaluation of the internal consistency of the instrument, Figure 2 presents the Pearson's correlation and Cronbach's alpha coefficients found for the variables of each dimension. As can be observed, a Cronbach's alpha greater than 0.60 was obtained for all of the instrument's innovation capability dimensions. Furthermore, following Hair et al. (2005), because the Pearson's correlations were between 0.2 and 0.9, the correlations among the variables were considered sufficient, and it is likely that there was a coherent and systematic relationship between variables. Correlation coefficients between 0.91 and 1.00 are very strong and indicate that covariance is decidedly shared between the two variables being examined. Coefficients between 0.00 and 0.20, however, indicate a chance that the associated null hypothesis will not be rejected. Taken together, these results mean that all the dimensions of the instrument have satisfactory internal consistency.
Following the evaluation of the dimensions, Figure 3 presents the Pearson's correlation and Cronbach's alpha coefficients found for the variables of each innovation result theme. As can be observed, a Cronbach's alpha greater than 0.60 was obtained for all of them, so it appears that all of the themes have satisfactory internal consistency.

However, Theme 3 (Workplace organization) presented a Cronbach's alpha lower than the values found for the other constructs analyzed. This construct also presented a correlation of 0.269 between variables, which is considered small, although it is sufficient. The three variables related to Theme 3 (T.3.1, T.3.2, and T.3.3) derived directly from the guidelines contained in the Oslo Manual (OECD, 2005).
Verifying the individual relationships between the variables (suppressed in this article), it is noticeable that variable T.3.1 (development of strong relations with the consumers) may weaken the relationships among the others, consequently resulting in a Cronbach's alpha that is less significant for the internal consistency of this theme. This interpretation suggests that there are still opportunities for further studies to analyze the clusters of variables for innovation results derived from the Oslo Manual (OECD, 2005).
Since the internal consistency of Theme 3 was considered satisfactory via Cronbach's alpha, no modifications were made to this construct. It would, however, be interesting to carry out further tests or new consistency analysis using the data from subsequent PNIs. Depending on the results obtained, it may be advisable to adapt variable T.3.1 or even to substitute another variable that is more conceptually adherent to the construct under evaluation.
Regarding the validity of the constructs, Figure 4 shows that all dimensions have a variance greater than 50% and eigenvalues greater than 1, indicating that the extraction of a single factor was sufficient in each of the dimensions and that the consideration of a single construct per dimension was appropriate. Each dimension possessed an adequate extraction of the representative basic concept; thus, it was possible to evaluate the loading of the variables associated with each of them, as shown in Figure 5. Figure 5 indicates that the variables, for all dimensions, had loadings greater than 0.5, demonstrating the validity of these constructs. In addition to the loadings being satisfactory, the loading values of the variables in the same dimension were close to each other, suggesting that, in addition to the validity of the construct, there was a roughly uniform relationship between the variables and the concept underlying that dimension.
For the innovation result themes, Figure 6 also indicates that, for all themes, the variance of the variables exceeded 50% and the eigenvalues were greater than 1. These results indicate that the extraction of a single factor was sufficient, confirming as adequate the consideration of a single construct per theme. As can be seen in Figure 7, the respective variables for each theme also had loadings greater than 0.5, indicating the validation of these constructs.

From the presented results, it is possible to verify hypotheses H1 and H2. The first hypothesis tested was H1: the PNI assessment instrument has satisfactory levels of reliability. The use of Cronbach's alpha to verify the internal consistency of the instrument presented satisfactory results for all constructs. Thus, the hypothesis was accepted, and the reliability of the instrument was attested by its internal consistency. Although the tests presented results that met the established acceptance criteria, the variable T.3.1 (development of strong relationships with the consumers) was determined to be a point requiring attention. The interpretation of the results indicated that this variable may weaken the consistency of its construct. For future studies and applications of this instrument, it is recommended that special attention be paid to this variable and its relationship with Theme 3.
The second hypothesis tested was H2: the PNI assessment instrument presents satisfactory levels of correlation between the variables and their underlying constructs. The use of CFA to verify the validity of the instrument presented satisfactory results for all constructs. The hypothesis was therefore accepted, and the validity of the instrument was attested through the correlations between the variables and the constructs in which they were inserted. Although the analysis presented results satisfying the minimum acceptable values, a review of the variables that compose the constructs referring to the "Innovation Results Themes" is recommended. The results suggest that the contents of these variables have underlying concepts that may be more dispersed and diversified than the concepts for the dimensions' variables, which behaved in a more homogeneous way. Although this does not imply low reliability or validity, nor even the existence of conceptual misunderstandings regarding the content or the allocation of these variables, it would be advisable to perform new tests and improvements in these items in other studies and future applications of the instrument.
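One common diagnostic for a variable suspected of weakening a construct, such as T.3.1, is to recompute Cronbach's alpha with each item removed in turn. The sketch below illustrates that idea only; the article does not report such an analysis, and the function names and data are hypothetical. The construct needs at least three items so that two remain after a deletion.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a respondents x items array of one construct."""
    data = np.asarray(items, dtype=float)
    k = data.shape[1]
    item_variances = data.var(axis=0, ddof=1).sum()
    total_variance = data.sum(axis=1).var(ddof=1)
    return float(k / (k - 1) * (1.0 - item_variances / total_variance))

def alpha_if_item_deleted(items):
    """Alpha of the construct recomputed with each item left out in turn."""
    data = np.asarray(items, dtype=float)
    return [cronbach_alpha(np.delete(data, j, axis=1))
            for j in range(data.shape[1])]

# Hypothetical three-item theme: the first item is only loosely related
# to the other two, which move together.
theme = np.array([[3, 1, 1],
                  [1, 2, 2],
                  [4, 3, 3],
                  [1, 4, 4],
                  [5, 5, 5],
                  [2, 6, 6]], dtype=float)
full_alpha = cronbach_alpha(theme)
deleted = alpha_if_item_deleted(theme)
print("alpha, all items:", round(full_alpha, 3))
print("alpha if item deleted:", [round(a, 3) for a in deleted])
```

An item whose removal raises alpha noticeably above the full-construct value is a candidate for rewording or replacement, which matches the kind of follow-up the text recommends for subsequent applications.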

CONCLUSION
This article sought to verify the reliability and validity of the assessment instrument used in the Brazilian National Innovation Award. Multivariate analysis techniques were employed on the database generated by the 2016-2017 award, which involved the participation of 2,651 companies of different sizes and sectors. Pearson's correlation and Cronbach's alpha coefficients were used to analyze the reliability of the instrument, while CFA was used to analyze the instrument's validity.
The results obtained from the analysis indicated that the instrument was in adequate condition to use and demonstrated its accuracy in assessing the maturity level of the innovation management in organizations through a combined assessment of the organizational dimensions of innovation capability and themes of innovation outcome. In addition to the statistical validation of PNI's assessment instrument, the analysis presented in this article also provided a better understanding of the relationships between the evaluation variables and their respective constructs, contributing to the increased robustness, reliability, and representativeness of the data obtained from the application of this instrument in future iterations of the award.
The organizational dimensions of innovation capability demonstrated significant reliability and validity. The themes of innovation results also demonstrated reliability and validity, but there were still opportunities for improvement. Theme 3 (Workplace organization) obtained a Pearson's correlation among its variables (0.269) that, although sufficient, was close to the lower acceptable limit. However, this result does not compromise the quality of the instrument. Therefore, the assessment instrument used in the Brazilian National Innovation Award does appear to be a robust diagnostic tool for the level of maturity of the innovation management in companies in Brazil. In view of the representative and heterogeneous number of organizations that responded to the questionnaire, it is believed that, with appropriate adaptations (particularly in terms of translation and linguistic and cultural adjustments), this instrument could be successfully used in other contexts and countries.