OPINION

The Assessment that Fails to Assess

November 16, 2015

Rafael E. de Hoyos. Article published in Nexos journal, October 2014

Winner of the 2014 National Journalism Awards (Mexico) in the commentary/in-depth writing category

The mandate of the Mexican education system, like that of most countries, is to fulfill the right of all school-age children and youth to access quality education services. Evidence shows that strict compliance with this mandate will be sufficient to improve economic growth rates, reduce income inequality and poverty, and promote social mobility. In addition, Mexico invests 5.3% of its Gross Domestic Product (GDP), or more than 20% of the Federal Government’s total budget, in public education. This is a considerable investment, and taxpayers have the right to know whether the education system is fulfilling its part of the social contract by providing quality education to all Mexican children and youth. Therefore, given that access to quality education is a social right, that education is important to the country’s social and economic development, and that public investment in this area is large, society must have reliable tools to measure and monitor the education system’s performance.

How can we measure whether the education system is complying with its mandate? The provision of quality education to all children and young people can be monitored using three indicators: (1) coverage, that is, the proportion of school-age children and youth enrolled in the education system; (2) the system’s average quality of service provision; and (3) the dispersion in the quality of services, or the quality gaps in the services received by students. Measuring access is relatively easy: children are either in the system or out of it. Since 1990, official statistics from the Secretariat of Public Education (SEP) have reported the system’s gross coverage and other efficiency-related indicators. However, the definition, and hence the measurement, of the quality of education services has been the subject of much debate. One way (a simple one, if you will) to approach measuring the quality of education is to apply achievement tests in certain knowledge areas. There is nothing new about this. For decades, teachers all over the world have been giving tests to their students. But just as a teacher in the classroom must apply similar assessment criteria to all students, the same is true when assessing students in different classrooms, schools, municipalities and states. In this case, the test’s complexity, contents, application process and scoring criteria should be comparable across all students. Standardized tests align these criteria to measure all students “with the same yardstick”. When well designed, standardized tests provide reliable information about achievement or knowledge levels in certain areas and are therefore good indicators of the system’s mean quality and its dispersion or distribution.
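As a minimal sketch of how the three indicators could be computed, the Python function below takes a hypothetical school-age population count, an enrollment count and a list of test scores. The function name, its inputs and the choice of the standard deviation as the dispersion measure are illustrative assumptions, not SEP’s or ENLACE’s actual methodology.

```python
from statistics import mean, pstdev


def system_indicators(school_age_population, enrolled, scores):
    """Return (coverage, average score, dispersion) for a hypothetical system.

    school_age_population: number of school-age children and youth
    enrolled: number of them enrolled in the education system
    scores: standardized test scores of the students assessed
    """
    if school_age_population <= 0:
        raise ValueError("school_age_population must be positive")
    if not scores:
        raise ValueError("scores must not be empty")

    # (1) Coverage: share of the school-age population that is enrolled.
    coverage = enrolled / school_age_population
    # (2) Average quality, proxied here by the mean test score.
    average = mean(scores)
    # (3) Dispersion, proxied here by the population standard deviation.
    dispersion = pstdev(scores)
    return coverage, average, dispersion


# Made-up numbers, for illustration only.
coverage, average, dispersion = system_indicators(1000, 900, [400, 500, 600])
print(coverage, average, round(dispersion, 1))
```

Other dispersion measures, such as the gap between high and low percentiles, would serve the same purpose; the standard deviation is used here only because it is the simplest to compute.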

Tests to Measure or to Measure and Improve?

Since the starting point is the right to receive quality education, a necessary condition for the State to enforce that right is to have reliable and transparent tools that measure quality. Measuring the education system’s compliance with its mandate is a strong enough case to justify the need for standardized tests. However, this is not the only argument. Evidence has shown that standardized tests can be powerful tools to improve the quality of education services.

There are at least two links between standardized tests and the quality of the education system. First, standardized tests mediate between the State’s duty to provide quality education and parents’ right to demand it, so that implementing them and publicly disseminating their results is sufficient to trigger a virtuous cycle among accountability, citizen participation and outcomes. Parents who are better informed about the quality of education services tend to participate more actively in their children’s learning and to demand better outcomes from education authorities. Second, standardized tests are useful to align the incentives of all stakeholders in the system, including federal and state authorities, school supervisors, school directors, teachers, students and parents, around what is really important, namely, that all children and young people effectively learn. This incentive alignment provides the stakeholders of the learning process with a common starting point and helps them design improvement strategies to address the challenges identified and set goals. Nevertheless, merely running a standardized test does not guarantee that all of its related benefits will be realized.

Measuring the Quality of Education Services in Mexico

During the past few years, Mexico has made significant progress in the use of standardized tests to measure the quality of education services. International tests such as the “Programme for International Student Assessment” (PISA), designed by the OECD, or UNESCO’s PERCE and SERCE tests give us a snapshot of the quality of services provided by the Mexican education system that can be compared with other countries.

Perhaps the most relevant experience in the application of standardized tests in Mexico began in 2006 and gained greater relevance in 2007, with the application of the “National Assessment of Student Performance in Schools”, ENLACE. In that year, all children in grades 3 to 6 of primary school and grades 7 to 9 of secondary school took part in the ENLACE test, which measured performance in math, language (Spanish) and a rotating subject; as of 2008, the ENLACE test also assesses students in their last year of upper secondary (high school), a level which has recently been declared compulsory in the country. As with all international tests, ENLACE provides a snapshot of the quality of the education services provided by the system and, given that it is a census, ENLACE can produce a diagnosis of performance levels for each state, municipality, town, school, grade and student in the system. For the first time in Mexico, the information provided by ENLACE ensured that all of the system’s stakeholders (federal and state authorities, supervisors, principals, teachers, students and parents) started from a common diagnosis based on a tool that measured what is really important, namely, what children and youth are learning in classrooms.

Figure 1: Model of information poster showing ENLACE results for a school

Source: ENLACE website, http://www.enlace.sep.gob.mx/

ENLACE opened the possibility of strengthening, in every school in the country, the dynamic between accountability, citizen participation and outcomes. Each school was given an information poster with the results in a format similar to the one shown in Figure 1. The principal, teachers, parents and students of every Mexican school could see how average achievement levels had evolved over the past few years. It was also possible to compare a school’s achievement levels with the average among schools of the same type (public or private), in the same state and in localities with the same levels of poverty or marginalization. In ENLACE, the outcomes at the school level could be disaggregated by knowledge area, grade, group and test item, to identify the contents most challenging for students. Figure 2 shows an example of the detailed diagnosis available to all principals and teachers in Mexico, allowing them to focus their efforts on the weakest areas. This information could also be used by principals and teachers to design improvement strategies involving teacher training for those areas of knowledge that were the most challenging.

Figure 2: Example of School Level Results Disaggregated by Grade and Test Item

Source: ENLACE website, http://www.enlace.sep.gob.mx/

Limitations of Standardized Tests

Like any measuring tool, standardized tests have limitations. Critics of standardized tests, particularly critics of the ENLACE test in Mexico, make the following claims:

a. These tests create incentives for school stakeholders to focus their attention only on the contents covered by the test, neglecting other equally or even more important contents (teaching to the test).

b. Standardized tests capture only part of the knowledge-generation process, which is highly complex.

c. Comparing student achievement outcomes among states, schools, communities, students, etc. is unfair since it doesn’t take into account the differences in cultural, regional, social and economic contexts, among others.

All of these are valid claims, but none of them invalidates the need for a standardized test. Instead, they are useful when considering how to improve and design an adequate test that produces reliable outcomes. Concerning points (a) and (b) above, no one could reasonably claim that a standardized test can or should measure all areas of knowledge, let alone all the processes involved in the production of knowledge. Tests measure achievement levels in certain areas of knowledge; the concerns about creating incentives to teach only to succeed in the test can be addressed in the test design. If we know that the system will focus on the test contents (teaching to the test) and these contents are, by their very definition, limited, then the test should measure what students should know through a representative sample of all relevant contents by area of knowledge. Test items should change every year while remaining statistically representative of the universe of knowledge. If the test design has these features, then teaching to the test should not be a matter of concern. In other words, many of the arguments against standardized tests are of a methodological or statistical nature.

The comparison of results among schools or students from different backgrounds (point (c) above) can be addressed by contextualizing these results. In Mexico, as shown in Figure 1, comparisons of ENLACE test results were contextualized by school type (public or private) and marginalization level, as defined by the National Population Council. But the most relevant comparison is not among different schools but of each school with itself, i.e. the evolution in achievement levels within the same school. By definition, this comparison removes all elements that are fixed in time (or that change very slowly), such as social and economic context and cultural level, among others. However, contextualizing results is very different from contextualizing test contents. The right to quality education for all, and the measurement of its fulfillment, should rest on homogeneous, common standards applicable across all of the country’s cultural, social and economic diversity.
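The logic of comparing each school with itself can be written out under a simple illustrative assumption that is not part of the original argument: suppose the average score y of school s in year t is the sum of a fixed context effect c (social, economic and cultural conditions that do not change from one year to the next) and a time-varying school contribution q. All symbols are hypothetical notation introduced only for this sketch.

```latex
y_{s,t} = c_s + q_{s,t}
\quad\Longrightarrow\quad
y_{s,t} - y_{s,t-1}
  = (c_s + q_{s,t}) - (c_s + q_{s,t-1})
  = q_{s,t} - q_{s,t-1}
```

Under this assumption the fixed context term cancels, so the year-to-year change in a school’s score reflects only what changed within the school. If the context changes slowly rather than not at all, the cancellation is only approximate.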

Perhaps the strongest criticism of Mexico’s standardized test, ENLACE, refers to the lack of control during its implementation. At the outset, ENLACE was designed with few controls to guarantee reliable outcomes against potential cheating. The core argument was that, given the low stakes attached to the test, there were no incentives for cheating in ENLACE. This was true during the first years of test implementation, when society was not familiar with ENLACE and therefore did not demand better results from school officials (principals and teachers). As society’s knowledge about the test grew, along with the relationship among accountability, citizen participation and outcomes, incentives for cheating developed. Since the test was not adapted to the emergence of cheating incentives resulting from increased parental involvement, ENLACE’s success was, to a certain extent, the precondition for its failure. The other major issue that created cheating incentives, and that ended up producing a number of flaws in the test’s implementation, was linking ENLACE results to the teacher wage compensation program Carrera Magisterial. Clearly, ENLACE was not designed for that purpose, and the link created huge incentives for “strategic behavior” (a euphemism for cheating), thereby casting doubt on its reliability.

Standardized Tests Are Only a Tool, Not an Education Policy

In spite of all the benefits related to standardized tests and the need to rely on them to measure the education system’s compliance with its mandate, tests are only a tool that can be used to improve service quality. Even the best tool does not guarantee its good use. Having a test that measures student achievement every year is of little use if the relevant stakeholders (policy makers, education authorities, school supervisors, principals and teachers) fail to use these results to design policies that address the issues identified and improve service delivery. Between 2007 and 2013, SEP produced more results than the system could absorb. Our supervisors, principals and teachers were not trained and lacked the skills required to take full advantage of the information provided by ENLACE year after year.

International and national evidence reveals that guidelines for simple improvements based on a specific diagnosis for individual schools are enough to improve learning outcomes, even in low-performing schools. The positive impacts associated with the correct use of standardized test results at the school level represent a strong case for a census test. Proper management skills among principals and supervisors must go hand in hand with the results of individual schools, so that these skills can be put to the best use in the design of improvement strategies. Another major pending issue related to the use of results from standardized tests in Mexico is the scarcity of academic papers exploiting the vast information of ENLACE to contribute to the design of evidence-based educational policies. From the standpoint of empirical research, the database generated by ENLACE is a true gold mine in which researchers could follow the educational trajectories of all children and youth in Mexico. This information is invaluable for identifying the role played by different school inputs, socioeconomic contexts and personal efforts, among others, in the progress of learning outcomes.

Final Remarks

Mexico has taken major strides by using standardized tests as a tool to monitor the education system’s compliance with its mandate. The country has also made progress in the use of these tests to meet the State’s accountability to citizens. In my opinion, based on this progress, today’s debate is focused not on whether standardized tests should exist but rather on their features, implementation protocols and, above all, the way in which we want to use the results as a strategy to improve quality. The ENLACE experience provides several key lessons:

One of the major benefits in terms of accountability and the use of results as an input for improvements is related to the census nature of the test.
If standardized tests are successful in creating a virtuous cycle between accountability and citizen participation, then cheating incentives can emerge and the test should include means to prevent this.
Given expected cheating incentives and other claims associated with the difficulty of attribution, it is not a good idea to use student test scores for teacher evaluation.
No matter how good a measurement tool (test) is, using it to improve services depends on the skills of its users. If we want to fully tap the benefits associated with proper test use, implementation must be complemented with a training strategy for teachers, principals and supervisors.
Relevant decision-makers should publicly disseminate all information related to standardized tests (while strictly respecting data confidentiality) to promote its use in much-needed studies and research.

Recently passed constitutional reforms in education entrust the design and implementation of the country’s assessment system, including, among other things, the measurement of student achievement, to the National Institute for Education Assessment (INEE). INEE’s task is not a simple one. Fortunately, the Institute is not starting from scratch and can draw on the considerable experience that comes from applying the ENLACE test; even more relevant, INEE relies on high-level human capital with the skills to design a test expected to measure, inform and unleash the full potential of the education system.
