This study investigates students’ and adults’ performance in judging the reasonableness of computational results, that is, in reflecting on whether these results qualify as acceptable answers to mathematical tasks. Data were gathered via task-based questionnaires from 160 participants, evenly divided between fifth-graders and adults. Their responses to a systematically varied collection of context-free and context-based items shed light on their performance and strategies in relation to two interrelated aspects of reasonableness: number relationships and the effect of operations (

The

Drawing on previous theoretical discussions about the facets of reasonableness (Alajmi & Reys,

The aspects of judging reasonableness: internal and external reasonableness

The first criterion refers to the consistency of a computational result with expectations relevant to the size and properties of the involved numbers, the relationships between them, and the effects of operations on them. We name this criterion

The second criterion refers to the practicality of a computational result, namely to its

The internal and external reasonableness distinction fits well with that between routine and adaptive expertise (Hatano,

Routine and adaptive expertise in judging reasonableness

Routine expertise | Adaptive expertise | |
---|---|---|

Internal reasonableness | ||

which results may qualify as acceptable answers and may require critical reflection upon the known calculation techniques | ||

External reasonableness | ||

Past studies with a specific focus on students’ abilities for judging reasonableness are limited. However, there exists evidence suggesting that students at all school levels typically lack many of the competencies required for obtaining reasonable results. For instance, Alajmi and Reys (

Additional enlightening findings come from studies that concentrated on concepts closely related to the judgement of reasonableness, albeit without explicitly focusing on it. For example, Menon (

Overall, previous findings suggest that students tend to encounter numerous challenges in distinguishing between reasonable and unreasonable results, which span both internal and external reasonableness. Among the factors that appear to contribute to their unsatisfactory performance are a tendency to prefer unproductive algorithmic strategies over number-sense-based ones and a failure to make connections between mathematics and the real world.

Reasonableness is being increasingly considered by researchers in the field. However, research with an explicit focus on its underlying demands, such as the present study, remains scarce. Instead, reasonableness has been typically treated as one aspect among many others in studies elaborating on various topics, including number sense, sense making, and computational estimation (e.g., LeFevre et al.,

The present study aspires to contribute to the body of literature that investigates the intersection of the two topics (e.g., Alajmi & Reys,

A total of 160 participants were recruited through a combination of convenience and purposive sampling: accessible participants were approached in such a way that the sample was evenly divided in terms of age (80 fifth-graders, coded as S1-S80; 80 adults, coded as A81-A160) and gender (80 males; 80 females). The fifth-graders (average age: 10 years 10 months) were studying at two primary schools in Thessaloniki, Greece; their selection was conditional upon achieving a varied sample in terms of socioeconomic background and academic performance. The adult participants (age range: 18–64 years, 25% over 46 years of age) were likewise selected to represent varied educational backgrounds: almost all had graduated from middle school (98.7%), and a large proportion also held higher education qualifications (72.4%), including master’s (10%) and doctoral degrees (2.5%).

These age groups were chosen for two reasons. Firstly, in contrast to very young learners, fifth-graders have sufficient mastery of numbers and operations since they normally receive instruction on both natural and decimal numbers from first and third grades onward, respectively. However, instruction is in line with typical mathematics practices in Greece and focuses on fluent application and computation of the algorithms for the operations, while judging reasonableness is not included in the most recent primary school mathematics curriculum (Greek Ministry of Education [MINEDU],

Data were collected, following a cross-sectional design, through task-based questionnaires that included demographic questions and two tasks, each consisting of eight items. In all items, the participants were asked to provide an answer alongside an explanation of their thinking. We asked for both because reasonableness and correctness are interrelated but not equivalent: a result may be reasonable despite being erroneous. For example, 6364 is clearly an unreasonable result for 7 × 9092; in contrast, 63,654, although erroneous (the correct product is 63,644), might appear reasonable at first glance. In our design, we decided to avoid boundary cases like this: all correct results were also reasonable, whereas all erroneous results were also unreasonable for at least one apparent reason (e.g., in the previous example, 6364 < 9092). Moreover, the requirement for the justification of answers enabled us to know whether participants made appropriate judgements about the

The tasks were designed for the needs of the study, and each was targeted towards a different aspect of reasonableness. In Task 1 (internal reasonableness), participants were asked to decide whether eight computational results of horizontal multiplications and divisions were true or false and justify their answers (e.g.,

Both tasks included a systematic variation of items, based upon three binary classification conditions: (1) number type:

The 16 items of the questionnaire

Task | Number type | Correct computation result: multiplication | Correct computation result: division | Erroneous computation result: multiplication | Erroneous computation result: division |
---|---|---|---|---|---|
Task 1 | Natural numbers | 709 × 50 = 35,450 | 696 : 4 = 174 | 7 × 9,092 = 6364 | 10,800 : 9 = 12,000 |
Task 1 | Decimal numbers | 9.85 × 11.04 = 108.744 | 486.2 : 1.1 = 442 | 25.3 × 5 = 107.3 | 74.8 : 3 = 26.2 |
Task 2 | Natural numbers | 4 balls cost 80€. How much do 8 such balls cost? | 4 cereal bars cost 2€. How much does 1 bar cost? | 2 shirts need 4 hours to fully air-dry. How long does it take for 6 shirts to air-dry? | How many 30-seat buses are needed to transport a group of 315 travellers? |
Task 2 | Decimal numbers | 1 coffee bag weighs 0.4 kg. How much do 3 such coffee bags weigh? | 12.4 l oil is equally distributed among 4 containers. How many litres will each one of them contain? | At the age of 10 Nick is 1.30 m tall. How tall will Nick be at the age of 20? | 1 tissue pack costs 0.40€. How many packs can we get with 5€? |

The descriptions of the items in Task 2 have been abridged.
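To make the idea of internal reasonableness concrete, the following Python sketch applies one possible estimation check to the eight Task 1 items: each operand is rounded down and up to two significant digits, and the stated result is accepted only if it lies between the resulting bounds. This is an illustration under our own assumptions, not the coding scheme or instrument used in the study; the function names `two_digit_bounds` and `is_plausible` and the choice of two significant digits are hypothetical.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR


def two_digit_bounds(value: str) -> tuple[Decimal, Decimal]:
    """Round a positive number down and up to two significant digits."""
    number = Decimal(value)
    # Place value of the second significant digit (10 for 709, 0.1 for 9.85).
    step = Decimal(1).scaleb(number.adjusted() - 1)
    return (number.quantize(step, rounding=ROUND_FLOOR),
            number.quantize(step, rounding=ROUND_CEILING))


def is_plausible(a: str, op: str, b: str, claimed: str) -> bool:
    """Check whether a claimed result lies within rough estimation bounds.

    Assumes positive operands, as in all Task 1 items.
    """
    a_low, a_high = two_digit_bounds(a)
    b_low, b_high = two_digit_bounds(b)
    if op == "*":
        low, high = a_low * b_low, a_high * b_high
    elif op == "/":
        # The smallest quotient pairs the low dividend with the high divisor.
        low, high = a_low / b_high, a_high / b_low
    else:
        raise ValueError(f"unsupported operation: {op}")
    return low <= Decimal(claimed) <= high


# Task 1 items as (operand, operation, operand, stated result).
TASK_1_ITEMS = [
    ("709", "*", "50", "35450"),
    ("696", "/", "4", "174"),
    ("7", "*", "9092", "6364"),
    ("10800", "/", "9", "12000"),
    ("9.85", "*", "11.04", "108.744"),
    ("486.2", "/", "1.1", "442"),
    ("25.3", "*", "5", "107.3"),
    ("74.8", "/", "3", "26.2"),
]

verdicts = [is_plausible(*item) for item in TASK_1_ITEMS]
for item, verdict in zip(TASK_1_ITEMS, verdicts):
    print(item, "plausible" if verdict else "unreasonable")
```

Under these assumptions the check accepts the four correct results and rejects the four erroneous ones. It would also accept 63,654 for 7 × 9092, which mirrors the point that a result can appear reasonable while being erroneous.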

Participation in the study was voluntary and anonymity was guaranteed. Students were examined in their classroom during school time, while adults were examined at a place and time of their choice. All participants were examined individually and were not given a time limit to complete the questionnaire. The average time of completion was 40 minutes for students and 25 minutes for adults.

The participants’ mean correct response in the total number of 16 items was 12.54 (

Mean number of correct responses (max = 8) by task and age group

The type of numbers used in the items had a significant effect on correct responses (

The two-way interaction between type of numbers and type of tasks was found significant (

Last, the three-way interaction between type of numbers, type of tasks, and age was significant (

The main term of arithmetic operation was significant (

The interaction between operation and type of tasks was found significant (

The type of result was found to be a statistically significant variable in determining participants’ success (

The ANOVA produced a significant difference for the two-term interaction between type of result and type of task (

The type of result, type of task, and age interaction were significant (

Mean number of correct responses in both tasks by number set, operation, and computational result for the two age groups

Condition | Task 1: 5th graders | Task 1: adults | Task 2: 5th graders | Task 2: adults |
---|---|---|---|---|
Natural numbers | 3.54 (0.73) | 3.82 (0.41) | 2.28 (0.91) | 3.65 (0.64) |
Decimal numbers | 2.43 (0.85) | 3.19 (0.81) | 2.47 (0.99) | 3.70 (0.60) |
Multiplication | 3.09 (0.62) | 3.48 (0.57) | 2.17 (0.99) | 3.56 (0.82) |
Division | 2.87 (0.91) | 3.54 (0.61) | 2.58 (1.03) | 3.79 (0.47) |
Correct result | 2.45 (0.95) | 3.11 (0.93) | 3.44 (0.79) | 3.95 (0.22) |
Erroneous result | 3.51 (0.67) | 3.90 (0.34) | 1.31 (1.29) | 3.40 (1.02) |

Maximum correct score is 4; format: mean (SD).
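As a quick consistency check, the table values can be related to the overall group means reported in the discussion (10.71 of 16 for fifth-graders and 14.36 of 16 for adults). The sketch below assumes that each consecutive pair of rows splits the same 16 items into two halves (by number set, by operation, and by computational result), so that summing a pair across both tasks should recover the group total up to rounding. The name `pair_totals` is hypothetical, and the row grouping is an assumption drawn from the table caption.

```python
# Table means in row order; each tuple holds the four columns as printed:
# (5th graders, adults, 5th graders, adults).
ROWS = [
    (3.54, 3.82, 2.28, 3.65),
    (2.43, 3.19, 2.47, 3.70),
    (3.09, 3.48, 2.17, 3.56),
    (2.87, 3.54, 2.58, 3.79),
    (2.45, 3.11, 3.44, 3.95),
    (3.51, 3.90, 1.31, 3.40),
]

# Overall means out of 16 as reported in the discussion.
REPORTED = {"5th graders": 10.71, "adults": 14.36}


def pair_totals(rows):
    """Sum each consecutive row pair over both tasks, per age group."""
    totals = []
    for first, second in zip(rows[0::2], rows[1::2]):
        students = first[0] + second[0] + first[2] + second[2]
        adults = first[1] + second[1] + first[3] + second[3]
        totals.append((round(students, 2), round(adults, 2)))
    return totals


totals = pair_totals(ROWS)
for students, adults in totals:
    print(f"5th graders: {students:.2f}  adults: {adults:.2f}")
```

Each pair lands within 0.01 of the reported totals, which is the size of discrepancy that rounding the individual cell means to two decimals would produce.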

The explanations accompanying participants’ answers were analysed for themes revealing the strategies they used. For each task, one set of themes was generated; the themes varied in terms of sophistication, popularity, and efficacy.

Having excluded the

Mean strategy use (max = 8) and correlation between strategy use and performance by age in Task 1

Strategy | Mean use: overall | Mean use: 5th graders | Mean use: adults | Correlation with performance: 5th graders | Correlation with performance: adults |
---|---|---|---|---|---|
Without justification | 0.76 | 1.13 | 0.40 | −0.440** | −0.175 |
Algorithm | 2.26 | 2.21 | 2.30 | −0.080 | −0.174 |
Rules and properties | 2.03 | 2.41 | 1.65 | 0.236 | 0.001 |
Split | 0.83 | 0.41 | 1.25 | 0.249* | 0.308** |
Equivalent expression | 0.35 | 0.29 | 0.41 | 0.125 | −0.011 |
Computational estimation | 1.77 | 1.55 | 1.99 | 0.206* | 0.194* |

*Significant correlation at the 0.05 level, **significant correlation at the 0.01 level.

As shown in Table

Looking at the efficacy of the different strategies (Table

In Task 2, aside from the

Mean strategy use (max = 8) and correlation between strategy use and performance by age in Task 2

Strategy | Mean use: overall | Mean use: 5th graders | Mean use: adults | Correlation with performance: 5th graders | Correlation with performance: adults |
---|---|---|---|---|---|
Without justification | 0.47 | 0.71 | 0.23 | −0.534** | −0.007 |
Algorithm | 3.72 | 3.71 | 3.73 | −0.302* | −0.645** |
Guess and check | 1.93 | 2.23 | 1.64 | 0.191 | 0.262 |
Practicality | 1.88 | 1.35 | 2.41 | 0.420** | 0.559** |

*Significant correlation at the 0.05 level, **significant correlation at the 0.01 level.

Algorithm-based strategies were by far the most commonly used ones for both age groups. Interestingly, children and adults based their solutions on algorithms with strikingly similar frequency (means: 3.71 and 3.73, respectively). However, further analyses showed that children clearly preferred conventional ways of executing and presenting algorithms as they used written algorithms significantly more frequently (

Despite the clear dominance of algorithm-based solutions, it was found that this strategy was negatively correlated with performance (Pearson’s

In this study, we explored Greek fifth-graders’ and adults’ ability to judge the reasonableness of computational results in context-free and context-based tasks. Responding to the research questions of the study, results revealed three key findings.

First, the performance of adults (14.36/16) was substantially better than that of students (10.71/16) with the adults clearly outperforming students in both tasks. The relatively weak performance of students may be partially attributed to their limited chances for engagement with relevant mathematical activities prior to their participation in the study due to the lack of instructional attention to reasonableness in primary school (Greek Ministry of Education [MINEDU],

Turning to the competencies and difficulties of each age group, the no-answer rates and the justifications of answers indicated that adults found it easier to give sensible responses in Task 2, which required examining the meaning of numbers in the real world, than in Task 1, which revolved around number relationships and the effect of operations. However, the opposite was true for students, a finding that contrasts with Alajmi and Reys’ (

The second key finding centres upon factors that may facilitate or hinder solvers’ efforts to give reasonable answers. Unlike previous studies that reported consistently low performance across the different number domains (e.g., Alajmi & Reys,

The third key finding focuses on the range of the employed strategies alongside the frequency and the efficacy of each. Overall, two broad categories of strategies emerged: routine-based and sense-making strategies. Despite being less effective, the former clearly prevailed over the latter in terms of frequency. This result is consistent with findings of previous studies showing a general preference for the use of algorithmic techniques (e.g., Alajmi & Reys,

In Task 1, students typically resorted to the use of rules and properties of the involved numbers and operations. These were often misinterpreted, or unrelated to the particular task and insufficient for its solution, highlighting that students often do not make sense of the rules and algorithms they learn, which results in the misunderstanding of their meaning or their limitations (Markovits & Sowder,

The strategies used in Task 2 offer similar insights. The use of algorithms tended to be associated with an increased risk of giving inappropriate responses. The very effective adaptive strategy of filtering results through considering the task context was again much more popular among adults than among students. In general, students very often did not pursue connections between mathematics and real life, which is consistent with findings from past studies (e.g., Yang & Sianturi,

This study was subject to at least two limitations. First, concerning the sampling technique, although the combination of purposive and convenience sampling yielded a varied sample, it is unknown whether our participants are typical of Greek adults and fifth-graders. Second, turning to the design of the task-based questionnaires, the two aspects of reasonableness are deeply interrelated and thus cannot be completely separated from one another. To tackle this issue, we included only context-free items in Task 1; the absence of context eliminated the need for pursuing connections between the numbers and the real world, encouraging judgements based solely on the relationships between numbers and the effect of operations on them, namely internal reasonableness. In contrast, since external reasonableness was the focus of Task 2, this task included only context-based items that enabled reflection upon the meaning of results in real-life situations. The numbers and operations involved in Task 2 were kept fairly easy in order to eliminate the need for judgements based on internal reasonableness. Additionally, different items within the same task may sometimes have favoured the use of different strategies. For example, in the ÷ ✗ items of Task 2, it was probably easier to adjust the result of the algorithm than to avoid the use of algorithms altogether, while for the × ✗ items of the same task, the opposite was probably true. All of these design choices may have eliminated the gap that typically exists between the level of difficulty of multiplications and divisions, as well as between natural and decimal numbers. Thus, a selection of more complex items in terms of operations and numbers might shed light on this issue.
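The remark about adjusting the result of the algorithm in the ÷ ✗ items of Task 2 can be illustrated with a short Python sketch. In both items the bare quotient is not a whole number (315 : 30 = 10.5 and 5 : 0.40 = 12.5), and the context decides the direction of rounding. The function names are hypothetical, prices are expressed in cents to avoid floating-point issues, and the rounding directions follow an ordinary reading of the items rather than an answer key from the study.

```python
def buses_needed(travellers: int, seats_per_bus: int) -> int:
    """Round the quotient up: a partly filled bus is still a whole bus."""
    return -(-travellers // seats_per_bus)


def packs_affordable(budget_cents: int, price_cents: int) -> int:
    """Round the quotient down: a fraction of a pack cannot be bought."""
    return budget_cents // price_cents


# 315 travellers, 30 seats per bus: quotient 10.5, adjusted upwards.
print(buses_needed(315, 30))

# 5 euros, 0.40 euros per pack: quotient 12.5, adjusted downwards.
print(packs_affordable(500, 40))
```

The same division algorithm therefore leads to opposite adjustments in the two items, which is why a purely algorithmic answer is insufficient for either of them.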

Given the limitation regarding the generalisability of our results, future studies could recruit a larger number of participants through more refined sampling techniques. We also recommend exploring the performance and strategies of students in early primary grades, which have largely remained understudied. Finally, past studies have consistently revealed severe weaknesses in students’ understanding of reasonableness, which stresses the need for classroom-based intervention studies. This research direction can offer valuable insights into how students at different school levels can be appropriately introduced to judging reasonableness, as well as how they can best meet its multidimensional demands.
