8 Free-Response Results
This chapter presents detailed information about the free-response items, the coding schema, and the results. Free-response booklets were randomly interleaved so that the students who received these forms would be randomly selected when booklets were distributed. There was some concern that teachers, on realizing that a student who could not write well had received a free-response booklet, might give that booklet to another student who could write well. The evidence from the responses indicates that, while such a situation may have occurred in isolated cases, it was not a general concern. The free-response booklets all contained the same background information, the School Science attitude scale, a mathematics problem-solving scale, and the same set of core multiple-choice items as in the multiple-choice booklets. There were no significant differences among the four multiple-choice forms and the two free-response forms at any grade level on the variables common to the two types of forms. For example, the number of students who do not speak English at home, the School Science attitude scores, and the mean score on the core multiple-choice items all matched within acceptable sampling limits. This evidence gives credence to the assumption that the vast majority of forms were randomly distributed among the populations at all the grades.

8.1 Background and Design of the Codes

Free-response items were included in order to collect information on the processes and type of thinking which students applied in answering questions in mathematics and science. In this way, it was possible to examine some of the complex and multi-stage thinking which takes place in these subjects.
Through the design of two-digit codes or rubrics, it was possible, by the assignment of a single code to a question on a student paper, to collect information on the correctness of response, the method or approach used in the problem, and the misconception or error-type which may have been demonstrated. Rubrics for items selected from the Third International Mathematics and Science Study (TIMSS) had been developed internationally, and were based on empirical evidence from field trials in several countries. An international committee designed and developed final versions of each code. New codes had to be developed for items selected from earlier provincial assessments. The Contract Team developed draft versions of these codes from evidence of students' responses to those items. The rubrics were then validated by the coding teams at the beginning of the coding sessions.

Each two-digit code was designed so that the leading digit corresponded to one of three measures: the score (1, 2, or 3 depending on how many marks the question was worth), an incorrect response (a leading 7), or an omitted or frivolous response (a leading 9). The second digit in each case corresponded to which method was used or what type of error was made. An example is shown in Table 8.1.

Table 8.1 Two-digit Codes Used for Free-Response Items

8.2 Description of the Instruments

Two rotated forms were used at each of Grades 4, 7, and 10, and each form consisted of two parts. Part 1 included background questions and core multiple-choice items, 10 each at Grades 4 and 10, and 12 at Grade 7. Part 2 of each form consisted of a mixture of short-answer and extended-response items, 22 at Grade 4, 20 at Grade 7, and 14 at Grade 10. This resulted in a total of 44 free-response items at Grade 4, 40 at Grade 7, and 28 at Grade 10 across each pair of forms, evenly divided between mathematics and science. At each grade level, mathematics items were placed first, followed by science items, on one of the forms. The reverse order was followed on the other form. The content or process measured by items at each grade level is listed next. Mathematics and science are reported separately.

8.2.1 Mathematics Content
Given a potentially wide range of responses from students, it was important that exact procedures were developed for processing booklets and for training coders. To check that the training was effective and codes were assigned consistently, measures of reliability were calculated. Coding procedures are described next.

8.3.1 Preparation of Booklets and Reliability Checks

8.3.1.1 Batches and Header Sheets

All booklets of the same number were organized into batches of 50. A header sheet was prepared for each batch and a sequential batch number was assigned.

8.3.1.2 Reliability Coding

A 10% sample of student booklets was drawn from each batch and photocopies were made. The booklets were then reinserted in their original locations in the batches. Student IDs for booklets which had been photocopied were then recorded on the header sheet for later access. The chair proceeded to code the photocopied booklets, recording code allocations onto a Reliability Coding Form.

8.3.1.3 Calculating Reliability Coefficients

After the committee had coded the sample papers, clerical staff recorded codes onto the same Reliability Coding Form used earlier by the chair. Correlations were then calculated between codes assigned by the chair and other committee members.

8.3.2 Training of Coding Committees

A rigorous training procedure was employed with the coding committees. It involved training at both the general and specific levels. A description of the procedures follows.

8.3.2.1 General Session

Coders from all grade levels met and the following topics were covered: overview of the assessment, description of the two-digit coding scheme, display and discussion of sample responses and assigned codes, and affirmation that international codes could not be adjusted.

8.3.2.2 Committee Specific Sessions by Grade/Subject

Examples, which were grade specific, were reviewed and an opportunity was provided for committee members to practise and then discuss coding exemplars.
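The reliability check in 8.3.1.3 compares the codes assigned by the chair with those assigned by other committee members on the same photocopied booklets. The report does not state the exact statistic used, so the following is only a minimal sketch that assumes a simple percentage of responses on which the leading digits match. The function name, the string representation of the codes, and the sample data are all invented for illustration.

```python
def first_digit_agreement(chair_codes, coder_codes):
    """Return the percentage of responses whose leading digit matches.

    Assumption: each code is a two-character string such as "21" or "70",
    and both lists are in the same response order.
    """
    if len(chair_codes) != len(coder_codes):
        raise ValueError("code lists must be the same length")
    if not chair_codes:
        raise ValueError("no codes to compare")
    # Compare only the leading digit (score, incorrect, or omitted).
    matches = sum(a[0] == b[0] for a, b in zip(chair_codes, coder_codes))
    return 100.0 * matches / len(chair_codes)


# Invented example: five responses coded by the chair and by one coder.
chair = ["10", "21", "70", "99", "11"]
coder = ["11", "21", "71", "70", "11"]
agreement = first_digit_agreement(chair, coder)
print(f"First-digit agreement: {agreement:.1f}%")
```

In this invented data the second digits differ on the first and third responses, but those still count as agreements because only the leading digit is compared.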
8.3.2.3 Calibration Round

During the calibration round, a set of sample student booklets was coded individually by each committee member and assigned codes recorded onto a calibration form. Booklets were passed around the committee until all items from each booklet were coded by each member. Results were then discussed and coding differences resolved.

8.3.2.4 Beginning Coding

Coders were divided into groups of two or three; one batch of 50 papers was allocated to each group. Coders worked independently but discussed issues with partners. Where agreement could not be reached, the chair mediated the issue. Interesting responses were flagged by committee members.

8.3.3 Inter-Rater Reliability

To monitor how consistently the committees assigned codes to student responses, inter-rater reliabilities for first-digit coding agreement were calculated for each booklet number. Results are shown in Table 8.2. Second-digit reliability only has meaning at the individual item level, since the same code (e.g., 71) on two items could mean different types of incorrect student responses. Second-digit reliabilities are therefore not reported.

Results show a very high rate of consistency in the assignment of codes to student responses. For example, all first-digit reliabilities were greater than 82, with the majority between 91.2 and 98.3. First-digit reliabilities were higher for mathematics (94.9 to 98.3) than for science (82.7 to 92.2). This result is likely due to more precisely described rubrics in mathematics, where numerical answers were more definitive. These results verify that the coding of student responses on free-response items was highly reliable. The training and monitoring procedures proved to be very effective.

Table 8.2 Inter-Rater Reliabilities by Booklet Number

8.4 Mathematics Achievement Results

8.4.1 Grade 4 Results

The two Grade 4 booklets contained a total of 22 mathematics items: 14 were assigned a weighting of one mark, 6 two marks, and 2 three marks.
This resulted in a total mark allocation of 32, and the Mean Percent Correct across all items was 58.1%. In calculating this result, part marks were included for items with weightings of two or three. On the one-mark items, Item Percent Correct values ranged from 41% to 82%. The Mean Percent Correct for all of these items was 61.9%. For the multi-mark items, the mean percent of students who received full marks was 44.9%. An average of 16.9% of the students received part marks on the questions.

The items clustered into five major categories plus an "other" group. These categories, with corresponding numbers of items, are listed next: place value and magnitude of numbers: 3; graphs and tables: 4; length and area: 2; numbers and patterns: 7; fractions: 3; and other (shapes, conversions, and time): 3. Results for these categories are not intended to be aggregated to produce mean percentage values. Rather, they serve as organizers for discussing item-level results and general topic coverage.

8.4.1.1 Examples of Items and Responses

Examples, showing one item from each of the discussion categories, follow. Student responses relative to correctness, approach, and misconception or error are discussed.

8.4.1.1.1 Place Value and Magnitudes of Numbers

[Example item]

This item was answered correctly by 66% of the students. The most frequent incorrect answer was "7," given by 12% of the students. These students confused place value and the order of terms. For example, the correct answer, which was "700," is located in the hundreds position in the number on the right-hand side of the equation, which is located three columns to the left of the decimal point. The term "7" is also located in the third position to the left on the left-hand side of the equation. However, it is the third term to the left of the equal sign, rather than the third column to the left of a decimal point. Other incorrect responses, totalling 9%, were spread across a variety of answers.
The proportion of students who did not answer this question was 11%, with another 1% providing either crossed-out or illegible answers.

8.4.1.1.2 Graphs and Tables

The item on the next page was worth two marks. It was answered correctly, for full marks, by 61% of the students. Part marks, one out of two, were earned by 19%, and another 6% gave an incorrect answer. Almost all (97%) of the students with full marks provided an illustration with all four bars correct for height, placement, and shading. The other 3% who earned full marks provided a complete answer which was correct with the exception of a difference in shading or placement of no more than one set of bars. Since the item was only worth two marks, this variation was permitted for a score of two out of two.

Students who earned one mark out of two were fairly evenly divided on the type of error or omission they had made. For example, 10% of all students had the placement, shading, and height correct for one, two, or three bars (at least one bar was completely correct). Another 9% earned one mark by showing all four bars of correct height, but showed two or more errors involving placement or shading.

[Example item]

Those who did not earn any marks on this question gave responses which fell into one of the following categories: work is shown, but no bars are drawn, 1%; some other incorrect response, 5%; crossed-out or illegible, 2%; and blank, 13%.

8.4.1.1.3 Length and Area

Item J1, below, was answered correctly by 65% of respondents. Another 29% gave an incorrect response, a very few (0.4%) gave an illegible answer, and 6% left it blank. Since this item involved the estimation of length, answers within a certain range were considered correct. Three correct responses which were coded within the range of tolerance were as follows: 4, 5, and other numbers within the interval 4 < x < 5.5. The proportions of students giving these answers were 24%, 38%, and 4% respectively.
Five categories of incorrect responses were coded. These categories and the corresponding percentage of students providing them are shown next in Table 8.3.

[Example item]

Table 8.3 Categories of Incorrect Response to Item 1, Form 4J

8.4.1.1.4 Numbers and Patterns

Sixty-four percent of the students answered item K16 correctly. Incorrect responses were given by 27%, illegible or crossed-out answers by 1%, and no response by 8%.
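Breakdowns of this kind follow from the leading digit of each two-digit code: 1, 2, or 3 for a score, 7 for an incorrect response, and 9 for an omitted or frivolous response, as described in Section 8.1. The sketch below shows how such percentages could be tallied. It assumes codes are stored as two-character strings; the function names and the sample data are invented, and it does not separate illegible answers from blanks because the report does not say how the second digit distinguishes them.

```python
from collections import Counter


def classify(code):
    """Map a two-digit code to a response category using its leading digit."""
    first = code[0]
    if first in "123":
        return "credit"                 # leading digit is the score earned
    if first == "7":
        return "incorrect"
    if first == "9":
        return "omitted_or_frivolous"
    raise ValueError(f"unexpected leading digit in code {code!r}")


def category_percentages(codes):
    """Return the percentage of responses falling in each category."""
    counts = Counter(classify(code) for code in codes)
    total = len(codes)
    return {category: 100.0 * n / total for category, n in counts.items()}


# Invented codes for ten student responses to a single one-mark item.
sample_codes = ["10", "11", "10", "12", "10", "11", "70", "71", "72", "99"]
percentages = category_percentages(sample_codes)
print(percentages)
```

Keeping the second digit in the stored code means the same data can later be tallied by method or error type for an individual item.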
Correct responses were classified into three categories. The first related to cases where students reported that the number decreases by 4, including the use of terms like minus, subtract, take away, and less. This type of response was given by nearly half (43%) of the students. Another correct response category included numerical answers such as either the number 30 on its own, or the sequence 30, 26, 22, .... This type of response was given by 17% of the students. The third category, given by 5%, included some combination of the first two, such as 34 - 4 = 30.

Incorrect responses were of three types: answers which indicated that the next number increased by 4; answers focussing on the number 4, with no indication of an increase or decrease; and other incorrect answers, showing, for example, the wrong pattern of numbers. These types of responses were given by 4%, 6%, and 17% of the students respectively.

8.4.1.1.5 Fractions

[Example item]

About two-thirds (65%) of the students gave a correct response to question K16. It was answered incorrectly by 17% and omitted by 16%. Since many answers could be correct, the coding rubric identified the following types: a fraction with numerator greater than 2 and denominator equal to 7; a fraction with numerator equal to 2 and a denominator less than 7; 3/8; 1/2; and other correct fractions. Corresponding percentages of students who gave these answers were 47%, 0%, 6%, 2%, and 11% respectively. A number of incorrect responses, consisting of specific fractions, were anticipated. They were 1/7, 4/14, 2/8, and other incorrect answers. Respective responses matching these were 1%, 1%, 6%, and 9%.

8.4.1.1.6 Metric Measure

[Example item]

Students did not do as well on this question as may have been expected. Only 41% answered correctly, 44% gave an incorrect response, 1% wrote an illegible answer, and 14% left it blank. Almost all students who answered correctly gave the answer as 1000.
A few (0.4%) wrote the answer in words, as either thousand or one thousand. Incorrect answers, with the corresponding percent of students who gave them, were as follows: 10 (4%), 60 (1%), 100 (18%), 10 000 (2%), and other incorrect (20%).

8.4.1.2 General Comments and Recommendations

Overall, students did relatively well on most questions, demonstrating that they tried hard and made serious attempts. Achievement in mathematics ranged from poor to excellent, with overall results being satisfactory. The following observations were made about the nature of some of the responses and misconceptions which were demonstrated:
Given these results, it is recommended
8.4.2 Grade 7 Results

At the Grade 7 level, there were a total of 20 free-response mathematics items across two forms. Fifteen questions were assigned a weighting of one mark, three were assigned two marks, one was assigned three marks, and one was worth four marks. This resulted in a total of 28 marks for free-response items. The Mean Percent Correct for these items was 60.5%. On the one-mark items, students scored an average of 59.7%. Across the multi-mark items, an average of 56% of the students earned full marks, and 15% were assigned part marks.

Items clustered into five discussion categories: patterns and numbers, measurement, decimals, fractions, and other. Under patterns and numbers, questions dealt with number sequences and combinations, place value and magnitude, and operations. Questions involving measurement included those testing student understanding of area and perimeter, the relationship between the areas of parallelograms and rectangles, estimation of the areas of triangle sequences, and the estimation of length. In decimals, students were asked questions related to operations and conversions to common fractions in lowest terms. Items on fractions involved a fraction as a portion of a whole, operations, and relationships between order and magnitude. Three items were clustered under the heading of other. One involved data representation with a pictograph, another involved a word problem referring to time, and the third dealt with spatial visualization through a pattern and transformation.

8.4.2.1 Examples of Items and Responses

Examples, showing one item from each of the discussion categories, follow. Student responses, relative to correctness, approach, and misconception or error, are discussed.

8.4.2.1.1 Patterns and Numbers

[Example item]

Item K13 was answered correctly by 53.7% of the students in Grade 7. The same item was administered at Grade 4, with a Mean Percent Correct of 50.7%. This reflects a modest increase in performance on this item between the grades.
More than a third (38%) answered incorrectly, 1% gave illegible answers, and 6% left the question blank. The incorrect answer most frequently given was "1," which appeared on 14% of the papers. In this case students simply selected the smallest of the numbers. Another 6% gave "17" as their answer. In this case students summed the numbers.

8.4.2.1.2 Measurement

[Example item]

The above item (K17) was answered correctly by 38% of the students. Another 54% gave an incorrect answer, 3% gave responses that were illegible or crossed out, and 6% left it blank. Among the incorrect answers were 30, 18, and 26. The most frequently occurring one was 30, given by 11% of the students. In this case, students did not answer the question asked: they found the area of the rectangle rather than that of the parallelogram inside it. The second most popular incorrect answer was 18, which appeared on 5% of the papers. Students giving this answer likely found the areas of the triangles located on the ends of the rectangle by multiplying their altitudes times their bases, rather than taking one-half of their products, before subtracting their areas from the area of the larger rectangle.

8.4.2.1.3 Decimals

[Example item]

This item was answered correctly by 52% of the students. It was a straightforward application of multiplication by decimals, without use of a calculator. However, 45% gave an incorrect answer, 1% gave an answer which was illegible, and 3% left it blank. The most common type of error involved misplacing the location of the decimal point in the answer, given by 13% of the students. Of these, most (5% of the total) placed the decimal after the second digit in their answer. Another 6% miscalculated one of the digits in their final answer.

8.4.2.1.4 Fractions

[Example item]

Almost two-thirds (62%) of the students answered item J4, above, correctly. This question was left unanswered by 6%, and 32% answered incorrectly.
The most frequent error, which appeared on 13% of the papers, showed five squares shaded. These students either did not understand the concept of a fraction as part of a whole or else just shaded in the number in the numerator of the fraction, assuming the denominator corresponded to the total number of squares in the figure. The second most frequent error, which occurred on 4% of the papers, showed either 14 or 16 squares shaded.

8.4.2.2 General Comments

Students in Grade 7 answered most questions quite well, as reflected by the Mean Percent Correct score of 60.5% across all of the open-ended items. However, based on all of the responses, the coding committee felt that the following areas needed more attention:
8.4.3 Grade 10 Results

A total of 14 mathematics items were contained on the two forms administered to the sample of Grade 10 students. All of the items were TIMSS items which have not been released at this point. Eight of them were assigned a single mark, five were worth two marks, and one was worth three marks. This came to a total of 21 marks for all items, and the Mean Percent Correct was 45.8%. On the single-mark items alone, the average mark was 46.6%.

The items measured outcomes in four discussion areas: data analysis, number patterns and operations, measurement, and algebra. Data analysis consisted of three items: interpreting a line graph, reading a table and estimating, and interpreting a bar graph with a broken scale. There were five questions on number patterns and operations. Three were word problems involving operations with numbers, one involved number patterns, and the fifth dealt with a number sequence. Of the four items on measurement, one involved the area of a rectangle containing an excluded region, another involved perimeter and area, a third involved the measure of an angle, and the fourth dealt with the area of a parallelogram inscribed in a rectangle. Of the two items measuring algebra, one required students to solve a linear equation, and the other asked them to derive an equation given a linear graph showing the relationship between two variables.

Students did well on an item presenting a graph showing the speed of a car, which changed during different time intervals. The item had two parts, and the Mean Percent Correct values were 92.1% and 64.1% on parts a) and b) respectively.

Many students did not demonstrate an understanding of the concept of ratio. Students also found very difficult an item requiring them to find the area of a figure consisting of a rectangle, less the area of an excluded region within its boundaries. It was worth two marks; 28% of the students earned full credit, and 1% received one mark.
Another 45% gave an incorrect answer, 6% either crossed their answer out or gave an illegible response, and 20% left it blank.

Students did poorly on an item requiring them to determine the equation of a linear relationship, given its graph. Only 11% gave a correct answer, and 31% responded incorrectly. A large number of students either crossed out their work (25%) or else left the question blank (32%). These results show that a large majority of students are not able to determine the equation of a linear relationship, given its graph. Considerable time is allocated in the curriculum to graphing linear equations in the coordinate plane. However, it is likely that little time is spent in the reverse situation, where given the graph, students are asked to determine its equation.

8.4.3.1 General Comments and Recommendations

A considerable number of Grade 10 students did not respond to the questions seriously. This resulted in low performance overall, as shown by the Mean Percent Correct of only 45.8%. In many cases, there were no attempts to answer questions; in others, written comments were incoherent or inappropriate. Of those who made serious attempts to answer questions, many left out steps or approaches to questions in their answers. Among the concepts on which students did poorly were the following:
Given these results, it is recommended
There were a total of 22 free-response, or open-ended, science items in Grade 4. These items were placed on separate forms: 4J and 4K. Twenty of the free-response science items were taken from the 1991 assessment, and TIMSS provided two items. The free-response items were scored using the TIMSS rubrics. All items were scored on a 0-to-1 scale of correctness with partial marks allowed. This coding system is different from that used in 1991; comparisons of achievement are therefore not possible.

The coding team made several observations about student participation and achievement on the free-response items. Students seemed generally sincere and conscientious in their attempts to answer the questions but at the Grade 4 level did not appear to have learned test-writing skills and often did not explain their answers fully. Students often wrote vague or single-word answers. Science achievement ranged from poor to excellent, with an overall rating of low average. The lowest level of understanding was demonstrated in the physical sciences and the highest in the biological sciences. Grade 4 students displayed high levels of achievement in both earthquake preparedness and concern for the environment. The highest levels of achievement seemed to be related to science concepts that students encounter in everyday life, those concepts that are part of their past experiences.

There were, however, some specific instances of student misconceptions about scientific phenomena. For example, there were many cases of misconceptions about physical science topics such as magnets and sound vibrations. Item J15, below, elicited many drawings of fridge magnets with the poles not clearly defined. Correct lines of force were virtually never shown, but wiggly lines were drawn to indicate attractive force.

[Example item]

The most common response (55%) given to explain "why one gets a higher sound with a shorter ruler" (see item J18 below) was the size of the ruler.
Nearly a quarter (23%) of the students left the answer blank. Students are relating sound to size with little understanding of the concepts related to sound. Both of these items would require past experience with hands-on exploration, again pointing out the need for more experiential science programs.

[Example item]

Several items (K7 and K9 are shown next) had an environmental theme. Responses often included lobbying and demonstrating based on emotional appeals, rather than rational thinking based on scientific knowledge or understandings. While there was a high level of concern for the environment displayed, the lack of rational thinking skills was again apparent.

[Example item]

Generally, the coding team recommended more consistency between "work asked to be shown" and "coded for" as sometimes work was asked for and ignored in the coding, while other times it was coded and not asked for. The question and scoring rubrics were often inconsistent. The coding team also suggested that teachers give more advice on writing tests, such as writing more complete explanations.

A strength was noted in earthquake preparedness. The students seemed well-drilled and practised. A weakness in experimentation skills was apparent in item J14 (shown below), where students were asked to improve the experiment. Many students suggested ways to make the plant grow better, rather than controlling a variable as an option. This response reinforces earlier findings regarding the lack of hands-on activities with plants.

[Example item]

The coding team highlighted the following science curriculum areas and skills as not being well covered by some schools and districts: physical sciences, scientific vocabulary, and experiments. Students are weak in reading and interpreting questions, organizing notes or information, and reading for information and understanding. Questions are often restated but not answered. Students had difficulty deciding on one important fact and often gave two or three.
The use of headings to organize information is misunderstood. The strengths identified by the coding team support earlier findings that students do well when they have received meaningful instruction or teaching and when they can draw on past experiences with materials to help them answer the test items.

8.5.2 Grade 7 Results

All 20 free-response science items for Grade 7 were selected from the TIMSS item bank. In several questions, more than one correct answer was coded. For the purposes of reporting, the Mean Percent Correct value is an aggregate of all correct responses to that question. The coding protocol allowed for the coding of partially correct responses. These have been noted where appropriate. A few items had more than one question. In these instances each question has been reported separately (e.g., J11a, J11b).

Item K4 (shown below) was an interesting anomaly in the questions. This was the only item which a very significant number of students (45%) did not attempt to answer. It is even more surprising given that this question gave students a starting point of 0 degrees and asked them to predict and explain what was going to happen. Unless students start generating very creative answers, there are only three possibilities: the temperature goes up, down, or stays the same.

[Example item]

A weakness was also noted with question K7, where students were asked about the conversion of electrical energy to light energy. Only 3% of students responded correctly. A much larger number of students (33%) were able to answer in a partially correct way, but this still points to a general weakness in students' ability to respond to this topic.

[Example item]

Students' ability to design an experiment was also less than desirable. Only 20% were able to do so successfully when asked to design an experiment which showed how the heart rate fluctuates with exercise. A further 26% were able to respond in a partially correct way.
Even considered together, fewer than half of the students were able to respond with any degree of correctness.

The teachers who coded these items did make some comments and recommendations. They noted that, while some students answered the questions in good paragraph form, many others displayed poor sentence structure, poor spelling, and poor paragraphing skills. The questions on the topics of ecology, astronomy, and biology were felt to be generally well answered, while students' knowledge of the topics of physics, energy, and chemistry appeared to be weak. It is recommended
8.5.3 Grade 10 Results

There were eight science items from the 1991 assessment and six items from TIMSS on the Grade 10 free-response forms. Since the six TIMSS items in Grade 10 have not been released for reporting at this point, the discussion regarding Grade 10 student performance in science is somewhat brief.

The free-response coding team made several observations about student responses on the open-ended items. "Misinterpretations" were reportedly more common than "misconceptions." Few students were able to describe experiments well. They also had difficulty explaining comparisons. Furthermore, students had an even greater difficulty putting explanations into a more scientific context. This was indicated by the more superficial answers evident throughout the responses on this element of the science assessment. Generally, there was a decided lack of uniqueness in approaches to the open-ended items.

Scientific misconceptions appeared more often when students were requested to provide written explanations. There were, however, some specific instances of student misconceptions about scientific phenomena. For example, there were many cases of misconceptions about boiling in the item which dealt with a boiling kettle. The answer keys provided usually did not account for responses based on misinterpretations. For example, in that particular item, students who missed the idea that the kettle was already boiling gave the response: "It will heat up and begin to evaporate, never going over the boiling point." The keys were also problematic where students gave more explicit answers than others and yet tended to end up with the same score.

The results on another question prompted the coding team to comment that students do not have a clear understanding of half-life. Results on other items indicate that students all seem to have experience with graphs, but many focus on details without showing trends in data evident with a graph.
They also have a strong sense of the importance and relevancy of communications in their lives.

The coding team summarized their comments about student participation and achievement as follows. There is a large variation in how seriously and earnestly students approached the assessment. A number of students put a lot of careful thought into their responses and apparently brought ideas that had been part of their school science activities to the assessment. On the other hand, other students tended to be extremely flippant in their responses. They did not appear to take this written assessment very seriously. The papers tended to be grouped, and so the team wondered whether the approach to the administration of this assessment was a factor.

The coding team felt that there was a "huge" difference in terms of student attitudes toward this assessment between the two forms (V and W). In general, the "W-form-students" were more likely to take on the challenge of the questions than their "V-form" counterparts. The chairperson's report also indicated that the markers' responses to the keys for the two forms echoed these comments: the keys for these items did not account well for the type of student responses. It should be noted that the chairperson's comments for the Mathematics 10 component of the free-response items also indicated a lack of seriousness on the part of students.

8.5.4 General Recommendation

It is apparent that many Grade 10 students did not treat the assessment seriously. This problem has been detailed in Chapter 2 and will not be discussed further here. However, it is recommended