2 Design and Methodology
The 1995 British Columbia Assessment of Mathematics and Science was designed to measure student achievement in, and attitudes toward, mathematics and science at three levels of the system: Grade 4, Grade 7, and Grade 10. Further, it was intended to examine teachers' backgrounds, attitudes, and classroom practices. 2.1 Participants in the Study Several committees and panels were involved in the study. They included a Contract Team, a Technical Review Committee, Item Validation Committees, an Item Selection Committee, Free-Response Coding Committees, and Interpretation Panels. A description of each follows. 2.1.1 Contract Team The Contract Team included the following members: a faculty member from UBC's Science Education Department, the executive director of Applied Research and Evaluation Services at UBC, a school district official with expertise in mathematics education and testing and measurement, a mathematics classroom teacher, and a doctoral candidate with expertise in science education and testing and measurement. This group was well qualified in the area of testing and measurement, and had extensive experience in program evaluation and the fields of mathematics and science education. It was the responsibility of the Contract Team to design the assessment, develop the instrumentation needed to address the assessment's objectives, undertake the scoring and analysis of the data, and write reports on the results. For a list of the members of the Contract Team, see Appendix A. 2.1.2 Technical Review Committee There were six members on the Technical Review Committee (see Appendix A): three representatives from the Contract Team and three from the Examinations and Assessment Branch of the Ministry of Education. This committee reviewed design and rotation schemes for the instruments, established administration and student writing times, developed administration instructions, reviewed overall timelines, and discussed technical issues related to scoring and analysis. 
2.1.3 Item Validation Committees Six different Item Validation Committees were formed (see Appendix A): one each for Grades 4, 7, and 10 mathematics and one each for Grades 4, 7, and 10 science. Item Validation Committees were composed of practicing teachers from throughout the province. Members of these committees reviewed potential achievement items and rated them for appropriateness, based on content and suitability, for each respective grade level. These committees also classified items by cognitive behaviour level. The pool from which these committees made their selections consisted entirely of achievement items either from earlier provincial assessments or from the Third International Mathematics and Science Study (TIMSS). All items in the pool possessed acceptable psychometric properties. 2.1.4 Item Selection Committee The Item Selection Committee (see Appendix A for members) reviewed ratings of the Item Validation Committees to determine which items had been rated as appropriate to use in the assessment. The committee then selected items from the remaining pool to reflect content/process and cognitive behaviour level weightings as described in each respective table of specifications (see Section 2.4.1.3.2). The items were then placed on six different forms (four multiple-choice forms and two free-response forms) for each grade. Content and difficulty levels across forms were balanced as much as possible. 2.1.5 Free-Response Coding Committees There were six different Free-Response Coding Committees: one each for Grades 4, 7, and 10 mathematics and one each for Grades 4, 7, and 10 science. It was the task of these committees to code student responses to the free-response (open-ended) items for three aspects of student response: correctness, method or approach, and misconception or error-type. Members of these committees were experienced teachers of these grades. For a list of committee members see Appendix A. 
2.1.6 Interpretation Panels

2.1.6.1 Overview

In October 1995, Provincial Interpretation Panels for mathematics and science met separately, each over a three-day period: the mathematics panels met early in the month and the science panels late in the month. There were seven panels: one each for mathematics and science at Grades 4 and 7, and one each for Mathematics 10, Mathematics 10A, and Science 10. Each panel consisted of 10 to 13 members plus a chairperson from the Ministry of Education. In addition, at least one member of the Contract Team observed each Interpretation Panel. To get as broad an interpretation of the findings as possible, the panels consisted not only of practicing teachers at the various grade levels, but also of school administrators, parents, school trustees, and members of the business community with an interest in mathematics or science.

2.1.6.2 Procedure

All of the panels operated in the same way over the three days of interpretation. To begin with, each member of a grade-level panel was given all of the multiple-choice items that were presented to the students in that grade. Rather than giving copies of the student booklets to the panelists, the items from the four booklets at each grade were organized by the objectives and goals listed in the table of specifications (see Section 2.4.1.3.2). Each panelist therefore received all of the items designed to measure each objective of the curriculum for that subject and grade. Panelists were then asked to rate each item as easy, average, or hard for students at that grade level. They worked individually and made their decisions based on their knowledge of the school system, the curriculum, and the students at the given grade level. Once individuals had rated the items, they were grouped into teams of two to four people and worked to reach consensus on the rating for each item.
After consensus had been reached in these small groups, the entire panel for each grade came together to arrive at a summary consensus on each item. This rating of the difficulty of items provided a perspective for the panel members to complete their next task--determining the levels of student performance they would consider to be acceptable and desirable. Again individually, panelists looked at the sets of items measuring each goal. This time, however, they were asked to estimate two potential scores on each item set, taking into consideration the difficulty level of each set of items. The first score to be estimated was what the panelist considered to be the minimally acceptable number of items correct for a given set of items. The second score was the desirable number of items correct that the panelist believed appropriate for that grade level on that item set. When panelists had individually completed these two tasks for all the sets of items measuring each of the objectives, they again met in their small groups in order to reach consensus, following which they met as a total panel to arrive at consensus. The final outcome of this stage of the process was that a minimally acceptable and a desirable score for each objective and goal was set by the panel. The panels were then given the actual item results for each of the objectives. By comparing the actual results to their minimally acceptable and desirable scores, the panels came to a final consensus on a rating for each of the objectives on a five-point scale of weak, marginal, satisfactory, very satisfactory, or strong. This process is different from that used in previous assessments. In the past, panel members were sent the actual student forms well in advance of the meetings. Before panelists arrived, they would complete all the student booklets as well as a form which asked them to set acceptable and desirable percentage correct scores for each item. 
As a result, by the time interpretation panels were held, each panelist was very familiar with the test booklets and the nature of the items. During the meetings, then, the panels spent three of their five days coming to small group and panel consensus on the same rating scale of weak to strong. One day was allotted for reviewing non-achievement results, and one day for identifying representative items for various levels of achievement. These activities did not take place in the 1995 panels.

2.2 Target Populations and Timelines

2.2.1 Student Populations

The populations in the study included all students in British Columbia, in either public or funded independent schools, enrolled in Grades 4, 7, or 10. For purposes of the mathematics analyses, students at the Grade 10 level were divided into two categories: those enrolled in Mathematics 10 and those enrolled in Mathematics 10A. An attempt was made to include students who were absent during the testing period by having them complete an assessment instrument upon their return to school. The only exceptions permitted were for students with special needs for whom testing was judged to be inappropriate.

2.2.2 Teacher Populations

All teachers of Grades 4 and 7 mathematics and/or science were included in the study. At the Grade 10 level, however, only samples of teachers of Science 10, Mathematics 10, and Mathematics 10A received questionnaires.

2.2.3 Timelines

The assessment was administered during the latter part of May 1995, and schools were given a week within which to administer the instruments. The collection and sorting of responses, as well as the coding of free-response items, took place during the month of June 1995. Scoring and analysis were completed during July and August 1995, and interpretation panels were held in October 1995. This report summarizes the data collection, analysis, and interpretation activities of the assessment.
In August 1996, school districts and schools received individual reports of the multiple-choice data by reporting categories, plus student background and attitude data. Results from the free-response forms are reported for the province only, as are the teacher questionnaire results.

2.3 Sampling Procedures and Returns

2.3.1 Nature of the Sample

All students (approximately 50,000 per grade) in Grades 4, 7, and 10 in public schools and funded independent schools constituted the target populations. In each grade, the four multiple-choice forms were interleaved to provide randomly equivalent samples of students when distributed. The two free-response forms for each grade were also inserted at intervals, but only often enough to yield a distribution of 600 of each free-response booklet, with expected returns of at least 400, sufficient to provide provincial results with reasonable levels of accuracy.

2.4 Assessment Instruments

For each of the three grades, student questionnaires, student achievement instruments, and teacher questionnaires were developed. Each of these is described for each grade in the following sections.

2.4.1 Student Instruments

2.4.1.1 Overview

A total of six different forms (four randomly rotated multiple-choice forms and two free-response forms) containing both mathematics and science items were produced for each grade; each form consisted of two parts. Part 1 of each form, the student questionnaire booklet, contained background and attitude questions and scales and, in Grades 4 and 7, a number of core items common to all forms; in Grade 10, the core items were placed in Part 2. The achievement items were placed in Part 2 of each form. The vast majority of students completed multiple-choice booklets; only one in 40 students completed a free-response booklet.

2.4.1.2 Student Questionnaires

2.4.1.2.1 Introduction

The Part 1 booklet of each form consisted mainly of background questions and attitude scales.
The questions and scales used in the 1995 assessment were selected from three sources: the 1990 Provincial Mathematics Assessment, the 1991 Provincial Science Assessment, and TIMSS. Three broad categories of items were included in the student questionnaires: general background items, mathematics-specific items, and science-specific items. The general items were placed in the Part 1 booklets of all forms and were completed by all students. To reduce the time required of students to complete the student questionnaire portion of the assessment instruments, most of the mathematics specific and science specific items were distributed across the booklets; hence, only a sample of students completed them. The student questionnaires were similar across all three grades; however of necessity, modifications were made to accommodate the differences in age-levels and educational programs. Overviews of three categories of items included in the student questionnaires are presented below. 2.4.1.2.2 General Survey Items All students at the three levels completed items designed to collect background information about them. These questions asked them to indicate their age, gender, how long they had lived in Canada, the extent to which English was spoken at home, whether they were in an English- or a French-language program, whether they had a calculator or computer at home, and how much time they spent on a variety of outside-of-school activities. Students in Grade 10 were also asked to indicate how their mathematics and science courses were organized in the school (semester or full year), whether or not they worked at a part-time job, and their future educational and vocational plans. Students' responses to the general questions are reported in Chapter 3 of this report. 
2.4.1.2.3 Mathematics Survey Items Items designed to learn more about students' attitudes toward mathematics and their classroom learning experiences were rotated across the Part 1 booklets of the forms; they are described below. 2.4.1.2.3.1 Attitudes Toward Mathematics Items Items pertaining to students' attitudes toward mathematics were included in the assessment because several previous studies have indicated that students' attitudes toward mathematics are related to achievement (Aitken, 1976; Phillips, 1973; Taylor, 1988; Taylor & Robitaille, 1987), and students' perceptions of mathematics are important outcomes of the program. To collect information in these areas, students were asked about the importance of mathematics in the everyday world and their perceptions of the status of a number of major topics in the mathematics curriculum. Students in all three grades responded to three questions related to mathematics and work (Form A for Grades 4 and 7, Form R for Grade 10). They were asked their opinion about the necessity of knowing mathematics in order to get a good job, whether most people used mathematics in their jobs, and if they would like a job where mathematics was needed. A five-point Likert-type scale was used with the response options: strongly agree, agree, undecided, disagree, and strongly disagree. Scales used in both the 1985 and 1990 provincial mathematics assessments (see Robitaille & O'Shea, 1985; Robitaille et al., 1991) were used to collect students' perceptions of major topics in the mathematics curriculum. These scales were adapted from the Mathematics in School scale developed by the International Association for the Evaluation of Educational Achievement (IEA) and used in the Second International Mathematics Study. Each of the three versions of the Mathematics in School scale used in this assessment contained a list of 12 different topics; a different list of major mathematics topics from the curriculum was used for each grade level. 
Students were asked to rate each topic on three dimensions: importance, difficulty, and enjoyment. They responded on a five-point Likert-type scale, with options ranging as follows: importance: "not at all important" to "very important"; difficulty: "very difficult" to "very easy"; and enjoyment: "dislike a lot" to "like a lot". The content validity of the scales was addressed during the 1990 mathematics assessment and judged to be appropriate (Robitaille et al., 1991). Because only three topics appeared on each form, estimates of internal consistency were not computed, and results are reported only at the item level. The topics used at each grade level are listed in Table 2.1.

Table 2.1 Topics for Mathematics in School Scale by Grade

Form B at each grade level included an item designed to obtain information about how students felt about using calculators and computers. Using a five-point Likert-type scale, students were asked to indicate the extent to which they like using computers in math class or at home. The results and discussions of student responses to the attitudes toward mathematics items can be found in Chapter 3.

2.4.1.2.3.2 Classroom Practices Items

Two different sets of items, placed on two different forms at each grade level, were used to collect information about students' learning experiences in mathematics classrooms. To estimate the frequency of certain activities during a typical school week, students were asked to indicate how often 14 different classroom activities took place (Form C for Grades 4 and 7, Form T for Grade 10). They indicated the frequency of each activity on a four-point scale anchored by the options almost always, quite often, once in a while, and never. The same 14 activities, used at all three grade levels, were as follows:
A second set of items was included to learn more about what occurs in mathematics classrooms whenever students begin a new topic. With this set of items, students were again asked to indicate on a four-point scale how often six different activities took place. The four response options were: almost always, quite often, once in a while, and never. The six activities that could occur whenever a class begins a new topic in mathematics included:
The results for the classroom practices items are discussed in Chapter 3. 2.4.1.2.3.3 Attitudes Toward Problem-Solving Scale Part 1 of the free-response forms (Forms J and K for Grades 4 and 7; Forms V and W for Grade 10) all included the Attitudes Toward Problem Solving scale. This scale, constructed for the 1990 mathematics assessment, asked students to indicate on a four-point Likert-type scale the extent to which they agreed with eight statements pertaining to problem-solving. Two examples of the statements included in the scale are: "I enjoy solving math problems" and "Problems that make you think are more fun than easy problems." 2.4.1.2.4 Science Survey Items Items designed to collect information about a variety of science-related activities and students' attitudes toward science were distributed across the Part 1 booklets of each of the forms; they are described below. 2.4.1.2.4.1 Science-Oriented Activities Items All students were asked to indicate how often they did six different science-oriented activities. Two examples of these questions are: "Do you ever make up and do your own science experiments, or read about a science experiment and do it?"; and "Have you ever participated in a science fair?" Students responded on a three-point scale anchored by the options: Yes, several times; Yes, once or twice; and No, never. The six science-oriented activities items were distributed evenly across two forms at each grade level: Forms B and D for Grades 4 and 7; and Forms S and U for Grade 10. Details of these items and students' responses can be found in Chapter 3. 2.4.1.2.4.2 Science Classroom Activities Items To learn more about the kinds of activities taking place in science classrooms throughout the province, students at all three levels completed items placed on two forms at each level: Forms B and D for Grades 4 and 7; and Forms S and U for Grade 10. 
Students indicated the extent to which 11 science classroom activities occurred in a month by responding on a five-point Likert-type scale anchored with the options: always, quite often, sometimes, rarely, and never. Details of these items and students' responses are presented in Chapter 3.

2.4.1.2.4.3 Science Affective Scales and Items Used in the Assessment

Students in British Columbia are expected not only to develop the skills and processes of science, to increase their scientific knowledge, and to develop critical thinking skills, but also to develop positive science attitudes. For that reason, several instruments were used to collect information about students' science attitudes. The 1995 assessment used exactly the same scales, in equivalent formats, that were used in the 1991 assessment. Readers who are interested in a full discussion of science attitudes and details of the psychometric characteristics of the items and scales are referred to the technical report of that assessment (Bateson et al., 1991, Chapter 3). In brief, all of these scales have demonstrated sound psychometric properties, with high reliability and validity.

Four of the affective instruments used Likert-type items and were based on instruments used in the 1991, 1986, and 1982 British Columbia science assessments: the School Science scale, the Science in Society scale, the Careers in Science scale, and the Specific Issues instrument (referred to as Environmental Issues in 1991). For these scales, students were presented with value statements about different aspects of science and were asked to indicate their agreement with each statement on a five-point scale: strongly agree, agree, undecided, disagree, and strongly disagree. Whenever the statement reflected a positive attitude, strongly agree was scored as a "5" and strongly disagree was scored as a "1"; whenever a statement reflected a negative attitude, the scoring was reversed.
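The scoring rule just described can be illustrated with a short sketch. It is not the procedure actually used in the assessment: the function names are hypothetical, and the scores of 4, 3, and 2 for the three middle options are an assumption, since only the end points (5 and 1) are stated above.

```python
# Hypothetical sketch of reverse-keyed Likert scoring; all names are illustrative.
# Only the end points (5 and 1) come from the text; 4, 3, 2 are assumed.
AGREEMENT_SCORES = {
    "strongly agree": 5,
    "agree": 4,
    "undecided": 3,
    "disagree": 2,
    "strongly disagree": 1,
}


def score_item(response: str, positively_worded: bool) -> int:
    """Return the 1-5 score for one response, reversing negative statements."""
    raw = AGREEMENT_SCORES[response.strip().lower()]
    # On a 1-5 scale, reversing maps 5->1, 4->2, 3->3, 2->4, 1->5.
    return raw if positively_worded else 6 - raw


def total_score(responses, keys):
    """Sum item scores; keys[i] is True when statement i is positively worded."""
    if len(responses) != len(keys):
        raise ValueError("one key is needed per response")
    return sum(score_item(r, k) for r, k in zip(responses, keys))
```

Under this sketch, agreeing strongly with a positively worded statement and disagreeing strongly with a negatively worded one both earn the maximum item score, so a higher total always indicates a more positive attitude.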
2.4.1.2.4.4 The School Science Scale The School Science scale included statements designed to assess a student's generalized attitude towards science as a subject. It included statements such as: "I like to study science in school," and "Science classes are boring." In 1995, all 10 items were presented to each of the grades. However, in previous assessments, only the first seven items were presented to Grade 4 students. Accordingly, while the results for all 10 items will generally be reported herein, when reporting changes in attitudes at the Grade 4 level, only the first seven items from the 1995 assessment will be used. The School Science scale was placed on all forms at Grade 4; at Grade 7 it was placed on Forms A, J, and K, and at Grade 10 it was placed on the R, V, and W forms. The School Science scale was the only Likert-type affective instrument used with the Grade 4 students. The Cronbach's Alphas for this scale in 1995 were: Grade 4, each form was .92; Grade 7, Form A was .93; and Grade 10, Form R was .94. As was the case in previous assessments, two methods of analysis of the students' scores were employed. The first method classifies students as having an overall positive, neutral, or negative attitude based on the total score they obtained on the School Science scale. On this scale, the maximum score possible is 50, the minimum is 10 (the mid-point of the scale is 30), and the continuum of scores contains 41 points; as a result, the three categories were assigned intervals of 14, 13, and 14. The resulting attitude categories are as follows:
For Grade 4 comparisons with previous assessments, only the first seven items were used. In that case, the maximum score possible is 35, the minimum is 7 (the mid-point of the scale is 21), and the continuum of scores contains 29 points; as a result, the three categories were assigned intervals of 10, 9, and 10. The resulting attitude categories are as follows:
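The interval arithmetic for both versions of the scale can be reproduced with the sketch below. It is an illustration only: the function name is hypothetical, and it assumes that the two equal outer intervals correspond to the negative (lowest scores) and positive (highest scores) categories, with the remaining middle interval treated as neutral.

```python
def attitude_bands(n_items: int, points_per_item: int = 5):
    """Split the total-score range into negative / neutral / positive bands.

    Assumption: the two outer bands are equal in width and the neutral band
    takes whatever is left over in the middle of the range.
    """
    low, high = n_items, n_items * points_per_item
    n_points = high - low + 1          # 41 for 10 items, 29 for 7 items
    outer = -(-n_points // 3)          # ceiling division: 14 or 10
    neutral = n_points - 2 * outer     # 13 or 9
    return {
        "negative": (low, low + outer - 1),
        "neutral": (low + outer, low + outer + neutral - 1),
        "positive": (high - outer + 1, high),
    }


print(attitude_bands(10))  # full 10-item scale
print(attitude_bands(7))   # seven-item version used for Grade 4 comparisons
```

Under these assumptions the 10-item scale splits into 10 to 23, 24 to 36, and 37 to 50, and the seven-item version into 7 to 16, 17 to 25, and 26 to 35, which matches the interval widths of 14, 13, 14 and 10, 9, 10 given above.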
This report will consider proportions of students in each of the above categories. The second method examined the Mean Total Score of various groups. In this, a student's Mean Total Score can range from a low of 10 (all items extremely negative) to a high of 50 (all items extremely positive). The report will also consider the Mean Total Score of each grade that participated in the assessment. Again for the purposes of Grade 4 change analyses, only the first seven items were considered, and a student's Mean Total Score on the scale can range from a low of 7 (all items extremely negative) to a high of 35 (all items extremely positive). 2.4.1.2.4.5 The Science in Society Scale The Science in Society scale "attempts to measure a broad area that includes the interrelationships and interdependencies of science, technology and society ... [and] can be considered as a measure of student attitude regarding the value to our society of science and technology" (Bateson, et al., 1986, p. 28). Two examples of statements found on the Science in Society scale are: "Science is NOT necessary to society," and "Our society depends on science to exist." The scale was included on only the Form B booklets at Grade 7 and the Form S booklets at Grade 10. The Cronbach's Alphas for this scale in 1995 were: Grade 7, Form B was .70, and Grade 10, Form S was .75. The same two methods of analysis of students' scores used in 1991 for the Science in Society scale were used again in 1995. The first method classified students as having an overall positive, neutral, or negative attitude based on their Science in Society total score. Because the scale consisted of 10 items, the maximum score possible is 50, the minimum is 10 (the mid-point of the scale is 30), and the continuum of scores contains 41 points; as a result, the three categories were assigned intervals of 14, 13, and 14. The resulting attitude categories for Grades 7 and 10 are as follows:
This report will consider proportions of students in each of these categories. A second method of analysis examined Mean Total Scores. Since the scale consisted of 10 items, a student's total score can range from a low of 10 (all items extremely negative) to a high of 50 (all items extremely positive). The report will consider the Mean Total Score of each grade that participated in the assessment. 2.4.1.2.4.6 The Careers in Science Scale The Careers in Science scale was designed to "measure a student's attitude towards entering a career in the field of science" (Bateson, et al., 1986, p.31) and was included only on the Form C booklets at Grade 7 and the Form T booklets at Grade 10. Two examples of statements found on the Careers in Science scale are: "A career in science would be very satisfying," and "I would hate to be a scientist." The Cronbach's Alphas for this scale in 1995 were: Grade 7, Form C was .93, and Grade 10, Form T was .94. Again, the same two methods of analysis of the students' scores employed in the 1991 assessment were employed in 1995. The first method classified students as having an overall positive, neutral, or negative attitude based on total score obtained on the Careers in Science scale. Because the scale consisted of 10 items, the maximum score possible is 50, the minimum is 10 (the mid-point of the scale is 30), and the continuum of scores contains 41 points; as a result, the three categories were assigned intervals of 14, 13, and 14. The resulting attitude categories are as follows:
This report will consider proportions of students in each of these categories. A second method of analysis examined the Mean Total Scores. Since the scale consisted of 10 items, a student's Mean Total Score can range from a low of 10 (all items extremely negative) to a high of 50 (all items extremely positive). The report will consider the Mean Total Score of each grade that participated in the assessment. 2.4.1.2.4.7 The Specific Issues Instrument The Specific Issues instrument included items which "attempt to measure student opinions about a variety of prominent scientific issues/moral values, including conservation, pollution, animal experimentation, creation of life, and the use of herbicides/insecticides" (Bateson et al., 1986, p.32). Two examples of the statements found on the "Environmental Issues" scale are: "Scientists should conduct experiments on live animals if they think people will be helped," and "The government should make recycling all waste a law." These items were included only on the Form D booklets at Grade 7 and the Form U booklets at Grade 10. As noted above, in past assessments the Specific Issues instrument was referred to as the Environmental Issues instrument. However, inspection of the Environmental Issues items indicates that the instrument consists of two "subscales"; three of the items on the instrument refer to scientists' activities/moral values (Items 1, 3, and 6), while seven items pertain to environmental issues. The Cronbach's Alphas for 1995 were: Grade 7, Form D was .50; and Grade 10, Form U was .26. To ascertain whether or not students in British Columbia hold environmentally "friendly" or "unfriendly" attitudes, a total score was generated for each student on the basis of the seven items representing statements about environmental issues. 
The maximum score possible for the seven-item subscale is 35, the minimum is 7 (the mid-point of the scale is 21), and the continuum of scores contains 29 points; as a result, the three categories were assigned intervals of 10, 9, and 10. The resulting attitude categories are as follows:
This report will consider the proportion of students in each of these categories. It will also report on the three items having to do with scientists' activities/moral values separately.

2.4.1.3 Achievement Instruments

2.4.1.3.1 Overview

As mentioned above, there were six forms produced for each grade (four multiple-choice forms and two free-response forms) and each form consisted of two parts. The nature of the Part 1, or student questionnaire, booklets is discussed above; this section describes the Part 2, or achievement, booklets used at each grade level.

Tables of specifications from the 1990 mathematics and 1991 science assessments were replicated for use in the 1995 study (see Section 2.4.1.3.2 below). They provided the blueprints which described each item set and which provided direction for item selection on the multiple-choice component. The tables reflected the provincially prescribed curriculum in each subject, establishing a link between each item and an intended learning outcome at each respective grade level.

Since achievement items for potential use in the study were from either earlier provincial assessments or TIMSS, they had known properties which met selection criteria. As a result, there was no need to pilot or field test these items. Given the acceptable psychometric properties of all potential items, the primary task was to select those from the pool which best met each table of specifications and which provided for measures of change over time as well as for future international comparisons. To ensure curricular validity, however, it was necessary to validate the items from TIMSS. This involved review and rating by practicing teachers in the field. Several teachers at each respective grade level rated the 1990 mathematics assessment, 1991 science assessment, and TIMSS items in terms of curriculum match and appropriateness. They also categorized each item by cognitive behaviour level. To select the best items, the following steps were taken:
Once items were selected, they were placed on the assessment forms. Care was taken to balance for content coverage and range of difficulty. The tables of specifications used for this assessment are presented next, followed by brief descriptions of the multiple-choice and free-response forms. The technical properties of the multiple choice instruments and items are reported in a later section. 2.4.1.3.2 Table of Specifications As previously noted, the tables of specifications from the 1990 mathematics and 1991 science assessments provided the blueprints that described each item set, and directed the item selection for the multiple-choice component of the 1995 Assessment. The tables of specifications for each subject and grade are given below. 2.4.1.3.2.1 Mathematics Specifications 2.4.1.3.2.1.1 Grade 4 Mathematics Specifications The table of specifications for the Grade 4 mathematics multiple-choice and free-response items is given in Table 2.2. A total of 125 mathematics items appeared on the four multiple-choice forms used at the Grade 4 level. As mentioned earlier, five were core items which appeared on each form, and 120 were rotated evenly across forms. Of the 125 mathematics items at the Grade 4 level, 67 were replicated from the 1990 mathematics assessment and 58 were from TIMSS. Intended content reporting categories, with corresponding numbers of items, were as follows: Number and Operations: 84 (Numeration: 22, Place Value: 9, Whole Number Operations: 27, Fractions: 17, Decimals: 9); Data Representation: 10; Geometry: 16; and Measurement: 15. Free-response questions included 14 short-answer and eight extended-response mathematics items; all but two of the free-response items were from TIMSS. The reporting focus for these is at the item level with respect to correctness, approach, and error-type. Items tested a number of content areas: Numeration, Place Value, Whole Number Operations, Fractions, Data Representation, Transformations, and Units and Measures. 
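As a quick consistency check on the Grade 4 multiple-choice counts quoted above, the figures can be tallied as in the sketch below. The variable names are illustrative, and the per-form count assumes that each rotated item appears on exactly one form, which the text implies but does not state outright.

```python
# Counts transcribed from the Grade 4 mathematics description above.
GRADE4_MC_ITEMS = {
    "Number and Operations": {
        "Numeration": 22,
        "Place Value": 9,
        "Whole Number Operations": 27,
        "Fractions": 17,
        "Decimals": 9,
    },
    "Data Representation": {"Data Representation": 10},
    "Geometry": {"Geometry": 16},
    "Measurement": {"Measurement": 15},
}
ITEM_SOURCES = {"1990 mathematics assessment": 67, "TIMSS": 58}
CORE_ITEMS, ROTATED_ITEMS, N_FORMS = 5, 120, 4


def category_totals(spec):
    """Sum the topic counts within each reporting category."""
    return {category: sum(topics.values()) for category, topics in spec.items()}


totals = category_totals(GRADE4_MC_ITEMS)
total_items = sum(totals.values())
# Assumption: each rotated item appears on exactly one of the four forms.
items_per_form = CORE_ITEMS + ROTATED_ITEMS // N_FORMS

print(totals)
print(total_items, sum(ITEM_SOURCES.values()), CORE_ITEMS + ROTATED_ITEMS)
print(items_per_form)
```

The three ways of counting (by reporting category, by item source, and by core plus rotated items) all give 125, and under the stated assumption each multiple-choice form would carry 35 mathematics items.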
Table 2.2 Table of Specifications for Grade 4 Mathematics Multiple-Choice and Free-Response Items

2.4.1.3.2.1.2 Grade 7 Mathematics Specifications
Table 2.3 displays the table of specifications for the Grade 7 multiple-choice and free-response items. At the Grade 7 level, there were 126 multiple-choice items allocated across four forms. Fifty-three of these were from TIMSS, and 73 were replicated from the 1990 mathematics assessment. Reporting categories for Grade 7 mathematics include each of the six content categories listed in the table of specifications as well as subtest scores on Whole Number Operations, Decimals, Fractions, Ratio and Proportion, Integers, and Similarity and Transformations.
There were 14 short-answer and six extended-response questions among the mathematics free-response items; all of these items were selected from TIMSS. Items tested outcomes from the following areas: Decimals, Fractions, Numeration, Place Value, Data Representations, Transformations, Patterns, Whole Number Operations, and Area.

Table 2.3 Table of Specifications for Grade 7 Mathematics Multiple-Choice and Free-Response Items

2.4.1.3.2.1.3 Grade 10 Mathematics Specifications
It was necessary, at the Grade 10 level, to include not only sets of items common to all students enrolled in that grade, but also others which were unique to each of the Mathematics 10 and Mathematics 10A courses. Each of the four multiple-choice forms included the following item sets: a core of five math items common to each form that all students were expected to write; a rotated set of four math items that all students wrote; and sets of 11 items unique to Mathematics 10 and 11 items unique to Mathematics 10A, to be completed only by students who had taken the respective course. Out of a total of 109 different mathematics items at the Grade 10 level, 21 were to be written by all students, 44 only by Math 10 students, and 44 only by Math 10A students.
The tables of specifications for Mathematics 10 and Mathematics 10A are shown in Table 2.4 and Table 2.5, respectively. Table 2.4 displays the table of specifications for the Mathematics 10 multiple-choice items. Reporting categories for Mathematics 10 include the five major categories shown in the table: Number and Operations, Data Analysis, Geometry, Measurement, and Algebra. Student achievement will also be discussed on several sub-categories within these topics.

Table 2.4 Table of Specifications for Mathematics 10 Multiple-Choice and Free-Response Items

The table of specifications for the Mathematics 10A multiple-choice items is shown in Table 2.5. For Mathematics 10A, the content reporting categories include the same topics as for Mathematics 10. However, the sets of items for these categories include some which are common to both courses and others which are unique to Mathematics 10A.
No distinction was made between the performances of Grade 10 students by mathematics course on the free-response items. Table 2.4 and Table 2.5 show the number of free-response items per strand and topic used at the Grade 10 level. Results for these questions are presented as total Grade 10 scores, with breakouts only by gender. Students answered a total of 14 open-ended mathematics items across the two rotated forms: six short-answer and eight extended-response. Content areas tested by individual items included Fractions, Equations, Angles, Whole Numbers, Interpret Data, Area, Ratio, and Estimation.

Table 2.5 Table of Specifications for Mathematics 10A Multiple-Choice and Free-Response Items

2.4.1.3.2.2 Science Specifications
There are four goals
in the science curricula in place in British Columbia:
Within each of the goals, the objectives differ by grade. These differences are reflected in the individual tables of specifications. Given the limitations of time and the numbers of items which can be used in any assessment, it is impossible to measure all objectives. The inclusion and exclusion of certain objectives was determined in 1991 by the deliberations of the advisory committee in conjunction with the ministry and the contract team for that assessment. The 1995 tables of specifications preserved those inclusions and exclusions with some modification to the percentage of items dedicated to each goal.

2.4.1.3.2.2.1 Grade 4 Science Specifications
The table of specifications for Grade 4 science is given in Table 2.6. There were 118 Grade 4 science multiple-choice items; 75 items were from the 1991 science assessment and 43 were from TIMSS. The table also shows that of the 22 free-response items at the Grade 4 level, 20 were selected from the items used in the 1991 science assessment and two from TIMSS.

Table 2.6 Table of Specifications for Grade 4 Science Multiple-Choice and Free-Response Items

2.4.1.3.2.2.2 Grade 7 Science Specifications
Table 2.7 displays the table of specifications for the Grade 7 science multiple-choice and free-response items. There were 123 multiple-choice science items at the Grade 7 level; 76 were items from the 1991 science assessment and 47 were from TIMSS. At Grade 7, all 20 open-ended science items were taken from TIMSS; all items measured Goal D (Cognitive Processes) and required students to use a combination of the three objectives within this goal to a greater or lesser extent. These items were therefore not classified by objective, but rather under the overall goal.

Table 2.7 Table of Specifications for Grade 7 Science Multiple-Choice and Free-Response Items
Note: a All of the free-response items required students to use a combination of the objectives listed under Goal D.

2.4.1.3.2.2.3 Grade 10 Science Specifications
Table 2.8 shows that there were 65 Grade 10 science multiple-choice items in the assessment; 49 items were from the 1991 science assessment and 16 were from TIMSS. It also shows that there were eight free-response items from the 1991 science assessment and six from TIMSS, for a total of 14.

Table 2.8 Table of Specifications for Grade 10 Science Multiple-Choice and Free-Response Items

2.4.1.3.3 MULTIPLE-CHOICE FORMS
Four multiple-choice forms were produced for each grade level. As described above, each multiple-choice form consisted of two parts. In addition to the background and attitude questions, Part 1 of the Grades 4 and 7 forms also contained a common set, or core, of multiple-choice items. At the Grade 10 level, the core items were placed in the Part 2 booklet of each form. The core items measured important outcomes at each grade level and were evenly divided between mathematics and science.
The balance of the multiple-choice items were rotated across Part 2 of each of the four forms. Each Part 2 booklet included a set of multiple-choice achievement items unique to each form. There were 60 items on each form at Grades 4 and 7, and 41 items per form at Grade 10. Half of the items tested mathematical knowledge and concepts, and the rest dealt with scientific knowledge, concepts, and processes. Items were selected from three sources: the 1990 mathematics assessment, the 1991 science assessment, and TIMSS. All of the multiple-choice items from previous provincial assessments included the option "I don't know"; TIMSS items, on the other hand, did not include this option. Descriptions of the multiple-choice instruments by grade are found below.

2.4.1.3.3.1 Grade 4 Multiple-Choice Instruments
Part 1 of each of the four Grade 4 multiple-choice forms contained student background information questions, attitude scales, and 10 core multiple-choice achievement items: five for mathematics and five for science.
The 10 core items were the same in all Part 1 booklets, and an additional 240 items (120 mathematics and 120 science) were distributed (rotated) across the four multiple-choice Part 2 booklets, evenly balanced by content/process coverage and difficulty level. As a result, each Grade 4 form contained a total of 70 items: 10 core and 60 rotated (five core and 30 rotated mathematics items, and five core and 30 rotated science items). There were, therefore, a total of 250 Grade 4 mathematics and science multiple-choice items. At the Grade 4 level, Forms A, B, C, and D contained the multiple-choice achievement questions.

2.4.1.3.3.2 Grade 7 Multiple-Choice Instruments
Part 1 of each Grade 7 multiple-choice form contained student background information questions, attitude scales, and 12 core multiple-choice achievement items: six for mathematics and six for science. The 12 core items were the same for all forms, and an additional 240 items (120 mathematics and 120 science) were distributed across the four multiple-choice Part 2 booklets, evenly balanced by content/process coverage and difficulty level. Hence, each Grade 7 form contained a total of 72 items: 12 core and 60 rotated (six core and 30 rotated mathematics items, and six core and 30 rotated science items). In total, there were 252 mathematics and science multiple-choice items at the Grade 7 level. The multiple-choice items were placed on Forms A, B, C, and D at the Grade 7 level.

2.4.1.3.3.3 Grade 10 Multiple-Choice Instruments
In Grade 10, Part 1 of each multiple-choice form contained background information questions and attitude scales; Part 2 of each form contained 51 multiple-choice achievement items. Of the 51 items, 31 were math and 20 were science.
Each form was composed of 10 core items (five math and five science) which were common to all forms, eight items (four math and four science) written by all students, 11 math items for Math 10 students only, 11 math items for Math 10A students only, and 11 science items written by all students. Since students were enrolled in either Math 10 or Math 10A, each student wrote a total of 40 items on each form. In total, there were 174 mathematics and science multiple-choice items at the Grade 10 level.

2.4.1.3.4 FREE-RESPONSE FORMS
There were two free-response forms for each grade level: Forms J and K for Grades 4 and 7; Forms V and W for Grade 10. As was the case for the multiple-choice forms, each free-response form consisted of two parts. Part 1 of the Grades 4 and 7 free-response forms contained background information and attitude questions plus the same set of core items found in Part 1 of the multiple-choice forms. Part 1 of the Grade 10 free-response forms, however, contained only background information and attitude questions; the core items were placed in the Part 2 booklets of the Grade 10 forms. In addition to the core items, each form contained 22, 20, and 14 free-response items at Grades 4, 7, and 10, respectively. Like the multiple-choice items, the free-response items were selected from three sources: TIMSS and the previous provincial mathematics and science assessments. Brief descriptions of the free-response instruments for each grade are found below.

2.4.1.3.4.1 Grade 4 Free-Response Instruments
Part 1 of each of the two Grade 4 free-response forms (Forms J and K) contained student background information questions, attitude scales, and 10 core multiple-choice achievement items: five for mathematics and five for science. The 10 core items were the same as those found in the Part 1 booklets of the multiple-choice forms. The Part 2 booklets each contained 22 free-response items: 11 mathematics items and 11 science items.
In total, 44 free-response items were administered at the Grade 4 level: 22 mathematics and 22 science. When the free-response items were distributed across the forms, care was taken to ensure that they were balanced for content and difficulty level.

2.4.1.3.4.2 Grade 7 Free-Response Instruments
Part 1 of each of the two Grade 7 free-response forms (Forms J and K) contained student background information questions and attitude scales. In addition, they contained the same 12 core multiple-choice achievement items found in the Part 1 booklets of the multiple-choice forms. Each Part 2 booklet contained 20 free-response items: 10 mathematics items and 10 science items. Hence, 40 free-response items were administered at the Grade 7 level in all. The free-response items were balanced for content and difficulty level across the two forms.

2.4.1.3.4.3 Grade 10 Free-Response Instruments
At the Grade 10 level, the Part 1 booklets of the two free-response forms (Forms V and W) contained student background information questions and attitude scales. Each Part 2 booklet contained the 10 core multiple-choice items along with seven mathematics free-response items and seven science free-response items. In total, 28 free-response items were administered at the Grade 10 level: 14 mathematics items and 14 science items. Care was taken to balance for content and difficulty level when items were distributed across the forms.

2.4.2 Teacher Instruments

2.4.2.1 Overview
Five teacher questionnaires were developed for the 1995 assessment. There was one questionnaire each for teachers at the Grade 4 and 7 levels, and three for teachers at the Grade 10 level: one for Mathematics 10, one for Mathematics 10A, and one for Science 10. The teacher questionnaire items were selected from three sources: the 1990 mathematics assessment, the 1991 science assessment, and TIMSS.
Three broad categories of items were included in the teacher questionnaires: general background items, items specific to the teaching of mathematics, and items specific to the teaching of science. At the Grade 4 and 7 levels, the three categories of items were placed into one booklet. At the Grade 10 level, each of three booklets contained background questions plus questions specifically designed for mathematics or science.
At the Grade 4 and 7 levels, each questionnaire was to be completed by the teacher, or teachers, responsible for teaching mathematics and science to any given class. Hence, depending on how the school was organized and the teaching assignments of the teachers, either one or two teachers completed a teacher questionnaire for each class. To accommodate the fact that two different teachers may have had to complete each questionnaire, two sets of background questions were placed in each questionnaire: one before the teaching of mathematics section and one before the teaching of science section. Teachers who were responsible for teaching both mathematics and science to the same class were instructed to complete only the first background information section before completing the sections on the teaching of mathematics and science.
At the Grade 10 level, samples of teachers were selected to complete the questionnaires. Hence, a teacher was required to complete a questionnaire for only one of Mathematics 10, Mathematics 10A, or Science 10.

2.4.2.2 General Survey Items
All teachers at the three levels were asked to supply some background information about themselves. The items included in the background section of each questionnaire were intended to provide descriptions of the mathematics and science teachers in the province. They dealt with gender, years of teaching experience, teaching assignment, educational and professional preparation, professional activities, teaching preferences, and amount of time spent on a variety of activities.
Details of these items and teachers' responses to them are discussed in Chapters 9 (mathematics teachers) and 10 (science teachers).

2.4.2.3 Mathematics Teacher Items
Teachers at all three levels were asked to complete questions and scales on three aspects of the teaching of mathematics: classroom practices, mathematics in the school, and student evaluation as it applies to mathematics. The classroom practices items were intended to provide information about teaching strategies used in mathematics classrooms. The mathematics in the school items included activities and topics specific to each grade; teachers were asked to rate each activity or topic in terms of its importance for the class, how easily it could be taught, and how much they enjoyed teaching the topic. The student evaluation items asked teachers about their evaluation strategies, the extent to which they used several kinds of assessment information, and which factors they felt best explained students' failure to succeed at the appropriate level in mathematics.
In addition, teachers of Grades 7 and 10 completed several items dealing with homework. Teachers were asked how often homework was assigned, how much student time was required for assignments, what kinds of tasks were assigned, and how results were dealt with. Details of these items and teachers' responses to them are discussed in Chapter 9.

2.4.2.4 Science Teacher Items
Teachers completed questions and scales designed to elicit information on the following aspects of science teaching as it takes place throughout the province: instructional practices and strategies, course materials, the coordination of science in the schools, resources and facilities, safety, and student evaluation. Teachers of science at all three grades answered the same questions; however, questions were modified when necessary to reflect the needs of the different grade levels.
Information about science teachers' instructional practices and strategies was obtained by having them rate, on a four-point Likert-type scale, the extent to which they used each of 14 different strategies. To learn about the coordination of science in schools, teachers were asked whether there was a science coordinator in their school and/or district, and if they felt the science coordination was adequate. The resources and facilities items asked teachers to indicate which of 15 different resources and facilities were available, used by them, and adequate. Three items had to do with safety: two pertained to the Workplace Hazardous Materials Information System (WHMIS), and one to the safety concerns they had about their classrooms. The student evaluation items asked teachers about their evaluation strategies, the extent to which they used several kinds of assessment information, and which factors they felt best explained students' failure to succeed at the appropriate level in science. Details of these items and teachers' responses to them are discussed in Chapter 10.

2.5 Methods of Analysis
Multiple-choice item p values and point-biserial correlation coefficients were examined to check for problems with items. No problems were found. Table 2.9 shows the descriptive statistics for the multiple-choice achievement booklets. The means, standard deviations, and Kuder-Richardson Formula 20 (KR-20) coefficients are displayed separately for the mathematics items and science items. The KR-20 for each form as a whole is also provided. Although balancing of the forms by content and difficulty was not possible to the extent usually sought, the form means and standard deviations for each subject are reasonably consistent within each grade. The reliability coefficients range from a low of .63 to a high of .90.
As is to be expected, given the limited number of items for each subject at the Grade 10 level, the coefficients are somewhat lower than those for the other grade levels where there were more items for each subject. Overall, the reliability coefficients for each booklet are quite acceptable.
Further evidence as to the quality of the items can be obtained by examining the point-biserial correlation coefficients for all the correct answers and distracters. For each of the 666 multiple-choice items used in the assessment, the point-biserial for the correct answer was positive.
One of the best ways of understanding the dependability of the instruments used, however, is to develop generalizability coefficients based on the structure of the instruments. For both construction and interpretation of the achievement surveys, the goal/strand structure of the curriculum guides was used. The science goals are common to the three grades: Goal B is related to the processes and skills of science, Goal C is related to the knowledge of science, and Goal D is related to the higher thinking skills used in science. The mathematics strands vary from grade to grade. The numbers of items in each of the forms related to these goals and strands are shown in the tables of specifications earlier in this chapter. It should be noted that Goal A in science refers to affective outcomes and will be discussed in detail in Chapter 6 of this report.

Table 2.9 Reliability Coefficients (KR-20) for the Multiple-Choice Booklets
Notes: a Maximum 35; b maximum 36; c maximum 20.

2.5.1 Generalizability Coefficients
The procedures used to generate generalizability coefficients are the same as those used in the 1991 science assessment. The following discussion is taken from that assessment's report (Bateson et al., 1992), updated to reflect the 1995 situation.
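The classical item checks described in this section (point-biserials and KR-20) and the variance-component approach behind the generalizability coefficients can be illustrated on a small example. The following is a minimal sketch only, not the analysis code used for the assessment: the function names and the six-student, four-item score matrix are hypothetical, the point-biserial is computed against the uncorrected form total, and the coefficient is assumed to take the usual form for a crossed student-by-item design (student variance divided by student variance plus residual variance over the number of items). It treats a single form, whereas the assessment pooled components across forms.

```python
from math import sqrt
from statistics import mean, pvariance


def kr20(scores):
    """KR-20 for a students-by-items matrix of 0/1 scores."""
    n_items = len(scores[0])
    p = [mean(row[i] for row in scores) for i in range(n_items)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    total_var = pvariance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum_pq / total_var)


def point_biserial(scores, item):
    """Correlation of one 0/1 item with the (uncorrected) total score."""
    x = [row[item] for row in scores]
    y = [sum(row) for row in scores]
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(pvariance(x) * pvariance(y))


def variance_components(scores):
    """Student, item, and residual components from a crossed S x I ANOVA."""
    n_s, n_i = len(scores), len(scores[0])
    grand = mean(v for row in scores for v in row)
    s_means = [mean(row) for row in scores]
    i_means = [mean(row[i] for row in scores) for i in range(n_i)]
    ss_s = n_i * sum((m - grand) ** 2 for m in s_means)
    ss_i = n_s * sum((m - grand) ** 2 for m in i_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ms_s = ss_s / (n_s - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = (ss_total - ss_s - ss_i) / ((n_s - 1) * (n_i - 1))
    # Expected mean squares: MS_S = var_sie + n_i * var_s, and so on.
    return (ms_s - ms_res) / n_i, (ms_i - ms_res) / n_s, ms_res


def g_coefficient(var_s, var_sie, n_items):
    """Student variance over student variance plus error per item."""
    return var_s / (var_s + var_sie / n_items)


# Hypothetical data: rows are students, columns are items (1 = correct).
toy = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
]
var_s, var_i, var_sie = variance_components(toy)
rho2 = g_coefficient(var_s, var_sie, len(toy[0]))
print(round(kr20(toy), 4), round(rho2, 4))
```

On a single fully crossed form of right/wrong items the two printed values coincide, which is one way to see why the report asks readers to interpret the generalizability coefficients much as they would a reliability coefficient.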
Using two-factor analyses of variance under a fully crossed student-by-item (S × I) design for each goal/strand within each form, it is possible to estimate the variance components due to students ($\sigma^2_S$), due to items ($\sigma^2_I$), and due to the confounding of student-by-item interaction and random error ($\sigma^2_{SIE}$). Pooling these variance components across the forms for each grade allows for the estimation of generalizability coefficients for each goal/strand in each grade. The principle of construction of generalizability coefficients is to form a ratio of "true" variance with "total" variance, where "true" variance is variance that is desired and "total" variance is the desired variance plus error variance or variance due to undesirable factors. Stated as a formula,

$$\text{generalizability coefficient} = \frac{\text{true variance}}{\text{true variance} + \text{error variance}}$$

In this case, variance due to students is desired variance, and variance due to student-by-item interaction, confounded with general error variance, is undesirable. The variance due only to items, when items do not interact with students, acts merely as a "scaling" factor. Since the results of this assessment were interpreted without reference to a pre-determined, exterior criterion, any "scaling" effects are not relevant to the assessment. In this case, therefore, the variance due to items can be ignored. The formula for the generalizability coefficient ($\rho^2$) for each goal/strand is thus:

$$\rho^2 = \frac{\sigma^2_S}{\sigma^2_S + \sigma^2_{SIE} / n_I}$$

where $\sigma^2_S$ is the pooled (averaged) variance due to the students across the four forms in a grade, $\sigma^2_{SIE}$ is the pooled (averaged) variance due to student-by-item interaction and general error across all forms, and $n_I$ is the total number of items measuring the goal/strand across all four forms.
Table 2.10 and Table 2.11 display the estimated generalizability coefficients for each goal/strand at each grade. Generalizability coefficients, as discussed in this report, should be interpreted in a fashion similar to the interpretation of a squared reliability coefficient for each goal/strand.
In order to estimate standard errors of goal/strand scores it is again necessary to employ generalizability theory. The general formula for the variance error of a mean score over any set of items is expressed by the following:
where K = the total number of forms, N = the total number of students, and M = the total number of items. The standard error is then the square root of the variance error. When average scores are calculated as mean p values expressed as decimals (e.g., a goal area where the mean score was 78% would have a mean p value of .78), the standard errors of the reporting category scores range from 0.03% for the total scores (at least 65 items each) to 0.34% for a category with seven items. These standard errors are so small that almost any differences noted will be statistically significant; the 95% confidence interval on reporting categories is at most plus or minus two-thirds of one percentage point (±0.68%).

2.5.2 Standard Errors of Individual Multiple-Choice Items
For individual items, standard error calculations are less complicated. The standard error of the p value ($SE_p$), expressed as a decimal (e.g., an item which 67% of the students answered correctly would have a p value of .67), is equal to the square root of the following quantity: the p value ($p$) multiplied by 1 minus the p value ($q$), divided by the number of students responding to the item ($n$):

$$SE_p = \sqrt{\frac{pq}{n}}$$
For a finite population, the finite population correction factor of 1 minus the sample size ($n$) divided by the population size ($N_{\text{pop}}$) is applied, making the formula

$$SE_p = \sqrt{\frac{pq}{n}\left(1 - \frac{n}{N_{\text{pop}}}\right)}$$
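The verbal description above can be turned into a short calculation. The following is a sketch under stated assumptions rather than the procedure actually used: the function names are hypothetical, the correction factor is assumed to multiply the variance inside the square root, and the sample and population sizes are invented for illustration, not taken from the assessment's enrolment figures.

```python
from math import sqrt


def se_p(p, n, population=None):
    """Standard error of an item p value answered by n students.

    If a population size is given, the variance is multiplied by the
    finite population correction (1 - n / population).
    """
    q = 1.0 - p
    variance = p * q / n
    if population is not None:
        variance *= 1.0 - n / population
    return sqrt(variance)


def ci95_half_width(p, n, population=None):
    """Half-width of the 95% confidence interval (1.96 standard errors)."""
    return 1.96 * se_p(p, n, population)


# Hypothetical item: p = .50, answered by 10,000 of 40,000 students.
uncorrected = se_p(0.50, 10_000)
corrected = se_p(0.50, 10_000, 40_000)
print(f"SE without correction: {uncorrected:.4f}")
print(f"SE with correction:    {corrected:.4f}")
print(f"95% CI half-width:     {ci95_half_width(0.50, 10_000, 40_000):.4f}")
```

Because the product of $p$ and $q$ peaks at $p = .50$, items of middling difficulty have the largest standard errors, and the correction shrinks the error further as the sample approaches the whole population.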
Table 2.12 presents the approximate standard errors of items with various p values for each of the grades. For items with intermediate p values that are not presented in the table, interpolation of the standard error can be performed.

Table 2.12 Approximate Standard Errors of Individual Multiple-Choice Item P-Values

The standard errors are very low: always less than 0.5%. The 95% confidence interval (1.96 × SE above and below any observed p value) is therefore always less than 1%; the largest is at Grade 10 for an item with a p value of .50 where it is 0.83%, and the smallest is at Grade 4 for an item of p value .10 or .90 where it is 0.42%. Provincially, any observed difference of 1% in this study will be statistically significant, even at the individual item level.
Results at the school and school district levels have substantially greater errors associated with them, although the results at the reporting category level will ordinarily have small enough errors to allow fairly confident use of the data. Only in very small districts and schools (fewer than 50 students per grade) does the 95% confidence interval on a reporting category (six or more items) span more than six percentage points.

2.5.3 Participation Rates
Sufficient booklets to test all students at Grades 4, 7, and 10 were sent to all schools enrolling those grades in British Columbia. Table 2.13 displays the participation rates for each grade. The number of usable returns is the sum of the numbers of multiple-choice forms and free-response forms.

Table 2.13 Participation Rates
Notes: a Public school data only. Enrolment data were not available for the independent schools. b Includes multiple-choice and free-response forms.

Provincially, and not including independent schools, there were 1102 schools eligible to participate in the Grade 4 component of the assessment.
Most of these schools were also eligible to participate in the Grade 7 component, for which there were 964 eligible schools. There were 275 schools eligible to participate in the Grade 10 component; some of these were also eligible for the Grade 7 component.
Table 2.14 shows the levels of participation of students in eligible schools in each grade of the assessment. The vast majority (93%) of schools had Grade 4 student participation rates over 75%. A similarly high percentage (94%) of schools had student participation rates over 75% in Grade 7. At the Grade 10 level, 59% of the schools had student participation rates over 75%, a substantial reduction from the student participation rates of the lower grades. Only 3% (nine schools) had 100% of their students participate in the assessment in Grade 10, compared to 12% and 13%, respectively, of the Grade 4 and Grade 7 schools.

Table 2.14 Percentages of Students Participating Within Schools

Provincially, 90% of Grade 4 students, 90% of Grade 7 students, and 75% of Grade 10 students participated in the assessment. Table 2.15 shows the participation rates by district. Nearly two-thirds of the districts had over 90% student participation in Grade 4 (63% of the districts) and Grade 7 (67% of the districts). Only one district (1%) had that level of student participation in Grade 10.

Table 2.15 Percentages of Students Participating Within Districts

The relatively low levels of Grade 10 student participation are of concern in that they may render non-comparable the assessment results for a school or district. It is imprudent to compare the results of a district with nearly 100% participation to the results of a district with, say, only a 50% or 60% participation rate. Given the cost of an assessment, and the kinds of decisions in which assessment information is used, it is essential that participation rates be sufficiently high to allow for meaningful analysis and comparison. Accordingly, it is recommended
Requiring student names and teacher names on assessment booklets/answer sheets and questionnaires, and requiring make-up sessions for absent students, would assist in establishing suitable follow-up capability in the case of insufficient or faulty data.