Final Report
Mathematics B Regents Examination--Data and Information Related to Standard Setting
A study performed for the New York State Education Department by
Gary Echternacht
Gary Echternacht, Inc.
4 State Park Drive
Titusville, NJ 08560
(609) 737-8187
garyecht@aol.com
April 27, 2001
Introduction
The New York State Board of Regents has established learning standards all students must meet to graduate from high school. One set of learning standards is for mathematics, science and technology. Students entering grade nine in September 2001 and thereafter will have to complete three credits in mathematics and will have two diploma options. All students will have to pass the Mathematics A Regents Examination in order to obtain a Regents Diploma. Students wishing to pursue an Advanced Regents Diploma will also have to pass the Mathematics B Regents Examination. Thus, the Mathematics B Regents Examination is required only for students seeking an Advanced Regents Diploma. A curriculum for mathematics B is currently being developed, but is not yet widely implemented.
Although scores for the Mathematics B Regents Examination are placed on a numerical scale, there are essentially only three score categories: does not meet standards, meets standards, and meets standards with distinction. New York State teachers, using professionally established procedures, have developed the test items, and the items have been pretested and field-tested on samples of students.
The purpose of the study described in this report is to obtain information that the State Education Department can use to establish scores that will classify test takers into does not meet standards, meets standards, and meets standards with distinction categories. Setting cut-scores requires judgment. This study employs professionally established methods to quantify and summarize the judgments of experts related to how individuals who have met the learning standards and curricular objectives for mathematics B will perform on the test.
The Mathematics B Regents Examination
The Mathematics B Regents Examination is a four-part examination. Test content is based on the commencement-level key ideas and performance indicators found in the Learning Standards for mathematics and the Mathematics Resource Guide with Core Curriculum, published by the State Education Department. The four parts of the examination are as follows:
Part I consists of 20 multiple-choice questions. The questions cover all content areas. There is no partial credit given on this section of the test. Answers are recorded on a separate answer sheet. Each correct answer scores two points. Each incorrect answer scores 0 points.
Part II consists of six open-ended response questions. These questions are scored as 0, 1, or 2. If the question is not answered correctly, a score of 0 is given. If the question is answered correctly, but without the work needed to get the answer shown, a score of 1 is given. If the answer is correct and the work is shown, a score of 2 is given.
Part III consists of six open-ended response questions. These questions are scored on a 0-4 point scale. A score of 1 is given for a correct answer that is given without supporting work. Otherwise, partial credit is given for responses.
Part IV consists of two open-ended questions. These questions are scored on a 0-6 point scale. A score of 1 is given for a correct answer that is given without supporting work. Otherwise, partial credit is given for responses.
In general, the complexity of questions in Parts II-IV increases. Total test scores are found by adding the number of points over all parts of the test.
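As a rough illustration of how the part scores described above add up to a raw score, the following Python sketch sums the points earned in each part. The names `PARTS`, `max_raw_score`, and `total_raw_score` are hypothetical and are not part of the examination materials. Note that the item counts and point values listed above sum to 88, whereas the tables later in this report give a maximum of 87; the report does not explain the difference, so the sketch simply follows the part descriptions.

```python
# Number of items and maximum points per item, taken from the part
# descriptions above (Part I: 20 x 2, Part II: 6 x 2, Part III: 6 x 4,
# Part IV: 2 x 6).
PARTS = {
    "I": (20, 2),
    "II": (6, 2),
    "III": (6, 4),
    "IV": (2, 6),
}


def max_raw_score(parts=PARTS):
    """Largest raw score obtainable from the listed parts."""
    return sum(n_items * max_points for n_items, max_points in parts.values())


def total_raw_score(item_scores, parts=PARTS):
    """Add the points earned over all parts of the test.

    item_scores maps a part name ("I".."IV") to a list of item scores.
    """
    total = 0
    for part, (n_items, max_points) in parts.items():
        scores = item_scores[part]
        if len(scores) != n_items:
            raise ValueError(f"Part {part} needs {n_items} item scores")
        for score in scores:
            if not 0 <= score <= max_points:
                raise ValueError(f"Part {part} scores range from 0 to {max_points}")
        total += sum(scores)
    return total


# Hypothetical test taker: all of Part I correct, 1 point on each Part II
# item, nothing on Part III, and 3 points on each Part IV item.
example = {"I": [2] * 20, "II": [1] * 6, "III": [0] * 6, "IV": [3, 3]}
example_total = total_raw_score(example)
```

The validation checks only the score range of each part; it does not enforce finer rules such as Part I items being worth either 0 or 2 points.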
Specifications for the test are given in the table below:
| Key Ideas | Range | Item type | Number of Items, points |
| Mathematical Reasoning | 5-10% | Multiple-choice | 20, 2 points each |
| Number and Numeration | 5-10% | Short constructed response | 6, 2 points each |
| Operations | 5-10% | Longer constructed response | 6, 4 points each |
| Modeling/Multiple Representations | 15-25% | Extended constructed response | 2, 6 points each |
| Measurement | 15-20% | | |
| Uncertainty | 10-15% | | |
| Patterns and Functions | 15-25% | | |
Test takers are required to use graphing calculators.
A panel of mathematics experts at the high school and college level, with representatives from business and the community, developed the Mathematics B section of the core curriculum from portions of the commencement and four-year sequence level of the mathematics learning standards.
A complete description of the examination, including test specifications and scoring rubrics, is given in a test sampler.
Methods Employed
Data related to the performance standards for the test were obtained from a committee of experts. Judgments from committee members were quantified using standard practices employed by psychometricians who conduct standard setting studies. The committee made their judgments with respect to the difficulty scale resulting from the scaling and equating of field test items. In the field testing, each item, or score category if the item has multiple scores, is given a difficulty parameter obtained through item response methods. Test items corresponding to various points on the difficulty scale are presented as examples of test items at that difficulty level. The majority of the items used came from the anchor test form. The anchor test form is the test form upon which the passing standards are set and the form to which all later forms of the test will be equated.
Committee members were given definitions of three performance categories—not meeting standards, meeting standards, and meeting standards with distinction. The State Education Department has developed these category definitions and they are applied to all of the Regents tests that are being developed. In addition, committee members were given an exercise designed to help familiarize themselves with the examination and an exercise in which they were asked to categorize some of their students into the performance categories as defined by the State Education Department.
The committee met as a group on March 1, 2001 at the State Education Department.
The standard setting study used the bookmarking approach because all the multiple-choice items and constructed response items had been scaled using item response theory methods and because the bookmarking procedure enables committee members to consider these two item types together.
In the bookmarking procedure, multiple-choice items and constructed response items are ordered in terms of their difficulty parameters. The purpose of the items is to illustrate the meaning of the difficulty scale at specific points. Committee members are asked to apply their judgments to these ordered items. The committee meeting is conducted in rounds. The rounds and the activities employed in each round are given below.
| Round | Activity |
| 1 | Committee members review the Learning Standards for the content area and consider ways of measuring accomplishment of the performance indicators and key ideas. Committee members review the ordered items and learn and understand the increasing complexity of the items and responses required. |
| 2 | Working individually, committee members set their bookmark for passing. That is, committee members conceive of an individual who has the minimum level of skill and knowledge needed to meet the standards and indicate the last item (or difficulty level) where the hypothetical individual is likely to answer the item correctly two-thirds of the time (or to construct a response that is at least as good). |
| 3 | Working individually, committee members set their bookmarks for meeting standards with distinction. That is, committee members conceive of an individual who has the minimum level of skill and knowledge needed to meet the standards with distinction and indicate the last item at which such students are likely to answer correctly (or to construct a response that is at least as good). |
| 4 | A report of the results of round 2 is given to committee members. The committee is divided into small groups and the individual results are discussed. Committee members revise their judgments in light of the discussion. |
| 5 | The same procedure as in round 4 is used with the round 3 results. |
| 6 | A report of the results of rounds 4 and 5 is given to the committee. Also given to the committee are the impacts (percent failing and passing with distinction based on field test results). Committee members make final judgments based on the accumulated judgments and data. |
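The bookmark placement described for round 2 can be sketched in Python as follows. The report does not state which item response model was used, so the sketch assumes a one-parameter logistic (Rasch) model purely for illustration; the item labels, difficulty values, and function names are all hypothetical. It orders items by difficulty and, for a bookmarked item, finds the ability level at which the probability of a correct answer is two-thirds.

```python
import math


def p_correct(theta, b):
    """Probability of a correct answer under an ASSUMED Rasch model,
    for a test taker of ability theta on an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def ability_at_probability(b, p=2 / 3):
    """Ability at which an item of difficulty b is answered correctly
    with probability p (two-thirds by default)."""
    return b + math.log(p / (1.0 - p))


# Hypothetical items and difficulty parameters (not from the field test).
items = {"item_a": 0.9, "item_b": -0.3, "item_c": 0.4}

# Order the items from easiest to hardest, as in the ordered item set
# presented to the committee.
ordered = sorted(items.items(), key=lambda pair: pair[1])

# Suppose a judge places the bookmark on the second item in the ordering.
bookmark_label, bookmark_difficulty = ordered[1]

# Ability at which the bookmarked item is answered correctly two-thirds
# of the time under the assumed model.
theta_cut = ability_at_probability(bookmark_difficulty)
```

In this report the cut-points are stated directly as the difficulty of the bookmarked item; the conversion to an ability value is shown only to make the two-thirds criterion concrete.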
Committee members were also asked two overall questions about accomplishment of the learning standards and test performance. Answers to these questions might aid New York in setting appropriate performance standards on the test. These questions asked:
Which was the more serious error--to pass a student who has not met the learning standards and curricular objectives or to fail a student who has met the learning standards and curricular objectives?
Which was the more serious error--to pass with distinction a student who has not met the learning standards and curricular objectives at that level or to fail to pass with distinction a student who had achieved at that level?
Committee Members
The New York State Education Department's Office of Curriculum and Instruction assembled a committee of 20 people to provide judgments for the study. Committee members were, with one exception, current classroom teachers. One committee member was a representative from the teachers union who had taught mathematics and who was well versed in the learning standards and mathematics B curriculum. All committee members were recognized as very knowledgeable of the learning standards and mathematics B curriculum and of how students perform on standardized tests similar to the Mathematics B Examination. Some had worked on an aspect of either the standards or development of the curriculum or tests.
Committee members, their schools, the number of years of experience each has in teaching mathematics, and the number of students to whom they are currently teaching advanced mathematics are given in the table below.
| Committee Member | School and Location | Years Teaching Mathematics | Number of Students Currently Taught Advanced Mathematics |
| Steven Arnofsky | George W Wingate High School, Brooklyn | 32 | 36 |
| Antoine Atinkpahoun | Lincoln High School, Yonkers | 3 | 25 |
| Sheila Batson | Hempstead High School, Hempstead | 20 | 100 |
| Carole Bernhardt | Sheepshead Bay High School, Brooklyn | 30 | 34 |
| James Burrell | McKinley High School, Buffalo | 32 | 140 |
| Virginia Cronin | Lincoln High School, Yonkers | 20 | 40 |
| Eva Demyen | Valley Stream Central High School, Valley Stream | 27 | 40 |
| Melody DeRosa | New York State United Teachers | 7 | 0 |
| Peggy Fisher | North Syracuse Central High School, Cicero | 30 | 140 |
| Arlane Frederick | Kenmore West High School, Kenmore | 31 | 75 |
| Marcia Horelick | Saint Anne Institute, Albany | 27 | 25 |
| Kathleen Klee | Randolph Central High School | 30 | 115 |
| John Maus | North Shore Middle School, Glen Head | 10 | 80 |
| S. Mary Ann Napier | St. Francis Preparatory, Fresh Meadows | 30 | 124 |
| Marguerite Niforos | Galway High School, Galway | 27 | 55 |
| David Passer | Mexico High School, Mexico | 16 | 96 |
| Harry Rattien | Townsend High School, Flushing | 28 | 70 |
| Richard Robertson | Susquehanna Valley High School, Conklin | 30 | 80 |
| John Woodward | Northville High School, Northville | 35 | 100 |
| Phyllis Zagelbaum | Samuel H Wang Yeshiva University High School, Holliswood | 32 | 105 |
Committee members were chosen so that they would represent a wide range of schools and different types of students. Each committee member was asked to complete a short background questionnaire that included questions about their sex, ethnic background, and the setting for their school. Results of the questionnaire tabulations are given in the table below.
| Characteristic | Percent of committee |
| Sex | |
| Female | 60% |
| Male | 40% |
| Ethnic Background of Committee Member | |
| African-American | 15% |
| White | 85% |
| School Setting | |
| New York City | 25% |
| Other urban | 20% |
| Suburban | 25% |
| Rural | 25% |
| Not representing a school | 5% |
Findings related to the bookmarking procedure
Findings--round 2
In round 2, every committee member independently placed his or her own bookmark for meeting standards. The results of the placements are given in the table below. The table gives the difficulty level of the last item that a student who has minimally met the learning standards is likely to answer correctly, the corresponding raw score for that item, and the corresponding percent of students who fall below each cut-point based on the field test data. The cut-points include the committee average plus or minus one or two standard deviations (i.e., standard deviations of the committee estimates) and the median committee cut-point, together with the cut-points corresponding to the 75th and 25th percentile ranks of committee estimates.
| Cut-point | Difficulty | Raw score (Maximum 87) | Percent below |
| Mean + 2 SD | 1.5 | 66 | 98% |
| Mean + 1 SD | 1.0 | 47 | 83% |
| Mean | 0.5 | 33 | 58% |
| Mean - 1 SD | 0.0 | 20 | 28% |
| Mean - 2 SD | -0.5 | 8 | 8% |
| 75th percentile | 0.8 | 46 | 82% |
| Median | 0.7 | 41 | 75% |
| 25th percentile | 0.3 | 26 | 41% |
It is important to note that individuals in the field test had not taken the mathematics B course. The field tests were administered on a voluntary basis, and many of the test takers had just completed the mathematics 3 course. Thus, the estimates provided are surely overestimates of the percentage of students who fall below the cut-point.
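The summary cut-points reported in the table above (mean plus or minus one or two standard deviations, the median, and the 75th and 25th percentiles of the committee estimates) can be computed along the following lines. This is a minimal sketch: the bookmark values are invented, the function name is hypothetical, and the report does not say whether a sample or population standard deviation was used or how percentiles were interpolated, so the sample standard deviation and the default method of Python's `statistics.quantiles` are assumptions.

```python
import statistics


def summarize_bookmarks(difficulties):
    """Summarize committee bookmark difficulties as candidate cut-points."""
    mean = statistics.mean(difficulties)
    # ASSUMPTION: sample standard deviation (n - 1 in the denominator).
    sd = statistics.stdev(difficulties)
    # ASSUMPTION: default ("exclusive") quartile interpolation.
    q1, _, q3 = statistics.quantiles(difficulties, n=4)
    return {
        "Mean + 2 SD": mean + 2 * sd,
        "Mean + 1 SD": mean + sd,
        "Mean": mean,
        "Mean - 1 SD": mean - sd,
        "Mean - 2 SD": mean - 2 * sd,
        "75th percentile": q3,
        "Median": statistics.median(difficulties),
        "25th percentile": q1,
    }


# Hypothetical bookmark difficulties from five judges (not study data).
bookmarks = [0.0, 0.5, 0.5, 0.5, 1.0]
summary = summarize_bookmarks(bookmarks)
```

Each difficulty value in such a summary would then be looked up against the scaled test form to obtain the corresponding raw score, a step the sketch does not attempt.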
Findings--round 3
In round 3, every committee member independently placed his or her own bookmark for meeting standards with distinction. The results of the placements are given in the table below. The table gives the difficulty of the item corresponding to the cut-point, the raw score, and the corresponding percent above that cut-point based on the field test data. The cut-points include the committee average plus or minus one or two standard deviations (i.e., standard deviations of the committee estimates) and the median committee cut-point, together with the cut-points corresponding to the 75th and 25th percentile ranks of committee estimates.
| Cut-point | Difficulty | Raw score (Maximum 87) | Percent achieving |
| Mean + 2 SD | 2.0 | 72 | 1% |
| Mean + 1 SD | 1.8 | 70 | 1% |
| Mean | 1.5 | 66 | 2% |
| Mean - 1 SD | 1.3 | 58 | 5% |
| Mean - 2 SD | 1.0 | 47 | 17% |
| 75th percentile | 1.6 | 68 | 1% |
| Median | 1.5 | 66 | 2% |
| 25th percentile | 1.4 | 64 | 2% |
Again, it is important to note that individuals in the field test had not taken the mathematics B course. The field tests were administered on a voluntary basis, and the majority of test takers had just completed the mathematics 3 course. Thus, the impact estimates provided are surely underestimates of the percentage of students who might fall above the cut-points.
Findings--round 4
In round four, committee members received a report of their round two results. They also were placed in small groups where individual results were discussed. After the discussion, committee members were asked to place another bookmark for meeting standards based on the information and knowledge they had gained up to this point. The round four results are given in the table below.
| Cut-point | Difficulty | Raw score (Maximum 87) | Percent below |
| Mean + 2 SD | 1.1 | 50 | 88% |
| Mean + 1 SD | 0.7 | 41 | 75% |
| Mean | 0.4 | 31 | 53% |
| Mean - 1 SD | 0.0 | 20 | 27% |
| Mean - 2 SD | -0.4 | 9 | 8% |
| 75th percentile | 0.8 | 46 | 83% |
| Median | 0.3 | 29 | 49% |
| 25th percentile | 0.1 | 21 | 30% |
Similar comments about the nature of the field test results apply again.
Findings--round 5
In round five, committee members received a report of their round three results. They also were placed in small groups where individual results were discussed. After the discussion, committee members were asked to place another bookmark for meeting standards with distinction based on the information and knowledge they had gained up to this point. The round five results, which generally show less variation than the round three results, are given in the table below.
| Cut-point | Difficulty | Raw score (Maximum 87) | Percent above |
| Mean + 2 SD | 2.0 | 72 | 1% |
| Mean + 1 SD | 1.7 | 69 | 1% |
| Mean | 1.5 | 66 | 2% |
| Mean - 1 SD | 1.3 | 58 | 5% |
| Mean - 2 SD | 1.1 | 50 | 12% |
| 75th percentile | 1.7 | 69 | 1% |
| Median | 1.5 | 66 | 2% |
| 25th percentile | 1.4 | 64 | 2% |
Similar comments about the nature of the field test results apply again.
Findings--round 6
In round six, committee members received a report of their round four and five judgments. They also received a report of the impact of their estimates from that round. Impact was reported in terms of the frequency distributions of the field test scores. The committee was also advised that scores from field-testing will underestimate operational test performance, but that the amount of the underestimate was not known. Committee members then returned to their groups and discussed the report and their judgments. At the end of the discussion, committee members were asked to place new bookmarks for both meeting standards and meeting standards with distinction based on the information and knowledge they had at that time. Results of this final placement are given in the table below.
| Cut-point | Meeting standards: Difficulty | Meeting standards: Raw score | Meeting standards: Percent below | With distinction: Difficulty | With distinction: Raw score | With distinction: Percent above |
| Mean + 2 SD | 0.6 | 38 | 70% | 1.8 | 70 | 1% |
| Mean + 1 SD | 0.4 | 31 | 53% | 1.6 | 68 | 1% |
| Mean | 0.2 | 25 | 39% | 1.4 | 64 | 2% |
| Mean - 1 SD | 0.0 | 20 | 28% | 1.2 | 56 | 6% |
| Mean - 2 SD | -0.2 | 14 | 15% | 1.0 | 47 | 17% |
| 75th percentile | 0.3 | 29 | 49% | 1.5 | 66 | 2% |
| Median | 0.1 | 23 | 33% | 1.5 | 66 | 2% |
| 25th percentile | 0.1 | 23 | 33% | 1.1 | 50 | 12% |
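The impact figures in these tables (percent below a meeting-standards cut-point, percent above a distinction cut-point) are derived from the field test score distribution. A minimal Python sketch of that calculation follows. The scores are invented, the function names are hypothetical, and the treatment of a score exactly at the cut-point (counted here as meeting the cut) is an assumption, since the report does not state it.

```python
def percent_below(scores, cut):
    """Percent of test takers whose raw score is strictly below the cut."""
    below = sum(1 for score in scores if score < cut)
    return 100.0 * below / len(scores)


def percent_at_or_above(scores, cut):
    """Percent of test takers whose raw score is at or above the cut."""
    return 100.0 - percent_below(scores, cut)


# Hypothetical field test raw scores (not the study's data).
field_scores = [10, 18, 22, 25, 25, 31, 40, 47, 66, 70]

failing_impact = percent_below(field_scores, 25)            # three of ten scores
distinction_impact = percent_at_or_above(field_scores, 66)  # two of ten scores
```

Because the field test takers had not taken the course, percentages computed this way describe the field test sample rather than the students who will take the operational test.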
Other Judgments Obtained
When tests are used to classify individuals into categories, two kinds of classification error are always possible. For example, in classifying students into passing and failing categories, a student who should fail may be passed, or a student who should pass may be failed. These misclassifications always occur, and they are inversely related. That is, when we try to reduce one type of classification error, we increase the other type.
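The trade-off between the two kinds of error can be illustrated with a small Python sketch. The records below are invented for illustration only: each pairs a hypothetical judgment of whether a student has truly met the standards with that student's observed raw score. The function name and data are assumptions, not part of the study.

```python
def count_errors(records, cut):
    """Count both kinds of misclassification at a given cut score.

    records is a list of (truly_meets_standards, raw_score) pairs.
    Returns (false_passes, false_fails).
    """
    false_passes = sum(1 for meets, score in records if not meets and score >= cut)
    false_fails = sum(1 for meets, score in records if meets and score < cut)
    return false_passes, false_fails


# Hypothetical students: (truly meets standards?, observed raw score).
records = [
    (False, 18), (False, 27), (False, 32),
    (True, 24), (True, 29), (True, 35), (True, 50),
]

low_cut_errors = count_errors(records, 23)   # lenient cut score
high_cut_errors = count_errors(records, 31)  # stricter cut score
```

With these invented records, moving the cut score from 23 to 31 lowers the number of students passed in error from 2 to 1 while raising the number failed in error from 0 to 2.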
With respect to the relative severity of the errors of classification, 85% of the committee said that failing a student who should pass was more serious than passing a student who should fail. Fifteen percent of the committee said the opposite. Thirty percent of the committee said that passing a student with distinction who should only pass was more serious than just passing a student who should pass with distinction. Seventy percent of the committee said the opposite.
Discussion and Recommendations
The purpose of this study is to obtain data and information that New York may use in setting passing points for its Mathematics B Examination. The data should be used to guide those decisions.
The committee that provided the data was diverse and well represented the diversity of New York students, teachers, and school districts. With that diversity, it is not surprising that committee judgments varied.
The final bookmarks from the procedure are given in the table below.
| Cut-point | Meets standards: Difficulty | Meets standards: Raw score | Meets standards: Percent below | With distinction: Difficulty | With distinction: Raw score | With distinction: Percent above |
| Mean + 2 SD | 0.6 | 38 | 70% | 1.8 | 70 | 1% |
| Mean + 1 SD | 0.4 | 31 | 53% | 1.6 | 68 | 1% |
| Mean | 0.2 | 25 | 39% | 1.4 | 64 | 2% |
| Mean - 1 SD | 0.0 | 20 | 28% | 1.2 | 56 | 6% |
| Mean - 2 SD | -0.2 | 14 | 15% | 1.0 | 47 | 17% |
| 75th percentile | 0.3 | 29 | 49% | 1.5 | 66 | 2% |
| Median | 0.1 | 23 | 33% | 1.5 | 66 | 2% |
| 25th percentile | 0.1 | 23 | 33% | 1.1 | 50 | 12% |
Further, the committee overwhelmingly believes that the error of failing a student who should pass should be minimized. The committee also believes, though to a lesser extent, the same about the passing with distinction classification.
Finally, the impact data (i.e., the performance data from the field-testing) were based on students who had not had the Mathematics B course and many of whom had just completed the mathematics 3 course. Thus, the estimates of the percentage of students failing are overestimates, and the estimates of the percentage of students who would achieve passing with distinction are underestimates.
What should be made of these results?
The study author recognizes that New York has the responsibility and duty to set cut-points in such a way that the purpose of the testing program is best accomplished. That requires judgment and consideration of all the data and information that is available at the time cut-points are set.
To the study author, one item stands out in importance. The field test data from which the difficulty parameters were calculated, and which form the basis for estimating the impact of the average passing and passing with distinction points, are seriously flawed. They are flawed because the students who took the field test had not been exposed to the course content and because most of the students had just completed the mathematics 3 course. It is possible, and certainly highly probable at the higher levels of difficulty, that the items chosen to represent specific levels of difficulty do not accurately represent those levels of difficulty.
Thus, the study author’s strongest recommendation to New York is to repeat both the scaling and standard setting studies after this year’s administration is completed. At that time, more valid and reliable data should be available. Further, the study author urges New York to repeat both the scaling and standard setting annually until the curriculum is in place statewide and operational testing is taking place.
It is extremely important to recognize that cut-points are not immutable. All cut-points should be set based on the best information that is available. But as more information becomes available, cut-points should be revised (or at a minimum reviewed) to make sure that they are consistent with the information available. This may result in periodic raising or lowering of cut-points until stable conditions of instruction and testing are achieved.
Having said that, the issue at hand is what to implement as a passing score and passing with distinction score for the 2001 operational year. For 2001, the study author recommends that New York choose a cut-score for meeting standards between 23 and 31 raw score points and a cut-score for meeting standards with distinction between 56 and 68. If forced to recommend single cut-points, the study author would recommend raw scores of 30 and 66 for the two cut-points. The study author is most concerned over the effect the reported impact data had on the round 6 bookmarks. There are 87 possible raw score points on the test and although the test is recognized as being difficult, having a cut-score of about 30 raw score points appears to the study author to be very low. For that reason, the recommended cut-score for meeting standards is slightly higher than the round 6 average and median.
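To make the recommended single cut-points concrete, the following Python sketch classifies a raw score into the three performance categories using the raw scores of 30 and 66 named above. The function name is hypothetical, and the rule that a score exactly equal to a cut-point falls in the higher category is an assumption, since the report does not state how boundary scores are treated.

```python
# Single cut-points recommended in this report for the 2001 operational year.
MEETS_CUT = 30
DISTINCTION_CUT = 66


def classify(raw_score, meets_cut=MEETS_CUT, distinction_cut=DISTINCTION_CUT):
    """Classify a raw score into one of the three performance categories.

    ASSUMPTION: a score equal to a cut-point is placed in the higher category.
    """
    if raw_score >= distinction_cut:
        return "meets standards with distinction"
    if raw_score >= meets_cut:
        return "meets standards"
    return "does not meet standards"


labels = [classify(score) for score in (29, 30, 65, 66)]
```

Passing different values for `meets_cut` and `distinction_cut` shows how the same scores would be classified at other points within the recommended ranges of 23 to 31 and 56 to 68.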
The study author believes that test developers and other state staff who know and understand implementation of the Mathematics B curriculum can make the best choice of cut-points within the proposed ranges.
In general, the study author also believes that medians are better guides than means because the judgments committee members give appear not to be normally distributed.