My School Committee Blog: How to Evaluate the 9th Grade Science Course

Given the interest among readers of my blog in how to evaluate the new required 9th grade science course, and what seems to be general support for conducting such an evaluation, I'm posting a memo that Steve Rivkin (a School Committee candidate) and I submitted last fall to the Superintendent, copied to the chairs of both the Amherst and Regional School Committees. When we submitted this memo, which describes how to conduct such an evaluation, we offered to meet with the superintendent, the high school principal, and/or the high school science teachers. I also personally offered to have an Amherst College thesis student conduct the evaluation under my supervision. We have yet to receive a response from anyone regarding such an evaluation.

To: Alton Sprague
From: Steve Rivkin, Ph.D., and Catherine Sanderson, Ph.D.
Cc: Michael Hussin, Andy Churchill
Re: Review of 9th Grade Science
Date: October 15, 2008

This is a brief note on issues relevant to the review of 9th grade science; we hope it is helpful. We both conduct program evaluations in our work, and our CVs are attached. We would both be glad to answer questions or discuss suggestions about conducting such an evaluation at any time, so please be in touch if and when that would be helpful.

Review of 9th Grade Science Course Change
A comprehensive and informative evaluation of the change in 9th grade science requires developing a valid empirical model that considers both short-term and long-term outcomes, and collecting the requisite data. In particular, it is important to produce separate estimates of the effect of the change on students who would have been in honors biology, students who would have been in honors earth science, and students who would have been in college prep earth science. In addition, it is important to obtain estimates for specific subgroups, such as students of color, lower-income students, and female students. Such an evaluation will help the School Committee, Superintendents, and high school administrators and teachers understand the impact of the new required 9th grade science course on all students (and it is certainly possible that the new course affects different students differently).

We briefly describe an ideal but infeasible evaluation framework, a kind of Holy Grail for evaluation, in which all changes in student outcomes can be attributed directly to the change in 9th grade science. We then turn to a feasible approach given the available information and discuss some of the problems that must be addressed because different students experienced the two 9th grade science courses at different times. Finally, we describe the data needed for a successful evaluation.

Ideal evaluations: In the ideal evaluation, the same students would attend high school and go on to post-secondary activities first under the old 9th grade science curriculum and then under the new one. Everything else would be identical, so any differences in student outcomes (potentially including satisfaction with 9th grade science, MCAS performance, number of science courses taken, number of AP science courses taken in various subjects, colleges attended, college majors, performance in science, future occupation, and earnings) could be attributed directly to the change in 9th grade science.

Of course, people only go through high school once, so this “ideal” is not feasible. An alternative and more feasible approach commonly used in research is to randomly assign 9th graders to take either the old or the new science curriculum. If the randomization is done well, differences in outcomes between the two groups provide a valid estimate of the effect of changing the 9th grade science curriculum. In this case, however, such randomization is clearly not appropriate: the high school would have had to offer two sets of science courses, and parental and student efforts to get into the course of their choosing would have compromised the experiment. Most importantly, no such randomization was done.

Feasible evaluation: The feasible alternative is to compare the cohort of students who attended 9th grade just before the change in science curriculum with the cohort who attended 9th grade just after it. Although this approach resembles the ideal framework, there are several potential impediments to a successful evaluation. These include cohort differences in student characteristics (such as interest in science and math background) as well as other changes (such as in teachers, curriculum, and school policies). One should also acknowledge that the final year under the old curriculum could be less engaging and less well organized than previous years, because teachers know it is the final year. Moreover, the first year under the new program might have some glitches, though enthusiasm for the new program might be unusually strong in its first year.

In implementing this evaluation, some of these potential impediments cannot be addressed directly, while others can be with appropriate steps. On the one hand, changes in school policies, personnel, general interest in science, and other factors can be noted and considered but not directly incorporated into the analysis. For example, it would be important to know whether the different science courses influence the number of students who leave the school, either as dropouts or as transfers (e.g., do some students who would have taken honors biology leave the high school for private school once that course is no longer offered?). On the other hand, differences in student preparation and characteristics can be addressed with information on middle school performance, particularly in 8th grade mathematics, and on student demographics.

There are a number of different methods that can be used to estimate the effects of the new curriculum on different groups of students.

Option #1: The state of the art is to use propensity score matching to essentially “match” each student in the pre-change cohort to a student in the post-change cohort on the basis of a score estimated from middle school academic performance, family income, gender, race, ethnicity, and other relevant factors. This method provides a way to correct for cohort differences along a number of dimensions. (This is the approach that Catherine is now using to examine the effectiveness of the Pipeline Program.)
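To make the matching idea concrete, here is a minimal sketch in Python, assuming one record per student with a cohort flag, a list of numeric covariates, and a single outcome. The names `Student`, `fit_logistic`, `propensity`, and `matched_effect` are hypothetical, as are the example covariates and outcome. The propensity score (the estimated probability of being in the post-change cohort given the covariates) is fit with a plain logistic regression by gradient descent. Each post-change student is then paired with the pre-change student whose score is closest (one-to-one nearest-neighbor matching, with replacement); matching in the direction the memo describes works the same way with the roles swapped. This is an illustration, not the procedure used in the Pipeline Program study. A real analysis would use an established statistics package, standardize covariates, check that the cohorts' scores overlap, and report standard errors.

```python
import math
from dataclasses import dataclass


@dataclass
class Student:
    post_change: bool   # True if in 9th grade after the curriculum change
    covariates: list    # hypothetical, e.g. [standardized 8th grade math score, low-income flag]
    outcome: float      # hypothetical outcome, e.g. 10th grade science MCAS score


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def propensity(w, x):
    # Estimated probability that a student with covariates x is in the post-change cohort.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit P(post_change | covariates) by batch gradient descent; w[0] is the intercept."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = propensity(w, xi) - yi  # gradient of the log-loss for one student
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def matched_effect(students):
    """Average post-minus-pre outcome difference over post-change students,
    each matched to the pre-change student with the closest propensity score."""
    w = fit_logistic([s.covariates for s in students],
                     [1.0 if s.post_change else 0.0 for s in students])
    scored = [(propensity(w, s.covariates), s) for s in students]
    controls = [(p, s) for p, s in scored if not s.post_change]
    diffs = []
    for p, s in scored:
        if s.post_change:
            # Nearest pre-change student by score; controls can be reused.
            _, match = min(controls, key=lambda c: abs(c[0] - p))
            diffs.append(s.outcome - match.outcome)
    return sum(diffs) / len(diffs)
```

The same function can be run separately within subgroups (for example, lower-income students only) to obtain the subgroup estimates the memo calls for.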

Option #2: An alternative and less technically demanding approach is to classify all students in the post-change cohort by which class they would have taken under the old system. Then students who actually took college prep earth science can be compared with those who would have taken the course, students who actually took honors earth science can be compared with those who would have been in the course, and students who actually took honors biology can be compared with students who would have been in the course. Although some students will be wrongly assigned to a particular group (because we don’t actually know what course they would have taken), one can make a pretty good guess about which course they would have taken based on 8th grade math preparation (given that only students with 8th grade algebra were eligible for the honors biology class).
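The classification step in Option #2 can be sketched as follows, assuming each record carries the cohort, whether the student took algebra in 8th grade, an 8th grade math grade, the actual old-system course (pre-change cohort only), and an outcome. The names `Record`, `predicted_old_course`, and `compare_by_group` are hypothetical. The algebra rule comes from the memo, but the memo gives no rule for splitting honors from college prep earth science, so `HONORS_EARTH_CUTOFF` is an invented placeholder that would need to be calibrated against the pre-change cohort's actual placements. Like the memo's approach, this guess will misclassify some students.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Optional

# Hypothetical cutoff separating honors from college prep earth science among
# students without 8th grade algebra; the memo does not specify one.
HONORS_EARTH_CUTOFF = 85.0


@dataclass
class Record:
    cohort: str                # "pre" or "post" curriculum change
    algebra_in_8th: bool       # took algebra in 8th grade
    math_grade_8th: float      # hypothetical 8th grade math grade, 0-100
    course_9th: Optional[str]  # actual old-system course (pre-change cohort only)
    outcome: float             # hypothetical outcome, e.g. 10th grade science MCAS score


def predicted_old_course(r):
    # Best guess at the course a post-change student would have taken;
    # only students with 8th grade algebra were eligible for honors biology.
    if r.algebra_in_8th:
        return "honors biology"
    if r.math_grade_8th >= HONORS_EARTH_CUTOFF:
        return "honors earth science"
    return "college prep earth science"


def compare_by_group(records):
    """Mean outcome difference (post minus pre) for each old-system course group."""
    groups = defaultdict(lambda: {"pre": [], "post": []})
    for r in records:
        # Pre-change students use their actual course; post-change students use the guess.
        group = r.course_9th if r.cohort == "pre" else predicted_old_course(r)
        groups[group][r.cohort].append(r.outcome)
    return {g: mean(v["post"]) - mean(v["pre"])
            for g, v in groups.items() if v["pre"] and v["post"]}
```

A simple difference in group means does not adjust for any remaining cohort differences within a group; that is the trade-off for this approach being less technically demanding than matching.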

Either of these methods clearly requires information on middle school academic performance and student demographics, both to mitigate the effects of cohort differences and to assign students in the post-change cohort to the courses they would have taken under the old system. It is our understanding that this type of individual-level information was not collected in the initial round of data collection (with the exception of gender and race). This complicates the evaluation and does rule out certain comparisons that one could ideally have made. However, it is certainly possible to collect middle school transcripts (including the math class taken in 8th grade), high school transcripts (including the 9th grade science course taken), and demographic characteristics for the pre-change cohort, which can be used both to create the post-change groups and to provide longer-term outcome data for the pre-change cohort.

Data needed: The initial short-term outcomes were measured with a survey of 9th grade science students (covering interest in science and intentions to take more science). We agree that these are important questions and that students should be surveyed on such measures again this year to evaluate the qualitative outcomes of the different science courses. It is also important to collect data on longer-term, quantitative outcomes of these courses and to build such plans into the overall evaluation model. These outcomes should include scores on the 10th grade Science MCAS (biology, chemistry), the number (and type) of science courses taken, and achievement on standardized tests (e.g., SAT IIs, APs). Ideally, longer-term outcomes (such as college attended, proficiency in college science courses, and occupation) would also be measured, but tracking such outcomes is probably not feasible, and such data could not be collected in time to inform decisions about the current science program.

4 comments:

Anonymous said...: Maybe they (the ex-co-superintendents, the rest of the SC) think that if they ignore you, you might go away?!

You basically offered to do everything - all the work - and yet they can't respond?

It sounds like it's time the rest of us flooded the SC mailbox with demands for an evaluation of 9th grade science, a more rigorous math and science curriculum, and all the other things we've discussed on these boards. Would it be worth it to wait until the new members are elected?

I would like the ability to be anonymous though - but if we email the SC, they will know who is emailing them, right?; March 17, 2009 at 1:55 PM
Joel said...: The first comment about anonymity is almost too funny. Sad really. We were all assured that the residents of Amherst didn't need an anonymous electronic suggestion box. I guess we didn't need it until we needed it.

I'm sure the old guard of the SC will once again refuse to study what's going on. They just don't want to know.

They are, however, willing to admit that their unwillingness to act responsibly is indeed "lame."; March 17, 2009 at 6:58 PM
Anonymous said...: I think it's possible that the proposal wasn't appropriate. I read it and think there are so many variables and uncertainties in even the best-case study that the value of its outcome is questionable. And I'm curious - wouldn't we learn as much with a close look at the 9th grade science curriculum to see if it's rigorous enough? However, Catherine and Steve deserved the courtesy of a response to their effort, no question. And Catherine, I support your efforts on the school committee.; March 17, 2009 at 7:21 PM
Anonymous said...: Three things.

First, do not, under ANY circumstance, have one of *your* students do the review. Even if this isn't a direct violation of the state ethics law and the related "conflict of interest" stuff (which it might not be, because AC is a private college and thus you are not a "state employee"), it is going to become a multifaceted mess that you really don't want to deal with.

Think logically here: let's say that you have some views that, for the sake of argument, are completely accurate. Let's say that your student(s) reach the same view because it is, after all, accurate. How, exactly, are you going to convince even neutral people that your student(s) reached that conclusion independent of your supervision (etc.)????

This is what professional associations are for: have someone from, say, BC volunteer his/her/its student for this purpose.

Second, I see a major research design flaw. IMHO, a far more relevant question is whether either parent holds a BS degree or its foreign equivalent. You are going to have some children with one (or both) parents who are grad students or postdocs at UMass (Mark's Meadow), and this is going to be a bigger distinction than race, gender, or SES, because not only do all parents homeschool to some extent, but there are also going to be some attitudes taught about science.

Third, many ask about the difference between the PhD and the EdD -- this is it. Respectfully, your model is wrong: you don't get a control group so much as you have this sample population and the larger population (state norms, national norms, and norms adjusted to be similar to your sample).

All of this is likely moot as I see Amherst as a place where people don't want to confuse themselves with the facts. So what if the DPW is spraying stuff on the roads that will destroy everyone's vehicle - it is "environmentally friendly" and thus good. But I digress...

*Any* evaluation has merit, but I would be cautious about what would be the best approach...; March 17, 2009 at 8:17 PM

Posted on Tuesday, March 17, 2009