The difficulty in wording survey questions about American education

Emily Richmond points out some of the difficulties in creating and interpreting surveys regarding public opinion on American education:

As for the PDK/Gallup poll, no one recognizes the importance of a question’s wording better than Bill Bushaw, executive director of PDK. He provided me with an interesting example from the September 2009 issue of Phi Delta Kappan magazine, explaining how the organization tested a question about teacher tenure:

“Americans’ opinions about teacher tenure have much to do with how the question is asked. In the 2009 poll, we asked half of respondents if they approved or disapproved of teacher tenure, equating it to receiving a “lifetime contract.” That group of Americans overwhelmingly disapproved of teacher tenure 73% to 26%. The other half of the sample received a similar question that equated tenure to providing a formal legal review before a teacher could be terminated. In this case, the response was reversed, 66% approving of teacher tenure, 34% disapproving.”

So what’s the message here? It’s one I’ve argued before: That polls, taken in context, can provide valuable information. At the same time, journalists have to be careful when comparing prior years’ results to make sure that methodological changes haven’t influenced the findings; you can see how that played out in last year’s MetLife teacher poll. And it’s a good idea to use caution when comparing findings among different polls, even when the questions, at least on the surface, seem similar.

Surveys don’t write themselves, nor is interpreting the results necessarily straightforward. Change the wording or the order of the questions, and the results can change. I like the link to the list of “20 Questions A Journalist Should Ask About Poll Results” put out by the National Council on Public Polls. Our public life would be improved if journalists, pundits, and average citizens paid attention to these questions.
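The tenure example is a split-sample test. Each random half of the sample heard a different wording: 26% approved under “lifetime contract” and 66% approved under “formal legal review.” One way to check that a swing like this isn’t just sampling noise is a pooled two-proportion z-test, sketched minimally below. The post doesn’t report how many people were in each half, so `N_PER_HALF = 500` is a hypothetical assumption, not the poll’s actual sample size.

```python
from math import erfc, sqrt


def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z-test for two independent samples.

    Returns (difference p2 - p1, z statistic, two-sided p-value).
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail area of the standard normal
    return p2 - p1, z, p_value


# Approval rates from the 2009 split sample:
# "lifetime contract" wording -> 26% approve; "formal legal review" wording -> 66% approve.
N_PER_HALF = 500  # HYPOTHETICAL half-sample size; the actual n is not given in the post

diff, z, p = two_proportion_z(0.26, N_PER_HALF, 0.66, N_PER_HALF)
print(f"wording effect: {diff:+.0%} approval, z = {z:.1f}, p = {p:.1e}")
```

Even with only 100 respondents per half, z would still be above 5. A 40-point swing that big points to the wording itself, not chance.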

Sociology professor developed and used computer program for grading papers

Sociologist Ed Brent has developed and used a grading program for student papers:

Brent designed software called SAGrader to grade student papers in a matter of seconds. The program works by analyzing sentences and paragraphs for keywords and relationships between terms. Brent believes the program can be used as a tool to save time for teachers by zeroing in on the main points of an essay and allowing teachers to rate papers for the use of language and style.

“I don’t think we want to replace humans,” Brent says in an article in Wired. “But we want to do the fun stuff, the challenging stuff. And the computer can do the tedious but necessary stuff.”

Using the software still requires work on the teacher’s part, though. To prepare the program to grade papers, a teacher must enter all of the components they expect a paper to include. Teachers also have to consider the hundreds of ways a student might address the pieces of an essay.
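The excerpt describes SAGrader only as looking for “keywords and relationships between terms,” so the code below is a toy sketch of that general idea, not Brent’s actual program. The teacher supplies rubric components, each with a list of acceptable phrasings. That list is a tiny stand-in for the “hundreds of ways” a student might word things. The teacher also lists pairs of components the essay should connect within one sentence. The names `Component`, `Relationship`, and `score_essay`, and the sample rubric, are all hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Component:
    """A rubric element the teacher expects; `phrasings` are acceptable wordings."""
    name: str
    phrasings: list[str]


@dataclass
class Relationship:
    """Two components the essay should connect within a single sentence."""
    first: str
    second: str


def split_sentences(text: str) -> list[str]:
    # Naive split on ., !, or ? followed by whitespace; lowercased for matching.
    return [s.strip().lower() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def mentions(sentence: str, component: Component) -> bool:
    # Plain substring match on any acceptable phrasing.
    return any(p.lower() in sentence for p in component.phrasings)


def score_essay(text, components, relationships):
    """Return which components appear anywhere and which pairs co-occur in a sentence."""
    sentences = split_sentences(text)
    by_name = {c.name: c for c in components}
    found = {c.name: any(mentions(s, c) for s in sentences) for c in components}
    linked = {
        (r.first, r.second): any(
            mentions(s, by_name[r.first]) and mentions(s, by_name[r.second])
            for s in sentences
        )
        for r in relationships
    }
    return found, linked


# Hypothetical rubric for a short essay on socialization.
rubric = [
    Component("socialization", ["socialization", "socialized"]),
    Component("family", ["family", "parents"]),
    Component("peers", ["peer group", "peers"]),
]
links = [Relationship("family", "socialization"), Relationship("peers", "socialization")]

essay = "Children are socialized first by their parents. Later, peers matter more."
found, linked = score_essay(essay, rubric, links)
print(found)
print(linked)
```

In this example all three components are found. The family–socialization link is detected, but the peers–socialization link is not, because the essay never ties peers to socialization in the same sentence. Crude matching like this shows why the teacher’s up-front work matters: any wording missing from the list simply goes unnoticed.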

Interestingly, one person in the testing business argues that the biggest issue is not how well the software does at grading but whether people believe the program can do a good job:

But it’s tough to tout a product that tinkers with something many educators believe only a human can do.

“That’s the biggest obstacle for this technology,” said Frank Catalano, a senior vice president for Pearson Assessments and Testing, whose Intelligent Essay Assessor is used in middle schools and the military alike. “It’s not its accuracy. It’s not its suitability. It’s the believability that it can do the things it already can do.”

If this were used widely and became normal practice, it could redefine what it means to be a professor or teacher. This is not a small issue in an era when many argue that learning online or from a book could be as effective (or at least as cost-effective) as sending students to pricey colleges.

I wonder what percentage of sociologists would support using such grading programs in their own classrooms and throughout academic institutions.

The prospect of the automated grading of essays

As the American public debates the exploits of Watson (one commentator suggests it should, among other things, sort out Charlie Sheen’s problem), how about turning essay grading over to computers? There are programs in the works to make this happen:

At George Mason University Saturday, at the Fourth International Conference on Writing Research, the Educational Testing Service presented evidence that a pilot test of automated grading of freshman writing placement tests at the New Jersey Institute of Technology showed that computer programs can be trusted with the job. The NJIT results represent the first “validity testing” — in which a series of tests are conducted to make sure that the scoring was accurate — that ETS has conducted of automated grading of college students’ essays. Based on the positive results, ETS plans to sign up more colleges to grade placement tests in this way — and is already doing so.

But a writing scholar at the Massachusetts Institute of Technology presented research questioning the ETS findings, and arguing that the testing service’s formula for automated essay grading favors verbosity over originality. Further, the critique suggested that ETS was able to get good results only because it tested short answer essays with limited time for students — and an ETS official admitted that the testing service has not conducted any validity studies on longer form, and longer timed, writing.

Such programs are only as good as the algorithms and methods behind them, and it sounds like this ETS program still has some issues. Grading is a skill that teachers develop. Much of it can be quantified and placed into rubrics. But I would also guess that many teachers develop an intuition that helps them quickly apply these important factors to the work they read and grade.

But on a broader scale, what would happen if the right programs could be developed? Could we soon reach a point where professors and teachers would agree that a program could effectively grade writing?

An example of statistics in action: measuring faculty performance by the grades students receive in subsequent courses

Assessment, whether of student or faculty outcomes, is a great area in which to find examples of statistics. This example comes from a discussion of assessing faculty by looking at how students do in subsequent courses:

[A]lmost no colleges systematically analyze students’ performance across course sequences.

That may be a lost opportunity. If colleges looked carefully at students’ performance in (for example) Calculus II courses, some scholars say, they could harvest vital information about the Calculus I sections where the students were originally trained. Which Calculus I instructors are strongest? Which kinds of homework and classroom design are most effective? Are some professors inflating grades?

Analyzing subsequent-course preparedness “is going to give you a much, much more-reliable signal of quality than traditional course-evaluation forms,” says Bruce A. Weinberg, an associate professor of economics at Ohio State University who recently scrutinized more than 14,000 students’ performance across course sequences in his department.

Other scholars, however, contend that it is not so easy to play this game. In practice, they say, course-sequence data are almost impossible to analyze. Dozens of confounding variables can cloud the picture. If the best-prepared students in a Spanish II course come from the Spanish I section that met at 8 a.m., is that because that section had the best instructor, or is it because the kind of student who is willing to wake up at dawn is also the kind of student who is likely to be academically strong?

It sounds like the relevant grade data for this sort of analysis would not be difficult to obtain. The hard part is making sure the analysis accounts for all of the potentially relevant factors (“confounding variables”) that could influence student performance.
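To see why confounding is the hard part, here is a small simulation built on the article’s 8 a.m. Spanish I example. Every number is a hypothetical assumption. Stronger students pick the early section more often (70% vs. 30%), and Spanish II scores depend only on prior ability, so the true instructor effect is zero. A naive comparison of sections still shows a sizable gap. Comparing sections within ability strata and then averaging removes that gap. This fix works only because the simulation lets us observe ability perfectly, which real records never allow.

```python
import random
from statistics import mean


def simulate(n=20_000, seed=42):
    """Hypothetical students: ability drives both section choice and later grades."""
    rng = random.Random(seed)
    students = []
    for _ in range(n):
        ability = rng.gauss(0, 1)
        # ASSUMED self-selection: stronger students are likelier to take the 8 a.m. section.
        early = rng.random() < (0.7 if ability > 0 else 0.3)
        # Spanish II score depends on ability only: NO instructor/section effect at all.
        score = 75 + 8 * ability + rng.gauss(0, 5)
        students.append((ability, early, score))
    return students


def naive_gap(students):
    """Mean Spanish II score of the 8 a.m. section minus everyone else."""
    early = [s for _, e, s in students if e]
    later = [s for _, e, s in students if not e]
    return mean(early) - mean(later)


def stratified_gap(students, strata=((float("-inf"), 0.0), (0.0, float("inf")))):
    """Compare sections within ability strata, then weight each stratum by its size."""
    weighted, total = 0.0, 0
    for lo, hi in strata:
        group = [(e, s) for a, e, s in students if lo < a <= hi]
        early = [s for e, s in group if e]
        later = [s for e, s in group if not e]
        weighted += len(group) * (mean(early) - mean(later))
        total += len(group)
    return weighted / total


students = simulate()
print(f"naive gap:      {naive_gap(students):.2f} points")
print(f"stratified gap: {stratified_gap(students):.2f} points")
```

The naive gap comes out around five points, entirely manufactured by who chose the early section. The stratified gap sits near zero.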

One way to reduce these problems is to limit students’ choice of sections and instructors. Interestingly, the article cites studies done at the Air Force Academy, where students have few options in the Calculus I-II sequence. In short, this setting means “the Air Force Academy [is] a beautifully sterile environment for studying course sequences.”
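Limiting choice helps because it breaks the link between student traits and section. Here is a minimal sketch of the idea, assuming sections are assigned essentially at random, which is roughly what limited choice at the Academy approximates. The model and `TRUE_SECTION_EFFECT` are hypothetical: section A’s Calculus I instructor is assumed to add two points to later Calculus II scores. Under self-selection, the measured gap badly overstates that effect. Under random assignment, it lands close to the true value.

```python
import random
from statistics import mean

TRUE_SECTION_EFFECT = 2.0  # HYPOTHETICAL: section A's instructor adds 2 points in Calculus II


def section_gap(n: int, random_assignment: bool, seed: int = 7) -> float:
    """Mean Calculus II score in section A minus section B for n simulated students."""
    rng = random.Random(seed)
    a_scores, b_scores = [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)
        if random_assignment:
            in_a = rng.random() < 0.5  # students are assigned, not choosing
        else:
            # ASSUMED self-selection: stronger students gravitate toward section A.
            in_a = rng.random() < (0.7 if ability > 0 else 0.3)
        score = 75 + 8 * ability + rng.gauss(0, 5) + (TRUE_SECTION_EFFECT if in_a else 0.0)
        (a_scores if in_a else b_scores).append(score)
    return mean(a_scores) - mean(b_scores)


print(f"self-selected: {section_gap(20_000, random_assignment=False):.2f}")
print(f"assigned:      {section_gap(20_000, random_assignment=True):.2f}")
```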

Some interesting findings from both the Air Force Academy and Duke: students who took introductory or earlier classes they considered more difficult or stringent did better in subsequent courses.

Americans blame parents for bad education

A perpetual question in our country is who to blame for poor educational results. A recent poll shows a large number of Americans blame parents:

An Associated Press-Stanford University Poll on education found that 68 percent of adults believe parents deserve heavy blame for what’s wrong with the U.S. education system — more than teachers, school administrators, the government or teachers unions.

Only 35 percent of those surveyed agreed that teachers deserve a great deal or a lot of the blame. Moms were more likely than dads — 72 percent versus 61 percent — to say parents are at fault. Conservatives were more likely than moderates or liberals to blame parents.

Those who said parents are to blame were more likely to cite a lack of student discipline and low expectations for students as serious problems in schools. They were also more likely to see fighting and low test scores as big problems.

Figuring out how to improve education is always difficult. I’ve always thought the discussion is complicated by the fact that people feel more control over, or more of a duty to check on, how their property taxes are being spent on education. People gripe about paying money to the federal government or the state, but when it comes to education at the local level, everyone has an opinion (and often a solution).

As the story goes on to say, it is not all about blaming: “55 percent believe their children are getting a better education than they did, and three-quarters rate the quality of education at their child’s school as excellent or good.”

A final thought: the next question on the survey should have been: if you are a parent of a child in school, do you blame yourself for your child’s performance? Or do the people who blame parents really blame other parents?

Fantasy football in the classroom

Fantasy football is not just for adults or for recreation. Some teachers are now using it in the classroom to help teach math:

Empirical data show that classroom fantasy-sports programs help improve grades and test scores.

In a 2009 survey of middle and high school students by the University of Mississippi, 56 percent of boys and 45 percent of girls said they learned math easier because they played fantasy sports in class. And 33 percent of boys and 28 percent of girls said their grades improved.

This sounds like a fun way to learn math. And the story suggests that whole families got involved with the process and helped the children decide whom to draft and how to score.

On another front: will everyone be playing fantasy football in the future?

LA Times portal on value-added analysis of teachers

The Los Angeles Times has put together a portal, filled with information and opinion, on its recent publication of a value-added analysis of Los Angeles teachers.

Measuring teacher performance is a tricky subject as there are a number of factors at play in a student’s academic performance. In an article, the newspaper summarizes how value-added scores are estimated:

Value-added estimates the effectiveness of a teacher by looking at the test scores of his students. Each student’s past test performance is used to project his performance in the future. The difference between the child’s actual and projected results is the estimated “value” that the teacher added or subtracted during the year. The teacher’s rating reflects his average results after teaching a statistically reliable number of students.
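The Times’s description boils down to three steps. First, project each student’s score from past performance. Second, take actual minus projected. Third, average those differences by teacher. Below is a deliberately stripped-down sketch of that logic, assuming a single prior-year score and a straight-line projection fit across all students. It is not the Times’s actual model, which the excerpt doesn’t spell out. The `min_students` cutoff is a hypothetical stand-in for “a statistically reliable number of students,” and the records are made up.

```python
from collections import defaultdict
from statistics import mean


def fit_projection(prior, current):
    """Ordinary least squares line: current ~ intercept + slope * prior."""
    mx, my = mean(prior), mean(current)
    sxx = sum((x - mx) ** 2 for x in prior)
    sxy = sum((x - mx) * (y - my) for x, y in zip(prior, current))
    slope = sxy / sxx
    return my - slope * mx, slope


def value_added(records, min_students=3):
    """records: (teacher, prior_score, current_score) tuples.

    Returns teacher -> mean (actual - projected), or None below the cutoff.
    """
    intercept, slope = fit_projection([r[1] for r in records], [r[2] for r in records])
    residuals = defaultdict(list)
    for teacher, prior, current in records:
        projected = intercept + slope * prior
        residuals[teacher].append(current - projected)
    return {
        t: (mean(res) if len(res) >= min_students else None)
        for t, res in residuals.items()
    }


# Made-up records for three hypothetical teachers.
records = [
    ("A", 60, 65), ("A", 70, 75), ("A", 80, 85),
    ("B", 60, 55), ("B", 70, 65), ("B", 80, 75),
    ("C", 70, 70),
]
print(value_added(records))
```

Because the projection is fit on the whole pool, each rating is relative to everyone else. In this toy data, A comes out at +5 and B at -5. C gets no rating because one student falls below the cutoff.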

In addition to these methodological questions, there are a number of other fascinating issues: should this sort of information be publicly available, and how will it affect teachers’ performance? Is it an accurate assessment of what teachers do? What should be done for the teachers who fall outside the normal range? How will the politics of all of this play out?

For those interested in education and measuring outcomes, this all makes for interesting reading.

(As a side note: I can only imagine what discussions would ensue if similar information were published about college professors.)