Standard-setting study

A standard-setting study is an official research study conducted by an organization that sponsors tests to determine a cutscore for the test. To be legally defensible and to meet the Standards for Educational and Psychological Testing, a cutscore cannot be determined arbitrarily; it must be empirically justified. For example, the organization cannot merely decide that the cutscore will be 70% correct. Instead, a study is conducted to determine what score best differentiates the classifications of examinees, such as competent vs. incompetent.

Standard-setting studies are often performed using focus groups of 5 to 15 subject matter experts who represent the testing organization.

Types of standard-setting studies
Standard-setting studies fall into two categories: item-centered and person-centered. Examples of item-centered methods include the Angoff, Ebel, Nedelsky, and Bookmark methods, while examples of person-centered methods include the Borderline Survey and Contrasting Groups approaches. The categories are named for the focus of the analysis: in item-centered studies, the organization evaluates items with respect to a given population of persons, while in person-centered studies it evaluates persons with respect to a given set of items.

Item-centered studies
The Angoff approach is very widely used. This method requires the assembly of a group of subject matter experts, who are asked to evaluate each item and estimate the proportion of minimally competent examinees who would answer the item correctly. The ratings are averaged across raters for each item and then summed to obtain a panel-recommended raw cutscore. This cutscore represents the score that the panel estimates a minimally competent candidate would obtain.
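
The averaging and summing step described above can be illustrated with a minimal Python sketch. The function name angoff_cutscore and the ratings in the usage example are hypothetical and chosen only for illustration; the sketch assumes that every rater has rated every item.

```python
from statistics import mean


def angoff_cutscore(ratings):
    """Return a panel-recommended raw cutscore from Angoff ratings.

    ratings: one list per rater (hypothetical layout), each holding that
    rater's estimated proportion of minimally competent examinees who
    would answer each item correctly, in the same item order.
    """
    n_items = len(ratings[0])
    if any(len(rater) != n_items for rater in ratings):
        raise ValueError("every rater must rate every item")

    # Average across raters for each item, then sum over the items.
    item_means = [mean(rater[i] for rater in ratings) for i in range(n_items)]
    return sum(item_means)


# Illustrative data only: two raters, three items.
example_ratings = [
    [0.5, 0.75, 1.0],
    [0.5, 0.25, 0.5],
]
cutscore = angoff_cutscore(example_ratings)  # item means 0.5, 0.5, 0.75
```

With these illustrative ratings the item means are 0.5, 0.5, and 0.75, so the recommended raw cutscore is 1.75 out of 3 items.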

The Bookmark method is another widely used item-centered approach. Items in a test (or a subset of them) are ordered by difficulty, and each expert places a "bookmark" in the sequence at the location of the cutscore.
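
A simplified Python sketch of the ordering and bookmark placement is given below. It rests on several assumptions that are not part of the description above: item difficulty is represented by the proportion of examinees answering correctly, each bookmark is recorded as the number of items that precede it in the ordered sequence, and the panel value is taken as the median of the experts' bookmarks. Operational Bookmark studies commonly work on an item response theory scale instead, so this is only an illustration of the idea, and all names are hypothetical.

```python
from statistics import median


def order_items_by_difficulty(p_values):
    """Return item ids ordered from easiest to hardest.

    p_values: dict mapping item id to proportion correct (assumed
    difficulty measure; a higher proportion means an easier item).
    """
    return sorted(p_values, key=p_values.get, reverse=True)


def bookmark_cutscore(bookmarks):
    """Combine expert bookmarks into one panel value.

    bookmarks: for each expert, the number of items placed before the
    bookmark. The median is an assumed aggregation rule.
    """
    return median(bookmarks)


# Illustrative data only.
ordered = order_items_by_difficulty({"a": 0.9, "b": 0.4, "c": 0.7})
panel_bookmark = bookmark_cutscore([12, 14, 13, 15, 13])
```

Here the items are ordered a, c, b, and the five hypothetical experts' bookmarks yield a panel value of 13.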

Person-centered studies
Rather than evaluating the items that distinguish competent candidates, person-centered studies evaluate the examinees themselves. While this might seem more appropriate, it is often more difficult because examinees, unlike a list of items, are not a captive population.

For example, if a new test is released covering new content (as often happens with information technology tests), the test could be given to an initial sample, called a beta sample, along with a survey of professional characteristics. The testing organization could then analyze the relationship between the test scores and important characteristics, such as skills, education, and experience. The cutscore could be set as the score that best differentiates between those examinees characterized as "passing" and those characterized as "failing."
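
One way to make "best differentiates" concrete is to choose the score that misclassifies the fewest examinees. The Python sketch below assumes that each beta examinee has already been characterized as "passing" or "failing" from the survey data, and that an examinee passes when the score is at or above the cutscore. The function name, the tie-breaking rule (the lowest candidate wins), and the scores are hypothetical; other criteria could be used instead.

```python
def contrasting_groups_cutscore(pass_scores, fail_scores):
    """Return (cutscore, errors) minimizing misclassified examinees.

    pass_scores: test scores of examinees characterized as "passing".
    fail_scores: test scores of examinees characterized as "failing".
    An examinee is treated as passing the test when score >= cutscore.
    """
    candidates = sorted(set(pass_scores) | set(fail_scores))
    best_cut, best_errors = None, None
    for cut in candidates:
        # "Passing" examinees who would fail, plus "failing" ones who would pass.
        errors = sum(s < cut for s in pass_scores) + sum(s >= cut for s in fail_scores)
        if best_errors is None or errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut, best_errors


# Illustrative beta-sample scores only.
passing = [62, 70, 75, 80, 85]
failing = [50, 55, 60, 64, 66]
cut, errors = contrasting_groups_cutscore(passing, failing)
```

For these illustrative scores the function returns a cutscore of 70 with one misclassified examinee (the "passing" examinee who scored 62).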