II
QUALIFYING TESTS
7
The Forest of Pencils
Examinations, sir, are pure humbug. . . . If a man is a gentleman he knows quite enough, and if he is not a gentleman, whatever he knows is bad for him.
Lord Fermor, in Oscar Wilde's
The Picture of Dorian Gray
Qualifying tests measure aptitude and competency in a variety of abilities as part of the evaluation process for entering, continuing in, or being promoted in schools, occupations, the armed forces, and other organizations. They have a much wider range of application than authenticity tests because the latter are limited to those who are suspected of wrongdoing (sometimes, to be sure, a generalized or diffuse suspicion, as with preemployment drug or integrity tests), whereas modern, industrialized society, with its elaborate division of labor and highly specialized skills and knowledge, has generated an extensive regime of qualifying testing that touches virtually everyone. While people would prefer to avoid authenticity tests whenever possible, they willingly submit themselves to qualifying tests because these unlock the gates to rewards and success in life.
This is not to suggest that everybody likes taking qualifying tests. Some do, but many others are terrified of them. Those who consistently have done poorly on tests cringe at the prospect of yet another demonstration of their inadequacy and limited pros-
pects in life, unless repeated failures in tests and other personal evaluations have put them beyond caring. Others, who have previously done well, shudder at the prospect that this time they may fail—that this test might unmask all the previous ones as mistakes and reveal their true ability to be average or only slightly above. And, of course, there are the few who have always done well on tests and are confident that they always will. They take tests gladly as little pleasures in themselves, reinforcements of their self-image as gifted individuals who gain "A" grades as a matter of course and score above the 95th percentile across the board.
Regardless of people's feelings about them, qualifying tests are a key factor for living successfully in contemporary society. Those who reject the message of personal insufficiency reiterated by poor test performance may turn off on tests, but then the system turns off on them. They are excluded from educational opportunities and good jobs and (just as the tests predicted!) they never are able to accomplish much. The rest continue to take tests, whether they do it happily or under stress, and that has a great deal to do with the niche they find in life. Qualifying tests constitute one of the central conditions of contemporary society. Here I trace how it came to be that way.
The Chinese Civil Service Examination
The distinction of producing the world's first system of qualifying testing unquestionably belongs to imperial China, which predated the West in this area by a thousand years or more. As early as the Chou dynasty (ca. 1122–256 B.C.), some form of tests existed for identifying the talented among the common people, and during the T'ang dynasty (A.D. 618–907) these were developed into a formal system of examinations.[1] But it was from around A.D. 1000, when imperial power rose to near absolutism in the Sung dynasty (960–1279), that the civil service examination was opened to nearly everyone and became the most important avenue to position, power, and prestige in China.[2] The system took its final form during the Ming dynasty (1368–1662) and remained in force until
the first decade of the twentieth century.[3] It attracted Western attention as early as the sixteenth century, and the British civil service examinations both in India and at home were influenced by the Chinese examination system.[4]
In a history radically different from the West, the power of hereditary aristocracy in China was largely finished by the time of the Sung dynasty. Thenceforth, the class holding power, wealth, and prestige was composed mainly of administrators and bureaucrats in the emperor's civil service. Membership in this class depended more on passing the civil service examination than on parentage. Persons from certain occupations (and their immediate descendants) were excluded from the examinations: "watchmen, executioners, yamen torturers, labourers, detectives, jailors, coroners, play actors, slaves, beggars, boatpeople, scavengers, musicians and a few others."[5] Still, the great majority of the population was eligible, and the examinations effectively prevented the formation of a hereditary ruling class. For example, the lists of those who passed the highest-level examinations in 1148 and 1256 show that only 40 percent were sons, grandsons, or great-grandsons of civil servants, while the rest came from families with no history in the bureaucracy.[6] Thus, for nearly a thousand years, beneath the overall control of an emperor, China was governed by a meritocratic elite.
It was an elite with distinct privileges. Those who passed the examinations given in the prefectural capitals were designated sheng-yuan, or government students. This entitled them to wear a distinctive dress and to courteous treatment from government officials. They were exempted from government labor service and, should they run afoul of the law, from the demeaning punishment of lashing. These perquisites were not permanent, however. Sheng-yuan had to pass the examination each time it was given (every three years) to retain their status.[7] Those who went on to pass higher-level examinations became government officials and enjoyed great privilege, power, and wealth.
Because they constituted the gateways to exalted position, people reputedly spared no exertion to achieve success in examinations. According to C. T. Hu,[8] families with high aspirations for their sons would even commence their education before birth by
"requiring expectant mothers to be exposed to books and cultural objects" (a waste of time in this male-dominated system, it would seem, if the baby turned out to be a girl). Aspiring scholars brooked no distractions in their quest for knowledge. When he was young and poor, one famous eighteenth-century poet, scholar, and official "confined himself and two younger brothers in a second floor room without stairs for more than a year at a time, in order not to interrupt their studies."[9] Another is said to have carried his reading into the night by the light of fireflies he kept in a gauze bag. (Some of these stories, however, may have become embellished with retelling. Dubious about the efficacy of fireflies, for example, the K'ang Hsi emperor had his retainers collect hundreds of them and found that he could not read a single character by their light.)[10]
The prefectural examinations were the beginning of a multistage process. The number who passed varied with the region, historical period, and needs of the bureaucracy, but the range seems to have been from 1 to 10 percent. Those who passed the prefectural examination were eligible for a further preparatory examination that, if they passed (about half did), qualified them for an examination held every three years in the provincial capital. Successful candidates at the provincial level (again, 1 to 10 percent) were admitted to the metropolitan examination, also held every three years, in the national capital. Those who passed this examination were summoned to the palace for a final examination conducted by the emperor himself, on the basis of which their final rank on the list of successful candidates was determined. Those who passed all of these examinations were appointed to administrative posts or to the prestigious Hanlin Academy. (Its name is translatable as "The Forest of Pencils," or "Brushes"; its scholars compiled books and drafted decrees for the emperor.) Lower-level official appointments were often given to those who qualified for the metropolitan examination but did not pass it.[11]
Masses of candidates would present themselves for examinations—up to 10,000 for prefectural examinations and 20,000 for the provincial examinations in, for example, the southern provincial capital of Chiang-ning-fu. Examinees were crowded into
huge, walled compounds that contained thousands of tiny cells. Huddled in his cubicle for three days and two nights, under the scrutiny of guards who prowled the lanes and watched from towers, the candidate would write commentaries on the Confucian classics, compose poetry, and write essays on subjects pertaining to history, politics, and current affairs.[12] No one could enter or leave the compound during an examination, a rule so strictly enforced as to cause certain inconveniences on occasion:
If a candidate died in the middle of an examination, the officials were presented with an annoying problem. The latch bar on the Great Gate was tightly closed and sealed, and since it was absolutely never opened ahead of the schedule, the beleaguered administrators had no alternative but to wrap the body in straw matting and throw it over the wall.[13]
Numerous stories circulated of candidates going insane during the examination, or being visited by ghosts who would confuse and attack them in retaliation for evildoing, or assist them as a reward for some previous act of kindness.[14] Accounts of miraculous events in examinations dramatized the Buddhist principle of preserving all living things: an examiner finally passed a paper he had twice rejected when, each time he discarded it, three rats brought it back to his desk. It was later ascertained that the candidate's family had not kept cats for three generations. Another candidate received the highest pass after an ant whose life he had saved posed as a missing dot in one of the characters in his essay—a flaw that would have been sufficient for disqualification.[15]
Elaborate measures were taken to safeguard fairness and honesty in the examinations. Candidates were thoroughly scrutinized and searched on entering the examination compound and forbidden to leave while the test was in progress. These precautions were intended to prevent impostors from taking the test in someone's place, smuggling cribs into the examination, or consulting outside materials after the questions were known. Precautions also guarded against collusion or bias by graders. They were cloistered in the compound until all the tests had been evaluated.
Each paper was identified only by number and reproduced by professional copyists to prevent identification of the author by name or distinctive calligraphy. Each one was read independently by two evaluators, their sealed grades being opened and, if necessary, reconciled by a third.[16]
Ingenious strategies were devised to defeat the safeguards and to enlist dishonest means to enhance one's chances of passing the test. Impostors did succeed in taking the examination for their friends or clients, and some clerks and officials were not above accepting bribes. Bookstores did brisk business in tiny printed books of the classics, with characters no larger than a fly's head, that were designed to be smuggled into the examinations.[17] One form of collusion was for a candidate to arrange beforehand with a bribed or friendly grader that a certain character would appear in a specified space and line on the examination paper. This technique enabled the grader to identify his protégé's paper despite precautions of copying papers and identifying them only by number.[18] Such tactics enabled numerous unqualified candidates to pass unscathed through the "thorny gates of learning" that constituted the examination system, such as eight who passed the metropolitan examination in 1156 although they were virtually illiterate.[19]
One reason that crib books and answers smuggled into the examinations could be used with good effect is that the tests placed minimal stress on creativity. By the Ming dynasty, official dogma on the Confucian classics was fixed and allowed no room for individual interpretation. Topics were severely limited, and essays were constrained to such a rigid form that they became "no more than stylistic frippery and literary gyrations."[20] The whole system atrophied. As Miyazaki points out, "Since officials were content as long as there were no serious errors and their fairness was not challenged, and since candidates feared that they would fail if they wrote something too different from the run-of-the-mill sorts of answers, both groups stifled any tendencies toward originality."[21]
As early as the eleventh century (Sung dynasty), the problems attendant on this orientation of the system were recognized. Critics observed that the examination system awarded administrative
posts to those who demonstrated an ability to write poetry and to memorize classical texts. They questioned the relevance of these academic skills to the good character and ability to govern that ought to be requisite for civil servants.[22] The weakness of China's bureaucratic system became painfully apparent as China was increasingly exposed to foreign ideas and powers. As Otto Franke wrote in 1905, "Instead of wise and morally outstanding representatives of government authority the system supplied incompetent officials ignorant of the ways of the world; instead of an intellectual aristocracy [it supplied] a class of arrogant and narrow-minded literati."[23] Humiliated by the Boxer Rebellion and the foreign intervention it brought in 1900, the government of the Ch'ing dynasty determined that China must modernize. A new educational system was announced for the entire country in 1901, and the traditional examination system proved to be incompatible with this innovation. The final metropolitan examination was held in 1904, and the system was formally abolished by edict of the empress dowager on September 2, 1905.[24]
When compared with the history of Western institutions, the period of time over which the Chinese civil service examination system functioned is almost unbelievable. Despite its many critics and flaws, it underwent very little change for 500 years prior to its abolition, and it served for nearly 1,000 years as the major device for recruiting civil servants for China's powerful bureaucracy. Given this amazing persistence, together with the central role it played in imperial Chinese society, the Chinese civil service examination must certainly be credited as the most successful system of testing the world has ever known.
Qualifying Tests and the Development of Written Examinations in the West
In the Western world, the earliest qualifying tests were demonstrations of the mastery of skills. Medieval craft guilds regulated advancement to the status of master craftsman through juries that would judge masterpieces submitted by candidates as evidence of their workmanship. It was similar in the medieval and Renais-
sance universities, where candidates would prove their mastery of a body of knowledge by means of oral examinations, called disputations. Often these took the form of the candidate expounding on an assigned text and then defending his position against the questions and critique of faculty examiners. Compurgators (character witnesses) had their place in universities as well as in medieval law courts (see chap. 2). At Oxford, candidates had to swear that they had read certain books, and nine regent masters were required to testify to their "knowledge" of the candidate's sufficiency and an additional five to their "belief" in his sufficiency.[25]
The first written examination in Europe apparently took place at Trinity College, Cambridge, in 1702.[26] It was a test in mathematics. This seems appropriate, for while theology, morality, or metaphysics lend themselves well to examination by oral disputation, the problem-solving abilities essential to mathematics and the natural sciences are more readily demonstrated in written form. Oxford, which stressed mathematics and the sciences less than Cambridge, was somewhat slower to change. Written examinations were introduced there in 1800,[27] and by 1830, both universities were abandoning oral disputations in favor of written tests.
The earliest record of university examinations in North America is a 1646 requirement that, in oral disputation, the Harvard University degree recipient "prove he could read the Old and New Testament in Latin and 'Resolve them Logically.'"[28] Examinations were not common in colonial times, and in 1762, students at Yale University refused to submit to them other than at the time of graduation.[29] During the first half of the nineteenth century, however, examinations grew both in importance and frequency. Yale's President Woolsey instituted biennial examinations at the close of the second and fourth years of instruction. Following developments at Cambridge and Oxford, they were in written form. For its part, Harvard's first written examination, in mathematics, was given in 1833. Considerable emphasis was laid on examinations at Mount Holyoke, where, in the 1830s, the idea was current that progress through the seminary should be measured not according to the amount of time a student has spent there but according to performance on examinations. The trend continued as the century
wore on. Harvard introduced written entrance examinations in 1851, and Yale moved from biennial to annual examinations in 1865.[30]
The frequency and format of examinations in American universities were also greatly affected by important curricular developments during the nineteenth century. The pattern of a uniform, classical curriculum that all students were expected to master lent itself to periodic examinations (annual, biennial, or only at the time of graduation), identical for all students of a given level, and designed, administered, and graded by persons who were not necessarily the students' tutors. Some dissatisfaction with the classical curriculum had been expressed, but the Yale Report of 1828 defended it on the grounds that it developed the mental discipline necessary for the educated person in any walk of life and that any practical or professional training was inappropriate for a college. Although delayed by the Yale Report, elective systems in which students could select among several prescribed curricula, or could design their personalized educational program from a variety of subjects, were introduced at various points in the nineteenth century by the University of Virginia, Brown University, the University of Michigan, Harvard University, Cornell University, and Johns Hopkins University. Early on, these met with varying degrees of success, but they set the trend that after 1900 became established as the rule in American higher education.[31] This development had a significant impact on modes of assessment. Standard examinations based on a common curriculum are obviously inappropriate for students following different programs of study. Much better suited to the elective system is the now-familiar pattern of separate examinations in each course, devised and graded by the instructor.
Soon after their introduction in the colleges, written examinations diffused to the public schools. Boston's elementary and secondary schools had long followed the practice of annual oral examinations conducted by committees of visiting examiners. By the mid-nineteenth century, however, it was becoming impossible for these panels effectively to examine the large number of students involved, particularly in the more populous elementary schools. In 1845, faced with the daunting prospect of examining
over 7,000 children in nineteen schools, Boston introduced a written examination. It was probably the first large-scale written test to be used in American public schools. Its goals were well conceptualized:
It was our wish to have as fair an examination as possible; to give the same advantages to all; to prevent leading questions; to carry away, not loose notes, or vague remembrances of the examination, but positive information, in black and white; to ascertain with certainty what the scholars did not know, as well as what they did know.[32]
There were a few practical wrinkles at the beginning. The same test was given in all schools, and although it was printed, it did not occur to the committee to have it administered in all schools simultaneously. Instead they gave it in the schools one at a time, rushing as quickly as possible from one school to the next in an effort to prevent knowledge of the questions reaching some schools before the test did.[33] Such flaws were soon smoothed out, however, and by the middle of the next decade, written examinations had been adopted by public school systems in nearly all the major cities in the country.[34]
Competitive, written examinations came to exert massive influence, constituting "possibly the single most intrusive and expensive innovation in Western education in the last century."[35] From their origins in the schools, written examinations began in the nineteenth century to proliferate widely throughout the rest of society. A spirit of reform was in the air in Britain as means were sought to improve social policy in ways consistent with the demands of an industrial society and global empire. Beginning around 1850, examinations were settled on as the means to curtail the old patronage system and rationalize personnel selection and appointments in a variety of contexts. The resulting impact of examinations in British life was immense. E. E. Kellett, in his 1936 autobiography, recalls that in his youth, examinations had been "almost the be-all and end-all of school life. . . . If, in fact, I were asked what, in my opinion, was an essential article of the Victorian faith, I should say it was 'I believe in Examinations.'"[36]
The India Act of 1853 introduced competitive examinations for the Indian Civil Service (the first examinations were held in 1855), and new examinations for army commissions followed in 1857–58.[37] By 1870, most positions in the civil service were subject to competitive examination.[38] Examinations also influenced the lives of the lower classes. The Department of Science and Art, a government agency, was founded in 1853 to stimulate British industry by assisting the technical training of artisans. The department operated largely through examinations, inaugurating a test for teachers in 1859 and one for students in 1860. Reluctant to expend its funds on anything other than concrete results, the department established a scheme whereby it paid teachers one pound for each student who achieved a third-class pass, two pounds for a second-class pass, and three pounds for a first-class pass on its examinations.[39]
The flow of written tests from the universities into other sectors of society occurred on a smaller scale in the nineteenth-century United States than in Great Britain. Testing in America, as we will see, flowered in the twentieth century, by which time the technology of testing had undergone significant changes. However, some testing was introduced into the federal bureaucracy during the nineteenth century. It was largely a political matter. The spoils system had become so entrenched that, in the 1860s, the election of a new president brought about a complete change in government employees. The result, of course, was a poorly trained and inefficient civil service. The assassination of President Garfield by a disgruntled office seeker served as the catalyst for the passage of the first step toward reform: the Civil Service Act of 1883. It was a small beginning, bringing just 10 percent of government employees under a system wherein jobs were awarded on the basis of examinations and protected against changes in political administrations. Nevertheless, by 1908, the civil service system had been expanded to cover some 60 percent of the federal work force.[40]
The United States, of course, was founded partly to escape the class privileges of Europe, and the radical democratic spirit that prevailed here fostered a distrust of elites of any sort. This produced an attitude toward civil service examinations very different
from that which prevailed in Britain. Government jobs in an egalitarian society, it was held, should be such that anyone would be able to fill them. For this reason, early American civil service examinations were quite simple and stressed practical, job-related skills.[41]
Oral versus Written Examinations
Almost as soon as they were introduced, written examinations became the subject of lively controversy. Their supporters stressed virtues integral to the spirit of positivism, such as precision and efficiency. Written examinations were extolled as superior to oral examinations in objectivity, quantification, impartiality, and economy for administration to large numbers of students.[42] Reflecting on Boston's introduction of written examinations in 1845, Horace Mann claimed for them seven major advantages over the oral format: (1) the same questions being given to students from all schools, it is possible to evaluate the students and their schools impartially (and, indeed, the Boston examiners were at least as interested in using the examination results to measure how well the various schools were fulfilling their mission as they were in assessing individual students—a common use of examinations that persists today); (2) written tests are fairer to students, who have a full hour to arrange their ideas rather than being forced, when a whole class is being examined orally, to display what they know in at most two minutes of questioning; (3) for the same reason, written examinations enable students to express their learning more thoroughly in response to a wider range of questions; (4) teachers are unable to interrupt or offer suggestions to examinees; (5) there is no possibility of favoritism; (6) the development of ideas and connecting of facts invited in more extensive written answers makes it easier to evaluate how competently the children have been taught than is possible with brief, factual oral responses; and (7) "a transcript, a sort of Daguerreotype likeness, as it were, of the state and condition of the pupils' minds is taken and carried away, for general inspection," and this almost photographic im-
age, permanent because written, enables the establishment of objective standards for the accurate comparison of examinees and their schools.[43] Proponents of written tests in England also claimed a wider social advantage for written university entrance examinations: they tended to open higher education, formerly the preserve of the aristocracy, to the middle classes.[44]
An important issue in the debate over the relative value of oral and written examinations was the sort of learning they encouraged and, therefore, the sort of minds they tended to produce. Mann's second, third, and sixth arguments (above) supporting written examinations were framed in the context of the Boston public schools where, previously, large numbers of schoolchildren had been examined orally by a visiting committee in a short time. When, however, written examinations were compared with the traditional university oral disputations, in which a single candidate might be questioned by a group of examiners for an hour or more, the opportunities for open-ended development of ideas were obviously greater with the oral format. This was not necessarily a plus for oral examination in the eyes of all interested parties, however. Especially in the late eighteenth and early nineteenth centuries, the very tendency of written examinations to encourage the regurgitation of received wisdom was applauded by some (in language curiously reminiscent of some contemporary critiques of postmodernism) as
a means of diminishing controversy on subjects potentially injurious to good discipline. This had particularly important implications during the revolutionary years of the late 18th century. An Oxford don rejected the intrusion of French ideas as "that reptile philosophy which would materialise and brutalise the whole intellectual system." The solution was the written examination, with "approved" answers. Writing in 1810, Henry Drummond insisted that it was important to teach "those old and established principles that are beyond the reach of controversy," and Edward Copleston concluded simply that "the scheme of Revelation we think is closed, and we expect no new light on earth to break in upon us." And writing in the late 1870's, Henry Latham recalled that questions about ethics, important in the early 19th century, disappeared as
a Tripos [written examinations at Cambridge] subject because they left too much room for variety of opinion.[45]
Nevertheless, by the 1870s in Britain and the 1880s in the United States, critics had developed a deep suspicion of written tests and deployed precisely the same kinds of arguments Mann had advanced, but now in favor of oral examinations. The emphasis in written tests on factual knowledge and questions with preestablished answers tended to stifle imagination and creative thought.[46] They constituted "a system of straitjackets," forcing students with diverse interests and abilities to attempt to satisfy a uniform, stultifying set of expectations and evaluations.[47]
The methodologies connected with written tests were also criticized. Their apparent objectivity is a chimera, it was argued, because graders diverge widely in the marks they give.[48] Grades expressed numerically (e.g., on a scale of 100) are downright misleading, for "in the ultimate analysis he [the grader] is . . . marking by impression and later clothing his impression with a similitude of numerical accuracy."[49] (How the use of oral examinations would solve these problems is not clear.)
Questions were also raised about the possible implications of written examinations for social discrimination, although the concern was apparently more to perpetuate inequalities than to end them. It was argued, for example, that "where both men and women were examined together . . . they [the examinations] caused 'social damage' by leveling the sexes."[50] According to the eleventh (1910) edition of the Encyclopaedia Britannica , "Exams have in England mechanically cast the education of women into the same mould as that of men, without reference to the different social functions of the two sexes (the remedy is obvious)."[51] Britons were also alive to the possible effect of written examinations on the class structure, particularly the dangers of opening military and civil service positions that had been traditionally filled by gentlemen to just anyone on the sole basis of an examination. In 1854, no less exalted a personage than Queen Victoria wrote to W. E. Gladstone, an advocate of civil service examinations and then chancellor of the Exchequer, expressing concern that persons who had passed the requisite examination might still lack
the qualities of loyalty and character necessary for certain sensitive posts. In his reply, Gladstone expressed the conviction that the diligence necessary to excel on an examination was simultaneously evidence of good character.[52] In any event, he had little doubt about the capacity of the aristocracy to maintain itself in a system of competitive examinations. He wrote, in 1854,
I do not hesitate to say that one of the great recommendations of the change (to open competition) in my eyes would be its tendency to strengthen and multiply the ties between the higher classes and the possession of administrative power. . . . I have a strong impression that the aristocracy of this country are even superior in natural gifts, on the average, to the mass; but it is plain that with their acquired advantages their insensible education, irrespective of book-learning, they have an immense superiority. This applies in its degree to all those who may be called gentlemen by birth and training.[53]
Nevertheless, even with the extension of qualification by examination to nearly the entire British civil service in 1870, certain posts were exempted. For example, "only a young man whose antecedents and character were thoroughly known could be regarded as 'a fit and proper person to be entrusted with the affairs, often delicate and confidential, of the British Foreign Office.'"[54]
Finally, critics charged that as written examinations became increasingly frequent in the educational system, they tended to focus students' attention on preparation for tests rather than on the subject matter in its own right. Students are then motivated to work for good examination grades rather than for the intrinsic rewards of learning.[55] Examinations also subject teachers to similar pressures. Insofar as teachers and schools are evaluated on the basis of their students' examination results (this was the case, as mentioned above, in the Boston public school system), instructors and school administrators may well attempt to influence the process in their favor by "teaching to the test." Again, the ultimate goal shifts from acquiring knowledge and skills and nurturing the love of learning to successful performance in examinations. Doubtless, an early incentive to teach to the test was provided by the British Department of Science and Art when it
adopted the practice of paying teachers for each student who passed its examinations.
Phrenology
The written examinations discussed thus far were largely of an essay format. Although essay examinations were touted as more exact than oral tests, their results were still insufficiently quantifiable or comparable across large numbers of subjects for the examinations to be widely accepted as truly scientific instruments of measurement. Moreover, as with the oral disputations that preceded them, the written examinations that became popular during the nineteenth century were achievement tests. Their gaze was directed to the past, in the sense that they were designed and used to certify that the subject had, at some time prior to the test, mastered a certain body of knowledge or skill. How much more efficient it would be if tests could predict the future, if they could tell in advance whether subjects possessed the aptitude or talent to learn certain skills or perform satisfactorily in some job. Then people could be directed toward goals that they had a high probability of achieving, and palpable benefits would result both for individual fulfillment and the efficiency of society's utilization of its human resources. These dual objectives—to make tests more scientific and to make them future oriented—constitute the positivist program for the development of qualifying tests.
During the nineteenth century, a form of testing that claimed to satisfy both of these objectives with elegant simplicity was enthusiastically put forward under the name of phrenology. It turned out to be both false and pseudoscientific. Nevertheless, phrenology enjoyed a great deal of popularity for a time, and its claim to apply scientific methods to the problem of how to predict future performance on the basis of present information makes it an important and interesting chapter in the history of positivist testing.
The basic principles of phrenology were postulated by Franz Joseph Gall in Vienna at the beginning of the nineteenth century. His starting point was the faculty theory of mind, the notion that
the mind is made up of a series of discrete capacities or faculties. Gall identified thirty-seven of them, of which fourteen are "intellective" (including order, language, and time) and twenty-three are "affective" (destructiveness, acquisitiveness, "amativeness" or capacity for love, etc.). He claimed further that a close correlation exists between the various faculties of the mind and the surface of the brain, such that a well-developed mental faculty would be marked by a bulge at the point on the surface of the brain where that faculty is located, while a depression in the brain's surface indicates that the faculty at that cerebral address is underdeveloped. Finally, Gall assumed that the skull fits the brain like a glove, such that the contours of the exterior of the head constitute a faithful map of the shape of the brain within.
None of these propositions is true. Nonetheless, for those who can be convinced of it, the system offers a wonderfully objective and precise means of learning the specific mental qualities of any individual, living or dead. A cranial chart was developed to identify the precise location on the skull corresponding to each mental faculty. Then all that is necessary is to carefully examine the bumps and depressions on an individual's skull and compare them with the standard chart to determine the degree of development of each of the mental faculties.[56] It would be difficult to find a better example of a test, as that term has been defined in this book. The phrenological examination is an outstanding case of intentionally seeking knowledge about someone by collecting information about one thing (the shape of the skull) that is taken to represent another thing (the individual's mind or behavioral propensities).
Phrenology was introduced to the United States in 1832 by Gall's onetime collaborator, Johann Caspar Spurzheim. A great popularizer of the technique, Spurzheim gave phrenological lectures and demonstrations that were an immediate sensation in this country—although his time to give them was limited because, as it happened, he died just six weeks after his arrival. This reverse notwithstanding, phrenology caught the American popular imagination as practitioners, brandishing their motto "Know thyself," promoted themselves as vocational counselors and aids to all who would know better their own nature, the precise combination
of their mental faculties. Betrothed couples were urged to consult a phrenologist to ascertain their compatibility. In an early form of preemployment testing, some businesses even began to require phrenological examinations of their applicants. As the profession grew, a number of improvements in the technique for taking phrenological measurements were achieved. Noteworthy among them was the Lavery Electric Phrenometer, a device developed in 1907 and advertised as capable of measuring head bumps "electrically and with scientific precision."[57]
The leading American advocates of phrenology were the brothers Orson and Lorenzo Fowler and their brother-in-law, Samuel Wells. Orson Fowler was unequivocal about the one-to-one mind-brain linkages on which phrenology rested. "The brain," he pronounced, "is composed of as many distinct organs as the mind is of faculties."[58] He expanded somewhat on Gall's analysis of the mental faculties, identifying forty-three in all and classifying them in nine categories: (1) animal propensities, (2) social, (3) aspiring sentiments, (4) moral sentiments, (5) the perfecting group, (6) senses, (7) perceptives, (8) literary, and (9) reflective faculties. He further organized these into two great classes, the first five being grouped together as feelings and the last four as intellectual faculties. The organs corresponding to the feelings are located in the part of the head covered by hair, while the intellectual organs are found in the forehead.[59]
Fowler claimed to have proved the effect of these faculties 10,000 times by demonstrations on "patients already under magnetized [hypnotic] influence" whose phrenological organs were excited to exaggerated responses when touched with the finger.[60] "Examples: he [the author] never touched Devotion but the patient clasped hands, and manifested the most devout adoration of God in tone, natural language, words, and every other indication of worship. He never touched Kindness but the subject gave away all he could get to give."[61] Proofs provided by Spurzheim during his brief introduction of phrenology to Americans in 1832 were equally convincing: as he would pass a magnet from a subject's area of veneration to that of acquisitiveness, the person would immediately abandon a "worshipful air" and attempt to pick the phrenologist's pocket.[62]
The mental faculties were not to be toyed with by nonprofessionals, however, for abuse could permanently dull their acuity. Such is the sad fate of those who abandon themselves wantonly to the pleasures of sex. Orson Fowler gravely reports that "instances by the hundreds have come under the Author's professional notice, in which a few moments of passioned ecstasy have stricken down the sensory nerves; both killing itself forever after, and along with it their power to enjoy all other pleasures of life" (his emphasis).[63]
Typical head shapes vary among groups of mankind, and so, therefore, do their phrenological endowments. Generic portraits of a variety of races (and a few animals, such as the gorilla, thrown in for comparative purposes) are found in Fowler's book, with explanations of their mental capacities.[64] American Indian heads, sad to say, manifest much destructiveness, caution, and quite little in the way of intellectual faculties. Hence, Indians are "little susceptible of becoming civilized, humanized, and educated."[65] What goes for the races also goes for the sexes, and Fowler's comparison finds male heads to be colder, braver, and more reflective, while the female crania show more parental love, religion, and morality.[66] In collusion with his brother Lorenzo, Fowler found that "in females, this faculty [acquisitiveness] is generally weaker than in males, while ideal. [ideality] and approbat. [approbative] are generally much larger, which accounts for the fact, that they spend money so much more freely than men, especially, for ornamental purposes."[67]
In addition to its utility for counseling and self-knowledge, phrenology may be used to improve our understanding of the prominent men of history. "Great men have great brains," wrote Fowler,[68] and that normally means they have big heads. Throughout his book are drawings of the heads of various famous men—Napoleon, Cuvier, Rubens—together with textual explanations of how their peculiar cranial characteristics (pronounced in the drawings) account for their particular accomplishments. At the end of the book, Orson Fowler includes a picture of his own head, with the comment that "the desire to do good is its largest organ."[69] That desire appears to have known no limits, for in this massive tome, Fowler moves beyond the benefits available from
phrenology to include a logical argument for the existence of God,[70] a demonstration that we will all be distinct and recognizable individuals in the afterlife, just as we are here,[71] useful advice on how to grind flour properly,[72] and finally, in the closing chapter, "Phrenology Applied," a detailed set of instructions for "How to Make Good Rain Water Cisterns Cheap."[73]
Despite its popular appeal, phrenology was always recognized as a pseudoscience by scholars.[74] The problem, of course, is that phrenology's core assumptions about detailed correlations between faculties of the mind and the brain, as well as between the shape of the brain and the shape of the exterior of the head, are simply false. Ill-founded as it was, however, the strategy of phrenology accurately foreshadows the positivist developments in testing that were to come in the twentieth century. Elements of this strategy include the assumption that human faculties can be measured by the techniques of science and that central among those faculties is aptitude. This term applies not to what persons have achieved but to what they are likely to achieve, or are capable of achieving. The phenomena measured, the means of measuring them, and the rationale behind the measurements have changed, and today the process rests on more secure footing. Nevertheless, in its fundamental spirit of positivism, the development of testing during the twentieth century remains at one with the false start that was phrenology.
The Birth of Scientific Testing
Modern positivist qualifying testing—the program of using scientific methods to measure the differing capacities of individuals with verifiable results—began as an outgrowth of the work of Charles Darwin. Theorists such as Hobbes, Locke, Rousseau, Hegel, and Marx were of the opinion that normal human beings are fundamentally equal, with such individual differences as there were in strength or quickness of wit being of minor significance. This view was countered by Darwin's thesis in On the Origin of Species (1859) that individual differences constituted the raw material on which natural selection works. He reasoned that
traits that enable their possessors to survive longer and reproduce more will become more prevalent in the population (while traits conducive to early death and/or diminished reproduction will tend to disappear), and in that way the species evolves. Therefore, Darwin himself and like-minded thinkers formed the opinion that individual differences are of fundamental importance to the present and future state of the human species. This view evolved into the conviction that an applied psychology devoted to the scientific measurement and enlightened cultivation of individual differences could contribute to the progress of civilization as effectively as had engineering.[75] In a nutshell, this captures the vision that has guided positivist qualifying testing ever since.
Francis Galton, Darwin's cousin, was among the first to recognize that if individual differences are fundamental to our understanding of human evolution, they should be identified systematically and studied with scientific precision. To that end, Galton established the Anthropometric Laboratory in connection with the International Health Exhibition held in London in 1884. Adopting the admirable technique of having research subjects contribute to the expense of the project, he measured visitors to the exhibition in a variety of ways for a fee of four pence. Laboratory personnel entered the data in a register and informed subjects of the results, thus enabling them "to obtain timely warning of remediable faults in development, or to learn of their powers."[76] The laboratory continued its work for six years after the Exhibition closed, and in all, over 9,000 individuals were measured according to variables such as keenness of sight, strength, ability to discriminate weights, swiftness of reactions, memory of forms, and ability to discriminate colors. In addition to his pioneering work in human measurements, one of Galton's greatest contributions was to devise the notion of the standard deviation and the statistical concept of correlation, which he found to be useful for the analysis of the large amounts of data his measurements produced.[77]
One of Galton's assistants in the Anthropometric Laboratory was a young American psychologist named James McKeen Cattell. When he became professor of psychology at the University of Pennsylvania (1888–1891) and later at Columbia University, he
introduced Galton's style of testing in the United States. In an article published in the British journal Mind in 1890, Cattell proposed a test of ten measures, all of which could be readily administered in a laboratory and precisely measured. These were: (1) strongest possible squeeze with the hand; (2) quickest possible movement of the hand and arm; (3) minimum distance between two points of pressure at which they are felt as two; (4) amount of pressure at which it begins to be felt as pain; (5) least notable difference between two weights; (6) minimum reaction time to a sound; (7) minimum time between seeing and naming colors; (8) accuracy of finding the center of a 50-centimeter line; (9) accuracy of judging ten seconds of time; (10) number of random consonants that can be repeated after hearing them once.[78] Clearly Galton's and Cattell's tests were concerned more with physical properties than mental ones. But they believed that physical strength and acuity were indicative of like properties of the mind because, as Cattell put it with reference to the apparently purely physiological character of the strength of the squeeze of the hand, "it is, however, impossible to separate bodily from mental energy."[79] The underlying assumption seems to have been that perception is linked to cognition in such a way that finely developed capacities of sense discrimination (the ability to distinguish colors, weights, or sensations with precision) are signs of intellectual ability. The idea remains embedded in popular attitudes, as when intelligent people are often described as "clear eyed," and physically clumsy individuals are suspected of stupidity.
In his 1869 book, Hereditary Genius , Galton argued that human intelligence is largely a matter of inheritance and is distributed unequally among class and racial groupings. His measure was the rate at which different races produce individuals of genius. On this basis, he ranked the Anglo-Saxon inhabitants of his own Britain two grades above Negroes but two grades below the ancient Athenians, whom he took to be the most intelligent race in history.[80] These rankings imply huge racial differences. For example, Galton claimed that a level of intelligence high enough to appear in 1 in every 64 persons among Anglo-Saxons would occur in only 1 of every 4,300 Negroes but 1 in every 6 ancient Athenians.[81] So far as classes are concerned, Galton shared with many others of the
day (see the quote from Oscar Wilde's Lord Fermor that serves as epigraph to this chapter) the idea that the aristocracy enjoyed innate superiority over the lower classes. Indeed, that explains why the class differences exist. The Darwinians, interested in the future course of human evolution and hopeful that it would proceed in a beneficial direction, entertained apprehensions about the high birthrate of the lower classes. Herbert Spencer, who blended his Darwinism with a thoroughgoing laissez-faire attitude that it was best to let nature take its course without interference, took comfort in the fact that the high death rate of the lower classes served to weed unfit traits out of the population.[82] Galton, who coined the term "eugenics" in 1883,[83] favored a somewhat more aggressive policy of countering the alarming birthrate of the lower classes by encouraging intelligent people to seek each other out as mates and so improve the mental power of the species.[84] Cattell apparently believed that the best place to start building hereditary lines of intelligence is at home. An already-formed aristocracy being less in evidence in the United States than in Europe, he decided to appeal to the American sense of a good business deal and promised to give $1,000 to any of his children who married the son or daughter of a professor.[85]
Not all earlier thinkers held identical views about the transmission of intelligence. A refreshing alternative was developed by the sixteenth-century Spanish physician, Juan Huarte, whose 1575 book, Examen de ingenios para las sciencias , was translated into several languages and ultimately "Englished out of . . . [the] Italian" in 1594 as Examination of Men's Wits . Huarte thought that quick and dull wits are indeed a matter of heredity but not in the sense of genius begetting genius. Quite to the contrary, the relationship of intelligence between father and child is, in Huarte's view, inverse. Wise men often have slow children because they do not devote themselves vigorously to the task at hand while copulating, their minds being preoccupied with loftier subjects, and thus their seed is weakened. Duller men, in contrast, "apply themselves affectionately to the carnal act, and are not carried away to any other contemplation," and thus produce strong seed that results in brighter progeny.[86] I must recommend, however, the exercise of some caution in assessing Huarte's conclusions. He
seems not to have been an overly exact observer (or else he was in thrall to preconceptions in spite of empirical evidence), for he also reports that the equal ratio of males to females, previously ensured by the fact that all human offspring were brother and sister twins, had been replaced in his own day by single births that produced six or seven females for every male.[87]
Intelligence Testing
Whether clever children are born of dull fathers or (the majority opinion) of clever ones, the question persists of how this differentially distributed intelligence so cherished by Galton, Cattell, and others could be tested. This is a very different proposition from the oral or written tests in education or the civil service. The challenge before intelligence testing is to measure not so much what the subject has already learned as how much the subject is likely to learn in the future, or, better, the subject's ability to learn. Phrenological measurements of the contours of people's heads would have been a marvelous way to ascertain such aptitudes and talents, but unfortunately they did not work. Galton's and Cattell's batteries of anthropometric measurements had the merit of precision, but precisely what they told about a person's intelligence was far from clear. A major step in the history of testing was taken when the first successful test of intelligence was developed by the French psychologist, Alfred Binet.
Binet was asked by the French education ministry to develop a test to identify children with learning deficiencies who should be given the benefit of special education. His desire was to measure intelligence apart from any instruction the child may have received. He first did this in 1905 by constructing several series of tasks of increasing difficulty, the test consisting of setting a child to these tasks and noting how far in each series the child was able to go. In 1908, Binet refined the test by deciding at what age a normal child should be able to complete each task successfully. The age level of the most difficult tasks that a child performed successfully was identified as the child's "mental age." Children with mental ages well behind their chronological
ages were identified as apt subjects for special education. The relation between mental age and chronological age became IQ (intelligence quotient) when, in 1912, the German psychologist, William Stern, proposed that mental age be divided by chronological age and the result multiplied by 100 (to get rid of the decimal point). Thus, a child with a mental age of 8 and a chronological age of 6 has an IQ of 133, and a mental age of 5 and a chronological age of 7 yields an IQ of 71, while the most perfectly average persons have an IQ of 100 because their mental and chronological ages are identical.[88]
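Written as a formula, the calculation described above (using the figures from those examples) is simply

\[
\mathrm{IQ} \;=\; \frac{\text{mental age}}{\text{chronological age}} \times 100,
\qquad \text{e.g., } \frac{8}{6} \times 100 \approx 133
\quad\text{and}\quad \frac{5}{7} \times 100 \approx 71.
\]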
Binet's test was soon translated to America, where it rapidly attracted a great deal of interest. This was partly because its reporting of IQ in numerical form appealed to the positivistic assumptions of many American psychologists that intelligence testing would become truly scientific only when it yielded quantitative results. As educational psychologist E. L. Thorndike wrote,
In proportion as it becomes definite and exact, this knowledge of educational products and educational purposes must become quantitative, take the form of measurements. Education is one form of human engineering and will profit by measurements of human nature and achievement as mechanical and electrical engineering have profited by using the foot-pound, calorie, volt, and ampere.[89]
Consistent with Binet's own purposes, his test was initially used in the United States to assist in the diagnosis of mental deficiencies.[90] In 1909, Henry H. Goddard, director of research at the Training School for Feeble-minded Girls and Boys in Vineland, New Jersey, applied the test to inmates of his institution and reported that its results squared very well with staff assessments of their mental level.[91] It was, however, Lewis M. Terman who contributed most to the popularity of Binet's test in this country. In 1916, Terman, a psychologist at Stanford University, revised and expanded the test and named it the Stanford-Binet. This has been the model for virtually all American IQ tests ever since.
In addition to its utility for identifying the feeble-minded, Terman was convinced that great social benefits could be reaped
by testing the normal population as well. Enunciating the quintessential positivist creed that the application of science can contribute to the successful and efficient conduct of the affairs of society, he held that intelligence testing would facilitate the placement of people in those educational programs and vocations for which their endowments best suit them.[92] Those with IQs of 75 and below should be channeled into the ranks of unskilled labor, while those of 75 to 85 were appropriate for semiskilled labor. An IQ of at least 100 was necessary for any prestigious and/or financially rewarding profession. Proper training and placement were especially important for those with IQs of 70 to 85. Otherwise, they were likely to fail in school, drop out, "and drift easily into the ranks of the anti-social or join the army of Bolshevik discontents."[93]
Terman's enthusiasm for the efficiency that could result from placement on the basis of predictions provided by intelligence testing was shared by E. L. Thorndike, another strong defender and designer of intelligence tests. He wrote with regard to education, "It is surely unwise to give instruction to students in disregard of their capacities to profit from it, if by enough ingenuity and experimentation, we can secure tests which measure their capacities beforehand."[94] Now at last one could glimpse a possible realization of Saint-Simon's and Comte's utopian visions of the benefits to be realized by applying science to society. It was imagined that testing and judicious placement on the basis of test results could bring about a situation where everybody wins. Society would profit by making optimal use of its human resources, while individual satisfaction would be maximized as everyone finds the niche in which they can contribute most fully and successfully.
The major remaining obstacle to the fulfillment of this positivist dream had to do with the technology of testing. Intelligence tests such as the Stanford-Binet are conducted one on one by trained technicians. Obviously, this technique is too expensive and time-consuming to be used to test the masses. The crucial technological development in test administration was achieved in the context of World War I, mainly through the efforts of Harvard psychologist Robert Yerkes. Sharing the notion with other positivists that mental measurement would become truly scientific only when it became quantitative and standardized against large bodies of data, he concocted a scheme whereby psychology could contribute to the war effort while the war effort contributed to the development of psychology. He proposed that all U.S. Army recruits be given intelligence tests. The results would assist the army in making the most effective use of its manpower, and, simultaneously, psychology would generate a huge body of uniform, quantitative data on which to build its investigations into the nature of intelligence.
The army accepted the proposal, and from May to July 1917, Yerkes, Terman, Goddard, and other major figures in psychology gathered at Goddard's Training School for Feeble-minded Girls and Boys to devise a way that a limited number of psychologists could test the intelligence of massive numbers of subjects in a relatively short time. The crucial technological breakthrough was the multiple-choice question. The first multiple-choice question was devised in 1915 by Frederick J. Kelly (later dean of the School of Education at the University of Kansas) in his Kansas Silent Reading Test for elementary schoolchildren. Arthur Otis, one of Terman's students, had been exploring the potential of Kelly's innovation as a device for testing, and Terman brought the results of that research with him to Vineland.[95] There Yerkes and his group succeeded in fashioning a multiple-choice test that correlated in outcome with one-on-one administrations of the Stanford-Binet. Their test was the Army Alpha, the first written, objective intelligence test and the ancestor of all subsequent tests of that type (often called "aptitude tests") so well known now to every American. A second test, the purely pictorial Army Beta, was devised for illiterate recruits and immigrants who did not know English.
Yerkes and his colleagues certainly got the massive body of data they desired, for in the brief period between the devising of the tests in 1917 and the war's end the following year, 1.75 million men took the Army Alpha or Beta. On the basis of the test results, the psychologists made recommendations such as which recruits were intelligent enough to qualify for officer training and which ones should be assigned to special labor duty
or discharged outright on the grounds of mental incompetence. It is unclear to what extent the army actually acted on such recommendations. Nevertheless, some disturbing conclusions emerged from the army testing program. The average mental age of white Americans turned out to be 13 (barely above the level of morons). Test results revealed immigrants to be duller still (the average mental age of Russians being 11.34, Italians 11.01, and Poles 10.74), and Negroes came in last, with an average mental age of 10.41. These findings fueled debates about immigration quotas, segregation, eugenics, and miscegenation for years to come.
But the most lasting effect of the army testing program was that it revolutionized the perception and use of intelligence tests in American society. "The war changed the image of tests and of the tested. Intelligence tests were no longer things given by college professors and resident examiners like Henry H. Goddard to crazy people and imbeciles in psychopathic institutions and homes for the feeble-minded, but legitimate means of making decisions about the aptitudes and achievements of normal people."[96] With the development of written, standardized intelligence tests that could be easily administered to unlimited numbers of subjects, the dream of Terman and other American psychometricians was on its way to realization. Now it would be possible to determine everybody's intelligence and to use that information to channel people in directions where they presumably would both find personal satisfaction and make optimal contributions to society commensurate with their abilities.[97]
CEEB, ETS, and Standardized Testing in Education
The social sector where mass intelligence testing has made its greatest impact is education, particularly in the form of entrance examinations for colleges and universities. Although these tests are often called aptitude tests, the terminological distinction is of little substance. "Aptitude" is used primarily to avoid the political and social volatility of "intelligence," being less freighted with connotations of innate, immutable ability.[98]
The circumstances that eventually resulted in standardized college entrance examinations may be traced back to the immense burgeoning of American secondary education in the decades around the turn of the twentieth century. In 1870, about 80,000 students attended some 500 secondary schools, nearly all of them private. By 1910, the number of secondary school students had grown to 900,000, 90 percent of whom were in public high schools. Between 1890 and 1918, the general population of the United States grew by 68 percent, while the number of high school students over the same period increased by 711 percent.[99] This explosion of the secondary school population of course produced a like increase in higher education: the number of college students grew at a rate nearly five times greater than the general population between 1890 and 1924.[100] The old system of screening college applicants soon proved to be hopelessly inadequate in dealing with the changing circumstances.
To call the old arrangements a system is hardly appropriate, because no coordination existed among the admission procedures followed by the various colleges and universities. Many eastern schools administered written entrance examinations on their own campuses. Faculty committees from some midwestern universities would visit various high schools to evaluate them, and graduates of the schools certified by this process would then be admitted to the university. Other universities assessed applicants on the basis of the performance of previous graduates of their high schools who had attended the university.[101] In an effort to bring order to the chaos, in 1885, the principal of Phillips Andover Academy entered the plea that some organization and standardization be introduced into the preparatory curriculum for college entrance in American secondary schools. Beginning in 1892, the National Education Association formed committees to address this question. Not everyone shared the notion that college entrance requirements should be standardized. Lafayette College president Ethelbert D. Warfield did not look kindly on the prospect of being told by some board whom he should and should not admit. Raising an issue of perennial weight with academic administrators, he insisted that if he wanted to discriminate in favor of the son of a benefactor, he should be able to do so.[102]
Such dissenting voices notwithstanding, a widespread desire to bring some consistency to college entrance procedures and to open admissions to greater geographic and social diversity than was possible under the old system of requiring applicants to take entrance tests on each campus resulted in the formation of the College Entrance Examination Board (CEEB) in 1900. The CEEB was charged to design and administer standard entrance examinations that all member colleges would accept in making their admissions decisions.[103]
In the beginning, the CEEB was composed entirely of eastern colleges, thirty-five of which agreed to accept the board's tests in lieu of their own entrance examinations. The first CEEB examinations—essay tests in chemistry, English, French, German, Greek, history, Latin, mathematics, and physics—were offered during the week of June 17, 1901, at sixty-seven locations in the United States and two in Europe. Columbia University was the dominant influence on the board at its inception: the grading committee met to read the examinations at Columbia's library, and of the 973 persons who took those first examinations, 758 were seeking admission either to Columbia or its sister institution, Barnard College.[104]
The early College Board examinations were achievement tests, intended to measure how well an applicant had mastered Latin, mathematics, and the other specific subjects tested. Aptitude or intelligence testing, which is designed to ascertain an individual's general capacity to learn, was introduced to the college admissions process in 1918. Columbia University again took the lead, this time for dubious reasons pertaining to the changing demographic profile of its student body. Not only did the number of college students vastly increase in the decades around the turn of the twentieth century but immigrants and their children constituted an ever-larger proportion of them. About half of the students in New York City public schools were in this category by 1910. Those who went on to college had for the most part attended City College or New York University. A 1908 change in entrance requirements made Columbia University more accessible to public high school graduates, and during the next decade, the proportion of high school students of immigrant background jumped
dramatically. Many of these were Eastern European Jews, whom "many of Columbia's faculty and administration considered . . . [to be] socially backward, clannish, and hostile to upper-middle-class values . . . and scorned as achieving far beyond their native intelligence."[105]
In 1918, Columbia was deluged with applicants of immigrant background for its new Student Army Training Corps class, an officer training program. Given the prevalent belief that immigrants were less intelligent than older American stock, intelligence tests appeared to offer one means of winnowing these unwelcome students without establishing formal quotas. Therefore, in the first use of an intelligence test for college admission, applicants to this program were required to take the Thorndike Tests for Mental Alertness. The following year, Columbia allowed applicants with otherwise acceptable credentials to substitute the Thorndike College Entrance Intelligence Examination for the usual, achievement-type, entrance examinations. This seems to have had the desired effect, for the proportion of out-of-state entrants (most of whom were presumably of suitable social status) increased significantly. The Thorndike test proved also to be a better predictor of first-year college performance than traditional entrance examinations or one's high school record.[106]
In 1919, the CEEB expressed interest in the more general use of intelligence tests for college admissions, but it was not until 1925 that a commission was established under the direction of Princeton psychologist Carl Campbell Brigham to develop one. Brigham had been closely connected with the army testing program during World War I, and the test his commission devised was objective (multiple-choice) in format and heavily influenced by the Army Alpha. One of its major purposes was to test intellectual ability without excessive reliance on any specific subject matter. This would promote the principle of equality of opportunity, in that discrimination against students from inferior secondary schools would be minimized. The new test was dubbed the Scholastic Aptitude Test (SAT). It proved to be remarkably durable, for as late as the 1970s, the SAT was still virtually the same test that Brigham's commission had developed.[107]
The first SAT was taken in 1926 by 8,040 college applicants, but for many years, it remained less popular than the CEEB's traditional essay examinations.[108] That changed in 1942, when war intervened once again in the history of testing in America. In that year Princeton, Harvard, and Yale shifted to a wartime year-round calendar of instruction, with applicants to be informed of admission in early May and freshmen beginning classes in June or early July. CEEB essay examinations were regularly given in June. Since 1937, however, the CEEB had offered a one-day battery of tests in April. These consisted of the SAT and a series of short achievement tests. They were used in scholarship decisions and, from about 1939, by candidates who wanted to learn the fate of their applications before the traditional date in July. The 1942 decision by Harvard, Yale, and Princeton to notify applicants in May directed greatly increased attention to the April tests, so much so that the CEEB decided to cancel the already-announced June essay tests. With that development, the era of essay tests for college admissions abruptly came to an end and the present CEEB arrangement of the SAT plus short achievement tests—all multiple choice in format and to be taken in a single day—was fully established. The enterprise assumed its current form in 1947, when the Educational Testing Service was established as a nonprofit, nonstock corporation that took over most of the work of designing and administering tests from the CEEB.[109]
Testing and Learning
A theoretical issue that has remained at the heart of the testing enterprise since its inception has to do with the effect of tests on how and what people learn. By examining applicants on material that is not directly relevant to the positions to which they aspire, qualifying tests have been accused of selecting professionals or functionaries who are ill-suited for the jobs they are supposed to perform. Tests are also criticized for promoting the development of only part of the mind's capacity: encouraging, for example, memorizing disconnected facts and the ability to regurgitate received wisdom at the expense of critical and creative thinking. It
is claimed that people—both test givers and test takers—become so engrossed in tests that they become ends in themselves rather than means to the larger end of acquiring knowledge and skill.
These criticisms are perennial. As we have seen, many of them were directed against the Chinese Civil Service Examination nearly a thousand years ago, and they were raised again by both sides in the nineteenth-century debate over the relative merits of oral and written examinations. In our own time, the contended ground has shifted to essay versus multiple-choice test formats, but the basic questions have remained the same.[110] As a result of this debate, the American College Testing Assessment (the ACT, a standardized college entrance examination taken by about one million high school seniors annually) was redesigned in 1989 to place less emphasis on factual material and more on scientific concepts and abstract reading skills. This is the first major revision of the ACT since its inception in 1959. Moreover, the SAT—the nation's other major standardized college entrance exam, taken by some 1.2 million high school seniors each year—will be revamped as of 1994. As with the ACT, changes in the SAT are designed to stress abstract thinking skills. The new SAT will include some math questions with open-ended answers rather than the traditional multiple-choice format and an optional essay question. Serious consideration was given to making the essay mandatory, but the proposal was not adopted because of its potential adverse impact on foreign-born candidates.[111]
A critic who framed the issues of the effect of testing on learning with uncommon clarity was the Reverend Mark Pattison, Rector of Lincoln College, Oxford, from 1861 to 1884. Pattison was an active supporter of the liberal reform movement, which during these years transformed Oxford from a university keyed to preindustrial British society to its modern form. One of the central elements of the reform was selection by merit, as merit was revealed by competitive examinations.[112] Examinations represented a dilemma for Pattison, and the weight of his allegiance shifted from one horn to the other during the course of his career. As a younger don in the 1840s and 1850s, Pattison was favorably impressed by examinations, most notably, their capacity to stimulate otherwise complacent students to harder work. So in 1855
he reckoned that the introduction of written examinations in 1800
had been the root of the vast improvement in undergraduate study which occurred during the first half of the century; similarly, the opening of scholarships to unrestricted competition [by examination] had not only given "a most powerful impulse" to the schools, but, within the university, had "awakened dormant energies, and stirred the stagnant waters."[113]
Pattison had always recognized that examinations also had severe drawbacks, and by the 1860s and 1870s, the balance of his opinions was swinging against them. The essence of the matter is simple. Examinations are supposed to be a means to an end, that end being learning. But when prizes, scholarships, and prestige are allocated on the basis of performance in examinations, they become transformed from means to ends in themselves. This is destructive to disinterested learning—the greatest good that Pattison knew—because the imperative to excel in examinations dictates precisely what will be studied and prevents students from exploring those areas that have stimulated their curiosity but will not be covered by the tests. Pattison learned this last lesson only too well because, as an Oxford undergraduate himself, he followed his muse into a wider variety of subjects than was prescribed for the degree, with the result that his own performance on examinations was mediocre. Thus a man who became one of the most energetic and effective teachers and scholars in the Oxford of his epoch was for years burdened with a sense of inferiority before those who surpassed him in the examinations. One of the remedies he suggested for these ills was that provision be made for a large proportion of Oxford students to be non-degree-seeking individuals who would come to the university for the pure love of learning and would sit for no examinations at all.[114]
The drive to excel on tests, and thus to get good grades, continues to detract from learning today. Students' intellectual curiosity is eclipsed as they develop proficiency in calculating what material is likely to be covered on tests and studying that exclusively, with little effort to understand or appreciate the subject
matter in its own right. Especially when the tests are objective in format, grades depend on students' ability to identify what they have been taught in the precise form that it was presented to them. There is little room or reason to assess material critically, creatively, or synthetically. The all-important measure of success in college is the grade point average.[115] An alarming number of students seem to think that getting good grades is important enough to justify any means to that end. Studies have indicated, for example, that 50 percent or more of college students attempt to improve their grades by cheating.[116]
Probably the most repugnant aspect of the competitive system of testing and grading is that it requires that some fail, if only to identify, by contrast, those who succeed. "Failure is structured into the American system of public education," writes William Bailey. "Losers are essential to the success of winners."[117] The effects of this are psychologically devastating. Donald Holt carried out an experiment in a college course in education by giving failing grades to all students for the first several tests. The reactions were disbelief, anxiety, anger, and, finally, outright rebellion. Although he probably earned the undying enmity of several of the students, at least a few of them eventually got the point of the experiment. "I learned a lesson I will never forget," wrote one, "how it feels to fail."[118]
The stress and threat to self-esteem posed by failure, or the prospect of it, stalk the educational system from start to finish. At the highest level, graduate students endure Ph.D. preliminary examinations that are "extraordinarily stressful and frequently humiliating." "Your whole adequacy depends on them," one student reported. Said another, "It's one of the greatest challenges I have ever been faced with. I will really feel left out if I don't make it. . . . It's the most important thing I've ever come up against. It's either attaining what I've been going for, or not getting there."[119] Failure is no less destructive to young children just entering the educational system who, if they struggle in the classroom, perceive the contrast in teacher attitudes and behavior toward themselves and other children and who lack the power to express their resentment in rebellion. Such children feel guilty and look on themselves as stupid.[120] Eventually their poor performance accumulates in a self-fulfilling prophecy as, over time and with the continued reinforcement of more low grades, the students turn off to education and their learning capacities are curtailed.[121] The impact of this on the future course of their lives—in terms of occupation, income, and social standing—is incalculable.
In concert with Pattison's proposal of nearly a century before, Howard Becker, Blanche Geer, and Everett Hughes suggested on the basis of a 1959–1961 study of university students that the student preoccupation with grades could be refocused on subject matters and the intrinsic rewards of learning if universities would abolish or drastically deemphasize the grading system.[122] Interestingly, about a decade later, a movement along those lines actually caught hold. The University of California at Santa Cruz and Washington's Evergreen State College set aside traditional letter grades in favor of written evaluations of each student's performance by the professor (together, in some cases, with the student's assessment of what he or she gained from the course). Although other universities did not go so far, a widely accepted aim to facilitate learning for its own sake led them to limit grading in many courses to a simple pass or fail, to relax requirements of all types, and to allow students wider latitude in designing their own curricula. But the results of the experiment were disappointing. With a few (far too few) notable exceptions, students lacked the motivation and maturity to enhance their educational experience under a regime of liberalized and minimized requirements. They learned less, not more. Innovations such as one- or two-week study periods between the end of classes and the beginning of final examinations, intended to allow students to explore more deeply the issues touched on in their courses that stimulated their intellectual curiosity, produced more beer parties and ski trips than hours in the library. Recognizing that things were not working out as planned, during the 1980s, American universities moved to restore the requirements that had been relaxed ten to fifteen years previously.
Possibly the experiment was doomed to failure because it took place in the universities. Students came to college after twelve years of primary and secondary education oriented toward grades, and, as Richard Curwin succinctly observed, "Grades
motivate students not to learn, but to get good grades."[123] With this sort of background, they may simply not have had the requisite attitudes toward learning, nor have known how to go about it, when they were turned loose on the university campus with the admonition to study for the love of knowledge in its own right. Whatever the reason, after the experiment, we find ourselves very much where we were before it: still caught on the horns of Pattison's dilemma between the capacity of tests and the grading system to stimulate students to work, and their deadening impact on learning.
Further critique of the general practice of grading in the educational system, or of the teacher-devised classroom tests that are central determinants of course grades, is beyond the scope of this study. But standardized intelligence tests are also an important part of the competitive system of success and failure, and they are a matter of central interest here. One of the major items on the agenda of the remaining three chapters is to extend the investigation of when and how intelligence tests are used, the damage they do, and what might be done to control them.
8
Willing, Ready, and Able: Vocational Testing
A round man cannot be expected to fit a square hole right away. He must have time to modify his shape.
Mark Twain,
Following the Equator
The positivist dream of using scientific qualifying tests to identify people's potentials has also taken root in vocational placement. Here the aim is to achieve an optimal match between two sorts of differences: differences in what is required by various jobs in a specialized division of labor and differences in the talents, interests, and other characteristics of the candidates for those jobs. Here I trace how vocational psychology has grown from the desire to intervene in this matching process systematically, even scientifically, and how it has used testing to foster the most effective utilization of human talents and resources in the workplace.
The field may be divided into two orientations according to the primary objective of the intervention. One approach is concerned with selecting the most suitable candidates for jobs or with improving the performance of current occupants of a job. Here the
paramount interest in view is that of an organization, and every effort is made to staff it with personnel who can satisfy its needs most efficiently and successfully. This sort of intervention is primarily administered by employing organizations, and the professional field associated with it is usually called organizational psychology. The other approach is designed primarily to help individuals chart life courses that will maximize their personal fulfillment. These interventions tend to take place in schools and counseling centers; the relevant professional field is counseling or guidance.
The idea that it would be a good thing to intervene in people's vocational choices, both for their own good and for that of society, is anything but new. The point could hardly be made more forcefully than it was in the sixteenth century by Huarte. He began the dedication of his Examination of Men's Wits to the king of Spain with the following recommendation:
To the end that Artificers may attaine the perfection requisit for the vse of the commonwealth, me-thinketh (Catholike roiall Maiestie) a law should be enacted, that no carpenter should exercercise [sic ] himselfe in any work which appertained to the occupation of an husbandman, nor a tailor to that of an architect, and that the Aduocat should not minister Phisicke, nor the Phisition play the Aduocat, but ecah [sic ] one exercise only that art to which he beareth a naturall inclination and let passe the residue. For considering how base and narrowly bounded a mans wit is for one thing and no more, I haue alwaies held it for a matter certaine, That no man can be perfectly seene in two arts, without failing in one of them: now to the end he may not erre in chusing that which fitteth best with his owne nature, there should be deputed in the commonwealth, men of great wisedome and knowledge, who might discouer each ones wit in his tender age, and cause him perforce to studie that science which is agreeable for him, not permitting him to make his owne choice.[1]
Huarte's recommendation to coerce people into the occupations for which they are best suited never, of course, materialized. Instead, for most of history, the most common—and commonsense—means of achieving a good match between human resources and jobs has been by trial and error. From an organization's point of
view, of the numerous employees who are hired, those who perform well in the tasks assigned to them are retained and rewarded. From the individual's perspective, one moves from job to job until a satisfying position is found. This method has the advantage of a certain directness, in that the capacity of a person to do a particular job and the satisfaction of the job for the individual are determined from the person's actual experience in the job. However, a good deal of time and productivity are wasted in the "error" side of the trial and error process: those cases where a worker proves not to be cut out for the job, or finds it to be unsatisfying. Moreover, decisions are made by supervisors, and to the considerable extent that different supervisors would evaluate an employee differently, they are subjective and inexact. How much more efficient it would be if one could determine, with precision and in advance, the optimal match between people and jobs! Huarte had proposed sending "men of great wisedome and knowledge" into the population to decide who should do what. Today vocational psychologists fill precisely that role. But they have the advantage of testing—an objective means of measuring human capacities and dispositions—as the keystone of their craft.[2]
The Birth of Vocational Psychology
An early proposal for a special profession devoted to vocational guidance, to be called "vocophy," was advanced in 1881 by Lysander Salmon Richards. Equally skeptical as Huarte about people's ability to select appropriate vocations when left to their own devices, Richards confidently predicted the advent of professional "vocophers" who, "after having studied and gained a thorough knowledge of their profession, should and will in due time, be located in every town and city of importance throughout the civilized world."[3] To be trained as fully and compensated as handsomely as lawyers, vocophers would identify clients' vocational talents by means of a thorough examination of their physiology, phrenology, physiognomy, and mental, moral, and speculative philosophy.[4] Richards's vision was before its time, however, and modern vocational psychology did not begin for
another quarter century. In 1908, Frank Parsons opened the Vocational Bureau of Boston and dedicated its work to the principles that have remained at the fulcrum of the guidance-oriented sector of the field ever since: to assist the client to get a clear understanding of the self, to gain knowledge of various vocations, and to match the two. These objectives, and Parsons's methods of pursuing them, were much the same as had been proposed by Richards. Parsons used the tests of sensory acuity, manual dexterity, and memory that were then available, and, for a passing nod to phrenology, he would observe the development of clients' heads before, above, and behind the ears.[5]
If Parsons was a pioneer in the guidance-oriented aspect of vocational psychology, an early contribution to organizational psychology's placement goal of locating the best person for the job is represented by The Job, the Man, the Boss. In this book, first published in 1914, Katherine M. H. Blackford and Arthur Newcomb proposed that companies replace haphazard hiring practices with centralized "employment departments" in which trained experts would use scientific methods to select the best-qualified applicants for jobs throughout the company. An intervention of this sort, they argued, would enhance productivity and efficiency, establish uniform policies and standards throughout the organization, and perfect management's control over the firm and its employees.[6] A novel idea at the time, its success may easily be gauged by the ubiquity of personnel (or "human resources") departments in modern corporations.
Any scientific matching of employees to job requirements demands, of course, some means of measuring personal qualities. "After a great deal of study and experimentation," Blackford and Newcomb settled on a method that allowed the assessment of people's "physical, mental, and psychical aptitudes and character" simply by looking at them.[7] The technique required the evaluation of nine variables, including the person's coloring, facial form as seen in profile, degree of fineness or coarseness of hair, skin, nails, hands, feet, general body build, and facial expression. As an example of the procedure, they specify what to look for in a successful salesman. The entire list fills more than a page;[8] it includes (quoting freely) medium color, convex profile, height of
5'6" to 5'10" and weight of 140 to 160 pounds, medium to medium coarse textured hair, elastic flexibility with a medium rigid thumb, medium conical or square hands, fingers that are medium short with broad, smooth, long, pink, clean, fine-textured nails, a clean mouth, eyes that are clear and bright but not too brilliant, no nervous twitchings or unpleasant mannerisms, and cheerfulness as shown by upward curves of the mouth.
Blackford and Newcomb readily admit that numerous exceptions to the values of mind and character that they associate with various physical features will readily spring to mind among the personal acquaintances of any reader. They assure, however, that "exceptions are always merely apparent—never real."[9] The explanation for these apparent exceptions is that in so brief an account, they cannot delve into all of the intricacies of the system, and, in any event, no one could be expected on a single reading to acquire the expertise necessary to apply it. In particular, while it is not difficult to grasp the import of a single variable (a concave facial profile, for example, denotes mildness, absent-mindedness, and "a slow, easy, reliable digestion")[10] to understand how the nine variables conspire together to reflect the complete man is a task best left to trained experts.[11] The scientific skill and precision of these is nothing short of marvelous. Blackford and Newcomb relate how, in one employment department, an assistant walked quickly through one hundred or more applicants to select employees for eighteen jobs, including a boring mill hand, drill press hand, and lathe hand. He sent the men he had selected into the office, "where they were met by the other assistant, who had a duplicate list. In every case the assistant in the office knew for which position each man had been chosen by his team-mate."[12]
Vocational psychology has remained true to Blackford and Newcomb's vision of using scientific methods to match personal characteristics with job requirements. The means of scientific assessment, however, have changed a great deal over the decades. Today, batteries of tests are conjoined with counseling to explore three basic questions about the individual in relation to vocations. One of these concerns the individual's interest in various occupations. The second explores vocational maturity—the extent to which the individual is ready to make vocational decisions
and commitments. The third has to do with the person's abilities to do certain sorts of work.
Vocational Interest Testing
If we recollect the distinction made between placement and guidance, it is clear that interest testing is concerned mainly with guidance. Its primary purpose is to help individuals to identify vocations that would be interesting and satisfying to them. Several interest tests are in common use, two of the most popular being the Strong Interest Inventory and the Kuder Occupational Interest Survey. These operate on the assumption that people in a given occupation tend to have a definable pattern of preferences for a variety of activities. These include hobbies and other leisure-time interests as well as activities directly connected with the occupation. Vocational interest tests identify subjects' interests and preferences, in order that they may be matched with standard profiles that have been developed for a variety of occupations by tabulating the preferences expressed by hundreds of people successfully employed in them.[13]
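In outline, the matching step is a comparison of profiles. The sketch below is purely illustrative: the interest scales and occupational norms are invented, not the actual scoring keys of the Strong or Kuder instruments. A client's pattern of preferences is correlated with the average pattern reported by people settled in each occupation (Python 3.10's statistics.correlation supplies the Pearson coefficient), and the closest matches are flagged for consideration.

    # Illustrative only: invented scales and norms, not the real Strong or Kuder keys.
    from statistics import correlation

    occupation_norms = {
        "accountant": [2, 9, 3, 8, 4],   # scores on five hypothetical interest scales
        "journalist": [8, 3, 9, 4, 7],
        "forester":   [4, 5, 2, 6, 9],
    }

    def best_matches(client_profile, norms, top=2):
        # Rank occupations by how closely the client's profile tracks each norm profile.
        scored = [(correlation(client_profile, profile), job) for job, profile in norms.items()]
        return sorted(scored, reverse=True)[:top]

    client = [7, 4, 8, 5, 6]
    for score, job in best_matches(client, occupation_norms):
        print(f"{job}: {score:.2f}")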
The Strong Interest Inventory and the Self-Directed Search, another popular vocationally oriented test, make use of a general theory developed by John L. Holland of the relation between personality types and occupations. This theory distinguishes six basic personality types and articulates the connections between them in terms of a hexagonal structure, as follows:[14]

[Figure: Holland's hexagonal arrangement of the six personality types (Realistic, Investigative, Artistic, Social, Enterprising, and Conventional), with adjacent types the most closely related.]
These categories are related, such that adjacent categories (e.g., social and enterprising, or social and artistic) are highly consistent, those at one remove (e.g., enterprising and artistic) are less so, and those occupying opposite positions (e.g., enterprising and investigative) are inconsistent, even contradictory. Each individual's personality may be characterized in terms of one or a combination of these categories. Each occupation is also classified in terms of the same categories. In the guidance process, the makeup of the client's personality is identified on the basis of the Self-Directed Search or the Strong Interest Inventory.[15] The client is then apprised of the various occupations that are congenial to his or her personality type and counseled to give them special consideration in career selection. The classification of occupations is done according to the personality types of people who successfully practice them. Therefore, Holland's theory operates according to the same premises as interest tests such as the Strong Interest Inventory or the Kuder Occupational Interest Survey. All of them seek to match personal traits of clients with personal traits of successful employees in various occupations. The main contribution that Holland's theory adds to this approach is to isolate the six main personality types and specify the degrees of consistency among them according to the hexagon.
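The hexagonal logic lends itself to a simple distance rule. The sketch below is a hypothetical illustration rather than any published scoring procedure; it assumes the conventional ordering of the six types around the figure and treats consistency as a function of how many steps apart two types sit.

    # Hypothetical sketch; assumes the conventional ordering of Holland's six types.
    TYPES = ["Realistic", "Investigative", "Artistic", "Social", "Enterprising", "Conventional"]

    def hexagon_distance(a, b):
        # Steps between two types around the six-sided figure (0 to 3).
        i, j = TYPES.index(a), TYPES.index(b)
        d = abs(i - j)
        return min(d, 6 - d)

    def consistency(a, b):
        # Map distance to the three levels described in the text.
        return {0: "same type", 1: "high", 2: "moderate", 3: "low"}[hexagon_distance(a, b)]

    print(consistency("Social", "Enterprising"))         # high: adjacent on the hexagon
    print(consistency("Enterprising", "Artistic"))       # moderate: one remove apart
    print(consistency("Enterprising", "Investigative"))  # low: opposite positions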
Holland states that vocational counseling is least necessary for people whose personalities are clearly classifiable in one or a few adjacent types on the hexagon, with few or no inconsistencies, and where vocational aspirations are highly correlated with the personality type. Those at the opposite extreme require a great deal of counseling, and perhaps psychotherapy, in order to rectify misperceptions of self and the world.[16]
I do not intend to mount a major critique of vocational interest testing. We do need to recognize that it has the potential to do damage to individuals who rely exclusively on its results. Consider, for example, the sad fate of the boy who would be a mortician. This boy was a good student, near the top of his class in all his subjects. His troubles began during the ninth grade, when his parents began to pressure him to select a vocation. Uncertain in his own mind about his chosen career, he asked the school counselor to administer an interest inventory. While the resulting
profile was not well defined, it did show him to be in the top 10 percent of the scale for mortician. The boy determined that an undertaker is what he was cut out to be, and he undertook to prepare himself for that calling. Against the advice of the counselor, he rearranged all of his intended high school classes, being sure to take those that constituted the best preparatory track for morticians' school. Once in it, however, he found his new curriculum to be uninteresting. Moreover, it separated him from his friends, and he eventually became alienated from them. He became dissatisfied with school in general, and ultimately he dropped out of high school in his senior year.[17] Miscalculations as blatant as this seem avoidable enough, however, and this boy would have saved himself a great deal of grief if he had heeded the advice of the school counselor. And, on the other side of the coin, vocational interest tests do help many people to get their own interests into clearer focus and to become better informed about the array of occupations available to them and the qualities of various jobs that may or may not be appealing.
One curious—and certainly unintended—consequence of vocational interest testing that should be mentioned is its potentially deleterious effect on social diversity. We live in a society that claims to value diversity. Universities, government agencies, and corporations extol the benefits to be derived from their diverse personnel, while television ads routinely pay obeisance to diversity by ensuring that every group they depict has the appropriate gender and ethnic mix. At first glance, vocational interest testing appears to support the social agenda of promoting diversity. The goal of such testing, after all, is to identify and help apply the client's uniqueness: the particular constellation of abilities and interests that define each one as a distinctive individual. But on further reflection, it becomes clear that the tests lead to the opposite result.
Perhaps the first to recognize this was William H. Whyte, although his reference was to personality tests rather than vocational interest tests per se. In his influential book, The Organization Man , Whyte argued that, far from promoting diversity and individuality, the tests widely used for placement purposes following World War II produced conformity among executive
employees by consistently rewarding three qualities: extroversion, disinterest in the arts, and cheerful acceptance of the status quo.[18] Whyte vehemently disputed the moral right of organizations to pry into people's psyches. While acknowledging that the threat of not being considered for a job or promotion might make it impossible for an individual to refuse to take a personality test, he reminded, "He can cheat. He must. Let him respect himself."[19]
The tendency to produce conformity out of diversity is clearer still in contemporary vocational interest testing. Consider again how these tests work. The pattern of interests that a client expresses in a diverse array of subjects and activities is compared with the profiles of interests of persons successfully employed in a wide range of occupations. Clients are encouraged to consider those occupations where their patterns of interests match the interests of those already employed in them. Quite obviously, this process works to diminish diversity. Each individual's distinctiveness is identified to place people in precisely those situations where they are not distinctive. It follows that to the degree that clients rely on vocational interest tests and the counseling associated with them to select occupations, uniformity among the persons employed in a given occupation increases. This can produce a certain monotony. In the language of information theory, the greater the uniformity in any system, the greater the redundancy, or predictability. Activity in systems characterized by high redundancy tends to be routine, low in interest and creativity. In contrast, systems marked by diversity contain a good deal of information, or unpredictability. That is an important source of interest and imagination.[20]
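The information-theoretic point can be made concrete with a small illustrative calculation of my own, not drawn from the sources cited here: redundancy can be expressed as one minus the Shannon entropy of the mix of interest profiles in an occupation, relative to the maximum entropy possible.

    import math

    def entropy(probs):
        # Shannon entropy, in bits, of a probability distribution.
        return -sum(p * math.log2(p) for p in probs if p > 0)

    def redundancy(probs):
        # 1 - H/H_max: near 0 for an evenly mixed group, rising toward 1 as members converge.
        return 1 - entropy(probs) / math.log2(len(probs))

    diverse_mix = [0.25, 0.25, 0.25, 0.25]      # four profile types equally represented
    homogeneous_mix = [0.85, 0.05, 0.05, 0.05]  # nearly everyone shares one profile

    print(round(redundancy(diverse_mix), 2))      # 0.0: high information, low predictability
    print(round(redundancy(homogeneous_mix), 2))  # 0.58: high redundancy, high predictability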
Most organizations that value diversity do so for the creative cross-fertilization that results from the interaction of people of different interests and backgrounds. This would still emerge from teamwork among people in different specialties—say, among designers, engineers, floor workers, and marketing specialists in a manufacturing enterprise. But when interaction is primarily between people in the same specialty, the intraoccupational uniformity fostered by vocational interest testing may in the long run produce jobs that are less interesting to their occupants in an
overall occupational system that becomes sluggish and unproductive of creative innovation.
Vocational Maturity
Beginning with the founding work of Parsons and extending at least through the 1950s, the dominant theoretical orientation in vocational counseling was the "trait and factor" approach. This approach is governed by the assumptions that individuals have certain traits—abilities, interests, personality characteristics—and that various occupations require particular constellations of such traits in those who would pursue them successfully. The counseling process amounts to using tests of various sorts and other techniques to ascertain the client's traits and recommending consideration of occupations for which those particular traits are most appropriate.[21] The trait and factor approach is alive and well in counseling today; for instance, Holland's influential theory of personalities, vocations, and their convergence is rooted in this perspective.[22] Nevertheless, the trait and factor approach has come under fire for relying too heavily on tests and for simply accepting clients' personality traits as givens rather than investigating the psychological and sociological conditions that produce them.
One important reaction to perceived limitations of the trait and factor approach has been increased attention to when and how those personality traits relevant to vocational choices develop. Vocational counselors would identify the unfortunate lad described above, who dropped out of school after a frustrating attempt to pursue a career goal of mortician, as one who had not achieved vocational maturity at the time he took the interest inventory in the ninth grade. He was still in an exploratory phase. Preferences expressed at that time are likely to be unstable, so it is unwise (as it obviously was in this boy's case) to base important decisions and plans on them.[23] Most interest inventories are designed for people of high school age and older, and it is recommended that they not be taken by younger individuals because their interests have not yet stabilized.[24]
Stimulated largely by the pioneering work of Eli Ginzberg and associates[25] and Donald Super,[26] interest in vocational maturity was born in the 1950s and continues to be an important issue in counseling psychology. In a moment of linguistic inspiration, Edwin Herr and Stanley Cramer coined "vocationalization" as the process by which people come to internalize "the values, knowledge, and skills which led [sic? lead?] to effective vocational behavior."[27] The individual who has vocationalized properly comes to define the self to a substantial degree in vocational terms ("I am a musician," or accountant, or bricklayer, etc.). Such people have gained a good understanding of their own abilities and interests and on that basis have selected "occupational careers." This term refers to a vocation that provides the opportunity throughout an entire work life for steady rises to increasing levels of responsibility, prestige, and/or compensation.[28] For those who have vocationalized well, the occupational career is an important part of the meaning they find in life. Among the many occupational careers, a few examples are the academic or military professions (with their well-defined ranks), skilled crafts, management careers in business, and the Catholic priesthood.
For practitioners interested in vocational maturity, one of the major concerns of guidance is to act as midwife to the process of vocationalization. To this end, researchers in counseling have devised tests to measure the level of vocational maturity, for use primarily with adolescents and high school students. Instruments such as the Career Maturity Inventory, the Cognitive Vocational Maturity Test, and the Career Development Inventory measure attitudes toward occupational planning and choice, knowledge of what occupations are available and how to get information about them, and development of decision-making skills.[29]
In a socioeconomic system as complex as our own, vocationalization is a lengthy process that requires extensive shaping of the human raw material to develop in people the specific skills necessary for various occupations and, more profoundly, to dispose them psychologically to include vocation as an important part of their definition of self. In Super's theory, for example, vocational maturity is a process consisting of five stages and
lasting nearly a lifetime. Adolescence is the time for crystallization of vocational preferences; identifying a specific career direction and taking initial steps to implement it occurs around ages 18 to 21; completing necessary training and entering the relevant occupation occurs around ages 21 to 24; during the stabilization phase, between ages 25 and 35, the individual settles down in the chosen career; the latter 30s to the mid-40s mark the consolidation phase, when attention turns to developing seniority and security in one's vocation.[30]
If we look at it from the perspective of the socioeconomic system, it is clear that vocationalization is a highly desirable process because it produces marvelously efficient and devoted workers. The full significance of this becomes apparent if we compare vocationalization with Max Weber's thesis in his famous essay, The Protestant Ethic and the Spirit of Capitalism .[31] There Weber argued that the industriousness characteristic of capitalists had its origin in the notion that success in one's worldly calling was evidence to Protestants that one was among the Elect, destined for salvation. Hence people strove mightily to succeed so as to prove to themselves that they were doing God's work and to relieve anxiety about the fate of their immortal souls. Given its otherworldly orientation, however, this mind-set could scarcely allow the successful ones to use the considerable wealth they achieved for temporal pleasures. They continued to live frugally and valued self-denial. Even after the religious underpinnings of these attitudes and behaviors passed from the scene, people retained the habits of industriousness and asceticism, resulting in capitalists who work not to enjoy the fruits of their labor but as an end in itself.
I suggest that as a technique for inveigling people to devote their energy and lives to the growth and efficiency of the socioeconomic system, today's vocationalization is well evolved beyond the situation Weber described. Far from being a hollow shell left over from former eschatological anxiety, work today has been positively redefined as a source of satisfaction, happiness, and meaning in life. For the person who has truly vocationalized, one's sense of honor, self-worth, and identity is closely tied to career. Marry this degree of commitment with a program of extensive
training and placement on the basis of the individual's particular abilities and interests, and the result is a corps of workers who serve the system with boundless energy, consummate skill, and unstinting conscientiousness.
At first blush, there seems to be nothing wrong with this, for everybody wins. The socioeconomic system benefits from the attentiveness of its workers, while the workers simultaneously secure material well-being and find meaning in life in the context of an occupational career. Closer scrutiny reveals, however, that such human rewards are by no means invariably forthcoming. Vocationalization theory stresses those factors necessary for success that are dependent on the individual: maturity, ability, motivation, and an appetite for hard work. The theory is silent about external considerations that may impede the determined efforts of even those with high ability and motivation to achieve vocational success. But in reality, factors external to individuals and beyond their control frequently frustrate their career aspirations. In particular, two factors hold the ideal of a satisfying occupational career beyond the reach of many people. The more ancient of the two is denial of equal opportunity because of discrimination. More recent is "corporate restructuring" and the fundamental shift it represents in the organization of employment away from the notion of vocational careers.
Discrimination
The salient issues are embedded in the well-known caricature: when the chairman of the board at General Motors retires, everybody moves up a notch, and they hire an office boy. While the image this evokes plainly depicts an occupational career as a coherent pattern of progress from an entry-level position to retirement through grades of increasing responsibility, respect, and compensation, it reveals two biased assumptions inherent in the whole concept of vocationalization. One is that the process of moving up the career ladder is modeled on the ideal experience of white-collar employees. Workers on the assembly line are not involved. The other is that they hire an office boy, not an office girl. In other words, while vocationalization and an occupational career are held up as ideals for everyone, they are modeled on the stereotyped experience of white, middle-class males.
This bias is built into the concept of vocational maturity at its base.[32] Probably the first explicitly articulated theory of vocational maturation was advanced by Ginzberg and his associates in 1951. The empirical study on which the theory was erected pertained to Anglo-Saxon male adolescents of rather high socio-economic standing and IQs of 120 or higher.[33] The tests used to measure vocational maturity are similarly skewed. The item construction, selection and validation of two important vocational maturity tests—the Career Maturity Inventory and the Career Development Inventory—was done on the basis of work with middle class subjects.[34] Therefore, the "vocational maturity" that such tests measure may actually be how closely subjects approximate middle-class attitudes toward the world of work.[35] Again, vocational interest tests such as the Strong Interest Inventory and the Kuder Occupational Interest Survey match interests expressed by subjects with those of samples of persons successfully employed in various occupations. Because white middle-class males predominate in the more prestigious and highly compensated occupations, their interests serve as the norm with which subjects' interests are compared. To the extent that interests are conditioned by socioeconomic class, ethnicity, and gender, those of white middle-class male subjects will match better with the norming groups in these higher-level occupations, and therefore test results will show them to be more interested in such occupations than subjects from other social categories.[36]
The tilt toward white middle-class males is apparent in the facts regarding who actually experiences the coherent developmental pattern of an occupational career. Women and minorities have tended disproportionately to hold poorly compensated and less honored jobs. A 1959 study examined what had become of over 1,500 people in their mid-forties who were identified as gifted children (IQ of 135 or higher) in 1921–22. The great majority of the men had achieved prominence in professional and managerial positions, while about 50 percent of the women were full-time housewives.
Of those [women] who were working full time, 21% were teachers in elementary or secondary school, 8% were social workers, 20% were secretaries, and 8% were either librarians or nurses. Only 7% of those working were academicians, 5% were physicians, lawyers, or psychologists, 8% were executives, and 9% were writers, artists, or musicians.[37]
Women's position in the work force has improved since then—but not dramatically. In 1979, women employed full time earned 63 percent of what men earned, while the comparable figure for 1988 was 70 percent. Between 1983 and 1988, the number of women in managerial and professional specialties rose from 41 to 45 percent, but they made no gains in salary, earning 70 percent of male salaries in both years.[38] These figures bear out the widespread perception that women's inroads into business and the professions have been largely limited to the lower and middle levels and that a "glass ceiling" continues to bar their ascent to the highest positions.
As for minorities, in the late 1970s, black college graduates were earning about as much as white high school graduates.[39] Since then, the economic condition of black families has actually worsened slightly, for earnings of black families dropped from 72 to 71 percent of those of white families between 1979 and 1988.[40] In 1988, 27 percent of the whites in the work force were employed in managerial and professional positions, while 15 percent of the blacks and 13 percent of the Hispanics held jobs in these categories. Conversely, 23 percent of the blacks and 24 percent of the Hispanics were employed as operators, fabricators, and laborers, while 15 percent of the whites held jobs of these sorts.[41] In sum, minorities are much likelier than whites to hold jobs that are poorly compensated and do not lend themselves to the sorts of challenges, responsibilities, and opportunities for creative growth associated with the notion of an occupational career. "For routine unskilled or semiskilled occupations," writes W. L. Slocum, "the distinctions between occupational steps may be so small and worker turnover so great that it would be difficult to consider that a career line exists at all."[42]
A danger of vocational maturity theory and the guidance practices derived from it is that they encourage everyone to voca-
tionalize, to place career near the center of their sense of self and the meaning they seek in life. Minorities and the poor who buy into this strategy are likelier than others to be disappointed in their efforts to achieve a successful vocation. Given the emphasis guidance places on internal factors such as ability and motivation, they may well blame the failure on themselves, with resulting damage to their self-esteem. The tragedy is that the responsibility often lies not so much with them as with a system that discriminates against them and denies them equal opportunity. To preserve their psychological well-being, minorities and others who are subjected to discrimination are sometimes driven to a stance totally contrary to that promoted by vocational maturity theory. Far from finding personal fulfillment in work, persons in these circumstances are alienated from their work. They must find ways of convincing themselves that what they do for most of their waking hours has little or nothing to do with what they are as persons.[43]
Charles Ford and Doris Jeffries Ford suggest that this situation calls for a special counseling strategy.[44] They hold that many black workers, recognizing that they face greater obstacles in achieving a successful and satisfying occupational career than do their white counterparts, are not necessarily out to make career an integral part of their self-definition as vocational maturity theory expects. Their lateral movement from one job to another is therefore not "floundering," as it would appear from the perspective of such theories, but "calculated job speculation" designed exclusively to improve their economic position.[45] Clients with this objective may be better served by helping them to develop the most effective strategy for getting the most lucrative and secure job available than by following the traditional counseling approach of seeking to place them in a vocation in which they might rise over the long term from an entry-level position to the top.[46] Although this may well be an appropriate course for some clients in today's social conditions, it must be acknowledged that implementing it could in certain circumstances be extremely problematic. Imagine the charges of racism that would be forthcoming, for example, if a white counselor encouraged a middle-class white youth to go for his dream of becoming a lawyer
while (quite honestly and realistically) informing a lower-class black youth with the same goal that the cards are stacked against success and suggesting that she might want to learn how to become proficient at calculated job speculation.
Even if the utopian day should arrive when there is no more discrimination and equal opportunity is a reality for everyone, many people would still not achieve a satisfying occupational career. That ideal has always been beyond the reach of many, including many white middle-class males. In the mid-1960s, most men did not envision work in terms of a long-term career but made occupational choices according to short-term considerations. Some 70 percent of lower-middle-class men spent less than half their work lives in positions that manifested any sort of orderly career progression.[47] Twenty years later it became apparent that even for the upper middle class, "career development as a process of implementing one's self-concept is a fast eroding dream for many Americans, and not just racial minorities."[48] One thinks, for example, of the executives who must make midlife career changes on losing their jobs in corporate reorganizations, mergers, or takeovers and of the crowd of would-be academics who graduated with Ph.D.'s in the 1970s and 1980s to find no tenure-track faculty jobs awaiting them and who migrated for years from one temporary position to another until many of them left the profession entirely. This brings us to the second external factor that frustrates the ambitions of many people to achieve a satisfying occupational career.
The Demise of Fordism
In about 1973, a fundamental change in the American structure of employment occurred, signaling the end of the "Fordism" that had dominated the scene for the preceding sixty years. With its symbolic beginning in Henry Ford's $5 day (see chap. 4), the Fordist system rested on the proposition that employees be paid sufficiently high wages to enable them to be prime consumers of the ever-increasing quantity of goods produced by industrial capitalism.[49] At its height, this system produced a particular organization of employment.
From World War II through the early 1970s, a growing proportion of American companies organized the division of work and the management of employees within their firms around the key institution of a full-time work force and an "internal labor market." Ordered hierarchies, promotion from within rather than from outside the company whenever possible, the erection of promotion ladders with relatively explicit rules and flexible procedures by which workers would be judged worthy of upgrading—these were the dominant characteristics of this form of corporate bureaucracy.[50]
Note that precisely these are the conditions that foster the vocational experience we have been discussing under the name "occupational career." I suspect, in fact, that the emphasis in vocational guidance on the process of vocationalization and its culmination in an occupational career is an artifact of the Fordist organization of employment. Of course, these conditions were not realized in all companies or for all employees. That is why occupational careers were not available to many workers even at the peak of Fordism. Nevertheless, they did obtain in the largest private and public organizations, and the vocational ideal for everyone was formulated in their terms.
Since about 1973, the Fordist system has been transformed. International markets and competition, fluctuating currency exchange rates, technological innovations, new financial arrangements, and speed of communication favor those companies that can adjust rapidly to changing conditions. In response, many companies are "restructuring" both their blue-collar and white-collar work forces. The core of permanent, full-time employees is reduced ("downsized") and surrounded by a periphery consisting of part-time employees, temporary workers, and subcontractors. This represents considerable savings in labor costs, for peripheral workers are paid lower wages than those in the permanent core and the company does not provide them with health insurance, retirement, and other fringe benefits. Moreover, the company gains flexibility because it can enlarge or diminish the size of its labor force far faster than was possible under the Fordist system by simply adding to or cutting back on its number of subcontractors and part-time and temporary workers.[51]
Under Fordism there was some truth in the adage, "What's good for General Motors is good for the country." But the new system is anything but good for employees, because low-paying part-time or temporary jobs without fringe benefits are replacing Fordist jobs that held out the possibility of increasing responsibility and remuneration in a secure and satisfying occupational career. The transformation has serious implications for the profession of vocational counseling. To the extent that counselors continue to encourage their clients to vocationalize—to prepare themselves for an occupational career and to make it an important element in their self-image—they may be orienting them toward a world that, for many, no longer exists. Today, to make career an integral part of one's concept of self is less advisable than it was even two or three decades ago. In the present circumstances, an employment strategy exclusively focused on economic goals such as calculated job speculation may be relevant to more than just those who suffer from racial or gender discrimination.
Ability Testing
When purchasing steel or any other material used in the manufacturing process, observed H. C. Link in 1919, specifications are set and the material is tested to ensure that it meets them. The human material required by any enterprise ought to be selected with no less care, he argued. Applicants and employees should be extensively tested to ensure that they satisfy the particular specifications set for them.[52] Business has espoused this point of view and regularly tests the abilities of workers before they are hired or promoted. When the jobs in question call for physical qualities such as strength, manual dexterity, or hand-foot coordination, the task of stipulating job specifications and designing tests to measure how well people meet them seems relatively straightforward. Working in the spirit of Frederick W. Taylor's ideas about scientific management, "methods engineers," as they have been called, have devised a multitude of tests that ingeniously measure motor coordination and perception in tasks similar to the operations an employee would be expected to perform on the job.[53]
However, these procedures rest on the assumption that a close correlation exists between an individual's ability to perform certain tasks and the efficiency with which that individual will actually perform them on a day-to-day basis. That assumption was called into question by the well-known study of Western Electric's Hawthorne works in the 1920s and 1930s. One of the conclusions of that project was that standards of productivity are set by group consensus and tend to be well below what individuals could attain if they worked up to the level of their abilities. To the extent that these findings are correct, they raise serious questions about the value of ability tests for predicting the performance of workers.[54]
More vexed still is the relation between ability tests and the performance of people in management or the professions. The talents required here are so elusive that it is difficult to design tests to mirror them. Link's advocacy of testing workers to ascertain their qualifications met a blank wall when it came to managerial positions. Executives themselves do not know what personal qualities contribute to their success, so specifications for these jobs have never been clearly defined; in that circumstance, psychologists have little hope of designing tests that can determine to what degree various candidates meet them.[55]
Others have rushed in where Link feared to tread, but professional success has continued to defy explanation in spite of a plethora of theories advanced to account for it. In the nineteenth and early twentieth centuries, the secret was thought to lie in general character traits: proponents of the self-made man extolled thrift and industriousness, phrenologists attributed success in business to well-developed organs of acquisitiveness, and Blackford and Newcomb, in addition to enunciating more specific physical characteristics requisite to particular occupations, stipulated that any worthy employee should possess health, intelligence, honesty, and industry.[56]
Health and honesty continue to be held in high regard, and hiring is frequently contingent on the results of a physical examination and a lie detector test or (in the aftermath of the 1988 legislation controlling mechanical lie detector tests) an integrity test. Aside from gross defects in health or integrity, however, the general trait that has been considered throughout the present
century to be most pertinent to occupational success—particularly for professional and managerial positions—is intelligence. Doubtless this is partly due to the notion that intelligence helps people to learn quickly and with retention, to grasp the nuances of situations, and to deal flexibly and imaginatively with new problems—all capacities that would contribute to success as a business executive or professional. Although the drawback has been noted that intelligent people are likely to become bored with routine tasks, a certain degree of intelligence is also useful in less exalted positions because the clever worker can learn a job more rapidly and understand its place in the overall enterprise better than a dull one. The possibility must seriously be entertained, however, that another important reason for the widespread perception of intelligence as important to vocational success is that with the development of intelligence tests, means have been available to measure intelligence with greater apparent precision than other general character traits. Moreover, the practice of reporting on intelligence in quantitative terms, such as IQ scores, lends an aura of scientific objectivity to the use of intelligence as a criterion in personnel selection.
It was, in fact, with the development of the first standardized intelligence tests around the time of World War I that psychologists such as W. D. Scott and Hugo Munsterberg devised and promoted intelligence tests for employee selection and placement.[57] By about 1925, however, the keen interest that businesses had originally shown in intelligence testing abated, largely because they did not find the test results useful in revealing whether candidates had the requisite skills for specific jobs. In the conditions of an economic downturn, intelligence testing programs did not generate enough useful information to justify their expense.[58]
The use of intelligence testing for placement purposes has gone through at least two more cycles since the 1920s. With the outbreak of World War II, the armed forces again came to the fore in the history of testing as aptitude tests were used to classify recruits in an elaborate system of categories. Unlike the original Army Alpha and Army Beta used in World War I, which were general intelligence tests, the World War II batteries included aptitude tests for specialized, technical jobs as well as general
intelligence tests.[59] By the war's end, some nine million recruits—five times the number who had taken the army tests during World War I—had taken the Army General Classification Test and had been sorted by it into five categories of learning ability, ranging from very rapid to very slow.[60] Personality tests were also used during the war for purposes of identifying personnel with submissive or pliable personalities, as well as those who might be troublemakers or be of liberal or radical political persuasions.[61]
During the period immediately following World War II, the U.S. economy experienced a dramatic increase in the demand for managerial and professional personnel. As had happened after World War I, a number of firms turned to testing programs modeled on the military's testing of officer candidates as a means of selecting the most qualified personnel. By 1950, for example, Sears, Roebuck and Co. was testing the intelligence, personality, and vocational interests of some 10,000 employees within the company as part of the decision-making process for placement and promotions.[62] In a further recapitulation of developments in the aftermath of World War I, however, the popularity of testing tapered off as the years went by.
One reason was that testing in the workplace was sharply criticized in works such as Whyte's The Organization Man[63] as an unwarranted and insidious intrusion into personal privacy. Another was that intelligence testing came under suspicion in the late 1960s and 1970s as a result of the civil rights movement. The notion became widespread that intelligence tests are discriminatory because members of minorities and disadvantaged groups tend to score lower on them than middle-class whites. The bell curve of general intelligence test scores for blacks, for example, is about one standard deviation below that for whites.[64] Court decisions pertaining to Title VII of the Civil Rights Act of 1964 reinforced the perception of intelligence testing as discriminatory. Most important was the 1971 Supreme Court decision in Griggs v. Duke Power Co., which shifted the notion of employment discrimination from disparate treatment to disparate impact.[65] That is to say, whereas discrimination had previously been defined as differential treatment of individuals on the basis of race,
sex, religion, and so on, the tendency after Griggs was to view a statistical difference in the outcome of employment selection procedures for members of different groups as evidence of discrimination. For example, if whites tend to be hired disproportionately more than blacks because on average the former score higher on intelligence tests, then by the disparate impact standard that is a case of discrimination. This decision led employers to suspend measures such as standardized intelligence tests, because score differences between ethnic groups invited litigation or adverse actions by regulatory agencies such as the Equal Employment Opportunity Commission.[66]
In a related development, affirmative action procedures designed specifically to encourage the employment of minorities were introduced. Recognizing that minorities tend on average to score lower than whites on standardized tests, the Equal Employment Opportunity Commission moved to prevent the discrepancy from having a discriminatory effect against minorities by disallowing tests in the employment selection process if they do not recommend members of minority groups for employment at a rate proportional to their representation in the population at large.[67] In response, in the early 1980s, the U.S. Employment Service and many state employment agencies adopted a procedure known as "within-group score conversion" or "race norming." Percentile scores on the Employment Service's General Aptitude Test Battery are calculated by ranking black applicants only against other blacks and Hispanic applicants only against other Hispanics; whites and Asians are lumped together in a third, "other" category. Since blacks and Hispanics in general tend to score considerably lower on the test than whites and Asians, this means that if a black, a Hispanic, and a white applicant all make the same raw score and that score is sufficient to place the white applicant in the top 40 percent of the "other" category of whites and Asians, the Hispanic would be in the top 25 percent and the black applicant in the top 20 percent of their respective groups. This practice was intended to ensure that no group is disproportionately represented in referrals made to employers because of group differences in test scores. However, since usually only the percentile score is reported to them, employers (who are
often unaware of the practice) may be misled by within-group score conversion. Returning to our hypothetical three applicants, an employer might easily get the impression that the black did best on the test, the Hispanic next, and the white worst, when in fact raw scores were identical for all three.[68]
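The mechanics of within-group score conversion can be seen in a short sketch. The following Python fragment is purely illustrative: the norming samples, raw scores, and function names are invented for the example (the actual General Aptitude Test Battery conversion tables are not reproduced here), but it shows the kind of within-group percentile ranking just described, in which identical raw scores yield different reported percentiles depending on the group against which the applicant is ranked.

```python
# Illustrative sketch only: these norming samples and scores are hypothetical,
# not the actual GATB tables, but the ranking logic is the within-group
# percentile calculation described in the text.
from bisect import bisect_left

# Hypothetical sorted raw-score samples standing in for each norming group.
NORM_SAMPLES = {
    "black":    [200, 220, 240, 255, 265, 275, 285, 295, 310, 330],
    "hispanic": [210, 230, 250, 265, 278, 288, 298, 308, 325, 345],
    "other":    [230, 250, 265, 278, 288, 298, 305, 320, 340, 360],  # whites and Asians combined
}

def within_group_percentile(raw_score, group):
    """Percentile of raw_score computed against its own norming group only."""
    sample = NORM_SAMPLES[group]
    below = bisect_left(sample, raw_score)   # members of the group scoring below raw_score
    return 100.0 * below / len(sample)

# Three applicants with the identical raw score of 300 receive different
# percentile reports, and only the percentile is normally sent to employers.
for group in ("black", "hispanic", "other"):
    print(f"{group:8s} raw=300  percentile={within_group_percentile(300, group):.0f}")
```

Run on these hypothetical samples, the same raw score of 300 comes back as roughly the 80th percentile for the black applicant, the 70th for the Hispanic applicant, and the 60th for the "other" category, and the percentile is all an employer receiving the standard report would see.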
Given that the social, legal, and political climate of the time was decidedly critical of testing, John Crites could write in 1983 that although testing had a long and productive history in vocational psychology, current research in the field was not focused in that direction.[69] As if to verify Crites's generalization, an essay from the same year devoted to a review of current theoretical issues in vocational psychology made almost no mention of testing.[70] What little theoretical attention testing received during this period tended to be of a critical nature, as exemplified by Lee Cronbach's assessment of the Armed Services Vocational Aptitude Battery (ASVAB). Its primary purpose is to enable the armed services to identify potential recruits, but many school districts use it for the more general purpose of assisting high school seniors to identify vocations commensurate with their abilities. A unique feature of the ASVAB is that the military administers the test, scores it, and reports the results free of charge. This incentive has so much appeal to budget-conscious school districts that the ASVAB is one of the most prevalent tests in America, taken by over one million high school seniors annually. It would seem that everyone benefits. The schools can provide vocational testing at no charge, and the armed forces have access to information about large numbers of young people among whom they can identify potential recruits. The only losers may be the students who take the test. Cronbach argued that the test's reporting techniques and the research into its reliability and validity were inadequate. Although the ASVAB may be of some help when used in conjunction with other tests and interpreted by a trained counselor, a student who makes important vocational decisions alone or in consultation with a military recruiter, solely on the basis of ASVAB results, may be seriously misled. This is especially true for females, because the test is oriented toward typically male interests and activities.[71] Cronbach's critique led to a revision of the ASVAB, although its utility as a vocational guide is still
questionable because it is oriented toward general rather than specific abilities.[72]
Now the worm is turning again. The shift to the political right in America in the 1980s has muffled civil rights concerns about ethnic and class bias in testing and reinvigorated interest in testing—at least, intelligence testing—for vocational purposes. The possibility that different racial groups really do differ in intelligence is being raised again as people notice that Great Society programs seem to have done little to reduce the discrepancies in intelligence test scores between groups. One response was to maintain that intelligence is not so important to many jobs after all and to relax employment requirements. But, according to Linda Gottfredson, this represented wishful thinking and resulted in diminished performance and productivity.[73] Now many vocational psychologists are resurrecting the notion that intelligence is a significant factor in job performance and argue that testing is a valuable tool for successful vocational placement. Two special issues of the Journal of Vocational Behavior have been devoted to intelligence testing in vocational psychology (one in 1986 and the other in 1988), and nearly all the contributors adopt a positive stance toward testing. For example, John Hawk of the U.S. Employment Service holds that general intelligence is so crucial to performance in all human activities that tests of specific aptitudes are largely redundant. In his view, the most effective vocational counseling amounts to advising individuals about their probabilities of success at different vocational levels on the basis of general intelligence tests.[74] Today some employers require that college graduates applying for jobs submit SAT scores as part of their application materials. It may seem curious to demand that people who are on the verge of graduating from college (or have already graduated) submit the results of a test they took in high school, a test used primarily to predict how well they would do in college (and especially in the first year of college). However, employers who require it may recognize that the SAT is basically an intelligence test, and they may use it as a convenient way to gain information about the general intelligence of their applicants without the trouble and expense of giving them a new test.[75]
The contemporary move toward intelligence testing in the workplace is part of a general effort to curtail affirmative action programs, which in some quarters have even been branded as detrimental to the very groups they were designed to assist. It has been suggested, for example, that blacks are well aware that they may be hired because of preferential treatment rather than individual merit, and that this undermines their confidence and self-esteem.[76] Arguments such as these brought within-group score conversion on the General Aptitude Test Battery (whereby, as discussed above, scores for blacks and Hispanics are adjusted upward relative to scores for whites and Asians) under fire in 1991 as a quota measure. It is probably no accident that the matter was brought to the public's attention at a particularly sensitive time, just when Congress was trying to craft a civil rights bill that could not be construed as demanding racial quotas in hiring and promotions. No one seems prepared to defend within-group score conversion, which appears to be a blatant case of preferential treatment for minorities, so federal and state employment agencies are fast discontinuing the practice.[77]
Parallel developments are occurring on the legal front. Authors such as Clint Bolick[78] and James Sharf[79] argue that the doctrine of disparate impact stemming from the Griggs decision is counter to the spirit of Title VII of the Civil Rights Act of 1964, which they claim is oriented toward protecting the rights of individuals rather than groups. This is not a unanimous opinion, for Richard Seymour claims that Congress did intend to control disparate impact as well as disparate treatment in Title VII.[80] In any event, Sharf perceives in the 1988 decision on Watson v. Fort Worth Bank & Trust an indication that the present Supreme Court may be willing to reconsider Griggs and is adopting a more tolerant attitude toward the use of standardized testing in employment decisions.[81]
The Supreme Court's 1988 Watson plurality decision is likely to be viewed in years to come as the turning point in the Griggs disparate impact definition of discrimination. . . . Refocusing on objective employment standards will likely have the salutary effect of returning individual merit to its rightful place as the touchstone of
opportunity. Objective standards are coming back into focus in both education and employment, personnel measurement is "in," and the rising of competence cannot be far behind.[82]
Vocational placement is the second context in which we have encountered intelligence testing in the course of this book; the other is the educational system, as described in chapter 7. As our review of recent history indicates, intelligence testing has provoked an immense amount of controversy, and most of the points of contention are equally relevant to its role in vocational and educational placement. It will therefore be convenient to consider the social effects of intelligence testing in these two areas together. The issues are sufficiently numerous and complex that chapter 9 is devoted entirely to them.
9
"Artificial" Intelligence
The decisive moment was at hand when the hopes reposed in Sapo were to be fulfilled, or dashed to the ground. . . . Mrs. Saposcat, whose piety grew warm in times of crisis, prayed for his success. . . . Oh God grant he pass, grant he pass, grant he scrape through!
Samuel Beckett,
Malone Dies
This chapter is a critique of intelligence testing; more, it is a critique of the conventional concept of intelligence. After some preliminary remarks about the relation between intelligence testing and the principle of equality of opportunity, an effort will be made to define "intelligence" as it is conceptualized by the general public. I identify a number of unfortunate consequences of the conventional concept of intelligence and suggest that it came into being largely as a result of intelligence tests. "Intelligence" is not some preexisting quality waiting to be measured by intelligence tests but is rather defined and fabricated by them. Hence the title of the chapter: the intelligence of the human mind is no less "artificial" than the intelligence associated with computers.
Intelligence, Equality, and Inequality
In 1928, the state of Wisconsin established the Committee on Cooperation to explore ways of predicting who, among the rapidly growing high school population, could be expected to succeed in college. The intent was to lessen "the serious tragedy of student mortality" by discouraging those who are doomed to fail in college from matriculating in the first place. Positioning itself to grapple with its charge, the committee adopted the following as its guiding philosophy:
That educational opportunity shall be viewed as a broad highway extending from kindergarten to college graduation and that every child has a right to travel this highway as far as his native interest, capacity and endowment will permit. If this philosophy is sound then the committee felt that it must think of its problem, not in terms of that democratic principle which insists upon the political equality of human beings but in terms of the principle of biological inequality of human beings. While the committee subscribed completely to the principle of equality of opportunity it recognized just as completely that equality of opportunity means, not identity of opportunity but diversity of opportunity.[1]
With these words, the Wisconsin committee enunciated the tacit assumption that underlies essentially all intelligence testing. That assumption marries two propositions, one having to do with inequality and the other with equality.
The first proposition holds that human beings are created unequal. In classical antiquity, Plato distinguished among men of gold, silver, and brass: the first category contains the leaders and decision makers of society, the second their administrators and henchmen, and the third the farmers and ordinary workers. The assumption of human inequality is still nearly universally affirmed, although the metaphor is no longer metallurgical and the tripartite division has been replaced with a smoother slope of finely graded distinctions. The second proposition is of more recent vintage. As articulated in eighteenth-century documents such as the Declaration of Independence, it holds that all men
(today, human beings) are created equal. This in no way contradicts the first proposition, for the one acknowledges that individuals differ in their interests, talents, and motivations, while the other stipulates that all have (or should have) identical civil rights and equal opportunity. The two principles are actually complementary. The principle of equal opportunity makes it possible for the unequal talents and qualities of different individuals to be recognized and appreciated. "The argument for democracy," Thorndike wrote, "is not that it gives power to all men without distinction, but that it gives greater freedom for ability and character to attain power."[2]
The United States was the first Western nation to wed the assumptions of equal opportunity and unequal endowments. It was argued, particularly before social reforms moved Europe in the same direction, that equal opportunity and the social mobility that inevitably stems from it assured America's success in competition with the class-ridden countries of the Old World. Even with roughly equal populations, the United States would have a larger pool from which to draw human talent because important positions were available to everyone rather than reserved for members of the upper social stratum.
There was an explicit rejection of the classic conservative assumption that virtue could be concentrated in an elite social class and transmitted by blood over generations. The mobility ideology rested on the equalitarian premise that talent was distributed at random throughout the population. This made the repeated running of the race conducive to progress; the open society could tap the energies of all its members by allowing all to compete freely.[3]
A Positivist Meritocracy
It is difficult to imagine a more efficient, rational social order than one in which all persons are placed in the positions for which their particular talents are most suited. Such a social order is often called meritocracy. The concept becomes all the more compelling when positivism serves as its midwife, when the powerful
techniques of science are enlisted to achieve the goal of the optimal utilization of human resources. The positivist meritocracy shines as a utopian state in which want, waste, and conflict will be replaced by prosperity, efficiency, personal satisfaction, and social tranquility. People looked especially to psychology as the science that would lead the way to the meritocratic utopia, and American psychologists of the era during and after World War I eagerly anticipated the "prediction and control," "human engineering," and "social efficiency" that would be realized from applications of their science.[4] These attitudes persist in some sectors of psychology today, particularly in behaviorism and in psychometrics, the branch of psychology concerned with measurement and testing.
The essence of positivist meritocracy is the scientific placement of persons according to ability: political heads would be those with the greatest leadership and decision-making abilities, scholars would manifest the most highly developed cognitive skills, captains of industry would have uncommon ability in matters pertaining to economics and management, and artists and artisans would be talented in manipulating materials of various kinds, while unskilled laborers would be those who lack sufficient abilities to suit them for anything else. A meritocracy grounded in equal opportunity requires that the race be run repeatedly in order that the unequal talents of people from all social strata might be identified and put to optimal use. What form should the race take? According to a blue-ribbon committee convened in the late 1970s by the National Research Council,
One tool that appeared particularly promising in organizing society—and one that was promoted particularly aggressively by its practitioners—was the new technique of educational and mental testing, which emerged at the turn of the century. Enthusiasts claimed that testing could bring order and efficiency to schools, to industry, and to society as a whole by providing the raw data on individual abilities necessary to the efficient marshaling of human talents.[5]
Testing was considered to be a no-lose instrument of meritocracy, because the public benefits to be realized from the most
efficacious placement of people would be matched by the advantages to be reaped by each individual, whose peculiar interests and capacities would be identified by testing and then nurtured and developed by education. So even John Dewey, a strong proponent of education for the development of the individual and no great friend of testing, pointed early to its potential service to both the individual and society in "My Pedagogic Creed" of 1897: "Examinations are of use only so far as they test the child's fitness for social life and reveal the place in which he can be of most service and where he can receive the most help."[6]
The way testing is used, however, is strangely at odds with certain core features of modern society. If there is anything that characterizes the modern socioeconomic system, it is complexity. The jobs that need to be done to keep the multifaceted system going require a multitude of different skills. The abilities with which people are assumed to be unequally endowed are great in number and diversity. If, then, a meritocratic selection technique is to achieve the optimal placement of persons in social positions, certainly it should measure their capacities along many different dimensions. Curiously, one ability soon came to stand above the others as most likely to ensure success in any and all endeavors. This is mental ability, or general intelligence.[7] I suggest that an important reason for the ascendancy of intelligence as the premier candidate for meritocratic selection is that it became technologically practicable to test intelligence on a massive scale. The first mass test was an intelligence test. Thanks to the technological breakthrough of the multiple-choice question, the Army Alpha of 1917 became an instrument that could evaluate millions of individuals cheaply and quickly. Additional refinements such as machine grading further perfected the efficiency of the Army Alpha's descendants to the point that, particularly in the last half century, standardized intelligence tests have been applied repeatedly to virtually everyone in our society.
Of course, the perpetuation and expansion of the huge and lucrative enterprise of intelligence testing required it to be commonly accepted that what intelligence tests measure is something of importance. Among many others, psychologists from H. H. Goddard, Lewis Terman, and E. L. Thorndike at the dawn
of the era of mass testing to Arthur Jensen, Richard Herrnstein, Linda Gottfredson, and John Hawk of the present day have vigorously promoted the notion that level of intelligence is a crucial variable for success in virtually every undertaking.
What Is Intelligence?
Despite the immense importance claimed for it, precisely what intelligence is has been the subject of a great deal of uncertainty and debate. This may relate to a certain indifference to theory that has characterized testing. The intellectual center of testing lies in the branch of psychology known as psychometrics, a term that means simply the measurement of mental phenomena. Several psychologists have remarked on a gap that has opened up between psychometrics and another major branch of the discipline, cognitive psychology. Cognitivists, who trace their pedigree from Wilhelm Wundt, have been interested in determining the basic processes of mind and behavior in general. Differences between individuals are of marginal interest in this research program. Psychometricians, however, descend intellectually from biologists—especially Darwin—via Galton. Their attention has been directed precisely to individual differences, originally in an effort to trace the operation of natural selection in the evolutionary development of our own species.[8] But this theoretical orientation waned as psychometricians increasingly concentrated their attention on the practical applications of testing. A number of psychologists have criticized this development, claiming that psychometricians have accommodated themselves to demands from the public for simple solutions to complex problems by developing tests that claim to predict who is likely to succeed in various educational programs, military assignments, jobs, and so on. The result has been a profusion of tests with little substantial grounding in psychological theory.[9] The situation has become serious enough that Oscar Buros, who founded the Mental Measurements Yearbook as a means of reviewing and providing some quality control for the multitude of mental tests now available, stated in both the 1972 and 1978 editions that "at least half
of the tests currently on the market should never have been published."[10]
In 1923, Harvard psychologist Edwin Boring set out to cut the Gordian knot over what intelligence really is by calling it simply the human capacity that is measured by intelligence tests. This is termed an "operational" definition—the practice of defining something in terms of the procedures used to measure it. "Intelligence," as Boring put it, "is what the tests test."[11] This definition has been reiterated several times since.[12]
At first blush, the concept of intelligence held by the general public seems quite different from the operational definition. The popular view, seldom precisely articulated, focuses on general mental ability; perhaps it is best stated as the ability to learn. This is fleshed out by associating the general ability called "intelligence" with three attributes: (1) it is a single thing; (2) it comes in varying quantities, and different people have different amounts of it; and (3) the amount of intelligence possessed by each individual is fixed for life. Although the popular or conventional concept of intelligence does not have the classic form of an operational definition ("intelligence is what intelligence tests test"), I argue that it nevertheless is operational in essence because the attributes commonly associated with intelligence stem from testing practices. First, the idea that intelligence is a single thing is rooted in the fact that the results of intelligence tests are often expressed on a single scale, such as IQ, even when the test itself consists of several distinct parts. Where there is a single score, it is widely assumed that some single thing must exist to which that score refers. The second attribute—that intelligence is quantitative, and that some people have more of it than others—derives from the practice of reporting intelligence test scores on numerical scales. Only quantitative phenomena may be expressed in numbers. And when those numbers vary from one person to another, so must the amount of intelligence that the numbers represent. Finally, the notion that the amount of intelligence possessed by each individual is fixed for life stems from the belief that intelligence tests measure not what one already knows but one's ability to learn. It is commonly believed that how much an individual actually learns depends on opportunity, mo-
tivation, and ability. Opportunity and motivation may vary at different times in the individual's life, but sheer ability to learn is generally considered to be a constant. It is hard-wired in the person. Hence each individual's intelligence is considered to be fixed by heredity.[13]
This conventional or popular notion of intelligence has achieved the status of a bedrock assumption. It is taken by most people in our society to describe a simple fact of nature. I wish to dispute that point of view. I have just argued that the popular concept of intelligence results from intelligence testing. And I argued earlier (chap. 2) that all testing traffics in representations and that any representation is not a given in nature but is a product of cultural conventions. By this reasoning, the "intelligence" that is represented in intelligence tests is not some independently existing natural phenomenon but a reality as construed by culture. It could very well be—and in fact is—conceptualized differently in other cultural traditions.[14]
An important reason why the particular concept of intelligence that reigns in our society has gained ascendancy is that intelligence tests have made it possible to measure, evaluate, and make a variety of selections among the masses conveniently and economically. Nevertheless, as I will attempt to demonstrate, that notion of intelligence has been responsible for a great deal of confused thinking, unjust policies, and human damage.
Eugenics
The most blatantly noxious social policy spawned by the conventional concept of intelligence goes by the name eugenics. This is the policy of strategically governing human reproduction so as to maximize the most desirable traits (and eradicate the undesirable ones) in future generations. Seizing on the notion that intelligence is fixed for life and determined by heredity, eugenicists placed it at the top of the list of traits to be manipulated by selective breeding. When Thorndike enumerated things humanity could do to improve its future, his first recommendation was "better genes." "A world in which all men will equal the top ten
percent of present men" is within reach if the "able and good" will assiduously propagate. Meanwhile, for the good of the future, the "one sure service (about the only one) which the inferior and vicious can perform is to prevent their genes from survival."[15]
Thorndike's conjoining of "able" and "good" is no coincidence. The assumption was widespread that moral fiber varies directly with intelligence. According to Terman, "all feeble-minded are at least potential criminals. That every feeble-minded woman is a potential prostitute would hardly be disputed by anyone. Moral judgment, like business judgment, social judgment or any other kind of higher thought process, is a function of intelligence."[16]
Terman and others concerned about the moral deficiency of the feeble-minded encouraged the use of intelligence testing to identify them. Goddard used the test particularly to ferret out morons, a term he coined to refer to those "high-grade defectives" (well above idiots and imbeciles) with a mental age of eight to twelve. Goddard wrote, "The idiot is not our greatest problem. He is indeed loathsome; he is somewhat difficult to take care of; nevertheless, he lives his life and is done. He does not continue the race with a line of children like himself. . . . It is the moron type that makes for us our great problem."[17] Given the presumed link between intelligence and morality, Goddard was persuaded that the ranks of criminals, prostitutes, and ne'er-do-wells of all sorts contained disproportionate numbers of morons. It was impossible to ameliorate their condition, for, he believed, it was the unalterable result of heredity. It was unrealistic to expect them voluntarily to refrain from multiplying, for their dull mentality and deficient morality would hardly enable them either to grasp or to embrace their civic responsibility. Society could, however, take steps to prevent them from reproducing. Sterilization could do the job, but Goddard had misgivings about using it until a more perfect understanding of the laws of human inheritance had been achieved. His solution of choice was "colonization": morons should be confined in institutions (perhaps on the model of Goddard's own Training School for Feeble-minded Girls and Boys) where society could minister to their inadequacy, control their immorality, and curtail their sexuality. Whatever financial bur-
den such institutions might place on the public treasury would be more than offset by reduced needs for almshouses and prisons.[18]
So much for the morons already here. Goddard was also concerned to prevent more from entering the country. Beginning in 1912, he directed efforts to identify possible morons among the new arrivals at Ellis Island. As a first screening, he used women who, he claimed, could pick out likely morons simply by looking at them. The candidates they selected were then subjected to Binet's intelligence test. The resulting IQs were appallingly low, leading him to conclude that the immigrants of that time were the dregs of Europe. Thanks largely to Goddard's efforts, numerous immigrants were deported for mental deficiency in 1913 and 1914.[19]
Notwithstanding Goddard's misgivings about its social palatability, sterilization was for a time practiced as a more aggressive treatment for those whose inferior and vicious genes were to be extirpated. A proposal at the First National Conference on Race Betterment in 1914 called for a total of about five million Americans to be sterilized between 1915 and 1955.[20] An organization rejoicing in the name of the Committee to Study and Report on the Best Practical Means of Cutting Off the Defective Germ-Plasm in the American Population took the position that "'society must look upon germ-plasm as belonging to society and not solely to the individual who carries it' . . . and advocated segregation, sterilization, and education in the facts of heredity as the chief means of reducing defective germ-plasm" in the 10 percent of the American population who carry it. Committee chairman H. H. Laughlin targeted for sterilization "the feebleminded, insane, criminalistic ('including the delinquent and wayward'), epileptic, inebriate, diseased, blind, deaf, deformed and dependent ('including orphans, ne'er-do-wells, the homeless, tramps, and paupers')."[21] While such grandiose plans were never realized, some 8,500 people were sterilized between 1907 and 1928 in the twenty-one states that enacted sterilization laws.[22]
In eugenics, the conventional concept of intelligence leads to a particularly vicious form of injustice and discrimination because it encourages the privileged and the powerful to vilify the moral character of society's most defenseless members and sanc-
tions the use of violence (such as enforced sterilization and confinement) against them. Mercifully, the most blatant cries for eugenics are in the past, although occasional apparitions prove that some elements of the mind-set are still alive. For example, the tacit assumption that "germ-plasm" belongs to society and not solely to individuals carrying it underlies the (ultimately unsuccessful) proposal set before the 1991 Kansas State Legislature that unmarried female welfare recipients be paid $500 to allow a contraceptive device to be implanted under their skin, as well as the 1991 stipulation by a judge in California that a woman sentenced for child beating use a similar device during three years of probation (the order is currently under appeal).
Race, Class, Gender, and Affirmative Action
While it is doubtless true that human abilities are to some extent inherited, numerous unfortunate consequences are spawned when the notion of heredity joins with the conventional idea of intelligence as a single thing that is possessed in a fixed amount for life. An offspring of this union that has done incalculable social damage is the idea that intelligence varies among different ethnic groups. If intelligence is inherited, so the reasoning goes, groups that marry and breed primarily within themselves might differ from each other in amount of intelligence just as they do in other inherited traits such as skin color or the color and texture of hair. Those who are persuaded by this reasoning find evidence for it in the palpable differences in average intelligence test scores achieved by different ethnic groups. An example discussed earlier is the differences found among immigrants from different countries on the Army Alpha examination administered to army recruits during World War I. Similar discrepancies persist today. Jews and Japanese Americans tend to score higher than whites on intelligence tests, while blacks on average score about one standard deviation below whites.[23] That is to say, the bell curve of IQ scores for blacks is centered about 15 points below that for whites.
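The equivalence between one standard deviation and about 15 points is a matter of scale convention rather than a separate finding: most modern IQ scales are constructed with a mean of 100 and a standard deviation of 15 (16 on some tests). A minimal statement of the conversion, assuming that standard scaling:

```latex
\mathrm{IQ} = 100 + 15\,z
\qquad\Longrightarrow\qquad
\Delta\mathrm{IQ} = 15\,\Delta z \approx 15 \text{ points when } \Delta z = 1 \text{ standard deviation.}
```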
This situation is relevant to public policy because intelligence tests are often among the criteria used for hiring, school admis-
sions, and other important selective decisions made about people. The difference in test scores means, of course, that Jews, Asian Americans, and whites are likelier to be recommended for selection than blacks or other minorities with lower average test scores. Civil rights advocates would identify the systematic discrepancy in test scores and the decisions made on that basis as an example of institutional racism. So, as Seymour has pointed out, "The use of tests and similar instruments can be an engine of exclusion of minorities far more efficient than any individual's personal intent."[24]
The engine slowed down for a while from the late 1960s to the early 1980s, when the general political climate favored steps to compensate for disadvantages suffered by minorities through affirmative action. These included quotas of various sorts and within-group score conversion. As explained above, the latter procedure involves upward adjustments in the percentile scores of blacks and Hispanics on the U.S. Employment Service's General Aptitude Test Battery to ensure that members of minority groups are not recommended for jobs in numbers disproportionate to their representation in the population at large.
Since the late 1980s, however, many affirmative action practices have come under fire as contradictions of the basic American principle that people be judged solely according to their individual merits. The basic issues pertaining to affirmative action had been crystallized in the late 1970s by the Bakke case, which concerned the admission to a medical school of some members of minority groups whose credentials were inferior to those of some white males who were rejected. Within-group score conversion, in use since the early 1980s, was catapulted into the public spotlight during the congressional debate over civil rights in 1991. The practice was widely condemned as a particularly blatant quota measure that adulterates individual merit with considerations of group membership. A scramble to do away with it ensued, and it was explicitly outlawed by the Civil Rights Act of 1991.
Rolling back affirmative action measures does not answer the question of what should be done about the fact that minorities tend on average to score lower than whites on standardized tests
and therefore tend to lose out when vocational selection is made on the basis of them. Gottfredson's response is to accept the message conveyed by test scores at face value and to deal with it forthrightly: "We do not have a testing problem so much as we have a social problem brought on by real differences in the job-related capacities that tests measure,"[25] primary among them being a real difference in general intelligence between ethnic groups.[26] It is a long-term social problem, she goes on to say, that can eventually be solved only if it is addressed in a nonpatronizing, nondefensive, and nonpolitical manner.[27] She does not go deeply into the form that such solutions might take. However, one certainly nonpatronizing and nondefensive (but hardly nonpolitical) course of action that Gottfredson foresees even in the shorter run has to do with supposed differences in intelligence between blacks and whites of equal education. Reasoning from the facts that blacks on average score lower than whites on both the precollege SAT and the pregraduate school GRE and fail professional licensing examinations more frequently, she concludes that the mean IQ of blacks is lower than that of whites of equal education. Out of a conviction that job performance varies directly with intelligence, she continues, "Black-white differences in intelligence at equivalent educational levels suggest that if employers rely heavily and equally on educational credentials when selecting black and white workers, then the blacks they select can be expected to be less productive on the average than the whites they select."[28] Presumably, the antidote to the unwelcome consequences that stem from relying equally on educational credentials is to rely on them unequally. Employers who are mindful of Gottfredson's warning, that is to say, may favor white applicants over black ones with equivalent educational credentials, or they may "replace educational credentials with more valid selection criteria."[29] Confronted with an argument such as this, one wonders what blacks can possibly do to get ahead. Just when they begin to acquire the education that has always been held up to them as the key to success, the rules change and equal education turns out not to assure equal opportunity after all.
Because Gottfredson's conclusion and the action it seems to recommend are likely to dismay those who favor affirmative ac-
tion for minorities, it is worth unpacking the logic of her argument. It begins with the principle that people should be evaluated exclusively on their merits and abilities as individuals. In line with the conventional concept, general intelligence is taken to be a single, largely inherited thing—a very important thing, because it is closely related to job performance and productivity. It is noticed, however, that minority group members on average score lower than whites on intelligence tests. Because general intelligence is thought to be accurately measured by the tests, the difference in test scores must mean that members of these minority groups are, on average, less intelligent than whites. Now comes an important twist in the argument: people are supposed to be considered purely as individuals rather than as members of groups, but it develops that group membership is relevant to intelligence because members of some groups, on average, have less of it than members of other groups. People then allow this group-related consideration to affect their treatment of individuals. Given the average difference in general intelligence between members of different groups, when evaluating individuals from different groups who present equal credentials, the safer course is to select candidates from the more intelligent group.
Notice how this line of reasoning has culminated in a position precisely opposite from where it started. Beginning with the insistence that persons be assessed purely as individuals, we end up with the conclusion that they will be evaluated partly as members of groups. This involves no less bias on the basis of group membership than affirmative action policies are accused of, but with this difference: while affirmative action normally seeks to redress injustice by favoring groups that have traditionally suffered from discrimination, this is a form of affirmative action that works to perpetuate the privileges of the already advantaged. It is not a matter, as in the Bakke case, of selecting blacks over whites who have superior credentials. It is selecting whites over blacks who have equal credentials.
Lloyd Humphreys applauds Gottfredson's candid confrontation of the black-white difference in intelligence and wishes she had gone even further.[30] Presumably the next step he desires is to add a class component to considerations of ethnicity. He points
out that tested intelligence is correlated more strongly with socioeconomic status than with race,[31] and the conclusion he draws from this seems less to be that a disadvantaged social position is responsible for lower tested intelligence than vice versa. So, in terms reminiscent of the link that Thorndike, Terman, and Goddard forged between intelligence, morality, and social value, Humphreys claims that the "cognitive deficits" measured by intelligence tests "are part of a complex that includes the development of the underclass, teen pregnancy, female-headed families, crime, drugs, and AFDC." Adopting a stance not remote from eugenics, he rues the fact that the possibility of dealing with these problems through the "constructive social action" of liberalized abortion has sadly been curtailed by religious and other pro-life groups.[32]
The remark about abortion makes it clear that Humphreys too evaluates individuals on other than purely individual criteria. If there were some way of knowing which unborn fetuses were going to be cognitively deficient and turn into criminals, teen mothers, female heads of families, or welfare recipients, so that only they would be aborted, then, unpalatable as it may still appear to some, at least this final solution to our social problems would rest on individual considerations. But of course such foreknowledge is impossible, so the judgment as to what abortions are salutary must be made according to group-based and other criteria, of which the socioeconomic class and marital status of the mother would appear to be paramount. As it did with Gottfredson, the conventional concept of intelligence and the assumption that it is accurately measured by tests has led Humphreys a long way from the hallowed American value that people be assessed solely as individuals.
Humphreys is certainly correct that intelligence test scores vary systematically by socioeconomic status as well as by race. As table 2 demonstrates, the correlation between average SAT scores and family income is perfect. Of course, this information is not entirely independent from the relation already discussed between race and intelligence test scores, because ethnic minorities such as blacks and Hispanics are disproportionately represented in the lower class. Thus, Terman combined class and
ethnicity when he raised the alarm about the consequences of the reproductive rates of desirable and undesirable stocks:
The fecundity of the family stocks from which our gifted children come appears to be definitely on the wane. It has been figured that if the present differential birth rate continues, 1,000 Harvard graduates will at the end of 200 years have but 50 descendants, while in the same period 1,000 South Italians will have multiplied to 100,000.[33]
Herrnstein is among those who in our own day have rushed to man the rampart formerly guarded by Terman. In "IQ and Falling Birth Rates," an article that appeared in the Atlantic Monthly just at graduation time (May 1989),[34] he berated commencement speakers for doing society scant service when they encourage intelligent female graduates to enter business and the professions. Herrnstein notes with some alarm that women of lower socioeconomic status produce more children per capita than those of higher classes. According to measures such as intelligence tests, the prolific proletarians tend to be mentally inferior to their wealthier, better educated, but reproductively reticent counterparts. Persuaded that intelligence is largely a matter of inheritance, Herrnstein, echoing Terman, anticipates that the discrepant
fecundity among the classes will result in a general lowering of intelligence in future generations. His suggested antidote is to reaffirm the value of motherhood in the eyes of intelligent young women, in the hope that they will be fruitful and multiply for the sake of the nation.
If one is inclined to detect group bias in Herrnstein's argument, one would see it as directed most explicitly against the lower socioeconomic class. It is also possible, however, to discern a subtler sexism. It is, of course, yet another example of men telling women what to do with their bodies, but there is more. Herrnstein does not address the question of why it would be a good thing to prevent the general level of intelligence in the population from declining, but certainly the answer is self-evident: the more intelligent the people, the more effective the conduct of social, economic, and political affairs—of, indeed, all human activities. But who is to manage those affairs? Herrnstein's thesis is that today's bright young women should not be overly encouraged in that direction, for fear that it would divert their attention from motherhood and thus compromise the quality of the next generation. So the enlightened conduct of human affairs falls, by default if nothing else, to the men. What of the next generation? What roles does he envision for the children conceived from the happy mating of wealthy, well-educated, intelligent parents? What, specifically, would he have those children do if they happen to be daughters? Because they are likely to grow up, as their mothers, to be wealthy, well-educated, intelligent young women, presumably they should do the same thing that their mothers are supposed to do—reproduce. The enlightened management of human affairs would have to be left largely in the next generation, as in this one, to their brothers. And so it would go indefinitely: bright women should bear sons to run the world and daughters to bear sons to run the world and daughters to bear sons. . . .
Obviously, one effect of this arrangement would be to quash the competition for lucrative and prestigious positions that males (who have traditionally held them) have recently experienced from upstart females. Herrnstein's proposals thus boil down to yet another case of affirmative action that favors an already-advantaged group. Females cheeky enough to venture into the
masculine world of work would be vulnerable to public condemnation for charting a selfish life course that threatens the well-being of future generations.
Arguments such as those by Gottfredson, Humphreys, and Herrnstein typically begin with the premise that people should be evaluated entirely on their individual merits, among which are the scores they achieve on intelligence tests. Then, by a convoluted logical transit, they end up judging people in terms of gender, class, or ethnic group membership. Now I want to suggest that a similar situation obtains the other way around—that what initially appears to be assessment on the basis of group membership turns out to be grounded in the principle that the sole relevant criterion for evaluating persons should be individual ability.
Let us develop the argument through an analysis of within-group score conversion on intelligence tests. This looks like a group-based procedure if ever there was one, because the percentile score that is reported for any individual depends not only on the raw score achieved but also on the person's ethnic group. We need to ask, however, why scores are converted. As we have seen, it stems from the purely individualistic assumption lying at the heart of the American creed of social mobility and equal opportunity: ability is not correlated with group membership. If this assumption is correct, then average intelligence test scores would be identical for all groups. That this is not the case indicates that factors other than intelligence must be influencing the performance on tests by members of different groups. These factors are usually identified with inequalities of opportunity, such as differences in home environment and educational preparation stemming from socioeconomic status, and cultural bias in the tests themselves.[35] So, for example, IQ scores of Australian aboriginal children rise as their contact with whites is greater. This is one of several facts that "should not be hard to explain considering the tests were designed by and for Northwestern Europeans."[36] When one controls for socioeconomic status, health, and the attitudes and motivations connected with home and school background, the differences in average test scores achieved by members of different ethnic groups in the United States drop to insignificant levels.[37]
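(For readers who want the mechanics spelled out, here is a minimal sketch, in Python, of the arithmetic behind within-group score conversion. The group labels, raw scores, and the simple percentile formula are illustrative assumptions only; the norming procedures actually used by test publishers are considerably more elaborate.)

from bisect import bisect_right
from collections import defaultdict

def within_group_percentiles(scores):
    """Convert raw scores to percentile ranks computed only against
    test takers who share the same group label (hypothetical data)."""
    by_group = defaultdict(list)
    for group, raw in scores:
        by_group[group].append(raw)
    for ranked in by_group.values():
        ranked.sort()
    result = []
    for group, raw in scores:
        ranked = by_group[group]
        # Share of this group's scores at or below the raw score.
        result.append(100.0 * bisect_right(ranked, raw) / len(ranked))
    return result

# The same raw score of 52 converts to different percentiles
# depending on which group's distribution it is ranked against.
sample = [("A", 52), ("A", 60), ("A", 70), ("B", 40), ("B", 48), ("B", 52)]
print(within_group_percentiles(sample))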
Within-group score conversion may be understood as a measure to screen out the effect of these extraneous factors on intelligence test scores. Therefore, while it initially appears to assess people according to group membership, in fact within-group score conversion turns out to have the precise opposite effect of doing away with group-related privileges and disadvantages. The scores after conversion reveal what the distribution of intelligence would look like if those tested were assessed exclusively according to individual abilities in circumstances of truly equal opportunity.
The final proposition in this argument is that within-group score conversion is a temporary measure. The assumption that group membership is not a determining factor in the distribution of ability generates the anticipation that if and when differences in opportunity produced by social conditions such as discrimination and poverty are ended, group-related differences in intelligence test scores will disappear. At that point, within-group score conversion and other affirmative action measures will no longer be necessary.
Having attempted to unravel the tangled premises and consequences of the arguments about whether intelligence testing entails individual or group-related assessments, let me now spell out my own position on the issue. Of the two positions we have considered, I much prefer the one that accepts affirmative action policies because it avoids the racist, classist, and sexist implications that we detected in the alternative. In the last analysis, however, I think that both of these stances are untenable. The culprit beneath the entire debate is the conventional concept of intelligence. Specifically, both of the positions are built on the notion that intelligence is a single thing that is measurable by intelligence tests. The salient difference between them is that one accepts current testing as an accurate measure of intelligence, while the other claims that test results at present are adulterated by extraneous considerations. I do not accept the assumption that intelligence is singular and measurable by intelligence tests. In fact, my purpose in exploring these arguments about possible group differences in intelligence and affirmative action has been to point out how the conventional notion of intelligence is responsible
for a great deal of confused thinking and unwieldy if not downright destructive social policies. In the concluding section of this chapter, I will present the quite different concept of intelligence that is much preferable to the conventional one. Our exploration of the deleterious consequences of the conventional concept is not quite finished, however. Having looked at some of its sociocultural consequences, we must now say a few words about its psychological effect on individuals.
Self-Definition and Self-Esteem
A few years ago, students at a prestigious women's college began wearing T-shirts with the inscription, "I'm a 1600. What are you?" Few methods of presenting the self are more popular today than messages written on T-shirts and sweatshirts. These announce one's allegiance to a university or professional athletic team; declare, rebuslike, affection for a city ("I ♥ New York"); broadcast political causes ("It will be a great day when our schools get all the money they need and the air force has to hold a bake sale to buy a bomber"); or, all other subjects having been exhausted, display information about themselves ("My parents had a fantastic vacation in New Orleans and all I got was this dumb T-shirt"). Students who wear the slogan, "I'm a 1600. What are you?" simultaneously boast about their own intelligence—1600 is the top score on the SAT—and issue a challenge to others. Most important, the students define themselves in terms of the test score. (It is, however, a fanciful or idealized self-definition, because those who actually achieve 1600 on the SAT are extremely rare.)
Likewise, people are defined by others in terms of tests. Byron Hollinshead answers the question posed in the title of his 1952 book Who Should Go to College? largely in terms of intelligence tests. It is in the interest of society at large, he argues, that its best talent be identified and developed. Therefore, "we believe that all students in the upper quarter in ability of the age group should be given a certificate which so states and that society should find ways of financing those who want a college education if they need
assistance."[38] Those to receive the certificate and all the rights and privileges thereunto appertaining are to be identified by IQ tests, high school record (special consideration should be given to performance in mathematics and language classes), and teacher recommendations, with special school committees making the decisions in borderline cases.[39]
How Hollinshead conceptualizes and evaluates persons is obvious from several little fictional vignettes that he presents to dramatize his ideas about what society owes to, and can expect from, individuals of various ability levels. It is worth reproducing a few of them in full:
John is a high school graduate with an I.Q. of 143. Most of the youngsters in his block think he is a little queer, since he spends much of his spare time fiddling with radio sets. He graduated on the honor roll in high school, but his parents think they cannot afford to send him to college. If he goes to college, the chances that he will do well and graduate are about five to one. In peacetime or wartime, he will be such an asset that society cannot afford not to send him. Furthermore, he is a good risk to spend college endowment on, and his parents are justified in sacrificing for him.
Mary has an average high school record and only average perseverance. She has an I.Q. of 110. This would give her a low rating in a first-rate college, where her chances of success are poor. In a mediocre college she has about one chance in two of success. Society might take a chance on her, though the risk is great. A first-rate college which would have to support her partially by endowment would scarcely be justified in admitting her. Her parents might take the chance if it does not involve sacrifices on their part.
Alice is a lovely girl, with many friends. She tried hard in high school, but did not do well in any courses except typing and homemaking. Her I.Q. is 102. She played in the band and was a cheerleader. She would do well in a community-college course in cosmetology or secretarial work, although she would have trouble with shorthand. Her chances for success in a four-year college of even mediocre standards are poor. Alice's parents are scarcely justified
in making many sacrifices for her education. If society provides the opportunity of a community college, it has done its share.[40]
The benefits to be realized by society from keeping students in the top quartile in school are so great, Hollinshead holds, that, where necessary, the community should provide financial support. If such a student is forced by economic necessity to consider dropping out of high school, for example, he recommends that friends and local organizations such as parent-teacher associations provide the needed funds.[41] (It is amusing to imagine parents participating enthusiastically in bake sales and other fund-raisers to keep someone else's child in high school while their own children receive no special support because they failed to qualify for the magical top 25%.)
Defining persons—both the self and others—in terms of intelligence blends immediately and imperceptibly into evaluating them on the same grounds. This produces preoccupation with personal limitations. While people have been expected since biblical times to make the most of their God-given talents, it has also always been believed that God gives a good deal more in the way of talents to some people than to others. The discrepancy can produce a sense of inferiority among the less endowed and superiority among the gifted. The same considerations apply to intelligence, and for most of us, the precise location of the upper limit of our abilities (as revealed by intelligence tests) is a matter of supreme importance for our sense of self. I remember one summer day in 1959 when a high school friend and I compared our IQs as tested years before in grade school. Everything about the incident is vivid in my mind: we were in a swimming pool, and we splashed each other with playful anxiety as we counted up the scale one point at a time until our scores were reached. I especially recall the slight stab of dismay I felt when I learned that his IQ was (is now, and ever will be!) one point higher than mine. Now I knew why, a year or two previously, he had scored higher on the PSAT than I did. Would I never, I wondered, be able to aspire to quite the same heights as he?
Walter Kirn captures the painful self-assessment brought on by intelligence tests with humor and pathos in his short story, "A
Satisfying Ride in the Country."[42] Paul, the story's protagonist, found that he no longer had to work hard in school after a childhood IQ test showed him to be a genius (by 2 points). His teachers assumed he was brilliant and graded him accordingly even when he did not complete his assignments. The idea that he was a genius became a crucial component of Paul's self-image, and he decided to take another intelligence test as an adult. His curiosity about whether he still had the genius IQ was tempered with anxiety, leading him to adopt various delaying tactics before he finally sat down to take the test. When he did, his worst fears were confirmed, and the effect was devastating. As he tried to come to terms with the score, "I sat there, shaking. I faced the wall. . . . The drop was of a mere six points, of course, but that is a lot when you are alone and have only two to spare."[43]
For those near the bottom of the heap, the irremediable character of the news that one's intelligence has been measured and found to be insufficient for anything but the lowliest of social and occupational positions can be devastating to self-esteem. In his fanciful history, The Rise of the Meritocracy, 1870–2033, Michael Young claims that those who occupy the lower rungs of the social ladder are particularly abject in a meritocracy since they know they are there because of their own inferiority.[44] Gardner made the point unequivocally:
It must never be forgotten that a person born to low status in a rigidly stratified society has a far more acceptable self-image than the person who lost out in our free competition of talent. In an older society, the humble member of society can attribute his lowly status to God's will, to the ancient order of things, or to a corrupt and tyrannous government. But if a society sorts people out efficiently and fairly according to their gifts, the loser knows that the true reason for his lowly status is that he is not capable of better. That is a bitter pill for any man.[45]
The assault on self-esteem produced by the perception of low intelligence is not evenly distributed throughout society because test scores that indicate low intelligence are disproportionately represented among ethnic minorities and the lower class.
Gottfredson's discussion of personal development suggests that these connections between social status and intelligence are established during childhood. Children develop a sense of social class and personal ability during the stage of orientation to social valuation, which typically occurs between the ages of nine and thirteen. This phase of development largely sets the prestige and ability level of the vocations they are willing to entertain as possibilities for themselves.[46] Linkages are forged such that high ability and high prestige occur together, as do low ability and low prestige. This is clear from Gottfredson's references to "lower-level" and "high-level" jobs,[47] locutions that combine both ability and prestige levels. In another essay, she explicitly develops the proposition that the hierarchy of occupations is related to general intelligence. As the relationship between job levels and the intelligence of their occupants is more perfectly actualized, overall economic productivity increases.[48]
Gottfredson's theory is supported by popular assumptions. It is widely believed (at least in the middle and upper classes) that lower-class people tend to be of limited intelligence and should fill low-level jobs (so defined in terms of both prestige and ability), while middle- and upper-class people are more intelligent and should fill the high-level jobs. This opinion obviously works at cross purposes with the notion that equality of opportunity can and should result in a high degree of social mobility. It fosters instead a stultifying notion of limitations, including psychological limitations that mark the child's developing image of self. The linkage of class membership with intelligence not only conditions the expectations and promises that society holds out for various categories of people but also colors the expectations and prospects that individuals imagine for themselves.
The conventional concept of intelligence and its measurement by intelligence tests are deeply implicated in this unfortunate situation. The notion that intelligence is inherited bolsters the idea that different classes, as largely in-marrying groups, may have different average levels of intelligence. Testing supports this point of view because, as we have seen already (table 2), intelligence test scores are highly correlated with socioeconomic status. Therefore, to use such tests as qualifiers for higher-level jobs
will assure that those positions, with their higher levels of compensation, continue to be reserved for the higher classes. As for the more psychological issue of self-esteem that is our primary focus now, the correlation between socioeconomic status and test scores means that testing the intelligence of children in the schools likewise reinforces the association they develop in their perceptions of self between the class to which they belong and the amount of intelligence that they imagine they have. Thus, general intelligence testing perpetuates the problem rather than contributing to its solution.
The root of these social evils, I have claimed, is the conventional concept of intelligence as a single, innate mental capacity that varies considerably between individuals but remains fixed in each individual throughout life. We turn now to a critical examination of the merits of that concept.
Is Intelligence Fixed?
Consider first the notion that the amount of intelligence possessed by each individual is immutable. That idea doubtless stems from the common view that intelligence has to do not with what a person has learned but with the ability to learn. That is not an achievement but a capacity or talent, and those are commonly thought to be part of a person's genetic makeup. No one denies that inheritance has something to do with human intelligence, but precisely what that contribution is cannot presently be isolated. If intelligence were a simple biological trait such as blood type or eye color, it would be meaningful to talk about it in terms of heredity and immutability. But intelligence is a behavioral trait (or, as will be argued shortly, a complex of different behavioral traits), and it is extremely difficult to deal with behavioral characteristics in terms of hereditary biology.[49] As Anne Anastasi put it, the individual inherits not intelligence "but certain chemical substances which, after innumerable interactions with each other and with environmental factors, lead eventually to different degrees of intelligent behavior. . . . It should be obvious that the relation between the intellectual quality of the individual's
behavior at any one time and his heredity is extremely indirect and remote."[50]
As already mentioned, intelligence is commonly conceptualized as the ability to learn, as distinguished from what one has actually learned. That distinction is often framed in terms of the difference between achievement and aptitude tests. Achievement tests are designed to measure how much the subject knows about a given subject matter, so that one might speak of achievement tests in calculus, or music theory, or the history of seventeenth-century France. Aptitude tests aim to measure not so much what the person has learned in particular but rather one's ability to learn in general (sometimes broken down into a few broad divisions of knowledge, for example, quantitative and verbal). This ability is commonly thought to constitute intelligence, so intelligence tests such as the pioneering Army Alpha and Beta tests developed during World War I, the Stanford-Binet and Wechsler Intelligence Scales, and the SAT and ACT college entrance examinations are all aptitude tests.
It is essential to recognize, however, that no aptitude test directly measures the ability or capacity to learn. That is an inference drawn from a sampling of what people have already learned. Therefore, the difference between aptitude and achievement tests is much less than is commonly recognized by the general public.
It is now widely accepted [by psychologists] that all cognitive tests measure developed abilities, which reflect the individual's learning history. Instruments traditionally labeled as aptitude tests assess learning that is broadly applicable, relatively uncontrolled, and loosely specified. Such learning occurs both in and out of school. Instruments traditionally labeled as achievement tests, on the other hand, assess learning that occurred under relatively controlled conditions, as in a specific course of study or standardized training program; and each test covers a clearly defined and relatively narrow knowledge domain.[51]
For Anastasi, all mental ability tests may be located along a continuum in terms of the degree of specificity or generality of experiential background that they presuppose. To call some of
them "aptitude tests" and others "achievement tests" may be confusing and lead to the misuse of test results.[52]
Sternberg goes even further than Anastasi in discounting the distinction between aptitude (or intelligence) tests and achievement tests:
If one examines the contents of the major intelligence tests currently in use, one will find that most of them measure intelligence as last year's (or the year before's, or the year before that's) achievement. What is an intelligence test for children of a given age would be an achievement test for children a few years younger. In some test items, like vocabulary, the achievement loading is obvious. In others, like verbal analogies and arithmetic problems, it is disguised. But virtually all tests commonly used for the assessment of intelligence place heavy achievement demands upon the individuals tested.[53]
Intelligence tests have been concerned to measure previous learning from the very beginning. Binet, author of the first intelligence test, aimed to measure intelligence as something distinct from any instruction a child had received. Nevertheless, Robert Schweiker's synopsis of what Binet actually did makes it plain that his intelligence test was also, as Sternberg would put it, last year's (or the year before that's) achievement test:
First grade teachers told Binet that most children had opportunity to learn many things before starting school. Those children who had learned many of those things, later learned well in school. Binet made a test of many of those things-which-most-children-had-opportunity-to-learn, and found that the test gave a fair prediction of success in school.[54]
In other words, the logic of Binet's test, as with all subsequent intelligence tests, is that the probability of learning things in the future is directly proportional to what one has learned in the past. But once this logic is made explicit, it immediately becomes obvious that even if the individual is endowed with a fixed amount of innate mental ability, that can be only one of several variables responsible for one's past learning and, therefore, one's performance
on intelligence tests. Important among the others are the person's opportunities and motivation to learn. These are complex phenomena, turning on matters such as what rewards and encouragements the individual has received for learning, personal relationships with parents and teachers, if and when the individual was exposed to subject matters that stimulated interest, how much time and how many scarce or expensive facilities, books, instruments and other resources have been available for learning, and so on.[55] These factors may increase or decrease with time, and one's intelligence, as measured by intelligence tests, will change accordingly.
Binet was explicit that the intelligence possessed by an individual is not permanently fixed.[56] His main purpose was, in fact, precisely to increase the intelligence of those children who, for one reason or another, were developing slowly. Binet's test was intended to identify such children, so that they might be placed in special education classes where they could learn better habits of work, attention, and self-control and thus assimilate information more successfully.[57] His notion of special education, that is to say, focused on metalearning, or learning to learn. And that involves a change in the individual's intelligence. Binet wrote,
It is in this practical sense, the only one accessible to us, that we say these children's intelligence can be increased. We have increased what constitutes the intelligence of a student, his capacity to learn and to assimilate instruction.[58]
Many contemporary theorists agree that intelligence can be increased. Sternberg, for example, argues that "intelligence is malleable rather than fixed,"[59] and he is one of several psychologists who have developed programs for increasing intelligence.[60] To conclude, that part of the conventional notion of intelligence which holds that each person's intelligence is fixed for life at a certain level is untenable.
Is Intelligence a Single Thing?
The popular notion that intelligence is a single thing stems largely from our habit of referring to it by a singular noun and the
practice of reporting the amount of it that an individual has in terms of a single number: IQ, composite SAT score, and so on. Quite clearly, this is an example of the fallacy of misplaced concreteness. This is the error of assuming that where there is a name—in this case, "intelligence"—and a number, there must be some unique, preexisting phenomenon to which the name and number refer.
The fallacy of misplaced concreteness escalates to the point of absurdity under the operational strategy of defining intelligence (as Boring and several more recent psychologists cited above have done) as that which is measured by intelligence tests.[61] Consider, for example, the effect of this definition on efforts to improve intelligence tests, such as the changes that are currently being made in the ACT and the SAT. If intelligence is nothing more or less than that which is measured by intelligence tests, it is nonsense to say that changes in a test could produce more accurate measurements of it. The "improved" test, being different from its predecessor, would not measure the same thing better but must measure something else . The addition of an essay section on a college entrance examination, for example, quite obviously brings skills into play that are different from those involved in answering multiple-choice questions. Therefore, the operational definition leads to the conclusion that there are as many "intelligences" as there are different tests to measure it. To imagine that for every intelligence test that ever has been or will be devised there is a real, unitary phenomenon waiting to be measured by it is to be absurdly mired in the fallacy of misplaced concreteness.
Throughout the history of theorizing about intelligence, scholars such as Alfred Binet, E. L. Thorndike, L. L. Thurstone, and J. P. Guilford have recognized this problem and have suggested that what we call "intelligence" is really a variety of different abilities, some of them possibly only distantly related to others.[62] Different intelligence tests tap different combinations of these abilities and place different emphasis on them. The notion of intelligence as multifaceted continues to be fruitfully developed today, particularly by Howard Gardner and Robert J. Sternberg. Gardner's theory of multiple intelligences stipulates several quite
distinct kinds of intelligence: linguistic, musical, logical-mathematical, spatial, bodily-kinesthetic, and "the personal intelligences" (capacities to deal effectively with one's inner feelings and social relationships).[63] Sternberg's "triarchic" theory distinguishes three aspects of intelligence: components (the nature of the thinking process); experience (learning from and reasoning on the basis of experience); and context (adapting to and shaping the environment).[64]
Gardner and Sternberg agree that present intelligence tests are woefully inadequate measures of the full range of intelligence. Gardner contends that they are focused primarily on linguistic and logical-mathematical abilities and are especially deficient in providing information about the musical, bodily-kinesthetic, and personal intelligences.[65] Sternberg's opinion is that current tests pertain to scarcely one-third of intelligence's triarchy, being limited essentially to certain parts of its componential aspect.[66]
So long as intelligence is viewed from a purely psychological perspective, as a property of individual minds and behavior, the view of it as plural or multifaceted appears to be entirely accurate and highly useful. But things change dramatically when we look at intelligence from a sociocultural perspective—as a product of social institutions rather than a property of the individual. From that point of view, intelligence emerges again as a single thing. Moreover, both the definition of what intelligence is and the amount of it empirically possessed by any individual are, from the sociocultural perspective, determined by intelligence tests. Curiously, this brings us to a position quite close to the absurd outcome of the operational definition, that for every possible intelligence test there is an "intelligence" waiting out there to be measured by it. The main difference is that a sociocultural view denies the preexistence of intelligence; it takes intelligence to be constructed by the test instead of somehow discovered by it. So the formulation becomes that for every possible intelligence test, there is an "intelligence" out there that is fabricated by it. This does not diminish the reality of intelligence, for artificial things are no less real than natural ones. Nor, although I will argue for its validity, do I wish to imply that the sociocultural concept of intelligence as a single
thing is any less absurd than the operational view. There are, after all, no guarantees that the human condition is free from absurdity.
The sociocultural perspective on intelligence can be developed most clearly if we engage in the thought experiment of constructing a new test and imagining its consequences. Let us call it, simply, the New Intelligence Test, or NIT. It is intended especially to improve on current tests by paying more attention to the practical aspects of intelligence used in everyday life and to sample more widely from the scope of intelligence as conceptualized by proponents of the multifaceted view. Hence, the NIT consists of nine sections.
1. A name recall scale tests ability to remember the names of persons to whom the subject has just been introduced.
2. A mathematics section tests the subject's ability to do problems of arithmetic and algebra.
3. The first impression scale invites a panel of ordinary people to evaluate the personableness of subjects by simply looking at them.
4. In the exposition of ideas section, the subject is given five minutes to read a page from Rousseau describing his distinction between self-love (amour de soi) and selfishness (amour-propre) and thirty minutes to present a clear and accurate written account of it, with original examples. (To avoid subjects learning of this problem in advance and studying for it, different forms of the test will feature other, analogous tasks in this section.)
5. The small talk scale evaluates subjects' ability to carry on an interesting conversation with someone they have never met.
6. A bullshitting scale assesses skill at participating in a discussion with two other people on a topic about which the subject knows nothing.
7. In the follow-the-directions scale, the subject is told once, at the speed of ordinary conversation, to do a task that consists of six distinct steps and is evaluated on how well the task is accomplished.
8. The adult sports scale evaluates the subject's ability to play golf or tennis, with suitable adjustments for male and female subjects.
9. Finally, the SES scale is a simple rating of subjects according to parental socioeconomic status.
A composite score is generated from the results of the NIT's nine sections. What ability or human capacity is tested by the NIT? A good operational response would be that it tests the skills or wits used in taking the NIT, no more and no less. This is certainly nothing inconsequential, for were the appropriate studies to be done, it would doubtless turn out that high NIT scores correlate positively (probably more positively than IQ scores) with desirable social outcomes such as success in the university, high income, and election to public office. But it is also obvious that what the NIT tests is not a single quality or capacity of persons. It is rather a set of distinct qualities, which have been measured by the several sections of the NIT and combined into a single score for convenience in reporting NIT results. In that sense our thought experiment is in line with the view of intelligence as multifaceted.
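(Purely to make the thought experiment concrete, here is a minimal sketch, in Python, of how such a composite might be computed. The section names, the toy reference scores, and the choice of averaging standardized subscale scores are all hypothetical; the point is only that nine unrelated measurements get folded into one reportable number.)

from statistics import mean, pstdev

# Hypothetical labels for the NIT's nine sections.
NIT_SECTIONS = ["name_recall", "mathematics", "first_impression",
                "exposition", "small_talk", "bullshitting",
                "follow_directions", "adult_sports", "ses"]

def nit_composite(subject, norms):
    """Fold nine unrelated subscale results into one reportable number
    by standardizing each against reference scores and averaging."""
    zs = []
    for section in NIT_SECTIONS:
        ref = norms[section]
        mu, sigma = mean(ref), pstdev(ref)
        # Guard against a degenerate reference distribution.
        zs.append((subject[section] - mu) / sigma if sigma else 0.0)
    return mean(zs)

# Toy illustration: two reference scores per section, one subject.
norms = {s: [40, 60] for s in NIT_SECTIONS}
subject = {s: 55 for s in NIT_SECTIONS}
print(round(nit_composite(subject, norms), 2))  # 0.5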
But assume now that the NIT were to catch on in a big way—that it came, for example, to be widely used for college and graduate admissions and for hiring and promotion purposes by law firms, government, and corporations. In such an event, the composite of different abilities measured by the NIT would not remain static. People would spare no effort in preparing for the test, in the hope of achieving the rewards awaiting those who excel on it. They would review arithmetic and algebra, they would master techniques for remembering the names of strangers, they would practice bullshitting, they would take golf and tennis lessons, they would groom themselves to appear more likable on first sight. High school and college curricula would shift in the direction of more training in the areas covered by the NIT (if they did not, irate parents would demand to know why their children were not being taught something useful). Kaplan and Princeton Review would explode into the marketplace with courses that promise dramatic improvement in one's NIT scores. (One side
effect would swell the public treasury as people report inflated income to improve their children's showing on the NIT/SES scale—and then have to pay taxes on it.)
All of this dedicated effort would have a palpable effect. Although the NIT obviously measures several quite different abilities, people would knit them together as they strive to improve them all in order to raise their NIT scores. They would begin to imagine these several abilities to be one. They would name it, perhaps, "NITwit." Given its importance for success in life, it would be valued as a thing of great significance. People would worry about how much of it they possess; they would envy evidence of its abundance in their contemporaries and look for promising signs of it in their children.
Not only would a new mental category swim into the social consciousness. The empirical amount of it possessed by individuals would literally increase as, in preparing for the NIT, they hone their skills at following directions, expounding on ideas, small talk, and the rest of it. And, of course, as individuals increase these skills, NIT scores would go up. There would be rejoicing in the land as today's average NIT scores exceed those achieved in the past or by test takers in other countries—until, perhaps, an apogee were passed, and then national consternation about declining NIT scores would set in. Given all these transformations and developments, it is fair to say that NITwit would become a new, singular personal trait, an objective reality literally constructed by NIT testing. Perhaps the ultimate development (and the ultimate absurdity, but it unquestionably would happen) would be when rival tests are marketed that claim to test it faster, cheaper, or more accurately than the NIT.
What happened in our thought experiment has been the experience of "intelligence" in the real world. Because of intelligence tests, a variety of disparate abilities (to do mathematical problems, to comprehend texts, to compare shapes, to sort ideas or objects into classes, to define words, to remember historical events, and to do all of these things rapidly) have been lumped together to form a new, unitary mental characteristic called "intelligence." It is a quality considered to be of great importance because intelligence tests serve as the basis for offering or denying
educational and career opportunities and other social rewards. Given this importance, intelligence has become a target for special effort and training, with the result that people increase their overall proficiency in it. Precisely as with NITwit in our hypothetical example, intelligence has been fashioned into an objectively real personal trait by the practice of intelligence testing.
From this perspective, intelligence seems to belong to a wonderland where everything gets absurdly turned around. If we examine the form of representation that is involved in intelligence testing, in principle, it is clear that IQ scores and other intelligence test results are supposed to be signifiers of certain abilities and capacities. That is to say, intelligence (the signified) comes first as a thing in itself, and test results are signifiers or representations of it. But this analysis has demonstrated that the tests come first, that those abilities and capacities constituting "intelligence" have been formulated by society and developed in individuals on the basis of the tests that measure it. Thus the absurd situation emerges in which the process of signification does not so much represent the signified as produce it. From this point of view, it is appropriate to refer to all intelligence—the human kind as well as that developed in machines—as "artificial" intelligence.
This issue will be developed further later. For the moment, I want to suggest that absurdities of this sort are not particularly rare. They are, indeed, so common that the term "absurd" may not be appropriate for them. For the most part, human beings regulate their affairs in terms of conventions or agreements they have established among themselves (often tacitly) rather than according to the dictates of external reality. Constitutions and codes of law, religions, money, hierarchical systems of rank and privilege are all cases in point. Indeed, what external reality is understood to be and appropriate methods for knowing it are themselves matters of social convention.[67] The concept of intelligence that has been analyzed and criticized in the preceding pages is conventional not only because it is commonplace but also in the sense that it is one example of a social convention or tacit agreement in terms of which human affairs are regulated. In a society
such as ours, it is important to have some means of evaluating people for the purpose of allocating scarce rewards and opportunities, and "intelligence" as a socially constituted phenomenon has come to be an important criterion in that process.
To explain something is not, however, to justify it. The burden of this chapter has been that the conventional concept of intelligence has done a great deal of mischief by closing opportunities and damaging self-esteem for millions. It has created and justified eugenic programs of selective mating, immigration quotas, sterilization, and other blatant and insidious forms of racial, class, and gender discrimination. If this concept of intelligence is the product of social convention, however, it stands to reason that it can be changed by social convention. How that might be accomplished is one of the topics to be addressed next.
10
Conclusion: Man the Measured
Thomas Gradgrind, sir. A man of realities. A man of facts and calculations. . . . With a rule and a pair of scales . . . always in his pocket, sir, ready to weigh and measure any parcel of human nature, and tell you exactly what it comes to.
Charles Dickens,
Hard Times
This book's sociocultural perspective on testing has generated two basic theses. One is that tests do not simply report on preexisting facts but, more important, they actually produce or fabricate the traits and capacities that they supposedly measure. The other is that tests act as techniques for surveillance and control of the individual in a disciplinary technology of power. This concluding chapter extends the analysis and critique of these two properties of tests and offers some suggestions as to what might be done about them.
Production
The Efficiency of Representation
If people were left to their own devices, they would identify their interests and talents by introspection and select training programs
and occupations accordingly. The only way of knowing whether they had chosen well would be to assess their performance after they had actually entered the program or had been working for some time on the job. From the point of view of a positivist agenda to manage social affairs scientifically, that process of trial and error is utterly wasteful. People with insufficient understanding of their own inclinations and abilities are likely to choose inappropriate goals and thus misdirect their efforts. Their mistakes will be revealed only after they, the training programs, or their employers have wasted much time, effort, and money in false starts and failed projects. The most visible waste is represented by those who aim too high or drastically in the wrong direction and who fail after perhaps years of frustration and futile effort. Less evident but no less wasteful are the missed opportunities of those who aim too low or only somewhat askew. They do reasonably well in their chosen vocation but would have attained loftier heights and made greater contributions had their aspirations been better targeted.
Testing offers a solution to this problem because it is designed to identify who will do well in what parts of the race before it is run. The reason, of course, is that much testing is concerned to assess not so much what one has done already as one's aptitude or potential to do it. This stems from the representational nature of all tests. The information collected by any test is important not in itself but only as it represents other information about the subject (what we have called the target information). When the target information pertains to events that will happen (or will probably happen) after the test is given, those tests may be termed future oriented. An example is using the SAT to predict how a high school senior is likely to perform in college.
In principle, future-oriented testing slices through the inefficiency associated with placement by trial and error. No longer need people spend months or years in a particular vocation to discover whether they are really cut out for it. Aptitude tests measure their ability against what is required in that vocation; interest inventories reveal whether they are likely to enjoy it; drug and integrity tests reveal if they have the requisite moral qualifications. All of these matters can be discovered not only prior to
entering a vocation but even before entering a training or education program that prepares one for it.
So, at least, goes the propaganda promoting the efficiency of testing. In reality, testing delivers less than it promises, and it brings a number of unsavory side effects in tow. The typical stress in aptitude tests on verbal and quantitative skills is so narrow that these tests are often imperfect indicators of who is best suited for certain jobs or educational opportunities; especially are they silent on such critical matters as motivation and emotional stability. Moreover, although future-oriented tests are designed to predict performance, they accomplish this with reasonable success only in the short term. For example, Naomi Stewart's findings indicate a positive correlation between IQ and prestige level of occupation among thousands of army recruits in World War II.[1] Her study, however, correlated intelligence as measured by the Army General Classification Test taken on induction with occupation held immediately prior to induction. When research is extended over the long term, a different picture emerges. Another study of World War II servicemen found no correlation at all between the predictions based on a one-and-one-half-day battery of aptitude tests taken by 10,000 recruits aged about 20 and their occupational success twelve years later.[2] Also relevant is the longitudinal study of men who attended public schools in Kalamazoo, Michigan, between 1928 and 1950 and who were interviewed in 1973 and 1974, when they ranged in age from 35 to 59.[3] When comparing their first jobs with scores on intelligence tests taken as youths, the correlation of high scores with top jobs was indeed visible. In subsequent years, however, the pattern changed as men with low intelligence test scores began to find their way into prestigious occupations. The percentages of those with childhood IQ scores of 100 or below rose from 26 percent of the professionals and 36 percent of those who held managerial positions in their first jobs to 31 percent of the professionals and 42 percent of the managers in their current jobs. Thus, the value of intelligence test scores for predicting occupational success diminishes over time, and it is especially unwarranted to imagine that those with low scores will not succeed in top-level occupations. The results of these studies underline how erroneous and destructive it is to
discount the future prospects of people who happen to have done poorly on intelligence tests as children or youths.[4]
The Priority of Potential over Performance
It could be argued that the poor capacity of future-oriented tests to predict in the long range is ultimately a practical shortcoming that will eventually be overcome with improvements in testing theory and technology. Be that as it may, certain other consequences inevitably accompany testing because they are embedded in the tests systemically, or in principle.
Some of these consequences stem from the peculiar nature of tests as representational devices. I have demonstrated that in intelligence testing, the relation between signifier and signified gets turned around. Now I want to extend that argument to cover all future-oriented testing.[5] Normally signifiers follow signifieds in time and are in one way or another modeled on them. For example, an Elvis look-alike is a signifier of Elvis Presley. The look-alike comes after the original Elvis and resembles the original Elvis in appearance. In future-oriented testing, the signifier precedes the signified. The signifier is the test result: a present or previous state of affairs measured by the test. The signified is the target information: a future state of affairs that the test result is used to predict. Consider drug tests. The signifier concerns whether the subject has used drugs during some period prior to the test. This is what the test directly measures. But, especially in the preemployment situation, the ultimate point of a drug test is not to gain information about what the subject has done in the past. It is to use that information to indicate something about what the individual is likely to do in the future, as an employee of the company. That prediction is the signified. Therefore, in urinalysis, particularly when used in the preemployment context, the signifier (the test result) precedes the signified (the subject's future behavior). The case is even clearer with qualifying tests such as the SAT and ACT. Obviously, the test measures what the student knows at the time of taking it. Nevertheless, the explicit purpose of the test is to use its result to predict (to signify) how well the
student will do in college. Again the signifier comes first because, of course, the individual takes the test before entering college.
We have noted that in the usual form of representation, when the signifier follows the signified, the signifier is modeled on the signified. But when the signifier precedes the signified, this modeling relation is also reversed. Consider some other cases of representation where the signifier precedes rather than follows the signified: a recipe is a signifier of the dish that is made from it, a blueprint signifies the building to be constructed according to its specifications, and a particular structure articulated in DNA is a signifier of the plant, animal, or human being having that genotype. Notice that in each of these examples it is not simply the case that the signifier precedes the signified. The signifier also acts as a code that is used to produce the signified. This is also true of most testing, and it is of critical importance for the fabricating quality of tests. Usually tests do not simply measure things in a purely neutral and nonintrusive manner, as calipers might be used to measure the length of a skull or the width of a nose. Tests often change or condition that which they purport to measure. It is essential to recognize this property, because many of the unintended and often unrecognized social consequences of testing flow directly from it. The fabricating process works according to what may be called the priority of potential over performance. Because tests act as gate-keepers to many educational and training programs, occupations, and other sectors of social activity, the likelihood that someone will be able to do something, as determined by tests, becomes more important than one's actually doing it. People are allowed to enter these programs, occupations, and activities only if they first demonstrate sufficient potential as measured by tests. The result of this, I argue, is that to pass through the gates guarded by testing is to undergo metamorphosis. The process works in two ways: by selection and by transformation.
Selection
The selective power of testing may be introduced by considering contemporary authenticity tests. An employer who wants to
avoid hiring people who might use drugs, steal, embezzle, or engage in industrial espionage or other disapproved behavior may make the appointment contingent on a drug, integrity, or (until they were outlawed in the private sector in 1988) lie detector test. These tests use past behavior as the basis for prediction of future behavior. Drug testing by urinalysis in particular is a crude instrument because it reveals drug use during only a short period prior to the test. From the employer's point of view, it would represent an advance if authenticity tests were more purely future oriented, revealing the probability that people will engage in certain kinds of behavior in the future even if they have never behaved in that way before. To draw out the logic of this situation, imagine that a genetic tendency to drug abuse or crime had been identified. It would then be possible to base employment decisions on genetic tests, with employers declining to hire individuals who fit the troublesome genetic profile. Of course, it is not likely that such tests would predict future behavior with absolute certainty. Some who would have made perfectly honest and reliable employees will be excluded, while others who seem to be a safe bet on genetic grounds will turn out to be bad apples. Nonetheless, it would serve the interests of employers to use the genetic test in employment decisions because it would reduce the statistical incidence of undesirable behavior in the workplace.
Notice, however, what would happen if such practices were put into effect. The goal is to avoid hiring drug users, criminals, and troublemakers of various sorts. But the end actually achieved is to avoid hiring anyone who is identified by a test as fitting a certain profile that is statistically associated with such behaviors. This is how testing, as a selective gate-keeping device, results in the priority of potential over performance. Decisions are made about people not on the basis of what they have done, or even what they certainly will do, but in terms of what they might do.
All this is more than the dream of the positivist social planner or the nightmare of the civil libertarian. Some actual forms of testing fit this description precisely. One of them is preemployment integrity testing. Such tests are past oriented to a degree, in that they normally include a section that invites the subject to divulge any previous wrongdoing, and those admissions are taken into
account as possible indicators of future behavior. But the tests also include questions that probe general attitudes and opinions with the purely future-oriented purpose of revealing general dispositions that are thought to have a probability of producing unacceptable behavior, quite apart from any past record. Employers make hiring decisions on the basis of this information, and those decisions exemplify the priority of potential over performance in the same way as the imaginary example of genetic testing described above. As a result, integrity tests utilize selection to modify that which they are intended to measure, for a work force hired with the aid of integrity testing tends to manifest a certain personality profile. Thus, as was reported in chapter 3, the proprietor of a fast food chain in Texas said, "We used written honesty tests for a while but quit when we found that the people we were hiring were just too weird."
Potential predominates over performance even more in qualifying tests than in authenticity tests. Aptitude and placement tests of all sorts stand as gate-keepers to select who shall be admitted to and promoted in various educational programs from nursery school through graduate school, vocations, professions, and positions of influence and responsibility. It is impossible to receive awards such as a National Merit Scholarship or a National Science Foundation Graduate Fellowship—and nearly impossible to be admitted to a prestigious college or graduate or professional school—without achieving high scores on one or another standardized aptitude test such as the PSAT, SAT, ACT, GRE, MCAT, GMAT, or LSAT. Duke University sponsors a Talent Identification Program that provides a special summer course of enriched study for junior high school students. The sole qualification for this program is a sufficiently high score on the SAT or the ACT, taken in the seventh grade. As with the authenticity tests considered a moment ago, these qualifying tests mold, by systematic bias of selection, the intellectual and personality characteristics of award recipients, student bodies, and members of professions.
Aptitude tests often become ends in themselves rather than means to an end. High school seniors are pictured in local newspapers because their scores on entrance examinations result in
National Merit Scholarships, but their progress through college is not given the same attention. This is part of a general tendency in contemporary society to place more emphasis on qualifying to do something than on actually doing it. In academia, scholars may be more honored for receiving contracts and grants to conduct research than for successfully completing the work and publishing significant results. In the world of work, it is often harder to get a job than to keep it. Union contracts, various forms of tenure, and the Equal Employment Opportunity Commission and other regulating agencies make it onerous to remove someone from a job for mediocre or minimal performance of duties. Illegal or flagrantly immoral behavior will normally suffice as grounds for dismissal, but charges of gross incompetence may or may not stand up against protests and hearings, while mere inadequacy is often a nonstarter. Thus, potential is again prior to performance, because after people have succeeded (often by tests) in demonstrating the potential necessary to get a job, it is not likely that they will be dismissed from it merely because that potential is not realized in their performance on the job.
Finally, the power of tests to select, and therefore to create that which they profess to measure, is so great that they sometimes determine life chances in the most literal sense. Parents use amniocentesis to identify the gender of fetuses, and in some cases, they couple the procedure with abortion if the gender is not to their liking. This may have a dramatic impact on the sex ratio of fetuses that are carried to term. Philadelphia medical geneticist Laird Jackson reports that "virtually all of the babies ultimately born to his [East] Indian patients are males. For other patients, the relation of males to females is about 50-50."[6] There can be no more vivid or disturbing example than this of how future-oriented tests, as gate-keepers, may exercise a determining effect on those who are allowed to pass through.[7]
The case of amniocentesis highlights the crucial mechanism whereby selection operates in tests. The effect is in the aggregate: amniocentesis does not change the sex of any particular fetus, but selective abortion based on test results can have a dramatic impact on the sex ratio of babies that are ultimately born. Similarly, integrity or intelligence tests use a screening process to exercise
a determining effect on the aggregate personality, cognition, and other characteristics of people who pass through their gates to be hired or promoted, awarded a scholarship, admitted to a prestigious college or graduate program, and so on.
Transformation
It also happens that tests bring about the transformation of individual subjects. The sheer act of measuring something may change it. In physics, the observer effect (popularly associated with the Heisenberg uncertainty principle) holds that measurements of subatomic particles are affected by the act of measurement itself, and probably everyone has had the experience of blood pressure rising simply because it is being taken. A similar effect, albeit subtler and more pervasive, often occurs with the various kinds of tests that we have been examining.
One transforming capacity of tests is played out in the negative, in that tests prevent changes or developments in individuals that would otherwise take place. In schools, tests enable teachers to identify deviations in learning early, making it possible to correct them with small interventions. One thinks of Nancy Cole's image of the ideal classroom of the future—a sort of constant examination where the work of each student (done on a computer) is subject at every moment to observation, analysis, and correction by the teacher.[8] This is an outstanding example of Foucault's notion, discussed especially in chapter 4, that constant surveillance enables power to be applied with maximum efficiency.[9] Just as in the political realm dissidents, detected early by constant surveillance, can be nipped in the bud, so in the realm of thought, regular testing enables early identification and correction of deviant or independent thinking. One must not, of course, ignore the beneficial results of this. It enhances the learning process by identifying things that students do not understand and preventing them from going off on unproductive tangents. Nevertheless, it is also a marvelously efficient means of thought control, and it comes at a price. Often, it is not possible to be certain in advance that a tangent will be unproductive. In all areas of social life—political, artistic, intellectual, technological, economic—as well
as in biological evolution, change often originates in minor deviations that, if allowed to develop, may open up hitherto unrealized potentials. By inhibiting aleatory exploration in thinking, testing entrenches the status quo and impedes the process of change.[10]
Among the transformations in individuals that testing produces rather than prevents, test scores redefine the person in the eyes of others and in one's own eyes as well. Such was the case with Victor Serebriakoff, who transformed himself from a school dropout working as a day laborer to the author of books and president of the International Mensa Society after an intelligence test reported his IQ to be 161. The other side of the issue is less inspiring: low intelligence test scores have caused untold numbers to be treated as stupid by their teachers, to lose opportunities, to lower their aspirations, and to suffer lingering injury to self-esteem. And, as was explained in chapter 4, polygraph tests have the capacity to redefine people in their own eyes by convincing them that they are guilty of misdeeds that they did not in fact commit.
The transforming capacity of tests works on individuals before they take them as well as after. Because people covet the rewards that are available to those who pass through the gates guarded by tests, many spare no effort to remake themselves in ways that will improve their test performance. An outstanding example is the ancient Chinese civil service examination. Ambitious Chinese youths and men devoted years, even decades, of arduous study to acquiring the knowledge and skills that would be tested. Thus, they literally transformed themselves to mirror the expectations embodied in the examination. The remarkable capacity of the Chinese civil service examination to mold human material was summarized by Wolfgang Franke: "For over 500 years the traditional Chinese system achieved harmoniously and smoothly, with almost no resort to force, a degree of intellectual homogeneity eagerly sought, but so far scarcely attained, by the most totalitarian systems."[11]
People also transform themselves for the sake of doing well on tests in our own society. For many students, learning is less a matter of acquiring the information covered in their courses than becoming skillful at cramming and other techniques to get high
grades on tests. Kaplan courses and the Princeton Review are designed explicitly and exclusively to raise scores on standardized tests. Nor are students the only ones who put test performance ahead of the acquisition of knowledge. Teachers often "teach to the test" with the goal that their students' performance on aptitude and minimum competency tests will make them (the teachers) and their school look good. Jacques Barzun has suggested that the preoccupation with doing well on standardized tests has literally conditioned the way young people in America think.[12] They have better-developed cognitive abilities to recognize random facts than to construct patterns or think systematically, he argues, because the former skill is favored and rewarded by the multiple-choice format of standardized tests. In all of these ways, tests create that which they purport to measure by transforming the person.
In some cases, the transformation may be quite unintended and even counterproductive. Integrity tests, for example, do not seem to be conducive to integrity. Those who take them are job applicants who enter the situation not as disinterested and compliant subjects (as is usually the case, for example, with tests given in the context of psychological experiments) but as deeply interested parties who are out to get hired. In responding to questions about their attitudes and opinions, their answers are likely to be what they believe will produce the most favorable result rather than what they actually think. Therefore, a test that is intended to measure an applicant's honesty actually diminishes the honesty that the subject manifests while taking it. This may also have a persistent effect. Some employees may see little reason to extend loyalty and respect to employers who trust them so little as to demand integrity tests, and so the likelihood that such employees will steal or cheat if given the opportunity might actually increase.[13]
Tests, I have claimed, transform people by assigning them to various categories (genius, slow learner, drug-free, etc.), where they are then treated, act, and come to think of themselves according to the expectations associated with those categories. But tests do more. As with recipes, blueprints, or codes articulated in DNA, they also define or act as constitutive codes for the cate-
gories themselves. It was argued in chapter 2 that in sixteenth- and seventeenth-century Europe, tests for witchcraft (and the fact that numerous suspects were revealed by them to be witches) served as important wedges for inserting the belief in witches into the public mind. The process works by a reversal of logic. Ostensibly the reasoning proceeds deductively: there are witches; witches have teats; this individual has teats; therefore, this individual is a witch. In actuality, however, the logic runs in the opposite direction: this individual has teats; teats are associated with witches; therefore, this individual is a witch; therefore, witches exist.
The capacity of tests to act as constitutive codes for social beliefs and categories remains in evidence today. As one important example, for the last three-quarters of a century or more, the idea has been abroad in the land that intelligence is a single thing that is possessed in significantly different amounts by different people, who on that basis may be classified on a scale of categories ranging from idiot to genius. As was discussed in chapter 9, the source of that complex of beliefs and categories is our widespread use of tests that report the intelligence of individuals on simple, quantitative scales such as IQ. The same reversal of logic that operated centuries ago for witchcraft is in play today with reference to intelligence. The apparent structure is deductive: intelligence is a single thing differently distributed among individuals; intelligence is measured by IQ tests; these individuals made certain scores on an IQ test; therefore, each one has the amount of intelligence corresponding to his or her score. But in fact, the premise about intelligence that starts the deductive chain is the conclusion that emerges when the argument runs in the opposite, inductive direction: these individuals made certain scores on an IQ test; IQ tests measure intelligence; therefore, each one has the amount of intelligence corresponding to his or her score; therefore, intelligence is a single thing differently distributed among individuals.
Still more subtly, as part of a process that has been taking place since at least the Industrial Revolution, testing has contributed to a general transformation in the way in which the person is known. As occupational specialization has become widespread,
the various roles that an individual plays in social life have become increasingly disparate. In small hunting-and-gathering, herding, or farming communities, the same people interact with each other in a wide variety of contexts—familial, religious, economic, political. The result is that people are known to each other as complete persons, in all their roles. They know each other's skills, preferences, sense of humor, mannerisms, temper, general state of health, and so on.
Family members, friends, and close work associates still know each other in much the same way. But with increasing social complexity and division of labor, the individual tends to interact with different groups of people in different roles and thus is known to most other people only as a partial person: as a relative to some, a church member to others, a co-worker to others, a community volunteer to still others, and so on indefinitely. This has produced a certain fragmentation in the person as socially known, and possibly the different constituencies and rules that operate in the distinct spheres of the individual's life are conducive to disarticulation of the person even as known to oneself.
Testing is a major contributor to this general transformation of the person or self. Because of all the testing presently done, more knowledge has been accumulated about individuals in our society than at any previous time in history. It is a sort of knowledge, however, that probes deep into particulars but does little to connect them. The knowledge is stored in records kept in different places and used for different purposes. Physicians, dentists, optometrists, psychologists, and psychiatrists have records of tests pertaining to the individual's state of health; schools and universities keep records of academic tests; each employer has records of integrity and drug tests and tests of aptitude or skills taken in connection with a job; and the police may have records of lie detector, drug, or forensic tests taken in connection with criminal investigations. Officials in each of these spheres normally have little interest in the records of tests held in another, and even if they did, rules protecting privacy and confidentiality prevent transmission of the information. For example, a previous employer will not share an individual's test and other work
records with a subsequent employer, for fear of grievances or lawsuits.
The result of all this is that individuals are known not in the round but piecemeal—different fragments for different circumstances. Therefore, while testing has indeed produced an explosion of knowledge about persons, it is an explosion in the literal sense of blowing the person apart. It scatters discrete bits of information to diverse points on a vast grid of disparate purposes and interests. The consequence is that the self as a coherent, integrated whole is vanishing. In its place this dissecting way of knowing produces a self as coincidence—a chance intersection of scores, measurements, and records in standardized schemes of classification.[14]
These remarks apply not only to how the person is known to others but also to oneself. As we have seen, a polygraph test has the disconcerting capacity to divide the self and set the parts against each other, for the body betrays the mind to the polygraph machine. Another consequence of testing for self-knowledge may be drawn from George Herbert Mead's insight that "the individual experiences himself . . . not directly, but only indirectly, from the particular standpoints of other individual members of the same social group or from the generalized standpoint of the social group as a whole."[15] If (as argued above) others know a person not in the round but in fragments, and if the person knows oneself through the eyes of others, then the self will know itself as fragmented rather than as a coherent whole. This implies that the decentered knowledge of persons (self as well as others) that stems from testing may be partly responsible for the widespread malaise about the self and the contemporary orientation of popular psychology and psychotherapy toward knowing or "getting in touch with" oneself to become an integrated or "put together" individual. Given the role of testing at the source of the fragmentation of the self, it is ironic that one of the most common methods to get in touch with oneself is through still more tests, either various personality tests administered by professional counselors and psychotherapists or the myriad ten-question tests promising insights into the self that abound in popular magazines.
The piecemeal notion of the person that results from testing should be related, finally, to a general fragmentation that has been identified by social theorists as an important characteristic of postmodern society. As the modernist anchors in absolute truth, fixed reality, and integrative meaning tear loose, the beliefs and institutions of society become disengaged both from firm foundations and from each other. The result is not unlike the disarticulated, coincidental self described above: a collection of loosely connected parts that constantly rearrange their associations as they slide in and out of contact with each other in unstable, varying patterns.[16]
The Hyperreality of Testing
Testing produces the priority of potential over performance because decisions are made and actions are taken on the basis of test results rather than on more direct knowledge of the target information that they supposedly signify. Positivist proponents of testing perceive this as an efficient means of ascertaining how well someone can do something without expending time and resources in a trial period of actually doing it. At least as important, however, it is easier and safer to act on test results because they are "hyperreal."[17] As the term is used here, a situation is hyperreal when a signifier so dominates or enhances the process of representation that it (the signifier) becomes more impressive, memorable, actionable, and therefore more real than what it signifies. A radio baseball commentator described "what might have been the greatest catch I ever saw." He qualified it thus because he saw it in a minor league stadium that lacked a huge screen that could flash instant replays of the catch, reproducing it several times, stopping the action at critical junctures, displaying it from various angles. Instant replays are, of course, representations or signifiers of actual events. But they are hyperreal because they allow richer experience of events than mere observation of the real thing. So deeply have these representations become embedded in our expectations that the commentator
remarked how, having seen the catch only once, as it actually happened, it did not seem quite real.
Test results are also hyperreal. Unlike the more complex and nebulous target information they presumably signify, test results are distinct, explicit, and durable. Partly, this is because they are written down: concrete records that are storable, retrievable, and consultable. They bear the same relation to the aptitude, ability, or other personal qualities they report on that the instant replay bears to the catch in baseball. The one is a signifier or representation of the other, but it is more palpable and scrutable than a fleeting event or an abstract, immaterial quality. The signifier therefore takes on a greater reality than the signified. Test results are unlike instant replays, however, in that the difference between signified and signifier in tests is greater. Test results are expressed in a form that is different—and drastically pared down—from what they signify. Consider, for example, the compression and transformation that is necessary to get from the rich complexity of an intelligence to the single number IQ score that supposedly represents it.
Perhaps most important, test results take on a greater importance and more palpable reality than what they signify because test results are actionable. The abbreviation of test results constitutes an operational advantage when, as commonly happens in contemporary society, many people vie for limited rewards, and those charged with making the selection have no extensive knowledge of the candidates and time only for a rapid review of their credentials. Decision makers for scholarships or fellowships, admission to selective colleges, or other competitive programs often know the candidates only as the bearers of certain academic and extracurricular records, test scores, and assessments in letters of recommendation. Their work would be vastly complicated and perhaps unmanageable if all the applicants were to present themselves as complete human beings with full and nuanced ranges of interests, talents, penchants, and problems. In evaluating people for a position in which intelligence is deemed to be an important criterion, for example, it is faster and simpler to make a decision between two candidates when told that one has an IQ of 125 and the other 115 than when given detailed, richly textured
accounts of their respective cognitive capabilities. Nor should we overlook the security factor involved. Resting decisions on explicit and readily comparable test scores makes it possible to claim that the best selection was made on the basis of the information available, should decisions work out badly or there be troublesome protests or lawsuits.
The Problem of Intelligence
One of the most harmful instances of hyperreality spawned by testing is the peculiar notion of intelligence that holds sway in America. A grossly abbreviated and distorted signifier of the complex and varied capacities of the mind, this notion holds that intelligence is a single entity, that it is unequally distributed in the population such that some people have considerably more of it than others, that each person's allotment of it is fixed throughout life, and that it has a great deal to do with the vocations to which people might aspire and the success they can expect to achieve in them. In some of its forms, "intelligence" has additional, particularly destructive corollaries, such as that it is differentially distributed by gender or among ethnic groups.
Numerous theorists and studies reviewed earlier recommend a viewpoint quite contrary to this notion of intelligence. Gardner and Sternberg promote views of intelligence as multifaceted. Binet held that an individual's intelligence is not fixed but can improve. Varying profiles on intelligence test scores achieved by different ethnic groups can be attributed to differences in culture and socioeconomic variables rather than differing amounts of something called innate intelligence. Longitudinal studies have found little if any correlation between results on intelligence tests taken in childhood or youth and occupational success later in life. And yet the conventional notion of intelligence persists. In its name, rewards and opportunities are extended to some, while others are dismissed as having too little potential to justify investments to develop it. Because intelligence is generally accepted as a preeminently important capacity, those who are con-
sidered to be deficient in it are viewed both by others and themselves as inferior, while those who are thought to be richly endowed with it receive social adulation and develop a bloated sense of self-worth (often accompanied by neurotic fragility born of anxiety that perhaps they are not really as intelligent as everyone thinks they are). In sum, the notion of intelligence that abides in America is both erroneous and detrimental, and it should be changed.
The burden of my argument has been that the conventional concept of intelligence has its source in the abundance of intelligence tests that are routinely given in today's society. If this is true, then a useful way of undoing the popular concept of intelligence would be to do away with intelligence tests. This would be a step of considerable magnitude. It would mean terminating one-on-one tests that style themselves as intelligence or IQ tests, such as the Stanford-Binet and the Wechsler Intelligence Scales. But much more important, also slated for extinction would be the myriad standardized so-called aptitude tests given at all levels in elementary and secondary school, the SAT, ACT, and ASVAB for high school seniors, the GRE, MCAT, LSAT, and GMAT for applicants to graduate or professional schools, and the GATB for applicants who seek jobs through the U.S. Employment Service.
Massive as the prospect may be, such a development is not inconceivable. It may already have begun. Although the trend is still toward increasing emphasis on admissions tests, a reaction against them has sprouted in recent years. Antioch, Bard, Hampshire, and Union colleges, together with some two dozen others, no longer require applicants to submit SAT or ACT scores, and Harvard Business School has dropped the GMAT (Graduate Management Admission Test) as an application requirement.[18] These institutions make their selections on the basis of academic records, written statements by applicants, and letters of recommendation, and they manage to operate their admissions programs effectively without intelligence tests.
If intelligence tests were abolished, it would not be long before the conventional notion of intelligence as a single, quantifiable entity would change. People would distinguish more clearly among a variety of abilities, and these might be demonstrated, as
Howard Gardner suggests, by evidence of an individual's accomplishments in linguistic, musical, logical-mathematical, spatial, bodily-kinesthetic, and personal intelligences.[19] This would not signal the end of all qualifying tests. In addition to other evidence of accomplishments, tests would continue to play a role in decisions about school promotions and graduation as well as competition among aspirants for scholarships, admission to selective colleges and training programs, or employment in desirable jobs. The tests, however, would be strictly past oriented. They would be concerned to measure how well individuals had succeeded in mastering knowledge or skills that had been presented to them in academic courses or technical or artistic training programs. Different individuals would, of course, perform at different levels on these tests, and this would be taken into account along with other accomplishments in deciding who will receive scarce rewards and opportunities.
To develop these practices and attitudes is not unthinkable. They are already well established in some sectors. Consider how evaluation works in a typical American college course. Depending on the discipline, students are usually graded on the basis of some combination of the following: problems or questions to be completed and handed in at regular intervals, laboratory reports, term papers, performance in discussion groups, and tests. Far from being future-oriented intelligence tests, the tests are strictly based on material covered in the course. (To include anything else, as every professor knows, is sure to provoke students to rise up in rebellion.) The notion of general intelligence plays almost no role in the process. When students do not perform adequately and one wishes to understand why, the first questions have to do with how much interest they have in the subject matter and how much effort they put into it. If it is clear that they are interested and are trying hard, investigation turns next to their preparation. Have they developed good study habits? Do they have the requisite background for this course? Have they learned the particular modes of thinking and analysis that are used in this discipline? Academic advisers account for the great majority of cases of unsuccessful course performance in terms of one or another of these lines of investigation. Only for the few cases that remain does the
question of sheer ability or "intelligence" come up. And even then, the matter is posed in terms of the particular abilities appropriate for a specific subject matter (ability to do mathematics, to draw, to interpret poetry, etc.) rather than general intelligence.[20]
If the attitudes represented in this process were to become commonplace, it is likely that we would lose the habit of thinking of intelligence as an all-important, single thing that is distributed unequally among the population. Instead, we would evaluate quality of performance in terms of a variety of factors, only one of which is native ability in that particular area. Such a change in thinking would drastically curtail the destructive view that some people are irredeemably inferior to others by birth, perhaps even by race. It would place primary responsibility for achievement squarely on the individual's effort and hold out the promise that, given a fair opportunity, one's own determination is the major factor in achieving one's goals.
The most important discontinuity in applying the model of procedures in a college classroom to larger evaluation programs has to do with equal opportunity. It is a given that all of the students enrolled in a single course have the opportunity to receive the same instruction, but this, of course, does not hold if large numbers from different localities and backgrounds are being assessed. The applicants will have been exposed to a variety of different experiences and curricula in schools that are anything but uniform in the quality of education they provide. The question is how to achieve a fair evaluation of how well people have acquired academic, technical, artistic, or other skills when some of them have had much richer opportunities to acquire them than others. This is no new problem. It also plagues the present system, for, as we have seen, a direct correlation exists between intelligence test scores and family income. Sad to say, the present proposal offers no magic bullet for solving this most intractable dilemma in our system of education and mass evaluation. Probably no simple solution exists. In the short range, admissions committees and other evaluators will need to continue, as they do now, to factor variables of previous opportunity into their decisions. But this is a difficult process at best, and some decision makers are less adept at it or take it less seriously than others. The
only satisfactory long-range solution is the obvious one of making a massive commitment to provide all primary and secondary school children with equal educational opportunities. And that, of course, will require much more than just fixing the schools. It also involves fostering supportive home environments, and that will be realized only when the larger social problems of poverty and discrimination are successfully resolved.
Domination
From Seduction to Pornography
If some of the most important social consequences of tests flow from their representational character, others stem from the fact that they are devices of power. Testing is an outstanding example of the collusion and mutual extension of power and knowledge (expounded on by Foucault in nearly all his works), because testing as a technique for acquiring knowledge about people has simultaneously operated as a means to extend power over them. How this comes about is, in the most general terms, signaled in the clause of our definition stating that tests are applied by an agency to an individual with the intention of gathering information. Test givers are nearly always organizations, while test takers are individuals. Organizations are richer and stronger than individuals, so a power differential is established at the start.[21] The asymmetrical relation of power is further evident from the total control that the test-giving agency exercises over the situation. The individual is required to submit passively while the agency extracts the information it wants in order to use it for its own purposes.
Compare this situation with how persons are otherwise known. In The Presentation of Self in Everyday Life , Erving Goffman examined how the person seeks to manipulate the impressions that others form of oneself and thereby to exert some measure of control over the social situations in which one participates through a process of creative and selective masking and revela-
tion of the self.[22] The self so presented is typically a nuanced character, evincing a unique pattern of abilities, temperament, and preferences. It is also variable, for depending on the circumstances and ends in view, one may present a self that is forthright and businesslike, or playful, or vindictive, or seductive, or enigmatic, and so on.
The capacity of the self to adopt such a rich variety of roles in social life is grounded in "privileged access." This term refers to the idea that other people have no direct knowledge of what is going on in someone's mind—one's thoughts, desires, day-dreams, fantasies, jealousies, and hidden agendas. The notion that the self can exclude all others from this inner sanctum (except, in some religious persuasions, God) ensures the ultimate uncertainty or mystery that the self can parlay into selective, creative, and variable presentations in the social world. Obviously, if this mystery were dispelled and all one's inner states were transparent to others, one's ability to mold one's public image would be drastically curtailed.
The effect of testing is precisely to dispel that mystery. Testing thwarts privileged access, intruding unchaperoned into the private realm formerly controlled by the self as gatekeeper and monitor of information. We have seen in earlier chapters how such intrusion is the explicit goal of lie detection, but it also occurs in somewhat subtler forms in all kinds of testing. Intelligence tests probe one's cognitive faculties, personality tests profile one's temperamental and emotional state, and drug tests provide information about possible private habits, proclivities, and activities. Production and presentation of knowledge about the self comes under the control of test givers. The self is no longer able, in a test situation, to temper or embellish it. Whatever tempering and embellishing takes place now stems from the tests themselves, which, as we have seen, regularly redefine or even fabricate the qualities they are intended to measure. If the artful presentation of Goffman's self is seductive, what happens in testing is, to borrow a simile from Jean Baudrillard, pornographic.[23] Pornography differs from seduction in that the individual fixed by the pornographic gaze is powerless to conceal, control, or nuance anything. She or he is displayed for the observer's inspection,
recreation, probing, and penetration in whatever way satisfies his or her purely selfish purposes.
The development of testing is an outstanding example of Foucault's thesis that power has been evolving in the direction of increasing efficiency, subtlety, and scope. Tests are applied ever more frequently for an expanding array of purposes. Especially remarkable is that people have increasingly found themselves in the position where they feel their only recourse is to ask, even to insist, that they undergo the pornographic scrutiny of tests. Power has become refined indeed when people demand that they be subjected to it.
In medieval times, this was limited to circumstances in which a person was suspected or accused of some wrongdoing and would demand trial by ordeal or by battle as a means of exoneration. A similar situation exists today when people under investigation by law enforcement agencies or employers, or who feel the need to lend credence to some important statement they have made, demand a lie detector test in an effort to bolster their veracity. The polygraphing of Anita Hill in connection with her accusation of sexual harassment against Supreme Court nominee Clarence Thomas during his confirmation hearings in 1991 is a case in point. People also request or demand drug tests in circumstances of individualized suspicion. An example occurred in 1990 when a Northwest Airlines pilot was called in at the last minute as a substitute to take a flight from Detroit to Atlanta. While they were waiting for him, a woman (recollecting an incident that had occurred six weeks earlier, when three Northwest Airlines pilots were arrested on charges of drunkenness after they landed a plane in Minneapolis) speculated to the other passengers that the delay was probably due to his being drunk or partying. Learning of her statements, the pilot refused to take off until blood and urine tests proved that he was not under the influence of drugs or alcohol.[24]
While people still demand to be exonerated by authenticity tests, as they did centuries ago, modern lie detector and drug tests are much less violent than ordeal by water or hot iron, they involve less expenditure of public resources, and they are used in a wider range of circumstances. This is in line with Foucault's claim that
power has developed in the direction of lighter, more efficient, and more pervasive application. Most important in this regard, contemporary authenticity testing advanced beyond the medieval forms when it burst the limitations of individualized suspicion. No one would demand trial by ordeal or by battle unless they had been accused of some specific misdeed. But with the development of preemployment, periodic, and random lie detection and drug testing, the pool of potential test takers expanded to include people in general, who are suspected of having committed some as yet undiscovered wrongdoing. The advance from individualized to generalized suspicion as grounds for tests vastly increased the number of people who are subject to them and who, therefore, are brought under the exercise of power.
The law has supported the expansion of drug testing to cover those under generalized suspicion, with a number of recent court decisions sustaining random testing. And if testing of hair should become popular, it would constitute an advance beyond urinalysis and blood tests both in the efficiency and simplicity of sample collection and in the length of the period over which drug use could be monitored. In contrast, mechanical lie detector testing in circumstances of generalized suspicion was drastically cut back by the federal antipolygraph law of 1988. Although this deflected the growth trajectory of authenticity testing, it did not stop it. Written integrity tests are filling the breach created by the curtailment of polygraph testing,[25] and the result is likely to be a net gain for authenticity testing. While written tests are largely limited to preemployment testing,[26] from the perspective of efficiency and economy, they are far superior to polygraph tests. The latter require an hour or more of one-on-one contact between examiner and subject, while the standardized format of written tests allows them to be given to subjects either individually or in groups of any size. This, together with the fact that they can be machine graded in a matter of seconds, makes written integrity tests much cheaper than polygraph tests (often under $10 per test as opposed to $50 to $100). Hence they have a growth potential considerably beyond that ever enjoyed by polygraph testing.
If one were to imagine the next step in the perfection of power by testing, it would be for people to request that they be tested in
circumstances of generalized suspicion, as they already ask to be tested when they are under individualized suspicion. At first glance, such a development seems preposterous. Why would anyone demand a test to prove that they are not doing something that nobody accuses or specifically suspects them of doing in the first place? To bring this about would mark a truly ingenious extension of power.
Claims for it have actually been made on behalf of lie detection, but they are not convincing. A high-ranking police officer who manages lie detection in a metropolitan department told me that the police welcome the polygraph screen they all must pass as part of their training, because it bolsters the esprit de corps of the force as a fraternity of outstanding individuals, honest and true. But when I raised this possibility with one of the lower-ranking members of this police fraternity—who was more on the receiving than the giving end of polygraph tests—his answer was a terse and unequivocal, "Bullshit." Again, Zales jewelry chain has argued that polygraph tests boost employee morale because they assure that one's co-workers, superiors and subordinates, are honest people. This is welcome news to employees because it eliminates worry that company profits (and, therefore, one's benefits from the employee profit sharing plan) are being ripped off by unscrupulous fellow workers.[27] But this information comes from the personnel director at Zales rather than from employees themselves. And even if this were a correct characterization of employee attitudes, at most it would mean that they approve a policy of lie detector tests on the basis of individualized suspicion. There is no suggestion that Zales employees make specific requests to be tested unless they are identified as suspects.
It falls to drug testing actually to achieve this next step in the extension of power: getting people positively to endorse—on occasion, even specifically to request—tests of themselves even when no suspicion has been directed against them. One case in point has to do with the use of steroids by athletes. It is widely recognized that steroids enhance performance in many events. Athletes who observe the ban against steroids in their own training are deeply concerned that their competitors who use steroids gain an unfair advantage. The several college athletes whom I
interviewed on this issue expressed the opinion that the only way to be sure that no athletes use steroids is to test all of them. One individual stressed the importance of dealing with this issue early and recommended that the testing begin in high school. The strategy to deter steroid use by testing athletes universally or at random is currently in effect for the Olympics, NCAA events, and as part of the policies governing intercollegiate athletics at many universities. Most important for the present analysis, many if not most athletes approve that strategy although it requires that they themselves submit to testing. In this case, then, the level of power has been achieved where people gladly submit themselves to testing in the absence of individualized suspicion.
Turning to street drugs, consider the students of Oklahoma's Bennington High School, who enlisted in the war on drugs with such fervor that they decided to make themselves an example of a school that is 100 percent drug free. To prove it, the entire student body (all seventy-five of them) voluntarily took drug tests. They all passed, and as evidence of their continuing commitment, 10 percent of them, selected at random, are to be tested again each month. Here is a situation in which people who are under no individualized suspicion of drug use positively ask to be tested. The students proudly wear black T-shirts that proclaim, "Drug Free Youth"; 15-year-old sophomore Christie Wilson gushed, "I just hope that they start doing this drug test all over."[28]
There are some signs of her wish coming true. As discussed above, Chicago's St. Sabina Academy conducts random drug tests of sixth through eighth graders, although drugs have not been a problem within the school. Parents welcomed the move. Their most common response when the program was proposed was, why not begin in kindergarten?
How can we account for people's willingness to expose themselves to the scrutiny of drug tests when they are not suspected of using drugs? In cases of individualized suspicion, a person is already in some degree of trouble, and the offer to take a test is made as an effort to clear oneself. The individual submits to the application of power represented by the test in order to escape from a present threat. Someone who volunteers to take a test when not under individualized suspicion would seem to be under
far less compulsion. That is not the case, however. Power works more subtly here, but no less insistently. What is not a present threat may quickly become one. After a policy has been adopted for voluntary testing of a group, any member of that group (e.g., a student at Bennington High School) who declines to "volunteer" for the test is immediately suspected of having something to hide. The choices become either to submit to the test now in order to avoid being brought under individualized suspicion or to submit to it later in an effort to clear oneself of individualized suspicion.
This reasoning does not account for those who take the lead in movements to encourage voluntary testing or for individuals who are anxious to submit to testing when they do not belong to a group that brings pressure on them to do so. Probably some of them are ingenuous. The gravity of the drug problem, the imperative to win the war on drugs, impresses itself on them so overwhelmingly that they believe extraordinary measures are necessary in the face of a monstrous threat. Hence they willingly open themselves to the power of testing and work to get others to do the same in the name of a great cause that justifies compromising the control they exercise over the collection and promulgation of information about themselves. Others may be more cynical and perceive the war on drugs as an opportunity for self-advancement. A political figure who calls for voluntary drug testing can garner publicity and gain the reputation as a diligent and fearless public servant who demands decisive action against the evil lurking at our very doorsteps. Moreover, the tactic is politically safe. Voluntary drug testing does not call for a significant outlay of funds, and it plays on the acute anxiety about drug abuse that has dominated the media and public opinion in recent years. It is not difficult to dismiss the civil libertarians who carp about invasion of privacy as being soft on drugs and pointedly ask why they should oppose voluntary testing unless they have something to hide.
Authenticity testing has been unmasked here as a technique for maintaining people under surveillance and insidiously transforming them into docile and unwitting subjects of an expanding disciplinary technology of power. What steps can be taken to curtail this threat to the autonomy and dignity of the individual?
I have argued that the expansion of surveillance and coercion exercised by authenticity testing has largely been a story of the increasing application of the tests to people who are under generalized rather than individualized suspicion. It follows that the harmful effects of these tests would be greatly reduced if that development were reversed. Quite simply, then, my suggestion is that authenticity tests be strictly limited to circumstances of individualized suspicion.
One effect of this proposal would be to extend the provisions of the Employee Polygraph Protection Act of 1988 (EPPA). That act outlaws most lie detector tests by polygraph and other mechanical devices in the private sector. It should be expanded to cover the few private industries now exempted. Most important, governmental agencies should be brought under the act, for at present it does not apply to them, and local, state, and federal agencies may use lie detector tests in any way they wish.
In the wake of the EPPA, integrity tests given in written and other forms have flourished in the private sector. These too would be eliminated by my proposal. Some preliminary steps have already been taken in that direction, but efforts to control integrity tests by legislation crafted along the lines of the EPPA are complicated by the fact that it is difficult to construct a watertight definition of them.[29] The EPPA uses a technological definition, proscribing tests that use a mechanical device such as a polygraph machine or psychological stress evaluator. A technological definition is problematic for integrity tests because some of them are taken in written form, others at a computer terminal, and still others orally either by direct interview or over the telephone. The publishers of many of them do not even acknowledge that they are tests, choosing instead to designate them by a wide variety of terms such as "survey," "inventory," or "audit." The policy recommended here that authenticity testing be limited to cases of individualized suspicion avoids this definitional problem because it focuses not on what the tests are but on how and when they are used. It would virtually terminate integrity tests, because they are used almost exclusively in circumstances of generalized suspicion, most especially preemployment testing.
Adoption of this proposal would also bring about drastic reductions in drug testing. Preemployment, periodic, and random tests would be eliminated, for these are all conducted on the basis of generalized rather than individualized suspicion. The only legitimate circumstance for a drug test would be for cause: when there is good reason to suspect from an individual's behavior that the person is under the influence of drugs.[30] There are certain encouraging developments in this direction. As of July 1991, fourteen states had enacted legislation regulating drug testing by private employers. So far as current employees are concerned, a trend toward rejecting testing on the basis of generalized suspicion is visible:
Most of the statutes provide that before requiring drug testing of an employee, the employer must have a reasonable suspicion that the employee is impaired to the point of affecting job performance.[31]
The statutes allow drug tests of job applicants without individualized suspicion, but Montana, at least, restricts such preemployment tests to those applying for jobs involving security, public safety, a hazardous work environment, or fiduciary responsibility.[32]
Implementation of my recommendation would dramatically change the landscape of authenticity testing, and energetic opposition would inevitably be forthcoming from a coalition of interests committed to it. Those with an economic stake are the people and organizations that conduct and market the tests. Politicians who play on public fears about crime and drugs and who use outspoken support for testing as a way to draw attention to themselves and to obtain votes have a political interest in perpetuating authenticity testing. Those with an ideological commitment may be divided into two categories. One is composed of social scientists and others imbued with a positivistic creed that any and all means of acquiring and applying scientific information about people and society should be encouraged as contributions to social progress. The other includes persons of an authoritarian turn of mind who explicitly or implicitly operate on the assumption that people in general are not to be trusted and that
society is best served by firm controls that keep human impulses and liberty in check.
Although the opposition would be formidable, with sufficient resolve, a policy to restrict authenticity testing to cases of individualized suspicion could be implemented in the short term. The reason is that authenticity testing is not yet inextricably woven into the fabric of society. Few other institutions are dependent on it, and therefore it could be drastically reduced with minimal effect on the social structure. Drug testing, for example, is a recent phenomenon. The socioeconomic system got along without it quite adequately some ten years ago, when drug use was actually more prevalent in the United States than it is today. A general policy shift prohibiting preemployment, periodic, and random drug tests would have little effect on hiring and promotion practices, other than to make them less complicated and less expensive. Lie detection by polygraph was never massively practiced in the workplace, and its demise with the passage of the EPPA has not brought private business to its knees. Integrity testing has only been practiced in the last few years, and there has not been time for other business institutions to become systemically dependent on it. Terminating it before it becomes established would not produce major disruptions in personnel practices except, again, to save business the time and expense of giving the tests.
Implementing the policy would, however, require some explicit state or even national commitment in the form of a general agreement among employers or legislation. Organizations are reluctant to cease (or not to commence) authenticity testing in preemployment and other circumstances of generalized suspicion for several reasons. As has been demonstrated, they get the notion that they have to test because otherwise, with everyone else testing, drug abusers and criminals would flock to them. Again, if some organizations routinely conduct preemployment, periodic, and random tests for drugs and/or integrity, those that do not test feel vulnerable in case of accidents or losses to lawsuits claiming that they did not take reasonable precautions. That is, one of the strongest reasons organizations test is the fact that others do. They conclude that they must expend the time and money to conduct authenticity tests out of generalized suspicion not be-
cause they anticipate any particular benefits in productivity but in self-defense. General consensus or legislation restricting authenticity testing to cases of individualized suspicion would remove that incentive.
Testing and the Birth of the Individual
I have argued that several forms of authenticity and intelligence testing pose a threat to the autonomy and dignity of the individual, and my suggestion for countering that threat has been to do away with many of the most offensive tests. This should not be taken as a recommendation that we return to some earlier time when tests were fewer and the individual was freer. No such time ever existed, because the human individual as we know it is a relatively recent creation and one that to a significant extent has been produced by testing.
In probably the most perceptive analysis of testing yet written, Foucault has argued that the contemporary concept of the individual is a product of the development and extension of examinations in the seventeenth and eighteenth centuries.[33] He does not mean, of course, that prior to that time there were no individuals. Obviously, there were, for people had individual names, they could tell each other apart, and it was possible to identify an individual as the same person on encounters at different times or in different places. Foucault is referring instead to the concept of the individual as a complex, dynamic being with specified physical, mental, political, and other properties. He means the individual as an object of study and detailed knowledge, for whom it is possible to define ranges of the normal along various physical and psychological dimensions, to explain the nature and processes of normal development and behavior, to diagnose deviations from the normal, and to intervene with the aim of correcting or treating those deviations. Obviously, all of this requires that there be a rich corpus of knowledge about individuals, and prior to the seventeenth and eighteenth centuries so little information about individual persons was systematically gathered and recorded that discourse about the properties, development, and
pathologies of the human individual was not possible. This changed with more systematic examination of patients in hospitals and students in schools and with the keeping of retrievable records of those examinations and of information about individuals gathered in other contexts such as the military. These developments enabled "the constitution of the individual as a describable, analyzable object."[34] Because Foucault views examinations and record-keeping as disciplinary devices, he regards the individual as both constituted and dominated by the disciplinary technology of power.[35]
If the individual as an object susceptible of description, analysis, and treatment was born with the testing practices of the seventeenth and eighteenth centuries, this study suggests that its mature form is largely a product of the testing practices of the twentieth century. Today's individual is much more richly textured and fine-grained than its ancestor of two and three centuries ago, a being with normalities and pathologies then undreamed of. If, as Foucault maintains, power is exercised over the individual by interventions licensed by knowledge (guidance, treatment, punishment, rehabilitation, etc.), the contemporary individual is subject to even more coercion than its predecessors. As we have seen at every point in this study, the rich panoply of tests that are routinely deployed to probe all aspects of our physical and mental makeup, the decisions and interventions that are taken on the basis of test results, and the asymmetrical relation of power between test givers and test takers conspire to produce an individual who is suspended within an increasingly total network of surveillance and control.
Although "individual" implies singularity and unity, the sheer number and diversity of tests to which people are subjected has a corrosive effect on the integrity of both the concrete reality and the concept of the individual. As different constituencies limit their interest to only some of its tested parts, the twentieth-century individual tends to become a fragmented being. The complete individual is known and treated less as a unique entity than as a coincidental intersection of scores and measurements on a broad array of standardized classifications. And finally, particularly in the mental sphere, contemporary tests are future ori-
ented to a degree far surpassing previous ones. This too has its effect on the person of the late twentieth century. It creates an individual who is less real than hyperreal: not so much a present as a potential or deferred being, defined less by what it is than by what it is likely to become. As the mapping of the human genome proceeds and new genetic tests are developed, we can expect the twenty-first century to bring with it an extension of future-oriented testing into the physical realm. The individual will increasingly be known in terms of the diseases it is likely to contract, how and when the feebleness of old age will set in, and so on.
Taken together with Foucault's path-breaking work and other studies, this analysis of the social consequences of testing indicates that the accumulation and storage of information about the individual have been steadily increasing. The process has accelerated in the twentieth century and shows every sign of continuing in the future. It is not possible to turn the clock back. Nor should we want to, because if the growth of knowledge about the individual has enhanced coercion, it has also encouraged human liberation. As Foucault has pointed out,[36] the seventeenth and eighteenth centuries produced both the military dream of society that leads to the disciplinary technology of power and a liberal dream of individual rights, freedom, and dignity based on the social contract and enshrined in documents such as the American Declaration of Independence and Constitution and the French Declaration of the Rights of Man. Because it has fostered both domination and autonomy, the historical development of the individual is laced with contradictions and tensions. The contradictions offer us, as participants in the formation of our own destiny, toeholds for intervening in the process. To seek to stimulate the growth of personal autonomy and to slow the spread of coercion is not so much an effort to reverse the course of history as to influence the trajectory of the individual's and society's future development. To control testing is one positive intervention that is well within our grasp.