Marvin Minsky, who died on Sunday, was a pioneering researcher in the field of artificial intelligence. Our Profile, from 1981.
DECEMBER 14, 1981 ISSUE
A.I.
BY JEREMY BERNSTEIN
In July of 1979, a computer program called BKG 9.8—the creation of Hans Berliner, a professor of computer science at Carnegie-Mellon University, in Pittsburgh—played the winner of the world backgammon championship in Monte Carlo. The program was run on a large computer at Carnegie-Mellon that was connected by satellite to a robot in Monte Carlo. The robot, named Gammonoid, had a visual-display backgammon board on its chest, which exhibited its moves and those of its opponent, Luigi Villa, of Italy, who by beating all his human challengers a short while before had won the right to play against Gammonoid. The stakes were five thousand dollars, winner take all, and the computer won, seven games to one. It had been expected to lose. In a recent Scientific American article, Berliner wrote:
Not much was expected of the programmed robot. . . . Although the organizers had made Gammonoid the symbol of the tournament by putting a picture of it on their literature and little robot figures on the trophies, the players knew that existing microprocessors could not give them a good game. Why should the robot be any different?
This view was reinforced at the opening ceremonies in the Summer Sports Palace in Monaco. At one point the overhead lights dimmed, the orchestra began playing the theme of the film “Star Wars,” and a spotlight focused on an opening in the stage curtain through which Gammonoid was supposed to propel itself onto the stage. To my dismay the robot got entangled in the curtain and its appearance was delayed for five minutes.
This was one of the few mistakes the robot made. Backgammon is now the first board or card game with, in effect, a machine world champion. Checkers, chess, go, and the rest will follow—and probably quite soon. But what does this mean for us, for our sense of uniqueness and worth—especially as machines evolve whose output we can less and less distinguish from our own? Some sense of what may be in store is touched on in Berliner’s article:
I could hardly believe this finish, yet the program certainly earned its victory. There was nothing seriously wrong with its play, although it was lucky to have won the third game and the last. The spectators rushed into the closed room where the match had been played. Photographers took pictures, reporters sought interviews, and the assembled experts congratulated me. Only one thing marred the scene. Villa, who only a day earlier had reached the summit of his backgammon career in winning the world title, was disconsolate. I told him that I was sorry it had happened and that we both knew he was really the better player.
My own involvement with computers has been sporadic. I am of a generation that received its scientific education just before the time—the late nineteen-fifties—when the use of computers in scientific work became pervasive. I own and can operate one of the new, programmable pocket calculators. I once took a brief course in FORTRAN programming, and the ten-year-old son of a colleague of mine once gave me an afternoon’s worth of instruction in the BASIC programming language, which he uses to operate a typewriter-size computer in his father’s study. But as a theoretical physicist, I have avoided physics problems that have to be run off on large machines. Even so, I have read a great deal over the years about the new computer revolution and the age of the microprocessor: an age in which circuits with thousands of elements can be packed into a computer chip—a silicon wafer—so small that it can be inserted in the eye of a needle; in which the speed of machine operations is measured in billionths of a second; and in which the machines’ limitations resulting from the fact that electromagnetic signals propagate at only the speed of light are beginning to manifest themselves. There are so many books and articles on this subject and its implications that it is hard to distinguish one voice from another. But in all this computer literature I have constantly been delighted by what I have read by Marvin Minsky, who since 1974 has been the Donner Professor of Science at the Massachusetts Institute of Technology. In a paper entitled “Matter, Mind, and Models,” Minsky comments on free will:
If one thoroughly understands a machine or a program, he finds no urge to attribute “volition” to it. If one does not understand it so well, he must supply an incomplete model for explanation. Our everyday intuitive models of higher human activity are quite incomplete, and many notions in our informal explanations do not tolerate close examination. Free will or volition is one such notion: people are incapable of explaining how it differs from stochastic caprice but feel strongly that it does. I conjecture that this idea has its genesis in a strong primitive defense mechanism. Briefly, in childhood we learn to recognize various forms of aggression and compulsion and to dislike them, whether we submit or resist. Older, when told that our behavior is “controlled” by such-and-such a set of laws, we insert this fact in our model (inappropriately) along with other recognizers of compulsion. We resist “compulsion,” no matter from “whom.” Although resistance is logically futile, the resentment persists and is rationalized by defective explanations, since the alternative is emotionally unacceptable.
Later in the paper, Minsky writes:
When intelligent machines are constructed, we should not be surprised to find them as confused and as stubborn as men in their convictions about mind-matter, consciousness, free will, and the like. For all such questions are pointed at explaining the complicated interactions between parts of the self-model. A man’s or a machine’s strength of conviction about such things tells us nothing about the man or about the machine except what it tells us about his model of himself.
I have known Minsky for more than thirty years. When I first met him, in the late nineteen-forties, at Harvard, it was not entirely clear what his major academic field was—or, perhaps, what it wasn’t. He was taking courses in musical composition with the composer Irving Fine. Although he was an undergraduate, he had his own laboratories—one in the psychology department and one in the biology department—and he was writing what turned out to be a brilliant and original senior mathematics thesis on a problem in topology. For all his eclecticism, however, his basic interest seemed to be in the workings of the human mind. When he was a student, he has said, there appeared to him to be only three interesting problems in the world—or in the world of science, at least. “Genetics seemed to be pretty interesting, because nobody knew yet how it worked,” he said. “But I wasn’t sure that it was profound. The problems of physics seemed profound and solvable. It might have been nice to do physics. But the problem of intelligence seemed hopelessly profound. I can’t remember considering anything else worth doing.”
In later years, I had not been in touch with Minsky, but about a year ago, when I realized that something very new in the way of technology was engulfing us, I decided to look him up and ask him about it. I knew that he had been in the field of what is now called artificial intelligence, or A.I., even before it had a name. (The term “artificial intelligence” is usually attributed to John McCarthy, a former colleague of Minsky’s at M.I.T. McCarthy, a mathematician and now a professor of computer science at Stanford, coined the phrase in the mid-nineteen-fifties to describe the ability of certain machines to do things that people are inclined to call intelligent. In 1958, McCarthy and Minsky created the Artificial Intelligence Group at M.I.T., and it soon became one of the most distinguished scientific enterprises in the world.) During our talks, Minsky proved to be a fascinating conversationalist, with an engaging sense of humor and a luminous smile. He has one of the clearest minds I have ever encountered, and he is capable of elucidating the most complicated ideas in simple language—something that is possible only if one has a total mastery of the ideas. Our conversations took place both at his M.I.T. office and at his home, near Boston. He lives in a sprawling house with his wife, Gloria Rudisch, who is a prominent Boston pediatrician, and two of their children—Julie and Henry, eighteen-year-old twins. The Minskys’ oldest child, Margaret, who is twenty-three, graduated from M.I.T. and is now studying astronautics and designing educational programs for home computers.
That a doctor lives in the Minsky house one might deduce from various books and medical supplies at hand, but the interests of the other residents would be a real challenge to figure out. On a table during one of my visits I noticed a fireman’s hat with a red light on it, and, on another table, a sizable plastic shark. Mounted on a wall was a wrench so large that at first I took it for a playful sculpture of a wrench. On the wall near the wrench was what appeared to be a brass alpenhorn—one of several musical instruments in the house, the others being three pianos, two organs, and a Moog synthesizer. Minsky spends many hours composing and improvising, and hopes to make a record of some fugues he has composed in the baroque style. There were also innumerable recording instruments and a huge jukebox—a present from Minsky to his wife.
Minsky’s study, a crowded place, contains a computer terminal; a number of researchers in A.I. all around the country can exchange messages with one another over a computer network they established in 1969. Several times while I was there, Minsky paused to read his “mail”—messages on the terminal’s printout system. Near the telephone is a machine that I naïvely thought might be a stereo set. When Minsky saw me looking at it, he asked if I would like to listen to it. He flipped a few switches, and the machine began to make an uncanny series of ever more complex musical sounds. Minsky told me that some years ago he had taken a box of computer modules home to use in constructing logic circuits. He was having trouble debugging the circuits, because he did not have an oscilloscope—an instrument that renders the behavior of the circuits visible on a screen—and it occurred to him that if he ran computing circuits very fast and wired them to a loudspeaker he might be able to listen to them and tell by the sound if something was wrong. “I connected a couple of speakers to the circuits,” Minsky told me. “And I found that by listening to them I could tell if any of the flip-flops were dead.” Flip-flops are electronic components that can take one of two stable positions. “The machine was making all those sounds, and I started to like them. So I set up various circuits to make little chords and tunes. This thing was going one day when a friend of mine named Edward Fredkin, who’s a professor of computer science at M.I.T., came in, and he said, ‘That sounds pretty good. How did you get it to make those sounds?’ I showed him, and we spent the afternoon making more sounds. Fredkin formed a company to manufacture the machines as toys.”
Minsky’s office in the Artificial Intelligence Laboratory at M.I.T. is equally crowded. There is a plastic statue of a robot. There is a surprisingly lifelike cloth plant. There is also the inevitable computer terminal. The lab has its own large computer, which, over the years, has been rigged with just about every bit of programming anyone could think of. It can open doors in the lab and summon the elevators in the building; it has had mechanical arms attached to it, and special television cameras, to simulate vision, and a radio transmitter, to operate remote-controlled robots. There is also a trophy on it for a chess tournament it once won. Initially, the laboratory was in a ramshackle building that housed a Second World War electronics laboratory, but since 1963 it has been housed on three floors of a modern nine-story building overlooking Technology Square, just across the street from the main M.I.T. campus. About a hundred people work in it, including seven professors, most of them former students of Minsky’s; some twenty-five graduate students; and a corps of people whom Minsky refers to affectionately as hackers. These hackers—computer scientists call an elegant bit of programming a hack—are mostly people who entered M.I.T. and became infatuated with computers. Some never bothered to get their bachelor’s degree, but several have gone on to acquire advanced degrees.
One day, Minsky took me on a tour of the A.I. Laboratory, and explained something of its evolution. When he and McCarthy formed the Artificial Intelligence Group, it consisted only of them and a couple of students. About a year later, when Minsky and McCarthy were talking in a hallway at M.I.T., Jerome Wiesner, who was then directing the school’s Research Laboratory of Electronics, happened by and asked them what they were working on. He found their answers so interesting—McCarthy was initiating a system of time-sharing for computers and was also creating a new and extremely sophisticated computer language, and Minsky was beginning his attempts to get computers to do non-numerical things, such as reasoning by analogy—that he asked them if they needed money for their work. They said they could use a little money for equipment and for students. Not long before, Wiesner had received a joint grant from the armed services to do scientific research, so he was able to provide the money they needed. For some years, they never once had to write a research proposal. Things changed, though, and the laboratory now gets its money—some two and a half million dollars a year—from various government agencies, which require written proposals. In 1968, when the group formally became the Artificial Intelligence Laboratory, Minsky became its director—a job he held until 1973, when he got tired of writing the funding proposals and turned the directorship over to Patrick Winston, one of his former students.
On my tour of the lab, I noticed a giant drawing—perhaps six feet by fifteen—of what I thought at first might be the street plan of a large city. The drawing was taped to a wall on the eighth floor. Minsky told me that it was an engineering drawing of a computer chip, and that the lines were circuits that were photoengraved on the wafer. In fact, the drawing was of the circuitry on a chip that was an essential part of the first microcomputer designed expressly for artificial-intelligence work. That computer was designed by Gerald Sussman—a former student of Minsky’s, who is now a professor of electrical engineering at M.I.T.—and some of his students. Minsky then took me down to the third floor to see the actual chip. It was less than half an inch square, or roughly a hundred thousand times smaller than its circuitry diagram, and we had to put it under a microscope just to see the circuitry lines. On computer chips, a transistor exists where two circuitry lines cross; each transistor is about seven micrometres across—about the size of a red blood cell. The next generation of transistors will be only a fourth as large.
While we were on the third floor, Minsky also showed me a computer that he had designed and built. In 1970, he became convinced that a computer that could produce animated visual displays would be an extremely valuable aid in schools. “Even young children become deeply engaged with ideas about computers when they can literally see what they are doing by creating moving pictures on a screen,” Minsky told me. “So I designed this computer capable of making two million dots a second on the screen—enough for realistic animation effects.” By comparison, typical hobby computers can draw only a few thousand dots a second. Moreover, they cannot display a whole book-size page of text, so Minsky included a second screen on his computer which had room for six thousand alphabetical characters, so that children could edit their compositions on it. One fifth grader in Lexington, Massachusetts, programmed a garden of flowers that appeared on the screen and grew according to laws that the child wrote into his program.
Minsky called his computer the 2500, because he thought that its price for schools would be twenty-five hundred dollars. For a year, he immersed himself in its design, and learned to read circuit diagrams as if they were novels. “By the time I finished, I knew what happened in about two hundred different kinds of computer chips,” he told me. His work was helped along by the work of the Artificial Intelligence Laboratory at Stanford, which John McCarthy had formed in 1963: it had developed programs that automatically analyzed circuit diagrams for short circuits and other flaws. Using these programs on his own computer console, Minsky sat in his office and designed his machine. It needed some three hundred chips, which he ordered from the Texas Instruments catalogue. Its circuits required twenty-four pages of drawings. “Wiring a computer used to be a huge task,” Minsky noted. “But in this case my son, Henry, and I were able to do it ourselves by making use of a computer program that some of my friends at Stanford had written. It automatically did the most repetitive parts of the design and checked for mistakes. The best part was that we could put the whole thing on magnetic tape, which could be read by an automatic wiring machine that actually made the connections on the back of a huge panel of chip sockets. When that was done, we had to plug the three hundred chips in and attach power supplies, keyboards, and television screens. It wasn’t all that easy, but it did prove that a small group of people—together with a helpful computer-design program—could do better than a large industrial-design division.” By this time, Seymour Papert, a South African mathematician, had come to the A.I. Laboratory. In the late fifties, Papert had been working in Jean Piaget’s renowned child-psychology laboratory, in Geneva, and he had a professional interest in the education of children. He made a special mathematical language for the machine—one that he and Minsky thought children might like—called the LOGO language. Minsky showed me how to use it to get the machine to draw all sorts of polygons on its display screen and make some of them rotate like propellers. He hadn’t used the program for a while, and at one point he was stopped by a display that read, “POLY WANTS MORE DATA.” The data were supplied.
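The flavor of such a LOGO exercise is easy to suggest in modern terms. The sketch below is only a rough analogue, written in Python’s LOGO-inspired turtle module rather than in the LOGO that ran on the 2500, and the particular polygons, lengths, and turning angles are arbitrary choices made for illustration:

```python
# A rough modern analogue (not the original LOGO) of drawing polygons on a
# screen and spinning one like a propeller, using Python's turtle module,
# which borrows its commands from LOGO's turtle graphics.
import turtle


def polygon(t, sides, length):
    """Draw a regular polygon by repeating 'go forward, turn left'."""
    for _ in range(sides):
        t.forward(length)
        t.left(360 / sides)


def propeller(t, sides=3, length=80, steps=36):
    """Erase and redraw a polygon at successive headings so it appears to spin."""
    for step in range(steps):
        t.clear()                  # wipe the previous frame
        t.setheading(step * 10)    # rotate the whole figure a little
        polygon(t, sides, length)


if __name__ == "__main__":
    screen = turtle.Screen()
    pen = turtle.Turtle()
    pen.speed(0)                   # draw as fast as the display allows
    polygon(pen, 5, 100)           # first a pentagon
    propeller(pen)                 # then a triangle rotating like a propeller
    screen.exitonclick()
```

Running it opens a window, draws the pentagon, then clears it and spins the triangle; clicking closes the window.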
In the early seventies, Minsky and Papert formed a small company to market the machines, but within a few years it went broke. “Seymour and I weren’t very good at getting people to part with their money,” Minsky said. Many schools seemed to like the machine, but they often took as long as three years to come up with the money, and the company was hurt by the delays. “Our company ran out of money, because we had not realized how much time it would take for teachers to persuade school boards to plan budgets for such things,” Minsky told me. “Finally, we gave the company to a Canadian friend, who found that business people could learn Papert’s LOGO language as easily as children could. His company became successful—but in the field of business-data processing. Seymour and I went back to being scientists.” In the past year, the people working on LOGO have managed to find ways of programming it into some of the popular home computers, and Minsky and Papert are again trying to make it available to children, since the machines have now become cheap enough for schools to buy. In a few years, Minsky thinks, they should become as powerful as his original 2500.
Minsky was born in New York City on August 9, 1927. His father, Henry Minsky, was an eye surgeon who was also a musician and a painter, and he became head of the department of ophthalmology at Mount Sinai Hospital; his mother, Fannie, has been active in Zionist affairs. Minsky has two sisters—Ruth, who is younger, and Charlotte, who is older. Charlotte is an architect and a painter, and Ruth is a genetics counsellor for the Committee to Combat Huntington’s Disease. Minsky is like many other gifted mathematicians in that he can find no trace of a mathematical bent in his background, and also in that he has mathematical memories that go back to his earliest childhood. He recalls taking an intelligence test of some sort when he was about five. One of the questions was what the most economical strategy was for finding a ball lost in a field where the grass was so tall that the ball could not be seen immediately. The standard answer was to go to the center of the field and execute a spiral from the center until the ball was found. Minsky tried to explain to the tester that this was not the best solution, since, because you would have had to cross part of the field to get to the center in the first place, it would involve covering some of the area twice. One should start from the outside and spiral in. The memory of being unable to convince the tester of what appeared to Minsky to be an obvious logical point has never left him. “Everyone remembers the disillusion he experienced as a child on first discovering that an adult isn’t perfect,” he said recently. The five-year-old Minsky must have made a favorable impression nonetheless, since on the basis of the test results he was sent to an experimental public school for gifted children. He hated the school, because he was required to study tap dancing. Soon after his enrollment, however, his parents moved from Manhattan to Riverdale, in the Bronx, and he entered a public school there. He also disliked that one. “There were bullies, and I was physically terrorized,” Minsky told me. “Besides, a teacher wanted me to repeat the third grade because my handwriting was bad. My parents found this unreasonable, so in 1936, when I was in the fourth grade, I was sent to Fieldston—a progressive private school.”
J. Robert Oppenheimer had graduated from Fieldston (the branch on Central Park West, then named the Ethical Culture School) in 1921, and while Minsky was there the memory of Oppenheimer’s student days was still fresh. “If you did anything astonishing at Fieldston, some teacher would say, ‘Oh, you’re another Oppenheimer,’ ” Minsky recalled. “At the time, I had no idea what that meant. Anyway, at Fieldston I had a great science teacher—Herbert Zim. Later on, he wrote a whole series of science books for children. He lives in Florida now, and I call him up every once in a while to chat.”
By the time Minsky was in the fifth grade, he had become interested in both electronics and organic chemistry. “I had been reading chemistry books, and I thought it would be nice to make some chemicals,” he told me. “In particular, I had read about ethyl mercaptan, which interested me because it was said to be the worst-smelling thing around. I went to Zim and told him that I wanted to make some. He said, ‘Sure. How do you plan to do it?’ We talked about it for a while, and he convinced me that if we were going to be thorough we should first make ethanol, from which we were to make ethyl chloride. I did make the ethanol and then the ethyl chloride, which instantly disappeared. It’s about the most volatile thing there is. I think Zim had fooled me into doing this synthesis knowing that the product would evaporate before I actually got to make that awful mercaptan. I remember being sort of mad, and deciding that chemistry was harder than it looked on paper, because when you synthesize something it can just disappear.”
Minsky finished the eighth grade at Fieldston in 1941, and in the fall of that year he entered the Bronx High School of Science. Bronx Science had been created just three years before to attract and train young people interested in the sciences. (Two of the 1979 Nobel laureates in physics—Steven Weinberg and Sheldon Glashow—were classmates at Bronx Science in the late forties, along with Gerald Feinberg, who is now the chairman of the physics department at Columbia, and during their senior year there the three taught themselves quantum mechanics.) “The other kids were people you could discuss your most elaborate ideas with and nobody would be condescending,” Minsky said in recalling his experience there. “Talking to people in the outside world was always a pain, because they would say, ‘Don’t be so serious—relax.’ I used to hate people saying ‘Relax.’ I was a hyperactive child—always zipping from one place to the next and doing things very fast. This seemed to bother most adults. But no one at Science felt that way. Later, when I went to Harvard, I was astonished at how much easier the course work was there than it had been at Science. I keep running across people I knew at Science—including Russell Kirsch, a computer pioneer who’s now at the National Bureau of Standards, and Anthony Oettinger, who’s a professor of applied mathematics and information-resources policy at Harvard. He was one of the first people to get a computer to learn something and to use computers for language translation. Frank Rosenblatt, who was tragically drowned in a boating accident in 1971, was also one of my classmates at Science.” Rosenblatt, who was a pioneer in artificial intelligence, became a researcher at the Cornell Aeronautical Laboratory, where he invented what was called the Perceptron. In the early nineteen-sixties, the Perceptron became the prototypical artificial-intelligence machine for a generation of young computer scientists.
In 1944, Minsky’s parents decided to send him to Andover for his senior year, reasoning that it would be easier for him to get into college as an Andover graduate. The year at Andover left Minsky with mixed feelings, because he found he was not permitted to devote himself exclusively to science. When he finished his year there, it was June of 1945 and he was seventeen. The war was still on, and he enlisted in the Navy. He had been told that if he enlisted in a particular Navy program he would be sent to electronics school. “Everybody was a bit suspicious about such a promise, since we felt that you couldn’t trust the government in something like that,” he recalled. “But it turned out to be true, and I was sent to the Great Lakes Naval Training Center, north of Chicago, to start my training. There were about a hundred and twenty people in my company, and most of them seemed very alien and rather scary. They were regular recruits from the Midwest and places like that. I could hardly understand what they said, and they certainly couldn’t understand what I was talking about. They provided my first—and, essentially, my last—contact with nonacademic people. But about forty of the people in my company were enrolled in the same sort of electronics program that I was. After we completed basic training, which involved firing rifles and anti-aircraft guns, we were going to be sent to radar school. There were maybe four people in my company who were really remarkable, including a mathematician, an astronomer, and a young musician named David Fuller. Fuller had been at Harvard for a year and was an organist. He took my music very seriously. By this time, I had sort of drafted a piano concerto, which Fuller liked a lot and said I should finish. I never did, though. Our little group was a strange kind of mini-Harvard in the middle of the Navy. Everything seemed very unrealistic. I practiced shooting down planes on an anti-aircraft simulator. I held the base record. I ‘shot down’ a hundred and twenty planes in a row. I realized that I had memorized the training tape and knew in advance exactly where each plane would appear. But I must have some odd skill in marksmanship. Many years later, my wife and I were in New Mexico on a trip. We came across some kids shooting at things with a rifle. I asked them if I could try it, and I hit everything. It seems that I have a highly developed skill at shooting things, for which there is no explanation.”
Minsky had been at the training center only two months when the war ended. “There really wasn’t anything for us to do, so we just spent a couple of months chatting. I finished up my term of enlistment at a naval base in Jacksonville, Florida, and then I was discharged—in time for me to go to Harvard as a freshman,” he told me.
Minsky entered Harvard in September of 1946 and found the place a revelation—a sort of intellectual garden salad, “a whole universe of things to do.” He said, “The only thing I was worried about was English, because there was a required English composition course. The thing I had always disliked most of all in school was writing. I could never think of anything to write about. Now I love to write. Anyway, they had a test that, if you passed, could get you out of the required course. I passed, and it was one of the best things that happened to me. I felt that I would not have to do the one thing I hadn’t liked in high school. In this test, we were supposed to interpret a couple of passages from Dostoevski, and in a perfectly straightforward way I explained what they were about. Apparently, whoever was reading all those things was tired of reading the long ones that the other students had done, and he passed me. I always enjoyed the challenge of school tests, but I never liked the idea of tests, so, as a professor, I have never given any. I make all my students write a paper instead. I don’t care how long it takes them. If they take a year or two, so much the better. Anyway, I took freshman physics and advanced calculus at Harvard—I had learned elementary calculus at Andover. I was nominally a physics major, but I also took courses in sociology and psychology. I got interested in neurology. Around the end of high school, I had started thinking about thinking. One of the things that got me started was wondering why it was so hard to learn mathematics. You take an hour a page to read this thing, and still it doesn’t make sense. Then, suddenly, it becomes so easy it seems trivial. I began to wonder about the learning process and about learning machines, and I invented some reinforcement theories. I came across the theories of B. F. Skinner, which I thought were terrible, because they were an attempt to fit curves to behavior without any internal ideas. Up until this time, I had been almost pathologically uninterested in how minds work. I wasn’t at all good at guessing how people felt. I think I was generally insensitive—almost intentionally insensitive—to people’s feelings and thoughts. I was interested only in what I was doing. But in my freshman year I began to get interested in psychological issues. After I had done some reading in neurology, I talked a professor of zoology, John Welsh, into letting me do some laboratory work on my own. For some reason, he gave me a huge room with a lot of equipment all to myself.”
Welsh told Minsky that one unsolved problem was how the nerves in a crayfish claw worked. “I became an expert at dissecting crayfish,” Minsky said. “At one point, I had a crayfish claw mounted on an apparatus in such a way that I could operate the individual nerves. I could get the several-jointed claw to reach down and pick up a pencil and wave it around. I’m not sure that what I was doing had much scientific value, but I did learn which nerve fibres had to be excited to inhibit the effects of another fibre so that the claw would open. And it got me interested in robotic instrumentation—something that I have now returned to. I am trying to build better micromanipulators for surgery, and the like. There hasn’t been much progress in that field for decades, and I’m determined to make some.”
When Minsky was not doing physics or working on his crayfish project, he began hanging around the psychology laboratory, which was then in the basement of Memorial Hall. “The people down in that basement fascinated me,” Minsky said. “There were Skinner and his people at the western end. While the theory that they were working with was of no interest to me, they had been able to optimize the training of animals—get them to do things in a shorter time and with less reward than anyone else could. Clearly, there was something in their technique that should be understood. At the other end of the basement were people who were also called psychologists, and who were totally removed from the sort of thing that Skinner did. For example, there was a man who was trying to show that the sensitivity of the ear operated according to a power law rather than a logarithmic one. I could never make any sense of why that was so important, and still can’t; presumably, both theories are false. In the middle of the basement were some young assistant professors who were new kinds of people. There were young George Miller—now a professor of psychology at Princeton—who was trying to make some mathematical theories of psychology, and with whom I spent lots of time, and J. C. R. Licklider, with whom I later worked. He ran a wonderful seminar at that time, mostly of graduate students, with a few undergraduates. I worked with Miller on theories of problem-solving and learning, and with Licklider on theories of perception and brain models. Many years later, I had a chance to work with Licklider again on designing computer programs. It was a whole universe in that basement, but the things that affected me most were the geometry of it and the fact that it was underground and away from the world. On the west were the behaviorists, who were trying to understand behavior without a theory; on the east were the physiological psychologists, who were trying to understand some little bit of the nervous system without any picture of the rest; and in the middle were these new people who were trying to make little theories that might have something to do with language and learning and the like but weren’t really getting anywhere. Even farther underground was the physicist Georg von Békésy. He was in the subbasement. He didn’t bother anyone but just worked on the real problem of how the ear functions.” In 1961, von Békésy became the first physicist to win a Nobel Prize in Physiology or Medicine, for his work on the ear.
Minsky paused, and then continued, “What bothered me most about the whole situation was the graduate students who were trying to learn from these people. They would gather in the middle of the basement and argue about one doctrine or another—the politics of the situation and the merits of the different schools. They never seemed to have any good ideas. There was something terrifying about this clash of two different worlds—the physiological and the behaviorist. There were no psychoanalytically oriented people around them. If there had been, the situation would have been even worse. I couldn’t fathom how these people could live down there arguing about personalities, with no methodology, no ideas about what to do, and no real theories of what was happening deep inside the mind. So I tried to make one up. I imagined that the brain was composed of little relays—the neurons—and each of them had a probability attached to it that would govern whether the neuron would conduct an electric pulse; this scheme is now known technically as a stochastic neural network. I tried to explain Skinner’s results by finding some plausible way for a reward sensor to change the probabilities to favor learning. It turned out that a man in Montreal named Donald Hebb had come up with a similar theory, but at the time, luckily or unluckily, I didn’t know of his work, which he described soon afterward in a seminal book, “The Organization of Behavior,” published in 1949. So I had a laboratory in the psychology department and one in the biology department and I was doing experimental work, mostly in physical optics, in the physics department, where I was nominally majoring. My grades were fairly low. I had also taken a number of music courses with Irving Fine. He usually gave me C’s or D’s, but he kept encouraging me to come back. He was a tremendously honest man. I think the problem was that I was basically an improviser—one of those people who can occasionally improvise an entire fugue in satisfactory form without much conscious thought or plan. The trouble is that the more I work on a piece deliberately, the worse it gets. I tried learning to write scores, but I guess I never committed myself to the effort it takes. During most of this time at Harvard, I didn’t very much care what would happen to me in the future, but then, in my senior year, I began to worry about graduate school. I thought that what I would do would be to write a nice undergraduate thesis to make up for my grades. I discovered that at Harvard you couldn’t do an undergraduate thesis in physics, so in my last semester I switched to the mathematics department, where you could do a thesis. This was not a problem, since I had taken enough mathematics courses to qualify me as a math major.”
Early in his college days, Minsky had had the good fortune to encounter Andrew Gleason. Gleason was only six years older than Minsky, but he was already recognized as one of the world’s premier problem-solvers in mathematics; he seemed able to solve any well-formulated mathematics problem almost instantly. Gleason had served in the Navy, in cryptanalysis, during the war, and then had become a junior fellow at Harvard. (The fellowships allowed unlimited freedom for a small number of creative people in various fields.) Gleason made a tremendous impression on Minsky. “I couldn’t understand how anyone that age could know so much mathematics,” Minsky told me. “But the most remarkable thing about him was his plan. When we were talking once, I asked him what he was doing. He told me that he was working on Hilbert’s fifth problem.”
In 1900, David Hilbert, who is generally regarded as the greatest mathematician of the twentieth century, delivered a paper entitled “Mathematical Problems” to the Second International Congress of Mathematicians, in Paris, in which he presented a list of what he believed to be the most important unsolved problems in mathematics. (Hilbert’s full list consisted of twenty-three problems, but he presented only ten in his talk.) This list has all but defined mathematics for much of this century. Many of the problems have now been solved. At least one of them has been shown to be insoluble in principle; it falls into Gödel’s category of formally undecidable mathematical propositions. The sixth problem—“To axiomatize those physical sciences in which mathematics plays an important role”—is probably too vague to have a real solution. Some have not yet been solved. Most of the twenty-three problems have opened up entirely new fields of mathematics. In his lecture, Hilbert said, “This conviction of the solvability of any mathematical problem is a strong incentive in our work; it beckons us: This is the problem, find its solution. You can find it by pure thinking, since in mathematics there is no Ignorabimus!” Hilbert’s fifth problem was a deep conjecture in the theory of topological groups. In mathematics, a group is a collection of abstract objects that can be combined by some operation to make a sort of multiplication table; a topological group has in addition a “topology”—a kind of generalized geometry. Minsky recalls his conversation with Gleason vividly. “First, I managed to understand what the problem was,” he told me. “Then I asked Gleason how he was going to solve it. Gleason said he had a plan that consisted of three steps, each of which he thought would take him three years to work out. Our conversation must have taken place in 1947, when I was a sophomore. Well, the solution took him only about five more years, with Deane Montgomery, of the Institute for Advanced Study, and Leo Zippin, of Queens College, contributing part of the proof. But here I was, a sophomore, talking to this man who was only slightly older than I was, and he was talking about a plan like that. I couldn’t understand how anyone that age could understand the subject well enough to have such a plan and to have an estimate of the difficulty in filling in each of the steps. Now that I’m older, I still can’t understand it. Anyway, Gleason made me realize for the first time that mathematics was a landscape with discernible canyons and mountain passes, and things like that. In high school, I had seen mathematics simply as a bunch of skills that were fun to master—but I had never thought of it as a journey and a universe to explore. No one else I knew at that time had that vision, either.”
Inspired by Gleason, Minsky began work the fall of his senior year on an original problem in topology. Early in this century, the great Dutch mathematician L. E. J. Brouwer proved the first of what are known as fixed-point theorems. Imagine that one attempts to rearrange the surface of an ordinary sphere by taking each point on it and moving it somewhere else on the sphere. This is what mathematicians call a mapping of the surface of the sphere onto itself. Under very general assumptions, Brouwer managed to show that in any such mapping there would be at least one point that would necessarily remain fixed—it would necessarily be mapped onto itself. One example of a fixed-point theorem is the rigid rotation of a sphere—for example, the surface of the earth. In this case, there are two fixed points—the north and south poles—around which the rotation takes place. Over the years, mathematicians have generalized this theorem in all sorts of surprising ways. For instance, one can use similar ideas to show that at any time there must be two points at opposite ends of the globe which have exactly the same temperature and humidity. Minsky happened to read that Shizuo Kakutani, a mathematician at Yale, had proved, essentially, that at each moment there are three points on the earth situated at the vertices of an equilateral triangle at which the temperature is the same. “I became convinced that Kakutani had not got the most general result out of his logic,” Minsky recalled. “So I proved it first for three of the four corners of a square and then for any three points of a regular pentagon. This required going into a space of a higher dimension. So I went into this higher dimension for a couple of months, living and breathing my problem. Finally, using the topology of knots in this dimension, I came out with a proof. I wrote it up and gave it to Gleason. He read it and said, ‘You are a mathematician.’ Later, I showed the proof to Freeman Dyson, at the Institute for Advanced Study, and he amazed me with a proof that there must be at least one square that has the same temperature at all four vertices. He had found somewhere in my proof a final remnant of unused logic.”
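In symbols, the basic statement that these fixed-point results elaborate on is Brouwer’s theorem, given here in its usual textbook form, which is stated for the closed ball rather than for the surface of a sphere:

```latex
% Brouwer's fixed-point theorem, in its standard textbook form.
\textbf{Theorem (Brouwer).} Let $D^n = \{\, x \in \mathbb{R}^n : \|x\| \le 1 \,\}$ be the
closed $n$-dimensional ball, and let $f \colon D^n \to D^n$ be continuous. Then there is
at least one point $x^{\ast} \in D^n$ with
\[
  f(x^{\ast}) = x^{\ast}.
\]
```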
I asked Minsky if he had published his proof.
“No,” he replied. “At the time, I was influenced by the example of my father. When he made a surgical discovery, he would take six or seven years to write it up—correcting it and doing many more operations to make sure that he was right. I felt that a successful scientist might publish three or four real discoveries in his lifetime, and should not load up the airwaves with partial results. I still feel that way. I don’t like to take some little discovery and make a whole paper out of it. When I make a little discovery, either I forget about it or I wait until I have several things that fit together before I write them up. In any case, at the time Gleason said ‘You are a mathematician’ he also said ‘Therefore you should go to Princeton.’ At first, I felt rejected. I was perfectly happy at Harvard, and I didn’t see why I should go somewhere else for graduate school. But Gleason insisted that it would be wrong for me to stay in one place. So I presented myself at Princeton, to the mathematics department, the next year.”
Minsky found the mathematics department at Princeton to be another perfect world. “It was like a club,” he told me. “The department admitted only a handful of graduate students each year, mostly by invitation. It was run by Solomon Lefschetz. He was a man who didn’t care about anything except quality. There were no exams. Once, I got a look at my transcript. The graduate school required grades. Instead of the usual grades, all the grades were A’s—many of them in courses I had never taken. Lefschetz felt that either one was a mathematician or one wasn’t, and it didn’t matter how much mathematics one actually knew. For the next three years, I hung around a kind of common room that Lefschetz had created for the graduate students, where people came to play go and chess as well as new games of their own invention, and to talk about all sorts of mathematics. For a while, I studied topology, and then I ran into a young graduate student in physics named Dean Edmonds, who was a whiz at electronics. We began to build vacuum-tube circuits that did all sorts of things.”
As an undergraduate, Minsky had begun to imagine building an electronic machine that could learn. He had become fascinated by a paper that had been written, in 1943, by Warren S. McCulloch, a neurophysiologist, and Walter Pitts, a mathematical prodigy. In this paper, McCulloch and Pitts created an abstract model of the brain cells—the neurons—and showed how they might be connected to carry out mental processes such as learning. Minsky now thought that the time might be ripe to try to create such a machine. “I told Edmonds that I thought it might be too hard to build,” he said. “The one I then envisioned would have needed a lot of memory circuits. There would be electronic neurons connected by synapses that would determine when the neurons fired. The synapses would have various probabilities for conducting. But to reinforce ‘success’ one would have to have a way of changing these probabilities. There would have to be loops and cycles in the circuits so that the machine could remember traces of its past and adjust its behavior. I thought that if I could ever build such a machine I might get it to learn to run mazes through its electronics—like rats or something. I didn’t think that it would be very intelligent. I thought it would work pretty well with about forty neurons. Edmonds and I worked out some circuits so that—in principle, at least—we could realize each of these neurons with just six vacuum tubes and a motor.”
Minsky told George Miller, at Harvard, about the prospective design. “He said, ‘Why don’t we just try it?’ ” Minsky recalled. “He had a lot of faith in me, which I appreciated. Somehow, he managed to get a couple of thousand dollars from the Office of Naval Research, and in the summer of 1951 Dean Edmonds and I went up to Harvard and built our machine. It had three hundred tubes and a lot of motors. It needed some automatic electric clutches, which we machined ourselves. The memory of the machine was stored in the positions of its control knobs—forty of them—and when the machine was learning it used the clutches to adjust its own knobs. We used a surplus gyropilot from a B-24 bomber to move the clutches.”
Minsky’s machine was certainly one of the first electronic learning machines, and perhaps the very first one. In addition to its neurons and synapses and its internal memory loops, many of the networks were wired at random, so that it was impossible to predict what it would do. A “rat” would be created at some point in the network and would then set out to learn a path to some specified end point. First, it would proceed randomly, and then correct choices would be reinforced by making it easier for the machine to make this choice again—to increase the probability of its doing so. There was an arrangement of lights that allowed observers to follow the progress of the rat—or rats. “It turned out that because of an electronic accident in our design we could put two or three rats in the same maze and follow them all,” Minsky told me. “The rats actually interacted with one another. If one of them found a good path, the others would tend to follow it. We sort of quit science for a while to watch the machine. We were amazed that it could have several activities going on at once in its little nervous system. Because of the random wiring, it had a sort of fail-safe characteristic. If one of the neurons wasn’t working, it wouldn’t make much of a difference—and, with nearly three hundred tubes and the thousands of connections we had soldered, there would usually be something wrong somewhere. In those days, even a radio set with twenty tubes tended to fail a lot. I don’t think we ever debugged our machine completely, but that didn’t matter. By having this crazy random design, it was almost sure to work, no matter how you built it.”
Minsky went on, “My Harvard machine was basically Skinnerian, although Skinner, with whom I talked a great deal while I was building it, was never much interested in it. The unrewarded behavior of my machine was more or less random. This limited its learning ability. It could never formulate a plan. The next idea I had, which I worked on for my doctoral thesis, was to give the network a second memory, which remembered after a response what the stimulus had been. This enabled one to bring in the idea of prediction. If the machine or animal is confronted with a new situation, it can search its memory to see what would happen if it reacted in certain ways. If, say, there was an unpleasant association with a certain stimulus, then the machine could choose a different response. I had the naïve idea that if one could build a big enough network, with enough memory loops, it might get lucky and acquire the ability to envision things in its head. This became a field of study later. It was called self-organizing random networks. Even today, I still get letters from young students who say, ‘Why are you people trying to program intelligence? Why don’t you try to find a way to build a nervous system that will just spontaneously create it?’ Finally, I decided that either this was a bad idea or it would take thousands or millions of neurons to make it work, and I couldn’t afford to try to build a machine like that.”
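The reinforcement scheme Minsky describes is easy to caricature in software. What follows is only that: a toy sketch in Python, not a reconstruction of the vacuum-tube machine or of the thesis networks. The maze, the reward increment, and the names in it are invented for illustration. A simulated rat wanders from cell to cell, and whenever a wander happens to reach the goal, every move along that path has its weight nudged upward, so that successful routes become more probable on later trials:

```python
# A toy caricature (not Minsky's design) of reinforcement learning in a maze:
# each cell keeps a table of move weights, used as probabilities, and moves
# that lay on a path reaching the goal have their weights increased slightly.
import random

# A tiny maze as an adjacency list: from each cell, the cells you can move to.
MAZE = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "GOAL"],
    "GOAL": [],
}
START, GOAL = "A", "GOAL"

# "Synapse" weights: one positive number per possible move, used as
# unnormalized probabilities (all moves start out equally likely).
weights = {cell: {nxt: 1.0 for nxt in nxts} for cell, nxts in MAZE.items()}


def choose(cell):
    """Pick the next cell at random, in proportion to the current weights."""
    nxts = list(weights[cell])
    return random.choices(nxts, weights=[weights[cell][n] for n in nxts])[0]


def run_trial(max_steps=50, reward=0.5):
    """Let the simulated rat wander; if it reaches the goal, reinforce its path."""
    cell, path = START, []
    for _ in range(max_steps):
        if cell == GOAL:
            for frm, to in path:            # reinforcement: make every move on
                weights[frm][to] += reward  # this successful path more probable
            return len(path)
        nxt = choose(cell)
        path.append((cell, nxt))
        cell = nxt
    return None  # wandered too long; no reinforcement


if __name__ == "__main__":
    random.seed(0)
    lengths = [run_trial() for _ in range(200)]
    successes = [n for n in lengths if n is not None]
    early, late = successes[:20], successes[-20:]
    print("early average path length:", sum(early) / len(early))
    print("late average path length: ", sum(late) / len(late))
```

Run repeatedly, the later trials tend to take shorter paths than the early ones, which is all the scheme claims: unrewarded behavior stays random, and nothing in it amounts to a plan.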
I asked Minsky why it had not occurred to him to use a computer to simulate his machine. By this time, the first electronic digital computer—named ENIAC, for “electronic numerical integrator and calculator”—had been built, at the University of Pennsylvania’s Moore School of Electrical Engineering; and the mathematician John von Neumann was completing work on a computer, the prototype of many present-day computers, at the Institute for Advanced Study.
“I knew a little bit about computers,” Minsky answered. “At Harvard, I had even taken a course with Howard Aiken”—one of the first computer designers. “Aiken had built an electromechanical machine in the early forties. It had only about a hundred memory registers, and even von Neumann’s machine had only a thousand. On the one hand, I was afraid of the complexity of these machines. On the other hand, I thought that they weren’t big enough to do anything interesting in the way of learning. In any case, I did my thesis on ideas about how the nervous system might learn. A couple of my fellow graduate students—Lloyd Shapley, a son of the astronomer Harlow Shapley, and John Nash—helped out with a few points, and occasionally I talked to von Neumann. He was on my thesis committee, along with John W. Tukey and A. W. Tucker, who had succeeded Lefschetz as chairman of the mathematics department. Later, Tucker told me that he had gone to von Neumann and said, ‘This seems like very interesting work, but I can’t evaluate it. I don’t know whether it should really be called mathematics.’ Von Neumann replied, ‘Well, if it isn’t now, it will be someday—let’s encourage it.’ So I got my Ph.D.”
That was in 1954. “I hadn’t made any definite plans about what to do after I got the degree, but, the year before, some interesting people had come along and said that they were starting a new kind of department at Tufts, which was to be called systems analysis, and that if I came I could do anything I wanted to,” Minsky said. “I wanted to get back to Boston, so I had joined them, and I finished my doctoral thesis up there. Soon afterward, Senator Joseph McCarthy made a vicious attack on several members of the group, and its funding vanished. But then Gleason came to me and said that I should be a junior fellow at Harvard. He nominated me, and my nomination was supported by Claude Shannon, von Neumann, and Norbert Wiener. The only obligation I had was to dine with the other junior fellows on Monday evenings. It was a welcome opportunity for me, because I was trying to make general theories about intelligence—in men or machines—and I did not fit into any department or profession. I began to think about how to make an artificial intelligence. I spent the next three years as a junior fellow. There were about thirty of us, sort of one from each field—thirty gifted children.”
Two years after Minsky began his fellowship, one of the more important events in the history of artificial intelligence occurred. This was the Dartmouth Summer Research Project on Artificial Intelligence, which took place in the summer of 1956. Earlier that year, Minsky and three colleagues—John McCarthy, who had been one of Minsky’s fellow graduate students at Princeton and was now a professor of mathematics at Dartmouth; Nathaniel Rochester, who was manager of information research at the I.B.M. laboratory in Poughkeepsie; and Claude Shannon, a mathematician at the Bell Telephone Laboratories in Murray Hill, New Jersey, for whom Minsky had worked in the summer of 1952—submitted a proposal to the Rockefeller Foundation for a conference on what McCarthy called artificial intelligence; their proposal suggested that “every aspect of learning or any other feature of intelligence” could be simulated. The Rockefeller Foundation found the proposal interesting enough to put up seventy-five hundred dollars for the conference. Needless to say, twenty-five years later the several participants in the conference have different ideas of what its significance was. Minsky told me a few of the things that struck him. “My friend Nat Rochester, of I.B.M., had been programming a neural-network model—I think he got the idea from Donald Hebb’s book ‘The Organization of Behavior,’ and not from me—on the I.B.M. 701 computer,” Minsky recalled. “His model had several hundred neurons, all connected to one another in some terrible way. I think it was his hope that if you gave the network some simultaneous stimuli it would develop some neurons that were sensitive to this coincidence. I don’t think he had anything specific in mind but was trying to discover correlations—something that could have been of profound importance. Nat would run the machine for a long time and then print out pages of data showing the state of the neural net. When he came to Dartmouth, he brought with him a cubic foot of these printouts. He said, ‘I am trying to see if anything is happening, but I can’t see anything.’ But if one didn’t know what to look for one might miss any evidence of self-organization of these nets, even if it did take place. I think that that is what I had been worried about when I decided not to use computers to study some of the ideas connected with my thesis.” The other thing that struck Minsky at Dartmouth has by now become one of the great legends in the field of artificial intelligence. It is the sequence of events that culminated when, in 1959, for the first time, a computer was used—by Herbert Gelernter, a young physicist with I.B.M.—to prove an interesting theorem in geometry.
I had come across so many versions of this story that I was especially interested in hearing Minsky’s recollection. Sometime in the late spring of 1956, Minsky had become interested in the idea of using computers to prove the geometric theorems in Euclid. During that spring, he began to reread Euclid. “If you look through Euclid’s books, you find that he proves hundreds of theorems,” he told me. “I said to myself, ‘There are really only a small number of types of theorems. There are theorems about proving that angles are equal, there are theorems about circles intersecting, there are theorems about areas, and so forth.’ Next, I focussed on the different ways Euclid proves, for example, that certain angles are equal. One way is to show that the angles are in congruent triangles. I sketched all this out on a few pieces of paper. I didn’t have a computer, so I simulated one on paper. I decided to try it out on one of Euclid’s first theorems, which is to prove that the base angles of an isosceles triangle are equal. I started working on that, and after a few hours—this was during the Dartmouth conference—I nearly jumped out of my chair.”
To understand Minsky’s excitement, one must look at an isosceles triangle:
We are given that the line segments AB and BC are equal; the problem is to show that the base angles a and c are equal. To prove this, one has to show that the angles a and c are in congruent triangles. Minsky recalled saying to himself, “My problem is to design a machine to find the proof. Any student can find a proof. I mustn’t tell the machine exactly what to do. That would eliminate the problem. I have to give it some general techniques that it can use for itself—ways that might work. For example, I could tell it that the angles a and c might lie in congruent triangles. I would also have to tell it how to decide if two triangles were congruent. I made a diagram of how the machine could use them by trying new combinations when old ones failed. Once I had this set up, I pretended I was the machine and traced out what I would do. I would first notice that the angle a is in the triangle BAC but the angle c is in the triangle BCA. My machine would be able to figure this out. Next, it would ask if these two triangles were congruent. It would start comparing the triangles. It would soon notice that these were the same triangle with different labellings. Its techniques would lead it to make this identification. That’s when I jumped out of my chair. The imaginary machine had found a proof, and it wasn’t even the same proof that is given in Euclid. He constructed two new triangles by dropping a perpendicular from B to the line AC. I had never heard of this proof, although it had been invented by Pappus, a Greek geometer from Alexandria, who lived six hundred years after Euclid. It is sometimes credited to Frederick the Great. I thought that my program would have to go on a long logical search to find Euclid’s proof. A human being—Euclid, for example—might have said that before we prove two triangles are congruent we have to make sure that there are two triangles. But my machine was perfectly willing to accept the idea that BAC and BCA are two triangles, whereas a human being feels it’s sort of degenerate to give two names to the same object. A human being would say, ‘I don’t have two houses just because my house has a front door and a back door.’ I realized that, in a way, my machine’s originality had emerged from its ignorance. My machine did not realize that BAC and BCA are the same triangle—only that they have the same shapes. So this proof emerges because the machine doesn’t understand what a triangle is in the many deep ways that a human being does—ways that might inhibit you from making this identification. All it knows is some logical relationships between parts of triangles—but it knows nothing of other ways to think about shapes and space.”
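What the imaginary machine did can be caricatured in a few lines of modern code. The sketch below is mine, not Minsky's paper simulation or Gelernter's later program; it treats a triangle as nothing more than an ordered labelling of three vertices, exactly the ignorance the quote describes, and looks for a side-angle-side match between the labellings ABC and CBA.

```python
# A toy sketch (my own, under the assumptions stated above), in which a "triangle"
# is just an ordered labelling of three vertices and congruence is checked by
# side-angle-side, using only the given equality AB = BC.
equal_segments = {frozenset("AB"), frozenset("BC")}   # the isosceles hypothesis

def segments_equal(p, q):
    """Two segments are equal if they are the same segment or are declared equal."""
    p, q = frozenset(p), frozenset(q)
    return p == q or (p in equal_segments and q in equal_segments)

def sas_congruent(t1, t2):
    """Side-angle-side on ordered labellings (X, Y, Z): compare XY, the angle at Y, and YZ."""
    (x1, y1, z1), (x2, y2, z2) = t1, t2
    return (segments_equal(x1 + y1, x2 + y2)
            and y1 == y2                    # the included angles sit at the same vertex, B
            and segments_equal(y1 + z1, y2 + z2))

# The labellings A-B-C and C-B-A: nothing here "knows" they are the same triangle.
if sas_congruent(("A", "B", "C"), ("C", "B", "A")):
    print("ABC is congruent to CBA, so the angle at A equals the angle at C.")
```

Because the test never asks whether the two labellings name the same triangle, it happily reports the congruence on which the Pappus proof rests.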
Minsky smiled and went on, “For me, the rest of the summer at Dartmouth was a bit of a shambles. I said, ‘That was too easy. I must try it on more problems.’ The next one I tried was ‘If the bisectors of two of a triangle’s angles are equal in length, then the triangle has two equal sides.’ My imaginary machinery couldn’t prove this at all, but neither could I. Another junior fellow at Harvard, a physicist named Tai Tsun Wu, showed me a proof that he remembered from high school, in China. But Nat Rochester was very impressed by the first proof, and when he went back to I.B.M. after the summer he recruited Gelernter, who had just got his doctorate in physics and was interested in computers, to write a program to enable a computer to prove a geometric theorem. Now, a few months earlier, a new computer language called I.P.L.—for ‘information-processing language’—had been invented by Allen Newell, J. C. Shaw, and Herbert Simon, working at the Rand Corporation and the Carnegie Institute of Technology.” Newell and Shaw were computer scientists, both of whom worked for Rand, but Newell was getting his doctorate at Carnegie Tech with Herbert Simon, who was in fact a professor at the Graduate School of Industrial Administration. In 1978, Simon was awarded the Nobel Prize in Economic Science. “It was John McCarthy’s notion to combine some of I.P.L.’s ideas with those of FORTRAN—the I.B.M. programming language that was in the process of being developed—to make a new language in which the geometry program would be written,” Minsky went on. “Gelernter found ways of doing this. He called his new language FLPL, for ‘FORTRAN List-Processing Language.’ FORTRAN, by the way, stands for ‘formula translation.’ Well, FLPL never got much beyond I.B.M. But a couple of years later McCarthy, building on I.P.L. and Gelernter’s work and combining this with some ideas that Alonzo Church, a mathematician at Princeton, had published in the nineteen-thirties, invented a new language called LISP, for ‘list-processing,’ which became our research-computer language for the next generation.” By 1959, Gelernter had made his program work. Having done that, he gave it the job of proving that the base angles of an isosceles triangle are equal. The computer found Pappus’ proof.
In 1957, Minsky became a member of the staff of M.I.T.’s Lincoln Laboratory, where he worked with Oliver Selfridge, one of the first to study computer pattern-recognition. The following year, Minsky was hired by the mathematics department at M.I.T. as an assistant professor, and that year he and McCarthy, who had come to M.I.T. from Dartmouth the year following the conference, started the A.I. Group. McCarthy remained at M.I.T. for four more years, and during that time he originated or completed some developments in computer science that have since become a fundamental part of the field. One of these was what is now universally known as time-sharing. “The idea of time-sharing was to arrange things so that many people could use a computer at the same time instead of in the traditional way, in which the computer processed one job after another,” Minsky explained to me. “In those days, it usually took a day or two for the computer to do anything—even a job that needed just two seconds of the computer’s time. The trouble was that you couldn’t work with the computer yourself. First, you’d write your program on paper, and then punch holes in cards for it. Then you’d have to leave the deck of cards for someone to put in the computer when it finished its other jobs. This could take a day or two. Then, most programs would fail anyway, because of mistakes in concept—or in hole punching. So it could take ten such attempts to make even a small program work—an entire week wasted. This meant that weeks could pass before you could see what was wrong with your original idea. People got used to the idea that it should take months to develop interesting programs. The idea of time-sharing was to make the computer switch very quickly from one job to another. At first, it doesn’t sound very complicated, but it turned out that there were some real problems. The credit for solving them goes to McCarthy and to another M.I.T. computer scientist, Fernando Corbató, and to their associates at M.I.T.”
Minsky went on, “One of the problems was that if you want to run several jobs on a computer, you need ways to change quickly what is in the computer’s memory. To do that, we had to develop new kinds of high-speed memories for computers. One trick was to develop ways to put new information into the memories while taking other information out. That doubled the speed. A more basic problem was something that we called memory protection. One had to arrange things so that if there were several pieces of different people’s programs in a computer one piece could not damage another one by, say, erasing it from the main memory. We introduced what we called protection registers to prevent this from happening. Without them, the various users would have interacted with one another in unexpected ways. One of the most interesting aspects of all this was that for a long time we couldn’t convince the computer manufacturers that what we were doing was important. They thought that time-sharing was a waste of time, so to say. I think that many of them were confused about the difference between what is called time-sharing and what is called multiprocessing, which means having different parts of the computer running different parts of someone’s program at the same time—something totally different from the idea of many people sharing the same computer nearly simultaneously, with each user getting a fraction of a second on the machine. I.B.M., for example, was working on a system in which a program was being run, another one was being written on tape, and a third one was being prepared—all simultaneously. That was not what we had in mind. We wanted, say, a hundred users to be able to make use of the hardware at once. It took several years before we got a computer manufacturer to take this seriously. Finally, we got the Digital Equipment Corporation, in Maynard, Massachusetts, to supply the needed hardware. The company had been founded by friends of ours from M.I.T., and we collaborated with them to make their little computer—the PDP-1—into a time-sharing prototype. Soon, they had the first commercial versions of time-sharing computers. Digital Equipment eventually became one of the largest computer companies in the world. Then we decided to time-share M.I.T.’s big I.B.M. computer. It worked so beautifully that on the basis of it M.I.T. got three million dollars a year for a long time for research in computer science from the Advanced Research Projects Agency of the Defense Department.”
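The flavor of the scheme, though none of its real difficulties, can be suggested in a few lines of code. The round-robin scheduler below is a deliberately naive sketch of my own (the job names and the generator trick are invented; nothing here resembles the M.I.T. or Digital Equipment systems): each user's job runs for one short slice and then goes to the back of the line.

```python
# A minimal round-robin sketch of the time-sharing idea. Each "job" yields whenever
# it has used up its slice, and the scheduler cycles through all the users in turn.
from collections import deque

def job(name, steps):
    for i in range(steps):
        print(f"{name}: step {i + 1} of {steps}")
        yield                      # hand the processor back to the scheduler

def time_share(jobs):
    ready = deque(jobs)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run one time slice
            ready.append(current)  # not finished: go to the back of the line
        except StopIteration:
            pass                   # this user's job is done

time_share([job("alice", 3), job("bob", 2), job("carol", 4)])
```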
Time-sharing is now used universally. It is even possible to hook up one’s home computer by telephone to, for instance, one of the big computers at M.I.T. and run any problem one can think of from one’s living room.
The computer revolution in which people like Minsky and McCarthy have played such a large role has come about in part because of the invention of the transistor and in part because of the development of higher-level computer languages that have become so simple that even young children have little trouble learning to use them. The transistor was invented in 1948 by John Bardeen, Walter H. Brattain, and William Shockley, physicists then at the Bell Telephone Laboratories, and in 1956 they were awarded the Nobel Prize in Physics for their work. The transistor has evolved in many different ways since the days of the original invention, but, basically, it is still made of a material in which the electrons have just the right degree of attachment to nearby atoms. When the electrons are attached too loosely, as in a metal, they are free to move anywhere in the material. Hence, metals conduct electricity. Attached too tightly, as in an electrical insulator, the electrons cannot move freely; they are stuck. But in pure crystalline silicon and a couple of other crystalline substances the electrons are bound just loosely enough so that small electrical force fields can move them in a controllable way. Such substances are called semiconductors. The trick in making a transistor is to introduce an impurity into the crystal—a process known as doping it. Two basic types of impurities are introduced, and scientists refer to these as n types and p types—negative and positive. One substance used for doping the crystal is phosphorus, an n type. The structure of phosphorus is such that it contains one electron more than can be fitted into the bonds between the phosphorus atoms and the atoms of, say, silicon. If a small voltage is applied to a silicon crystal doped with phosphorus, this electron will move, creating a current of negative charges. (The charge of an electron is, by convention, taken as negative.) Conversely, if an element like boron is inserted into the silicon lattice, an electron deficiency is created—what is known as a hole. When a voltage is applied, an electron from an atom of silicon will move to fill in the hole, and this will leave yet another hole. This progression of holes cannot be distinguished in its effects from a current of positive charges. To make transistors, one constructs sandwiches of n-type and p-type doped crystals. The great advantage of the transistor is that the electrons will respond to small amounts of electric power. In the old vacuum tubes, it took a lot of power to get the electrons to move, and a lot of waste heat was generated. Moreover, the transistor can be miniaturized, since all of its activity takes place on an atomic scale.
The first commercial transistor radios appeared on the market in 1954. They were manufactured by the Regency division of Industrial Development Engineering Associates, Inc., of Indianapolis (and were not, as it happened, a commercial success). By 1959, the Fairchild Semiconductor Corporation had developed the first integrated circuit. In such a circuit, a chip of silicon is doped in certain regions to create many transistors, which are connected to one another by a conducting material like aluminum, since aluminum is easier than, say, copper to attach to the silicon. In 1961, the Digital Equipment Corporation marketed the first minicomputer, and in 1963—first in Britain and then in the United States—electronic pocket calculators with semiconductor components were being manufactured, although it was not until the nineteen-seventies that mass production brought the costs down to where they are now.
Still, the developments in computer hardware do not in themselves account for the ubiquity of computers in contemporary life. Parallel to the creation of this technology has been a steady evolution in the way people interact with machines. Herman Goldstine, who helped to design both the ENIAC, at the University of Pennsylvania, and the von Neumann computer, at the Institute for Advanced Study, points out in his book “The Computer from Pascal to von Neumann” that the von Neumann computer had a basic vocabulary of twenty-nine instructions. Each instruction was coded in a ten-bit expression. A bit is simply the information that, say, a register is on or off. There was a register known as the accumulator in the machine, and it functioned like a scratch pad. Numbers could be brought in and out of the accumulator and operated on in various ways. The instruction “Clear the accumulator”—that is, erase what was on the register—was, to take one example, written as the binary number 1111001010. Each location in the machine’s memory had an “address,” which was also coded by a ten-digit binary expression. There were a thousand and twenty-four possible addresses (2¹⁰ = 1,024), which meant that the Institute’s machine could label, or address, a thousand and twenty-four “words” of memory.
Hence a typical “machine language” phrase on the Institute computer might be:
00000010101111001010
This meant “Clear the accumulator and replace what had been stored in it by whatever number was at the address 0000001010.” Obviously, a program written for this machine would consist of a sequence of these numerical phrases, and a long—or even not so long—program of this sort would be all but impossible for anyone except, perhaps, a trained mathematician to follow. It is also clear that if this situation had not changed drastically few people would have learned to program computers.
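One can mimic the reading of such a phrase in a few lines. The sketch below is only an illustration of the split described in the text, with the single order quoted above in its table; it is not an emulation of the Institute machine.

```python
# A minimal decoding sketch under the assumptions above: the first ten bits are the
# address, the last ten are the order code, and the table knows only one order.
ORDERS = {"1111001010": "clear the accumulator and load the number at"}

def decode(phrase):
    address, order = phrase[:10], phrase[10:]
    meaning = ORDERS.get(order, "unrecognized order")
    return f"{meaning} address {address} (decimal {int(address, 2)})"

print(decode("00000010101111001010"))
# -> clear the accumulator and load the number at address 0000001010 (decimal 10)
```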
By the early nineteen-fifties, the first attempts to create the modern programming languages were under way. In essence, these attempts and the later ones have involved the development of an understanding of what one does—the steps that one follows—in trying to solve a problem, and have led the workers in this field to a deeper and deeper examination of the logic of problem-solving. Initially, the concentration was on the relatively simple steps that one follows in doing a fundamental arithmetic problem, like finding the square root of a number. It became clear that certain subroutines or subprograms—such as a routine for addition—came into play over and over. Once these subroutines were identified, one could make a code—what is called a compiler—that would automatically translate them into machine language every time they were needed in a computation. J. Halcombe Laning and Neal Zierler, at M.I.T., and, independently, Heinz Rutishauser, of the Eidgenössische Technische Hochschule (Albert Einstein’s alma mater), in Zurich, were among the first to attempt this. Their work did not gain wide acceptance, however, and it was not until the late fifties, after a group led by John Backus, a computer scientist with I.B.M., had developed FORTRAN, that computers became widely accessible. Some years ago, I had an opportunity to discuss the development of FORTRAN with Backus. He told me that he and his group had proceeded more or less by trial and error. A member of the group would suggest a small test program, and they would use the evolving FORTRAN system to translate it into machine language to see what would happen. They were constantly surprised by what the machine did. When the system was fairly well advanced, they began to race their FORTRAN-made programs against machine-language programs produced for the same job by a human programmer. They used a stopwatch to see which program was faster. If the FORTRAN-made programs had turned out to be substantially slower, they could not have become a practical alternative to their man-programmed machine-language competitors. It took Backus and his group two and a half years to develop FORTRAN; it was completed in 1957.
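The square-root example gives some feeling for what such a subroutine looks like. The few lines below are a modern paraphrase, not anything from the Laning-Zierler or FORTRAN compilers: a root is found by Newton's repeated averaging, using only the additions, multiplications, and divisions that a compiler would ultimately spell out in machine language, and the routine is then reused wherever a program needs it.

```python
# A hedged, modern illustration of a reusable square-root subroutine (Newton's method).
def square_root(x, tolerance=1e-12):
    guess = x if x > 1 else 1.0
    while abs(guess * guess - x) > tolerance:
        guess = (guess + x / guess) / 2.0   # average the guess with x / guess
    return guess

# The same subroutine is invoked over and over by different formulas in a program.
print(square_root(2.0))                      # about 1.41421356
print(square_root(3.0**2 + 4.0**2))          # 5.0, the hypotenuse of a 3-4-5 triangle
```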
In a 1979 Scientific American article, Jerome A. Feldman, chairman of the computer-science department at the University of Rochester, noted that in the United States alone there were at that time more than a hundred and fifty programming languages used for various purposes. For simple numerical computations, most of these languages work almost equally well; in fact, BASIC (for “beginner’s all-purpose symbolic instruction code”), which was developed by a group at Dartmouth in 1963-64, is the most widely available language for small home computers, and will enable people to do about anything that they want to do with such a computer. (What most people seem to want to do with these computers is play games on them, and the programs for games come ready-made.) These small computers have very little memory—at most, sixty-five thousand eight-bit words—and so cannot fully exploit the most advanced computer languages, although simplified versions of some high-level languages are available. The differences begin to be felt in the complex programs needed in the field of artificial intelligence. For these programs, FORTRAN and BASIC are simply not sophisticated enough. When FORTRAN was first invented, computer memory cost over a dollar per memory bit. Today, one can buy a sixty-five-thousand-bit memory-circuit chip for around six dollars—so memory is about ten thousand times as cheap now. The next generation of personal computers should give their users the most advanced computer languages. But someday, according to Minsky, the most useful programs for personal computers will be based on artificial-intelligence programs that write programs of their own. The idea is for an ordinary person—not a programmer—to describe what he wants a program to do in informal terms, perhaps simply by showing the program-writing program some examples. Then it will write a computer program to do what was described—a process that will be much cheaper than hiring a professional programmer.
Between machine language and compilers, there is another level of computer-language abstraction—assemblers—which was developed even before the compilers. In an assembly-language instruction, instead of writing out a string of binary digits that might tell the machine to add two numbers one can simply write “ADD” in the program, and this will be translated into machine language. FORTRAN is one step up from this in sophistication. In any computation, the next step will often depend on the result of a previous step. If one number turns out to be larger than another, one will want to do one thing, and in the opposite case another thing. This can be signalled in a FORTRAN program by the instruction “IF” followed by instructions for what to do in either of the alternative cases—a marvellous simplification, provided that one knows in advance that there are two cases. In a chess-playing program, one might well get into a situation in which the number of cases that one would like to examine would depend on one’s position on the board, which cannot be predicted. One would thus like the machine to be able to reflect on what it is doing before it proceeds. In the late nineteen-fifties, a new class of languages was developed to give computers the capacity for reflection. The instructions in these languages interact creatively with the machine.
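A toy assembler makes the first of these steps concrete. The mnemonics and the bit layout below are invented for the illustration (four bits of order code, ten bits of address); no real machine's instruction set is intended.

```python
# A toy assembler under the assumptions above: each symbolic line, such as "ADD 13",
# becomes one numeric instruction word before the program is run.
OPCODES = {"LOAD": 0b0001, "ADD": 0b0010, "STORE": 0b0011}

def assemble(lines):
    words = []
    for line in lines:
        mnemonic, operand = line.split()
        words.append((OPCODES[mnemonic] << 10) | int(operand))   # order code in the high bits
    return words

for word in assemble(["LOAD 12", "ADD 13", "STORE 14"]):
    print(format(word, "014b"))
```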
When I asked Minsky about these languages, he said, “In an ordinary programming language, like FORTRAN or BASIC, you have to do a lot of hard things to get the program even started—and sometimes it’s impossible to do these things. You must state in advance that in the computer memory certain locations are going to be used for certain specific things. You have to know in advance that it is going to use, say, two hundred storage cells in its memory. A typical program is made up of a lot of different processes, and in ordinary programs you must say in advance how each of these processes is to get the information from the others and where it is to store it. These are called declarations and storage allocations. Therefore, the programmer must know in advance what processes there will be. So you can’t get a FORTRAN program to do something that is essentially new. If you don’t know in advance what the program will do, you can’t make storage allocations for it. In these new languages, however, the program system automatically creates space for new things as the program creates them. The machine treats memory not as being in any particular place but, rather, as consisting of one long string, and when it needs a new location it just takes it off the beginning of the string. When it discovers that some part of the program is not being used, it automatically puts it at the end of the string, where it can be used again if it is needed—a process that is known in the computer business as garbage collection. The machine manipulates symbols, and not merely numbers. It is much closer to using a natural language.” One remarkable feature of these new list-processing languages is that they can be used to design other new languages. A list-processing program can be designed to read and write list-processing programs, and so generate new programs of essentially limitless complexity. The development of the list-processing languages derived from attempts to carry out two of the classic problems in artificial intelligence: the use of machines to play games like chess and checkers, and the use of machines to prove theorems in mathematics and mathematical logic. Many of the programming ideas in the two domains are the same. The first significant modern paper on chess-playing programs was written in 1950 by Claude Shannon, then at the Bell Labs, who later, in the sixties and early seventies, preceded Minsky as the Donner Professor at M.I.T. The basic element in Shannon’s analysis—and in all subsequent analyses, including those that have made possible the commercially available chess-playing machines—is a set of what scientists call game trees; each branching of a game tree opens up new possibilities, just as each move in a chess game creates more possible moves. A player opening a chess game has twenty possible moves. On his second play he can have as many as thirty. As play progresses, the number of possible combinations of moves expands enormously. In a typical game, all future possible positions would be represented by a number on the order of 10¹²⁰—an absurdly large number. If a computer could process these possibilities at the rate of one per billionth of a second, it would take 10¹¹¹ seconds to run the entire game tree for a single chess game. But the universe is only about 10¹⁷ seconds old, so this is not the way to go. (In checkers, there are only 10⁴⁰ possible positions, which at the same rate would take 10²² centuries—or 10³¹ seconds—to consider.)
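Before turning to the chess search itself, the storage trick Minsky describes can be sketched in a few lines. The free list and the cells below are my own toy, not LISP's actual machinery: fresh cells come off the front of the string, and reclaimed ones go back onto it.

```python
# A much-simplified sketch of treating memory as "one long string" of cells
# (illustrative only; real list-processing systems differ in many details).
free_list = list(range(8))          # every cell starts out free
cells = {}                          # cell index -> (head, tail) pair

def cons(head, tail):
    index = free_list.pop(0)        # take a fresh cell off the front of the string
    cells[index] = (head, tail)
    return index

def reclaim(index):
    """The garbage-collection step: put an unused cell back on the string."""
    del cells[index]
    free_list.append(index)

pair = cons("a", cons("b", None))   # the list (a b) occupies two cells
reclaim(pair)                       # its top cell is no longer needed
print(free_list)                    # the reclaimed cell is available again
```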
Obviously, the human player can consider only a minute fraction of the branches of the tree of continuations resulting from any given chess move, and the computer must be programmed to do the same. While Shannon did not actually write a computer program for making such considerations, he did suggest a framework for a program. First, one would choose a depth—two or three moves—to which one would analyze all legal moves and their responses, and one would evaluate the position at the end of each of these moves. On the basis of evaluations, one would choose the move that led to the “best” final configuration. In a position where there are, say, three legal moves, white may find that one move will lead to a draw if black makes his best move; in another of the three moves, white will lose if black does what he is supposed to do; and in the third possible move white will win if black misplays but will lose if black plays correctly. In such a situation, Shannon’s procedure would call for white to make the first of the three moves—an assumption that would guarantee a draw. In reality, matters are rarely as cut and dried as this, so more complicated criteria, such as material, mobility, king defense, and area control, have to be introduced and given numerical weights in the calculation, and Shannon suggested procedures for this. In 1951, the British mathematician Alan Turing—who after von Neumann was probably the most influential thinker of this century concerning the logic of automata—developed a program to carry out Shannon’s scheme. Since he did not have a computer to try it on, it was tried in a game in which the two players simulated computers. It lost to a weak player. In 1956, a program written by a group at Los Alamos was tried on the MANIAC-I computer. Their program, which involved a game tree of much greater depth, used a board with thirty-six spaces (the bishops were eliminated) instead of the board of sixty-four spaces that is used in real chess. The computer beat a weak player. The first full chess-playing program to be run on a computer was devised by Alex Bernstein, a programmer with I.B.M., in 1957. Seven plausible moves were examined to a depth of two moves each, and the program played passable amateur chess. It ran on the I.B.M. 704 computer, which could execute forty-two thousand operations a second, compared with eleven thousand operations a second by the MANIAC-I.
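In modern notation the framework Shannon proposed is the minimax rule, which can be stated in a few lines. The version below is generic and abstract; the move generator and the evaluation function (with its numerical weights for material, mobility, and the rest) are left as stand-ins to be supplied for a real game.

```python
# A bare-bones minimax sketch of the fixed-depth search described above. `moves` and
# `evaluate` are placeholders the reader must supply; nothing chess-specific is built in.
def minimax(position, depth, white_to_move, moves, evaluate):
    """Value of `position` if both sides choose their best continuations."""
    successors = moves(position)
    if depth == 0 or not successors:
        return evaluate(position)            # weighted score of the final configuration
    values = [minimax(p, depth - 1, not white_to_move, moves, evaluate)
              for p in successors]
    return max(values) if white_to_move else min(values)

def best_move(position, depth, moves, evaluate):
    """White picks the move whose worst case, after best play by black, is highest."""
    return max(moves(position),
               key=lambda p: minimax(p, depth - 1, False, moves, evaluate))
```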
In 1955, Newell, Shaw, and Simon began work on a chess program. In a paper published in 1958 in the IBM Journal of Research and Development, they wrote: “In a fundamental sense, proving theorems [in symbolic logic] and playing chess involve the same problem: reasoning with heuristics that select fruitful paths of exploration in a space of possibilities that grows exponentially. The same dilemmas of speed versus selection and uniformity versus sophistication exist in both problem domains.” The three also invented what they called the Logic Theorist, which was a program designed to prove certain theorems in symbolic logic. In a 1957 paper on this, published in the Proceedings of the Western Joint Computer Conference, they wrote:
The reason why problems are problems is that the original set of possible solutions given to the problem-solver can be very large, the actual solutions can be dispersed very widely and rarely throughout it, and the cost of obtaining each new element and of testing it can be very expensive. Thus the problem-solver is not really “given” the set of possible solutions; instead he is given some process for generating the elements of that set in some order. This generator has properties of its own, not usually specified in stating the problem; e.g., there is associated with it a certain cost per element produced, it may be possible to change the order in which it produces the elements, and so on. Likewise the verification test has costs and times associated with it. The problem can be solved if these costs are not too large in relation to the time and computing power available for solution.
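Stripped of its costs and its subtleties, the scheme in that passage is generate-and-test, which can be put in a few lines. The generator, the test, and the toy problem below are mine; the Logic Theorist's own heuristics are not reproduced.

```python
# A skeletal generate-and-test loop under the assumptions above: draw candidates from
# a generator until the verification test passes or the budget of attempts runs out.
from itertools import count, islice

def generate_and_test(generator, is_solution, budget):
    for candidate in islice(generator, budget):
        if is_solution(candidate):
            return candidate
    return None                 # the costs exceeded the computing power available

# Toy problem: the smallest positive integer whose square ends in the digits 29.
print(generate_and_test(count(1), lambda n: n * n % 100 == 29, budget=1000))   # -> 23
```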
The Logic Theorist was run on a computer at Rand that was a copy of the von Neumann machine at the Institute for Advanced Study and that Rand had named, over the objections of von Neumann, the JOHNNIAC. The program was able to supply proofs of some fairly complex theorems, though it failed with others. To program the JOHNNIAC, Newell, Shaw, and Simon used their newly invented I.P.L., which was the forerunner of the list-processing languages. In 1958, they wrote their chess program in a later model of this language called I.P.L.-IV for the JOHNNIAC, and they subsequently described its performance as “good in spots.” However, most current chess programs are written in machine language rather than in any of the list-processing languages (including LISP), for reasons of speed and economy of memory.
Chess players who compete in tournaments are given a numerical point rating. At present, the mean rating of all the United States tournament players is 1,500. Anatoly Karpov, who is the world champion, is rated by the World Chess Federation at 2,700. The best current chess program is Belle, which was developed by Ken Thompson and Joe Condon, of the Bell Labs, followed closely by the chess programs of Northwestern University—Chess 4.9, designed by David Slate and Lawrence Atkin, and Nuchess, designed by Slate and William Blanchard. Belle is rated at about 2,200, and Chess 4.9 at about 2,050; Nuchess has not yet played in enough tournaments to receive a rating. The microcomputer chess-playing machines that are available commercially, for between one hundred and four hundred dollars, can be set to play at various levels (some up to 1,800), but at the higher levels they take an eternity to decide on a hard move. In general, these programs do not really mimic what a human chess master can do. The human chess master can take in more or less at a glance the general structure of a position and then analyze a limited number of moves—three or four—to depths that vary greatly depending on the position. To give a famous example, when Bobby Fischer was thirteen he played a tournament game against the master Donald Byrne—a game that some people have called the greatest chess game played in this century. On the seventeenth move, Fischer sacrificed his queen, for reasons apparent at the time to no one but him. The resulting combination was so profound that it was not until twenty-four moves later that Fischer executed the mate he must have seen from the beginning. This game convinced many people—including, no doubt, Fischer—that it was only a matter of time before he became chess champion of the world. It would be fascinating to replay this game with, say, Belle taking Fischer’s role, to see if by using its methods, which could be quite different, it would have found this mate. That seems unlikely. Still, Belle now plays chess better than all but five per cent of the American tournament chess players, and there is every reason to believe that in the near future it, or some similar program, will beat all of them.
At present, something of a debate is raging both within the artificial-intelligence community and outside it about what the enterprise of “artificial intelligence” really is. The most commonly accepted idea among workers in the field is that it is the attempt to produce machines whose output resembles, or even finally cannot be distinguished from, that of a human mind. The ultimate machine might by itself be able to perform all the cognitive functions, or, more modestly, many kinds of machines might be needed to perform such functions. The first question that this goal raises is what is meant by a machine. Nearly all workers in the field seem to mean some sort of digital computer when they refer to a machine. In this respect, there is a remarkable fact about computers known since the nineteen-thirties; namely, that although real computers may come in all sorts of models, there is in theory only one kind of computer. This notion derives from the pioneering work of Alan Turing, who conceived of something he called the abstract universal computer, which, in principle, can be programmed to imitate any other computer. This universal computer can do any sequence of operations that any model of computer can do. One view of the goal of artificial intelligence would be to build a computer that, by its output, simply could not be distinguished from a mind. Since human minds play games like chess and checkers, do mathematics, write music, and read books, the ideal machine would have to be able to do all of these things at least as well as human beings do them. Obviously, to make such a machine is an enormous task, perhaps an impossible one. People working in artificial intelligence, like any scientists confronted with an incredibly complex problem, have been trying to attack this task in pieces: thus the attempts to make machines—both the hardware and the necessary programs—that play games, that “understand” newspaper accounts, and that can recognize patterns. That machines can already do all of these things with varying degrees of success is certainly a fact. The debate nowadays is over what this means. Are we thereby approaching a better understanding of the human mind? It is not entirely clear what would settle the debate. Even if a humanoid machine were built, many people would certainly argue that it did not really understand what it was doing, and that it was only simulating intelligence, while the real thing lay beyond it and would always lie beyond it. Minsky feels that there is at least a possibility that this might not be true. He sees the development of artificial intelligence as a kind of evolutionary process and thinks that just as intelligence developed in animals over a long sequence of trials and improvements, the same thing might happen in a shorter time as we guide the evolution of machines.
The vast majority of the contemporary workers in artificial intelligence have concentrated on the development of increasingly complex programs for computers—an activity that is justifiable, considering all that has been achieved. But it may also be misleading. This point was made in 1979 by the British molecular biologist Francis Crick, in a Scientific American article called “Thinking about the Brain.” Crick writes:
The advent of larger, faster and cheaper computers, a development that is far from reaching its end, has given us some feeling for what can be achieved by rapid computation. Unfortunately the analogy between a computer and the brain, although it is useful in some ways, is apt to be misleading. In a computer information is processed at a rapid pulse rate and serially. In the brain the rate is much lower, but the information can be handled on millions of channels in parallel. The components of a modern computer are very reliable, but removing one or two of them can upset an entire computation. In comparison the neurons of the brain are somewhat unreliable, but the deletion of quite a few of them is unlikely to lead to any appreciable difference in behavior. A computer works on a strict binary code. The brain seems to rely on less precise methods of signalling. Against this it probably adjusts the number and efficiency of its synapses in complex and subtle ways to adapt its operation to experience. Hence it is not surprising to find that although a computer can accurately and rapidly do long and intricate arithmetical calculations, a task at which human beings are rather poor, human beings can recognize patterns in ways no contemporary computer can begin to approach.
While many workers in artificial intelligence might agree with Crick’s statement about patterns, it is nonetheless true that computers, in conjunction with electronic visual sensors—TV cameras, in effect—are now able to perform some interesting feats of pattern recognition. The first machine that was able to do sophisticated pattern recognition was the Perceptron, designed by Minsky’s former Bronx Science classmate Frank Rosenblatt. Working at the Cornell Aeronautical Laboratory, Rosenblatt built the prototype version of the Perceptron in 1959. A few years later, I had an opportunity to discuss with him how it worked. The machine consisted of three elements. The first element was a grid of four hundred photocells, corresponding to the light-sensitive neurons in the retina; they received the primary optical stimuli. The photocells were connected to a group of components that Rosenblatt called associator units—the second element—whose function was to collect the electrical impulses produced by the photocells. There were five hundred and twelve associator units, and each unit could have as many as forty connections to the photocells. These connections were made by randomly wiring the associators to the cells. The wiring was done randomly because it was then believed that some, and perhaps most, of the “wiring” in the brain that connects one neuron to another was done randomly. The argument for this was essentially one of complexity. Our brains gain—grow—neurons during prenatal development, until, at birth, the total may have reached forty billion or more. How do they all know where to go in the brain and elsewhere, and what to connect up to when they get there? It was argued by many early brain researchers that if this wiring was largely random the neurons wouldn’t have to know, since where an individual neuron went would not matter much. As a result of experimental work done in recent years, however, it appears that this is not in fact how things work. The connections do seem to be determined from an early stage of development, and are specific both for specific regions of the brain and for specific neurons within these regions. How the information to specify all this is processed so that the neurons do what they are supposed to do remains a mystery. But when Rosenblatt was building the Perceptron it was thought that randomness was important. The third element of Rosenblatt’s Perceptron consisted of what he called response units. An associator—in analogy to a neuron—would produce a signal only if the stimulus it received was above a certain threshold, at which point it would signal the response units. The idea was to use this structure to recognize shapes. First, the machine was shown, say, an illuminated “A,” to which it would respond in accord with its initially random instructions. Then the “A” was deformed or moved and was shown to the machine again. If it responded in the same way both times, it had recognized the “A”; if not, then some of its responses would presumably be “right” and some “wrong.” With adjustments in electronics, the wrong responses could be suppressed, and Rosenblatt’s claim was that after a finite number of adjustments the machine would learn to recognize patterns.
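The adjustment procedure itself is easy to write down in modern terms. The miniature below is an illustration only, with two "photocells," a made-up pattern, and the standard perceptron learning rule; it is not Rosenblatt's four-hundred-cell machine.

```python
# A minimal perceptron sketch under the assumptions above: wrong responses nudge the
# connection weights and the threshold until the unit answers every example correctly.
def train_perceptron(examples, n_inputs, passes=100, rate=0.1):
    weights = [0.0] * n_inputs
    threshold = 0.0
    for _ in range(passes):
        for inputs, target in examples:        # target 1 means "the pattern is present"
            response = 1 if sum(w * x for w, x in zip(weights, inputs)) > threshold else 0
            error = target - response          # suppress the wrong responses
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            threshold -= rate * error
    return weights, threshold

# A toy "retina" of two photocells; the pattern counts as present only when both are lit.
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
print(train_perceptron(examples, n_inputs=2))
```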
Rosenblatt was an enormously persuasive man, and many people, following his example, began to work on Perceptrons. Minsky was not among them. Ever since his days as a graduate student, when he and Dean Edmonds built one of the earliest electronic learning machines, he had been aware of the limitations of such machines, and had come to the conclusion that it was more profitable to concentrate on finding the principles that will make a machine learn than to try building one in the hope that it would work. Minsky and Rosenblatt engaged in some heated debates in the early sixties. During my discussions with Minsky, he described what the issues were.
“Rosenblatt made a very strong claim, which at first I didn’t believe,” Minsky told me. “He said that if a Perceptron was physically capable of being wired up to recognize something, then there would be a procedure for changing its responses so that eventually it would learn to carry out the recognition. Rosenblatt’s conjecture turned out to be mathematically correct, in fact. I have a tremendous admiration for Rosenblatt for guessing this theorem, since it is very hard to prove. However, I started to worry about what such a machine could not do. For example, it could tell ‘E’s from ‘F’s, and ‘5’s from ‘6’s—things like that. But when there were disturbing stimuli near these figures that weren’t correlated with them the recognition was destroyed. I felt the proponents of the Perceptron had been misled experimentally by giving the machine very clean examples. It would recognize a vertical line and a horizontal line by themselves, but when you put in a varied background with slanted lines the machine would break down. It reminds me, in some ways, of a wonderful machine that J. C. R. Licklider made at Harvard early in the nineteen-fifties. It could recognize the word ‘watermelon’ no matter who said it in no matter what sentence. With a simple enough recognition problem, almost anything will work with some reliability. But to this day there is no machine that can recognize arbitrarily chosen words in ordinary speech.”
In 1963, Minsky began to work with Seymour Papert, and the two men are still collaborators. Papert, who was born in South Africa in 1928, had received his Ph.D. in mathematics at the University of Witwatersrand in 1952, and then, deciding that he still didn’t know enough mathematics, had gone to Cambridge University and taken a second Ph.D. in the subject. He had become interested in the question of learning, and it was in 1958 that he became an associate of Jean Piaget, in Geneva, where he remained for several years. Minsky and Papert were brought together by the neurophysiologist Warren McCulloch, whose paper with Pitts on neurons had so impressed Minsky in the forties; McCulloch had come to work at M.I.T.’s Research Laboratory of Electronics in 1952. “Seymour came to M.I.T. in 1963 and then stayed forever,” Minsky recalled. Within a few months of Papert’s arrival, they had initiated new research programs in human perception, child psychology, experimental robots, and the theory of computation. In the middle nineteen-sixties, Papert and Minsky set out to kill the Perceptron, or, at least, to establish its limitations—a task that Minsky felt was a sort of social service they could perform for the artificial-intelligence community. For four years, they worked on their ideas, and in 1969 they published their book “Perceptrons.”
“There had been several thousand papers published on Perceptrons up to 1969, but our book put a stop to those,” Minsky told me. “It had, in all modesty, some beautiful mathematics in it—it’s really nineteenth-century mathematics. As we went on, more and more questions were generated, so we worked on them, and finally we solved them all. As a result, the book got some rave reviews when it came out. People said, ‘Now computer science has some fundamentally new mathematics of its own. These people have taken this apparently qualitative problem and made a really elegant theory that is going to stand.’ The trouble was that the book was too good. We really spent one year too much on it. We finished off all the easy conjectures, and so no beginner could do anything. We didn’t leave anything for students to do. We got too greedy. As a result, ten years went by without another significant paper on the subject. It’s a fact about the sociology of science that the people who should work in a field like this are the students and the graduate students. If we had given some of these problems to students, they would have got as good at it as we were, since there was nothing special about what we did except that we worked together for several years. Furthermore, I now believe that the book was overkill in another way. What we showed came down to the fact that a Perceptron can’t put things together that are visually nonlocal.”
At this point in our conversation, Minsky took a spoon and put it behind a bowl. “This looks like a spoon to you, even though you don’t see the whole thing—just the handle and a little part of the other end,” he said. “The Perceptron is not able to put things together like that, but then neither can people without resorting to some additional algorithms. In fact, while I was writing a chapter of the book it began to dawn on me that for certain purposes the Perceptron was actually very good. I realized that to make one all you needed in principle was a couple of molecules and a membrane. So after being irritated with Rosenblatt for overclaiming, and diverting all those people along a false path, I started to realize that for what you get out of it—the kind of recognition it can do—it is such a simple machine that it would be astonishing if nature did not make use of it somewhere. It may be that one of the best things a neuron can have is a tiny Perceptron, since you get so much from it for so little. You can’t get one big Perceptron to do very much, but for some things it remains one of the most elegant and simple learning devices I know of.”
When computers first came into use in this country, in the early nineteen-fifties, they were so expensive that they were almost exclusively the province of the large military-oriented government laboratories, like Los Alamos and the Rand Corporation—the former financed by the Atomic Energy Commission and the latter by the Air Force. The computer project at the Institute for Advanced Study, one of the few such projects that were then being carried out in an educational institution, was financed jointly by RCA, the Atomic Energy Commission, the Office of Naval Research, the Office of Air Research, and Army Ordnance. Hardly anyone imagined that within a few decades almost every major university would have a large computer and a department of computer science. A great deal of the funding for pure science in this country now comes from the National Science Foundation, whose budget is spread across all the sciences, with a small fraction of it going to computer science. The Defense Department has its Advanced Research Projects Agency, known as ARPA, whose mission is to finance technology that might eventually have some application to the military. This agency has sometimes interpreted its function as being to detect technological weaknesses in American science and to attempt to remedy them. In the early nineteen-sixties, it began to finance pure research in computer science at universities, and over half the money spent in this field since then has come from ARPA. Around 1963, following the work of McCarthy, Corbató, and their M.I.T. collaborators on computer time-sharing, Project MAC—MAC stood for both “machine-aided cognition” and “multiple-access computer”—was begun at M.I.T., with ARPA providing a budget of about three million dollars a year. About a million dollars of this went to the Artificial Intelligence Group.
“In the first years, we spent this money on hardware and students,” Minsky told me. “But by the tenth year we were making our own hardware, so we spent nearly all the money on faculty and students. We assembled the most powerful and best-human-engineered computer-support system in the world—bar none.” Initially, some of the students were dropouts, who became systems engineers and, eventually, distinguished scientists. Most of them were refugees from other fields, principally mathematics and physics. From the beginning, Minsky’s goal was to use this pool of talent to learn what computers could be made to do in solving non-arithmetic problems—in short, to make these machines intelligent.
For their first problem, Minsky and his students tried to program a computer to do freshman calculus. Calculus was one of the discoveries of Isaac Newton, who found that it was the best way in which to express his laws of motion. One way of applying calculus is to think of it as a sort of “infinite arithmetic,” in which one can calculate, for example, the behavior of planets by doing a great many steps of addition and multiplication. This method—called numerical integration—was one of the first things computers were used for. In fact, it was just this application that the inventors of computers had in mind for them. But there is a second way of doing calculus—this was also explored by Newton—in which one thinks of the calculus as a finite algebra, a skill that involves symbolic manipulation rather than numbers. If one can solve a calculus problem in this closed, algebraic way, one gets an answer that is not just highly accurate but perfect. Such accuracy had never been achieved on any machine, and it was this symbolic manipulation that Minsky and his students set about to program into their machines.
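The "infinite arithmetic" view is easy to demonstrate. The few lines below are a generic rectangle sum, not any historical orbit calculation; they approximate an integral whose exact, symbolic answer is one-third.

```python
# Numerical integration as a great many additions and multiplications (a plain rectangle sum).
def numerical_integral(f, a, b, steps=100_000):
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) * width for i in range(steps))

# The symbolic answer for the integral of x*x from 0 to 1 is exactly 1/3;
# the arithmetic answer merely comes very close.
print(numerical_integral(lambda x: x * x, 0.0, 1.0))
```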
The two most important symbolic methods are called differentiation and integration. In the first, one finds the rate of a process from a description of the process; and in the second, which is, in some sense, the inverse of the first, one recovers the process from the rate. Specifically, in the first case one finds the tangent to a given curve, and in the second one computes the curve from the knowledge of the tangent. Freshman calculus students are taught a certain number of techniques for doing this—what are essentially mental computer programs. When one is confronted with a new problem, one searches around in one’s head—or in a calculus book—for a procedure that looks as if it could be made to work, and then tries to make the expression that one has been given fit one of these algorithms. Minsky’s student James Slagle codified this process in his program SAINT—for “symbolic automatic integrator”—in 1961. The machine he used, an I.B.M. 7090, was given twenty-six standard forms—certain elementary integrals—and eighteen simple algorithms. SAINT would take its problem—given to it in the language of elementary functions, which had to be defined as well—and then begin a search among its algorithms. If the computer found the problem to be too hard, it would break it up into simpler ones. If the computer received a problem that could not be done in closed form, it would try and then quit. For his doctoral thesis, Slagle gave the computer eighty-six workable integrals to evaluate, using SAINT. Even the I.B.M. 7090, which by modern standards is a dinosaur, did eighty-four of the eighty-six at speeds comparable to those achieved by an M.I.T. freshman, and sometimes faster. Many it did in less than a minute. Two of the integrals were beyond it. Slagle’s program, which was written in LISP, was regarded as a breakthrough in the attempt to get a computer to do symbolic manipulation. (Some ten years later, two of Minsky’s students, William Martin and Joel Moses, and Carl Engelman, a mathematician with the MITRE Corporation, building on this work, designed a program that, as it exists now, can do almost any symbolic manipulation that a working physicist or engineer might be called on to do. It is called MACSYMA. Comparable systems are now available in some of the other major computer centers. Some of the algebraic computations that physicists run into these days would take thousands of pages to work out; they go beyond the point where one would have a great deal of confidence in the answer, even if one could find it. These calculations can be done automatically with MACSYMA, which has become a standard reference in many theoretical-physics papers. One may now tie in to this system by telephone—so that, for better or worse, one can get one’s algebra done by dialling a number.)
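A reader can get the flavor of such a program from a toy, table-driven integrator. The handful of standard forms and the single decomposition rule below are mine (SAINT had twenty-six forms and eighteen transformations, and was written in LISP); expressions are nested tuples, and an unrecognized problem simply makes the routine give up.

```python
# A toy symbolic integrator in the spirit of the description above, not SAINT itself.
# Expressions are nested tuples such as ("+", ("pow", "x", 3), ("cos", "x")).

STANDARD_FORMS = {
    "sin": lambda arg: ("neg", ("cos", arg)),
    "cos": lambda arg: ("sin", arg),
    "exp": lambda arg: ("exp", arg),
}

def integrate(expr, var="x"):
    if expr == var:                                   # integral of x is x**2 / 2
        return ("/", ("pow", var, 2), 2)
    if isinstance(expr, (int, float)):                # integral of a constant c is c * x
        return ("*", expr, var)
    op = expr[0]
    if op == "+":                                     # break the problem into simpler ones
        return ("+", integrate(expr[1], var), integrate(expr[2], var))
    if op == "pow" and expr[1] == var:                # integral of x**n is x**(n+1)/(n+1), n != -1
        n = expr[2]
        return ("/", ("pow", var, n + 1), n + 1)
    if op in STANDARD_FORMS and expr[1] == var:       # look it up among the standard forms
        return STANDARD_FORMS[op](var)
    raise ValueError(f"no applicable form for {expr!r}")  # try, and then quit

print(integrate(("+", ("pow", "x", 3), ("cos", "x"))))
# -> ('+', ('/', ('pow', 'x', 4), 4), ('sin', 'x'))
```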
The next kind of problem that Minsky attacked with a student—Thomas Evans, in this case—was to get a machine to reason by analogy. The machine was supposed to solve problems of the sort that ask “Figure A is to Figure B as Figure C is to which of the following figures [D-l through D-5]?” Evans’ doctoral thesis, completed in 1963, posed this one:
The first thing that came to my mind when I saw these drawings was: How did they get the computer to deal with them? Did it look at them? “No,” Minsky told me. “The trouble with computer-vision programs in those days was that they were always full of bugs. We really didn’t know how to make them work reliably. It would have taken us a year to get one that worked, and then nobody would have really cared. Evans developed a little sublanguage for describing line figures, and this was typed into the machine.” Evans wrote his program in LISP, which allows one to define two symbols—S and T, say—and then create a third, related symbol, such as L. In this way, one can code the proposition that two “points” (S and T) define a “line” (L). This is all that the computer has to know about what a line is; namely, it is something defined by two symbols called points. A line and a point define a plane. From a mathematical point of view, this abstract set of relationships is what these objects are, although, of course, we attach many other meanings to them. “Nowhere does the machine ‘really’ know what a line is, but I believe that there is nothing in us that ‘really’ knows what a line is, either—except that our visual system identifies certain inputs with ‘lineness,’ ” Minsky said. “It is the web of order among these inputs that makes them unique. Or, even if they are not unique, one pretends to know what a line is anyway.”
Once the machine had Figures A and B coded into its memory, it would compare them, using such criteria as “big” and “small,” “inside” and “outside,” and “left” and “right.” It would attempt to see what operations were necessary to transform one drawing into the other. The specification of “inside” and “outside” employs a method invented in the nineteenth century. To take an example: Imagine that one is somewhere inside a circle and one draws a line due north from where one is. This line will intersect the circle once as it goes through it. But if one is below the circle and outside it and draws a line north the line either will not intersect the circle at all or will intersect it twice. There are nuances, but, in general, this procedure will tell one that one is inside a closed curve if there is an odd number of crossings and outside if there is an even number of them. This was the procedure that Evans used. In Evans’ example, the machine would note that one of the differences between A and B is that the circle has been moved down to encase the small figure; after comparing A and B, the machine would next compare A and C. In this case, it would note that both had big figures above, and that each encased a small figure. Now the machine would try to find a diagram in the D series in which the big figure had moved down to encase the small one below it—D-3.
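The crossing test is compact enough to write out in full. The version below is the standard ray-casting form for a polygon given by its corner points; the nuances the text alludes to, such as a ray that grazes a corner or runs along an edge, are ignored here.

```python
# The odd-even crossing test described above: count how many edges of the figure a
# ray drawn "due north" from the point crosses; an odd count means the point is inside.
def inside(point, polygon):
    x, y = point
    crossings = 0
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (x1 <= x < x2) or (x2 <= x < x1):          # the edge spans the ray's x position
            y_at_x = y1 + (y2 - y1) * (x - x1) / (x2 - x1)
            if y_at_x > y:                            # the crossing lies to the north
                crossings += 1
    return crossings % 2 == 1

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(inside((2, 2), square), inside((5, 1), square))   # True False
```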
To make a computer do all this, Evans worked out one of the most complex programs that had ever been written. The machine he was using had a memory of thirty-two thousand words, each of thirty-six bits—about a million bits of memory. At that time, memory units cost about a dollar a bit, so the cost of the memory alone was about a million dollars. Memory now costs about a hundredth of a cent a bit, so today a comparable memory would cost at retail about a hundred dollars. Evans’ program used essentially every bit in the machine’s memory, and was able to do about as well on the tests as an intelligent high-school student. Apart from the specific results of the program, what fascinated Minsky was the reactions to it. “It irritated some people a lot,” he told me. “They felt that if you could program something, then the machine was not ‘really’ doing it—that it didn’t really have a sense of analogy. I think that what Evans’ program showed was that once one came to grips with ‘intuitions’ they turned into a lot of other things. I was convinced that the way the thing worked was pretty lifelike. Until one finds a logic for the kind of thing that Evans’ program did, it looks like ‘intuition’—but that is really superficial. What we had done was to find a logic for this kind of problem-solving. What we never did do was to use a lot of statistical psychology to learn what some ‘average’ person does when solving these problems. For a long time, I had a rule in my laboratory that no psychological data were allowed. I had read a lot of such data when I was in college, and I felt that one couldn’t learn very much by averaging a lot of people’s responses. What you had to do was something like what Freud did. Tom Evans and I asked ourselves, in depth, what we did to solve problems like this, and that seemed to work pretty well.”
During the period when Evans was finishing his thesis work, Minsky and his colleagues were involved in two other kinds of projects: computer linguistics and robotics. One of the earliest non-numerical projects that were tried on computers was language translation. It was not a notable success, in part because not enough was known about syntax and in part because of the inherent ambiguity of words. Simple word-by-word translation leads to absurdities. For example, one often cannot tell a noun from a verb without an understanding of the contextual meaning—and at the time such an understanding seemed beyond the capacity of computers. In his view, Minsky told me, the notions of Noam Chomsky and others concerning the formal theory of syntax helped to clarify many of the technical issues about the structure of phrases and sentences. “But I felt that they actually distracted linguists from other basic problems of meaning and reference,” he went on. “I saw little hope for machines to deal realistically with language until we could make simple versions of programs that really understood simple sentences in simple ways. In doing this semantic-information processing, as I called it, the early A.I. community worked pretty much by itself, without the help, or hindrance, of the linguists—at least, until much later.”
The work on language resulted in two M.I.T. doctoral theses that have become widely known. In 1964, Bertram Raphael, a mathematics student, wrote, as part of his thesis, a program that would allow a computer to make decisions, in a limited domain, about the meaning of words within a given context. Raphael first gave a computer a sequence of statements:
Every boy is a person.
A finger is part of a hand.
There are two hands on each person.
He then asked it a question:
How many fingers does John have?
Up to this point, the name “John” had not been defined in the program, and the verb “have” can be used in several senses—as in “John had his dinner,” “John was had for dinner,” and “We had John to dinner.” When the computer was confronted with such a problem, Raphael’s program did not break down. Instead, the machine responded, “The above sentence is ambiguous. But I assume ‘has’ means ‘has as parts.’ ” It then asked, “How many fingers per hand?” Having been told that “John is a boy” and that each hand has five fingers, it was asked once again how many fingers John had. It now replied, “Ten.” Later, Raphael asked the machine “Who is President of the United States?” and it replied, “Statement form not recognized.”
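The part-whole chaining involved can be sketched in a few lines of Python. The dictionaries and the function below are my own toy illustration of the kind of inference the passage describes, not Raphael’s program:

    # Facts of the kinds Raphael's program accepted, as plain Python data:
    # "John is a boy," "Every boy is a person," "There are two hands on each
    # person," and (after the program asked) five fingers per hand.
    is_a = {"John": "boy", "boy": "person"}
    has_parts = {"person": [("hand", 2)], "hand": [("finger", 5)]}

    def count_parts(thing, part):
        # How many of `part` does `thing` have, chaining through the links?
        while thing not in has_parts and thing in is_a:
            thing = is_a[thing]          # climb the is-a chain: John -> boy -> person
        total = 0
        for subpart, n in has_parts.get(thing, []):
            if subpart == part:
                total += n
            else:
                total += n * count_parts(subpart, part)   # hands contain fingers
        return total

    print(count_parts("John", "finger"))   # 10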
Minsky told me that he had found Raphael’s program particularly interesting because it could tolerate contradictions. “If you told the machine that John had nine fingers, it would not break down,” he said. “It would try to build a sort of hierarchy of knowledge around this fact. In other words, given any situation, it would look for the most specific information it had about it, and attempt to use it.”
Probably the most spectacular program of this sort to be developed in the nineteen-sixties was one created by Minsky’s student Daniel Bobrow that sought to combine language and mathematics. He named it STUDENT. To keep the mathematics relatively simple, Bobrow chose to work with high-school-algebra problems. These are basically word problems, since, once the words have been translated into equations, what is involved is the solution of two, or possibly three, simultaneous equations—a snap for a computer. One of the problems posed in Bobrow’s thesis, also completed in 1964, was this:
The gas consumption of my car is 15 miles per gallon. The distance between Boston and New York is 250 miles. What is the number of gallons of gas used on a trip between New York and Boston?
The machine was programmed to make the assumption that every sentence is an equation, and was given some knowledge about certain words to help it to find the equations. For example, it knew that the word “is” often meant that the phrases on both sides of “is” represent equal amounts. It knew that “per” meant division. “The program was usually just barely good enough at analyzing the grammar of sentences to discern where phrases begin and end,” Minsky told me. “The program is driven by the possible meanings—the semantics—to analyze the syntax. From the mathematical word ‘per’ in that first sentence’s ‘miles per gallon,’ it can tell that the number fifteen would be obtained by dividing a certain number, x of miles, by some other number, y of gallons. Other than that, it hasn’t the slightest idea what miles or gallons are, or, for that matter, what cars are. The second sentence appears to say that something else equals two hundred and fifty miles—hence the phrase ‘the distance between’ is a good candidate to be x. The third sentence asks something about a number of gallons—so that phrase ‘of gas used on a trip’ is a candidate to be y. So it proposes one equation: x = 250, and another equation, x/y = 15. Then, the mathematical part of the program can easily find that y = 250/15.” Such problems are easy for STUDENT when exactly the same phrases are used for the same quantities in the different sentences. When the phrases are as different as they are in Bobrow’s problem, the program matches them up by using such tricks as seeing which have the most words in common. This didn’t always work, but, Minsky remarked, “It seemed incredible that this could work so often when so many high-school students find those problems so hard. The result of all this is a program that—on the surface, at least—can not only manipulate words syntactically but also understand, if only in a shallow way, what it is doing. It does not know what ‘gas’ is or what ‘gallons’ are, but it knows that if it takes miles per gallon and multiplies by gallons it will find the total distance. It can wander through a little of this common-sense logic and solve algebra problems that students find hard because they get balled up in the understanding of how the words work.”
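Once the phrases have been matched, the mathematics really is a snap. Here is a minimal Python sketch of the back end, with the translation from sentences to equations done by hand; the variable names stand in for the matched phrases and are my own, not Bobrow’s notation:

    from fractions import Fraction

    # "The gas consumption of my car is 15 miles per gallon."  ->  x / y == 15
    # "The distance between Boston and New York is 250 miles." ->  x == 250
    # "What is the number of gallons of gas used ... ?"        ->  solve for y
    x = Fraction(250)                  # miles, from the second sentence
    miles_per_gallon = Fraction(15)    # from "per" in the first sentence
    y = x / miles_per_gallon           # gallons, the quantity asked for
    print(y, float(y))                 # 50/3, about 16.7 gallons

The hard part, as the passage makes clear, is the phrase matching that produces those two equations in the first place.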
Minsky went on, “I am much more interested in something like this than I am in one of those large performances in which a machine beats, for example, a chess master. In a program like Bobrow’s or Raphael’s, one has cases in which the skills required appear to be rather obscure but can nonetheless be analyzed. In some sense, the performance of the machine here is childish; but this impresses me more than when a computer does calculus, which takes a kind of expertise that I think is fundamentally easy. What children do requires putting together many different kinds of knowledge, and when I see a machine that can do something like that it’s what impresses me most.”
While Minsky has always had a great fondness for robots, he came to the conclusion rather early that from the point of view of laboratory experiments making a robot mobile was more trouble than it was worth. “I thought that there were enough problems in trying to understand hands and eyes, and so forth, without getting into any extra irrelevant engineering,” he told me. “My friends at the Stanford Research Institute decided in the mid-sixties to make their first robot mobile—against my advice.”
In 1962, Henry Ernst, who was studying with both Minsky and Claude Shannon, made the Artificial Intelligence Group’s first computer-controlled robot. It was a mechanical arm with a shoulder, an elbow, and a gripper—basically, the kind of arm that is used to manipulate radioactive materials remotely. The arm was attached to a wall and activated by several motors, which, in turn, were controlled by a computer. The robot’s universe of discourse consisted of a box and blocks that were set out on a table. It had photocells in the fingertips of the gripper. The hand would come down until it was nearly in contact with the surface of the table, and then, when the photocells sensed the darkness of the hand’s shadow, its program would tell it to stop. It would thereupon begin to move sidewise until it came into contact with a block or the box. It could tell the difference, because if the object was less than three inches long it was a block and if it was more than three inches long it was the box. The program would then direct the arm to pick up the block and put it in the box. The arm could find all the blocks on a table and put them into the box. “It was sort of eerie to watch,” Minsky recalled. “Actually, the program was way ahead of its time. I don’t know if we appreciated then how advanced it was. It could deal with the unexpected. If something that it didn’t expect happened, it would jump to another part of its program. If you moved the box in the middle of things, that wouldn’t bother it much. It would just go and look for it. If you moved a block, it would go and find another one. If you put a ball on the table, it would try to verify that it was a block. Incidentally, when Stanley Kubrick was making his film ‘2001’ he asked me to check the sets to see if anything he was planning to film was technically impossible. I drew a sketch for Kubrick of how mechanical hands on the space pod might work. When I saw the film, I was amazed that M-G-M had been able to make better mechanical hands than we could. They opened the spaceship’s airlock door fantastically well. Later, I learned that the hands didn’t really work, and that the door had been opened by a person concealed on the other side.”
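The decision logic Minsky describes fits in a few lines. The following is a schematic sketch in Python; the arm’s sensor and motor calls (lower_until_shadow, sweep_sideways, and the rest) are hypothetical names standing in for Ernst’s 1962 hardware interface, and only the branching follows the account above.

    # A schematic sketch of the block-clearing loop described in the text.
    # The arm's sensor and motor calls are hypothetical names, not Ernst's code.

    BOX_THRESHOLD_INCHES = 3.0

    def classify(length_inches):
        # Anything under three inches long is a block; anything longer is the box.
        return "block" if length_inches < BOX_THRESHOLD_INCHES else "box"

    def clear_table(arm):
        box_position = None
        while True:
            arm.lower_until_shadow()           # photocells sense the hand's own shadow
            thing = arm.sweep_sideways()       # move sidewise until something is touched
            if thing is None:
                break                          # nothing left on the table
            if classify(thing.length) == "box":
                box_position = thing.position  # note where the box is (it may have moved)
            else:
                if box_position is None:
                    box_position = arm.search_for_box()   # the unexpected case
                arm.pick_up(thing.position)
                arm.drop_at(box_position)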
In the mid-nineteen-sixties, Minsky and Papert began working together on the problem of vision. These efforts ultimately produced a program created by Minsky in collaboration with a group of hackers—Gerald Sussman, William Gosper, Jack Holloway, Richard Greenblatt, Thomas Knight, Russell Noftsker, and others—that was designed to make a computer “see.” To equip the computer for sight, Minsky adapted some television cameras. He found that the most optically precise one had been invented in the early nineteen-thirties by Philo Farnsworth, who was one of the early television pioneers. It was still being manufactured by ITT. Minsky ordered one and managed to get it working, but it kept blurring. He telephoned the company and was told that the best thing to do would be to talk to Farnsworth himself, who was still doing research at the company. Minsky explained his problem on the telephone, and Farnsworth instantly diagnosed it. Minsky then fixed the blurring and attached the camera to a PDP-6 computer. The idea was to connect this camera to an arm so that one could tell the computer to pick up objects that its eye had spotted and identified. The arm was then to do various things with the objects. In the course of this, Minsky designed a mechanical arm, powered by fourteen musclelike hydraulic cylinders. It had a moving shoulder, three elbows, and a wrist—all not much thicker than a human arm. When all the bugs were finally out and the machine was turned on, the hand would wave around until the eye found it. “It would hold its hand in front of its eye and move it a little bit to see if it really was itself,” Minsky said. The eye had to find itself in the coördinate system of the hand. Despite all the problems, they were able to get the arm to catch a ball by attaching a cornucopia to the hand, so that the ball would not fall out. It would sometimes try to catch people, too, so they finally had to build a fence around it.
The project turned out to be much more difficult than anyone had imagined it would be. In the first place, the camera’s eye, it was discovered, preferred to focus on the shadows of objects rather than on the objects themselves. When Minsky and his colleagues got that straightened out, they found that if the scene contained shiny objects the robot would again become confused and try to grasp reflections, which are often the brightest “objects” in a scene. To solve such problems, a graduate student named David Waltz (now a professor of electrical engineering at the University of Illinois at Urbana) developed a new theory of shadows and edges, which helped them eliminate most of these difficulties. They also found that conventional computer-programming techniques were not adequate. Minsky and Papert began to try to invent programs that were not centralized but had parts—heterarchies—that were semi-independent but could call on one another for assistance. Eventually, they developed these notions into something they called the society-of-the-mind theory, in which they conjectured that intelligence emerges from the interactions of many small systems operating within an evolving administrative structure. The first program to use such ideas was constructed by Patrick Winston—who would later succeed Minsky as director of the A.I. Laboratory. And by 1970 Minsky and his colleagues had been able to show the computer a simple structure, like a bridge made of blocks, and get the machine, on its own, to build a duplicate.
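The idea of a heterarchy—semi-independent parts that can call on one another, with no fixed chain of command—can be suggested in a small sketch. The specialists and the scene data below are my own illustration, not the vision programs described above:

    specialists = {}    # semi-independent parts, registered by name

    def specialist(name):
        def register(fn):
            specialists[name] = fn
            return fn
        return register

    def ask(name, *args):
        # Any specialist may call on any other for assistance.
        return specialists[name](*args)

    @specialist("is_shadow")
    def is_shadow(edge):
        return edge.get("brightness_step", 0) < 10    # a made-up threshold

    @specialist("find_edges")
    def find_edges(scene):
        # When an edge looks doubtful, ask the shadow specialist for help.
        return [e for e in scene["edges"] if not ask("is_shadow", e)]

    scene = {"edges": [{"brightness_step": 40}, {"brightness_step": 3}]}
    print(ask("find_edges", scene))    # keeps the real edge, drops the shadow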
At about the same time, one of Papert’s students, Terry Winograd, who is now a professor of computer science and linguistics at Stanford, produced a system called SHRDLU. (On Linotype machines, operators used the phrase “ETAOIN SHRDLU” to mark a typographical error.) SHRDLU was probably the most complicated computer program that had ever been written up to that time. The world that Winograd created for his SHRDLU program consisted of an empty box, cubes, rectangular blocks, and pyramids, all of various colors. To avoid the complications of robotics, Winograd chose not to use actual objects but to have the shapes represented in three dimensions on a television screen. This display was for the benefit of the people running the program and not for the machine, which in this case was a PDP-10 with a quarter of a million words of memory. The machine can respond to a typed command like “Find a block that is taller than the one you are holding and put it into the box” or “Will you please stack up both of the red blocks and either a green cube or a pyramid?” When it receives such a request, an “arm,” symbolized by a line on the television screen, moves around and carries it out. The programming language was based on one named PLANNER, created by Carl Hewitt, another of Papert’s students. PLANNER, according to Minsky, consists largely of suggestions of the kind “If a block is to be put on something, then make sure there is room on the something for the block to fit.” The programmer does not have to know in advance when such suggestions will be needed, because the PLANNER system has ways to detect when they are necessary. Thus, the PLANNER assertions do not have to be written in any particular order—unlike the declarations in the ordinary programming languages—and it is easy to add new ones when they are needed. This makes it relatively easy to write the language, but it also makes it extremely difficult to anticipate what the program will do before one tries it out—“so hard,” Minsky remarked, “that no one tries to use it anymore.” He added, “But it was an important stepping stone to the methods we use now.” One can ask SHRDLU to describe what it has done and say why it has done it. One can ask “Can a pyramid be supported by a block?” and it will say “Yes,” or ask “Can the table pick up blocks?” and it will say “No.” It is sensitive to ambiguities. If one asks it to pick up a pyramid—and there are several pyramids—it will say “I don’t understand which pyramid you mean.” SHRDLU can also learn, to a certain extent. When Winograd began a question “Does a steeple—” the machine interrupted him with “Sorry, I don’t know the word ‘steeple.’ ” It was then told that “a steeple is a stack which contains two green cubes and a pyramid,” and was then asked to build one. It did, discovering for itself that the pyramid has to be on top. It can also correctly answer questions like “Does the shortest thing the tallest pyramid’s support supports support anything green?” Still, as Douglas Hofstadter, in his book “Gödel, Escher, Bach,” points out, SHRDLU has limitations, even within its limited context. “It cannot handle ‘hazy’ language,” Hofstadter says. If one asks it, for example, “How many blocks go on top of each other to make a steeple?” the phrase “go on top of each other”—which, despite its paradoxical character, makes sense to us—is too imprecise to be understood by the machine. We use phrases like this all the time without being conscious of how peculiar they are when they’re analyzed logically.
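The flavor of a PLANNER “suggestion” can be imitated in a few lines of Python. The decorator-and-list scheme below is my own toy stand-in for Hewitt’s language, not its actual syntax; it shows only the idea that advice is attached to a goal and invoked whenever that goal arises, in no particular order:

    advice = []    # (goal, procedure) pairs, stored in no particular order

    def when_goal(goal):
        def register(procedure):
            advice.append((goal, procedure))
            return procedure
        return register

    @when_goal("put_on")
    def make_room(world, block, support):
        # "If a block is to be put on something, then make sure there is
        # room on the something for the block to fit."
        while world["free_space"][support] < world["size"][block]:
            moved = world["on_top_of"][support].pop()
            world["free_space"][support] += world["size"][moved]
            print("moving", moved, "off", support)

    def achieve(goal, world, *args):
        # The system, not the programmer, decides when each piece of advice applies.
        for g, procedure in advice:
            if g == goal:
                procedure(world, *args)

    world = {"size": {"red_block": 2, "green_cube": 1, "pyramid": 1},
             "free_space": {"big_block": 0},
             "on_top_of": {"big_block": ["green_cube", "pyramid"]}}
    achieve("put_on", world, "red_block", "big_block")   # clears the top first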
What are we to make of a program like SHRDLU? Or of one like HEARSAY—designed by Raj Reddy, a former student of McCarthy’s at Stanford and now a professor of computer science at Carnegie-Mellon—which, on a limited basis, began to understand speech? Do these programs bring us closer to understanding how our minds work, or are they too “mechanistic” to give us any fundamental insight? Or is what they show, perhaps, that the closer we get to making machine models of ourselves, the less we begin to understand the functioning of the machines? Minsky and I discussed these questions at length, and also the question of whether the fact that we are beginning to learn to communicate with machines might help to teach us to communicate with one another or whether our sense of alienation from the machines will grow as they begin to perform more and more in domains that we have traditionally reserved for ourselves.
What would it mean to understand the mind? It is difficult to believe that such an understanding would consist of an enumeration of the brain’s components. Even if we had a diagram that included every one of the billions of neurons and the billions of interconnections in the human brain, it would stare at us as mutely as the grains of sand in a desert. But this is the way—for most of us, at least—that a high-resolution microscope photograph of a silicon computer chip appears. Such a photograph, though, does not truly show the components—the atoms and the molecules. On the chip, these have been organized into functional units—memory, logic circuits—which can be understood and described. Minsky and others working in this field believe that in time the functional parts of the brain will be identified and their function described in language we can understand. Still, these people seem certain that the description, whatever it turns out to be, will not be like the great unifying descriptions in physics, in which a single equation, or a few equations, derived from what appear to be almost self-evident principles, can describe and predict vast realms of phenomena. Will the ultimate description of the brain resemble the description of a machine—in particular, that of a computer? The nervous systems of living organisms have been evolving on earth for more than three billion years, while the computer revolution has taken place only over the past forty years. We simply do not yet fully understand what computers can be made to do, and until this is clearer we cannot be sure what the final comparisons between mind and computer may be.
In the meantime, how should we view the machines we do have? Minsky and Papert, among others, see in the machines a great new opportunity for changing our methods of education. This is not because the machines can do arithmetic, say, better than we can but because “the computer provides a more flexible experience than anything else a child is likely to encounter,” Minsky said. He went on, “With it, a child can become an architect or an artist. Children can now be given resources for dealing with complex systems—resources that no one has ever had before. That’s one side of it. On the other side, dealing with a computer—as Seymour and I see it, at least—allows a child to have a whole new set of attitudes toward making mistakes, or what we call finding bugs. We have not been able to devise any other term for it. This attitude does not seem to get taught in schools, even though the concern there is to teach the truth. To really understand a mechanism—a piece of clockwork, for example—what you have to understand is what would happen if there were, for example, a tooth missing from a gear. In this case, part of the mechanism might spin very fast and set off a long chain of things that could end with the clock’s smashing itself to bits. To understand something like this, you must know what happens if you make a perturbation around the normal behavior—do the sort of thing that physicists do in what they call perturbation theory. We call this kind of information knowledge about bugs. Traditionally, such encounters are looked on as mistakes—something to be avoided. Seymour wanted to develop a working place for a child in which it would be a positive achievement when a child could find the things that could go wrong. If you know enough of those things, you get close to something like the truth. This is what happens with children who use computers in schoolroom environments that Seymour has set up, and in this the computers are essential, since their behavior is so flexible.”
Minsky continued, “We hope that when a child does something that does not quite work out he will say, ‘Oh, isn’t it interesting that I came out with this peculiar result. What procedure in my head could have resulted in something like this?’ The idea is that thinking is process and that if your thinking does something you don’t want it to do you should be able to say something microscopic and analytical about it, and not something enveloping and evaluative about yourself as a person. The important thing in refining your own thought is to try to depersonalize your interior; it may be all right to deal with other people in a vague, global way by having ‘attitudes’ toward them, but it is devastating if this is the way you deal with yourself.”
In the last few years, Minsky’s thoughts have ranged from the use of robotics both on earth and in space—he thinks that with a relatively small amount of technical improvement in robots automatic factories in space would be feasible—to the development of the human mind and its ability to cope with paradoxes. “Children’s innate learning mechanisms do not mature for a long time,” he said during one of my talks with him. “For example, a child usually doesn’t completely learn spatial perspective until he is about ten. If one is seated at a table with a six-year-old and there are several objects on the table and the child is asked to draw them not from the point of view of what he sees but from the point of view of someone who is sitting opposite him, the child will get the perspective wrong. Children won’t begin to get this right until they are ten or twelve. I suspect that this is one of many instances in which the computational ability to do many things, while it may be built in from the beginning, is not dispensed to you until later in life. It is like memory. Most of your memory capacity is very likely not available to you when you are a baby. If it were, you might fill it up with childish nonsense. The genetics is probably arranged to add computational features as you grow, whatever they may be—push down stacks, interrupt programs, all the kinds of things that computer scientists talk about. The hardware for these things is probably built in, but it makes more sense not to give them to the infant right away. He has to learn to use each of the pieces of machinery reliably before he is given the next one. If he were given too many at once, he would ruin them or make no use of them.”
Minsky paused, and then went on, “There is another side to this which occurred to me recently. I have often wondered why most people who learn a foreign language as adults never learn to speak it without an accent. I made up a little theory about that. What is a mother trying to do when she talks with her baby? What is her goal? I don’t think that it is to teach the baby English or some other adult language. Her goal is to communicate with the baby—to find out what it wants and to talk it out of some silly demands that she can’t satisfy. If she could really imitate the baby—speak its language without an accent—she would. But she can’t. Children can learn to speak their parents’ language without an accent, but not vice versa. I suspect there is a gene that shuts off that learning mechanism when a child reaches sexual maturity. If there weren’t, parents would learn their children’s language, and language itself would not have developed. A tribe in which adults lost their ability to imitate language at sexual maturity would have an evolutionary advantage, since it could develop a continuous culture, in which the communication between adult and child went in the right direction.
“There is something else that is interesting about children, and that is their attitude toward logical paradoxes. I have often discussed Zeno’s paradox with little kids. I ask a kid to try to walk halfway to a wall, and the kid does it. Then I say, ‘Now walk halfway from where you are now to the wall,’ and then I ask him what would happen if he kept that up. ‘Would you ever get to the wall?’ If the child appreciates the problem at all, what happens is that he says, ‘That is a very funny joke,’ and he begins to laugh. This seems to me to be very significant. It reminds me of the Freudian theory of humor. Something that is funny represents a forbidden thought that gets past the censor. These logical paradoxes are cognitively traumatic experiences. They set up mental oscillations that are almost painful—like trying to see both sides of the liar paradox: ‘The sentence that you are now reading is false.’ These intellectual jokes represent the same sort of threat to the intellect that sexy or sadistic jokes do to the emotions. The fact that we can laugh at them is valuable. It enables us to get by with an inconsistent logic.”
Minsky concluded, “To me, this is the real implication of Gödel’s theorem. It says that if you have a consistent mathematical system, then it has some limitations. The price you pay for consistency is a certain restrictiveness. You get consistency by being unable to use certain kinds of reasoning. But there is no reason that a machine or a mathematician cannot use an inconsistent system of logic to prove things like Gödel’s theorem and even understand that, just as Gödel did. I do not think that even Gödel would have insisted that he was a perfectly consistent system that never made a logical error—although, as far as I know, he never published one. If I am doing mathematical logic, I take great pains to work within one of those logical systems which are believed to be foolproof. On the other hand, as a working mathematician, I behave quite differently in everyday life. The image I have is that it is like ice skating. If you live in a conscientious community that does not try to prohibit everything, it will place red flags where the ice is thin, to tell you to be careful. When you are doing mathematics and you begin to discover that you are working with a function that has a peculiar behavior, you begin to see red flags that tell you to be careful. When you come to a sentence that says it’s false or you come to sentences that appear to be discussing things that resemble themselves, you get nervous. You say to yourself, ‘As a mathematician, I am on thin ice now.’ My view of mathematical thinking is like Freud’s view of everyday thinking. We have in our subconscious a number of little demons, or little parasites, and each of them is afraid of something. Right now, I am working on the society-of-the-mind theory. I believe that the way to understand intelligence is to have some parts of the mind that know certain things, and other parts of the mind that know things about the first part. If you want to learn something, the most important thing to know is which part of your mind is good at learning that kind of thing. I am not looking so much for a unified general theory. I am looking for an administrative theory of how the mind can have enough parts and know enough about each of them to solve all the problems it confronts. I am interested in developing a set of ideas about different kinds of simple learning machines, each one of which has as its main concern to learn what the others are good at. Eventually, I hope to close the circle, so that the whole thing can figure out how to make itself better. That, at least, is my fantasy.” ♦
If one thoroughly understands a machine or a program, he finds no urge to attribute “volition” to it. If one does not understand it so well, he must supply an incomplete model for explanation. Our everyday intuitive models of higher human activity are quite incomplete, and many notions in our informal explanations do not tolerate close examination. Free will or volition is one such notion: people are incapable of explaining how it differs from stochastic caprice but feel strongly that it does. I conjecture that this idea has its genesis in a strong primitive defense mechanism. Briefly, in childhood we learn to recognize various forms of aggression and compulsion and to dislike them, whether we submit or resist. Older, when told that our behavior is “controlled” by such-and-such a set of laws, we insert this fact in our model (inappropriately) along with other recognizers of compulsion. We resist “compulsion,” no matter from “whom.” Although resistance is logically futile, the resentment persists and is rationalized by defective explanations, since the alternative is emotionally unacceptable.
Later in the paper, Minsky writes:
When intelligent machines are constructed, we should not be surprised to find them as confused and as stubborn as men in their convictions about mind-matter, consciousness, free will, and the like. For all such questions are pointed at explaining the complicated interactions between parts of the self-model. A man’s or a machine’s strength of conviction about such things tells us nothing about the man or about the machine except what it tells us about his model of himself.
I have known Minsky for more than thirty years. When I first met him, in the late nineteen-forties, at Harvard, it was not entirely clear what his major academic field was—or, perhaps, what it wasn’t. He was taking courses in musical composition with the composer Irving Fine. Although he was an undergraduate, he had his own laboratories—one in the psychology department and one in the biology department—and he was writing what turned out to be a brilliant and original senior mathematics thesis on a problem in topology. For all his eclecticism, however, his basic interest seemed to be in the workings of the human mind. When he was a student, he has said, there appeared to him to be only three interesting problems in the world—or in the world of science, at least. “Genetics seemed to be pretty interesting, because nobody knew yet how it worked,” he said. “But I wasn’t sure that it was profound. The problems of physics seemed profound and solvable. It might have been nice to do physics. But the problem of intelligence seemed hopelessly profound. I can’t remember considering anything else worth doing.”
In later years, I had not been in touch with Minsky, but about a year ago, when I realized that something very new in the way of technology was engulfing us, I decided to look him up and ask him about it. I knew that he had been in the field of what is now called artificial intelligence, or A.I., even before it had a name. (The term “artificial intelligence” is usually attributed to John McCarthy, a former colleague of Minsky’s at M.I.T. McCarthy, a mathematician and now a professor of computer science at Stanford, coined the phrase in the mid-nineteen-fifties to describe the ability of certain machines to do things that people are inclined to call intelligent. In 1958, McCarthy and Minsky created the Artificial Intelligence Group at M.I.T., and it soon became one of the most distinguished scientific enterprises in the world.) During our talks, Minsky proved to be a fascinating conversationalist, with an engaging sense of humor and a luminous smile. He has one of the clearest minds I have ever encountered, and he is capable of elucidating the most complicated ideas in simple language—something that is possible only if one has a total mastery of the ideas. Our conversations took place both at his M.I.T. office and at his home, near Boston. He lives in a sprawling house with his wife, Gloria Rudisch, who is a prominent Boston pediatrician, and two of their children—Julie and Henry, eighteen-year-old twins. The Minskys’ oldest child, Margaret, who is twenty-three, graduated from M.I.T. and is now studying astronautics and designing educational programs for home computers.
That a doctor lives in the Minsky house one might deduce from various books and medical supplies at hand, but the interests of the other residents would be a real challenge to figure out. On a table during one of my visits I noticed a fireman’s hat with a red light on it, and, on another table, a sizable plastic shark. Mounted on a wall was a wrench so large that at first I took it for a playful sculpture of a wrench. On the wall near the wrench was what appeared to be a brass alpenhorn—one of several musical instruments in the house, the others being three pianos, two organs, and a Moog synthesizer. Minsky spends many hours composing and improvising, and hopes to make a record of some fugues he has composed in the baroque style. There were also innumerable recording instruments and a huge jukebox—a present from Minsky to his wife.
Minsky’s study, a crowded place, contains a computer terminal; a number of researchers in A.I. all around the country can exchange messages with one another over a computer network they established in 1969. Several times while I was there, Minsky paused to read his “mail”—messages on the terminal’s printout system. Near the telephone is a machine that I naïvely thought might be a stereo set. When Minsky saw me looking at it, he asked if I would like to listen to it. He flipped a few switches, and the machine began to make an uncanny series of ever more complex musical sounds. Minsky told me that some years ago he had taken a box of computer modules home to use in constructing logic circuits. He was having trouble debugging the circuits, because he did not have an oscilloscope—an instrument that renders the behavior of the circuits visible on a screen—and it occurred to him that if he ran computing circuits very fast and wired them to a loudspeaker he might be able to listen to them and tell by the sound if something was wrong. “I connected a couple of speakers to the circuits,” Minsky told me. “And I found that by listening to them I could tell if any of the flip-flops were dead.” Flip-flops are electronic components that can take one of two stable positions. “The machine was making all those sounds, and I started to like them. So I set up various circuits to make little chords and tunes. This thing was going one day when a friend of mine named Edward Fredkin, who’s a professor of computer science at M.I.T., came in, and he said, ‘That sounds pretty good. How did you get it to make those sounds?’ I showed him, and we spent the afternoon making more sounds. Fredkin formed a company to manufacture the machines as toys.”
Minsky’s office in the Artificial Intelligence Laboratory at M.I.T. is equally crowded. There is a plastic statue of a robot. There is a surprisingly lifelike cloth plant. There is also the inevitable computer terminal. The lab has its own large computer, which, over the years, has been rigged with just about every bit of programming anyone could think of. It can open doors in the lab and summon the elevators in the building; it has had mechanical arms attached to it, and special television cameras, to simulate vision, and a radio transmitter, to operate remote-controlled robots. There is also a trophy on it for a chess tournament it once won. Initially, the laboratory was in a ramshackle building that housed a Second World War electronics laboratory, but since 1963 it has been housed on three floors of a modern nine-story building overlooking Technology Square, just across the street from the main M.I.T. campus. About a hundred people work in it, including seven professors, most of them former students of Minsky’s; some twenty-five graduate students; and a corps of people whom Minsky refers to affectionately as hackers. These hackers—computer scientists call an elegant bit of programming a hack—are mostly people who entered M.I.T. and became infatuated with computers. Some never bothered to get their bachelor’s degree, but several have gone on to acquire advanced degrees.
One day, Minsky took me on a tour of the A.I. Laboratory, and explained something of its evolution. When he and McCarthy formed the Artificial Intelligence Group, it consisted only of them and a couple of students. About a year later, when Minsky and McCarthy were talking in a hallway at M.I.T., Jerome Wiesner, who was then directing the school’s Research Laboratory of Electronics, happened by and asked them what they were working on. He found their answers so interesting—McCarthy was initiating a system of time-sharing for computers and was also creating a new and extremely sophisticated computer language, and Minsky was beginning his attempts to get computers to do non-numerical things, such as reasoning by analogy—that he asked them if they needed money for their work. They said they could use a little money for equipment and for students. Not long before, Wiesner had received a joint grant from the armed services to do scientific research, so he was able to provide the money they needed. For some years, they never once had to write a research proposal. Things changed, though, and the laboratory now gets its money—some two and a half million dollars a year—from various government agencies, which require written proposals. In 1968, when the group formally became the Artificial Intelligence Laboratory, Minsky became its director—a job he held until 1973, when he got tired of writing the funding proposals and turned the directorship over to Patrick Winston, one of his former students.
On my tour of the lab, I noticed a giant drawing—perhaps six feet by fifteen—of what I thought at first might be the street plan of a large city. The drawing was taped to a wall on the eighth floor. Minsky told me that it was an engineering drawing of a computer chip, and that the lines were circuits that were photoengraved on the wafer. In fact, the drawing was of the circuitry on a chip that was an essential part of the first microcomputer designed expressly for artificial-intelligence work. That computer was designed by Gerald Sussman—a former student of Minsky’s, who is now a professor of electrical engineering at M.I.T.—and some of his students. Minsky then took me down to the third floor to see the actual chip. It was less than half an inch square, or roughly a hundred thousand times smaller than its circuitry diagram, and we had to put it under a microscope just to see the circuitry lines. On computer chips, a transistor exists where two circuitry lines cross; each transistor is about seven micrometres across—about the size of a red blood cell. The next generation of transistors will be only a fourth as large.
While we were on the third floor, Minsky also showed me a computer that he had designed and built. In 1970, he became convinced that a computer that could produce animated visual displays would be an extremely valuable aid in schools. “Even young children become deeply engaged with ideas about computers when they can literally see what they are doing by creating moving pictures on a screen,” Minsky told me. “So I designed this computer capable of making two million dots a second on the screen—enough for realistic animation effects.” By comparison, typical hobby computers can draw only a few thousand dots a second. Moreover, they cannot display a whole book-size page of text, so Minsky included a second screen on his computer which had room for six thousand alphabetical characters, so that children could edit their compositions on it. One fifth grader in Lexington, Massachusetts, programmed a garden of flowers that appeared on the screen and grew according to laws that the child wrote into his program.
Minsky called his computer the 2500, because he thought that its price for schools would be twenty-five hundred dollars. For a year, he immersed himself in its design, and learned to read circuit diagrams as if they were novels. “By the time I finished, I knew what happened in about two hundred different kinds of computer chips,” he told me. His work was helped along by the work of the Artificial Intelligence Laboratory at Stanford, which John McCarthy had formed in 1963: it had developed programs that automatically analyzed circuit diagrams for short circuits and other flaws. Using these programs on his own computer console, Minsky sat in his office and designed his machine. It needed some three hundred chips, which he ordered from the Texas Instruments catalogue. Its circuits required twenty-four pages of drawings. “Wiring a computer used to be a huge task,” Minsky noted. “But in this case my son, Henry, and I were able to do it ourselves by making use of a computer program that some of my friends at Stanford had written. It automatically did the most repetitive parts of the design and checked for mistakes. The best part was that we could put the whole thing on magnetic tape, which could be read by an automatic wiring machine that actually made the connections on the back of a huge panel of chip sockets. When that was done, we had to plug the three hundred chips in and attach power supplies, keyboards, and television screens. It wasn’t all that easy, but it did prove that a small group of people—together with a helpful computer-design program—could do better than a large industrial-design division.” By this time, Seymour Papert, a South African mathematician, had come to the A.I. Laboratory. In the late fifties, Papert had been working in Jean Piaget’s renowned child-psychology laboratory, in Geneva, and he had a professional interest in the education of children. He made a special mathematical language for the machine—one that he and Minsky thought children might like—called the LOGO language. Minsky showed me how to use it to get the machine to draw all sorts of polygons on its display screen and make some of them rotate like propellers. He hadn’t used the program for a while, and at one point he was stopped by a display that read, “POLY WANTS MORE DATA.” The data were supplied.
In the early seventies, Minsky and Papert formed a small company to market the machines, but within a few years it went broke. “Seymour and I weren’t very good at getting people to part with their money,” Minsky said. Many schools seemed to like the machine, but they often took as long as three years to come up with the money, and the company was hurt by the delays. “Our company ran out of money, because we had not realized how much time it would take for teachers to persuade school boards to plan budgets for such things,” Minsky told me. “Finally, we gave the company to a Canadian friend, who found that business people could learn Papert’s LOGO language as easily as children could. His company became successful—but in the field of business-data processing. Seymour and I went back to being scientists.” In the past year, the people working on LOGO have managed to find ways of programming it into some of the popular home computers, and Minsky and Papert are again trying to make it available to children, since the machines have now become cheap enough for schools to buy. In a few years, Minsky thinks, they should become as powerful as his original 2500.
Minsky was born in New York City on August 9, 1927. His father, Henry Minsky, was an eye surgeon who was also a musician and a painter, and he became head of the department of ophthalmology at Mount Sinai Hospital; his mother, Fannie, has been active in Zionist affairs. Minsky has two sisters—Ruth, who is younger, and Charlotte, who is older. Charlotte is an architect and a painter, and Ruth is a genetics counsellor for the Committee to Combat Huntington’s Disease. Minsky is like many other gifted mathematicians in that he can find no trace of a mathematical bent in his background, and also in that he has mathematical memories that go back to his earliest childhood. He recalls taking an intelligence test of some sort when he was about five. One of the questions was what the most economical strategy was for finding a ball lost in a field where the grass was so tall that the ball could not be seen immediately. The standard answer was to go to the center of the field and execute a spiral from the center until the ball was found. Minsky tried to explain to the tester that this was not the best solution, since, because you would have had to cross part of the field to get to the center in the first place, it would involve covering some of the area twice. One should start from the outside and spiral in. The memory of being unable to convince the tester of what appeared to Minsky to be an obvious logical point has never left him. “Everyone remembers the disillusion he experienced as a child on first discovering that an adult isn’t perfect,” he said recently. The five-year-old Minsky must have made a favorable impression nonetheless, since on the basis of the test results he was sent to an experimental public school for gifted children. He hated the school, because he was required to study tap dancing. Soon after his enrollment, however, his parents moved from Manhattan to Riverdale, in the Bronx, and he entered a public school there. He also disliked that one. “There were bullies, and I was physically terrorized,” Minsky told me. “Besides, a teacher wanted me to repeat the third grade because my handwriting was bad. My parents found this unreasonable, so in 1936, when I was in the fourth grade, I was sent to Fieldston—a progressive private school.
J. Robert Oppenheimer had graduated from Fieldston (the branch on Central Park West, then named the Ethical Culture School) in 1921, and while Minsky was there the memory of Oppenheimer’s student days was still fresh. “If you did anything astonishing at Fieldston, some teacher would say, ‘Oh, you’re another Oppenheimer,’ ” Minsky recalled. “At the time, I had no idea what that meant. Anyway, at Fieldston I had a great science teacher—Herbert Zim. Later on, he wrote a whole series of science books for children. He lives in Florida now, and I call him up every once in a while to chat.”
By the time Minsky was in the fifth grade, he had become interested in both electronics and organic chemistry. “I had been reading chemistry books, and I thought it would be nice to make some chemicals,” he told me. “In particular, I had read about ethyl mercaptan, which interested me because it was said to be the worst-smelling thing around. I went to Zim and told him that I wanted to make some. He said, ‘Sure. How do you plan to do it?’ We talked about it for a while, and he convinced me that if we were going to be thorough we should first make ethanol, from which we were to make ethyl chloride. I did make the ethanol and then the ethyl chloride, which instantly disappeared. It’s about the most volatile thing there is. I think Zim had fooled me into doing this synthesis knowing that the product would evaporate before I actually got to make that awful mercaptan. I remember being sort of mad, and deciding that chemistry was harder than it looked on paper, because when you synthesize something it can just disappear.”
Minsky finished the eighth grade at Fieldston in 1941, and in the fall of that year he entered the Bronx High School of Science. Bronx Science had been created just three years before to attract and train young people interested in the sciences. (Two of the 1979 Nobel laureates in physics—Steven Weinberg and Sheldon Glashow—were classmates at Bronx Science in the late forties, along with Gerald Feinberg, who is now the chairman of the physics department at Columbia, and during their senior year there the three taught themselves quantum mechanics.) “The other kids were people you could discuss your most elaborate ideas with and nobody would be condescending,” Minsky said in recalling his experience there. “Talking to people in the outside world was always a pain, because they would say, ‘Don’t be so serious—relax.’ I used to hate people saying ‘Relax.’ I was a hyperactive child—always zipping from one place to the next and doing things very fast. This seemed to bother most adults. But no one at Science felt that way. Later, when I went to Harvard, I was astonished at how much easier the course work was there than it had been at Science. I keep running across people I knew at Science—including Russell Kirsch, a computer pioneer who’s now at the National Bureau of Standards, and Anthony Oettinger, who’s a professor of applied mathematics and information-resources policy at Harvard. He was one of the first people to get a computer to learn something and to use computers for language translation. Frank Rosenblatt, who was tragically drowned in a boating accident in 1971, was also one of my classmates at Science.” Rosenblatt, who was a pioneer in artificial intelligence, became a researcher at the Cornell Aeronautical Laboratory, where he invented what was called the Perceptron. In the early nineteen-sixties, the Perceptron became the prototypical artificial-intelligence machine for a generation of young computer scientists.
In 1944, Minsky’s parents decided to send him to Andover for his senior year, reasoning that it would be easier for him to get into college as an Andover graduate. The year at Andover left Minsky with mixed feelings, because he found he was not permitted to devote himself exclusively to science. When he finished his year there, it was June of 1945 and he was seventeen. The war was still on, and he enlisted in the Navy. He had been told that if he enlisted in a particular Navy program he would be sent to electronics school. “Everybody was a bit suspicious about such a promise, since we felt that you couldn’t trust the government in something like that,” he recalled. “But it turned out to be true, and I was sent to the Great Lakes Naval Training Center, north of Chicago, to start my training. There were about a hundred and twenty people in my company, and most of them seemed very alien and rather scary. They were regular recruits from the Midwest and places like that. I could hardly understand what they said, and they certainly couldn’t understand what I was talking about. They provided my first—and, essentially, my last—contact with nonacademic people. But about forty of the people in my company were enrolled in the same sort of electronics program that I was. After we completed basic training, which involved firing rifles and anti-aircraft guns, we were going to be sent to radar school. There were maybe four people in my company who were really remarkable, including a mathematician, an astronomer, and a young musician named David Fuller. Fuller had been at Harvard for a year and was an organist. He took my music very seriously. By this time, I had sort of drafted a piano concerto, which Fuller liked a lot and said I should finish. I never did, though. Our little group was a strange kind of mini-Harvard in the middle of the Navy. Everything seemed very unrealistic. I practiced shooting down planes on an anti-aircraft simulator. I held the base record. I ‘shot down’ a hundred and twenty planes in a row. I realized that I had memorized the training tape and knew in advance exactly where each plane would appear. But I must have some odd skill in marksmanship. Many years later, my wife and I were in New Mexico on a trip. We came across some kids shooting at things with a rifle. I asked them if I could try it, and I hit everything. It seems that I have a highly developed skill at shooting things, for which there is no explanation.”
Minsky had been at the training center only two months when the war ended. “There really wasn’t anything for us to do, so we just spent a couple of months chatting. I finished up my term of enlistment at a naval base in Jacksonville, Florida, and then I was discharged—in time for me to go to Harvard as a freshman,” he told me.
Minsky entered Harvard in September of 1946 and found the place a revelation—a sort of intellectual garden salad, “a whole universe of things to do.” He said, “The only thing I was worried about was English, because there was a required English composition course. The thing I had always disliked most of all in school was writing. I could never think of anything to write about. Now I love to write. Anyway, they had a test that, if you passed, could get you out of the required course. I passed, and it was one of the best things that happened to me. I felt that I would not have to do the one thing I hadn’t liked in high school. In this test, we were supposed to interpret a couple of passages from Dostoevski, and in a perfectly straightforward way I explained what they were about. Apparently, whoever was reading all those things was tired of reading the long ones that the other students had done, and he passed me. I always enjoyed the challenge of school tests, but I never liked the idea of tests, so, as a professor, I have never given any. I make all my students write a paper instead. I don’t care how long it takes them. If they take a year or two, so much the better. Anyway, I took freshman physics and advanced calculus at Harvard—I had learned elementary calculus at Andover. I was nominally a physics major, but I also took courses in sociology and psychology. I got interested in neurology. Around the end of high school, I had started thinking about thinking. One of the things that got me started was wondering why it was so hard to learn mathematics. You take an hour a page to read this thing, and still it doesn’t make sense. Then, suddenly, it becomes so easy it seems trivial. I began to wonder about the learning process and about learning machines, and I invented some reinforcement theories. I came across the theories of B. F. Skinner, which I thought were terrible, because they were an attempt to fit curves to behavior without any internal ideas. Up until this time, I had been almost pathologically uninterested in how minds work. I wasn’t at all good at guessing how people felt. I think I was generally insensitive—almost intentionally insensitive—to people’s feelings and thoughts. I was interested only in what I was doing. But in my freshman year I began to get interested in psychological issues. After I had done some reading in neurology, I talked a professor of zoology, John Welsh, into letting me do some laboratory work on my own. For some reason, he gave me a huge room with a lot of equipment all to myself.”
Welsh told Minsky that one unsolved problem was how the nerves in a crayfish claw worked. “I became an expert at dissecting crayfish,” Minsky said. “At one point, I had a crayfish claw mounted on an apparatus in such a way that I could operate the individual nerves. I could get the several-jointed claw to reach down and pick up a pencil and wave it around. I’m not sure that what I was doing had much scientific value, but I did learn which nerve fibres had to be excited to inhibit the effects of another fibre so that the claw would open. And it got me interested in robotic instrumentation—something that I have now returned to. I am trying to build better micromanipulators for surgery, and the like. There hasn’t been much progress in that field for decades, and I’m determined to make some.”
When Minsky was not doing physics or working on his crayfish project, he began hanging around the psychology laboratory, which was then in the basement of Memorial Hall. “The people down in that basement fascinated me,” Minsky said. “There were Skinner and his people at the western end. While the theory that they were working with was of no interest to me, they had been able to optimize the training of animals—get them to do things in a shorter time and with less reward than anyone else could. Clearly, there was something in their technique that should be understood. At the other end of the basement were people who were also called psychologists, and who were totally removed from the sort of thing that Skinner did. For example, there was a man who was trying to show that the sensitivity of the ear operated according to a power law rather than a logarithmic one. I could never make any sense of why that was so important, and still can’t; presumably, both theories are false. In the middle of the basement were some young assistant professors who were new kinds of people. There were young George Miller—now a professor of psychology at Princeton—who was trying to make some mathematical theories of psychology, and with whom I spent lots of time, and J. C. R. Licklider, with whom I later worked. He ran a wonderful seminar at that time, mostly of graduate students, with a few undergraduates. I worked with Miller on theories of problem-solving and learning, and with Licklider on theories of perception and brain models. Many years later, I had a chance to work with Licklider again on designing computer programs. It was a whole universe in that basement, but the things that affected me most were the geometry of it and the fact that it was underground and away from the world. On the west were the behaviorists, who were trying to understand behavior without a theory; on the east were the physiological psychologists, who were trying to understand some little bit of the nervous system without any picture of the rest; and in the middle were these new people who were trying to make little theories that might have something to do with language and learning and the like but weren’t really getting anywhere. Even farther underground was the physicist Georg von Békésy. He was in the subbasement. He didn’t bother anyone but just worked on the real problem of how the ear functions.” In 1961, von Békésy became the first physicist to win a Nobel Prize in Physiology or Medicine, for his work on the ear.
Minsky paused, and then continued, “What bothered me most about the whole situation was the graduate students who were trying to learn from these people. They would gather in the middle of the basement and argue about one doctrine or another—the politics of the situation and the merits of the different schools. They never seemed to have any good ideas. There was something terrifying about this clash of two different worlds—the physiological and the behaviorist. There were no psychoanalytically oriented people around them. If there had been, the situation would have been even worse. I couldn’t fathom how these people could live down there arguing about personalities, with no methodology, no ideas about what to do, and no real theories of what was happening deep inside the mind. So I tried to make one up. I imagined that the brain was composed of little relays—the neurons—and each of them had a probability attached to it that would govern whether the neuron would conduct an electric pulse; this scheme is now known technically as a stochastic neural network. I tried to explain Skinner’s results by finding some plausible way for a reward sensor to change the probabilities to favor learning. It turned out that a man in Montreal named Donald Hebb had come up with a similar theory, but at the time, luckily or unluckily, I didn’t know of his work, which he described soon afterward in a seminal book, “The Organization of Behavior,” published in 1949. So I had a laboratory in the psychology department and one in the biology department and I was doing experimental work, mostly in physical optics, in the physics department, where I was nominally majoring. My grades were fairly low. I had also taken a number of music courses with Irving Fine. He usually gave me C’s or D’s, but he kept encouraging me to come back. He was a tremendously honest man. I think the problem was that I was basically an improviser—one of those people who can occasionally improvise an entire fugue in satisfactory form without much conscious thought or plan. The trouble is that the more I work on a piece deliberately, the worse it gets. I tried learning to write scores, but I guess I never committed myself to the effort it takes. During most of this time at Harvard, I didn’t very much care what would happen to me in the future, but then, in my senior year, I began to worry about graduate school. I thought that what I would do would be to write a nice undergraduate thesis to make up for my grades. I discovered that at Harvard you couldn’t do an undergraduate thesis in physics, so in my last semester I switched to the mathematics department, where you could do a thesis. This was not a problem, since I had taken enough mathematics courses to qualify me as a math major.”
Early in his college days, Minsky had had the good fortune to encounter Andrew Gleason. Gleason was only six years older than Minsky, but he was already recognized as one of the world’s premier problem-solvers in mathematics; he seemed able to solve any well-formulated mathematics problem almost instantly. Gleason had served in the Navy, in cryptanalysis, during the war, and then had become a junior fellow at Harvard. (The fellowships allowed unlimited freedom for a small number of creative people in various fields.) Gleason made a tremendous impression on Minsky. “I couldn’t understand how anyone that age could know so much mathematics,” Minsky told me. “But the most remarkable thing about him was his plan. When we were talking once, I asked him what he was doing. He told me that he was working on Hilbert’s fifth problem.”
In 1900, David Hilbert, who is generally regarded as the greatest mathematician of the twentieth century, delivered a paper entitled “Mathematical Problems” to the Second International Congress of Mathematicians, in Paris, in which he presented a list of what he believed to be the most important unsolved problems in mathematics. (Hilbert’s full list consisted of twenty-three problems, but he presented only ten in his talk.) This list has all but defined mathematics for much of this century. Many of the problems have now been solved. At least one of them has been shown to be insoluble in principle; it falls into Gödel’s category of formally undecidable mathematical propositions. The sixth problem—“To axiomatize those physical sciences in which mathematics plays an important role”—is probably too vague to have a real solution. Some have not yet been solved. Most of the twenty-three problems have opened up entirely new fields of mathematics. In his lecture, Hilbert said, “This conviction of the solvability of any mathematical problem is a strong incentive in our work; it beckons us: This is the problem, find its solution. You can find it by pure thinking, since in mathematics there is no Ignorabimus!” Hilbert’s fifth problem was a deep conjecture in the theory of topological groups. In mathematics, a group is a collection of abstract objects that can be combined by some operation to make a sort of multiplication table; a topological group has in addition a “topology”—a kind of generalized geometry. Minsky recalls his conversation with Gleason vividly. “First, I managed to understand what the problem was,” he told me. “Then I asked Gleason how he was going to solve it. Gleason said he had a plan that consisted of three steps, each of which he thought would take him three years to work out. Our conversation must have taken place in 1947, when I was a sophomore. Well, the solution took him only about five more years, with Deane Montgomery, of the Institute for Advanced Study, and Leo Zippin, of Queens College, contributing part of the proof. But here I was, a sophomore, talking to this man who was only slightly older than I was, and he was talking about a plan like that. I couldn’t understand how anyone that age could understand the subject well enough to have such a plan and to have an estimate of the difficulty in filling in each of the steps. Now that I’m older, I still can’t understand it. Anyway, Gleason made me realize for the first time that mathematics was a landscape with discernible canyons and mountain passes, and things like that. In high school, I had seen mathematics simply as a bunch of skills that were fun to master—but I had never thought of it as a journey and a universe to explore. No one else I knew at that time had that vision, either.”
Inspired by Gleason, Minsky began work in the fall of his senior year on an original problem in topology. Early in this century, the great Dutch mathematician L. E. J. Brouwer proved the first of what are known as fixed-point theorems. Imagine that one attempts to rearrange the surface of an ordinary sphere by taking each point on it and moving it somewhere else on the sphere. This is what mathematicians call a mapping of the surface of the sphere onto itself. Under very general assumptions, Brouwer managed to show that in any such mapping there would be at least one point that would necessarily remain fixed—it would necessarily be mapped onto itself. One example of such a mapping is the rigid rotation of a sphere—the surface of the earth, say. In this case, there are two fixed points—the north and south poles—around which the rotation takes place. Over the years, mathematicians have generalized this theorem in all sorts of surprising ways. For instance, one can use similar ideas to show that at any time there must be two points at opposite ends of the globe which have exactly the same temperature and humidity. Minsky happened to read that Shizuo Kakutani, a mathematician at Yale, had proved, essentially, that at each moment there are three points on the earth situated at the vertices of an equilateral triangle at which the temperature is the same. “I became convinced that Kakutani had not got the most general result out of his logic,” Minsky recalled. “So I proved it first for three of the four corners of a square and then for any three points of a regular pentagon. This required going into a space of a higher dimension. So I went into this higher dimension for a couple of months, living and breathing my problem. Finally, using the topology of knots in this dimension, I came out with a proof. I wrote it up and gave it to Gleason. He read it and said, ‘You are a mathematician.’ Later, I showed the proof to Freeman Dyson, at the Institute for Advanced Study, and he amazed me with a proof that there must be at least one square that has the same temperature at all four vertices. He had found somewhere in my proof a final remnant of unused logic.”
I asked Minsky if he had published his proof.
“No,” he replied. “At the time, I was influenced by the example of my father. When he made a surgical discovery, he would take six or seven years to write it up—correcting it and doing many more operations to make sure that he was right. I felt that a successful scientist might publish three or four real discoveries in his lifetime, and should not load up the airwaves with partial results. I still feel that way. I don’t like to take some little discovery and make a whole paper out of it. When I make a little discovery, either I forget about it or I wait until I have several things that fit together before I write them up. In any case, at the time Gleason said ‘You are a mathematician’ he also said ‘Therefore you should go to Princeton.’ At first, I felt rejected. I was perfectly happy at Harvard, and I didn’t see why I should go somewhere else for graduate school. But Gleason insisted that it would be wrong for me to stay in one place. So I presented myself at Princeton, to the mathematics department, the next year.”
Minsky found the mathematics department at Princeton to be another perfect world. “It was like a club,” he told me. “The department admitted only a handful of graduate students each year, mostly by invitation. It was run by Solomon Lefschetz. He was a man who didn’t care about anything except quality. There were no exams. Once, I got a look at my transcript. The graduate school required grades. Instead of the usual grades, all the grades were A’s—many of them in courses I had never taken. Lefschetz felt that either one was a mathematician or one wasn’t, and it didn’t matter how much mathematics one actually knew. For the next three years, I hung around a kind of common room that Lefschetz had created for the graduate students, where people came to play go and chess as well as new games of their own invention, and to talk about all sorts of mathematics. For a while, I studied topology, and then I ran into a young graduate student in physics named Dean Edmonds, who was a whiz at electronics. We began to build vacuum-tube circuits that did all sorts of things.”
As an undergraduate, Minsky had begun to imagine building an electronic machine that could learn. He had become fascinated by a paper that had been written, in 1943, by Warren S. McCulloch, a neurophysiologist, and Walter Pitts, a mathematical prodigy. In this paper, McCulloch and Pitts created an abstract model of the brain cells—the neurons—and showed how they might be connected to carry out mental processes such as learning. Minsky now thought that the time might be ripe to try to create such a machine. “I told Edmonds that I thought it might be too hard to build,” he said. “The one I then envisioned would have needed a lot of memory circuits. There would be electronic neurons connected by synapses that would determine when the neurons fired. The synapses would have various probabilities for conducting. But to reinforce ‘success’ one would have to have a way of changing these probabilities. There would have to be loops and cycles in the circuits so that the machine could remember traces of its past and adjust its behavior. I thought that if I could ever build such a machine I might get it to learn to run mazes through its electronics—like rats or something. I didn’t think that it would be very intelligent. I thought it would work pretty well with about forty neurons. Edmonds and I worked out some circuits so that—in principle, at least—we could realize each of these neurons with just six vacuum tubes and a motor.”
Minsky told George Miller, at Harvard, about the prospective design. “He said, ‘Why don’t we just try it?’ ” Minsky recalled. “He had a lot of faith in me, which I appreciated. Somehow, he managed to get a couple of thousand dollars from the Office of Naval Research, and in the summer of 1951 Dean Edmonds and I went up to Harvard and built our machine. It had three hundred tubes and a lot of motors. It needed some automatic electric clutches, which we machined ourselves. The memory of the machine was stored in the positions of its control knobs—forty of them—and when the machine was learning it used the clutches to adjust its own knobs. We used a surplus gyropilot from a B-24 bomber to move the clutches.”
Minsky’s machine was certainly one of the first electronic learning machines, and perhaps the very first one. In addition to its neurons and synapses and its internal memory loops, much of its network was wired at random, so that it was impossible to predict what the machine would do. A “rat” would be created at some point in the network and would then set out to learn a path to some specified end point. First, it would proceed randomly, and then correct choices would be reinforced by making it easier for the machine to make this choice again—to increase the probability of its doing so. There was an arrangement of lights that allowed observers to follow the progress of the rat—or rats. “It turned out that because of an electronic accident in our design we could put two or three rats in the same maze and follow them all,” Minsky told me. “The rats actually interacted with one another. If one of them found a good path, the others would tend to follow it. We sort of quit science for a while to watch the machine. We were amazed that it could have several activities going on at once in its little nervous system. Because of the random wiring, it had a sort of fail-safe characteristic. If one of the neurons wasn’t working, it wouldn’t make much of a difference—and, with nearly three hundred tubes and the thousands of connections we had soldered, there would usually be something wrong somewhere. In those days, even a radio set with twenty tubes tended to fail a lot. I don’t think we ever debugged our machine completely, but that didn’t matter. By having this crazy random design, it was almost sure to work, no matter how you built it.”
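The reinforcement idea at work in the machine can be suggested in a few lines of modern code. The Python sketch below is invented purely for illustration, and is not a description of the Minsky-Edmonds hardware: a simulated rat wanders a toy maze, and every choice along a path that reaches the goal has its weight, and so its probability of being repeated, nudged upward.

```python
import random

# A toy maze: each junction lists the junctions reachable from it.
# The layout is invented for illustration; it is not Minsky's machine.
MAZE = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": [],          # the goal
}

# One "synapse" per (junction, choice); a larger weight means a likelier choice.
weights = {(j, c): 1.0 for j, choices in MAZE.items() for c in choices}

def run_rat(start="A", goal="D", reward=0.5, max_steps=20):
    """Wander from the start; if the goal is reached, reinforce the path taken."""
    path, node = [], start
    for _ in range(max_steps):
        if node == goal:
            for step in path:              # reinforcement: make each successful
                weights[step] += reward    # choice a little more probable
            return path
        choices = MAZE[node]
        pick = random.choices(choices, [weights[(node, c)] for c in choices])[0]
        path.append((node, pick))
        node = pick
    return None                            # never reached the goal this run

for trial in range(200):
    run_rat()

print(sorted(weights.items(), key=lambda kv: -kv[1]))
```

After a couple of hundred runs the weights along the successful routes dominate, which is all that “reinforcement” means in this caricature.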
Minsky went on, “My Harvard machine was basically Skinnerian, although Skinner, with whom I talked a great deal while I was building it, was never much interested in it. The unrewarded behavior of my machine was more or less random. This limited its learning ability. It could never formulate a plan. The next idea I had, which I worked on for my doctoral thesis, was to give the network a second memory, which remembered after a response what the stimulus had been. This enabled one to bring in the idea of prediction. If the machine or animal is confronted with a new situation, it can search its memory to see what would happen if it reacted in certain ways. If, say, there was an unpleasant association with a certain stimulus, then the machine could choose a different response. I had the naïve idea that if one could build a big enough network, with enough memory loops, it might get lucky and acquire the ability to envision things in its head. This became a field of study later. It was called self-organizing random networks. Even today, I still get letters from young students who say, ‘Why are you people trying to program intelligence? Why don’t you try to find a way to build a nervous system that will just spontaneously create it?’ Finally, I decided that either this was a bad idea or it would take thousands or millions of neurons to make it work, and I couldn’t afford to try to build a machine like that.”
I asked Minsky why it had not occurred to him to use a computer to simulate his machine. By this time, the first electronic digital computer—named ENIAC, for “electronic numerical integrator and calculator”—had been built, at the University of Pennsylvania’s Moore School of Electrical Engineering; and the mathematician John von Neumann was completing work on a computer, the prototype of many present-day computers, at the Institute for Advanced Study.
“I knew a little bit about computers,” Minsky answered. “At Harvard, I had even taken a course with Howard Aiken”—one of the first computer designers. “Aiken had built an electromechanical machine in the early forties. It had only about a hundred memory registers, and even von Neumann’s machine had only a thousand. On the one hand, I was afraid of the complexity of these machines. On the other hand, I thought that they weren’t big enough to do anything interesting in the way of learning. In any case, I did my thesis on ideas about how the nervous system might learn. A couple of my fellow graduate students—Lloyd Shapley, a son of the astronomer Harlow Shapley, and John Nash—helped out with a few points, and occasionally I talked to von Neumann. He was on my thesis committee, along with John W. Tukey and A. W. Tucker, who had succeeded Lefschetz as chairman of the mathematics department. Later, Tucker told me that he had gone to von Neumann and said, ‘This seems like very interesting work, but I can’t evaluate it. I don’t know whether it should really be called mathematics.’ Von Neumann replied, ‘Well, if it isn’t now, it will be someday—let’s encourage it.’ So I got my Ph.D.”
That was in 1954. “I hadn’t made any definite plans about what to do after I got the degree, but, the year before, some interesting people had come along and said that they were starting a new kind of department at Tufts, which was to be called systems analysis, and that if I came I could do anything I wanted to,” Minsky said. “I wanted to get back to Boston, so I had joined them, and I finished my doctoral thesis up there. Soon afterward, Senator Joseph McCarthy made a vicious attack on several members of the group, and its funding vanished. But then Gleason came to me and said that I should be a junior fellow at Harvard. He nominated me, and my nomination was supported by Claude Shannon, von Neumann, and Norbert Wiener. The only obligation I had was to dine with the other junior fellows on Monday evenings. It was a welcome opportunity for me, because I was trying to make general theories about intelligence—in men or machines—and I did not fit into any department or profession. I began to think about how to make an artificial intelligence. I spent the next three years as a junior fellow. There were about thirty of us, sort of one from each field—thirty gifted children.”
Two years after Minsky began his fellowship, one of the more important events in the history of artificial intelligence occurred. This was the Dartmouth Summer Research Project on Artificial Intelligence, which took place in the summer of 1956. Earlier that year, Minsky and three colleagues—John McCarthy, who had been one of Minsky’s fellow graduate students at Princeton and was now a professor of mathematics at Dartmouth; Nathaniel Rochester, who was manager of information research at the I.B.M. laboratory in Poughkeepsie; and Claude Shannon, a mathematician at the Bell Telephone Laboratories in Murray Hill, New Jersey, for whom Minsky had worked in the summer of 1952—submitted a proposal to the Rockefeller Foundation for a conference on what McCarthy called artificial intelligence; their proposal suggested that “every aspect of learning or any other feature of intelligence” could be simulated. The Rockefeller Foundation found the proposal interesting enough to put up seventy-five hundred dollars for the conference. Needless to say, twenty-five years later the several participants in the conference have different ideas of what its significance was. Minsky told me a few of the things that struck him. “My friend Nat Rochester, of I.B.M., had been programming a neural-network model—I think he got the idea from Donald Hebb’s book ‘The Organization of Behavior,’ and not from me—on the I.B.M. 701 computer,” Minsky recalled. “His model had several hundred neurons, all connected to one another in some terrible way. I think it was his hope that if you gave the network some simultaneous stimuli it would develop some neurons that were sensitive to this coincidence. I don’t think he had anything specific in mind but was trying to discover correlations—something that could have been of profound importance. Nat would run the machine for a long time and then print out pages of data showing the state of the neural net. When he came to Dartmouth, he brought with him a cubic foot of these printouts. He said, ‘I am trying to see if anything is happening, but I can’t see anything.’ But if one didn’t know what to look for one might miss any evidence of self-organization of these nets, even if it did take place. I think that that is what I had been worried about when I decided not to use computers to study some of the ideas connected with my thesis.” The other thing that struck Minsky at Dartmouth has by now become one of the great legends in the field of artificial intelligence. It is the sequence of events that culminated when, in 1959, for the first time, a computer was used—by Herbert Gelernter, a young physicist with I.B.M.—to prove an interesting theorem in geometry.
I had come across so many versions of this story that I was especially interested in hearing Minsky’s recollection. Sometime in the late spring of 1956, Minsky had become interested in the idea of using computers to prove the geometric theorems in Euclid. During that spring, he began to reread Euclid. “If you look through Euclid’s books, you find that he proves hundreds of theorems,” he told me. “I said to myself, ‘There are really only a small number of types of theorems. There are theorems about proving that angles are equal, there are theorems about circles intersecting, there are theorems about areas, and so forth.’ Next, I focussed on the different ways Euclid proves, for example, that certain angles are equal. One way is to show that the angles are in congruent triangles. I sketched all this out on a few pieces of paper. I didn’t have a computer, so I simulated one on paper. I decided to try it out on one of Euclid’s first theorems, which is to prove that the base angles of an isosceles triangle are equal. I started working on that, and after a few hours—this was during the Dartmouth conference—I nearly jumped out of my chair.”
To understand Minsky’s excitement, one must picture an isosceles triangle ABC, with its apex at B, its two equal sides running down to A and to C, and its base angles, a and c, at A and C:
We are given that the line segments AB and BC are equal; the problem is to show that the base angles a and c are equal. To prove this, one has to show that the angles a and c are in congruent triangles. Minsky recalled saying to himself, “My problem is to design a machine to find the proof. Any student can find a proof. I mustn’t tell the machine exactly what to do. That would eliminate the problem. I have to give it some general techniques that it can use for itself—ways that might work. For example, I could tell it that the angles a and c might lie in congruent triangles. I would also have to tell it how to decide if two triangles were congruent. I made a diagram of how the machine could use them by trying new combinations when old ones failed. Once I had this set up, I pretended I was the machine and traced out what I would do. I would first notice that the angle a is in the triangle BAC but the angle c is in the triangle BCA. My machine would be able to figure this out. Next, it would ask if these two triangles were congruent. It would start comparing the triangles. It would soon notice that these were the same triangle with different labellings. Its techniques would lead it to make this identification. That’s when I jumped out of my chair. The imaginary machine had found a proof, and it wasn’t even the same proof that is given in Euclid. He constructed two new triangles by dropping a perpendicular from B to the line AC. I had never heard of this proof, although it had been invented by Pappus, a Greek geometer from Alexandria, who lived six hundred years after Euclid. It is sometimes credited to Frederick the Great. I thought that my program would have to go on a long logical search to find Euclid’s proof. A human being—Euclid, for example—might have said that before we prove two triangles are congruent we have to make sure that there are two triangles. But my machine was perfectly willing to accept the idea that BAC and BCA are two triangles, whereas a human being feels it’s sort of degenerate to give two names to the same object. A human being would say, ‘I don’t have two houses just because my house has a front door and a back door.’ I realized that, in a way, my machine’s originality had emerged from its ignorance. My machine did not realize that BAC and BCA are the same triangle—only that they have the same shapes. So this proof emerges because the machine doesn’t understand what a triangle is in the many deep ways that a human being does—ways that might inhibit you from making this identification. All it knows is some logical relationships between parts of triangles—but it knows nothing of other ways to think about shapes and space.”
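The step Minsky describes, finding that the labellings BAC and BCA satisfy a congruence test and that the base angles therefore correspond, can be suggested in a short sketch. The Python fragment below is an illustration invented for this account, not Gelernter’s program or Minsky’s paper simulation; it assumes that the only given fact is AB = BC and that congruence is decided, as one choice among several, by comparing the three pairs of corresponding sides.

```python
# A sketch of the matching described above, invented for illustration.
# The only given fact is AB = BC; the order of letters in a segment name
# does not matter.

GIVEN_EQUAL = [{"AB", "BA", "BC", "CB"}]

def segments_equal(s, t):
    """True if two segments are the same segment or are given as equal."""
    return set(s) == set(t) or any(s in g and t in g for g in GIVEN_EQUAL)

def congruent_by_sides(t1, t2):
    """Side-side-side check under the vertex correspondence t1[i] <-> t2[i]."""
    pairs = [(t1[i] + t1[(i + 1) % 3], t2[i] + t2[(i + 1) % 3]) for i in range(3)]
    return all(segments_equal(s, t) for s, t in pairs)

# The machine happily treats BAC and BCA as two triangles, though they name
# the same one. The test passes, and the correspondence sends A to C, so the
# angle at A equals the angle at C: in effect, Pappus' proof.
print(congruent_by_sides("BAC", "BCA"))   # True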
Minsky smiled and went on, “For me, the rest of the summer at Dartmouth was a bit of a shambles. I said, ‘That was too easy. I must try it on more problems.’ The next one I tried was ‘If the bisectors of two of a triangle’s angles are equal in length, then the triangle has two equal sides.’ My imaginary machinery couldn’t prove this at all, but neither could I. Another junior fellow at Harvard, a physicist named Tai Tsun Wu, showed me a proof that he remembered from high school, in China. But Nat Rochester was very impressed by the first proof, and when he went back to I.B.M. after the summer he recruited Gelernter, who had just got his doctorate in physics and was interested in computers, to write a program to enable a computer to prove a geometric theorem. Now, a few months earlier, a new computer language called I.P.L.—for ‘information-processing language’—had been invented by Allen Newell, J. C. Shaw, and Herbert Simon, working at the Rand Corporation and the Carnegie Institute of Technology.” Newell and Shaw were computer scientists, both of whom worked for Rand, but Newell was getting his doctorate at Carnegie Tech with Herbert Simon, who was in fact a professor at the Graduate School of Industrial Administration. In 1978, Simon was awarded the Nobel Prize in Economic Science. “It was John McCarthy’s notion to combine some of I.P.L.’s ideas with those of FORTRAN—the I.B.M. programming language that was in the process of being developed—to make a new language in which the geometry program would be written,” Minsky went on. “Gelernter found ways of doing this. He called his new language FLPL, for ‘FORTRAN List-Processing Language.’ FORTRAN, by the way, stands for ‘formula translation.’ Well, FLPL never got much beyond I.B.M. But a couple of years later McCarthy, building on I.P.L. and Gelernter’s work and combining this with some ideas that Alonzo Church, a mathematician at Princeton, had published in the nineteen-thirties, invented a new language called LISP, for ‘list-processing,’ which became our research-computer language for the next generation.” By 1959, Gelernter had made his program work. Having done that, he gave it the job of proving that the base angles of an isosceles triangle are equal. The computer found Pappus’ proof.
In 1957, Minsky became a member of the staff of M.I.T.’s Lincoln Laboratory, where he worked with Oliver Selfridge, one of the first to study computer pattern-recognition. The following year, Minsky was hired by the mathematics department at M.I.T. as an assistant professor, and that year he and McCarthy, who had come to M.I.T. from Dartmouth the year following the conference, started the A.I. Group. McCarthy remained at M.I.T. for four more years, and during that time he originated or completed some developments in computer science that have since become a fundamental part of the field. One of these was what is now universally known as time-sharing. “The idea of time-sharing was to arrange things so that many people could use a computer at the same time instead of in the traditional way, in which the computer processed one job after another,” Minsky explained to me. “In those days, it usually took a day or two for the computer to do anything—even a job that needed just two seconds of the computer’s time. The trouble was that you couldn’t work with the computer yourself. First, you’d write your program on paper, and then punch holes in cards for it. Then you’d have to leave the deck of cards for someone to put in the computer when it finished its other jobs. This could take a day or two. Then, most programs would fail anyway, because of mistakes in concept—or in hole punching. So it could take ten such attempts to make even a small program work—an entire week wasted. This meant that weeks could pass before you could see what was wrong with your original idea. People got used to the idea that it should take months to develop interesting programs. The idea of time-sharing was to make the computer switch very quickly from one job to another. At first, it doesn’t sound very complicated, but it turned out that there were some real problems. The credit for solving them goes to McCarthy and to another M.I.T. computer scientist, Fernando Corbató, and to their associates at M.I.T.”
Minsky went on, “One of the problems was that if you want to run several jobs on a computer, you need ways to change quickly what is in the computer’s memory. To do that, we had to develop new kinds of high-speed memories for computers. One trick was to develop ways to put new information into the memories while taking other information out. That doubled the speed. A more basic problem was something that we called memory protection. One had to arrange things so that if there were several pieces of different people’s programs in a computer one piece could not damage another one by, say, erasing it from the main memory. We introduced what we called protection registers to prevent this from happening. Without them, the various users would have interacted with one another in unexpected ways. One of the most interesting aspects of all this was that for a long time we couldn’t convince the computer manufacturers that what we were doing was important. They thought that time-sharing was a waste of time, so to say. I think that many of them were confused about the difference between what is called time-sharing and what is called multiprocessing, which means having different parts of the computer running different parts of someone’s program at the same time—something totally different from the idea of many people sharing the same computer nearly simultaneously, with each user getting a fraction of a second on the machine. I.B.M., for example, was working on a system in which a program was being run, another one was being written on tape, and a third one was being prepared—all simultaneously. That was not what we had in mind. We wanted, say, a hundred users to be able to make use of the hardware at once. It took several years before we got a computer manufacturer to take this seriously. Finally, we got the Digital Equipment Corporation, in Maynard, Massachusetts, to supply the needed hardware. The company had been founded by friends of ours from M.I.T., and we collaborated with them to make their little computer—the PDP-1—into a time-sharing prototype. Soon, they had the first commercial versions of time-sharing computers. Digital Equipment eventually became one of the largest computer companies in the world. Then we decided to time-share M.I.T.’s big I.B.M. computer. It worked so beautifully that on the basis of it M.I.T. got three million dollars a year for a long time for research in computer science from the Advanced Research Projects Agency of the Defense Department.”
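The scheduling idea at the heart of time-sharing, giving each of many users a brief slice of the machine in rotation, can be mimicked in a few lines. The Python sketch below is an invented illustration, not a model of the M.I.T. system; each user’s job is a generator, and the scheduler simply cycles through the queue, one slice at a time, until every job is finished.

```python
from collections import deque

def job(name, steps):
    """A stand-in for one user's program: each yield is a bit of work done."""
    for i in range(steps):
        yield f"{name}: step {i + 1} of {steps}"

def time_share(jobs, slice_steps=1):
    """Round-robin scheduler: give each job a short slice, then move on.
    A toy illustration of time-sharing, as opposed to multiprocessing."""
    ready = deque(jobs)
    while ready:
        current = ready.popleft()
        for _ in range(slice_steps):
            try:
                print(next(current))
            except StopIteration:
                break                      # this job is finished
        else:
            ready.append(current)          # unfinished: back of the queue

time_share([job("alice", 3), job("bob", 2), job("carol", 4)])
```

Run, it interleaves the three users’ output, which is the whole point: nobody waits a day for the deck of cards ahead of him to finish.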
Time-sharing is now used universally. It is even possible to hook up one’s home computer by telephone to, for instance, one of the big computers at M.I.T. and run any problem one can think of from one’s living room.
The computer revolution in which people like Minsky and McCarthy have played such a large role has come about in part because of the invention of the transistor and in part because of the development of higher-level computer languages that have become so simple that even young children have little trouble learning to use them. The transistor was invented in 1948 by John Bardeen, Walter H. Brattain, and William Shockley, physicists then at the Bell Telephone Laboratories, and in 1956 they were awarded the Nobel Prize in Physics for their work. The transistor has evolved in many different ways since the days of the original invention, but, basically, it is still made of a material in which the electrons have just the right degree of attachment to nearby atoms. When the electrons are attached too loosely, as in a metal, they are free to move anywhere in the material. Hence, metals conduct electricity. Attached too tightly, as in an electrical insulator, the electrons cannot move freely; they are stuck. But in pure crystalline silicon and a couple of other crystalline substances the electrons are bound just loosely enough so that small electrical force fields can move them in a controllable way. Such substances are called semiconductors. The trick in making a transistor is to introduce an impurity into the crystal—a process known as doping it. Two basic types of impurities are introduced, and scientists refer to these as n types and p types—negative and positive. One substance used for doping the crystal is phosphorus, an n type. The structure of phosphorus is such that it contains one electron more than can be fitted into the bonds between the phosphorus atoms and the atoms of, say, silicon. If a small voltage is applied to a silicon crystal doped with phosphorus, this electron will move, creating a current of negative charges. (The charge of an electron is, by convention, taken as negative.) Conversely, if an element like boron is inserted into the silicon lattice, an electron deficiency is created—what is known as a hole. When a voltage is applied, an electron from an atom of silicon will move to fill in the hole, and this will leave yet another hole. This progression of holes cannot be distinguished in its effects from a current of positive charges. To make transistors, one constructs sandwiches of n-type and p-type doped crystals. The great advantage of the transistor is that the electrons will respond to small amounts of electric power. In the old vacuum tubes, it took a lot of power to get the electrons to move, and a lot of waste heat was generated. Moreover, the transistor can be miniaturized, since all of its activity takes place on an atomic scale.
The first commercial transistor radios appeared on the market in 1954. They were manufactured by the Regency division of Industrial Development Engineering Associates, Inc., of Indianapolis (and were not, as it happened, a commercial success). By 1959, the Fairchild Semiconductor Corporation had developed the first integrated circuit. In such a circuit, a chip of silicon is doped in certain regions to create many transistors, which are connected to one another by a conducting material like aluminum, since aluminum is easier than, say, copper to attach to the silicon. In 1961, the Digital Equipment Corporation marketed the first minicomputer, and in 1963—first in Britain and then in the United States—electronic pocket calculators with semiconductor components were being manufactured, although it was not until the nineteen-seventies that mass production brought the costs down to where they are now.
Still, the developments in computer hardware do not in themselves account for the ubiquity of computers in contemporary life. Parallel to the creation of this technology has been a steady evolution in the way people interact with machines. Herman Goldstine, who helped to design both the ENIAC, at the University of Pennsylvania, and the von Neumann computer, at the Institute for Advanced Study, points out in his book “The Computer from Pascal to von Neumann” that the von Neumann computer had a basic vocabulary of twenty-nine instructions. Each instruction was coded in a ten-bit expression. A bit is simply the information that, say, a register is on or off. There was a register known as the accumulator in the machine, and it functioned like a scratch pad. Numbers could be brought in and out of the accumulator and operated on in various ways. The instruction “Clear the accumulator”—that is, erase what was on the register—was, to take one example, written as the binary number 1111001010. Each location in the machine’s memory had an “address,” which was also coded by a ten-digit binary expression. There were a thousand and twenty-four possible addresses (2¹⁰ = 1,024), which meant that the Institute’s machine could label, or address, a thousand and twenty-four “words” of memory.
Hence a typical “machine language” phrase on the Institute computer might be:
00000010101111001010
This meant “Clear the accumulator and replace what had been stored in it by whatever number was at the address 0000001010.” Obviously, a program written for this machine would consist of a sequence of these numerical phrases, and a long—or even not so long—program of this sort would be all but impossible for anyone except, perhaps, a trained mathematician to follow. It is also clear that if this situation had not changed drastically few people would have learned to program computers.
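One can make the arithmetic of that phrase explicit. The short Python fragment below is only an illustration of the layout the passage describes; it assumes, as the quoted numbers suggest, that the first ten bits of the twenty-bit phrase are the address and the last ten are the order code.

```python
# Decode the twenty-bit machine-language phrase quoted above, on the
# assumption (taken from the quoted numbers, not from the Institute's
# documentation) that the address comes first and the order code second.

PHRASE = "00000010101111001010"
CLEAR_AND_LOAD = "1111001010"   # "clear the accumulator and replace..."

address_bits, order_bits = PHRASE[:10], PHRASE[10:]
print("address:", address_bits, "=", int(address_bits, 2))   # decimal 10
print("order  :", order_bits,
      "(clear accumulator, load from address)" if order_bits == CLEAR_AND_LOAD else "")
```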
By the early nineteen-fifties, the first attempts to create the modern programming languages were under way. In essence, these attempts and the later ones have involved the development of an understanding of what one does—the steps that one follows—in trying to solve a problem, and have led the workers in this field to a deeper and deeper examination of the logic of problem-solving. Initially, the concentration was on the relatively simple steps that one follows in doing a fundamental arithmetic problem, like finding the square root of a number. It became clear that certain subroutines or subprograms—such as a routine for addition—came into play over and over. Once these subroutines were identified, one could make a code—what is called a compiler—that would automatically translate them into machine language every time they were needed in a computation. J. Halcombe Laning and Neal Zierler, at M.I.T., and, independently, Heinz Rutishauser, of the Eidgenössische Technische Hochschule (Albert Einstein’s alma mater), in Zurich, were among the first to attempt this. Their work did not gain wide acceptance, however, and it was not until the late fifties, after a group led by John Backus, a computer scientist with I.B.M., had developed FORTRAN, that computers became widely accessible. Some years ago, I had an opportunity to discuss the development of FORTRAN with Backus. He told me that he and his group had proceeded more or less by trial and error. A member of the group would suggest a small test program, and they would use the evolving FORTRAN system to translate it into machine language to see what would happen. They were constantly surprised by what the machine did. When the system was fairly well advanced, they began to race their FORTRAN-made programs against machine-language programs produced for the same job by a human programmer. They used a stopwatch to see which program was faster. If the FORTRAN-made programs had turned out to be substantially slower, they could not have become a practical alternative to their man-programmed machine-language competitors. It took Backus and his group two and a half years to develop FORTRAN; it was completed in 1957.
In a 1979 Scientific American article, Jerome A. Feldman, chairman of the computer-science department at the University of Rochester, noted that in the United States alone there were at that time more than a hundred and fifty programming languages used for various purposes. For simple numerical computations, most of these languages work almost equally well; in fact, BASIC (for “beginner’s all-purpose symbolic instruction code”), which was developed by a group at Dartmouth in 1963-64, is the most widely available language for small home computers, and will enable people to do about anything that they want to do with such a computer. (What most people seem to want to do with these computers is play games on them, and the programs for games come ready-made.) These small computers have very little memory—at most, sixty-five thousand eight-bit words—and so cannot fully exploit the most advanced computer languages, although simplified versions of some high-level languages are available. The differences begin to be felt in the complex programs needed in the field of artificial intelligence. For these programs, FORTRAN and BASIC are simply not sophisticated enough. When FORTRAN was first invented, computer memory cost over a dollar per memory bit. Today, one can buy a sixty-five-thousand-bit memory-circuit chip for around six dollars—so memory is about ten thousand times as cheap now. The next generation of personal computers should give their users the most advanced computer languages. But someday, according to Minsky, the most useful programs for personal computers will be based on artificial-intelligence programs that write programs of their own. The idea is for an ordinary person—not a programmer—to describe what he wants a program to do in informal terms, perhaps simply by showing the program-writing program some examples. Then it will write a computer program to do what was described—a process that will be much cheaper than hiring a professional programmer.
Between machine language and compilers, there is another level of computer-language abstraction—assemblers—which was developed even before the compilers. In an assembly-language instruction, instead of writing out a string of binary digits that might tell the machine to add two numbers one can simply write “ADD” in the program, and this will be translated into machine language. FORTRAN is one step up from this in sophistication. In any computation, the next step will often depend on the result of a previous step. If one number turns out to be larger than another, one will want to do one thing, and in the opposite case another thing. This can be signalled in a FORTRAN program by the instruction “IF” followed by instructions for what to do in either of the alternative cases—a marvellous simplification, provided that one knows in advance that there are two cases. In a chess-playing program, one might well get into a situation in which the number of cases that one would like to examine would depend on one’s position on the board, which cannot be predicted. One would thus like the machine to be able to reflect on what it is doing before it proceeds. In the late nineteen-fifties, a new class of languages was developed to give computers the capacity for reflection. The instructions in these languages interact creatively with the machine.
When I asked Minsky about these languages, he said, “In an ordinary programming language, like FORTRAN or BASIC, you have to do a lot of hard things to get the program even started—and sometimes it’s impossible to do these things. You must state in advance that in the computer memory certain locations are going to be used for certain specific things. You have to know in advance that it is going to use, say, two hundred storage cells in its memory. A typical program is made up of a lot of different processes, and in ordinary programs you must say in advance how each of these processes is to get the information from the others and where it is to store it. These are called declarations and storage allocations. Therefore, the programmer must know in advance what processes there will be. So you can’t get a FORTRAN program to do something that is essentially new. If you don’t know in advance what the program will do, you can’t make storage allocations for it. In these new languages, however, the program system automatically creates space for new things as the program creates them. The machine treats memory not as being in any particular place but, rather, as consisting of one long string, and when it needs a new location it just takes it off the beginning of the string. When it discovers that some part of the program is not being used, it automatically puts it at the end of the string, where it can be used again if it is needed—a process that is known in the computer business as garbage collection. The machine manipulates symbols, and not merely numbers. It is much closer to using a natural language.” One remarkable feature of these new list-processing languages is that they can be used to design other new languages. A list-processing program can be designed to read and write list-processing programs, and so generate new programs of essentially limitless complexity. The development of the list-processing languages derived from attempts to carry out two of the classic problems in artificial intelligence: the use of machines to play games like chess and checkers, and the use of machines to prove theorems in mathematics and mathematical logic. Many of the programming ideas in the two domains are the same. The first significant modern paper on chess-playing programs was written in 1950 by Claude Shannon, then at the Bell Labs, who later, in the sixties and early seventies, preceded Minsky as the Donner Professor at M.I.T. The basic element in Shannon’s analysis—and in all subsequent analyses, including those that have made possible the commercially available chess-playing machines—is a set of what scientists call game trees; each branching of a game tree opens up new possibilities, just as each move in a chess game creates more possible moves. A player opening a chess game has twenty possible moves. On his second play he can have as many as thirty. As play progresses, the number of possible combinations of moves expands enormously. In a typical game, all future possible positions would be represented by a number on the order of 10¹²⁰—an absurdly large number. If a computer could process these possibilities at the rate of one per billionth of a second, it would take 10¹¹¹ seconds to run the entire game tree for a single chess game. But the universe is only about 10¹⁷ seconds old, so this is not the way to go. (In checkers, there are only 10⁴⁰ possible positions, which at the same rate would take 10²² centuries—or 10³¹ seconds—to consider.)
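Minsky’s picture of memory as one long string of cells, with new cells taken from the front and abandoned ones returned to it by garbage collection, can be caricatured in a few lines. The Python sketch below is an invented toy, not LISP or I.P.L.; it allocates two-field cells from a free list and reclaims whatever can no longer be reached from the lists a program is still using.

```python
# A toy version of the memory scheme described above: one long string of
# cells, a free list, and garbage collection. The cell layout and the names
# are invented for illustration.

MEMORY_SIZE = 8
memory = [[None, None] for _ in range(MEMORY_SIZE)]   # each cell: [head, tail]
free_list = list(range(MEMORY_SIZE))                  # every cell starts out free
roots = []                                            # lists the program still uses

def cons(head, tail):
    """Take the next cell off the front of the free list and fill it in."""
    if not free_list:
        collect_garbage()
    cell = free_list.pop(0)
    memory[cell] = [head, tail]
    return cell

def collect_garbage():
    """Mark every cell reachable from the roots; the rest go back on the free list."""
    reachable, stack = set(), list(roots)
    while stack:
        cell = stack.pop()
        if cell is not None and cell not in reachable:
            reachable.add(cell)
            stack.append(memory[cell][1])              # follow the tail pointer
    free_list[:] = [i for i in range(MEMORY_SIZE) if i not in reachable]

# Build the list (1 2 3) and keep it; allocate scratch cells and abandon them.
kept = cons(3, None); kept = cons(2, kept); kept = cons(1, kept)
roots.append(kept)
for _ in range(8):          # more allocation than free memory: forces a collection
    cons("scratch", None)
```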
Obviously, the human player can consider only a minute fraction of the branches of the tree of continuations resulting from any given chess move, and the computer must be programmed to do the same. While Shannon did not actually write a computer program for making such considerations, he did suggest a framework for a program. First, one would choose a depth—two or three moves—to which one would analyze all legal moves and their responses, and one would evaluate the position at the end of each of these moves. On the basis of evaluations, one would choose the move that led to the “best” final configuration. In a position where there are, say, three legal moves, white may find that one move will lead to a draw if black makes his best move; in another of the three moves, white will lose if black does what he is supposed to do; and in the third possible move white will win if black misplays but will lose if black plays correctly. In such a situation, Shannon’s procedure would call for white to make the first of the three moves—an assumption that would guarantee a draw. In reality, matters are rarely as cut and dried as this, so more complicated criteria, such as material, mobility, king defense, and area control, have to be introduced and given numerical weights in the calculation, and Shannon suggested procedures for this. In 1951, the British mathematician Alan Turing—who after von Neumann was probably the most influential thinker of this century concerning the logic of automata—developed a program to carry out Shannon’s scheme. Since he did not have a computer to try it on, it was tried in a game in which the two players simulated computers. It lost to a weak player. In 1956, a program written by a group at Los Alamos was tried on the MANIAC-I computer. Their program, which involved a game tree of much greater depth, used a board with thirty-six spaces (the bishops were eliminated) instead of the board of sixty-four spaces that is used in real chess. The computer beat a weak player. The first full chess-playing program to be run on a computer was devised by Alex Bernstein, a programmer with I.B.M., in 1957. Seven plausible moves were examined to a depth of two moves each, and the program played passable amateur chess. It ran on the I.B.M. 704 computer, which could execute forty-two thousand operations a second, compared with eleven thousand operations a second by the MANIAC-I.
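The framework just described, searching every legal move to a fixed depth, scoring the resulting positions numerically, and then choosing the move whose worst-case outcome is best, fits comfortably into a dozen lines of modern code. The Python sketch below is a generic depth-limited minimax under those assumptions, not Shannon’s paper, Turing’s hand simulation, or any of the programs mentioned here; the five arguments stand for whatever representation of a game one happens to have.

```python
def shannon_choice(position, depth, legal_moves, make_move, evaluate):
    """Examine every legal move to a fixed depth, score the end positions with
    a numerical evaluation (from the choosing player's point of view), and pick
    the move whose worst-case outcome is best. A sketch, not a chess program."""

    def value(pos, d, maximizing):
        moves = legal_moves(pos)
        if d == 0 or not moves:
            return evaluate(pos)
        results = (value(make_move(pos, m), d - 1, not maximizing) for m in moves)
        return max(results) if maximizing else min(results)

    return max(legal_moves(position),
               key=lambda m: value(make_move(position, m), depth - 1, False))
```

The evaluation function is where the numerical weights for material, mobility, king defense, and area control would enter.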
In 1955, Newell, Shaw, and Simon began work on a chess program. In a paper published in 1958 in the IBM Journal of Research and Development, they wrote: “In a fundamental sense, proving theorems [in symbolic logic] and playing chess involve the same problem: reasoning with heuristics that select fruitful paths of exploration in a space of possibilities that grows exponentially. The same dilemmas of speed versus selection and uniformity versus sophistication exist in both problem domains.” The three also invented what they called the Logic Theorist, which was a program designed to prove certain theorems in symbolic logic. In a 1957 paper on this, published in the Proceedings of the Western Joint Computer Conference, they wrote:
The reason why problems are problems is that the original set of possible solutions given to the problem-solver can be very large, the actual solutions can be dispersed very widely and rarely throughout it, and the cost of obtaining each new element and of testing it can be very expensive. Thus the problem-solver is not really “given” the set of possible solutions; instead he is given some process for generating the elements of that set in some order. This generator has properties of its own, not usually specified in stating the problem; e.g., there is associated with it a certain cost per element produced, it may be possible to change the order in which it produces the elements, and so on. Likewise the verification test has costs and times associated with it. The problem can be solved if these costs are not too large in relation to the time and computing power available for solution.
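The scheme this passage describes, a generator of candidate solutions paired with a verification test, each with its own cost, reduces to a very small program. The Python sketch below is only an illustration of the idea, not the Logic Theorist; the budget parameter stands in for the “time and computing power available for solution,” and the toy problem is invented.

```python
from itertools import count, islice

def solve_by_generate_and_test(generator, test, budget):
    """Draw candidates from the generator, test each one, and give up once
    the budget is spent. A bare-bones rendering of the passage above."""
    for candidate in islice(generator, budget):
        if test(candidate):
            return candidate
    return None

# Toy use: find a number whose square ends in 29. The problem is invented
# purely to exercise the scheme; the budget plays the role of the allowed cost.
print(solve_by_generate_and_test(count(1), lambda n: n * n % 100 == 29, budget=1000))
```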
The Logic Theorist was run on a computer at Rand that was a copy of the von Neumann machine at the Institute for Advanced Study and that Rand had named, over the objections of von Neumann, the JOHNNIAC. The program was able to supply proofs of some fairly complex theorems, though it failed with others. To program the JOHNNIAC, Newell, Shaw, and Simon used their newly invented I.P.L., which was the forerunner of the list-processing languages. In 1958, they wrote their chess program in a later model of this language called I.P.L.-IV for the JOHNNIAC, and they subsequently described its performance as “good in spots.” However, most current chess programs are written in machine language rather than in any of the list-processing languages (including LISP), for reasons of speed and economy of memory.
Chess players who compete in tournaments are given a numerical point rating. At present, the mean rating of all the United States tournament players is 1,500. Anatoly Karpov, who is the world champion, is rated by the World Chess Federation at 2,700. The best current chess program is Belle, which was developed by Ken Thompson and Joe Condon, of the Bell Labs, followed closely by the chess programs of Northwestern University—Chess 4.9, designed by David Slate and Lawrence Atkin, and Nuchess, designed by Slate and William Blanchard. Belle is rated at about 2,200, and Chess 4.9 at about 2,050; Nuchess has not yet played in enough tournaments to receive a rating. The microcomputer chess-playing machines that are available commercially, for between one hundred and four hundred dollars, can be set to play at various levels (some up to 1,800), but at the higher levels they take an eternity to decide on a hard move. In general, these programs do not really mimic what a human chess master can do. The human chess master can take in more or less at a glance the general structure of a position and then analyze a limited number of moves—three or four—to depths that vary greatly depending on the position. To give a famous example, when Bobby Fischer was thirteen he played a tournament game against the master Donald Byrne—a game that some people have called the greatest chess game played in this century. On the seventeenth move, Fischer sacrificed his queen, for reasons apparent at the time to no one but him. The resulting combination was so profound that it was not until twenty-four moves later that Fischer executed the mate he must have seen from the beginning. This game convinced many people—including, no doubt, Fischer—that it was only a matter of time before he became chess champion of the world. It would be fascinating to replay this game with, say, Belle taking Fischer’s role, to see if by using its methods, which could be quite different, it would have found this mate. That seems unlikely. Still, Belle now plays chess better than all but five per cent of the American tournament chess players, and there is every reason to believe that in the near future it, or some similar program, will beat all of them.
At present, something of a debate is raging both within the artificial-intelligence community and outside it about what the enterprise of “artificial intelligence” really is. The most commonly accepted idea among workers in the field is that it is the attempt to produce machines whose output resembles, or even finally cannot be distinguished from, that of a human mind. The ultimate machine might by itself be able to perform all the cognitive functions, or, more modestly, many kinds of machines might be needed to perform such functions. The first question that this goal raises is what is meant by a machine. Nearly all workers in the field seem to mean some sort of digital computer when they refer to a machine. In this respect, there is a remarkable fact about computers known since the nineteen-thirties; namely, that although real computers may come in all sorts of models, there is in theory only one kind of computer. This notion derives from the pioneering work of Alan Turing, who conceived of something he called the abstract universal computer, which, in principle, can be programmed to imitate any other computer. This universal computer can do any sequence of operations that any model of computer can do. One view of the goal of artificial intelligence would be to build a computer that, by its output, simply could not be distinguished from a mind. Since human minds play games like chess and checkers, do mathematics, write music, and read books, the ideal machine would have to be able to do all of these things at least as well as human beings do them. Obviously, to make such a machine is an enormous task, perhaps an impossible one. People working in artificial intelligence, like any scientists confronted with an incredibly complex problem, have been trying to attack this task in pieces: thus the attempts to make machines—both the hardware and the necessary programs—that play games, that “understand” newspaper accounts, and that can recognize patterns. That machines can already do all of these things with varying degrees of success is certainly a fact. The debate nowadays is over what this means. Are we thereby approaching a better understanding of the human mind? It is not entirely clear what would settle the debate. Even if a humanoid machine were built, many people would certainly argue that it did not really understand what it was doing, and that it was only simulating intelligence, while the real thing lay beyond it and would always lie beyond it. Minsky feels that there is at least a possibility that this might not be true. He sees the development of artificial intelligence as a kind of evolutionary process and thinks that just as intelligence developed in animals over a long sequence of trials and improvements, the same thing might happen in a shorter time as we guide the evolution of machines.
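Turing's point about universality can be illustrated in miniature: a single general-purpose program is handed the description of some other, simpler machine as data and steps through it, thereby imitating that machine. The toy instruction set below is invented purely for illustration; it is not Turing's construction.

```python
# A tiny "universal" interpreter: given the program of a simple register
# machine as data, it imitates that machine. The instruction set is invented.

def run(program, registers):
    """Interpret a list of (op, args...) instructions for a toy register machine."""
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        if op == "inc":
            registers[args[0]] += 1
        elif op == "dec":
            registers[args[0]] -= 1
        elif op == "jump_if_zero":
            if registers[args[0]] == 0:
                pc = args[1]
                continue
        pc += 1
    return registers

# A "machine" that adds register b into register a, expressed as data and
# handed to the interpreter above.
adder = [("jump_if_zero", "b", 4),
         ("dec", "b"),
         ("inc", "a"),
         ("jump_if_zero", "zero", 0)]
print(run(adder, {"a": 3, "b": 4, "zero": 0}))   # {'a': 7, 'b': 0, 'zero': 0}
```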
The vast majority of the contemporary workers in artificial intelligence have concentrated on the development of increasingly complex programs for computers—an activity that is justifiable, considering all that has been achieved. But it may also be misleading. This point was made in 1979 by the British molecular biologist Francis Crick, in a Scientific American article called “Thinking about the Brain.” Crick writes:
The advent of larger, faster and cheaper computers, a development that is far from reaching its end, has given us some feeling for what can be achieved by rapid computation. Unfortunately the analogy between a computer and the brain, although it is useful in some ways, is apt to be misleading. In a computer information is processed at a rapid pulse rate and serially. In the brain the rate is much lower, but the information can be handled on millions of channels in parallel. The components of a modern computer are very reliable, but removing one or two of them can upset an entire computation. In comparison the neurons of the brain are somewhat unreliable, but the deletion of quite a few of them is unlikely to lead to any appreciable difference in behavior. A computer works on a strict binary code. The brain seems to rely on less precise methods of signalling. Against this it probably adjusts the number and efficiency of its synapses in complex and subtle ways to adapt its operation to experience. Hence it is not surprising to find that although a computer can accurately and rapidly do long and intricate arithmetical calculations, a task at which human beings are rather poor, human beings can recognize patterns in ways no contemporary computer can begin to approach.
While many workers in artificial intelligence might agree with Crick’s statement about patterns, it is nonetheless true that computers, in conjunction with electronic visual sensors—TV cameras, in effect—are now able to perform some interesting feats of pattern recognition. The first machine that was able to do sophisticated pattern recognition was the Perceptron, designed by Minsky’s former Bronx Science classmate Frank Rosenblatt. Working at the Cornell Aeronautical Laboratory, Rosenblatt built the prototype version of the Perceptron in 1959. A few years later, I had an opportunity to discuss with him how it worked. The machine consisted of three elements. The first element was a grid of four hundred photocells, corresponding to the light-sensitive neurons in the retina; they received the primary optical stimuli. The photocells were connected to a group of components that Rosenblatt called associator units—the second element—whose function was to collect the electrical impulses produced by the photocells. There were five hundred and twelve associator units, and each unit could have as many as forty connections to the photocells. These connections were made by randomly wiring the associators to the cells. The wiring was done randomly because it was then believed that some, and perhaps most, of the “wiring” in the brain that connects one neuron to another was done randomly. The argument for this was essentially one of complexity. Our brains gain—grow—neurons during prenatal development, until, at birth, the total may have reached forty billion or more. How do they all know where to go in the brain and elsewhere, and what to connect up to when they get there? It was argued by many early brain researchers that if this wiring was largely random the neurons wouldn’t have to know, since where an individual neuron went would not matter much. As a result of experimental work done in recent years, however, it appears that this is not in fact how things work. The connections do seem to be determined from an early stage of development, and are specific both for specific regions of the brain and for specific neurons within these regions. How the information to specify all this is processed so that the neurons do what they are supposed to do remains a mystery. But when Rosenblatt was building the Perceptron it was thought that randomness was important. The third element of Rosenblatt’s Perceptron consisted of what he called response units. An associator—in analogy to a neuron—would produce a signal only if the stimulus it received was above a certain threshold, at which point it would signal the response units. The idea was to use this structure to recognize shapes. First, the machine was shown, say, an illuminated “A,” to which it would respond in accord with its initially random instructions. Then the “A” was deformed or moved and was shown to the machine again. If it responded in the same way both times, it had recognized the “A”; if not, then some of its responses would presumably be “right” and some “wrong.” With adjustments in electronics, the wrong responses could be suppressed, and Rosenblatt’s claim was that after a finite number of adjustments the machine would learn to recognize patterns.
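The arrangement described here—photocells randomly wired to threshold units, whose outputs feed a response layer that is adjusted after every wrong answer—can be sketched in a few dozen lines. The sizes, the patterns, and the learning rule below are modern, simplified stand-ins for Rosenblatt's analog electronics, not a reconstruction of the Perceptron itself.

```python
# A toy Perceptron in the spirit of the description above: a "retina" of
# photocells is randomly wired to threshold (associator) units, and a single
# response unit is trained by nudging its weights after each wrong answer.
import random

random.seed(0)
N_CELLS, N_ASSOC = 25, 64          # a 5x5 retina and 64 associator units

# Random wiring: each associator sums a random mix of photocell signals.
wiring = [[random.choice([-1, 0, 1]) for _ in range(N_CELLS)] for _ in range(N_ASSOC)]

def associators(image):
    """Threshold units: fire (1) if their randomly wired input sum is positive."""
    return [1 if sum(w * p for w, p in zip(row, image)) > 0 else 0 for row in wiring]

# Two clean 5x5 patterns: a vertical bar and a horizontal bar, as flat 0/1 lists.
vertical   = [1 if c == 2 else 0 for r in range(5) for c in range(5)]
horizontal = [1 if r == 2 else 0 for r in range(5) for c in range(5)]
examples = [(vertical, 1), (horizontal, 0)]

# Response unit: weights adjusted by the perceptron rule whenever it is wrong.
weights, bias = [0.0] * N_ASSOC, 0.0
for _ in range(100):
    for image, target in examples:
        a = associators(image)
        out = 1 if sum(w * x for w, x in zip(weights, a)) + bias > 0 else 0
        if out != target:                     # suppress the wrong response
            for i, x in enumerate(a):
                weights[i] += (target - out) * x
            bias += (target - out)

for image, target in examples:
    a = associators(image)
    out = 1 if sum(w * x for w, x in zip(weights, a)) + bias > 0 else 0
    print(target, out)    # prints "1 1" then "0 0" once the clean patterns are told apart
```

As Minsky notes later in the article, this kind of device handles clean, isolated figures; it is the cluttered backgrounds that defeat it.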
Rosenblatt was an enormously persuasive man, and many people, following his example, began to work on Perceptrons. Minsky was not among them. Ever since his days as a graduate student, when he and Dean Edmonds built one of the earliest electronic learning machines, he had been aware of the limitations of such machines, and had come to the conclusion that it was more profitable to concentrate on finding the principles that will make a machine learn than to try building one in the hope that it would work. Minsky and Rosenblatt engaged in some heated debates in the early sixties. During my discussions with Minsky, he described what the issues were.
“Rosenblatt made a very strong claim, which at first I didn’t believe,” Minsky told me. “He said that if a Perceptron was physically capable of being wired up to recognize something, then there would be a procedure for changing its responses so that eventually it would learn to carry out the recognition. Rosenblatt’s conjecture turned out to be mathematically correct, in fact. I have a tremendous admiration for Rosenblatt for guessing this theorem, since it is very hard to prove. However, I started to worry about what such a machine could not do. For example, it could tell ‘E’s from ‘F’s, and ‘5’s from ‘6’s—things like that. But when there were disturbing stimuli near these figures that weren’t correlated with them the recognition was destroyed. I felt the proponents of the Perceptron had been misled experimentally by giving the machine very clean examples. It would recognize a vertical line and a horizontal line by themselves, but when you put in a varied background with slanted lines the machine would break down. It reminds me, in some ways, of a wonderful machine that J. C. R. Licklider made at Harvard early in the nineteen-fifties. It could recognize the word ‘watermelon’ no matter who said it in no matter what sentence. With a simple enough recognition problem, almost anything will work with some reliability. But to this day there is no machine that can recognize arbitrarily chosen words in ordinary speech.”
In 1963, Minsky began to work with Seymour Papert, and the two men are still collaborators. Papert, who was born in South Africa in 1928, had received his Ph.D. in mathematics at the University of Witwatersrand in 1952, and then, deciding that he still didn’t know enough mathematics, had gone to Cambridge University and taken a second Ph.D. in the subject. He had become interested in the question of learning, and it was in 1958 that he became an associate of Jean Piaget, in Geneva, where he remained for several years. Minsky and Papert were brought together by the neurophysiologist Warren McCulloch, whose paper with Pitts on neurons had so impressed Minsky in the forties; McCulloch had come to work at M.I.T.’s Research Laboratory of Electronics in 1952. “Seymour came to M.I.T. in 1963 and then stayed forever,” Minsky recalled. Within a few months of Papert’s arrival, they had initiated new research programs in human perception, child psychology, experimental robots, and the theory of computation. In the middle nineteen-sixties, Papert and Minsky set out to kill the Perceptron, or, at least, to establish its limitations—a task that Minsky felt was a sort of social service they could perform for the artificial-intelligence community. For four years, they worked on their ideas, and in 1969 they published their book “Perceptrons.”
“There had been several thousand papers published on Perceptrons up to 1969, but our book put a stop to those,” Minsky told me. “It had, in all modesty, some beautiful mathematics in it—it’s really nineteenth-century mathematics. As we went on, more and more questions were generated, so we worked on them, and finally we solved them all. As a result, the book got some rave reviews when it came out. People said, ‘Now computer science has some fundamentally new mathematics of its own. These people have taken this apparently qualitative problem and made a really elegant theory that is going to stand.’ The trouble was that the book was too good. We really spent one year too much on it. We finished off all the easy conjectures, and so no beginner could do anything. We didn’t leave anything for students to do. We got too greedy. As a result, ten years went by without another significant paper on the subject. It’s a fact about the sociology of science that the people who should work in a field like this are the students and the graduate students. If we had given some of these problems to students, they would have got as good at it as we were, since there was nothing special about what we did except that we worked together for several years. Furthermore, I now believe that the book was overkill in another way. What we showed came down to the fact that a Perceptron can’t put things together that are visually nonlocal.”
At this point in our conversation, Minsky took a spoon and put it behind a bowl. “This looks like a spoon to you, even though you don’t see the whole thing—just the handle and a little part of the other end,” he said. “The Perceptron is not able to put things together like that, but then neither can people without resorting to some additional algorithms. In fact, while I was writing a chapter of the book it began to dawn on me that for certain purposes the Perceptron was actually very good. I realized that to make one all you needed in principle was a couple of molecules and a membrane. So after being irritated with Rosenblatt for overclaiming, and diverting all those people along a false path, I started to realize that for what you get out of it—the kind of recognition it can do—it is such a simple machine that it would be astonishing if nature did not make use of it somewhere. It may be that one of the best things a neuron can have is a tiny Perceptron, since you get so much from it for so little. You can’t get one big Perceptron to do very much, but for some things it remains one of the most elegant and simple learning devices I know of.”
When computers first came into use in this country, in the early nineteen-fifties, they were so expensive that they were almost exclusively the province of the large military-oriented government laboratories, like Los Alamos and the Rand Corporation—the former financed by the Atomic Energy Commission and the latter by the Air Force. The computer project at the Institute for Advanced Study, one of the few such projects that were then being carried out in an educational institution, was financed jointly by RCA, the Atomic Energy Commission, the Office of Naval Research, the Office of Air Research, and Army Ordnance. Hardly anyone imagined that within a few decades almost every major university would have a large computer and a department of computer science. A great deal of the funding for pure science in this country now comes from the National Science Foundation, whose budget is spread across all the sciences, with a small fraction of it going to computer science. The Defense Department has its Advanced Research Projects Agency, known as ARPA, whose mission is to finance technology that might eventually have some application to the military. This agency has sometimes interpreted its function as being to detect technological weaknesses in American science and to attempt to remedy them. In the early nineteen-sixties, it began to finance pure research in computer science at universities, and over half the money spent in this field since then has come from ARPA. Around 1963, following the work of McCarthy, Corbató, and their M.I.T. collaborators on computer time-sharing, Project MAC—MAC stood for both “machine-aided cognition” and “multiple-access computer”—was begun at M.I.T., with ARPA providing a budget of about three million dollars a year. About a million dollars of this went to the Artificial Intelligence Group.
“In the first years, we spent this money on hardware and students,” Minsky told me. “But by the tenth year we were making our own hardware, so we spent nearly all the money on faculty and students. We assembled the most powerful and best-human-engineered computer-support system in the world—bar none.” Initially, some of the students were dropouts, who became systems engineers and, eventually, distinguished scientists. Most of them were refugees from other fields, principally mathematics and physics. From the beginning, Minsky’s goal was to use this pool of talent to learn what computers could be made to do in solving non-arithmetic problems—in short, to make these machines intelligent.
For their first problem, Minsky and his students tried to program a computer to do freshman calculus. Calculus was one of the discoveries of Isaac Newton, who found that it was the best way in which to express his laws of motion. One way of applying calculus is to think of it as a sort of “infinite arithmetic,” in which one can calculate, for example, the behavior of planets by doing a great many steps of addition and multiplication. This method—called numerical integration—was one of the first things computers were used for. In fact, it was just this application that the inventors of computers had in mind for them. But there is a second way of doing calculus—this was also explored by Newton—in which one thinks of the calculus as a finite algebra, a skill that involves symbolic manipulation rather than numbers. If one can solve a calculus problem in this closed, algebraic way, one gets an answer that is not just highly accurate but perfect. Such accuracy had never been achieved on any machine, and it was this symbolic manipulation that Minsky and his students set about to program into their machines.
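The “infinite arithmetic” of numerical integration amounts to nothing more than taking a great many small steps of addition and multiplication, as in this toy calculation of how long a body takes to fall a hundred metres; the step size and the numbers are illustrative only.

```python
# Numerical integration in miniature: many small steps of add-and-multiply.
g, dt = 9.8, 0.001          # gravity (m/s^2) and time step (s)
height, speed, t = 100.0, 0.0, 0.0
while height > 0:
    speed += g * dt         # each step is just an addition and a multiplication
    height -= speed * dt
    t += dt
print(round(t, 2))          # about 4.52 s; the closed-form answer is sqrt(2*100/9.8)
```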
The two most important symbolic methods are called differentiation and integration. In the first, one finds the rate of a process from a description of the process; and in the second, which is, in some sense, the inverse of the first, one recovers the process from the rate. Specifically, in the first case one finds the tangent to a given curve, and in the second one computes the curve from the knowledge of the tangent. Freshman calculus students are taught a certain number of techniques for doing this—what are essentially mental computer programs. When one is confronted with a new problem, one searches around in one’s head—or in a calculus book—for a procedure that looks as if it could be made to work, and then tries to make the expression that one has been given fit one of these algorithms. Minsky’s student James Slagle codified this process in his program SAINT—for “symbolic automatic integrator”—in 1961. The machine he used, an I.B.M. 7090, was given twenty-six standard forms—certain elementary integrals—and eighteen simple algorithms. SAINT would take its problem—given to it in the language of elementary functions, which had to be defined as well—and then begin a search among its algorithms. If the computer found the problem to be too hard, it would break it up into simpler ones. If the computer received a problem that could not be done in closed form, it would try and then quit. For his doctoral thesis, Slagle gave the computer eighty-six workable integrals to evaluate, using SAINT. Even the I.B.M. 7090, which by modern standards is a dinosaur, did eighty-four of the eighty-six at speeds comparable to those achieved by an M.I.T. freshman, and sometimes faster. Many it did in less than a minute. Two of the integrals were beyond it. Slagle’s program, which was written in LISP, was regarded as a breakthrough in the attempt to get a computer to do symbolic manipulation. (Some ten years later, two of Minsky’s students, William Martin and Joel Moses, and Carl Engelman, a mathematician with the MITRE Corporation, building on this work, designed a program that, as it exists now, can do almost any symbolic manipulation that a working physicist or engineer might be called on to do. It is called MACSYMA. Comparable systems are now available in some of the other major computer centers. Some of the algebraic computations that physicists run into these days would take thousands of pages to work out; they go beyond the point where one would have a great deal of confidence in the answer, even if one could find it. These calculations can be done automatically with MACSYMA, which has become a standard reference in many theoretical-physics papers. One may now tie in to this system by telephone—so that, for better or worse, one can get one’s algebra done by dialling a number.)
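The flavor of the symbolic, algebraic side—rules applied recursively to an expression until a closed form emerges—can be suggested with its easier half, differentiation. The sketch below is not Slagle's SAINT (which tackled the far harder inverse problem of integration, in LISP); it is a minimal illustration of rule-driven symbol manipulation, with expressions represented as nested tuples.

```python
# A tiny symbolic differentiator: recursive rules applied to an expression
# tree. Illustrative only; not Slagle's program or MACSYMA.

def d(expr, x):
    """Differentiate expr with respect to the variable named x."""
    if isinstance(expr, (int, float)):
        return 0
    if isinstance(expr, str):
        return 1 if expr == x else 0
    op, a, b = expr
    if op == '+':
        return ('+', d(a, x), d(b, x))
    if op == '*':                                # product rule
        return ('+', ('*', d(a, x), b), ('*', a, d(b, x)))
    if op == '^':                                # power rule, exponent b a constant
        return ('*', ('*', b, ('^', a, b - 1)), d(a, x))
    raise ValueError(f"unknown operator {op!r}")

# d/dx of x^3 + 4*x  ->  3*x^2 + 4, before any algebraic simplification
print(d(('+', ('^', 'x', 3), ('*', 4, 'x')), 'x'))
```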
The next kind of problem that Minsky attacked with a student—Thomas Evans, in this case—was to get a machine to reason by analogy. The machine was supposed to solve problems of the sort that ask “Figure A is to Figure B as Figure C is to which of the following figures [D-l through D-5]?” Evans’ doctoral thesis, completed in 1963, posed this one:
The first thing that came to my mind when I saw these drawings was: How did they get the computer to deal with them? Did it look at them? “No,” Minsky told me. “The trouble with computer-vision programs in those days was that they were always full of bugs. We really didn’t know how to make them work reliably. It would have taken us a year to get one that worked, and then nobody would have really cared. Evans developed a little sublanguage for describing line figures, and this was typed into the machine.” Evans wrote his program in LISP, which allows one to define two symbols—S and T, say—and then create a third, related symbol, such as L. In this way, one can code the proposition that two “points” (S and T) define a “line” (L). This is all that the computer has to know about what a line is; namely, it is something defined by two symbols called points. A line and a point define a plane. From a mathematical point of view, this abstract set of relationships is what these objects are, although, of course, we attach many other meanings to them. “Nowhere does the machine ‘really’ know what a line is, but I believe that there is nothing in us that ‘really’ knows what a line is, either—except that our visual system identifies certain inputs with ‘lineness,’ ” Minsky said. “It is the web of order among these inputs that makes them unique. Or, even if they are not unique, one pretends to know what a line is anyway.”
Once the machine had Figures A and B coded into its memory, it would compare them, using such criteria as “big” and “small,” “inside” and “outside,” and “left” and “right.” It would attempt to see what operations were necessary to transform one drawing into the other. The specification of “inside” and “outside” employs a method invented in the nineteenth century. To take an example: Imagine that one is somewhere inside a circle and one draws a line due north from where one is. This line will intersect the circle once as it goes through it. But if one is below the circle and outside it and draws a line north the line either will not intersect the circle at all or will intersect it twice. There are nuances, but, in general, this procedure will tell one that one is inside a closed curve if there is an odd number of crossings and outside if there is an even number of them. This was the procedure that Evans used. In Evans’ example, the machine would note that one of the differences between A and B is that the circle has been moved down to encase the small figure; after comparing A and B, the machine would next compare A and C. In this case, it would note that both had big figures above, and that each encased a small figure. Now the machine would try to find a diagram in the D series in which the big figure had moved down to encase the small one below it—D-3.
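The nineteenth-century inside/outside test—draw a ray from the point and count how many times it crosses the boundary, odd meaning inside and even meaning outside—is still in everyday use. A generic version for a polygon, rather than Evans' own code, looks like this:

```python
# The even-odd crossing test described above: cast a ray from the point (here,
# due east) and count boundary crossings -- odd means inside, even means
# outside. A standard polygon rendering of the rule, not Evans' program.

def inside(point, polygon):
    """Return True if point lies inside the polygon (a list of (x, y) vertices)."""
    x, y = point
    crossings = 0
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point, and
        # does the crossing lie to the east of the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                crossings += 1
    return crossings % 2 == 1                    # odd number of crossings

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(inside((2, 2), square), inside((5, 2), square))   # True False
```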
To make a computer do all this, Evans worked out one of the most complex programs that had ever been written. The machine he was using had a memory of thirty-two thousand words, each of thirty-six bits—about a million bits of memory. At that time, memory units cost about a dollar a bit, so the cost of the memory alone was about a million dollars. Memory now costs about a hundredth of a cent a bit, so today a comparable memory would cost at retail about a hundred dollars. Evans’ program used essentially every bit in the machine’s memory, and was able to do about as well on the tests as an intelligent high-school student. Apart from the specific results of the program, what fascinated Minsky was the reactions to it. “It irritated some people a lot,” he told me. “They felt that if you could program something, then the machine was not ‘really’ doing it—that it didn’t really have a sense of analogy. I think that what Evans’ program showed was that once one came to grips with ‘intuitions’ they turned into a lot of other things. I was convinced that the way the thing worked was pretty lifelike. Until one finds a logic for the kind of thing that Evans’ program did, it looks like ‘intuition’—but that is really superficial. What we had done was to find a logic for this kind of problem-solving. What we never did do was to use a lot of statistical psychology to learn what some ‘average’ person does when solving these problems. For a long time, I had a rule in my laboratory that no psychological data were allowed. I had read a lot of such data when I was in college, and I felt that one couldn’t learn very much by averaging a lot of people’s responses. What you had to do was something like what Freud did. Tom Evans and I asked ourselves, in depth, what we did to solve problems like this, and that seemed to work pretty well.”
During the period when Evans was finishing his thesis work, Minsky and his colleagues were involved in two other kinds of projects: computer linguistics and robotics. One of the earliest non-numerical projects that were tried on computers was language translation. It was not a notable success, in part because not enough was known about syntax and in part because of the inherent ambiguity of words. Simple word-by-word translation leads to absurdities. For example, one often cannot tell a noun from a verb without an understanding of the contextual meaning—and at the time such an understanding seemed beyond the capacity of computers. In his view, Minsky told me, the notions of Noam Chomsky and others concerning the formal theory of syntax helped to clarify many of the technical issues about the structure of phrases and sentences. “But I felt that they actually distracted linguists from other basic problems of meaning and reference,” he went on. “I saw little hope for machines to deal realistically with language until we could make simple versions of programs that really understood simple sentences in simple ways. In doing this semantic-information processing, as I called it, the early A.I. community worked pretty much by itself, without the help, or hindrance, of the linguists—at least, until much later.”
The work on language resulted in two M.I.T. doctoral theses that have become widely known. In 1964, Bertram Raphael, a mathematics student, wrote, as part of his thesis, a program that would allow a computer to make decisions, in a limited domain, about the meaning of words within a given context. Raphael first gave a computer a sequence of statements: Every boy is a person.
A finger is part of a hand.
There are two hands on each person.
He then asked it a question:
How many fingers does John have?
Up to this point, the name “John” had not been defined in the program, and the verb “have” can be used in several senses—as in “John had his dinner,” “John was had for dinner,” and “We had John to dinner.” When the computer was confronted with such a problem, Raphael’s program did not break down. Instead, the machine responded, “The above sentence is ambiguous. But I assume ‘has’ means ‘has as parts.’ ” It then asked, “How many fingers per hand?” Having been told that “John is a boy” and that each hand has five fingers, it was asked once again how many fingers John had. It now replied, “Ten.” Later, Raphael asked the machine “Who is President of the United States?” and it replied, “Statement form not recognized.”
Minsky told me that he had found Raphael’s program particularly interesting because it could tolerate contradictions. “If you told the machine that John had nine fingers, it would not break down,” he said. “It would try to build a sort of hierarchy of knowledge around this fact. In other words, given any situation, it would look for the most specific information it had about it, and attempt to use it.”
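The hierarchy Minsky describes can be caricatured in a few lines: store “is a” links and “has so many . . . as parts” links, then chain them together to answer a counting question, falling back on a “not recognized” response when the chain runs out. The representation below is illustrative only, not Raphael's program.

```python
# A toy rendering of the kind of knowledge Raphael's program worked with:
# "is a" links and "has N ... as parts" links, chained together to answer a
# counting question. The representation and names are invented for illustration.

isa = {"john": "boy", "boy": "person"}                    # John is a boy; every boy is a person
parts = {"person": ("hand", 2), "hand": ("finger", 5)}    # 2 hands per person, 5 fingers per hand

def count_parts(thing, part):
    """How many of `part` does `thing` have, following is-a and part-of links?"""
    while thing not in parts and thing in isa:            # climb the is-a hierarchy
        thing = isa[thing]
    if thing not in parts:
        return None                                       # "Statement form not recognized."
    sub, n = parts[thing]
    if sub == part:
        return n
    deeper = count_parts(sub, part)                       # e.g. fingers per hand
    return None if deeper is None else n * deeper

print(count_parts("john", "finger"))   # 10
```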
Probably the most spectacular program of this sort to be developed in the nineteen-sixties was one created by Minsky’s student Daniel Bobrow that sought to combine language and mathematics. He named it STUDENT. To keep the mathematics relatively simple, Bobrow chose to work with high-school-algebra problems. These are basically word problems, since, once the words have been translated into equations, what is involved is the solution of two, or possibly three, simultaneous equations—a snap for a computer. One of the problems posed in Bobrow’s thesis, also completed in 1964, was this:
The gas consumption of my car is 15 miles per gallon. The distance between Boston and New York is 250 miles. What is the number of gallons of gas used on a trip between New York and Boston?
The machine was programmed to make the assumption that every sentence is an equation, and was given some knowledge about certain words to help it to find the equations. For example, it knew that the word “is” often meant that the phrases on both sides of “is” represent equal amounts. It knew that “per” meant division. “The program was usually just barely good enough at analyzing the grammar of sentences to discern where phrases begin and end,” Minsky told me. “The program is driven by the possible meanings—the semantics—to analyze the syntax. From the mathematical word ‘per’ in that first sentence’s ‘miles per gallon,’ it can tell that the number fifteen would be obtained by dividing a certain number, x of miles, by some other number, y of gallons. Other than that, it hasn’t the slightest idea what miles or gallons are, or, for that matter, what cars are. The second sentence appears to say that something else equals two hundred and fifty miles—hence the phrase ‘the distance between’ is a good candidate to be x. The third sentence asks something about a number of gallons—so that phrase ‘of gas used on a trip’ is a candidate to be y. So it proposes one equation: x = 250, and another equation, x/y = 15. Then, the mathematical part of the program can easily find that y = 250/15.” Such problems are easy for STUDENT when exactly the same phrases are used for the same quantities in the different sentences. When the phrases are as different as they are in Bobrow’s problem, the program matches them up by using such tricks as seeing which have the most words in common. This didn’t always work, but, Minsky remarked, “It seemed incredible that this could work so often when so many high-school students find those problems so hard. The result of all this is a program that—on the surface, at least—can not only manipulate words syntactically but also understand, if only in a shallow way, what it is doing. It does not know what ‘gas’ is or what ‘gallons’ are, but it knows that if it takes miles per gallon and multiplies by gallons it will find the total distance. It can wander through a little of this common-sense logic and solve algebra problems that students find hard because they get balled up in the understanding of how the words work.”
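Minsky's account of the gas problem—“is” becomes an equality, “per” becomes a division, and the phrases are matched up to the unknowns—can be followed through by hand. The little calculation below hard-codes the matching that Bobrow's program did for itself; the variable names are mine.

```python
# A by-hand echo of STUDENT on the gas-consumption problem: "per" gives a
# division, "is" gives an equation, and the unknowns are matched up manually.
from fractions import Fraction

# The two equations Minsky reads out of the problem:
#   x / y = 15      (from "15 miles per gallon", x miles, y gallons)
#   x = 250         (from "the distance ... is 250 miles")
miles_per_gallon = Fraction(15)
miles = Fraction(250)

# Solve for y, the number of gallons used on the trip.
gallons = miles / miles_per_gallon
print(gallons)          # 50/3, i.e. about 16.7 gallons
```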
Minsky went on, “I am much more interested in something like this than I am in one of those large performances in which a machine beats, for example, a chess master. In a program like Bobrow’s or Raphael’s, one has cases in which the skills required appear to be rather obscure but can nonetheless be analyzed. In some sense, the performance of the machine here is childish; but this impresses me more than when a computer does calculus, which takes a kind of expertise that I think is fundamentally easy. What children do requires putting together many different kinds of knowledge, and when I see a machine that can do something like that it’s what impresses me most.”
While Minsky has always had a great fondness for robots, he came to the conclusion rather early that from the point of view of laboratory experiments making a robot mobile was more trouble than it was worth. “I thought that there were enough problems in trying to understand hands and eyes, and so forth, without getting into any extra irrelevant engineering,” he told me. “My friends at the Stanford Research Institute decided in the mid-sixties to make their first robot mobile—against my advice.”
In 1962, Henry Ernst, who was studying with both Minsky and Claude Shannon, made the Artificial Intelligence Group’s first computer-controlled robot. It was a mechanical arm with a shoulder, an elbow, and a gripper—basically, the kind of arm that is used to manipulate radioactive materials remotely. The arm was attached to a wall and activated by several motors, which, in turn, were controlled by a computer. The robot’s universe of discourse consisted of a box and blocks that were set out on a table. It had photocells in the fingertips of the gripper. The hand would come down until it was nearly in contact with the surface of the table, and then, when the photocells sensed the darkness of the hand’s shadow, its program would tell it to stop. It would thereupon begin to move sidewise until it came into contact with a block or the box. It could tell the difference, because if the object was less than three inches long it was a block and if it was more than three inches long it was the box. The program would then direct the arm to pick up the block and put it in the box. The arm could find all the blocks on a table and put them into the box. “It was sort of eerie to watch,” Minsky recalled. “Actually, the program was way ahead of its time. I don’t know if we appreciated then how advanced it was. It could deal with the unexpected. If something that it didn’t expect happened, it would jump to another part of its program. If you moved the box in the middle of things, that wouldn’t bother it much. It would just go and look for it. If you moved a block, it would go and find another one. If you put a ball on the table, it would try to verify that it was a block. Incidentally, when Stanley Kubrick was making his film ‘2001’ he asked me to check the sets to see if anything he was planning to film was technically impossible. I drew a sketch for Kubrick of how mechanical hands on the space pod might work. When I saw the film, I was amazed that M-G-M had been able to make better mechanical hands than we could. They opened the spaceship’s airlock door fantastically well. Later, I learned that the hands didn’t really work, and that the door had been opened by a person concealed on the other side.”
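The behavior Minsky describes—sense the hand's shadow, sweep sideways until something is touched, and treat anything under three inches long as a block to be dropped in the box—amounts to a very small decision loop. The simulation below is a caricature invented to show its shape; none of it is Ernst's code, and the photocell-and-shadow sensing is reduced to a comment.

```python
# A toy version of the block-clearing behavior described above, run against a
# simulated table. Everything here is invented; the photocell trick (lowering
# the hand until it senses its own shadow) is simply assumed to have already
# located the tabletop.

BLOCK_VS_BOX_THRESHOLD = 3.0        # under three inches long: a block; longer: the box

class Table:
    def __init__(self, lengths_in_inches):
        self.objects = list(lengths_in_inches)   # one of these is the box itself
        self.box_contents = []

def clear_table(table):
    """Sweep across the table, dropping every block into the box."""
    for obj in list(table.objects):              # sweep sideways, touching each object
        if obj < BLOCK_VS_BOX_THRESHOLD:         # it is a block: grasp it and drop it in
            table.objects.remove(obj)
            table.box_contents.append(obj)
        # otherwise the hand has touched the box; leave it where it is and sweep on

table = Table([1.5, 2.0, 8.0, 1.0])              # three blocks and the box
clear_table(table)
print(table.box_contents, table.objects)         # [1.5, 2.0, 1.0] [8.0]
```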
In the mid-nineteen-sixties, Minsky and Papert began working together on the problem of vision. These efforts ultimately produced a program created by Minsky in collaboration with a group of hackers—Gerald Sussman, William Gosper, Jack Holloway, Richard Greenblatt, Thomas Knight, Russell Noftsker, and others—that was designed to make a computer “see.” To equip the computer for sight, Minsky adapted some television cameras. He found that the most optically precise one had been invented in the early nineteen-thirties by Philo Farnsworth, who was one of the early television pioneers. It was still being manufactured by ITT. Minsky ordered one and managed to get it working, but it kept blurring. He telephoned the company and was told that the best thing to do would be to talk to Farnsworth himself, who was still doing research at the company. Minsky explained his problem on the telephone, and Farnsworth instantly diagnosed it. Minsky then fixed the blurring and attached the camera to a PDP-6 computer. The idea was to connect this camera to an arm so that one could tell the computer to pick up objects that its eye had spotted and identified. The arm was then to do various things with the objects. In the course of this, Minsky designed a mechanical arm, powered by fourteen musclelike hydraulic cylinders. It had a moving shoulder, three elbows, and a wrist—all not much thicker than a human arm. When all the bugs were finally out and the machine was turned on, the hand would wave around until the eye found it. “It would hold its hand in front of its eye and move it a little bit to see if it really was itself,” Minsky said. The eye had to find itself in the coördinate system of the hand. Despite all the problems, they were able to get the arm to catch a ball by attaching a cornucopia to the hand, so that the ball would not fall out. It would sometimes try to catch people, too, so they finally had to build a fence around it.
The project turned out to be much more difficult than anyone had imagined it would be. In the first place, the camera’s eye, it was discovered, preferred to focus on the shadows of objects rather than on the objects themselves. When Minsky and his colleagues got that straightened out, they found that if the scene contained shiny objects the robot would again become confused and try to grasp reflections, which are often the brightest “objects” in a scene. To solve such problems, a graduate student named David Waltz (now a professor of electrical engineering at the University of Illinois at Urbana) developed a new theory of shadows and edges, which helped them eliminate most of these difficulties. They also found that conventional computer-programming techniques were not adequate. Minsky and Papert began to try to invent programs that were not centralized but had parts—heterarchies—that were semi-independent but could call on one another for assistance. Eventually, they developed these notions into something they called the society-of-the-mind theory, in which they conjectured that intelligence emerges from the interactions of many small systems operating within an evolving administrative structure. The first program to use such ideas was constructed by Patrick Winston—who would later succeed Minsky as director of the A.I. Laboratory. And by 1970 Minsky and his colleagues had been able to show the computer a simple structure, like a bridge made of blocks, and get the machine, on its own, to build a duplicate.
At about the same time, one of Papert’s students, Terry Winograd, who is now a professor of computer science and linguistics at Stanford, produced a system called SHRDLU. (On Linotype machines, operators used the phrase “ETAOIN SHRDLU” to mark a typographical error.) SHRDLU was probably the most complicated computer program that had ever been written up to that time. The world that Winograd created for his SHRDLU program consisted of an empty box, cubes, rectangular blocks, and pyramids, all of various colors. To avoid the complications of robotics, Winograd chose not to use actual objects but to have the shapes represented in three dimensions on a television screen. This display was for the benefit of the people running the program and not for the machine, which in this case was a PDP-10 with a quarter of a million words of memory. The machine can respond to a typed command like “Find a block that is taller than the one you are holding and put it into the box” or “Will you please stack up both of the red blocks and either a green cube or a pyramid?” When it receives such a request, an “arm,” symbolized by a line on the television screen, moves around and carries it out. The programming language was based on one named PLANNER, created by Carl Hewitt, another of Papert’s students. PLANNER, according to Minsky, consists largely of suggestions of the kind “If a block is to be put on something, then make sure there is room on the something for the block to fit.” The programmer does not have to know in advance when such suggestions will be needed, because the PLANNER system has ways to detect when they are necessary. Thus, the PLANNER assertions do not have to be written in any particular order—unlike the declarations in the ordinary programming languages—and it is easy to add new ones when they are needed. This makes it relatively easy to write the language, but it also makes it extremely difficult to anticipate what the program will do before one tries it out—“so hard,” Minsky remarked, “that no one tries to use it anymore.” He added, “But it was an important stepping stone to the methods we use now.” One can ask SHRDLU to describe what it has done and say why it has done it. One can ask “Can a pyramid be supported by a block?” and it will say “Yes,” or ask “Can the table pick up blocks?” and it will say “No.” It is sensitive to ambiguities. If one asks it to pick up a pyramid—and there are several pyramids—it will say “I don’t understand which pyramid you mean.” SHRDLU can also learn, to a certain extent. When Winograd began a question “Does a steeple—” the machine interrupted him with “Sorry, I don’t know the word ‘steeple.’ ” It was then told that “a steeple is a stack which contains two green cubes and a pyramid,” and was then asked to build one. It did, discovering for itself that the pyramid has to be on top. It can also correctly answer questions like “Does the shortest thing the tallest pyramid’s support supports support anything green?” Still, as Douglas Hofstadter, in his book “Gödel, Escher, Bach,” points out, SHRDLU has limitations, even within its limited context. “It cannot handle ‘hazy’ language,” Hofstadter says. If one asks it, for example, “How many blocks go on top of each other to make a steeple?” the phrase “go on top of each other”—which, despite its paradoxical character, makes sense to us—is too imprecise to be understood by the machine. We use phrases like this all the time without being conscious of how peculiar they are when they’re analyzed logically.
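Minsky's example of a PLANNER “suggestion” can be read as a precondition that the system looks up whenever a matching goal arises, in whatever order the advice happened to be written. The toy below imitates only that pattern-invoked flavor; it is not Hewitt's language, and all of its names are invented.

```python
# A toy rendering of pattern-invoked "suggestions": advice is registered
# against a kind of goal and consulted whenever such a goal comes up, in no
# particular order. An illustration only, not Hewitt's PLANNER.

suggestions = {}                                # goal kind -> list of advice functions

def advice(goal_kind):
    def register(fn):
        suggestions.setdefault(goal_kind, []).append(fn)
        return fn
    return register

@advice("put-on")
def make_sure_there_is_room(world, block, support):
    # "If a block is to be put on something, then make sure there is room on
    # the something for the block to fit."
    used = sum(world["sizes"][b] for b in world["on"].get(support, []))
    return used + world["sizes"][block] <= world["sizes"][support]

def achieve(goal_kind, world, *args):
    for suggestion in suggestions.get(goal_kind, []):   # consult whatever advice applies
        if not suggestion(world, *args):
            return False                                # a suggestion vetoed the step
    if goal_kind == "put-on":
        block, support = args
        world["on"].setdefault(support, []).append(block)
    return True

world = {"sizes": {"table": 10, "red block": 4, "green cube": 4, "pyramid": 4}, "on": {}}
print(achieve("put-on", world, "red block", "table"))   # True
print(achieve("put-on", world, "green cube", "table"))  # True
print(achieve("put-on", world, "pyramid", "table"))     # False -- no room left on the table
```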
What are we to make of a program like SHRDLU? Or of one like HEARSAY—designed by Raj Reddy, a former student of McCarthy’s at Stanford and now a professor of computer science at Carnegie-Mellon—which, on a limited basis, began to understand speech? Do these programs bring us closer to understanding how our minds work, or are they too “mechanistic” to give us any fundamental insight? Or is what they show, perhaps, that the closer we get to making machine models of ourselves, the less we begin to understand the functioning of the machines? Minsky and I discussed these questions at length, and also the question of whether the fact that we are beginning to learn to communicate with machines might help to teach us to communicate with one another or whether our sense of alienation from the machines will grow as they begin to perform more and more in domains that we have traditionally reserved for ourselves.
What would it mean to understand the mind? It is difficult to believe that such an understanding would consist of an enumeration of the brain’s components. Even if we had a diagram that included every one of the billions of neurons and the billions of interconnections in the human brain, it would stare at us as mutely as the grains of sand in a desert. But this is the way—for most of us, at least—that a high-resolution microscope photograph of a silicon computer chip appears. Such a photograph, though, does not truly show the components—the atoms and the molecules. On the chip, these have been organized into functional units—memory, logic circuits—which can be understood and described. Minsky and others working in this field believe that in time the functional parts of the brain will be identified and their function described in language we can understand. Still, these people seem certain that the description, whatever it turns out to be, will not be like the great unifying descriptions in physics, in which a single equation, or a few equations, derived from what appear to be almost self-evident principles, can describe and predict vast realms of phenomena. Will the ultimate description of the brain resemble the description of a machine—in particular, that of a computer? The nervous systems of living organisms have been evolving on earth for more than three billion years, while the computer revolution has taken place only over the past forty years. We simply do not yet fully understand what computers can be made to do, and until this is clearer we cannot be sure what the final comparisons between mind and computer may be.
In the meantime, how should we view the machines we do have? Minsky and Papert, among others, see in the machines a great new opportunity for changing our methods of education. This is not because the machines can do arithmetic, say, better than we can but because “the computer provides a more flexible experience than anything else a child is likely to encounter,” Minsky said. He went on, “With it, a child can become an architect or an artist. Children can now be given resources for dealing with complex systems—resources that no one has ever had before. That’s one side of it. On the other side, dealing with a computer—as Seymour and I see it, at least—allows a child to have a whole new set of attitudes toward making mistakes, or what we call finding bugs. We have not been able to devise any other term for it. This attitude does not seem to get taught in schools, even though the concern there is to teach the truth. To really understand a mechanism—a piece of clockwork, for example—what you have to understand is what would happen if there were, for example, a tooth missing from a gear. In this case, part of the mechanism might spin very fast and set off a long chain of things that could end with the clock’s smashing itself to bits. To understand something like this, you must know what happens if you make a perturbation around the normal behavior—do the sort of thing that physicists do in what they call perturbation theory. We call this kind of information knowledge about bugs. Traditionally, such encounters are looked on as mistakes—something to be avoided. Seymour wanted to develop a working place for a child in which it would be a positive achievement when a child could find the things that could go wrong. If you know enough of those things, you get close to something like the truth. This is what happens with children who use computers in schoolroom environments that Seymour has set up, and in this the computers are essential, since their behavior is so flexible.”
Minsky continued, “We hope that when a child does something that does not quite work out he will say, ‘Oh, isn’t it interesting that I came out with this peculiar result. What procedure in my head could have resulted in something like this?’ The idea is that thinking is process and that if your thinking does something you don’t want it to do you should be able to say something microscopic and analytical about it, and not something enveloping and evaluative about yourself as a person. The important thing in refining your own thought is to try to depersonalize your interior; it may be all right to deal with other people in a vague, global way by having ‘attitudes’ toward them, but it is devastating if this is the way you deal with yourself.”
In the last few years, Minsky’s thoughts have ranged from the use of robotics both on earth and in space—he thinks that with a relatively small amount of technical improvement in robots automatic factories in space would be feasible—to the development of the human mind and its ability to cope with paradoxes. “Children’s innate learning mechanisms do not mature for a long time,” he said during one of my talks with him. “For example, a child usually doesn’t completely learn spatial perspective until he is about ten. If one is seated at a table with a six-year-old and there are several objects on the table and the child is asked to draw them not from the point of view of what he sees but from the point of view of someone who is sitting opposite him, the child will get the perspective wrong. Children won’t begin to get this right until they are ten or twelve. I suspect that this is one of many instances in which the computational ability to do many things, while it may be built in from the beginning, is not dispensed to you until later in life. It is like memory. Most of your memory capacity is very likely not available to you when you are a baby. If it were, you might fill it up with childish nonsense. The genetics is probably arranged to add computational features as you grow, whatever they may be—push down stacks, interrupt programs, all the kinds of things that computer scientists talk about. The hardware for these things is probably built in, but it makes more sense not to give them to the infant right away. He has to learn to use each of the pieces of machinery reliably before he is given the next one. If he were given too many at once, he would ruin them or make no use of them.”
Minsky paused, and then went on, “There is another side to this which occurred to me recently. I have often wondered why most people who learn a foreign language as adults never learn to speak it without an accent. I made up a little theory about that. What is a mother trying to do when she talks with her baby? What is her goal? I don’t think that it is to teach the baby English or some other adult language. Her goal is to communicate with the baby—to find out what it wants and to talk it out of some silly demands that she can’t satisfy. If she could really imitate the baby—speak its language without an accent—she would. But she can’t. Children can learn to speak their parents’ language without an accent, but not vice versa. I suspect there is a gene that shuts off that learning mechanism when a child reaches sexual maturity. If there weren’t, parents would learn their children’s language, and language itself would not have developed. A tribe in which adults lost their ability to imitate language at sexual maturity would have an evolutionary advantage, since it could develop a continuous culture, in which the communication between adult and child went in the right direction.
“There is something else that is interesting about children, and that is their attitude toward logical paradoxes. I have often discussed Zeno’s paradox with little kids. I ask a kid to try to walk halfway to a wall, and the kid does it. Then I say, ‘Now walk halfway from where you are now to the wall,’ and then I ask him what would happen if he kept that up. ‘Would you ever get to the wall?’ If the child appreciates the problem at all, what happens is that he says, ‘That is a very funny joke,’ and he begins to laugh. This seems to me to be very significant. It reminds me of the Freudian theory of humor. Something that is funny represents a forbidden thought that gets past the censor. These logical paradoxes are cognitively traumatic experiences. They set up mental oscillations that are almost painful—like trying to see both sides of the liar paradox: ‘The sentence that you are now reading is false.’ These intellectual jokes represent the same sort of threat to the intellect that sexy or sadistic jokes do to the emotions. The fact that we can laugh at them is valuable. It enables us to get by with an inconsistent logic.”
Minsky concluded, “To me, this is the real implication of Gödel’s theorem. It says that if you have a consistent mathematical system, then it has some limitations. The price you pay for consistency is a certain restrictiveness. You get consistency by being unable to use certain kinds of reasoning. But there is no reason that a machine or a mathematician cannot use an inconsistent system of logic to prove things like Gödel’s theorem and even understand that, just as Gödel did. I do not think that even Gödel would have insisted that he was a perfectly consistent system that never made a logical error—although, as far as I know, he never published one. If I am doing mathematical logic, I take great pains to work within one of those logical systems which are believed to be foolproof. On the other hand, as a working mathematician, I behave quite differently in everyday life. The image I have is that it is like ice skating. If you live in a conscientious community that does not try to prohibit everything, it will place red flags where the ice is thin, to tell you to be careful. When you are doing mathematics and you begin to discover that you are working with a function that has a peculiar behavior, you begin to see red flags that tell you to be careful. When you come to a sentence that says it’s false or you come to sentences that appear to be discussing things that resemble themselves, you get nervous. You say to yourself, ‘As a mathematician, I am on thin ice now.’ My view of mathematical thinking is like Freud’s view of everyday thinking. We have in our subconscious a number of little demons, or little parasites, and each of them is afraid of something. Right now, I am working on the society-of-the-mind theory. I believe that the way to understand intelligence is to have some parts of the mind that know certain things, and other parts of the mind that know things about the first part. If you want to learn something, the most important thing to know is which part of your mind is good at learning that kind of thing. I am not looking so much for a unified general theory. I am looking for an administrative theory of how the mind can have enough parts and know enough about each of them to solve all the problems it confronts. I am interested in developing a set of ideas about different kinds of simple learning machines, each one of which has as its main concern to learn what the others are good at. Eventually, I hope to close the circle, so that the whole thing can figure out how to make itself better. That, at least, is my fantasy.” ♦