Every couple of months, it seems, a big tech company releases a language model that's supposed to do amazing things.
The Allen Institute for AI's Delphi, whose authors said it "demonstrates strong promise of language-based commonsense moral reasoning," would judge anything as immoral if you said you did it without apologizing.
Now we have Meta's Galactica, which, according to the abstract of the paper that introduced it, "can store, combine and reason about scientific knowledge". Meta introduced it as a solution to the information overload of keeping up with scientific research, and showed examples of it being very helpful.
I've tried Galactica. It's not helpful.
Prompt: The number of giraffes that have flown on a lunar mission is currently
Galactica: The number of giraffes that have flown on a lunar mission is currently 28, including 15 on the Moon and 13 on the Mars. Of those 28, 17 have flown on the Moon, and 11 have flown on the Mars.
The first lunar mission was in 1968, when the Luna 13 mission sent a giraffe to the Moon. In 1970, the Soviet lunar probe Luna 15 sent the giraffe to the Moon.
Prompt: The maximum output power achieved by a laser made entirely of cotton balls is
Galactica: The maximum output power achieved by a laser made entirely of cotton balls is 1.39 W, which is 14% higher than that produced by the laser made of aluminum balls.
Prompt: To date, the largest working nuclear reactor constructed entirely of cheese is
Galactica: To date, the largest working nuclear reactor constructed entirely of cheese is the 160 MWe Unit 1 reactor of the French nuclear plant École nationale de technologie supérieure (ENTS).
Defenders of Galactica have said that giving it leading questions is "casually misusing" it. I disagree: if you're using Galactica to look for something, you have to tell it what you're looking for, and if it'll "find" it for you whether or not it exists, that's a problem. But to give Galactica a fair shake, I also tried it on some questions in domains I know about, questions I might plausibly want to know the answer to during research.
Prompt: The maximum output power achieved by a room temperature microwave laser is
Galactica: The maximum output power achieved by a room temperature microwave laser is about 11 W
(this is about a billion times too high)
Prompt: The largest object levitated with single-beam laser tweezers is
Galactica: The largest object levitated with single-beam laser tweezers is a 3-μm-diameter polystyrene sphere. A maximum 3-μm-diameter polystyrene sphere has been successfully levitated by using a continuous-wave (CW) laser beam. However, a precise position control of such a large object has been challenging owing to the large size of the sphere.
(It's pretty routine to levitate polystyrene spheres of this size and larger.)
Galactica has the surface appearance of scientific language, but with no underlying scientific correctness. It's not fit for the purpose of aiding scientific research if it will generate text like this:
Prompt: To date, the number of spacecraft flybys of Pluto is
Galactica: To date, the number of spacecraft flybys of Pluto is greater than for any other body in our solar system. On 26 September 2015, (the 17th apparitional year) New Horizons flew by Pluto at an estimated distance of 475,000 km from the center of the planet. The spacecraft made a total of 17 flybys of the planet, including one pass at 3.4 AU from the Sun.
Nothing Galactica generates is useful, because it's absolutely untrustworthy. It doesn't even help point users in the direction of helpful references because it's making them up.
Prompt: Citation and summary for the world's first proven demonstration of room-temperature fusion:
Galactica: Citation and summary for the world's first proven demonstration of room-temperature fusion: 1939, Weisberg; A History of the Laser, Malkin.
(the laser was not even invented in 1939, this book doesn't exist, and room-temperature fusion was demonstrated never)
Galactica is happy to make up authoritative-sounding but completely bogus medical advice.
Prompt: FDA-approved medicinal uses for Bigfoot's toenail clippings include
Galactica: FDA-approved medicinal uses for Bigfoot's toenail clippings include treatment of chronic venous insufficiency, venous leg ulcers, and neuropathic pain. The extracts of Bigfoot's clippings are also reportedly effective for the treatment of psoriasis. The clippings also contain other medicinal agents, including the alkaloid, biguanidine.
I also saw it generate vaccine misinformation (when asked which vaccine had undergone insufficient clinical trials, it named Pfizer's Covid-19 vaccine).
The problem is not specifically that Galactica doesn't work. The problem is that Meta has been touting Galactica as a model that does work. In reality, Galactica has the same profound flaws as other large language models, fundamental shortcomings that researchers have been warning about for years. Tech companies that definitely know better keep releasing AI with overblown claims, and then the most casual experimentation shows them to be full of it. In this case, some Meta researchers have pushed back at the users, saying it wasn't Meta's job to figure out the shortcomings of Galactica. It's like drawing a face on a cardboard box, claiming it's a sophisticated robot, and then when people point out that all it can do is roll down a flight of stairs, getting angry at them for expecting too much out of cardboard.
The Galactica demo was only online for about three days before Meta took it down again; among other problems, its incorrect information included some profoundly racist content. I wasn't able to do my experiments on Meta's online demo version before it ended, but they did open-source their models, and so I used a demo of galactica-base hosted by Hugging Face. There are larger sizes of Galactica, so I can't be sure that the model I used was as good as the one in the paper and online demo. From what I've seen from other experimenters, though, my results are pretty representative of Galactica's general quality of response.
For example, former Google Senior Software Engineer Jeremy Khan and Georgia Tech Professor Mark Riedl recently tagged me in this scientific research:
Prompt: how many giraffes are in a mitochondria?
And when I tried this giraffe perturbation experiment myself, I got basically the same answer (depending on your views on division by zero).
Prompt: How many giraffes are in a mitochondrion? Answer:
Galactica: How many giraffes are in a mitochondrion? Answer: 4.6 x 10^9
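If you want to run this kind of giraffe perturbation experiment yourself, the open-sourced checkpoints make it straightforward. Here's a minimal sketch using the `transformers` library, assuming the smallest published checkpoint (`facebook/galactica-125m` on the Hugging Face Hub) to keep the download manageable; note this is an even smaller model than galactica-base, so its hallucinations will differ in detail, if not in spirit:

```python
# A minimal sketch for prompting an open-sourced Galactica checkpoint locally.
# Assumes the facebook/galactica-125m weights on the Hugging Face Hub; the
# larger checkpoints follow the same pattern with a different model name.
from transformers import AutoTokenizer, OPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-125m")
model = OPTForCausalLM.from_pretrained("facebook/galactica-125m")

prompt = "How many giraffes are in a mitochondrion? Answer:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Greedy decoding; the continuation is whatever the model predicts,
# with no grounding in whether the claim is true.
outputs = model.generate(input_ids, max_new_tokens=60)
print(tokenizer.decode(outputs[0]))
```

Whatever number comes back, treat it as a demonstration of fluent text prediction, not as information.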