What Is A Search Engine?

Or, Ann Leckie vs the “Well, Actually” Bros

Is an AI chatbot, like ChatGPT, a search engine? Does it scour the internet for helpful information so that it can respond to user queries? 

According to Ann Leckie, the answer is no. And that opinion got her in a bit of hot water last week on Bluesky.

Leckie is the Hugo Award-winning author of Ancillary Justice, a novel that features an artificial consciousness as its main character and, in sequels, ponders the notion of personhood and representation for AIs. In other words, Leckie is someone who thinks critically and creatively about artificial intelligence.

As I mentioned, Leckie posted that AI services like ChatGPT are not search engines:

Say it after me: Chat GPT is not a search engine. It does not scan the web for information, it just generates statistically likely sentences. You cannot use it as a search engine, or as a substitute for searching.

Now. Please never use an LLM for information searches ever again.

Some prominent voices in the tech space took issue with her post.

On the one hand, it was a completely inconsequential moment of internet discourse—the kind that disrupts a poster’s day by making their off-hand comments the nexus of performative disagreement. On the other hand, the exchange reminded me that the conversation about artificial intelligence often lacks a critical analysis of how the interfaces and forms of the technology shape how we perceive it, and therefore what we think it can and can’t do. Those perceptions, in turn, exert a broader and often more dangerous influence outside the technology itself.

So this essay endeavors to fill in some of that missing context, to close the gap ever so slightly, and complicate the way you interact with search engines and/or AI in the future. 

Whether you’re excited, ambivalent, or disgusted by the growing prominence of AI tools, I hope to offer a new perspective on how your work (and thinking) is changing in the new media environment.


Keep reading or listen on the What Works podcast.


Ann Leckie vs. The “Well, Actually” Bros

After Leckie shared her post, the “Well, actually” bros couldn’t reply fast enough. Even people whose perspectives I generally appreciate on tech issues, like Casey Newton and Anil Dash, needed everyone to know that Leckie was “factually incorrect.”

Newton chided:

This is one of the most-shared posts on Bluesky in the past day and it's just completely false. You might think ChatGPT is a *bad* search engine, or prefer another search engine. But it has had integrated web search since last year.

Dash insisted (to someone agreeing with Leckie’s claim):

It is, in fact, factually incorrect to say that ChatGPT does not scan the web for information. Which is part of why that's not an effective critical tactic to use.

I might have been blissfully unaware of this whole exchange, save for the fact that I noticed Tressie McMillan Cottom’s defense of Leckie:

Ann is an author. 

Her context is about usability, not technical capacity. On the merits of her context, she is correct. 

Technical specificity does not change her conclusions.

Is McMillan Cottom “Well, actually-ing” the “Well, actually” bros? Maybe. But somebody had to do it.

McMillan Cottom added later that the backlash to Leckie’s post hinges on a technical definition of the word “information.” To a technologist, “information” is synonymous with data. It’s raw material for humans or machines to refine into something meaningful. To the average person, “information” tends to be synonymous with facts, knowledge, and news. Sure, it’s data, too—but it (generally) already means something. Or, as Merriam-Webster’s first definition of “information” puts it: “knowledge obtained from investigation, study, or instruction.”

Not only does an LLM not scan the web for information in that way, an LLM does not have the capacity to verify facts, gain knowledge, or investigate current events. Like a stopped clock, it regularly produces outputs that seem to indicate it’s done these things. But do you hear a tick? No, it just as regularly produces outputs that make its limitations painfully clear.

A Brave New Media Environment

What we call a search engine isn’t a thing that “scans the web” for information, but a medium for engaging with the web that generates certain expectations and patterns of behavior for users. Since you’re reading this, I’m going to assume that you’re a pretty competent search engine user at this point. Nearly 30 years ago, Google began to teach us what a search engine could be and how we could use it to find what we’re looking for. Over that time, we’ve developed habits and strategies for using it and other search engines. We’ve learned how to ignore ads, avoid clickbait, and double-check sources. Mostly.

The same can’t be said for LLM chatbots like ChatGPT, Claude, and Gemini. While artificial intelligence and machine learning have been around in other forms for a while, these public-access chatbots are new on the scene. ChatGPT was launched to the public at the end of 2022. It gained momentum in the tech press and mainstream news in the following months. Generally speaking, even the power users among us have not had the time or accumulated the experience required to form medium-specific habits and expectations.

So we transpose our habits and expectations from other media.  

“Unhappily, we confront this new situation with an enormous backlog of outdated mental and psychological responses,” mused Marshall McLuhan, reflecting on the shift from print media to “electronic” media in his own era, the 1950s-70s. He observed that the continual production and instantaneous delivery of information made possible through electronic media like radio and television produced a new environment. We hadn’t lived in that environment long enough to adapt to it. It wasn’t necessarily bad; we just didn’t have the right adaptations.

Reading McLuhan’s work today, it’s easy to imagine myself as a time traveler looking back on a simpler time: If only you knew how much more “instant” and “continuous” media production could become, you dear sweet man! McLuhan may not have envisioned our Instagram-addled, TikTok-ified, ChatGPT prompt-engineered world as such, but he saw the acceleration of social, scientific, and technological change clearly enough.

“When faced with a totally new situation,” observed McLuhan, “we tend always to attach ourselves to the objects, to the flavor of the most recent past. We look at the present through a rear-view mirror.” We engage with new media through our habits and expectations of more familiar media. This helps us keep the anxiety of change at bay, but it also makes it difficult to see the important differences between the old and the new. 

McLuhan theorized that technologies create unique environments that shape our perception and engagement. A medium like television created an environment that turned information into entertainment and transfixed our visual sense while diminishing our critical thinking. Television created an environment of spectacle and immediacy. The medium of the search engine created an environment in which an abundance of knowledge was manifest. You could find whatever you needed on the internet—and more importantly, you could find a seemingly endless set of perspectives on that thing. The search engine created an environment defined by potential.

LLM chatbots create a very different environment. It’s one defined by efficiency and production—the goal seems to be getting to an ideal result as quickly and frictionlessly as possible. Type a question; get an answer. 

When an answer is provided to us, we’re far less likely to actually do the (re)search. 

“Search engine” isn’t the only kind of media environment we’re inhabiting when we ask ChatGPT a question.

The “question-answer” form is an ancient linguistic technology. Over tens of thousands of human-to-human conversations, we’ve adapted to the environment it creates, one in which question-askers generally expect question-answerers to respond truthfully. If the answerer is unsure or the answer is complicated, they’ll engage you in conversation about it. When they don’t know, they’ll tell you—or bloviate in a way that makes you wary of their veracity. We’ve come to expect these behavioral patterns, and we shape our interactions based on them. The “answer” is a form that communicates authority.

When you ask ChatGPT or Claude a question, you get an answer. Because we have a time-tested heuristic that assumes “answers” are authoritative, we subconsciously label that answer as probably true. It appears to be the same as that ancient “question-answer” environment, but it’s not. The question-asker is a human; the question-answerer is a complicated algorithm. The asker perceives and communicates meaning; the answerer does not and cannot. The asker has lived experience and an appreciation for accuracy; the answerer does not and cannot. 

The environment that a search engine creates is fundamentally different—or at least it was up until the last couple of years. The search engine creates an environment that provides, if not exactly encourages, research or exploration. Even if you type a question into the search bar, you don’t get back an answer. You get back a seemingly endless list of potential answers. Yes, the search engine ranks them based on an unknown set of characteristics and user behavior, but the presentation of results—the environment—signaled “more research needed.” 

Maybe you clicked to read the Reddit post and its top comments. Or you watched the YouTube video. Or you read the article. Ideally, you did a variety of things that eventually coalesced into an answer. What’s more, you absorbed cues from each source about its level of trustworthiness—the domain name, the design, the author/creator, etc. Those cues helped you suss out how much weight to give to any one perspective.

I’ll note that search engines have endeavored to become more like question-answer machines in recent years. Google’s launch of the AI Overview is a good example of just how different an LLM chatbot and a search engine really are in terms of media environment. I’ve watched as my dear husband, who bemoaned the launch of AI Overview and who swears that he “never [uses] an LLM as a search engine,” searches Google and reads aloud from the AI Overview to me. When I spoke with him about this, he told me, “Most of the time when I have done that, I have not even processed that I’m using an AI.”

Exactly. The ground rules, structure, and patterns of behavior associated with a media environment become part of how we engage with that medium. The AI Overview is embedded in one media environment, but it is a completely different kind of medium. The older media environment overrides the caution or intention we might like to bring to the newer media environment.

The Overview is an answer. Even when the text equivocates (e.g., “This may refer to…”), our psyches receive it authoritatively. I have a personal policy of never relying on the Overview, and I still catch myself assuming that it’s right. 

That’s dangerous.

The “Card Catalog Effect”

To return to our source material, Leckie wrote that an LLM “does not scan the web for information, it just generates statistically likely sentences.” At the risk of a bit of pedantry, I do understand where Dash is coming from with his critique; by one definition of “scans the web for information” and without the second clause in that sentence, Leckie is wrong. LLMs can and do “scan the web.” In Dash’s context as an entrepreneur, developer, and technologist, the word “information” often means something more like “data” than “facts” or “knowledge.”

LLMs scan the web for words and assess the likelihood that certain words are relevant to their exchange with the user—hence, Leckie’s characterization that LLMs “[generate] statistically likely sentences.” LLMs are not and cannot be concerned with accuracy or correctness in any human sense of those concepts. An LLM doesn’t know whether the sky is blue or red. It only knows that it’s more likely to come across the words “the sky is blue” than “the sky is red.” Each of those words, individually and as sets, is “information” in the sense I cited above. But they’re not information in the sense that the word is used by the average person coming across her post.
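To make “statistically likely” a little more concrete, here’s a deliberately crude sketch of my own (nothing like a real model’s architecture, just word counts over a tiny made-up corpus) of what it means to prefer one continuation over another:

```python
# A toy illustration, not how any real LLM works: score possible
# continuations of "the sky is" by how often they appear in a tiny corpus.
from collections import Counter

corpus = (
    "the sky is blue . the sky is blue today . "
    "the sky is gray . the ocean is blue . the sky is red at sunset ."
).split()

# Count which word follows each occurrence of "the sky is"
following = Counter(
    corpus[i + 3]
    for i in range(len(corpus) - 3)
    if corpus[i : i + 3] == ["the", "sky", "is"]
)

total = sum(following.values())
for word, count in following.most_common():
    print(f"P({word!r} | 'the sky is') = {count / total:.2f}")
# "blue" wins only because it shows up more often in the data,
# not because anything here knows what color the sky is.
```

A real LLM swaps the word counts for billions of learned parameters, but it is answering the same kind of question: which string of words is most probable, not which statement is true.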

Imagine you’re at a library to do a little research on a topic of interest—let’s say the mating preferences of the North American beaver. You’d probably sit down at a bank of computers, type your search terms into the digital card catalog, and scan the results. You’d discover where books about the North American beaver are shelved, and you might also find some books on animal mating preferences that include reference to beavers. You’d locate the right shelves, pull some books, and spread them out on a table to find the information you’re looking for. 

This familiar sequence is the beginning of what writer Courtney Milan dubbed the “card catalog effect,” in a post unrelated to Leckie’s brush with the “Well, actually” bros. Milan explains that actively researching a subject—including the false starts, clarifications, and new ideas that come with it—is critical to not only learning facts but also, as she put it, “learning the territory.” She writes:

The answer was not the point. The answer was never the point. The process of searching is the process of learning. Knowledge is never knowing the answer. It’s knowing the territory.

Research doesn’t just expand what we know. Research expands how we know. Research doesn’t just answer questions. Research helps us ask better questions.

A search engine requires us to select sources, read through web pages, and adapt our keywords. It’s an extremely useful tool, but it’s not always efficient. And that’s good! LLM chatbots operate through a guise of efficiency, providing what appear to be answers, but stripped of the richness of true learning and knowing.

And while there are certainly times when we just want to know “when did Google launch,” as I did for this piece, ideally, we research questions to expand our curiosity rather than constrain it. We crave the intellectual obstacle course that research provides—whether we’re wondering about quantum physics or baking a loaf of bread.

Now imagine you’re at that same library wanting to learn about the mating preferences of North American beavers, but instead of searching the card catalog, you decide to speak to a librarian. However, this library provides a librarian who will do your research for you. So you saunter up to the circulation desk and ask, “What are the mating preferences of beavers?” The librarian does not have this information at hand. Maybe they look away, roll their eyes, and then pad over to the beaver shelf. They read through a few accounts of beaver mating preferences and then report back to you.

Is the information the librarian provides true? Is it relevant? Who knows?

The library is an environment that encourages curiosity and exploration. It invites its patrons to take a risk and connect dots. Its “ground rules, pervasive structure, and overall patterns” (McLuhan) actively work together to create that effect. A librarian can help you navigate the physical and intellectual space of the library, but they won’t do your research for you. 

Likewise, a search engine—historically, at least—is an environment that encourages curiosity and exploration. Just think of all the wild and wacky queries you’ve typed into Google. A search engine doesn’t have the same structure, rules, or patterns as a library, of course—much of its structure is built around the concerns of commerce rather than the public good. But the value proposition for the end user is similar to that of a librarian, helping you navigate the vastness of the World Wide Web.

Form, Meaning, and the Stochastic Parrot

Librarians can help guide your research, but they can’t do it for you. Search engines can help guide your research, but (until recently) they don’t try to do it for you. AI boosters want us to believe that LLMs can do the research for us—they can scan the shelf for plausible answers, but they have no understanding of accuracy or fact or relevance. 

In an influential 2021 paper, Emily M. Bender and Timnit Gebru dubbed large language models “stochastic parrots.” “Stochastic” means random or unpredictable. Parrots, of course, can make sounds that resemble words but have no meaning or intent (as a human understands it) associated with those sounds. That makes stochastic parrot a pretty apt description of an LLM. Bender and Gebru write:

…languages are systems of signs, i.e. pairings of form and meaning. But the training data for LMs is only form; they do not have access to meaning. Therefore, claims about model abilities must be carefully characterized …

The problem is, if one side of the communication does not have meaning, then the comprehension of the implicit meaning is an illusion arising from our singular human understanding of language (independent of the model). Contrary to how it may seem when we observe its output, an LM is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning: a stochastic parrot.

When an LLM scans the web, it’s not looking for an answer. It’s assessing the likelihood that a certain mashup of letters (forms) is going to result in its goal: positive feedback from the user. Any meaning that the user recognizes in that mashup of letters is generated by the user, not the chatbot. The chatbot isn’t answering your question; it’s providing the response that, according to its calculations, is most likely going to result in positive feedback.
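The “stochastic” half of Bender and Gebru’s phrase is worth a sketch, too. Here’s an equally toy illustration (the probabilities are invented for the example) of what it looks like to sample a “statistically likely” continuation rather than look anything up:

```python
# Toy illustration of the "stochastic" part: given made-up probabilities
# for the next word, pick one at random, weighted by likelihood.
import random

# Hypothetical next-word probabilities after the prompt "the sky is"
next_word_probs = {"blue": 0.62, "gray": 0.21, "clear": 0.12, "red": 0.05}

words = list(next_word_probs)
weights = list(next_word_probs.values())

# Run this a few times and the wording changes, because the choice is
# weighted chance. Nothing in this loop ever checks the actual sky.
for _ in range(5):
    print("the sky is", random.choices(words, weights=weights)[0])
```

The output reads like an answer; the process behind it is a weighted roll of the dice over forms.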

Is it a mind-bendingly complex and impressive thing that the LLM chatbot does? Absolutely. But it’s not answering a question. It’s not providing information. And it’s certainly not doing your research for you.

Final Thoughts

Maybe this seems like far too many words to devote to an internet kerfuffle over semantics. But I believe the stakes are actually very high. 

I’m not afraid of artificial intelligence. One of the things I love about Ann Leckie’s work, along with the works of Elizabeth Bear, Martha Wells, Sue Burke, and Becky Chambers, is the vision of an artificial intelligence that transcends the scaremongering of those who imagine that an AI could only ever be hyper-rational. A truly intelligent artificial consciousness won’t turn the world into paperclips just to fulfill its programming. It’s likelier, I think, to acquire a well-rounded approach to analysis and problem solving. It’s likelier to be principled, to care, in its own way.

What I am afraid of is how much agency, creativity, and culture we’ll lose as tech entrepreneurs and venture capitalists make AI so ubiquitous as to be invisible. I’m afraid of how much more vulnerable and ripe for manipulation we’ll become. I worry we’ll become less trusting than we already are, less able to discern fact from fiction.

I’m more afraid of “us” than I am of the computers.

At the same time, I believe in us. I believe in the power of a public that can adapt, develop new habits, and learn to avoid the pitfalls of this strange new media environment.

And I believe in Ann Leckie and the many others like her, who don’t give a shit if some “Well, actually” bros decide to disagree with them:

O no some fans of the Super Autocomplete Plagiarism Machine are unhappy with me. Whatever will I do. How will I live.



 
Tara McMullin

Tara McMullin is a writer, podcaster, and critic who studies emerging forms of work and identity in the 21st-century economy. Bringing a rigorous critique of conventional wisdom to topics like success and productivity, she melds conceptual curiosity with practical application. Her work has been featured in Fast Company, Quartz, and The Muse.
