The Problem with Book Ratings

Last year I gave a book five stars on Goodreads that I now think deserved three. I’m not sure when the inflation happened, exactly. I finished the book on a Friday night, felt a warm glow of satisfaction, and clicked five stars almost reflexively. It was good. I enjoyed it. I had no major complaints. Five stars.

But “good” and “five stars” shouldn’t mean the same thing. Five stars should mean exceptional, transformative, one of the best books you’ve read in a long time. I’ve read maybe twenty books in my entire life that genuinely deserve five stars, and I’ve given the rating to at least sixty on Goodreads. Something is wrong with this system, and I don’t think I’m the only one who’s noticed.

The problem with book ratings is bigger than my personal grade inflation. It touches on questions about how we evaluate art, how social pressure shapes our assessments, and whether reducing a complex reading experience to a number between one and five makes any sense at all. I’ve been thinking about this for a while, and I have some strong opinions.

The Scale Is Broken

Let’s start with the most obvious problem. On Goodreads, the average book rating hovers around 3.9 to 4.0 out of 5. Think about that for a second. The mathematical midpoint of a five-point scale is 2.5, but the effective midpoint on Goodreads is closer to 4. A book rated 3.5 on Goodreads is considered mediocre. A book rated 3.0 is considered bad. Anything below 3.0 might as well not exist.

This compression of the scale into its upper range makes the ratings almost useless for distinguishing between books. The difference between a 3.8 and a 4.1 could be meaningful or it could be noise, and you have no way of knowing which. Two books separated by 0.3 points might be vastly different in quality, or they might be nearly identical, with the gap explained by nothing more than which readers happened to rate them.

Amazon’s rating system has the same problem, compounded by the fact that Amazon ratings include reviews from people who received free copies, people who have an axe to grind, and people who are reviewing the shipping speed rather than the book. I’ve seen one-star reviews that say “Book was fine but arrived with a dented corner.” That’s not a book review. That’s a logistics complaint. But it counts the same as a one-star review from someone who actually read the book and hated it.

The Social Pressure Problem

There’s a social dimension to ratings that doesn’t get discussed enough. When you rate a book on Goodreads, your friends can see your rating. The author can see your rating (many authors actively check). Other members of your book club can see your rating. This creates a subtle but real pressure to be generous.

I know readers who will not rate a book below three stars because they don’t want to hurt the author’s feelings. I respect the impulse. Authors are real people who poured years of their lives into these books, and a visible one-star rating feels personal in a way that a negative review of a restaurant or a movie doesn’t. But when kindness inflates ratings across the board, the system loses its informational value. If everyone rates everything at least a three, then three doesn’t mean “average.” It means “I didn’t want to be mean.”

This is especially pronounced in certain reading communities. Romance readers, for example, tend to rate books very generously, partly because the community values supportiveness and partly because the genre is so diverse that “not my preferred subgenre” doesn’t feel like a fair reason to give a low rating. Mystery and thriller readers tend to be slightly more critical. Literary fiction readers are the harshest raters, which may say something about the genre’s culture of seriousness, or may just mean literary fiction fans enjoy complaining more.

The point is that ratings are contaminated by social context. The same book rated by different communities would get different numbers. The rating doesn’t measure the book’s quality in any absolute sense. It measures how a particular group of readers felt about the book within a particular social framework.

The Timing Problem

When do you rate a book? Right when you finish it? A week later? A year later? The answer matters more than you might think, because your assessment of a book changes over time, sometimes dramatically.

I’ve had the experience of finishing a thriller in a state of breathless excitement, rating it highly, and then realizing a month later that I can’t remember a single character’s name. The book was exciting in the moment but left no residue. Should the rating reflect the immediate experience or the lasting impression? There’s no right answer, but Goodreads doesn’t ask you to specify. Your breathless Friday-night five stars counts exactly the same as someone else’s considered, reflective five stars given months after reading.

The reverse happens too. Some books, particularly dense or experimental ones, don’t feel great while you’re reading them. They’re challenging, confusing, or frustrating. But they stick with you. They change the way you think about something. They’re the books you find yourself referencing in conversation months later. These books would get three stars immediately after reading and five stars a year later. The rating system has no way to capture this kind of delayed appreciation.

I think about this often with our own titles. A book like Still Waters is, by design, a slow build. The opening sections ask for patience. Readers who rate it immediately after finishing might score it differently than readers who sit with it for a week and realize how much it’s been on their mind.

The Expectation Problem

Ratings are heavily influenced by expectations, and expectations are shaped by marketing, hype, and social media buzz. A debut novel that arrives with no fanfare and turns out to be pretty good might get four stars, because readers are pleasantly surprised. The same novel, published with a massive marketing campaign and breathless advance praise comparing it to Toni Morrison, might get three stars because it didn’t live up to the hype. The book is identical. The ratings are different.

This is profoundly unfair to both books and readers. The debut gets credit not for being great but for exceeding low expectations. The hyped book gets penalized not for being bad but for failing to be as good as the marketing promised. Neither rating tells you much about the actual quality of the writing.

BookTok has amplified this dynamic considerably. When a book goes viral on TikTok, it accumulates a wave of ratings from readers who had extremely high expectations. Some of these readers would have loved the book if they’d discovered it quietly at a bookstore. But because they discovered it through a tearful TikTok video promising it would “destroy” them, anything less than emotional devastation feels like a letdown. The book gets dinged not for what it is but for what someone else claimed it would be.

What Ratings Can’t Measure

Here’s my core objection to the entire rating enterprise. A single number cannot capture the multi-dimensional experience of reading a book. A book can have beautiful prose and a weak plot. It can have compelling characters but a muddled theme. It can be technically accomplished but emotionally cold, or it can be roughly written but deeply moving. Reducing all of these dimensions to a single number forces you to perform a kind of mental averaging that loses the most interesting information.

Consider a book with extraordinary sentences but a disappointing ending. Three stars? Four? The question is almost meaningless, because the thing you most want to communicate (“read this for the prose but don’t expect the plot to land”) cannot be expressed as a number. The rating collapses a nuanced judgment into a binary: this book is above average or it isn’t.

Restaurant reviews have faced this problem for years, and the best solutions involve multiple dimensions. Zagat rates food, decor, and service separately. This isn’t perfect, but it preserves more information than a single composite score. I’ve often wished book ratings worked the same way. Rate the prose separately from the plot. Rate the characters separately from the pacing. You’d end up with a profile rather than a point, and profiles are far more useful for predicting whether a specific reader will enjoy a specific book.

Of course, nobody would fill out a five-dimensional rating for every book they read. That’s too much friction. And that’s part of the problem: the single-number system exists because it’s easy, not because it’s good. We’ve optimized for the rater’s convenience rather than the rating’s accuracy.

The Review vs. the Rating

The best Goodreads reviews I’ve read are the ones that ignore the star rating entirely. They’re written responses that describe the reading experience with specificity and honesty. “I found the first half slow but the second half made it worth the effort.” “The research is impressive but the author’s voice gets in the way.” “I don’t think this is a ‘good’ book by any conventional measure, but it did something to me that I’m still processing.”

These reviews are infinitely more useful than any star rating because they give you information you can actually act on. If a reviewer says the pacing is slow, and you know you don’t mind slow pacing, that’s useful. If they say the characters feel underdeveloped, and complex characters are your top priority, that’s useful too. A three-star rating tells you almost nothing. A thoughtful paragraph tells you whether this particular book is right for you as a particular reader.

The irony is that most people look at the number and skip the review. The aggregated rating, that single number at the top of the page, gets more attention than the thousands of nuanced reviews below it. We’ve trained ourselves to make decisions based on a metric that we all know is unreliable, because it’s faster than reading actual assessments.

The Author’s Dilemma

I have some sympathy for authors on this, having seen firsthand how ratings affect the books we publish. A bad average rating on Goodreads or Amazon can tank a book’s discoverability. Amazon’s algorithms factor in ratings when deciding which books to recommend. Libraries use Goodreads data when making acquisition decisions. Bookstores check ratings before deciding whether to stock a title. The number matters commercially even if it doesn’t matter intellectually.

This creates perverse incentives. Authors who want to protect their ratings are tempted to write safer, more broadly appealing books rather than challenging or experimental ones. A novel that takes real artistic risks is more likely to polarize readers, generating a mix of one-star and five-star ratings that averages out to a mediocre 3.2. A competent but unambitious novel that offends nobody might cruise to a comfortable 3.9. The rating system, in this way, actually discourages the kind of bold, distinctive writing that readers claim to want.
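The arithmetic behind that penalty is easy to make concrete. Here is a toy sketch (the rating distributions are invented for illustration, not real data for any book): a polarizing novel and an inoffensive one can end up with very different averages even though the polarizing one has far more passionate fans.

```python
from statistics import mean, stdev

# Hypothetical rating distributions, invented for illustration.
# A risky, polarizing novel: readers either love it or bounce off it.
risky = [1, 1, 1, 5, 5, 5, 5, 1, 5, 3]
# A competent, unambitious novel: nobody is upset, nobody is thrilled.
safe = [4, 4, 4, 4, 4, 4, 4, 3, 4, 4]

print(mean(risky))   # 3.2 — reads as "mediocre" on the compressed scale
print(mean(safe))    # 3.9 — reads as "good"

# The averages hide the shape of the response: the risky book's
# ratings are spread wide (love-it/hate-it), the safe book's are
# clustered tightly, so only the second average is representative.
print(stdev(risky))  # large spread
print(stdev(safe))   # small spread
```

The averaged number rewards the book whose distribution is narrow, not the book that anyone loved most.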

I’ve talked to authors who check their Goodreads rating daily, who spiral when a new one-star review appears, who alter their writing process based on feedback from anonymous strangers on the internet. This is unhealthy and artistically corrosive, but it’s understandable when your livelihood depends on a number that strangers control.

What Would Be Better

I don’t think we should abolish book ratings. The desire to evaluate and recommend books is natural and healthy, and some form of rating system helps readers navigate a market that publishes over a million new titles a year. But I think we need to be more honest about what ratings can and can’t do, and I think there are some improvements that would help.

First, platforms could encourage more contextual ratings. Instead of just a number, prompt the reader to specify what kind of reader they are and what they were looking for. A romance reader rating a literary novel and a literary fiction reader rating the same book are providing fundamentally different information, and the platform should acknowledge that.

Second, ratings could be time-stamped relative to the reading experience. A rating given within 24 hours of finishing a book could be labeled “immediate reaction.” A rating updated months later could be labeled “lasting impression.” Both are valid, but they measure different things, and readers should be able to filter accordingly.

Third, and most ambitiously, we could move away from composite ratings toward something more like a taste profile. Instead of asking “how good is this book on a scale of one to five,” ask “what kind of book is this?” Is it plot-driven or character-driven? Is the prose elaborate or plain? Is the pacing fast or slow? Is the tone serious or playful? These descriptive dimensions would let readers find books that match their preferences far more effectively than a single averaged number.
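To make the idea concrete, here is a minimal sketch of what a taste profile could look like in code. All the field names and the matching rule are my own invention for illustration; no real platform uses this schema.

```python
from dataclasses import dataclass

# A hypothetical "taste profile": each dimension is a 0.0-1.0 slider
# rather than a quality judgment. Field names are invented for this sketch.
@dataclass
class BookProfile:
    plot_driven: float   # 0.0 = character-driven, 1.0 = plot-driven
    prose_plain: float   # 0.0 = elaborate prose, 1.0 = plain prose
    pacing_fast: float   # 0.0 = slow, 1.0 = fast
    tone_playful: float  # 0.0 = serious, 1.0 = playful

def match_score(book: BookProfile, reader: BookProfile) -> float:
    """How closely a book's profile fits a reader's preferences (1.0 = perfect)."""
    dims = ("plot_driven", "prose_plain", "pacing_fast", "tone_playful")
    gap = sum(abs(getattr(book, d) - getattr(reader, d)) for d in dims)
    return 1.0 - gap / len(dims)

slow_literary = BookProfile(plot_driven=0.2, prose_plain=0.1,
                            pacing_fast=0.2, tone_playful=0.3)
thriller_fan  = BookProfile(plot_driven=0.9, prose_plain=0.8,
                            pacing_fast=0.9, tone_playful=0.5)

print(match_score(slow_literary, thriller_fan))  # low match: not this reader's book
```

Notice that nothing here says whether the book is good. A slow, character-driven literary novel scores a poor match for a thriller fan and a strong match for a reader whose profile looks like the book's, which is exactly the information a single averaged star rating throws away.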

The StoryGraph app is moving in this direction, offering mood tags and pacing descriptions alongside traditional ratings. I think they’re on to something. The future of book recommendation isn’t a better number. It’s a richer description.

In the meantime, my advice is simple: read reviews, not ratings. Find reviewers whose taste aligns with yours and pay attention to what they say, not what number they assign. The words matter more than the stars. They always have.

And if you’re looking for your next read, skip the algorithms and browse our catalog directly. We’d rather you chose a book based on its description, its first pages, and your own gut feeling than on a number between one and five.
