Books vs. Charts – Part Two


Using AI-generated graphs to see if we can draw some conclusions.

This is the second post; you can see all the books vs. charts posts collected at this link.

Yesterday I wrote about the genesis of the Me vs. Janan Ganesh project and how I prepared the data for crunching by me and my friends, Claude and ChatGPT. I’d managed to start generating charts that I could use to test his (and my) theses. This post contains the charts.

I started off simple[1]. Here are books over time, separated by fiction and nonfiction:

I read a lot! Every year it happens to be about 2:1 in favour of fiction. That’s not deliberate, but the consistency is interesting. 581 fiction books, 310 nonfiction. Certainly a large enough sample size to reveal some trends.

Next question: how old are these books? Here’s the ‘age’ of each book at the time that I finished reading it:

Again, not terribly surprising: I read a lot more new books than old ones. Two thirds of the books I read are less than 10 years old at the time I finish them, and the number of books decreases as the age increases. I imagine this is true of most recreational readers.
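For anyone who wants to reproduce the 'age' figure, here is a minimal sketch of the arithmetic, not the code Claude actually gave me. It assumes each book has a publication year and a finish date; the sample rows, the `books` list and the `age_at_finish` helper are all made up for illustration.

```python
from datetime import date

# Hypothetical records: (title, year first published, date finished).
# These rows are invented for illustration, not taken from my spreadsheet.
books = [
    ("Example Novel A", 2021, date(2023, 5, 1)),
    ("Example Novel B", 1933, date(2024, 2, 10)),
    ("Example Essay C", 2016, date(2024, 8, 3)),
]


def age_at_finish(year_published, finished):
    """Age of a book, in whole years, in the year I finished reading it."""
    return finished.year - year_published


ages = [age_at_finish(year, finished) for _, year, finished in books]

# Share of books that were less than 10 years old when finished.
share_under_10 = sum(age < 10 for age in ages) / len(ages)

print(ages)
print(f"{share_under_10:.0%} of books were under 10 years old")
```

Working in whole years keeps things simple; with only a publication year to go on, anything finer would be false precision.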

The data also supported my sense that I’m more likely to read old fiction than old nonfiction:

So, to Ganesh’s point: do I like old books more than new books? Star ratings aren’t the best judge of lasting impact, but they’re what I’ve got. Turns out, yes, absolutely I like older books more, no question:

If I break it down between fiction and nonfiction, it’s a much different story though. I like new nonfiction books a lot. This doesn’t surprise me at all, but it’s interesting to see it in colour:

Bear in mind the sample size of ‘nonfiction books over 50’ is exactly three[2]

Next let’s look at rating differential: the books where my rating is most divergent from the Goodreads rating. It would be interesting to see how Goodreads ratings change over time. Maybe I could run this again next year with revised Goodreads consensus ratings and see which differentials have changed the most.
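The differential itself is simple arithmetic. Here is a rough sketch, assuming it is calculated as my star rating minus the Goodreads average (the sign convention is my assumption, and the rows and names below are invented for illustration):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (title, year read, my rating, Goodreads average).
rows = [
    ("Example A", 2022, 2, 4.1),
    ("Example B", 2022, 4, 3.9),
    ("Example C", 2024, 5, 4.0),
    ("Example D", 2024, 3, 4.0),
]


def differential(my_rating, goodreads_avg):
    """Assumed convention: negative means I was harsher than Goodreads."""
    return my_rating - goodreads_avg


# Books ranked by how far my rating sits from the consensus.
most_divergent = sorted(
    rows, key=lambda r: abs(differential(r[2], r[3])), reverse=True
)

# Average differential per year read, to see whether I'm drifting
# towards or away from the consensus over time.
by_year = defaultdict(list)
for _, year, mine, avg in rows:
    by_year[year].append(differential(mine, avg))
yearly_mean = {year: mean(diffs) for year, diffs in by_year.items()}

print([title for title, *_ in most_divergent])
print(yearly_mean)
```

A yearly mean near zero is what 'bang-on with consensus' would look like, though big positive and negative differentials can also cancel each other out, so the ranked list is worth a look as well.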

One thing that’s interesting is my rating differential – my sense is that I’m more critical than the average Goodreads user. According to the data, that was right until this year, where I’m bang-on with consensus:

What’s going on here? Am I becoming softer? Am I using better sources to find books? Am I getting better at picking books that I will like? Am I reading older books? I have found that it’s harder to find appealing books at mall bookstores, and I’m increasingly drawn to indies and specialty bookstores. I’m also more chill – my life has become much simpler in the past couple of years, so maybe that factors in.

The numbers don’t have any answers. The statistical analysis (done by ChatGPT, cuz that’s way outside my wheelhouse) says that there isn’t a meaningful correlation between the rating differential and the age of the book. Something called a Pearson correlation coefficient is 0.1484, which translates to a weak correlation, not enough to make any inferences from. The median and average age of the books I’m reading were both higher in 2022 than in 2024, so reading older books doesn’t explain this year’s shift either. So the data doesn’t give me any answers there.
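For the curious, a Pearson coefficient always falls between -1 and 1, with values near 0 meaning little or no linear relationship. Here is a minimal sketch of the textbook calculation. It is not necessarily what ChatGPT ran, and the `ages` and `diffs` numbers are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Textbook Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of the two series around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Spread of each series around its own mean.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: book age in years, and my rating differential.
ages = [2, 91, 8, 40, 15]
diffs = [-0.5, 0.4, 0.1, -0.2, 0.3]

r = pearson(ages, diffs)
print(f"r = {r:.4f}")
```

Newer versions of Python (3.10 and up) ship the same calculation as `statistics.correlation`, so the hand-rolled version is only there to show what is going on.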

Whatever the case, I’m enjoying what I’m reading more than ever, and I’m enjoying old books more than others. So by the numbers, Ganesh is right — I should only read old fiction and new, very focused, nonfiction.

TOMORROW: Counterpoint: anecdotes and gut instincts


[1] the charts aren’t finessed out of Claude – I copied and pasted the code, that’s all

[2] Confessions of an Advertising Man by David Ogilvy, Walden by Thoreau, and Down and Out in Paris and London by Orwell. Turns out I had misfiled a Bradbury book as nonfiction.
