Marking four years of sobriety by making charts about my Goodreads data on a Friday night
This will be a series of posts, not sure how many. You can find them all at this link.
On Friday night, I was talking with someone about this post, where I agree and disagree with Janan Ganesh’s take that reading new books is not a wise use of time. It’s snobby and elitist to be sure, but he’s probably not entirely wrong. Buzzy new books tend to disappoint, while classics that I think are outside my taste prove me wrong again and again.
Insomnia and a spreadsheet fetish led me to spend a chunk of Friday night dumping my Goodreads data into Excel and spending hours cleaning it up. Coincidentally: it was 4 years to the day since I quit drinking and channeled that time and energy into reading, so I have 4 full years of data, as a voracious and omnivorous reader. Just short of 900 books over that time. Until I looked at the data I didn’t even register the occasion. More on that later.
The data looks like hell when it comes out of Goodreads. It screws up publication dates — it uses month/day/year for modern books, but for lots of older stuff, it just dumps out month/year, which Excel interprets as month/day/current year, which makes for a task.1
It also doesn’t assign any basic categories such as genre or whether it’s fiction or nonfiction, so there was another task. 2 I don’t care much about how much literary fiction or biographies I read (yet, we’ll see), but basic fiction/nonfiction should bear some interesting data.
Lastly, I had some interesting data types I thought would generate insights, like rating differential (between my ratings and the Goodreads averages) and the age of the books I’m reading as I finish them. Those were easy to set up.
By noon on Saturday, I had the data in a usable format. It was time to try to remember how to use Excel formulas. It’s not hard but I’ve never done it enough to get good, so each time is like starting from zero.
I spent a couple of hours relearning COUNTIFS and AVERAGEIFS and data formatting, trying to logic my way around different ways to sort and parse the data. Eventually, I get to a place where I’m trying to troubleshoot formulas like this:
This was too much even for me.
Until now, I’d been thinking about using AI for some part of this but I wanted to do the original work myself before engaging it. I like messing around with data, and I figured ChatGPT would lead me down interesting paths but maybe not the ones I’d come up with myself.
But fixing a spreadsheet formula? Perfect use case. I started asking questions like this:
“On my spreadsheet, range E2:E892 is my rating for each book out of 5. Range K2:K.892 is the year I finished reading the book. What formula will determine the average rating for a book in 2024?”
And out comes this:
This was a game-changer. Not only near instant and 100% accurate but often more elegant than whatever brute-force-and-ignorance I would have come up with. I went from expecting hours of logic problems and formula troubleshooting to 15 minutes of copy-and-paste and I was in business. It culminated in this table:
Now we’re getting somewhere. Time to make some charts.
Once again, I started off fighting with charts in Excel, and I spent a little while trying to make those work. Once again, I got frustrated with my lack of success. Once again, AI beckoned. The table above was simple enough that I could copy and paste it into chatGPT, but I quickly ran out of free time on it. I’ve never used Claude.ai before, so I figured it was a good opportunity for a bake-off.
Claude immediately started spitting out code for charts, and once I asked it to output code that I could copy and paste to my wordpress site, it got unbelievably easy, and impressively fast.
Until it started lying.
MORE TOMORROW.
1turns out that if I’d exported the .csv it might have been easier, rather than copying the table. Didn’t think of that until later. No difference in the long run, but would have saved me an hour or so. (back)
2this is true no matter what you do. I have shelves and stuff set up in goodreads, and no faith that they’re accurate. (back)