Nov 14, 2015 permalink

Fun with iMessage

A friend of mine in a group chat was wondering how long people would go between talking or who talked the most often. I became curious and after some quick stackoverflowing I found that you already have all of the data you need locally if you use Messages on OS X. As of today on El Capitan, you can easily hop into the database with sqlite3 ~/Library/Messages/chat.db.

Note: This only goes back as far as you used Messages on OS X. To get all of your messages from your phone, back it up to your computer and find the sqlite database in ~/Library/Application\ Support/MobileSync/Backup/ (Thanks, Wired!). You'll find that there are just a mess of files with no names to help you; picking a string from your OS X copy of the Messages db and grepping for it in the backup seems to work.

The only tables I ended up caring about to answer this question were message and handle. I’ll talk a bit more about how to generate this in a moment, but for now, we can look at Figure 1 and see what the final product is.

Figure 1. Messages per day by participant. Figure 1

From Figure 1 we can extrapolate into the future and see that Pat will need to spend every single waking second sending texts to this group and I’m probably dead, can someone check on me?

Note: It's always ok to extrapolate from two data points.

There aren’t that many steps to getting that plot put together. The first step will be to find which “room” your chats have been taking place in. As far as I can tell, each message is connected to a room via the cache_roomnames column. Your chat’s roomname will be something along the lines of chatNNNNNNNNNNNNNNNNNN, where the N’s are obviously numbers. I found the room I wanted by getting the last messages I received and then matching up a message that was from the room to its roomname.

SELECT cache_roomnames, text
FROM message
WHERE cache_roomnames != ''
ORDER BY date DESC LIMIT 10;

Once you have your desired room, you can get every message in it that your heart desires. I found that pulling out just a few key columns got me what I wanted, and joining the handle_id column to the handle table gets you a phone number for each participant.

SELECT M.handle_id,
       H.id,
       strftime('%s', M.date + strftime('%s', '2001-01-01 00:00:00'), 'unixepoch', 'localtime'),
       M.text
FROM message AS M
LEFT OUTER JOIN handle AS H ON M.handle_id = H.rowid
WHERE M.cache_roomnames = 'chatNNNNNNNNNNNNNNNNNN'
ORDER BY date;

That ‘2001-01-01 00:00:00’ is necessary because iMessage starts counting time from 2001 for some reason I’ll never understand. What you want to do at this point is up to you, but I output it as a csv and go to work from there in Python. One note to keep in mind, messages will occasionally have newlines in them, so you may want to define a -newline of your own in your csv output. This will make it harder to read normally, but fixes that issue. I’m almost certain there’s a better way of solving it, and if you know one, let me know!

From there it’s just some simple scripting to make a histogram of posts per user and plot it however you like. I have a manual phone-number human-name dictionary in my code. I’m not sure if there’s a way to automate this with wherever Contacts stores its data, but perhaps that’s a post for another day.

This got me curious, what does my history look like across everyone I’ve talked to? Well, here you go:

Figure 2. Messages per week per person. Figure 2

In Figure 2 I plot the messages per week from anyone who sent me over 150 messages in a week at least once since I’ve had the phone (arbitrary threshold to keep plot clean). With the exception of Person E and Person F, the top texter during any period of time was the person I was dating at that time. I’ve not revealed identities here to protect the innocent.

Over the course of a normal week, when are people texting me? We can pretty much steal Github’s punchcard visualization and reproduce it Figure 3 with texts instead of commits.

Figure 3. Times when texts are happening. Figure 3