Last week, I posted some R code that downloads the user and timestamp of tweets that contain a given hashtag going back as far as Twitter search will allow. As I noted in the post, the text of these tweets isn’t stored because of encoding issues with R and its JSON packages. A few people emailed asking
I've been playing with Twitter data over the last week, archiving Algerian, Egyptian, Iranian, and Chinese tweets. I thought I'd bring the story a little closer to home this time by archiving tweets from Wisconsin union protesters on the #wiunion tag. Grab the dataset of 165,593 tweets here, and check out the two figure
I created my RescueTime account in October 2009, installed the client, got confused, and promptly ignored its weekly email until last week. For one reason or another, I was compelled to log back in and check it out. After downloading the client and trying it out on my free account for a few hours, I
Out of all the visuals I’ve produced, I think the "coolest" is the three-dimensional U.S. Supreme Court citation network 1080p movie I produced with Dan Katz (close friend, coauthor, and newly minted law professor!). 3D networks, especially dynamic ones, really evoke the "wow" factor. Movies are especially important in dynamic cases too, since without the animation,
Remember that last Snowmageddon? This one was pretty good too.
I’ve posted three examples of Twitter hashtag datasets in the last week: one on China, one on Iran, and one on Algeria. In order to build these datasets, I needed to obtain older tweets; this is slightly more difficult than simply filtering the streaming feed for your hashtag of choice. The original code I wrote
Earlier this week, I posted a ~100k tweet dataset on the #25bahman protests in Iran. The corresponding figure of frequencies showed a strong presence on Twitter, with over 500 tweets per 5-minute period at peak. You can download the dataset or check out the figure in that post. I decided to take a quick
My first experience with R was in 2007 as a sophomore in undergrad. As part of a larger project on pricing day-ahead electricity futures, I wanted to cluster locational marginal price (LMP) data from the ISO-NE. Something like k-means is easy to plot and visualize in low dimensions, but this data was better approached by hierarchical methods.
You may have heard about the recently leaked presentation on combating Wikileaks that was produced by employees of HBGary Federal, Palantir Tech, and Berico Tech. You may have also heard that Anonymous retaliated against HBGary Federal for threatening to release their identities. I thought it would be interesting to run some analysis of the email networks
Just got back from a few days of business in New York. Here's a picture of the view on approach to LGA on Wednesday afternoon. I never check bags, which means I never bring my Canon EOS, but these stunning opportunities always make me wish I had a real camera.