One of the more exciting and public projects we’ve been working on lately has finally come ...
Fuzzy match sentences in Python
Let’s imagine you have a sentence of interest. You’d like to find all occurrences of t...
Isotonic Regressions in scikit-learn
Isotonic regression is a great tool to keep in your repertoire; it’s like weighted least-squ...
Featured in Wired: Measuring the Complexity of the Law
Thanks to Sam Arbesman (@arbesman) for featuring Dan’s and my paper, Measuring the Complexity of the L...
AWS EC2 hs1.8xlarge Oracle ORION benchmark results
Benchmarking I/O with Oracle ORION is an important part of planning, baselining, and performance-tun...
Measuring the Complexity of the Law: The U.S. Code
Four years ago, Dan Katz and I began working on a project to measure the complexity of the law...
Git Repository for Congressional Bill Statistics
After a nice Twitter conversation this morning, I finally got the impetus to release the source f...
Grexit stage left: visualizing the online discussion around Greece’s possible Euro exit
 While Tsipras and his Syriza coalition have been busy in Greek parliament, the Internet has been ...
Charting Twitter time series data with tweet and unique user counts
Let’s say you’ve used my Python script to automate the download of a hashtag or search p...
Generating AWS CloudSearch SDF for Emails
 In my last post on CloudSearch and eDiscovery, I described something like “Google” fo...