Isotonic regression is a great tool to keep in your repertoire; it's like weighted least squares with a monotonicity constraint. Why is this so useful, you ask? Take a look at the example relationship below. (You can follow along with the Python code here.) Let's imagine that the true relationship between x and y is characterized piecewise by a sharp
Four years ago, Dan Katz and I began working on a project to measure the complexity of the law. Its genesis was, in every sense, an accident; in order to properly identify citations to the IRC in our VTR empirical review of U.S. Tax Court decisions, we had to deal with the informal, non-Blue
After a nice Twitter conversation this morning, I finally got the impetus to release the source for my Congressional Bill Statistics data. You can find the source at this GitHub repository. I haven't taken the time to review licensing yet, but I won't be asserting anything more than CC3 Attribution on my code.
Based on Launchpad traffic and mailing list responses, Gabor and Tamas will soon be releasing igraph 0.6. In celebration, I’ll be publishing a number of helpful lists and tables I’ve put together to organize information about igraph. In this post, we’ll cover the community detection algorithms (i.e., clustering, partitioning, segmenting) available in 0.6
I first heard about Python pandas from a friend at RenTech or AQR in the early summer of last year. At the time, the project was little more than a documentation page and a few wrapper methods around numpy. Since then, pandas has matured into a very useful and featureful addition to numpy, scipy,
While Tsipras and his Syriza coalition have been busy in the Greek parliament, the Internet has been abuzz with speculation that their platform will result in a Greek exit from the Euro currency. This prospect, affectionately dubbed "Grexit" by Citi in February, has been making the rounds on Twitter under the hashtag #grexit. We think the
The NATO summit is currently being held in Chicago, and, as is typical for NATO or G# summits, the streets and tweets are full of dissent. In the spirit of my past investigations of online dissent (#jan25, #25bahman, #12fev, #wiunion, #cn220, #march15), I thought I would investigate the #nonato tag, where Twitter users around
In the last post on AWS CloudSearch, I provided a tutorial on the creation of a simple CloudSearch domain for Supreme Court decisions. This walkthrough described the steps of creating a domain, configuring access policies and indexing, populating the index, and using the search API. We were left with a functioning case search database.
Here's a fun example of how you might use my data on Congressional bill length and complexity. Imagine you want to understand the empirical distribution of Flesch-Kincaid reading level for Congressional bills and how this distribution is related to bill stage. A first step might be to visualize this relationship. Based on this
When I put together my original post on the length and complexity of Congressional bills, I was hoping to build forward momentum on the project. The goal was to build a simple, sortable and searchable interface to explore and visualize the data. As usual, however, paying employers and consulting clients got in the way