One of the more exciting and public projects we've been working on lately has finally come to light - our Supreme Court prediction project with Dan Katz and Josh Blackman. This project is exactly what you'd expect - a framework for predicting Supreme Court decisions, though one meant to span the Court's entire history, unlike previous projects.
Let's imagine you have a sentence of interest. You'd like to find all occurrences of this sentence within a corpus of text. How would you go about this? The most obvious answer is to look for exact matches of the sentence. You'd search through every sentence of your corpus, checking to see if every character of the
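The exact-match baseline described above can be sketched in a few lines of Python. This is a minimal illustration, not the post's actual code; the corpus here is assumed to already be split into sentences, and the helper name is hypothetical.

```python
def find_exact_matches(sentence, corpus_sentences):
    """Return the indices of corpus sentences that exactly match the target.

    This is the naive exact-match approach: compare the query sentence
    character-for-character against every sentence in the corpus.
    """
    return [i for i, s in enumerate(corpus_sentences) if s == sentence]


# Hypothetical mini-corpus for illustration
corpus = [
    "The quick brown fox jumps over the lazy dog.",
    "A completely different sentence.",
    "The quick brown fox jumps over the lazy dog.",
]

matches = find_exact_matches("The quick brown fox jumps over the lazy dog.", corpus)
# matches -> [0, 2]
```

Exact matching is fast and simple, but as the post goes on to discuss, it is brittle: a single changed character means no match at all.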
Isotonic regression is a great tool to keep in your repertoire; it's like weighted least-squares with a monotonicity constraint. Why is this so useful, you ask? Take a look at the example relationship below. (You can follow along with the Python code here). Let's imagine that the true relationship between x and y is characterized piecewise by a sharp
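To make the "weighted least-squares with a monotonicity constraint" idea concrete, here is a self-contained sketch of the classic Pool Adjacent Violators Algorithm (PAVA), which solves the isotonic regression problem. This is an illustrative implementation written for this excerpt, not the code linked in the post (which you should follow along with for the full example).

```python
def isotonic_regression(y, w=None):
    """Fit a non-decreasing sequence minimizing the weighted squared
    error to y, via the Pool Adjacent Violators Algorithm.

    Whenever two adjacent blocks violate monotonicity, they are merged
    and replaced by their weighted mean.
    """
    if w is None:
        w = [1.0] * len(y)

    # Each block is [weighted mean, total weight, count of points pooled]
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backwards while the previous block's mean exceeds this one's
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])

    # Expand the pooled blocks back to a full-length fitted sequence
    fit = []
    for m, _, c in blocks:
        fit.extend([m] * c)
    return fit


fitted = isotonic_regression([1, 3, 2, 4])
# fitted -> [1.0, 2.5, 2.5, 4.0]: the violating pair (3, 2) is pooled to 2.5
```

In practice you would reach for `sklearn.isotonic.IsotonicRegression`, which implements the same idea; the point here is just that the monotone fit is a sequence of pooled weighted means.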
Thanks to Sam Arbesman (@arbesman) for featuring Dan's and my paper, Measuring the Complexity of the Law: The United States Code, on his excellent Wired Science blog, the Social Dimension. You can read the article here, and, as a reminder, all of the code and data from the paper is available in this github repository. Abstract
Benchmarking I/O with Oracle ORION is an important part of planning, baselining, and performance-tuning Oracle environments. I've previously provided ORION results for the hi1.4xlarge SSD-backed instance class, and based on some recent work, I wanted to provide an update for the newer hs1.8xlarge instance class. Below you'll find hs1.8xlarge Oracle ORION benchmark results with the following
Four years ago, Dan Katz and I began working on a project to measure the complexity of the law. Its genesis was, in every sense, an accident; in order to properly identify citations to the IRC in our VTR empirical review of U.S. Tax Court decisions, we had to deal with the informal, non-Blue
After a nice twitter conversation this morning, I finally got the impetus to release the source for my Congressional Bill Statistics data. You can find the source at this Github repository. I haven't taken the time to review licensing yet, but I won't be asserting anything more than CC3 Attribution on my code.
While Tsipras and his Syriza coalition have been busy in the Greek parliament, the Internet has been abuzz with speculation that their platform will result in a Greek exit from the Euro currency. This prospect, affectionately dubbed "Grexit" by Citi in February, has been making the rounds on Twitter under the hashtag #grexit. We think the
Let's say you've used my Python script to automate the download of a hashtag or search phrase from Twitter (in a Unicode-safe way, unlike within R). Now let's say you want to visualize the number of tweets over time. Easy enough - I've also shared this R/ggplot2 code that accomplishes the task. However, let's say
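The core of "number of tweets over time" is just bucketing timestamps before plotting. Here is a minimal Python sketch of that bucketing step (the post's actual plotting code is the linked R/ggplot2 script; the function name and sample timestamps below are hypothetical).

```python
from collections import Counter
from datetime import datetime


def tweets_per_hour(timestamps):
    """Bucket ISO-format tweet timestamps into hourly counts.

    Each timestamp is truncated to its hour, then counted; the result
    is a dict of 'YYYY-MM-DD HH:00' -> tweet count, sorted by hour.
    """
    counts = Counter(
        datetime.fromisoformat(ts).strftime("%Y-%m-%d %H:00") for ts in timestamps
    )
    return dict(sorted(counts.items()))


# Hypothetical sample of downloaded tweet timestamps
sample = [
    "2013-02-01T09:15:00",
    "2013-02-01T09:45:00",
    "2013-02-01T10:05:00",
]

hourly = tweets_per_hour(sample)
# hourly -> {"2013-02-01 09:00": 2, "2013-02-01 10:00": 1}
```

Once you have these counts, feeding them to ggplot2 (or matplotlib) as an hour-by-count series is straightforward.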
In my last post on CloudSearch and eDiscovery, I described something like “Google” for eDiscovery emails. FedEx or DropBox your data to an eDiscovery service provider like myself, and rest assured that you’ll soon have a powerful, web-based user interface for searching and visualizing your digital discovery materials. As a technical follow-up to