## Advanced approximate sentence matching in Python

In our last post, we went over a range of options for performing approximate sentence matching in Python, an important step in many natural language processing and machine learning tasks. To begin, we defined a few terms. Tokens: a word, number, or other "discrete" unit of text. Stems: words that have had their "inflected" pieces removed based on
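As a rough illustration of how tokens and stems feed into approximate matching, here is a minimal standard-library sketch. The regex tokenizer, the `crude_stem` suffix list, and the Jaccard-similarity comparison are all hypothetical choices made for illustration; `crude_stem` is a deliberately naive suffix stripper, not the Porter or Snowball stemmer a real pipeline would use.

```python
import re

# Hypothetical toy tokenizer: a "token" is any run of letters or digits.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+")

# Hypothetical suffix list for a crude stemmer (NOT Porter or Snowball).
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Split text into lowercase word/number tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def crude_stem(token):
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def stem_set(text):
    """Return the set of crude stems appearing in a sentence."""
    return {crude_stem(t) for t in tokenize(text)}


def jaccard(a, b):
    """Jaccard similarity between the stem sets of two sentences."""
    sa, sb = stem_set(a), stem_set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)
```

With stemming, "The cat jumps" and "The cats jumped" reduce to the same stem set, so `jaccard` scores them 1.0 even though an exact string comparison would fail.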

## Fuzzy match sentences in Python

Let's imagine you have a sentence of interest.  You'd like to find all occurrences of this sentence within a corpus of text.  How would you go about this? The most obvious answer is to look for exact matches of the sentence.  You'd search through every sentence of your corpus, checking to see if every character of the
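A minimal sketch of that exact-match baseline next to a simple fuzzy alternative, using the standard library's `difflib.SequenceMatcher`. The function names and the 0.9 similarity threshold are hypothetical choices for illustration, not the approach taken in the post.

```python
from difflib import SequenceMatcher


def exact_matches(query, corpus):
    """Return indices of corpus sentences identical to the query, character for character."""
    return [i for i, sentence in enumerate(corpus) if sentence == query]


def fuzzy_matches(query, corpus, threshold=0.9):
    """Return (index, score) pairs whose case-insensitive SequenceMatcher ratio >= threshold.

    The default threshold of 0.9 is an arbitrary, hypothetical choice.
    """
    results = []
    for i, sentence in enumerate(corpus):
        score = SequenceMatcher(None, query.lower(), sentence.lower()).ratio()
        if score >= threshold:
            results.append((i, score))
    return results
```

Against a corpus like `["The quick brown fox.", "The quick brown fox!", "A lazy dog."]`, the exact search finds only the first sentence, while the fuzzy search also accepts the second, which differs by a single punctuation mark.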

## Isotonic Regressions in scikit-learn

Isotonic regression is a great tool to keep in your repertoire; it's like weighted least-squares with a monotonicity constraint. Why is this so useful, you ask? Take a look at the example relationship below. (You can follow along with the Python code here.) Let's imagine that the true relationship between x and y is characterized piecewise by a sharp
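To make "weighted least-squares with a monotonicity constraint" concrete, here is a dependency-free sketch of the pool-adjacent-violators algorithm, the classic way to solve weighted isotonic regression. It assumes the responses are already sorted by x; scikit-learn's `sklearn.isotonic.IsotonicRegression` handles sorting, ties, and prediction for you. The name `isotonic_fit` is hypothetical.

```python
def isotonic_fit(y, w=None):
    """Weighted least-squares fit of a non-decreasing sequence (pool-adjacent-violators).

    Assumes y is already ordered by its x values; w defaults to equal weights.
    """
    if w is None:
        w = [1.0] * len(y)
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool adjacent blocks while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    # Expand each pooled block back to one fitted value per input point.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted
```

For example, `isotonic_fit([1, 3, 2, 4])` pools the out-of-order pair (3, 2) into their mean, giving `[1.0, 2.5, 2.5, 4.0]`; with weights, pooled values become weighted means instead.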

## Featured in Wired: Measuring the Complexity of the Law

Thanks to Sam Arbesman (@arbesman) for featuring Dan's and my paper, Measuring the Complexity of the Law: The United States Code, on his excellent Wired Science blog, the Social Dimension. You can read the article here, and, as a reminder, all of the code and data from the paper are available in this GitHub repository. Abstract

