Course material for Complex Systems 530 – Computer Modeling for Complex Systems

This term, I'm teaching Complex Systems 530 - Computer Modeling for Complex Systems at the University of Michigan Center for the Study of Complex Systems.  In the spirit of open science, all course material will be available online on GitHub.  You can browse the repository here: https://github.com/mjbommar/cscs-530-w2015.  In the course, we're exploring why and

Advanced approximate sentence matching in Python

In our last post, we went over a range of options to perform approximate sentence matching in Python, an important task in many natural language processing and machine learning applications.  To begin, we defined terms like: tokens: a word, number, or other "discrete" unit of text. stems: words that have had their "inflected" pieces removed based on

Fuzzy match sentences in Python

Let's imagine you have a sentence of interest.  You'd like to find all occurrences of this sentence within a corpus of text.  How would you go about this? The most obvious answer is to look for exact matches of the sentence.  You'd search through every sentence of your corpus, checking to see if every character of the

Isotonic Regressions in scikit-learn

Isotonic regression is a great tool to keep in your repertoire; it's like weighted least-squares with a monotonicity constraint.  Why is this so useful, you ask?  Take a look at the example relationship below. (You can follow along with the Python code here).  Let's imagine that the true relationship between x and y is characterized piece-wise by a sharp

Benchmarking the new AWS Postgres RDS instances

Postgres has finally come to RDS!  Want to know how these instances stack up on a naive pgbench test? Below is a table of simple pgbench results taken over a 60-second period.  All RDS instances were launched into us-east-1a, and pgbench (9.4devel) was run from an m2.4xlarge in us-east-1a as well. TPS t1.micro

Automating Oracle ORION I/O testing

Oracle ORION is a powerful tool for evaluating realistic OLTP and DSS/DW I/O performance.  ORION should be a part of every Oracle professional's toolkit for build QA and performance tuning.  I've previously used it on this blog to show the performance of Amazon's hi1.4xlarge SSD-backed instances and newer hs1.8xlarge instances with 117GB of RAM and 48TB of

AWS EC2 hs1.8xlarge Oracle ORION benchmark results

Benchmarking I/O with Oracle ORION is an important part of planning, baselining, and performance-tuning Oracle environments.  I've previously provided ORION results for the hi1.4xlarge SSD-backed instance class, and based on some recent work, I wanted to provide an update for the newer hs1.8xlarge instance class.  Below you'll find hs1.8xlarge Oracle ORION benchmark results with the following

ipython notebook for R: Quickstart for Ubuntu

If you're like me, you love ipython notebook but often write R.  RStudio's integrated RMarkdown is nice, but for some contexts like quick demos or basic training, a browser-based interface is unbeatable.  What if we could get the best of both worlds - an ipython notebook for R? The answer is rNotebook, and if you

Is the Tax Code the longest Title?

Last week, I shared that Dan Katz and I had finally published a draft of our paper, Measuring the Complexity of the Law: The U.S. Code.  We'd previewed this research on Computational Legal Studies years ago.  Since then, we've received great feedback and a number of questions.   The most common question, even among legal professionals,

Plotting average read and write operation size by ASM disk for Oracle

Throughput, throughput, throughput - for many databases, this is the performance measure of importance.  When you are working with a fixed number of IOPS but see mixed workload types, system health can be assessed through the average read and write operation size.  In an ASM environment, we can query this information by ASM disk

