Your company’s finances are the most quantified part of your business, so when it comes to due dil...
Built to Sell: Taxes
We’re going to be up-front with you here: it’s time to talk taxes. Don’t run away yet, unless ...
Built to Sell: HR
People are the metaphorical lifeblood of any company. Perhaps it’s no coincidence then that ...
Built to Sell: Legal
Entrepreneurs are often portrayed as rule-breaking rebels who have a great new idea to introduce to ...
Built to Sell: The Blueprints
Imagine your company five to ten years down the road: what does it look like? Do you see a well-run ...
Since our Last Episode
The Internet is awash in posts looking back on 2018, but over here at Bommarito Consulting, we decid...
Course material for Complex Systems 530 – Computer Modeling for Complex Systems
 This term, I’m teaching Complex Systems 530 – Computer Modeling for Complex Systems a...
Predicting the Supreme Court
 One of the more exciting and public projects we’ve been working on lately has finally come ...
Advanced approximate sentence matching in Python
In our last post, we went over a range of options to perform approximate sentence matching in Python...
Fuzzy match sentences in Python
Let’s imagine you have a sentence of interest. You’d like to find all occurrences of t...
Isotonic Regressions in scikit-learn
Isotonic regression is a great tool to keep in your repertoire; it’s like weighted least-squ...
Featured in Wired: Measuring the Complexity of the Law
Thanks to Sam Arbesman (@arbesman) for featuring Dan’s and my paper, Measuring the Complexity of the L...
Is there tax in the cloud?
Do you contract for IaaS, PaaS, or SaaS services like Amazon Web Services or SalesForce? Do you pr...
Benchmarking the new AWS Postgres RDS instances
Postgres has finally come to RDS! Want to know how these instances stack up on a naive pgbench tes...
Monitoring AWS VPC tunnels
While many Amazon Web Services resources with “state” have CloudWatch sensors available,...
Automating Oracle ORION I/O testing
Oracle ORION is a powerful tool for evaluating realistic OLTP and DSS/DW I/O performance. ORION sh...
AWS EC2 hs1.8xlarge Oracle ORION benchmark results
Benchmarking I/O with Oracle ORION is an important part of planning, baselining, and performance-tun...
ipython notebook for R: Quickstart for Ubuntu
If you’re like me, you love ipython notebook but often write R. RStudio’s integrated R...
Is the Tax Code the longest Title?
  Last week, I shared that Dan Katz and I had finally published a draft of our paper, Measuring t...
Measuring the Complexity of the Law: The U.S. Code
Four years ago, Dan Katz and I began working on a project to measure the complexity of the law. ...
Plotting average read and write operation size by ASM disk for Oracle
 Throughput, throughput, throughput – for many databases, this is the performance measure of...
Plotting Oracle RMAN backup durations with R
How long does your Oracle RMAN backup take to complete? How does this vary over time? Are the...
Revisiting text processing with R and Python
 Back in 2011, I covered the relative performance difference of the most popular libraries for tex...
Law’s Future from Finance’s Past: Recorded Talk from Reinvent Law Silicon Valley
Back in March, I posted the slides to my talk at the Silicon Valley Reinvent Law event – Law...
Generating SSH config from AWS hosts using boto
 As a consultant and advisor to many firms running on or investigating AWS, I find SSH host and ke...
Slides from ReInvent Law Silicon Valley Talk
Live from ReInvent Law Silicon Valley, where I gave an Ignite-style talk drawing an analogy to law...
Automating Oracle Database deployment with Amazon Web Services, fabric, and boto – SEMOP Talk, Feb 12, 2013
I’ll be giving a talk tonight on automated Oracle database deployment at the SouthEast Michig...
Git Repository for Congressional Bill Statistics
 After a nice twitter conversation this morning, I finally got the impetus to release the source f...
Connecting R to an Oracle database with RJDBC
In many circumstances, you might want to connect R directly to a database to store and retrieve data...
Retrieving the VIX term structure in R
Much of my time lately has gone into analyzing and trading products in the volatility complex. ...
Natural Language Processing and Machine Learning for e-Discovery – Slides from guest lecture at MSU College of Law
 Fellow Computational Legal Studies blogger and MSU law prof Dan Katz invited me to give an expert...
Oracle ORION I/O benchmark results for AWS EC2 hi1.4xlarge instance type
 In a typical Oracle database implementation, you’ll want to baseline or benchmark your stor...
Debugging parameter mismatch across RAC database instances with R, dba_hist, and gv$parameter
Did you find this post useful? Does your organization need Oracle services? We can help. Much...
Legal Informatics with AWS CloudSearch – Slides for tonight’s AWS Michigan meetup
 Tonight, Eric and I will be presenting back-to-back talks at the AWS Michigan meetup (hosted by t...
Wordcloud of the Healthcare/ACA (NFIB v. Sebelius) Opinion
Here’s a wordcloud of the NFIB et al. v. Sebelius et al. opinion. Very interesting coalition...
Wordcloud of the Arizona et al. v. United States opinion
Here’s one purely for fun – a wordcloud built from the Supreme Court’s opinion on ...
Summary of community detection algorithms in igraph 0.6
 Based on Launchpad traffic and mailing list responses, Gabor and Tamas will soon be releasing igr...
Building Python pandas from development source
 I first heard about Python pandas from a friend at RenTech or AQR in the early summer of last yea...
OTM in the Cloud: Hosting Architectures, Part 2 – 1+RDS
 In our last post on cloud architectures, I covered a simple one-node Oracle Transportation Manage...
Grexit stage left: visualizing the online discussion around Greece’s possible Euro exit
 While Tsipras and his Syriza coalition have been busy in Greek parliament, the Internet has been ...
Oracle Transportation Management (OTM) in the Cloud: Hosting Architectures, Part I – Single Node
 As I mentioned on Monday, I’ll be starting a series of posts on Oracle Transportation Manag...
Upcoming Series: Oracle Transportation Management (OTM) in the Cloud
 I wanted to update readers to let them know of an upcoming post series on Oracle Transportation M...
Visualizing the #nonato Twitter hashtag – time series and top users
 The NATO summit is currently being held in Chicago, and, as is typical for NATO or G# summits, th...
Charting Twitter time series data with tweet and unique user counts
Let’s say you’ve used my Python script to automate the download of a hashtag or search p...
eDiscovery Consulting in the Cloud: Searching an Outlook mailbox and attachments
 You may have noticed that I keep talking about eDiscovery consulting and legal search in the clou...
Down with the static
After six years of static HTML, it finally became apparent that this site needed a real CMS. Pleas...
Generating AWS CloudSearch SDF for Emails
 In my last post on CloudSearch and eDiscovery, I described something like “Google” fo...
“Google” for subpoenaed emails: AWS CloudSearch for eDiscovery
 In the last post on AWS CloudSearch, I provided a tutorial on the creation of a simple CloudSear...
Building an AWS CloudSearch domain for the Supreme Court
 It should be pretty clear by now that two things I’m very interested in are cloud computing...
Visualization of Reading Level Frequency by Congressional Bill Stage
 Here’s a fun example of how you might use my data on Congressional bill length and complexi...
Updates to data and statistics on Congressional bill complexity
 When I put together my original post on the length and complexity of Congressional bills, I was h...
Installing AWS Cloud Search Command Line Tools
 In case you’re too lazy to dig into the full CloudSearch developer guide, here’s a qu...
Updated Michigan Compiled Laws (MCL) XML
Last August, I released an XML copy of the Michigan Compiled Laws (MCL). As an example of what ...
Statistics on the length and linguistic complexity of bills
 Where would you go to find out what the longest bill of the 112th Congress was by number of secti...
Hash collision attacks hose Oracle Transportation Management
Ars had a great write-up a few weeks ago about the huge hash resource starvation attack that was un...
Now hosted on EC2
 After a few days of configuration and testing, it’s official – bommaritollc.com and c...
Visual Summary of #jan25 Twitter Activity
 Last year, I covered a number of the so-called “Twitter protests” in China (#cn220), ...
Network of foreign key constraints in Oracle Transportation Management
 If you’ve ever worked under the hood with OTM, you’ll know that there’s some pr...
Debugging ORA-02292: integrity constraint (OWNER.CONSTRAINT) violated – child record found
Did you find this post useful? Does your organization need Oracle services? We can help. W...
Saving memory in redis and python with struct.pack
In redis, every object is either a binary-safe string or collection thereof. Even if you’...
Building, configuring, and benchmarking redis from Github source
 Part of blogging for myself is making notes about process that may or may not garner a wide audie...
Blogging for myself, not you.
In between writing the wrong year on documents, I’ve been reflecting on this blog. Specif...
Building Legal Language Explorer: Interactivity and drill-down, noSQL and SQL
  Dan and I recently released a new legal informatics project with a few colleagues. The project, ...
21st Century Legal Informatics: Part 1, Introduction
Dan and I have written and spoken on legal informatics many times. Inevitably the...
Single HTML File of the Michigan Compiled Law (MCL)
Last night, I posted a copy of the Michigan Compiled Law (MCL) as an improved and structured X...
XML Copy of the Michigan Compiled Law (MCL)
A few weeks ago, Ari Hershowitz posted on Quora calling for a California code hackat...
Building igraph’s Python bindings with plotting support on Ubuntu
igraph is my preferred library for graph manipulation. The core library is written in C and is...
Natty Narwhal on the Precision M4600
Since there are always questions of support for newly released models, I thought I’d put up a ...
More monitor fun with Natty Narwhal – rotating one screen
After struggling with Natty’s multiple monitor support last week, I thought I’d t...
Multiple Monitors on Natty Narwhal
Have you been infuriated by Natty Narwhal’s poor/broken support for multiple monitors (#1, #2,...
Electronic World Treaty Index and Tax Court Appendix
I’ve been busy lately with my new day job and wedding planning, but Dan and I still managed t...
Historical data mining the Supreme Court headnotes
Two weeks ago, I posted a pair of very rough working papers. The second of these, E...
Slides from my talk at the University of Houston, Law and Computation Workshop – Law ? Computation
I’ve uploaded the slides for my talk today at the University of Houston Computational Law Conf...
Two new papers on SSRN: Measuring EU integration through sovereign debt & Exploring relationships between headnotes in the Supreme Court
What do you do with that unfinished paper? You know, the one that’s 50% ...
Now in print: An Empirical Survey of the Population of U.S. Tax Court Written Decisions
When someone brings up the empirical study of legal citation, most people think of the w...
Building a better legal search engine, part 1: Searching the U.S. Code
As I mentioned last week, I’m excited to give a keynote in two weeks on Law and Co...
Upcoming post series: Building a better legal search engine
Later this month, I’ll be giving a keynote at a meeting on Law and Computation at ...
Deaths per TWh (terawatt-hour) by Energy Type
The chart says it all, with nuclear winning by two orders of magnitude (via ManyEyes). ...
#anon member outs himself through Facebook App ID
Oops. Looks like the #anon member in charge of developing the BoA leak site may have outed him...
A quick look at #march11 / #saudi tweets
Well, so much for that #march11 #Saudi day of rage. Whether it was really the "tempest in...
Marginal Revolution on ideological economist blindspots
Having spent more time than I’d like to recall in rooms with economists, political scientists,...
Kevin Kelly thinks Computational Legal Studies is Cool
Thanks to Sam Arbesman yesterday for pointing out that Kevin Kelly thinks the blog Dan and I run, Co...
Christoph Gohlke’s Windows Python Packages
Last night, I spent a few hours configuring a new OCZ Vertex 2 on my M4500, m...
Archiving Tweets with Python
Last week, I posted some R code that downloads the user and timestamp of tweets tha...
Dataset: Wisconsin Union Protester Tweets #wiunion
  I’ve been playing with Twitter data over the last week, archiving Algerian, Egyptian, Ira...
RescueTime: Really Cool Time Tracking
I created my RescueTime account in October 2009, installed the client, got confused, and promptly ig...
Plotting 3D Graphs with Python, igraph, and Cairo: #cn220 Example
Out of all the visuals I’ve produced, I think the "coolest" is the three...
Snowmageddon II: February 21, 2011
Remember that last Snowmageddon? This one was pretty good too....
Tracking the Frequency of Twitter Hashtags with R
I’ve posted three examples of Twitter hashtags datasets in the last week: one on China, ...
Dataset: Tweets from the Chinese Protests #cn220
Earlier this week, I posted a ~100k tweet dataset on the #25bahman protests in Iran. The corre...
R Bloggers: The Site I Wish Existed in 2007
My first experience with R was in 2007 as a sophomore in undergrad. As part of a l...
Most Contacted HBGary Emails and Domains
 You may have heard about the recently leaked presentation on combating Wikileaks that was produc...
LGA – DTW: Approach to NYC, 02/16
Just got back from a few days of business in New York. Here’s a picture of t...
OCR’d Exhibit A – List of Closing Stores from Borders Chapter 11 Filing
Here’s an OCR’d version of the list of Borders stores that are closing. ...
Pre-processing text: R/tm vs. python/NLTK
Let’s say that you want to take a set of documents and apply a computational lingu...
Dataset: 5 Days of #25bahman
What do 88,831 tweets about protest and revolution in Iran look like? Following in the success of ...
Plotting a Revolution: Time Series Comparison of #feb12 vs. #fev12
I wondered yesterday whether one of the Algeria/Yemen hashtags would dominate. In order ...
DNS-Based Internet Censorship and IPv6
I’ve been watching the #feb12 tweets on the movements in Algeria and Yemen. One of...
Paper: Quantifying and Modeling Long-Range Cross-Correlations in Multiple Time Series with Applications to World Stock Indices
Here’s another econophysics paper from H. Eugene Stanley and crew: D. Wang, B. Podobnik,...
Twitter Hashtag Battle Royale – #(feb|fev)12 vs. #12(feb|fev)
Algeria and Yemen seem to be pushing a #feb12 revolution hashtag like Egypt’s #jan25 tag. In t...
Nixon in China and reusable phrases
I’m not really big on modern opera, but I just caught the first act of Nixon in China on Siriu...
What’s an instruction anyway?
I really like this Wired article and the underlying study in Science because the research brings com...
First Post
Last month, I decided to take a leave of absence from my Ph.D. program at Michigan. I’ll...