Insights

Research, analysis, and perspectives from Bommarito Consulting on AI, governance, privacy, and computational modeling.

AI Governance · 2 min

AI Lifecycle and the Board's Role

A primer for board directors on the AI lifecycle — data collection, training, and deployment — and the strategic considerations boards must understand for effective AI oversight.

AI Governance · 2 min

AI Oversight: 5 Key Sources of Board Requirements

A framework identifying the five key sources of AI governance requirements for boards — legal mandates, risk frameworks, insurance, internal policies, and customer preferences.

AI Governance · 2 min

Risk Management for AI: A Board Director's Guide

A comprehensive guide for board directors on leading AI risk management through six key elements: establishing context, risk assessment, risk treatment, recording, communication, and continuous monitoring.

Open Source · 2 min

The KL3M Data Project: Building Copyright-Clean Training Data at Scale

Inside the KL3M Data Project — assembling 132M+ copyright-clean documents to enable responsible LLM training and earning the first 'Fairly Trained' certification.

AI Governance · 2 min

How Data Provenance Drives Machine Learning Risk and Value

An exploration of data provenance — the origin and history of data — and why it is a board-level concern for AI risk management, legal compliance, and responsible governance.

AI Governance · 2 min

Pioneering Responsible Data Science: A Framework for Ethical Innovation

Introducing an open-source Responsible Data Science Policy Framework designed to help organizations address ethical AI governance through a modular, adaptable approach.

Firm Update · 1 min

Since Our Last Episode: The Evolution of Bommarito Consulting

A retrospective on the firm's journey from early-stage consulting through the LexPredict era, and the pivot toward AI governance, privacy, and institutional advisory.

Entrepreneurship · 2 min

Built to Sell: Lessons from the LexPredict Journey

Key takeaways from building, scaling, and successfully exiting LexPredict — an AI-powered legal technology company acquired in 2018.

Research · 2 min

Predicting the Supreme Court: AI Meets Legal Outcomes

How our machine learning research achieved breakthrough results in predicting Supreme Court decisions, and what it means for the future of legal AI.

Research · 1 min

Course Material for Complex Systems 530 — Computer Modeling for Complex Systems

Open course material for Complex Systems 530 at the University of Michigan, covering agent-based, Monte Carlo, and network modeling in Python.

Research · 2 min

Predicting the Behavior of the Supreme Court of the United States: A General Approach

Introducing our Supreme Court prediction project using extremely randomized trees to forecast over sixty years of decisions and individual justice votes.

Research · 2 min

Featured in Wired: Measuring the Complexity of the Law

Sam Arbesman featured our paper on measuring legal complexity of the United States Code on his Wired Science blog, the Social Dimension.

Research · 2 min

Is the Tax Code the Longest Title?

An analysis of whether the Internal Revenue Code (Title 26) is actually the longest Title in the U.S. Code, using data from our paper on measuring legal complexity.

Research · 2 min

Measuring the Complexity of the Law: The U.S. Code

Releasing our empirical framework for measuring legal complexity, applied to the U.S. Code, with full replication source and data on GitHub.

Research · 2 min

Summary of Community Detection Algorithms in igraph 0.6

A reference guide to the community detection algorithms available in igraph 0.6, including their runtime complexity and support for directed and weighted edges.

Research · 3 min

Building an AWS CloudSearch Domain for the Supreme Court

A step-by-step tutorial for building a fully searchable AWS CloudSearch domain using public domain U.S. Supreme Court decisions, covering data acquisition, domain configuration, indexing, and querying.

Research · 1 min

Statistics on the Length and Linguistic Complexity of Bills

An introduction to an automated daily analysis of 112th Congress bills, providing statistics on section count, word count, unique words, and Flesch-Kincaid reading level for every bill.

Research · 4 min

Building Legal Language Explorer: Interactivity and Drill-Down, noSQL and SQL

A technical deep-dive into the architecture of the Legal Language Explorer, a Google Ngrams-style viewer for the U.S. Supreme Court corpus, combining Redis (NoSQL) for fast time series queries with PostgreSQL (SQL) for case-level drill-down.

Research · 2 min

21st Century Legal Informatics: Part 1, Introduction

An introduction to three paradigms of legal informatics — 20th century computers-as-libraries, 22nd century computers-as-lawyers, and the practical 21st century middle ground.

Research · 3 min

Historical Data Mining the Supreme Court Headnotes

A demonstration of historical data mining techniques using editorially assigned legal headnotes from U.S. Supreme Court cases, examining co-occurrence and citation networks to trace the evolution of legal concepts such as criminal suspect rights.

Research · 3 min

Two New Papers on SSRN: Measuring EU Integration Through Sovereign Debt & Exploring Relationships Between Headnotes in the Supreme Court

Announcing two working papers released on SSRN: one measuring European integration through sovereign bond yield correlations from 1872 to 2010, and another exploring the network structure of legal concepts in Supreme Court headnotes.

Research · 2 min

Now in Print: An Empirical Survey of the Population of U.S. Tax Court Written Decisions

Announcing the publication of an empirical analysis of U.S. Tax Court citation practices from 1990 to 2008 in the Virginia Tax Review, examining Internal Revenue Code citation frequency and the impact of tax legislation on court decisions.

Research · 2 min

Building a Better Legal Search Engine, Part 1: Searching the U.S. Code

An introduction to indexing and searching the U.S. Code using Apache Lucene, structured public domain data, and open source software.

Archive

Technical tutorials and programming posts from the early years of the firm.

Archive · 2 min

Advanced Approximate Sentence Matching in Python

Advanced techniques for approximate sentence matching in Python using Jaccard similarity on token, stem, and noun lemma sets after stopword removal.

Archive · 2 min

Fuzzy Match Sentences in Python

A tutorial on fuzzy matching sentences in Python using NLTK, covering tokenization, stopword removal, stemming, and lemmatization approaches.

Archive · 1 min

Isotonic Regressions in scikit-learn

A practical example and discussion of isotonic regression in Python's scikit-learn, including contributions to improve the IsotonicRegression class.

Archive · 2 min

Is There Tax in the Cloud?

A discussion of the U.S. tax implications of cloud computing transactions, highlighting Orly Mazur's SSRN paper on the challenges that IaaS, PaaS, and SaaS create for federal income tax principles.

Archive · 1 min

Law's Future from Finance's Past: Recorded Talk from Reinvent Law Silicon Valley

The recorded video of Michael Bommarito's talk at the Reinvent Law Silicon Valley event, drawing parallels between the evolution of finance and the future trajectory of the legal industry.

Archive · 2 min

Connecting R to an Oracle Database with RJDBC

A step-by-step guide to connecting R to an Oracle database using RJDBC, a cross-platform JDBC-based approach.

Archive · 2 min

eDiscovery Consulting in the Cloud: Searching an Outlook Mailbox and Attachments

A real-world case study demonstrating how to make Outlook PST mailboxes and their attachments searchable using AWS CloudSearch, processing 1.3 GB of Enron email data on a laptop in under an hour.

Archive · 2 min

“Google” for Subpoenaed Emails: AWS CloudSearch for eDiscovery

A practical look at using AWS CloudSearch to build a scalable, on-demand search engine for subpoenaed email in eDiscovery engagements, eliminating the need for large capital expenditures on servers and storage.

Archive · 2 min

Saving Memory in Redis and Python with struct.pack

Using Python's struct.pack to convert numbers from string to binary representation in Redis, achieving significant memory savings.

Archive · 1 min

Tracking the Frequency of Twitter Hashtags with R

An R script for downloading and plotting the frequency of Twitter hashtags over time using ggplot2.

Archive · 1 min

Plotting 3D Graphs with Python, igraph, and Cairo

A demonstration of how to generate 3D network animations using Python, igraph, and Cairo, applied to a Twitter user graph.
