Insights
Research, analysis, and perspectives from Bommarito Consulting on AI, governance, privacy, and computational modeling.
AI Lifecycle and the Board's Role
A primer for board directors on the AI lifecycle — data collection, training, and deployment — and the strategic considerations boards must understand for effective AI oversight.
AI Oversight: 5 Key Sources of Board Requirements
A framework identifying the five key sources of AI governance requirements for boards — legal mandates, risk frameworks, insurance, internal policies, and customer preferences.
Risk Management for AI: A Board Director's Guide
A comprehensive guide for board directors on leading AI risk management through six key elements: establishing context, risk assessment, risk treatment, recording, communication, and continuous monitoring.
The KL3M Data Project: Building Copyright-Clean Training Data at Scale
Inside the KL3M Data Project — assembling 132M+ copyright-clean documents to enable responsible LLM training and earning the first 'Fairly Trained' certification.
How Data Provenance Drives Machine Learning Risk and Value
An exploration of data provenance — the origin and history of data — and why it is a board-level concern for AI risk management, legal compliance, and responsible governance.
Pioneering Responsible Data Science: A Framework for Ethical Innovation
Introducing an open-source Responsible Data Science Policy Framework designed to help organizations address ethical AI governance through a modular, adaptable approach.
Since Our Last Episode: The Evolution of Bommarito Consulting
A retrospective on the firm's journey from early-stage consulting through the LexPredict era, and the pivot toward AI governance, privacy, and institutional advisory.
Built to Sell: Lessons from the LexPredict Journey
Key takeaways from building, scaling, and successfully exiting LexPredict — an AI-powered legal technology company acquired in 2018.
Predicting the Supreme Court: AI Meets Legal Outcomes
How our machine learning research achieved breakthrough results in predicting Supreme Court decisions, and what it means for the future of legal AI.
Course Material for Complex Systems 530 — Computer Modeling for Complex Systems
Open course material for Complex Systems 530 at the University of Michigan, covering agent-based, Monte Carlo, and network modeling in Python.
Predicting the Behavior of the Supreme Court of the United States: A General Approach
Introducing our Supreme Court prediction project using extremely randomized trees to forecast over sixty years of decisions and individual justice votes.
Featured in Wired: Measuring the Complexity of the Law
Sam Arbesman featured our paper on measuring legal complexity of the United States Code on his Wired Science blog, the Social Dimension.
Is the Tax Code the Longest Title?
An analysis of whether the Internal Revenue Code (Title 26) is actually the longest Title in the U.S. Code, using data from our paper on measuring legal complexity.
Measuring the Complexity of the Law: The U.S. Code
Releasing our empirical framework for measuring legal complexity, applied to the U.S. Code, with full replication source and data on GitHub.
Summary of Community Detection Algorithms in igraph 0.6
A reference guide to the community detection algorithms available in igraph 0.6, including their runtime complexity and support for directed and weighted edges.
Building an AWS CloudSearch Domain for the Supreme Court
A step-by-step tutorial for building a fully searchable AWS CloudSearch domain using public domain U.S. Supreme Court decisions, covering data acquisition, domain configuration, indexing, and querying.
Statistics on the Length and Linguistic Complexity of Bills
An introduction to an automated daily analysis of 112th Congress bills, providing statistics on section count, word count, unique words, and Flesch-Kincaid reading level for every bill.
Building Legal Language Explorer: Interactivity and Drill-Down, noSQL and SQL
A technical deep dive into the architecture of the Legal Language Explorer, a Google Ngrams-style viewer for the U.S. Supreme Court corpus, combining Redis (NoSQL) for fast time series queries with PostgreSQL for case-level drill-down.
21st Century Legal Informatics: Part 1, Introduction
An introduction to three paradigms of legal informatics: 20th-century computers-as-libraries, 22nd-century computers-as-lawyers, and the practical 21st-century middle ground.
Historical Data Mining the Supreme Court Headnotes
A demonstration of historical data mining techniques using editorially assigned legal headnotes from U.S. Supreme Court cases, examining co-occurrence and citation networks to trace the evolution of legal concepts like criminal suspect rights.
Two New Papers on SSRN: Measuring EU Integration Through Sovereign Debt & Exploring Relationships Between Headnotes in the Supreme Court
Announcing two working papers released on SSRN: one measuring European integration through sovereign bond yield correlations from 1872 to 2010, and another exploring the network structure of legal concepts in Supreme Court headnotes.
Now in Print: An Empirical Survey of the Population of U.S. Tax Court Written Decisions
Announcing the publication of an empirical analysis of U.S. Tax Court citation practices from 1990 to 2008 in the Virginia Tax Review, examining Internal Revenue Code citation frequency and the impact of tax legislation on court decisions.
Building a Better Legal Search Engine, Part 1: Searching the U.S. Code
An introduction to indexing and searching the U.S. Code using Apache Lucene, structured public domain data, and open source software.
Archive
Technical tutorials and programming posts from the early years of the firm.
Advanced Approximate Sentence Matching in Python
Advanced techniques for approximate sentence matching in Python using Jaccard similarity on token, stem, and noun lemma sets after stopword removal.
Fuzzy Match Sentences in Python
A tutorial on fuzzy matching sentences in Python using NLTK, covering tokenization, stopword removal, stemming, and lemmatization approaches.
Isotonic Regressions in scikit-learn
A practical example and discussion of isotonic regression in Python's scikit-learn, including contributions to improve the IsotonicRegression class.
Is There Tax in the Cloud?
A discussion of the U.S. tax implications of cloud computing transactions, highlighting Orly Mazur's SSRN paper on the challenges that IaaS, PaaS, and SaaS create for federal income tax principles.
Law's Future from Finance's Past: Recorded Talk from Reinvent Law Silicon Valley
The recorded video of Michael Bommarito's talk at the Reinvent Law Silicon Valley event, drawing parallels between the evolution of finance and the future trajectory of the legal industry.
Connecting R to an Oracle Database with RJDBC
A step-by-step guide to connecting R to an Oracle database using RJDBC, a cross-platform JDBC-based approach.
eDiscovery Consulting in the Cloud: Searching an Outlook Mailbox and Attachments
A real-world case study demonstrating how to make Outlook PST mailboxes and their attachments searchable using AWS CloudSearch, processing 1.3 GB of Enron email data on a laptop in under an hour.
“Google” for Subpoenaed Emails: AWS CloudSearch for eDiscovery
A practical look at using AWS CloudSearch to build a scalable, on-demand search engine for subpoenaed email in eDiscovery engagements, eliminating the need for large capital expenditures on servers and storage.
Saving Memory in Redis and Python with struct.pack
Using Python's struct.pack to convert numbers from string to binary representation in Redis, achieving significant memory savings.
Tracking the Frequency of Twitter Hashtags with R
An R script for downloading and plotting the frequency of Twitter hashtags over time using ggplot2.
Plotting 3D Graphs with Python, igraph, and Cairo
A demonstration of how to generate 3D network animations using Python, igraph, and Cairo, applied to a Twitter user graph.
Let's Work Together
We'd welcome the opportunity to discuss how we can help your organization navigate the intersection of technology, governance, and strategy.