Statistics on the length and linguistic complexity of bills

  Where would you go to find out what the longest bill of the 112th Congress was by number of sections (H.R. 1473)? How about by number of unique words (H.R. 3671)? What about by Flesch-Kincaid reading level (S. 475)?

  Head on over to this table of bills, updated daily for the 112th Congress, which contains the following fields:

  • Bill Name
  • Publish Date
  • Bill Title
  • Stage
  • Section Count
  • Sentence Count
  • Word (Token) Count
  • Unique Words (Tokens)
  • Unique Stem Count
  • Avg. Word Length
  • Avg. Sentence Length
  • Reading Level (Flesch-Kincaid)
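
For readers curious how fields like these might be computed, here is a minimal Python sketch. It is not the code behind the table: the function names `count_syllables` and `bill_stats`, the regex-based sentence and word splitting, and the vowel-group syllable heuristic are all assumptions made for illustration. The reading level uses the standard Flesch-Kincaid grade formula, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. Section and stem counts are left out, since they would need a bill parser and a stemmer.

```python
import re


def count_syllables(word):
    """Rough heuristic (an assumption): count vowel groups and
    discount a trailing silent 'e'. Real syllable counts will differ."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def bill_stats(text):
    """Compute simple length and complexity statistics for a block of text."""
    # Naive sentence split on ., ! and ?; abbreviations such as "H.R."
    # will be over-split, so treat the counts as approximate.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Tokens are runs of letters and apostrophes; numbers are ignored.
    tokens = re.findall(r"[A-Za-z']+", text)

    n_sent = len(sentences)
    n_tok = len(tokens)
    if n_sent == 0 or n_tok == 0:
        return None  # nothing to measure

    syllables = sum(count_syllables(t) for t in tokens)
    avg_sentence_length = n_tok / n_sent
    return {
        "sentence_count": n_sent,
        "word_count": n_tok,
        "unique_words": len({t.lower() for t in tokens}),
        "avg_word_length": sum(len(t) for t in tokens) / n_tok,
        "avg_sentence_length": avg_sentence_length,
        # Flesch-Kincaid grade level
        "reading_level": 0.39 * avg_sentence_length
        + 11.8 * (syllables / n_tok)
        - 15.59,
    }


if __name__ == "__main__":
    sample = "The Secretary shall submit a report. The report shall be public."
    for name, value in bill_stats(sample).items():
        print(name, value)
```

Because both the sentence splitter and the syllable counter are crude, the reading level from a sketch like this is best used to rank bills against each other rather than as an exact grade.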

I’ll be adding more automated analysis and figures over the next few weeks, but for now, here’s a morsel to get your gears turning.

CEO and Founder of Bommarito Consulting.

Posted in Law, Programming, Research
7 comments on “Statistics on the length and linguistic complexity of bills”
  1. Matt Barney says:

    It would be interesting to superimpose a distribution of the bills that pass, which could suggest whether a certain size is more persuasive than smaller or larger sizes.

    Matt

  2. Karen Suhaka says:

    Great stuff. Can’t wait to see more figures!

    I agree with Matt that it would be interesting to see if there’s a statistical difference between bills that passed, and those that didn’t.

    I’ve got all the bills for the states in XML and in text. Want to check them out? Compare to federal? Compare states to each other? Let me know and I’ll get you access.

    -k

  3. Tom says:

    This will be very interesting; I’m looking forward to future installments.

    I hope you’ll forgive a criticism on the formatting of the graph. It took me several seconds to decipher the meaning of the coloring. Upon realizing that you were just double-plotting the data, I frankly felt that you were wasting my time. There may be some good reasons to overlay a heat map on a histogram (or, alternatively, to use marginal plots), but double-plotting the same data surely isn’t one of them.

  4. Douglas Calvert says:

    Can you post the code you used for this? I would love to see how word count / reading level fits in with the new bill passage prognosis from govtrack…

1 Pings/Trackbacks for "Statistics on the length and linguistic complexity of bills"
  1. [...] bill complexity Posted on Sat, 2012-04-14 by Michael J Bommarito II  When I put together my original post on the length and complexity of Congressional bills, I was hoping to build forward momentum on the project.  The goal was to build a simple, sortable [...]
