Pre-processing text: R/tm vs. Python/NLTK

Let's say that you want to take a set of documents and apply a computational linguistic technique. If your method is based on the bag-of-words model, you probably need to pre-process these documents first by segmenting, tokenizing, stripping, stopwording, and stemming each one (phew, that's a lot of -ing's). In the past, I've relied

By |2011-02-16T10:12:07-05:00February 16th, 2011|Programming|15 Comments
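For readers who want a concrete picture of the pipeline named in the excerpt above (segment, tokenize, strip, stopword, stem), here is a minimal sketch in plain Python. It is not the tm or NLTK code from the post. The stopword list, the suffix list, and every function name are assumptions made up for illustration, and the suffix-stripping stemmer is a crude stand-in for a real one such as Porter's.

```python
import re

# Hypothetical, deliberately tiny stopword list; real lists are much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "on"}

# Suffixes for a naive stemmer; a stand-in for a real algorithm such as Porter's.
SUFFIXES = ("ing", "ed", "es", "s")


def segment(text):
    """Split text into sentences on '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lowercase and keep alphabetic runs, which also strips punctuation and digits."""
    return re.findall(r"[a-z]+", sentence.lower())


def stem(token):
    """Strip the first matching suffix, leaving at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Return a flat list of stemmed, stopword-free tokens (a bag of words)."""
    tokens = []
    for sentence in segment(text):
        for token in tokenize(sentence):
            if token not in STOPWORDS:
                tokens.append(stem(token))
    return tokens


if __name__ == "__main__":
    print(preprocess("The cats are running. Dogs barked!"))
```

The naive stemmer over-strips in places ("running" becomes "runn"), which is one reason to reach for a tested library implementation in real work.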

Dataset: 5 Days of #25bahman

What do 88,831 tweets about protest and revolution in Iran look like? Following on the success of Egypt's #jan25 tag, protesters have piled onto the #25bahman tag to discuss Iran's own prospects for "revolution" (25 Bahman 1389 is the Solar Hijri date for February 14, 2011). Curious to analyze and compare these movements, I've started collecting

By |2011-02-15T14:14:17-05:00February 15th, 2011|Society, Technology|10 Comments

DNS-Based Internet Censorship and IPv6

I've been watching the #feb12 tweets on the movements in Algeria and Yemen. One of the most common types of tweets explains how to avoid the current censorship of Twitter, Facebook, and Google by directly entering IPs. This got me thinking: while it may be fairly easy to memorize and disseminate IPv4 addresses, what

By |2011-02-14T14:08:27-05:00February 14th, 2011|Society, Technology|0 Comments

Paper: Quantifying and Modeling Long-Range Cross-Correlations in Multiple Time Series with Applications to World Stock Indices

Here's another econophysics paper from H. Eugene Stanley and crew:  D. Wang, B. Podobnik, D. Horvatić, H. E. Stanley. Quantifying and Modeling Long-Range Cross-Correlations in Multiple Time Series with Applications to World Stock Indices.  In my opinion, the primary contribution of the paper isn't really their method.  The "global factor model" seems like the same

By |2011-02-14T09:10:14-05:00February 14th, 2011|Reading List, Research|0 Comments

Nixon in China and reusable phrases

I'm not really big on modern opera, but I just caught the first act of Nixon in China on Sirius/XM. According to Wikipedia, the opera's premiere was met with such glowing reviews as "Mr. Adams does for the arpeggio what McDonald's did for the hamburger." I'll be working hard to use that as an insult

By |2011-02-12T14:33:50-05:00February 12th, 2011|Society|0 Comments

What’s an instruction anyway?

I really like this Wired article and the underlying study in Science because the research brings computing and storage into context, both historically and with respect to types of devices. From the abstract, "In 2007, humankind was able to store 2.9 × 10^20 optimally compressed bytes, communicate almost 2 × 10^21 bytes, and carry

By |2011-02-12T11:55:54-05:00February 12th, 2011|Technology|0 Comments

First Post

Last month, I decided to take a leave of absence from my Ph.D. program at Michigan.  I'll be pursuing opportunities in the private sector, and it's very likely that I won't be returning to academia.  As a result, I've begun to transition my web presence off of University servers.  This domain is a big part

By |2011-02-11T17:53:32-05:00February 11th, 2011|Personal|0 Comments
