
//Mining Mailinator

General | 19. November, 2010

for those who are unfamiliar, mailinator is a site/service that accepts and temporarily stores any e-mail it receives.  that’s right, any e-mail received by the mailinator mail servers is delivered to the matching inbox, as if that inbox had always existed.  you simply choose whichever @mailinator.com e-mail address you like, regardless of whether it has been used in the past, and use it as if it were your own.  this makes it extremely useful when, for example, you are required to register for a site in order to download a white paper.  or perhaps you would prefer not to use your @gmail.com address when browsing myhotmatez.com.

while using the site recently, an idea came to me: if i were to mine data from mailinator inboxes, could i find anything interesting?  i was curious whether people used this service for legitimate reasons or if it was merely a dumping ground for spam.  what made this tempting is that mailinator lets you open any inbox by appending the username to the following url: http://www.mailinator.com/maildir.jsp?email=x (where x is the username).
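as a rough sketch (not the original script), building that url in python might look like the following.  the helper name inbox_url is made up for illustration, and the url pattern is simply the one quoted above, which the site may have changed since.

```python
from urllib.parse import urlencode

# url pattern quoted in the post; assumed, and possibly out of date
MAILDIR_URL = "http://www.mailinator.com/maildir.jsp"


def inbox_url(username: str) -> str:
    """Return the inbox listing url for one username (hypothetical helper)."""
    # urlencode escapes anything that is not safe inside a query string
    return MAILDIR_URL + "?" + urlencode({"email": username})


for name in ("bob", "sarah", "frank"):
    print(inbox_url(name))
```

running the username through urlencode rather than plain string concatenation keeps odd words in the list (spaces, ampersands) from breaking the query string.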

so, i fired up vi and began piecing together a python script to automate the discovery and reporting of messages across multiple inboxes.  the logic was simple:

  1. read in a list of words (i.e., usernames).
  2. for each word pull down a listing of the current e-mails for that inbox.
  3. for each e-mail pull out the ‘from:’ and ‘subject:’ fields.
  4. write this information to csv in the format [username, from:, subject:, url] (where url is a link to the full e-mail message).
  5. boom, review.
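the steps above can be sketched roughly as follows.  this is not the original mailinator.py: the function names (read_usernames, mine, fake_fetch) are invented for illustration, and the part that actually downloads and parses an inbox is left as a pluggable fetch_inbox callable, since the page format isn't described here.  the stand-in fetcher returns made-up data so the sketch runs offline.

```python
import csv
import io
from typing import Callable, Iterable, List, TextIO, Tuple

# (from, subject, url) for one message; url links to the full e-mail
Message = Tuple[str, str, str]


def read_usernames(handle: TextIO) -> List[str]:
    """Step 1: one username per line, blank lines skipped."""
    return [line.strip() for line in handle if line.strip()]


def mine(usernames: Iterable[str],
         fetch_inbox: Callable[[str], Iterable[Message]],
         out: TextIO) -> int:
    """Steps 2-4: list each inbox and write one csv row per message.

    Returns the number of message rows written (header excluded).
    """
    writer = csv.writer(out)
    writer.writerow(["username", "from", "subject", "url"])
    count = 0
    for username in usernames:
        for sender, subject, url in fetch_inbox(username):
            writer.writerow([username, sender, subject, url])
            count += 1
    return count


def fake_fetch(username: str) -> List[Message]:
    """Hypothetical stand-in for the real inbox scraper (made-up data)."""
    return [("news@example.com", "welcome, " + username,
             "http://example.com/msg/1")]


names = read_usernames(io.StringIO("bob\n\nsarah\n"))
report = io.StringIO()
total = mine(names, fake_fetch, report)
print(total, "messages")
print(report.getvalue())
```

using the csv module instead of joining strings with commas means a subject line that itself contains a comma gets quoted properly and still lands in one column.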

after completing the script, i began by slowly feeding it lists of 5 to 10 usernames — mostly names of actual people (e.g., bob, sarah, frank).  even though i was able to scrape hundreds of e-mails, 99% of them were easily identifiable as spam.  i then proceeded to words that could be associated with a specific task someone was trying to accomplish (e.g., code, temp, password, exchange).  while still mostly spam, i managed to find a few interesting tidbits:

it looks as though lesley lupo was concerned about her wildlife points in zooworld.  ok, maybe this isn’t that interesting but i did find it odd that someone would register a mailinator e-mail address for a service (zooworld) that accepts payments — where they are actively purchasing goods.  the next one struck me as a little concerning:

you’ll notice i blurred sections of the e-mail out.  that’s because it contains Robert’s full contact information including the company he works for.  i can imagine that someone with a bit more time could easily modify the script to look for these types of sections within an e-mail.  jackpot.

i’m continuing to mine for this type of data using various sets of usernames, and i expect to post updates as i find more interesting information.  i’m publishing this now partly to gather ideas for improving the script, but mostly because i know that if i didn’t post it now, i never would.  here are a few ideas for next steps:

  • add logic to filter messages with subject lines including specific words (e.g., free, discount, xanax).
  • pull down and store the entire e-mail message locally (messages are regularly pruned on mailinator servers).
  • extract the originating mailserver ip to graph the distribution of messages by source.
  • tidy up my python code and define functions for each task.
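two of these ideas, the subject-line filter and the originating ip, could be sketched like this.  everything here is an assumption for illustration: the function names are invented, the sample message is made up, and the ip is taken from the earliest ‘Received:’ header, which a sender can forge, so treat the result as a hint rather than proof of origin.

```python
import re
from collections import Counter
from email import message_from_string
from typing import Iterable, Optional

SPAM_WORDS = ("free", "discount", "xanax")  # the example words from the list

# bracketed ipv4 address, as commonly written in Received headers
IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def looks_like_spam(subject: str, words: Iterable[str] = SPAM_WORDS) -> bool:
    """True if any listed word appears as a whole word, ignoring case."""
    tokens = set(re.findall(r"[a-z0-9]+", subject.lower()))
    return any(word in tokens for word in words)


def originating_ip(raw_message: str) -> Optional[str]:
    """Return the bracketed ipv4 from the earliest Received header, if any."""
    received = message_from_string(raw_message).get_all("Received") or []
    # each relay prepends its own header, so the last one is the earliest hop
    for header in reversed(received):
        match = IPV4.search(header)
        if match:
            return match.group(1)
    return None


# made-up message using documentation-reserved addresses
SAMPLE = (
    "Received: from relay.example.net ([203.0.113.7]) by mx.example.org;"
    " Fri, 19 Nov 2010 10:00:00 +0000\n"
    "Received: from sender.example.com ([198.51.100.23]) by relay.example.net;"
    " Fri, 19 Nov 2010 09:59:58 +0000\n"
    "From: someone@example.com\n"
    "Subject: FREE discount inside\n"
    "\n"
    "body\n"
)

subject = message_from_string(SAMPLE)["Subject"]
print(looks_like_spam(subject))

# tally messages per source ip, ready to feed into a graph
sources = Counter(ip for ip in map(originating_ip, [SAMPLE]) if ip)
print(sources.most_common())
```

matching whole words rather than substrings keeps a subject like "freedom rings" from tripping the "free" filter.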

/edit i forgot to mention that the file usernames are read from should be named ‘words.txt’ and sit in the same directory as mailinator.py.

Comments

  • dude, that’s awesome! kinda scary that site makes it so easy to read the inboxes =)

  • gremlin says:

    Updated your script to work with mailinator's new api/link calls.

    http://pastebin.com/U2K18jjv

    Thanks!
