Sunday, April 22, 2012 @ 8:33 pm

Getting involved in the search on the web front was an experience in itself. I got involved after seeing how hard her friends and family were looking for her, and I did what I did in the hope that others would do the same for other people who need help, strangers or not. I definitely learned a lot about what information is vital in a missing persons case, and about server-load management.

I just wanted to share some technical aspects of the Find Michelle site and some lessons learned.

In one day, Friday, April 20, 2012, over 55 GB of data were transferred (first picture). This was shortly after major media outlets started linking to the site and many of her friends began tweeting the URL to celebrities (in one case, Rick Mercer retweeted it to his ~268,000 followers). The second picture shows the average hourly transfer rate: the spike came between 8pm and 9pm, at 80-100 Mbit/s.
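As a rough sanity check on those figures, here is a minimal Python sketch of the arithmetic. It assumes "gigs" means decimal gigabytes (10^9 bytes) and that the 80-100 Mbit/s rate was sustained for the whole 8pm-9pm hour; neither assumption is confirmed by the graphs, and the function names are made up for illustration.

```python
# Back-of-the-envelope bandwidth arithmetic for the numbers quoted above.
# Assumptions (not from the hosting graphs): 1 GB = 10**9 bytes, and the
# peak rate held steady for the full hour.

SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_HOUR = 60 * 60


def average_mbps(total_bytes: float, seconds: float) -> float:
    """Average throughput in megabits per second over the given period."""
    return total_bytes * 8 / seconds / 1_000_000


def gigabytes_at_rate(mbps: float, seconds: float) -> float:
    """Gigabytes moved if a rate in Mbit/s is sustained for `seconds`."""
    return mbps * 1_000_000 / 8 * seconds / 1_000_000_000


# 55 GB spread evenly over the whole of Friday.
daily_avg = average_mbps(55e9, SECONDS_PER_DAY)

# Volume implied by the 8pm-9pm spike at the low and high end of the range.
peak_hour_low = gigabytes_at_rate(80, SECONDS_PER_HOUR)
peak_hour_high = gigabytes_at_rate(100, SECONDS_PER_HOUR)

print(f"Daily average: {daily_avg:.2f} Mbit/s")
print(f"Peak hour volume: {peak_hour_low:.0f}-{peak_hour_high:.0f} GB")
print(f"Peak vs. average: {80 / daily_avg:.1f}x-{100 / daily_avg:.1f}x")
```

Under those assumptions, 55 GB over a day averages out to only about 5 Mbit/s, while one hour at 80-100 Mbit/s would move 36-45 GB. In other words, the spike was some 16 to 20 times the daily average and may have accounted for most of the day's transfer, which is why a host sized for the average would struggle.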

Rick Mercer Retweet

I had been hosting all of the images and documents on my own host, but after seeing how much traffic the site was drawing, I had to move the images to Imgur.com and the documents/files to Dropbox (public folder). FYI, Dropbox allows 20 GB/day of traffic for public links, and Imgur offers unlimited bandwidth (the only requirement is that each image must receive a hit a day or else it’s deleted).

As for the page itself, it was a simple HTML page that I had designed to be client-based, meaning the visitor’s browser did the work of pulling in content. Had the site been server-based, my server wouldn’t have been able to handle the sheer load. I used Google’s API to load the latest news reports, Facebook’s API to integrate the Facebook page, and Twitter’s API for both the official Twitter account her friends had set up and for a feed that tracked the hashtag. In short, the site centralized activity in the news, on Facebook and on Twitter.

I also learned of a tool Facebook provides: https://developers.facebook.com/tools/debug. With this tool, I was able to refresh Facebook’s cached “Share Link” copy of the site, changing it from my host’s landing page to the actual site content.

A bit of social analysis: most hits to the site came at 6pm.

The first bar is Thursday, April 19 (the day the site launched); the second, cyan bar is Friday, April 20 (things started picking up); the third, blue bar is Saturday, April 21 (a decent amount of traffic for a weekend).
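For anyone wanting to run the same kind of hour-of-day breakdown on their own logs, here is a minimal Python sketch. The timestamp format and the sample entries are hypothetical, made up purely for illustration; they are not the real access log, and the chart above came from the host's own stats page rather than from code like this.

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp layout; a real access log would likely need a different one.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def hits_per_hour(timestamps):
    """Count hits by hour of day (0-23), pooling all days together."""
    return Counter(
        datetime.strptime(ts, TIMESTAMP_FORMAT).hour for ts in timestamps
    )


def busiest_hour(timestamps):
    """Return the hour with the most hits, or None if there are no hits.

    Ties are broken in favour of the earlier hour.
    """
    counts = hits_per_hour(timestamps)
    if not counts:
        return None
    hour, _ = max(counts.items(), key=lambda item: (item[1], -item[0]))
    return hour


# Hypothetical sample entries, not real traffic data.
sample = [
    "2012-04-19 18:05:11",
    "2012-04-20 18:40:02",
    "2012-04-20 20:15:30",
    "2012-04-21 09:12:45",
    "2012-04-21 18:59:59",
]

print(hits_per_hour(sample))
print(busiest_hour(sample))
```

Pooling all days into one counter answers "which hour is busiest overall"; to compare days as the bars do, the counter key could be the (date, hour) pair instead.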

It’s not every day you get to be part of something on this scale. Scalability, stress-testing, responsiveness and overall intuition are all vital.