stress relief
Posted by roy at 05:04 AM on January 7, 2009 in Photography, MindTouch, San Diego.
This month is gonna be brutal at work. The most significant MindTouch Deki release since Hayes, Lyons, is slated for release sometime in February. As we usually do, we aimed quite high on the number of deliverable features. Just check out some of the features that are on the board - amazing what our engineering team is capable of (it's only been about three months since Kilen Woods!).
We're also going through a bunch of professional service projects that I had stalled out on at the end of December (the vacation depleted the prof services team at MindTouch, and I needed to step back into the product side). There's a huge project launch in the middle of January that is a bit stressful - let's hope we get that out without problems!
Supposedly January 5th is the most stressful day of the year, and I certainly felt that yesterday. Just getting back and having to deal with everything was hard. It wasn't so much coming back to work that was difficult, but the urgency of "Let's get things started off on the right foot." I'm starting to bring my work home more often (case in point: tonight until 1am, boo!), but it will all be worth it at the end. Just... gotta... get... through.. it (flashes of that Daniel Bedingfield song come to mind)
Anyways, I don't know what I'd do without the beaches here. There's nothing more relaxing than driving out to the beaches here and just emptying my mind of all its burdens. What *did* I do in NC? I can't even remember. Probably played Starcraft.
I foresee many beach runs this month!
. . .
So I keep giving away my digital point-and-shoots - I swear I've bought three or four over the past couple of years, and I keep giving them away. Anyways, I gave my Canon G5 to my parents when I was home back in October, so I've been without a digital point-and-shoot for a couple of months. I love my SLR, but I'd like to carry a smaller camera with me on a day-to-day basis for those "little moment" pictures.
I've convinced myself that I want a Ricoh GX200:

The glass looks great (24mm - 72mm, f/2.5 - f/4.4), it shoots RAW, it's compact, and it has manual controls. Plus I think the camera looks gorgeous (in that industrial, inconspicuous way).
"The Road" by Cormac McCarthy
Posted by roy at 02:18 AM on January 6, 2009 in Ramblings.
Let me first say that I'm not a big fan of fiction; I've always been more interested in works of non-fiction. So if this review is overenthusiastic, it's simply because I don't read enough fiction. Apologies to those who are more well-versed than I ;D
My sister recently recommended I read Cormac McCarthy's "The Road." Before going to bed tonight, I started it. And I was blown away. Enough to get me back on the computer to write about it.
Cormac McCarthy writes in the exact style that I try (and fail) to pull off in my super crappy short stories. I haven't even gotten 30 pages into the book, and I've already been re-reading whole passages - his mastery of words is amazing. I'm going to share some of my favorite passages so far (the absence of punctuation is part of his style, and not a typo - I'm doing a complete transcription):
They passed through the city at noon of the day following. He kept the pistol to hand on the folded tarp on top of the cart. He kept the boy close to his side. The city was mostly burned. No sign of life. Cars in the street caked with ash, everything covered with ash and dust. Fossil tracks in the dried sludge. A corpse in a doorway dried to leather. Grimacing at the day. He pulled the boy closer. Just remember that the things you put into your head are there forever, he said. You might want to think about that.
You forget some things, don't you?
Yes. You forget what you want to remember and you remember what you want to forget.
My favorite passage thus far, emphasis mine: (this story is about an unnamed father and boy in post-apocalyptic America)
An hour later they were on the road. He pushed the cart and both he and the boy carried knapsacks. In the knapsacks were essential things. In case they had to abandon the cart and make a run for it. Clamped to the handle of the cart was a chrome motorcycle mirror that he used to watch the road behind them. He shifted the pack higher on his shoulders and looked out over the wasted country. The road was empty. Below in the little valley the still gray serpentine of a river. Motionless and precise. Along the shore a burden of dead reeds. Are you okay? he said. The boy nodded. Then they set out along the blacktop in the gunmetal light, shuffling through the ash, each the other's world entire.
Wow.
Digital mischief
Posted by roy at 12:29 AM on January 6, 2009 in Tabulas.
Before you read further, I want to just say that I ran this idea by Corey, just to make sure it wasn't too crazy (I've long since given up trying not to be weird). So if this entry makes you think I'm going insane, I blame Corey.
For some reason tonight, I thought it'd be hilarious to prevent Tabulas users from ever using tinyurl.com in their entries. (I really hope I wake up tomorrow morning thinking it's as funny as it is to me tonight). I just imagined some spammers trying to hide their sites behind TinyURLs, and discovering they kept getting converted to their real equivalents.
After seeing the best website idea of 2008 (they launched the last week of '08) in longurlplease.com, I decided this would be worth the twenty-minute investment. Over the weekend, I had added code inside Tabulas to do link parsing of entries (on save) for some new features, so it was relatively trivial for me to add a little hook that would convert the links to their non-tinyurl equivalents (using longurlplease.com's API).
So basically, any time you try to use a tinyurl in a Tabulas entry, I'll magically convert it to the real URL. This is a pretty jerk-ish thing to do (I'm generally very against ever touching a user's content), but I'm just too tired and grouchy to care today. Plus I think it's hilarious.
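A rough sketch of what such an on-save hook could look like, written in Python for illustration (Tabulas itself is PHP, and this is not the actual code). The `resolve` callable is a hypothetical stand-in for the longurlplease.com lookup, whose request format isn't described here, and the URL pattern is an assumption about what a TinyURL link looks like.

```python
import re
from typing import Callable, Optional

# Assumed shape of a TinyURL link: http(s)://tinyurl.com/<code>
TINYURL_RE = re.compile(r"https?://(?:www\.)?tinyurl\.com/[A-Za-z0-9\-]+")


def expand_tinyurls(entry_text: str,
                    resolve: Callable[[str], Optional[str]]) -> str:
    """Replace every TinyURL in entry_text with the URL that resolve() returns.

    resolve is hypothetical: it takes a short URL and returns the long URL,
    or None when the lookup fails.
    """
    def swap(match):
        short_url = match.group(0)
        long_url = resolve(short_url)
        # If the lookup failed, leave the original link alone.
        return long_url if long_url else short_url

    return TINYURL_RE.sub(swap, entry_text)
```

Passing the lookup in as a function keeps the text rewriting separate from the network call, so the hook can be exercised with a plain dictionary before it ever talks to a real service.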
I've posted a screencast of it in action at http://screencast.com/t/oOsZtUf8Jth
Why I hate TinyURL as a service can be saved for another post.
Linking back and related entries
Posted by tabulas at 11:07 PM on January 5, 2009 in General News.
For patron accounts, I've added one new feature: linkbacks. Any time a Tabulas user posts an entry that links to a particular entry of yours, you'll see it appear in the comment view of a page. This will help you discover who's talking about your entries on Tabulas, and it is a way to encourage you to link to others' posts as well!
I've also started processing Tabulas entries for the "Related" entries feature - for those of you who don't know, this is a magical feature which will try to find "related" entries that you've written. Sometimes it's accurate, sometimes it's not. But it's fun to see what entries will show up :)
Related entries are processed nightly, so you may have to wait a bit to get related entries. All patron accounts should have had their entries processed, though.
Images and Tabulas
Posted by roy at 05:05 AM on January 4, 2009 in Web Development, Tabulas.
Sanjuro recently commented "Actually I find these insights into the management of a fairly big website very interesting. So keep'em coming!" I'm going to pre-apologize right now if this entry bores you to tears; just imagine the despair of the electrons who were fired onto you with futility!
Tonight, I wrapped up a back-end project for Tabulas that's been ongoing for a couple of weeks now (dates back to early December). What I've successfully done is to place the burden of serving Tabulas images onto this server, instead of Amazon S3. Geniuses like PeteE or fdn are probably thinking to themselves: "Oh man, this would take me all of twenty minutes." Unfortunately, I was not blessed with smarts, but instead dashing good looks, so this took me a couple of weeks instead.
Back in the day before Amazon S3, everything was hosted in a datacenter on several servers I rented from EV1Servers. This included both the database, as well as the flat files. If you remember the old PDS (personal data servers), they were aces.tabulas.com, lca.tabulas.com, jbiel.tabulas.com - your account was tied to these servers.
This worked fine, but there was the issue of the servers one day going up in flames, and me losing all the data (I don't like RAID, and I only had the time/skills to back up the database and the raw files). There were also issues of scaling I never addressed (for example, what would happen when aces.tabulas.com ran out of disk space?).
Then Amazon S3 came along and I crapped my proverbial pants (and quite possibly my literal pants).
When an image is uploaded to Tabulas, I store 4 versions of the same file: a thumbnail size (small), a web size (medium), a large size (large), and the raw image. In the early days, I didn't expose the raw version, but kept it archived on a separate server. I didn't have the skills to keep file systems backed up, so I figured if the PDS server ever went toast, I could use the raw versions to regenerate the different sizes.
So obviously, the first thing when S3 came out was to transition the raw files to an "original" bucket (ACL: private).
And then a couple of months later, I created the new bucket images.tabulas.com and started hosting images on S3 directly. I also exposed the "raw" format publicly, which pretty much deprecated the usefulness of the "original" bucket. And all this was working fine until a couple of weeks ago.
While S3 simplifies the maintenance of the server, it is still not very cost effective. The bandwidth/data storage costs are much higher than if you ran it yourself - but for a guy like me who is more interested in cutting cruddy code than maintaining servers, that added cost is fine. Well, to a certain point. When my S3 costs started spiraling into $300/month, I decided it'd be worth cutting some code.
So I created i2.tabulas.com, which was routed through my servers. Without the math getting complicated, i2.tabulas.com gives me "free" storage, with bandwidth costs of $0.0485/GB per month. S3 costs $0.17/GB per month. That makes i2 about 3.5 times cheaper, even excluding the storage costs.
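For anyone who wants to check the arithmetic, here it is as a small Python calculation using the two prices quoted above. The 1,000 GB traffic figure is made up purely to show what the difference looks like in dollars; it is not a real Tabulas number.

```python
# Bandwidth prices quoted above, in dollars per GB.
S3_PER_GB = 0.17
I2_PER_GB = 0.0485


def bandwidth_cost(gb_transferred: float, price_per_gb: float) -> float:
    """Cost of serving gb_transferred gigabytes at a flat per-GB price."""
    return gb_transferred * price_per_gb


# How many times more expensive S3 bandwidth is than i2 bandwidth.
ratio = S3_PER_GB / I2_PER_GB

# Hypothetical traffic volume, for illustration only.
example_gb = 1000
savings = bandwidth_cost(example_gb, S3_PER_GB) - bandwidth_cost(example_gb, I2_PER_GB)

print(f"S3 costs {ratio:.2f}x as much per GB")
print(f"Savings on {example_gb} GB: ${savings:.2f}")
```

The ratio comes out to roughly 3.5, and it only covers bandwidth, so the "free" storage on i2 is extra on top of that.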
So when a user requested a picture from Tabulas, it got routed to i2.tabulas.com; i2 would then ask, "Hmm, do I have a local copy of this file?" If so, it would simply serve that image out (using PHP's fpassthru) to the end-user. If it didn't, it would retrieve it from Amazon S3 once (and store it on the local server), then serve up the image.
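This is a classic read-through cache. A minimal Python sketch of the flow follows (the real thing is PHP and uses fpassthru; this is only an illustration). `fetch_from_s3` is a hypothetical callable standing in for whatever S3 client does the download, and the image keys are assumed to be trusted relative paths.

```python
from pathlib import Path
from typing import Callable


def get_image(key: str, cache_dir: Path,
              fetch_from_s3: Callable[[str], bytes]) -> bytes:
    """Return the image bytes for key, hitting S3 only on a cache miss.

    fetch_from_s3 is hypothetical: it takes an object key and returns the
    file contents. key is assumed to be a safe relative path.
    """
    local_path = cache_dir / key

    # "Hmm, do I have a local copy of this file?"
    if local_path.is_file():
        return local_path.read_bytes()

    # Cache miss: retrieve from S3 once, keep a local copy, then serve it.
    data = fetch_from_s3(key)
    local_path.parent.mkdir(parents=True, exist_ok=True)
    local_path.write_bytes(data)
    return data
```

After the first request for a given image, every later request is served from local disk, which is where the bandwidth savings come from.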
I waited for people to complain about things not working, but there weren't any complaints. So I took it to the next level.
One of the problems I had was that people were using images.tabulas.com when referencing images - so even if I was telling Tabulas users the subdomain was i2.tabulas.com, people who had embedded images from other sites would constantly hit S3.
My goal was to have images.tabulas.com's DNS no longer point to S3, but to Tabulas.
But there was one caveat. When I serve up images from i2, I send HTTP headers that tell your browser the file size, as well as the filetype. I was missing this information for most images (I told you, I used to be quite lazy).
So I had to write a script to go through about half a million pictures on Amazon S3 and retrieve the file metadata. (Conceivably, I could use the local copy I get from Amazon to get this information, but I've had bad experiences with mimetype detection locally). When I ran this script, I noticed that roughly 6,000 images on Tabulas had data records but were missing their files.
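A backfill script along these lines might be sketched as follows, in Python for illustration rather than the PHP Tabulas runs on. The record layout (`size` and `mime` fields keyed by image key) and the `lookup` callable are assumptions; `lookup` stands in for a metadata request against S3 and returns None when the object isn't there.

```python
from typing import Callable, Dict, List, Optional, Tuple

Metadata = Tuple[int, str]  # (size in bytes, MIME type)


def backfill_metadata(records: Dict[str, dict],
                      lookup: Callable[[str], Optional[Metadata]]) -> List[str]:
    """Fill in missing size/MIME fields and return keys whose file is gone.

    records maps an image key to its (hypothetical) database row.
    lookup is hypothetical: it returns (size, mime) or None if the
    object does not exist in the bucket.
    """
    missing_files = []
    for key, record in records.items():
        # Skip rows that already have both pieces of metadata.
        if record.get("size") is not None and record.get("mime"):
            continue

        meta = lookup(key)
        if meta is None:
            # A data record exists but the file itself is missing.
            missing_files.append(key)
            continue

        record["size"], record["mime"] = meta
    return missing_files
```

The returned list is the interesting by-product: it is the set of images that have a database row but no file behind it.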
Being in "spring cleaning" mode for the Tabulas database, I decided to fix these images by writing a script that would (1) go to the "original" backup bucket and retrieve the file, (2) regenerate the resized images, and (3) update the data records accordingly. Using this, I fixed roughly 5,000 of those images. The last 1,000 I just deleted from the database (hell, the images don't work, why would people want them in their gallery?).
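The three steps map onto a simple loop. Here is a hedged Python sketch of that shape (not the actual script, which would have been PHP). All four callables are hypothetical placeholders for the real storage, image-resizing, and database code, and deleting the record when no original can be found is an assumption about why the last batch was unrecoverable.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def repair_images(missing_keys: Iterable[str],
                  fetch_original: Callable[[str], Optional[bytes]],
                  regenerate: Callable[[bytes], Dict[str, bytes]],
                  update_record: Callable[[str, Dict[str, bytes]], None],
                  delete_record: Callable[[str], None]
                  ) -> Tuple[List[str], List[str]]:
    """Try to rebuild each broken image; return (fixed keys, deleted keys)."""
    fixed, deleted = [], []
    for key in missing_keys:
        # (1) Retrieve the raw file from the "original" backup bucket.
        raw = fetch_original(key)
        if raw is None:
            # Assumption: with no original left, the record is just removed.
            delete_record(key)
            deleted.append(key)
            continue

        # (2) Regenerate the resized versions from the raw image.
        sizes = regenerate(raw)

        # (3) Update the data record to match the regenerated files.
        update_record(key, sizes)
        fixed.append(key)
    return fixed, deleted
```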
While doing this, I realized how useless the "original" Amazon S3 bucket was - so I started running a script to delete that whole bucket (I think it weighs in at around 60GB or so - that'll save me a whopping $100/year, but it's a bucket I absolutely don't use, so it'll be good to do that).
Once I verified all images stored in the Tabulas DB had the appropriate metadata, I flipped the switch by removing the CNAME DNS record which points images.tabulas.com to Amazon S3 - now even images.tabulas.com points to the primary Tabulas server!
Of course, after this was done, I also decided to clean up any entries which had embedded images over the past couple of months - I wrote a script to go through all entries posted in the past three months and fix up all references from i2.tabulas.com to images.tabulas.com (although I plan on maintaining the i2.tabulas.com subdomain indefinitely, by ensuring all data inside Tabulas was referencing images.tabulas.com, I could cut down on some code).
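The per-entry rewrite in a cleanup like this boils down to a host swap. A small Python sketch, assuming entries are stored as HTML or text containing full http(s) URLs (the regex and function name are illustrative, not taken from the Tabulas code):

```python
import re

# Match the i2 host only when it is the whole hostname of an http(s) URL,
# so something like i2.tabulas.com.example.org is left untouched.
I2_URL_RE = re.compile(r"(https?://)i2\.tabulas\.com(?![\w.-])", re.IGNORECASE)


def rewrite_image_host(entry_body: str) -> str:
    """Point every i2.tabulas.com URL in an entry at images.tabulas.com."""
    return I2_URL_RE.sub(r"\1images.tabulas.com", entry_body)
```

Running this over only the entries from the past three months is enough, since i2.tabulas.com did not exist before then.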
Doesn't it seem weird that there's so much work being done just to maintain the status quo? All that work, and its success was judged by how little had changed.
But it was all worth it - I got to remove an unused Amazon S3 bucket, I cleaned up the Tabulas DB, and I made the data inside the images table of the database more consistent. And not only that, I added a feature whose absence had long bugged me: privacy controls on the images themselves.
In the past, you could set privacy controls on albums, but they would not be enforced on the images themselves. For example, if you got an image URL, you could easily just share it with somebody. The false sense of security = not cool.
Facebook still does this; I have an album that's set to "Friends Only", yet conceivably this link to an image in that album works. (I'm guessing the problem is compounded by Akamai - CDN with auth will be hard).
Anyways, I finally got this implemented in Tabulas with just a few lines of code (G will probably snicker due to my usage of $wg, but I don't care!):
// do a privacy check if the album isn't public
if ($Image->getAlbum()->getStatus() != STATUS_PUBLIC) {
    // define the site user
    $wgSiteUser = User::fromId($wgTitle->getPath(0) /* userid */);
    // do the privacy check
    if (!$Image->getAlbum()->canView()) {
        http_status(403);
        exit();
    }
}
If you are logged into Tabulas and are a Tabulas friend, you can see this picture. Try logging out and hard-refreshing. Can you see it now? NOPE! Burn.
Of course, the one use case this breaks: users who uploaded background images for their Tabulas in their gallery and "Private"-ed the album to "hide" it. Maybe I'll add an "archive" or "hide" album feature instead.
I'm pretty sick of working on images, but there is one last thing I'd like to add: EXIF image parsing. There must be a wealth of knowledge there already.
So yeah, that was what I did yesterday and today. Fun!
well, that's just brutal
Posted by roy at 03:58 PM on January 3, 2009 in Ramblings, Tabulas.
Does this story sound familiar? Via TechCrunch:
Blogging platform JournalSpace has ceased to be, following a
wipe-out of the main database for which there was no back-up in place.
...
JournalSpace had apparently been around for 6 years, and will now be
releasing its source code to the open source community, and possibly
sell off the domain name and trademarks.
By all indications JournalSpace was smaller than Tabulas, but that still doesn't diminish the blow for the loyal users who stuck with a smaller service. My heart goes out especially to the owners - losing a six-year labor of love has to be rough. Even though Tabulas is much smaller now than it used to be (and there are suitable alternatives out there), I know if I lost Tabulas through something like a database wipe-out, I'd be devastated. I mean, Tabulas has existed for 20% of my life (its sixth birthday is coming up this year).

Anyways, my heart goes out to the JournalSpace guys.
(By the way: Tabulas runs monthly database back-ups to S3, so I'm somewhat shielded against catastrophic database losses. I also offer each Tabulas account - free or patron - the ability to back up. Take advantage of it.)
. . .
This is a bit late, but my picks for week 1 of the NFL playoffs: Cardinals, Colts (sorry SD fans), Ravens, and Eagles.