hacks

OTFG Step 4: Running the Import Script

At long last, here's the import script I used: import-flickr-photos.php.

If you want to try it out, be sure to add all of the personalized information I mentioned in the previous steps (1, 2, 3) to the top of the script.

Here's what the script does:
  • Authenticates the person running the script at Flickr via the browser. (You'll have to give your script permission to read all of your photos.)
  • Requests the total number of photos for the authenticated account from the Flickr API (to help with looping).
  • Requests all of the account's photos, asking for standard info (title, FlickrID, whether or not it's public) and some extra details (relevant dates, and the longitude and latitude).
  • Loops through every photo, adding the photo information to the database if it isn't already there.
  • Downloads the original photo file from Flickr and saves it locally if it isn't already there.
  • Requests the description (caption) and tags for the particular photo from the Flickr API. (Requires a separate API call, unfortunately.)
  • Adds the file location, description, and tags to the db.
  • Finally, the script sleeps for one second before doing anything else. (Seemed like the polite thing to do so the script doesn't hammer the API.)
In my last post I mentioned that Flickr Backup was clunky to use, but this script is a thousand times clunkier. If Flickr Backup is a Ford Taurus, then running this script is like taking your covered wagon out on the Oregon Trail: no shocks, no rubber tires, no paved roads, and you'll likely die of cholera before it's finished. I'm kidding on that last part, but the script probably will die before all of your photos are saved locally. Not to worry, you can run the script multiple times without duplicating files or data. The script checks for existing records and files before taking any action. I have 513 photos at Flickr (which isn't too many in the scheme of things) and I still needed to run the script a couple of times to get all of them. (I set a ridiculously high timeout at the top of the script, but the script seemed to die anyway.)
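
The re-run safety described above boils down to checking before every write. Here's a rough sketch of that idea, not the actual import script: the function name, the array standing in for the photos table, and the download callback are all made up for illustration.

```php
<?php
// Hypothetical sketch: $db is a plain array standing in for the photos table,
// and $download is any callable that saves a file and returns its local path.
function importPhotos(array $remotePhotos, array &$db, $download, $pauseSeconds = 1)
{
    $added = 0;
    foreach ($remotePhotos as $photo) {
        $id = $photo['id'];

        // Only insert a record when this photo isn't in the "table" yet.
        if (!isset($db[$id])) {
            $db[$id] = array('title' => $photo['title'], 'file' => null);
            $added++;
        }

        // Only download when no local file has been recorded yet.
        if ($db[$id]['file'] === null) {
            $db[$id]['file'] = call_user_func($download, $photo);
        }

        // Pause between photos so the API isn't hammered.
        sleep($pauseSeconds);
    }
    return $added;
}

// Running the import twice over the same list adds nothing the second time.
$table = array();
$fakeDownload = function ($photo) {
    return '/www/photos/' . $photo['id'] . '.jpg'; // pretend the file was saved here
};
$list = array(array('id' => '359119647', 'title' => 'Beach Dogs'));
echo importPhotos($list, $table, $fakeDownload, 0), "\n"; // first run
echo importPhotos($list, $table, $fakeDownload, 0), "\n"; // second run
```

Because each step checks before it acts, a run that dies halfway just leaves less work for the next one.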

To run the script, open it in your browser. The URL should be something like:

http://example.com/import-flickr-photos.php

The script will redirect you to Flickr where you'll need to log in and/or authorize the script. From there, the magic starts. The script will try to give you some info about what's happening, but if you don't see anything but a blank page, don't worry, it's probably working. If you can, log into your server and check out the photos directory you set up. You should see folders and files appearing. Another way to check progress is by firing up MySQL and running some counts on your table. Something like this:

SELECT COUNT(PhotoID) FROM photos;

If the script is working, the count will be higher than zero and should keep climbing each time you run the query.

It's important to note that the import script is grabbing every photo you uploaded to Flickr, even those marked as friends and family only. This is exactly what I wanted to happen, but if you want something different, check out the documentation for the flickr.photos.search method and tweak line 65 of the script. You can set a privacy_filter argument in the call to get a list of only photos that are public, for example.
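
As a rough illustration of that tweak, here's one way the arguments for flickr.photos.search could be assembled. The helper name and the particular extras are assumptions, not a copy of line 65 of the script; user_id, extras, per_page, page, and privacy_filter are documented arguments of the method.

```php
<?php
// Hypothetical helper: builds the argument list for flickr.photos.search.
// privacy_filter values per the Flickr docs: 1 = public, 2 = friends only,
// 3 = family only, 4 = friends and family, 5 = completely private.
function buildSearchArgs($page, $privacyFilter = null)
{
    $args = array(
        'user_id'  => 'me', // the authenticated account
        'extras'   => 'date_upload,date_taken,geo,original_format',
        'per_page' => 500,  // the documented maximum for this method
        'page'     => $page,
    );

    // Leave the filter out entirely to get every photo, private ones included.
    if ($privacyFilter !== null) {
        $args['privacy_filter'] = $privacyFilter;
    }
    return $args;
}

// Public photos only, first page of results.
print_r(buildSearchArgs(1, 1));
```

An array like this is what you'd hand to whatever search call your copy of phpFlickr provides; check its source for the exact method name before relying on it.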

So, once this script finished, I had a bunch of local directories filled with photos that I'd uploaded to Flickr over the past three years. I also had 513 records in the photos table and 1,646 records in the tags table describing those photos. That means I (and some others) added about 3.2 tags per photo. Huh. So I can't really look at my photos through the Web yet, but at least they're ready for the next phase. Not too shabby for a few hours on a Saturday afternoon.

Disclaimer: As I mentioned before, OTFG is an off-the-top-of-my-head project. So if you try any of this stuff out, please don't hold me responsible for your toaster catching on fire. I'm sharing this project publicly to show how I'm going off the Flickr grid, and to hopefully get some feedback in the process.

Next Up: Set Theory

OTFG Step 3: Registering the Import Script

One feature that put Flickr ahead of the existing photo-sharing pack a few years ago was the fantastic Flickr API. The API gives developers the chance to tap into the Flickr photo database and create stunning visualizations, fun toys, and productivity hacks that extend the service. Hosting my own images means I won't be able to take advantage of these Flickr-specific tools built (for the most part) by fans of the service. Losing access to these tools is one of the biggest drawbacks to going off the grid. Even my ability to leave Flickr with my photos hinges on the existence of the API. People can use the API to export their photos if they're not happy with the service, or if they want to share their photos in a different way.

But exporting your photos isn't just a matter of pushing an "export" button. You have to have a certain amount of technical expertise to be able to export your photos from Flickr. There is one existing tool I know of that can grab all of your photos—Flickr Backup—but in my experience it's a bit clunky to use, and doesn't snag all of my data like descriptions and tags. (Sounds like it's getting better since I tried it, though, based on posts in the FlickrBackup Open Discussion.) So while it's possible to get your photos out of Flickr, it's not easy for most of the world. I consider myself familiar with Flickr and its API, but it still took a few hours to write a script to grab my photos, titles, captions, and tags.

The key to gathering my photos was logging in as myself through the Flickr API. (Which is about as easy as it sounds.) Luckily most of the heavy lifting is handled through phpFlickr, a Swiss Army knife for working with the Flickr API in PHP. If you want to try this at home you'll need to download and install phpFlickr on your server. Make sure the main phpFlickr file is in your public working directory, along with the file auth.php. This little file helps handle Flickr authentication.

Note the URL of auth.php on your server; it should be something like:

http://example.com/auth.php

Flickr controls access to their API through keys, and I needed one for my import script. With my auth URL in hand, I headed over to the Flickr API and applied for a key. I quickly described the app, noted that it was for non-commercial use, and agreed to the Terms of Use. In exchange, I got a couple of alphanumeric strings that let me use the browser to log in via the Flickr API.

I was instantaneously approved for the key, and I clicked on the Your API Keys link and found the key I just made on the list. I clicked Edit key details, gave the import script a quick title and description, and placed my auth.php URL in the Callback URL field. Then I clicked "Save Changes" to finish the setup.

I found my new key once again on the list of keys and copied both my Key and my Secret, which is a shorter alphanumeric string that shouldn't be shared with others (as the name implies). If you're following along, you'll need to jump through these hoops to register your import script as well.
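
For the curious, here's a sketch of what the Key and Secret are for. Under Flickr's original authentication scheme (the one in use for this project; Flickr has since moved to OAuth), every authenticated request carried an api_sig: the MD5 of the Secret followed by each parameter name and value, sorted by name. phpFlickr takes care of this for you, so the function name and placeholder values below are purely illustrative.

```php
<?php
// Hypothetical helper showing how a legacy Flickr api_sig was computed.
function flickrApiSig($secret, array $params)
{
    ksort($params, SORT_STRING);   // parameters in alphabetical order by name
    $base = $secret;               // the signature string starts with the Secret
    foreach ($params as $name => $value) {
        $base .= $name . $value;   // then name immediately followed by value
    }
    return md5($base);
}

// Placeholders: substitute the Key and Secret from your own API Keys list.
$apiKey = 'YOUR_API_KEY';
$secret = 'YOUR_SECRET';

$params = array('perms' => 'read', 'api_key' => $apiKey);
$params['api_sig'] = flickrApiSig($secret, $params);

// The browser gets sent to a login URL of this shape to approve the script.
echo 'https://www.flickr.com/services/auth/?' . http_build_query($params), "\n";
```

The Secret never appears in the URL itself, only inside the hash, which is why it's safe to pass the Key around but not the Secret.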

To recap the progress so far, these items were in my magic bag of holding before any files were transferred:
  • A MySQL username, password, and database name, with empty photos and tags tables.
  • A local filesystem directory where photos will be stored.
  • A Flickr API Application Key and Secret.
It takes a bit of work to put the pieces in place, but once this groundwork is done, importing the photos from Flickr can begin. That's up next.

OTFG Step 2: Thinking about Photo URLs

My next step in moving my photos from Flickr to my own server was thinking about where I would store the photo files. Flickr assigns every photo a numeric ID which is available in every photo URL. For example, here's the URL of one of my photos hosted on Flickr:

http://farm1.static.flickr.com/124/359119647_4874f02815_o.jpg

The URL doesn't give much information about the photo. We know that the photo is at a flickr.com server, and that the photo is a user's original photo (note the _o at the end of the filename). Other than that, pretty anonymous.

Since I'm hosting my own photos, I thought I'd put a bit more information into the photo URLs. I decided to go with this format for original, unresized images:

http://example.com/[year]/[month]/[photo title].jpg

This means the same photo on my server will have a URL like this:

http://example.com/2007/01/beach-dogs.jpg

Though I don't expect my photo URLs to be exposed in the wild very much, I like this structure because it provides a bit of context. And because I'll be using actual directories in the filesystem named /2007 and /01, for example, the filesystem should scale well. I won't have hundreds and hundreds of photos in one folder. On the other hand, it will make running batch operations on all of the photos a bit tougher because I'll have to recurse through the directories—but that shouldn't be a big deal. (Especially since all of the file locations will be stored in the db.)

The Flickr API provides the date and time a photo was added to their system in Unix time, and the PHP date() function converts that to any format. So as my import script grabs photos from the Flickr server, it puts the image in the local filesystem based on the time it was added to Flickr originally.

I simply set a starting directory in my import script that's available through the web server, say, /www/photos/ or c:\\www\\photos\\ in Windows, and it will create the necessary local directories as it pulls in photos from Flickr.
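
Here's a minimal sketch of how that could work, assuming the upload time arrives as a Unix timestamp. The helper name is hypothetical and the temp directory is only there so the example can run anywhere; in practice the base would be something like /www/photos/.

```php
<?php
// Hypothetical helper: returns (and creates, if needed) the year/month
// directory for a photo, based on when it was uploaded.
function photoDirectory($baseDir, $uploadedAt)
{
    // date('Y/m') gives e.g. "2007/01" in the server's configured time zone.
    $dir = rtrim($baseDir, '/\\') . '/' . date('Y/m', (int) $uploadedAt);

    // The third argument (true) creates the year and month levels in one call.
    if (!is_dir($dir) && !mkdir($dir, 0755, true) && !is_dir($dir)) {
        throw new RuntimeException("Could not create $dir");
    }
    return $dir;
}

// 1168819200 is midnight UTC on January 15, 2007.
date_default_timezone_set('UTC');
$dir = photoDirectory(sys_get_temp_dir() . '/otfg-photos', 1168819200);
echo $dir . '/beach-dogs.jpg', "\n";
```

Calling it again for another photo from the same month is harmless, since the directory is only created when it's missing.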

Using the title of a photo as the file title is a bit tricky, because the titles are meant to be read by humans, not used in the filesystem. Photo titles contain punctuation and spaces, so I just strip all of that out with some regular expressions. I'm sure this could be improved, but I'm using:

$photoTitle_f = preg_replace('/\s+/', '-', $photoTitle_f);
$photoTitle_f = preg_replace('/[^-\w]/', '', $photoTitle_f);


Basically this bit of code says replace any run of whitespace in the title with a dash, and then remove any character that isn't a dash, a letter, a number, or an underscore (the \w class includes underscores). A bit rough, but it should handle most standard English titles.
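
To see what those two regular expressions do to a real title, here they are wrapped in a small function. The function name is hypothetical, and the trim() and strtolower() calls are additions of this sketch (to avoid stray dashes at the ends and to match the lowercase beach-dogs.jpg example), not something the original two lines do.

```php
<?php
// Hypothetical wrapper around the two preg_replace() calls above.
function photoFileTitle($photoTitle)
{
    $photoTitle_f = trim($photoTitle);                           // assumption: drop edge whitespace
    $photoTitle_f = preg_replace('/\s+/', '-', $photoTitle_f);   // whitespace runs become one dash
    $photoTitle_f = preg_replace('/[^-\w]/', '', $photoTitle_f); // strip punctuation and the rest
    return strtolower($photoTitle_f);                            // assumption: lowercase file names
}

echo photoFileTitle('Beach Dogs!'), "\n";                 // beach-dogs
echo photoFileTitle('Sunset, Cannon Beach (2007)'), "\n"; // sunset-cannon-beach-2007
```

Two things this doesn't handle: a title made up entirely of punctuation comes back empty, and two photos with the same title uploaded in the same month would map to the same file name.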

With the photo-URL planning out of the way, it was time to set up Flickr API access for my import script. I'll show how that works in Step 3.

OTFG Step 1: Setting the Stage

This weekend I took my first step toward going off the Flickr grid (aka OTFG). I set up a database to store information about my photos, I downloaded all of my original photos from the Flickr servers, and used the Flickr API to gather information about those photos. This first step is key for me because I don't want to lose the years of work I've already put into adding titles, captions, and tags to my photos at Flickr. Luckily, Flickr has a fantastic API that lets you tap into their photo database.

If you want to follow along at home, my setup includes PHP 5 and MySQL 5.

I started by whipping up a quick database structure to hold info about photos. I'm sure this will change over time, but this is what I felt was the bare minimum I needed to get back out of Flickr. I made two tables: one for photos and one for tags. The table photos includes fields for a PhotoID, title, description, date the photo was added, date the photo was taken, longitude and latitude of the location (if available), whether or not the photo is public, and the location of the photo on the local filesystem. (I decided to store the files in the local filesystem instead of in the database because that seemed more intuitive to me.) The table tags will store all of the tags associated with the photos.

You can grab the SQL required to create the tables here: otfg_tables_1.txt.

And you can set them up in a new database like this (leave nothing after -p and mysql will prompt for the password):

shell> mysql -u [username] -p [database name] < otfg_tables_1.txt

Next you'll want to set up a user to access this database. Fire up MySQL and run something like this:

mysql> grant all on [database name].* to [username]@localhost identified by '[password]';

Remember the MySQL username and password you set; you'll need them in a bit.

And with that, the stage is set and ready to be filled with photos! Coming up: How I grabbed my photos and put stuff in the db.

Disclaimer: Please be aware that I'm building this as I go. That means the code I'm sharing hasn't even been tested in the real world. I'm merely showing the steps I'm taking as a guide (and hopefully for some input). In other words, don't try this at home unless you're comfortable with what's going on here.

  • Remap your Home and End keys on a Mac so they operate on the current line rather than the current document. This has been bugging me forever!
    filed under: mac, software, hacks

Google Stops Supporting Search API

The O'Reilly Radar picked up on an internal conversation I was involved with about Google's decision to stop supporting their Search API. There's more at the Radar: Google Deprecates Their SOAP Search API.

I can't help but think of the lawyerly phrase arbitrary and capricious to describe the decision because no one from Google has explained the move in a public forum. I'm sure there were some black hat SEO types using the Search API for nefarious purposes—and I'm guessing the decision to shut it down stems in part from that. (Not to mention that SOAP has fallen out of favor.) But that's like throwing the baby out with the bathwater. Why such a limited alternative? (The API has been "replaced" by the Ajax API, a very limited cousin of the Search API.) And why not move to a REST API if that's where the developer preference winds have blown? In one of my emails I mentioned that this decision reminds me of Microsoft's decision to scrap their MVP program (which was later reinstated). Why alienate future potential power users?

Also, the new Hackzine blog picked up a tip I stumbled on for grabbing an API Key while the grabbing is good: Get a Google SOAP API Key.

Update: Slashdot weighs in. For them it's all about the failure of SOAP, not a misstep by Google.

Gift Idea for new digital photographers

O'Reilly just published a digital media gift guide via press release, and it includes a bunch of great books for the MP3 and digital photo crowd. But they left out one item, imho. (Warning, what is about to follow is blatant self-promotion—but I still think it's a good idea.) If someone you know is getting a new digital camera (or receiving a digital camera or cameraphone for the first time), give them the ultimate accessories: a Flickr Pro account and a copy of Flickr Hacks. Digital cameras should ship with some sort of pipeline to Flickr. And until they do, you can encourage sharing with the account and get them up to speed with almost everything Flickr can do with the book. (This is my own digital media press release.)

And speaking of Flickr Hacks, co-author Jim Bumgardner recently joined Yahoo. He'll be doing his brand of audio and visual Flash hackery for Yahoo Music. Congrats, Jim! (And Yahoo!)

Update (12/18): Flickr added a Flickr Gifts page, and Paul Stamatiou put together a Flickr Gift Guide for the Flickrist in your life. (Alas, no mention of Flickr Hacks, but you know better!)