google

Finding Lost URLs

A week or so ago, a page by Professor Solomon called The Twelve Principles made the link rounds. The prof lays out a 12-step plan for finding any lost object. Most of the principles are mental tricks to get you back to the place you lost a physical object: your keys, your glasses, your cellphone, etc.

Unfortunately, the principles don't translate well to digital objects like URLs. You didn't stick that URL for the Xbox hacking How-To in your junk drawer, and it's not likely to be stuck in the "Eureka Zone" under your keyboard. But I lose URLs all the time. I remember something I saw on the web a couple weeks ago and I can't figure out how to get there again.

I don't have anything close to a 12-principle system for finding lost URLs, but I thought it'd be fun to examine my haphazard ways of re-finding web things. These are probably obvious, but I thought collecting them together would help me start a system for finding those lost pages, blog posts, and other digital artifacts that I'd like to see again.

1. Google - As you already know, Google is great at finding things, and I can usually get back to old URLs by remembering keywords for the document. Even if I don't find exactly what I was after, I can sometimes find good substitute information on the same subject. Unfortunately, a query like "SQL Remove Duplicates" will bring up thousands of documents, so if I'm looking for a specific bit of code I once found for removing duplicate records in a database, the search has to go to the next stage.

2. Browse Browser History - Ctrl-H in the browser will bring up your surfing history, and it can be a lifesaver if I know I visited the URL within the last week or two. It's especially helpful if I can remember the approximate time I was visiting the page I want to find and I sort the history by date. But because browser histories only show the domain and page title, they're not very useful if I simply remember the subject of the page. I don't think of pages in terms of the domains they're hosted on; I think in terms of the page's content. (Searching your browser cache with something like Google Desktop might be better because you can search the full text of your browsing history, but I haven't started using this regularly.)

3. Revisit Web Haunts - Chances are good that I found the link I'm looking for at one of the sites I read regularly. Since I follow hundreds of sites with the news reader Bloglines, this can be a big search. Unfortunately the "Search My Subscriptions" feature at Bloglines isn't working for me, so generally I'll try to narrow down which site would have had the URL and then go back in time for each site individually using the "Display items within the last x" feature. Then Ctrl-F can help me find specific keywords within past posts. Google can also come in handy here. If I know I spotted a link about SQL on O'Reilly Radar, I can use the site: keyword like this: site:radar.oreilly.com SQL.

4. Search People - del.icio.us just rolled out a feature called your network that lets you track other del.icio.us members. There's no search yet, but you can browse back in time to see what people you know bookmarked at del.icio.us. I think this'll be handy, and I have gone back into specific people's del.icio.us archives looking for a URL. Having them all in one place is good for browsing, and saves time if I can't remember exactly who posted the link I'm looking for.
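The site: trick from item 3 is easy to script if you find yourself running the same kind of search over and over. Here's a minimal Python sketch; the function name site_search_url is made up for illustration, and it simply builds the same query you'd type into the Google search box.

```python
from urllib.parse import urlencode


def site_search_url(domain, keywords):
    """Build a Google search URL limited to one domain with the site: operator."""
    query = f"site:{domain} {keywords}"
    # urlencode percent-encodes the colon and turns the space into a plus sign
    return "https://www.google.com/search?" + urlencode({"q": query})


print(site_search_url("radar.oreilly.com", "SQL"))
```

Paste the printed URL into a browser and you get the same results as typing site:radar.oreilly.com SQL by hand.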

del.icio.us leads into my primary strategy for finding lost URLs: make links more findable before they're lost. Here's how I do it.

1. Use Web-based Bookmarks - I use del.icio.us (my bookmarks), but there are a bunch of web bookmark systems out there. When I come across a URL I know I'm going to want to get back to at some point, I'll click the del.icio.us bookmarklet and tag it. Searching my del.icio.us bookmarks is easy, but like your browser history, you're only searching titles, tags, and notes, not the full text of the site you bookmarked. Yahoo's My Web, and Google's Personalized Search both do better on the searching front—which leads to...

2. Turn on Search History - Privacy implications aside, I've found Google's Personalized Search handy for finding lost URLs even though I have mixed feelings about it. Once enabled, Google will remember every query you make and every search result you clicked on. You can then search just those sites that you clicked on in the past. Of course, that means everything you've searched for and every site you've clicked on is stored in a digital archive somewhere. I go back and forth, but privacy usually trumps findability for me so I might remove this option from my toolbox soon.

I should echo Professor Solomon's 13th principle: sometimes you can't find what you're after and you have to give up. The Web is ephemeral and pages come and go all the time. Even though it's maddening not to be able to get back to a document I know I've seen, that's life. What strategies am I missing?

Add a batch of dates to Google Calendar

I've always used several calendars to plan out my life. Until recently, I used a paper desk calendar to track work-related events like project milestones. I used an insanely hacked-up version of PHP Calendar to track daily appointments and travel plans. And I used a paper calendar hanging in the kitchen to track family events like birthdays and anniversaries. To be honest, even with all of those calendars I still wasn't very organized. The distinctions between types of events and calendars weren't as clear-cut as I'm describing them, and I'd often have a work project milestone on my kitchen calendar, or a birthday in PHP Calendar, rather than in their "proper" locations.

What I like about Google Calendar is the ability to lay several calendars on top of each other. So I can keep the family birthdays separate from the project milestones, but I can still show them all on one calendar if I need to. And with a click, I can remove the dates that aren't relevant for what I'm working on at the moment. The calendar list looks like this:

[Image: calendar controls]

I decided to make Google Calendar my One Calendar To Rule Them All, and the switch has been very easy. The Ajaxy interface makes adding events insanely intuitive: click a day to add an event on that day. And I love the ability to click and drag several days to add weeklong events like conferences. The other big advantage to going digital is the ability to share calendars with other people. I can't easily send all of the data on my paper calendars to friends and family without Xerox and FedEx involved.

The one issue I ran into during the conversion was with family events. I had over 50 birthdays and anniversaries I wanted to add to a calendar, and the thought of clicking Create Event and adding data for each one, or worse—hunting and pecking to find a particular day to click—wasn't appealing. So I thought I'd share my method for dumping a bunch of dates into Google Calendar. You just need a little time to get your dates together, some Perl, and a Google Calendar account.

Import/Export

Google Calendar doesn't have an API (yet), but it does have a hacker's little friend called import/export. Google accepts two types of calendar formats for import: iCalendar and Outlook's Comma Separated Values (CSV) export. So if you already have calendar data in Outlook or iCal, you can simply import/export at will. (Yahoo! Calendar also exports to the Outlook CSV format, so switching is fairly painless.) But I didn't know the first thing about either of these formats; I simply had a list of dates I wanted to dump.

Gathering Dates

I had a head start because I already had a list of family birthdays and anniversaries in a text file. I massaged the list a little to get it into a data-friendly format, and ended up with a file full of dates that looked like this:
4/18/1942,Uncle Bob's Birthday
4/28/1944,Aunt Sally's Birthday
7/23/1978,Lindsay and Tobias' Anniversary
8/10/1989,Cousin Maeby's Birthday
...
(obviously not real data.)

If you're building a list of dates from scratch you can use Excel. Just put dates in the first column in mm/dd/yyyy format, descriptions in the second. When you're done, save the file in CSV format, ignoring all the warnings about compatibility.

I called the file family_dates.csv. Yes, this is a comma-separated value list too, but not the format Google Calendar is expecting. Plus you don't want to add an event on April 18th, 1942. You want to add a full day event for April 18th, each year going forward. This is where I turned to Perl to massage the data.

The Code

This simple Perl script, calendar_csv.pl, transforms the simple CSV list of dates and titles into the Outlook CSV format that Google likes to see. When you run the script, it converts the year of each event into the current year and adds a copy of the event for each of the next several years.

You'll need to customize the script a bit before you run it. Change $datefile to the name of your simple CSV file, in my case family_dates.csv. You can change $importfile to your preferred name for the output file; the default is import.csv. And you can set the number of years into the future that you'd like each date to appear by adjusting the value of $yearsahead; the default is 5. (If your events should only be added in the current year, set this to 1.)
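If you don't have Perl handy, the same idea is easy to sketch in another language. What follows is not calendar_csv.pl; it's a rough Python approximation of the steps described above, with parameter names borrowed from the script's settings. The column names (Subject, Start Date, All Day Event) are the ones I understand Google Calendar to accept for CSV import, so treat them as an assumption and check them against the current import documentation. Skipping a date that doesn't exist in a given year (February 29th, for example) is my own choice, not something the original script is known to do.

```python
import csv
from datetime import date

# Assumed column names for Google Calendar's CSV import
HEADER = ["Subject", "Start Date", "All Day Event"]


def expand_dates(rows, start_year, years_ahead=5):
    """Turn (m/d/yyyy, title) rows into one all-day event per year."""
    events = []
    for when, title in rows:
        # The original year is thrown away; only month and day matter
        month, day, _original_year = (int(part) for part in when.split("/"))
        for year in range(start_year, start_year + years_ahead):
            try:
                date(year, month, day)  # raises ValueError for impossible dates
            except ValueError:
                continue  # assumption: skip e.g. Feb 29 in non-leap years
            events.append([title, f"{month}/{day}/{year}", "True"])
    return events


def convert(datefile="family_dates.csv", importfile="import.csv", years_ahead=5):
    """Read the simple CSV, expand it, and write the import file."""
    with open(datefile, newline="") as src:
        rows = [row for row in csv.reader(src) if len(row) == 2]
    events = expand_dates(rows, date.today().year, years_ahead)
    with open(importfile, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(HEADER)
        writer.writerows(events)
    return len(events)
```

Calling convert() with the defaults reads family_dates.csv, writes import.csv, and returns the number of events written. Keeping expand_dates separate from the file handling makes it easy to try out on a couple of rows before touching any real files.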

Keep in mind that the more data you have in your calendar, the longer it will take Google to load that calendar when you fire up Google Calendar. I originally set the $yearsahead value to 10, but with over 500 events, the calendar was noticeably slowing the Google Calendar startup.

In addition to Perl, you'll need the Date::Calc module, which is available from CPAN if it isn't already installed.

And if you're not in the US and would prefer dd/mm/yyyy format, simply change this bit: my ($month, $day) = to this: my ($day, $month) =. Instant internationalization!
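The reason this has to be a setting rather than something the code figures out on its own is that a date like 4/5/1990 is valid either way. Here's a small Python sketch of the same switch; the function name parse_simple_date and its day_first flag are made up for illustration and aren't part of the Perl script.

```python
from datetime import datetime


def parse_simple_date(text, day_first=False):
    """Parse m/d/yyyy (US order) or d/m/yyyy text into a date object."""
    fmt = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    # strptime raises ValueError if the text doesn't fit the chosen order
    return datetime.strptime(text, fmt).date()


us = parse_simple_date("4/18/1942")
intl = parse_simple_date("18/4/1942", day_first=True)
print(us == intl)
```

Both calls describe April 18th, 1942. Feeding "18/4/1942" to the US-order parser raises a ValueError, which is a handy early warning that the file is in the other format.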

Once everything is set, run the script from a command prompt, like this:

perl calendar_csv.pl

A new file called import.csv will magically appear with your dates formatted as Outlook CSV events. With the file in hand you can head over to Google Calendar.

Importing Data

Over at Google Calendar, click Manage Calendars under your calendar listing on the left side. Choose Create new calendar, and give your calendar a name and any other details. Click Create Calendar, and you'll see the new calendar in your list. Now click Settings in the upper right corner of the page, and choose the Import Calendar tab. Click Browse..., choose import.csv from your local files, set the calendar to your new calendar, and click Import.

That's all there is to it. You'll get a quick report about the number of events Google was able to import. Go back to your main view, and you should see your imported dates on the calendar, in the color of your newly created calendar. With one import, my view of April went from this:

[Image: calendar pre import]

To this view, with family birthdays in the rust color:

[Image: calendar post import]

(The details have been removed to protect the innocent.)

And once you have your calendar in Google, you can invite others to view and even help maintain the dates. Where I think this batch importing will be useful is for very large data sets. Imagine a teacher who wants to track the birthdays of students. It wouldn't be too hard to add the dates by hand. But a principal who wants to track the birthdays of everyone in a school will have an easier time putting together a spreadsheet than entering the days by hand. And even for my 50+ dates, writing a Perl script was preferable to entering the dates by hand.

So far I'm enjoying Google Calendar, and I haven't found any major problems beyond the limited importing ability. But now I really don't have an excuse for not sending out birthday cards.

Update (4/20): Google just released their Google Calendar API. I'll bet there are scores of hackers rushing to build bulk-import tools. Using the Calendar API would be a more stable way to import dates quickly. And wow! Hello, lifehackers!

I'm Feeling Googly

After focusing on Yahoo! and Flickr for most of 2005, I've been kicking off 2006 by poking, prodding, and generally hacking another side in the search wars: Google. I'm going to be bringing Google Hacks up to date and into its 3rd edition.

The first edition of Google Hacks was published in February 2003, and it was a runaway success. Here's an article Tim O'Reilly wrote just months after it was released: Thoughts on the Success of Google Hacks. (The key ingredient? Having fun with technology during the darkest post-bubble days.) Google Hacks, 2nd edition was released in December 2004 during the mad frenzy to get a Gmail account. (Doesn't that seem like ancient history?) It's been over a year, and there are plenty of new topics to cover. 2005 was the year of Google Maps mashups, and O'Reilly felt the topic deserved its own book: Google Maps Hacks. It's out now, and it rocks! (Please note the O'Reilly bias, but seriously, it's good.) I'll be including a few Google Maps hacks in the new edition, along with many, many more new Google features that you can tweak to your advantage.

And of course I'll be keeping a close eye on the news that Feds are after Google data. Wired News is already on the case letting people know that there are some privacy hacks you can use with Google or any other search engine. Personally, I'm happy to see Google standing up for their users' privacy.

I'm very excited to be adding to what's already a fantastic book, and I'm honored to be walking the trail that Tara and Rael blazed. Plus I get to play with all of the Google goodness at google.com and from around the Web. I'm searching for the most useful (and fun!) hacks, tips, and tricks I can find to include in the new edition. Got a Google Hack? Lay it on me.

Update: On a negative but important note, Philipp Lenssen is doing good reporting on the latest news that Google Censors Its Results in China.

* crickets *

If I could embed a sound file of crickets chirping in this post, I would. My non-web world is busy at the moment, so the blog suffers.

Google Maps really is all that.

Here are some weblogs I read regularly, but aren't on my sitegeist sidebar: There are many many more, but that's a start. Back to the crickets.

Orkut at OSU

Hey fellow Oregon geeks, Orkut Buyukkokten from Google (yeah, his namesake Orkut) is going to be speaking at OSU on Monday night. The talk is called: Google: A Computer Scientist's Playground. See ya there!

Smackdown Slashdotted

Woke up this morning, checked the server logs, found Slashdot linking to Google Smackdown. What other site can elicit joy and dread at the same time? Of course the Google API developer key maximum of 1,000 queries has been reached for today, which sucks. But maybe some people will get their own key if they really want to try it out.

So far the server is doing ok with the blip in traffic:

[Image: Internet Inbound Traffic Graph]

I've mentioned it here before, but you can find the code for Google Smackdown in the new O'Reilly book Google Hacks. (Which is currently ranked #10 on Amazon overall!)

Google buys Pyra

In case you haven't heard, Google bought Pyra—the makers of Blogger. (I joined Pyra a few months after Ev and Meg founded it and was there for about two years.) I think this is a good turn of events for everyone who believed and invested in Pyra/Blogger in the early days. (Anyone close to the company has had a bit of a rough ride with ups and downs.) And it feels good personally to see something I believed in and worked hard for enter a new phase with a company like Google. Sometimes I wish I could still be involved with Blogger's development, but life never goes according to plan. We always had fun anthropomorphizing the application—and this feels like Blogger's graduation.