Jun 14, 2011

Twitter is Unix. Facebook is Windows.

Last week, I wrote a long, rambling post about how Twitter is on a path to radically change the way web apps communicate. A friend who actually read the whole thing was kind enough to let me know that it didn’t make a whole lot of sense. So, armed with that feedback, I’m going to take another crack at getting my point across.

I’m going to split my argument into two parts. In this post, we’ll look at how Twitter’s new photo service represents a shift in strategy that will ultimately threaten Facebook. In a follow-on, we’ll look under the hood at the cool new capabilities third-party apps will gain as a result. It turns out that a good way to think about both of these subjects is to look for parallels between Twitter and Facebook and traditional desktop computing platforms, so we'll follow that thread throughout.

Back when the Mac first introduced the world to the graphical user interface, a great way to stir up any gathering of computer geeks was to start a debate about which was better, the GUI or the command line interface (CLI). DOS and Unix power users took the side that, once you mastered all the arcane commands, CLI’s were much more powerful and productive. Mac converts, of course, argued that GUI’s were not only more powerful, they were more approachable, easier to use, and more elegant.

Ironically, both sides were right. For the tasks that a large subset of users were performing at the time (programming, data entry, manipulating text files, etc.), CLI’s were more productive. On the other hand, GUI’s enabled a whole slew of new apps which weren’t possible with a CLI (e.g., WYSIWYG word processing). For these tasks, GUI’s were obviously superior.

We all know how that shook out. GUI’s and the new apps they enabled turned out to be a better fit for the vast majority of computer users, with most of the world winding up on Windows (funny how that worked out). But for a certain class of tasks frequently performed by power users, system administrators, and developers, CLI’s still reign supreme, most notably on Unix-derived platforms. Of course, even the power users tend to use GUI's some of the time, so they gravitate towards platforms with good support for both interaction models.

Powerful vs. Approachable
So, what does this have to do with Twitter and Facebook? Let’s take a look at some UI examples:

Exhibit 1A:
This is one of many ways that a Unix user might search for all the occurrences of “N bottles of beer” (where N is any number) in text files under the current directory. It may be inscrutable if you don’t know the lingo, but once you get the hang of it, this way of doing things is extremely powerful (especially that vertical bar in the middle; more on that in the next post).
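The screenshot from the original post isn't reproduced here, but a pipeline along these lines fits the description (the directory and file names below are made up purely for illustration; note the vertical bar wiring find's output into grep):

```shell
# Build a couple of throwaway files to search (illustration only)
mkdir -p /tmp/lyrics
printf '99 bottles of beer on the wall\n' > /tmp/lyrics/song.txt
printf 'no beer here\n' > /tmp/lyrics/other.txt

# List every .txt file under the directory, then pipe the list into grep,
# which matches any number followed by "bottles of beer"
find /tmp/lyrics -name '*.txt' | xargs grep -E '[0-9]+ bottles of beer'
# -> /tmp/lyrics/song.txt:99 bottles of beer on the wall
```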

Exhibit 1B:
Here’s how you would probably do the search in Windows. It’s both easier (for non-experts) and much less powerful.

Exhibit 2A:
Here's a tweet by a Brazilian TV personality in which he comments on a photo shared by Ashton Kutcher showing the two of them drinking beers in Rio. If nothing else, the information density is quite impressive.

Exhibit 2B:
Finally, here’s a comment on a photo shared in Facebook. It’s a little ugly, but you can probably figure out what’s going on even if you’ve never used Facebook before.

Hopefully, the parallels are obvious. Twitter has a CLI and Facebook has a GUI. This is true of pretty much every aspect of the Facebook and Twitter interfaces. On Facebook, you have a rich photo sharing UI. On Twitter (until last week), 140 characters of text. On Facebook, you have a calendar for scheduling events. On Twitter, text. And the list goes on. As is also the case with Windows, you can’t exactly say that the Facebook interface is elegant, but it certainly gets the job done in a way that’s much more approachable to a broad range of users.

Linux vs. MacOS
It seems pretty clear that Twitter needs a GUI if it’s going to challenge Facebook for mass-market appeal. In fact, the need is so clear that third-party services started filling in the gaps several years ago by building dedicated GUI’s for common Twitter tasks. First came the Twitter-focused photo sharing services (twitpic, yfrog, etc.) that wrapped a nicer interface around image uploading and linking. Then there were the check-in services (Foursquare and Gowalla), with GUI’s optimized for sharing location. And more recently there has been a surge of Twitter-attached news reader GUI’s (Flipboard, Pulse, etc.).

It’s tempting to say that focusing on the core CLI and letting others provide the pretty interfaces is a good strategy. Leaving the GUI’s to developers has the nice effect of giving users a variety of options for any given task; they can build a platform that exactly suits their needs from a bucket of pluggable parts. And not competing with the developer community seems like a good way to keep it healthy and growing, which will lead to more apps on the platform than you could ever build yourself.

But to see the fallacy of this line of reasoning, we just have to look at how this strategy worked out for Linux. The interesting thing about Linux is that the kernel is closely controlled by a small group of core contributors, led by a benevolent dictator. The kernel developers have a laser-focused mission of providing the best possible low-level OS infrastructure, very deliberately leaving it to the community to fill in the rest of the user-level components that go into making a complete platform. It's hard to argue with this approach, as the Linux kernel has made its way into hundreds of millions (billions?) of computing devices of every imaginable type.

But Linux has struggled with adoption in one key market: mass-market consumer desktops. With the job of building a GUI left to the community, an array of projects and organizations obligingly stepped up. As a result, for each component of the desktop environment, users can choose from a plethora of options of varying functionality, interface style, and, unfortunately, quality. This is great for power users, who are happy to have choices and flexibility. But it's a total bomb with most consumers, who are confused by the options and inconsistent styles and want things to just work out of the box.

Contrast this with the other major Unix-derived desktop platform: MacOS. Apple happens to be run by a (less benevolent) dictator who cares as much about creating usable and elegant GUI’s as Linus does about implementing efficient scheduling algorithms. As a result, every Mac ships with not just nicely-designed core GUI components (like the window manager, the Finder, System Preferences, etc.), but a suite of bundled apps covering the most popular use-cases for the platform (iTunes, iPhoto, Mail.app, etc.).

The result is the best out-of-box experience in the industry. And now that MacOS is finally reaching maturity (a process that seems to take about a decade for every new platform), the superior user experience is translating to rapidly growing market share. It’s important to note that they’ve accomplished this without sacrificing one bit of the power of the CLI, since every Mac also ships with a terminal app and a full suite of Unix utilities.

Until last week, Twitter was headed full steam down the Linux path. Its developers seemed intent on building a powerful kernel, exposing it via a text-based interface and a simple but powerful API, and letting the community take care of the rest. But with the launch of the photos UI last week, they’ve changed course. If they follow this up with more UI’s (e.g., check-ins, bookmarks, events, Q&A) without sacrificing the power-user interface, then they have a good shot at becoming the MacOS of social networking.

Platform vs. Developers
One of the hot topics around Twitter's new strategy is whether it actually winds up hurting them by alienating the third-party developers who have been instrumental to their growth so far. It's instructive to look at how this played out on the Mac.

When Apple started bundling apps with MacOS, it, like Twitter, put itself in direct competition with developers. At the time, it was easy to argue that this was bad for the developer community (and it certainly was for the developers of directly competitive apps), but the net result has been that the bundled apps have made for a more compelling platform. That has led to more users, which has in turn created a much larger market for third-party apps. In the long run, it’s been a big positive for the developer ecosystem as a whole, and the Mac app market is thriving.

Twitter is just starting to navigate the same path. In the short run, bundling apps is mostly bad for developers since they’ll be competing with Twitter for the same base of users, with Twitter having a massive advantage. But in the long run, there’s a big opportunity for those who can retarget their apps to be less directly competitive. The bundled apps should lead to a combination of new and more engaged users, which will be a huge win for the developers who survive the transition.

Twitter vs. Facebook
Despite the fact that Apple has been the darling of the tech industry for most of the last decade, the competitive threat from MacOS seems to have somehow caught Microsoft off guard. I suspect the reason has something to do with the fact that MacOS made such a slow and incremental transformation from a self-proclaimed niche OS (remember when Apple used to compare itself to Mercedes?) to a full-featured platform that can do everything Windows can, and then some.

Twitter can, if it plays its cards right, do the same. Compared to Facebook, it’s still a niche service today. A photo sharing UI that isn’t all that different from existing third-party services may not seem like it changes that in any significant way, but if it's the first step of the guification of Twitter, then Facebook should be worried. A Twitter that maintains all the capabilities that make it unique while also acquiring a simple and elegant GUI is likely to be a real threat to the Facebook monopoly.

Jun 9, 2011

iOS, Twitter, and the World Wide Message Bus

When Apple announced the deep Twitter integration in iOS 5 on Monday, I was completely befuddled. Twitter’s user experience is pretty much the antithesis of Apple’s. The main UI is a never-ending list of short blobs of text containing lots of symbols and abbreviations. Can you imagine getting that past Steve Jobs? And while Twitter certainly has a large and loyal user-base, I have a hard time believing that it appeals to anywhere near the full breadth of iOS users. So, if Apple was going to pick one service to anoint, why Twitter?

One answer is that they would have preferred Facebook but the relationship went sour after the Ping debacle. It’s also possible that Apple sees Facebook as more of a competitive threat than Twitter. Both reasons are plausible, but after puzzling over it for a couple of days, it occurred to me that this could instead be a visionary move that foreshadows all sorts of exciting new possibilities for the web.

Just as Apple is using its clout to push the desktop peripherals industry towards a new communication bus, they could be doing the same for web apps by embracing Twitter as the message bus by which iOS will communicate with, well, everyone. The potential for Twitter to play this role has been obvious to many observers for years now, but as with so many other innovations, it looks like Apple could be the first to take it mainstream (and get all the credit in the process).

What’s a Message Bus?

Message buses are best described in Enterprise Integration Patterns (which, despite having Enterprise in its title, is a must-read for anyone designing connected systems). In a nutshell, when developers need to make apps talk to one another, their first try is usually a Service-Oriented Architecture (SOA). Each app publishes a well-defined public interface which is accessed directly by every app that needs to communicate with it. This is miles better than reaching into a shared database (in an enterprise context) or scraping html (in a web context) because it allows the apps to make all sorts of internal changes without breaking one another, so long as they maintain compatibility with their published API’s. SOA is ubiquitous on the web at this point, with almost every popular web app exposing an API.

SOA’s have quite a few shortcomings, however. For the purposes of our discussion, the most important ones are that: 1) App developers have to know about all of the other apps they’ll need to talk to in advance, 2) If an app needs to change its public interface, then all of the apps that talk to it will have to be updated, and 3) Because a direct integration is required, competitive concerns often prevent popular apps from talking to one another.

The result is that apps only talk to a predetermined handful of other apps, and keeping everything working is a big pain for everyone involved. If you’re building, say, a mobile OS with deep support for web services, then you’re faced with a real dilemma. Which handful of services should I integrate with? What if they change their API’s and I have to push an OS update to deal with it (which users may or may not actually install)? What if they won’t give me business terms that I like?

The popular solution in enterprise apps these days is to use a messaging system for integration. Rather than directly communicating with one another, apps instead publish messages into a message channel, which queues them for processing. This processing usually involves routing them through a series of additional channels, potentially transforming them along the way, before ultimately delivering them to one or more other apps. The apps don’t have to know anything about one another; they only need to be able to receive and publish messages via channels. The messaging system takes care of wiring them together.

The whole system of channels, queues, and routers is often called a message bus. It may sound unnecessarily complicated, but in practice, the loose coupling that results from a bus-centric architecture makes life a lot easier for everyone involved, especially when it comes to integrating a large number of apps over a long period of time. And more importantly, it’s infinitely more flexible, allowing apps to be connected in ways that their developers never anticipated.
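The idea is easy to sketch in the Unix spirit the first post celebrated: publishers, a router, and subscribers that know nothing about one another. Everything below (the channel directory, the one-line message format) is invented for illustration:

```shell
# A toy message bus: a publisher emits typed messages, a router fans each
# one out to a per-type channel file, and subscribers read only the
# channels they care about.
rm -rf /tmp/bus; mkdir -p /tmp/bus

# Publisher: each message is "type payload"; the router appends the
# payload to the channel file named after its type
printf 'photo sunset.jpg\ncheckin Rio\nphoto beers.jpg\n' |
while read -r type payload; do
  echo "$payload" >> "/tmp/bus/$type"
done

# A photo subscriber reads only the photo channel, with no idea who
# published the messages or what else is flowing through the bus
cat /tmp/bus/photo
# -> sunset.jpg
# -> beers.jpg
```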

The Twitter Bus

At this point, Twitter’s role as a message bus should be obvious. As a user, I want to see articles from the New York Times in Flipboard. Instead of hoping that the NYT and Flipboard will establish a business relationship that results in the NYT adding yet another button to all its articles, I just follow the NYT on Twitter. The NYT publishes “new article” messages into its tweet stream (which is actually a message channel), and Twitter routes them into my home timeline (another message channel), where they’re picked up by Flipboard and beautifully rendered on my iPad. Flipboard and the NYT don’t need to know anything about one another for this to work.

It’s even more powerful when there are other channels in between. Instead of directly following the NYT, I follow another user who, by retweeting, routes a message from the NYT’s channel to my channel via her own, providing a value-added filtering service in the process. The NYT doesn’t even need to have a Twitter feed for this to work, as users can act as human message bridges by pasting links into tweets.

Where this gets really interesting, though, is when you consider what happened when Twitter launched their photo service last week. Under the hood, they’ve created a new channel just for photo messages and added some metadata to tweets so that photos can be easily identified and routed. They’ve also built a new service which subscribes to this channel and extracts and indexes that metadata so that photos can be, surprise, displayed in an interface optimized for viewing photos.

For now, it appears that these messages will still always be accompanied by a short piece of human-readable text and will also be routed to your home timeline, but there’s no reason why this has to remain the case. If you think of each tweet as a collection of typed messages, (which currently include text, location, and now photo messages), then there’s really no reason why a tweet needs to include the text. And if there’s no text, then there’s no need for it to be delivered to your home timeline.
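As a toy illustration of that idea (the one-line message format below is entirely invented), the home timeline becomes nothing more than a filter on message type:

```shell
# Each line is "tweet-id type payload"; a tweet is just the set of typed
# messages sharing an id. (This format is made up for illustration.)
cat > /tmp/tweets.log <<'EOF'
t1 text Drinking beers in Rio
t1 photo http://t.co/abc123
t2 location -22.9,-43.2
t3 text Shipped a new blog post
EOF

# The "home timeline" is then just the text messages; t2, a bare check-in
# with no text part, never shows up in it
awk '$2 == "text" { $1 = ""; $2 = ""; sub(/^  /, ""); print }' /tmp/tweets.log
# -> Drinking beers in Rio
# -> Shipped a new blog post
```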

Where they go from here seems pretty clear: they’ll create channels and define canonical formats for some or all of the important message types that are currently flooding Twitter. Looking back over my timeline for the last couple of days, I see all sorts of different things shoehorned into tweets:

  • Link shares
  • Status updates
  • Photo shares
  • Check-ins
  • Group chats
  • Event announcements
  • Questions and answers
  • New blog post announcements
  • Job postings
  • Micro reviews
  • Product support requests

Once these messages are segregated into their own channels, Twitter can create dedicated UI's for them, so that you can, for example, see all the events on a calendar, all the check-ins on a map, all the articles in a news reader, etc. (and, yes, many tweets fall into more than one bucket and should therefore be routed to multiple channels).

The home timeline can become more like the Facebook news feed: a summary of what’s going on in all the other channels, with each update rendered in a way that’s appropriate for the message type. Or, if you don’t want to stray too far from the current experience, it can just be a filtered view of those tweets that include text messages.

It's important to note that while we're really just interested in the canonical message formats and the new channels, neither of those is particularly useful unless app developers add support for them, and it's unlikely that they'll do this without some sort of carrot. Having their messages show up in the new type-specific UI's is the carrot that bootstraps the process, so the new front-ends wind up playing a critical role in driving adoption of the new backend features.

Will the magic be gone?

Before looking at the super-exciting implications for app developers, I want to take a moment to address the question that’s probably on everyone’s minds at this point: Won't transforming Twitter into a combination of systems integration plumbing and Yahoo kill the magic that makes it so special? After all, a lot of its beauty and appeal comes from the fact that you can do so many different things all in one place and in one constrained format. Will the cool kids still use it to hobnob about node.js and organize revolutions when the Yahoo Answers crowd shows up?

My take is that making this transition is the only way to save the magic. As more and more apps hijack my Twitter timeline as a communications channel, it’s getting harder to follow the stuff that has made Twitter special from the beginning: micro-blog posts from people I care about. Putting those posts into their own filtered channel and routing everything else elsewhere is the only way to keep alive a service that in any way resembles the one Twitter launched with. It will just happen to be one of many apps hosted on the Twitter platform.

What about developers?

Now that we've gotten that out of the way, what does all this mean for app developers? In short, it changes everything. For starters, the Twitter user-base is about to get a lot bigger and more engaged, so you ignore it at your own peril. If you had asked me last week which was the most important social service to support in your app, I would have said, “Facebook, duh.” This week, I’m inclined to say Twitter.

On top of that, if you’re building a general-purpose app for creating and viewing one of the important message types, then you’re going to find yourself competing with Twitter. If you want to stay in business, you should be thinking long and hard about how you’re going to differentiate.

As a thought experiment, let’s take a look at what’s likely to happen to all the check-in apps when Twitter inevitably launches a competitive service. From what I understand of Foursquare and Gowalla (which admittedly isn’t much), they haven’t gotten a ton of traction with any use-cases beyond saying, “Hey, I’m here,” and a large portion of their check-ins are shared on Twitter. Once Twitter adds native check-in support, I don’t see any compelling reason to use the Foursquare app to hit the check-in button. And perhaps more importantly, because they’re not providing much of value at foursquare.com, I don’t see any good reason to click on a link in someone's Foursquare check-in if I can see the location, status, and photo directly on Twitter.

Contrast this with Yelp’s check-in service. The Yelp app can let you check-in and write a review at the same time, so it makes sense for me to check-in there. And because there’s deep content at yelp.com, I have good reason to click on a link in someone's Yelp check-in. On top of that, they’ve done a great job of fostering a passionate community that keeps users coming back.

Twitter’s check-in service will only be a good thing for Yelp. They can push their check-ins to Twitter, where they’ll become visible in Twitter’s check-in view, which will in turn drive traffic back to Yelp. In addition, they can consume Twitter check-ins from Yelp users, which may originate either directly from the Twitter app or from another check-in app that publishes to Twitter (this last point is key: Yelp can now consume check-ins from apps that its developers may never have heard of). Yelp can then both display the check-ins directly and use them as an input to a recommendations engine.

If I were running Gowalla, I’d be looking for a way to build a more focused community, to build deep content on my site that will drive click-throughs on check-ins displayed in the Twitter app, and to allow users to do something else useful at the same time they’re checking in. For example, I might take advantage of the fact that I’m headquartered in the Live Music Capital of the World to focus the app around check-ins at live music events.

Music lovers are a great demographic to build a community around, and it’s easy to build deep content and services for them (band info, venue schedules, music purchases, ticket sales). I can do something useful when I check in (rate the band and the venue and access all of the above services), and the check-in data can feed into a recommendations engine that helps me discover new bands and events or even people with similar interests.

The Cool Part

Ok, when this all shakes out, services like Yelp will be able to do more or less what they’re already doing, only better, and some others will go under if they don’t adapt. But the really cool part is all the new possibilities that arise when anyone can connect any two apps (something Unix users have been doing for the last forty years, by the way). For this to happen, we’ll need a little bit of extra functionality in the Twitter API. Specifically, apps will need to be able to create new message channels and routing rules in a user’s account.

To illustrate how this would work, I’ll use photos as an example. Let’s say I use SmugMug as my main photo repository. When I connect it to my Twitter account, it creates two message channels for each of my albums: an inbox, which it monitors for new photos to add to the album, and an outbox, for publishing notifications that new photos have been added. By creating a rule to route messages to an album’s inbox, I can now use any app that outputs photo messages as a SmugMug uploader. This can be my phone, another photo service, an image editor, my Twitter stream, etc. Likewise, by routing messages from the outbox to a channel associated with another app, I can cause all photos that are added to the album to be tweeted, uploaded to Facebook, printed via Shutterfly, etc. The possibilities are endless.

Setting up routing rules may sound too hard for ordinary users, but it doesn’t have to be. When an app creates a channel, it specifies a human-readable name, a list of supported content-types, and an ACL. All SmugMug needs to do is add two buttons to its interface. A “Link Album” button triggers SmugMug to query Twitter for a list of all channels in my account which support photo messages and that SmugMug has permission to publish to. SmugMug shows their names to me (e.g., “benstrong@facebook - Wall Photos” and “Ben’s Epson Artisan 835”). I pick one, SmugMug tells Twitter to create the routing rule, and we’re done. The other button is “Send Photo”, which allows me to publish a single photo to a specified channel (e.g., one that’s linked to the print queue on my home printer).
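None of this API exists yet, but the mechanics are simple enough to sketch, with plain files standing in for channels and routing rules (every name here is hypothetical):

```shell
# Channels and routing rules as plain files. A rule "src dst" means:
# every message published to src is also delivered to dst.
rm -rf /tmp/channels; mkdir -p /tmp/channels
echo "album1.outbox printer.inbox" > /tmp/channels/rules

# publish <channel> <message>: append to the channel, then re-deliver the
# message to every channel the routing rules link it to
publish() {
  echo "$2" >> "/tmp/channels/$1"
  while read -r src dst; do
    [ "$src" = "$1" ] && echo "$2" >> "/tmp/channels/$dst"
  done < /tmp/channels/rules
}

# SmugMug announces a new photo; the rule routes it to the printer's queue
publish album1.outbox "new photo: beach.jpg"
cat /tmp/channels/printer.inbox
# -> new photo: beach.jpg
```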

This example barely scratches the surface of what you can do with this architecture, especially when you start to consider creating longer pipelines of apps, getting people involved in the processing and routing of messages, and taking advantage of the social graph to direct message flow. I can't even imagine all the possibilities, but I can guarantee you that making this infrastructure available to developers will kick off a huge wave of innovation.

Back to Apple

So how does this play out on iOS? For starters, all of those apps that now have Twitter support baked in will gain the ability to publish to any destination on the web. And third-party iOS developers will be able to easily add this capability, too. In addition, I expect that Apple’s apps will also become message consumers. So, for example, the iOS Photos app will be able to view photos from any service publishing to Twitter.

This is a big deal for iCloud, too. Undoubtedly, iCloud will launch with Yet Another Web API. In the normal course of events, Apple would have to wait a non-trivial amount of time for a critical mass of apps to add support for it. But with Twitter as an intermediary, they can pick up tons of apps without any of them having to do iCloud-specific integration work.

Back to Reality

So is this really what Apple and Twitter are planning? I don’t know, but I hope so. Apps are already using Twitter as a message bus, but in an ad hoc way that both doesn’t realize its full potential and is slowly eroding the core micro-blogging experience. So, even if Twitter is still reluctant to depart from their original mission, I think there’s a good chance they’ll be forced to do something along these lines just to preserve it.

And if they don’t do it, someone else surely will. Twitter, with its unique combination of a large user-base and social graph, a scalable messaging infrastructure, and a ton of app integrations, is undoubtedly best positioned to pull it off, but Facebook, Google, and Microsoft probably all could, as well (the best long-term solution is a completely decentralized bus, but that’s a subject for another post).

In any case, my money is on Twitter to do it first. They’re already so close, and lots of forces are at work pushing them to finish the job. I have to believe they’re doing everything they can to capitalize on the opportunity.

Nov 26, 2010

Slow-Start Follow-Up

I guess it was a slow news day today, because my post on how sites are goosing the tcp slow-start algorithm has received a couple of orders of magnitude more attention than I expected. On the one hand, it's been pretty cool for my second blog post ever to make it to the front page of HN, reddit, and Slashdot. On the other hand, I feel awkward about the whole thing because, while I'm reasonably familiar with the issues I'm talking about, I'm by no means an expert. And in retrospect, I probably should have used a less inflammatory title for the post (though it's a little late to retract it now, so I'll leave it as is).

So, this is probably a good time to couch my observations with a few qualifications:
  1. My analysis wasn't particularly rigorous, and I didn't even save my data (other than in the blog post). If I had known how much attention this would get, I would have done a lot more test runs of more sites from more locations (and I may as a follow-up).
  2. It's entirely possible that I'm misinterpreting what I saw. In particular, there may be some cwnd caching going on somewhere a la RFC-2140. However, since Google mentions that they don't believe cwnd caching is effective in their presentation on IW10, I think that's unlikely in the www.google.com case. It does, however, seem like a plausible explanation for the www.microsoft.com behavior.
  3. A number of commenters have taken issue with my invocation of network neutrality. They make a good point. This isn't a net neutrality issue, strictly speaking. It is (arguably) an issue of network fairness, however.
Those points aside, my goal was to spark some discussion in the web development community around whether it's time for widespread deployment of a larger IW in spite of the fact that the tcp community is still divided on the issue. That mission, at least, seems to have been accomplished.

Nov 25, 2010

Google and Microsoft Cheat on Slow-Start. Should You?

A Quest for Speed

I decided a couple of weeks ago that I wanted to build an app, most likely a web app. Being a premature optimizer by nature, my first order of business (after deciding I need to learn to draw) was to find the absolute fastest way to serve up a web page. The Google home page is the fastest-loading page I know of, so I thought a good place to start would be to figure out how they do it and then replicate their strategy.

The full story of my search is below, but the short version is that to match Google's page load times you have to cheat on the tcp slow-start algorithm. It appears that stretching the parameters a little bit is fairly common, but Google and Microsoft push it a lot further than most. This may well be common knowledge in web development circles, but it was news to me.

Some Sleuthing

My first step was to measure the load time of www.google.com over my home cable modem connection. As a first pass, I timed the download with curl:

$ time curl www.google.com > /dev/null  
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  8885    0  8885    0     0   115k      0 --:--:-- --:--:-- --:--:--  173k

real    0m0.085s

Holy smokes, that was fast! We were able to open a tcp connection, make an http request, receive an 8KB response, and close the connection all in 85ms! That's even faster than I expected, and demonstrates that it should be possible to build an app with a page load time below the threshold that humans perceive as instantaneous (about 150ms, according to one study). Sign me up.

Curious about how they pulled that off (did someone sneak into my house and install a GGC node in the attic?), I fired up tcpdump to take a closer look. What I saw surprised me:

$ tcpdump -ttttt host www.google.com

# 3-way handshake (RTT 16ms)
00:00:00.000000 IP > Flags [S], seq 2726806947, win 65535, options [mss 1460,nop,wscale 3,nop,nop,TS val 949329348 ecr 0,sackOK,eol], length 0
00:00:00.016255 IP > Flags [S.], seq 3505557820, ack 2726806948, win 5672, options [mss 1430,sackOK,TS val 688795316 ecr 949329348,nop,wscale 6], length 0
00:00:00.016376 IP > Flags [.], ack 1, win 65535, options [nop,nop,TS val 949329348 ecr 688795316], length 0

# client sends request and server acks
00:00:00.017437 IP > Flags [P.], seq 1:180, ack 1, win 65535, options [nop,nop,TS val 949329348 ecr 688795316], length 179
00:00:00.037139 IP > Flags [.], ack 180, win 106, options [nop,nop,TS val 688795338 ecr 949329348], length 0

# server sends 8 segments in the space of 3ms (interspersed with client acks)
00:00:00.067151 IP > Flags [.], seq 1:1419, ack 180, win 106, options [nop,nop,TS val 688795368 ecr 949329348], length 1418 # segment 1
00:00:00.069693 IP > Flags [.], seq 1419:2837, ack 180, win 106, options [nop,nop,TS val 688795368 ecr 949329348], length 1418 # segment 2
00:00:00.069814 IP > Flags [.], ack 2837, win 65405, options [nop,nop,TS val 949329349 ecr 688795368], length 0
00:00:00.069918 IP > Flags [.], seq 2837:4255, ack 180, win 106, options [nop,nop,TS val 688795368 ecr 949329348], length 1418 # segment 3
00:00:00.070374 IP > Flags [P.], seq 4255:4711, ack 180, win 106, options [nop,nop,TS val 688795368 ecr 949329348], length 456 # segment 4
00:00:00.070486 IP > Flags [.], ack 4711, win 65525, options [nop,nop,TS val 949329349 ecr 688795368], length 0
00:00:00.070796 IP > Flags [.], seq 4711:6129, ack 180, win 106, options [nop,nop,TS val 688795368 ecr 949329348], length 1418 # segment 5
00:00:00.070847 IP > Flags [.], seq 6129:7547, ack 180, win 106, options [nop,nop,TS val 688795368 ecr 949329348], length 1418 # segment 6
00:00:00.070853 IP > Flags [P.], seq 7547:8109, ack 180, win 106, options [nop,nop,TS val 688795368 ecr 949329348], length 562 # segment 7
00:00:00.070876 IP > Flags [.], ack 7547, win 65228, options [nop,nop,TS val 949329349 ecr 688795368], length 0
00:00:00.070900 IP > Flags [.], ack 8109, win 65512, options [nop,nop,TS val 949329349 ecr 688795368], length 0
00:00:00.070962 IP > Flags [P.], seq 8109:9501, ack 180, win 106, options [nop,nop,TS val 688795368 ecr 949329348], length 1392 # segment 8
00:00:00.070990 IP > Flags [.], ack 9501, win 65408, options [nop,nop,TS val 949329349 ecr 688795368], length 0

# connection close (RTT 22 ms)
00:00:00.071300 IP > Flags [F.], seq 180, ack 9501, win 65535, options [nop,nop,TS val 949329349 ecr 688795368], length 0
00:00:00.093299 IP > Flags [F.], seq 9501, ack 181, win 106, options [nop,nop,TS val 688795393 ecr 949329349], length 0
00:00:00.093469 IP > Flags [.], ack 9502, win 65535, options [nop,nop,TS val 949329349 ecr 688795393], length 0

On the performance front, this is really exciting. They actually managed to deliver the whole response in just 70ms, 30ms of which was spent generating the response (come on Google, you can do better than 30ms). That means that a load time under 50ms should be possible.

How they accomplished that is what surprised me. The rate at which a server can send data over a new connection is limited by the tcp slow-start algorithm, which works as follows: The server maintains a congestion window which controls how many tcp segments it can send before receiving acks from the client. The server starts with a small initial window (IW), and then for each ack received from the client, increases the window size by one segment until it either reaches the client's receive window size or encounters congestion. This allows the server to discover the true bandwidth of the path in a way that's fair to other flows and minimizes congestion.
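To get a feel for what slow-start costs, here's a toy model (my own back-of-the-envelope sketch, not a faithful kernel simulation) that counts the flights, i.e., round trips, needed to push out a response of a given size:

```python
def flights_needed(total_segments, iw):
    """Toy slow-start model: count the flights (round trips) needed to
    send `total_segments`, growing cwnd by one segment per ack received
    and assuming the client acks roughly every other segment (delayed
    acks). Ignores the receive window and congestion entirely."""
    cwnd, sent, flights = iw, 0, 0
    while sent < total_segments:
        burst = min(cwnd, total_segments - sent)
        sent += burst
        flights += 1
        cwnd += max(1, burst // 2)  # roughly one ack per two segments
    return flights

# Google's ~8.9KB home page is 8 segments of ~1418 bytes:
print(flights_needed(8, iw=3))  # -> 3 flights
print(flights_needed(8, iw=8))  # -> 1 flight
```

Under that model, an IW of 8 saves two round trips over an IW of 3 for this response, which is roughly the ~50ms win visible in the trace at a ~25ms RTT.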

If you look at the trace, though, you'll notice that the server actually sends the entire 8-segment response before there's time for the first client ack to reach it. This is a clear violation of RFC-3390, which defines the following upper bound on the IW:

The upper bound for the initial window is given more precisely in (1):

   min (4*MSS, max (2*MSS, 4380 bytes))

Note: Sending a 1500 byte packet indicates a maximum segment size
(MSS) of 1460 bytes (assuming no IP or TCP options).  Therefore,
limiting the initial window's MSS to 4380 bytes allows the sender to
transmit three segments initially in the common case when using 1500
byte packets.

www.google.com is advertising an MSS of 1430 (see the SYN-ACK in the trace), allowing it an IW of 3 segments according to the RFC. In our trace, though, they appear to be using an IW of at least 8, which shaves roughly 2 round trips (~50ms) off this request compared to an IW of 3. This raises the question: just how far will they go? Let's request a larger file and see what happens:

$ tcpdump -i en1 -ttttt host www.google.com

# 3-way handshake (RTT 22ms)
00:00:00.000000 IP > Flags [S], seq 2589435808, win 65535, options [mss 1460,nop,wscale 3,nop,nop,TS val 949341091 ecr 0,sackOK,eol], length 0
00:00:00.022780 IP > Flags [S.], seq 4085145017, ack 2589435809, win 5672, options [mss 1430,sackOK,TS val 990595778 ecr 949341091,nop,wscale 6], length 0
00:00:00.022913 IP > Flags [.], ack 1, win 65535, options [nop,nop,TS val 949341092 ecr 990595778], length 0

# client request and server ack
00:00:00.023699 IP > Flags [P.], seq 1:193, ack 1, win 65535, options [nop,nop,TS val 949341092 ecr 990595778], length 192
00:00:00.048205 IP > Flags [.], ack 193, win 106, options [nop,nop,TS val 990595802 ecr 949341092], length 0

# server sends 9 segments in 4ms (interspersed with client acks)
00:00:00.082766 IP > Flags [.], seq 1:1419, ack 193, win 106, options [nop,nop,TS val 990595836 ecr 949341092], length 1418
00:00:00.083077 IP > Flags [.], seq 1419:2837, ack 193, win 106, options [nop,nop,TS val 990595836 ecr 949341092], length 1418
00:00:00.083118 IP > Flags [.], ack 2837, win 65405, options [nop,nop,TS val 949341092 ecr 990595836], length 0
00:00:00.083284 IP > Flags [P.], seq 2837:3966, ack 193, win 106, options [nop,nop,TS val 990595836 ecr 949341092], length 1129
00:00:00.083318 IP > Flags [.], ack 3966, win 65441, options [nop,nop,TS val 949341092 ecr 990595836], length 0
00:00:00.085550 IP > Flags [.], seq 3966:5384, ack 193, win 106, options [nop,nop,TS val 990595836 ecr 949341092], length 1418
00:00:00.085875 IP > Flags [.], seq 5384:6802, ack 193, win 106, options [nop,nop,TS val 990595836 ecr 949341092], length 1418
00:00:00.085976 IP > Flags [.], ack 6802, win 65405, options [nop,nop,TS val 949341092 ecr 990595836], length 0
00:00:00.086045 IP > Flags [P.], seq 6802:8062, ack 193, win 106, options [nop,nop,TS val 990595836 ecr 949341092], length 1260
00:00:00.086067 IP > Flags [.], ack 8062, win 65425, options [nop,nop,TS val 949341092 ecr 990595836], length 0
00:00:00.086601 IP > Flags [.], seq 8062:9480, ack 193, win 106, options [nop,nop,TS val 990595836 ecr 949341092], length 1418
00:00:00.086709 IP > Flags [.], seq 9480:10898, ack 193, win 106, options [nop,nop,TS val 990595836 ecr 949341092], length 1418
00:00:00.086728 IP > Flags [.], ack 10898, win 65405, options [nop,nop,TS val 949341092 ecr 990595836], length 0
00:00:00.086820 IP > Flags [P.], seq 10898:12158, ack 193, win 106, options [nop,nop,TS val 990595836 ecr 949341092], length 1260
00:00:00.086836 IP > Flags [.], ack 12158, win 65425, options [nop,nop,TS val 949341092 ecr 990595836], length 0

# 24ms after first client ack was sent, we get 2 more segments
00:00:00.107116 IP > Flags [.], seq 12158:13576, ack 193, win 106, options [nop,nop,TS val 990595860 ecr 949341092], length 1418
00:00:00.107403 IP > Flags [P.], seq 13576:14651, ack 193, win 106, options [nop,nop,TS val 990595860 ecr 949341092], length 1075
00:00:00.107518 IP > Flags [.], ack 14651, win 65448, options [nop,nop,TS val 949341092 ecr 990595860], length 0

# connection close (RTT 25ms)
00:00:00.107938 IP > Flags [F.], seq 193, ack 14651, win 65535, options [nop,nop,TS val 949341092 ecr 990595860], length 0
00:00:00.129947 IP > Flags [F.], seq 14651, ack 194, win 106, options [nop,nop,TS val 990595884 ecr 949341092], length 0
00:00:00.130071 IP > Flags [.], ack 14652, win 65535, options [nop,nop,TS val 949341093 ecr 990595884], length 0

Interestingly, the server waits for ~1 RTT after sending 9 segments, indicating an IW of 9. This suggests that the value was tuned for the home page (or for the similarly-sized search results page).
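For what it's worth, you can deduce the IW mechanically from a trace like this: count the server's data segments up to the first long inter-arrival gap, on the theory that the gap is where the server stopped to wait for acks. A rough sketch, using the arrival times from the trace above:

```python
def deduce_iw(segment_times_ms, gap_ms=5.0):
    """Guess the IW from the arrival times of a server's data segments:
    the size of the initial burst, i.e., the segments seen before the
    first inter-arrival gap longer than `gap_ms` (taken to mean the
    server paused for acks). Crude, but good enough for eyeballing."""
    burst = 1
    for prev, cur in zip(segment_times_ms, segment_times_ms[1:]):
        if cur - prev > gap_ms:
            break
        burst += 1
    return burst

# arrival times (ms) of the 11 data segments in the trace above:
times = [82.8, 83.1, 83.3, 85.6, 85.9, 86.0, 86.6, 86.7, 86.8, 107.1, 107.4]
print(deduce_iw(times))  # -> 9
```

The 5ms threshold is an assumption on my part; it just needs to sit between intra-burst spacing (sub-millisecond here) and the RTT (~22ms).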

How Common is This?

So, is this common practice that I just never noticed before, or is Google the only one doing it? I thought I'd run traces against a few more sites and try to deduce their IWs. Here's what I found:


It looks like goosing the IW to 4 is pretty common practice, but I was about to give up on finding anyone pushing as far as Google until, almost as an afterthought, I tried www.microsoft.com. You have to see it to believe it:

$ tcpdump -i en1 -ttttt host www.microsoft.com

# 3-way handshake (RTT 92ms)
00:00:00.000000 IP > wwwco1vip.microsoft.com.http: Flags [S], seq 2134062443, win 65535, options [mss 1460,nop,wscale 3,nop,nop,TS val 949406122 ecr 0,sackOK,eol], length 0
00:00:00.091960 IP wwwco1vip.microsoft.com.http > Flags [S.], seq 2932567358, ack 2134062444, win 8190, options [mss 1460], length 0
00:00:00.092094 IP > wwwco1vip.microsoft.com.http: Flags [.], ack 1, win 65535, length 0

# request from client and server ack
00:00:00.092909 IP > wwwco1vip.microsoft.com.http: Flags [P.], seq 1:179, ack 1, win 65535, length 178
00:00:00.189451 IP wwwco1vip.microsoft.com.http > Flags [.], seq 1:1461, ack 179, win 64034, length 1460

# server sends 43 segments without pause, for a total of almost 64KB!!! (the full client receive window size)
00:00:00.189780 IP wwwco1vip.microsoft.com.http > Flags [.], seq 1461:2921, ack 179, win 64034, length 1460
00:00:00.190009 IP wwwco1vip.microsoft.com.http > Flags [.], seq 2921:4381, ack 179, win 64034, length 1460
00:00:00.190055 IP > wwwco1vip.microsoft.com.http: Flags [.], ack 4381, win 65535, length 0
00:00:00.190204 IP wwwco1vip.microsoft.com.http > Flags [.], seq 4381:5841, ack 179, win 64034, length 1460
00:00:00.190282 IP wwwco1vip.microsoft.com.http > Flags [.], seq 5841:7301, ack 179, win 64034, length 1460
00:00:00.190325 IP > wwwco1vip.microsoft.com.http: Flags [.], ack 7301, win 65535, length 0
00:00:00.192433 IP wwwco1vip.microsoft.com.http > Flags [.], seq 7301:8761, ack 179, win 64034, length 1460
00:00:00.192742 IP wwwco1vip.microsoft.com.http > Flags [.], seq 8761:10221, ack 179, win 64034, length 1460
00:00:00.192834 IP > wwwco1vip.microsoft.com.http: Flags [.], ack 10221, win 65535, length 0
00:00:00.192965 IP wwwco1vip.microsoft.com.http > Flags [.], seq 10221:11681, ack 179, win 64034, length 1460
00:00:00.193438 IP wwwco1vip.microsoft.com.http > Flags [.], seq 11681:13141, ack 179, win 64034, length 1460
00:00:00.193523 IP wwwco1vip.microsoft.com.http > Flags [.], seq 13141:14601, ack 179, win 64034, length 1460
00:00:00.193627 IP wwwco1vip.microsoft.com.http > Flags [.], seq 14601:16061, ack 179, win 64034, length 1460
00:00:00.193916 IP wwwco1vip.microsoft.com.http > Flags [.], seq 16061:17521, ack 179, win 64034, length 1460
00:00:00.194171 IP wwwco1vip.microsoft.com.http > Flags [.], seq 17521:18981, ack 179, win 64034, length 1460
00:00:00.194257 IP wwwco1vip.microsoft.com.http > Flags [.], seq 18981:20441, ack 179, win 64034, length 1460
00:00:00.194365 IP wwwco1vip.microsoft.com.http > Flags [.], seq 20441:21901, ack 179, win 64034, length 1460
00:00:00.199122 IP wwwco1vip.microsoft.com.http > Flags [.], seq 21901:23361, ack 179, win 64034, length 1460
00:00:00.199129 IP wwwco1vip.microsoft.com.http > Flags [.], seq 23361:24821, ack 179, win 64034, length 1460
00:00:00.199164 IP wwwco1vip.microsoft.com.http > Flags [.], seq 24821:26281, ack 179, win 64034, length 1460
00:00:00.199251 IP > wwwco1vip.microsoft.com.http: Flags [.], ack 23361, win 65535, length 0
00:00:00.199255 IP wwwco1vip.microsoft.com.http > Flags [.], seq 26281:27741, ack 179, win 64034, length 1460
00:00:00.199810 IP wwwco1vip.microsoft.com.http > Flags [.], seq 27741:29201, ack 179, win 64034, length 1460
00:00:00.200126 IP wwwco1vip.microsoft.com.http > Flags [.], seq 29201:30661, ack 179, win 64034, length 1460
00:00:00.200135 IP wwwco1vip.microsoft.com.http > Flags [.], seq 30661:32121, ack 179, win 64034, length 1460
00:00:00.200403 IP wwwco1vip.microsoft.com.http > Flags [.], seq 32121:33581, ack 179, win 64034, length 1460
00:00:00.200503 IP wwwco1vip.microsoft.com.http > Flags [.], seq 33581:35041, ack 179, win 64034, length 1460
00:00:00.202268 IP wwwco1vip.microsoft.com.http > Flags [.], seq 35041:36501, ack 179, win 64034, length 1460
00:00:00.202301 IP > wwwco1vip.microsoft.com.http: Flags [.], ack 36501, win 65535, length 0
00:00:00.202792 IP wwwco1vip.microsoft.com.http > Flags [.], seq 36501:37961, ack 179, win 64034, length 1460
00:00:00.203063 IP wwwco1vip.microsoft.com.http > Flags [.], seq 37961:39421, ack 179, win 64034, length 1460
00:00:00.203642 IP wwwco1vip.microsoft.com.http > Flags [.], seq 39421:40881, ack 179, win 64034, length 1460
00:00:00.205313 IP wwwco1vip.microsoft.com.http > Flags [.], seq 40881:42341, ack 179, win 64034, length 1460
00:00:00.205576 IP wwwco1vip.microsoft.com.http > Flags [.], seq 42341:43801, ack 179, win 64034, length 1460
00:00:00.205905 IP wwwco1vip.microsoft.com.http > Flags [.], seq 43801:45261, ack 179, win 64034, length 1460
00:00:00.206253 IP wwwco1vip.microsoft.com.http > Flags [.], seq 45261:46721, ack 179, win 64034, length 1460
00:00:00.206354 IP wwwco1vip.microsoft.com.http > Flags [.], seq 46721:48181, ack 179, win 64034, length 1460
00:00:00.206627 IP wwwco1vip.microsoft.com.http > Flags [.], seq 48181:49641, ack 179, win 64034, length 1460
00:00:00.206655 IP > wwwco1vip.microsoft.com.http: Flags [.], ack 49641, win 65535, length 0
00:00:00.208561 IP wwwco1vip.microsoft.com.http > Flags [.], seq 49641:51101, ack 179, win 64034, length 1460
00:00:00.208883 IP wwwco1vip.microsoft.com.http > Flags [.], seq 51101:52561, ack 179, win 64034, length 1460
00:00:00.209052 IP wwwco1vip.microsoft.com.http > Flags [.], seq 52561:54021, ack 179, win 64034, length 1460
00:00:00.209290 IP wwwco1vip.microsoft.com.http > Flags [.], seq 54021:55481, ack 179, win 64034, length 1460
00:00:00.209373 IP wwwco1vip.microsoft.com.http > Flags [.], seq 55481:56941, ack 179, win 64034, length 1460
00:00:00.209677 IP wwwco1vip.microsoft.com.http > Flags [.], seq 56941:58401, ack 179, win 64034, length 1460
00:00:00.209758 IP wwwco1vip.microsoft.com.http > Flags [.], seq 58401:59861, ack 179, win 64034, length 1460
00:00:00.210097 IP wwwco1vip.microsoft.com.http > Flags [.], seq 59861:61321, ack 179, win 64034, length 1460
00:00:00.210188 IP wwwco1vip.microsoft.com.http > Flags [.], seq 61321:62781, ack 179, win 64034, length 1460
00:00:00.210216 IP > wwwco1vip.microsoft.com.http: Flags [.], ack 62781, win 65535, length 0
00:00:00.210471 IP wwwco1vip.microsoft.com.http > Flags [.], seq 62781:64241, ack 179, win 64034, length 1460

# finally, the server waits for an ack before continuing
00:00:00.282291 IP wwwco1vip.microsoft.com.http > Flags [P.], seq 64241:65701, ack 179, win 64034, length 1460
00:00:00.282420 IP > wwwco1vip.microsoft.com.http: Flags [.], ack 65701, win 65535, length 0
00:00:00.284785 IP wwwco1vip.microsoft.com.http > Flags [.], seq 65701:67161, ack 179, win 64034, length 1460
00:00:00.285122 IP wwwco1vip.microsoft.com.http > Flags [.], seq 67161:68621, ack 179, win 64034, length 1460
00:00:00.287224 IP wwwco1vip.microsoft.com.http > Flags [.], seq 68621:70081, ack 179, win 64034, length 1460
00:00:00.287518 IP wwwco1vip.microsoft.com.http > Flags [.], seq 70081:71541, ack 179, win 64034, length 1460
00:00:00.287746 IP wwwco1vip.microsoft.com.http > Flags [.], seq 71541:73001, ack 179, win 64034, length 1460
00:00:00.288043 IP wwwco1vip.microsoft.com.http > Flags [P.], seq 73001:74396, ack 179, win 64034, length 1395
00:00:00.288084 IP > wwwco1vip.microsoft.com.http: Flags [.], ack 74396, win 65535, length 0

Microsoft appears to be skipping slow-start altogether and setting the IW to the full client receive buffer size. Crazy.

Some Discussion

A search for "google tcp initial window" turns up a Google-authored research paper and Internet-Draft proposing a change to the slow-start algorithm to allow an IW of up to 10 segments (IW10). Interesting.

There's also a lively ongoing discussion on the IETF TMRG mailing list. I haven't read every post (there have been hundreds over the last few months), but it seems that most of the participants are approaching this as a theoretical problem rather than an issue that is actually occurring in the wild and needs to be addressed. The Google engineers on the list have taken on a more frustrated tone recently, so it's possible they decided the best way to make forward progress was to just turn it on and see whether the internet actually melts down. It's also possible that I happen to be part of an ongoing test that they're running.

I wasn't able to find any discussion relevant to what I saw in my Microsoft trace.


It's getting late, so I'll wrap this up with a few thoughts:
  1. Fast is good. I'm excited to see that sub-100ms page loads are possible, and it's a shame not to be able to take full advantage of modern networks because of protocol limitations (http being the limiting protocol, btw).
  2. Being non-standards-compliant in a way that privileges their flows relative to others seems more than a little hypocritical from a company that's making such a fuss about network neutrality.
  3. I'm not really qualified to render judgment on whether IW10 is a net positive, but after reading the discussion (and considering that the internet hasn't actually melted down), I'm inclined to think that it is.
  4. I'm pretty confident that turning off slow-start entirely, as Microsoft seems to be doing in my trace, is a very bad thing (maybe even a bug).
So, this leaves the question: what should I do in my app (and what should you do in yours)? Join the arms race, or sit on the sidelines and let Google have all the page-load glory? I'll let you know what I decide (and if I do it, I'll be sure to let you know how it works out).

Nov 16, 2010

Learning to Draw (and Blog)

When I first started developing software, I did a lot of work on apps. The bulk of it was on PalmOS, back when graphics were still 1-bit monochrome and clean, spare interfaces ruled the day. I'm sure a designer would have been appalled by the constraints of the platform, but I loved it. I have just enough design sense to create a nice-looking 1-bit interface, which meant that I was able to design and implement professional quality apps entirely on my own. It was fun, but I was lured off into the wilderness of servers and systems software and gradually lost touch with app development.

Flash forward ten years. I woke up the other day and thought it would be fun to build an app. I spent some time looking into how it's done in the year 2010 and came to the startling realization that I no longer have the skills it takes to do it on my own. The standards for design and production values have increased so much in the last decade (thanks Apple) that my skills are nowhere near up to the task.

This bothers me a lot, so I've decided to try to remedy the situation. I figure the first step is to learn to draw, so I ordered Betty Edwards' Drawing on the Right Side of the Brain. The premise of the book (and others like it, I'm sure) is that anyone can learn to draw well, and the pervasive myth that being good at it requires natural talent stems from the fact that our schools do a lousy job of teaching drawing to those who don't come to it naturally.

We'll see whether she's right, but I'm optimistic for now. The first exercise in the book is to draw a self portrait pre-instruction, so you can track your progress over time. Mine is no work of art, but after nothing more than a good pep talk, I was pleasantly surprised to be able to produce a drawing that is at least recognizable as a picture of me (I'm usually more of a stick figures kind of guy).

I'll be posting my progress here as I work my way through remedial drawing and hopefully on to other design-related topics.