Thursday, February 24, 2011

Augmented Reality on the iPad : Project "OwlCam"


Author: Jay Ayres



We are very excited to announce a brand new Virtual Tours feature for our TripAdvisor iPad app, code-named “OwlCam”. When the iPad launched last April, one noticeable shortcoming was its lack of camera, preventing augmented reality apps from really taking off on the platform. However, augmented reality is such a great tool for finding nearby hotels, restaurants, and attractions, and we really wanted to find a way to bring it to our app without waiting for the launch of the iPad 2.

At the same time, TripAdvisor is most useful for travelers planning their trips ahead of time, so our users generally care less about restaurants where they currently are than about restaurants near their hotel on an upcoming trip. So, we realized that our ultimate goal was to find a way to implement “remote” augmented reality.

In the end, the lack of camera on the iPad was not a barrier to development of our feature. The iPad is equipped with all of the other sensors typically used for augmented reality apps, including a compass, accelerometer, and GPS. We were able to leverage Google’s Street View API to provide 3D panoramas of vistas all around the world. The breadth of locations covered by Street View is truly amazing, including the United States, Europe, Australia, New Zealand, parts of Asia, and even parts of Antarctica.

Once we have a Street View panorama loaded on the iPad, we are able to display information and ratings about the hotels, restaurants, and attractions right within the Street View display. The end result is that you can literally drop yourself into a vacation destination and see all of the nearby things to do & places to eat. With one extra tap on a specific location, you immediately warp yourself to the 3D panorama nearby that location. So, it is easy to take a virtual tour of the best places to see in Paris, Hong Kong, or Sydney while sitting on your couch at home. This "warping" behavior is not even possible in standard augmented reality apps, for which you are bound to the viewport in your current location.



Suppose you will be staying at the Hollywood Celebrity Hotel and want to find things to do while you stay there. You see that Grauman’s Chinese Theatre is less than half a mile south, well within walking distance. By clicking on the orange button, you can immediately warp there to see the view.


Once you’re there, you can virtually walk down the street, or find a nearby restaurant where you can eat lunch.


Adding location data in 3D space

It is simple to render Google’s Street View imagery using their Maps API, but we also added a layer in our iPad app to display the rectangular pins showcasing our hotel, restaurant, and attraction ratings. Once we have the GPS coordinates of the pins, we need to calculate the locations onscreen to render the rectangles. First, let's start with the GPS coordinate where the user is currently standing in the virtual viewport. In navigation, a common term is the azimuth, which is the angle on the ground plane between true north and the direction from the current location to some other location in the distance.


The azimuth angle can be calculated for each pin based on the arc-tangent of its difference in latitude with the Street View location divided by its difference in longitude with the Street View location, adjusting for quadrant:


LatDiff = PinLatitude - StreetViewLatitude
LngDiff = PinLongitude - StreetViewLongitude

if (LngDiff == 0)
{
    if (LatDiff < 0)
    {
        Azimuth = Π
    }
    else
    {
        Azimuth = 0
    }
}
else
{
    AzimuthIntermediate = Π/2 - arctan(LatDiff / LngDiff)

    if (LngDiff > 0)
    {
        Azimuth = AzimuthIntermediate
    }
    else
    {
        Azimuth = AzimuthIntermediate + Π
    }
}
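The quadrant handling in the pseudocode above can also be expressed with a two-argument arctangent. The following is a minimal JavaScript sketch, not the code used in the app; the function and parameter names are illustrative. Like the pseudocode, it treats a degree of latitude and a degree of longitude as the same distance, an approximation that becomes less accurate away from the equator.

```javascript
// Azimuth from the Street View location to a pin, in radians measured
// clockwise from north, in the range [0, 2π).
// Assumption (same as the pseudocode): latitude and longitude differences
// are compared directly, without correcting for latitude.
function pinAzimuth(streetViewLat, streetViewLng, pinLat, pinLng) {
  var latDiff = pinLat - streetViewLat; // northward component
  var lngDiff = pinLng - streetViewLng; // eastward component

  // atan2(east, north) is the angle from north, positive toward east,
  // in the range (-π, π]; shift negative results into [0, 2π).
  var azimuth = Math.atan2(lngDiff, latDiff);
  return azimuth < 0 ? azimuth + 2 * Math.PI : azimuth;
}
```

A pin due east of the Street View location gets an azimuth of Π/2, and a pin due west gets 3Π/2, matching the two branches of the pseudocode.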


Once we have the azimuth angle for each pin, the next step is to find the horizontal location onscreen for that pin. We continually update a special azimuth angle representing an imaginary location straight ahead of us in the field of vision, whenever the user decides to rotate the Street View panorama left or right. The horizontal onscreen location is computed by comparing this special azimuth with the pin azimuth. On the iPad, in landscape orientation we can assume that our field of vision spans roughly 70 degrees out of a possible 360, or 35 degrees on either side of straight ahead. If the difference between the straight-ahead azimuth and the pin azimuth is greater than 35 degrees, then the pin falls outside the field of vision and we do not show it at all. Otherwise, we calculate the horizontal coordinate of the pin such that a pin whose azimuth angle equals the straight-ahead azimuth appears in the exact center of the screen. Pins with azimuth angles exactly 35 degrees on either side of the straight-ahead angle will appear just on the edge of the screen.
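As a rough illustration of this step, here is a JavaScript sketch that maps a pin azimuth to a horizontal screen coordinate. It assumes a linear mapping from angle to pixels, which the description above does not specify, and it hides any pin beyond the screen edge (half the field of vision). All names are hypothetical.

```javascript
var FIELD_OF_VIEW_DEG = 70; // landscape iPad, per the text above

// Signed difference (pin - heading) in degrees, normalized to [-180, 180)
// so that headings near north (e.g. 350 vs. 10 degrees) compare correctly.
function azimuthDelta(pinAzimuthDeg, headingDeg) {
  var d = pinAzimuthDeg - headingDeg;
  return ((d + 180) % 360 + 360) % 360 - 180;
}

// Returns the pin's x coordinate in pixels, or null if the pin is outside
// the field of vision. Assumes angle maps linearly to screen position.
function pinScreenX(pinAzimuthDeg, headingDeg, screenWidth) {
  var halfFov = FIELD_OF_VIEW_DEG / 2;
  var delta = azimuthDelta(pinAzimuthDeg, headingDeg);
  if (Math.abs(delta) > halfFov) {
    return null; // not visible
  }
  return screenWidth / 2 + (delta / halfFov) * (screenWidth / 2);
}
```

With a 1024-pixel-wide screen and a straight-ahead azimuth of 90 degrees, a pin at 90 degrees lands at x = 512, and a pin at 125 degrees lands on the right edge at x = 1024.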

The final step is to calculate the vertical location of each pin onscreen. Here, we have a few options. We do not know the height off the ground of each hotel, restaurant, and attraction, and so we'll need to simulate the intended height value. One option is to keep the pin centered vertically if it is very far away, and move the pin towards the top or bottom of the screen if it is very close to the Street View location. This option has the effect of showing faraway locations right on the horizon. However, in practice, we do not ever want two pins to overlap with each other for display purposes, and by putting many pins on the horizon, they almost always tend to overlap. So, the second option is to just ignore the pin distance and stack pins vertically so that they never overlap, and are also clustered towards the center of the screen. With either option, if the user is allowed to move the Street View panorama up and down, then the vertical pin locations must be offset by the degree by which the user has tilted the panorama.
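The second option (stacking pins around the vertical center) could be sketched as follows. This is an illustrative JavaScript sketch only: the slot ordering, the parameter names, and the decision to ignore horizontal position are assumptions, not a description of the app's layout code.

```javascript
// Returns a vertical center coordinate for each of `pinCount` pins.
// Pins fill slots alternating around the screen center: 0, -1, +1, -2, +2, ...
// `tiltOffset` shifts every pin by the amount the panorama has been tilted
// (in pixels; pass 0 if the panorama cannot be tilted).
function stackPinsVertically(pinCount, screenHeight, pinHeight, gap, tiltOffset) {
  var centerY = screenHeight / 2 + (tiltOffset || 0);
  var positions = [];
  for (var i = 0; i < pinCount; i++) {
    var step = Math.ceil(i / 2);               // distance from center, in slots
    var slot = (i % 2 === 1) ? -step : step;   // odd pins go above, even below
    positions.push(centerY + slot * (pinHeight + gap));
  }
  return positions;
}
```

Because each slot is one pin height plus a gap away from its neighbors, no two pins overlap vertically, and the first pins placed stay closest to the center of the screen.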


Making use of device sensors

In the typical augmented reality experience, the user moves their phone or tablet around in real-time, looking through the camera viewport at their surroundings. The device compass, accelerometer, and gyroscope are used to calculate the direction the user is facing and how far up or down the device is tilted.

Compass
The device compass readings can tell us the straight-ahead azimuth angle as described in the previous section. That is, it tells us the degree by which the current forward direction differs from north. Keep in mind that the compass reading does not change at all based on how the user's phone or tablet is currently being held. Compass readings can be used to calculate the horizontal location of pins onscreen.

Accelerometer
The device accelerometer readings tell us the orientation at which the user's phone or tablet is currently being held. If the user is holding their device perpendicular to the ground, then pins should appear in the center of the screen because the user is looking forward at the horizon. However, if the user is holding their device at a 45 degree angle facing downwards toward the ground, then pins might appear further towards the top of the screen. The accelerometer can also be used to determine whether pins should be rotated onscreen.

Gyroscope
The device gyroscope can be used to determine how quickly a user is currently rotating their phone or tablet. A common complaint with augmented reality apps is that the pins onscreen move around very frequently and are overly sensitive to device movement. By observing gyroscope events, the effect of the accelerometer on pin movement can be dampened whenever device movement is rapid. A gyroscope is available on iOS starting with the iPhone 4.
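One way to picture this dampening is a low-pass filter on the accelerometer-derived tilt whose responsiveness drops as the gyroscope reports faster rotation. The JavaScript below is a sketch under that assumption; the constant, the formula, and the names are hypothetical and would need tuning on a real device.

```javascript
var BASE_SMOOTHING = 0.5; // hypothetical weight given to a new reading at rest

// previousTilt: last smoothed tilt value
// rawTilt:      new tilt value derived from the accelerometer
// rotationRate: magnitude of rotation reported by the gyroscope
// The faster the device is rotating, the less weight the new reading gets,
// so pins move less during rapid movement.
function smoothedTilt(previousTilt, rawTilt, rotationRate) {
  var alpha = BASE_SMOOTHING / (1 + Math.abs(rotationRate));
  return previousTilt + alpha * (rawTilt - previousTilt);
}
```

With the device at rest (rotation rate 0), a jump from 0 to 10 moves the smoothed value halfway, to 5; at a rotation rate of 4, the same jump moves it only to about 1.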

For project "OwlCam", we use Google Street View in place of a device camera, and so the default user expectation is to be able to control the viewport by touching the screen. However, a special option in the settings enables “Compass Mode”, which uses the iPad’s internal compass to track movement within Street View.

Tip: To enable Compass Mode, go to Settings -> TripAdvisor -> Street View -> Compass Mode -> ON, and then return to the TripAdvisor app. Within Street View, rotate your iPad around to control the camera.

In the future, we expect that augmented reality will become even more prevalent on mobile and tablet devices, especially for online travel planning. We are very excited at TripAdvisor to be on the leading edge of this technology.

Thursday, May 6, 2010

Making Sure Your Website Still Works When You Can't Get To Facebook

Author: Wilhelm Asche

One of the challenges at TripAdvisor is working internationally -- both for non-US members and US members who are travelling. We work hard to make sure that our site works from everywhere.

Recently, we began rolling Facebook Connect out on the site as a login mechanism. For us, Connect is a great way to streamline the login process and make it even easier for existing members to log in and for new members to sign up. To get started, Facebook has provided some excellent documentation.

The problem is that the suggested approach -- including the FeatureLoader JavaScript right after the body tag -- can cause serious problems when that file is unavailable. The important thing to realize is that many things can cause this file to become unavailable: being blocked in a workplace, downtime of Facebook's servers, or even a national policy (as in China). As a service that our members have come to rely on, TripAdvisor must continue to work in light of those sorts of issues.

Failure Modes

There are two big failure modes to worry about: the file itself not being available and the browser not being able to find the Facebook servers at all.

File Is Not Available

The good news with this failure mode is that even if the file is not available, the page can continue to load: our members can find reviews, book hotels and so on. Naturally, functions that rely on Facebook classes won’t work and we'll want to turn them off. A simple approach is to set a JavaScript variable once Facebook's libraries have initialized correctly; subsequent functions can check it:

FB_RequireFeatures(["XFBML", "Api", "Connect"], function(){
    FB.Facebook.init(apiKey, '/xd_receiver.htm');
    FB.ensureInit(function(){
        // Facebook JS is now ready for use
        window.FacebookInitialized = true;
    });

    FB.XFBML.Host.get_areElementsReady().waitUntilReady(function(){
        // FBML has been processed
        window.FBML_initialized = true;
    });
});
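Code elsewhere on the page can then check the flag before touching any Facebook class. The helper below is a minimal sketch of such a check; the function name and the optional fallback argument are hypothetical, and only the window.FacebookInitialized flag comes from the listing above.

```javascript
// Runs `action` only if the Facebook libraries have finished initializing
// (window.FacebookInitialized was set by the callback above).
// Otherwise runs the optional `fallback`, e.g. to hide a Connect button.
function runIfFacebookReady(action, fallback) {
  var ready = typeof window !== 'undefined' &&
              window.FacebookInitialized === true;
  if (ready) {
    return action();
  }
  return fallback ? fallback() : undefined;
}
```

A caller would wrap each Facebook-dependent feature in runIfFacebookReady, so that a missing or blocked FeatureLoader file simply turns those features off instead of raising script errors.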

Can’t Find Server

Not finding a server is a more serious problem. The reason is that the web browser could block while waiting for the request to time out. This will result in it not loading the rest of the page: no reviews, no bookings, for all intents and purposes, no TripAdvisor. This is a Very Bad Thing. Even worse, this is how many firewalls behave: they silently drop the request rather than rejecting it, so the browser waits for the full timeout.

Solutions

In order to solve these concerns, we need to make sure Facebook is available, while making sure that the page loads correctly when it is not. We went through a couple of iterations to get it right.

Iframe With Callback

One approach we've tried is to use an iframe with a callback. We create a hidden iframe that tries to load the Facebook javascript as they recommend and then call a method on the parent saying it loaded. The problem is that this causes the browser to appear as though it is waiting for something to finish, even though the main page is fully loaded. This is confusing to the user and not ideal.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml">
<head>
    <script type="text/javascript" src="http://static.ak.connect.facebook.com/js/api_lib/v0.4/FeatureLoader.js.php">
    </script>
</head>
<body>
</body>
</html>

Iframe With Script Injection

Next, we tried using the iframe with JavaScript injection. For the moment, ignore that the FeatureLoader JS does a document.write under certain circumstances. We wait for the page to load and then inject a script tag in the page head. This lets the browser look like it is done, which as far as the user is concerned, it is.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml">
<head>
    <script type="text/javascript">
    function loadFacebook() {
        // load Facebook API
        var script = document.createElement('script');
        script.type = 'text/javascript';
        script.src = 'http://static.ak.connect.facebook.com/js/api_lib/v0.4/FeatureLoader.js.php';
        document.getElementsByTagName("head")[0].appendChild(script);
    }
    </script>
</head>
<body onload="loadFacebook()">
</body>
</html>

The problem here is that we don't know when the JS is done loading in the iframe.

Iframe With Script Injection After Callback

Finally, we took the approach of injecting the JavaScript and then waiting on a callback to determine when it is available. For this, we use some trivial functions to check if any of the variables set by Facebook’s JavaScript are available. If they are, we're done.

When the JavaScript is available we call to the parent, which can do the same injection to pull in the JS (which the browser will have cached!). Now the parent can run functions in Facebook's libraries.

In either case, we remove the iframe; it is unnecessary at this point. If we did time out, removing the iframe will force a timeout on the script load.

<script type="text/javascript">
// Guard so that initialization runs only once: some browsers fire both
// onload and onreadystatechange, or more than one readyState transition.
var facebookLoadHandled = false;

// called on load
function loadFacebook()
{
    // load Facebook API
    var script = document.createElement('script');
    script.type = 'text/javascript';
    script.src = 'http://static.ak.connect.facebook.com/js/api_lib/v0.4/FeatureLoader.js.php';
    // attach the handlers before the script is added to the page
    script.onload = facebookLoaded;
    script.onreadystatechange = function()
    {
        // older versions of Internet Explorer report completion via readyState
        if (this.readyState == 'loaded' || this.readyState == 'complete') facebookLoaded();
    };
    document.getElementsByTagName("head")[0].appendChild(script);
}


function facebookLoaded()
{
    if (facebookLoadHandled) return;
    facebookLoadHandled = true;

    // apiKey is assumed to be defined elsewhere on the page
    FB_RequireFeatures(["XFBML", "Api", "Connect"], function(){
        FB.Facebook.init(apiKey, '/xd_receiver.htm',
            {
                doNotUseCachedConnectState: true,
                permsToRequestOnConnect: "email"
            }
        );
        FB.ensureInit(function(){
            // Facebook JS is now ready for use
            window.FacebookInitialized = true;
        });
        FB.XFBML.Host.get_areElementsReady().waitUntilReady(function(){
            // FBML has been processed
            window.FBML_initialized = true;
        });
    });
}
</script>
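The listing above does not show the "trivial functions" that check whether Facebook's variables are available, or the timeout. A minimal sketch of that polling step might look like the following; the function names, the attempt limit, and the choice of FB_RequireFeatures as the variable to test are assumptions made for illustration.

```javascript
// Calls isReady() every intervalMs milliseconds until it returns true
// (then calls onReady) or until maxAttempts checks have been made
// (then calls onTimeout). `schedule` defaults to setTimeout and is a
// parameter only so that timing can be replaced in tests.
function waitFor(isReady, onReady, onTimeout, intervalMs, maxAttempts, schedule) {
  schedule = schedule || setTimeout;
  var attempts = 0;

  function check() {
    attempts++;
    if (isReady()) {
      onReady();
    } else if (attempts >= maxAttempts) {
      onTimeout();
    } else {
      schedule(check, intervalMs);
    }
  }

  check();
}

// True once the injected FeatureLoader script has defined its entry point.
function facebookIsAvailable() {
  return typeof FB_RequireFeatures === 'function';
}
```

Inside the iframe, a call such as waitFor(facebookIsAvailable, notifyParent, giveUp, 100, 50) would poll for roughly five seconds, where notifyParent and giveUp stand for whatever the page does on success and on timeout; in both cases the iframe can then be removed as described above.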

Wednesday, May 5, 2010

Partitioning Data for Fun and Scalability

Author: Xavid Pretzer

A while back, TripAdvisor released a Facebook app, Cities I've Visited, that lets people stick pins in a map for places they've been and where they're going.  While our CIV app is not quite Farmville yet, the viral nature of Facebook does help things take off.  Without any particular effort on our part, we've ended up with almost 1 billion pins, with new pins coming in at almost five million a day.

The pin table is easily the fastest-growing table in our database and one of the largest.  With new functionality in the works that will take better advantage of this data, and associated plans for marketing pushes to promote our app, it could easily start growing significantly faster.  Currently, we store all the pin data in a table in a PostgreSQL database.  This has served us adequately so far, but the size of the table is putting stress on our replication system and makes compiling aggregate statistics and database management difficult. Now is a good time to rethink our design and come up with something that scales better, so that we are ready for any surges down the road.

The obvious way to increase scalability is to partition your data between databases on multiple machines.  Thus, you can increase your data capacity by simply buying new hardware.  There are some downsides, of course: queries that need to talk to more than one database are on average slightly slower, and enforcing transactional semantics becomes more challenging.  But it also has some side bonuses, such as more parallelism for large queries.  And there's really only so much you can do if you insist on keeping all your data on every database server.  (Database replication can help if read load is your worry, but that just makes things worse when you're worried about write load like we are, and most of the time a proper caching setup works better there anyways.)

Partitioning the data ended up being a great fit here.  Our data setup makes this easy: all operations (except for some offline aggregate reporting) specify either a single member ID or a set of member IDs (e.g., for when you're seeing everywhere your friends have gone).  Thus, partitioning on member ID makes routing queries easy. Moreover, all modification operations only operate on a single member ID, except in the less-frequent case of merging two users.  Finally, we don't need strong guarantees about which pins are visible if you look at someone's map while they're in the middle of modifying it, so we don't need to handle multi-server transactions.

From here, the main question is how to partition the data.  Based on our query load, we know we want the partitioning to be based on member ID.  The obvious way to do this would be to take the member ID, mod by the number of servers, and put the data on the server with the resulting index.  This is quick and easy, and since the member ID is effectively random this should result in an even distribution of data.  Why wouldn't you go with the simple solution?

Well, one situation that this doesn't handle so well is when you need to repartition your data.  Say that you initially partitioned your data onto 5 servers, and now you want to add 3 more.  With optimal partitioning, you could move 1/40th of the data from each of the 5 servers to each of the 3 new ones, and you'd end up with 1/8 of the data on each server, without needing to move any data between the existing servers.  However, with the simple mod hash function, almost all the data changes which server it's assigned to.  This results in unnecessary copying and thus overly long transitions.
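To make the cost concrete, the following JavaScript sketch counts how many member IDs change servers when going from 5 to 8 servers under simple mod hashing. The function names are illustrative, and sequential IDs stand in for real member IDs.

```javascript
// Simple mod hashing: the server index for a member ID.
function modServer(memberId, serverCount) {
  return memberId % serverCount;
}

// Fraction of member IDs in [0, sampleSize) whose server changes when the
// number of servers goes from oldCount to newCount.
function fractionMoved(oldCount, newCount, sampleSize) {
  var moved = 0;
  for (var id = 0; id < sampleSize; id++) {
    if (modServer(id, oldCount) !== modServer(id, newCount)) {
      moved++;
    }
  }
  return moved / sampleSize;
}
```

fractionMoved(5, 8, 40000) returns 0.875: only 5 out of every 40 consecutive IDs keep their server. The optimal repartitioning described above moves just 3/8 of the data, all of it onto the new servers.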

For some of you, this problem will look familiar, and you'll want to reach into your toolbox for everyone's favorite way to divide data among cache nodes: consistent hashing.  I won't go into the hairy details here, but the basic idea of consistent hashing is that you arrange your data evenly on a circle (say, by taking some rightmost bits of the member ID), and then you assign points on the circle at random to each server.  Each piece of data is stored at the server that comes next in the circle.  With enough points per server, your data ends up well-distributed, though not perfectly so.  Consistent hashing has the useful property that whenever you add or remove a server, only the data for the affected server has to move, and that data is likely to be well-distributed among the other servers.  Thus, more servers can be added at any time with minimal disruption.
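For readers who have not seen it, here is a compact JavaScript sketch of the consistent hashing idea. It is illustrative only: it places both server points and member IDs on the circle with an FNV-1a hash instead of random points and rightmost bits, so that the example is deterministic, and all names are hypothetical.

```javascript
// 32-bit FNV-1a hash of a string, used to place things on the circle.
// (An illustrative choice of hash function.)
function hash32(str) {
  var h = 0x811c9dc5;
  for (var i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0; // as an unsigned 32-bit position
}

// Builds the circle: several points per server, sorted by position.
function buildRing(servers, pointsPerServer) {
  var ring = [];
  servers.forEach(function (server) {
    for (var p = 0; p < pointsPerServer; p++) {
      ring.push({ position: hash32(server + '#' + p), server: server });
    }
  });
  ring.sort(function (a, b) { return a.position - b.position; });
  return ring;
}

// A member's data lives on the server whose point comes next on the circle.
function ringServerFor(ring, memberId) {
  var position = hash32(String(memberId));
  for (var i = 0; i < ring.length; i++) {
    if (ring[i].position >= position) {
      return ring[i].server;
    }
  }
  return ring[0].server; // wrapped past the last point
}
```

Adding a server only inserts new points on the circle, so any member whose assignment changes must now belong to the new server; nothing moves between the existing servers. A linear scan is used here for brevity; a binary search over the sorted points would be the usual choice for a large ring.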

What's the downside?  One issue is that the data's not distributed as evenly as with mod-hashing, which can lead to you not getting the most out of your hardware.  You can increase the evenness of your data distribution by using more circle points per server.  But the more points you add, the more complex your hash function becomes. One great thing with mod-hashing is that you can easily express your hashing in SQL, whereas with consistent hashing you either have huge and less-efficient SQL expressions or you have to do your hashing in client code.  So, while less data needs to be moved under consistent hashing, figuring out what data needs to move can take considerably longer.  Finally, consistent hashing is complex and random enough that it's no longer possible to ensure you're hashing properly by inspection, making it more difficult to notice errors.

While consistent hashing seems like a useful idea here, it's possible to get most of its advantages while keeping the simplicity of mod-hashing.  First, ideally you've already got a hot backup of your database servers, so you don't need to worry about needing to remove database machines on short notice.  Then, you can start by dividing your data between more databases than machines you need to use.  For example, start with 12 databases divided between two database machines.  If your load or data size becomes large enough that two machines aren't enough, just move 2 databases from each machine onto a new one.  From the point of view of your partitioning, nothing's changed, but you're now using more machines.  You can easily transition to 4, then 6, then 12 machines with even machine usage.
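The scheme amounts to two lookups: a fixed mod hash from member ID to database, and a small table from database to machine. The JavaScript below sketches this with the 12-database example; the machine names and the particular databases chosen to move are hypothetical.

```javascript
var DATABASE_COUNT = 12; // fixed; chosen larger than the number of machines

// Which database holds a member's pins. This never changes when machines
// are added.
function databaseFor(memberId) {
  return memberId % DATABASE_COUNT;
}

// Which machine hosts that database, according to a lookup table indexed
// by database number.
function machineFor(memberId, databaseToMachine) {
  return databaseToMachine[databaseFor(memberId)];
}

// 12 databases on two machines, six each.
var twoMachines = ['m1', 'm1', 'm1', 'm1', 'm1', 'm1',
                   'm2', 'm2', 'm2', 'm2', 'm2', 'm2'];

// After moving 2 databases from each machine onto a new machine m3,
// every machine hosts four databases.
var threeMachines = ['m1', 'm1', 'm1', 'm1', 'm3', 'm3',
                     'm2', 'm2', 'm2', 'm2', 'm3', 'm3'];
```

Member 17 always maps to database 5; only the table entry for database 5 changes from m1 to m3. Moving a whole database between machines is a bulk copy, and no per-row rehashing is needed.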

But what if your app becomes the next Twitter, and 12 machines is no longer enough? We saw before that in the general case adding new databases causes most of your data to need to move.  But there's a special case that works better: multiplying the number of databases by an integer.  Going from 12 to 24 databases, half of each database's data moves to a new database, with no movement of data between the existing databases at all.  Of course, you're still moving half your data, so it's not going to be a quick operation.  But it's much better than shuffling your data between all of your databases.
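The reason the integer-multiple case works is simple arithmetic: for any member ID, id mod 24 is either id mod 12 (the data stays put) or id mod 12 plus 12 (the data moves to a brand-new database). The JavaScript sketch below checks that property; the function name is illustrative, and sequential IDs stand in for real member IDs.

```javascript
// Returns true if, when going from oldCount to newCount databases under
// mod hashing, every member ID in [0, sampleSize) either stays in its
// database or moves to one of the newly added databases
// (index >= oldCount).
function onlyMovesToNewDatabases(oldCount, newCount, sampleSize) {
  for (var id = 0; id < sampleSize; id++) {
    var before = id % oldCount;
    var after = id % newCount;
    if (after !== before && after < oldCount) {
      return false; // moved between two existing databases
    }
  }
  return true;
}
```

onlyMovesToNewDatabases(12, 24, 10000) is true, whereas onlyMovesToNewDatabases(12, 18, 10000) is false: member 18, for example, would move from database 6 to database 0.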

Scalability is a complex area, and obviously what works for us won't necessarily be the right answer in your situation.  The mod-hashing system with more databases than machines looks to be working well for us, and it'll be interesting to see how it holds up with the new developments of the next few months.  While too much load may be a good problem to have, I think we can all agree that it's a better problem to handle seamlessly.