So the key premise of the game is that you're giving someone directions like "Turn right after passing a red building". Clearly, knowing which buildings a user has passed is going to prove important.

To start reasoning about how I would determine which buildings a user had passed, I took a screenshot of the game and quickly marked it up in Skitch with the grid coordinates of both the intersections and the buildings on the map.

City grid, showing building coordinates, and intersection coordinates. 

Then I thought about the first move the cursor would make, traveling down. When it moves up or down, it needs to look at the buildings to its left and right. When it moves left or right, it needs to look at the buildings above and below it.

In coordinate terms, traveling from 0,-1 to 0,0 would pass the buildings at 0,0 and 1,0. In relative terms, the cursor passed the building at its own (new) coordinates, and the building at its own coordinates with X+1. Or: x,y & x+1,y. I then generalized those rules into the following:

// int = the intersection the player just arrived at

// If X changes - look up and down
// Moving right - int + 0,0 & int + 0,1
// Moving left  - int + 1,0 & int + 1,1

// If Y changes - look left and right
// Moving down  - int + 0,0 & int + 1,0
// Moving up    - int + 0,1 & int + 1,1
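As a rough illustration, the rules above could be written as a small lookup table. This is only a sketch in Python, not the game's actual code: the names PASSED_OFFSETS, direction_of, and buildings_passed are made up for the example, and it assumes that Y grows downward (as in the 0,-1 to 0,0 move) and that the player moves exactly one intersection at a time.

```python
# Hypothetical sketch of the rules above; not the game's real code.
# Offsets are added to the intersection the player just arrived at.
# Assumption: Y grows downward, so moving "down" increases Y.
PASSED_OFFSETS = {
    "right": ((0, 0), (0, 1)),  # X changed: look up and down
    "left":  ((1, 0), (1, 1)),
    "down":  ((0, 0), (1, 0)),  # Y changed: look left and right
    "up":    ((0, 1), (1, 1)),
}

# Maps a one-step change in (x, y) to a direction name.
STEP_TO_DIRECTION = {
    (1, 0): "right",
    (-1, 0): "left",
    (0, 1): "down",
    (0, -1): "up",
}


def direction_of(start, end):
    """Return the direction of a single-step move between adjacent intersections."""
    step = (end[0] - start[0], end[1] - start[1])
    if step not in STEP_TO_DIRECTION:
        raise ValueError(f"not a single-step move: {start} -> {end}")
    return STEP_TO_DIRECTION[step]


def buildings_passed(start, end):
    """Return the coordinates of the two buildings passed while moving start -> end."""
    x, y = end
    offsets = PASSED_OFFSETS[direction_of(start, end)]
    return [(x + ox, y + oy) for ox, oy in offsets]


# The first move from the post: traveling down from 0,-1 to 0,0.
print(buildings_passed((0, -1), (0, 0)))  # [(0, 0), (1, 0)]
```

Keeping the offsets in one table means each direction is a single row, so a mistake in one rule is easy to spot against the marked-up screenshot.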

Adding some debug output to my code shows just what I'm hoping for:

The player is at 0, 0
Player moved down to get here
Player passed a red building
Player passed a orange building

I'm not currently happy with all of the ways that I'm storing the information on the building grid, but it sounds like several of the language limitations I'm running into will be addressed in the next release.