CS470/570  Artificial Intelligence

Program #2:  RoadWarrior meets Google Maps
(a.k.a.,  Searching like a Homo sapiens)

Overview: 

Now that we are well on the way to understanding the concept of “state space search” with our primitive Boggle-bashing, it’s time to take a deeper look at searching to better understand what we were doing in Fred Flintstone mode, how this fits into the full range of possible options, and what the practical differences are between some of the main searching options.

To explore these questions, let's switch from Boggle to a more classic search problem: route-finding on complex maps. This is pretty much exactly the problem that Google Maps solves every time you ask for directions to someplace, and you can be sure that they have lots of data and some good AI going into providing you with an answer...because they are increasingly competing with others (e.g., Apple Maps) for your patronage! For our exploration, we'll keep it simple: we just want to find a route from some starting point to some ending point, knowing only what the road network looks like. Some of the real-life questions we might want to model include:

We'll explore at least a couple of these scenarios in our toy map world, while looking at how different possible searching algorithms would handle each. There are, of course, many many other domains in which we might explore the concept of "large state space search" that lies at the heart of much of AI: chess, moving a character through a maze, planning the best possible sequence of actions for your robot, etc. etc.

One huge advantage of the network navigation domain is that it's naturally nice to visualize the search: unlike Boggle, the search space is just the map...so you can literally watch your search progress to understand how each algorithm really works. Actually *getting such a visualization up and running* is a whole different story of course! Graphical layout is particularly vexing and challenging to program. Fortunately, your intrepid prof has jumped in to create some useful tools here, leaving you to focus on the search algorithms that we're really interested in! Some details are given further down. Note that there is nothing that REQUIRES you to use these tools...but you'd be silly not to use them to really get your mind around what's going on!

The Assignment:

Our aim in this programming exercise is to explore the implications of different search algorithms and heuristics in a nice, easily visualized route-finding domain. Specifically, your challenge is to implement a general search engine program capable of doing DFS, BFS, Best-first, and A*. As is evident from the book and lecture, this is really a single base algorithm that you can manipulate to produce the various search behaviors. The idea is that your program runs repeatedly, exploring a given map using each of these algorithms in turn. Of course, if everything is working correctly, all of the searches will find a desired route (assuming one exists)...but you will see that they have *dramatically* different search behaviors. We've discussed some of the pros/cons of these algorithms in lecture (completeness, time/space complexity, and optimality); the aim here will be to see what this REALLY MEANS in a hands-on scenario. Your program will print out the statistics for each search run as it completes that search. For the A* search, I will ask you to explore one informed heuristic, with top score requiring the exploration of two or more.
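To make the "single base algorithm" idea concrete, here is a minimal Python sketch in which the four behaviors differ only in how children are merged into the OPEN list. It is an illustration under assumptions, not the required design: the function name `general_search`, the strategy strings, the adjacency-dict graph format, and the choice to test for the goal when a node is removed from OPEN are all hypothetical, and it does not implement the assignment's specific rules or statistics.

```python
def general_search(graph, start, goals, strategy, h=lambda label: 0.0):
    """Generic search sketch. graph maps label -> list of (neighbor, edge_cost).

    Returns (path, path_cost, expansions); path is None if no goal is reachable.
    """
    open_list = [(start, 0.0, [start])]   # entries are (label, cost so far g, path)
    closed = set()                        # labels that have already been expanded
    expansions = 0
    while open_list:
        label, g, path = open_list.pop(0)
        if label in goals:                # assumption: goal test happens on removal
            return path, g, expansions
        if label in closed:
            continue                      # stale duplicate entry, skip it
        closed.add(label)
        expansions += 1
        # Children are generated in alphabetical order of their labels.
        children = [(nbr, g + cost, path + [nbr])
                    for nbr, cost in sorted(graph[label])
                    if nbr not in closed]
        if strategy == "DFS":             # newest nodes go to the front (stack)
            open_list = children + open_list
        elif strategy == "BFS":           # newest nodes go to the end (queue)
            open_list = open_list + children
        elif strategy == "BEST":          # order by heuristic estimate h only
            open_list = sorted(open_list + children, key=lambda n: h(n[0]))
        elif strategy == "A*":            # order by f = g + h
            open_list = sorted(open_list + children, key=lambda n: n[1] + h(n[0]))
        else:
            raise ValueError("unknown strategy: " + strategy)
    return None, float("inf"), expansions


# A tiny made-up map, only for demonstration.
demo = {
    "A": [("B", 1), ("C", 5)],
    "B": [("A", 1), ("C", 1), ("D", 10)],
    "C": [("A", 5), ("B", 1), ("D", 1)],
    "D": [("B", 10), ("C", 1)],
}
for name in ("DFS", "BFS", "BEST", "A*"):
    print(name, general_search(demo, "A", {"D"}, name))
```

With the default `h` (always 0), the "A*" setting behaves like uniform-cost search; plugging in a real heuristic changes only the sort key.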

The search spaces for this problem will be maps, such as the one visualized on the right. Specifically, you will need to be able to load a "map" from a simple text file with very simple format, with each line of the file describing one edge in the graph:

(node1, node2, edgevalue, [x1,y1],[x2,y2])

To do the search, you technically only need the first three of these; the x-y positions of the two nodes are only needed if you want to actually lay out/visualize the search graphically. For instance, here is the map file for the map shown on the right. The map on the right is actually a snapshot of the GraphViz tool I made in action: start and goal have been marked, and a search has been started and has explored several nodes.
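As a rough illustration of reading this format, the Python sketch below pulls the seven fields out of each line with a tolerant regular expression, so it works whether or not the node labels are quoted. The function name `parse_map_lines`, the returned data structures, and the treatment of every edge as two-way are assumptions made for the example, not part of the file specification.

```python
import re


def parse_map_lines(lines):
    """Parse lines shaped like (node1, node2, edgevalue, [x1,y1],[x2,y2]).

    Returns (graph, coords):
      graph  maps label -> list of (neighbor, edge_cost)
      coords maps label -> (x, y)
    """
    graph = {}
    coords = {}
    for line in lines:
        # Grab every run of letters, digits, '_', '.', '+' or '-'; this drops
        # parentheses, brackets, commas, quotes, and whitespace.
        tokens = re.findall(r"[\w.+-]+", line)
        if len(tokens) != 7:
            continue  # blank or malformed line: skipped in this sketch
        a, b = tokens[0], tokens[1]
        cost = float(tokens[2])
        x1, y1, x2, y2 = (float(t) for t in tokens[3:])
        # Assumption: roads can be driven in both directions.
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
        coords[a] = (x1, y1)
        coords[b] = (x2, y2)
    return graph, coords


# Made-up sample lines, only to show the call.
sample = ["('A', 'B', 12, [0,0],[3,4])", "(B, C, 7.5, [3,4],[3,10])"]
graph, coords = parse_map_lines(sample)
print(graph)
print(coords)
```

To load from disk, the same function can be fed an open file object, e.g. `parse_map_lines(open(filename))`, since iterating a file yields its lines.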

Important note: We want our maps to simulate real life! Look *carefully* at the map on the right and you will notice that the distances on the edges are NOT just the straight-line distance (SLD) between the nodes...reflecting the fact that in real life, the length of a road between two places is almost always longer than the "as the crow flies" SLD between the places! In other words, you can assume the following constraint: maps will always show distances between nodes, and these distances are *vaguely* representative of the physical distance between them, but will often be longer (never shorter) than a calculated straight-line distance between node coordinates.
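This constraint is what makes a straight-line-distance heuristic attractive: because every edge is at least as long as the straight line between its endpoints, the SLD from a node to a goal cannot exceed the length of any road route between them. A minimal Python sketch of such a function follows. The name `h_sld`, its argument list, and the choice to take the minimum over several goals are assumptions for illustration; the coordinates are made up.

```python
import math


def h_sld(label, goals, coords):
    """Straight-line distance from `label` to the nearest goal.

    coords maps label -> (x, y). With several goals, the smallest distance
    is used so the estimate never exceeds the distance to the closest one.
    """
    x, y = coords[label]
    return min(math.hypot(x - coords[g][0], y - coords[g][1]) for g in goals)


# Made-up coordinates, only to show the call.
coords = {"A": (0, 0), "G1": (3, 4), "G2": (6, 8)}
print(h_sld("A", ["G1", "G2"], coords))
```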

Programming: What functionality you need to provide

Your task is to create a software tool for searching maps. It should be able to:

Required output: what to show on your sample runs.

For all searches done, your awesome searcher should report:

  1. Search type it's doing, and the name of the input file that map was taken from.
  2. The start node and the goal node(s) set for that search
  3. The number of expansions that were done, i.e., the total number of nodes searched to find the solution.
  4. What node the search ended at (hopefully a goal node!) and the path cost of the path it found.
  5. The actual path to the goal: start by noting the length of the path, then show the nodes in the path from start to finish
  6. Search Stats! Average and Maximum OPEN list size, Average and Maximum depth reached during the search, and average branching factor of nodes expanded.
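One possible way to gather the numbers for item 6 is to record a sample at every expansion and summarize at the end. The Python sketch below assumes exactly that; the class name `SearchStats`, its method names, and the decision to sample once per expansion are hypothetical choices, so adjust them to whatever measurement points your search actually uses.

```python
class SearchStats:
    """Collects one sample per node expansion and summarizes at the end."""

    def __init__(self):
        self.open_sizes = []     # size of OPEN at each expansion
        self.depths = []         # depth of each expanded node
        self.child_counts = []   # number of children each expansion produced

    def record_expansion(self, open_size, depth, num_children):
        self.open_sizes.append(open_size)
        self.depths.append(depth)
        self.child_counts.append(num_children)

    def summary(self):
        n = len(self.open_sizes)
        if n == 0:
            return {"expansions": 0}
        return {
            "expansions": n,
            "avg_open": sum(self.open_sizes) / n,
            "max_open": max(self.open_sizes),
            "avg_depth": sum(self.depths) / n,
            "max_depth": max(self.depths),
            "avg_branching": sum(self.child_counts) / n,
        }


# Made-up samples, only to show the call.
stats = SearchStats()
stats.record_expansion(open_size=1, depth=0, num_children=2)
stats.record_expansion(open_size=2, depth=1, num_children=3)
stats.record_expansion(open_size=4, depth=1, num_children=1)
print(stats.summary())
```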

For searches done with VERBOSE mode turned on, the following should also be shown:

Don't worry, we'll only turn on verbose mode for testing where we specify a small number of expansions to do, or for very small test maps!

Here are a couple of sample output files to show you what your solution should be producing:

Required details, pay attention!

As you can see from the specs above, your program needs to provide certain outputs...which will allow me to evaluate whether you have correctly implemented the targeted search functionality. In order for this to work out (i.e., for us all to produce easily comparable output for a given search on a given map), we all need to address certain "undefined" issues in the same way. Please observe the following rules in implementing your solution:

Some comments on implementation:

Your write-up: 

In addition to your code and solution print-outs, you'll need to provide a nice write-up of your solution.  Your write-up should be professionally neat and must include:

  1. A brief description of your solution approach/strategy.   Introduce how you factored the problem (major classes and/or functions) and what each of them does. You don't need to describe every little helper function, just walk us through an overview of how your code tackles the problem.
  2. Analyze the heuristic function(s) that you developed for the A* search. Are they admissible? Are they consistent? Define each concept, then analyze whether each h(n) you wrote meets that bar. With this in mind, can you guarantee that your A* searches find optimal routes? If not, can you show me a map in which your functions do NOT find the optimal route?
  3. So what is the difference between the algorithms? In which situations does one work better than the others? Do some exploration to generate some data to answer these questions by creating at least two reasonably complex (>30 nodes) maps. Then do the following on each map, recording your stats every time:
    1. Select five different start-goal combinations, choosing them to be different from each other. Closer, farther, heavily connected, on the edge, in the middle. The idea is that you're trying to somehow get a spread of possible start-goal conditions.
    2. Now run each of the algos on the map, for each of the start-goal conditions. Record your stats for each run.
    3. Now, repeat (1) and (2), but with the following change: choose a start point...and then multiple goals. So now you have five conditions where you have multiple goals.
    4. Again, run all your algorithms/heuristics on each start-goals condition. Record your stats.
  4. Now analyze the data you generated. First show your raw data in one or two nice tables (one for each map analyzed). Then add some column/row statistics to compare, for each search type and start-goals combo the average of: goal path cost, goal path length, number of nodes explored, etc. What you are trying to do is to see if you can see some patterns. Does one type of search always do better? Or does it only tend to do better for certain situations (e.g., close-by goals vs distant goals). Or maybe things change when you have multiple goals vs. just one? If you did multiple A* heuristics, which worked better?
  5. Now let's bring this back to reality: describe a heuristic function that Google Maps might actually use to generate routes. What are the inputs? How does it calculate a score? Remember, this function gets called *every time* a new search node is generated! Discuss the computational complexity of running your function...and speculate on ways that Google might make this manageable.
  6. Write up the results of your analysis, answering questions like the ones suggested above...and noting any other patterns you exposed with your exploration. Whatever you present as your conclusions, it must be supported by the data you produced!! Do NOT just dump a bunch of data...and then speculate freely without any reference to actual data!

To turn in:

A professional packet with the following items in exactly this order:

Part 1: Making basic progress

  1. Cover sheet:  Name, course, assignment title, date
  2. Printout of your program doing some simple "building block" things:
    1. Create a super-simple SearchNode class that has at least two fields: label and value. For now the value is just the path cost (from start) to the node.
    2. Show your program loading in the 30-node sample file above.
    3. Show your program setting start node=U and end node=T. Accompany your console action with a snapshot of the graphical map after this action, i.e., using the GraphViz tool.
    4. Ask your program to show your OPEN list to see that indeed node U is in it. Your showOpen() function should print tuples of (label, value) for your list.
    5. You asking it to generate the SUCCESSOR (children) for node 'U'. This should return a list of the children of 'U'; as we said above, these siblings should be in alpha order.
    6. You asking it to INSERT the list of children produced above into your OPEN list. Show three inserts: at the front, and the end, and "in order", meaning a priority list based on the node value so that the cheapest node appears first in the new OPEN list. The insert should show us the new OPEN list each time.
    7. Now let's make sure your INSERT handles duplicates properly: manually create new nodes for (K,500), (C,91) and (J,10). INSERT these into your OPEN list, showing the results.
  3. Show your hSLD heuristic function being called on these nodes: V, AC, and J.
  4. Your richly commented and professionally presented code (may be duplex printed).
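For orientation, the building blocks in item 2 above might look roughly like the Python sketch below. Everything in it is an assumption for illustration: the names `show_open` and `insert`, the mode strings, and especially the duplicate rule (keep only the cheapest node per label), which you should replace with whatever rule the assignment specifies. The starting OPEN contents are made up; only the three inserted nodes come from step 7.

```python
class SearchNode:
    """Super-simple node: a label plus a value (path cost from start, for now)."""

    def __init__(self, label, value):
        self.label = label
        self.value = value


def show_open(open_list):
    """Print and return OPEN as a list of (label, value) tuples."""
    tuples = [(n.label, n.value) for n in open_list]
    print(tuples)
    return tuples


def insert(open_list, new_nodes, mode):
    """Return a new OPEN list with new_nodes added at 'front', 'end', or in 'order'."""
    if mode == "front":
        merged = list(new_nodes) + list(open_list)
    elif mode == "end":
        merged = list(open_list) + list(new_nodes)
    elif mode == "order":
        # Priority list: cheapest value first (sorted is stable for ties).
        merged = sorted(list(open_list) + list(new_nodes), key=lambda n: n.value)
    else:
        raise ValueError("unknown insert mode: " + mode)
    # Assumed duplicate rule: for each label keep only the cheapest node.
    best = {}
    for node in merged:
        if node.label not in best or node.value < best[node.label].value:
            best[node.label] = node
    return [n for n in merged if best[n.label] is n]


# Made-up starting OPEN list; the inserted nodes are the ones from step 7.
open_list = [SearchNode("K", 50), SearchNode("C", 100)]
dupes = [SearchNode("K", 500), SearchNode("C", 91), SearchNode("J", 10)]
show_open(insert(open_list, dupes, "order"))
```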

Part 2: The whole enchilada

  1. Cover sheet:  Name, course, assignment title, date
  2. Your write-up of the analysis questions above, neatly typed up and cleanly formatted. Make sure you clearly label each answer so that I know which question it is addressing.
  3. Printout of your program output running the 10-node and 30-node samples given above. For each one, run the first three expansion steps with VERBOSE mode on, to show development of your OPEN list. Then turn VERBOSE=off, and let the searches run to completion to show the final stats.
  4. Printouts of your program running the dynamically assigned searches. Link will be activated here shortly before due date.
  5. Your richly commented and professionally presented code (may be duplex printed).

ATTENTION: when you are showing runs of your searches, *you must clearly label the printouts*!! Especially: you should carefully state what input map you are running on! An excellent way to help me out here is to attach a small snapshot of the graphical layout of the map...makes it easy to see which map you are running! Extra cool is if you've asked the visualizer to paintPath the final goal path found by your search, so that it's showing in red on the map!