Summary
We are going to implement a parallel graph search for use in artificial intelligence.
We will use CUDA as our platform and will test the algorithms on the NVIDIA GPUs in the Gates machines.
Background
Graph search algorithms are notorious for long runtimes, especially on high-dimensional search spaces.
Even on simple 2-dimensional maps, graph search may be too slow to run in real time.
Parallelizing these algorithms is important for making searches in higher-dimensional spaces feasible in an acceptable
amount of time.
One interesting application is artificial intelligence in video games. Our approach is to
apply parallelism to graph search algorithms so that the AI can respond intelligently in real time,
reducing the time the player must remain idle. In 2-dimensional maps, the search space
is not as large as in higher-dimensional maps (such as 3D games), but searching it can still create idle time for the player.
AI path planning is one example that would benefit from parallelism; this problem often takes longer to solve than other
computational events simply because a rather large search space must be explored.
Our goal is to apply parallelism to one graph search algorithm (or a few, if time allows), beginning with D* Lite.
In addition, we plan to measure the effectiveness of the parallelization both through timings and by applying it
to an AI in a 2-dimensional game. We seek to improve the speed with which the AI can react to changes in the map
through parallelism.
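For reference, D* Lite (Koenig and Likhachev) orders frontier vertices lexicographically by a two-component priority key built from the g and rhs values, the heuristic, and an accumulated offset k_m. A minimal sketch of that key computation in C++ (the sequential version we plan to write first; variable names follow the original paper):

```cpp
#include <algorithm>
#include <utility>

// D* Lite's two-component priority key. Frontier vertices are ordered
// lexicographically by this key (std::pair compares that way already).
using Key = std::pair<double, double>;

Key calculate_key(double g, double rhs, double h, double km) {
    double m = std::min(g, rhs);
    // First component: estimated total path cost through this vertex,
    // shifted by the accumulated heuristic offset km.
    // Second component: tie-breaker on the smaller of g and rhs.
    return {m + h + km, m};
}
```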
The Challenge
Quite a few challenges exist beyond simply parallelizing the algorithm. First of all, the A* family of algorithms
seems similar to the breadth-first search/depth-first search family but operates in an incredibly different way.
Since A* does not treat all nodes on the frontier equally, the frontier nodes cannot all be expanded at
the same time while preserving correctness; thus, this is quite a different problem from parallelizing breadth-first and
depth-first search.
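One common way around this, which we may adopt (an illustration, not a committed design), is batch expansion: instead of popping a single best node per iteration, pop a small batch of near-best nodes and expand them concurrently, accepting a little redundant work. A sequential C++ sketch of the batch-extraction step, with hypothetical names:

```cpp
#include <queue>
#include <vector>
#include <utility>

// Hypothetical frontier entry: (priority key, node id); lower key is better.
using Entry = std::pair<double, int>;
using MinHeap = std::priority_queue<Entry, std::vector<Entry>,
                                    std::greater<Entry>>;

// Pop up to `batch` nodes whose keys are within `slack` of the current
// best key. These can then be expanded in parallel; nodes outside the
// slack stay queued, so the A*-style ordering is only loosely relaxed.
std::vector<Entry> extract_batch(MinHeap& open, int batch, double slack) {
    std::vector<Entry> out;
    if (open.empty()) return out;
    double bound = open.top().first + slack;
    while (!open.empty() && (int)out.size() < batch
           && open.top().first <= bound) {
        out.push_back(open.top());
        open.pop();
    }
    return out;
}
```

The `slack` parameter trades parallelism against wasted expansions: with slack zero this degenerates to ordinary best-first search.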
Concurrency will also be a difficult problem when multiple threads are working on the same graph.
Computing and updating heuristic values will require synchronization so that each node's cost settles on
the correct value. In addition, we plan to implement a cache shared between threads that keeps track of paths that are completely
explored and require no more work. These threads must then also coordinate so that the cache runs
efficiently without loss of information.
Resources
A quick Google search turns up relatively little research on parallel graph search algorithms.
Thus, we will rely on the techniques learned in this class, as well as Assignment 3, to help us implement an efficient
parallel graph search algorithm. For the algorithms themselves, we will start from the basic sequential graph search
and build from there. We are not yet sure how to approach implementing an AI in a preexisting game, but we will be
sure to research that early so as to be ready for that part later in the project. We do require NVIDIA GPUs
in order to run our CUDA programs, but apart from that, we do not require any special machines.
Deliverables
Plan to Achieve
We plan to implement a parallel graph search algorithm, D* Lite, in CUDA that also takes advantage of caching. In order to
show the usefulness of our algorithm (without going overboard), we will apply the algorithm to an artificial intelligence
in a certain 2D game. We will either obtain source code for the game or write our own, as the parallel algorithm is the
main objective of this project. In addition, we will run tests on the Gates machines comparing the speeds of the sequential
and parallel D* Lite implementations.
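For those timing comparisons, a simple wall-clock harness is probably all we need; a sketch (the callable stands in for either search driver, names hypothetical):

```cpp
#include <chrono>
#include <functional>

// Measure wall-clock time of one search run in milliseconds.
// steady_clock is monotonic, so the measurement is not affected
// by system clock adjustments mid-run.
double time_ms(const std::function<void()>& run) {
    auto t0 = std::chrono::steady_clock::now();
    run();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

For the CUDA side we would additionally synchronize the device (e.g. `cudaDeviceSynchronize()`) inside the callable before the end timestamp, so kernel launches are not timed asynchronously.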
Hope to Achieve
If we end up with more time, we would like to implement a different graph search algorithm from the same A* family to
compare the effectiveness of each in parallel. If given even more time, we would attempt to apply the algorithm to
higher dimensional spaces, such as robotic arms or 3D games.
If all goes well, our demo will include a side-by-side comparison of two AIs, one using the parallel and one the
sequential implementation of D* Lite, making the difference in speed visible. We may need to speed up or slow down the game
in order to show a drastic difference.
If we implement another graph search algorithm, we would hope to analyze the differences between the two algorithms
when implemented in parallel. It would also be interesting to see whether the algorithm that runs faster sequentially also
runs faster in parallel.
Platform Choice
We will be using CUDA on the Gates machines, which have NVIDIA GTX 480 GPUs. As we are currently limiting ourselves
to traversing planar graphs, the GPUs may not show significant speedup due to the limited graph size. However,
since our long-term vision is to scale our graph search to multidimensional graphs, our implementation will eventually
utilize much of the GPU's processing power.
Schedule
Week 1 (4/4-4/13) Research and understand our algorithm. Collect all resources we might need (graphs to run our code on, for example). Write pseudocode so we have a skeleton to work from.
Week 2 (4/14-4/20) Implement a sequential version of D* Lite.
Week 3 (4/19-4/27) Parallelize our implementation. Implement caching if possible.
Week 4 (4/28-5/4) Debugging and fine-tuning.
Week 5 (5/5-5/9) Final tests and data analysis. Write up final report. Prepare for the competition!