|
|
|||
![]() |
Department of Engineering |
| University of Cambridge > Engineering Department > computing help > Languages > C++ |
The STL doesn't contain a tree container, but if you wrote one so that it had the standard hooks that algorithms require, then algorithms like copy etc. should work with it. Clearly a tree container would require some special routines that other containers wouldn't support, but C++ makes it easy to add functionality in, building on what you already have. Before we look at trees, let's see how to use some simpler containers.
#include <deque>
#include <algorithm>
#include <iostream>
using namespace std;
int main()
{
deque<int> Q; // An empty, elastic, container ready to hold ints.
// A deque is a double-ended queue. Queues (unlike lists) are
// data structures where you wouldn't expect to insert new items
// anywhere else but at the ends
Q.push_back(0); // Many containers have a push_back
// routine. It's a way to add elements
// to a container in a safe way - the
// container will grow if it's not big enough.
Q.push_back(2);
Q.push_back(1);
// the next line is a bit of magic you'll have to take on trust
// for now - it "copies" (i.e. prints) the elements of Q to
// the output stream. A simple "for" loop could be used instead
// but it's tempting to avoid using "for" altogether.
copy(Q.begin(), Q.end(), ostream_iterator<int>(cout, " "));
// Output: 0 2 1
cout <<endl;
// Now let's sort!
sort(Q.begin(), Q.end());
// the arguments we've given to sort mean that the whole queue
// will be sorted, and that the result will be copied back to
// the original queue. sort uses the default sorting criterium
// for integers, and can use the swap function of Q to switch
// elements around.
copy(Q.begin(), Q.end(), ostream_iterator<int>(cout, " "));
// Output: 0 1 2
cout <<endl;
}
To create a queue of strings rather than a queue of
integers, we don't need to change much. This ease of coping with
different data types will be useful when we come to create trees.
#include <deque>
#include <string> // added so that we can use strings
#include <algorithm>
#include <iostream>
using namespace std;
int main()
{
deque<string> Q; // changed
Q.push_back("standard");
Q.push_back("template");
Q.push_back("library");
// the next line has been changed to support strings rather than integers
copy(Q.begin(), Q.end(), ostream_iterator<string>(cout, " "));
cout <<endl;
// Now let's sort!
sort(Q.begin(), Q.end());
// sort uses the default sorting criterium for strings - alphabetical
// the next line has been changed to support strings rather than integers
copy(Q.begin(), Q.end(), ostream_iterator<string>(cout, " "));
cout <<endl;
}
As well as being able to have queues of standard items like strings
and integers, we can also invent our own types and have queues of
those. The main problem is that when we sort we have to specify
the sort criteria for our own types. It's sufficient to specify what the
'<' operator should do for your own type. Here's an example
#include <deque>
#include <algorithm>
#include <iostream>
using namespace std;
class simple {
public:
int value;
// redefine the '<' operator, so when 2 simple objects are
// compared, the 'value' fields of the objects are compared.
const bool operator<(const simple &rhs) const
{ return this->value < rhs.value; }
};
int main()
{
deque<simple> Q; // An empty, elastic, container.
simple s;
s.value=0;
Q.push_back(s);
s.value=2;
Q.push_back(s);
s.value=1;
Q.push_back(s);
for(int i=0;i<Q.size(); i++){
cout << Q[i].value; cout << " ";
}
cout <<endl;
// Output: 0 2 1
// Now let's sort! sort uses the '<' operator that we created in the 'simple' class
sort(Q.begin(), Q.end());
for(int i=0;i<Q.size(); i++){
cout << Q[i].value; cout << " ";
}
cout <<endl;
// Output: 0 1 2
}
Binaries trees (where each node has at most 2 children) are commonly used - not because they most closely model reality, but because they're computationally and mathematically more easy to deal with.
Suppose that a program were to be given 10,000 words (non-unique) and had to list the unique words used. Suppose also that memory was at a premium, so that storing every non-unique word is to be avoided. Each time a word is read the program needs to check if the word's new and store it only if it is. If the words are stored in an array alphabetically, searching will be faster, but adding a word to the start of the array will be slow because subsequent words will have to be shifted.
Using a binary tree will speed things up. Each time a word is read,
the program starts at the top of the tree. If the node is empty, the
word is stored there, otherwise the left or right branch is taken according
to whether the word comes before or after the word stored in the node.
For example, suppose the first 5 words were "standard", "template", "library",
"data" and "structures". "standard" would be stored in the top node. "template"
would be stored in the right-child node of the top node. "library"
would be stored in the left-child node of the top node.
standard
/ \
library template
"data" comes before "standard" so the left link will be taken. "data"
comes before "library", so "data" will be stored in the left-child node of
"library". The final tree looks like this
standard
/ \
library template
/ /
data structures
If this tree remains balanced, then it could fit 10,000 words within 15 layers,
so a word-search would require only 15 comparisons at most. Insertions (new nodes)
are
cheap too. Contrast this with the worst case array scenario, where 10,000
comparisons might need to be made before a search is complete!
However, if the words are supplied in alphabetical order, this tree-based
method suffers - each new word will add a right-node, deepening the tree
each time, making searching as slow as with arrays. Various techniques exist
to combat this. The benefits of these
techniques are so
enormous that simple binary trees aren't used in big programs.
With either method there's a choice of ways to order operations on the node.
-
/ \
* *
/ \ / \
2 + 3 -
/ \ / \
x 5 y 4
If you do a depth-first
search, processing each left-branch and right-branch of a node before printing
the node's contents you'll get
2 x 5 + * 3 y 4 - * -
(check to see if you agree with this!) which is the expression in Reverse Polish notation (used by older calculators
and the postscript language) but if you deal with nodes differently
(still depth-first search, but this time processing the left-branch, printing the contents of the node, then
processing the right-branch) you'll get something much closer to the
normal representation.
The depth-first algorithm can be described more formally as follows
A
/ \
B C
/ \ / \
D E F G
/ \ / \
H I J K
The STL makes it relatively easy to translate this algorithm into code
nearly line-by-line. If each vertex has a list of sub-vertices called
child then
list<Vertex > q;
q.push_back( start );
creates and initialises a list, and
while( !q.empty( ) ){
Vertex v = q.front( ); // get the first item
if (!v.goal_reached()) {
q.pop_front( ); // remove it from the list
for( int i = 0; i < v.child.size( ); i++ )
q.push_front(v.child[i]); // push each child to the front of the queue
}
}
modifies the list.
It's sometimes possible to sort this list of nodes to be visited so that the most promising nodes are visited first.
deque<Vertex> q;
deque<Vertex> new_wave;
while( !q.empty( ) ){
Vertex v = q.front( ); // get the first item
if (!v.goal_reached()) {
q.pop_front( ); // remove it from the list
for( int i = 0; i < v.child.size( ); i++ )
new_wave.push_front(v.child[i]); // push each child onto a queue
// Now decide whether to sort this new queue before adding
// it to the main queue, or add it to the main queue before
// sorting the whole queue
if (mode==hill_climbing) {
sort(new_wave.begin(),new_wave.end());
while(!new_wave.empty()){
q.push_front(new_wave.back( ));
new_wave.pop_back( );
}
}
else {
while(!new_wave.empty()){
q.push_front(new_wave.back( ));
new_wave.pop_back( );
}
sort(q.begin(),q.end());
}
}
}
Branch and Bound methods break the set of feasible solutions into
smaller subsets, and find for each subset a lower bound (as high as
possible) on the function one's trying to maximise in the hope of
isolating the optimal solution. These subsets often take the form of
subtrees. If the subsets are well chosen, and if the
bounds are easy to calculate, these methods can greatly reduce the
number of nodes that need to be explicitly searched.
Many tree traversal strategies require the program to remember previous nodes in the path so that backtracking is possible. This (and recursion) can use up a lot of memory. With threaded trees each node knows its sucessor - with a depth-first search parents lead to their first child which in turn leads to siblings. The last sibling leads to the next node to be visited. Maintaining such a tree requires more work when nodes are added or removed, and makes traversal less flexible but with each node having a successor, it becomes more possible to treat the Graph as an STL container on which iterators can operate.
If some edge (u,v) is in graph , then vertex v is adjacent to vertex u. In a directed graph, edge (u,v) is an out-edge of vertex u and an in-edge of vertex v. In an undirected graph edge (u,v) is incident on vertices u and v. In a directed graph, the number of out-edges of a vertex is its out-degree and the number of in-edges is its in-degree. For an undirected graph, the number of edges incident to a vertex is its degree.
A path is a sequence of edges in a graph such that the target vertex of each edge is the source vertex of the next edge in the sequence. If there is a path starting at vertex u and ending at vertex v we say that v is reachable from u. A path is simple if none of the vertices in the sequence are repeated. A path is a cycle if the first and last vertex in the path are the same. A graph with no cycles is acyclic.
The following directed graph has 6 nodes. The Dijkstra-Moore algorithm can be used to find the shortest path from a given node to each of the others as long as costs aren't negative. A breadth-size search from the first node radiates out across the graph. If it reaches an unvisited node, it sets its distance from the first node to be the distance of the previous node from the first node, plus the length of the edge from the previous node. If a node's already been visited, it will already have been given a distance from the first node, but this might not be the shortest distance - the current path might be shorter. So a comparison is done, and the node's distance is set to the minimum.
For example, in the graph below, suppose that A was the source node. In the first stage, node B's distance would be set to 7 and node C's distance to 1. During the second stage, routes from nodes B and C would be explored. When node B is reached from node C, the distance is 1+5, so node B's distance is reset to 6 from its original 7. As the algorithm progresses, node B's distance will later be reset to 5, which is the final value for node B's distance. The algorithm terminates after 5 stages because by then all of the routes will have been traversed - routes can't have more than number_of_vertices - 1 edges.
A graph is an STL vector of vertices. Each vertex has an STL vector of edges.
For flexibility the details of the search task are
separate from the configuration of the graph.
enum ordering {preorder, inorder, postorder};
enum search_mode {breadth_first, depth_first, hill_climbing, best_first, Astar, shortest_paths, min_spanning_tree};
class Search_Description
{
public:
ordering order;
search_mode searching;
string startnode;
string goalnode;
bool printnode; // whether the visited node-name is printed
};
The vertices and edges can have associated information
(cost, names, etc).
The graph and search specifications can be read from a
file or be specified interactively.
The minimum-distance problem above can be set up using the following file
A B 7 anon
A C 1 anon
B D 4 anon
B F 1 anon
C B 5 anon
C E 2 anon
C F 7 anon
E B 2 anon
E D 5 anon
F E 3 anon
SEARCHMODE shortest
STARTNODE A
Most of the lines specify directed edges with weights (the "anon" means that
the edges don't have names). Vertices are automatically created on
demand. The output is
B is 5 from the source.
Shortest path is A to C to E to B
C is 1 from the source.
Shortest path is A to C
D is 8 from the source.
Shortest path is A to C to E to D
F is 6 from the source.
Shortest path is A to C to E to B to F
E is 3 from the source.
Shortest path is A to C to E
The hill climbing problem above can be set up with the following file
A B 1 anon
B C 1 anon
C D 1 anon
A E 1 anon
E F 1 anon
NODEVALUE A 0
NODEVALUE B 2
NODEVALUE C 0
NODEVALUE D 1
NODEVALUE E 1
NODEVALUE F 3
// For best-first change the next line to SEARCHMODE bestfirst
SEARCHMODE hill
STARTNODE A
GOALNODE F
Graph member functions include
#include "graphs.cc"
int main( int argc, char *argv[ ] )
{
Graph g;
Search_Description s;
vector<string> nodenames;
nodenames.push_back("A");
nodenames.push_back("B");
nodenames.push_back("C");
nodenames.push_back("D");
nodenames.push_back("E");
nodenames.push_back("F");
for (int i=0;i<6;i++)
for (int j=i+1;j<6;j++)
g.addEdge(nodenames[i],nodenames[j],i+j,"anon");
g.shortestpaths("B");
return 0;
}
Answer: Generic programming is a programming method that is based in finding the most abstract representations of efficient algorithms. That is, you start with an algorithm and find the most general set of requirements that allows it to perform and to perform efficiently. The amazing thing is that many different algorithms need the same set of requirements and there are multiple implementations of these requirements. The analogous fact in mathematics is that many different theorems depend on the same set of axioms and there are many different models of the same axioms. Abstraction works! Generic programming assumes that there are some fundamental laws that govern the behavior of software components and that it is possible to design interoperable modules based on these laws. It is also possible to use the laws to guide our software design. STL is an example of generic programming. C++ is a language in which I was able to produce a convincing example.
Question: I think STL and Generic Programming mark a definite departure from the common C++ programming style, which I find is almost completely derived from SmallTalk. Do you agree?
Answer: Yes. STL is not object oriented. I think that object orientedness is almost as much of a hoax as Artificial Intelligence.
| | C++ | Languages | computing help | |