C++ Notes
The C++ runtime and generated grammars look very much the
same as the Java ones. There are some subtle differences,
though; more on this later.
The following is a bit Unix-centric. For Windows, some
contributed project files can be found in lib/cpp/contrib; these
may be slightly outdated.
The runtime files are located in the lib/cpp subdirectory
of the ANTLR distribution. Building them is generally done via the
top-level configure script and the Makefile that it generates.
Before configuring, please read INSTALL.txt in the top-level directory. The file lib/cpp/README may contain some extra information on specific target machines. A typical build looks like this:
./configure --prefix=/usr/local
make
Installing ANTLR and the runtime is then done by typing:
make install
With the prefix above, this installs the runtime library libantlr.a in
/usr/local/lib and the header files in
/usr/local/include/antlr. Two convenience scripts, antlr and
antlr-config, are also installed into /usr/local/bin. The first
script takes care of invoking ANTLR; the second can be used to
query the right compiler options for building files generated by
ANTLR.
Generally, you will compile the ANTLR-generated files with
something similar to:
c++ -c MyParser.cpp -I/usr/local/include
Linking is done with something similar to:
c++ -o MyExec <your .o files> -L/usr/local/lib -lantlr
To get ANTLR to generate C++ code, you have to add
language="Cpp";
to the global options section. After that, things are pretty
much the same as in Java mode, except that all token and AST
classes are wrapped by a reference counting class (this makes
life easier in some ways and much harder in others).
The reference counting class uses operator-> to reference the
object it is wrapping. As a result, in C++ mode you use -> instead
of Java's '.'. See the examples in examples/cpp for some
illustrations.
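To make the '->' versus '.' difference concrete, here is a minimal, self-contained sketch of a reference-counting wrapper. The names RefCount and Node are hypothetical, and this class is a simplified stand-in written for illustration, not ANTLR's ASTRefCount (which differs in detail). It only shows why member access goes through operator-> and how copies share one object.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified reference-counting wrapper (NOT ANTLR's ASTRefCount).
template <typename T>
class RefCount {
public:
    explicit RefCount(T* p = nullptr) : ptr_(p), count_(p ? new long(1) : nullptr) {}

    // Copying shares the wrapped object and bumps the shared count.
    RefCount(const RefCount& other) : ptr_(other.ptr_), count_(other.count_) {
        if (count_) ++*count_;
    }

    RefCount& operator=(const RefCount& other) {
        if (this != &other) {
            release();
            ptr_ = other.ptr_;
            count_ = other.count_;
            if (count_) ++*count_;
        }
        return *this;
    }

    ~RefCount() { release(); }

    // Member access is forwarded to the wrapped object, which is why
    // C++ code writes node->getText() where Java writes node.getText().
    T* operator->() const { return ptr_; }

    long useCount() const { return count_ ? *count_ : 0; }

private:
    // Drop this reference; delete the object when the last one goes away.
    void release() {
        if (count_ && --*count_ == 0) {
            delete ptr_;
            delete count_;
        }
        ptr_ = nullptr;
        count_ = nullptr;
    }

    T* ptr_;
    long* count_;
};

// Hypothetical node type used only for this illustration.
struct Node {
    std::string text;
    std::string getText() const { return text; }
};
```

For example, after RefCount<Node> a(new Node{"PLUS"}); a copy b = a; lets you call b->getText(), and the node is deleted only when both a and b have gone out of scope.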
New as of ANTLR 2.7.2: if you supply the
buildAST=true
option to a parser, then you have
to set and initialize an ASTFactory for the parser and for the
treewalkers that use the resulting AST.
ASTFactory my_factory; // generates CommonAST by default
MyParser parser( lexer ); // lexer: an instance of your generated lexer
// Set up the AST factory; repeat this for all parsers using the AST
parser.initializeASTFactory( my_factory );
parser.setASTFactory( &my_factory );
In C++ mode it is also possible to override the AST type used
by the code generated by ANTLR. To do this you have to do the
following:
-
Define a custom AST class like the following:
#ifndef __MY_AST_H__
#define __MY_AST_H__

#include <antlr/CommonAST.hpp>

class MyAST;

typedef ANTLR_USE_NAMESPACE(antlr)ASTRefCount<MyAST> RefMyAST;

/** Custom AST class that adds line numbers to the AST nodes.
 * It is easily extended with columns. Filenames will take more work, since
 * you'll need a custom token class as well (one that contains the
 * filename).
 */
class MyAST : public ANTLR_USE_NAMESPACE(antlr)CommonAST {
public:
    // copy constructor
    MyAST( const MyAST& other )
    : CommonAST(other)
    , line(other.line)
    {
    }
    // Default constructor
    MyAST( void ) : CommonAST(), line(0) {}
    virtual ~MyAST( void ) {}
    // get the line number of the node (or try to derive it from the child node)
    virtual int getLine( void ) const
    {
        // Most of the time the line number is not set if the node is an
        // imaginary one. Usually this means it has a child, so refer to the
        // child's line number. Of course this could be extended a bit.
        // Based on an example by Peter Morling.
        if ( line != 0 )
            return line;
        if( getFirstChild() )
            return ( RefMyAST(getFirstChild())->getLine() );
        return 0;
    }
    virtual void setLine( int l )
    {
        line = l;
    }
    /** The initialize methods are called by the tree building constructs.
     * Depending on which version is called, the line number is filled in;
     * i.e. depending on how the node is constructed, it will or will not
     * have its line number filled in (imaginary nodes!).
     */
    virtual void initialize(int t, const ANTLR_USE_NAMESPACE(std)string& txt)
    {
        CommonAST::initialize(t,txt);
        line = 0;
    }
    virtual void initialize( ANTLR_USE_NAMESPACE(antlr)RefToken t )
    {
        CommonAST::initialize(t);
        line = t->getLine();
    }
    virtual void initialize( RefMyAST ast )
    {
        CommonAST::initialize(ANTLR_USE_NAMESPACE(antlr)RefAST(ast));
        line = ast->getLine();
    }
    // for convenience; things will also work without it
    void addChild( RefMyAST c )
    {
        BaseAST::addChild( ANTLR_USE_NAMESPACE(antlr)RefAST(c) );
    }
    // for convenience; things will also work without it
    void setNextSibling( RefMyAST c )
    {
        BaseAST::setNextSibling( ANTLR_USE_NAMESPACE(antlr)RefAST(c) );
    }
    // provide a clone of the node (no sibling/child pointers are copied)
    virtual ANTLR_USE_NAMESPACE(antlr)RefAST clone( void )
    {
        return ANTLR_USE_NAMESPACE(antlr)RefAST(new MyAST(*this));
    }
    static ANTLR_USE_NAMESPACE(antlr)RefAST factory( void )
    {
        return ANTLR_USE_NAMESPACE(antlr)RefAST(RefMyAST(new MyAST()));
    }
private:
    int line;
};

#endif
-
Tell ANTLR's C++ code generator to use your RefMyAST by
including the following in the options section of your grammars:
ASTLabelType = "RefMyAST";
After that, you only need to tell each new parser instance,
before invoking it, that it should use the AST factory defined
in your class. This is done like this:
// make factory with default type of MyAST
ASTFactory my_factory( "MyAST", MyAST::factory );
MyParser parser(lexer);
// make sure the factory knows about all AST types in the parser
parser.initializeASTFactory(my_factory);
// and tell the parser about the factory
parser.setASTFactory( &my_factory );
After these steps you can access methods/attributes of (Ref)MyAST
directly (without typecasting) in parser/treewalker productions.
Forgetting to call setASTFactory results in a nice SIGSEGV or your OS's
equivalent. The default constructor of ASTFactory initializes the factory
to generate CommonAST objects.
If you use a 'chain' of parsers/treewalkers, you have to make sure
they all share the same AST factory. Also, if you add new definitions of
AST nodes/tokens in downstream parsers/treewalkers, you have to apply the
respective initializeASTFactory methods to this factory.
This is all demonstrated in the examples/cpp/treewalk example.
Heterogeneous ASTs
Heterogeneous ASTs should now (as of 2.7.2) work in C++ mode, probably
with some caveats.
The heteroAST example shows how to set things up. A short excerpt:
ASTFactory ast_factory;
parser.initializeASTFactory(ast_factory);
parser.setASTFactory(&ast_factory);
A small excerpt from the generated initializeASTFactory method:
void CalcParser::initializeASTFactory( antlr::ASTFactory& factory )
{
    factory.registerFactory(4, "PLUSNode", PLUSNode::factory);
    factory.registerFactory(5, "MULTNode", MULTNode::factory);
    factory.registerFactory(6, "INTNode", INTNode::factory);
    factory.setMaxNodeType(11);
}
After these steps, ANTLR should be able to decide which factory to use
for which node type.
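The generated initializeASTFactory method essentially fills a table that maps token types to node-creating functions. The self-contained sketch below models that idea, together with the rule that a chain of parsers/treewalkers shares one factory and downstream stages only add registrations. NodeFactory, registerCreator, create, upstreamInit, downstreamInit and the node classes are hypothetical names for illustration only; they are not ANTLR's ASTFactory API, and the real factory also tracks type names and a maximum node type.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical node hierarchy; the base type plays the role of CommonAST.
struct Node {
    virtual ~Node() {}
    virtual std::string kind() const { return "CommonAST"; }
};
struct PlusNode : Node {
    std::string kind() const override { return "PLUSNode"; }
};
struct IntNode : Node {
    std::string kind() const override { return "INTNode"; }
};

// Hypothetical type-indexed factory (NOT ANTLR's ASTFactory API).
class NodeFactory {
public:
    typedef std::unique_ptr<Node> (*Creator)();

    // Associate a creator with one token type, in the spirit of
    // factory.registerFactory(4, "PLUSNode", PLUSNode::factory).
    void registerCreator(int tokenType, Creator creator) {
        creators_[tokenType] = creator;
    }

    // Use the creator registered for tokenType, or fall back to the
    // default node type when none was registered.
    std::unique_ptr<Node> create(int tokenType) const {
        auto it = creators_.find(tokenType);
        if (it != creators_.end())
            return it->second();
        return std::unique_ptr<Node>(new Node());
    }

private:
    std::map<int, Creator> creators_;
};

std::unique_ptr<Node> makePlus() { return std::unique_ptr<Node>(new PlusNode()); }
std::unique_ptr<Node> makeInt() { return std::unique_ptr<Node>(new IntNode()); }

// Each stage of a parser/treewalker chain registers its own node types into
// the SAME factory object; the downstream stage only adds to it.
void upstreamInit(NodeFactory& factory) { factory.registerCreator(4, makePlus); }
void downstreamInit(NodeFactory& factory) { factory.registerCreator(6, makeInt); }
```

Because both init functions write into one shared NodeFactory, a type registered upstream is still available downstream, and unregistered types fall back to the default node, mirroring how an unconfigured ASTFactory produces CommonAST.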
In C++ mode ANTLR supports some extra functionality to make
life a little easier.
Inserting Code
In C++ mode, some extra control is supplied over the places
where code can be placed in the generated files. These are
extensions of the header directive. The syntax is:
header "<identifier>" { }
identifier | where
pre_include_hpp | Code is inserted before ANTLR-generated includes in the header file.
post_include_hpp | Code is inserted after ANTLR-generated includes in the header file, but outside any generated namespace specifications.
pre_include_cpp | Code is inserted before ANTLR-generated includes in the cpp file.
post_include_cpp | Code is inserted after ANTLR-generated includes in the cpp file, but outside any generated namespace specifications.
Pacifying the preprocessor
Sometimes various tree building constructs with '#'
in them clash with the C/C++ preprocessor. ANTLR's
preprocessor for actions is slightly extended in C++ mode to
alleviate these pains.
NOTE: At some point I plan to replace the '#' by
something different that gives less trouble in C++.
The following preprocessor constructs are not
touched (and as a result you cannot use these as labels for
AST nodes):
if
define
ifdef
ifndef
else
elif
endif
warning
error
ident
pragma
include
As another extra, it's possible to escape '#' signs
with a backslash, e.g. "\#". When the action lexer sees these,
they get translated to simple '#' characters.
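A rough sketch of the two rules above, assuming plain std::string processing: isPreprocessorKeyword checks the word following a '#' against the list of directives that are left untouched, and unescapeHashes turns each "\#" into a plain '#'. Both function names are hypothetical; ANTLR's real action lexer is generated from a grammar and works differently.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical helper: true if the word after '#' is one of the
// preprocessor directives that the C++ action preprocessor leaves alone.
bool isPreprocessorKeyword(const std::string& word) {
    static const std::set<std::string> keywords = {
        "if", "define", "ifdef", "ifndef", "else", "elif", "endif",
        "warning", "error", "ident", "pragma", "include"};
    return keywords.count(word) != 0;
}

// Hypothetical helper: replace every backslash-escaped '#' ("\#") with '#'.
// Backslashes not followed by '#' are copied unchanged.
std::string unescapeHashes(const std::string& action) {
    std::string out;
    out.reserve(action.size());
    for (std::string::size_type i = 0; i < action.size(); ++i) {
        if (action[i] == '\\' && i + 1 < action.size() && action[i + 1] == '#') {
            out += '#';  // emit a plain '#'
            ++i;         // skip the escaped '#'
        } else {
            out += action[i];
        }
    }
    return out;
}
```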
A template grammar file for C++
header "pre_include_hpp" {
    // gets inserted before antlr generated includes in the header file
}
header "post_include_hpp" {
    // gets inserted after antlr generated includes in the header file
    // outside any generated namespace specifications
}
header "pre_include_cpp" {
    // gets inserted before the antlr generated includes in the cpp file
}
header "post_include_cpp" {
    // gets inserted after the antlr generated includes in the cpp file
}
header {
    // gets inserted after generated namespace specifications in the header
    // file, but outside the generated class
}
options {
    language="Cpp";
    namespace="something";      // encapsulate code in this namespace
//  namespaceStd="std";         // cosmetic option to get rid of long defines
                                // in generated code
//  namespaceAntlr="antlr";     // cosmetic option to get rid of long defines
                                // in generated code
    genHashLines = true;        // generate #line directives (false turns them off)
}
{
    // global stuff in the cpp file
    ...
}
class MyParser extends Parser;
options {
    exportVocab=My;
}
{
    // additional methods and members
    ...
}
... rules ...
{
    // global stuff in the cpp file
    ...
}
class MyLexer extends Lexer;
options {
    exportVocab=My;
}
{
    // additional methods and members
    ...
}
... rules ...
{
    // global stuff in the cpp file
    ...
}
class MyTreeParser extends TreeParser;
options {
    exportVocab=My;
}
{
    // additional methods and members
    ...
}
... rules ...