Blame - doc/bison.texinfo - fp2-dev/platform/external/bison

blob: 6c67b59392b7d3ddf648a071eec46f2dfc3d6197 [file] [log] [blame]

The Android Open Source Project	b7f2b4d	2008-12-17 18:04:06 -0800	[diff] [blame^]	1	\input texinfo @c --texinfo--
				2	@comment %**start of header
				3	@setfilename bison.info
				4	@include version.texi
				5	@settitle Bison @value{VERSION}
				6	@setchapternewpage odd
				7
				8	@finalout
				9
				10	@c SMALL BOOK version
				11	@c This edition has been formatted so that you can format and print it in
				12	@c the smallbook format.
				13	@c @smallbook
				14
				15	@c Set following if you want to document %default-prec and %no-default-prec.
				16	@c This feature is experimental and may change in future Bison versions.
				17	@c @set defaultprec
				18
				19	@ifnotinfo
				20	@syncodeindex fn cp
				21	@syncodeindex vr cp
				22	@syncodeindex tp cp
				23	@end ifnotinfo
				24	@ifinfo
				25	@synindex fn cp
				26	@synindex vr cp
				27	@synindex tp cp
				28	@end ifinfo
				29	@comment %**end of header
				30
				31	@copying
				32
				33	This manual is for @acronym{GNU} Bison (version @value{VERSION},
				34	@value{UPDATED}), the @acronym{GNU} parser generator.
				35
				36	Copyright @copyright{} 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998,
				37	1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006 Free Software Foundation, Inc.
				38
				39	@quotation
				40	Permission is granted to copy, distribute and/or modify this document
				41	under the terms of the @acronym{GNU} Free Documentation License,
				42	Version 1.2 or any later version published by the Free Software
				43	Foundation; with no Invariant Sections, with the Front-Cover texts
				44	being ``A @acronym{GNU} Manual,'' and with the Back-Cover Texts as in
				45	(a) below. A copy of the license is included in the section entitled
				46	``@acronym{GNU} Free Documentation License.''
				47
				48	(a) The @acronym{FSF}'s Back-Cover Text is: ``You have freedom to copy
				49	and modify this @acronym{GNU} Manual, like @acronym{GNU} software.
				50	Copies published by the Free Software Foundation raise funds for
				51	@acronym{GNU} development.''
				52	@end quotation
				53	@end copying
				54
				55	@dircategory Software development
				56	@direntry
				57	* bison: (bison). @acronym{GNU} parser generator (Yacc replacement).
				58	@end direntry
				59
				60	@titlepage
				61	@title Bison
				62	@subtitle The Yacc-compatible Parser Generator
				63	@subtitle @value{UPDATED}, Bison Version @value{VERSION}
				64
				65	@author by Charles Donnelly and Richard Stallman
				66
				67	@page
				68	@vskip 0pt plus 1filll
				69	@insertcopying
				70	@sp 2
				71	Published by the Free Software Foundation @*
				72	51 Franklin Street, Fifth Floor @*
				73	Boston, MA 02110-1301 USA @*
				74	Printed copies are available from the Free Software Foundation.@*
				75	@acronym{ISBN} 1-882114-44-2
				76	@sp 2
				77	Cover art by Etienne Suvasa.
				78	@end titlepage
				79
				80	@contents
				81
				82	@ifnottex
				83	@node Top
				84	@top Bison
				85	@insertcopying
				86	@end ifnottex
				87
				88	@menu
				89	* Introduction::
				90	* Conditions::
				91	* Copying:: The @acronym{GNU} General Public License says
				92	how you can copy and share Bison
				93
				94	Tutorial sections:
				95	* Concepts:: Basic concepts for understanding Bison.
				96	* Examples:: Three simple explained examples of using Bison.
				97
				98	Reference sections:
				99	* Grammar File:: Writing Bison declarations and rules.
				100	* Interface:: C-language interface to the parser function @code{yyparse}.
				101	* Algorithm:: How the Bison parser works at run-time.
				102	* Error Recovery:: Writing rules for error recovery.
				103	* Context Dependency:: What to do if your language syntax is too
				104	messy for Bison to handle straightforwardly.
				105	* Debugging:: Understanding or debugging Bison parsers.
				106	* Invocation:: How to run Bison (to produce the parser source file).
				107	* C++ Language Interface:: Creating C++ parser objects.
				108	* FAQ:: Frequently Asked Questions
				109	* Table of Symbols:: All the keywords of the Bison language are explained.
				110	* Glossary:: Basic concepts are explained.
				111	* Copying This Manual:: License for copying this manual.
				112	* Index:: Cross-references to the text.
				113
				114	@detailmenu
				115	--- The Detailed Node Listing ---
				116
				117	The Concepts of Bison
				118
				119	* Language and Grammar:: Languages and context-free grammars,
				120	as mathematical ideas.
				121	* Grammar in Bison:: How we represent grammars for Bison's sake.
				122	* Semantic Values:: Each token or syntactic grouping can have
				123	a semantic value (the value of an integer,
				124	the name of an identifier, etc.).
				125	* Semantic Actions:: Each rule can have an action containing C code.
				126	* GLR Parsers:: Writing parsers for general context-free languages.
				127	* Locations Overview:: Tracking Locations.
				128	* Bison Parser:: What are Bison's input and output,
				129	how is the output used?
				130	* Stages:: Stages in writing and running Bison grammars.
				131	* Grammar Layout:: Overall structure of a Bison grammar file.
				132
				133	Writing @acronym{GLR} Parsers
				134
				135	* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars.
				136	* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities.
				137	* GLR Semantic Actions:: Deferred semantic actions have special concerns.
				138	* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler.
				139
				140	Examples
				141
				142	* RPN Calc:: Reverse polish notation calculator;
				143	a first example with no operator precedence.
				144	* Infix Calc:: Infix (algebraic) notation calculator.
				145	Operator precedence is introduced.
				146	* Simple Error Recovery:: Continuing after syntax errors.
				147	* Location Tracking Calc:: Demonstrating the use of @@@var{n} and @@$.
				148	* Multi-function Calc:: Calculator with memory and trig functions.
				149	It uses multiple data-types for semantic values.
				150	* Exercises:: Ideas for improving the multi-function calculator.
				151
				152	Reverse Polish Notation Calculator
				153
				154	* Decls: Rpcalc Decls. Prologue (declarations) for rpcalc.
				155	* Rules: Rpcalc Rules. Grammar Rules for rpcalc, with explanation.
				156	* Lexer: Rpcalc Lexer. The lexical analyzer.
				157	* Main: Rpcalc Main. The controlling function.
				158	* Error: Rpcalc Error. The error reporting function.
				159	* Gen: Rpcalc Gen. Running Bison on the grammar file.
				160	* Comp: Rpcalc Compile. Run the C compiler on the output code.
				161
				162	Grammar Rules for @code{rpcalc}
				163
				164	* Rpcalc Input::
				165	* Rpcalc Line::
				166	* Rpcalc Expr::
				167
				168	Location Tracking Calculator: @code{ltcalc}
				169
				170	* Decls: Ltcalc Decls. Bison and C declarations for ltcalc.
				171	* Rules: Ltcalc Rules. Grammar rules for ltcalc, with explanations.
				172	* Lexer: Ltcalc Lexer. The lexical analyzer.
				173
				174	Multi-Function Calculator: @code{mfcalc}
				175
				176	* Decl: Mfcalc Decl. Bison declarations for multi-function calculator.
				177	* Rules: Mfcalc Rules. Grammar rules for the calculator.
				178	* Symtab: Mfcalc Symtab. Symbol table management subroutines.
				179
				180	Bison Grammar Files
				181
				182	* Grammar Outline:: Overall layout of the grammar file.
				183	* Symbols:: Terminal and nonterminal symbols.
				184	* Rules:: How to write grammar rules.
				185	* Recursion:: Writing recursive rules.
				186	* Semantics:: Semantic values and actions.
				187	* Locations:: Locations and actions.
				188	* Declarations:: All kinds of Bison declarations are described here.
				189	* Multiple Parsers:: Putting more than one Bison parser in one program.
				190
				191	Outline of a Bison Grammar
				192
				193	* Prologue:: Syntax and usage of the prologue.
				194	* Bison Declarations:: Syntax and usage of the Bison declarations section.
				195	* Grammar Rules:: Syntax and usage of the grammar rules section.
				196	* Epilogue:: Syntax and usage of the epilogue.
				197
				198	Defining Language Semantics
				199
				200	* Value Type:: Specifying one data type for all semantic values.
				201	* Multiple Types:: Specifying several alternative data types.
				202	* Actions:: An action is the semantic definition of a grammar rule.
				203	* Action Types:: Specifying data types for actions to operate on.
				204	* Mid-Rule Actions:: Most actions go at the end of a rule.
				205	This says when, why and how to use the exceptional
				206	action in the middle of a rule.
				207
				208	Tracking Locations
				209
				210	* Location Type:: Specifying a data type for locations.
				211	* Actions and Locations:: Using locations in actions.
				212	* Location Default Action:: Defining a general way to compute locations.
				213
				214	Bison Declarations
				215
				216	* Require Decl:: Requiring a Bison version.
				217	* Token Decl:: Declaring terminal symbols.
				218	* Precedence Decl:: Declaring terminals with precedence and associativity.
				219	* Union Decl:: Declaring the set of all semantic value types.
				220	* Type Decl:: Declaring the choice of type for a nonterminal symbol.
				221	* Initial Action Decl:: Code run before parsing starts.
				222	* Destructor Decl:: Declaring how symbols are freed.
				223	* Expect Decl:: Suppressing warnings about parsing conflicts.
				224	* Start Decl:: Specifying the start symbol.
				225	* Pure Decl:: Requesting a reentrant parser.
				226	* Decl Summary:: Table of all Bison declarations.
				227
				228	Parser C-Language Interface
				229
				230	* Parser Function:: How to call @code{yyparse} and what it returns.
				231	* Lexical:: You must supply a function @code{yylex}
				232	which reads tokens.
				233	* Error Reporting:: You must supply a function @code{yyerror}.
				234	* Action Features:: Special features for use in actions.
				235	* Internationalization:: How to let the parser speak in the user's
				236	native language.
				237
				238	The Lexical Analyzer Function @code{yylex}
				239
				240	* Calling Convention:: How @code{yyparse} calls @code{yylex}.
				241	* Token Values:: How @code{yylex} must return the semantic value
				242	of the token it has read.
				243	* Token Locations:: How @code{yylex} must return the text location
				244	(line number, etc.) of the token, if the
				245	actions want that.
				246	* Pure Calling:: How the calling convention differs
				247	in a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}).
				248
				249	The Bison Parser Algorithm
				250
				251	* Look-Ahead:: Parser looks one token ahead when deciding what to do.
				252	* Shift/Reduce:: Conflicts: when either shifting or reduction is valid.
				253	* Precedence:: Operator precedence works by resolving conflicts.
				254	* Contextual Precedence:: When an operator's precedence depends on context.
				255	* Parser States:: The parser is a finite-state-machine with stack.
				256	* Reduce/Reduce:: When two rules are applicable in the same situation.
				257	* Mystery Conflicts:: Reduce/reduce conflicts that look unjustified.
				258	* Generalized LR Parsing:: Parsing arbitrary context-free grammars.
				259	* Memory Management:: What happens when memory is exhausted. How to avoid it.
				260
				261	Operator Precedence
				262
				263	* Why Precedence:: An example showing why precedence is needed.
				264	* Using Precedence:: How to specify precedence in Bison grammars.
				265	* Precedence Examples:: How these features are used in the previous example.
				266	* How Precedence:: How they work.
				267
				268	Handling Context Dependencies
				269
				270	* Semantic Tokens:: Token parsing can depend on the semantic context.
				271	* Lexical Tie-ins:: Token parsing can depend on the syntactic context.
				272	* Tie-in Recovery:: Lexical tie-ins have implications for how
				273	error recovery rules must be written.
				274
				275	Debugging Your Parser
				276
				277	* Understanding:: Understanding the structure of your parser.
				278	* Tracing:: Tracing the execution of your parser.
				279
				280	Invoking Bison
				281
				282	* Bison Options:: All the options described in detail,
				283	in alphabetical order by short options.
				284	* Option Cross Key:: Alphabetical list of long options.
				285	* Yacc Library:: Yacc-compatible @code{yylex} and @code{main}.
				286
				287	C++ Language Interface
				288
				289	* C++ Parsers:: The interface to generate C++ parser classes
				290	* A Complete C++ Example:: Demonstrating their use
				291
				292	C++ Parsers
				293
				294	* C++ Bison Interface:: Asking for C++ parser generation
				295	* C++ Semantic Values:: %union vs. C++
				296	* C++ Location Values:: The position and location classes
				297	* C++ Parser Interface:: Instantiating and running the parser
				298	* C++ Scanner Interface:: Exchanges between yylex and parse
				299
				300	A Complete C++ Example
				301
				302	* Calc++ --- C++ Calculator:: The specifications
				303	* Calc++ Parsing Driver:: An active parsing context
				304	* Calc++ Parser:: A parser class
				305	* Calc++ Scanner:: A pure C++ Flex scanner
				306	* Calc++ Top Level:: Conducting the band
				307
				308	Frequently Asked Questions
				309
				310	* Memory Exhausted:: Breaking the Stack Limits
				311	* How Can I Reset the Parser:: @code{yyparse} Keeps some State
				312	* Strings are Destroyed:: @code{yylval} Loses Track of Strings
				313	* Implementing Gotos/Loops:: Control Flow in the Calculator
				314	* Multiple start-symbols:: Factoring closely related grammars
				315	* Secure? Conform?:: Is Bison @acronym{POSIX} safe?
				316	* I can't build Bison:: Troubleshooting
				317	* Where can I find help?:: Troubleshouting
				318	* Bug Reports:: Troublereporting
				319	* Other Languages:: Parsers in Java and others
				320	* Beta Testing:: Experimenting development versions
				321	* Mailing Lists:: Meeting other Bison users
				322
				323	Copying This Manual
				324
				325	* GNU Free Documentation License:: License for copying this manual.
				326
				327	@end detailmenu
				328	@end menu
				329
				330	@node Introduction
				331	@unnumbered Introduction
				332	@cindex introduction
				333
				334	@dfn{Bison} is a general-purpose parser generator that converts an
				335	annotated context-free grammar into an @acronym{LALR}(1) or
				336	@acronym{GLR} parser for that grammar. Once you are proficient with
				337	Bison, you can use it to develop a wide range of language parsers, from those
				338	used in simple desk calculators to complex programming languages.
				339
				340	Bison is upward compatible with Yacc: all properly-written Yacc grammars
				341	ought to work with Bison with no change. Anyone familiar with Yacc
				342	should be able to use Bison with little trouble. You need to be fluent in
				343	C or C++ programming in order to use Bison or to understand this manual.
				344
				345	We begin with tutorial chapters that explain the basic concepts of using
				346	Bison and show three explained examples, each building on the last. If you
				347	don't know Bison or Yacc, start by reading these chapters. Reference
				348	chapters follow which describe specific aspects of Bison in detail.
				349
				350	Bison was written primarily by Robert Corbett; Richard Stallman made it
				351	Yacc-compatible. Wilfred Hansen of Carnegie Mellon University added
				352	multi-character string literals and other features.
				353
				354	This edition corresponds to version @value{VERSION} of Bison.
				355
				356	@node Conditions
				357	@unnumbered Conditions for Using Bison
				358
				359	The distribution terms for Bison-generated parsers permit using the
				360	parsers in nonfree programs. Before Bison version 2.2, these extra
				361	permissions applied only when Bison was generating @acronym{LALR}(1)
				362	parsers in C@. And before Bison version 1.24, Bison-generated
				363	parsers could be used only in programs that were free software.
				364
				365	The other @acronym{GNU} programming tools, such as the @acronym{GNU} C
				366	compiler, have never
				367	had such a requirement. They could always be used for nonfree
				368	software. The reason Bison was different was not due to a special
				369	policy decision; it resulted from applying the usual General Public
				370	License to all of the Bison source code.
				371
				372	The output of the Bison utility---the Bison parser file---contains a
				373	verbatim copy of a sizable piece of Bison, which is the code for the
				374	parser's implementation. (The actions from your grammar are inserted
				375	into this implementation at one point, but most of the rest of the
				376	implementation is not changed.) When we applied the @acronym{GPL}
				377	terms to the skeleton code for the parser's implementation,
				378	the effect was to restrict the use of Bison output to free software.
				379
				380	We didn't change the terms because of sympathy for people who want to
				381	make software proprietary. @strong{Software should be free.} But we
				382	concluded that limiting Bison's use to free software was doing little to
				383	encourage people to make other software free. So we decided to make the
				384	practical conditions for using Bison match the practical conditions for
				385	using the other @acronym{GNU} tools.
				386
				387	This exception applies when Bison is generating code for a parser.
				388	You can tell whether the exception applies to a Bison output file by
				389	inspecting the file for text beginning with ``As a special
				390	exception@dots{}''. The text spells out the exact terms of the
				391	exception.
				392
				393	@include gpl.texi
				394
				395	@node Concepts
				396	@chapter The Concepts of Bison
				397
				398	This chapter introduces many of the basic concepts without which the
				399	details of Bison will not make sense. If you do not already know how to
				400	use Bison or Yacc, we suggest you start by reading this chapter carefully.
				401
				402	@menu
				403	* Language and Grammar:: Languages and context-free grammars,
				404	as mathematical ideas.
				405	* Grammar in Bison:: How we represent grammars for Bison's sake.
				406	* Semantic Values:: Each token or syntactic grouping can have
				407	a semantic value (the value of an integer,
				408	the name of an identifier, etc.).
				409	* Semantic Actions:: Each rule can have an action containing C code.
				410	* GLR Parsers:: Writing parsers for general context-free languages.
				411	* Locations Overview:: Tracking Locations.
				412	* Bison Parser:: What are Bison's input and output,
				413	how is the output used?
				414	* Stages:: Stages in writing and running Bison grammars.
				415	* Grammar Layout:: Overall structure of a Bison grammar file.
				416	@end menu
				417
				418	@node Language and Grammar
				419	@section Languages and Context-Free Grammars
				420
				421	@cindex context-free grammar
				422	@cindex grammar, context-free
				423	In order for Bison to parse a language, it must be described by a
				424	@dfn{context-free grammar}. This means that you specify one or more
				425	@dfn{syntactic groupings} and give rules for constructing them from their
				426	parts. For example, in the C language, one kind of grouping is called an
				427	`expression'. One rule for making an expression might be, ``An expression
				428	can be made of a minus sign and another expression''. Another would be,
				429	``An expression can be an integer''. As you can see, rules are often
				430	recursive, but there must be at least one rule which leads out of the
				431	recursion.
				432
				433	@cindex @acronym{BNF}
				434	@cindex Backus-Naur form
				435	The most common formal system for presenting such rules for humans to read
				436	is @dfn{Backus-Naur Form} or ``@acronym{BNF}'', which was developed in
				437	order to specify the language Algol 60. Any grammar expressed in
				438	@acronym{BNF} is a context-free grammar. The input to Bison is
				439	essentially machine-readable @acronym{BNF}.
				440
				441	@cindex @acronym{LALR}(1) grammars
				442	@cindex @acronym{LR}(1) grammars
				443	There are various important subclasses of context-free grammar. Although it
				444	can handle almost all context-free grammars, Bison is optimized for what
				445	are called @acronym{LALR}(1) grammars.
				446	In brief, in these grammars, it must be possible to
				447	tell how to parse any portion of an input string with just a single
				448	token of look-ahead. Strictly speaking, that is a description of an
				449	@acronym{LR}(1) grammar, and @acronym{LALR}(1) involves additional
				450	restrictions that are
				451	hard to explain simply; but it is rare in actual practice to find an
				452	@acronym{LR}(1) grammar that fails to be @acronym{LALR}(1).
				453	@xref{Mystery Conflicts, ,Mysterious Reduce/Reduce Conflicts}, for
				454	more information on this.
				455
				456	@cindex @acronym{GLR} parsing
				457	@cindex generalized @acronym{LR} (@acronym{GLR}) parsing
				458	@cindex ambiguous grammars
				459	@cindex nondeterministic parsing
				460
				461	Parsers for @acronym{LALR}(1) grammars are @dfn{deterministic}, meaning
				462	roughly that the next grammar rule to apply at any point in the input is
				463	uniquely determined by the preceding input and a fixed, finite portion
				464	(called a @dfn{look-ahead}) of the remaining input. A context-free
				465	grammar can be @dfn{ambiguous}, meaning that there are multiple ways to
				466	apply the grammar rules to get the same inputs. Even unambiguous
				467	grammars can be @dfn{nondeterministic}, meaning that no fixed
				468	look-ahead always suffices to determine the next grammar rule to apply.
				469	With the proper declarations, Bison is also able to parse these more
				470	general context-free grammars, using a technique known as @acronym{GLR}
				471	parsing (for Generalized @acronym{LR}). Bison's @acronym{GLR} parsers
				472	are able to handle any context-free grammar for which the number of
				473	possible parses of any given string is finite.
				474
				475	@cindex symbols (abstract)
				476	@cindex token
				477	@cindex syntactic grouping
				478	@cindex grouping, syntactic
				479	In the formal grammatical rules for a language, each kind of syntactic
				480	unit or grouping is named by a @dfn{symbol}. Those which are built by
				481	grouping smaller constructs according to grammatical rules are called
				482	@dfn{nonterminal symbols}; those which can't be subdivided are called
				483	@dfn{terminal symbols} or @dfn{token types}. We call a piece of input
				484	corresponding to a single terminal symbol a @dfn{token}, and a piece
				485	corresponding to a single nonterminal symbol a @dfn{grouping}.
				486
				487	We can use the C language as an example of what symbols, terminal and
				488	nonterminal, mean. The tokens of C are identifiers, constants (numeric
				489	and string), and the various keywords, arithmetic operators and
				490	punctuation marks. So the terminal symbols of a grammar for C include
				491	`identifier', `number', `string', plus one symbol for each keyword,
				492	operator or punctuation mark: `if', `return', `const', `static', `int',
				493	`char', `plus-sign', `open-brace', `close-brace', `comma' and many more.
				494	(These tokens can be subdivided into characters, but that is a matter of
				495	lexicography, not grammar.)
				496
				497	Here is a simple C function subdivided into tokens:
				498
				499	@ifinfo
				500	@example
				501	int /* @r{keyword `int'} */
				502	square (int x) /* @r{identifier, open-paren, keyword `int',}
				503	@r{identifier, close-paren} */
				504	@{ /* @r{open-brace} */
				505	return x * x; /* @r{keyword `return', identifier, asterisk,}
				506	@r{identifier, semicolon} */
				507	@} /* @r{close-brace} */
				508	@end example
				509	@end ifinfo
				510	@ifnotinfo
				511	@example
				512	int /* @r{keyword `int'} */
				513	square (int x) /* @r{identifier, open-paren, keyword `int', identifier, close-paren} */
				514	@{ /* @r{open-brace} */
				515	return x * x; /* @r{keyword `return', identifier, asterisk, identifier, semicolon} */
				516	@} /* @r{close-brace} */
				517	@end example
				518	@end ifnotinfo
				519
				520	The syntactic groupings of C include the expression, the statement, the
				521	declaration, and the function definition. These are represented in the
				522	grammar of C by nonterminal symbols `expression', `statement',
				523	`declaration' and `function definition'. The full grammar uses dozens of
				524	additional language constructs, each with its own nonterminal symbol, in
				525	order to express the meanings of these four. The example above is a
				526	function definition; it contains one declaration, and one statement. In
				527	the statement, each @samp{x} is an expression and so is @samp{x * x}.
				528
				529	Each nonterminal symbol must have grammatical rules showing how it is made
				530	out of simpler constructs. For example, one kind of C statement is the
				531	@code{return} statement; this would be described with a grammar rule which
				532	reads informally as follows:
				533
				534	@quotation
				535	A `statement' can be made of a `return' keyword, an `expression' and a
				536	`semicolon'.
				537	@end quotation
				538
				539	@noindent
				540	There would be many other rules for `statement', one for each kind of
				541	statement in C.
				542
				543	@cindex start symbol
				544	One nonterminal symbol must be distinguished as the special one which
				545	defines a complete utterance in the language. It is called the @dfn{start
				546	symbol}. In a compiler, this means a complete input program. In the C
				547	language, the nonterminal symbol `sequence of definitions and declarations'
				548	plays this role.
				549
				550	For example, @samp{1 + 2} is a valid C expression---a valid part of a C
				551	program---but it is not valid as an @emph{entire} C program. In the
				552	context-free grammar of C, this follows from the fact that `expression' is
				553	not the start symbol.
				554
				555	The Bison parser reads a sequence of tokens as its input, and groups the
				556	tokens using the grammar rules. If the input is valid, the end result is
				557	that the entire token sequence reduces to a single grouping whose symbol is
				558	the grammar's start symbol. If we use a grammar for C, the entire input
				559	must be a `sequence of definitions and declarations'. If not, the parser
				560	reports a syntax error.
				561
				562	@node Grammar in Bison
				563	@section From Formal Rules to Bison Input
				564	@cindex Bison grammar
				565	@cindex grammar, Bison
				566	@cindex formal grammar
				567
				568	A formal grammar is a mathematical construct. To define the language
				569	for Bison, you must write a file expressing the grammar in Bison syntax:
				570	a @dfn{Bison grammar} file. @xref{Grammar File, ,Bison Grammar Files}.
				571
				572	A nonterminal symbol in the formal grammar is represented in Bison input
				573	as an identifier, like an identifier in C@. By convention, it should be
				574	in lower case, such as @code{expr}, @code{stmt} or @code{declaration}.
				575
				576	The Bison representation for a terminal symbol is also called a @dfn{token
				577	type}. Token types as well can be represented as C-like identifiers. By
				578	convention, these identifiers should be upper case to distinguish them from
				579	nonterminals: for example, @code{INTEGER}, @code{IDENTIFIER}, @code{IF} or
				580	@code{RETURN}. A terminal symbol that stands for a particular keyword in
				581	the language should be named after that keyword converted to upper case.
				582	The terminal symbol @code{error} is reserved for error recovery.
				583	@xref{Symbols}.
				584
				585	A terminal symbol can also be represented as a character literal, just like
				586	a C character constant. You should do this whenever a token is just a
				587	single character (parenthesis, plus-sign, etc.): use that same character in
				588	a literal as the terminal symbol for that token.
				589
				590	A third way to represent a terminal symbol is with a C string constant
				591	containing several characters. @xref{Symbols}, for more information.
				592
				593	The grammar rules also have an expression in Bison syntax. For example,
				594	here is the Bison rule for a C @code{return} statement. The semicolon in
				595	quotes is a literal character token, representing part of the C syntax for
				596	the statement; the naked semicolon, and the colon, are Bison punctuation
				597	used in every rule.
				598
				599	@example
				600	stmt: RETURN expr ';'
				601	;
				602	@end example
				603
				604	@noindent
				605	@xref{Rules, ,Syntax of Grammar Rules}.
				606
				607	@node Semantic Values
				608	@section Semantic Values
				609	@cindex semantic value
				610	@cindex value, semantic
				611
				612	A formal grammar selects tokens only by their classifications: for example,
				613	if a rule mentions the terminal symbol `integer constant', it means that
				614	@emph{any} integer constant is grammatically valid in that position. The
				615	precise value of the constant is irrelevant to how to parse the input: if
				616	@samp{x+4} is grammatical then @samp{x+1} or @samp{x+3989} is equally
				617	grammatical.
				618
				619	But the precise value is very important for what the input means once it is
				620	parsed. A compiler is useless if it fails to distinguish between 4, 1 and
				621	3989 as constants in the program! Therefore, each token in a Bison grammar
				622	has both a token type and a @dfn{semantic value}. @xref{Semantics,
				623	,Defining Language Semantics},
				624	for details.
				625
				626	The token type is a terminal symbol defined in the grammar, such as
				627	@code{INTEGER}, @code{IDENTIFIER} or @code{','}. It tells everything
				628	you need to know to decide where the token may validly appear and how to
				629	group it with other tokens. The grammar rules know nothing about tokens
				630	except their types.
				631
				632	The semantic value has all the rest of the information about the
				633	meaning of the token, such as the value of an integer, or the name of an
				634	identifier. (A token such as @code{','} which is just punctuation doesn't
				635	need to have any semantic value.)
				636
				637	For example, an input token might be classified as token type
				638	@code{INTEGER} and have the semantic value 4. Another input token might
				639	have the same token type @code{INTEGER} but value 3989. When a grammar
				640	rule says that @code{INTEGER} is allowed, either of these tokens is
				641	acceptable because each is an @code{INTEGER}. When the parser accepts the
				642	token, it keeps track of the token's semantic value.
				643
				644	Each grouping can also have a semantic value as well as its nonterminal
				645	symbol. For example, in a calculator, an expression typically has a
				646	semantic value that is a number. In a compiler for a programming
				647	language, an expression typically has a semantic value that is a tree
				648	structure describing the meaning of the expression.
				649
				650	@node Semantic Actions
				651	@section Semantic Actions
				652	@cindex semantic actions
				653	@cindex actions, semantic
				654
				655	In order to be useful, a program must do more than parse input; it must
				656	also produce some output based on the input. In a Bison grammar, a grammar
				657	rule can have an @dfn{action} made up of C statements. Each time the
				658	parser recognizes a match for that rule, the action is executed.
				659	@xref{Actions}.
				660
				661	Most of the time, the purpose of an action is to compute the semantic value
				662	of the whole construct from the semantic values of its parts. For example,
				663	suppose we have a rule which says an expression can be the sum of two
				664	expressions. When the parser recognizes such a sum, each of the
				665	subexpressions has a semantic value which describes how it was built up.
				666	The action for this rule should create a similar sort of value for the
				667	newly recognized larger expression.
				668
				669	For example, here is a rule that says an expression can be the sum of
				670	two subexpressions:
				671
				672	@example
				673	expr: expr '+' expr @{ $$ = $1 + $3; @}
				674	;
				675	@end example
				676
				677	@noindent
				678	The action says how to produce the semantic value of the sum expression
				679	from the values of the two subexpressions.
				680
				681	@node GLR Parsers
				682	@section Writing @acronym{GLR} Parsers
				683	@cindex @acronym{GLR} parsing
				684	@cindex generalized @acronym{LR} (@acronym{GLR}) parsing
				685	@findex %glr-parser
				686	@cindex conflicts
				687	@cindex shift/reduce conflicts
				688	@cindex reduce/reduce conflicts
				689
				690	In some grammars, Bison's standard
				691	@acronym{LALR}(1) parsing algorithm cannot decide whether to apply a
				692	certain grammar rule at a given point. That is, it may not be able to
				693	decide (on the basis of the input read so far) which of two possible
				694	reductions (applications of a grammar rule) applies, or whether to apply
				695	a reduction or read more of the input and apply a reduction later in the
				696	input. These are known respectively as @dfn{reduce/reduce} conflicts
				697	(@pxref{Reduce/Reduce}), and @dfn{shift/reduce} conflicts
				698	(@pxref{Shift/Reduce}).
				699
				700	To use a grammar that is not easily modified to be @acronym{LALR}(1), a
				701	more general parsing algorithm is sometimes necessary. If you include
				702	@code{%glr-parser} among the Bison declarations in your file
				703	(@pxref{Grammar Outline}), the result is a Generalized @acronym{LR}
				704	(@acronym{GLR}) parser. These parsers handle Bison grammars that
				705	contain no unresolved conflicts (i.e., after applying precedence
				706	declarations) identically to @acronym{LALR}(1) parsers. However, when
				707	faced with unresolved shift/reduce and reduce/reduce conflicts,
				708	@acronym{GLR} parsers use the simple expedient of doing both,
				709	effectively cloning the parser to follow both possibilities. Each of
				710	the resulting parsers can again split, so that at any given time, there
				711	can be any number of possible parses being explored. The parsers
				712	proceed in lockstep; that is, all of them consume (shift) a given input
				713	symbol before any of them proceed to the next. Each of the cloned
				714	parsers eventually meets one of two possible fates: either it runs into
				715	a parsing error, in which case it simply vanishes, or it merges with
				716	another parser, because the two of them have reduced the input to an
				717	identical set of symbols.
				718
				719	During the time that there are multiple parsers, semantic actions are
				720	recorded, but not performed. When a parser disappears, its recorded
				721	semantic actions disappear as well, and are never performed. When a
				722	reduction makes two parsers identical, causing them to merge, Bison
				723	records both sets of semantic actions. Whenever the last two parsers
				724	merge, reverting to the single-parser case, Bison resolves all the
				725	outstanding actions either by precedences given to the grammar rules
				726	involved, or by performing both actions, and then calling a designated
				727	user-defined function on the resulting values to produce an arbitrary
				728	merged result.
				729
				730	@menu
				731	* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars.
				732	* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities.
				733	* GLR Semantic Actions:: Deferred semantic actions have special concerns.
				734	* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler.
				735	@end menu
				736
				737	@node Simple GLR Parsers
				738	@subsection Using @acronym{GLR} on Unambiguous Grammars
				739	@cindex @acronym{GLR} parsing, unambiguous grammars
				740	@cindex generalized @acronym{LR} (@acronym{GLR}) parsing, unambiguous grammars
				741	@findex %glr-parser
				742	@findex %expect-rr
				743	@cindex conflicts
				744	@cindex reduce/reduce conflicts
				745	@cindex shift/reduce conflicts
				746
				747	In the simplest cases, you can use the @acronym{GLR} algorithm
				748	to parse grammars that are unambiguous, but fail to be @acronym{LALR}(1).
				749	Such grammars typically require more than one symbol of look-ahead,
				750	or (in rare cases) fall into the category of grammars in which the
				751	@acronym{LALR}(1) algorithm throws away too much information (they are in
				752	@acronym{LR}(1), but not @acronym{LALR}(1), @ref{Mystery Conflicts}).
				753
				754	Consider a problem that
				755	arises in the declaration of enumerated and subrange types in the
				756	programming language Pascal. Here are some examples:
				757
				758	@example
				759	type subrange = lo .. hi;
				760	type enum = (a, b, c);
				761	@end example
				762
				763	@noindent
				764	The original language standard allows only numeric
				765	literals and constant identifiers for the subrange bounds (@samp{lo}
				766	and @samp{hi}), but Extended Pascal (@acronym{ISO}/@acronym{IEC}
				767	10206) and many other
				768	Pascal implementations allow arbitrary expressions there. This gives
				769	rise to the following situation, containing a superfluous pair of
				770	parentheses:
				771
				772	@example
				773	type subrange = (a) .. b;
				774	@end example
				775
				776	@noindent
				777	Compare this to the following declaration of an enumerated
				778	type with only one value:
				779
				780	@example
				781	type enum = (a);
				782	@end example
				783
				784	@noindent
				785	(These declarations are contrived, but they are syntactically
				786	valid, and more-complicated cases can come up in practical programs.)
				787
				788	These two declarations look identical until the @samp{..} token.
				789	With normal @acronym{LALR}(1) one-token look-ahead it is not
				790	possible to decide between the two forms when the identifier
				791	@samp{a} is parsed. It is, however, desirable
				792	for a parser to decide this, since in the latter case
				793	@samp{a} must become a new identifier to represent the enumeration
				794	value, while in the former case @samp{a} must be evaluated with its
				795	current meaning, which may be a constant or even a function call.
				796
				797	You could parse @samp{(a)} as an ``unspecified identifier in parentheses'',
				798	to be resolved later, but this typically requires substantial
				799	contortions in both semantic actions and large parts of the
				800	grammar, where the parentheses are nested in the recursive rules for
				801	expressions.
				802
				803	You might think of using the lexer to distinguish between the two
				804	forms by returning different tokens for currently defined and
				805	undefined identifiers. But if these declarations occur in a local
				806	scope, and @samp{a} is defined in an outer scope, then both forms
				807	are possible---either locally redefining @samp{a}, or using the
				808	value of @samp{a} from the outer scope. So this approach cannot
				809	work.
				810
				811	A simple solution to this problem is to declare the parser to
				812	use the @acronym{GLR} algorithm.
				813	When the @acronym{GLR} parser reaches the critical state, it
				814	merely splits into two branches and pursues both syntax rules
				815	simultaneously. Sooner or later, one of them runs into a parsing
				816	error. If there is a @samp{..} token before the next
				817	@samp{;}, the rule for enumerated types fails since it cannot
				818	accept @samp{..} anywhere; otherwise, the subrange type rule
				819	fails since it requires a @samp{..} token. So one of the branches
				820	fails silently, and the other one continues normally, performing
				821	all the intermediate actions that were postponed during the split.
				822
				823	If the input is syntactically incorrect, both branches fail and the parser
				824	reports a syntax error as usual.
				825
				826	The effect of all this is that the parser seems to ``guess'' the
				827	correct branch to take, or in other words, it seems to use more
				828	look-ahead than the underlying @acronym{LALR}(1) algorithm actually allows
				829	for. In this example, @acronym{LALR}(2) would suffice, but also some cases
				830	that are not @acronym{LALR}(@math{k}) for any @math{k} can be handled this way.
				831
				832	In general, a @acronym{GLR} parser can take quadratic or cubic worst-case time,
				833	and the current Bison parser even takes exponential time and space
				834	for some grammars. In practice, this rarely happens, and for many
				835	grammars it is possible to prove that it cannot happen.
				836	The present example contains only one conflict between two
				837	rules, and the type-declaration context containing the conflict
				838	cannot be nested. So the number of
				839	branches that can exist at any time is limited by the constant 2,
				840	and the parsing time is still linear.
				841
				842	Here is a Bison grammar corresponding to the example above. It
				843	parses a vastly simplified form of Pascal type declarations.
				844
				845	@example
				846	%token TYPE DOTDOT ID
				847
				848	@group
				849	%left '+' '-'
				850	%left '*' '/'
				851	@end group
				852
				853	%%
				854
				855	@group
				856	type_decl : TYPE ID '=' type ';'
				857	;
				858	@end group
				859
				860	@group
				861	type : '(' id_list ')'
				862	\| expr DOTDOT expr
				863	;
				864	@end group
				865
				866	@group
				867	id_list : ID
				868	\| id_list ',' ID
				869	;
				870	@end group
				871
				872	@group
				873	expr : '(' expr ')'
				874	\| expr '+' expr
				875	\| expr '-' expr
				876	\| expr '*' expr
				877	\| expr '/' expr
				878	\| ID
				879	;
				880	@end group
				881	@end example
				882
				883	When used as a normal @acronym{LALR}(1) grammar, Bison correctly complains
				884	about one reduce/reduce conflict. In the conflicting situation the
				885	parser chooses one of the alternatives, arbitrarily the one
				886	declared first. Therefore the following correct input is not
				887	recognized:
				888
				889	@example
				890	type t = (a) .. b;
				891	@end example
				892
				893	The parser can be turned into a @acronym{GLR} parser, while also telling Bison
				894	to be silent about the one known reduce/reduce conflict, by
				895	adding these two declarations to the Bison input file (before the first
				896	@samp{%%}):
				897
				898	@example
				899	%glr-parser
				900	%expect-rr 1
				901	@end example
				902
				903	@noindent
				904	No change in the grammar itself is required. Now the
				905	parser recognizes all valid declarations, according to the
				906	limited syntax above, transparently. In fact, the user does not even
				907	notice when the parser splits.
				908
				909	So here we have a case where we can use the benefits of @acronym{GLR},
				910	almost without disadvantages. Even in simple cases like this, however,
				911	there are at least two potential problems to beware. First, always
				912	analyze the conflicts reported by Bison to make sure that @acronym{GLR}
				913	splitting is only done where it is intended. A @acronym{GLR} parser
				914	splitting inadvertently may cause problems less obvious than an
				915	@acronym{LALR} parser statically choosing the wrong alternative in a
				916	conflict. Second, consider interactions with the lexer (@pxref{Semantic
				917	Tokens}) with great care. Since a split parser consumes tokens without
				918	performing any actions during the split, the lexer cannot obtain
				919	information via parser actions. Some cases of lexer interactions can be
				920	eliminated by using @acronym{GLR} to shift the complications from the
				921	lexer to the parser. You must check the remaining cases for
				922	correctness.
				923
				924	In our example, it would be safe for the lexer to return tokens based on
				925	their current meanings in some symbol table, because no new symbols are
				926	defined in the middle of a type declaration. Though it is possible for
				927	a parser to define the enumeration constants as they are parsed, before
				928	the type declaration is completed, it actually makes no difference since
				929	they cannot be used within the same enumerated type declaration.
				930
				931	@node Merging GLR Parses
				932	@subsection Using @acronym{GLR} to Resolve Ambiguities
				933	@cindex @acronym{GLR} parsing, ambiguous grammars
				934	@cindex generalized @acronym{LR} (@acronym{GLR}) parsing, ambiguous grammars
				935	@findex %dprec
				936	@findex %merge
				937	@cindex conflicts
				938	@cindex reduce/reduce conflicts
				939
				940	Let's consider an example, vastly simplified from a C++ grammar.
				941
				942	@example
				943	%@{
				944	#include <stdio.h>
				945	#define YYSTYPE char const *
				946	int yylex (void);
				947	void yyerror (char const *);
				948	%@}
				949
				950	%token TYPENAME ID
				951
				952	%right '='
				953	%left '+'
				954
				955	%glr-parser
				956
				957	%%
				958
				959	prog :
				960	\| prog stmt @{ printf ("\n"); @}
				961	;
				962
				963	stmt : expr ';' %dprec 1
				964	\| decl %dprec 2
				965	;
				966
				967	expr : ID @{ printf ("%s ", $$); @}
				968	\| TYPENAME '(' expr ')'
				969	@{ printf ("%s <cast> ", $1); @}
				970	\| expr '+' expr @{ printf ("+ "); @}
				971	\| expr '=' expr @{ printf ("= "); @}
				972	;
				973
				974	decl : TYPENAME declarator ';'
				975	@{ printf ("%s <declare> ", $1); @}
				976	\| TYPENAME declarator '=' expr ';'
				977	@{ printf ("%s <init-declare> ", $1); @}
				978	;
				979
				980	declarator : ID @{ printf ("\"%s\" ", $1); @}
				981	\| '(' declarator ')'
				982	;
				983	@end example
				984
				985	@noindent
				986	This models a problematic part of the C++ grammar---the ambiguity between
				987	certain declarations and statements. For example,
				988
				989	@example
				990	T (x) = y+z;
				991	@end example
				992
				993	@noindent
				994	parses as either an @code{expr} or a @code{stmt}
				995	(assuming that @samp{T} is recognized as a @code{TYPENAME} and
				996	@samp{x} as an @code{ID}).
				997	Bison detects this as a reduce/reduce conflict between the rules
				998	@code{expr : ID} and @code{declarator : ID}, which it cannot resolve at the
				999	time it encounters @code{x} in the example above. Since this is a
				1000	@acronym{GLR} parser, it therefore splits the problem into two parses, one for
				1001	each choice of resolving the reduce/reduce conflict.
				1002	Unlike the example from the previous section (@pxref{Simple GLR Parsers}),
				1003	however, neither of these parses ``dies,'' because the grammar as it stands is
				1004	ambiguous. One of the parsers eventually reduces @code{stmt : expr ';'} and
				1005	the other reduces @code{stmt : decl}, after which both parsers are in an
				1006	identical state: they've seen @samp{prog stmt} and have the same unprocessed
				1007	input remaining. We say that these parses have @dfn{merged.}
				1008
				1009	At this point, the @acronym{GLR} parser requires a specification in the
				1010	grammar of how to choose between the competing parses.
				1011	In the example above, the two @code{%dprec}
				1012	declarations specify that Bison is to give precedence
				1013	to the parse that interprets the example as a
				1014	@code{decl}, which implies that @code{x} is a declarator.
				1015	The parser therefore prints
				1016
				1017	@example
				1018	"x" y z + T <init-declare>
				1019	@end example
				1020
				1021	The @code{%dprec} declarations only come into play when more than one
				1022	parse survives. Consider a different input string for this parser:
				1023
				1024	@example
				1025	T (x) + y;
				1026	@end example
				1027
				1028	@noindent
				1029	This is another example of using @acronym{GLR} to parse an unambiguous
				1030	construct, as shown in the previous section (@pxref{Simple GLR Parsers}).
				1031	Here, there is no ambiguity (this cannot be parsed as a declaration).
				1032	However, at the time the Bison parser encounters @code{x}, it does not
				1033	have enough information to resolve the reduce/reduce conflict (again,
				1034	between @code{x} as an @code{expr} or a @code{declarator}). In this
				1035	case, no precedence declaration is used. Again, the parser splits
				1036	into two, one assuming that @code{x} is an @code{expr}, and the other
				1037	assuming @code{x} is a @code{declarator}. The second of these parsers
				1038	then vanishes when it sees @code{+}, and the parser prints
				1039
				1040	@example
				1041	x T <cast> y +
				1042	@end example
				1043
				1044	Suppose that instead of resolving the ambiguity, you wanted to see all
				1045	the possibilities. For this purpose, you must merge the semantic
				1046	actions of the two possible parsers, rather than choosing one over the
				1047	other. To do so, you could change the declaration of @code{stmt} as
				1048	follows:
				1049
				1050	@example
				1051	stmt : expr ';' %merge <stmtMerge>
				1052	\| decl %merge <stmtMerge>
				1053	;
				1054	@end example
				1055
				1056	@noindent
				1057	and define the @code{stmtMerge} function as:
				1058
				1059	@example
				1060	static YYSTYPE
				1061	stmtMerge (YYSTYPE x0, YYSTYPE x1)
				1062	@{
				1063	printf ("<OR> ");
				1064	return "";
				1065	@}
				1066	@end example
				1067
				1068	@noindent
				1069	with an accompanying forward declaration
				1070	in the C declarations at the beginning of the file:
				1071
				1072	@example
				1073	%@{
				1074	#define YYSTYPE char const *
				1075	static YYSTYPE stmtMerge (YYSTYPE x0, YYSTYPE x1);
				1076	%@}
				1077	@end example
				1078
				1079	@noindent
				1080	With these declarations, the resulting parser parses the first example
				1081	as both an @code{expr} and a @code{decl}, and prints
				1082
				1083	@example
				1084	"x" y z + T <init-declare> x T <cast> y z + = <OR>
				1085	@end example
				1086
				1087	Bison requires that all of the
				1088	productions that participate in any particular merge have identical
				1089	@samp{%merge} clauses. Otherwise, the ambiguity would be unresolvable,
				1090	and the parser will report an error during any parse that results in
				1091	the offending merge.
				1092
				1093	@node GLR Semantic Actions
				1094	@subsection GLR Semantic Actions
				1095
				1096	@cindex deferred semantic actions
				1097	By definition, a deferred semantic action is not performed at the same time as
				1098	the associated reduction.
				1099	This raises caveats for several Bison features you might use in a semantic
				1100	action in a @acronym{GLR} parser.
				1101
				1102	@vindex yychar
				1103	@cindex @acronym{GLR} parsers and @code{yychar}
				1104	@vindex yylval
				1105	@cindex @acronym{GLR} parsers and @code{yylval}
				1106	@vindex yylloc
				1107	@cindex @acronym{GLR} parsers and @code{yylloc}
				1108	In any semantic action, you can examine @code{yychar} to determine the type of
				1109	the look-ahead token present at the time of the associated reduction.
				1110	After checking that @code{yychar} is not set to @code{YYEMPTY} or @code{YYEOF},
				1111	you can then examine @code{yylval} and @code{yylloc} to determine the
				1112	look-ahead token's semantic value and location, if any.
				1113	In a nondeferred semantic action, you can also modify any of these variables to
				1114	influence syntax analysis.
				1115	@xref{Look-Ahead, ,Look-Ahead Tokens}.
				1116
				1117	@findex yyclearin
				1118	@cindex @acronym{GLR} parsers and @code{yyclearin}
				1119	In a deferred semantic action, it's too late to influence syntax analysis.
				1120	In this case, @code{yychar}, @code{yylval}, and @code{yylloc} are set to
				1121	shallow copies of the values they had at the time of the associated reduction.
				1122	For this reason alone, modifying them is dangerous.
				1123	Moreover, the result of modifying them is undefined and subject to change with
				1124	future versions of Bison.
				1125	For example, if a semantic action might be deferred, you should never write it
				1126	to invoke @code{yyclearin} (@pxref{Action Features}) or to attempt to free
				1127	memory referenced by @code{yylval}.
				1128
				1129	@findex YYERROR
				1130	@cindex @acronym{GLR} parsers and @code{YYERROR}
				1131	Another Bison feature requiring special consideration is @code{YYERROR}
				1132	(@pxref{Action Features}), which you can invoke in a semantic action to
				1133	initiate error recovery.
				1134	During deterministic @acronym{GLR} operation, the effect of @code{YYERROR} is
				1135	the same as its effect in an @acronym{LALR}(1) parser.
				1136	In a deferred semantic action, its effect is undefined.
				1137	@c The effect is probably a syntax error at the split point.
				1138
				1139	Also, see @ref{Location Default Action, ,Default Action for Locations}, which
				1140	describes a special usage of @code{YYLLOC_DEFAULT} in @acronym{GLR} parsers.
				1141
				1142	@node Compiler Requirements
				1143	@subsection Considerations when Compiling @acronym{GLR} Parsers
				1144	@cindex @code{inline}
				1145	@cindex @acronym{GLR} parsers and @code{inline}
				1146
				1147	The @acronym{GLR} parsers require a compiler for @acronym{ISO} C89 or
				1148	later. In addition, they use the @code{inline} keyword, which is not
				1149	C89, but is C99 and is a common extension in pre-C99 compilers. It is
				1150	up to the user of these parsers to handle
				1151	portability issues. For instance, if using Autoconf and the Autoconf
				1152	macro @code{AC_C_INLINE}, a mere
				1153
				1154	@example
				1155	%@{
				1156	#include <config.h>
				1157	%@}
				1158	@end example
				1159
				1160	@noindent
				1161	will suffice. Otherwise, we suggest
				1162
				1163	@example
				1164	%@{
				1165	#if __STDC_VERSION__ < 199901 && ! defined __GNUC__ && ! defined inline
				1166	#define inline
				1167	#endif
				1168	%@}
				1169	@end example
				1170
				1171	@node Locations Overview
				1172	@section Locations
				1173	@cindex location
				1174	@cindex textual location
				1175	@cindex location, textual
				1176
				1177	Many applications, like interpreters or compilers, have to produce verbose
				1178	and useful error messages. To achieve this, one must be able to keep track of
				1179	the @dfn{textual location}, or @dfn{location}, of each syntactic construct.
				1180	Bison provides a mechanism for handling these locations.
				1181
				1182	Each token has a semantic value. In a similar fashion, each token has an
				1183	associated location, but the type of locations is the same for all tokens and
				1184	groupings. Moreover, the output parser is equipped with a default data
				1185	structure for storing locations (@pxref{Locations}, for more details).
				1186
				1187	Like semantic values, locations can be reached in actions using a dedicated
				1188	set of constructs. In the example above, the location of the whole grouping
				1189	is @code{@@$}, while the locations of the subexpressions are @code{@@1} and
				1190	@code{@@3}.
				1191
				1192	When a rule is matched, a default action is used to compute the semantic value
				1193	of its left hand side (@pxref{Actions}). In the same way, another default
				1194	action is used for locations. However, the action for locations is general
				1195	enough for most cases, meaning there is usually no need to describe for each
				1196	rule how @code{@@$} should be formed. When building a new location for a given
				1197	grouping, the default behavior of the output parser is to take the beginning
				1198	of the first symbol, and the end of the last symbol.
				1199
				1200	@node Bison Parser
				1201	@section Bison Output: the Parser File
				1202	@cindex Bison parser
				1203	@cindex Bison utility
				1204	@cindex lexical analyzer, purpose
				1205	@cindex parser
				1206
				1207	When you run Bison, you give it a Bison grammar file as input. The output
				1208	is a C source file that parses the language described by the grammar.
				1209	This file is called a @dfn{Bison parser}. Keep in mind that the Bison
				1210	utility and the Bison parser are two distinct programs: the Bison utility
				1211	is a program whose output is the Bison parser that becomes part of your
				1212	program.
				1213
				1214	The job of the Bison parser is to group tokens into groupings according to
				1215	the grammar rules---for example, to build identifiers and operators into
				1216	expressions. As it does this, it runs the actions for the grammar rules it
				1217	uses.
				1218
				1219	The tokens come from a function called the @dfn{lexical analyzer} that
				1220	you must supply in some fashion (such as by writing it in C). The Bison
				1221	parser calls the lexical analyzer each time it wants a new token. It
				1222	doesn't know what is ``inside'' the tokens (though their semantic values
				1223	may reflect this). Typically the lexical analyzer makes the tokens by
				1224	parsing characters of text, but Bison does not depend on this.
				1225	@xref{Lexical, ,The Lexical Analyzer Function @code{yylex}}.
				1226
				1227	The Bison parser file is C code which defines a function named
				1228	@code{yyparse} which implements that grammar. This function does not make
				1229	a complete C program: you must supply some additional functions. One is
				1230	the lexical analyzer. Another is an error-reporting function which the
				1231	parser calls to report an error. In addition, a complete C program must
				1232	start with a function called @code{main}; you have to provide this, and
				1233	arrange for it to call @code{yyparse} or the parser will never run.
				1234	@xref{Interface, ,Parser C-Language Interface}.
				1235
				1236	Aside from the token type names and the symbols in the actions you
				1237	write, all symbols defined in the Bison parser file itself
				1238	begin with @samp{yy} or @samp{YY}. This includes interface functions
				1239	such as the lexical analyzer function @code{yylex}, the error reporting
				1240	function @code{yyerror} and the parser function @code{yyparse} itself.
				1241	This also includes numerous identifiers used for internal purposes.
				1242	Therefore, you should avoid using C identifiers starting with @samp{yy}
				1243	or @samp{YY} in the Bison grammar file except for the ones defined in
				1244	this manual. Also, you should avoid using the C identifiers
				1245	@samp{malloc} and @samp{free} for anything other than their usual
				1246	meanings.
				1247
				1248	In some cases the Bison parser file includes system headers, and in
				1249	those cases your code should respect the identifiers reserved by those
				1250	headers. On some non-@acronym{GNU} hosts, @code{<alloca.h>}, @code{<malloc.h>},
				1251	@code{<stddef.h>}, and @code{<stdlib.h>} are included as needed to
				1252	declare memory allocators and related types. @code{<libintl.h>} is
				1253	included if message translation is in use
				1254	(@pxref{Internationalization}). Other system headers may
				1255	be included if you define @code{YYDEBUG} to a nonzero value
				1256	(@pxref{Tracing, ,Tracing Your Parser}).
				1257
				1258	@node Stages
				1259	@section Stages in Using Bison
				1260	@cindex stages in using Bison
				1261	@cindex using Bison
				1262
				1263	The actual language-design process using Bison, from grammar specification
				1264	to a working compiler or interpreter, has these parts:
				1265
				1266	@enumerate
				1267	@item
				1268	Formally specify the grammar in a form recognized by Bison
				1269	(@pxref{Grammar File, ,Bison Grammar Files}). For each grammatical rule
				1270	in the language, describe the action that is to be taken when an
				1271	instance of that rule is recognized. The action is described by a
				1272	sequence of C statements.
				1273
				1274	@item
				1275	Write a lexical analyzer to process input and pass tokens to the parser.
				1276	The lexical analyzer may be written by hand in C (@pxref{Lexical, ,The
				1277	Lexical Analyzer Function @code{yylex}}). It could also be produced
				1278	using Lex, but the use of Lex is not discussed in this manual.
				1279
				1280	@item
				1281	Write a controlling function that calls the Bison-produced parser.
				1282
				1283	@item
				1284	Write error-reporting routines.
				1285	@end enumerate
				1286
				1287	To turn this source code as written into a runnable program, you
				1288	must follow these steps:
				1289
				1290	@enumerate
				1291	@item
				1292	Run Bison on the grammar to produce the parser.
				1293
				1294	@item
				1295	Compile the code output by Bison, as well as any other source files.
				1296
				1297	@item
				1298	Link the object files to produce the finished product.
				1299	@end enumerate
				1300
				1301	@node Grammar Layout
				1302	@section The Overall Layout of a Bison Grammar
				1303	@cindex grammar file
				1304	@cindex file format
				1305	@cindex format of grammar file
				1306	@cindex layout of Bison grammar
				1307
				1308	The input file for the Bison utility is a @dfn{Bison grammar file}. The
				1309	general form of a Bison grammar file is as follows:
				1310
				1311	@example
				1312	%@{
				1313	@var{Prologue}
				1314	%@}
				1315
				1316	@var{Bison declarations}
				1317
				1318	%%
				1319	@var{Grammar rules}
				1320	%%
				1321	@var{Epilogue}
				1322	@end example
				1323
				1324	@noindent
				1325	The @samp{%%}, @samp{%@{} and @samp{%@}} are punctuation that appears
				1326	in every Bison grammar file to separate the sections.
				1327
				1328	The prologue may define types and variables used in the actions. You can
				1329	also use preprocessor commands to define macros used there, and use
				1330	@code{#include} to include header files that do any of these things.
				1331	You need to declare the lexical analyzer @code{yylex} and the error
				1332	printer @code{yyerror} here, along with any other global identifiers
				1333	used by the actions in the grammar rules.
				1334
				1335	The Bison declarations declare the names of the terminal and nonterminal
				1336	symbols, and may also describe operator precedence and the data types of
				1337	semantic values of various symbols.
				1338
				1339	The grammar rules define how to construct each nonterminal symbol from its
				1340	parts.
				1341
				1342	The epilogue can contain any code you want to use. Often the
				1343	definitions of functions declared in the prologue go here. In a
				1344	simple program, all the rest of the program can go here.
				1345
				1346	@node Examples
				1347	@chapter Examples
				1348	@cindex simple examples
				1349	@cindex examples, simple
				1350
				1351	Now we show and explain three sample programs written using Bison: a
				1352	reverse polish notation calculator, an algebraic (infix) notation
				1353	calculator, and a multi-function calculator. All three have been tested
				1354	under BSD Unix 4.3; each produces a usable, though limited, interactive
				1355	desk-top calculator.
				1356
				1357	These examples are simple, but Bison grammars for real programming
				1358	languages are written the same way. You can copy these examples into a
				1359	source file to try them.
				1360
				1361	@menu
				1362	* RPN Calc:: Reverse polish notation calculator;
				1363	a first example with no operator precedence.
				1364	* Infix Calc:: Infix (algebraic) notation calculator.
				1365	Operator precedence is introduced.
				1366	* Simple Error Recovery:: Continuing after syntax errors.
				1367	* Location Tracking Calc:: Demonstrating the use of @@@var{n} and @@$.
				1368	* Multi-function Calc:: Calculator with memory and trig functions.
				1369	It uses multiple data-types for semantic values.
				1370	* Exercises:: Ideas for improving the multi-function calculator.
				1371	@end menu
				1372
				1373	@node RPN Calc
				1374	@section Reverse Polish Notation Calculator
				1375	@cindex reverse polish notation
				1376	@cindex polish notation calculator
				1377	@cindex @code{rpcalc}
				1378	@cindex calculator, simple
				1379
				1380	The first example is that of a simple double-precision @dfn{reverse polish
				1381	notation} calculator (a calculator using postfix operators). This example
				1382	provides a good starting point, since operator precedence is not an issue.
				1383	The second example will illustrate how operator precedence is handled.
				1384
				1385	The source code for this calculator is named @file{rpcalc.y}. The
				1386	@samp{.y} extension is a convention used for Bison input files.
				1387
				1388	@menu
				1389	* Decls: Rpcalc Decls. Prologue (declarations) for rpcalc.
				1390	* Rules: Rpcalc Rules. Grammar Rules for rpcalc, with explanation.
				1391	* Lexer: Rpcalc Lexer. The lexical analyzer.
				1392	* Main: Rpcalc Main. The controlling function.
				1393	* Error: Rpcalc Error. The error reporting function.
				1394	* Gen: Rpcalc Gen. Running Bison on the grammar file.
				1395	* Comp: Rpcalc Compile. Run the C compiler on the output code.
				1396	@end menu
				1397
				1398	@node Rpcalc Decls
				1399	@subsection Declarations for @code{rpcalc}
				1400
				1401	Here are the C and Bison declarations for the reverse polish notation
				1402	calculator. As in C, comments are placed between @samp{/@dots{}/}.
				1403
				1404	@example
				1405	/* Reverse polish notation calculator. */
				1406
				1407	%@{
				1408	#define YYSTYPE double
				1409	#include <math.h>
				1410	int yylex (void);
				1411	void yyerror (char const *);
				1412	%@}
				1413
				1414	%token NUM
				1415
				1416	%% /* Grammar rules and actions follow. */
				1417	@end example
				1418
				1419	The declarations section (@pxref{Prologue, , The prologue}) contains two
				1420	preprocessor directives and two forward declarations.
				1421
				1422	The @code{#define} directive defines the macro @code{YYSTYPE}, thus
				1423	specifying the C data type for semantic values of both tokens and
				1424	groupings (@pxref{Value Type, ,Data Types of Semantic Values}). The
				1425	Bison parser will use whatever type @code{YYSTYPE} is defined as; if you
				1426	don't define it, @code{int} is the default. Because we specify
				1427	@code{double}, each token and each expression has an associated value,
				1428	which is a floating point number.
				1429
				1430	The @code{#include} directive is used to declare the exponentiation
				1431	function @code{pow}.
				1432
				1433	The forward declarations for @code{yylex} and @code{yyerror} are
				1434	needed because the C language requires that functions be declared
				1435	before they are used. These functions will be defined in the
				1436	epilogue, but the parser calls them so they must be declared in the
				1437	prologue.
				1438
				1439	The second section, Bison declarations, provides information to Bison
				1440	about the token types (@pxref{Bison Declarations, ,The Bison
				1441	Declarations Section}). Each terminal symbol that is not a
				1442	single-character literal must be declared here. (Single-character
				1443	literals normally don't need to be declared.) In this example, all the
				1444	arithmetic operators are designated by single-character literals, so the
				1445	only terminal symbol that needs to be declared is @code{NUM}, the token
				1446	type for numeric constants.
				1447
				1448	@node Rpcalc Rules
				1449	@subsection Grammar Rules for @code{rpcalc}
				1450
				1451	Here are the grammar rules for the reverse polish notation calculator.
				1452
				1453	@example
				1454	input: /* empty */
				1455	\| input line
				1456	;
				1457
				1458	line: '\n'
				1459	\| exp '\n' @{ printf ("\t%.10g\n", $1); @}
				1460	;
				1461
				1462	exp: NUM @{ $$ = $1; @}
				1463	\| exp exp '+' @{ $$ = $1 + $2; @}
				1464	\| exp exp '-' @{ $$ = $1 - $2; @}
				1465	\| exp exp '' @{ $$ = $1 $2; @}
				1466	\| exp exp '/' @{ $$ = $1 / $2; @}
				1467	/* Exponentiation */
				1468	\| exp exp '^' @{ $$ = pow ($1, $2); @}
				1469	/* Unary minus */
				1470	\| exp 'n' @{ $$ = -$1; @}
				1471	;
				1472	%%
				1473	@end example
				1474
				1475	The groupings of the rpcalc ``language'' defined here are the expression
				1476	(given the name @code{exp}), the line of input (@code{line}), and the
				1477	complete input transcript (@code{input}). Each of these nonterminal
				1478	symbols has several alternate rules, joined by the vertical bar @samp{\|}
				1479	which is read as ``or''. The following sections explain what these rules
				1480	mean.
				1481
				1482	The semantics of the language is determined by the actions taken when a
				1483	grouping is recognized. The actions are the C code that appears inside
				1484	braces. @xref{Actions}.
				1485
				1486	You must specify these actions in C, but Bison provides the means for
				1487	passing semantic values between the rules. In each action, the
				1488	pseudo-variable @code{$$} stands for the semantic value for the grouping
				1489	that the rule is going to construct. Assigning a value to @code{$$} is the
				1490	main job of most actions. The semantic values of the components of the
				1491	rule are referred to as @code{$1}, @code{$2}, and so on.
				1492
				1493	@menu
				1494	* Rpcalc Input::
				1495	* Rpcalc Line::
				1496	* Rpcalc Expr::
				1497	@end menu
				1498
				1499	@node Rpcalc Input
				1500	@subsubsection Explanation of @code{input}
				1501
				1502	Consider the definition of @code{input}:
				1503
				1504	@example
				1505	input: /* empty */
				1506	\| input line
				1507	;
				1508	@end example
				1509
				1510	This definition reads as follows: ``A complete input is either an empty
				1511	string, or a complete input followed by an input line''. Notice that
				1512	``complete input'' is defined in terms of itself. This definition is said
				1513	to be @dfn{left recursive} since @code{input} appears always as the
				1514	leftmost symbol in the sequence. @xref{Recursion, ,Recursive Rules}.
				1515
				1516	The first alternative is empty because there are no symbols between the
				1517	colon and the first @samp{\|}; this means that @code{input} can match an
				1518	empty string of input (no tokens). We write the rules this way because it
				1519	is legitimate to type @kbd{Ctrl-d} right after you start the calculator.
				1520	It's conventional to put an empty alternative first and write the comment
				1521	@samp{/* empty */} in it.
				1522
				1523	The second alternate rule (@code{input line}) handles all nontrivial input.
				1524	It means, ``After reading any number of lines, read one more line if
				1525	possible.'' The left recursion makes this rule into a loop. Since the
				1526	first alternative matches empty input, the loop can be executed zero or
				1527	more times.
				1528
				1529	The parser function @code{yyparse} continues to process input until a
				1530	grammatical error is seen or the lexical analyzer says there are no more
				1531	input tokens; we will arrange for the latter to happen at end-of-input.
				1532
				1533	@node Rpcalc Line
				1534	@subsubsection Explanation of @code{line}
				1535
				1536	Now consider the definition of @code{line}:
				1537
				1538	@example
				1539	line: '\n'
				1540	\| exp '\n' @{ printf ("\t%.10g\n", $1); @}
				1541	;
				1542	@end example
				1543
				1544	The first alternative is a token which is a newline character; this means
				1545	that rpcalc accepts a blank line (and ignores it, since there is no
				1546	action). The second alternative is an expression followed by a newline.
				1547	This is the alternative that makes rpcalc useful. The semantic value of
				1548	the @code{exp} grouping is the value of @code{$1} because the @code{exp} in
				1549	question is the first symbol in the alternative. The action prints this
				1550	value, which is the result of the computation the user asked for.
				1551
				1552	This action is unusual because it does not assign a value to @code{$$}. As
				1553	a consequence, the semantic value associated with the @code{line} is
				1554	uninitialized (its value will be unpredictable). This would be a bug if
				1555	that value were ever used, but we don't use it: once rpcalc has printed the
				1556	value of the user's input line, that value is no longer needed.
				1557
				1558	@node Rpcalc Expr
				1559	@subsubsection Explanation of @code{expr}
				1560
				1561	The @code{exp} grouping has several rules, one for each kind of expression.
				1562	The first rule handles the simplest expressions: those that are just numbers.
				1563	The second handles an addition-expression, which looks like two expressions
				1564	followed by a plus-sign. The third handles subtraction, and so on.
				1565
				1566	@example
				1567	exp: NUM
				1568	\| exp exp '+' @{ $$ = $1 + $2; @}
				1569	\| exp exp '-' @{ $$ = $1 - $2; @}
				1570	@dots{}
				1571	;
				1572	@end example
				1573
				1574	We have used @samp{\|} to join all the rules for @code{exp}, but we could
				1575	equally well have written them separately:
				1576
				1577	@example
				1578	exp: NUM ;
				1579	exp: exp exp '+' @{ $$ = $1 + $2; @} ;
				1580	exp: exp exp '-' @{ $$ = $1 - $2; @} ;
				1581	@dots{}
				1582	@end example
				1583
				1584	Most of the rules have actions that compute the value of the expression in
				1585	terms of the value of its parts. For example, in the rule for addition,
				1586	@code{$1} refers to the first component @code{exp} and @code{$2} refers to
				1587	the second one. The third component, @code{'+'}, has no meaningful
				1588	associated semantic value, but if it had one you could refer to it as
				1589	@code{$3}. When @code{yyparse} recognizes a sum expression using this
				1590	rule, the sum of the two subexpressions' values is produced as the value of
				1591	the entire expression. @xref{Actions}.
				1592
				1593	You don't have to give an action for every rule. When a rule has no
				1594	action, Bison by default copies the value of @code{$1} into @code{$$}.
				1595	This is what happens in the first rule (the one that uses @code{NUM}).
				1596
				1597	The formatting shown here is the recommended convention, but Bison does
				1598	not require it. You can add or change white space as much as you wish.
				1599	For example, this:
				1600
				1601	@example
				1602	exp : NUM \| exp exp '+' @{$$ = $1 + $2; @} \| @dots{} ;
				1603	@end example
				1604
				1605	@noindent
				1606	means the same thing as this:
				1607
				1608	@example
				1609	exp: NUM
				1610	\| exp exp '+' @{ $$ = $1 + $2; @}
				1611	\| @dots{}
				1612	;
				1613	@end example
				1614
				1615	@noindent
				1616	The latter, however, is much more readable.
				1617
				1618	@node Rpcalc Lexer
				1619	@subsection The @code{rpcalc} Lexical Analyzer
				1620	@cindex writing a lexical analyzer
				1621	@cindex lexical analyzer, writing
				1622
				1623	The lexical analyzer's job is low-level parsing: converting characters
				1624	or sequences of characters into tokens. The Bison parser gets its
				1625	tokens by calling the lexical analyzer. @xref{Lexical, ,The Lexical
				1626	Analyzer Function @code{yylex}}.
				1627
				1628	Only a simple lexical analyzer is needed for the @acronym{RPN}
				1629	calculator. This
				1630	lexical analyzer skips blanks and tabs, then reads in numbers as
				1631	@code{double} and returns them as @code{NUM} tokens. Any other character
				1632	that isn't part of a number is a separate token. Note that the token-code
				1633	for such a single-character token is the character itself.
				1634
				1635	The return value of the lexical analyzer function is a numeric code which
				1636	represents a token type. The same text used in Bison rules to stand for
				1637	this token type is also a C expression for the numeric code for the type.
				1638	This works in two ways. If the token type is a character literal, then its
				1639	numeric code is that of the character; you can use the same
				1640	character literal in the lexical analyzer to express the number. If the
				1641	token type is an identifier, that identifier is defined by Bison as a C
				1642	macro whose definition is the appropriate number. In this example,
				1643	therefore, @code{NUM} becomes a macro for @code{yylex} to use.
				1644
				1645	The semantic value of the token (if it has one) is stored into the
				1646	global variable @code{yylval}, which is where the Bison parser will look
				1647	for it. (The C data type of @code{yylval} is @code{YYSTYPE}, which was
				1648	defined at the beginning of the grammar; @pxref{Rpcalc Decls,
				1649	,Declarations for @code{rpcalc}}.)
				1650
				1651	A token type code of zero is returned if the end-of-input is encountered.
				1652	(Bison recognizes any nonpositive value as indicating end-of-input.)
				1653
				1654	Here is the code for the lexical analyzer:
				1655
				1656	@example
				1657	@group
				1658	/* The lexical analyzer returns a double floating point
				1659	number on the stack and the token NUM, or the numeric code
				1660	of the character read if not a number. It skips all blanks
				1661	and tabs, and returns 0 for end-of-input. */
				1662
				1663	#include <ctype.h>
				1664	@end group
				1665
				1666	@group
				1667	int
				1668	yylex (void)
				1669	@{
				1670	int c;
				1671
				1672	/* Skip white space. */
				1673	while ((c = getchar ()) == ' ' \|\| c == '\t')
				1674	;
				1675	@end group
				1676	@group
				1677	/* Process numbers. */
				1678	if (c == '.' \|\| isdigit (c))
				1679	@{
				1680	ungetc (c, stdin);
				1681	scanf ("%lf", &yylval);
				1682	return NUM;
				1683	@}
				1684	@end group
				1685	@group
				1686	/* Return end-of-input. */
				1687	if (c == EOF)
				1688	return 0;
				1689	/* Return a single char. */
				1690	return c;
				1691	@}
				1692	@end group
				1693	@end example
				1694
				1695	@node Rpcalc Main
				1696	@subsection The Controlling Function
				1697	@cindex controlling function
				1698	@cindex main function in simple example
				1699
				1700	In keeping with the spirit of this example, the controlling function is
				1701	kept to the bare minimum. The only requirement is that it call
				1702	@code{yyparse} to start the process of parsing.
				1703
				1704	@example
				1705	@group
				1706	int
				1707	main (void)
				1708	@{
				1709	return yyparse ();
				1710	@}
				1711	@end group
				1712	@end example
				1713
				1714	@node Rpcalc Error
				1715	@subsection The Error Reporting Routine
				1716	@cindex error reporting routine
				1717
				1718	When @code{yyparse} detects a syntax error, it calls the error reporting
				1719	function @code{yyerror} to print an error message (usually but not
				1720	always @code{"syntax error"}). It is up to the programmer to supply
				1721	@code{yyerror} (@pxref{Interface, ,Parser C-Language Interface}), so
				1722	here is the definition we will use:
				1723
				1724	@example
				1725	@group
				1726	#include <stdio.h>
				1727
				1728	/* Called by yyparse on error. */
				1729	void
				1730	yyerror (char const *s)
				1731	@{
				1732	fprintf (stderr, "%s\n", s);
				1733	@}
				1734	@end group
				1735	@end example
				1736
				1737	After @code{yyerror} returns, the Bison parser may recover from the error
				1738	and continue parsing if the grammar contains a suitable error rule
				1739	(@pxref{Error Recovery}). Otherwise, @code{yyparse} returns nonzero. We
				1740	have not written any error rules in this example, so any invalid input will
				1741	cause the calculator program to exit. This is not clean behavior for a
				1742	real calculator, but it is adequate for the first example.
				1743
				1744	@node Rpcalc Gen
				1745	@subsection Running Bison to Make the Parser
				1746	@cindex running Bison (introduction)
				1747
				1748	Before running Bison to produce a parser, we need to decide how to
				1749	arrange all the source code in one or more source files. For such a
				1750	simple example, the easiest thing is to put everything in one file. The
				1751	definitions of @code{yylex}, @code{yyerror} and @code{main} go at the
				1752	end, in the epilogue of the file
				1753	(@pxref{Grammar Layout, ,The Overall Layout of a Bison Grammar}).
				1754
				1755	For a large project, you would probably have several source files, and use
				1756	@code{make} to arrange to recompile them.
				1757
				1758	With all the source in a single file, you use the following command to
				1759	convert it into a parser file:
				1760
				1761	@example
				1762	bison @var{file}.y
				1763	@end example
				1764
				1765	@noindent
				1766	In this example the file was called @file{rpcalc.y} (for ``Reverse Polish
				1767	@sc{calc}ulator''). Bison produces a file named @file{@var{file}.tab.c},
				1768	removing the @samp{.y} from the original file name. The file output by
				1769	Bison contains the source code for @code{yyparse}. The additional
				1770	functions in the input file (@code{yylex}, @code{yyerror} and @code{main})
				1771	are copied verbatim to the output.
				1772
				1773	@node Rpcalc Compile
				1774	@subsection Compiling the Parser File
				1775	@cindex compiling the parser
				1776
				1777	Here is how to compile and run the parser file:
				1778
				1779	@example
				1780	@group
				1781	# @r{List files in current directory.}
				1782	$ @kbd{ls}
				1783	rpcalc.tab.c rpcalc.y
				1784	@end group
				1785
				1786	@group
				1787	# @r{Compile the Bison parser.}
				1788	# @r{@samp{-lm} tells compiler to search math library for @code{pow}.}
				1789	$ @kbd{cc -lm -o rpcalc rpcalc.tab.c}
				1790	@end group
				1791
				1792	@group
				1793	# @r{List files again.}
				1794	$ @kbd{ls}
				1795	rpcalc rpcalc.tab.c rpcalc.y
				1796	@end group
				1797	@end example
				1798
				1799	The file @file{rpcalc} now contains the executable code. Here is an
				1800	example session using @code{rpcalc}.
				1801
				1802	@example
				1803	$ @kbd{rpcalc}
				1804	@kbd{4 9 +}
				1805	13
				1806	@kbd{3 7 + 3 4 5 *+-}
				1807	-13
				1808	@kbd{3 7 + 3 4 5 * + - n} @r{Note the unary minus, @samp{n}}
				1809	13
				1810	@kbd{5 6 / 4 n +}
				1811	-3.166666667
				1812	@kbd{3 4 ^} @r{Exponentiation}
				1813	81
				1814	@kbd{^D} @r{End-of-file indicator}
				1815	$
				1816	@end example
				1817
				1818	@node Infix Calc
				1819	@section Infix Notation Calculator: @code{calc}
				1820	@cindex infix notation calculator
				1821	@cindex @code{calc}
				1822	@cindex calculator, infix notation
				1823
				1824	We now modify rpcalc to handle infix operators instead of postfix. Infix
				1825	notation involves the concept of operator precedence and the need for
				1826	parentheses nested to arbitrary depth. Here is the Bison code for
				1827	@file{calc.y}, an infix desk-top calculator.
				1828
				1829	@example
				1830	/* Infix notation calculator. */
				1831
				1832	%@{
				1833	#define YYSTYPE double
				1834	#include <math.h>
				1835	#include <stdio.h>
				1836	int yylex (void);
				1837	void yyerror (char const *);
				1838	%@}
				1839
				1840	/* Bison declarations. */
				1841	%token NUM
				1842	%left '-' '+'
				1843	%left '*' '/'
				1844	%left NEG /* negation--unary minus */
				1845	%right '^' /* exponentiation */
				1846
				1847	%% /* The grammar follows. */
				1848	input: /* empty */
				1849	\| input line
				1850	;
				1851
				1852	line: '\n'
				1853	\| exp '\n' @{ printf ("\t%.10g\n", $1); @}
				1854	;
				1855
				1856	exp: NUM @{ $$ = $1; @}
				1857	\| exp '+' exp @{ $$ = $1 + $3; @}
				1858	\| exp '-' exp @{ $$ = $1 - $3; @}
				1859	\| exp '' exp @{ $$ = $1 $3; @}
				1860	\| exp '/' exp @{ $$ = $1 / $3; @}
				1861	\| '-' exp %prec NEG @{ $$ = -$2; @}
				1862	\| exp '^' exp @{ $$ = pow ($1, $3); @}
				1863	\| '(' exp ')' @{ $$ = $2; @}
				1864	;
				1865	%%
				1866	@end example
				1867
				1868	@noindent
				1869	The functions @code{yylex}, @code{yyerror} and @code{main} can be the
				1870	same as before.
				1871
				1872	There are two important new features shown in this code.
				1873
				1874	In the second section (Bison declarations), @code{%left} declares token
				1875	types and says they are left-associative operators. The declarations
				1876	@code{%left} and @code{%right} (right associativity) take the place of
				1877	@code{%token} which is used to declare a token type name without
				1878	associativity. (These tokens are single-character literals, which
				1879	ordinarily don't need to be declared. We declare them here to specify
				1880	the associativity.)
				1881
				1882	Operator precedence is determined by the line ordering of the
				1883	declarations; the higher the line number of the declaration (lower on
				1884	the page or screen), the higher the precedence. Hence, exponentiation
				1885	has the highest precedence, unary minus (@code{NEG}) is next, followed
				1886	by @samp{*} and @samp{/}, and so on. @xref{Precedence, ,Operator
				1887	Precedence}.
				1888
				1889	The other important new feature is the @code{%prec} in the grammar
				1890	section for the unary minus operator. The @code{%prec} simply instructs
				1891	Bison that the rule @samp{\| '-' exp} has the same precedence as
				1892	@code{NEG}---in this case the next-to-highest. @xref{Contextual
				1893	Precedence, ,Context-Dependent Precedence}.
				1894
				1895	Here is a sample run of @file{calc.y}:
				1896
				1897	@need 500
				1898	@example
				1899	$ @kbd{calc}
				1900	@kbd{4 + 4.5 - (34/(8*3+-3))}
				1901	6.880952381
				1902	@kbd{-56 + 2}
				1903	-54
				1904	@kbd{3 ^ 2}
				1905	9
				1906	@end example
				1907
				1908	@node Simple Error Recovery
				1909	@section Simple Error Recovery
				1910	@cindex error recovery, simple
				1911
				1912	Up to this point, this manual has not addressed the issue of @dfn{error
				1913	recovery}---how to continue parsing after the parser detects a syntax
				1914	error. All we have handled is error reporting with @code{yyerror}.
				1915	Recall that by default @code{yyparse} returns after calling
				1916	@code{yyerror}. This means that an erroneous input line causes the
				1917	calculator program to exit. Now we show how to rectify this deficiency.
				1918
				1919	The Bison language itself includes the reserved word @code{error}, which
				1920	may be included in the grammar rules. In the example below it has
				1921	been added to one of the alternatives for @code{line}:
				1922
				1923	@example
				1924	@group
				1925	line: '\n'
				1926	\| exp '\n' @{ printf ("\t%.10g\n", $1); @}
				1927	\| error '\n' @{ yyerrok; @}
				1928	;
				1929	@end group
				1930	@end example
				1931
				1932	This addition to the grammar allows for simple error recovery in the
				1933	event of a syntax error. If an expression that cannot be evaluated is
				1934	read, the error will be recognized by the third rule for @code{line},
				1935	and parsing will continue. (The @code{yyerror} function is still called
				1936	upon to print its message as well.) The action executes the statement
				1937	@code{yyerrok}, a macro defined automatically by Bison; its meaning is
				1938	that error recovery is complete (@pxref{Error Recovery}). Note the
				1939	difference between @code{yyerrok} and @code{yyerror}; neither one is a
				1940	misprint.
				1941
				1942	This form of error recovery deals with syntax errors. There are other
				1943	kinds of errors; for example, division by zero, which raises an exception
				1944	signal that is normally fatal. A real calculator program must handle this
				1945	signal and use @code{longjmp} to return to @code{main} and resume parsing
				1946	input lines; it would also have to discard the rest of the current line of
				1947	input. We won't discuss this issue further because it is not specific to
				1948	Bison programs.
				1949
				1950	@node Location Tracking Calc
				1951	@section Location Tracking Calculator: @code{ltcalc}
				1952	@cindex location tracking calculator
				1953	@cindex @code{ltcalc}
				1954	@cindex calculator, location tracking
				1955
				1956	This example extends the infix notation calculator with location
				1957	tracking. This feature will be used to improve the error messages. For
				1958	the sake of clarity, this example is a simple integer calculator, since
				1959	most of the work needed to use locations will be done in the lexical
				1960	analyzer.
				1961
				1962	@menu
				1963	* Decls: Ltcalc Decls. Bison and C declarations for ltcalc.
				1964	* Rules: Ltcalc Rules. Grammar rules for ltcalc, with explanations.
				1965	* Lexer: Ltcalc Lexer. The lexical analyzer.
				1966	@end menu
				1967
				1968	@node Ltcalc Decls
				1969	@subsection Declarations for @code{ltcalc}
				1970
				1971	The C and Bison declarations for the location tracking calculator are
				1972	the same as the declarations for the infix notation calculator.
				1973
				1974	@example
				1975	/* Location tracking calculator. */
				1976
				1977	%@{
				1978	#define YYSTYPE int
				1979	#include <math.h>
				1980	int yylex (void);
				1981	void yyerror (char const *);
				1982	%@}
				1983
				1984	/* Bison declarations. */
				1985	%token NUM
				1986
				1987	%left '-' '+'
				1988	%left '*' '/'
				1989	%left NEG
				1990	%right '^'
				1991
				1992	%% /* The grammar follows. */
				1993	@end example
				1994
				1995	@noindent
				1996	Note there are no declarations specific to locations. Defining a data
				1997	type for storing locations is not needed: we will use the type provided
				1998	by default (@pxref{Location Type, ,Data Types of Locations}), which is a
				1999	four member structure with the following integer fields:
				2000	@code{first_line}, @code{first_column}, @code{last_line} and
				2001	@code{last_column}.
				2002
				2003	@node Ltcalc Rules
				2004	@subsection Grammar Rules for @code{ltcalc}
				2005
				2006	Whether handling locations or not has no effect on the syntax of your
				2007	language. Therefore, grammar rules for this example will be very close
				2008	to those of the previous example: we will only modify them to benefit
				2009	from the new information.
				2010
				2011	Here, we will use locations to report divisions by zero, and locate the
				2012	wrong expressions or subexpressions.
				2013
				2014	@example
				2015	@group
				2016	input : /* empty */
				2017	\| input line
				2018	;
				2019	@end group
				2020
				2021	@group
				2022	line : '\n'
				2023	\| exp '\n' @{ printf ("%d\n", $1); @}
				2024	;
				2025	@end group
				2026
				2027	@group
				2028	exp : NUM @{ $$ = $1; @}
				2029	\| exp '+' exp @{ $$ = $1 + $3; @}
				2030	\| exp '-' exp @{ $$ = $1 - $3; @}
				2031	\| exp '' exp @{ $$ = $1 $3; @}
				2032	@end group
				2033	@group
				2034	\| exp '/' exp
				2035	@{
				2036	if ($3)
				2037	$$ = $1 / $3;
				2038	else
				2039	@{
				2040	$$ = 1;
				2041	fprintf (stderr, "%d.%d-%d.%d: division by zero",
				2042	@@3.first_line, @@3.first_column,
				2043	@@3.last_line, @@3.last_column);
				2044	@}
				2045	@}
				2046	@end group
				2047	@group
				2048	\| '-' exp %preg NEG @{ $$ = -$2; @}
				2049	\| exp '^' exp @{ $$ = pow ($1, $3); @}
				2050	\| '(' exp ')' @{ $$ = $2; @}
				2051	@end group
				2052	@end example
				2053
				2054	This code shows how to reach locations inside of semantic actions, by
				2055	using the pseudo-variables @code{@@@var{n}} for rule components, and the
				2056	pseudo-variable @code{@@$} for groupings.
				2057
				2058	We don't need to assign a value to @code{@@$}: the output parser does it
				2059	automatically. By default, before executing the C code of each action,
				2060	@code{@@$} is set to range from the beginning of @code{@@1} to the end
				2061	of @code{@@@var{n}}, for a rule with @var{n} components. This behavior
				2062	can be redefined (@pxref{Location Default Action, , Default Action for
				2063	Locations}), and for very specific rules, @code{@@$} can be computed by
				2064	hand.
				2065
				2066	@node Ltcalc Lexer
				2067	@subsection The @code{ltcalc} Lexical Analyzer.
				2068
				2069	Until now, we relied on Bison's defaults to enable location
				2070	tracking. The next step is to rewrite the lexical analyzer, and make it
				2071	able to feed the parser with the token locations, as it already does for
				2072	semantic values.
				2073
				2074	To this end, we must take into account every single character of the
				2075	input text, to avoid the computed locations of being fuzzy or wrong:
				2076
				2077	@example
				2078	@group
				2079	int
				2080	yylex (void)
				2081	@{
				2082	int c;
				2083	@end group
				2084
				2085	@group
				2086	/* Skip white space. */
				2087	while ((c = getchar ()) == ' ' \|\| c == '\t')
				2088	++yylloc.last_column;
				2089	@end group
				2090
				2091	@group
				2092	/* Step. */
				2093	yylloc.first_line = yylloc.last_line;
				2094	yylloc.first_column = yylloc.last_column;
				2095	@end group
				2096
				2097	@group
				2098	/* Process numbers. */
				2099	if (isdigit (c))
				2100	@{
				2101	yylval = c - '0';
				2102	++yylloc.last_column;
				2103	while (isdigit (c = getchar ()))
				2104	@{
				2105	++yylloc.last_column;
				2106	yylval = yylval * 10 + c - '0';
				2107	@}
				2108	ungetc (c, stdin);
				2109	return NUM;
				2110	@}
				2111	@end group
				2112
				2113	/* Return end-of-input. */
				2114	if (c == EOF)
				2115	return 0;
				2116
				2117	/* Return a single char, and update location. */
				2118	if (c == '\n')
				2119	@{
				2120	++yylloc.last_line;
				2121	yylloc.last_column = 0;
				2122	@}
				2123	else
				2124	++yylloc.last_column;
				2125	return c;
				2126	@}
				2127	@end example
				2128
				2129	Basically, the lexical analyzer performs the same processing as before:
				2130	it skips blanks and tabs, and reads numbers or single-character tokens.
				2131	In addition, it updates @code{yylloc}, the global variable (of type
				2132	@code{YYLTYPE}) containing the token's location.
				2133
				2134	Now, each time this function returns a token, the parser has its number
				2135	as well as its semantic value, and its location in the text. The last
				2136	needed change is to initialize @code{yylloc}, for example in the
				2137	controlling function:
				2138
				2139	@example
				2140	@group
				2141	int
				2142	main (void)
				2143	@{
				2144	yylloc.first_line = yylloc.last_line = 1;
				2145	yylloc.first_column = yylloc.last_column = 0;
				2146	return yyparse ();
				2147	@}
				2148	@end group
				2149	@end example
				2150
				2151	Remember that computing locations is not a matter of syntax. Every
				2152	character must be associated to a location update, whether it is in
				2153	valid input, in comments, in literal strings, and so on.
				2154
				2155	@node Multi-function Calc
				2156	@section Multi-Function Calculator: @code{mfcalc}
				2157	@cindex multi-function calculator
				2158	@cindex @code{mfcalc}
				2159	@cindex calculator, multi-function
				2160
				2161	Now that the basics of Bison have been discussed, it is time to move on to
				2162	a more advanced problem. The above calculators provided only five
				2163	functions, @samp{+}, @samp{-}, @samp{*}, @samp{/} and @samp{^}. It would
				2164	be nice to have a calculator that provides other mathematical functions such
				2165	as @code{sin}, @code{cos}, etc.
				2166
				2167	It is easy to add new operators to the infix calculator as long as they are
				2168	only single-character literals. The lexical analyzer @code{yylex} passes
				2169	back all nonnumeric characters as tokens, so new grammar rules suffice for
				2170	adding a new operator. But we want something more flexible: built-in
				2171	functions whose syntax has this form:
				2172
				2173	@example
				2174	@var{function_name} (@var{argument})
				2175	@end example
				2176
				2177	@noindent
				2178	At the same time, we will add memory to the calculator, by allowing you
				2179	to create named variables, store values in them, and use them later.
				2180	Here is a sample session with the multi-function calculator:
				2181
				2182	@example
				2183	$ @kbd{mfcalc}
				2184	@kbd{pi = 3.141592653589}
				2185	3.1415926536
				2186	@kbd{sin(pi)}
				2187	0.0000000000
				2188	@kbd{alpha = beta1 = 2.3}
				2189	2.3000000000
				2190	@kbd{alpha}
				2191	2.3000000000
				2192	@kbd{ln(alpha)}
				2193	0.8329091229
				2194	@kbd{exp(ln(beta1))}
				2195	2.3000000000
				2196	$
				2197	@end example
				2198
				2199	Note that multiple assignment and nested function calls are permitted.
				2200
				2201	@menu
				2202	* Decl: Mfcalc Decl. Bison declarations for multi-function calculator.
				2203	* Rules: Mfcalc Rules. Grammar rules for the calculator.
				2204	* Symtab: Mfcalc Symtab. Symbol table management subroutines.
				2205	@end menu
				2206
				2207	@node Mfcalc Decl
				2208	@subsection Declarations for @code{mfcalc}
				2209
				2210	Here are the C and Bison declarations for the multi-function calculator.
				2211
				2212	@smallexample
				2213	@group
				2214	%@{
				2215	#include <math.h> /* For math functions, cos(), sin(), etc. */
				2216	#include "calc.h" /* Contains definition of `symrec'. */
				2217	int yylex (void);
				2218	void yyerror (char const *);
				2219	%@}
				2220	@end group
				2221	@group
				2222	%union @{
				2223	double val; /* For returning numbers. */
				2224	symrec tptr; / For returning symbol-table pointers. */
				2225	@}
				2226	@end group
				2227	%token <val> NUM /* Simple double precision number. */
				2228	%token <tptr> VAR FNCT /* Variable and Function. */
				2229	%type <val> exp
				2230
				2231	@group
				2232	%right '='
				2233	%left '-' '+'
				2234	%left '*' '/'
				2235	%left NEG /* negation--unary minus */
				2236	%right '^' /* exponentiation */
				2237	@end group
				2238	%% /* The grammar follows. */
				2239	@end smallexample
				2240
				2241	The above grammar introduces only two new features of the Bison language.
				2242	These features allow semantic values to have various data types
				2243	(@pxref{Multiple Types, ,More Than One Value Type}).
				2244
				2245	The @code{%union} declaration specifies the entire list of possible types;
				2246	this is instead of defining @code{YYSTYPE}. The allowable types are now
				2247	double-floats (for @code{exp} and @code{NUM}) and pointers to entries in
				2248	the symbol table. @xref{Union Decl, ,The Collection of Value Types}.
				2249
				2250	Since values can now have various types, it is necessary to associate a
				2251	type with each grammar symbol whose semantic value is used. These symbols
				2252	are @code{NUM}, @code{VAR}, @code{FNCT}, and @code{exp}. Their
				2253	declarations are augmented with information about their data type (placed
				2254	between angle brackets).
				2255
				2256	The Bison construct @code{%type} is used for declaring nonterminal
				2257	symbols, just as @code{%token} is used for declaring token types. We
				2258	have not used @code{%type} before because nonterminal symbols are
				2259	normally declared implicitly by the rules that define them. But
				2260	@code{exp} must be declared explicitly so we can specify its value type.
				2261	@xref{Type Decl, ,Nonterminal Symbols}.
				2262
				2263	@node Mfcalc Rules
				2264	@subsection Grammar Rules for @code{mfcalc}
				2265
				2266	Here are the grammar rules for the multi-function calculator.
				2267	Most of them are copied directly from @code{calc}; three rules,
				2268	those which mention @code{VAR} or @code{FNCT}, are new.
				2269
				2270	@smallexample
				2271	@group
				2272	input: /* empty */
				2273	\| input line
				2274	;
				2275	@end group
				2276
				2277	@group
				2278	line:
				2279	'\n'
				2280	\| exp '\n' @{ printf ("\t%.10g\n", $1); @}
				2281	\| error '\n' @{ yyerrok; @}
				2282	;
				2283	@end group
				2284
				2285	@group
				2286	exp: NUM @{ $$ = $1; @}
				2287	\| VAR @{ $$ = $1->value.var; @}
				2288	\| VAR '=' exp @{ $$ = $3; $1->value.var = $3; @}
				2289	\| FNCT '(' exp ')' @{ $$ = (*($1->value.fnctptr))($3); @}
				2290	\| exp '+' exp @{ $$ = $1 + $3; @}
				2291	\| exp '-' exp @{ $$ = $1 - $3; @}
				2292	\| exp '' exp @{ $$ = $1 $3; @}
				2293	\| exp '/' exp @{ $$ = $1 / $3; @}
				2294	\| '-' exp %prec NEG @{ $$ = -$2; @}
				2295	\| exp '^' exp @{ $$ = pow ($1, $3); @}
				2296	\| '(' exp ')' @{ $$ = $2; @}
				2297	;
				2298	@end group
				2299	/* End of grammar. */
				2300	%%
				2301	@end smallexample
				2302
				2303	@node Mfcalc Symtab
				2304	@subsection The @code{mfcalc} Symbol Table
				2305	@cindex symbol table example
				2306
				2307	The multi-function calculator requires a symbol table to keep track of the
				2308	names and meanings of variables and functions. This doesn't affect the
				2309	grammar rules (except for the actions) or the Bison declarations, but it
				2310	requires some additional C functions for support.
				2311
				2312	The symbol table itself consists of a linked list of records. Its
				2313	definition, which is kept in the header @file{calc.h}, is as follows. It
				2314	provides for either functions or variables to be placed in the table.
				2315
				2316	@smallexample
				2317	@group
				2318	/* Function type. */
				2319	typedef double (*func_t) (double);
				2320	@end group
				2321
				2322	@group
				2323	/* Data type for links in the chain of symbols. */
				2324	struct symrec
				2325	@{
				2326	char name; / name of symbol */
				2327	int type; /* type of symbol: either VAR or FNCT */
				2328	union
				2329	@{
				2330	double var; /* value of a VAR */
				2331	func_t fnctptr; /* value of a FNCT */
				2332	@} value;
				2333	struct symrec next; / link field */
				2334	@};
				2335	@end group
				2336
				2337	@group
				2338	typedef struct symrec symrec;
				2339
				2340	/* The symbol table: a chain of `struct symrec'. */
				2341	extern symrec *sym_table;
				2342
				2343	symrec putsym (char const , int);
				2344	symrec getsym (char const );
				2345	@end group
				2346	@end smallexample
				2347
				2348	The new version of @code{main} includes a call to @code{init_table}, a
				2349	function that initializes the symbol table. Here it is, and
				2350	@code{init_table} as well:
				2351
				2352	@smallexample
				2353	#include <stdio.h>
				2354
				2355	@group
				2356	/* Called by yyparse on error. */
				2357	void
				2358	yyerror (char const *s)
				2359	@{
				2360	printf ("%s\n", s);
				2361	@}
				2362	@end group
				2363
				2364	@group
				2365	struct init
				2366	@{
				2367	char const *fname;
				2368	double (*fnct) (double);
				2369	@};
				2370	@end group
				2371
				2372	@group
				2373	struct init const arith_fncts[] =
				2374	@{
				2375	"sin", sin,
				2376	"cos", cos,
				2377	"atan", atan,
				2378	"ln", log,
				2379	"exp", exp,
				2380	"sqrt", sqrt,
				2381	0, 0
				2382	@};
				2383	@end group
				2384
				2385	@group
				2386	/* The symbol table: a chain of `struct symrec'. */
				2387	symrec *sym_table;
				2388	@end group
				2389
				2390	@group
				2391	/* Put arithmetic functions in table. */
				2392	void
				2393	init_table (void)
				2394	@{
				2395	int i;
				2396	symrec *ptr;
				2397	for (i = 0; arith_fncts[i].fname != 0; i++)
				2398	@{
				2399	ptr = putsym (arith_fncts[i].fname, FNCT);
				2400	ptr->value.fnctptr = arith_fncts[i].fnct;
				2401	@}
				2402	@}
				2403	@end group
				2404
				2405	@group
				2406	int
				2407	main (void)
				2408	@{
				2409	init_table ();
				2410	return yyparse ();
				2411	@}
				2412	@end group
				2413	@end smallexample
				2414
				2415	By simply editing the initialization list and adding the necessary include
				2416	files, you can add additional functions to the calculator.
				2417
				2418	Two important functions allow look-up and installation of symbols in the
				2419	symbol table. The function @code{putsym} is passed a name and the type
				2420	(@code{VAR} or @code{FNCT}) of the object to be installed. The object is
				2421	linked to the front of the list, and a pointer to the object is returned.
				2422	The function @code{getsym} is passed the name of the symbol to look up. If
				2423	found, a pointer to that symbol is returned; otherwise zero is returned.
				2424
				2425	@smallexample
				2426	symrec *
				2427	putsym (char const *sym_name, int sym_type)
				2428	@{
				2429	symrec *ptr;
				2430	ptr = (symrec *) malloc (sizeof (symrec));
				2431	ptr->name = (char *) malloc (strlen (sym_name) + 1);
				2432	strcpy (ptr->name,sym_name);
				2433	ptr->type = sym_type;
				2434	ptr->value.var = 0; /* Set value to 0 even if fctn. */
				2435	ptr->next = (struct symrec *)sym_table;
				2436	sym_table = ptr;
				2437	return ptr;
				2438	@}
				2439
				2440	symrec *
				2441	getsym (char const *sym_name)
				2442	@{
				2443	symrec *ptr;
				2444	for (ptr = sym_table; ptr != (symrec *) 0;
				2445	ptr = (symrec *)ptr->next)
				2446	if (strcmp (ptr->name,sym_name) == 0)
				2447	return ptr;
				2448	return 0;
				2449	@}
				2450	@end smallexample
				2451
				2452	The function @code{yylex} must now recognize variables, numeric values, and
				2453	the single-character arithmetic operators. Strings of alphanumeric
				2454	characters with a leading letter are recognized as either variables or
				2455	functions depending on what the symbol table says about them.
				2456
				2457	The string is passed to @code{getsym} for look up in the symbol table. If
				2458	the name appears in the table, a pointer to its location and its type
				2459	(@code{VAR} or @code{FNCT}) is returned to @code{yyparse}. If it is not
				2460	already in the table, then it is installed as a @code{VAR} using
				2461	@code{putsym}. Again, a pointer and its type (which must be @code{VAR}) is
				2462	returned to @code{yyparse}.
				2463
				2464	No change is needed in the handling of numeric values and arithmetic
				2465	operators in @code{yylex}.
				2466
				2467	@smallexample
				2468	@group
				2469	#include <ctype.h>
				2470	@end group
				2471
				2472	@group
				2473	int
				2474	yylex (void)
				2475	@{
				2476	int c;
				2477
				2478	/* Ignore white space, get first nonwhite character. */
				2479	while ((c = getchar ()) == ' ' \|\| c == '\t');
				2480
				2481	if (c == EOF)
				2482	return 0;
				2483	@end group
				2484
				2485	@group
				2486	/* Char starts a number => parse the number. */
				2487	if (c == '.' \|\| isdigit (c))
				2488	@{
				2489	ungetc (c, stdin);
				2490	scanf ("%lf", &yylval.val);
				2491	return NUM;
				2492	@}
				2493	@end group
				2494
				2495	@group
				2496	/* Char starts an identifier => read the name. */
				2497	if (isalpha (c))
				2498	@{
				2499	symrec *s;
				2500	static char *symbuf = 0;
				2501	static int length = 0;
				2502	int i;
				2503	@end group
				2504
				2505	@group
				2506	/* Initially make the buffer long enough
				2507	for a 40-character symbol name. */
				2508	if (length == 0)
				2509	length = 40, symbuf = (char *)malloc (length + 1);
				2510
				2511	i = 0;
				2512	do
				2513	@end group
				2514	@group
				2515	@{
				2516	/* If buffer is full, make it bigger. */
				2517	if (i == length)
				2518	@{
				2519	length *= 2;
				2520	symbuf = (char *) realloc (symbuf, length + 1);
				2521	@}
				2522	/* Add this character to the buffer. */
				2523	symbuf[i++] = c;
				2524	/* Get another character. */
				2525	c = getchar ();
				2526	@}
				2527	@end group
				2528	@group
				2529	while (isalnum (c));
				2530
				2531	ungetc (c, stdin);
				2532	symbuf[i] = '\0';
				2533	@end group
				2534
				2535	@group
				2536	s = getsym (symbuf);
				2537	if (s == 0)
				2538	s = putsym (symbuf, VAR);
				2539	yylval.tptr = s;
				2540	return s->type;
				2541	@}
				2542
				2543	/* Any other character is a token by itself. */
				2544	return c;
				2545	@}
				2546	@end group
				2547	@end smallexample
				2548
				2549	This program is both powerful and flexible. You may easily add new
				2550	functions, and it is a simple job to modify this code to install
				2551	predefined variables such as @code{pi} or @code{e} as well.
				2552
				2553	@node Exercises
				2554	@section Exercises
				2555	@cindex exercises
				2556
				2557	@enumerate
				2558	@item
				2559	Add some new functions from @file{math.h} to the initialization list.
				2560
				2561	@item
				2562	Add another array that contains constants and their values. Then
				2563	modify @code{init_table} to add these constants to the symbol table.
				2564	It will be easiest to give the constants type @code{VAR}.
				2565
				2566	@item
				2567	Make the program report an error if the user refers to an
				2568	uninitialized variable in any way except to store a value in it.
				2569	@end enumerate
				2570
				2571	@node Grammar File
				2572	@chapter Bison Grammar Files
				2573
				2574	Bison takes as input a context-free grammar specification and produces a
				2575	C-language function that recognizes correct instances of the grammar.
				2576
				2577	The Bison grammar input file conventionally has a name ending in @samp{.y}.
				2578	@xref{Invocation, ,Invoking Bison}.
				2579
				2580	@menu
				2581	* Grammar Outline:: Overall layout of the grammar file.
				2582	* Symbols:: Terminal and nonterminal symbols.
				2583	* Rules:: How to write grammar rules.
				2584	* Recursion:: Writing recursive rules.
				2585	* Semantics:: Semantic values and actions.
				2586	* Locations:: Locations and actions.
				2587	* Declarations:: All kinds of Bison declarations are described here.
				2588	* Multiple Parsers:: Putting more than one Bison parser in one program.
				2589	@end menu
				2590
				2591	@node Grammar Outline
				2592	@section Outline of a Bison Grammar
				2593
				2594	A Bison grammar file has four main sections, shown here with the
				2595	appropriate delimiters:
				2596
				2597	@example
				2598	%@{
				2599	@var{Prologue}
				2600	%@}
				2601
				2602	@var{Bison declarations}
				2603
				2604	%%
				2605	@var{Grammar rules}
				2606	%%
				2607
				2608	@var{Epilogue}
				2609	@end example
				2610
				2611	Comments enclosed in @samp{/* @dots{} */} may appear in any of the sections.
				2612	As a @acronym{GNU} extension, @samp{//} introduces a comment that
				2613	continues until end of line.
				2614
				2615	@menu
				2616	* Prologue:: Syntax and usage of the prologue.
				2617	* Bison Declarations:: Syntax and usage of the Bison declarations section.
				2618	* Grammar Rules:: Syntax and usage of the grammar rules section.
				2619	* Epilogue:: Syntax and usage of the epilogue.
				2620	@end menu
				2621
				2622	@node Prologue
				2623	@subsection The prologue
				2624	@cindex declarations section
				2625	@cindex Prologue
				2626	@cindex declarations
				2627
				2628	The @var{Prologue} section contains macro definitions and declarations
				2629	of functions and variables that are used in the actions in the grammar
				2630	rules. These are copied to the beginning of the parser file so that
				2631	they precede the definition of @code{yyparse}. You can use
				2632	@samp{#include} to get the declarations from a header file. If you
				2633	don't need any C declarations, you may omit the @samp{%@{} and
				2634	@samp{%@}} delimiters that bracket this section.
				2635
				2636	The @var{Prologue} section is terminated by the the first occurrence
				2637	of @samp{%@}} that is outside a comment, a string literal, or a
				2638	character constant.
				2639
				2640	You may have more than one @var{Prologue} section, intermixed with the
				2641	@var{Bison declarations}. This allows you to have C and Bison
				2642	declarations that refer to each other. For example, the @code{%union}
				2643	declaration may use types defined in a header file, and you may wish to
				2644	prototype functions that take arguments of type @code{YYSTYPE}. This
				2645	can be done with two @var{Prologue} blocks, one before and one after the
				2646	@code{%union} declaration.
				2647
				2648	@smallexample
				2649	%@{
				2650	#include <stdio.h>
				2651	#include "ptypes.h"
				2652	%@}
				2653
				2654	%union @{
				2655	long int n;
				2656	tree t; /* @r{@code{tree} is defined in @file{ptypes.h}.} */
				2657	@}
				2658
				2659	%@{
				2660	static void print_token_value (FILE *, int, YYSTYPE);
				2661	#define YYPRINT(F, N, L) print_token_value (F, N, L)
				2662	%@}
				2663
				2664	@dots{}
				2665	@end smallexample
				2666
				2667	@node Bison Declarations
				2668	@subsection The Bison Declarations Section
				2669	@cindex Bison declarations (introduction)
				2670	@cindex declarations, Bison (introduction)
				2671
				2672	The @var{Bison declarations} section contains declarations that define
				2673	terminal and nonterminal symbols, specify precedence, and so on.
				2674	In some simple grammars you may not need any declarations.
				2675	@xref{Declarations, ,Bison Declarations}.
				2676
				2677	@node Grammar Rules
				2678	@subsection The Grammar Rules Section
				2679	@cindex grammar rules section
				2680	@cindex rules section for grammar
				2681
				2682	The @dfn{grammar rules} section contains one or more Bison grammar
				2683	rules, and nothing else. @xref{Rules, ,Syntax of Grammar Rules}.
				2684
				2685	There must always be at least one grammar rule, and the first
				2686	@samp{%%} (which precedes the grammar rules) may never be omitted even
				2687	if it is the first thing in the file.
				2688
				2689	@node Epilogue
				2690	@subsection The epilogue
				2691	@cindex additional C code section
				2692	@cindex epilogue
				2693	@cindex C code, section for additional
				2694
				2695	The @var{Epilogue} is copied verbatim to the end of the parser file, just as
				2696	the @var{Prologue} is copied to the beginning. This is the most convenient
				2697	place to put anything that you want to have in the parser file but which need
				2698	not come before the definition of @code{yyparse}. For example, the
				2699	definitions of @code{yylex} and @code{yyerror} often go here. Because
				2700	C requires functions to be declared before being used, you often need
				2701	to declare functions like @code{yylex} and @code{yyerror} in the Prologue,
				2702	even if you define them in the Epilogue.
				2703	@xref{Interface, ,Parser C-Language Interface}.
				2704
				2705	If the last section is empty, you may omit the @samp{%%} that separates it
				2706	from the grammar rules.
				2707
				2708	The Bison parser itself contains many macros and identifiers whose names
				2709	start with @samp{yy} or @samp{YY}, so it is a good idea to avoid using
				2710	any such names (except those documented in this manual) in the epilogue
				2711	of the grammar file.
				2712
				2713	@node Symbols
				2714	@section Symbols, Terminal and Nonterminal
				2715	@cindex nonterminal symbol
				2716	@cindex terminal symbol
				2717	@cindex token type
				2718	@cindex symbol
				2719
				2720	@dfn{Symbols} in Bison grammars represent the grammatical classifications
				2721	of the language.
				2722
				2723	A @dfn{terminal symbol} (also known as a @dfn{token type}) represents a
				2724	class of syntactically equivalent tokens. You use the symbol in grammar
				2725	rules to mean that a token in that class is allowed. The symbol is
				2726	represented in the Bison parser by a numeric code, and the @code{yylex}
				2727	function returns a token type code to indicate what kind of token has
				2728	been read. You don't need to know what the code value is; you can use
				2729	the symbol to stand for it.
				2730
				2731	A @dfn{nonterminal symbol} stands for a class of syntactically
				2732	equivalent groupings. The symbol name is used in writing grammar rules.
				2733	By convention, it should be all lower case.
				2734
				2735	Symbol names can contain letters, digits (not at the beginning),
				2736	underscores and periods. Periods make sense only in nonterminals.
				2737
				2738	There are three ways of writing terminal symbols in the grammar:
				2739
				2740	@itemize @bullet
				2741	@item
				2742	A @dfn{named token type} is written with an identifier, like an
				2743	identifier in C@. By convention, it should be all upper case. Each
				2744	such name must be defined with a Bison declaration such as
				2745	@code{%token}. @xref{Token Decl, ,Token Type Names}.
				2746
				2747	@item
				2748	@cindex character token
				2749	@cindex literal token
				2750	@cindex single-character literal
				2751	A @dfn{character token type} (or @dfn{literal character token}) is
				2752	written in the grammar using the same syntax used in C for character
				2753	constants; for example, @code{'+'} is a character token type. A
				2754	character token type doesn't need to be declared unless you need to
				2755	specify its semantic value data type (@pxref{Value Type, ,Data Types of
				2756	Semantic Values}), associativity, or precedence (@pxref{Precedence,
				2757	,Operator Precedence}).
				2758
				2759	By convention, a character token type is used only to represent a
				2760	token that consists of that particular character. Thus, the token
				2761	type @code{'+'} is used to represent the character @samp{+} as a
				2762	token. Nothing enforces this convention, but if you depart from it,
				2763	your program will confuse other readers.
				2764
				2765	All the usual escape sequences used in character literals in C can be
				2766	used in Bison as well, but you must not use the null character as a
				2767	character literal because its numeric code, zero, signifies
				2768	end-of-input (@pxref{Calling Convention, ,Calling Convention
				2769	for @code{yylex}}). Also, unlike standard C, trigraphs have no
				2770	special meaning in Bison character literals, nor is backslash-newline
				2771	allowed.
				2772
				2773	@item
				2774	@cindex string token
				2775	@cindex literal string token
				2776	@cindex multicharacter literal
				2777	A @dfn{literal string token} is written like a C string constant; for
				2778	example, @code{"<="} is a literal string token. A literal string token
				2779	doesn't need to be declared unless you need to specify its semantic
				2780	value data type (@pxref{Value Type}), associativity, or precedence
				2781	(@pxref{Precedence}).
				2782
				2783	You can associate the literal string token with a symbolic name as an
				2784	alias, using the @code{%token} declaration (@pxref{Token Decl, ,Token
				2785	Declarations}). If you don't do that, the lexical analyzer has to
				2786	retrieve the token number for the literal string token from the
				2787	@code{yytname} table (@pxref{Calling Convention}).
				2788
				2789	@strong{Warning}: literal string tokens do not work in Yacc.
				2790
				2791	By convention, a literal string token is used only to represent a token
				2792	that consists of that particular string. Thus, you should use the token
				2793	type @code{"<="} to represent the string @samp{<=} as a token. Bison
				2794	does not enforce this convention, but if you depart from it, people who
				2795	read your program will be confused.
				2796
				2797	All the escape sequences used in string literals in C can be used in
				2798	Bison as well, except that you must not use a null character within a
				2799	string literal. Also, unlike Standard C, trigraphs have no special
				2800	meaning in Bison string literals, nor is backslash-newline allowed. A
				2801	literal string token must contain two or more characters; for a token
				2802	containing just one character, use a character token (see above).
				2803	@end itemize
				2804
				2805	How you choose to write a terminal symbol has no effect on its
				2806	grammatical meaning. That depends only on where it appears in rules and
				2807	on when the parser function returns that symbol.
				2808
				2809	The value returned by @code{yylex} is always one of the terminal
				2810	symbols, except that a zero or negative value signifies end-of-input.
				2811	Whichever way you write the token type in the grammar rules, you write
				2812	it the same way in the definition of @code{yylex}. The numeric code
				2813	for a character token type is simply the positive numeric code of the
				2814	character, so @code{yylex} can use the identical value to generate the
				2815	requisite code, though you may need to convert it to @code{unsigned
				2816	char} to avoid sign-extension on hosts where @code{char} is signed.
				2817	Each named token type becomes a C macro in
				2818	the parser file, so @code{yylex} can use the name to stand for the code.
				2819	(This is why periods don't make sense in terminal symbols.)
				2820	@xref{Calling Convention, ,Calling Convention for @code{yylex}}.
				2821
				2822	If @code{yylex} is defined in a separate file, you need to arrange for the
				2823	token-type macro definitions to be available there. Use the @samp{-d}
				2824	option when you run Bison, so that it will write these macro definitions
				2825	into a separate header file @file{@var{name}.tab.h} which you can include
				2826	in the other source files that need it. @xref{Invocation, ,Invoking Bison}.
				2827
				2828	If you want to write a grammar that is portable to any Standard C
				2829	host, you must use only nonnull character tokens taken from the basic
				2830	execution character set of Standard C@. This set consists of the ten
				2831	digits, the 52 lower- and upper-case English letters, and the
				2832	characters in the following C-language string:
				2833
				2834	@example
				2835	"\a\b\t\n\v\f\r !\"#%&'()*+,-./:;<=>?[\\]^_@{\|@}~"
				2836	@end example
				2837
				2838	The @code{yylex} function and Bison must use a consistent character set
				2839	and encoding for character tokens. For example, if you run Bison in an
				2840	@acronym{ASCII} environment, but then compile and run the resulting
				2841	program in an environment that uses an incompatible character set like
				2842	@acronym{EBCDIC}, the resulting program may not work because the tables
				2843	generated by Bison will assume @acronym{ASCII} numeric values for
				2844	character tokens. It is standard practice for software distributions to
				2845	contain C source files that were generated by Bison in an
				2846	@acronym{ASCII} environment, so installers on platforms that are
				2847	incompatible with @acronym{ASCII} must rebuild those files before
				2848	compiling them.
				2849
				2850	The symbol @code{error} is a terminal symbol reserved for error recovery
				2851	(@pxref{Error Recovery}); you shouldn't use it for any other purpose.
				2852	In particular, @code{yylex} should never return this value. The default
				2853	value of the error token is 256, unless you explicitly assigned 256 to
				2854	one of your tokens with a @code{%token} declaration.
				2855
				2856	@node Rules
				2857	@section Syntax of Grammar Rules
				2858	@cindex rule syntax
				2859	@cindex grammar rule syntax
				2860	@cindex syntax of grammar rules
				2861
				2862	A Bison grammar rule has the following general form:
				2863
				2864	@example
				2865	@group
				2866	@var{result}: @var{components}@dots{}
				2867	;
				2868	@end group
				2869	@end example
				2870
				2871	@noindent
				2872	where @var{result} is the nonterminal symbol that this rule describes,
				2873	and @var{components} are various terminal and nonterminal symbols that
				2874	are put together by this rule (@pxref{Symbols}).
				2875
				2876	For example,
				2877
				2878	@example
				2879	@group
				2880	exp: exp '+' exp
				2881	;
				2882	@end group
				2883	@end example
				2884
				2885	@noindent
				2886	says that two groupings of type @code{exp}, with a @samp{+} token in between,
				2887	can be combined into a larger grouping of type @code{exp}.
				2888
				2889	White space in rules is significant only to separate symbols. You can add
				2890	extra white space as you wish.
				2891
				2892	Scattered among the components can be @var{actions} that determine
				2893	the semantics of the rule. An action looks like this:
				2894
				2895	@example
				2896	@{@var{C statements}@}
				2897	@end example
				2898
				2899	@noindent
				2900	@cindex braced code
				2901	This is an example of @dfn{braced code}, that is, C code surrounded by
				2902	braces, much like a compound statement in C@. Braced code can contain
				2903	any sequence of C tokens, so long as its braces are balanced. Bison
				2904	does not check the braced code for correctness directly; it merely
				2905	copies the code to the output file, where the C compiler can check it.
				2906
				2907	Within braced code, the balanced-brace count is not affected by braces
				2908	within comments, string literals, or character constants, but it is
				2909	affected by the C digraphs @samp{<%} and @samp{%>} that represent
				2910	braces. At the top level braced code must be terminated by @samp{@}}
				2911	and not by a digraph. Bison does not look for trigraphs, so if braced
				2912	code uses trigraphs you should ensure that they do not affect the
				2913	nesting of braces or the boundaries of comments, string literals, or
				2914	character constants.
				2915
				2916	Usually there is only one action and it follows the components.
				2917	@xref{Actions}.
				2918
				2919	@findex \|
				2920	Multiple rules for the same @var{result} can be written separately or can
				2921	be joined with the vertical-bar character @samp{\|} as follows:
				2922
				2923	@example
				2924	@group
				2925	@var{result}: @var{rule1-components}@dots{}
				2926	\| @var{rule2-components}@dots{}
				2927	@dots{}
				2928	;
				2929	@end group
				2930	@end example
				2931
				2932	@noindent
				2933	They are still considered distinct rules even when joined in this way.
				2934
				2935	If @var{components} in a rule is empty, it means that @var{result} can
				2936	match the empty string. For example, here is how to define a
				2937	comma-separated sequence of zero or more @code{exp} groupings:
				2938
				2939	@example
				2940	@group
				2941	expseq: /* empty */
				2942	\| expseq1
				2943	;
				2944	@end group
				2945
				2946	@group
				2947	expseq1: exp
				2948	\| expseq1 ',' exp
				2949	;
				2950	@end group
				2951	@end example
				2952
				2953	@noindent
				2954	It is customary to write a comment @samp{/* empty */} in each rule
				2955	with no components.
				2956
				2957	@node Recursion
				2958	@section Recursive Rules
				2959	@cindex recursive rule
				2960
				2961	A rule is called @dfn{recursive} when its @var{result} nonterminal
				2962	appears also on its right hand side. Nearly all Bison grammars need to
				2963	use recursion, because that is the only way to define a sequence of any
				2964	number of a particular thing. Consider this recursive definition of a
				2965	comma-separated sequence of one or more expressions:
				2966
				2967	@example
				2968	@group
				2969	expseq1: exp
				2970	\| expseq1 ',' exp
				2971	;
				2972	@end group
				2973	@end example
				2974
				2975	@cindex left recursion
				2976	@cindex right recursion
				2977	@noindent
				2978	Since the recursive use of @code{expseq1} is the leftmost symbol in the
				2979	right hand side, we call this @dfn{left recursion}. By contrast, here
				2980	the same construct is defined using @dfn{right recursion}:
				2981
				2982	@example
				2983	@group
				2984	expseq1: exp
				2985	\| exp ',' expseq1
				2986	;
				2987	@end group
				2988	@end example
				2989
				2990	@noindent
				2991	Any kind of sequence can be defined using either left recursion or right
				2992	recursion, but you should always use left recursion, because it can
				2993	parse a sequence of any number of elements with bounded stack space.
				2994	Right recursion uses up space on the Bison stack in proportion to the
				2995	number of elements in the sequence, because all the elements must be
				2996	shifted onto the stack before the rule can be applied even once.
				2997	@xref{Algorithm, ,The Bison Parser Algorithm}, for further explanation
				2998	of this.
				2999
				3000	@cindex mutual recursion
				3001	@dfn{Indirect} or @dfn{mutual} recursion occurs when the result of the
				3002	rule does not appear directly on its right hand side, but does appear
				3003	in rules for other nonterminals which do appear on its right hand
				3004	side.
				3005
				3006	For example:
				3007
				3008	@example
				3009	@group
				3010	expr: primary
				3011	\| primary '+' primary
				3012	;
				3013	@end group
				3014
				3015	@group
				3016	primary: constant
				3017	\| '(' expr ')'
				3018	;
				3019	@end group
				3020	@end example
				3021
				3022	@noindent
				3023	defines two mutually-recursive nonterminals, since each refers to the
				3024	other.
				3025
				3026	@node Semantics
				3027	@section Defining Language Semantics
				3028	@cindex defining language semantics
				3029	@cindex language semantics, defining
				3030
				3031	The grammar rules for a language determine only the syntax. The semantics
				3032	are determined by the semantic values associated with various tokens and
				3033	groupings, and by the actions taken when various groupings are recognized.
				3034
				3035	For example, the calculator calculates properly because the value
				3036	associated with each expression is the proper number; it adds properly
				3037	because the action for the grouping @w{@samp{@var{x} + @var{y}}} is to add
				3038	the numbers associated with @var{x} and @var{y}.
				3039
				3040	@menu
				3041	* Value Type:: Specifying one data type for all semantic values.
				3042	* Multiple Types:: Specifying several alternative data types.
				3043	* Actions:: An action is the semantic definition of a grammar rule.
				3044	* Action Types:: Specifying data types for actions to operate on.
				3045	* Mid-Rule Actions:: Most actions go at the end of a rule.
				3046	This says when, why and how to use the exceptional
				3047	action in the middle of a rule.
				3048	@end menu
				3049
				3050	@node Value Type
				3051	@subsection Data Types of Semantic Values
				3052	@cindex semantic value type
				3053	@cindex value type, semantic
				3054	@cindex data types of semantic values
				3055	@cindex default data type
				3056
				3057	In a simple program it may be sufficient to use the same data type for
				3058	the semantic values of all language constructs. This was true in the
				3059	@acronym{RPN} and infix calculator examples (@pxref{RPN Calc, ,Reverse Polish
				3060	Notation Calculator}).
				3061
				3062	Bison's default is to use type @code{int} for all semantic values. To
				3063	specify some other type, define @code{YYSTYPE} as a macro, like this:
				3064
				3065	@example
				3066	#define YYSTYPE double
				3067	@end example
				3068
				3069	@noindent
				3070	@code{YYSTYPE}'s replacement list should be a type name
				3071	that does not contain parentheses or square brackets.
				3072	This macro definition must go in the prologue of the grammar file
				3073	(@pxref{Grammar Outline, ,Outline of a Bison Grammar}).
				3074
				3075	@node Multiple Types
				3076	@subsection More Than One Value Type
				3077
				3078	In most programs, you will need different data types for different kinds
				3079	of tokens and groupings. For example, a numeric constant may need type
				3080	@code{int} or @code{long int}, while a string constant needs type
				3081	@code{char *}, and an identifier might need a pointer to an entry in the
				3082	symbol table.
				3083
				3084	To use more than one data type for semantic values in one parser, Bison
				3085	requires you to do two things:
				3086
				3087	@itemize @bullet
				3088	@item
				3089	Specify the entire collection of possible data types, with the
				3090	@code{%union} Bison declaration (@pxref{Union Decl, ,The Collection of
				3091	Value Types}).
				3092
				3093	@item
				3094	Choose one of those types for each symbol (terminal or nonterminal) for
				3095	which semantic values are used. This is done for tokens with the
				3096	@code{%token} Bison declaration (@pxref{Token Decl, ,Token Type Names})
				3097	and for groupings with the @code{%type} Bison declaration (@pxref{Type
				3098	Decl, ,Nonterminal Symbols}).
				3099	@end itemize
				3100
				3101	@node Actions
				3102	@subsection Actions
				3103	@cindex action
				3104	@vindex $$
				3105	@vindex $@var{n}
				3106
				3107	An action accompanies a syntactic rule and contains C code to be executed
				3108	each time an instance of that rule is recognized. The task of most actions
				3109	is to compute a semantic value for the grouping built by the rule from the
				3110	semantic values associated with tokens or smaller groupings.
				3111
				3112	An action consists of braced code containing C statements, and can be
				3113	placed at any position in the rule;
				3114	it is executed at that position. Most rules have just one action at the
				3115	end of the rule, following all the components. Actions in the middle of
				3116	a rule are tricky and used only for special purposes (@pxref{Mid-Rule
				3117	Actions, ,Actions in Mid-Rule}).
				3118
				3119	The C code in an action can refer to the semantic values of the components
				3120	matched by the rule with the construct @code{$@var{n}}, which stands for
				3121	the value of the @var{n}th component. The semantic value for the grouping
				3122	being constructed is @code{$$}. Bison translates both of these
				3123	constructs into expressions of the appropriate type when it copies the
				3124	actions into the parser file. @code{$$} is translated to a modifiable
				3125	lvalue, so it can be assigned to.
				3126
				3127	Here is a typical example:
				3128
				3129	@example
				3130	@group
				3131	exp: @dots{}
				3132	\| exp '+' exp
				3133	@{ $$ = $1 + $3; @}
				3134	@end group
				3135	@end example
				3136
				3137	@noindent
				3138	This rule constructs an @code{exp} from two smaller @code{exp} groupings
				3139	connected by a plus-sign token. In the action, @code{$1} and @code{$3}
				3140	refer to the semantic values of the two component @code{exp} groupings,
				3141	which are the first and third symbols on the right hand side of the rule.
				3142	The sum is stored into @code{$$} so that it becomes the semantic value of
				3143	the addition-expression just recognized by the rule. If there were a
				3144	useful semantic value associated with the @samp{+} token, it could be
				3145	referred to as @code{$2}.
				3146
				3147	Note that the vertical-bar character @samp{\|} is really a rule
				3148	separator, and actions are attached to a single rule. This is a
				3149	difference with tools like Flex, for which @samp{\|} stands for either
				3150	``or'', or ``the same action as that of the next rule''. In the
				3151	following example, the action is triggered only when @samp{b} is found:
				3152
				3153	@example
				3154	@group
				3155	a-or-b: 'a'\|'b' @{ a_or_b_found = 1; @};
				3156	@end group
				3157	@end example
				3158
				3159	@cindex default action
				3160	If you don't specify an action for a rule, Bison supplies a default:
				3161	@w{@code{$$ = $1}.} Thus, the value of the first symbol in the rule
				3162	becomes the value of the whole rule. Of course, the default action is
				3163	valid only if the two data types match. There is no meaningful default
				3164	action for an empty rule; every empty rule must have an explicit action
				3165	unless the rule's value does not matter.
				3166
				3167	@code{$@var{n}} with @var{n} zero or negative is allowed for reference
				3168	to tokens and groupings on the stack @emph{before} those that match the
				3169	current rule. This is a very risky practice, and to use it reliably
				3170	you must be certain of the context in which the rule is applied. Here
				3171	is a case in which you can use this reliably:
				3172
				3173	@example
				3174	@group
				3175	foo: expr bar '+' expr @{ @dots{} @}
				3176	\| expr bar '-' expr @{ @dots{} @}
				3177	;
				3178	@end group
				3179
				3180	@group
				3181	bar: /* empty */
				3182	@{ previous_expr = $0; @}
				3183	;
				3184	@end group
				3185	@end example
				3186
				3187	As long as @code{bar} is used only in the fashion shown here, @code{$0}
				3188	always refers to the @code{expr} which precedes @code{bar} in the
				3189	definition of @code{foo}.
				3190
				3191	@vindex yylval
				3192	It is also possible to access the semantic value of the look-ahead token, if
				3193	any, from a semantic action.
				3194	This semantic value is stored in @code{yylval}.
				3195	@xref{Action Features, ,Special Features for Use in Actions}.
				3196
				3197	@node Action Types
				3198	@subsection Data Types of Values in Actions
				3199	@cindex action data types
				3200	@cindex data types in actions
				3201
				3202	If you have chosen a single data type for semantic values, the @code{$$}
				3203	and @code{$@var{n}} constructs always have that data type.
				3204
				3205	If you have used @code{%union} to specify a variety of data types, then you
				3206	must declare a choice among these types for each terminal or nonterminal
				3207	symbol that can have a semantic value. Then each time you use @code{$$} or
				3208	@code{$@var{n}}, its data type is determined by which symbol it refers to
				3209	in the rule. In this example,
				3210
				3211	@example
				3212	@group
				3213	exp: @dots{}
				3214	\| exp '+' exp
				3215	@{ $$ = $1 + $3; @}
				3216	@end group
				3217	@end example
				3218
				3219	@noindent
				3220	@code{$1} and @code{$3} refer to instances of @code{exp}, so they all
				3221	have the data type declared for the nonterminal symbol @code{exp}. If
				3222	@code{$2} were used, it would have the data type declared for the
				3223	terminal symbol @code{'+'}, whatever that might be.
				3224
				3225	Alternatively, you can specify the data type when you refer to the value,
				3226	by inserting @samp{<@var{type}>} after the @samp{$} at the beginning of the
				3227	reference. For example, if you have defined types as shown here:
				3228
				3229	@example
				3230	@group
				3231	%union @{
				3232	int itype;
				3233	double dtype;
				3234	@}
				3235	@end group
				3236	@end example
				3237
				3238	@noindent
				3239	then you can write @code{$<itype>1} to refer to the first subunit of the
				3240	rule as an integer, or @code{$<dtype>1} to refer to it as a double.
				3241
				3242	@node Mid-Rule Actions
				3243	@subsection Actions in Mid-Rule
				3244	@cindex actions in mid-rule
				3245	@cindex mid-rule actions
				3246
				3247	Occasionally it is useful to put an action in the middle of a rule.
				3248	These actions are written just like usual end-of-rule actions, but they
				3249	are executed before the parser even recognizes the following components.
				3250
				3251	A mid-rule action may refer to the components preceding it using
				3252	@code{$@var{n}}, but it may not refer to subsequent components because
				3253	it is run before they are parsed.
				3254
				3255	The mid-rule action itself counts as one of the components of the rule.
				3256	This makes a difference when there is another action later in the same rule
				3257	(and usually there is another at the end): you have to count the actions
				3258	along with the symbols when working out which number @var{n} to use in
				3259	@code{$@var{n}}.
				3260
				3261	The mid-rule action can also have a semantic value. The action can set
				3262	its value with an assignment to @code{$$}, and actions later in the rule
				3263	can refer to the value using @code{$@var{n}}. Since there is no symbol
				3264	to name the action, there is no way to declare a data type for the value
				3265	in advance, so you must use the @samp{$<@dots{}>@var{n}} construct to
				3266	specify a data type each time you refer to this value.
				3267
				3268	There is no way to set the value of the entire rule with a mid-rule
				3269	action, because assignments to @code{$$} do not have that effect. The
				3270	only way to set the value for the entire rule is with an ordinary action
				3271	at the end of the rule.
				3272
				3273	Here is an example from a hypothetical compiler, handling a @code{let}
				3274	statement that looks like @samp{let (@var{variable}) @var{statement}} and
				3275	serves to create a variable named @var{variable} temporarily for the
				3276	duration of @var{statement}. To parse this construct, we must put
				3277	@var{variable} into the symbol table while @var{statement} is parsed, then
				3278	remove it afterward. Here is how it is done:
				3279
				3280	@example
				3281	@group
				3282	stmt: LET '(' var ')'
				3283	@{ $<context>$ = push_context ();
				3284	declare_variable ($3); @}
				3285	stmt @{ $$ = $6;
				3286	pop_context ($<context>5); @}
				3287	@end group
				3288	@end example
				3289
				3290	@noindent
				3291	As soon as @samp{let (@var{variable})} has been recognized, the first
				3292	action is run. It saves a copy of the current semantic context (the
				3293	list of accessible variables) as its semantic value, using alternative
				3294	@code{context} in the data-type union. Then it calls
				3295	@code{declare_variable} to add the new variable to that list. Once the
				3296	first action is finished, the embedded statement @code{stmt} can be
				3297	parsed. Note that the mid-rule action is component number 5, so the
				3298	@samp{stmt} is component number 6.
				3299
				3300	After the embedded statement is parsed, its semantic value becomes the
				3301	value of the entire @code{let}-statement. Then the semantic value from the
				3302	earlier action is used to restore the prior list of variables. This
				3303	removes the temporary @code{let}-variable from the list so that it won't
				3304	appear to exist while the rest of the program is parsed.
				3305
				3306	@findex %destructor
				3307	@cindex discarded symbols, mid-rule actions
				3308	@cindex error recovery, mid-rule actions
				3309	In the above example, if the parser initiates error recovery (@pxref{Error
				3310	Recovery}) while parsing the tokens in the embedded statement @code{stmt},
				3311	it might discard the previous semantic context @code{$<context>5} without
				3312	restoring it.
				3313	Thus, @code{$<context>5} needs a destructor (@pxref{Destructor Decl, , Freeing
				3314	Discarded Symbols}).
				3315	However, Bison currently provides no means to declare a destructor for a
				3316	mid-rule action's semantic value.
				3317
				3318	One solution is to bury the mid-rule action inside a nonterminal symbol and to
				3319	declare a destructor for that symbol:
				3320
				3321	@example
				3322	@group
				3323	%type <context> let
				3324	%destructor @{ pop_context ($$); @} let
				3325
				3326	%%
				3327
				3328	stmt: let stmt
				3329	@{ $$ = $2;
				3330	pop_context ($1); @}
				3331	;
				3332
				3333	let: LET '(' var ')'
				3334	@{ $$ = push_context ();
				3335	declare_variable ($3); @}
				3336	;
				3337
				3338	@end group
				3339	@end example
				3340
				3341	@noindent
				3342	Note that the action is now at the end of its rule.
				3343	Any mid-rule action can be converted to an end-of-rule action in this way, and
				3344	this is what Bison actually does to implement mid-rule actions.
				3345
				3346	Taking action before a rule is completely recognized often leads to
				3347	conflicts since the parser must commit to a parse in order to execute the
				3348	action. For example, the following two rules, without mid-rule actions,
				3349	can coexist in a working parser because the parser can shift the open-brace
				3350	token and look at what follows before deciding whether there is a
				3351	declaration or not:
				3352
				3353	@example
				3354	@group
				3355	compound: '@{' declarations statements '@}'
				3356	\| '@{' statements '@}'
				3357	;
				3358	@end group
				3359	@end example
				3360
				3361	@noindent
				3362	But when we add a mid-rule action as follows, the rules become nonfunctional:
				3363
				3364	@example
				3365	@group
				3366	compound: @{ prepare_for_local_variables (); @}
				3367	'@{' declarations statements '@}'
				3368	@end group
				3369	@group
				3370	\| '@{' statements '@}'
				3371	;
				3372	@end group
				3373	@end example
				3374
				3375	@noindent
				3376	Now the parser is forced to decide whether to run the mid-rule action
				3377	when it has read no farther than the open-brace. In other words, it
				3378	must commit to using one rule or the other, without sufficient
				3379	information to do it correctly. (The open-brace token is what is called
				3380	the @dfn{look-ahead} token at this time, since the parser is still
				3381	deciding what to do about it. @xref{Look-Ahead, ,Look-Ahead Tokens}.)
				3382
				3383	You might think that you could correct the problem by putting identical
				3384	actions into the two rules, like this:
				3385
				3386	@example
				3387	@group
				3388	compound: @{ prepare_for_local_variables (); @}
				3389	'@{' declarations statements '@}'
				3390	\| @{ prepare_for_local_variables (); @}
				3391	'@{' statements '@}'
				3392	;
				3393	@end group
				3394	@end example
				3395
				3396	@noindent
				3397	But this does not help, because Bison does not realize that the two actions
				3398	are identical. (Bison never tries to understand the C code in an action.)
				3399
				3400	If the grammar is such that a declaration can be distinguished from a
				3401	statement by the first token (which is true in C), then one solution which
				3402	does work is to put the action after the open-brace, like this:
				3403
				3404	@example
				3405	@group
				3406	compound: '@{' @{ prepare_for_local_variables (); @}
				3407	declarations statements '@}'
				3408	\| '@{' statements '@}'
				3409	;
				3410	@end group
				3411	@end example
				3412
				3413	@noindent
				3414	Now the first token of the following declaration or statement,
				3415	which would in any case tell Bison which rule to use, can still do so.
				3416
				3417	Another solution is to bury the action inside a nonterminal symbol which
				3418	serves as a subroutine:
				3419
				3420	@example
				3421	@group
				3422	subroutine: /* empty */
				3423	@{ prepare_for_local_variables (); @}
				3424	;
				3425
				3426	@end group
				3427
				3428	@group
				3429	compound: subroutine
				3430	'@{' declarations statements '@}'
				3431	\| subroutine
				3432	'@{' statements '@}'
				3433	;
				3434	@end group
				3435	@end example
				3436
				3437	@noindent
				3438	Now Bison can execute the action in the rule for @code{subroutine} without
				3439	deciding which rule for @code{compound} it will eventually use.
				3440
				3441	@node Locations
				3442	@section Tracking Locations
				3443	@cindex location
				3444	@cindex textual location
				3445	@cindex location, textual
				3446
				3447	Though grammar rules and semantic actions are enough to write a fully
				3448	functional parser, it can be useful to process some additional information,
				3449	especially symbol locations.
				3450
				3451	The way locations are handled is defined by providing a data type, and
				3452	actions to take when rules are matched.
				3453
				3454	@menu
				3455	* Location Type:: Specifying a data type for locations.
				3456	* Actions and Locations:: Using locations in actions.
				3457	* Location Default Action:: Defining a general way to compute locations.
				3458	@end menu
				3459
				3460	@node Location Type
				3461	@subsection Data Type of Locations
				3462	@cindex data type of locations
				3463	@cindex default location type
				3464
				3465	Defining a data type for locations is much simpler than for semantic values,
				3466	since all tokens and groupings always use the same type.
				3467
				3468	You can specify the type of locations by defining a macro called
				3469	@code{YYLTYPE}, just as you can specify the semantic value type by
				3470	defining @code{YYSTYPE} (@pxref{Value Type}).
				3471	When @code{YYLTYPE} is not defined, Bison uses a default structure type with
				3472	four members:
				3473
				3474	@example
				3475	typedef struct YYLTYPE
				3476	@{
				3477	int first_line;
				3478	int first_column;
				3479	int last_line;
				3480	int last_column;
				3481	@} YYLTYPE;
				3482	@end example
				3483
				3484	@node Actions and Locations
				3485	@subsection Actions and Locations
				3486	@cindex location actions
				3487	@cindex actions, location
				3488	@vindex @@$
				3489	@vindex @@@var{n}
				3490
				3491	Actions are not only useful for defining language semantics, but also for
				3492	describing the behavior of the output parser with locations.
				3493
				3494	The most obvious way for building locations of syntactic groupings is very
				3495	similar to the way semantic values are computed. In a given rule, several
				3496	constructs can be used to access the locations of the elements being matched.
				3497	The location of the @var{n}th component of the right hand side is
				3498	@code{@@@var{n}}, while the location of the left hand side grouping is
				3499	@code{@@$}.
				3500
				3501	Here is a basic example using the default data type for locations:
				3502
				3503	@example
				3504	@group
				3505	exp: @dots{}
				3506	\| exp '/' exp
				3507	@{
				3508	@@$.first_column = @@1.first_column;
				3509	@@$.first_line = @@1.first_line;
				3510	@@$.last_column = @@3.last_column;
				3511	@@$.last_line = @@3.last_line;
				3512	if ($3)
				3513	$$ = $1 / $3;
				3514	else
				3515	@{
				3516	$$ = 1;
				3517	fprintf (stderr,
				3518	"Division by zero, l%d,c%d-l%d,c%d",
				3519	@@3.first_line, @@3.first_column,
				3520	@@3.last_line, @@3.last_column);
				3521	@}
				3522	@}
				3523	@end group
				3524	@end example
				3525
				3526	As for semantic values, there is a default action for locations that is
				3527	run each time a rule is matched. It sets the beginning of @code{@@$} to the
				3528	beginning of the first symbol, and the end of @code{@@$} to the end of the
				3529	last symbol.
				3530
				3531	With this default action, the location tracking can be fully automatic. The
				3532	example above simply rewrites this way:
				3533
				3534	@example
				3535	@group
				3536	exp: @dots{}
				3537	\| exp '/' exp
				3538	@{
				3539	if ($3)
				3540	$$ = $1 / $3;
				3541	else
				3542	@{
				3543	$$ = 1;
				3544	fprintf (stderr,
				3545	"Division by zero, l%d,c%d-l%d,c%d",
				3546	@@3.first_line, @@3.first_column,
				3547	@@3.last_line, @@3.last_column);
				3548	@}
				3549	@}
				3550	@end group
				3551	@end example
				3552
				3553	@vindex yylloc
				3554	It is also possible to access the location of the look-ahead token, if any,
				3555	from a semantic action.
				3556	This location is stored in @code{yylloc}.
				3557	@xref{Action Features, ,Special Features for Use in Actions}.
				3558
				3559	@node Location Default Action
				3560	@subsection Default Action for Locations
				3561	@vindex YYLLOC_DEFAULT
				3562	@cindex @acronym{GLR} parsers and @code{YYLLOC_DEFAULT}
				3563
				3564	Actually, actions are not the best place to compute locations. Since
				3565	locations are much more general than semantic values, there is room in
				3566	the output parser to redefine the default action to take for each
				3567	rule. The @code{YYLLOC_DEFAULT} macro is invoked each time a rule is
				3568	matched, before the associated action is run. It is also invoked
				3569	while processing a syntax error, to compute the error's location.
				3570	Before reporting an unresolvable syntactic ambiguity, a @acronym{GLR}
				3571	parser invokes @code{YYLLOC_DEFAULT} recursively to compute the location
				3572	of that ambiguity.
				3573
				3574	Most of the time, this macro is general enough to suppress location
				3575	dedicated code from semantic actions.
				3576
				3577	The @code{YYLLOC_DEFAULT} macro takes three parameters. The first one is
				3578	the location of the grouping (the result of the computation). When a
				3579	rule is matched, the second parameter identifies locations of
				3580	all right hand side elements of the rule being matched, and the third
				3581	parameter is the size of the rule's right hand side.
				3582	When a @acronym{GLR} parser reports an ambiguity, which of multiple candidate
				3583	right hand sides it passes to @code{YYLLOC_DEFAULT} is undefined.
				3584	When processing a syntax error, the second parameter identifies locations
				3585	of the symbols that were discarded during error processing, and the third
				3586	parameter is the number of discarded symbols.
				3587
				3588	By default, @code{YYLLOC_DEFAULT} is defined this way:
				3589
				3590	@smallexample
				3591	@group
				3592	# define YYLLOC_DEFAULT(Current, Rhs, N) \
				3593	do \
				3594	if (N) \
				3595	@{ \
				3596	(Current).first_line = YYRHSLOC(Rhs, 1).first_line; \
				3597	(Current).first_column = YYRHSLOC(Rhs, 1).first_column; \
				3598	(Current).last_line = YYRHSLOC(Rhs, N).last_line; \
				3599	(Current).last_column = YYRHSLOC(Rhs, N).last_column; \
				3600	@} \
				3601	else \
				3602	@{ \
				3603	(Current).first_line = (Current).last_line = \
				3604	YYRHSLOC(Rhs, 0).last_line; \
				3605	(Current).first_column = (Current).last_column = \
				3606	YYRHSLOC(Rhs, 0).last_column; \
				3607	@} \
				3608	while (0)
				3609	@end group
				3610	@end smallexample
				3611
				3612	where @code{YYRHSLOC (rhs, k)} is the location of the @var{k}th symbol
				3613	in @var{rhs} when @var{k} is positive, and the location of the symbol
				3614	just before the reduction when @var{k} and @var{n} are both zero.
				3615
				3616	When defining @code{YYLLOC_DEFAULT}, you should consider that:
				3617
				3618	@itemize @bullet
				3619	@item
				3620	All arguments are free of side-effects. However, only the first one (the
				3621	result) should be modified by @code{YYLLOC_DEFAULT}.
				3622
				3623	@item
				3624	For consistency with semantic actions, valid indexes within the
				3625	right hand side range from 1 to @var{n}. When @var{n} is zero, only 0 is a
				3626	valid index, and it refers to the symbol just before the reduction.
				3627	During error processing @var{n} is always positive.
				3628
				3629	@item
				3630	Your macro should parenthesize its arguments, if need be, since the
				3631	actual arguments may not be surrounded by parentheses. Also, your
				3632	macro should expand to something that can be used as a single
				3633	statement when it is followed by a semicolon.
				3634	@end itemize
				3635
				3636	@node Declarations
				3637	@section Bison Declarations
				3638	@cindex declarations, Bison
				3639	@cindex Bison declarations
				3640
				3641	The @dfn{Bison declarations} section of a Bison grammar defines the symbols
				3642	used in formulating the grammar and the data types of semantic values.
				3643	@xref{Symbols}.
				3644
				3645	All token type names (but not single-character literal tokens such as
				3646	@code{'+'} and @code{'*'}) must be declared. Nonterminal symbols must be
				3647	declared if you need to specify which data type to use for the semantic
				3648	value (@pxref{Multiple Types, ,More Than One Value Type}).
				3649
				3650	The first rule in the file also specifies the start symbol, by default.
				3651	If you want some other symbol to be the start symbol, you must declare
				3652	it explicitly (@pxref{Language and Grammar, ,Languages and Context-Free
				3653	Grammars}).
				3654
				3655	@menu
				3656	* Require Decl:: Requiring a Bison version.
				3657	* Token Decl:: Declaring terminal symbols.
				3658	* Precedence Decl:: Declaring terminals with precedence and associativity.
				3659	* Union Decl:: Declaring the set of all semantic value types.
				3660	* Type Decl:: Declaring the choice of type for a nonterminal symbol.
				3661	* Initial Action Decl:: Code run before parsing starts.
				3662	* Destructor Decl:: Declaring how symbols are freed.
				3663	* Expect Decl:: Suppressing warnings about parsing conflicts.
				3664	* Start Decl:: Specifying the start symbol.
				3665	* Pure Decl:: Requesting a reentrant parser.
				3666	* Decl Summary:: Table of all Bison declarations.
				3667	@end menu
				3668
				3669	@node Require Decl
				3670	@subsection Require a Version of Bison
				3671	@cindex version requirement
				3672	@cindex requiring a version of Bison
				3673	@findex %require
				3674
				3675	You may require the minimum version of Bison to process the grammar. If
				3676	the requirement is not met, @command{bison} exits with an error (exit
				3677	status 63).
				3678
				3679	@example
				3680	%require "@var{version}"
				3681	@end example
				3682
				3683	@node Token Decl
				3684	@subsection Token Type Names
				3685	@cindex declaring token type names
				3686	@cindex token type names, declaring
				3687	@cindex declaring literal string tokens
				3688	@findex %token
				3689
				3690	The basic way to declare a token type name (terminal symbol) is as follows:
				3691
				3692	@example
				3693	%token @var{name}
				3694	@end example
				3695
				3696	Bison will convert this into a @code{#define} directive in
				3697	the parser, so that the function @code{yylex} (if it is in this file)
				3698	can use the name @var{name} to stand for this token type's code.
				3699
				3700	Alternatively, you can use @code{%left}, @code{%right}, or
				3701	@code{%nonassoc} instead of @code{%token}, if you wish to specify
				3702	associativity and precedence. @xref{Precedence Decl, ,Operator
				3703	Precedence}.
				3704
				3705	You can explicitly specify the numeric code for a token type by appending
				3706	a decimal or hexadecimal integer value in the field immediately
				3707	following the token name:
				3708
				3709	@example
				3710	%token NUM 300
				3711	%token XNUM 0x12d // a GNU extension
				3712	@end example
				3713
				3714	@noindent
				3715	It is generally best, however, to let Bison choose the numeric codes for
				3716	all token types. Bison will automatically select codes that don't conflict
				3717	with each other or with normal characters.
				3718
				3719	In the event that the stack type is a union, you must augment the
				3720	@code{%token} or other token declaration to include the data type
				3721	alternative delimited by angle-brackets (@pxref{Multiple Types, ,More
				3722	Than One Value Type}).
				3723
				3724	For example:
				3725
				3726	@example
				3727	@group
				3728	%union @{ /* define stack type */
				3729	double val;
				3730	symrec *tptr;
				3731	@}
				3732	%token <val> NUM /* define token NUM and its type */
				3733	@end group
				3734	@end example
				3735
				3736	You can associate a literal string token with a token type name by
				3737	writing the literal string at the end of a @code{%token}
				3738	declaration which declares the name. For example:
				3739
				3740	@example
				3741	%token arrow "=>"
				3742	@end example
				3743
				3744	@noindent
				3745	For example, a grammar for the C language might specify these names with
				3746	equivalent literal string tokens:
				3747
				3748	@example
				3749	%token <operator> OR "\|\|"
				3750	%token <operator> LE 134 "<="
				3751	%left OR "<="
				3752	@end example
				3753
				3754	@noindent
				3755	Once you equate the literal string and the token name, you can use them
				3756	interchangeably in further declarations or the grammar rules. The
				3757	@code{yylex} function can use the token name or the literal string to
				3758	obtain the token type code number (@pxref{Calling Convention}).
				3759
				3760	@node Precedence Decl
				3761	@subsection Operator Precedence
				3762	@cindex precedence declarations
				3763	@cindex declaring operator precedence
				3764	@cindex operator precedence, declaring
				3765
				3766	Use the @code{%left}, @code{%right} or @code{%nonassoc} declaration to
				3767	declare a token and specify its precedence and associativity, all at
				3768	once. These are called @dfn{precedence declarations}.
				3769	@xref{Precedence, ,Operator Precedence}, for general information on
				3770	operator precedence.
				3771
				3772	The syntax of a precedence declaration is the same as that of
				3773	@code{%token}: either
				3774
				3775	@example
				3776	%left @var{symbols}@dots{}
				3777	@end example
				3778
				3779	@noindent
				3780	or
				3781
				3782	@example
				3783	%left <@var{type}> @var{symbols}@dots{}
				3784	@end example
				3785
				3786	And indeed any of these declarations serves the purposes of @code{%token}.
				3787	But in addition, they specify the associativity and relative precedence for
				3788	all the @var{symbols}:
				3789
				3790	@itemize @bullet
				3791	@item
				3792	The associativity of an operator @var{op} determines how repeated uses
				3793	of the operator nest: whether @samp{@var{x} @var{op} @var{y} @var{op}
				3794	@var{z}} is parsed by grouping @var{x} with @var{y} first or by
				3795	grouping @var{y} with @var{z} first. @code{%left} specifies
				3796	left-associativity (grouping @var{x} with @var{y} first) and
				3797	@code{%right} specifies right-associativity (grouping @var{y} with
				3798	@var{z} first). @code{%nonassoc} specifies no associativity, which
				3799	means that @samp{@var{x} @var{op} @var{y} @var{op} @var{z}} is
				3800	considered a syntax error.
				3801
				3802	@item
				3803	The precedence of an operator determines how it nests with other operators.
				3804	All the tokens declared in a single precedence declaration have equal
				3805	precedence and nest together according to their associativity.
				3806	When two tokens declared in different precedence declarations associate,
				3807	the one declared later has the higher precedence and is grouped first.
				3808	@end itemize
				3809
				3810	@node Union Decl
				3811	@subsection The Collection of Value Types
				3812	@cindex declaring value types
				3813	@cindex value types, declaring
				3814	@findex %union
				3815
				3816	The @code{%union} declaration specifies the entire collection of
				3817	possible data types for semantic values. The keyword @code{%union} is
				3818	followed by braced code containing the same thing that goes inside a
				3819	@code{union} in C@.
				3820
				3821	For example:
				3822
				3823	@example
				3824	@group
				3825	%union @{
				3826	double val;
				3827	symrec *tptr;
				3828	@}
				3829	@end group
				3830	@end example
				3831
				3832	@noindent
				3833	This says that the two alternative types are @code{double} and @code{symrec
				3834	*}. They are given names @code{val} and @code{tptr}; these names are used
				3835	in the @code{%token} and @code{%type} declarations to pick one of the types
				3836	for a terminal or nonterminal symbol (@pxref{Type Decl, ,Nonterminal Symbols}).
				3837
				3838	As an extension to @acronym{POSIX}, a tag is allowed after the
				3839	@code{union}. For example:
				3840
				3841	@example
				3842	@group
				3843	%union value @{
				3844	double val;
				3845	symrec *tptr;
				3846	@}
				3847	@end group
				3848	@end example
				3849
				3850	@noindent
				3851	specifies the union tag @code{value}, so the corresponding C type is
				3852	@code{union value}. If you do not specify a tag, it defaults to
				3853	@code{YYSTYPE}.
				3854
				3855	As another extension to @acronym{POSIX}, you may specify multiple
				3856	@code{%union} declarations; their contents are concatenated. However,
				3857	only the first @code{%union} declaration can specify a tag.
				3858
				3859	Note that, unlike making a @code{union} declaration in C, you need not write
				3860	a semicolon after the closing brace.
				3861
				3862	@node Type Decl
				3863	@subsection Nonterminal Symbols
				3864	@cindex declaring value types, nonterminals
				3865	@cindex value types, nonterminals, declaring
				3866	@findex %type
				3867
				3868	@noindent
				3869	When you use @code{%union} to specify multiple value types, you must
				3870	declare the value type of each nonterminal symbol for which values are
				3871	used. This is done with a @code{%type} declaration, like this:
				3872
				3873	@example
				3874	%type <@var{type}> @var{nonterminal}@dots{}
				3875	@end example
				3876
				3877	@noindent
				3878	Here @var{nonterminal} is the name of a nonterminal symbol, and
				3879	@var{type} is the name given in the @code{%union} to the alternative
				3880	that you want (@pxref{Union Decl, ,The Collection of Value Types}). You
				3881	can give any number of nonterminal symbols in the same @code{%type}
				3882	declaration, if they have the same value type. Use spaces to separate
				3883	the symbol names.
				3884
				3885	You can also declare the value type of a terminal symbol. To do this,
				3886	use the same @code{<@var{type}>} construction in a declaration for the
				3887	terminal symbol. All kinds of token declarations allow
				3888	@code{<@var{type}>}.
				3889
				3890	@node Initial Action Decl
				3891	@subsection Performing Actions before Parsing
				3892	@findex %initial-action
				3893
				3894	Sometimes your parser needs to perform some initializations before
				3895	parsing. The @code{%initial-action} directive allows for such arbitrary
				3896	code.
				3897
				3898	@deffn {Directive} %initial-action @{ @var{code} @}
				3899	@findex %initial-action
				3900	Declare that the braced @var{code} must be invoked before parsing each time
				3901	@code{yyparse} is called. The @var{code} may use @code{$$} and
				3902	@code{@@$} --- initial value and location of the look-ahead --- and the
				3903	@code{%parse-param}.
				3904	@end deffn
				3905
				3906	For instance, if your locations use a file name, you may use
				3907
				3908	@example
				3909	%parse-param @{ char const *file_name @};
				3910	%initial-action
				3911	@{
				3912	@@$.initialize (file_name);
				3913	@};
				3914	@end example
				3915
				3916
				3917	@node Destructor Decl
				3918	@subsection Freeing Discarded Symbols
				3919	@cindex freeing discarded symbols
				3920	@findex %destructor
				3921
				3922	During error recovery (@pxref{Error Recovery}), symbols already pushed
				3923	on the stack and tokens coming from the rest of the file are discarded
				3924	until the parser falls on its feet. If the parser runs out of memory,
				3925	or if it returns via @code{YYABORT} or @code{YYACCEPT}, all the
				3926	symbols on the stack must be discarded. Even if the parser succeeds, it
				3927	must discard the start symbol.
				3928
				3929	When discarded symbols convey heap based information, this memory is
				3930	lost. While this behavior can be tolerable for batch parsers, such as
				3931	in traditional compilers, it is unacceptable for programs like shells or
				3932	protocol implementations that may parse and execute indefinitely.
				3933
				3934	The @code{%destructor} directive defines code that is called when a
				3935	symbol is automatically discarded.
				3936
				3937	@deffn {Directive} %destructor @{ @var{code} @} @var{symbols}
				3938	@findex %destructor
				3939	Invoke the braced @var{code} whenever the parser discards one of the
				3940	@var{symbols}.
				3941	Within @var{code}, @code{$$} designates the semantic value associated
				3942	with the discarded symbol. The additional parser parameters are also
				3943	available (@pxref{Parser Function, , The Parser Function
				3944	@code{yyparse}}).
				3945	@end deffn
				3946
				3947	For instance:
				3948
				3949	@smallexample
				3950	%union
				3951	@{
				3952	char *string;
				3953	@}
				3954	%token <string> STRING
				3955	%type <string> string
				3956	%destructor @{ free ($$); @} STRING string
				3957	@end smallexample
				3958
				3959	@noindent
				3960	guarantees that when a @code{STRING} or a @code{string} is discarded,
				3961	its associated memory will be freed.
				3962
				3963	@sp 1
				3964
				3965	@cindex discarded symbols
				3966	@dfn{Discarded symbols} are the following:
				3967
				3968	@itemize
				3969	@item
				3970	stacked symbols popped during the first phase of error recovery,
				3971	@item
				3972	incoming terminals during the second phase of error recovery,
				3973	@item
				3974	the current look-ahead and the entire stack (except the current
				3975	right-hand side symbols) when the parser returns immediately, and
				3976	@item
				3977	the start symbol, when the parser succeeds.
				3978	@end itemize
				3979
				3980	The parser can @dfn{return immediately} because of an explicit call to
				3981	@code{YYABORT} or @code{YYACCEPT}, or failed error recovery, or memory
				3982	exhaustion.
				3983
				3984	Right-hand size symbols of a rule that explicitly triggers a syntax
				3985	error via @code{YYERROR} are not discarded automatically. As a rule
				3986	of thumb, destructors are invoked only when user actions cannot manage
				3987	the memory.
				3988
				3989	@node Expect Decl
				3990	@subsection Suppressing Conflict Warnings
				3991	@cindex suppressing conflict warnings
				3992	@cindex preventing warnings about conflicts
				3993	@cindex warnings, preventing
				3994	@cindex conflicts, suppressing warnings of
				3995	@findex %expect
				3996	@findex %expect-rr
				3997
				3998	Bison normally warns if there are any conflicts in the grammar
				3999	(@pxref{Shift/Reduce, ,Shift/Reduce Conflicts}), but most real grammars
				4000	have harmless shift/reduce conflicts which are resolved in a predictable
				4001	way and would be difficult to eliminate. It is desirable to suppress
				4002	the warning about these conflicts unless the number of conflicts
				4003	changes. You can do this with the @code{%expect} declaration.
				4004
				4005	The declaration looks like this:
				4006
				4007	@example
				4008	%expect @var{n}
				4009	@end example
				4010
				4011	Here @var{n} is a decimal integer. The declaration says there should
				4012	be @var{n} shift/reduce conflicts and no reduce/reduce conflicts.
				4013	Bison reports an error if the number of shift/reduce conflicts differs
				4014	from @var{n}, or if there are any reduce/reduce conflicts.
				4015
				4016	For normal @acronym{LALR}(1) parsers, reduce/reduce conflicts are more
				4017	serious, and should be eliminated entirely. Bison will always report
				4018	reduce/reduce conflicts for these parsers. With @acronym{GLR}
				4019	parsers, however, both kinds of conflicts are routine; otherwise,
				4020	there would be no need to use @acronym{GLR} parsing. Therefore, it is
				4021	also possible to specify an expected number of reduce/reduce conflicts
				4022	in @acronym{GLR} parsers, using the declaration:
				4023
				4024	@example
				4025	%expect-rr @var{n}
				4026	@end example
				4027
				4028	In general, using @code{%expect} involves these steps:
				4029
				4030	@itemize @bullet
				4031	@item
				4032	Compile your grammar without @code{%expect}. Use the @samp{-v} option
				4033	to get a verbose list of where the conflicts occur. Bison will also
				4034	print the number of conflicts.
				4035
				4036	@item
				4037	Check each of the conflicts to make sure that Bison's default
				4038	resolution is what you really want. If not, rewrite the grammar and
				4039	go back to the beginning.
				4040
				4041	@item
				4042	Add an @code{%expect} declaration, copying the number @var{n} from the
				4043	number which Bison printed. With @acronym{GLR} parsers, add an
				4044	@code{%expect-rr} declaration as well.
				4045	@end itemize
				4046
				4047	Now Bison will warn you if you introduce an unexpected conflict, but
				4048	will keep silent otherwise.
				4049
				4050	@node Start Decl
				4051	@subsection The Start-Symbol
				4052	@cindex declaring the start symbol
				4053	@cindex start symbol, declaring
				4054	@cindex default start symbol
				4055	@findex %start
				4056
				4057	Bison assumes by default that the start symbol for the grammar is the first
				4058	nonterminal specified in the grammar specification section. The programmer
				4059	may override this restriction with the @code{%start} declaration as follows:
				4060
				4061	@example
				4062	%start @var{symbol}
				4063	@end example
				4064
				4065	@node Pure Decl
				4066	@subsection A Pure (Reentrant) Parser
				4067	@cindex reentrant parser
				4068	@cindex pure parser
				4069	@findex %pure-parser
				4070
				4071	A @dfn{reentrant} program is one which does not alter in the course of
				4072	execution; in other words, it consists entirely of @dfn{pure} (read-only)
				4073	code. Reentrancy is important whenever asynchronous execution is possible;
				4074	for example, a nonreentrant program may not be safe to call from a signal
				4075	handler. In systems with multiple threads of control, a nonreentrant
				4076	program must be called only within interlocks.
				4077
				4078	Normally, Bison generates a parser which is not reentrant. This is
				4079	suitable for most uses, and it permits compatibility with Yacc. (The
				4080	standard Yacc interfaces are inherently nonreentrant, because they use
				4081	statically allocated variables for communication with @code{yylex},
				4082	including @code{yylval} and @code{yylloc}.)
				4083
				4084	Alternatively, you can generate a pure, reentrant parser. The Bison
				4085	declaration @code{%pure-parser} says that you want the parser to be
				4086	reentrant. It looks like this:
				4087
				4088	@example
				4089	%pure-parser
				4090	@end example
				4091
				4092	The result is that the communication variables @code{yylval} and
				4093	@code{yylloc} become local variables in @code{yyparse}, and a different
				4094	calling convention is used for the lexical analyzer function
				4095	@code{yylex}. @xref{Pure Calling, ,Calling Conventions for Pure
				4096	Parsers}, for the details of this. The variable @code{yynerrs} also
				4097	becomes local in @code{yyparse} (@pxref{Error Reporting, ,The Error
				4098	Reporting Function @code{yyerror}}). The convention for calling
				4099	@code{yyparse} itself is unchanged.
				4100
				4101	Whether the parser is pure has nothing to do with the grammar rules.
				4102	You can generate either a pure parser or a nonreentrant parser from any
				4103	valid grammar.
				4104
				4105	@node Decl Summary
				4106	@subsection Bison Declaration Summary
				4107	@cindex Bison declaration summary
				4108	@cindex declaration summary
				4109	@cindex summary, Bison declaration
				4110
				4111	Here is a summary of the declarations used to define a grammar:
				4112
				4113	@deffn {Directive} %union
				4114	Declare the collection of data types that semantic values may have
				4115	(@pxref{Union Decl, ,The Collection of Value Types}).
				4116	@end deffn
				4117
				4118	@deffn {Directive} %token
				4119	Declare a terminal symbol (token type name) with no precedence
				4120	or associativity specified (@pxref{Token Decl, ,Token Type Names}).
				4121	@end deffn
				4122
				4123	@deffn {Directive} %right
				4124	Declare a terminal symbol (token type name) that is right-associative
				4125	(@pxref{Precedence Decl, ,Operator Precedence}).
				4126	@end deffn
				4127
				4128	@deffn {Directive} %left
				4129	Declare a terminal symbol (token type name) that is left-associative
				4130	(@pxref{Precedence Decl, ,Operator Precedence}).
				4131	@end deffn
				4132
				4133	@deffn {Directive} %nonassoc
				4134	Declare a terminal symbol (token type name) that is nonassociative
				4135	(@pxref{Precedence Decl, ,Operator Precedence}).
				4136	Using it in a way that would be associative is a syntax error.
				4137	@end deffn
				4138
				4139	@ifset defaultprec
				4140	@deffn {Directive} %default-prec
				4141	Assign a precedence to rules lacking an explicit @code{%prec} modifier
				4142	(@pxref{Contextual Precedence, ,Context-Dependent Precedence}).
				4143	@end deffn
				4144	@end ifset
				4145
				4146	@deffn {Directive} %type
				4147	Declare the type of semantic values for a nonterminal symbol
				4148	(@pxref{Type Decl, ,Nonterminal Symbols}).
				4149	@end deffn
				4150
				4151	@deffn {Directive} %start
				4152	Specify the grammar's start symbol (@pxref{Start Decl, ,The
				4153	Start-Symbol}).
				4154	@end deffn
				4155
				4156	@deffn {Directive} %expect
				4157	Declare the expected number of shift-reduce conflicts
				4158	(@pxref{Expect Decl, ,Suppressing Conflict Warnings}).
				4159	@end deffn
				4160
				4161
				4162	@sp 1
				4163	@noindent
				4164	In order to change the behavior of @command{bison}, use the following
				4165	directives:
				4166
				4167	@deffn {Directive} %debug
				4168	In the parser file, define the macro @code{YYDEBUG} to 1 if it is not
				4169	already defined, so that the debugging facilities are compiled.
				4170	@end deffn
				4171	@xref{Tracing, ,Tracing Your Parser}.
				4172
				4173	@deffn {Directive} %defines
				4174	Write a header file containing macro definitions for the token type
				4175	names defined in the grammar as well as a few other declarations.
				4176	If the parser output file is named @file{@var{name}.c} then this file
				4177	is named @file{@var{name}.h}.
				4178
				4179	Unless @code{YYSTYPE} is already defined as a macro, the output header
				4180	declares @code{YYSTYPE}. Therefore, if you are using a @code{%union}
				4181	(@pxref{Multiple Types, ,More Than One Value Type}) with components that
				4182	require other definitions, or if you have defined a @code{YYSTYPE} macro
				4183	(@pxref{Value Type, ,Data Types of Semantic Values}), you need to
				4184	arrange for these definitions to be propagated to all modules, e.g., by
				4185	putting them in a prerequisite header that is included both by your
				4186	parser and by any other module that needs @code{YYSTYPE}.
				4187
				4188	Unless your parser is pure, the output header declares @code{yylval}
				4189	as an external variable. @xref{Pure Decl, ,A Pure (Reentrant)
				4190	Parser}.
				4191
				4192	If you have also used locations, the output header declares
				4193	@code{YYLTYPE} and @code{yylloc} using a protocol similar to that of
				4194	@code{YYSTYPE} and @code{yylval}. @xref{Locations, ,Tracking
				4195	Locations}.
				4196
				4197	This output file is normally essential if you wish to put the definition
				4198	of @code{yylex} in a separate source file, because @code{yylex}
				4199	typically needs to be able to refer to the above-mentioned declarations
				4200	and to the token type codes. @xref{Token Values, ,Semantic Values of
				4201	Tokens}.
				4202	@end deffn
				4203
				4204	@deffn {Directive} %destructor
				4205	Specify how the parser should reclaim the memory associated to
				4206	discarded symbols. @xref{Destructor Decl, , Freeing Discarded Symbols}.
				4207	@end deffn
				4208
				4209	@deffn {Directive} %file-prefix="@var{prefix}"
				4210	Specify a prefix to use for all Bison output file names. The names are
				4211	chosen as if the input file were named @file{@var{prefix}.y}.
				4212	@end deffn
				4213
				4214	@deffn {Directive} %locations
				4215	Generate the code processing the locations (@pxref{Action Features,
				4216	,Special Features for Use in Actions}). This mode is enabled as soon as
				4217	the grammar uses the special @samp{@@@var{n}} tokens, but if your
				4218	grammar does not use it, using @samp{%locations} allows for more
				4219	accurate syntax error messages.
				4220	@end deffn
				4221
				4222	@deffn {Directive} %name-prefix="@var{prefix}"
				4223	Rename the external symbols used in the parser so that they start with
				4224	@var{prefix} instead of @samp{yy}. The precise list of symbols renamed
				4225	in C parsers
				4226	is @code{yyparse}, @code{yylex}, @code{yyerror}, @code{yynerrs},
				4227	@code{yylval}, @code{yychar}, @code{yydebug}, and
				4228	(if locations are used) @code{yylloc}. For example, if you use
				4229	@samp{%name-prefix="c_"}, the names become @code{c_parse}, @code{c_lex},
				4230	and so on. In C++ parsers, it is only the surrounding namespace which is
				4231	named @var{prefix} instead of @samp{yy}.
				4232	@xref{Multiple Parsers, ,Multiple Parsers in the Same Program}.
				4233	@end deffn
				4234
				4235	@ifset defaultprec
				4236	@deffn {Directive} %no-default-prec
				4237	Do not assign a precedence to rules lacking an explicit @code{%prec}
				4238	modifier (@pxref{Contextual Precedence, ,Context-Dependent
				4239	Precedence}).
				4240	@end deffn
				4241	@end ifset
				4242
				4243	@deffn {Directive} %no-parser
				4244	Do not include any C code in the parser file; generate tables only. The
				4245	parser file contains just @code{#define} directives and static variable
				4246	declarations.
				4247
				4248	This option also tells Bison to write the C code for the grammar actions
				4249	into a file named @file{@var{file}.act}, in the form of a
				4250	brace-surrounded body fit for a @code{switch} statement.
				4251	@end deffn
				4252
				4253	@deffn {Directive} %no-lines
				4254	Don't generate any @code{#line} preprocessor commands in the parser
				4255	file. Ordinarily Bison writes these commands in the parser file so that
				4256	the C compiler and debuggers will associate errors and object code with
				4257	your source file (the grammar file). This directive causes them to
				4258	associate errors with the parser file, treating it an independent source
				4259	file in its own right.
				4260	@end deffn
				4261
				4262	@deffn {Directive} %output="@var{file}"
				4263	Specify @var{file} for the parser file.
				4264	@end deffn
				4265
				4266	@deffn {Directive} %pure-parser
				4267	Request a pure (reentrant) parser program (@pxref{Pure Decl, ,A Pure
				4268	(Reentrant) Parser}).
				4269	@end deffn
				4270
				4271	@deffn {Directive} %require "@var{version}"
				4272	Require version @var{version} or higher of Bison. @xref{Require Decl, ,
				4273	Require a Version of Bison}.
				4274	@end deffn
				4275
				4276	@deffn {Directive} %token-table
				4277	Generate an array of token names in the parser file. The name of the
				4278	array is @code{yytname}; @code{yytname[@var{i}]} is the name of the
				4279	token whose internal Bison token code number is @var{i}. The first
				4280	three elements of @code{yytname} correspond to the predefined tokens
				4281	@code{"$end"},
				4282	@code{"error"}, and @code{"$undefined"}; after these come the symbols
				4283	defined in the grammar file.
				4284
				4285	The name in the table includes all the characters needed to represent
				4286	the token in Bison. For single-character literals and literal
				4287	strings, this includes the surrounding quoting characters and any
				4288	escape sequences. For example, the Bison single-character literal
				4289	@code{'+'} corresponds to a three-character name, represented in C as
				4290	@code{"'+'"}; and the Bison two-character literal string @code{"\\/"}
				4291	corresponds to a five-character name, represented in C as
				4292	@code{"\"\\\\/\""}.
				4293
				4294	When you specify @code{%token-table}, Bison also generates macro
				4295	definitions for macros @code{YYNTOKENS}, @code{YYNNTS}, and
				4296	@code{YYNRULES}, and @code{YYNSTATES}:
				4297
				4298	@table @code
				4299	@item YYNTOKENS
				4300	The highest token number, plus one.
				4301	@item YYNNTS
				4302	The number of nonterminal symbols.
				4303	@item YYNRULES
				4304	The number of grammar rules,
				4305	@item YYNSTATES
				4306	The number of parser states (@pxref{Parser States}).
				4307	@end table
				4308	@end deffn
				4309
				4310	@deffn {Directive} %verbose
				4311	Write an extra output file containing verbose descriptions of the
				4312	parser states and what is done for each type of look-ahead token in
				4313	that state. @xref{Understanding, , Understanding Your Parser}, for more
				4314	information.
				4315	@end deffn
				4316
				4317	@deffn {Directive} %yacc
				4318	Pretend the option @option{--yacc} was given, i.e., imitate Yacc,
				4319	including its naming conventions. @xref{Bison Options}, for more.
				4320	@end deffn
				4321
				4322
				4323	@node Multiple Parsers
				4324	@section Multiple Parsers in the Same Program
				4325
				4326	Most programs that use Bison parse only one language and therefore contain
				4327	only one Bison parser. But what if you want to parse more than one
				4328	language with the same program? Then you need to avoid a name conflict
				4329	between different definitions of @code{yyparse}, @code{yylval}, and so on.
				4330
				4331	The easy way to do this is to use the option @samp{-p @var{prefix}}
				4332	(@pxref{Invocation, ,Invoking Bison}). This renames the interface
				4333	functions and variables of the Bison parser to start with @var{prefix}
				4334	instead of @samp{yy}. You can use this to give each parser distinct
				4335	names that do not conflict.
				4336
				4337	The precise list of symbols renamed is @code{yyparse}, @code{yylex},
				4338	@code{yyerror}, @code{yynerrs}, @code{yylval}, @code{yylloc},
				4339	@code{yychar} and @code{yydebug}. For example, if you use @samp{-p c},
				4340	the names become @code{cparse}, @code{clex}, and so on.
				4341
				4342	@strong{All the other variables and macros associated with Bison are not
				4343	renamed.} These others are not global; there is no conflict if the same
				4344	name is used in different parsers. For example, @code{YYSTYPE} is not
				4345	renamed, but defining this in different ways in different parsers causes
				4346	no trouble (@pxref{Value Type, ,Data Types of Semantic Values}).
				4347
				4348	The @samp{-p} option works by adding macro definitions to the beginning
				4349	of the parser source file, defining @code{yyparse} as
				4350	@code{@var{prefix}parse}, and so on. This effectively substitutes one
				4351	name for the other in the entire parser file.
				4352
				4353	@node Interface
				4354	@chapter Parser C-Language Interface
				4355	@cindex C-language interface
				4356	@cindex interface
				4357
				4358	The Bison parser is actually a C function named @code{yyparse}. Here we
				4359	describe the interface conventions of @code{yyparse} and the other
				4360	functions that it needs to use.
				4361
				4362	Keep in mind that the parser uses many C identifiers starting with
				4363	@samp{yy} and @samp{YY} for internal purposes. If you use such an
				4364	identifier (aside from those in this manual) in an action or in epilogue
				4365	in the grammar file, you are likely to run into trouble.
				4366
				4367	@menu
				4368	* Parser Function:: How to call @code{yyparse} and what it returns.
				4369	* Lexical:: You must supply a function @code{yylex}
				4370	which reads tokens.
				4371	* Error Reporting:: You must supply a function @code{yyerror}.
				4372	* Action Features:: Special features for use in actions.
				4373	* Internationalization:: How to let the parser speak in the user's
				4374	native language.
				4375	@end menu
				4376
				4377	@node Parser Function
				4378	@section The Parser Function @code{yyparse}
				4379	@findex yyparse
				4380
				4381	You call the function @code{yyparse} to cause parsing to occur. This
				4382	function reads tokens, executes actions, and ultimately returns when it
				4383	encounters end-of-input or an unrecoverable syntax error. You can also
				4384	write an action which directs @code{yyparse} to return immediately
				4385	without reading further.
				4386
				4387
				4388	@deftypefun int yyparse (void)
				4389	The value returned by @code{yyparse} is 0 if parsing was successful (return
				4390	is due to end-of-input).
				4391
				4392	The value is 1 if parsing failed because of invalid input, i.e., input
				4393	that contains a syntax error or that causes @code{YYABORT} to be
				4394	invoked.
				4395
				4396	The value is 2 if parsing failed due to memory exhaustion.
				4397	@end deftypefun
				4398
				4399	In an action, you can cause immediate return from @code{yyparse} by using
				4400	these macros:
				4401
				4402	@defmac YYACCEPT
				4403	@findex YYACCEPT
				4404	Return immediately with value 0 (to report success).
				4405	@end defmac
				4406
				4407	@defmac YYABORT
				4408	@findex YYABORT
				4409	Return immediately with value 1 (to report failure).
				4410	@end defmac
				4411
				4412	If you use a reentrant parser, you can optionally pass additional
				4413	parameter information to it in a reentrant way. To do so, use the
				4414	declaration @code{%parse-param}:
				4415
				4416	@deffn {Directive} %parse-param @{@var{argument-declaration}@}
				4417	@findex %parse-param
				4418	Declare that an argument declared by the braced-code
				4419	@var{argument-declaration} is an additional @code{yyparse} argument.
				4420	The @var{argument-declaration} is used when declaring
				4421	functions or prototypes. The last identifier in
				4422	@var{argument-declaration} must be the argument name.
				4423	@end deffn
				4424
				4425	Here's an example. Write this in the parser:
				4426
				4427	@example
				4428	%parse-param @{int *nastiness@}
				4429	%parse-param @{int *randomness@}
				4430	@end example
				4431
				4432	@noindent
				4433	Then call the parser like this:
				4434
				4435	@example
				4436	@{
				4437	int nastiness, randomness;
				4438	@dots{} /* @r{Store proper data in @code{nastiness} and @code{randomness}.} */
				4439	value = yyparse (&nastiness, &randomness);
				4440	@dots{}
				4441	@}
				4442	@end example
				4443
				4444	@noindent
				4445	In the grammar actions, use expressions like this to refer to the data:
				4446
				4447	@example
				4448	exp: @dots{} @{ @dots{}; *randomness += 1; @dots{} @}
				4449	@end example
				4450
				4451
				4452	@node Lexical
				4453	@section The Lexical Analyzer Function @code{yylex}
				4454	@findex yylex
				4455	@cindex lexical analyzer
				4456
				4457	The @dfn{lexical analyzer} function, @code{yylex}, recognizes tokens from
				4458	the input stream and returns them to the parser. Bison does not create
				4459	this function automatically; you must write it so that @code{yyparse} can
				4460	call it. The function is sometimes referred to as a lexical scanner.
				4461
				4462	In simple programs, @code{yylex} is often defined at the end of the Bison
				4463	grammar file. If @code{yylex} is defined in a separate source file, you
				4464	need to arrange for the token-type macro definitions to be available there.
				4465	To do this, use the @samp{-d} option when you run Bison, so that it will
				4466	write these macro definitions into a separate header file
				4467	@file{@var{name}.tab.h} which you can include in the other source files
				4468	that need it. @xref{Invocation, ,Invoking Bison}.
				4469
				4470	@menu
				4471	* Calling Convention:: How @code{yyparse} calls @code{yylex}.
				4472	* Token Values:: How @code{yylex} must return the semantic value
				4473	of the token it has read.
				4474	* Token Locations:: How @code{yylex} must return the text location
				4475	(line number, etc.) of the token, if the
				4476	actions want that.
				4477	* Pure Calling:: How the calling convention differs
				4478	in a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}).
				4479	@end menu
				4480
				4481	@node Calling Convention
				4482	@subsection Calling Convention for @code{yylex}
				4483
				4484	The value that @code{yylex} returns must be the positive numeric code
				4485	for the type of token it has just found; a zero or negative value
				4486	signifies end-of-input.
				4487
				4488	When a token is referred to in the grammar rules by a name, that name
				4489	in the parser file becomes a C macro whose definition is the proper
				4490	numeric code for that token type. So @code{yylex} can use the name
				4491	to indicate that type. @xref{Symbols}.
				4492
				4493	When a token is referred to in the grammar rules by a character literal,
				4494	the numeric code for that character is also the code for the token type.
				4495	So @code{yylex} can simply return that character code, possibly converted
				4496	to @code{unsigned char} to avoid sign-extension. The null character
				4497	must not be used this way, because its code is zero and that
				4498	signifies end-of-input.
				4499
				4500	Here is an example showing these things:
				4501
				4502	@example
				4503	int
				4504	yylex (void)
				4505	@{
				4506	@dots{}
				4507	if (c == EOF) /* Detect end-of-input. */
				4508	return 0;
				4509	@dots{}
				4510	if (c == '+' \|\| c == '-')
				4511	return c; /* Assume token type for `+' is '+'. */
				4512	@dots{}
				4513	return INT; /* Return the type of the token. */
				4514	@dots{}
				4515	@}
				4516	@end example
				4517
				4518	@noindent
				4519	This interface has been designed so that the output from the @code{lex}
				4520	utility can be used without change as the definition of @code{yylex}.
				4521
				4522	If the grammar uses literal string tokens, there are two ways that
				4523	@code{yylex} can determine the token type codes for them:
				4524
				4525	@itemize @bullet
				4526	@item
				4527	If the grammar defines symbolic token names as aliases for the
				4528	literal string tokens, @code{yylex} can use these symbolic names like
				4529	all others. In this case, the use of the literal string tokens in
				4530	the grammar file has no effect on @code{yylex}.
				4531
				4532	@item
				4533	@code{yylex} can find the multicharacter token in the @code{yytname}
				4534	table. The index of the token in the table is the token type's code.
				4535	The name of a multicharacter token is recorded in @code{yytname} with a
				4536	double-quote, the token's characters, and another double-quote. The
				4537	token's characters are escaped as necessary to be suitable as input
				4538	to Bison.
				4539
				4540	Here's code for looking up a multicharacter token in @code{yytname},
				4541	assuming that the characters of the token are stored in
				4542	@code{token_buffer}, and assuming that the token does not contain any
				4543	characters like @samp{"} that require escaping.
				4544
				4545	@smallexample
				4546	for (i = 0; i < YYNTOKENS; i++)
				4547	@{
				4548	if (yytname[i] != 0
				4549	&& yytname[i][0] == '"'
				4550	&& ! strncmp (yytname[i] + 1, token_buffer,
				4551	strlen (token_buffer))
				4552	&& yytname[i][strlen (token_buffer) + 1] == '"'
				4553	&& yytname[i][strlen (token_buffer) + 2] == 0)
				4554	break;
				4555	@}
				4556	@end smallexample
				4557
				4558	The @code{yytname} table is generated only if you use the
				4559	@code{%token-table} declaration. @xref{Decl Summary}.
				4560	@end itemize
				4561
				4562	@node Token Values
				4563	@subsection Semantic Values of Tokens
				4564
				4565	@vindex yylval
				4566	In an ordinary (nonreentrant) parser, the semantic value of the token must
				4567	be stored into the global variable @code{yylval}. When you are using
				4568	just one data type for semantic values, @code{yylval} has that type.
				4569	Thus, if the type is @code{int} (the default), you might write this in
				4570	@code{yylex}:
				4571
				4572	@example
				4573	@group
				4574	@dots{}
				4575	yylval = value; /* Put value onto Bison stack. */
				4576	return INT; /* Return the type of the token. */
				4577	@dots{}
				4578	@end group
				4579	@end example
				4580
				4581	When you are using multiple data types, @code{yylval}'s type is a union
				4582	made from the @code{%union} declaration (@pxref{Union Decl, ,The
				4583	Collection of Value Types}). So when you store a token's value, you
				4584	must use the proper member of the union. If the @code{%union}
				4585	declaration looks like this:
				4586
				4587	@example
				4588	@group
				4589	%union @{
				4590	int intval;
				4591	double val;
				4592	symrec *tptr;
				4593	@}
				4594	@end group
				4595	@end example
				4596
				4597	@noindent
				4598	then the code in @code{yylex} might look like this:
				4599
				4600	@example
				4601	@group
				4602	@dots{}
				4603	yylval.intval = value; /* Put value onto Bison stack. */
				4604	return INT; /* Return the type of the token. */
				4605	@dots{}
				4606	@end group
				4607	@end example
				4608
				4609	@node Token Locations
				4610	@subsection Textual Locations of Tokens
				4611
				4612	@vindex yylloc
				4613	If you are using the @samp{@@@var{n}}-feature (@pxref{Locations, ,
				4614	Tracking Locations}) in actions to keep track of the textual locations
				4615	of tokens and groupings, then you must provide this information in
				4616	@code{yylex}. The function @code{yyparse} expects to find the textual
				4617	location of a token just parsed in the global variable @code{yylloc}.
				4618	So @code{yylex} must store the proper data in that variable.
				4619
				4620	By default, the value of @code{yylloc} is a structure and you need only
				4621	initialize the members that are going to be used by the actions. The
				4622	four members are called @code{first_line}, @code{first_column},
				4623	@code{last_line} and @code{last_column}. Note that the use of this
				4624	feature makes the parser noticeably slower.
				4625
				4626	@tindex YYLTYPE
				4627	The data type of @code{yylloc} has the name @code{YYLTYPE}.
				4628
				4629	@node Pure Calling
				4630	@subsection Calling Conventions for Pure Parsers
				4631
				4632	When you use the Bison declaration @code{%pure-parser} to request a
				4633	pure, reentrant parser, the global communication variables @code{yylval}
				4634	and @code{yylloc} cannot be used. (@xref{Pure Decl, ,A Pure (Reentrant)
				4635	Parser}.) In such parsers the two global variables are replaced by
				4636	pointers passed as arguments to @code{yylex}. You must declare them as
				4637	shown here, and pass the information back by storing it through those
				4638	pointers.
				4639
				4640	@example
				4641	int
				4642	yylex (YYSTYPE lvalp, YYLTYPE llocp)
				4643	@{
				4644	@dots{}
				4645	lvalp = value; / Put value onto Bison stack. */
				4646	return INT; /* Return the type of the token. */
				4647	@dots{}
				4648	@}
				4649	@end example
				4650
				4651	If the grammar file does not use the @samp{@@} constructs to refer to
				4652	textual locations, then the type @code{YYLTYPE} will not be defined. In
				4653	this case, omit the second argument; @code{yylex} will be called with
				4654	only one argument.
				4655
				4656
				4657	If you wish to pass the additional parameter data to @code{yylex}, use
				4658	@code{%lex-param} just like @code{%parse-param} (@pxref{Parser
				4659	Function}).
				4660
				4661	@deffn {Directive} lex-param @{@var{argument-declaration}@}
				4662	@findex %lex-param
				4663	Declare that the braced-code @var{argument-declaration} is an
				4664	additional @code{yylex} argument declaration.
				4665	@end deffn
				4666
				4667	For instance:
				4668
				4669	@example
				4670	%parse-param @{int *nastiness@}
				4671	%lex-param @{int *nastiness@}
				4672	%parse-param @{int *randomness@}
				4673	@end example
				4674
				4675	@noindent
				4676	results in the following signature:
				4677
				4678	@example
				4679	int yylex (int *nastiness);
				4680	int yyparse (int nastiness, int randomness);
				4681	@end example
				4682
				4683	If @code{%pure-parser} is added:
				4684
				4685	@example
				4686	int yylex (YYSTYPE lvalp, int nastiness);
				4687	int yyparse (int nastiness, int randomness);
				4688	@end example
				4689
				4690	@noindent
				4691	and finally, if both @code{%pure-parser} and @code{%locations} are used:
				4692
				4693	@example
				4694	int yylex (YYSTYPE lvalp, YYLTYPE llocp, int *nastiness);
				4695	int yyparse (int nastiness, int randomness);
				4696	@end example
				4697
				4698	@node Error Reporting
				4699	@section The Error Reporting Function @code{yyerror}
				4700	@cindex error reporting function
				4701	@findex yyerror
				4702	@cindex parse error
				4703	@cindex syntax error
				4704
				4705	The Bison parser detects a @dfn{syntax error} or @dfn{parse error}
				4706	whenever it reads a token which cannot satisfy any syntax rule. An
				4707	action in the grammar can also explicitly proclaim an error, using the
				4708	macro @code{YYERROR} (@pxref{Action Features, ,Special Features for Use
				4709	in Actions}).
				4710
				4711	The Bison parser expects to report the error by calling an error
				4712	reporting function named @code{yyerror}, which you must supply. It is
				4713	called by @code{yyparse} whenever a syntax error is found, and it
				4714	receives one argument. For a syntax error, the string is normally
				4715	@w{@code{"syntax error"}}.
				4716
				4717	@findex %error-verbose
				4718	If you invoke the directive @code{%error-verbose} in the Bison
				4719	declarations section (@pxref{Bison Declarations, ,The Bison Declarations
				4720	Section}), then Bison provides a more verbose and specific error message
				4721	string instead of just plain @w{@code{"syntax error"}}.
				4722
				4723	The parser can detect one other kind of error: memory exhaustion. This
				4724	can happen when the input contains constructions that are very deeply
				4725	nested. It isn't likely you will encounter this, since the Bison
				4726	parser normally extends its stack automatically up to a very large limit. But
				4727	if memory is exhausted, @code{yyparse} calls @code{yyerror} in the usual
				4728	fashion, except that the argument string is @w{@code{"memory exhausted"}}.
				4729
				4730	In some cases diagnostics like @w{@code{"syntax error"}} are
				4731	translated automatically from English to some other language before
				4732	they are passed to @code{yyerror}. @xref{Internationalization}.
				4733
				4734	The following definition suffices in simple programs:
				4735
				4736	@example
				4737	@group
				4738	void
				4739	yyerror (char const *s)
				4740	@{
				4741	@end group
				4742	@group
				4743	fprintf (stderr, "%s\n", s);
				4744	@}
				4745	@end group
				4746	@end example
				4747
				4748	After @code{yyerror} returns to @code{yyparse}, the latter will attempt
				4749	error recovery if you have written suitable error recovery grammar rules
				4750	(@pxref{Error Recovery}). If recovery is impossible, @code{yyparse} will
				4751	immediately return 1.
				4752
				4753	Obviously, in location tracking pure parsers, @code{yyerror} should have
				4754	an access to the current location.
				4755	This is indeed the case for the @acronym{GLR}
				4756	parsers, but not for the Yacc parser, for historical reasons. I.e., if
				4757	@samp{%locations %pure-parser} is passed then the prototypes for
				4758	@code{yyerror} are:
				4759
				4760	@example
				4761	void yyerror (char const msg); / Yacc parsers. */
				4762	void yyerror (YYLTYPE locp, char const msg); /* GLR parsers. */
				4763	@end example
				4764
				4765	If @samp{%parse-param @{int *nastiness@}} is used, then:
				4766
				4767	@example
				4768	void yyerror (int nastiness, char const msg); /* Yacc parsers. */
				4769	void yyerror (int nastiness, char const msg); /* GLR parsers. */
				4770	@end example
				4771
				4772	Finally, @acronym{GLR} and Yacc parsers share the same @code{yyerror} calling
				4773	convention for absolutely pure parsers, i.e., when the calling
				4774	convention of @code{yylex} @emph{and} the calling convention of
				4775	@code{%pure-parser} are pure. I.e.:
				4776
				4777	@example
				4778	/* Location tracking. */
				4779	%locations
				4780	/* Pure yylex. */
				4781	%pure-parser
				4782	%lex-param @{int *nastiness@}
				4783	/* Pure yyparse. */
				4784	%parse-param @{int *nastiness@}
				4785	%parse-param @{int *randomness@}
				4786	@end example
				4787
				4788	@noindent
				4789	results in the following signatures for all the parser kinds:
				4790
				4791	@example
				4792	int yylex (YYSTYPE lvalp, YYLTYPE llocp, int *nastiness);
				4793	int yyparse (int nastiness, int randomness);
				4794	void yyerror (YYLTYPE *locp,
				4795	int nastiness, int randomness,
				4796	char const *msg);
				4797	@end example
				4798
				4799	@noindent
				4800	The prototypes are only indications of how the code produced by Bison
				4801	uses @code{yyerror}. Bison-generated code always ignores the returned
				4802	value, so @code{yyerror} can return any type, including @code{void}.
				4803	Also, @code{yyerror} can be a variadic function; that is why the
				4804	message is always passed last.
				4805
				4806	Traditionally @code{yyerror} returns an @code{int} that is always
				4807	ignored, but this is purely for historical reasons, and @code{void} is
				4808	preferable since it more accurately describes the return type for
				4809	@code{yyerror}.
				4810
				4811	@vindex yynerrs
				4812	The variable @code{yynerrs} contains the number of syntax errors
				4813	reported so far. Normally this variable is global; but if you
				4814	request a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser})
				4815	then it is a local variable which only the actions can access.
				4816
				4817	@node Action Features
				4818	@section Special Features for Use in Actions
				4819	@cindex summary, action features
				4820	@cindex action features summary
				4821
				4822	Here is a table of Bison constructs, variables and macros that
				4823	are useful in actions.
				4824
				4825	@deffn {Variable} $$
				4826	Acts like a variable that contains the semantic value for the
				4827	grouping made by the current rule. @xref{Actions}.
				4828	@end deffn
				4829
				4830	@deffn {Variable} $@var{n}
				4831	Acts like a variable that contains the semantic value for the
				4832	@var{n}th component of the current rule. @xref{Actions}.
				4833	@end deffn
				4834
				4835	@deffn {Variable} $<@var{typealt}>$
				4836	Like @code{$$} but specifies alternative @var{typealt} in the union
				4837	specified by the @code{%union} declaration. @xref{Action Types, ,Data
				4838	Types of Values in Actions}.
				4839	@end deffn
				4840
				4841	@deffn {Variable} $<@var{typealt}>@var{n}
				4842	Like @code{$@var{n}} but specifies alternative @var{typealt} in the
				4843	union specified by the @code{%union} declaration.
				4844	@xref{Action Types, ,Data Types of Values in Actions}.
				4845	@end deffn
				4846
				4847	@deffn {Macro} YYABORT;
				4848	Return immediately from @code{yyparse}, indicating failure.
				4849	@xref{Parser Function, ,The Parser Function @code{yyparse}}.
				4850	@end deffn
				4851
				4852	@deffn {Macro} YYACCEPT;
				4853	Return immediately from @code{yyparse}, indicating success.
				4854	@xref{Parser Function, ,The Parser Function @code{yyparse}}.
				4855	@end deffn
				4856
				4857	@deffn {Macro} YYBACKUP (@var{token}, @var{value});
				4858	@findex YYBACKUP
				4859	Unshift a token. This macro is allowed only for rules that reduce
				4860	a single value, and only when there is no look-ahead token.
				4861	It is also disallowed in @acronym{GLR} parsers.
				4862	It installs a look-ahead token with token type @var{token} and
				4863	semantic value @var{value}; then it discards the value that was
				4864	going to be reduced by this rule.
				4865
				4866	If the macro is used when it is not valid, such as when there is
				4867	a look-ahead token already, then it reports a syntax error with
				4868	a message @samp{cannot back up} and performs ordinary error
				4869	recovery.
				4870
				4871	In either case, the rest of the action is not executed.
				4872	@end deffn
				4873
				4874	@deffn {Macro} YYEMPTY
				4875	@vindex YYEMPTY
				4876	Value stored in @code{yychar} when there is no look-ahead token.
				4877	@end deffn
				4878
				4879	@deffn {Macro} YYEOF
				4880	@vindex YYEOF
				4881	Value stored in @code{yychar} when the look-ahead is the end of the input
				4882	stream.
				4883	@end deffn
				4884
				4885	@deffn {Macro} YYERROR;
				4886	@findex YYERROR
				4887	Cause an immediate syntax error. This statement initiates error
				4888	recovery just as if the parser itself had detected an error; however, it
				4889	does not call @code{yyerror}, and does not print any message. If you
				4890	want to print an error message, call @code{yyerror} explicitly before
				4891	the @samp{YYERROR;} statement. @xref{Error Recovery}.
				4892	@end deffn
				4893
				4894	@deffn {Macro} YYRECOVERING
				4895	@findex YYRECOVERING
				4896	The expression @code{YYRECOVERING ()} yields 1 when the parser
				4897	is recovering from a syntax error, and 0 otherwise.
				4898	@xref{Error Recovery}.
				4899	@end deffn
				4900
				4901	@deffn {Variable} yychar
				4902	Variable containing either the look-ahead token, or @code{YYEOF} when the
				4903	look-ahead is the end of the input stream, or @code{YYEMPTY} when no look-ahead
				4904	has been performed so the next token is not yet known.
				4905	Do not modify @code{yychar} in a deferred semantic action (@pxref{GLR Semantic
				4906	Actions}).
				4907	@xref{Look-Ahead, ,Look-Ahead Tokens}.
				4908	@end deffn
				4909
				4910	@deffn {Macro} yyclearin;
				4911	Discard the current look-ahead token. This is useful primarily in
				4912	error rules.
				4913	Do not invoke @code{yyclearin} in a deferred semantic action (@pxref{GLR
				4914	Semantic Actions}).
				4915	@xref{Error Recovery}.
				4916	@end deffn
				4917
				4918	@deffn {Macro} yyerrok;
				4919	Resume generating error messages immediately for subsequent syntax
				4920	errors. This is useful primarily in error rules.
				4921	@xref{Error Recovery}.
				4922	@end deffn
				4923
				4924	@deffn {Variable} yylloc
				4925	Variable containing the look-ahead token location when @code{yychar} is not set
				4926	to @code{YYEMPTY} or @code{YYEOF}.
				4927	Do not modify @code{yylloc} in a deferred semantic action (@pxref{GLR Semantic
				4928	Actions}).
				4929	@xref{Actions and Locations, ,Actions and Locations}.
				4930	@end deffn
				4931
				4932	@deffn {Variable} yylval
				4933	Variable containing the look-ahead token semantic value when @code{yychar} is
				4934	not set to @code{YYEMPTY} or @code{YYEOF}.
				4935	Do not modify @code{yylval} in a deferred semantic action (@pxref{GLR Semantic
				4936	Actions}).
				4937	@xref{Actions, ,Actions}.
				4938	@end deffn
				4939
				4940	@deffn {Value} @@$
				4941	@findex @@$
				4942	Acts like a structure variable containing information on the textual location
				4943	of the grouping made by the current rule. @xref{Locations, ,
				4944	Tracking Locations}.
				4945
				4946	@c Check if those paragraphs are still useful or not.
				4947
				4948	@c @example
				4949	@c struct @{
				4950	@c int first_line, last_line;
				4951	@c int first_column, last_column;
				4952	@c @};
				4953	@c @end example
				4954
				4955	@c Thus, to get the starting line number of the third component, you would
				4956	@c use @samp{@@3.first_line}.
				4957
				4958	@c In order for the members of this structure to contain valid information,
				4959	@c you must make @code{yylex} supply this information about each token.
				4960	@c If you need only certain members, then @code{yylex} need only fill in
				4961	@c those members.
				4962
				4963	@c The use of this feature makes the parser noticeably slower.
				4964	@end deffn
				4965
				4966	@deffn {Value} @@@var{n}
				4967	@findex @@@var{n}
				4968	Acts like a structure variable containing information on the textual location
				4969	of the @var{n}th component of the current rule. @xref{Locations, ,
				4970	Tracking Locations}.
				4971	@end deffn
				4972
				4973	@node Internationalization
				4974	@section Parser Internationalization
				4975	@cindex internationalization
				4976	@cindex i18n
				4977	@cindex NLS
				4978	@cindex gettext
				4979	@cindex bison-po
				4980
				4981	A Bison-generated parser can print diagnostics, including error and
				4982	tracing messages. By default, they appear in English. However, Bison
				4983	also supports outputting diagnostics in the user's native language. To
				4984	make this work, the user should set the usual environment variables.
				4985	@xref{Users, , The User's View, gettext, GNU @code{gettext} utilities}.
				4986	For example, the shell command @samp{export LC_ALL=fr_CA.UTF-8} might
				4987	set the user's locale to French Canadian using the @acronym{UTF}-8
				4988	encoding. The exact set of available locales depends on the user's
				4989	installation.
				4990
				4991	The maintainer of a package that uses a Bison-generated parser enables
				4992	the internationalization of the parser's output through the following
				4993	steps. Here we assume a package that uses @acronym{GNU} Autoconf and
				4994	@acronym{GNU} Automake.
				4995
				4996	@enumerate
				4997	@item
				4998	@cindex bison-i18n.m4
				4999	Into the directory containing the @acronym{GNU} Autoconf macros used
				5000	by the package---often called @file{m4}---copy the
				5001	@file{bison-i18n.m4} file installed by Bison under
				5002	@samp{share/aclocal/bison-i18n.m4} in Bison's installation directory.
				5003	For example:
				5004
				5005	@example
				5006	cp /usr/local/share/aclocal/bison-i18n.m4 m4/bison-i18n.m4
				5007	@end example
				5008
				5009	@item
				5010	@findex BISON_I18N
				5011	@vindex BISON_LOCALEDIR
				5012	@vindex YYENABLE_NLS
				5013	In the top-level @file{configure.ac}, after the @code{AM_GNU_GETTEXT}
				5014	invocation, add an invocation of @code{BISON_I18N}. This macro is
				5015	defined in the file @file{bison-i18n.m4} that you copied earlier. It
				5016	causes @samp{configure} to find the value of the
				5017	@code{BISON_LOCALEDIR} variable, and it defines the source-language
				5018	symbol @code{YYENABLE_NLS} to enable translations in the
				5019	Bison-generated parser.
				5020
				5021	@item
				5022	In the @code{main} function of your program, designate the directory
				5023	containing Bison's runtime message catalog, through a call to
				5024	@samp{bindtextdomain} with domain name @samp{bison-runtime}.
				5025	For example:
				5026
				5027	@example
				5028	bindtextdomain ("bison-runtime", BISON_LOCALEDIR);
				5029	@end example
				5030
				5031	Typically this appears after any other call @code{bindtextdomain
				5032	(PACKAGE, LOCALEDIR)} that your package already has. Here we rely on
				5033	@samp{BISON_LOCALEDIR} to be defined as a string through the
				5034	@file{Makefile}.
				5035
				5036	@item
				5037	In the @file{Makefile.am} that controls the compilation of the @code{main}
				5038	function, make @samp{BISON_LOCALEDIR} available as a C preprocessor macro,
				5039	either in @samp{DEFS} or in @samp{AM_CPPFLAGS}. For example:
				5040
				5041	@example
				5042	DEFS = @@DEFS@@ -DBISON_LOCALEDIR='"$(BISON_LOCALEDIR)"'
				5043	@end example
				5044
				5045	or:
				5046
				5047	@example
				5048	AM_CPPFLAGS = -DBISON_LOCALEDIR='"$(BISON_LOCALEDIR)"'
				5049	@end example
				5050
				5051	@item
				5052	Finally, invoke the command @command{autoreconf} to generate the build
				5053	infrastructure.
				5054	@end enumerate
				5055
				5056
				5057	@node Algorithm
				5058	@chapter The Bison Parser Algorithm
				5059	@cindex Bison parser algorithm
				5060	@cindex algorithm of parser
				5061	@cindex shifting
				5062	@cindex reduction
				5063	@cindex parser stack
				5064	@cindex stack, parser
				5065
				5066	As Bison reads tokens, it pushes them onto a stack along with their
				5067	semantic values. The stack is called the @dfn{parser stack}. Pushing a
				5068	token is traditionally called @dfn{shifting}.
				5069
				5070	For example, suppose the infix calculator has read @samp{1 + 5 *}, with a
				5071	@samp{3} to come. The stack will have four elements, one for each token
				5072	that was shifted.
				5073
				5074	But the stack does not always have an element for each token read. When
				5075	the last @var{n} tokens and groupings shifted match the components of a
				5076	grammar rule, they can be combined according to that rule. This is called
				5077	@dfn{reduction}. Those tokens and groupings are replaced on the stack by a
				5078	single grouping whose symbol is the result (left hand side) of that rule.
				5079	Running the rule's action is part of the process of reduction, because this
				5080	is what computes the semantic value of the resulting grouping.
				5081
				5082	For example, if the infix calculator's parser stack contains this:
				5083
				5084	@example
				5085	1 + 5 * 3
				5086	@end example
				5087
				5088	@noindent
				5089	and the next input token is a newline character, then the last three
				5090	elements can be reduced to 15 via the rule:
				5091
				5092	@example
				5093	expr: expr '*' expr;
				5094	@end example
				5095
				5096	@noindent
				5097	Then the stack contains just these three elements:
				5098
				5099	@example
				5100	1 + 15
				5101	@end example
				5102
				5103	@noindent
				5104	At this point, another reduction can be made, resulting in the single value
				5105	16. Then the newline token can be shifted.
				5106
				5107	The parser tries, by shifts and reductions, to reduce the entire input down
				5108	to a single grouping whose symbol is the grammar's start-symbol
				5109	(@pxref{Language and Grammar, ,Languages and Context-Free Grammars}).
				5110
				5111	This kind of parser is known in the literature as a bottom-up parser.
				5112
				5113	@menu
				5114	* Look-Ahead:: Parser looks one token ahead when deciding what to do.
				5115	* Shift/Reduce:: Conflicts: when either shifting or reduction is valid.
				5116	* Precedence:: Operator precedence works by resolving conflicts.
				5117	* Contextual Precedence:: When an operator's precedence depends on context.
				5118	* Parser States:: The parser is a finite-state-machine with stack.
				5119	* Reduce/Reduce:: When two rules are applicable in the same situation.
				5120	* Mystery Conflicts:: Reduce/reduce conflicts that look unjustified.
				5121	* Generalized LR Parsing:: Parsing arbitrary context-free grammars.
				5122	* Memory Management:: What happens when memory is exhausted. How to avoid it.
				5123	@end menu
				5124
				5125	@node Look-Ahead
				5126	@section Look-Ahead Tokens
				5127	@cindex look-ahead token
				5128
				5129	The Bison parser does @emph{not} always reduce immediately as soon as the
				5130	last @var{n} tokens and groupings match a rule. This is because such a
				5131	simple strategy is inadequate to handle most languages. Instead, when a
				5132	reduction is possible, the parser sometimes ``looks ahead'' at the next
				5133	token in order to decide what to do.
				5134
				5135	When a token is read, it is not immediately shifted; first it becomes the
				5136	@dfn{look-ahead token}, which is not on the stack. Now the parser can
				5137	perform one or more reductions of tokens and groupings on the stack, while
				5138	the look-ahead token remains off to the side. When no more reductions
				5139	should take place, the look-ahead token is shifted onto the stack. This
				5140	does not mean that all possible reductions have been done; depending on the
				5141	token type of the look-ahead token, some rules may choose to delay their
				5142	application.
				5143
				5144	Here is a simple case where look-ahead is needed. These three rules define
				5145	expressions which contain binary addition operators and postfix unary
				5146	factorial operators (@samp{!}), and allow parentheses for grouping.
				5147
				5148	@example
				5149	@group
				5150	expr: term '+' expr
				5151	\| term
				5152	;
				5153	@end group
				5154
				5155	@group
				5156	term: '(' expr ')'
				5157	\| term '!'
				5158	\| NUMBER
				5159	;
				5160	@end group
				5161	@end example
				5162
				5163	Suppose that the tokens @w{@samp{1 + 2}} have been read and shifted; what
				5164	should be done? If the following token is @samp{)}, then the first three
				5165	tokens must be reduced to form an @code{expr}. This is the only valid
				5166	course, because shifting the @samp{)} would produce a sequence of symbols
				5167	@w{@code{term ')'}}, and no rule allows this.
				5168
				5169	If the following token is @samp{!}, then it must be shifted immediately so
				5170	that @w{@samp{2 !}} can be reduced to make a @code{term}. If instead the
				5171	parser were to reduce before shifting, @w{@samp{1 + 2}} would become an
				5172	@code{expr}. It would then be impossible to shift the @samp{!} because
				5173	doing so would produce on the stack the sequence of symbols @code{expr
				5174	'!'}. No rule allows that sequence.
				5175
				5176	@vindex yychar
				5177	@vindex yylval
				5178	@vindex yylloc
				5179	The look-ahead token is stored in the variable @code{yychar}.
				5180	Its semantic value and location, if any, are stored in the variables
				5181	@code{yylval} and @code{yylloc}.
				5182	@xref{Action Features, ,Special Features for Use in Actions}.
				5183
				5184	@node Shift/Reduce
				5185	@section Shift/Reduce Conflicts
				5186	@cindex conflicts
				5187	@cindex shift/reduce conflicts
				5188	@cindex dangling @code{else}
				5189	@cindex @code{else}, dangling
				5190
				5191	Suppose we are parsing a language which has if-then and if-then-else
				5192	statements, with a pair of rules like this:
				5193
				5194	@example
				5195	@group
				5196	if_stmt:
				5197	IF expr THEN stmt
				5198	\| IF expr THEN stmt ELSE stmt
				5199	;
				5200	@end group
				5201	@end example
				5202
				5203	@noindent
				5204	Here we assume that @code{IF}, @code{THEN} and @code{ELSE} are
				5205	terminal symbols for specific keyword tokens.
				5206
				5207	When the @code{ELSE} token is read and becomes the look-ahead token, the
				5208	contents of the stack (assuming the input is valid) are just right for
				5209	reduction by the first rule. But it is also legitimate to shift the
				5210	@code{ELSE}, because that would lead to eventual reduction by the second
				5211	rule.
				5212
				5213	This situation, where either a shift or a reduction would be valid, is
				5214	called a @dfn{shift/reduce conflict}. Bison is designed to resolve
				5215	these conflicts by choosing to shift, unless otherwise directed by
				5216	operator precedence declarations. To see the reason for this, let's
				5217	contrast it with the other alternative.
				5218
				5219	Since the parser prefers to shift the @code{ELSE}, the result is to attach
				5220	the else-clause to the innermost if-statement, making these two inputs
				5221	equivalent:
				5222
				5223	@example
				5224	if x then if y then win (); else lose;
				5225
				5226	if x then do; if y then win (); else lose; end;
				5227	@end example
				5228
				5229	But if the parser chose to reduce when possible rather than shift, the
				5230	result would be to attach the else-clause to the outermost if-statement,
				5231	making these two inputs equivalent:
				5232
				5233	@example
				5234	if x then if y then win (); else lose;
				5235
				5236	if x then do; if y then win (); end; else lose;
				5237	@end example
				5238
				5239	The conflict exists because the grammar as written is ambiguous: either
				5240	parsing of the simple nested if-statement is legitimate. The established
				5241	convention is that these ambiguities are resolved by attaching the
				5242	else-clause to the innermost if-statement; this is what Bison accomplishes
				5243	by choosing to shift rather than reduce. (It would ideally be cleaner to
				5244	write an unambiguous grammar, but that is very hard to do in this case.)
				5245	This particular ambiguity was first encountered in the specifications of
				5246	Algol 60 and is called the ``dangling @code{else}'' ambiguity.
				5247
				5248	To avoid warnings from Bison about predictable, legitimate shift/reduce
				5249	conflicts, use the @code{%expect @var{n}} declaration. There will be no
				5250	warning as long as the number of shift/reduce conflicts is exactly @var{n}.
				5251	@xref{Expect Decl, ,Suppressing Conflict Warnings}.
				5252
				5253	The definition of @code{if_stmt} above is solely to blame for the
				5254	conflict, but the conflict does not actually appear without additional
				5255	rules. Here is a complete Bison input file that actually manifests the
				5256	conflict:
				5257
				5258	@example
				5259	@group
				5260	%token IF THEN ELSE variable
				5261	%%
				5262	@end group
				5263	@group
				5264	stmt: expr
				5265	\| if_stmt
				5266	;
				5267	@end group
				5268
				5269	@group
				5270	if_stmt:
				5271	IF expr THEN stmt
				5272	\| IF expr THEN stmt ELSE stmt
				5273	;
				5274	@end group
				5275
				5276	expr: variable
				5277	;
				5278	@end example
				5279
				5280	@node Precedence
				5281	@section Operator Precedence
				5282	@cindex operator precedence
				5283	@cindex precedence of operators
				5284
				5285	Another situation where shift/reduce conflicts appear is in arithmetic
				5286	expressions. Here shifting is not always the preferred resolution; the
				5287	Bison declarations for operator precedence allow you to specify when to
				5288	shift and when to reduce.
				5289
				5290	@menu
				5291	* Why Precedence:: An example showing why precedence is needed.
				5292	* Using Precedence:: How to specify precedence in Bison grammars.
				5293	* Precedence Examples:: How these features are used in the previous example.
				5294	* How Precedence:: How they work.
				5295	@end menu
				5296
				5297	@node Why Precedence
				5298	@subsection When Precedence is Needed
				5299
				5300	Consider the following ambiguous grammar fragment (ambiguous because the
				5301	input @w{@samp{1 - 2 * 3}} can be parsed in two different ways):
				5302
				5303	@example
				5304	@group
				5305	expr: expr '-' expr
				5306	\| expr '*' expr
				5307	\| expr '<' expr
				5308	\| '(' expr ')'
				5309	@dots{}
				5310	;
				5311	@end group
				5312	@end example
				5313
				5314	@noindent
				5315	Suppose the parser has seen the tokens @samp{1}, @samp{-} and @samp{2};
				5316	should it reduce them via the rule for the subtraction operator? It
				5317	depends on the next token. Of course, if the next token is @samp{)}, we
				5318	must reduce; shifting is invalid because no single rule can reduce the
				5319	token sequence @w{@samp{- 2 )}} or anything starting with that. But if
				5320	the next token is @samp{*} or @samp{<}, we have a choice: either
				5321	shifting or reduction would allow the parse to complete, but with
				5322	different results.
				5323
				5324	To decide which one Bison should do, we must consider the results. If
				5325	the next operator token @var{op} is shifted, then it must be reduced
				5326	first in order to permit another opportunity to reduce the difference.
				5327	The result is (in effect) @w{@samp{1 - (2 @var{op} 3)}}. On the other
				5328	hand, if the subtraction is reduced before shifting @var{op}, the result
				5329	is @w{@samp{(1 - 2) @var{op} 3}}. Clearly, then, the choice of shift or
				5330	reduce should depend on the relative precedence of the operators
				5331	@samp{-} and @var{op}: @samp{*} should be shifted first, but not
				5332	@samp{<}.
				5333
				5334	@cindex associativity
				5335	What about input such as @w{@samp{1 - 2 - 5}}; should this be
				5336	@w{@samp{(1 - 2) - 5}} or should it be @w{@samp{1 - (2 - 5)}}? For most
				5337	operators we prefer the former, which is called @dfn{left association}.
				5338	The latter alternative, @dfn{right association}, is desirable for
				5339	assignment operators. The choice of left or right association is a
				5340	matter of whether the parser chooses to shift or reduce when the stack
				5341	contains @w{@samp{1 - 2}} and the look-ahead token is @samp{-}: shifting
				5342	makes right-associativity.
				5343
				5344	@node Using Precedence
				5345	@subsection Specifying Operator Precedence
				5346	@findex %left
				5347	@findex %right
				5348	@findex %nonassoc
				5349
				5350	Bison allows you to specify these choices with the operator precedence
				5351	declarations @code{%left} and @code{%right}. Each such declaration
				5352	contains a list of tokens, which are operators whose precedence and
				5353	associativity is being declared. The @code{%left} declaration makes all
				5354	those operators left-associative and the @code{%right} declaration makes
				5355	them right-associative. A third alternative is @code{%nonassoc}, which
				5356	declares that it is a syntax error to find the same operator twice ``in a
				5357	row''.
				5358
				5359	The relative precedence of different operators is controlled by the
				5360	order in which they are declared. The first @code{%left} or
				5361	@code{%right} declaration in the file declares the operators whose
				5362	precedence is lowest, the next such declaration declares the operators
				5363	whose precedence is a little higher, and so on.
				5364
				5365	@node Precedence Examples
				5366	@subsection Precedence Examples
				5367
				5368	In our example, we would want the following declarations:
				5369
				5370	@example
				5371	%left '<'
				5372	%left '-'
				5373	%left '*'
				5374	@end example
				5375
				5376	In a more complete example, which supports other operators as well, we
				5377	would declare them in groups of equal precedence. For example, @code{'+'} is
				5378	declared with @code{'-'}:
				5379
				5380	@example
				5381	%left '<' '>' '=' NE LE GE
				5382	%left '+' '-'
				5383	%left '*' '/'
				5384	@end example
				5385
				5386	@noindent
				5387	(Here @code{NE} and so on stand for the operators for ``not equal''
				5388	and so on. We assume that these tokens are more than one character long
				5389	and therefore are represented by names, not character literals.)
				5390
				5391	@node How Precedence
				5392	@subsection How Precedence Works
				5393
				5394	The first effect of the precedence declarations is to assign precedence
				5395	levels to the terminal symbols declared. The second effect is to assign
				5396	precedence levels to certain rules: each rule gets its precedence from
				5397	the last terminal symbol mentioned in the components. (You can also
				5398	specify explicitly the precedence of a rule. @xref{Contextual
				5399	Precedence, ,Context-Dependent Precedence}.)
				5400
				5401	Finally, the resolution of conflicts works by comparing the precedence
				5402	of the rule being considered with that of the look-ahead token. If the
				5403	token's precedence is higher, the choice is to shift. If the rule's
				5404	precedence is higher, the choice is to reduce. If they have equal
				5405	precedence, the choice is made based on the associativity of that
				5406	precedence level. The verbose output file made by @samp{-v}
				5407	(@pxref{Invocation, ,Invoking Bison}) says how each conflict was
				5408	resolved.
				5409
				5410	Not all rules and not all tokens have precedence. If either the rule or
				5411	the look-ahead token has no precedence, then the default is to shift.
				5412
				5413	@node Contextual Precedence
				5414	@section Context-Dependent Precedence
				5415	@cindex context-dependent precedence
				5416	@cindex unary operator precedence
				5417	@cindex precedence, context-dependent
				5418	@cindex precedence, unary operator
				5419	@findex %prec
				5420
				5421	Often the precedence of an operator depends on the context. This sounds
				5422	outlandish at first, but it is really very common. For example, a minus
				5423	sign typically has a very high precedence as a unary operator, and a
				5424	somewhat lower precedence (lower than multiplication) as a binary operator.
				5425
				5426	The Bison precedence declarations, @code{%left}, @code{%right} and
				5427	@code{%nonassoc}, can only be used once for a given token; so a token has
				5428	only one precedence declared in this way. For context-dependent
				5429	precedence, you need to use an additional mechanism: the @code{%prec}
				5430	modifier for rules.
				5431
				5432	The @code{%prec} modifier declares the precedence of a particular rule by
				5433	specifying a terminal symbol whose precedence should be used for that rule.
				5434	It's not necessary for that symbol to appear otherwise in the rule. The
				5435	modifier's syntax is:
				5436
				5437	@example
				5438	%prec @var{terminal-symbol}
				5439	@end example
				5440
				5441	@noindent
				5442	and it is written after the components of the rule. Its effect is to
				5443	assign the rule the precedence of @var{terminal-symbol}, overriding
				5444	the precedence that would be deduced for it in the ordinary way. The
				5445	altered rule precedence then affects how conflicts involving that rule
				5446	are resolved (@pxref{Precedence, ,Operator Precedence}).
				5447
				5448	Here is how @code{%prec} solves the problem of unary minus. First, declare
				5449	a precedence for a fictitious terminal symbol named @code{UMINUS}. There
				5450	are no tokens of this type, but the symbol serves to stand for its
				5451	precedence:
				5452
				5453	@example
				5454	@dots{}
				5455	%left '+' '-'
				5456	%left '*'
				5457	%left UMINUS
				5458	@end example
				5459
				5460	Now the precedence of @code{UMINUS} can be used in specific rules:
				5461
				5462	@example
				5463	@group
				5464	exp: @dots{}
				5465	\| exp '-' exp
				5466	@dots{}
				5467	\| '-' exp %prec UMINUS
				5468	@end group
				5469	@end example
				5470
				5471	@ifset defaultprec
				5472	If you forget to append @code{%prec UMINUS} to the rule for unary
				5473	minus, Bison silently assumes that minus has its usual precedence.
				5474	This kind of problem can be tricky to debug, since one typically
				5475	discovers the mistake only by testing the code.
				5476
				5477	The @code{%no-default-prec;} declaration makes it easier to discover
				5478	this kind of problem systematically. It causes rules that lack a
				5479	@code{%prec} modifier to have no precedence, even if the last terminal
				5480	symbol mentioned in their components has a declared precedence.
				5481
				5482	If @code{%no-default-prec;} is in effect, you must specify @code{%prec}
				5483	for all rules that participate in precedence conflict resolution.
				5484	Then you will see any shift/reduce conflict until you tell Bison how
				5485	to resolve it, either by changing your grammar or by adding an
				5486	explicit precedence. This will probably add declarations to the
				5487	grammar, but it helps to protect against incorrect rule precedences.
				5488
				5489	The effect of @code{%no-default-prec;} can be reversed by giving
				5490	@code{%default-prec;}, which is the default.
				5491	@end ifset
				5492
				5493	@node Parser States
				5494	@section Parser States
				5495	@cindex finite-state machine
				5496	@cindex parser state
				5497	@cindex state (of parser)
				5498
				5499	The function @code{yyparse} is implemented using a finite-state machine.
				5500	The values pushed on the parser stack are not simply token type codes; they
				5501	represent the entire sequence of terminal and nonterminal symbols at or
				5502	near the top of the stack. The current state collects all the information
				5503	about previous input which is relevant to deciding what to do next.
				5504
				5505	Each time a look-ahead token is read, the current parser state together
				5506	with the type of look-ahead token are looked up in a table. This table
				5507	entry can say, ``Shift the look-ahead token.'' In this case, it also
				5508	specifies the new parser state, which is pushed onto the top of the
				5509	parser stack. Or it can say, ``Reduce using rule number @var{n}.''
				5510	This means that a certain number of tokens or groupings are taken off
				5511	the top of the stack, and replaced by one grouping. In other words,
				5512	that number of states are popped from the stack, and one new state is
				5513	pushed.
				5514
				5515	There is one other alternative: the table can say that the look-ahead token
				5516	is erroneous in the current state. This causes error processing to begin
				5517	(@pxref{Error Recovery}).
				5518
				5519	@node Reduce/Reduce
				5520	@section Reduce/Reduce Conflicts
				5521	@cindex reduce/reduce conflict
				5522	@cindex conflicts, reduce/reduce
				5523
				5524	A reduce/reduce conflict occurs if there are two or more rules that apply
				5525	to the same sequence of input. This usually indicates a serious error
				5526	in the grammar.
				5527
				5528	For example, here is an erroneous attempt to define a sequence
				5529	of zero or more @code{word} groupings.
				5530
				5531	@example
				5532	sequence: /* empty */
				5533	@{ printf ("empty sequence\n"); @}
				5534	\| maybeword
				5535	\| sequence word
				5536	@{ printf ("added word %s\n", $2); @}
				5537	;
				5538
				5539	maybeword: /* empty */
				5540	@{ printf ("empty maybeword\n"); @}
				5541	\| word
				5542	@{ printf ("single word %s\n", $1); @}
				5543	;
				5544	@end example
				5545
				5546	@noindent
				5547	The error is an ambiguity: there is more than one way to parse a single
				5548	@code{word} into a @code{sequence}. It could be reduced to a
				5549	@code{maybeword} and then into a @code{sequence} via the second rule.
				5550	Alternatively, nothing-at-all could be reduced into a @code{sequence}
				5551	via the first rule, and this could be combined with the @code{word}
				5552	using the third rule for @code{sequence}.
				5553
				5554	There is also more than one way to reduce nothing-at-all into a
				5555	@code{sequence}. This can be done directly via the first rule,
				5556	or indirectly via @code{maybeword} and then the second rule.
				5557
				5558	You might think that this is a distinction without a difference, because it
				5559	does not change whether any particular input is valid or not. But it does
				5560	affect which actions are run. One parsing order runs the second rule's
				5561	action; the other runs the first rule's action and the third rule's action.
				5562	In this example, the output of the program changes.
				5563
				5564	Bison resolves a reduce/reduce conflict by choosing to use the rule that
				5565	appears first in the grammar, but it is very risky to rely on this. Every
				5566	reduce/reduce conflict must be studied and usually eliminated. Here is the
				5567	proper way to define @code{sequence}:
				5568
				5569	@example
				5570	sequence: /* empty */
				5571	@{ printf ("empty sequence\n"); @}
				5572	\| sequence word
				5573	@{ printf ("added word %s\n", $2); @}
				5574	;
				5575	@end example
				5576
				5577	Here is another common error that yields a reduce/reduce conflict:
				5578
				5579	@example
				5580	sequence: /* empty */
				5581	\| sequence words
				5582	\| sequence redirects
				5583	;
				5584
				5585	words: /* empty */
				5586	\| words word
				5587	;
				5588
				5589	redirects:/* empty */
				5590	\| redirects redirect
				5591	;
				5592	@end example
				5593
				5594	@noindent
				5595	The intention here is to define a sequence which can contain either
				5596	@code{word} or @code{redirect} groupings. The individual definitions of
				5597	@code{sequence}, @code{words} and @code{redirects} are error-free, but the
				5598	three together make a subtle ambiguity: even an empty input can be parsed
				5599	in infinitely many ways!
				5600
				5601	Consider: nothing-at-all could be a @code{words}. Or it could be two
				5602	@code{words} in a row, or three, or any number. It could equally well be a
				5603	@code{redirects}, or two, or any number. Or it could be a @code{words}
				5604	followed by three @code{redirects} and another @code{words}. And so on.
				5605
				5606	Here are two ways to correct these rules. First, to make it a single level
				5607	of sequence:
				5608
				5609	@example
				5610	sequence: /* empty */
				5611	\| sequence word
				5612	\| sequence redirect
				5613	;
				5614	@end example
				5615
				5616	Second, to prevent either a @code{words} or a @code{redirects}
				5617	from being empty:
				5618
				5619	@example
				5620	sequence: /* empty */
				5621	\| sequence words
				5622	\| sequence redirects
				5623	;
				5624
				5625	words: word
				5626	\| words word
				5627	;
				5628
				5629	redirects:redirect
				5630	\| redirects redirect
				5631	;
				5632	@end example
				5633
				5634	@node Mystery Conflicts
				5635	@section Mysterious Reduce/Reduce Conflicts
				5636
				5637	Sometimes reduce/reduce conflicts can occur that don't look warranted.
				5638	Here is an example:
				5639
				5640	@example
				5641	@group
				5642	%token ID
				5643
				5644	%%
				5645	def: param_spec return_spec ','
				5646	;
				5647	param_spec:
				5648	type
				5649	\| name_list ':' type
				5650	;
				5651	@end group
				5652	@group
				5653	return_spec:
				5654	type
				5655	\| name ':' type
				5656	;
				5657	@end group
				5658	@group
				5659	type: ID
				5660	;
				5661	@end group
				5662	@group
				5663	name: ID
				5664	;
				5665	name_list:
				5666	name
				5667	\| name ',' name_list
				5668	;
				5669	@end group
				5670	@end example
				5671
				5672	It would seem that this grammar can be parsed with only a single token
				5673	of look-ahead: when a @code{param_spec} is being read, an @code{ID} is
				5674	a @code{name} if a comma or colon follows, or a @code{type} if another
				5675	@code{ID} follows. In other words, this grammar is @acronym{LR}(1).
				5676
				5677	@cindex @acronym{LR}(1)
				5678	@cindex @acronym{LALR}(1)
				5679	However, Bison, like most parser generators, cannot actually handle all
				5680	@acronym{LR}(1) grammars. In this grammar, two contexts, that after
				5681	an @code{ID}
				5682	at the beginning of a @code{param_spec} and likewise at the beginning of
				5683	a @code{return_spec}, are similar enough that Bison assumes they are the
				5684	same. They appear similar because the same set of rules would be
				5685	active---the rule for reducing to a @code{name} and that for reducing to
				5686	a @code{type}. Bison is unable to determine at that stage of processing
				5687	that the rules would require different look-ahead tokens in the two
				5688	contexts, so it makes a single parser state for them both. Combining
				5689	the two contexts causes a conflict later. In parser terminology, this
				5690	occurrence means that the grammar is not @acronym{LALR}(1).
				5691
				5692	In general, it is better to fix deficiencies than to document them. But
				5693	this particular deficiency is intrinsically hard to fix; parser
				5694	generators that can handle @acronym{LR}(1) grammars are hard to write
				5695	and tend to
				5696	produce parsers that are very large. In practice, Bison is more useful
				5697	as it is now.
				5698
				5699	When the problem arises, you can often fix it by identifying the two
				5700	parser states that are being confused, and adding something to make them
				5701	look distinct. In the above example, adding one rule to
				5702	@code{return_spec} as follows makes the problem go away:
				5703
				5704	@example
				5705	@group
				5706	%token BOGUS
				5707	@dots{}
				5708	%%
				5709	@dots{}
				5710	return_spec:
				5711	type
				5712	\| name ':' type
				5713	/* This rule is never used. */
				5714	\| ID BOGUS
				5715	;
				5716	@end group
				5717	@end example
				5718
				5719	This corrects the problem because it introduces the possibility of an
				5720	additional active rule in the context after the @code{ID} at the beginning of
				5721	@code{return_spec}. This rule is not active in the corresponding context
				5722	in a @code{param_spec}, so the two contexts receive distinct parser states.
				5723	As long as the token @code{BOGUS} is never generated by @code{yylex},
				5724	the added rule cannot alter the way actual input is parsed.
				5725
				5726	In this particular example, there is another way to solve the problem:
				5727	rewrite the rule for @code{return_spec} to use @code{ID} directly
				5728	instead of via @code{name}. This also causes the two confusing
				5729	contexts to have different sets of active rules, because the one for
				5730	@code{return_spec} activates the altered rule for @code{return_spec}
				5731	rather than the one for @code{name}.
				5732
				5733	@example
				5734	param_spec:
				5735	type
				5736	\| name_list ':' type
				5737	;
				5738	return_spec:
				5739	type
				5740	\| ID ':' type
				5741	;
				5742	@end example
				5743
				5744	For a more detailed exposition of @acronym{LALR}(1) parsers and parser
				5745	generators, please see:
				5746	Frank DeRemer and Thomas Pennello, Efficient Computation of
				5747	@acronym{LALR}(1) Look-Ahead Sets, @cite{@acronym{ACM} Transactions on
				5748	Programming Languages and Systems}, Vol.@: 4, No.@: 4 (October 1982),
				5749	pp.@: 615--649 @uref{http://doi.acm.org/10.1145/69622.357187}.
				5750
				5751	@node Generalized LR Parsing
				5752	@section Generalized @acronym{LR} (@acronym{GLR}) Parsing
				5753	@cindex @acronym{GLR} parsing
				5754	@cindex generalized @acronym{LR} (@acronym{GLR}) parsing
				5755	@cindex ambiguous grammars
				5756	@cindex nondeterministic parsing
				5757
				5758	Bison produces @emph{deterministic} parsers that choose uniquely
				5759	when to reduce and which reduction to apply
				5760	based on a summary of the preceding input and on one extra token of look-ahead.
				5761	As a result, normal Bison handles a proper subset of the family of
				5762	context-free languages.
				5763	Ambiguous grammars, since they have strings with more than one possible
				5764	sequence of reductions cannot have deterministic parsers in this sense.
				5765	The same is true of languages that require more than one symbol of
				5766	look-ahead, since the parser lacks the information necessary to make a
				5767	decision at the point it must be made in a shift-reduce parser.
				5768	Finally, as previously mentioned (@pxref{Mystery Conflicts}),
				5769	there are languages where Bison's particular choice of how to
				5770	summarize the input seen so far loses necessary information.
				5771
				5772	When you use the @samp{%glr-parser} declaration in your grammar file,
				5773	Bison generates a parser that uses a different algorithm, called
				5774	Generalized @acronym{LR} (or @acronym{GLR}). A Bison @acronym{GLR}
				5775	parser uses the same basic
				5776	algorithm for parsing as an ordinary Bison parser, but behaves
				5777	differently in cases where there is a shift-reduce conflict that has not
				5778	been resolved by precedence rules (@pxref{Precedence}) or a
				5779	reduce-reduce conflict. When a @acronym{GLR} parser encounters such a
				5780	situation, it
				5781	effectively @emph{splits} into a several parsers, one for each possible
				5782	shift or reduction. These parsers then proceed as usual, consuming
				5783	tokens in lock-step. Some of the stacks may encounter other conflicts
				5784	and split further, with the result that instead of a sequence of states,
				5785	a Bison @acronym{GLR} parsing stack is what is in effect a tree of states.
				5786
				5787	In effect, each stack represents a guess as to what the proper parse
				5788	is. Additional input may indicate that a guess was wrong, in which case
				5789	the appropriate stack silently disappears. Otherwise, the semantics
				5790	actions generated in each stack are saved, rather than being executed
				5791	immediately. When a stack disappears, its saved semantic actions never
				5792	get executed. When a reduction causes two stacks to become equivalent,
				5793	their sets of semantic actions are both saved with the state that
				5794	results from the reduction. We say that two stacks are equivalent
				5795	when they both represent the same sequence of states,
				5796	and each pair of corresponding states represents a
				5797	grammar symbol that produces the same segment of the input token
				5798	stream.
				5799
				5800	Whenever the parser makes a transition from having multiple
				5801	states to having one, it reverts to the normal @acronym{LALR}(1) parsing
				5802	algorithm, after resolving and executing the saved-up actions.
				5803	At this transition, some of the states on the stack will have semantic
				5804	values that are sets (actually multisets) of possible actions. The
				5805	parser tries to pick one of the actions by first finding one whose rule
				5806	has the highest dynamic precedence, as set by the @samp{%dprec}
				5807	declaration. Otherwise, if the alternative actions are not ordered by
				5808	precedence, but there the same merging function is declared for both
				5809	rules by the @samp{%merge} declaration,
				5810	Bison resolves and evaluates both and then calls the merge function on
				5811	the result. Otherwise, it reports an ambiguity.
				5812
				5813	It is possible to use a data structure for the @acronym{GLR} parsing tree that
				5814	permits the processing of any @acronym{LALR}(1) grammar in linear time (in the
				5815	size of the input), any unambiguous (not necessarily
				5816	@acronym{LALR}(1)) grammar in
				5817	quadratic worst-case time, and any general (possibly ambiguous)
				5818	context-free grammar in cubic worst-case time. However, Bison currently
				5819	uses a simpler data structure that requires time proportional to the
				5820	length of the input times the maximum number of stacks required for any
				5821	prefix of the input. Thus, really ambiguous or nondeterministic
				5822	grammars can require exponential time and space to process. Such badly
				5823	behaving examples, however, are not generally of practical interest.
				5824	Usually, nondeterminism in a grammar is local---the parser is ``in
				5825	doubt'' only for a few tokens at a time. Therefore, the current data
				5826	structure should generally be adequate. On @acronym{LALR}(1) portions of a
				5827	grammar, in particular, it is only slightly slower than with the default
				5828	Bison parser.
				5829
				5830	For a more detailed exposition of @acronym{GLR} parsers, please see: Elizabeth
				5831	Scott, Adrian Johnstone and Shamsa Sadaf Hussain, Tomita-Style
				5832	Generalised @acronym{LR} Parsers, Royal Holloway, University of
				5833	London, Department of Computer Science, TR-00-12,
				5834	@uref{http://www.cs.rhul.ac.uk/research/languages/publications/tomita_style_1.ps},
				5835	(2000-12-24).
				5836
				5837	@node Memory Management
				5838	@section Memory Management, and How to Avoid Memory Exhaustion
				5839	@cindex memory exhaustion
				5840	@cindex memory management
				5841	@cindex stack overflow
				5842	@cindex parser stack overflow
				5843	@cindex overflow of parser stack
				5844
				5845	The Bison parser stack can run out of memory if too many tokens are shifted and
				5846	not reduced. When this happens, the parser function @code{yyparse}
				5847	calls @code{yyerror} and then returns 2.
				5848
				5849	Because Bison parsers have growing stacks, hitting the upper limit
				5850	usually results from using a right recursion instead of a left
				5851	recursion, @xref{Recursion, ,Recursive Rules}.
				5852
				5853	@vindex YYMAXDEPTH
				5854	By defining the macro @code{YYMAXDEPTH}, you can control how deep the
				5855	parser stack can become before memory is exhausted. Define the
				5856	macro with a value that is an integer. This value is the maximum number
				5857	of tokens that can be shifted (and not reduced) before overflow.
				5858
				5859	The stack space allowed is not necessarily allocated. If you specify a
				5860	large value for @code{YYMAXDEPTH}, the parser normally allocates a small
				5861	stack at first, and then makes it bigger by stages as needed. This
				5862	increasing allocation happens automatically and silently. Therefore,
				5863	you do not need to make @code{YYMAXDEPTH} painfully small merely to save
				5864	space for ordinary inputs that do not need much stack.
				5865
				5866	However, do not allow @code{YYMAXDEPTH} to be a value so large that
				5867	arithmetic overflow could occur when calculating the size of the stack
				5868	space. Also, do not allow @code{YYMAXDEPTH} to be less than
				5869	@code{YYINITDEPTH}.
				5870
				5871	@cindex default stack limit
				5872	The default value of @code{YYMAXDEPTH}, if you do not define it, is
				5873	10000.
				5874
				5875	@vindex YYINITDEPTH
				5876	You can control how much stack is allocated initially by defining the
				5877	macro @code{YYINITDEPTH} to a positive integer. For the C
				5878	@acronym{LALR}(1) parser, this value must be a compile-time constant
				5879	unless you are assuming C99 or some other target language or compiler
				5880	that allows variable-length arrays. The default is 200.
				5881
				5882	Do not allow @code{YYINITDEPTH} to be greater than @code{YYMAXDEPTH}.
				5883
				5884	@c FIXME: C++ output.
				5885	Because of semantical differences between C and C++, the
				5886	@acronym{LALR}(1) parsers in C produced by Bison cannot grow when compiled
				5887	by C++ compilers. In this precise case (compiling a C parser as C++) you are
				5888	suggested to grow @code{YYINITDEPTH}. The Bison maintainers hope to fix
				5889	this deficiency in a future release.
				5890
				5891	@node Error Recovery
				5892	@chapter Error Recovery
				5893	@cindex error recovery
				5894	@cindex recovery from errors
				5895
				5896	It is not usually acceptable to have a program terminate on a syntax
				5897	error. For example, a compiler should recover sufficiently to parse the
				5898	rest of the input file and check it for errors; a calculator should accept
				5899	another expression.
				5900
				5901	In a simple interactive command parser where each input is one line, it may
				5902	be sufficient to allow @code{yyparse} to return 1 on error and have the
				5903	caller ignore the rest of the input line when that happens (and then call
				5904	@code{yyparse} again). But this is inadequate for a compiler, because it
				5905	forgets all the syntactic context leading up to the error. A syntax error
				5906	deep within a function in the compiler input should not cause the compiler
				5907	to treat the following line like the beginning of a source file.
				5908
				5909	@findex error
				5910	You can define how to recover from a syntax error by writing rules to
				5911	recognize the special token @code{error}. This is a terminal symbol that
				5912	is always defined (you need not declare it) and reserved for error
				5913	handling. The Bison parser generates an @code{error} token whenever a
				5914	syntax error happens; if you have provided a rule to recognize this token
				5915	in the current context, the parse can continue.
				5916
				5917	For example:
				5918
				5919	@example
				5920	stmnts: /* empty string */
				5921	\| stmnts '\n'
				5922	\| stmnts exp '\n'
				5923	\| stmnts error '\n'
				5924	@end example
				5925
				5926	The fourth rule in this example says that an error followed by a newline
				5927	makes a valid addition to any @code{stmnts}.
				5928
				5929	What happens if a syntax error occurs in the middle of an @code{exp}? The
				5930	error recovery rule, interpreted strictly, applies to the precise sequence
				5931	of a @code{stmnts}, an @code{error} and a newline. If an error occurs in
				5932	the middle of an @code{exp}, there will probably be some additional tokens
				5933	and subexpressions on the stack after the last @code{stmnts}, and there
				5934	will be tokens to read before the next newline. So the rule is not
				5935	applicable in the ordinary way.
				5936
				5937	But Bison can force the situation to fit the rule, by discarding part of
				5938	the semantic context and part of the input. First it discards states
				5939	and objects from the stack until it gets back to a state in which the
				5940	@code{error} token is acceptable. (This means that the subexpressions
				5941	already parsed are discarded, back to the last complete @code{stmnts}.)
				5942	At this point the @code{error} token can be shifted. Then, if the old
				5943	look-ahead token is not acceptable to be shifted next, the parser reads
				5944	tokens and discards them until it finds a token which is acceptable. In
				5945	this example, Bison reads and discards input until the next newline so
				5946	that the fourth rule can apply. Note that discarded symbols are
				5947	possible sources of memory leaks, see @ref{Destructor Decl, , Freeing
				5948	Discarded Symbols}, for a means to reclaim this memory.
				5949
				5950	The choice of error rules in the grammar is a choice of strategies for
				5951	error recovery. A simple and useful strategy is simply to skip the rest of
				5952	the current input line or current statement if an error is detected:
				5953
				5954	@example
				5955	stmnt: error ';' /* On error, skip until ';' is read. */
				5956	@end example
				5957
				5958	It is also useful to recover to the matching close-delimiter of an
				5959	opening-delimiter that has already been parsed. Otherwise the
				5960	close-delimiter will probably appear to be unmatched, and generate another,
				5961	spurious error message:
				5962
				5963	@example
				5964	primary: '(' expr ')'
				5965	\| '(' error ')'
				5966	@dots{}
				5967	;
				5968	@end example
				5969
				5970	Error recovery strategies are necessarily guesses. When they guess wrong,
				5971	one syntax error often leads to another. In the above example, the error
				5972	recovery rule guesses that an error is due to bad input within one
				5973	@code{stmnt}. Suppose that instead a spurious semicolon is inserted in the
				5974	middle of a valid @code{stmnt}. After the error recovery rule recovers
				5975	from the first error, another syntax error will be found straightaway,
				5976	since the text following the spurious semicolon is also an invalid
				5977	@code{stmnt}.
				5978
				5979	To prevent an outpouring of error messages, the parser will output no error
				5980	message for another syntax error that happens shortly after the first; only
				5981	after three consecutive input tokens have been successfully shifted will
				5982	error messages resume.
				5983
				5984	Note that rules which accept the @code{error} token may have actions, just
				5985	as any other rules can.
				5986
				5987	@findex yyerrok
				5988	You can make error messages resume immediately by using the macro
				5989	@code{yyerrok} in an action. If you do this in the error rule's action, no
				5990	error messages will be suppressed. This macro requires no arguments;
				5991	@samp{yyerrok;} is a valid C statement.
				5992
				5993	@findex yyclearin
				5994	The previous look-ahead token is reanalyzed immediately after an error. If
				5995	this is unacceptable, then the macro @code{yyclearin} may be used to clear
				5996	this token. Write the statement @samp{yyclearin;} in the error rule's
				5997	action.
				5998	@xref{Action Features, ,Special Features for Use in Actions}.
				5999
				6000	For example, suppose that on a syntax error, an error handling routine is
				6001	called that advances the input stream to some point where parsing should
				6002	once again commence. The next symbol returned by the lexical scanner is
				6003	probably correct. The previous look-ahead token ought to be discarded
				6004	with @samp{yyclearin;}.
				6005
				6006	@vindex YYRECOVERING
				6007	The expression @code{YYRECOVERING ()} yields 1 when the parser
				6008	is recovering from a syntax error, and 0 otherwise.
				6009	Syntax error diagnostics are suppressed while recovering from a syntax
				6010	error.
				6011
				6012	@node Context Dependency
				6013	@chapter Handling Context Dependencies
				6014
				6015	The Bison paradigm is to parse tokens first, then group them into larger
				6016	syntactic units. In many languages, the meaning of a token is affected by
				6017	its context. Although this violates the Bison paradigm, certain techniques
				6018	(known as @dfn{kludges}) may enable you to write Bison parsers for such
				6019	languages.
				6020
				6021	@menu
				6022	* Semantic Tokens:: Token parsing can depend on the semantic context.
				6023	* Lexical Tie-ins:: Token parsing can depend on the syntactic context.
				6024	* Tie-in Recovery:: Lexical tie-ins have implications for how
				6025	error recovery rules must be written.
				6026	@end menu
				6027
				6028	(Actually, ``kludge'' means any technique that gets its job done but is
				6029	neither clean nor robust.)
				6030
				6031	@node Semantic Tokens
				6032	@section Semantic Info in Token Types
				6033
				6034	The C language has a context dependency: the way an identifier is used
				6035	depends on what its current meaning is. For example, consider this:
				6036
				6037	@example
				6038	foo (x);
				6039	@end example
				6040
				6041	This looks like a function call statement, but if @code{foo} is a typedef
				6042	name, then this is actually a declaration of @code{x}. How can a Bison
				6043	parser for C decide how to parse this input?
				6044
				6045	The method used in @acronym{GNU} C is to have two different token types,
				6046	@code{IDENTIFIER} and @code{TYPENAME}. When @code{yylex} finds an
				6047	identifier, it looks up the current declaration of the identifier in order
				6048	to decide which token type to return: @code{TYPENAME} if the identifier is
				6049	declared as a typedef, @code{IDENTIFIER} otherwise.
				6050
				6051	The grammar rules can then express the context dependency by the choice of
				6052	token type to recognize. @code{IDENTIFIER} is accepted as an expression,
				6053	but @code{TYPENAME} is not. @code{TYPENAME} can start a declaration, but
				6054	@code{IDENTIFIER} cannot. In contexts where the meaning of the identifier
				6055	is @emph{not} significant, such as in declarations that can shadow a
				6056	typedef name, either @code{TYPENAME} or @code{IDENTIFIER} is
				6057	accepted---there is one rule for each of the two token types.
				6058
				6059	This technique is simple to use if the decision of which kinds of
				6060	identifiers to allow is made at a place close to where the identifier is
				6061	parsed. But in C this is not always so: C allows a declaration to
				6062	redeclare a typedef name provided an explicit type has been specified
				6063	earlier:
				6064
				6065	@example
				6066	typedef int foo, bar;
				6067	int baz (void)
				6068	@{
				6069	static bar (bar); /* @r{redeclare @code{bar} as static variable} */
				6070	extern foo foo (foo); /* @r{redeclare @code{foo} as function} */
				6071	return foo (bar);
				6072	@}
				6073	@end example
				6074
				6075	Unfortunately, the name being declared is separated from the declaration
				6076	construct itself by a complicated syntactic structure---the ``declarator''.
				6077
				6078	As a result, part of the Bison parser for C needs to be duplicated, with
				6079	all the nonterminal names changed: once for parsing a declaration in
				6080	which a typedef name can be redefined, and once for parsing a
				6081	declaration in which that can't be done. Here is a part of the
				6082	duplication, with actions omitted for brevity:
				6083
				6084	@example
				6085	initdcl:
				6086	declarator maybeasm '='
				6087	init
				6088	\| declarator maybeasm
				6089	;
				6090
				6091	notype_initdcl:
				6092	notype_declarator maybeasm '='
				6093	init
				6094	\| notype_declarator maybeasm
				6095	;
				6096	@end example
				6097
				6098	@noindent
				6099	Here @code{initdcl} can redeclare a typedef name, but @code{notype_initdcl}
				6100	cannot. The distinction between @code{declarator} and
				6101	@code{notype_declarator} is the same sort of thing.
				6102
				6103	There is some similarity between this technique and a lexical tie-in
				6104	(described next), in that information which alters the lexical analysis is
				6105	changed during parsing by other parts of the program. The difference is
				6106	here the information is global, and is used for other purposes in the
				6107	program. A true lexical tie-in has a special-purpose flag controlled by
				6108	the syntactic context.
				6109
				6110	@node Lexical Tie-ins
				6111	@section Lexical Tie-ins
				6112	@cindex lexical tie-in
				6113
				6114	One way to handle context-dependency is the @dfn{lexical tie-in}: a flag
				6115	which is set by Bison actions, whose purpose is to alter the way tokens are
				6116	parsed.
				6117
				6118	For example, suppose we have a language vaguely like C, but with a special
				6119	construct @samp{hex (@var{hex-expr})}. After the keyword @code{hex} comes
				6120	an expression in parentheses in which all integers are hexadecimal. In
				6121	particular, the token @samp{a1b} must be treated as an integer rather than
				6122	as an identifier if it appears in that context. Here is how you can do it:
				6123
				6124	@example
				6125	@group
				6126	%@{
				6127	int hexflag;
				6128	int yylex (void);
				6129	void yyerror (char const *);
				6130	%@}
				6131	%%
				6132	@dots{}
				6133	@end group
				6134	@group
				6135	expr: IDENTIFIER
				6136	\| constant
				6137	\| HEX '('
				6138	@{ hexflag = 1; @}
				6139	expr ')'
				6140	@{ hexflag = 0;
				6141	$$ = $4; @}
				6142	\| expr '+' expr
				6143	@{ $$ = make_sum ($1, $3); @}
				6144	@dots{}
				6145	;
				6146	@end group
				6147
				6148	@group
				6149	constant:
				6150	INTEGER
				6151	\| STRING
				6152	;
				6153	@end group
				6154	@end example
				6155
				6156	@noindent
				6157	Here we assume that @code{yylex} looks at the value of @code{hexflag}; when
				6158	it is nonzero, all integers are parsed in hexadecimal, and tokens starting
				6159	with letters are parsed as integers if possible.
				6160
				6161	The declaration of @code{hexflag} shown in the prologue of the parser file
				6162	is needed to make it accessible to the actions (@pxref{Prologue, ,The Prologue}).
				6163	You must also write the code in @code{yylex} to obey the flag.
				6164
				6165	@node Tie-in Recovery
				6166	@section Lexical Tie-ins and Error Recovery
				6167
				6168	Lexical tie-ins make strict demands on any error recovery rules you have.
				6169	@xref{Error Recovery}.
				6170
				6171	The reason for this is that the purpose of an error recovery rule is to
				6172	abort the parsing of one construct and resume in some larger construct.
				6173	For example, in C-like languages, a typical error recovery rule is to skip
				6174	tokens until the next semicolon, and then start a new statement, like this:
				6175
				6176	@example
				6177	stmt: expr ';'
				6178	\| IF '(' expr ')' stmt @{ @dots{} @}
				6179	@dots{}
				6180	error ';'
				6181	@{ hexflag = 0; @}
				6182	;
				6183	@end example
				6184
				6185	If there is a syntax error in the middle of a @samp{hex (@var{expr})}
				6186	construct, this error rule will apply, and then the action for the
				6187	completed @samp{hex (@var{expr})} will never run. So @code{hexflag} would
				6188	remain set for the entire rest of the input, or until the next @code{hex}
				6189	keyword, causing identifiers to be misinterpreted as integers.
				6190
				6191	To avoid this problem the error recovery rule itself clears @code{hexflag}.
				6192
				6193	There may also be an error recovery rule that works within expressions.
				6194	For example, there could be a rule which applies within parentheses
				6195	and skips to the close-parenthesis:
				6196
				6197	@example
				6198	@group
				6199	expr: @dots{}
				6200	\| '(' expr ')'
				6201	@{ $$ = $2; @}
				6202	\| '(' error ')'
				6203	@dots{}
				6204	@end group
				6205	@end example
				6206
				6207	If this rule acts within the @code{hex} construct, it is not going to abort
				6208	that construct (since it applies to an inner level of parentheses within
				6209	the construct). Therefore, it should not clear the flag: the rest of
				6210	the @code{hex} construct should be parsed with the flag still in effect.
				6211
				6212	What if there is an error recovery rule which might abort out of the
				6213	@code{hex} construct or might not, depending on circumstances? There is no
				6214	way you can write the action to determine whether a @code{hex} construct is
				6215	being aborted or not. So if you are using a lexical tie-in, you had better
				6216	make sure your error recovery rules are not of this kind. Each rule must
				6217	be such that you can be sure that it always will, or always won't, have to
				6218	clear the flag.
				6219
				6220	@c ================================================== Debugging Your Parser
				6221
				6222	@node Debugging
				6223	@chapter Debugging Your Parser
				6224
				6225	Developing a parser can be a challenge, especially if you don't
				6226	understand the algorithm (@pxref{Algorithm, ,The Bison Parser
				6227	Algorithm}). Even so, sometimes a detailed description of the automaton
				6228	can help (@pxref{Understanding, , Understanding Your Parser}), or
				6229	tracing the execution of the parser can give some insight on why it
				6230	behaves improperly (@pxref{Tracing, , Tracing Your Parser}).
				6231
				6232	@menu
				6233	* Understanding:: Understanding the structure of your parser.
				6234	* Tracing:: Tracing the execution of your parser.
				6235	@end menu
				6236
				6237	@node Understanding
				6238	@section Understanding Your Parser
				6239
				6240	As documented elsewhere (@pxref{Algorithm, ,The Bison Parser Algorithm})
				6241	Bison parsers are @dfn{shift/reduce automata}. In some cases (much more
				6242	frequent than one would hope), looking at this automaton is required to
				6243	tune or simply fix a parser. Bison provides two different
				6244	representation of it, either textually or graphically (as a @acronym{VCG}
				6245	file).
				6246
				6247	The textual file is generated when the options @option{--report} or
				6248	@option{--verbose} are specified, see @xref{Invocation, , Invoking
				6249	Bison}. Its name is made by removing @samp{.tab.c} or @samp{.c} from
				6250	the parser output file name, and adding @samp{.output} instead.
				6251	Therefore, if the input file is @file{foo.y}, then the parser file is
				6252	called @file{foo.tab.c} by default. As a consequence, the verbose
				6253	output file is called @file{foo.output}.
				6254
				6255	The following grammar file, @file{calc.y}, will be used in the sequel:
				6256
				6257	@example
				6258	%token NUM STR
				6259	%left '+' '-'
				6260	%left '*'
				6261	%%
				6262	exp: exp '+' exp
				6263	\| exp '-' exp
				6264	\| exp '*' exp
				6265	\| exp '/' exp
				6266	\| NUM
				6267	;
				6268	useless: STR;
				6269	%%
				6270	@end example
				6271
				6272	@command{bison} reports:
				6273
				6274	@example
				6275	calc.y: warning: 1 useless nonterminal and 1 useless rule
				6276	calc.y:11.1-7: warning: useless nonterminal: useless
				6277	calc.y:11.10-12: warning: useless rule: useless: STR
				6278	calc.y: conflicts: 7 shift/reduce
				6279	@end example
				6280
				6281	When given @option{--report=state}, in addition to @file{calc.tab.c}, it
				6282	creates a file @file{calc.output} with contents detailed below. The
				6283	order of the output and the exact presentation might vary, but the
				6284	interpretation is the same.
				6285
				6286	The first section includes details on conflicts that were solved thanks
				6287	to precedence and/or associativity:
				6288
				6289	@example
				6290	Conflict in state 8 between rule 2 and token '+' resolved as reduce.
				6291	Conflict in state 8 between rule 2 and token '-' resolved as reduce.
				6292	Conflict in state 8 between rule 2 and token '*' resolved as shift.
				6293	@exdent @dots{}
				6294	@end example
				6295
				6296	@noindent
				6297	The next section lists states that still have conflicts.
				6298
				6299	@example
				6300	State 8 conflicts: 1 shift/reduce
				6301	State 9 conflicts: 1 shift/reduce
				6302	State 10 conflicts: 1 shift/reduce
				6303	State 11 conflicts: 4 shift/reduce
				6304	@end example
				6305
				6306	@noindent
				6307	@cindex token, useless
				6308	@cindex useless token
				6309	@cindex nonterminal, useless
				6310	@cindex useless nonterminal
				6311	@cindex rule, useless
				6312	@cindex useless rule
				6313	The next section reports useless tokens, nonterminal and rules. Useless
				6314	nonterminals and rules are removed in order to produce a smaller parser,
				6315	but useless tokens are preserved, since they might be used by the
				6316	scanner (note the difference between ``useless'' and ``not used''
				6317	below):
				6318
				6319	@example
				6320	Useless nonterminals:
				6321	useless
				6322
				6323	Terminals which are not used:
				6324	STR
				6325
				6326	Useless rules:
				6327	#6 useless: STR;
				6328	@end example
				6329
				6330	@noindent
				6331	The next section reproduces the exact grammar that Bison used:
				6332
				6333	@example
				6334	Grammar
				6335
				6336	Number, Line, Rule
				6337	0 5 $accept -> exp $end
				6338	1 5 exp -> exp '+' exp
				6339	2 6 exp -> exp '-' exp
				6340	3 7 exp -> exp '*' exp
				6341	4 8 exp -> exp '/' exp
				6342	5 9 exp -> NUM
				6343	@end example
				6344
				6345	@noindent
				6346	and reports the uses of the symbols:
				6347
				6348	@example
				6349	Terminals, with rules where they appear
				6350
				6351	$end (0) 0
				6352	'*' (42) 3
				6353	'+' (43) 1
				6354	'-' (45) 2
				6355	'/' (47) 4
				6356	error (256)
				6357	NUM (258) 5
				6358
				6359	Nonterminals, with rules where they appear
				6360
				6361	$accept (8)
				6362	on left: 0
				6363	exp (9)
				6364	on left: 1 2 3 4 5, on right: 0 1 2 3 4
				6365	@end example
				6366
				6367	@noindent
				6368	@cindex item
				6369	@cindex pointed rule
				6370	@cindex rule, pointed
				6371	Bison then proceeds onto the automaton itself, describing each state
				6372	with it set of @dfn{items}, also known as @dfn{pointed rules}. Each
				6373	item is a production rule together with a point (marked by @samp{.})
				6374	that the input cursor.
				6375
				6376	@example
				6377	state 0
				6378
				6379	$accept -> . exp $ (rule 0)
				6380
				6381	NUM shift, and go to state 1
				6382
				6383	exp go to state 2
				6384	@end example
				6385
				6386	This reads as follows: ``state 0 corresponds to being at the very
				6387	beginning of the parsing, in the initial rule, right before the start
				6388	symbol (here, @code{exp}). When the parser returns to this state right
				6389	after having reduced a rule that produced an @code{exp}, the control
				6390	flow jumps to state 2. If there is no such transition on a nonterminal
				6391	symbol, and the look-ahead is a @code{NUM}, then this token is shifted on
				6392	the parse stack, and the control flow jumps to state 1. Any other
				6393	look-ahead triggers a syntax error.''
				6394
				6395	@cindex core, item set
				6396	@cindex item set core
				6397	@cindex kernel, item set
				6398	@cindex item set core
				6399	Even though the only active rule in state 0 seems to be rule 0, the
				6400	report lists @code{NUM} as a look-ahead token because @code{NUM} can be
				6401	at the beginning of any rule deriving an @code{exp}. By default Bison
				6402	reports the so-called @dfn{core} or @dfn{kernel} of the item set, but if
				6403	you want to see more detail you can invoke @command{bison} with
				6404	@option{--report=itemset} to list all the items, include those that can
				6405	be derived:
				6406
				6407	@example
				6408	state 0
				6409
				6410	$accept -> . exp $ (rule 0)
				6411	exp -> . exp '+' exp (rule 1)
				6412	exp -> . exp '-' exp (rule 2)
				6413	exp -> . exp '*' exp (rule 3)
				6414	exp -> . exp '/' exp (rule 4)
				6415	exp -> . NUM (rule 5)
				6416
				6417	NUM shift, and go to state 1
				6418
				6419	exp go to state 2
				6420	@end example
				6421
				6422	@noindent
				6423	In the state 1...
				6424
				6425	@example
				6426	state 1
				6427
				6428	exp -> NUM . (rule 5)
				6429
				6430	$default reduce using rule 5 (exp)
				6431	@end example
				6432
				6433	@noindent
				6434	the rule 5, @samp{exp: NUM;}, is completed. Whatever the look-ahead token
				6435	(@samp{$default}), the parser will reduce it. If it was coming from
				6436	state 0, then, after this reduction it will return to state 0, and will
				6437	jump to state 2 (@samp{exp: go to state 2}).
				6438
				6439	@example
				6440	state 2
				6441
				6442	$accept -> exp . $ (rule 0)
				6443	exp -> exp . '+' exp (rule 1)
				6444	exp -> exp . '-' exp (rule 2)
				6445	exp -> exp . '*' exp (rule 3)
				6446	exp -> exp . '/' exp (rule 4)
				6447
				6448	$ shift, and go to state 3
				6449	'+' shift, and go to state 4
				6450	'-' shift, and go to state 5
				6451	'*' shift, and go to state 6
				6452	'/' shift, and go to state 7
				6453	@end example
				6454
				6455	@noindent
				6456	In state 2, the automaton can only shift a symbol. For instance,
				6457	because of the item @samp{exp -> exp . '+' exp}, if the look-ahead if
				6458	@samp{+}, it will be shifted on the parse stack, and the automaton
				6459	control will jump to state 4, corresponding to the item @samp{exp -> exp
				6460	'+' . exp}. Since there is no default action, any other token than
				6461	those listed above will trigger a syntax error.
				6462
				6463	The state 3 is named the @dfn{final state}, or the @dfn{accepting
				6464	state}:
				6465
				6466	@example
				6467	state 3
				6468
				6469	$accept -> exp $ . (rule 0)
				6470
				6471	$default accept
				6472	@end example
				6473
				6474	@noindent
				6475	the initial rule is completed (the start symbol and the end
				6476	of input were read), the parsing exits successfully.
				6477
				6478	The interpretation of states 4 to 7 is straightforward, and is left to
				6479	the reader.
				6480
				6481	@example
				6482	state 4
				6483
				6484	exp -> exp '+' . exp (rule 1)
				6485
				6486	NUM shift, and go to state 1
				6487
				6488	exp go to state 8
				6489
				6490	state 5
				6491
				6492	exp -> exp '-' . exp (rule 2)
				6493
				6494	NUM shift, and go to state 1
				6495
				6496	exp go to state 9
				6497
				6498	state 6
				6499
				6500	exp -> exp '*' . exp (rule 3)
				6501
				6502	NUM shift, and go to state 1
				6503
				6504	exp go to state 10
				6505
				6506	state 7
				6507
				6508	exp -> exp '/' . exp (rule 4)
				6509
				6510	NUM shift, and go to state 1
				6511
				6512	exp go to state 11
				6513	@end example
				6514
				6515	As was announced in beginning of the report, @samp{State 8 conflicts:
				6516	1 shift/reduce}:
				6517
				6518	@example
				6519	state 8
				6520
				6521	exp -> exp . '+' exp (rule 1)
				6522	exp -> exp '+' exp . (rule 1)
				6523	exp -> exp . '-' exp (rule 2)
				6524	exp -> exp . '*' exp (rule 3)
				6525	exp -> exp . '/' exp (rule 4)
				6526
				6527	'*' shift, and go to state 6
				6528	'/' shift, and go to state 7
				6529
				6530	'/' [reduce using rule 1 (exp)]
				6531	$default reduce using rule 1 (exp)
				6532	@end example
				6533
				6534	Indeed, there are two actions associated to the look-ahead @samp{/}:
				6535	either shifting (and going to state 7), or reducing rule 1. The
				6536	conflict means that either the grammar is ambiguous, or the parser lacks
				6537	information to make the right decision. Indeed the grammar is
				6538	ambiguous, as, since we did not specify the precedence of @samp{/}, the
				6539	sentence @samp{NUM + NUM / NUM} can be parsed as @samp{NUM + (NUM /
				6540	NUM)}, which corresponds to shifting @samp{/}, or as @samp{(NUM + NUM) /
				6541	NUM}, which corresponds to reducing rule 1.
				6542
				6543	Because in @acronym{LALR}(1) parsing a single decision can be made, Bison
				6544	arbitrarily chose to disable the reduction, see @ref{Shift/Reduce, ,
				6545	Shift/Reduce Conflicts}. Discarded actions are reported in between
				6546	square brackets.
				6547
				6548	Note that all the previous states had a single possible action: either
				6549	shifting the next token and going to the corresponding state, or
				6550	reducing a single rule. In the other cases, i.e., when shifting
				6551	@emph{and} reducing is possible or when @emph{several} reductions are
				6552	possible, the look-ahead is required to select the action. State 8 is
				6553	one such state: if the look-ahead is @samp{*} or @samp{/} then the action
				6554	is shifting, otherwise the action is reducing rule 1. In other words,
				6555	the first two items, corresponding to rule 1, are not eligible when the
				6556	look-ahead token is @samp{}, since we specified that @samp{} has higher
				6557	precedence than @samp{+}. More generally, some items are eligible only
				6558	with some set of possible look-ahead tokens. When run with
				6559	@option{--report=look-ahead}, Bison specifies these look-ahead tokens:
				6560
				6561	@example
				6562	state 8
				6563
				6564	exp -> exp . '+' exp [$, '+', '-', '/'] (rule 1)
				6565	exp -> exp '+' exp . [$, '+', '-', '/'] (rule 1)
				6566	exp -> exp . '-' exp (rule 2)
				6567	exp -> exp . '*' exp (rule 3)
				6568	exp -> exp . '/' exp (rule 4)
				6569
				6570	'*' shift, and go to state 6
				6571	'/' shift, and go to state 7
				6572
				6573	'/' [reduce using rule 1 (exp)]
				6574	$default reduce using rule 1 (exp)
				6575	@end example
				6576
				6577	The remaining states are similar:
				6578
				6579	@example
				6580	state 9
				6581
				6582	exp -> exp . '+' exp (rule 1)
				6583	exp -> exp . '-' exp (rule 2)
				6584	exp -> exp '-' exp . (rule 2)
				6585	exp -> exp . '*' exp (rule 3)
				6586	exp -> exp . '/' exp (rule 4)
				6587
				6588	'*' shift, and go to state 6
				6589	'/' shift, and go to state 7
				6590
				6591	'/' [reduce using rule 2 (exp)]
				6592	$default reduce using rule 2 (exp)
				6593
				6594	state 10
				6595
				6596	exp -> exp . '+' exp (rule 1)
				6597	exp -> exp . '-' exp (rule 2)
				6598	exp -> exp . '*' exp (rule 3)
				6599	exp -> exp '*' exp . (rule 3)
				6600	exp -> exp . '/' exp (rule 4)
				6601
				6602	'/' shift, and go to state 7
				6603
				6604	'/' [reduce using rule 3 (exp)]
				6605	$default reduce using rule 3 (exp)
				6606
				6607	state 11
				6608
				6609	exp -> exp . '+' exp (rule 1)
				6610	exp -> exp . '-' exp (rule 2)
				6611	exp -> exp . '*' exp (rule 3)
				6612	exp -> exp . '/' exp (rule 4)
				6613	exp -> exp '/' exp . (rule 4)
				6614
				6615	'+' shift, and go to state 4
				6616	'-' shift, and go to state 5
				6617	'*' shift, and go to state 6
				6618	'/' shift, and go to state 7
				6619
				6620	'+' [reduce using rule 4 (exp)]
				6621	'-' [reduce using rule 4 (exp)]
				6622	'*' [reduce using rule 4 (exp)]
				6623	'/' [reduce using rule 4 (exp)]
				6624	$default reduce using rule 4 (exp)
				6625	@end example
				6626
				6627	@noindent
				6628	Observe that state 11 contains conflicts not only due to the lack of
				6629	precedence of @samp{/} with respect to @samp{+}, @samp{-}, and
				6630	@samp{*}, but also because the
				6631	associativity of @samp{/} is not specified.
				6632
				6633
				6634	@node Tracing
				6635	@section Tracing Your Parser
				6636	@findex yydebug
				6637	@cindex debugging
				6638	@cindex tracing the parser
				6639
				6640	If a Bison grammar compiles properly but doesn't do what you want when it
				6641	runs, the @code{yydebug} parser-trace feature can help you figure out why.
				6642
				6643	There are several means to enable compilation of trace facilities:
				6644
				6645	@table @asis
				6646	@item the macro @code{YYDEBUG}
				6647	@findex YYDEBUG
				6648	Define the macro @code{YYDEBUG} to a nonzero value when you compile the
				6649	parser. This is compliant with @acronym{POSIX} Yacc. You could use
				6650	@samp{-DYYDEBUG=1} as a compiler option or you could put @samp{#define
				6651	YYDEBUG 1} in the prologue of the grammar file (@pxref{Prologue, , The
				6652	Prologue}).
				6653
				6654	@item the option @option{-t}, @option{--debug}
				6655	Use the @samp{-t} option when you run Bison (@pxref{Invocation,
				6656	,Invoking Bison}). This is @acronym{POSIX} compliant too.
				6657
				6658	@item the directive @samp{%debug}
				6659	@findex %debug
				6660	Add the @code{%debug} directive (@pxref{Decl Summary, ,Bison
				6661	Declaration Summary}). This is a Bison extension, which will prove
				6662	useful when Bison will output parsers for languages that don't use a
				6663	preprocessor. Unless @acronym{POSIX} and Yacc portability matter to
				6664	you, this is
				6665	the preferred solution.
				6666	@end table
				6667
				6668	We suggest that you always enable the debug option so that debugging is
				6669	always possible.
				6670
				6671	The trace facility outputs messages with macro calls of the form
				6672	@code{YYFPRINTF (stderr, @var{format}, @var{args})} where
				6673	@var{format} and @var{args} are the usual @code{printf} format and
				6674	arguments. If you define @code{YYDEBUG} to a nonzero value but do not
				6675	define @code{YYFPRINTF}, @code{<stdio.h>} is automatically included
				6676	and @code{YYPRINTF} is defined to @code{fprintf}.
				6677
				6678	Once you have compiled the program with trace facilities, the way to
				6679	request a trace is to store a nonzero value in the variable @code{yydebug}.
				6680	You can do this by making the C code do it (in @code{main}, perhaps), or
				6681	you can alter the value with a C debugger.
				6682
				6683	Each step taken by the parser when @code{yydebug} is nonzero produces a
				6684	line or two of trace information, written on @code{stderr}. The trace
				6685	messages tell you these things:
				6686
				6687	@itemize @bullet
				6688	@item
				6689	Each time the parser calls @code{yylex}, what kind of token was read.
				6690
				6691	@item
				6692	Each time a token is shifted, the depth and complete contents of the
				6693	state stack (@pxref{Parser States}).
				6694
				6695	@item
				6696	Each time a rule is reduced, which rule it is, and the complete contents
				6697	of the state stack afterward.
				6698	@end itemize
				6699
				6700	To make sense of this information, it helps to refer to the listing file
				6701	produced by the Bison @samp{-v} option (@pxref{Invocation, ,Invoking
				6702	Bison}). This file shows the meaning of each state in terms of
				6703	positions in various rules, and also what each state will do with each
				6704	possible input token. As you read the successive trace messages, you
				6705	can see that the parser is functioning according to its specification in
				6706	the listing file. Eventually you will arrive at the place where
				6707	something undesirable happens, and you will see which parts of the
				6708	grammar are to blame.
				6709
				6710	The parser file is a C program and you can use C debuggers on it, but it's
				6711	not easy to interpret what it is doing. The parser function is a
				6712	finite-state machine interpreter, and aside from the actions it executes
				6713	the same code over and over. Only the values of variables show where in
				6714	the grammar it is working.
				6715
				6716	@findex YYPRINT
				6717	The debugging information normally gives the token type of each token
				6718	read, but not its semantic value. You can optionally define a macro
				6719	named @code{YYPRINT} to provide a way to print the value. If you define
				6720	@code{YYPRINT}, it should take three arguments. The parser will pass a
				6721	standard I/O stream, the numeric code for the token type, and the token
				6722	value (from @code{yylval}).
				6723
				6724	Here is an example of @code{YYPRINT} suitable for the multi-function
				6725	calculator (@pxref{Mfcalc Decl, ,Declarations for @code{mfcalc}}):
				6726
				6727	@smallexample
				6728	%@{
				6729	static void print_token_value (FILE *, int, YYSTYPE);
				6730	#define YYPRINT(file, type, value) print_token_value (file, type, value)
				6731	%@}
				6732
				6733	@dots{} %% @dots{} %% @dots{}
				6734
				6735	static void
				6736	print_token_value (FILE *file, int type, YYSTYPE value)
				6737	@{
				6738	if (type == VAR)
				6739	fprintf (file, "%s", value.tptr->name);
				6740	else if (type == NUM)
				6741	fprintf (file, "%d", value.val);
				6742	@}
				6743	@end smallexample
				6744
				6745	@c ================================================= Invoking Bison
				6746
				6747	@node Invocation
				6748	@chapter Invoking Bison
				6749	@cindex invoking Bison
				6750	@cindex Bison invocation
				6751	@cindex options for invoking Bison
				6752
				6753	The usual way to invoke Bison is as follows:
				6754
				6755	@example
				6756	bison @var{infile}
				6757	@end example
				6758
				6759	Here @var{infile} is the grammar file name, which usually ends in
				6760	@samp{.y}. The parser file's name is made by replacing the @samp{.y}
				6761	with @samp{.tab.c} and removing any leading directory. Thus, the
				6762	@samp{bison foo.y} file name yields
				6763	@file{foo.tab.c}, and the @samp{bison hack/foo.y} file name yields
				6764	@file{foo.tab.c}. It's also possible, in case you are writing
				6765	C++ code instead of C in your grammar file, to name it @file{foo.ypp}
				6766	or @file{foo.y++}. Then, the output files will take an extension like
				6767	the given one as input (respectively @file{foo.tab.cpp} and
				6768	@file{foo.tab.c++}).
				6769	This feature takes effect with all options that manipulate file names like
				6770	@samp{-o} or @samp{-d}.
				6771
				6772	For example :
				6773
				6774	@example
				6775	bison -d @var{infile.yxx}
				6776	@end example
				6777	@noindent
				6778	will produce @file{infile.tab.cxx} and @file{infile.tab.hxx}, and
				6779
				6780	@example
				6781	bison -d -o @var{output.c++} @var{infile.y}
				6782	@end example
				6783	@noindent
				6784	will produce @file{output.c++} and @file{outfile.h++}.
				6785
				6786	For compatibility with @acronym{POSIX}, the standard Bison
				6787	distribution also contains a shell script called @command{yacc} that
				6788	invokes Bison with the @option{-y} option.
				6789
				6790	@menu
				6791	* Bison Options:: All the options described in detail,
				6792	in alphabetical order by short options.
				6793	* Option Cross Key:: Alphabetical list of long options.
				6794	* Yacc Library:: Yacc-compatible @code{yylex} and @code{main}.
				6795	@end menu
				6796
				6797	@node Bison Options
				6798	@section Bison Options
				6799
				6800	Bison supports both traditional single-letter options and mnemonic long
				6801	option names. Long option names are indicated with @samp{--} instead of
				6802	@samp{-}. Abbreviations for option names are allowed as long as they
				6803	are unique. When a long option takes an argument, like
				6804	@samp{--file-prefix}, connect the option name and the argument with
				6805	@samp{=}.
				6806
				6807	Here is a list of options that can be used with Bison, alphabetized by
				6808	short option. It is followed by a cross key alphabetized by long
				6809	option.
				6810
				6811	@c Please, keep this ordered as in `bison --help'.
				6812	@noindent
				6813	Operations modes:
				6814	@table @option
				6815	@item -h
				6816	@itemx --help
				6817	Print a summary of the command-line options to Bison and exit.
				6818
				6819	@item -V
				6820	@itemx --version
				6821	Print the version number of Bison and exit.
				6822
				6823	@item --print-localedir
				6824	Print the name of the directory containing locale-dependent data.
				6825
				6826	@item -y
				6827	@itemx --yacc
				6828	Act more like the traditional Yacc command. This can cause
				6829	different diagnostics to be generated, and may change behavior in
				6830	other minor ways. Most importantly, imitate Yacc's output
				6831	file name conventions, so that the parser output file is called
				6832	@file{y.tab.c}, and the other outputs are called @file{y.output} and
				6833	@file{y.tab.h}. Thus, the following shell script can substitute
				6834	for Yacc, and the Bison distribution contains such a script for
				6835	compatibility with @acronym{POSIX}:
				6836
				6837	@example
				6838	#! /bin/sh
				6839	bison -y "$@@"
				6840	@end example
				6841
				6842	The @option{-y}/@option{--yacc} option is intended for use with
				6843	traditional Yacc grammars. If your grammar uses a Bison extension
				6844	like @samp{%glr-parser}, Bison might not be Yacc-compatible even if
				6845	this option is specified.
				6846
				6847	@end table
				6848
				6849	@noindent
				6850	Tuning the parser:
				6851
				6852	@table @option
				6853	@item -S @var{file}
				6854	@itemx --skeleton=@var{file}
				6855	Specify the skeleton to use. You probably don't need this option unless
				6856	you are developing Bison.
				6857
				6858	@item -t
				6859	@itemx --debug
				6860	In the parser file, define the macro @code{YYDEBUG} to 1 if it is not
				6861	already defined, so that the debugging facilities are compiled.
				6862	@xref{Tracing, ,Tracing Your Parser}.
				6863
				6864	@item --locations
				6865	Pretend that @code{%locations} was specified. @xref{Decl Summary}.
				6866
				6867	@item -p @var{prefix}
				6868	@itemx --name-prefix=@var{prefix}
				6869	Pretend that @code{%name-prefix="@var{prefix}"} was specified.
				6870	@xref{Decl Summary}.
				6871
				6872	@item -l
				6873	@itemx --no-lines
				6874	Don't put any @code{#line} preprocessor commands in the parser file.
				6875	Ordinarily Bison puts them in the parser file so that the C compiler
				6876	and debuggers will associate errors with your source file, the
				6877	grammar file. This option causes them to associate errors with the
				6878	parser file, treating it as an independent source file in its own right.
				6879
				6880	@item -n
				6881	@itemx --no-parser
				6882	Pretend that @code{%no-parser} was specified. @xref{Decl Summary}.
				6883
				6884	@item -k
				6885	@itemx --token-table
				6886	Pretend that @code{%token-table} was specified. @xref{Decl Summary}.
				6887	@end table
				6888
				6889	@noindent
				6890	Adjust the output:
				6891
				6892	@table @option
				6893	@item -d
				6894	@itemx --defines
				6895	Pretend that @code{%defines} was specified, i.e., write an extra output
				6896	file containing macro definitions for the token type names defined in
				6897	the grammar, as well as a few other declarations. @xref{Decl Summary}.
				6898
				6899	@item --defines=@var{defines-file}
				6900	Same as above, but save in the file @var{defines-file}.
				6901
				6902	@item -b @var{file-prefix}
				6903	@itemx --file-prefix=@var{prefix}
				6904	Pretend that @code{%file-prefix} was specified, i.e, specify prefix to use
				6905	for all Bison output file names. @xref{Decl Summary}.
				6906
				6907	@item -r @var{things}
				6908	@itemx --report=@var{things}
				6909	Write an extra output file containing verbose description of the comma
				6910	separated list of @var{things} among:
				6911
				6912	@table @code
				6913	@item state
				6914	Description of the grammar, conflicts (resolved and unresolved), and
				6915	@acronym{LALR} automaton.
				6916
				6917	@item look-ahead
				6918	Implies @code{state} and augments the description of the automaton with
				6919	each rule's look-ahead set.
				6920
				6921	@item itemset
				6922	Implies @code{state} and augments the description of the automaton with
				6923	the full set of items for each state, instead of its core only.
				6924	@end table
				6925
				6926	@item -v
				6927	@itemx --verbose
				6928	Pretend that @code{%verbose} was specified, i.e, write an extra output
				6929	file containing verbose descriptions of the grammar and
				6930	parser. @xref{Decl Summary}.
				6931
				6932	@item -o @var{file}
				6933	@itemx --output=@var{file}
				6934	Specify the @var{file} for the parser file.
				6935
				6936	The other output files' names are constructed from @var{file} as
				6937	described under the @samp{-v} and @samp{-d} options.
				6938
				6939	@item -g
				6940	Output a @acronym{VCG} definition of the @acronym{LALR}(1) grammar
				6941	automaton computed by Bison. If the grammar file is @file{foo.y}, the
				6942	@acronym{VCG} output file will
				6943	be @file{foo.vcg}.
				6944
				6945	@item --graph=@var{graph-file}
				6946	The behavior of @var{--graph} is the same than @samp{-g}. The only
				6947	difference is that it has an optional argument which is the name of
				6948	the output graph file.
				6949	@end table
				6950
				6951	@node Option Cross Key
				6952	@section Option Cross Key
				6953
				6954	@c FIXME: How about putting the directives too?
				6955	Here is a list of options, alphabetized by long option, to help you find
				6956	the corresponding short option.
				6957
				6958	@multitable {@option{--defines=@var{defines-file}}} {@option{-b @var{file-prefix}XXX}}
				6959	@headitem Long Option @tab Short Option
				6960	@item @option{--debug} @tab @option{-t}
				6961	@item @option{--defines=@var{defines-file}} @tab @option{-d}
				6962	@item @option{--file-prefix=@var{prefix}} @tab @option{-b @var{file-prefix}}
				6963	@item @option{--graph=@var{graph-file}} @tab @option{-d}
				6964	@item @option{--help} @tab @option{-h}
				6965	@item @option{--name-prefix=@var{prefix}} @tab @option{-p @var{name-prefix}}
				6966	@item @option{--no-lines} @tab @option{-l}
				6967	@item @option{--no-parser} @tab @option{-n}
				6968	@item @option{--output=@var{outfile}} @tab @option{-o @var{outfile}}
				6969	@item @option{--print-localedir} @tab
				6970	@item @option{--token-table} @tab @option{-k}
				6971	@item @option{--verbose} @tab @option{-v}
				6972	@item @option{--version} @tab @option{-V}
				6973	@item @option{--yacc} @tab @option{-y}
				6974	@end multitable
				6975
				6976	@node Yacc Library
				6977	@section Yacc Library
				6978
				6979	The Yacc library contains default implementations of the
				6980	@code{yyerror} and @code{main} functions. These default
				6981	implementations are normally not useful, but @acronym{POSIX} requires
				6982	them. To use the Yacc library, link your program with the
				6983	@option{-ly} option. Note that Bison's implementation of the Yacc
				6984	library is distributed under the terms of the @acronym{GNU} General
				6985	Public License (@pxref{Copying}).
				6986
				6987	If you use the Yacc library's @code{yyerror} function, you should
				6988	declare @code{yyerror} as follows:
				6989
				6990	@example
				6991	int yyerror (char const *);
				6992	@end example
				6993
				6994	Bison ignores the @code{int} value returned by this @code{yyerror}.
				6995	If you use the Yacc library's @code{main} function, your
				6996	@code{yyparse} function should have the following type signature:
				6997
				6998	@example
				6999	int yyparse (void);
				7000	@end example
				7001
				7002	@c ================================================= C++ Bison
				7003
				7004	@node C++ Language Interface
				7005	@chapter C++ Language Interface
				7006
				7007	@menu
				7008	* C++ Parsers:: The interface to generate C++ parser classes
				7009	* A Complete C++ Example:: Demonstrating their use
				7010	@end menu
				7011
				7012	@node C++ Parsers
				7013	@section C++ Parsers
				7014
				7015	@menu
				7016	* C++ Bison Interface:: Asking for C++ parser generation
				7017	* C++ Semantic Values:: %union vs. C++
				7018	* C++ Location Values:: The position and location classes
				7019	* C++ Parser Interface:: Instantiating and running the parser
				7020	* C++ Scanner Interface:: Exchanges between yylex and parse
				7021	@end menu
				7022
				7023	@node C++ Bison Interface
				7024	@subsection C++ Bison Interface
				7025	@c - %skeleton "lalr1.cc"
				7026	@c - Always pure
				7027	@c - initial action
				7028
				7029	The C++ parser @acronym{LALR}(1) skeleton is named @file{lalr1.cc}. To
				7030	select it, you may either pass the option @option{--skeleton=lalr1.cc}
				7031	to Bison, or include the directive @samp{%skeleton "lalr1.cc"} in the
				7032	grammar preamble. When run, @command{bison} will create several
				7033	entities in the @samp{yy} namespace. Use the @samp{%name-prefix}
				7034	directive to change the namespace name, see @ref{Decl Summary}. The
				7035	various classes are generated in the following files:
				7036
				7037	@table @file
				7038	@item position.hh
				7039	@itemx location.hh
				7040	The definition of the classes @code{position} and @code{location},
				7041	used for location tracking. @xref{C++ Location Values}.
				7042
				7043	@item stack.hh
				7044	An auxiliary class @code{stack} used by the parser.
				7045
				7046	@item @var{file}.hh
				7047	@itemx @var{file}.cc
				7048	(Assuming the extension of the input file was @samp{.yy}.) The
				7049	declaration and implementation of the C++ parser class. The basename
				7050	and extension of these two files follow the same rules as with regular C
				7051	parsers (@pxref{Invocation}).
				7052
				7053	The header is @emph{mandatory}; you must either pass
				7054	@option{-d}/@option{--defines} to @command{bison}, or use the
				7055	@samp{%defines} directive.
				7056	@end table
				7057
				7058	All these files are documented using Doxygen; run @command{doxygen}
				7059	for a complete and accurate documentation.
				7060
				7061	@node C++ Semantic Values
				7062	@subsection C++ Semantic Values
				7063	@c - No objects in unions
				7064	@c - YSTYPE
				7065	@c - Printer and destructor
				7066
				7067	The @code{%union} directive works as for C, see @ref{Union Decl, ,The
				7068	Collection of Value Types}. In particular it produces a genuine
				7069	@code{union}@footnote{In the future techniques to allow complex types
				7070	within pseudo-unions (similar to Boost variants) might be implemented to
				7071	alleviate these issues.}, which have a few specific features in C++.
				7072	@itemize @minus
				7073	@item
				7074	The type @code{YYSTYPE} is defined but its use is discouraged: rather
				7075	you should refer to the parser's encapsulated type
				7076	@code{yy::parser::semantic_type}.
				7077	@item
				7078	Non POD (Plain Old Data) types cannot be used. C++ forbids any
				7079	instance of classes with constructors in unions: only @emph{pointers}
				7080	to such objects are allowed.
				7081	@end itemize
				7082
				7083	Because objects have to be stored via pointers, memory is not
				7084	reclaimed automatically: using the @code{%destructor} directive is the
				7085	only means to avoid leaks. @xref{Destructor Decl, , Freeing Discarded
				7086	Symbols}.
				7087
				7088
				7089	@node C++ Location Values
				7090	@subsection C++ Location Values
				7091	@c - %locations
				7092	@c - class Position
				7093	@c - class Location
				7094	@c - %define "filename_type" "const symbol::Symbol"
				7095
				7096	When the directive @code{%locations} is used, the C++ parser supports
				7097	location tracking, see @ref{Locations, , Locations Overview}. Two
				7098	auxiliary classes define a @code{position}, a single point in a file,
				7099	and a @code{location}, a range composed of a pair of
				7100	@code{position}s (possibly spanning several files).
				7101
				7102	@deftypemethod {position} {std::string*} file
				7103	The name of the file. It will always be handled as a pointer, the
				7104	parser will never duplicate nor deallocate it. As an experimental
				7105	feature you may change it to @samp{@var{type}*} using @samp{%define
				7106	"filename_type" "@var{type}"}.
				7107	@end deftypemethod
				7108
				7109	@deftypemethod {position} {unsigned int} line
				7110	The line, starting at 1.
				7111	@end deftypemethod
				7112
				7113	@deftypemethod {position} {unsigned int} lines (int @var{height} = 1)
				7114	Advance by @var{height} lines, resetting the column number.
				7115	@end deftypemethod
				7116
				7117	@deftypemethod {position} {unsigned int} column
				7118	The column, starting at 0.
				7119	@end deftypemethod
				7120
				7121	@deftypemethod {position} {unsigned int} columns (int @var{width} = 1)
				7122	Advance by @var{width} columns, without changing the line number.
				7123	@end deftypemethod
				7124
				7125	@deftypemethod {position} {position&} operator+= (position& @var{pos}, int @var{width})
				7126	@deftypemethodx {position} {position} operator+ (const position& @var{pos}, int @var{width})
				7127	@deftypemethodx {position} {position&} operator-= (const position& @var{pos}, int @var{width})
				7128	@deftypemethodx {position} {position} operator- (position& @var{pos}, int @var{width})
				7129	Various forms of syntactic sugar for @code{columns}.
				7130	@end deftypemethod
				7131
				7132	@deftypemethod {position} {position} operator<< (std::ostream @var{o}, const position& @var{p})
				7133	Report @var{p} on @var{o} like this:
				7134	@samp{@var{file}:@var{line}.@var{column}}, or
				7135	@samp{@var{line}.@var{column}} if @var{file} is null.
				7136	@end deftypemethod
				7137
				7138	@deftypemethod {location} {position} begin
				7139	@deftypemethodx {location} {position} end
				7140	The first, inclusive, position of the range, and the first beyond.
				7141	@end deftypemethod
				7142
				7143	@deftypemethod {location} {unsigned int} columns (int @var{width} = 1)
				7144	@deftypemethodx {location} {unsigned int} lines (int @var{height} = 1)
				7145	Advance the @code{end} position.
				7146	@end deftypemethod
				7147
				7148	@deftypemethod {location} {location} operator+ (const location& @var{begin}, const location& @var{end})
				7149	@deftypemethodx {location} {location} operator+ (const location& @var{begin}, int @var{width})
				7150	@deftypemethodx {location} {location} operator+= (const location& @var{loc}, int @var{width})
				7151	Various forms of syntactic sugar.
				7152	@end deftypemethod
				7153
				7154	@deftypemethod {location} {void} step ()
				7155	Move @code{begin} onto @code{end}.
				7156	@end deftypemethod
				7157
				7158
				7159	@node C++ Parser Interface
				7160	@subsection C++ Parser Interface
				7161	@c - define parser_class_name
				7162	@c - Ctor
				7163	@c - parse, error, set_debug_level, debug_level, set_debug_stream,
				7164	@c debug_stream.
				7165	@c - Reporting errors
				7166
				7167	The output files @file{@var{output}.hh} and @file{@var{output}.cc}
				7168	declare and define the parser class in the namespace @code{yy}. The
				7169	class name defaults to @code{parser}, but may be changed using
				7170	@samp{%define "parser_class_name" "@var{name}"}. The interface of
				7171	this class is detailed below. It can be extended using the
				7172	@code{%parse-param} feature: its semantics is slightly changed since
				7173	it describes an additional member of the parser class, and an
				7174	additional argument for its constructor.
				7175
				7176	@defcv {Type} {parser} {semantic_value_type}
				7177	@defcvx {Type} {parser} {location_value_type}
				7178	The types for semantics value and locations.
				7179	@end defcv
				7180
				7181	@deftypemethod {parser} {} parser (@var{type1} @var{arg1}, ...)
				7182	Build a new parser object. There are no arguments by default, unless
				7183	@samp{%parse-param @{@var{type1} @var{arg1}@}} was used.
				7184	@end deftypemethod
				7185
				7186	@deftypemethod {parser} {int} parse ()
				7187	Run the syntactic analysis, and return 0 on success, 1 otherwise.
				7188	@end deftypemethod
				7189
				7190	@deftypemethod {parser} {std::ostream&} debug_stream ()
				7191	@deftypemethodx {parser} {void} set_debug_stream (std::ostream& @var{o})
				7192	Get or set the stream used for tracing the parsing. It defaults to
				7193	@code{std::cerr}.
				7194	@end deftypemethod
				7195
				7196	@deftypemethod {parser} {debug_level_type} debug_level ()
				7197	@deftypemethodx {parser} {void} set_debug_level (debug_level @var{l})
				7198	Get or set the tracing level. Currently its value is either 0, no trace,
				7199	or nonzero, full tracing.
				7200	@end deftypemethod
				7201
				7202	@deftypemethod {parser} {void} error (const location_type& @var{l}, const std::string& @var{m})
				7203	The definition for this member function must be supplied by the user:
				7204	the parser uses it to report a parser error occurring at @var{l},
				7205	described by @var{m}.
				7206	@end deftypemethod
				7207
				7208
				7209	@node C++ Scanner Interface
				7210	@subsection C++ Scanner Interface
				7211	@c - prefix for yylex.
				7212	@c - Pure interface to yylex
				7213	@c - %lex-param
				7214
				7215	The parser invokes the scanner by calling @code{yylex}. Contrary to C
				7216	parsers, C++ parsers are always pure: there is no point in using the
				7217	@code{%pure-parser} directive. Therefore the interface is as follows.
				7218
				7219	@deftypemethod {parser} {int} yylex (semantic_value_type& @var{yylval}, location_type& @var{yylloc}, @var{type1} @var{arg1}, ...)
				7220	Return the next token. Its type is the return value, its semantic
				7221	value and location being @var{yylval} and @var{yylloc}. Invocations of
				7222	@samp{%lex-param @{@var{type1} @var{arg1}@}} yield additional arguments.
				7223	@end deftypemethod
				7224
				7225
				7226	@node A Complete C++ Example
				7227	@section A Complete C++ Example
				7228
				7229	This section demonstrates the use of a C++ parser with a simple but
				7230	complete example. This example should be available on your system,
				7231	ready to compile, in the directory @dfn{../bison/examples/calc++}. It
				7232	focuses on the use of Bison, therefore the design of the various C++
				7233	classes is very naive: no accessors, no encapsulation of members etc.
				7234	We will use a Lex scanner, and more precisely, a Flex scanner, to
				7235	demonstrate the various interaction. A hand written scanner is
				7236	actually easier to interface with.
				7237
				7238	@menu
				7239	* Calc++ --- C++ Calculator:: The specifications
				7240	* Calc++ Parsing Driver:: An active parsing context
				7241	* Calc++ Parser:: A parser class
				7242	* Calc++ Scanner:: A pure C++ Flex scanner
				7243	* Calc++ Top Level:: Conducting the band
				7244	@end menu
				7245
				7246	@node Calc++ --- C++ Calculator
				7247	@subsection Calc++ --- C++ Calculator
				7248
				7249	Of course the grammar is dedicated to arithmetics, a single
				7250	expression, possibly preceded by variable assignments. An
				7251	environment containing possibly predefined variables such as
				7252	@code{one} and @code{two}, is exchanged with the parser. An example
				7253	of valid input follows.
				7254
				7255	@example
				7256	three := 3
				7257	seven := one + two * three
				7258	seven * seven
				7259	@end example
				7260
				7261	@node Calc++ Parsing Driver
				7262	@subsection Calc++ Parsing Driver
				7263	@c - An env
				7264	@c - A place to store error messages
				7265	@c - A place for the result
				7266
				7267	To support a pure interface with the parser (and the scanner) the
				7268	technique of the ``parsing context'' is convenient: a structure
				7269	containing all the data to exchange. Since, in addition to simply
				7270	launch the parsing, there are several auxiliary tasks to execute (open
				7271	the file for parsing, instantiate the parser etc.), we recommend
				7272	transforming the simple parsing context structure into a fully blown
				7273	@dfn{parsing driver} class.
				7274
				7275	The declaration of this driver class, @file{calc++-driver.hh}, is as
				7276	follows. The first part includes the CPP guard and imports the
				7277	required standard library components, and the declaration of the parser
				7278	class.
				7279
				7280	@comment file: calc++-driver.hh
				7281	@example
				7282	#ifndef CALCXX_DRIVER_HH
				7283	# define CALCXX_DRIVER_HH
				7284	# include <string>
				7285	# include <map>
				7286	# include "calc++-parser.hh"
				7287	@end example
				7288
				7289
				7290	@noindent
				7291	Then comes the declaration of the scanning function. Flex expects
				7292	the signature of @code{yylex} to be defined in the macro
				7293	@code{YY_DECL}, and the C++ parser expects it to be declared. We can
				7294	factor both as follows.
				7295
				7296	@comment file: calc++-driver.hh
				7297	@example
				7298	// Announce to Flex the prototype we want for lexing function, ...
				7299	# define YY_DECL \
				7300	yy::calcxx_parser::token_type \
				7301	yylex (yy::calcxx_parser::semantic_type* yylval, \
				7302	yy::calcxx_parser::location_type* yylloc, \
				7303	calcxx_driver& driver)
				7304	// ... and declare it for the parser's sake.
				7305	YY_DECL;
				7306	@end example
				7307
				7308	@noindent
				7309	The @code{calcxx_driver} class is then declared with its most obvious
				7310	members.
				7311
				7312	@comment file: calc++-driver.hh
				7313	@example
				7314	// Conducting the whole scanning and parsing of Calc++.
				7315	class calcxx_driver
				7316	@{
				7317	public:
				7318	calcxx_driver ();
				7319	virtual ~calcxx_driver ();
				7320
				7321	std::map<std::string, int> variables;
				7322
				7323	int result;
				7324	@end example
				7325
				7326	@noindent
				7327	To encapsulate the coordination with the Flex scanner, it is useful to
				7328	have two members function to open and close the scanning phase.
				7329	members.
				7330
				7331	@comment file: calc++-driver.hh
				7332	@example
				7333	// Handling the scanner.
				7334	void scan_begin ();
				7335	void scan_end ();
				7336	bool trace_scanning;
				7337	@end example
				7338
				7339	@noindent
				7340	Similarly for the parser itself.
				7341
				7342	@comment file: calc++-driver.hh
				7343	@example
				7344	// Handling the parser.
				7345	void parse (const std::string& f);
				7346	std::string file;
				7347	bool trace_parsing;
				7348	@end example
				7349
				7350	@noindent
				7351	To demonstrate pure handling of parse errors, instead of simply
				7352	dumping them on the standard error output, we will pass them to the
				7353	compiler driver using the following two member functions. Finally, we
				7354	close the class declaration and CPP guard.
				7355
				7356	@comment file: calc++-driver.hh
				7357	@example
				7358	// Error handling.
				7359	void error (const yy::location& l, const std::string& m);
				7360	void error (const std::string& m);
				7361	@};
				7362	#endif // ! CALCXX_DRIVER_HH
				7363	@end example
				7364
				7365	The implementation of the driver is straightforward. The @code{parse}
				7366	member function deserves some attention. The @code{error} functions
				7367	are simple stubs, they should actually register the located error
				7368	messages and set error state.
				7369
				7370	@comment file: calc++-driver.cc
				7371	@example
				7372	#include "calc++-driver.hh"
				7373	#include "calc++-parser.hh"
				7374
				7375	calcxx_driver::calcxx_driver ()
				7376	: trace_scanning (false), trace_parsing (false)
				7377	@{
				7378	variables["one"] = 1;
				7379	variables["two"] = 2;
				7380	@}
				7381
				7382	calcxx_driver::~calcxx_driver ()
				7383	@{
				7384	@}
				7385
				7386	void
				7387	calcxx_driver::parse (const std::string &f)
				7388	@{
				7389	file = f;
				7390	scan_begin ();
				7391	yy::calcxx_parser parser (*this);
				7392	parser.set_debug_level (trace_parsing);
				7393	parser.parse ();
				7394	scan_end ();
				7395	@}
				7396
				7397	void
				7398	calcxx_driver::error (const yy::location& l, const std::string& m)
				7399	@{
				7400	std::cerr << l << ": " << m << std::endl;
				7401	@}
				7402
				7403	void
				7404	calcxx_driver::error (const std::string& m)
				7405	@{
				7406	std::cerr << m << std::endl;
				7407	@}
				7408	@end example
				7409
				7410	@node Calc++ Parser
				7411	@subsection Calc++ Parser
				7412
				7413	The parser definition file @file{calc++-parser.yy} starts by asking for
				7414	the C++ LALR(1) skeleton, the creation of the parser header file, and
				7415	specifies the name of the parser class. Because the C++ skeleton
				7416	changed several times, it is safer to require the version you designed
				7417	the grammar for.
				7418
				7419	@comment file: calc++-parser.yy
				7420	@example
				7421	%skeleton "lalr1.cc" /* -- C++ -- */
				7422	%require "2.1a"
				7423	%defines
				7424	%define "parser_class_name" "calcxx_parser"
				7425	@end example
				7426
				7427	@noindent
				7428	Then come the declarations/inclusions needed to define the
				7429	@code{%union}. Because the parser uses the parsing driver and
				7430	reciprocally, both cannot include the header of the other. Because the
				7431	driver's header needs detailed knowledge about the parser class (in
				7432	particular its inner types), it is the parser's header which will simply
				7433	use a forward declaration of the driver.
				7434
				7435	@comment file: calc++-parser.yy
				7436	@example
				7437	%@{
				7438	# include <string>
				7439	class calcxx_driver;
				7440	%@}
				7441	@end example
				7442
				7443	@noindent
				7444	The driver is passed by reference to the parser and to the scanner.
				7445	This provides a simple but effective pure interface, not relying on
				7446	global variables.
				7447
				7448	@comment file: calc++-parser.yy
				7449	@example
				7450	// The parsing context.
				7451	%parse-param @{ calcxx_driver& driver @}
				7452	%lex-param @{ calcxx_driver& driver @}
				7453	@end example
				7454
				7455	@noindent
				7456	Then we request the location tracking feature, and initialize the
				7457	first location's file name. Afterwards new locations are computed
				7458	relatively to the previous locations: the file name will be
				7459	automatically propagated.
				7460
				7461	@comment file: calc++-parser.yy
				7462	@example
				7463	%locations
				7464	%initial-action
				7465	@{
				7466	// Initialize the initial location.
				7467	@@$.begin.filename = @@$.end.filename = &driver.file;
				7468	@};
				7469	@end example
				7470
				7471	@noindent
				7472	Use the two following directives to enable parser tracing and verbose
				7473	error messages.
				7474
				7475	@comment file: calc++-parser.yy
				7476	@example
				7477	%debug
				7478	%error-verbose
				7479	@end example
				7480
				7481	@noindent
				7482	Semantic values cannot use ``real'' objects, but only pointers to
				7483	them.
				7484
				7485	@comment file: calc++-parser.yy
				7486	@example
				7487	// Symbols.
				7488	%union
				7489	@{
				7490	int ival;
				7491	std::string *sval;
				7492	@};
				7493	@end example
				7494
				7495	@noindent
				7496	The code between @samp{%@{} and @samp{%@}} after the introduction of the
				7497	@samp{%union} is output in the @file{*.cc} file; it needs detailed
				7498	knowledge about the driver.
				7499
				7500	@comment file: calc++-parser.yy
				7501	@example
				7502	%@{
				7503	# include "calc++-driver.hh"
				7504	%@}
				7505	@end example
				7506
				7507
				7508	@noindent
				7509	The token numbered as 0 corresponds to end of file; the following line
				7510	allows for nicer error messages referring to ``end of file'' instead
				7511	of ``$end''. Similarly user friendly named are provided for each
				7512	symbol. Note that the tokens names are prefixed by @code{TOKEN_} to
				7513	avoid name clashes.
				7514
				7515	@comment file: calc++-parser.yy
				7516	@example
				7517	%token END 0 "end of file"
				7518	%token ASSIGN ":="
				7519	%token <sval> IDENTIFIER "identifier"
				7520	%token <ival> NUMBER "number"
				7521	%type <ival> exp "expression"
				7522	@end example
				7523
				7524	@noindent
				7525	To enable memory deallocation during error recovery, use
				7526	@code{%destructor}.
				7527
				7528	@c FIXME: Document %printer, and mention that it takes a braced-code operand.
				7529	@comment file: calc++-parser.yy
				7530	@example
				7531	%printer @{ debug_stream () << *$$; @} "identifier"
				7532	%destructor @{ delete $$; @} "identifier"
				7533
				7534	%printer @{ debug_stream () << $$; @} "number" "expression"
				7535	@end example
				7536
				7537	@noindent
				7538	The grammar itself is straightforward.
				7539
				7540	@comment file: calc++-parser.yy
				7541	@example
				7542	%%
				7543	%start unit;
				7544	unit: assignments exp @{ driver.result = $2; @};
				7545
				7546	assignments: assignments assignment @{@}
				7547	\| /* Nothing. */ @{@};
				7548
				7549	assignment: "identifier" ":=" exp @{ driver.variables[*$1] = $3; @};
				7550
				7551	%left '+' '-';
				7552	%left '*' '/';
				7553	exp: exp '+' exp @{ $$ = $1 + $3; @}
				7554	\| exp '-' exp @{ $$ = $1 - $3; @}
				7555	\| exp '' exp @{ $$ = $1 $3; @}
				7556	\| exp '/' exp @{ $$ = $1 / $3; @}
				7557	\| "identifier" @{ $$ = driver.variables[*$1]; @}
				7558	\| "number" @{ $$ = $1; @};
				7559	%%
				7560	@end example
				7561
				7562	@noindent
				7563	Finally the @code{error} member function registers the errors to the
				7564	driver.
				7565
				7566	@comment file: calc++-parser.yy
				7567	@example
				7568	void
				7569	yy::calcxx_parser::error (const yy::calcxx_parser::location_type& l,
				7570	const std::string& m)
				7571	@{
				7572	driver.error (l, m);
				7573	@}
				7574	@end example
				7575
				7576	@node Calc++ Scanner
				7577	@subsection Calc++ Scanner
				7578
				7579	The Flex scanner first includes the driver declaration, then the
				7580	parser's to get the set of defined tokens.
				7581
				7582	@comment file: calc++-scanner.ll
				7583	@example
				7584	%@{ /* -- C++ -- */
				7585	# include <cstdlib>
				7586	# include <errno.h>
				7587	# include <limits.h>
				7588	# include <string>
				7589	# include "calc++-driver.hh"
				7590	# include "calc++-parser.hh"
				7591
				7592	/* Work around an incompatibility in flex (at least versions
				7593	2.5.31 through 2.5.33): it generates code that does
				7594	not conform to C89. See Debian bug 333231
				7595	<http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=333231>. */
				7596	# undef yywrap
				7597	# define yywrap() 1
				7598
				7599	/* By default yylex returns int, we use token_type.
				7600	Unfortunately yyterminate by default returns 0, which is
				7601	not of token_type. */
				7602	#define yyterminate() return token::END
				7603	%@}
				7604	@end example
				7605
				7606	@noindent
				7607	Because there is no @code{#include}-like feature we don't need
				7608	@code{yywrap}, we don't need @code{unput} either, and we parse an
				7609	actual file, this is not an interactive session with the user.
				7610	Finally we enable the scanner tracing features.
				7611
				7612	@comment file: calc++-scanner.ll
				7613	@example
				7614	%option noyywrap nounput batch debug
				7615	@end example
				7616
				7617	@noindent
				7618	Abbreviations allow for more readable rules.
				7619
				7620	@comment file: calc++-scanner.ll
				7621	@example
				7622	id [a-zA-Z][a-zA-Z_0-9]*
				7623	int [0-9]+
				7624	blank [ \t]
				7625	@end example
				7626
				7627	@noindent
				7628	The following paragraph suffices to track locations accurately. Each
				7629	time @code{yylex} is invoked, the begin position is moved onto the end
				7630	position. Then when a pattern is matched, the end position is
				7631	advanced of its width. In case it matched ends of lines, the end
				7632	cursor is adjusted, and each time blanks are matched, the begin cursor
				7633	is moved onto the end cursor to effectively ignore the blanks
				7634	preceding tokens. Comments would be treated equally.
				7635
				7636	@comment file: calc++-scanner.ll
				7637	@example
				7638	%@{
				7639	# define YY_USER_ACTION yylloc->columns (yyleng);
				7640	%@}
				7641	%%
				7642	%@{
				7643	yylloc->step ();
				7644	%@}
				7645	@{blank@}+ yylloc->step ();
				7646	[\n]+ yylloc->lines (yyleng); yylloc->step ();
				7647	@end example
				7648
				7649	@noindent
				7650	The rules are simple, just note the use of the driver to report errors.
				7651	It is convenient to use a typedef to shorten
				7652	@code{yy::calcxx_parser::token::identifier} into
				7653	@code{token::identifier} for instance.
				7654
				7655	@comment file: calc++-scanner.ll
				7656	@example
				7657	%@{
				7658	typedef yy::calcxx_parser::token token;
				7659	%@}
				7660	/* Convert ints to the actual type of tokens. */
				7661	[-+*/] return yy::calcxx_parser::token_type (yytext[0]);
				7662	":=" return token::ASSIGN;
				7663	@{int@} @{
				7664	errno = 0;
				7665	long n = strtol (yytext, NULL, 10);
				7666	if (! (INT_MIN <= n && n <= INT_MAX && errno != ERANGE))
				7667	driver.error (*yylloc, "integer is out of range");
				7668	yylval->ival = n;
				7669	return token::NUMBER;
				7670	@}
				7671	@{id@} yylval->sval = new std::string (yytext); return token::IDENTIFIER;
				7672	. driver.error (*yylloc, "invalid character");
				7673	%%
				7674	@end example
				7675
				7676	@noindent
				7677	Finally, because the scanner related driver's member function depend
				7678	on the scanner's data, it is simpler to implement them in this file.
				7679
				7680	@comment file: calc++-scanner.ll
				7681	@example
				7682	void
				7683	calcxx_driver::scan_begin ()
				7684	@{
				7685	yy_flex_debug = trace_scanning;
				7686	if (!(yyin = fopen (file.c_str (), "r")))
				7687	error (std::string ("cannot open ") + file);
				7688	@}
				7689
				7690	void
				7691	calcxx_driver::scan_end ()
				7692	@{
				7693	fclose (yyin);
				7694	@}
				7695	@end example
				7696
				7697	@node Calc++ Top Level
				7698	@subsection Calc++ Top Level
				7699
				7700	The top level file, @file{calc++.cc}, poses no problem.
				7701
				7702	@comment file: calc++.cc
				7703	@example
				7704	#include <iostream>
				7705	#include "calc++-driver.hh"
				7706
				7707	int
				7708	main (int argc, char *argv[])
				7709	@{
				7710	calcxx_driver driver;
				7711	for (++argv; argv[0]; ++argv)
				7712	if (*argv == std::string ("-p"))
				7713	driver.trace_parsing = true;
				7714	else if (*argv == std::string ("-s"))
				7715	driver.trace_scanning = true;
				7716	else
				7717	@{
				7718	driver.parse (*argv);
				7719	std::cout << driver.result << std::endl;
				7720	@}
				7721	@}
				7722	@end example
				7723
				7724	@c ================================================= FAQ
				7725
				7726	@node FAQ
				7727	@chapter Frequently Asked Questions
				7728	@cindex frequently asked questions
				7729	@cindex questions
				7730
				7731	Several questions about Bison come up occasionally. Here some of them
				7732	are addressed.
				7733
				7734	@menu
				7735	* Memory Exhausted:: Breaking the Stack Limits
				7736	* How Can I Reset the Parser:: @code{yyparse} Keeps some State
				7737	* Strings are Destroyed:: @code{yylval} Loses Track of Strings
				7738	* Implementing Gotos/Loops:: Control Flow in the Calculator
				7739	* Multiple start-symbols:: Factoring closely related grammars
				7740	* Secure? Conform?:: Is Bison @acronym{POSIX} safe?
				7741	* I can't build Bison:: Troubleshooting
				7742	* Where can I find help?:: Troubleshouting
				7743	* Bug Reports:: Troublereporting
				7744	* Other Languages:: Parsers in Java and others
				7745	* Beta Testing:: Experimenting development versions
				7746	* Mailing Lists:: Meeting other Bison users
				7747	@end menu
				7748
				7749	@node Memory Exhausted
				7750	@section Memory Exhausted
				7751
				7752	@display
				7753	My parser returns with error with a @samp{memory exhausted}
				7754	message. What can I do?
				7755	@end display
				7756
				7757	This question is already addressed elsewhere, @xref{Recursion,
				7758	,Recursive Rules}.
				7759
				7760	@node How Can I Reset the Parser
				7761	@section How Can I Reset the Parser
				7762
				7763	The following phenomenon has several symptoms, resulting in the
				7764	following typical questions:
				7765
				7766	@display
				7767	I invoke @code{yyparse} several times, and on correct input it works
				7768	properly; but when a parse error is found, all the other calls fail
				7769	too. How can I reset the error flag of @code{yyparse}?
				7770	@end display
				7771
				7772	@noindent
				7773	or
				7774
				7775	@display
				7776	My parser includes support for an @samp{#include}-like feature, in
				7777	which case I run @code{yyparse} from @code{yyparse}. This fails
				7778	although I did specify I needed a @code{%pure-parser}.
				7779	@end display
				7780
				7781	These problems typically come not from Bison itself, but from
				7782	Lex-generated scanners. Because these scanners use large buffers for
				7783	speed, they might not notice a change of input file. As a
				7784	demonstration, consider the following source file,
				7785	@file{first-line.l}:
				7786
				7787	@verbatim
				7788	%{
				7789	#include <stdio.h>
				7790	#include <stdlib.h>
				7791	%}
				7792	%%
				7793	.*\n ECHO; return 1;
				7794	%%
				7795	int
				7796	yyparse (char const *file)
				7797	{
				7798	yyin = fopen (file, "r");
				7799	if (!yyin)
				7800	exit (2);
				7801	/* One token only. */
				7802	yylex ();
				7803	if (fclose (yyin) != 0)
				7804	exit (3);
				7805	return 0;
				7806	}
				7807
				7808	int
				7809	main (void)
				7810	{
				7811	yyparse ("input");
				7812	yyparse ("input");
				7813	return 0;
				7814	}
				7815	@end verbatim
				7816
				7817	@noindent
				7818	If the file @file{input} contains
				7819
				7820	@verbatim
				7821	input:1: Hello,
				7822	input:2: World!
				7823	@end verbatim
				7824
				7825	@noindent
				7826	then instead of getting the first line twice, you get:
				7827
				7828	@example
				7829	$ @kbd{flex -ofirst-line.c first-line.l}
				7830	$ @kbd{gcc -ofirst-line first-line.c -ll}
				7831	$ @kbd{./first-line}
				7832	input:1: Hello,
				7833	input:2: World!
				7834	@end example
				7835
				7836	Therefore, whenever you change @code{yyin}, you must tell the
				7837	Lex-generated scanner to discard its current buffer and switch to the
				7838	new one. This depends upon your implementation of Lex; see its
				7839	documentation for more. For Flex, it suffices to call
				7840	@samp{YY_FLUSH_BUFFER} after each change to @code{yyin}. If your
				7841	Flex-generated scanner needs to read from several input streams to
				7842	handle features like include files, you might consider using Flex
				7843	functions like @samp{yy_switch_to_buffer} that manipulate multiple
				7844	input buffers.
				7845
				7846	If your Flex-generated scanner uses start conditions (@pxref{Start
				7847	conditions, , Start conditions, flex, The Flex Manual}), you might
				7848	also want to reset the scanner's state, i.e., go back to the initial
				7849	start condition, through a call to @samp{BEGIN (0)}.
				7850
				7851	@node Strings are Destroyed
				7852	@section Strings are Destroyed
				7853
				7854	@display
				7855	My parser seems to destroy old strings, or maybe it loses track of
				7856	them. Instead of reporting @samp{"foo", "bar"}, it reports
				7857	@samp{"bar", "bar"}, or even @samp{"foo\nbar", "bar"}.
				7858	@end display
				7859
				7860	This error is probably the single most frequent ``bug report'' sent to
				7861	Bison lists, but is only concerned with a misunderstanding of the role
				7862	of the scanner. Consider the following Lex code:
				7863
				7864	@verbatim
				7865	%{
				7866	#include <stdio.h>
				7867	char *yylval = NULL;
				7868	%}
				7869	%%
				7870	.* yylval = yytext; return 1;
				7871	\n /* IGNORE */
				7872	%%
				7873	int
				7874	main ()
				7875	{
				7876	/* Similar to using $1, $2 in a Bison action. */
				7877	char *fst = (yylex (), yylval);
				7878	char *snd = (yylex (), yylval);
				7879	printf ("\"%s\", \"%s\"\n", fst, snd);
				7880	return 0;
				7881	}
				7882	@end verbatim
				7883
				7884	If you compile and run this code, you get:
				7885
				7886	@example
				7887	$ @kbd{flex -osplit-lines.c split-lines.l}
				7888	$ @kbd{gcc -osplit-lines split-lines.c -ll}
				7889	$ @kbd{printf 'one\ntwo\n' \| ./split-lines}
				7890	"one
				7891	two", "two"
				7892	@end example
				7893
				7894	@noindent
				7895	this is because @code{yytext} is a buffer provided for @emph{reading}
				7896	in the action, but if you want to keep it, you have to duplicate it
				7897	(e.g., using @code{strdup}). Note that the output may depend on how
				7898	your implementation of Lex handles @code{yytext}. For instance, when
				7899	given the Lex compatibility option @option{-l} (which triggers the
				7900	option @samp{%array}) Flex generates a different behavior:
				7901
				7902	@example
				7903	$ @kbd{flex -l -osplit-lines.c split-lines.l}
				7904	$ @kbd{gcc -osplit-lines split-lines.c -ll}
				7905	$ @kbd{printf 'one\ntwo\n' \| ./split-lines}
				7906	"two", "two"
				7907	@end example
				7908
				7909
				7910	@node Implementing Gotos/Loops
				7911	@section Implementing Gotos/Loops
				7912
				7913	@display
				7914	My simple calculator supports variables, assignments, and functions,
				7915	but how can I implement gotos, or loops?
				7916	@end display
				7917
				7918	Although very pedagogical, the examples included in the document blur
				7919	the distinction to make between the parser---whose job is to recover
				7920	the structure of a text and to transmit it to subsequent modules of
				7921	the program---and the processing (such as the execution) of this
				7922	structure. This works well with so called straight line programs,
				7923	i.e., precisely those that have a straightforward execution model:
				7924	execute simple instructions one after the others.
				7925
				7926	@cindex abstract syntax tree
				7927	@cindex @acronym{AST}
				7928	If you want a richer model, you will probably need to use the parser
				7929	to construct a tree that does represent the structure it has
				7930	recovered; this tree is usually called the @dfn{abstract syntax tree},
				7931	or @dfn{@acronym{AST}} for short. Then, walking through this tree,
				7932	traversing it in various ways, will enable treatments such as its
				7933	execution or its translation, which will result in an interpreter or a
				7934	compiler.
				7935
				7936	This topic is way beyond the scope of this manual, and the reader is
				7937	invited to consult the dedicated literature.
				7938
				7939
				7940	@node Multiple start-symbols
				7941	@section Multiple start-symbols
				7942
				7943	@display
				7944	I have several closely related grammars, and I would like to share their
				7945	implementations. In fact, I could use a single grammar but with
				7946	multiple entry points.
				7947	@end display
				7948
				7949	Bison does not support multiple start-symbols, but there is a very
				7950	simple means to simulate them. If @code{foo} and @code{bar} are the two
				7951	pseudo start-symbols, then introduce two new tokens, say
				7952	@code{START_FOO} and @code{START_BAR}, and use them as switches from the
				7953	real start-symbol:
				7954
				7955	@example
				7956	%token START_FOO START_BAR;
				7957	%start start;
				7958	start: START_FOO foo
				7959	\| START_BAR bar;
				7960	@end example
				7961
				7962	These tokens prevents the introduction of new conflicts. As far as the
				7963	parser goes, that is all that is needed.
				7964
				7965	Now the difficult part is ensuring that the scanner will send these
				7966	tokens first. If your scanner is hand-written, that should be
				7967	straightforward. If your scanner is generated by Lex, them there is
				7968	simple means to do it: recall that anything between @samp{%@{ ... %@}}
				7969	after the first @code{%%} is copied verbatim in the top of the generated
				7970	@code{yylex} function. Make sure a variable @code{start_token} is
				7971	available in the scanner (e.g., a global variable or using
				7972	@code{%lex-param} etc.), and use the following:
				7973
				7974	@example
				7975	/* @r{Prologue.} */
				7976	%%
				7977	%@{
				7978	if (start_token)
				7979	@{
				7980	int t = start_token;
				7981	start_token = 0;
				7982	return t;
				7983	@}
				7984	%@}
				7985	/* @r{The rules.} */
				7986	@end example
				7987
				7988
				7989	@node Secure? Conform?
				7990	@section Secure? Conform?
				7991
				7992	@display
				7993	Is Bison secure? Does it conform to POSIX?
				7994	@end display
				7995
				7996	If you're looking for a guarantee or certification, we don't provide it.
				7997	However, Bison is intended to be a reliable program that conforms to the
				7998	@acronym{POSIX} specification for Yacc. If you run into problems,
				7999	please send us a bug report.
				8000
				8001	@node I can't build Bison
				8002	@section I can't build Bison
				8003
				8004	@display
				8005	I can't build Bison because @command{make} complains that
				8006	@code{msgfmt} is not found.
				8007	What should I do?
				8008	@end display
				8009
				8010	Like most GNU packages with internationalization support, that feature
				8011	is turned on by default. If you have problems building in the @file{po}
				8012	subdirectory, it indicates that your system's internationalization
				8013	support is lacking. You can re-configure Bison with
				8014	@option{--disable-nls} to turn off this support, or you can install GNU
				8015	gettext from @url{ftp://ftp.gnu.org/gnu/gettext/} and re-configure
				8016	Bison. See the file @file{ABOUT-NLS} for more information.
				8017
				8018
				8019	@node Where can I find help?
				8020	@section Where can I find help?
				8021
				8022	@display
				8023	I'm having trouble using Bison. Where can I find help?
				8024	@end display
				8025
				8026	First, read this fine manual. Beyond that, you can send mail to
				8027	@email{help-bison@@gnu.org}. This mailing list is intended to be
				8028	populated with people who are willing to answer questions about using
				8029	and installing Bison. Please keep in mind that (most of) the people on
				8030	the list have aspects of their lives which are not related to Bison (!),
				8031	so you may not receive an answer to your question right away. This can
				8032	be frustrating, but please try not to honk them off; remember that any
				8033	help they provide is purely voluntary and out of the kindness of their
				8034	hearts.
				8035
				8036	@node Bug Reports
				8037	@section Bug Reports
				8038
				8039	@display
				8040	I found a bug. What should I include in the bug report?
				8041	@end display
				8042
				8043	Before you send a bug report, make sure you are using the latest
				8044	version. Check @url{ftp://ftp.gnu.org/pub/gnu/bison/} or one of its
				8045	mirrors. Be sure to include the version number in your bug report. If
				8046	the bug is present in the latest version but not in a previous version,
				8047	try to determine the most recent version which did not contain the bug.
				8048
				8049	If the bug is parser-related, you should include the smallest grammar
				8050	you can which demonstrates the bug. The grammar file should also be
				8051	complete (i.e., I should be able to run it through Bison without having
				8052	to edit or add anything). The smaller and simpler the grammar, the
				8053	easier it will be to fix the bug.
				8054
				8055	Include information about your compilation environment, including your
				8056	operating system's name and version and your compiler's name and
				8057	version. If you have trouble compiling, you should also include a
				8058	transcript of the build session, starting with the invocation of
				8059	`configure'. Depending on the nature of the bug, you may be asked to
				8060	send additional files as well (such as `config.h' or `config.cache').
				8061
				8062	Patches are most welcome, but not required. That is, do not hesitate to
				8063	send a bug report just because you can not provide a fix.
				8064
				8065	Send bug reports to @email{bug-bison@@gnu.org}.
				8066
				8067	@node Other Languages
				8068	@section Other Languages
				8069
				8070	@display
				8071	Will Bison ever have C++ support? How about Java or @var{insert your
				8072	favorite language here}?
				8073	@end display
				8074
				8075	C++ support is there now, and is documented. We'd love to add other
				8076	languages; contributions are welcome.
				8077
				8078	@node Beta Testing
				8079	@section Beta Testing
				8080
				8081	@display
				8082	What is involved in being a beta tester?
				8083	@end display
				8084
				8085	It's not terribly involved. Basically, you would download a test
				8086	release, compile it, and use it to build and run a parser or two. After
				8087	that, you would submit either a bug report or a message saying that
				8088	everything is okay. It is important to report successes as well as
				8089	failures because test releases eventually become mainstream releases,
				8090	but only if they are adequately tested. If no one tests, development is
				8091	essentially halted.
				8092
				8093	Beta testers are particularly needed for operating systems to which the
				8094	developers do not have easy access. They currently have easy access to
				8095	recent GNU/Linux and Solaris versions. Reports about other operating
				8096	systems are especially welcome.
				8097
				8098	@node Mailing Lists
				8099	@section Mailing Lists
				8100
				8101	@display
				8102	How do I join the help-bison and bug-bison mailing lists?
				8103	@end display
				8104
				8105	See @url{http://lists.gnu.org/}.
				8106
				8107	@c ================================================= Table of Symbols
				8108
				8109	@node Table of Symbols
				8110	@appendix Bison Symbols
				8111	@cindex Bison symbols, table of
				8112	@cindex symbols in Bison, table of
				8113
				8114	@deffn {Variable} @@$
				8115	In an action, the location of the left-hand side of the rule.
				8116	@xref{Locations, , Locations Overview}.
				8117	@end deffn
				8118
				8119	@deffn {Variable} @@@var{n}
				8120	In an action, the location of the @var{n}-th symbol of the right-hand
				8121	side of the rule. @xref{Locations, , Locations Overview}.
				8122	@end deffn
				8123
				8124	@deffn {Variable} $$
				8125	In an action, the semantic value of the left-hand side of the rule.
				8126	@xref{Actions}.
				8127	@end deffn
				8128
				8129	@deffn {Variable} $@var{n}
				8130	In an action, the semantic value of the @var{n}-th symbol of the
				8131	right-hand side of the rule. @xref{Actions}.
				8132	@end deffn
				8133
				8134	@deffn {Delimiter} %%
				8135	Delimiter used to separate the grammar rule section from the
				8136	Bison declarations section or the epilogue.
				8137	@xref{Grammar Layout, ,The Overall Layout of a Bison Grammar}.
				8138	@end deffn
				8139
				8140	@c Don't insert spaces, or check the DVI output.
				8141	@deffn {Delimiter} %@{@var{code}%@}
				8142	All code listed between @samp{%@{} and @samp{%@}} is copied directly to
				8143	the output file uninterpreted. Such code forms the prologue of the input
				8144	file. @xref{Grammar Outline, ,Outline of a Bison
				8145	Grammar}.
				8146	@end deffn
				8147
				8148	@deffn {Construct} /@dots{}/
				8149	Comment delimiters, as in C.
				8150	@end deffn
				8151
				8152	@deffn {Delimiter} :
				8153	Separates a rule's result from its components. @xref{Rules, ,Syntax of
				8154	Grammar Rules}.
				8155	@end deffn
				8156
				8157	@deffn {Delimiter} ;
				8158	Terminates a rule. @xref{Rules, ,Syntax of Grammar Rules}.
				8159	@end deffn
				8160
				8161	@deffn {Delimiter} \|
				8162	Separates alternate rules for the same result nonterminal.
				8163	@xref{Rules, ,Syntax of Grammar Rules}.
				8164	@end deffn
				8165
				8166	@deffn {Symbol} $accept
				8167	The predefined nonterminal whose only rule is @samp{$accept: @var{start}
				8168	$end}, where @var{start} is the start symbol. @xref{Start Decl, , The
				8169	Start-Symbol}. It cannot be used in the grammar.
				8170	@end deffn
				8171
				8172	@deffn {Directive} %debug
				8173	Equip the parser for debugging. @xref{Decl Summary}.
				8174	@end deffn
				8175
				8176	@ifset defaultprec
				8177	@deffn {Directive} %default-prec
				8178	Assign a precedence to rules that lack an explicit @samp{%prec}
				8179	modifier. @xref{Contextual Precedence, ,Context-Dependent
				8180	Precedence}.
				8181	@end deffn
				8182	@end ifset
				8183
				8184	@deffn {Directive} %defines
				8185	Bison declaration to create a header file meant for the scanner.
				8186	@xref{Decl Summary}.
				8187	@end deffn
				8188
				8189	@deffn {Directive} %destructor
				8190	Specify how the parser should reclaim the memory associated to
				8191	discarded symbols. @xref{Destructor Decl, , Freeing Discarded Symbols}.
				8192	@end deffn
				8193
				8194	@deffn {Directive} %dprec
				8195	Bison declaration to assign a precedence to a rule that is used at parse
				8196	time to resolve reduce/reduce conflicts. @xref{GLR Parsers, ,Writing
				8197	@acronym{GLR} Parsers}.
				8198	@end deffn
				8199
				8200	@deffn {Symbol} $end
				8201	The predefined token marking the end of the token stream. It cannot be
				8202	used in the grammar.
				8203	@end deffn
				8204
				8205	@deffn {Symbol} error
				8206	A token name reserved for error recovery. This token may be used in
				8207	grammar rules so as to allow the Bison parser to recognize an error in
				8208	the grammar without halting the process. In effect, a sentence
				8209	containing an error may be recognized as valid. On a syntax error, the
				8210	token @code{error} becomes the current look-ahead token. Actions
				8211	corresponding to @code{error} are then executed, and the look-ahead
				8212	token is reset to the token that originally caused the violation.
				8213	@xref{Error Recovery}.
				8214	@end deffn
				8215
				8216	@deffn {Directive} %error-verbose
				8217	Bison declaration to request verbose, specific error message strings
				8218	when @code{yyerror} is called.
				8219	@end deffn
				8220
				8221	@deffn {Directive} %file-prefix="@var{prefix}"
				8222	Bison declaration to set the prefix of the output files. @xref{Decl
				8223	Summary}.
				8224	@end deffn
				8225
				8226	@deffn {Directive} %glr-parser
				8227	Bison declaration to produce a @acronym{GLR} parser. @xref{GLR
				8228	Parsers, ,Writing @acronym{GLR} Parsers}.
				8229	@end deffn
				8230
				8231	@deffn {Directive} %initial-action
				8232	Run user code before parsing. @xref{Initial Action Decl, , Performing Actions before Parsing}.
				8233	@end deffn
				8234
				8235	@deffn {Directive} %left
				8236	Bison declaration to assign left associativity to token(s).
				8237	@xref{Precedence Decl, ,Operator Precedence}.
				8238	@end deffn
				8239
				8240	@deffn {Directive} %lex-param @{@var{argument-declaration}@}
				8241	Bison declaration to specifying an additional parameter that
				8242	@code{yylex} should accept. @xref{Pure Calling,, Calling Conventions
				8243	for Pure Parsers}.
				8244	@end deffn
				8245
				8246	@deffn {Directive} %merge
				8247	Bison declaration to assign a merging function to a rule. If there is a
				8248	reduce/reduce conflict with a rule having the same merging function, the
				8249	function is applied to the two semantic values to get a single result.
				8250	@xref{GLR Parsers, ,Writing @acronym{GLR} Parsers}.
				8251	@end deffn
				8252
				8253	@deffn {Directive} %name-prefix="@var{prefix}"
				8254	Bison declaration to rename the external symbols. @xref{Decl Summary}.
				8255	@end deffn
				8256
				8257	@ifset defaultprec
				8258	@deffn {Directive} %no-default-prec
				8259	Do not assign a precedence to rules that lack an explicit @samp{%prec}
				8260	modifier. @xref{Contextual Precedence, ,Context-Dependent
				8261	Precedence}.
				8262	@end deffn
				8263	@end ifset
				8264
				8265	@deffn {Directive} %no-lines
				8266	Bison declaration to avoid generating @code{#line} directives in the
				8267	parser file. @xref{Decl Summary}.
				8268	@end deffn
				8269
				8270	@deffn {Directive} %nonassoc
				8271	Bison declaration to assign nonassociativity to token(s).
				8272	@xref{Precedence Decl, ,Operator Precedence}.
				8273	@end deffn
				8274
				8275	@deffn {Directive} %output="@var{file}"
				8276	Bison declaration to set the name of the parser file. @xref{Decl
				8277	Summary}.
				8278	@end deffn
				8279
				8280	@deffn {Directive} %parse-param @{@var{argument-declaration}@}
				8281	Bison declaration to specifying an additional parameter that
				8282	@code{yyparse} should accept. @xref{Parser Function,, The Parser
				8283	Function @code{yyparse}}.
				8284	@end deffn
				8285
				8286	@deffn {Directive} %prec
				8287	Bison declaration to assign a precedence to a specific rule.
				8288	@xref{Contextual Precedence, ,Context-Dependent Precedence}.
				8289	@end deffn
				8290
				8291	@deffn {Directive} %pure-parser
				8292	Bison declaration to request a pure (reentrant) parser.
				8293	@xref{Pure Decl, ,A Pure (Reentrant) Parser}.
				8294	@end deffn
				8295
				8296	@deffn {Directive} %require "@var{version}"
				8297	Require version @var{version} or higher of Bison. @xref{Require Decl, ,
				8298	Require a Version of Bison}.
				8299	@end deffn
				8300
				8301	@deffn {Directive} %right
				8302	Bison declaration to assign right associativity to token(s).
				8303	@xref{Precedence Decl, ,Operator Precedence}.
				8304	@end deffn
				8305
				8306	@deffn {Directive} %start
				8307	Bison declaration to specify the start symbol. @xref{Start Decl, ,The
				8308	Start-Symbol}.
				8309	@end deffn
				8310
				8311	@deffn {Directive} %token
				8312	Bison declaration to declare token(s) without specifying precedence.
				8313	@xref{Token Decl, ,Token Type Names}.
				8314	@end deffn
				8315
				8316	@deffn {Directive} %token-table
				8317	Bison declaration to include a token name table in the parser file.
				8318	@xref{Decl Summary}.
				8319	@end deffn
				8320
				8321	@deffn {Directive} %type
				8322	Bison declaration to declare nonterminals. @xref{Type Decl,
				8323	,Nonterminal Symbols}.
				8324	@end deffn
				8325
				8326	@deffn {Symbol} $undefined
				8327	The predefined token onto which all undefined values returned by
				8328	@code{yylex} are mapped. It cannot be used in the grammar, rather, use
				8329	@code{error}.
				8330	@end deffn
				8331
				8332	@deffn {Directive} %union
				8333	Bison declaration to specify several possible data types for semantic
				8334	values. @xref{Union Decl, ,The Collection of Value Types}.
				8335	@end deffn
				8336
				8337	@deffn {Macro} YYABORT
				8338	Macro to pretend that an unrecoverable syntax error has occurred, by
				8339	making @code{yyparse} return 1 immediately. The error reporting
				8340	function @code{yyerror} is not called. @xref{Parser Function, ,The
				8341	Parser Function @code{yyparse}}.
				8342	@end deffn
				8343
				8344	@deffn {Macro} YYACCEPT
				8345	Macro to pretend that a complete utterance of the language has been
				8346	read, by making @code{yyparse} return 0 immediately.
				8347	@xref{Parser Function, ,The Parser Function @code{yyparse}}.
				8348	@end deffn
				8349
				8350	@deffn {Macro} YYBACKUP
				8351	Macro to discard a value from the parser stack and fake a look-ahead
				8352	token. @xref{Action Features, ,Special Features for Use in Actions}.
				8353	@end deffn
				8354
				8355	@deffn {Variable} yychar
				8356	External integer variable that contains the integer value of the
				8357	look-ahead token. (In a pure parser, it is a local variable within
				8358	@code{yyparse}.) Error-recovery rule actions may examine this variable.
				8359	@xref{Action Features, ,Special Features for Use in Actions}.
				8360	@end deffn
				8361
				8362	@deffn {Variable} yyclearin
				8363	Macro used in error-recovery rule actions. It clears the previous
				8364	look-ahead token. @xref{Error Recovery}.
				8365	@end deffn
				8366
				8367	@deffn {Macro} YYDEBUG
				8368	Macro to define to equip the parser with tracing code. @xref{Tracing,
				8369	,Tracing Your Parser}.
				8370	@end deffn
				8371
				8372	@deffn {Variable} yydebug
				8373	External integer variable set to zero by default. If @code{yydebug}
				8374	is given a nonzero value, the parser will output information on input
				8375	symbols and parser action. @xref{Tracing, ,Tracing Your Parser}.
				8376	@end deffn
				8377
				8378	@deffn {Macro} yyerrok
				8379	Macro to cause parser to recover immediately to its normal mode
				8380	after a syntax error. @xref{Error Recovery}.
				8381	@end deffn
				8382
				8383	@deffn {Macro} YYERROR
				8384	Macro to pretend that a syntax error has just been detected: call
				8385	@code{yyerror} and then perform normal error recovery if possible
				8386	(@pxref{Error Recovery}), or (if recovery is impossible) make
				8387	@code{yyparse} return 1. @xref{Error Recovery}.
				8388	@end deffn
				8389
				8390	@deffn {Function} yyerror
				8391	User-supplied function to be called by @code{yyparse} on error.
				8392	@xref{Error Reporting, ,The Error
				8393	Reporting Function @code{yyerror}}.
				8394	@end deffn
				8395
				8396	@deffn {Macro} YYERROR_VERBOSE
				8397	An obsolete macro that you define with @code{#define} in the prologue
				8398	to request verbose, specific error message strings
				8399	when @code{yyerror} is called. It doesn't matter what definition you
				8400	use for @code{YYERROR_VERBOSE}, just whether you define it. Using
				8401	@code{%error-verbose} is preferred.
				8402	@end deffn
				8403
				8404	@deffn {Macro} YYINITDEPTH
				8405	Macro for specifying the initial size of the parser stack.
				8406	@xref{Memory Management}.
				8407	@end deffn
				8408
				8409	@deffn {Function} yylex
				8410	User-supplied lexical analyzer function, called with no arguments to get
				8411	the next token. @xref{Lexical, ,The Lexical Analyzer Function
				8412	@code{yylex}}.
				8413	@end deffn
				8414
				8415	@deffn {Macro} YYLEX_PARAM
				8416	An obsolete macro for specifying an extra argument (or list of extra
				8417	arguments) for @code{yyparse} to pass to @code{yylex}. The use of this
				8418	macro is deprecated, and is supported only for Yacc like parsers.
				8419	@xref{Pure Calling,, Calling Conventions for Pure Parsers}.
				8420	@end deffn
				8421
				8422	@deffn {Variable} yylloc
				8423	External variable in which @code{yylex} should place the line and column
				8424	numbers associated with a token. (In a pure parser, it is a local
				8425	variable within @code{yyparse}, and its address is passed to
				8426	@code{yylex}.)
				8427	You can ignore this variable if you don't use the @samp{@@} feature in the
				8428	grammar actions.
				8429	@xref{Token Locations, ,Textual Locations of Tokens}.
				8430	In semantic actions, it stores the location of the look-ahead token.
				8431	@xref{Actions and Locations, ,Actions and Locations}.
				8432	@end deffn
				8433
				8434	@deffn {Type} YYLTYPE
				8435	Data type of @code{yylloc}; by default, a structure with four
				8436	members. @xref{Location Type, , Data Types of Locations}.
				8437	@end deffn
				8438
				8439	@deffn {Variable} yylval
				8440	External variable in which @code{yylex} should place the semantic
				8441	value associated with a token. (In a pure parser, it is a local
				8442	variable within @code{yyparse}, and its address is passed to
				8443	@code{yylex}.)
				8444	@xref{Token Values, ,Semantic Values of Tokens}.
				8445	In semantic actions, it stores the semantic value of the look-ahead token.
				8446	@xref{Actions, ,Actions}.
				8447	@end deffn
				8448
				8449	@deffn {Macro} YYMAXDEPTH
				8450	Macro for specifying the maximum size of the parser stack. @xref{Memory
				8451	Management}.
				8452	@end deffn
				8453
				8454	@deffn {Variable} yynerrs
				8455	Global variable which Bison increments each time it reports a syntax error.
				8456	(In a pure parser, it is a local variable within @code{yyparse}.)
				8457	@xref{Error Reporting, ,The Error Reporting Function @code{yyerror}}.
				8458	@end deffn
				8459
				8460	@deffn {Function} yyparse
				8461	The parser function produced by Bison; call this function to start
				8462	parsing. @xref{Parser Function, ,The Parser Function @code{yyparse}}.
				8463	@end deffn
				8464
				8465	@deffn {Macro} YYPARSE_PARAM
				8466	An obsolete macro for specifying the name of a parameter that
				8467	@code{yyparse} should accept. The use of this macro is deprecated, and
				8468	is supported only for Yacc like parsers. @xref{Pure Calling,, Calling
				8469	Conventions for Pure Parsers}.
				8470	@end deffn
				8471
				8472	@deffn {Macro} YYRECOVERING
				8473	The expression @code{YYRECOVERING ()} yields 1 when the parser
				8474	is recovering from a syntax error, and 0 otherwise.
				8475	@xref{Action Features, ,Special Features for Use in Actions}.
				8476	@end deffn
				8477
				8478	@deffn {Macro} YYSTACK_USE_ALLOCA
				8479	Macro used to control the use of @code{alloca} when the C
				8480	@acronym{LALR}(1) parser needs to extend its stacks. If defined to 0,
				8481	the parser will use @code{malloc} to extend its stacks. If defined to
				8482	1, the parser will use @code{alloca}. Values other than 0 and 1 are
				8483	reserved for future Bison extensions. If not defined,
				8484	@code{YYSTACK_USE_ALLOCA} defaults to 0.
				8485
				8486	In the all-too-common case where your code may run on a host with a
				8487	limited stack and with unreliable stack-overflow checking, you should
				8488	set @code{YYMAXDEPTH} to a value that cannot possibly result in
				8489	unchecked stack overflow on any of your target hosts when
				8490	@code{alloca} is called. You can inspect the code that Bison
				8491	generates in order to determine the proper numeric values. This will
				8492	require some expertise in low-level implementation details.
				8493	@end deffn
				8494
				8495	@deffn {Type} YYSTYPE
				8496	Data type of semantic values; @code{int} by default.
				8497	@xref{Value Type, ,Data Types of Semantic Values}.
				8498	@end deffn
				8499
				8500	@node Glossary
				8501	@appendix Glossary
				8502	@cindex glossary
				8503
				8504	@table @asis
				8505	@item Backus-Naur Form (@acronym{BNF}; also called ``Backus Normal Form'')
				8506	Formal method of specifying context-free grammars originally proposed
				8507	by John Backus, and slightly improved by Peter Naur in his 1960-01-02
				8508	committee document contributing to what became the Algol 60 report.
				8509	@xref{Language and Grammar, ,Languages and Context-Free Grammars}.
				8510
				8511	@item Context-free grammars
				8512	Grammars specified as rules that can be applied regardless of context.
				8513	Thus, if there is a rule which says that an integer can be used as an
				8514	expression, integers are allowed @emph{anywhere} an expression is
				8515	permitted. @xref{Language and Grammar, ,Languages and Context-Free
				8516	Grammars}.
				8517
				8518	@item Dynamic allocation
				8519	Allocation of memory that occurs during execution, rather than at
				8520	compile time or on entry to a function.
				8521
				8522	@item Empty string
				8523	Analogous to the empty set in set theory, the empty string is a
				8524	character string of length zero.
				8525
				8526	@item Finite-state stack machine
				8527	A ``machine'' that has discrete states in which it is said to exist at
				8528	each instant in time. As input to the machine is processed, the
				8529	machine moves from state to state as specified by the logic of the
				8530	machine. In the case of the parser, the input is the language being
				8531	parsed, and the states correspond to various stages in the grammar
				8532	rules. @xref{Algorithm, ,The Bison Parser Algorithm}.
				8533
				8534	@item Generalized @acronym{LR} (@acronym{GLR})
				8535	A parsing algorithm that can handle all context-free grammars, including those
				8536	that are not @acronym{LALR}(1). It resolves situations that Bison's
				8537	usual @acronym{LALR}(1)
				8538	algorithm cannot by effectively splitting off multiple parsers, trying all
				8539	possible parsers, and discarding those that fail in the light of additional
				8540	right context. @xref{Generalized LR Parsing, ,Generalized
				8541	@acronym{LR} Parsing}.
				8542
				8543	@item Grouping
				8544	A language construct that is (in general) grammatically divisible;
				8545	for example, `expression' or `declaration' in C@.
				8546	@xref{Language and Grammar, ,Languages and Context-Free Grammars}.
				8547
				8548	@item Infix operator
				8549	An arithmetic operator that is placed between the operands on which it
				8550	performs some operation.
				8551
				8552	@item Input stream
				8553	A continuous flow of data between devices or programs.
				8554
				8555	@item Language construct
				8556	One of the typical usage schemas of the language. For example, one of
				8557	the constructs of the C language is the @code{if} statement.
				8558	@xref{Language and Grammar, ,Languages and Context-Free Grammars}.
				8559
				8560	@item Left associativity
				8561	Operators having left associativity are analyzed from left to right:
				8562	@samp{a+b+c} first computes @samp{a+b} and then combines with
				8563	@samp{c}. @xref{Precedence, ,Operator Precedence}.
				8564
				8565	@item Left recursion
				8566	A rule whose result symbol is also its first component symbol; for
				8567	example, @samp{expseq1 : expseq1 ',' exp;}. @xref{Recursion, ,Recursive
				8568	Rules}.
				8569
				8570	@item Left-to-right parsing
				8571	Parsing a sentence of a language by analyzing it token by token from
				8572	left to right. @xref{Algorithm, ,The Bison Parser Algorithm}.
				8573
				8574	@item Lexical analyzer (scanner)
				8575	A function that reads an input stream and returns tokens one by one.
				8576	@xref{Lexical, ,The Lexical Analyzer Function @code{yylex}}.
				8577
				8578	@item Lexical tie-in
				8579	A flag, set by actions in the grammar rules, which alters the way
				8580	tokens are parsed. @xref{Lexical Tie-ins}.
				8581
				8582	@item Literal string token
				8583	A token which consists of two or more fixed characters. @xref{Symbols}.
				8584
				8585	@item Look-ahead token
				8586	A token already read but not yet shifted. @xref{Look-Ahead, ,Look-Ahead
				8587	Tokens}.
				8588
				8589	@item @acronym{LALR}(1)
				8590	The class of context-free grammars that Bison (like most other parser
				8591	generators) can handle; a subset of @acronym{LR}(1). @xref{Mystery
				8592	Conflicts, ,Mysterious Reduce/Reduce Conflicts}.
				8593
				8594	@item @acronym{LR}(1)
				8595	The class of context-free grammars in which at most one token of
				8596	look-ahead is needed to disambiguate the parsing of any piece of input.
				8597
				8598	@item Nonterminal symbol
				8599	A grammar symbol standing for a grammatical construct that can
				8600	be expressed through rules in terms of smaller constructs; in other
				8601	words, a construct that is not a token. @xref{Symbols}.
				8602
				8603	@item Parser
				8604	A function that recognizes valid sentences of a language by analyzing
				8605	the syntax structure of a set of tokens passed to it from a lexical
				8606	analyzer.
				8607
				8608	@item Postfix operator
				8609	An arithmetic operator that is placed after the operands upon which it
				8610	performs some operation.
				8611
				8612	@item Reduction
				8613	Replacing a string of nonterminals and/or terminals with a single
				8614	nonterminal, according to a grammar rule. @xref{Algorithm, ,The Bison
				8615	Parser Algorithm}.
				8616
				8617	@item Reentrant
				8618	A reentrant subprogram is a subprogram which can be in invoked any
				8619	number of times in parallel, without interference between the various
				8620	invocations. @xref{Pure Decl, ,A Pure (Reentrant) Parser}.
				8621
				8622	@item Reverse polish notation
				8623	A language in which all operators are postfix operators.
				8624
				8625	@item Right recursion
				8626	A rule whose result symbol is also its last component symbol; for
				8627	example, @samp{expseq1: exp ',' expseq1;}. @xref{Recursion, ,Recursive
				8628	Rules}.
				8629
				8630	@item Semantics
				8631	In computer languages, the semantics are specified by the actions
				8632	taken for each instance of the language, i.e., the meaning of
				8633	each statement. @xref{Semantics, ,Defining Language Semantics}.
				8634
				8635	@item Shift
				8636	A parser is said to shift when it makes the choice of analyzing
				8637	further input from the stream rather than reducing immediately some
				8638	already-recognized rule. @xref{Algorithm, ,The Bison Parser Algorithm}.
				8639
				8640	@item Single-character literal
				8641	A single character that is recognized and interpreted as is.
				8642	@xref{Grammar in Bison, ,From Formal Rules to Bison Input}.
				8643
				8644	@item Start symbol
				8645	The nonterminal symbol that stands for a complete valid utterance in
				8646	the language being parsed. The start symbol is usually listed as the
				8647	first nonterminal symbol in a language specification.
				8648	@xref{Start Decl, ,The Start-Symbol}.
				8649
				8650	@item Symbol table
				8651	A data structure where symbol names and associated data are stored
				8652	during parsing to allow for recognition and use of existing
				8653	information in repeated uses of a symbol. @xref{Multi-function Calc}.
				8654
				8655	@item Syntax error
				8656	An error encountered during parsing of an input stream due to invalid
				8657	syntax. @xref{Error Recovery}.
				8658
				8659	@item Token
				8660	A basic, grammatically indivisible unit of a language. The symbol
				8661	that describes a token in the grammar is a terminal symbol.
				8662	The input of the Bison parser is a stream of tokens which comes from
				8663	the lexical analyzer. @xref{Symbols}.
				8664
				8665	@item Terminal symbol
				8666	A grammar symbol that has no rules in the grammar and therefore is
				8667	grammatically indivisible. The piece of text it represents is a token.
				8668	@xref{Language and Grammar, ,Languages and Context-Free Grammars}.
				8669	@end table
				8670
				8671	@node Copying This Manual
				8672	@appendix Copying This Manual
				8673
				8674	@menu
				8675	* GNU Free Documentation License:: License for copying this manual.
				8676	@end menu
				8677
				8678	@include fdl.texi
				8679
				8680	@node Index
				8681	@unnumbered Index
				8682
				8683	@printindex cp
				8684
				8685	@bye
				8686
				8687	@c LocalWords: texinfo setfilename settitle setchapternewpage finalout
				8688	@c LocalWords: ifinfo smallbook shorttitlepage titlepage GPL FIXME iftex
				8689	@c LocalWords: akim fn cp syncodeindex vr tp synindex dircategory direntry
				8690	@c LocalWords: ifset vskip pt filll insertcopying sp ISBN Etienne Suvasa
				8691	@c LocalWords: ifnottex yyparse detailmenu GLR RPN Calc var Decls Rpcalc
				8692	@c LocalWords: rpcalc Lexer Gen Comp Expr ltcalc mfcalc Decl Symtab yylex
				8693	@c LocalWords: yyerror pxref LR yylval cindex dfn LALR samp gpl BNF xref
				8694	@c LocalWords: const int paren ifnotinfo AC noindent emph expr stmt findex
				8695	@c LocalWords: glr YYSTYPE TYPENAME prog dprec printf decl init stmtMerge
				8696	@c LocalWords: pre STDC GNUC endif yy YY alloca lf stddef stdlib YYDEBUG
				8697	@c LocalWords: NUM exp subsubsection kbd Ctrl ctype EOF getchar isdigit
				8698	@c LocalWords: ungetc stdin scanf sc calc ulator ls lm cc NEG prec yyerrok
				8699	@c LocalWords: longjmp fprintf stderr preg yylloc YYLTYPE cos ln
				8700	@c LocalWords: smallexample symrec val tptr FNCT fnctptr func struct sym
				8701	@c LocalWords: fnct putsym getsym fname arith fncts atan ptr malloc sizeof
				8702	@c LocalWords: strlen strcpy fctn strcmp isalpha symbuf realloc isalnum
				8703	@c LocalWords: ptypes itype YYPRINT trigraphs yytname expseq vindex dtype
				8704	@c LocalWords: Rhs YYRHSLOC LE nonassoc op deffn typeless typefull yynerrs
				8705	@c LocalWords: yychar yydebug msg YYNTOKENS YYNNTS YYNRULES YYNSTATES
				8706	@c LocalWords: cparse clex deftypefun NE defmac YYACCEPT YYABORT param
				8707	@c LocalWords: strncmp intval tindex lvalp locp llocp typealt YYBACKUP
				8708	@c LocalWords: YYEMPTY YYEOF YYRECOVERING yyclearin GE def UMINUS maybeword
				8709	@c LocalWords: Johnstone Shamsa Sadaf Hussain Tomita TR uref YYMAXDEPTH
				8710	@c LocalWords: YYINITDEPTH stmnts ref stmnt initdcl maybeasm VCG notype
				8711	@c LocalWords: hexflag STR exdent itemset asis DYYDEBUG YYFPRINTF args
				8712	@c LocalWords: YYPRINTF infile ypp yxx outfile itemx vcg tex leaderfill
				8713	@c LocalWords: hbox hss hfill tt ly yyin fopen fclose ofirst gcc ll
				8714	@c LocalWords: yyrestart nbar yytext fst snd osplit ntwo strdup AST
				8715	@c LocalWords: YYSTACK DVI fdl printindex