BookBody.pdf
(
1402 KB
)
Pobierz
273921588 UNPDF
Preface
Parsing (syntactic analysis) is one of the best understood branches of computer science.
Parsers are already being used extensively in a number of disciplines: in computer sci-
ence (for compiler construction, database interfaces, self-describing data-bases, artifi-
cial intelligence), in linguistics (for text analysis, corpora analysis, machine translation,
textual analysis of biblical texts), in document preparation and conversion, in typeset-
ting chemical formulae and in chromosome recognition, to name a few; they can be
used (and perhaps are) in a far larger number of disciplines. It is therefore surprising
that there is no book which collects the knowledge about parsing and explains it to the
non-specialist. Part of the reason may be that parsing has a name for being “difficult”.
In discussing the Amsterdam Compiler Kit and in teaching compiler construction, it
has, however, been our experience that seemingly difficult parsing techniques can be
explained in simple terms, given the right approach. The present book is the result of
these considerations.
This book does not address a strictly uniform audience. On the contrary, while
writing this book, we have consistently tried to imagine giving a course on the subject
to a diffuse mixture of students and faculty members of assorted faculties, sophisticated
laymen, the avid readers of the science supplement of the large newspapers, etc. Such a
course was never given; a diverse audience like that would be too uncoordinated to
convene at regular intervals, which is why we wrote this book, to be read, studied,
perused or consulted wherever or whenever desired.
Addressing such a varied audience has its own difficulties (and rewards).
Although no explicit math was used, it could not be avoided that an amount of
mathematical thinking should pervade this book. Technical terms pertaining to parsing
have of course been explained in the book, but sometimes a term on the fringe of the
subject has been used without definition. Any reader who has ever attended a lecture on
a non-familiar subject knows the phenomenon. He skips the term, assumes it refers to
something reasonable and hopes it will not recur too often. And then there will be pas-
sages where the reader will think we are elaborating the obvious (this paragraph may
be one such place). The reader may find solace in the fact that he does not have to doo-
dle his time away or stare out of the window until the lecturer progresses.
On the positive side, and that is the main purpose of this enterprise, we hope that
by means of a book with this approach we can reach those who were dimly aware of
the existence and perhaps of the usefulness of parsing but who thought it would forever
12
Preface
be hidden behind phrases like:
2
(
V
N
È
V
T
)
*
and
a homomorphism ...
No knowledge of any particular programming language is required. The book con-
tains two or three programs in Pascal, which serve as actualizations only and play a
minor role in the explanation. What is required, though, is an understanding of algo-
rithmic thinking, especially of recursion. Books like
Learning to program
by Howard
Johnston (Prentice-Hall, 1985) or
Programming from first principles
by Richard Bornat
(Prentice-Hall 1987) provide an adequate background (but supply more detail than
required). Pascal was chosen because it is about the only programming language more
or less widely available outside computer science environments.
The book features an extensive annotated bibliography. The user of the bibliogra-
phy is expected to be more than casually interested in parsing and to possess already a
reasonable knowledge of it, either through this book or otherwise. The bibliography as
a list serves to open up the more accessible part of the literature on the subject to the
reader; the annotations are in terse technical prose and we hope they will be useful as
stepping stones to reading the actual articles.
On the subject of applications of parsers, this book is vague. Although we suggest
a number of applications in Chapter 1, we lack the expertise to supply details. It is
obvious that musical compositions possess a structure which can largely be described
by a grammar and thus is amenable to parsing, but we shall have to leave it to the
musicologists to implement the idea. It was less obvious to us that behaviour at cor-
porate meetings proceeds according to a grammar, but we are told that this is so and
that it is a subject of socio-psychological research.
Let
be a mapping
V
N
®
Acknowledgements
We thank the people who helped us in writing this book. Marion de Krieger has
retrieved innumerable books and copies of journal articles for us and without her effort
the annotated bibliography would be much further from completeness. Ed Keizer has
patiently restored peace between us and the pic|tbl|eqn|psfig|troff pipeline, on the many
occasions when we abused, overloaded or just plainly misunderstood the latter. Leo
van Moergestel has made the hardware do things for us that it would not do for the
uninitiated. We also thank Erik Baalbergen, Frans Kaashoek, Erik Groeneveld, Gerco
Ballintijn, Jaco Imthorn, and Egon Amada for their critical remarks and contributions.
The rose at the end of Chapter 2 is by Arwen Grune. Ilana and Lily Grune typed parts
of the text on various occasions.
We thank the Faculteit Wiskunde en Informatica of the Vrije Universiteit for the
use of the equipment.
In a wider sense, we extend our thanks to the hundreds of authors who have been
so kind as to invent scores of clever and elegant algorithms and techniques for us to
exhibit. We hope we have named them all in our bibliography.
Dick Grune
Ceriel J.H. Jacobs
Amstelveen/Amsterdam, July 1990; Sept 1998
1
Introduction
Parsing is the process of structuring a linear representation in accordance with a given
grammar. This definition has been kept abstract on purpose, to allow as wide an
interpretation as possible. The “linear representation” may be a sentence, a computer
program, a knitting pattern, a sequence of geological strata, a piece of music, actions in
ritual behaviour, in short any linear sequence in which the preceding elements in some
way restrict
†
the next element. For some of the examples the grammar is well-known,
for some it is an object of research and for some our notion of a grammar is only just
beginning to take shape.
For each grammar, there are generally an infinite number of linear representations
(“sentences”) that can be structured with it. That is, a finite-size grammar can supply
structure to an infinite number of sentences. This is the main strength of the grammar
paradigm and indeed the main source of the importance of grammars: they summarize
succinctly the structure of an infinite number of objects of a certain class.
There are several reasons to perform this structuring process called parsing. One
reason derives from the fact that the obtained structure helps us to process the object
further. When we know that a certain segment of a sentence in German is the subject,
that information helps in translating the sentence. Once the structure of a document has
been brought to the surface, it can be converted more easily.
A second is related to the fact that the grammar in a sense represents our under-
standing of the observed sentences: the better a grammar we can give for the move-
ments of bees, the deeper our understanding of them is.
A third lies in the completion of missing information that parsers, and especially
error-repairing parsers, can provide. Given a reasonable grammar of the language, an
error-repairing parser can suggest possible word classes for missing or unknown words
on clay tablets.
†
If there is no restriction, the sequence still has a grammar, but this grammar is trivial and unin-
formative.
14
Introduction
[Ch. 1
1.1 PARSING AS A CRAFT
Parsing is no longer an arcane art; it has not been so since the early 70’s when Aho,
Ullman, Knuth and many others put various parsing techniques solidly on their theoret-
ical feet. It need not be a mathematical discipline either; the inner workings of a parser
can be visualized, understood and modified to fit the application, with not much more
than cutting and pasting strings.
There is a considerable difference between a mathematician’s view of the world
and a computer-scientist’s. To a mathematician all structures are static: they have
always been and will always be; the only time dependence is that we just haven’t
discovered them all yet. The computer scientist is concerned with (and fascinated by)
the continuous creation, combination, separation and destruction of structures: time is
of the essence. In the hands of a mathematician, the Peano axioms create the integers
without reference to time, but if a computer scientist uses them to implement integer
addition, he finds they describe a very slow process, which is why he will be looking
for a more efficient approach. In this respect the computer scientist has more in com-
mon with the physicist and the chemist; like these, he cannot do without a solid basis in
several branches of applied mathematics, but, like these, he is willing (and often virtu-
ally obliged) to take on faith certain theorems handed to him by the mathematician.
Without the rigor of mathematics all science would collapse, but not all inhabitants of a
building need to know all the spars and girders that keep it upright. Factoring off cer-
tain detailed knowledge to specialists reduces the intellectual complexity of a task,
which is one of the things computer science is about.
This is the vein in which this book is written: parsing for anybody who has pars-
ing to do: the compiler writer, the linguist, the data-base interface writer, the geologist
or musicologist who want to test grammatical descriptions of their respective objects of
interest, and so on. We require a good ability to visualize, some programming experi-
ence and the willingness and patience to follow non-trivial examples; there is nothing
better for understanding a kangaroo than seeing it jump. We treat, of course, the popu-
lar parsing techniques, but we will not shun some weird techniques that look as if they
are of theoretical interest only: they often offer new insights and a reader might find an
application for them.
1.2 THE APPROACH USED
This book addresses the reader at at least three different levels. The interested non-
computer scientist can read the book as “the story of grammars and parsing”; he or she
can skip the detailed explanations of the algorithms: each algorithm is first explained in
general terms. The computer scientist will find much technical detail on a wide array of
algorithms. To the expert we offer a systematic bibliography of over 400 entries, which
is intended to cover all articles on parsing that have appeared in the readily available
journals. Each entry is annotated, providing enough material for the reader to decide if
the referred article is worth reading.
No ready-to-run algorithms have been given, except for the general context-free
parser of Chapter 12. The formulation of a parsing algorithm with sufficient precision
to enable a programmer to implement and run it without problems requires a consider-
able supporting mechanism that would be out of place in this book and in our experi-
ence does little to increase one’s understanding of the process involved. The popular
methods are given in algorithmic form in most books on compiler construction. The
Sec. 1.2]
The approach used
15
less widely used methods are almost always described in detail in the original publica-
tion, for which see Chapter 13.
1.3 OUTLINE OF THE CONTENTS
Since parsing is concerned with sentences and grammars and since grammars are them-
selves fairly complicated objects, ample attention is paid to them in Chapter 2. Chapter
3 discusses the principles behind parsing and gives a classification of parsing methods.
In summary, parsing methods can be classified as top-down or bottom-up and as direc-
tional or non-directional; the directional methods can be further distinguished into
deterministic and non-deterministic. This scheme dictates the contents of the next few
chapters. In Chapter 4 we treat non-directional methods, including Unger and CYK.
Chapter 5 forms an intermezzo with the treatment of finite-state automata, which are
needed in the subsequent chapters. Chapters 6 through 9 are concerned with directional
methods. Chapter 6 covers non-deterministic directional top-down parsers (recursive
descent, Definite Clause Grammars), Chapter 7 non-deterministic directional bottom-
up parsers (Earley). Deterministic methods are treated in Chapters 8 (top-down: LL in
various forms) and 9 (bottom-up: LR, etc.). A combined deterministic/non-
deterministic method (Tomita) is also described in Chapter 9. That completes the pars-
ing methods per se.
Error handling for a selected number of methods is treated in Chapter 10. The
comparative survey of parsing methods in Chapter 11 summarizes the properties of the
popular and some less popular methods. Chapter 12 contains the full code in Pascal for
a parser that will work for any context-free grammar, to lower the threshold for experi-
menting.
1.4 THE ANNOTATED BIBLIOGRAPHY
The annotated bibliography is presented in Chapter 13 and is an easily accessible sup-
plement of the main body of the book. Rather than listing all publications in alphabetic
order, it is divided into fourteen named sections, each concerned with a particular
aspect of parsing; inside the sections, the publications are listed chronologically. An
author index replaces the usual alphabetic list. The section name plus year of publica-
tion, placed in brackets, are used in the text to refer to an author’s work. For instance,
the annotated reference to Earley’s publication of the Earley parser [CF 1970] can be
found in the section CF at the position of the papers of 1970. Since the name of the
first author is printed in bold letters, the actual reference is then easily located.
Plik z chomika:
zniwiarz1006
Inne pliki z tego folderu:
3d_studio_max_4.ace
(364 KB)
3d_studio_max_4.zip
(20380 KB)
3D Studio Max (HELION).zip
(5610 KB)
3D Studio Max4.zip
(9081 KB)
3D Studio Max.zip
(5610 KB)
Inne foldery tego chomika:
###Aguś
###Brov
###Kasia
###ZAZUN
##Android
Zgłoś jeśli
naruszono regulamin