Computational Logic
and Machine Learning Representation
Issues
Prague, September 20, 1997
Michele Sebag
The area meeting of CompulogNet devoted
to Representation Issues in Reasoning and Learning was organized in Prague by P.
Flach and N. Lavrac, immediately after ILP'97.
Luc de Raedt recalled the different
formalisms used for learning in first order logic: learning from interpretations (examples
are interpretations, i.e. sets of atoms); learning from entailment (examples are clauses);
and learning from satisfiability (examples are clausal theories). These settings are
nested (learning from interpretations I learning from entailment I
learning from satisfiability) which allows for propagating the PAC results (and situating
learning systems) along this scale: negative PAC results for learning from interpretations
still hold for learning from satisfiability, and conversely, positive PAC results for
learning from satisfiability would still hold for learning from interpretations. This
finally leads to a unifying framework for the different logical settings: learning from
satisfiability, which subsumes all previous settings.
Simon Anthony and Alan Frisch showed
how meta-languages can help reduce the cost of induction, and improve the processing of
numerical information. The cost of induction is greatly increased by the redundancy of the
search, i.e. the fact that the same clause may be discovered several times (e.g. clauses p(X)
:- a(X), b(X) and p(X) :- b(X),a(X)). This could be limited by
setting a partial order over literals, (ie over the predicates and the symbols in their
arguments), thereby decreasing the number of
possible refinement paths. Meta-languages can also support the discovery of numerical
literals - and
handling numbers is indeed a major concern in ILP. The question is to determine the
numerical constants involved in such literals, in order to improve the accuracy of the
whole clause. One possibility is to handle these constants as if they were variables
(Skolem constants or meta-variables); the choice of these constants can then be delegated
to specific or external (e.g. regression) routines. To sum up, learners need to avoid
making over-commitments during learning; and these can be avoided using meta-languages at
different stages in the learning task.
Ashwin Srinivasan made us benefit from
his experience regarding ILP applications to molecular biology and bio-chemistry, with
five (plus one) lessons.
* First lesson is: clearly identify problem areas where propositional learning has
been shown to be insufficient; then focus on the tradeoff online - off-line efficiency.
More precisely: ILP can deal with a general-purpose description of the
examples; but a propositional learner dealing with a specific, hand-tuned description of
the
examples, might outperform the ILP learner. What you gain with ILP is the pre-processing
phase; furthermore, ILP can suggest relevant
features and thereby contribute to an efficient
re-representation of the problem.
* In real-world domains, one digs the examples and background knowledge in the
specialized literature (true for pre-Web times only ?) and these have to be debugged
(still true for some time, I'm afraid). Debugging is easier in a declarative than in a
procedural context.
* Once debugged, an efficient encoding of the background knowledge is essential.
* The optimality function (which kind of knowledge is to be preferred) is
problem-specific: the Minimum Description Length may not necessarily correspond to a
relevant preference order.
* Comprehensibility is problem-specific, too: e.g. a chemist might prefer a
representation of knowledge, other than Prolog clauses.
* Last, have a log-book ! One easily forgets the failures and dead ends as time
goes by and
drawbacks are alleviated...). A. Srinivasan last underlined that a tight collaboration
with an expert of the problem domain is a key factor of success, and he acknowledged the
major contribution of Ross King to the breakthrough of ILP in bio-chemistry; the role of
another domain expert, Mike Sternberg, was highlighted too. Bravo.
Fabrizio Riguzzi described an extended
formalism for ILP, based on Abductive Logic Programming. Here, the stress is put on
dealing with incomplete background knowledge; this is strongly required for learning from
e.g. questionnaires, which always include unanswered questions. Abductive Logic
Programming resembles learning from entailment, as far as rules are learned from
(incomplete) clauses. But abductive entailment involves making additional assumptions,
e.g. the grand-father is male. And this completion of the available information must be as
regular as possible: the final interpretations (examples + assumptions) give rise to
integrity constraints, e.g. a human being is either male or female. Abductive Logic
Programming therefore also resembles learning from interpretations. One price to pay for
this powerful setting comes from its non-monotonicity. In theory, one should check the
consistency of the set of assumptions, with regards to one another, and/or with regards to
further examples. In the worst case, the previous assumptions would prevent from finding
any explanation for the current examples and backtracking would be required. But this does
not occur in practice, as examples are ''sufficiently'' independent in most applications.
John Lloyd presented a new higher order
logic language, termed Escher, that addresses many limitations of Prolog regarding
lists, functions, negation, and so forth. Within Escher, a predicate is defined via
a set of equations: these are much more natural than clauses, and easier to write as
disjunction and explicit existential quantification are allowed. These equations
correspond to a partition of the instances of the predicate; and the left side of the
equation should give all means to check whether this equation is relevant to a given
context. From the point of view of induction, Escher offers a clean way to express
learning bias, and specify: the role of predicate arguments; the allowed number of
occurrences of a predicate, including the connectives; a functional description of the
search space (e.g. an upper bound on some complexity measure of the candidate solutions);
and so forth.
As expected, the more powerful the
language, the more one has difficulty in designing a learner, and the more expensive the
learning search is (same remark as for the non-monotonic abductive learning setting);
these are still open questions.
And what about inventing another, still
more powerful, language, suggested P. Stepanek ? An object
oriented language, subsuming Escher, and named Bach...
Jean-Francois Puget advocated for a
fruitful collaboration between the Constrained Logic Programming (CLP) and the ILP
communities. He sketched several possibilities for learning numerical literals in the same
context as S. Anthony and A. Frisch, such as reformulating this problem as a constraint
satisfaction problem (CSP) or an optimization problem. Another approach would consist of
keeping the implicit description of such a literal given by the CSP, and directly check
further examples against the CSP.
Conversely, a constraint solver could
indeed benefit from a learning module able to generalize the previous search failures; the
solver could then take advantage of such generalizations to prune further combinatorial
exploration. Another direction for fruitful interaction is for ILP to refine the
representation of a CSP (e.g. taking advantage of the symmetries, invariance and so forth,
of the problem) in order to optimize the search.
Last, and not least, Celine Rouveirol
presented three case studies illustrating how representational changes can support
learning.
* The flattening operator proposed by C. Rouveirol and J.F. Puget allows one to
get rid of function symbols; let us recall that most ILP learners do not presently deal
with function symbols other than constants. Flattening can significantly decrease the
complexity of learning a theory with an infinite Herbrand model, as one judiciously
chooses the finite model of the flattened theory among the subsets of the infinite one.
* Some ILP problems (handling noise, numbers) can be addressed by mapping
first-order onto attribute-value examples. LINUS (Lavrac and Dzeroski) uses syntactic
restrictions to ensure a one-to-one mapping. The mapping is one-to-exponentially-many in
the disjunctive version space (Sebag), which involves no syntactic restrictions. The
latter mapping is combined with a stochastic bias in STILL (Sebag and Rouveirol), to enjoy
a bounded (polynomial and tunable) complexity.
* A third change of representation is concerned with tuning the granularity of
search when learning from databases. Different mappings can be defined from a DB to a LP
representation: a table with $n$ attributes can be represented as a single n-ary
predicate (large grain representation), or n-1 binary predicates (fine grain
representation). The strategy of exploring the large grain lattice until inconsistencies
are found, and only then switching to the fine grain lattice, dramatically decreases the
number of queries to the DB. Such representation changes are implemented in Haiku
(Nedellec and Rouveirol) and RDT/DB (Morik and Buchausen).
To sum up, most work is concerned with
extending the learning language. These extensions aim at: dealing with more complex or
incomplete examples; expressing more powerful learning biases; handling numbers more
efficiently. An alternative to extending the language consists in reformulating the
examples or the learning task.
Last, as ILP is getting mature, the
first real-world applications deliver interesting feedback. This feedback is partly
expected (the expert should be able to control the whole learning process) and
occasionally counter-intuitive (Prolog programs are not considered understandable by all
experts).
Michele Sebag
LMS, Ecole Polytechnique & LRI,
Universite d'Orsay
Michele.Sebag@polytechnique.fr |