Connectionist learning of belief networks

https://doi.org/10.1016/0004-3702(92)90065-6

Abstract

Connectionist learning procedures are presented for “sigmoid” and “noisy-OR” varieties of probabilistic belief networks. These networks have previously been seen primarily as a means of representing knowledge derived from experts. Here it is shown that the “Gibbs sampling” simulation procedure for such networks can support maximum-likelihood learning from empirical data through local gradient ascent. This learning procedure resembles that used for “Boltzmann machines”, and like it, allows the use of “hidden” variables to model correlations between visible variables. Due to the directed nature of the connections in a belief network, however, the “negative phase” of Boltzmann machine learning is unnecessary. Experimental results show that, as a result, learning in a sigmoid belief network can be faster than in a Boltzmann machine. These networks have other advantages over Boltzmann machines in pattern classification and decision making applications, are naturally applicable to unsupervised learning problems, and provide a link between work on connectionist learning and work on the representation of expert knowledge.
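For a concrete picture of the procedure summarized above, the following is a minimal sketch, not taken from the paper, of maximum-likelihood learning by Gibbs sampling in a small two-layer sigmoid belief network, with visible units clamped to training cases and hidden states resampled from their full conditionals. The network sizes, learning-rate and sampling settings, and the toy data set are illustrative assumptions; the gradient is the "delta rule" form (s_i − p_i)s_j averaged over posterior samples.

```python
# A minimal sketch (assumed names and toy data, not code from the paper) of
# maximum-likelihood learning in a two-layer sigmoid belief network:
# top-level hidden units whose states are inferred by Gibbs sampling while
# the visible units are clamped to a training case, followed by a
# gradient-ascent step on the averaged "delta rule" gradient.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_sigmoid(x):
    # Numerically stable log of the logistic function.
    return -np.logaddexp(0.0, -x)

n_visible, n_hidden = 8, 4
W = 0.01 * rng.standard_normal((n_visible, n_hidden))  # hidden -> visible weights
b = np.zeros(n_hidden)    # biases of the (parentless) hidden units
c = np.zeros(n_visible)   # biases of the visible units

def gibbs_sweep(h, v):
    """Resample each hidden unit from its conditional distribution given the
    other hidden units and the clamped visible vector v."""
    for k in range(n_hidden):
        h1, h0 = h.copy(), h.copy()
        h1[k], h0[k] = 1.0, 0.0
        a1, a0 = W @ h1 + c, W @ h0 + c   # visible activations with h_k = 1 vs. 0
        # log P(h_k = 1 | rest, v) - log P(h_k = 0 | rest, v)
        log_odds = b[k] + np.sum(
            v * (log_sigmoid(a1) - log_sigmoid(a0))
            + (1.0 - v) * (log_sigmoid(-a1) - log_sigmoid(-a0))
        )
        h[k] = float(rng.random() < sigmoid(log_odds))
    return h

# Toy training set: noisy copies of two binary prototypes (purely illustrative).
prototypes = np.array([[1, 1, 1, 1, 0, 0, 0, 0],
                       [0, 0, 0, 0, 1, 1, 1, 1]], dtype=float)
data = np.array([np.abs(p - (rng.random(n_visible) < 0.05))
                 for p in prototypes for _ in range(20)])

lr, n_sweeps, burn_in = 0.05, 15, 5
for epoch in range(100):
    dW = np.zeros_like(W); db = np.zeros_like(b); dc = np.zeros_like(c)
    n_samples = 0
    for v in data:
        h = (rng.random(n_hidden) < 0.5).astype(float)   # random initial hidden state
        for sweep in range(n_sweeps):
            h = gibbs_sweep(h, v)
            if sweep >= burn_in:                  # average gradient over posterior samples
                p_v = sigmoid(W @ h + c)
                dW += np.outer(v - p_v, h)        # (s_i - p_i) s_j for visible units
                dc += v - p_v
                db += h - sigmoid(b)              # same rule for the top-level hidden units
                n_samples += 1
    W += lr * dW / n_samples
    c += lr * dc / n_samples
    b += lr * db / n_samples
```

Because every conditional probability in a directed network is individually normalized, this clamped-phase average is already the full likelihood gradient; a Boltzmann machine would additionally need an unclamped "negative" phase to estimate the gradient of the partition function.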
