Connectionist learning of belief networks

https://doi.org/10.1016/0004-3702(92)90065-6

Abstract

Connectionist learning procedures are presented for “sigmoid” and “noisy-OR” varieties of probabilistic belief networks. These networks have previously been seen primarily as a means of representing knowledge derived from experts. Here it is shown that the “Gibbs sampling” simulation procedure for such networks can support maximum-likelihood learning from empirical data through local gradient ascent. This learning procedure resembles that used for “Boltzmann machines”, and like it, allows the use of “hidden” variables to model correlations between visible variables. Due to the directed nature of the connections in a belief network, however, the “negative phase” of Boltzmann machine learning is unnecessary. Experimental results show that, as a result, learning in a sigmoid belief network can be faster than in a Boltzmann machine. These networks have other advantages over Boltzmann machines in pattern classification and decision making applications, are naturally applicable to unsupervised learning problems, and provide a link between work on connectionist learning and work on the representation of expert knowledge.
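For a concrete picture of the procedure summarized above, the following is a minimal sketch, not taken from the paper, of maximum-likelihood learning by Gibbs sampling in a small two-layer sigmoid belief network, with visible units clamped to training cases and hidden states resampled from their full conditionals. The network sizes, learning-rate and sampling settings, and the toy data set are illustrative assumptions; the gradient is the "delta rule" form (s_i − p_i)s_j averaged over posterior samples.

```python
# A minimal sketch (assumed names and toy data, not code from the paper) of
# maximum-likelihood learning in a two-layer sigmoid belief network:
# top-level hidden units whose states are inferred by Gibbs sampling while
# the visible units are clamped to a training case, followed by a
# gradient-ascent step on the averaged "delta rule" gradient.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_sigmoid(x):
    # Numerically stable log of the logistic function.
    return -np.logaddexp(0.0, -x)

n_visible, n_hidden = 8, 4
W = 0.01 * rng.standard_normal((n_visible, n_hidden))  # hidden -> visible weights
b = np.zeros(n_hidden)    # biases of the (parentless) hidden units
c = np.zeros(n_visible)   # biases of the visible units

def gibbs_sweep(h, v):
    """Resample each hidden unit from its conditional distribution given the
    other hidden units and the clamped visible vector v."""
    for k in range(n_hidden):
        h1, h0 = h.copy(), h.copy()
        h1[k], h0[k] = 1.0, 0.0
        a1, a0 = W @ h1 + c, W @ h0 + c   # visible activations with h_k = 1 vs. 0
        # log P(h_k = 1 | rest, v) - log P(h_k = 0 | rest, v)
        log_odds = b[k] + np.sum(
            v * (log_sigmoid(a1) - log_sigmoid(a0))
            + (1.0 - v) * (log_sigmoid(-a1) - log_sigmoid(-a0))
        )
        h[k] = float(rng.random() < sigmoid(log_odds))
    return h

# Toy training set: noisy copies of two binary prototypes (purely illustrative).
prototypes = np.array([[1, 1, 1, 1, 0, 0, 0, 0],
                       [0, 0, 0, 0, 1, 1, 1, 1]], dtype=float)
data = np.array([np.abs(p - (rng.random(n_visible) < 0.05))
                 for p in prototypes for _ in range(20)])

lr, n_sweeps, burn_in = 0.05, 15, 5
for epoch in range(100):
    dW = np.zeros_like(W); db = np.zeros_like(b); dc = np.zeros_like(c)
    n_samples = 0
    for v in data:
        h = (rng.random(n_hidden) < 0.5).astype(float)   # random initial hidden state
        for sweep in range(n_sweeps):
            h = gibbs_sweep(h, v)
            if sweep >= burn_in:                  # average gradient over posterior samples
                p_v = sigmoid(W @ h + c)
                dW += np.outer(v - p_v, h)        # (s_i - p_i) s_j for visible units
                dc += v - p_v
                db += h - sigmoid(b)              # same rule for the top-level hidden units
                n_samples += 1
    W += lr * dW / n_samples
    c += lr * dc / n_samples
    b += lr * db / n_samples
```

Because every conditional probability in a directed network is individually normalized, this clamped-phase average is already the full likelihood gradient; a Boltzmann machine would additionally need an unclamped "negative" phase to estimate the gradient of the partition function.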
