Figure: (a) Causal representation learning aims to infer abstract, high-level causal variables and their relations from low-level perceptual data such as images or other sensor measurements [ ]. Recent work in this direction includes: (b) a proof that self-supervised learning isolates the invariant (content) representation c shared across views (e.g., obtained via data augmentation) [ ]; (c) a method for extracting causal structure from trained deep generative models, enabling interventions that produce novel "hybrid" data [ ]; and (d) a new instantiation of the principle of independent mechanisms suitable for unsupervised representation learning [ ].
Causal representation learning aims to move beyond purely statistical representations towards causal world models that support intervention and planning, see Fig. (a) [ ].
Coarse-grained causal models Defining the objects related by a causal model typically amounts to a suitable coarse-graining of a more detailed model of the world (e.g., a physical model). Under appropriate conditions, causal models can arise, e.g., from coarse-graining of microscopic structural equation models [ ], ordinary differential equations [ ], temporally aggregated time series [ ], or temporal abstractions of recurrent dynamical models [ ]. Although models in economics, medicine, or psychology typically involve variables that are abstractions of more elementary concepts, it is unclear when such coarse-grained variables admit causal models with well-defined interventions; [ ] provides some sufficient conditions.
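As a minimal illustration of how an intervenible coarse-grained model can emerge from dynamics (a sketch in our own notation; the cited works treat far more general settings), consider fast linear dynamics that settle to an equilibrium:
\[
x_{t+1} = A\,x_t + b, \qquad \rho(A) < 1 \;\Rightarrow\; x_t \to x^{\ast} \text{ with } x^{\ast} = A\,x^{\ast} + b .
\]
Each equilibrium variable thus satisfies a structural equation $x^{\ast}_i = \sum_j A_{ij}\,x^{\ast}_j + b_i$; clamping a coordinate of the fast dynamics changes the equilibrium exactly as the do-operator changes this linear SEM (assuming the clamped dynamics remain stable), so the static, coarse-grained description inherits well-defined interventions from the micro-level model.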
Disentanglement A special case of causal representation learning is disentanglement, or nonlinear ICA, where the latent variables are assumed to be statistically independent. Through theory and a large-scale empirical study, we showed that disentanglement is impossible in a purely unsupervised setting without inductive biases [ ] (ICML'19 best paper). Follow-up work considered a semi-supervised setting [ ] and showed that disentanglement methods learn statistically dependent latents when trained on correlated data [ ].
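The core obstruction can be seen in a standard construction (a sketch; the impossibility result in the cited paper is more general): let $z \sim \mathcal{N}(0, I_d)$ generate observations $x = f(z)$ for an injective mixing $f$. For any rotation $R$, the alternative model $z' = Rz$, $f' = f \circ R^{\top}$ yields $f'(z') = f(R^{\top}Rz) = x$, and $z' \sim \mathcal{N}(0, I_d)$ again has independent components, yet each $z'_i$ mixes all of the $z_j$. Both models induce the same distribution over $x$, so statistical independence of the latents alone cannot single out the disentangled one.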
Multi-view learning Learning from multiple views of the data makes it possible to overcome the impossibility of purely unsupervised representation learning, as demonstrated by identifiability results for multi-view nonlinear ICA [ ] and weakly-supervised disentanglement [ ]. This idea also helps explain the impressive empirical success of self-supervised learning with data augmentations: we prove that such training isolates the invariant (content) part of the representation shared across views, even under arbitrary dependence among the latents, see Fig. (b) [ ].
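The following minimal sketch illustrates the multi-view principle (architecture, dimensions, and training details are hypothetical, not the setup of the cited papers): two views share a content variable c but draw independent styles, and a contrastive InfoNCE objective trains an encoder that, by the identifiability theory, can only align the shared content.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
D_C, D_S, D_X = 2, 2, 10   # content, style, and observation dimensions (hypothetical)

# Fixed random mixing function standing in for the true (unknown) generator.
g = nn.Sequential(nn.Linear(D_C + D_S, D_X), nn.Tanh(), nn.Linear(D_X, D_X))
for p in g.parameters():
    p.requires_grad_(False)

def sample_views(n):
    """Two views share the content c but draw independent styles (the augmentation model)."""
    c = torch.randn(n, D_C)
    x1 = g(torch.cat([c, torch.randn(n, D_S)], dim=1))
    x2 = g(torch.cat([c, torch.randn(n, D_S)], dim=1))
    return x1, x2

encoder = nn.Sequential(nn.Linear(D_X, 64), nn.ReLU(), nn.Linear(64, D_C))
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

for step in range(2000):
    x1, x2 = sample_views(256)
    z1, z2 = encoder(x1), encoder(x2)
    # InfoNCE: matching rows are positives, all other pairs in the batch are negatives.
    logits = -torch.cdist(z1, z2) ** 2
    loss = F.cross_entropy(logits, torch.arange(len(x1)))
    opt.zero_grad(); loss.backward(); opt.step()

# By the content-isolation theory, the trained encoder's output depends (up to an
# invertible map) only on the shared content c, not on the view-specific style.
```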
Learning independent mechanisms For image recognition, we showed, via competitive training of expert modules, that independent mechanisms can transfer information across different datasets [ ]. Extending this to dynamical systems, learning sparsely communicating recurrent independent mechanisms (RIMs) led to improved generalization and strong performance on RL tasks [ ]. Similar ideas have proven useful for learning object-centric representations and causal generative scene models [ ].
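A minimal sketch of the competitive-training idea follows (sizes, architectures, and the scoring function are hypothetical; in the cited work, a discriminator trained on canonical data plays the scoring role). Each example is won by the expert whose output scores best, and only the winner is updated on it, driving experts to specialize on distinct mechanisms.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
D, K = 16, 4   # data dimension and number of competing experts (hypothetical)

experts = [nn.Sequential(nn.Linear(D, 64), nn.ReLU(), nn.Linear(64, D)) for _ in range(K)]
opts = [torch.optim.Adam(e.parameters(), lr=1e-3) for e in experts]

def score(y):
    """Toy stand-in for a trained discriminator: higher means
    'more like the canonical data distribution'."""
    return -(y ** 2).sum(dim=1)   # here: simply prefer outputs near the origin

def competitive_step(x):
    with torch.no_grad():
        outs = torch.stack([e(x) for e in experts])                 # (K, n, D)
        winners = score(outs.flatten(0, 1)).view(K, -1).argmax(0)   # per-example winner
    for k in range(K):
        mask = winners == k
        if mask.any():
            # Winner-take-all: only the expert that wins an example trains on it.
            loss = -score(experts[k](x[mask])).mean()
            opts[k].zero_grad(); loss.backward(); opts[k].step()

competitive_step(torch.randn(128, D))
```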
Extracting causal structure from deep generative models We have devised methods for analysing trained deep generative models through a causal lens, e.g., to improve extrapolation [ ] or to create hybrid counterfactual images, see Fig. (c) [ ]. Causal ideas have also led to a new structured decoder architecture [ ] and to new forms of gradient combination that avoid learning spurious correlations [ ].
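The hybridization idea can be sketched as an intervention on the internal units of a trained generator: run one sample's forward pass, record the activations of a chosen subset of units, and transplant them into another sample's pass (the toy generator and unit selection below are hypothetical; in practice, the generator would be a trained deep model such as a GAN or VAE decoder).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Toy stand-in for a trained generator.
generator = nn.Sequential(
    nn.Linear(8, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 64),
)
LAYER, UNITS = 2, list(range(16))   # hypothetical "module": units 0-15 after layer 2

def generate(z, donor_acts=None):
    """Forward pass; optionally clamp a subset of internal units to donor values."""
    h, acts = z, None
    for i, layer in enumerate(generator):
        h = layer(h)
        if i == LAYER:
            acts = h.detach().clone()
            if donor_acts is not None:
                h = h.clone()
                h[:, UNITS] = donor_acts[:, UNITS]   # intervention on internal units
    return h, acts

z_a, z_b = torch.randn(1, 8), torch.randn(1, 8)
_, acts_b = generate(z_b)            # record the donor's internal activations
hybrid, _ = generate(z_a, acts_b)    # "hybrid" output mixing properties of a and b
```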
New notions of non-statistical independence To use the principle of independent causal mechanisms as a learning signal, we have proposed two new notions of non-statistical independence: a general group-invariance framework that unifies several previous approaches [ ], and an orthogonality condition between partial derivatives tailored specifically for unsupervised representation learning, see Fig. (d) [ ].
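In sketch form, the orthogonality condition can be written as a non-negative contrast that vanishes exactly when the columns of the mixing's Jacobian are orthogonal (notation ours; see the cited work for the precise definition):
\[
C(f) \;=\; \mathbb{E}_{z}\!\left[\,\sum_{i}\log\left\lVert \frac{\partial f}{\partial z_i}(z)\right\rVert \;-\; \log\left|\det J_f(z)\right|\,\right] \;\ge\; 0,
\]
with equality iff the columns of the Jacobian $J_f(z)$ are pairwise orthogonal almost everywhere (Hadamard's inequality); penalising this contrast turns the independence principle into a training signal for unsupervised representation learning.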