Causal discovery is at the core of human cognition. It enables us to reason about the environment and make counterfactual predictions about unseen scenarios that can differ vastly from our previous experiences. We consider the task of causal discovery from videos in an end-to-end fashion without supervision on the ground-truth graph structure. In particular, our goal is to discover the structural dependencies among environmental and object variables: inferring the type and strength of interactions that have a causal effect on the behavior of the dynamical system. Our model consists of (a) a perception module that extracts a semantically meaningful and temporally consistent keypoint representation from images, (b) an inference module for determining the graph distribution induced by the detected keypoints, and (c) a dynamics module that can predict the future by conditioning on the inferred graph. We assume access to different configurations and environmental conditions, i.e., data from unknown interventions on the underlying system; thus, we can hope to discover the correct underlying causal graph without explicit interventions. We evaluate our method in a planar multi-body interaction environment and in scenarios involving fabrics of different shapes, such as shirts and pants. Experiments demonstrate that our model can correctly identify the interactions from a short sequence of images and make long-term future predictions. The causal structure assumed by the model also allows it to make counterfactual predictions and extrapolate to systems with unseen interaction graphs or graphs of various sizes.
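To make the three-stage data flow concrete, here is a minimal sketch in plain Python. It is not our implementation: every function name, parameter, and hand-written rule below is a hypothetical stand-in for a learned network (an intensity-weighted centroid in place of the perception module, a distance-based softmax in place of the inference module, and a linear attraction rule in place of the dynamics module). It only illustrates how keypoints, a distribution over edge types, and a graph-conditioned prediction fit together.

```python
import math


def perceive(heatmaps):
    """Stand-in perception: one (x, y) keypoint per heatmap (intensity-weighted centroid)."""
    keypoints = []
    for hm in heatmaps:  # hm is a list of rows of non-negative floats
        total = sum(sum(row) for row in hm)
        x = sum(v * c for row in hm for c, v in enumerate(row)) / total
        y = sum(v * r for r, row in enumerate(hm) for v in row) / total
        keypoints.append((x, y))
    return keypoints


def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]


def infer_graph(trajectory, weights):
    """Stand-in inference: a categorical distribution over edge types for each ordered pair.

    trajectory: list over time of lists of K (x, y) keypoints.
    weights: one hypothetical (slope, bias) pair per edge type; the logit of an
    edge type is slope * mean_distance + bias.
    """
    k = len(trajectory[0])
    graph = {}
    for src in range(k):
        for dst in range(k):
            if src == dst:
                continue
            mean_dist = sum(math.dist(f[src], f[dst]) for f in trajectory) / len(trajectory)
            graph[(src, dst)] = softmax([w * mean_dist + b for w, b in weights])
    return graph


def predict_next(keypoints, graph, gains):
    """Stand-in dynamics: each keypoint is pulled toward the others, scaled by the
    expected gain of the incoming edge under the inferred edge-type distribution."""
    nxt = []
    for dst, (xd, yd) in enumerate(keypoints):
        dx = dy = 0.0
        for src, (xs, ys) in enumerate(keypoints):
            if src == dst:
                continue
            strength = sum(p * g for p, g in zip(graph[(src, dst)], gains))
            dx += strength * (xs - xd)
            dy += strength * (ys - yd)
        nxt.append((xd + dx, yd + dy))
    return nxt


# Toy usage: two frames, each with two 3x3 heatmaps (one per keypoint).
frames = [
    [[[1, 0, 0], [0, 0, 0], [0, 0, 0]], [[0, 0, 1], [0, 0, 0], [0, 0, 0]]],
    [[[1, 0, 0], [0, 0, 0], [0, 0, 0]], [[0, 1, 0], [0, 0, 0], [0, 0, 0]]],
]
trajectory = [perceive(f) for f in frames]
graph = infer_graph(trajectory, weights=[(0.0, 0.0), (1.0, 0.0)])  # edge type 0 = "no edge"
prediction = predict_next(trajectory[-1], graph, gains=[0.0, 0.1])
```

In this sketch the first edge type has zero gain, so it plays the role of "no interaction"; the more probability the inference step places on it, the less the corresponding pair influences the prediction.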
Here we show qualitative results for the keypoints that our perception module detects without supervision.
1.1 Multi-body Interaction
1.2 Fabric Manipulation
2. Discovered Causal Summary Graph
Here we show the qualitative results of the discovered graph and the predicted future from our inference and dynamics modules.
2.1 Multi-body Interaction
Example #2 (fewer balls than in training)
Example #3 (more balls than in training)
2.2 Fabric Manipulation
For the cloth environment, the keypoints on the fabrics act as a reduced-order representation of the original system, for which the ground-truth causal summary graph is unknown.
As shown below, the same inference module produces different causal graphs for different types of fabrics, and these graphs reflect the underlying connectivity patterns. This illustrates the model's ability to recognize the underlying dependency structure.