Causal Bandits Podcast

Causal Bandits Podcast with Alex Molak is here to help you learn about causality, causal AI and causal machine learning through the genius of others.

The podcast focuses on causality from a number of different perspectives, finding common ground between academia and industry, philosophy, theory and practice, and between different schools of thought and traditions.

Your host, Alex Molak, is a machine learning engineer, best-selling author, and educator who decided to travel the world to record conversations with the most interesting minds in causality and share them with you.

Enjoy and stay causal!

Keywords: Causal AI, Causal Machine Learning, Causality, Causal Inference, Causal Discovery, Machine Learning, AI, Artificial Intelligence

From Physics to Causal AI & Back | Bernhard Schölkopf Ep 17 | CausalBanditsPodcast.com

June 03, 2024 • Alex Molak • Season 1 • Episode 17

Causal AI: The Melting Pot. Can Physics, Math & Biology Help Us?

What is the relationship between physics and causal models?

What can the science of non-human animal behavior teach causal AI researchers?

Bernhard Schölkopf's rich background and experience allow him to combine perspectives from computation, physics, mathematics, biology, theory of evolution, psychology and ethology to build a deep understanding of underlying principles that govern complex systems and intelligent behavior.

His pioneering work in causal machine learning has revolutionized the field, providing new insights that enhance our ability to understand causal relationships and mechanisms in both natural and artificial systems.

In the episode we discuss:

Does evolution favor causal inference over correlation-based learning?
Can differential equations help us generalize structural causal models?
What new book is Bernhard working on?
Can ethology inspire causal AI researchers?

Ready to dive in?

About The Guest
Bernhard Schölkopf, PhD is a Director at the Max Planck Institute for Intelligent Systems. He's one of the cofounders of the European Lab for Learning & Intelligent Systems (ELLIS) and a recipient of the ACM Allen Newell Award, the BBVA Foundation Frontiers of Knowledge Award, and more. His contributions to modern machine learning are hard to overestimate. He's an affiliated professor at ETH Zürich and an honorary professor at the University of Tübingen and the Technical University Berlin. His pioneering work on causal inference and causal machine learning has inspired thousands of researchers and practitioners worldwide.

Connect with Bernhard:

Bernhard on Twitter/X
Bernhard on

Causal Bandits Podcast
Causal AI || Causal Machine Learning || Causal Inference & Discovery
Web: https://causalbanditspodcast.com

Connect on LinkedIn: https://www.linkedin.com/in/aleksandermolak/
Join Causal Python Weekly: https://causalpython.io
The Causal Book: https://amzn.to/3QhsRz4

017 - CB017 - Bernhard Scholkopf

Bernhard Scholkopf: A very interesting metaphor to me comes from Konrad Lorenz, who was one of the fathers of ethology, the study of animal behavior. And he at some point said that thinking is nothing but acting in an imagined space. But then there's another aspect of biological systems, which is that our training data is finite.

And I think some of the questions around generative modeling, especially when it comes to controllable generation: a lot of people are working on this and don't even know that this has something to do with causality. I think the general idea is, if you have an internal world model, you can learn without having to risk your life every time.

Marcus: Hey, Causal Bandits. Welcome to the Causal Bandits podcast. The best podcast on causality and machine learning on the internet.

Jessie: Today we're traveling to Los Angeles to meet our guest. His curiosity inspired him to study physics and mathematics. He completed his PhD under the supervision of the legendary Vladimir Vapnik.

He worked with AT&T Bell Labs, Microsoft Research, Amazon, and more. Recipient of the ACM Allen Newell Award, co-founder of ELLIS, and Director at the Max Planck Institute for Intelligent Systems: Professor Bernhard Schölkopf. Let me pass it to your host, Alex Molak.

Alex: Welcome to the podcast Bernhard.

Bernhard Scholkopf: Thank you very much.

Alex: How do you like the conference so far?

Bernhard Scholkopf: Yeah, I think it's very exciting, with very interesting talks, and it feels like a nice community is forming around these problems at the intersection, or maybe also in the union, of causal inference and machine learning. So it's exciting to see this coming together.

Alex: What brought you into causality? Your background is very rich. What were the one or two aspects that attracted you to this field?

Bernhard Scholkopf: Yeah. So I think it was several things coming together. Maybe the first one was that at some point, maybe already 20 or 25 years ago, I was invited to a conference that was called the Interface.

I don't know if it still exists. I don't know, maybe it was the interface between statistics and computer science. It was held not far from here, I think in Orange County. So I was invited to give a talk. I was working on kernel methods in those days. But then there were some other very interesting invited talks, and one of them was by Judea Pearl.

And I was quite blown away that there is such a theory of causal inference. I had always found causality interesting from a philosophical point of view, and to understand that this is studied using tools of mathematics, I thought was fascinating. At the same time, I was also very excited about the field that I was working in at the time.

So I didn't change my research direction. But then some years later, an old friend of mine, Dominik Janzing, came to me. He was working in quantum information theory at the time, and he had a student who wanted to work with him on causality. And Dominik told him, look, I don't really work on causality.

But then we started talking about it together and we thought, well, it's not so far from machine learning, and maybe we can advise the student together. And this student was so persistent. He said he wanted to work on this with us, and he convinced us. And then we started getting into that field and got more and more sucked into it.

And at some point, I almost stopped working on kernel methods. Now it's just a small fraction of what I do, and I work mostly on causality. At the beginning, we were interested in the standard problems of causality: causal discovery, causal graphs, and so on.

But pretty soon it became clear to me that many of the interesting open problems in machine learning are actually connected to causality. So I didn't see it as two separate things anymore, but as a way to make progress on fundamental problems of machine learning and inference.

Alex: What was the story behind the book you co-authored, Elements of Causal Inference?

Bernhard Scholkopf: Well, we had worked on causality for some time when we started writing; I don't remember exactly. Earlier, I had written a book about kernel methods with Alex Smola and, you know, writing a book is very painful. We must have worked on that for, I don't know, four or five years. And if you finish one book, you swear to yourself you're never going to write a book again.

But then, this was maybe 10 years in the past, and I had forgotten about that. And then Dominik and I started talking about writing a book, and we had a very good student, Jonas Peters. I guess he must have been a student when we started thinking about it, but then he graduated. And then the three of us said, well, why don't we try to do that together?

I think we were thinking similarly, and we were also reasonably close in terms of mathematics. And we thought it would be nice to have a modern, reasonably compact treatment of the main ideas of causality. And it was actually fun. Maybe it was actually a little bit less painful than the first book, because from the start we said, let's try to keep it reasonably short. If someone is thinking about writing a book, I would always give the recommendation: try to aim for something short, it's going to get longer anyway, and try to finish it within two years. Otherwise, if it starts dragging out, it's too painful.

Alex: As an author myself, I must say that I think this is an excellent piece of advice.

Bernhard Scholkopf: Now we're working on another one.

Alex: Yes? Oh, what is it about?

Bernhard Scholkopf: About causal representation learning.

Alex: Okay, that's great. I didn't know about it.

Bernhard Scholkopf: Yeah, so it's still work in progress. Let's see, I hope we can finish.

Alex: Can you share something with the audience already about the book, maybe about the structure, the content, or the main ideas?

Bernhard Scholkopf: So it will cover, of course, some basics of causality, but we're also thinking a lot about representation learning. So, you know, nowadays, modern machine learning is about representation learning. I mentioned generative models. And if you analyze high-dimensional data, or in general in machine learning, as long as you're in an IID setting, with independent and identically distributed data, it is enough to just look at correlations, statistical dependencies, and exploit these.

But suppose you are in a setting where things change. Change could mean that the distribution changes; it could also mean that the variables that you measure change: today you see this set of variables, tomorrow you see another set of variables. All these settings, which occur a lot in the real world, from our point of view have to do with causality.
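A minimal sketch of this distinction, not taken from the episode: the toy model below (a hidden common cause `z` driving both `x` and `y`, with made-up coefficients and noise levels) is purely an illustrative assumption. On IID observational data the regression slope of `y` on `x` is a useful predictor, but once the distribution changes because `x` is set from outside, the same dependence disappears.

```python
import random

def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

def sample(n, intervene, rng):
    """Draw n samples from a toy model in which hidden z causes both x and y.

    If intervene is True, x is set from outside (independently of z),
    which changes the joint distribution of (x, y).
    """
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # hidden common cause (hypothetical)
        x = rng.gauss(0.0, 1.0) if intervene else z + rng.gauss(0.0, 0.5)
        y = z + rng.gauss(0.0, 0.5)  # y never depends on x directly
        xs.append(x)
        ys.append(y)
    return xs, ys

rng = random.Random(0)  # fixed seed so the run is repeatable
obs_slope = slope(*sample(20000, intervene=False, rng=rng))
int_slope = slope(*sample(20000, intervene=True, rng=rng))
print(f"observational slope:  {obs_slope:.2f}")  # near 1 / 1.25 = 0.8 in expectation
print(f"interventional slope: {int_slope:.2f}")  # near 0 in expectation
```

The predictor learned from correlations is fine as long as the data stay IID; it is the change of regime that exposes the difference between a statistical dependence and a causal one.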

And now when it comes to causal representation learning, there's this idea that you sometimes have high-dimensional data where the entities which should be modeled causally are not given to you. So you might see a scene with some objects in it. Nobody tells you a priori which pixels belong together and form an object.

So you have to somehow learn this. You have to learn it either by having a data set where things change, where sometimes someone moves the lighting or the camera position, or by getting to move the object yourself. So there are ways to violate the IID assumption that also give you a hint as to how you should represent the data in the first place.

And what are the objects or the symbols or the dimensional representations that actually should be scrutinized or should be learned? So we're trying to move the field a little bit in this direction and understand this better. And of course it has to do with the classical problem of causal discovery, but we think it's not identical to that.

So it's a little bit like this. One of the major advances of machine learning compared to classical AI is that in classical AI, we assume that the symbols are given, and then we think of algorithms for how these symbols should be processed. So your symbols could be the positions of the pieces on a chessboard and the types of chess figures that you have, and if these are given to you, you can think about clever algorithms to search through the game tree, et cetera.

But maybe they're not given to you. Maybe you just observe chess play and you get to manipulate the symbols and you see how people learn chess. And then it's a hard problem, maybe even harder than the actual chess problem, to identify these things in the first place and find the representation of the data on which you then can perform reasoning, learning, et cetera.

Alex: And how do you see causality entering this picture? Would you see it as another step going beyond just learning the representation?

Bernhard Scholkopf: When I think about intelligence or artificial intelligence, I actually like to think of it as a more general problem, which is not just about artificial intelligence, but also about natural intelligent systems.

Of course, the only real examples of intelligence that we have are in the animal kingdom, because those are really the examples where we have compelling forms of intelligence currently realized in the world. And if you then think about how humans or animals think, a very interesting metaphor to me comes from Konrad Lorenz, who was one of the fathers of ethology, the study of animal behavior.

And he at some point said that thinking is nothing but acting in an imagined space. Now, if you think about the current state of artificial intelligence or machine learning, generative AI, it's a lot about representations, we talked about that before, but usually they are statistical representations.

So you have some statistical dependencies in the data, and then you try to transform these dependencies into a useful space, maybe a lower-dimensional space or a space that generalizes to new problems, etc. But by and large, I think it's fair to say it's still about statistical representations. What is correlated?

How can we do large-scale pattern recognition? And in the end, I think if we're honest, most of the impressive things that we can do with AI are still large-scale pattern recognition, pattern matching. So if we want to move in this direction and make the representations interventional, if we want a representation that allows us to act in it, thinking as acting in an imagined space in the sense Lorenz meant, then the representations have to include a notion of intervention and action.

And I think that's moving things towards causality. So we have to find these representations, and we have to find them such that they permit actions in the representations. We also have to find ways of representing actions in the same space.

So we need some kind of working space. And I think biology is a good example for that. We have a visual cortex, where there are certain representations that are driven by signals from outside, not just from outside, but certainly strongly influenced by that. But we have not only that; we also have what biologists call efference copies.

So we have copies of the actions that our brain produces with the goal of affecting the world outside, or moving the eyes, et cetera. And if you have both these things represented, your actions and information coming from the world, then you start moving towards an internal model that allows you to simulate the world and simulate your actions in the world.

And that's a first step towards understanding thinking.

Alex: I recently had a conversation with Andrew Lampinen, who is a researcher at Google DeepMind. He works with agents and reinforcement learning, but he's also very interested in causality. And in our discussion, we had this thread where we also talked about biological systems, which you are also inspired by.

And we had this conversation that, at least for humans and some other animals that we know of, a large part of what we are learning might be correlational, right? So we do not always use causal models. Probably, from an evolutionary perspective, that would not be the most efficient way for us to function. But causal models are very important tools for us nevertheless.

We talk a lot today in the community about artificial general intelligence, agents that can learn by themselves and so on. What is your intuition or your hypothesis about how those systems should be constructed? Where should we look for the balance between correlational and symbolic or causal?

Bernhard Scholkopf: Yeah, I think that's a deep question. I think in the past, in the AI community, we have sometimes been a little bit too fast in dismissing stupid correlational learning. So I think what we're experiencing today is a very compelling demonstration of how far we can get with correlational learning.

So maybe that has to be said first. And it's quite possible that a lot of what we do is based on correlational learning, and there are certain things that the brain has to do very fast. Sometimes doing explicit causal reasoning might be more expensive, and you might have to be able to react fast if a predator comes or something like that.

So I think correlation learning is great. And maybe a lot of what we do should be about that. But then there's another aspect of biological systems, which is that our training data is finite. So we can't train on the whole internet. So the most compelling correlation learning systems, foundation models, large language models, of course, they use huge training sets.

They basically use the collected cultural knowledge produced by billions of humans. We are not in this situation. We have finite resources in terms of training data. We also have finite computational resources. We have a finite-size brain. We can't grow it beyond limits.

It takes a lot of energy. So I think biological systems have to be a little bit more clever about how they learn. Suppose you have to learn multiple tasks, and you have to learn them in a way that they work across multiple environments and changing conditions. I don't know, the light changes. Even if you just imagine how the color spectrum of the light changes from morning through the middle of the day until the evening, it doesn't make sense to build a separate object recognition system for each of these.

Let's say you want to eat an apple and you want to recognize whether it's ripe. From the physics point of view, the apple looks very different in the morning: the spectrum of reflected light looks very different in the morning from lunchtime. Still, it looks the same to us. So we have methods that process this data.

We have methods of color constancy, we have methods of gain control. And once we have learned such a method, of course, this is a module we can apply not just for apples, but also for pears and for recognizing people and for recognizing all sorts of things. And I think this is just one example of how in biology, once you have learned one module or one task, if your resources are finite, then you really try to reuse this somewhere else.

You probably learn in a more modular fashion than we do in modern AI, because there we're not forced to do this. We just make the system bigger and bigger, so we're not forced to be clever about modules. But then the interesting thing is, if you learn in a modularized fashion, and if at the same time it's true that the world is also composed of modules that play together, then there's this fascinating thought that maybe there could be a bias that the modules that we learn might have something to do, on a structural level, with what's going on in the world.

And it's on the structural level because, for instance, in our retina we have gain control mechanisms that allow us to exhibit some degree of invariance across a wide range of brightness in the world.

Now, this module, of course, has, in terms of physics, nothing to do with how the world generates scenes of various brightnesses. It doesn't know about the physics of the sun and, I don't know, the atmosphere and so on. But this module could play a role that corresponds to a module in the world. So that's why I think it's interesting to think about causal representations in terms of structure, in terms of structural similarities. So there could be some kind of homomorphism property.

Sometimes you have certain transformations in the world that form a group, and they might also be represented in a way that preserves the structure. And there's this second aspect that, for instance, was discussed by the physicist Hertz. He was saying that if we represent an object in our brain and then we think about the evolution of this object in thought, then the result of this evolution should correspond to performing the evolution in the real world and then representing the evolved object.

A special case is if you think about interventions. So I take an object, I look at the object; if I then move the object and look at it again, I should get the same result as if I first looked, then closed my eyes and just imagined performing the intervention in my brain. So that's a very different operation, right?

I don't really move the object, I just imagine moving it, but I get to the same result. So there's this commutative diagram going on, which captures this kind of consistency. So I think it's interesting to think about representations from this point of view.
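The consistency condition described here can be written down compactly. The notation below is an assumption introduced only for illustration and does not come from the episode: $W$ stands for states of the world, $R$ for internal representations, $e$ for the map that perceives (encodes) a world state, $a$ for an intervention performed in the world, and $\hat{a}$ for the same intervention performed in imagination. The array uses `\xrightarrow` from `amsmath`.

```latex
% Acting and then perceiving gives the same result as
% perceiving and then acting in the imagined space.
\[
  e\bigl(a(w)\bigr) \;=\; \hat{a}\bigl(e(w)\bigr)
  \quad \text{for all } w \in W,
  \qquad \text{i.e.} \qquad
  e \circ a \;=\; \hat{a} \circ e .
\]

% The same statement drawn as a commutative square.
\[
  \begin{array}{ccc}
    W & \xrightarrow{\;a\;} & W \\
    \downarrow{\scriptstyle e} & & \downarrow{\scriptstyle e} \\
    R & \xrightarrow{\;\hat{a}\;} & R
  \end{array}
\]
```

Read this way, the earlier remark about transformations forming a group becomes the requirement that the assignment $a \mapsto \hat{a}$ preserves composition, which is one way to make a homomorphism property precise.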

Alex: That's very interesting. And when we think about our own cognition, for instance through the lens of Daniel Kahneman, who recently passed away, we see that this ability to simulate the world might be very good in certain cases, and in certain other cases it fails.

There's a very interesting body of work by a neuroscientist called Donald Hoffman. I don't know if you're aware of him. He started by running evolutionary simulations to check how agents in those simulations would learn reality, right?

So he has some concept of reality, and then agents under evolutionary pressures in this environment. And it turned out that the agents that were perceiving the environment in an unbiased way were going extinct in those simulations. So his conclusion from the study was that perhaps being biased towards whatever is good for our fitness is a better evolutionary strategy than seeing reality as it is.

Do you think that this could be a good explanation of what we observe in human and animal cognition? And when we think about the simulations that you mentioned, could that be a reason why we are good at those simulations in certain aspects, but in certain other aspects, not necessarily?

Bernhard Scholkopf: Yeah, I think so. Simulations are something that's maybe relatively expensive, so I don't think we will do it in all tasks. But there are so many different interesting things connected to this. So maybe a simple case: let's say you are trying to hunt an animal, you have a spear, you're trying to throw the spear.

Then maybe you have some kind of mental model. You have some input parameters, the angle at which you throw it, the force that you put in, and you have some kind of mental model of how fast the spear is going to go. But maybe if you've done it thousands of times, you don't have to run this through the model anymore. But maybe if you teach it to someone else, if you teach your children how to hunt animals, you might first explain and show them some examples and tell them about the principles.

So there's also an aspect of communication, I think, which is interesting, especially when it comes to cultural learning. It's a field that I think we haven't studied much, but that's extremely important for human learning. And then there might be other tasks that are more complicated than throwing spears.

Let's say we want to work out what kind of food we can eat, and how we have to prepare the food so that we don't get sick afterwards. We might have models of these kinds of things, or we might eat something that does make us sick, and then we want to be able to go back and say, oh, let me think about what I have eaten before.

What could it be? This thing I've tried before, and I didn't get sick last time, but what I did differently was this thing. So there are also processes of credit attribution, not just in eating, but in many different problems. I think the general idea is, if you have an internal world model, you can learn without having to risk your life every time you learn.

You can also learn directly in the real world. And maybe many types of learning have this aspect and can be done much more cheaply by doing them directly in the world. But I do think there are some aspects of human learning that do benefit from having an internal world model.

And probably in the end, there's not a clear-cut separation between the two. It might be something that at the beginning you do explicitly: someone teaches you to brush your teeth, and they say, okay, you have to touch each side of each tooth and you have to rub the toothbrush or whatever.

And if you've done it a thousand times, it becomes an automatic thing. When you brush your teeth, you're not thinking anymore about what you're doing; it's an automatic thing and you don't need a world model to simulate. You're not focusing before you start doing it and making a plan for how you brush your teeth.

Alex: Yeah. On the other hand, if I have a toothache and I don't want to touch this one tooth, it's pretty easy for me to imagine how I should modify my action, right? Not to touch it. When I was reading Elements of Causal Inference, I had an impression that this is one of those books that is inspired heavily by physics.

I don't know how heavily, but compared to other books on causality, it's probably fair to say heavily. In this context, what is the basic physical perspective that can help us when we think about causality?

Bernhard Scholkopf: Yeah, so I think that's an interesting question. First of all, Dominik Janzing and I are both physicists originally.

We met while studying physics. We then later also moved into mathematics. So I guess the book, in terms of motivations, is quite inspired by physics, but it's also trying to be mathematically precise. I think it's fair to say it's closer to physics than most other texts about causality.

And from my point of view, I don't want to speak for everybody, you can describe or model systems on multiple levels, of course. In machine learning, we model them on the level of statistical dependencies, or maybe even just correlations.

And the other extreme is that we can model differential equations. So the gold standard in physics would be to say you have a coupled system of nonlinear partial differential equations. And if you manage to fit this to your data, or to somehow come up with such a system through experimentation, then that's the gold standard, because it allows you to simulate the system.

It also allows you to reason about interventions, about different initial conditions, etc. So once you have that, you can do everything with it. That's the best thing you could do from the physics point of view. And then the question is, what's in between? One way to think about structural causal models is that these are something in between, hopefully preserving some of the simplicity of machine learning methods, in that you can still learn things from data without having a full mechanistic understanding of the system, but at the same time allowing us to understand what's going on, allowing us to reason about at least a class of interventions.

So it's somewhere in between, but of course, in the end, it should be something that's consistent with the underlying physical reality as described more closely by the differential equation system.

So we're also quite interested in questions like: suppose you have such a physical system described by differential equations; under which conditions, and how, can you abstract that into a structural causal model, for instance, or other levels in between, other ways of capturing causality directly in dynamical systems?
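As a rough illustration of what such an abstraction can look like in the simplest possible case, here is a sketch under strong assumptions: a hypothetical two-variable linear system of ordinary differential equations (not partial ones), with made-up coefficients, integrated with plain Euler steps until it settles. Its equilibrium, with or without clamping `x` from outside, is compared against the structural equations `x := u_x`, `y := 2*x + u_y`. Nothing here is taken from the guest's papers; it only shows the two levels of description agreeing on one toy system.

```python
def simulate(u_x, u_y, do_x=None, dt=0.01, steps=5000):
    """Euler-integrate dx/dt = u_x - x and dy/dt = 2*x + u_y - y.

    If do_x is given, x is clamped to that value for the whole run,
    which plays the role of a perfect intervention on x.
    Returns the (near-)equilibrium state (x, y).
    """
    x = 0.0 if do_x is None else do_x
    y = 0.0
    for _ in range(steps):
        dx = 0.0 if do_x is not None else (u_x - x)
        dy = 2.0 * x + u_y - y
        x, y = x + dt * dx, y + dt * dy
    return x, y

def scm(u_x, u_y, do_x=None):
    """Structural equations read off from the equilibrium of the system above."""
    x = u_x if do_x is None else do_x  # x := u_x, or the intervened value
    y = 2.0 * x + u_y                  # y := 2*x + u_y
    return x, y

# Observational regime: both descriptions agree on the equilibrium.
ode_obs, scm_obs = simulate(1.0, 0.5), scm(1.0, 0.5)
# Interventional regime: clamp x to 3.0 in both descriptions.
ode_int, scm_int = simulate(1.0, 0.5, do_x=3.0), scm(1.0, 0.5, do_x=3.0)

print("observational:", ode_obs, scm_obs)
print("do(x = 3):    ", ode_int, scm_int)
```

The structural model discards the transient dynamics and keeps only what is needed to answer equilibrium questions under this class of interventions, which is the sense in which it sits between pure statistics and the full differential-equation description.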

So I think that from this point of view, the thinking is inspired by physics. We're trying to be consistent with physics. And also, in a lot of our work, one way to think of causality would be to say we think about mechanisms rather than about statistical dependencies. Mechanisms are one level lower; they give rise to statistical dependencies. But a mechanism is something that, from my point of view, is a physical mechanism, a physical process.

I think this is also consistent with how some computer scientists think about causality. I think Judea Pearl also fundamentally, in the end, thinks about mechanisms and tries to understand causality in terms of mechanisms. And so I would even view him philosophically almost as a physicist thinking about causality. And then a lot of our work is about independent mechanisms.

So coming up with additional assumptions that are not just mathematical or structural assumptions about causality, but that try to capture something about how causal systems are realized in the world, physically realized. I think that's also very much inspired by physics.

Alex: You talked about differential equations. In one of the recent papers that you co-authored, you proposed a new formalism for talking about causality based on stochastic differential equations.

What was your motivation for this work?

Bernhard Scholkopf: Yeah, so I think you are referring to the work with Lars Lorch. I think that's a very interesting direction. I think there were multiple motivations. One is that often in practice, if we look at causal systems, we do have time series data.

For instance, if you look at biological problems, you often have time series data. And the second thing is that often in practical causal problems, you can't guarantee that the system forms a directed acyclic graph. You might have loops. You can unroll the loops by extending things in time. And then the question is whether such causal models unrolled in time are still the optimal formalism for this or not.

So Lars was thinking about this problem and came up with an alternative formulation, which has a nice mathematical connection to kernel methods. He came up with a notion that he called the kernel deviation from stationarity. So I think it's an interesting framework to try to see how much of the causal formalism can be retained if we move to this more general, time-dependent setting.

What are two books that changed your life? Two books that changed my life. Uh, you know, it's hard to say whether these are, you, you mean science books or Oh, I leave it to you. Okay, let me think. So I, I, I'm quite influenced by the literature of Borges, the Argentinian writer, and I've been thinking about Borges, uh, for, uh, Well, ever since I discovered, actually, I think I first read about Bosch's in the book of Douglas Hofstadter is this book, Goethe and Robach, which was an old AI Bible.

And I read this before I even knew that the field of machine learning existed. And he, I think somewhere in this book, or maybe in one of the other books of Hofstadter, he also reprints a very short text by Bosch's, which is called Bosch's and I, and, uh, It's about Bosch's writing about the author Bosch's and he says, okay, there's this guy and I read about him in the newspaper and He's written some stories he's written some useful texts, but He somehow talks about things in a vain way that doesn't I I don't I probably don't remember in detail But it's a very interesting article that It sort of works out the difference between him as a, as an individual and him as an author.

And he starts seeing himself from the outside as an author for his published work. And then at the end of the text, he says, I don't know who of the two of us has written this text. So I thought this is very interesting. And then I started getting into Borges. So I think that's one book that influenced me.

A book that influenced me when I was a physicist was an interesting book by Hans Primas. It's called Chemistry, Quantum Physics, and Reductionism. So that's maybe one science book that I found very interesting. And then, of course, in machine learning, I was early on quite influenced by the books of Vapnik, who was my PhD advisor.

I ended up working with him through an exchange program, and I was looking around. It was an exchange program with Bell Labs, where Vapnik worked, and somehow I got my hands on a text that was about different people at Bell Labs who were working on things related to machine learning.

This is the web. Lots of interesting people there, I don't know, Yann LeCun, Patrice Simard, Isabelle Guyon, so these were all in Holmdel, and then a bunch of people in the Murray Hill lab. And there was Vapnik in the Holmdel lab, and I didn't know Vapnik before I saw this text, but then I noticed he had written a book, which was called Estimation of Dependencies from Empirical Data, or Statistical Data, I think Empirical Data.

And I looked at it and I was blown away, because I was always interested in this problem: why and how can we perceive structure, perceive and identify non-random structure in the world? And here there was a book by someone studying this mathematically, studying under which conditions we can identify dependencies in the world.

And then I think that even though the book is very dry and technical in places, I think the underlying philosophy influenced me a lot. And then when I came to Vapnik, maybe that influenced me even more. He was just in the process of writing another book about learning theory and called it The Nature of Statistical Learning Theory.

That was maybe even a little bit more philosophical. He was in the process of writing it, so I had to sort of proofread many versions of it and discuss many parts of that book with him. So in a way, I was lucky that I arrived when he was working on this, because that led to a lot of discussions and some understanding of how he's thinking about this problem.

So this book, which I got to know before it was even finished, maybe that was the scientific book that influenced me most. I don't know.

Alex: What's your message to the causal community? And in particular, where do you believe we should focus our research efforts today in order to move the entire field forward?

Bernhard Scholkopf: Yeah. So I think one simple answer is to say we have to come up with compelling applications. I think in this community we understand, on a philosophical level, that it's not enough to just model statistical dependencies if we want to understand the world. But I think to convince the broader community, we need compelling applications.

So I think that's the one message. The other message is, I think we have to really work at the interface between causality and generative modeling. Generative modeling is now a very hot topic in machine learning. And I think with some of the questions around generative modeling, especially when it comes to controllable generation, a lot of people are working on this and don't even know that this has something to do with causality.

So I think we as a causality community have to get into this also. We have to get our hands dirty and understand how to train high-performance generative models. So we shouldn't be afraid of neural networks. This is how these kinds of problems are currently approached, but we should use them in a way that connects to causality in interesting ways.

I think that's what I would encourage people to do, especially young students moving into the field. I think there are a lot of interesting things to do there.

Alex: Before we finish, I would like to ask you one more question. You studied physics and mathematics and dealt with very, very challenging topics in your career, and you did it successfully, at least as it looks from my perspective.

What would be your advice to people who are coming to complex fields like causality or advanced physics or mathematics? What should they focus on, or what skills should they train, in order to be successful in this field?

Bernhard Scholkopf: Yeah, one thing is to pick a good problem. And when you want to pick a problem, I think you want to pick one which is not already beaten to death.

So at some point, before I moved into machine learning, I was working on quantum field theory and algebraic quantum field theory, and it's an absolutely beautiful field. But, I mean, we're not going into that field. Many, many smart people had already been thinking about it for 50 years or more than 50 years.

I mean, you can do this, but then the bar is quite high if you want to contribute something interesting, potentially new. From that point of view, choosing the right area, I think, is an important aspect. But then once you've chosen it, I think you should not be afraid of going into depth. So if you are in an area which is not yet so much explored, then my experience is, almost no matter what you look at, if you go into sufficient depth, you find something intriguing and interesting.

So you just have to stick long enough with the problem and not be discouraged if it doesn't work immediately, because even if you don't solve what you set out to solve, you will find something interesting if you dig deep. So I would try to encourage people. That's also what I tell my students.

Do not be afraid of that.

Alex: That's beautiful advice. Thank you so much. It was a pleasure.

Bernhard Scholkopf: Thank you.

Alex: Thank you.