Causal Bandits Podcast

Causality, LLMs & Abstractions || Matej Zečević || Causal Bandits Ep. 000 (2023)

November 06, 2023 | Season 1, Episode 0 | Host: Alex Molak

Support the show

Video version of this episode available on YouTube
Recorded on Aug 14, 2023 in Frankfurt, Germany


Are Large Language Models (LLMs) causal?

Some researchers have shown that advanced models like GPT-4 can perform very well on certain causal benchmarks.

At the same time, from a theoretical point of view, it's highly unlikely that these models can learn causal structures. Is it possible that large language models are not causal, but talk causality?

In our conversation we explore this question from the point of view of the formalism proposed by Matej and his colleagues in their "Causal Parrots" paper.

We also discuss Matej's journey from the dream of becoming a hacker to becoming a successful AI and then causality researcher. Ready to dive in?

Links 

Should we build the Causal Experts Network?

Share your thoughts in the survey


Causal Bandits Podcast
Causal AI || Causal Machine Learning || Causal Inference & Discovery
Web: https://causalbanditspodcast.com

Connect on LinkedIn: https://www.linkedin.com/in/aleksandermolak/
Join Causal Python Weekly: https://causalpython.io
The Causal Book: https://amzn.to/3QhsRz4


[00:00:00] 

It's a double-edged sword, because there's a lot of wrong intuition that we have about causation. Hey, Causal Bandits! Welcome to the Causal Bandits Podcast, the best podcast on causality and machine learning on the internet. Today we're traveling to Frankfurt to meet our guest. He worked in robotics and wanted to become a hacker.

He's a passionate person and a dedicated researcher with interests in AI, philosophy, and neuroscience. He co-authored papers at the intersection of causality and large language models, graph neural networks, and more. A firm believer in humans and a passionate community leader. Ladies and gentlemen, Mr.

Matej Zečević. Let me pass it to your host, Alex Molak.

Hi Matej. How are you today? 

I'm doing fine. And how are you? Doing fine. Yeah. 

Beautiful. I'm also good, a little bit, uh, tired after the trip, as always, but I'm super excited about our conversation today.

Me too. I'm looking very much forward to this episode today.

You're a very successful researcher and a very successful community leader. You co-organized NeurIPS's causal workshop last year. Uh, you published or co-authored papers with DeepMind's Petar Veličković. Judea Pearl talked about your work on, on Twitter. How did all of this start for you?

Okay.

This, this is a lot to digest. I guess I'll start with the first. So, thank you for the compliment. I don't see myself that way. I'm trying to be, so if the perception is coming to this point, then I guess I'm doing something right. But in general, I'd say that all of this started really, we were talking about this, uh, just recently, with The Book of Why, Pearl's book, which sparked the flame of interest here.

Of course, I was already studying AI, in my computer science studies at TU Darmstadt. But yeah, that's, that's where it kicked off. And then from there, it's, I guess, just the combination of tools and, and, and tips and tricks and all the amazing people who have supported me all along the way, which has led to this point that I can use all of these things.

And, and I guess things like community and so on have always been important to me.

You talked about The Book of Why, which means, I understand, you focus on the causal part of the equation. But causality was not the first topic in machine learning, AI, or computer science for you, right?

Correct. So I actually started my, um, computer science studies, my undergrad, at TU Darmstadt, my bachelor's, um, and wanted to do computer security, IT security. And then really quickly, uh, Professor Jan Peters, who is a big name in robotics and reinforcement learning, convinced me otherwise, very, very convincingly.

And, um, this way I started in robotics a little bit, then it went into a bit of neuroscience, all kinds of topics related to intelligence. Um, and then, when I was in my master's, I suddenly stumbled upon this book, right? And, you know, a long tradition of statistics just got overwhelmed, and, um, yeah, that's, that's how we got here.

What's the common denominator for all of this, uh, security, robotics, and then finally causality? Is there some question that you had, some more basic question that you were trying to answer and find some answers to, looking into those three fields, or maybe more fields?

So if I take security, for instance, that's an interesting question, then I'd say it was really just about the coolness factor of, you know, um, an adolescent who's trying to be cool, trying to be a hacker, like they portray in the movies. Um, I guess with intelligence it's a bit similar, but with intelligence it really was more the foundational question.

Trying to understand oneself, right? Giving oneself meaning in a sense, right? Might be broad speaking now, but it's still, this is, I guess, deep down what motivated me. Um, but if you take now the subtopics in regards, say, uh, reinforcement learning, uh, I mean, related fields like neuroscience, cognitive science, continual learning, all these different aspects of learning, right?

Statistical learning theory. Then I think that the common thing is really trying to, to figure out what intelligence is, get it, you know, operationalized. But coming from different perspectives, right? So I, I always felt [00:04:00] that reinforcement learning was closer to robotics while, uh, you know, neuroscience, cognitive science, they are closer to the human, but all have the same ultimate goal.

So, so there was a personal question behind all of this in a, in a sense. 

Yeah, I guess. So, um, as you said, uh, I'm actually, uh, generally passionate about everything. And I guess that just stems from my person, genetically, ethically, uh, and also ethnologically, and all these aspects, right, which make a person.

So, um, I've always been very passionate about all these different things. And, uh, I guess, the, the natural curiosity of a scientist, maybe, yeah, that's really what sparked it for me. And given that intelligence is by no means resolved, it's not really even defined, right? We all have some kind of intuition, I guess, for me, the best definition is actually that, um, We are our own example of it, right?

So, so whatever we are, and that's how I also go about causal inference, right? That, that, that's the key argument for me why it's [00:05:00] necessary. Not that it's sufficient, but it's necessary. 

Necessary, but not necessarily sufficient for? For what? For intelligence. For intelligence. Yes. So reasoning causally would be this necessary, but not a sufficient element.

That's my personal belief, right? It could also be sufficient. I guess if you ask someone like Judea, he would probably say it's also sufficient. Um, to me, it's definitely necessary, but not necessarily sufficient, because there's all these different topics which are kind of hard to grasp, like emotions and things like that.

Regarding your curiosity, what fascinates you the most, or what do you find the most promising direction in causal research today?

That's a difficult one. So I always think of it as these two strands, right? So... as you know, I also organize this discussion group, by the way, please check it out if you haven't so far.

Yeah, if you're interested. We'll, we'll give you a link in the, uh, show notes. It's the weekly causality discussion group. And, and we had a session by, uh, Marcel. He's from, I think it was Lübeck, the University of Lübeck in the north of Germany. They do some amazing work. Shout out to Marcel. And, um, he was actually, uh, talking about this, that there's like causal discovery and then there's this inference part, right, where the inference part is more about modeling assumptions, about,

You know, having sound conclusions based off what you have and what you have is usually then assumptions on the graph. While causal discovery is, uh, the problem of getting that graph in the first place from data, right? It's, it's more machine learning in a way, yeah? And, um, these are kind of like the two, two main strands.

And so research is happening there and it's super important. Uh, I think what is missing though is kind of the, the bridge between the two. And also what is missing and what I'm trying to do with my own research actually is, um, more philosophical in nature, less scientific, more philosophical in the sense that, uh, have we asked all the questions, right?

Like broadening the horizon, looking outside of the box, right? And, and yeah, just, just raising questions we might have not asked before. [00:07:00] And obviously that goes with, you know, questions that might not be of any relevance, but well, that's where you try, right? And, and maybe something, uh, will actually be very relevant.

So I think it's about interfaces, about broadening the horizon. These are missing things. Um, and apart from that, I think, especially in machine learning now, it's, it's very much about this discovery part, right? Learning representations and things like that. Um, to me, actually, a little bit what is missing still and which I personally find very important is abstractions.

So there's been this work by Sander Beckers, also shout out to you, Sander, um, and, and Rubenstein and all others who pioneered in this direction, which is kind of analytical-philosophical, so very theory grounded. Um, and there are first attempts. So also the group of Moritz Grosse-Wentrup in Vienna, they are really pushing for this.

But still, it's kind of a bit less. Yeah. Also, what I would maybe mention as a final point [00:08:00] is, um, the connection to logic. So, as you said, we organized this NeurIPS workshop last year, 2022, and it was titled, uh, Neuro-Causal and Symbolic AI. So, neuro was this part for neural networks, deep learning, all the modern stuff.

Causal for causality, and symbolic for all the, well, logic-based work, symbolic work, as we say, the GOFAI, the good old-fashioned AI, um, and, and neuro-symbolic stuff, which is kind of the intersection of these two worlds, talking to causality, because I really feel like it's two sides of the same coin, as we say. Um, and there are groups.

So Thomas Icard at, uh, Stanford, he's actually working with his students on these topics. Uh, again, very grounded in formalism and theory. Well, that's from the logic side of things, of course, and causality wouldn't be much different though. Um, but yeah, that's, that's also a bit missing actually, because, uh, yeah, uh.

I guess there's a lot, as you can tell. 

You mentioned abstractions. Can you [00:09:00] tell our audience a little bit more about abstractions? Why would they be important in causal reasoning? And where would be a place where we could find them useful today?

So, to me, as a computer scientist by training, abstraction is like a key concept.

If I think of abstractions in the causal sense, so if I think of this original paper in 2017 by Rubenstein et al. at UAI, um, it was not even called abstractions. It was called, like, I mean, in the work, I think they used the word abstraction, but the title wasn't abstraction. It was something like the causal consistency of structural causal models, right?

So you look at two different models, two different structural causal models, which is the centerpiece of Pearl's formalism, and you try to equate those. Under which conditions can you, can you make them equal? But in a sense that you have different levels of abstraction. What does that mean? So, you have low-level and high-level variables.

They give this [00:10:00] very nice example of something like, uh, cholesterol in your body, right, the total cholesterol, that would be a high-level variable. And actually, if we look more closely into the biology, then, you know, people know that there are these lipoproteins, right, HDL and LDL, and they are kind of composing this total cholesterol.

And actually, only one of them, in a sense, is bad for you. And so, these would be low-level variables, and now you try to equate them to high-level variables. And the tricky part about the causal stuff is that, because Pearl's formalism is interventionist, right, so you have interventions, and, um, you need to make sure that, you know, you don't just equate the variables correspondingly, in this case LDL and HDL, as a sum, to the total cholesterol, but also that you respect interventions which are possible, right?

Um, and that's abstractions in a causal sense as they define it. And then, you know, Sander Beckers generalized from there with his co-authors and so on and so forth. Um, and just thinking maybe on an intuitive level, broadly speaking, well, we [00:11:00] always talk about, you know, graphs, graphs, and stuff like this, graphs consist of, you know, variables that are somehow connected to each other.

But where do, like, we always ask, where do the graphs come from? And I have a counter question, like, where do the variables come from, right? What is even a causal variable? I remember, uh, One researcher from Amsterdam, Taco Cohen, very famous also, I'd say, in the geometric deep learning, uh, community. And he's also doing, uh, uh, more research in causality nowadays, and he was also asking, uh, on Twitter actually, like, what is the causal variable to begin with, right?

The concept somehow didn't make sense. And I think this is where abstractions, and maybe also just... The whole question paradigm of representation learning or representations comes into play, right? Like that's why it's important. We, we, we are able to, to look at things at a different scope, um, and, and, and still capture their characteristics.

And, and that's why I think abstractions are important, because that's exactly, it's a topic, the study of exactly those things. And it's the first step [00:12:00] essentially, right? So think of an autonomous system, right? Like, we also make sense of some kind of abstractions when we interact with things. We say, oh, this is, this is a car.

Oh, this is a whole set of cars, which forms a congestion, as we were talking about in the beginning of the episode. Right?
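To make the cholesterol example a bit more concrete, here is a minimal, hypothetical Python sketch. Everything in it, the variable names, the coefficients, and the simple sum used as the abstraction map, is invented for illustration; it only conveys the flavor of checking that a low-level intervention and its high-level counterpart agree under an abstraction, not the formal consistency condition from Rubenstein et al.

```python
import numpy as np

rng = np.random.default_rng(0)

# Low-level SCM: LDL and HDL are exogenous; a downstream "risk" depends on LDL only,
# mirroring the remark above that only one of the lipoproteins is bad for you.
def low_level_sample(n, do_ldl=None, do_hdl=None):
    ldl = rng.normal(130, 20, n) if do_ldl is None else np.full(n, float(do_ldl))
    hdl = rng.normal(55, 10, n) if do_hdl is None else np.full(n, float(do_hdl))
    risk = 0.02 * ldl + rng.normal(0, 1, n)
    return ldl, hdl, risk

# High-level variable: total cholesterol, obtained via the abstraction map tau.
def tau(ldl, hdl):
    return ldl + hdl

# Informal consistency check: abstracting *after* a low-level intervention should
# match the corresponding high-level intervention do(T = 150 + 50).
ldl, hdl, risk = low_level_sample(10_000, do_ldl=150, do_hdl=50)
print(tau(ldl, hdl).mean())   # -> 200.0, i.e. agrees with do(T = 200) on the high level
# A full consistency check would also require that downstream effects (here, risk)
# match between the two levels for every allowed intervention, not just this one.
```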

Yeah, so, so we can have different levels of description. I remember that in Pearl's book Causality, not The Book of Why, but the bigger book, the tome on causality, as I call it,

he has a section, I think, um, which is a transcript of one of his talks, and he discusses Bertrand Russell's argument against causality, and he, uh, and he talks about this idea that causality could be a, a convenient shortcut. Do you feel that this idea of abstractions and those questions regarding causal variables are related to this idea of causality as

a shortcut?

Can you maybe just elaborate once more on a [00:13:00] convenient shortcut with respect to what? Thank you. 

So there's Russell's argument against causality. And, um, in this context, Pearl says that, hey, maybe causality, in a sense, like, uh, on, on the fundamental physical level, if I remember correctly.

So don't, hey, don't shoot me if I'm missing something, it was a little bit, uh, some time ago when I read it. But yeah, he says that maybe we can also, maybe we can think about causality as a useful shortcut, as we do in physics sometimes. Right? So we have some reasoning in physics and we say, like, hey, this is just a useful form of thinking about reality, um, that, that brings us value.

So in this sense, I felt like, when you talk about causal variables and levels of abstraction, it doesn't make sense, um, to talk about a single screw in your car, in any sense, right, if we talk about the properties of [00:14:00] traffic in a large city, because this is an emergent, complex system, at least we could think of it

this way. Um, so, so that was the connection that I had in my mind. I was wondering if this is something that also rings a bell for you.

Thanks. So, so that I understand the point. Uh, so there's two things which come to mind. So, first of all, I think what was already implicit in what you just said, um, you know, you don't care about a loose screw, you know, a single loose screw in your whole car.

If we look at a larger system. Um, what is hidden there is that it's always with respect to something, right? Whether it's measurable or not, but it's with respect to some, some aspect. Uh, the second point was actually, um, about, uh, an argument by, by Bernhard Schölkopf and others, right? So Jonas Peters, Dominik Janzing, Bernhard Schölkopf.

They had this book, Elements of Causal Inference, MIT Press, 2017. And, um, they, so Bernhard, for instance, and Dominik, they are physicists by training, actually, just like Judea, although Judea, I guess, went more into the [00:15:00] computer science direction eventually, um, and, and they have this table in the book, actually, where they talk about differential equations, right?

And kind of the specifics of the physics of the system as being, like, the most fine-grained and most encompassing, um, level of abstraction, essentially, of, um, how we can, you know, use formalism to capture things about reality, and causality was the second rung on this one. So it was what they would call a useful abstraction, right?

So it was kicking out all the unnecessary detail, let's say. I always compare this to model-based and model-free reinforcement learning. So there's this prime example of, you know, a baseball player hitting a ball going at high speed with a bat. Um, that person is not calculating on the fly, you know, probably, um, how they are going to hit, at which angle and whatnot, right?

It's on an intuitive level, right? And so that's the argument for model-free reinforcement learning, that you don't have to have a model, right? You just [00:16:00] need to know what to do in the next step, what's the best action, right? And so in that sense, if you then, you know, go to the second rung where causality is placed, you kick out all these details

of the model which are not necessary, but which are still sufficient and necessary to actually answer your hypothesis, right? I actually also always compare this to neuroscience. So, so my little stint in neuroscience was in a subfield called connectomics, where they are concerned with actually building a map of the brain, right?

The connectome, and it's a huge endeavor. It's on the neuron level. So you get all the, you know, the somata, the axons, and, that's a very detailed, yeah, the dendrites and everything, right. So it's, it's like on the micrometer scale, right, nanometer scale, actually. Um, and, uh, the kind of base idea, as far as I understood, always was that, um, it might be too detailed, but it's certainly sufficient to capture [00:17:00] everything, uh, about the brain's functioning, right?

And, and that was always the idea there, right? And, and so, again, it's about, like, your hypothesis space and how you want to cut it down. And so... Peters et al. also make the argument that causality is a useful abstraction. And so I think they agree with Pearl, and I would also agree with Pearl. So physics might still hold other interesting abstraction levels and so on.

But then again, that's, that's the choice, right? The choice you make and let's see. Eventually, if we go down the road, we'll find out if we are successful or 

not. That's very interesting. You mentioned reinforcement learning, and in particular model-free reinforcement learning, where we maybe don't care about having a model that much, but we are interested in understanding what is the next best thing to do.

And this brings, uh, to my mind, the discussion that is happening all over the place now, um, about large language models. What's your position on this? Where do you find yourself in the LLM debate, [00:18:00] especially from the causal point of view?

Yes. So, on the general topic, right, um, also if I reflect on what my advisor Kristian Kersting would say, it's that, um, maybe we should start, you know, being more transparent about this and actually giving our thoughts even in the papers, right?

So, I mean, certain conferences do similar things already. So, for example, NeurIPS, right? It's actually doing, you know, these ethical statements or societal statements and so on and so forth, which you should include in the papers. And I guess for most of the papers, uh, nothing is happening there. Of course, if it's LLMs, then surely that's a big topic.

Um, and maybe we should also start talking about like our philosophical grounding, right? Like, where do we come from as, as, as, as persons, right? As, as the people, the human beings behind the science, right? Just in a brief, like, self summarizing way, right? So that we can understand, like, uh, am I a supporter of the scaling hypothesis with LLMs, right?

Or not, right? So, um, I think this would be useful. I mean, I'm always against, you know, labeling people into [00:19:00] boxes and stuff like this. I try to get out of boxes myself. Um, but I think that will be super useful, actually. Uh, this was on the general topic. For the LLM topic, I'd say that I'm placing myself personally as, um, an advocate of LLMs.

I, I like what they are doing, as I mentioned earlier. Um, but then again, um, from a causal perspective, right, I'm really just trying to understand, are they causal or not, right? So what we have seen, certainly, um, I mean, many of you have experienced it yourself when you're using something like ChatGPT or GPT-4, right?

Um, but also we, in, in our, uh, setting of evaluating these models empirically, um, trying to be clever about the ways we formulate, you know, queries that could, you know, have some kind of causal implications or give some causal insights, we find that, you know, sometimes they are actually performing pretty well, right?

So actually GPT-3 and, and other models were performing not so well, but then you go to GPT-4 and suddenly, well, it's performing well, right? Might be that they [00:20:00] already used the prior version of our paper in their training data now, you know, these things can happen. Um, but even without that, even if it's just the improvement part, right?

It's actually incredible. And then you're like, okay, so when it gets it right, why does it get it right? And so that's the question we were asking. That's what we were investigating.

In your paper, you propose this interesting formalism that you call the meta-SCM. You mentioned three different languages, formal languages, that can be used to describe the three rungs of the ladder of causation, um, that Judea Pearl has proposed in his original work.

What are your thoughts about this formalism? Or maybe let's take a step back. Can you tell our audience a little bit about how to understand those meta-SCMs and what it is about? What is this idea about?

So, so I think the key aspect to understand, and it's a lot easier to understand first than the meta-SCM, is this:

correlations of causal facts. That's the conjecture we propose, right? Conjecture, because again, it's a [00:21:00] hypothesis that we believe to hold true, although we don't have proof, right? The best we could do so far was empirics, right? And, you know, proposing a theoretical grounding in Pearl's formalism, which would explain this, right?

Um, but no definite proof. So, so this is open, an open problem. Um, and this idea really stemmed from some intuition Kristian also had, um, and then, you know, Moritz and myself, we picked up on this. It was that, well, let's suppose they only learn correlations, right? Then, for them to answer some causal questions correctly, that would imply that there were some correlations on these causal questions and causal answers, right?

And then again, if we think about it a little bit on, on this intuitive level, a bit further, following our, our nose, in German we would say, our Riecher, huh? Then, um, it's something along the lines of, um, you know, I ask the question, does altitude cause temperature, in the sense of, if I [00:22:00] go up a mountain, does it maybe become cold?

And the answer is yes, right? And then if you have a textbook which is talking about the physics of these mechanisms, right, then you can sure bet that, you know, if there's a question formulated, then the corresponding correct causal answer, at least in our world, there might be different worlds where, you know, the causal direction is reversed, holds true, right?

So, and, and, and, and, and following this intuition, essentially, we just state that, well, LLMs are actually training on, on, on such causal knowledge, right? Um, and, and, and just by thinking about it from a different perspective, it seems to make a lot of sense intuitively, because We do experiments, right? We find out this mechanism that, you know, the molecules start moving more slowly as you go up and so on and so forth.

That, um, yeah, there is a link from altitude to temperature. And then we write this down, right? This is like how knowledge is being passed on from generation to generation. We have Wikipedia. It's an [00:23:00] encyclopedia, right? For all kinds of different knowledge. And actually, we found this example, this very example, we have it in our paper.

Um, about lapse rate and, and these kinds of physical concepts, uh, you have articles on them. So the knowledge is there, right? And if we learn to predict the best next word, well, if, you know, I'm talking about, you know, altitude and temperature, um, then causing is probably the better word than not causing, right?

Or formulation, for that matter. And so this is where this intuition comes from, right? And, and also then, with our formalism of the meta-SCM, to be a bit more technical, but for the technical details, please go into the paper. Um, and so there the idea is, uh, well, it's, it's this, it's this idea, but conceptualized and formalized a little bit differently.

So what we are saying is, well, we have these rungs, right? And rungs two and three are considered causal, in, in the sense of

interventions and counterfactuals? Correct. [00:24:00]

An SCM, though, is a very general formalism, right? Even if we, you know, or, or especially if we look at work by, for example, Bongers and Peters and others, who have, you know, looked at cyclic causal models, then they have really captured what, what we mean by a structural causal model, right?

And still, even in their formalism with the measure spaces and so on and so forth, it's very general. Again, we, you know, as humans, we do experiments, we get some causal insight, and then, you know, save it in some textual representation. And while, you know, we can have this debate of understanding versus knowing, right, which is a whole philosophical thing, you know, related to the Chinese room argument by John Searle, and so on and so forth.

Um, it's already this aspect of... um, that we have this knowledge saved there as a representation, and well, this could also just be a variable in our SCM now, right? Our SCM has a variable whose domain is all these statements, right? [00:25:00] Or just textual representations, and specifically it's these statements, right?

And so we just conjecture that essentially LLMs are training on correlations you can find in the training data of these causal facts. And so now if you look from an SCM perspective, from a formal perspective, it's essentially always a dance between two models. One is the regular SCM, right, which is something like, you know, altitude, temperature, and so on and so forth.

And I find out that there is a link from variable A, which represents altitude, to variable T, which represents temperature, right? Um, again, an SCM is a more general formalism. It implies a graph structure. I'm talking about the graph now for the moment. And now imagine I have a second model, and this is what we call this meta SCM, which is kind of a hierarchy level above, right, which is, um, talking about this insight, about this whole graph itself, about the assumption, right?

That altitude that A causes T. And that's why we call it meta, right? Because it's [00:26:00] an SCM, which is on a meta level, on a level higher than regular SCMs talking about other SCMs. So if you depict it graphically, in a sense. Uh, you could say that, you know, you have these different rungs for the SCMs, but one is just shifted above, right?

And so the L1 of this one is connected to the L2 of the other one.
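To give a rough feel for the "correlations of causal facts" idea, here is a deliberately crude sketch, not the meta-SCM formalism from the paper: the sentences, the word-counting "model", and the query are all invented for illustration, and a real LLM is of course vastly more sophisticated than counting co-occurrences.

```python
from collections import Counter

# Base level: the physical fact (found by experiment) is that altitude -> temperature.
# Meta level: text in the training corpus *states* that fact, over and over, in
# slightly different phrasings. A purely correlational next-word predictor can then
# answer the "causal" question by exploiting correlations among those statements,
# without ever intervening on altitude or temperature itself.
corpus = [
    "higher altitude causes lower temperature",
    "as altitude increases the temperature drops",
    "altitude causes temperature to fall with the lapse rate",
    "temperature does not cause altitude",
]

# A crude stand-in for a language model: score completions by co-occurrence counts.
def answer(query_words, options):
    scores = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        if all(w in tokens for w in query_words):
            for opt in options:
                if opt in tokens:
                    scores[opt] += 1
    return scores.most_common(1)[0][0] if scores else "unknown"

print(answer(["altitude", "temperature"], ["causes", "correlates"]))  # -> "causes"
```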

I was curious, did your, let's say your prior regarding the, the conjecture of meta-SCMs, has it shifted after GPT-3 was released?

No, I'd say no. No? So if I elaborate a little bit on this, so yeah, I think my personal belief, right?

About, I guess, large language models. I'm certainly biased by, you know, I guess the nature of causality being a symbolic thing, but also then, you know, the perspectives of my advisor and my colleagues. Mm-hmm. So I'd say, still, personally, if I ever have to take a stance, I would say, like, okay, for me it's already [00:27:00] this conceptual difference, right, between the textual representation of knowledge and the data and the experimental side of things, right?

So, again, we also have this example, it's like this intuition, this non-formal part of the argument in our paper, because, well, it's in part philosophy. And so, you know, a physicist has their setup, right? Like, we have the setup here with the microphone, the camera, and everything. And we record stuff, we measure stuff.

And then, you know, we look at the data, we think about it, and we conclude. You know, something that comes out of it, right? It's like the symbolic regression, as we call it in AI, that we are doing. Um, but then we write it down in textbooks, right? Like we have one beautiful book here as well, yeah? And, and, and that's now knowledge that, you know, if I trust the source and assume it's correct, essentially, and, and let's say for the matter it's actually correct, right?

Then, well, if I learn this fact [00:28:00] now, and someone asks me a question in a test, Whether I learned it myself by doing the experiments as the author did, or whether I just learned it from the author, doesn't matter anymore, right? So, the behavioral aspect is not, it's indistinguishable, right? It collapses to one point, essentially.

Um, and so...

This is complex, because we also have the entire societal structure around this, right? To say that I trust the source, there are a number of conditions that need to be met. And we have heuristics to, to reassess our, uh, our trust in, in somebody's, uh, statements and so on and so

on. Right? Exactly.

And that's why it's a philosophical topic. It's a deep philosophical topic. Um, but as we see, uh, even AI and LLMs are not unfazed by philosophy. 

Yeah, definitely. Um, so what are your thoughts about [00:29:00] scaling laws? So, we can see there are some papers from Microsoft Research teams, uh, two teams, actually, published papers on this, uh, showing that those larger models, in particular GPT-4, you know, can do a pretty good job in causal, I don't know if you can call it reasoning, but they can answer causal queries in a way that seems

relevant to us, and much more so than the previous generations of those models. So one obvious path is to think about the size of the training data and how many of the testing procedures, that's also your hypothesis in the paper, could have been included in the training data of GPT-4. But it seems that GPT-4 is also just doing a better job in many different tasks that require some form of world model.

Or, or at least we could think that they require some form of the [00:30:00] world model. What are your thoughts on this? 

I think I want to kick this off with, uh, just, um, talking shortly about one aspect, which is important to me, um, from Emre and Amit's and the others work. So in their work, right, they, they looked at LLMs and their causal powers, right.

And essentially concluded also something like that on the Tübingen dataset, uh, you know, they perform very well. Um, and I'm very critical of this, right? So, um, I'm positive about the fact that, yeah, sure, I mean, I believe these results. I think they're reproducible, and these are scientific results. But I don't agree with the implications, or at least, you know, the way I've perceived them.

So... 

You mean the conclusions that were proposed in the paper? Correct. 

Essentially, I had the impression, at least also from my Twitter interactions with Amit, that, um, we have seen some high, relatively high, uh, accuracy on, on this dataset. And so now we kind of, uh, [00:31:00] say, like, wow, this is incredible. Um, but actually, if you dig down and look into this dataset, so the Tübingen pairs dataset, I don't know, it's a bit more than a hundred, I think,

uh, different pairs of X, Y variables. So it's a bivariate dataset. The task is to conclude whether X causes Y or the other way around. And, you know, while there are, you know, examples which are very clear, uh, something like altitude and temperature, for instance, it's actually in that dataset, and, and there's also the Deutscher Wetterdienst data, which is, you know, for different places, the temperature and the corresponding altitude recorded.

But then there's very obscure pairs. So I recently gave a talk actually at the Paris Workshop on causality. Mm-Hmm. Uh, causality in practice. And, um, there I was mentioning this, so I have it on my slides, so I don't know the details right now of which one it is. Right. But it was very obscure. So, so there was something like, um, I don't know, the, something produced by some, you know, uh, object X at, at time y [00:32:00] that was just one variable.

Right. And then the other variable, I don't even know what the concept was. And, and then surely, for such a thing. Um, especially if, say, there are names in there, right? Like, just like in the famous Bayesian network repository, there's this Bayesian network, the earthquake diagram, where it's like whether, you know, John or Mary will, you know, uh, call the, the firefighters if there was an earthquake, right?

Well, uh, the LLM does not know who John is, right? So as soon as we have this, I think this invalidates it completely, right? But in general, these concepts were just so obscure that, um, well, it's, it's just a guess, and it's a binary guess, right? So either X causes Y or Y causes X, right? And then again, we're also just looking at accuracy, right?

And so if you look a bit more closely into the details, you start to realize that, um, well, this doesn't mean anything, the output that I get now, right? Sure, it's, it's cool to see that it works [00:33:00] quite well, but I think that's the only, only point that I want to emphasize, right? That, um, we should not base it just on such tests, on very simple metrics like, uh, accuracy; we especially have to consider what data we are basing it on.

Right. This is, uh, for me, always an important concern. Just like someone who's in applied machine learning, uh, will tell you that, uh, working with the data, preprocessing, cleaning, and, and things like that, right, is usually the, the biggest challenge, right? Of course, with deep learning the hope is, right,

that, you know, we have so much data that, you know, it just mitigates itself. Still, biases will exist, but that's, you know, a human problem actually. So, uh, yeah, to conclude that question, that was my only, only point on that one.

And regarding the, the other tasks that were used in those, in those papers,

so there was, like, this counterfactual reasoning benchmark and so on and so on. Would you, would you say that this kind of argument also applies there?

Uh, no, I was just specific about the Tübingen one. [00:34:00] About the Tübingen one. Exactly.
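To put the "it's just a binary guess, and we only look at accuracy" point in perspective, here is a small back-of-the-envelope sketch. The pair count of 100 is an assumed round number (the actual Tübingen benchmark differs slightly), and it only shows how surprising a given accuracy would be for a pure coin-flipper; it says nothing about why a model gets pairs right.

```python
from math import comb

n = 100            # assumed number of cause-effect pairs, for illustration
p = 0.5            # chance level for a binary X->Y vs. Y->X guess

def prob_at_least(k, n, p):
    """Probability that a random guesser gets at least k of n pairs right."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

for acc in (0.55, 0.60, 0.70, 0.80):
    k = int(acc * n)
    print(f"P(random guesser >= {acc:.0%}) = {prob_at_least(k, n, p):.4f}")
# High accuracies are unlikely by pure chance, but the metric alone still cannot
# distinguish causal reasoning from memorized names or lucky guesses on obscure pairs.
```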

Great. So in this, in this context, and in the context of your own work and also the, uh, the updates to your, to your recent paper, uh, what do you think about scaling laws, going back to the original question?

Yes. Yes. 

So they are kind of part of this whole discussion, right? So if you're, like, a connectionist, or in general, like, in neuroscience and functional connections and things like that, you know, on, on average, there are like a hundred billion neurons in our brain. That's a lot, right? Um, and each of these, and this is now actually the most impressive part, because actually, if it goes just by cell number, you know, your, your liver would have more cells, actually, right?

And, and also some other mammals would have more neurons, right? But we consider ourselves, I guess, more intelligent. So, um, it's not just the number, but the number is definitely a crucial factor; it's actually the connectivity. So, on average, each of those neurons has like a thousand connections to other neurons.

[00:35:00] And it's a big social network, and, uh, that's kind of the grounding also for, because these are neuroscientific results, right? And this is the grounding for, you know, something like the scaling hypothesis, that essentially the models, if we put it into relation, they are not quite there yet, and scale is all we need, right?

Uh, there's this funny meme also, which, uh, uses also the bitter lesson by Rich Sutton, who's one of the, uh, pioneers in reinforcement learning, and it just goes like, uh, GPU go brr. And it should kind of depict that, you know, it's, it's just going like, yeah, GPUs and, and, and put them to the test, get the temperature up and overheating and then, and then you'll get there.

Um, my personal take is, um, I guess also biased just by the causality side of things that, um, with the, uh, symbolic part of, of the equation, um, which means that, uh, I believe that we still need ingenuity and, uh, conceptual development and that it's going to be a combination of both. And that scale is [00:36:00] certainly necessary.

I mean, we, as humans are certainly an example of that. Um, and also if we look at something like the neocortex, right, like the outer layer of the human brain, and then how it's just like even twisted, and that just gives you more highways, essentially interesting connections that you can form. Right. So connectivity is super important and we have seen so much success already with deep learning.

So, so why not just push it further? Um, but in my humble opinion, it's definitely a combination of both, right? Just like, given that I see the connections to the logic parts, as I was saying, like neuro, causal, and symbolic.

In your paper, you also mentioned causal models in the context of black boxes,

the black boxes of, uh, contemporary neural networks. We often talk about people, that people are good at causal reasoning. I'm a little bit skeptical about this, but this is a view that appears here and there in [00:37:00] the community. From the psychological and neuroscientific point of view, we have pretty good evidence that humans are sometimes black-box reasoners for themselves as well.

Do you think that being white box is necessary for causal models and, and if so, why would 

that be? Very interesting question, deep question. So I'll have to think for a moment. So maybe I'll start with the human aspect. So I think I agree that, um, Humans might not always be as good as they might think themselves to be in causal reasoning, um, or in general in causal reasoning.

So, of course, naturally we are good in something like a personal experience. So, so my bike got stolen when I was still studying. Um, it was partly my fault, because I know that the chances rise when, when it's nighttime. Yeah. Um, and I still, still left it out all night. I stayed at the university the whole day. [00:38:00] Um, and then when I arrived, it was a rainy day, and my bike was nowhere to be found.

Um, but yeah, in that moment, what I say is, ah, damn it, right? If, if I had come sooner, my bike would still be here, as a counterfactual, right? Those are also the examples you'll find in The Book of Why, for instance. So in that sense, we are already very good, right? But if we think of policy and more complicated topics, right, then it does not feel that agents, human agents, act causally, right?

Um, if we go back to the, to the other point, right? So, um, could you just repeat maybe once more the, the key aspect you want to discuss here?

Yes. So I was, um, I was thinking about, yeah, um, humans as not necessarily white-box reasoners in general, and in the causal sense as well. And maybe to expand also a little bit on what you mentioned, there were many interesting experiments in this area, but one that I had in [00:39:00] mind when I thought about this question was one by, uh, Michael Gazzaniga, uh, who was, um, experimenting with patients whose

hemispheres were disconnected. And so these patients were sometimes primed through just one eye, because eyes, as you know, but maybe some of the listeners are not aware of this, um, we have this crossing over in the brain. So the left eye goes to the right hemisphere, the right eye goes to the left hemisphere, and so on.

And if we destroy the connection between those hemispheres, which are connected by a part called the corpus callosum, then those hemispheres are largely independent. And so the researchers were priming the participants of the experiment by showing some object to just one of their eyes, and then they were asking them to tell a [00:40:00] story, or maybe, I don't know, asking them for a reason why something happened, and they were making up very plausible, very plausible explanations, causal explanations, of why something happened, while the true reason apparently was just that they were primed, but they were not aware of this.

So I don't want to go into neuroscientific details like why they were not aware and so on and so on. But that's the, that's basically, basically the case. So this means that we might be very good in coming up with causal explanations, but they are not necessarily, they might not be necessarily relevant.

And, uh, sometimes we might also have good explanations and not be aware where they are coming from, and they might be relevant. Um, so my question was, given all of this, and given that we, as a machine learning or AI community, are looking up to how humans, uh, [00:41:00] function in all these different, different areas,

how necessary do you think white boxes, versus black boxes, are for causal machine learning, for causal AI, uh, to be useful for us?

So I think it's, um, almost implicit also in what you said now that, uh, white-boxness and explanations are kind of intertwined, but they're still independent concepts, right?

So, what we seem to do as a community is that we want to be white box because... We want to be able to understand these systems, right? Then we expect that from a white box system, we can actually do that. But who tells us this is the case, right? Like, even if we just go from the cognitive science perspective, right?

The study of the, the human cognition, right? Then, which is again, a twin discipline to AI, right? Essentially in cognitive science, this twin science of AI, um, we are already asking the question, what is an explanation? Um, it's not defined, right? There's definitions you can [00:42:00] propose, right? But then you can find counter examples as oftentimes in philosophy.

Um, and so we're trying to capture the thing which does the best job, in a sense, right? And so even if you were to have a white-box model, that doesn't mean it's explainable, right? Uh, I think a very prominent example of this are linear programs. So linear programs, it's kind of like a special class of, uh, mathematical optimization problems.

Um, you have like a cost function, constraints, they are linear and we can solve these, uh, systems. Uh, we have algorithms and, and, and also just the whole thing is white box in nature, right? Uh, it's, it's not like we don't know what's happening, but still, so it's 

white box in the sense that we, we are very well aware of how the structure is, how the algorithm is structurally constructed, and what is happening, and how the signal is being processed at each of the steps.

Correct. It's completely explicit, right? And it's very interpretable as well, if I go for [00:43:00] a simple example. But now I scale it up. Actually, this is one project we've been working on, which is called Plexplane, here in Germany, funded by the government. It's concerned with, like, the, uh, energy goals of Germany, you know, to be kind of climate-neutral by 2050.

And so a lot of, uh, machine learning in, you know, energy systems is actually based around LPs, which are just very large. And now the motivation is there to understand, you know, why, for example, we need more photovoltaics than we need, uh, market-bought electricity, for example, right? And so here now you have a white-box model, still a white-box model, but it's at a scale which humans just cannot process.
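As a toy illustration of the "white box, but not explainable at scale" point, here is a minimal linear program sketch. The two-variable "energy mix", the costs, and the bounds are all invented for illustration, and SciPy is assumed to be available; the project's real LPs are, of course, vastly larger.

```python
from scipy.optimize import linprog

# Minimize the cost of photovoltaics (x1) and market-bought electricity (x2)
# subject to a demand constraint. Every coefficient below is fully explicit,
# i.e. "white box", and the numbers are made up.
c = [50, 80]                      # cost per unit of PV and market electricity
A_ub = [[-1, -1]]                 # -x1 - x2 <= -100  <=>  x1 + x2 >= 100 (demand)
b_ub = [-100]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 70), (0, None)])
print(res.x, res.fun)             # -> roughly [70, 30] and the optimal cost

# With two variables the answer is easy to read off. Real energy-system LPs have
# thousands of variables and constraints: every step is still explicit, yet "why
# do we need more PV than market electricity?" is no longer humanly readable.
```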

I always think of this quote by Judea Pearl, which says, like, um, you know, I cannot even understand five variables, let alone a thousand or more, right? So that's why you oftentimes in the examples see only maybe up to five variables, right? And [00:44:00] so explanation does not go along with white boxes. We, we tend to believe so.

And it's certainly more accessible, because of its explicit nature, than black boxes, but that doesn't mean it's the case, right? And so that's the hope, and that's why I just wanted to touch upon this point, because I think it's already independent. And then again, there's this whole aspect that explanations are not even defined, right, in, in some kind of, um, ultimate sense, right?

Surely there are a lot of useful explanations. But I think what I'm trying to get at is that, uh, maybe this is a hot take now, that, you know, a black box can just be fine, right? As, as long as we, you know, get explanations from that system that are faithful and that do the job, right? And so, actually, we also have some works in this direction.

That's why I was coming also from different perspectives. 

And what are your conclusions from your work so far? 

So, our work is, uh, we propose a recursive [00:45:00] algorithm called structural causal explanations. We made a point of putting the "structural causal" in there. Why? Because, um, it's essentially an algorithm which uses the graph structure, but also quantitative knowledge about, you know, cause-and-effect relations.

For example, if it's linear causal models, then it's just the coefficients, right? And then, you know, you have a question, say we are in a medical case, right? This is the example we do in the paper as well. So, so we have different patients and record some kind of data about them, right? We capture them in different variable representations, say, for example, the age of a person.

This we can naturally represent as an integer number, for example, then we have some knowledge about, say, their nutrition, um, in some kind of numerical sense, right? Say a high value says that the person has a relatively good nutrition, right? It's balanced by any dietary standards, right? We can measure maybe some kind of, uh, um, key indicator of overall health and mobility.

And then, you know, you could have, you know, [00:46:00] a set of patients, and then say there's a patient called Hans, and now you see that, you know, Hans is an elderly gentleman, um, and his health is overall not that good, actually, also mobility and stuff like that. And now the doctor might ask the question, well, why is Hans's mobility so bad, right?

And that's kind of a relative notion, right? Doctors comparing it to some kind of standard, right? And that standard can just be like the average, uh, mobility of the whole group he's considering, or, or overall what we consider in society, yeah? And then what this algorithm does is it traverses recursively through the parents essentially, right?

And, you know, it would then automatically give you an answer, which is something like, you know, Hans's mobility is bad actually because of his bad health. And then again, this bad health is because of being elderly, right? And mostly because of that, although actually the food habits, the nutrition, are good, right?

So you have like both the structural knowledge, right? About the parents. [00:47:00] But also then, you know, it's causal because we ground it in a causal model. That's the modeling assumption, right? Of course, if that model is wrong, then, you know, your conclusions are not sound. But what we are proposing is, again, this causal inference part of things.

Now we look back to the beginning, right? The conclusions we make should be sound, right? So, whether what we base it on is true or false, that's a different story. But sound conclusions are what we care about. And then, yeah, you have the traversal, the structural part, you have the causal part, and you also have a little bit of quantitativeness, right, because, you know, food habits, if they are better, that usually improves health, right, um, if you're older, that usually, uh, decreases health, and so on and so forth, right?
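Here is a minimal sketch of the recursive, parent-traversal style of explanation described in the Hans example. It is not the authors' actual algorithm: the graph, the coefficients, the standardized scores, and the comparison group are all invented, and a linear causal model is assumed purely for illustration.

```python
# Toy linear SCM in the spirit of the Hans example: age -> health, nutrition -> health,
# health -> mobility. All numbers below are made up.
parents = {"mobility": ["health"], "health": ["age", "nutrition"]}
coef = {("health", "mobility"): 0.9, ("age", "health"): -0.7, ("nutrition", "health"): 0.4}

population = [  # standardized scores for the comparison group
    {"age": 0.1, "nutrition": 0.0, "health": 0.2, "mobility": 0.3},
    {"age": -0.5, "nutrition": 0.2, "health": 0.5, "mobility": 0.6},
]
hans = {"age": 1.8, "nutrition": 0.6, "health": -1.1, "mobility": -1.3}

def mean(var):
    return sum(p[var] for p in population) / len(population)

def explain(var, patient, depth=0):
    """Attribute a patient's deviation from the group mean to their causal parents."""
    gap = patient[var] - mean(var)
    print("  " * depth + f"{var}: {gap:+.2f} vs. group average")
    contribs = {p: (patient[p] - mean(p)) * coef[(p, var)] for p in parents.get(var, [])}
    for p in sorted(contribs, key=lambda p: abs(contribs[p]), reverse=True):
        direction = "pushes it down" if contribs[p] < 0 else "pushes it up"
        print("  " * (depth + 1) + f"because {p} ({contribs[p]:+.2f}, {direction})")
        explain(p, patient, depth + 2)

explain("mobility", hans)
# -> mobility is low mainly because health is low, which is mainly because of age,
#    even though nutrition actually pushes health up.
```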

So this algorithm also gives you information about so-called actual causation, what caused this outcome in this particular observation?

So this is a great point that you mention, because this work is also still under review, and one of the reviewers was actually coming from this actual causation area, and while we do have a [00:48:00] comment on this in our work, we never explicitly go into detail on this.

And anyhow, I personally believe the paper is already packed a lot, so maybe we should split it up, actually. And actual causation is not happening here, right? So we're talking about individuals, but it's not the actual causation formalism by Halpern and Pearl, or nowadays mostly Halpern.

So informally, you can speak about this, but it's not, it's not formal.

So you can compare it to the formal one. You can compare it, right? It's definitely individual causation, right? Not type causation, because it's not talking about the population, but about individuals, um, which becomes very apparent in this particular example I was giving just now. Um, but, uh, it's, it's not an actual causation work, right? It doesn't satisfy, well, at least we haven't checked whether it satisfies the actual causation axioms.

It doesn't satisfy now. At least we haven't checked whether it satisfies the actual causation axioms. 

You mentioned this example of, of a patient, uh, and the patient being in some, uh, in some context, uh, their personal [00:49:00] context of the diet, a context of the medical record, and so on. This is an example that puts causality in a very, very practical place.

What are your thoughts regarding adoption of causality? We are definitely more advanced with adoption in industry than we were even one or two years ago. I think things are moving very fast, but still there are some obstacles that people or organizations meet on their causal journeys.

What are your thoughts on this? 

So I'm a bit split on this actually. So. On the one hand, I think there's so much work left to be done, right? And I guess that's the scientific perspective. That's the philosophical perspective as well, right? It just feels to me that there's so many unresolved things. So although we have made tremendous progress, right?

Um, And well, that's also a good thing because, well, that's my job. And so there's more things to do, right? You're never running out of things to do. [00:50:00] Um, so, so in that sense, I, I feel like, okay, it's not very practical, right? There's what, what feels at times like an outcry by the community that we need benchmarks, you know, to, to measure the whole, you know, machine learning AI thing of objectives, right?

Um, Nowadays, I'm a bit contrary to that as well, because I've read this book, um, The Myth of the Objective, right? Uh, which is talking about this idea of objectives actually being a false compass when they are ambitious. I'm not gonna define any of the terms right now. Um, please check out the book. It's a very nice book by, by researchers who are based in Florida, I believe.

Um, So, so, so this whole objective thing, it's, it's still kind of like, yeah, and that's what a benchmark does essentially. And, you know, the, the classical machine learning way. Um, so, but in general, yeah, I agree. You know, gold standards are somehow missing, but then again. Well, that's the whole point, right?

Like, we did physical experiments, we found out about some laws about other things, [00:51:00] and that's our causal knowledge now, our gold standard, right? Like, if we had this, then we wouldn't have a problem in the first place. So that's, that's my kind of like, I guess, academia perspective on it, right? Um, the other perspective is actually that I think we are already super successful, right?

I mean, sure, we can never evaluate, and again, this is the objective part, but we can also never, you know, um, uh, we will probably never have, like, this ultimate truth kind of thing, right? That's again a very philosophical debate, right? Um, it relates to, to works by, by Gödel or by Cantor and these tremendous mathematicians and logicians.

Um, so if you think about definite mathematical proofs that could be applied, 

So that's more from the academic perspective. And this one, it's more about, um, so I was trying to, to get at the point that I think we are already doing great. And it's not perfect by any means, and this discussion of perfect and [00:52:00] truth is anyhow a different one, which we might never resolve, right?

I guess my personal take would be it's likely, very likely. And so, essentially, if you look at practitioners in causal inference, right? Like, so, for example, I remember going to Grenoble for, uh, a causality workshop, and, and we had practitioners there who were, uh, applying it to, you know, again, biomedical data and so on.

Um, and they had huge graphs, like discovered with causal discovery, right? And validated with experts, and things were surprising there, right? It reminds me of this one work by Petar Veličković and others at DeepMind, right? Where they looked at, you know, science times AI, like math times AI, where essentially they didn't use any fancy techniques per se, right?

But they just applied them thoroughly and consistently and then always went back to the mathematicians. And in this way, the mathematicians could find some representation which was more suitable for making the [00:53:00] next big step in, you know, a certain theorem, right? And that's why they got some new results, right?

And I see it in that kind of way, that essentially it's our assistant, right? So, what people are also doing a lot lately, right, like trying to do this personalized medicine and these kinds of things, which is really, yeah, the AI scientist or the AI assistant in a sense, right? I guess that's also what a lot of researchers envision for the future.

Um, 
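To make the workflow Matej describes more concrete, here is a minimal, hypothetical sketch of constraint-based causal discovery followed by expert review: a toy dataset stands in for the biomedical data, and a simple partial-correlation test stands in for a proper conditional-independence test. All variable names, coefficients, and thresholds below are illustrative assumptions, not the setup from the Grenoble workshop.

```python
# Toy sketch: recover an undirected causal skeleton from data (PC-style),
# then hand it to domain experts for validation. Illustrative only.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

# Made-up "ground truth" SCM: diet -> biomarker -> outcome, age -> outcome
n = 5_000
age = rng.normal(size=n)
diet = rng.normal(size=n)
biomarker = 0.8 * diet + rng.normal(scale=0.5, size=n)
outcome = 0.7 * biomarker + 0.4 * age + rng.normal(scale=0.5, size=n)
data = np.column_stack([age, diet, biomarker, outcome])
names = ["age", "diet", "biomarker", "outcome"]

def partial_corr(i, j, k, X):
    """Correlation of X[:, i] and X[:, j] after regressing out columns k."""
    def residual(v):
        if not k:
            return X[:, v]
        Z = X[:, list(k)]
        beta, *_ = np.linalg.lstsq(Z, X[:, v], rcond=None)
        return X[:, v] - Z @ beta
    return np.corrcoef(residual(i), residual(j))[0, 1]

# Skeleton search: drop edge i-j if some conditioning set of the remaining
# variables makes the pair (nearly) uncorrelated.
edges = set(combinations(range(4), 2))
for i, j in list(edges):
    others = [v for v in range(4) if v not in (i, j)]
    for size in range(len(others) + 1):
        if any(abs(partial_corr(i, j, S, data)) < 0.05
               for S in combinations(others, size)):
            edges.discard((i, j))
            break

for i, j in sorted(edges):
    print(f"{names[i]} -- {names[j]}")  # undirected skeleton, for expert review
```

The printed skeleton is exactly the kind of object that would then be oriented and validated together with domain experts, as in the huge graphs mentioned above.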

You mentioned Petar Veličković. How was your work with Petar, and what was the origin of, um, of your meeting and then your common journey, your journey

together? Yeah. So Petar is a really nice guy, um, an amazing lecturer. Um, I remember watching some of his lectures on, you know, geometric deep learning, graph neural networks.

Um, and then I was actually just a participant at the Eastern European Machine Learning Summer School, by the way, an amazing summer school. So if anyone wants to go there, you should try to apply. [00:54:00] Uh, I was there once as a participant, and then the second time I was actually also part of the lecturing, and both times were a blast from both perspectives.

Um, people are very nice and you learn a lot, and he was actually one of the mentors there, but, you know, he was always also giving lectures, and I just reached out, like on the Slack channel. It can be as easy as that, right? And then, yeah, we got together. I told him my intuition about, you know, how graph neural networks and structural causal models are related.

And that's where we picked it up, right? And eventually wrote a paper. It's my most cited paper so far, although it's not published. And yeah, that is how the game can go, right? Um, but we improve, right, we revise, and I'm sure eventually it'll go in somewhere. Um, but yeah, that's how it ended up.

And it's pretty cool. What were the 

results of this paper, this work that you did with Petar? What do you find the most important outcome of this project with him? [00:55:00] 

So, to loop back to one of the things I said earlier: when you asked me the question of what are the most important things I consider now, in my humble opinion, about causality and how the scientific research is going to continue, I was talking about, you know, thinking outside the box, asking the questions that have not been asked before, right?

Making connections, building bridges, not burning them. Um, and so I think this is the most important outcome of this work. Why? Because we built a bridge between one of the hottest fields in deep learning, geometric deep learning, right, graph neural networks and things, to causality, to structural causal models, right?

Like, we were coming rather from a causal perspective, but then again, don't get it twisted, we also have a view on the machine learning side. It's the Artificial Intelligence and Machine Learning lab after all, and that's my original training. And yeah, we tried to find a consistent way of connecting these two, you know, concepts, ideas, frameworks. [00:56:00] 

Um, and I think we succeeded. And this would be the most important contribution: that you essentially have a bridge now between these two fields. You can talk about these topics, you open a whole new research direction.
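To give a feel for the kind of bridge discussed here, the following is a toy sketch of the intuition that evaluating a structural causal model looks like message passing over its causal graph, with an intervention implemented by cutting a node's incoming messages. It is only an illustration under made-up mechanisms, not the construction from the paper with Petar.

```python
# Toy illustration: one sweep over the causal graph in topological order
# plays the role of one message-passing pass, and do(V := v) simply
# ignores V's incoming messages. Graph and mechanisms are made up.
import numpy as np

rng = np.random.default_rng(1)

# Causal graph over nodes 0..2:  0 -> 1 -> 2,  0 -> 2
parents = {0: [], 1: [0], 2: [0, 1]}

def mechanism(v, parent_vals, noise):
    """Local mechanism f_v(parents, noise); illustrative choice only."""
    return np.tanh(sum(parent_vals)) + noise if parents[v] else noise

def forward(interventions=None, n_samples=1000):
    """Evaluate the SCM = aggregate parents' values node by node."""
    interventions = interventions or {}
    values = {}
    for v in sorted(parents):                      # 0, 1, 2 is a topological order
        if v in interventions:                     # do(V := v): cut incoming messages
            values[v] = np.full(n_samples, interventions[v])
        else:
            noise = rng.normal(scale=0.3, size=n_samples)
            values[v] = mechanism(v, [values[p] for p in parents[v]], noise)
    return values

obs = forward()
intv = forward(interventions={1: 2.0})             # do(node_1 := 2.0)
print("E[node_2]              =", obs[2].mean())
print("E[node_2 | do(node_1)] =", intv[2].mean())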

We'll link to the paper in the show notes as well, so everyone can read the paper themselves.

And you mentioned that graph neural networks are one of the hottest sub-areas of machine learning today. In your paper on Causal Parrots, you cite a quote there: early, dramatic success followed by sudden unexpected difficulties. This is a description of the typical life cycle in machine learning research.

Um, what do you think, and how do you feel, about those hype cycles that we have in machine learning? Do you think this is something that is useful, or maybe we [00:57:00] would be better off if we just tried to temper down emotions a little

bit? So I think what my advisor Kristian would say, you know, and I guess I would agree, is like: ride the wave and just push for it.

Um, I don't think it's stopping this time. So actually there was some study which suggested there's like a 30-year cycle for when the next AI winter, as we call it historically speaking, occurs. It seems

like a correlational measure. 

And so, if we look at AI history, right, it started arguably back then with John McCarthy, Marvin Minsky and all these other pioneers who got together in the

1950s. Um, they were thinking, okay, if we put a bunch of smart people together, also including people like Claude Shannon, right? Like, I mean, information theory, right? Entropy and all these kinds of concepts. The sense was that we could solve AI, right? [00:58:00] And well, this didn't work out, but at least they established a field in a sense, right?

That's closely connected, historically speaking, also just to computer science and its origins, right? Um, and then, you know, there's always this, right: the perceptron, you know, neural nets, and then, again, it doesn't work, and there's a winter. Then we have the symbolic approach, but, oh, it's inflexible, it doesn't work, winter, right, and so on and so forth.

But nowadays, it feels, at least in my opinion, right, I have not lived in these times, right? I can only report from reports that I've read, um, and accounts that I've considered. But it doesn't feel like it's slowing down. It's just increasing. And going to the buzz aspect of it that you were mentioning, with all the heat and everything, um,

it's a difficult topic. So now, on the one hand, you can make the case that, well, we need this, right? And it doesn't matter, there's no bad publicity, right, in an attention kind of sense. [00:59:00] But then again, you can say, okay, well, we try to be factually correct, right? We endorse being reproducible and have certain scientific values.

So we should tone it down. Um, actually, I was just told by a colleague who is doing a PhD in Tübingen that their AI or machine learning department is actually trying to decouple itself a little bit from AI because of all these implications, because they want to be on the defensive. I personally think this is wrong.

Why? Because there should be discourse, there should be, you know, scientific debate, and you can still not make these statements and separate yourself from these statements. But to now start separating AI from ML, I think that's not possible, and it also shouldn't be.

You mentioned, um, putting many smart people together in one place in the hope that they will solve a problem.

So this is also somehow related to the idea presented in Oppenheimer, the movie.

Have you seen the movie? [01:00:00] Yes, I've seen the movie. Uh, I consider it more of a documentary. Yeah, I was aware of the history of the characters, uh, without spoiling now, because again, I see it more as a documentary.

These things happen, right? Um, I found it nice when we saw the scene at the Institute for Advanced Study in Princeton, where Einstein was actually having a walk with Kurt Gödel, right? Um, that's a little detail which really showed me that Chris Nolan and his team did an amazing job with this movie.

So yeah, I watched it and I liked it, and the three hours went by in a rush. Do 

you have any personal heroes among the people, the characters, that were portrayed in the movie? 

So I think I should give a hot take now, because that's the thing with heroes, right? Like, we are trying to idolize certain people and so on and so forth.

Um, and it has its cons and pros, right, as with everything. Um, but if I just take my personal view, it's difficult, right? I have not known these people, right? [01:01:00] I'm judging based off of just what I've read and seen. And of course the movie, you know, has these movie aspects to it of being more dramatic and presenting things in a certain way.

Um, but I think really, yeah, actually the person I mentioned just now, Kurt Gödel, one of the godfathers of logic, with his incompleteness theorems following the David Hilbert program, right? Tremendous results, incredible, and what a life story. Actually, I've read a biography on him, right?

Um, I think the book was called Journey to the Edge of Reason or something, right? Even the title was just like, wow. Um, and actually there are a lot of funny stories about him. So, we were talking about Naftali earlier. While I was going to Munich for the Causality for Ethics and Society workshop, which was co-organized by Naftali Weinberger,

um, I was reminded by a person, David, who was also talking about Gödel, that, [01:02:00] you know, when he eventually went to the IAS in the U.S. and had to take his citizenship test, he actually found a flaw in the constitution, in the U.S. constitution. And actually this was mentioned in the book, but I read the book a while back,

so I didn't remember the details, but he reminded me. It was essentially that the definition of a year was somehow not clear, so you could redefine it, and then actually you could have the government, well, rule the people for this one year, which would now be, well, practically infinite, right?

And then, you know, people like Einstein and the others had to hold him back: don't mention this during your test, because you just want to pass, right? You don't want to freak out the people. So, um, yeah, from that movie particularly, I think Kurt would be my hero. Yeah, I mean, his passing then was quite sad.

Um, you know, he was, [01:03:00] it seems, depriving himself of food, right, because of a worry of poison and these things, and it's a harsh reality. I mean, for all these individuals, if you just look at the human side of things, which is very, very important to me as a researcher, as a person, right, it's a harsh story, but the achievements, maybe especially in light of that, are just incredible.

So, if I had to pick, I'd put my money on Kurt. 

What are the two books that were most influential for you as a person? It can be during your career, but also in your personal development. 

Definitely what I'm picking is... The Book of Why, because it got me sitting here right now, and it really sparked a fire in me.

It really ignited. I think whatever Judea had in mind, he definitely achieved it on that day with me, because I think what he had in mind was, you know, inspiring people to study the science of cause and effect. For the second book, there are so many books. So [01:04:00] I like reading; lately I've not read much, but in the past I would always have my Kindle because it's easier to carry.

I also like physical books. Of course, I actually like them more, but then again, you know, you have the backlight and stuff like that, so it's pretty good. Um, for the second book, it would be hard to choose. I mean, I love a book like What If? from Munroe, right? Like the xkcd comics, and you have these amazing contrasts, right?

Still, I also love a book like Thinking, Fast and Slow, which got cited, I believe, way too much, and also inadequately at times, by scientists in AI right now and in the recent past. Um, but then again, I'm also thinking of things from Goethe. You know, we're actually sitting here at the university campus of Goethe University, and, uh, Johann Wolfgang, I mean, a tremendous poet and thinker in general, right?

He really reflects, I guess, these historic German [01:05:00] traditions and values for science, for discourse, for philosophy. And, you know, a book like Faust, right? I mean, we had to read this during school, right? But it was actually a fun read, because there's so much hidden in these books.

Actually, one of his lesser-known books is the West-östlicher Divan, which is like the West-Eastern Divan, Divan in the Arabic-Islamic sense. Um, he actually read a translation of Hafiz, the Persian poet, you know, a poet god essentially. Yeah. The works of Hafiz were translated by an Austrian guy, which then fell into the hands of Goethe, and he felt like this was his twin brother in the mind.

And so you see Goethe referencing Islamic things, and it's just a beautiful, you know, multicultural combination, right, also historically. And so maybe it's even a book [01:06:00] like that that could be my second favorite. But for the first, I definitely choose The Book of Why. When you think about 

people who are just coming to causality, what resources would you recommend to them in the beginning, how to start? 

So,

I'd say you go to discuss.causality.link. That's the link to the landing page of the causality discussion group. And actually, at the bottom of the page there's a big section, a list with all kinds of resources. But just to briefly, for the sake of the podcast, reiterate on this, um, essentially it's the books, right?

So the Causality textbook, the standard book by Pearl, um, it's a bit big and dense, right? So I use it more as a reference book and read here and there; I would not go and read it in one go. I mean, sure, go ahead if you want. Uh, if you're coming more from a machine learning perspective, I'd propose Elements of Causal Inference, right,

by Peters, Janzing, and Schölkopf. Um, amazing examples; really, I enjoyed the book a lot. Um, [01:07:00] other than that, there's a very nice survey by a friend of mine, Jean Kaddour, and his colleagues, which is particularly for machine learning, right? But still, again, if you care about this, then it's a great way to actually get to know

papers in this area, right? Like tremendous works, works which were influential and popular. Um, and then I guess there are some lectures. I love the lectures by Jonas Peters in general as well. He's a really amazing lecturer. Um, but also by Elias Bareinboim. Um, so yeah, check out these references, and then also just use the Slack channel, so you can just ask for anything in particular, right?

Or reach out to me or Alex. So yeah, 

some people come to causality and they are really, really passionate about this topic, but maybe they feel a little bit discouraged by the way that causality is sometimes taught in a very formalized way. What would you say to these people? What would be your advice [01:08:00] for them to move

forward? 

That's a bit of a tricky one, because while I would love to say, you know, don't worry, you don't need it, I think you actually need it. So, you know, you need it. You need the formalism, right, the formalism for causality. Because I think, as Judea Pearl would phrase it himself, it's a

language. It's a language, a formal language, a mathematical language with mathematical notation to talk about modeling assumptions, you know, about the data-generating process, and to find, you know, expressions and sound conclusions thereof. So in that sense, formalism seems necessary.

And then again, I guess, well, it's the standard that we have in the scientific community in AI nowadays. Um, and it's also more foundational ground research, closer to the maths, [01:09:00] closer to probability theory, right? So we have measure theory, and then as a special case, probability theory, and now within probability theory we have causality.

And actually, there's been a recent work by Junhyung and his colleagues from Tübingen who actually did an axiomatization of causality, but not in the Pearlian framework, right? So they do have interventions and these things, but it's a different notion, properly within probability theory, right? So it satisfies even the precision of pure maths.

So I think you don't get away from the formalism. But just talking about it intuitively, I think you can get a lot from there too, but then it's a double-edged sword, right? Because there's a lot of wrong intuition, I believe, that we have about causation, as famous examples in philosophy would 

also show.
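As a small illustration of why this formal language matters, here is a minimal sketch, with made-up coefficients, of a confounded system where the observational quantity E[Y | X = x] and the interventional quantity E[Y | do(X = x)] come apart; the do-notation is exactly the piece of formalism that lets you state that difference.

```python
# Toy sketch: "seeing" vs "doing" in a confounded SCM (Z -> X, Z -> Y, X -> Y).
# All coefficients are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

z = rng.normal(size=n)                      # confounder
x = 1.0 * z + rng.normal(size=n)            # treatment, confounded by z
y = 0.5 * x + 1.0 * z + rng.normal(size=n)  # outcome; true causal effect of x is 0.5

# "Seeing": estimate E[Y | X close to 1] from the observational distribution.
obs = y[np.abs(x - 1.0) < 0.05].mean()

# "Doing": simulate the mutilated SCM where X is set to 1 regardless of Z.
x_do = np.full(n, 1.0)
y_do = 0.5 * x_do + 1.0 * z + rng.normal(size=n)

print("E[Y | X = 1]     ~", round(obs, 2))         # ~1.0, picks up confounding via Z
print("E[Y | do(X = 1)] ~", round(y_do.mean(), 2)) # ~0.5, the causal effect
```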

Is there anything you would like to say to the community of people who are, I don't know, just starting, or maybe they started a little while ago? [01:10:00] Um, how would you encourage them to continue their journey? 

Thanks for this wonderful question, because I think this, as we call it in German, the Appell, the appeal or call to action, is an amazing thing.

Um, so I'll speak to the camera, because I'll just treat it as the audience right now. Um, if you're passionate about this, if you think this is meaningful, that this would be fun, I think it will be fun. So do it. Just do it, as Nike would say. It was a pleasure. Thanks for having me. It was a lot of fun. Until the next time, I guess.

Definitely. Thank you for staying with us till the end, and see you in the next episode of the Causal Bandits Podcast.

Who should we interview next? Let us know in the comments below or email us at hello@causalpython.io. Stay.
