Causal Bandits Podcast

Free Will, LLMs & Intelligence | Judea Pearl Ep 21 | CausalBanditsPodcast.com

August 12, 2024 · Alex Molak · Season 1, Episode 21


Meet The Godfather of Modern Causal Inference

His work has quite literally changed the course of my life, and I am honored and incredibly grateful we could meet for this great conversation in his home in Los Angeles.

To anybody who knows something about modern causal inference, he needs no introduction.

He loves history, philosophy and music, and I believe it's fair to say that he's the godfather of modern causality.

Ladies & gentlemen, please welcome Professor Judea Pearl.



About The Guest
Judea Pearl is a computer scientist and the creator of the Structural Causal Model (SCM) framework for causal inference. In 2011, he was awarded the Turing Award, the highest distinction in computer science, for his pioneering work on Bayesian networks and graphical causal models and for "fundamental contributions to artificial intelligence through the development of a calculus for probabilistic and causal reasoning".



About The Host
Aleksander (Alex) Molak is an independent machine learning researcher, educator, entrepreneur, and a best-selling author in the area of causality.



Links

Should we build the Causal Experts Network?

Share your thoughts in the survey


Causal Bandits Podcast
Causal AI || Causal Machine Learning || Causal Inference & Discovery
Web: https://causalbanditspodcast.com

Connect on LinkedIn: https://www.linkedin.com/in/aleksandermolak/
Join Causal Python Weekly: https://causalpython.io
The Causal Book: https://amzn.to/3QhsRz4

Transcript

Marcus: Hey, Causal Bandits, welcome to... [broken record]

Alex: Hi, Causal Bandits. This episode is special for two reasons, and that's why I decided to record this intro in person. First, this is the last guest episode in the first season of the Causal Bandits podcast. It has been an incredible journey so far. We've published 23 episodes, recorded in 13 different locations across the globe.

We've visited Europe, Asia, and America. And over the last 10 months, the podcast gained over 30,000 views and over 1,300 subscribers on YouTube. And last month we crossed 1,300 monthly downloads on podcast platforms. And I want to personally thank you for being here and supporting this podcast. It means a lot to me.

The second reason why this episode is special is the guest. His work has quite literally changed the course of my life, and I'm honored and incredibly grateful for the opportunity to record this conversation with him at his home in Los Angeles. Ladies and gentlemen, to anybody who knows something about modern causality, he needs no introduction.

He loves history, philosophy, and music. And I believe it's fair to say that he's the godfather of modern causality. Ladies and gentlemen, please welcome Professor Judea Pearl.

Judea Pearl: Hi everybody. Good to be with Alex here. 

Alex: Great. Judea, thank you so much.

I'm super grateful for our meeting today. And before we start, I wanted to tell you that your work quite literally changed the direction of my life, because The Book of Why was...

Judea Pearl: Where were you before?

Alex: Well, before? I will tell you in a second. I just wanted to share that I'm really grateful for your work, in particular for The Book of Why, because this is how my adventure with modern causality started.

How did this start for you?

Judea Pearl: With causality? 

Alex: Yes. 

Judea Pearl: I have an anecdote about that. In the junior class in high school, our teacher, Feuchtwanger, tried to introduce us to the wonders of logic. He wanted to motivate us: here is a logical problem, a logical paradox, that can be rectified only if you have the tools of logic.

If you don't have them, you fall into wrong conclusions. So how did he motivate us? He told us a story about smallpox in 1840 in France, where they started vaccinating people, and then discovered that the number of people who died because of the vaccination itself was larger than the number of people who died of smallpox.

So people started protesting: maybe we should ban the vaccination. They did not realize that the data actually proved the opposite, that the vaccination was effective. The reason so few people died of smallpox was that the vaccination was successful. So they fell into this kind of wrong conclusion.

So he was going to show us that logic has the power, formally, to rectify the paradox. But he didn't, of course. It motivated us, and we were all enchanted. Only later on, when I did artificial intelligence, did I try to work out this problem, to see whether the tools of logic could straighten it out.

They couldn't, because there was no language in which to say "the number of people that died from". How do you say "died from"? There's no way to say it. You can count the number of people who are dead, but "dead from a certain factor" was hard to formulate in the tools available at that time. So that was fascinating to me, and I looked for a language in which we could formulate it.

It took a long time; only recently do we actually have the correct solution for this problem: given these numbers of people, what can you say about the effectiveness of the vaccination? So that was the motivation. But I would say that, speaking of the time I've been working in artificial intelligence, the challenge of causality only came in the 1980s, after my book on Bayesian networks came out, which was purely probabilistic. You know, I already knew that I was missing something.

I was saying that managing uncertainty with the Bayesian rules is the best way to handle things. But I understood that the success of Bayesian networks came from a different source. It came from the simple reason that Bayesian networks are constructed in the causal direction and not counter to it.

And that was the secret of their success. So I already felt that I was missing something. In 1988, when the book came out, I already felt that I needed to go in a different direction.
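For reference, the factorization Pearl is describing is the standard Bayesian-network decomposition, in which each variable is conditioned on its parents in the graph:

\[
P(x_1, \dots, x_n) \;=\; \prod_{i=1}^{n} P(x_i \mid pa_i)
\]

When the parent sets follow the causal direction, each factor corresponds to a stable, autonomous mechanism; a factorization against the causal order fits the same data, but its factors are denser and less stable.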

Alex: Yeah. And that was the time when you realized that Bayesian networks are not enough, and you moved to the formalism of structural causal models?

Judea Pearl: The structural part came later, in 1991, I believe, when Thomas Verma and I tried to formalize counterfactuals with deterministic functions. Purely deterministic, where the uncertainty comes in only when the boundary conditions are uncertain, namely the U variables, or the error variables; but everything else is deterministic.

And for us it was a trauma, because we had worked for two decades on probabilistic reasoning. Everything was stochastic. And here we were, formulating everything in deterministic functions. It took some time to convince ourselves we were on the right track.
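As an illustration of the formalism Pearl describes, here is a minimal structural causal model in Python: every endogenous variable is a deterministic function of its parents and an exogenous U variable, and all uncertainty lives in the U's. The graph, names, and mechanisms are invented for illustration, not taken from the episode.

```python
import random

# Minimal SCM with graph X -> Y.
# All randomness is confined to the exogenous "boundary conditions" U;
# the mechanisms f_x and f_y are purely deterministic.

def sample_exogenous():
    # Hypothetical exogenous variables (the U's / error terms).
    return {"u_x": random.random() < 0.5, "u_y": random.random() < 0.1}

def f_x(u):
    # X is fully determined by its exogenous input.
    return u["u_x"]

def f_y(x, u):
    # Y is fully determined by X and its exogenous input.
    return x and not u["u_y"]

u = sample_exogenous()
x = f_x(u)
y = f_y(x, u)
print(f"X={x}, Y={y}")
```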

Alex: The teacher, the physics teacher that you mentioned, 

Judea Pearl: Yeah.

Alex: It turns out that he was also a teacher of Daniel Kahneman, who passed away recently. 

Judea Pearl: That was two weeks ago. And Daniel mentioned it to me, I think in the last year; he wrote me a message and said that he had read the book. There's an anecdote in it about a teacher, and the name was not coincidental, because we also had a teacher named Feuchtwanger.

By the way, Feuchtwanger: the same family as Lion Feuchtwanger, who was a very successful author in Germany at that time. And it's a huge family; it has many branches. I worked my summer job for an uncle of the Feuchtwangers. He was an electrical contractor; he used to wire the electricity in public institutions.

And I worked there as a cable puller.

Alex: Many connections. 

Judea Pearl: Small world, yes. But the interesting thing was that Daniel Kahneman also had the same teacher, and we started talking about our teachers, how good they were. They were really something; looking back, we don't find teachers like that today.

You know, a teacher could come to class and, without any notes, talk about any topic whatsoever. The gross national product of Kamchatka, okay. Everything: geography and economics and physics and mathematics. We were lucky. My generation was lucky to be part of this educational experiment.

Alex: Speaking of Daniel Kahneman: a large part of his work, his and Amos Tversky's, was focused on biases and systematic errors in human reasoning. When we think about intelligence from this perspective, we could probably hypothesize that there are some evolutionary reasons for these errors to propagate in humans.

Do you think that an artificial general intelligence system should be a system that always reasons correctly in the formal sense, be it logically or causally?

Judea Pearl: I don't know what the word "should" means. We humans are constrained by our resources. We don't have the power to reason all the way.

So we reason quickly, because we have to make decisions quickly. So we build in shortcuts. The shortcuts have errors in them; by definition, they're shortcuts. You didn't go and search all the way, you didn't think of all the factors. You found only the main factors, and you made a decision. That is laden with errors and biases.

It depends on the kind of shortcuts you make. You're talking about large machines. They do not have these limitations in resources. So I don't know; they may just not have those biases, because they can afford to search deeper and come out with a more reasoned conclusion, unless they are limited by other things.

They could be limited by the samples that they can access; not everything is in the training set. But it's a different kind of animal, the new machine, because they have different computational limitations than we do. So that's why I hate to speculate. We do not have the vocabulary or the metaphors to even begin to predict how they're going to behave in 10 years.

Alex: In your work with Elias Bareinboim, you have demonstrated that we cannot learn certain structures based solely on observational data. When large language models entered the field, there was a debate, which is still ongoing, regarding what those models can learn.

Some people said: hey, they can actually learn world models. Other people said: no, it's impossible, we know this from the causal hierarchy theorem, and so on. But we can look at this from a slightly different perspective, for instance the one proposed by Andrew Lampinen from DeepMind.

He says those models can actually learn active causal strategies from passive data. He makes this distinction: the data is passive, which means some of it is observational, but some is actually interventional, because it describes experiments, or captures interactions between people, like on a discussion forum, and so on. And in his work he has shown that, at least in simple cases, a transformer-based large language model can learn to generalize causal reasoning.

What are your thoughts about LLMs today and their limitations? You said you hate to speculate, so I don't want to push you into speculation, but do you have any thoughts about the future as well?

Judea Pearl: The basic difference is that the LLMs, the large language models, have a new kind of data in their training set.

It's not data that comes from the environment. It comes from people who wrote articles, and these people have causal models of the world, so we can learn from their models. We can just copy their models. If you copy somebody's model of the world, it doesn't mean that you learned the model of the world from data.

You learned it from another person who had this model. So they are not tied to, or constrained by, the ladder of causation. They actually access data that contains text produced by people who have models of the world. So either you copy them or, if you don't have an explicit match between the query and a model, you compose them, building a salad of associations among the causal models that were authored by people.

Fine. But now we are working with a salad. How the salad works, I don't know. It's a very strange kind of mixture, and we have to experiment with it as a black box. A new black box: a salad of rumors.

Alex: A salad of rumors?

Judea Pearl: Yes. About causal models.

Alex: Yeah, that's a good one. I love this name. So, that's very interesting.

We have some papers showing that those models can perform pretty well on some causal benchmarks, and then very, very badly on other causal benchmarks. Perhaps they haven't read enough rumors.

Judea Pearl: What are the causal benchmarks? 

Alex: Oh, that's a great question. 

Judea Pearl: The one, the toy problem that I have in my book?

Alex: Oh no, people have various causal reasoning benchmarks nowadays.

Judea Pearl: I don't love them. I don't like them, because they are too big. What, COVID-19, that's a benchmark? You cannot handle it; you don't understand it, it is so large, and you don't have a ground truth. I'd like to work with the firing squad.

Four variables, okay? And a simple question: what if Rifleman 1 refrains from shooting? At least I understand the ground truth, what kind of answer I expect, so I can manipulate it and see under what conditions it works well and when it doesn't. So this is my benchmark. And I can tell you the story of how I experimented with LLMs on the firing squad problem.

First it told me: a firing squad? It's illegal to shoot in California. So that was the first thing; you have to be very careful when you start firing. So I said, forget about it, let's start talking about causal models, and slowly, slowly it got the idea. And it told me: yeah, if you're telling me that the rifleman listens only to the captain, and if you keep on repeating the assumption, even though the assumption is already derivable from the story, then, if this is true, you're right: if Rifleman 1 refrains from shooting, the prisoner will still be dead.

You prompt it more and more, and you tell it: but you already have the answer, why do I have to repeat it? Eventually you get it right. So it's really an exercise in prompting. You're dealing with a black box, and you're asking what kind of prompt will give you the answer that you expect. Here, we know what answer we expect; in big problems, we don't know the answer we expect. So I like to experiment with the firing squad.
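For readers who want to see the ground truth Pearl has in mind, here is one minimal encoding of the firing-squad story (court order U, captain C, riflemen A and B, prisoner's death D), with the counterfactual computed by the standard abduction-action-prediction recipe. The code is a sketch of the textbook model, not Pearl's own.

```python
# Firing squad: court order U -> captain C -> riflemen A, B -> death D.
# Every mechanism is deterministic; U is the only exogenous variable.

def death(u, do_a=None):
    c = u                             # the captain signals iff the court ordered
    a = c if do_a is None else do_a   # rifleman A obeys, unless intervened upon
    b = c                             # rifleman B obeys the captain
    return 1 if (a or b) else 0       # the prisoner dies if either one shoots

# Observation: the prisoner is dead (D = 1).
# 1. Abduction: D = 1 is only consistent with U = 1 (the court did order).
u = 1
# 2. Action: do(A = 0), Rifleman 1 refrains from shooting.
# 3. Prediction: evaluate D in the surgically modified model.
print(death(u, do_a=0))  # -> 1: still dead, because rifleman B shot.
```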

Alex: Yeah, so there's actually one benchmark that builds on top of this kind of approach. It was proposed by Zhijing Jin, Bernhard Schölkopf, and other authors. They built, basically, an engine that generates stories like this, with the ground truth as well.

And they generate stories on each of the rungs of the ladder of causation.

Judea Pearl: Do the stories come in the form of elements, factors, objects that we understand in our everyday life, or do they come in the form of variables, variable X, variable Y? Just names of variables, or names that mean something to us, like "a baby"?

Alex: That's a great question, because it turns out that large language models perform much, much more poorly if we use variables instead of objects. Which, in a sense, is similar to humans.

Judea Pearl: It means that they have the baby world under control.

Alex: If you could advise the causal community today on choosing the directions for research for the next 5 to 10 years, what would that be?

Judea Pearl: Personalized medicine. Individualized decision making. Because we have done some recent work, you know, with Scott, something we didn't think was feasible five years ago. And now we see that we can answer questions about the probability of harm. And quantifying harm is very important in medicine.

As I told you in the lecture, it's also important in political science, yeah? Quantifying benefit, swing states, and things like that. It's everywhere: in business, in marketing. You always make a decision in a particular, specific situation, and you want to compare doing it versus not doing it in this particular setup.

Not the population, but this particular setup. So we now have the means of answering these situation-specific questions, and that should be put into use.

Alex: Some practitioners, referring to your works with Scott, say that it might be difficult in certain cases to sample from the population.

Yeah. To get observational data that is sampled from the population at random, or that is representative.

Judea Pearl: Actually, they argue that when you sample from the population, it's a different kind of population than the one on which you experiment, because in order to get into an experiment you have to satisfy all kinds of conditions, consent agreements; you have to sign papers, and you're being incentivized by good treatment. So the people who are selected for a randomized controlled trial are a different kind of people than those you can get from a telephone book. That may be correct; but in many exercises, both the experimental and the observational data come from the same population.

For instance, if you are doing causal discovery and you have a graph, the graph generates for you the observational data, that's P(x, y, z), together with the interventional data, which you can get by applying the do-operator on the graph. We have both in the same population, in the same mathematical object.

Yeah, it can generate both. I mean, you have both, and you can narrow the bounds and get good, informative bounds on individual behavior.
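What Pearl means by one mathematical object generating both kinds of data can be written compactly. Given the causal factorization of the graph, intervening with the do-operator simply deletes the factor of the intervened variable (the truncated-factorization formula):

\[
P(v_1, \dots, v_n \mid \mathrm{do}(V_j = v_j^{*})) \;=\; \prod_{i \neq j} P(v_i \mid pa_i)\Big|_{v_j = v_j^{*}}
\]

So the same graph, with the same conditional probabilities, yields the observational distribution P(x, y, z) and every interventional distribution at once.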

Alex: Yeah, I think this is a very, very promising direction. What do you think is the cause of the rift that we have in the causal community? We have people who are coming from your tradition, the graphical tradition and identification...

Judea Pearl: And who is on the other side?

Alex: Well, there are the trialists, right? There are people in epidemiology and people in economics. And, you know, I must say that from my perspective, I see today that more and more people are using graphs; there's virtually no one who is not using graphs. But still, there are those different traditions and different modes of thinking.

And sometimes people in those different sub-communities are just speaking different languages about the very same thing, the very same issue. And this is problematic, I think, because we lose the efficiency of research.

Judea Pearl: You cannot communicate, yeah. Sometimes we duplicate work.

Alex: Yes, exactly. 

Judea Pearl: And sometimes we just object to each other's results on the basis of misunderstanding what the results are. So what can I say? We have a few theorems that we should consult. And the theorem says that the potential outcome framework is logically equivalent to the structural causal model.

Which means that a theorem in one framework is also a theorem in the other; an assumption in one framework can be articulated as an assumption in the other framework. So it's a logical equivalence. The only difference is whether you feel comfortable articulating your assumptions, or your knowledge in general, in one language or in the other.

And it so happens that I feel more comfortable, and everybody that I know feels more comfortable, even people who are doing potential outcomes: they're actually doing graphs in their minds, and they put down equations to pacify the reviewers in their field. Because no one can think in terms of conditional ignorability.

Oh, you can put down an assumption in conditional-ignorability format without even thinking about whether it is defensible, because it helps you get the equations, get the identification right. Okay? So you can do that, and many, many are doing that. But when you come down to ask: well, can I defend this conditional ignorability assumption, does it make sense in my setup? I don't know of any mortal who can answer that question.

Alex: Judea, what do you think is the next thing in artificial intelligence? Where should we look?

Judea Pearl: Everybody's looking at large language models now. No matter what I say, people are going to play with large language models, which is fascinating.

I'm using them too, you know, for their purpose. They help me improve my text when I write papers, and they allow me to find new metaphors; I can ask it: give me some poetical expression for this idea. That's beautiful. And if you're coming from causal inference, we know how to use large language models to our benefit, because an LLM is a functional approximator, and once we can express the answer to our queries in a collection of functions, we know which engines will approximate those functions on the basis of sampling from data. And we know how to put it together: the theory tells you how to compose one do-expression with another do-expression or with a probabilistic expression. So we know how to compose them. So there is a hybrid of causal inference and large language models that we know how to handle today.
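One concrete instance of the hybrid Pearl is sketching: if identification yields, say, the back-door adjustment formula, each factor on the right-hand side is an ordinary conditional distribution that a function approximator, large language models included, can estimate from samples; the theory then dictates how the estimated pieces compose:

\[
P(y \mid \mathrm{do}(x)) \;=\; \sum_{z} P(y \mid x, z)\, P(z)
\]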

What is the future going to be? How can I speculate? I really don't know. Well, we have many areas which are begging to be explored. Automated scientists: scientists that can decide on the best experiment to conduct next, when there is a dilemma in the theory. Free will is another thing, which I touched on in the last lecture and in The Book of Why, and which I think will be solved. Solved, I mean, to the satisfaction of all the philosophers who call it the scandal of science. That's what philosophers call it. And it's going to be solved, because the illusion of free will evidently has some survival value.

Otherwise we wouldn't have gotten hooked on this illusion for so many decades, so many centuries. So evidently it has survival value. When we exercise intelligent machines in a large number of environments, eventually those that have the illusion of free will will survive and overcome the others, right? So we will learn what the computational advantage of this illusion is, and then we can program it. And what will be the answer? There will be computer systems that act as though they have free will, even though they are just simulating it. So they don't have free will; they will act as though they have free will, and they will understand us when we talk about our free will. So they will be trustable systems.

Alex: You mentioned the idea of the automated scientist. Some time ago, just a couple of weeks ago, I had a conversation with a person from one of the national labs here in the U.S., and they told me that they can no longer afford to use brute force in scientific experimentation.

They do stuff in chemistry, physics, and so on. 

Judea Pearl: They cannot afford to do what? 

Alex: Brute force, which means: just do this experiment, do that experiment, and so on. They want something smarter, and the idea of the automated scientist is something they are looking for. Now, for some of the people who are trying to construct automated scientists like this today, one of the challenges they face is that it's difficult to encode in a model like this the entire possible space of starting points. Another thing they say is that it's difficult to make the model understand which direction it should go in.

Judea Pearl: I don't understand it. Okay: what is the input? What's the output? I think like an engineer, you know; I was an electrical engineer.

Alex: Great question. So, I understand that the input is some knowledge that we have today, plus some measurements coming from different measurement systems. And then what I know people are trying to do, for instance, is to use some kind of Bayesian optimization, like a Gaussian process or something like this, in order to look for the next feasible hypothesis.
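As a rough illustration of the loop Alex gestures at, here is a toy Bayesian-optimization sketch: a Gaussian-process posterior over a one-dimensional design space, with each next experiment chosen by an upper-confidence-bound rule instead of brute-force enumeration. The kernel, the objective, and all parameters are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def kernel(a, b, scale=0.2):
    # Squared-exponential covariance between two sets of 1-D points.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / scale**2)

def gp_posterior(x_obs, y_obs, x_new, noise=1e-3):
    # Closed-form Gaussian-process posterior mean and std at x_new.
    K = kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = kernel(x_obs, x_new)
    alpha = np.linalg.solve(K, K_s)
    mu = alpha.T @ y_obs
    var = 1.0 - np.sum(K_s * alpha, axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def experiment(x):
    # Stand-in for a costly lab measurement (unknown to the optimizer).
    return -(x - 0.3) ** 2 + 0.05 * rng.normal()

# Two pilot measurements, then pick each next experiment by UCB.
xs = np.array([0.0, 1.0])
ys = np.array([experiment(x) for x in xs])
grid = np.linspace(0.0, 1.0, 200)
for _ in range(8):
    mu, sd = gp_posterior(xs, ys, grid)
    x_next = grid[np.argmax(mu + 2.0 * sd)]  # explore/exploit trade-off
    xs = np.append(xs, x_next)
    ys = np.append(ys, experiment(x_next))

print(f"best design point found: {xs[np.argmax(ys)]:.2f}")
```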

Judea Pearl: Let's take an example; I cannot think in generalities. You believe that malaria is caused by bad air from the swamp, okay? What does it take? I mean, I don't believe it, but we used to.

Alex: Yes, of course. Hence the name, right? "Mal aria", bad air.

Judea Pearl: "Mal aria". And eventually we came to the conclusion: no, it was a mosquito called Anopheles, whose bite transfers the disease. We moved from one model to another.

Is that a good example of what they're looking for? 

Alex: Yeah, I think so. I think that would be a good model.

Judea Pearl: Yeah. So, in this case, the survival value lies in either taking a mosquito net or taking an air mask: which is going to be more effective when you go to the swamp? We have a survival value. One model begs a different preventive action than the other; one is correct and one is not. Now, how can one learn this, or hypothesize that there might be something else? I think the way to go about it is local perturbation on the existing model. You have a link between going to the swamp and malaria.

And you ask: is there any intermediate variable that we can intervene on? This is a general scheme, okay? So now we are going to experiment with different intervening variables, okay? What could it be? Local perturbation on the model is, in miniature, an example of a changing paradigm. But it might just work: from these local perturbations you can test one against another, and out comes the correct model, like with the discovery of vitamin C, yeah, the Lind experiment.

Alex: So, what I understand you're saying is that we can come up with a hypothesis by intervening on the existing hypothesis, on the existing model that we have.

Judea Pearl: Yes. 

Alex: And then basically perform a simple experiment, right? And see what the intervention leads to, what changes in the outcome. Sometimes I think, though, that it might be difficult. In the malaria example, you know, somebody one day thought: hey, there are mosquitoes there, maybe we can change something related to the mosquitoes; and this led that person to design the intervention that turned out to make sense for all of us today.

But sometimes we just don't know. Depending on the case, we might not know what the set of possible interventions is, or which subset of all possible interventions we should look at. What are your thoughts on this?

Judea Pearl: It has to be situation specific. Of course, if you see mosquitoes, then the hypothesis is generated.

Perhaps there is something to the mosquitoes. But you also see other things: you see green grass in the swamp, right? So maybe it's the green grass. And then comes the power of metaphors. We know that mosquitoes are capable of doing certain things, and green grass is not capable of certain things. And there is also the germ theory of disease that comes into play.

And we are able to put all this together. The systems that we have now are not capable of that. But the metaphor, reasoning by analogy, is another powerful mode that has not been tackled in artificial intelligence.

Alex: It resembles, to me, the idea of affordances: the object affords something. When I look at a chair, I already think about what I can do with the chair.

Judea Pearl: Right. Yes. Yeah. 

Alex: There's a lot of contextual knowledge that I bring into the scene. What do you think are the main blockers today that stop us, as a community, from applying causality more broadly in industry?

Judea Pearl: There are many blockages.

Funding is one, language is another, training is the third, lack of platforms... one could go on. But I think mainly it's attention. People do not pay attention to the limitations which are predicted by causality, by virtue of the knowledge that we have acquired in causal reasoning. If I talk to a machine learning person, he wouldn't know what I'm talking about.

Everything is sampling, interpolation, curve fitting; everything is in the data. It's a barrier, a big barrier. It's a matter of changing a paradigm. And as Thomas Kuhn keenly observed, you can't expect people to change paradigms every 10 years, and now we have to change it every 10 years.

People are paid and rewarded for continuing to work in the same paradigm, not for changing it. So you can't expect them to change language. Look at all the statisticians; they are still doing regression.

Alex: I know at least one statistician who doesn't agree with your diagnosis, because he said: no, we are statisticians, causal modeling has been our guideline from day one, from Fisher, right? And we are doing it.

Judea Pearl: Okay, fine. Pick up a statistics book and show me the word "cause" in the index. You wouldn't find it.

Alex: I mean, I agree with you in principle. You know, when I had statistics classes, for instance, we were learning about SEM, structural equation models.

Judea Pearl: Right, yes.

Alex: So, this is a model that can represent causality, in general, at least in certain cases, in some modifications, some variations of those models, and so on.

Judea Pearl: There's still a journal of Structural Equation Models, right? 

Alex: Yeah, yes. But think of the way I was taught it. My first question was: okay, so we have some covariance matrices, some fitting, and some other statistical stuff, which is very important, but how is this different from any other statistical method? That was my question. How do I know, if I fit this model with one structure versus another structure, which one I should pick?

And what I was told in my classes was: whichever has a better fit to the data, as measured by information criteria, the Bayesian information criterion or something like this. But this is actually terrible advice. We know that we can build a statistical model fitted to causal data in a way that is not correct and actually get a better fit to the data than the causal model. The thing is that the non-causal fit will not generalize under distribution shift, and the causal one will. So, in this sense, I agree with you. I think we need a shift in education as well; or maybe primarily we need a shift in education, in how we teach people.
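A toy sketch of Alex's point (my own construction, not from the episode): fit the causal direction, Y on X, and the anticausal direction, X on Y inverted, on the same data, then shift the input distribution. The causal conditional stays invariant under the shift, while the inverted fit degrades.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training environment: X causes Y, with Y = 2X + noise.
x_tr = rng.normal(0.0, 1.0, 10_000)
y_tr = 2.0 * x_tr + rng.normal(0.0, 1.0, 10_000)

b, a = np.polyfit(x_tr, y_tr, 1)   # causal: regress Y on X
d, c = np.polyfit(y_tr, x_tr, 1)   # anticausal: regress X on Y

# Shifted environment: same mechanism, different input distribution.
x_te = rng.normal(3.0, 1.0, 10_000)
y_te = 2.0 * x_te + rng.normal(0.0, 1.0, 10_000)

mse_causal = np.mean((y_te - (b * x_te + a)) ** 2)
mse_inverted = np.mean((y_te - (x_te - c) / d) ** 2)  # inverted anticausal fit
print(f"causal: {mse_causal:.2f}, inverted anticausal: {mse_inverted:.2f}")
# The causal regression stays near the noise floor (~1.0);
# the inverted anticausal fit is biased once P(X) shifts (~3.5 here).
```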

Judea Pearl: Every college has a statistics department. Not every college has a causal inference department. If you could just change that... Statistics should be a special case, a special branch of causal inference, dealing with the lowest level of the ladder. I'm not saying it's simpler; there are a lot of intricate questions you can ask about the efficiency of estimators and things of that sort within that level. But you should have the entire ladder in mind, so you can move and ask reasonable, meaningful questions.

By the way, when I looked into structural equation models, the answer that I got was: it's a parsimonious representation of the covariance matrix, which is also meaningful. That's the answer I got from Peter Bentler, who is the author of several books on structural equation modeling. So the answer was: what is a structural equation model? It's a parsimonious representation of the covariance matrix, nothing else. But it's also meaningful. Why is it meaningful? The word "causation" did not appear. It's meaningful because it represents your causal model; namely, your knowledge. What we call knowledge is really not just data; it's data plus the ropes behind the data, and that is the causal model, the structure. That is what makes it meaningful.

Alex: The last keynote presentation during this year's CLeaR conference, which took place nearby, at the UCLA campus, your campus, was by Elizabeth Tipton. And she talked about her experience with the so-called evidence-based framework, which heavily relies on randomized controlled trials. She gave some very eye-opening examples from the field of education and interventions, and one of the challenges she talked about was that although there were randomized controlled trials behind the interventions, people didn't want to use those interventions, because they were asking: how does this apply to my particular school?

So they understood that there was some intervention, but they were not convinced that the results of this intervention could be translated to their use case. I found it very interesting, and we know that we have some results from your work and your students' work that can help us translate such interventions to particular cases, at least under certain conditions.

Judea Pearl: They are right, in the sense that using our method, the method of data fusion, taking the results of several randomized experiments and applying them to a new environment using the idiosyncratic characteristics of that new environment, relies on assumptions. It relies on having the graph, which means it relies on level-two assumptions, and not all of them are verifiable by randomized experiments.

So they can say: ah, we don't believe that, okay, we just believe in randomized experiments; but then we cannot carry results from one environment to another. And what we are saying is that people are sometimes capable of characterizing the idiosyncratic properties of their environment, and if you can do that formally, then you can transfer information from other experiments to your environment. There is a trade-off here. Evidently, people do not use what is available from other experiments; it's simply a waste. And people have a very good idea of what's special about their environment, okay? Like: we have homelessness, we have whatever the characteristics are. So they have to model it; but they have to be familiar with the art of modeling.

And these people are not trained in causal inference, so they are not familiar with the available techniques. I made this point in my comments on the Deaton and Cartwright paper on the limitations of randomized experiments: we do have tools today for combining experiments.
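The simplest instance of the data-fusion machinery Pearl mentions is the transport formula: if a selection diagram certifies a set Z of environment-specific characteristics, an effect estimated in the source environment (distribution P) can be re-weighted by the target environment's own distribution (P*):

\[
P^{*}(y \mid \mathrm{do}(x)) \;=\; \sum_{z} P(y \mid \mathrm{do}(x), z)\, P^{*}(z)
\]

The graph, a level-two assumption, is what certifies that Z is the right set; that is exactly the trade-off Pearl describes.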

Alex: Yeah, as you said, maybe there's a challenge again in how we teach people about causal inference. In a traditional statistics course, you will get at least an overview of what an RCT is, and so on. But often we teach people that RCTs are the gold standard for establishing causation, and we do not talk about the limitations of RCTs. We're not saying: hey, this is rung two, right? These are the interventional questions; these devices can answer interventional questions, but not counterfactual questions. And this causes a lot of confusion, because some people do not distinguish between rungs two and three.

Judea Pearl: And the immediate areas where they need level three are in finding causes of effects. You see the effect, and you ask what could be the factors that account for it, okay? And then we go to the question of direct cause versus indirect cause, and the distinction between necessary cause and sufficient cause. All these quantities answer questions about causes of effects.

Okay, those are beyond the reach of statisticians who are only interested in randomized experiments; they simply cannot answer them. And I'll give you a simple example: a drug which has no average effect. A randomized experiment can show no effect, but you don't know whether "no effect" means no effect on any individual, or whether it kills some and saves some: kills 10 percent and saves 10 percent. A randomized experiment will not be able to distinguish between the two models, but we have tools today that can, for instance tools that combine observational studies and randomized experiments. From these two you can get, right away, bounds on the probability of harm and the probability of benefit for each individual.
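A sketch of the kind of bound Pearl is describing, following my reading of the Tian-Pearl bounds on the probability of benefit (worth checking against the original 2000 paper before relying on it):

```python
def benefit_bounds(py_do1, py_do0, p_xy):
    """
    Bounds on the probability of benefit: the chance that the outcome
    would occur with treatment AND not occur without it, combining
    experimental and observational data. Formulas follow my reading
    of Tian & Pearl (2000).

    py_do1 : P(y | do(X=1)) from the randomized experiment
    py_do0 : P(y | do(X=0)) from the randomized experiment
    p_xy   : observational joint P(X=x, Y=y), keyed by (x, y)
    """
    p_y = p_xy[(1, 1)] + p_xy[(0, 1)]   # observational P(y)
    lower = max(0.0, py_do1 - py_do0, p_y - py_do0, py_do1 - p_y)
    upper = min(py_do1, 1.0 - py_do0,
                p_xy[(1, 1)] + p_xy[(0, 0)],
                py_do1 - py_do0 + p_xy[(1, 0)] + p_xy[(0, 1)])
    return lower, upper

# Pearl's "no average effect" drug: the experiment is a wash (50/50 in
# both arms), yet the bounds only say benefit lies in [0, 0.5]: the data
# is consistent with a drug that does nothing to anyone AND with one
# that saves some people while harming just as many.
print(benefit_bounds(0.5, 0.5,
                     {(1, 1): 0.25, (1, 0): 0.25, (0, 1): 0.25, (0, 0): 0.25}))
```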

So the people who are blind to level three are missing out on those questions, and we know exactly which questions they are missing. Some of them will tell you: we don't need this level, we don't want it, we never will. Rubin wrote in one place that "causes of effects" is a topic for cocktail parties, that it's not scientific. Yeah, because you can always go on: what is the cause? You go to the first cause, you go to the Big Bang, okay. So it's a cocktail party question.

Alex: Well, but we have Markov blankets and all this stuff, right? We don't need to go infinitely far.

Judea Pearl: Oh, there are questions. A simple question: I give you a possible cause, and I ask you to what degree it is necessary or sufficient. Like the "but-for" criterion in a legal setting: would the injury have occurred but for the actions of the accused? That is a question one can ask in a legal setting, and we have the answer for it. We have a way of formulating it. We know under what conditions we can identify it, and what kind of data we need. The tools are available; the education is missing. I don't think they teach it in law school, even in law schools that have the audacity to appeal to statistical methods for answering questions. And some of the lawyers are interested. Those lawyers are still unaware of the tools available in causal inference.
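For the record, the "but-for" quantity Pearl refers to has a precise form: the probability of necessity, PN = P(Y_{x'} = y' | X = x, Y = y), the probability that the injury would not have occurred but for the exposure, given that both actually occurred. Under monotonicity and exogeneity it reduces to the excess risk ratio:

\[
\mathrm{PN} \;=\; \frac{P(y \mid x) - P(y \mid x')}{P(y \mid x)} \;=\; 1 - \frac{1}{\mathrm{RR}}
\]

which is why the legal threshold of "more probable than not" (PN > 1/2) corresponds to a relative risk above 2.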

Alex: I must say, I want to give you something a little bit more optimistic than what you said. I got an email from a lawyer who was actually asking about the probability of necessity and the probability of sufficiency, and he asked me for a review of his work. So, it happens. Maybe slowly, but it happens.

Judea Pearl: It will take time. I hope the advent of the large language models does not cover the results of causal inference. It may cover them, or it may expose them.

Alex: "Cover" in the sense that...

Judea Pearl: Cover means bury them, yeah, with the attention going into large language models rather than into causal inference, which is the foundation of our knowledge. I hope it does not cover it, but exposes it and amplifies it.

Alex: Hopefully. I think many people are getting more and more aware of the limitations of large language models, and when they ask how to improve them, the questions inevitably go into causal questions.

Judea Pearl: The causal questions give you the framework: the kinds of questions you can answer, and the tools that are available.

We know so much about human knowledge because of the causal inference results. This is our knowledge. So in every aspect that requires a user interface with a machine, you have to take into account the user's structure of knowledge. And that's where it's going to play a role.

Alex: Do you think we will find or discover methods to automatically and efficiently learn user knowledge somehow?

Judea Pearl: To learn user knowledge? 

Alex: Yeah, instead of encoding it. 

Judea Pearl: I think so, yeah. It's just another environment. If you can model the environment, why shouldn't we model the user? It's another black box.

Alex: Yeah, another set of variables, contextual variables. So, before we finish, I would like to do something new.

Judea Pearl: So, I sing.

Alex: We can sing at the end; I'm happy to. So, before I came here, I published a question to people on LinkedIn and Twitter: if they could ask you one question, what would that question be? I've just checked, and there are some comments from people, so I would like to read maybe one or two questions from them.

Okay. One question comes from Subhajit Das from Amazon, and he asks about access to data. He says: access to data grows, and AI is becoming more polished than ever. His question to you is whether you think the day is near when true causality can be established with fewer assumptions, or is it the other way around? So I understand he's saying: we have so much data now.

Judea Pearl: I don't think data can make up for assumptions. Data just gives you data, in the asymptotic case, assuming that you have infinite data. So what do we have? A probability distribution, true. Most of the work in causal inference already assumes we have the conditional distributions, or the joint distributions, right?

So the limitations that we found are limitations in the asymptotic case, when you have infinite data. The kind of data that you get makes a difference, but not the size.

Alex: So that would be what we discussed before, right? 

Judea Pearl: As before: you cannot go from level one to level two even if you have infinite data of observations only. Passive observation.

Alex: Yeah. So this is about the content of the data, rather than its size.

Judea Pearl: Absolutely. How was the data obtained?

Alex: Great. Let's see another one. Andy Wilson asks about causal discovery: what do you think about causal discovery? He says he always got the sense that your crusade implied causal discovery shouldn't exist, but it also seems promising.

Judea Pearl: Not that it shouldn't exist. I simply focused my work on different questions: not how to discover the model but, as soon as you have it, what you can do with it. A division of work. I let the CMU people continue the causal discovery; I watch, once in a while, the achievements and improvements that they come up with. And I know there have been a lot of recent improvements, making the assumptions less stringent, and others. But I just focus on what we can do with the model instead of how to discover it. Look, you have to have a division of labor.

Alex: Beautiful. Okay. Some people just ask you one question: why?

Judea Pearl: I think silence would be the best answer.

If you don't articulate your query, don't expect a correct answer.

Alex: I'll leave it without a comment. Okay, so now we also have some questions from Twitter. How will causality be learned by future AGI?

Judea Pearl: In the same way that we learn it now, in the same way that people learn it. I look at all the avenues where we can learn causality. What does it mean to learn causality? To learn a causal model, right?

On the three levels, the same way as we learn it: level two is learned by experiment; level three is learned by hypothesizing, by hypotheses and imagination; and we have the structure for that. So I hope it will learn the same way people learn it, the way scientists learn it. And I think we are doing very well in science, except it will be accelerated, because it will have greater access to data and computation.

Alex: The last question that we have here comes from Boris; I know him from Vancouver. Why bother with DAG-based identification instead of focusing on a more accurate specification of SCMs?

Judea Pearl: What's the difference? I mean, a DAG is just an abstraction of an SCM. So why focus on a DAG? What do you mean, on level-two identification?

Alex: Yeah, I understand. So I understand the question as: why focus only on DAGs, on the structure, rather than on a richer specification of SCMs?

Judea Pearl: Elias just gave me a paper on identification at level three. So I have it; I haven't read it yet.

Alex: Good, we'll link to this paper in the show description so everybody can refer to it.

Judea Pearl: What can you identify at level three? The functions themselves.

Alex: Yeah, it seems that that's the additional piece of information that we get there, right?

Judea Pearl: But since we don't have a way of testing them, when we have bounds on them, that's good. So, you're right: by combining experimental and non-experimental data, we have bounds on queries from level three. So yes, we can, to the extent that the bounds allow you to learn, especially when the bounds collapse into a point estimate, which happens sometimes. And we know when; that's the nice thing.

Alex: Yeah, that's a great thing, that's true. Judea, what question would you like to ask me?

Judea Pearl: That's a tough one. What did you do before you got into causal inference? And were you happy?

Alex: Well, I have an additional dimension of excitement today. But what did I do? Well, I was doing work related to psychology and neuroscience. And before that, I was a musician and a music producer.

Judea Pearl: Oh, so that's why you play the piano.

Alex: Yes. 

Judea Pearl: That's good. So we can end it with a good song.

Shalom aleichem, malachei hasharet, malachei Elyon, mi'melech malchei hamlachim... [Peace be upon you, ministering angels, messengers of the Most High, from the King of kings, the Holy One, blessed be He.]

Alex: That was beautiful. 

Judea Pearl: It's a beautiful song.

Alex: Absolutely. I wanted to ask you two more questions before we finish. Okay. The first one is: who would you like to thank?

Judea Pearl: My teachers in high school and in college. They always gave us the illusion that we could be contributing scientists; that we are not just users of science, but makers of science; that each one of us can contribute something. That was a great gift for me. So that's one thing. Where else could I go? I could mention a hundred people: my students, who pushed me into areas that I didn't want to go into, against my will. And they taught me many new things. As they say in the Mishnah: I have learned a lot from my teachers, but even more from my students.

Alex: Thank you. And what constitutes a good life?

Judea Pearl: I think it's the illusion that you are part of a chain. And it comes at different levels. I see myself as a part of a chain, okay? There were big giants, Aristotle and all of them, and I added a pebble there. That's a good life. And the same thing goes for the family, right? My grandfather gave to my father, and I give to my children. Yes, it's an illusion of being a link in a chain, but it gives meaning to life. And finding something today that I didn't know yesterday adds even more luster to the chain.

Alex: What would be your advice for people who are just starting with something complex and see that there's so much to learn? Maybe they're starting with mathematics or machine learning or physics or causality.

Judea Pearl: I don't know; I would quit. I would not start something that looks too complex. I'd find something where I feel I can contribute, where I have mastered at least a part of it. And from this little part I can expand. But I wouldn't get into an area which is too complex. For instance, I don't know genetics, and I find it very hard to get into; it's too complex. I don't know how to get into it. So I quit.

Alex: So that would be: start small with something.

Judea Pearl: If I knew at least one tiny area of genetics that I could relate to and really understand, from there I could grow bigger; I could expand.

Alex: Beautiful. It was a pleasure, Judea. Thank you so much.

Judea Pearl: Well, thank you, Alex.

Alex: Thank you. It was also fun.