Causal Bandits Podcast

Causal AI, Justin Bieber & Optimal Experiments || Jakob Zeitler || Causal Bandits Ep. 007 (2024)

January 08, 2024 Alex Molak Season 1 Episode 7

Video version of this episode is available here
Recorded on Sep 5, 2023 in Oxford, UK


Have you ever wondered if we can answer seemingly unanswerable questions?

Jakob's journey into causality started when he was 12 years old.

Deeply dissatisfied with what adults had to offer when asked about the sources of causal knowledge, he started to look for the answers on his own.

He studied philosophy, politics and economics to find his place at UCL's Centre for Artificial Intelligence, where he met his future PhD advisor, Prof. Ricardo Silva.

At the center of Jakob's interests lies decision-making under partial knowledge.

He's passionate about partial identification, sensitivity analysis, and optimal experiments, yet he's far from being just a theoretician.

He implements causal ideas he finds promising in the context of material discovery at Matterhorn Studio; earlier, he worked on sensitivity analysis for quasi-experimental methods at Spotify.

Want to learn what a 1,000-year-old church, communism, and Justin Bieber have to do with causality?

Tune in! ------------------------------------------------------------------------------------------------------

About The Guest
Jakob Zeitler is a researcher at the Centre for Artificial Intelligence at University College London (UCL) and Head of R&D at Matterhorn Studio. His research focuses on partial identification, sensitivity analysis, and optimal experimentation. He works on solutions for automated material design.

Connect with Jakob:
- Jakob Z

Should we build the Causal Experts Network?

Share your thoughts in the survey



Causal Bandits Podcast
Causal AI || Causal Machine Learning || Causal Inference & Discovery
Web: https://causalbanditspodcast.com

Connect on LinkedIn: https://www.linkedin.com/in/aleksandermolak/
Join Causal Python Weekly: https://causalpython.io
The Causal Book: https://amzn.to/3QhsRz4


Maybe to put it in there, it's not like assumptions are binary, it's not like it's true or false. In my PhD work, led by Ricardo Silva, I was put on this path on partial identification. In that field, you actually learn quite fast, intuitively, that assumptions are almost like a range. It's like a slider you pull up and down.

It's the future, causal, I think. Hey, Causal Bandits, welcome to the Causal Bandits Podcast, the best podcast on causality and machine learning on the internet. Today we're traveling to Oxford to meet our guest. He learned programming at nine, convincing his dad to buy him a book on PHP. He started thinking about causality at 12,

frustrated by the fact that the only thing he could learn at school was the old mantra that correlation is not causation. He used to play piano, but now he prefers Counter-Strike, workouts, and family time. Ladies and gentlemen, please welcome Mr. Jakob Zeitler. Let me pass it to your host,

Alex Molak. Ladies and gentlemen, please welcome Jakob Zeitler.

Thank you very much, Alex. It's great to be here and talk about my favorite subject, which is causality. 

Thank you for joining us. Jakob, where are we?

Well, we're here in St Aldates Church in Oxford, in the United Kingdom. And I chose this place because today I want to really talk about the assumptions of causal inference.

Assumptions in statistics, or in science in general, are something we believe in, and we build faith that they work. And I think this church setting here kind of represents that in the same way, where people come to ask questions like, you know, why am I alive? Why am I doing these things?

You know, why do I believe certain things? So you're questioning your beliefs and you're questioning your assumptions. And I think that's what we're here to do as well: how do we make causal inference work, and which assumptions are more reliable than others?

Assumptions are basic, fundamental, in causal inference. In some of your work, you focus on the cost of those assumptions. Can you tell us a little bit more about this?

Yeah. So it's something that's emerged as an idea over the course of my PhD: not only do we have assumptions, and need assumptions, for causal inference. I think in your book, I took a quick look as well, you know, there are certain steps that help us to get to point identification in causal inference, and they're necessary.

These assumptions are necessary. But I think the question we always kind of like ignore is like, the cost of these assumptions. So more simply, for example, the best assumption we can have is randomization, but it comes at a quite expensive cost. For example, for a clinical trial, it costs, you know, millions, billions for the pharma industry to run those.

Um, and so that's great because then we have absolute, you know, the best kind of certainty about the causal effect of a drug, whether it works or not, but it's expensive. Whereas on the purely observational side, we have assumptions that are kind of coming for free, right? No unmeasured confounding, it's just something I say, and then we assume it's true, and then we keep moving in the causal path towards estimation.

But then again, the question is like, can we spend money to reduce the risk that that assumption is wrong? So I think assumptions are important and then the cost associated with that as well. 

It sounds to me like the heavier the assumptions we make, the larger the risk we take. So maybe if we have a purely observational study and we assume that there is no hidden confounding, for instance, that might be a heavy assumption, or maybe a less heavy one, it depends on your context. The price goes down, but the risk of being wrong about this assumption goes up.

Yeah. I think something I realized through my conversations with you, actually, is that in this framework of the cost of the assumptions of causal inference, the notion of risk is still missing. So I haven't thought too much about risk in this framework, but as you say, I think there are ways for us to buy, you know, spend money, to reduce the risk, certainly. One practical example would be at Harvard, at the public health school, where they do a lot of fundamental research in causal inference. They spend a lot of money to discuss these assumptions for observational studies.

You know, they're empirically not verifiable, but we can sit in a room and discuss them. And I think they set up a new lab as well, where they even ask the public to send in questions and justifications for the causal inference analysis. And that obviously costs money: people need to manage this, people need to talk, you need to pay professors. But it does, of course, reduce the risk that we are making wrong assumptions. So I think that's definitely a good way to do it for observational studies.

Yeah, it reminds me of the idea that we know so well from statistics and machine learning of not having a free lunch.

Absolutely.

Yeah, and maybe to put it in there: it's not like assumptions are binary, it's not like it's true or false. In my PhD work, led by my supervisor, Ricardo Silva, I was put on this path on partial identification. In that field, you actually learn quite fast, intuitively, that assumptions are almost like a range.

It's like a slider you pull up and down. So, traditionally we look at it like, you know, we have data and then we're like, oh, let me get the causal effect for aspirin, does it work or not? And then it's like, I have to make assumptions A, B, and C, and only if I have A, B, and C do I get a causal effect.

But that's not the whole story. It's more like we have data, and then we can make some assumptions to get some causal statement or some causal result, and point identification is actually only one way to look at causal effect analysis. The other one is partial identification, or causal bounds, which sits in between saying, you know, "I can't do anything with this data because I don't like the assumptions" and "I'm making all these very strong assumptions to get to this one number."

With partial identification you can easily move in between, and what you get is not a single number but a lower and an upper bound on the true effect. And as you add more assumptions, those bounds get tighter and tighter, and at some point they collapse, so they're both on top of each other, which is just point identification.
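To make the "slider" concrete, here is a toy numerical sketch (the numbers are made up, not from the episode) of worst-case bounds on an average treatment effect for a binary treatment and a bounded outcome, and of how the strong no-unmeasured-confounding assumption collapses the interval to a point:

```python
# Toy sketch, not from the episode: worst-case ("Manski-style") bounds on an
# average treatment effect (ATE) for a binary treatment T and an outcome Y
# known to lie in [0, 1], using only observational quantities.
p_t1 = 0.4    # P(T = 1), hypothetical observed treatment rate
ey_t1 = 0.7   # E[Y | T = 1], hypothetical
ey_t0 = 0.5   # E[Y | T = 0], hypothetical

# E[Y(1)] is only observed for treated units; for the untreated it could be
# anything in [0, 1], and symmetrically for E[Y(0)].
ey1_lo = ey_t1 * p_t1 + 0.0 * (1 - p_t1)
ey1_hi = ey_t1 * p_t1 + 1.0 * (1 - p_t1)
ey0_lo = ey_t0 * (1 - p_t1) + 0.0 * p_t1
ey0_hi = ey_t0 * (1 - p_t1) + 1.0 * p_t1

# Without further assumptions the interval always has width 1.
ate_bounds = (ey1_lo - ey0_hi, ey1_hi - ey0_lo)
# Adding the (strong) no-unmeasured-confounding assumption collapses it.
ate_point = ey_t1 - ey_t0

print(ate_bounds)   # approximately (-0.42, 0.58)
print(ate_point)    # approximately 0.2
```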

And so this is quite important, I think, to emphasize: assumptions, you know, are a range. It's not binary, it's not all or nothing. It's something we can discuss at very different levels.

I love this perspective. And I think this is something very, very important, I would love everyone to hear that. Thinking about causality in binary terms, that we either can do it or we cannot do it, might be very, very limiting. And we might actually be throwing away potential gains that might be much cheaper than we initially assume when we think about this in binary terms.

Yeah. I think it just comes back to something very basic, you know, your philosophy of science, which is: what's your scientific paradigm?

And if your paradigm means to question your hypotheses and justify your answers, then you would want to look at every possible perspective of the causal question you're facing. Saying it's all or nothing gives you just two perspectives. If you're able to go in between, it's like looking with a different microscope, you know, we're looking at different zoom levels at the causal problem.

And that, you know, is going to produce better papers. It's a very, very early field, I'm not saying you should have heard about partial identification before. It started in 1989, I think, with the first written paper; I think it was Robins who, just as a side note, kind of introduced it.

Pearl also talked about it at the same time. You know, great minds think alike, it's quite common to have these things happen at the same time. And of course, Pearl took a DAG approach, and I personally find the DAG approach much more intuitive for explaining partial identification. With his student, Alexander Balke, they wrote a few papers that introduced these causal bounds.

And so then obviously we have the trough in between, and then there was another guy from economics called, I think, Charles Manski. He wrote a whole book around it, I think 2003 maybe, and it wasn't picking up as much. And so now we're actually getting into maybe, let's say, a third generation of partial identification.

And I think it's partially driven again by the frustration that surely there isn't just full causal effect estimation or no estimation at all, there must be something in between. And so people want more flexibility. People want to talk about more assumptions, different assumptions, and, more importantly, weaker assumptions. Strong assumptions are just inherently hard to justify. So if you're talking to the government about COVID policy, or a company about their policies and marketing strategies, you know, you want to have a bit more than just a binary all or nothing.

You want to be able to provide causal results with weaker assumptions, because that's easier to justify at the end of the day.

One question that I often hear from people who are practitioners in causality, or are interested in causality, when they hear about partial identification, is: is this concept related to sensitivity analysis? And if yes, in what way?

It's a good question. I actually met with someone who wrote a paper on bounds and sensitivity analysis. And in the most first-step kind of way, yes, it can in a way be seen as a sensitivity analysis, but strictly speaking I think it's just a rephrasing of a causal model.

And then you can actually do sensitivity analysis on that in some way. But if it helps to understand the idea, yes, you could think of it as some kind of sensitivity analysis. Sensitivity analysis, strictly speaking, means you introduce a subjective parameter, which you dial up and down, and then see how the results change and how they match with what you see in the world. And then you have to take that parameter and also calibrate it. Guido Imbens, I think, first described that in a paper. And it's a very simple idea, it's not complicated at all, to be honest. It's a paper that everyone can read and be like, that makes sense.

You know, just one more variable in there, and I dial it up and down. It's a subjective exercise, though. But partial identification isn't subjective by default, by design. The only subjective thing might be the causal graph you assume; there is no subjective sensitivity analysis in there, but you can add it and make it that way.

What do you feel contributes, or contributed, to the fact that, I think it's safe to say, both areas, partial identification and sensitivity analysis, are not so well known in the community today? Although they seem very powerful, in the sense that they can really broaden, let's say, the action space in causal analysis, they are not that frequently applied.

You know, it's the age-old question, really: this seems to make sense, but why aren't we using it? There are psychological reasons, and there's also the fact that before something is being used, it's not being used, you know, that's the stage before it is being used. So are we at the beginning of a change, of people using these things?

And I think the answer is yes. I think we're just at that point, we are making this change. Sensitivity analysis has been around for some time. I think it's just that basically people's bandwidth is lacking, you know, the investments you need to make to get to even just a causal effect with a simple Python library.

I mean, you wrote a whole book about it, you know, it's not easy. And then to ask them to also do a sensitivity analysis is at this point maybe still a bit of an ask. There are obviously more libraries coming out and simpler methods, so I know Tyler VanderWeele, for example, came up with the E-value.

And that's a paper from 2017. I think he also came to give a talk here in 2019, actually. Very simple idea. And he was like, the medical clinical trial literature, they need to do this, they need to do sensitivity analysis on top of their causal analysis. And here's a very simple way to do it, and here's a Python or R package, you just download it and put your numbers in.

And it's just one equation, the package basically has one equation in it. You know, it's trying to make it as simple as possible. Still, people are not using it. It's just a question of the resources you have, and you have to choose to make trade-offs, right? So sensitivity analysis is absolutely important and it should all be included.
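Since the E-value comes up here, a minimal sketch of that one equation (the risk ratio below is made up for illustration) looks like this:

```python
# Minimal sketch of the E-value of VanderWeele & Ding (2017): the minimum
# strength of association, on the risk-ratio scale, that an unmeasured
# confounder would need with both treatment and outcome to explain away an
# observed risk ratio. The input value is made up.
import math

def e_value(rr: float) -> float:
    if rr < 1:                      # for protective effects, invert first
        rr = 1.0 / rr
    return rr + math.sqrt(rr * (rr - 1.0))

print(e_value(1.8))   # ~3.0: a confounder tied to both T and Y at RR ~3 could explain it away
```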

It's just that people don't have the money to walk that far. And I think it's the same with partial identification: we're going to have to first create these tools, make them available, and simplify them. There is still a caveat with partial identification, which is that these bounds I'm talking about can often be not particularly informative, and that's just the nature of things. That doesn't mean we shouldn't report them. You know, as a scientist, you should go into your inquiry of a causal question or a science question independently, and you should report all the results you have.

It shouldn't be like, I'm not going to report this, or something like that. You should try to provide the widest perspective on the questions you're trying to answer and the results you've seen, and partial identification should be part of that, even if it is not as informative. And the second thing is that partial identification, on top of all the causal stuff, is also another level of complexity.

And it's not just complexity to understand, it's also complexity to run these methods. So we have exact methods, that's something I've spent a lot of time on and that's actually the core part of my PhD. And then we have approximate methods that probably have a better chance to actually work in real life and in practice.

The problem with the exact methods is that they don't scale well. If you go beyond five nodes in a graph, if you go beyond a domain of three or four discrete states of a variable, it doesn't compute. And so my PhD, in one particular paper, has, in my opinion, suggested one of the best ways to deal with this problem of scaling.

But then the other paper, led by Kirtan Padh, we all collaborated on this, has shown a method that's much more applicable because, in the most simple case, it's just the IV setting, pretty classical, many people understand that, and then it allows for continuous variables and it allows for multiple treatments.

So I think we're going to end up with the more practical thing, but if you still want to go to the roots, the exact methods are where it's at. Unfortunately, no free lunch, right? So these methods are expensive to understand and to run.

We will link the papers and all the resources you mentioned in the show notes.

So everybody who's interested can dive deeper and read about those methods. Recently I had a conversation with Andrew Lawrence; one of his works was about applying the A* algorithm, which comes from computer science, it's a path-search algorithm, to make some causal discovery methods more efficient.

Do you also have examples in your work of taking some methodology or some idea from another context and applying it to causality in order to make it more usable, more reliable?

I think it's actually the way science works, to be honest. You know, science isn't a hundred percent revolution; science is 80 percent the old stuff and 20 percent something new, and I think people especially get stuck trying to be particularly original on the 20 percent.

Don't be original. Just find something like the A* algorithm that fits the other 80 percent in a new way, and that creates new value. So, actually, with the causal marginal polytope, it is taking ideas from something called belief propagation and applying them. Basically, let's say you have a graph with x1, x2, x3, x4, four nodes, and they're fully connected.

Now, if you were going to do partial identification with that, it would already get quite tricky computationally. It would require a lot of RAM, a lot of arrays to allocate for the calculation. But with belief propagation, what you do is you just take the marginals, so you reduce this four-node graph into, for example, four graphs, each with three nodes.

So they're like subsets, basically, let's just say sub-worlds. It's like a core perspective on the causal world. And with that method, you can then actually overlap them, for example, to use the statistical information shared between those worlds, and then also add expert knowledge. That way you can actually tackle causal bounding questions that go beyond four or five or six variables.

So I think 10 is also possible. Obviously, no free lunch, you still need to make a lot of expert decisions, but it is very much a prime example, just like the A* approach, of taking a different idea and applying it to a causal problem.
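As a toy illustration of the bookkeeping behind overlapping marginals (this shows only the consistency idea, not the actual method from the paper), one can split a joint over four binary variables into three-variable "sub-worlds" and check that they agree on their shared variables:

```python
# Toy illustration: split a random joint over four binary variables into two
# overlapping three-variable marginals and verify they agree on the shared
# pair of variables, the kind of consistency constraint that working with
# marginals instead of the full joint relies on.
import itertools
import random

random.seed(0)
states = list(itertools.product([0, 1], repeat=4))
weights = [random.random() for _ in states]
total = sum(weights)
joint = {s: w / total for s, w in zip(states, weights)}   # random joint P(x1..x4)

def marginal(keep):
    """Marginalize the joint onto the variable indices in `keep`."""
    out = {}
    for state, p in joint.items():
        key = tuple(state[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

m123 = marginal([0, 1, 2])   # sub-world over (x1, x2, x3)
m234 = marginal([1, 2, 3])   # sub-world over (x2, x3, x4)

def project_to_shared(m, shared_positions):
    out = {}
    for key, p in m.items():
        shared = tuple(key[i] for i in shared_positions)
        out[shared] = out.get(shared, 0.0) + p
    return out

shared_a = project_to_shared(m123, [1, 2])   # (x2, x3) from the first sub-world
shared_b = project_to_shared(m234, [0, 1])   # (x2, x3) from the second sub-world

for key in shared_a:
    assert abs(shared_a[key] - shared_b[key]) < 1e-12
print("overlapping marginals agree on (x2, x3)")
```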

It sounds like a beautiful application of the divide-and-conquer approach from computer science that we know from sorting.

Yeah, divide and conquer is the way to go.

Absolutely. You know, there are all kinds of people in the world, and I think if you want to go far, first of all, you need a team. And second of all, you've just got to be practical. Science is baby steps. It isn't huge leaps over three, four years where you have a genius idea. And you do wake up and have these ideas, personally, I do have these ideas at night or in the shower or whatever, but they're not huge ideas. It's just that after grinding away and looking at the problem again and again, you're like, maybe this works in a different way. It's really, in a way, banging your head against the wall, and there's nothing genius about banging your head against the wall.

Yeah.

You mentioned experiments in the beginning of our conversation. Well-designed and well-conducted randomized trials are a great tool to talk about causality, but they also have certain limitations. For instance, one of the limitations would be when there is effect heterogeneity, right? So different people, different units in the experiment, react differently. Some people in biostatistics have ways to deal with this. Another limitation of RCTs that seems more fundamental is that even though they can inform us about the average treatment effect, and in some cases the conditional average treatment effect, they cannot help us distinguish between different counterfactual scenarios or, more generally, answer counterfactual queries. What are your thoughts about that?

So in our research group, we've actually repeatedly had to clarify what we mean by, you know, conditional average treatment effect, heterogeneous treatment effect, individual treatment effect. Maybe just to give some background: when we say causal inference, there's actually a whole zoo of different effects we can look at.

The most common one is the average treatment effect, but if you say conditional, for example, you might be conditioning on age, and then you look at what the causal effect is for each age group, and this can get as complicated as you want. Actually, once again, there is a really nice paper in the partial identification field, by Wu et al. 2019. They're looking at bounding for fairness, but it's a really nice paper where I think there's a table where they just list all the different effects and how they can be bounded, and I think that's a really good overview of what kinds of effects there are. Specifically now for the question of heterogeneity, yes, it's of course important. We can make decisions based on average treatment effects, but we probably want to drill deeper, and it does get complicated.
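A quick sketch of the distinction being discussed, on simulated randomized data with made-up effect sizes: the ATE blends two groups whose conditional effects are quite different.

```python
# Toy sketch: average treatment effect (ATE) versus conditional average
# treatment effects (CATE) per age group, on simulated randomized data with
# made-up effect sizes.
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
age_group = rng.choice(["young", "old"], size=n)      # observed covariate
t = rng.integers(0, 2, size=n)                        # randomized treatment
effect = np.where(age_group == "young", 0.1, 0.5)     # heterogeneous true effect
y = 1.0 + effect * t + rng.normal(0, 1, size=n)       # outcome

ate = y[t == 1].mean() - y[t == 0].mean()
print(f"ATE  ~ {ate:.2f}")                            # ~0.30, a blend of the two groups
for g in ["young", "old"]:
    m = age_group == g
    cate = y[m & (t == 1)].mean() - y[m & (t == 0)].mean()
    print(f"CATE({g}) ~ {cate:.2f}")                  # ~0.10 and ~0.50
```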

And frankly, I'm not the expert. When we have these discussions about what the difference is between CATE, the conditional average treatment effect, and the ITE and things like that, frankly, I've never got past the stage of understanding it. All I know is that people tend to practically revert to the ATE, because it is easy to refer to. Of course, there are many people who spend a lot of time looking at these questions, and maybe I'm not the right person to comment on what the right heterogeneous treatment effect estimation methods are. I guess it's just important to be careful with these, because there's a lot of confusion as far as I can see.

But it is important, and I think I was just told by someone that these methods also tend to carry quite a danger of something like p-hacking, so you can easily create a sub-population that supports a conclusion you want to see, but I would need to look up the reference for that.

How about counterfactuals, this more fundamental distinction between interventional and counterfactual queries?

It's one of my favorite topics, actually. I think I saw you posted as well that you looked at the topology work, for instance; it's wonderful work that Thomas Icard, I think, wrote down, and it seems to be based on some previous work by Konstantin Genin at Tübingen, and Konstantin was a PhD student with Kevin Kelly at CMU.

And CMU is one of the breeding grounds of causality. But the causal hierarchy is incredibly important, and I think I saw it mentioned in your book as well, of course, the causal ladder, as it's called by Pearl, for example. It's such an essential concept because it really lays out the limitations of what we can do. You know, science isn't about reaching for the stars, I think, at least. I mean, you can dream, you can envision,

but ultimately you need to come back to Earth and ask the question: is this theoretically possible? And so with the causal hierarchy, we're able to inquire into what is theoretically possible. Most importantly, can I make statements about the interventional world just from observational data?

And the answer is no, you cannot. That is shown, for example, with topological arguments, but also in the paper, I think it's called the causal hierarchy, maybe, by Elias Bareinboim, where they use, I think, measure theory to show the same thing: that basically, when you have observational data and you want to make statements about interventions, you have to make assumptions.

Maybe that's a better way to say it. It's not like it's impossible to say things about interventional states with just observational data. It's just that you have to make assumptions that are probably pretty hard to defend. And then when we go from level two to three, so from intervention to counterfactuals, it's the same story.

Interventional data alone, like RCTs, interventions on the graph and so forth, isn't going to get you to your counterfactual conclusion, but you can go and make the step of saying, I talked to an expert, or I'm willing to take the risk to make the conclusion based on these additional strong assumptions.
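A minimal sketch of that point, with made-up numbers: two toy structural models that produce exactly the same observational distribution over (X, Y) but disagree about P(Y = 1 | do(X = 1)), so observational data alone cannot tell them apart without extra assumptions.

```python
# Two toy structural models with identical observational behaviour but
# different interventional behaviour. All probabilities are made up.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

def model_a(do_x=None):
    """X causes Y, no confounding."""
    x = rng.integers(0, 2, n) if do_x is None else np.full(n, do_x)
    y = (rng.random(n) < np.where(x == 1, 0.8, 0.2)).astype(int)
    return x, y

def model_b(do_x=None):
    """Hidden U drives both X and Y; X has no effect on Y at all."""
    u = rng.integers(0, 2, n)
    x = u if do_x is None else np.full(n, do_x)
    y = (rng.random(n) < np.where(u == 1, 0.8, 0.2)).astype(int)
    return x, y

for name, model in [("A", model_a), ("B", model_b)]:
    x, y = model()
    print(name, "observational P(Y=1 | X=1)     ~", round(y[x == 1].mean(), 2))  # ~0.8 for both
    _, y_do = model(do_x=1)
    print(name, "interventional P(Y=1 | do(X=1)) ~", round(y_do.mean(), 2))      # ~0.8 vs ~0.5
```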

And I think once you realize that, you have a much more realistic perspective on what can be done. It's not a statement of, oh, well, then we just go home and do nothing. It's more like, oh, now I feel much more confident to say, okay, this statement is possible, this statement is impossible, or, more practically, how much money do I have to spend to defend these assumptions? So, the cost of causal assumptions. And those costs get quite a bit more expensive as you go up the ladder. Observational data is just fricking cheap, it's the best we have. The reason machine learning is so successful, and why big data became a term, is because observational data is cheap.

Interventional data is so much more expensive, and counterfactual data basically doesn't exist, except if you do something like a twin study, which still requires you to find the twins, pay them, and make some assumptions that the twins are the same.

Traditional associative machine learning is on rung one. We could say that it's crazy successful in certain areas, and this is the family of methods that brought the topics that you and I and other people on this podcast are interested in into public awareness, especially recently through generative models, both in graphics, like Midjourney, and in text, like ChatGPT, Llama, and all those large language models. What is it about the place where we are today that has made the community more receptive to hearing about causality, even though those associative methods are so popular and powerful?

It's probably the same reason for most of us, which is that you train a model and the world changes and you make predictions and they don't work and you wonder what happened.

Well, the world changed. And I think we all have a good intuitive feeling that that's happening. If I'm trying to predict airplane ticket prices in 2020 with data learned from 2018, it's not going to work, because there weren't planes in the sky, right? Because of the pandemic. So there was a systematic change, there was a causal intervention on the world, which was a virus, and that changed the whole system.

So of course, the predictor I learned with machine learning, or if it's just a linear regression, it doesn't matter. It's not going to apply anymore, because there isn't a causal system or thinking behind that model. It's just correlation, right? It learns that if these, you know, few data points are observed, then the ticket price is going to be this, because I've seen it in the past.

The past is not the present or the future, usually. So, for me personally, it was also just the dissatisfaction of taking courses in statistics that would go on and on about: you use this formula, then just apply this formula, and then you get this number, and if it's above 0.05, you know, it's not significant, and there's one star, two stars. It's very applied, and even if you then do theoretical statistics, you still will not hear about causality. At least five, ten years ago, very few institutions would talk about that. It's changed, obviously, because we have people who have invested the time.

And so, when I heard for the first time that you can actually use mathematics to characterize causality, I was very happy, because I had all these thoughts and I was being kind of told that no, you cannot characterize it, there's no maths to do that. I mean, maths is the language, the formal language, to characterize things in a very precise way. And I had kind of started to think, or believe, I had made the assumption that it's not possible. And so machine learning models don't have that causal adjustment in there, there's no discussion of that traditionally, and so that's why they fail.

And that's why I was also unhappy when I would take courses in machine learning. And then I was usually at the end of the course, I'd be like. You know, professor, but what if, the world changes or something like that? And it'd be like, well, yeah, I guess it's probably some kind of causal reason or something like that.

And I'm not blaming the professor, you know, it's a structural problem in academia. But these methods fail. And of course, that's why we have these huge churning server farms that retrain every day or every three months or something like that. And it is a practical approach, but it's really the applications where we can think causally about these problems to characterize them in a better way and maybe not, you know, blindly retrain every day.

You started asking causal questions very early in your life, at least compared to what people might imagine is the moment when you start thinking about stuff like this. Can you tell me about the first time you remember starting to think about those questions?

Yes, I think it was when I was 12, and, you know, I was young, so I guess I was somewhat limited in my

perspective. But it was around that age that, I guess, in school, some children were treated differently than others. And I felt that the differences in wealth created a lot of problems there, you know, some kids couldn't go on school trips because they didn't have the money, and other kids had all the designer clothes and three Game Boys.

And so, in a very natural, youthful way, I was like, isn't there a different way, where we can treat everyone the same? And, long story short, it's known as communism. And I was like, well, communism sounds very attractive, and sure, when they tried it so far, maybe it didn't work out, but maybe we just haven't had a proper causal analysis. Maybe if we just do some kind of experiment where we have one country that's capitalist and another country that's communist,

maybe we'd see differences. And obviously that's an RCT, right? So that's the causal thinking there. It was kind of motivated by the question of which political system has the better causal effect on welfare or economic development. And I was just thinking about that, you know, I didn't have the math skills or the reach to ask the right questions.

I mean, when I was 12, what year was it, 2008? Judea Pearl hadn't published his book, how would I have known, how would I have found it? So I was left with that and very frustrated, but it never left me. It was always a question. And annoyingly so, I mean, teachers hated me in school because I would always pose causal questions.

I'd be like, yeah, but how do you know this is what it does? And some teachers were a bit more honest and they were like, well, I don't, you know, this is, we learned these causal paradigms, whereas others would be like, oh, why are you asking this question? You know, there is no way to do this. And it is frustrating, you know, when you're like, I feel like there's a way to do causal inference and causal understanding of the world.

Maybe mathematically precise, and then some person of authority tells you, no, because correlation doesn't imply causation, move on with your life. We're always going to just have these correlational studies, and unless it's an RCT, you're not going to have an answer. And the funny thing is, sometimes they wouldn't even say, oh, maybe just run an experiment.

They just would be like, no, you can't get the causal effect. It's like, but actually, I mean, at least just run the experiment.

When you mentioned this idea of taking two countries and randomizing a treatment between them, it immediately brought to my mind a paper by Alberto Abadie about the influence of a conflict on the economy.

There was also another paper about German reunification. Both of those papers use the synthetic control method. This is the method you also used during your internship at Spotify. Can you tell us a little bit more about this project and what you learned?

Yeah, I'd love to talk about that, because it has a really good lesson for PhD students in there, especially if you're just beginning, or for research in general.

Every PhD is different, I like to say. When you start your PhD, there's the pressure to publish. There are only so many years you can do your PhD, three or four, and you want to show something for the work you're doing. And I had been working on that partial identification thing for some time, and there was a stretch, arguably also because of COVID, when things were slowing down massively, and at some points there were weeks without progress because of all kinds of pandemic problems.

But after two years, I still really didn't have something that was publishable. And so the causal marginal polytope was finally coming together and we submitted it, and it went through a bunch of reviews, and then I went into this internship and I was like, well, you know, three months, good luck doing anything here.

And it is absolutely true that three months isn't a lot of time to produce novel, original research and publish it. But it turned out that in this one, with just maybe two and a half months of work, I'd get a whole publication out of it. And that was such a surprise to me, because I had been working for two years in the same way, same pace, same environment.

And I go to Spotify, nothing changes, I just get paid much more. And I was like, well, what's the factor here? And the truth is, there is no factor. The only thing that matters is that research is random. Some projects, however great they are, just might not make it out there, for all kinds of reasons.

But specifically with Spotify, now coming back to synthetic control, it's just that, in a way, some kind of stars aligned, you could say, in terms of the research constellation, which was Ciarán, who basically gave me that opportunity at Spotify and who's running the advanced causal inference lab there now.

He was like, you know, there's this problem we have at Spotify: we want to estimate the causal effect on these kinds of time series data. So let's say Justin Bieber wants to promote a song on Spotify or something like that. He wants to know the causal impact.

So that's a causal question. It's not just that he wants to know the impact, he wants to know the causal impact, right? Here we go, causal inference. And so synthetic control, the same way Abadie would use it for answering political or economic questions, can also be used very effectively in marketing or any kind of web platform analysis or time series analysis.

Google had done some work there before; in 2014 they brought out a paper and a package called CausalImpact. That one had a Bayesian approach, but that's not essential for the method. But specifically at Spotify, we asked the question: okay, we want to use this, but what are the assumptions that go into it?

And so that really comes back to my core question here of the cost of causal assumptions. Of the assumptions that go into synthetic control, once again, a few are testable, but a lot of them are untestable. First of all, we wanted to actually properly characterize them.

It wasn't until then, with this paper, that people really went and did that, this assumption three double prime, you know, you can look it up in the paper, no one had ever really done that in a non-parametric way. And before we could do that, we did something else that it seems no one had done before, which was to characterize synthetic control with DAGs.

Abadie, of course, coming from econometrics, would use the potential outcomes framework in some way to characterize this causal system, and we were like, well, how would you describe it with DAGs? Because if we know of SWIGs, single world intervention graphs, then presumably there's a way to translate Abadie's potential outcomes synthetic control into Judea Pearl's graphical-models synthetic control.

And so, as a first step, we characterized synthetic control with DAGs, and that already makes the whole problem much easier to look at. And then we used proofs from previous papers to provide that non-parametric identification result. So identification does come in different forms; in your book, you obviously talk about the different steps that lead you to the estimate.

In this case, there's another step that's actually being used: a theorem that comes from proximal learning. Proximal learning is another emerging trend, or topic, in causal inference, we can talk about that later; as you know, I'm very passionate about proximal learning as a deeper causal concept. But coming back to the paper with Spotify, on top of the DAGs and the identification, we then also provide a sensitivity analysis, and we already talked about that before.

We basically introduce an additional parameter into our causal model, and that one is subjective. We also have two more parameters, but it seems we're lucky in that we can actually estimate them from the data we have, and then we're only left with one parameter. We do have to assume linearity for this very simple synthetic control sensitivity analysis to work.

But then you just have this parameter and you can dial it up and down, and it will give you different results based on your assumptions. And I think that is something people haven't looked at enough. So synthetic control is very powerful. We have the causal hierarchy, which can be very depressing when we learn that going from rung one to two and three is very hard, or hard to justify, but then we have practical methods like synthetic control where it's like, wow, it does have strong assumptions, but this thing does go a long way.
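For readers who want to see the mechanics, here is a minimal generic synthetic-control sketch on simulated data (a textbook-style construction, not the Spotify model): fit convex weights over donor series in the pre-period, then use the weighted combination as the counterfactual in the post-period. The linearity assumption mentioned above is exactly what this construction leans on.

```python
# Minimal synthetic-control sketch on simulated data: non-negative weights
# over "donor" time series, constrained to sum to one, are fit so that the
# weighted combination tracks the treated unit before the intervention; the
# post-period gap between actual and synthetic is the estimated effect.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
t_pre, t_post, n_donors = 60, 30, 5
donors = rng.normal(0, 1, (t_pre + t_post, n_donors)).cumsum(axis=0)
true_w = np.array([0.5, 0.3, 0.2, 0.0, 0.0])
treated = donors @ true_w + rng.normal(0, 0.1, t_pre + t_post)
treated[t_pre:] += 2.0                                  # a lift after the "campaign"

def pre_period_loss(w):
    return np.sum((treated[:t_pre] - donors[:t_pre] @ w) ** 2)

res = minimize(
    pre_period_loss,
    x0=np.full(n_donors, 1.0 / n_donors),
    bounds=[(0, 1)] * n_donors,
    constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    method="SLSQP",
)
synthetic = donors @ res.x
effect = treated[t_pre:] - synthetic[t_pre:]
print("estimated post-intervention effect ~", round(effect.mean(), 2))   # ~2.0
```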

And so that was the internship. Personally, I felt like it was two and a half months of work; of course, there was the submission and some final experiments we did. It was a great time anyway, because Spotify is a great company with a lot of opportunities and a lot of freedom, at least in the position I was given.

And then we submitted it and it was accepted at CLeaR, and I presented it in April. As a whole, an incredibly smooth and incredibly unlikely experience for someone to go through. But I was given that chance, and it was a wonderful experience. It's still one of the projects I'm most proud of in the research I've done.

Great, congrats on the project.

Thank you.

And on this very smooth journey as well.

It's Ciarán who is the visionary here. You know, I'm just the one who kept bringing up ideas that didn't work, but the non-parametric identification is down to him. The DAGs are something we challenged together, and then implementing the whole thing came down to me. But yeah, it's teamwork.

I think teamwork is very powerful and often underappreciated, perhaps, especially in those maybe more competitive contexts where people might feel a little bit uncertain whether, if they give credit to another person, their own credit will somehow be hurt.

Yeah. Synthetic control is a very interesting method, but it's a less-known fact in the community, that is my impression, that this method does not come with guarantees of identification. In other words, it might be susceptible to confounding. You mentioned that you also used DAGs. Was that something that helped you frame the problem from the identification point of view?

Are you asking like whether using DAGs helped us produce that identification result? 

My question was even a little bit more basic. I intended to ask whether the fact that you used DAGs, even though synthetic control is usually or typically used without DAGs, because it's somehow bound to the idea of potential outcomes, was helpful for you to think about identification in clearer terms.

I think so, absolutely. There are the potential outcomes and, I'm going to say it, I think a causal inference researcher who always chooses one over the other seems a bit one-sided.

I think both work fine and you can actually mix and match. There are papers out there that use potential outcomes in the same sentence as DAGs, and they are mathematically basically interchangeable, you know, keyword: single world intervention graphs. I personally just prefer DAGs; after all, I do find potential outcomes clunky, but they can be more expressive at times.

So when I want to express a much more complicated causal system, I would use both together, but if it can be done with a DAG, then I will use a DAG. And in this case, using a DAG really helped us talk about it better, on a whiteboard as well, you know, you take a picture and you send it to your research collaborator, and you wrap your head around it better, as far as I can see.

Yeah, so DAGs were definitely the way to go there. Again, you can make all the same arguments with potential outcomes, you just can't draw the nice pictures for it.

I want to circle back now to one of the things we discussed briefly before. You mentioned this paper about the topological view on causal inference and the causal hierarchy. When you look at those limitations that appear from the lower rung, the rung that is minus one, so to say, compared to the rung where you are, we can observe a symmetry: one system at rung one can be the basis of many potential rung-two systems, and one system at level two can produce many different systems on the third, counterfactual rung that would all lead to the same interventional distribution. From the topological point of view, it seems that we could try to extrapolate this symmetry, maybe quote-unquote symmetry, I don't want to be too precise about this, to an even higher level. So this gives us the potential to think, at least theoretically, about another rung that is beyond. What are your thoughts on this?

Yeah, I think I would probably ask Judea Pearl first what he thinks about that, and then probably ask, I mean, who's a good potential outcomes person, you know, there are many of them, I guess, and see what they thought about it, and I'm sure they have thought about it. I have a slight inkling, maybe from something I've read as well, about the argument that, of course, in a symmetrical way, let's say, you can go beyond counterfactuals, but what does it philosophically mean, though? I mean, I don't know. When we do science, or data science, or statistics, there's always a philosophical underpinning, right?

And the great thing about causality, especially, is that it makes us all, I think, think more about, I guess, what life is and where cause and effect come from, you know? And I think a question like that, what is a level of the hierarchy higher than counterfactuals, is an incredibly philosophical question, which I don't have an answer to now. I think with tools like measure theory and topology we can maybe extend that.

Once you have phrased causal inference, or causality, with topology, I presume there would be an operator, something like that, that could extend this infinitely. I just don't know what the interpretation is. So with maths we can do a lot of things, but how to actually interpret them in terms of the real world is the harder question, to be honest.

At the end of the day, I think it's an academic exercise. You know, I already find counterfactuals themselves quite hard to understand at times. I'm a very practical person in that respect. I know some other people are like, oh, it's just mathematical, just write it down and then it exists. But I'm like, yeah, but parallel worlds, you know, cross-world interventions, do they really make logical, reasonable sense?

Do we live in multiple universes? And now we're already in a philosophical discussion, right? So I think there's much more behind this, but practically, we also need to be extremely clear about how we go about it. I use interventions, that's fine, I get it. With counterfactuals, making that step, you have to be much more careful. It just gets so much more complex up there.

Talking about philosophy: I had a conversation with Naftali Weinberger from LMU, who is a philosopher of science specializing in the intersection of causality and complex or dynamical systems. One of the ideas he shared with me is that causality is a concept that is scale-specific.

So you could look at causality and the causal system at one spatial or temporal scale, and then, moving to another scale, there would be another causal system, another SCM, that could represent properties of the system. I was very curious what your thoughts are about this.

Well, it sounds interesting, but do you mean specifically if we were looking at causal systems at different points in time?

No, rather at different time scales. So maybe, like, a nanosecond scale versus, I don't know, a millennium scale.

Okay. Well, I'm going to take the conversation in this direction now, which is cyclical systems, right? Classically, or more simply, if you look at a DAG, we kind of assume it's in equilibrium, so that things have settled, and these are the probabilities on the nodes and these are the structural equations.

Now, someone who has spent a lot of time on these questions of, I guess, different timescales is Joris Mooij from the University of Amsterdam. I actually had the chance early this year, in March, to go to a workshop and talk and discuss this at a table. And it was actually a fascinating story, almost like something for a book, or maybe there will be a movie on causal inference sometime and I'll provide some directing details.

It was a dinner table conversation, and there was this exact discussion, actually. One person was saying, we don't need cyclical DAG systems; introducing cyclicality in DAGs is quite hard and, in fact, again introduces many more limits to causal inference. Joris Mooij has spent a lot of time on it and has a paper that discusses all of this at length. But that person was like, we don't actually need that, because we can just represent causal steps in different chunks of time, in slices. And so this is how I would respond to the point about different timescales.

Well, yeah. If you have the timescale of months, so you have patients coming in every month, then you have chunks of different months and every month they get a different treatment, which was dependent on the previous results and so forth. And then you can also say, well, what, what if we do it every second?

And of course you can also take it to nanoseconds, and you can take it to hundreds of years. Frankly, I don't know where that conversation ended. It was probably two or three hours, and it went into very, very deep detail. And I think the conclusion was that we don't actually need... I don't want to be the person to decide, but from my perspective, I think cyclicality is ultimately very instructive, but it's not particularly practical.

It's practical in some ways, but for practical applications such as health and epidemiology, those chunked time slices are just fine, and you can put them at different time scales. Now, philosophically, I don't know what Naftali Weinberger would want to imply with this statement, but this is what I would imagine he would be talking about at that point.

He might have meant something very different, but I think this as a whole is already something very important to mention, I guess, for the audience: DAGs are called directed acyclic graphs, and that means they're acyclic. So if you go with cyclic graphs, things get much more complicated again.

You mentioned, a couple of times, your work on the polytope paper. A polytope is a structure that might be less familiar to some people in our audience. Could you give us a little bit of intuition about what a polytope is, what it is in the causal sense, and how it is useful?

Oh, I would actually love to do that, because I think it's one of the good ways to visualize these causal bounds. Here we go. So, there's a paper from 2012 by Ramsahai. He actually did his PhD here at Oxford University, in the stats department. And he has a figure in there that actually shows a bounding problem in a three-dimensional plot, so you can actually see the bounds, you can visualize them.

As I said before, bounds are basically a lower and an upper bound on a causal effect. Imagine you have a box, let's just say it's a cube, and this cube actually represents your causal problem. With causal assumptions, we're slicing through the cube and slicing off parts.

So with a causal assumption, let's say I can slice off the top, and now it's half as high. That is equivalent to making an assumption that slices off the top and brings the upper bound further down, because the upper and lower bounds are the maximum and minimum points on a polytope. A cube, for example, is a polytope, but more complex, high-dimensional objects can also be polytopes.

And that is the "polytope" in causal marginal polytope: we actually put different boxes together, but these boxes are still ultimately convex. Convexity is a very important concept for optimization, and convexity basically means that you can find the global maximum and minimum.

That is also why I say I work on exact methods; they're called exact because at any point in time, when I calculate the minimum and maximum, or the lower and upper bound, those numbers are the true lower and upper bounds. There's nothing lower or higher than that. So that's the polytope, and you can really imagine it like a cube that you're slicing down with assumptions.

And so let's come back to the cost of causal assumptions. If you want to slice the cube, you have to pay, whether it's money or sitting down to talk with your colleagues about the assumptions. But you can slice down the cube, and at some point you will have sliced it down so much that the lowest point in the polytope and the highest point are together, and this is your causal identification.

It becomes a hyperplane.

Yeah, yeah, basically, that's the other word for it.
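A toy version of that geometry in code (arbitrary numbers, not a real causal bounding problem): the quantity of interest is a linear function over a polytope, its bounds come from a pair of linear programs, and each extra assumption is one more constraint that slices the polytope and tightens the interval.

```python
# Toy geometry sketch: bounds as the min and max of a linear objective over a
# polytope, and an extra linear constraint ("assumption") tightening them.
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, -1.0, 0.5])      # the quantity of interest, as a linear objective

# Baseline polytope: a 3-dimensional box with coordinates summing to at most 1.
a_ub = [[1, 1, 1]]
b_ub = [1.0]
bounds = [(0, 1)] * 3

def interval(a, b):
    lo = linprog(c, A_ub=a, b_ub=b, bounds=bounds).fun
    hi = -linprog(-c, A_ub=a, b_ub=b, bounds=bounds).fun
    return lo, hi

print("no extra assumptions:", interval(a_ub, b_ub))                  # (-1.0, 1.0)

# "Buy" an assumption: one more inequality slices the polytope.
a_sliced = a_ub + [[-1, 1, 0]]      # e.g. insist that x1 >= x2
b_sliced = b_ub + [0.0]
print("with one assumption: ", interval(a_sliced, b_sliced))          # (0.0, 1.0), tighter
```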

In the beginning of our conversation, we talked about experiments. And in a sense, we could see experiments as a special case of causal inference. So we could think about doing causal inference using the do-operator just on the data, and doing causal inference by intervening in the real world.

And this gives us, or can give us some additional information about the system. This is related to something that you're also working on, which is optimal experimentation. Could you give us a little bit of a perspective on this? 

Absolutely. Yeah, I've become very passionate about optimal experimentation, as I said, because of causal inference; it really is a motivation that has grown out of that. And I think the realization that we can cut the marginal polytope, or cut the causal space, with experimental data is just quite powerful, because the only question then left is: which experiment should I run next? And there are statistical answers to this.

So we have, actually there's a nice story to this. The story goes like this. When people were mining, for gold, let's say, or oil. They had a three dimensional space. Let's say you have a kilometer by kilometer and you want to figure out where is the gold pocket, right? What methods can you make? Well, you can just randomly drill down, and then when you find something, you can be like, oh, this looks good.

You know, I'll keep drilling more. Or you can make a grid, so you drill every 10 meters. But that's quite expensive, right? The question then becomes, can we be smart about the drilling? And so that is something called kriging, and that's been used to explore, basically, you know, mining areas or oil areas, as far as I understand.

And so, in the same way, you know, if you have a statistical parameter space and you want to learn data points about it, you can just pick points at random. Or you can use something called, nowadays, Bayesian optimization. So, Bayesian optimization is a method which basically takes just any predictor with uncertainty on it.

It doesn't have to be a Gaussian process, which is what's commonly used, but that's what is used 99 percent of the time. And then it uses that uncertainty to make a decision about where the next data point should be acquired. And I find it just a really inspiring, really effective method: the point is not to do random experimentation, not to randomly select the points where we do experiments or measure data, and not to go with a brute-force approach, but to be smart about which data points we measure next.

And that is true for just correlational data analysis as well as for causal analysis. So both of these can be phrased as Bayesian optimization problems.
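Since the conversation turns to Bayesian optimization here, a minimal sketch of the idea may help. The objective function, numbers, and names below are hypothetical, not anything from the episode: a Gaussian-process surrogate provides predictions with uncertainty, and an acquisition function (an upper confidence bound in this toy example) uses that uncertainty to pick where to "drill" next.

```python
# A minimal Bayesian-optimization sketch (toy objective, made-up numbers):
# a Gaussian-process surrogate plus an upper-confidence-bound acquisition.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def expensive_experiment(x):
    # Stand-in for a costly measurement, e.g. a material property
    return float(-np.sin(3 * x) - x ** 2 + 0.7 * x + rng.normal(0, 0.05))

candidates = np.linspace(-1.0, 2.0, 500).reshape(-1, 1)   # search space
X = rng.uniform(-1.0, 2.0, size=(3, 1))                   # a few seed points
y = np.array([expensive_experiment(x[0]) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for _ in range(10):
    gp.fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    # Exploitation (mu) vs. exploration (sigma); 2.0 controls the trade-off
    ucb = mu + 2.0 * sigma
    x_next = candidates[np.argmax(ucb)]
    X = np.vstack([X, x_next])
    y = np.append(y, expensive_experiment(x_next[0]))

print(f"Best input found: {X[np.argmax(y)][0]:.3f}, value: {y.max():.3f}")
```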

What are the main challenges in this work or this area? 

I think right now the main challenge is the integration of expert knowledge. So we have something called vanilla BO, vanilla Bayesian optimization.

It's just a boilerplate method; it only takes data. But then these surrogate models, so the models you use to make predictions and to figure out the next best experiment, they do have a lot of parameters that you can tune. And those tunings hopefully can be linked to expert knowledge, and that will help you find, you know, the gold pocket faster.

Because then you're not just having a somewhat good idea; you have a very precise idea of where the gold pockets are. For example, they might be, you know, at the bottom of a hill. So maybe with expert knowledge you can steer your surrogate model to look more in those hill areas. And that expert knowledge can also, very concretely, be physics knowledge.

So physics presumably is true. So if you ask a physicist about certain formulas, they'll be like, yes, we derived it this way, you know, ten years ago. And so maybe we can integrate those and basically exclude, in the parameter space, certain things that we know are physically infeasible or just nonsensical.
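As a very rough, hypothetical illustration of that last point (the constraint and names are made up, not an actual pipeline), folding expert or physics knowledge into the search can be as simple as removing candidate regions the expert rules out before the surrogate ever scores them:

```python
# Hypothetical expert constraint: rule out candidate inputs the physicist or
# materials scientist says are infeasible, then search only what remains.
import numpy as np

candidates = np.linspace(-1.0, 2.0, 500).reshape(-1, 1)

def expert_says_plausible(x):
    # e.g. "the gold pockets sit near the bottom of the hill"
    return (x >= 0.2) & (x <= 1.5)

feasible = candidates[expert_says_plausible(candidates[:, 0])]
# ...then run a Bayesian-optimization loop (like the sketch earlier) over
# `feasible` instead of the full grid.
```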

And so these are the main challenges right now. And specifically for me, working on applying these methods to materials science, it would obviously be the knowledge of a materials scientist that goes into the model. And that, I think, is quite important, because materials are everywhere. We all use materials, whether it's the glass here or these microphones or the high-tech recording device.

The reason we spend billions on optimizing, you know, microcomputer chips is because we are really highly dependent on them. And if we can make them cheaper, we can do more. You know, in technology and in healthcare that can go a long way, and it just goes a long way everywhere. And so experimentation is incredibly important for that.

In my conversations with Juan Orduz from Wolt and Andrew Lawrence from CausaLens, I heard something that I think also goes well with this perspective that you just presented. Both of them, although they work in different contexts and with maybe different types of clients and different types of business problems, emphasized the importance of incorporating expert knowledge in the model. In Juan's case, these are often structural Bayesian models, where incorporating expert knowledge can help you with building a graph, but also with limiting the search space for the parameters in the model. In Andrew's case, that would be causal discovery.

We know that the search space grows super-exponentially with the number of nodes. So even if we can exclude one edge or ten edges, this can reduce the search space significantly. From what you're saying, I hear that expert knowledge can also help us with optimal experimentation by contracting this search space.
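To put a number on "super-exponentially", here is a small, self-contained illustration (not from the episode) using Robinson's recurrence for the number of DAGs on n labeled nodes (OEIS A003024); it shows why ruling out even a few edges with expert knowledge matters so much.

```python
# Number of DAGs on n labeled nodes via Robinson's recurrence (OEIS A003024)
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_dags(n: int) -> int:
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
        for k in range(1, n + 1)
    )

for n in range(1, 8):
    print(n, num_dags(n))
# 1, 3, 25, 543, 29281, 3781503, 1138779265 — fixing even one edge as present
# or absent prunes this space dramatically.
```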

Yeah, it's the same thing. I mean, expert knowledge as a term, or, I think, getting the expert knowledge into your model, is called preference elicitation. Um, but using the knowledge that you have makes sense because it's there, so if you don't use it, presumably you're not being as smart as you can be.

And actually, outside of optimal experiments, or outside of causal inference, this is also a term, so, expert knowledge in statistics. Presumably, every statistical model is based on expert knowledge, right? I mean, you assume a distribution that your expert said, yep, that checks out, you know, floods behave like a Poisson distribution or something like that.

But yeah, in optimal experiments, absolutely, it's very important. And I think, to bring it back to, you know, causal inference: well, I mean, there's a cost of expert knowledge, right? So if you get a good expert, that might be quite expensive. If you get a not-so-good expert, or you ask your mom, who is probably a cheap expert, she might give you an opinion on why there shouldn't be an edge in this marketing model.

Um, and I think we are already, you know, confidently moving away from big data and building models that are just supposed to explain and predict everything, to a world where we have these experts, they know these things better than a machine, better than ChatGPT, which hallucinates. And we're going to bring those into those models.

And that's true for causal inference in the causal marginal polytope work. In fact, you know, half of the contribution is integrating that expert knowledge about edges, whether they're directed or bidirected. So you can read it that way: half of the diagrams, half of the graphs, show the impact of expert knowledge, actually its continuous impact.

So it's not just expert knowledge for its own sake; you can see how varying the expert knowledge, and what they know, has an impact on these bounds. So expert knowledge slices the polytope, makes the space smaller, constrains the bounds further, closer to the true causal effect. But obviously at the cost of that expert knowledge, and expert knowledge can be wrong.

So that's something we also need to talk about: how do we have methods that, you know, recover from that? If you take a Bayesian approach, you're in luck, because as you collect more true data, it will kind of overrule the wrong expert knowledge. But you will have to collect the true data, which comes at a cost.
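A tiny, hedged example of that "data eventually overrules a wrong prior" point, using a conjugate Beta-Binomial model with made-up numbers: the expert's prior says the rate is about 0.2, the true rate is 0.7, and the posterior mean drifts toward the truth as (costly) observations accumulate.

```python
# Beta-Binomial toy example: a wrong expert prior gets overruled by data
import numpy as np

rng = np.random.default_rng(42)
true_rate = 0.7
alpha, beta = 2.0, 8.0   # expert prior with mean 0.2

for n in (0, 10, 50, 200, 1000):
    successes = rng.binomial(1, true_rate, size=n).sum()
    post_mean = (alpha + successes) / (alpha + beta + n)
    print(f"n={n:4d}  posterior mean = {post_mean:.3f}")
```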

So everything costs something, but you want to be smart about how you spend your money. Jakob, who would you like to thank? I did think a little bit about that. I think I'm gonna, you know, thank first of all my family, who are incredibly supportive, and also my father, who supported me through all of this. I think education does pay the best interest, um, whether it's actual real money or your experience or life quality, let's say, and him being visionary enough to, you know, support me and get me a computer early on.

That really helped me actually learn programming very early on. Access to that is worth, you know, millions to me personally; there's nothing that could pay for that experience. I guess I'm also really grateful to the church here for hosting us today. Um, so for me, the cost of causal, um, you know, assumptions is a very important topic.

And I think the church reflects that quite well, that we should question every assumption we have. Questioning those is pretty cheap, and so when we then go and invest in verifying those, you know, that's where we put down our money. And in the same way, I guess, people have been gathering here for hundreds and thousands of years.

They were also questioning the assumptions of life. So I think that was a nice topic, uh, to bring around here. And I guess finally I'm thanking my wife, who helped me with all of these things and is just very supportive.

What question would you like to ask me? 

That's a good question. I was thinking you wouldn't ask that.

Um, I guess I don't want to ask a question that other people might've asked, like, what's your story in causal inference and where do you come from. But I guess I'd probably ask maybe a personal question. Like, um, you have studied philosophy, I think, you also studied psychology, you've been a music producer, so you've had a whole bunch of wide experiences there.

What's a common learning? Like, what's one piece of wisdom you would want to pass on that helped you through all of these different stages in life? I mean, that's a wide variety of experiences; you've seen basically the world. So what is it that you think causal inference can learn from your experience in music production, in philosophy, in psychology?

What is it that we should emphasize in the next five, ten years? 

I think the unifying thread that goes through all of those experiences is looking at the information flow. So this might sound a little bit abstract, so I'll unpack it a little bit. Yeah, go for it.

When I started studying philosophy, I came there with a lot of questions, and I thought that I would find answers to those questions. But I think I ended up with something much more valuable, which was that I started asking questions about the assumptions that I didn't know I had.

Yeah. And this was probably one of the first moments when I realized that by living in certain environments, you learn certain things, and not necessarily always in a conscious way. So then I went to music, and I started playing music first. I started playing music because I was very passionate about it and I just loved it.

And I was, like, so curious. I'd hear some jazz concert, you know, and think, what do they do to get that sound? What is the harmony there? Like, ah, how does he move his fingers so fast on the piano?

Then, at some point, I became a music producer, and I realized that you can take essentially the same composition and frame it completely differently, using the ideas that you have about how humans will react to certain things that you can change, I don't know, in instrumentation, like the arrangement, or how somebody sings, and so on and so on.

And then I started realizing that it's all about information flow and there are many channels of information flow. Yeah. And this goes for culture in general, it works for art, whatever art form that is. And then when I got very interested in statistics and data science and machine learning, I think that was one of the things that made this transition for me much easier because I came there and like, okay, there's a certain problem.

Those people have a question and they have some data. How should we channel this information? So we either answer this question or we learn something about the question itself, you know, or the system, or the data, so we can take another step that can lead us in the direction of solving this problem.

I think this is a very, in a sense, fundamental or first-principles-based perspective. And I believe it's also very important in causality, and I find DAGs to be a device that can help us understand information flow in a very clear manner. Yeah, that's great. I think there's actually, uh, something to mention there as well.

Have you read the book Gödel, Escher, Bach?

No, but I've heard about it. I think at least ten people in my life told me, like, hey, this book is great, you need to read it. Absolutely. So I think there's a strong connection. First of all, I think we will look back at this book and we will find connections to causal research there as well at some point.

It's obviously a long read, but I do recommend it to the listeners as well, because Gödel was a mathematician, Escher was an artist, a painter, and Bach was a musician. And there is an underlying, you know, an underlying unifying concept here for all of them. And it's quite the same, actually, you know, just expressed in different ways, music or mathematics. I mean, Bach, if you look at what he actually composed...

It's very mathematical. Exactly. Yeah. Yeah. Yeah. And so I think it's the same with causal inference. We need to go deeper, you know, isn't it in the movie Inception where it's like, we need to go deeper? You know, we get to go deeper into those levels, and proximal learning, for example, I think is also going to be really interesting in the future.

So I'm really interested to see where we're going to be, you know, in five years, looking back at this episode and seeing what's developed out of all of these very, you know, uh, early initiatives and what people will do with them.

Yeah, definitely. I'm also very curious about this, and, you know, recording a podcast like this gives us the possibility to look back and hear what we thought back then.

Exactly. Yeah. Yeah. 

Is the future causal? 

Is the future causal? I think life's always been causal. I think life was causal yesterday, today, and tomorrow. Unfortunately, some people thought that causality couldn't be talked about for the last hundred years, so let's just ignore that little blip there and move on to accept that the future is causal, because it's always been causal.

And we should just be more explicit about it. Causality isn't going to go away; languages might change. But the fact that there's a cause and an effect is hard to deny, unless you want to go really philosophical. So the future is absolutely causal and will always be.

Some people just starting with causality, or maybe starting with machine learning, might feel a little bit unsure whether they will be able to learn all the tooling that they need in order to make this work. What would be your advice to them?

It entirely depends on your entry point. If you're a maths PhD, you're gonna have an easy time. You're just gonna pick up a bit of stats.

You're gonna read a book, and if you're more practically minded, you know, you can read your book and probably see how to use the available packages. If you're at the very other end and you don't know mathematics, of course it's gonna be hard. I think if you want to get quick results, obviously read Pearl's Causal Primer to get a bit of an understanding, and then just see if you can learn from examples, and emphasize quality over quantity.

You know, if you start without much knowledge and you make it your goal to understand, for example, the instrumental variable model, which is very established and on which there's a lot you can read, that's a good goal to go for, and that's a causal method you can rely on.

And that's something that's going to give you a good stepping stone. Don't go too broad, and frankly, with all the most recent research that's out there and published, you know, sometimes the code isn't good, sometimes the method isn't even properly verified. So do always go deeper and find the ground truths.

And the fact is, causality at its heart isn't actually that hard. It's just that we've not been teaching it right for the last 20, 30 years. So erase your brain of all you've heard about stats, you know, find good resources, and then put quality before quantity; if that takes ten months, then that's worth it.

And so that's what I would say.

This advice is on fire, Causal Bandits, I love it. Jakob, if I asked you to build a causal model of your life, what would be the main things, the resources, internal or external, that you feel helped you in your career?

I think the external one was, obviously, the incredible support from my family to just explore these, you know, things, and always being tolerant of all the questions I asked, and having access to education, living in a time with the internet, and meeting the right people at the right time, which requires you to put yourself out there.

So internally, I guess I'm grateful that I was given, you know, some gifts to understand mathematics a bit more, and gifts of curiosity. And yeah, internally, I also feel like they give me a responsibility to share that as well. I mean, I'm just grateful to be here today to share my experience, with PhD students about research and with practitioners about, you know, how to address this and make sure that you really question your assumptions.

One wrong assumption can create a lot of damage, but, you know, ten good assumptions, really well justified, can go a very long way. And so this is what, I guess, personally, internally drives me: the responsibility to share the learnings I have with other people. Because I guess I expect it from them as well, because I would want to be treated that way.

And that's pretty much it. Yeah. 

What resources would you recommend to people just starting with causality?

The Causal Primer by Judea Pearl. That's a good book, easy to read. And what else have we got? It's quite funny because, of course, once you've done your PhD, you forget where you started. I mean, a practical book like that.

So it depends what kind of learner you are. I think the Causal Primer is good for someone who's a bit more theoretical, abstract. I think a book like yours would be good for someone who just learns by practice. I'm actually also more of a learn-by-practice person, so I would recommend the practical stuff to start with first and get a feeling for it.

It depends where you are, for sure. Yeah, that's pretty much it. I have a webpage, it's a community called causalinference.org. I haven't updated it much, but I actually have a list that says beginner, intermediate and advanced. That's great. So you can go on the webpage and learn some things there as well.

Yeah. You mentioned your webpage. What are other places where people can learn more about you, your team, your research?

Yeah. So there's my personal website, which is jakobzeitler.de. And then there's also a blog on my company webpage. And then we actually have a seminar series as well, on Bayesian optimization and experimentation.

And are they online? They can be online, yeah. So it's on request, so it is online; some people join virtually. And you know, if someone has a real thing they want to contribute, then yeah, we happily talk and chat. And otherwise, on LinkedIn, just like you. You know, I do think that LinkedIn is, at the current stage, doing exceptionally well at connecting people.

And so people can just add me on LinkedIn and shoot me a question, or send me an email.

Jakob, what's your message to the Causal Python community?

The Causal Python community? My message is very simple: think about the cost of your causal assumptions. Number one, the first step is to question your assumptions, and then think about what's the price, what's the dollar tag on this.

Because they don't come for free. Some are cheaper, some are more expensive. And do really consider when experimentation is a good choice. I think that is the most, you know, realistic perspective of causal inference you're gonna get. Because we wanna avoid a world where we're just walking around with purely observational, you know, data sets.

I'm okay with people doing observational studies to inform trials. But I find it really hard to just do an observational study and never even think about how you would run an experiment for that, to actually verify the results. And so, the cost of these assumptions, to slice the polytope, to get to something more justifiable, it's just gonna make your life easier as well.

Focus on that, and you will be just fine. So follow the truth, question your assumptions, and you will get to a good place.

Thank you so much for your time. I loved this conversation, and I hope that the community also loved it!

Thanks for having me. It was a pleasure. Congrats on reaching the end of this episode of the Causal Bandits Podcast.

Stay tuned for the next one. If you liked this episode, click the like button to help others find it. And maybe subscribe to this channel as well. You know.
