Causal Bandits Podcast
Causal Bandits Podcast with Alex Molak is here to help you learn about causality, causal AI and causal machine learning through the genius of others.
The podcast focuses on causality from a number of different perspectives, finding common grounds between academia and industry, philosophy, theory and practice, and between different schools of thought, and traditions.
Your host, Alex Molak is an entrepreneur, independent researcher and a best-selling author, who decided to travel the world to record conversations with the most interesting minds in causality.
Enjoy and stay causal!
Keywords: Causal AI, Causal Machine Learning, Causality, Causal Inference, Causal Discovery, Machine Learning, AI, Artificial Intelligence
Causal Bandits Podcast
Causal Models, Biology, Generative AI & RL || Robert Ness || Causal Bandits Ep. 011 (2024)
Support the show
Video version available on YouTube
Recorded on Nov 12, 2023 in an undisclosed location
From Systems Biology to Causality
Robert always loved statistics.
He went to study systems biology, driven by his desire to model natural systems.
His perspective on causal inference encompasses graphical models, Bayesian inference, reinforcement learning, generative AI and cognitive science.
It allows him to think broadly about the problems we encounter in modern AI research.
Is the reward enough and what's the next big thing in causal (generative) AI?
Let's see!
About The Guest
Robert Osazuwa Ness is a Senior Researcher at Microsoft Research. He explores how to combine causal discovery, causal inference, deep probabilistic modeling, and programming languages in search of new capabilities for AI systems.
Connect with Robert:
- Robert on Twitter/X
- Robert on LinkedIn
- Robert's web page
About The Host
Aleksander (Alex) Molak is an independent machine learning researcher, educator, entrepreneur and a best-selling author in the area of causality.
Connect with Alex:
- Alex on the Internet
Links
Find the links here
Causal Bandits Team
Project Coordinator: Taiba Malik
Video and Audio Editing: Navneet Sharma, Aleksander Molak
#causalai #causalinference #causality
Should we build the Causal Experts Network?
Share your thoughts in the survey
Causal Bandits Podcast
Causal AI || Causal Machine Learning || Causal Inference & Discovery
Web: https://causalbanditspodcast.com
Connect on LinkedIn: https://www.linkedin.com/in/aleksandermolak/
Join Causal Python Weekly: https://causalpython.io
The Causal Book: https://amzn.to/3QhsRz4
011 - CB010 - Robert Ness
Robert Ness: Should I just go ahead and apply this kind of Q-learning code that we've been using for a long time? Oh no, in this scenario it's going to be very error prone, because you don't have identification, or because the thing that you're trying to optimize and the thing that you are actually optimizing are different and are going to lead to different outcomes.
So with respect to AGI, if we try to anchor it from a causal perspective, I think we can agree that
Marcus: Hey, Causal Bandits. Welcome to the Causal Bandits podcast, the best podcast on causality and machine learning on the internet.
Jessie: Today we're traveling to an undisclosed location to meet our guest. He studied economics, but moved to statistics.
He finds inspiration in modeling natural phenomena. That's why he preferred to study systems biology rather than financial markets during his PhD. He loves hiking, learning languages, and teaching others. He's a Senior Researcher at Microsoft Research. Ladies and gentlemen, let me pass it to your host, Alex Molak.
Alex: Ladies and gentlemen, please welcome Robert Ness.
Robert Ness: Nice to meet you. Thanks for having me on the podcast.
Alex: Hi, Robert. How are you today?
Robert Ness: Good. I'm feeling pretty good today. Yeah.
Alex: You started your educational journey with economics, and then you gradually moved towards things related to computation and then causal inference.
What was the unifying aspect of all those choices?
Robert Ness: Ah, so what connects all of that? I knew fairly early on that I wanted to do something quantitative, but also connected to being out there in the real world and solving applied problems. And I knew it would be related to statistics, because I learned fairly early on that I had perhaps a little bit of a superpower when it came to statistics, of all the formal, mathematical, or quantitative topics I had approached.
This one just seemed to come to me very naturally, in a way that I couldn't really understand, given the amount of effort I had to put into other things.
Alex: What did you find attractive in statistics? Was there anything else beyond the fact that it was just easy for you?
Robert Ness: Yeah, that was it.
It was just easy. It's funny, because probability theory was actually not that easy for me. I eventually got good at it, but I wouldn't claim I had some kind of natural gift for it. But every time I would study and then take a stats-related exam, in undergrad and then in graduate school, things just clicked really easily.
But I wouldn't say it was statistics itself that attracted me. It was, I think, the process of modeling data, doing data science, building models. That's the part that I enjoyed, but the theoretical ideas always just came naturally. So I think I just leaned into that.
Alex: How did your experience in systems biology, and looking at problems through the lens of biology, influence the way you think about causality today?
Robert Ness: I got into systems biology during my PhD because I wanted to work with a particular professor. I wanted her as my advisor, and I liked what she was doing.
She was doing mostly computational and statistical proteomics, and I liked that it was in the natural sciences. At that point I had been thinking for a while about working in financial engineering, but I felt the financial markets lacked a certain anchor in reality: you would build this model, but what is it a model of?
Whereas in the natural sciences there's some kind of ground truth in reality that you're modeling. If you're building a model of a physical, chemical, or biochemical system, your model is obviously going to be an approximation of the truth, but the truth exists.
That's what was attracting me to computational biology. I wanted to work with complex, dynamic systems and build models that would simulate, say, the workings of a cell. And that was close enough to my advisor's area of expertise that she felt she could support it, even though it was a new direction for our group.
Through working in computational biology and systems biology, I got introduced to the task of using structure learning, or causal discovery algorithms as we might say today, to try to reconstruct biological pathways; in my case, signal transduction pathways from single-cell data. With single-cell data you get enough degrees of freedom to apply these algorithms and learn something that approximated the biology.
And so that was my first foray into causality.
Alex: What were the main challenges for you at that stage, when you tried to apply structure learning algorithms to this biological data?
Robert Ness: It was building something that was useful in practice to people. Causal discovery, generally speaking, is the task of taking data, which could be observational or experimental, and trying to learn the causal properties of the system. Typically this means learning the structure of a causal DAG, a directed acyclic graph. At the time, we were seeing an explosion in measurement technology for molecular biology, and it was interesting to try to take some of these biological data sets.
These were technologies that could measure a lot of variables at high throughput, with decent speed and cost, so you could get relatively large and rich data sets from measuring these systems. You could take this data, throw it at some structure learning algorithm, some causal discovery algorithm, and you'd get a graph.
And then what? What do you do with it? It became clear to me very early on that I needed to understand not just the biology, but what the biologists and laboratory analysts were doing, what they were trying to solve as a result of this analysis. Typically they don't just want a DAG.
They don't just learn a DAG, publish it, and say, this is the DAG, I'm done. They're usually trying to do something with it: say, discover new biomarkers, do something with respect to drug discovery, or understand whether there's some signaling pathway they didn't know about.
Through that need, that pain point people in the laboratory were facing, I started nudging causal discovery methods for systems biology toward experimental design. Specifically, I was taking Bayesian experimental design methods, where there's some outcome that you want and the DAG is just an artifact on the way to that final outcome, and you're dealing with uncertainty and with experimental design problems: what should I measure, how much should I measure? The goal was an entire workflow, perhaps a sequential workflow, that somebody in a laboratory setting could use, ideally even automate, to reach whatever their final goal was in molecular biology research.
Alex: When you were able to discover a structure underlying certain processes, what was your way, back then, of evaluating whether this was the correct structure or not?
Robert Ness: Yeah. In the papers that came out initially in this space, you would apply the algorithm to the data, get a graph, and in these publications you would know the ground truth graph. Then you would use some way of measuring distance between your learned graph and the ground truth graph: maybe some kind of precision-recall style metric, or statistics like the structural Hamming distance, things like that.
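As a concrete illustration of the kind of metrics Robert mentions (this is not code from the episode; the two adjacency matrices below are invented), a minimal sketch in Python might compare a learned DAG to a known ground truth like this:

```python
import numpy as np

# Hypothetical adjacency matrices for a 4-node ground-truth DAG and a learned DAG.
# Entry [i, j] == 1 means an edge i -> j; both graphs are made up for illustration.
true_adj = np.array([[0, 1, 1, 0],
                     [0, 0, 1, 0],
                     [0, 0, 0, 1],
                     [0, 0, 0, 0]])
learned_adj = np.array([[0, 1, 0, 0],
                        [0, 0, 1, 1],
                        [0, 0, 0, 1],
                        [0, 0, 0, 0]])

# Precision/recall over directed edges: each possible edge is a binary prediction.
tp = np.sum((learned_adj == 1) & (true_adj == 1))
fp = np.sum((learned_adj == 1) & (true_adj == 0))
fn = np.sum((learned_adj == 0) & (true_adj == 1))
precision, recall = tp / (tp + fp), tp / (tp + fn)

# Structural Hamming distance: edge additions, deletions, and reversals needed
# to turn the learned graph into the true one (a reversal counted as one change).
diff = np.abs(true_adj - learned_adj)
reversals = np.sum((diff == 1) & (diff.T == 1)) // 2
shd = np.sum(diff) - reversals

print(f"precision={precision:.2f}, recall={recall:.2f}, SHD={shd}")
```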
Of course, that's only useful if you have the ground truth model. There's no way of evaluating how you're doing when you don't; you just have this graph on the other end. So I turned to Bayesian-style reasoning to say, number one, we want to be able to deal with uncertainty.
Number two, typically the causal structure of the system you're modeling is not entirely unknown to you. You have some prior knowledge about the size or shape of the system, or about what's not allowed in the system; maybe this protein will never interact with that protein, for example.
So the idea was to give people a way of incorporating that prior knowledge, modeling that uncertainty, and getting to a final answer that incorporated all of it in a rigorous way, such that you could show through mathematical reasoning that, given more data and some criteria for evaluating whether you've answered the biological question you're interested in, you have guarantees that you're at least moving in the direction of the right answer, even if you didn't know exactly what the ground truth answer was.
Alex: You mentioned Bayesian experimentation, which is related to the broader topic of optimal experimental design. I was wondering, what are your thoughts about the entire field that we could call causal decision making, or causal decision theory?
Robert Ness: Yeah, there's interesting history there.
If you look up causal decision theory online, you'll see it contrasted with what I believe people call evidential decision theory. It's presented as the idea that, as a decision-making agent, you should attend to the causal consequences of your actions, as opposed to the more traditional view of maximizing expected utility conditional on your action. And there are some interesting examples where the causal ideas can lead to suboptimal results, like something called Newcomb's problem. But these days, when we have a much more mature understanding of causal models and how to incorporate them into an end-to-end analysis, we see this coming up in many places where we're doing automated decision making.
This show is called Causal Bandits. Elias Bareinboim, for example, has a paper on causal bandits, where you have some kind of adversarial relationship with a confounder in the environment and you're trying to estimate a causal effect, perhaps in a setting where it's not identified, maybe you have partial identification, and you're doing what I think he calls causal Thompson sampling in that scenario. So that's the question of: can we apply causal knowledge to optimizing a system where (a) there are some unknown causal factors, and (b) those causal factors would lead us to the wrong decision if we used traditional bandit methods?
Another area that has gotten a lot of traction in so-called causal decision theory, or causal sequential decision making, is causal reinforcement learning. Typically this shows up as: how can we use causal assumptions to get more sample efficiency in our learning? I think that's very important, particularly when you're trying to learn a model in a very high-dimensional setting, where sample efficiency can make something intractable tractable. And also in the area of credit assignment, when you're trying to understand how or why a policy led to a certain outcome.
You can pose that as a causal question, maybe using attribution methods, root cause analysis, or actual-causality style methods from causal inference theory.
Personally, one of the things that interests me about this field is: can we look at the ways humans make decisions, reasoning about cause and effect and in other ways, and algorithmize them, so that we're automating a lot of that decision making while also rooting it in the innate statistical and probabilistic inference capabilities of our current machine learning algorithms?
We know those are very good: if you give them enough data, they can learn the statistical patterns in that data. But a lot of the time, humans who are reasoning are actually not conditioning on data they observed; they're imagining hypothetical scenarios and then conditioning on outcomes in those hypothetical worlds, what we in causal inference would call potential outcomes.
This is something we do that is very sample efficient. You don't need to have experienced something in order to act on the lessons from that experience. You can read about it or imagine it. You can observe it in somebody else.
For example: okay, this is what I'm going to do when I face that situation. I think causal models give us a semantics, a language, for building those kinds of algorithms. And much like the causal bandit example, there's potential here to go beyond just sample efficiency and actually open up capabilities we didn't have before, in specific scenarios where these types of capabilities matter.
Alex: I noticed that many people in the community are interested in causal reinforcement learning, but we actually haven't covered it in depth in any of our episodes so far. Would you like to share, with those people in our audience who are not familiar with the problem setting, why reinforcement learning might not be causal,
even though it involves actions, right? And what are the advantages of causal reinforcement learning? What problems can it solve?
Robert Ness: Yeah. I think a lot of it would align with things that I've just said, because to me, causal decision theory and causal reinforcement learning are very much related, right?
I think reinforcement learning is essentially asking: how can we find policies for optimal decision making that's automated and done in sequence? And of course, reinforcement learning itself is, I think, a bit of an overloaded term; it means more than the specific task of learning with reinforcement, and it encapsulates everything that's in some sense related. For somebody who's trying to understand what contributions causality could make to reinforcement learning, who has some intuition about reinforcement learning and less about causality: the low-hanging fruit is probably sample efficiency. In causality we think a lot about the structure of the data generating process, and those assumptions allow us to make inferences that we could not make if we were just modeling statistical patterns in the data. Traditional deep reinforcement learning, maybe specifically model-free reinforcement learning, tries to model everything as a Markov decision process; you can think of this as a causal DAG of direct relationships between state, action, and reward over time. A lot of that ignores the causal nuances of the system and just folds everything into the state: there's some state variable that captures everything about the world, and we're not interested in separating those variables out into causally related nodes in a directed graph.
And again, most of the time, since your goal is to find the action, the set of actions, or the action-generating policy that maximizes reward, that's usually good enough. But there are some cases where the actions that maximize expected reward and the actions that maximize expected reward once you consider how the actions change the environment can be different. You can use causality to understand when and how those differences happen, but oftentimes it's easier just to go collect more data, fold everything into the state, and get data that covers as much of that state space as possible.
So causality is not really going to play that much of a helpful role unless you're trying to select actions that maximize outcomes under circumstances that are not seen in the training data. And even then, it matters when the actions that appear to maximize the outcome are different from the actions that maximize the outcome once you account for all of the causal nuances of the system.
In a lot of the problems we see in textbooks or in class, those nuances are small; it doesn't seem to matter. But to get reinforcement learning to a practical place, it probably is going to matter, particularly in those settings where you just don't have the ability to generate training data for all the scenarios where you want to apply the model.
Alex: Especially since the state variable might not contain all the information required for causal identifiability, right? We can put in more information than is required, and this can bias our results, through collider bias, for example, or we can exclude information, confounding information, from the state variable.
And then we'll basically make the best decisions in purely associative terms, but not in causal terms.
Robert Ness: Yeah, there are definitely cases, particularly in causal effect inference, where conditioning on some variable can hurt you, because it biases the answer; colliders, for example, or mediators.
And not much of that goes into the considerations of the traditional reinforcement learning approach. So I think there could be some wins there.
Alex: So reward is not enough?
Robert Ness: It probably is enough for a lot of things, to be honest. I hear this a lot from causal inference people: you can find some case where, hey, if you don't consider the causal nuance here, you're going to run into problems. But it might still work.
It still might work very well in most practical problems. So I think if you're working in causal inference research or applied causality, you need to think not just about the toy problems that prove that what other people are doing won't work on your toy problem; you need to find practical scenarios that have high value with respect to somebody's goals, research goals, business goals, engineering goals, where you can say: hey, you definitely need a causal model in this case.
Otherwise it will not work. And that takes a lot of domain knowledge. Tracing that threshold from toy model to real-world scenario where this thing is necessary is a lot of work, and a lot of the time we're not incentivized to do it, right?
You can publish a paper on that toy example; you've proven that some generalist algorithm is not going to work in this scenario. But unless you take it to people's problems, you're not going to get much of an audience for that work.
Alex: Truth is complex; all we can hope for are approximations. I think that was John von Neumann.
Where is the boundary of those approximations? Where are they good enough in practical terms?
Robert Ness: Could you rephrase the question? Let me make sure I understand, because it's a broad question.
Alex: It's a broad question. We're talking about decision making, and you mentioned that in certain cases it might not really be necessary to use causal models, because the decision making for a given use case, be it a business case or a scientific case, might not justify building a more complex model.
Robert Ness: Okay, let me give you an example; I think it's an easy one to jump into. A lot of people who think about causal inference are mostly thinking about causal effect inference: average treatment effects, individual treatment effects, conditional average treatment effects.
If the novices in the audience will forgive me, this is the Causal Bandits podcast, so I'll mention some causal terms that maybe some people will recognize. When I say a causal effect, you're interested in the expectation of Y given do(X = treatment) minus the expectation of Y given do(X = control), right?
Instead of treatment and control, we can say action one or action two. So essentially what you need to do is learn the probability distribution of Y given do(X), or maybe you don't even need all of that; you just learn the expectation of Y given do(X = action).
It's very easy to come up with an example where E[Y | do(X = action)] and E[Y | X = action] are different. There will be fewer cases where the value of X that maximizes E[Y | do(X)] and the value of X that maximizes E[Y | X] are different.
So oftentimes, while these two quantities will be different, the argmax of both queries will be the same. I don't have any kind of measure-theoretic quantification of how often this happens, but you can imagine that it's easy to show the two queries are different while the argument that maximizes each query is often the same.
In other words, maybe E[Y | do(X = 1)] is a hundred while E[Y | X = 1] is eighty, and yet the value of X that maximizes both queries is the same; that's often the case.
In that case, when you're maximizing E[Y | X], you're doing an approximation of the actual causal thing you're trying to do, and a lot of the time that will get you where you need to go. So to motivate actually modeling the do(X), you need to zero in on instances where the argument that maximizes one query and the argument that maximizes the other are different, and are often different.
And you need to do that in a practical scenario. It's easy for me to draw some kind of toy model where this is true, but to do it, say, with robotics, or with learning how to fold a protein, that's going to be a lot more work.
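To make the contrast concrete, here is a small simulation sketch in Python (not from the episode; the data generating process and all numbers are invented). It shows a confounded toy system in which the observational and interventional queries give different values, yet the action that maximizes both happens to be the same, which is the situation Robert describes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Toy data generating process with a hidden confounder u (all numbers made up):
# u influences both which action is taken and the outcome.
u = rng.binomial(1, 0.5, size=n)
x = rng.binomial(1, np.where(u == 1, 0.9, 0.1))   # action tends to follow u
y = 2.0 * x + 5.0 * u + rng.normal(size=n)        # true effect of x on y is +2

# Observational query E[Y | X = a]: contaminated by the confounder.
e_y_given_x = [y[x == a].mean() for a in (0, 1)]

# Interventional query E[Y | do(X = a)]: set x for everyone, rerun the mechanism for y.
e_y_do_x = [(2.0 * a + 5.0 * u + rng.normal(size=n)).mean() for a in (0, 1)]

print("E[Y | X=a]    :", np.round(e_y_given_x, 2))   # roughly [0.5, 6.5]
print("E[Y | do(X=a)]:", np.round(e_y_do_x, 2))      # roughly [2.5, 4.5]
# The two queries give different numbers (the observational contrast overstates
# the effect), yet the action that maximizes both is the same here: X = 1.
```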
Alex: I imagine that knowing in which cases this argmax will be the same for the interventional and the observational query might not always be straightforward, right?
Robert Ness: I think that's part of the goal of the causal analysis, right? To be able to explain to people clearly: hey, this is when you can expect this. Number one, what you're doing is an approximation to the right way of doing it.
And number two, even though your way of doing it will actually work in many practical scenarios, here are some practical scenarios, important for reasons A, B, and C, where it will not work. Now you've given somebody a clear guideline. Like: oh, if I want to deploy here, I have a learning problem,
a reinforcement learning problem. Should I just go ahead and apply this kind of Q-learning code that we've been using for a long time? Oh no, in this scenario it's going to be very error prone, because you don't have identification, or because the thing that you're trying to optimize and the thing that you are actually optimizing are different and are going to lead to different outcomes.
So being able to provide people with that type of roadmap, I think, is very important.
Alex: You mentioned before, in the context of decision making or decision theory, that you're interested in looking into ways we could make algorithms make decisions in a human-like way.
We know that humans might be good at decision making, good at causality, in certain circumstances, and in other circumstances we are just not very good at any of those tasks. And I want to take this into a broader context, because with the rise of generative models,
we started talking more about AGI and so on. Some people said: hey, this is AGI already, we already have it. Maybe they changed their mind later, maybe not, but AGI seems to be a vague concept. When I started interacting with ChatGPT after it was released, I had the impression that it gives a very strong
indication of its own intelligence, in a sense, but not necessarily one that comes from being very good at numerical or estimation tasks. We know that with factual and counterfactual information it can be reliable sometimes, but sometimes the failure modes
can be unexpected and quite severe, in the sense that the information might be arbitrarily incorrect. Still, this human-like quality that we find in the generated text, I think, makes it very appealing to many of us. So what would an AGI system be? Would it be a system that is as optimal as possible in its estimations?
Or would it be a system that is simply human-like? Or maybe these are not mutually exclusive options. What are your thoughts?
Robert Ness: With respect to AGI, if we try to anchor it from a causal perspective, I think we can agree that an AGI model, an artificial general intelligence, would be able to reason about cause and effect. One way I think about that personally: I've looked at some of the work that comes from the computational psychology community, where the research tends to go like this. You
observe some ability or tendency that humans have in terms of reasoning, and you model it, ideally with a computational model. Then you maybe design some kind of task where you recruit a bunch of undergrads and ask them to read something and answer some questions.
You look at the distribution of their answers, then you apply your algorithm to those vignettes, those examples, those questions, and look at its answers. Hopefully the distributions align, or hopefully you can find some other way of establishing that your model is a good model of how humans reason about a certain process.
I think that's interesting in the sense that, from a causal perspective, we need causal models, we need causal inference theory, to specify a set of assumptions and inductive biases that, in addition to observational or experimental data, allow us to reach conclusions that we could not reach without those assumptions.
So I think there's a connection here, in the sense that this computational psychology model is a body of assumptions about how to do a certain kind of reasoning, and that's a lot of what we're doing in causality. A lot of that comes together, particularly when it comes to causal reasoning tasks.
And I think that is a step towards AGI, in the sense that we're thinking about how to build models that have a certain type of reasoning capability we observe in humans. Some of that is causal, some of it is not, but in this respect we're looking at the causal abilities.
I think that's an interesting thing to work on, and it's actually a little bit different from how traditional causal inference research works. Traditional causal inference research is typically about answering questions that are objective, from an epistemological point of view, about
a cause-and-effect relationship that's external to somebody's head, that exists in the world. Does this drug treat this illness better than placebo? Does this vaccine prevent catching this disease? Does smoking cause cancer? And does it do so on average across some population?
Those questions we can be very objective about. But if we're trying to figure out how to emulate a college student reasoning about some information they see in a vignette, in some kind of controlled environment, then it's less about whether that human is making an objectively correct judgment, and more about whether your algorithm can align with what that human is doing.
Oftentimes these two goals of objective truth versus alignment overlap. If I'm trying to model how a human draws a conclusion from a scientific experiment, for example, that's a lot like what we're already doing with causal effect estimation.
But maybe I'm trying to model how a human, to use the earlier example, observes evidence, imagines various futures, and then makes a decision based on some type of utility function that's very specific to that human. Maybe that utility function is polluted by some cognitive biases or some inefficient heuristics, right?
Or inaccurate heuristics, rather. Those are things that maybe I want to capture in my model as well. And then maybe later on, when I deploy this model, I say: okay, now I want to make sure I don't have any of these biases or logical fallacies in there, and make sure it's anchored by the data.
So I think that is an interesting path towards AGI. Let me pause there, because I think there were some other things you asked that I didn't address, and we can circle back to those. But that's how I think about the problem, particularly in terms of this alignment versus objectivity question.
Alex: It sounds like having a general model of human thinking, a general causal model that would allow us to turn those biases on and off, would be something extremely flexible in this case.
Robert Ness: Yeah. One example I was thinking of: there's an interesting result from CogSci about how humans definitely engage in counterfactual simulation, in terms of what we imagine when we're making decisions, and about how sometimes it doesn't quite happen.
For example, imagine that I say: I broke the machine, and now it no longer works. A simple label flip there would be to say: I failed to maintain the machine, and now it no longer works. From a numerical encoding point of view, that's more or less the same thing.
Maybe one is one and the other is zero; you can just switch the label, it's the same. But it turns out that humans use counterfactual simulation less in the absence of actions, in the absence of events. So if I prompt somebody with that kind of vignette, a story about how somebody does something and it causes something to break, versus failing to do something
and thereby causing something to break in the second case, oftentimes they won't engage in as much counterfactual simulation as in the first case, on average across groups. And that makes sense from a heuristic standpoint, in the sense that it's a lot easier to enumerate actions that happened
than actions or events that did not happen. Trying to think of all the events that didn't happen, and how the absence of those events could have caused something, is a lot harder than enumerating the things that did happen. Ask anybody who works in monitoring for a server or an IT system:
it's very difficult to detect when something doesn't happen, as opposed to when something does happen. So that makes sense from a cognitive standpoint; we're simply being economical about what types of events we mentally simulate. But you can imagine
finding ways to address that problem algorithmically. Maybe you want to scope out the negative events in a certain way that makes it much easier to reason algorithmically about the counterfactuals of those missing events, those absences. I think that's one example where you can model a heuristic that
is useful from a heuristic standpoint but maybe in some sense isn't efficient or isn't sound, and then use statistical and algorithmic reasoning to remedy that in some appropriate way for a given problem.
Alex: Not long ago, you published a very interesting paper with Sara Mohammad-Taheri, Karen Sachs, and other co-authors.
And although this paper was, again, set in the context of biology, you showed some interesting identification results there. Can you share a little bit more with the audience about this paper, and what, in your opinion, were its most interesting results?
Robert Ness: The idea was simple.
We were using a latent variable modeling approach of the kind you often see in probabilistic machine learning, or in probabilistic programming frameworks like PyMC, Stan, or Pyro, and showing the following. If you come from a causal graphical modeling standpoint, your intuition is to say: I want to model an intervention on this model by removing the edges into the target of the intervention and then sampling from that model.
We know we can do that if we observe all the variables. But if we don't observe all the variables, then all bets are off; we might just be sampling junk. This is a simple result, using a Bayesian proof, showing that if your model has a DAG, and the interventional distribution you're sampling from is identified given the rules of the do-calculus or some other graphical
identification framework, then your sampling procedure is valid, because it's just an estimator for an estimand that the do-calculus, or whatever graphical identification procedure you use, proves exists. The goal here was this: a lot of people see tools like PyMC or Pyro, or have some background in latent variable models, say from topic models, and they look at them and say, well, you can write this as a DAG.
Can I use this for causal inference? And the answer is either nobody knows or we're not sure, because you have to do a whole bunch of causal identification mathematics to make sure it works. So this was just showing that, yes, you can do this.
If you have identification using, say, the do-calculus, it's going to work. We showed this by sampling from a predictive distribution of the target interventional distribution, and then showing that as you increase the training data, the distribution converges to the ground truth interventional distribution.
That's a very Bayesian way of thinking about this identification problem. But I wanted to make sure with this paper that people who are used to thinking in terms of latent variable models that you can represent with a graph know that the world of causal reasoning with these models is open to them, as long as they can prove that what they're doing is valid using graphical identification. Graphical identification, say with the rules of the do-calculus, might be opaque to a lot of people, but the good thing is that it's been turned into algorithms, right?
You can use libraries. In Python there's a library called y0, spelled with the letter Y and the number zero. You can plug in a DAG and the set of variables you observe in your data, and it'll tell you, true or false, whether you can actually identify some query of interest given your data.
So that, combined with whatever your favorite latent variable modeling approach is, means you can use these algorithms to prove that your latent variable model can also infer certain causal distributions and answer causal queries.
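The exact y0 calls aren't quoted in the conversation, so here is a minimal sketch of the same kind of programmatic identification check using DoWhy, which comes up later in the episode. The three-variable DAG and the toy data are invented for illustration; the point is simply that a library can answer "is this query identifiable from these observed variables?" for you.

```python
import numpy as np
import pandas as pd
from dowhy import CausalModel

# Toy data: Z is an observed common cause of treatment X and outcome Y.
rng = np.random.default_rng(0)
n = 1000
z = rng.normal(size=n)
x = (z + rng.normal(size=n) > 0).astype(int)
y = 2.0 * x + 1.5 * z + rng.normal(size=n)
df = pd.DataFrame({"Z": z, "X": x, "Y": y})

# The assumed causal DAG as a GML string: Z -> X, Z -> Y, X -> Y.
gml = """graph[directed 1
  node[id "Z" label "Z"] node[id "X" label "X"] node[id "Y" label "Y"]
  edge[source "Z" target "X"] edge[source "Z" target "Y"] edge[source "X" target "Y"]]"""

model = CausalModel(data=df, treatment="X", outcome="Y", graph=gml)

# Ask whether E[Y | do(X)] is identifiable from the observed variables.
# Here it is, via the backdoor adjustment set {Z}; if Z were unobserved,
# no backdoor set would exist and identification would fail.
estimand = model.identify_effect(proceed_when_unidentifiable=False)
print(estimand)
```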
Alex: So, as I understand it, the main result of this paper is that if we have a quantity that is identifiable using the do-calculus, so that we can produce an estimand that correctly identifies the causal effect in a given system,
then if we implement the system in a probabilistic programming framework, we're also guaranteed to get a good, unbiased estimate of that causal effect.
Robert Ness: Yeah. For somebody coming from causal inference, that's not a big deal. It's like: okay, great.
You show that you have identification and you just construct a new estimator, maybe using some Monte Carlo approach. But if you're somebody who's used to working with these types of systems, where it's a very model-based approach, you're thinking:
what are the variables in my system? How do they relate to each other? I'm going to implement this as a program. I'm going to put distributions on all the variables in my model, and if there are parameters, I'm going to put priors on all of those parameters. Then I'm going to use some sampling-based procedure or some approximate inference procedure like HMC or variational inference, and I'm going to sample from a posterior distribution over the variables I'm targeting.
Then maybe I'll take some of those samples and apply them to some downstream function that estimates some query that I want. That's the kind of Bayesian or probabilistic reasoning workflow. You're using probability to represent your uncertainty about the system,
and you're being very explicit about your assumptions about the data generating process. I think people used to call this model-based machine learning, although I don't think that term is used very much anymore. I know you had Thomas Wiecki on in the past; for people who use PyMC, this is the way they're used to thinking about a problem.
And the idea is that you can then apply an intervention to these models. PyMC now has a do function that takes a model and modifies it so that it reflects an intervention on that system. So you start sampling from it, and then you ask: does this work?
As anybody who's worked with an MCMC-based system will tell you, just because you're getting samples out of it doesn't mean it has actually converged to the target distribution.
You need to evaluate the traces, look at the posterior predictive distribution, all of these things to make sure your model is doing the right thing. And once you start doing causal things to your model, all those traditional Bayesian checks aren't designed to deal with this kind of causal problem.
But now you can say: listen, here's an algorithm you can import in Python, or there's a library in R if you're an R user, I think it's called RCID. You put in the thing you're trying to estimate and some details about your model, and it'll tell you,
yes or no, whether what you're trying to do is valid. So it wasn't, I think, earth-shattering for people who already see the kind of model you build in PyMC or Pyro as just another estimator. But for people who are used to working in those frameworks, particularly in leveraging their ability to work in
high-dimensional settings, using the broadcasting semantics of PyTorch and so on, being able to model images, sound, and video with these deep probabilistic programming languages, it's something of a revelation that the latent variable models they've been training for a while can now be used to answer causal queries.
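The workflow Robert describes can be sketched roughly as follows. This is a minimal illustration, not code from the episode; it assumes a recent PyMC 5.x that exposes the pm.observe and pm.do model transformations, and the toy confounded system and all coefficients are invented.

```python
import numpy as np
import pymc as pm

# Toy confounded system: Z -> X, Z -> Y, X -> Y (coefficients made up).
rng = np.random.default_rng(0)
N = 500
z_obs = rng.normal(size=N)
x_obs = 0.8 * z_obs + rng.normal(size=N)
y_obs = 1.5 * x_obs + 2.0 * z_obs + rng.normal(size=N)

with pm.Model() as generative:
    b_zx = pm.Normal("b_zx")
    b_xy = pm.Normal("b_xy")
    b_zy = pm.Normal("b_zy")
    Z = pm.Normal("Z", 0.0, 1.0, shape=N)
    X = pm.Normal("X", b_zx * Z, 1.0, shape=N)
    Y = pm.Normal("Y", b_xy * X + b_zy * Z, 1.0, shape=N)

# Condition the generative model on the observed data and fit the parameters.
observed = pm.observe(generative, {"Z": z_obs, "X": x_obs, "Y": y_obs})
with observed:
    idata = pm.sample(1000, tune=1000, chains=2, random_seed=0)

# Graph surgery: pm.do replaces X with a constant, severing the Z -> X edge,
# so sampling Y now approximates p(Y | do(X = 1)) rather than p(Y | X = 1).
intervened = pm.do(generative, {"X": np.ones(N)})
with intervened:
    y_do = pm.sample_posterior_predictive(idata, var_names=["Y"], random_seed=0)

# The mean of the sampled Y should be close to b_xy * 1 (about 1.5 here), whereas
# conditioning on X = 1 observationally would also pick up the confounder's
# contribution through Z.
```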
Alex: Talking about probabilistic programming: a couple of weeks ago a new library was released called ChiRho, depending on how you decide to read the Greek characters, and this library builds on top of Pyro. You had the opportunity to work a little bit with this library, and it helps to abstract certain causal graphical models.
What were your impressions?
Robert Ness: First off, it's a stellar group of people who are behind this library, and they've been thinking about these types of problems for a while. When I teach about, say, doing parallel-worlds counterfactual reasoning using something like a structural causal model:
these are models where you have exogenous variables, root nodes that are sampled from a distribution, but everything downstream of them, the variables that actually represent the things you're interested in reasoning about, is set deterministically given those exogenous variables.
Now, for those who are familiar with probabilistic machine learning: whenever you want to take something that's set deterministically and condition it on evidence, you have an intractable likelihood, and you have to think about how to solve that.
There are ways to solve it, but now you're becoming an inference engineer instead of a data scientist or a causal inference analyst. ChiRho is going in the direction of taking a lot of those more difficult abstractions of causal inference, difficult in the sense that they don't quite mesh well with existing abstractions for deep probabilistic machine learning, or probabilistic programming, or Bayesian inference, or computational Bayes, whatever you want to call it,
and abstracting that away, so you don't have to think too hard about the inference side of the problem. They also have a very interesting philosophy. If you go into the tutorials, they make some statements that I generally agree with; they draw parallels between being uncertain about the causal structure of your model and being uncertain in a Bayesian sense, where you're thinking about
modeling uncertainty with probability, and vice versa, where uncertainty about your system can be addressed by making causal assumptions about the system. For people who understand, say, a probabilistic machine learning or Bayesian probabilistic programming way of thinking about problems, and who have only seen
causality as something that seems vaguely related, there are graphs, there's inference, and they're wondering where exactly these things connect, I think ChiRho will be a wonderful library to sink their teeth into. It's still early days.
They've just released it, and they have some very useful tutorials up now, but you can also tell that it's in a good place for continued development.
Alex: Hmm. What do you think will be the next big thing in causality?
Robert Ness: I'm pretty rubbish at predictions.
I could tell you things that have caught my interest and that I think could be a big deal if somebody solves them. One, I think, is probably the biggest deal; if somebody really nailed this problem and put out a few papers that truly sorted it out, they'd put their name in the history books.
That would be causal representation learning. This is related, again, to some of those probabilistic machine learning problems like disentanglement: trying to learn latent representations that correspond to concepts in the domain that you're modeling.
What causality really adds here is to say: if we're learning some latent representation that corresponds to an actual cause in our data generating process, then we can use our knowledge about causality to specify some criteria for how this cause ought to behave.
If we intervene on it, certain things ought to be modular, invariant, and so on. Then you can figure out: what kinds of practical assumptions can I apply to a modeling problem such that we have some kind of guarantee that we would learn these types of abstractions?
I think that would have a huge impact for anybody who managed to do it well, in a way that's both theoretically sound and practically useful. To give an example: generative AI. If you've ever tried using something like Midjourney or Stable Diffusion,
you generate an image, but maybe you want the head to be tilted. You generate some figure and you want the head tilted 15 degrees to the left, or you want the glasses to be blue instead of red, or you want five fingers instead of octopus tentacles. You imagine that
you should be able to go to the prompt and just say: I want this image you have here, except with red glasses instead of blue glasses. And it should just work; everything should be the same, save for the glasses. This is a counterfactual question: given that I observed this image with blue glasses, what would this image look like
if there were red glasses instead? Nothing else should change except for the glasses. Or whatever my hypothetical condition is, nothing else in this image should change except for things that, based on my causal model of what's represented in the image, would be causally downstream of that hypothetical change.
So if I say I want him to be wearing a pirate hat instead of a baseball hat, his ears should not move around, and you shouldn't get all kinds of weird artifacts in the background. But that's what happens now when you use these models, and it's frustrating, right?
Because maybe you're trying to get this perfect image and you just keep jumping around the distribution of things that are close to what you want, but not quite. Now if you could, for example, operate semantically on those learned causal abstractions, the causal representations behind the image, then in theory that should be a lot easier to do.
Easier said than done, of course. But you can imagine, just in that one case of image modeling, for all these AI startups out there trying to turn generative AI for images or video into a business, the ability to give people knobs they can turn to make adjustments to the image.
Think about natural language models like ChatGPT, Claude, and Bard: when one generates something and it's not quite right, you can edit it, because it's text. With images you can do things like infilling pixels, but you want to actually reason about the image on a semantic level, not just in terms of the form of the generated artifact.
That's one example where more concrete causal abstractions matter, because when we discuss what is represented in the image, we're thinking of the image as representing some scene, for example, and as we think about that scene, we're bringing our causal representation of the objects in that scene to bear on our interpretation of it.
If we can work directly in that language, that would be a very powerful improvement to this class of models. So that's just one example of where better causal representations could, I think, have a huge impact in AI, in reinforcement learning as well, but especially in generative AI.
Alex: When we talk about generative models and causality, it brings to mind the idea of trying to make large language models causal, and you have some experience with trying to do this. Can you tell us a little bit more about your project, where you tried to constrain the models in a causal way, and what the results were?
Robert Ness: I think this is an interesting space. There are a lot of folks now thinking about whether large language models learn a world model, and essentially what they mean, at least in my view, is a causal model; in other words, that the large language model is learning some set of rules about the world that it can operate on when generating responses.
In terms of parsing that problem, I found causal inference theory to be quite useful. We mentioned identification, for example using the do-calculus to prove that your generative model can sample from an interventional distribution.
We can extend that to this case as well. In the theory of identification, there's a result called the causal hierarchy theorem, about the causal hierarchy, also called Pearl's hierarchy or Pearl's causal ladder.
People who have read The Book of Why might remember it. Level one is associational, observational statistics. Level two is interventions. Level three is counterfactuals, which is essentially reasoning across worlds: this happened in this world; had things been different, imagine another world where things had been different, how might things have played out?
And we know, based on this theorem, that in order to answer those questions you need assumptions at the same level as the question. So if I want to estimate a causal effect, I need level-two assumptions, say, for example, in the form of a causal DAG.
If I want to answer that kind of multi-world counterfactual, I need level-three assumptions, say, in the form of a structural causal model, though they could take other forms as well. So when we ask ourselves whether deep neural network models, like large language models or other transformer-architecture models trained on a large amount of data,
Whether or not they can solve these questions, well, one thing we can do is just ask them empirically. We can pose a bunch of causal questions to a large language model and see if it can answer them, and in truth, it can. If you ask it, "Does smoking cause cancer?", it'll say yes. If you ask it to produce a causal analysis for you in Python code using a library like DoWhy, it'll do it. And if you ask it a counterfactual simulation question, "this happened and that happened; what would have happened if things had been different?", it'll simulate it for you.
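As an illustration of the kind of DoWhy analysis being described, here is a minimal, self-contained sketch; the data, the graph, and the choice of estimator are assumptions made up for this example, not anything from the episode:

```python
import numpy as np
import pandas as pd
from dowhy import CausalModel

# Hypothetical observational data: smoking, an age confounder, and cancer.
rng = np.random.default_rng(0)
n = 5_000
age = rng.normal(50, 10, n)
smoking = (rng.random(n) < 1 / (1 + np.exp(-(age - 50) / 10))).astype(int)
cancer = (rng.random(n) < 1 / (1 + np.exp(-(0.8 * smoking + 0.05 * (age - 50) - 2)))).astype(int)
df = pd.DataFrame({"age": age, "smoking": smoking, "cancer": cancer})

# Level-two assumptions enter explicitly as a causal graph.
model = CausalModel(
    data=df,
    treatment="smoking",
    outcome="cancer",
    graph="digraph { age -> smoking; age -> cancer; smoking -> cancer; }",
)

# Identify the effect (backdoor adjustment on age), then estimate it.
estimand = model.identify_effect(proceed_when_unidentifiable=True)
estimate = model.estimate_effect(estimand, method_name="backdoor.linear_regression")
print(estimate.value)
```

The point of the example is not the numbers; it is that the graph, the level-two assumption, is stated explicitly, which is exactly what is hard to verify when a language model produces such an analysis end to end.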
But of course we know these models, quote unquote, hallucinate; sometimes they will state things as if they were true that are not true. So the question is: can we understand when that is going to happen? Can we understand whether the model is even capable of not doing it? Is there some way of curing hallucination, and is that even possible? Is there some way we can bring causal theory to bear on large language models so that this problem is reduced and they can actually answer causal questions reliably?

There are a lot of angles you can take here. Backing up, what the causal hierarchy theorem tells us is that if a large language model is going to answer counterfactual questions reliably, give reliable answers, reliable generations, then it must be using level-three assumptions. Those assumptions have to exist somewhere. Maybe they're somehow included in the training data, which, given how we're told these models are trained, and based on the open-source models we do see trained, we know is not the case. They could be in the model architecture itself, or somehow learned in the parameterization of the trained model, or they could be in the prompt. If they're in the prompt, we know they're there. If they're in the architecture, or somewhere in the learned parameterization, we don't know whether they're there, and so we can't prove or validate or ensure this kind of reliable behavior.

So you could do something like having a causal ombudsman, if I can pronounce that word, some kind of causal validator in the decoding of representations into tokens when you're generating from the model. In other words, something that checks what's being generated and makes sure it's valid according to some criteria.
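Purely as a hypothetical sketch of what such a validator-in-the-loop could look like, here is a toy decoding routine; the function names and types are invented for illustration and do not come from Robert's work or any particular library:

```python
from typing import Callable, List

# Hypothetical interfaces:
#   propose(prefix, k) -> up to k candidate next tokens, best first
#   validate(sequence) -> True if the continuation is consistent with the
#                         causal constraints (e.g. checked against a DAG/SCM)
ProposeFn = Callable[[List[int], int], List[int]]
ValidateFn = Callable[[List[int]], bool]

def ombudsman_decode(
    propose: ProposeFn,
    validate: ValidateFn,
    prompt_tokens: List[int],
    max_new_tokens: int = 64,
    k: int = 5,
) -> List[int]:
    """Greedy decoding, except each step takes the highest-ranked candidate
    whose continuation the causal validator accepts."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        accepted = None
        for candidate in propose(tokens, k):
            if validate(tokens + [candidate]):
                accepted = candidate
                break
        if accepted is None:  # nothing causally valid to say: stop early
            break
        tokens.append(accepted)
    return tokens
```

This is only meant to make the "ombudsman" metaphor concrete: the generator proposes, and a separate component with access to the causal assumptions gets a veto.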
I've been looking into ways of incorporating causal information into the structure of the transformer architecture itself, so that, for the same reasons as in the paper you mentioned with the latent variable model, where you can get theoretical guarantees from things like the do-calculus, we would have the same kinds of guarantees based on the structure of the transformer-based model itself.

For anybody who's interested in that, there's a notebook circulating online, I'm sure you can find it if you Google around, with a very simple toy model. The scenario I'm imagining is that you're a production company making films, and you have a bunch of scripts that were all written with software that enforces some kind of act-one, act-two, act-three structure. Then you can say: act one causes act two, and act one and act two cause act three. The notebook shows how, if you train the model using this structural knowledge, you can simulate outcomes for act three that were not in the training data, and it's all in natural language. Right now I'm trying to build that into a much more practical example and train it on some heavier-duty models.
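As a rough sketch of the act-one/act-two/act-three idea (this is not Robert's notebook; the generate function, prompt format, and intervention interface are assumptions for illustration):

```python
from typing import Callable, Dict, List, Optional

# Hypothetical text generator: prompt in, continuation out
# (e.g. a thin wrapper around any language model).
GenerateFn = Callable[[str], str]

# Structural knowledge: each act lists its causal parents.
ACT_DAG: Dict[str, List[str]] = {
    "act_one": [],
    "act_two": ["act_one"],
    "act_three": ["act_one", "act_two"],
}

def sample_script(generate: GenerateFn,
                  interventions: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Sample acts in topological order; each act is generated only from its
    parents, and a do()-style intervention simply overwrites a node's value."""
    interventions = interventions or {}
    script: Dict[str, str] = {}
    for act, parents in ACT_DAG.items():  # insertion order is already topological
        if act in interventions:
            script[act] = interventions[act]  # do(act := fixed text)
            continue
        context = "\n".join(f"{p}: {script[p]}" for p in parents)
        script[act] = generate(f"{context}\nWrite {act} of the screenplay:")
    return script

# Usage idea: sample a script, then intervene on act_two while holding act_one
# fixed and regenerate act_three to see how the ending would have played out.
```

The design choice being illustrated is simply that generation respects the DAG: act three is conditioned only on its parents, so you can intervene on an earlier act and resample everything downstream.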
So, yeah, that's the approach I'm taking to address this problem.

Robert Ness: I want to emphasize again that it's not just about guaranteeing that the model only says causally correct things. To me it's more about the knobs, as I called them, that generative AI gives us. Generative AI is useful insofar as it augments our own creative process. It's useful for other things as well, including plenty of bad things, but the use case I'm interested in is how we, as creators, can leverage these models to hypercharge our creative abilities. The more we can make sure the models are aligned with how we think about the abstractions in the thing we're writing, or drawing, or producing, the better.
I think this is part of an area of research, and a lot of it you might see in papers coming out of Microsoft Research. There's a lot of emphasis on scaling these models up on ever-larger sets of data, but there's only so large the data can possibly get. Even if we keep training them, and these models are surprisingly under-trained in general, there's still a big emphasis on getting more data and pumping it into these models.

Robert Ness: There's an interesting line of research that goes in a different direction, which says: okay, let's freeze the data size and figure out how much we can improve the models by developing along other axes. If, for example, you can train models more efficiently, getting, say, GPT-4-level capabilities from a smaller data set, that opens up a lot of possibilities. For instance, you can now leverage
structure that's already innate in the data. One example I gave is GitHub Copilot: it can generate good, production-quality code, and it helps developers avoid a lot of the boilerplate of coding. But we're still generally just feeding it disembodied, individual documents of code. There's a lot of structure being missed: this is a repo, here's the history of the repo, and you can see the development of the code over time. If the author is good at using Git, you see very informative commit messages that say why each change was made. And then, of course, you could compile that code into an executable, and maybe even provide examples of inputs and outputs of the executable. All of that is structured information. Imagine you could figure out how to tokenize all of it and feed it into your transformer model, or somehow make the architecture reflect that structure. So while our foundation models so far have focused on breadth, this gives us a chance to leverage depth, and hopefully more of that structural information. Some of it is causal, and there are other kinds of structure too, hierarchical structure, for example, that we could leverage in this type of analysis.
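For concreteness, here is a rough sketch of what serializing that kind of repo structure into training records might look like; the record fields and the overall approach are assumptions for illustration, not a description of how any existing system is trained:

```python
import json
import subprocess
from pathlib import Path
from typing import Dict, List

def repo_history_records(repo: Path) -> List[Dict[str, str]]:
    """Turn a Git repo's commit history into structured records that pair the
    'why' (commit message) with the 'what' (the diff for that commit)."""
    log = subprocess.run(
        ["git", "-C", str(repo), "log", "--reverse", "--format=%H%x1f%s"],
        capture_output=True, text=True, check=True,
    ).stdout
    records: List[Dict[str, str]] = []
    for line in log.splitlines():
        sha, message = line.split("\x1f", 1)
        diff = subprocess.run(
            ["git", "-C", str(repo), "show", "--format=", sha],
            capture_output=True, text=True, check=True,
        ).stdout
        records.append({"commit": sha, "why": message, "change": diff})
    return records

# Each record could then be flattened into a token sequence, so a model sees
# not just the final files but the history of how the code came to be.
if __name__ == "__main__":
    print(json.dumps(repo_history_records(Path("."))[:1], indent=2)[:500])
```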
So that's an area of my research, and one where I'm hoping we see development as a community.

Alex: What are two books that changed your life?

Robert Ness: Darren Wilkinson is a professor in the UK, and he has a book on computational systems biology. It has a really good, broad introduction to computational Bayes:
various kinds of random-variable simulation algorithms, as well as inference algorithms, from importance sampling to all flavors of MCMC. The other half of the book is about how to build a computational model of a dynamic system. The focus is very much on biology, which is great, but there's no reason it couldn't be marketing, or economics, or agent models. I think you judge the impact of a book on your life by how often you go back to the principles you learned from it, or how often you crack it open. That book has been super impactful in terms of how often I've leveraged its ideas, and probably in how it shaped my research too. I no longer work in biology, but not too long ago I was putting together a computational model of advertising, and it was very heavily influenced by that book.
So that's one book. For the other, I want to say Hadley Wickham. He wrote a book on the R language that was very much influenced by the functional programming paradigm and by books like Structure and Interpretation of Computer Programs. People with a computer science background might have had their first experience with functional programming through something like SICP or, say, a book on Clojure.
But I did my PhD in statistics and was trained in R as a statistician, so this was my first foray into that world. It was basically my version of the class a computer scientist might take in graduate school on the design and implementation of programming languages; this book was my introduction, as a statistician, to that field. I don't think it was specifically designed to do that; it's just that the philosophy he was bringing to bear really came through.
And that led to a bunch of other explorations in that field, which I think really impacted me as a modeler.

Alex: Who would you like to thank?

Robert Ness: My wife, for putting up with a lot, including all the sacrifices it takes to become somebody who does what I do. I'd also say my PhD advisor, Olga Vitek, who turned me into a respectable person, and Karen Sachs, another researcher who has had a huge impact on causal discovery in particular. I met her in the middle of my PhD, right when I was finding my footing with my dissertation topic, and my relationship with her, both professionally and as a friend, really shaped me as a researcher. It got me through my PhD and gave me a source of inspiration and guidance at a time when I really needed it. So I owe a lot there.
Alex: What is the most precious thing you've received in your life from somebody else, or simply from fate, that has helped you in your career?
Robert Ness: Learning about the power of routine and habits. I think, maybe particularly for millennials like myself, you start off thinking you really need to find something that motivates you and that you're really excited about. But of course, motivation and excitement are unreliable as a source of fuel for keeping going in the directions you want to go. Sometimes you're just not feeling it; sometimes you hit an emotional plateau, or even a valley.
Yeah. I just
Robert Ness: don't feel motivated, but if you have the right habits and routines in place, then you can keep at it.
And it becomes something that you can rely on, especially in those times when you maybe are feeling a little bit unmotivated. When I think about what I want to pass on to my son, if I, if I, if I, if I had one thing that, you know, when he's in his twenties, I'm hoping that he is. learned and picked up. It's just the power to be able to design our own habits and routines and then, you know, essentially, uh, set them in place and then put them on.
Autopilot.
Alex: Where can people find more about you and your work?
Robert Ness: Sure. You can always find me on the Microsoft Research website. I also teach online at altdeep.ai, and there are links there to a GitHub repo; the repo is altdeep/causalML, where you'll find a lot of free tutorials and Jupyter notebooks for much of the material I teach. Those are my two main places online. I also keep my research profile on LinkedIn; people should add me there if they want to talk. So, three places: altdeep.ai, LinkedIn, and
my Microsoft Research profile page; I'm not sure of the exact URL.
Alex: Before we wrap up, what would be your advice to people who are just getting started with causality and are considering a research path?
Robert Ness: Are you thinking of people who want to focus on causal inference as a research field in itself, or engineers who are trying to apply causality to a problem they want to work on?
Alex: Let's think about people who would like to become researchers in the field of causality, whether that's inference, structure learning, representation learning, whatever the subfield.
Robert Ness: It may sound a bit obvious, but for one, find a good book and work through it. There are a lot of candidates for that, but at least then you can understand what the shape of the territory looks like. Once you know the shape, you can go find out where the frontier is and figure out how to push against it.
Once you understand the directions people are developing, you want to find out who the people in that space are who are making an impact. They'll be giving workshops at conferences and maybe doing podcasts. Then you look at their papers, figure out who they're citing, and get a feel for what the cutting edge is. I also think it's important to think about connecting with those people during your training, whether that's a PhD or something else; you want to be in proximity to that network of people.
And I think that's true of anything. If you want to work in finance, you should probably live in Chicago or New York if you're American, or maybe London if you're in Europe; you don't want to be in Nebraska, because you want to go where the people who are doing this thing are.
Research is, to some extent, less about being in a physical place, although sometimes it can be. But it really is about figuring out who the people are who are trying to have an impact, and then developing from there in that personal way, as opposed to, say, just trying to figure out what the most influential institution is. I think a lot of people these days think about getting as much prestige as possible on their CV, and listen, prestige doesn't hurt, obviously; being around a bunch of prestigious people tends to be good. But it's a crowded strategy, and there's a lot of competition for it.
So, keeping that in mind, focus on the people who are doing the things you actually care to do. I think that's the advice I'd give.
Alex: Is the future causal?

Robert Ness: I mean, the past, present, and future are causal, right?
Robert Ness: Sure. I think these questions aren't going away. I don't think, for example, that we're going to invent some new deep learning architecture that makes all these questions obsolete. I do not believe that will happen.
Alex: Thank you, Robert. It was a pleasure.
Marcus: Thank you.
Congrats on reaching the end of this episode of the Causal Bandits podcast. Stay tuned for the next one.
Jessie: If you liked this episode, click the like button to help others find it.
Marcus: And maybe subscribe to this channel as well, you know.
Jessie: Stay causal.