Causal Bandits Podcast

Causality, Bayesian Modeling and PyMC || Thomas Wiecki || Causal Bandits Ep. 001 (2023)

November 07, 2023 Alex Molak Season 1 Episode 1

Video version of this episode is available on YouTube
Recorded on Aug 24, 2023 in Berlin, Germany

Does Causality Align with Bayesian Modeling?

Structural causal models share a conceptual similarity with the models used in probabilistic programming.

However, there are important theoretical differences between the two. Can we bridge them in practice?

In this episode, we explore Thomas' journey into causality and discuss how his experience in Bayesian modeling accelerated his understanding of basic causal concepts.

We delve into new causally-oriented developments in PyMC - an open-source Python probabilistic programming framework co-authored by Thomas - and discuss practical aspects of causal modeling drawing from Thomas' experience.

"It's great to be wrong, and this is how we learn" - says Thomas, emphasizing the gradual and iterative nature of his and his team's successful projects.

Further down the road, we take a look at the opportunities and challenges in uncertainty quantification, briefly discussing probabilistic programming, Bayesian deep learning and conformal prediction perspectives.

Lastly, Thomas shares his personal journey from studying computer science, bioinformatics, and neuroscience, to becoming a major open-source contributor and an independent entrepreneur.

Ready to dive in?

About The Guest
Thomas Wiecki, PhD, is a co-author of PyMC - one of the most recognizable Python probabilistic programming frameworks - and the CEO of PyMC Labs.
Connect with Thomas: 

About The Host
Aleksander (Alex) Molak is an independent machine learning researcher, educator, entrepreneur and a best-selling author in the area of causality.

Connect with Alex: 


Should we build the Causal Experts Network?

Share your thoughts in the survey


Causal Bandits Podcast
Causal AI || Causal Machine Learning || Causal Inference & Discovery

Connect on LinkedIn:
Join Causal Python Weekly:
The Causal Book:


So I would say that it's great to be wrong. And that's how we learn. Hey, Causal Bandits! Welcome to the Causal Bandits Podcast, the best podcast on causality and machine learning on the internet. Today, we're traveling to Berlin to meet our guest. He started learning programming as a child by modifying code examples from a book that he got from his uncle.

He played guitar in a metal band and studied bioinformatics, and then fell in love with Bayesian modeling, which inspired him to grow and develop one of the most recognizable Python probabilistic programming frameworks. Ladies and gentlemen, Dr. Thomas Wiecki. Let me pass it to your host, Alex Molak.

Yeah, thank you so much. I'm excited to be here. How are you today? Very good. Yeah, I'm relaxed. I had the pleasure of seeing you put together this amazing setup, and I'm excited for this discussion. So

yesterday we had a dinner together and you told me a little bit about your story: about how you got fascinated by programming, by computers, by this idea of creating something out of nothing.

Then you got fascinated by neuroscience and bioinformatics, and then Bayesian modeling came. Tell me about the day when you felt that you could no longer imagine a future version of yourself that is not doing Bayesian modeling.

I don't know if there was like a particular day, but I mean, it definitely became more and more certain as time went on.

And I found that these tools, and Bayesian modeling specifically, I could use not just in my PhD research, but also at the job I had afterwards, at a fintech startup called Quantopian, based in Boston, that was focused on building a crowdsourced hedge fund. So quant finance, something completely different.

And nonetheless, the type of problems related to portfolio construction and evaluating algorithms was also really well solved by Bayesian modeling. The same tools I was using there, I could use just as well to solve these completely orthogonal problems. And that's when I definitely realized: okay, this tool is not just useful for academic problems, but also for industry problems, and pretty much any data science problem.

Well, not any, but a lot of data science problems where you need to really build a deep understanding and want to incorporate that understanding into the model. 

What was the bridge between the first day you saw a computer in your life, when you maybe realized what set of possibilities it offers you, and the day you found yourself falling in love with Bayesian modeling, or probabilistic modeling?

I always just thought it was so fascinating to be able to create something, and to do so creatively, right? With programming, there's a lot of creativity involved in having ideas and then bringing them to life.

And certainly in the development of PyMC, that was really alive, but also when building models. But really, I think these days, the thing that I enjoy the most is the community aspect behind that. I mean, I love just being in front of a computer and doing some modeling, but it's much more fun to do that with other people.

That is very strongly expressed, of course, in the open-source arena, where you just put some stuff online and then random people from across the world show up and say: hey, I thought I might just make this really significant contribution in my free time. And then you get to know these people and realize how talented they are.

And friendships form, and a sense of community is built around that. That's really what I feel about PyMC and also PyMC Labs, where it's just amazing to be able to work with driven, talented people, solving extremely advanced problems and pushing the boundary of what's possible. Something that wasn't possible yesterday, we maybe have made possible today, opening up things that before you couldn't really even imagine. So yeah, those things I just find endlessly motivating and fascinating.

It sounds like this element of discovery, of pushing the boundary, of seeing a possibility in something that today we see as impossible, is something important to you.

Yeah, definitely. That image of being an explorer in the modern sense, I find very appealing. And yeah, being able to do something that no one has done before, and not only for myself. That's the amazing thing about technology and software, right? Once we've done it, the day after something new is merged,

then already the next day, the whole world has access to it. So that sense of pushing the boundary, and then that level of impact: to enable humanity to do things that weren't possible before and solve potentially massive problems. That's the other thing, there are so many important problems, right?

From climate change to what have you. And yeah, I think that these tools can definitely make a lasting impact there. And PyMC is being used in all these different fields; like, I just looked the other day, over 2,000 citations of the original publication, from astrophysics and elsewhere.

So people are using it to solve actual applied problems. That, I think, is just immensely cool.

It sounds like something really, really rewarding, to have this feeling of such an impact. Okay. So, you just added something new to PyMC, and this is a causal do-operator. Can you tell us a little bit more about this?

Yeah, and we're very excited about that. So, me personally, I don't consider myself a causal expert. I have that Bayesian background, and I feel like I'm just starting to learn these topics through that Bayesian lens. And I feel like that is actually really helpful, because to my understanding, these two fields have mostly developed independently.

But nonetheless, there are all these really interesting cross-connections, like also between Bayesian statistics and machine learning, right? And I think it's very instructive to learn about that. So that's how I've been approaching it. And really the main driving force behind this is Ben Vincent, who has gone through a lot of the causal theory and then really did the hard work of mapping it into a language a Bayesian could understand.

And I think people who come from that background can then understand it better. And very naturally, as we are starting to understand these concepts and reformulate them in our own language, we want to start applying them, take the best ideas, and incorporate them into that Bayesian modeling framework.

And the do-operator was one of them. It actually fit really well into the framework from the software side. It was still, I guess, a little bit challenging, because it requires some graph manipulation; we had to add some functionality for that. And it was actually Ricardo Vieira who did the implementation of the do-operator and everything that has to happen under the hood.

And yeah, so now we have it. And that adds, I think, one of the critical missing pieces to really make PyMC a framework in which you can answer structural causal problems and build these types of models that I think are really, really exciting. And now you have more of that machinery directly from the causal inference domain.
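The graph surgery the do-operator performs can be sketched in a few lines of plain Python. This is a toy three-variable structural causal model, not PyMC's actual API (PyMC applies the same surgery to the model graph itself); the coefficients and cutoffs are invented for illustration:

```python
import random

random.seed(0)

# Toy SCM: z -> x, z -> y, x -> y.
# Observational sampling follows all structural equations;
# do(x=1) replaces x's equation with the constant 1, which is
# exactly the graph surgery the do-operator performs.

def sample(n, do_x=None):
    rows = []
    for _ in range(n):
        z = random.gauss(0, 1)                       # confounder
        x = do_x if do_x is not None else z + random.gauss(0, 1)
        y = 2 * x + 3 * z + random.gauss(0, 1)       # true causal effect of x is 2
        rows.append((z, x, y))
    return rows

obs = sample(100_000)
# Naive conditional mean E[y | x near 1] is biased upward by the confounder z:
cond = [y for z, x, y in obs if 0.9 < x < 1.1]
naive = sum(cond) / len(cond)

# Interventional mean E[y | do(x=1)] recovers the true causal effect 2 * 1 = 2:
intv = sample(100_000, do_x=1)
causal = sum(y for _, _, y in intv) / len(intv)

print(naive, causal)  # naive is pulled well above 2 by z; causal is close to 2
```

Conditioning and intervening give different answers on the same model, which is the whole point of having a do-operator rather than only a posterior.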

You mentioned the structural aspect. When you look at the way PyMC and other probabilistic programming languages frame the problem of modeling, there is a structural aspect in this, and it's a very fundamental one. Structural thinking is also fundamental for causality.

When was the first time you heard about causality, and was it natural for you to build this association between the structural aspects in both?

So yeah, I remember the first time I saw causal inference (before that, I had just heard about it), and that was at a talk at ODSC London a couple of years ago. I remember sitting there and seeing how they were creating these structural causal models to answer interesting data science questions, in that example around price elasticity.

And I remember thinking: well, that is what we're doing with Bayesian modeling as well, right? We don't call it structural causal modeling, I guess; we call it the data generating process, but nonetheless. So at the end of the talk, I asked: what is the difference between this and the type of models we're building in a Bayesian framework?

And they were like: that's a great question, I don't know. And at many subsequent talks about causal modeling, I kept asking that question. Everyone was like: hmm, I don't know. I mean, yeah, it sounds kind of similar, but no one really seemed to know. So yeah. Now I think we have the answer, and the answer is yes.

Those two things are the same, just seen through a different lens, through a different framework. And that's what I think is exciting about it. I think there's a lot to learn in both domains. Causal inference, I think, has very powerful ideas and really puts causality front and center: estimating treatment effects and how to think about the world.

But a lot of it, as I see it, is expressed in a frequentist framework. Now, many of the people behind it, like Pearl, don't say that it has to be frequentist, but in reality, that's what I observe. So I think there's a lot of value in expressing these ideas in a Bayesian framework. And on the Bayesian side, I think there's a lot we can learn in adopting that language and that framework. Because what really is the benefit of Bayesian modeling?

Well, there are many things you could say, and I have tried and refined them over time. Obviously, with PyMC and PyMC Labs, it's our mission to try and make these methods more widely available and more widely used, so we really have to figure out how to explain this to people who haven't heard about it. And once you start talking about priors and uncertainty, people don't really get that. Then I guess you can talk about transparency, and that's a bit better. But talking about causality, I think, really resonates with a lot of people. And I think that's also why causal modeling is quite popular these days: it makes sense, right? Just intuitively: yeah, if you want to act on the world, you have to understand what causes what to take the most effective action.

So that is, I think, an amazing motivation for building these structural models, and we can do that just as well in a Bayesian framework, now with the do-operator in PyMC. That's what I think is really cool about this, and yeah, there are these parallels, starting with the structural approach.

You said so many interesting things here that I already have a list of questions in my head to learn more about your perspective.

Let me start with one particular question. You said something about communication: how you communicate with business stakeholders, and what choices you make about where to put your focus in order to show them the value of the Bayesian framework. And it sounds to me like you just said that talking about causality seems like a powerful tool to convey this value to people who are not necessarily deep into modeling, deep into statistics, and so on. Is that correct?

Yeah, absolutely. So, after having tried, literally for 10 years, to explain why Bayesian modeling is a good choice for a certain type of problem, and after many false starts, the way that I'm phrasing it now is: well, what is the purpose of data science? Why are we even doing this? And a lot of people like me do it because, I guess, that's also what we talked about: I think it's cool and it's fun, and we're pushing the boundaries. And that's a valid point.

Is it a really good point? Well, maybe not. Or maybe the purpose is to make better forecasts, right, so that we can predict what's going to happen. And that sounds useful. But really, I think ultimately the best answer I can come up with is that the purpose of data science is to take better actions, right?

To make better decisions. And how do we do that? Well, if we want to take actions that lead to desirable outcomes, we really need to understand how actions affect outcomes, right? And that is at its core a causal question. Because if we mess that up, then we might be able to forecast what's going to happen, but we're not able to affect what's going to happen.

So that type of logic and communication style I find pretty compelling, at least to me. I'm curious if you agree. But yeah, that's how I think we can really start conveying these methods better. So, what do you think?

Well, it sounds to me like, in a sense, going back to basics and asking a very fundamental question: why are we actually doing all this stuff?

And my view is very similar, in the sense that I think we've built a culture in data science where we just take some tools and try to apply them to any problem that we encounter. And the fundamental myth behind this culture is that prediction and decision making are the same thing.

Exactly. But it is not.

But it is not. A couple of weeks ago, I got a message from my friend. He texted me on WhatsApp and added a picture of a whiteboard with some structural modeling on it. And he captioned it by saying:

"I'm trying to understand how our marketing works. We have a set of machine learning models, we've used them for over three years, and our marketing has been pure losses for all three years." And this is using machine learning, a predictive approach, for decision making, right?

It's not a tool that can provide you with this information. I have this metaphor with a map: if you just use Google Maps, the default view is very useful to help you get from point A to point B. But if you're a climber and you want to find another mountain to climb, you won't find this information in the default mode of Google Maps.

It's just not the right tool to get the answer that you are seeking.

Yeah, I completely agree. And I think it's very easy for us as data scientists to not think hard enough about how we actually have this business impact, right? And there are all kinds of reasons for that. Some of them are structural: often these are different organizations, right? The data scientists are doing their thing and they're deploying models. And then:

Some of them are structural that often these are different organizations, right? And the data scientists are doing their thing and they're deploying models. And then. How critically are those evaluated at the end of the day? Right. And maybe three years later, you take a look and you're like, Oh, actually this isn't working at all.

Like, yes, it's making very good predictions and it's super fancy, but does it really solve the underlying business problem? That is something we need to think much more carefully about. And for most of those problems, I think that causal approach is very powerful. Though not for all of them.

Right. I think there are legit forecasting and prediction problems where a black-box ML algorithm will do the job just fine. But in my personal experience, also from seeing this through PyMC Labs, the problems where machine learning is actually the right fit are far fewer than the problems it is getting applied to.

So I think with a little more careful thought, understanding the problem and building a model that maps the causal structure, we can solve these problems. It might be harder, we might have to learn new tools and develop new tools, but that's what is going to determine the future of data science, right? Whether we're still going to be relevant or not will depend on whether we can solve actual business problems or not.

Some time

ago, a couple of weeks ago, I had a conversation with Robert Ness from Microsoft Research. And it turned out that we share a common experience: we both work with causality, and we both keep hearing from people

that they are really, really concerned, really afraid, that if they need to explicitly define this structure, the data generating process or the structural causal model, it will just be wrong. And as a Bayesian modeler, you are experienced with this, right? Making choices about what makes sense and what doesn't, which direction the influence should flow, and so on.

What would be your advice to people who are just starting to think structurally, to help them overcome those fears? And what do you see as the opportunity cost in this case?

So I would say that it's great to be wrong. We're wrong all the time, and that's how we learn, right? So we build a model, and we start with something very simple, and oftentimes we might think: oh yeah, this is a good first cut.

But oftentimes it's not. And then there are all these tools, right, to find out whether you're wrong and how wrong you are, like posterior predictive checks and simulations, to see: can I reproduce the data? And then you find: oh, actually no, it's doing a terrible job at that.

Then you can ask full-on questions like: well, why does it have that behavior? And start fixing things. And then you actually learn something about your data, rather than just throwing more machine learning at it. And of course, the other path that opens up is communication with the domain experts, right?

Usually, as consultants, we don't come in with the domain knowledge, and we build these models from our best understanding. Then we present them to the client, and they say: oh, well, actually, that's not how these things really fit together. And then we learn something and improve the model.

So yeah, we're wrong until we get it right. And then you have actually learned something about the problem. You have a model that works, that the domain experts agree on, and that models the data well, and then you're really cooking.
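The "can I reproduce the data?" loop can be sketched without any probabilistic programming library at all. Here a deliberately wrong normal model is checked against skewed data using the sample minimum as the test statistic; in a real workflow the replicated draws would come from the posterior predictive (e.g. `pm.sample_posterior_predictive` in PyMC), while this sketch just plugs in point estimates to show the idea:

```python
import random
import statistics

random.seed(1)

# A posterior predictive check in miniature: fit a (deliberately wrong)
# normal model to skewed data, simulate replicated datasets from the
# fitted model, and compare a test statistic between data and replicates.

data = [random.expovariate(1.0) for _ in range(500)]   # skewed "reality"
mu, sigma = statistics.mean(data), statistics.stdev(data)

observed = min(data)   # exponential data: the minimum is near 0, never negative

# Replicated datasets under the fitted normal model:
reps = [[random.gauss(mu, sigma) for _ in range(500)] for _ in range(200)]
rep_mins = [min(r) for r in reps]

# The normal model routinely produces negative minima; the data never does.
frac_below = sum(m < observed for m in rep_mins) / len(rep_mins)
print(frac_below)  # close to 1.0: the model badly misfits the left tail
```

When the replicated statistic almost never looks like the observed one, that is exactly the "oh, actually no, it's doing a terrible job" moment that tells you where to add structure.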

And what's the alternative, actually, of not doing this?

Yeah, exactly. I don't know. I mean, I guess you could just train a whole bunch of classifiers that learn whatever they learn. And it might be nonsensical. But even then, you don't learn anything in that approach, which I think is problematic, and you can't really communicate your solution to other people either, which is also a question of trust, right?

Like, would you really have your marketing budget be determined by a black-box algorithm that might work completely counter to your own intuition? And oftentimes these people have been doing this for a long time, and really have a good idea of how these things fit together. In the best-case scenario, you bring the domain expertise and what the data says together and get the best of both worlds, rather than doing things in a completely qualitative way, using only the domain expert and Excel spreadsheets, or doing a 100 percent data-driven approach where you only have the data, fit a model, and get the predictions. So I think having both of these work in tandem and support each other helps with the solution, but also with the trust in using it. I

always say to people that just selecting features using intuition, or just putting everything into the model, is also building a structural model, right?

Yeah. And every feature you put into the model, every decision about including or not including a feature, has a chance to impact the conditional probability landscape of the model. Yeah.

So that is oftentimes what Bayesians like to say when people criticize priors for being subjective: the response is that, well, all modeling is subjective. And that's true. And I think it's really a great idea to extend that argument further, right? Because even in a machine learning framework, you make all kinds of decisions, starting with the data processing. And by now there's a lot of research on the uncertainty that is introduced through that process.

And it turns out that it matters a lot, right? Like, how do you remove outliers? Do you normalize your data? What type of machine learning algorithm do you fit? All of those things matter a lot to the end result. So yeah, these are all choices, and you can either be very conscious and transparent about them, or not.


You also have extensive knowledge in neuroscience. Is there any connection between learning how we as humans, and maybe non-human animals, function cognitively, how we function in and react to the environment, and your approach to Bayesian modeling?


So, there's definitely a very interesting research direction in cognitive neuroscience that is furthering the hypothesis that the human brain essentially works using Bayesian updating. Whether that is true on every level is a science question, right? And there's good data for it. But I think just the general idea is so compelling, right?

That we start with some idea about how the world works, then we go out and apply it, we learn something, and then we update those beliefs. That's how we all operate, right? And I guess what is interesting in the context of this conversation is that there is obviously a causal model that we're all building in our heads of how the world works, and that is as much a part of it as updating our beliefs.

So I think there are some really interesting parallels there, and also just really helpful intuitions: well, yeah, that is how we learn about the world, and that's probably how our computers should learn about the world as well.
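That "start with a belief, observe, update" loop has a textbook minimal form: a Beta prior over a coin's bias, updated by observed flips. A tiny sketch of the updating principle being discussed, nothing more; the flips are made up:

```python
# Bayesian updating in its simplest conjugate form:
# a Beta(alpha, beta) prior over a coin's bias, updated by Bernoulli flips.

def update(alpha, beta, flips):
    # Beta prior + Bernoulli data -> Beta posterior (conjugacy)
    heads = sum(flips)
    return alpha + heads, beta + len(flips) - heads

alpha, beta = 1, 1                      # uniform prior: no idea about the bias
flips = [1, 1, 0, 1, 1, 1, 0, 1]        # observe 6 heads out of 8 flips
alpha, beta = update(alpha, beta, flips)

posterior_mean = alpha / (alpha + beta)
print(posterior_mean)  # (1 + 6) / (2 + 8) = 0.7
```

Each new batch of observations just shifts the belief a little further, which is the same iterative picture as the model-building loop described earlier.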

When humans learn about the world, we build those causal models, but perhaps, and I say perhaps because I don't know the answer, we do not build one huge causal model of everything that we experience. Maybe we just have local causal models. And when I think about marketing, and you with the PyMC team also invest in this direction by building PyMC-Marketing and publishing marketing mix modeling blogs and all that kind of stuff:

marketing is a set of actions in a very complex environment, and we could think about two fundamental ways to model this complex environment. One would be to try to build as huge and as accurate a model as possible; let's call it the global way. Another would be to focus just on what's important for us for a given problem, one that is maybe a little more narrowly defined,

and then maybe build a series of those smaller models. From your experience, which approach makes more sense in practice?

It's a great question. I think in practice, what we're mostly doing is building the more localized, specialized models, because really that's the only way you can start. In the past, I've definitely tried to build overly ambitious models that then just crumbled, and I always had to go back to basics.

So these days we always start that way, but I think there's a huge appeal to these global models. And part of it comes from the fact that many different things probably influence each other. In that blog post you mentioned before, where we introduced the do-operator, we talk about the marketing funnel and how different aspects affect purchasing behavior at the bottom of the funnel, which is what we're interested in.

Certainly marketing does that, right? And there are different types of marketing: there's brand marketing, which drives awareness of the product and then, maybe through downstream effects and cascades, has an effect on sales, versus more direct performance-based marketing. And that is already pretty advanced for a media mix approach, where usually you don't separate those out.

So you can start doing that, and then you can think: okay, well, I'm not the only one doing marketing. There are also competitors doing marketing, and that also works to increase awareness of my products, right? Because it grows the general pie. So why not also include that? But then there are other things that we know affect things, like, for example, the price, right?

What's the price elasticity of my product? Where do I set the current price point, and what is the price of the competition? All those things interact, and I think it will be very interesting, and probably the best model, to really include all of that. And probably we will start with individual pieces and then start connecting them.

And whenever we did that in the past, especially connecting these different data sets, it paid off. You have your marketing spend and your purchasing behavior, but maybe you also ran lift tests. That's something we did in the HelloFresh project, where they had these lift tests, in which you test specifically how good a channel is using much more accurate measurements than what an MMM could provide.

And you can just add that as another data source, another view into a latent process. And that turned out to make the fit significantly better. So yeah, linking disparate data sets and connecting them in sensible ways is usually what I think we should all be striving for.
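Why a lift test as "another view into a latent process" tightens the fit can be seen in the simplest Gaussian case: two independent noisy measurements of the same latent channel effect combine with precision weights, and the combined uncertainty drops below either source alone. The numbers below are invented for illustration and have nothing to do with the actual HelloFresh model:

```python
# Two independent Gaussian "views" of one latent quantity (a channel's
# effect) combine by precision weighting; the posterior variance is
# smaller than either measurement's variance on its own.

def combine(m1, var1, m2, var2):
    # Posterior for a latent under two Gaussian likelihoods (flat prior)
    p1, p2 = 1 / var1, 1 / var2
    return (p1 * m1 + p2 * m2) / (p1 + p2), 1 / (p1 + p2)

mmm_est, mmm_var = 0.30, 0.04      # noisy read on the channel from the MMM
lift_est, lift_var = 0.20, 0.01    # tighter estimate from a lift test

mean, var = combine(mmm_est, mmm_var, lift_est, lift_var)
print(mean, var)  # mean is pulled toward the precise lift test; var below both
```

The same logic is what a joint Bayesian model does automatically when both data sets feed the same latent variable.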

To what extent do you feel it's enough to just treat some of the existing processes as what we sometimes call exogenous variables? By that I mean noise distributions that impact how the model works, but that we don't define; we move them outside the scope of the variables we can intervene on. We just say: hey, we know that all this stuff impacts this variable jointly, and we assume the distribution is more or less like this. Maybe it's normal; probably it's not in the real world. We just put it aside, in a sense.

Where is the boundary where you feel this approach, saying "this is external noise to our system", is useful, versus saying "let's analyze what this noise actually is, what constitutes it", and so on? Where would you set the boundary in practical terms?

Yeah, so it depends. I think it's always going to come down to the specific problem and data set, and whether that actually matters or not.

This is definitely something we would test in the model creation process: okay, those noise terms, can I just summarize them, simplify them that way, and be done with it? If that works, then great. And if it doesn't, well, then we add more structure and see how big the benefit is.

But of course, even if there is a benefit, it might not warrant the additional complexity. I would say, though, that in practice we tend to err on the side of including more rather than less, and I think for good reason. We do add most of the things we care about, because even then, we're usually a bit constrained by what the data can provide.

There's always a little bit more that we would like to be able to model than what we have data for, in terms of structure. For example, we might only have aggregate purchase behavior where really we want the individual purchase behavior of individual customers, but we only have the aggregate.

So we can only do that, with the assumptions that come with it. There are a lot of assumptions that we then have to build into the model that we'd rather not have to make directly. These are approximations that may be somewhat off, and probably it's going to be fine. But if we had that additional information, we would love to include it and build a better model.


When we work with causal models using a frequentist approach to inference, to estimating the parameters, we are often faced with the decision of which variables should be included in the model in order to reflect its structural properties. The Bayesian perspective, in terms of probabilistic programming languages and frameworks like PyMC, seems to be fundamentally different.

What are your thoughts on this? 

Yeah. So I think that's one of the areas where Bayesian modeling can really provide a big benefit for causal inference. The way I understand what's mostly being done with causal inference today: you build the structural causal graph, and then you input it into an analysis framework that works out backdoor paths and frontdoor paths, and says, okay, these are the variables you should include and these are the ones you shouldn't, for the subsequent estimation of the model. But oftentimes that framework is not the one doing the fitting.

And then, of course, it's very important to solve that variable selection problem, because if you include the wrong ones, you get biases. If you include a collider, you get the wrong answer. If you don't include confounders in the right way, you get wrong answers.

So that is very critical to that, I guess, frequentist path of doing things, although I think these two things are largely orthogonal. In the Bayesian framework, however, what we're doing is just building that same structural model and then estimating it directly.

We don't really need this type of logic to say, okay, these variables I have to include, these I don't. Take a collider, for example: we just include that structure in the model, and then we run our inference, and it will already incorporate that in the right way.
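To make the collider point concrete, here is a minimal pure-Python simulation (the variable names and the linear structure are invented for illustration, not taken from PyMC). Two independent causes share a common effect; conditioning on that effect induces a spurious correlation, while a generative model that simply includes the collider as a downstream node never needs to condition on it.

```python
import random
import statistics

random.seed(0)

def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# x and y are independent causes; c is their common effect (a collider).
n = 20_000
x = [random.gauss(0, 1) for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(n)]
c = [xi + yi + random.gauss(0, 0.5) for xi, yi in zip(x, y)]

# Marginally, x and y are (nearly) uncorrelated.
r_marginal = corr(x, y)

# Conditioning on the collider (keeping only samples with large c)
# induces a spurious negative correlation between x and y.
sel = [(xi, yi) for xi, yi, ci in zip(x, y, c) if ci > 1.0]
r_conditional = corr([s[0] for s in sel], [s[1] for s in sel])

print(round(r_marginal, 2), round(r_conditional, 2))
```

A Bayesian structural model that encodes `c` as a child of `x` and `y` reproduces exactly this behavior at inference time, which is why no separate adjustment-set logic is needed.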

And there's a related point that is really critical here. People are definitely working on the ability to take this causal graph and directly estimate the strength of those connections. But if you do that with, for example, a point estimate, or in a frequentist way, that is oftentimes going to give very biased or noisy estimates.

That's something we as Bayesians often see: if we want to do point estimates, which Bayesians can do, these point estimates are oftentimes just not at all what you want them to be. For example, in a hierarchical model, which is very common: in a Bayesian framework you just estimate it and things work out, but if you estimate it with point estimates, just looking for the mode, the most likely value in a maximum likelihood framework, it will oftentimes collapse everything together to a single point.

And there are theoretical reasons for that: weird things happen in this parameter space if you estimate things a certain way, and the maximum in that parameter space is often not the point that is most representative. What you really want is the mean.

And for that, you have to run something like Markov chain Monte Carlo. So, if that was too technical: doing point estimates is very limited when you're trying to build more complex structural causal models. That's where you really need to integrate, and that's where you oftentimes need Markov chain Monte Carlo, because it is one of the most flexible and general-purpose algorithms for solving that problem.
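A quick way to see why the mode can be a poor summary: for a skewed posterior such as a lognormal, the mode and the mean sit at very different points, and only integrating over the distribution (which sampling does) recovers the mean. A small sketch with made-up parameter values:

```python
import math

# A lognormal "posterior" with location mu and scale sigma (hypothetical values).
mu, sigma = 0.0, 1.0

# Closed-form summaries of the lognormal distribution:
mode = math.exp(mu - sigma**2)        # what a MAP / maximum-likelihood search finds
median = math.exp(mu)
mean = math.exp(mu + sigma**2 / 2)    # what averaging posterior samples estimates

# The mode sits far below the mean, so optimizing for the most likely
# single point systematically understates the quantity you usually care about.
print(f"mode={mode:.3f} median={median:.3f} mean={mean:.3f}")
```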


All of this that you mentioned is also related to the fact that Bayesian modeling, in the version we're discussing here and that PyMC offers, is generative, versus the non-generative approach we traditionally have in frequentist inference.

I think that definitely helps.

And it's a very powerful tool in building these models and really understanding the type of problem we're trying to solve. One step in that Bayesian workflow, which is essentially how we build models, is that before you even fit the model to data, you just build the causal structure that you think underlies the problem.

And once you have that in a framework that can actually express this graph and then generate data, you can see exactly what type of data it generates. Usually humans don't think in parameter spaces; they think in terms of: oh, well, this is the data that I have, and this is what I expect to see.

So once you do that, you immediately see whether it's doing the right thing or not. And oftentimes from there you can really intuit: okay, probably this assumption, that this thing influences the other thing, might be wrong. What happens to the pattern of generated data when I change that?

And maybe then all of a sudden: oh, okay, this actually makes more sense now. So there's a lot of structure discovery, a very causal term, that can be done just by going through different hypotheses and seeing the implications of the hypotheses that we put into the model.
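That "simulate first, then compare to what you expect" loop can be sketched in a few lines: encode two candidate structures as generative functions, simulate, and check which reproduces a qualitative pattern you expect in the data. Everything here (the price/demand structures, the coefficients, the summary statistic) is an invented illustration, not PyMC's API:

```python
import random
import statistics

random.seed(1)

def simulate(n, price_drives_demand):
    """Generate (price, demand) pairs under one of two hypothesized structures."""
    data = []
    for _ in range(n):
        if price_drives_demand:
            price = random.gauss(10, 2)
            demand = 100 - 5 * price + random.gauss(0, 3)
        else:
            # Alternative hypothesis: price and demand are both driven by an
            # unobserved common cause (say, seasonality), not by each other.
            season = random.gauss(0, 1)
            price = 10 + 2 * season + random.gauss(0, 0.5)
            demand = 50 + 10 * season + random.gauss(0, 3)
        data.append((price, demand))
    return data

def slope(data):
    """Ordinary least-squares slope of demand on price."""
    xs = [p for p, _ in data]
    ys = [d for _, d in data]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# The two structures imply opposite signs for the observed price-demand slope,
# so one look at simulated data tells you which assumption is tenable.
s_causal = slope(simulate(5000, price_drives_demand=True))
s_confounded = slope(simulate(5000, price_drives_demand=False))
print(round(s_causal, 2), round(s_confounded, 2))
```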

This workflow reminds me very much of what we do when we use causal discovery in real-world settings, which means it's almost always an iterative process.

So we do something very similar. We learn a structure, and very often we also want to include, as some sort of a prior, not necessarily in the Bayesian sense, the expert knowledge that we have available. Then we compare what this model looks like against the data we have from the real world, or some insights we have, maybe from an experiment, and so on.

So this seems very, very related on a conceptual level. Yeah. 

And sometimes, now that I'm talking more about Bayesian causal inference, I do get the question: well, how do we even come up with that graph? Which is the question of structure discovery, where there are quite a few causal inference tools that just try a lot of different combinations.

There's some work in the Bayesian domain for that, but most of the time we're still mainly building these things in this iterative, simulator-driven approach. And so far I find that hasn't really produced many issues of the kind: oh, well, the space is so large and we have no idea about the problem, we just need to iterate through everything.

Usually that search space is pretty constrained, and with some guidance you can make a lot of progress. So far we've fared fairly well with that approach.

Over the last year or so there have been many new developments in the PyMC ecosystem that are causal. We had the do-operator that we discussed before, but I think last year we also got a new package in the family called CausalPy. Is this the future of PyMC?

That's a good question. I think there's a lot of promise there. CausalPy has really been the first piece of causal inference that we started to attack, a great package developed by Ben Vincent, and it is focused on quasi-experimentation, which I think is a pretty nicely constrained set of models, or problem domain.

And that's where we first understood: okay, this is how causal analysis would approach this type of problem. Really what it's doing is fitting data in the in-sample period, then predicting the out-of-sample period, and then comparing to what actually happened. There are various combinations of that.

But once we're fitting something and predicting something, we can do that with machine learning or we can do it in a Bayesian framework. And that's what CausalPy does: it allows you to use either a scikit-learn model or a Bayesian model. And if you use a Bayesian model, you get uncertainty.
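The fit-then-extrapolate logic behind those quasi-experimental designs can be sketched without any library: fit a trend on the pre-intervention period only, forecast the post period, and read the estimated impact off the gap. This is a toy illustration of the idea with invented numbers, not CausalPy's actual interface:

```python
import statistics

# Weekly sales (hypothetical): weeks 0-9 are pre-intervention, weeks 10-14 post.
pre_t = list(range(10))
pre_y = [20.4, 21.1, 22.0, 22.8, 24.1, 24.9, 25.8, 27.2, 27.9, 29.0]
post_t = list(range(10, 15))
post_y = [34.0, 35.5, 36.2, 37.8, 39.1]  # observed after the campaign

# Ordinary least squares on the pre period only.
mt, my = statistics.fmean(pre_t), statistics.fmean(pre_y)
num = sum((t - mt) * (y - my) for t, y in zip(pre_t, pre_y))
den = sum((t - mt) ** 2 for t in pre_t)
slope = num / den
intercept = my - slope * mt

# Counterfactual: what the pre-period trend predicts for the post period.
counterfactual = [intercept + slope * t for t in post_t]

# Estimated impact = observed minus counterfactual, summed over the post period.
impact = sum(o - c for o, c in zip(post_y, counterfactual))
print(round(impact, 1))
```

Swapping the least-squares fit for a Bayesian regression is what turns this point estimate of the impact into a full posterior distribution over it.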

In the other case, you don't. So yeah, that was the insight. And from there, I guess we really learned that there's a lot of, well, misunderstanding is too strong a word, but nonetheless I think there is a lot of historical

and maybe even baggage, where I have heard things like: well, you can't really do causal analysis in a Bayesian framework. Or sometimes people say it's not supported with Bayesian networks. One example: well, in Bayesian networks we're looking at conditional probabilities, and that doesn't cover the case of interventions, which is the do-operator.

And well, okay, traditionally that's true, but there's really nothing stopping us from adding the do-operator. And then, is it strictly, technically still a Bayesian network? Who cares? These are all just tools, and we want to build the best tool. At the end of the day, it's all probability.
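The difference between conditioning and the do-operator can be made concrete on a three-node discrete network with Z driving both X and Y, and X driving Y. The probabilities below are made up for illustration, and this enumeration is not a real API (PyMC's actual `do` transform operates on model graphs), but the arithmetic is the standard one:

```python
from itertools import product

# A tiny discrete causal model: Z -> X, Z -> Y, X -> Y (invented probabilities).
p_z = {0: 0.5, 1: 0.5}
p_x1_given_z = {0: 0.2, 1: 0.8}                 # P(X=1 | Z=z)
p_y1_given_xz = {(0, 0): 0.1, (1, 0): 0.4,      # P(Y=1 | X=x, Z=z)
                 (0, 1): 0.5, (1, 1): 0.8}

def p_joint(z, x, y):
    """Joint probability under the observational model."""
    px = p_x1_given_z[z] if x else 1 - p_x1_given_z[z]
    py = p_y1_given_xz[(x, z)] if y else 1 - p_y1_given_xz[(x, z)]
    return p_z[z] * px * py

# Observational: P(Y=1 | X=1). Conditioning lets information about Z flow back.
num = sum(p_joint(z, 1, 1) for z in (0, 1))
den = sum(p_joint(z, 1, y) for z, y in product((0, 1), repeat=2))
p_cond = num / den

# Interventional: P(Y=1 | do(X=1)). The intervention cuts the Z -> X edge,
# so Z keeps its marginal distribution instead of being informed by X.
p_do = sum(p_z[z] * p_y1_given_xz[(1, z)] for z in (0, 1))

print(round(p_cond, 3), round(p_do, 3))
```

The two numbers differ exactly because observing X = 1 tells you something about Z, whereas setting X = 1 does not.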

So yeah, I think there's a lot we can do if we come at this with fresh eyes: seeing, okay, what is still required today, and what are things we can now do better with the better tools we have, just being free in the exploration of these methods and allowing ourselves to do that.

So to bring that back to your question: yeah, I think that's a really powerful direction for PyMC, adding these tools, and also the communication, as I said earlier. It's a very powerful way of thinking about the world and about data science problems and solving them. And I'd be delighted if PyMC is going to be helpful in that endeavor.

You mentioned the topic of uncertainty and uncertainty estimation a couple of times today, and you emphasized that the Bayesian framework allows us to do it in a very natural, very organic way. Over the last year or two, another way of estimating uncertainty became very popular and prominent, called conformal prediction, which comes more from the frequentist tradition.

What are your thoughts on the strong and weak sides of both approaches to uncertainty modeling?

Yeah, so I think those are really interesting directions. The way I started to think about it is that there are really two orthogonal things going on. And that was, at least for me, a big point of confusion, where, like I said earlier, I just kept asking people: well, in the Bayesian framework we're also doing that.

Is that the same or different? And no one really knew the answer. So the way I'm starting to think about it is that there are two axes. One is the axis of uncertainty quantification. The other, and I have to find a better term for it, is maybe actionability, or understandability. Black-box machine learning models, for example, I would rank very low in terms of understandability.

They just fit some function. Then we can move up that scale and look at correlational models, and that certainly gives us more insight, right? We can really understand how things relate to each other. And then, of course, the causal people say: well, that's not enough, because if we don't understand the directionality, then we can't really take action.

So at the top I would put the causal understanding that we get through structural causal models. But how we then estimate these models is a completely orthogonal question. We could just do point estimates, that's fine. We can do frequentist, that's fine. Or we can do a Bayesian model.

And conformal prediction fits on that map as well; different tools can all be placed on it. So yeah, that's how I think about it. There are all kinds of different ways of doing this, and on that two-dimensional space you want to be at different points at different times for different problems.

And then it's just about: okay, what tooling do I require to solve that particular problem?

I also found another dimension that I think is significant here, and I would be very curious to hear your thoughts on this. In the Bayesian framework, we can often pretty easily split the

estimates of aleatoric and epistemic uncertainty, while conformal prediction just gives us the total uncertainty for a given model, in more of a black-box manner, right? Because it looks at the outputs and then there's a little bit of calibration. That's a simplification, but let it be. Do you find this ability to split aleatoric and epistemic uncertainty useful in practice?

It depends on how much you trust that split. A lot of it will depend on your priors and how precisely you choose them. So I think of uncertainty not in absolute but in relative terms.

So: given what my starting belief about the world was, now I'm thinking about it this way. But nonetheless, even that uncertainty estimate needs to be validated and tested against reality, to check whether those uncertainty estimates are correct. For example, for a while Bayesian deep learning was a very popular topic, and I also got really interested in it, because I thought: oh, there are a lot of interesting things we can do here, and maybe having things in a Bayesian framework will be really helpful.

But the more I worked on this and thought about it, and other research has come out as well, one of the biggest selling points is: well, we get uncertainty in our predictions. And that sounds cool, but then the question is: what type of uncertainty is it, and how does it behave?

And when you look at it, it's actually not really the type of uncertainty you want. What it does is estimate a hyperplane, right, if you have a classification problem. And then the uncertainty just keeps decreasing the further away from that hyperplane you go.

But maybe very far away from the hyperplane you also don't have any data, so it doesn't behave like: oh, this is the type of example I've seen before, so I have very high certainty about it, and this is something I've never seen before. Say I have an image model and I just give it a white-noise image.

You would expect it to have very high uncertainty about that, but that's not necessarily the case, because maybe it ends up somewhere very far away from your hyperplane, and then the uncertainty will be very low. So it has these weird properties. And then when we start thinking about the parameter space itself, and the uncertainty in that space, what does that even mean, in a way?
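That counterintuitive behavior is easy to reproduce for a plain logistic classifier: predictive entropy shrinks monotonically with distance from the decision boundary, regardless of whether any data was ever seen out there. A self-contained sketch with made-up weights:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def entropy(p):
    """Binary predictive entropy in nats; 0 means total confidence."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

w, b = 2.0, 0.0  # a 1-D "hyperplane" at x = 0, with an invented weight

# Predictive entropy at increasing distances from the decision boundary:
distances = [0.0, 0.5, 1.0, 2.0, 5.0]
ents = [entropy(sigmoid(w * x + b)) for x in distances]

# Confidence keeps growing with distance, even in regions that could be pure
# noise the model has never observed. Entropy here says nothing about whether
# the input resembles the training data.
print([round(e, 3) for e in ents])
```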

So the uncertainty, I think, is very model-specific, and so is the type of answer we're getting, and how we divide it between uncertainty from the data and from the model. Yeah, I think I would need to do a lot of validation in order to really believe those numbers.
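One common way to operationalize the aleatoric/epistemic split just discussed is the law of total variance across an ensemble of plausible models (or posterior draws): the average within-model variance is the aleatoric part, and the variance of the model means is the epistemic part. A toy sketch with invented numbers:

```python
import statistics

# Hypothetical predictive distributions for one input, from four models
# drawn from a posterior; each gives a predictive mean and variance.
means = [2.0, 2.4, 1.8, 2.2]
variances = [0.50, 0.45, 0.55, 0.50]

# Law of total variance:
#   total = E[var]  (aleatoric: irreducible observation noise)
#         + var[E]  (epistemic: disagreement between plausible models)
aleatoric = statistics.fmean(variances)
epistemic = statistics.pvariance(means)
total = aleatoric + epistemic

print(round(aleatoric, 3), round(epistemic, 3), round(total, 3))
```

The decomposition is only as trustworthy as the ensemble itself, which is exactly the validation caveat raised above.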

And how do you use uncertainty estimates in your PyMC practice? We discussed a couple of examples related to marketing. What's your approach there?

So yeah, I think that type of uncertainty is very interesting, and it becomes even more interesting when it becomes actionable. The way we love to do it, again coming back to that concept of actionability, is rather than just providing

posterior estimates: cool, we can estimate how strongly these two things are correlated, or maybe causal, and now in addition we get uncertainty bounds around that, and that's useful, and we need to make sure those bounds are somewhat calibrated and meaningful. I would say it's more about comparing: take the same model and fit it on a different, smaller dataset, and you can see, oh, the uncertainty is much wider here than there; or compare different marketing channels. That I feel much more comfortable with. But what you can do then, which is really cool, is take those uncertainty estimates, put them into an optimizer, and

define a loss function that tells you: okay, this is an outcome that is preferable, and in this case it's: I want more sales. Because at the end of the day, what do we want to do with that marketing mix model? Analyzing it and seeing how effective certain channels are is interesting, right?

And people can take that information in and maybe do something with it. And what are they going to do? They're going to adjust how much they allocate to different marketing channels. That's cool, but we can do one better and treat it as an optimization problem: okay, we want to allocate the budget most effectively to maximize sales.

That's going to be my loss function, my objective function, and I'm going to find the best setting. And there, the uncertainty plays a big role. If you only look at point estimates, a marketing channel that was maybe on for two months and looks really, really good will get a very high effectiveness estimate, while a channel we have three years of data for, which has been solid but not amazing,

would get a smaller allocation from the optimizer. In practice, though, most people would say: well, I don't trust that small window, we have only very little data for that channel, so I will probably stay on the safe side. But once we include the uncertainty, the optimizer does exactly what the person would do: okay, I'm going to go with the safe choice, where I have a lot of data and very little uncertainty that things will go wrong, and the new one I'm going to keep in there, but not double down on.
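That risk-aware behavior can be sketched as a tiny mean-minus-risk allocator over two hypothetical channels (all numbers invented): channel A has a high but very uncertain effectiveness estimate from two months of data, channel B is solid and well measured over three years.

```python
# Posterior summaries of effectiveness (sales per unit spend), hypothetical:
channels = {
    "A_new":    {"mean": 3.0, "sd": 1.5},  # two months of data: looks great, huge uncertainty
    "B_proven": {"mean": 2.2, "sd": 0.2},  # three years of data: solid, well measured
}
budget = 100.0

def allocate(risk_aversion):
    """Split the budget proportionally to a risk-adjusted score, floored at 0."""
    scores = {name: max(c["mean"] - risk_aversion * c["sd"], 0.0)
              for name, c in channels.items()}
    total = sum(scores.values())
    return {name: budget * s / total for name, s in scores.items()}

# Ignoring uncertainty (point estimates only), the optimizer chases the new channel.
naive = allocate(risk_aversion=0.0)

# With uncertainty included, it behaves like a cautious human: keep the new
# channel in the mix, but don't double down on two months of noisy data.
cautious = allocate(risk_aversion=1.0)

print({k: round(v) for k, v in naive.items()})
print({k: round(v) for k, v in cautious.items()})
```

A real marketing mix optimizer would maximize expected sales under the full posterior rather than this linear score, but the qualitative effect of including uncertainty is the same.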

In a sense, we're coming back to this area of connection between human decision-making, human psychology, and modeling again.

Yeah, exactly. So I think a lot of this is how humans make decisions, how we think about the world. And we are risk-averse, right?

There's a lot of research on that. And if you don't have a measure of risk, which is just another word for uncertainty, in your model, its answers can't possibly account for the fact that we like solutions that take that risk into account. And of course, we're also coming back to that concept of actionability and to what the purpose of data science is, right?

It's about making better decisions. And for that, just giving someone a posterior distribution, or an estimate with error bars, is not as actionable as saying: okay, given everything, and you understand the model and all the connections, this is the best budget allocation of our marketing spend that we can have.

And then, of course, we can interrogate that, play around with the knobs, and run different simulations, right? See what happens if I diverge from that. So it opens up that discussion and provides really interesting solutions to problems much more directly than the modeling alone does.

You mentioned risk aversion that we as humans experience, or display, in certain decision-making scenarios.

When we talked before and you told me your story, I had a feeling that there's a lot of exploration there, and that discovery is something important to you. And discovery is in itself an act of going outside of what is known; it's a risky endeavor by its nature, in a sense. What would be your advice to people who are starting with something new?

Maybe they want to go into machine learning, or Bayesian modeling, or maybe causal inference or causal modeling in broader terms, and they just feel that there's so much going on there. They feel a little bit overwhelmed, or maybe they have doubts.

Yeah. I mean, I've definitely had a lot of doubts along the way. I remember starting to consider studying bioinformatics and computer science and knowing there was a lot of math required, and I wasn't sure whether I would be good enough at math to do it. Nonetheless, I gave it a try, and it didn't come easy, but I really spent a lot of time just banging my head against the wall, reading these weird proofs. And then finally: oh, okay, actually it's not that complicated, it's fairly obvious once you really understand it. And for me there always was a really deep level of satisfaction in understanding these concepts and building bridges between different things.

But in terms of the risk, okay, two things. One: for the most part, I just followed whatever was fun for me. I never felt like I did something I really hated and only did it because, oh, I want to be a data scientist or a machine learning person, so I have to do this thing I hate. It was always fun for me, and that is really what kept me going; it never felt like work. And nonetheless, I certainly did take risks, but to me they always felt like risks where I had really thought carefully about the upside and the downside, and about protecting the downside.

So I wouldn't say that I'm not risk-averse; I think I definitely am. And that was one of the reasons I didn't stay in academia. Doing my PhD was also a very calculated risk, in terms of: okay, what are the probabilities that I will have a career there and become a professor?

And what would that look like? I didn't think the chances were very favorable, but I still thought it made a lot of sense to do the PhD, because I could learn a lot of things that would help me in an industry career if I chose it, or maybe I would love academia so much that I would stay. And even then, throughout the whole time, I was very conscious of the fact that if I wanted to move to industry, I should probably have some skills for that as well.

So all this programming, which I enjoyed anyway, I just realized was really going to be very beneficial for that. And also going to data conferences, reaching out, and doing internships. That's how I got my first job in industry, at Quantopian, after I finished my PhD. So that was the other key aspect, I guess: following the passion, taking calculated risks, and also just reaching out to people.

There are so many people in the past I just sent a random email to: hey, that sounds pretty cool, what are you doing? Can I learn more about that, and maybe do an internship there? And what I've found is that people are often extremely open and helpful and eager to work with you if it comes from a good place.

So yeah, I think those are some of the things that worked for me. And then, of course, there's starting my own company, PyMC Labs, which has been an incredible adventure. But also there, for example, we didn't take on any VC funding, and I didn't take out bank loans, so it's totally bootstrapped.

So really, what is the worst that can happen? It may not work, and then I'll do something else. That's cool; that's not that bad. So just give it a try. So yeah, I think: try things, but also maybe don't over-commit. I think it's easy to fall into this "just follow your dreams" mindset.

And the idea that if you work hard at it, it'll definitely work out. I don't think there are any guarantees of that. So I think it is important to be realistic, and if things are not working out, maybe at a certain point it's time to try something else, rather than doubling down on something that might not be the thing that actually works out. Because the other thing I learned, often from talking to many other people who, for example, stayed longer in academia than I did, who did a postdoc and then switched to industry: that shows a different type of commitment, I guess.

Right. Because those were people who said: oh yeah, I love this research and the work I'm doing so much, I will fight for it. They had that dream of becoming a professor, and then it didn't work out, and they switched to industry. And 100 percent of them, when I talked to them two or three months later and asked whether they missed those research questions that had been their life, were all like: what? No, who cares?

This stuff I'm doing now is super cool, and they were really excited about it. So most humans, I think, just fall in love with whatever they're doing. I think that can be factored in as well: the thing someone might think is the only thing that will be amazing for them, well, maybe they'll like something else just as much, or maybe even more; they just haven't tried it yet.


It sounds to me like there are three main lessons in what you talked about. One was to give yourself a chance to explore things outside of what you're currently doing, because maybe there's something you would actually like more. Another was to take risks, but in a calculated way: to think about the consequences, the worst-case scenarios, and the best-case scenarios. And the third was to follow your passion and do what you're passionate about.

Yeah, I think that's well summarized.

Is there anyone who you would like to thank?

Well, I definitely had quite a few mentors along the way that were really helpful.

My PhD advisor, Michael Frank, was amazing. And I learned a lot in my time at Quantopian, from our CEO there, Fawce, and from Jess Stauth, who was my manager. They really taught me a lot in terms of the type of company culture that was even possible. Before, I had this abstract view: well, you just need to have this hierarchy.

Right? And it can't be fun. And you have to have project management, and maybe it's agile, or whatever. And then there's the possibility to just break those rules: well, no, maybe you can just have a group of people, be very transparent about what you're doing, let everyone contribute, with no red tape and very little hierarchy, or fluid hierarchies that assemble themselves, and just trust the people a lot more to do the right thing when they have the whole picture and the sense that they can actually contribute.

Where can people find more resources to learn more about PyMC, the whole open ecosystem, and, in particular for our audience, about Bayesian modeling as you see it at PyMC?

I would start with the PyMC website, pymc.io, which has a really great examples gallery where you can browse through and see all the different notebooks on all kinds of problems. That is, I think, a great way to get started, because again, it's that playful, self-guided exploration principle. I also really love the blog posts that we put out on the PyMC Labs website.

So, pymc-labs.io, and then go to the blog. That has, for example, that post on the do-operator. That's where we put out, I would say, most of our current thinking: what we're excited about, where we see things going, where the ideas we talked about today are expressed. And then, of course, social media.

I guess LinkedIn is mostly where we post stuff. Yeah, I think those are some good starting points.

Is there any question that you would like to ask me? 

So, what is your motivation, not just for the podcast, but for everything you do? What is the drive?

And what does success look like in this endeavor?

Great question.

One of the main motivators for me to do this work for the community is that in some of the darkest moments of my life, I had the possibility to go to the internet and learn from all those people's experiences, only because they had decided to put it out there for free.


When I think about it, it gives me goosebumps all over my body. I feel super grateful for this, I have a ton of gratitude in me, and I want to pay it back to the community. That is behind my blogging, behind this podcast, and behind all my activities.

Now, of course, there are also more business-oriented aspects of my actions, but they also make it possible for me to give more to the community.

Yeah. And those two things don't need to be separate; they can all be the same thing. I think the work we're doing with PyMC Labs is as much a part of the community building, because our customers are part of that community too.

Right. And the people who are watching this and learning from your resources, and your customers: all part of that. So I think it's really great to be able to think about this holistically and really build these communities. That has been, for me, one of the most rewarding aspects of this whole endeavor.


What is the best way for people to connect with you?

You can connect with me on LinkedIn; that would be the first line of business. I also still have a Twitter account, @twiecki, that you can check out. And feel free to drop me an email. Happy to hear from anyone.

Great. Thank you so much, Thomas, it was a pleasure. I hope you all enjoyed this discussion too, and see you in the next episode.

Thank you very much for having me. That's great.

Thank you for staying with us till the end, and see you in the next episode of the Causal Bandits podcast.

Who should we interview next? Let us know in the comments below or email us at hello@causalpython.io. Stay causal.