Causal Bandits Podcast

Causal ML, Transparency & Time-Varying Treatments || Iyar Lin || Causal Bandits Ep. 008 (2024)

January 22, 2024 Alex Molak Season 1 Episode 8

Video version available on YouTube
Recorded on Sep 13, 2023 in Beit El'Azari, Israel


The eternal dance between the data and the model

Early in his career, Iyar realized that purely associative models cannot provide him with the answers to the questions he found most interesting.

This realization laid the groundwork for his search for methods that go beyond statistical summaries of the data.

What started as a lonely journey led him to become a data science lead at his current company, where he fosters a causal culture daily.

Iyar developed a framework that helps digital product companies make better decisions about their products, at scale and on budget.

Here, causality is not just a concept, but a tool for change.

Ready to dive in?

------------------------------------------------------------------------------------------------------

About The Guest
Iyar Lin is a Data Science Lead at Loops, where he helps customers make better decisions leveraging causal inference and machine learning methods. He holds a master's degree in statistics from The Hebrew University of Jerusalem. Before Loops, he worked at ViaSat and SimilarWeb.

Connect with Iyar:
- Iyar on LinkedIn
- Iyar's web page

About The Host
Aleksander (Alex) Molak is an independent machine learning researcher, educator, entrepreneur and a best-selling author in the area of causality (https://amzn.to/3QhsRz4).

Should we build the Causal Experts Network?

Share your thoughts in the survey


Causal Bandits Podcast
Causal AI || Causal Machine Learning || Causal Inference & Discovery
Web: https://causalbanditspodcast.com

Connect on LinkedIn: https://www.linkedin.com/in/aleksandermolak/
Join Causal Python Weekly: https://causalpython.io
The Causal Book: https://amzn.to/3QhsRz4

Transcript

Whenever you do one of these activities, you have to gain a much deeper understanding of the subject matter in order to successfully deliver that kind of content. You also lose the temporal relation between the confounders themselves and the treatment. So, for example, when you do that collapsing of data, it's possible that some of those features will actually...

Hey, causal bandits. Welcome to the Causal Bandits Podcast, the best podcast on causality and machine learning on the internet. Today we're traveling to Beit El'Azari to meet our guest. He almost became a pilot. He studied political science, yet ended up getting a degree in statistics, just to realize early on in his career that statistics alone is not enough to answer the most interesting business questions.

He plays cello and is truly passionate about solving real world problems. Ladies and gentlemen, please welcome Mr. Iyar Lin. Let me pass it to your host, Alex Molak. 

Ladies and gentlemen, please welcome Iyar Lin.

Hi, Alex. Nice to be here.

Hey, hi. Iyar, where are we today?

We are at my childhood garden and house here in Beit El'Azari, Israel.

When was the first time in your career when you felt that the statistical approach might not be enough to answer some of the important questions?

Actually, you see it quite early on in your career. I started off at Viasat, which is a satellite internet company based in the States. And you see folks running statistics all day long, t-tests wherever they go. And it's in those instances that you realize that those measurements don't make sense. And, you know, they go off and say that it is statistically significant, but you understand that there is more than just statistical significance to a result for it to mean something. That really threw me off and made me realize that there was a whole parallel dimension to analyzing data. It isn't about the uncertainty related to just measurement error, but rather about the data-generating process that underlies it.

What was your first encounter with causality or causal thinking?

So I think that, if I go back to what I said earlier, working at Viasat, I joined this team that was working on email campaigns for our customers, sending them those discounts in order to retain them.

And I remember, you know, seeing that what was done till that point was to model which customers are about to churn and which are not, and then sending out those discounts to those who were most likely to churn. I went to the literature, read some, and realized that what we are interested in is not predicting which users are going to churn. Oftentimes, by the way, they would churn anyhow. So that made me think, well, it's not about who's going to churn. It's about who's going to respond best to our discount. And that might not necessarily be those high-probability churners. In fact, you can imagine that there would be scenarios where some customers are so upset that, you know, sending them a $5 voucher would probably not make things better for them.

And vice versa, there might be customers who are perfectly satisfied with what they have, and you sending them that $5 voucher makes them start thinking about the bills they are paying every month and whether they should check out the competition: why should I enjoy a one-time $5 discount when maybe the competition is cheaper?

So, you know, these are the sleeping dogs, the last ones I was referring to, and then there's the first group, which you can obviously act on. So, at that point in time, I realized that just the statistical associations are not what we are after most of the time. Sometimes we are, but most of the time, as far as decision making goes, it's that layer of how the variables in the system would react that you are most interested in.
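To make the contrast between predicting churn and predicting the response to a discount concrete, here is a minimal two-model ("T-learner") uplift sketch in Python on fully simulated data. Everything in it, the data-generating process, the voucher effect sizes, and the variable names, is invented for illustration; it is not Viasat's setup, just the general idea of targeting by expected response rather than by churn risk.

# Minimal T-learner uplift sketch on simulated data (illustration only;
# all numbers and names below are made up for this example).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 20_000

satisfaction = rng.normal(size=n)      # latent customer satisfaction, observed here for simplicity
voucher = rng.integers(0, 2, size=n)   # randomized retention discount

# Assumed effect: the voucher helps mildly unhappy customers, does nothing for
# the very upset, and slightly backfires for the "sleeping dogs".
effect = np.where(satisfaction < -1.0, 0.0,
         np.where(satisfaction < 0.5, -0.10, +0.05))
p_churn = np.clip(0.5 - 0.25 * satisfaction + voucher * effect, 0.01, 0.99)
churn = rng.binomial(1, p_churn)

X = satisfaction.reshape(-1, 1)
m_treated = GradientBoostingClassifier().fit(X[voucher == 1], churn[voucher == 1])
m_control = GradientBoostingClassifier().fit(X[voucher == 0], churn[voucher == 0])

# Uplift: predicted change in churn probability caused by the voucher.
uplift = m_treated.predict_proba(X)[:, 1] - m_control.predict_proba(X)[:, 1]

# Targeting "most likely to churn" vs. "most helped by the voucher"
# selects largely different customers.
top = n // 10
by_churn_risk = set(np.argsort(-m_control.predict_proba(X)[:, 1])[:top])
by_uplift = set(np.argsort(uplift)[:top])
print("overlap between the two top-decile target lists:",
      round(len(by_churn_risk & by_uplift) / top, 2))

The only point of the sketch is the ranking at the end: the customers with the largest predicted benefit from the voucher are generally not the customers with the highest predicted churn probability.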

Today, you are working on projects that are helping companies make better decisions. So you took this path, right, that day?

Right, yes. So, you know, during that time I started reading up about that topic, and at some point the topic of causal inference came up, and as you go through the material and you go deeper and deeper into it, you realize that it's very complex.

No one around me does it or knows how to perform it. So I don't have anyone to consult, and I need to convince everyone around me that it's actually needed. So at that point in time, I realized that I wanted to do it, but I wasn't sure about what first steps I should be making. I had never met someone who was certified or known for applying causal inference methods.

And so what I came up with was: well, I need to study this on my own. And at least for me, I don't know how most people are, but if you just sit and read papers, you are, kind of like ChatGPT, able to summarize them and maybe have an intelligent conversation about them, kind of referring to the main topics in a paper. But for me to be able to actually utilize those methods in real-world scenarios, where the settings might be different to some degree from what I've just read, I needed something that would imbue me with a much deeper level of understanding. And the way I went about doing it was writing a blog. I can't say that this blog, you know, got me a lot of attention or a large audience, but that's okay, because that wasn't the aim to begin with.

It was a need to understand the topic well enough to be able to explain it to others and be able to write it down in a way that's approachable and understandable. If you have to go to these lengths, then as you are writing the blog, you suddenly realize: wait, I don't really understand that point to the fullest. So you go back and you keep reading. And then I'd always make it a point for myself to do code examples, like real-life examples, so that, again, for me, it would be useful and applied, and also to surface, you know, topics I didn't really understand. Oftentimes I realized that the best way of understanding a specific model is to simulate it: you all of a sudden need to think about all the specifics of how things actually work, because you need to code it. So you need to be as explicit as possible. So in these blog posts, I would often simulate those situations and suddenly understand that, oh shoot, I didn't really think through this part. So, uh, that's how I kind of rolled with it.

And then, funny story, at some point I was contacted by the person who's now the CEO of the company I work for, Tom Laufer, and he reached out and said, you know, I came across your blog posts, and how about you come and work on doing causal inference for real and not just writing about it.

That's a beautiful story. You do something that helps you. It also helps the community, because people can learn from your experience and all the hard work that you did. And then, finally, it also helps a company on the market that just finds a person with a skillset that can help them solve their problems. So it's like a triple win.

Exactly. And to be frank, I didn't expect that scenario to play out, because, again, it wasn't like it went viral or anything. But, you know, if you post consistently and you make it a point to yourself, even though it's not directly work related and it is time consuming, to not neglect it and post once every few months, then it does generate the amount of awareness needed to facilitate those kinds of connections. So it was a very happy scenario for me, because during all that time I knew that the next way to level up would be to actually work in doing causal inference. You know, that's the next step in the ladder of getting into the field: actually applying it. And then you again go through these realizations where, you know, you get tough questions asked, because this is not just a blog post. This is your work, and your boss comes to you and tells you: well, how do you know that it works? Why should I believe it? What good does it do? And, you know, you have to answer it. You can't just say, well, you know, I don't have time for this right now. This is my day job, so I need to be able to answer it.

You got your degree in statistics. How much attention in your statistics course was devoted to causal thinking and causal inference?

Well, you know, zero. There was no attention given to causal thinking whatsoever, and you become very surprised, in retrospect, that that was the case, because, in essence, causal inference is a sub-branch of statistics. You know, it's a world where you need to work with assumptions, like you do in statistics, and build some model of the world, and then you apply the model and you analyze your results. So it's all in the same state of mind. And a lot of that literature isn't that new. I mean, you know, Rubin's work is from the seventies, Pearl's from the eighties. So it's been there for a while. So it's a good question as to why that is the way it is. I mean, for sure there is some overlap, and many of the statistical tools I learned through my studies were very useful. But in the end, they were constrained to the field of statistical associations. I never really touched on the data-generating process and how things actually work under the hood.

In real-world scenarios, we often hear that it's difficult to control for all possible confounders. This might differ between settings, right? So maybe for a manufacturing process it might be much easier to control for all the factors, because it's a system that is essentially isolated from the influence of the external world (not completely, but substantially), while for a marketing campaign or political science research, those systems tend to be very, very open, and it's very hard to contain all the influences that might be relevant. In your work, you found a way to de-bias the problems that you work with, at least to a certain extent, using the time domain. Can you share a little bit more about this?

Yeah, sure. So, just to give some context, I work for a company that provides an analytics platform for product and growth teams. And so the problem setup we are dealing with is a product where product managers need to make decisions such as: should they change the ordering of the menu, and what features in the product drive conversions, for example, to paid users.

So if I am to state the problem fully, let's imagine that we have this freemium product where you have a trial period, and you want to understand which of the features in the product drive conversions to paid the most by the end of the trial period. I mean, that's one of the main money drivers for these apps, right? And the second one would be churn, right after that. So then, usually, what companies would do is, well, compare the conversion rate among users who adopted a certain feature versus the conversion rate among those who didn't. And the problem here, as is often realized very quickly, is that those users who adopted a certain feature are usually much more engaged and come in with an initial intent that is much higher than those who didn't. And then the question arises: well, the differences I'm seeing in conversion rates, are those due to the feature adoption or just due to the intent that brought a user to adopt it in the first place?

So what you're saying, sorry for interrupting you, what you're saying is that users might be adopting features because of the features themselves, but they might also be adopting features because of some properties of their, I don't know, personality.

Exactly.

Or maybe their preferences for novelty, or just a tendency to click buttons.

Exactly. Or just by how valuable that product is to them, right? It totally makes sense that different users find the product valuable to different degrees. And so there's a strong sense of confoundedness in that area, in that, you know, all product managers feel that those users adopting the feature probably just found the entire product more useful.

And so how can we disentangle that effect? Now, one approach that you can take is to say, well, how about we try and control for the baseline attributes that users arrive with when they join the product. So, for example, you could measure country, device, the marketing campaign on which they arrived. But essentially, all these metrics are very coarse. I mean, you can imagine that even within, let's say, the US, you'd still have a very wide range of personas or user types. So what we did here is we used a method that enables us to measure users' activity within the app as a confounder, or as a proxy for that initial intent or user personality, and in that way generate much stronger conditions, or much stronger confounders, on which we can de-bias the results we are seeing.

So, you know, just to be specific, you might say something like: within users who have already adopted or interacted with at least five features in the product, those who additionally adopted the feature I'm interested in had a 10 percent higher conversion rate versus those who didn't. Now, when you say that, it makes the comparison much more, I'd say, believable, or you understand that it's much less confounded, because those users who already adopted the five features cannot be very low-intent users or users who don't find any value in the product. And so that additional difference between them is probably more due to the additional feature that was adopted.
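As a rough illustration of that kind of comparison, here is a short Python sketch with a made-up data-generating process and made-up column names: a naive adopter-versus-non-adopter contrast is inflated by latent intent, while the same contrast computed within strata of prior feature usage comes much closer to the simulated effect.

# Hedged sketch (simulated data, invented names): naive vs. engagement-stratified
# comparison of conversion rates between feature adopters and non-adopters.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 50_000

intent = rng.exponential(1.0, n)                               # latent, unobserved intent
n_prior_features = rng.poisson(2.0 * intent)                   # observed engagement proxy
adopted = rng.binomial(1, np.clip(0.10 + 0.20 * intent, 0, 0.9))
converted = rng.binomial(1, np.clip(0.05 + 0.10 * intent + 0.05 * adopted, 0, 0.95))  # true lift: ~5 points

users = pd.DataFrame({"n_prior_features": n_prior_features,
                      "adopted": adopted,
                      "converted": converted})
users["engagement_bucket"] = pd.cut(users["n_prior_features"],
                                    bins=[-1, 0, 2, 4, np.inf],
                                    labels=["0", "1-2", "3-4", "5+"])

# Naive contrast: adopters vs. non-adopters, heavily confounded by intent.
naive = users.groupby("adopted")["converted"].mean()
print("naive difference:", round(naive[1] - naive[0], 3))

# Stratified contrast: the same comparison within each engagement bucket.
per_bucket = (users.groupby(["engagement_bucket", "adopted"], observed=True)["converted"]
              .mean()
              .unstack("adopted"))
per_bucket["difference"] = per_bucket[1] - per_bucket[0]
print(per_bucket.round(3))   # differences much closer to the simulated 5-point lift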

So what you're saying is that if somebody adopts five features in some period of time, it says something, reveals something about them. So that would be like an equivalent of a personality.

Exactly.

Instead of asking people questions, we just look at their behaviors, which, ah, every psychologist would tell you is a better test than asking a question.

Yeah. So, just to be more specific, there are certain challenges involved with measuring the effect of time-varying treatments, a.k.a. feature adoptions, on time-varying outcomes, such as conversions, while also using time-varying confounders, such as activity in the app. So, right off the bat, you know, we were up for a challenge, because just the way you organize your data isn't straightforward.

And just to give you an idea, let's imagine that users usually adopt the feature by the end of the first day, let's say 50 percent of them do that, and also 50 percent of the users convert to paying within the first day. That's very common in certain products, like casual apps, for example. So now, even before we are talking about confounding, if you wanted to measure, you know, conversion among those who adopted versus those who didn't, you'd need to specify some timeframe for the adoptions, and that timeframe would have to precede the outcome you're interested in, because otherwise, you know, it doesn't make sense to say that the adoption affected the conversion. So adoptions have to come before conversions. And then, in order to cast it as a simple classification problem, you'd have to say something like: okay, well, let's measure first-day adoptions and compare them with conversions after the first day. So that kind of settles the problem with the ordering of the events.

What ends up happening is that you have to discard half of your conversions, because they happen within the first day. You'd also have to discard half of the adoptions, because they happen after the first day. So you can't really frame it as a regular classification problem without losing a lot of data.

One of the most problematic phenomena we saw is that, oftentimes, if a feature is very strong, then the conversion would happen right after the feature adoption. And then what would end up happening is that either both happened on the first day, or both happened after the first day. In both cases, you would throw them away, right? Because either the conversion or the adoption didn't happen in the correct timeframe. And what you'd end up thinking is that that feature is actually not very good, because all the cases where users adopted it and actually converted were discarded. So this discarding of information is not just about the quantity of data points. It's also a very biased way of discarding information.
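A tiny simulation makes the biased-discarding point visible; all numbers here are invented. Under the "adopted during day 1, converted after day 1" framing, the stronger (faster-acting) the feature, the larger the share of its adoption-to-conversion successes that fall on the same side of the cutoff and get thrown away.

# Toy simulation (invented numbers) of the collapse-into-classification framing:
# treatment = adopted during day 1, outcome = converted after day 1.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
adoption_day = rng.exponential(1.44, n)   # roughly 50% of adoptions land within day 1

# The faster conversion follows adoption (the "stronger" the feature), the more
# adopter-converter pairs are discarded, because adoption and conversion tend to
# land on the same side of the day-1 cutoff.
for mean_lag in (3.0, 1.0, 0.2):
    conversion_day = adoption_day + rng.exponential(mean_lag, n)
    usable = (adoption_day <= 1.0) & (conversion_day > 1.0)   # only these pairs survive the framing
    print(f"mean adoption-to-conversion lag of {mean_lag} days -> usable pairs: {usable.mean():.1%}")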

Now, one way you could go about fixing this problem is using a method which I feel is very underrated, and that's called survival analysis. I was lucky to get introduced to it via my statistics studies, but it's not very well known outside the statistics community. And it's actually a third branch of supervised learning. So you have classification, regression, and then the third one is survival, which kind of looks like a mixture of both; it has both a continuous and a binary part. And so using survival lets you solve the problem I was mentioning earlier. You would go and use what are called time-varying covariates to encode those instances where users have adopted the feature but haven't converted yet, and those instances where users haven't adopted the feature. And you would build a survival curve for both groups, and then compare, you know, the survival, or one minus survival, which is the conversion rate at the end of the period. And by doing that, you'd get an equal footing for both groups in terms of the time they had to convert.

So you wouldn't need to discard any information. Really cool methodology; I recommend everyone get acquainted with it, regardless of causal inference. And then, the way it ties back to what we were talking about earlier is that you can also model the confounders, the behavioral ones we were talking about, in the same manner. So you treat feature adoption as a censored time-to-event target, just like you would conversion, and your covariates would be the usages in the app or in the product. And that way you are able to find those user profiles who adopt the feature the fastest, versus those who usually take a lot of time to adopt it or don't adopt it at all. So that's the gist of the method.
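For readers who want to see what such a setup can look like in code, here is a hedged sketch using the Python lifelines library and its CoxTimeVaryingFitter. The long-format table, the simulated trial window, and all column names are illustrative assumptions, not the pipeline Iyar describes at Loops; the point is only the mechanics of encoding adoption as a time-varying covariate so that no conversions have to be discarded.

# Hedged sketch of a time-varying-covariate survival model, assuming the
# `lifelines` package. Data is in "long" format: one row per user per interval
# (start, stop], with the covariate values that held during that interval.
import numpy as np
import pandas as pd
from lifelines import CoxTimeVaryingFitter

rng = np.random.default_rng(3)
rows, horizon = [], 14.0   # a made-up 14-day trial window

for user in range(2_000):
    engagement = rng.exponential(1.0)                    # per-user engagement proxy (kept constant for brevity)
    t_adopt = rng.exponential(3.0 / (0.5 + engagement))  # engaged users tend to adopt sooner
    base = rng.exponential(20.0 / (0.5 + engagement))    # conversion time if the feature were never adopted
    t_convert = base if base <= t_adopt else t_adopt + rng.exponential(6.0)  # adoption speeds conversion up
    end = min(t_convert, horizon)
    converted = int(t_convert <= horizon)
    if t_adopt < end:
        # Split the user's history at the adoption time: before vs. after adoption.
        rows.append((user, 0.0, t_adopt, 0, engagement, 0))
        rows.append((user, t_adopt, end, 1, engagement, converted))
    else:
        rows.append((user, 0.0, end, 0, engagement, converted))

long_df = pd.DataFrame(rows, columns=["user", "start", "stop", "adopted", "engagement", "converted"])

ctv = CoxTimeVaryingFitter()
ctv.fit(long_df, id_col="user", start_col="start", stop_col="stop", event_col="converted")
ctv.print_summary()   # the coefficient on `adopted` is the log hazard ratio of converting, adjusted for engagement

Because every user contributes exposure time up to conversion or censoring, first-day conversions and late adoptions both stay in the data instead of being discarded.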

So that sounds like what you're trying to do here is to preserve as much of the temporal information as possible and leverage it, in order to, one, get more accurate estimates, but, two, also to exclude the possibility that you control for an irrelevant variable in certain cases.

Yes, exactly. And to make that last statement clearer, what it helps you do is make sure that, whenever you analyze the effect of a certain treatment on the outcome, the way that the data is structured makes sure that the treatment actually happened before the outcome, and not, for example, afterwards. And the same goes for the confounders.

The user profile always precedes the treatment adoption. And regarding what you said earlier about time, that's exactly right. The way the data is structured enables you to not discard the time element but rather encode it itself. So, you know, you could get a millisecond-precise description of a user's journey. I mean, they used that feature at that specific second, and then they used another feature at this specific second, and this is how they traverse through these states. So again, to make this less ambiguous, I would recommend anyone to read about survival analysis and time-varying-covariate survival analysis. These are the methods that enable you to preserve the time dimension, which oftentimes, you know, just gets collapsed into one dimension. Like I said earlier, you say: let's look at all first-day adoptions, or uses of the app, and see how those relate to an outcome sometime afterwards. By the way, even in that, what I said earlier was the problem that you had a loss of information because of the temporal relation between the treatment and the outcome.

But you also lose the temporal relation between the confounders themselves and the treatment. So, for example, when you do that collapsing of the data, it's possible that some of those features were actually interacted with because of the treatment. And it would be wrong to condition on them, because they are the result of the treatment, not preceding the treatment. They don't cause the treatment; they are caused by it. So when you squash your data and you eliminate the time dimension, you can get all these backwards relationships that can really bias your analysis. I mean, you know, in that case, for example, if you are controlling for actions in the product that are a result of adopting a certain feature, you're actually biasing: you're making the estimate look lower than it should be.

What I hear you say is that, in certain cases, the same variable might be preceding the treatment, so the adoption of the feature, while in certain other cases the same variable, the same, let's say, behavior, might be a result of adopting the feature. And in the second case, if we control for this variable, we would actually control for the mediator.

Exactly. Exactly.

Blocking the flow of information from the treatment to the outcome.

Exactly. To talk, you know, in Pearl's language, it would be, like you said, the mediator, or a blocker, where you'd be biasing your effect estimates down.

One way of thinking about this in a more intuitive way is saying: let's say that this feature is so good that it makes me use the app much more. Now, by controlling for that further activity, I'm actually punishing my treatment. I'm telling it: even within that certain group of users who are highly, highly engaged because of the treatment, you need to show an extra effect. Or, to talk, you know, in mediation terms, you really remove the mediated effect and you only look at the remaining effect, right? So you have the total effect, which you can partition into a mediated effect and a direct effect. So here you'd be looking only at the direct effect. And from a decision-making standpoint, you don't really usually care about that breakdown. You care about the total effect, be it mediated or direct. So it's very important to not control for these activities. And by keeping the time dimension, you are able to overcome that naturally, whereas if you were to collapse it, you'd need to use further methods to understand the directionality, you know, of the flow. So, you know, talking in Pearl's terms, you need to know which way the arrow should be pointed.
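A short simulation with invented coefficients shows the size of that mistake: when post-treatment activity is a mediator, adding it as a control shrinks the estimated effect of adoption from the total effect down to the direct effect.

# Toy simulation (invented coefficients): controlling for post-treatment activity,
# a mediator, underestimates the total effect of feature adoption.
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

adopted = rng.binomial(1, 0.5, n)                    # treatment (randomized here, to keep the example clean)
activity = 1.0 * adopted + rng.normal(size=n)        # post-adoption activity, caused by adoption
conversion = 0.5 * adopted + 0.8 * activity + rng.normal(size=n)   # total effect of adoption: 0.5 + 0.8*1.0 = 1.3

# Regression without the mediator recovers the total effect (about 1.3).
X_total = np.column_stack([np.ones(n), adopted])
print("no control:", np.linalg.lstsq(X_total, conversion, rcond=None)[0][1].round(2))

# "Controlling" for the mediator leaves only the direct effect (about 0.5),
# which is usually not the decision-relevant quantity.
X_direct = np.column_stack([np.ones(n), adopted, activity])
print("controlling for post-treatment activity:",
      np.linalg.lstsq(X_direct, conversion, rcond=None)[0][1].round(2))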

Yeah. In Pearl's framework, we can show that the time criterion is not always enough to exclude spurious relations between variables. What are your thoughts about this in the context of your work?

That is a valid concern. What you're referring to is, for example, what is known as M-bias, where you'd have, say, two users who are connected through a mutual interest.

And you could show that, even though their own preferences precede the actions that they take, in terms of what to read, those preferences actually confound that relationship once you control for them being connected. So it is a possibility. What I would say about this is that keeping the time dimension resolves some of the biases that can arise in data, such as, for example, blockers, or mediators; this is the most prevalent type of bias that sorting variables on time can fix. But in the presence of hidden confounding, you'd still need to further develop, you know, the framework or the model that you're assuming, in order to correctly handle those apparent biases.

If we are in a scenario, by the way, where you assume that there are no hidden confounders, which is oftentimes an assumption that is made, you know, in practical terms, even if no one's really willing to admit it, then sorting by time should solve everything, because in those scenarios, blockers are the only cases where you'd want to avoid controlling for something.

So you say that in the case of M-bias, for instance, if we have this collider node and we control for it, it allows this spurious flow of information. But if we were able to control for those two variables that are connected to the collider, that would be, again, blocked.

Exactly.

And so it would solve the problem.

Exactly, yeah.
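For listeners who prefer code to a drawn graph, here is a toy M-bias simulation with invented coefficients: the treatment has no effect on the outcome, conditioning on the collider alone opens a spurious path, and conditioning on the collider together with its parents closes it again.

# Toy M-bias simulation (invented coefficients). Structure:
#   U1 -> connected <- U2,  U1 -> treatment,  U2 -> outcome,
# and the treatment has NO effect on the outcome.
import numpy as np

rng = np.random.default_rng(5)
n = 500_000

u1 = rng.normal(size=n)                                  # e.g. one user's latent interests
u2 = rng.normal(size=n)                                  # the other user's latent interests
connected = (u1 + u2 + rng.normal(size=n) > 0).astype(float)   # collider: "the two users are connected"
treatment = u1 + rng.normal(size=n)
outcome = u2 + rng.normal(size=n)                        # true effect of treatment on outcome: zero

def coef_on_treatment(*controls):
    X = np.column_stack([np.ones(n), treatment, *controls])
    return round(float(np.linalg.lstsq(X, outcome, rcond=None)[0][1]), 3)

print("no controls:              ", coef_on_treatment())                    # ~0, unbiased
print("control the collider:     ", coef_on_treatment(connected))           # nonzero: spurious association
print("collider plus its parents:", coef_on_treatment(connected, u1, u2))   # ~0 again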

I'm not sure if that would be clear for our listeners. On YouTube, we'll try to draw a little graph for you. You also had a great post on it.

So, thank you. Thank you.

I had a very interesting conversation with Jacob Zeitler recently in Oxford. And Jacob's perspective is that causal assumptions come with a cost.

So, for instance, running a randomized controlled trial allows us, assuming that it's designed properly and carried out properly, to assume that there is no hidden confounding in our data. But this comes with a cost. Another example would be consulting with experts, subject matter experts, and building a complete graph of a problem, complete in the sense of our treatment and outcome, but this might also be very, very costly. Now, on the other side of the spectrum, we might make certain assumptions about the data-generating process, and this, on the other hand, might be risky: we might be wrong with those assumptions. In some cases, we might perform something like partial identification, which can bound the causal effect in a way that is useful for us, or we can perform something like sensitivity analysis, which can tell us: hey, if there is hidden confounding, the confounder should be, I don't know, three or 30 times more powerful than another thing in our model in order to make the effect be zero, for instance. How do you think about your work in the context of this frame of costs and risks?

Yeah, that's a great question. So, I'd say you mentioned all sorts of costs, like, you know, making certain assumptions or consulting domain knowledge experts. And in my mind, it all boils down to the cost of the man-hours that you need to pay. So, for example, making well-judged assumptions about your data might require a lot of man-hours of a highly trained individual, or consulting with a domain knowledge expert might entail a cost, again in man-hours, of a specific expert whose time is very valuable. So to me, it all kind of boils down to the time, in man-hours, that you need to pay in order to get guarantees on the results that you are seeing. And, you know, the cost of an RCT could also be equated with that: you pay that amount of money in order to be able to get those guarantees, and in order to not pay it, you'd need to pay that amount of man-hours through salary, for example. So you can equate those.

So what I ended up doing in my work is addressing those situations where organizations don't have that many resources, and trying to come up with methods and ways to utilize causal inference in those low-resource environments. And, going back to what you said, that means having lower guarantees about what you can say from the data. The silver lining here is that, while you have to be satisfied with a lower degree of guarantees, what you should be thinking of is what type of guarantees you'd have without utilizing those methods. And so I always think about causality not as a binary state, but rather as a spectrum. And as long as you are able to make some estimates somewhat more causal, then that's always better than using, you know, plain correlations, for that matter. And so what I've been most interested in is finding exactly those methods that yield the highest guarantees at the lowest costs.

You and your team were able to help some companies improve their operations in a way that was significant.

Yeah. So, you know, going back to the example I gave earlier, we had a customer who has a golfing app. And he had one of these, you know, features in the app that wasn't getting a lot of traction. And, you know, even intuitively, when you have a feature that doesn't get much traction, it leads you to think that, well, maybe it's just not a good feature. But the situation is that there are many confounding factors here. For example, it's possible that that feature is just located somewhere in the product where it's inaccessible, or it's possible that it's very valuable to a small subset of users.

And so, using the methodology I described earlier, we were able to disentangle all these biases and uncover the reality that that feature had a great potential to drive that customer's KPIs. And so, using the recommendations we gave him, you know, he started pushing that feature ahead in the product, and he made it part of the onboarding process. And by doing that, he was able to really pump, you know, his top-line KPIs by numbers that you're not usually accustomed to seeing. I mean, I personally was a bit surprised by how successful it was, you know, us being so skeptical about models and hidden confounding, et cetera. So, you know, seeing it actually work quite well was a pleasant surprise.

And, you know, we serve many clients, so that story kind of repeats itself in many different scenarios. Part of the reason why I work where I work is to be able to propagate the type of research I do to many companies, so, in a way, to be able to multiply the effect, or the way that these methods affect, you know, decision making in general.

What are the main challenges, in terms of data and expert knowledge, or anything else that is substantial or relevant, for projects like this that you have encountered in your career so far?

Yeah, I love that question. And you know, the challenges that you face tend to be very different from those that people usually associate with applying causal inference.

For one, I find that the biggest challenge to applying causal inference methods, in our case but in general as well, is being able to convince people that they are valuable, because, at the end of the day, organizations already collect data, they already aggregate it in some ways, and they are already using it for decision making. And you need to be able to come up and tell them: I'm going to aggregate your data differently, and that somehow is going to generate a lot of value for you. And this is the main pain point, where you need to be able to somehow convince customers that what you're doing is real, because up until that point it all kind of floats in the air. You can talk about confounding and, you know, tell them stories, but at the end of the day, when they need to make a decision, it can be very challenging to make them realize why what you're saying is true. By the way, part of the way of dealing with that challenge is setting the expectations right. You're not telling them that what you're saying is necessarily true. You're just telling them that there is a higher probability that it's true, or that it's more correct than what they are seeing in their regular data. So it's a lot about setting the expectations right, and kind of comparing yourself not to running an RCT or knowing the truth, but rather comparing yourself to doing regular associations.

And then, in that context, it's easier to show why your work matters or why it's valuable.

Many practitioners talk about this topic, about communication with stakeholders and ways to convey the value of causal methods. What, in your experience, worked best?

Right.

So I think that, at the end of the day, one of the main drivers of trust is your ability to tie back the results that you are showing to the data that you are using. That is the point where, if that connection is made clearly enough, you might be able to convince stakeholders of truths that sometimes might be hard to swallow. So oftentimes, you know, if you tell decision makers, you know, results or recommendations that they already believe, then it's much easier to swallow and they'd be much more willing to cooperate. But then you wouldn't be delivering much value, because in a counterfactual world where you didn't exist, they would do it anyhow. It's those places where you're telling them something that is counterintuitive, or that might even make them look bad, where it's, you know, paramount that you'll be able to convince them. And, well, at least in my experience, no matter how rigorous your method is, it doesn't matter how sophisticated it is or how well it is shown to perform in some simulated scenarios.

If you're unable to tie it back to the data, then you'd have very small chances of convincing them in those, you know, difficult positions. And so, one of the main pillars of every method I develop is that it's completely transparent, in the sense that you can see how every calculation that gives you the final answer is derived from your data, and then they can repeat the calculation themselves and see that, you know, it adds up. And you kind of empower them and give them ownership of the results, because they are now able to reproduce them, recreate them, and believe them. That's kind of the main line of thought there, in addressing, you know, the need to convince in those tough decision points.

Is the future causal?

Yes, definitely. But like I said earlier, I think it's very prudent to treat causality as a spectrum. So I definitely see the future as being more causal. I think that the way that causality will be adopted across organizations will depend a lot on the resources that they have at hand. The ones with more resources will tend to use state-of-the-art methods to answer big questions with high guarantees, while the smaller players would probably adopt some sort of, say, automation or services, you know, like SaaS solutions, that would enable them to tap into causal inference with lower guarantees, because, again, you're lacking domain knowledge. You don't have experts that are sitting on it, but you will be able to derive more causal conclusions. So definitely, I think that there is a lot going on now for causal. I mean, when I started out, you know, in 2017, it kind of started to take off with the ACIC conferences, where they would do these data hackathons, like competitions, and awareness started to build slowly. But, intriguingly enough, I think that the rise of LLMs really helped: you know, the way that they behave like people and kind of give you the sense that machines can be ultra intelligent kind of opens the door for people believing in things that they might find hard to understand initially, thanks to these latest developments. So I do see causal inference as a field also starting to pick up and gain steam, you know, in the community, and not only in academia, where it's been hot for a few years now, but also in the industry itself. More and more stakeholders and business leaders feel the need to invest in causal inference and understand its value. So definitely.

People starting with causality, or even starting with machine learning in general, sometimes feel a little bit overwhelmed by the amount of information that they need to learn in order to make those things work. In your life, you tried many different things, and you are successful in many of them. What would be your advice to people who are just starting?

Right. That's a good question. I still remember the sense of awe when I started reading those papers about causal inference back in the day and not knowing where even to start. So one important answer is: don't try to take shortcuts, especially in the age of LLMs and ChatGPTs. It's very tempting to just go ahead and ask the machine for the answer. But causal inference requires a very deep level of understanding, which I think can only come through very meticulous study of the field. Now, as to how to go about studying that field, I guess different people have different things working for them, but I can say that, at least for myself, what worked best was that I started my own blog about causal inference.

It was back in 2019. And no one around me was talking about causal inference. I had no one to consult on that topic, and no one was even asking me at work to apply it. But I realized back then that I needed to understand that field. And by just reading those papers, I felt like I only gained a very shallow understanding of what's going on in there. And if someone were to ask me about them, I would probably be able to give an answer that's somewhat equivalent to what ChatGPT does. I mean, it sounds about right, but if you need to apply it, then you are going to be lost. So since I wasn't doing it at work, and I guess that for many people that would be the scenario, right? Because it's kind of like the chicken and the egg: you don't know how to do causal inference, so no one hires you to do it; but if you are not hired to do it, then you don't learn how to do it. So one intermediary step is writing, for example, a blog post, or giving lectures. Whenever you do one of these activities, you have to gain a much deeper understanding of the subject matter in order to, you know, successfully deliver that kind of content.

So you read the paper, and then you try to explain it and you try to be as specific as possible, for example, doing simulation analysis and, you know, running specific examples on your own. Only then does it really start to sink in, and you start to really understand it and see how it might behave in settings different from those in the paper you just read, for example, or how it relates to other papers, or what the limitations are that are not mentioned in the paper but that, just from your experience, you come to realize. So that would probably be it: find a way, whether it's writing blog posts, doing presentations, a newsletter, whatever, to start and get yourself acquainted with the field. And then the next step is to start working in it. Now, that one can be a bit tough. But if you're able to find a job where you are expected to perform such types of analysis and that type of skill is sought after, then you should definitely go for it. Even if it sounds like a demotion at some point, it's probably worth it, because that's by far the best way to get into any kind of domain.

Also causal inference, for that matter.

Who would you like to thank?

Oh, well, that's an easy one. So I'd like to thank Elad Cohen. He was my team leader at Viasat, the first company I worked for, you know, as an adult, as a data scientist. And what I'd like to thank him for is, first off, being a great friend, but also imbuing me, or kind of educating me, from the first day of work, about the value of producing value for the business. And, you know, we analytics people, at the end of the day, we take data, we scramble it some, and then we give it to someone else. And you always need to be super aware of what kind of value you are bringing to the company, and always strive to be able to, you know, show that value and pursue it directly, and always prioritize it over other stuff, like, for example, the shiniest algorithm out there, or, I don't know, job titles or whatever. Always look for the value, and then, from there, you know, go back and think about what you should be doing right now to achieve that value. And so he was always laser focused, and that's something, you know, I carry with me till now. I mean, always thinking about the things I do and how they impact the business. And if I can't answer that question clearly, then I know I'm doing something wrong and I should probably rethink it.

As a child, you used to play the cello, and then you went to study political science. Is there anything in those experiences that you can translate, or that you actively use today in your work, or something that helps you in your work?

Well, yeah, definitely. I mean, each one of them in a different way. I think that you probably know this too: learning to play an instrument is both fun, kind of like our job can be fun, but it's also tedious and requires tons of concentration. And so what my music studies gave me, that I keep with me to this day, is the ability to focus and concentrate for very long hours, you know, on specific tasks that I want to achieve. Just like back then, when I would try to nail that, you know, amazing part in the piece, and you'd have to do it repetitively for hours, today I would sit in front of an article and try to piece it together and break it apart, part by part, in order to really make sense of what's going on there. So that's for the music part.

And then, for the political science, I think that the years spent in university studying political science, and later statistics and economics, really cultivated my awareness of what's going on around us, and caring about, or wanting to make, positive change in people's lives. And it might not seem highly related at first to what we do, but when you think about it, decision making makes up a large part of the reality that surrounds us. Basically, humans shape nature, and every decision that they are making affects our lives. And we all know that data-driven decision making is important, but I feel that we still have a very long way to go in doing data-driven decision making better. And so when I help companies and decision makers do that process of data-driven decision making better, I feel like I do contribute to the reality around us, to the people. And so, in my own small way, I am also helping advance that front, you know, little by little.

What question would you like to ask me?

Tons of questions. I'll tell you one that's actually been with me for a while. I would really like to ask you what advice you would have for folks who want to take their professional occupation and be able to, well, share it with large audiences of people. So, you know, in my case, I feel like I've been able to collect a lot of knowledge around causal inference, product analytics, and statistics. And I actually already have thoughts about, you know, putting it down into some form, like text or whatever. And just like in the days when I was thinking about making my first steps in causal inference, it seems daunting trying to, you know, go out there, build an audience, and communicate those realizations with them in a way, you know, that resonates with them and activates them. So, it was a long question, but if you have like one or two tips on how to get started, because how to do it, that's probably a long talk, but like, what's the thing I should do tomorrow to get me kickstarted?

That's a great question. Just start tomorrow. I think another important thing is to understand why you're doing the stuff you're doing. So, for me, I learned a lot from the internet, just thanks to many people spread around the world who are willing to share their experience and their expertise with others, anonymously, often for free, and it just immensely helped me change my life. And I have a ton of gratitude towards those people, towards the community in general. So it is very natural for me to just go there and do stuff. And no, my book is not free, but this podcast is free. My blog posts are free. My LinkedIn content is free. Some other content that I will create won't be free, but it will allow me to build more free content for those people who are not in a place today where they can afford something that costs some amount of dollars, right?

Right. That's really helpful. I think that "just do it" is very to the point.

The first step is: just do it, and do it today or do it tomorrow. Don't wait. I think that's the most important part.

Thanks. Appreciate it.

What resources would you recommend to people just starting with causality?

Well, besides the book you just released, and I don't mean that, you know, as a pun, it really is, I mean, it's written very well for beginners.

Aside from texts like yours, which are great introductions, there are a couple of books and resources that I've written about in the past, so I'll be sure to also, you know, write about them on LinkedIn soon. But one of them is Judea Pearl's often less mentioned primer, Causal Inference in Statistics. It's a small booklet, very to the point, so for anyone wanting to get into DAGs real quick, I think that's the one you should be reading. There is a very good paper called "A Tale of Two Cultures," or something along those lines; I'm sure you'll upload the link later, but it talks about, you know, the difference between the statistics community and the computer science community, and I feel it teaches you a lot about different mindsets of analytics and how we should all be a bit of both. I think these are the main ones. I mean, to be honest, there are tons of resources out there, so I think that beyond these, you'll probably go ahead and drill down on the specific materials that relate to your specific use cases.

So I hope that helps.

What's your message to the Causal Python community?

I think that many of the members are probably already aware of how challenging it can be to justify the usage of causal inference methods, and the results that they entail, over, you know, simpler methods. And my message to them would be that it's very important to do science correctly, and for me, this is doing science. When we go and use data to drive recommendations, if we are not doing it correctly, if we don't handle confounding, if we remove the time dimension without accounting for it correctly, we could end up hurting the business instead of helping it. So it's very important that you guys persist on your journey to become better practitioners. And remember that it matters. It's not just a formality, where you should be applying causal inference methods because that's proper. You should do it because it will help you drive better decision making and, ultimately, help your business, or whatever entity you work for, thrive.

Where can people find out more about you and learn more about your work, where can they find your blog, where can they connect?

Well, first off, LinkedIn. It's a place where I roam quite a bit, so anyone, feel free to reach out; I'd love to connect and chat. I do, as you briefly mentioned, write a series of blog posts. I try to maintain it every few months, but right now you can go check it out at iyarlin.github.io, and you'll find there a couple dozen blog posts I wrote over the years. That's about it for now. You know, you do inspire me to try and increase my reach, so hopefully you'll see me in more places.

Iyar, thank you. It was a pleasure.

Me as well. And I'm so happy that we were able to meet in this beautiful place today.

Me too. It means a lot to me.

Congrats on reaching the end of this episode of the Causal Bandits Podcast. Stay tuned for the next one. If you liked this episode, click the like button to help others find it. And maybe subscribe to this channel as well. You know, stay causal.
