# Causal Bandits Podcast

Causal Bandits Podcast with Alex Molak is here to help you learn about causality, causal AI and causal machine learning through the genius of others.

The podcast focuses on causality from a number of different perspectives, finding common ground between academia and industry, philosophy, theory and practice, and between different schools of thought and traditions.

Your host, Alex Molak, is an entrepreneur, independent researcher and best-selling author who decided to travel the world to record conversations with the most interesting minds in causality.

Enjoy and stay causal!

Keywords: Causal AI, Causal Machine Learning, Causality, Causal Inference, Causal Discovery, Machine Learning, AI, Artificial Intelligence


# Causal Inference, Clinical Trials & Randomization || Stephen Senn || Causal Bandits Ep. 012 (2024)


Video version available on YouTube

**Do We Need Probability?**

Causal inference lies at the very heart of the scientific method.

Randomized controlled trials (RCTs; also known as randomized experiments or A/B tests) are often called "the gold standard for causal inference".

It is less well known that randomized trials have their limitations in answering causal questions.

What are the most common myths about randomization?

What causal questions can and cannot be answered with randomized experiments? Finally, why do we need probability?

Join me on a fascinating journey into clinical trials, randomization and generalization.

Ready to meet Stephen Senn?

**About The Guest**

Stephen Senn, PhD, is a statistician and consultant specializing in clinical trials for drug development. He is a former Group Head at Ciba-Geigy and has served as a professor at the University of Glasgow and University College London (UCL). He is the author of "Statistical Issues in Drug Development," "Crossover Trials in Clinical Research," and "Dicing with Death".

Connect with Stephen:

- Stephen on Twitter/X

- Stephen on LinkedIn

- Stephen's web page

**About The Host**

Aleksander (Alex) Molak is an independent machine learning researcher, educator, entrepreneur and a best-selling author in the area of causality.

Connect with Alex:

- Alex on the Internet

**Links**

Find the links here

**Causal Bandits Team**

Project Coordinator: Taiba Malik

Video and Audio Editing: Navneet Sharma, Aleksander Molak

#causalai #causalinference #causality #abtest #statistics #experie

Should we build the Causal Experts Network?

Share your thoughts in the survey

**Causal Bandits Podcast**

Causal AI || Causal Machine Learning || Causal Inference & Discovery

Web: https://causalbanditspodcast.com

Connect on LinkedIn: https://www.linkedin.com/in/aleksandermolak/

Join Causal Python Weekly: https://causalpython.io

The Causal Book: https://amzn.to/3QhsRz4

**Stephen Senn:** There will be two men falling ill for every woman who falls ill. So how are you going to recruit them into the clinical trial? There are many deep, wonderful, and complex theories of optimal experimental design that we medical statisticians ought to know better. And wait till we meet probability.

**Marcus:** Hey, Causal Bandits.

Welcome to the Causal Bandits podcast, the best podcast on causality and machine learning on the internet.

**Jessie:** Today, we're traveling to Edinburgh, Scotland, to meet our guest. He loves hiking and occasionally posts a picture of a beer on his Twitter. He's passionate about the history of statistics and knows a lot about randomization.

The author of "Statistical Issues in Drug Development" and "Dicing with Death". Ladies and gentlemen, please welcome Dr. Stephen Senn. Let me pass it to your host, Alex Molak.

**Alex:** Welcome to the podcast, Stephen.

**Stephen Senn:** Thank you!

**Alex:** Before we start, I wanted to thank you, because your tweets and then your book were very helpful for me when I was writing my book.

**Stephen Senn:** Oh, okay. I'm surprised because I know nothing about Python at all and not very much about causal inference either. But there we are.

**Alex:** I would not agree with the second one, but I understand your perspective on this. In fact, there's a quote from your book that appears both in this book...

**Stephen Senn:** Ah, okay.

**Alex:** ...and in this book. And the causal relationship is that this quote was in your book, and then I cited you.

**Stephen Senn:** Okay, good.

**Alex:** On one of the pages here, you write that there's no point beating about the bush: randomization is a controversial idea, and it's still controversial among statisticians and scientists even today. What are the most controversial ideas about randomization that you expect most people are not aware of?

**Stephen Senn:** I think that the criticism comes from a misunderstanding. The misunderstanding is that the goal is perfect estimation. But R. A. Fisher, who proposed randomization, realized that perfect estimation was impossible, but that maybe one could estimate how imperfect any estimate was. And to be able to estimate how imperfect it was, you needed randomization.

So the critics start from the following position: there are hundreds and hundreds of things which could affect an outcome in a clinical trial. Once you randomize, it's obvious that they're not all going to be perfectly balanced. And since they're not all perfectly balanced, there will be some bias in the particular estimate that you produce.

But what they don't understand is that this is accepted. It's known that this is the case. What happens is that these factors also contribute to our estimate of error. I often put it like this: if we knew that all of the particular factors in a clinical trial were perfectly balanced, then the standard analysis would be wrong, because the standard analysis allows for the fact that they will not be perfectly balanced.

And it's the failure to understand this which leads to endless, tedious conversations in which people don't really know what's going on.
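The point that chance imbalance is already priced into the standard error can be illustrated with a small simulation. This is a hypothetical setup, not Senn's analysis: one strongly prognostic covariate that the analysis ignores, normal noise, a true effect `TRUE_EFFECT`, and a normal-approximation 95% interval are all assumptions. In every simulated trial the covariate is imbalanced to some degree, yet the usual interval should still cover the true effect close to 95% of the time.

```python
import random
import statistics

TRUE_EFFECT = 1.0   # hypothetical true treatment effect
N_PER_ARM = 100     # hypothetical number of patients per arm
N_TRIALS = 2000     # number of simulated trials


def simulate_trial(rng):
    """Randomize 2 * N_PER_ARM patients; return (estimate, se, covariate imbalance)."""
    # A prognostic covariate that strongly affects the outcome but is ignored in the analysis.
    covariate = [rng.gauss(0.0, 2.0) for _ in range(2 * N_PER_ARM)]
    # Complete randomization with equal arm sizes.
    arms = [0] * N_PER_ARM + [1] * N_PER_ARM
    rng.shuffle(arms)
    treated, control, x_t, x_c = [], [], [], []
    for x, arm in zip(covariate, arms):
        y = x + rng.gauss(0.0, 1.0) + TRUE_EFFECT * arm
        if arm:
            treated.append(y)
            x_t.append(x)
        else:
            control.append(y)
            x_c.append(x)
    estimate = statistics.fmean(treated) - statistics.fmean(control)
    # The outcome variances include the covariate's variation, so the SE allows for imbalance.
    se = (statistics.variance(treated) / N_PER_ARM
          + statistics.variance(control) / N_PER_ARM) ** 0.5
    imbalance = statistics.fmean(x_t) - statistics.fmean(x_c)
    return estimate, se, imbalance


def coverage(seed=2024):
    """Share of simulated 95% intervals covering TRUE_EFFECT, plus the worst imbalance seen."""
    rng = random.Random(seed)
    hits = 0
    worst_imbalance = 0.0
    for _ in range(N_TRIALS):
        est, se, imb = simulate_trial(rng)
        if est - 1.96 * se <= TRUE_EFFECT <= est + 1.96 * se:
            hits += 1
        worst_imbalance = max(worst_imbalance, abs(imb))
    return hits / N_TRIALS, worst_imbalance


if __name__ == "__main__":
    cov, worst = coverage()
    print(f"95% CI coverage: {cov:.3f}; largest covariate imbalance: {worst:.2f}")
```

Individual trials can be noticeably imbalanced, which biases that trial's estimate, but the interval width already reflects this, which is the "expansion joint" in the bridge analogy that follows.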

**Alex:** Some people say or expect that randomization will balance the groups. So if we have a binary treatment, for instance, by randomizing the treatment assignment we will get groups that are somehow comparable, but this is not necessarily true.

**Stephen Senn:** No, it's certainly not true. In fact, they won't be comparable in that sense. They won't be perfectly balanced; it's obvious that they won't be. But to give another analogy, which is perhaps helpful here, I sort of imagine a conversation between a mathematician and an engineer. The mathematician says to the engineer, "Well, actually, the problem with the bridge that you've built is that in hot weather the bridge will expand, and then you're going to have trouble, because you've calculated the bridge in such a way that it fits over the river, but thermal expansion will lead to problems." And the engineer says, "Yes, I know that. That's why we have expansion joints." And the mathematician then says, "Yes, but you're ignoring the fact that the bridge expands."

He's not listening to what the engineer says. What I tell the philosophical critics is that I've already allowed for the fact that the factors will not be perfectly balanced. And I can demonstrate this by showing you how narrow the confidence interval would be for a crossover trial compared to how broad it will be for a parallel group trial. In fact, I have some data from a single trial in which some of the data can be analyzed as if they came from a crossover trial, because the same patient was treated more than once, and some of the data as if they came from a parallel group trial, because different patients were also treated in different ways as regards other treatments. You can look at both analyses, and you can see that the confidence interval is much, much wider for the parallel group comparison than for the crossover comparison.

In the crossover trial, we're balancing for 20,000 genes and all life history to date, because each subject acts as their own control, and this leads to much narrower confidence intervals. In the parallel group trial, we aren't doing that. But you can't tell me that my estimate is wrong, because that's not the game I'm in.

I know my estimate is wrong. The question is, how wrong can it be? And I've told you how wrong it can be probabilistically.
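The precision gap between the two designs can be sketched with a toy simulation. Everything here is an assumption for illustration: a patient-level effect (standing in for genes and life history) with standard deviation `BETWEEN_SD`, smaller occasion-to-occasion noise `WITHIN_SD`, and no period or carry-over effects. The patient-level term cancels in the within-patient difference of a crossover but not in a parallel group comparison.

```python
import random
import statistics

TRUE_EFFECT = 1.0  # hypothetical effect of B relative to A
BETWEEN_SD = 3.0   # hypothetical patient-to-patient spread (genes, life history, ...)
WITHIN_SD = 1.0    # hypothetical occasion-to-occasion noise within one patient
N = 50             # hypothetical: patients per arm (parallel), patients in total (crossover)


def parallel_se(rng):
    """SE of the B-minus-A difference when each patient receives only one treatment."""
    def outcome(on_b):
        return rng.gauss(0.0, BETWEEN_SD) + rng.gauss(0.0, WITHIN_SD) + TRUE_EFFECT * on_b
    arm_a = [outcome(0) for _ in range(N)]
    arm_b = [outcome(1) for _ in range(N)]
    return (statistics.variance(arm_a) / N + statistics.variance(arm_b) / N) ** 0.5


def crossover_se(rng):
    """SE of the mean within-patient B-minus-A difference in a two-period crossover.

    Period and carry-over effects are ignored in this toy model, so the randomized
    order (AB or BA) does not change the difference and is not simulated.
    """
    diffs = []
    for _ in range(N):
        patient = rng.gauss(0.0, BETWEEN_SD)  # shared by both periods, so it cancels
        y_a = patient + rng.gauss(0.0, WITHIN_SD)
        y_b = patient + rng.gauss(0.0, WITHIN_SD) + TRUE_EFFECT
        diffs.append(y_b - y_a)
    return statistics.stdev(diffs) / N ** 0.5


def compare(seed=12):
    rng = random.Random(seed)
    return parallel_se(rng), crossover_se(rng)


if __name__ == "__main__":
    par, cro = compare()
    print(f"parallel SE ~ {par:.2f}, crossover SE ~ {cro:.2f}")
```

With these made-up variances the crossover standard error should come out roughly three times smaller, even though the crossover uses half as many patients. The ratio depends entirely on how large between-patient variation is relative to within-patient variation.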

**Alex:** Can you quickly define, for those in our audience who are not experts in experimental design, the differences between crossover trials and parallel group trials?

**Stephen Senn:** So the parallel group trial is probably the most common one. That's where you randomize patients to receive either the new treatment or a standard treatment, so every patient gets one or the other. These are usually relatively easy to implement, but it would be nice if we could actually have each patient be their own control.

And for certain diseases only, diseases which are chronic and where the treatments are essentially reversible, we can do this. So, for example, in asthma, which is a long-term chronic condition, we could compare two beta agonists, two drugs designed to help your lungs to expand and breathe more easily.

We could compare them by treating the patients for a period with one beta agonist and then for another period with the other beta agonist, and here we would randomize the order in which they're treated. In a parallel group trial instead, which we might do, for example, in stroke prevention, half of the subjects would be randomized to receive the new treatment to reduce the risk of having a stroke, and half would get the standard treatment.

That's a parallel group trial.

**Alex:** Some years ago, you released a presentation called "Seven Myths of Randomization". In it, you discuss, I think, five different questions that we can either answer with randomized trials or that we sometimes hope to answer with them.

Can you tell us a little bit more about the idea behind this presentation and what motivated you to prepare this material?

**Stephen Senn:** Yeah. I think the first statement of that was a little before "Seven Myths of Randomization". It was actually in a paper called "Added Values", and the five questions are as follows.

I'll give you the questions first and then the motivation. Question number one is: was there a difference between treatments in the trial? In other words, was there an effect of treatment, to use the language statisticians use? The second question is: what was the average effect for the patients that were actually treated?

What difference did it make to them if they were given treatment B rather than treatment A? The third question would be: was the effect the same for all the patients in the trial? And the fourth question, which would follow if the answer to question three was no, we have evidence that it's not the same, would be: is it different for particular subgroups?

Can we say, for example, that it's different for men than for women, or for the elderly than for the young, or whatever? And the fifth question, which is the most ambitious, is: what will it be like in future patients not in the trial? Can we say anything about patients outside the trial? So these are the five levels of question, and they gradually get more difficult. My motivation was partly that I think statisticians were not always being honest with their clients: they were selling what they did as being more capable of answering all the questions than it was. Strictly speaking, I think most of what we do is stuck at question one and question two.

Occasionally we're able to answer question three, whether the treatment effect is the same for all patients. And sometimes we have enough power in the trial to say something meaningful about subgroups, but not very often. But the fifth one really involves a degree of uncertainty which is not captured in the formal statistical analysis.

And we could always be proved wrong about this. We can't know what the future will hold. A good example is the first clinical trial, or the first one that's really cited as a pioneer trial, which is Bradford Hill's streptomycin trial in tuberculosis. That trial showed, I think beyond a shadow of a doubt, that the patients given streptomycin were, on average, better off at the time the trial reported than those who were not.

It's pretty convincing. But if you were to ask, is streptomycin still effective for tuberculosis? The answer is no. Why? Well, because tuberculosis has mutated as a disease. We now have resistant forms of tuberculosis, and we no longer have the terrific treatment for tuberculosis that we thought we had then.
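Question one, whether there was a difference between treatments in this trial, is the question randomization answers most directly. A minimal sketch of Fisher's randomization test, with made-up outcomes for ten patients: under the sharp null that treatment changed no one's outcome, every patient's result is fixed, so we can recompute the difference in means for every allocation the randomization could have produced. Note that this says nothing about question five, future patients outside the trial.

```python
from itertools import combinations
from statistics import fmean


def randomization_test(treated, control):
    """Exact two-sided randomization test of the sharp null of no effect for any patient."""
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    observed = fmean(treated) - fmean(control)
    as_extreme = 0
    n_splits = 0
    # Enumerate every way the randomization could have chosen the treated group.
    for idx in combinations(range(len(pooled)), n_treated):
        chosen = set(idx)
        t = [pooled[i] for i in idx]
        c = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance so exact ties are not lost to floating-point rounding.
        if abs(fmean(t) - fmean(c)) >= abs(observed) - 1e-12:
            as_extreme += 1
        n_splits += 1
    return observed, as_extreme / n_splits


treated = [7.1, 6.8, 8.0, 7.5, 6.9]   # hypothetical outcomes on the new treatment
control = [5.9, 6.2, 5.5, 6.4, 6.0]   # hypothetical outcomes on the standard treatment
diff, p = randomization_test(treated, control)
print(f"difference = {diff:.2f}, p = {p:.4f}")
```

Because every treated value here exceeds every control value, only the observed split and its mirror image are as extreme, so p is 2/252 (about 0.008). The only probability model used is the one created by the randomization itself.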

**Alex:** So the fifth question is essentially a counterfactual question. If we want to address it, we need to know more than just the information we are able to obtain from a single trial.

**Stephen Senn:** I'm not so sure. Maybe, but maybe not. My understanding is that counterfactuals are also involved, or could be involved depending on one's point of view, in the answers to the first two questions, question one and question two.

Because in a parallel group trial, for example, the patients who are given the control treatment are somehow substituting for the fact that we cannot observe the counterfactual in the patients who are given the new treatment. So each group acts as a sort of counterfactual group for the other. If you like, it's a way of estimating what the counterfactual might be.

You don't necessarily have to think of it that way, but I'm not opposed to thinking of it that way myself. No, I think that the business about the future, question five, is a different one: we don't know how the world will change. Now, we might like to think of that in terms of counterfactuals, but it's not as if we're looking at what might have happened differently, or how we can estimate what happened differently.

This is just the fact that the future holds many surprises for us, and we can't predict what they will be. Things may change. Just to make another point, I think a lot of people are fixated on the idea that genetics is the only thing that matters. And therefore, if only we can find some reasonable way of classifying patients, whether it's classical genetics or even epigenetics, which is more complicated, we will then be able to say everything about future patients.

But I think that's misleading. There are lots of other things that change in the environment all the time that may be affecting humans, for all we know.

**Alex:** My thought here was that it's a counterfactual question in the sense that we need to understand the mechanisms of change in order to say something about future outcomes.

**Stephen Senn:** Maybe, maybe that's so; I don't know. I mean, there are different views of causation which might succeed, if I understood them thoroughly enough, in convincing me that I have to think of counterfactuals that way. But I would think of it more as a sort of prediction using theory, and obviously our theories are fallible. Even in physics, physicists have had to tear up theories that they had and replace them with new, more sophisticated theories or alternative versions.

And perhaps the old theories were then found to be approximations to the new theory, and so forth. And obviously there are many, many things about the human organism we don't understand. So biology is even more hypothetical, if you like, in that way. We don't always really know what's relevant.

But nevertheless, we're trying to make predictions with theory, and we just don't know what the future holds for us. Maybe one would be better off thinking of it in terms of history. We don't know what history holds for us, because everything is terribly contingent, and we don't know how contingent our predictions are.

**Alex:** Contingent?

**Stephen Senn:** Contingent in the sense of depending upon things that we would like to know but don't know. We don't know what the nub, the essence of the thing, is. This is also a problem in general when we look for effect modifiers. As I often point out to people, we assign patients their treatments, but we don't assign them their covariates.

So we might think in a particular clinical trial that what's really important is the sex of the patient. But it's not the sex, it's the weight: we overlooked that the women are on average lighter than the men, and once we condition on weight, it explains everything between the two sexes. I'm not saying this is necessarily the case, but it's a possibility. And so if we were making predictions based on sex, these predictions would fail when we move from a population in which the women are all very, very slim to one in which there are many, many obese women who are perhaps even heavier than some of the males in the original clinical trial that we ran.

So we don't know this, it's difficult to predict this, and we have to have some ideas. So we can't escape theories, in a way. And we can't sample from the future; we can't know what the future population will be like. Yet it's obvious that populations change over time. You only have to look at the life expectancy of different generations of Americans.

The children of a previous generation of Americans, or British, or Swiss, or whatever, had a longer life expectancy than their parents. And it wasn't the genes that were different. What changed?
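The sex-versus-weight example can be made concrete with a deliberately simple toy model. All numbers are hypothetical, and, unlike in a real trial, individual treatment effects are treated as known so that the logic is isolated: the true benefit depends only on body weight, but in the trial population the women happen to be lighter. A summary by sex looks like effect modification by sex and transfers badly to a population of heavier women; a summary conditioned on weight transfers correctly.

```python
EFFECT_PER_KG = 0.1  # hypothetical: the true benefit depends on body weight only


def true_effect(weight_kg):
    return EFFECT_PER_KG * weight_kg


def mean(xs):
    return sum(xs) / len(xs)


def effect_by_sex(population):
    """Average benefit per sex: the 'sex as effect modifier' summary."""
    return {sex: mean([true_effect(w) for w in ws]) for sex, ws in population.items()}


def fit_slope(population):
    """Least-squares slope through the origin of benefit on weight: the 'condition on weight' model."""
    pairs = [(w, true_effect(w)) for ws in population.values() for w in ws]
    return sum(w * e for w, e in pairs) / sum(w * w for w, _ in pairs)


# Hypothetical body weights (kg): in the trial the women are lighter than the men...
trial = {"female": [55, 60, 65], "male": [75, 80, 85]}
# ...but in the future population the women are heavier.
future = {"female": [80, 85, 90], "male": [75, 80, 85]}

by_sex = effect_by_sex(trial)   # carry the result over by sex
slope = fit_slope(trial)        # carry the result over by weight
actual = effect_by_sex(future)

print("women, predicted from sex:   ", round(by_sex["female"], 2))
print("women, predicted from weight:", round(slope * mean(future["female"]), 2))
print("women, actual:               ", round(actual["female"], 2))
```

The sex-based carry-over predicts an average benefit of 6.0 for the future women, while the weight-based one recovers the actual 8.5. Both summaries fit the trial equally well; only knowledge of the mechanism (a theory) tells you which one to carry forward.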

**Alex:** In practice, how do you deal with this when you design clinical trials and think about providing your sponsors, as you call them in this field, with information that will be helpful for developing a product and helping the patient?

**Stephen Senn:** Yes. I suppose the motto is that we try to walk well before we learn how to run. So generally, what we're trying to do is answer question one and question two. It's incredibly hard to produce precise answers to what actually happened on average to the patients in a clinical trial, primarily because, as I say, either the counterfactual person is missing (the alternative reality that there would have been for this individual) because it's a parallel group trial, or, even in a crossover trial, the circumstances will change from occasion to occasion. We know that this is true, by the way, because there are some rare crossovers in which the patients have been randomized to receive A and B, then B and A, and then A and B.

And we actually know that the difference between B and A changes from cycle to cycle. It changes in a way which we suppose is random, in some way that we can't see. So there's always this missing element, and it's quite hard to establish that the treatment was effective. But we're also working at the limits of what we think is possible in terms of resources, financially and ethically.

I often put it like this. Suppose in a particular clinical trial in lung cancer we discovered a significant, a really clinically significant, not just statistically significant, benefit in terms of survival for patients. But then we discovered a problem with this trial, and it's not so surprising, because lung cancer has been, until recently anyway, predominantly, though certainly not entirely, a male disease, with more males than females.

We discovered that although we can give a pretty firm answer looking at the males, the proportion of females is just not large enough to say something separately for the females. Now imagine that you are a female lung cancer sufferer. What do you want? Do you want someone to come and say, "Well, you know, actually we can't register this drug for women.

"It's plausible it would work for women, since it worked for men, but we can't register it for women because we haven't studied enough women. So you're going to have to wait. In the meantime, male lung cancer patients, yes, certainly you can have the treatment, but you female ones, you can't have the treatment."

And in the end, we can't escape making these leaps of faith that we have to make.

And so I would say that, whether we like it or not, we frequently can't answer even question three or question four adequately, never mind question five. We cannot say anything meaningful about subgroups, but we have to behave as if it doesn't matter, because in any case any attempt to answer these questions competes with other projects, and the other projects involve new molecules which might be even better than the current molecule, which is apparently the best around and which we're raising question marks about.

So you simply don't have the time to do all of these things.
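Why subgroups are so much harder can be seen from the textbook standard error of a difference in means, sigma times the square root of (1/n1 + 1/n2). The sketch below assumes a hypothetical 900-patient trial randomized 1:1, a common outcome standard deviation `SIGMA`, and two men for every woman, echoing the episode's opening remark; none of these numbers come from a real trial.

```python
import math

SIGMA = 1.0          # hypothetical common outcome SD
TOTAL = 900          # hypothetical trial size, randomized 1:1
WOMEN = TOTAL // 3   # two men for every woman
MEN = TOTAL - WOMEN


def se_difference(sigma, n_per_arm):
    """Standard error of a difference in means between two arms of equal size."""
    return sigma * math.sqrt(2.0 / n_per_arm)


se_overall = se_difference(SIGMA, TOTAL // 2)
se_women = se_difference(SIGMA, WOMEN // 2)
se_men = se_difference(SIGMA, MEN // 2)
# Is the effect in women different from the effect in men? (a treatment-by-sex interaction)
se_interaction = math.sqrt(se_women ** 2 + se_men ** 2)

print(f"women-only SE is {se_women / se_overall:.2f}x the overall SE")
print(f"interaction SE is {se_interaction / se_overall:.2f}x the overall SE")
```

This prints 1.73x and 2.12x. Since the required sample size scales with the square of the standard error, matching the overall precision would need about 3 times as many patients for a women-only estimate and about 4.5 times as many to test whether the effect differs by sex.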

**Alex:** I understand that what you're saying is that there are a number of practical considerations that we need to take into account when we think about human health and human life. And we don't always have the luxury to collect enough data, or to be congruent with the theory to the level that we might imagine or expect as researchers who do not have to deal with the practical stuff.

**Stephen Senn:** I think it's a mistake to imagine it's only a problem for clinical trials. I think the problem is that too many people have in mind sociological investigations, which might involve surveys (what do people think, what do people believe about something) and which involve a sampling paradigm.

And experimentation, I think, is not like that. There are many cases where we cannot actually experiment on exactly the material we want. It's no coincidence, in my opinion, that one of the most important early books on survival analysis was Jerry Lawless's book, and that, if I understand correctly, he worked on survival analysis applied to the automobile industry.

Now, what's survival analysis applied to the automobile industry? Well, the thing is, you want to know how long a car will last before you need to replace piece X, Y or Z. But suppose you want to have a warranty for a car for 10 years. You design the car, you build the car; are you going to wait 10 years before you sell it?

Is that what you're going to do? Would that be sensible? In 10 years, there'll be new models on the market. No. So what you do is accelerated life testing. You have to simulate that car being driven all the time, by putting it on a machine to drive it or something like that. You really have to put it under stress testing, and then you have to extrapolate from the stress testing to somewhere else.

That's what you have to do in order to be able to use these things. So it's not that unusual, and this is an example I've given you from the physical sciences. I think we've sort of lost the history of experimentation. There's a long, long history now, a hundred years at least, of a formal theory of experimentation, which was dealing with all of these sorts of matters in a formal way but using highly abstract models of experiments. And this was a causal program, in a way. It may be a causal program that we've lost sight of.
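The simplest part of accelerated life testing, compressing calendar time by running a test rig continuously, is plain arithmetic. The mileage and rig figures below are hypothetical. Extrapolating from harsher-than-normal stress (heat, load) to ordinary use additionally requires an acceleration model, which is exactly the kind of untestable extrapolation step Senn is describing.

```python
def rig_days(years, km_per_year, rig_speed_kmh, rig_hours_per_day):
    """Days of rig testing needed to accumulate the mileage of a full warranty period."""
    total_km = years * km_per_year
    return total_km / (rig_speed_kmh * rig_hours_per_day)


# Hypothetical figures: 10-year warranty, 15,000 km per year, rig at 80 km/h for 20 h/day.
days = rig_days(years=10, km_per_year=15_000, rig_speed_kmh=80, rig_hours_per_day=20)
acceleration = (10 * 365) / days  # calendar days of normal use per rig day
print(f"{days:.2f} rig days, roughly {acceleration:.0f}x faster than real use")
```

With these assumptions, 150,000 km take 93.75 rig days, about 39 times faster than normal driving. Time compression alone only speeds up wear that depends on distance driven; ageing that depends on calendar time is not accelerated this way.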

**Alex:** What can the industrial community learn from clinical trials?

**Stephen Senn:** Well, I was thinking more the other way around. Let me go back and tell you a bit about the history of experimental design and how it went. I think it really starts with R. A. Fisher arriving at Rothamsted in 1919. He soon started to work on experiments, and these experiments were quite complex.

You had fields where it was known that there were fertility gradients, and it was also known that there was some sort of correlation structure: the closer two pieces of soil were to each other, the more likely it was that the yield would be similar. And what you had to do was try to compare treatments in this particular experiment.

And you could do complicated things: you could vary nitrogen and phosphate at one level, and then maybe vary the variety of wheat that you plant at another level. So the actual experimental units, blocks, plots and subplots, were of different sizes. And, interestingly enough, Fisher got his first analysis wrong.

The Fisher and Mackenzie paper has many innovations, but his estimation of the variance, I think sometime between, let's say, 1923 and 1926, something like that, was wrong. But within a short amount of time, he realized what you needed to do. He realized you needed to make sure that the contribution to your estimate of error reflected the way in which the same sources of variation would also contribute to the differences between the treatments.

If you could match the two, you could then get valid standard errors. Out of this, they developed a theory in which one starts to look at designs in which not all the treatments could be compared in the same block. So incomplete block designs were developed, and then ultimately fractional factorial designs, in which you wanted to compare lots and lots of combinations of treatments but you had more treatment combinations than you had experimental material.

How could you do this? The answer was by deciding that higher-order interactions were negligible. This was then developed further in industry, starting maybe around the 1940s, and you then had a large impetus of development in industry. Clinical trials came along relatively late to this. And now I understand there are also internet experiments, which are perhaps the fourth age of experimental design, and with which I'm not particularly familiar.
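The trick of trading higher-order interactions for runs can be sketched with the smallest textbook case, a 2^(4-1) half fraction. The factors A to D are hypothetical: run the full 2^3 design in A, B and C, and set D by the generator D = ABC (levels coded -1/+1). Eight runs replace sixteen, at the price that D cannot be separated from the ABC interaction, nor AB from CD, which is acceptable only if those interactions are assumed negligible.

```python
from itertools import product

FACTORS = "ABCD"


def half_fraction():
    """2^(4-1) fractional factorial: full 2^3 in A, B, C with generator D = ABC.

    Levels are coded -1/+1, so the defining relation is I = ABCD.
    """
    return [(a, b, c, a * b * c) for a, b, c in product((-1, 1), repeat=3)]


def contrast(runs, effect):
    """Contrast column for an effect such as "A" or "AB" (product of the coded levels)."""
    column = []
    for run in runs:
        value = 1
        for name in effect:
            value *= run[FACTORS.index(name)]
        column.append(value)
    return column


runs = half_fraction()
print(len(runs), "runs instead of", 2 ** 4)
# Aliased effects have identical contrast columns, so the design cannot tell them apart.
print(contrast(runs, "D") == contrast(runs, "ABC"))   # True
print(contrast(runs, "AB") == contrast(runs, "CD"))   # True
```

Main-effect columns remain orthogonal to each other (for example, the A and D columns have a zero dot product), so all four main effects are estimable from half the runs.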

But there's a long, long history of this. One of the important things, and I think a key milestone for me, was a development in 1965 by John Nelder of a front-end machine, a front-end way of thinking for linear models. Given the blocking structure (the way experimental variation could be seen before you started experimenting), given the treatment structure, and then given the design matrix which matched one to the other, it would tell you precisely how you had to analyze the experiment in a way that made the standard errors correct. Unfortunately, this has not had as wide an appreciation as it should have.

This is the sort of thinking that, in my opinion, should guide the way we look at a lot of designed experiments.
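The simplest design that separates a blocking structure from a treatment structure is the randomized complete block design. It is a much simpler layout than the split-plot experiments described above and is used here only as an illustration; the treatments and fertility values are hypothetical. Blocks run across the fertility gradient, every treatment appears once per block, and the order within each block is randomized independently. With noise-free yields, the gradient cancels exactly from every treatment comparison; randomization within blocks is what protects the comparisons, and the error estimate, against the plot-to-plot variation that blocking cannot remove.

```python
import random

# Hypothetical nitrogen x phosphate combinations, echoing the example above.
TREATMENTS = ["N0P0", "N1P0", "N0P1", "N1P1"]


def rcbd_layout(treatments, n_blocks, seed=None):
    """Randomized complete block design: each treatment once per block, in random order."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        plots = list(treatments)
        rng.shuffle(plots)  # an independent randomization within every block
        layout.append(plots)
    return layout


def treatment_means(layout, block_effect, treatment_effect):
    """Mean yield per treatment for noise-free yields = block fertility + treatment effect."""
    totals = {t: 0.0 for t in treatment_effect}
    for block, plots in enumerate(layout):
        for t in plots:
            totals[t] += block_effect[block] + treatment_effect[t]
    return {t: total / len(layout) for t, total in totals.items()}


layout = rcbd_layout(TREATMENTS, n_blocks=5, seed=3)
gradient = [0.0, 2.0, 4.0, 6.0, 8.0]  # hypothetical fertility rising across the field
effects = {"N0P0": 0.0, "N1P0": 1.5, "N0P1": 0.5, "N1P1": 2.5}
means = treatment_means(layout, gradient, effects)
print(round(means["N1P0"] - means["N0P0"], 6))  # 1.5: the gradient cancels
```

In Nelder's terms, the blocks form the block structure, the four combinations form the treatment structure, and the layout is the design that links the two.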

**Alex:** You are not only a seasoned statistician, you are also interested in the history of the field. Who would you say is the most underappreciated statistician of all time?

**Stephen Senn:** Well, that's difficult to know. I mean, Fisher as a scientist is underappreciated.

I think there's no question. I would say his reputation among statisticians is pretty high, but occasionally his ability as a mathematician is underestimated. Jimmy Savage, who was certainly a very good mathematician himself, confesses in his "On Rereading R. A. Fisher" piece that he had completely underestimated Fisher as a mathematician.

And I think the reason was partly that Fisher was a very good mathematician, but one using a form of mathematics that was slightly old-fashioned by the time he was using it. He was not someone who cared about Lebesgue measure, for example. And so a certain formalism that soon became established in statistics, largely, I think, due to the influence of Jerzy Neyman, was not present in Fisher's work.

Instead, he used a lot of geometrical intuition and also some geometrical proofs, which I think are actually fairly rigorous, but people preferred algebraic proofs. And so, for some reason, his mathematical ability was underestimated. This means that in certain quarters he's underestimated as a statistician, but his reputation in statistics is fairly high.

Otherwise, I don't know, really. I think there are some other important figures that maybe were eclipsed a bit by him. Pitman's work is very profound, but there are not many papers, partly because he was very busy both as a teacher and as an administrator. Then there are earlier forerunners of Fisher, such as Edgeworth from the 19th century, who is very important.

And, yeah, it's difficult to know. Maybe we will know in 20 years' time, or 25 years' time, or whatever.

**Alex:** As a teenager, you were very interested in mathematics yourself. How did this interest influence the choices in your career, and also your view of the field of clinical trials, where you worked for most of your career?

**Stephen Senn:** Growing up in Switzerland, I didn't study enough mathematics to do an English mathematics degree when I wanted to study at university in England. So I chose economics and statistics as a possibility, which was a fairly quantitative subject that I felt I could manage. But actually, in retrospect, I think that in a sense I'm making excuses for myself.

I think even though my formal education didn't include much mathematics, maybe mathematics was not my thing anyway. I mean, I do use mathematics a lot, and I like mathematics and enjoy it to the extent that I'm able to do it. But I often feel that in statistics, that's not where the action is for me, and that often I can't proceed with a problem until I can see what the solution is. And I see what the solution is by thinking about it and avoiding doing the mathematics until I feel it's time and I'm ready to do it. That's partly because I feel that unless I have some intuition to guide me with the mathematics, I'm going to get it wrong.

If I just follow it, you know, this logically follows from that, which follows from this, which follows from that, and so forth, then I'm not going to get it right. And also, I'm lazy. I don't want to start doing heavy work if I don't have to, so I tend to try to think about it, and then once I understand the problem, or think I understand the problem, I proceed.

But, you know, I have a paper in which I talk about the role of mathematics in statistics, and in which I freely confess that I wish I knew more. I wish I was better at maths than I am. At university, I had, for my purposes, a fairly thorough grounding in linear algebra as part of my economics course, funnily enough, not as part of my stats course.

And of course, the stats course covered probability theory, derivation of distributions, properties of statistical estimates and all this standard stuff, but you were only ever using as much calculus and analysis as you needed for the purpose of understanding the statistics. So I don't have a background of much deeper mathematics that I can call on for the purpose of doing what I've done.

But I know other people who also didn't study maths at university who have gone on to have a much deeper mathematical understanding than me, so, you know, in a sense, there's no excuse for me.

**Alex:** What originally made you interested in mathematics, in looking at the world through this formal lens?

**Stephen Senn:** Well, I mean, I, I think I, I certainly as a, as a school child I loved mathematics. I thought it was a terrific subject, it was my favorite subject and the subject I hated above all others was art. I just, you know, really loathed art and I've never been able to, To draw to save my life. I can hardly write my own name, anything to get my finger tracing on a piece of paper is completely hopeless.

And so from that point of view I always liked maths. My mother was very interested in mathematics, and so she was a big influence on me from that point of view, although she was not a mathematician. She was a radiographer by training and then became a teacher, but she was always interested in it. But I think it was more that I discovered statistics. I discovered that statistics had another dimension to it, which also made the mathematics more enjoyable to me, because what I particularly liked was the fact that you could design things and you could implement them and you could actually see the results.

So, you know, one of the biggest buzzes I had working in the pharmaceutical industry was designing an incomplete-blocks crossover trial comparing seven treatments in five periods in 21 sequences. And that's fairly complex. Yes, I think I would claim it was possibly the most complex crossover trial that had ever been run at that particular stage.

I'd never come across one in patients before then that was as complex as that. I was worried that we wouldn't be able to implement it, so I contacted trial logistics, which is something you should always do before you start thinking that you can run a particular trial: go and see the people in charge of trial logistics, because they know what the limiting factors are.
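The numbers in this story (seven treatments, five periods, 21 sequences) happen to match one textbook construction: take every 5-element subset of the 7 treatments, of which there are C(7, 5) = 21. The sketch below assumes that construction purely for illustration (the actual trial's sequences are not given here, and a real crossover design would also need the period orders arranged to balance carryover) and checks the balance properties of the resulting incomplete block layout.

```python
from collections import Counter
from itertools import combinations

N_TREATMENTS = 7   # treatments labelled 0..6 (hypothetical labels)
N_PERIODS = 5      # each patient receives five of the seven treatments

# Assumed construction: one sequence for every 5-subset of the 7 treatments.
# This fixes WHICH treatments a sequence contains, not the ORDER of periods.
sequences = list(combinations(range(N_TREATMENTS), N_PERIODS))

# Replication: number of sequences that contain each treatment.
replication = Counter(t for seq in sequences for t in seq)

# Concurrence: number of sequences that contain each pair of treatments.
concurrence = Counter(pair for seq in sequences for pair in combinations(seq, 2))

print("sequences:", len(sequences))                                 # 21
print("replication per treatment:", sorted(set(replication.values())))  # [15]
print("pairs together:", sorted(set(concurrence.values())))         # [10]
```

Because every pair of treatments appears together in the same number of sequences, all pairwise comparisons are estimated with equal within-patient precision under the usual additive model, which is the standard reason for choosing a balanced incomplete block layout.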

I came across this recently when I was giving some advice to a particular company, and they were proposing a particular randomization ratio which implied a very long block size. If, for example, you choose a 13 to 5 randomization ratio, you might think that's optimal for some particular reason.

You can't achieve that except by having a block size which is a multiple of 13 times 5. So what's that? 65, is it? I forget. No. Yes. 65. Yeah. They say there are three kinds of statisticians: those who can count and those who can't. Anyway, so the 65, that's rather a large block size. So, you know, you have to think through these things.
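To make the constraint concrete, here is a hedged sketch of the simplest two-arm permuted-block scheme (an assumption; the company's actual scheme is not described). A block reproduces an a:b ratio exactly only if it contains whole copies of the reduced ratio, so the permitted block sizes are multiples of (a + b)/gcd(a, b). The function names are illustrative.

```python
import random
from math import gcd


def minimal_block_size(a: int, b: int) -> int:
    """Smallest permuted block that reproduces an a:b allocation exactly."""
    return (a + b) // gcd(a, b)


def permuted_block(a: int, b: int, rng: random.Random) -> list[str]:
    """One randomly permuted block of the reduced a:b ratio ('A' vs 'B')."""
    g = gcd(a, b)
    block = ["A"] * (a // g) + ["B"] * (b // g)
    rng.shuffle(block)  # random order within the block, fixed composition
    return block


print(minimal_block_size(1, 1))    # 2
print(minimal_block_size(2, 1))    # 3
print(minimal_block_size(13, 5))   # 18
print(permuted_block(13, 5, random.Random(1)))
```

For 1:1 or 2:1 the smallest exact block holds two or three patients; for 13:5 it already holds 18, which is why an unusual ratio forces long blocks.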

Anyway, to go back to my incomplete blocks design, I said to them, what's the maximum number of groups you can have for packaging? Because each of these 21 sequences would have to have its own package: this is what you take for the first day, this is for day two, this is for day three, this is for day four, this is for day five.

There would be washout periods between the days, but on these days patients would have to take these things. So I said, what's the maximum number of packages you can have? And they said, 26. I thought, wow, that's great, because I've just done 21. I nearly broke the bank, but I didn't get there. I can have 26. And then I said, why 26?

They said, there are 26 letters of the alphabet. So it turned out that they had a labeling system which had to have a label A, B, C, D, E, F, G and so on for each of the packages in the list that they were doing, or each of the lists that they were doing. And nobody had ever thought that anyone would be crazy enough to have a clinical trial which had more than 26 groups.

So you would need more than one letter. I don't know whether they would have to do what the Excel solution is, which is to go from A, B through to Z, and then you go, is it AA and then AB? I forget. Anyway, that turned out to be manageable, and so we did it. And the particular trial was to compare three doses of a new formulation to three doses of an existing formulation and to placebo: that's the seven treatments.
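The Excel-style scheme alluded to here is bijective base-26 numbering: A to Z, then AA, AB, and so on. A minimal sketch (the function name is hypothetical and not taken from any packaging system):

```python
def package_label(index: int) -> str:
    """Spreadsheet-style label for a 0-based index: 0 -> 'A', 25 -> 'Z', 26 -> 'AA'."""
    if index < 0:
        raise ValueError("index must be non-negative")
    label = ""
    n = index + 1  # bijective base 26 has no zero digit, so work 1-based
    while n > 0:
        n, rem = divmod(n - 1, 26)
        label = chr(ord("A") + rem) + label
    return label


print([package_label(i) for i in range(21)][-3:])               # ['S', 'T', 'U']
print(package_label(25), package_label(26), package_label(27))  # Z AA AB
```

With 21 sequences the labels stop at U, inside the 26-letter limit; only a 27th group would need a two-letter label.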

It proved that the new formulation, much to our surprise, had one quarter of the potency of the existing one. So that project was killed dead. So I killed a project stone dead with a design in which I had a major part. And that gave me a great deal of satisfaction, because my name was mud, you know.

Senn killed this project. And I said, no, no, no. The project killed itself. It's just that I delivered the coup de grâce at the time when it was necessary to put it out of its misery. We found out early that we weren't going anywhere with this particular formulation, and that was important.

**Alex:** Being a part of the process of designing and conducting clinical trials is connected, I suppose, to a huge dose of responsibility, because the outcomes of what you design and what you conduct can have a very profound impact on people's lives. What were some of the stories in your career where you felt that this responsibility was something important, something significant?

**Stephen Senn:** Well, I think every time I've been on a data safety monitoring board there is that possibility, because you have to decide whether the trial will continue or not, and my experience on data safety monitoring boards in the field of cancer has been particularly marked from this point of view, because typically an effective drug will have toxicities.

So typically you will see the toxicities before you see any efficacy. To put it another way, if you don't want any toxicity, take a placebo. If you don't want any efficacy, take a placebo. So, if you want some efficacy, you're typically going to buy it at the expense of some toxicities, at the expense of some unpleasantness at the very least, for a number of patients.

Also, you will see the toxicities before you see any benefit in survival. So, when you're looking at the two arms, what you see is that on one of the arms the toxicities are mounting. And that could be an indication that there will be a benefit there as well, a benefit from increased survival, but it might not be. You might simply be giving toxicities without any increase in survival. So it's quite a difficult decision to make, as to whether one should continue or not. And of course, on the data safety monitoring board, I'm fortunate that I have a number of medical colleagues who are trained for, gifted for, and able to make decisions based upon individual stories that they get from patients as well.

I always make it quite clear that I can't do that. All I can do is help them in comparing numbers, averages, totals, these sorts of things between the two groups. So these are quite unpleasant things. It's also happened to me once that I've stopped a trial early because of efficacy. So then you feel rather better about that.

You know, you can say, well, we were able to see quite clearly early on in this particular trial that the new treatment was efficacious. And that's happened to me once. It's also happened to me that I've been involved in a trial in which the particular treatment, which was already on the market, was proven to have a toxicity, and so the trial was then stopped, and this led to the treatment being withdrawn from the market. So, of course, that's a shocking thing, because you don't like to think that a trial has been started which had that particular outcome. On the other hand, of course, you can always say that if the trial hadn't taken place, then maybe the treatment would still be on the market. This particular treatment had been introduced in an earlier era in which there wasn't such rigorous testing. So it's difficult to know. You won't discover anything without there being some risk that the result will not be positive. If we knew it would be positive, we wouldn't be doing a trial.

So it's inherent to the particular process.

**Alex:** You've been working in both academia and industry. What are the main differences between those two worlds from your experience?

**Stephen Senn:** Well, I would say that you can learn from both, in very different ways. So my first academic job was lecturing at the Dundee College of Technology.

And for that particular job, I had to teach a large variety of courses, including things that I had not myself studied at any stage. So, for example, I had to teach some operational research, which I hadn't done, and in some cases I was only one week ahead of the students.

So it was high pressure, and you were teaching a lot of stuff, some of which you were familiar with and some of which you weren't. And so of course one of the things that happens there is that you get a fairly broad, but not necessarily deep, knowledge of a number of topics, and I'm glad I got that knowledge.

I've never actually worked in applying multivariate methods, but I've had to teach multivariate analysis. I've done very little work applying sampling theory, but I've had to teach it. And so I know something about these topics simply from having had to teach them; that's what happens there.

But in industry you also have a learning process, and that's a very different sort of learning process. You actually learn by doing things, working on projects, and it matters whether the projects succeed or fail. And if they fail, it matters that they should fail early and not go on too long, and these sorts of things.

And it's intense, collaborative work, and that's also very, very useful and very interesting from that point of view. And I'm also glad I had that. Actually, before I started as an academic, I'd had three years working for the health service. And that at least taught me one thing, and that was: be very careful about data that are given to you.

Because there, my job involved using a lot of data that were collected for official purposes. But when you looked at them closely, you found that the data were not really quite so innocent as you thought. One example is actually treated in my book Dicing with Death, and that's to do with the population figures that I put together for the health district I used to work for.

I put them together, based upon other figures that had been given to me from official publications, and I noticed that the population of the district seemed to go up and then down and then go up again. And then I realized that the going down occurred shortly after a census had been taken.

And what was happening was that the series was being recalibrated. So you have a census, and then what happens is you add the births, you subtract the deaths, and you make an allowance for migration, which is always very, very difficult. And you do this annually. And then 10 years later you get another census, and you get a chance to correct and recalibrate the figures.
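The recalibration described here can be sketched as a simple roll-forward from one census, followed by rebasing on the next. The figures below are invented for illustration; the point is only that any error in the migration allowance accumulates for ten years and then shows up as an apparent drop in population when the series is rebased.

```python
def roll_forward(census_count, yearly_components):
    """Intercensal estimates: start from a census count and, each year,
    add births, subtract deaths and add the assumed net migration."""
    estimates = [census_count]
    for births, deaths, net_migration in yearly_components:
        estimates.append(estimates[-1] + births - deaths + net_migration)
    return estimates


# Invented figures for a hypothetical district. Births and deaths are
# registered, but the migration allowance overstates the real inflow.
first_census = 200_000
components = [(2_500, 2_200, 400)] * 10   # (births, deaths, assumed net migration)
estimates = roll_forward(first_census, components)

next_census = 201_000                     # what the next census actually counts
print(estimates[-1])                      # 207000
print(next_census - estimates[-1])        # -6000: the apparent fall on rebasing
```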

So this was a lesson to me that, you know, even though it's an official statistic and it's collected by people who are trained to collect such data, that doesn't mean that there isn't a story behind it that you have to understand if you want to use the data. So that was also a valuable lesson.

So I think some practice is a very good thing, but certainly the time I had in the pharmaceutical industry was very, very enjoyable from the point of view of professional challenges: having problems and solving problems, and not always solving them, but then trying to find another way of thinking about them.

So it's really good.

**Alex:** What was the challenge that you solved, the one that you are the most proud of?

**Stephen Senn:** Well, I think it was what we did during the time I was working for Ciba-Geigy, which is a forerunner company of Novartis: the general approach that we managed to bring to designing and analyzing trials in asthma, which also carried over to a certain extent to the other particular conditions we were dealing with. Latterly I was head of the group that dealt with chronic diseases. And we instigated a number of things which I think are fairly obvious things to do, but which are not standard.

We made sure we didn't dichotomize any data. You'll still find that lots and lots of dichotomies are commonly used for analyzing clinical trials, which is very wasteful and misleading. We used covariates, more than one covariate, and we used analysis of covariance for doing that, and we sorted out useful transformations.
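One standard way to put a number on "wasteful": for a normally distributed outcome and a small treatment shift, analyzing the proportion of patients above a cut-point instead of the mean has asymptotic relative efficiency φ(c)²/[Φ(c)(1 - Φ(c))], where c is the cut in standard deviations from the mean. At the median this is 2/π ≈ 0.64, so the dichotomized analysis needs roughly 1.57 times as many patients for the same power. The sketch assumes normality and a pure location shift.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def dichotomy_efficiency(cut: float) -> float:
    """Asymptotic efficiency of comparing proportions above `cut` instead of
    comparing means, for a small shift in a normal outcome (sigma = 1)."""
    density = STD_NORMAL.pdf(cut)   # how fast the proportion moves with the shift
    p = STD_NORMAL.cdf(cut)         # binomial variance of the proportion is p(1 - p)
    return density ** 2 / (p * (1 - p))


for cut in (0.0, 0.5, 1.0, 1.5):
    print(f"cut at {cut:+.1f} SD: efficiency {dichotomy_efficiency(cut):.3f}")
# the first line reports 0.637, i.e. 2/pi at the median
```

Cutting away from the median makes matters worse, which is why "responder" definitions based on extreme thresholds can be especially costly.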

So basically I think we were establishing the way in which trials in that particular area should be done, in what was a relative Cinderella area. There had been some excellent papers on how you should analyze trials in cancer (some of that's been forgotten since, unfortunately), and there were excellent papers on how you should analyze cardiovascular disease trials.

These are big, big areas, but something like asthma was less well established. And so that was something, as a whole (it wasn't just me, it was my group as well), that I think we could regard as being an achievement. Crossover trials were a particular field where I also worked a lot on the theory, and so I have a book on crossover trials, which is now rather elderly: the second edition came out 20 years ago.

**Alex:** You mentioned working with additional covariates. We can imagine that there's always a trade-off. We have a description of a given unit, a patient or whatever our unit is, that is a multidimensional description. And then maybe we have an experimental study, or maybe a mixed study with observational and experimental data.

And there's a trade-off between including as much of this description as possible and statistical power. Where would you put the boundary in practice, from your experience? Are there any hints or rules of thumb that you would say could work in cases like this, when we need to choose which covariates to add to the model?

**Stephen Senn:** Yeah, I would say there are some rules of thumb. A typical phase three parallel-group trial will involve hundreds of patients, and for a trial like that there's no reason why you shouldn't have half a dozen covariates. You could probably have more. You could probably have 15. It wouldn't be a problem.

You know, it might not be worth it, but you could have that number. It's not really a problem. If it's a very small trial, it begins to be a problem. And so there are three things you have to understand about what happens when you fit a covariate. Here, I'm talking about the normal linear model case.

We can think about the nonlinear case in a minute, but for the normal linear model case there are three things that happen. The first one is that, to the extent that the covariates you fit are predictive, the mean square error will go down. And this is advantageous, because it means you're going to have narrower confidence intervals, other things being equal, and greater power in the trial for a given signal that you're trying to detect. The second is that, to the extent that the covariates are not orthogonal, not perfectly balanced, there will be some inflation of the variance, so this acts in the other direction. But the consequence is small. It's usually of the order of a loss of about one patient per covariate. So in a trial with hundreds of patients that's not important, but in a trial with a dozen patients it will begin to be important.
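The second effect, variance inflation from chance non-orthogonality, can be checked by simulation. This is a hedged sketch assuming a 1:1 trial and k normally distributed covariates that are independent of treatment, so any imbalance is the chance imbalance randomization leaves. By the Frisch-Waugh-Lovell theorem, the variance of the treatment estimate is proportional to 1/(r·r), where r is the residual of the treatment indicator after regressing it on the other columns; the ratio computed below is therefore the inflation with the residual variance held fixed. For Gaussian covariates its expectation works out to (n - 3)/(n - k - 3), printed as "theory" for comparison.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residual(y, columns):
    """Residual of y after least-squares projection onto `columns`,
    computed by Gram-Schmidt (columns assumed linearly independent)."""
    basis = []
    for col in columns:
        v = list(col)
        for b in basis:
            coef = dot(v, b) / dot(b, b)
            v = [vi - coef * bi for vi, bi in zip(v, b)]
        basis.append(v)
    r = list(y)
    for b in basis:
        coef = dot(r, b) / dot(b, b)
        r = [ri - coef * bi for ri, bi in zip(r, b)]
    return r


def mean_inflation(n, k, reps, rng):
    """Average factor by which fitting k chance-imbalanced covariates inflates
    the variance of the treatment estimate (residual variance held fixed)."""
    treat = [1.0] * (n // 2) + [0.0] * (n - n // 2)   # 1:1 allocation
    ones = [1.0] * n
    r0 = residual(treat, [ones])                      # intercept only
    total = 0.0
    for _ in range(reps):
        covariates = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
        r1 = residual(treat, [ones] + covariates)     # intercept + covariates
        total += dot(r0, r0) / dot(r1, r1)            # equals 1 / (1 - R^2)
    return total / reps


rng = random.Random(2024)
results = {}
for n, k, reps in [(200, 5, 300), (12, 5, 3000)]:
    inflation = mean_inflation(n, k, reps, rng)
    results[(n, k)] = inflation
    theory = (n - 3) / (n - k - 3)   # expectation for Gaussian covariates
    print(f"n={n:>3}, k={k}: inflation {inflation:.3f} (theory {theory:.3f}), "
          f"effective n about {n / inflation:.1f}")
```

With 200 patients, five covariates cost roughly five patients' worth of precision, in line with the one-patient-per-covariate rule of thumb; with 12 patients the same five covariates more than halve the effective sample size. The simulation deliberately ignores the first effect (the drop in residual variance), which is what makes fitting predictive covariates worthwhile in the first place.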

You shouldn't fit five covariates if you've only got 12 patients. The third thing is what I call second-order precision. In addition to the variance going down because of the mean square error, and the variance going up because of the non-orthogonality, there's the fact that we're talking here about the true variance of the treatment effect, but nobody will tell you what the true variance is.

You're going to have to estimate it, and you have to estimate it by using the residuals, and you will lose one degree of freedom for every covariate that you fit. And one way to understand the effect of this is to have a look at the variance of the t distribution. From memory, the variance of the t distribution is ν/(ν - 2), where ν is the degrees of freedom.

And so you can see that the variance will be much larger than one if ν is very small, but you rapidly approach one. An interesting test for anybody who uses statistical tables (nobody does these days, but I'm from a generation who did): how many degrees of freedom does the t distribution need for the critical value to be 2 for 2.5 percent in either tail, in other words for a typical 5 percent two-sided test? And the answer is 60. At 60 degrees of freedom, you're at 2. At infinity, you're at 1.96, which is the value for the normal distribution that everybody knows, but 60 degrees of freedom will give you 2.
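Both figures quoted from memory here can be checked: the variance ν/(ν - 2) (defined for ν > 2) and the 60-degrees-of-freedom critical value. This sketch uses only the Python standard library, integrating the t density with Simpson's rule and inverting it by bisection; in practice a library routine such as SciPy's `scipy.stats.t.ppf` would be the usual choice.

```python
import math
from statistics import NormalDist


def t_cdf(x, nu, steps=2000):
    """P(T <= x) for x >= 0, Student's t with nu df: Simpson's rule on [0, x]
    plus 0.5, using the symmetry of the density."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))

    def pdf(u):
        return math.exp(log_c - (nu + 1) / 2 * math.log1p(u * u / nu))

    h = x / steps                      # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def t_critical(nu, tail=0.025):
    """Upper critical value: the q with P(T > q) = tail, found by bisection."""
    lo, hi = 0.0, 20.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < 1 - tail:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_variance(nu):
    """Variance of Student's t, nu / (nu - 2); finite only for nu > 2."""
    if nu <= 2:
        raise ValueError("variance is infinite or undefined for nu <= 2")
    return nu / (nu - 2)


for nu in (5, 10, 30, 60, 120):
    print(f"nu={nu:>3}: variance {t_variance(nu):.3f}, "
          f"2.5% critical value {t_critical(nu):.3f}")
print(f"normal limit: {NormalDist().inv_cdf(0.975):.3f}")
```

The critical value is about 2.000 at ν = 60 and tends to 1.960 as ν grows, which is the sense in which losing a few residual degrees of freedom is only a second-order effect in a large trial.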

So this is the second order thing, and that's usually not so important. So these are the three things you have to consider. And I would say that provided you stop, consider them, and think about it, then usually you will have a pretty good feel for what's reasonable for a particular clinical trial.

And for most of the clinical trials we run, certainly in phase three, we probably underuse the ability to fit covariates. Now, the normal distribution is a two-parameter model. Many of the other models that we use are one-parameter models. So, for example, in logistic regression it's a probability, and there's just one probability.

Admittedly, you've got lots of linear predictors, but in the end the distribution itself, the binomial distribution, has this one parameter p that you don't know. You know the n. So that's different, because essentially what happens is that if you miss out covariates which are predictive, your model is somehow misspecified, and so you don't see the effect on the variance in the same way. What you tend to see is that you're losing the signal itself directly. So you see an attenuation of the estimate on the particular scale that you're using there. And so things are a bit more complicated, but a lot of the considerations still carry over.

You still pay exactly the same penalty for loss of orthogonality. It doesn't make any difference whether you're using least squares, or the normal model, or logistic regression, or survival analysis, or whatever. That penalty, which is a function of the imbalance of the Xs, is paid in all of those cases.

You don't get the mean square error benefit that you would get in the normal case, but the benefit is expressed in not attenuating the treatment estimate. So it's a field where I think a lot of work has been done, a lot of work has been forgotten, and people reinvent the wheel all the time.

I still see lots of papers about how covariate models predict, using simulations, where I say, well, you know, that's obvious, and that's not true. And I know this simply from reasonable knowledge of the theory, but it's not anything that I've developed. It should just be out there. People should know it.

**Alex:** And what happens in nonlinear cases that you mentioned before?

**Stephen Senn:** Well, as I said, it's basically that the models are in a sense inconsistent. This is also related, I think, to something that some of the causal people complain about: the non-collapsibility of particular models. The odds ratio could be the same for two subgroups that you have.

The subgroups could be perfectly balanced; there's no question of confounding. But the odds ratio for the two groups pooled together would not be the same. This seems to be baffling, but that's simply because there is a scale effect. If you look at it in prediction space, then that's not true. Parameters are one thing.
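The phenomenon is easy to reproduce with arithmetic alone. In the hypothetical example below, two equally sized strata are perfectly balanced between arms (so there is no confounding) and share the same odds ratio of 2, yet the odds ratio for the pooled data is smaller. The numbers are invented; only the pattern matters.

```python
def odds(p: float) -> float:
    return p / (1 - p)


def risk_from_odds(o: float) -> float:
    return o / (1 + o)


CONDITIONAL_OR = 2.0
# Two equally sized, perfectly balanced strata with different baseline risks
# (hypothetical numbers chosen for illustration).
control_risk = {"stratum 1": 0.2, "stratum 2": 0.6}
treated_risk = {s: risk_from_odds(CONDITIONAL_OR * odds(p))
                for s, p in control_risk.items()}

for s in control_risk:
    print(s, round(odds(treated_risk[s]) / odds(control_risk[s]), 6))  # 2.0 in each

# Pool the strata: with equal sizes and balance, pooled risks are simple averages.
pooled_control = sum(control_risk.values()) / 2
pooled_treated = sum(treated_risk.values()) / 2
marginal_or = odds(pooled_treated) / odds(pooled_control)
print(round(marginal_or, 4))  # 1.7727, closer to 1 than the common stratum OR of 2
```

Neither answer is wrong: the conditional and marginal odds ratios are different parameters on a non-linear scale, which is the scale effect described above. The same mechanism produces the attenuation mentioned earlier when a predictive covariate is left out of a logistic regression.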

I think it's Philip Dawid. I think Philip himself couldn't find where he said this, but I always attribute this particular remark to him. He says a parameter is just a resting stone on the road to prediction. So a parameter is not in itself particularly important. It's just a means of deciding what we're going to do in practice.

And so I sometimes feel that, well, you know, okay, non-collapsibility, so what? Does it really matter? Isn't it just a question as to whether the model predicts things reasonably or not? But maybe I don't understand enough about it. You know, people I respect highly who work in this field and know a lot more about causal inference than I do, like Sander Greenland, for example, are quite worried about non-collapsibility.

So maybe I should be.

**Alex:** What do you think about ideas that could help us be more efficient with experimentation? There are a couple of families of these ideas, right? One would be general optimal experimentation theory; another would be causal data fusion, combining data sets from the observational and experimental realms in order to maximize the information gain.

Is this something that could be helpful for us as a community when it comes to clinical trials and other areas where experimentation is prevalent today?

**Stephen Senn:** Yes. I think there's a lot that we could do, but there were two ideas that you mentioned that I'd like to pick up, each in turn.

The first one is experimental design. I think it's very worrying, in a way, what has happened there. There are many deep, wonderful and complex theories of optimal experimental design that we medical statisticians ought to know better and don't. So there's a lot of really deep theory there.

But equally, there's a lot of deep practice which some of the people who talk about optimal experimental design ought to know. Optimal experimental design could almost be, I think, a branch of pure mathematics. It's so austere and beautiful and fantastic. But as soon as you want to apply it to any real experiment, you enter into the world of the engineer, and then things are somewhat different. I've come across cases where people have proposed optimal experimental designs, but they don't know, for example, that patients are not treated simultaneously. So they misunderstand what a period effect is. All sorts of things like this are just not known.

They don't know what the constraints are in practice. They don't know, for example, that certain types of optimality would require you to lengthen the time it takes for a clinical trial to report. Suppose, for example, that you have a disease in which 30 percent of the patients are female and 70 percent are male.

But you decide you want to make an equally precise statement about the women and the men. How are you going to do that? Because the patients will fall ill when they fall ill, and they will fall ill at a rate of more than two men for every woman.

So how are you going to recruit them into the clinical trial? It's going to be quite difficult to organize things so that you can actually get the same numbers. And typically what will happen is that if you define a stratum with a target, you will find that the rest of the trial is finished and the trial is still continuing.
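The arithmetic behind this is short. Assuming patients arrive at a steady rate and split 30:70 between women and men, as in the example, a target of equal numbers per sex is reached only when the smaller stratum fills, while the larger stratum closes early and later male arrivals are turned away. The target of 150 per sex is hypothetical.

```python
def arrivals_to_fill(target_per_stratum, share):
    """Expected eligible arrivals before each stratum reaches its target,
    assuming arrivals split between strata in fixed proportions."""
    return {stratum: target_per_stratum / p for stratum, p in share.items()}


share = {"women": 0.3, "men": 0.7}   # incidence split from the example
target = 150                          # hypothetical equal target per sex

arrivals = arrivals_to_fill(target, share)
needed = max(arrivals.values())       # the trial waits for the slowest stratum
stretch = needed / (2 * target)       # vs. enrolling the first 2 * target arrivals

print({s: round(a) for s, a in arrivals.items()})  # {'women': 500, 'men': 214}
print(round(stretch, 2))                           # 1.67
```

The stretch factor 1/(2 × 0.3) ≈ 1.67 does not depend on the target size: with a steady arrival rate, equal allocation between a 30 percent and a 70 percent stratum lengthens recruitment by about two thirds.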

So I blame both sides. I blame us medical statisticians for not knowing more about the theory. I blame some of the optimal experimental design experts for not thinking enough about the practice. I'll give one further example, and that is, you can have optimal dose-finding designs which outperform the regular ones that statisticians use in clinical trials, but they only outperform them if you can guarantee you'll come to the end of the dose escalation.

But part of the point about a dose escalation is that you might not get to the final dose. That's why it's a dose escalation. And if you don't get to the final dose, these designs are sometimes inferior to the back-of-the-envelope, rule-of-thumb designs that statisticians use. So that's yet another example.

So there is a failure of the two communities to speak to each other. There's also a second point: there are two strands of optimal design. There's the optimal design of Kiefer and Wolfowitz, the wonderful work that they developed together starting in the late 50s and culminating around the early 60s.

And there's the work which had already been begun by people like Fisher, Frank Yates, Finney and so forth, which was based upon randomization and correct analysis of variance. And there are still these two particular strands, but both of them have a lot to teach medical statistics, no question about it.

And we need to find some way of bridging the gap. Now, as regards your second thing, fusion, there I feel that it's plausible that a way of putting observational data sets and clinical trial data sets together will yield more power. I've made some tentative suggestions as to how it might be done.

I've seen some very interesting and intriguing proposals by other people, but some of them don't work. Some of them would have to rely upon random sampling which doesn't take place. But that doesn't mean that they should be dismissed. It may be that they could be fixed with further work done on them.

And so I think the onus is on both sides to try and talk to each other.

**Alex:** It sounds to me like we need a lot of improvements when it comes to communication between those different little ghettos that we have here.

**Stephen Senn:** Yeah, it's not just the ghettos between statisticians and people who are working in other fields; there are also the ghettos within statistics that I've been talking about.

But I mean, sometimes surprising things happen. A very early paper on applying standard errors to agricultural experiments predates Fisher. And it's actually a collaboration between an agricultural scientist and an astronomer. And they actually say in their paper, people may think that agriculture and astronomy have got nothing to do with each other, but both of them are really badly affected by the weather.

Or, rather, strongly affected by the weather. But the astronomers were used to calculating probable errors for the astronomical data; nowadays we calculate standard errors. And in this particular paper, it's shown how you could apply this idea to experiments on root crops.
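For readers who have only met standard errors: under a normal model, the probable error is the half-width of an interval expected to contain the true value half the time, so it is a fixed multiple of the standard error (the 75th percentile of the standard normal, about 0.6745). A minimal conversion sketch, assuming normality; the function names are illustrative.

```python
from statistics import NormalDist

# Half-width of the central 50% interval of a standard normal, in SE units.
PE_PER_SE = NormalDist().inv_cdf(0.75)


def probable_error(standard_error: float) -> float:
    """Probable error corresponding to a given standard error (normal model)."""
    return PE_PER_SE * standard_error


def standard_error_from_pe(pe: float) -> float:
    """Standard error corresponding to a quoted probable error (normal model)."""
    return pe / PE_PER_SE


print(round(PE_PER_SE, 4))                    # 0.6745
print(round(standard_error_from_pe(1.0), 4))  # 1.4826
```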

So this dates from about 1910, from memory, and one of the authors was certainly Stratton, and I think the other one was Whittaker. I think it was Whittaker and Stratton, but I'm not sure. But anyway, that's an example. And Fisher himself was obviously influenced by agriculture, about which he knew nothing when he arrived at Rothamsted.

And then Cochran (this is not Archie Cochrane of the Cochrane Collaboration, but William Cochran, Bill Cochran of experimental design fame) later got involved in sampling methodology. So that was yet another field of application: survey methodology and so forth.

**Alex:** For people familiar with Pearlian or Rubin formalisms for causality, what would be one book that you would recommend for them to read in order to understand the perspective of a practitioner, the limitations in the real world, that could help them?

**Stephen Senn:** Well, first of all, I'm not the best person to ask. Don Rubin I know well. Judea Pearl's book, I mean the big book, Causality, I read when it first came out and was very impressed, and I follow him on Twitter. And there's no question that his do/see distinction is very, very important.

He's obviously done a lot more than just that, but that itself is a sort of Columbus's egg type of thing. In retrospect it seems obvious and simple, but in fact it eluded many people before. So this is all very important, but I'm not the person to talk about it. I'm more familiar with Don Rubin's work, but they're both, for me, very, very important figures in the wider world of inference.

There's no question. But if I understand your question, you're asking me what I should tell people who are only familiar with, say, Pearl or Rubin. What should they know about statistics? Is that what you mean?

**Alex:** About statistics as you use them in experimentation and study design. Let me give you a little bit of motivation.

So you told me before today that you have a feeling that sometimes we might have people who have a certain theory, but who are not familiar with what is going on at the front line, so to say. And this makes certain theoretical ideas, although they might be valuable in themselves, difficult to apply in practice.

So my question was motivated by this earlier conversation of ours, and in particular by the idea that maybe we could open more doors between our ghettos in order to make people more aware of the limitations and challenges that the other side is facing. So my question was: what would be one book that you would recommend to people who are maybe causality researchers, who understand the formalism, but who don't necessarily have a good, clear view of the limitations of applying their theories in practice?

**Stephen Senn:** Yeah, I would tend to think that it probably has to be grounded in whichever particular application they're looking at. I'm not sure there could ever be, or there has been, a book in general which would do this. Certainly, if you wanted to have a look at, say, what's going on in economics, there's no point in looking at my book, for instance, even though there is a chapter on health economics in it, which is now particularly old-fashioned, I think.

So let's stay in my area. Well, on modeling, I would recommend Frank Harrell's book. I think that's a good book if you want to understand some of the practice and theory merged together by somebody who actually is a real programming junkie. When I was in the pharmaceutical industry, Frank was famous for the work that he'd done on SAS to make SAS usable.

Because SAS was very powerful in many ways, but you could only do survival analysis and logistic regression by using the routines which Frank himself had written for the SUGI library.

And then he went on to S-PLUS, and now he does R and so forth. So that, I think, would be a good book to get a flavor of some of the practical things that people think about.

If you want to know about clinical trials in the pharmaceutical industry, then I would recommend my own book, which has lots of practical issues that you would have to think about. I'd like to think that that would be a book that you could use. And it was originally written, in any case, to try and make it easier for life scientists...

**Alex:** This book?

**Stephen Senn:** Yeah, this book. To try and make it easier for life scientists and statisticians to talk to each other, and there are a number of issues. I mean, that's the whole point. The point is to raise issues which I consider to be still controversial, in some cases unsolved. I don't know what the answer is.

In some cases, I think I know what the answer is, but I know that some people would disagree with me. And I think that if you have a look at those things, it will give you an idea of the sort of things that really are debated by the people in the particular field. I imagine that there are similar things in economics.

Marketing would be another area. I don't know. You could think of all sorts of application areas. But I think the application field is certainly important, because there are particular restrictions you have to know about. Even in survey work, there are things about what in practice is possible for people to do if they are still using a clipboard these days to ask questions. You know, what sort of things can you do?

What are the practical constraints on how many questions you can ask? What is the effect on quality of asking too many questions? And what about embarrassing questions? All these practical things are very, very important if you want to get good answers in that particular field.

And certainly, my book on designing clinical trials is not where you would find them. For industrial experimentation, I would say that Box, Hunter and Hunter is probably the book which has both the theory (George Box was someone who certainly contributed to the theory of experimental design) and the practice.

He had worked for ICI on the production side, in fact, and did a lot of important work on experimentation there. So that's the sort of book I would encourage people to look at. Rub your noses in reality, basically.

**Alex:** In your book, you devote a significant part of the text to how to communicate with stakeholders.

What would be two or three main lessons from your career communicating with sponsors that you could share with the community?

**Stephen Senn:** I would say, try and find simple stories that everybody can understand to communicate an idea. I can give you one example. People often say it's very strange that, if you do an analysis of variance, you should be able to determine from it that there is a treatment effect, which is to say that at least two treatments, let's say amongst the five tested, are different.

But once you look at the pairwise comparisons, you cannot actually say that any particular pair is definitely different. Yeah. How can this be possible? This is surely ridiculous. And I say, is it? So imagine the following scenario. A mother leaves her two children playing together in a room. When she comes back, she finds that they're fighting.

She knows that at least one of them has been guilty of aggressive behaviour, but she doesn't know who the aggressor is. The outcome doesn't tell her that. So it's perfectly possible in everyday life to have results like this; there's nothing particularly mysterious about it. We can, in fact, find that the answer to a question is that at least some of the treatments are different, but we don't know exactly, or it's difficult for us to say exactly, where the difference occurs.

But that's just one example. Try and find things like that. Again, try and find simple graphical methods. I have a way of teaching regression to the mean which uses graphs and nothing else. You don't need to know anything about the normal distribution.

It just shows you that cutting the data the particular way you choose to cut them will cause regression to the mean to happen.

**Alex:** Hmm.

**Stephen Senn:** And people can then understand that by seeing the diagram. If they don't believe you, then you can give them the data set from which the diagram is constructed and say, well, try it out yourself and see what happens.
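Senn's teaching device is a diagram. As a rough numerical counterpart to "try it out yourself", here is a minimal simulation sketch. It assumes normally distributed underlying levels and measurement noise purely for convenience (Senn's point is that the phenomenon does not depend on normality), and the numbers (mean 100, standard deviations of 10, entry cut-off 120) are arbitrary illustrative choices. No treatment is given, yet the patients selected for a high baseline score lower on average at follow-up.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 10_000

# Each patient has a stable underlying level; each measurement adds independent noise.
true_level = rng.normal(100, 10, n)
baseline = true_level + rng.normal(0, 10, n)
follow_up = true_level + rng.normal(0, 10, n)   # nothing is done between measurements

# "Cut" the data: keep only patients whose baseline exceeds an entry threshold.
cut = 120
selected = baseline > cut

print(f"patients selected: {selected.sum()}")
print(f"mean baseline of selected:  {baseline[selected].mean():.1f}")
print(f"mean follow-up of selected: {follow_up[selected].mean():.1f}")
print(f"mean follow-up of everyone: {follow_up.mean():.1f}")
```

The selected group's follow-up mean falls back part of the way toward the overall mean; the only ingredient is selection on a noisy measurement.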

And so I think it's more important to teach these ideas in this way, except for people who want to see the algebra. Resorting to algebra is to admit defeat. It's not what you want to be doing if you can avoid it. And I think there are some examples of this that apply to statisticians themselves, too.
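Senn's first example, an overall analysis of variance showing that the treatments differ while no individual pairwise comparison is conclusive (the mother who knows one child started the fight but not which), is easy to reproduce by simulation. The sketch below assumes five hypothetical treatment groups with evenly spaced true means, normal errors, and a Bonferroni correction for the ten pairwise t-tests; the correction method and all numbers are illustrative choices, not something specified in the conversation. It uses SciPy's `f_oneway` and `ttest_ind`.

```python
import itertools

import numpy as np
from scipy import stats

rng = np.random.default_rng(12)
group_means = [0.0, 0.2, 0.4, 0.6, 0.8]   # hypothetical true means of five treatments
n_per_group, alpha, n_sims = 20, 0.05, 1000
pairs = list(itertools.combinations(range(len(group_means)), 2))   # 10 pairwise comparisons

overall_sig = 0    # simulated trials where the overall F test is significant
no_pair_sig = 0    # ...and yet no pair is significant after Bonferroni correction
for _ in range(n_sims):
    groups = [rng.normal(m, 1.0, n_per_group) for m in group_means]
    if stats.f_oneway(*groups).pvalue < alpha:
        overall_sig += 1
        pair_p = [stats.ttest_ind(groups[i], groups[j]).pvalue for i, j in pairs]
        if min(pair_p) * len(pairs) >= alpha:   # Bonferroni: compare p * 10 with alpha
            no_pair_sig += 1

print(f"overall F test significant in {overall_sig} of {n_sims} simulated trials")
print(f"of those, no single pair significant after correction: {no_pair_sig}")
```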

If you have, for example, a particular experimental design which has a number of, let's say, cells in it, and these cells are created by cross-classification, what you can say is that any linear solution will be a linear combination of the cells. You can hide it behind matrix algebra.

You can say, well, the answer is $(X^\top X)^{-1} X^\top y$; that's what the solution will be like. And the variance will be equal to $(X^\top X)^{-1}\sigma^2$, and so on. You can show all of this. But it's a linear combination. So the question is, can you find the weights?

Because if you can find the weights, you can see exactly what they do. In a crossover trial, if you want to eliminate the patient effects, the weights must add to zero over any patient. If you want to eliminate the period effects, they must add to zero over any period. If you want to estimate the treatment effect, B minus A, they must add to one over all cells labelled B and to minus one over all cells labelled A.

You can often solve for the weights with a little bit of extra minimization at the very end, just using these insights. And I frequently find people looking at things like stepped-wedge designs, complicated designs like that, who fail even to grasp that in such a design the first-period data contribute nothing at all to the estimate.

The weights for the first period are zero because it's the only way you can eliminate that period effect. It's obvious in retrospect, but until they understand that, they don't understand the design. If they just understand it in terms of "this is the matrix algebra, and this is what it says the variances are", they don't understand it, not until they understand what all the algebra is doing in these solutions.

You asked me about mathematics earlier on. When I was at university, I had a friend who was very dismissive about statistics. He said, all you statisticians do is add things up and occasionally square them.

I found that rather hurtful, but as I've grown older, I've come to the conclusion that he's actually right. A lot of statistics is basically adding things up and occasionally squaring them. A lot of it reduces to linear combinations, maybe through an iteratively reweighted process, but eventually that's what it reduces to.

And understanding this is important. I also say to statistics students: you need to understand statistics in at least two different ways. You need to understand it through the maths, certainly. But you also need to understand it intuitively. If you don't have the intuition that goes with the maths, you don't understand it.

If you don't see why the maths, in order to do what you needed it to do, delivers the answer it does, you don't understand it. You need to understand it both ways. And that's why it's such a difficult subject, in my opinion, but also such a beautiful one. It has these two dimensions to it. But that's probably true of all sciences, I don't know.
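The crossover weights Senn describes can be checked numerically. The sketch below assumes a hypothetical two-period AB/BA crossover with two patients per sequence, analysed by ordinary least squares with patient, period and treatment effects. It extracts the row of $(X^\top X)^{-1}X^\top$ that defines the B minus A estimate and confirms that those weights sum to zero over each patient, to zero over each period (which is what eliminating a period effect requires), to one over the B cells and to minus one over the A cells.

```python
import numpy as np

# Hypothetical AB/BA crossover: patients 0-1 receive A then B, patients 2-3 receive B then A.
sequences = ["AB", "AB", "BA", "BA"]
n_patients = len(sequences)

rows, cells = [], []
for i, seq in enumerate(sequences):
    for period, treat in enumerate(seq):
        patient_dummies = [1.0 if k == i else 0.0 for k in range(n_patients)]
        in_period_2 = 1.0 if period == 1 else 0.0
        on_b = 1.0 if treat == "B" else 0.0
        rows.append(patient_dummies + [in_period_2, on_b])
        cells.append((i, period, treat))

X = np.array(rows)
# beta_hat = (X'X)^{-1} X' y: every estimate is a fixed linear combination of the cells.
w = (np.linalg.inv(X.T @ X) @ X.T)[-1]   # last row: weights defining the B - A estimate

patients = np.array([c[0] for c in cells])
periods = np.array([c[1] for c in cells])
treats = np.array([c[2] for c in cells])

print("weights:", np.round(w, 3))
print("sum per patient:", [round(float(w[patients == i].sum()), 10) for i in range(n_patients)])
print("sum per period: ", [round(float(w[periods == p].sum()), 10) for p in range(2)])
print("sum over B cells:", round(float(w[treats == "B"].sum()), 10))
print("sum over A cells:", round(float(w[treats == "A"].sum()), 10))
```

The weights come out as plus or minus one quarter, i.e. half the difference between the mean within-patient period differences of the two sequences: the matrix algebra simply rediscovers the familiar crossover estimator.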

**Alex:** In your career, when you experienced difficult moments, challenging moments, what was the thing that was keeping you going?

**Stephen Senn:** I have a job where I'm employed to solve difficult problems, and that's a plus and a bonus. You have to be grateful, really. I think I came across statistics by accident, and I feel, which I know is a statistical fallacy, that I was preordained to be a statistician. As I say, I couldn't be a mathematician because I didn't have the background. I studied economics and statistics. What then happened was that I didn't like the economics, so I stopped and looked around for an MSc. I wanted to stay on at university because I'd met my wife by then.

She was still in her second year when I graduated. So I did computing and statistics. I didn't like the computing, so by a process of elimination, I was left with statistics. Then I got my first job and I started really enjoying statistics. But of course, as statisticians, we also know that most of what happens in life is luck.

It's not under your control, however good you think you are at modeling and predicting and whatever. So I have to admit that I've been lucky. So "be lucky" is my advice.

**Alex:** What would be your advice to people who are just starting out in an advanced field like statistics, causality, machine learning or whatever they've chosen, and who may feel that there's so much to learn and are unsure whether they will be able to master all the elements needed to succeed in those fields?

**Stephen Senn:** Well, I think it's good to find concrete problems to work on, so that you get into the habit of solving something which is yours, something that you particularly own.

My PhD was on regression to the mean, and at one particular point in my life I mistakenly assumed that I was the world expert on regression to the mean. I was not, but it was a field I worked in a lot. But don't necessarily stick with that. Look wider, and have something concrete to be going on with at any particular time.

Solving concrete problems, at least for me, has been a good thing. But also spend some time looking around, seeing what else is happening and where you can borrow stuff, maybe interesting results that other people use. For example, N-of-1 trials, which are particularly applicable to personalized medicine.

They are related, in a sense, to the crossover trials that I worked on, so it's a field I have some background in. But N-of-1 trials themselves were proposed in medicine by people at McMaster University, Gordon Guyatt and colleagues of his, who shared a coffee room with psychologists.

And the psychologists were always talking about these N-of-1 studies, and the clinicians could never understand what on earth they were talking about. But the psychologists were essentially talking about testing and retesting individuals, maybe by giving them stimuli, something to do with the Weber-Fechner law, or some sort of stuff like that.

And then it suddenly hit them: that's what they meant. They were randomizing within the same subject, and the clinicians said, oh, we could do this in medicine. And they then started, I think, with amitriptyline for fibromyalgia. They actually started doing this: a number of patients were randomized in different episodes to get either placebo or the active treatment.

So that was essentially a chance encounter, which came about just by looking a little wider, and it led to all of that. So I think: a concrete problem, plus looking around.
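The core of an N-of-1 trial, randomizing treatment episodes within a single patient, can be made concrete with a small schedule generator. This is a minimal sketch assuming a common layout in which episodes come in pairs and the order of active treatment and placebo is randomized within each pair; the function name `n_of_1_schedule` and the number of pairs are hypothetical, and the conversation does not describe the exact layout of the McMaster trials.

```python
import random


def n_of_1_schedule(n_pairs, seed=None):
    """Hypothetical N-of-1 schedule for ONE patient: n_pairs pairs of episodes,
    each pair containing one active and one placebo episode in random order."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_pairs):
        pair = ["active", "placebo"]
        rng.shuffle(pair)          # randomize the order within this pair
        schedule.extend(pair)
    return schedule


print(n_of_1_schedule(3, seed=1))
```

Because both treatments appear in every pair, each pair yields a within-patient comparison of active treatment against placebo.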

**Alex:** What is the one challenge that, if we were able to solve it, would change the face of how we approach clinical trials today?

**Stephen Senn:** Well, I think the practical thing we could do quite easily, although it involves solving a problem of human psychology, is to analyze clinical trials efficiently using what we already know how to do. If you look at a lot of clinical trials, you'll find that the data are dichotomized.

We have responders and non-responders, and not only is such a classification misleading, it also leads to a huge loss in power. We know that in the best of cases, when you dichotomize, the sample size has to be something like 60 percent greater than it would be if you didn't dichotomize. And if you choose a bad cut point, one that is not near the median, you will have a much bigger loss than that.
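The "something like 60 percent" figure can be checked with a standard calculation. Assuming a normally distributed outcome with unit variance and a small treatment shift, the asymptotic efficiency of analysing the proportion of "responders" above a cut point $c$ (in standard deviations from the mean), relative to analysing the continuous outcome, is $\phi(c)^2 / [\Phi(c)(1-\Phi(c))]$, where $\phi$ and $\Phi$ are the standard normal density and distribution function. At the median ($c = 0$) this is $2/\pi \approx 0.64$, so the sample size must be multiplied by $\pi/2 \approx 1.57$. The sketch below uses only the Python standard library; the list of cut points is arbitrary.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def dichotomization_efficiency(cut):
    """Asymptotic efficiency of analysing 'responders' (outcome above `cut`, in SD units
    from the mean) relative to the continuous normal outcome, for a small shift."""
    p = normal_cdf(cut)
    return normal_pdf(cut) ** 2 / (p * (1 - p))


for cut in [0.0, 0.5, 1.0, 1.5, 2.0]:
    eff = dichotomization_efficiency(cut)
    print(f"cut at {cut:.1f} SD from the mean: efficiency {eff:.2f}, "
          f"sample size multiplied by {1 / eff:.2f}")
```

At the median the multiplier is about 1.57; with the cut point one standard deviation from the mean it is already about 2.3, which is the "much bigger loss" for a badly placed cut point.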

The second thing is to use covariate information. If we just put the covariates in the model, we would also get much greater precision and greater understanding. So there are simple modeling steps we could take, which are resisted in some quarters because they're described as being too complex, but they only use theory which is a hundred years old.
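A small simulation illustrates the precision gain from putting a prognostic baseline covariate into the analysis model. It assumes a hypothetical randomized trial of 200 patients with a normally distributed baseline measurement that predicts the outcome (coefficient 0.8, residual standard deviation 0.6) and a true treatment effect of 0.5; these numbers are arbitrary. Both analyses are ordinary least squares; the only difference is whether the baseline is included.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 200
treat = rng.permutation(np.repeat([0, 1], n // 2))   # 1:1 randomization
baseline = rng.normal(0, 1, n)                       # prognostic covariate
outcome = 0.5 * treat + 0.8 * baseline + rng.normal(0, 0.6, n)


def ols_treatment_effect(X, y):
    """OLS estimate and standard error of the coefficient in column 1 (treatment)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], np.sqrt(cov[1, 1])


ones = np.ones(n)
est_u, se_u = ols_treatment_effect(np.column_stack([ones, treat]), outcome)
est_a, se_a = ols_treatment_effect(np.column_stack([ones, treat, baseline]), outcome)
print(f"unadjusted: {est_u:.3f} (SE {se_u:.3f})")
print(f"adjusted:   {est_a:.3f} (SE {se_a:.3f})")
print(f"variance ratio (approximate sample-size factor): {(se_u / se_a) ** 2:.1f}")
```

Under these assumptions the unadjusted analysis would need roughly two to three times as many patients to reach the precision of the adjusted one, with no change to the design at all.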

**Alex:** Why are we not doing this stuff?

**Stephen Senn:** That's a good question. I don't know. We have a lot of talk about sequential trials, flexible designs and increasing efficiency, but this would increase efficiency, very simply, much more than those will. So it's a bit of a mystery why it's resisted, but it is resisted.

I think slowly, very, very slowly, there's an increasing realization that this will help. You still find very strange things, where people try to balance as much as possible on a whole host of covariates, which you can only do dynamically, using a method called minimization.

But they then don't fit the covariates in the model, which is really strange, because if they don't think the covariates are predictive, why balance on them? What's the point? But if they are predictive, then why aren't you using them in the model? So this is a sort of contradiction in thinking.
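Minimization, the dynamic balancing method mentioned here, can be sketched in a few lines. The version below is a minimal, deterministic Taves-style variant with random tie-breaking (Pocock and Simon's formulation also allows the preferred arm to be assigned with a probability below one); the helper name `minimization_assign`, the factors and the levels are all hypothetical. In line with the point above, whatever factors are used for balancing would then also belong in the analysis model as covariates.

```python
import random


def minimization_assign(new_patient, allocated, factors, arms=("A", "B"), rng=None):
    """Taves-style minimization (hypothetical helper).

    new_patient: dict mapping factor name -> level
    allocated:   list of (patient_dict, arm) pairs already allocated
    Returns the arm whose existing patients share the fewest factor levels
    with the new patient, breaking ties at random."""
    rng = rng or random.Random()
    scores = {
        arm: sum(
            1
            for patient, assigned in allocated
            if assigned == arm
            for f in factors
            if patient[f] == new_patient[f]
        )
        for arm in arms
    }
    best = min(scores.values())
    return rng.choice([arm for arm in arms if scores[arm] == best])


# Usage: allocate 20 hypothetical patients described by sex and age group.
rng = random.Random(42)
factors = ["sex", "age_group"]
allocated = []
for _ in range(20):
    patient = {"sex": rng.choice(["F", "M"]), "age_group": rng.choice(["<65", ">=65"])}
    allocated.append((patient, minimization_assign(patient, allocated, factors, rng=rng)))

for f in factors:
    for level in sorted({p[f] for p, _ in allocated}):
        counts = {a: sum(1 for p, arm in allocated if arm == a and p[f] == level) for a in ("A", "B")}
        print(f, level, counts)
```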

**Alex:** Is it a cultural thing?

**Stephen Senn:** I don't know whether it's cultural. It might be a cultural thing, a difference between statisticians and life scientists; I'm not really sure. It's just ingrained habits. You also sometimes find that once people have succeeded in publishing something in a particular way, you're then told that this is the way it has to be done.

There's a very nice paper from about 10 years ago, I think, in Biostatistics, by Christophe Lambert, who is a scientist (there's also an actor called Christophe Lambert, but it's not him). His message is something like: pay attention to experimental design. He looks at some of the early GWAS studies and shows that the way in which they were carried out meant that extensive data curation was necessary in order to deal with all sorts of plate effects.

And if you looked at the two largest principal components of the results, they were actually plate effects and not gene effects. And what everybody copied, at least at the time he wrote his article, was the curation: the extensive curation, rather than simply designing out the problem.

If you had designed properly, you wouldn't have had the problem in the first place; you would have been able to fit the plate effects very simply, orthogonally to the comparison of interest, rather than having to deal with them in this particular way. And yet it became a habit. So who knows?

**Alex:** So this is a problem about understanding the structure of which variables are linked to which other variables in the study, as I understand it.

**Stephen Senn:** Yes, partly. But if you think about any observational study, people tend to make the assumption that the only difference is the difference in the experimental material, which might be, let's say, cases and controls in a case-control study. But actually, if it's biological data, you're also processing the data.

So the question is: are you sending all of the control data to one lab and all of the case data to another lab? And if many labs are involved, are you sure that the assignment is random and orthogonal, or is there in fact some correlation over time? How do you know that there isn't a time factor in the measurement which is correlated with which samples form the cases and which form the controls, unless you are obtaining them simultaneously?

In a double-blind clinical trial, there is no way that this sort of bias could occur. It's simply impossible to send all of the patients given treatment A and all the patients given treatment B to be processed in different labs. All of these things are impossible by the nature of the design, but they become possible in observational studies, and they're overlooked all the time.
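The plate-effect problem can be sketched with simulated data. The sketch assumes a hypothetical study with 200 cases and 200 controls, two processing plates, a plate shift of 1.0 unit, and no true case-control difference at all. In the confounded layout every case goes to one plate, so the plate shift masquerades as a case-control difference and the design matrix cannot separate the two; in the balanced layout each plate receives equal numbers of cases and controls, so plate and status are orthogonal and fitting both recovers a case-control effect near zero.

```python
import numpy as np

rng = np.random.default_rng(7)
n_per_group = 200
plate_shift = 1.0   # hypothetical plate (batch) effect; there is NO true case-control difference

status = np.repeat([0, 1], n_per_group)   # 0 = control, 1 = case


def measure(plate):
    """Simulated assay reading: plate shift plus noise, unrelated to case status."""
    return plate_shift * plate + rng.normal(0, 1, plate.size)


# Layout 1: all controls on plate 0, all cases on plate 1 (plate confounded with status).
plate_confounded = status.copy()
y1 = measure(plate_confounded)
naive_diff = y1[status == 1].mean() - y1[status == 0].mean()
X1 = np.column_stack([np.ones(2 * n_per_group), status, plate_confounded])
print(f"confounded layout: case-control difference {naive_diff:.2f}, "
      f"design rank {np.linalg.matrix_rank(X1)} of {X1.shape[1]} columns")

# Layout 2: each plate gets half the cases and half the controls (plate orthogonal to status).
plate_balanced = np.tile([0, 1], n_per_group)
y2 = measure(plate_balanced)
X2 = np.column_stack([np.ones(2 * n_per_group), status, plate_balanced])
beta, *_ = np.linalg.lstsq(X2, y2, rcond=None)
print(f"balanced layout: case-control effect {beta[1]:.2f}, plate effect {beta[2]:.2f}")
```

No amount of curation after the fact can repair the rank-deficient first layout, whereas in the second the plate effect is simply fitted and removed.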

**Alex:** A title of one of your papers suggests that people who think they are Bayesian might be wrong. Why is that?

**Stephen Senn:** It's because I was much taken by the beauty of the Bayesian method and by the, in many ways, very convincing arguments of a number of leading Bayesian thinkers, but I was unconvinced by some of the examples they then produced.

Their examples contain all sorts of errors which are obvious, I think, to any frequentist, without having to be a Bayesian. So in this particular paper, I take some classic examples that these people have used and show that they couldn't possibly believe in the posterior distribution they calculate, because they are simply using the wrong prior.

It's not a prior that they could own.

**Alex:** Why do we need probability?

**Stephen Senn:** Well, that's a good question. Some people think we don't. I know that Nassim Taleb thinks probability misleads a lot of people. He refers to the ludic fallacy, I think: using games as an analogy. But games are a very well-structured setup in which you could plausibly believe in the probabilities; elsewhere, you don't know the probabilities. And I think he probably believes more in trying to find options, or ways of not being hedged in by probabilistic decision analysis. I think in some cases we nevertheless have no choice but to think probabilistically about things, but who knows, maybe others will eventually prove that it was all an unnecessary fiction.

**Alex:** Are these the cases where we don't have enough information about the system that generates the data?

**Stephen Senn:** Yeah. Well, you could argue that. There's always an argument as to whether bedrock probability exists or not; some people believe in bedrock determinism. I always use what is perhaps a cheap and false argument: well, it may be that you're right that bedrock determinism exists, but if that's the case, it seems to be determined that I don't believe you.

I have no choice but not to believe you, if that's the case. If, on the other hand, you're not right, then there's some value in my not believing you. So what's the point? Why are we really discussing this? You go ahead and believe that if you think that way. But it's obviously a mystery. It's been a mystery in religion as well: if God is almighty, then how can there be free will?

And if there isn't free will, how can there be sin? And then the Calvinist comes to the conclusion that some people were predestined to sin and some were predestined to be saved, and that's the way it is, because anything else would imply that God is not almighty. But it's actually mirrored exactly, I think, by atheists thinking about how the world works.

Why does it matter if it just works the way it does and we can't do anything about it? You know, if everything is preordained, so.

**Alex:** Who would you like to thank, Stephen?

**Stephen Senn:** Who would I like to thank? I would like to thank a number of people who've helped me understand stuff. In particular, Gilbert Rutherford, who was a colleague of mine when I was at Dundee College of Technology.

I learned a lot from him. And Andy Grieve and Amy Racine-Poon, when I was at Ciba-Geigy, as it then was: two Bayesian statisticians who have been very influential in the way I think. I'd also like to thank a lot of people I've worked with and learned from during my time as a statistician.

**Alex:** Where can people learn more about you and your work?

**Stephen Senn:** Following my blogs is probably the best thing. Or follow me on Twitter, although you will find that there are a fair number of rather bad puns you have to put up with, and also some photographs of my hiking trips.

And the occasional photograph of beer. But otherwise I mainly tweet about statistics. I don't know what you say these days; it's no longer Twitter, it's called X. But anyway, there's also my blog site: I have a web page which links to the various blogs I've written on various subjects.

**Alex:** Great. We'll link to this in the show notes so people can find it.

**Stephen Senn:** Okay. Thank you.

**Alex:** Thank you so much, Stephen. It was a pleasure.

**Stephen Senn:** Yeah. Pleasure is mine.

**Marcus:** Congrats on reaching the end of this episode of the Causal Bandits Podcast. Stay tuned for the next one.

**Jessie:** If you liked this episode, click the like button to help others find it.

**Marcus:** And maybe subscribe to this channel as well. You know,

**Jessie:** stay causal.