
P-values Broke Scientific Statistics—Can We Fix Them?

SciShow - 2019-09-11

A little over a decade ago, a neuroscientist found "significant activation" in the neural tissue of a dead fish. While it didn't prove the existence of zombie fish, it did point out a huge statistical problem.

Hosted by: Olivia Gordon

SciShow has a spinoff podcast! It's called SciShow Tangents. Check it out at http://www.scishowtangents.org
----------
Support SciShow by becoming a patron on Patreon: https://www.patreon.com/scishow
----------
Huge thanks go to the following Patreon supporters for helping us keep SciShow free for everyone forever:

Avi Yashchin, Adam Brainard, Greg, Alex Hackman, Sam Lutfi, D.A. Noe, Piya Shedden, KatieMarie Magnone, Scott Satovsky Jr, Charles Southerland, Patrick D. Ashmore, charles george, Kevin Bealer, Chris Peters
----------
Looking for SciShow elsewhere on the internet?
Facebook: http://www.facebook.com/scishow
Twitter: http://www.twitter.com/scishow
Tumblr: http://scishow.tumblr.com
Instagram: http://instagram.com/thescishow
----------
Sources:
https://blogs.scientificamerican.com/scicurious-brain/ignobel-prize-in-neuroscience-the-dead-salmon-study/ 
https://teenspecies.github.io/pdfs/NeuralCorrelates.pdf 
https://www.tandfonline.com/doi/abs/10.1080/00031305.1980.10482701
https://www.phil.vt.edu/dmayo/PhilStatistics/b%20Fisher%20design%20of%20experiments.pdf 
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4850233/ 
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5635437/ 
https://www.investopedia.com/terms/s/samplingerror.asp 
https://www.isixsigma.com/dictionary/null-hypothesis-ho/ 
https://fivethirtyeight.com/features/not-even-scientists-can-easily-explain-p-values/ 
https://www.bmj.com/rapid-response/2011/11/03/origin-5-p-value-threshold 
https://www.tandfonline.com/doi/full/10.1080/00031305.2019.1583913 
https://rpsychologist.com/d3/NHST/ 
https://stattrek.com/online-calculator/binomial.aspx 
https://www.nature.com/articles/d41586-019-00857-9 
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3444174/ 
http://www.haghish.com/resources/materials/Statistical_Methods_for_Research_Workers.pdf 
http://www2.psych.ubc.ca/~schaller/528Readings/CowlesDavis1982.pdf 
https://www.biologyforlife.com/uploads/2/2/3/9/22392738/9587364_orig.jpg 
https://www.merriam-webster.com/dictionary/voxel 
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4333023/ 
https://fivethirtyeight.com/features/you-cant-trust-what-you-read-about-nutrition/ 
https://www.the-scientist.com/opinion-old/the-pressure-to-publish-promotes-disreputable-science-61944 
https://www.ncbi.nlm.nih.gov/pubmed/23845183 
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0003081 
https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002106 
https://www.nature.com/news/psychology-journal-bans-p-values-1.17001 
https://www.amstat.org/asa/files/pdfs/P-ValueStatement.pdf 
https://link.springer.com/article/10.3758/s13423-017-1266-z 
https://www.ncbi.nlm.nih.gov/pubmed/21302025 
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6057773/ 
https://projecteuclid.org/download/pdf_1/euclid.ss/1177011233 
http://www.lifesci.sussex.ac.uk/home/Zoltan_Dienes/Dienes%20How%20Bayes%20factors%20change%20our%20science.pdf 
http://blogs.discovermagazine.com/neuroskeptic/2013/07/13/4129/
------
Images:
https://www.istockphoto.com/vector/atlantic-salmon-fish-gm1078533830-288960355
https://www.istockphoto.com/vector/doctor-scanning-mri-patient-with-mri-scanner-machine-technology-gm539016174-96005265
https://www.istockphoto.com/vector/people-avatars-collection-busines-man-and-business-woman-gm858645762-142037973
https://www.videoblocks.com/video/animation-of-collecting-customer-data-for-big-business-s6doogexvjpmvy3tv
https://www.videoblocks.com/video/pouring-milk-from-porcelain-milk-jug-into-cup-with-tea-slow-motion-nn7dsfqpe
https://www.istockphoto.com/vector/tea-types-illustration-gm607502666-104120687
https://www.videoblocks.com/video/hand-adding-milk-or-cream-from-a-pitcher-to-a-cup-of-tea-with-a-tea-bag-inside---slow-motion-overhead-shot-raqhcp0vqjlqqmbne
https://www.videoblocks.com/video/slow-motion-pouring-milk-milk-pouring-into-glass-jha2jgz
https://www.videoblocks.com/video/rich-big-data-background-bzieugmz-j3332dzb
https://www.istockphoto.com/photo/mugs-with-drinks-gm114452963-10446241
https://www.istockphoto.com/photo/magnetic-resonance-scan-of-the-brain-with-skull-mri-head-scan-on-dark-background-gm1131746758-299766515
https://www.videoblocks.com/video/time-lapse-of-technicians-in-clean-suit-working-in-a-clean-room-4pehwnqpxilweufzb
https://www.istockphoto.com/photo/tea-being-poured-gm542814278-97259275
https://tinyurl.com/y96682hb
https://tinyurl.com/y5dw9nxj

SciShow - 2019-09-11

There is a typo at 7:37! The P-value for 6 tea cups is 0.05, not 0.5. Thanks to everyone who pointed it out!

David Learmonth - 2019-09-17

I've come around on the 1 in 70 chance. Given there are 4 of each sample, I agree it makes sense. Turning it around, I can see the chances of picking the 4 teas with milk added first would be: 4/8 * 3/7 * 2/6 * 1/5 = 1/70
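
The arithmetic in the comment above is easy to check; here is a minimal Python sketch (the 8-cup, 4-and-4 setup comes from the comment, the code itself is purely illustrative):

```python
from math import comb

# 8 cups total, 4 with milk poured first: guessing which 4 were milk-first
# means picking one correct arrangement out of C(8, 4) equally likely ones.
p_all_correct = 1 / comb(8, 4)  # comb(8, 4) == 70, so p == 1/70

# The sequential draw described in the comment gives the same number:
sequential = (4 / 8) * (3 / 7) * (2 / 6) * (1 / 5)

print(p_all_correct)                    # ≈ 0.0142857
print(abs(p_all_correct - sequential))  # effectively 0
```

Both routes land on 1/70 ≈ 0.014, which is below the conventional 0.05 threshold; that is why Fisher's design needed all eight cups.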

ToyKeeper - 2019-09-20

They really should teach Bayesian math pretty early in school. Or maybe not even the math itself, but the general concept... because even just that much goes a long way toward fixing the kind of errors people make. The concept that basically nothing is ever certain, and instead we have a quantified level of uncertainty based on the available but incomplete data, makes it a lot easier for people to accept the idea of changing their minds when they encounter new data.

Making that a thought pattern from an early age would sure be nicer than the whole "double down when challenged" thing.
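
The "update your beliefs as data arrives" idea the comment describes has a one-function core; a minimal Python sketch (the prior of 0.10 and the 0.8/0.2 likelihoods are made-up numbers for illustration):

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' rule: posterior probability of a hypothesis after one observation."""
    num = prior * p_evidence_if_true
    return num / (num + (1 - prior) * p_evidence_if_false)

# Start skeptical, then see three observations that are each 4x more likely
# if the hypothesis is true than if it is false:
belief = 0.10
for _ in range(3):
    belief = bayes_update(belief, 0.8, 0.2)
    print(round(belief, 2))  # 0.31, then 0.64, then 0.88
```

No single observation forces certainty; each one just moves the quantified uncertainty, which is exactly the mindset the comment is arguing for.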

William Parks - 2020-06-06

The fact this is only barely being addressed is statistically significant



Patrick Lewis - 2019-09-11

1 in 20 has always bothered me when I studied statistics in a scientific setting. Any D&D player can tell you just how often a 1 or 20 actually comes up and it's rather more often than 5% sounds like.

Edit:
This blew up a lot more than I expected and people are focusing on the wrong thing. I used D&D because I figure most people who watch these videos are familiar with rolling icosahedrons. The point, though, has nothing to do with dice probability or the cognitive biases around particular results (although, thinking about it, that does speak to p-hacking).

The point I intended is that 5%, especially in a large sample, is quite a lot. If I flood the market with a placebo cure for the common cold and 5% of the 10,000,000 who used it report that it worked, that's half-a-million voices confirming pure nonsense. Cognitive biases being what they are, basically any confirmation can get people to draw the wrong conclusion (e.g., anti-vaxxers), certainly, but a 1-in-20 probability that something is pure chance is rather high odds and this video confirms that it is basically arbitrary.
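
The scale argument above is simple arithmetic; a quick Python sketch, assuming the comment's 5% chance-report rate and 10,000,000 users (the smaller simulated cohort is an invented illustration):

```python
import random

random.seed(42)

n_users = 10_000_000
p_chance_report = 0.05  # fraction who report the placebo "worked" by chance alone

# Expected number of spurious confirmations across the whole market:
print(int(n_users * p_chance_report))  # 500000

# A smaller simulated cohort shows the same rate emerging from pure noise:
cohort = 100_000
reports = sum(random.random() < p_chance_report for _ in range(cohort))
print(reports)  # close to 5,000
```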

Elijah De Pino - 2019-09-27

I had the same thought, "1 in 20, I rely on that happening at least once an encounter, its basically a guaranteed happening."

trabladorr - 2019-10-04

d20 is an icosahedron, while dodecahedron is a d12

Patrick Lewis - 2019-10-04

@trabladorr You are absolutely correct. How silly of me.

Michael B. - 2019-10-25

It is identically 5%; your bias is a result of anecdotal experience, which isn't wrong but is at best limited.

Eli Bullock—Papa - 2021-01-16

Any p-value threshold is subjective. Many researchers these days can't even get enough subjects to get a p-value below .05, so far fewer would be able to conduct proper experiments if the threshold were lower. Switching to Bayesian statistics is NOT a solution, but a preference. Researchers need to be able to decide whether they believe the research or not. The human brain doesn't go through life saying 'oh, there's a 16% chance that could occur, so I'm going to make a different choice'. That's just not how our brains work; there's research on it. I really like this video on how to approach p-values: https://www.youtube.com/watch?v=8wDwcp1EwNM

mhaeric - 2019-09-12

There's something both meta and ironic about a dead fish being used to poke holes in a methodology by a Fisher.

HaloInverse - 2019-09-14

You could also say that he was fishing for data that supported his hypothesis.

I. Wyrd - 2019-09-22

It reminds me of the famous robot and dead herring experiments carried out at the Maximegalon Institute For Slowly And Painfully Working Out The Surprisingly Obvious.

Except that this result wasn't obvious.

Except that, if we were better at actually doing stats and science, it would have been.

K1naku5ana3R1ka - 2020-09-24

ba dum tss xD

WeatherManToBe - 2019-09-17

Just a heads up for everyone; you can tell the difference between milk first vs tea first. If you do milk first, you temper the milk as you pour the tea in, stopping the proteins in the milk from denaturing and clumping together on the top as a skin or foam. (Only concerning freshly brewed tea held in a decent pot staying near boiling point) If milk is added to a near full cup of tea, the first bit of milk gets 'burnt' before the tea is eventually cooled down with additional milk added. If tea is below 82 degrees, there is no difference.

This is the same problem with gluten/eggs/other dairy in sauces. Always add hot stuffs to cold stuff, the slower the better.

Raxle Berne - 2020-05-01

It's amazing, the subtleties there are to be overlooked when studying things. I feel as if I will think of this every time I encounter something with no apparent explanation.

Peter T - 2021-03-10

@Raxle Berne - the point of statistical tests is to see if there's an effect at all, not yet to understand the causation. If it is nearly certain that there's an effect , people are more likely to look into the mechanism of how it works. We shouldn't criticize a statistical test for not doing what it's not supposed to do.

Cody Smith - 2019-09-12

So you could say that the p-value... was born from a tea-value.

Matt Dunkman - 2019-09-13

Cody, it was a result of a Student’s Tea-test.

Uriel Ruíz - 2019-09-15

+

Dornatum - 2019-09-16

Oh my God that makes so much sense

Mark Dodd - 2019-09-17

They kind of tea-bagged the P value

Jonathan Kool - 2019-11-12

Is it worse that there is such a thing as a T value?

Paul A - 2019-09-11

DM: You encounter a feral null hypothesis.
Researcher: I run a study on it!
rolls Critical significant results!

Arthur Williams - 2019-09-12

I find this joke... (rolls d20, checks table) amusing.

Utak - 2019-09-12

rolls a 20
Did the DM see it?
rolls again

Mal-2 KSC - 2019-09-12

I cast Hellish Rebuke as a reaction to discredit the researcher!

Valerie Pallaoro - 2019-09-12

f*ckin excellent!!

Able Reason - 2019-09-20

This is why in some pen and paper system critical rolls only happen with 2 rolls now.

(And why study reproduction is so important)

NitrogenFume - 2019-09-12

All I remember from AP Stats:
"If the p is small, reject the Ho."

Arnab Sarkar - 2019-09-16

Alternatively "If the p is small, reject the H-nought/null/zero". Sorry I am no fun at parties.

ᛋᛒᛖ‍ᚱᚫᛞᚻᛏ - 2019-09-17

@Arnab Sarkar Or reject the Haught.
(H-aught) but I suppose you could argue that aught doesn't technically mean zero...

Jake Zepeda - 2019-09-17

Usually Hos reject me.
I would ask why this Ho has a p, but it IS 2019 afterall.

Anankin12 - 2019-09-20

@Pizza 4Breakfast didn't expect Echoes Act 3 to show up

Caleb Bowron - 2019-11-05

Always did ‘if the p is low’ for some rhyme

SaltpeterTaffy - 2019-09-11

This is one of the best episodes of SciShow I've seen in a long time. No wasted time whatsoever. :D Reminds me of the SciShow of a few years ago, when Hank was flying by the seat of his pants and talking slightly too fast.

Brent Rawlins - 2019-09-11

As a statistician, it is sad to see such a potentially powerful tool be misused so much.

crimfan - 2019-09-15

Dragrath1 That and the overuse of bibliometrics. If you take Impact Factor to its ludicrous conclusion, you shouldn't publish in entire fields. Also, some journals have lower IF but very long tails, or are highly influential outside their homes.

It’s totally an example of Goodhart’s Law in action.

ᛋᛒᛖ‍ᚱᚫᛞᚻᛏ - 2019-09-17

@Jack Linde Wait. Did you just make a lucky guess and the OP just happened to be into D&D?

Dave White - 2019-09-18

Just look at analytics in sports. People with math degrees being hired for management positions of professional sport teams with no experience besides putting a bunch of numbers together. You put together enough numbers you will eventually impress an idiot.

crimfan - 2019-09-19

@Dave White I haven't followed sabermetrics closely, but that seems like an area where big data can be profitably applied due to the fact that there's a lot of information about real success and failure. This is unlike many other applications of big data and predictive analytics, where issues like the fact that there's no data on Type II errors at all, which induces really substantial selection biases.

Ben Wilson - 2019-11-04

Yeah, I think it's more of a problem of people understanding what the results mean and how to carry out an experiment than a problem with p-values themselves

Mandy Amos - 2019-09-11

That is either a very small machine or a very large fish.

Emil Chandran - 2019-09-12

Linda some salmon can be pretty big, especially going back a bit.

Cypher - 2019-09-12

It's both

MrWombatty - 2019-09-14

As they weren't going to cook the salmon, it didn't need to be scaled!

Cilly Honey - 2019-09-20

Salmon can get huge. I've seen and got to partake in eating a five foot long salmon IRL. It fed over fifty people!

Samu Kis - 2020-05-17

@Cilly Honey The ones in lake Galilee are big enough to feed five thousand between two of them, or so says a weird two thousand year old book I have on my shelf.

David Fields - 2019-09-11

P-hacking has been a significant problem in recent years leading to repeatability problems with hundreds of published studies.

David Fields - 2019-09-14

TheRABIDdude It may not be easy to explain here, but this video explains it very well. Basically, it allows for more chances at getting significant results by chance.

https://youtu.be/Gx0fAjNHb1M

A Bhatia - 2019-09-18

@TheRABIDdude one way is to do more studies, cherry-pick the ones that fit your agenda, and only publish those results. Generally the point of a larger sample is to make it realistically representative of a population, but only adding more biased data isn't going to make the study more credible. In fact, it will become more misleading.

TheRABIDdude - 2019-09-20

David Fields Thanks for the link :) I get the impression then that when she said "collect more data" she meant collect more detailed data so that you can do many comparisons/tests, and then choose not to correct for family-wise error in order to greatly raise the chance of finding any significant results to talk about?
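
The "more comparisons, more chances" effect described in this thread is easy to simulate; a minimal Python sketch (the choice of 20 tests, the 5% threshold, and the trial count are illustrative assumptions, not figures from the video):

```python
import random

random.seed(1)

ALPHA = 0.05
N_TESTS = 20      # uncorrected comparisons run on the same pure-noise data
TRIALS = 20_000   # simulated studies

def at_least_one_hit():
    """Does any of N_TESTS null tests come out 'significant' by chance?"""
    return any(random.random() < ALPHA for _ in range(N_TESTS))

rate = sum(at_least_one_hit() for _ in range(TRIALS)) / TRIALS
print(round(rate, 2))  # ≈ 1 - 0.95**20 ≈ 0.64
```

With 20 uncorrected tests, a "significant" finding turns up in roughly two out of three pure-noise studies, which is why family-wise error corrections exist.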

Alberto Chávez - 2019-10-27

@A Bhatia one way to resolve the issue is to ask other scientists with no interest in the study to do it and collect data in different locations or settings and compare results, you may discover something you didn't consider.

A Bhatia - 2019-10-27

@Alberto Chávez that's probably the best answer to that problem. I think the major issue, though, is how science is reported in the media. Plenty of times I've read an article which mentions research but doesn't provide the source, or even has a source but just misrepresents the study. One example that comes to mind was an article saying eating strawberries reduces your risk of a heart attack. Looking at the source, it exclusively sampled female nurses in a certain part of America to get the correlation. Doesn't sound so bad, but I'm pretty sure I've read much worse blunders.

//\\//\\//\\ film - 2019-09-11

I love how petty the origin of p value is. I never heard that story before

Limi V - 2019-09-12

@Marin Reiter I was thinking it could be related to the cup's temperature. If the milk is added first the cup is still cold, but if the tea is added first the cup is very hot when the milk is added so it's surrounded by heat from all sides. This is obviously not a well formulated explanation. My dad loved to do these kinds of experiments with me when I was little because I'm a very picky eater and he didn't believe me that things were different and thought I was just being stubborn. Then, of course, I proved to him I can tell the difference between 3% and 5% white cheese and water from the tap and water that went through a filter (-:

SDestySD - 2019-09-17

\\//\\//

Daniel Jensen - 2019-09-23

To me this is extra funny, because in my highschool statistics class we ran a very similar experiment to see if one of my classmates could tell the difference between green and red bell peppers while blindfolded (he could). And it just happened because some of us were eating lunch in the room before class, I don't think it was planned to be so similar to the origin of p values at all.

gh hg - 2019-11-01

Science has some issues that are unscientific.
F*** me aye?

Jamie F - 2019-11-14

He should have called it the t(ea) value

Greg5MC - 2019-09-11

This video should be mandatory viewing for every science class.

ErroneousTheory - 2019-09-15

Every science class? Every human

PhiendishPhlox - 2020-02-24

Every scientific journal... and science graduate program... and university science department...

Ryback TV - 2019-09-12

Interesting.

Ckasp - 2019-09-11

Ah yes, rolling a critical fail on your research and submitting a false positive

Woolly Rhino - 2019-09-12

Zombie fish are nothing. Years ago I managed to install Linux on a dead badger

Horace Gentleman - 2019-09-12

Ugh i can't even get it installed on this old IBM R60 and you have dead badger Linux.

Woolly Rhino - 2019-09-12

Have you correctly installed the Duppy card?

Horace Gentleman - 2019-09-12

@Woolly Rhino i tried but some kinda sky spirit stopped me

Woolly Rhino - 2019-09-12

Horace Gentleman You were using the correct distro? VuDu Linux from Twisted Faces Software? Or perhaps you were using an unofficial fork?

Horace Gentleman - 2019-09-12

@Woolly Rhino its a totally jank navajo witch doctor distro

Jeff Miller - 2019-09-11

I agree with the two-step process. I hate the idea of killing statistical significance just because some people use it incorrectly, either because they misunderstand it or, much worse but hopefully much rarer, because they are purposely misusing it. I'm boggled by the number of times I have to explain, even to scientists, that you have to set your significance threshold FIRST, typically using similar studies as a guide, THEN analyze the data and interpret the results. Perhaps one solution is more and better teaching of the topic. Amazingly, some fields of graduate study do not require expertise in psychometrics.

PhosphorAlchemist - 2019-09-12

Jeff, related to your original comment about some fields of graduate study not requiring psychometrics: this is also true in quantitative physical sciences. I was not required to take a basic statistics course, and in my graduate program I was discouraged from taking a course on design of experiments -- as an experimentalist! I have made an effort to remedy the deficits of my formal training, but I've seen a worrying level of elementary errors (in stats and bench technique) from fellow scientists during graduate study and as a working professional.

Jonas Kircher - 2019-09-12

As both a student of natural science and a frequent participant in lots of different scientific studies, I am not that convinced of a lot of the experimental setups that some scientists use to reach their conclusions. Obviously any experiment using humans as its subject and especially studies involving human behaviour are extremely difficult to conduct in a way that you can have a clear result. But some of the hypotheses are really rather a stretch in my opinion. We need a new system of publishing, we need a better understanding of what constitutes good research and we need unions for scientific employees. Even if the research process gets slowed down a bit, at least we won't have millions of papers with highly questionable content.

Mikayla Eckel Cifrese - 2019-09-13

@Bender Rodriguez You can't do it now, anonymously?

Alien Ami - 2019-09-14

The problem, it seems to me, is that if you base your p-value threshold on others before you, you have to rely on them not skewing things into bad science... So it's like using bad rulers as an average to make your own ruler. Right?

Dan - 2019-09-28

@Bender Rodriguez the fact you haven't reported or done anything to change it puts you in a position where you may never find work again. Do something about it or walk. Otherwise right now you're violating ethical codes just as much as your boss

Christine Frank - 2019-09-12

Also, statistically significant result or not, always ask what the effect size was.

John Carlton - 2019-09-12

This is probably the best video you guys have done in more than a year.

daniquasstudio - 2019-09-11

7:39 I could be wrong, but don't you mean a p-value of 0.05 on the right side?

jcspider - 2019-09-11

Yup. Ooops.

SciShow - 2019-09-11

Ah, good catch!

MangoTek - 2019-09-11

9:20 is such an AMAZING idea! Kill the drive for success in publishing! Incentivizing skewing results for attention is so bad, and this is definitely the fix for it!

MangoTek - 2019-09-12

@drdca Not at all! I suppose this could look like exaggerated enthusiasm, but I find the idea to be legitimately exciting!

drdca - 2019-09-12

MangoTek Thank you for confirming! I largely agree. Well, I definitely agree that it is promising, less sure that it is the “One True Solution” in practice? Definitely agree that it is a theoretically really nice solution, by entirely bypassing the incentives there, and it would be really cool if it works out well in practice, and there is a good chance that it will.

MangoTek - 2019-09-12

@drdca it may not be perfect, but it's a whole world ahead of what we're doing now. I don't see any downsides that aren't already dramatically worse right now.

drdca - 2019-09-12

MangoTek I think it is likely to work, but let me spitball some potential (potential in the sense of “I can’t rule them out”, not “others can’t rule them out”) issues. This setup would result in a larger number of studies published with null results (and not just interesting null results). Therefore, in order to have the same number of studies with interesting results, this requires a greater total number of studies published.
Reviewing the proposals takes time and effort. If we for some reason cannot afford to increase the amount of effort spent on reviewing papers before publication, and so can’t increase the rate of papers being published (this sounds unlikely? Like, probably not actually a problem), then this would result in a lower rate of papers with interesting and accurate results?
Which, could very well be worth it in order to eliminate many of the false results, but exactly where the trade-off between “higher proportion of published results are correct” vs “higher number of correct published results” balances out, idk.

But yes, I agree it sounds like very good idea, should be tried, hopes it works out.

MangoTek - 2019-09-12

@drdca lies have a much higher cost than flat time and money.

coolsebastian - 2019-09-11

This was a very interesting episode, great job everyone.

Artur Kirjakulov - 2019-09-11

As my supervisor says: statistical significance does not mean biological significance.
You always have to be very very careful interpreting the data and stats. 👍

Singularity as Sublimity - 2019-09-11

A very important topic that not enough people (including scientists) consider. The limitation of p-values focused on in this video is Type I errors (wrongly rejecting the null hypothesis). However, Type II errors (wrongly failing to reject the null hypothesis) are very problematic as well. Let's say you get a p-value of .25, which is well above the threshold set by Fisher. That only means data this extreme would turn up 25 percent of the time if the null hypothesis were true; it does not mean there is a 75 percent probability your result is a fluke, and it certainly doesn't rule out a real effect. Usually this outcome is the result of small sample sizes, but not necessarily, and it can lead researchers to stop considering a legitimate finding that just happened not to meet the p-value criterion, which would also be a shame if we are talking about a potential treatment that could help or save lives. Beyond Bayesian stats, effect-size stats are also very helpful here.
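
The small-sample point can be made concrete with a quick simulation; a hedged Python sketch (the 0.3-SD effect size, the sample sizes, and the use of a known-sigma z-test in place of a proper t-test are all invented for illustration):

```python
import random
import statistics
from math import sqrt, erf

random.seed(0)

def z_test_p(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value for H0: mean == mu0, pretending sigma is known."""
    z = (statistics.fmean(sample) - mu0) / (sigma / sqrt(len(sample)))
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

def power(n, effect=0.3, alpha=0.05, reps=2000):
    """Fraction of simulated studies that detect a real effect of size `effect`."""
    hits = sum(
        z_test_p([random.gauss(effect, 1.0) for _ in range(n)]) < alpha
        for _ in range(reps)
    )
    return hits / reps

print(power(20))   # underpowered: the real effect is usually missed (Type II error)
print(power(200))  # better powered: the same real effect is detected almost always
```

The effect is identical in both cases; only the sample size changes, which is why a non-significant p-value from a small study is weak evidence of no effect.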

Jeff Miller - 2019-09-12

I am always surprised to see how few fields are calculating and publishing effect sizes. I used to think that was the default, rather than the outlier.

Singularity as Sublimity - 2019-09-12

It is completely shocking

entropiCCycles - 2019-09-12

I'm reminded of some summary research in Psychology as a field (it may have been the big replication attempt or some other bit of meta-research), where they found that, for studies that used the often-cited alpha of .05, the power of such tests was about .06.
I'm also reminded of a professor's talk from back in graduate school, where they showed that, with sample sizes common in psychological research, Ordinary Least Squares regression was outperformed not only by equal weights (i.e., every predictor had the same slope term) but by random weights.

Randy Lai - 2019-09-12

The real difficulty is when multiple tests are involved: the interpretation of effect sizes is no longer calibrated. On the other hand, p-values at least can still be adjusted to account for the inflation of Type I error.
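
One standard adjustment of the kind mentioned here is the Bonferroni correction; a minimal sketch (the raw p-values are made up for illustration):

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

raw = [0.003, 0.02, 0.04, 0.3]     # three of four look "significant" at 0.05
adjusted = bonferroni(raw)          # [0.012, 0.08, 0.16, 1.0]
print([p < 0.05 for p in adjusted])  # [True, False, False, False]
```

Bonferroni is deliberately conservative; less strict alternatives (e.g. Holm or false-discovery-rate procedures) trade some of that strictness for power.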

Piguyalamode - 2019-09-26

@entropiCCycles Wow, your line of best fit being worse than random. Impressive!

SuperCookieGaming - 2019-09-11

I wish you could have made this years ago when I was taking statistics. you explained the concept so well. it took me a week to wrap my head around why we used it.

Monty Cantsin - 2019-09-12

When giving a urine sample, you can hack your p-value with eye-drops.



I'll see myself out.

Guy Boo - 2019-09-12

I did it wrong and now my eyes really hurt.

Essero Eson - 2019-09-12

Just to be fair, if you pour a hot liquid into cool milk, you're less likely to curdle the milk.
But if you pour cool milk into a hot liquid, like freshly steeped tea, you are pretty likely to curdle the milk. Someone with a fine enough palate might actually be able to tell the difference.

Jackson Percy - 2019-09-12

Ahh, that makes sense. I should have realised it was about heat transfer with different volumes of liquid. I make my tea with a tea bag in the cup I intend to drink from, so I've never really thought about using milk first. Speaking of which, would it even be possible to brew tea in milk while slowly adding hot water?

Essero Eson - 2019-09-12

@Jackson Percy I think using a tea bag is what the British call a high crime. :D
I make my tea with coffee, hold the tea.
I don't think you could get the milk hot enough to diffuse the tea without curdling it.
The reason you heat the water is to allow space between the water molecules for the particles of tea to get into. In milk that space already has fats, so it would have to be really hot to get the tea properly dispersed in the liquid.
Maybe something like soy milk or almond milk would have enough of a heat tolerance, but everyone better be able to tell the difference in taste then.

Essero Eson - 2019-09-12

@Jackson Percy This is why I should look things up before hypothesizing. You could likely cold brew the tea in milk the same way cold brew coffee is made. Takes a lot longer for things to equalize, but should be possible.

Jackson Percy - 2019-09-12

Thanks for all the cool info! I won't try it as I'm happy with how my tea tastes, but it's always interesting to learn how daily processes like brewing tea actually occurs.

Limi V - 2019-09-12

@Jackson Percy Just put some tea bags in water overnight, then in the morning mix the resulting cold tea with some milk

masterimbecile - 2019-09-11

Statistical significance doesn't necessarily mean actual significance.

masterimbecile - 2019-09-12

@Daniel Sometimes it may be issues with the statistical analyses/experimental design itself. For instance, a truly beneficial drug might not be shown to have a statistically significant result simply due to biased/underpowered sample collection.

masterimbecile - 2019-09-12

@Daniel Other times, maybe the statistical analysis might be looking at the wrong number/ something else could be significant but not accounted for by the researchers.

masterimbecile - 2019-09-12

@Daniel Just remember: the p-value is simply one decision tool at the end of the day, and a damn elegant one at that. Something can be significant and worth pursuing regardless of what the p-value suggests.

Daniel - 2019-09-13

@A A gotcha. That makes sense. But then couldn't you just alter the study design to detect the statistical significance?

masterimbecile - 2019-09-13

@Daniel Yes, but then you're wandering into p-hacking territory. This just speaks volumes to the importance of experimental design and defining research questions. Define your questions/variables first, then use stats, not the other way around.

Overonator - 2019-09-11

Bayesian analysis and effect sizes are the best alternative. This is why we have a replication crisis, why we have so many false positives, and why we have statistically significant results with tiny effect sizes.
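
"Statistically significant with a tiny effect size" is easy to demonstrate; a Python sketch (the group sizes, the 0.02-SD true effect, and the known-sigma z-test are all invented for illustration):

```python
import random
import statistics
from math import sqrt, erf

random.seed(7)

n = 500_000  # per group: enormous by most experimental standards
a = [random.gauss(0.00, 1.0) for _ in range(n)]
b = [random.gauss(0.02, 1.0) for _ in range(n)]  # true effect: 0.02 SD, negligible

diff = statistics.fmean(b) - statistics.fmean(a)
z = diff / sqrt(2.0 / n)  # sigma = 1 is taken as known for the sketch
p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

print(p < 0.05)         # True: highly "significant"
print(abs(diff) < 0.1)  # True: yet the effect is practically nothing
```

With enough data, any nonzero difference becomes "significant", which is why reporting the effect size alongside the p-value matters.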

why do people use sentences instead of nicknames? - 2019-09-12

There's also a problem when the effect could not get above 5% by its very nature so it's basically impossible to study it scientifically.
One obvious field is sports science, where 0.5% effect is the difference between first and last place.

Grimbeard - 2019-09-13

@why do people use sentences instead of nicknames? You have confused effect size (e.g. a 5% improvement in performance) with statistical probability (e.g. a 5% chance that a result in a study was just the result of chance).

why do people use sentences instead of nicknames? - 2019-09-13

@Grimbeard I didn't. In sports science the effect often can't be scientifically separated from statistical noise, even though in raw numbers it's real. Thus, the p-value can't reach the required threshold unless you do some insurmountably ridiculous number of experiments.
And thus we have something like a third of the professional cyclists from developed nations taking Ventolin during races, as if they weren't the most elite, inhumanly healthy athletes but a bunch of sick asthmatics.
And it's totally fine, because anti-doping agencies can't prove (or rather, have plausible deniability about their inability to prove) Ventolin's doping effects scientifically.

Grimbeard - 2019-09-14

@why do people use sentences instead of nicknames? If the 'effect' can't be separated from the noise in the data, there isn't an effect - or the studies have been very poorly designed and run (e.g. with inadequate statistical power).

Sophrosynicle - 2019-11-01

@Adrian Todd I submit a motion to label studies as Fisher-significant and Bayes-significant, alternatively as p-significant and b-significant.

Daniel Ehler - 2019-09-11

Heat changes the flavor of dairy products at relatively low temperatures just the act of the tea being cooled by the cup before mixing can make a subtle change in your tea.

ZARDOZ - 2019-09-12

It makes a difference. But sadly, these days, with the demise of the teapot and the spread of bagged tea, it's no longer convenient. I used to know someone who would add the milk first, then the teabag, and then the hot water. Just no. It was like milky piss.

Valerie Pallaoro - 2019-09-13

Agreed. And I think she got 100% simply because she was good at tasting her tea. And of course, that leads to the question of why he imagined that she could not do this, and so... p-values. However, somehow *and this is truly facepalm territory* his personal subjective view overrode her ability and became statistical science. Damn that! And now we 'reap the benefits' *so much laughter* Yes?
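For the record, Fisher's actual design makes her perfect score easy to quantify: eight cups, exactly four milk-first, and she had to say which four. A quick sketch of the probability of doing that by pure guessing:

```python
from math import comb

# Fisher's lady-tasting-tea design: 8 cups, exactly 4 of them milk-first,
# and the taster must identify which 4. If she were purely guessing,
# every choice of 4 cups out of 8 would be equally likely.
p_perfect = 1 / comb(8, 4)
print(p_perfect)  # 1/70, about 0.014
```

So a perfect score clears even Fisher's own 0.05 threshold, and getting all eight right says rather more about her palate than about luck.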

ZARDOZ - 2019-09-13

@Valerie Pallaoro Seems that he couldn't detect the difference and didn't believe that anyone else could.
Which is a bit dickish, but it's also a bit dickish to refuse to take tea unless it's done that way.
Since it was also a matter of etiquette that the milk was poured first, I believe he felt it was just foolish custom and wanted to challenge the status quo.

B James - 2019-10-24

It's possible you are all overthinking this, depending on how the tea was made. If it was tea made with tea bags in individual cups instead of in a tea pot, milk first makes a big difference to how much flavor transfers from the tea bag into the liquid. Water first results in a stronger brew, I guess due to the flavors being water soluble and not fat soluble.

Jk Im just kidding - 2019-11-07

I think the tea might make the difference. I always go tea first because then the tea tastes more strongly of tea - I am guessing because the leaves can unfold better in just water without any milk molecules.
Maybe she just likes more milky, bland tea.

agnostic deity - 2019-09-12

I would like to point out (in my most pretentious British accent) that adding the milk to a hot or near-boiling cup of tea "shocks" the milk because of the sudden change in temperature, whereas adding the milk first and then the tea raises its temperature slowly, and this (according to my old boss) has an effect on the taste.

Also I have to admire the intelligence of this scientist. That is a very smart way to get a free whole salmon ;-)

Snowyh2o - 2019-09-11

Why couldn’t this come out when I was actually taking statistics? This is literally the last half of the second midterm XD

Narokkurai - 2019-09-11

Good god, that's a satisfying milk pour at 3:49

Eric Rakes - 2019-09-11

Soooo glad to see negative results published in major medical journals these days!

Corliss Crabtree - 2019-09-11

Awesome video. Truly appreciate it. An excellent review of all the things my committee told me when I was doing my dissertation research! I hope you can find a sponsor to discuss sample size and power next.

Chad Chucks - 2019-09-11

Man I clicked this hoping to learn about a fish

andyman aus - 2019-09-12

Statistics seem powerful and conclusive, yet they are one of the most abused tools of some of the Social Sciences, with certain groups using all kinds of statistical tricks and sleight-of-hand tactics to obtain the apparent results they were seeking, in order to prove their own agendas.  This is true even in universities, where you would expect more stringent controls on data collection and dissemination.

A layperson would be well served to read the book, "How to Lie with Statistics" by Darrell Huff, before relying on any opinion based on statistical evidence.

James Nguyen - 2019-09-11

P-Values have basically become an example of reward-hacking.
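The reward-hacking framing can be demonstrated with a toy simulation (a sketch, assuming a known-variance z-test so only the standard library is needed): even when the null hypothesis is true in every "study", about 5% of them still clear p < 0.05 by chance alone.

```python
import math
import random
import statistics

random.seed(1)
nd = statistics.NormalDist()

def z_test_p(a, b):
    """Two-sided p-value for a difference in means between two
    equal-sized groups, treating the standard deviation as known (sigma = 1)."""
    z = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(2 / len(a))
    return 2 * (1 - nd.cdf(abs(z)))

# 1000 "studies" in which the null hypothesis is true by construction:
# both groups are drawn from the same N(0, 1) distribution.
false_positives = sum(
    z_test_p([random.gauss(0, 1) for _ in range(30)],
             [random.gauss(0, 1) for _ in range(30)]) < 0.05
    for _ in range(1000)
)
print(false_positives)  # about 5% of the 1000 null studies
```

Run enough uncorrected comparisons (voxels in a dead salmon, say) and "significant" results are guaranteed, which is exactly the reward being hacked.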

Valerie Pallaoro - 2019-09-13

that's what she said ...

James Nguyen - 2019-09-13

@Valerie Pallaoro That literally does not apply to this comment.

Duck Goes Quack - 2019-09-12

It's hard to paint the world in black and white, with shades of grey.

MarvelX42 - 2019-09-12

"There are three kinds of lies: lies, damned lies, and statistics."

Chino Bambino - 2019-09-12

As someone involved in research myself, yes, the system has its flaws, mainly the triple payout publishing companies get by charging publicly funded researchers to publish their work behind a paywall. However, I don't see the 2-step manuscript submission as a good idea. What if a relatively mundane study is denied publication, yet it established some kind of incredible, unpredictable result? Would this also not lead to a loss in sample size, as journals would stop publishing repeats of past experiments (more than they already do), even though these repeats make the data more reliable?

Shellina Mitchell - 2019-09-11

This is hilarious, I was just talking about this. It was my biggest take away from SSRM classes.

DwarfInBlues - 2019-09-11

The recognition at the end is the true "P" value :)

Love these "science-insider" episodes!

NeoAemaeth - 2019-09-11

A study with multiple comparisons that doesn't use Bonferroni or similar design-dependent corrections is just unscientific fraud. Just do your statistics right and do not treat p-values barely below the 0.05 α-threshold as the holy grail of proof.

Abston N - 2019-09-12

ANOVA one, and ANOVA one

NeoAemaeth - 2019-09-12

@Abston N Well, ANOVA per se is not a multiple-comparison correction. If you have 3 or more experimental groups you would need ANOVA for the group design, but if for each group you also analyze e.g. 10 brain regions, you need a multiple-comparison correction method on top for the 10 regions you are looking at.
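A minimal sketch of what that correction looks like in practice (the p-values below are made up for illustration): with m comparisons, Bonferroni tests each one against α/m instead of α, which holds the family-wise error rate at α.

```python
# Hypothetical p-values from testing 10 brain regions in one experiment.
p_values = [0.003, 0.012, 0.021, 0.048, 0.049, 0.11, 0.23, 0.41, 0.62, 0.87]

alpha = 0.05
m = len(p_values)

# Uncorrected, anything under 0.05 looks "significant" ...
naive_hits = [p for p in p_values if p < alpha]

# ... but Bonferroni requires each test to clear alpha / m (here 0.005)
# to keep the family-wise error rate at alpha across all m comparisons.
corrected_hits = [p for p in p_values if p < alpha / m]

print(len(naive_hits), len(corrected_hits))  # prints: 5 1
```

Of the five nominally significant regions, only one survives. The correction is conservative, which is exactly the point when you are scanning many regions at once.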

emf 303 - 2019-09-12

I wonder how strongly correlated P-hacking and the replication crisis are. I bet it's something like 0.05%

Grimbeard - 2019-09-13

Correlations are not measured in % and a correlation of 0.05 would be extremely low.

emf 303 - 2019-09-13

@Grimbeard
Ya, I was obviously joking. It wasn't meant to be taken seriously or literally. But thanks anyway Dr. Joke D. Stroyer

Tris Stock - 2019-09-11

The statistical probability that Earl Grey tea should be drunk with milk at all is vanishingly small.

Jeff Miller - 2019-09-12

I'm British. That probability is, in fact, quite high, even for those of us who like Picard.

Eyal Kalderon - 2019-09-12

Also ask anyone in southeast Asia or in the Indian peninsula, and you'll find all sorts of milk teas to be exceedingly popular.

Frank Schneider - 2019-09-18

True, as we all know, the only way to properly drink tea is cold, mixed with red bull and ice cubes.

contrarian duude - 2019-09-12

The fish was making "eyes" at me the whole time during the MRI. How do you tell a dead fish I'm just not that into you?

Apiwat Chantawibul - 2019-09-12

After 1:48: "I guess all pictures of tea and tea cups are relevant now." --- The video editor, probably.

Francois Lacombe - 2019-09-11

There are three kinds of lies: lies, damned lies, and statistics.

sdfkjgh - 2019-09-12

Francois Lacombe: I remember my college statistics class.  One of the first things the wild-eyed, wild-haired, gangly, crazy-ass professor told us was that Mark Twain quote, then he explained that by the end of the semester, if he did his job right, we should be able to make the numbers dance for us.  One of the best, most fun teachers I ever had, even if I remember almost none of the subject matter.

Grimbeard - 2019-09-13

Statistics never lie. People lie with statistics.

Gordon Lawrence - 2019-09-15

Statistics do not ever lie. You just get complete idiots who, for example, fail to understand that correlation is not causation. Usually these are writers, and often they work for newspapers. Anyone who knows how the maths works can see through people bending the numbers. The one exception is the MMR-jab-causes-autism fiasco, where it was not the maths that was the problem: it was faked results.

MinimanZero2 - 2019-09-11

Eyyyy read this in a great book called "how not to be smart, the maths of everyday life"