Steady State Cardio as Effective as HIIT

Check out this study by Alwyn Cosgrove, John Berardi, et al.

In a nutshell, trainees on identical strength programmes were given one of the following on top of their weight training:

  1. steady state cardio
  2. sprint intervals
  3. TRX conditioning

Despite all the hoo-ha that HIIT rules, the results show that the steady-state group improved more or less as well as the others!

This is news to you?

What a marketing scam. Those guys have been pushing HIIT for years.

What are they trying to sell? TRX stuff?

Fact: Steady state cardio is how people get lean.

I have nothing against anyone getting well rewarded for their services, but AC has “lost me” recently because of his over-the-top marketing strategies. Even his blog is now too focused on plugging products and telling others how to get rich. He seems to have lost his way.

Anyway, Tom Venuto has an interesting article on HIIT versus traditional steady state cardio.

Based on the title I believe this is the one:

I saw it. It seemed interesting, particularly the dropout rates. I think the unfamiliarity with the TRX had to have been a big factor. Think about it, getting a letter commanding you to do steady-state cardio…you already know what’s in store for you. TRX? What the hell is that? You can’t possibly imagine what you’ll be doing until you do it.

Also, CaliLaw, I never thought I should be using steady-state until recently. Granted, I haven’t been training 20 years, but I’ve been around the block. The low-intensity steady-state cardio while on a restricted energy intake only began to interest me within the last year, particularly after doing my first BB competition. I was always under the impression that HIIT was superior, and that was that.

I’m disappointed that the article failed to touch on how body composition was affected by each. I really don’t give a shit about weight; it’s too blanket of a metric. OH WOW, YOU MEAN TO TELL ME THAT PEOPLE WHO INCREASED THEIR CALORIC EXPENDITURE LOST WEIGHT REGARDLESS OF ACTIVITY TYPE?! Tell me something I don’t already know… I can’t believe no skinfold testing was done… and looking at the variability of results for each method, it is possible that no changes actually took place.

"Now, we don’t have body composition data, as described above. Had we collected those data, perhaps we’d have seen more subtle changes in fat mass and lean mass.

But, truthfully, I doubt it. All three programs included a strength training program and a similar volume of exercise. We have no reason to believe more muscle would have been built and fat lost with any specific intervention."

Tell that to sprinters vs long distance runners. Disappointing they chose to assume here…even if they assumed correctly, that’s bad science.

I truly wish I didn’t have to say this, but actually this is an embarrassingly wrong misapplication of data.

When the results are this…

Table 2: Average weight loss (in pounds) over 8 weeks

                   Male            Female          Combined
Steady-state       -3.4 (+/- 4.4)  -4.9 (+/- 4.0)  -4.0 (+/- 4.1)
Interval cardio    -2.9 (+/- 3.8)  -0.6 (+/- 2.2)  -1.8 (+/- 3.7)
TRX group          +4.2 (+/- 5.1)  -1.1 (+/- 3.2)  -2.8 (+/- 4.5)

… NOTHING can be concluded of any kind because the random variability is more than enough to drown out any actual differences if actual differences existed.

The only thing the study showed is that none of the methods was so bad that the study was able to prove, even in the presence of all this random variability, that it was worse than the others. But it would have had to be unrealistically worse for this study to be able to show it.
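To put a rough number on that, here is a quick simulation sketch of my own (the article doesn’t report exact group sizes after the dropouts, so I’m assuming 12 subjects per group and roughly the 4 lb standard deviations from the table). Even when there is zero real difference between the methods, the group averages spread out about as much as the ones reported:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumptions (mine, not the article's): 12 subjects per group, an identical
# true mean loss of 3 lb for every method, and a 4 lb standard deviation.
n_per_group, true_mean_lb, sd_lb = 12, 3.0, 4.0

largest_gaps = []
for _ in range(10_000):
    # Three groups drawn from the SAME distribution -- no real difference exists.
    group_means = [rng.normal(true_mean_lb, sd_lb, n_per_group).mean()
                   for _ in range(3)]
    largest_gaps.append(max(group_means) - min(group_means))

largest_gaps = np.array(largest_gaps)
print(f"Typical largest gap between group averages: {np.median(largest_gaps):.1f} lb")
print(f"Chance of a gap of 2 lb or more: {(largest_gaps >= 2).mean():.0%}")
```

Under those made-up but plausible numbers, a 2 lb spread between the group averages is roughly a coin flip even when no real difference exists at all.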

Now it COULD be that the steady state group, if there were a larger number of participants, might have averaged say 6 lb “weight loss.” Of course we would know nothing about body composition, but that wasn’t measured.

And it COULD be – so far as anything in the study enables us to know – that the interval cardio group, if there were a larger number of participants, might have failed to lose any weight at all.

So it COULD be, that in terms of “weight loss,” steady-state is far superior.

But on the other hand it COULD be that were there more participants, the steady-state group might have had only say 1 lb of weight loss, while the interval group might have had perhaps 5 lb.

So it also COULD be that the interval method is far superior for “weight loss.”

The data provide absolutely no means of saying that one either is or cannot be far superior to another; or that both are or are not virtually identical in performance.

It just shows NOTHING regarding whether there is a difference, and if so how much, between these methods for even weight loss, let alone body composition. No conclusion should have been given that it was supposedly shown that there is no difference.

This is unfortunately not an isolated example. The same fundamental error is present in many scientific articles, let alone popular-audience articles such as this.

Actually on later reflection the above may not be clear to those not habitually working with that kind of thing. So, trying again:

Let’s suppose – we don’t know, but let’s suppose – that chance averaged out, as it very often does. So while there was a lot of variation, with some people in a group losing much more than others and some not losing anything, let’s suppose that it all balanced out pretty well.

Let’s look at the not-unlikely possibility that each group had a similar proportion of high-responders, average-responders, and low-responders, and that the average for each group is about what it would have been had we had a very large number of people in the group, with chance almost surely averaging out.

Well, if it is so that chance didn’t favor any group over the other, then steady-state gave a 4 lb weight loss, interval about 2 lb, and TRX about 3 lb.

So if we were going to say anything, the most likely thing – as most commonly chance does about average out – is that steady-state beat TRX which beat interval. (For weight loss.)

So why conclude that it was shown that there was NO difference in weight loss?

Now actually we can’t conclude that there is a difference, because one group may, just by chance, have been loaded up with high-responders and another with low-responders.

But most certainly we can’t say this shows there is no difference. Absolutely not.

[quote]Bill Roberts wrote:
Actually on later reflection the above may not be clear to those not habitually working with that kind of thing. So, trying again:
[/quote]

Haha… if they didn’t understand it the first pass, that probably didn’t help them, Bill.

Hmmm…

Another way of saying it is, the authors should have said something like:

"There could be large actual differences in weight loss results between the methods studied, but we did not have enough subjects for it to be proven whether any such large differences exist or not, let alone small ones.

"We saw differences in average results, but these may be due only to chance.

"It also may be that where one method showed poorer average results with our subjects than another, this was due to adverse chance and that method may actually be the better one.

“Essentially, nothing can be concluded from our study as to whether any of these methods are superior to the other for weight loss, or whether they are about the same.”

Great analysis, Bill, but had they posted your hypothetical conclusions they would ultimately be admitting their study was worthless. Certainly they can’t have that.

Well, they did that by including the data.

However, the real issue, IMO, is a very widespread lack of understanding of this matter in science.

Very, very widely, the fact that statistical analysis shows that something is not proven from the data is actually believed by the scientists to mean that it was shown that there is no difference or “probably is no difference.”

I don’t at all think that there was any deliberate desire to hide that the data show nothing as to whether there are or are not weight-loss differences, even substantial ones, between these methods. Rather, the reason for not including statements such as those I suggested is, I am pretty sure, a genuine failure to recognize that that is the situation.

It gets worse, actually: it’s even more widespread that statistical analysis showing that a criterion such as p <= 0.05 was met is taken to mean that there is less than a 5% probability that chance DID cause the difference that was seen, and therefore we should assume that the effect seen most probably is real.

Actually that is not true at all – in many cases the likelihood that chance alone caused the apparent outcome and there is no real effect whatsoever may be 1000 or more to 1: almost no chance that the thing is real, rather than the other way around.

But that is getting much more complicated and isn’t directly relevant.

Anyway, I think very, very few scientists, and most surely not these, knowingly misinterpret what the statistics actually mean. Rather, the misunderstanding is very widespread, and drawing wrong conclusions – such as that this study shows there is probably no difference between these methods – is extremely common and, I am sure, honestly believed by the writers.

Which is why failing to reject the null hypothesis is not the same as accepting it. Unfortunately, in this case they accepted it (which is never justified in statistics).

In other words:

You can definitively conclude that things are different due to a given effect, but you can never definitively conclude that things are the same.
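A quick power calculation makes the same point (a sketch only, with assumed numbers: roughly 12 subjects per group and the ~4 lb standard deviations from the table). Even if steady state genuinely beat intervals by 2 lb over the eight weeks, a study this size would usually fail to detect it:

```python
# Sketch: how often would a study this small detect a real 2 lb difference?
# Assumed numbers (not from the article): 12 subjects per group, SD = 4 lb.
from statsmodels.stats.power import TTestIndPower

true_difference_lb = 2.0
sd_lb = 4.0
cohens_d = true_difference_lb / sd_lb  # standardized effect size = 0.5

power = TTestIndPower().power(effect_size=cohens_d, nobs1=12, alpha=0.05)
print(f"Power to detect a real 2 lb difference: {power:.0%}")
# Comes out around 20%: roughly four times out of five the test "finds nothing"
# even though a real difference exists, which is exactly why a null result
# here cannot be read as "no difference".
```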

I really shouldn’t, but anyone who continues with this post will now be punished with a tedious explanation of how, if the opposite had occurred – the statistics leading the authors to claim that a real difference probably existed – that too would in many cases be unjustified.

Most scientists and readers believe that when a difference in effect is seen, and the statistical analysis says that the difference is to, for example, p < 0.05, this means that there is less than a 5% probability that random chance, rather than real effect, caused the measured difference in effect.

But that is not at all what it means! It has nothing to do with that.

Rather, it is dealing with a completely different matter: if we generated two sets of numbers differing from the same value, by chance alone, in the same manner that the test data was seen to randomly vary, what percent of the time would chance alone produce the apparent effect that was seen?
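That definition can be written out almost literally as a quick Monte Carlo sketch (every number below is invented purely for illustration):

```python
# Invented numbers, just to make the definition concrete: we "measured" a 2 lb
# gap between two groups of 12, with roughly 4 lb of subject-to-subject noise.
import numpy as np

rng = np.random.default_rng(1)
observed_gap_lb = 2.0
sd_lb, n = 4.0, 12

trials = 100_000
# Generate pairs of groups that differ by chance alone (same true mean of zero).
gaps = (rng.normal(0, sd_lb, (trials, n)).mean(axis=1)
        - rng.normal(0, sd_lb, (trials, n)).mean(axis=1))

p_like = (np.abs(gaps) >= observed_gap_lb).mean()
print(f"Chance alone produces a gap at least this big {p_like:.0%} of the time")
```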

For example let’s say we are medicinal chemists. We’ve taken on the really poor chance-of-success idea of randomly trying isolated compounds from previously unknown plants and seeing if they seem to increase the lifespan of rats.

The company we work for has been doing this for decades now. 1000 compounds have been run and, on giving really thorough investigation to each one that looked hopeful, only one has really panned out as solid.

But we keep going. We are now considering Compound X.

We ran it with, oh I don’t know, 16 rats. They lived an average of 33 months with a standard deviation of, oh I don’t know, 72 days.

Our established placebo results are average of 30 months, with standard deviation of, I don’t know, say 65 days.

We run the statistics, which I’m not actually going to do, and hey!! That 3-month difference is statistically significant at, let’s say, exactly p = 0.05! In other words, only 5% of the time would chance alone yield an apparent improvement as large as 3 months!

Hooray! Let’s publish the article!

But wait a sec, is there really only a 5% probability that in the actual experiment, chance alone was the cause?

By no means. We already know that only 1 time in 1000 (roughly, as our best information) does a so-selected compound work to increase lifespan of rats.

But for every 1000 experiments we do, by chance alone on average 50 of them will turn up an apparent improvement as large as this!

So for every real effect, there are 50 others that will appear merely by chance but still looking just as solid and satisfying the p <= 0.05 requirement.

So the reality is that the chances are 50 to 1 against this compound actually having this effect, rather than a 95% chance that it does have this effect that most readers and scientists would assume.
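For anyone who wants it spelled out, the arithmetic behind that “50 to 1” takes only a few lines (same invented setup as above: a 1-in-1000 prior, the p = 0.05 criterion, and the optimistic assumption that a genuinely active compound is always detected):

```python
experiments = 1000
prior_real = 1 / 1000   # our best information: 1 in 1000 such compounds works
alpha = 0.05            # false positives expected under chance alone
power = 1.0             # optimistic: assume a real effect is always detected

true_positives = experiments * prior_real * power           # ~1
false_positives = experiments * (1 - prior_real) * alpha    # ~50

print(f"Expected true positives:  {true_positives:.1f}")
print(f"Expected false positives: {false_positives:.1f}")
print(f"Odds against a 'significant' hit being real: about "
      f"{false_positives / true_positives:.0f} to 1")
```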

This is far more important than one might think. This fundamental error results in a lot of nonsense being taken as being demonstrated.

When tens or hundreds of thousands of quite-unlikely-to-have-a-real-effect experiments are done each year in the world, and there most certainly are at least this many, there are inevitably vast numbers that claim outcomes as “statistically significant” when in fact the claimed effect does not exist.

Different situation from the article, as there it was claimed that a difference probably did not exist when that could not correctly be concluded from the data, but the same fundamental thing of coming to undemonstrated or wrong conclusions because of misunderstanding statistical fundamentals in ways that are practically epidemic among scientists, doctors, readers in general, etc.

[quote]Bill Roberts wrote:
I truly wish I didn’t have to say this, but actually this is an embarrassingly wrong misapplication of data.

When the results are this…

Table 2: Average weight loss (in pounds) over 8 weeks

                   Male            Female          Combined
Steady-state       -3.4 (+/- 4.4)  -4.9 (+/- 4.0)  -4.0 (+/- 4.1)
Interval cardio    -2.9 (+/- 3.8)  -0.6 (+/- 2.2)  -1.8 (+/- 3.7)
TRX group          +4.2 (+/- 5.1)  -1.1 (+/- 3.2)  -2.8 (+/- 4.5)

… NOTHING can be concluded of any kind because the random variability is more than enough to drown out any actual differences if actual differences existed.

The only thing the study showed is that none of the methods was so bad that the study was able to prove, even in the presence of all this random variability, that it was worse than the others. But it would have had to be unrealistically worse for this study to be able to show it.

Now it COULD be that the steady state group, if there were a larger number of participants, might have averaged say 6 lb “weight loss.” Of course we would know nothing about body composition, but that wasn’t measured.

And it COULD be – so far as anything in the study enables us to know – that the interval cardio group, if there were a larger number of participants, might have failed to lose any weight at all.

So it COULD be, that in terms of “weight loss,” steady-state is far superior.

But on the other hand it COULD be that were there more participants, the steady-state group might have had only say 1 lb of weight loss, while the interval group might have had perhaps 5 lb.

So it also COULD be that the interval method is far superior for “weight loss.”

The data provide absolutely no means of saying that one either is or cannot be far superior to another; or that both are or are not virtually identical in performance.

It just shows NOTHING regarding whether there is a difference, and if so how much, between these methods for even weight loss, let alone body composition. No conclusion should have been given that it was supposedly shown that there is no difference.

This is unfortunately not an isolated example. The same fundamental error is present in many scientific articles, let alone popular-audience articles such as this.
[/quote]

This is all true, I agree.

It is too bad, because they started by matching subjects, which was really good and often not done. But then the dropouts in the SS condition ruined that.

However, more importantly, what I would like to see in a small study like this is within-subjects data. Averaging subjects together is ridiculous. Especially with the reported variability.

It is quite possible that if we could look at the actual data, we could come up with a better conclusion for this experiment. For example, in each group we might see that some people had a pattern of results much worse than others, to the point where you could guess that these people simply blew off the workouts. Given the reported variability in the group averages, I would expect to see individual data like this. So you would throw those out. Then you’d look at each trio of matched subjects’ data, one trio at a time. And you’d look for other outliers and find out why they happened.

In studies like this, take the time to look at the actual results for individuals and figure out what they mean before you start computing means and standard deviations.
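That kind of individual-level look doesn’t take much. Here is a sketch with invented per-subject numbers (the real study’s individual data aren’t available), just to show the shape of the check:

```python
# Sketch with invented data: look at individual results before averaging.
# 'trio' labels the matched subjects; 'change_lb' is the 8-week weight change.
import pandas as pd

data = pd.DataFrame({
    "trio":      [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "method":    ["SS", "HIIT", "TRX"] * 3,
    "change_lb": [-6.0, -3.5, -4.0,    # trio 1
                  -5.5, -2.0, -3.0,    # trio 2
                  +0.5, +6.0, -3.5],   # trio 3: the +6 lb looks like a blow-off
})

# Arbitrary screening rule for illustration: flag anyone whose change is more
# than 5 lb away from the median of their own matched trio, then go find out why.
trio_median = data.groupby("trio")["change_lb"].transform("median")
data["flag_for_review"] = (data["change_lb"] - trio_median).abs() > 5

print(data.sort_values("trio"))
# Only after resolving the flagged cases would group means and SDs be worth computing.
```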

I agree. Actually (at least in my opinion) this is the sort of thing where the “non-scientific” opinion of experts such as the authors, based on things that they would have to count as being unusable for publication, actually means a lot more than scientifically designed attempts such as this.

I am certain that in person Berardi, Cosgrove, etc. are quite aware, for each person they work with, of factors such as you describe. While not “scientifically” arrived at, they will have opinions, based on what they have seen, on how these approaches compare with one another – not only for sheer weight loss in a given limited time, but for effects on body composition in a similar time, and effects on body composition over longer periods.

It wouldn’t be “science,” but really the conclusions they’d draw that way would have a lot more weight.

Well, that would be unavoidable as the conclusions that could be drawn from the scientific approach and this number of subjects amounted to zero, so I don’t mean to condemn with faint praise: I am sure their observations over time amount to really worthwhile opinions. Actually I don’t know Cosgrove at all (just lack of knowledge) but for Berardi I’d have absolutely no doubt of that.

Largely because of being able, non-scientifically, to make adjustments such as what you are talking about, and also because a lot more cases wind up going into the judgment.

But such is not publishable as scientific findings, because they aren’t.

On potentially cleaning up the data: This winds up being problematic (at best) unless one establishes criteria beforehand. It is fine to have established beforehand that if subjects do this or that, then their data will be excluded and so forth.

But collecting the data and then after having it, making decisions on what to include and what to exclude can readily result in bias, even unconscious bias, creating results out of nothing.

Sort of like running an election where there are problems in different counties or precincts: according to whether the grand total comes out with the guy you like winning or not, you decide whether another re-re-recount is needed, using different rules limited to specific counties of your choice where you think the new rules or methods would do better for your guy, or whether it’s all done and we go with the total we now have. That would not be science. You really have to do it according to methods established beforehand, rather than methods established in response to the data in order to turn non-significance into significance or to produce any other change in outcome.

[quote]Bill Roberts wrote:
I really shouldn’t, but anyone who continues with this post will now be punished with a tedious explanation of how, if the opposite had occurred – the statistics leading the authors to claim that a real difference probably existed – that too would in many cases be unjustified.

Most scientists and readers believe that when a difference in effect is seen, and the statistical analysis says that the difference is to, for example, p < 0.05, this means that there is less than a 5% probability that random chance, rather than real effect, caused the measured difference in effect.

But that is not at all what it means! It has nothing to do with that.

Rather, it is dealing with a completely different matter: if we generated two sets of numbers differing from the same value, by chance alone, in the same manner that the test data was seen to randomly vary, what percent of the time would chance alone produce the apparent effect that was seen?

For example let’s say we are medicinal chemists. We’ve taken on the really poor chance-of-success idea of randomly trying isolated compounds from previously unknown plants and seeing if they seem to increase the lifespan of rats.

The company we work for has been doing this for decades now. 1000 compounds have been run and, on giving really thorough investigation to each one that looked hopeful, only one has really panned out as solid.

But we keep going. We are now considering Compound X.

We ran it with, oh I don’t know, 16 rats. They lived an average of 33 months with a standard deviation of, oh I don’t know, 72 days.

Our established placebo results are average of 30 months, with standard deviation of, I don’t know, say 65 days.

We run the statistics, which I’m not actually going to do, and hey!! That 3-month difference is statistically significant at, let’s say, exactly p = 0.05! In other words, only 5% of the time would chance alone yield an apparent improvement as large as 3 months!

Hooray! Let’s publish the article!

But wait a sec, is there really only a 5% probability that in the actual experiment, chance alone was the cause?

By no means. We already know that only 1 time in 1000 (roughly, as our best information) does a so-selected compound work to increase lifespan of rats.

But for every 1000 experiments we do, by chance alone on average 50 of them will turn up an apparent improvement as large as this!

So for every real effect, there are 50 others that will appear merely by chance but still looking just as solid and satisfying the p <= 0.05 requirement.

So the reality is that the chances are 50 to 1 against this compound actually having this effect, rather than a 95% chance that it does have this effect that most readers and scientists would assume.

This is far more important than one might think. This fundamental error results in a lot of nonsense being taken as being demonstrated.

When tens or hundreds of thousands of quite-unlikely-to-have-a-real-effect experiments are done each year in the world, and there most certainly are at least this many, there are inevitably vast numbers that claim outcomes as “statistically significant” when in fact the claimed effect does not exist.

Different situation from the article, as there it was claimed that a difference probably did not exist when that could not correctly be concluded from the data, but the same fundamental thing of coming to undemonstrated or wrong conclusions because of misunderstanding statistical fundamentals in ways that are practically epidemic among scientists, doctors, readers in general, etc.
[/quote]

(1) Your analysis is correct. Very good.

(2) Since each person has so many variables ‘attached’ to them, I really don’t care for studies like this. Age, years of training, what diet each person is on, and so forth.

It’s simply better to choose what you enjoy and have fun with it.

Bill Roberts is a good man. I’m glad to see you posting more often lately Bill, you always add to the discussion in a positive way.

I knew something smelled fishy when I first went through the article a few days ago, and this is probably what it was. You obviously have more rigorous science training than I, so you were able to sniff it out better.

[quote]Headhunter wrote:

(2) Since each person has so many variables ‘attached’ to them, I really don’t care for studies like this. Age, years of training, what diet each person is on, and so forth.

It’s simply better to choose what you enjoy and have fun with it.

[/quote]
Agreed.

This is also pretty much my response – or my thoughts, since I usually don’t respond – to debates about dieting.

It’s more important that something suit a person individually, even if for no reason other than personal taste and psychological preference, than whether, for people on average, it may or may not be a few percent superior in some regard to some other diet.

E.g., if a person really just is not going to do a given form of dieting, but would enjoy another, how does it benefit them to push them towards the one they won’t stick with? Or if they like a diet that I think really could be better, but they like it, stick with it, and do well with it, why should I persuade them that something else is better? If the person is not a competitive athlete then even if I am correct about the difference, better as you say that they do what they enjoy and make progress with.

[quote]Lonnie123 wrote:
I knew something smelled fishy when I first went through the article a few days ago, and this is probably what it was. You obviously have more rigorous science training than I, so you were able to sniff it out better.[/quote]

Actually, you would be surprised where I learned the part about the real meaning of p < 0.05, and how in fact the correct conclusion may be that the thing is still, even with the new data, highly unlikely to be from a real effect.

From amateur interest in parapsychology, which is very rigorous in statistics.

Not from 4 years of graduate school, or from the usual undergrad statistics course, or in the process of learning the additional statistics required for some of the papers I co-authored. Though it wouldn’t have made a difference in those cases.

Really, scientists aren’t taught that, or at least such is my understanding of what is typically taught and believed. They use the numbers, but don’t know that the numbers in themselves – unless of truly low values that are never seen in the biological sciences – don’t actually say anything about the likelihood that the effect was real.

I had, while in grad school, a faint degree of awareness of the real meaning of p values, but only in a theoretical way, without ever the concrete realization that, in and of themselves – again, unless of astoundingly low values – they say nothing about the likelihood that the data in question arose from the claimed potential cause rather than from chance alone.

However, in fairness to my degree program, in chemistry p values in the vast majority of cases simply aren’t an issue. It is not a matter of chance as to whether the reaction went or not, whether the NMR spectrum was as follows or not, etc. So it’s really not important to a chemistry program, for the most part anyway.

This took a pleasant turn.

You might be interested (probably just Bill) that a lot of physiology texts now want not only a p-value but also 95% confidence intervals for mean differences between groups or changes over time, and usually effect sizes as well (the mean difference relative to the standard deviation). You often see a study where the improvement was “significant”, such as a 10% improvement with a low-intensity fitness program. However, when you calculate a small effect size (less than 0.2, for example) and then evaluate the “clinical significance” of the finding (a 10% fitness increase when VO2 is only 30 ml/kg/min is less than 1 MET – that is nonsense), you can better interpret your results.
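As a concrete example of that reporting style (the weight changes below are invented), the summary looks something like this:

```python
# Invented example of reporting a mean difference with a 95% CI and effect size,
# not just a p-value. Two groups' 8-week weight changes (lb), made up here.
import numpy as np
from scipy import stats

group_a = np.array([-6.0, -5.5, -4.0, -3.0, -2.5, -7.0, -1.5, -4.5])
group_b = np.array([-2.0, -0.5, -3.5, -1.0, +1.5, -2.5, -0.0, -1.5])

diff = group_a.mean() - group_b.mean()
pooled_sd = np.sqrt(((len(group_a) - 1) * group_a.var(ddof=1)
                     + (len(group_b) - 1) * group_b.var(ddof=1))
                    / (len(group_a) + len(group_b) - 2))
cohens_d = diff / pooled_sd  # standardized effect size

se = pooled_sd * np.sqrt(1 / len(group_a) + 1 / len(group_b))
dof = len(group_a) + len(group_b) - 2
ci = diff + np.array([-1, 1]) * stats.t.ppf(0.975, dof) * se

t, p = stats.ttest_ind(group_a, group_b)
print(f"Mean difference: {diff:.1f} lb, 95% CI [{ci[0]:.1f}, {ci[1]:.1f}]")
print(f"p = {p:.3f}, Cohen's d = {cohens_d:.2f}")
```

With numbers like these the reader can see at a glance not just whether p crosses 0.05, but how big the effect plausibly is and whether it matters clinically.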

I do agree with you that a large majority of schools do not adequately teach the scientific process (edit: this was a terrible sweeping generalization – a lot of schools just don’t have time in the curriculum to get into the nitty-gritty of statistical analysis and interpretation).

HOWEVER, this is also the difference in rigor between submitting to a poor journal that doesn’t pay any attention to your stats and a stricter academic journal that is insanely anal about the appropriateness of your stats.

Another interesting aspect you don’t see dealt with in a lot of lower-level journals is correction of the p-value for multiple comparisons (Bonferroni’s adjustment is commonly used). Making lots of different comparisons without such a correction inflates the chance of finding something “significant” purely by chance (to sum it up briefly).
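A quick illustration of why that matters, with made-up p-values from, say, six pairwise comparisons in one study: run enough tests and something will slip under 0.05 by chance, and Bonferroni simply scales the bar by the number of tests.

```python
# Made-up p-values from six pairwise comparisons in a single study.
from statsmodels.stats.multitest import multipletests

raw_p = [0.012, 0.049, 0.20, 0.33, 0.41, 0.74]

reject, adjusted_p, _, _ = multipletests(raw_p, alpha=0.05, method="bonferroni")

for raw, adj, sig in zip(raw_p, adjusted_p, reject):
    print(f"raw p = {raw:.3f} -> Bonferroni-adjusted p = {adj:.3f}"
          f"  {'significant' if sig else 'not significant'}")
# With six comparisons, even the raw p = 0.012 becomes 0.072 after adjustment,
# so nothing in this invented set clears the 0.05 bar.
```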