Lies, damned [transcranial electrical stimulation] and statistics

Brain zap. Credit: Patrick Hoesly, on Flickr.


A recent editorial in the journal Nature warns against the risks of the “Brain Blast” potentially associated with readily accessible brain-enhancement technologies like transcranial direct current stimulation (tDCS). tDCS involves passing an electric current across the surface of the skull in order to excite or inhibit brain activity within the regions of cerebral cortex that lie just beneath it. The concerns voiced in Nature echo those recently raised by brain enhancement enthusiasts (ethicists), who caution that the risks of unbridled access to these technologies may outweigh the apparent benefits of cheap and reliable brain function enhancement.

What none of these individuals discuss is whether it is right to promote the effectiveness of a technique with such uncertain practical utility. Indeed, the question of safety becomes moot if the technique does not produce meaningful benefits. Moreover, by sounding the “safety” alarm prematurely, and emphasizing the purported effectiveness of this technique, well-meaning ethicists may actually promote, rather than dampen, public enthusiasm for conducting “do-it-yourself” (DIY) brain booster experiments.

For example, writing in the Journal of Medical Ethics, Fitz and Reiner note:

“[It] seems that many normal functions—working memory, numerical competence, risk-taking behaviour and more—can be either enhanced or enfeebled by tDCS.”

The Nature editors also note that “random electrical stimulation of the brain could improve mathematical abilities”, citing a study which I recently debunked for its claim that transcranial electrical stimulation effectively boosts arithmetic learning. This study’s purported positive finding relied upon misleading interpretations of null data, rather than encouraging results.

However, one poor study does not debunk an entire field. I intend to write a few follow-up pieces to investigate whether there is, in fact, evidence that transcranial electrical stimulation can produce practically relevant performance enhancements on tasks such as “working memory, numerical competence, and risk-taking behaviour”.

For the second of this series, I will discuss one of the first reports indicating that numerical abilities can be improved through the use of tDCS.

Kadosh et al, 2010

Back in 2010, Kadosh and colleagues (note: Kadosh is a co-author on the Snowball et al paper I recently discussed) conducted initial experiments intended to demonstrate that tDCS produces performance enhancements during numerical learning (freely accessible link). Taken at face value, they provided robust evidence in support of this hypothesis (from the abstract):

“The specificity and longevity of TDCS on numerical abilities establishes TDCS as a realistic tool for intervention in cases of atypical numerical development or loss of numerical abilities because of stroke or degenerative illnesses.”

Powerful claims, to say the least.

In the present study, Kadosh et al recruited 15 college students to learn an artificial number system over a period of six days. The artificial numbers look like abstract shapes and bear no resemblance to the Arabic numerals that westerners are familiar with. However, each artificial numeral has a specific value, equivalent to one of the numbers 1 through 9 in the Arabic system (below).

Artificial numbers, 1-9. Credit: Kadosh et al., 2010


On each day, participants were shown randomly selected pairs of the artificial numbers for a total of 1584 trials, over a period of up to 2 hours. Upon presentation of each pair, participants were to judge which of the two values had the greater magnitude. Immediate feedback was provided, indicating whether the answer was correct or incorrect, but otherwise participants were not given any information regarding the value of each numeral.

During the first 20 minutes of this task, the experimenters stimulated broad regions of the parietal cortex (the right parietal lobe is believed to play an important role in numerical competence) while participants viewed the number pairs. The remainder of the 2 hour period was spent continuing with the same task in the absence of stimulation. After the two hours were complete, each participant completed tests of “number sense abilities”, including the numerical Stroop task and a number-to-space mapping task.

Three different stimulation protocols were administered to the 15 participants, who were divided into equal groups of five. One group received anodal stimulation (excitation) to the right parietal cortex and cathodal stimulation (inhibition) to the left parietal cortex (RA/LC); a second group received right cathodal and left anodal stimulation (RC/LA); and a final “sham” group received stimulation to each parietal lobe for only 30s, instead of the full 20 minutes. Thus, the authors predicted that the RA/LC group (excited right parietal cortex) would show enhanced mathematical learning (“enhancement group”) relative to the sham and RC/LA (inhibited right parietal cortex, “impairment group”) participants.

Numerical competence tests

Numerical Stroop

The primary test of numerical competence relies upon a variant of the famous “Stroop Effect”. The traditional Stroop task is a reaction time test in which participants respond to simple color words that have been systematically printed in different colors (e.g. “red”, “green”, “blue”). Words printed in a color that does not match their meaning (incongruent, e.g. “green” printed in red) take more time to process than words printed in their own color (congruent, e.g. “red” printed in red). The difference in reaction time between congruent and incongruent trials can be measured and serves as the quantitative basis for the Stroop Effect.

In this version of the numerical Stroop paradigm, participants view pairs of numbers printed in different physical sizes. For example, on a congruent trial, the physical size of the two numbers increases with their magnitude (e.g. a small 2 next to a large 4), whereas on incongruent trials the sizes are reversed (e.g. a large 2 next to a small 4). On neutral trials the same number is presented twice in different sizes (e.g. a small 2 next to a large 2).

Individuals with numerical competence tend to process incongruent trials more slowly than neutral trials (interference) and neutral trials more slowly than congruent trials (facilitation). The reaction time difference between incongruent and congruent trials can also be measured (congruity). Overall, the order of reaction times is as follows: incongruent > neutral > congruent. In contrast, individuals who lack numerical competence (e.g. young children, innumerate adults who grew up in societies lacking numeracy) tend to display reaction time differences that are smaller or absent (e.g. low magnitude of facilitation, interference and congruity).


Number-to-space mapping

The second test of numerical competence tested the ability of participants to map numbers to the “space” along a number line. Participants were shown a number line with the artificial numbers with the smallest and largest values (e.g. “1” vs “9”) as anchor points, and over a series of trials they were asked to place each of the remaining numbers at a location between the anchors (e.g. with distance from anchors proportionate to value). Numerically competent individuals tend to map numbers linearly along the line (equal distances for each number), whereas young children and numerically incompetent adults tend to map numbers logarithmically.

Given that each of these measures has been shown to correlate with numerical competence in past studies, the authors hypothesized that these measures would provide a good index of how well participants had mastered the new number system.

Numerical Stroop: results

The authors report that on all tasks of numerical competence the enhancement group performed better than the impairment group, whereas the sham group fell somewhere in between–providing direct support for their hypothesis.

However, I don’t find this interpretation believable. Let’s go over their results.

Keep in mind that I will focus my comments on the comparison between the sham group and the enhancement (RA/LC) group, as this is the only comparison that can demonstrate performance enhancements—the impairment (RC/LA) group, on the other hand, may demonstrate performance impairments as a result of right parietal inhibition, independent of performance enhancements seen in the RA/LC group.

In their discussion of the results, the authors only report the congruity effect, but they abstain from reporting on either facilitation or interference effects, which is a little weird (more on that below).

In their statistical analysis of congruity effects, the authors find a significant interaction between group (enhancement, impairment, sham), session (days 2 through 6) and congruity (reaction time difference between congruent and incongruent trials). Post-hoc analyses indicate a barely significant congruity effect for the enhancement group (p=.035) and a non-significant congruity effect for the sham group (p=.12). When they analyzed results by day, congruity effects were significant for the enhancement group on days 4-6 and in the sham group on days 5-6.

Even if we take the statistical assumptions of this analysis at face value, the authors show no difference between the sham and enhancement groups. The fact that one group shows a significant effect and another group does not tells us nothing about the relative performance of the two groups (in fact, given the small sample size, it suggests the study lacked the power to detect these effects very well). In order to demonstrate a performance enhancement relative to the sham/control group, the authors need to compare the two groups directly, which they fail to do. (By the way, don’t feel bad if you missed this technical distinction: a 2011 report indicated that roughly half of all papers published in top neuroscience journals reporting this type of comparison make the same error!)

So when the authors infer that the enhancement group performs better than sham they are clearly making an error, but it is a common (albeit, extremely basic) error after all.
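To make the point concrete, here is a minimal sketch in Python. The per-participant congruity effects are numbers I made up for illustration (they are not taken from Kadosh et al.), and the helper names are my own. The critical values are the standard two-sided 5% cut-offs for Student's t with 4 and 8 degrees of freedom.

```python
import math
import statistics

T_CRIT_DF4 = 2.776  # two-sided 5% critical value, 4 degrees of freedom
T_CRIT_DF8 = 2.306  # two-sided 5% critical value, 8 degrees of freedom


def one_sample_t(effects):
    """t statistic for 'is the mean effect different from zero?'"""
    se = statistics.stdev(effects) / math.sqrt(len(effects))
    return statistics.mean(effects) / se


def two_sample_t(a, b):
    """Pooled-variance t statistic for 'do the two group means differ?'"""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se


# Hypothetical congruity effects in ms, five participants per group.
enhancement = [10, 90, 40, 75, 35]
sham = [-20, 95, 30, 70, 5]

t_enh = one_sample_t(enhancement)
t_sham = one_sample_t(sham)
t_diff = two_sample_t(enhancement, sham)

print(f"enhancement vs zero: t = {t_enh:.2f}, significant: {abs(t_enh) > T_CRIT_DF4}")
print(f"sham vs zero:        t = {t_sham:.2f}, significant: {abs(t_sham) > T_CRIT_DF4}")
print(f"enhancement vs sham: t = {t_diff:.2f}, significant: {abs(t_diff) > T_CRIT_DF8}")
```

With these made-up numbers, the enhancement group clears the threshold, the sham group does not, and yet the direct comparison between the two groups is nowhere near significant. Only that last test speaks to whether one group outperformed the other.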

Additionally, on the final training day, the authors asked all participants to perform the numerical Stroop task, but using the numbers 1 to 9 instead of the artificial numbers. This time, they find no significant interaction across any of the groups, but instead a consistent significant effect of congruity across all groups. This suggests that all participants have numerical competence on the western number system.


Figure 2. Numerical Stroop task reaction time performance. Congruity effects (incongruent minus congruent trials) are apparent for both sham and enhancement (RA/LC) groups for either Arabic or artificial numerals. Congruity effect magnitude does not differ for either group across the two numeral systems. Credit: Kadosh et al., 2010

Visual inspection of the data (see Figure 2, above) indicates that this difference between the two numeral systems is due to better performance in the impairment group (RC/LA) with Arabic numerals, whereas the sham and enhancement (RA/LC) groups have unchanged congruity effects.

This is consistent with the above point: even if we assumed that the sham group were performing worse on the numerical Stroop task than the enhancement group when using the artificial number system (which the authors do not provide evidence to suggest), this would appear to indicate that the sham group performs slightly worse overall, even when using Arabic numerals. Thus, even if there were a meaningful performance difference, it would be just as likely to reflect baseline differences in numerical competence.

The authors also reported the “cumulative” congruity effect across all days (EDIT: the authors only count cumulative data for days which were “significant”, meaning they exclude data that didn’t cross the threshold, thus artificially magnifying their effect; thus, the graph they use is misleading, but I don’t report it here). For both the sham and enhancement groups, there is a congruity effect, but the authors don’t attempt to make any kind of statistical comparison between these groups here either.


Table 1. Numerical Stroop results, raw data. Session: learning trial days 2-6. Daily average reaction times (RT) for enhancement (RA/LC), impairment (RC/LA) and sham groups.

For clarification, I have also reproduced the supplementary table (Table 1, above) that the authors provide in their original publication and an additional rough calculation of facilitation, interference and congruity effects (Table 2, below) for each day as well as cumulatively across all days (Note: I only include reaction times in this calculation; standard errors have been excluded, but it looks roughly the same as what the authors find).

In brief, the data are pretty messy (as is expected) and don’t really tell any clear story, which is probably why the authors only reported congruity effects. We can see that there is no facilitation effect for any group (and potentially reverse facilitation effects in some groups, which is of unclear importance). Meanwhile, there are clear interference effects for all groups. Feel free to try and reproduce the authors’ graphs seen in Figure 2 from the raw data I present in the table–I can’t seem to get them to match up.


Table 2. Facilitation, interference and congruity calculations. Calculations made by subtracting reaction times (ignores SEM). Facilitation = congruent – neutral; interference = neutral – incongruent; congruity = incongruent – congruent. Cumulative values summed across all days.
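The subtraction behind these three measures can be sketched in a few lines of Python. The reaction times below are hypothetical, and the function name is my own. Note one assumption: I subtract so that a positive number means the effect runs in the expected direction (incongruent > neutral > congruent), which flips the subtraction order that the Table 2 caption gives for facilitation and interference.

```python
def stroop_effects(congruent, neutral, incongruent):
    """Return the three Stroop measures from mean reaction times (ms).

    Sign convention (an assumption of this sketch): positive values mean
    the effect goes in the expected direction.
    """
    return {
        "facilitation": neutral - congruent,     # congruent faster than neutral
        "interference": incongruent - neutral,   # incongruent slower than neutral
        "congruity": incongruent - congruent,    # incongruent slower than congruent
    }


# Hypothetical mean reaction times for one group on one day.
effects = stroop_effects(congruent=700.0, neutral=705.0, incongruent=760.0)
print(effects)
```

Whatever the sign convention, congruity is simply facilitation plus interference. A group can therefore show a sizeable congruity effect that is driven almost entirely by interference, with little or no facilitation, as in this made-up example.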

A major review paper on the topic, cited by the present study in support of their interpretation of the results, indicates that:

“Controls showed both facilitation (response to congruent trials faster than to neutral trials) and interference (response to neutral trials faster than to incongruent trials), whereas DD subjects showed a pattern similar to children at the end of first grade, that is, a lack of facilitation and a smaller overall effect.”

So basically: numerically competent individuals demonstrate facilitation and numerically incompetent individuals (e.g. those with developmental dyscalculia, or DD) don’t.

In the present study, none of the participants demonstrated facilitation. Thus, are any of the participants numerically competent with the artificial number system? It’s unclear if this is the case. On the other hand, all groups showed interference, so this suggests at least some potential numerical competence with this system in every group.

In summary:

The authors never directly compare the sham group to the enhancement group (RA/LC) and when we look at the raw data it’s unclear that participants in any group demonstrate a high level of numerical competence or clear evidence of numerical incompetence. Thus, even when taking the results of this study at face value, it is not possible to demonstrate performance enhancements in normal individuals on the numerical Stroop task as a result of right parietal excitation using tDCS.

Number-to-space mapping: results

On the number-to-space task, the authors report that the sham and impairment groups didn’t map the artificial numbers as well as the enhancement group. However, as before, they don’t make any direct comparisons between the sham and enhancement groups.

Nonetheless, the authors argue that “[b]rain stimulation also affected the performance in the number-to-space task.”

The authors find that the number-to-space data from the enhancement group best fits a linear function, despite the fact that “all studies that have documented the log-to-linear shift involved populations that showed linear mapping due to extensively learned material”. Thus, the authors infer that the enhancement group has undergone a rapid shift from logarithmic to linear spatial mapping with this new number system as a result of the tDCS treatment.

However, this claim is a bit disingenuous. The people in this study are college students! When tested on the Arabic numeral system, all of the groups map numbers with near perfect linear precision to a number line. So why should we expect them to need extensive learning to start mapping artificial numbers in the same way? They should already be primed to learn this relationship quickly.

Still, the authors’ contention would be true if the sham and impairment groups failed to learn how to map the artificial numbers in a linear fashion. This is exactly what the authors claim.

They report that the other two groups’ data are best fit by a logarithmic function. But this is also dubious. The data may fit a log function better than a linear function, but by visual inspection, it’s also clear to me that they would fit a linear function very well. Thus, the appropriate test of this hypothesis is to compare the linear fits across all three groups, which the authors never do. This is extremely suspicious, because the relative comparison is what we care about! The authors are trying to make a categorical distinction (e.g. linear vs logarithmic), but the categories are similar enough to be indistinguishable.
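A quick way to see how close the two categories are: take positions that follow a perfectly logarithmic mapping and ask how well a straight line fits them. The sketch below is my own illustration, not the authors' analysis. The idealized positions and the helper `r_squared` are assumptions, and real data would add noise on top.

```python
import math


def r_squared(xs, ys):
    """R^2 of the least-squares line predicting ys from xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)


values = list(range(1, 10))  # numerals 1 through 9

# Idealized logarithmic placement on a line running from 0 (at "1") to 1 (at "9").
log_positions = [math.log(v) / math.log(9) for v in values]

r2_linear = r_squared(values, log_positions)
r2_log = r_squared([math.log(v) for v in values], log_positions)

print(f"linear fit to log-mapped positions: R^2 = {r2_linear:.3f}")
print(f"log fit to log-mapped positions:    R^2 = {r2_log:.3f}")
```

For nine numerals, a straight line explains about 91% of the variance in perfectly logarithmic positions. So "fits a log function better" and "fits a linear function very well" can both be true at once, which is why a best-fit label per group says little without a direct comparison of fits across groups.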

In summary:

All three groups demonstrate a number-to-space mapping that appears to fit a linear function quite well. Because the authors never attempt to compare the strength of fit across groups, their analysis cannot tell us anything about whether one group has increased numerical competence relative to another.

Six month follow-up

Six months later, the authors asked participants in the enhancement group only to come back–all but one of them did so (n=4). They found that these participants still demonstrated a congruity effect (they don’t report raw data, however) and that they continued to map the artificial numbers to a line in a way which was best predicted by a linear function. The authors didn’t bother to call back any of the other participants, so it’s not possible to compare performance across groups—which means that because we lack a sham control, we don’t know anything about whether or not there was an enhancement in performance at follow-up.

Nonetheless, the authors conclude:

“[T]he current results show that noninvasive brain stimulation can not only impair such capacities but can also enhance numerical abilities with remarkable longevity.”


Kadosh et al (2010) claim that they have established tDCS as a method which induces long-lasting performance enhancements in normal individuals, and moreover, that this method can be extended to individuals who lack numerical abilities because of brain damage or atypical development.


However, there are numerous flaws in the authors’ claims.

I have discussed how the authors never appropriately compare the effects of tDCS when applied to stimulate the right parietal cortex to a sham control group. In fact, the very data that Kadosh et al (selectively) report suggest that these two groups are fairly similar in learning performance. I will go on the record to state that the authors must have been well aware of the fact that their comparisons were insufficient (it is, I am sorry to say, patently obvious). Moreover, even if true enhancement effects had been demonstrated at baseline learning, the authors drop the sham group from long-term follow-up, thereby undermining the usefulness of these data.

This discussion still ignores the law of small numbers: statistical inferences based upon tiny samples don’t tell us very much. For most of the above discussion, I have ignored this fact in order to demonstrate that the study itself is flawed, regardless of its inherent statistical limitations (which are probably even more important). The authors only have fifteen participants in total, with a final follow-up sample size of n = 4. Even if we combine this sample of four with the sample of 12 included in the 6-month follow-up of their more recent study, the authors only have a 6-month sample size of n=16.

Best practice requires that each group have close to twice as many participants as this entire study in order for researchers to have sufficient power to actually detect moderately sized effects. The effect sizes here are small, which means that the authors are very unlikely to be able to detect real effects (a good example: the failure to detect a congruity effect in the sham group). Thus, these data tell us virtually nothing about whether or not the authors’ favoured hypothesis is true.
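For a rough feel of what five participants per group buys you, here is a small simulation sketch. Everything in it is an assumption for illustration: normally distributed scores, equal variances, a true difference of half a standard deviation (Cohen's d = 0.5, a conventional "moderate" effect), and a pooled-variance t-test judged against the standard two-sided 5% critical value for 8 degrees of freedom.

```python
import math
import random
import statistics

N_PER_GROUP = 5     # group size used in the study
T_CRIT_DF8 = 2.306  # two-sided 5% critical value, 8 degrees of freedom


def pooled_t(a, b):
    """Pooled-variance t statistic for two equal-sized groups."""
    pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2
    se = math.sqrt(pooled_var * 2 / len(a))
    return (statistics.mean(a) - statistics.mean(b)) / se


def estimate_power(effect_size, n_sims=2000, seed=0):
    """Fraction of simulated experiments that detect a true effect."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)]
        treated = [rng.gauss(effect_size, 1.0) for _ in range(N_PER_GROUP)]
        if abs(pooled_t(treated, control)) > T_CRIT_DF8:
            hits += 1
    return hits / n_sims


power = estimate_power(effect_size=0.5)
print(f"estimated power with n = {N_PER_GROUP} per group: {power:.2f}")
```

Under these assumptions the estimate should come out far below the conventional 80% target, meaning that most such experiments would miss an effect that really exists.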

(Here is a great video to get a sense for just how easy it is for studies of this type–actually, much better studies–to miss real effects.)

In the future, I intend to report more on the quality of evidence for the use of similar techniques to improve other aspects of human performance or as therapeutic treatments. If I find indications for which such techniques are effective, I will be sure to update this series as well.

Cohen Kadosh R, Soskic S, Iuculano T, Kanai R, & Walsh V (2010). Modulating neuronal activity produces specific and long-lasting changes in numerical competence. Current Biology, 20 (22), 2016-20. PMID: 21055945

