[R] Appropriate specification of random effects structure for EEG/ERP data: including Channels or not?
paolo.canal at iusspavia.it
Mon Sep 28 10:42:34 CEST 2015
Thank you Phillip,
And sorry for being late with my response. Thanks for the reference; I
believe that many of the published papers on ERPs with mixed models
describe the analysis in too little detail (even when looking beyond
linguistics journals).
Concerning your comments: I understand that nuisance factors are not
random effects. At the same time, the EEG amplitude is recorded from
several electrodes, which are units of observation around which the data
are clustered. In the paper you suggested they write:
"In simplified models, by-channel random slope parameters were estimated
at zero, resulting in failures to converge to an optimal solution. This
likely reflects the limited variance in effects across the selected
centro-parietal channels due to volume conduction. Therefore, random
slopes of effects across channels were not fit in final models." This
observation is not far from your point about adjacent electrodes being
very highly correlated with each other, but I guess this would emerge in
the random-effects covariance matrix only when asking for the
calculation of by-channel random slopes or intercepts for some of the
predictors. They therefore used an intercepts-only term, calculating
adjustments to the intercepts only, and I would have done the same, just
to increase fit, because I would say 1|ch better describes the data
structure to lmer. But for pragmatism, or necessity, or parsimony, I'll
likely forget about this (I am already trying to fit more than 80
parameters in the random structure, so I do not want to be too strict
about the inclusion of channels).
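For what it's worth, the intercepts-only term I have in mind would look
something like this in lmer syntax (all variable names are placeholders,
not the actual ones from my data):

```r
# Sketch: by-channel intercept adjustments without random slopes.
# All names (amplitude, agreement, noun, subject, item, channel) are
# illustrative placeholders.
f <- amplitude ~ agreement * noun +
  (1 + agreement * noun | subject) +
  (1 + agreement | item) +
  (1 | channel)  # intercept-only: each channel shifts as a whole
```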
I'll keep your suggestion about modeling the topographic factors using
the XYZ scalp coordinates in mind for future studies, when I have a
better grasp of mixed models and some more computational power ;).
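Just to fix the idea for myself, the coordinate-based alternative might
look like this (the coordinate variables and all other names here are
hypothetical):

```r
# Sketch: continuous scalp coordinates instead of categorical topographic
# factors. 'x' and 'y' would be per-electrode coordinates derived from
# the 10-20 layout; all variable names are invented for illustration.
f <- amplitude ~ agreement * noun * poly(x, 2) * poly(y, 2) +
  (1 + agreement | subject) +
  (1 + agreement | item)
```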
Thanks again for all your insights (I will refer to the special interest
group in the near future).
On 28/09/2015 04:17, Phillip Alday wrote:
> You might also want to take a look at the recent paper from the Federmeier group, especially the supplementary materials. There are a few technical inaccuracies (ANOVA is a special case of hierarchical modelling, not the other way around), but they discuss some of the issues involved. And relevant for your work: they model channel as a grouping variable in the random-effects structure.
> Payne, B. R., Lee, C.-L., and Federmeier, K. D. (2015). Revisiting the incremental effects of context on word processing: Evidence from single-word event-related brain potentials. Psychophysiology.
>> On 24 Sep 2015, at 22:42, Phillip Alday <Phillip.Alday at unisa.edu.au> wrote:
>> There is actually a fair amount of ERP literature using mixed-effects
>> modelling, though you may have to branch out from the traditional
>> psycholinguistics journals a bit (even just more "neurolinguistics" or
>> language studies published in "psychology" would get you more!). But
>> just in the traditional psycholinguistics journals, there is a wealth of
>> literature, see for example the 2008 special issue on mixed models of
>> the Journal of Memory and Language.
>> I would NOT encode the channels/ROIs/other topographic measures as
>> random effects (grouping variables). If you think about the traditional
>> ANOVA analysis of ERPs, you'll recall that ROI or some other topographic
>> measure (laterality, sagittality) are included in the main effects and
>> interactions. As a rule of thumb, this corresponds to a fixed effect in
>> random effects models. More specifically, you generally care about
>> the particular levels of the topographic measure (i.e. you care
>> whether an ERP component is located left-anterior or what not), and
>> this is what fixed effects test. Random effects are more useful when you only
>> care about the variance introduced by a particular term but not the
>> specific levels (e.g. participants or items -- we don't care about a
>> particular participant, but we do care about how much variance there is
>> between participants, i.e. how the population of participants looks).
>> Or, another thought: You may have seen ANOVA by-subjects and by-items,
>> but I bet you've never seen an ANOVA by-channels. ANOVA "implicitly"
>> collapses the channels within ROIs and you can do the same with mixed
>> models. (That's an awkward statement technically, but it should help
>> with the intuition.)
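To make the "implicit collapsing" concrete: you could average channels
within each ROI cell before fitting. A toy sketch in base R (all names
and the layout are invented for illustration):

```r
# Toy sketch: average channels within each ROI cell before modelling,
# mirroring what the traditional ANOVA does with cell means.
# All column names and levels are made up.
set.seed(1)
dat <- expand.grid(
  subject = paste0("s", 1:2),
  item    = paste0("i", 1:3),
  roi     = c("ant_left", "ant_right"),
  channel = paste0("ch", 1:2)
)
dat$amplitude <- rnorm(nrow(dat))

# Collapse over channel: one mean amplitude per subject x item x ROI cell
roi_means <- aggregate(amplitude ~ subject + item + roi,
                       data = dat, FUN = mean)
```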
>> There is another, related important point -- "nuisance parameters"
>> aren't necessarily random effects. So even if you're not interested in
>> the per-electrode distribution of the ERP component, that doesn't mean
>> those should automatically be random effects. It *might* make sense to
>> add a channel (as in per-electrode) random effect, if you care to model
>> the variation within a given ROI (as you have done), but I haven't seen
>> that yet. It is somewhat rare to include a per-channel fixed effect,
>> just because you lose a lot of information that way and introduce more
>> parameters into the model, but you could include a more fine-grained
>> notion of sagittal / lateral location based on e.g. the 10-20 system and
>> make that into an ordered factor. (Or you could be extreme and even use
>> the spherical coordinates that the 10-20 is based on and have continuous
>> measures of electrode placement!) The big problem with including
>> "channel" as a random-effect grouping variable is that the channels
>> would have a very complicated covariance structure (because adjacent
>> electrodes are very highly correlated with each other) and I'm not sure
>> how to model this in a straightforward way with lme4.
>> More generally, in considering your random effects structure, you should
>> look at Barr et al (2013, "Random effects structure for confirmatory
>> hypothesis testing: Keep it maximal") and the recent reply by Bates et
>> al (arXiv, "Parsimonious Mixed Models"). You should read up on the GLMM
>> FAQ on testing random effects -- there are different opinions on this
>> and not all think that testing them via likelihood-ratio tests makes sense.
>> That wasn't my most coherent response, but maybe it's still useful. And
>> for questions like this on mixed models, do check out the R Special
>> Interest Group on Mixed Models. :-)
>> On Thu, 2015-09-24 at 12:00 +0200, r-help-request at r-project.org wrote:
>>> Message: 4
>>> Date: Wed, 23 Sep 2015 12:46:46 +0200
>>> From: Paolo Canal <paolo.canal at iusspavia.it>
>>> To: r-help at r-project.org
>>> Subject: [R] Appropriate specification of random effects structure for
>>> EEG/ERP data: including Channels or not?
>>> Message-ID: <56028316.2050004 at iusspavia.it>
>>> Content-Type: text/plain; charset="UTF-8"
>>> Dear r-help list,
>>> I work with EEG/ERP data and this is the first time I am using LMM to
>>> analyze my data (using lme4).
>>> The experimental design is a 2x2: one manipulated factor is agreement,
>>> the other is type of noun (agreement being within subjects and items,
>>> and type of noun being within subjects and between items).
>>> The data matrix is 31 subjects * 160 items * 33 channels. In ERP
>>> research, the distribution of the EEG amplitude differences (in a time
>>> window of interest) is important, and we care about knowing whether a
>>> negative difference is occurring in Parietal or Frontal electrodes. At
>>> the same time, information from a single channel is often too noisy,
>>> and channels are organized in topographic factors for evaluating
>>> differences in distribution. In the present case I have assigned each
>>> channel to one of the three levels of each of two factors, i.e.,
>>> Longitude (Anterior, Central, Parietal) and Medial (Left, Midline,
>>> Right): for instance, one channel is Anterior and Left. With
>>> traditional ANOVAs, channels from the same level of the topographic
>>> factors are averaged before variance is computed, and this also has
>>> the benefit of reducing the noise picked up by the individual channels.
>>> I have trouble deciding the random structure of my model. Very few
>>> examples of LMM on ERP data exist (e.g., Newman, Tremblay, Nichols,
>>> Neville & Ullman, 2012) and little detail is provided about the
>>> treatment of channel. I feel it is a tricky term but very important to
>>> optimize fit. Newman et al say "data from each electrode within an ROI
>>> were treated as repeated measures of that ROI". In Newman et al, the
>>> ROIs are the 9 regions deriving from Longitude X Medial (Anterior-Left,
>>> Anterior-Midline, Anterior-Right, Central-Left ... and so on), so in a
>>> way they treated each ROI separately and not according to the crossed
>>> dimensions of Longitude and Medial.
>>> We used the following specifications in lmer:
>>> [fixed effects specification: µV ~ Agreement * Noun * Longitude * Medial
>>> * (cov1 + cov2 + cov3 + cov4)] (the terms within brackets are a set
>>> of individual covariates, most of which are continuous variables)
>>> [random effects specification: (1+Agreement*Type of Noun | subject) +
>>> (1+Agreement | item) + (1|longitude:medial:channel)]
>>> What I care the most about is the last term
>>> (1|longitude:medial:channel). I chose this specification because I
>>> thought that allowing each channel to have different intercepts in the
>>> random structure would affect the estimation of the topographic fixed
>>> effects (Longitude and Medial) in which channel is nested. However,
>>> a reviewer commented that since "channel is not included in the fixed
>>> effects I would probably leave that out".
>>> But each channel is a repeated measure of the EEG amplitude inside the
>>> two topographic factors, and random terms do not have to be in the
>>> fixed effects structure -- otherwise we would also include subjects and
>>> items in the fixed effects structure. So I kind of feel that including
>>> channels as a random effect is correct, and having them nested in
>>> longitude:medial allows me to relax the assumption that the effect in
>>> the EEG always has the same longitude:medial distribution. But I might
>>> be wrong.
>>> I thus tested differences in fit (ML) with anova() between the model
>>> with (1|longitude:medial:channel), the same model without the term, and
>>> a third model with a simpler (1|longitude:medial) term.
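To illustrate why these two grouping factors differ only in granularity,
here is a toy layout (the channel assignments are invented, not my real
montage):

```r
# Toy illustration: with channels nested in ROIs, longitude:medial:channel
# has one level per channel, while longitude:medial has one level per ROI.
# 18 invented channels, 2 per ROI (3 longitude x 3 medial levels):
channel   <- factor(paste0("ch", 1:18))
longitude <- factor(rep(c("Anterior", "Central", "Parietal"), each = 6))
medial    <- factor(rep(rep(c("Left", "Midline", "Right"), each = 2), 3))

nlevels(interaction(longitude, medial, channel, drop = TRUE))  # 18
nlevels(interaction(longitude, medial, drop = TRUE))           # 9
```

So both specifications estimate a single variance component, but the
nested term allows one intercept adjustment per channel rather than per
ROI.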
>>> Fullmod vs Nochannel:
>>> Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
>>> modnoch 119 969479 970653 -484621 969241
>>> fullmod 120 968972 970156 -484366 968732 508.73 1 < 2.2e-16 ***
>>> Differences in fit are remarkable (no variance components with values
>>> close to zero; no correlation parameters with values close to ±1).
>>> Fullmod vs SimplerMod:
>>> Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
>>> fullmod 120 968972 970156 -484366 968732
>>> simplermod 120 969481 970665 -484621 969241 0 0 1
>>> Here the number of parameters to estimate in fullmod and simplermod is
>>> the same, but the increase in fit is very substantial (-509 BIC). So I
>>> guess that although the chi-square is not significant we do have a
>>> strong increase in fit. As I understand this, a model with better fit
>>> will give more accurate estimates, and I would be inclined to keep the
>>> fullmod random structure.
>>> But perhaps I am missing something or I am doing something wrong. What
>>> is the correct random structure to use?
>>> Feedback is very much appreciated. I often find answers in the
>>> archives, and this is the first time I post a question.