机器算法验证 - 有没有办法指定一个具有多个主体内因素的 lme 模型？ - 吾爱随笔录

数据

假设我们有一个数据集d，其中包含两个主体间因素（即组）group和condition，以及两个主体内因素（即重复测量因素），topic以及problem（我将数据上传到 pastebin，所以每个人都应该能够获得它）：

> d <- read.table(url("http://pastebin.com/raw.php?i=4hRFyaRj"), colClasses = c(rep("factor", 6), "numeric"))
> str(d)
'data.frame':   2928 obs. of  6 variables:
  $ code     : Factor w/ 183 levels "A03U","A08C",..: 1 1 1 1 1 1 1 1 1 1 ...
  $ group    : Factor w/ 2 levels "control","experimental": 2 2 2 2 2 2 2 2 2 2 ...
  $ condition: Factor w/ 3 levels "alternatives",..: 3 3 3 3 3 3 3 3 3 3 ...
  $ topic    : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 2 2 2 2 3 3 ...
  $ problem  : Factor w/ 4 levels "AC","DA","MP",..: 3 4 1 2 3 4 1 2 3 4 ...
  $ mean     : num  94.5 94.5 86.5 84.5 80 46.5 73.5 43.5 51 39 ...

数据来自一项行为实验，其中六组（2 个级别group乘以 3 个级别condition）的参与者完成 16 项任务（每个任务 4 topics4 不同problems）。将参与者分配到组/条件是完全随机的。任务的呈现是随机的，因为问题被阻止在主题内（即，对于每个主题，所有问题都按顺序呈现），但问题和主题的顺序是随机的。
更新：识别参与者的因素（其中嵌套了主题和问题）是code。

问题

如何使用拟合这个数据集lme？
（旁注：我也会考虑使用lme4，但我有点害怕没有 p 值，如果有一些容易消化的 p 值，我也会考虑lme4一个选项）。

到目前为止，我设法拟合了一个lme只有一个主题内因素的模型，但不是两个（见下文）。

我试过的

lme如果我只有一个主题内因素，我可以拟合模型：

require(nlme)
 m1 <- lme(mean ~ condition*group*problem, random = ~1|code/problem, 
           data = d, subset = topic == "1")

anova(m1)
                        numDF denDF F-value p-value
(Intercept)                 1   531   12101  <.0001
condition                   2   177      31  <.0001
group                       1   177       2  0.2178
problem                     3   531      35  <.0001
condition:group             2   177       1  0.3672
condition:problem           6   531      24  <.0001
group:problem               3   531       1  0.2180
condition:group:problem     6   531       2  0.0281

这（尤其是 df）与标准 ANOVA（使用 ez）的结果很好地对应：

require(ez)
ezANOVA(subset(d, topic == "1"), dv = .(mean), wid = .(code), between = .(condition, group), within = .(problem))$ANOVA

Warning: Data is unbalanced (unequal N per group). Make sure you specified a well-considered value for the type argument to ezANOVA().
                   Effect DFn DFd     F                             p p<.05     ges
2               condition   2 177 30.69 0.000000000003611248905859672     * 0.13079
3                   group   1 177  1.53 0.217821969825403999321267179       0.00374
5                 problem   3 531 34.85 0.000000000000000000014254103     * 0.10028
4         condition:group   2 177  1.01 0.367225806638525886782531416       0.00492
6       condition:problem   6 531 24.40 0.000000000000000000000000142     * 0.13503
7           group:problem   3 531  1.48 0.217959293081550348203379031       0.00472
8 condition:group:problem   6 531  2.38 0.028119961573665430004664856     * 0.01499

尝试用两个主题内因素拟合此数据lme失败（每个代码或每个 dfs）：

m2 <- lme(mean ~ condition*group*problem*topic, random = ~1|code/(problem*topic), data = d)
# fails: Error in getGroups.data.frame(dataMix, groups) : 
#  Invalid formula for groups

m3 <- lme(mean ~ condition*group*problem*topic, random = ~1|code/problem/topic, data = d)
# the next model takes some time (probably already an indicator, that it is the wrong model)
# and produces wrong denominator df!

# with both factors as ANOVA
m4 <- ezANOVA(d, dv = .(mean), wid = .(code), between = .(condition, group), within = .(problem, topic))

#effects are the same
all(row.names(anova(m3))[-1] == m4$ANOVA$Effect)

#denominator dfs are not:
anova(m3)$denDF[-1] == m4$ANOVA$DFd

# only for effects with topic:
row.names(anova(m3))[-1][!(anova(m3)$denDF[-1] == m4$ANOVA$DFd)]

更新：由于精确的错误或嵌套有点不清楚，我在这里提供了等效的aov调用（这是“标准”模型 via aov），它与ezANOVA. 关键误差项是Error(code/(problem*topic))：

m5 <- aov(mean ~ (condition*group*problem*topic) + Error(code/(problem*topic)), d)
summary(m5)