您不需要多个merge()步骤,只需要aggregate()两个感兴趣的变量:
> aggregate(dx[, -1], by = list(ID = dx$ID), head, 1)
  ID AGE FEM
1  1  30   1
2  2  40   0
3  3  35   1
> system.time(replicate(1000, aggregate(dx[, -1], by = list(ID = dx$ID), 
+                                       head, 1)))
   user  system elapsed 
  2.531   0.007   2.547 
> system.time(replicate(1000, {ag <- data.frame(ID=levels(dx$ID))
+ ag <- merge(ag, aggregate(AGE ~ ID, data=dx, function(x) x[1]), "ID")
+ ag <- merge(ag, aggregate(FEM ~ ID, data=dx, function(x) x[1]), "ID")
+ }))
   user  system elapsed 
  9.264   0.009   9.301
比较时间:
1)马特的解决方案:
> system.time(replicate(1000, {
+ agg <- by(dx, dx$ID, FUN = function(x) x[1, ])
+ # Which returns a list that you can then convert into a data.frame thusly:
+ do.call(rbind, agg)
+ }))
   user  system elapsed 
  3.759   0.007   3.785
2) Zach 的 reshape2 解决方案:
> system.time(replicate(1000, {
+ dx <- melt(dx,id=c('ID','FEM'))
+ dcast(dx,ID+FEM~variable,fun.aggregate=mean)
+ }))
   user  system elapsed 
 12.804   0.032  13.019
3)史蒂夫的data.table解决方案:
> system.time(replicate(1000, {
+ dxt <- data.table(dx, key='ID')
+ dxt[, .SD[1,], by=ID]
+ }))
   user  system elapsed 
  5.484   0.020   5.608 
> dxt <- data.table(dx, key='ID') ## one time step
> system.time(replicate(1000, {
+ dxt[, .SD[1,], by=ID] ## try this one line on own
+ }))
   user  system elapsed 
  3.743   0.006   3.784
4) Chase 使用数字而非因子的快速解决方案ID:
> dx2 <- within(dx, ID <- as.numeric(ID))
> system.time(replicate(1000, {
+ dy <- dx[order(dx$ID),]
+ dy[ diff(c(0,dy$ID)) != 0, ]
+ }))
   user  system elapsed 
  0.663   0.000   0.663
和 5) Matt Parker 替代 Chase 的解决方案,对于 character 或 factor ID,它比 Chase 的数字略快ID:
> system.time(replicate(1000, {
+ dx[c(TRUE, dx$ID[-1] != dx$ID[-length(dx$ID)]), ]
+ }))
   user  system elapsed 
  0.513   0.000   0.516