How do I generate bootstrap samples in R?
data-mining
r
statistics
data
2022-02-17 00:44:11
1 Answer
Creating samples
You can create the samples with the sample and replicate functions. I create smp in the first line.
> smp <- c(rep('A', 8), rep('B', 4), rep('C', 2), rep('D', 11), rep('E', 21), rep('F', 0))
> table(smp)
smp
 A  B  C  D  E
 8  4  2 11 21
>
> samples <- replicate(1000, sample(smp)) # 1000 reshuffles of smp (sample() draws without replacement by default)
> dim(samples) # 46 votes (rows) x 1000 samples (columns)
[1] 46 1000
>
> samples[,1:3]
[,1] [,2] [,3]
[1,] "E" "A" "E"
[2,] "D" "E" "E"
[3,] "E" "D" "C"
[4,] "D" "E" "E"
[5,] "A" "D" "D"
[6,] "B" "E" "E"
[7,] "E" "A" "D"
[8,] "B" "C" "D"
[9,] "B" "A" "E"
[10,] "E" "E" "E"
[11,] "A" "E" "E"
[12,] "D" "E" "D"
[13,] "D" "E" "B"
[14,] "D" "B" "E"
[15,] "E" "E" "D"
[16,] "E" "E" "E"
[17,] "E" "B" "B"
[18,] "A" "D" "E"
[19,] "E" "E" "E"
[20,] "E" "E" "C"
[21,] "E" "E" "E"
[22,] "E" "D" "A"
[23,] "D" "D" "B"
[24,] "E" "E" "E"
[25,] "D" "D" "A"
[26,] "E" "E" "D"
[27,] "C" "A" "E"
[28,] "D" "E" "D"
[29,] "D" "E" "E"
[30,] "D" "A" "E"
[31,] "A" "D" "E"
[32,] "E" "E" "A"
[33,] "E" "E" "D"
[34,] "E" "E" "A"
[35,] "A" "E" "E"
[36,] "D" "C" "A"
[37,] "A" "D" "A"
[38,] "E" "D" "D"
[39,] "E" "D" "E"
[40,] "A" "E" "A"
[41,] "B" "A" "A"
[42,] "C" "A" "D"
[43,] "A" "A" "D"
[44,] "E" "B" "E"
[45,] "E" "D" "B"
[46,] "E" "B" "E"
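If all you need is the samples themselves, sample is enough. Note that bootstrap resampling draws with replacement, so use sample(smp, replace = TRUE) for that; sample(smp) on its own only reorders the 46 votes.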
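As a minimal sketch, assuming the goal is bootstrap resamples rather than permutations, the same replicate/sample pattern can draw with replacement. The statistic used here (the share of "E" votes), the seed, and the names boot_samples and prop_E are illustrative assumptions, not part of the original answer.

```r
# Same 46 votes as in the answer.
smp <- c(rep("A", 8), rep("B", 4), rep("C", 2), rep("D", 11), rep("E", 21))

set.seed(42)  # arbitrary seed, only so this sketch is reproducible

# Each column is one bootstrap resample: 46 draws WITH replacement.
boot_samples <- replicate(1000, sample(smp, replace = TRUE))

# Illustrative statistic (an assumption): share of "E" votes per resample.
prop_E <- colMeans(boot_samples == "E")

mean(prop_E)  # should be close to the observed share 21/46
sd(prop_E)    # bootstrap estimate of the standard error of that share
```

Unlike the permutations, the category counts now vary from column to column, which is what gives the bootstrap distribution its spread.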
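Bootstrapping
To bootstrap, you need a statistic to compute. As an example, I compute a weighted mean of the vote table: the mean of the category counts, weighted by the category index (A = 1, ..., F = 6). I am using boot::boot to do the bootstrap. You pass it the original sample and a handler function that receives the original sample (s) and a vector of resampled indices (idx). It returns a boot object that displays the bootstrap statistics.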
> boot.smp <- boot::boot(smp, function(s, idx) {
+ tt <- table(s[idx])
+ weighted.mean(tt, as.numeric(factor(names(tt), labels=1:6, levels=LETTERS[1:6])))
+ }, 1000)
> boot.smp
CASE RESAMPLING BOOTSTRAP FOR CENSORED DATA
Call:
boot::boot(data = smp, statistic = function(s, idx) {
    tt <- table(s[idx])
    weighted.mean(tt, as.numeric(factor(names(tt), labels = 1:6,
        levels = LETTERS[1:6])))
}, R = 1000)
Bootstrap Statistics :
    original     bias    std. error
t1*     11.4 0.4126077    1.274827
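The boot object works with further functions, for example boot::boot.ci to compute confidence intervals, and it carries more information, such as the replicates in its t component.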
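As a hedged sketch of the confidence-interval step, the following reuses the statistic from the answer and passes the resulting object to boot.ci from the boot package. The percentile interval type, the 95% level, the seed, and the helper name vote_stat are choices made for this example only.

```r
library(boot)  # recommended package, shipped with standard R installations

smp <- c(rep("A", 8), rep("B", 4), rep("C", 2), rep("D", 11), rep("E", 21))

# Same statistic as in the answer: category counts weighted by category index.
# vote_stat is a hypothetical name introduced for this sketch.
vote_stat <- function(s, idx) {
  tt <- table(s[idx])
  weighted.mean(tt, as.numeric(factor(names(tt), levels = LETTERS[1:6])))
}

set.seed(42)  # arbitrary seed, only for reproducibility
b <- boot(smp, vote_stat, R = 1000)

# Percentile interval; other types include "norm", "basic" and "bca".
ci <- boot.ci(b, conf = 0.95, type = "perc")
ci$percent[4:5]  # lower and upper endpoints of the interval

head(b$t)  # the individual bootstrap replicates of the statistic
```

The percentile method is used here only because it is the simplest to read off; which interval type is appropriate depends on the statistic and the sample.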
