Left join where b.key is NULL in R

data-mining r sql
2022-03-11 19:03:15

I am trying to replicate the following SQL query in R:

select a.*, b.key from Table1 a 
LEFT OUTER JOIN Table2 b 
on a.key = b.key where b.key is null

I have read through this post, but I am still struggling to write the code for my specific case: https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right

I tried the following, but the result does not let me filter on b.key IS NULL:

LoansToInsert_stg1 <- merge(x = Prior_stg2, y = BankLoans_stg2, 
                            by = "Account_ID", all.x = TRUE)

Any insights?

Example:

Key1 <- c("A1","A2","A3","A4","A5")  
Key2 <- c("A1","A2","A3","B4","B5")  
BV1 <- c(100, 200, 300, 400, 500)  
BV2 <- c(150, 250, 350, 450, 550)  
df1 <- as.data.frame(cbind(Key1, BV1))  
df2 <- as.data.frame(cbind(Key2, BV2))  

Expected output, as a new df:

Key1 BV1 Key2 BV2
A1 100 A1 150
A2 200 A2 250
A3 300 A3 350
A4 400 NA NA
A5 500 NA NA

1 Answer

If I understand correctly:

Table1 <- data.frame(key = seq(1,100),a.data = rnorm(100))
Table2 <- data.frame(key = c(seq(1,30),rep(NA,30)), b.data = seq(1,60))


##Assuming this is what you want
library(sqldf)
sql.ans <- sqldf("select a.*, b.key from Table1 a LEFT OUTER JOIN Table2 b on a.key = b.key where b.key is null")

## dplyr version
library(dplyr)

dplyr.ans <- Table1 %>% filter(!key %in% Table2$key)

## Regular R version
R.ans <- Table1[which(!Table1$key %in% Table2$key),]
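
The three versions above all return the rows of Table1 that have no match in Table2, which is usually called an anti-join. As a minimal sketch, assuming dplyr is installed, the same thing can be written with anti_join(), or in base R by continuing from a merge(..., all.x = TRUE) like the one in the question. The tables left_tbl and right_tbl and the marker column in_b are made-up names for illustration only.

```r
library(dplyr)

# Small stand-in tables (hypothetical data, not from the question)
left_tbl  <- data.frame(key = 1:6, a.data = c(10, 20, 30, 40, 50, 60))
right_tbl <- data.frame(key = c(1L, 2L, 3L, NA), b.data = 1:4)

# dplyr: anti_join() keeps the rows of left_tbl with no matching key in right_tbl
anti.ans <- anti_join(left_tbl, right_tbl, by = "key")

# Base R: left join with merge(), then keep the rows where the marker is NA.
# in_b is a hypothetical marker column that is TRUE for every row of right_tbl.
right_tbl$in_b <- TRUE
merged <- merge(x = left_tbl, y = right_tbl[, c("key", "in_b")],
                by = "key", all.x = TRUE)
merge.ans <- merged[is.na(merged$in_b), c("key", "a.data")]
```

The marker column is needed because merge() collapses the two key columns into one, so after the join there is no separate b.key left to test for NA.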

Edit, after the dummy data and expected output were added:

dplyr.ans2 <- left_join(df1,df2, by = c("Key1" = "Key2"))
 Key1 BV1  BV2
 1   A1 100  150
 2   A2 200  250
 3   A3 300  350
 4   A4 400 <NA>
 5   A5 500 <NA>
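
The result above has no Key2 column, while the expected output in the question does. As a rough sketch, assuming dplyr 1.0.0 or later (where left_join() accepts keep = TRUE), both key columns can be retained and then used for the IS NULL filter. Here the dummy data is rebuilt with data.frame() rather than as.data.frame(cbind(...)), because cbind() on a character and a numeric vector produces a character matrix and BV1/BV2 would no longer be numeric. The names joined and unmatched are illustrative.

```r
library(dplyr)

# Same dummy values as in the question, with BV1 / BV2 kept numeric
df1 <- data.frame(Key1 = c("A1", "A2", "A3", "A4", "A5"),
                  BV1  = c(100, 200, 300, 400, 500))
df2 <- data.frame(Key2 = c("A1", "A2", "A3", "B4", "B5"),
                  BV2  = c(150, 250, 350, 450, 550))

# keep = TRUE retains both key columns; rows of df1 without a match
# get NA in Key2 and BV2
joined <- left_join(df1, df2, by = c("Key1" = "Key2"), keep = TRUE)

# Counterpart of "WHERE b.key IS NULL": rows of df1 with no partner in df2
unmatched <- filter(joined, is.na(Key2))
```

With this data, joined has the columns Key1, BV1, Key2, BV2, and unmatched contains only the A4 and A5 rows.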