mutate with a custom function does not work in R

data-mining r dataframe dplyr
2022-02-20 21:46:32

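I have a data frame with a column named "Frequency". Frequency holds values such as "Year", "Week", and "Month". I want to create a new column based on Frequency, where Year maps to 1, Month maps to 12, and Week maps to 48. I wrote a function called getValue for this and tried to create the new column by applying it with mutate (dplyr). Unfortunately I get the warning below, and every value in the new column comes out as 1.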

getValue <- function(input) {

          if (input == 'Year')
          {
            result <- 1
          } 
          else if(input == 'Month')
          {
            result <- 12
          } 
          else if(input == 'Week')
          {
            result <- 48
          } 

          return(result)
        }

      Data =  mutate(gateway, YearlyHit = getValue(gateway$Frequency))
      Data

This is the warning message I get:

Warning message:
In if (input == "Year") { :
  the condition has length > 1 and only the first element will be used

How can I get the desired result in R?

4 Answers

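The getValue function you pass to mutate is not vectorized. if() evaluates a single condition, so when it receives the whole Frequency column it only looks at the first element, and that one result is recycled for every row.

Try: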

v_getValue <- Vectorize(getValue)    
Data =  mutate(gateway, YearlyHit = v_getValue(gateway$Frequency))

More details here.
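As a rough alternative to wrapping the scalar function, the mapping itself can be written so that it accepts a whole column at once. The sketch below uses a named lookup vector in base R; the names hits_per_year and getValueVec, and the "Day" label, are made up for illustration and are not from the question.

```r
# Hypothetical lookup table: names are the Frequency labels, values the hits per year
hits_per_year <- c(Year = 1, Month = 12, Week = 48)

# Hypothetical vectorized counterpart of getValue()
getValueVec <- function(input) {
  # Indexing a named vector by name works element-wise;
  # labels missing from the table come back as NA.
  unname(hits_per_year[as.character(input)])
}

getValueVec(c("Year", "Week", "Month", "Day"))
# 1 48 12 NA
```

With something like this, the call would presumably become mutate(gateway, YearlyHit = getValueVec(Frequency)), with no need for Vectorize.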

Why not solve it like this?

mutate(gateway, YearlyHit = case_when(Frequency == 'Year' ~ 1,
                                      Frequency == 'Month' ~ 12,
                                      Frequency == 'Week' ~ 48)
)
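For completeness, here is a self-contained sketch of the same case_when idea that can be run as is. The gateway data here is a small stand-in (the real data frame is not shown in the question), "Day" is a hypothetical unmatched label, and the final TRUE branch is an optional addition that makes the fallback for unmatched labels explicit.

```r
library(dplyr)

# Stand-in data; the actual gateway data frame is not shown in the question
gateway <- tibble(Frequency = c("Year", "Month", "Week", "Day"))

result <- gateway %>%
  mutate(YearlyHit = case_when(
    Frequency == "Year"  ~ 1,
    Frequency == "Month" ~ 12,
    Frequency == "Week"  ~ 48,
    TRUE                 ~ NA_real_  # explicit fallback for any other label
  ))

result
```

Without the last branch, case_when also returns NA for rows that match nothing, so the fallback mainly documents the intent.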

My problem was solved with the following command:

gateway$newcol <- mapply(getValue, gateway$Frequency)

instead of:

Data =  mutate(gateway, YearlyHit = getValue(gateway$Frequency))
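A possible variation on the mapply call is vapply, which also calls the scalar function once per element but additionally checks that every result is a single number. This is only a sketch: the gateway data is a stand-in, and the getValue below adds a final else branch (an assumption, not in the question) so that unknown labels give NA instead of an error.

```r
# Scalar helper in the spirit of the question's getValue();
# the last else branch is an added assumption for unknown labels
getValue <- function(input) {
  if (input == "Year") {
    1
  } else if (input == "Month") {
    12
  } else if (input == "Week") {
    48
  } else {
    NA_real_
  }
}

# Stand-in data; the actual gateway data frame is not shown in the question
gateway <- data.frame(Frequency = c("Year", "Week", "Month"),
                      stringsAsFactors = FALSE)

# One call to getValue per element; numeric(1) declares the expected result shape
gateway$YearlyHit <- vapply(gateway$Frequency, getValue, numeric(1),
                            USE.NAMES = FALSE)

gateway
```

USE.NAMES = FALSE keeps the new column from carrying the Frequency strings as element names, which mapply would otherwise attach for character input.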

Another approach is to use mutate with ifelse:

Data for a minimal reproducible example:

library(dplyr)  # provides tibble(), mutate() and %>%

df = tibble(
            x = 1:10,
            Frequency = c('Year','Week','Month',
            'Month','Year','Week','Year','Year',
            'Week','Month')
            )
# A tibble: 10 x 2
       x Frequency
   <int> <chr>    
 1     1 Year     
 2     2 Week     
 3     3 Month    
 4     4 Month    
 5     5 Year     
 6     6 Week     
 7     7 Year     
 8     8 Year     
 9     9 Week     
10    10 Month   

df %>% mutate(YearlyHit = ifelse(Frequency == 'Year', 1,
                  ifelse(Frequency == 'Month', 12,
                  ifelse(Frequency == 'Week', 48,NA))))

You can also achieve this with the %in% operator:

df %>% mutate(YearlyHit = ifelse(Frequency %in% 'Year', 1,
                  ifelse(Frequency %in% 'Month', 12,
                  ifelse(Frequency %in% 'Week', 48,NA))))

Or perhaps this is more readable:

df %>%
  mutate(YearlyHit = 1 * (Frequency == 'Year') +
                   12 * (Frequency == 'Month')+
                   48 * (Frequency == 'Week'))
                   
# A tibble: 10 x 3
       x Frequency YearlyHit
   <int> <chr>         <dbl>
 1     1 Year              1
 2     2 Week             48
 3     3 Month            12
 4     4 Month            12
 5     5 Year              1
 6     6 Week             48
 7     7 Year              1
 8     8 Year              1
 9     9 Week             48
10    10 Month            12
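One thing worth knowing about the arithmetic version: it works because each comparison produces TRUE/FALSE, which R coerces to 1/0 when multiplied. A side effect is that a label matching none of the comparisons ends up as 0 rather than NA, unlike the ifelse variants. A small base R sketch, where "Day" is a hypothetical unmatched label:

```r
# "Day" is a hypothetical label that matches none of the comparisons
Frequency <- c("Year", "Week", "Month", "Day")

# Each comparison yields TRUE/FALSE, which arithmetic treats as 1/0
YearlyHit <- 1 * (Frequency == "Year") +
  12 * (Frequency == "Month") +
  48 * (Frequency == "Week")

YearlyHit
# 1 48 12 0
```

If unmatched labels should be flagged rather than silently counted as 0, the ifelse form with a final NA is probably the safer choice.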