Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

Data:

DB <- data.frame(orderID = c(1,2,3,4,4,5,6,6,7,8),    
                 orderDate = c("1.1.12","1.1.12","1.1.12","13.1.12","13.1.12","12.1.12","10.1.12","10.1.12","21.1.12","24.1.12"),
                 itemID = c(2,3,2,5,12,4,2,3,1,5),   
                 customerID = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1),
                 itemPrice = c(9.99, 14.99, 9.99, 19.99, 29.99, 4.99, 9.99, 14.99, 49.99, 19.99))

Expected outcome:

DB <- data.frame(orderID = c(1,2,3,4,4,5,6,6,7,8),    
                 orderDate = c("1.1.12","2.1.12","3.1.12","13.1.12","13.1.12","12.1.12","10.1.12","10.1.12","21.1.12","24.1.12"),
                 itemID = c(2,3,2,5,12,4,2,3,1,5),   
                 customerID = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1),
                 itemPrice = c(9.99, 14.99, 9.99, 19.99, 29.99, 4.99, 9.99, 14.99, 49.99, 19.99),
                 DateOfFirstOrderofCustomer = c("1.1.12", "2.1.12", "3.1.12", "1.1.12", "1.1.12", "3.1.12", "2.1.12", "2.1.12", "1.1.12", "1.1.12"))

For Understanding:

The orderID is continuous. Products orderd from the same customerID at the same day get the same orderID. When the same customer orders products at another day he/she it′s a new orderID.

I want to add an additional column for every row/entry which contains the date of the customer's first order (e.g. customer 1 (customerID 1) made his first order on 1.1.12 so this date is entered in all orders from this customer). How can we do this?

The original data has about 500k rows: so plz give a solution which needs only little perfomance.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
157 views
Welcome To Ask or Share your Answers For Others

1 Answer

With just base R functions:

# convert the date column to date-format
DB$orderDate <- as.Date(DB$orderDate, format('%d.%m.%y'))

# get the first date for each customer
DB$DateFirstOrder <- with(DB, ave(orderDate, customerID, FUN = min))

the result is then (using the data of Mike Spencer):

> DB
   orderID  orderDate itemID customerID itemPrice DateFirstOrder
1        1 2012-01-01      2          1      9.99     2012-01-01
2        2 2012-01-04      3          2     14.99     2012-01-04
3        3 2012-01-06      2          3      9.99     2012-01-06
4        4 2012-01-13      5          1     19.99     2012-01-01
5        4 2012-01-13     12          1     29.99     2012-01-01
6        5 2012-01-12      4          3      4.99     2012-01-06
7        6 2012-01-10      2          2      9.99     2012-01-04
8        6 2012-01-10      3          2     14.99     2012-01-04
9        7 2012-01-21      1          1     49.99     2012-01-01
10       8 2012-01-24      5          1     19.99     2012-01-01

For the fastest solution, I would recommend the data.table package. To get the desired result with this package, you need to do:

library(data.table)
setDT(DB)[, orderDate := as.Date(orderDate, format('%d.%m.%y'))
          ][, DateFirstOrder := min(orderDate), by = customerID]

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...