I have a tibble of flights for a given airline schedule that I am trying to 'link' together. The data follows the IATA SSIM format; more precisely, each row contains the current flight's 'key' and the following flight's 'key'. I am trying to 'link' these flights to determine the number of aircraft in that schedule. For example, if a schedule looks like:
flight123 -> flight456 -> flight789 -> end
flight987 -> flight654 -> flight321 -> end
This would require two (2) aircraft to fly.
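A minimal base-R sketch of that count, assuming each flight has at most one predecessor and the links never loop back on themselves: the number of aircraft is then the number of flights that no other flight lists as its nextKey. The flight names are the illustrative ones from the example above, not real schedule data.

```r
# The two example lines of flying, written as key / nextKey pairs
key     <- c("flight123", "flight456", "flight789", "flight987", "flight654", "flight321")
nextKey <- c("flight456", "flight789", NA,          "flight654", "flight321", NA)

# A flight starts a line of flying if no other flight names it as its nextKey
is_start   <- !(key %in% nextKey)
n_aircraft <- sum(is_start)
n_aircraft
```

Counting chain starts rather than rows with a missing nextKey avoids miscounting when a nextKey points to a flight that is not in the extract.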
I have been able to accomplish this using which(), match(), or filter(), but it is slow. My tibble has over 200,000 rows, so this takes more than 8 minutes to return. I would like it to return in less than a minute, if possible. Example below:
library(tidyverse)
# Current (slow) approach: one match() call per row, each searching all of dat$key
dat %>% mutate(nextIndex = purrr::map(nextKey, function(id){match(x = id, table = dat$key)}))
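The per-row purrr::map() call is most likely where the time goes: every match() call searches the full key column, so the total work grows roughly with the square of the row count. A sketch of the alternative, assuming dat has character-like key and nextKey columns, is to call match() once on the whole nextKey vector. The toy data below are the two example lines of flying from above, not real schedule data.

```r
# Toy version of the schedule: the two example lines of flying, not real data
dat <- data.frame(
  key     = c("flight123", "flight456", "flight789", "flight987", "flight654", "flight321"),
  nextKey = c("flight456", "flight789", NA,          "flight654", "flight321", NA),
  stringsAsFactors = FALSE
)

# A single vectorized match(): all rows are looked up in one call,
# instead of one call per row as in the purrr::map() version
dat$nextIndex <- match(dat$nextKey, dat$key)
dat
```

Inside the existing pipeline the same idea reads dat %>% mutate(nextIndex = match(nextKey, key)); it returns an integer column rather than the list column that purrr::map() produces.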
Could group_by() or nest() help to improve speed? Using a for() loop took an absurd amount of time...
Here is a slice of the data. Unfortunately, the IATA industry-standard format does not include much data, as the format is ancient. The key is defined as {FlightNumber}/{DepartureDate}{Origin}. Tail numbers are not assigned and therefore not available.
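If the individual parts of the key are needed (for example to restrict links to the same departure date), a base-R sketch can split it with a regular expression. It assumes the flight number is all digits, the date is always ddMonyy with a zero-padded day (as in the sample rows), and the origin is a three-letter station code; keys that do not fit come back as NA.

```r
keys <- c("3845/13Apr19GGG", "246/29Apr19CLT", "784/21Apr19BWI")

# {FlightNumber}/{DepartureDate}{Origin}: digits, a slash, a ddMonyy date, a 3-letter code
parts <- regmatches(keys, regexec("^([0-9]+)/([0-9]{2}[A-Za-z]{3}[0-9]{2})([A-Z]{3})$", keys))

# Element 1 of each match is the whole key; elements 2 to 4 are the captured parts
parsed <- data.frame(
  flight = vapply(parts, `[`, character(1), 2),
  date   = vapply(parts, `[`, character(1), 3),
  origin = vapply(parts, `[`, character(1), 4),
  stringsAsFactors = FALSE
)
parsed
```

The date is kept as text here because parsing month abbreviations with as.Date() and %b depends on the system locale.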
# A tibble: 10 x 2
key nextKey
<glue> <glue>
1 3845/13Apr19GGG 3876/13Apr19DFW
2 246/29Apr19CLT 123/29Apr19PBI
3 2561/24Apr19PHX 2604/24Apr19BOS
4 2101/01Apr19DCA 1660/01Apr19DFW
5 3443/21Apr19BTR 3703/21Apr19DFW
6 2772/07Apr19JFK 1810/07Apr19AUS
7 784/21Apr19BWI NA
8 5199/25Apr19PHL 5090/25Apr19HVN
9 375/14Apr19JAX 2360/14Apr19DFW
10 5517/30Apr19YYZ 5301/30Apr19DCA
Ideally, I would like the final result to be grouped or nested by each individual line of flying (aircraft).
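One way to get that grouping without a per-row loop is to start from every flight that no other flight points to and walk all lines of flying forward one leg at a time, so the loop runs once per leg of the longest line rather than once per row. The helper assign_lines() below is a hypothetical base-R sketch, assuming each key appears at most once as a nextKey; flights caught in a cycle with no starting flight are left as NA.

```r
# Hypothetical helper: label each flight with its line of flying (aircraft) and leg number.
# Assumes each key is some flight's nextKey at most once.
assign_lines <- function(key, nextKey) {
  key     <- as.character(key)        # drop the glue class, if present
  nextKey <- as.character(nextKey)
  next_idx <- match(nextKey, key)     # one vectorized lookup; NA where a line ends
  current  <- which(!(key %in% nextKey))  # first flight of each line
  ids      <- seq_along(current)
  aircraft <- rep(NA_integer_, length(key))
  leg      <- rep(NA_integer_, length(key))
  step <- 1L
  # Advance every line by one leg per iteration
  while (length(current) > 0) {
    aircraft[current] <- ids
    leg[current]      <- step
    nxt  <- next_idx[current]
    # Stop at line ends and at flights already labelled (guards against loops)
    keep <- !is.na(nxt) & is.na(aircraft[nxt])
    current <- nxt[keep]
    ids     <- ids[keep]
    step    <- step + 1L
  }
  list(aircraft = aircraft, leg = leg)
}

# The two example lines of flying, deliberately out of order
key     <- c("flight654", "flight123", "flight321", "flight456", "flight987", "flight789")
nextKey <- c("flight321", "flight456", NA,          "flight789", "flight654", NA)
res <- assign_lines(key, nextKey)
res
```

With the real data, the two vectors can be attached as columns (dat$aircraft <- res$aircraft, dat$leg <- res$leg) and the tibble then arranged by aircraft and leg and passed to group_by(aircraft), or to group_by(aircraft) %>% nest() for one nested row per aircraft.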