Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I have a dataset of rental advertisements. For each advert I have its precise latitude (and longitude but let's take just latitude now for simplicity). For each latitude I have calculated boundaries approx. 2 kms north- and southwards (latmin and latmax):

  advert_id locality_public_lat   latmin   latmax
1  10083631             49.5794 49.56143 49.59737
2  10247827             50.0546 50.03663 50.07257
3  10647798             50.0910 50.07303 50.10897
4  10584515             50.0906 50.07263 50.10857

I would like to know now how many of all the other adverts fall in these boundaries for each advert (take all values in locality_public_lat column and compare it with latmin and latmax in first row and yield a count for the first row; then repeat for all other rows).

So far I tried to aggregate the data using summary function but the problem is that it compares only latitudes within each row and therefore the result is not valid (just counts all rows).

summary <- data_prodej_3 %>%
  group_by(advert_id, month_stats) %>%
  summarise(pocet_inz = sum(locality_public_lat > latmin & locality_public_lat < latmax))

Is there a way to summarise the data the way I described?


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
905 views
Welcome To Ask or Share your Answers For Others

1 Answer

Guess that you need to work by using a function and some loops.

With your data saved in a file named lat.txt:

dvert_id,locality_public_lat,latmin,latmax
10083631,49.5794,49.56143,49.59737
10247827,50.0546,50.03663,50.07257
10647798,50.0910,50.07303,50.10897
10584515,50.0906,50.07263,50.10857

We can use:

library (tidyverse)

#load the data
data_prodej_3 <- read.csv(file='lat.txt')
#save the dvert_id list
dvert_id_vector  <- data_prodej_3[,c(1)]
#initialize the final recap table
recap_table <- data.frame(cbind(dvert_id=character(),locality_public_lat=numeric(),occurrences=integer()))

for (i in dvert_id_vector) {
  
  counter <- 0
    
  #save the locality_public_lat we're interested to compare
  locality_public_lat_value  <- data_prodej_3[data_prodej_3$dvert_id==i,c(2)]
  
  #remove the dvert_id we're examining
  data_prodej_3_filtered <- data_prodej_3[data_prodej_3$dvert_id!=i,]
  
  for (j in seq(1,length(data_prodej_3_filtered)-1)) {
      
    if (locality_public_lat_value > data_prodej_3_filtered[j,c(3)] & locality_public_lat_value < data_prodej_3_filtered[j,c(4)]) {
      counter <- counter + 1
    }
  
  }
  
  #save an entry in the recap table with the resulting count for the particular dvert_id
  recap_table <- rbind(recap_table, c(i,locality_public_lat_value,counter))
  
}

colnames(recap_table) <- c('dvert_id','locality_public_lat','occurrences')
recap_table

To obtain:

  dvert_id locality_public_lat occurrences
1 10083631             49.5794           0
2 10247827             50.0546           0
3 10647798             50.0910           1
4 10584515             50.0906           1

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...