Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

I'm learning how to analyze data obtained by web scraping. At the moment I get an error when I use the website in the code below and grab the 2020 season.

If I grab the 2019 season instead, nothing goes wrong.

The error I get is: Error in names(x) <- value : 'names' attribute [27] must be the same length as the vector [20]

What does it mean, and how can I fix this code so that I can create a data frame?

Load the data

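# Load the packages the code below relies on
library(rvest)   # read_html(), html_node(), html_table()
library(dplyr)   # %>%, as_tibble(), filter()
library(tidyr)   # gather()
# Import/ingest the Formula 1 race results for season 2020 ----------------
# Take a look at the data in the browser
browseURL('https://www.formel1.de/rennergebnisse/wm-stand/2020/fahrerwertung')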
# Fetch the contents of the HTML-table into the variable f1
f1 <- read_html('https://www.formel1.de/rennergebnisse/wm-stand/2020/fahrerwertung') %>% 
  html_node('table') %>% 
  html_table()
# Display our data
f1

This part works fine.

Transform the data

# Transform & tidy the data -----------------------------------------------
# Add missing column headers
colnames(f1) <- c('Pos', 'Driver', 'Total', sprintf('R%02d', 1:24))
# Convert to tibble data frame and filter on top 10 drivers
f1 <- as_tibble(f1) %>% 
  filter(as.integer(Pos) <= 10)
# Make Driver a factor, replace all '-' with zeros, convert to long format
f1$Driver <- as.factor(f1$Driver)
f1[, -2] <- apply(f1[, -2], 2, function(x) as.integer(gsub('-', '0', as.character(x))))
f1long <- gather(f1, Race, Points, R01:R21)
# That looks better
f1long

This part fails with: Error in names(x) <- value : 'names' attribute [27] must be the same length as the vector [20]

Sources:

2020: https://www.formel1.de/rennergebnisse/wm-stand/2020/fahrerwertung

2019: https://www.formel1.de/rennergebnisse/wm-stand/2019/fahrerwertung



1 Answer

The problem is not the web scraping but the colnames() assignment.

The table f1 which you scrape contains 20 columns:

ncol(f1)
# [1] 20

But your colnames() call supplies 27 names (3 fixed names plus 24 race names), which presumes that f1 has 27 columns.
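To see the same kind of error in isolation, here is a minimal sketch using a made-up three-column data frame (df and new_names are hypothetical names for illustration, not part of the scraped data). It assigns more names than there are columns and captures the resulting message instead of stopping.

```r
# Toy data frame with 3 columns (a hypothetical stand-in for the scraped table)
df <- data.frame(a = 1:2, b = 3:4, c = 5:6)

# Five names for three columns: one name too many would already be enough to fail
new_names <- c("v1", "v2", "v3", "v4", "v5")

# Attempt the assignment and capture the error message instead of stopping
msg <- tryCatch(
  {
    colnames(df) <- new_names
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Comparing the two lengths up front makes the mismatch explicit
length(new_names) == ncol(df)
```

Checking length() against ncol() before assigning is a cheap way to catch this whenever the scraped table changes shape between seasons.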

You thus need to do two changes to your code:

  • Change to colnames(f1) <- c('Pos', 'Driver', 'Total', sprintf('R%02d', 1:17)) [note the 17 instead of 24] and it should be fine.

  • In addition, change the gather()-part to f1long <- gather(f1, Race, Points, R01:R17) [again, note the 17 instead of 21].
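As an alternative to hard-coding 17, the number of race columns can be derived from the table itself, so the same code would work for seasons with a different number of races. The sketch below assumes the first three columns are always Pos, Driver and Total; f1_demo is a hypothetical placeholder for the scraped f1.

```r
# Placeholder for the scraped table: 20 columns, as reported by ncol(f1).
# (Hypothetical data; the real f1 comes from html_table().)
f1_demo <- as.data.frame(matrix("-", nrow = 2, ncol = 20), stringsAsFactors = FALSE)

# Assumption: the first 3 columns are Pos, Driver and Total, the rest are races
n_races <- ncol(f1_demo) - 3
race_cols <- sprintf("R%02d", seq_len(n_races))

# The name vector now always has exactly ncol(f1_demo) entries
colnames(f1_demo) <- c("Pos", "Driver", "Total", race_cols)
colnames(f1_demo)
```

The first and last entries of race_cols can then be used to select the race columns when reshaping, instead of typing R01:R17 by hand.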

(By the way, instead of gather(), it is recommended to use pivot_longer() in the future; cf. ?gather.)
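A rough sketch of what the pivot_longer() version could look like, assuming the tidyr package is installed. The data frame f1_demo and its values are invented for illustration; only the column naming scheme (Pos, Driver, Total, R01, R02, ...) follows the question.

```r
library(tidyr)

# Small made-up table with the same column layout as the renamed f1
f1_demo <- data.frame(
  Pos    = 1:2,
  Driver = c("Driver A", "Driver B"),
  Total  = c(43L, 33L),
  R01    = c(25L, 18L),
  R02    = c(18L, 15L)
)

# Select every race column by pattern (R followed by digits) instead of a fixed range
f1long_demo <- pivot_longer(
  f1_demo,
  cols      = matches("^R[0-9]+$"),
  names_to  = "Race",
  values_to = "Points"
)
f1long_demo
```

Selecting the columns by pattern means the reshaping step does not need to change when the number of races does.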

