Webscraping from list of values in Table

Hi all,

I have a list of 77 unique patent IDs (e.g. US03082024A1, EN03082019B2).

I want to use R to automate searching Google Patents (URL: https://patents.google.com/) and pull the following data for each patent ID: patent classification code, application year, patent title, company, and abstract.

The result should be saved as a CSV file whose column names match the fields listed above.

Many thanks!

Hi @mchina1,
I'm trying to help, but I can only get the abstract for US03082024A1. EN03082019B2 is not a valid patent ID on this site.

I'm using rvest. I'm having trouble finding the correct nodes for the other items; I'm sure a more advanced R user could help further.

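library(rvest)

# Google Patents page for US03082024A1 (the URL uses the form US20030082024A1)
link <- 'https://patents.google.com/patent/US20030082024A1/en?oq=US03082024A1'

# Download the page and extract the text of the abstract node
url_data1 <- link |>
  read_html() |>
  html_elements(xpath = '//*[@id="A-0001"]') |>
  html_text()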

> url_data1
[1] "A cargo bar having reduced costs due in part to being constructed from square tubes and due to being 
collapsible to a length that fits a 4 foot pallet so as to facilitate shipping and storage. Pressure induced 
extension of the cargo bar against opposed truck walls is provided by a rack and pinion gear arrangement, the 
rack teeth provided on a first tube wall and the pinion teeth provided on a pivotal lever mounted to a second 
tube. The bar ends have pressure pads that will conform to side walls of a truck or van and the tube interior is 
alternately fitted with retractable track pins that extend through the pads and retract behind the pads to 
accommodate different cargo bar systems. "
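
For the other items, reading the page's <meta> tags may be easier than locating visible nodes. What follows is a minimal sketch, not a verified solution: it assumes Google Patents pages expose the title, assignee, filing date and abstract in meta tags named DC.title, DC.contributor, DC.date and DC.description. Those tag names, and the helper functions meta_content() and extract_patent_fields(), are assumptions made for illustration and should be checked against the page source. The classification code is left out because its markup is not shown in this thread. The demo runs offline on a hand-written page.

```r
library(rvest)

# Return the content attribute of the first <meta> tag matching an XPath,
# or NA when the page has no such tag.
meta_content <- function(page, xpath) {
  values <- page |> html_elements(xpath = xpath) |> html_attr("content")
  if (length(values) == 0) NA_character_ else trimws(values[[1]])
}

# ASSUMPTION: the meta tag names and scheme attributes below are a guess at
# the Google Patents markup; inspect a real page and adjust them as needed.
extract_patent_fields <- function(page) {
  filing_date <- meta_content(page, '//meta[@name="DC.date"][@scheme="dateSubmitted"]')
  data.frame(
    title            = meta_content(page, '//meta[@name="DC.title"]'),
    company          = meta_content(page, '//meta[@name="DC.contributor"][@scheme="assignee"]'),
    application_year = substr(filing_date, 1, 4),  # keep only the year
    abstract         = meta_content(page, '//meta[@name="DC.description"]'),
    stringsAsFactors = FALSE
  )
}

# Offline demo on a tiny hand-written page (not real Google Patents HTML).
demo_page <- read_html('<html><head>
  <meta name="DC.title" content="Example title">
  <meta name="DC.date" scheme="dateSubmitted" content="2001-02-03">
  </head><body></body></html>')
demo_fields <- extract_patent_fields(demo_page)
```

On a live page the call would be extract_patent_fields(read_html(link)). Fields whose tag is missing come back as NA, so every patent still yields one complete row.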

Could you provide the other patent IDs so I can try to download all the abstracts?

Hi, thanks a lot!

The code above looks like it should work.

Here are two patents to try:
- US10952730B2
- EP3155984B1

I was wondering whether there is a way to read a list of these patent IDs from a CSV file and pull the data above (i.e. patent title, abstract, application year)?

Many many thanks
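
One possible way to drive the scraping from a CSV file is sketched below. It is only an outline under stated assumptions: the input file is assumed to have a column named patent_id, the URL pattern is copied from the link used earlier in this thread, and patent_url() and scrape_to_csv() are hypothetical helper names. The extract argument is whatever function pulls a single value from a downloaded page, and fetch defaults to read_html() but can be swapped out for testing.

```r
library(rvest)

# URL pattern taken from the link used earlier in this thread.
patent_url <- function(id) {
  paste0("https://patents.google.com/patent/", id, "/en")
}

# Read IDs from `ids_csv` (ASSUMPTION: it has a column named patent_id),
# apply `extract` to each downloaded page, and write one row per patent
# to `out_csv`. `extract` is any function(page) returning a string.
scrape_to_csv <- function(ids_csv, out_csv, extract, fetch = read_html, delay = 1) {
  ids <- read.csv(ids_csv, stringsAsFactors = FALSE)$patent_id
  fetch_one <- function(id) {
    Sys.sleep(delay)  # pause between requests to be polite to the server
    # Invalid IDs or network errors become NA instead of stopping the loop.
    out <- tryCatch(extract(fetch(patent_url(id))),
                    error = function(e) NA_character_)
    if (length(out) == 0) NA_character_ else as.character(out[[1]])
  }
  values <- vapply(ids, fetch_one, character(1), USE.NAMES = FALSE)
  result <- data.frame(patent_id = ids, abstract = values,
                       stringsAsFactors = FALSE)
  write.csv(result, out_csv, row.names = FALSE)
  invisible(result)
}

# Example call (needs network access; the file names are placeholders):
# scrape_to_csv("patent_ids.csv", "patents.csv",
#   extract = function(page) {
#     page |> html_elements(xpath = '//*[@id="A-0001"]') |> html_text()
#   })
```

Wrapping each request in tryCatch() means an ID that the site does not recognise produces an NA row rather than aborting the whole run.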

Hi @mchina1,
This patent doesn't have an abstract; my script only downloads the abstract.
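
An empty result does not by itself show whether the abstract is missing or whether the node simply has a different id on that page, since the XPath '//*[@id="A-0001"]' was copied from one specific patent. A hedged sketch of a more tolerant lookup follows: it tries several XPaths in order and returns NA when none matches. The fallback selector (any element whose class contains "abstract") and the helper name first_text() are assumptions for illustration, not confirmed Google Patents markup. The demo pages are hand-written so the example runs offline.

```r
library(rvest)

# Try each XPath in turn; return the first non-empty text found, else NA.
first_text <- function(page, xpaths) {
  for (xp in xpaths) {
    hits <- page |> html_elements(xpath = xp) |> html_text() |> trimws()
    hits <- hits[nzchar(hits)]
    if (length(hits) > 0) return(hits[[1]])
  }
  NA_character_
}

abstract_xpaths <- c(
  '//*[@id="A-0001"]',                  # node used earlier in this thread
  '//*[contains(@class, "abstract")]'   # ASSUMPTION: fallback guess
)

# Offline demo pages (hand-written, not real Google Patents HTML).
demo_page <- read_html(
  '<html><body><div class="abstract"> Example abstract text. </div></body></html>')
empty_page <- read_html("<html><body><p>No matching node here.</p></body></html>")

demo_abstract <- first_text(demo_page, abstract_xpaths)
demo_missing  <- first_text(empty_page, abstract_xpaths)
```

Returning NA instead of an empty vector keeps one value per patent, which matters when the results are later combined into a table.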