I have a list of 77 unique patent IDs (e.g. US03082024A1, EN03082019B2, etc.).
I want to use R to automate searching Google Patents (URL: https://patents.google.com/) and pull the following data for each patent ID: patent classification code, application year, patent title, company, and abstract.
The resulting file would be saved as a CSV with column names matching the fields above.
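A possible first step, sketched here under stated assumptions, is turning each ID into a Google Patents page URL. The reply further down uses the pattern `https://patents.google.com/patent/<ID>/en`, and its link shows the short form US03082024A1 as the full publication number US20030082024A1. The helper `normalize_patent_id()` is hypothetical: it assumes that this expansion (2-digit year to 4-digit year, 6-digit serial padded to 7 digits) holds for every short US pre-grant ID, which is only inferred from that single example.

```r
# Hypothetical helper: expand the short US pre-grant form
# (2-digit year + 6-digit serial, e.g. US03082024A1) to the full
# publication number (4-digit year + 7-digit serial, US20030082024A1).
# ASSUMPTION: rule inferred from the one example in this thread.
normalize_patent_id <- function(id) {
  id <- toupper(trimws(id))
  short_us <- grepl("^US[0-9]{8}A[0-9]$", id)
  if (any(short_us)) {
    s <- id[short_us]
    id[short_us] <- paste0("US20", substr(s, 3, 4), "0",
                           substr(s, 5, 10), substr(s, 11, 12))
  }
  id  # other IDs (granted US, EP, ...) are returned unchanged
}

# Build the page URL, following the pattern of the link used in the reply
patent_url <- function(id) {
  paste0("https://patents.google.com/patent/", normalize_patent_id(id), "/en")
}

patent_url(c("US03082024A1", "US10952730B2", "EP3155984B1"))
```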
Hi @mchina1,
I tried to help, but I could only get the abstract for US03082024A1. EN03082019B2 is not a valid patent ID on that site.
I used rvest, but I had trouble finding the correct nodes for the other fields. I'm sure a more advanced R user could help further.
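library(rvest)
# Google Patents page for publication US20030082024A1 (searched as US03082024A1)
link <- 'https://patents.google.com/patent/US20030082024A1/en?oq=US03082024A1'
url_data1 <- link |>
  read_html() |>
  # the element with id "A-0001" holds the abstract text on this page
  html_elements(xpath = '//*[@id="A-0001"]') |>
  html_text()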
> url_data1
[1] "A cargo bar having reduced costs due in part to being constructed from square tubes and due to being
collapsible to a length that fits a 4 foot pallet so as to facilitate shipping and storage. Pressure induced
extension of the cargo bar against opposed truck walls is provided by a rack and pinion gear arrangement, the
rack teeth provided on a first tube wall and the pinion teeth provided on a pivotal lever mounted to a second
tube. The bar ends have pressure pads that will conform to side walls of a truck or van and the tube interior is
alternately fitted with retractable track pins that extend through the pads and retract behind the pads to
accommodate different cargo bar systems. "
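Extending the code above to the other fields is mostly a matter of finding the right nodes. The sketch below is one way to do it, not a tested scraper: the CSS selectors in `patent_selectors` are assumptions about the Google Patents markup (Dublin Core `<meta>` tags for title, abstract, assignee and filing date, plus `itemprop="Code"` spans for classification codes), and `extract_patent_fields()` is a hypothetical helper name. Application year is taken as the year of the assumed filing-date tag.

```r
library(rvest)

# ASSUMED CSS selectors, one per output column. They guess at the page's
# markup (Dublin Core <meta> tags in <head>, itemprop="Code" spans);
# check them against "View page source" before relying on them.
patent_selectors <- c(
  title          = 'meta[name="DC.title"]',
  abstract       = 'meta[name="DC.description"]',
  company        = 'meta[name="DC.contributor"][scheme="assignee"]',
  filing_date    = 'meta[name="DC.date"][scheme="dateSubmitted"]',
  classification = 'span[itemprop="Code"]'
)

# Value of the first node matching `css`, or NA if nothing matches.
# <meta> tags keep their value in the "content" attribute, not as text.
first_value <- function(page, css) {
  node <- html_element(page, css)
  if (inherits(node, "xml_missing")) return(NA_character_)
  if (html_name(node) == "meta") html_attr(node, "content") else html_text(node, trim = TRUE)
}

# Hypothetical helper: one row per patent with the requested fields.
extract_patent_fields <- function(page, id = NA_character_) {
  codes <- html_text(html_elements(page, patent_selectors[["classification"]]), trim = TRUE)
  filed <- first_value(page, patent_selectors[["filing_date"]])  # assumed "YYYY-MM-DD"
  data.frame(
    id               = id,
    classification   = if (length(codes) > 0) paste(unique(codes), collapse = "; ") else NA_character_,
    application_year = as.integer(substr(filed, 1, 4)),
    title            = first_value(page, patent_selectors[["title"]]),
    company          = first_value(page, patent_selectors[["company"]]),
    abstract         = first_value(page, patent_selectors[["abstract"]]),
    stringsAsFactors = FALSE
  )
}

# Usage (needs network access):
# page <- read_html("https://patents.google.com/patent/US20030082024A1/en")
# extract_patent_fields(page, "US03082024A1")
```

Keeping the selectors in one named vector means that if the real page markup differs, only those strings need to change.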
Could you provide the other patent IDs so I can try to download all the abstracts?
The code above looks like it should work.
Here are two patents to try:
- US10952730B2
- EP3155984B1
I was wondering whether there is a way to start from a CSV file listing these patent IDs and pull the data above (i.e. patent title, abstract, application year).
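For the CSV-to-CSV workflow, a minimal sketch could look like the following. The input file is assumed to have a column named `id`, and `scrape_patents()`, `patents_csv_to_csv()` and the file names are hypothetical. Each patent is handled by a function you pass in (any function that takes one ID and returns a one-row data frame), so the per-page extraction can change independently. IDs whose page cannot be fetched (the reply reports EN03082019B2 as invalid) are caught with `tryCatch()` and kept as `NA` rows instead of stopping the whole run.

```r
# Scrape every ID with `scrape_one` and stack the results. Failures are
# logged and kept as an NA row so one bad ID does not stop the run.
scrape_patents <- function(ids, scrape_one, pause = 1,
                           columns = c("id", "classification", "application_year",
                                       "title", "company", "abstract")) {
  rows <- lapply(ids, function(id) {
    Sys.sleep(pause)  # space out requests to the site
    row <- tryCatch(scrape_one(id), error = function(e) {
      message("Skipping ", id, ": ", conditionMessage(e))
      data.frame(id = id, stringsAsFactors = FALSE)
    })
    # add any missing output columns as NA so every row lines up for rbind()
    for (col in setdiff(columns, names(row))) row[[col]] <- NA
    row[, columns, drop = FALSE]
  })
  do.call(rbind, rows)
}

# Read IDs from `in_path` (ASSUMED to have a column named "id"), scrape
# them, and save the combined table to `out_path` as CSV.
patents_csv_to_csv <- function(in_path, out_path, scrape_one, ...) {
  input <- read.csv(in_path, stringsAsFactors = FALSE)
  stopifnot("id" %in% names(input))
  result <- scrape_patents(trimws(input$id), scrape_one, ...)
  write.csv(result, out_path, row.names = FALSE)
  invisible(result)
}

# Example usage (hypothetical file names; needs network access). This
# scraper only takes the abstract, via the XPath from the reply above:
# abstract_only <- function(id) {
#   page <- rvest::read_html(paste0("https://patents.google.com/patent/", id, "/en"))
#   node <- rvest::html_element(page, xpath = '//*[@id="A-0001"]')
#   data.frame(id = id, abstract = rvest::html_text(node))
# }
# patents_csv_to_csv("patent_ids.csv", "patent_data.csv", abstract_only)
```

The `pause` argument adds a delay between requests; with 77 IDs and the default of one second, a run takes a little over a minute plus download time.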