Webscraping from list of values in Table

mchina1 · August 4, 2024, 6:21pm

Hi all,

I have a list of 77 unique patent IDs (e.g. US03082024A1, EN03082019B2, etc.; n = 77).

I want to use R to automate searching Google Patents (URL: https://patents.google.com/) and pull the following data for each patent ID: patent classification code, application year, patent title, company and abstract.

The resulting file would be saved as a CSV with column names matching the data fields above.

Many thanks!

M_AcostaCH · August 8, 2024, 3:31am

Hi @mchina1,
I tried to help, but I could only get the abstract for US03082024A1. EN03082019B2 is not a valid patent ID on this page.

I'm using rvest. I had problems finding the correct nodes for the other items. I'm sure an advanced R user could help more.

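library(rvest)

# Google Patents page for the first patent
link <- 'https://patents.google.com/patent/US20030082024A1/en?oq=US03082024A1'

# Read the page and extract the text of the node holding the abstract
url_data1 <- link |>
  read_html() |>
  html_elements(xpath = '//*[@id="A-0001"]') |>
  html_text()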

> url_data1
[1] "A cargo bar having reduced costs due in part to being constructed from square tubes and due to being 
collapsible to a length that fits a 4 foot pallet so as to facilitate shipping and storage. Pressure induced 
extension of the cargo bar against opposed truck walls is provided by a rack and pinion gear arrangement, the 
rack teeth provided on a first tube wall and the pinion teeth provided on a pivotal lever mounted to a second 
tube. The bar ends have pressure pads that will conform to side walls of a truck or van and the tube interior is 
alternately fitted with retractable track pins that extend through the pads and retract behind the pads to 
accommodate different cargo bar systems. "

Could you provide the other patent IDs so I can try to download all the abstracts?
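
For the other items (title, company, application date, classification code), one possible approach is to read the page's `<meta>` tags and `itemprop` attributes instead of paragraph IDs. The sketch below is untested against the live site: the helper name `extract_patent_fields` is hypothetical, and every selector in it (`DC.title`, `DC.description`, `DC.contributor`, `DC.date`, `span[itemprop='Code']`) is an assumption about the page markup that should be checked in the browser's page source first.

```r
library(rvest)

# Hypothetical helper: pull several fields from an already-parsed patent page.
# All selectors are assumptions about the page markup, not a documented API.
extract_patent_fields <- function(page) {
  # Return the content attribute of the first matching <meta> tag, or NA
  meta <- function(xpath) {
    page |>
      html_element(xpath = xpath) |>
      html_attr("content") |>
      trimws()
  }

  # Collect every classification code found on the page (may be empty)
  codes <- page |>
    html_elements("span[itemprop='Code']") |>
    html_text(trim = TRUE)

  data.frame(
    title = meta("//meta[@name='DC.title']"),
    abstract = meta("//meta[@name='DC.description']"),
    company = meta("//meta[@name='DC.contributor'][@scheme='assignee']"),
    application_date = meta("//meta[@name='DC.date'][@scheme='dateSubmitted']"),
    classification_codes = paste(unique(codes), collapse = "; "),
    stringsAsFactors = FALSE
  )
}
```

It could be called as `extract_patent_fields(read_html(link))`. Fields that are not found come back as `NA` (or an empty string for the classification codes) rather than raising an error.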

mchina1 · August 8, 2024, 9:51am

Hi — thanks a lot!

The code above seems like it would certainly work.

I have attached two patents to trial:
— [US10952730B2]
— [EP3155984B1]

I was just wondering if there would be a way to have a CSV file with a list of these patent IDs and pull the data above, i.e. patent title, abstract, application year?

Many many thanks
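
A CSV-driven version of this could look roughly like the sketch below. It assumes the IDs sit in a column named `patent_id` in a file called `patent_ids.csv` (both hypothetical names), that each page lives at `https://patents.google.com/patent/<ID>/en` as in the link used earlier in the thread, and that the title, abstract and filing date are exposed in `<meta>` tags named `DC.title`, `DC.description` and `DC.date`. None of those markup details are confirmed here, so treat it as a starting point rather than a finished solution.

```r
library(rvest)

# Build the page URL for one patent ID (URL pattern assumed from the thread)
patent_url <- function(id) {
  sprintf("https://patents.google.com/patent/%s/en", id)
}

# Hypothetical helper: fetch each ID and return one row per patent.
# `fetch` is injectable so the function can be tried without network access.
scrape_patents <- function(ids, fetch = read_html, pause = 1) {
  rows <- lapply(ids, function(id) {
    # A failed download (e.g. an invalid ID) becomes NULL instead of an error
    page <- tryCatch(fetch(patent_url(id)), error = function(e) NULL)
    Sys.sleep(pause)  # be polite to the server between requests

    if (is.null(page)) {
      return(data.frame(
        patent_id = id, title = NA_character_, abstract = NA_character_,
        application_year = NA_character_, stringsAsFactors = FALSE
      ))
    }

    # Content of the first matching <meta> tag, or NA if it is absent
    meta <- function(xpath) {
      page |> html_element(xpath = xpath) |> html_attr("content") |> trimws()
    }
    filed <- meta("//meta[@name='DC.date'][@scheme='dateSubmitted']")

    data.frame(
      patent_id = id,
      title = meta("//meta[@name='DC.title']"),
      abstract = meta("//meta[@name='DC.description']"),
      application_year = substr(filed, 1, 4),  # keep only the year part
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Usage (file and column names are hypothetical):
# ids <- read.csv("patent_ids.csv", stringsAsFactors = FALSE)$patent_id
# write.csv(scrape_patents(ids), "patent_data.csv", row.names = FALSE)
```

IDs that fail to download still get a row, filled with `NA`, so the output CSV keeps one line per input ID and the failures are easy to spot.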

M_AcostaCH · August 12, 2024, 2:11pm

Hi @mchina1,
This patent doesn't have an abstract, and my script only downloads the abstract.
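
One thing that may be worth ruling out is that the element id `A-0001` used in the XPath is specific to the first page, in which case the abstract could exist but sit under a different id. That is only a guess. A small sketch of a fallback chain follows: it tries several candidate selectors in order and returns `NA` when none match. The helper name `first_match` is hypothetical and the candidate selectors are assumptions about the markup, to be verified against the actual page source.

```r
library(rvest)

# Hypothetical helper: return the text of the first selector that matches
# something non-empty, or NA when no candidate matches.
first_match <- function(page, selectors) {
  for (css in selectors) {
    text <- page |> html_element(css) |> html_text(trim = TRUE)
    if (!is.na(text) && nzchar(text)) {
      return(text)
    }
  }
  NA_character_
}

# Candidate selectors, most specific first (all of them are guesses)
abstract_selectors <- c(
  "#A-0001",
  "section[itemprop='abstract'] div.abstract",
  "div.abstract"
)
```

With a parsed page, `first_match(page, abstract_selectors)` gives either the abstract text or `NA`, which makes it easy to tell apart "no abstract found" from a failed script.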
