Webscraping Timeout Error with VPN

Hi everyone, I am posting this on behalf of a student of mine who is currently located in China. He is having trouble scraping American websites because of the firewall there. He has a VPN to bypass the firewall, but when he tries to scrape with the VPN enabled, the request fails with a timeout error. He is using rvest, and his VPN client is called "Clash for Windows". I am not very familiar with how VPNs, proxies, and networks work, especially in conjunction with web scraping, so I would really appreciate any insight.

He was not able to create a formal reprex, but here is the code used and the error output:

> library(rvest)
> url <- "https://trends.google.com/trends/?geo=US"
> wiki <- read_html(url)
Error in open.connection(x, "rb") : 
  Timeout was reached: [trends.google.com] Connection timed out after 10000 milliseconds

Although the example above scrapes Google Trends, he was also having issues with other websites, including Wikipedia.

Here is his sessionInfo():

sessionInfo()
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 18363)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=Chinese (Simplified)_China.936 
#> [2] LC_CTYPE=Chinese (Simplified)_China.936   
#> [3] LC_MONETARY=Chinese (Simplified)_China.936
#> [4] LC_NUMERIC=C                              
#> [5] LC_TIME=Chinese (Simplified)_China.936    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] reprex_1.0.0    forcats_0.5.1   stringr_1.4.0   dplyr_1.0.3    
#>  [5] purrr_0.3.4     readr_1.4.0     tidyr_1.1.2     tibble_3.0.5   
#>  [9] ggplot2_3.3.3   tidyverse_1.3.0 rvest_0.3.6     xml2_1.3.2     
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.6        cellranger_1.1.0  pillar_1.4.7      compiler_4.0.3   
#>  [5] dbplyr_2.0.0      highr_0.8         tools_4.0.3       digest_0.6.27    
#>  [9] lubridate_1.7.9.2 jsonlite_1.7.2    gtable_0.3.0      evaluate_0.14    
#> [13] lifecycle_0.2.0   pkgconfig_2.0.3   rlang_0.4.10      cli_2.2.0        
#> [17] DBI_1.1.1         rstudioapi_0.13   curl_4.3          yaml_2.2.1       
#> [21] haven_2.3.1       xfun_0.20         withr_2.4.0       styler_1.3.2     
#> [25] httr_1.4.2        knitr_1.30        hms_1.0.0         generics_0.1.0   
#> [29] fs_1.5.0          vctrs_0.3.6       grid_4.0.3        tidyselect_1.1.0 
#> [33] glue_1.4.2        R6_2.5.0          fansi_0.4.2       readxl_1.3.1     
#> [37] rmarkdown_2.6     modelr_0.1.8      magrittr_2.0.1    scales_1.1.1     
#> [41] backports_1.2.0   ellipsis_0.3.1    htmltools_0.5.1   assertthat_0.2.1 
#> [45] colorspace_2.0-0  stringi_1.5.3     munsell_0.5.0     broom_0.7.4      
#> [49] crayon_1.3.4

Here is the info from Sys.getenv():

Sys.getenv()
#> AGSDESKTOPJAVA          C:\Program Files (x86)\ArcGIS\Desktop10.6\
#> ALLUSERSPROFILE         C:\ProgramData
#> APPDATA                 C:\Users\Raymond\AppData\Roaming
#> BESIEGE_GAME_ASSEMBLIES
#>                         D:/Games/SteamLibrary/steamapps/common/Besiege/Besiege_Data\Managed/
#> BESIEGE_UNITY_ASSEMBLIES
#>                         D:/Games/SteamLibrary/steamapps/common/Besiege/Besiege_Data\Managed/
#> CLICOLOR_FORCE          1
#> CommonProgramFiles      C:\Program Files\Common Files
#> CommonProgramFiles(x86)
#>                         C:\Program Files (x86)\Common Files
#> CommonProgramW6432      C:\Program Files\Common Files
#> COMPUTERNAME            DESKTOP-094HOAL
#> ComSpec                 C:\WINDOWS\system32\cmd.exe
#> CYGWIN                  nodosfilewarning
#> DISPLAY                 :0
#> DriverData              C:\Windows\System32\Drivers\DriverData
#> FPS_BROWSER_APP_PROFILE_STRING
#>                         Internet Explorer
#> FPS_BROWSER_USER_PROFILE_STRING
#>                         Default
#> GFORTRAN_STDERR_UNIT    -1
#> GFORTRAN_STDOUT_UNIT    -1
#> GIT_ASKPASS             rpostback-askpass
#> GOOGLE_API_KEY          no
#> GOOGLE_DEFAULT_CLIENT_ID
#>                         no
#> GOOGLE_DEFAULT_CLIENT_SECRET
#>                         no
#> HOME                    C:\Users\Raymond\Documents
#> HOMEDRIVE               C:
#> HOMEPATH                \Users\Raymond
#> LOCALAPPDATA            C:\Users\Raymond\AppData\Local
#> LOGONSERVER             \\DESKTOP-094HOAL
#> MPLENGINE               tkAgg
#> MSYS2_ENV_CONV_EXCL     R_ARCH
#> NUMBER_OF_PROCESSORS    4
#> NVIDIAWHITELISTED       0x01
#> OneDrive                C:\Users\Raymond\OneDrive
#> OS                      Windows_NT
#> PATH                    D:\R-4.0.3\bin\x64;C:\Program Files
#>                         (x86)\Common
#>                         Files\Oracle\Java\javapath;C:\Program Files
#>                         (x86)\Intel\iCLS Client\;C:\Program
#>                         Files\Intel\iCLS
#>                         Client\;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program
#>                         Files (x86)\Intel\Intel(R) Management Engine
#>                         Components\DAL;C:\Program Files\Intel\Intel(R)
#>                         Management Engine Components\DAL;C:\Program
#>                         Files (x86)\Intel\Intel(R) Management Engine
#>                         Components\IPT;C:\Program Files\Intel\Intel(R)
#>                         Management Engine Components\IPT;C:\Program
#>                         Files (x86)\NVIDIA
#>                         Corporation\PhysX\Common;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\Python27;C:\WINDOWS\System32\OpenSSH\;D:\anal
#>                         application\phantomjs-2.1.1-windows\bin;D:\anal
#>                         application\selenium;C:\Program Files\Mozilla
#>                         Firefox;C:\Program Files
#>                         (x86)\Java\jre1.8.0_181\bin;C:\Program
#>                         Files\Microsoft VS Code\bin;C:\Program
#>                         Files\NVIDIA Corporation\NVIDIA
#>                         NvDLISR;D:\Git\cmd;C:\Users\Raymond\AppData\Local\Microsoft\WindowsApps;D:\bin\;D:\wind\bin\;
#> PATHEXT                 .COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC
#> PROCESSOR_ARCHITECTURE
#>                         AMD64
#> PROCESSOR_IDENTIFIER    Intel64 Family 6 Model 78 Stepping 3,
#>                         GenuineIntel
#> PROCESSOR_LEVEL         6
#> PROCESSOR_REVISION      4e03
#> PROCESSX_PSWJOXQBNTYY_1615860287
#>                         YES
#> ProgramData             C:\ProgramData
#> ProgramFiles            C:\Program Files
#> ProgramFiles(x86)       C:\Program Files (x86)
#> ProgramW6432            C:\Program Files
#> PSModulePath            C:\Program
#>                         Files\WindowsPowerShell\Modules;C:\WINDOWS\system32\WindowsPowerShell\v1.0\Modules
#> PUBLIC                  C:\Users\Public
#> QT_D3DCREATE_MULTITHREADED
#>                         1
#> R_ARCH                  /x64
#> R_BROWSER               false
#> R_COMPILED_BY           gcc 8.3.0
#> R_DOC_DIR               D:/R-4.0.3/doc
#> R_HOME                  D:/R-4.0.3
#> R_LIBS_USER             C:/Users/Raymond/Documents/R/win-library/4.0
#> R_PDFVIEWER             false
#> R_USER                  C:/Users/Raymond/Documents
#> RMARKDOWN_MATHJAX_PATH
#>                         D:/RStudio/resources/mathjax-27
#> RS_LOCAL_PEER           \\.\pipe\42860-rsession
#> RS_RPOSTBACK_PATH       D:/RStudio/bin/rpostback
#> RS_SHARED_SECRET        63341846741
#> RSTUDIO                 1
#> RSTUDIO_CONSOLE_COLOR   256
#> RSTUDIO_CONSOLE_WIDTH   62
#> RSTUDIO_MSYS_SSH        D:/RStudio/bin/msys-ssh-1000-18
#> RSTUDIO_PANDOC          D:/RStudio/bin/pandoc
#> RSTUDIO_PROGRAM_MODE    desktop
#> RSTUDIO_SESSION_PORT    42860
#> RSTUDIO_USER_IDENTITY   Raymond
#> RSTUDIO_WINUTILS        D:/RStudio/bin/winutils
#> SESSIONNAME             Console
#> SHIM_MCCOMPAT           0x810000001
#> SSH_ASKPASS             rpostback-askpass
#> SynaProgDir             Synaptics\SynTP
#> SystemDrive             C:
#> SystemRoot              C:\WINDOWS
#> TEMP                    C:\Users\Raymond\AppData\Local\Temp
#> TERM                    xterm-256color
#> TMP                     C:\Users\Raymond\AppData\Local\Temp
#> TMPDIR                  C:\Users\Public\Documents\Wondershare\CreatorTemp
#> TZDIR                   D:/R-4.0.3/share/zoneinfo
#> USERDOMAIN              DESKTOP-094HOAL
#> USERDOMAIN_ROAMINGPROFILE
#>                         DESKTOP-094HOAL
#> USERNAME                Raymond
#> USERPROFILE             C:\Users\Raymond
#> windir                  C:\WINDOWS
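
One thing that might be worth checking: the Sys.getenv() output above contains no http_proxy or https_proxy entries, so R's curl-based connections may not be going through the VPN client at all. The sketch below is only a guess at a diagnostic, not a confirmed fix. It assumes the VPN client exposes a local HTTP proxy; the address and port in `proxy_url` are placeholders and would need to be replaced with whatever the client's settings actually show.

```r
# Hypothetical local proxy address. 127.0.0.1:7890 is an assumption used
# for illustration; the real host and port come from the VPN client's settings.
proxy_url <- "http://127.0.0.1:7890"

# Show which proxy variables, if any, R can currently see.
# Unset variables come back as empty strings.
Sys.getenv(c("http_proxy", "https_proxy"))

# Point this R session's connections at the local proxy.
# This only affects the current session.
Sys.setenv(http_proxy = proxy_url, https_proxy = proxy_url)

# Then retry the original request in the same session, for example:
# library(rvest)
# page <- read_html("https://trends.google.com/trends/?geo=US")
```

If the request still times out after this, the proxy address is probably wrong or the client is not accepting local connections, which would at least narrow down where the problem is.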

I'm happy to try to provide other context. If there is somewhere more relevant to ask, please let me know that as well.
