Hello, I'm still new to R and not very experienced with XML or HTML. I'm trying to retrieve information from a web API and turn it into a data frame. All the responses come back as XML, and for most of the API calls the code below works well:
library(httr)
library(xml2)
library(dplyr)
#Reformat the function source to make it more readable
#Translate that to a plain POST call without namespacing
getInfoInJsonCont <- POST(url = "https://app.bluefolder.com/api/2.0/contracts/list.aspx",
                          body = "<request><contractList><listType></listType></contractList></request>",
                          authenticate(user = "TOKEN", password = "x"),
                          verbose(),
                          add_headers(),
                          encode = "json")
#Creating data frame object
ContractDF = data.frame(matrix(nrow = 0, ncol=2))
colnames(ContractDF) = c("contractId", "contractName")
#Parsing and getting the information for a specific node
ContractXML = content(getInfoInJsonCont) %>% xml2::xml_find_all("//contract")
for(contract in ContractXML){
  contractId = contract %>% xml_find_all(".//contractId") %>% xml_text()
  contractName = contract %>% xml_find_all(".//contractName") %>% xml_text()
  #Using rbind to append one row per contract to the data frame
  ContractDF = rbind(ContractDF, data.frame(contractId = contractId,
                                            contractName = contractName))
}
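For comparison, the same extraction can be sketched without growing the data frame inside a loop. This is only a sketch: the inline XML is made up for illustration, and it assumes every <contract> node has at most one contractId and one contractName child.

```r
library(xml2)

# Made-up response with the same shape as the contracts list (assumption)
doc <- read_xml("<response><contractList>
  <contract><contractId>1</contractId><contractName>Alpha</contractName></contract>
  <contract><contractId>2</contractId><contractName>Beta</contractName></contract>
</contractList></response>")

contracts <- xml_find_all(doc, "//contract")

# xml_find_first() returns exactly one result per contract node,
# so the two columns always have the same length
ContractDF <- data.frame(
  contractId   = xml_text(xml_find_first(contracts, "./contractId")),
  contractName = xml_text(xml_find_first(contracts, "./contractName")),
  stringsAsFactors = FALSE
)

print(ContractDF)
```

A contract that lacks one of the children would get NA in that column instead of shifting the rows.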
One API call gives me errors: parsing fails because the data contains invalid characters. Looking through the file, I found these characters in it:
- " (double quote)
- ' (apostrophe)
- a space
- a line break
- three more non-printing characters
Here are the steps I tried, but I'm stuck:
library(httr)
library(xml2)
library(dplyr)
library(XML)
library(plyr)
getInfoInJsonSR <- POST(url = "https://app.bluefolder.com/api/2.0/serviceRequests/list.aspx",
                        body = "<request><listType>full</listType></request>",
                        authenticate(user = "TOKEN", password = "x"),
                        verbose(),
                        add_headers(),
                        encode = "json")
#Approach is to get the content as text so I can try to clean the data with gsub.
serviceXML = content(getInfoInJsonSR, type = "text", encoding = "UTF-8")
#Change working directory to be able to save on network share drive
setwd("\\\\hwo-file\\ExampleLocation")
#Save the data in a txt file format
write.table(serviceXML, file="Sample.txt")
#Formatting and using gsub to get rid of invalid XML characters to successfully parse the data.
# Read a txt file
tx <- readLines("Sample.txt")
tx <- gsub("'", "", tx)
tx <- gsub('"', "", tx)
tx <- gsub("\n", "", tx)
tx <- gsub(" ", "", tx)
tx <- gsub("\n", "", tx)
tx <- gsub("", "", tx)
tx <- gsub("", "", tx)
tx <- gsub("&", "", tx)
#tx is a long character vector (one element per line), so I collapse it into one string.
tx2 <- paste( unlist(tx), collapse='')
#Exporting clean file to an XML file format
write.table(tx2,file("Sample2.txt"))
#Parsing the clean XML File
data <- xmlParse(file = "Sample2.txt")
When I try xmlParse, I get the error:
Error: 1: Start tag expected, '<' not found
I know the start tag is there, but I can't get past this error. I want to parse the data successfully and build a data frame. Here is a sample of what I get from:
serviceXML = content(getInfoInJsonSR, type = "text", encoding = "UTF-8")
"x"
"1" "<?xml version=\"1.0\" ?><response status='ok'><serviceRequestList><serviceRequest><accountManagerId>11111</accountManagerId><billable>0</billable><billableTotal>0.0000000000</billableTotal><billingStatus>Not Billed</billingStatus><costTotal>0.0000</costTotal><customerContactEmail>example@imperial.nhs.uk</customerContactEmail><customerContactId>2222222</customerContactId><customerContactName>Example Example</customerContactName><customerContactPhone>0044 (0)000 111 2222</customerContactPhone><customerContactPhoneMobile></customerContactPhoneMobile><customerId>444444</customerId><customerLocationCity>London</customerLocationCity><customerLocationCountry>United Kingdom</customerLocationCountry><customerLocationId>9999999</customerLocationId><customerLocationName>Example's Hospital</customerLocationName><customerLocationNotes></customerLocationNotes><customerLocationPostalCode>W2 1NY</customerLocationPostalCode><customerLocationState>Greater London</customerLocationState><customerLocationStreetAddress>Example Street</customerLocationStreetAddress><customerLocationZone></customerLocationZone><customerName>Example Healthcare (EXAMPLE)</customerName><dateTimeCreated>2010-04-06T09:47:25</dateTimeCreated><dateTimeClosed>2011-05-24T07:32:05.240</dateTimeClosed><description>Example - Ex/CANCELED</description><detailedDescription></detailedDescription><priority>3</priority><priorityLabel>Medium</priorityLabel><serviceManagerId>0</serviceManagerId><serviceRequestId>1007</serviceRequestId><status>Closed</status><timeOpen_hours>9909.7500000</timeOpen_hours><type></type></serviceRequest><serviceRequest><accountManagerId>11111</accountManagerId><billable>0</billable><billableTotal>0.0000000000</billableTotal><billingStatus>Not Billed</billingStatus><costTotal>0.0000</costTotal><customerContactEmail>example.example@gstt.nhs.uk, example2.example2@gstt.nhs.uk</customerContactEmail><customerContactId>5555555</customerContactId><customerContactName>Ex 
Example</customerContactName><customerContactPhone>88888 444444</customerContactPhone><customerContactPhoneMobile>07817 738912</customerContactPhoneMobile><customerId>957056</customerId><customerLocationCity>London</customerLocationCity><customerLocationCountry>United Kingdom</customerLocationCountry><customerLocationId>1372407</customerLocationId><customerLocationName>St Thomas' Hospital</customerLocationName><customerLocationNotes></customerLocationNotes><customerLocationPostalCode>SE1 7EH</customerLocationPostalCode><customerLocationState>Greater London</customerLocationState><customerLocationStreetAddress>Example Bridge Road</customerLocationStreetAddress><customerLocationZone></customerLocationZone><customerName>Examples' Trust (EXTT)</customerName><dateTimeCreated>2010-06-10T07:37:58</dateTimeCreated><dateTimeClosed>2010-06-10T07:42:40</dateTimeClosed><description>Software - EXAMPLE - 65463</description><detailedDescription>The example that I have created.
This is an example, I made up the data.
This is another line for the example. 
</detailedDescription><priority>3</priority><priorityLabel>Medium</priorityLabel><serviceManagerId>0</serviceManagerId><serviceRequestId>6007</serviceRequestId><status>Closed</status><timeOpen_hours>0.0833000</timeOpen_hours><type>Problem</type></serviceRequest></serviceRequestList></response>"
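The leading "x" and "1" in the sample look like what write.table() adds when it is given a single string: a column header, a row name, and quotes around the value. A small sketch to illustrate, using temporary files and a made-up string (both are assumptions, not the real response):

```r
# Made-up stand-in for the response text (assumption)
xml_string <- "<response status='ok'><id>1</id></response>"

table_file <- tempfile(fileext = ".txt")
lines_file <- tempfile(fileext = ".txt")

# write.table() treats the string as a one-cell table, so the file gets
# a header row, a row name, and surrounding double quotes
write.table(xml_string, file = table_file)
table_lines <- readLines(table_file)

# writeLines() writes the string as-is
writeLines(xml_string, lines_file)
plain_lines <- readLines(lines_file)

print(table_lines)
print(plain_lines)
```

With write.table() the file no longer begins with '<', which would be consistent with the "Start tag expected" error.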
How can I parse the example above? I tried:
data <- htmlTreeParse("Sample.txt")
data
And got the results below:
$file
[1] "Sample.txt"
$version
[1] ""
$children
$children$html
<html>
<body>
<p>
"x"
"1" "
<?xml version=\"1.0\" ??>
<response status="ok">
<servicerequestlist>
<servicerequest>
<accountmanagerid>111111</accountmanagerid>
...
...
</servicerequest>
</servicerequestlist>
</response>
"
</p>
</body>
</html>
attr(,"class")
[1] "XMLDocumentContent"
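One possible direction, offered as a sketch rather than a confirmed fix: parse the response text directly instead of writing it to a file and stripping characters. Quotes, apostrophes, spaces and line breaks are legal inside XML element text, so they should not need removing. The XML below is a shortened, made-up stand-in for the real response, and the helper `field` is a hypothetical name introduced here. If the real data contains control characters that XML 1.0 forbids, parsing may still fail.

```r
library(xml2)

# Shortened, made-up stand-in for content(getInfoInJsonSR, type = "text") (assumption)
serviceXML <- "<?xml version=\"1.0\" ?><response status='ok'><serviceRequestList>
<serviceRequest><serviceRequestId>1007</serviceRequestId><status>Closed</status><customerLocationName>Example's Hospital</customerLocationName></serviceRequest>
<serviceRequest><serviceRequestId>6007</serviceRequestId><status>Closed</status><customerLocationName>St Thomas' Hospital</customerLocationName></serviceRequest>
</serviceRequestList></response>"

# Parse the string itself: no intermediate file and no gsub()
doc <- read_xml(serviceXML)
requests <- xml_find_all(doc, "//serviceRequest")

# Hypothetical helper: text of the first child called `name` in each node
field <- function(nodes, name) {
  xml_text(xml_find_first(nodes, paste0("./", name)))
}

ServiceDF <- data.frame(
  serviceRequestId     = field(requests, "serviceRequestId"),
  status               = field(requests, "status"),
  customerLocationName = field(requests, "customerLocationName"),
  stringsAsFactors = FALSE
)

print(ServiceDF)
```

The other fields in the real response could be added to the data frame the same way, one `field()` call per column.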