Hi, dear R community,
When I read a CSV file and apply the hill-climbing algorithm, there is no problem.
When I use readxl, I get this:
The check gives this:
str(Donnees)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 54 obs. of 7 variables:
 $ thick : chr "petit" "moyen" "gros" "petit" ...
 $ Bshaft: chr "fin" "fin" "fin" "medium" ...
 $ length: chr "court" "court" "court" "court" ...
 ...
res_hc <- hc(Donnees.Learn, whitelist = NULL, debug = TRUE, score = "aic")
Error in data.type(x) : variable thick is not supported in bnlearn (type: character).
When I use read.csv, the column type is factor (instead of chr). I have not found any way to convert the data frame after reading the file from Excel.
Thanks a lot for any pointers.
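Before calling hc(), it can help to list the columns bnlearn is likely to reject. This is a minimal sketch, assuming (from the error message) that bnlearn accepts factor columns for discrete data and double columns for continuous data. `unsupported_columns` is a hypothetical helper name, and the small data frame is a hand-built stand-in for the chr columns shown by str() above.

```r
# Hypothetical helper: names of columns that are neither factors nor doubles,
# i.e. the column types assumed here to be rejected by bnlearn.
unsupported_columns <- function(df) {
  ok <- vapply(df, function(col) is.factor(col) || is.double(col), logical(1))
  names(df)[!ok]
}

# Stand-in for the read_excel() result: text cells come back as character.
Donnees <- data.frame(
  thick  = c("petit", "moyen", "gros", "petit"),
  Bshaft = c("fin", "fin", "fin", "medium"),
  stringsAsFactors = FALSE
)

unsupported_columns(Donnees)  # returns c("thick", "Bshaft")
```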
It sounds like you have two questions: one about loading data from an Excel file, and one about understanding the error message bnlearn gives you.
With the read.csv call, are you familiar with the stringsAsFactors setting? (some fun background). The readr package's read_csv reads string variables as character vectors.
I suspect the data-loading issue will be quick to solve once you post a reprex (reproducible example). For the bnlearn question, though, you might move the thread to the #ml (machine learning and modeling) category.
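A small illustration of the stringsAsFactors setting, using a made-up temporary CSV file. Note that since R 4.0.0 the default of read.csv() is stringsAsFactors = FALSE, so on a recent R version factors have to be requested explicitly.

```r
# Write a tiny, made-up CSV file to a temporary location.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("thick,Bshaft", "petit,fin", "moyen,fin", "gros,medium"), csv_path)

# Strings kept as character vectors (the default since R 4.0.0).
as_chr <- read.csv(csv_path, stringsAsFactors = FALSE)
# Strings converted to factors while reading.
as_fct <- read.csv(csv_path, stringsAsFactors = TRUE)

class(as_chr$thick)   # "character"
class(as_fct$thick)   # "factor"
levels(as_fct$thick)  # "gros" "moyen" "petit" (alphabetical)
```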
Thanks for these pointers. I will look into them.
My goal is to convert the chr columns to factor columns for bnlearn after reading the data from Excel (rather than from CSV, where the columns are already factors).
In other words, when reading data from an Excel file, the question would be:
==> Why does stringsAsFactors not default to TRUE?
Thanks also for the "stringsasfactors-an-unauthorized-biography/" link, which gave me the idea of looking into as.factor() or something similar.
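as.factor() converts one vector at a time, so for a single column it could look like the sketch below. The data frame is a made-up stand-in for the read_excel() result; factor() with an explicit `levels` argument is shown as an alternative when the alphabetical order of levels is not meaningful.

```r
# Made-up stand-in for the data frame returned by read_excel().
Donnees <- data.frame(
  thick = c("petit", "moyen", "gros", "petit"),
  stringsAsFactors = FALSE
)

# as.factor() converts one column; levels are sorted alphabetically.
Donnees$thick <- as.factor(Donnees$thick)
levels(Donnees$thick)  # "gros" "moyen" "petit"

# factor() with explicit levels keeps a chosen order instead.
thick_ordered <- factor(c("petit", "moyen", "gros", "petit"),
                        levels = c("petit", "moyen", "gros"))
levels(thick_ordered)  # "petit" "moyen" "gros"
```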
As a reprex, here is a simplified version of my script:
library(bnlearn)
library(lattice)
library(gRain)
library(readxl)
setwd("My_Dir")  # working directory
Donnees <- read_excel("My_data.xlsx", sheet = "RB_FLAP")  # text columns come back as chr
str(Donnees)
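Since My_data.xlsx is not available to other readers, a reprex can rebuild a few rows by hand. This is a minimal sketch that uses only the values visible in the str() output above (other columns and rows are omitted); data.frame(..., stringsAsFactors = FALSE) stands in for read_excel(), which returns character columns for text cells. dput() is the usual way to turn real data into pasteable R code.

```r
# Hand-built stand-in for read_excel("My_data.xlsx", sheet = "RB_FLAP"),
# using only values visible in the str() output; other columns omitted.
Donnees <- data.frame(
  thick  = c("petit", "moyen", "gros", "petit"),
  Bshaft = c("fin", "fin", "fin", "medium"),
  length = c("court", "court", "court", "court"),
  stringsAsFactors = FALSE
)
str(Donnees)  # all three columns are chr, as with read_excel()

# With the real data loaded, dput() prints R code that recreates it,
# which can be pasted into a forum post.
dput(head(Donnees, 4))
```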
This sounds quite versatile, but it requires another package.
I am surprised there is no simple function to convert the data as if it came from a CSV file.
Is there a simpler conversion function, something like Donnees2 <- as.factor(Donnees)?
I will try this new package. Thanks!
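In base R, without extra packages, every character column can be converted at once by applying as.factor() column by column. A minimal sketch on a made-up data frame (the numeric column `measure` is hypothetical, included only to show that non-character columns are left alone); assigning to `Donnees[]` keeps the data frame structure and should also work on a tibble.

```r
# Made-up stand-in for read_excel() output: two chr columns and one numeric.
Donnees <- data.frame(
  thick   = c("petit", "moyen", "gros"),
  Bshaft  = c("fin", "fin", "medium"),
  measure = c(0.1, 0.3, 0.2),  # hypothetical numeric column
  stringsAsFactors = FALSE
)

# Convert every character column to a factor; leave the others unchanged.
Donnees[] <- lapply(Donnees, function(col) {
  if (is.character(col)) as.factor(col) else col
})

vapply(Donnees, class, character(1))  # thick and Bshaft are now "factor"
```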
Thanks again for this step-by-step conversion procedure.
After installing two new packages (yaml and dplyr), I ran:
Donnees <- read_excel("Excel_vers_R/RB_FLAP_to_R complet.xlsx", sheet = "RB_FLAP_to_categories") %>% mutate_if(is.character, as.factor)
And I got this:
Error in mutate_if(., is.character, as.factor) : could not find function "mutate_if"
Nonetheless, I found the help page for this function and am exploring it. Just a question: what does %>% mean?
Thanks in advance
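%>% is the pipe operator from the magrittr package, re-exported by dplyr: x %>% f(y) is evaluated as f(x, y), so a chain of steps reads left to right. Since R 4.1.0, base R also has a native pipe, |>. To show the idea without any package, here is a simplified, hypothetical re-implementation named %pipe%; it is not magrittr's actual code and ignores features such as the `.` placeholder.

```r
# Hypothetical toy pipe: passes `lhs` as the first argument of `rhs`.
# `rhs` may be a bare function name (sqrt) or a call (round(1)).
`%pipe%` <- function(lhs, rhs) {
  rhs_call <- substitute(rhs)
  if (is.call(rhs_call)) {
    # Rewrite f(y) into f(.lhs, y) and evaluate it in the caller's scope.
    new_call <- as.call(c(list(rhs_call[[1]], quote(.lhs)), as.list(rhs_call[-1])))
    eval(new_call, list(.lhs = lhs), parent.frame())
  } else {
    rhs(lhs)
  }
}

c(1, 4, 9) %pipe% sqrt           # same as sqrt(c(1, 4, 9))
c(1.234, 5.678) %pipe% round(1)  # same as round(c(1.234, 5.678), 1)
```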
It is almost working with the following sequence:
library(bnlearn)
library(lattice)
library(gRain)
library(readxl)  # provides read_excel()
library(yaml)
library(dplyr)   # provides %>% and mutate_if()
Donnees <- read_excel("My_dir/My_data.xlsx", sheet = "RB_FLAP", col_names = TRUE) %>% mutate_if(is.character, as.factor)
Donnees
# A tibble: 54 x 7
thick Bshaft length Ribs Strain Utotal MRFY
1 petit fin court non ].2-.4] ].5-1] ]1.3-2]
2 moyen fin court non [0-.2] [0.-.5] ]1.3-2]
3 gros fin court non [0-.2] [0.-.5] ]1.3-2]
4 petit medium court non ].2-.4] ].5-1] ]1.3-2]
# ... with 50 more rows
For this test, the learning set is Donnees:
res_hc <- hc(Donnees.Learn, whitelist = NULL, debug = TRUE, score = "aic")
Error in check.data(x) : variable thick must have at least two levels.
Nonetheless:
levels(learn_set[["thick"]])
[1] "gros" "moyen" "petit"
So close to the solution! I will look more closely at the "cell-and-column-types.html" page you gave me earlier and let you know.
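One possible explanation, offered as an assumption rather than a confirmed diagnosis: read_excel() returns a tibble, and single-column indexing such as x[, "thick"] on a tibble returns a one-column tibble instead of the factor itself. If bnlearn counts levels that way, nlevels() sees 0 levels even though the column has three. The sketch below reproduces the effect in base R with drop = FALSE, which mimics the tibble behaviour; converting with as.data.frame() before calling hc() would be the corresponding fix to try.

```r
# Made-up stand-in with factor columns, as after mutate_if(is.character, as.factor).
Donnees <- data.frame(
  thick  = factor(c("petit", "moyen", "gros", "petit")),
  Bshaft = factor(c("fin", "fin", "fin", "medium"))
)

# Plain data.frame indexing drops to the factor itself: 3 levels.
nlevels(Donnees[, "thick"])

# drop = FALSE (what tibbles always do) keeps a one-column data frame,
# and nlevels() of a data frame is 0.
nlevels(Donnees[, "thick", drop = FALSE])

# Fix to try (assumption): hand bnlearn a plain data.frame.
# Donnees.Learn <- as.data.frame(Donnees.Learn)
```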
For modeling, it makes a lot of sense (to me at least) to make them factors. However, there are a lot of cases where it is much better to work with the raw strings and the creators of those packages made that decision on that basis.
The modeling packages, specifically recipes, will convert them to factors, since that is what you would need for models.