I have a dataset about gaming players, shown in the images below.
The goal is to predict the value of Score.
[NUM] Score
[DATE] RegistrationDate
[CAT] Gender
[NUM] Age
[CHR] City
[CHR] State
[CAT] Group
[CAT] GamingRoom
[NUM] PointsEarned
[CHR] Sponsor
[CAT] ServerNode
[NUM] DistanceFromServer
[CAT] PlayerType
[CHR] Device
Where:
NUM: numeric value
CAT: category value (or enumeration)
CHR: string value
The dataset was originally a .csv file, imported by running the following command:
dataset_xyz = read.csv("R/xyz/dataset_xyz.csv")
I have three questions about this:
Question 1: How can I convert the columns { Gender, Group, GamingRoom, ServerNode, PlayerType } from numeric to a categorical type? I don't want these numbers to be treated as quantities during the training process.
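To make Question 1 concrete, this is roughly what I have in mind. It is only a sketch: the small data frame is a made-up stand-in for my real data, and I am assuming that factor() is the appropriate way to mark these columns as categorical.

```r
# Hypothetical miniature of the dataset; the values are made up for illustration.
dataset_xyz <- data.frame(
  Score      = c(10.5, 20.1, 15.3),
  Gender     = c(1, 2, 1),
  Group      = c(3, 1, 2),
  GamingRoom = c(1, 1, 2),
  ServerNode = c(4, 5, 4),
  PlayerType = c(1, 2, 3)
)

# Columns that hold category codes rather than quantities.
cat_cols <- c("Gender", "Group", "GamingRoom", "ServerNode", "PlayerType")

# Convert each numeric code column to a factor; Score stays numeric.
dataset_xyz[cat_cols] <- lapply(dataset_xyz[cat_cols], factor)

# Inspect the resulting column types.
str(dataset_xyz)
```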
Question 2: When I use string columns during the training process, should these values be stored as Factor, like in the image above, or as chr (character)?
Question 3: Before the training process, when I normalize the numeric values (most likely using z-scores), how do I keep the scaling parameters so that I can apply the same scaling to the test data? Both datasets need to be on the same scale.
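To illustrate what I mean in Question 3, here is a minimal sketch with made-up train and test data frames (the split and the values are hypothetical). The idea, as far as I understand it, would be to compute the mean and standard deviation on the training data only and reuse them for the test data.

```r
# Hypothetical train/test split; the values are made up for illustration.
train <- data.frame(Age = c(20, 30, 40), PointsEarned = c(100, 200, 300))
test  <- data.frame(Age = c(25, 35),     PointsEarned = c(150, 400))

num_cols <- c("Age", "PointsEarned")

# Compute the z-score parameters on the training data only and keep them.
train_means <- sapply(train[num_cols], mean)
train_sds   <- sapply(train[num_cols], sd)

# Apply the stored training parameters to both data sets.
train_scaled <- train
test_scaled  <- test
for (col in num_cols) {
  train_scaled[[col]] <- (train[[col]] - train_means[[col]]) / train_sds[[col]]
  test_scaled[[col]]  <- (test[[col]]  - train_means[[col]]) / train_sds[[col]]
}
```

With this approach the test values are not centered on their own mean, which I assume is the expected behavior when both sets must share one scale.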
Thanks in advance!