I'm working with H2O on a regression problem.
I have about 10 continuous variables and 20 discrete variables. One of these variables has high cardinality, so I want to use target encoding for it. The target variable I need to predict is continuous.
I was reading the following document:
http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-munging/target-encoding.html
In that specific example they use a Gradient Boosting Machine as the model on a classification problem. However, I tried the same steps for my regression problem.
At some point they say to run the following lines:
# Create a fold column in the train dataset
train$fold <- h2o.kfold_column(train, nfolds = 5, seed = 1234)
# Fit the target encoding map
te_map <- h2o.target_encode_fit(
  train,
  x = list("addr_state"),
  y = response,
  fold_column = "fold"
)
But when I run the second statement (the call to h2o.target_encode_fit), I get the following error:
ERROR: Unexpected HTTP Status code: 500 Server Error (url = http://localhost:54321/99/Rapids)
java.lang.IllegalStateException
[1] "java.lang.IllegalStateException: `target` must be a binary categorical vector. We do not support multi-class and continuos target case for now"
[2] " ai.h2o.automl.targetencoding.TargetEncoder.ensureTargetColumnIsBinaryCategorical(TargetEncoder.java:156)"
[3] " ai.h2o.automl.targetencoding.TargetEncoder.prepareEncodingMap(TargetEncoder.java:105)"
[4] " water.rapids.ast.prims.mungers.AstTargetEncoderFit.apply(AstTargetEncoderFit.java:53)"
[5] " water.rapids.ast.prims.mungers.AstTargetEncoderFit.apply(AstTargetEncoderFit.java:23)"
[6] " water.rapids.ast.AstExec.exec(AstExec.java:63)"
[7] " water.rapids.Session.exec(Session.java:85)"
[8] " water.rapids.Rapids.exec(Rapids.java:94)"
[9] " water.api.RapidsHandler.exec(RapidsHandler.java:38)"
[10] " sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)"
[11] " sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)"
[12] " java.lang.reflect.Method.invoke(Method.java:498)"
[13] " water.api.Handler.handle(Handler.java:60)"
[14] " water.api.RequestServer.serve(RequestServer.java:462)"
[15] " water.api.RequestServer.doGeneric(RequestServer.java:295)"
[16] " water.api.RequestServer.doPost(RequestServer.java:221)"
[17] " javax.servlet.http.HttpServlet.service(HttpServlet.java:755)"
[18] " javax.servlet.http.HttpServlet.service(HttpServlet.java:848)"
[19] " org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)"
[20] " org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501)"
[21] " org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)"
[22] " org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:427)"
[23] " org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)"
[24] " org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)"
[25] " org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)"
[26] " org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)"
[27] " water.webserver.jetty8.Jetty8ServerAdapter$LoginHandler.handle(Jetty8ServerAdapter.java:119)"
[28] " org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)"
[29] " org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)"
[30] " org.eclipse.jetty.server.Server.handle(Server.java:370)"
[31] " org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)"
[32] " org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)"
[33] " org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:984)"
[34] " org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1045)"
[35] " org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)"
[36] " org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:236)"
[37] " org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)"
[38] " org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)"
[39] " org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)"
[40] " org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)"
[41] " java.lang.Thread.run(Thread.java:748)"
Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = page, :
ERROR MESSAGE:
`target` must be a binary categorical vector. We do not support multi-class and continuos target case for now
At the end of the error, the message reads:
`target` must be a binary categorical vector.
We do not support multi-class and continuos target case for now
My questions are:
- Is H2O NOT supporting target encoding when the target variable is continuous?
- If the previous point is TRUE, do you know if they are planning to support it in the future?
- Do you know of any R package that supports target encoding for discrete variables when the target variable is continuous (regression)?
At the very beginning of that page they say:
Target encoding is the process of replacing a categorical value with the mean of the target variable.
"Mean of the target variable"? Normally the mean is only meaningful for continuous variables. So, based on that definition, I would expect their algorithm to support continuous target variables.
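To make that definition concrete, here is a minimal base R sketch of what "replacing a categorical value with the mean of the target variable" would look like for a continuous target. The data, the target column `y`, and the output column `addr_state_te` are made up for illustration. This is not H2O's implementation, and it does not use the fold column from the snippet above, so it does nothing to guard against target leakage.

```r
# Toy data: one categorical predictor and a continuous target (made-up values).
df <- data.frame(
  addr_state = c("CA", "CA", "NY", "NY", "NY", "TX"),
  y          = c(10, 20, 30, 40, 50, 60)
)

# Mean of the continuous target for each level of the categorical column.
level_means <- tapply(df$y, df$addr_state, mean)

# Replace each level with its mean. as.character() forces lookup by level name
# (also when the column is a factor); as.numeric() drops names and dimensions.
df$addr_state_te <- as.numeric(level_means[as.character(df$addr_state)])

print(df)
```

Each row ends up with the mean of `y` over all rows that share its `addr_state`, including the row itself, which is why a real workflow would normally compute these means out-of-fold or on a holdout set.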
Thanks!