- URL: https://<geoanalytics-url>/ForestBasedClassificationAndRegression
- Version Introduced: 10.7
Description
The ForestBasedClassificationAndRegression operation creates models and generates predictions using an adaptation of Leo Breiman's random forest algorithm, a supervised machine learning method. Predictions can be performed for both categorical variables (classification) and continuous variables (regression). Explanatory variables take the form of fields in the attribute table of the training features. In addition to validating model performance on the training data, the operation can make predictions for another feature dataset.
The following are examples:
- You have seagrass occurrence and a number of environmental explanatory variables that have been enriched using a multivariable grid to calculate distances to factories upstream and major ports. Future seagrass occurrence can be predicted based on future projections for those same environmental explanatory variables.
- You have crop yield data at hundreds of farms across the country, along with other attributes at each of those farms (number of employees, acreage, and so on). Using this data, you can provide a set of features representing farms where you don't have crop yield (but you do have all of the other variables), and make a prediction about crop yield.
- Housing values can be predicted based on the prices of houses sold in the current year. The sale price of homes sold, along with information about the number of bedrooms, distance to schools, proximity to major highways, average income, and crime counts, can be used to predict sale prices of similar homes.
Request parameters
Parameter | Details |
---|---|
predictionType (Required) | Specifies the operation mode of the tool. The tool can either train a model only to assess its performance, or train a model and predict features. Values: Train | TrainAndPredict
|
inFeatures (Required) | The features that will be used to train the model. This layer must include fields representing the variable to predict and the explanatory variables. Syntax: As described in Feature input.
|
featuresToPredict (Required if using TrainAndPredict) | A feature layer representing the locations where predictions will be made. This layer must include explanatory variable fields that correspond to the fields used in inFeatures. This parameter is used only when predictionType is TrainAndPredict, in which case it is required. Syntax: As described in Feature input.
|
variablePredict (Required) | The field in inFeatures that contains the values used to train the model, together with a Boolean denoting whether the field is categorical. This field contains the known (training) values of the variable that will be predicted at unknown locations.
|
explanatoryVariables (Required) | A list of fields representing the explanatory variables, each with a Boolean value denoting whether the field is categorical. The explanatory variables help predict the value or category of the variablePredict parameter. For each entry, fieldName is the name of the field in inFeatures used to predict variablePredict, and categorical is either true or false. Set categorical to true for variables that represent classes or categories (such as land cover or presence or absence) and to false for continuous variables. A string field should always be set to true, and a continuous value should always be set to false.
|
numberOfTrees (Optional) | The number of trees to create in the forest model. More trees will generally result in more accurate model prediction, but the model will take longer to calculate. The default number of trees is 100. Values must be greater than 0.
|
minimumLeafSize (Optional) | The minimum number of observations required to keep a leaf (that is, the terminal node on a tree without further splits). The default minimum for regression is 5, and the default for classification is 1. For very large datasets, increasing these numbers will decrease the run time of the tool.
|
maximumTreeDepth (Optional) | The maximum number of splits that will be made down a tree. With a large maximum depth, more splits are created, which may increase the chance of overfitting the model. The default is data driven and depends on the number of trees created and the number of variables included.
|
sampleSize (Optional) | The percentage of the inFeatures used for each decision tree. The default is 100 percent of the data. Each decision tree in the forest is created using a random sample (approximately two-thirds) of the training data specified. Using a lower percentage of the input data for each decision tree increases the speed of the tool for very large datasets.
|
randomVariables (Optional) | The number of explanatory variables used to create each decision tree. Each of the decision trees in the forest is created using a random subset of the explanatory variables specified. Increasing the number of variables used in each decision tree will increase the chance of overfitting your model, particularly if there are one or two dominant variables. A common practice is to use the square root of the total number of explanatory variables if variablePredict is categorical, or to divide the total number of explanatory variables by 3 if variablePredict is numeric.
|
percentageForValidation (Optional) | The percentage (between 0 percent and 50 percent) of inFeatures to reserve as the test dataset for validation. The default is 10 percent. The model will be trained without this random subset of data, and the observed values for those features will be compared to the predicted values.
|
createVariableOfImportanceTable (Optional) | A Boolean that specifies whether an output table will be generated that contains information describing the importance of each explanatory variable used in the model created. Values: true | false
|
explanatoryVariableMatching (Optional) | A list of the explanatoryVariables specified from the inFeatures and their corresponding fields from the featuresToPredict. By default, if an explanatoryVariable is not mapped, it will match to a field with the same name in the featuresToPredict. This parameter is only used if there is a featuresToPredict input. You do not need to use it if the names and types of the fields match between your two input datasets. predictionLayerField is the name of a field specified in the explanatoryVariables parameter and trainingLayerField is the field that will match the field in explanatoryVariables. REST Examples
|
outputTrainedName (Required) | The task will create a feature service of the results. You define the name of the service.
|
context (Optional) | The context parameter contains additional settings that affect task execution. For this task, there are four settings:
|
f | The response format. The default response format is html. Values: html | json |
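Several of the constraints in the table above can be checked on the client before a job is submitted. The following is a minimal sketch of such a check, not part of the REST API: the function names `suggest_random_variables` and `validate_forest_params` are hypothetical, rounding down in the heuristic is an assumption, and the service performs its own validation regardless.

```python
import math


def suggest_random_variables(num_explanatory: int, categorical_target: bool) -> int:
    """Rule of thumb from the randomVariables description: sqrt(n) for a
    categorical variablePredict, n / 3 for a numeric one (never below 1)."""
    if num_explanatory < 1:
        raise ValueError("need at least one explanatory variable")
    if categorical_target:
        value = math.sqrt(num_explanatory)
    else:
        value = num_explanatory / 3
    # Rounding down is an assumption of this sketch; the text does not specify it.
    return max(1, int(value))


def validate_forest_params(params: dict) -> list:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    for name in ("predictionType", "inFeatures", "variablePredict",
                 "explanatoryVariables", "outputTrainedName"):
        if name not in params:
            problems.append(f"{name} is required")
    if (params.get("predictionType") == "TrainAndPredict"
            and "featuresToPredict" not in params):
        problems.append("featuresToPredict is required when predictionType is TrainAndPredict")
    if "numberOfTrees" in params and params["numberOfTrees"] <= 0:
        problems.append("numberOfTrees must be greater than 0")
    pct = params.get("percentageForValidation")
    if pct is not None and not 0 <= pct <= 50:
        problems.append("percentageForValidation must be between 0 and 50")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.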
Example usage
Below is a sample request URL for ForestBasedClassificationAndRegression:
https://machine.domain.com/webadaptor/rest/services/System/GeoAnalyticsTools/GPServer/ForestBasedClassificationAndRegression/submitJob?
predictionType=TrainAndPredict&inFeatures={"url":"https://webadaptor.domain.com/server/rest/services/Hurricane/hurricaneTrack/0"}&featuresToPredict={"url":"https://webadaptor.domain.com/server/rest/services/USA/cities/0"}&variablePredict={"fieldName":"shelterCapacity","categorical":true}&explanatoryVariables=[{"fieldName":"townDensity","categorical":true}]&numberOfTrees=20&minimumLeafSize=6&maximumTreeDepth=10&sampleSize=95&randomVariables=3&percentageForValidation=10&createVariableOfImportanceTable=false&explanatoryVariableMatching=[{"predictionLayerField":"Hurricane2019","trainingLayerField":"hurricanesIn2019"},{"predictionLayerField":"ShelterLocations","trainingLayerField":"CorpusChristiShelters"}]&outputTrainedName=myOutput&context={"extent":{"xmin":-122.68,"ymin":45.53,"xmax":-122.45,"ymax":45.6,"spatialReference":{"wkid":4326}}}&f=json
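Because several parameters carry JSON values, it is usually safer to let a program serialize and percent-encode them than to assemble the query string by hand. The sketch below shows one way to do this with the Python standard library. The helper name `build_submit_url` and the host in the usage lines are hypothetical, and the parameter values are a shortened subset of the sample request.

```python
import json
from urllib.parse import urlencode


def build_submit_url(tool_url: str, params: dict) -> str:
    """Return a submitJob URL with JSON-valued parameters serialized and percent-encoded."""
    encoded = {}
    for name, value in params.items():
        if isinstance(value, bool):
            # Booleans are written as lowercase true/false, as in the sample request.
            encoded[name] = "true" if value else "false"
        elif isinstance(value, (dict, list)):
            # Compact JSON, matching the style of the sample request.
            encoded[name] = json.dumps(value, separators=(",", ":"))
        else:
            encoded[name] = str(value)
    return f"{tool_url.rstrip('/')}/submitJob?{urlencode(encoded)}"


# Hypothetical host; substitute the URL of your own GeoAnalytics tool.
url = build_submit_url(
    "https://machine.example.com/rest/ForestBasedClassificationAndRegression",
    {
        "predictionType": "TrainAndPredict",
        "variablePredict": {"fieldName": "shelterCapacity", "categorical": True},
        "explanatoryVariables": [{"fieldName": "townDensity", "categorical": True}],
        "numberOfTrees": 20,
        "createVariableOfImportanceTable": False,
        "outputTrainedName": "myOutput",
        "f": "json",
    },
)
print(url)
```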
Response
When you submit a request, the service assigns a unique job ID for the transaction.
{
"jobId": "<unique job identifier>",
"jobStatus": "<job status>"
}
After the initial request is submitted, you can use jobId to periodically check the status of the job and messages as described in Check job status. Once the job has successfully completed, use jobId to retrieve the results. To track the status, you can make a request of the following form:
https://<geoanalytics-url>/ForestBasedClassificationAndRegression/jobs/<jobId>
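Periodic status checking can be sketched as a small polling loop. In the example below, `wait_for_job` is a hypothetical helper, and `fetch_status` stands for any caller-supplied function that requests the job URL above with f=json and returns the parsed JSON. Only esriJobSucceeded is named in this document; the other terminal statuses, and the esriJobExecuting status in the usage lines, are assumptions based on common ArcGIS geoprocessing job statuses.

```python
import time
from typing import Callable

# esriJobSucceeded appears in this document; the other three are assumed.
TERMINAL_STATUSES = {
    "esriJobSucceeded",
    "esriJobFailed",
    "esriJobCancelled",
    "esriJobTimedOut",
}


def wait_for_job(fetch_status: Callable[[], dict],
                 interval: float = 5.0,
                 max_polls: int = 120,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until jobStatus is terminal and return the last status document."""
    for _ in range(max_polls):
        status = fetch_status()
        if status.get("jobStatus") in TERMINAL_STATUSES:
            return status
        sleep(interval)
    raise TimeoutError(f"job still running after {max_polls} polls")


# Usage with canned responses instead of real HTTP calls.
responses = iter([
    {"jobId": "j1", "jobStatus": "esriJobExecuting"},
    {"jobId": "j1", "jobStatus": "esriJobSucceeded"},
])
final = wait_for_job(lambda: next(responses), sleep=lambda seconds: None)
print(final["jobStatus"])
```

Passing the fetch and sleep functions in as arguments keeps the loop independent of any particular HTTP client and makes it easy to exercise without a server.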
Access results
When the status of the job request is esriJobSucceeded, you can access the results of the analysis by making a request of the following form:
https://<geoanalytics-url>/ForestBasedClassificationAndRegression/jobs/<jobId>/results/<response type>?token=<your token>&f=json
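A result URL of this form can be assembled programmatically. The sketch below assumes that `<response type>` is filled with one of the response names listed in the table that follows; `build_result_url` is a hypothetical helper, not part of the API.

```python
from urllib.parse import urlencode

# Response names taken from the table in this section.
RESULT_NAMES = ("outputTrained", "outputPredicted", "variableOfImportance", "processInfo")


def build_result_url(tool_url: str, job_id: str, result_name: str, token: str = None) -> str:
    """Return the URL for one named result of a finished job."""
    if result_name not in RESULT_NAMES:
        raise ValueError(f"unknown result name: {result_name}")
    query = {}
    if token:
        query["token"] = token
    query["f"] = "json"
    return f"{tool_url.rstrip('/')}/jobs/{job_id}/results/{result_name}?{urlencode(query)}"
```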
Response | Description |
---|---|
outputTrained | The input features that are fit to the model. The type of feature (point, line, or polygon) depends on the input layers.
The result has properties for parameter name, data type, and value. The contents of value depend on the outputTrainedName parameter provided in the initial request. The value contains the URL of the feature service layer.
See Feature output for more information about how the result layer is accessed. |
outputPredicted | The features predicted using the model. The type of feature (table, point, line, or polygon) depends on the input layers. This result is optional and is only returned when featuresToPredict is provided as input.
|
variableOfImportance | A table describing the importance of each explanatory variable in the model fit. This result is optional and is only returned when createVariableOfImportanceTable is true.
|
processInfo | The processInfo parameter contains strings that summarize the ForestBasedClassificationAndRegression result. These strings are used for reporting tool results. You can create custom reports for your application using these strings. There are four parts in the returned JSON as follows:
|
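Once a layer result such as outputTrained has been retrieved, the service URL can be read from its value. The sketch below assumes the result JSON uses the keys paramName, dataType, and value, and that value is either an object with a url key or a plain URL string. These shapes are assumptions and are not confirmed by this document, so check them against an actual response. The helper name `extract_layer_url` and the sample URL are hypothetical.

```python
def extract_layer_url(result: dict) -> str:
    """Return the feature service layer URL from a result document."""
    value = result.get("value")
    # Assumed shape 1: {"value": {"url": "https://..."}}
    if isinstance(value, dict) and "url" in value:
        return value["url"]
    # Assumed shape 2: {"value": "https://..."}
    if isinstance(value, str):
        return value
    raise ValueError("result does not contain a layer URL")


# Usage with a hand-written sample, not a real service response.
sample = {
    "paramName": "outputTrained",
    "dataType": "GPString",
    "value": {"url": "https://machine.example.com/rest/services/myOutput/FeatureServer/0"},
}
print(extract_layer_url(sample))
```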