Forest-based Classification And Regression

  • URL:https://<geoanalytics-url>/ForestBasedClassificationAndRegression
  • Version Introduced:10.7

Description

The ForestBasedClassificationAndRegression operation creates models and generates predictions using an adaptation of Leo Breiman's random forest algorithm, which is a supervised machine learning method. Predictions can be performed for both categorical variables (classification) and continuous variables (regression). Explanatory variables take the form of fields in the attribute table of the training features. In addition to validating model performance on the training data, the operation can make predictions for another feature dataset.

The following are examples:

  • You have seagrass occurrence and a number of environmental explanatory variables that have been enriched using a multivariable grid to calculate distances to factories upstream and major ports. Future seagrass occurrence can be predicted based on future projections for those same environmental explanatory variables.
  • You have crop yield data at hundreds of farms across the country, along with other attributes at each of those farms (number of employees, acreage, and so on). Using this data, you can provide a set of features representing farms where you don't have crop yield (but you do have all of the other variables), and make a prediction about crop yield.
  • Housing values can be predicted based on the prices of houses sold in the current year. The sale price of homes sold, along with information about the number of bedrooms, distance to schools, proximity to major highways, average income, and crime counts, can be used to predict sale prices of similar homes.

Request parameters

Parameter | Details
predictionType

(Required)

Specifies the operation mode of the tool. The tool can be run to train a model to only assess performance or to train a model and predict features. Prediction types are as follows:

  • Train—This is the default. A model will be trained, but no predictions will be generated. Use this option to assess the accuracy of your model before generating predictions.
  • TrainAndPredict—Predictions or classifications will be generated for features. Explanatory variables must be provided for both the training features and the features to be predicted. The output of this option will be a feature service, model diagnostics, and an optional table of variable importance.

REST Examples

//REST web example
Train

//REST scripting example
"predictionType": "TrainAndPredict"
inFeatures

(Required)

The features that will be used to train the model. This layer must include fields representing the variable to predict and the explanatory variables.

Syntax: As described in Feature input, this parameter can be one of the following:

  • A URL to a feature service layer with an optional filter to select specific features
  • A URL to a big data catalog service layer with an optional filter to select specific features
  • A feature collection

REST Examples

//REST web example
{"url": "https://myportal.domain.com/server/rest/services/Hosted/hurricaneTrack/FeatureServer/0", "filter": "Month = 'September'"}

//REST scripting example
"inFeatures": {"url": "https://myportal.domain.com/server/rest/services/Hosted/hurricaneTrack/FeatureServer/0", "filter": "Month = 'September'"}
featuresToPredict

(Required if using TrainAndPredict)

A feature layer representing locations where predictions will be made. This layer must include explanatory variable fields that correspond to fields used in inFeatures. This parameter is only used when predictionType is TrainAndPredict and is required in that case.

Syntax: As described in Feature input, this parameter can be one of the following:

  • A URL to a feature service layer with an optional filter to select specific features
  • A URL to a big data catalog service layer with an optional filter to select specific features
  • A feature collection

REST Examples

//REST web example
{"url": "https://myportal.domain.com/server/rest/services/Hosted/hurricaneTrack/FeatureServer/0", "filter": "Month = 'September'"}

//REST scripting example
"featuresToPredict": {"url": "https://myportal.domain.com/server/rest/services/Hosted/hurricaneTrack/FeatureServer/0", "filter": "Month = 'September'"}
variablePredict

(Required)

The variable from the inFeatures parameter containing the values to be used to train the model and a Boolean denoting whether it's categorical. This field contains known (training) values of the variable that will be used to predict at unknown locations.

REST Examples

//REST web example
{"fieldName": "variablePredict", "categorical": true}

//REST scripting example
"variablePredict": {"fieldName": "variablePredict", "categorical": true}
explanatoryVariables

(Required)

A list of fields representing the explanatory variables and a Boolean value denoting whether the fields are categorical. The explanatory variables help predict the value or category of the variablePredict parameter. Set categorical to true for any variable that represents classes or categories (such as land cover or presence or absence) and to false if the variable is continuous.

In the example below, fieldName is the name of the field in the inFeatures used to predict the variablePredict, and categorical is either true or false. A string field should always be set as true, and a continuous value should always be set as false.
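The rule above (string fields are categorical, continuous values are not) can be applied mechanically when assembling explanatoryVariables. The following Python sketch is illustrative only: build_explanatory_variables, its categorical_fields argument, and the landCover field are hypothetical names that are not part of the REST API, and the mapping from Python types to field types is an assumption.

```python
import json


def build_explanatory_variables(field_types, categorical_fields=()):
    """Build an explanatoryVariables list from {field name: Python type}.

    Hypothetical helper. Assumes str fields are categorical and numeric
    fields are continuous unless named in categorical_fields (for example,
    integer land cover codes).
    """
    forced = set(categorical_fields)
    return [
        {
            "fieldName": name,
            "categorical": field_type is str or name in forced,
        }
        for name, field_type in field_types.items()
    ]


variables = build_explanatory_variables(
    {"CrimeType": str, "population": int, "landCover": int},
    categorical_fields=["landCover"],
)
# Serialize to the JSON text that the explanatoryVariables parameter expects.
print(json.dumps(variables))
```

Numeric fields that hold class codes are not detected automatically, which is why the sketch accepts an explicit list of fields to force to categorical.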

REST Examples

//REST web example
[{"fieldName": "CrimeType", "categorical": true},{"fieldName": "population", "categorical": false}]

//REST scripting example
"explanatoryVariables": [{"fieldName": "isSunny", "categorical": true},{"fieldName": "isWeekend","categorical": true},{"fieldName": "hoursOutside", "categorical": false}]
numberOfTrees

(Optional)

The number of trees to create in the forest model. More trees will generally result in more accurate model prediction, but the model will take longer to calculate. The default number of trees is 100. Values must be greater than 0.

REST Examples

//REST web example
20

//REST scripting example
"numberOfTrees": 50
minimumLeafSize

(Optional)

The minimum number of observations required to keep a leaf (that is, the terminal node on a tree without further splits). The default minimum for regression is 5, and the default for classification is 1. For very large datasets, increasing these numbers will decrease the run time of the tool.

REST Examples

//REST web example
3

//REST scripting example
"minimumLeafSize": 6
maximumTreeDepth

(Optional)

The maximum number of splits that will be made down a tree. With a large maximum depth, more splits are created, which may increase the chance of overfitting the model. The default is data driven and depends on the number of trees created and the number of variables included.

REST Examples

//REST web example
14

//REST scripting example
"maximumTreeDepth": 10
sampleSize

(Optional)

The percentage of the inFeatures used for each decision tree. The default is 100 percent of the data. Each decision tree in the forest is created using a random sample (approximately two-thirds) of the data specified. Using a lower percentage of the input data for each decision tree increases the speed of the tool for very large datasets.

REST Examples

//REST web example
95

//REST scripting example
"sampleSize": 70
randomVariables

(Optional)

The number of explanatory variables used to create each decision tree. Each of the decision trees in the forest is created using a random subset of the explanatory variables specified. Increasing the number of variables used in each decision tree will increase the chances of overfitting your model, particularly if there are one or two dominant variables. A common practice is to use the square root of the total number of explanatory variables if your variablePredict is categorical, or to divide the total number of explanatory variables by 3 if variablePredict is numeric.
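The common practice described above can be written as a small calculation. This Python sketch is only an illustration: suggested_random_variables is a hypothetical helper, and rounding down with a minimum of 1 is an assumption, because the description does not say how fractional results are handled.

```python
import math


def suggested_random_variables(total_variables, categorical_target):
    """Suggest a randomVariables value from the rule of thumb.

    Assumption: fractional results are rounded down, never below 1.
    """
    if total_variables < 1:
        raise ValueError("at least one explanatory variable is required")
    if categorical_target:
        # Classification: square root of the number of explanatory variables.
        suggestion = math.isqrt(total_variables)
    else:
        # Regression: number of explanatory variables divided by 3.
        suggestion = total_variables // 3
    return max(1, suggestion)


print(suggested_random_variables(16, categorical_target=True))
print(suggested_random_variables(12, categorical_target=False))
```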

REST Examples

//REST web example
3

//REST scripting example
"randomVariables": 2
percentageForValidation

(Optional)

The percentage (between 0 percent and 50 percent) of inFeatures to reserve as the test dataset for validation. The default is 10 percent. The model will be trained without this random subset of data, and the observed values for those features will be compared to the predicted values.
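Several constraints stated in the parameter descriptions can be checked on the client before a job is submitted. The sketch below is a hypothetical pre-flight check, not something the service provides: find_parameter_problems is an invented name, and treating the 0 to 50 percent range as inclusive is an assumption.

```python
def find_parameter_problems(params):
    """Return a list of problems found in a request parameter dictionary.

    Hypothetical client-side check. Defaults mirror the documented ones
    (Train, 100 trees, 10 percent validation).
    """
    problems = []
    if params.get("numberOfTrees", 100) <= 0:
        problems.append("numberOfTrees must be greater than 0")
    validation = params.get("percentageForValidation", 10)
    # Assumption: both ends of the documented range are allowed.
    if not 0 <= validation <= 50:
        problems.append("percentageForValidation must be between 0 and 50")
    needs_prediction_layer = params.get("predictionType", "Train") == "TrainAndPredict"
    if needs_prediction_layer and "featuresToPredict" not in params:
        problems.append(
            "featuresToPredict is required when predictionType is TrainAndPredict"
        )
    return problems


print(find_parameter_problems({"predictionType": "TrainAndPredict", "numberOfTrees": 0}))
```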

REST Examples

//REST web example
15

//REST scripting example
"percentageForValidation": 45
createVariableOfImportanceTable

(Optional)

A Boolean that specifies whether an output table will be generated that contains information describing the importance of each explanatory variable used in the model created.

Values: true | false

REST Examples

//REST web example
false

//REST scripting example
"createVariableOfImportanceTable": false
explanatoryVariableMatching

(Optional)

A list of the explanatoryVariables specified from the inFeatures and their corresponding fields from the featuresToPredict. By default, if an explanatoryVariable is not mapped, it will match to a field with the same name in the featuresToPredict. This parameter is only used if there is a featuresToPredict input. You do not need to use it if the names and types of the fields match between your two input datasets.

predictionLayerField is the name of a field specified in the explanatoryVariables parameter and trainingLayerField is the field that will match the field in explanatoryVariables.

REST Examples

//REST web example
[{"predictionLayerField": "CrimeType", "trainingLayerField": "TypeOfCrime"},{"predictionLayerField": "population", "trainingLayerField": "population"}]

//REST scripting example
"explanatoryVariableMatching": [{"predictionLayerField": "isSunny", "trainingLayerField": "isSunny2010"}]
outputTrainedName

(Required)

The task will create a feature service of the results. You define the name of the service.

REST Examples

//REST web example
myOutput

//REST scripting example
"outputTrainedName": "myOutput"
context

(Optional)

The context parameter contains additional settings that affect task execution. For this task, there are four settings:

  • Extent (extent)—A bounding box that defines the analysis area. Only those features that intersect the bounding box will be analyzed.
  • Processing spatial reference (processSR)—The features will be projected into this coordinate system for analysis.
  • Output spatial reference (outSR)—The features will be projected into this coordinate system after the analysis to be saved. The output spatial reference for the spatiotemporal big data store is always WGS84.
  • Data store (dataStore)—Results will be saved to the specified data store. The default is the spatiotemporal big data store.

Syntax:
{
"extent" : {extent},
"processSR" : {spatial reference},
"outSR" : {spatial reference},
"dataStore":{data store}
}

f

The response format. The default response format is html.

Values: html | json

Example usage

Below is a sample request URL for ForestBasedClassificationAndRegression:

https://machine.domain.com/webadaptor/rest/services/System/GeoAnalyticsTools/GPServer/ForestBasedClassificationAndRegression/submitJob?
predictionType=TrainAndPredict&inFeatures={"url":"https://webadaptor.domain.com/server/rest/services/Hurricane/hurricaneTrack/0"}&featuresToPredict={"url":"https://webadaptor.domain.com/server/rest/services/USA/cities/0"}&variablePredict={"fieldName":"shelterCapacity","categorical":true}&explanatoryVariables=[{"fieldName":"townDensity","categorical":true}]&numberOfTrees=20&minimumLeafSize=6&maximumTreeDepth=10&sampleSize=95&randomVariables=3&percentageForValidation=10&createVariableOfImportanceTable=false&explanatoryVariableMatching=[{"predictionLayerField":"Hurricane2019","trainingLayerField":"hurricanesIn2019"},{"predictionLayerField":"ShelterLocations","trainingLayerField":"CorpusChristiShelters"}]&outputTrainedName=myOutput&context={"extent":{"xmin":-122.68,"ymin":45.53,"xmax":-122.45,"ymax":45.6,"spatialReference":{"wkid":4326}}}&f=json
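A request like this is easier to assemble in code than by hand, because the JSON-valued parameters must be serialized and URL-encoded. The following Python sketch uses only the standard library and only builds the URL and the encoded body; it does not send anything. build_submit_job_request is a hypothetical helper, and the host name is the placeholder used in the sample.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_submit_job_request(tool_url, params):
    """Return (submitJob URL, URL-encoded body) for the given parameters.

    Dictionaries and lists are serialized as compact JSON, and Booleans as
    lowercase true/false, matching the examples on this page.
    """
    encoded = {}
    for name, value in params.items():
        if isinstance(value, (dict, list)):
            encoded[name] = json.dumps(value, separators=(",", ":"))
        elif isinstance(value, bool):
            encoded[name] = "true" if value else "false"
        else:
            encoded[name] = str(value)
    return tool_url.rstrip("/") + "/submitJob", urlencode(encoded)


TOOL_URL = (
    "https://machine.domain.com/webadaptor/rest/services/System/"
    "GeoAnalyticsTools/GPServer/ForestBasedClassificationAndRegression"
)
url, body = build_submit_job_request(
    TOOL_URL,
    {
        "predictionType": "Train",
        "variablePredict": {"fieldName": "shelterCapacity", "categorical": True},
        "explanatoryVariables": [{"fieldName": "townDensity", "categorical": True}],
        "numberOfTrees": 20,
        "createVariableOfImportanceTable": False,
        "outputTrainedName": "myOutput",
        "f": "json",
    },
)
print(url)
print(parse_qs(body)["variablePredict"][0])
```

The body can be appended to the URL after a question mark, as in the sample request, or passed to an HTTP client of your choice.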

Response

When you submit a request, the service assigns a unique job ID for the transaction.

Syntax:
{
"jobId": "<unique job identifier>",
"jobStatus": "<job status>"
}

After the initial request is submitted, you can use jobId to periodically check the status of the job and messages as described in Check job status. Once the job has successfully completed, use jobId to retrieve the results. To track the status, you can make a request of the following form:

https://<analysis-url>/ForestBasedClassificationAndRegression/jobs/<jobId>
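Checking the job periodically amounts to a polling loop. The Python sketch below shows one possible shape under stated assumptions: wait_for_job is a hypothetical helper, the caller supplies fetch_status (any function that requests the job URL above and returns the parsed JSON), and the terminal status names other than esriJobSucceeded are assumed from the usual geoprocessing job states rather than taken from this page.

```python
import time

# Assumption: apart from esriJobSucceeded, these names are not listed on this page.
TERMINAL_STATUSES = {
    "esriJobSucceeded",
    "esriJobFailed",
    "esriJobCancelled",
    "esriJobTimedOut",
}


def wait_for_job(fetch_status, poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Call fetch_status() until jobStatus is terminal, then return it.

    fetch_status must return a dictionary containing a "jobStatus" key.
    Raises TimeoutError if no terminal status is seen after max_polls calls.
    """
    for _ in range(max_polls):
        status = fetch_status().get("jobStatus")
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError("job did not reach a terminal status")


# Usage with canned responses instead of a live service.
canned = iter(
    [
        {"jobId": "j1", "jobStatus": "esriJobSubmitted"},
        {"jobId": "j1", "jobStatus": "esriJobExecuting"},
        {"jobId": "j1", "jobStatus": "esriJobSucceeded"},
    ]
)
print(wait_for_job(lambda: next(canned), sleep=lambda seconds: None))
```

Passing the fetch function and the sleep function as arguments keeps the loop independent of any particular HTTP library and makes it easy to exercise without a server.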

Access results

When the status of the job request is esriJobSucceeded, you can access the results of the analysis by making a request of the following form:

https://<analysis-url>/ForestBasedClassificationAndRegression/jobs/<jobId>/results/<response type>?token=<your token>&f=json

Response | Description
outputTrained

The input features that are fit to the model. The type of feature (point, line, or polygon) depends on the input layers.

{"url": "https://<analysis-url>/ForestBasedClassificationAndRegression/jobs/<jobId>/results/outputTrained"}

The result has properties for parameter name, data type, and value. The contents of value depend on the outputTrainedName parameter provided in the initial request. The value contains the URL of the feature service layer.


{
  "paramName":"outputTrained", 
  "dataType":"GPRecordSet",
  "value":{"url":"<hosted featureservice layer url>"}
}

See Feature output for more information about how the result layer is accessed.
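Reading the layer URL out of a result such as the one above takes a few lines. This Python sketch assumes the result has already been downloaded as JSON text; result_layer_url is a hypothetical helper name, and the sample URL is a placeholder.

```python
import json


def result_layer_url(result_text):
    """Return the feature service layer URL from a job result document."""
    result = json.loads(result_text)
    value = result.get("value")
    if not isinstance(value, dict) or "url" not in value:
        raise ValueError("result does not contain a value with a url")
    return value["url"]


sample = """
{
  "paramName": "outputTrained",
  "dataType": "GPRecordSet",
  "value": {"url": "https://example.com/placeholder/FeatureServer/0"}
}
"""
print(result_layer_url(sample))
```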

outputPredicted

The features predicted using the model. The type of feature (table, point, line, or polygon) depends on the input layers. This result is optional and is only returned when featuresToPredict is provided as input.

{"url": "https://<analysis-url>/ForestBasedClassificationAndRegression/jobs/<jobId>/results/outputPredicted"}
variableOfImportance

A table listing the importance of each explanatory variable in the fitted model. This result is optional and is only returned when createVariableOfImportanceTable is true.

{"url": "https://<analysis-url>/ForestBasedClassificationAndRegression/jobs/<jobId>/results/variableOfImportance"}
processInfo

The processInfo parameter contains strings that summarize the ForestBasedClassificationAndRegression result. These strings are used for reporting tool results. You can create custom reports for your application using these strings. There are four parts in the returned JSON as follows:

  • messageCode—The serial number for each unique message
  • message—Text that may or may not contain parameters (in ${paramsName} format) that must be replaced by values
  • params—A dictionary of the keys and values to be inserted into the ${paramsName} parameter in the message
  • style—The formatting of the report produced by the Forest-based Classification And Regression tool in the map viewer.

{
  "messageCode" : "SS_84507",
  "message" : ["Attribute", "Min", "Max", "SD", "Mean","Input"],
  "params" : {},
  "style" : "<table><tr><th></th><th></th><th></th><th></th><th></th><th></th></tr>"
}
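Because the placeholders use the ${paramsName} form, Python's string.Template can fill them in directly. The sketch below is one way to do that and is not the map viewer's own rendering logic: render_process_message is a hypothetical helper, and treeCount is an invented parameter name used only to show a substitution.

```python
from string import Template


def render_process_message(entry):
    """Replace ${name} placeholders in a processInfo message with params.

    The message may be a single string or a list of strings (as in the
    example above). Placeholders with no matching key are left unchanged.
    """
    params = entry.get("params", {})
    message = entry["message"]
    if isinstance(message, list):
        return [Template(str(part)).safe_substitute(params) for part in message]
    return Template(message).safe_substitute(params)


# Hypothetical entry with one placeholder.
print(render_process_message(
    {"message": "Trained ${treeCount} trees", "params": {"treeCount": 20}}
))
# Entry shaped like the example above: a list with no placeholders.
print(render_process_message(
    {"message": ["Attribute", "Min", "Max", "SD", "Mean", "Input"], "params": {}}
))
```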