Find Point Clusters

Description

The FindPointClusters operation extracts clusters from your input point features and identifies any surrounding noise.

Two clustering methods can be used, DBSCAN or HDBSCAN. Both methods can find clusters in space, while DBSCAN can find spatiotemporal clusters in time-enabled point layers.

For example, a nongovernmental organization is studying a particular pest-borne disease. It has a point dataset representing households in a study area, some of which are infested, and some of which are not. By using the Find Point Clusters tool, an analyst can identify clusters of infested households to help pinpoint an area to begin treatment and extermination of pests.

To learn more, see the ArcGIS Pro documentation on How Density-based Clustering works.

Request parameters

Parameter | Details
inputLayer

(Required)

The point features from which clusters will be found.

Syntax: As described in Feature input, this parameter can be one of the following:

  • A URL to a feature service layer with an optional filter to select specific features
  • A URL to a big data catalog service layer with an optional filter to select specific features
  • A feature collection

REST Examples

//REST web example
{"url": "https://myportal.domain.com/server/rest/services/Hosted/hurricaneTrack/FeatureServer/0", "filter": "Month = 'September'"}

//REST scripting example
"inputLayer": {"url": "https://myportal.domain.com/server/rest/services/Hosted/hurricaneTrack/FeatureServer/0", "filter": "Month = 'September'"}
clusterMethod

(Required)

The algorithm used for cluster analysis. This parameter must be specified as either DBSCAN or HDBSCAN.

The DBSCAN algorithm uses a specified distance to separate dense clusters from sparser noise. DBSCAN is faster than HDBSCAN, but is only appropriate if there is a very clear searchDistance to use that works well to define all clusters that may be present. DBSCAN finds clusters that have similar densities.

The HDBSCAN algorithm finds clusters of points similar to DBSCAN but uses varying distances, allowing for clusters with varying densities based on cluster probability (or stability). HDBSCAN is very data-driven and does not require or use searchDistance, but is a more time-consuming calculation than DBSCAN.

By default, the DBSCAN algorithm finds clusters in two-dimensional space only. When timeMethod is set to Linear and inputLayer is time enabled and is of type instant, DBSCAN will discover clusters in both space and time. In that case, at least minFeaturesCluster features must be found within both the specified search distance and the specified search duration to form a cluster. Temporal clustering is available at ArcGIS Enterprise 10.8. HDBSCAN currently only supports spatial clustering and will not use time to discover clusters.

Note:

When using the HDBSCAN algorithm with an input layer containing more than 3 million features, the tool may fail unless you increase the value of the javaHeapSize parameter on the GeoAnalyticsTools GP Service. Roughly 2 GB of heap size is needed per 3 million features. The amount of RAM specified by javaHeapSize should be available on each GeoAnalytics Server machine in addition to the 16 GB normally required by GeoAnalytics Server. For example, if you want to cluster 9 million features with HDBSCAN, you should set javaHeapSize to no less than 6144 MB (or 6 GB). In this case, each GeoAnalytics Server machine should have a total of at least 22 GB of RAM available.
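The rule of thumb in this note can be written out as a small calculation. The following is a minimal sketch, not an official sizing tool: the function names are hypothetical, and rounding up to whole 3-million-feature steps is an assumption made here to stay on the safe side of "roughly 2 GB per 3 million features".

```python
import math

FEATURES_PER_STEP = 3_000_000  # from the note: about 2 GB per 3 million features
HEAP_MB_PER_STEP = 2048        # 2 GB expressed in MB
BASE_RAM_GB = 16               # RAM normally required by GeoAnalytics Server


def estimate_hdbscan_heap_mb(feature_count: int) -> int:
    """Estimate a javaHeapSize value (MB) for an HDBSCAN run.

    Assumption: partial steps are rounded up, and at least one step is used.
    """
    steps = max(1, math.ceil(feature_count / FEATURES_PER_STEP))
    return steps * HEAP_MB_PER_STEP


def estimate_total_ram_gb(feature_count: int) -> float:
    """Estimate the RAM (GB) each GeoAnalytics Server machine should have."""
    return BASE_RAM_GB + estimate_hdbscan_heap_mb(feature_count) / 1024


print(estimate_hdbscan_heap_mb(9_000_000))  # 6144, matching the example in the note
print(estimate_total_ram_gb(9_000_000))     # 22.0
```

The estimate only applies the stated ratio; it does not know the service's current javaHeapSize setting, so compare the result against the configured value before changing it.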

REST Examples

//REST web example
DBSCAN

//REST scripting example
"clusterMethod": "DBSCAN"

timeMethod

(Optional)

When this parameter is set to Linear and clusterMethod is DBSCAN, both space and time will be used to find point clusters. If clusterMethod is HDBSCAN, this parameter will be ignored and clusters will be found in space only. This parameter can only be used if inputLayer has time enabled and is of type instant. Temporal clustering is available at ArcGIS Enterprise 10.8.

REST web example: Linear

REST scripting example: "timeMethod" : "Linear"

minFeaturesCluster

(Required)

This parameter is used differently depending on the clustering method chosen. For DBSCAN, this parameter specifies the number of features that must be found within a search range of a point for that point to start forming a cluster. The results may include clusters with fewer features than this value. The search range distance is set using the searchDistance parameter. For HDBSCAN, the minFeaturesCluster parameter specifies the number of features neighboring each point (including the point itself) that will be considered when estimating density. This number is also the minimum cluster size allowed when extracting clusters.

REST Examples

//REST web example
10

//REST scripting example
"minFeaturesCluster": 5
searchDistance

(Optional)

When using DBSCAN, this parameter is the distance within which minFeaturesCluster must be found. This parameter is not used when HDBSCAN is chosen as the clustering method.

REST Examples

//REST web example
108.3

//REST scripting example
"searchDistance": 100
searchDistanceUnit

(Optional)

The units used for the searchDistance parameter. This parameter is required when using DBSCAN but will not be used with HDBSCAN.

Values: Meters | Kilometers | Feet | FeetInt | FeetUS | Miles | MilesInt | MilesUS | NauticalMiles | NauticalMilesInt | NauticalMilesUS | Yards | YardsInt | YardsUS

REST Examples

//REST web example
Meters

//REST scripting example
"searchDistanceUnit": "Miles"

searchDuration

(Optional)

When using DBSCAN with timeMethod set as Linear, this parameter is the time duration within which minFeaturesCluster must be found. This parameter is not used when HDBSCAN is chosen as the clustering method or when timeMethod is not used.

searchDurationUnit

(Optional)

The units used for the searchDuration parameter. This parameter is required when using DBSCAN with timeMethod set to Linear. It is not used with HDBSCAN or with space-only DBSCAN.
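The dependencies between clusterMethod, timeMethod, and the search parameters can be checked before a job is submitted. The sketch below is an illustration only: check_cluster_params is a hypothetical helper, not part of any Esri library, and it assumes that searchDistance (like searchDistanceUnit) must be supplied for DBSCAN and that searchDuration and searchDurationUnit must be supplied when timeMethod is Linear.

```python
def check_cluster_params(params: dict) -> list[str]:
    """Return a list of problems found in a FindPointClusters parameter dict."""
    problems = []

    method = params.get("clusterMethod")
    if method not in ("DBSCAN", "HDBSCAN"):
        return ["clusterMethod must be DBSCAN or HDBSCAN"]

    def require(name: str, reason: str) -> None:
        # Treat missing keys and empty strings as "not provided".
        if params.get(name) in (None, ""):
            problems.append(f"{name} is required {reason}")

    for name in ("inputLayer", "minFeaturesCluster", "outputName"):
        require(name, "for every request")

    if method == "DBSCAN":
        # Assumption: DBSCAN needs both the distance and its unit.
        require("searchDistance", "when using DBSCAN")
        require("searchDistanceUnit", "when using DBSCAN")
        if params.get("timeMethod") == "Linear":
            # Spatiotemporal DBSCAN also needs the duration and its unit.
            require("searchDuration", "when timeMethod is Linear")
            require("searchDurationUnit", "when timeMethod is Linear")
    # HDBSCAN ignores timeMethod and all search* parameters, so nothing more to check.

    return problems
```

An empty list means the combination is consistent with the rules described above; the service itself remains the authority on what it accepts.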

outputName

(Required)

The task will create a feature service of the results. You define the name of the service.

REST Examples

//REST web example
myOutput

//REST scripting example
"outputName": "myOutput"
context

(Optional)

The context parameter contains additional settings that affect task execution. For this task, there are four settings:

  • Extent (extent)—A bounding box that defines the analysis area. Only those features that intersect the bounding box will be analyzed.
  • Processing spatial reference (processSR)—The features will be projected into this coordinate system for analysis.
  • Output spatial reference (outSR)—The features will be projected into this coordinate system after the analysis to be saved. The output spatial reference for the spatiotemporal big data store is always WGS84.
  • Data store (dataStore)—Results will be saved to the specified data store. The default is the spatiotemporal big data store.

Syntax:
{
"extent" : {extent},
"processSR" : {spatial reference},
"outSR" : {spatial reference},
"dataStore":{data store}
}

f

The response format. The default response format is html.

Values: html | json

Example usage

Below is a sample request URL for FindPointClusters:

https://hostname.domain.com/webadaptor/rest/services/System/GeoAnalyticsTools/GPServer/FindPointClusters/submitJob?inputLayer={"url":"https://hostname.domain.com/webadaptor/rest/services/Hurricane/hurricaneTrack/0"}&clusterMethod=HDBSCAN&minFeaturesCluster=10&searchDistance=&searchDistanceUnit=&outputName=myOutput&context={"extent":{"xmin":-122.68,"ymin":45.53,"xmax":-122.45,"ymax":45.6,"spatialReference":{"wkid":4326}}}&f=json
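A request URL of this shape can also be assembled in a script so that the JSON-valued parameters are serialized and percent-encoded consistently. This is a minimal sketch using only the Python standard library: build_submit_url and its tools_url argument are hypothetical names, the host is the placeholder from the sample, and authentication (for example, a token parameter) is deliberately left out.

```python
import json
from urllib.parse import urlencode


def build_submit_url(tools_url, input_layer, cluster_method,
                     min_features_cluster, output_name, context=None, **extra):
    """Build a FindPointClusters submitJob URL.

    tools_url is the GeoAnalyticsTools GPServer URL (hypothetical argument name).
    Dict-valued parameters are serialized to JSON before URL encoding.
    """
    params = {
        "inputLayer": json.dumps(input_layer),
        "clusterMethod": cluster_method,
        "minFeaturesCluster": min_features_cluster,
        "outputName": output_name,
        "f": "json",
    }
    if context is not None:
        params["context"] = json.dumps(context)
    params.update(extra)  # e.g. searchDistance, searchDistanceUnit for DBSCAN
    return f"{tools_url}/FindPointClusters/submitJob?{urlencode(params)}"


url = build_submit_url(
    "https://hostname.domain.com/webadaptor/rest/services/System/GeoAnalyticsTools/GPServer",
    input_layer={"url": "https://hostname.domain.com/webadaptor/rest/services/Hurricane/hurricaneTrack/0"},
    cluster_method="HDBSCAN",
    min_features_cluster=10,
    output_name="myOutput",
    context={"extent": {"xmin": -122.68, "ymin": 45.53, "xmax": -122.45,
                        "ymax": 45.6, "spatialReference": {"wkid": 4326}}},
)
print(url)
```

Unused parameters such as searchDistance are simply omitted here rather than sent as empty values; whether the service treats the two forms identically is not confirmed by this page.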

Response

When you submit a request, the service assigns a unique job ID for the transaction.

Syntax:
{
"jobId": "<unique job identifier>",
"jobStatus": "<job status>"
}

After the initial request is submitted, you can use jobId to periodically check the status of the job and messages as described in Check job status. Once the job has successfully completed, use jobId to retrieve the results. To track the status, you can make a request of the following form:

https://<analysis-url>/FindPointClusters/jobs/<jobId>
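Checking the status periodically can be sketched as a simple polling loop. The code below is an assumption-laden illustration rather than a documented client: wait_for_job and fetch_json are hypothetical helpers, the set of terminal statuses reflects the usual geoprocessing job states (only esriJobSucceeded is named on this page), and token handling is omitted.

```python
import json
import time
from urllib.request import urlopen

# Statuses after which the job will not change again (assumed list).
TERMINAL_STATUSES = {
    "esriJobSucceeded",
    "esriJobFailed",
    "esriJobCancelled",
    "esriJobTimedOut",
}


def fetch_json(url):
    """Fetch a URL and decode the JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def wait_for_job(job_url, fetch=fetch_json, interval=5.0, max_checks=120):
    """Poll job_url until the job reaches a terminal status and return that status.

    job_url has the form https://<analysis-url>/FindPointClusters/jobs/<jobId>.
    fetch is injectable so the loop can be exercised without a server.
    """
    for _ in range(max_checks):
        status = fetch(f"{job_url}?f=json").get("jobStatus")
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(interval)
    raise TimeoutError(f"job did not finish after {max_checks} checks")
```

Passing a stub for fetch makes it possible to test the loop offline; in real use the default fetch_json would need to include whatever authentication the portal requires.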

Access results

When the status of the job request is esriJobSucceeded, you can access the results of the analysis by making a request of the following form:

https://<analysis-url>/FindPointClusters/jobs/<jobId>/results/output?token=<your token>&f=json

Response | Description
output

The output parameter will contain the cluster results. Fields added to output include all the fields from the inputLayer and the following:

  • CLUSTER_ID—A numeric value showing you which cluster a feature falls into. A feature with a CLUSTER_ID of -1 does not fall into a cluster and is noise.
  • COLOR_ID—An ID value used for rendering results. Multiple clusters will each be assigned a different color. Colors will be assigned and repeated so that each cluster is visually distinct from its neighboring clusters.

When the HDBSCAN algorithm is used to find clusters, the following fields will also be added to output:

  • PROB—The probability that a feature belongs in its assigned cluster.
  • OUTLIER—The likelihood that a feature is an outlier within its own cluster. A larger value indicates that the feature is more likely to be an outlier.
  • EXEMPLAR—Indicates which features are most representative of each cluster. These features are indicated by a value of 1.
  • STABILITY—The persistence of each cluster across a range of scales. A larger score indicates that a cluster persists over a wider range of distance scales.
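Once features have been queried from the result layer, the CLUSTER_ID field described above is enough to separate noise from clustered features. The following sketch assumes each feature is a dictionary with an "attributes" mapping, as in typical feature service JSON; group_by_cluster is a hypothetical helper and the sample records are made up for illustration.

```python
from collections import defaultdict

NOISE_ID = -1  # CLUSTER_ID value that marks a feature as noise


def group_by_cluster(features):
    """Split features into {cluster_id: [features]} and a list of noise features."""
    clusters = defaultdict(list)
    noise = []
    for feature in features:
        cluster_id = feature["attributes"]["CLUSTER_ID"]
        if cluster_id == NOISE_ID:
            noise.append(feature)
        else:
            clusters[cluster_id].append(feature)
    return dict(clusters), noise


# Made-up sample records, not real output from the service.
sample = [
    {"attributes": {"OBJECTID": 1, "CLUSTER_ID": 1, "COLOR_ID": 1}},
    {"attributes": {"OBJECTID": 2, "CLUSTER_ID": 1, "COLOR_ID": 1}},
    {"attributes": {"OBJECTID": 3, "CLUSTER_ID": -1, "COLOR_ID": -1}},
    {"attributes": {"OBJECTID": 4, "CLUSTER_ID": 2, "COLOR_ID": 2}},
]

clusters, noise = group_by_cluster(sample)
print({cid: len(members) for cid, members in clusters.items()})  # {1: 2, 2: 1}
print(len(noise))                                                # 1
```

For HDBSCAN results, the same loop could additionally read PROB, OUTLIER, EXEMPLAR, or STABILITY from the attributes to rank or filter cluster members.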

{"url": "https://<analysis-url>/FindPointClusters/jobs/<jobId>/results/output"}

The result has properties for parameter name, data type, and value. The contents of value depend on the outputName parameter provided in the initial request. The value contains the URL of the feature service layer.

{
"paramName":"output", 
"dataType":"GPRecordSet",
"value":{"url":"<hosted featureservice layer url>"}
}

See Feature output for more information about how the result layer is accessed.