
Describe Dataset

Description

Describe Dataset workflow diagram

The DescribeDataset operation provides an overview of your big data. By default, the tool outputs a table layer containing calculated field statistics and a JSON string outlining geometry and time settings for the input layer.

Optionally, the tool can also output a feature layer containing a sample of your input features, a feature layer containing a single polygon that represents the extent of your input features, or both. You can choose to output either layer, both, or neither.

For example, imagine you are tasked with completing an analysis workflow on a large volume of data. You want to try the workflow, but running it on your full dataset could take hours or days. Instead of spending time and resources on the full analysis, first create a sample layer to test your workflow efficiently, then run it on the full dataset.

Request parameters

Parameter | Details
inputLayer

(Required)

The table, point, line, or polygon feature layer that will be described, summarized, and sampled.

Syntax: As described in Feature input, this parameter can be one of the following:

  • A URL to a feature service layer with an optional filter to select specific features
  • A URL to a big data catalog service layer with an optional filter to select specific features
  • A feature collection

REST examples

//REST web example
{"url": "https://myportal.domain.com/server/rest/services/Hosted/hurricaneTrack/FeatureServer/0", "filter": "Month = 'September'"}

//REST scripting example
"inputLayer": {"url": "https://myportal.domain.com/server/rest/services/Hosted/hurricaneTrack/FeatureServer/0", "filter": "Month = 'September'"}
sampleSize

(Optional)

The number of sample features to return. When this value is 1 or greater, the task outputs a feature layer containing a sample of features from the inputLayer. If the value is null, 0, or empty, no sample layer is created. The sample layer has the same schema, geometry type, and time type as the input layer. The default is null.

REST examples

//REST web example
450

//REST scripting example
"sampleSize": 450
extentOutput

(Optional)

The task will output a single rectangle feature representing the extent of the inputLayer if this value is set to true. The default is false.

Values: true | false

REST examples

//REST web example
true

//REST scripting example
"extentOutput": true
outputName

(Required)

The name of the feature service that the task creates to hold the resulting layers. This value is required when you choose to output an extent feature layer or a sample feature layer.

REST examples

//REST web example
myOutput

//REST scripting example
"outputName": "myOutput"
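The rules for sampleSize, extentOutput, and outputName can be checked on the client before a job is submitted. The sketch below is a hypothetical helper (validate_describe_params is not part of any ArcGIS library). It encodes only what this page states: null, 0, or empty means no sample layer, a sample layer needs a value of 1 or greater, and outputName is needed whenever a sample or extent layer is requested. The server still performs its own validation.

```python
def validate_describe_params(sample_size=None, extent_output=False, output_name=None):
    """Return a list of problems in a DescribeDataset parameter set (empty list = OK).

    Hypothetical client-side helper; it does not replace server-side validation.
    """
    problems = []
    # An empty value behaves like null: no sample layer is created.
    if sample_size == "":
        sample_size = None
    if sample_size is not None and (
        isinstance(sample_size, bool) or not isinstance(sample_size, int) or sample_size < 0
    ):
        problems.append("sampleSize must be a non-negative integer, null, or empty")
        sample_size = None
    if not isinstance(extent_output, bool):
        problems.append("extentOutput must be true or false")
    # A sample layer is only output when sampleSize is 1 or greater.
    wants_sample = sample_size is not None and sample_size >= 1
    wants_extent = extent_output is True
    if (wants_sample or wants_extent) and not output_name:
        problems.append("outputName is required when a sample or extent layer is requested")
    return problems
```

For example, `validate_describe_params(450, True, "myOutput")` returns an empty list. `validate_describe_params(450)` reports the missing outputName.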
context

(Optional)

The context parameter contains additional settings that affect task execution. For this task, there are four settings:

  • Extent (extent)—A bounding box that defines the analysis area. Only those features that intersect the bounding box will be analyzed.
  • Processing spatial reference (processSR)—The features will be projected into this coordinate system for analysis.
  • Output spatial reference (outSR)—After the analysis, the features will be projected into this coordinate system before they are saved. The output spatial reference for the spatiotemporal big data store is always WGS84.
  • Data store (dataStore)—Results will be saved to the specified data store. The default is the spatiotemporal big data store.

Syntax:
{
  "extent": {extent},
  "processSR": {spatial reference},
  "outSR": {spatial reference},
  "dataStore": {data store}
}
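The context value is a JSON object in which every setting is optional. The sketch below assembles it from plain Python values and leaves out any setting that is not provided. The function name build_context is hypothetical. The extent layout (xmin, ymin, xmax, ymax plus spatialReference) mirrors the sample request later on this page. The dataStore value is passed through as given; the accepted strings are not listed here and are an assumption to check against your server.

```python
import json


def build_context(extent=None, extent_wkid=4326, process_wkid=None,
                  out_wkid=None, data_store=None):
    """Assemble the DescribeDataset context parameter as a compact JSON string.

    Hypothetical helper: settings left as None are omitted from the result.
    """
    context = {}
    if extent is not None:
        xmin, ymin, xmax, ymax = extent
        if xmin > xmax or ymin > ymax:
            raise ValueError("extent must satisfy xmin <= xmax and ymin <= ymax")
        context["extent"] = {
            "xmin": xmin, "ymin": ymin, "xmax": xmax, "ymax": ymax,
            "spatialReference": {"wkid": extent_wkid},
        }
    if process_wkid is not None:
        context["processSR"] = {"wkid": process_wkid}
    if out_wkid is not None:
        context["outSR"] = {"wkid": out_wkid}
    if data_store is not None:
        # Assumed to be a store identifier string; verify valid values for your deployment.
        context["dataStore"] = data_store
    return json.dumps(context, separators=(",", ":"))
```

Calling `build_context(extent=(-122.68, 45.53, -122.45, 45.6))` reproduces the context value used in the sample request URL.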
f

The response format. The default response format is html.

Values: html | json

Example usage

Below is a sample request URL for DescribeDataset:

https://hostname.domain.com/webadaptor/rest/services/System/GeoAnalyticsTools/GPServer/DescribeDataset/submitJob?inputLayer={"url":"https://hostname.domain.com/webadaptor/rest/services/Hurricane/hurricaneTrack/0"}&sampleSize=450&extentOutput=true&outputName=myOutput&context={"extent":{"xmin":-122.68,"ymin":45.53,"xmax":-122.45,"ymax":45.6,"spatialReference":{"wkid":4326}}}&f=json
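The sample URL shows the JSON-valued parameters unencoded so that they are easy to read. A scripted client normally serializes inputLayer and context with a JSON encoder and percent-encodes the query string. The sketch below builds such a submitJob URL with the standard library. The function names are hypothetical, and service_url stands for the GeoAnalyticsTools GPServer URL of your own deployment.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def build_submit_url(service_url, input_layer, sample_size=None,
                     extent_output=False, output_name=None, context=None):
    """Build a percent-encoded DescribeDataset submitJob URL (hypothetical helper)."""
    params = {"inputLayer": json.dumps(input_layer)}
    if sample_size is not None:
        params["sampleSize"] = str(sample_size)
    params["extentOutput"] = "true" if extent_output else "false"
    if output_name:
        params["outputName"] = output_name
    if context is not None:
        params["context"] = json.dumps(context)
    params["f"] = "json"
    return f"{service_url}/DescribeDataset/submitJob?{urlencode(params)}"


def decode_submit_params(url):
    """Decode a built URL back into its parameters, which is useful when debugging."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}
```

Very long parameter values may exceed URL length limits. In that case, send the same parameters as a form-encoded POST body instead.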

Response

When you submit a request, the service assigns a unique job ID for the transaction.

Syntax:
{
"jobId": "<unique job identifier>",
"jobStatus": "<job status>"
}

After the initial request is submitted, you can use jobId to periodically check the status of the job and messages as described in Check job status. Once the job has successfully completed, use jobId to retrieve the results. To track the status, you can make a request of the following form:

https://<analysis-url>/DescribeDataset/jobs/<jobId>
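A minimal polling loop repeatedly requests the job URL until jobStatus reaches a terminal state. The sketch below assumes the standard ArcGIS geoprocessing job states (esriJobSucceeded, esriJobFailed, esriJobCancelled, esriJobTimedOut). It also assumes the status endpoint accepts f=json and an optional token, as the results endpoint does. The function name wait_for_job and its fetch argument are hypothetical; fetch exists so that the HTTP call can be swapped out in tests.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed terminal job states for ArcGIS geoprocessing jobs.
TERMINAL_STATES = {"esriJobSucceeded", "esriJobFailed", "esriJobCancelled", "esriJobTimedOut"}


def wait_for_job(analysis_url, job_id, token=None, interval=5.0, timeout=3600.0, fetch=None):
    """Poll the DescribeDataset job URL until it reaches a terminal state; return the status JSON."""
    if fetch is None:
        def fetch(url):
            with urlopen(url) as resp:
                return json.load(resp)
    query = {"f": "json"}
    if token:
        query["token"] = token
    url = f"{analysis_url}/DescribeDataset/jobs/{job_id}?{urlencode(query)}"
    deadline = time.monotonic() + timeout
    while True:
        status = fetch(url)
        if status.get("jobStatus") in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout} s")
        time.sleep(interval)
```

Check the returned jobStatus before requesting results. Only esriJobSucceeded means the results are available.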

Access results

When the status of the job request is esriJobSucceeded, you can access the results of the analysis by making a request of the following form:

https://<analysis-url>/DescribeDataset/jobs/<jobId>/results/<result type>?token=<your token>&f=json

The DescribeDataset operation has the following result types: outputJSON, output, extentLayer, sampleLayer, and processInfo.
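Which result types are worth requesting depends on the submitted parameters. This page states that sampleLayer exists only when sampleSize is 1 or greater, and extentLayer only when extentOutput is true. The sketch below also assumes that outputJSON, output, and processInfo are always returned. Both function names are hypothetical.

```python
from urllib.parse import urlencode


def expected_result_types(sample_size=None, extent_output=False):
    """List the result types expected for a job (assumes outputJSON, output, processInfo always exist)."""
    types = ["outputJSON", "output"]
    if extent_output:
        types.append("extentLayer")
    # A sample layer is only output when sampleSize is 1 or greater.
    if isinstance(sample_size, int) and not isinstance(sample_size, bool) and sample_size >= 1:
        types.append("sampleLayer")
    types.append("processInfo")
    return types


def result_urls(analysis_url, job_id, token, sample_size=None, extent_output=False):
    """Map each expected result type to its results URL."""
    query = urlencode({"token": token, "f": "json"})
    return {
        result_type: f"{analysis_url}/DescribeDataset/jobs/{job_id}/results/{result_type}?{query}"
        for result_type in expected_result_types(sample_size, extent_output)
    }
```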

Response | Description
outputJSON

outputJSON returns a JSON object that details the properties of the input layer.

The following characteristics will be defined in the output JSON:

  • datasetName—The name of the inputLayer. In the following example, the input layer name is my_bigdata_dataset.
  • datasetSource—The storage location for the input dataset. This could be one of the following: ArcGIS Data Store - Relational, ArcGIS Data Store - Spatiotemporal, Big Data File Share - <your big data file share name>, Feature Collection, or Remote Feature Service. In the following example, the dataset source is a big data file share named my_registered_file_share.
  • recordCount—The number of records in the input layer. The output below shows that the input layer has 236 records.
  • geometry—A list of input layer geometry settings including geometry type, spatial reference, spatial extent, and record counts. In the following example, the input layer has point geometry and a spatial reference with a WKID of 4326, and six records do not have a geometry. A record may be counted as having no geometry if the geometry value is outside of the valid extent of the input layer.
  • time—A list of input layer time settings including time type, record counts, and temporal extent. In the following example, the input features have time of type interval and four of the time values are empty.

{"url": "https://<analysis-url>/DescribeDataset/jobs/<jobId>/results/outputJSON"}

The result has properties for parameter name, data type, and value. The value property is a JSON object that defines general inputLayer characteristics.


{
    "paramName": "outputJSON",
    "dataType": "GPString",
    "value": {
        "datasetName": "my_bigdata_dataset",
        "datasetSource": "Big Data File Share - my_registered_file_share",
        "recordCount": 236,
        "geometry": {
            "geometryType": "Point",
            "sref": {"wkid": 4326},
            "countNonEmpty": 230,
            "countEmpty": 6,
            "spatialExtent": {
                "xmin": 895229.0608758491,
                "ymin": 557949.5851721496,
                "xmax": 915995.4702218114,
                "ymax": 597425.2187718959
            }
        },
        "time": {
            "timeType": "Interval",
            "countNonEmpty": 232,
            "countEmpty": 4,
            "temporalExtent": {
                "startTime": 1420059600000,
                "endTime": 1420070280000
            }
        }
    }
}
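In the example above, the nonempty and empty counts for geometry (230 + 6) and for time (232 + 4) both add up to recordCount (236). The temporal extent is given as epoch milliseconds. The sketch below summarizes an outputJSON result along those lines. It assumes that startTime and endTime are UTC epoch milliseconds, as is usual for ArcGIS JSON. Because the data type is GPString, it also accepts a value delivered as serialized text. The function name summarize_output_json is hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def _iso_from_epoch_ms(ms):
    """Convert epoch milliseconds (assumed UTC) to an ISO 8601 string without float rounding."""
    return (_EPOCH + timedelta(milliseconds=ms)).isoformat()


def summarize_output_json(result):
    """Summarize a DescribeDataset outputJSON result (hypothetical helper)."""
    value = result["value"]
    # dataType is GPString, so tolerate a value that arrives as serialized JSON text.
    if isinstance(value, str):
        value = json.loads(value)
    total = value["recordCount"]
    summary = {"datasetName": value.get("datasetName"), "recordCount": total}
    for section in ("geometry", "time"):
        info = value.get(section)
        if not info:
            continue
        # True when the nonempty and empty counts add up to recordCount.
        summary[section + "Consistent"] = info["countNonEmpty"] + info["countEmpty"] == total
    extent = (value.get("time") or {}).get("temporalExtent")
    if extent:
        summary["timeStart"] = _iso_from_epoch_ms(extent["startTime"])
        summary["timeEnd"] = _iso_from_epoch_ms(extent["endTime"])
    return summary
```

Applied to the example above, both consistency checks are true. The temporal extent comes out as 2014-12-31T21:00:00+00:00 to 2014-12-31T23:58:00+00:00.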

See Feature output for more information about how the result layer is accessed.

output

By default, output will return a table of field statistics.

For numeric fields, the following statistics will be calculated:

  • Count—Counts the values in the field across all features.
  • Sum—Calculates the total of all values in the field.
  • Mean—Calculates the average of all values in the field.
  • Min—Finds the smallest value in the field.
  • Max—Finds the largest value in the field.
  • Range—Finds the difference between the Max and Min values.
  • Stddev—Finds the standard deviation of the values in the field.
  • Var—Finds the variance of the values in the field.

For date fields, the following statistics will be calculated:

  • Count—Counts the date values in the field across all features.
  • Min—Finds the earliest date value in the field.
  • Max—Finds the latest date value in the field.
  • Range—Finds the time span between the Min and Max date values.

For string fields, the following statistics will be calculated:

  • Count—Counts the string values in the field across all features.
  • Any—Returns a sample string value from the field.
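To make the statistic definitions above concrete, the sketch below computes the same quantities for one field held in memory. It is an illustration, not the tool's implementation. Two choices in it are assumptions not stated on this page: null values are skipped, and variance uses the sample (n - 1) formula. The server may use a different convention. Date values are taken as epoch milliseconds and converted to UTC datetimes. The function names are hypothetical.

```python
import math
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def numeric_field_stats(values):
    """Count, Sum, Mean, Min, Max, Range, Var, Stddev for one numeric field (nulls skipped)."""
    vals = [v for v in values if v is not None]
    n = len(vals)
    if n == 0:
        return {"count": 0}
    total = sum(vals)
    mean = total / n
    lo, hi = min(vals), max(vals)
    # Assumption: sample variance (n - 1); a single value is given variance 0.0.
    var = sum((v - mean) ** 2 for v in vals) / (n - 1) if n > 1 else 0.0
    return {"count": n, "sum": total, "mean": mean, "min": lo, "max": hi,
            "range": hi - lo, "var": var, "stddev": math.sqrt(var)}


def date_field_stats(values_ms):
    """Count, Min, Max, Range for a date field given as epoch milliseconds (nulls skipped)."""
    dates = [_EPOCH + timedelta(milliseconds=ms) for ms in values_ms if ms is not None]
    if not dates:
        return {"count": 0}
    lo, hi = min(dates), max(dates)
    return {"count": len(dates), "min": lo, "max": hi, "range": hi - lo}
```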

{"url": "https://<analysis-url>/DescribeDataset/jobs/<jobId>/results/output"}

The result has properties for parameter name, data type, and value. The contents of value depend on the outputName parameter provided in the initial request. The value contains the URL of the feature service layer.


{
  "paramName":"output", 
  "dataType":"GPRecordSet",
  "value":{"url":"<hosted feature service layer url>"}
}
extentLayer

Setting extentOutput to true returns a single polygon feature equal to the extent of the input features. Context settings will be used while creating this layer.

{"url": "https://<analysis-url>/DescribeDataset/jobs/<jobId>/results/extentLayer"}

The result has properties for parameter name, data type, and value. The contents of value depend on the outputName parameter provided in the initial request. The value contains the URL of the feature service layer.


{
  "paramName":"extentLayer", 
  "dataType":"GPRecordSet",
  "value":{"url":"<hosted feature service layer url>"}
}

See Feature output for more information about how the result layer is accessed.

sampleLayer

sampleLayer returns a subset of the input layer as a feature layer with the same geometry type, time type, and schema as the input. Context settings will be used while creating this output layer. This layer is only output if the sampleSize value is set to 1 or greater.

{"url": "https://<analysis-url>/DescribeDataset/jobs/<jobId>/results/sampleLayer"}

The result has properties for parameter name, data type, and value. The contents of value depend on the outputName parameter provided in the initial request. The value contains the URL of the feature service layer.


{
  "paramName":"sampleLayer", 
  "dataType":"GPRecordSet",
  "value":{"url":"<hosted feature service layer url>"}
}

See Feature output for more information about how the result layer is accessed.

processInfo

The processInfo output contains strings that summarize the Describe Dataset result. These strings are used for reporting by the Describe Dataset tool in the portal's Map Viewer Classic. You can create your own custom reports for your application using these strings. There are four parts in the returned JSON:

  • messageCode—The serial number for each unique message.
  • message—Text that may or may not contain parameters (in ${paramsName} format) that need to be replaced by values.
  • params—A dictionary of keys and values. Each value replaces the matching ${paramsName} placeholder in the message.
  • style—The formatting of the report produced by the Describe Dataset tool in Map Viewer Classic.

{
  "messageCode" : "BD_101220",
  "message" : ["Dataset name","MY_DATASET_NAME"],
  "params" : {},
  "style" : "<table><tr><th></th><th></th><th></th><th></th><th></th><th></th></tr>"
}
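The ${paramsName} placeholder format matches the syntax of Python's string.Template, so a custom report can fill in message text with Template.safe_substitute. The sketch below assumes that params maps placeholder names to values. It also handles a message delivered as a list of strings, as in the example above. Placeholders with no matching key are left unchanged. The function name render_process_info is hypothetical.

```python
from string import Template


def render_process_info(entry):
    """Fill ${name} placeholders in a processInfo message with values from params."""
    message = entry.get("message", "")
    params = entry.get("params") or {}
    if isinstance(message, list):
        # The example shows message as a list; substitute into each part.
        return [Template(str(part)).safe_substitute(params) for part in message]
    return Template(message).safe_substitute(params)
```

For the entry above, with empty params, the message parts are returned unchanged as ["Dataset name", "MY_DATASET_NAME"].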