The Run Python Script task allows you to programmatically execute most GeoAnalytics Tools with Python using an API that is available when you run the task. A geoanalytics object is instantiated automatically and gives you access to each tool using the syntax shown in the example and table below. Each tool accepts input layers as Spark DataFrames and will return results as a Spark DataFrame or collection of Spark DataFrames. To learn more, see Reading and writing layers in pyspark. DataFrames are held in memory and can be written to a data store at any time. This allows you to chain together multiple GeoAnalytics Tools without writing out intermediate results.
Note:
The API described in this topic can only be used within the Run Python Script task and should not be confused with the ArcGIS API for Python, which uses a different syntax to execute stand-alone GeoAnalytics Tools and is intended for use outside of the Run Python Script task.
In the example below, the Detect Incidents task and Find Hot Spots task are used together and only the final DataFrame is written to a data store as a feature service layer. The input layer (represented in the example below by layers[0]) is a big data file share dataset of city bus locations recorded at 1-minute intervals for 15 days. To learn more about using layers, see Reading and writing layers in pyspark.
Chaining together GeoAnalytics Tools with DataFrames
import time
# Run Detect Incidents to find all bus locations where delay status has changed from False to True
exp = "var dly = $track.field[\"dly\"].history(-2); return dly[0]==\"False\" && dly[1]==\"True\""
delay_incidents = geoanalytics.detect_incidents(input_layer = layers[0], track_fields = ["vid"], start_condition_expression = exp, output_mode = "Incidents")
# Use the resulting DataFrame as input to the Find Hot Spots task
delay_hotspots = geoanalytics.find_hot_spots(point_layer = delay_incidents, bin_size = 0.1, bin_size_unit = "Miles", neighborhood_distance = 1, neighborhood_distance_unit = "Miles", time_step_interval = 1, time_step_interval_unit = "Days")
# Write the Find Hot Spots result to the spatiotemporal big data store
delay_hotspots.write.format("webgis").save("Bus_Delay_HS_{0}".format(time.time()))
For more examples, see Examples: Scripting custom analysis with the Run Python Script task.
The table below describes the method signature for GeoAnalytics Tools in Run Python Script. All tools can be called except for Copy To Data Store and Append Data. The parameter syntax is the same as that of the REST API except where noted. See the documentation for each tool for descriptions of parameter syntax and tool outputs.
Note:
In all tool methods, the time_step_repeat and time_step_repeat_unit arguments correspond to the timeStepRepeatInterval and timeStepRepeatIntervalUnit REST parameters, respectively.
Tool | Syntax | Returns | Notes |
aggregate_points(point_layer, bin_type = None, bin_size = None, bin_size_unit = None, polygon_layer = None, time_step_interval = None, time_step_interval_unit = None, time_step_repeat = None, time_step_repeat_unit = None, time_step_reference = None, summary_fields = None) | DataFrame | ||
build_multi_variable_grid(bin_type = "Square", bin_size = None, bin_size_unit = None, input_layers = None, variable_calculations = None) | DataFrame | input_layers should be a list of DataFrames. | |
calculate_density(input_layer, fields = None, weight = "Uniform", bin_type = "Square", bin_size = None, bin_size_unit = None, time_step_interval = None, time_step_interval_unit = None, time_step_repeat = None, time_step_repeat_unit = None, time_step_reference = None, radius = None, radius_unit = None, area_units = "SquareKilometers") | DataFrame | ||
calculate_field(input_layer, field_name, data_type, expression, track_aware = None, track_fields = None, time_boundary_split = None, time_boundary_split_unit = None, time_boundary_reference = None) | DataFrame | ||
calculate_motion_statistics(input_layer, track_fields, track_history_window = 3, motion_statistics = ["All"], idle_distance_tolerance = None, idle_distance_tolerance_unit = None, idle_time_tolerance = None, idle_time_tolerance_unit = None, time_boundary_split = None, time_boundary_split_unit = None, time_boundary_reference = None, distance_method = "Geodesic", distance_unit = "Meters", duration_unit = "Seconds", speed_unit = "MetersPerSecond", acceleration_unit = "MetersPerSecondSquared", elevation_unit = "Meters") | DataFrame | ||
clip_layer(input_layer, clip_layer) | DataFrame | ||
create_buffers(input_layer, distance = None, distance_unit = None, field = None, method = "Planar", dissolve_option = "None", dissolve_fields = None, summary_fields = None, multipart = False) | DataFrame | ||
create_space_time_cube(point_layer, bin_size, bin_size_unit, time_step_interval, time_step_interval_unit, time_step_alignment = None, time_step_reference = None, summary_fields = None, output_name = None) | String | Returns the local path to the resulting space-time cube on an ArcGIS GeoAnalytics Server machine. The cube is written to a temp directory and will be deleted if not copied to a different location. | |
describe_dataset(input_layer, sample_size = None, extent_output = False) | Dictionary | Example result: {"output":<DataFrame>, "outputJSON":<string>,"extentLayer":<DataFrame>,"sampleLayer":<DataFrame>} | |
detect_incidents(input_layer, track_fields, start_condition_expression, end_condition_expression = None, output_mode = "AllFeatures", time_boundary_split = None, time_boundary_split_unit = None, time_boundary_reference = None) | DataFrame | ||
dissolve_boundaries(input_layer, dissolve_fields = None, summary_fields = None, multipart = False) | DataFrame | ||
enrich_from_multi_variable_grid(input_features, grid_layer, enrich_attributes = None) | DataFrame | ||
find_dwell_locations(input_layer, track_fields, distance_method = "Planar", distance_tolerance, distance_tolerance_unit, time_tolerance, time_tolerance_unit, summary_fields = None, output_type = "DwellMeanCenters", time_boundary_split = None, time_boundary_split_unit = None, time_boundary_reference = None) | DataFrame | ||
find_hot_spots(point_layer, bin_size, bin_size_unit, neighborhood_distance, neighborhood_distance_unit, time_step_interval = None, time_step_interval_unit = None, time_step_alignment = None, time_step_reference = None) | DataFrame | ||
find_point_clusters(input_layer, cluster_method = "DBSCAN", time_method = None, search_duration = None, search_duration_unit = None, min_features_cluster = None, search_distance = None, search_distance_unit = None) | DataFrame | ||
find_similar_locations(input_layer, search_layer, analysis_fields, most_or_least_similar = "MostSimilar", match_method = "AttributeValues", number_of_results = 10, append_fields = None) | Dictionary | Example result: {"output":<DataFrame>, "processInfo":<string>} | |
forest_based_classification_and_regression(prediction_type = "Train", in_features = None, features_to_predict = None, variable_predict = None, explanatory_variables = None, number_of_trees = 100, minimum_leaf_size = None, maximum_tree_depth = None, sample_size = 100, random_variables = None, percentage_for_validation = 10, create_variable_importance_table = False, explanatory_variable_matching = None) | Dictionary | Example result: {"outputTrained":<DataFrame>, "variableOfImportance":<DataFrame>,"outputPredicted":<DataFrame>,"processInfo":<string>} | |
generalized_linear_regression(input_layer, features_to_predict = None, dependent_variable = None, explanatory_variables = None, regression_family = "Continuous", generate_coefficient_table = False, explanatory_variable_matching = None, dependent_mapping = None) | Dictionary | Example result: {"output":<DataFrame>, "coefficientTable":<DataFrame>,"outputPredicted":<DataFrame>, "processInfo":<string>} | |
geocode_locations(input_layer, geocode_service_url, geocode_parameters, source_country = None, category = None, include_attributes = None, locator_parameters = None) | DataFrame | ||
geographically_weighted_regression(input_layer, explanatory_variables, dependent_variable, model_type = "Continuous", neighborhood_type = "NumberOfNeighbors", neighborhood_selection_method = "UserDefined", distance_band = None, distance_band_unit = None, number_of_neighbors = None, local_weighting_scheme = "Bisquare") | DataFrame | ||
group_by_proximity(input_layer, spatial_relationship, spatial_near_distance = None, spatial_near_distance_unit = None, temporal_relationship = None, temporal_near_distance = None, temporal_near_distance_unit = None) | DataFrame | ||
join_features(target_layer, join_layer, join_operation = "JoinOneToOne", keep_all_target_features = False, join_fields = None, summary_fields = None, spatial_relationship = None, spatial_near_distance = None, spatial_near_distance_unit = None, temporal_relationship = None, temporal_near_distance = None, temporal_near_distance_unit = None, attribute_relationship = None, join_condition = None) | DataFrame | ||
merge_layers(input_layer, merge_layer, merging_attributes = None) | DataFrame | ||
overlay_layers(input_layer, overlay_layer, overlay_type = "Intersect", include_overlaps = True) | DataFrame | ||
reconstruct_tracks(input_layer, track_fields, method = "Planar", buffer_field = None, summary_fields = None, time_split = None, time_split_unit = None, distance_split = None, distance_split_unit = None, time_boundary_split = None, time_boundary_split_unit = None, time_boundary_reference = None) | DataFrame | ||
summarize_attributes(input_layer, fields, summary_fields = None, time_step_interval = None, time_step_interval_unit = None, time_step_repeat = None, time_step_repeat_unit = None, time_step_reference = None) | DataFrame | ||
summarize_center_and_dispersion(input_layer, summary_type, ellipse_size = None, weight_field = None, group_fields = None) | Dictionary | Example result: {"centralFeatureLayer":<DataFrame>, "meanCenterLayer":<DataFrame>, "medianCenterLayer":<DataFrame>, "ellipseLayer":<DataFrame>} | |
summarize_within(summary_polygons = None, bin_type = None, bin_size = None, bin_size_unit = None, summarized_layer = None, standard_summary_fields = None, weighted_summary_fields = None, sum_shape = True, shape_units = None, group_by_field = None, minority_majority = False, percent_shape = False) | Dictionary | Example result: {"output":<DataFrame>, "groupBySummary":<DataFrame>} | |
trace_proximity_events(input_points, entity_id_field, entities_of_interest_ids = None, entities_of_interest_layer = None, distance_method, spatial_search_distance, spatial_search_distance_unit, temporal_search_distance, temporal_search_distance_unit, include_tracks_layer = False, max_trace_depth = 2147483647, attribute_match_criteria = None) | Dictionary | Example result: {"output":<DataFrame>, "tracksLayer":<DataFrame>} |
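Several tools in the table return a dictionary rather than a single DataFrame. The following is a minimal sketch of unpacking such a result, using describe_dataset and the keys from its example result in the table. The helper name describe_and_sample is hypothetical and not part of the API. The geoanalytics object and the input layer are passed in as arguments only so the function can be read in isolation; inside the Run Python Script task you would pass the automatically provided geoanalytics object and an entry of layers.

```python
def describe_and_sample(geoanalytics, input_layer, sample_size=100):
    """Run Describe Dataset and return the sample and extent DataFrames.

    `geoanalytics` is the object provided by the Run Python Script task.
    This helper and its name are illustrative only.
    """
    result = geoanalytics.describe_dataset(
        input_layer=input_layer,
        sample_size=sample_size,
        extent_output=True,
    )
    # Keys follow the example result listed in the table above.
    # Assumption: a layer that was not requested may be missing or None,
    # so .get() is used instead of direct indexing.
    sample_df = result.get("sampleLayer")
    extent_df = result.get("extentLayer")
    return sample_df, extent_df
```

Within the task, a call might look like sample_df, extent_df = describe_and_sample(geoanalytics, layers[0]). Either DataFrame can then be passed to another tool or written to a data store.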
In addition to the tools listed above, a project tool is provided with the geoanalytics package that allows you to project the geometry of a DataFrame into the specified spatial reference.
Tool | Syntax | Returns | Notes |
Project | project(input_features, output_coord_system) | DataFrame | input_features is the DataFrame to project and output_coord_system is the WKT or WKID of the spatial reference to use. Example: geoanalytics.project(df, 2796) |
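As a hedged sketch of how project can be chained with the other tools, the helper below projects a DataFrame and then buffers the projected result using the Planar method. The helper name project_then_buffer and the 500-meter distance are illustrative, and WKID 2796 is reused from the example above; whether that spatial reference suits your data is an assumption you would need to check. The geoanalytics object is passed as an argument only for readability.

```python
def project_then_buffer(geoanalytics, df, wkid=2796, distance=500, distance_unit="Meters"):
    """Project a DataFrame, then buffer the projected features.

    `geoanalytics` is the object provided by the Run Python Script task.
    The default WKID, distance, and unit are placeholder values.
    """
    # Project the geometry into the requested spatial reference.
    projected = geoanalytics.project(df, wkid)
    # Feed the projected DataFrame directly into Create Buffers
    # without writing an intermediate result.
    return geoanalytics.create_buffers(
        input_layer=projected,
        distance=distance,
        distance_unit=distance_unit,
        method="Planar",
    )
```

Within the task, this could be called as buffers = project_then_buffer(geoanalytics, layers[0]), and the returned DataFrame written out in the same way as the earlier example.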