Note:
At 10.9.1 or later, register a big data file share through your portal Content page. This is the recommended way to register big data file shares. Only use Server Manager for editing if your big data file share was created in Server Manager and you haven't replaced it with a big data file share in the portal.
A big data file share is an item created in your portal that references a location available to your ArcGIS GeoAnalytics Server. You can use the big data file share location as input to and output for GeoAnalytics Tools feature data (points, polylines, polygons, and tabular data). When you create a big data file share through your portal Content page, at least two items are created in your portal:
- A data store (big data file share) item
- A big data file share item
- A data store (cloud storage location) item, if you're registering a cloud storage data store for a big data file share
Note:
A big data file share is only available if the portal administrator has enabled GeoAnalytics Server. To learn more about enabling GeoAnalytics Server, see Set up ArcGIS GeoAnalytics Server.
Big data file shares
There are several benefits to using a big data file share:
- You can keep your data in an accessible location until you are ready to perform analysis. A big data file share accesses the data when the analysis is run, so you can continue to add data to an existing dataset in your big data file share without having to reregister or publish your data.
- You can also modify the manifest to remove, add, or update datasets in the big data file share.
- Big data file shares are extremely flexible in how time and geometry can be defined and allow for multiple time formats on a single dataset.
- Big data file shares also allow you to partition your datasets while still treating multiple partitions as a single dataset.
- Using big data file shares for output data allows you to store your results in formats that you may use for other workflows, such as a parquet file for further analysis or storage.
Note:
Big data file shares are only accessed when you run GeoAnalytics Tools. This means that you can only browse and add big data files to your analysis; you cannot visualize the data on a map.
Big data file shares can reference the following input data sources:
- File share—A directory of datasets on a local disk or network share.
- Apache Hadoop Distributed File System (HDFS)—An HDFS directory of datasets.
- Apache Hive—Hive metastore databases.
- Cloud storage—An Amazon Simple Storage Service (S3) bucket, Microsoft Azure Blob container, or Microsoft Azure Data Lake Storage Gen2 store containing a directory of datasets.
When writing results to a big data file share, you can use the following outputs for GeoAnalytics Tools:
- File share
- HDFS
- Cloud storage location
The following file types are supported as datasets for input and output in big data file shares:
- Delimited files (such as .csv, .tsv, and .txt)
- Shapefiles (.shp)
- Parquet files (.parquet)
Note:
Only unencrypted parquet files are supported.
- ORC files (.orc)
Big data file shares are one of several ways GeoAnalytics Tools can access your data and are not a requirement for GeoAnalytics Tools. See Use the GeoAnalytics Tools in Map Viewer Classic for a list of possible GeoAnalytics Tools data inputs and outputs.
You can register as many big data file shares as you need. Each big data file share can have as many datasets as you want. See Add a big data file share for instructions to register a big data file share with the GeoAnalytics Server site.
The table below outlines important terms related to big data file shares.
Term | Description |
---|---|
Big data file share | A location registered with your GeoAnalytics Server to be used as dataset input, output, or both input and output to GeoAnalytics Tools. |
Big data catalog service | A service that outlines the input datasets and schemas and output template names of your big data file share. This is created when your big data file share is registered, and your manifest is created. To learn more about big data catalog services, see the Big Data Catalog Service documentation in the ArcGIS Services REST API help. |
Big data file share item | An item in your portal that references the big data catalog service. You can control who can use your big data file share as input to GeoAnalytics by sharing this item in portal. |
Manifest | A JSON file that outlines the datasets available and the input schema in your big data file share. The manifest is automatically generated when you register a big data file share, and it can be modified by editing it directly or by applying a hints file. A single big data file share has one manifest. |
Output templates | One or more templates that outline the file type and optional formatting used when writing results to a big data file share. For example, a template could specify that results are written to a shapefile. A big data file share can have zero, one, or more output templates. |
Big data file share type | The type of location you are registering. For example, you could have a big data file share of type HDFS. |
Big data file share dataset format | The format of the data you are reading or writing. For example, the file type may be shapefile. |
Hints file | An optional file that you can use to assist in generating a manifest for delimited files used as an input. |
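To make the manifest term above concrete, the following sketch builds an abridged manifest for a single delimited dataset. The dataset, field names, and time format here are illustrative assumptions, not taken from a real big data file share; the authoritative schema is described in the Big data file share manifest topic in the ArcGIS Server help.

```python
import json

# Abridged, illustrative manifest for one delimited dataset.
# Field names, formats, and the time pattern are hypothetical examples.
manifest = {
    "datasets": [
        {
            "name": "Earthquakes",
            "format": {
                "type": "delimited",
                "extension": "csv",
                "fieldDelimiter": ",",
                "hasHeaderRow": True,
            },
            "schema": {
                "fields": [
                    {"name": "longitude", "type": "esriFieldTypeDouble"},
                    {"name": "latitude", "type": "esriFieldTypeDouble"},
                    {"name": "event_date", "type": "esriFieldTypeString"},
                ]
            },
            # Geometry derived from two coordinate columns.
            "geometry": {
                "geometryType": "esriGeometryPoint",
                "spatialReference": {"wkid": 4326},
                "fields": [
                    {"name": "longitude", "formats": ["x"]},
                    {"name": "latitude", "formats": ["y"]},
                ],
            },
            # Time derived from a single date column.
            "time": {
                "timeType": "instant",
                "timeReference": {"timeZone": "UTC"},
                "fields": [{"name": "event_date", "formats": ["MM/dd/yyyy"]}],
            },
        }
    ]
}

print(json.dumps(manifest, indent=2))
```

Editing the manifest (directly or through a hints file) amounts to adjusting entries like the `geometry` and `time` blocks above when automatic generation guesses the wrong fields or formats.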
Prepare your data to be registered as a big data file share
To use your datasets as inputs in a big data file share, ensure that your data is correctly formatted. See below for the formatting based on the big data file share type.
File shares and HDFS
To prepare your data for a big data file share, format your datasets as subfolders under a single parent folder that you will register. The names of these subfolders represent the dataset names. If a subfolder contains multiple folders or files, all of the contents of that top-level subfolder are read as a single dataset and must share the same schema. When you register a parent folder, all subdirectories under the folder you specify are also registered with the GeoAnalytics Server. Always register the parent folder (for example, \\machinename\FileShareFolder) that contains one or more individual dataset folders. The following is an example of how to register the folder FileShareFolder, which contains three datasets named Earthquakes, Hurricanes, and GlobalOceans.
Example of a big data file share that contains three datasets: Earthquakes, Hurricanes, and GlobalOceans.
|---FileShareFolder < -- The top-level folder is what is registered as a big data file share
|---Earthquakes < -- A dataset "Earthquakes", composed of 4 csvs with the same schema
|---1960
|---01_1960.csv
|---02_1960.csv
|---1961
|---01_1961.csv
|---02_1961.csv
|---Hurricanes < -- The dataset "Hurricanes", composed of 3 shapefiles with the same schema
|---atlantic_hur.shp
|---pacific_hur.shp
|---otherhurricanes.shp
|---GlobalOceans < -- The dataset "GlobalOceans", composed of a single shapefile
|---oceans.shp
This same structure is applied to file shares and HDFS, although the terminology differs. In a file share, there is a top-level folder or directory, and datasets are represented by the subdirectories. In HDFS, the file share location is registered and contains datasets. The following table outlines the differences:
File share | HDFS | |
---|---|---|
Big data file share location | A folder or directory | An HDFS path |
Datasets | Top-level subfolders | Datasets within the HDFS path |
Once your data is organized as a folder with dataset subfolders, make your data accessible to your GeoAnalytics Server by following the steps in Make your data accessible to ArcGIS Server and registering the dataset folder or HDFS path through portal.
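Before registering, you can sanity-check a folder layout like the one above with a short script. This is only a sketch that mirrors the rule described here (each top-level subfolder of the registered parent folder is one dataset); it is not an Esri API, and the folder names are the example ones from this topic.

```python
import os
import tempfile

def list_datasets(parent_folder):
    """Mirror how a big data file share derives dataset names:
    each top-level subfolder of the registered parent folder is one
    dataset, and everything beneath that subfolder belongs to it."""
    return sorted(
        entry for entry in os.listdir(parent_folder)
        if os.path.isdir(os.path.join(parent_folder, entry))
    )

# Recreate the example layout from above in a temporary directory.
root = tempfile.mkdtemp()
for path in [
    "FileShareFolder/Earthquakes/1960",
    "FileShareFolder/Earthquakes/1961",
    "FileShareFolder/Hurricanes",
    "FileShareFolder/GlobalOceans",
]:
    os.makedirs(os.path.join(root, path))

datasets = list_datasets(os.path.join(root, "FileShareFolder"))
print(datasets)  # ['Earthquakes', 'GlobalOceans', 'Hurricanes']
```

Note that the nested 1960 and 1961 folders do not appear as datasets: their contents are read as part of the single Earthquakes dataset, which is why every file under a dataset subfolder must share the same schema.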
Hive
In Hive, all tables in a database are recognized as datasets in a big data file share. In the following example, there is a metastore with two databases, default and CityData. When registering a Hive big data file share, only one database can be selected. In this example, if the CityData database were selected, there would be two datasets in the big data file share: FireData and LandParcels.
|---HiveMetastore < -- The top-level folder is what is registered as a big data file share
|---default < -- A database
|---Earthquakes
|---Hurricanes
|---GlobalOceans
|---CityData < -- A database that is registered (specified in Server Manager)
|---FireData
|---LandParcels
Cloud storage data stores
To prepare your data for a big data file share in a cloud storage location, format your datasets as subfolders under a single parent folder.
The following is an example of how to structure your data. This example registers the parent folder, FileShareFolder, which contains three datasets: Earthquakes, Hurricanes, and GlobalOceans. When you register a parent folder, all subdirectories under the folder you specify are also registered with GeoAnalytics Server.
Example of how to structure data in a cloud storage location that will be used as a big data file share. This big data file share contains three datasets: Earthquakes, Hurricanes, and GlobalOceans.
|---Cloud Store < -- The cloud storage location being registered
|---Container or S3 Bucket Name < -- The container (Azure) or bucket (Amazon) being registered as part of the cloud storage data store
|---FileShareFolder < -- The parent folder that is registered as the 'folder' during cloud storage registration
|---Earthquakes < -- The dataset "Earthquakes", composed of 4 csvs with the same schema
|---1960
|---01_1960.csv
|---02_1960.csv
|---1961
|---01_1961.csv
|---02_1961.csv
|---Hurricanes < -- The dataset "Hurricanes", composed of 3 shapefiles with the same schema
|---atlantic_hur.shp
|---pacific_hur.shp
|---otherhurricanes.shp
|---GlobalOceans < -- The dataset "GlobalOceans", composed of 1 shapefile
|---oceans.shp
Manage big data file shares in a portal
Once you have created a big data file share, you can review the datasets in it and the templates that outline how results saved to big data file shares will be written.
Modify a big data file share
When a big data file share item is created, a manifest for the input data is automatically generated and uploaded. The process of generating a manifest may not always correctly estimate the fields representing geometry and time, and you may need to apply edits. To edit a manifest and how datasets are represented, follow the steps in Edit big data file shares. To learn more about the big data file share manifest, see Big data file share manifest in the ArcGIS Server help.
If you created your big data file share in ArcGIS Server using Manager, follow the steps in Edit big data file share manifests in Server Manager.
Modify output templates for a big data file share
When you choose to use the big data file share as an output location, output templates are automatically generated. These templates outline the formatting of output analysis results, such as the file type, and how time and geometry will be registered. If you want to modify the geometry or time formatting, or add or delete templates, you can modify the templates. To edit the output templates, follow the steps in Create, edit, and view output templates. To learn more about output templates, see Output templates in a big data file share.
If you created your big data file share in ArcGIS Server using Manager, follow the steps in Edit big data file share manifests in Server Manager.
Migrate big data file shares created in Server Manager to a portal
Big data file shares created using a portal have several advantages over big data file shares created in Server Manager:
- An improved user experience that makes editing datasets easier.
- A simpler experience for registering big data file shares.
- Items are stored and shared using portal credentials.
It is recommended that you create a data store item for the big data file shares that you created in Server Manager. In the following case, you must migrate the big data file share to a data store item in the portal to continue using it:
- Big data file shares based on a Microsoft Azure Data Lake Storage Gen1 cloud storage data store.
To migrate a big data file share you created in Server Manager to a portal data store item, ensure that you have the following:
- The credentials and file location of your configured big data file share.
- If applicable, the credentials and file location of your configured cloud storage data store.
- Sign in to Server Manager on your GeoAnalytics Server site.
- Go to Site > Data Stores. Click the edit button on the big data file share you'd like to migrate.
- Go to Advanced > Manifest. Click the Download button to save the manifest.
- If you have any hints, complete the same steps for hints: click Hints > Download to save your hints file, and rename the file extension from .dat to .txt.
- If you have output templates under the Advanced > Output Templates section, copy the text and save it in a text file.
- Create a big data file share in the portal Content page using the same type and input location as was previously used.
If you don't know the credentials, your administrator can find them in Server Administrator using the decrypt=true option on the big data file share and cloud storage data store items.
Follow the steps in Add a data store item, and use the same credentials and location as your existing big data file share.
- After the big data file share item is created, click Datasets, and turn on the Show advanced option.
- Upload the manifest you saved previously by clicking Upload in the manifest section. Browse to the manifest JSON file that was saved earlier, and click Upload. Click the Sync button so that changes are reflected.
- If you have a hints file to upload, complete the same steps, and upload your hints file under the Show advanced > Hints > Upload option. Click the Sync button so that changes are reflected.
- To upload the output templates, do one of the following:
- Manually add the output templates using the big data file share item Outputs > Add output templates.
- Edit the JSON file of the big data file share item through the ArcGIS Server Administrator Directory. This is only recommended if you're familiar with editing JSON files.
You now have a big data file share and manifest for your big data file share item in your portal. You can update your workflows to use and point to this big data file share. When you are confident it's working as expected, delete your original big data file share in Server Manager.
Run analysis on a big data file share
You can run analysis on a dataset in a big data file share through any client that supports GeoAnalytics Server, including the following:
- ArcGIS Pro
- Map Viewer Classic
- ArcGIS REST API
- ArcGIS API for Python
To run your analysis on a big data file share through ArcGIS Pro or Map Viewer Classic, select the GeoAnalytics Tools you want to use. For the input to the tool, browse to where your data is located under Portal in ArcGIS Pro or on the Browse Layers dialog box in Map Viewer Classic. Data will be in My Content if you registered the data yourself. Otherwise, look in Groups or All Portal. Note that a big data file share layer selected for analysis will not be displayed in the map.
Note:
Ensure that you are signed in with a portal account that has access to the registered big data file share. You can search your portal with the term bigDataFileShare* to quickly find all the big data file shares you can access.
To run analysis on a big data file share through ArcGIS REST API, use the big data catalog service URL as the input. If you created the big data file share in the portal, the input is in the format {"url":"https://webadaptorhost.domain.com/webadaptorname/rest/DataStoreCatalogs/bigDataFileShares_filesharename/"}. For example, with a machine named example, a domain named esri, a web adaptor named server, a big data file share named MyData, and a dataset named Earthquakes, the URL is {"url":"https://example.esri.com/server/rest/DataStoreCatalogs/bigDataFileShares_MyData/Earthquakes_uniqueID"}. If you created the big data file share in Server Manager, the input is in the format {"url":"https://webadaptorhost.domain.com/webadaptorname/rest/DataStoreCatalogs/bigDataFileShares_filesharename/BigDataCatalogServer/dataset"}.
To learn more about input to big data analysis through REST, see the Feature input topic in the ArcGIS Services REST API documentation.
Save results to a big data file share
You can run analysis on a dataset (big data file share or other input) and save the results to a big data file share. You can do this through the following clients:
- Map Viewer Classic
- ArcGIS REST API
- ArcGIS API for Python
When you write results to a big data file share, the input manifest is updated to include the dataset you just saved. The results you have written to the big data file share are now available as an input for another tool run. When you save results to a big data file share, you cannot visualize them.