Create and Use Dataset Object#

Every analysis in pyIncore uses Dataset objects as input by default. This tutorial introduces the basic concepts of creating and using a Dataset object, either by loading local files or by connecting to the remote IN-CORE Data Services.

import pandas as pd
from pyincore import IncoreClient, DataService, SpaceService, Dataset, FragilityService, MappingSet
from pyincore.analyses.buildingdamage import BuildingDamage
from pyincore.analyses.meandamage import MeanDamage
client = IncoreClient()
data_services = DataService(client)
space_services = SpaceService(client)
Connection successful to IN-CORE services. pyIncore version detected: 0.9.0

Upload Dataset to Data Services#

Write Metadata#

  • Metadata is a JSON-style dictionary describing the dataset.

  • dataType needs to align with the analyses in pyincore.

  • format is the file format of the dataset. Currently supported formats include “shapefile”, “table”, “Network”, “textFiles”, “raster”, and “geotiff”. Please consult the development team if you intend to post a new format.

# note you have to put the correct dataType as well as format
dataset_metadata = {
    "title":"Tutorial Test ERGO Memphis Hospitals",
    "description": "ERGO Memphis Hospitals",
    "dataType": "ergo:buildingInventoryVer5",
    "format": "shapefile"
}

Upload metadata#

After the metadata is uploaded, a “placeholder” dataset is created on the IN-CORE service with an id, but it does not have any files attached to it yet. However, it is already possible to see the empty dataset on the service by searching for that particular id.

created_dataset = data_services.create_dataset(dataset_metadata)
dataset_id = created_dataset['id']
print('dataset is created with id ' + dataset_id)
dataset is created with id 603e5e1034f29a7fa4282a8f
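
To confirm the placeholder exists, you can fetch its metadata back from the service. A minimal sketch, assuming the DataService client exposes get_dataset_metadata:

# sketch: fetch the placeholder metadata back by id
# (assumes DataService.get_dataset_metadata; fileDescriptors should still be empty)
placeholder = data_services.get_dataset_metadata(dataset_id)
print(placeholder['title'], placeholder['fileDescriptors'])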

Attach files to the dataset created#

Using the dataset id we attach the files that contain the data for the dataset.

files = ['files/all_bldgs_ver5_WGS1984.shp',
         'files/all_bldgs_ver5_WGS1984.shx',
         'files/all_bldgs_ver5_WGS1984.prj',
         'files/all_bldgs_ver5_WGS1984.dbf']
full_dataset = data_services.add_files_to_dataset(dataset_id, files)
full_dataset
{'id': '603e5e1034f29a7fa4282a8f',
 'deleted': False,
 'title': 'Tutorial Test ERGO Memphis Hospitals',
 'description': 'ERGO Memphis Hospitals',
 'date': '2021-03-02T15:47:28+0000',
 'creator': 'mondrejc',
 'spaces': None,
 'contributors': [],
 'fileDescriptors': [{'id': '603e5e1034f29a7fa4282b02',
   'deleted': False,
   'filename': 'all_bldgs_ver5_WGS1984.shp',
   'mimeType': 'application/octet-stream',
   'size': 716,
   'dataURL': '60/3e/603e5e1034f29a7fa4282b02/all_bldgs_ver5_WGS1984.shp',
   'md5sum': '6e1e96c4a6cf5762317054fe813d82bf'},
  {'id': '603e5e1034f29a7fa4282b05',
   'deleted': False,
   'filename': 'all_bldgs_ver5_WGS1984.shx',
   'mimeType': 'application/octet-stream',
   'size': 276,
   'dataURL': '60/3e/603e5e1034f29a7fa4282b05/all_bldgs_ver5_WGS1984.shx',
   'md5sum': '799965579a991f1f45afeb22c07c5ece'},
  {'id': '603e5e1034f29a7fa4282b08',
   'deleted': False,
   'filename': 'all_bldgs_ver5_WGS1984.prj',
   'mimeType': 'application/octet-stream',
   'size': 205,
   'dataURL': '60/3e/603e5e1034f29a7fa4282b08/all_bldgs_ver5_WGS1984.prj',
   'md5sum': '30e5566d68356bfc059d296c42c0480e'},
  {'id': '603e5e1034f29a7fa4282b0b',
   'deleted': False,
   'filename': 'all_bldgs_ver5_WGS1984.dbf',
   'mimeType': 'application/octet-stream',
   'size': 10859,
   'dataURL': '60/3e/603e5e1034f29a7fa4282b0b/all_bldgs_ver5_WGS1984.dbf',
   'md5sum': '7ea0a4c769ca254a6b4821f2e737eb35'}],
 'dataType': 'ergo:buildingInventoryVer5',
 'storedUrl': '',
 'format': 'shapefile',
 'sourceDataset': '',
 'boundingBox': [-90.07376669874641,
  35.03298062856903,
  -89.71464767735003,
  35.207753220358086],
 'networkDataset': None}

Moving your dataset to an IN-CORE space#

If you would like other people to access your data, you can contact NCSA to move your dataset to a certain space. In a future release, you will be able to do this yourself.
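
The space_services client instantiated at the start of this tutorial talks to the Space Service. As a sketch, assuming SpaceService exposes a get_spaces method, you can at least list the spaces visible to your account:

# sketch: list spaces visible to your account
# (assumes SpaceService.get_spaces; each entry is a space record)
for space in space_services.get_spaces():
    print(space)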

1. Load Dataset from Data Services#

building_dataset_id = "5a284f0bc7d30d13bc081a28"
buildings = Dataset.from_data_service(building_dataset_id, data_services)
buildings
Dataset already exists locally. Reading from local cached zip.
Unzipped folder found in the local cache. Reading from it...
<pyincore.dataset.Dataset at 0x7fcf7c7985d0>
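
Once loaded, the Dataset object records what type it carries and where its files were cached. A minimal sketch, assuming the data_type and local_file_path attributes:

# sketch: inspect the loaded Dataset object
# (assumes Dataset exposes data_type and local_file_path attributes)
print(buildings.data_type)        # the dataset's dataType string
print(buildings.local_file_path)  # where the files were cached locally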

2. Load Dataset from local files#

  • Note that you have to pass the right data_type when constructing a Dataset object from scratch.

  • To look up what the data_type should be, refer to the source code of the analysis.

  • Take a look at the spec section -> input_datasets -> type (or inspect it at runtime, as sketched below).
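
Alternatively, you can inspect the spec at runtime instead of reading the source. A minimal sketch, assuming the analysis exposes its spec via get_spec():

# sketch: print the dataset types an analysis expects
# (assumes BaseAnalysis.get_spec; 'type' lists the accepted dataType strings)
for input_ds in BuildingDamage(client).get_spec()['input_datasets']:
    print(input_ds['id'], input_ds['type'])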

buildings = Dataset.from_file("files/all_bldgs_ver5_WGS1984.shp", data_type="ergo:buildingInventoryVer5")
buildings
<pyincore.dataset.Dataset at 0x7fcee8b73150>

3. Input the Dataset object in analyses#

# for example: Building Damage Analyses
bldg_dmg = BuildingDamage(client)
bldg_dmg.set_input_dataset("buildings", buildings)  
True
# Memphis Earthquake damage
# New madrid earthquake using Atkinson Boore 1995
hazard_type = "earthquake"
hazard_id = "5b902cb273c3371e1236b36b"

# Earthquake mapping
mapping_id = "5b47b350337d4a3629076f2c"
fragility_service = FragilityService(client)
mapping_set = MappingSet(fragility_service.get_mapping(mapping_id))
bldg_dmg.set_input_dataset('dfr3_mapping_set', mapping_set)

result_name = "memphis_eq_bldg_dmg_result"
bldg_dmg.set_parameter("result_name", result_name)
bldg_dmg.set_parameter("hazard_type", hazard_type)
bldg_dmg.set_parameter("hazard_id", hazard_id)
bldg_dmg.set_parameter("num_cpu", 4)

# Run Analysis
bldg_dmg.run_analysis()
True

4. Chaining the output Dataset object in subsequent analyses#

The output is a Dataset object as well; here is how to display it:

print("output datasets:", bldg_dmg.get_output_datasets())
bldg_dmg.get_output_dataset('ds_result').get_dataframe_from_csv().head()
output datasets: {'ds_result': <pyincore.dataset.Dataset object at 0x7fcee8ab9fd0>, 'damage_result': <pyincore.dataset.Dataset object at 0x7fcee8a48150>}
guid LS_0 LS_1 LS_2 DS_0 DS_1 DS_2 DS_3
0 a41e7dcc-3b82-42f2-9dbd-a2ebdf39d453 0.848146 0.327319 2.722903e-02 0.151854 0.520828 0.300089 2.722903e-02
1 254d1dd8-5d2f-4737-909b-59cc64ca72d4 0.844340 0.328296 2.860487e-02 0.155660 0.516045 0.299691 2.860487e-02
2 4253802e-b3e5-4ed3-93b0-dda9ef6362b0 0.896775 0.480926 8.756764e-02 0.103225 0.415849 0.393358 8.756764e-02
3 b185d5b6-5bc0-43a3-800a-c046017372ab 0.810564 0.331283 4.895657e-02 0.189436 0.479281 0.282327 4.895657e-02
4 7b5dc4f6-ef5e-4178-9836-f044b4b92f0d 0.970342 0.154675 1.000000e-10 0.029658 0.815668 0.154675 1.000000e-10

Chaining with Mean damage analysis#

md = MeanDamage(client)

# use the output of building damage
building_damage_output = bldg_dmg.get_output_dataset('ds_result')
md.set_input_dataset("damage", building_damage_output)

md.load_remote_input_dataset("dmg_ratios", "5a284f2ec7d30d13bc08209a")
md.set_parameter("result_name", "building_mean_damage")
md.set_parameter("damage_interval_keys", ["DS_0", "DS_1", "DS_2", "DS_3"])
md.set_parameter("num_cpu", 1)

# Run analysis
md.run_analysis()
Dataset already exists locally. Reading from local cached zip.
Unzipped folder found in the local cache. Reading from it...
True
print("output datasets:", md.get_output_datasets())
md.get_output_dataset('result').get_dataframe_from_csv().head()[['meandamage', 'mdamagedev']]
output datasets: {'result': <pyincore.dataset.Dataset object at 0x7fcee8a18f90>}
meandamage mdamagedev
0 0.271043 0.238080
1 0.271340 0.239546
2 0.360131 0.275124
3 0.274576 0.256321
4 0.211648 0.160879

Utility methods#

# e.g. read the shapefile properties
rd = buildings.get_inventory_reader()
for row in rd:
    print('year built:', row['properties']['year_built'])
year built: 1978
year built: 1925
year built: 1924
year built: 1910
year built: 1991
year built: 1963
year built: 1976
year built: 1958
year built: 1927
year built: 1972
year built: 2004
year built: 1974
year built: 2001
year built: 1973
year built: 1971
year built: 1970
year built: 1999
year built: 2003
year built: 2003
year built: 1998
year built: 1986
year built: 1987
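
Other readers are available depending on the dataset format. A minimal sketch, assuming Dataset also exposes get_dataframe_from_shapefile:

# sketch: read the same inventory into a GeoDataFrame
# (assumes Dataset.get_dataframe_from_shapefile, backed by geopandas)
gdf = buildings.get_dataframe_from_shapefile()
gdf[['year_built', 'geometry']].head()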