SARXarray and STMTools: Dependencies of DePSI¶
This notebook demonstrates the basic usage of the two data libraries used in DePSI: SARXarray and STMTools. We will walk through an example workflow that uses them to:
- Lazily load a coregistered interferogram stack
- Make a Mean Reflection Map (MRM)
- Perform Persistent Scatterer (PS) selection and save the results as a SpaceTime Matrix (STM)
- Enrich selected STM with contextual data
PSI processing is not involved in this notebook.
Data used for this notebook¶
To run this notebook, you will need the following two datasets:
Please retrieve these datasets and save them in the data directory of your working directory.
Scale up this example to HPC (Optional)¶
This notebook can run on a local laptop to process a relatively small dataset, but it can also be scaled up to process larger datasets on an HPC system. It supports JupyterDaskOnSLURM, one of the tools of the RS-DAT platform, for scaling up. Please refer to the user guide to execute this notebook on HPC.
Step 0: Setup a Dask SLURMCluster (Optional)¶
You can ignore this step if you are running on a local laptop.
If you are running in an HPC environment, you can set up a Dask SLURMCluster to perform parallel processing. To set up the SLURMCluster, you can either follow the Dask documentation to set it up manually, or follow the RS-DAT user guide to set it up with a Dask plugin in Jupyter Server.
Below is an example of connecting to a SLURMCluster hosted at tcp://10.0.2.109:33771.
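Alternatively, if no scheduler is running yet, a cluster can be launched from the notebook itself with `dask_jobqueue`. Below is a configuration sketch; the queue name, walltime, and per-worker resources are site-specific assumptions (chosen here to match 4-thread, 16 GiB workers), not values prescribed by DePSI:

```python
# Sketch: launch a Dask SLURMCluster from the notebook.
# Queue name, walltime, and resources are assumptions; adapt to your HPC site.
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(
    cores=4,             # threads per worker
    memory="16GiB",      # memory per worker
    walltime="01:00:00",
    queue="normal",      # hypothetical SLURM partition name
)
cluster.scale(jobs=16)   # request 16 worker jobs
client = Client(cluster)
```

Once the workers are up, `client` can be used exactly like the directly-connected client in the cell below.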
# One example of setting up a Dask client connecting to a SlurmCluster
# from dask.distributed import Client
# client = Client("tcp://10.0.2.109:33771")
# client
Client
Client-416a8dac-0061-11ee-95de-fa163e9835c7
| Connection method: Direct | Dashboard: /proxy/8787/status |
Scheduler
Scheduler-2831ed6c-f63d-483e-9058-7a97bc82f333
| Comm: tcp://10.0.2.109:33771 | Workers: 16 |
| Dashboard: /proxy/8787/status | Total threads: 64 |
| Started: 1 hour ago | Total memory: 256.00 GiB |
(Per-worker widget output omitted: 16 SLURMCluster workers, each with 4 threads and 16.00 GiB memory.)
Step 1: Load an SLC stack using SARXarray¶
import os
from pathlib import Path
import numpy as np
import sarxarray
cwd = Path(os.getcwd())
# Path to the interferogram dataset
path = Path(cwd / 'data/nl_amsterdam_s1_asc_t088')
# Make a sorted list of SLCs to read
f_slc = 'cint_srd.raw'
list_slcs = sorted(path.rglob(f'*_{f_slc}'))
# Geo-referenced coordinates
f_lat = [path / 'lat.raw']
f_lon = [path / 'lon.raw']
# Metadata of the stack, assumed known
shape = (2000, 4000)
# Define reading chunks
reading_chunks = (500, 500)
# Load complex data
stack = sarxarray.from_binary(list_slcs, shape, dtype=np.complex64, chunks=reading_chunks)
# Load coordinates
lat = sarxarray.from_binary(f_lat, shape, vlabel="lat", dtype=np.float32, chunks=reading_chunks)
lon = sarxarray.from_binary(f_lon, shape, vlabel="lon", dtype=np.float32, chunks=reading_chunks)
stack = stack.assign_coords(
    lat=(("azimuth", "range"), lat.squeeze().lat.data),
    lon=(("azimuth", "range"), lon.squeeze().lon.data),
)
stack
<xarray.Dataset>
Dimensions: (azimuth: 2000, range: 4000, time: 10)
Coordinates:
* azimuth (azimuth) int64 0 1 2 3 4 5 6 ... 1994 1995 1996 1997 1998 1999
* range (range) int64 0 1 2 3 4 5 6 ... 3994 3995 3996 3997 3998 3999
* time (time) int64 0 1 2 3 4 5 6 7 8 9
lat (azimuth, range) float32 dask.array<chunksize=(500, 500), meta=np.ndarray>
lon (azimuth, range) float32 dask.array<chunksize=(500, 500), meta=np.ndarray>
Data variables:
complex (azimuth, range, time) complex64 dask.array<chunksize=(500, 500, 1), meta=np.ndarray>
amplitude (azimuth, range, time) float32 dask.array<chunksize=(500, 500, 1), meta=np.ndarray>
phase (azimuth, range, time) float32 dask.array<chunksize=(500, 500, 1), meta=np.ndarray>
Step 2: Making an MRM¶
mrm = stack.slcstack.mrm()
mrm
<xarray.DataArray 'amplitude' (azimuth: 2000, range: 4000)>
dask.array<mean_agg-aggregate, shape=(2000, 4000), dtype=float32, chunksize=(500, 500), chunktype=numpy.ndarray>
Coordinates:
* azimuth (azimuth) int64 0 1 2 3 4 5 6 ... 1994 1995 1996 1997 1998 1999
* range (range) int64 0 1 2 3 4 5 6 ... 3993 3994 3995 3996 3997 3998 3999
lat (azimuth, range) float32 dask.array<chunksize=(500, 500), meta=np.ndarray>
lon (azimuth, range) float32 dask.array<chunksize=(500, 500), meta=np.ndarray>mrm = mrm.compute()
# Visualize
from matplotlib import pyplot as plt
fig, ax = plt.subplots()
ax.set_aspect(2)
im = mrm.plot(ax=ax, cmap='gray')
im.set_clim([0, 40000])
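Conceptually, the MRM is simply the temporal mean of the amplitude, i.e. the magnitude of the complex SLC values averaged over time. A minimal NumPy sketch of the same reduction on synthetic data (the shapes are illustrative; this is not the SARXarray implementation):

```python
import numpy as np

# Synthetic complex stack with the same (azimuth, range, time) layout as above
rng = np.random.default_rng(42)
stack = (rng.standard_normal((20, 40, 10))
         + 1j * rng.standard_normal((20, 40, 10))).astype(np.complex64)

amplitude = np.abs(stack)      # per-epoch amplitude
mrm = amplitude.mean(axis=-1)  # average over the time axis

print(mrm.shape)  # (20, 40)
```

Because the reduction is chunk-wise independent, SARXarray can evaluate it lazily with Dask, which is why `mrm` above remains a dask-backed array until `.compute()` is called.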
Step 3: PS selection¶
stmat = stack.slcstack.point_selection(threshold=4, method="amplitude_dispersion",chunks=5000)
stmat
<xarray.Dataset>
Dimensions: (time: 10, points: 316762)
Coordinates:
* time (time) int64 0 1 2 3 4 5 6 7 8 9
lat (points) float32 dask.array<chunksize=(5000,), meta=np.ndarray>
lon (points) float32 dask.array<chunksize=(5000,), meta=np.ndarray>
azimuth (points) int64 dask.array<chunksize=(5000,), meta=np.ndarray>
range (points) int64 dask.array<chunksize=(5000,), meta=np.ndarray>
Dimensions without coordinates: points
Data variables:
complex (points, time) complex64 dask.array<chunksize=(5000, 10), meta=np.ndarray>
amplitude (points, time) float32 dask.array<chunksize=(5000, 10), meta=np.ndarray>
phase (points, time) float32 dask.array<chunksize=(5000, 10), meta=np.ndarray>
from matplotlib import pyplot as plt
fig, ax = plt.subplots()
plt.scatter(stmat.lon.data, stmat.lat.data, s=0.005)
<matplotlib.collections.PathCollection at 0x7f9773b57e10>
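The amplitude-dispersion criterion used by `point_selection` is, conceptually, the temporal standard deviation of a pixel's amplitude divided by its temporal mean: stable scatterers have low dispersion. A NumPy sketch on two synthetic points (the 0.25 cutoff is a common choice in the PSI literature; the exact normalization behind `threshold=4` above may differ):

```python
import numpy as np

# Synthetic amplitudes: (points, time); first point stable, second noisy
rng = np.random.default_rng(0)
amp = np.vstack([
    100 + rng.standard_normal(10),        # stable scatterer
    100 + 80 * rng.standard_normal(10),   # unstable scatterer
])

# Normalized amplitude dispersion: std / mean per point
dispersion = amp.std(axis=1) / amp.mean(axis=1)
selected = dispersion < 0.25  # keep low-dispersion points

print(dispersion.round(3))
```

The stable point has dispersion near 0.01 and is kept, while the noisy point's dispersion is far larger.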
# Export to Zarr
stmat.to_zarr("stm.zarr", mode="w") # overwrite for demo
<xarray.backends.zarr.ZarrStore at 0x7f97727aa110>
Step4: STM enrichment¶
import xarray as xr
path_stm = Path('./stm.zarr')
stm_demo = xr.open_zarr(path_stm)
stm_demo
<xarray.Dataset>
Dimensions: (points: 316762, time: 10)
Coordinates:
azimuth (points) int64 dask.array<chunksize=(5000,), meta=np.ndarray>
lat (points) float32 dask.array<chunksize=(5000,), meta=np.ndarray>
lon (points) float32 dask.array<chunksize=(5000,), meta=np.ndarray>
range (points) int64 dask.array<chunksize=(5000,), meta=np.ndarray>
* time (time) int64 0 1 2 3 4 5 6 7 8 9
Dimensions without coordinates: points
Data variables:
amplitude (points, time) float32 dask.array<chunksize=(5000, 10), meta=np.ndarray>
complex (points, time) complex64 dask.array<chunksize=(5000, 10), meta=np.ndarray>
phase (points, time) float32 dask.array<chunksize=(5000, 10), meta=np.ndarray>
# Path to the BAG building polygons of Amsterdam
# An absolute path is needed for cluster processing
path_polygon = Path('/project/caroline/Public/demo_sarxarray/data/bag_light_AMS_WGS84.gpkg')
# Data enrichment
fields_to_query = ['bouwjaar']
# When the AoI is small and the number of chunks is low, the polygon file can be passed in directly; note that multi-processing can be blocked by file IO
stm_demo = stm_demo.stm.enrich_from_polygon(path_polygon, fields_to_query)
# Select PS inside the polygons
stm_demo_subset = stm_demo.stm.subset(method='polygon', polygon=path_polygon)
stm_demo_subset
<xarray.Dataset>
Dimensions: (time: 10, points: 85315)
Coordinates:
* time (time) int64 0 1 2 3 4 5 6 7 8 9
azimuth (points) int64 dask.array<chunksize=(5000,), meta=np.ndarray>
lat (points) float32 dask.array<chunksize=(5000,), meta=np.ndarray>
lon (points) float32 dask.array<chunksize=(5000,), meta=np.ndarray>
range (points) int64 dask.array<chunksize=(5000,), meta=np.ndarray>
Dimensions without coordinates: points
Data variables:
amplitude (points, time) float32 dask.array<chunksize=(5000, 10), meta=np.ndarray>
complex (points, time) complex64 dask.array<chunksize=(5000, 10), meta=np.ndarray>
phase (points, time) float32 dask.array<chunksize=(5000, 10), meta=np.ndarray>
bouwjaar (points) object dask.array<chunksize=(5000,), meta=np.ndarray>
# Compute the construction year
bouwjaar = stm_demo_subset['bouwjaar'].compute()
# Visualize results
import matplotlib.cm as cm
from matplotlib import pyplot as plt
colormap = cm.jet
fig, ax = plt.subplots()
plt.title("Construction year, PS")
plt.scatter(stm_demo_subset.lon.data, stm_demo_subset.lat.data, c=bouwjaar, s=0.002, cmap=colormap)
plt.clim([1900, 2023])
plt.colorbar()
<matplotlib.colorbar.Colorbar at 0x7f9766659bd0>
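Under the hood, `enrich_from_polygon` amounts to a point-in-polygon query followed by copying the polygon's attributes (here `bouwjaar`) onto the matching points. A self-contained even-odd ray-casting sketch with NumPy — a hypothetical helper for illustration, not the STMTools implementation, which relies on vector-data libraries:

```python
import numpy as np

def points_in_polygon(x, y, poly):
    """Even-odd ray casting: True where (x, y) falls inside `poly`.

    poly is an (N, 2) array of vertices; edges wrap around.
    """
    x, y = np.asarray(x), np.asarray(y)
    inside = np.zeros(x.shape, dtype=bool)
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does a horizontal ray from (x, y) cross this edge?
        crosses = ((y1 > y) != (y2 > y)) & (
            x < (x2 - x1) * (y - y1) / (y2 - y1) + x1
        )
        inside ^= crosses  # odd number of crossings means inside
    return inside

# Unit square as a stand-in for one building footprint
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
lon = np.array([0.5, 2.0])
lat = np.array([0.5, 0.5])
print(points_in_polygon(lon, lat, square))  # [ True False]
```

In the real workflow this query runs per chunk of the STM, which is why points outside all polygons can simply be dropped by `subset(method='polygon', ...)` as shown above.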