Last active
October 6, 2025 17:26
Xarray Example Cheatsheet
| { | |
| "cells": [ | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "# Cheatsheet Outline" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "1. Common Xarray-related Imports\n", | |
| "2. Creating Xarray DataArrays\n", | |
| "3. Converting from Xarray to Pandas\n", | |
| "4. Converting from Pandas to Xarray\n", | |
| "5. Reading and writing Xarrays to netCDF files\n", | |
| "6. Slicing and dicing data\n", | |
| "7. Changing values\n", | |
| "8. Data Reduction\n", | |
| "9. Vectorized operations\n", | |
| "10. Changing and adding coordinates/Expanding or broadcasting dimensions\n", | |
| "11. Datasets" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "# 1. Common Xarray-related Imports" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 1, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "import numpy as np\n", | |
| "import pandas as pd\n", | |
| "import xarray as xr" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "# 2. Creating Xarray DataArrays\n", | |
| "There are two kinds of data structures in Xarray: DataArrays and Datasets. We'll start with DataArrays, because a Dataset is essentially a collection of aligned DataArrays. Datasets also tend to be needed less often." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 2, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "da = xr.DataArray(\n", | |
| " data=np.random.random([2, 3, 11, 100]),\n", | |
| " dims=[\"sex_id\", \"age_group_id\", \"year_id\", \"draw\"],\n", | |
| " coords={\n", | |
| " \"sex_id\": [1, 2],\n", | |
| " \"age_group_id\": [11, 12, 13],\n", | |
| " \"year_id\": range(1990, 2000+1),\n", | |
| " \"draw\": range(100),\n", | |
| " },\n", | |
| " name=\"fake_thing\"\n", | |
| " )" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 3, | |
| "metadata": { | |
| "scrolled": false | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 11, draw: 100)>\n", | |
| "array([[[[0.996779, ..., 0.193241],\n", | |
| " ...,\n", | |
| " [0.50371 , ..., 0.654273]],\n", | |
| "\n", | |
| " ...,\n", | |
| "\n", | |
| " [[0.227811, ..., 0.912856],\n", | |
| " ...,\n", | |
| " [0.24642 , ..., 0.581184]]],\n", | |
| "\n", | |
| "\n", | |
| " [[[0.072334, ..., 0.684663],\n", | |
| " ...,\n", | |
| " [0.628984, ..., 0.358811]],\n", | |
| "\n", | |
| " ...,\n", | |
| "\n", | |
| " [[0.491698, ..., 0.876439],\n", | |
| " ...,\n", | |
| " [0.829525, ..., 0.611719]]]])\n", | |
| "Coordinates:\n", | |
| " * sex_id (sex_id) int64 1 2\n", | |
| " * age_group_id (age_group_id) int64 11 12 13\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n", | |
| " * draw (draw) int64 0 1 2 3 4 5 6 7 8 ... 91 92 93 94 95 96 97 98 99" | |
| ] | |
| }, | |
| "execution_count": 3, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "da" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "# 3. Converting from Xarray to Pandas" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 4, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<style scoped>\n", | |
| " .dataframe tbody tr th:only-of-type {\n", | |
| " vertical-align: middle;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe tbody tr th {\n", | |
| " vertical-align: top;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe thead th {\n", | |
| " text-align: right;\n", | |
| " }\n", | |
| "</style>\n", | |
| "<table border=\"1\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>age_group_id</th>\n", | |
| " <th>sex_id</th>\n", | |
| " <th>year_id</th>\n", | |
| " <th>draw_0</th>\n", | |
| " <th>draw_1</th>\n", | |
| " <th>draw_2</th>\n", | |
| " <th>draw_3</th>\n", | |
| " <th>draw_4</th>\n", | |
| " <th>draw_5</th>\n", | |
| " <th>draw_6</th>\n", | |
| " <th>...</th>\n", | |
| " <th>draw_90</th>\n", | |
| " <th>draw_91</th>\n", | |
| " <th>draw_92</th>\n", | |
| " <th>draw_93</th>\n", | |
| " <th>draw_94</th>\n", | |
| " <th>draw_95</th>\n", | |
| " <th>draw_96</th>\n", | |
| " <th>draw_97</th>\n", | |
| " <th>draw_98</th>\n", | |
| " <th>draw_99</th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>0</th>\n", | |
| " <td>11</td>\n", | |
| " <td>1</td>\n", | |
| " <td>1990</td>\n", | |
| " <td>0.996779</td>\n", | |
| " <td>0.761632</td>\n", | |
| " <td>0.350849</td>\n", | |
| " <td>0.750393</td>\n", | |
| " <td>0.433888</td>\n", | |
| " <td>0.764425</td>\n", | |
| " <td>0.122375</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.225819</td>\n", | |
| " <td>0.836557</td>\n", | |
| " <td>0.885162</td>\n", | |
| " <td>0.222884</td>\n", | |
| " <td>0.641429</td>\n", | |
| " <td>0.393851</td>\n", | |
| " <td>0.381577</td>\n", | |
| " <td>0.294711</td>\n", | |
| " <td>0.650573</td>\n", | |
| " <td>0.193241</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>1</th>\n", | |
| " <td>11</td>\n", | |
| " <td>1</td>\n", | |
| " <td>1991</td>\n", | |
| " <td>0.786725</td>\n", | |
| " <td>0.014690</td>\n", | |
| " <td>0.184935</td>\n", | |
| " <td>0.269309</td>\n", | |
| " <td>0.493112</td>\n", | |
| " <td>0.365666</td>\n", | |
| " <td>0.573797</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.196058</td>\n", | |
| " <td>0.190651</td>\n", | |
| " <td>0.266525</td>\n", | |
| " <td>0.453888</td>\n", | |
| " <td>0.333859</td>\n", | |
| " <td>0.377547</td>\n", | |
| " <td>0.304548</td>\n", | |
| " <td>0.035076</td>\n", | |
| " <td>0.905141</td>\n", | |
| " <td>0.262088</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>2</th>\n", | |
| " <td>11</td>\n", | |
| " <td>1</td>\n", | |
| " <td>1992</td>\n", | |
| " <td>0.851437</td>\n", | |
| " <td>0.367362</td>\n", | |
| " <td>0.736778</td>\n", | |
| " <td>0.500674</td>\n", | |
| " <td>0.885498</td>\n", | |
| " <td>0.350236</td>\n", | |
| " <td>0.837336</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.526494</td>\n", | |
| " <td>0.398270</td>\n", | |
| " <td>0.609992</td>\n", | |
| " <td>0.480893</td>\n", | |
| " <td>0.261509</td>\n", | |
| " <td>0.537468</td>\n", | |
| " <td>0.326550</td>\n", | |
| " <td>0.393128</td>\n", | |
| " <td>0.236991</td>\n", | |
| " <td>0.239981</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>3</th>\n", | |
| " <td>11</td>\n", | |
| " <td>1</td>\n", | |
| " <td>1993</td>\n", | |
| " <td>0.431025</td>\n", | |
| " <td>0.786596</td>\n", | |
| " <td>0.385705</td>\n", | |
| " <td>0.140987</td>\n", | |
| " <td>0.742205</td>\n", | |
| " <td>0.380742</td>\n", | |
| " <td>0.247266</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.811025</td>\n", | |
| " <td>0.964106</td>\n", | |
| " <td>0.484327</td>\n", | |
| " <td>0.387248</td>\n", | |
| " <td>0.862704</td>\n", | |
| " <td>0.320871</td>\n", | |
| " <td>0.288251</td>\n", | |
| " <td>0.752603</td>\n", | |
| " <td>0.482269</td>\n", | |
| " <td>0.423913</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>4</th>\n", | |
| " <td>11</td>\n", | |
| " <td>1</td>\n", | |
| " <td>1994</td>\n", | |
| " <td>0.715613</td>\n", | |
| " <td>0.937093</td>\n", | |
| " <td>0.276558</td>\n", | |
| " <td>0.155267</td>\n", | |
| " <td>0.892415</td>\n", | |
| " <td>0.782576</td>\n", | |
| " <td>0.620654</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.038820</td>\n", | |
| " <td>0.025020</td>\n", | |
| " <td>0.422900</td>\n", | |
| " <td>0.139842</td>\n", | |
| " <td>0.229250</td>\n", | |
| " <td>0.092306</td>\n", | |
| " <td>0.262763</td>\n", | |
| " <td>0.009972</td>\n", | |
| " <td>0.457518</td>\n", | |
| " <td>0.653466</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "<p>5 rows × 103 columns</p>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " age_group_id sex_id year_id draw_0 draw_1 draw_2 draw_3 \\\n", | |
| "0 11 1 1990 0.996779 0.761632 0.350849 0.750393 \n", | |
| "1 11 1 1991 0.786725 0.014690 0.184935 0.269309 \n", | |
| "2 11 1 1992 0.851437 0.367362 0.736778 0.500674 \n", | |
| "3 11 1 1993 0.431025 0.786596 0.385705 0.140987 \n", | |
| "4 11 1 1994 0.715613 0.937093 0.276558 0.155267 \n", | |
| "\n", | |
| " draw_4 draw_5 draw_6 ... draw_90 draw_91 draw_92 draw_93 \\\n", | |
| "0 0.433888 0.764425 0.122375 ... 0.225819 0.836557 0.885162 0.222884 \n", | |
| "1 0.493112 0.365666 0.573797 ... 0.196058 0.190651 0.266525 0.453888 \n", | |
| "2 0.885498 0.350236 0.837336 ... 0.526494 0.398270 0.609992 0.480893 \n", | |
| "3 0.742205 0.380742 0.247266 ... 0.811025 0.964106 0.484327 0.387248 \n", | |
| "4 0.892415 0.782576 0.620654 ... 0.038820 0.025020 0.422900 0.139842 \n", | |
| "\n", | |
| " draw_94 draw_95 draw_96 draw_97 draw_98 draw_99 \n", | |
| "0 0.641429 0.393851 0.381577 0.294711 0.650573 0.193241 \n", | |
| "1 0.333859 0.377547 0.304548 0.035076 0.905141 0.262088 \n", | |
| "2 0.261509 0.537468 0.326550 0.393128 0.236991 0.239981 \n", | |
| "3 0.862704 0.320871 0.288251 0.752603 0.482269 0.423913 \n", | |
| "4 0.229250 0.092306 0.262763 0.009972 0.457518 0.653466 \n", | |
| "\n", | |
| "[5 rows x 103 columns]" | |
| ] | |
| }, | |
| "execution_count": 4, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
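| "# Relabel the draw coordinate with column-style names: draw_0 ... draw_99.\n", | |
| "draw_cols = [\"draw_{}\".format(i) for i in range(100)]\n", | |
| "da_draw_dim = da.assign_coords(draw=draw_cols)\n", | |
| "# Split along \"draw\" so each draw becomes its own variable, then flatten to a wide DataFrame.\n", | |
| "ds = da_draw_dim.to_dataset(dim=\"draw\")\n", | |
| "df = ds.to_dataframe().reset_index()\n", | |
| "df.head()" | |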
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 5, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 11)>\n", | |
| "array([[[0.515046, 0.482924, 0.519905, 0.46185 , 0.465647, 0.438642,\n", | |
| " 0.484231, 0.543931, 0.512703, 0.495674, 0.490514],\n", | |
| " [0.479378, 0.483819, 0.535785, 0.557384, 0.493535, 0.504168,\n", | |
| " 0.482273, 0.471792, 0.508302, 0.46291 , 0.481167],\n", | |
| " [0.501667, 0.520518, 0.521487, 0.511592, 0.554994, 0.520812,\n", | |
| " 0.515645, 0.575567, 0.48787 , 0.472843, 0.548013]],\n", | |
| "\n", | |
| " [[0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908,\n", | |
| " 0.469676, 0.505307, 0.528676, 0.498105, 0.496416],\n", | |
| " [0.476805, 0.533403, 0.530798, 0.54454 , 0.511816, 0.480597,\n", | |
| " 0.497687, 0.487296, 0.473503, 0.477073, 0.450402],\n", | |
| " [0.507428, 0.505268, 0.473393, 0.504639, 0.492401, 0.478396,\n", | |
| " 0.455247, 0.459401, 0.555081, 0.500056, 0.529371]]])\n", | |
| "Coordinates:\n", | |
| " * sex_id (sex_id) int64 1 2\n", | |
| " * age_group_id (age_group_id) int64 11 12 13\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000" | |
| ] | |
| }, | |
| "execution_count": 5, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "da.mean(\"draw\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 6, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<style scoped>\n", | |
| " .dataframe tbody tr th:only-of-type {\n", | |
| " vertical-align: middle;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe tbody tr th {\n", | |
| " vertical-align: top;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe thead th {\n", | |
| " text-align: right;\n", | |
| " }\n", | |
| "</style>\n", | |
| "<table border=\"1\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>sex_id</th>\n", | |
| " <th>age_group_id</th>\n", | |
| " <th>year_id</th>\n", | |
| " <th>mean</th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>0</th>\n", | |
| " <td>1</td>\n", | |
| " <td>11</td>\n", | |
| " <td>1990</td>\n", | |
| " <td>0.515046</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>1</th>\n", | |
| " <td>1</td>\n", | |
| " <td>11</td>\n", | |
| " <td>1991</td>\n", | |
| " <td>0.482924</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>2</th>\n", | |
| " <td>1</td>\n", | |
| " <td>11</td>\n", | |
| " <td>1992</td>\n", | |
| " <td>0.519905</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>3</th>\n", | |
| " <td>1</td>\n", | |
| " <td>11</td>\n", | |
| " <td>1993</td>\n", | |
| " <td>0.461850</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>4</th>\n", | |
| " <td>1</td>\n", | |
| " <td>11</td>\n", | |
| " <td>1994</td>\n", | |
| " <td>0.465647</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " sex_id age_group_id year_id mean\n", | |
| "0 1 11 1990 0.515046\n", | |
| "1 1 11 1991 0.482924\n", | |
| "2 1 11 1992 0.519905\n", | |
| "3 1 11 1993 0.461850\n", | |
| "4 1 11 1994 0.465647" | |
| ] | |
| }, | |
| "execution_count": 6, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
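| "# Rename the DataArray so the value column is called \"mean\" in the resulting DataFrame.\n", | |
| "mean_df = da.mean(\"draw\").rename(\"mean\").to_dataframe().reset_index()\n", | |
| "mean_df.head()" | |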
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "# 4. Converting from Pandas to Xarray" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 7, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'mean' (sex_id: 2, age_group_id: 3, year_id: 11)>\n", | |
| "array([[[0.515046, 0.482924, 0.519905, 0.46185 , 0.465647, 0.438642,\n", | |
| " 0.484231, 0.543931, 0.512703, 0.495674, 0.490514],\n", | |
| " [0.479378, 0.483819, 0.535785, 0.557384, 0.493535, 0.504168,\n", | |
| " 0.482273, 0.471792, 0.508302, 0.46291 , 0.481167],\n", | |
| " [0.501667, 0.520518, 0.521487, 0.511592, 0.554994, 0.520812,\n", | |
| " 0.515645, 0.575567, 0.48787 , 0.472843, 0.548013]],\n", | |
| "\n", | |
| " [[0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908,\n", | |
| " 0.469676, 0.505307, 0.528676, 0.498105, 0.496416],\n", | |
| " [0.476805, 0.533403, 0.530798, 0.54454 , 0.511816, 0.480597,\n", | |
| " 0.497687, 0.487296, 0.473503, 0.477073, 0.450402],\n", | |
| " [0.507428, 0.505268, 0.473393, 0.504639, 0.492401, 0.478396,\n", | |
| " 0.455247, 0.459401, 0.555081, 0.500056, 0.529371]]])\n", | |
| "Coordinates:\n", | |
| " * sex_id (sex_id) int64 1 2\n", | |
| " * age_group_id (age_group_id) int64 11 12 13\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000" | |
| ] | |
| }, | |
| "execution_count": 7, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
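| "# The index levels become dimensions; to_xarray() returns a Dataset, so select the \"mean\" variable.\n", | |
| "mean_da = mean_df.set_index([\"sex_id\", \"age_group_id\", \"year_id\"]).to_xarray()[\"mean\"]\n", | |
| "mean_da" | |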
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 8, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<style scoped>\n", | |
| " .dataframe tbody tr th:only-of-type {\n", | |
| " vertical-align: middle;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe tbody tr th {\n", | |
| " vertical-align: top;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe thead th {\n", | |
| " text-align: right;\n", | |
| " }\n", | |
| "</style>\n", | |
| "<table border=\"1\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>age_group_id</th>\n", | |
| " <th>sex_id</th>\n", | |
| " <th>year_id</th>\n", | |
| " <th>draw_0</th>\n", | |
| " <th>draw_1</th>\n", | |
| " <th>draw_2</th>\n", | |
| " <th>draw_3</th>\n", | |
| " <th>draw_4</th>\n", | |
| " <th>draw_5</th>\n", | |
| " <th>draw_6</th>\n", | |
| " <th>...</th>\n", | |
| " <th>draw_90</th>\n", | |
| " <th>draw_91</th>\n", | |
| " <th>draw_92</th>\n", | |
| " <th>draw_93</th>\n", | |
| " <th>draw_94</th>\n", | |
| " <th>draw_95</th>\n", | |
| " <th>draw_96</th>\n", | |
| " <th>draw_97</th>\n", | |
| " <th>draw_98</th>\n", | |
| " <th>draw_99</th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>0</th>\n", | |
| " <td>11</td>\n", | |
| " <td>1</td>\n", | |
| " <td>1990</td>\n", | |
| " <td>0.996779</td>\n", | |
| " <td>0.761632</td>\n", | |
| " <td>0.350849</td>\n", | |
| " <td>0.750393</td>\n", | |
| " <td>0.433888</td>\n", | |
| " <td>0.764425</td>\n", | |
| " <td>0.122375</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.225819</td>\n", | |
| " <td>0.836557</td>\n", | |
| " <td>0.885162</td>\n", | |
| " <td>0.222884</td>\n", | |
| " <td>0.641429</td>\n", | |
| " <td>0.393851</td>\n", | |
| " <td>0.381577</td>\n", | |
| " <td>0.294711</td>\n", | |
| " <td>0.650573</td>\n", | |
| " <td>0.193241</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>1</th>\n", | |
| " <td>11</td>\n", | |
| " <td>1</td>\n", | |
| " <td>1991</td>\n", | |
| " <td>0.786725</td>\n", | |
| " <td>0.014690</td>\n", | |
| " <td>0.184935</td>\n", | |
| " <td>0.269309</td>\n", | |
| " <td>0.493112</td>\n", | |
| " <td>0.365666</td>\n", | |
| " <td>0.573797</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.196058</td>\n", | |
| " <td>0.190651</td>\n", | |
| " <td>0.266525</td>\n", | |
| " <td>0.453888</td>\n", | |
| " <td>0.333859</td>\n", | |
| " <td>0.377547</td>\n", | |
| " <td>0.304548</td>\n", | |
| " <td>0.035076</td>\n", | |
| " <td>0.905141</td>\n", | |
| " <td>0.262088</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>2</th>\n", | |
| " <td>11</td>\n", | |
| " <td>1</td>\n", | |
| " <td>1992</td>\n", | |
| " <td>0.851437</td>\n", | |
| " <td>0.367362</td>\n", | |
| " <td>0.736778</td>\n", | |
| " <td>0.500674</td>\n", | |
| " <td>0.885498</td>\n", | |
| " <td>0.350236</td>\n", | |
| " <td>0.837336</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.526494</td>\n", | |
| " <td>0.398270</td>\n", | |
| " <td>0.609992</td>\n", | |
| " <td>0.480893</td>\n", | |
| " <td>0.261509</td>\n", | |
| " <td>0.537468</td>\n", | |
| " <td>0.326550</td>\n", | |
| " <td>0.393128</td>\n", | |
| " <td>0.236991</td>\n", | |
| " <td>0.239981</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>3</th>\n", | |
| " <td>11</td>\n", | |
| " <td>1</td>\n", | |
| " <td>1993</td>\n", | |
| " <td>0.431025</td>\n", | |
| " <td>0.786596</td>\n", | |
| " <td>0.385705</td>\n", | |
| " <td>0.140987</td>\n", | |
| " <td>0.742205</td>\n", | |
| " <td>0.380742</td>\n", | |
| " <td>0.247266</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.811025</td>\n", | |
| " <td>0.964106</td>\n", | |
| " <td>0.484327</td>\n", | |
| " <td>0.387248</td>\n", | |
| " <td>0.862704</td>\n", | |
| " <td>0.320871</td>\n", | |
| " <td>0.288251</td>\n", | |
| " <td>0.752603</td>\n", | |
| " <td>0.482269</td>\n", | |
| " <td>0.423913</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>4</th>\n", | |
| " <td>11</td>\n", | |
| " <td>1</td>\n", | |
| " <td>1994</td>\n", | |
| " <td>0.715613</td>\n", | |
| " <td>0.937093</td>\n", | |
| " <td>0.276558</td>\n", | |
| " <td>0.155267</td>\n", | |
| " <td>0.892415</td>\n", | |
| " <td>0.782576</td>\n", | |
| " <td>0.620654</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.038820</td>\n", | |
| " <td>0.025020</td>\n", | |
| " <td>0.422900</td>\n", | |
| " <td>0.139842</td>\n", | |
| " <td>0.229250</td>\n", | |
| " <td>0.092306</td>\n", | |
| " <td>0.262763</td>\n", | |
| " <td>0.009972</td>\n", | |
| " <td>0.457518</td>\n", | |
| " <td>0.653466</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "<p>5 rows × 103 columns</p>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " age_group_id sex_id year_id draw_0 draw_1 draw_2 draw_3 \\\n", | |
| "0 11 1 1990 0.996779 0.761632 0.350849 0.750393 \n", | |
| "1 11 1 1991 0.786725 0.014690 0.184935 0.269309 \n", | |
| "2 11 1 1992 0.851437 0.367362 0.736778 0.500674 \n", | |
| "3 11 1 1993 0.431025 0.786596 0.385705 0.140987 \n", | |
| "4 11 1 1994 0.715613 0.937093 0.276558 0.155267 \n", | |
| "\n", | |
| " draw_4 draw_5 draw_6 ... draw_90 draw_91 draw_92 draw_93 \\\n", | |
| "0 0.433888 0.764425 0.122375 ... 0.225819 0.836557 0.885162 0.222884 \n", | |
| "1 0.493112 0.365666 0.573797 ... 0.196058 0.190651 0.266525 0.453888 \n", | |
| "2 0.885498 0.350236 0.837336 ... 0.526494 0.398270 0.609992 0.480893 \n", | |
| "3 0.742205 0.380742 0.247266 ... 0.811025 0.964106 0.484327 0.387248 \n", | |
| "4 0.892415 0.782576 0.620654 ... 0.038820 0.025020 0.422900 0.139842 \n", | |
| "\n", | |
| " draw_94 draw_95 draw_96 draw_97 draw_98 draw_99 \n", | |
| "0 0.641429 0.393851 0.381577 0.294711 0.650573 0.193241 \n", | |
| "1 0.333859 0.377547 0.304548 0.035076 0.905141 0.262088 \n", | |
| "2 0.261509 0.537468 0.326550 0.393128 0.236991 0.239981 \n", | |
| "3 0.862704 0.320871 0.288251 0.752603 0.482269 0.423913 \n", | |
| "4 0.229250 0.092306 0.262763 0.009972 0.457518 0.653466 \n", | |
| "\n", | |
| "[5 rows x 103 columns]" | |
| ] | |
| }, | |
| "execution_count": 8, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "df.head()" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 9, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 11, draw: 100)>\n", | |
| "array([[[[0.996779, ..., 0.193241],\n", | |
| " ...,\n", | |
| " [0.50371 , ..., 0.654273]],\n", | |
| "\n", | |
| " ...,\n", | |
| "\n", | |
| " [[0.227811, ..., 0.912856],\n", | |
| " ...,\n", | |
| " [0.24642 , ..., 0.581184]]],\n", | |
| "\n", | |
| "\n", | |
| " [[[0.072334, ..., 0.684663],\n", | |
| " ...,\n", | |
| " [0.628984, ..., 0.358811]],\n", | |
| "\n", | |
| " ...,\n", | |
| "\n", | |
| " [[0.491698, ..., 0.876439],\n", | |
| " ...,\n", | |
| " [0.829525, ..., 0.611719]]]])\n", | |
| "Coordinates:\n", | |
| " * sex_id (sex_id) int64 1 2\n", | |
| " * age_group_id (age_group_id) int64 11 12 13\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n", | |
| " * draw (draw) int64 0 1 2 3 4 5 6 7 8 ... 91 92 93 94 95 96 97 98 99" | |
| ] | |
| }, | |
| "execution_count": 9, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
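| "# Melt draw_0 ... draw_99 into one \"draw\" index level, then convert back to a DataArray.\n", | |
| "long_df = pd.wide_to_long(\n", | |
| "    df, stubnames=\"draw_\", i=[\"sex_id\", \"age_group_id\", \"year_id\"], j=\"draw\")\n", | |
| "back_to_da = long_df.to_xarray()[\"draw_\"].rename(\"fake_thing\")\n", | |
| "back_to_da" | |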
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 10, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "True" | |
| ] | |
| }, | |
| "execution_count": 10, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "back_to_da.identical(da)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "# 5. Reading and writing Xarrays to netCDF files\n", | |
| "netCDF files can store metadata (attributes) along with the data." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 11, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "da.attrs[\"metric\"] = \"rate\"\n", | |
| "da.attrs[\"author\"] = \"Me\"\n", | |
| "da.to_netcdf(\"data.nc\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 12, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 11, draw: 100)>\n", | |
| "array([[[[0.996779, ..., 0.193241],\n", | |
| " ...,\n", | |
| " [0.50371 , ..., 0.654273]],\n", | |
| "\n", | |
| " ...,\n", | |
| "\n", | |
| " [[0.227811, ..., 0.912856],\n", | |
| " ...,\n", | |
| " [0.24642 , ..., 0.581184]]],\n", | |
| "\n", | |
| "\n", | |
| " [[[0.072334, ..., 0.684663],\n", | |
| " ...,\n", | |
| " [0.628984, ..., 0.358811]],\n", | |
| "\n", | |
| " ...,\n", | |
| "\n", | |
| " [[0.491698, ..., 0.876439],\n", | |
| " ...,\n", | |
| " [0.829525, ..., 0.611719]]]])\n", | |
| "Coordinates:\n", | |
| " * sex_id (sex_id) int64 1 2\n", | |
| " * age_group_id (age_group_id) int64 11 12 13\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n", | |
| " * draw (draw) int64 0 1 2 3 4 5 6 7 8 ... 91 92 93 94 95 96 97 98 99\n", | |
| "Attributes:\n", | |
| " metric: rate\n", | |
| " author: Me" | |
| ] | |
| }, | |
| "execution_count": 12, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "da" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Because we use Unix/Linux-based systems here, file extensions are really just a way for us humans to quickly see\n", | |
| "what type a file claims to be. From the computer's perspective, the extension imposes no restriction or rule and serves no function: netCDF data can be written to and read from a file with any extension." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 13, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "da.to_netcdf(\"data.csv\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 14, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "da.to_netcdf(\"data.nc_martin\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 15, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 11, draw: 100)>\n", | |
| "array([[[[0.996779, ..., 0.193241],\n", | |
| " ...,\n", | |
| " [0.50371 , ..., 0.654273]],\n", | |
| "\n", | |
| " ...,\n", | |
| "\n", | |
| " [[0.227811, ..., 0.912856],\n", | |
| " ...,\n", | |
| " [0.24642 , ..., 0.581184]]],\n", | |
| "\n", | |
| "\n", | |
| " [[[0.072334, ..., 0.684663],\n", | |
| " ...,\n", | |
| " [0.628984, ..., 0.358811]],\n", | |
| "\n", | |
| " ...,\n", | |
| "\n", | |
| " [[0.491698, ..., 0.876439],\n", | |
| " ...,\n", | |
| " [0.829525, ..., 0.611719]]]])\n", | |
| "Coordinates:\n", | |
| " * sex_id (sex_id) int64 1 2\n", | |
| " * age_group_id (age_group_id) int64 11 12 13\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n", | |
| " * draw (draw) int64 0 1 2 3 4 5 6 7 8 ... 91 92 93 94 95 96 97 98 99\n", | |
| "Attributes:\n", | |
| " metric: rate\n", | |
| " author: Me" | |
| ] | |
| }, | |
| "execution_count": 15, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "read_da1 = xr.open_dataarray(\"data.nc\")\n", | |
| "read_da1" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 16, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 11, draw: 100)>\n", | |
| "array([[[[0.996779, ..., 0.193241],\n", | |
| " ...,\n", | |
| " [0.50371 , ..., 0.654273]],\n", | |
| "\n", | |
| " ...,\n", | |
| "\n", | |
| " [[0.227811, ..., 0.912856],\n", | |
| " ...,\n", | |
| " [0.24642 , ..., 0.581184]]],\n", | |
| "\n", | |
| "\n", | |
| " [[[0.072334, ..., 0.684663],\n", | |
| " ...,\n", | |
| " [0.628984, ..., 0.358811]],\n", | |
| "\n", | |
| " ...,\n", | |
| "\n", | |
| " [[0.491698, ..., 0.876439],\n", | |
| " ...,\n", | |
| " [0.829525, ..., 0.611719]]]])\n", | |
| "Coordinates:\n", | |
| " * sex_id (sex_id) int64 1 2\n", | |
| " * age_group_id (age_group_id) int64 11 12 13\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n", | |
| " * draw (draw) int64 0 1 2 3 4 5 6 7 8 ... 91 92 93 94 95 96 97 98 99\n", | |
| "Attributes:\n", | |
| " metric: rate\n", | |
| " author: Me" | |
| ] | |
| }, | |
| "execution_count": 16, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "read_da2 = xr.open_dataarray(\"data.csv\")\n", | |
| "read_da2" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 17, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 11, draw: 100)>\n", | |
| "array([[[[0.996779, ..., 0.193241],\n", | |
| " ...,\n", | |
| " [0.50371 , ..., 0.654273]],\n", | |
| "\n", | |
| " ...,\n", | |
| "\n", | |
| " [[0.227811, ..., 0.912856],\n", | |
| " ...,\n", | |
| " [0.24642 , ..., 0.581184]]],\n", | |
| "\n", | |
| "\n", | |
| " [[[0.072334, ..., 0.684663],\n", | |
| " ...,\n", | |
| " [0.628984, ..., 0.358811]],\n", | |
| "\n", | |
| " ...,\n", | |
| "\n", | |
| " [[0.491698, ..., 0.876439],\n", | |
| " ...,\n", | |
| " [0.829525, ..., 0.611719]]]])\n", | |
| "Coordinates:\n", | |
| " * sex_id (sex_id) int64 1 2\n", | |
| " * age_group_id (age_group_id) int64 11 12 13\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n", | |
| " * draw (draw) int64 0 1 2 3 4 5 6 7 8 ... 91 92 93 94 95 96 97 98 99\n", | |
| "Attributes:\n", | |
| " metric: rate\n", | |
| " author: Me" | |
| ] | |
| }, | |
| "execution_count": 17, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "read_da3 = xr.open_dataarray(\"data.nc_martin\")\n", | |
| "read_da3" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "I mention this for two reasons:\n", | |
| "\n", | |
| "1) If you're used to saving ``.csv`` files or some other format, you have to be careful and get in the habit of saving netCDF files with the ``.nc`` extension, because nothing will stop you from using the wrong one.\n", | |
| "2) Sometimes we deliberately save netCDF files with something other than a plain ``.nc`` extension. For example, our risk-attributable pipeline saves files partitioned over cause, draw, and year, so we found it useful to name files in the following format: ``{acause}.ncdraw:range(100, 200)year_id:2017)``" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "# 6. Slicing and dicing data" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 18, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "mean_da = da.mean(\"draw\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 19, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 1, year_id: 2)>\n", | |
| "array([[[0.438642, 0.484231]],\n", | |
| "\n", | |
| " [[0.482908, 0.469676]]])\n", | |
| "Coordinates:\n", | |
| " * sex_id (sex_id) int64 1 2\n", | |
| " * age_group_id (age_group_id) int64 11\n", | |
| " * year_id (year_id) int64 1995 1996" | |
| ] | |
| }, | |
| "execution_count": 19, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "mean_da.sel(year_id=[1995, 1996], age_group_id=[11])" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
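| "Point (scalar) coordinates versus single-coordinate dimensions: selecting with a one-element list keeps the dimension with length 1, selecting with a scalar removes the dimension and leaves a scalar (point) coordinate behind, and adding ``drop=True`` discards that scalar coordinate as well." | |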
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 20, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (sex_id: 1, age_group_id: 1, year_id: 11)>\n", | |
| "array([[[0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908,\n", | |
| " 0.469676, 0.505307, 0.528676, 0.498105, 0.496416]]])\n", | |
| "Coordinates:\n", | |
| " * sex_id (sex_id) int64 2\n", | |
| " * age_group_id (age_group_id) int64 11\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000" | |
| ] | |
| }, | |
| "execution_count": 20, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "mean_da.sel(sex_id=[2], age_group_id=[11])" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 21, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (year_id: 11)>\n", | |
| "array([0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908, 0.469676,\n", | |
| " 0.505307, 0.528676, 0.498105, 0.496416])\n", | |
| "Coordinates:\n", | |
| " sex_id int64 2\n", | |
| " age_group_id int64 11\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000" | |
| ] | |
| }, | |
| "execution_count": 21, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "mean_da.sel(sex_id=2, age_group_id=11)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 22, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (year_id: 11)>\n", | |
| "array([0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908, 0.469676,\n", | |
| " 0.505307, 0.528676, 0.498105, 0.496416])\n", | |
| "Coordinates:\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 1994 ... 1997 1998 1999 2000" | |
| ] | |
| }, | |
| "execution_count": 22, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "mean_da.sel(sex_id=2, age_group_id=11, drop=True)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
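| "**Warning:** you'll want to avoid the following. With the xarray version used here, combining a list selection with a scalar selection and ``drop=True`` in a single ``.sel`` call also strips the ``year_id`` coordinate labels (note the `Dimensions without coordinates: year_id` line in the output)." | |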
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 23, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (sex_id: 2, year_id: 2)>\n", | |
| "array([[0.438642, 0.484231],\n", | |
| " [0.482908, 0.469676]])\n", | |
| "Coordinates:\n", | |
| " * sex_id (sex_id) int64 1 2\n", | |
| "Dimensions without coordinates: year_id" | |
| ] | |
| }, | |
| "execution_count": 23, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "mean_da.sel(year_id=[1995, 1996], age_group_id=11, drop=True)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Instead, do one of these:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 24, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (sex_id: 2, year_id: 2)>\n", | |
| "array([[0.438642, 0.484231],\n", | |
| " [0.482908, 0.469676]])\n", | |
| "Coordinates:\n", | |
| " * sex_id (sex_id) int64 1 2\n", | |
| " * year_id (year_id) int64 1995 1996" | |
| ] | |
| }, | |
| "execution_count": 24, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "mean_da.sel(year_id=[1995, 1996], age_group_id=11).drop(\"age_group_id\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 25, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (sex_id: 2, year_id: 2)>\n", | |
| "array([[0.438642, 0.484231],\n", | |
| " [0.482908, 0.469676]])\n", | |
| "Coordinates:\n", | |
| " * sex_id (sex_id) int64 1 2\n", | |
| " * year_id (year_id) int64 1995 1996" | |
| ] | |
| }, | |
| "execution_count": 25, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "mean_da.sel(year_id=[1995, 1996]).sel(age_group_id=11, drop=True)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Instead of slicing to specific coords, you can also exclude specific coords." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 26, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 1, year_id: 11)>\n", | |
| "array([[[0.501667, 0.520518, 0.521487, 0.511592, 0.554994, 0.520812,\n", | |
| " 0.515645, 0.575567, 0.48787 , 0.472843, 0.548013]],\n", | |
| "\n", | |
| " [[0.507428, 0.505268, 0.473393, 0.504639, 0.492401, 0.478396,\n", | |
| " 0.455247, 0.459401, 0.555081, 0.500056, 0.529371]]])\n", | |
| "Coordinates:\n", | |
| " * sex_id (sex_id) int64 1 2\n", | |
| " * age_group_id (age_group_id) int64 13\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000" | |
| ] | |
| }, | |
| "execution_count": 26, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "mean_da.drop([11, 12], \"age_group_id\")" | |
| ] | |
| }, | |
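| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "In newer xarray releases, `drop` is deprecated in favor of `drop_sel` (drop by coordinate label) and `drop_vars` (drop a coordinate or variable by name). A minimal sketch of both on a small made-up array (``toy_da`` is hypothetical and is not the ``mean_da`` used above):" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "import numpy as np\n", | |
| "import xarray as xr\n", | |
| "\n", | |
| "# Hypothetical toy array, for illustration only.\n", | |
| "toy_da = xr.DataArray(\n", | |
| "    np.arange(6.0).reshape(2, 3),\n", | |
| "    dims=[\"sex_id\", \"age_group_id\"],\n", | |
| "    coords={\"sex_id\": [1, 2], \"age_group_id\": [11, 12, 13]},\n", | |
| ")\n", | |
| "\n", | |
| "# Exclude specific coordinate labels along a dimension.\n", | |
| "excluded_da = toy_da.drop_sel(age_group_id=[11, 12])\n", | |
| "\n", | |
| "# Select a scalar, then remove the leftover point coordinate by name.\n", | |
| "toy_da.sel(age_group_id=11).drop_vars(\"age_group_id\")" | |
| ] | |
| }, | |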
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "# 7. Changing values" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Another way to slice data is with the `.loc[]` indexer, but I recommend using it only when you're actually\n", | |
| "_changing_ values for a given slice of the data. Assigning to a `.loc[]` slice modifies the original array in place,\n", | |
| "and a slice read from either `.loc[]` or `.sel` may be a view of the original data rather than a deep copy,\n", | |
| "so work on an explicit `.copy()` (as below) if you need to keep the original values." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 27, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "mean_da_cp = mean_da.copy()" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 28, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 1, year_id: 11)>\n", | |
| "array([[[0.515046, 0.482924, 0.519905, 0.46185 , 0.465647, 0.438642,\n", | |
| " 0.484231, 0.543931, 0.512703, 0.495674, 0.490514]],\n", | |
| "\n", | |
| " [[0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908,\n", | |
| " 0.469676, 0.505307, 0.528676, 0.498105, 0.496416]]])\n", | |
| "Coordinates:\n", | |
| " * sex_id (sex_id) int64 1 2\n", | |
| " * age_group_id (age_group_id) int64 11\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000" | |
| ] | |
| }, | |
| "execution_count": 28, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "mean_da_cp.sel(age_group_id=[11])" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "The next operation can't be done, because Python doesn't allow assigning to the result of a function call such as ``.sel(...)``:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 29, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "ename": "SyntaxError", | |
| "evalue": "can't assign to function call (<ipython-input-29-815bf6d51d25>, line 1)", | |
| "output_type": "error", | |
| "traceback": [ | |
| "\u001b[0;36m File \u001b[0;32m\"<ipython-input-29-815bf6d51d25>\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m mean_da_cp.sel(age_group_id=[11]) = mean_da_cp.sel(age_group_id=[11]) + 100\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m can't assign to function call\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "mean_da_cp.sel(age_group_id=[11]) = mean_da_cp.sel(age_group_id=[11]) + 100" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Use ``.loc[]`` instead, which does support assignment:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 30, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "mean_da_cp.loc[dict(age_group_id=[11])] = mean_da_cp.loc[dict(age_group_id=[11])] + 200" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 31, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 1, year_id: 11)>\n", | |
| "array([[[200.515046, 200.482924, 200.519905, 200.46185 , 200.465647,\n", | |
| " 200.438642, 200.484231, 200.543931, 200.512703, 200.495674,\n", | |
| " 200.490514]],\n", | |
| "\n", | |
| " [[200.480634, 200.501324, 200.48249 , 200.489984, 200.434139,\n", | |
| " 200.482908, 200.469676, 200.505307, 200.528676, 200.498105,\n", | |
| " 200.496416]]])\n", | |
| "Coordinates:\n", | |
| " * sex_id (sex_id) int64 1 2\n", | |
| " * age_group_id (age_group_id) int64 11\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000" | |
| ] | |
| }, | |
| "execution_count": 31, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "mean_da_cp.sel(age_group_id=[11])" | |
| ] | |
| }, | |
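| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "A small self-contained check of the pattern above, using a made-up ``toy_da`` (hypothetical, for illustration only): assigning through ``.loc[]`` changes the array it is called on, so working on a ``.copy()`` leaves the original untouched." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "import numpy as np\n", | |
| "import xarray as xr\n", | |
| "\n", | |
| "# Hypothetical toy array of zeros, for illustration only.\n", | |
| "toy_da = xr.DataArray(\n", | |
| "    np.zeros((2, 3)),\n", | |
| "    dims=[\"sex_id\", \"age_group_id\"],\n", | |
| "    coords={\"sex_id\": [1, 2], \"age_group_id\": [11, 12, 13]},\n", | |
| ")\n", | |
| "toy_cp = toy_da.copy()  # copy() makes a deep copy by default\n", | |
| "\n", | |
| "# In-place update of one age group, on the copy only.\n", | |
| "toy_cp.loc[dict(age_group_id=[11])] = toy_cp.loc[dict(age_group_id=[11])] + 200\n", | |
| "\n", | |
| "# Compare the same cell in the copy and in the original.\n", | |
| "copy_value = float(toy_cp.sel(sex_id=1, age_group_id=11))\n", | |
| "original_value = float(toy_da.sel(sex_id=1, age_group_id=11))\n", | |
| "copy_value, original_value" | |
| ] | |
| }, | |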
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "# 8. Data Reduction\n", | |
| "\n", | |
| "Taking the sum, mean, quantile, or product over one or more dimensions.\n", | |
| "\n", | |
| "Also taking diff, cumprod, cumsum, etc." | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Summing over dimensions" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 32, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 11)>\n", | |
| "array([[[51.504646, 48.292353, 51.99048 , 46.18502 , 46.564747, 43.864225,\n", | |
| " 48.423078, 54.393145, 51.270341, 49.567356, 49.051368],\n", | |
| " [47.937805, 48.381889, 53.578497, 55.738359, 49.353499, 50.416803,\n", | |
| " 48.22734 , 47.17919 , 50.830201, 46.291025, 48.116702],\n", | |
| " [50.16666 , 52.051829, 52.148661, 51.15924 , 55.499361, 52.081173,\n", | |
| " 51.564503, 57.55673 , 48.786959, 47.284275, 54.801283]],\n", | |
| "\n", | |
| " [[48.063404, 50.13239 , 48.248955, 48.998371, 43.413857, 48.290827,\n", | |
| " 46.967581, 50.530699, 52.867631, 49.810484, 49.641557],\n", | |
| " [47.680474, 53.340312, 53.079849, 54.45405 , 51.181648, 48.059735,\n", | |
| " 49.768662, 48.72965 , 47.350284, 47.70728 , 45.04015 ],\n", | |
| " [50.742754, 50.526813, 47.339263, 50.463871, 49.240083, 47.83956 ,\n", | |
| " 45.524688, 45.940104, 55.508125, 50.005635, 52.937063]]])\n", | |
| "Coordinates:\n", | |
| " * sex_id (sex_id) int64 1 2\n", | |
| " * age_group_id (age_group_id) int64 11 12 13\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000" | |
| ] | |
| }, | |
| "execution_count": 32, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "da.sum(\"draw\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Taking means" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 33, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (age_group_id: 3, year_id: 11)>\n", | |
| "array([[0.49784 , 0.492124, 0.501197, 0.475917, 0.449893, 0.460775, 0.476953,\n", | |
| " 0.524619, 0.52069 , 0.496889, 0.493465],\n", | |
| " [0.478091, 0.508611, 0.533292, 0.550962, 0.502676, 0.492383, 0.48998 ,\n", | |
| " 0.479544, 0.490902, 0.469992, 0.465784],\n", | |
| " [0.504547, 0.512893, 0.49744 , 0.508116, 0.523697, 0.499604, 0.485446,\n", | |
| " 0.517484, 0.521475, 0.48645 , 0.538692]])\n", | |
| "Coordinates:\n", | |
| " * age_group_id (age_group_id) int64 11 12 13\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000" | |
| ] | |
| }, | |
| "execution_count": 33, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "da.mean([\"draw\", \"sex_id\"])" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 34, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' ()>\n", | |
| "array(3289.684552)" | |
| ] | |
| }, | |
| "execution_count": 34, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "da.sum()" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Taking quantiles" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 35, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (quantile: 2, sex_id: 2, age_group_id: 3, year_id: 11)>\n", | |
| "array([[[[0.946091, 0.947006, 0.989351, 0.938826, 0.957233, 0.961054,\n", | |
| " 0.962361, 0.9847 , 0.991939, 0.992401, 0.931928],\n", | |
| " [0.96409 , 0.989095, 0.971059, 0.932266, 0.992066, 0.995731,\n", | |
| " 0.961511, 0.95729 , 0.929298, 0.971658, 0.961813],\n", | |
| " [0.985317, 0.95646 , 0.971972, 0.984797, 0.965313, 0.9662 ,\n", | |
| " 0.967894, 0.966558, 0.983227, 0.975196, 0.969915]],\n", | |
| "\n", | |
| " [[0.968146, 0.976099, 0.977238, 0.909349, 0.948866, 0.948609,\n", | |
| " 0.951318, 0.960428, 0.949223, 0.983426, 0.994721],\n", | |
| " [0.953117, 0.944793, 0.99642 , 0.971521, 0.951413, 0.926127,\n", | |
| " 0.945339, 0.983831, 0.976795, 0.989499, 0.965297],\n", | |
| " [0.978185, 0.975026, 0.967635, 0.991123, 0.945305, 0.965985,\n", | |
| " 0.937724, 0.966592, 0.944686, 0.947914, 0.969859]]],\n", | |
| "\n", | |
| "\n", | |
| " [[[0.035937, 0.032809, 0.036074, 0.054938, 0.020817, 0.016693,\n", | |
| " 0.010495, 0.052086, 0.032586, 0.063762, 0.043832],\n", | |
| " [0.040247, 0.042332, 0.038767, 0.022039, 0.02663 , 0.073381,\n", | |
| " 0.016937, 0.019657, 0.063141, 0.015542, 0.033979],\n", | |
| " [0.019053, 0.023236, 0.041205, 0.024052, 0.025952, 0.046415,\n", | |
| " 0.013963, 0.082439, 0.039138, 0.028172, 0.087198]],\n", | |
| "\n", | |
| " [[0.022962, 0.039559, 0.045023, 0.056532, 0.038189, 0.013773,\n", | |
| " 0.024106, 0.024428, 0.078114, 0.020636, 0.034975],\n", | |
| " [0.053146, 0.026986, 0.03948 , 0.03409 , 0.055288, 0.030514,\n", | |
| " 0.04947 , 0.028957, 0.035929, 0.0153 , 0.021199],\n", | |
| " [0.040169, 0.051678, 0.023015, 0.046181, 0.02122 , 0.017725,\n", | |
| " 0.034586, 0.014844, 0.019773, 0.030668, 0.030164]]]])\n", | |
| "Coordinates:\n", | |
| " * sex_id (sex_id) int64 1 2\n", | |
| " * age_group_id (age_group_id) int64 11 12 13\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n", | |
| " * quantile (quantile) float64 0.975 0.025" | |
| ] | |
| }, | |
| "execution_count": 35, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "da.quantile([0.975, 0.025], dim=\"draw\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Cumulative sums and products" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 36, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (year_id: 11)>\n", | |
| "array([0.515046, 0.482924, 0.519905, 0.46185 , 0.465647, 0.438642, 0.484231,\n", | |
| " 0.543931, 0.512703, 0.495674, 0.490514])\n", | |
| "Coordinates:\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 1994 ... 1997 1998 1999 2000" | |
| ] | |
| }, | |
| "execution_count": 36, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "data_slice_da = mean_da.sel(sex_id=1, age_group_id=11, drop=True)\n", | |
| "data_slice_da" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 37, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (year_id: 11)>\n", | |
| "array([0.515046, 0.99797 , 1.517875, 1.979725, 2.445372, 2.884015, 3.368245,\n", | |
| " 3.912177, 4.42488 , 4.920554, 5.411068])\n", | |
| "Coordinates:\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 1994 ... 1997 1998 1999 2000" | |
| ] | |
| }, | |
| "execution_count": 37, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "data_slice_da.cumsum(\"year_id\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 38, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (year_id: 11)>\n", | |
| "array([5.150465e-01, 2.487281e-01, 1.293149e-01, 5.972412e-02, 2.781038e-02,\n", | |
| " 1.219881e-02, 5.907039e-03, 3.213024e-03, 1.647328e-03, 8.165372e-04,\n", | |
| " 4.005227e-04])\n", | |
| "Coordinates:\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 1994 ... 1997 1998 1999 2000" | |
| ] | |
| }, | |
| "execution_count": 38, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "data_slice_da.cumprod(\"year_id\")" | |
| ] | |
| }, | |
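| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "The section intro also mentions ``diff``. A minimal sketch on a made-up yearly series (``toy_da`` is hypothetical): ``diff`` takes the difference between consecutive coordinates along a dimension, so the result is one element shorter, and by default each difference is labeled with the later coordinate of the pair." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "import xarray as xr\n", | |
| "\n", | |
| "# Hypothetical toy series, for illustration only.\n", | |
| "toy_da = xr.DataArray(\n", | |
| "    [1.0, 3.0, 6.0, 10.0],\n", | |
| "    dims=[\"year_id\"],\n", | |
| "    coords={\"year_id\": [1990, 1991, 1992, 1993]},\n", | |
| ")\n", | |
| "\n", | |
| "# Year-over-year change; the first year has no predecessor, so it is dropped.\n", | |
| "toy_da.diff(\"year_id\")" | |
| ] | |
| }, | |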
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "# 9. Vectorized operations\n", | |
| "\n", | |
| "Adding, multiplying, dividing two or more arrays" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Xarray lines up all of the dimensions and coordinates for you when performing arithmetic between two or more arrays.\n", | |
| "Of course, if you line things up yourself beforehand, computation will be faster (we can discuss that more later)." | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "It's okay if they don't share dimensions: data will automatically broadcast over dimensions that one operand lacks. For example,\n", | |
| "below, the right operand doesn't have a draw dimension, so the value of each of its slices is applied to every draw of the corresponding slice from the left operand.\n", | |
| "\n", | |
| "That is,\n", | |
| "\n", | |
| "``mean_da.sel(age_group_id=11, sex_id=2, year_id=1996)`` is added to\n", | |
| "``da.sel(age_group_id=11, sex_id=2, year_id=1996, draw=i)`` for every ``i`` from 0 to 99." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 42, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "result_da = da + mean_da  # try it with other operators too!" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 43, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (draw: 3)>\n", | |
| "array([0.956272, 0.969894, 1.22502 ])\n", | |
| "Coordinates:\n", | |
| " sex_id int64 2\n", | |
| " age_group_id int64 11\n", | |
| " year_id int64 1996\n", | |
| " * draw (draw) int64 0 1 2" | |
| ] | |
| }, | |
| "execution_count": 43, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "result_da.sel(age_group_id=11, sex_id=2, year_id=1996, draw=[0, 1, 2])" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 44, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (draw: 3)>\n", | |
| "array([0.486596, 0.500218, 0.755344])\n", | |
| "Coordinates:\n", | |
| " sex_id int64 2\n", | |
| " age_group_id int64 11\n", | |
| " year_id int64 1996\n", | |
| " * draw (draw) int64 0 1 2\n", | |
| "Attributes:\n", | |
| " metric: rate\n", | |
| " author: Me" | |
| ] | |
| }, | |
| "execution_count": 44, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "da.sel(age_group_id=11, sex_id=2, year_id=1996, draw=[0, 1, 2])" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 45, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' ()>\n", | |
| "array(0.469676)\n", | |
| "Coordinates:\n", | |
| " sex_id int64 2\n", | |
| " age_group_id int64 11\n", | |
| " year_id int64 1996" | |
| ] | |
| }, | |
| "execution_count": 45, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "mean_da.sel(age_group_id=11, sex_id=2, year_id=1996)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Xarray also defaults to taking the intersection of the coordinates from each of the operands, so below only ``age_group_id`` 13 survives." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 46, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 1, year_id: 11)>\n", | |
| "array([[[1.003333, 1.041037, 1.042973, 1.023185, 1.109987, 1.041623,\n", | |
| " 1.03129 , 1.151135, 0.975739, 0.945686, 1.096026]],\n", | |
| "\n", | |
| " [[1.014855, 1.010536, 0.946785, 1.009277, 0.984802, 0.956791,\n", | |
| " 0.910494, 0.918802, 1.110162, 1.000113, 1.058741]]])\n", | |
| "Coordinates:\n", | |
| " * age_group_id (age_group_id) int64 13\n", | |
| " * sex_id (sex_id) int64 1 2\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000" | |
| ] | |
| }, | |
| "execution_count": 46, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "3 * mean_da - mean_da.sel(age_group_id=[13])" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "There are ways around the above if you want the union of the two. Of course the output will have NaNs where the operands\n", | |
| "don't line up (i.e. where they don't have the same coordinates of a dimension they share)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 47, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 11)>\n", | |
| "array([[[ nan, nan, nan, nan, nan, nan,\n", | |
| " nan, nan, nan, nan, nan],\n", | |
| " [ nan, nan, nan, nan, nan, nan,\n", | |
| " nan, nan, nan, nan, nan],\n", | |
| " [1.003333, 1.041037, 1.042973, 1.023185, 1.109987, 1.041623,\n", | |
| " 1.03129 , 1.151135, 0.975739, 0.945686, 1.096026]],\n", | |
| "\n", | |
| " [[ nan, nan, nan, nan, nan, nan,\n", | |
| " nan, nan, nan, nan, nan],\n", | |
| " [ nan, nan, nan, nan, nan, nan,\n", | |
| " nan, nan, nan, nan, nan],\n", | |
| " [1.014855, 1.010536, 0.946785, 1.009277, 0.984802, 0.956791,\n", | |
| " 0.910494, 0.918802, 1.110162, 1.000113, 1.058741]]])\n", | |
| "Coordinates:\n", | |
| " * age_group_id (age_group_id) int64 11 12 13\n", | |
| " * sex_id (sex_id) int64 1 2\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000" | |
| ] | |
| }, | |
| "execution_count": 47, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "with xr.set_options(arithmetic_join=\"outer\"):\n", | |
| " result = 3 * mean_da - mean_da.sel(age_group_id=[13])\n", | |
| "result" | |
| ] | |
| }, | |
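| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Another way to get the union, sketched on made-up toy arrays (``a`` and ``b`` are hypothetical): align the operands explicitly with ``xr.align(..., join=\"outer\")`` and optionally fill the gaps before doing arithmetic. Whether filling with 0 is appropriate depends on what the data means, so treat the ``fillna(0)`` here as an assumption for illustration." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "import xarray as xr\n", | |
| "\n", | |
| "# Hypothetical toy arrays that only partly overlap on age_group_id.\n", | |
| "a = xr.DataArray([1.0, 2.0, 3.0], dims=[\"age_group_id\"], coords={\"age_group_id\": [11, 12, 13]})\n", | |
| "b = xr.DataArray([10.0], dims=[\"age_group_id\"], coords={\"age_group_id\": [13]})\n", | |
| "\n", | |
| "# Outer-align first: b gains NaN at age groups 11 and 12.\n", | |
| "a_aligned, b_aligned = xr.align(a, b, join=\"outer\")\n", | |
| "\n", | |
| "# Assumption: treat missing values in b as 0 so the result has no NaNs.\n", | |
| "a_aligned - b_aligned.fillna(0)" | |
| ] | |
| }, | |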
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "I want to point out another subtle feature of xarray that I used just above: applying a scalar to the data. Even this simple task is quite a bit more work in pandas, because you don't want to add 1000 to your metadata as well as your data." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 48, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 11)>\n", | |
| "array([[[1000.515046, 1000.482924, 1000.519905, 1000.46185 , 1000.465647,\n", | |
| " 1000.438642, 1000.484231, 1000.543931, 1000.512703, 1000.495674,\n", | |
| " 1000.490514],\n", | |
| " [1000.479378, 1000.483819, 1000.535785, 1000.557384, 1000.493535,\n", | |
| " 1000.504168, 1000.482273, 1000.471792, 1000.508302, 1000.46291 ,\n", | |
| " 1000.481167],\n", | |
| " [1000.501667, 1000.520518, 1000.521487, 1000.511592, 1000.554994,\n", | |
| " 1000.520812, 1000.515645, 1000.575567, 1000.48787 , 1000.472843,\n", | |
| " 1000.548013]],\n", | |
| "\n", | |
| " [[1000.480634, 1000.501324, 1000.48249 , 1000.489984, 1000.434139,\n", | |
| " 1000.482908, 1000.469676, 1000.505307, 1000.528676, 1000.498105,\n", | |
| " 1000.496416],\n", | |
| " [1000.476805, 1000.533403, 1000.530798, 1000.54454 , 1000.511816,\n", | |
| " 1000.480597, 1000.497687, 1000.487296, 1000.473503, 1000.477073,\n", | |
| " 1000.450402],\n", | |
| " [1000.507428, 1000.505268, 1000.473393, 1000.504639, 1000.492401,\n", | |
| " 1000.478396, 1000.455247, 1000.459401, 1000.555081, 1000.500056,\n", | |
| " 1000.529371]]])\n", | |
| "Coordinates:\n", | |
| " * sex_id (sex_id) int64 1 2\n", | |
| " * age_group_id (age_group_id) int64 11 12 13\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000" | |
| ] | |
| }, | |
| "execution_count": 48, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "mean_da + 1000" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "# 10. Changing and adding coordinates/Expanding or broadcasting dimensions" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 49, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "broadcasted_da = mean_da.expand_dims(draw=range(5), location_id=[102, 6])" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 50, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (draw: 5, location_id: 2)>\n", | |
| "array([[0.482908, 0.482908],\n", | |
| " [0.482908, 0.482908],\n", | |
| " [0.482908, 0.482908],\n", | |
| " [0.482908, 0.482908],\n", | |
| " [0.482908, 0.482908]])\n", | |
| "Coordinates:\n", | |
| " * draw (draw) int64 0 1 2 3 4\n", | |
| " * location_id (location_id) int64 102 6\n", | |
| " sex_id int64 2\n", | |
| " age_group_id int64 11\n", | |
| " year_id int64 1995" | |
| ] | |
| }, | |
| "execution_count": 50, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "broadcasted_da.sel(age_group_id=11, sex_id=2, year_id=1995)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "I'm trying to limit the use of our internal FHS code here, but we do have a really nice, fast tool for expanding an existing dimension to include new coordinates." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 51, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "from fbd_core.etl import expand_dimensions" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Note the difference between broadcasting and expanding:\n", | |
| "* **Broadcasting** here means altering the array to include dimensions it didn't previously have, where each coordinate on a given new dimension points to an identical slice.\n", | |
| "* **Expanding** here means altering the array to include coordinates it didn't previously have, but on a dimension that already existed." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 52, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 5, year_id: 11, scenario: 3)>\n", | |
| "array([[[[0.515046, ..., 0.515046],\n", | |
| " ...,\n", | |
| " [0.490514, ..., 0.490514]],\n", | |
| "\n", | |
| " ...,\n", | |
| "\n", | |
| " [[ nan, ..., nan],\n", | |
| " ...,\n", | |
| " [ nan, ..., nan]]],\n", | |
| "\n", | |
| "\n", | |
| " [[[0.480634, ..., 0.480634],\n", | |
| " ...,\n", | |
| " [0.496416, ..., 0.496416]],\n", | |
| "\n", | |
| " ...,\n", | |
| "\n", | |
| " [[ nan, ..., nan],\n", | |
| " ...,\n", | |
| " [ nan, ..., nan]]]])\n", | |
| "Coordinates:\n", | |
| " * age_group_id (age_group_id) int64 11 12 13 14 15\n", | |
| " * sex_id (sex_id) int64 1 2\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n", | |
| " * scenario (scenario) int64 0 1 -1" | |
| ] | |
| }, | |
| "execution_count": 52, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "expand_dimensions(mean_da, age_group_id=[14, 15], scenario=[0, 1, -1])" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 53, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 5, year_id: 11, scenario: 3)>\n", | |
| "array([[[[5.150465e-01, ..., 5.150465e-01],\n", | |
| " ...,\n", | |
| " [4.905137e-01, ..., 4.905137e-01]],\n", | |
| "\n", | |
| " ...,\n", | |
| "\n", | |
| " [[9.999000e+03, ..., 9.999000e+03],\n", | |
| " ...,\n", | |
| " [9.999000e+03, ..., 9.999000e+03]]],\n", | |
| "\n", | |
| "\n", | |
| " [[[4.806340e-01, ..., 4.806340e-01],\n", | |
| " ...,\n", | |
| " [4.964156e-01, ..., 4.964156e-01]],\n", | |
| "\n", | |
| " ...,\n", | |
| "\n", | |
| " [[9.999000e+03, ..., 9.999000e+03],\n", | |
| " ...,\n", | |
| " [9.999000e+03, ..., 9.999000e+03]]]])\n", | |
| "Coordinates:\n", | |
| " * age_group_id (age_group_id) int64 11 12 13 14 15\n", | |
| " * sex_id (sex_id) int64 1 2\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n", | |
| " * scenario (scenario) int64 0 1 -1" | |
| ] | |
| }, | |
| "execution_count": 53, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "expand_dimensions(mean_da, age_group_id=[14, 15], scenario=[0, 1, -1], fill_value=9999)" | |
| ] | |
| }, | |
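| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Since ``expand_dimensions`` is internal, here is a rough approximation using only public xarray calls on a made-up ``toy_da``. This is a sketch of similar behavior, not the internal implementation: ``reindex`` adds new coordinates to an existing dimension (filling them with ``fill_value``), and ``expand_dims`` broadcasts over a brand-new dimension. Note that ``expand_dims`` puts the new dimension first, unlike the output above." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "import numpy as np\n", | |
| "import xarray as xr\n", | |
| "\n", | |
| "# Hypothetical toy array, for illustration only.\n", | |
| "toy_da = xr.DataArray(\n", | |
| "    np.ones((2, 3)),\n", | |
| "    dims=[\"sex_id\", \"age_group_id\"],\n", | |
| "    coords={\"sex_id\": [1, 2], \"age_group_id\": [11, 12, 13]},\n", | |
| ")\n", | |
| "\n", | |
| "# Expand an existing dimension: new age groups 14 and 15 get the fill value.\n", | |
| "expanded_da = toy_da.reindex(age_group_id=[11, 12, 13, 14, 15], fill_value=9999)\n", | |
| "\n", | |
| "# Broadcast over a new dimension: every scenario gets an identical slice.\n", | |
| "expanded_da.expand_dims(scenario=[0, 1, -1])" | |
| ] | |
| }, | |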
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Changing coordinate or dimension labels" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 54, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "renamed_da = mean_da.rename({\"sex_id\": \"sex_name\", \"age_group_id\": \"age_group_name\"})" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 55, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (sex_name: 2, age_group_name: 3, year_id: 11)>\n", | |
| "array([[[0.515046, 0.482924, 0.519905, 0.46185 , 0.465647, 0.438642,\n", | |
| " 0.484231, 0.543931, 0.512703, 0.495674, 0.490514],\n", | |
| " [0.479378, 0.483819, 0.535785, 0.557384, 0.493535, 0.504168,\n", | |
| " 0.482273, 0.471792, 0.508302, 0.46291 , 0.481167],\n", | |
| " [0.501667, 0.520518, 0.521487, 0.511592, 0.554994, 0.520812,\n", | |
| " 0.515645, 0.575567, 0.48787 , 0.472843, 0.548013]],\n", | |
| "\n", | |
| " [[0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908,\n", | |
| " 0.469676, 0.505307, 0.528676, 0.498105, 0.496416],\n", | |
| " [0.476805, 0.533403, 0.530798, 0.54454 , 0.511816, 0.480597,\n", | |
| " 0.497687, 0.487296, 0.473503, 0.477073, 0.450402],\n", | |
| " [0.507428, 0.505268, 0.473393, 0.504639, 0.492401, 0.478396,\n", | |
| " 0.455247, 0.459401, 0.555081, 0.500056, 0.529371]]])\n", | |
| "Coordinates:\n", | |
| " * sex_name (sex_name) <U6 'Male' 'Female'\n", | |
| " * age_group_name (age_group_name) int64 11 12 13\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000" | |
| ] | |
| }, | |
| "execution_count": 55, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "renamed_da.assign_coords(sex_name=[\"Male\", \"Female\"])" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "# Concatenating multiple DataArrays into one DataArray" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 56, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "future_da = xr.DataArray(\n", | |
| " data=np.random.random([2, 3, 4]),\n", | |
| " dims=[\"sex_id\", \"age_group_id\", \"year_id\"],\n", | |
| " coords={\n", | |
| " \"sex_id\": [1, 2],\n", | |
| " \"age_group_id\": [11, 12, 13],\n", | |
| " \"year_id\": range(2001, 2005),\n", | |
| " },\n", | |
| " name=\"fake_thing\"\n", | |
| " )" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 57, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 15)>\n", | |
| "array([[[0.515046, 0.482924, 0.519905, 0.46185 , 0.465647, 0.438642,\n", | |
| " 0.484231, 0.543931, 0.512703, 0.495674, 0.490514, 0.997692,\n", | |
| " 0.001631, 0.115671, 0.44467 ],\n", | |
| " [0.479378, 0.483819, 0.535785, 0.557384, 0.493535, 0.504168,\n", | |
| " 0.482273, 0.471792, 0.508302, 0.46291 , 0.481167, 0.8696 ,\n", | |
| " 0.813101, 0.781387, 0.394234],\n", | |
| " [0.501667, 0.520518, 0.521487, 0.511592, 0.554994, 0.520812,\n", | |
| " 0.515645, 0.575567, 0.48787 , 0.472843, 0.548013, 0.811391,\n", | |
| " 0.491746, 0.697288, 0.206781]],\n", | |
| "\n", | |
| " [[0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908,\n", | |
| " 0.469676, 0.505307, 0.528676, 0.498105, 0.496416, 0.277114,\n", | |
| " 0.925087, 0.867205, 0.221227],\n", | |
| " [0.476805, 0.533403, 0.530798, 0.54454 , 0.511816, 0.480597,\n", | |
| " 0.497687, 0.487296, 0.473503, 0.477073, 0.450402, 0.508261,\n", | |
| " 0.63895 , 0.727136, 0.085397],\n", | |
| " [0.507428, 0.505268, 0.473393, 0.504639, 0.492401, 0.478396,\n", | |
| " 0.455247, 0.459401, 0.555081, 0.500056, 0.529371, 0.610459,\n", | |
| " 0.217386, 0.575926, 0.349022]]])\n", | |
| "Coordinates:\n", | |
| " * sex_id (sex_id) int64 1 2\n", | |
| " * age_group_id (age_group_id) int64 11 12 13\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 ... 2001 2002 2003 2004" | |
| ] | |
| }, | |
| "execution_count": 57, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "xr.concat([mean_da, future_da], dim=\"year_id\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 58, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 15)>\n", | |
| "array([[[0.515046, 0.482924, 0.519905, 0.46185 , 0.465647, 0.438642,\n", | |
| " 0.484231, 0.543931, 0.512703, 0.495674, 0.490514, 0.997692,\n", | |
| " 0.001631, 0.115671, 0.44467 ],\n", | |
| " [0.479378, 0.483819, 0.535785, 0.557384, 0.493535, 0.504168,\n", | |
| " 0.482273, 0.471792, 0.508302, 0.46291 , 0.481167, 0.8696 ,\n", | |
| " 0.813101, 0.781387, 0.394234],\n", | |
| " [ nan, nan, nan, nan, nan, nan,\n", | |
| " nan, nan, nan, nan, nan, 0.811391,\n", | |
| " 0.491746, 0.697288, 0.206781]],\n", | |
| "\n", | |
| " [[0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908,\n", | |
| " 0.469676, 0.505307, 0.528676, 0.498105, 0.496416, 0.277114,\n", | |
| " 0.925087, 0.867205, 0.221227],\n", | |
| " [0.476805, 0.533403, 0.530798, 0.54454 , 0.511816, 0.480597,\n", | |
| " 0.497687, 0.487296, 0.473503, 0.477073, 0.450402, 0.508261,\n", | |
| " 0.63895 , 0.727136, 0.085397],\n", | |
| " [ nan, nan, nan, nan, nan, nan,\n", | |
| " nan, nan, nan, nan, nan, 0.610459,\n", | |
| " 0.217386, 0.575926, 0.349022]]])\n", | |
| "Coordinates:\n", | |
| " * age_group_id (age_group_id) int64 11 12 13\n", | |
| " * sex_id (sex_id) int64 1 2\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 ... 2001 2002 2003 2004" | |
| ] | |
| }, | |
| "execution_count": 58, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "xr.concat([mean_da.sel(age_group_id=[11, 12]), future_da], dim=\"year_id\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 59, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "summary_da = xr.concat([\n", | |
| " da.mean(\"draw\").assign_coords(summary_val=\"mean\"),\n", | |
| " da.quantile([0.975, 0.025], \"draw\").rename({\"quantile\": \"summary_val\"}).assign_coords(summary_val=[\"upper\", \"lower\"])\n", | |
| " ], dim=\"summary_val\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 60, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 11, summary_val: 3)>\n", | |
| "array([[[[0.515046, 0.946091, 0.035937],\n", | |
| " [0.482924, 0.947006, 0.032809],\n", | |
| " [0.519905, 0.989351, 0.036074],\n", | |
| " [0.46185 , 0.938826, 0.054938],\n", | |
| " [0.465647, 0.957233, 0.020817],\n", | |
| " [0.438642, 0.961054, 0.016693],\n", | |
| " [0.484231, 0.962361, 0.010495],\n", | |
| " [0.543931, 0.9847 , 0.052086],\n", | |
| " [0.512703, 0.991939, 0.032586],\n", | |
| " [0.495674, 0.992401, 0.063762],\n", | |
| " [0.490514, 0.931928, 0.043832]],\n", | |
| "\n", | |
| " [[0.479378, 0.96409 , 0.040247],\n", | |
| " [0.483819, 0.989095, 0.042332],\n", | |
| " [0.535785, 0.971059, 0.038767],\n", | |
| " [0.557384, 0.932266, 0.022039],\n", | |
| " [0.493535, 0.992066, 0.02663 ],\n", | |
| " [0.504168, 0.995731, 0.073381],\n", | |
| " [0.482273, 0.961511, 0.016937],\n", | |
| " [0.471792, 0.95729 , 0.019657],\n", | |
| " [0.508302, 0.929298, 0.063141],\n", | |
| " [0.46291 , 0.971658, 0.015542],\n", | |
| " [0.481167, 0.961813, 0.033979]],\n", | |
| "\n", | |
| " [[0.501667, 0.985317, 0.019053],\n", | |
| " [0.520518, 0.95646 , 0.023236],\n", | |
| " [0.521487, 0.971972, 0.041205],\n", | |
| " [0.511592, 0.984797, 0.024052],\n", | |
| " [0.554994, 0.965313, 0.025952],\n", | |
| " [0.520812, 0.9662 , 0.046415],\n", | |
| " [0.515645, 0.967894, 0.013963],\n", | |
| " [0.575567, 0.966558, 0.082439],\n", | |
| " [0.48787 , 0.983227, 0.039138],\n", | |
| " [0.472843, 0.975196, 0.028172],\n", | |
| " [0.548013, 0.969915, 0.087198]]],\n", | |
| "\n", | |
| "\n", | |
| " [[[0.480634, 0.968146, 0.022962],\n", | |
| " [0.501324, 0.976099, 0.039559],\n", | |
| " [0.48249 , 0.977238, 0.045023],\n", | |
| " [0.489984, 0.909349, 0.056532],\n", | |
| " [0.434139, 0.948866, 0.038189],\n", | |
| " [0.482908, 0.948609, 0.013773],\n", | |
| " [0.469676, 0.951318, 0.024106],\n", | |
| " [0.505307, 0.960428, 0.024428],\n", | |
| " [0.528676, 0.949223, 0.078114],\n", | |
| " [0.498105, 0.983426, 0.020636],\n", | |
| " [0.496416, 0.994721, 0.034975]],\n", | |
| "\n", | |
| " [[0.476805, 0.953117, 0.053146],\n", | |
| " [0.533403, 0.944793, 0.026986],\n", | |
| " [0.530798, 0.99642 , 0.03948 ],\n", | |
| " [0.54454 , 0.971521, 0.03409 ],\n", | |
| " [0.511816, 0.951413, 0.055288],\n", | |
| " [0.480597, 0.926127, 0.030514],\n", | |
| " [0.497687, 0.945339, 0.04947 ],\n", | |
| " [0.487296, 0.983831, 0.028957],\n", | |
| " [0.473503, 0.976795, 0.035929],\n", | |
| " [0.477073, 0.989499, 0.0153 ],\n", | |
| " [0.450402, 0.965297, 0.021199]],\n", | |
| "\n", | |
| " [[0.507428, 0.978185, 0.040169],\n", | |
| " [0.505268, 0.975026, 0.051678],\n", | |
| " [0.473393, 0.967635, 0.023015],\n", | |
| " [0.504639, 0.991123, 0.046181],\n", | |
| " [0.492401, 0.945305, 0.02122 ],\n", | |
| " [0.478396, 0.965985, 0.017725],\n", | |
| " [0.455247, 0.937724, 0.034586],\n", | |
| " [0.459401, 0.966592, 0.014844],\n", | |
| " [0.555081, 0.944686, 0.019773],\n", | |
| " [0.500056, 0.947914, 0.030668],\n", | |
| " [0.529371, 0.969859, 0.030164]]]])\n", | |
| "Coordinates:\n", | |
| " * sex_id (sex_id) int64 1 2\n", | |
| " * age_group_id (age_group_id) int64 11 12 13\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n", | |
| " * summary_val (summary_val) <U5 'mean' 'upper' 'lower'" | |
| ] | |
| }, | |
| "execution_count": 60, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "summary_da" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "``combine_first`` is another tool. It is similar to ``concat`` but has a nice additional feature: if the array you're appending\n", | |
| "has data for coords that already exist in the original array, then only the original array's data will be used for those coords." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 61, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 15)>\n", | |
| "array([[[0.515046, 0.482924, 0.519905, 0.46185 , 0.465647, 0.438642,\n", | |
| " 0.484231, 0.543931, 0.512703, 0.495674, 0.490514, 0.997692,\n", | |
| " 0.001631, 0.115671, 0.44467 ],\n", | |
| " [0.479378, 0.483819, 0.535785, 0.557384, 0.493535, 0.504168,\n", | |
| " 0.482273, 0.471792, 0.508302, 0.46291 , 0.481167, 0.8696 ,\n", | |
| " 0.813101, 0.781387, 0.394234],\n", | |
| " [0.501667, 0.520518, 0.521487, 0.511592, 0.554994, 0.520812,\n", | |
| " 0.515645, 0.575567, 0.48787 , 0.472843, 0.548013, 0.811391,\n", | |
| " 0.491746, 0.697288, 0.206781]],\n", | |
| "\n", | |
| " [[0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908,\n", | |
| " 0.469676, 0.505307, 0.528676, 0.498105, 0.496416, 0.277114,\n", | |
| " 0.925087, 0.867205, 0.221227],\n", | |
| " [0.476805, 0.533403, 0.530798, 0.54454 , 0.511816, 0.480597,\n", | |
| " 0.497687, 0.487296, 0.473503, 0.477073, 0.450402, 0.508261,\n", | |
| " 0.63895 , 0.727136, 0.085397],\n", | |
| " [0.507428, 0.505268, 0.473393, 0.504639, 0.492401, 0.478396,\n", | |
| " 0.455247, 0.459401, 0.555081, 0.500056, 0.529371, 0.610459,\n", | |
| " 0.217386, 0.575926, 0.349022]]])\n", | |
| "Coordinates:\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 ... 2001 2002 2003 2004\n", | |
| " * sex_id (sex_id) int64 1 2\n", | |
| " * age_group_id (age_group_id) int64 11 12 13" | |
| ] | |
| }, | |
| "execution_count": 61, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "mean_da.combine_first(future_da)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 62, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 6)>\n", | |
| "array([[[-5.000000e+00, -5.000000e+00, 9.976917e-01, 1.631312e-03,\n", | |
| " 1.156705e-01, 4.446702e-01],\n", | |
| " [-5.000000e+00, -5.000000e+00, 8.696002e-01, 8.131013e-01,\n", | |
| " 7.813870e-01, 3.942344e-01],\n", | |
| " [-5.000000e+00, -5.000000e+00, 8.113905e-01, 4.917461e-01,\n", | |
| " 6.972879e-01, 2.067813e-01]],\n", | |
| "\n", | |
| " [[-5.000000e+00, -5.000000e+00, 2.771139e-01, 9.250868e-01,\n", | |
| " 8.672051e-01, 2.212266e-01],\n", | |
| " [-5.000000e+00, -5.000000e+00, 5.082614e-01, 6.389501e-01,\n", | |
| " 7.271363e-01, 8.539651e-02],\n", | |
| " [-5.000000e+00, -5.000000e+00, 6.104589e-01, 2.173860e-01,\n", | |
| " 5.759261e-01, 3.490221e-01]]])\n", | |
| "Coordinates:\n", | |
| " * year_id (year_id) int64 1999 2000 2001 2002 2003 2004\n", | |
| " * sex_id (sex_id) int64 1 2\n", | |
| " * age_group_id (age_group_id) int64 11 12 13" | |
| ] | |
| }, | |
| "execution_count": 62, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "future_with_past = expand_dimensions(future_da, year_id=[1999, 2000], fill_value=-5)\n", | |
| "future_with_past" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 63, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 15)>\n", | |
| "array([[[0.515046, 0.482924, 0.519905, 0.46185 , 0.465647, 0.438642,\n", | |
| " 0.484231, 0.543931, 0.512703, 0.495674, 0.490514, 0.997692,\n", | |
| " 0.001631, 0.115671, 0.44467 ],\n", | |
| " [0.479378, 0.483819, 0.535785, 0.557384, 0.493535, 0.504168,\n", | |
| " 0.482273, 0.471792, 0.508302, 0.46291 , 0.481167, 0.8696 ,\n", | |
| " 0.813101, 0.781387, 0.394234],\n", | |
| " [0.501667, 0.520518, 0.521487, 0.511592, 0.554994, 0.520812,\n", | |
| " 0.515645, 0.575567, 0.48787 , 0.472843, 0.548013, 0.811391,\n", | |
| " 0.491746, 0.697288, 0.206781]],\n", | |
| "\n", | |
| " [[0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908,\n", | |
| " 0.469676, 0.505307, 0.528676, 0.498105, 0.496416, 0.277114,\n", | |
| " 0.925087, 0.867205, 0.221227],\n", | |
| " [0.476805, 0.533403, 0.530798, 0.54454 , 0.511816, 0.480597,\n", | |
| " 0.497687, 0.487296, 0.473503, 0.477073, 0.450402, 0.508261,\n", | |
| " 0.63895 , 0.727136, 0.085397],\n", | |
| " [0.507428, 0.505268, 0.473393, 0.504639, 0.492401, 0.478396,\n", | |
| " 0.455247, 0.459401, 0.555081, 0.500056, 0.529371, 0.610459,\n", | |
| " 0.217386, 0.575926, 0.349022]]])\n", | |
| "Coordinates:\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 ... 2001 2002 2003 2004\n", | |
| " * sex_id (sex_id) int64 1 2\n", | |
| " * age_group_id (age_group_id) int64 11 12 13" | |
| ] | |
| }, | |
| "execution_count": 63, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "mean_da.combine_first(future_with_past)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "# 11. Datasets\n", | |
| "\n", | |
| "What they are, when they're useful, and how to make them.\n", | |
| "\n", | |
| "A Dataset is a collection of named DataArrays, which is very useful in some situations." | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "A typical use case is when you want to keep several related data variables in one data structure even though they have inconsistent dimensions. For example, you want to store SDI and mortality in a dataset together, but mortality has age-group and sex dimensions, while SDI does not. They do, however, share year and location as dimensions." | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Merging two or more DataArrays into one Dataset. Note that the DataArrays have to be named." | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "``ds = xr.merge([mean_da.rename(\"mean\"), da.rename(\"draws\")])``" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 64, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.Dataset>\n", | |
| "Dimensions: (age_group_id: 3, sex_id: 2, year_id: 11)\n", | |
| "Coordinates:\n", | |
| " * sex_id (sex_id) int64 1 2\n", | |
| " * age_group_id (age_group_id) int64 11 12 13\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n", | |
| "Data variables:\n", | |
| " draw_0 (sex_id, age_group_id, year_id) float64 0.9968 ... 0.8295\n", | |
| " draw_1 (sex_id, age_group_id, year_id) float64 0.7616 ... 0.01835\n", | |
| " draw_2 (sex_id, age_group_id, year_id) float64 0.3508 ... 0.6153\n", | |
| " draw_3 (sex_id, age_group_id, year_id) float64 0.7504 ... 0.6143\n", | |
| " draw_4 (sex_id, age_group_id, year_id) float64 0.4339 ... 0.4588\n", | |
| " draw_5 (sex_id, age_group_id, year_id) float64 0.7644 ... 0.995\n", | |
| " draw_6 (sex_id, age_group_id, year_id) float64 0.1224 ... 0.0544\n", | |
| " draw_7 (sex_id, age_group_id, year_id) float64 0.5721 ... 0.6756\n", | |
| " draw_8 (sex_id, age_group_id, year_id) float64 0.6604 ... 0.04322\n", | |
| " draw_9 (sex_id, age_group_id, year_id) float64 0.7618 ... 0.6469\n", | |
| " draw_10 (sex_id, age_group_id, year_id) float64 0.319 ... 0.9751\n", | |
| " draw_11 (sex_id, age_group_id, year_id) float64 0.524 ... 0.7128\n", | |
| " draw_12 (sex_id, age_group_id, year_id) float64 0.6512 ... 0.3474\n", | |
| " draw_13 (sex_id, age_group_id, year_id) float64 0.562 0.619 ... 0.5368\n", | |
| " draw_14 (sex_id, age_group_id, year_id) float64 0.2113 ... 0.5242\n", | |
| " draw_15 (sex_id, age_group_id, year_id) float64 0.4212 ... 0.7537\n", | |
| " draw_16 (sex_id, age_group_id, year_id) float64 0.9361 ... 0.6975\n", | |
| " draw_17 (sex_id, age_group_id, year_id) float64 0.2602 ... 0.4517\n", | |
| " draw_18 (sex_id, age_group_id, year_id) float64 0.9366 ... 0.5681\n", | |
| " draw_19 (sex_id, age_group_id, year_id) float64 0.2167 ... 0.6634\n", | |
| " draw_20 (sex_id, age_group_id, year_id) float64 0.4639 ... 0.1377\n", | |
| " draw_21 (sex_id, age_group_id, year_id) float64 0.8358 ... 0.9574\n", | |
| " draw_22 (sex_id, age_group_id, year_id) float64 0.8781 ... 0.8257\n", | |
| " draw_23 (sex_id, age_group_id, year_id) float64 0.646 ... 0.7914\n", | |
| " draw_24 (sex_id, age_group_id, year_id) float64 0.7519 ... 0.9622\n", | |
| " draw_25 (sex_id, age_group_id, year_id) float64 0.1052 ... 0.411\n", | |
| " draw_26 (sex_id, age_group_id, year_id) float64 0.453 ... 0.6014\n", | |
| " draw_27 (sex_id, age_group_id, year_id) float64 0.97 0.3091 ... 0.6086\n", | |
| " draw_28 (sex_id, age_group_id, year_id) float64 0.4998 ... 0.4429\n", | |
| " draw_29 (sex_id, age_group_id, year_id) float64 0.05933 ... 0.06421\n", | |
| " draw_30 (sex_id, age_group_id, year_id) float64 0.4804 ... 0.06488\n", | |
| " draw_31 (sex_id, age_group_id, year_id) float64 0.8592 ... 0.8677\n", | |
| " draw_32 (sex_id, age_group_id, year_id) float64 0.7863 ... 0.3912\n", | |
| " draw_33 (sex_id, age_group_id, year_id) float64 0.8053 ... 0.6144\n", | |
| " draw_34 (sex_id, age_group_id, year_id) float64 0.4348 ... 0.4366\n", | |
| " draw_35 (sex_id, age_group_id, year_id) float64 0.06214 ... 0.9133\n", | |
| " draw_36 (sex_id, age_group_id, year_id) float64 0.2246 ... 0.3688\n", | |
| " draw_37 (sex_id, age_group_id, year_id) float64 0.8678 ... 0.09394\n", | |
| " draw_38 (sex_id, age_group_id, year_id) float64 0.07461 ... 0.3763\n", | |
| " draw_39 (sex_id, age_group_id, year_id) float64 0.07531 ... 0.3003\n", | |
| " draw_40 (sex_id, age_group_id, year_id) float64 0.693 ... 0.9037\n", | |
| " draw_41 (sex_id, age_group_id, year_id) float64 0.1258 ... 0.0871\n", | |
| " draw_42 (sex_id, age_group_id, year_id) float64 0.8201 ... 0.1401\n", | |
| " draw_43 (sex_id, age_group_id, year_id) float64 0.0862 ... 0.09171\n", | |
| " draw_44 (sex_id, age_group_id, year_id) float64 0.0325 ... 0.8637\n", | |
| " draw_45 (sex_id, age_group_id, year_id) float64 0.891 ... 0.7595\n", | |
| " draw_46 (sex_id, age_group_id, year_id) float64 0.7296 ... 0.4517\n", | |
| " draw_47 (sex_id, age_group_id, year_id) float64 0.2217 ... 0.8892\n", | |
| " draw_48 (sex_id, age_group_id, year_id) float64 0.9126 ... 0.7594\n", | |
| " draw_49 (sex_id, age_group_id, year_id) float64 0.2288 ... 0.07483\n", | |
| " draw_50 (sex_id, age_group_id, year_id) float64 0.04443 ... 0.04533\n", | |
| " draw_51 (sex_id, age_group_id, year_id) float64 0.9039 ... 0.2432\n", | |
| " draw_52 (sex_id, age_group_id, year_id) float64 0.7614 ... 0.3947\n", | |
| " draw_53 (sex_id, age_group_id, year_id) float64 0.4993 ... 0.6609\n", | |
| " draw_54 (sex_id, age_group_id, year_id) float64 0.2617 ... 0.5335\n", | |
| " draw_55 (sex_id, age_group_id, year_id) float64 0.4381 ... 0.233\n", | |
| " draw_56 (sex_id, age_group_id, year_id) float64 0.8289 ... 0.3777\n", | |
| " draw_57 (sex_id, age_group_id, year_id) float64 0.891 0.321 ... 0.5784\n", | |
| " draw_58 (sex_id, age_group_id, year_id) float64 0.4284 ... 0.9219\n", | |
| " draw_59 (sex_id, age_group_id, year_id) float64 0.8465 ... 0.3912\n", | |
| " draw_60 (sex_id, age_group_id, year_id) float64 0.355 ... 0.9802\n", | |
| " draw_61 (sex_id, age_group_id, year_id) float64 0.02395 ... 0.8892\n", | |
| " draw_62 (sex_id, age_group_id, year_id) float64 0.2704 ... 0.5783\n", | |
| " draw_63 (sex_id, age_group_id, year_id) float64 0.5877 ... 0.8256\n", | |
| " draw_64 (sex_id, age_group_id, year_id) float64 0.8006 ... 0.3911\n", | |
| " draw_65 (sex_id, age_group_id, year_id) float64 0.9209 ... 0.964\n", | |
| " draw_66 (sex_id, age_group_id, year_id) float64 0.8785 ... 0.1937\n", | |
| " draw_67 (sex_id, age_group_id, year_id) float64 0.6067 ... 0.9608\n", | |
| " draw_68 (sex_id, age_group_id, year_id) float64 0.03497 ... 0.3151\n", | |
| " draw_69 (sex_id, age_group_id, year_id) float64 0.5407 ... 0.8746\n", | |
| " draw_70 (sex_id, age_group_id, year_id) float64 0.2832 ... 0.5551\n", | |
| " draw_71 (sex_id, age_group_id, year_id) float64 0.2203 ... 0.117\n", | |
| " draw_72 (sex_id, age_group_id, year_id) float64 0.7725 ... 0.6808\n", | |
| " draw_73 (sex_id, age_group_id, year_id) float64 0.6947 ... 0.1008\n", | |
| " draw_74 (sex_id, age_group_id, year_id) float64 0.6666 ... 0.6387\n", | |
| " draw_75 (sex_id, age_group_id, year_id) float64 0.5149 ... 0.005526\n", | |
| " draw_76 (sex_id, age_group_id, year_id) float64 0.9547 ... 0.7345\n", | |
| " draw_77 (sex_id, age_group_id, year_id) float64 0.3364 ... 0.005635\n", | |
| " draw_78 (sex_id, age_group_id, year_id) float64 0.572 ... 0.8498\n", | |
| " draw_79 (sex_id, age_group_id, year_id) float64 0.4541 ... 0.646\n", | |
| " draw_80 (sex_id, age_group_id, year_id) float64 0.2224 ... 0.9263\n", | |
| " draw_81 (sex_id, age_group_id, year_id) float64 0.1243 ... 0.4103\n", | |
| " draw_82 (sex_id, age_group_id, year_id) float64 0.7503 ... 0.1586\n", | |
| " draw_83 (sex_id, age_group_id, year_id) float64 0.1514 ... 0.8157\n", | |
| " draw_84 (sex_id, age_group_id, year_id) float64 0.7995 ... 0.05216\n", | |
| " draw_85 (sex_id, age_group_id, year_id) float64 0.4792 ... 0.424\n", | |
| " draw_86 (sex_id, age_group_id, year_id) float64 0.09267 ... 0.6082\n", | |
| " draw_87 (sex_id, age_group_id, year_id) float64 0.5104 ... 0.6742\n", | |
| " draw_88 (sex_id, age_group_id, year_id) float64 0.9314 ... 0.3405\n", | |
| " draw_89 (sex_id, age_group_id, year_id) float64 0.037 ... 0.8594\n", | |
| " draw_90 (sex_id, age_group_id, year_id) float64 0.2258 ... 0.835\n", | |
| " draw_91 (sex_id, age_group_id, year_id) float64 0.8366 ... 0.4034\n", | |
| " draw_92 (sex_id, age_group_id, year_id) float64 0.8852 ... 0.8596\n", | |
| " draw_93 (sex_id, age_group_id, year_id) float64 0.2229 ... 0.9087\n", | |
| " draw_94 (sex_id, age_group_id, year_id) float64 0.6414 ... 0.6106\n", | |
| " draw_95 (sex_id, age_group_id, year_id) float64 0.3939 ... 0.2564\n", | |
| " draw_96 (sex_id, age_group_id, year_id) float64 0.3816 ... 0.3988\n", | |
| " draw_97 (sex_id, age_group_id, year_id) float64 0.2947 ... 0.2535\n", | |
| " draw_98 (sex_id, age_group_id, year_id) float64 0.6506 ... 0.4103\n", | |
| " draw_99 (sex_id, age_group_id, year_id) float64 0.1932 ... 0.6117" | |
| ] | |
| }, | |
| "execution_count": 64, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "ds" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "This is similar to merging in pandas, except that it defaults to an outer join, which is the union of the arrays' coordinates.\n", | |
| "Alternatively, you can pass ``join=\"inner\"`` to take the intersection of the arrays' coordinates." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 66, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.Dataset>\n", | |
| "Dimensions: (age_group_id: 3, draw: 100, sex_id: 1, year_id: 11)\n", | |
| "Coordinates:\n", | |
| " * sex_id (sex_id) int64 2\n", | |
| " * age_group_id (age_group_id) int64 11 12 13\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n", | |
| " * draw (draw) int64 0 1 2 3 4 5 6 7 8 ... 91 92 93 94 95 96 97 98 99\n", | |
| "Data variables:\n", | |
| " mean (sex_id, age_group_id, year_id) float64 0.4806 ... 0.5294\n", | |
| " draws (sex_id, age_group_id, year_id, draw) float64 0.07233 ... 0.6117" | |
| ] | |
| }, | |
| "execution_count": 66, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "only_females_ds = xr.merge([mean_da.rename(\"mean\").sel(sex_id=[2]), da.rename(\"draws\")], join=\"inner\")\n", | |
| "only_females_ds" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 67, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.Dataset>\n", | |
| "Dimensions: (age_group_id: 3, draw: 100, sex_id: 2, year_id: 11)\n", | |
| "Coordinates:\n", | |
| " * sex_id (sex_id) int64 1 2\n", | |
| " * age_group_id (age_group_id) int64 11 12 13\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n", | |
| " * draw (draw) int64 0 1 2 3 4 5 6 7 8 ... 91 92 93 94 95 96 97 98 99\n", | |
| "Data variables:\n", | |
| " mean (sex_id, age_group_id, year_id) float64 nan nan ... 0.5294\n", | |
| " draws (sex_id, age_group_id, year_id, draw) float64 0.9968 ... 0.6117" | |
| ] | |
| }, | |
| "execution_count": 67, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "with_nans_da = xr.merge([mean_da.rename(\"mean\").sel(sex_id=[2]), da.rename(\"draws\")], join=\"outer\")\n", | |
| "with_nans_da" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 68, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'mean' (sex_id: 2, age_group_id: 3, year_id: 11)>\n", | |
| "array([[[ nan, nan, nan, nan, nan, nan,\n", | |
| " nan, nan, nan, nan, nan],\n", | |
| " [ nan, nan, nan, nan, nan, nan,\n", | |
| " nan, nan, nan, nan, nan],\n", | |
| " [ nan, nan, nan, nan, nan, nan,\n", | |
| " nan, nan, nan, nan, nan]],\n", | |
| "\n", | |
| " [[0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908,\n", | |
| " 0.469676, 0.505307, 0.528676, 0.498105, 0.496416],\n", | |
| " [0.476805, 0.533403, 0.530798, 0.54454 , 0.511816, 0.480597,\n", | |
| " 0.497687, 0.487296, 0.473503, 0.477073, 0.450402],\n", | |
| " [0.507428, 0.505268, 0.473393, 0.504639, 0.492401, 0.478396,\n", | |
| " 0.455247, 0.459401, 0.555081, 0.500056, 0.529371]]])\n", | |
| "Coordinates:\n", | |
| " * sex_id (sex_id) int64 1 2\n", | |
| " * age_group_id (age_group_id) int64 11 12 13\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000" | |
| ] | |
| }, | |
| "execution_count": 68, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "with_nans_da[\"mean\"]" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 69, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<xarray.DataArray 'draws' (sex_id: 2, age_group_id: 3, year_id: 11, draw: 100)>\n", | |
| "array([[[[0.996779, ..., 0.193241],\n", | |
| " ...,\n", | |
| " [0.50371 , ..., 0.654273]],\n", | |
| "\n", | |
| " ...,\n", | |
| "\n", | |
| " [[0.227811, ..., 0.912856],\n", | |
| " ...,\n", | |
| " [0.24642 , ..., 0.581184]]],\n", | |
| "\n", | |
| "\n", | |
| " [[[0.072334, ..., 0.684663],\n", | |
| " ...,\n", | |
| " [0.628984, ..., 0.358811]],\n", | |
| "\n", | |
| " ...,\n", | |
| "\n", | |
| " [[0.491698, ..., 0.876439],\n", | |
| " ...,\n", | |
| " [0.829525, ..., 0.611719]]]])\n", | |
| "Coordinates:\n", | |
| " * sex_id (sex_id) int64 1 2\n", | |
| " * age_group_id (age_group_id) int64 11 12 13\n", | |
| " * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n", | |
| " * draw (draw) int64 0 1 2 3 4 5 6 7 8 ... 91 92 93 94 95 96 97 98 99\n", | |
| "Attributes:\n", | |
| " metric: rate\n", | |
| " author: Me" | |
| ] | |
| }, | |
| "execution_count": 69, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "with_nans_da[\"draws\"]" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [] | |
| } | |
| ], | |
| "metadata": { | |
| "kernelspec": { | |
| "display_name": "Python 3", | |
| "language": "python", | |
| "name": "python3" | |
| }, | |
| "language_info": { | |
| "codemirror_mode": { | |
| "name": "ipython", | |
| "version": 3 | |
| }, | |
| "file_extension": ".py", | |
| "mimetype": "text/x-python", | |
| "name": "python", | |
| "nbconvert_exporter": "python", | |
| "pygments_lexer": "ipython3", | |
| "version": "3.6.8" | |
| } | |
| }, | |
| "nbformat": 4, | |
| "nbformat_minor": 2 | |
| } |
Author
Hello, thanks for sharing. Can you indicate how to install the fbd_core package? This command does not work: conda install fbd_core
fbd_core is an internal library, but here's the code for the expand_dimensions() function:
from collections import OrderedDict

import numpy as np
import xarray as xr


def expand_dimensions(data, fill_value=np.nan, **new_coords):
    """
    Expand (or add if it doesn't yet exist) the data array to fill in new
    coordinates across multiple dimensions.

    If a dimension doesn't exist in the dataarray yet, then the result will be
    `data`, broadcasted across this dimension.

    >>> da = xr.DataArray([1, 2, 3], dims="a", coords=[[0, 1, 2]])
    >>> expand_dimensions(da, b=[1, 2, 3, 4, 5])
    <xarray.DataArray (a: 3, b: 5)>
    array([[ 1., 1., 1., 1., 1.],
           [ 2., 2., 2., 2., 2.],
           [ 3., 3., 3., 3., 3.]])
    Coordinates:
      * a        (a) int64 0 1 2
      * b        (b) int64 1 2 3 4 5

    Or, if `dim` is already a dimension in `data`, then any new coordinate
    values in `new_coords` that are not yet in `data[dim]` will be added,
    and the values corresponding to those new coordinates will be `fill_value`.

    >>> da = xr.DataArray([1, 2, 3], dims="a", coords=[[0, 1, 2]])
    >>> expand_dimensions(da, a=[1, 2, 3, 4, 5])
    <xarray.DataArray (a: 6)>
    array([ 1., 2., 3., nan, nan, nan])
    Coordinates:
      * a        (a) int64 0 1 2 3 4 5

    Args:
        data (xarray.DataArray):
            Data that needs dimensions expanded.
        fill_value (scalar, optional):
            If expanding new coords this is the value of the new datum.
            Defaults to `np.nan`.
        **new_coords (list[int | str]):
            The keywords are arbitrary dimensions and the values are
            coordinates of those dimensions that the data will include after it
            has been expanded.

    Returns:
        xarray.DataArray:
            Data that had its dimensions expanded to include the new
            coordinates.
    """
    ordered_coord_dict = OrderedDict(new_coords)
    # Dummy array of zeros that carries only the requested dims and coords.
    shape_da = xr.DataArray(
        np.zeros(list(map(len, ordered_coord_dict.values()))),
        coords=ordered_coord_dict,
        dims=ordered_coord_dict.keys())
    # Broadcasting outer-joins the coords; labels new to `data` come back as
    # NaN and are then replaced with `fill_value`.
    expanded_data = xr.broadcast(data, shape_da)[0].fillna(fill_value)
    return expanded_data
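For anyone who wants to see the two cases from the docstring without copying the whole function, here is a minimal sketch that uses only `xr.broadcast` and `fillna`, the same two calls the function relies on. The names `template`, `new_labels`, `broadcasted` and `extended`, the coordinate values, and the fill value of -5 are made up for illustration; this is not part of fbd_core.

```python
import numpy as np
import xarray as xr

# Small illustrative array with one dimension, "a".
da = xr.DataArray([1, 2, 3], dims="a", coords={"a": [0, 1, 2]})

# Case 1: "b" is not a dimension of `da`, so broadcasting against a
# dummy array that has "b" repeats the data along the new dimension.
template = xr.DataArray(np.zeros(2), dims="b", coords={"b": [10, 20]})
broadcasted = xr.broadcast(da, template)[0]

# Case 2: "a" already exists. xr.broadcast aligns the inputs with an
# outer join, so the unseen label 3 is added with NaN, which fillna
# then replaces with the chosen fill value (-5 here, an arbitrary pick).
new_labels = xr.DataArray(np.zeros(2), dims="a", coords={"a": [2, 3]})
extended = xr.broadcast(da, new_labels)[0].fillna(-5)

print(broadcasted)
print(extended)
```

In the second case the label 2 is present in both arrays, so its original value is kept and only the new label 3 receives the fill value.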