Skip to content

Instantly share code, notes, and snippets.

@calilisantos
Last active November 3, 2024 14:15
Show Gist options
  • Select an option

  • Save calilisantos/c27379cf2460c43c5d323ee235aa2fd2 to your computer and use it in GitHub Desktop.

Select an option

Save calilisantos/c27379cf2460c43c5d323ee235aa2fd2 to your computer and use it in GitHub Desktop.
Date in Spark Guide
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"rowLimit": 10000
},
"inputWidgets": {},
"nuid": "477024fd-597a-4e9f-bc22-f51a898a6ea7",
"showTitle": false,
"title": ""
}
},
"source": [
"# Working with date: \n",
"## Try to pattern:\n",
"* ### Only date;\n",
"* ### Or only timestamp\n",
"## References:\n",
"* [Datetime pattern](https://spark.apache.org/docs/3.5.1/sql-ref-datetime-pattern.html)\n",
"* [Spark 3.0 Guide](https://www.databricks.com/blog/2020/07/22/a-comprehensive-look-at-dates-and-timestamps-in-apache-spark-3-0.html)"
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"rowLimit": 10000
},
"inputWidgets": {},
"nuid": "79da09b1-8cda-4109-969f-707864636d19",
"showTitle": false,
"title": ""
}
},
"source": [
"# **Import dependencies**"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"rowLimit": 10000
},
"inputWidgets": {},
"nuid": "52a85283-d777-452d-89be-34e4ac8cb049",
"showTitle": false,
"title": ""
}
},
"outputs": [],
"source": [
"from datetime import datetime, timedelta\n",
"from pyspark.sql import functions as F, SparkSession, types as T"
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"rowLimit": 10000
},
"inputWidgets": {},
"nuid": "6f9cd377-cf13-4cbb-a916-4fe584883feb",
"showTitle": false,
"title": ""
}
},
"source": [
"# **Create SparkSession**"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"rowLimit": 10000
},
"inputWidgets": {},
"nuid": "dabbbc18-38b9-4ef6-9629-85c3063217d1",
"showTitle": false,
"title": ""
}
},
"outputs": [],
"source": [
"spark = (\n",
" SparkSession.builder\n",
" .master(\"local[3]\")\n",
" .appName(\"date_in_pyspark\")\n",
" .getOrCreate()\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"rowLimit": 10000
},
"inputWidgets": {},
"nuid": "c3db539a-0aca-4a7a-b4f8-5445c5eb3611",
"showTitle": false,
"title": ""
}
},
"source": [
"## The dates\n",
"* **`current functions`**\n",
"* **`custom_date`**\n",
"* **Timestamps**\n",
" * **`unix pattern`**\n",
" * **`NTZ` PS: _since spark 3.4.0_**\n",
" * **`convert_timezone`**\n",
" * **`make_timestamp_ntz`**\n",
" * **`to_timestamp_ntz`**"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"implicitDf": true,
"rowLimit": 10000
},
"inputWidgets": {},
"nuid": "d1db5667-35e8-49e1-9771-03297d7730d8",
"showTitle": false,
"title": ""
}
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"<style scoped>\n",
" .table-result-container {\n",
" max-height: 300px;\n",
" overflow: auto;\n",
" }\n",
" table, th, td {\n",
" border: 1px solid black;\n",
" border-collapse: collapse;\n",
" }\n",
" th, td {\n",
" padding: 5px;\n",
" }\n",
" th {\n",
" text-align: left;\n",
" }\n",
"</style><div class='table-result-container'><table class='table-result'><thead style='background-color: white'><tr><th>today</th><th>today_stamp</th><th>TZ</th><th>past</th><th>today_unix</th><th>today_unix_back</th><th>today_NTZ</th><th>next_week</th><th>past_week</th><th>(lateralAliasReference(today) > lateralAliasReference(past))</th></tr></thead><tbody><tr><td>2024-10-19</td><td>2024-10-19T00:00:20.46Z</td><td>Etc/UTC</td><td>2024-01-01</td><td>1729296000</td><td>2024-10-19</td><td>2024-10-19T00:00:20.46</td><td>2024-10-26</td><td>2024-10-12</td><td>true</td></tr></tbody></table></div>"
]
},
"metadata": {
"application/vnd.databricks.v1+output": {
"addedWidgets": {},
"aggData": [],
"aggError": "",
"aggOverflow": false,
"aggSchema": [],
"aggSeriesLimitReached": false,
"aggType": "",
"arguments": {},
"columnCustomDisplayInfos": {},
"data": [
[
"2024-10-19",
"2024-10-19T00:00:20.46Z",
"Etc/UTC",
"2024-01-01",
1729296000,
"2024-10-19",
"2024-10-19T00:00:20.46",
"2024-10-26",
"2024-10-12",
true
]
],
"datasetInfos": [],
"dbfsResultPath": null,
"isJsonSchema": true,
"metadata": {},
"overflow": false,
"plotOptions": {
"customPlotOptions": {},
"displayType": "table",
"pivotAggregation": null,
"pivotColumns": null,
"xColumns": null,
"yColumns": null
},
"removedWidgets": [],
"schema": [
{
"metadata": "{}",
"name": "today",
"type": "\"date\""
},
{
"metadata": "{}",
"name": "today_stamp",
"type": "\"timestamp\""
},
{
"metadata": "{}",
"name": "TZ",
"type": "\"string\""
},
{
"metadata": "{}",
"name": "past",
"type": "\"date\""
},
{
"metadata": "{}",
"name": "today_unix",
"type": "\"long\""
},
{
"metadata": "{}",
"name": "today_unix_back",
"type": "\"string\""
},
{
"metadata": "{}",
"name": "today_NTZ",
"type": "\"timestamp_ntz\""
},
{
"metadata": "{}",
"name": "next_week",
"type": "\"date\""
},
{
"metadata": "{}",
"name": "past_week",
"type": "\"date\""
},
{
"metadata": "{\"__autoGeneratedAlias\":\"true\"}",
"name": "(lateralAliasReference(today) > lateralAliasReference(past))",
"type": "\"boolean\""
}
],
"type": "table"
}
},
"output_type": "display_data"
}
],
"source": [
"date_functions = spark.sql(\"\"\"\n",
"SELECT\n",
" current_date() AS today,\n",
" current_timestamp() AS today_stamp,\n",
" current_timezone() AS TZ,\n",
" make_date(2024, 1, 1) AS past, -- to_date(\"2024-01-01\") AS past\n",
" to_unix_timestamp(today, 'yyyy-MM-dd') AS today_unix,\n",
" from_unixtime(today_unix, 'yyyy-MM-dd') AS today_unix_back,\n",
" to_timestamp_ntz(today_stamp) AS today_NTZ,\n",
" date_add(today, 7) AS next_week,\n",
" date_sub(today, 7) AS past_week,\n",
" today > past\n",
"\"\"\")\n",
"\n",
"date_functions.display()"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"implicitDf": true,
"rowLimit": 10000
},
"inputWidgets": {},
"nuid": "2df681ee-e937-40d1-ac1e-1032b0fc5059",
"showTitle": false,
"title": ""
}
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"<style scoped>\n",
" .table-result-container {\n",
" max-height: 300px;\n",
" overflow: auto;\n",
" }\n",
" table, th, td {\n",
" border: 1px solid black;\n",
" border-collapse: collapse;\n",
" }\n",
" th, td {\n",
" padding: 5px;\n",
" }\n",
" th {\n",
" text-align: left;\n",
" }\n",
"</style><div class='table-result-container'><table class='table-result'><thead style='background-color: white'><tr><th>key</th><th>value</th></tr></thead><tbody><tr><td>spark.sql.session.timeZone</td><td>America/Los_Angeles</td></tr></tbody></table></div>"
]
},
"metadata": {
"application/vnd.databricks.v1+output": {
"addedWidgets": {},
"aggData": [],
"aggError": "",
"aggOverflow": false,
"aggSchema": [],
"aggSeriesLimitReached": false,
"aggType": "",
"arguments": {},
"columnCustomDisplayInfos": {},
"data": [
[
"spark.sql.session.timeZone",
"America/Los_Angeles"
]
],
"datasetInfos": [],
"dbfsResultPath": null,
"isJsonSchema": true,
"metadata": {
"createTempViewForImplicitDf": true,
"dataframeName": "_sqldf",
"executionCount": 154
},
"overflow": false,
"plotOptions": {
"customPlotOptions": {},
"displayType": "table",
"pivotAggregation": null,
"pivotColumns": null,
"xColumns": null,
"yColumns": null
},
"removedWidgets": [],
"schema": [
{
"metadata": "{}",
"name": "key",
"type": "\"string\""
},
{
"metadata": "{}",
"name": "value",
"type": "\"string\""
}
],
"type": "table"
}
},
"output_type": "display_data"
}
],
"source": [
"%sql\n",
"SET TIME ZONE 'America/Los_Angeles'"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"rowLimit": 10000
},
"inputWidgets": {},
"nuid": "55f504fc-8b78-42cc-99f1-ce1bf97ad315",
"showTitle": false,
"title": ""
}
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"<style scoped>\n",
" .table-result-container {\n",
" max-height: 300px;\n",
" overflow: auto;\n",
" }\n",
" table, th, td {\n",
" border: 1px solid black;\n",
" border-collapse: collapse;\n",
" }\n",
" th, td {\n",
" padding: 5px;\n",
" }\n",
" th {\n",
" text-align: left;\n",
" }\n",
"</style><div class='table-result-container'><table class='table-result'><thead style='background-color: white'><tr><th>today</th><th>today_stamp</th><th>TZ</th><th>past</th><th>today_unix</th><th>today_unix_back</th><th>today_NTZ</th><th>next_week</th><th>past_week</th><th>(lateralAliasReference(today) > lateralAliasReference(past))</th></tr></thead><tbody><tr><td>2024-10-19</td><td>2024-10-18T17:03:35.591-07:00</td><td>America/Los_Angeles</td><td>2024-01-01</td><td>1729296000</td><td>2024-10-19</td><td>2024-10-19T00:03:35.591</td><td>2024-10-26</td><td>2024-10-12</td><td>true</td></tr></tbody></table></div>"
]
},
"metadata": {
"application/vnd.databricks.v1+output": {
"addedWidgets": {},
"aggData": [],
"aggError": "",
"aggOverflow": false,
"aggSchema": [],
"aggSeriesLimitReached": false,
"aggType": "",
"arguments": {},
"columnCustomDisplayInfos": {},
"data": [
[
"2024-10-19",
"2024-10-18T17:03:35.591-07:00",
"America/Los_Angeles",
"2024-01-01",
1729296000,
"2024-10-19",
"2024-10-19T00:03:35.591",
"2024-10-26",
"2024-10-12",
true
]
],
"datasetInfos": [],
"dbfsResultPath": null,
"isJsonSchema": true,
"metadata": {},
"overflow": false,
"plotOptions": {
"customPlotOptions": {},
"displayType": "table",
"pivotAggregation": null,
"pivotColumns": null,
"xColumns": null,
"yColumns": null
},
"removedWidgets": [],
"schema": [
{
"metadata": "{}",
"name": "today",
"type": "\"date\""
},
{
"metadata": "{}",
"name": "today_stamp",
"type": "\"timestamp\""
},
{
"metadata": "{}",
"name": "TZ",
"type": "\"string\""
},
{
"metadata": "{}",
"name": "past",
"type": "\"date\""
},
{
"metadata": "{}",
"name": "today_unix",
"type": "\"long\""
},
{
"metadata": "{}",
"name": "today_unix_back",
"type": "\"string\""
},
{
"metadata": "{}",
"name": "today_NTZ",
"type": "\"timestamp_ntz\""
},
{
"metadata": "{}",
"name": "next_week",
"type": "\"date\""
},
{
"metadata": "{}",
"name": "past_week",
"type": "\"date\""
},
{
"metadata": "{\"__autoGeneratedAlias\":\"true\"}",
"name": "(lateralAliasReference(today) > lateralAliasReference(past))",
"type": "\"boolean\""
}
],
"type": "table"
}
},
"output_type": "display_data"
}
],
"source": [
"date_functions.display()"
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"rowLimit": 10000
},
"inputWidgets": {},
"nuid": "31257398-06c8-42f1-aea2-18c61ebc2b6d",
"showTitle": false,
"title": ""
}
},
"source": [
"# **Declare Consts**"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"rowLimit": 10000
},
"inputWidgets": {},
"nuid": "b605d2c0-dee6-43f7-806c-27665b07efe5",
"showTitle": false,
"title": ""
}
},
"outputs": [],
"source": [
"DATE_FORMAT = \"dd/MM/yyyy\"\n",
"\n",
"INITIAL_DATE = datetime(2024,6,1)\n",
"\n",
"MONTHS_TO_BE_FROZEN = 3\n",
"\n",
"TODAY = F.current_date()"
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"rowLimit": 10000
},
"inputWidgets": {},
"nuid": "4ec38261-7f38-4ab5-ad79-73a5f3e7aa77",
"showTitle": false,
"title": ""
}
},
"source": [
"# **Create Dataframe**"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"rowLimit": 10000
},
"inputWidgets": {},
"nuid": "e7c78b1a-2708-4291-aae5-b753150094f7",
"showTitle": false,
"title": ""
}
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"<style scoped>\n",
" .table-result-container {\n",
" max-height: 300px;\n",
" overflow: auto;\n",
" }\n",
" table, th, td {\n",
" border: 1px solid black;\n",
" border-collapse: collapse;\n",
" }\n",
" th, td {\n",
" padding: 5px;\n",
" }\n",
" th {\n",
" text-align: left;\n",
" }\n",
"</style><div class='table-result-container'><table class='table-result'><thead style='background-color: white'><tr><th>database</th><th>table</th><th>create_date</th></tr></thead><tbody><tr><td>a</td><td>table_one</td><td>2024-06-01</td></tr><tr><td>a</td><td>table_two</td><td>2024-07-02</td></tr><tr><td>b</td><td>table_three</td><td>2024-08-02</td></tr><tr><td>b</td><td>table_four</td><td>2024-09-02</td></tr></tbody></table></div>"
]
},
"metadata": {
"application/vnd.databricks.v1+output": {
"addedWidgets": {},
"aggData": [],
"aggError": "",
"aggOverflow": false,
"aggSchema": [],
"aggSeriesLimitReached": false,
"aggType": "",
"arguments": {},
"columnCustomDisplayInfos": {},
"data": [
[
"a",
"table_one",
"2024-06-01"
],
[
"a",
"table_two",
"2024-07-02"
],
[
"b",
"table_three",
"2024-08-02"
],
[
"b",
"table_four",
"2024-09-02"
]
],
"datasetInfos": [],
"dbfsResultPath": null,
"isJsonSchema": true,
"metadata": {},
"overflow": false,
"plotOptions": {
"customPlotOptions": {},
"displayType": "table",
"pivotAggregation": null,
"pivotColumns": null,
"xColumns": null,
"yColumns": null
},
"removedWidgets": [],
"schema": [
{
"metadata": "{}",
"name": "database",
"type": "\"string\""
},
{
"metadata": "{}",
"name": "table",
"type": "\"string\""
},
{
"metadata": "{}",
"name": "create_date",
"type": "\"date\""
}
],
"type": "table"
}
},
"output_type": "display_data"
}
],
"source": [
"values = [\n",
" [\n",
" \"a\",\n",
" \"table_one\",\n",
" INITIAL_DATE\n",
" ],\n",
" [\n",
" \"a\",\n",
" \"table_two\",\n",
" INITIAL_DATE + timedelta(days=31)\n",
" ],\n",
" [\n",
" \"b\",\n",
" \"table_three\",\n",
" INITIAL_DATE + timedelta(days=62)\n",
" ],\n",
" [\n",
" \"b\",\n",
" \"table_four\",\n",
" INITIAL_DATE + timedelta(days=93)\n",
" ],\n",
"]\n",
"\n",
"schema = T.StructType([\n",
" T.StructField(\"database\", T.StringType()),\n",
" T.StructField(\"table\", T.StringType()),\n",
" T.StructField(\"create_date\", T.DateType())\n",
"])\n",
"\n",
"df = spark.createDataFrame(values, schema)\n",
"\n",
"df.display()"
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"rowLimit": 10000
},
"inputWidgets": {},
"nuid": "7dd8bd52-54fa-4fdd-865a-a9f16fe487ba",
"showTitle": false,
"title": ""
}
},
"source": [
"# **Create is_frozen column**"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"rowLimit": 10000
},
"inputWidgets": {},
"nuid": "93dc0131-bae7-4aa8-acb4-1149670129ac",
"showTitle": false,
"title": ""
}
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"<style scoped>\n",
" .table-result-container {\n",
" max-height: 300px;\n",
" overflow: auto;\n",
" }\n",
" table, th, td {\n",
" border: 1px solid black;\n",
" border-collapse: collapse;\n",
" }\n",
" th, td {\n",
" padding: 5px;\n",
" }\n",
" th {\n",
" text-align: left;\n",
" }\n",
"</style><div class='table-result-container'><table class='table-result'><thead style='background-color: white'><tr><th>database</th><th>table</th><th>create_date</th><th>is_frozen</th></tr></thead><tbody><tr><td>a</td><td>table_one</td><td>2024-06-01</td><td>true</td></tr><tr><td>a</td><td>table_two</td><td>2024-07-02</td><td>true</td></tr><tr><td>b</td><td>table_three</td><td>2024-08-02</td><td>false</td></tr><tr><td>b</td><td>table_four</td><td>2024-09-02</td><td>false</td></tr></tbody></table></div>"
]
},
"metadata": {
"application/vnd.databricks.v1+output": {
"addedWidgets": {},
"aggData": [],
"aggError": "",
"aggOverflow": false,
"aggSchema": [],
"aggSeriesLimitReached": false,
"aggType": "",
"arguments": {},
"columnCustomDisplayInfos": {},
"data": [
[
"a",
"table_one",
"2024-06-01",
true
],
[
"a",
"table_two",
"2024-07-02",
true
],
[
"b",
"table_three",
"2024-08-02",
false
],
[
"b",
"table_four",
"2024-09-02",
false
]
],
"datasetInfos": [],
"dbfsResultPath": null,
"isJsonSchema": true,
"metadata": {},
"overflow": false,
"plotOptions": {
"customPlotOptions": {},
"displayType": "table",
"pivotAggregation": null,
"pivotColumns": null,
"xColumns": null,
"yColumns": null
},
"removedWidgets": [],
"schema": [
{
"metadata": "{}",
"name": "database",
"type": "\"string\""
},
{
"metadata": "{}",
"name": "table",
"type": "\"string\""
},
{
"metadata": "{}",
"name": "create_date",
"type": "\"date\""
},
{
"metadata": "{}",
"name": "is_frozen",
"type": "\"boolean\""
}
],
"type": "table"
}
},
"output_type": "display_data"
}
],
"source": [
"df = (\n",
" df.withColumn(\n",
" \"is_frozen\",\n",
" F.add_months(F.col(\"create_date\"), MONTHS_TO_BE_FROZEN) < TODAY\n",
" )\n",
")\n",
"\n",
"df.display()"
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"rowLimit": 10000
},
"inputWidgets": {},
"nuid": "4d0b6910-e45c-4c24-ad83-0438cece4e79",
"showTitle": false,
"title": ""
}
},
"source": [
"# **Create days_of_created column**"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"rowLimit": 10000
},
"inputWidgets": {},
"nuid": "312e4c29-f079-4ced-bebd-cfe0b5843e68",
"showTitle": false,
"title": ""
}
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"<style scoped>\n",
" .table-result-container {\n",
" max-height: 300px;\n",
" overflow: auto;\n",
" }\n",
" table, th, td {\n",
" border: 1px solid black;\n",
" border-collapse: collapse;\n",
" }\n",
" th, td {\n",
" padding: 5px;\n",
" }\n",
" th {\n",
" text-align: left;\n",
" }\n",
"</style><div class='table-result-container'><table class='table-result'><thead style='background-color: white'><tr><th>database</th><th>table</th><th>create_date</th><th>is_frozen</th><th>days_of_created</th></tr></thead><tbody><tr><td>a</td><td>table_one</td><td>2024-06-01</td><td>true</td><td>140</td></tr><tr><td>a</td><td>table_two</td><td>2024-07-02</td><td>true</td><td>109</td></tr><tr><td>b</td><td>table_three</td><td>2024-08-02</td><td>false</td><td>78</td></tr><tr><td>b</td><td>table_four</td><td>2024-09-02</td><td>false</td><td>47</td></tr></tbody></table></div>"
]
},
"metadata": {
"application/vnd.databricks.v1+output": {
"addedWidgets": {},
"aggData": [],
"aggError": "",
"aggOverflow": false,
"aggSchema": [],
"aggSeriesLimitReached": false,
"aggType": "",
"arguments": {},
"columnCustomDisplayInfos": {},
"data": [
[
"a",
"table_one",
"2024-06-01",
true,
140
],
[
"a",
"table_two",
"2024-07-02",
true,
109
],
[
"b",
"table_three",
"2024-08-02",
false,
78
],
[
"b",
"table_four",
"2024-09-02",
false,
47
]
],
"datasetInfos": [],
"dbfsResultPath": null,
"isJsonSchema": true,
"metadata": {},
"overflow": false,
"plotOptions": {
"customPlotOptions": {},
"displayType": "table",
"pivotAggregation": null,
"pivotColumns": null,
"xColumns": null,
"yColumns": null
},
"removedWidgets": [],
"schema": [
{
"metadata": "{}",
"name": "database",
"type": "\"string\""
},
{
"metadata": "{}",
"name": "table",
"type": "\"string\""
},
{
"metadata": "{}",
"name": "create_date",
"type": "\"date\""
},
{
"metadata": "{}",
"name": "is_frozen",
"type": "\"boolean\""
},
{
"metadata": "{}",
"name": "days_of_created",
"type": "\"integer\""
}
],
"type": "table"
}
},
"output_type": "display_data"
}
],
"source": [
"df = (\n",
" df.withColumn(\n",
" \"days_of_created\",\n",
" F.datediff(TODAY, F.col(\"create_date\"))\n",
" )\n",
")\n",
"\n",
"df.display()"
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"rowLimit": 10000
},
"inputWidgets": {},
"nuid": "d5e19499-b5c2-449e-96f4-ee646def8411",
"showTitle": false,
"title": ""
}
},
"source": [
"# **Create months_of_created column**"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"rowLimit": 10000
},
"inputWidgets": {},
"nuid": "a6def84a-eec3-4fc6-8edc-04db7700bd21",
"showTitle": false,
"title": ""
}
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"<style scoped>\n",
" .table-result-container {\n",
" max-height: 300px;\n",
" overflow: auto;\n",
" }\n",
" table, th, td {\n",
" border: 1px solid black;\n",
" border-collapse: collapse;\n",
" }\n",
" th, td {\n",
" padding: 5px;\n",
" }\n",
" th {\n",
" text-align: left;\n",
" }\n",
"</style><div class='table-result-container'><table class='table-result'><thead style='background-color: white'><tr><th>database</th><th>table</th><th>create_date</th><th>is_frozen</th><th>days_of_created</th><th>months_of_created</th></tr></thead><tbody><tr><td>a</td><td>table_one</td><td>2024-06-01</td><td>true</td><td>140</td><td>4</td></tr><tr><td>a</td><td>table_two</td><td>2024-07-02</td><td>true</td><td>109</td><td>3</td></tr><tr><td>b</td><td>table_three</td><td>2024-08-02</td><td>false</td><td>78</td><td>2</td></tr><tr><td>b</td><td>table_four</td><td>2024-09-02</td><td>false</td><td>47</td><td>1</td></tr></tbody></table></div>"
]
},
"metadata": {
"application/vnd.databricks.v1+output": {
"addedWidgets": {},
"aggData": [],
"aggError": "",
"aggOverflow": false,
"aggSchema": [],
"aggSeriesLimitReached": false,
"aggType": "",
"arguments": {},
"columnCustomDisplayInfos": {},
"data": [
[
"a",
"table_one",
"2024-06-01",
true,
140,
4
],
[
"a",
"table_two",
"2024-07-02",
true,
109,
3
],
[
"b",
"table_three",
"2024-08-02",
false,
78,
2
],
[
"b",
"table_four",
"2024-09-02",
false,
47,
1
]
],
"datasetInfos": [],
"dbfsResultPath": null,
"isJsonSchema": true,
"metadata": {},
"overflow": false,
"plotOptions": {
"customPlotOptions": {},
"displayType": "table",
"pivotAggregation": null,
"pivotColumns": null,
"xColumns": null,
"yColumns": null
},
"removedWidgets": [],
"schema": [
{
"metadata": "{}",
"name": "database",
"type": "\"string\""
},
{
"metadata": "{}",
"name": "table",
"type": "\"string\""
},
{
"metadata": "{}",
"name": "create_date",
"type": "\"date\""
},
{
"metadata": "{}",
"name": "is_frozen",
"type": "\"boolean\""
},
{
"metadata": "{}",
"name": "days_of_created",
"type": "\"integer\""
},
{
"metadata": "{}",
"name": "months_of_created",
"type": "\"long\""
}
],
"type": "table"
}
},
"output_type": "display_data"
}
],
"source": [
"df = (\n",
" df.withColumn(\n",
" \"months_of_created\",\n",
" F.floor(F.months_between(TODAY, F.col(\"create_date\")))\n",
" )\n",
")\n",
"\n",
"df.display()"
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"rowLimit": 10000
},
"inputWidgets": {},
"nuid": "7dbaa4c6-9297-45bd-a52a-605c2c3b2077",
"showTitle": false,
"title": ""
}
},
"source": [
"# **Create create_quarter column**"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"rowLimit": 10000
},
"inputWidgets": {},
"nuid": "dd880d5a-8322-48c2-b77f-6b146622554e",
"showTitle": false,
"title": ""
}
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"<style scoped>\n",
" .table-result-container {\n",
" max-height: 300px;\n",
" overflow: auto;\n",
" }\n",
" table, th, td {\n",
" border: 1px solid black;\n",
" border-collapse: collapse;\n",
" }\n",
" th, td {\n",
" padding: 5px;\n",
" }\n",
" th {\n",
" text-align: left;\n",
" }\n",
"</style><div class='table-result-container'><table class='table-result'><thead style='background-color: white'><tr><th>database</th><th>table</th><th>create_date</th><th>is_frozen</th><th>days_of_created</th><th>months_of_created</th><th>create_quarter</th></tr></thead><tbody><tr><td>a</td><td>table_one</td><td>2024-06-01</td><td>true</td><td>140</td><td>4</td><td>2</td></tr><tr><td>a</td><td>table_two</td><td>2024-07-02</td><td>true</td><td>109</td><td>3</td><td>3</td></tr><tr><td>b</td><td>table_three</td><td>2024-08-02</td><td>false</td><td>78</td><td>2</td><td>3</td></tr><tr><td>b</td><td>table_four</td><td>2024-09-02</td><td>false</td><td>47</td><td>1</td><td>3</td></tr></tbody></table></div>"
]
},
"metadata": {
"application/vnd.databricks.v1+output": {
"addedWidgets": {},
"aggData": [],
"aggError": "",
"aggOverflow": false,
"aggSchema": [],
"aggSeriesLimitReached": false,
"aggType": "",
"arguments": {},
"columnCustomDisplayInfos": {},
"data": [
[
"a",
"table_one",
"2024-06-01",
true,
140,
4,
2
],
[
"a",
"table_two",
"2024-07-02",
true,
109,
3,
3
],
[
"b",
"table_three",
"2024-08-02",
false,
78,
2,
3
],
[
"b",
"table_four",
"2024-09-02",
false,
47,
1,
3
]
],
"datasetInfos": [],
"dbfsResultPath": null,
"isJsonSchema": true,
"metadata": {},
"overflow": false,
"plotOptions": {
"customPlotOptions": {},
"displayType": "table",
"pivotAggregation": null,
"pivotColumns": null,
"xColumns": null,
"yColumns": null
},
"removedWidgets": [],
"schema": [
{
"metadata": "{}",
"name": "database",
"type": "\"string\""
},
{
"metadata": "{}",
"name": "table",
"type": "\"string\""
},
{
"metadata": "{}",
"name": "create_date",
"type": "\"date\""
},
{
"metadata": "{}",
"name": "is_frozen",
"type": "\"boolean\""
},
{
"metadata": "{}",
"name": "days_of_created",
"type": "\"integer\""
},
{
"metadata": "{}",
"name": "months_of_created",
"type": "\"long\""
},
{
"metadata": "{}",
"name": "create_quarter",
"type": "\"integer\""
}
],
"type": "table"
}
},
"output_type": "display_data"
}
],
"source": [
"df = (\n",
" df.withColumn(\n",
" \"create_quarter\",\n",
" F.quarter(F.col(\"create_date\"))\n",
" )\n",
")\n",
"\n",
"df.display()"
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"rowLimit": 10000
},
"inputWidgets": {},
"nuid": "7051f946-843f-4837-9ba9-c63829a3fd20",
"showTitle": false,
"title": ""
}
},
"source": [
"# **Format create_date column**"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"rowLimit": 10000
},
"inputWidgets": {},
"nuid": "30311f85-d578-4409-86d4-e324f143c3ca",
"showTitle": false,
"title": ""
}
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"<style scoped>\n",
" .table-result-container {\n",
" max-height: 300px;\n",
" overflow: auto;\n",
" }\n",
" table, th, td {\n",
" border: 1px solid black;\n",
" border-collapse: collapse;\n",
" }\n",
" th, td {\n",
" padding: 5px;\n",
" }\n",
" th {\n",
" text-align: left;\n",
" }\n",
"</style><div class='table-result-container'><table class='table-result'><thead style='background-color: white'><tr><th>database</th><th>table</th><th>create_date</th><th>is_frozen</th><th>days_of_created</th><th>months_of_created</th><th>create_quarter</th></tr></thead><tbody><tr><td>a</td><td>table_one</td><td>01/06/2024</td><td>true</td><td>140</td><td>4</td><td>2</td></tr><tr><td>a</td><td>table_two</td><td>02/07/2024</td><td>true</td><td>109</td><td>3</td><td>3</td></tr><tr><td>b</td><td>table_three</td><td>02/08/2024</td><td>false</td><td>78</td><td>2</td><td>3</td></tr><tr><td>b</td><td>table_four</td><td>02/09/2024</td><td>false</td><td>47</td><td>1</td><td>3</td></tr></tbody></table></div>"
]
},
"metadata": {
"application/vnd.databricks.v1+output": {
"addedWidgets": {},
"aggData": [],
"aggError": "",
"aggOverflow": false,
"aggSchema": [],
"aggSeriesLimitReached": false,
"aggType": "",
"arguments": {},
"columnCustomDisplayInfos": {},
"data": [
[
"a",
"table_one",
"01/06/2024",
true,
140,
4,
2
],
[
"a",
"table_two",
"02/07/2024",
true,
109,
3,
3
],
[
"b",
"table_three",
"02/08/2024",
false,
78,
2,
3
],
[
"b",
"table_four",
"02/09/2024",
false,
47,
1,
3
]
],
"datasetInfos": [],
"dbfsResultPath": null,
"isJsonSchema": true,
"metadata": {},
"overflow": false,
"plotOptions": {
"customPlotOptions": {},
"displayType": "table",
"pivotAggregation": null,
"pivotColumns": null,
"xColumns": null,
"yColumns": null
},
"removedWidgets": [],
"schema": [
{
"metadata": "{}",
"name": "database",
"type": "\"string\""
},
{
"metadata": "{}",
"name": "table",
"type": "\"string\""
},
{
"metadata": "{}",
"name": "create_date",
"type": "\"string\""
},
{
"metadata": "{}",
"name": "is_frozen",
"type": "\"boolean\""
},
{
"metadata": "{}",
"name": "days_of_created",
"type": "\"integer\""
},
{
"metadata": "{}",
"name": "months_of_created",
"type": "\"long\""
},
{
"metadata": "{}",
"name": "create_quarter",
"type": "\"integer\""
}
],
"type": "table"
}
},
"output_type": "display_data"
}
],
"source": [
"df = (\n",
" df.withColumn(\n",
" \"create_date\",\n",
" F.date_format(df[\"create_date\"], DATE_FORMAT)\n",
" )\n",
")\n",
"\n",
"df.display()"
]
}
],
"metadata": {
"application/vnd.databricks.v1+notebook": {
"dashboards": [],
"environmentMetadata": null,
"language": "python",
"notebookMetadata": {
"mostRecentlyExecutedCommandWithImplicitDF": {
"commandId": 913038370141173,
"dataframes": [
"_sqldf"
]
},
"pythonIndentUnit": 4
},
"notebookName": "date_in_pyspark",
"widgets": {}
},
"kernelspec": {
"display_name": "clean_db",
"language": "python",
"name": "clean_db"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
},
"vscode": {
"interpreter": {
"hash": "916dbcbb3f70747c44a77c7bcd40155683ae19c65e1c03b4aa3499c5328201f1"
}
}
},
"nbformat": 4,
"nbformat_minor": 0
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment