{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Comandos equivalentes de SQL en Pandas --- 18:15 min\n", "===\n", "\n", "* 18:15 min | Última modificación: Octubre 6, 2021 " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "El lenguaje SQL es una de las principales herramientas para la gestión de datos, por lo que su uso es ampliamente difundido. En este documento se discute la equivalencia entre los principales comandos de Pandas y SQL." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Al finalizar el documento, usted estará en capacidad ejecutar en Pandas comandos equivalentes a los siguientes operadores de SQL:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* SELECT\n", "\n", "* SEELCT DISTINCT\n", "\n", "* ORDER BY\n", "\n", "* WHERE\n", "\n", "* GROUP BY\n", "\n", "* INNER JOIN\n", "\n", "* LEFT OUTER JOIN\n", "\n", "* RIGHT OUTER JOIN\n", "\n", "* FULL JOIN\n", "\n", "* UNION\n", "\n", "* UPDATE\n", "\n", "* DELETE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Adaptado de: https://pandas.pydata.org/pandas-docs/stable/getting_started/comparison/comparison_with_sql.html" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Preparación" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "import pandas as pd\n", "\n", "pd.set_option(\"display.notebook_repr_html\", False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Carga de datos" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "tips = pd.read_csv(\n", " \"https://raw.githubusercontent.com/jdvelasq/playground/master/datasets/tips.csv\",\n", " sep=\",\",\n", " thousands=None,\n", " decimal=\".\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## SELECT" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " total_bill tip smoker time\n", "0 16.99 1.01 No Dinner\n", "1 10.34 1.66 No Dinner\n", "2 21.01 3.50 No Dinner\n", "3 23.68 3.31 No Dinner\n", "4 24.59 3.61 No Dinner" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# SELECT\n", "# total_bill,\n", "# tip,\n", "# smoker,\n", "# time\n", "# FROM\n", "# tips_df\n", "# LIMIT 5;\n", "#\n", "tips[\n", " [\n", " \"total_bill\",\n", " \"tip\",\n", " \"smoker\",\n", " \"time\",\n", " ]\n", "].head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## SELECT DISTINCT" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['Sun', 'Sat', 'Thur', 'Fri'], dtype=object)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# SELECT DISTINCT\n", "# day\n", "# FROM\n", "# tips_df\n", "# LIMIT 5;\n", "#\n", "tips[\"day\"].unique()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## ORDER BY" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " total_bill tip sex smoker day time size\n", "67 3.07 1.00 Female Yes Sat Dinner 1\n", "92 5.75 1.00 Female Yes Fri Dinner 2\n", "111 7.25 1.00 Female No Sat Dinner 1\n", "172 7.25 5.15 Male Yes Sun Dinner 2\n", "149 7.51 2.00 Male No Thur Lunch 2" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# SELECT \n", "# *\n", "# FROM \n", "# tips\n", "# ORDER BY \n", "# total_bill ASC\n", "# LIMIT 5;\n", "#\n", "tips.sort_values('total_bill').head(5)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " total_bill tip sex smoker day time size\n", "67 3.07 1.00 Female Yes Sat Dinner 1\n", "92 5.75 1.00 Female Yes Fri Dinner 2\n", "111 7.25 1.00 Female No Sat Dinner 1\n", "172 7.25 5.15 Male Yes Sun Dinner 2\n", "149 7.51 2.00 Male No Thur Lunch 2" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tips.sort_values(\n", " \"total_bill\",\n", " ascending=True,\n", ").head(5)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " total_bill tip sex smoker day time size\n", "170 50.81 10.00 Male Yes Sat Dinner 3\n", "212 48.33 9.00 Male No Sat Dinner 4\n", "59 48.27 6.73 Male No Sat Dinner 4\n", "156 48.17 5.00 Male No Sun Dinner 6\n", "182 45.35 3.50 Male Yes Sun Dinner 3" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# SELECT\n", "# *\n", "# FROM\n", "# df\n", "# ORDER BY\n", "# total_bill DESC\n", "# LIMIT 5;\n", "#\n", "tips.sort_values(\n", " \"total_bill\",\n", " ascending=False,\n", ").head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## WHERE" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " total_bill tip sex smoker day time size\n", "0 16.99 1.01 Female No Sun Dinner 2\n", "1 10.34 1.66 Male No Sun Dinner 3\n", "2 21.01 3.50 Male No Sun Dinner 3\n", "3 23.68 3.31 Male No Sun Dinner 2\n", "4 24.59 3.61 Female No Sun Dinner 4" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# SELECT \n", "# *\n", "# FROM \n", "# df\n", "# WHERE \n", "# time = 'Dinner'\n", "# LIMIT 5;\n", "#\n", "tips[tips['time'] == 'Dinner'].head(5)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True 176\n", "False 68\n", "Name: time, dtype: int64" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "is_dinner = tips['time'] == 'Dinner'\n", "is_dinner.value_counts()" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " total_bill tip sex smoker day time size\n", "0 16.99 1.01 Female No Sun Dinner 2\n", "1 10.34 1.66 Male No Sun Dinner 3\n", "2 21.01 3.50 Male No Sun Dinner 3\n", "3 23.68 3.31 Male No Sun Dinner 2\n", "4 24.59 3.61 Female No Sun Dinner 4" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tips[is_dinner].head(5)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " total_bill tip sex smoker day time size\n", "0 16.99 1.01 Female No Sun Dinner 2\n", "1 10.34 1.66 Male No Sun Dinner 3\n", "2 21.01 3.50 Male No Sun Dinner 3\n", "3 23.68 3.31 Male No Sun Dinner 2\n", "4 24.59 3.61 Female No Sun Dinner 4" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tips.query(\"time == 'Dinner'\").head(5)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " total_bill tip sex smoker day time size\n", "23 39.42 7.58 Male No Sat Dinner 4\n", "44 30.40 5.60 Male No Sun Dinner 4\n", "47 32.40 6.00 Male No Sun Dinner 4\n", "52 34.81 5.20 Female No Sun Dinner 4\n", "59 48.27 6.73 Male No Sat Dinner 4\n", "116 29.93 5.07 Male No Sun Dinner 4\n", "155 29.85 5.14 Female No Sun Dinner 5\n", "170 50.81 10.00 Male Yes Sat Dinner 3\n", "172 7.25 5.15 Male Yes Sun Dinner 2\n", "181 23.33 5.65 Male Yes Sun Dinner 2\n", "183 23.17 6.50 Male Yes Sun Dinner 4\n", "211 25.89 5.16 Male Yes Sat Dinner 4\n", "212 48.33 9.00 Male No Sat Dinner 4\n", "214 28.17 6.50 Female Yes Sat Dinner 3\n", "239 29.03 5.92 Male No Sat Dinner 3" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# SELECT \n", "# *\n", "# FROM \n", "# df\n", "# WHERE \n", "# time = 'Dinner' AND tip > 5.00;\n", "#\n", "tips[(tips['time'] == 'Dinner') & (tips['tip'] > 5.00)]" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " total_bill tip sex smoker day time size\n", "59 48.27 6.73 Male No Sat Dinner 4\n", "125 29.80 4.20 Female No Thur Lunch 6\n", "141 34.30 6.70 Male No Thur Lunch 6\n", "142 41.19 5.00 Male No Thur Lunch 5\n", "143 27.05 5.00 Female No Thur Lunch 6\n", "155 29.85 5.14 Female No Sun Dinner 5\n", "156 48.17 5.00 Male No Sun Dinner 6\n", "170 50.81 10.00 Male Yes Sat Dinner 3\n", "182 45.35 3.50 Male Yes Sun Dinner 3\n", "185 20.69 5.00 Male No Sun Dinner 5\n", "187 30.46 2.00 Male Yes Sun Dinner 5\n", "212 48.33 9.00 Male No Sat Dinner 4\n", "216 28.15 3.00 Male Yes Sat Dinner 5" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# SELECT \n", "# *\n", "# FROM\n", "# df\n", "# WHERE\n", "# size >= 5 OR total_bill > 45;\n", "#\n", "tips[(tips['size'] >= 5) | (tips['total_bill'] > 45)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## GROUP BY" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "sex\n", "Female 87\n", "Male 157\n", "dtype: int64" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# SELECT \n", "# sex, \n", "# count(*)\n", "# FROM \n", "# df\n", "# GROUP BY \n", "# sex;\n", "#\n", "tips.groupby('sex').size()" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " total_bill tip smoker day time size\n", "sex \n", "Female 87 87 87 87 87 87\n", "Male 157 157 157 157 157 157" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tips.groupby('sex').count()" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "sex\n", "Female 87\n", "Male 157\n", "Name: total_bill, dtype: int64" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tips.groupby('sex')['total_bill'].count()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " tip day\n", "day \n", "Fri 2.734737 19\n", "Sat 2.993103 87\n", "Sun 3.255132 76\n", "Thur 2.771452 62" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "#  SELECT \n", "# day, \n", "# AVG(tip), \n", "# COUNT(*)\n", "# FROM \n", "# df\n", "# GROUP BY \n", "# day;\n", "#\n", "tips.groupby('day').agg({'tip': np.mean, 'day': np.size})" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " tip \n", " size mean\n", "smoker day \n", "No Fri 4.0 2.812500\n", " Sat 45.0 3.102889\n", " Sun 57.0 3.167895\n", " Thur 45.0 2.673778\n", "Yes Fri 15.0 2.714000\n", " Sat 42.0 2.875476\n", " Sun 19.0 3.516842\n", " Thur 17.0 3.030000" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# SELECT \n", "# smoker, \n", "# day, \n", "# COUNT(*), \n", "# AVG(tip)\n", "# FROM \n", "# df\n", "# GROUP BY \n", "# smoker, \n", "# day;\n", "#\n", "tips.groupby(['smoker', 'day']).agg({'tip': [np.size, np.mean]})" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "scrolled": true }, "outputs": [], "source": [ "#\n", "# Funciones que pueden ser aplicadas a un DataFrame:\n", "#\n", "# abs all any clip clip_lower clip_upper\n", "# corr corrwith count cov cummax cummin\n", "# cumprod cumsum describe diff eval kurt\n", "# mad max mean median min mode\n", "#  ct_change prod quantile rank round sem\n", "# skew sum std var\n", "#" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## INNER JOIN" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " key value\n", "0 A 0.388804\n", "1 B -1.410434\n", "2 C 0.878714\n", "3 D -0.818024" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# DataFrames de ejemplo\n", "#\n", "df1 = pd.DataFrame(\n", " {\n", " \"key\": [\"A\", \"B\", \"C\", \"D\"], \n", " \"value\": np.random.randn(4)\n", " }\n", ")\n", "\n", "df1" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " key value\n", "0 B 1.360152\n", "1 D -0.489195\n", "2 D 0.406400\n", "3 E 0.835870" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2 = pd.DataFrame(\n", " {\n", " 'key': ['B', 'D', 'D', 'E'],\n", " 'value': np.random.randn(4)\n", " }\n", ")\n", "\n", "df2" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " key value_x value_y\n", "0 B -1.410434 1.360152\n", "1 D -0.818024 -0.489195\n", "2 D -0.818024 0.406400" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# SELECT\n", "# *\n", "# FROM\n", "# df1\n", "# INNER JOIN\n", "# df2\n", "# ON df1.key = df2.key;\n", "#\n", "pd.merge(\n", " df1,\n", " df2,\n", " on=\"key\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## LEFT OUTER JOIN" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " key value_x value_y\n", "0 A 0.388804 NaN\n", "1 B -1.410434 1.360152\n", "2 C 0.878714 NaN\n", "3 D -0.818024 -0.489195\n", "4 D -0.818024 0.406400" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# SELECT\n", "# *\n", "# FROM\n", "# df1\n", "# LEFT OUTER JOIN\n", "# df2\n", "# ON\n", "# df1.key = df2.key;\n", "#\n", "pd.merge(\n", " df1,\n", " df2,\n", " on=\"key\",\n", " how=\"left\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## RIGHT OUTER JOIN" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " key value_x value_y\n", "0 B -1.410434 1.360152\n", "1 D -0.818024 -0.489195\n", "2 D -0.818024 0.406400\n", "3 E NaN 0.835870" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# SELECT\n", "# *\n", "# FROM\n", "# df1\n", "# RIGHT OUTER JOIN\n", "# df2\n", "# ON\n", "# df1.key = df2.key;\n", "#\n", "pd.merge(\n", " df1,\n", " df2,\n", " on=\"key\",\n", " how=\"right\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## FULL OUTER JOIN " ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " key value_x value_y\n", "0 A 0.388804 NaN\n", "1 B -1.410434 1.360152\n", "2 C 0.878714 NaN\n", "3 D -0.818024 -0.489195\n", "4 D -0.818024 0.406400\n", "5 E NaN 0.835870" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "#  SELECT \n", "# *\n", "# FROM \n", "# df1\n", "# FULL OUTER JOIN \n", "# df2\n", "# ON \n", "# df1.key = df2.key;\n", "#\n", "pd.merge(df1, df2, on='key', how='outer')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## UNION" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " city rank\n", "0 Chicago 1\n", "1 San Francisco 2\n", "2 New York City 3" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 = pd.DataFrame(\n", " {\n", " 'city': ['Chicago', 'San Francisco', 'New York City'],\n", " 'rank': range(1, 4)\n", " }\n", ")\n", "\n", "df1" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " city rank\n", "0 Chicago 1\n", "1 Boston 4\n", "2 Los Angeles 5" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2 = pd.DataFrame(\n", " {\n", " 'city': ['Chicago', 'Boston', 'Los Angeles'],\n", " 'rank': [1, 4, 5]\n", " }\n", ")\n", "\n", "df2" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " city rank\n", "0 Chicago 1\n", "1 San Francisco 2\n", "2 New York City 3\n", "0 Chicago 1\n", "1 Boston 4\n", "2 Los Angeles 5" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# SELECT city, rank\n", "# FROM df1\n", "# UNION ALL\n", "# SELECT city, rank\n", "# FROM df2;\n", "#\n", "pd.concat(\n", " [df1, df2],\n", ")" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " city rank\n", "0 Chicago 1\n", "1 San Francisco 2\n", "2 New York City 3\n", "1 Boston 4\n", "2 Los Angeles 5" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# SELECT\n", "# city,\n", "# rank\n", "# FROM\n", "# df1\n", "# UNION\n", "# SELECT\n", "# city,\n", "# rank\n", "# FROM\n", "# df2\n", "#\n", "pd.concat(\n", " [df1, df2],\n", ").drop_duplicates()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Primeras N filas con desplazamiento (offset)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " total_bill tip sex smoker day time size\n", "183 23.17 6.50 Male Yes Sun Dinner 4\n", "214 28.17 6.50 Female Yes Sat Dinner 3\n", "47 32.40 6.00 Male No Sun Dinner 4\n", "239 29.03 5.92 Male No Sat Dinner 3\n", "88 24.71 5.85 Male No Thur Lunch 2\n", "181 23.33 5.65 Male Yes Sun Dinner 2\n", "44 30.40 5.60 Male No Sun Dinner 4\n", "52 34.81 5.20 Female No Sun Dinner 4\n", "85 34.83 5.17 Female No Thur Lunch 4\n", "211 25.89 5.16 Male Yes Sat Dinner 4" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# SELECT \n", "# * \n", "# FROM \n", "# tips\n", "# ORDER BY \n", "# tip DESC\n", "# LIMIT 10 OFFSET 5;\n", "#\n", "tips.nlargest(10 + 5, columns='tip').tail(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Primeras N filas por grupo" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " total_bill tip sex smoker day time size rn\n", "95 40.17 4.73 Male Yes Fri Dinner 4 1\n", "90 28.97 3.00 Male Yes Fri Dinner 2 2\n", "170 50.81 10.00 Male Yes Sat Dinner 3 1\n", "212 48.33 9.00 Male No Sat Dinner 4 2\n", "156 48.17 5.00 Male No Sun Dinner 6 1\n", "182 45.35 3.50 Male Yes Sun Dinner 3 2\n", "197 43.11 5.00 Female Yes Thur Lunch 4 1\n", "142 41.19 5.00 Male No Thur Lunch 5 2" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# SELECT * FROM (\n", "# SELECT\n", "# t.*,\n", "# ROW_NUMBER() OVER(PARTITION BY day ORDER BY total_bill DESC) AS rn\n", "# FROM df t\n", "# )\n", "# WHERE rn < 3\n", "# ORDER BY day, rn;\n", "#\n", "(\n", " tips.assign(\n", " rn=tips.sort_values(\n", " [\"total_bill\"],\n", " ascending=False,\n", " )\n", " .groupby([\"day\"])\n", " .cumcount()\n", " + 1\n", " )\n", " .query(\"rn < 3\")\n", " .sort_values([\"day\", \"rn\"])\n", ")" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " total_bill tip sex smoker day time size rnk\n", "95 40.17 4.73 Male Yes Fri Dinner 4 1.0\n", "90 28.97 3.00 Male Yes Fri Dinner 2 2.0\n", "170 50.81 10.00 Male Yes Sat Dinner 3 1.0\n", "212 48.33 9.00 Male No Sat Dinner 4 2.0\n", "156 48.17 5.00 Male No Sun Dinner 6 1.0\n", "182 45.35 3.50 Male Yes Sun Dinner 3 2.0\n", "197 43.11 5.00 Female Yes Thur Lunch 4 1.0\n", "142 41.19 5.00 Male No Thur Lunch 5 2.0" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(\n", " tips.assign(\n", " rnk=tips.groupby([\"day\"])[\"total_bill\"].rank(\n", " method=\"first\",\n", " ascending=False,\n", " )\n", " )\n", " .query(\"rnk < 3\")\n", " .sort_values([\"day\", \"rnk\"])\n", ")" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " total_bill tip sex smoker day time size rnk_min\n", "67 3.07 1.00 Female Yes Sat Dinner 1 1.0\n", "92 5.75 1.00 Female Yes Fri Dinner 2 1.0\n", "111 7.25 1.00 Female No Sat Dinner 1 1.0\n", "236 12.60 1.00 Male Yes Sat Dinner 2 1.0\n", "237 32.83 1.17 Male Yes Sat Dinner 2 2.0" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# SELECT * FROM (\n", "# SELECT\n", "# t.*,\n", "# RANK() OVER(PARTITION BY sex ORDER BY tip) AS rnk\n", "# FROM tips t\n", "# WHERE tip < 2\n", "# )\n", "# WHERE rnk < 3\n", "# ORDER BY sex, rnk;\n", "#\n", "(\n", " tips[tips[\"tip\"] < 2]\n", " .assign(rnk_min=tips.groupby([\"sex\"])[\"tip\"].rank(method=\"min\"))\n", " .query(\"rnk_min < 3\")\n", " .sort_values([\"sex\", \"rnk_min\"])\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## UPDATE" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " total_bill tip sex smoker day time size\n", "0 16.99 2.02 Female No Sun Dinner 2\n", "1 10.34 3.32 Male No Sun Dinner 3\n", "2 21.01 3.50 Male No Sun Dinner 3\n", "3 23.68 3.31 Male No Sun Dinner 2\n", "4 24.59 3.61 Female No Sun Dinner 4" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# UPDATE \n", "# tips\n", "# SET \n", "# tip = tip*2\n", "# WHERE \n", "# tip < 2;\n", "#\n", "tips.loc[tips['tip'] < 2, 'tip'] *= 2\n", "tips.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## DELETE" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " total_bill tip sex smoker day time size\n", "0 16.99 2.02 Female No Sun Dinner 2\n", "1 10.34 3.32 Male No Sun Dinner 3\n", "2 21.01 3.50 Male No Sun Dinner 3\n", "3 23.68 3.31 Male No Sun Dinner 2\n", "4 24.59 3.61 Female No Sun Dinner 4\n", "5 25.29 4.71 Male No Sun Dinner 4\n", "6 8.77 2.00 Male No Sun Dinner 2\n", "7 26.88 3.12 Male No Sun Dinner 4\n", "8 15.04 3.92 Male No Sun Dinner 2\n", "9 14.78 3.23 Male No Sun Dinner 2" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# \n", "# DELETE FROM \n", "# tips\n", "# WHERE \n", "# tip > 9;\n", "#\n", "tips.loc[tips['tip'] <= 9].head(10)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 4 }