{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Row selection and handling --- 5:35 min\n", "\n", "* 5:35 min | Last modified: October 6, 2021 " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pandas is a high-performance Python library for data manipulation and analysis, widely used in analytics and data science, so mastering it is essential. Pandas specializes in \"tidy\" structures, that is, data tables where each row is a record and each column is an attribute." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This tutorial is based on https://es.hortonworks.com/tutorial/beginners-guide-to-apache-pig/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "pd.set_option(\"display.notebook_repr_html\", False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading the driver data" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " driverId ssn location certified \\\n", "name \n", "George Vetticaden 10 621011971 244-4532 Nulla Rd. N \n", "Jamie Engesser 11 262112338 366-4125 Ac Street N \n", "Paul Coddin 12 198041975 Ap #622-957 Risus. Street Y \n", "Joe Niemiec 13 139907145 2071 Hendrerit. Ave Y \n", "Adis Cesir 14 820812209 Ap #810-1228 In St. 
Y \n", "\n", " wage-plan \n", "name \n", "George Vetticaden miles \n", "Jamie Engesser miles \n", "Paul Coddin hours \n", "Joe Niemiec hours \n", "Adis Cesir hours " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# Load the file from a GitHub repo\n", "#\n", "drivers = pd.read_csv(\n", " \"https://raw.githubusercontent.com/jdvelasq/playground/master/datasets/drivers/drivers.csv\",\n", " sep=\",\",\n", " thousands=None,\n", " decimal=\".\",\n", ")\n", "\n", "drivers.set_index(\"name\", inplace=True)\n", "drivers.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting a subset of records" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " driverId ssn location certified wage-plan\n", "name \n", "Joe Niemiec 13 139907145 2071 Hendrerit. Ave Y hours\n", "Adis Cesir 14 820812209 Ap #810-1228 In St. Y hours\n", "Rohit Bakshi 15 239005227 648-5681 Dui- Rd. Y hours" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "drivers[3:6]" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " driverId ssn location certified wage-plan\n", "name \n", "Joe Niemiec 13 139907145 2071 Hendrerit. Ave Y hours\n", "Rohit Bakshi 15 239005227 648-5681 Dui- Rd. Y hours\n", "Adis Cesir 14 820812209 Ap #810-1228 In St. Y hours" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "drivers.loc[\n", " [\n", " \"Joe Niemiec\",\n", " \"Rohit Bakshi\",\n", " \"Adis Cesir\",\n", " ]\n", "]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sorting the index" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " driverId ssn location \\\n", "name \n", "Wes Floyd 37 190504074 P.O. Box 269- 9611 Nulla Street \n", "Tom McCuch 16 363303105 P.O. 
Box 313- 962 Parturient Rd. \n", "Teddy Choi 29 185502192 P.O. Box 106- 7003 Amet Rd. \n", "Sridhara Sabbella 33 967409015 Ap #477-2507 Sagittis Avenue \n", "Scott Shaw 38 386411175 276 Lobortis Road \n", "\n", " certified wage-plan \n", "name \n", "Wes Floyd Y hours \n", "Tom McCuch Y hours \n", "Teddy Choi Y hours \n", "Sridhara Sabbella Y hours \n", "Scott Shaw Y hours " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "drivers.sort_index(\n", " axis=0,\n", " ascending=False,\n", ").head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting a subset of rows using a regular expression" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " driverId ssn location \\\n", "name \n", "Jean-Philippe Playe 25 913310051 P.O. Box 812- 6238 Ac Rd. \n", "Michael Aube 26 124705141 P.O. Box 213- 8948 Nec Ave \n", "Dan Rice 30 282307061 Ap #881-9267 Mollis Avenue \n", "Andrew Grande 36 245303216 Ap #685-9598 Egestas Rd. \n", "\n", " certified wage-plan \n", "name \n", "Jean-Philippe Playe Y hours \n", "Michael Aube Y hours \n", "Dan Rice Y hours \n", "Andrew Grande Y hours " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# Names that end in 'e'.\n", "#\n", "drivers.filter(regex=\"e$\", axis=0).head()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " driverId ssn location certified \\\n", "name \n", "George Vetticaden 10 621011971 244-4532 Nulla Rd. N \n", "Nadeem Asghar 22 783204269 154-9147 Aliquam Ave Y \n", "Andrew Grande 36 245303216 Ap #685-9598 Egestas Rd. 
Y \n", "\n", " wage-plan \n", "name \n", "George Vetticaden miles \n", "Nadeem Asghar hours \n", "Andrew Grande hours " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# Names that contain 'de'.\n", "#\n", "drivers.filter(like=\"de\", axis=0).head()" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Renaming rows" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " driverId ssn location certified \\\n", "name \n", "GEORGE VETTICADEN 10 621011971 244-4532 Nulla Rd. N \n", "JAMIE ENGESSER 11 262112338 366-4125 Ac Street N \n", "Paul Coddin 12 198041975 Ap #622-957 Risus. Street Y \n", "Joe Niemiec 13 139907145 2071 Hendrerit. Ave Y \n", "Adis Cesir 14 820812209 Ap #810-1228 In St. Y \n", "\n", " wage-plan \n", "name \n", "GEORGE VETTICADEN miles \n", "JAMIE ENGESSER miles \n", "Paul Coddin hours \n", "Joe Niemiec hours \n", "Adis Cesir hours " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "drivers.rename(\n", " index={\n", " \"George Vetticaden\": \"George Vetticaden\".upper(),\n", " \"Jamie Engesser\": \"Jamie Engesser\".upper(),\n", " }\n", ").head()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" }, "toc-autonumbering": false, "toc-showmarkdowntxt": false }, "nbformat": 4, "nbformat_minor": 4 }