{ "cells": [ { "cell_type": "markdown", "id": "35df0cae-5159-40e9-bcd6-7ceb1d6b6bfc", "metadata": { "tags": [] }, "source": [ "Sets --- 8:13 min\n", "===\n", "\n", "* 8:13 min | Last modified: October 5, 2021" ] }, { "cell_type": "markdown", "id": "c62bfe64-44ee-4f45-a326-cb621a8141e3", "metadata": {}, "source": [ "A set is an unordered data structure whose elements are not repeated: each distinct value is stored only once." ] }, { "cell_type": "code", "execution_count": 1, "id": "4b5e8a47-ad64-47ef-8a1f-bad7fada6759", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'a', 'b', 'c', 'd'}" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# A set is a data structure\n", "# whose elements are not repeated\n", "#\n", "set_a = {\"a\", \"a\", \"b\", \"c\", \"d\", \"d\", \"d\"}\n", "set_a" ] }, { "cell_type": "code", "execution_count": 2, "id": "c2c01c1a-d15b-4931-8cc3-d14f140b01eb", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'a', 'b', 'c', 'd'}" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# set() builds a set from any iterable\n", "# (here a list); duplicates are dropped\n", "#\n", "set_a = set([\"a\", \"a\", \"b\", \"c\", \"d\", \"d\", \"d\"])\n", "set_a" ] }, { "cell_type": "code", "execution_count": 3, "id": "4e4e3900-1353-4ba0-a8de-739a8892186f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# Length (number of distinct elements)\n", "#\n", "len(set_a)" ] }, { "cell_type": "markdown", "id": "d7b30d7c-d4cd-48f4-94e7-eed321fd0855", "metadata": {}, "source": [ "## Methods" ] }, { "cell_type": "code", "execution_count": 4, "id": "94d5d3cc-46d5-4179-adf8-603e9ce6ded3", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{1, 'a', 'b', 'c', 'd'}" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# add()\n", "# 
===================================\n", "#\n", "set_a = {\"a\", \"b\", \"c\", \"d\"}\n", "set_a.add(1)\n", "set_a" ] }, { "cell_type": "code", "execution_count": 5, "id": "81deff90-1f19-4036-8557-f6b6ac3aa945", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'b', 'c', 'd'}" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# discard()\n", "# ===================================\n", "# Removes the element if it is present; unlike\n", "# remove(), it raises no error if it is missing\n", "#\n", "set_a = {\"a\", \"b\", \"c\", \"d\"}\n", "set_a.discard(\"a\")\n", "set_a" ] }, { "cell_type": "code", "execution_count": 6, "id": "0692ada7-5a0f-4488-ad36-c0f04e5bb198", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'a', 'b', 'c'}" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# pop()\n", "# ===================================\n", "# Removes and returns an arbitrary element;\n", "# which element is removed is not guaranteed\n", "#\n", "set_a = {\"a\", \"b\", \"c\", \"d\"}\n", "set_a.pop()\n", "set_a" ] }, { "cell_type": "code", "execution_count": 7, "id": "0b71fff1-9967-424f-8d52-d87172e85fe4", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{1, 2, 3, 4, 'a', 'b', 'c', 'd'}" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# update()\n", "# ===================================\n", "# Adds every element of another iterable\n", "# to set_a in place\n", "#\n", "set_a = {\"a\", \"b\", \"c\", \"d\"}\n", "set_b = {1, 2, 3, 4}\n", "set_a.update(set_b)\n", "set_a" ] }, { "cell_type": "code", "execution_count": 8, "id": "cd87c2a1-586a-454c-8d02-906859a2d63c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0, 1, 2}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# difference()\n", "# ===================================\n", "# Elements of set_a that are not in set_b\n", "#\n", "set_a = {0, 1, 2, 3, 4, 5}\n", "set_b = {3, 4, 5, 6, 7, 8}\n", "set_a.difference(set_b)" ] }, { "cell_type": "code", "execution_count": 9, "id": "889d7916-219e-4139-ba45-b820a5f7f8d5", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0, 1, 2, 3, 4, 5, 6, 7, 8}" ] }, 
"execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# union()\n", "# ===================================\n", "# Elements that are in set_a, in set_b, or in both\n", "#\n", "set_a = {0, 1, 2, 3, 4, 5}\n", "set_b = {3, 4, 5, 6, 7, 8}\n", "set_a.union(set_b)" ] }, { "cell_type": "code", "execution_count": 10, "id": "ccc04feb-322c-4023-861c-02a0382b27b2", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{3, 4, 5}" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# intersection()\n", "# ===================================\n", "# Elements that are in both set_a and set_b\n", "#\n", "set_a = {0, 1, 2, 3, 4, 5}\n", "set_b = {3, 4, 5, 6, 7, 8}\n", "set_a.intersection(set_b)" ] }, { "cell_type": "markdown", "id": "d4e2b414-4a9b-442b-86f5-801d4a58b455", "metadata": {}, "source": [ "## Operators" ] }, { "cell_type": "code", "execution_count": 11, "id": "e613a283-3e07-4ff8-b5b8-870b492873bf", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0, 1, 2}" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# Set difference\n", "# ===================================\n", "#\n", "set_a = {0, 1, 2, 3, 4, 5}\n", "set_b = {3, 4, 5, 6, 7, 8}\n", "set_a - set_b" ] }, { "cell_type": "code", "execution_count": 12, "id": "57544fd4-4e9a-4c43-81c9-42c0a5e0b32e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0, 1, 2, 3, 4, 5, 6, 7, 8}" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# Set union\n", "# ===================================\n", "#\n", "set_a = {0, 1, 2, 3, 4, 5}\n", "set_b = {3, 4, 5, 6, 7, 8}\n", "set_a | set_b" ] }, { "cell_type": "code", "execution_count": 13, "id": "dfecd2fb-4bbd-4608-af28-f62838807fa7", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{3, 4, 5}" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# Set intersection\n", "# 
===================================\n", "#\n", "set_a = {0, 1, 2, 3, 4, 5}\n", "set_b = {3, 4, 5, 6, 7, 8}\n", "set_a & set_b" ] }, { "cell_type": "code", "execution_count": 14, "id": "13c05070-9d31-466a-9850-a22fb13b1d58", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0, 1, 2, 6, 7, 8}" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# Symmetric difference (union minus intersection)\n", "# ===================================\n", "#\n", "set_a = {0, 1, 2, 3, 4, 5}\n", "set_b = {3, 4, 5, 6, 7, 8}\n", "set_a ^ set_b" ] }, { "cell_type": "code", "execution_count": 15, "id": "e0ba71ac-e50e-46c4-acd5-7c6a0951d83e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "# Membership\n", "# ===================================\n", "# set_a currently holds the integers 0 to 5,\n", "# so the string 'a' is not a member\n", "#\n", "\"a\" in set_a" ] }, { "cell_type": "markdown", "id": "ea231c40-3753-41fc-be9d-9276778df546", "metadata": {}, "source": [ "## Example" ] }, { "cell_type": "markdown", "id": "5e89dde8-887f-4211-b13f-36654f0a9592", "metadata": {}, "source": [ "Using the file baby_names.csv, find the names that appear in 2014 but not in 2011." ] }, { "cell_type": "code", "execution_count": 16, "id": "4d40faa9-bc88-4e4a-9082-b77d3968023d", "metadata": {}, "outputs": [], "source": [ "babynames_url = (\n", " \"https://raw.githubusercontent.com/jdvelasq/datalabs/master/datasets/baby_names.csv\"\n", ")\n", "!wget --quiet {babynames_url} -P /tmp/" ] }, { "cell_type": "code", "execution_count": 17, "id": "6a837f0c-a7ab-4ebf-8b21-e64908a0cb4f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|   | BRITH_YEAR | GENDER | ETHNICTY | NAME | COUNT | RANK |
|---|---|---|---|---|---|---|
| 0 | 2011 | FEMALE | HISPANIC | GERALDINE | 13 | 75 |
| 1 | 2011 | FEMALE | HISPANIC | GIA | 21 | 67 |
| 2 | 2011 | FEMALE | HISPANIC | GIANNA | 49 | 42 |
| 3 | 2011 | FEMALE | HISPANIC | GISELLE | 38 | 51 |
| 4 | 2011 | FEMALE | HISPANIC | GRACE | 36 | 53 |