{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Gramáticas\n", "\n", "* *30 min* | Última modificación: Diciembre 10, 2020" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "http://www.nltk.org/book/\n", "\n", "Text Analytics with Python" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Ejemplo de una gramática ambigüa**\n", "\n", "```\n", "S -> NP VP\n", "\n", "PP -> P NP\n", "\n", "NP -> Det N \n", " | Det N PP \n", " | 'I'\n", "\n", "VP -> V NP \n", " | VP PP\n", "\n", "Det -> 'an' \n", " | 'my'\n", "\n", "N -> 'elephant' \n", " | 'pajamas'\n", "\n", "V -> 'shot'\n", "\n", "P -> 'in'\n", "\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Aplicación a la frase:\n", "\n", " I shot and elephant in my pijamas\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Aplicación (parte 1)**\n", "\n", "```\n", "Frase Regla Stack\n", "------------------------------------------------------------------------------------------------------\n", ">I< shot an elephant in my pijamas S\n", "\n", ">I< shot an elephant in my pijamas S -> NP VP (S NP VP)\n", "\n", ">I< shot an elephant in my pijamas NP -> Det N (S NP VP)\n", "Falla Det -> 'an' | 'my' pero >I< \n", "\n", ">I< shot an elephant in my pijamas NP -> Det N PP (S NP VP)\n", "Falla Det -> 'an' | 'my' pero >I< \n", "\n", ">I< shot an elephant in my pijamas NP -> 'I' (S (NP I) VP) \n", "\n", ">shot< an elephant in my pijamas VP -> V NP (S (NP I) (VP V NP))\n", "Ambiguo VP tiene dos posibles reglas\n", "\n", ">shot< an elephant in my pijamas V -> 'shot' (S (NP I) (VP (V shot) NP))\n", "\n", ">an< elephant in my pijamas NP -> Det N (S (NP I) (VP (V shot) (NP Det N)))\n", "\n", ">an< elephant in my pijamas Det -> 'an' (S (NP I) (VP (V shot) (NP (Det an) N)))\n", "\n", ">elephant< in my pijamas N -> 'Elephant' (S (NP I) (VP (V shot) (NP (Det an) (N Elephant)))\n", "Falla. No se reconocio toda la frase\n", "\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Aplicación (parte 2)**\n", "\n", "```\n", "Frase Regla Stack\n", "------------------------------------------------------------------------------------------------------\n", "(de la tabla anterior)\n", "\n", ">I< shot an elephant in my pijamas NP -> 'I' (S (NP I) VP) \n", "\n", ">shot< an elephant in my pijamas VP -> VP PP (S (NP I) (VP VP PP))\n", "Ambiguo VP tiene dos posibles reglas\n", "\n", ">shot< an elephant in my pijamas VP -> V NP (S (NP I) \n", "Ambiguo VP tiene dos posibles reglas (VP \n", " (VP V NP) \n", " PP))\n", "\n", "\n", ">shot< an elephant in my pijamas V -> shot (S (NP I) \n", " (VP \n", " (VP (V shot) NP)\n", " PP))\n", "\n", ">an< elephant in my pijamas NP -> Det N (S (NP I) \n", "Dos posibles reglas (VP \n", " (VP (V shot) (NP Det N)) \n", " PP))\n", "\n", "\n", ">an< elephant in my pijamas Det -> an (S (NP I) \n", " (VP \n", " (VP (V shot) (NP (Det an) N)) \n", " PP)) \n", "\n", ">elephant< in my pijamas N -> 'elephant' (S (NP I) \n", " (VP \n", " (VP (V shot) (NP (Det an) (N elephant)) \n", " PP)) \n", "\n", ">in< my pijamas PP -> P NP (S (NP I) \n", " (VP \n", " (VP (V shot) (NP (Det an) (N elephant)) \n", " (PP P NP))) \n", " \n", ">in< my pijamas P -> in (S (NP I) \n", " (VP \n", " (VP (V shot) (NP (Det an) (N elephant)) \n", " (PP (P in) NP))) \n", " \n", ">my< pijamas NP -> Det N (S (NP I) \n", " (VP \n", " (VP (V shot) (NP (Det an) (N elephant)) \n", " (PP (P in) (NP Det N)))) \n", " \n", ">my< pijamas Det -> my (S (NP I) \n", " (VP \n", " (VP (V shot) (NP (Det an) (N elephant)) \n", " (PP (P in) (NP (Det my) N)))) \n", " \n", ">pijamas< N -> pijamas (S (NP I) \n", " (VP \n", " (VP (V shot) (NP (Det an) (N elephant)) \n", " (PP (P in) (NP (Det my) (N pijamas))))) \n", "\n", " \n", ">< (S (NP I) \n", " (VP \n", " (VP (V shot) (NP (Det an) (N elephant)) \n", " (PP (P in) (NP (Det my) (N pijamas))))) \n", "\n", "\n", "\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Ejercicio** Complete la tabla siguiente:\n", "\n", "```\n", "Frase Regla Stack\n", "------------------------------------------------------------------------------------------------------\n", "(de la tabla inicial)\n", "\n", ">I< shot an elephant in my pijamas NP -> 'I' (S (NP I) VP) \n", "\n", ">shot< an elephant in my pijamas VP -> V NP (S (NP I) \n", " (VP \n", " V \n", " NP))\n", "\n", "(Continue)\n", "\n", "\n", "```" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import nltk" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(S\n", " (NP I)\n", " (VP\n", " (VP (V shot) (NP (Det an) (N elephant)))\n", " (PP (P in) (NP (Det my) (N pajamas)))))\n", "(S\n", " (NP I)\n", " (VP\n", " (V shot)\n", " (NP (Det an) (N elephant) (PP (P in) (NP (Det my) (N pajamas))))))\n" ] } ], "source": [ "##\n", "## Implementación en NLTK\n", "##\n", "##\n", "groucho_grammar = nltk.CFG.fromstring(\n", " \"\"\"\n", " S -> NP VP\n", "\n", " PP -> P NP\n", "\n", " NP -> Det N | Det N PP | 'I'\n", "\n", " VP -> V NP | VP PP\n", "\n", " Det -> 'an' | 'my'\n", "\n", " N -> 'elephant' | 'pajamas'\n", "\n", " V -> 'shot'\n", "\n", " P -> 'in'\n", " \"\"\"\n", ")\n", "\n", "##\n", "## Dos posibles descomposiciones de la misma frase\n", "## usando la gramatica anterior\n", "##\n", "sent = [\"I\", \"shot\", \"an\", \"elephant\", \"in\", \"my\", \"pajamas\"]\n", "parser = nltk.ChartParser(groucho_grammar)\n", "for tree in parser.parse(sent):\n", " print(tree)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Gramática libre del contexto" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(S (NP Mary) (VP (V saw) (NP Bob)))\n" ] } ], "source": [ "##\n", "## Gramática libre del contexto\n", "## Ejemplo de una gramática estructuralmente ambigua\n", "##\n", "grammar1 = nltk.CFG.fromstring(\n", " \"\"\"\n", " S -> NP VP\n", " \n", " VP -> V NP | V NP PP\n", " \n", " PP -> P NP\n", " \n", " V -> \"saw\" | \"ate\" | \"walked\"\n", " \n", " NP -> \"John\" | \"Mary\" | \"Bob\" | Det N | Det N PP\n", " \n", " Det -> \"a\" | \"an\" | \"the\" | \"my\"\n", " \n", " N -> \"man\" | \"dog\" | \"cat\" | \"telescope\" | \"park\"\n", " \n", " P -> \"in\" | \"on\" | \"by\" | \"with\"\n", " \"\"\"\n", ")\n", "\n", "sent = \"Mary saw Bob\".split()\n", "rd_parser = nltk.RecursiveDescentParser(grammar1)\n", "for tree in rd_parser.parse(sent):\n", " print(tree)\n", "\n", "##\n", "## S Sentence\n", "## NP Noun phrase\n", "## VP Verb phrase\n", "## PP Prepositional phrase\n", "## Det Determiner\n", "## N Noun\n", "## V Verb\n", "## P Preposition\n", "##" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Ejercicio** Construya el arbol que deriva la expresión anterior." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Escritura de y almacenamiento de una gramática**" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting mygrammar.cfg\n" ] } ], "source": [ "%%writefile mygrammar.cfg\n", "S -> NP VP\n", "VP -> V NP | V NP PP\n", "PP -> P NP\n", "V -> \"saw\" | \"ate\" | \"walked\"\n", "NP -> \"John\" | \"Mary\" | \"Bob\" | Det N | Det N PP\n", "Det -> \"a\" | \"an\" | \"the\" | \"my\"\n", "N -> \"man\" | \"dog\" | \"cat\" | \"telescope\" | \"park\"\n", "P -> \"in\" | \"on\" | \"by\" | \"with\"" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Parsing 'Mary saw Bob'\n", "Found a parse:\n", "(S (NP Mary) (VP (V saw) (NP Bob)))\n", "(S (NP Mary) (VP (V saw) (NP Bob)))\n" ] } ], "source": [ "##\n", "## Lectura del archivo y parsing de la frase\n", "##\n", "import nltk \n", "\n", "grammar1 = nltk.data.load('file:mygrammar.cfg')\n", "sent = \"Mary saw Bob\".split()\n", "rd_parser = nltk.RecursiveDescentParser(grammar1, trace=1)\n", "for tree in rd_parser.parse(sent):\n", " print(tree)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "##\n", "## Gramática recursiva\n", "## Se dice que una gramática es recursiva si el nombre\n", "## de la regla también aparece al lado derecho \n", "## - Recursión directa: Nom -> Adj Nom\n", "## - Recursión indirecta: S -> NP VP & VP -> V S\n", "##\n", "grammar2 = nltk.CFG.fromstring(\n", " \"\"\"\n", " S -> NP VP\n", " \n", " NP -> Det Nom | PropN\n", " \n", " Nom -> Adj Nom | N\n", " \n", " VP -> V Adj | V NP | V S | V NP PP\n", " \n", " PP -> P NP\n", " \n", " PropN -> 'Buster' | 'Chatterer' | 'Joe'\n", " \n", " Det -> 'the' | 'a'\n", " \n", " N -> 'bear' | 'squirrel' | 'tree' | 'fish' | 'log'\n", " \n", " Adj -> 'angry' | 'frightened' | 'little' | 'tall'\n", " \n", " V -> 'chased' | 'saw' | 'said' | 'thought' | 'was' | 'put'\n", " \n", " P -> 'on'\n", " \"\"\"\n", ")" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(S\n", " (NP (Det the) (Nom (Adj angry) (Nom (N bear))))\n", " (VP\n", " (V chased)\n", " (NP\n", " (Det the)\n", " (Nom (Adj frightened) (Nom (Adj little) (Nom (N squirrel)))))))\n" ] } ], "source": [ "##\n", "## Ejemplo de recursión por frases nominales anidadas\n", "##\n", "sent = \"the angry bear chased the frightened little squirrel\".split()\n", "rd_parser = nltk.RecursiveDescentParser(grammar2)\n", "for tree in rd_parser.parse(sent):\n", " print(tree)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(S\n", " (NP (PropN Chatterer))\n", " (VP\n", " (V said)\n", " (S\n", " (NP (PropN Buster))\n", " (VP\n", " (V thought)\n", " (S (NP (Det the) (Nom (N tree))) (VP (V was) (Adj tall)))))))\n" ] } ], "source": [ "##\n", "## Ejemplo de recursión por sentencias anidadas\n", "##\n", "sent = \"Chatterer said Buster thought the tree was tall\".split()\n", "rd_parser = nltk.RecursiveDescentParser(grammar2)\n", "for tree in rd_parser.parse(sent):\n", " print(tree)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Parser recursivo descendente" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Este parser NO puede manejar recusión por la izquierda (X -> X Y) ya que genera ciclos infinitos.\n", "\n", "```python\n", "grammar1 = nltk.CFG.fromstring(\n", " \"\"\"\n", " S -> NP VP\n", " \n", " VP -> V NP | V NP PP\n", " \n", " PP -> P NP\n", " \n", " V -> \"saw\" | \"ate\" | \"walked\"\n", " \n", " NP -> \"John\" | \"Mary\" | \"Bob\" | Det N | Det N PP\n", " \n", " Det -> \"a\" | \"an\" | \"the\" | \"my\"\n", " \n", " N -> \"man\" | \"dog\" | \"cat\" | \"telescope\" | \"park\"\n", " \n", " P -> \"in\" | \"on\" | \"by\" | \"with\"\n", " \"\"\"\n", ")\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Parsing de la frase \"Mary saw a dog\".\n", "\n", "```\n", "Frase Regla Stack\n", "------------------------------------------------------------------------------------------------------\n", ">Mary< saw a dog S\n", "\n", ">Mary< saw a dog S -> NP VP (S NP VP)\n", "\n", ">Mary< saw a dog NP -> \"John\" (S NP VP) Falla, backtracking\n", "\n", ">Mary< saw a dog NP -> \"Mary\" (S (NP Mary) VP) Match, consume el símbolo\n", "\n", ">saw< a dog VP -> V NP (S \n", " (NP Mary) \n", " (VP V NP))\n", " \n", ">saw< a dog V -> \"saw\" (S Match, consume el símbolo \n", " (NP Mary) \n", " (VP (V saw) NP))\n", " \n", ">a< dog NP -> \"John\" (S Falla, backtracking\n", " (NP Mary) \n", " (VP (V saw) NP))\n", " \n", ">a< dog NP -> \"Mary\" (S Falla, backtracking\n", " (NP Mary) \n", " (VP (V saw) NP))\n", "\n", ">a< dog NP -> \"Bob\" (S Falla, backtracking\n", " (NP Mary) \n", " (VP (V saw) NP))\n", " \n", ">a< dog NP -> Det N (S \n", " (NP Mary) \n", " (VP \n", " (V saw)\n", " (NP Det N)))\n", "\n", ">a< dog Det -> \"a\" (S Match, consume el símbolo \n", " (NP Mary) \n", " (VP \n", " (V saw)\n", " (NP (Det a) N)))\n", "\n", ">dog< N -> \"man\" (S Falla, backtracking\n", " (NP Mary) \n", " (VP \n", " (V saw)\n", " (NP (Det a) N)))\n", "\n", ">dog< N -> \"dog\" (S Match, consume el símbolo\n", " (NP Mary) \n", " (VP \n", " (V saw)\n", " (NP (Det a) (N dog))))\n", "\n", " \n", "\n", "\n", ">< N -> \"dog\" (S\n", " (NP Mary) \n", " (VP \n", " (V saw)\n", " (NP (Det a) (N dog))))\n", "```\n", "\n", "La sentencia fue reconocida correctamente: No hay símbolos por consumir, y no quedan reglas por reemplazar en el stack." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Parsing 'Mary saw a dog'\n", "Found a parse:\n", "(S (NP Mary) (VP (V saw) (NP (Det a) (N dog))))\n", "(S (NP Mary) (VP (V saw) (NP (Det a) (N dog))))\n" ] } ], "source": [ "##\n", "## Llame la función usando diferentes valores para el\n", "## parámetro trace\n", "##\n", "rd_parser = nltk.RecursiveDescentParser(grammar1, trace=1)\n", "sent = 'Mary saw a dog'.split()\n", "for tree in rd_parser.parse(sent):\n", " print(tree) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Shift-reduce Parsing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Operación de un parser shift-reduce. \n", "\n", "```\n", " S -> NP VP\n", "\n", " VP -> V NP \n", " | V NP PP\n", "\n", " PP -> P NP\n", "\n", " V -> \"saw\" \n", " | \"ate\" \n", " | \"walked\"\n", " \n", " NP -> \"John\" \n", " | \"Mary\" \n", " | \"Bob\" \n", " | Det N \n", " | Det N PP\n", " \n", "Det -> \"a\" \n", " | \"an\" \n", " | \"the\" \n", " | \"my\"\n", " \n", " N -> \"man\" \n", " | \"dog\" \n", " | \"cat\" \n", " | \"telescope\" \n", " | \"park\"\n", " \n", " P -> \"in\" \n", " | \"on\" \n", " | \"by\" \n", " | \"with\"\n", " \n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```\n", "Op. Regla Stack Remaining Text \n", "-----------------------------------------------------------------------------------\n", " the dog saw a man in the park<\n", " \n", "shift the dog saw a man in the park<\n", "\n", "reduce Det -> the (Det the) dog saw a man in the park<\n", "\n", "shift (Det the) saw a man in the park<\n", " dog\n", "\n", "reduce N -> dog (Det the) saw a man in the park<\n", " (N dog) \n", " \n", "reduce NP -> Det N (NP saw a man in the park<\n", " (Det the)\n", " (N dog))\n", "\n", "shift (NP a man in the park<\n", " (Det the)\n", " (N dog))\n", " saw\n", " \n", "reduce V -> saw (NP a man in the park<\n", " (Det the)\n", " (N dog))\n", " (V saw)\n", "\n", "shift (NP man in the park<\n", " (Det the)\n", " (N dog))\n", " (V saw)\n", " a\n", "\n", "reduce Det -> a (NP man in the park<\n", " (Det the)\n", " (N dog))\n", " (V saw)\n", " (Det a)\n", "\n", "shift (NP in the park<\n", " (Det the)\n", " (N dog))\n", " (V saw)\n", " (Det a)\n", " man\n", "\n", "reduce N -> man (NP in the park<\n", " (Det the)\n", " (N dog))\n", " (V saw)\n", " (Det a)\n", " (N man)\n", "\n", "reduce NP -> Det N (NP in the park<\n", " (Det the)\n", " (N dog))\n", " (V saw)\n", " (NP \n", " (Det a)\n", " (N man))\n", "\n", "shift (NP the park<\n", " (Det the)\n", " (N dog))\n", " (V saw)\n", " (NP \n", " (Det a)\n", " (N man))\n", " in\n", " \n", "reduce P -> in (NP the park< \n", " (Det the)\n", " (N dog))\n", " (V saw)\n", " (NP \n", " (Det a)\n", " (N man))\n", " (P in)\n", " \n", "shift (NP park<\n", " (Det the)\n", " (N dog))\n", " (V saw)\n", " (NP \n", " (Det a)\n", " (N man))\n", " (P in)\n", " the\n", "\n", "reduce P -> in (NP park< \n", " (Det the)\n", " (N dog))\n", " (V saw)\n", " (NP \n", " (Det a)\n", " (N man))\n", " (P in)\n", " (Det the)\n", "\n", "shift (NP <\n", " (Det the)\n", " (N dog))\n", " (V saw)\n", " (NP \n", " (Det a)\n", " (N man))\n", " (P in)\n", " (Det the)\n", " park\n", " \n", " \n", "reduce N -> park (NP <\n", " (Det the)\n", " (N dog))\n", " (V saw)\n", " (NP \n", " (Det a)\n", " (N man))\n", " (P in)\n", " (Det the)\n", " (N park)\n", " \n", "reduce NP -> Det N (NP <\n", " (Det the)\n", " (N dog))\n", " (V saw)\n", " (NP \n", " (Det a)\n", " (N man))\n", " (P in)\n", " (NP\n", " (Det the)\n", " (N park))\n", " \n", "reduce PP -> P NP (NP <\n", " (Det the)\n", " (N dog))\n", " (V saw)\n", " (NP \n", " (Det a)\n", " (N man))\n", " (PP\n", " (P in)\n", " (NP\n", " (Det the)\n", " (N park)))\n", " \n", "reduce NP -> NP PP (NP <\n", " (Det the)\n", " (N dog))\n", " (V saw)\n", " (NP\n", " (NP \n", " (Det a)\n", " (N man))\n", " (PP\n", " (P in)\n", " (NP\n", " (Det the)\n", " (N park))))\n", " \n", "reduce VP -> V VP (NP <\n", " (Det the)\n", " (N dog))\n", " (VP\n", " (V saw)\n", " (NP\n", " (NP \n", " (Det a)\n", " (N man))\n", " (PP\n", " (P in)\n", " (NP\n", " (Det the)\n", " (N park)))))\n", " \n", "reduce S -> NP VP (S <\n", " (NP\n", " (Det the)\n", " (N dog))\n", " (VP\n", " (V saw)\n", " (NP\n", " (NP \n", " (Det a)\n", " (N man))\n", " (PP\n", " (P in)\n", " (NP\n", " (Det the)\n", " (N park))))))\n", "\n", "```\n", "\n", "OK! se llego a `S` y no hay texto pendiente por procesar." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Parsing 'Mary saw a dog'\n", " [ * Mary saw a dog]\n", " [ 'Mary' * saw a dog]\n", " [ NP 'saw' * a dog]\n", " [ NP V 'a' * dog]\n", " [ NP V Det 'dog' * ]\n", "(S (NP Mary) (VP (V saw) (NP (Det a) (N dog))))\n" ] } ], "source": [ "##\n", "## Ejemplo Shift-reduce parsing\n", "##\n", "sr_parser = nltk.ShiftReduceParser(grammar1, trace=1)\n", "sent = 'Mary saw a dog'.split()\n", "for tree in sr_parser.parse(sent):\n", " print(tree) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Ejercicio.---** Aplique un parser shift-reduce a la frase \"Mary saw a dog\"." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 4 }