{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Feature-based grammars\n", "\n", "* *30 min* | Last modified: December 10, 2020" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "http://www.nltk.org/book/\n", "\n", "Text Analytics with Python" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "##\n", "## Using dictionaries to define the features\n", "## of grammatical entities\n", "##\n", "## CAT: grammatical category\n", "## ORTH: orthography (spelling)\n", "## REF: referent\n", "## REL: relation\n", "##\n", "\n", "kim = {'CAT': 'NP', 'ORTH': 'Kim', 'REF': 'k'}\n", "\n", "chase = {'CAT': 'V', 'ORTH': 'chased', 'REL': 'chase'}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Grammatical agreement" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Conjugation forms of a verb**\n", "\n", "```\n", "         singular        plural\n", "-------------------------------------\n", "1st per  I run           we run\n", "2nd per  you run         you run\n", "3rd per  he/she/it runs  they run\n", "\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Direct approach.** Separate rules for singular and plural. 
This approach does not scale to grammars with many rules, because every category has to be duplicated for each number value.\n", "\n", "```\n", "S -> NP_SG VP_SG\n", "S -> NP_PL VP_PL\n", "\n", "NP_SG -> Det_SG N_SG\n", "NP_PL -> Det_PL N_PL\n", "\n", "VP_SG -> V_SG\n", "VP_PL -> V_PL\n", "\n", "Det_SG -> 'this'\n", "Det_PL -> 'these'\n", "\n", "N_SG -> 'dog'\n", "N_PL -> 'dogs'\n", "\n", "V_SG -> 'runs'\n", "V_PL -> 'run'\n", "\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Adding features to grammatical categories**.\n", "\n", "Here `sg` stands for singular and `pl` for plural.\n", "\n", "```\n", "Det[NUM=sg] -> 'this'\n", "Det[NUM=pl] -> 'these'\n", "\n", "N[NUM=sg] -> 'dog'\n", "N[NUM=pl] -> 'dogs'\n", "\n", "V[NUM=sg] -> 'runs'\n", "V[NUM=pl] -> 'run'\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is more convenient to allow variables as feature values. `?n` is a variable over the possible values of `NUM`; all occurrences of `?n` within a production must take the same value.\n", "\n", "```\n", "S -> NP[NUM=?n] VP[NUM=?n]\n", "\n", "NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]\n", "\n", "VP[NUM=?n] -> V[NUM=?n]\n", "\n", "```" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting feat0.fcfg\n" ] } ], "source": [ "%%writefile feat0.fcfg\n", "% start S\n", "# ###################\n", "# Grammar Productions\n", "# ###################\n", "# S expansion productions\n", "S -> NP[NUM=?n] VP[NUM=?n]\n", "# NP expansion productions\n", "NP[NUM=?n] -> N[NUM=?n]\n", "NP[NUM=?n] -> PropN[NUM=?n]\n", "NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]\n", "NP[NUM=pl] -> N[NUM=pl]\n", "# VP expansion productions\n", "VP[TENSE=?t, NUM=?n] -> IV[TENSE=?t, NUM=?n]\n", "VP[TENSE=?t, NUM=?n] -> TV[TENSE=?t, NUM=?n] NP\n", "\n", "# ###################\n", "# Lexical Productions\n", "# ###################\n", "Det[NUM=sg] -> 'this' | 'every'\n", "Det[NUM=pl] -> 'these' | 'all'\n", "Det -> 'the' | 'some' | 'several'\n", "PropN[NUM=sg]-> 'Kim' | 'Jody'\n", "N[NUM=sg] -> 'dog' | 'girl' | 'car' | 'child'\n", "N[NUM=pl] 
-> 'dogs' | 'girls' | 'cars' | 'children'\n", "\n", "IV[TENSE=pres, NUM=sg] -> 'disappears' | 'walks'\n", "IV[TENSE=pres, NUM=pl] -> 'disappear' | 'walk'\n", "IV[TENSE=past] -> 'disappeared' | 'walked'\n", "\n", "TV[TENSE=pres, NUM=sg] -> 'sees' | 'likes'\n", "TV[TENSE=pres, NUM=pl] -> 'see' | 'like'\n", "TV[TENSE=past] -> 'saw' | 'liked'" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(S[]\n", " (NP[NUM='sg'] (PropN[NUM='sg'] Kim))\n", " (VP[NUM='sg', TENSE='pres']\n", " (TV[NUM='sg', TENSE='pres'] likes)\n", " (NP[NUM='pl'] (N[NUM='pl'] children))))\n" ] } ], "source": [ "from nltk import load_parser\n", "\n", "## create the parser\n", "parser = load_parser('feat0.fcfg', trace=0)\n", "\n", "## sentence to parse\n", "tokens = 'Kim likes children'.split()\n", "\n", "## print every parse tree\n", "for tree in parser.parse(tokens):\n", "    print(tree)\n", "\n", "# TV: transitive verbs\n", "# IV: intransitive verbs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Terminology" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `+/-` notation represents the Boolean values `true/false`. 
In the following example, `AUX` indicates whether the verb is an auxiliary.\n", "\n", "```\n", "V[TENSE=pres, AUX=+] -> 'can'\n", "V[TENSE=pres, AUX=+] -> 'may'\n", "\n", "V[TENSE=pres, AUX=-] -> 'walks'\n", "V[TENSE=pres, AUX=-] -> 'likes'\n", "```\n", "\n", "By convention, `AUX=+` is usually abbreviated as `+AUX` and `AUX=-` as `-AUX`:\n", "\n", "```\n", "V[TENSE=pres, +AUX] -> 'can'\n", "V[TENSE=pres, +AUX] -> 'may'\n", "\n", "V[TENSE=pres, -AUX] -> 'walks'\n", "V[TENSE=pres, -AUX] -> 'likes'\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Agreement through a complex-valued feature (`AGR` groups `NUM` and `PER`):\n", "\n", "```\n", "S -> NP[AGR=?n] VP[AGR=?n]\n", "\n", "NP[AGR=?n] -> PropN[AGR=?n]\n", "\n", "VP[TENSE=?t, AGR=?n] -> Cop[TENSE=?t, AGR=?n] Adj\n", "\n", "Cop[TENSE=pres, AGR=[NUM=sg, PER=3]] -> 'is'\n", "\n", "PropN[AGR=[NUM=sg, PER=3]] -> 'Kim'\n", "Adj -> 'happy'\n", "\n", "```" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ NUM = 'sg' ]\n", "[ TENSE = 'past' ]\n" ] } ], "source": [ "##\n", "## Manipulating feature structures in NLTK\n", "##\n", "import nltk\n", "\n", "## atomic feature values (strings or integers)\n", "fs1 = nltk.FeatStruct(TENSE='past', NUM='sg')\n", "print(fs1)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "fem\n" ] } ], "source": [ "##\n", "## Feature structures behave like dictionaries\n", "##\n", "fs1 = nltk.FeatStruct(PER=3, NUM='pl', GND='fem')\n", "\n", "## look up the value associated with a key\n", "print(fs1['GND'])" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ CASE = 'acc' ]\n", "[ GND = 'fem' ]\n", "[ NUM = 'pl' ]\n", "[ PER = 3 ]\n" ] } ], "source": [ "##\n", "## Assigning a new feature\n", "##\n", "fs1['CASE'] = 'acc'\n", "print(fs1)" ] }, { "cell_type": 
"code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ [ CASE = 'acc' ] ]\n", "[ AGR = [ GND = 'fem' ] ]\n", "[ [ NUM = 'pl' ] ]\n", "[ [ PER = 3 ] ]\n", "[ ]\n", "[ POS = 'N' ]\n" ] } ], "source": [ "##\n", "## Feature structures with complex (nested) values\n", "##\n", "fs2 = nltk.FeatStruct(POS='N', AGR=fs1)\n", "print(fs2)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ CASE = 'acc' ]\n", "[ GND = 'fem' ]\n", "[ NUM = 'pl' ]\n", "[ PER = 3 ]\n" ] } ], "source": [ "##\n", "## Retrieving a complex value by key\n", "##\n", "print(fs2['AGR'])" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3\n" ] } ], "source": [ "##\n", "## Retrieving a value from a nested feature structure\n", "##\n", "print(fs2['AGR']['PER'])" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ [ GND = 'fem' ] ]\n", "[ AGR = [ NUM = 'pl' ] ]\n", "[ [ PER = 3 ] ]\n", "[ ]\n", "[ POS = 'N' ]\n" ] } ], "source": [ "##\n", "## Building a complex structure from its bracketed string notation\n", "##\n", "print(nltk.FeatStruct(\"[POS='N', AGR=[PER=3, NUM='pl', GND='fem']]\"))" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ AGE = 33 ]\n", "[ NAME = 'Lee' ]\n", "[ TELNO = '01 27 86 42 96' ]\n" ] } ], "source": [ "##\n", "## Feature structures are general-purpose: they can store non-linguistic data too\n", "##\n", "print(nltk.FeatStruct(NAME='Lee', TELNO='01 27 86 42 96', AGE=33))" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ ADDRESS = (1) [ NUMBER = 74 ] ]\n", "[ [ STREET = 'rue Pascal' ] ]\n", "[ ]\n", "[ NAME = 'Lee' ]\n", "[ ]\n", "[ SPOUSE = [ ADDRESS 
-> (1) ] ]\n", "[ [ NAME = 'Kim' ] ]\n" ] } ], "source": [ "##\n", "## Using references to values that were already assigned.\n", "## Note the (1) tag and the ->(1) pointer back to it\n", "##\n", "print(\n", "    nltk.FeatStruct(\n", "        \"\"\"\n", "        [\n", "            NAME='Lee',\n", "            ADDRESS=(1)[\n", "                NUMBER=74,\n", "                STREET='rue Pascal'\n", "            ],\n", "            SPOUSE=[\n", "                NAME='Kim',\n", "                ADDRESS->(1)\n", "            ]\n", "        ]\n", "        \"\"\"\n", "    )\n", ")" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ A = 'a' ]\n", "[ ]\n", "[ B = (1) [ C = 'c' ] ]\n", "[ ]\n", "[ D -> (1) ]\n", "[ E -> (1) ]\n" ] } ], "source": [ "##\n", "## Another example: several features pointing to the same tagged value\n", "##\n", "print(\n", "    nltk.FeatStruct(\n", "        \"\"\"\n", "        [\n", "            A='a',\n", "            B=(1)[C='c'],\n", "            D->(1),\n", "            E->(1)\n", "        ]\n", "        \"\"\"\n", "    )\n", ")" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ CITY = 'Paris' ]\n", "[ NUMBER = 74 ]\n", "[ STREET = 'rue Pascal' ]\n" ] } ], "source": [ "##\n", "## Unification of feature structures (merges compatible\n", "## information, much like a set union)\n", "##\n", "fs1 = nltk.FeatStruct(NUMBER=74, STREET='rue Pascal')\n", "fs2 = nltk.FeatStruct(CITY='Paris')\n", "print(fs1.unify(fs2))" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "None\n" ] } ], "source": [ "##\n", "## Unifying structures whose shared feature has conflicting\n", "## values fails: unify() returns None\n", "##\n", "fs0 = nltk.FeatStruct(A='a')\n", "fs1 = nltk.FeatStruct(A='b')\n", "fs2 = fs0.unify(fs1)\n", "print(fs2)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ ADDRESS = [ NUMBER = 74 ] ]\n", "[ [ STREET = 'rue Pascal' ] ]\n", "[ ]\n", "[ NAME = 'Lee' ]\n", "[ ]\n", "[ [ ADDRESS = [ NUMBER = 74 ] ] ]\n", "[ SPOUSE = [ [ STREET = 'rue Pascal' ] ] ]\n", "[ [ ] ]\n", "[ [ NAME = 
'Kim' ] ]\n" ] } ], "source": [ "##\n", "## Unification and structure sharing: here the two ADDRESS\n", "## values are separate copies (nothing is shared)\n", "##\n", "fs0 = nltk.FeatStruct(\n", "    \"\"\"\n", "    [\n", "        NAME=Lee,\n", "        ADDRESS=[\n", "            NUMBER=74,\n", "            STREET='rue Pascal'\n", "        ],\n", "        SPOUSE=[\n", "            NAME=Kim,\n", "            ADDRESS=[\n", "                NUMBER=74,\n", "                STREET='rue Pascal'\n", "            ]\n", "        ]\n", "    ]\n", "    \"\"\"\n", ")\n", "print(fs0)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ ADDRESS = [ NUMBER = 74 ] ]\n", "[ [ STREET = 'rue Pascal' ] ]\n", "[ ]\n", "[ NAME = 'Lee' ]\n", "[ ]\n", "[ [ [ CITY = 'Paris' ] ] ]\n", "[ [ ADDRESS = [ NUMBER = 74 ] ] ]\n", "[ SPOUSE = [ [ STREET = 'rue Pascal' ] ] ]\n", "[ [ ] ]\n", "[ [ NAME = 'Kim' ] ]\n" ] } ], "source": [ "##\n", "## Adding new information through unification\n", "##\n", "fs1 = nltk.FeatStruct(\n", "    \"\"\"\n", "    [\n", "        SPOUSE = [\n", "            ADDRESS = [CITY = Paris]\n", "        ]\n", "    ]\n", "    \"\"\"\n", ")\n", "\n", "##\n", "## Note that unification adds CITY only to the spouse's ADDRESS\n", "##\n", "print(fs1.unify(fs0))" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ [ CITY = 'Paris' ] ]\n", "[ ADDRESS = (1) [ NUMBER = 74 ] ]\n", "[ [ STREET = 'rue Pascal' ] ]\n", "[ ]\n", "[ NAME = 'Lee' ]\n", "[ ]\n", "[ SPOUSE = [ ADDRESS -> (1) ] ]\n", "[ [ NAME = 'Kim' ] ]\n" ] } ], "source": [ "##\n", "## Different behavior when the ADDRESS value is shared\n", "##\n", "fs2 = nltk.FeatStruct(\n", "    \"\"\"\n", "    [\n", "        NAME = Lee,\n", "        ADDRESS = (1)[NUMBER=74, STREET='rue Pascal'],\n", "        SPOUSE=[NAME=Kim, ADDRESS->(1)]\n", "    ]\n", "    \"\"\"\n", ")\n", "\n", "##\n", "## Note that CITY = 'Paris' is added to the shared address,\n", "## so it shows up for both Lee and the spouse\n", "##\n", "print(fs1.unify(fs2))" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ ADDRESS1 = ?x ]\n", "[ ADDRESS2 = ?x ]\n" ] } ], "source": [ 
"##\n", "## Using variables\n", "##\n", "fs1 = nltk.FeatStruct(\"[ADDRESS1=[NUMBER=74, STREET='rue Pascal']]\")\n", "fs2 = nltk.FeatStruct(\"[ADDRESS1=?x, ADDRESS2=?x]\")\n", "print(fs2)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ ADDRESS1 = (1) [ NUMBER = 74 ] ]\n", "[ [ STREET = 'rue Pascal' ] ]\n", "[ ]\n", "[ ADDRESS2 -> (1) ]\n" ] } ], "source": [ "##\n", "## Unification binds the variable ?x to the\n", "## value found in fs1\n", "##\n", "print(fs2.unify(fs1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Subcategorization" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Original grammar**.\n", "\n", "```\n", "S -> NP[NUM=?n] VP[NUM=?n]\n", "\n", "NP[NUM=?n] -> N[NUM=?n]\n", "NP[NUM=?n] -> PropN[NUM=?n]\n", "NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]\n", "NP[NUM=pl] -> N[NUM=pl]\n", "\n", "##\n", "## These productions change: the verb classes can be\n", "## handled with a feature instead\n", "##\n", "VP[TENSE=?t, NUM=?n] -> IV[TENSE=?t, NUM=?n]\n", "VP[TENSE=?t, NUM=?n] -> TV[TENSE=?t, NUM=?n] NP\n", "\n", "\n", "# unchanged from here on\n", "Det[NUM=sg] -> 'this' | 'every'\n", "Det[NUM=pl] -> 'these' | 'all'\n", "Det -> 'the' | 'some' | 'several'\n", "\n", "PropN[NUM=sg]-> 'Kim' | 'Jody'\n", "\n", "N[NUM=sg] -> 'dog' | 'girl' | 'car' | 'child'\n", "N[NUM=pl] -> 'dogs' | 'girls' | 'cars' | 'children'\n", "\n", "##\n", "## The separate IV and TV categories become unnecessary\n", "##\n", "IV[TENSE=pres, NUM=sg] -> 'disappears' | 'walks'\n", "IV[TENSE=pres, NUM=pl] -> 'disappear' | 'walk'\n", "IV[TENSE=past] -> 'disappeared' | 'walked'\n", "\n", "TV[TENSE=pres, NUM=sg] -> 'sees' | 'likes'\n", "TV[TENSE=pres, NUM=pl] -> 'see' | 'like'\n", "TV[TENSE=past] -> 'saw' | 'liked'\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Modified grammar**." 
] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting feat1_exam.fcfg\n" ] } ], "source": [ "%%writefile feat1_exam.fcfg\n", "%start S\n", "\n", "S -> NP[NUM=?n] VP[NUM=?n]\n", "\n", "NP[NUM=?n] -> N[NUM=?n]\n", "NP[NUM=?n] -> PropN[NUM=?n]\n", "NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]\n", "NP[NUM=pl] -> N[NUM=pl]\n", "\n", "##\n", "## New definition:\n", "## adds SUBCAT = {intrans, trans, clause}\n", "##\n", "VP[TENSE=?t, NUM=?n] -> V[SUBCAT=intrans, TENSE=?t, NUM=?n]\n", "VP[TENSE=?t, NUM=?n] -> V[SUBCAT=trans, TENSE=?t, NUM=?n] NP\n", "VP[TENSE=?t, NUM=?n] -> V[SUBCAT=clause, TENSE=?t, NUM=?n] SBar\n", "\n", "V[SUBCAT=intrans, TENSE=pres, NUM=sg] -> 'disappears' | 'walks' | 'puts'\n", "V[SUBCAT=trans, TENSE=pres, NUM=sg] -> 'sees' | 'likes'\n", "V[SUBCAT=clause, TENSE=pres, NUM=sg] -> 'says' | 'claims'\n", "\n", "V[SUBCAT=intrans, TENSE=pres, NUM=pl] -> 'disappear' | 'walk' | 'put'\n", "V[SUBCAT=trans, TENSE=pres, NUM=pl] -> 'see' | 'like'\n", "V[SUBCAT=clause, TENSE=pres, NUM=pl] -> 'say' | 'claim'\n", "\n", "V[SUBCAT=intrans, TENSE=past, NUM=?n] -> 'disappeared' | 'walked' | 'put'\n", "V[SUBCAT=trans, TENSE=past, NUM=?n] -> 'saw' | 'liked' | 'put'\n", "V[SUBCAT=clause, TENSE=past, NUM=?n] -> 'said' | 'claimed' | 'put'\n", "\n", "# same as before from here on (plus a few new nouns)\n", "Det[NUM=sg] -> 'this' | 'every'\n", "Det[NUM=pl] -> 'these' | 'all'\n", "Det -> 'the' | 'some' | 'several'\n", "\n", "PropN[NUM=sg]-> 'Kim' | 'Jody'\n", "\n", "N[NUM=sg] -> 'dog' | 'girl' | 'car' | 'child' | 'table' | 'book'\n", "N[NUM=pl] -> 'dogs' | 'girls' | 'cars' | 'children' | 'tables' | 'books'\n", "\n", "\n", "##\n", "## Two new productions\n", "##\n", "SBar -> Comp S\n", "Comp -> 'that'" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(S[]\n", " (NP[NUM='sg'] (PropN[NUM='sg'] Kim))\n", " (VP[NUM='sg', 
TENSE='pres']\n", " (V[NUM='sg', SUBCAT='clause', TENSE='pres'] claims)\n", " (SBar[]\n", " (Comp[] that)\n", " (S[]\n", " (NP[NUM='sg'] (PropN[NUM='sg'] Jody))\n", " (VP[NUM='sg', TENSE='pres']\n", " (V[NUM='sg', SUBCAT='trans', TENSE='pres'] likes)\n", " (NP[NUM='pl'] (N[NUM='pl'] children)))))))\n" ] } ], "source": [ "from nltk import load_parser\n", "\n", "## create the parser\n", "parser = load_parser('feat1_exam.fcfg', trace=0)\n", "\n", "## sentence to parse\n", "tokens = 'Kim claims that Jody likes children'.split()\n", "\n", "## print every parse tree\n", "for tree in parser.parse(tokens):\n", "    print(tree)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "##\n", "## ------------------------------- Up to here -------------------------------\n", "##" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "##\n", "## Generalized Phrase Structure Grammar (GPSG)\n", "##" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting feat1.fcfg\n" ] } ], "source": [ "%%writefile feat1.fcfg\n", "% start S\n", "# ###################\n", "# Grammar Productions\n", "# ###################\n", "S[-INV] -> NP VP\n", "S[-INV]/?x -> NP VP/?x\n", "S[-INV] -> NP S/NP\n", "S[-INV] -> Adv[+NEG] S[+INV]\n", "S[+INV] -> V[+AUX] NP VP\n", "S[+INV]/?x -> V[+AUX] NP VP/?x\n", "SBar -> Comp S[-INV]\n", "SBar/?x -> Comp S[-INV]/?x\n", "VP -> V[SUBCAT=intrans, -AUX]\n", "VP -> V[SUBCAT=trans, -AUX] NP\n", "VP/?x -> V[SUBCAT=trans, -AUX] NP/?x\n", "VP -> V[SUBCAT=clause, -AUX] SBar\n", "VP/?x -> V[SUBCAT=clause, -AUX] SBar/?x\n", "VP -> V[+AUX] VP\n", "VP/?x -> V[+AUX] VP/?x\n", "\n", "# ###################\n", "# Lexical Productions\n", "# ###################\n", "V[SUBCAT=intrans, -AUX] -> 'walk' | 'sing'\n", "V[SUBCAT=trans, -AUX] -> 'see' | 'like'\n", "V[SUBCAT=clause, -AUX] -> 'say' | 'claim'\n", "V[+AUX] -> 'do' | 'can'\n", 
"NP[-WH] -> 'you' | 'cats'\n", "NP[+WH] -> 'who'\n", "Adv[+NEG] -> 'rarely' | 'never'\n", "NP/NP ->\n", "Comp -> 'that'" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(S[-INV]\n", " (NP[+WH] who)\n", " (S[+INV]/NP[]\n", " (V[+AUX] do)\n", " (NP[-WH] you)\n", " (VP[]/NP[]\n", " (V[-AUX, SUBCAT='clause'] claim)\n", " (SBar[]/NP[]\n", " (Comp[] that)\n", " (S[-INV]/NP[]\n", " (NP[-WH] you)\n", " (VP[]/NP[] (V[-AUX, SUBCAT='trans'] like) (NP[]/NP[] )))))))\n" ] } ], "source": [ "tokens = 'who do you claim that you like'.split()\n", "from nltk import load_parser\n", "cp = load_parser('feat1.fcfg')\n", "for tree in cp.parse(tokens):\n", " print(tree)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(S[-INV]\n", " (NP[-WH] you)\n", " (VP[]\n", " (V[-AUX, SUBCAT='clause'] claim)\n", " (SBar[]\n", " (Comp[] that)\n", " (S[-INV]\n", " (NP[-WH] you)\n", " (VP[] (V[-AUX, SUBCAT='trans'] like) (NP[-WH] cats))))))\n" ] } ], "source": [ "tokens = 'you claim that you like cats'.split()\n", "for tree in cp.parse(tokens):\n", " print(tree)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(S[-INV]\n", " (Adv[+NEG] rarely)\n", " (S[+INV]\n", " (V[+AUX] do)\n", " (NP[-WH] you)\n", " (VP[] (V[-AUX, SUBCAT='intrans'] sing))))\n" ] } ], "source": [ "tokens = 'rarely do you sing'.split()\n", "for tree in cp.parse(tokens):\n", " print(tree)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", " \n", "tokens = 'rarely do you sing'.split()\n", "for tree in cp.parse(tokens):\n", " print(tree)\n", "\n", "\n", "\n", "\n" ] } ], "source": [ "text = 
'''\n", "\n", " \t\n", ">>> tokens = 'rarely do you sing'.split()\n", ">>> for tree in cp.parse(tokens):\n", "... print(tree)\n", "\n", "\n", "\n", "'''\n", "\n", "\n", "text = text.replace(\">>> \", \"\").replace(\"... \", \"\").replace(\"...\", \"\").replace(\"\\t\", \"\")\n", "print(text)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 4 }