NPL Natural language processing (FSA)
Learning Goal: I’m working on a machine learning discussion question and need an explanation and answer to help me learn.
Problem 1: FSA (3pts)
Design an FSA that will recognize date expressions that come in two forms: 19/1/2021 (the day is first) or 19 Jan 2021. Assume that date strings are fed to your FSA one character at a time. In general, assume the patterns are either DD/MM/YYYY or “DD MMM YYYY”, where MM = a one- or two-digit month string, between 1 and 12 inclusive; DD = a one- or two-digit day string, between 1 and 31 inclusive; and YYYY = a one-to-four-digit year string, between 0 and 9999 inclusive; and MMM = a three character month string, starting with a capital letter, one of Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, or Dec. In the first pattern, there will be no whitespace; in the second pattern, assume that the day/month and month/year strings are separated by a single space.
Problem 2: Regular Expressions (1pt)
Write your solution to Problem 1 as a regular expression. You may split the regular expression in parts if you so wish.
Problem 3: Complex FSAs
(2pt) Design a FSA that will recognize sentences such as “There’s a flea on the speck on the frog on the bump on the branch on the log in the hole in the bottom of the sea.” Namely, the sentences should look like “There’s an THING1 on/in the THING2 on/in the THING3 … on/in the THINGn”, where THINGn is a thing, and where the maximum n can be arbitrarily large. Assume that words are fed to the FSA one at a time (not a character at a time), and your FSA may include one special transition that tests whether a word represents a noun or not.
(2pts) Design an FSA that recognize sentences such as “The mouse the cat the dog chased ate lived in the house that Mary built.”, namely “ANIMAL1 ANIMAL2 ANIMAL3 …VERBED3 VERBED2 VERBED1 lived in the PLACE1 which Mary built.” Assume that n <= 3 and that you have two special transitions available, one that tests whether a word represents an animal, and the other that tests whether a word is a verb.
(1pts) For part (b), would it be possible to design an FSA that would work to arbitrary depth n? Why or why not? What is different between the FSA is part (a) and in part (b)?
Problem 4: Morphotactic FSTs (1pts)
Consider the following pairs of verb lemmas: cry/cries, fly/flies, die/dies. Design a morphotactic FST that takes a lexical description such as “die + Pres + 3rd” and converts it into an intermediate form suitable for orthographic processing. Show the state by state transition along with the correct input and output when the FST is presented with the input “die + Pres + 3rd” and “fly + Pres + 3rd”.