L16 – Logic I: Languages and Automata

Discrete Mathematics2026年5月6日发布 芮和
348 330
5,703字
24–36 分钟

Formal language

Definition

  • An alphabet is a finite set Σ\Sigma.
  • Elements of an alphabet Σ\Sigma are called characters.
  • A string over Σ\Sigma is a finite sequence of characters from Σ\Sigma.
  • A (formal) languageLL over Σ\Sigma is a set of strings

Notations:

  • The empty string containing no characters is denoted ε\varepsilon.
  • The set of all strings over Σ\Sigma is denoted Σ\Sigma^*.
  • So a language over Σ\Sigma is a subset LΣL \subseteq \Sigma^*.

Example

  • Σ={a,b}\Sigma = \{a, b\}.
  • Σε,a,abb,aabaaabbabaaabaaaabbb\Sigma^* \ni \varepsilon, a, abb, aabaaabbabaaabaaaabbb

Example

  • Σ={a,b,c}\Sigma = \{a, b, c\}.
  • LL is the language of palindromes over Σ\Sigma.
  • L={ε,a,b,c,aa,bb,cc,aaa,aba,aca,bab,}L = \{\varepsilon, a, b, c, aa, bb, cc, aaa, aba, aca, bab, \dots\}.

Example
Programming codes, mathematical expressions, logic formulas.


Formal language vs Natural language

Natural languageFormal language
Natural, organic, evolving.Artificial, designed, fixed.
Ambiguous, descriptive.Well-defined, prescriptive.
Flexible, redundant.Minimalist, structured.
Semantics (meaning) is most important.The form is most important.

Key question:
How to efficiently describe a language?
(not by listing all its strings).


Regular language

  • UnionL1L2L_1 \cup L_2.
  • ConcatenationL1L2={xyΣ|xL1,yL2}L_1L_2 = \{xy \in \Sigma^* \mid x \in L_1, y \in L_2\}.
  • PowerL0={ε}L^0 = \{\varepsilon\}. Ln+1=LLnL^{n+1} = LL^n.
  • Kleene closureL=i=0Li={xΣ|n,xLn}L^* = \bigcup_{i=0}^\infty L^i = \{x \in \Sigma^* \mid \exists n, x \in L^n\}.

Definition: The collection of regular languages can be defined recursively as follows:

  • The empty language \emptyset and {c}\{c\} for cΣc \in \Sigma are regular languages.
  • If LL is a regular language, so are LL^* and L0={ε}L^0 = \{\varepsilon\}.
  • If L1L_1 and L2L_2 are regular languages, so are L1L2L_1 \cup L_2 and L1L2L_1L_2.

Deterministic finite automaton

A mathematical model of computation. Can be used to determine whether a string is within a language.

Definition: A deterministic finite automaton (plural automata) consists of a set QQ of states and a transition function τ:Q×ΣQ\tau: Q \times \Sigma \to Q. A unique state q0Qq_0 \in Q is designated as the initial state, and some states FQF \subseteq Q are designated as the final states.

A DFA can be represented as a graph with directed edges labeled by characters.

An automaton runs on an input string s=c1cns = c_1 \dots c_n and answers YES or NO.

  • Initially, the automaton is in the initial state r0=q0r_0 = q_0.
  • After kk steps, the state of the automaton is denoted rkr_k.
  • At the kk-th step, the automaton reads the kk-th character ckc_k of ss, then transitions from the state rk1r_{k-1} to rk=τ(rk1,ck)r_k = \tau(r_{k-1}, c_k).
  • After nn steps, the string is accepted if rnFr_n \in F is a final state, and rejected otherwise.

Definition: The language of an automaton is the set of strings accepted by this automaton.

Theorem: A language is regular iff it is the language of a deterministic finite automaton.

Example

  • The string 010110 is accepted.
  • The string 101000 is rejected.
  • The string 11011100 is?

Example

What does this accept?

Nondeterministic finite automaton

In an NFA,

  • For a state-character pair in Q×ΣQ \times \Sigma, there might be zero or several possible transitions (non-deterministic).
  • There is an ε\varepsilon-transition that can be followed for any number of times without consuming any character in the input string.
  • The input string is accepted iff there exists a sequence of transitions that leads to a final state.

Example

Theorem (NFA=DFA) A language can be accepted by a DFA iff it can be accepted by an NFA.

Proof (omitted) by powerset construction.

Example

Theorem: A language is regular iff it is the language of a NFA.


Formal grammar

Definition: A formal grammar consists of

  • A set Σ\Sigma of terminal symbols.
  • A set NN of nonterminal symbols disjoint from Σ\Sigma.
  • A set PP of production rules of the form
    (ΣN)N(ΣN)(ΣN)(\Sigma \cup N)^* N (\Sigma \cup N)^* \to (\Sigma \cup N)^*
  • A distinguished start symbolSNS \in N.

Example:

  • SaSbS \to aSb, SbaS \to ba.
  • SaS \to a, SSSS \to SS, aSabaSa \to b.

Regular grammar

Definition: A (right-)regular grammar is a formal grammar such that each production rule is of the following forms

  • AεA \to \varepsilon.
  • AaA \to a.
  • AaBA \to aB.
    where A,BNA, B \in N and aΣa \in \Sigma.

Theorem: A language is regular iff it can be generated by a (right-)regular grammar.

If we replace the last production rule by ABaA \to Ba, we obtain a left-regular grammar, equivalent to the right-regular grammar.

However, if both ABaA \to Ba and AaBA \to aB are included in the production rules, the language is not regular.


Regular expression

A convenient and popular way to describe a regular language. Let RR be a regular expression, we use (R)\mathcal{L}(R) to denote the language it describes.

Regular expressions (regex) start with the atomic regular expressions.

  • the empty language \emptyset, ()=\mathcal{L}(\emptyset) = \emptyset.
  • the empty string ε\varepsilon, (ε)={ε}\mathcal{L}(\varepsilon) = \{\varepsilon\}.
  • for aΣa \in \Sigma, (a)={a}\mathcal{L}(a) = \{a\}.

Then we combine regular expressions in the following ways (in the order of increasing precedence!).

  • Or (union of languages).
    (R1|R2)=(R1)(R2)\mathcal{L}(R_1|R_2) = \mathcal{L}(R_1) \cup \mathcal{L}(R_2).
  • Concatenation.
    (R1R2)=(R1)(R2)\mathcal{L}(R_1R_2) = \mathcal{L}(R_1)\mathcal{L}(R_2).
  • Kleene closure.
    (R)=(R)\mathcal{L}(R^*) = \mathcal{L}(R)^*.

Example

  • (0|1)*00(0|1)*, Σ\Sigma^*00Σ\Sigma^*.
  • ΣΣΣΣ\Sigma\Sigma\Sigma\Sigma, Σ4\Sigma^4.
  • 1*(0|ε\varepsilon)1*, 1*0?1*.
  • [A][A]*(.[A][A]*)*@[A][A]*.[A][A]*(.[A][A]*)*, [A]+(.[A]+)*[A]+(.[A]+)+.

By definition

Theorem: A language is regular iff it is the language of a regular expression.

The set of regular expressions is itself a language, but not a regular language.


POSIX regex standard

This is the syntax used by many practical tools such as grep. It is a convenient shorthand for writing regular languages.

  • . matches any single character.
  • [abc] matches one character chosen from the set {a,b,c}.
  • [A-Z] matches one uppercase letter in the range A to Z.
  • R+ means one or more copies of R.
  • R? means zero or one copy of R.
  • R* means zero or more copies of R.
  • (R) groups a subexpression.
  • R|S means either R or S.

Example
[A-Za-z]+@[A-Za-z]+\.[A-Za-z]+ matches an email pattern.


Context-free grammar

Definition: A context-free grammar is a formal grammar such that the lhs of each production rule is a single non-terminal symbol. A language is context-free iff it can be generated by a context-free grammar.

Example

  • SaSaS \to aSa, SbSbS \to bSb, SεS \to \varepsilon.
  • SSSS \to SS, S(S)S \to (S), S()S \to ().
© 版权声明

相关文章

33 条评论

  • 丽丽
    丽丽 读者

    回文那个例子挺直观的

    德国
    回复
  • NeonFury
    NeonFury 读者

    幂集构造的证明有点跳步啊

    中国台湾
    回复
  • MuteMagnet
    MuteMagnet 读者

    DFA图例那部分有点意思

    中国山东
    回复
  • HollowHowl
    HollowHowl 读者

    正则表达式居然不是正则语言?

    菲律宾
    回复
    • Velvet Moonbeam
      Velvet Moonbeam 读者

      我也被这个绕进去了

      中国湖北@ HollowHowl
      回复
  • 丝绸商
    丝绸商 读者

    ε转移确实容易让人晕圈,CFG那章也够呛

    阿根廷
    回复
  • Skywalker_风行者
    Skywalker_风行者 读者

    NFA的ε转移总觉得有点绕

    泰国
    回复
  • 书吏文渊
    书吏文渊 读者

    Kleene闭包那符号老记不住

    澳大利亚
    回复
  • PoltergeistPrank
    PoltergeistPrank 读者

    POSIX正则这块儿蛮实用的

    中国北京
    回复
  • RinBloom
    RinBloom 读者

    形式语言和自然语言对比有意思

    中国湖北
    回复
  • Sproutling
    Sproutling 读者

    CFG那块儿看得我有点懵

    加拿大
    回复
  • Wandering Monk
    Wandering Monk 读者

    那个邮箱的正则写起来真的头大

    美国
    回复
  • Finn海
    Finn海 读者

    DFA状态图看着挺直观的

    美国
    回复
    • PhantomFable
      PhantomFable 读者

      我也觉得,图解方便很多

      日本@ Finn海
      回复
  • TurtleTalk
    TurtleTalk 读者

    回文那个例子好直观

    中国台湾
    回复
  • Luminous Tide
    Luminous Tide 读者

    NFA转DFA那个幂集构造法真的心累

    澳大利亚
    回复
  • YukiMarshmallow
    YukiMarshmallow 读者

    左右正则混用直接寄,踩过坑

    日本
    回复
  • SerpentShadow
    SerpentShadow 读者

    形式语言和自然语言的对比蛮有意思的

    美国
    回复
  • LunarLullaby
    LunarLullaby 读者

    正则表达式那块儿终于搞懂了,原来Kleene closure是这么回事

    韩国
    回复
  • Noble Crane
    Noble Crane 读者

    最后那个上下文无关文法看着就头大

    中国北京
    回复
    • Jay晨
      Jay晨 读者

      同感,看到那块直接跳过

      日本@ Noble Crane
      回复
  • JetsetNomad
    JetsetNomad 读者

    正则表达式那部分例子挺实用,直接拿来写校验

    澳大利亚
    回复
  • OrionBlaze
    OrionBlaze 读者

    回文用栈实现更直观,有限状态机搞不定无限回文

    中国四川
    回复
  • 尘外
    尘外 游客

    DFA图确实比看定义好懂,画出来就明白了😭

    中国陕西
    回复
  • NightshadeWhisper
    NightshadeWhisper 读者

    NFA转DFA的powerset construction,考试考了直接放弃

    中国辽宁
    回复
  • 黑洞漫步者
    黑洞漫步者 游客

    这课跟计算理论有重叠,但这边更强调自动机模型,侧重点不同

    韩国
    回复
  • Poppy Seed
    Poppy Seed 读者

    那个[A]+是重复一次或多次,相当于[A][A]*,看懂了没?

    中国山东
    回复
  • 被门夹过的核桃
    被门夹过的核桃 游客

    正则表达式优先级Union最低,Kleene star最高,记错就写错

    中国浙江
    回复
  • 熵焓星云
    熵焓星云 游客

    S→aSa这种递归规则,一眼就能看出结构,写解析器好用

    中国辽宁
    回复
  • SolitaireSoul
    SolitaireSoul 读者

    DFA图好直观啊

    美国
    回复
  • 影月巫师
    影月巫师 游客

    正则表达式真的省事👍

    中国福建
    回复