L16 – Logic I: Languages and Automata

Discrete Mathematics · Published May 6, 2026 · 芮和

Formal language

Definition

An alphabet is a finite set $\Sigma$.
Elements of an alphabet $\Sigma$ are called characters.
A string over $\Sigma$ is a finite sequence of characters from $\Sigma$.
A (formal) language $L$ over $\Sigma$ is a set of strings over $\Sigma$.

Notations:

The empty string, which contains no characters, is denoted $\varepsilon$.
The set of all strings over $\Sigma$ is denoted $\Sigma^*$.
So a language over $\Sigma$ is a subset $L \subseteq \Sigma^*$.

Example

$\Sigma = \{a, b\}$.
Then $\varepsilon$, $a$, $abb$, and $aabaaabbabaaabaaaabbb$ are all elements of $\Sigma^*$.

Example

$\Sigma = \{a, b, c\}$.
$L$ is the language of palindromes over $\Sigma$, that is, strings that read the same forwards and backwards.
$L = \{\varepsilon, a, b, c, aa, bb, cc, aaa, aba, aca, bab, \dots\}$.
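To make the definitions concrete, here is a minimal Python sketch that enumerates the strings of $\Sigma^*$ up to a chosen length and keeps the palindromes. The function names `strings_up_to` and `is_palindrome` are illustrative and not part of the lecture.

```python
from itertools import product

def strings_up_to(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for chars in product(sorted(alphabet), repeat=n):
            yield "".join(chars)

def is_palindrome(s):
    """A string is a palindrome if it equals its own reversal."""
    return s == s[::-1]

sigma = {"a", "b", "c"}
palindromes = [s for s in strings_up_to(sigma, 3) if is_palindrome(s)]
print(palindromes[:7])  # ['', 'a', 'b', 'c', 'aa', 'bb', 'cc']
```

Only a finite prefix of the language can be listed this way, which is exactly why a more efficient description is needed.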

Example
Program source code, mathematical expressions, and logical formulas.

Formal language vs Natural language

Natural language	Formal language
Natural, organic, evolving.	Artificial, designed, fixed.
Ambiguous, descriptive.	Well-defined, prescriptive.
Flexible, redundant.	Minimalist, structured.
Semantics (meaning) is most important.	The form is most important.

Key question:
How to efficiently describe a language?
(not by listing all its strings).

Regular language

Regular languages are built from the following operations on languages over $\Sigma$:

Union: $L_1 \cup L_2$.
Concatenation: $L_1L_2 = \{xy \in \Sigma^* \mid x \in L_1, y \in L_2\}$.
Power: $L^0 = \{\varepsilon\}$ and $L^{n+1} = LL^n$.
Kleene closure: $L^* = \bigcup_{i=0}^\infty L^i = \{x \in \Sigma^* \mid \exists n, x \in L^n\}$.
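For finite languages these operations can be tried out directly with Python sets. The sketch below is illustrative only: the helper names are made up, and because $L^*$ is usually infinite, `kleene_up_to` computes just the finite union $L^0 \cup \dots \cup L^k$.

```python
def concat(l1, l2):
    """Concatenation L1 L2 = { xy | x in L1, y in L2 }."""
    return {x + y for x in l1 for y in l2}

def power(lang, n):
    """L^0 = {""} and L^(n+1) = L L^n."""
    result = {""}
    for _ in range(n):
        result = concat(lang, result)
    return result

def kleene_up_to(lang, max_power):
    """Finite approximation of L*: the union of L^0 .. L^max_power."""
    out = set()
    for i in range(max_power + 1):
        out |= power(lang, i)
    return out

print(sorted(concat({"a", "ab"}, {"b", ""})))  # ['a', 'ab', 'abb']
print(sorted(kleene_up_to({"ab"}, 2)))         # ['', 'ab', 'abab']
```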

Definition: The collection of regular languages can be defined recursively as follows:

The empty language $\emptyset$ and $\{c\}$ for $c \in \Sigma$ are regular languages.
If $L$ is a regular language, so are $L^*$ and $L^0 = \{\varepsilon\}$.
If $L_1$ and $L_2$ are regular languages, so are $L_1 \cup L_2$ and $L_1L_2$.

Deterministic finite automaton

A deterministic finite automaton (DFA) is a mathematical model of computation. It can be used to decide whether a given string belongs to a language.

Definition: A deterministic finite automaton (plural automata) over an alphabet $\Sigma$ consists of a finite set $Q$ of states and a transition function $\tau: Q \times \Sigma \to Q$. One state $q_0 \in Q$ is designated as the initial state, and a subset $F \subseteq Q$ of states is designated as the final (accepting) states.

A DFA can be represented as a graph with directed edges labeled by characters.

An automaton runs on an input string $s = c_1 \dots c_n$ and answers YES or NO.

Initially, the automaton is in the initial state $r_0 = q_0$.
After $k$ steps, the state of the automaton is denoted $r_k$.
At the $k$-th step, the automaton reads the $k$-th character $c_k$ of $s$, then transitions from the state $r_{k-1}$ to $r_k = \tau(r_{k-1}, c_k)$.
After $n$ steps, the string is accepted if $r_n \in F$, and rejected otherwise.
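The run described above translates almost line by line into code. The following is a minimal sketch, assuming the transition function is stored as a dictionary keyed by (state, character). The example automaton, with hypothetical state names `q0`, `q1`, `q2`, accepts binary strings that contain `00`; it is an illustration and not the automaton from the lecture's diagram.

```python
def run_dfa(tau, q0, finals, s):
    """Run a DFA on s: start in q0, apply tau once per character, test membership in F."""
    state = q0
    for c in s:
        state = tau[(state, c)]  # r_k = tau(r_{k-1}, c_k)
    return state in finals

# Hypothetical DFA: q0 = no trailing 0, q1 = one trailing 0, q2 = "00" already seen.
tau = {
    ("q0", "0"): "q1", ("q0", "1"): "q0",
    ("q1", "0"): "q2", ("q1", "1"): "q0",
    ("q2", "0"): "q2", ("q2", "1"): "q2",
}

print(run_dfa(tau, "q0", {"q2"}, "1001"))  # True
print(run_dfa(tau, "q0", {"q2"}, "0101"))  # False
```

Because $\tau$ is a total function, the run takes exactly one step per input character and needs no backtracking.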

Definition: The language of an automaton is the set of strings accepted by this automaton.

Theorem: A language is regular iff it is the language of a deterministic finite automaton.

Example (the DFA is given by a state diagram, not reproduced here)

The string 010110 is accepted.
The string 101000 is rejected.
Is the string 11011100 accepted?

Nondeterministic finite automaton

A nondeterministic finite automaton (NFA) differs from a DFA as follows:

For a state-character pair in $Q \times \Sigma$, there may be zero, one, or several possible transitions (hence nondeterministic).
There may be $\varepsilon$-transitions, which can be followed any number of times without consuming any character of the input string.
The input string is accepted iff there exists a sequence of transitions that consumes the whole string and ends in a final state.
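One common way to simulate an NFA is to track the set of all states it could currently be in. The sketch below assumes transitions are stored as a dictionary from (state, character) to a set of states, with the empty string `""` standing for $\varepsilon$. The example automaton and its state names are hypothetical; it accepts binary strings ending in `01`.

```python
def eps_closure(delta, states):
    """All states reachable from `states` using only epsilon-transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, ""), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

def run_nfa(delta, q0, finals, s):
    """Accept iff some sequence of transitions consumes s and ends in a final state."""
    current = eps_closure(delta, {q0})
    for c in s:
        moved = set()
        for q in current:
            moved |= delta.get((q, c), set())  # a missing entry means no transition
        current = eps_closure(delta, moved)
    return bool(current & finals)

# Hypothetical NFA: s --eps--> p0, p0 loops on 0 and 1, p0 --0--> p1 --1--> p2 (final).
delta = {
    ("s", ""): {"p0"},
    ("p0", "0"): {"p0", "p1"},
    ("p0", "1"): {"p0"},
    ("p1", "1"): {"p2"},
}

print(run_nfa(delta, "s", {"p2"}, "1101"))  # True
print(run_nfa(delta, "s", {"p2"}, "10"))    # False
```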

Theorem (NFA = DFA): A language is accepted by some DFA iff it is accepted by some NFA.

Proof: by the powerset construction (details omitted).
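Since the proof is omitted, a small sketch of the powerset construction may help: each DFA state is a set of NFA states, and only the sets reachable from the start are built. The data layout (a dictionary from (state, character) to a set of states, with `""` for $\varepsilon$) and the example NFA for strings ending in `01` are assumptions made for illustration.

```python
def eps_closure(delta, states):
    """All NFA states reachable from `states` using only epsilon-transitions."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, ""), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

def nfa_to_dfa(delta, q0, finals, alphabet):
    """Powerset construction restricted to the reachable subsets."""
    start = frozenset(eps_closure(delta, {q0}))
    tau, dfa_finals = {}, set()
    seen, todo = {start}, [start]
    while todo:
        subset = todo.pop()
        if subset & finals:  # a subset is final if it contains an NFA final state
            dfa_finals.add(subset)
        for c in alphabet:
            moved = set()
            for q in subset:
                moved |= delta.get((q, c), set())
            target = frozenset(eps_closure(delta, moved))
            tau[(subset, c)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return tau, start, dfa_finals

def run_dfa(tau, q0, finals, s):
    state = q0
    for c in s:
        state = tau[(state, c)]
    return state in finals

# Hypothetical NFA accepting binary strings that end in "01".
nfa_delta = {
    ("s", ""): {"p0"},
    ("p0", "0"): {"p0", "p1"},
    ("p0", "1"): {"p0"},
    ("p1", "1"): {"p2"},
}
dfa_tau, dfa_start, dfa_finals = nfa_to_dfa(nfa_delta, "s", {"p2"}, "01")
print(len({subset for (subset, _) in dfa_tau}))            # 4 reachable DFA states
print(run_dfa(dfa_tau, dfa_start, dfa_finals, "1101"))     # True
```

In the worst case the number of subsets grows exponentially in the number of NFA states, which is the price of determinism.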

Theorem: A language is regular iff it is the language of an NFA.

Formal grammar

Definition: A formal grammar consists of

A set $\Sigma$ of terminal symbols.
A set $N$ of nonterminal symbols disjoint from $\Sigma$.
A set $P$ of production rules of the form $\alpha \to \beta$, where $\alpha \in (\Sigma \cup N)^* N (\Sigma \cup N)^*$ (so $\alpha$ contains at least one nonterminal) and $\beta \in (\Sigma \cup N)^*$.
A distinguished start symbol $S \in N$.

Example:

$S \to aSb$, $S \to ba$.
$S \to a$, $S \to SS$, $aSa \to b$.
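A production rule is applied by replacing one occurrence of its left-hand side inside the current string with its right-hand side. The sketch below is illustrative: it assumes every symbol is a single character (uppercase for nonterminals, lowercase for terminals), and `one_step` is a made-up helper name.

```python
def one_step(form, rules):
    """All strings obtained from `form` by applying one rule at one position."""
    results = set()
    for lhs, rhs in rules:
        start = form.find(lhs)
        while start != -1:
            results.add(form[:start] + rhs + form[start + len(lhs):])
            start = form.find(lhs, start + 1)
    return results

rules1 = [("S", "aSb"), ("S", "ba")]
rules2 = [("S", "a"), ("S", "SS"), ("aSa", "b")]

print(sorted(one_step("S", rules1)))    # ['aSb', 'ba']
print(sorted(one_step("aSb", rules1)))  # ['aaSbb', 'abab']
print(sorted(one_step("aSa", rules2)))  # ['aSSa', 'aaa', 'b']
```

A string of terminals belongs to the language of the grammar when it can be reached from the start symbol $S$ by repeating such steps; for the first grammar, $S \Rightarrow aSb \Rightarrow abab$.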

Regular grammar

Definition: A (right-)regular grammar is a formal grammar in which each production rule has one of the following forms:

$A \to \varepsilon$.
$A \to a$.
$A \to aB$.
Here $A, B \in N$ and $a \in \Sigma$.

Theorem: A language is regular iff it can be generated by a (right-)regular grammar.
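One direction of this theorem can be seen by reading a right-regular grammar as an NFA: nonterminals act as states, a rule $A \to aB$ is a transition from $A$ to $B$ on $a$, a rule $A \to a$ leads to an extra accepting state, and $A \to \varepsilon$ makes $A$ accepting. The sketch below follows that idea. The rule encoding (strings `""`, `"a"`, `"aB"`), the `ACCEPT` marker, and the example grammar for strings with an even number of `a`s are assumptions chosen for illustration.

```python
ACCEPT = "<accept>"  # hypothetical extra state reached by rules of the form A -> a

def generates(rules, start, s):
    """Decide whether the right-regular grammar derives s, by NFA-style simulation."""
    current = {start}
    for c in s:
        nxt = set()
        for nonterminal in current:
            for rhs in rules.get(nonterminal, []):
                if rhs == c:                         # A -> a
                    nxt.add(ACCEPT)
                elif len(rhs) == 2 and rhs[0] == c:  # A -> aB
                    nxt.add(rhs[1])
        current = nxt
    # Accept if we stopped via A -> a, or at a nonterminal with A -> epsilon.
    return any(q == ACCEPT or "" in rules.get(q, []) for q in current)

# Hypothetical grammar: S = even number of a's so far, T = odd number.
rules = {
    "S": ["", "bS", "aT"],
    "T": ["bT", "aS", "a"],
}

print(generates(rules, "S", "abab"))  # True
print(generates(rules, "S", "ab"))    # False
```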

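If we replace the rule form $A \to aB$ by $A \to Ba$, we obtain a left-regular grammar. Left-regular grammars generate exactly the same class of languages as right-regular grammars, namely the regular languages.

However, if a grammar mixes rules of both forms $A \to Ba$ and $A \to aB$, the generated language need not be regular. For example, $S \to aA$, $A \to Sb$, $S \to \varepsilon$ generates $\{a^n b^n \mid n \ge 0\}$, which is not regular.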

Regular expression

Regular expressions are a convenient and popular way to describe regular languages. For a regular expression $R$, we write $\mathcal{L}(R)$ for the language it describes.

Regular expressions (regex) start with the atomic regular expressions.

The empty language $\emptyset$, with $\mathcal{L}(\emptyset) = \emptyset$.
The empty string $\varepsilon$, with $\mathcal{L}(\varepsilon) = \{\varepsilon\}$.
A single character $a \in \Sigma$, with $\mathcal{L}(a) = \{a\}$.

Then we combine regular expressions in the following ways, listed in order of increasing precedence: union binds weakest and Kleene closure binds tightest.

Or (union of languages): $\mathcal{L}(R_1|R_2) = \mathcal{L}(R_1) \cup \mathcal{L}(R_2)$.
Concatenation: $\mathcal{L}(R_1R_2) = \mathcal{L}(R_1)\mathcal{L}(R_2)$.
Kleene closure: $\mathcal{L}(R^*) = \mathcal{L}(R)^*$.

Example

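Each line gives a regular expression followed by an equivalent shorthand.

(0|1)*00(0|1)*, or $\Sigma^* 00 \Sigma^*$ with $\Sigma = \{0, 1\}$.
$\Sigma\Sigma\Sigma\Sigma$, or $\Sigma^4$.
1*(0|$\varepsilon$)1*, or 1*0?1*.
[A][A]*(.[A][A]*)*@[A][A]*.[A][A]*(.[A][A]*)*, or [A]+(.[A]+)*@[A]+(.[A]+)+.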
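The claim that a long form and its shorthand describe the same language can be spot-checked by brute force on short strings. The sketch below uses Python's `re` module, whose syntax agrees with the operators used here; the helper `language_up_to` is a made-up name, and `[01]` and the empty alternative `(0|)` are used as stand-ins for $\Sigma$ and $\varepsilon$. Agreement up to a fixed length is evidence, not a proof.

```python
import re
from itertools import product

def language_up_to(pattern, alphabet, max_len):
    """All strings over `alphabet` of length <= max_len that the pattern matches in full."""
    compiled = re.compile(pattern)
    return {
        "".join(chars)
        for n in range(max_len + 1)
        for chars in product(alphabet, repeat=n)
        if compiled.fullmatch("".join(chars))
    }

pairs = [
    (r"(0|1)*00(0|1)*", r"[01]*00[01]*"),  # strings containing 00
    (r"1*(0|)1*", r"1*0?1*"),              # strings with at most one 0
]
for long_form, short_form in pairs:
    same = language_up_to(long_form, "01", 6) == language_up_to(short_form, "01", 6)
    print(long_form, short_form, same)
```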

Theorem: A language is regular iff it is the language of a regular expression.

This follows directly from the recursive definition of regular languages.

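The set of regular expressions is itself a language, but not a regular language: regular expressions may contain arbitrarily deeply nested, balanced parentheses, which no finite automaton can check.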

POSIX regex standard

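POSIX regular expressions are the syntax used by many practical tools such as grep, and they give a convenient shorthand for describing regular languages. The operators below follow POSIX extended regular expressions (ERE), which grep accepts with the -E option.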

. matches any single character.
[abc] matches one character chosen from the set {a,b,c}.
[A-Z] matches one uppercase letter in the range A to Z.
R+ means one or more copies of R.
R? means zero or one copy of R.
R* means zero or more copies of R.
(R) groups a subexpression.
R|S means either R or S.

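Example
[A-Za-z]+@[A-Za-z]+\.[A-Za-z]+ matches a simplified email address: letters, then @, then letters, a literal dot, and more letters. The backslash makes the dot literal instead of a wildcard.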
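The pattern can be tried out quickly in Python. This is a sketch only: Python's `re` module is not a POSIX implementation, but it treats the operators used in this pattern the same way, and `looks_like_email` is a made-up helper name. The second pattern drops the backslash to show why escaping the dot matters.

```python
import re

EMAIL = re.compile(r"[A-Za-z]+@[A-Za-z]+\.[A-Za-z]+")
UNESCAPED = re.compile(r"[A-Za-z]+@[A-Za-z]+.[A-Za-z]+")  # "." now matches any character

def looks_like_email(s):
    """True if the whole string matches the simplified email pattern."""
    return EMAIL.fullmatch(s) is not None

print(looks_like_email("alice@example.com"))       # True
print(looks_like_email("alice@examplecom"))        # False
print(looks_like_email("alice@mail.example.com"))  # False: only one dot is allowed
print(UNESCAPED.fullmatch("alice@examplecom") is not None)  # True
```

Real email addresses allow digits, several dots, and many other characters, so this pattern is a teaching example rather than a validator.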

Context-free grammar

Definition: A context-free grammar is a formal grammar in which the left-hand side of each production rule is a single nonterminal symbol. A language is context-free iff it can be generated by a context-free grammar.

Example

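$S \to aSa$, $S \to bSb$, $S \to \varepsilon$. This grammar generates the even-length palindromes over $\{a, b\}$.
$S \to SS$, $S \to (S)$, $S \to ()$. This grammar generates the nonempty strings of balanced parentheses.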
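Both example grammars have simple recognizers, sketched below. The first mirrors the rules $S \to aSa \mid bSb \mid \varepsilon$ by recursion. The second relies on the fact that $S \to SS \mid (S) \mid ()$ generates exactly the nonempty balanced strings, so a depth counter suffices. The function names are illustrative and not part of the lecture.

```python
def in_palindrome_grammar(s):
    """Membership for S -> aSa | bSb | epsilon (even-length palindromes over {a, b})."""
    if s == "":
        return True                            # S -> epsilon
    if len(s) >= 2 and s[0] == s[-1] and s[0] in "ab":
        return in_palindrome_grammar(s[1:-1])  # S -> aSa or S -> bSb
    return False

def in_paren_grammar(s):
    """Membership for S -> SS | (S) | () (nonempty balanced parentheses)."""
    depth = 0
    for c in s:
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth < 0:  # a closing parenthesis with no matching opener
                return False
        else:
            return False
    return s != "" and depth == 0

print(in_palindrome_grammar("abba"))  # True
print(in_palindrome_grammar("aba"))   # False: odd length
print(in_paren_grammar("(()())"))     # True
print(in_paren_grammar("(()"))        # False
```

The counter in the second function can grow without bound, which is the kind of memory a finite automaton lacks.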

Copyright belongs to the author. Do not reproduce without permission.
