From 2921ec961576ae5907eda63de57ac611a3b07ea6 Mon Sep 17 00:00:00 2001
From: taynpg <taynpg@163.com>
Date: Thu, 9 Jan 2025 15:59:49 +0800
Subject: [PATCH] codec: add UTF-8 encoding definition
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 codec.md | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)
 create mode 100644 codec.md

diff --git a/codec.md b/codec.md
new file mode 100644
index 0000000..2f10326
--- /dev/null
+++ b/codec.md
@@ -0,0 +1,16 @@
+# UTF-8 Encoding
+
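+UTF-8 is a variable-length encoding. In its original design a character occupies 1 to 6 bytes (RFC 3629 later restricted UTF-8 to at most 4 bytes, covering code points up to U+10FFFF), and the number of bytes is determined by the number of leading 1 bits in the character's first byte. The encoding scheme is as follows: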
+
+| Range                 | Byte 1   | Byte 2   | Byte 3   | Byte 4   | Byte 5   | Byte 6   |
+| --------------------- | -------- | -------- | -------- | -------- | -------- | -------- |
+| 0x00000000~0x0000007F | 0xxxxxxx |          |          |          |          |          |
+| 0x00000080~0x000007FF | 110xxxxx | 10xxxxxx |          |          |          |          |
+| 0x00000800~0x0000FFFF | 1110xxxx | 10xxxxxx | 10xxxxxx |          |          |          |
+| 0x00010000~0x001FFFFF | 11110xxx | 10xxxxxx | 10xxxxxx | 10xxxxxx |          |          |
+| 0x00200000~0x03FFFFFF | 111110xx | 10xxxxxx | 10xxxxxx | 10xxxxxx | 10xxxxxx |          |
+| 0x04000000~0x7FFFFFFF | 1111110x | 10xxxxxx | 10xxxxxx | 10xxxxxx | 10xxxxxx | 10xxxxxx |
+
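+To decode UTF-8, count the leading 1 bits in the first byte of each character to determine its length: no leading 1 means a single byte, and n leading 1s (n ≥ 2) mean n bytes. Bytes of the form 10xxxxxx are continuation bytes and never start a character.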
+
+Source: [Methods and principles for reading UTF-8 and GBK text in C++ (CSDN blog)](https://blog.csdn.net/weixin_41055260/article/details/121434010)
\ No newline at end of file
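
The length rule described in codec.md can be sketched in a few lines of Python. This is only an illustration, not part of the patch: the function names `utf8_sequence_length` and `split_utf8` are hypothetical, and the sketch accepts the legacy 5- and 6-byte forms from the table even though current UTF-8 (RFC 3629) stops at 4 bytes.

```python
def utf8_sequence_length(first_byte):
    """Return the byte length implied by a lead byte (1 to 6, following the table)."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("expected a single byte value (0-255)")
    # Count the leading 1 bits, scanning from the most significant bit.
    ones = 0
    mask = 0x80
    while mask and (first_byte & mask):
        ones += 1
        mask >>= 1
    if ones == 0:
        return 1  # 0xxxxxxx: single-byte (ASCII) character
    if ones == 1 or ones > 6:
        # 10xxxxxx is a continuation byte; 0xFE and 0xFF never appear in UTF-8.
        raise ValueError("0x%02X cannot start a character" % first_byte)
    return ones


def split_utf8(data):
    """Split a bytes object into one bytes chunk per encoded character."""
    chunks = []
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        chunk = data[i:i + n]
        # Every byte after the first must look like 10xxxxxx.
        if len(chunk) != n or any((b & 0xC0) != 0x80 for b in chunk[1:]):
            raise ValueError("truncated or malformed sequence at offset %d" % i)
        chunks.append(chunk)
        i += n
    return chunks
```

For example, `split_utf8("aé中".encode("utf-8"))` yields three chunks of 1, 2, and 3 bytes. The sketch only checks sequence framing; it does not reject overlong encodings or surrogate code points, which a production decoder would need to do.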