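Easily Handle Python Project Documentation with MkDocs

This article shows how to use MkDocs to produce documentation for a Python project with little effort. Documentation is an essential part of any Python project.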

Introduction

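When learning third-party Python modules, we often read their official documentation. It helps us pick them up quickly, and it is also the most important reference for these modules.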

For example, the official documentation of the well-known scrapy module (readthedocs style, URL: https://scapy.readthedocs.io/en/latest/) looks like this:

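As another example, Pydantic and FastAPI, both very popular lately, build their official documentation with MkDocs using the Material theme. Pydantic's documentation (URL: ) looks like this: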

So what is MkDocs?

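MkDocs is a fast, simple and downright gorgeous static site generator geared towards building project documentation. Documentation source files are written in Markdown and configured with a single YAML configuration file. With MkDocs, generating documentation for a Python project becomes very convenient.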

In this article we will write Python docstrings in the Google comment style, use MkDocs to turn them into project documentation, and deploy the result to GitHub.

Before we start, install the following Python modules:

mkdocs==1.5.3
mkdocs-material==9.5.17
mkdocstrings==0.24.1
mkdocstrings-python==1.9.0
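Version mismatches between these plugins are a common source of confusing build errors, so it can help to confirm the pins before going further. The following is a minimal sketch assuming Python 3.8+ (for `importlib.metadata`); `check_pins` is a hypothetical helper, not part of MkDocs or any of the plugins.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional

# The pins listed above.
PINNED = {
    "mkdocs": "1.5.3",
    "mkdocs-material": "9.5.17",
    "mkdocstrings": "0.24.1",
    "mkdocstrings-python": "1.9.0",
}


def check_pins(
    pins: Dict[str, str], lookup: Callable[[str], str] = version
) -> Dict[str, Optional[str]]:
    """Return {package: installed version, or None if missing} for every pin that differs."""
    mismatches: Dict[str, Optional[str]] = {}
    for name, wanted in pins.items():
        try:
            installed: Optional[str] = lookup(name)
        except PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    print(check_pins(PINNED) or "all pinned versions are installed")
```

An empty result means every pin matches; otherwise the dictionary shows which packages need `pip install` with the versions above. The `lookup` parameter only exists so the function can be tested without touching the real environment.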

The project used here is package_python_project, the Python project from the article "Easily Packaging Python Projects with Hatchling", available at https://github.com/percent4/package_python_project .

docstring and doctest

We add docstrings written in the Google comment style to src/token_counter/token_count.py, including an Examples section so that the examples can be checked with doctest.

The modified script is as follows:

# -*- coding: utf-8 -*-
# @place: Pudong, Shanghai
# @file: token_count.py
# @time: 2024/1/22 17:45
"""TokenCounter

To use TokenCounter, import it with:

from token_counter.token_count import TokenCounter

"""
import tiktoken

from typing import List, Union


class TokenCounter(object):
    """A class for counting the tokens in user input.
    """
    def __init__(self, model: str = "gpt-3.5-turbo"):
        """`__init__` method

        Store the name of the model whose encoding is used for counting.

        Notes:
            Currently only OpenAI GPT models are supported.

        Args:
            model: the name of the model, e.g. `gpt-3.5-turbo`
        """
        self.model = model

    def count(self, _input: Union[List[str], str]) -> Union[List[int], int]:
        """Count the tokens of `_input`, a string or a list of strings.

        Count the tokens of `_input` with OpenAI's tiktoken module. The model is `gpt-3.5-turbo`
        by default; if tiktoken does not recognize the model, the `cl100k_base` encoding is used instead.

        Args:
            _input: the input string or list of strings

        Examples:
            >>> token_counter = TokenCounter()
            >>> token_counter.count("who are you?")
            4
            >>> token_counter.count(["who are you?", "How's it going on?"])
            [4, 6]

        Raises:
            NotImplementedError: if `_input` is neither a `str` nor a `list`

        Returns:
            the number of tokens in a string, or a list with the number of tokens of each string in the list
        """
        try:
            encoding = tiktoken.encoding_for_model(self.model)
        except KeyError:
            print("Warning: model not found. Using cl100k_base encoding.")
            encoding = tiktoken.get_encoding("cl100k_base")

        if isinstance(_input, list):
            token_count_list = []
            for text in _input:
                token_count_list.append(len(encoding.encode(text)))
            return token_count_list
        elif isinstance(_input, str):
            return len(encoding.encode(_input))
        else:
            raise NotImplementedError(f"not support data type for {type(_input)}, please use str or List[str].")

To run a quick doctest on this script, use the command:

`python3 -m doctest token_count.py -v`

The output is:

Trying:
    token_counter = TokenCounter()
Expecting nothing
ok
Trying:
    token_counter.count("who are you?")
Expecting:
    4
ok
Trying:
    token_counter.count(["who are you?", "How's it going on?"])
Expecting:
    [4, 6]
ok
3 items had no tests:
    token_count
    token_count.TokenCounter
    token_count.TokenCounter.__init__
1 items passed all tests:
   3 tests in token_count.TokenCounter.count
3 tests in 4 items.
3 passed and 0 failed.
Test passed.
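The same kind of check can also be run from Python with the standard `doctest` module. Below is a self-contained sketch that does not need tiktoken: `word_count` is a hypothetical function whose Google-style docstring contains an Examples section, and `run_doctests` is a hypothetical helper that collects and runs those examples with `doctest.DocTestFinder` and `doctest.DocTestRunner`.

```python
import doctest


def word_count(text: str) -> int:
    """Count the whitespace-separated words in `text`.

    Args:
        text: the input string

    Examples:
        >>> word_count("who are you?")
        3
        >>> word_count("")
        0

    Returns:
        the number of words in `text`
    """
    return len(text.split())


def run_doctests(obj) -> doctest.TestResults:
    """Run the >>> examples in obj's docstring and return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # Make obj itself visible to its examples, as it would be inside its own module.
    for test in finder.find(obj, globs={obj.__name__: obj}):
        runner.run(test)
    return runner.summarize(verbose=False)


if __name__ == "__main__":
    results = run_doctests(word_count)
    print(f"{results.attempted} examples run, {results.failed} failed")
```

Failing examples are reported on stdout in the same format as the command-line run; passing runs stay silent because `verbose=False`.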

Generating the Project Documentation

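Back to the main topic. Running `mkdocs new .` creates a docs folder (with a starter index.md) and an mkdocs.yml configuration file in the current directory.

In the docs folder we write .md files in Markdown; they are the source material for the project documentation. Here I created three documents: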

index.md
reference.md
tutorials.md
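If you prefer to script this step, the three files can be created with `pathlib`. This is a minimal sketch: `scaffold_docs` is a hypothetical helper (not part of MkDocs), and the one-line page bodies are placeholders for the real contents shown next. Because `mkdocs new .` has already written docs/index.md, files that already exist are left untouched.

```python
from pathlib import Path
from typing import Dict, List

# Placeholder page bodies (assumed); replace them with the real Markdown.
PAGES: Dict[str, str] = {
    "index.md": "# Welcome to Token Counter Docs\n",
    "reference.md": "## Reference\n",
    "tutorials.md": "::: src.token_counter.token_count\n",
}


def scaffold_docs(root: Path, pages: Dict[str, str] = PAGES) -> List[Path]:
    """Create root/docs/<page> for each page that does not exist yet; return the new paths."""
    docs_dir = root / "docs"
    docs_dir.mkdir(parents=True, exist_ok=True)
    created: List[Path] = []
    for name, text in pages.items():
        path = docs_dir / name
        if not path.exists():  # never overwrite a page that has already been edited
            path.write_text(text, encoding="utf-8")
            created.append(path)
    return created


if __name__ == "__main__":
    for path in scaffold_docs(Path(".")):
        print("created", path)
```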

The content of index.md is as follows:

# Welcome to Token Counter Docs

This documentation covers my personal token counter project.

## Tutorials

See the [tutorial](tutorials.md) for more details about the project.

The source code is available on GitHub at [https://github.com/percent4/package_python_project](https://github.com/percent4/package_python_project) .

## Project overview

::: src.token_counter

## Others

The project is a useful reference for packaging and documenting Python projects.

The content of reference.md is as follows:


## Reference

1. openai/tiktoken: [https://github.com/openai/tiktoken](https://github.com/openai/tiktoken)

The content of tutorials.md is as follows:

This part of the project documentation focuses on the implementation of the **TokenCounter** class.
Currently it only supports OpenAI GPT models, such as `gpt-3.5-turbo`.

::: src.token_counter.token_count

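Here `:::` is syntax provided by the mkdocstrings plugin rather than by MkDocs itself: a line starting with `:::` followed by an identifier inserts the documentation of that Python module, class, or function, rendered from its docstrings.

We also configure mkdocs.yml, defining the theme, plugins, navigation (nav) and so on, for example: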
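To see which objects a page pulls in, the `:::` lines can be listed with a regular expression. This is an approximation for illustration only: `find_autodoc_identifiers` is a hypothetical helper, and mkdocstrings' own parser is more permissive (for example, it also reads an indented YAML block of options below the `:::` line).

```python
import re
from typing import List

# Approximation (assumed): ":::" at the very start of a line, optional spaces or tabs,
# then a dotted Python path such as src.token_counter.token_count.
AUTODOC_RE = re.compile(r"^:::[ \t]*([\w.]+)[ \t]*$", re.MULTILINE)


def find_autodoc_identifiers(markdown: str) -> List[str]:
    """Return the identifiers referenced by ::: lines, in document order."""
    return AUTODOC_RE.findall(markdown)


if __name__ == "__main__":
    index_md = "## Project overview\n\n::: src.token_counter\n\n## Others\n"
    print(find_autodoc_identifiers(index_md))
```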

site_name: Token Counter Docs

theme:
  name: "material"
  features:
    - navigation.tabs
  palette:
    # Palette toggle for light mode
    - scheme: default
      toggle:
        icon: material/lightbulb
        name: Switch to dark mode

    # Palette toggle for dark mode
    - scheme: slate
      toggle:
        icon: material/lightbulb-outline
        name: Switch to light mode

plugins:
  - search:
      lang: en
  - mkdocstrings:
      handlers:
        python:
          paths: [src]

nav:
  - Index: index.md
  - Tutorial: tutorials.md
  - Reference: reference.md
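MkDocs performs its own validation at build time, but a quick check that every `nav` entry points to an existing page is easy to sketch. This assumes the `nav` section has already been parsed into Python objects, for instance with PyYAML's `yaml.safe_load` (PyYAML is a dependency of MkDocs); `nav_pages` and `missing_nav_pages` are hypothetical helpers.

```python
from pathlib import Path
from typing import List

# The nav section of the mkdocs.yml above, as the list of one-key dicts
# that yaml.safe_load would produce for it.
NAV = [
    {"Index": "index.md"},
    {"Tutorial": "tutorials.md"},
    {"Reference": "reference.md"},
]


def nav_pages(nav: list) -> List[str]:
    """Flatten a (possibly nested) nav list into the Markdown paths it references."""
    pages: List[str] = []
    for item in nav:
        # An entry is either "page.md" or {"Title": "page.md" or [sub-entries]}.
        value = next(iter(item.values())) if isinstance(item, dict) else item
        if isinstance(value, list):  # a nav section that contains sub-pages
            pages.extend(nav_pages(value))
        elif not str(value).startswith(("http://", "https://")):  # skip external links
            pages.append(value)
    return pages


def missing_nav_pages(nav: list, docs_dir: Path) -> List[str]:
    """Return the nav entries that have no matching file under docs_dir."""
    return [page for page in nav_pages(nav) if not (docs_dir / page).is_file()]


if __name__ == "__main__":
    print(missing_nav_pages(NAV, Path("docs")) or "every nav page exists")
```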

A brief note: the mkdocstrings plugin extracts the docstrings from the Python code and uses them to generate the documentation, which is very convenient.
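To illustrate what "extracting docstrings" involves, here is a toy parser that splits a Google-style docstring into its summary and sections. It is only a sketch: mkdocstrings-python delegates this work to Griffe, which handles many more section types and formats; `parse_google_sections` and the section list in `SECTION_RE` are assumptions made for this example.

```python
import inspect
import re
from typing import Dict, List

# Section headers used in token_count.py above (a subset of the Google style).
SECTION_RE = re.compile(r"^(Args|Returns|Raises|Examples|Notes):\s*$")


def parse_google_sections(obj) -> Dict[str, List[str]]:
    """Split obj's docstring into a summary plus Google-style sections."""
    doc = inspect.getdoc(obj) or ""  # getdoc also removes the common indentation
    sections: Dict[str, List[str]] = {"summary": []}
    current = "summary"
    for line in doc.splitlines():
        match = SECTION_RE.match(line)
        if match:  # a header such as "Args:" starts a new section
            current = match.group(1)
            sections[current] = []
        elif line.strip():  # keep non-empty lines, without their indentation
            sections[current].append(line.strip())
    return sections
```

Calling it on a function documented like `TokenCounter.__init__` yields a dictionary whose `Notes` and `Args` keys hold the corresponding lines, which is roughly the information a documentation generator turns into formatted tables.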

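Run `mkdocs serve` in the terminal, then open http://127.0.0.1:8000/ to view the project documentation. The result looks like this:

The generated documentation also makes it easy to view the Python source code.

The documentation above uses the Material theme. To switch to the readthedocs theme, change `name` under `theme` in the configuration file to readthedocs. That theme offers fewer features than Material, so some things are lost: for example, the `features` and `palette` options in the configuration above are specific to Material and have no effect under readthedocs.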

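Deployment

So far the documentation can only be viewed locally. There are two ways to deploy it:

Build a static folder: run `mkdocs build` to build the documentation into the static site folder, which can be deployed on any server you like.
Deploy on GitHub: run `mkdocs gh-deploy`; after a short wait, the documentation is available at https://percent4.github.io/package_python_project/ , and the built site is stored in the gh-pages branch.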
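Before uploading the site folder produced by `mkdocs build`, it can be previewed with Python's standard library, much like `python -m http.server --directory site`. This is a minimal sketch assuming the default output folder name `site`; `serve_site` is a hypothetical helper. Page URLs such as /tutorials/ resolve to tutorials/index.html, which a plain static file server handles.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def serve_site(directory: str = "site", port: int = 8000) -> ThreadingHTTPServer:
    """Serve the static files in `directory` on 127.0.0.1 from a background thread."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    # daemon=True so the thread does not keep the interpreter alive on exit
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    srv = serve_site()
    print(f"Serving on http://127.0.0.1:{srv.server_address[1]}/ (press Enter to stop)")
    input()
    srv.shutdown()
    srv.server_close()
```

Passing `port=0` lets the operating system pick a free port, which `server.server_address[1]` then reports.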