Using Algolia DocSearch
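Docusaurus has official support for Algolia DocSearch; the integration ships with the Docusaurus preset.
Register with Algolia
First, sign up for an Algolia account (a quick web search will find the sign-up page).
After signing up you get an appId, apiKey, and indexName.
Theme configuration - automatic indexing
Add the following to the docusaurus.config.js configuration file: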
themeConfig: {
  algolia: {
    appId: 'xxx', // your Algolia application ID
    apiKey: 'xxx', // the public search-only API key, not the Admin API key
    indexName: 'dev.lichenghao.cn',
    contextualSearch: true,
    searchParameters: {},
    searchPagePath: 'search',
  },
}
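Then go to the DocSearch site, DocSearch: Search made for documentation | DocSearch (algolia.com), and submit your website URL and email address. After that, the crawler runs every 24 hours, crawling your site to build the index data.
Alternatively, you can push the index yourself.
Pushing the index - manual indexing
On a CentOS 7 server, use Docker to run the official scraper.
This depends on jq, a command-line JSON processor.
Install the EPEL repository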
sudo yum install epel-release -y
Install jq
sudo yum install jq -y
Verify the installation
jq --version
Then, in any folder, create two files, env and config.json. They hold the Algolia API key and the index-push configuration, respectively.
env (note: API_KEY must be the Admin API Key):
APPLICATION_ID=xxx
API_KEY=xxx
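A missing or empty key in the env file is an easy mistake to make before starting the container. The snippet below is a minimal sketch of a pre-flight check. It is not part of the scraper, and `parseEnvFile` and `missingKeys` are hypothetical helper names. It assumes plain KEY=VALUE lines and ignores anything else, which is simpler than what `docker run --env-file` actually accepts.

```javascript
// Hypothetical helper: parse simple KEY=VALUE lines from an env file.
function parseEnvFile(text) {
  const vars = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // simplification: ignore lines without a KEY= prefix
    vars[line.slice(0, eq)] = line.slice(eq + 1);
  }
  return vars;
}

// Return the names of required keys that are missing or empty.
function missingKeys(vars, required = ["APPLICATION_ID", "API_KEY"]) {
  return required.filter((key) => !vars[key]);
}

const sampleEnv = "APPLICATION_ID=xxx\nAPI_KEY=xxx\n";
console.log(missingKeys(parseEnvFile(sampleEnv))); // []
console.log(missingKeys(parseEnvFile("APPLICATION_ID=xxx\n"))); // [ 'API_KEY' ]
```

In practice you would read the real file with `fs.readFileSync("env", "utf8")` and stop if the returned list is not empty.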
config.json (change index_name and start_urls to match your site):
{
  "index_name": "dev.lichenghao.cn",
  "start_urls": ["https://dev.lichenghao.cn"],
  "sitemap_urls": ["https://dev.lichenghao.cn/sitemap.xml"],
  "stop_urls": ["/search"],
  "selectors": {
    "lvl0": {
      "selector": "(//ul[contains(@class,'menu__list')]//a[contains(@class, 'menu__link menu__link--sublist menu__link--active')]/text() | //nav[contains(@class, 'navbar')]//a[contains(@class, 'navbar__link--active')]/text())[last()]",
      "type": "xpath",
      "global": true,
      "default_value": "Documentation"
    },
    "lvl1": "header h1, article h1",
    "lvl2": "article h2",
    "lvl3": "article h3",
    "lvl4": "article h4",
    "lvl5": "article h5, article td:first-child",
    "lvl6": "article h6",
    "text": "article p, article li, article td:last-child"
  },
  "custom_settings": {
    "attributesForFaceting": [
      "type",
      "lang",
      "language",
      "version",
      "docusaurus_tag"
    ],
    "attributesToRetrieve": [
      "hierarchy",
      "content",
      "anchor",
      "url",
      "url_without_anchor",
      "type"
    ],
    "attributesToHighlight": ["hierarchy", "content"],
    "attributesToSnippet": ["content:10"],
    "camelCaseAttributes": ["hierarchy", "content"],
    "searchableAttributes": [
      "unordered(hierarchy.lvl0)",
      "unordered(hierarchy.lvl1)",
      "unordered(hierarchy.lvl2)",
      "unordered(hierarchy.lvl3)",
      "unordered(hierarchy.lvl4)",
      "unordered(hierarchy.lvl5)",
      "unordered(hierarchy.lvl6)",
      "content"
    ],
    "distinct": true,
    "attributeForDistinct": "url",
    "customRanking": [
      "desc(weight.pageRank)",
      "desc(weight.level)",
      "asc(weight.position)"
    ],
    "ranking": [
      "words",
      "filters",
      "typo",
      "attribute",
      "proximity",
      "exact",
      "custom"
    ],
    "highlightPreTag": "<span class='algolia-docsearch-suggestion--highlight'>",
    "highlightPostTag": "</span>",
    "minWordSizefor1Typo": 3,
    "minWordSizefor2Typos": 7,
    "allowTyposOnNumericTokens": false,
    "minProximity": 1,
    "ignorePlurals": true,
    "advancedSyntax": true,
    "attributeCriteriaComputedByMinProximity": true,
    "removeWordsIfNoResults": "allOptional",
    "separatorsToIndex": "_",
    "synonyms": [
      ["js", "javascript"],
      ["ts", "typescript"]
    ]
  }
}
Then run:
docker run -it --env-file=env -e "CONFIG=$(cat config.json | jq -r tostring)" algolia/docsearch-scraper:v1.16.0
Wait for it to finish. Output like the following indicates success:
......
> DocSearch: https://dev.lichenghao.cn/docs/react/zkJ8XlOBb4b1azdWe962P 30 records)
> DocSearch: https://dev.lichenghao.cn/docs/springcloud/C7UvQ2pvTdgduS9A5seg 83 records)
Nb hits: 8302
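The `jq -r tostring` part of the docker command collapses config.json into a single-line string, so the whole configuration fits into the one CONFIG environment variable. As a rough illustration of that transformation, here is a small JavaScript sketch. `compactConfig` is a hypothetical helper, not something the scraper provides, and the sample is a shortened version of the configuration above.

```javascript
// Hypothetical helper: produce the same compact, single-line form
// that `jq -r tostring` emits for a JSON document.
function compactConfig(jsonText) {
  // Parsing first also surfaces JSON syntax errors before the scraper runs.
  return JSON.stringify(JSON.parse(jsonText));
}

const sample = `{
  "index_name": "dev.lichenghao.cn",
  "start_urls": ["https://dev.lichenghao.cn"]
}`;

console.log(compactConfig(sample));
// {"index_name":"dev.lichenghao.cn","start_urls":["https://dev.lichenghao.cn"]}
```

If the file contains a syntax error, `JSON.parse` throws, which is a cheap way to catch a broken config before waiting on a crawl.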
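GitHub Actions - fully automatic indexing
Use a GitHub workflow to push the index data automatically every time code is pushed.
Here is the workflow configuration file.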
.github/workflows/docSearch.yml
name: docSearch
on:
  push:
    branches:
      - main
jobs:
  algolia:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Get the content of docSearch.json as config
        id: algolia_config
        # Write the step output via $GITHUB_OUTPUT; the older ::set-output command is deprecated
        run: echo "config=$(cat docSearch.json | jq -r tostring)" >> "$GITHUB_OUTPUT"
      - name: Clear old index records
        id: clear_old_index_records
        run: |
          wget https://github.com/algolia/cli/releases/download/v1.5.0/algolia_1.5.0_linux_amd64.deb && sudo dpkg -i algolia_*.deb \
          && algolia indices clear 'dev.lichenghao.cn' --application-id ${{ secrets.ALGOLIA_APP_ID }} --admin-api-key ${{ secrets.ALGOLIA_API_KEY }} -y
      - name: Run algolia/docsearch-scraper
        env:
          ALGOLIA_APP_ID: ${{ secrets.ALGOLIA_APP_ID }}
          ALGOLIA_API_KEY: ${{ secrets.ALGOLIA_API_KEY }}
          CONFIG: ${{ steps.algolia_config.outputs.config }}
        run: |
          docker run \
            --env APPLICATION_ID=${ALGOLIA_APP_ID} \
            --env API_KEY=${ALGOLIA_API_KEY} \
            --env "CONFIG=${CONFIG}" \
            algolia/docsearch-scraper:v1.16.0
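The flow is straightforward: read the configuration file, then run the scraper.
Before running the scraper, I run the Clear old index records step, which calls the Algolia CLI to clear the existing index data. This step is optional. I need it because a free account is limited to 10,000 index records, and without deleting the old records the limit would be exceeded. And I am on a free account!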
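To see how close a run gets to that limit, the record count can be read from the scraper's final "Nb hits" line, as in the sample output earlier. The sketch below is only an illustration: `parseNbHits` and `remainingRecords` are hypothetical helpers, and the 10,000 figure is the free-account limit quoted in this post, so check it against your own plan.

```javascript
// Assumption: the free-account record limit quoted in this post.
const FREE_PLAN_RECORD_LIMIT = 10000;

// Hypothetical helper: extract the number from a "Nb hits: N" line, or null if absent.
function parseNbHits(log) {
  const match = /^Nb hits:\s*(\d+)\s*$/m.exec(log);
  return match ? Number(match[1]) : null;
}

// How many records are still available after this crawl.
function remainingRecords(log, limit = FREE_PLAN_RECORD_LIMIT) {
  const hits = parseNbHits(log);
  if (hits === null) throw new Error("no 'Nb hits' line found in scraper output");
  return limit - hits;
}

const sampleLog = [
  "> DocSearch: https://dev.lichenghao.cn/docs/react/zkJ8XlOBb4b1azdWe962P 30 records)",
  "Nb hits: 8302",
].join("\n");

console.log(remainingRecords(sampleLog)); // 1698
```

With 8302 records per crawl, there is no room to keep an old copy of the records alongside the new ones, which is why the index is cleared first.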
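If you don't want to push the index on every commit, you can switch to a scheduled job.
Mind the time zone: GitHub Actions evaluates cron schedules in UTC, so the job below runs at 10:00 Beijing time (UTC+8) every day.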
name: docSearch
on:
  schedule:
    # GitHub Actions evaluates cron in UTC: 02:00 UTC is 10:00 in Beijing (UTC+8)
    - cron: "0 2 * * *"
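The conversion between the cron hour (UTC) and Beijing time is a fixed 8-hour shift. The following sketch spells out the arithmetic. Both function names are hypothetical, and it only handles cron expressions whose hour field is a single number.

```javascript
const BEIJING_OFFSET_HOURS = 8; // Beijing time is UTC+8, with no daylight saving

// Hypothetical helper: Beijing hour at which a UTC cron expression fires.
function cronHourInBeijing(cronExpr) {
  const fields = cronExpr.trim().split(/\s+/);
  if (fields.length !== 5) throw new Error("expected a 5-field cron expression");
  const utcHour = Number(fields[1]);
  if (!Number.isInteger(utcHour) || utcHour < 0 || utcHour > 23) {
    throw new Error("hour field must be a single number from 0 to 23");
  }
  return (utcHour + BEIJING_OFFSET_HOURS) % 24;
}

// Hypothetical helper: UTC hour to put in the cron for a desired Beijing hour.
function utcHourForBeijing(beijingHour) {
  return (beijingHour - BEIJING_OFFSET_HOURS + 24) % 24;
}

console.log(cronHourInBeijing("0 2 * * *")); // 10
console.log(utcHourForBeijing(10)); // 2
```

Note that a Beijing hour earlier than 08:00 maps to the previous UTC day, which matters if the cron expression also restricts the day of week or month.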