
[Project] Boost Search Engine

news 2026/5/9 17:29:32

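1. Project Background

2. Overall Architecture

3. Tech Stack and Project Environment

4. Forward Index && Inverted Index

5. Tag Removal and Data Cleaning

6. Building the Index Module (Index)

6.1 Forward Index

6.2 Building the Inverted Index

Using cppjieba

Word Segmentation and Counting

7. The Search Engine Module (Searcher)

Jsoncpp -- Serialization and Deserialization with jsoncpp

Processing the Content

8. Adding cpp-httplib

9. Writing the Web Front End

10. Writing the Project Log

11. Project Testing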

1. Project Background

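Boost's official website has no site search, so we build one ourselves. What we are building is a site search. Compared with a general web search engine, the data it searches is more specialized (vertical) and much smaller.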

2. Overall Architecture

3. Tech Stack and Project Environment

Tech stack: C/C++, C++11, STL, Boost (the quasi-standard library), Jsoncpp, cppjieba, cpp-httplib. Optional: HTML5, CSS, JS, jQuery, Ajax.

Project environment: CentOS 7 cloud server, vim/gcc(g++)/Makefile, VS2019 or VS Code

4. Forward Index && Inverted Index

Example documents:

Doc ID	Content
1	雷军买了四斤小米 (Lei Jun bought four jin of millet)
2	雷军发布了小米手机 (Lei Jun released a Xiaomi phone)

Forward index: look up a document's content by its document ID.

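Word segmentation: split the text into words. This is what makes it possible to build the inverted index and to search it.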

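雷军买了四斤小米: 雷军/买/了/四斤/小米 (Lei Jun / buy / [particle] / four jin / millet)

雷军发布了小米手机: 雷军/发布/了/小米/手机 (Lei Jun / release / [particle] / Xiaomi / phone)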

Inverted index: given a keyword, find the IDs of the documents that contain it.

Keyword	Doc IDs
雷军 (Lei Jun)	1,2
买 (buy)	1
四斤 (four jin)	1
小米 (millet / Xiaomi)	1,2
四斤小米 (four jin of millet)	1
发布 (release)	2
小米手机 (Xiaomi phone)	2

Stop words: words such as 了, 的, 吗, a and the carry little meaning and can generally be ignored during segmentation.

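User input: 小米 -> look it up in the inverted index -> get the document IDs (1, 2) -> look up each ID in the forward index -> fetch the documents -> build the response.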
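To make the two indexes concrete, here is a toy model of the example above. It is a sketch built on simplifying assumptions. The documents are segmented by hand rather than by jieba, and the stop word 了 has already been dropped. Document IDs start at 0 because they are positions in a vector. ToyIndex is a hypothetical name and is not part of the project.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical toy index: a forward index (vector) plus an inverted index (hash map).
// Assumption: callers pass already-segmented words (no jieba here).
struct ToyIndex {
    std::vector<std::string> forward;                            // doc_id -> content
    std::unordered_map<std::string, std::vector<int>> inverted;  // word -> doc_ids

    // Adds a document; its ID is its position in `forward`.
    int Add(const std::string& content, const std::vector<std::string>& words) {
        int id = static_cast<int>(forward.size());
        forward.push_back(content);
        for (const std::string& w : words) {
            std::vector<int>& ids = inverted[w];
            // Documents are added in increasing ID order, so checking back() avoids duplicates
            if (ids.empty() || ids.back() != id) ids.push_back(id);
        }
        return id;
    }

    // Keyword -> doc IDs (inverted index), then doc IDs -> contents (forward index).
    std::vector<std::string> Lookup(const std::string& word) const {
        std::vector<std::string> out;
        auto it = inverted.find(word);
        if (it == inverted.end()) return out;
        for (int id : it->second) out.push_back(forward[id]);
        return out;
    }
};
```

Lookup mirrors the flow described above. It first goes from the keyword to the document IDs through the inverted index, and then from the IDs to the contents through the forward index.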

5. Tag Removal and Data Cleaning

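We only need to index the files under Boost's doc/html/ directory.

Purpose: raw data -> remove tags -> write the results into a single text file, one document per line.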

Parser.cc

Parser.cc mainly contains three functions: EnumFile, ParseHtml and SaveHtml.

5.1 EnumFile():

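Purpose: recursively collect the path of every HTML file and store it in files_list.

Steps:

Check that the path exists
Check that the entry is a regular file (.html files are regular files)
Check that the extension is correct: it must end with .html

This uses Boost's filesystem library.

Installing the Boost development libraries: sudo yum install -y boost-devel (programs that use boost::filesystem must also link with -lboost_system -lboost_filesystem)

Recursive traversal: use Boost's recursive_directory_iterator.

Only files ending in .html are added to the list: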

iter->path().extension() == ".html"

#include <boost/filesystem.hpp>
#include <iostream>
#include <string>
#include <vector>

bool EnumFile(const std::string& src_path, std::vector<std::string>* files_list)
{
    namespace fs = boost::filesystem;
    fs::path root_path(src_path);
    // If the path does not exist, there is nothing more to do
    if (!fs::exists(root_path)) {
        std::cerr << src_path << " does not exist" << std::endl;
        return false;
    }
    // A default-constructed iterator marks the end of the recursive traversal
    fs::recursive_directory_iterator end;
    for (fs::recursive_directory_iterator iter(root_path); iter != end; ++iter) {
        // Skip anything that is not a regular file (HTML pages are regular files)
        if (!fs::is_regular_file(*iter)) {
            continue;
        }
        // It is a regular file: keep only the .html ones
        if (iter->path().extension() != ".html") {
            continue;
        }
        // std::cout << "debug: " << iter->path().string() << std::endl;
        // At this point the path is a valid regular file ending in .html
        files_list->push_back(iter->path().string());
    }
    return true;
}

5.2 ParseHtml()

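Purpose: read each file and parse its contents into a DocInfo_t.

Steps:

Read the file
Parse the file and extract the title, i.e. the text between <title> and </title>
Parse the file and extract the content
Build the URL from the file path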

typedef struct DocInfo
{
    std::string title;    // document title
    std::string content;  // document content (tags removed)
    std::string url;      // URL of the document on the official site
} DocInfo_t;

bool ParseHtml(const std::vector<std::string>& files_list, std::vector<DocInfo_t>* results)
{
    // Walk the file list and parse each file
    for (const std::string& file : files_list) {
        // 1. Read the file
        std::string result;
        if (!ns_util::FileUtil::ReadFile(file, &result)) {
            continue;
        }
        DocInfo_t doc;
        // 2. Extract the title
        if (!ParseTitle(result, &doc.title)) {
            continue;
        }
        // 3. Extract the content (this is essentially tag removal)
        if (!ParseContent(result, &doc.content)) {
            continue;
        }
        // 4. Build the URL from the file path
        if (!ParseUrl(file, &doc.url)) {
            continue;
        }
        // Parsing succeeded and doc holds all results for this file.
        // std::move lets push_back move the strings instead of copying them.
        results->push_back(std::move(doc));
    }
    return true;
}

5.2.1 Reading the File

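static bool ReadFile(const std::string& file_path, std::string* out)
{
    std::ifstream in(file_path, std::ios::in);
    if (!in.is_open()) {
        std::cerr << "open file " << file_path << " error" << std::endl;
        return false;
    }
    std::string line;
    // getline returns a reference to the stream. while() can test it because the stream
    // provides a conversion to bool, which becomes false once the end of the file is reached.
    while (std::getline(in, line)) {
        *out += line;
        // getline drops the '\n'. Put it back so that words on adjacent lines are not glued
        // together (ParseContent later turns it into a space).
        *out += '\n';
    }
    in.close();
    return true;
}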

5.2.2 Extracting the Title

The title always sits between the <title> and </title> tags, so plain string operations are enough to extract it.

// Find <title> and </title>, then take the text between them
static bool ParseTitle(const std::string& file, std::string* title)
{
    std::size_t begin = file.find("<title>");
    if (begin == std::string::npos) {
        return false;
    }
    std::size_t end = file.find("</title>");
    if (end == std::string::npos) {
        return false;
    }
    begin += std::string("<title>").size();
    if (begin > end) {
        return false;
    }
    *title = file.substr(begin, end - begin);
    return true;
}

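5.2.3 Removing Tags

Tag removal uses a simple state machine that scans the text. A '<' means we have entered a tag. Once we reach '>', that tag is finished and we are back in content. We do not want to keep the original '\n' characters, so we replace them with spaces.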

static bool ParseContent(const std::string& file, std::string* content)
{
    // Remove tags with a simple two-state machine
    enum status { LABEL, CONTENT };
    enum status s = LABEL;
    for (char c : file) {
        switch (s) {
        case LABEL:
            // Inside a tag: a '>' means the current tag has ended
            if (c == '>') s = CONTENT;
            break;
        case CONTENT:
            // A '<' means a new tag starts
            if (c == '<') {
                s = LABEL;
            } else {
                // Do not keep the original '\n': '\n' is the separator between
                // documents in the parsed output file
                if (c == '\n') c = ' ';
                content->push_back(c);
            }
            break;
        default:
            break;
        }
    }
    return true;
}

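5.2.4 Building the URL

Boost's official URLs show that the online docs and the downloaded docs share the same path layout.

Sample official URL: https://www.boost.org/doc/libs/1_78_0/doc/html/accumulators.html

The URL is url_head followed by url_tail. url_head is a fixed string. Its version segment must match the version of the docs you downloaded: this sample uses 1_78_0, and the code below uses 1_81_0.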

"https://www.boost.org/doc/libs/1_81_0/doc/html"

url_tail is the path of our local HTML file with the local docs directory prefix removed, for example /accumulators.html.

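// Build the URL: the official Boost docs and the downloaded docs share the same path layout
static bool ParseUrl(const std::string& file_path, std::string* url)
{
    const std::string url_head = "https://www.boost.org/doc/libs/1_81_0/doc/html";
    // src_path is assumed to be the global local-docs root (the same one passed to EnumFile);
    // stripping it leaves the relative tail, e.g. "/accumulators.html"
    std::string url_tail = file_path.substr(src_path.size());
    *url = url_head + url_tail;
    return true;
}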
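ParseUrl reads the global src_path, which makes it awkward to test on its own. A self-contained variant that takes the docs root as a parameter shows the mapping more directly. BuildUrl and the root "data/input" used in the check are hypothetical. The URL head is the one from the code above. Unlike ParseUrl, this sketch also rejects paths that do not start with the root.

```cpp
#include <cassert>
#include <string>

// Hypothetical variant of ParseUrl: the local docs root is a parameter instead of a global.
bool BuildUrl(const std::string& src_root, const std::string& file_path, std::string* url) {
    const std::string url_head = "https://www.boost.org/doc/libs/1_81_0/doc/html";
    // Paths outside the root would make substr() produce a meaningless tail, so reject them
    if (file_path.compare(0, src_root.size(), src_root) != 0) return false;
    // The tail keeps its leading '/', e.g. "/accumulators.html"
    *url = url_head + file_path.substr(src_root.size());
    return true;
}
```

For example, with the root data/input, the file data/input/accumulators.html maps to https://www.boost.org/doc/libs/1_81_0/doc/html/accumulators.html.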

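5.3 The SaveHtml Function

Purpose: write the parsed results to a file in a format that is also easy to read back later. We use the following format:

i.e.: title\3content\3url \n title\3content\3url \n title\3content\3url \n ...

This way, a later std::getline(ifstream, line) call reads one whole document at a time: title\3content\3url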

// Each document is written as three fields: title\3content\3url
// Documents are separated from each other by '\n'
bool SaveHtml(const std::vector<DocInfo_t>& results, const std::string& output)
{
    const char SEP = '\3';
    // Write in binary mode
    std::ofstream out(output, std::ios::out | std::ios::binary);
    if (!out.is_open()) {
        std::cerr << "open " << output << " failed" << std::endl;
        return false;
    }
    for (const auto& item : results) {
        std::string out_string;
        out_string = item.title;
        out_string += SEP;
        out_string += item.content;
        out_string += SEP;
        out_string += item.url;
        out_string += '\n';
        // Write one line (one document) to the file
        out.write(out_string.c_str(), out_string.size());
    }
    out.close();
    return true;
}
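This format is read back by the forward-index code in section 6.1. To check that the format survives a write/read cycle, here is a minimal round trip that uses only the standard library. It uses std::getline with '\3' as the delimiter, not boost::split as the project does. Doc, Serialize and Deserialize are hypothetical names. The round trip relies on the content containing no '\n' or '\3', which is why ParseContent turns newlines into spaces.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for DocInfo_t
struct Doc {
    std::string title, content, url;
};

// Writes one document per line: title\3content\3url\n
std::string Serialize(const std::vector<Doc>& docs) {
    const char SEP = '\3';
    std::string out;
    for (const Doc& d : docs) {
        out += d.title;   out += SEP;
        out += d.content; out += SEP;
        out += d.url;     out += '\n';
    }
    return out;
}

// Reads the same format back; lines that do not have all three fields are skipped
std::vector<Doc> Deserialize(const std::string& data) {
    std::vector<Doc> docs;
    std::istringstream in(data);
    std::string line;
    while (std::getline(in, line)) {            // one document per line
        std::istringstream fields(line);
        Doc d;
        if (std::getline(fields, d.title, '\3') &&
            std::getline(fields, d.content, '\3') &&
            std::getline(fields, d.url)) {      // the last field runs to the end of the line
            docs.push_back(d);
        }
    }
    return docs;
}
```

An empty content field still works, because getline counts the consumed '\3' delimiter as an extracted character.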

6. Building the Index Module (Index)

In this step we build the forward index and the inverted index.

struct DocInfo {
    std::string title;    // document title
    std::string content;  // document content after tag removal
    std::string url;      // document URL
    uint64_t doc_id;      // document ID
};

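6.1 Forward Index

The forward index is stored in an array (a vector): the array index is naturally the document ID.

std::vector<DocInfo> forward_index; // forward index

The forward index finds a document's content by its doc_id.

Steps:

Split the line into its fields
Fill the fields into a DocInfo
Insert the DocInfo into the forward-index vector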

// StringUtil: a class in namespace ns_util (requires <boost/algorithm/string.hpp>)
class StringUtil {
public:
    static void split(const std::string& target, std::vector<std::string>* out, const std::string& sep)
    {
        // boost::split; token_compress_on merges consecutive separators
        boost::split(*out, target, boost::is_any_of(sep), boost::token_compress_on);
    }
};

DocInfo* BuildForwardIndex(const std::string& line)
{
    // 1. Parse the line: split it into fields
    std::vector<std::string> results;
    const std::string sep = "\3"; // field separator within a line
    ns_util::StringUtil::split(line, &results, sep);
    if (results.size() != 3) {
        return nullptr;
    }
    // 2. Fill a DocInfo with the fields
    DocInfo doc;
    doc.title = results[0];
    doc.content = results[1];
    doc.url = results[2];
    // Save the id before inserting: it equals the doc's index in the vector
    doc.doc_id = forward_index.size();
    // 3. Insert into the forward-index vector
    forward_index.push_back(std::move(doc));
    // The returned pointer stays valid only until the next push_back (the vector may reallocate)
    return &forward_index.back();
}

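This uses boost::split. Setting its fourth argument to boost::token_compress_on makes consecutive "\3" separators count as one, so no empty tokens are produced between them.

Note: save the id first and then insert. The id is then exactly the doc's index in the vector.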

6.2 Building the Inverted Index

struct InvertedElem {
    uint64_t doc_id;
    std::string word;
    int weight; // weight (relevance)
    InvertedElem() : weight(0) {}
};

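From one document we produce one or more InvertedElem entries, which go into the inverted lists (posting lists). We process one document at a time. A document contains many words, and all of them map to the current doc_id.

Segment both the title and the content.
Define the relevance between a word and a document. Here we keep it simple and use word frequency as the weight, which is why the inverted-list element (InvertedElem) carries a weight field.
Custom relevance: keywords that appear in the title should weigh more, so weight = 10 * (occurrences in title) + (occurrences in content).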

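Installing cppjieba:

Download link: cppjieba

After downloading, upload it into the project directory with rz -E. Then create symbolic links in the current directory:

ln -s cppjieba/dict dict -- dictionaries

ln -s cppjieba/include/cppjieba/ cppjieba -- header files

Note: cppjieba has a pitfall. You must copy deps/limonp into include/cppjieba/, otherwise it will not compile.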

cp deps/limonp include/cppjieba/ -rf

Using cppjieba

#pragma once
#include <iostream>
#include <vector>
#include <string>
#include <fstream>
#include <mutex>
#include <unordered_map>
#include <boost/algorithm/string.hpp>
#include "cppjieba/Jieba.hpp"
#include "log.hpp"

namespace ns_util {
const char* const DICT_PATH = "./dict/jieba.dict.utf8";
const char* const HMM_PATH = "./dict/hmm_model.utf8";
const char* const USER_DICT_PATH = "./dict/user.dict.utf8";
const char* const IDF_PATH = "./dict/idf.utf8";
const char* const STOP_WORD_PATH = "./dict/stop_words.utf8";

class JiebaUtil {
private:
    // static cppjieba::Jieba jieba;
    cppjieba::Jieba jieba;
    std::unordered_map<std::string, bool> stop_words;

private:
    JiebaUtil() : jieba(DICT_PATH, HMM_PATH, USER_DICT_PATH, IDF_PATH, STOP_WORD_PATH) {}
    JiebaUtil(const JiebaUtil&) = delete;
    static JiebaUtil* instance;

public:
    static JiebaUtil* get_instance()
    {
        static std::mutex mtx;
        if (nullptr == instance) {
            std::lock_guard<std::mutex> lock(mtx); // unlocked automatically, even on exceptions
            if (nullptr == instance) {
                instance = new JiebaUtil();
                instance->InitJiebaUtil();
            }
        }
        return instance;
    }

    // Load the stop-word list, one word per line
    void InitJiebaUtil()
    {
        std::ifstream in(STOP_WORD_PATH);
        if (!in.is_open()) {
            LOG(FATAL, "load stop words file error");
            return;
        }
        std::string line;
        while (std::getline(in, line)) {
            stop_words.insert({line, true});
        }
        in.close();
    }

    void CutStringHelper(const std::string& src, std::vector<std::string>* out)
    {
        // Core call: segmentation in search-engine mode
        jieba.CutForSearch(src, *out);
        // Remove stop words from the result
        for (auto iter = out->begin(); iter != out->end(); ) {
            auto it = stop_words.find(*iter);
            if (it != stop_words.end()) {
                // The current string is a stop word, so erase it
                iter = out->erase(iter);
            } else {
                iter++;
            }
        }
    }

public:
    static void CutString(const std::string& src, std::vector<std::string>* out)
    {
        ns_util::JiebaUtil::get_instance()->CutStringHelper(src, out);
        // jieba.CutForSearch(src, *out);
    }
};

JiebaUtil* JiebaUtil::instance = nullptr;
// cppjieba::Jieba JiebaUtil::jieba(DICT_PATH, HMM_PATH, USER_DICT_PATH, IDF_PATH, STOP_WORD_PATH);
}

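get_instance() may be called from several threads, so creating the singleton is protected by a mutex (double-checked locking). Strictly speaking, the first, unlocked check reads instance while another thread may be writing it, and the C++11 memory model treats that as a data race. Making instance a std::atomic, or using a function-local static (whose initialization C++11 guarantees to be thread-safe), avoids the race.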
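A common alternative to double-checked locking is a function-local static. Since C++11, the language guarantees that its initialization runs exactly once, even when several threads call the function concurrently, so no explicit mutex is needed. Segmenter below is a hypothetical stand-in for JiebaUtil. In the real class, the constructor would build cppjieba::Jieba and load the stop words.

```cpp
#include <cassert>

// Hypothetical stand-in for JiebaUtil, showing the function-local static singleton pattern.
class Segmenter {
public:
    static Segmenter& Instance() {
        // Constructed once, on the first call; C++11 makes this initialization thread-safe
        static Segmenter instance;
        return instance;
    }
    Segmenter(const Segmenter&) = delete;
    Segmenter& operator=(const Segmenter&) = delete;

    bool ready() const { return ready_; }

private:
    // Hypothetical: dictionary and stop-word loading would go here
    Segmenter() : ready_(true) {}
    bool ready_;
};
```

Every call to Segmenter::Instance() returns the same object, and the object is built only when it is first needed.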

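With jieba segmentation in place, we can now write the inverted index itself.

Word Segmentation and Counting

First we segment the title and the content and store the resulting words in vectors. The purpose of segmenting them is to count how often each word occurs, so afterwards we build a map from each word to its frequencies.

A word that appears in the title is considered more important (higher weight) than one that appears in the content.

Note: searching is case-insensitive, so after segmentation we convert every word to lowercase before counting.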

bool BuildInvertedIndex(const DocInfo& doc)
{
    // DocInfo{title, content, url, doc_id}
    // Segment both the title and the content, e.g. 吃/葡萄/不吐/葡萄皮 (eat / grapes / don't spit / grape skins)
    struct word_cnt {
        int title_cnt;
        int content_cnt;
        word_cnt() : title_cnt(0), content_cnt(0) {}
    };
    std::unordered_map<std::string, word_cnt> word_map; // temporary word -> counts table

    // Segment the title
    std::vector<std::string> title_words;
    ns_util::JiebaUtil::CutString(doc.title, &title_words);
    for (std::string s : title_words) {   // copied on purpose: s is lowercased in place
        boost::to_lower(s);               // normalize to lowercase
        word_map[s].title_cnt++;          // operator[] inserts the word if it is not there yet
    }

    // Segment the content and count its words
    std::vector<std::string> content_words;
    ns_util::JiebaUtil::CutString(doc.content, &content_words);
    for (std::string s : content_words) {
        boost::to_lower(s);
        word_map[s].content_cnt++;
    }

    // Relevance: a word in the title counts 10 times as much as one in the content
    const int X = 10;
    const int Y = 1;
    for (auto& word_pair : word_map) {
        InvertedElem item;
        item.doc_id = doc.doc_id;
        item.word = word_pair.first;
        item.weight = X * word_pair.second.title_cnt + Y * word_pair.second.content_cnt;
        // inverted_index: member map from a word to its InvertedList (list of InvertedElem)
        InvertedList& inverted_list = inverted_index[word_pair.first];
        inverted_list.push_back(std::move(item));
    }
    return true;
}
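The counting and weighting logic can be checked without jieba. This sketch assumes the words are already segmented. WordWeights and ToLowerAscii are hypothetical helpers, and ToLowerAscii only folds ASCII letters: in the default "C" locale, the bytes of UTF-8 Chinese characters are left unchanged. Adding 10 per title occurrence and 1 per content occurrence gives the same result as 10 * title_cnt + content_cnt.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: lowercase ASCII letters; other bytes are unchanged in the "C" locale.
std::string ToLowerAscii(std::string s) {
    for (char& c : s) {
        // Cast to unsigned char: std::tolower is undefined for negative char values
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return s;
}

// Hypothetical helper: weight = 10 * (count in title) + 1 * (count in content), per lowercased word.
std::unordered_map<std::string, int> WordWeights(const std::vector<std::string>& title_words,
                                                 const std::vector<std::string>& content_words) {
    const int kTitleWeight = 10;
    const int kContentWeight = 1;
    std::unordered_map<std::string, int> weights;
    for (const std::string& w : title_words) weights[ToLowerAscii(w)] += kTitleWeight;
    for (const std::string& w : content_words) weights[ToLowerAscii(w)] += kContentWeight;
    return weights;
}
```

For a title "Boost Asio" and content words boost, asio, BOOST, socket, the word boost gets 10 + 2 = 12, asio gets 11 and socket gets 1.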

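7. The Search Engine Module (Searcher)

To write the Searcher, we first get or create the index object and then use it to build the indexes.

Once the forward and inverted indexes are built, users can search. We first segment the user's query and then look up each resulting word in the index, lowercasing it because the index ignores case. Next we merge the hits and sort them by relevance in descending order, so that the highest weights come first. Finally we build a JSON string from the results.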

void Search(const std::string& query, std::string* json_string)
{
    // 1. [Segment] split the query the same way the index was built
    std::vector<std::string> words;
    ns_util::JiebaUtil::CutString(query, &words);

    // 2. [Lookup] fetch the inverted list of every word; the index is lowercase, so lowercase the words too.
    // InvertedElemPrint (not shown) groups the hits of one document: doc_id, weight, words.
    // Its weight is assumed to start at 0.
    std::vector<InvertedElemPrint> inverted_list_all;
    std::unordered_map<uint64_t, InvertedElemPrint> tokens_map;
    for (std::string word : words) {
        boost::to_lower(word);
        ns_index::InvertedList* inverted_list = index->GetInvertedList(word);
        if (nullptr == inverted_list) {
            continue;
        }
        for (const auto& elem : *inverted_list) {
            // item collects every hit with the same doc_id, so each document appears only once
            auto& item = tokens_map[elem.doc_id];
            item.doc_id = elem.doc_id;
            item.weight += elem.weight;      // sum the weights of all matched words
            item.words.push_back(elem.word);
        }
    }
    for (const auto& item : tokens_map) {
        inverted_list_all.push_back(item.second);
    }

    // 3. [Sort] by relevance, highest weight first
    std::sort(inverted_list_all.begin(), inverted_list_all.end(),
              [](const InvertedElemPrint& e1, const InvertedElemPrint& e2) {
                  return e1.weight > e2.weight;
              });

    // 4. [Build] serialize the results to JSON with jsoncpp (a third-party library)
    Json::Value root;
    for (auto& item : inverted_list_all) {
        ns_index::DocInfo* doc = index->GetForwardIndex(item.doc_id);
        if (nullptr == doc) {
            continue;
        }
        Json::Value elem;
        elem["title"] = doc->title;
        // content is the whole tag-free document; we only want a snippet around the first matched word
        elem["desc"] = GetDesc(doc->content, item.words[0]);
        elem["url"] = doc->url;
        root.append(elem);
    }
    // Json::StyledWriter writer;
    Json::FastWriter writer;
    *json_string = writer.write(root);
}

During the lookup, the first step for each word is to fetch its inverted list (posting list).
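The merge step can also be looked at in isolation. This is a minimal sketch that uses simplified stand-ins for InvertedElem and InvertedElemPrint; Posting, Hit and MergeAndRank are hypothetical names. Hits for the same doc_id are combined into one entry whose weight is the sum of the matched words' weights. The entries are then sorted by weight, highest first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical simplified InvertedElem
struct Posting {
    uint64_t doc_id;
    std::string word;
    int weight;
};

// Hypothetical simplified InvertedElemPrint: one entry per document
struct Hit {
    uint64_t doc_id = 0;
    int weight = 0;
    std::vector<std::string> words;  // query words that matched this document
};

// Merges the posting lists of all query words by doc_id, then sorts by total weight (descending).
std::vector<Hit> MergeAndRank(const std::vector<std::vector<Posting>>& lists) {
    std::unordered_map<uint64_t, Hit> by_doc;
    for (const auto& list : lists) {
        for (const Posting& p : list) {
            Hit& h = by_doc[p.doc_id];
            h.doc_id = p.doc_id;
            h.weight += p.weight;        // accumulate across query words
            h.words.push_back(p.word);
        }
    }
    std::vector<Hit> hits;
    hits.reserve(by_doc.size());
    for (auto& kv : by_doc) hits.push_back(std::move(kv.second));
    std::sort(hits.begin(), hits.end(),
              [](const Hit& a, const Hit& b) { return a.weight > b.weight; });
    return hits;
}
```

Because the weights are summed, a document that matches several query words tends to rank above one that matches only one of them.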

Jsoncpp -- Serialization and Deserialization with jsoncpp

Installing jsoncpp: sudo yum install -y jsoncpp-devel

How do we use jsoncpp? Here is a short demo:

#include <iostream>
#include <string>
#include <vector>
#include <jsoncpp/json/json.h>

int main()
{
    Json::Value root;
    Json::Value item1;
    item1["key1"] = "value1";
    item1["key2"] = "value2";
    Json::Value item2;
    item2["key1"] = "value1";
    item2["key2"] = "value2";
    root.append(item1);
    root.append(item2);
    // Serialize
    Json::StyledWriter writer;
    std::string s = writer.write(root);
    std::cout << s << std::endl;
    return 0;
}

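Note: the program must be linked against the jsoncpp library, otherwise you get link-time errors.

That is, add -ljsoncpp.

The output is the serialized JSON. There is also FastWriter, which produces a more compact, single-line form.

With this groundwork done, we can build the JSON string.

One more thing to watch: content is the tag-free text of the whole document, which is not what we want to show. We only need a part of it, so it has to be processed.

Processing the Content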

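std::string GetDesc(const std::string& html_content, const std::string& word)
{
    // Find the first occurrence of word in html_content, then take up to 50 bytes before it
    // (or from the beginning) and up to 100 bytes after it (or up to the end)
    const int prev_step = 50;
    const int next_step = 100;
    // 1. Find the first occurrence, ignoring case
    //    (cast to unsigned char: std::tolower is undefined for negative values such as UTF-8 bytes)
    auto iter = std::search(html_content.begin(), html_content.end(), word.begin(), word.end(),
                            [](char x, char y) {
                                return std::tolower(static_cast<unsigned char>(x)) ==
                                       std::tolower(static_cast<unsigned char>(y));
                            });
    if (iter == html_content.end()) {
        return "None1";
    }
    int pos = std::distance(html_content.begin(), iter);
    // 2. Compute start and end. They are ints rather than std::size_t (which is unsigned),
    //    so the subtractions below cannot wrap around.
    int start = 0;
    int end = html_content.size() - 1;
    // If there are more than 50 bytes before the match, move the start forward
    if (pos > start + prev_step) start = pos - prev_step;
    if (pos < end - next_step) end = pos + next_step;
    // 3. Cut out the substring and return it
    //    (byte offsets may split a multi-byte UTF-8 character at either edge)
    if (start >= end) return "None2";
    std::string desc = html_content.substr(start, end - start);
    desc += "...";
    return desc;
}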

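8. Adding cpp-httplib

Get cpp-httplib from: cpp-httplib - Gitee.com

Note: cpp-httplib requires a fairly recent gcc. Check your version with gcc -v.

If your gcc is older, upgrade it first. See: Upgrading GCC - Linux CentOS 7

After upgrading gcc, create a symbolic link to cpp-httplib in the current directory: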

ln -s /home/Lxy/cpp-httplib-v0.7.15 cpp-httplib

#include "searcher.hpp"
#include "cpp-httplib/httplib.h"
#include "log.hpp"

const std::string input = "data/raw_html/raw.txt";
const std::string root_path = "./wwwroot";

int main()
{
    ns_searcher::Searcher search;
    search.InitSearcher(input);

    httplib::Server svr;
    svr.set_base_dir(root_path.c_str()); // serve the static front-end files from ./wwwroot
    svr.Get("/s", [&search](const httplib::Request& req, httplib::Response& rsp) {
        if (!req.has_param("word")) {
            rsp.set_content("A search keyword is required!", "text/plain; charset=utf-8");
            return;
        }
        std::string word = req.get_param_value("word");
        // std::cout << "user is searching: " << word << std::endl;
        LOG(NORMAL, "user searched: " + word);
        std::string json_string;
        search.Search(word, &json_string);
        rsp.set_content(json_string, "application/json");
        // rsp.set_content("Hello, world!", "text/plain; charset=utf-8");
    });
    LOG(NORMAL, "server started....");
    svr.listen("0.0.0.0", 8081);
    return 0;
}

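9. Writing the Web Front End

Background: HTML, CSS and JS

HTML: the skeleton of a web page, responsible for its structure

CSS: the skin of a web page, responsible for its appearance

JS (JavaScript): the soul of a web page, responsible for dynamic behavior and for communication between front end and back end

Tutorial: w3school online tutorials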

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <script src="http://code.jquery.com/jquery-2.1.1.min.js"></script>
    <title>Boost Search Engine</title>
    <style>
        /* Remove all default margins and padding (HTML box model) */
        * {
            /* outer margin */
            margin: 0;
            /* inner padding */
            padding: 0;
        }
        /* Make body fill 100% of the html element's height */
        html,
        body {
            height: 100%;
        }
        /* Class selector .container */
        .container {
            /* width of the div */
            width: 800px;
            /* auto left/right margins center the div horizontally */
            margin: 0px auto;
            /* top margin: keep some distance from the top of the page */
            margin-top: 15px;
        }
        /* Compound selector: .search inside .container */
        .container .search {
            /* same width as the parent element */
            width: 100%;
            /* height of 52px */
            height: 52px;
        }
        /* Select the input element and set its properties (input: element selector) */
        /* Note: the height set on the input does not include its border */
        .container .search input {
            /* float left */
            float: left;
            width: 600px;
            height: 50px;
            /* border: width, style, color */
            border: 1px solid black;
            /* remove the input's right border */
            border-right: none;
            /* left padding so the text does not touch the left border */
            padding-left: 10px;
            /* color and size of the text inside the input */
            color: #CCC;
            font-size: 14px;
        }
        /* Select the button element and set its properties (button: element selector) */
        .container .search button {
            /* float left */
            float: left;
            width: 150px;
            height: 52px;
            /* button background color #4e6ef2 */
            background-color: #4e6ef2;
            /* color of the button text */
            color: #FFF;
            /* font size */
            font-size: 19px;
            font-family: Georgia, 'Times New Roman', Times, serif;
        }
        .container .result {
            width: 100%;
        }
        .container .result .item {
            margin-top: 15px;
        }
        .container .result .item a {
            /* block element: occupies a line of its own */
            display: block;
            /* remove the link underline */
            text-decoration: none;
            /* font size of the link text */
            font-size: 20px;
            /* font color */
            color: #4e6ef2;
        }
        .container .result .item a:hover {
            text-decoration: underline;
        }
        .container .result .item p {
            margin-top: 5px;
            font-size: 16px;
            font-family: 'Lucida Sans', 'Lucida Sans Regular', 'Lucida Grande', 'Lucida Sans Unicode', Geneva, Verdana, sans-serif;
        }
        .container .result .item i {
            /* block element: occupies a line of its own */
            display: block;
            /* cancel the italic style */
            font-style: normal;
            color: green;
        }
    </style>
</head>
<body>
    <div class="container">
        <div class="search">
            <input type="text" value="Enter a search keyword">
            <button onclick="Search()">Search</button>
        </div>
        <div class="result">
            <!-- Results are generated dynamically; each one has this shape: -->
            <!--
            <div class="item">
                <a href="#">This is the title</a>
                <p>This is the summary...</p>
                <i>https://search.gitee.com/?skin=rec&type=repository&q=cpp-httplib</i>
            </div>
            -->
        </div>
    </div>
    <script>
        function Search() {
            // alert() shows a browser pop-up:
            // alert("hello js!");
            // 1. Read the query; $ is shorthand for jQuery
            let query = $(".container .search input").val();
            console.log("query = " + query); // console: the browser's developer console, handy for inspecting JS data
            // 2. Send the HTTP request; $.ajax is jQuery's function for exchanging data with the back end
            $.ajax({
                type: "GET",
                // encodeURIComponent escapes characters such as &, # and + that would break the query string
                url: "/s?word=" + encodeURIComponent(query),
                success: function (data) {
                    console.log(data);
                    BuildHtml(data);
                }
            });
        }

        function BuildHtml(data) {
            // Get the result container
            let result_label = $(".container .result");
            // Clear the previous search results
            result_label.empty();
            // With no matches the server sends null (an empty jsoncpp value serializes as null)
            if (!data) {
                return;
            }
            for (let elem of data) {
                // console.log(elem.title);
                // console.log(elem.url);
                let a_label = $("<a>", {
                    text: elem.title,
                    href: elem.url,
                    // open the link in a new page
                    target: "_blank"
                });
                let p_label = $("<p>", {
                    text: elem.desc
                });
                let i_label = $("<i>", {
                    text: elem.url
                });
                let div_label = $("<div>", {
                    class: "item"
                });
                a_label.appendTo(div_label);
                p_label.appendTo(div_label);
                i_label.appendTo(div_label);
                div_label.appendTo(result_label);
            }
        }
    </script>
</body>
</html>

10. Writing the Project Log

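#pragma once
#include <iostream>
#include <string>
#include <ctime>

#define NORMAL 1
#define WARNING 2
#define DEBUG 3
#define FATAL 4

// #LEVEL turns the level name (e.g. NORMAL) into the string "NORMAL"
#define LOG(LEVEL, MESSAGE) log(#LEVEL, MESSAGE, __FILE__, __LINE__)

// #pragma once stops a redefinition when this header is reached twice in one file
// (e.g. directly and via searcher.hpp); inline allows it to be included from several .cc files
inline void log(const std::string& level, const std::string& message, const std::string& file, int line)
{
    std::cout << "[" << level << "]" << "[" << time(nullptr) << "]" << "[" << message << "]"
              << "[" << file << "]" << "[" << line << "]" << std::endl;
}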

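Deploying the project on a Linux server. Run it in the background with nohup, sending stdout and stderr to log/log.txt: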

nohup ./http_server > log/log.txt 2>&1 &

11. Project Testing

Project result: Boost Search Engine

Source code: project-boost-search-engine
