自拍偷在线精品自拍偷,亚洲欧美中文日韩v在线观看不卡

<sub id="mxm5d"><rt id="mxm5d"><form id="mxm5d"></form></rt></sub>

AI.x社區(qū)

軟考社區(qū)

企業(yè)培訓(xùn)

鴻蒙開發(fā)者社區(qū)

WOT技術(shù)大會

公眾號矩陣

移動端

視頻課免費課排行榜短視頻直播課軟考學(xué)堂

全部課程軟考華為認(rèn)證廠商認(rèn)證 IT技術(shù)PMP項目管理免費題庫

在線學(xué)習(xí)

文章資源問答課堂專欄直播

51CTO

鴻蒙開發(fā)者社區(qū)

51CTO技術(shù)棧

51CTO官微

51CTO學(xué)堂

51CTO博客

CTO訓(xùn)練營

鴻蒙開發(fā)者社區(qū)訂閱號

51CTO軟考

51CTO學(xué)堂APP

51CTO學(xué)堂企業(yè)版APP

鴻蒙開發(fā)者社區(qū)視頻號

51CTO軟考題庫

AI.x社區(qū)

登錄/注冊
51CTO

中國優(yōu)質(zhì)的IT技術(shù)網(wǎng)站

51CTO博客

專業(yè)IT技術(shù)創(chuàng)作平臺

51CTO學(xué)堂

IT職業(yè)在線教育平臺

Advanced RAG 08: Building a High-Quality, Traceable RAG System with Self-RAG

Published 2024-6-19 12:24

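Editor's note: RAG retrieves and uses external knowledge sources, which can effectively improve the accuracy and diversity of generated content. The classic RAG pipeline has shortcomings, however: unnecessary retrieval wastes compute and may introduce irrelevant or incorrect information that degrades generation quality.
This article introduces Self-RAG. By introducing reflection tokens, the language model can decide dynamically, based on the task at hand, whether to retrieve external knowledge, which greatly reduces unnecessary retrieval. Through a special training procedure, Self-RAG also makes the generated content fluent, consistent with factual knowledge, and traceable to its knowledge sources.
Self-RAG's training process is relatively complex, and the special mechanisms used during generation add some inference cost. The author also offers suggestions for optimizing Self-RAG, such as simplifying the reflection-token design and studying the effect of model size, which point to directions for further work.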

Author | Florian June

Translator | 岳揚 (Yue Yang)

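This article starts from a familiar scenario: taking an open-book exam. There are usually two strategies for answering:

Method 1: For familiar questions, answer directly. For unfamiliar ones, quickly look through the reference book, find the relevant part, organize and summarize it mentally, and then write the answer.
Method 2: Consult the book for every question. Find the relevant part, integrate and summarize it mentally, and then write the answer.

Method 1 is clearly the preferred approach. Method 2 is time-consuming and may also introduce irrelevant or incorrect information, causing confusion and mistakes even in areas the examinee knows well.

Yet Method 2 is the classic RAG pipeline (retrieve -> integrate -> generate)^[1], while Method 1 corresponds to the Self-RAG pipeline^[2], which this article explores in more detail.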

01 Overview

Figure 1 compares the main flows of RAG and Self-RAG^[2]:

Figure 1: Overview of Self-RAG. Self-RAG (right) learns to retrieve, critique, and generate, so that the generated text is fluent, consistent with factual knowledge, and traceable to the original sources. Source: https://arxiv.org/pdf/2310.11511.pdf

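Self-RAG consists of three main steps:

1. Retrieve on demand: When the model needs to retrieve information, for example for the query "How did US states get their names?" (top right of Figure 1), its output contains a [Retrieve] token, indicating that content related to the query should be retrieved. Conversely, when asked to "Write an essay of your best summer vacation" (bottom right of Figure 1), the model generates the answer directly without retrieval.
2. Generate in parallel: The model generates an output from the prompt together with each retrieved passage. During this process, three types of reflection tokens (translator's note: tokens that indicate which action the model should take during generation and how well it went, such as the [Retrieve] token above) assess the retrieved content, for example how relevant it is.
3. Evaluate and select: The content generated in step 2 is evaluated, and the best segment is selected as the output.

Note that the model described above is specially trained. The training process is discussed later in this article.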
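The three steps above can be sketched as a small control loop. This is only an illustrative sketch, not the paper's or any library's implementation: `self_rag_answer`, `toy_generate`, `toy_retrieve`, and `toy_score` are hypothetical names, the model is replaced by keyword rules, and the `[Retrieval]` marker string is borrowed from the LlamaIndex code shown later in this article.

```python
from typing import Callable, List, Tuple


def self_rag_answer(
    query: str,
    generate: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    score: Callable[[str], float],
) -> Tuple[str, bool]:
    """Return (answer, whether retrieval was used)."""
    # Step 1: let the model decide; a "[Retrieval]" marker means it wants evidence.
    draft = generate(query)
    if "[Retrieval]" not in draft:
        return draft, False
    # Step 2: generate one candidate per retrieved passage (sequentially here).
    candidates = [
        generate(f"{query}\n<paragraph>{passage}</paragraph>")
        for passage in retrieve(query)
    ]
    if not candidates:
        return draft, False
    # Step 3: keep the candidate with the highest critique score.
    return max(candidates, key=score), True


# Toy stand-ins (assumptions for illustration only).
def toy_generate(prompt: str) -> str:
    if "<paragraph>" in prompt:
        passage = prompt.split("<paragraph>")[1].split("</paragraph>")[0]
        return f"Based on: {passage}"
    return "[Retrieval]" if "penguin" in prompt else "A romantic novel."


def toy_retrieve(query: str) -> List[str]:
    return ["Penguins eat fish.", "The Little Blue Penguin is about 40 cm tall."]


def toy_score(candidate: str) -> float:
    return 1.0 if "40 cm" in candidate else 0.0


print(self_rag_answer("Which genre is Pride and Prejudice?", toy_generate, toy_retrieve, toy_score))
print(self_rag_answer("How tall is the smallest penguin?", toy_generate, toy_retrieve, toy_score))
```

In a real system, `generate` would call the trained Self-RAG model and `score` would be derived from the reflection tokens it emits.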

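02 Introduction to Reflection Tokens

As Figure 2 shows, the Self-RAG framework differs from RAG in that it uses reflection tokens for finer-grained control during generation.

Figure 2: The four types of reflection tokens used in Self-RAG. Each type uses several tokens to represent its output values. The bottom three rows are the three types of critique tokens, and bold text marks the most desirable critique tokens. x, y, and d denote the input, the output, and a relevant passage, respectively. Source: Self-RAG^[2]

In general, Self-RAG makes four kinds of judgments:

[Retrieve]: a decision step that determines whether to retrieve additional information from the resource R.
[IsREL]: a relevance check that determines whether the given data d contains the information needed to solve problem x.
[IsSUP]: a verification step that checks whether the statements in the generated response y are supported by the data d.
[IsUSE]: an assessment of how useful the response y is for problem x, output as a score from 1 to 5, where 5 means most useful.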
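To make the four judgments concrete, here is a minimal sketch that pulls reflection tokens out of a generated string. The token spellings `[Relevant]`, `[Fully supported]`, `[No support / Contradictory]`, `[Utility:5]`, and `[Retrieval]` appear in the sample output later in this article. The remaining spellings (`[No Retrieval]`, `[Irrelevant]`, `[Partially supported]`) are assumptions, and `Critique` and `parse_reflection_tokens` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Surface forms of the reflection tokens; some spellings are assumed (see above).
_TOKEN = re.compile(
    r"\[(Retrieval|No Retrieval|Relevant|Irrelevant|Fully supported|"
    r"Partially supported|No support / Contradictory|Utility:[1-5])\]"
)


@dataclass
class Critique:
    text: str                       # the answer with reflection tokens stripped
    retrieve: Optional[bool] = None  # [Retrieve] decision
    relevant: Optional[bool] = None  # [IsREL]
    support: Optional[str] = None    # [IsSUP]
    utility: Optional[int] = None    # [IsUSE], 1 to 5


def parse_reflection_tokens(output: str) -> Critique:
    """Split a model output into plain text and its reflection-token values."""
    result = Critique(text=_TOKEN.sub("", output).strip())
    for token in _TOKEN.findall(output):
        if token in ("Retrieval", "No Retrieval"):
            result.retrieve = token == "Retrieval"
        elif token in ("Relevant", "Irrelevant"):
            result.relevant = token == "Relevant"
        elif token.startswith("Utility:"):
            result.utility = int(token.split(":")[1])
        else:
            result.support = token
    return result


sample = "[Relevant]The smallest penguin species is the Little Blue Penguin.[Fully supported][Utility:5]"
print(parse_reflection_tokens(sample))
```

A field left as `None` simply means the corresponding token type did not occur in the string.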

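In a RAG system, retrieval is a fixed step that always runs, regardless of conditions. Self-RAG, by contrast, introduces reflection tokens that make the LLM more adaptable and intelligent. During text generation, when the LLM reaches a point of uncertainty that needs additional information, it emits a reflection token and pauses generation. The system then performs a quick, targeted retrieval, and the LLM resumes generation using the newly acquired information.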

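03 Code Explanation: Understanding Self-RAG Through Code

To understand the Self-RAG process intuitively, we first examine the code and then discuss the model's training process in detail.

Self-RAG is open source^[3], and the well-known open-source Python libraries LangChain^[4] and LlamaIndex have each implemented it. This article uses the LlamaIndex implementation of Self-RAG^[5] as its reference.

3.1 Environment Configuration

First, set up the environment.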

(base) Florian@instance-1:~$ conda create -n llamaindex python=3.11

(base) Florian@instance-1:~$ conda activate llamaindex


(llamaindex) Florian@instance-1:~$ pip install llama-index

(llamaindex) Florian@instance-1:~$ pip install huggingface-hub

(llamaindex) Florian@instance-1:~$ huggingface-cli login

After installation, the LlamaIndex versions are as follows. Please confirm them:

llama-index                             0.10.20

llama-index-core                        0.10.20.post2

Download the Llama2-7B model provided by the paper. The model file is about 4.08 GB.

(llamaindex) Florian@instance-1:~$ huggingface-cli download m4r1/selfrag_llama2_7b-GGUF selfrag_llama2_7b.q4_k_m.gguf --local-dir "YOUR_DOWNLOAD_MODEL_DIR" --local-dir-use-symlinks False

(llamaindex) Florian@instance-1:~$ ls "YOUR_DOWNLOAD_MODEL_DIR"
selfrag_llama2_7b.q4_k_m.gguf
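Before loading the model, it can be worth checking that the download actually produced a GGUF file rather than, say, an error page. The sketch below relies on the fact that GGUF files begin with the four ASCII bytes `GGUF`. The helper name `looks_like_gguf` is hypothetical, and `YOUR_DOWNLOAD_MODEL_DIR` is the same placeholder used in the commands above.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # GGUF files start with these four ASCII bytes


def looks_like_gguf(path: Path) -> bool:
    """Return True if the file exists and starts with the GGUF magic bytes."""
    if not path.is_file():
        return False
    with path.open("rb") as fh:
        return fh.read(4) == GGUF_MAGIC


model_path = Path("YOUR_DOWNLOAD_MODEL_DIR") / "selfrag_llama2_7b.q4_k_m.gguf"
if looks_like_gguf(model_path):
    print(f"OK: {model_path} ({model_path.stat().st_size / 1e9:.2f} GB)")
else:
    print(f"Missing or not a GGUF file: {model_path}")
```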

3.2 Test Code

The test code is as follows. The first run needs to download SelfRAGPack^[5].

import os
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"

from llama_index.core import Document, VectorStoreIndex
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.readers import SimpleDirectoryReader
from pathlib import Path


# Optional: download SelfRAGPack.
# The first execution requires downloading SelfRAGPack;
# subsequent executions can comment this out.
from llama_index.core.llama_pack import download_llama_pack

download_llama_pack("SelfRAGPack", "./self_rag_pack")

from llama_index.packs.self_rag import SelfRAGQueryEngine

# The directory where the Llama2 model was previously downloaded and saved.
download_dir = "YOUR_DOWNLOAD_MODEL_DIR"

# Create testing documents
documents = [
    Document(
        text="A group of penguins, known as a 'waddle' on land, shuffled across the Antarctic ice, their tuxedo-like plumage standing out against the snow."
    ),
    Document(
        text="Emperor penguins, the tallest of all penguin species, can dive deeper than any other bird, reaching depths of over 500 meters."
    ),
    Document(
        text="Penguins' black and white coloring is a form of camouflage called countershading; from above, their black back blends with the ocean depths, and from below, their white belly matches the bright surface."
    ),
    Document(
        text="Despite their upright stance, penguins are birds that cannot fly; their wings have evolved into flippers, making them expert swimmers."
    ),
    Document(
        text="The fastest species, the Gentoo penguin, can swim up to 36 kilometers per hour, using their flippers and streamlined bodies to slice through the water."
    ),
    Document(
        text="Penguins are social birds; many species form large colonies for breeding, which can number in the tens of thousands."
    ),
    Document(
        text="Intriguingly, penguins have excellent hearing and rely on distinct calls to identify their mates and chicks amidst the noisy colonies."
    ),
    Document(
        text="The smallest penguin species, the Little Blue Penguin, stands just about 40 cm tall and is found along the coastlines of southern Australia and New Zealand."
    ),
    Document(
        text="During the breeding season, male Emperor penguins endure the harsh Antarctic winter for months, fasting and incubating their eggs, while females hunt at sea."
    ),
    Document(
        text="Penguins consume a variety of seafood; their diet mainly consists of fish, squid, and krill, which they catch on their diving expeditions."
    ),
]

index = VectorStoreIndex.from_documents(documents)

# Setup a simple retriever
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=10,
)


model_path = Path(download_dir) / "selfrag_llama2_7b.q4_k_m.gguf"
query_engine = SelfRAGQueryEngine(str(model_path), retriever, verbose=True)

# No-retrieval example
response = query_engine.query("Which genre the book pride and prejudice?")

# Retrieval example
response = query_engine.query("How tall is the smallest penguins?")

The test code produces the following output (most llama_cpp debug messages have been removed):

...
...
Model metadata: {'tokenizer.ggml.add_eos_token': 'false', 'tokenizer.ggml.eos_token_id': '2', 'general.architecture': 'llama', 'llama.rope.freq_base': '10000.000000', 'llama.context_length': '4096', 'general.name': 'LLaMA v2', 'tokenizer.ggml.add_bos_token': 'true', 'llama.embedding_length': '4096', 'llama.feed_forward_length': '11008', 'llama.attention.layer_norm_rms_epsilon': '0.000010', 'llama.rope.dimension_count': '128', 'tokenizer.ggml.bos_token_id': '1', 'llama.attention.head_count': '32', 'llama.block_count': '32', 'llama.attention.head_count_kv': '32', 'general.quantization_version': '2', 'tokenizer.ggml.model': 'llama', 'general.file_type': '15'}
Using fallback chat format: None

llama_print_timings:        load time = 4887.53 ms
llama_print_timings:      sample time = 11.29 ms / 22 runs   ( 0.51 ms per token, 1947.76 tokens per second)
llama_print_timings: prompt eval time = 4887.46 ms / 24 tokens ( 203.64 ms per token, 4.91 tokens per second)
llama_print_timings: eval time = 5883.27 ms / 21 runs   ( 280.16 ms per token, 3.57 tokens per second)
llama_print_timings:       total time = 10901.84 ms / 45 tokens
Final answer: The book "Pride and Prejudice" is a romantic novel by Jane Austen.
...
...
llama_print_timings:        load time = 4887.53 ms
llama_print_timings:      sample time = 11.74 ms / 20 runs   ( 0.59 ms per token, 1703.29 tokens per second)
llama_print_timings: prompt eval time = 7473.66 ms / 37 tokens ( 201.99 ms per token, 4.95 tokens per second)
llama_print_timings: eval time = 5414.34 ms / 19 runs   ( 284.96 ms per token, 3.51 tokens per second)
llama_print_timings:       total time = 13076.88 ms / 56 tokens
Input: ### Instruction:
How tall is the smallest penguins?

### Response:
[Retrieval]<paragraph>Penguins consume a variety of seafood; their diet mainly consists of fish, squid, and krill, which they catch on their diving expeditions.</paragraph>
Prediction: [Relevant]The height of the smallest penguin species can vary depending on the species.[No support / Contradictory][Utility:5]
Score: 1.4213598342974367
10/10 paragraphs done

End evaluation
Selected the best answer: [Relevant]The smallest penguin species is the Little Blue Penguin (also known as the Fairy Penguin), which can grow to be around 40 centimeters (16 inches) in height.[Fully supported][Utility:5]
Final answer: The smallest penguin species is the Little Blue Penguin (also known as the Fairy Penguin), which can grow to be around 40 centimeters (16 inches) in height.

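We can see that the first query requires no retrieval, while the second query triggered retrieval and an evaluation of the generated outputs.

The key to understanding the test code is the implementation of the class SelfRAGQueryEngine^[6], so let's take a closer look at it.

3.3 The SelfRAGQueryEngine Class

First, the constructor^[7], which mainly loads the Llama2-7B model using llama_cpp.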

class SelfRAGQueryEngine(CustomQueryEngine):
    """Simple short form self RAG query engine."""

    llm: Any = Field(default=None, description="llm")
    retriever: BaseRetriever = Field(default=None, description="retriever")
    generate_kwargs: Dict = Field(default=None, description="llm generation arguments")
    verbose: bool = Field(default=True, description="Verbose.")

    def __init__(
        self,
        model_path: str,
        retriever: BaseRetriever,
        verbose: bool = False,
        model_kwargs: Dict = None,
        generate_kwargs: Dict = None,
        **kwargs: Any,
    ) -> None:
        """Init params."""
        super().__init__(verbose=verbose, **kwargs)
        model_kwargs = model_kwargs or _MODEL_KWARGS
        self.generate_kwargs = generate_kwargs or _GENERATE_KWARGS
        try:
            from llama_cpp import Llama
        except ImportError:
            raise ImportError(_IMPORT_ERROR_MSG)
        self.llm = Llama(model_path=model_path, verbose=verbose, **model_kwargs)
        self.retriever = retriever

Next comes the function that handles a query^[8]. Its main flow is shown in Figure 3:

Figure 3: Main flow of the query-handling function. Image by the author.

The key parts are commented to make them easier to follow.

    def custom_query(self, query_str: str) -> Response:
        """Run self-RAG."""
        # Obtain responses using the Llama2 model.
        response = self.llm(prompt=_format_prompt(query_str), **_GENERATE_KWARGS)
        answer = response["choices"][0]["text"]
        source_nodes = []

        # Determine if a retrieval is necessary.
        if "[Retrieval]" in answer:
            if self.verbose:
                print_text("Retrieval required\n", color="blue")
            # Step 1 of Figure 1: retrieve as needed.
            documents = self.retriever.retrieve(query_str)
            if self.verbose:
                print_text(f"Received: {len(documents)} documents\n", color="blue")
            paragraphs = [
                _format_prompt(query_str, document.node.text) for document in documents
            ]

            if self.verbose:
                print_text("Start evaluation\n", color="blue")

            # Steps 2 and 3 of Figure 1: generate in parallel and evaluate
            # (the code does not implement parallelism).
            critic_output = self._run_critic(paragraphs)

            paragraphs_final_score = critic_output.paragraphs_final_score
            llm_response_per_paragraph = critic_output.llm_response_per_paragraph
            source_nodes = critic_output.source_nodes

            if self.verbose:
                print_text("End evaluation\n", color="blue")

            # Select the paragraph with the highest score and return it.
            best_paragraph_id = max(
                paragraphs_final_score, key=paragraphs_final_score.get
            )
            answer = llm_response_per_paragraph[best_paragraph_id]
            if self.verbose:
                print_text(f"Selected the best answer: {answer}\n", color="blue")

        answer = _postprocess_answer(answer)
        if self.verbose:
            print_text(f"Final answer: {answer}\n", color="green")
        return Response(response=str(answer), source_nodes=source_nodes)

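The code reflects all three steps from Figure 1. However, LlamaIndex's Self-RAG implementation does not run the per-passage generation in parallel. Interested readers can look at the self._run_critic function, which also computes the scores corresponding to the various reflection tokens.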
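As a rough illustration of how reflection-token probabilities can be turned into a single per-passage score, consider the sketch below. It is not a copy of `_run_critic`: the function names are hypothetical, and the weights (0.5 for partial support, the -1 to 1 utility scale, and 0.5 for the utility term) are assumptions chosen for illustration. The values actually used are in the source file referenced at [6].

```python
import math
from typing import Dict


def normalized(log_probs: Dict[str, float]) -> Dict[str, float]:
    """Turn log-probabilities of competing tokens into probabilities summing to 1."""
    probs = {token: math.exp(lp) for token, lp in log_probs.items()}
    total = sum(probs.values())
    return {token: p / total for token, p in probs.items()}


def relevance_score(log_probs: Dict[str, float]) -> float:
    """Share of probability mass on [Relevant] versus [Irrelevant]."""
    return normalized(log_probs)["[Relevant]"]


def support_score(log_probs: Dict[str, float]) -> float:
    """Full support counts fully, partial support counts half (assumed weight)."""
    p = normalized(log_probs)
    return p["[Fully supported]"] + 0.5 * p["[Partially supported]"]


def utility_score(log_probs: Dict[str, float]) -> float:
    """Map utility levels 1..5 onto -1..1 and take the expectation (assumed scale)."""
    p = normalized(log_probs)
    weights = {1: -1.0, 2: -0.5, 3: 0.0, 4: 0.5, 5: 1.0}
    return sum(w * p[f"[Utility:{level}]"] for level, w in weights.items())


def final_score(rel: float, sup: float, use: float, w_use: float = 0.5) -> float:
    """Weighted sum of the three critique scores (w_use is an assumed weight)."""
    return rel + sup + w_use * use


rel = relevance_score({"[Relevant]": math.log(0.9), "[Irrelevant]": math.log(0.1)})
sup = support_score({
    "[Fully supported]": math.log(0.6),
    "[Partially supported]": math.log(0.2),
    "[No support / Contradictory]": math.log(0.2),
})
use = utility_score({f"[Utility:{k}]": math.log(0.2) for k in range(1, 6)})
print(round(final_score(rel, sup, use), 3))
```

Whatever the exact weights, the passage whose candidate answer receives the highest combined score is the one selected as the final answer.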

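04 How to Train the Llama2-7B Model

We have used the Llama2-7B model several times above. This section looks at how it is obtained and trained.

4.1 Training Objective

Enable the language model to generate text that contains reflection tokens.

4.2 Two Models

Training involves two models: a critic model C and a generator model M. The critic model C mainly produces the labeled supervision data that M needs.

At inference time, however, only model M is used. The critic model C is not needed.

4.3 Critic Model C

The critic model is trained to generate reflection tokens. It is used to insert reflection tokens into task outputs offline, thereby updating the training corpus.

Manually annotating reflection tokens for every segment is very expensive. Self-RAG instead uses GPT-4 to do the labeling efficiently: because each type of reflection token has its own definition, input, and output, each type is given its own instruction prompt. For example, the instruction for the [Retrieve] token asks the model to judge whether retrieving external documents is needed.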

With the training data D_critic in hand, the critic can be trained with a standard conditional language modeling objective, which the Self-RAG paper^[2] gives as:

$$\max_{\mathcal{C}} \; \mathbb{E}_{((x, y),\, r) \sim \mathcal{D}_{critic}} \log p_{\mathcal{C}}(r \mid x, y)$$

where r denotes the reflection tokens.

The critic model C can be initialized from any pre-trained language model and then fine-tuned. For example, it can be initialized from the same pre-trained model as the generator, such as Llama 2 7B.

4.4 Generator Model M

Figure 4 shows how the supervision data (labeled data for supervised learning) used to train the generator is collected in the Self-RAG framework. Given an input-output pair (x, y), Self-RAG uses the retriever and the critic model to annotate the original output y, producing the supervision data. For each segment y_t ∈ y:

Figure 4: Collecting training data for the generator. Every conditional decision in the figure is made by the critic model C. Image by the author, inspired by Section 3.2.2 of Self-RAG^[2].

Note that every conditional decision in Figure 4 is made by the critic model C. The resulting training data is shown in Figure 5:

Figure 5: Self-RAG training examples. The example on the left requires no external retrieval, while the one on the right does, so the relevant passage is inserted. Source: Self-RAG^[2].
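The per-segment annotation procedure can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the `Critic` interface, `KeywordCritic`, `augment_output`, and `toy_retrieve` are hypothetical, keyword rules stand in for the trained critic model C, only the first relevant passage is kept per segment, and the `<p>...</p>` wrapper and token spellings are assumed surface forms.

```python
from typing import Callable, List, Protocol


class Critic(Protocol):
    """Judgments made by critic model C (hypothetical interface)."""
    def needs_retrieval(self, x: str, segment: str) -> bool: ...
    def is_relevant(self, x: str, passage: str) -> bool: ...
    def support(self, x: str, segment: str, passage: str) -> str: ...
    def utility(self, x: str, y: str) -> int: ...


def augment_output(
    x: str,
    segments: List[str],
    critic: Critic,
    retrieve: Callable[[str, str], List[str]],
) -> str:
    """Interleave reflection tokens and passages with the original output segments."""
    parts = []
    for segment in segments:
        if not critic.needs_retrieval(x, segment):
            parts.append(f"[No Retrieval]{segment}")
            continue
        passages = retrieve(x, segment)
        relevant = [p for p in passages if critic.is_relevant(x, p)]
        if relevant:
            passage = relevant[0]  # simplification: keep one relevant passage
            label = critic.support(x, segment, passage)
            parts.append(f"[Retrieval]<p>{passage}</p>[Relevant]{segment}[{label}]")
        elif passages:
            parts.append(f"[Retrieval]<p>{passages[0]}</p>[Irrelevant]{segment}")
        else:
            parts.append(f"[No Retrieval]{segment}")
    # The overall utility judgment is appended once, at the end of the output.
    parts.append(f"[Utility:{critic.utility(x, ''.join(segments))}]")
    return "".join(parts)


class KeywordCritic:
    """Toy critic based on keyword rules, for illustration only."""
    def needs_retrieval(self, x, segment):
        return any(ch.isdigit() for ch in segment)

    def is_relevant(self, x, passage):
        return "penguin" in passage.lower()

    def support(self, x, segment, passage):
        return "Fully supported" if "40" in passage else "No support / Contradictory"

    def utility(self, x, y):
        return 5


def toy_retrieve(x: str, segment: str) -> List[str]:
    return ["The Little Blue Penguin stands about 40 cm tall."]


x = "How tall is the smallest penguin?"
segments = ["Penguins are birds. ", "The Little Blue Penguin is about 40 cm tall."]
print(augment_output(x, segments, KeywordCritic(), toy_retrieve))
```

The resulting string has the same shape as the training examples in Figure 5: segments that need no evidence are left as they are, while the others are preceded by a retrieved passage and followed by critique tokens.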

With the training data D_gen in hand, the generator is trained with the standard next-token prediction objective, which the Self-RAG paper^[2] gives as:

$$\max_{\mathcal{M}} \; \mathbb{E}_{(x, y, r) \sim \mathcal{D}_{gen}} \log p_{\mathcal{M}}(y, r \mid x)$$

The generator model M must predict not only the output text but also the reflection tokens.
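One practical detail, as I understand the Self-RAG paper, is that retrieved passages are inserted into the training sequence but masked out of the loss, so the generator learns to predict its own text and the reflection tokens rather than the passage content. The sketch below illustrates that idea on string tokens. The function names, the `<p>`/`</p>` delimiters, and the toy log-probabilities are assumptions for illustration.

```python
import math
from typing import List, Sequence


def loss_mask(tokens: Sequence[str]) -> List[int]:
    """1 for tokens that contribute to the loss, 0 for retrieved passage text."""
    mask, inside = [], False
    for token in tokens:
        if token == "<p>":
            inside = True
        mask.append(0 if inside else 1)
        if token == "</p>":
            inside = False
    return mask


def masked_nll(token_log_probs: Sequence[float], mask: Sequence[int]) -> float:
    """Average negative log-likelihood over the unmasked positions."""
    kept = [-lp for lp, m in zip(token_log_probs, mask) if m]
    return sum(kept) / len(kept) if kept else 0.0


tokens = [
    "[Retrieval]", "<p>", "Little", "Blue", "</p>",
    "[Relevant]", "40", "cm", "[Fully supported]", "[Utility:5]",
]
mask = loss_mask(tokens)
print(mask)
# Pretend the model assigns probability 0.5 to every target token.
print(masked_nll([math.log(0.5)] * len(tokens), mask))
```

Note that the reflection tokens stay unmasked: they are ordinary vocabulary items from the generator's point of view.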

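05 The Author's Insights and Thoughts on Self-RAG

Overall, Self-RAG offers a fresh perspective on enhancing RAG. However, it requires a more complex training process, and it adds special mechanisms to the generation phase: the model produces not only the target output but also several types of reflection tokens, and it performs several judgments based on them during generation. This inevitably increases inference cost and may seriously affect projects with strict real-time requirements.

The framework also leaves considerable room for optimization. To stimulate further discussion and innovation, here are some suggestions:

How to optimize the reflection tokens. Self-RAG defines four types of reflection tokens. Apart from [Retrieve], the other three ([IsREL], [IsSUP], [IsUSE]) are somewhat similar. Using fewer reflection tokens, or designing reflection tokens that carry other semantics, may be a direction worth exploring.
Why does the critic model use a large language model (LLM)? I think this may be because tokens such as [IsUSE] rely heavily on common-sense knowledge. Judging whether an answer is useful is a task that a smaller model might also handle. However, smaller models usually learn only from specific training data and lack broad knowledge, so using an LLM as the critic model makes sense.
Choice of critic model size. Self-RAG has been tested on 7B and 13B models with excellent results. But what differences would we observe with a smaller LLM, such as 3B? And how much improvement could we expect from a larger LLM, such as 33B?
Why not use reinforcement learning from human feedback (RLHF)? The paper proposes training the target language model on task examples that are augmented offline with reflection tokens by the critic model, which costs far less to train than RLHF. In addition, the reflection tokens in Self-RAG make generation controllable at inference time, whereas RLHF focuses on aligning with human preferences during training. However, the paper does not include any comparative experiments against RLHF.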

06 Conclusion

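This article started from an intuitive scenario (an open-book exam), introduced the basic flow of Self-RAG, and explained it with code. It also shared some of the author's insights and thoughts.

If you are interested in RAG (retrieval-augmented generation), feel free to read and share the other articles in this series. : )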

Thanks for reading!


References

[1] https://medium.com/ai-in-plain-english/a-brief-introduction-to-retrieval-augmented-generation-rag-b7eb70982891

[2] https://arxiv.org/pdf/2310.11511.pdf

[3] https://github.com/AkariAsai/self-rag

[4] https://github.com/langchain-ai/langgraph/blob/main/examples/rag/langgraph_self_rag.ipynb?ref=blog.langchain.dev

[5] https://github.com/run-llama/llama_index/tree/v0.10.20/llama-index-packs/llama-index-packs-self-rag

[6] https://github.com/run-llama/llama_index/blob/v0.10.20/llama-index-packs/llama-index-packs-self-rag/llama_index/packs/self_rag/base.py

[7] https://github.com/run-llama/llama_index/blob/v0.10.20/llama-index-packs/llama-index-packs-self-rag/llama_index/packs/self_rag/base.py#L174

[8] https://github.com/run-llama/llama_index/blob/v0.10.20/llama-index-packs/llama-index-packs-self-rag/llama_index/packs/self_rag/base.py#L245

This article was compiled by Baihai IDP with the original author's authorization. To republish the translation, please contact Baihai IDP for permission.

Original article:

https://ai.gopubby.com/advanced-rag-08-self-rag-c0c5b5952e0e
