Time Series Forecasting with XGBoost

Author: Anonymous 2021-04-07 10:02:00

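In this tutorial, you will discover how to develop an XGBoost model for time series forecasting.

XGBoost is an efficient implementation of gradient boosting for classification and regression problems.

It is both fast and efficient, performs well, if not best, on a wide range of predictive modeling tasks, and is a favorite among winners of data science competitions such as those on Kaggle.

XGBoost can also be used for time series forecasting, although the time series dataset must first be transformed into a supervised learning problem. It also requires a specialized evaluation technique called walk-forward validation, because evaluating the model with k-fold cross-validation would give optimistically biased results.

In this tutorial, you will discover how to develop an XGBoost model for time series forecasting. After completing this tutorial, you will know:

1. XGBoost is an implementation of the gradient boosting ensemble algorithm for classification and regression.

2. Time series datasets can be transformed into supervised learning using a sliding-window representation.

3. How to fit, evaluate, and make predictions with an XGBoost model for time series forecasting.

Tutorial Overview

This tutorial is divided into three parts:

1. XGBoost Ensemble

2. Time Series Data Preparation

3. XGBoost for Time Series Forecasting

XGBoost Ensemble

XGBoost is short for Extreme Gradient Boosting and is an efficient implementation of the stochastic gradient boosting machine learning algorithm. The stochastic gradient boosting algorithm (also called gradient boosting machines or tree boosting) is a powerful machine learning technique that performs well, or even best, on a wide range of challenging machine learning problems.

It is an ensemble of decision trees in which new trees fix the errors of the trees already in the model. Trees are added until no further improvement can be made to the model. XGBoost provides a highly efficient implementation of the stochastic gradient boosting algorithm and exposes a suite of model hyperparameters designed to give control over the model training process.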
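To make the idea of "new trees fix the errors of existing trees" concrete, here is a minimal sketch of boosting for squared error. It is an illustration only and not XGBoost's actual algorithm: it uses one-split "stumps" on a single feature, has no regularization, and all function names (fit_stump, predict_stump, boost) and the toy data are assumptions made for this example.

```python
# Toy gradient boosting for squared error with one-split stumps.
# Illustrative sketch only; XGBoost adds regularization, second-order
# gradients, column/row subsampling and much more.

def fit_stump(xs, residuals):
    """Pick the threshold on x that best splits the residuals into two means."""
    best = None
    # Exclude the largest x so that both sides of the split are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def boost(xs, ys, n_trees=50, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]  # errors of the current model
        stump = fit_stump(xs, residuals)                # new tree targets those errors
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, preds

def mean_squared_error(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

# Hypothetical toy data
xs = [1, 2, 3, 4, 5, 6]
ys = [100.0, 110.0, 108.0, 115.0, 120.0, 118.0]

base, stumps, preds = boost(xs, ys)
initial_error = mean_squared_error(ys, [base] * len(ys))
final_error = mean_squared_error(ys, preds)
print('training MSE before boosting: %.3f, after: %.3f' % (initial_error, final_error))
```

Each round fits a stump to what the current ensemble still gets wrong, so the training error shrinks as stumps are added. The learning rate scales down each correction, which is the same role the corresponding hyperparameter plays in gradient boosting libraries.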

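XGBoost is designed for classification and regression on tabular datasets, although it can also be used for time series forecasting.

First, the XGBoost library must be installed. You can install it using pip, as follows: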

sudo pip install xgboost

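Once installed, you can confirm that it was installed successfully and that you are using a modern version by running the following code:

# check the installed xgboost version  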
import xgboost  
print("xgboost", xgboost.__version__)

Running the code, you should see the following version number or higher.

xgboost 1.0.1

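Although the XGBoost library has its own Python API, we can use XGBoost models with the scikit-learn API via the XGBRegressor wrapper class.

An instance of the model can be created and used just like any other scikit-learn class for model evaluation. For example:

from xgboost import XGBRegressor
# define model  
model = XGBRegressor()

Now that we are familiar with XGBoost, let's look at how to prepare a time series dataset for supervised learning.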

Time Series Data Preparation

Time series data can be framed as supervised learning. Given a sequence of numbers for a time series dataset, we can restructure the data to look like a supervised learning problem. We do this by using previous time steps as input variables and the next time step as the output variable. Let's make this concrete with an example. Suppose we have a time series as follows:

time, measure  
1, 100  
2, 110  
3, 108  
4, 115  
5, 120

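We can restructure this time series dataset as a supervised learning problem by using the value at the previous time step to predict the value at the next time step. Reorganized this way, the data would look as follows:

X, y
?, 100
100, 110
110, 108
108, 115
115, 120
120, ?

Note that the time column is dropped and that some rows of data are unusable for training a model, such as the first and the last.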

This representation is called a sliding window, because the window of inputs and expected outputs is shifted forward through time to create new "samples" for a supervised learning model.
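The sliding window can be sketched in a few lines of plain Python. The helper name sliding_window is an assumption made for this illustration. For a single-variable series and one output step, it is intended to produce the same complete rows that a shift-based approach yields after incomplete rows are dropped.

```python
# Minimal sliding-window sketch for a univariate series (illustrative helper).

def sliding_window(series, n_in=1):
    """Return rows [x(t-n_in), ..., x(t-1), x(t)] for every t with a full window."""
    rows = []
    for t in range(n_in, len(series)):
        rows.append(list(series[t - n_in:t + 1]))
    return rows

# The "measure" column from the example series above
measure = [100, 110, 108, 115, 120]

rows = sliding_window(measure, n_in=1)
for row in rows:
    # All but the last value are inputs; the last value is the output
    print(row[:-1], '->', row[-1])

# A wider window uses more lagged values as inputs and yields fewer rows
rows_two_lags = sliding_window(measure, n_in=2)
```

With one lag, the five observations yield four usable rows; with two lags, they yield three. Widening the window trades training rows for more context per row.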

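We can use the shift() function in Pandas to automatically create new framings of time series problems, given the desired length of the input and output sequences.

This is a useful tool because it lets us explore different framings of a time series problem with machine learning algorithms to see which may result in better-performing models.

The function below takes a time series as a NumPy array with one or more columns and transforms it into a supervised learning problem with the specified number of inputs and outputs.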

from pandas import DataFrame, concat

# transform a time series dataset into a supervised learning dataset  
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):  
 n_vars = 1 if type(data) is list else data.shape[1]  
 df = DataFrame(data)  
 cols = list()  
 # input sequence (t-n, ... t-1)  
 for i in range(n_in, 0, -1):  
  cols.append(df.shift(i))  
 # forecast sequence (t, t+1, ... t+n)  
 for i in range(0, n_out):  
  cols.append(df.shift(-i))  
 # put it all together  
 agg = concat(cols, axis=1)  
 # drop rows with NaN values  
 if dropnan:  
  agg.dropna(inplace=True)  
 return agg.values

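We can use this function to prepare a time series dataset for XGBoost.

Once the dataset is prepared, we must be careful about how it is used to fit and evaluate a model.

For example, it would not be valid to fit the model on data from the future and have it predict the past. The model must be trained on the past and predict the future. This means that methods that randomize the dataset during evaluation, such as k-fold cross-validation, cannot be used. Instead, we must use a technique called walk-forward validation. In walk-forward validation, the dataset is first split into train and test sets by selecting a cut point (for example, all data except the last 12 months is used for training and the last 12 months is used for testing).

If we are interested in making a one-step forecast, for example one month ahead, we can evaluate the model by training on the training dataset and predicting the first step in the test dataset. We can then add the real observation from the test set to the training dataset, refit the model, and have the model predict the second step in the test dataset. Repeating this process over the entire test dataset gives a one-step forecast for every test row, from which an error measure can be calculated to evaluate the skill of the model.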
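The procedure can be sketched independently of any particular model. In the minimal example below, the function names (walk_forward, mean_of_history) are assumptions made for illustration. The forecaster is a deliberately simple stand-in that predicts the mean of all targets seen so far, so the mechanics of the growing history are easy to follow.

```python
# Walk-forward validation sketch with a pluggable forecaster (illustrative names).

def walk_forward(rows, n_test, forecast_fn):
    """rows: list of [inputs..., target]; the last n_test rows are forecast one by one."""
    history = [list(r) for r in rows[:-n_test]]  # seed history with the training rows
    test = rows[-n_test:]
    predictions = []
    for row in test:
        inputs, actual = row[:-1], row[-1]
        # "Refit" on everything observed so far, then predict one step ahead
        predictions.append(forecast_fn(history, inputs))
        # Only now reveal the true observation to the model
        history.append(list(row))
    errors = [abs(row[-1] - p) for row, p in zip(test, predictions)]
    mae = sum(errors) / len(errors)
    return mae, predictions

def mean_of_history(history, inputs):
    """Stand-in forecaster: the mean of all targets seen so far (ignores inputs)."""
    targets = [r[-1] for r in history]
    return sum(targets) / len(targets)

# One-lag supervised rows built from the toy series 100, 110, 108, 115, 120
rows = [[100, 110], [110, 108], [108, 115], [115, 120]]

mae, predictions = walk_forward(rows, n_test=2, forecast_fn=mean_of_history)
print('predictions:', predictions)
print('MAE: %.1f' % mae)
```

The key property is the ordering inside the loop: the forecast for a test row is made before that row's true value is appended to the history, so no prediction ever sees its own target or anything later.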

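The function below performs walk-forward validation. It takes the entire supervised learning version of the time series dataset and the number of rows to use as the test set as arguments. It then steps through the test set, calling the xgboost_forecast() function to make a one-step forecast. An error measure is calculated and the details are returned for analysis.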

# walk-forward validation for univariate data  
def walk_forward_validation(data, n_test):  
 predictions = list()  
 # split dataset  
 train, test = train_test_split(data, n_test)  
 # seed history with training dataset  
 history = [x for x in train]  
 # step over each time-step in the test set  
 for i in range(len(test)):  
  # split test row into input and output columns  
  testX, testy = test[i, :-1], test[i, -1]  
  # fit model on history and make a prediction  
  yhat = xgboost_forecast(history, testX)  
  # store forecast in list of predictions  
  predictions.append(yhat)  
  # add actual observation to history for the next loop  
  history.append(test[i])  
  # summarize progress  
  print('>expected=%.1f, predicted=%.1f' % (testy, yhat))  
 # estimate prediction error  
 error = mean_absolute_error(test[:, -1], predictions)  
 return error, test[:, -1], predictions

The train_test_split() function is called to split the dataset into train and test sets. We can define this function below.

# split a univariate dataset into train/test sets  
def train_test_split(data, n_test): 
 return data[:-n_test, :], data[-n_test:, :]

We can use the XGBRegressor class to make a one-step forecast. The xgboost_forecast() function below implements this: it takes the training dataset and a test input row as input, fits a model, and makes a one-step prediction.

# fit an xgboost model and make a one step prediction  
def xgboost_forecast(train, testX):  
 # transform list into array  
 train = asarray(train)  
 # split into input and output columns  
 trainX, trainy = train[:, :-1], train[:, -1]  
 # fit model  
 model = XGBRegressor(objective='reg:squarederror', n_estimators=1000)  
 model.fit(trainX, trainy)  
 # make a one-step prediction  
 yhat = model.predict(asarray([testX]))  
 return yhat[0]

Now that we know how to prepare time series data for forecasting and how to evaluate an XGBoost model, we can look at using XGBoost on a real dataset.

XGBoost for Time Series Forecasting

In this section, we will explore how to use XGBoost for time series forecasting. We will use a standard univariate time series dataset and make one-step forecasts with the model. You can use the code in this section as a starting point for your own project and adapt it for multivariate inputs, multivariate forecasts, and multi-step forecasts. We will use the daily female births dataset, that is, the number of female births recorded on each day.

You can download the dataset from the link below and place it in your current working directory with the filename "daily-total-female-births.csv".

Dataset (daily-total-female-births.csv):

https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-total-female-births.csv

Description (daily-total-female-births.names):

https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-total-female-births.names

The first few lines of the dataset look as follows:

"Date","Births"  
"1959-01-01",35  
"1959-01-02",32  
"1959-01-03",30  
"1959-01-04",31  
"1959-01-05",44  
...

First, let's load and plot the dataset. The complete example is listed below.

# load and plot the time series dataset  
from pandas import read_csv  
from matplotlib import pyplot  
# load dataset  
series = read_csv('daily-total-female-births.csv', header=0, index_col=0)  
values = series.values  
# plot dataset  
pyplot.plot(values) 
pyplot.show()

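Running the example creates a line plot of the dataset. We can see that there is no obvious trend or seasonality.

A persistence model can achieve an MAE of about 6.7 births when predicting the last 12 days. This provides a performance baseline: a model may be considered skillful only if it achieves a lower MAE.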
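For reference, a persistence forecast simply repeats the previous observation. The sketch below shows one way such a baseline could be scored. The helper name persistence_mae is an assumption for illustration, and the toy input is just the first five values from the sample rows shown earlier, so the number it prints is not the baseline figure quoted above.

```python
# Persistence baseline sketch: predict each value with the one just before it.

def persistence_mae(values, n_test):
    """MAE of forecasting each of the last n_test values with its predecessor."""
    if len(values) <= n_test:
        raise ValueError('need at least n_test + 1 observations')
    actual = values[-n_test:]
    predicted = values[-n_test - 1:-1]  # each prediction is the prior observation
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n_test

# First five observations from the sample rows of the dataset
toy = [35, 32, 30, 31, 44]
baseline = persistence_mae(toy, n_test=3)
print('persistence MAE on toy data: %.3f' % baseline)
```

Applying the same helper to the full series with n_test=12 would give the baseline that a learned model has to beat.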

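Next, we can evaluate the XGBoost model on the dataset when making one-step forecasts for the last 12 days of data.

We will use only the previous 6 time steps as input to the model and the default model hyperparameters, except that we set the loss to 'reg:squarederror' (to avoid a warning message) and use 1,000 trees in the ensemble (to avoid underfitting).

The complete example is listed below.

# forecast daily births with xgboost  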
from numpy import asarray  
from pandas import read_csv 
from pandas import DataFrame 
from pandas import concat  
from sklearn.metrics import mean_absolute_error  
from xgboost import XGBRegressor  
from matplotlib import pyplot   
# transform a time series dataset into a supervised learning dataset  
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):  
 n_vars = 1 if type(data) is list else data.shape[1]  
 df = DataFrame(data)  
 cols = list()  
 # input sequence (t-n, ... t-1)  
 for i in range(n_in, 0, -1):  
  cols.append(df.shift(i))  
 # forecast sequence (t, t+1, ... t+n)  
 for i in range(0, n_out):  
  cols.append(df.shift(-i))  
 # put it all together  
 agg = concat(cols, axis=1)  
 # drop rows with NaN values  
 if dropnan:  
  agg.dropna(inplace=True)  
 return agg.values   
# split a univariate dataset into train/test sets  
def train_test_split(data, n_test):  
 return data[:-n_test, :], data[-n_test:, :] 
# fit an xgboost model and make a one step prediction  
def xgboost_forecast(train, testX):  
 # transform list into array  
 train = asarray(train)  
 # split into input and output columns  
 trainX, trainy = train[:, :-1], train[:, -1]  
 # fit model  
 model = XGBRegressor(objective='reg:squarederror', n_estimators=1000)  
 model.fit(trainX, trainy)  
 # make a one-step prediction  
 yhat = model.predict(asarray([testX]))  
 return yhat[0]  
# walk-forward validation for univariate data  
def walk_forward_validation(data, n_test):  
 predictions = list()  
 # split dataset  
 train, test = train_test_split(data, n_test)  
 # seed history with training dataset  
 history = [x for x in train]  
 # step over each time-step in the test set  
 for i in range(len(test)):  
  # split test row into input and output columns  
  testX, testy = test[i, :-1], test[i, -1]  
  # fit model on history and make a prediction  
  yhat = xgboost_forecast(history, testX)  
  # store forecast in list of predictions  
  predictions.append(yhat)  
  # add actual observation to history for the next loop  
  history.append(test[i])  
  # summarize progress  
  print('>expected=%.1f, predicted=%.1f' % (testy, yhat))  
 # estimate prediction error  
 error = mean_absolute_error(test[:, -1], predictions)  
 return error, test[:, -1], predictions  
# load the dataset  
series = read_csv('daily-total-female-births.csv', header=0, index_col=0)  
values = series.values  
# transform the time series data into supervised learning  
data = series_to_supervised(values, n_in=6)  
# evaluate  
mae, y, yhat = walk_forward_validation(data, 12)  
print('MAE: %.3f' % mae)  
# plot expected vs predicted  
pyplot.plot(y, label='Expected')  
pyplot.plot(yhat, label='Predicted')  
pyplot.legend()  
pyplot.show()

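Running the example reports the expected and predicted values for each step in the test set, followed by the MAE for all predicted values.

Note: your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

We can see that the model performs better than the persistence model, achieving an MAE of about 5.9 births compared to about 6.7 births for persistence.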

>expected=42.0, predicted=44.5  
>expected=53.0, predicted=42.5  
>expected=39.0, predicted=40.3  
>expected=40.0, predicted=32.5  
>expected=38.0, predicted=41.1  
>expected=44.0, predicted=45.3  
>expected=34.0, predicted=40.2  
>expected=37.0, predicted=35.0  
>expected=52.0, predicted=32.5  
>expected=48.0, predicted=41.4  
>expected=55.0, predicted=46.6  
>expected=50.0, predicted=47.2  
MAE: 5.957

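A line plot is created comparing the series of expected and predicted values for the last 12 days of the dataset. This gives a visual sense of how well the model performed on the test set.

Figure 2: Line plot of expected vs. predicted births for the test set.

Once a final XGBoost model configuration is chosen, the model can be finalized and used to make predictions on new data. This is called an out-of-sample forecast, that is, predicting beyond the training dataset. It is identical to making a prediction during model evaluation, because we always want to evaluate a model using the same procedure that we expect to use when the model makes predictions on new data. The example below demonstrates fitting a final XGBoost model on all available data and making a one-step prediction beyond the end of the dataset.

# finalize model and make a prediction for daily births with xgboost  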
from numpy import asarray  
from pandas import read_csv  
from pandas import DataFrame  
from pandas import concat  
from xgboost import XGBRegressor   
# transform a time series dataset into a supervised learning dataset  
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):  
 n_vars = 1 if type(data) is list else data.shape[1]  
 df = DataFrame(data)  
 cols = list()  
 # input sequence (t-n, ... t-1)  
 for i in range(n_in, 0, -1):  
  cols.append(df.shift(i))  
 # forecast sequence (t, t+1, ... t+n)  
 for i in range(0, n_out):  
  cols.append(df.shift(-i))  
 # put it all together  
 agg = concat(cols, axis=1)  
 # drop rows with NaN values  
 if dropnan:  
  agg.dropna(inplace=True)  
 return agg.values   
# load the dataset  
series = read_csv('daily-total-female-births.csv', header=0, index_col=0)  
values = series.values  
# transform the time series data into supervised learning  
train = series_to_supervised(values, n_in=6)  
# split into input and output columns  
trainX, trainy = train[:, :-1], train[:, -1] 
# fit model  
model = XGBRegressor(objective='reg:squarederror', n_estimators=1000)  
model.fit(trainX, trainy)  
# construct an input for a new prediction  
row = values[-6:].flatten()  
# make a one-step prediction  
yhat = model.predict(asarray([row]))  
print('Input: %s, Predicted: %.3f' % (row, yhat[0]))

Running the example fits an XGBoost model on all available data. A new input row is prepared from the last 6 days of known data, and the next day beyond the end of the dataset is predicted.

Input: [34 37 52 48 55 50], Predicted: 42.708

Editor: Pang Guiyu. Source: Python Chinese Community (ID: python-china)
