Machine Learning, Ensemble Learning: Implementing Bagging and AdaBoost in Python (CyrusMay's blog)


Saturday, 13 June 2020 08:00. Last updated: Saturday, 13 June 2020 08:00.


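Abstract

This post implements two ensemble-learning algorithms, Bagging and AdaBoost, in Python and wraps each one in a class so that readers can call it directly.

The Bagging algorithm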

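    import numpy as np
    import pandas as pd
    from sklearn.base import clone


    class Cyrus_bagging(object):
        def __init__(self, estimator, n_estimators=20):
            self.estimator = estimator
            self.n_estimators = n_estimators
            self.models = None

        def fit(self, x, y):
            x = np.array(x)
            y = np.array(y).reshape((-1,))
            indices = np.arange(x.shape[0])
            self.models = []
            for i in range(self.n_estimators):
                # Bootstrap: draw as many rows as the data has, with replacement.
                index = np.random.choice(indices, x.shape[0])
                x0 = x[index]
                y0 = y[index]
                # clone() gives every round its own unfitted copy. Calling
                # self.estimator.fit directly would refit one shared object and
                # store it n_estimators times.
                self.models.append(clone(self.estimator).fit(x0, y0))

        def predict(self, x):
            x = np.array(x)
            res = np.zeros([x.shape[0], self.n_estimators])
            for i in range(self.n_estimators):
                res[:, i] = self.models[i].predict(x)
            result = []
            for i in range(res.shape[0]):
                # Majority vote over the base models. idxmax() returns the
                # label itself; argmax() returns a position in recent pandas.
                pd_res = pd.Series(res[i, :]).value_counts()
                result.append(int(pd_res.idxmax()))
            return np.array(result)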
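The two moving parts of the class above are the bootstrap resample in fit and the majority vote in predict. The following is a minimal standard-library sketch of both steps on their own. The helper names `bootstrap_indices` and `majority_vote` are illustrative and do not appear in the original post.

```python
import random
from collections import Counter


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (one bootstrap resample)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
idx = bootstrap_indices(10, rng)
# Sampling with replacement means some rows repeat and some are left out.
print(sorted(idx), "unique rows:", len(set(idx)))

# Three hypothetical base models voting on a single sample.
print(majority_vote([1, 2, 1]))  # 1
```

For large data sets a bootstrap resample contains roughly 63% of the distinct rows (1 - 1/e), which is what makes the base models differ from one another.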

    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import classification_report

    # x_train, y_train, x_test and y_test are assumed to be prepared beforehand.
    knn = KNeighborsClassifier()
    model = Cyrus_bagging(knn)
    model.fit(x_train, y_train)
    y_pre = model.predict(x_test)
    print(classification_report(y_test, y_pre))

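The example data deliberately uses only a few features, so that the gap between the ensemble and a single model is visible. As a result the accuracy is not especially high, but it is still noticeably better than that of the same model without bagging.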
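One way to see why voting can beat a single model is a small probability calculation. The sketch below assumes a binary task, an odd number of models, and models whose errors are independent of one another; none of these assumptions comes from the original post, and bootstrap-trained models are in practice correlated, so real gains are smaller. The function name `majority_accuracy` is made up for the illustration.

```python
from math import comb


def majority_accuracy(p, n_models):
    """Probability that more than half of n_models independent voters are right.

    Assumes a binary task, an odd n_models, and voters that are each correct
    with probability p independently of one another.
    """
    need = n_models // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(need, n_models + 1)
    )


for n in (1, 5, 21):
    print(n, round(majority_accuracy(0.7, n), 3))
```

Under these idealised assumptions the vote improves as models are added whenever each model is better than chance, and gains nothing when p is exactly 0.5.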

                 precision    recall  f1-score   support

              0       1.00      1.00      1.00        11
              1       0.67      0.67      0.67         9
              2       0.70      0.70      0.70        10

    avg / total       0.80      0.80      0.80        30

The AdaBoost algorithm

    import numpy as np
    from sklearn.base import clone
    from sklearn.metrics import accuracy_score


    class CyrusAdaBoost(object):
        def __init__(self, estimator, n_estimators=20):
            self.estimator = estimator
            self.n_estimators = n_estimators
            self.error_rate = None
            self.model = None

        def update_w(self, y, pre_y, w):
            # Simplified reweighting used in this post: scale correctly
            # classified samples by exp(-error_rate) and misclassified ones by
            # exp(error_rate), then renormalise so the weights sum to 1.
            error_rate = 1 - accuracy_score(y, pre_y)
            w = np.where(y == pre_y, w * np.exp(-error_rate), w * np.exp(error_rate))
            return w / w.sum()

        def cal_label(self, result, alpha):
            # Weighted vote. Labels must be non-negative integers because they
            # are used as indices into `count`.
            label = []
            for i in range(result.shape[0]):
                count = np.zeros(int(result[i, :].max() + 1))
                for j in range(result.shape[1]):
                    count[int(result[i, j])] += alpha[j]
                label.append(count.argmax())
            return np.array(label)

        def fit(self, x, y):
            x = np.array(x)
            y = np.array(y).reshape((-1,))
            self.error_rate = []
            self.model = []
            w0 = np.ones(x.shape[0])
            w0 = w0 / w0.sum()
            indices = np.arange(x.shape[0])
            for i in range(self.n_estimators):
                # Resample the training set according to the current weights.
                index = np.random.choice(indices, size=x.shape[0], p=w0)
                x0 = x[index]
                y0 = y[index]
                # clone() keeps each round's model separate from the others.
                model0 = clone(self.estimator).fit(x0, y0)
                # Score on the original rows so that w0[i] is updated from
                # sample i, not from whichever row landed at position i of the
                # resample.
                pre_y = model0.predict(x)
                error_rate = 1 - accuracy_score(y, pre_y)
                self.error_rate.append(error_rate)
                self.model.append(model0)
                w0 = self.update_w(y, pre_y, w0)

        def predict(self, x):
            x = np.array(x)
            res = np.zeros([x.shape[0], self.n_estimators])
            for i in range(self.n_estimators):
                res[:, i] = self.model[i].predict(x)
            # Each model votes with weight 1 - error_rate.
            alpha = 1 - np.array(self.error_rate)
            return self.cal_label(res, alpha)
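The cal_label step above is a weighted vote: each model adds its weight (here 1 - error_rate) to the label it predicted, and the label with the largest total wins. A minimal standard-library sketch of that idea on a single sample follows. The function name `weighted_vote` and the numbers are hypothetical.

```python
from collections import defaultdict


def weighted_vote(labels, alphas):
    """Add each model's weight to the label it predicted; return the heaviest label."""
    score = defaultdict(float)
    for label, alpha in zip(labels, alphas):
        score[label] += alpha
    return max(score, key=score.get)


# One accurate model outvotes two weak ones that agree with each other.
print(weighted_vote([0, 1, 1], [0.9, 0.3, 0.3]))  # 0, since 0.9 > 0.3 + 0.3
```

With equal weights this reduces to the plain majority vote used by Bagging.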

    from sklearn.tree import DecisionTreeClassifier

    # x_train, y_train, x_test and y_test are assumed to be prepared beforehand.
    model = CyrusAdaBoost(estimator=DecisionTreeClassifier(), n_estimators=50)
    model.fit(x_train, y_train)
    y_pre = model.predict(x_test)
    print(accuracy_score(y_test, y_pre))

0.932
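For comparison, the textbook multi-class AdaBoost round (SAMME) differs from the class above in two places. The error is weighted by the current sample weights, and the model weight is alpha = ln((1 - err) / err) + ln(K - 1) rather than 1 - error_rate. The sketch below illustrates one such round with the standard library only. It is not the method used in this post, and the function name `samme_round` is made up for the example.

```python
import math


def samme_round(weights, correct, n_classes=2):
    """One SAMME reweighting step.

    weights: current sample weights, summing to 1
    correct: booleans, True where the base model classified the sample correctly
    Returns (alpha, new_weights). Requires 0 < weighted error < 1.
    """
    # Weighted error: total weight of the misclassified samples.
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    alpha = math.log((1 - err) / err) + math.log(n_classes - 1)
    # Only misclassified samples are scaled up; then renormalise.
    raised = [w if ok else w * math.exp(alpha) for w, ok in zip(weights, correct)]
    total = sum(raised)
    return alpha, [w / total for w in raised]


alpha, new_w = samme_round([0.25, 0.25, 0.25, 0.25], [True, True, True, False])
print(alpha)  # ln(3), about 1.0986
print(new_w)  # the misclassified sample now carries half of the total weight
```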

by CyrusMay 2020 06 12

All the beauty in this world
is no match for how lovely you are
(五月天 / Mayday, "爱情的模样")
