博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
评估模型的性能并调参
阅读量:3950 次
发布时间:2019-05-24

本文共 3227 字,大约阅读时间需要 10 分钟。

评估模型的性能并调参

网格搜索和随机网格搜索

# 使用网格搜索进行超参数调优:# 方式1:网格搜索GridSearchCV()from sklearn.model_selection import GridSearchCVfrom sklearn.svm import SVCimport timestart_time = time.time()pipe_svc = make_pipeline(StandardScaler(),SVC(random_state=1))param_range = [0.0001,0.001,0.01,0.1,1.0,10.0,100.0,1000.0]param_grid = [{
'svc__C':param_range,'svc__kernel':['linear']},{
'svc__C':param_range,'svc__gamma':param_range,'svc__kernel':['rbf']}]gs = GridSearchCV(estimator=pipe_svc,param_grid=param_grid,scoring='accuracy',cv=10,n_jobs=-1)gs = gs.fit(X,y)end_time = time.time()print("网格搜索经历时间:%.3f S" % float(end_time-start_time))print(gs.best_score_)print(gs.best_params_)

在这里插入图片描述

# 方式2:随机网格搜索RandomizedSearchCV()from sklearn.model_selection import RandomizedSearchCVfrom sklearn.svm import SVCimport timestart_time = time.time()pipe_svc = make_pipeline(StandardScaler(),SVC(random_state=1))param_range = [0.0001,0.001,0.01,0.1,1.0,10.0,100.0,1000.0]param_grid = [{
'svc__C':param_range,'svc__kernel':['linear']},{
'svc__C':param_range,'svc__gamma':param_range,'svc__kernel':['rbf']}]# param_grid = [{'svc__C':param_range,'svc__kernel':['linear','rbf'],'svc__gamma':param_range}]gs = RandomizedSearchCV(estimator=pipe_svc, param_distributions=param_grid,scoring='accuracy',cv=10,n_jobs=-1)gs = gs.fit(X,y)end_time = time.time()print("随机网格搜索经历时间:%.3f S" % float(end_time-start_time))print(gs.best_score_)print(gs.best_params_)

在这里插入图片描述

混淆矩阵和ROC曲线(类别为两类)

# 混淆矩阵:# 加载数据df = pd.read_csv("/wdbc.data",header=None)'''乳腺癌数据集:569个恶性和良性肿瘤细胞的样本,M为恶性,B为良性'''# 做基本的数据预处理from sklearn.preprocessing import LabelEncoderX = df.iloc[:,2:].valuesy = df.iloc[:,1].valuesle = LabelEncoder()    #将M-B等字符串编码成计算机能识别的0-1y = le.fit_transform(y)le.transform(['M','B'])# 数据切分8:2from sklearn.model_selection import train_test_splitX_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,stratify=y,random_state=1)from sklearn.svm import SVCpipe_svc = make_pipeline(StandardScaler(),SVC(random_state=1))from sklearn.metrics import confusion_matrixpipe_svc.fit(X_train,y_train)y_pred = pipe_svc.predict(X_test)confmat = confusion_matrix(y_true=y_test,y_pred=y_pred)fig,ax = plt.subplots(figsize=(2.5,2.5))ax.matshow(confmat, cmap=plt.cm.Blues,alpha=0.3)for i in range(confmat.shape[0]):    for j in range(confmat.shape[1]):        ax.text(x=j,y=i,s=confmat[i,j],va='center',ha='center')plt.xlabel('predicted label')plt.ylabel('true label')plt.show()

在这里插入图片描述

# 绘制ROC曲线:from sklearn.metrics import roc_curve,aucfrom sklearn.metrics import make_scorer,f1_scorescorer = make_scorer(f1_score,pos_label=0)gs = GridSearchCV(estimator=pipe_svc,param_grid=param_grid,scoring=scorer,cv=10)y_pred = gs.fit(X_train,y_train).decision_function(X_test)#y_pred = gs.predict(X_test)fpr,tpr,threshold = roc_curve(y_test, y_pred) ###计算真阳率和假阳率roc_auc = auc(fpr,tpr) ###计算auc的值plt.figure()lw = 2plt.figure(figsize=(7,5))plt.plot(fpr, tpr, color='darkorange',         lw=lw, label='ROC curve (area = %0.2f)' % roc_auc) ###假阳率为横坐标,真阳率为纵坐标做曲线plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')plt.xlim([-0.05, 1.0])plt.ylim([-0.05, 1.05])plt.xlabel('False Positive Rate')plt.ylabel('True Positive Rate')plt.title('Receiver operating characteristic ')plt.legend(loc="lower right")plt.show()

在这里插入图片描述

转载地址:http://lagwi.baihongyu.com/

你可能感兴趣的文章
托盘使用--最小化到托盘,双击托盘立即还原显示
查看>>
ORACLE客户端连接配置 tnsnames.ora内容
查看>>
Ubuntu 12.04添加开机启动项
查看>>
MFC对话框使用标签页控件
查看>>
redhat中vsftp开机自启动
查看>>
修改RedHat默认登录方式
查看>>
按字段值分组表中记录
查看>>
windows批处理实现telnet登陆和运行命令--还有问题
查看>>
windows批处理实现telnet登陆和运行命令--设置缺省输入法为英文
查看>>
freetds使用-远程访问SQL Server库
查看>>
linux获得系统编码
查看>>
Ubuntu安装glib
查看>>
MySQL存储过程,生成大量数据
查看>>
查询字段值出现多次的字段值
查看>>
SQL Server表存在则进行查重 SQL语句
查看>>
redhat 9 下sqlite 3的安装及编程
查看>>
两个同步表的字段复制.Oracle.
查看>>
windows MySQL 报“Got a packet bigger than 'max_allowed_packet' bytes”错误,解决过程.
查看>>
MFC ADO连MySQL,使用数据源.
查看>>
在Redhat9下静态编译glib库.
查看>>