20200210_Predicting the probability of default with logistic regression
Published: 2019-02-26


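This was the second order from that customer overseas, and I finished it well. I did not land the third order, though, because I honestly did not know how to do it. Students studying abroad really do have money, and I would like some too (just kidding).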

In this homework, we use logistic regression to predict the probability of default using income and balance on the Default data set. We will also estimate the test error of this logistic regression model using the validation set approach. Do not forget to set a random seed before beginning your analysis.
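Why the seed matters can be shown with a small sketch. It assumes scikit-learn's train_test_split is the splitting tool (as in the code later in this post); the index array and the second seed (7) are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical row indices standing in for the observations
idx = np.arange(100)

# Same random_state twice, then a different one
a_train, a_val = train_test_split(idx, test_size=0.2, random_state=2020)
b_train, b_val = train_test_split(idx, test_size=0.2, random_state=2020)
c_train, c_val = train_test_split(idx, test_size=0.2, random_state=7)

# The first comparison checks reproducibility, the second checks that
# changing the seed changes which rows land in the validation set
same_seed_matches = np.array_equal(a_val, b_val)
other_seed_matches = np.array_equal(a_val, c_val)
print(same_seed_matches, other_seed_matches)
```

Fixing the seed makes a reported validation error reproducible; changing it is how one obtains a genuinely different split.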

1. (a) Fit a multiple logistic regression model that uses income and balance to predict the probability of default, using only the observations

# Import packages
import pandas as pd
from sklearn import metrics
import warnings
warnings.filterwarnings("ignore")

test = pd.read_excel('Default.xlsx')
test.head()
  default student      balance        income
1      No      No   729.526495  44361.625074
2      No     Yes   817.180407  12106.134700
3      No      No  1073.549164  31767.138947
4      No      No   529.250605  35704.493935
5      No      No   785.655883  38463.495879
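# Convert the categorical variable to a numeric one ('No' -> 0, anything else -> 1)
def fun(x):
    if 'No' in x:
        return 0
    else:
        return 1

test['default'] = test.apply(lambda x: fun(x['default']), axis=1)

# Define the predictors and the response
X = test[['balance', 'income']]
y = test['default']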
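The Yes/No encoding can also be written without a row-wise apply. This is a minimal sketch, assuming the columns hold exactly the strings 'No' and 'Yes' as in the preview above; the toy frame and the names toy, yes_no and encoded are made up for illustration and are not the Default data.

```python
import pandas as pd

# Hypothetical toy frame standing in for the Default data
toy = pd.DataFrame({
    "default": ["No", "No", "Yes", "No"],
    "student": ["No", "Yes", "No", "Yes"],
})

# Series.map replaces each label with its numeric code in one vectorized step
yes_no = {"No": 0, "Yes": 1}
encoded = toy.copy()
for col in ["default", "student"]:
    encoded[col] = toy[col].map(yes_no)

print(encoded)
```

One difference from fun: map yields NaN for any value missing from the dictionary, whereas fun returns 1 for everything that does not contain 'No'.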
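from sklearn.linear_model import LogisticRegression

# Build a LogisticRegression model (default parameters are fine) and fit it
model = LogisticRegression()
model.fit(X, y)

# Predicted probabilities on the same data the model was fitted on,
# so the figure printed below is a training error, not a test error
a = model.predict_proba(X)

# Classify as default (1) when the predicted probability of default exceeds 0.5
result = []
for i in range(len(a)):
    if a[i][1] > 0.5:
        result.append(1)
    else:
        result.append(0)

# Support-weighted recall equals overall accuracy, so 1 minus it is the error rate
print('Error: %.4f' % (1 - metrics.recall_score(y, result, average='weighted')))

Error: 0.0336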
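The error above is computed as 1 minus the weighted-average recall. Weighting each class's recall by its share of the true labels gives the overall accuracy, so the result is the ordinary misclassification rate. The sketch below checks this on made-up labels; the arrays y_true and y_pred are random placeholders, not the Default data.

```python
import numpy as np
from sklearn import metrics

rng = np.random.default_rng(0)

# Made-up, imbalanced labels and predictions (placeholders only)
y_true = (rng.random(1000) < 0.05).astype(int)
y_pred = (rng.random(1000) < 0.03).astype(int)

# Three ways to obtain the same misclassification rate
err_recall = 1 - metrics.recall_score(y_true, y_pred, average="weighted")
err_accuracy = 1 - metrics.accuracy_score(y_true, y_pred)
err_manual = float(np.mean(y_true != y_pred))

print(err_recall, err_accuracy, err_manual)
```

metrics.accuracy_score states the intent more directly and gives the same number.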

(b) Using the validation set approach, estimate the test error of this model. In order to do this, you must perform the following steps:

i. Split the sample set into a training set and a validation set.

from sklearn.model_selection import train_test_split

# Hold out 20% of the data with train_test_split; the other 80% is the training set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2020)
# Split the held-out 20% again: 90% of it becomes the validation set, the remaining 10% a test set
X_validation, X_test, y_validation, y_test = train_test_split(X_test, y_test, test_size=0.1, random_state=2020)
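The two-stage split is easier to follow with the resulting sizes written out. The sketch below repeats the same two calls on placeholder arrays; the row count of 10000 is an assumption about the size of the data, and X_demo and y_demo are all-zero stand-ins.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays; 10000 rows is an assumed size, not read from the file
n_rows = 10000
X_demo = np.zeros((n_rows, 2))
y_demo = np.zeros(n_rows, dtype=int)

# Stage 1: 80% training, 20% held out
X_tr, X_hold, y_tr, y_hold = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=2020)
# Stage 2: the held-out part is split 90% validation / 10% test
X_val, X_te, y_val, y_te = train_test_split(
    X_hold, y_hold, test_size=0.1, random_state=2020)

print(len(X_tr), len(X_val), len(X_te))
```

Under that assumption the validation set is 18% of the data rather than the full 20% held out in the first call.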

ii. Fit a multiple logistic regression model using only the training observations.

from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X_train, y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

iii. Obtain a prediction of default status for each individual in the validation set by computing the posterior probability of default for that individual, and classifying the individual to the default category if the posterior probability is greater than 0.5.

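# Posterior probabilities for the validation set
a = model.predict_proba(X_validation)

# Classify as default (1) when the posterior probability of default exceeds 0.5
result = []
for i in range(len(a)):
    if a[i][1] > 0.5:
        result.append(1)
    else:
        result.append(0)

# Validation error: 1 minus support-weighted recall, which equals 1 minus accuracy
print('Error: %.4f' % (1 - metrics.recall_score(y_validation, result, average='weighted')))

Error: 0.0361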
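The thresholding loop can be written as a single array operation. This is a sketch on synthetic data from make_classification, not the Default data; the names X_demo, y_demo, clf, proba and labels are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic two-feature binary problem standing in for balance and income
X_demo, y_demo = make_classification(
    n_samples=500, n_features=2, n_informative=2, n_redundant=0,
    random_state=0)

clf = LogisticRegression().fit(X_demo, y_demo)

# Column 1 of predict_proba holds P(class 1); threshold it at 0.5
proba = clf.predict_proba(X_demo)
labels = (proba[:, 1] > 0.5).astype(int)

# Fraction misclassified, and agreement with the built-in predict()
error = float(np.mean(labels != y_demo))
agreement = float(np.mean(labels == clf.predict(X_demo)))
print(error, agreement)
```

For a binary model, predict() applies the same 0.5 cut-off, so spelling out the threshold is mainly useful when a different cut-off is wanted.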

(c) Repeat the process in (b) three times, using three different splits of the observations into a training set and a validation set. Comment on the results obtained.

from sklearn import metrics
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

for i in range(3):
    # Note: random_state is fixed at 2020, so every iteration draws the same split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2020)
    X_validation, X_test, y_validation, y_test = train_test_split(X_test, y_test, test_size=0.1, random_state=2020)
    model = LogisticRegression()
    model.fit(X_train, y_train)
    a = model.predict_proba(X_validation)
    # Classify as default (1) when the posterior probability exceeds 0.5
    result = []
    for j in range(len(a)):
        if a[j][1] > 0.5:
            result.append(1)
        else:
            result.append(0)
    print('Error: %.4f' % (1 - metrics.recall_score(y_validation, result, average='weighted')))

Error: 0.0361
Error: 0.0361
Error: 0.0361
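The loop above passes random_state=2020 in every iteration, so the three splits coincide and the three errors are necessarily identical. To get three different splits, the seed has to change per iteration. The sketch below does this on synthetic data (not the Default data) with a single 80/20 split; the seeds 2020, 2021 and 2022 are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the predictors and the default indicator
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=2, n_informative=2, n_redundant=0,
    weights=[0.9, 0.1], random_state=0)
idx = np.arange(len(y_demo))

errors = []
val_sets = []
for seed in (2020, 2021, 2022):  # a different seed per iteration
    X_tr, X_val, y_tr, y_val, idx_tr, idx_val = train_test_split(
        X_demo, y_demo, idx, test_size=0.2, random_state=seed)
    clf = LogisticRegression().fit(X_tr, y_tr)
    pred = (clf.predict_proba(X_val)[:, 1] > 0.5).astype(int)
    errors.append(float(np.mean(pred != y_val)))
    val_sets.append(frozenset(idx_val.tolist()))  # rows used for validation

print(errors)
```

The spread of the resulting errors gives a rough sense of how much a single validation-set estimate depends on the particular split.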

(d) Now consider a logistic regression model that predicts the probability of default using income, balance, and a dummy variable for student. Estimate the test error for this model using the validation set approach. Comment on whether or not including a dummy variable for student leads to a reduction in the test error rate.

from sklearn import metrics

# Encode student as a dummy variable ('No' -> 0, anything else -> 1)
def fun(x):
    if 'No' in x:
        return 0
    else:
        return 1

test['student'] = test.apply(lambda x: fun(x['student']), axis=1)

# Predictors now include the student dummy
X = test[['balance', 'income', 'student']]
y = test['default']

# Same two-stage split and seed as in (b)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2020)
X_validation, X_test, y_validation, y_test = train_test_split(X_test, y_test, test_size=0.1, random_state=2020)

model = LogisticRegression()
model.fit(X_train, y_train)
a = model.predict_proba(X_validation)

# Classify as default (1) when the posterior probability exceeds 0.5
result = []
for i in range(len(a)):
    if a[i][1] > 0.5:
        result.append(1)
    else:
        result.append(0)

print('Error: %.4f' % (1 - metrics.recall_score(y_validation, result, average='weighted')))

Error: 0.0361

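Answer: little effect. With the student dummy included, the validation error is still 0.0361, the same as in (b).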
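A comparison based on a single split can hide small differences, so one option is to compare the two feature sets over several splits. The sketch below uses made-up data in which default depends mainly on balance; the helper validation_error, the frame demo and the coefficients are all hypothetical, and the numbers it prints say nothing about the Default data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def validation_error(X, y, seed):
    """Misclassification rate on a 20% hold-out drawn with the given seed."""
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    pred = (clf.predict_proba(X_val)[:, 1] > 0.5).astype(int)
    return float(np.mean(pred != np.asarray(y_val)))

# Made-up data: standardized stand-ins for balance and income, plus a 0/1 dummy
rng = np.random.default_rng(0)
n = 2000
demo = pd.DataFrame({
    "balance": rng.normal(size=n),
    "income": rng.normal(size=n),
    "student": rng.integers(0, 2, size=n),
})
# In this toy setup the default probability depends only on balance
p = 1 / (1 + np.exp(-(-3 + 2 * demo["balance"].to_numpy())))
y_demo = (rng.random(n) < p).astype(int)

results = []
for seed in (1, 2, 3):
    without = validation_error(demo[["balance", "income"]], y_demo, seed)
    with_dummy = validation_error(
        demo[["balance", "income", "student"]], y_demo, seed)
    results.append((seed, without, with_dummy))
    print(seed, round(without, 4), round(with_dummy, 4))
```

Using the same seed for both models within each iteration keeps the comparison paired: both are scored on the same validation rows.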

Reposted from: http://exqz.baihongyu.com/
