Using Logistic Regression to Classify the Age of Off-Network (Unicom/Telecom) Users
Logistic regression solves binary classification problems efficiently. It can also handle multiclass problems, though not as naturally as KNN, which is inherently a multiclass method; KNN, however, is overly simple, and its applicability is not as good as logistic regression's.

My idea is to use the social circles of China Mobile users to judge whether an off-network (Unicom/Telecom) user is under 25, to see how many young people have actually gone to Unicom. I extracted two features: the average age of the top five contacts in the user's social circle, and the average contact intimacy (sketched below). The model is trained on in-network-to-in-network data, where the true ages are known, and applied to off-network users once the accuracy is good enough.

This technique is useful in plenty of other places, such as predicting whether a user will order a service, is about to churn, or will downgrade their plan. For real-world multiclass user problems, the best tool is a deep neural network, but the computation is too heavy for my low-spec machine. Applying logistic regression in practice also needs far more refinement than textbook data, where a simple run already gets above 90% accuracy. I still need to study other people's optimization methods.
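To make the two features concrete, here is a minimal sketch of how they could be computed per user. Everything in it is my own illustration: the (age, intimacy) contact representation, the choice to rank the circle by intimacy, and the function name are assumptions, not the actual production logic or schema.

# illustrative only: build the two features for one user
# 'contacts' is assumed to be a list of (age, intimacy) pairs for the
# user's in-network contacts; ranking the circle by intimacy is an assumption
def build_features(contacts):
    top5 = sorted(contacts, key=lambda c: c[1], reverse=True)[:5]
    avg_age = sum(c[0] for c in top5) / len(top5)       # feature 1: mean age of top 5
    avg_intimacy = sum(c[1] for c in top5) / len(top5)  # feature 2: mean intimacy
    return [1.0, avg_age, avg_intimacy]                 # leading 1.0 acts as the bias term

# one user's contacts as (age, intimacy) pairs
print(build_features([(22, 0.9), (24, 0.7), (23, 0.8), (45, 0.2), (21, 0.6), (50, 0.1)]))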
I used Python 3.6, while the reference code was written for 2.7. It took a while to understand it and then adapt the code and parameters; I'm still not fluent, so I searched Baidu for every error, and after an hour or so it finally ran. The code and results follow.
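For anyone porting similar code, the 2.7 to 3.6 changes are small but easy to trip over. The exact original 2.7 source isn't shown here, so treat this as an illustrative sketch of the two changes that matter in this kind of script:

# Python 2.7 habits that break under 3.6 (illustrative, not the exact original diff):
#   print 'took %fs' % t    ->  print is a function in Python 3
#   dataIndex = range(n)    ->  range() no longer returns a list, so del fails
print('took %fs' % 1.5)            # 3.x: print as a function
dataIndex = list(range(10))        # 3.x: materialize the range so del works
del dataIndex[3]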
# -*- coding: utf-8 -*-
# Created on Wed Apr 11 19:49:13 2018
from numpy import *
import matplotlib.pyplot as plt
import time

# calculate the sigmoid function
def sigmoid(inX):
    return 1.0 / (1 + exp(-inX))

# train a logistic regression model using some optional optimize algorithm
# input: train_x is a mat datatype, each row stands for one sample
#        train_y is mat datatype too, each row is the corresponding label
#        opts is optimize option, including step size and max number of iterations
def trainLogRegres(train_x, train_y, opts):
    # startTime = time.time()  # training-time measurement, left disabled as in the original
    numSamples, numFeatures = shape(train_x)
    alpha = opts['alpha']
    maxIter = opts['maxIter']
    weights = ones((numFeatures, 1))

    # optimize through a gradient descent algorithm
    for k in range(maxIter):
        if opts['optimizeType'] == 'gradDescent':  # batch gradient descent
            output = sigmoid(train_x * weights)
            error = train_y - output
            weights = weights + alpha * train_x.transpose() * error
        elif opts['optimizeType'] == 'stocGradDescent':  # stochastic gradient descent
            for i in range(numSamples):
                output = sigmoid(train_x[i, :] * weights)
                error = train_y[i, 0] - output
                weights = weights + alpha * train_x[i, :].transpose() * error
        elif opts['optimizeType'] == 'smoothStocGradDescent':  # smooth stochastic GD
            # randomly select samples to optimize for reducing cycle fluctuations
            dataIndex = list(range(numSamples))
            for i in range(numSamples):
                alpha = 4.0 / (1.0 + k + i) + 0.01  # step size decays over iterations
                randIndex = int(random.uniform(0, len(dataIndex)))
                sample = dataIndex[randIndex]  # map back to an unused sample row
                output = sigmoid(train_x[sample, :] * weights)
                error = train_y[sample, 0] - output
                weights = weights + alpha * train_x[sample, :].transpose() * error
                del(dataIndex[randIndex])  # use each sample at most once per pass
        else:
            raise NameError('Not support optimize method type!')
    return weights
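To run it end to end, a small evaluation helper plus a driver is enough. The testLogRegres helper and the toy feature rows below are my own sketch (the original post's test code and real data aren't shown); the columns follow the setup above: a bias 1.0, the average age of the top-5 contacts, and the average intimacy, with label 1 meaning under 25.

# a minimal, illustrative driver: toy data, not the real training set
def testLogRegres(weights, test_x, test_y):
    numSamples, numFeatures = shape(test_x)
    matchCount = 0
    for i in range(numSamples):
        predict = sigmoid(test_x[i, :] * weights)[0, 0] > 0.5
        if predict == bool(test_y[i, 0]):
            matchCount += 1
    return float(matchCount) / numSamples

train_x = mat([[1.0, 23.5, 0.8], [1.0, 41.2, 0.3], [1.0, 22.1, 0.9], [1.0, 55.0, 0.2]])
train_y = mat([[1], [0], [1], [0]])
opts = {'alpha': 0.01, 'maxIter': 200, 'optimizeType': 'smoothStocGradDescent'}
weights = trainLogRegres(train_x, train_y, opts)
print('accuracy on the training mat: %.3f' % testLogRegres(weights, train_x, train_y))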