Python Big Data Analysis Example: An Integrated Data Analysis Workflow in Python


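One advantage of using Python for data analysis is that its ecosystem of analysis libraries is now quite comprehensive: NumPy, pandas, SciPy, scikit-learn, and StatsModels, plus a range of packages for deep learning and neural networks. Together they cover most enterprise use cases. Another advantage is that data extraction, collection and cleaning, analysis and mining, and presentation can all be done in Python, which avoids switching between development environments.

Here I share an application I built. The problem it solves: forecast sales automatically, improve accuracy, and reduce the manual work of forecasting one SKU at a time. The final result is shown in the figure below: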

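1. Tools used

All we need is Python and a few libraries.

pandas: data wrangling

numpy: a pandas dependency, scientific computing

MySQLdb: MySQL database connection

statsmodels: statistical modeling

pylab: plotting

flask: web framework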

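2. Installing Flask

See http://docs.jinkan.org/docs/flask/. In the Flask app directory, create a Python file named forecasting.py, and in the app's templates directory create a template file named forecasting.html. The contents of the two files are as follows: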

forecasting.py

# -*- coding: utf-8 -*-
from app import app
from flask import render_template

@app.route('/forecasting/')
def forecasting(item=None):
    return render_template("forecasting.html")

forecasting.html

Hello World

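Run the following in a DOS window:

python d:\py\flask\run.py

Open http://127.0.0.1:5000/forecasting/ in a browser and you will see the contents of the forecasting.html template.

Next, we build a forecasting model from scratch.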

3. Create the database table and populate it

CREATE TABLE `sale` (
  `SaleMonth` datetime DEFAULT NULL,
  `Sale` float DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Enter the data yourself.
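For readers who want to try the pipeline without a MySQL server, here is a minimal sketch that loads a few months of sample data into an equivalent table. It uses Python's built-in sqlite3 module as a stand-in for MySQL; the sample figures and the helper name load_sample_sales are made up for illustration and are not part of the original program.

```python
import sqlite3

# Hypothetical sample data: (month, sales). These values are illustrative only.
SAMPLE_ROWS = [
    ("2014-01-01", 5200.0),
    ("2014-02-01", 5450.0),
    ("2014-03-01", 5100.0),
    ("2014-04-01", 5800.0),
]


def load_sample_sales(conn, rows):
    """Create a SQLite version of the sale table and insert the given rows."""
    conn.execute("CREATE TABLE IF NOT EXISTS sale (SaleMonth TEXT, Sale REAL)")
    conn.executemany("INSERT INTO sale (SaleMonth, Sale) VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
load_sample_sales(conn, SAMPLE_ROWS)

# Read the rows back ordered by month
query = "select SaleMonth as Month, Sale from sale order by SaleMonth"
rows = conn.execute(query).fetchall()
print(rows[0])
```

With MySQL the same INSERT statements apply, except that MySQLdb uses %s rather than ? as the parameter placeholder.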

4. Importing the required libraries

We now modify the file created in step 2.

Add the following imports at the top of forecasting.py:

# -*- coding: utf-8 -*-
from app import app
from flask import render_template
import pylab
import pandas as pd
import numpy as np
from pandas import Series, DataFrame
import MySQLdb
import pandas.io.sql as sql
import statsmodels.api as sm
import time
import datetime
from dateutil.relativedelta import relativedelta
import random

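5. Defining the route

@app.route('/forecasting/<int:lag>')

This means that a request to an address such as http://127.0.0.1:5000/forecasting/2 is dispatched to forecasting.py, where <int:lag> is the variable part of the URL (the 2 in the URL above).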

6. Defining the function

def forecasting(lag=None):

Here lag receives the parameter from the URL. We define lag as the number of lags in the autoregressive model.

7. Database connection

    conn = MySQLdb.connect(host='127.0.0.1', user='root', passwd='123456', db='bi', charset='utf8')
    str_sql = "select SaleMonth as Month,Sale from sale order by SaleMonth"
    sale = sql.read_sql(str_sql, conn)

8. Data processing

We reshape the data into a form suitable for modeling.

    # Data processing
    # Convert the month column to datetime and make it the pandas index
    sale.Month = pd.to_datetime(sale.Month)
    sale = sale.set_index("Month")
    # Get the earliest and latest months
    start = min(sale.index)
    end = max(sale.index)
    # Forecast months: 1 to 4 months after the latest month
    pre_start = end + relativedelta(months=1)
    pre_end = end + relativedelta(months=4)
    # Convert the dates to strings
    pre_start = pre_start.strftime('%Y-%m-%d')
    pre_end = pre_end.strftime('%Y-%m-%d')
    # Build a month-start time series from the earliest to the latest month
    i = pd.date_range(start, end, freq='MS')
    df = DataFrame(i, i)
    # Name the column and the index
    df.columns = ['T']
    df.index.names = ['Month']
    # Merge sale with df on the index
    rs = pd.merge(sale, df, left_index=True, right_index=True, how='outer')
    # Drop the temporary column T and convert rs to HTML for output to the template
    del rs['T']
    data = rs.to_html()
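The point of merging with the generated monthly index is to make sure every month between the first and last observation appears, even if no sales were recorded for it. The same idea in plain Python, without pandas, looks roughly like this. The function names and sample figures are illustrative assumptions, and gaps are marked with None where pandas would use NaN.

```python
from datetime import date


def month_starts(start, end):
    """Yield the first day of every month from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def fill_missing_months(sales):
    """Return (month, value) pairs for every month in range; gaps become None."""
    first, last = min(sales), max(sales)
    return [(m, sales.get(m)) for m in month_starts(first, last)]


# Hypothetical data with February missing
sales = {date(2014, 1, 1): 5200.0, date(2014, 3, 1): 5100.0}
filled = fill_missing_months(sales)
for month, value in filled:
    print(month.isoformat(), value)
```

Without this step, a missing month would silently shorten the series and shift every lag in the autoregression by one period.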

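9. Forecasting

    # Forecasting
    # Log-transform rs
    rs = np.log(rs)
    # Fit an autoregression on rs; lag is the number of lags, taken from the
    # function's lag parameter, i.e. from the URL
    # (sm.tsa.AR is deprecated in newer statsmodels releases in favor of AutoReg)
    r = sm.tsa.AR(rs).fit(maxlag=lag, method='mle', disp=-1)
    # Forecast through the next four months
    fcst_lg = r.predict(start, pre_end)
    # Exponentiate the forecast to undo the earlier log transform
    fcst = np.exp(fcst_lg)
    # Convert fcst to a pandas DataFrame
    fcst = DataFrame(fcst)
    # Name the column and the index so fcst can be merged with the original data
    fcst.columns = ['fcst']
    fcst.index.names = ['Month']
    # Merge sale and fcst into rs_out
    rs_out = pd.merge(sale, fcst, left_index=True, right_index=True, how='outer')
    # Convert rs_out to records and to HTML so it can be displayed in the template
    # Take the last 4 rows as the forecast output. A slice end is exclusive, so
    # rs_out[-4:-1] drops the last row; rs_out[-4:] keeps it
    rs_fcst = rs_out[-4:]
    rs_fcst = rs_fcst.to_html()
    rs2 = rs_out.to_records()
    rs_out = rs_out.to_html()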
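To make the log-transform, fit, forecast, and back-transform sequence concrete, here is a minimal sketch in plain Python. It is not the statsmodels fit used above: it assumes a single lag and estimates the coefficients by ordinary least squares rather than maximum likelihood, and the function names and sample data are hypothetical.

```python
import math


def fit_ar1(y):
    """Least-squares fit of y[t] = const + phi * y[t-1]; returns (const, phi)."""
    x, z = y[:-1], y[1:]
    n = len(x)
    mean_x, mean_z = sum(x) / n, sum(z) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxz = sum((a - mean_x) * (b - mean_z) for a, b in zip(x, z))
    phi = sxz / sxx
    const = mean_z - phi * mean_x
    return const, phi


def forecast_sales(sales, steps=4):
    """Forecast `steps` periods ahead on the log scale, then transform back."""
    logs = [math.log(s) for s in sales]
    const, phi = fit_ar1(logs)
    out, last = [], logs[-1]
    for _ in range(steps):
        # Each forecast feeds the next one, as in any multi-step AR forecast
        last = const + phi * last
        out.append(math.exp(last))
    return out


# Hypothetical sales growing 10% per month
sales = [100.0 * 1.1 ** t for t in range(12)]
forecast = forecast_sales(sales)
print([round(v, 1) for v in forecast])
```

Because the model is fitted on the log scale, a constant percentage growth becomes a straight line, which an autoregression with one lag can follow exactly.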

10. Preparing the chart data

I used the ECharts web charting framework for display.

    # Convert the table data into the JSON format that ECharts expects
    tmp = u""
    tmp1 = ""
    tmp2 = ""
    tmp3 = ""
    for t in rs2:
        #tmp1 += "{'label':'" + str(t.Month.year)+"/"+str(t.Month.month) + "','value':'" + str(t.Qty) + "'},"
        tmp1 += "'" + str(t.Month.year) + "/" + str(t.Month.month) + "',"
        tmp2 += str('%.0f' % t.Sale) + ","
        tmp3 += str('%.0f' % t.fcst) + ","
    # tmp is reassigned below, so these three lines do not affect the chart
    tmp += "" + tmp1 + ""
    tmp += u"" + tmp2 + ""
    tmp += u"" + tmp3 + "" + ""
    # Strip the trailing commas
    tmp1 = tmp1[:-1]
    tmp2 = tmp2[:-1]
    tmp3 = tmp3[:-1]
    # ECharts treats '-' as a missing point; it must be quoted to be valid JavaScript
    tmp2 = tmp2.replace('nan', "'-'")
    tmp3 = tmp3.replace('nan', "'-'")
    tmp = u'''{
    title : {text: 'Test', subtext: 'Purely fictional'},
    tooltip : {trigger: 'axis'},
    legend: {data:['Actual sales','Forecast sales']},
    toolbox: {
        show : true,
        feature : {
            mark : {show: false}, dataView : {show: true, readOnly: false},
            magicType : {show: true, type: ['line', 'bar']},
            restore : {show: true}, saveAsImage : {show: false}
        }
    },
    calculable : true,
    dataZoom : {show : true, realtime : true, start : 0, end : 100},
    xAxis : [{type : 'category', data : [%s]}],
    yAxis : [{type : 'value', min : 5000, scale : true}],
    series : [
        {
            name:'Actual sales', type:'bar', data:[%s],
            markPoint : {
                data : [{type : 'max', name: 'Max'},{type : 'min', name: 'Min'}]
            },
            markLine : {data : [{type : 'average', name: 'Average'}]}
        },
        {
            name:'Forecast sales', type:'line', data:[%s],
        }
    ]
    };''' % (tmp1, tmp2, tmp3)
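Building JavaScript by string concatenation is easy to get wrong, since a missing quote or brace only shows up in the browser. As an alternative sketch, the option can be assembled as a Python dictionary and serialized with the standard json module. The helper names and sample values below are assumptions for illustration, and only a subset of the chart options is shown.

```python
import json
import math


def to_chart_value(x):
    """Map missing values to '-', which ECharts draws as a gap; NaN is not valid JSON."""
    return "-" if x is None or math.isnan(x) else round(x)


def build_option(months, actual, forecast):
    """Return the chart option as a JSON string ready for the template."""
    option = {
        "tooltip": {"trigger": "axis"},
        "legend": {"data": ["Actual sales", "Forecast sales"]},
        "xAxis": [{"type": "category", "data": months}],
        "yAxis": [{"type": "value", "scale": True}],
        "series": [
            {"name": "Actual sales", "type": "bar",
             "data": [to_chart_value(v) for v in actual]},
            {"name": "Forecast sales", "type": "line",
             "data": [to_chart_value(v) for v in forecast]},
        ],
    }
    return json.dumps(option, ensure_ascii=False)


# Hypothetical values; NaN marks months with no actual or no forecast figure
nan = float("nan")
option_json = build_option(["2014/1", "2014/2", "2014/3"],
                           [5200.0, 5450.0, nan],
                           [nan, 5390.4, 5512.7])
print(option_json)
```

The resulting string can be passed to the template in place of tmp, with the same safe filter so that the quotes are not HTML-escaped.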

11. Generating the formula

A formula shows the relationship between the variables more intuitively.

    # Generate the formula image dynamically
    rp = r.params
    ftext = ''
    const = ''
    i = 0
    for rp1 in rp:
        # Index 0 is the constant; index i >= 1 is the coefficient of y_{t-i}
        if (i == 0) and (rp1 > 0): const = '+' + str(("%.4f" % rp1))
        if (i == 0) and (rp1 < 0): const = str(("%.4f" % rp1))
        if (i == 1): ftext = ftext + str(("%.4f" % rp1)) + 'y_{t-' + str(i) + '}'
        if (i > 1) and (rp1 > 0): ftext = ftext + '+' + str(("%.4f" % rp1)) + 'y_{t-' + str(i) + '}'
        if (i > 1) and (rp1 < 0): ftext = ftext + str(("%.4f" % rp1)) + 'y_{t-' + str(i) + '}'
        i += 1
    f = r'$y_{t}=' + ftext + const + '$'
    f2 = r'$y=ln(w_{t})$'
    fig = pylab.figure()
    # Make the background transparent
    fig.patch.set_alpha(0)
    text = fig.text(0, 0, f)
    # Save the formula as an image
    dpi = 300
    fig.savefig('d:/py/formula.png', dpi=dpi)
    # Now we can work with text's bounding box.
    bbox = text.get_window_extent()
    width, height = bbox.size / float(dpi/4) + 0.005
    # Adjust the figure size so it can hold the entire text.
    fig.set_size_inches((width, height))
    # Adjust text's vertical position.
    dy = (bbox.ymin/float(dpi))/height
    text.set_position((0, -dy))
    # Save the adjusted text.
    url = 'D:/py/Flask/app/static/images/1.png'
    fig.savefig(url, dpi=dpi)
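The string-building part of this step can be checked on its own, without a fitted model or matplotlib. The following is a compact sketch of the same idea using signed format specifiers; ar_formula is a hypothetical helper, and unlike the loop above it also prints coefficients that are exactly zero.

```python
def ar_formula(params):
    """Build the LaTeX string for an AR model.

    params[0] is the constant; params[i] is the coefficient of y_{t-i}.
    """
    terms = ""
    for i, coef in enumerate(params[1:], start=1):
        # The first term needs no leading '+'; later terms always carry a sign
        fmt = "%.4f" if i == 1 else "%+.4f"
        terms += (fmt % coef) + "y_{t-%d}" % i
    return "$y_{t}=" + terms + ("%+.4f" % params[0]) + "$"


# Hypothetical coefficients: constant 0.5, lag-1 0.8, lag-2 -0.3
print(ar_formula([0.5, 0.8, -0.3]))
```

Separating the formatting from the plotting makes it possible to test the formula text before spending time on image rendering.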

12. Output to the template

Pass the results that the template needs from the Python program to the template.

    return render_template("forecasting.html", r=r, rs_out=rs_out, tmp=tmp, lag=lag, f=f, f2=f2, rs_fcst=rs_fcst)

13. Designing the template

We can use {{ variable_name }} to receive variables from the Python program.

Analysis Results

// Path configuration
require.config({
    paths: {
        'echarts' : '/static/ECharts/build/echarts',
        'echarts/chart/bar' : '/static/ECharts/build/echarts',
        'echarts/theme/macarons': '/static/ECharts/src/theme/macarons',
    }
});
require(
    [
        'echarts',
        'echarts/theme/macarons',
        'echarts/chart/bar',  // load the bar module for bar charts (loaded on demand)
        'echarts/chart/line'  // load the line module for line charts (loaded on demand)
    ],
    function (ec, theme) {
        // Initialize the ECharts chart on the prepared DOM element
        var myChart = ec.init(document.getElementById('main'), theme);
        var option = {{tmp | safe}}
        myChart.setOption(option);
    }
);

.right{text-align: right}

body{font-size: 12px;background:white}

Summary of AR Results

Lag length: {{r.k_ar}}

Samples: {{r.nobs}}

Model:

----------------------------------------------------------

AIC: {{'%.4f' % r.aic}}

BIC: {{'%.4f' % r.bic}}

FPE: {{'%.4f' % r.fpe}}

HQIC: {{'%.4f' % r.hqic}}

----------------------------------------------------------

Results for equation

==========================================================

            coefficient    std.error    t-stat    p-value

{% for i in range(lag+1) %}
{% if i==0 %}const{% else %}Y(t-{{i}}){% endif %}    {{'%.4f' % r.params[i]}}    {{'%.4f' % r.bse[i]}}    {{'%.4f' % r.tvalues[i]}}    {{'%.4f' % r.pvalues[i]}}
{% endfor %}

----------------------------------------------------------

Forecast

==========================================================

{{rs_fcst | safe}}

1.png?

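14. Practical application

In this example we forecast only one product, with one model and one parameter.

In practice, you can forecast in batch across products, models, and parameter settings, write an algorithm that scores forecast quality, automatically pick the best model and parameters for each product, and recompute each product's forecast on a schedule.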
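One possible shape for such a selection step is sketched below. It is only an illustration under stated assumptions: simple moving averages with different window sizes stand in for the candidate AR models, the score is the mean absolute percentage error on the last few observations, and the SKU names and figures are invented.

```python
def moving_average_forecast(history, window):
    """Forecast the next value as the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def holdout_mape(series, window, holdout=3):
    """Mean absolute percentage error of one-step forecasts on the last points."""
    errors = []
    for k in range(len(series) - holdout, len(series)):
        # Only data before position k is visible to the forecast
        predicted = moving_average_forecast(series[:k], window)
        errors.append(abs(predicted - series[k]) / abs(series[k]))
    return sum(errors) / len(errors)


def best_window(series, candidates=(1, 2, 3)):
    """Return the candidate window with the lowest holdout error."""
    return min(candidates, key=lambda w: holdout_mape(series, w))


# Hypothetical monthly sales per SKU
sales_by_sku = {
    "SKU-A": [100, 110, 120, 130, 140, 150, 160, 170],  # steady upward trend
    "SKU-B": [100, 140, 100, 140, 100, 140, 100, 140],  # alternating pattern
}
choices = {sku: best_window(series) for sku, series in sales_by_sku.items()}
print(choices)
```

The same loop structure carries over to real models: replace moving_average_forecast with a function that fits an autoregression for a given lag, and let the holdout error decide the lag for each product.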

I hope this approach is helpful.

Published by: 全栈程序员-用户IM. Please credit the source when reposting: https://javaforall.cn/139854.html. Original link: https://javaforall.cn
