
How to fetch historical stock data from tushare and write it into your own MySQL database

Go to https://tushare.pro/register?reg=414428 and register for free to obtain a tushare token. With the token you can download a wide range of financial data.

1. The method tushare recommends

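If you need the historical data of every stock, tushare suggests fetching it one trading day at a time. The tushare API returns at most 5000 records per call, and the A-share market currently lists a little over 3000 stocks, so a single request for one day's data stays within the API's record limit.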

The code is as follows:

import tushare as ts
pro = ts.pro_api()
df = pro.daily(trade_date='20200325')

Looping over the dates then yields the historical data of all stocks.
The dates themselves can be obtained from the trading calendar:

# Get every date between 20200101 and 20200401 on which the exchange was open
df = pro.trade_cal(exchange='SSE', is_open='1',
                   start_date='20200101',
                   end_date='20200401',
                   fields='cal_date')
print(df.head())

Output:

   cal_date
0  20200102
1  20200103
2  20200106
3  20200107
4  20200108

To keep data retrieval stable, tushare suggests first writing a dedicated function that implements a retry mechanism, as in the code below:

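import time

def get_daily(ts_code='', trade_date='', start_date='', end_date=''):
    # Try up to 3 times, pausing 1 second after each failed call
    for _ in range(3):
        try:
            if trade_date:
                df = pro.daily(ts_code=ts_code, trade_date=trade_date)
            else:
                df = pro.daily(ts_code=ts_code, start_date=start_date, end_date=end_date)
        except Exception:
            time.sleep(1)
        else:
            return df
    # All 3 attempts failed
    return None

Then fetch the data in a loop:

# df is the trading calendar obtained above
for date in df['cal_date'].values:
    daily_df = get_daily(trade_date=date)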
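
The retry idea above can also be factored into a small reusable helper, so that any API call can be wrapped the same way. The following is only a minimal sketch: the name call_with_retry and its parameters (attempts, delay, sleep) are hypothetical and are not part of tushare.

```python
import time


def call_with_retry(func, attempts=3, delay=1.0, sleep=time.sleep):
    """Call func() until it succeeds or `attempts` tries are used up.

    Returns func()'s result; if every try fails, re-raises the last error.
    `sleep` is injectable so the helper can be tested without waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:  # broad on purpose: network errors vary
            last_error = exc
            if attempt < attempts:
                sleep(delay)  # pause before the next try
    raise last_error
```

With the pro object from earlier, a call would look something like call_with_retry(lambda: pro.daily(trade_date='20200325')). Unlike the function above, this sketch raises the last error instead of returning None, so a failed day cannot be silently skipped.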

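The approach above uses pro.daily(). tushare now also provides a newer "general market data interface", pro_bar(), which already has a retry mechanism built in, so ts.pro_bar() can be called directly to fetch historical data.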

2. Fetching data for individual stocks

If you do not need that much data and only want the full history of a few stocks, you can fetch by ts_code instead.
Use ts.pro_bar() in place of pro.daily().

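The previous article, "Learning Python and want to connect to MySQL, but have no practice data?", already saved the basic stock information in the MySQL database. This article reads each company's listing date from the stock_basic table.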

2.1. Add the table stock_all to the database stock

Navicat makes the table easier to inspect. Set ts_code + trade_date as the primary key to avoid duplicate rows.
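
For readers who prefer to create the table from code rather than a GUI, here is a rough sketch of what the table definition might look like. The column names follow the INSERT statement used by the script in section 2.2; the column types and lengths are assumptions, and the names CREATE_STOCK_ALL and create_stock_all are hypothetical.

```python
# Hypothetical schema for stock_all. Column names match the INSERT statement
# in the script; the types (VARCHAR(10), CHAR(8), DOUBLE) are assumptions.
CREATE_STOCK_ALL = """
CREATE TABLE IF NOT EXISTS `stock_all` (
    `ts_code`    VARCHAR(10) NOT NULL,
    `trade_date` CHAR(8)     NOT NULL,
    `open`       DOUBLE,
    `high`       DOUBLE,
    `low`        DOUBLE,
    `close`      DOUBLE,
    `pre_close`  DOUBLE,
    `change`     DOUBLE,
    `pct_chg`    DOUBLE,
    `vol`        DOUBLE,
    `amount`     DOUBLE,
    `adj_factor` DOUBLE,
    PRIMARY KEY (`ts_code`, `trade_date`)
)
"""


def create_stock_all(cursor):
    """Create the table through any DB-API cursor, for example pymysql's."""
    cursor.execute(CREATE_STOCK_ALL)
```

The composite primary key is what rejects a second row for the same stock on the same day. Note that `change` is a reserved word in MySQL, which is why the column names are wrapped in backticks.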

2.2. The code is as follows

# -*- coding: utf-8 -*-
#
# Author: wxb
# Purpose: initialize the table stock_all; data source: tushare; API documentation: https://waditu.com/document/2?doc_id=109
# Latest Version: V1.0 @ 2021/1/10 15:20
# File: init_stock_all.py

import sys
import time
import pymysql
import tushare as ts


def str_date_to_num(str_date):
    # Convert a 'YYYYMMDD' date string to an integer number of seconds
    tmp_date = time.strptime(str_date, "%Y%m%d")
    second = int(time.mktime(tmp_date))
    return second

def num_to_str_date(second):
    # Convert an integer number of seconds back to a 'YYYYMMDD' date string
    tmp_date = time.localtime(second)
    str_date = time.strftime("%Y%m%d", tmp_date)
    return str_date


if __name__ == '__main__':
    # Set the tushare token and initialize the API
    ts.set_token('your tushare token')
    pro = ts.pro_api()
    # Open the database connection
    db = pymysql.connect(host='127.0.0.1', user='root', password='your MySQL password', database='stock', charset='utf8')
    cursor = db.cursor()
    # Two stock codes used as an example
    stock_pool = ['000001.SZ', '000002.SZ']
    for tscode in stock_pool:
        # Read the listing date from the stock_basic table
        # (parameterized query, so the code is never spliced into the SQL text)
        sql_query = 'SELECT `list_date` FROM stock_basic WHERE `ts_code` = %s'
        cursor.execute(sql_query, (tscode,))
        data = cursor.fetchone()
        if not data:
            print(f'Failed to get the list_date of "{tscode}"')
            continue
        # Use the listing date as the start date
        start_date = data[0]
        # Use today's date as the end date
        end_date = time.strftime('%Y%m%d', time.localtime())
        print(f'"{tscode}" : {start_date} - {end_date}')

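        # The tushare API returns at most 5000 records per call, so fetch in several batches
        # Convert the date strings to numbers so they can be used in arithmetic
        s_dt = str_date_to_num(start_date) - 1 * (24 * 60 * 60)
        e_dt = str_date_to_num(end_date)
        # Each batch spans 5000 calendar days; since that includes non-trading days,
        # no batch can return more than 5000 records
        # The converted dates are in seconds, so multiply by 24 hours * 3600 seconds/hour
        step = 5000 * (24 * 60 * 60)
        tot_records = 0    # number of records written to the database
        tot_rows = 0    # number of records read from the API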
        for dt in range(s_dt, e_dt, step):
            sdate = num_to_str_date(dt + 1 * (24 * 60 * 60))
            t_dt1 = dt + step
            t_dt2 = t_dt1 if t_dt1 < e_dt else e_dt
            edate = num_to_str_date(t_dt2)
            try:
                # Fetch the stock's daily bars and also request the adjustment factor
                df = ts.pro_bar(ts_code=tscode, start_date=sdate, end_date=edate, adj='', adjfactor=True)
            except Exception as e:
                print(e)
                continue
            # If no data came back, move on to the next date range
            if df is None or df.empty:
                continue
            # `rows` records were returned; write them to the database
            rows = df.shape[0]
            tot_rows += rows
            print('"%s" : %4s records returned for date range %s - %s' % (tscode, rows, sdate, edate))
            # Turn each row into a plain tuple; iterating by position does not
            # depend on how the DataFrame's index happens to be labelled
            values = list(df.itertuples(index=False, name=None))
            try:
                sql_insert = 'INSERT INTO `stock_all` (`ts_code`, `trade_date`, `open`, `high`, `low`, `close`, `pre_close`, `change`, `pct_chg`, `vol`, `amount`, `adj_factor`) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)'
                tot_records += cursor.executemany(sql_insert, values)
                db.commit()
            except Exception as e:
                print(e)
                db.rollback()
        print(f'"{tscode}" : {tot_records} of {tot_rows} records were inserted.\n')
    cursor.close()
    db.close()
    print('Done!')
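
The script above splits the date range by converting dates to seconds. As an alternative, the same batching can be sketched with the standard datetime module, which avoids the manual seconds arithmetic. This is only an illustration: the function name date_chunks and its days parameter are hypothetical and do not come from tushare.

```python
from datetime import datetime, timedelta


def date_chunks(start_date, end_date, days=5000):
    """Yield (start, end) 'YYYYMMDD' pairs covering start_date..end_date.

    Each pair spans at most `days` calendar days (both ends inclusive),
    and consecutive pairs do not overlap.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    fmt = "%Y%m%d"
    start = datetime.strptime(start_date, fmt).date()
    end = datetime.strptime(end_date, fmt).date()
    while start <= end:
        # Last day of this batch, capped at the overall end date
        chunk_end = min(start + timedelta(days=days - 1), end)
        yield start.strftime(fmt), chunk_end.strftime(fmt)
        # The next batch starts the day after this one ends
        start = chunk_end + timedelta(days=1)
```

Each yielded pair could then be passed as start_date and end_date to ts.pro_bar(), in the same way the script uses sdate and edate.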