A TensorFlow Implementation of the DCNN (A Convolutional Neural Network for Modelling Sentences)



In this post I walk through my TensorFlow implementation of the DCNN from the paper A Convolutional Neural Network for Modelling Sentences:

https://arxiv.org/abs/1404.2188

My code is on GitHub: https://github.com/jacky123465/DCNN (it runs directly under Python 3.x).

1. Data preprocessing

The preprocessing code lives in DataUnits.py.

def clean_str(string): normalizes the raw text. It strips characters other than letters, digits and a few punctuation marks, and puts spaces around tokens such as commas, question marks and exclamation marks so they become separate tokens.

# imports used by the functions in DataUnits.py
import re
import chardet
import numpy as np


def clean_str(string):
    string = re.sub(r"[^A-Za-z0-9:(),!?\'\`]", " ", string)
    string = re.sub(r" : ", ":", string)
    string = re.sub(r"\'s", " \'s", string)
    string = re.sub(r"\'ve", " \'ve", string)
    string = re.sub(r"n\'t", " n\'t", string)
    string = re.sub(r"\'re", " \'re", string)
    string = re.sub(r"\'d", " \'d", string)
    string = re.sub(r"\'ll", " \'ll", string)
    string = re.sub(r",", " , ", string)
    string = re.sub(r"!", " ! ", string)
    string = re.sub(r"\(", " \( ", string)
    string = re.sub(r"\)", " \) ", string)
    string = re.sub(r"\?", " \? ", string)
    string = re.sub(r"\s{2,}", " ", string)
    return string.strip().lower()
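A quick sanity check of what the cleaning does (my own toy example, not from the repo). Note that the backslash in replacement strings such as " \? " is kept literally, so a question mark becomes the token \?:

print(clean_str("What is Jane's favorite movie?"))
# -> what is jane 's favorite movie \?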

load_data_and_labels(): reads x and y from the data files. The label y is also converted into a one-hot vector: with six classes, the labels become 000001, 000010, 000100, 001000, 010000, 100000.

def load_data_and_labels():
    """
    Loads data from files, splits the data into words and generates labels.
    Returns split sentences and labels.
    """
    # Load data from files
    folder_prefix = 'data/'
    # The files are opened in binary mode ('rb') so the script also works under Python 3;
    # each line is then decoded with the encoding detected by chardet.
    x_train = list(open(folder_prefix+"train", 'rb').readlines())
    x_test = list(open(folder_prefix+"test", 'rb').readlines())
    test_size = len(x_test)
    x_text = x_train + x_test
    le = len(x_text)
    for i in range(le):
        encode_type = chardet.detect(x_text[i])
        x_text[i] = x_text[i].decode(encode_type['encoding'])  # decode the line and store it back
    x_text = [clean_str(sent) for sent in x_text]
    y = [s.split(' ')[0].split(':')[0] for s in x_text]
    x_text = [s.split(" ")[1:] for s in x_text]
    # Generate labels
    all_label = dict()
    for label in y:
        if not label in all_label:
            all_label[label] = len(all_label) + 1
    one_hot = np.identity(len(all_label))
    y = [one_hot[ all_label[label]-1 ] for label in y]
    return [x_text, y, test_size]
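The label-to-one-hot logic can be checked in isolation. A minimal sketch with made-up coarse labels (the real data has six classes):

import numpy as np

labels = ["DESC", "ENTY", "DESC", "NUM"]       # coarse labels as they appear before the question text
all_label = dict()
for label in labels:
    if label not in all_label:
        all_label[label] = len(all_label) + 1  # first distinct label gets index 1, the next 2, ...
one_hot = np.identity(len(all_label))          # one identity row per distinct label
y = [one_hot[all_label[label] - 1] for label in labels]
print(y[0])   # [1. 0. 0.]  -> "DESC"
print(y[3])   # [0. 0. 1.]  -> "NUM"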

def pad_sentences(sentences, padding_word="<PAD/>"): since the sentences have different lengths, this pads every sentence to the length of the longest one.

def pad_sentences(sentences, padding_word="<PAD/>"):
    """
    Pads all sentences to the same length. The length is defined by the longest sentence.
    Returns padded sentences.
    """
    sequence_length = max(len(x) for x in sentences)
    padded_sentences = []
    for i in range(len(sentences)):
        sentence = sentences[i]
        num_padding = sequence_length - len(sentence)
        new_sentence = sentence + [padding_word] * num_padding
        padded_sentences.append(new_sentence)
    return padded_sentences
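For example (toy input of my own):

print(pad_sentences([["what", "is", "dcnn"], ["hello"]]))
# -> [['what', 'is', 'dcnn'], ['hello', '<PAD/>', '<PAD/>']]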

load_data: the main entry point for loading the data.

def load_data():
    """
    Loads and preprocessed data
    Returns input vectors, labels, vocabulary, and inverse vocabulary.
    """
    # Load and preprocess data
    sentences, labels, test_size = load_data_and_labels()
    sentences_padded = pad_sentences(sentences)
    vocabulary, vocabulary_inv = build_vocab(sentences_padded)
    x, y = build_input_data(sentences_padded, labels, vocabulary)
    return [x, y, vocabulary, vocabulary_inv, test_size]
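build_vocab and build_input_data are not reproduced in this post. The versions in the repo follow the usual pattern; the sketch below is my reconstruction rather than a verbatim copy, so check the repo for the exact code:

import itertools
from collections import Counter
import numpy as np

def build_vocab(sentences):
    # Index words from most to least frequent; key = word, value = index.
    word_counts = Counter(itertools.chain(*sentences))
    vocabulary_inv = [word for word, _ in word_counts.most_common()]
    vocabulary = {word: i for i, word in enumerate(vocabulary_inv)}
    return vocabulary, vocabulary_inv

def build_input_data(sentences, labels, vocabulary):
    # Replace every word by its index and stack everything into numpy arrays.
    x = np.array([[vocabulary[word] for word in sentence] for sentence in sentences])
    y = np.array(labels)
    return x, y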

batch_iter: splits the data into batches of size batch_size and feeds them to the network for training:

def batch_iter(data, batch_size, num_epochs):
    """
    Generates a batch iterator for a dataset.
    """
    data = np.array(data)
    data_size = len(data)
    num_batches_per_epoch = int(len(data)/batch_size) + 1
    for epoch in range(num_epochs):
        # Shuffle the data at each epoch
        shuffle_indices = np.random.permutation(np.arange(data_size))
        shuffled_data = data[shuffle_indices]
        for batch_num in range(num_batches_per_epoch):
            start_index = batch_num * batch_size
            end_index = (batch_num + 1) * batch_size
            if end_index > data_size:
                end_index = data_size
                start_index = end_index - batch_size
            yield shuffled_data[start_index:end_index]
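A toy run (my own example) shows the behaviour, including the quirk that the last batch of an epoch re-uses a few examples because the batch count is rounded up:

data = list(zip(range(5), [0, 1, 0, 1, 0]))    # 5 (x, y) pairs
for batch in batch_iter(data, batch_size=2, num_epochs=1):
    print(batch.shape)                         # (2, 2) three times; the third batch overlaps the second by one row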

2. The model

It implements the convolution, k-max pooling, folding and fully connected layers described in the paper.
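In the paper, the pooling parameter of intermediate layers is chosen dynamically as k_l = max(k_top, ceil((L - l) / L * s)), where L is the number of convolution layers, l the current layer and s the sentence length. This implementation hard-codes the value instead, but for the settings used later (two layers, sentences padded to 37 tokens, k_top = 4) the formula gives the same number, which is presumably where k1 = 19 comes from. A hypothetical helper of my own, just to show the arithmetic:

import math

def dynamic_k(l, num_layers, sentence_length, k_top):
    # k_l = max(k_top, ceil((L - l) / L * s)) from the DCNN paper
    return max(k_top, math.ceil((num_layers - l) / num_layers * sentence_length))

print(dynamic_k(1, 2, 37, 4))   # 19, matching the hard-coded k1 = 19 below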

# slightly modified version of the model
import tensorflow as tf

class DCNN():
    def __init__(self, batch_size, sentence_length, num_filters, embed_size, top_k, k1):
        self.batch_size = batch_size
        self.sentence_length = sentence_length
        self.num_filters = num_filters
        self.embed_size = embed_size
        self.top_k = top_k
        self.k1 = k1

    def per_dim_conv_k_max_pooling_layer(self, x, w, b, k):
        self.k1 = k
        input_unstack = tf.unstack(x, axis=2)
        w_unstack = tf.unstack(w, axis=1)
        b_unstack = tf.unstack(b, axis=1)
        convs = []
        with tf.name_scope("per_dim_conv_k_max_pooling"):
            for i in range(self.embed_size):
                conv = tf.nn.relu(tf.nn.conv1d(input_unstack[i], w_unstack[i], stride=1, padding="SAME") + b_unstack[i])
                #conv:[batch_size, sent_length+ws-1, num_filters]
                conv = tf.reshape(conv, [self.batch_size, self.num_filters[0], self.sentence_length])#[batch_size, sentence_length, num_filters]
                values = tf.nn.top_k(conv, k, sorted=False).values
                values = tf.reshape(values, [self.batch_size, k, self.num_filters[0]])
                #k_max pooling in axis=1
                convs.append(values)
            conv = tf.stack(convs, axis=2)
        #[batch_size, k1, embed_size, num_filters[0]]
        #print conv.get_shape()
        return conv

    def per_dim_conv_layer(self, x, w, b):
        input_unstack = tf.unstack(x, axis=2)
        w_unstack = tf.unstack(w, axis=1)
        b_unstack = tf.unstack(b, axis=1)
        convs = []
        with tf.name_scope("per_dim_conv"):
            for i in range(len(input_unstack)):
                #yf = input_unstack[i]
                conv = tf.nn.relu(tf.nn.conv1d(input_unstack[i], w_unstack[i], stride=1, padding="SAME") + b_unstack[i])#[batch_size, k1+ws2-1, num_filters[1]]
                convs.append(conv)
            conv = tf.stack(convs, axis=2)
            #[batch_size, k1+ws-1, embed_size, num_filters[1]]
        return conv

    # Added function: plain k-max pooling without folding, so folding is only applied after the second convolution
    def k_max_pooling(self, x, k):
        input_unstack = tf.unstack(x, axis=2)
        out = []
        with tf.name_scope("k_max_pooling"):
            for i in range(len(input_unstack)):
                conv = tf.transpose(input_unstack[i], perm=[0, 2, 1])
                values = tf.nn.top_k(conv, k, sorted=False).values
                values = tf.transpose(values, perm=[0, 2, 1])
                out.append(values)
            fold = tf.stack(out, axis=2)
        return fold

    def fold_k_max_pooling(self, x, k):
        input_unstack = tf.unstack(x, axis=2)
        out = []
        with tf.name_scope("fold_k_max_pooling"):
            for i in range(0, len(input_unstack), 2):
                fold = tf.add(input_unstack[i], input_unstack[i+1])#[batch_size, k1, num_filters[1]]
                conv = tf.transpose(fold, perm=[0, 2, 1])
                values = tf.nn.top_k(conv, k, sorted=False).values #[batch_size, num_filters[1], top_k]
                values = tf.transpose(values, perm=[0, 2, 1])
                out.append(values)
            fold = tf.stack(out, axis=2)#[batch_size, k2, embed_size/2, num_filters[1]]
        return fold

    def full_connect_layer(self, x, w, b, wo, dropout_keep_prob):
        with tf.name_scope("full_connect_layer"):
            h = tf.nn.tanh(tf.matmul(x, w) + b)
            h = tf.nn.dropout(h, dropout_keep_prob)
            o = tf.matmul(h, wo)
        return o

    def DCNN(self, sent, W1, W2, b1, b2, k1, top_k, Wh, bh, Wo, dropout_keep_prob):
        conv1 = self.per_dim_conv_layer(sent, W1, b1)
        conv1 = self.k_max_pooling(conv1, k1)
        #conv1 = self.fold_k_max_pooling(conv1, k1)
        conv2 = self.per_dim_conv_layer(conv1, W2, b2)
        fold = self.fold_k_max_pooling(conv2, top_k)
        # int() cast added so the flattened size is an integer under Python 3 (the /2 is because folding halves the embedding axis)
        #fold_flatten = tf.reshape(fold, [-1, int(top_k * self.embed_size * self.num_filters[1] / 4)])
        fold_flatten = tf.reshape(fold, [-1, int(top_k * self.embed_size * self.num_filters[1] / 2)])
        #fold_flatten = tf.reshape(fold, [-1, int(top_k*100*14/4)])
        print(fold_flatten.get_shape())
        out = self.full_connect_layer(fold_flatten, Wh, bh, Wo, dropout_keep_prob)
        return out
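To see how the tensors flow through DCNN(), here are the shapes at each stage with the hyper-parameters used in compile.py (batch_size=40, sentence_length=37, embed_size=32, num_filters=[6, 14], ws=[8, 5], k1=19, top_k=4). The shapes are my own annotation; note that with padding="SAME" the convolutions keep the sequence length rather than producing the paper's wide-convolution length s + ws - 1:

# sent (embedded input)           [40, 37, 32, 1]
# per_dim_conv_layer with W1      [40, 37, 32, 6]
# k_max_pooling(k1=19)            [40, 19, 32, 6]
# per_dim_conv_layer with W2      [40, 19, 32, 14]
# fold_k_max_pooling(top_k=4)     [40,  4, 16, 14]   (folding halves the embedding axis: 32 -> 16)
# reshape                         [40, 4 * 16 * 14] = [40, 896]
# full_connect_layer              [40, 100] -> [40, 6]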

3. The main script

The main script is compile.py. It is essentially initialisation plus the calls into the model. Every parameter is annotated so readers can tune them themselves.


#coding=utf8
from models import *
import dataUtils
import numpy as np
import time
import os


class train():
    def __init__(self, x_train, x_dev, y_train, y_dev, x_test, y_test):
        self.x_train = x_train
        self.x_dev = x_dev
        self.y_train = y_train
        self.y_dev = y_dev
        self.x_test = x_test
        self.y_test = y_test

    def train(self):
        embed_dim = 32            # embedding dimension
        ws = [8, 5]               # filter widths of the two convolution layers
        top_k = 4                 # k of the final k-max pooling layer
        k1 = 19                   # k of the first k-max pooling layer
        num_filters = [6, 14]     # number of feature maps per convolution layer
        batch_size = 40
        n_epochs = 25
        num_hidden = 100          # size of the fully connected hidden layer
        sentence_length = 37      # padded sentence length
        num_class = 6             # number of label classes
        evaluate_every = 200      # evaluate on the dev set every 200 steps
        checkpoint_every = 200    # report the best dev result every 200 steps
        num_checkpoints = 5       # maximum number of checkpoints kept by the Saver
        # --------------------------------------------------------------------------------------#
        def init_weights(shape, name):
            return tf.Variable(tf.truncated_normal(shape, stddev=0.01), name=name)

        sent = tf.placeholder(tf.int64, [None, sentence_length])
        y = tf.placeholder(tf.float64, [None, num_class])
        dropout_keep_prob = tf.placeholder(tf.float32, name="dropout")

        with tf.name_scope("embedding_layer"):
            W = tf.Variable(tf.random_uniform([len(vocabulary), embed_dim], -1.0, 1.0), name="embed_W")
            sent_embed = tf.nn.embedding_lookup(W, sent)
            # input_x = tf.reshape(sent_embed, [batch_size, -1, embed_dim, 1])
            input_x = tf.expand_dims(sent_embed, -1)
            # [batch_size, sentence_length, embed_dim, 1]

        W1 = init_weights([ws[0], embed_dim, 1, num_filters[0]], "W1")
        b1 = tf.Variable(tf.constant(0.1, shape=[num_filters[0], embed_dim]), "b1")
        # int() added for Python 3
        # W2 = init_weights([ws[1], int(embed_dim/2), num_filters[0], num_filters[1]], "W2")
        W2 = init_weights([ws[1], int(embed_dim), num_filters[0], num_filters[1]], "W2")
        b2 = tf.Variable(tf.constant(0.1, shape=[num_filters[1], embed_dim]), "b2")
        # int() added for Python 3
        # Wh = init_weights([int(top_k*embed_dim*num_filters[1]/4), num_hidden], "Wh")
        Wh = init_weights([int(top_k * embed_dim * num_filters[1] / 2), num_hidden], "Wh")
        bh = tf.Variable(tf.constant(0.1, shape=[num_hidden]), "bh")
        Wo = init_weights([num_hidden, num_class], "Wo")

        model = DCNN(batch_size, sentence_length, num_filters, embed_dim, top_k, k1)
        out = model.DCNN(input_x, W1, W2, b1, b2, k1, top_k, Wh, bh, Wo, dropout_keep_prob)

        with tf.name_scope("cost"):
            cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=out, labels=y))
        # train_step = tf.train.AdamOptimizer(lr).minimize(cost)
        predict_op = tf.argmax(out, axis=1, name="predictions")
        with tf.name_scope("accuracy"):
            acc = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(y, 1), tf.argmax(out, 1)), tf.float32))
        # -------------------------------------------------------------------------------------------#
        print('Started training')
        with tf.Session() as sess:
            # init = tf.global_variables_initializer().run()
            global_step = tf.Variable(0, name="global_step", trainable=False)
            # optimizer and learning rate
            optimizer = tf.train.AdamOptimizer(1e-3)
            grads_and_vars = optimizer.compute_gradients(cost)
            train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)
            # Keep track of gradient values and sparsity
            grad_summaries = []
            for g, v in grads_and_vars:
                if g is not None:
                    grad_hist_summary = tf.summary.histogram("{}/grad/hist".format(v.name), g)
                    sparsity_summary = tf.summary.scalar("{}/grad/sparsity".format(v.name), tf.nn.zero_fraction(g))
                    grad_summaries.append(grad_hist_summary)
                    grad_summaries.append(sparsity_summary)
            grad_summaries_merged = tf.summary.merge(grad_summaries)
            # Output directory for models and summaries
            timestamp = str(int(time.time()))
            out_dir = os.path.abspath(os.path.join(os.path.curdir, "runs", timestamp))
            print("Writing to {}\n".format(out_dir))
            # Summaries for loss and accuracy
            loss_summary = tf.summary.scalar("loss", cost)
            acc_summary = tf.summary.scalar("accuracy", acc)
            # Train summaries
            train_summary_op = tf.summary.merge([loss_summary, acc_summary, grad_summaries_merged])
            train_summary_dir = os.path.join(out_dir, "summaries", "train")
            train_summary_writer = tf.summary.FileWriter(train_summary_dir, sess.graph)
            # Dev summaries
            dev_summary_op = tf.summary.merge([loss_summary, acc_summary])
            dev_summary_dir = os.path.join(out_dir, "summaries", "dev")
            dev_summary_writer = tf.summary.FileWriter(dev_summary_dir, sess.graph)
            # Checkpoint directory. TensorFlow assumes this directory already exists, so we need to create it
            checkpoint_dir = os.path.abspath(os.path.join(out_dir, "checkpoints"))
            checkpoint_prefix = os.path.join(checkpoint_dir, "model")
            if not os.path.exists(checkpoint_dir):
                os.makedirs(checkpoint_dir)
            saver = tf.train.Saver(tf.global_variables(), max_to_keep=num_checkpoints)
            # Initialize all variables
            sess.run(tf.global_variables_initializer())

            def train_step(x_batch, y_batch):
                feed_dict = {
                    sent: x_batch,
                    y: y_batch,
                    dropout_keep_prob: 0.5
                }
                _, step, summaries, loss, accuracy = sess.run(
                    [train_op, global_step, train_summary_op, cost, acc],
                    feed_dict)
                print("TRAIN step {}, loss {:g}, acc {:g}".format(step, loss, accuracy))
                train_summary_writer.add_summary(summaries, step)

            def dev_step(x_batch, y_batch, writer=None):
                """
                Evaluates model on a dev set
                """
                feed_dict = {
                    sent: x_batch,
                    y: y_batch,
                    dropout_keep_prob: 1.0
                }
                step, summaries, loss, accuracy = sess.run(
                    [global_step, dev_summary_op, cost, acc],
                    feed_dict)
                print("VALID step {}, loss {:g}, acc {:g}".format(step, loss, accuracy))
                if writer:
                    writer.add_summary(summaries, step)
                return accuracy, loss

            # list() added so zip() works under Python 3
            batches = dataUtils.batch_iter(list(zip(self.x_train, self.y_train)), batch_size, n_epochs)
            # Training loop. For each batch...
            max_acc = 0
            best_at_step = 0
            for batch in batches:
                x_batch, y_batch = zip(*batch)
                train_step(x_batch, y_batch)
                current_step = tf.train.global_step(sess, global_step)
                if current_step % evaluate_every == 0:
                    print("\nEvaluation:")
                    acc_dev, _ = dev_step(self.x_dev, self.y_dev, writer=dev_summary_writer)
                    if acc_dev >= max_acc:
                        max_acc = acc_dev
                        best_at_step = current_step
                        path = saver.save(sess, checkpoint_prefix, global_step=current_step)
                    print("")
                if current_step % checkpoint_every == 0:
                    print('Best of valid = {}, at step {}'.format(max_acc, best_at_step))
            saver.restore(sess, checkpoint_prefix + '-' + str(best_at_step))
            print('Finish training. On test set:')
            acc, loss = dev_step(self.x_test, self.y_test, writer=None)
            print(acc, loss)


dev = 300
# Load data
print("Loading data...")
x_, y_, vocabulary, vocabulary_inv, test_size = dataUtils.load_data()
# x_: np.array of length 5952 (5452 training sentences plus 500 test sentences);
#     every sentence is padded to length 37 (the padding token has index 0)
# y_: np.array of length 5952; each entry is a one-hot vector of length 6 giving the class
# vocabulary: dict of size 8789, i.e. the corpus contains 8789 distinct words; key = word, value = index
# vocabulary_inv: list of size 8789, ordered by word frequency: <PAD/>, \?, the, what, is, of, in, a, ...
# test_size: 500, the size of the test set
# Randomly shuffle data
x, x_test = x_[:-test_size], x_[-test_size:]
y, y_test = y_[:-test_size], y_[-test_size:]
shuffle_indices = np.random.permutation(np.arange(len(y)))
x_shuffled = x[shuffle_indices]
y_shuffled = y[shuffle_indices]
x_train, x_dev = x_shuffled[:-dev], x_shuffled[-dev:]
y_train, y_dev = y_shuffled[:-dev], y_shuffled[-dev:]
print("Train/Dev/Test split: {:d}/{:d}/{:d}".format(len(y_train), len(y_dev), len(y_test)))

yf = train(x_train, x_dev, y_train, y_dev, x_test, y_test)
yf.train()
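With the numbers above, 5952 sentences minus 500 held out for test and 300 for dev leaves 5152 for training, so the split line should read Train/Dev/Test split: 5152/300/500. To reproduce, place the train and test files under data/ as in the repo and run compile.py with Python 3.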

This is the first time I have written such a long post; if anything is unclear or wrong, corrections are welcome.
