Trading Using Google Trends

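These are my study notes on Chapter 6 of the book Mastering pandas for Finance. The chapter introduces a simple trading strategy driven by Google search volume for the term "debt". The data is weekly: if this week's search volume for "debt" is above its average over the previous three weeks, sell the entire position (the code below models this as a short position); otherwise buy with the entire position. In the backtest below, the strategy performs well over the sample period.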

Preparation

The following code is what I use as my template for Python scripts:

Python
import pandas as pd
import numpy as np
import tushare as ts
import matplotlib.pyplot as plt
from matplotlib import rcParams
from matplotlib.font_manager import FontProperties

pro = ts.pro_api('your Tushare token')

# Chinese font
cnfont = FontProperties(fname='/Library/Fonts/Songti.ttc', size=14)
# English font
enfont = FontProperties(fname='/Users/czx/Library/Fonts/RobotoSlab-Regular.ttf',
                        size=14)
# Keep the minus sign '-' from rendering as a box
rcParams['axes.unicode_minus'] = False
rcParams['savefig.dpi'] = 300  # resolution of saved figures
rcParams['figure.dpi'] = 300  # resolution of displayed figures

The data from the paper

The strategy was first proposed in the paper Quantifying Trading Behavior in Financial Markets Using Google Trends, which published the dataset it used: PreisMoatStanley2013.dat

Python
paper = pd.read_csv('PreisMoatStanley2013.dat', \
delimiter = ' ', parse_dates = [0, 1, 100, 101])
paper[:5]
#> Google Start Date ... DJIA Closing Price
#> 0 2004-01-04 ... 10485.18
#> 1 2004-01-11 ... 10528.66
#> 2 2004-01-18 ... 10702.51
#> 3 2004-01-25 ... 10499.18
#> 4 2004-02-01 ... 10579.03
#> [5 rows x 102 columns]
Python
data = pd.DataFrame({
'GoogleWE': paper['Google End Date'],
'debt': paper['debt'].astype(np.float64),
'DJIADate': paper['DJIA Date'],
'DJIAClose': paper['DJIA Closing Price'].astype(np.float64)
})
data[:5]
#> GoogleWE debt DJIADate DJIAClose
#> 0 2004-01-10 0.210000 2004-01-12 10485.18
#> 1 2004-01-17 0.210000 2004-01-20 10528.66
#> 2 2004-01-24 0.210000 2004-01-26 10702.51
#> 3 2004-01-31 0.213333 2004-02-02 10499.18
#> 4 2004-02-07 0.200000 2004-02-09 10579.03

Gathering our own DJIA data from Alpha Vantage

We can also collect the Dow Jones Industrial Average data ourselves:

Python
import pandas_datareader.data as web
from datetime import datetime
# Note: newer pandas-datareader versions may name this keyword api_key
djia = web.DataReader("DIA", "av-daily",
                      access_key = 'your Alpha Vantage API key',
                      start = datetime(2004, 1, 1), end = datetime(2011, 3, 5))
djia_closes = djia['close'].reset_index()
djia_closes[:3]
#> index close
#> 0 2004-01-02 104.37
#> 1 2004-01-05 105.58
#> 2 2004-01-06 105.50

Merge the paper's data with the data we collected ourselves:

Python
djia_closes['Date'] = pd.to_datetime(djia_closes['index'])
data = pd.merge(data, djia_closes,
left_on = 'DJIADate', right_on = 'Date')
data[:3]
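
Because pd.merge defaults to an inner join, any week whose DJIADate has no matching price row is dropped without warning. The sketch below uses a small made-up table (the dates follow the listing above, the prices are invented) to show one way of checking for such rows with the how and indicator parameters of pd.merge.

```python
import pandas as pd

# Toy stand-ins for the paper's table and the downloaded prices.
# The close values are made up for illustration.
left = pd.DataFrame({
    "DJIADate": pd.to_datetime(["2004-01-12", "2004-01-20", "2004-01-26"]),
    "debt": [0.21, 0.21, 0.21],
})
right = pd.DataFrame({
    "Date": pd.to_datetime(["2004-01-12", "2004-01-13", "2004-01-20"]),
    "close": [105.0, 105.5, 106.0],
})

# Default how="inner": only dates present in both frames survive.
inner = pd.merge(left, right, left_on="DJIADate", right_on="Date")

# how="left" keeps every left row; indicator=True adds a "_merge" column
# that marks rows without a matching price as "left_only".
checked = pd.merge(left, right, left_on="DJIADate", right_on="Date",
                   how="left", indicator=True)
unmatched = checked[checked["_merge"] == "left_only"]

print(len(inner))
print(unmatched["DJIADate"].tolist())
```

If unmatched is empty on the real data, the inner join used above loses nothing.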

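Compare the DJIA data we collected with the DJIA data provided with the paper. DIA is an ETF that tracks the index and trades at roughly 1/100 of the index level, so its closing price is multiplied by 100 first: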

Python
data['close'] = data['close'] * 100
data[['DJIAClose', 'close']].plot()
plt.savefig('DJIA.svg')
plt.show()

Python
(data['DJIAClose'] - data['close']).describe()
#> count 373.000000
#> mean -1.924906
#> std 20.575912
#> min -156.550000
#> 25% -12.680000
#> 50% -0.980000
#> 75% 8.430000
#> max 135.770000
#> dtype: float64
Python
# Correlation coefficient
data[['DJIAClose', 'close']].corr()
#> DJIAClose close
#> DJIAClose 1.000000 0.999902
#> close 0.999902 1.000000

We can also download the search-term data ourselves from Google Trends: trends_report_debt.csv

Python
our_debt_trends = pd.read_csv('trends_report_debt.csv')
our_debt_trends.Week = pd.to_datetime(our_debt_trends.Week)

final = pd.merge(data.reset_index(), our_debt_trends,
left_on = 'GoogleWE', right_on = 'Week',
suffixes = ['_P', '_O'])
final.drop('Week', inplace = True, axis = 1)
final.set_index('Date', inplace = True)
final[:5]
#> level_0 GoogleWE debt_P ... index close debt_O
#> Date ...
#> 2004-01-12 0 2004-01-10 0.210000 ... 2004-01-12 10517.0 63
#> 2004-01-20 1 2004-01-17 0.210000 ... 2004-01-20 10546.0 60
#> 2004-01-26 2 2004-01-24 0.210000 ... 2004-01-26 10718.0 61
#> 2004-02-02 3 2004-01-31 0.213333 ... 2004-02-02 10530.0 63
#> 2004-02-09 4 2004-02-07 0.200000 ... 2004-02-09 10612.0 61
#> [5 rows x 8 columns]
Python
combined_trends = final[['GoogleWE', 'debt_P', 'debt_O']].set_index('GoogleWE')
combined_trends[:5]
#> debt_P debt_O
#> GoogleWE
#> 2004-01-10 0.210000 63
#> 2004-01-17 0.210000 60
#> 2004-01-24 0.210000 61
#> 2004-01-31 0.213333 63
#> 2004-02-07 0.200000 61

Compare the Google Trends data we downloaded with the data provided with the paper:

Python
combined_trends.corr()
#> debt_P debt_O
#> debt_P 1.00000 0.95766
#> debt_O 0.95766 1.00000

Python
fig, ax1 = plt.subplots(figsize = (6, 4))
ax1.plot(combined_trends.index, combined_trends.debt_P, color = 'b')
ax2 = ax1.twinx()
ax2.plot(combined_trends.index, combined_trends.debt_O, color = 'r')
plt.savefig('googleweAndDIA.svg')
plt.show()
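
The two series live on very different scales (debt_P is around 0.2, debt_O around 60), which is why the plot above needs two y-axes. An alternative, sketched here on the first five rows shown earlier, is to standardize each column to z-scores so both can share one axis. This is only an illustration and is not part of the book's code; the Pearson correlation is unaffected by the rescaling.

```python
import pandas as pd

# First five rows of combined_trends, copied from the listing above.
trends = pd.DataFrame({
    "debt_P": [0.210000, 0.210000, 0.210000, 0.213333, 0.200000],
    "debt_O": [63, 60, 61, 63, 61],
})

# z-score: subtract each column's mean, divide by its standard deviation.
zscores = (trends - trends.mean()) / trends.std()

# Correlation before and after rescaling is the same.
corr_raw = trends["debt_P"].corr(trends["debt_O"])
corr_z = zscores["debt_P"].corr(zscores["debt_O"])

print(zscores.round(3))
print(abs(corr_raw - corr_z) < 1e-9)
```

On the full data, zscores.plot() would then draw both series against a single axis.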

Generating order signals

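Our trading strategy is: if this week's search volume for "debt" is above its average over the previous three weeks, sell the entire position (signal -1); otherwise buy with the entire position (signal 1).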

Python
base = final.reset_index().set_index('GoogleWE')
base.drop(['DJIAClose'], inplace = True, axis = 1)
base['PMA'] = base['debt_P'].shift(1).rolling(3).mean()
base['OMA'] = base['debt_O'].shift(1).rolling(3).mean()
base[:5]
#> Date level_0 debt_P ... debt_O PMA OMA
#> GoogleWE ...
#> 2004-01-10 2004-01-12 0 0.210000 ... 63 NaN NaN
#> 2004-01-17 2004-01-20 1 0.210000 ... 60 NaN NaN
#> 2004-01-24 2004-01-26 2 0.210000 ... 61 NaN NaN
#> 2004-01-31 2004-02-02 3 0.213333 ... 63 0.210000 61.333333
#> 2004-02-07 2004-02-09 4 0.200000 ... 61 0.211111 61.333333
#> [5 rows x 9 columns]

base['signal0'] = 0
base.loc[base.debt_P > base.PMA, 'signal0'] = -1
base.loc[base.debt_P < base.PMA, 'signal0'] = 1
base['signal1'] = 0
base.loc[base.debt_O > base.OMA, 'signal1'] = -1
base.loc[base.debt_O < base.OMA, 'signal1'] = 1
base[['debt_P', 'PMA', 'signal0', 'debt_O', 'OMA', 'signal1']][:5]
#> debt_P PMA signal0 debt_O OMA signal1
#> GoogleWE
#> 2004-01-10 0.210000 NaN 0 63 NaN 0
#> 2004-01-17 0.210000 NaN 0 60 NaN 0
#> 2004-01-24 0.210000 NaN 0 61 NaN 0
#> 2004-01-31 0.213333 0.210000 -1 63 61.333333 -1
#> 2004-02-07 0.200000 0.211111 1 61 61.333333 1
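
The shift(1) before rolling(3) matters: it makes the moving average cover only the three weeks before the current one, so the current observation is compared against past data only. The following minimal sketch reproduces the signal1 column for the first five debt_O values shown above, using np.sign as a compact alternative to the two .loc assignments (a stylistic choice, not the book's code).

```python
import numpy as np
import pandas as pd

# First five debt_O values from the table above.
debt = pd.Series([63, 60, 61, 63, 61], dtype=float)

# Mean of the three PREVIOUS weeks: shift first, then roll.
ma = debt.shift(1).rolling(3).mean()

# -1 when above the average (sell), +1 when below (buy), 0 when equal
# or when the average is not yet defined (NaN).
signal = (-np.sign(debt - ma)).fillna(0).astype(int)

print(signal.tolist())  # [0, 0, 0, -1, 1]
```

Without the shift, rolling(3).mean() would include the current week in its own benchmark, which dampens the comparison.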

Computing returns

Python
base['PctChg'] = base.close.pct_change().shift(-1)
base['ret0'] = base.PctChg * base.signal0
base['ret1'] = base.PctChg * base.signal1
base[['close', 'PctChg', 'signal0', 'signal1', 'ret0', 'ret1']][:5]
#> close PctChg signal0 signal1 ret0 ret1
#> GoogleWE
#> 2004-01-10 10517.0 0.002757 0 0 0.000000 0.000000
#> 2004-01-17 10546.0 0.016310 0 0 0.000000 0.000000
#> 2004-01-24 10718.0 -0.017541 0 0 -0.000000 -0.000000
#> 2004-01-31 10530.0 0.007787 -1 -1 -0.007787 -0.007787
#> 2004-02-07 10612.0 0.013098 1 1 0.013098 0.013098
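
The .shift(-1) aligns each week's signal with the return earned over the following week: pct_change() at row t measures the move from t-1 to t, so shifting it up by one row places the move from t to t+1 next to the signal formed at t. Here is a small sketch using the five closing prices and signals from the tables above.

```python
import pandas as pd

# Closing prices and signals copied from the first five rows above.
close = pd.Series([10517.0, 10546.0, 10718.0, 10530.0, 10612.0])
signal = pd.Series([0, 0, 0, -1, 1])

# Return from this row to the next one (the last row has no successor).
forward_ret = close.pct_change().shift(-1)

# Strategy return: +return when long, -return when short, 0 when flat.
strategy_ret = forward_ret * signal

print(forward_ret.round(6).tolist())
print(strategy_ret.round(6).tolist())
```

In this five-row toy series the last forward return is NaN because there is no following week; in the full dataset only the final row is affected.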

Cumulative returns and the result of strategy

Python
base['cumret0'] = (1 + base.ret0).cumprod() - 1
base['cumret1'] = (1 + base.ret1).cumprod() - 1
base[['cumret0', 'cumret1']].plot()
plt.show()
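
To judge whether the strategy adds anything, it helps to compound a buy-and-hold benchmark the same way. The sketch below uses made-up weekly returns and signals (not values from the dataset) and also fills the trailing NaN left by shift(-1) with 0 so that the last cumulative value is defined.

```python
import numpy as np
import pandas as pd

# Made-up forward returns and signals, for illustration only.
pct_chg = pd.Series([0.01, -0.02, 0.03, np.nan])  # last row has no next week
signal = pd.Series([1, -1, 1, 1])

# Treat the undefined final return as 0 so compounding is not interrupted.
strategy_ret = (pct_chg * signal).fillna(0)
buy_hold_ret = pct_chg.fillna(0)

# Compound: cumulative return after n weeks is prod(1 + r_i) - 1.
strategy_cum = (1 + strategy_ret).cumprod() - 1
buy_hold_cum = (1 + buy_hold_ret).cumprod() - 1

print(round(strategy_cum.iloc[-1], 6))
print(round(buy_hold_cum.iloc[-1], 6))
```

With the real data, the benchmark would be (1 + base.PctChg.fillna(0)).cumprod() - 1, plotted alongside cumret0 and cumret1.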
