
This post is the first note on *Master Pandas for finance*. It introduces the basics of the Series and DataFrame objects.

It's very easy to use `numpy` to create a pandas Series:

`import numpy as np`

Individual elements of a Series can be retrieved using the `[]` operator of the Series object. The item with the index label 2 can be retrieved using the following code:

`s[2]`
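As a minimal sketch of both steps (the values are illustrative and not taken from the original post), a Series can be built from a NumPy array and then queried by label:

```python
import numpy as np
import pandas as pd

# Build a Series from a NumPy array; the values are illustrative
s = pd.Series(np.arange(10, 15))

# The default index labels are 0..4, so s[2] looks up the label 2
third = s[2]
print(third)  # 12
```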

The `.head()` and `.tail()` methods are provided by pandas to examine just the first or the last few records in a Series. By default, these return the first five or last five rows, respectively, but you can use the `n` parameter or just pass in an integer to specify the number of rows.

`s.head()`
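A short sketch of the three calling styles, assuming a nine-element Series made up for illustration:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(1, 10))  # illustrative values 1..9

first_five = s.head()     # default: first five rows
last_three = s.tail(n=3)  # n given as a keyword
first_two = s.head(2)     # n given positionally

print(first_five.tolist())  # [1, 2, 3, 4, 5]
print(last_three.tolist())  # [7, 8, 9]
```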

`## Retrieve index`

Note the type of index and values:

`type(s.index)`
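To make the note concrete, here is a small sketch with an illustrative Series that uses the default index. The index is a pandas object, while the values are exposed as a NumPy array:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4])  # illustrative data, default index

index_type = type(s.index)    # a pandas Index subclass
values_type = type(s.values)  # numpy.ndarray

print(index_type)
print(values_type)
```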

`s2 = pd.Series([1, 2, 3, 4], index = ['a', 'b', 'c', 'd'])`

`s3 = pd.Series({'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e':5})`

`s = pd.Series([10, 0, 1, 1, 2, 3, 4, 5, np.nan])`

`s.shape`

`s.count()`

`s.unique()`

`.value_counts()` also excludes NaN:

`s.value_counts()`
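The following sketch reuses the Series defined above to show how each of these calls treats the NaN entry. The variable names are introduced here for illustration only:

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 0, 1, 1, 2, 3, 4, 5, np.nan])

n_rows = s.shape[0]        # NaN still occupies a row
n_valid = s.count()        # count() skips NaN
uniques = s.unique()       # unique values, NaN included
counts = s.value_counts()  # frequency table, NaN dropped by default

print(n_rows, n_valid)  # 9 8
print(len(uniques))     # 8
print(len(counts))      # 7
```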

`s3 = pd.Series([1, 2, 3, 4], index = ['a', 'b', 'c', 'd'])`

The process of adding two Series objects differs from adding arrays: it first aligns the data on the index label values instead of simply applying the operation to elements in the same position.

`s3 + s4`

For NumPy arrays, by contrast:

`a1 = np.array([1, 2, 3, 4])`
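A side-by-side sketch of the two behaviours. The definition of `s4` and the array `a2` are assumptions, since the original snippets are truncated:

```python
import numpy as np
import pandas as pd

s3 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
# Assumed for illustration: the same labels in reverse order
s4 = pd.Series([4, 3, 2, 1], index=['d', 'c', 'b', 'a'])

aligned = s3 + s4  # matched by label: a+a, b+b, c+c, d+d

a1 = np.array([1, 2, 3, 4])
a2 = np.array([4, 3, 2, 1])  # assumed for illustration
positional = a1 + a2         # matched by position

print(aligned['a'], aligned['d'])  # 2 8
print(positional.tolist())         # [5, 5, 5, 5]
```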

The most straightforward way to create a DataFrame is from a NumPy array:

`pd.DataFrame(np.array([[10, 11], [20, 21]]))`

A DataFrame can also be initialized by passing a list of Series objects:

`df1 = pd.DataFrame([pd.Series(np.arange(10, 15)),`

Its dimensions:

`df1.shape`
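A minimal sketch of this construction. The second Series is an assumption, because the original snippet is cut off after the first one:

```python
import numpy as np
import pandas as pd

# The second Series is hypothetical; the original code is truncated
df1 = pd.DataFrame([pd.Series(np.arange(10, 15)),
                    pd.Series(np.arange(15, 20))])

# Each Series becomes one row, so two Series of five values give 2 x 5
print(df1.shape)  # (2, 5)
```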

Column names can be specified when creating the DataFrame using the `columns` parameter of the DataFrame constructor:

`df = pd.DataFrame(np.array([[10, 11], [20, 21]]), columns = ['a', 'b'])`

Change the column names:

`df.columns = ['c1', 'c2']`

`df = pd.DataFrame(np.array([[0, 1], [2, 3]]), columns=['c1', 'c2'], index = ['r1', 'r2'])`
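Putting the pieces together, a short sketch that names both axes at creation time and then renames the columns. The replacement names `x` and `y` are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[0, 1], [2, 3]]),
                  columns=['c1', 'c2'], index=['r1', 'r2'])

# Assigning to .columns relabels in place; the list length must match
df.columns = ['x', 'y']  # hypothetical new names

value = df.loc['r2', 'y']  # row label first, then column label
print(value)  # 3
```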

A DataFrame can also be created by passing a dictionary containing one or more Series objects, where the dictionary keys contain the column names and each Series is one column of data.

`s1 = pd.Series(np.arange(1, 6, 1))`

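*Note: when a DataFrame is created directly from a list of arrays or Series, each array or Series becomes a row; when it is created from a dictionary, each one becomes a column.*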
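The row-versus-column difference can be seen in a small sketch. The second Series `s2` and the column names are assumptions made for illustration:

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.arange(1, 6, 1))
s2 = pd.Series(np.arange(6, 11, 1))  # hypothetical second Series

by_column = pd.DataFrame({'c1': s1, 'c2': s2})  # dict: each Series is a column
by_row = pd.DataFrame([s1, s2])                 # list: each Series is a row

print(by_column.shape)  # (5, 2)
print(by_row.shape)     # (2, 5)
```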

A DataFrame also does automatic alignment of the data for each Series passed in by a dictionary:

`s3 = pd.Series(np.arange(12, 14), index=[1, 2])`
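A sketch of the alignment, assuming `s1` holds five values with the default labels 0 to 4 and using made-up column names. Labels missing from `s3` are filled with NaN:

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.arange(1, 6, 1))               # labels 0..4
s3 = pd.Series(np.arange(12, 14), index=[1, 2])  # labels 1 and 2 only

# Rows are matched by index label, not by position
df = pd.DataFrame({'c1': s1, 'c3': s3})

print(df.shape)                # (5, 2)
print(df['c3'].isna().sum())   # 3
```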

First, I use the `TuShare` package to create a DataFrame `df`:

`import tushare as ts`

`df.date.head(3)`

`df[:3]`

If a DataFrame has the default integer index, where each label equals its position, `.loc[]` and `.iloc[]` select the same row:

`df.loc[14]`
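The equivalence only holds while labels and positions coincide. The sketch below uses made-up prices instead of the TuShare data to show both the matching case and what happens once the rows are reordered:

```python
import pandas as pd

# Hypothetical prices with the default integer index 0..4
df = pd.DataFrame({'open': [9.8, 10.1, 10.4, 9.9, 10.2]})

# Label 3 sits at position 3, so both lookups return the same row
same = df.loc[3].equals(df.iloc[3])

# After sorting, the labels travel with the rows but positions do not
shuffled = df.sort_values('open')
still_same = shuffled.loc[3].equals(shuffled.iloc[3])

print(same, still_same)  # True False
```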

It’s possible to look up the location in index of a specific label value, which can be used to retrieve the row(s):

`i1 = df.index.get_loc('2018-06-20')`
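A self-contained sketch of this lookup. The DataFrame below is a hypothetical stand-in for the TuShare data, with date strings as index labels and invented prices:

```python
import pandas as pd

# Hypothetical stand-in for the TuShare data
df = pd.DataFrame({'open': [10.2, 10.5, 10.1]},
                  index=['2018-06-19', '2018-06-20', '2018-06-21'])

i1 = df.index.get_loc('2018-06-20')  # integer position of the label
row = df.iloc[i1]                    # same row as df.loc['2018-06-20']

print(i1)  # 1
```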

Selecting rows by the index label and/or location - .ix[]

The `.ix[]` indexer is deprecated and was removed in pandas 1.0. Use `.loc[]` for label-based indexing or `.iloc[]` for positional indexing. ~~df.ix[['2018-06-20', '2018-06-21']]~~ ~~df.ix[[1, 2, 3]]~~
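The two struck-out calls can be rewritten roughly as follows. The DataFrame is again a hypothetical stand-in for the TuShare data:

```python
import pandas as pd

# Hypothetical stand-in for the TuShare data
df = pd.DataFrame(
    {'open': [10.2, 10.5, 10.1, 9.9]},
    index=['2018-06-19', '2018-06-20', '2018-06-21', '2018-06-22'],
)

by_label = df.loc[['2018-06-20', '2018-06-21']]  # label-based selection
by_position = df.iloc[[1, 2, 3]]                 # positional selection

print(by_label['open'].tolist())  # [10.5, 10.1]
```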

`df.at['2018-06-20', 'open']`

`df.open < 10`

`np.random.seed(123456)`

The following demonstrates subtraction along the column axis of a DataFrame; the `.sub()` method subtracts the A column from every column:

`df`
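A minimal sketch of that subtraction. The 5 x 4 shape and the column names A to D are assumptions, since the original construction of `df` is not shown:

```python
import numpy as np
import pandas as pd

np.random.seed(123456)
# Assumed shape and column names; the data is random
df = pd.DataFrame(np.random.randn(5, 4), columns=['A', 'B', 'C', 'D'])

# axis=0 aligns df['A'] with the row index, so column A is
# subtracted from every column, row by row
result = df.sub(df['A'], axis=0)

print((result['A'] == 0).all())  # True
```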

The process of performing a reindex does the following:

- Reorders existing data to match a set of labels;
- Inserts NaN markers where no data exists for a label;
- Fills missing data for a label using a type of logic (defaulting to adding NaNs).

`np.random.seed(1)`
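The steps above can be sketched on a small Series. The labels and the new label list are chosen for illustration:

```python
import numpy as np
import pandas as pd

np.random.seed(1)
# Hypothetical labels a..e on five random values
s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])

# 'e' and 'a' are reordered, 'z' has no data and becomes NaN,
# and 'b', 'c', 'd' are dropped because they are not requested
r = s.reindex(['e', 'a', 'z'])

print(list(r.index))  # ['e', 'a', 'z']
```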

Greater flexibility in creating a new index is provided by the `.reindex()` method. One example of the flexibility of `.reindex()` over assigning the `.index` property directly is that the list provided to `.reindex()` can be of a different length than the number of rows in the Series:

`s2 = s.reindex(['a', 'c', 'e', 'g'])`
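To illustrate the contrast, the sketch below assumes a five-row Series with labels a to e. Reindexing with four labels works, while assigning four labels to `.index` raises a `ValueError`:

```python
import pandas as pd

# Hypothetical five-row Series
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

s2 = s.reindex(['a', 'c', 'e', 'g'])  # four labels: fine, 'g' becomes NaN

try:
    s.index = ['a', 'c', 'e', 'g']    # four labels for five rows
    assigned = True
except ValueError:
    assigned = False                  # length mismatch is rejected

print(len(s2), assigned)  # 4 False
```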

Note that the two Series below cannot be aligned and added, because their index types do not match:

`s1 = pd.Series([0, 1, 2], index = [0, 1, 2])`

Once the index is converted to a numeric index, the two Series can be added:

`s2`
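A sketch of the mismatch and the fix. The definition of `s2` with string labels is an assumption based on the surrounding notes, and depending on the pandas version the first addition may emit a warning about unsortable labels:

```python
import pandas as pd

s1 = pd.Series([0, 1, 2], index=[0, 1, 2])
# Assumed: the same labels stored as strings
s2 = pd.Series([3, 4, 5], index=['0', '1', '2'])

# The integer 0 and the string '0' are different labels, so nothing matches
mismatched = s1 + s2

# Convert the string labels to integers, then add again
s2.index = s2.index.astype(int)
matched = s1 + s2

print(matched.tolist())  # [3, 5, 7]
```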

The default action of inserting NaN as a missing value during `.reindex()` can be changed using the `fill_value` parameter of the method. The following command demonstrates using 0 instead of NaN:

`s`
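A minimal sketch of `fill_value`, using a small made-up Series because the original command is truncated:

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])  # hypothetical data

# 'f' is not in the original index, so it receives 0 instead of NaN
filled = s.reindex(['a', 'f'], fill_value=0)

print(filled.tolist())  # [1, 0]
```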

`Forward fill`: fill using the previous element.

The following command demonstrates forward filling, often referred to as the last known value. The Series is reindexed to create a continuous integer index, and with the `method='ffill'` parameter, any new index labels are assigned the previously seen value along the Series. Here's the command:

`s3 = pd.Series(['red', 'green', 'blue'], index = [0, 3, 5])`
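The forward fill can be sketched as follows. The target range 0 to 6 is taken from the backward-fill command later in the post and is otherwise an assumption:

```python
import numpy as np
import pandas as pd

s3 = pd.Series(['red', 'green', 'blue'], index=[0, 3, 5])

# New labels 1, 2, 4 and 6 take the most recent earlier value
ffilled = s3.reindex(np.arange(0, 7), method='ffill')

print(ffilled.tolist())
# ['red', 'red', 'red', 'green', 'green', 'blue', 'blue']
```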

`Backward fill`: fill using the next value; if there is none, fill with a missing value.

`s3.reindex(np.arange(0, 7), method = 'bfill')`
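A self-contained sketch of the same command shows the trailing gap. Label 6 has no later value to borrow from, so it stays missing:

```python
import numpy as np
import pandas as pd

s3 = pd.Series(['red', 'green', 'blue'], index=[0, 3, 5])

# New labels take the next available value; label 6 has none
bfilled = s3.reindex(np.arange(0, 7), method='bfill')

print(bfilled.iloc[:6].tolist())
# ['red', 'green', 'green', 'green', 'blue', 'blue']
```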
