python - Pandas: why pandas.Series.std() is quite different from numpy.std()

- March 15, 2012

I have the following two code snippets.

    import numpy
    numpy.std([766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346])
    # 0.0

and

    import pandas as pd
    pd.Series([766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346]).std(ddof=0)
    # 10.119288512538814

That's a huge difference.

May I ask why?

This issue is indeed under discussion (link); the problem seems to be that the algorithm pandas uses to calculate the standard deviation is not as numerically stable as the one used by numpy.
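To illustrate what "numerically stable" means here, the following is a minimal pure-Python sketch. It is not taken from pandas or numpy; the function names naive_variance and two_pass_variance are made up for this example. It compares a one-pass "mean of squares minus square of mean" formula with a two-pass formula that subtracts the mean first. With values this large, the squares exceed what a 64-bit float can represent exactly, so the one-pass result may come out nonzero (or even slightly negative), while the two-pass result is exactly zero for identical inputs.

```python
def naive_variance(values):
    """One-pass population variance: mean(x^2) - mean(x)^2.

    Illustrative only; prone to cancellation when the values are
    large compared with their spread.
    """
    n = len(values)
    total = 0.0
    total_sq = 0.0
    for v in values:
        x = float(v)
        total += x
        total_sq += x * x  # squares of large values get rounded here
    mean = total / n
    return total_sq / n - mean * mean


def two_pass_variance(values):
    """Two-pass population variance: subtract the mean, then square."""
    n = len(values)
    mean = sum(float(v) for v in values) / n
    return sum((float(v) - mean) ** 2 for v in values) / n


data = [766897346] * 10

print("one-pass:", naive_variance(data))     # may be nonzero due to rounding
print("two-pass:", two_pass_variance(data))
```

Both functions agree on small, well-scaled inputs; they only diverge when the squared values are too large to be stored exactly.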

An easy workaround is to apply .values to the Series first and then call std on those values; in that case numpy's std is used:

    pd.Series([766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346]).values.std()

which gives the expected value 0.
