python - Pandas: why is pandas.Series.std() quite different from numpy.std()?
I have two code snippets, as follows.
import numpy
numpy.std([766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346])
0
and
import pandas as pd
pd.Series([766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346]).std(ddof=0)
10.119288512538814
That's a huge difference.
May I ask why?
This issue is indeed under discussion (link); the problem seems to be that the algorithm for calculating the standard deviation used in pandas is not as numerically stable as the one used in numpy.
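To illustrate what "not numerically stable" can mean here, the sketch below compares two textbook ways of computing a population variance in floating point. It is not taken from the pandas or numpy sources; the function names naive_one_pass_variance and two_pass_variance are hypothetical, and the one-pass formula is only an assumption about the kind of algorithm that loses precision on large, nearly identical values.

```python
import math


def naive_one_pass_variance(values):
    """Population variance as mean(x^2) - mean(x)^2, accumulated in floats.

    Illustrative only: this is a textbook formula known to be unstable,
    not necessarily what pandas used.
    """
    n = len(values)
    total = 0.0
    total_sq = 0.0
    for v in values:
        x = float(v)
        total += x
        # x * x is about 5.9e17 for the data below, which exceeds 2**53,
        # so each square is already rounded before it is added.
        total_sq += x * x
    mean = total / n
    # Subtracting two huge, nearly equal numbers exposes the rounding error.
    return total_sq / n - mean * mean


def two_pass_variance(values):
    """Population variance computed from deviations around the mean."""
    n = len(values)
    mean = sum(float(v) for v in values) / n
    return sum((float(v) - mean) ** 2 for v in values) / n


data = [766897346] * 10

# May be non-zero (or even negative) because of rounding.
print(naive_one_pass_variance(data))

# Deviations from the mean are exactly zero, so this prints 0.0.
print(math.sqrt(two_pass_variance(data)))
```

The two-pass version subtracts the mean before squaring, so for identical inputs every deviation is exactly zero and no large intermediate values are involved.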
An easy workaround would be to apply .values to the series first, and then apply std to these values; in that case numpy's std is used:
pd.Series([766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346]).values.std()
which gives the expected value 0.