python - Pandas: why is pandas.Series.std() quite different from numpy.std()?
I have the two code snippets that follow.

    import numpy
    numpy.std([766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346])

which returns 0, and

    import pandas as pd
    pd.Series([766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346]).std(ddof=0)

which returns 10.119288512538814. That's a huge difference.
May I ask why?
This issue is indeed under discussion (link); the problem seems to be that the algorithm pandas uses to calculate the standard deviation is not as numerically stable as the one used by numpy.
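To illustrate what "numerically stable" means here, the sketch below compares two textbook ways of computing a population standard deviation in plain Python. It is only an illustration under assumptions: the function names naive_std and two_pass_std are made up for this example, and neither is claimed to be the actual code inside pandas or numpy. The one-pass version subtracts two huge, nearly equal numbers (the mean of the squares and the square of the mean), so float64 rounding error can survive into the result; the two-pass version subtracts the mean first, so identical inputs give deviations of exactly zero.

```python
import math


def naive_std(values):
    """Population std via mean(x^2) - mean(x)^2, accumulated in one pass.

    Illustrative only: this formula is known to be numerically fragile.
    """
    n = 0
    total = 0.0
    total_sq = 0.0
    for v in values:
        x = float(v)  # force float64 arithmetic, as array libraries do
        n += 1
        total += x
        total_sq += x * x  # squares of large values exceed 2**53 and get rounded
    mean = total / n
    variance = total_sq / n - mean * mean
    # Rounding can push the difference slightly below zero; clamp before sqrt.
    return math.sqrt(max(variance, 0.0))


def two_pass_std(values):
    """Population std computed from deviations around the mean (two passes)."""
    xs = [float(v) for v in values]
    mean = sum(xs) / len(xs)
    variance = sum((x - mean) ** 2 for x in xs) / len(xs)
    return math.sqrt(variance)


data = [766897346] * 10
naive_result = naive_std(data)      # may be nonzero because of rounding
stable_result = two_pass_std(data)  # every deviation is exactly zero
print(naive_result, stable_result)
```

For ten identical values the two-pass result is exactly 0.0, while the one-pass result depends on how the rounding errors happen to accumulate.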
An easy workaround is to apply .values to the series first and then apply std to those values; in that case numpy's std is used:

    pd.Series([766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346]).values.std()

which gives the expected value of 0.