Suppose we have two datasets containing $$n1$$ and $$n2$$ data points respectively, and we know the mean and variance of each: $$\mu_1$$, $$\sigma_1^2$$ and $$\mu_2$$, $$\sigma_2^2$$. If we combine the two datasets into a single one, what is the variance of the combined dataset?

I found a solution on the Internet; here is the formula.
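
One commonly cited form, assumed in this sketch and possibly laid out differently from the one I found (with $\mu$ the mean of the combined dataset), is:

$$
\sigma^2 = \frac{n1\left(\sigma_1^2 + (\mu_1 - \mu)^2\right) + n2\left(\sigma_2^2 + (\mu_2 - \mu)^2\right)}{n1+n2}, \qquad \mu = \frac{n1\mu_1 + n2\mu_2}{n1+n2}
$$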

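Now I will prove it. Recall that $Var[x] = E[x^2] - (E[x])^2$, so for each dataset $E[x_i^2] = \sigma_i^2 + \mu_i^2$. Let $\mu = \frac{n1\mu_1+n2\mu_2}{n1+n2}$ be the mean of the combined dataset.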

Therefore

$\sigma^2 = \frac{n1E[x_1^2]+n2E[x_2^2]}{n1+n2} - (E[x])^2 = \frac{n1E[x_1^2]+n2E[x_2^2]}{n1+n2} - \mu^2 = \frac{n1(\sigma_1^2 + \mu_1^2)+n2(\sigma_2^2 + \mu_2^2)}{n1+n2} - \mu^2$

Then we expand the first formula
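
As a sketch, assuming the first formula is the commonly cited form $\sigma^2 = \frac{n1(\sigma_1^2 + (\mu_1-\mu)^2) + n2(\sigma_2^2 + (\mu_2-\mu)^2)}{n1+n2}$ with $\mu$ the mean of the combined dataset, multiplying through by $n1+n2$ and expanding the squares gives:

$$
\begin{aligned}
(n1+n2)\,\sigma^2 &= n1\sigma_1^2 + n2\sigma_2^2 + n1(\mu_1-\mu)^2 + n2(\mu_2-\mu)^2 \\
&= n1(\sigma_1^2+\mu_1^2) + n2(\sigma_2^2+\mu_2^2) - 2\mu\,(n1\mu_1+n2\mu_2) + (n1+n2)\,\mu^2
\end{aligned}
$$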

The interesting point is
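
A plausible reading of this point, assuming $\mu$ denotes the mean of the combined dataset, is the identity below. It turns a cross term $-2\mu\,(n1\mu_1+n2\mu_2)$ into $-2(n1+n2)\mu^2$. Combined with a $+(n1+n2)\mu^2$ term, only $-(n1+n2)\mu^2$ remains, and dividing by $n1+n2$ leaves $-\mu^2$, the same trailing term obtained from $Var[x] = E[x^2] - (E[x])^2$.

$$
n1\mu_1 + n2\mu_2 = (n1+n2)\,\mu
$$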

So I think the formula from the Internet is correct.
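
The result can also be checked numerically. The following is a minimal sketch, not a definitive implementation: the helper name `combined_variance` is hypothetical, the sample data is randomly generated for illustration, and it assumes population variance (dividing by $n$, not $n-1$), which is what $Var[x] = E[x^2] - (E[x])^2$ describes.

```python
import random
import statistics


def combined_variance(n1, mu1, var1, n2, mu2, var2):
    """Population variance of two datasets merged into one.

    Hypothetical helper: uses E[x^2] = var + mu^2 for each dataset,
    then subtracts the square of the combined mean.
    """
    n = n1 + n2
    mu = (n1 * mu1 + n2 * mu2) / n  # mean of the combined dataset
    second_moment = (n1 * (var1 + mu1 ** 2) + n2 * (var2 + mu2 ** 2)) / n
    return second_moment - mu ** 2


# Illustrative data only; any two lists of numbers would do.
random.seed(0)
a = [random.gauss(0.0, 1.0) for _ in range(50)]
b = [random.gauss(3.0, 2.0) for _ in range(80)]

# Predicted variance from the summary statistics of each dataset.
predicted = combined_variance(
    len(a), statistics.fmean(a), statistics.pvariance(a),
    len(b), statistics.fmean(b), statistics.pvariance(b),
)

# Variance computed directly on the merged data.
direct = statistics.pvariance(a + b)

print(abs(predicted - direct) < 1e-9)
```

If the datasets only come with sample variances (divided by $n-1$), they would need to be rescaled by $(n-1)/n$ before being passed to this helper.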