Damn, I lost all I wrote. I don't have time anymore to rewrite everything so I will be rather brief.

No, you cannot discard heuristic proofs a priori, because of something called the "predictive power" of a theory. When a theory is designed, it is designed so that its heuristic proofs agree with known observational data, and so that you can make predictions by means of these heuristic proofs. Neither you nor I nor anyone else can discard such proofs just because they are based on more or less idealized heuristic arguments. Only observation can decide whether a prediction is true or not. Remember the laws of planetary motion, the discovery of Pluto, the precession of Mercury's perihelion? They were all (at the time) predictions of certain theories.

But let's now try your argument without any heuristic rules. You have a system in a state |0> which we consider to be reproducible, i.e. we can somehow prepare the system in this state whenever we need it (it might take some trials to do that, but in the end we can obtain this state).
And we measure two quantities, M1 and M2, which are just QUANTITIES, not operators, and I don't care whether they commute or not, since I don't measure them simultaneously anyway.
a) I first measure M1, and by measuring M1 the system is left in a state |0,1>. The ket notation carries no formal meaning here: no eigenstates, no left and right products, nothing of the sort. I just use it as a convenient notation for the states. Now I measure M2, and after the measurement the system is left in the state |0,1,2>.
b) I first measure M2, which brings my system into the state |0,2'>, and if I afterwards measure M1, the system will get into the state |0,2',1'>.

Now, the states |0,1>, |0,1,2>, |0,2'>, |0,2',1'> are in the general case not the same, neither individually nor in combination, except in the particular case of a coincidence. And if you associate states with the values of the observations/measurements in a single-valued manner, the results of the two measurement sequences will be different.
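
The bookkeeping in a) and b) can be sketched in a few lines of Python. This is only a toy illustration, not a physical model: it assumes a state is fully labelled by the preparation plus the ordered list of measurements performed on it, and the function name `measure` and the tuple representation are made up for this sketch.

```python
# Toy bookkeeping only: a "state" is just the tuple of labels recording
# the preparation and the measurements performed so far, in order.
# No operators, eigenstates, or inner products are implied (assumption
# of this sketch: the history alone identifies the state).

def measure(state, quantity):
    """Return the state label obtained after measuring `quantity` on `state`."""
    return state + (quantity,)

prepared = ("0",)  # the reproducible state |0>

# a) M1 first, then M2: plays the role of |0,1,2>
state_a = measure(measure(prepared, "M1"), "M2")

# b) M2 first, then M1: plays the role of |0,2',1'>
state_b = measure(measure(prepared, "M2"), "M1")

# The two histories carry different labels, so a single-valued map from
# states to measured values has no reason to assign them the same value.
print(state_a, state_b, state_a == state_b)
```

The two final labels differ only through the order of the measurements, which is all the argument needs.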

Sure, if you make statistical measurements (i.e. if you measure both M1M2 and M2M1 a large number of times), you will obtain the same distribution of the values m1m2 and m2m1, but then what does this tell you? Absolutely nothing, because you are ignoring the "correlations" between the individual measurements, if you see what I mean. You are now correlating entire ensembles, which basically erases the "correlations" of the individual measurements.