Update physics_openrowingmonitor.md

Added description of issue in Theil-Senn implementation
2023-11-21 11:31:26 +01:00 · 2023-11-21 11:31:26 +01:00 · d70a2b3e5e
parent 9e80bef1ee
commit d70a2b3e5e
1 changed files with 27 additions and 3 deletions
--- a/docs/physics_openrowingmonitor.md
+++ b/docs/physics_openrowingmonitor.md
@@ -458,22 +458,46 @@ Currently, this is an accepted issue, as the simplified formula has the huge ben

 ### Use of Quadratic Theil-Sen regression for determining &alpha; and &omega; based on time and &theta;

-Abandoning the numerical approach for a regression based approach has resulted with a huge improvement in metric robustness. So far, we were able to implement Quadratic Theil-Senn regression and get reliable and robust results. The underlying assumption of this Quadratic approach is that the Angular Accelration &alpha; is constant, or at constant by approximation in the flank under measurment. In rowing this probably won't be the case as the force will vary based on the position in the Drive phase (hence the need for a forcecurve). Currently, the use of Quadratic Theil-Senn regression represents a huge improvement from both the traditional numerical approach (as taken by [[1]](#1) and [[4]](#4)) used by earlier approaches of Open Rowing Monitor. As the number of datapoints in a *Flanklength* in the relation to the total number of datapoints in a stroke is small, we consider this is a decent approximation while maintaining an sufficiently efficient algorithm to be able to process all data in the datastream in time.
+Abandoning the numerical approach for a regression-based approach has resulted in a huge improvement in metric robustness. So far, we have been able to implement Quadratic Theil-Sen regression and get reliable and robust results. Currently, the use of Quadratic Theil-Sen regression represents a huge improvement over the traditional numerical approach (as taken by [[1]](#1) and [[4]](#4)) used by earlier versions of Open Rowing Monitor.

-We can inmagine there are better suited third polynomal (cubic) approaches available that can robustly calculate &alpha; and &omega; as a function of time, based on the relation between time and &theta;. However, getting these to work in a datastream with very tight limitations on CPU-time and memory across many configurations is quite challenging. We also observe that in several areas the theoretical best approach did not deliver the best practical result (i.e. a "better" algorithm delivered a more noisy result for &alpha; and &omega;). Therefore, this avenue isn't investigated yet, but will be a continuing area of improvement.
+The (implied) assumption underpinning the use of Quadratic Theil-Sen regression is that the Angular Acceleration &alpha; is constant, or nearly constant, in the flank under measurement. In essence, quadratic Theil-Sen regression would be a fitting model if the acceleration were constant, as the relation between &theta;, &alpha; and &omega; would then be captured by &theta; = 1/2 * &alpha; * t<sup>2</sup> + &omega; * t (where &omega; is the angular velocity at t = 0). We do realize that in rowing the Angular Acceleration &alpha;, by nature of the rowing stroke, will vary based on the position in the Drive phase: the ideal force curve is a haystack, thus the force on the flywheel varies in time.
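
To make the constant-acceleration assumption concrete: if a flank is fitted as &theta;(t) = a * t<sup>2</sup> + b * t + c, then differentiating gives &alpha; = 2a and &omega;(t) = 2a * t + b. The sketch below only illustrates that relation; the helper name `kinematicsFromQuadratic` is hypothetical and is not part of Open Rowing Monitor.

```javascript
// Hypothetical helper, not part of Open Rowing Monitor.
// For theta(t) = a * t^2 + b * t + c, a constant-acceleration model implies:
//   alpha    = d^2 theta / dt^2 = 2 * a
//   omega(t) = d theta / dt     = 2 * a * t + b
function kinematicsFromQuadratic (a, b, t) {
  return {
    alpha: 2 * a,
    omega: 2 * a * t + b
  }
}

// alpha = 4 rad/s^2 and a starting omega of 10 rad/s correspond to a = 2, b = 10
const halfSecondIn = kinematicsFromQuadratic(2, 10, 0.5)
console.log(halfSecondIn.alpha, halfSecondIn.omega)
```

Note that &alpha; does not depend on t here, which is exactly the simplification a quadratic fit makes.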

-We also observe specific issues, which could result in overfitting the dataset, nihilating its noise reduction effect. As the following sample of three rotations of a Concept2 flywheel shows, due to production tolerances or deliberate design constructs, there are **systematic** errors in the data due to magnet placement or magnet polarity. This results in systematic issues in the datastream:
+As the number of datapoints in a *Flanklength* is relatively small in relation to the total number of datapoints in a stroke, we use quadratic Theil-Sen regression as an approximation on a smaller interval. In tests, quadratic regression has proven to outperform (i.e. be less susceptible to noise in the signal) both the numerical approach with noise filtering and the linear regression methods. When using an efficient algorithm, this has the strong benefit of being robust to noise, at the cost of an O(n<sup>2</sup>) calculation per new datapoint (where n is the flanklength). Currently, we consider this a decent approximation that keeps the algorithm sufficiently efficient to process all data in the datastream in time.
+
+Although the determination of angular velocity &omega; and angular acceleration &alpha; based on Quadratic Theil-Sen regression over the time versus angular distance &theta; works decently, it does not respect the true dynamic nature of angular acceleration &alpha;. From a purely mathematical perspective, a higher-order polynomial would be more appropriate. A cubic regressor, or even better a fourth-order polynomial, has been shown to be a better mathematical approximation of the time versus distance function for a Concept2 RowErg. We can imagine there are better suited third-order polynomial (cubic) approaches available that can robustly calculate &alpha; and &omega; as a function of time, based on the relation between time and &theta;. However, getting these to work in a datastream with very tight limitations on CPU-time and memory across many configurations is quite challenging.
+
+There are currently some practical objections to using these more complex methods:
+
+* Higher-order polynomials are less stable in nature, and overfitting is a real issue. As the displacement of magnets can present itself as a sinusoid-like curve, third-order or higher polynomials are inclined to follow that curve. As this might introduce wild shocks in our metrics, this is a potential issue for application.
+* A key limitation is the available number of datapoints. For the determination of a polynomial of the n-th order, you need at least n+1 datapoints (which in Open Rowing Monitor translates to a minimum `flankLength`). Some rowing machines, for example the Sportstech WRX700, only deliver 5 to 6 datapoints for the entire drive phase, thus putting explicit limits on the number of datapoints available for such an approximation.
+* Calculating a higher-order polynomial in a robust way, for example by Theil-Sen regression, is CPU intensive. A quadratic approach requires an O(n<sup>2</sup>) calculation when a datapoint is added to the flank. Our estimate is that with currently known robust polynomial regression methods, a cubic approach requires at least an O(n<sup>3</sup>) calculation, and a fourth-order polynomial an O(n<sup>4</sup>) calculation. With smaller flanks (which determine the n) this has proven to be doable, but for machines which produce a lot of datapoints, and thus have more noise and a typically bigger `flankLength` (like the C2 RowErg and Nordictrack RX-800, both with a `flankLength` of 11), this becomes an issue: we consider completing 10<sup>3</sup> or even 10<sup>4</sup> complex calculations within the 5 milliseconds that are available before the next datapoint arrives impossible.
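
The CPU argument in the last point can be put into rough numbers. The sketch below assumes, purely for illustration, that an O(n<sup>k</sup>) method needs exactly n<sup>k</sup> elementary calculations per new datapoint; real constant factors will differ, and `budgetPerCalculation` is a hypothetical helper, not part of Open Rowing Monitor.

```javascript
// Rough budget arithmetic, assuming exactly n^k elementary calculations per
// new datapoint (an illustrative simplification of the O(n^k) estimates)
function budgetPerCalculation (flankLength, exponent, availableMs) {
  const calculations = Math.pow(flankLength, exponent)
  return {
    calculations,
    microsecondsEach: (availableMs * 1000) / calculations
  }
}

// flankLength 11 and 5 ms between datapoints, as mentioned in the text
for (const exponent of [2, 3, 4]) {
  const { calculations, microsecondsEach } = budgetPerCalculation(11, exponent, 5)
  console.log(`n^${exponent}: ${calculations} calculations, ${microsecondsEach.toFixed(2)} microseconds each`)
}
```

For a `flankLength` of 11 this gives 121, 1331 and 14641 calculations respectively, which is where the 10<sup>3</sup> and 10<sup>4</sup> orders of magnitude come from.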
+
+We also observe specific practical issues, which could result in structurally overfitting the dataset, nullifying its noise reduction effect. As the following sample of three rotations of a Concept2 flywheel shows, due to production tolerances or deliberate design constructs, there are **systematic** errors in the data due to magnet placement or magnet polarity. This results in systematic issues in the datastream:

 <img src="img/Concept2_RowErg_Construction_tolerances.jpg" width="700">

 Fitting a quadratic curve with at least two full rotations of data (in this case, 12 datapoints) seems to reduce the noise to very acceptable levels. In our view, fitting a third-degree polynomial would result in a better fit with these systematic errors, but would also result in a much less robust signal.

+We also observe that in several areas the theoretically best approach did not deliver the best practical result (i.e. a "better" algorithm delivered a noisier result for &alpha; and &omega;). Therefore, this avenue hasn't been investigated yet, but will remain a continuing area of improvement.
+
+This doesn't definitively exclude the use of more complex polynomial regression methods: alternative methods for higher-order polynomials within a datastream could be as CPU intensive as Quadratic Theil-Sen regression is now, and their use could be isolated to specific combinations of Raspberry Pi hardware and settings. Thus, this will remain an active area of investigation for future versions.
+
 ### Use of Quadratic Theil-Sen regression and a median filter for determining &alpha; and &omega;

 For a specific flank, our quadratic regression algorithm calculates a single &alpha; for the entire flank and the individual &omega;'s for each point on that flank. As a datapoint will be part of several flank calculations, we obtain several &alpha;'s and &omega;'s that are valid approximations for that specific datapoint. To obtain the most stable result, we opt for the median of all valid values for &alpha; and &omega; to calculate the definitive approximation of &alpha; and &omega; for that specific datapoint. Although this approach has proven very robust, and even necessary to prevent noise from disturbing powercurves, it is very conservative. For example, when compared to Concept2's results, the powercurves have the same shape, but the peak values are considerably lower.
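
The median-over-flanks idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the Open Rowing Monitor implementation: it assumes consecutive, fully overlapping flanks of a fixed length and only handles &alpha; (one value per flank), and the function names are hypothetical.

```javascript
// Hypothetical illustration, not the Open Rowing Monitor implementation
function median (values) {
  const sorted = [...values].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid]
}

// flankAlphas[i] is the single alpha calculated for flank i, where flank i is
// assumed to cover datapoints i .. i + flankLength - 1
function medianAlphaPerDatapoint (flankAlphas, flankLength) {
  const numberOfDatapoints = flankAlphas.length + flankLength - 1
  const result = []
  for (let point = 0; point < numberOfDatapoints; point++) {
    // Indices of the first and last flank that contain this datapoint
    const first = Math.max(0, point - flankLength + 1)
    const last = Math.min(flankAlphas.length - 1, point)
    result.push(median(flankAlphas.slice(first, last + 1)))
  }
  return result
}

// Three flanks of length 3 cover five datapoints; the middle datapoint is
// part of all three flanks, so the outlying alpha of 9 is filtered out there
console.log(medianAlphaPerDatapoint([2, 9, 3], 3)) // [ 2, 5.5, 3, 6, 3 ]
```

The example also shows the conservative side of the filter: a genuine peak that only appears in a minority of the overlapping flanks is suppressed in the same way as noise.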

 Reducing extreme values while maintaining the true data volatility is a subject for further improvement.

+### Quality of the implementation of Quadratic Theil-Sen regression in calculateA()
+
+The implementation of the Quadratic Theil-Sen regression in `FullTSQuadraticSeries.js` contains an optimization that hurts its accuracy. In theory, all candidate values for `a` (in the generalised formula y = a * x<sup>2</sup> + b * x + c) should be considered in the median calculation. `TSQuadraticSeries` maintains a matrix of all possible `a`'s to determine the median. When calculating the optimal curve between points x<sub>1</sub> and x<sub>3</sub> by calculateA(x<sub>1</sub>, x<sub>3</sub>), all possible intermediate values of x<sub>2</sub> are considered, resulting in a series of `a`'s.
+
+In theory, all resulting `a`'s found by calculateA(x<sub>1</sub>, x<sub>3</sub>) should be returned as an array and all should be considered in the median calculation by `TSQuadraticSeries`'s `push(x,y)` function. However, this results in a large 3D matrix with length, width and depth close to `flankLength`, where sorting and determining the median would be extremely CPU intensive: a sort will cost O(n<sup>3</sup>) cycles (at best) before the median can be determined for `a`.
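
As an illustration of this unoptimized variant, the sketch below pools the `a` of every combination of three datapoints and takes a single median. It uses the standard closed form for the leading coefficient of the parabola through three points; the function names are hypothetical and the code is not taken from `FullTSQuadraticSeries.js`.

```javascript
// Hypothetical sketch, not the FullTSQuadraticSeries.js implementation
function median (values) {
  const sorted = [...values].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid]
}

// Coefficient a of the parabola y = a * x^2 + b * x + c through three points
function quadraticCoefficient (p1, p2, p3) {
  return (p1.x * (p3.y - p2.y) + p2.x * (p1.y - p3.y) + p3.x * (p2.y - p1.y)) /
    ((p1.x - p2.x) * (p1.x - p3.x) * (p2.x - p3.x))
}

// Pool the a of every triple i < j < k, then take one median over all of them.
// The three nested loops are the source of the O(n^3) candidate count.
function pooledMedianA (points) {
  const candidates = []
  for (let i = 0; i < points.length - 2; i++) {
    for (let j = i + 1; j < points.length - 1; j++) {
      for (let k = j + 1; k < points.length; k++) {
        candidates.push(quadraticCoefficient(points[i], points[j], points[k]))
      }
    }
  }
  return median(candidates)
}

// Points on y = 2 * x^2 + 3 * x + 1, with one deliberate outlier at x = 3
const samplePoints = [0, 1, 2, 3, 4, 5].map(x => ({ x, y: 2 * x * x + 3 * x + 1 }))
samplePoints[3].y += 50
console.log(pooledMedianA(samplePoints)) // 2, the outlier is ignored
```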
+
+By reducing the result of calculateA(x<sub>1</sub>, x<sub>3</sub>) to a single result, the median `a` of all curves starting at x<sub>1</sub> and ending at x<sub>3</sub>, the matrix in `TSQuadraticSeries` is reduced to a 2D matrix, where sorting is 'only' O(n<sup>2</sup>). However, as the median of a set of medians is in general not equal to the median of all underlying values, this will introduce errors in the resulting data. Tests have shown the effects are minimal and that the current implementation delivers good results.
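
A small numeric example shows why reducing each group to its median first can change the outcome. The values below are made up for illustration; each inner array stands for the `a`'s that one calculateA(x<sub>1</sub>, x<sub>3</sub>) call would produce.

```javascript
// Made-up candidate values, purely to illustrate the effect
function median (values) {
  const sorted = [...values].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid]
}

const groups = [
  [1, 2, 9],
  [3, 4, 5],
  [6, 7, 8]
]

// Optimized route: one median per group, then the median of those medians
const medianOfMedians = median(groups.map(median))

// Theoretical route: one median over all pooled values
const pooledMedian = median(groups.flat())

console.log(medianOfMedians, pooledMedian) // 4 5
```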
+
+It does imply, however, that a better Quadratic Theil-Sen regression implementation could improve accuracy.
+
 ## References

 <a id="1">[1]</a> Anu Dudhia, "The Physics of ErgoMeters" <http://eodg.atm.ox.ac.uk/user/dudhia/rowing/physics/ergometer.html>