Interrater reliability of clinically performed manual arm position and motion matching test

Corresponding Author:
Kuan-yi Li, PhD, OTR
Associate Professor
Department of Occupational Therapy and Graduate Institute of Behavioral Sciences
Healthy Aging Research Center
Chang Gung University, 259 Wenhwa 1st Road, Kwei-shan, Tao-yuan, Taiwan
Tel: +886-3-211-8800, ext. 3676
Fax: +886-3-211-8421
E-mail: [email protected]



This study examined the interrater reliability of manually performed arm position and motion matching tests in healthy older adults.


Sixteen healthy older adults performed 2 tasks (position matching and motion matching) with both arms at 4 target angles, with 3 repetitions of each. Four occupational therapists were recruited to rate the participants’ performance by viewing recorded digital images.


Krippendorff’s Alpha values ranged from 0.15 to 0.33 for the position matching test and from 0.02 to 0.19 for the motion matching test among the 4 raters. Junior therapists showed higher reliability in both tests than senior therapists.


The results indicated poor interrater reliability of manually performed arm position and motion matching tests. Length of clinical experience may affect the reliability coefficient, as junior therapists rated a higher percentage of trials as impaired than senior therapists did. Junior therapists also had higher Krippendorff’s Alpha values than senior therapists.


Proprioception, Kinesthesia, Position sense, Motion sense, Interobserver


Quality assessment of sensory perception is important for clinical evaluation before implementing treatments [1]. Of the many somatosensory tests, proprioception and kinesthesia of the upper extremities are two of the primary assessments used in clinical practice [2]. Here we define proprioception as the awareness of body position and kinesthesia as the conscious awareness of body motion [3]. Proprioception and kinesthesia are both essential for optimal muscular control, coordination, and stability during the planning, modification, and execution of movement [4-7]. Muscle spindles are considered to be the primary proprioceptive and kinesthetic receptors contributing afferent signals to the central nervous system (CNS) [8,9]. Although standardized machine-based assessments exist to measure proprioception quantitatively, they are not widely available in clinics [10-13]. Therefore, manually performed arm position and motion matching tests are often used in clinics as screening tools to assess proprioceptive and kinesthetic accuracy.

Currently, no standardized protocol is used to perform arm position and motion matching tests in clinical settings. Based on previous research and current clinical practice, we can summarize the primary testing protocols as follows [14-18]. For the position matching test, the therapist moves the tested body segment to the target joint angle and holds the limb in this position, and the participant is then asked to match the position with the contralateral body segment. Similarly, the motion matching test is performed by having the participant move the contralateral body segment concurrently while the examiner moves the reference body segment. At the end of each trial, the therapist visually and subjectively judges the movement to determine proprioceptive and kinesthetic accuracy by comparing the joint position/motion of the reference limb with that of the limb being examined. Using an ordinal scale, proprioceptive and kinesthetic accuracy is graded as intact, impaired, or absent [19].

Historically, the validity and reliability of arm position and motion matching tests have rarely been reported [20,21]. Most previous studies that have addressed the intra-rater reliability of joint position sense have examined the knee joint [22-25]. Although the assessment methods were not consistent across studies, they mainly involved replication of the joint position in sitting and standing positions. The intra-rater reliability (ICC) for knee joint position sense ranged between 0.17 and 0.79. Previous research suggests that the testing position, measuring method, and age can affect proprioceptive acuity [26,27] and subsequently lead to discrepancies in reliability measurements. As for the upper arm, test-retest reliability has been examined for joint position and motion sense across different testing methods [20,28,29]. These studies mainly examined the reliability of the measuring methods rather than that of the raters; therefore, the intraclass correlation coefficient (ICC) was used to measure the reliability of error scores among trials performed by participants. To date, few studies have examined the interrater reliability of arm position and motion matching tests, because visual judgment is considered an easy and intuitive method. However, previous research has shown poor interrater reliability for sensory assessments [21]. The lack of reliability was attributed to differences in the training of the assessors and to heterogeneous clinical populations (stroke survivors).

Population aging is a worldwide issue, and the need for long-term care is growing rapidly. Early detection and effective treatment are necessary for elderly individuals to maintain an active lifestyle and social participation [30]. Recent studies have reported that perceptual deficits could be an early sign of neurodegenerative diseases such as Parkinson’s disease and Alzheimer’s disease [31,32]. Kinesthesia and olfactory function have received growing attention for their potential as early diagnostic markers. Patients with Parkinson’s disease and focal dystonia have been reported to have impaired kinesthesia, which includes position sense and motion sense [33-35]. Manually performed arm position and motion matching tests are widely used as clinical assessment tools for initial screening; therefore, measuring the interrater reliability of clinically used sensory assessments in the elderly population can provide useful information for determining appropriate clinical treatment. Proper administration of assessments can reduce measurement error and improve clinical decision making. Proprioception and kinesthesia are primary sensory feedback modalities during motor learning. In this study, we examined the interrater reliability of arm position and motion matching tests and addressed the following questions: (1) Does interrater reliability differ between the arm position and motion matching tasks for older adults? (2) Does interrater reliability differ between junior and senior therapists?


▪ Participants

Sixteen healthy older adults (mean age 63.13 ± 4.48 years; 7 male, 9 female) participated in this study. All participants gave informed consent by signing the consent form approved by the Institutional Review Board of Chang Gung Memorial Hospital. All participants were right-handed (determined by the Edinburgh Handedness Inventory) [36] and without cognitive impairment (a score ≥ 24 on the Mini-Mental State Examination (MMSE)) [37]. Exclusion criteria included any known neurological disease (e.g., stroke or diabetes) or past severe arm injuries that might interfere with proprioception and kinesthesia. Four occupational therapists with 2 to 5 years of clinical experience were recruited to rate the performance of participants by viewing the recorded digital images. Clinical raters were blinded to the participants’ demographic characteristics.

▪ Test administration procedure

Participants sat on a height-adjustable chair with the forearm of the reference arm resting at 90° of elbow flexion as the starting position. Each participant wore goggles to occlude visual cues during the experiment. Each individual visited the laboratory once, and both arms were tested during the visit. All participants performed two tasks (position matching and motion matching) with both arms (right and left). Four target angles were tested with three repetitions of each, for a total of 48 trials per participant. The order of the position and motion matching tests was randomized, but each matching test was administered from proximal to distal joints, in line with clinical practice. All trials were administered by a certified occupational therapist.
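The trial count follows directly from the factorial design. The sketch below enumerates it, assuming one recorded trial per combination of task, arm, target angle, and repetition; the names `TASKS`, `ARMS`, `TARGETS`, and `build_trials` are illustrative only, and the enumeration order does not reproduce the randomized test order.

```python
from itertools import product

TASKS = ("position matching", "motion matching")
ARMS = ("right", "left")
# Target angles named in the protocol, listed from proximal to distal.
TARGETS = (
    ("shoulder flexion", 60),
    ("shoulder horizontal abduction", 60),
    ("elbow flexion", 45),
    ("wrist extension", 50),
)
REPETITIONS = 3
N_PARTICIPANTS = 16


def build_trials():
    """Return one dict per recorded trial for a single participant."""
    return [
        {
            "task": task,
            "arm": arm,
            "target": name,
            "angle_deg": angle,
            "repetition": rep,
        }
        for task, arm, (name, angle), rep in product(
            TASKS, ARMS, TARGETS, range(1, REPETITIONS + 1)
        )
    ]


trials = build_trials()
per_task = sum(t["task"] == "position matching" for t in trials)

print(len(trials))                # 48 trials per participant
print(per_task * N_PARTICIPANTS)  # 384 rated trials per task
```

The second figure matches the 384 trials per test that the raters evaluated.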

Because no standardized clinical testing protocol has been established, four target angles were chosen for both the position and motion matching tests based on previous research [38]. The target angles were shoulder flexion 60°, shoulder horizontal abduction 60°, elbow flexion 45°, and wrist extension 50°. A limb was moved to one of the selected positions, and data were collected three times for each target angle. Subsequently, the arm was returned to the neutral position and then moved to the next selected joint position. To ensure that each participant fully understood the experimental procedure, up to three practice trials were administered prior to recording. Practice trials consisted of the experimental tasks administered without the opaque goggles. A trained occupational therapist performed the arm position and motion matching tests throughout the entire experiment. A seven-camera Vicon MX motion analysis system (Oxford Metrics Inc., Oxford, UK) with a sampling rate of 120 Hz was used to capture the movement of subjects during the arm position and motion matching tests. To allow for accurate joint angle measurements, three goniometers were affixed to the table and the back of the chair so that the therapist could see the desired reference positions and keep the angular displacement consistent. The angular displacement data in the current study were part of a previous experiment, and the detailed procedures and experimental setup have been published previously [39].

In the position matching test, the examiner moved the reference joint segment to the target angle, and subjects were asked to mirror the position with the contralateral joint segment. At the end of each trial, subjects were asked to indicate verbally that they felt both joint segments were in an identical and symmetrical position. In the motion matching test, the examiner moved the reference joint segment, and subjects were instructed to concurrently mirror the motion with the contralateral (tested) joint segment. In both the position and motion matching tests, each trial was repeated until three self-reported “intact” trials for each target angle were recorded. Among the 16 participants, the maximum number of additional trials required was 5 (n=1). Most of the participants (66.67%) could perform the intact trials with fewer than 3 additional attempts, and only 4 participants required more than 4 extra trials.

A digital video recorder was placed three meters in front of the participants to record the testing process. The footage was presented to trained and licensed occupational therapists who were knowledgeable about the testing procedure but otherwise naïve to the participant population. The raters were instructed to make a visual judgment for each individual trial. These ratings were used to calculate the interrater reliability values. The raters were trained, and written and verbal instructions were provided, before data collection. Therapists were instructed to rate the observed trials as though they were performing the test themselves in their clinical setting, that is, to identify whether the test joint angle and the reference joint angle were in an identical, symmetrical position at the end of each trial. They assessed the performance of the participants using an ordinal scale that rated proprioceptive and kinesthetic accuracy as intact, impaired, or absent, which is equivalent to 2 (correct, 100%; little or no difference), 1 (3/4 correct or considerable difference), and 0 (less than 3/4 correct or absence) from the Fugl-Meyer assessment of the upper extremities [38]. Raters were blinded to participants’ identity and health status and rated the series of videos independently over the course of two weeks. All raters were clinical practitioners, and all reported experience in applying arm position and motion matching tests in clinics.

▪ Data analysis

Krippendorff’s Alpha [40] was used to calculate the interrater reliability of the position and motion matching tests for all raters. The value is usually considered reliable when it is greater than 0.80, acceptable when it is between 0.667 and 0.80, and unreliable when it is less than 0.667 [41]. Next, we examined whether the level of clinical experience had an effect on rater reliability. We divided the raters into two groups according to their number of years of clinical experience. Raters 1 and 2 were placed in the “junior therapist group” because they had 2 years of clinical experience. The other two therapists were placed in the “senior therapist group” because they had 5 years of clinical experience.
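As a rough illustration of this analysis step, the following is a minimal sketch of Krippendorff’s Alpha for ordinal ratings on the 0/1/2 scale, written from the standard coincidence-matrix formulation. It is not the software used in the study. The function names, the units-by-raters input layout, the toy ratings, and the treatment of values exactly at 0.667 or 0.80 are all assumptions made for illustration.

```python
import numpy as np


def krippendorff_alpha_ordinal(units, categories=(0, 1, 2)):
    """Krippendorff's Alpha with the ordinal distance metric.

    units: one list per rated trial holding each rater's score,
           with None marking a missing rating.
    categories: the ordered scale values (assumed 0 = absent,
           1 = impaired, 2 = intact).
    """
    index = {c: i for i, c in enumerate(categories)}
    k = len(categories)

    # Coincidence matrix: every ordered pair of ratings within a unit
    # contributes 1 / (m - 1), where m is the number of ratings in it.
    coincidence = np.zeros((k, k))
    for unit in units:
        values = [index[v] for v in unit if v is not None]
        m = len(values)
        if m < 2:
            continue  # a single rating cannot be paired
        for i, a in enumerate(values):
            for j, b in enumerate(values):
                if i != j:
                    coincidence[a, b] += 1.0 / (m - 1)

    n_c = coincidence.sum(axis=1)  # ratings per category
    n = n_c.sum()                  # total pairable ratings

    # Ordinal metric: squared count of ratings lying between two ranks.
    delta = np.zeros((k, k))
    for c in range(k):
        for g in range(c + 1, k):
            d = n_c[c:g + 1].sum() - (n_c[c] + n_c[g]) / 2.0
            delta[c, g] = delta[g, c] = d ** 2

    observed = (coincidence * delta).sum()
    expected = (np.outer(n_c, n_c) * delta).sum()
    if expected == 0:
        return float("nan")  # no variation in the ratings: undefined
    # alpha = 1 - D_o / D_e, with the common factors cancelled
    return 1.0 - (n - 1) * observed / expected


def interpret_alpha(alpha):
    """Label a coefficient using the thresholds quoted in the text."""
    if alpha > 0.80:
        return "reliable"
    if alpha >= 0.667:
        return "acceptable"
    return "unreliable"


# Toy data only: two trials on which two raters always disagree.
print(krippendorff_alpha_ordinal([[0, 1], [1, 0]]))  # -0.5
# Toy data only: three trials with complete agreement.
print(krippendorff_alpha_ordinal([[2, 2], [1, 1], [0, 0]]))  # 1.0
```

The first toy case shows how consistently opposed ratings push the coefficient below zero.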


As in the clinical setting, passive position and motion were imposed manually by a trained therapist; therefore, we examined the intraclass correlation coefficient (ICC, model (3,1)) of angular displacement for each target joint angle to ensure that the imposed movements were reliable. The ICCs for each joint angle ranged between 0.60 and 0.81 (confidence interval [CI]: 0.31 to 0.97), which has been defined as good agreement (Cicchetti, 1994). The absolute error of angular displacement for each target angle is reported in Table 1.

Target angle          Position matching task    Motion matching task
Shoulder flexion      6.83 ± 5.18               6.09 ± 4.44
Shoulder abduction    9.32 ± 5.97               7.82 ± 6.31
Elbow flexion         7.32 ± 5.69               7.91 ± 6.37
Wrist extension       15.91 ± 10.19             12.88 ± 9.41

Table 1: Absolute error (mean ± SD) for each target angle, in degrees.
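The ICC model (3,1) used for this consistency check is the two-way mixed, single-measure, consistency form. A minimal sketch of the computation follows, assuming a matrix with one row per measured subject and one column per repeated measurement. The layout, the function name `icc_3_1`, and the example numbers are illustrative assumptions; they are not the study’s angular displacement data.

```python
import numpy as np


def icc_3_1(scores):
    """ICC(3,1): two-way mixed effects, consistency, single measure.

    scores: 2-D array-like, rows = subjects, columns = repeated
            measurements (or raters).
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()

    # Two-way ANOVA sums of squares without replication.
    ss_total = ((x - grand_mean) ** 2).sum()
    ss_rows = k * ((x.mean(axis=1) - grand_mean) ** 2).sum()
    ss_cols = n * ((x.mean(axis=0) - grand_mean) ** 2).sum()
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                # between-subjects mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Illustrative values only (6 subjects x 4 measurements).
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_3_1(example), 2))  # 0.71
```

Because the column effect is removed before the ratio is formed, a constant offset between repetitions does not lower this coefficient; only inconsistency in the ordering of subjects does.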

▪ Interrater reliability for position and motion matching test

A total of 384 trials were evaluated for each of the position and motion matching tests. Only the three self-reported “intact” trials were viewed by the clinical raters. Krippendorff’s Alpha values ranged from 0.15 to 0.33 for the position matching test and from 0.02 to 0.19 for the motion matching test among the 4 raters. An independent-samples t-test was performed to examine whether the reliability coefficients (Krippendorff’s Alpha) differed between the position and motion matching tasks. There was no significant difference in Krippendorff’s Alpha between the two tasks (t(14) = 1.98, p = 0.07).

▪ Interrater reliability for position and motion matching test between junior and senior therapists

For the junior therapist group, Krippendorff’s Alpha ranged from 0.28 to 0.50 for the position matching test and from 0.07 to 0.33 for the motion matching test. In the senior therapist group, Krippendorff’s Alpha ranged from -0.04 to 0.46 for the position matching test and from -0.01 to 0.26 for the motion matching test. A Krippendorff’s Alpha below 0 means that the observed disagreement exceeds what would be expected by chance, that is, the disagreements are systematic; the negative values for shoulder abduction therefore indicate systematic disagreement between the 2 senior raters. The interrater reliability was higher in junior therapists than in senior therapists. Among the 4 target angles, senior therapists showed the lowest reliability in rating shoulder abduction for both the position and motion matching tasks, whereas junior therapists had the lowest reliability coefficient in rating elbow flexion for both tasks. We performed an independent-samples t-test to examine whether the reliability coefficients (Krippendorff’s Alpha) differed between junior and senior therapists and found no significant difference between the two groups (t(14) = 1.68, p = 0.12). Detailed results are presented in Table 2.

                        Junior therapists    Senior therapists    All therapists
Position matching task
  Shoulder flexion      0.3369               0.3525               0.2048
  Shoulder abduction    0.4986               -0.0437              0.1462
  Elbow flexion         0.2817               0.2919               0.2541
  Wrist extension       0.4498               0.4558               0.3325
Motion matching task
  Shoulder flexion      0.3346               0.1604               0.1610
  Shoulder abduction    0.2433               -0.0106              0.0215
  Elbow flexion         0.0713               0.1604               0.0958
  Wrist extension       0.3302               0.2629               0.1924

Table 2: Interrater reliability (Krippendorff’s Alpha) of junior, senior, and all therapists for each target angle in the position and motion matching tasks.

Further analysis revealed that junior therapists identified more impaired trials than senior therapists. For the position matching test, the percentage of trials identified as impaired was 43.37% and 34.95% for Raters 1 and 2 (junior therapists), respectively, and 13.52% and 8.42% for Raters 3 and 4 (senior therapists), respectively. A similar pattern was present in the motion matching test: Raters 1 and 2 identified 34.69% and 31.63% of trials as impaired, and Raters 3 and 4 identified 4.85% and 6.89%, respectively.


Perceptual deficits have a large potential impact on rehabilitation outcomes. Currently, manually performed arm position and motion matching tests are frequently used as initial screening tests to identify possible proprioceptive and kinesthetic deficits in the clinical setting. To our knowledge, few studies have examined the reliability of these assessment tools for evidence-based practice. In this study, we examined the interrater reliability of manually performed proprioceptive and kinesthetic accuracy tests in older adults using the arm matching test. The current findings were consistent with previous studies showing poor reliability of sensory assessments [21]. Across the testing paradigm, Krippendorff’s Alpha values were below 0.667, which is considered poor reliability [41]. Although agreement was poor in both groups, junior therapists had a higher level of agreement than senior therapists. The current findings indicate a need to improve and standardize testing and training procedures for occupational therapists who perform kinesthetic and proprioceptive testing in the clinical setting.

Junior therapists showed higher agreement than senior therapists in rating proprioceptive and kinesthetic accuracy tests. As this result is largely driven by the junior clinicians’ greater likelihood of identifying a trial as impaired, it would appear that the junior therapists had less tolerance for matching errors than the senior therapists. Specifically, each junior therapist rated at least 20 percentage points more trials as impaired than either senior therapist. This led to a poor reliability level when Krippendorff’s Alpha was calculated from all raters. The absolute errors reported in our results were comparable to those obtained in a previous study that used a similar method for the knee joint [42]. We were unable to find similar results for experiments involving the upper extremities. Although junior therapists showed higher agreement and identified more impaired trials than senior therapists, the senior therapists made more correct judgments than the junior therapists. This finding may reflect small changes in the manner in which these assessments are learned during the training process, or the role of additional years of clinical experience. Isolating the causative mechanism of this difference is beyond the scope of this paper; rather, we wish to draw attention to the difference to enhance evidence-based practice.

In the design of this experiment, we elected to use a single viewing angle and to record images from only this vantage point. Thus, the footage given to the therapists for rating was all from the same viewing angle, without accommodating the actual movement plane. This may have made it difficult for the raters to make their judgments, especially for movements in the sagittal plane (e.g., shoulder flexion). However, it should also be noted that in the clinic, therapists tend to make judgments from only the top view while patients are in a seated position. Clinicians rarely alter their viewing angle during the administration of the assessment. This circumstance introduces the potential for a similar bias during the therapist’s visual judgment in the evaluation. As there is currently no standardized testing procedure for either the arm position or motion matching tests in the clinical setting, we elected to standardize the viewing angle across the trials. While we argue that the best practice for this assessment battery would be a standardized and well-described assessment procedure, that is currently not the state of the profession. It is therefore recommended that the same assessor monitor a patient’s change over time on perceptual sensory assessments, particularly for arm position and motion matching tests in clinical populations.

Finally, Krippendorff’s Alpha tends to generate lower reliability coefficients than other methods because it is based on the ratio of observed disagreement to expected disagreement [40]. It is a more rigorous method of calculating reliability than other known methods, and it accommodates all levels of measurement as well as data with missing values from multiple raters. Furthermore, the data structure can affect the reliability estimate. Our data showed that the impaired trials identified by the raters were scattered around the data matrix. As a result, the observed disagreement was high relative to the expected disagreement, which generated the low reliability findings. Nevertheless, considering the implications of the current findings, Krippendorff’s Alpha is the most adequate method of assessing interrater reliability because it can be applied to multiple raters with various scales. In this study, 4 raters used an ordinal scale to grade participants’ proprioceptive and kinesthetic accuracy; therefore, Krippendorff’s Alpha is the most appropriate method of evaluating the interrater reliability of clinically performed manual arm position and motion matching tests.
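For reference, the coefficient discussed here can be written out as follows. This is the standard textbook formulation for ordinal data rather than something stated in the original analysis. The symbols $o_{ck}$ (coincidences between categories $c$ and $k$), $n_c$ (number of ratings in category $c$), and $n$ (total number of pairable ratings) are notation introduced only for this sketch.

```latex
\alpha = 1 - \frac{D_o}{D_e}, \qquad
D_o = \frac{1}{n} \sum_{c} \sum_{k} o_{ck}\, \delta_{ck}^{2}, \qquad
D_e = \frac{1}{n(n-1)} \sum_{c} \sum_{k} n_c\, n_k\, \delta_{ck}^{2},
\qquad
\delta_{ck}^{2} = \Bigl( \sum_{g=c}^{k} n_g - \frac{n_c + n_k}{2} \Bigr)^{2}
\quad (c \le k)
```

When raters agree perfectly, $D_o = 0$ and $\alpha = 1$. When the observed disagreement equals what chance would produce, $\alpha = 0$. When $D_o > D_e$, $\alpha$ becomes negative.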

Perceptual impairments have been identified as an important factor in rehabilitation outcomes [43]. Therapists alter treatment plans because of perceptual deficits noted by these assessments. Although arm position and motion matching tests are widely used as initial screening tools for proprioceptive and kinesthetic impairments, they are an unreliable method of providing insight into the perceptual sensitivity of clinical populations. At present, there is no standardized and consistent protocol for measuring proprioception in clinics; therefore, the reliability data from previous studies are inconsistent and cannot be compared. To improve the reliability of perceptual assessment, Katherine, et al. (2011) proposed a 3-phase approach that included protocol standardization, personnel training, and reliability measurement. For protocol standardization, the development of both testing and training manuals was required for clinical utility. Once the manuals had been established, therapists were required to complete a training program that included hands-on practice and competency tests across centers. Following this, the trained and certified therapists were considered experts who could serve as coaches in providing feedback and as the gold standard in reliability measurement. Katherine, et al. (2011) reported high interrater reliability (ICC = 0.96) in the proprioception subdomain of the Fugl-Meyer assessment of the upper extremities after completing the aforementioned procedures. A more accurate, sensitive, and standardized assessment of perceptual deficits is needed for clinical outcome measures.


There were several limitations to our study. First, the samples of raters and healthy older adults were small. Although our results were consistent with previous findings on other types of sensory assessments, the generalizability of the current findings is limited. Second, we applied the testing protocols only to healthy older adults. Potential application to patients with neurological disorders or somatosensory deficits should be explored further. Finally, the single viewing angle could have introduced bias for the raters. Future research should examine whether multiple perspectives could enhance rater reliability.


This study examined the interrater reliability of proprioceptive and kinesthetic matching tests by assessing arm position and motion matching tests performed on healthy older participants. The current findings indicated low reliability in rating participants’ performance. Further, junior therapists had a greater tendency to rate trials as “impaired” than senior therapists. A primary contributor to the lack of reliability may be the lack of detailed, standardized clinical procedures for therapists to follow.

Conflict of interest statement

The authors declare no conflict of interest.


This project was supported in part by the Healthy Aging Research Center (EMRPD1G0241), Chang Gung Memorial Hospital (BMRPC58), the National Science Council (NSC102-2314-B-182-009-MY3), and the Ministry of Science and Technology (MOST105-2314-B-182-011) in Taiwan.

