Bioinspir. Biomim. 19 (2024) 046015 https://doi.org/10.1088/1748-3190/ad5129 OPEN ACCESS RECEIVED 22 December 2023 REVISED 1 May 2024 ACCEPTED FOR PUBLICATION 28 May 2024 PUBLISHED 10 June 2024 Original Content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI. PAPER Utilising redundancy in musculoskeletal systems for adaptive stiffness and muscle failure compensation: a model-free inverse statics approach Elijah Almanzor1,3, Taku Sugiyama2,3,∗, Arsen Abdulali1, Mitsuhiro Hayashibe2 and Fumiya Iida1 1 Bio-Inspired Robotics Laboratory, Department of Engineering, University of Cambridge, Cambridge, United Kingdom 2 Neuro-Robotics Laboratory, Department of Robotics, Graduate School of Engineering, Tohoku University, Sendai 980-8579, Japan 3 These two authors contributed equally to this work and are designated as co-first authors. ∗ Author to whom any correspondence should be addressed. E-mail: taku.sugiyama.t5@dc.tohoku.ac.jp, eda26@cam.ac.uk, hayashibe@tohoku.ac.jp, aa2335@cam.ac.uk and fi224@cam.ac.uk Keywords:musculoskeletal system, redundancy, neural network, inverse statics, adaptive stiffness, resilience Supplementary material for this article is available online Abstract Vertebrates possess a biomechanical structure with redundant muscles, enabling adaptability in uncertain and complex environments. Harnessing this inspiration, musculoskeletal systems offer advantages like variable stiffness and resilience to actuator failure and fatigue. Despite their potential, the complex structure presents modelling challenges that are difficult to explicitly formulate and control. This difficulty arises from the need for comprehensive knowledge of the musculoskeletal system, including details such as muscle arrangement, and fully accessible muscle and joint states. Whilst existing model-free methods do not need explicit formulations, they also underutilise the benefits of muscle redundancy. Consequently, they necessitate retraining in the event of muscle failure and require manual tuning of parameters to control joint stiffness limiting their applications under unknown payloads. Presented here is a model-free local inverse statics controller for musculoskeletal systems, employing a feedforward neural network trained on motor babbling data. Experiments with a musculoskeletal leg model showcase the controller’s adaptability to complex structures, including mono and bi-articulate muscles. The controller can compensate for changes such as weight variations, muscle failures, and environmental interactions, retaining reasonable accuracy without the need for any additional retraining. 1. Introduction Evolution has equipped humans and animals with the ability to function and adapt in uncertain, com- plex environments. Intelligent behaviour emerges from the interplay between an organism’s central nervous system (CNS), body, and environment [1, 2]. Vertebrates, including humans, feature a bio- mechanical structure with redundant muscles organ- ised in agonist-antagonist pairs for motion [3]. These muscles can be mono-articulate, influencing a single joint directly, or multi-articulate, affecting multiple joints [4–6]. Humans are therefore able to perform a wide variety of complex and precise movements by coordinating and recruiting certain muscles based on the specific demands of a particular movement [7]. Functional redundancy in muscles contributes to mechanical resilience, compensating for injuries [8, 9], fatigue [10], ageing [11] and maintaining overall functionality [12]. In healthy individuals, fatigue directly impacts muscle performance, activations, and resulting muscle synergies formovement [13, 14]. For instance, fatigue in the elbow flexors leads to reduced trans- joint inhibition of related arm muscles, indicating compensatory use of redundancy [14]. In a sim- ilar study [15], participants fatigued from repeated ankle exertions during a hopping task employed two strategies for compensation. Firstly, they relied more on proximal bi-articulate muscles to adjust leg joint © 2024 The Author(s). Published by IOP Publishing Ltd https://doi.org/10.1088/1748-3190/ad5129 https://crossmark.crossref.org/dialog/?doi=10.1088/1748-3190/ad5129&domain=pdf&date_stamp=2024-6-10 https://creativecommons.org/licenses/by/4.0/ https://creativecommons.org/licenses/by/4.0/ https://orcid.org/0000-0001-8101-4217 https://orcid.org/0000-0001-9235-8779 https://orcid.org/0000-0001-6179-5706 https://orcid.org/0000-0001-9246-7190 mailto:taku.sugiyama.t5@dc.tohoku.ac.jp mailto:eda26@cam.ac.uk mailto:hayashibe@tohoku.ac.jp mailto:aa2335@cam.ac.uk mailto:fi224@cam.ac.uk https://doi.org/10.1088/1748-3190/ad5129 Bioinspir. Biomim. 19 (2024) 046015 E Almanzor et al Figure 1. The musculoskeletal leg model utilised in this paper. It is comprised of six muscles: The mono-articulate agonist-antagonist pairs iliacus (IL) and gluteus maximus (GM), vastus intermedius (VA) and the short biceps (SB), which drive the hip and knee joints directly, and the bi-articulate muscles, rectus femoris (RF) and the long biceps (LB). On the right is an example of an arbitrary leg posture. stiffness, compensating for fatigued knee and ankle muscles. Secondly, they exhibited earlier activations of the gastrocnemius muscle. This strategic distri- bution of effort among muscles enhances endurance and robustness, enabling sustained forces without excessive fatigue [16]. Furthermore, various studies indicate that individuals experiencing chronic pain increase activity in other muscles to maintain motor output while minimising use of the painful muscles [17, 18]. Stroke often leads to weakness, spasticity, and abnormal movement patterns in limbs opposite the cerebral lesion, prompting compensatory strategies [19, 20]. Stroke patients with severe arm impair- ments exhibit entirely different muscle synergies in the affected arm compared to the unaffected arm [21], for adjusting hand posture [22] and for picking up objects [23]. Comparisons of unconstrained upper limb reaching between healthy individuals and stroke patients reveal that stroke patients recruit additional degrees of freedom, particularly trunkmuscles, indic- ating exploitation of muscle and kinematic redund- ancy in the musculoskeletal structure [19]. Subjects with spinal cord injury also demonstrate signific- antly different hand muscle synergies [24]. Cats also exhibit similar compensatory strategies to maintain limb kinematics following various peripheral nerve injuries [25]. Musculoskeletal systems mimic the biomechan- ical structure of organisms, controlling joints through tendon-actuator unit agonist-antagonist pairs, unlike traditional robots that directly drive joints with motors (see figure 1). Similar to real muscles, actuators in these systems exhibit compliant beha- viour, utilising components like pneumatic artifi- cial muscles or Hill-type muscle models [26, 27]. This compliance provides advantages similar to their biological counterparts. By employing compli- ant Hill-type models and co-activation patterns of agonist-antagonist pairs [28], musculoskeletal sys- tems can attain variable stiffness in their joints or as a whole. This feature allows for safer interactions with the environment compared to rigid counterparts [29]. Moreover, they can generate greater forces than purely soft robots, leveraging their rigid skeletons. Furthermore, owing to their high muscle redundan- cies, failures and breakage of muscles can be com- pensated by others through different muscle stimula- tion activations and combinations to achieve the same motion [30]. The motor equivalence problem posits that coordination and movement lack unique solutions [31, 32]. Theoretical infinite solutions exist for the same task space position across the kinematic joint, muscle, activation, and neural spaces [33]. Whilst resolving redundancy in the CNS remains a key research focus [34], it is widely accepted that the CNS employs both feedforward and feedback internal models for motor control [35, 36] with signals des- cending frommotor cortical areas to the spinal neural structures [21]. Movement often exhibits stereotyp- ical behaviour, such as the ‘two-thirds power law’ observed in humans and primates, where angu- lar velocity logarithmically relates to curvature [37, 38]. Three main internal model frameworks attempt to explain these regularities: optimal control the- ory (OCT), passive motion paradigm (PMP), and motion primitives (MP). OCT suggests that move- ment regularities arise from CNS path planning, optimising trajectories based on objective func- tions like the minimum jerk model and its variants [39–41], which predict movements using fifth-order polynomials [42]. PMP [43], building on equilibrium point hypothesis (EPH), simulates internal shifts in 2 Bioinspir. Biomim. 19 (2024) 046015 E Almanzor et al musculoskeletal structure equilibrium to achieve desired postures through joint torque calculations based on task-related force-fields, akin to imped- ance control. MP proposes that complex coordinated motions result from combining stored primitives with language-like syntax to reduce dimensionality [44, 45]. For example, the time-varying synergies model scales, shifts, and sums coordinated muscle activations to generate new patterns [46, 47], often extracting muscle synergies using non-negative mat- rix factorisation [44]. The OCT and PMP frameworks for explaining human limb manipulation rely heavily on extens- ive prior knowledge of the entire system, lead- ing to scalability issues when explaining how the CNS recruits redundant muscles for synergistic co- contractions, even after neurotrauma. For instance, in cases ofmuscle failure like stroke-induced paralysis [19], manual reformulation of the cost function for OCT and the admittance matrix for PMP is neces- sary. On the other hand, while a limited set of motion primitives can significantly reduce dimensionality and complexity, their mathematical extraction and formalisation are required [44, 48, 49]. Consequently, these methods generally fail to address CNS plas- ticity, which allows individuals with neuromuscu- lar disorders due to stroke or injuries [23, 25] or those experiencing fatigue [15] and chronic pain [17] to utilise motor redundancy for alternative control strategies in achieving target reaching motions. Another notable framework for explaining motor redundancy is the concept of central pattern gen- erators (CPG). Rhythmic movement like respira- tion, walking, and swimming, is attributed to CPGs in the spinal cord [50–52]. These CPGs generate muscle activationsmodulated by higher-level centres, producing synergistic muscle contractions based on sensory responses from the agent-environment interactions [53]. Although some evidence indicates that certain rhythmic humanmovements persist after neurotrauma [54], a larger impairment is found in discrete reaching than cyclic movements, implying distinct areas in the CNS govern rhythmic and non- rhythmic movement [52]. Themotor redundancy problem of infinite choice in solutions is also very present in musculoskeletal systems. The non-linear coupling and interference arising from muscle arrangements, multi-articulate relationships between joints, tendons, and agonist- antagonist muscles, coupled with the force–length and force–velocity properties of Hill-type muscles [55, 56], contribute to increased complexity inmuscle stimulation solutions within an already sophistic- ated system [57]. Controllers relying on algebraic or numerical kinematic or dynamic models can generate appropriate muscle activations for accur- ate and precise manipulation [58–61]. However, these approaches assume comprehensive a priori knowledge of the entire system, posing challenges when scaling to intricate morphological configur- ations that involve a larger number of muscles with diverse properties, geometry, and arrangements. This limitation constrains their practical applica- tions. Moreover, sudden changes such as muscle failure or fatigue require the reformulation of the model. Consequently, these controllers do not fully exploit the functional redundancy inherent in mus- culoskeletal systems. To obviate the need for explicit models, model- free methods employing machine learning have been employed. Some studies have employed supervised neural networks, which learn the inverse mapping between task and muscle spaces, facilitating coordin- ated control of the end-effector with joint-stiffness control [27, 29, 62–64]. However, these methods necessitate manual tuning of parameters that govern co-contraction, restricting their applicability in the face of unknown payloads or environmental inter- actions. Additionally, these model-free approaches require an additional retraining process to adapt to the system after muscle failure. The only prior work that undergoes this retraining process is con- strained to joint space control [65]. While reinforce- ment learning approaches have proven successful in highly redundant musculoskeletal systems [66–68], they often mandate complex reward function design and laborious retraining after muscle failure. Several approaches rooted in human motor con- trol have emerged, with the majority of these incor- porating machine learning techniques [64, 69–71]. Consequently, they encounter similar drawbacks as model-free methods. CPG-based control has also been explored [53], but it is confined to rhythmic movements. Another noteworthy approach, inspired by the EPH, has been implemented [72]. Thismethod enables movement by altering the system’s equilib- rium but necessitates manual tuning of reflex circuit gains and muscle reference lengths to achieve desired end-effector positions, limiting its application in the task space. In this study, we address the limitations of prior methods by implementing a novel application of a learning-based motor babbling approach to mus- culoskeletal systems, facilitating control directly in the task space [73]. This method is employed to approximate the inverse statics (IS) mapping between muscle activations and the task space for musculo- skeletal structures comprising Hill-type muscle mod- els and bi-articular arrangements. The IS control- ler and its formulation provides an alternative view on how such an internal model could be learnt in biological organisms as well as artificial musculo- skeletal systems, and offers two key contributions. Firstly, it demonstrates resilience to muscle failure. In contrast to prior works [65, 72], our approach does not necessitate additional retraining when muscles experience failure, showcasing efficiency in utilising functional redundancy. Secondly, it autonomously 3 Bioinspir. Biomim. 19 (2024) 046015 E Almanzor et al adapts joint stiffness without human intervention, enabling control of the redundant system even in the presence of unknown payloads. We extend this prop- erty to demonstrate that the system can handle inter- actions with the environment not encountered in the motor-babbling training data without requiring further retraining. The controller approximates the local inversion of forward statics by linearising at the current state using a neural network. Consequently, it calculates muscle activation signals at each time step to reach a desired reference, taking as input only the target and current task space positions and the current muscle activations, under the assumption of quasi-static conditions. The learning and deploy- ment process is straightforward, requiring no prior information about the model, such as the number of muscles, their internal states or configuration, the overall mechanical structure, or the environment. To validate the controller’s performance, we conducted simulations using a musculoskeletal leg. Our exper- imental results indicate that the controller achieves reasonable accuracy in controlling target-reaching and trajectory-tracking tasks, even under variations such as changes in mechanical weight, muscle failure, and interactions with the environment. The following section reviews prior work relat- ing to the control of artificial musculoskeletal sys- tems (section 2). Section 3.1 explains the mus- culoskeletal simulation, the implemented Hill-type muscles (section 3.2), and the formulation of the IS controller (section 3.3). Section 4 describes the exper- imental setup and results. Finally, section 5, gives a discussion, conclusions and scope for future work. 2. Related works In this section, we review the literature on the various methods of control for artificial musculoskeletal sys- tems. Thesemethods fall into the categories ofmodel- based, model-free, and biologically inspired control schemes. 2.1. Model-based control approaches In model-based control [74], several techniques have been developed, employing kinematic and dynamic models based on muscle-joint mappings [75–77]. For instance, Zhao et al [58] successfully controlled a one-degree-of-freedom system driven by mono- articulate pneumatic muscles using a switch-sliding controller. Meanwhile, Kawamura et al [59] devised a numerical joint-space controller for a redundant muscle humanoid neck. Although their method can achieve control even when muscles are deactivated, it is limited to mono-articulate muscles and neces- sitates accurate joint angle estimation via an exten- ded Kalman filter. Tahara et al [60] applied the virtual spring hypothesis to coordinate movement, achieving end-effector positioning in a six-muscle musculoskeletal system with bi-articulate muscles. However, their approach mandates complete know- ledge of parameters such as muscle insertion points and fully accessible joint states. Stanev andMoustakas [61] determined the relationship between kinemat- ics and dynamics using null-space projection. This approach, however, also requires fully accessible states and is difficult to scale to more complex models due to the time complexity of the numerical algorithm. Analytical and numerical methods therefore face scalability challenges in more advanced systems that incorporate bi-articulate Hill-type muscles suscept- ible to change such as breakage or failure. These methods also demand prior knowledge of the sys- tem, assuming conditions like zero gravity, minimal friction, fully accessible muscle and joint states, and minimal interactions, rendering them laborious and impractical to formulate. 2.2. Model-free control approaches Approaches that are model-free [65, 78–81] and hybrid control [26, 29, 63, 82, 83] employ algorithms like supervised or reinforcement learning. 2.2.1. Supervised learning Regarding supervised learning approaches, Masuda et al [62] employed an auto-encoder neural network to simultaneously control their pneumatically actu- ated robot and joint stiffness. They used random motor babbling for learning forward and inverse kin- ematic (IK) mappings. Similarly, Driess et al [27] achieved task space control using goal-babbling and sequential quadratic programming for the inverse, incorporating neural networks. They adjusted joint stiffness through perturbations in the Jacobian null space. Kawaharazuka et al [28] utilised auto-encoders for state estimation, allowing joint stiffness adjust- ments and muscle failure detection. Jantsch et al [29] combined numerical methods and neural networks hierarchically to control a humanoid robot’s kinemat- ics and approximate muscle Jacobian. Tahara et al [63] implemented a hybrid iterative learning control approach tolerant to sensor noise in muscle and task space. Zhong et al [64] proposed using convex hull theory to reduce redundant muscles, simplifying the system without sacrificing motion accuracy. Other works applied Gaussian process regression to model forward dynamics in agonist-antagonist pneumatic- ally driven joints [26]. The primary advantage of model-free approaches lies in their ability to eliminate explicit formu- lations, allowing for scalability to multi-articulate muscles. Consequently, the majority of these studies [27, 29, 62–64] enable task space control without necessitating knowledge of muscle arrangement. However, while they can achieve joint-stiffness con- trol by modulating muscle co-contractions, they rely on human intervention for parameter tuning, restricting their applicability in scenarios involving 4 Bioinspir. Biomim. 19 (2024) 046015 E Almanzor et al unknown payloads or interactions with the environ- ment. Furthermore, akin tomodel-based approaches, they fail to fully exploit the functional redundancy offered by multiple muscles. Notably, among the reviewed literature, the work of Kawahazura et al [65] is the sole learning approach that addresses this limit- ation. Nevertheless, their implementation mandates an online relearning phase for robot control post- muscle failure. Additionally, their method requires a known geometric model to initialize their autoen- coder network, along with internal states such as muscle length, and is confined to joint control. 2.2.2. Reinforcement learning In reinforcement learning, Balaghie et al [84] used neuro-fuzzy controllers updated by critic agents for each muscle in a simulation musculoskeletal arm. The controller tolerated up to 20% uncertainties in arm mass, inertia, and length for task space control. Another study [30] developed a deep Q network- based controller for a musculoskeletal arm model, employing a phased target-learning framework and human-inspired noise for stable and efficient explor- ation of the solution space. Complex reinforce- ment learning algorithms such as Deep Deterministic Policy Gradient [67], proximal policy optimisation [68], and Soft Actor-Critics [85] have also shown promising control results. However, reinforcement learning faces optim- isation challenges due to the vast solution space, requiring complex reward function design, fully accessible muscle states, high-dimensional inputs, and specific initial conditions, leading to a computa- tionally intensive and laborious training process [67]. Unstable early explorations may result in unstable actions, risking unsafe movements and potential damage to the system or the environment. 2.3. Biologically-inspired control approaches Regarding methods inspired by human motor con- trol, the minimum jerk model and its variations have been employed for trajectory optimization, gen- erating human-like motion for rigid robots [86, 87]. However, these methods necessitate a model of the system which results in similar disadvant- ages as model-based approaches. The concept of motion primitives was also applied to musculo- skeletal systems [88]. Campbell et al [69] used a Bayesian Interaction Primitives method for open- loop trajectories without an analytical model. Zhong et al [64] employed Hebbian learning in recur- rent neural networks to modulate control signals. Fu et al [70] extracted motor synergies from non- optimal motor babbling data, using them for inverse dynamics in tracking control. Another approach used radial basis function neural networks to mod- ulate muscle synergy amplitudes and timescales for target-reaching [89]. These approaches utilise imit- ation learning and hence require demonstrations to bootstrap the learning. Another biologically inspired approach involves an emotion-modulated learning rule for recurrent neural networks, dynamically adjusting hyperparameters based on reward entropy [90]. This enhanced the accuracy of point-reaching tasks for their musculoskeletal arm. Likewise, Chen et al [57] developed a motor cortex-inspired recur- rent neural network with reward-modulated learn- ing, exhibiting the ability to learn new tasks without catastrophic forgetting. As these approaches incor- porate machine learning, they exhibit the same drawbacks as the model-free techniques. CPGs have also been employed for coordinated motion such as playing instruments [53] and locomotion [91– 93]. CPG-based approaches, while not dependent on kinematic or dynamic models, are restricted to rhythmic motions. Drawing inspiration from the EPH, Marques et al [72] employed Hebbian learn- ing for reflex behaviour in musculoskeletal legs, accomplishing stable hopping and point-to-point reaching tasks via manual tuning of control gains. Despite its ability to accommodate muscle failure, this method necessitates re-tuning of reflex circuits. Moreover, it lacks the inverse mapping between task andmuscle space,making it unable to directly control the end-effector. 3. Methods 3.1. Musculoskeletal leg model To verify the performance and advantages of our learning-based feedback controller in musculo- skeletal systems for manipulation tasks, we based the structure of our simulation experimental platform on the seminal neuro-muscular studies by Geyer and Herr [94], and Marques et al [72] which utilised a leg structure. We employed MATLAB & Simscape to construct the musculoskeletal leg, consisting of three rigid seg- ments: the pelvis, femur, and tibia. The masses of the femur and tibia are 11 and 5 kg, respectively. The pelvis mass is set to zero during training data collec- tion and all experiments as the pelvis is simulated as a weld joint, except for the squatting experiment in section 4.3.4, where the pelvis mass varies between 1,3,5 kg. The femur and tibia lengths denoted as Lf and Lt, are 0.4 and 0.34 m, respectively, and utilise the centre of the 3D file geometry as the centre of mass. The hip and knee joints are simulated as revol- ute joints. Despite the inherent 3D nature of the leg, the movements are confined to the 2D sagittal plane. The leg is actuated by a total of six muscles. The organization, positioning, and tendon linkages of these muscles draw inspiration from musculo- skeletal models outlined in [72, 94]. Employing Hill- type models (refer to section 3.2) and the Simscape Belts and Cables utilities, we simulate the muscles, assuming rigid tendons (refer to figure 1). At the 5 Bioinspir. Biomim. 19 (2024) 046015 E Almanzor et al Figure 2. Left: the main experimental setup for the majority of the end-effector control experiments. The pelvis is mounted above the blue floor such that the end-effector is free to move within the leg’s operating space. Right: the musculoskeletal leg model used in the squatting experiment in section 4.3.4. The leg’s end-effector is replaced with a flat plate with a spatial contact force against the blue simulation floor. A flat plate was used to prevent any slippage. The angle has stiffness and damping included to maintain stability due to the lack of other muscles. muscle’s actuating end, the cumulative force, com- prising active, passive, and damping forces, from any givenHill-typemusclemodel is converted into torque and directed to a revolute winch. This winch drives the muscle spool, which, in turn, is linked to simula- tion cables, representing rigid tendons that connect to their respective skeleton segments. To ensure mech- anical stability and prevent simulation crashes, pul- leys were introduced to avoid collisions between the cables and rigid bodies. Two pairs of mono-articulate agonist-antagonist muscles are used to actuate the hip and knee joints directly. The iliacus (IL) and gluteus maximus (GM) are used for the hip joint, and the vastus interme- dius (VA) and short biceps (SB) are used for the knee joint. Another agonist-antagonist pair is used for bi- articulation which has their winches placed on the pelvis, and the cable ends on the tibia. These are the rectus femoris (RF) and the long biceps (LB). For the motor babbling training data collection and themajority of the experiments, the pelvis is fixed in place using a weld joint, such that the rest of the leg is free to move without any interactions with the environment (see figure 2, left). Both the hip and knee joints include a damping of 0.3 Nms deg−1 to remove any mechanical oscillations during actuation and maintain the quasi-static assumption. A stiffness of 0.5 Nmdeg−1 is added to the knee joint to set the equilibriumposition of the tibia to 45◦. This was done to allow greater motion for the tibia during motor babbling, particularly when the SB and VA muscles are exerting similar amounts of force. For the final experiment involving a squatting motion, the foot is replaced by a simple rectangular plate attached to a revolute ankle joint, which has a spatial contact force model with a floor (see figure 2, right). The intent of the experiment is not to test the dynamic capability of the controller such as upright balancing, but rather to test its behaviour under unseen quasi-static environmental interac- tions. Hence, the pelvis mounting is also replaced with a prismatic joint such that it is only free to move up and down in the Z direction. The ankle joint in the model approximates the absence of ankle muscles by having stiffness and damping values set at 5 Nms deg−1 and 3 Nmdeg−1, respectively, with an equilibrium point at 17◦, corresponding to the leg’s initial squatting posture. These values were found to give sufficient mechanical support to the rest of the leg during the squatting motion. The contact force between the feet and the solid ground is modelled with parameters 1× 109 N m−1 for stiffness, 1 N s m−1 for damping, and a value of 1 for the coefficient of static friction. The length, width and thickness of the plate are 0.3 m, and 0.02 m respectively, with a mass of 100 kg. These values were found empirically and were used to prevent the leg from slipping. 3.2. Hill-type muscle model Themuscle-tendon units (MTU) actuating the leg are modelled asHill-typemuscles; the standard approach for approximating biological humanmuscles [55, 56]. Muscle j receives muscle activation signal aj, and out- puts the totalmuscle force F j MTU (equation (1)). Each muscle has a contractile, a spring and a damper ele- ment, arranged in parallel (see figure 3). The con- tractile element provides the active contraction force based on the muscle activation signal aj to produce 6 Bioinspir. Biomim. 19 (2024) 046015 E Almanzor et al the force FCE. On the other hand, the spring and damper elements are responsible for the passive forces of themuscle, and are dependent on themuscle length and velocity to produce forces Fspring and Fdamper, respectively. The spring element models the natural behaviour of real muscles which tries to contract back to the resting length whenever it experiences exten- sion from external forces. The damping element res- ists the rate of extension or contraction [95] F j MTU (̂ l(t) , v̂(t) ,a(t) ) = FCE (̂ l(t) , v̂(t) ,a(t) ) + Fspring (̂ l(t) ) + Fdamper (v̂(t)) . (1) Hence, similar to biological muscles, the total out- put force FMTU of the muscle is not only due to the muscle activation of the contractile element, but also the spring and damper elements which gives the muscles passive dynamics. As a result, each muscle can also produce passive forces without any muscle activation. These components greatly affects the com- plexity and hence the overall behaviour of the mus- culoskeletal leg unlike certain tendon-driven artificial systems, which do not explicitly address the spring and damper elements [28]. Muscles also affect one another. For example the force from the antagonist will directly affect the velocity of the agonist. The force generation of the contractile element is modelled as follows: FCE (̂ l(t) , v̂(t) ,a(t) ) = FMax · aj (t) · fl (̂ l(t) ) · fv (v̂(t)) (2) where FMax is the maximum isometric force of the muscle, l̂(t) is the muscle fiber length normalised to the initial resting length lref, and v̂ is the fiber velo- city which is normalized to the maximum fiber velo- city vref. The values used for lref are the initial lengths when the leg is at its resting posture (see figure 2, left). The values of vref were found empirically by testing the maximum velocity of the individual muscles. For FMax, we used empirical values that correspond to the joint limits of the leg when the muscle activations aj are equal to 1. The values for lref, vref, and FMax were chosen empirically such that they allowed for a wide range of motions and are more specific to our mus- culoskeletal system (see table 6, appendix). The function fl(̂l(t)) is the muscle active force– length behaviour, which is modelled using a nor- mal distribution (equation (3)). The function fv(v̂(t)) is the active force–velocity behaviour of the muscle, which we modelled using a sigmoid function (equation (4)) fl (̂ l(t) ) = 1√ 2πσ exp − (̂ l(t)−µ )2 2σ2  (3) fv (v̂(t)) = A 1− ebv̂(t) (4) where the mean µ and the variance σ2 of the normal distribution, both set to a value of 1, and the amp- litude A, and gain b of the sigmoid function set to values of 1.8 and 5 respectively, such that the shape of fl(̂l(t)) and fv(v̂(t)) reflects the typical normalised act- ive force–length and force–velocity characteristics of the contractile element in priorworks [56, 95, 96]. See figure 3(b), for the force–length and force–velocities curves. The force output of the contractile element is therefore dependent on the current time-stepsmuscle activation aj, and the length and velocity of its fibers (equation (2)). The muscle activation a scales fc and is constrained within 0⩽ aj ⩽ 1. The force generated by the passive spring element (i.e. passive force–length characteristic), is modelled as a rectified linear function which increases linearly with K, the (unit-less) spring constant, if the norm- alised length l̂(t) is greater than 1 (equation (5)). For K, we used a value of 1 as muscles can only produce contraction forces, as well as to reflect values used in prior literature [72, 97] Fspring (̂ l(t) ) = FMax · { K̂l(t) ( if l̂> 1 ) 0 (otherwise) . (5) Similarly, we also modelled the passive damp- ing (i.e. passive force–velocity characteristic) as a lin- ear function as given by equation (6), which uses the (unit-less) damping coefficient C= 1 based on prior works [72, 95], as well as to prevent Hill-type muscles from exhibiting unrealistic high-frequency oscillations during the simulation Fdamper (v̂(t)) = FMax ·Cv̂(t) . (6) Whilst the simulation model may differ in spe- cific values to biological structures, the musculo- skeletal simulation itself is a vital abstracted model which captures the important characteristics of the musculoskeletal systems such as the highly non- linear behaviour, passive compliance and redund- ancy, arising from the Hill-type the mono and bi- articulate agonist-antagonist muscle pairs. 3.3. Inverse static mapping between activations and end-effector positions In general, musculoskeletal systems featuring Hill- type muscles exhibit mappings between the muscle space, the joint space, and the end-effector space [31, 63]. The forward kinematics mapping which relates the joint angles θ to the end-effector position x is given by equation (7). The forward statics mapping which relates the MTU forces FMTU to θ is given by equation (8). The forward mapping which relates the muscle activations a to FMTU is given by equation (1) in section 3.2. 7 Bioinspir. Biomim. 19 (2024) 046015 E Almanzor et al Figure 3. (a) Schematic of the 3-element Hill-type muscle model. (b) Characteristics of the Hill-type muscle model. On the left is the normalised force output against the normalised length of the muscle under steady-state conditions. The passive force–length is produced by the spring, and the active force–length is produced by the contractile element. On the right is the normalised force against the normalised velocity of the muscle. The normalised active force–velocity curve, along with the muscle activation, scales the contractile force. The passive force–velocity curve is due to the passive damper of the muscle. x= f1 (θ) (7) θ = f2 (FMTU) . (8) The complete forward statics mapping for an arbitrary musculoskeletal system is therefore given by equation (9) x= f(a) . (9) Hence, the inverse mapping would be to find the muscle activation solutions a for a given end-effector pose x. We apply a learning-based inverse control approach [98–100], which assumes steady-state con- ditions. Obtaining the inverse of this function is non- trivial, due to the redundancy of solutions. These solutions are non-convexwhichmakes direct learning of inversions unattainable [79]. However, by linear- ising and taking the derivative at some arbitrary actu- ator state a0, the forward differential mapping can then be obtained, as given in equation (10), where J is the Jacobian that maps actuator velocities ȧ to state velocities ẋ ẋ= J ( a0 ) ȧ (10) Discretisation of equation (10) using Taylor expansion up to the first-order, leads to equation (11). After some rearranging (equation (12)), the general- ised Jacobian inverse G can be obtained, which maps the current system and actuator state at time-step i, xi and ai respectively, and the system state at time-step i+ 1, xi+1, to the next time-steps actuator state, ai+1, as given by equation (13) ∆x≈ J ( a0 ) ∆a (11) J(ai)ai+1 ≈ xi+1 − f(ai)+ J(ai)ai (12) ai+1 = G(xi+1 − xi)+ ai. (13) Then through the process of motor babbling where the system undergoes random actuator movement using randomised muscle activations a, the following mapping (equation (14)) can be approximately learnt by a neural network: (xi+1,xi,ai)→ (ai+1) (14) providing spatial locality is ensured, where |ai+1 − ai|< ϵ. The ϵ is recommended to be within 5% to 15% of the actuator input range [98]. However, in our case, due to the non-linear behaviour of the Hill-typemuscle, we increased this constraint to 30%. When the neural network is implemented as a con- troller (see figure 4), it should converge the system to the state xi+1, providing it is physically and geomet- rically reachable. Hence the xi+1 acts as the reference state for the controller. Unlike rigid robots, soft and musculoskeletal systems have compliance with envir- onmental obstacles, hence, the current state xi is also dependent on the environment. Due to the general- isability of neural networks, it is expected to control the system even without exploring the whole of the system’s actuator space. As we are mainly interested in the generation of activations that result in suitable muscle coordina- tion for a given desired target, we deliberately kept the motion constrained in the Saggital plane. Therefore, with the addition of the knee joint limits, there are unique joint angle posture solutions for each desired end-effector target position. This allows for the direct analysis of the controller’s behaviour regarding find- ing solutions to the muscle redundancy problem. To track the overall posture of the leg, we outputted the X and Z coordinate positions of the foot end-effector, as represented by the solid sphere in figure 2. 4. Experimental setup and results 4.1. Model-free controller training and implementation To obtain the controller, we use a feedforward neural network, which has three hidden layers, implemented in PyTorch. Each layer has 1000 nodes. This network 8 Bioinspir. Biomim. 19 (2024) 046015 E Almanzor et al Figure 4. The implementation of the closed-loop model-free controller. The controller takes in the current state, xi = {Xi} which encodes the foot position, the desired final target position xi+1 = {Xi+1}, and the current muscle activations ai = {aLBi ,aGMi ,aSBi ,aVAi ,aILi ,a RF i }. The controller then outputs the next time-steps muscle activations ai+1, which brings the leg closer to the target posture provided it is kinematically feasible. architecture was built incrementally by adding hid- den layers until the root mean squared error on a small 20% training data subset of themodel with only one muscle enabled, the IL in our case, no longer improved. Except for changing the number of the input and the output nodes, we used the same net- work architecture for each leg configuration, e.g. the full 6-Muscle model uses the same architecture as the IL-only model. We intentionally retrained a new net- work for each muscle configuration to test the per- formance of the control approach as the musculo- skeletal leg increased in complexity.When implemen- ted, the controller receives the concatenated vector of the current muscle activation signals ai, current states xi, and desired reference states xi+1 at time step i (see figure 4). The controller then outputs the muscle activation signals for the next time step ai+1, such that the desired reference state is realised. The controller is run at 500 Hz. A low-pass filter with a gain of 0.01 is also applied to the output of the controller to main- tain the quasi-static conditions. For each control network corresponding to the different muscle configurations, we trained for 500 epochs using the mean squared error loss function, a 512 batch size and the ADAM optimizer. The ini- tial learning rate was 0.0001, with a momentum of 0.9. Although we did not search for the optimum parameters such as the control and motor-babbling frequency, learning hyper-parameters, as well as the amount of training data required, we have found that these settings resulted in reasonable performance for the controller. We conducted various experiments to validate the model-free inverse statics (IS) controller. The complexity of the musculoskeletal leg was systemat- ically increased by incrementally adding muscles to determine their effects on the controller’s perform- ance. This led to six muscle configuration cases. In the order of the experiments conducted, the muscle configuration cases were the iliacus only (IL-only), the gluteus maximus only (GM-only), the IL with the GM (IL+GM), the short biceps only (SB-only), the vastus intermedius only (VA-only), the SB with the VA (SB+VA), and finally with all the muscles enabled, including the bi-articulate rectus femoris (RF) and the long biceps (LB), whichwe refer to as the six-muscle case. To obtain the training data, we generated quasi- static motor babbling data at a frequency of 1 Hz, where the musculoskeletal leg would move from one random posture to another. For each posture, we randomly generated a final muscle activation a per muscle, with a ramp duration that varies between 2⩽ t⩽ 10 from the previous posture’s activation. The new randomly generated activations are within 30% of the maximum activation value of 1. The final muscle activation is then kept at this value with a dur- ation 5⩽ t⩽ 13, to ensure steady-state behaviours are included in the training data. A smoothing filter is applied to the ramp and steady activations to pre- vent jerky movements. We generated 5000 s of train- ing data for the single andmuscle-pairs configuration (IL-only, GM-only, IL andGMpairs, SB-only, and VA and SB pairs), with 1000 s for validation. Due to the increased complexity of the leg with both mono and bi-articulate pairs, we increased the amount of train- ing data to 10000 s, with 2000 s of validation. The random foot positions reached bymotor babbling for 9 Bioinspir. Biomim. 19 (2024) 046015 E Almanzor et al Figure 5. The task-space X and Z positions reached by the leg’s end-effector during the motor babbling training data collection. For each muscle configuration, the range of the workspace is shown, with the initial posture of the corresponding leg model at the center for reference. Note that for the mono-articulate muscles which control the hip case, we fixed the knee at 45◦. Similarly, we fixed the hip angle at 0◦, when knee-related mono-articulate muscles were explored. Table 1. The mean average errors and their standard deviations for the 25 random target states for the mono-articulate muscles which have direct actuation over the hip and knee joints. The values are given as a percentage with respect to the overall length of the leg (0.74 m). Hip Muscles IL-only GM-only IL+GM IS Controller PI IS Controller PI IS Controller PI X Error (%) 0.23± 0.08 0.12± 0.18 0.08± 0.03 0.03± 0.03 0.19± 0.19 0.14± 0.23 Z Error (%) 0.09± 0.09 0.11± 0.20 0.12± 0.01 0.05± 0.04 0.08± 0.08 0.15± 0.43 Knee Muscles SB-only VA-only SB+VA X Error (%) 0.27± 0.05 0.00± 0.00 0.19± 0.07 0.00± 0.00 0.11± 0.11 0.00± 0.00 Z Error (%) 0.39± 0.01 0.00± 0.00 0.07± 0.03 0.00± 0.00 0.09± 0.11 0.00± 0.00 the six different muscle configurations are given in figure 5. 4.2. Hip and knee control experiments First, we tested the performance of the controller on the muscles which have direct authority over the hip and knee joints. For the hip joint control, we started by evaluating the IL-only and the GM-only cases, with all the other muscles removed from the simulation leg. This was done such that we could compare the behaviour of the model-free controller against a simple proportional-integral (PI) feedback controller. Then with increased leg complexity, we evaluated the controller’s performance when both the IL and the GM were enabled as a mono-articulate agonist-antagonist pair (IL+GM), and again com- pared to a PI controller. The P and I gains used for each muscle were 1, and 2, respectively. These values resulted in no overshoot, minimal steady-state errors and a rise-time of less than 5 s. For each of the three cases, we generated different 25 random target states within the leg’s operating space to obtain the average performance of the controller on position-reaching tasks. We then conducted similar experiments on the muscle cases VA, SB, and VA+SB which control the knee joint. Similar to the hip experiments, we evalu- ated different sets of 25 random target points against the simple PI controller. For these experiments, the initial state of the leg is where the hip joint is set to 0◦, with the knee angle set to 45◦ (see figure 5). Table 1 displays the mean absolute errors and standard deviations between the attained final steady states and the sets of 25 randomly selected target states for the cases involving the mono-articulate muscles. In general, the IS controller achieves relat- ively low steady-state errors for the various muscle cases, however, it does under-perform relative to the PI controller. The point of implementing the PI con- troller was not for a direct performance comparison, but to assess the generated muscle activations. Note that for the PI controller, we use the joint angles as the reference and feedback states to move the leg to the 25 target positions. 10 Bioinspir. Biomim. 19 (2024) 046015 E Almanzor et al Figure 6. The resulting muscle activation signals for the IL+GM and VA+SB cases using the learning-based IS and the PI controllers for arbitrary examples within the target test data. The PI controller only activates the single muscle that moves to the target, whilst the IS controller utilises both agonist-antagonist muscles to achieve the same posture. An interesting behaviour can be seen in figure 6, where for the same target posture, the IS controller utilises both agonist-antagonist muscles to realise the target, whereas the PI controller would only activ- ate the single muscle that actuates the leg within that region. This shows that the IS controller is capable of utilisingmuscle redundancy and can automatically adjust joint stiffness without any human intervention unlike othermethods [28]. This capability of control- ler is the result of learning frommotor babbling train- ing data, which included various co-contraction levels of agonist-antagonistmuscle pairs.Hence, due to iter- ative nature of the formulation, the controller is able to infer appropriate co-contraction levels at each iter- ation to reach the target position by minimising the error between xi+1 and xi. Note that although the musculoskeletal leg only has a single degrees of freedom for the agonist (such as GM) and the agonist-antagonist (such as IL+GM) muscle configuration, a pure agonist model would not result in the same strategy compared to the agonist-antagonist configuration. For example, the addition of the IL muscle (antagonist to the GM), induces additional and opposing torque to the hip joint from either the passive forces (from the vis- coelastic spring and damper elements) or the con- tractile element triggered by the muscle activation. Hence, a GM only model would need less muscle force, and therefore muscle activation, to bring it closer to the target. These advantages are further proven in later experiments with the complete leg, such as muscle failure (see section 4.3.3) and varying payloads with and without external interactions (see sections 4.3.2 and 4.3.4). Due to the complexity of the complete leg with the mono and bi-articulate muscles, we no longer compare it to the PI controller in later exper- iments, as that would require more complex models and feedback controllers [58–61, 65, 75–77]. Table 2.Mean Average Error and their Standard Deviations for the 25 random target points for the complete Six-Muscle configuration. The values are given as a percentage with respect to the overall length of the leg (0.74 m). X Error (%) Z Error (%) 2.05± 1.88 1.22± 1.27 4.3. Six-muscle leg control experiments For the complete six-muscle case, we conducted six different experiments using the same network without any retraining. The first three experiments were to determine the general performance of the controller when coordinating themuscles of themus- culoskeletal leg for general manipulation without any external interactions. The hip of the system is moun- ted and the leg is free to move. The latter three exper- iments were performed to determine the behaviour of the controller when changes to the leg occur, such as varying payloads due to changes in mechanical structure, muscle failure, and even environmental interactions. 4.3.1. Randomised target postures, randomised initial postures and trajectory tracking Similar to the hip and knee joint experiments, this first six-muscle experiment was performed to obtain the average performance of the controller on 25 random target postures within the leg’s workspace to investigate basic control capabilities when bi- articulate muscles are included. As seen in table 2, the mean average errors and standard deviations are approximately an order of magnitude higher than the simple mono-articulate cases (see table 1). This is likely due to the compounding effects of muscle error during the controller training process, which propag- ates to the overall structure with all muscles included. A second experiment was performed to determ- ine the generalisability of the controller for 11 Bioinspir. Biomim. 19 (2024) 046015 E Almanzor et al Table 3.mean average error and their standard deviations for 3 random target points when the leg is initialised at 10 random postures. The values are given as a percentage with respect to the overall length of the leg (0.74 m). Target 1 Target 2 Target 3 X Error (%) 3.72± 3.69 3.26± 3.58 5.04± 3.32 Z Error (%) 2.19± 1.80 1.01± 1.00 3.59± 1.31 Figure 7. An example of X and Z posture reaching target performance and the generated muscle activations for when the leg starts at arbitrary random initial postures for Target 1. The controller can generate diverse muscle synergies to reach the target, adapting based on the initial leg position. This demonstrates the controller’s capability to identify appropriate global solutions for the redundancy problem. Table 3 displays the mean percentage errors corresponding to this experiment.. point-to-point reaching tasks, specifically when the leg starts from an arbitrary posture. Here, we chose three random target postures but initialised the leg at 10 random starting postures. Table 3 shows the steady-state mean average errors of the reaching tasks from random starting points. There is a slight reduc- tion in performance relative to the previous exper- iment (table 2), where the leg is initialised from the resting posture. This shows that the errors are not only dependent on the initial position of the end-effector but also on the target. Nevertheless, the errors in the X and Z axes are on average approxim- ately ⩽ 5% relative to the overall length of the leg. This shows that the controller is easily scalable to more complex mechanical structures with coupled and non-linear interactions between the actuation muscles without significant reductions in perform- ance. Furthermore, the standard deviations are relat- ively low, which indicates that the leg model reached the same target position from significantly different starting points with reasonable accuracy. Figure 7 shows an example of the leg incorpor- ating different muscle activations, and hence muscle coordination, for reaching the same target position from various initial postures. In the third experiment, we assessed the control- ler’s performance in tracking a specified trajectory. An ellipse was employed to evaluate how well the controller handled paths with both large and small curvatures. The ellipse has the radii of 8 and 16 cm in the X and Z axes respectively. The centre of the ellipse is at X= 8 cm and Z=−65 cm within the legs workspace. We also discretised the ellipse into 90 target positions. To establish appropriate control parameters for the controller, we modified the refer- ence posture xi+1 with either a 1 s or 15 s time delay. This adjustment aimed to investigate the system’s response under conditions where the target update aligns with the frequency of motor babbling train- ing data at 1 Hz and significantly slower intervals at 0.067 Hz (every 15 s). The resulting behaviour for the 1 and 15 s time delays are given in figure 8(a). The mean average res- ultant error for the entire tracking of the ellipse is given in table 4. The errors were calculated by taking the difference between the position reached by the leg at every 1 and 15 s. These are denoted by the crosses in the figure. It can be seen that the shorter time delay results in smoother trajectories with lower errors. The saw-tooth pattern in the trajectory observed with the 15 s time delay can likely be attributed to the interplay of non-linearmuscle behavior, gravitational effects, and suboptimal convergence characteristics of the controller. We also tested the repeatability of the trajectory tracking. Figure 8(b) shows the resulting trajectory when the controller is instructed to perform the same ellipse five times using the 1 s time delay. Variations 12 Bioinspir. Biomim. 19 (2024) 046015 E Almanzor et al Figure 8. Tracking of an arbitrary ellipse with 90 way-points, within the leg’s operating workspace. (a) Tracking performance when the time delay to the next target xi is varied between 1 and 15 s. A smoother trajectory is achieved with a smaller time delay. Table 4 displays the corresponding mean percentage errors. (b) Tracking repeatability performance when the leg is cycled through the ellipse five times. The controller performs the same trajectory between the third and fifth cycles. Table 5 displays the corresponding mean percentage errors.. Table 4. The effects of the time delay on the mean average resultant error and standard deviations in the X and Z axes for entire tracking of the ellipse trajectory given in figure 8(a). The final position after each time delay is compared with the individual waypoints of the desired trajectory to obtain the errors. The values are given as a percentage with respect to the overall length of the leg (0.74 m). Time delay 1 s 15 s Error (%) 1.92± 0.73 2.20± 1.92 in the resulting trajectory are evident only between the first and third cycles, while the fourth and fifth cycles replicate the third cycle. This indicates the con- troller’s sensitivity to changes in the leg’s initial pos- ition during trajectory tracking. In the absence of sensor noise in the simulation and the deterministic nature of feedforward neural networks, the leg con- verged to a repeated final position by the end of the second cycle. This final position then serves as the starting point for the subsequent cycle, resulting in the repetition of the same trajectory from the third cycle onward. It is noteworthy, however, that minimal errors persist and remain consistent with the earlier experiments. 4.3.2. Performance under payload changes For this fourth experiment, we varied the weight of the leg’s skeleton to mimic mechanical changes that can occur to physical systems. For example, the replacement of parts orwhen the it is loadedwith aux- iliary components such as a portable power source. We varied the mass between ±20% of the original weight. We again obtained the performance of the controller for 25 random target postures. Figure 9 compares the steady-state mean aver- age errors of the target reaching tasks with different bone weights. From figure 9(a), it can be seen that the higher the deviation from the original weight, the larger the error in both the X and Z axes. Figure 9(b) shows the trajectory for an arbitrary example. However, their standard deviations greatly overlap, suggesting that the changes in structural payload have a minute effect on the control per- formance accuracy. Hence, the controller can accom- modate such changes without retraining and without including such information in the training data. This is shown in figure 9(c), where the controller automat- ically adjusts the required muscle activations for all muscles, and hence, applies an appropriate stiffness of the joints, without any further intervention. 4.3.3. Performance under muscle failure Musculoskeletal systems have the advantage ofmuscle redundancy. If a muscle undergoes fatigue or failure, it is possible to achieve the same motion by incor- porating other muscles. Hence, for this experiment, we determine the generalisation of the learning-based controller when muscles are disabled in the model. We again obtained results for 25 random target pos- tures. Note that the disabled muscle is still present within the simulation, meaning it still has passive forces from the spring and damping elements, how- ever, it does not contract when given muscle activ- ation a, similar to partial paralysis induced by stroke [19]. Stroke patients experience difficulties in regulat- ing stretch reflex thresholds of muscles. This directly affects muscle activations and coordination, resulting in partial actuation [20, 21, 101]. Figure 10(a) compares the steady-state mean average errors when disabling each of the six muscles individually. Depending on which muscle was dis- abled, the error variations differed greatly. For instance, disabling the GM muscle considerably increased errors of the X and Z positions. On the other hand, disabling the VA muscle induced almost no change in the error for all target states. This sug- gests that certain muscles are more prominent than others. Nevertheless, the controller can adapt to fail- ures of certain muscles without retraining. This can be seen in the activation graphs in figure 10(b), wheremuscles that are turned off are compensated by increasing the activation of the other muscles. 13 Bioinspir. Biomim. 19 (2024) 046015 E Almanzor et al Table 5.mean average resultant Error in the X and Z axes and their standard deviations for the different cycles of the ellipse trajectory tracking given in figure 8(b). The final position after each time delay is compared with the individual waypoints of the desired trajectory to obtain the errors. The values are given as a percentage with respect to the overall length of the leg (0.74 m). Number of cycles 1st Cycle 2nd Cycle 3rd Cycle 4th Cycle 5th Cycle Error (%) 1.92± 0.73 3.31± 2.22 2.41± 1.39 2.41± 1.39 2.41± 1.39 Figure 9. (a): Mean average error and the standard deviations for the various weight deviations of the femur and the tibia relative to the original weight. The Standard Deviations overlap greatly suggesting no significant changes in the controller performance. The values are given as a percentage with respect to the overall length of the leg (0.74 (m). (b) An example trajectory for the minimum (−20%) and maximum weight (20%) deviations against the original weight (0%). (c) Example muscle activations for the example given in (b). The IS controller automatically changes the required muscle activation to accommodate lighter and heavier structures. The IS controller determines whether a muscle is deactivated based on the current state. For example, for the disabled GM muscle (see figure 10(b)), an arbitrary current input to the network at time t= i is equivalent to (xi+1,xi,ai) = (xi+1,xi,{aLBi ,aGMi = 0, ..,aRFi }). As the controller has been trained to min- imise the error between xi+1 (the reference) and xi (the current end-effector position), the corres- ponding output muscle activations for the next time step ai+1, automatically scales the muscle activations for the non-deactivated muscles. This is the result of the learnt pseudo-inverse approximation of the local Jacobian which contains approximated map- pings between xi,xi+1,ai to ai+1 in various task-space positions. We also tested the performance of the system when up to three muscles were disabled (figure 11). It is reasonable that the more muscles are disabled, the higher the error due to more limited movement. For example, disabling the SB and LB muscles removes the ability to perform knee flexion. 4.3.4. Performance under external physical interaction For the final experiment, the controller was evalu- ated if it can tolerate interactions with the envir- onment. The leg was controlled to track a squat- ting motion trajectory, with its foot always in contact with the ground (see section 3.1 for the experimental setup). The target reference posture cycles between −68 cm, where the end-effector is further from the 14 Bioinspir. Biomim. 19 (2024) 046015 E Almanzor et al Figure 10. (a) The mean average error and standard deviations of 25 target states for the cases where each muscle has the contractile element disabled. The values are given as a percentage with respect to the overall length of the leg (0.74 (m). The disabling of the GM resulted in the highest error which suggests importance in the overall structure. (b) An arbitrary example of muscle activations for the disabled GM and VA cases for the same target state. The IS controller was able to compensate for the disabled muscles by increasing the muscle activation of the others, thereby showing the utilisation of muscle redundancy. Figure 11. The mean average error and standard deviations of 25 target states for up to three muscles with the contractile element disabled. The values are given as a percentage with respect to the overall length of the leg (0.74 m). As expected, the more muscles are disabled, the higher the error. Removal of both the SB and the LB removes the capability to perform knee flexion. pelvis coordinate axes resulting in a stood-up posture, and−58 cm which results in the squat posture in the Z axis. The target X axis is kept at zero. The refer- ence is updated after every 15 s to give the controller enough time to stabilise due to quasi-static assump- tions. This experiment was conducted with different payloads by varying the weight of the pelvis to 1 kg, 3 kg, and 5 kg. Figure 12(a) shows the trajectory of the pelvis against the squatting target reference. No slipping was observed during all squatting. For all payload settings, the IS controller achieved stable squatting without any retraining, even though the training pro- cess did not address interaction with the environ- ment. Reasonably accurate tracking performance was achieved when the payload was at 1 kg. A slight reduction in tracking performance is seen, as the payload is increased to 3 and 5 kg. However, an inter- esting behaviour can be seen in the muscle activation graphs at the bottom of figure 12(b) as the payload is increased. These graphs show that the controller automatically outputs the appropriate muscle activ- ation signals depending on the payload similar to our weight deviation and muscle redundancy exper- iments in sections 4.3.2 and 4.3.3. For example, as the payload gets heavier, the higher the RF activa- tion signal the controller outputs at a standing pos- ition to maintain the posture. Similarly, the control- ler incorporates higher SB and GM activations for the heaviest payload at 5 kg, whilst decreasing the VA and LB activations during the squat posture. Thus, the controller can automatically adapt the leg’s stiff- ness required for payloads at least 30% of the overall weight by coordinating andutilising the redundancies 15 Bioinspir. Biomim. 19 (2024) 046015 E Almanzor et al accordingly. Future work will examine the limits of this behaviour. 5. Discussion and conclusions Musculoskeletal systems featuring agonist-antagonist pairs of muscles offer the advantages of functional redundancy, enabling compensation for muscle fail- ures, and allowing for joint-stiffness adjustment by varying co-contraction levels of muscle pairs. This feature is particularly useful for coordinating move- ment under diverse payloads. However, akin to their biological counterparts, these systems also encounter the motor equivalence problem, which posits that coordination and movement have infinitely many solutions. In this study, we introduce the model-free inverse statics (IS) controller, capable of generating suitable and appropriate muscle activations depending on the task requirement. This enables the musculoskeletal leg to exhibit coordinated behaviour even in the pres- ence of muscle failure and varying payloads without the need for retraining, distinguishing it from other model-free or bio-inspired approaches. The IS controller demonstrated reasonable accur- acy, displaying minimal error deviations across all control experiments. The experimental results have shown that the controller is easily scalable from simple mono-articulate cases to more complex struc- tures that augment the system with bi-articulate muscles without amending the network architec- ture. Through the straightforward process of train- ing on new motor-babbling data on the same archi- tecture, deployment and control of the system were achieved without requiring any assumptions about the arrangement, the muscle model, and the coupling and interference of the muscles, unlike model-based methods. The controller effectively generated appro- priatemuscle activations to reach desired targets, even when faced with redundancy in solutions within the actuator and task (end-effector) space. This was val- idated in the first three experiments involving the complete six-muscle case. Consistently low errors and standard deviations were achieved in target-reaching tasks, irrespective of the initial state of the leg, as well as in trajectory tracking. While the controller’s per- formance in mono-articulate cases falls short com- pared to the PI controller, the errors remain com- parable to the accuracy and variance observed in human arm movements in point-to-point reaching tasks [102]. The controller’s formulation is the learning-based equivalent of a resolved motion rate controller [79]. Leveraging the generalisability of neural networks and the iterative nature of the approximated pseudo- inverse of the local Jacobian, the controller can con- verge close to desired targets, even in the pres- ence of mechanical changes in the system [99, 100]. Consequently, the control system has proven its capacity to compensate for inertial changes in the musculoskeletal system, exhibiting an insignificant reduction in performance compared to the baseline (no weight change), as demonstrated in our weight deviation experiment in section 4.3.2. An essen- tial characteristic of musculoskeletal systems is their ability to control joint stiffness based on the co- contraction of agonist-antagonist muscles, depend- ing on the task requirements [28]. As depicted in figure 9(b), the controller autonomously adjusts the stiffness of the joints for both lighter and heavier musculoskeletal structures by modifying the gener- ated muscle activations accordingly. This adjustment occurs without the need for further retraining or human intervention. Even in the event of actuation failure or fatigue, motion tasks remain achievable by utilising altern- ative muscles. Therefore, it is crucial to employ a controller capable of leveraging this redundancy. However, previous works on learning-based con- trollers have not fully addressed this advantage [27, 29, 62–64] or necessitate some form of re-learning process [65, 72]. For musculoskeletal systems, a malfunctioning muscle induces significant changes in mechanical functionality and operation. Patients with neuromuscular disorders still have their muscles intact with passive dynamics. We modelled these partial actuation scenario’s in our muscle disabling experiment (see section 4.3.3) to determine the res- ulting muscle synergy for coordination using our learning-based IS controller. We demonstrated that our controller can gen- eralise to cases where individual muscles are deac- tivated while maintaining performance. As the IS controller has been trained to minimise the error between xi+1 and xi, the deactivatedmuscles are com- pensated by increasing the activation of the other muscles. Similar strategies are employed by biolo- gical systems. For example, in the study by Cheung et al [21], where the activity of the other muscles are increased to compensate for arm impairment, the IS controller increased muscle activations for the bi- articulate muscles LB and RF when either the GM or the VA muscles were disabled as seen in figure 10(b). In addition, similar to Bonnard et al’s [15] study on the effects of fatigue on hopping, the biggest increase in the activated muscles are seen in the bi- articulate muscles to compensate for the disabled mono-articulate muscles. This shows that exploiting muscle redundancy is important in both biological and artificial systems for compensatory strategies, and that it is possible to achieve a simple internal feed- back model which can compensate for muscle fail- ure by learning directly on stochastic motor-babbling movements produced by a musculoskeletal struc- ture without requiring extensive knowledge about the system. The higher mean average error for the deactiv- ated GM in figure 10(b) is likely attributed to its 16 Bioinspir. Biomim. 19 (2024) 046015 E Almanzor et al Figure 12. Z (height) positions and the corresponding muscle activation signals for the squatting task under varying payloads (the weight of the pelvis). The shaded and non-shaded areas indicate the bent and stood-up postures, respectively. Although not seen in the training data, the IS controller can account for the interaction by adjusting the muscle activation signals accordingly. For example, the magnitude of the muscle activation signals for the RF, SB and GM increased dramatically during the bent posture to accommodate heavier payloads without any further retraining or intervention. importance in the embodiment andmovement of the leg model, at least within our model. Nevertheless, the controller can still recruit other muscles to reach the target position by employing a different yet suit- able muscle synergy. We also examined the control performance when more than one muscle is dis- abled. As depicted in figure 11, the positional error increased with the additional disabled muscles. This increase is attributed to the mechanical limitation arising from disabling muscles that facilitate knee flexion. Nonetheless, it highlights the control’s capab- ility to tolerate the failure of more than one muscle, a feature not accommodated by priormodel-free works [65]. The mapping between the state posture of the leg and the IS solution is not unique, as it is influenced by the initial state of the leg, muscle configurations, and the internal and external forces applied to the system. Leveraging the advantages of joint stiffness adaptation and the utilisation of muscle redundancy for different solution synergies, we demonstrate that the controller can also withstand interactions with the environment without requiring prior informa- tion from motor babbling data or further retraining. This underscores the benefits of the local inverse stat- ics strategy compared to model-free global IKs tech- niques like goal babbling [103], which learn specific solutions to the inverse problem. As illustrated in figure 12, the controller can adapt its muscle activ- ation outputs to perform a squatting motion under varying payloads. The RF muscle activation, in par- ticular, exemplifies this adaptability, with increased muscle activation when standing up to maintain posture. In conclusion, previousmodel-basedmethods for biological and artficial systems necessitate compre- hensive prior knowledge of the musculoskeletal sys- tem, relying on several assumptions such as negli- gible friction, fully accessible muscle states, minimal external interactions, and a static mechanical struc- ture. On the other hand, model-free control meth- ods, although easily scalable to the complex struc- ture of musculoskeletal systems, are limited in har- nessing advantages like muscle redundancy for man- aging muscle failure and adapting to changes in body weight. In this study, we introduce the use of the local inverse statics formulation as an alternat- ive view on how feedback controllers can be learned in both biological and artificial musculoskeletal sys- tems, that is practical and straightforward to train and deploy. The controller maintains reasonable per- formance even when the system’s body undergoes substantial changes in self-configuration and physical interactions, without the need for explicitly formu- lating specific mappings between muscle, joint, and end-effector spaces. 5.1. Future work In future work, improvements to control accuracy will be pursued through the optimisation of hyper- parameters, including the number of nodes and hid- den layers. The observation that a 1 s time delay yields smoother trajectories compared to a 15 s delay for trajectory tracking warrants further investigation in 17 Bioinspir. Biomim. 19 (2024) 046015 E Almanzor et al future work. A more detailed analysis will be conduc- ted to identify the optimal time-delay value. Unlike traditional rigid robots, it is not straight- forward to decouple the kinematics from the dynam- ics in musculoskeletal systems. For example, IK con- trol and time-dependent trajectory tracking that is vital in production lines can be achieved through vari- ous analytical or learning-based approaches provided the robot uses sufficiently powerful motors that can accommodate the mass and inertia of the links, and with very simple interactions or payloads that is within the performance of the motors. As a result, the robotics community often treats IKs and dynam- ics as separate problems and applies them based on task requirements. In musculoskeletal systems, infinite solutions exist for achieving the same task space position across the joint space, which governs kinematics, as well as in the muscle activation and neural spaces, which govern dynamics and forces [31]. Hence, it is not possible to separate the desired posture from the forces required to move to, and maintain that pos- ture in musculoskeletal systems. Therefore, to con- trol joints and achieve desired postures, coordina- tion of agonist-antagonistic pairs of mono or multi- articulate muscle type actuators (such as Hill-type or McKibben muscles), each with their own pass- ive (spring and damper elements) and active (con- tractile element) dynamics, is necessary, regardless of the speed or acceleration intensity of the move- ment required. Hence, for musculoskeletal systems, the primary distinction between slow (or quasi-static) and fastermovements lies in the higher inertial forces. However, due to the quasi-static assumptions of the feedback controller, our current work is limited to slower movements, with higher inertial forces cur- rently out of scope. We aim to address this limitation through the addition of a feedforward component. It is widely accepted that humans use a hybrid of feedback and feedforward models within the CNS for fast movements [35, 36]. Feedforward models allow for fastermovements as they are not constantly reliant on proprioception, and can predict for a longer horizon, rather than a single time-step ahead. This extension can be achieved by incorporating network architec- tures suitable for time-series data, such as recurrent neural networks [104] or transformers [105]. The current simulation deliberately constrains movement to the 2D sagittal plane to simplify the ana- lysis of the controller’s capabilities with a musculo- skeletal structure. Due to iterative nature of the for- mulation, which finds the IK solution to the target goal given the systems current posture, the approach is scalable to 3D joint kinematics through the addition of a Z axis component in the state vector. Prior works with kinematically hyper-redundant soft continuum robots using the same learning controller have shown scalability to 3D kinematics [99, 100]. Hence, future research will involve implementing and experiment- ing with the controller in 3D simulations and on a physical robot as an extension of the present study. Since the controller only requires task space state and control inputs, a physical system would only require a motion tracker for the end-effector, eliminating the need for additional sensors for joint angles or internal states of Hill-type muscles. Investigating the impact of sensor noise on the controller’s performance for a musculoskeletal system would be an ideal avenue for exploration.However, it has been shown that the con- troller can tolerate noise for 3D soft robots in prior work [99]. Finally, the extension of the work to faster movement also lays the foundation for extending the work to locomotion, rather than manipula- tion. Locomotion and manipulation are generally very distinct problems with vastly different set of requirements [106]. Themore dynamic requirements of biological walking gaits among other factors such as balancing, are currently out of scope of the paper. Nevertheless locomotion is also an important aspect to explore, which will be addressed by extending the simulation to a bi-pedal structure. Data availability statement All data that support the findings of this study are included within the article (and any supplementary files). Acknowledgment This work was supported by the AgriFoRwArdS Centre for Doctoral Training programme under the UKRI grant [EP/S023917/1], the Jersey Farmers Union, JSPS Grant-in-Aid for JSPS Fellows Grant Number JP24KJ0338, JST SPRING [JPMJSP2114], and the GP-Mech Program of Tohoku University. For the purpose of open access, the author has applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising. Conflict of interest The authors declare that the research was conducted in the absence of any commercial or financial rela- tionships that could be construed as a potential con- flict of interest. Author contributions Authors E A and T S made a substantial, direct and intellectual contribution to the work as co-first authors. They contributed equally to the develop- ment of the simulation, carrying out the experiments and writing the manuscript. F I supervised the work. A A, M H, and F I contributed to the revisions. All authors approved it for publication. 18 Bioinspir. Biomim. 19 (2024) 046015 E Almanzor et al Appendix Table 6. Parameters of muscles utilized in the experiments. IL GM VA SB RF LB lref (m) 0.502 612 0.407 100 0.195 512 0.194 165 0.293 156 0.247 841 vref (m s−1) 1.0 1.0 1.0 1.0 1.0 1.0 FMax (N) 220 220 320 320 760 760 ORCID iDs Elijah Almanzor https://orcid.org/0000-0001- 8101-4217 Taku Sugiyama https://orcid.org/0000-0001-9235- 8779 Mitsuhiro Hayashibe https://orcid.org/0000- 0001-6179-5706 Fumiya Iida https://orcid.org/0000-0001-9246- 7190 References [1] Latash M L 2010 Motor synergies and the equilibrium-point hypothesisMotor Control 14 294–322 [2] Pfeifer R, Lungarella M and Iida F 2007 Self-organization, embodiment and biologically inspired robotics Science 318 1088–93 [3] Marjaninejad A and Valero-Cuevas F 2019 Should anthropomorphic systems be “redundant”? (Springer Tracts in Advanced Robotics) vol 124 (Springer) pp 7–34 [4] Bizzi E, Chapple W and Hogan N 1982 Mechanical properties of muscles: implications for motor control Trends Neurosci. 5 395–8 [5] Minetti A E 1994 Contraction dynamics in antagonist muscles J. Theor. Biol. 169 295–304 [6] Qiao H, Chen J and Huang X 2021 A survey of brain-inspired intelligent robots: integration of vision, decision, motion control and musculoskeletal systems IEEE Trans. Cybern. 52 11267–80 [7] Prilutsky B I and Zatsiorsky V M 2002 Optimization-based models of muscle coordination Exerc. Sport. Sci. Rev. 30 32–38 [8] Koshland G F, Galloway J C and Farley B 2005 Novel muscle patterns for reaching after cervical spinal cord injury: a case for motor redundancy Exp. Brain Res. 164 133–47 [9] Benton A M, Amiri P, Henson D P, Sivapuratharasu B, Mcgregor A H and Bull A M J 2023 Characterization of muscle recruitment during gait of bilateral transfemoral and through-knee persons with limb loss Front. Bioeng. Biotechnol. 11 1128528 [10] Singh T and Latash M L 2011 Effects of muscle fatigue on multi-muscle synergies Exp. Brain Res. 214 335–50 [11] Savelberg H H, Verdijk L B, Willems P J and Meijer K 2007 The robustness of age-related gait adaptations: can running counterbalance the consequences of ageing? Gait Posture 25 259–66 [12] Côté J N, Mathieu P A, Levin M F and Feldman A G 2002 Movement reorganization to compensate for fatigue during sawing Exp. Brain Res. 146 394–8 [13] Psek J A and Cafarelli E 1993 Behavior of coactive muscles during fatigue J. Appl. Physiol. 74 170–5 [14] Aymard C, Katz R, Lafitte C, Le Bozec S and Pénicaud A 1995 Changes in reciprocal and transjoint inhibition induced by muscle fatigue in man Exp. Brain Res. 106 418–24 [15] Bonnard M, Sirin A V, Oddsson L and Thorstensson A 1994 Different strategies to compensate for the effects of fatigue revealed by neuromuscular adaptation processes in humans Neurosci. Lett. 166 101–5 [16] Avrillon S, Guilhem G, Barthelemy A and Hug F 2018 Coordination of hamstrings is individual specific and is related to motor performance J. Appl. Physiol. 125 1069–79 [17] Lomond K V and Côté J N 2010 Movement timing and reach to reach variability during a repetitive reaching task in persons with chronic neck/shoulder pain and healthy subjects Exp. Brain Res. 206 271–82 [18] Lund J P, Donga R, Widmer C G and Stohler C S 1991 The pain-adaptation model: a discussion of the relationship between chronic musculoskeletal pain and motor activity Can. J. Physiol. Pharmacol. 69 683–94 [19] Roby-Brami A, Feydy A, Combeaud M, Biryukova E V, Bussel B and Levin M F 2003 Motor compensation and recovery for reaching in stroke patients Acta Neurol. Scand. 107 369–81 [20] Jones T A 2017 Motor compensation and its effects on neural reorganization after stroke Nat. Rev. Neurosci. 18 267–80 [21] Cheung V C K, Turolla A, Agostini M, Silvoni S, Bennis C, Kasi P, Paganoni S, Bonato P and Bizzi E 2012 Muscle synergy patterns as physiological markers of motor cortical damage Proc. Natl Acad. Sci. 109 14652–6 [22] Raghavan P, Santello M, Gordon A M and Krakauer J W 2010 Compensatory motor control after stroke: an alternative joint strategy for object-dependent shaping of hand posture J. Neurophysiol. 103 3034–43 [23] Roby-Brami A, Fuchs S, Mokhtari M and Bussel B 1997 Reaching and grasping strategies in hemiparetic patients Motor Control 1 72–91 [24] Zariffa J, Steeves J and Pai D K 2012 Changes in hand muscle synergies in subjects with spinal cord injury: characterization and functional implications J. Spinal Cord Med. 35 310–8 [25] Chang Y-H, Auyang A G, Scholz J P and Nichols T R 2009 Whole limb kinematics are preferentially conserved over individual joint kinematics after peripheral nerve injury J. Exp. Biol. 212 3511–21 [26] Büchler D, Calandra R, Schölkopf B and Peters J 2018 Control of musculoskeletal systems using learned dynamics models IEEE Robot. Autom. Lett. 3 3161–8 [27] Driess D, Zimmermann H, Wolfen S, Suissa D, Haeufle D, Hennes D, Toussaint M and Schmitt S 2018 Learning to control redundant musculoskeletal systems with neural networks and sqp: exploiting muscle properties 2018 IEEE Int. Conf. on Robotics and Automation (ICRA) (Brisbane, QLD, Australia, 21–25 May 2018) pp 6461–8 [28] Kawaharazuka K, Tsuzuki K, Onitsuka M, Asano Y, Okada K, Kawasaki K and Inaba M 2020 Musculoskeletal autoencoder: a unified online acquisition method of intersensory networks for state estimation, control and simulation of musculoskeletal humanoids IEEE Robot. Autom. Lett. 5 2411–8 [29] Jäntsch M, Schmaler C, Wittmeier S, Dalamagkidis K and Knoll A 2011 A scalable joint-space controller for musculoskeletal robots with spherical joints 2011 IEEE Int. Conf. on Robotics and Biomimetics (Karon Beach, Thailand, 7–11 December 2011) pp 2211–6 [30] Zhou J, Chen J, Deng H and Qiao H 2019 From rough to precise: human-inspired phased target learning framework 19 https://orcid.org/0000-0001-8101-4217 https://orcid.org/0000-0001-8101-4217 https://orcid.org/0000-0001-8101-4217 https://orcid.org/0000-0001-9235-8779 https://orcid.org/0000-0001-9235-8779 https://orcid.org/0000-0001-9235-8779 https://orcid.org/0000-0001-6179-5706 https://orcid.org/0000-0001-6179-5706 https://orcid.org/0000-0001-6179-5706 https://orcid.org/0000-0001-9246-7190 https://orcid.org/0000-0001-9246-7190 https://orcid.org/0000-0001-9246-7190 https://doi.org/10.1123/mcj.14.3.294 https://doi.org/10.1123/mcj.14.3.294 https://doi.org/10.1126/science.1145803 https://doi.org/10.1126/science.1145803 https://doi.org/10.1007/978-3-319-93870-7_2 https://doi.org/10.1016/0166-2236(82)90221-1 https://doi.org/10.1016/0166-2236(82)90221-1 https://doi.org/10.1006/jtbi.1994.1150 https://doi.org/10.1006/jtbi.1994.1150 https://doi.org/10.1109/TCYB.2021.3071312 https://doi.org/10.1109/TCYB.2021.3071312 https://doi.org/10.1097/00003677-200201000-00007 https://doi.org/10.1097/00003677-200201000-00007 https://doi.org/10.1007/s00221-005-2218-9 https://doi.org/10.1007/s00221-005-2218-9 https://doi.org/10.3389/fbioe.2023.1128528 https://doi.org/10.3389/fbioe.2023.1128528 https://doi.org/10.1007/s00221-011-2831-8 https://doi.org/10.1007/s00221-011-2831-8 https://doi.org/10.1016/j.gaitpost.2006.04.006 https://doi.org/10.1016/j.gaitpost.2006.04.006 https://doi.org/10.1007/s00221-002-1186-6 https://doi.org/10.1007/s00221-002-1186-6 https://doi.org/10.1152/jappl.1993.74.1.170 https://doi.org/10.1152/jappl.1993.74.1.170 https://doi.org/10.1007/BF00231064 https://doi.org/10.1007/BF00231064 https://doi.org/10.1016/0304-3940(94)90850-8 https://doi.org/10.1016/0304-3940(94)90850-8 https://doi.org/10.1152/japplphysiol.00133.2018 https://doi.org/10.1152/japplphysiol.00133.2018 https://doi.org/10.1007/s00221-010-2405-1 https://doi.org/10.1007/s00221-010-2405-1 https://doi.org/10.1139/y91-102 https://doi.org/10.1139/y91-102 https://doi.org/10.1034/j.1600-0404.2003.00021.x https://doi.org/10.1034/j.1600-0404.2003.00021.x https://doi.org/10.1038/nrn.2017.26 https://doi.org/10.1038/nrn.2017.26 https://doi.org/10.1073/pnas.1212056109 https://doi.org/10.1073/pnas.1212056109 https://doi.org/10.1152/jn.00936.2009 https://doi.org/10.1152/jn.00936.2009 https://doi.org/10.1123/mcj.1.1.72 https://doi.org/10.1123/mcj.1.1.72 https://doi.org/10.1179/2045772312Y.0000000037 https://doi.org/10.1179/2045772312Y.0000000037 https://doi.org/10.1242/jeb.033886 https://doi.org/10.1242/jeb.033886 https://doi.org/10.1109/LRA.2018.2849601 https://doi.org/10.1109/LRA.2018.2849601 https://doi.org/10.1109/ICRA.2018.8463160 https://doi.org/10.1109/LRA.2020.2972841 https://doi.org/10.1109/LRA.2020.2972841 https://doi.org/10.1109/ROBIO.2011.6181620 Bioinspir. Biomim. 19 (2024) 046015 E Almanzor et al for redundant musculoskeletal systems Front. Neurorobot. 13 61 [31] Carpenter M B 1968 The co-ordination and regulation of movements J. Neuropathol. Exp. Neurol. 27 348–348 [32] Lashley K S 1951 The problem of serial order in behavior vol 26 (Wiley) pp 112–46 [33] Hirashima M and Oya T 2016 How does the brain solve muscle redundancy? Filling the gap between optimization and muscle synergy hypotheses Neurosci. Res. 104 80–87 [34] Flash T, Meirovitch Y and Barliya A 2013 Models of human movement: trajectory planning and inverse kinematics studies Robot. Auton. Syst. 61 330–9 [35] Kawato M 1999 Internal models for motor control and trajectory planning Curr. Opin. Neurobiol. 9 718–27 [36] Desmurget M and Grafton S 2000 Forward modeling allows feedback control for fast reaching movements Trends Cogn. Sci. 4 423–31 [37] Lacquaniti F, Terzuolo C and Viviani P 1983 The law relating the kinematic and figural aspects of drawing movements Acta Psychol. 54 115–30 [38] Karklinsky M and Flash T 2015 Timing of continuous motor imagery: the two-thirds power law originates in trajectory planning J. Neurophysiol. 113 2490–9 [39] Todorov E and Jordan M I 1998 Smoothness maximization along a predefined path accurately predicts the speed profiles of complex arm movements J. Neurophysiol. 80 696–714 [40] Uno Y, Kawato M and Suzuki R 1989 Formation and control of optimal trajectory in human multijoint arm movement Biol. Cybern. 61 89–101 [41] Harris C M and Wolpert D M 1998 Signal-dependent noise determines motor planning Nature 394 780–4 [42] Flash T and Hogan N 1985 The coordination of arm movements: an experimentally confirmed mathematical model J. Neurosci. 5 1688–703 [43] Mohan V and Morasso P 2011 Passive motion paradigm: an alternative to optimal control Front. Neurorobot. 5 4 [44] Flash T and Hochner B 2005 Motor primitives in vertebrates and invertebrates Curr. Opin. Neurobiol. 15 660–6 [45] Sanzari M, Ntouskos V and Pirri F 2019 Discovery and recognition of motion primitives in human activities PLoS One 14 1–39 [46] d’Avella A, Saltiel P and Bizzi E 2003 Combinations of muscle synergies in the construction of a natural motor behavior Nat. Neurosci. 6 300–8 [47] Alnajjar F, Itkonen M, Berenz V, Tournier M, Nagai C and Shimoda S 2015 Sensory synergy as environmental input integration Front. Neurosci. 8 436 [48] Esmaeili S, Karami H, Baniasad M, Shojaeefard M and Farahmand F 2022 The association between motor modules and movement primitives of gait: a muscle and kinematic synergy study J. Biomech. 134 110997 [49] Rohrer B, Fasoli S, Krebs H I, Hughes R, Volpe B, Frontera W R, Stein J and Hogan N 2002 Movement smoothness changes during stroke recovery J. Neurosci. 22 8297–304 [50] Taga G, Yamaguchi Y and Shimizu H 1991 Self-organized control of bipedal locomotion by neural oscillators in unpredictable environment Biol. Cybern. 65 147–59 [51] Mussa-Ivaldi F and Solla S 2004 Neural primitives for motion control IEEE J. Ocean. Eng. 29 640–50 [52] Klarner T and Zehr E P 2018 Sherlock holmes and the curious case of the human locomotor central pattern generator J. Neurophysiol. 120 53–77 [53] Wang H, Nonaka T, Abdulali A and Iida F 2023 Coordinating upper limbs for octave playing on the piano via neuro-musculoskeletal modeling Bioinspir. Biomim. 18 066009 [54] Leconte P, Orban de Xivry J-J, Stoquart G, Lejeune T and Ronsse R 2016 Rhythmic arm movements are less affected than discrete ones after a stroke Exp. Brain Res. 234 1403–17 [55] Hill A V 1938 The heat of shortening and the dynamic constants of muscle Proc. R. Soc. B 126 136–95 [56] Heinen F, Lund M E, Rasmussen J and de Zee M 2016 Muscle-tendon unit scaling methods of hill-type musculoskeletal models: an overview Proc. Inst. Mech. Eng. H 230 976–84 [57] Chen J and Qiao H 2022 Motor-cortex-like recurrent neural network and multitask learning for the control of musculoskeletal systems IEEE Trans. Cogn. Dev. Syst. 14 424–36 [58] Zhao L, Li Q, Liu B and Cheng H 2019 Trajectory tracking control of a one degree of freedom manipulator based on a switched sliding mode controller with a novel extended state observer framework IEEE Trans. Syst. Man Cybern. Syst. 49 1110–8 [59] Kawamura M, Ookubo S, Asano Y, Kozuki T, Okada K and Inaba M 2016 A joint-space controller based on redundant muscle tension for multiple dof joints in musculoskeletal humanoids 2016 IEEE-RAS 16th Int. Conf. on Humanoid Robots (Humanoids) (Cancun, Mexico, 15–17 November 2016) pp 814–9 [60] Tahara K, Arimoto S, Sekimoto M and Luo Z-W 2009 On control of reaching movements for musculo-skeletal redundant arm model Appl. Bionics Biomech. 6 11–26 [61] Stanev D and Moustakas K 2018 Simulation of constrained musculoskeletal systems in task space IEEE Trans. Biomed. Eng. 65 307–18 [62] Masuda H, Hitzmann A, Hosoda K and Ikemoto S 2019 Common dimensional autoencoder for learning redundant muscle-posture mappings of complex musculoskeletal robots 2019 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS) (Macau, China, 3–8 November 2019) pp 2545–50 [63] Tahara K, Kuboyama Y and Kurazume R 2012 Iterative learning control for a musculoskeletal arm: utilizing multiple space variables to improve the robustness 2012 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (Vilamoura-Algarve, Portugal, 7–12 October 2012) pp 4620–5 [64] Zhong S, Chen J, Niu X, Fu H and Qiao H 2020 Reducing redundancy of musculoskeletal robot with convex hull vertexes selection IEEE Trans. Cogn. Dev. Syst. 12 601–17 [65] Kawaharazuka K, Kawamura M, Makino S, Asano Y, Okada K and Inaba M 2017 Antagonist inhibition control in redundant tendon-driven structures based on human reciprocal innervation for wide range limb motion of musculoskeletal humanoids IEEE Robot. Autom. Lett. 2 2119–26 [66] Schumacher P, Geijtenbeek T, Caggiano V, Kumar V, Schmitt S, Martius G and Häufle D 2023 Natural and robust walking using reinforcement learning without demonstrations in high-dimensional musculoskeletal models (arXiv:2309.02976) [67] Kidziński Łukasz, Mohanty S P, Ong C, Hicks J L, Carroll S F, Levine S, Salathé M and Delp S L 2018 Learning to run challenge: synthesizing physiologically accurate motion using deep reinforcement learning (arXiv:1804. 00198) [68] Liu R, Wang J, Chen Y, Liu Y, Wang Y and Gu J 2023 Proximal policy optimization with time-varying muscle synergy for the control of an upper limb musculoskeletal system IEEE Trans. Autom. Sci. Eng. 21 1–12 [69] Campbell J, Hitzmann A, Stepputtis S, Ikemoto S, Hosoda K and Amor H B 2019 Learning interactive behaviors for musculoskeletal robots using bayesian interaction primitives (arXiv:1908.05552) [70] Fu K C D, Libera F D and Ishiguro H 2015 Extracting motor synergies from random movements for low-dimensional task-space control of musculoskeletal robots Bioinspir. Biomim. 10 056016 [71] Kalidindi H T, Thuruthel T G, Laschi C and Falotico E 2019 Cerebellum-inspired approach for adaptive kinematic 20 https://doi.org/10.3389/fnbot.2019.00061 https://doi.org/10.3389/fnbot.2019.00061 https://doi.org/10.1097/00005072-196804000-00011 https://doi.org/10.1097/00005072-196804000-00011 https://doi.org/10.1016/j.humov.2007.04.001 https://doi.org/10.1016/j.neures.2015.12.008 https://doi.org/10.1016/j.neures.2015.12.008 https://doi.org/10.1016/j.robot.2012.09.020 https://doi.org/10.1016/j.robot.2012.09.020 https://doi.org/10.1016/S0959-4388(99)00028-8 https://doi.org/10.1016/S0959-4388(99)00028-8 https://doi.org/10.1016/S1364-6613(00)01537-0 https://doi.org/10.1016/S1364-6613(00)01537-0 https://doi.org/10.1016/0001-6918(83)90027-6 https://doi.org/10.1016/0001-6918(83)90027-6 https://doi.org/10.1152/jn.00421.2014 https://doi.org/10.1152/jn.00421.2014 https://doi.org/10.1152/jn.1998.80.2.696 https://doi.org/10.1152/jn.1998.80.2.696 https://doi.org/10.1007/BF00204593 https://doi.org/10.1007/BF00204593 https://doi.org/10.1038/29528 https://doi.org/10.1038/29528 https://doi.org/10.1523/JNEUROSCI.05-07-01688.1985 https://doi.org/10.1523/JNEUROSCI.05-07-01688.1985 https://doi.org/10.3389/fnbot.2011.00004 https://doi.org/10.3389/fnbot.2011.00004 https://doi.org/10.1016/j.conb.2005.10.011 https://doi.org/10.1016/j.conb.2005.10.011 https://doi.org/10.1371/journal.pone.0214499 https://doi.org/10.1371/journal.pone.0214499 https://doi.org/10.1038/nn1010 https://doi.org/10.1038/nn1010 https://doi.org/10.3389/fnins.2014.00436 https://doi.org/10.3389/fnins.2014.00436 https://doi.org/10.1016/j.jbiomech.2022.110997 https://doi.org/10.1016/j.jbiomech.2022.110997 https://doi.org/10.1523/JNEUROSCI.22-18-08297.2002 https://doi.org/10.1523/JNEUROSCI.22-18-08297.2002 https://doi.org/10.1007/BF00198086 https://doi.org/10.1007/BF00198086 https://doi.org/10.1109/JOE.2004.833102 https://doi.org/10.1109/JOE.2004.833102 https://doi.org/10.1152/jn.00554.2017 https://doi.org/10.1152/jn.00554.2017 https://doi.org/10.1088/1748-3190/acfa51 https://doi.org/10.1088/1748-3190/acfa51 https://doi.org/10.1007/s00221-015-4543-y https://doi.org/10.1007/s00221-015-4543-y https://doi.org/10.1098/rspb.1938.0050 https://doi.org/10.1098/rspb.1938.0050 https://doi.org/10.1177/0954411916659894 https://doi.org/10.1177/0954411916659894 https://doi.org/10.1109/TCDS.2020.3045574 https://doi.org/10.1109/TCDS.2020.3045574 https://doi.org/10.1109/TSMC.2017.2719057 https://doi.org/10.1109/TSMC.2017.2719057 https://doi.org/10.1109/HUMANOIDS.2016.7803367 https://doi.org/10.1155/2009/498363 https://doi.org/10.1155/2009/498363 https://doi.org/10.1109/TBME.2017.2764630 https://doi.org/10.1109/TBME.2017.2764630 https://doi.org/10.1109/IROS40897.2019.8968605 https://doi.org/10.1109/IROS.2012.6385628 https://doi.org/10.1109/TCDS.2019.2953642 https://doi.org/10.1109/TCDS.2019.2953642 https://doi.org/10.1109/LRA.2017.2720854 https://doi.org/10.1109/LRA.2017.2720854 https://doi.org/10.48550/arXiv.2309.02976 https://arxiv.org/abs/1804.00198 https://arxiv.org/abs/1804.00198 https://doi.org/10.1109/TASE.2023.3254583 https://doi.org/10.1109/TASE.2023.3254583 https://arxiv.org/abs/1908.05552 https://doi.org/10.1088/1748-3190/10/5/056016 https://doi.org/10.1088/1748-3190/10/5/056016 Bioinspir. Biomim. 19 (2024) 046015 E Almanzor et al control of soft robots 2019 2nd IEEE Int. Conf. on Soft Robotics (RoboSoft) pp 684–9 [72] Marques H G, Bharadwaj A and Iida F 2014 From spontaneous motor activity to coordinated behaviour: a developmental model PLoS Comput. Biol. 10 e1003653 [73] Caligiore D, Ferrauto T, Parisi D, Accornero N, Capozza M and Baldassarre G 2008 Using motor babbling and hebb rules for modeling the development of reaching with obstacles and grasping Int. Conf. on Cognitive Systems 2008 (Karlsruhe, Germany, 2–4 April 2008) (available at: https:// api.semanticscholar.org/CorpusID:12480235) [74] Armanini C, Boyer F, Mathew A T, Duriez C and Renda F 2023 Soft robots modeling: a structured overview IEEE Trans. Robot. 39 1728–48 [75] Kino H, Kikuchi S, Yahiro T and Tahara K 2009 Basic study of biarticular muscle’s effect on muscular internal force control based on physiological hypotheses 2009 IEEE Int. Conf. on Robotics and Automation (Kobe, Japan, 12–17 May 2009) pp 4195–200 [76] Jacobsen S, Ko H, Iversen E and Davis C 1989 Antagonistic control of a tendon driven manipulator Proc., 1989 Int. Conf. on Robotics and Automation (Scottsdale, AZ, USA, 14–19 May 1989) vol 3 pp 1334–9 [77] Potkonjak V, Svetozarevic B, Jovanovic K and Holland O 2010 Control of compliant anthropomimetic robot joint AIP Conf. Proc. 1281 1271–4 [78] Salaün C, Padois V and Sigaud O 2009 Control of redundant robots using learned models: an operational space control approach 2009 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (St. Louis, MO, USA, 10–15 October 2009) pp 878–85 [79] D’Souza A, Vijayakumar S and Schaal S 2001 Learning inverse kinematics Proc. 2001 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems. Expanding the Societal Role of Robotics in the the Next Millennium (Cat. No.01CH37180) (Maui, HI, USA, 29 October–3 November 2001) vol 1 pp 298–303 [80] Sugiyama T, Kutsuzawa K, Owaki D and Hayashibe M 2021 Individual deformability compensation of soft hydraulic actuators through iterative learning-based neural network Bioinspir. Biomim. 16 056016 [81] Sugiyama T, Kutsuzawa K, Owaki D and Hayashibe M 2024 Latent representation-based learning controller for pneumatic and hydraulic dual actuation of pressure-driven soft actuators Soft Robot. 11 105–17 [82] Ookubo S, Asano Y, Kozuki T, Shirai T, Okada K and Inaba M 2015 Learning nonlinear muscle-joint state mapping toward geometric model-free tendon driven musculoskeletal robots 2015 IEEE-RAS 15th Int. Conf. on Humanoid Robots (Humanoids) (Seoul, Korea (South), 3–5 November 2015) pp 765–70 [83] Dong H, Figueroa N and El Saddik A 2013 Muscle force control of a kinematically redundant bionic arm with real-time parameter update 2013 IEEE Int. Conf. on Systems, Man and Cybernetics (Manchester, UK, 13–16 October 2013) pp 1640–7 [84] Hadi Balaghi M E, Vatankhah R, Broushaki M and Alasty A 2016 Adaptive optimal multi-critic based neuro-fuzzy control of mimo human musculoskeletal arm model Neurocomputing 173 1529–37 [85] Akimov D 2020 Distributed soft actor-critic with multivariate reward representation and knowledge distillation (arXiv:1911.13056) [86] Piazzi A and Visioli A 2000 Global minimum-jerk trajectory planning of robot manipulators IEEE Trans. Ind. Electron. 47 140–9 [87] Lin H-I 2014 A fast and unified method to find a minimum-jerk robot joint trajectory using particle swarm optimization J. Intell. Robot. Syst. 75 379–92 [88] Sharif Razavian R, Ghannadi B and McPhee J 2019 A synergy-based motor control framework for the fast feedback control of musculoskeletal systems J. Biomech. Eng. 141 031009 [89] Chen J and Qiao H 2021 Muscle-synergies-based neuromuscular control for motion learning and generalization of a musculoskeletal system IEEE Trans. Syst. Man Cybern. Syst. 51 3993–4006 [90] Huang X, WuW, Qiao H and Ji Y 2018 Brain-inspired motion learning in recurrent neural network with emotion modulation IEEE Trans. Cogn. Dev. Syst. 10 1153–64 [91] Ijspeert A J, Crespi A, Ryczko D and Cabelguen J-M 2007 From swimming to walking with a salamander robot driven by a spinal cord model Science 315 1416–20 [92] Wang T, Guo W, Li M, Zha F and Sun L 2012 CPG control for biped hopping robot in unpredictable environment J. Bionic Eng. 9 29–38 [93] Yu H, Gao H, Ding L, Li M, Deng Z and Liu G 2016 Gait generation with smooth transition using CPG-based locomotion control for hexapod walking robot IEEE Trans. Ind. Electron. 63 5488–500 [94] Geyer H and Herr H 2010 A muscle-reflex model that encodes principles of legged mechanics produces human walking dynamics and muscle activities IEEE Trans. Neural. Syst. Rehabil. Eng. 18 263–73 [95] Haeufle D, Günther M, Bayer A and Schmitt S 2014 Hill-type muscle model with serial damping and eccentric force-velocity relation J. Biomech. 47 1531–6 [96] Meyer A J, Patten C and Fregly B J 2017 Lower extremity EMG-driven modeling of walking with automated adjustment of musculoskeletal geometry PLoS One 12 1–24 [97] Burdet E, Franklin D W and Milner T E 2013 Human Robotics: Neuromechanics and Motor Control (The MIT Press) (https://doi.org/10.7551/mitpress/9007.001.0001) [98] George Thuruthel T, Falotico E, Manti M, Pratesi A, Cianchetti M and Laschi C 2017 Learning closed loop kinematic controllers for continuum manipulators in unstructured environments Soft Robot. 4 285–96 [99] Almanzor E, Ye F, Shi J, Thuruthel T G, Wurdemann H A and Iida F 2023 Static shape control of soft continuum robots using deep visual inverse kinematic models IEEE Trans. Robot. 39 2973–88 [100] Thuruthel T G, Falotico E, Cianchetti M, Renda F and Laschi C 2016 Learning global inverse statics solution for a redundant soft robot Proc. 13th Int. Conf. on Informatics in Control, Automation and Robotics (Lisbon, Portugal, 29–31 July 2026) vol 2 pp 303–10 (available at: https://api. semanticscholar.org/CorpusID:39454151) [101] Levin M F 2016 Principles of Motor Recovery After Neurological Injury Based on a Motor Control Theory (Springer) pp 121–40 [102] Gribble P L, Mullin L I, Cothros N and Mattar A 2003 Role of cocontraction in arm movement accuracy J. Neurophysiol. 89 2396–405 [103] Rolf M, Steil J and Gienger M 2010 Goal babbling permits direct learning of inverse kinematics IEEE Trans. Auton. Mental Dev. 2 216–29 [104] Yu Y, Si X, Hu C and Zhang J 2019 A review of recurrent neural networks: LSTM cells and network architectures Neural Comput. 31 1235–70 [105] Trinh M, Behery M, Emara M, Lakemeyer G, Storms S and Brecher C 2022 Dynamics modeling of industrial robots using transformer networks 2022 6th IEEE Int. Conf. on Robotic Computing (IRC) (Italy, 5–7 December 2022) pp 164–71 [106] Reher J and Ames A D 2021 Dynamic walking: toward agile and efficient bipedal robots Ann. Rev. Control Robot. Auton. Syst. 4 535–72 21 https://doi.org/10.1371/journal.pcbi.1003653 https://doi.org/10.1371/journal.pcbi.1003653 https://api.semanticscholar.org/CorpusID:12480235 https://api.semanticscholar.org/CorpusID:12480235 https://doi.org/10.1109/TRO.2022.3231360 https://doi.org/10.1109/TRO.2022.3231360 https://doi.org/10.1109/ROBOT.2009.5152196 https://doi.org/10.1109/ROBOT.1989.100165 https://doi.org/10.1063/1.3497932 https://doi.org/10.1063/1.3497932 https://doi.org/10.1109/IROS.2009.5354438 https://doi.org/10.1109/IROS.2001.973374 https://doi.org/10.1088/1748-3190/ac1b6f https://doi.org/10.1088/1748-3190/ac1b6f https://doi.org/10.1089/soro.2022.0224 https://doi.org/10.1089/soro.2022.0224 https://doi.org/10.1109/HUMANOIDS.2015.7363456 https://doi.org/10.1109/SMC.2013.283 https://doi.org/10.1016/j.neucom.2015.09.026 https://doi.org/10.1016/j.neucom.2015.09.026 https://arxiv.org/abs/1911.13056 https://doi.org/10.1109/41.824136 https://doi.org/10.1109/41.824136 https://doi.org/10.1007/s10846-013-9982-8 https://doi.org/10.1007/s10846-013-9982-8 https://doi.org/10.1115/1.4042185 https://doi.org/10.1115/1.4042185 https://doi.org/10.1109/TSMC.2020.2966818 https://doi.org/10.1109/TSMC.2020.2966818 https://doi.org/10.1109/TCDS.2018.2843563 https://doi.org/10.1109/TCDS.2018.2843563 https://doi.org/10.1126/science.1138353 https://doi.org/10.1126/science.1138353 https://doi.org/10.1016/S1672-6529(11)60094-2 https://doi.org/10.1016/S1672-6529(11)60094-2 https://doi.org/10.1109/TIE.2016.2569489 https://doi.org/10.1109/TIE.2016.2569489 https://doi.org/10.1109/TNSRE.2010.2047592 https://doi.org/10.1109/TNSRE.2010.2047592 https://doi.org/10.1016/j.jbiomech.2014.02.009 https://doi.org/10.1016/j.jbiomech.2014.02.009 https://doi.org/10.1371/journal.pone.0179698 https://doi.org/10.1371/journal.pone.0179698 https://doi.org/10.7551/mitpress/9007.001.0001 https://doi.org/10.1089/soro.2016.0051 https://doi.org/10.1089/soro.2016.0051 https://doi.org/10.1109/TRO.2023.3275375 https://doi.org/10.1109/TRO.2023.3275375 https://api.semanticscholar.org/CorpusID:39454151 https://api.semanticscholar.org/CorpusID:39454151 https://doi.org/10.1007/978-3-319-47313-0_7 https://doi.org/10.1152/jn.01020.2002 https://doi.org/10.1152/jn.01020.2002 https://doi.org/10.1109/TAMD.2010.2062511 https://doi.org/10.1109/TAMD.2010.2062511 https://doi.org/10.1162/neco_a_01199 https://doi.org/10.1162/neco_a_01199 https://doi.org/10.1109/IRC55401.2022.00035 https://doi.org/10.1146/annurev-control-071020-045021 https://doi.org/10.1146/annurev-control-071020-045021 Utilising redundancy in musculoskeletal systems for adaptive stiffness and muscle failure compensation: a model-free inverse statics approach 1. Introduction 2. Related works 2.1. Model-based control approaches 2.2. Model-free control approaches 2.2.1. Supervised learning 2.2.2. Reinforcement learning 2.3. Biologically-inspired control approaches 3. Methods 3.1. Musculoskeletal leg model 3.2. Hill-type muscle model 3.3. Inverse static mapping between activations and end-effector positions 4. Experimental setup and results 4.1. Model-free controller training and implementation 4.2. Hip and knee control experiments 4.3. Six-muscle leg control experiments 4.3.1. Randomised target postures, randomised initial postures and trajectory tracking 4.3.2. Performance under payload changes 4.3.3. Performance under muscle failure 4.3.4. Performance under external physical interaction 5. Discussion and conclusions 5.1. Future work Appendix References