Intelligent self calibration tool for adaptive few-mode fiber multiplexers using multiplane light conversion

Space division multiplexing (SDM) is a promising way to raise the capacity limits of optical networks. Among the implementation options, few-mode fibers (FMFs) offer high efficiency gains in terms of integratability and throughput per volume. However, to achieve low insertion loss and low crosstalk, the launched beam must match the fiber modes precisely. We propose an all-optical data-driven technique based on multiplane light conversion (MPLC) and neural networks (NNs). By using a phase-only spatial light modulator (SLM), spatially separated input beams are transformed independently to coaxial output modes. In contrast to the conventional offline calculation of SLM phase masks, we employ an intelligent two-stage approach that incorporates knowledge of the experimental environment, significantly reducing misalignment. First, a single-layer NN called Model-NN learns the beam propagation through the setup and provides a digital twin of the apparatus. Second, another single-layer NN called Actor-NN controls the model. As a result, SLM phase masks are predicted and employed in the experiment to shape an input beam to a target output. We show results on a single-passage configuration with intensity-only shaping. We achieve a correlation between experiment and network prediction of 0.65. Using programmable optical elements, our method allows the implementation of aberration correction and distortion compensation techniques, which enables secure high-capacity long-reach FMF-based communication systems by adaptive mode multiplexing devices.


Introduction
Emerging innovations like the tactile Internet [1] and mobile healthcare [2] necessitate high-capacity, low-latency and secure communication networks [3]. In the past decades, single-mode fiber (SMF)-based communications primarily provided the required backbone networks driving such high-performance technologies. In order to overcome the capacity exhaustion in SMFs, known as the nonlinear Shannon limit [4], multiplexing in the spatial domain of FMFs, e.g. implemented by digital signal processing [5], is considered an essential complement to conventional multiplexing techniques [6]. SDM [7,8] makes it possible to exploit spatially parallel optical paths in FMFs, the design of which can be optimised for such transmission [9]. The interplay between FMF and SDM allows for data rates beyond 1 Pbit/s in one single waveguide [10].
When launching data into an FMF, those spatial modes that overlap with the incident input beam are excited [11]. A straightforward and low-loss multiplexer is provided by a photonic lantern [12]. By adiabatic tapering, a launching waveguide can be physically coupled to the FMF. However, photonic lanterns provide hardly any selectivity and no adaptability of the launching conditions after fabrication, so crosstalk compensation has to be carried out on the receiver side, inflating the digital processing overhead.
In contrast, mode multiplexers based on MPLC [13] provide programmability. This method maps several input spots to the modes of the FMF. In a free-space configuration, multiple reflections between a mirror and a phase-modulating element, e.g. an SLM, carry out the desired transformation. Depending on the phase pattern displayed on the SLM, the transformation can be further adapted, for instance to implement predistortion techniques that tailor light fields, making crosstalk compensation on the receiver side obsolete [14,15], or to enhance information security [16,17].
Advancing Society with Light, a special issue from general congress ICO-25-OWLS-16-Dresden-Germany-2022
In recent papers on MPLC [10,14,18,19], the SLM phase masks are calculated offline using the wavefront matching algorithm (WMA) [18,20]. However, such pre-processing is sensitive to errors, including deviations in pixel-precise alignment or incident angle tolerances [21]. Alternatively, machine learning algorithms provide a suitable framework for the spatial mapping of light fields. For example, in NN-based image reconstruction, intensity images captured at the multimode fiber output are assigned to the corresponding input field [22-24]. In turn, machine learning can also be used to generate SLM phase masks that perform the desired light shaping task. This enables, among others, real-time wavefront shaping through complex media [25], nonlinear spatiotemporal light control [26], or diffractive optical NNs, which are based on spatial transformations similar to those of the MPLC considered here [27].
In this work, we present a method to advance the mode shaping procedure by a smart data-driven online calibration, where the entire MPLC apparatus is considered as a black box. By training an NN, we create a digital twin of the setup called Model-NN. With the Model-NN, we gain knowledge about the experimental environment. Afterwards, we train another NN called Actor-NN that controls the model. This idea is inspired by the work published in Ref. [28]. Within our method, we adapt the Actor-Model approach to enhance the MPLC performance for SDM applications. In the following section, we introduce the MPLC technique in general. Afterwards, we explain the procedure of mimicking the experimental setup as a digital twin. Finally, we present an implementation to shape intensity images of the EMNIST handwritten letters dataset [29].

Smart calibration for multiplane light conversion
A general scheme of an MPLC device is shown in Figure 1. An array of incoherent input spots enters the MPLC apparatus. Input spots with Gaussian profile $E_\mathrm{in}$ illuminate the SLM at the first reflection passage, which performs phase-only modulation by applying the spatial phase term $\phi_\mathrm{SLM}$. Using multiple such free-beam passages, both amplitude and phase profiles are modulated. As a result, the MPLC output should match the desired output modes, which can be calculated by solving Maxwell's equations.
Although the WMA provides a numerically correct solution to calculate the SLM phase masks, it is prone to errors that can barely be considered during offline calculation [21]. Among others, these include slight misalignments in the pixel-precise positioning of the phase patterns, incident angle tolerances and mismatched propagation distances between SLM and mirror, all of which cause crucial performance drops. To compensate for such errors, we propose an intelligent online calibration procedure that considers the entire MPLC device as a black box. The anticipated NN structure is shown in Figure 2.
Figure 1. Optical MPLC setup considered in our work. Two spatially separated Gaussian input spots are incident on the SLM. After two reflection passages between SLM and mirror (M), the output beams are imaged by a 4f-telescope onto a camera where intensity images are recorded. Both the FMF facet and the camera are at a distance $d \approx 1.5$ cm from the SLM. A smart calibration based on artificial intelligence (AI) is implemented to generate proper SLM phase masks in order to shape desired output modes.
In a first step, the MPLC is set up with two passages transferring two input beams. To gain knowledge about the black box, a single-layer NN is trained. SLM phase masks $\phi_\mathrm{SLM}$ are used as NN input and the output is the corresponding intensity image $I$ with

$$I = \frac{c\,\varepsilon_0}{2}\,|E|^2,$$

which is measured by a camera. Here, $E$ is the optical field at the camera, $c$ denotes the speed of light and $\varepsilon_0$ the permittivity of the medium. We use a single-layer perceptron as NN architecture for both Actor-NN and Model-NN. The input pixels representing SLM phase masks are thereby connected directly to the pixels of the output intensity image and vice versa, resulting in 2,304,000 trainable parameters. The NN and its internal structure are shown in Figure 3. For proper training, 2k uniformly distributed random phase masks and corresponding intensity images are used. For both NNs, we use 1600 samples for training and 400 for testing. As loss function, we use the standard mean-squared error (MSE) with

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - y_i\right)^2,$$

where $x_i$ denotes the ground truth and $y_i$ the prediction for $N$ samples. In addition to the MSE used as loss function, the performance of the NN is indicated by the correlation $C$ between ground truth $x_i$ and prediction $y_i$. With the single-layer perceptron layout, training converges within several minutes on consumer hardware for both NNs.
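The two figures of merit can be sketched in a few lines of NumPy. The MSE follows the definition given in the text. The exact form of the correlation $C$ is not recoverable from the text, so the Pearson correlation coefficient used below is an assumption, and the function names `mse` and `correlation` are hypothetical.

```python
import numpy as np

def mse(x, y):
    """Mean-squared error between ground truth x and prediction y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((x - y) ** 2))

def correlation(x, y):
    """Pearson correlation of two images (ASSUMED definition of C)."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2))
    return float(np.sum(xc * yc) / denom) if denom > 0 else 0.0

# Synthetic 150 x 150 "camera image" and an affinely rescaled copy of it.
rng = np.random.default_rng(0)
truth = rng.random((150, 150))
pred = 0.5 * truth + 0.1

print("MSE:", mse(truth, pred))
print("C  :", correlation(truth, pred))
```

Under this assumed definition, $C$ is insensitive to a global rescaling or offset of the predicted intensity, whereas the MSE penalises both. This is one plausible reason to report the two measures side by side.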
After training, the NN constitutes a digital twin of the MPLC setup, called Model-NN. Consequently, the Model-NN includes all misalignments and tolerances of the setup. In a second step, the Model-NN is kept static and another single-layer NN called Actor-NN is trained to control the model. Its final predictions provide the phase masks that are required for the desired light shaping task. This intelligent approach allows us to consider experimental tolerances already in the calibration procedure, which advances the MPLC setup.

Model-NN mimicking the MPLC setup
The first step of the Actor-Model approach comprises training the Model-NN to create the digital twin. For this task, representative training data is necessary. For our experiments, we consider the setup shown in Figure 1. Two incoherent Gaussian input spots of 640 nm wavelength enter the MPLC. Mirror and SLM are separated by a distance $d_\mathrm{M} \approx 3$ cm. After two reflection passages, the output is imaged onto a camera ($d \approx 1.5$ cm) by a 4f-telescope, capturing an intensity image. On the SLM, two-dimensional 256 × 256 pixel, 8 bit phase masks are displayed. In our investigations, SLM pixels allocated to the second reflection passage, i.e. the passage closer to the camera, are active. The SLM pixels allocated to the first passage are set passive. To reduce training effort, groups of 8 × 8 neighboring SLM pixels are assigned to one macropixel. Thus, the training data for the Model-NN consists of 32 × 32 pixel phase masks and the corresponding 150 × 150 pixel intensity images, as shown in Figure 3 and Figure 2a. The phase values have to be set within the interval from 0 to $\pi/2$ to avoid neighboring fields having conjugate phase [28]. In Figure 4, the training procedure on experimental data is shown. As a result, the Model-NN can predict the MPLC behavior with C = 0.94, calculated on average between prediction and ground truth, for training data and C = 0.82 for test data. The investigations shown are performed on a single input beam in a first step. In this way, it can be shown that NNs can learn the MPLC behaviour in general.
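The data flow of this step can be illustrated with a minimal NumPy sketch. It is not the implementation used in the experiment: the "setup" is replaced by a hypothetical random map (`hidden_map`), the image and mask sizes are reduced, plain gradient descent is swapped in for the optimizer, and all function names are assumptions. Only the macropixel grouping (8 × 8 SLM pixels, 32 × 32 to 256 × 256), the phase interval, the sigmoid single-layer structure, the MSE loss and the 80/20 split are taken from the text.

```python
import numpy as np

PHASE_MAX = np.pi / 2  # upper end of the phase interval stated in the text

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def expand_macropixels(mask, block=8):
    """Repeat each macropixel value over a block x block group of SLM pixels."""
    return np.kron(mask, np.ones((block, block)))

def train_single_layer(inputs, targets, epochs=300, lr=5.0, seed=0):
    """Fit targets ~ sigmoid(inputs @ W + b) by gradient descent on the MSE."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 0.01, (inputs.shape[1], targets.shape[1]))
    b = np.zeros(targets.shape[1])
    history = []
    for _ in range(epochs):
        pred = sigmoid(inputs @ W + b)
        err = pred - targets
        history.append(float(np.mean(err ** 2)))
        # Gradient of the MSE with respect to the pre-activation.
        grad_z = 2.0 * err * pred * (1.0 - pred) / err.size
        W -= lr * inputs.T @ grad_z
        b -= lr * grad_z.sum(axis=0)
    return W, b, history

# Synthetic stand-in for the measured data set (NOT the experiment).
rng = np.random.default_rng(1)
n_samples, n_mask, n_img = 200, 16, 36             # reduced, hypothetical sizes
phases = rng.uniform(0.0, PHASE_MAX, (n_samples, n_mask))
hidden_map = rng.normal(0.0, 1.0, (n_mask, n_img))  # hypothetical "setup"
images = sigmoid((phases - PHASE_MAX / 2) @ hidden_map)

split = int(0.8 * n_samples)                        # 80/20 split as in the text
W, b, history = train_single_layer(phases[:split], images[:split])
test_pred = sigmoid(phases[split:] @ W + b)
print("train MSE:", history[0], "->", history[-1])
print("test MSE :", float(np.mean((test_pred - images[split:]) ** 2)))

# Full-size macropixel mask: 32 x 32 values displayed on 256 x 256 SLM pixels.
full_mask = rng.uniform(0.0, PHASE_MAX, (32, 32))
slm_window = expand_macropixels(full_mask)
print(slm_window.shape)
```

In the experiment, `phases` and `images` would be replaced by the displayed phase masks and the recorded camera frames, flattened to one row per sample.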
When considering a second beam at the input, another independent network part has to be created, for instance by training another Model-NN. For multi-beam configurations, it is crucial that all output beams have a common overlap region, which defines the area where output modes are generated. This constraint can introduce special requirements on the SLM phase masks, especially for many input beams, which will be discussed in Section 3. In Figure 5, intensity heatmaps of a dual-input beam configuration are shown. The heatmaps are calculated from 1k synthetic samples, obtained by simulating the MPLC with free-beam propagation using the angular spectrum method [30]. The images in Figure 5a and Figure 5b result from summing up the intensity images after displaying 1k different random SLM phase masks. In Figure 5c, the overlap between both spots, i.e. between both heatmaps, is shown. This defines the region where output modes can be generated in a later dual-beam MPLC.
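How such synthetic samples can be produced is sketched below with a generic angular spectrum propagator in NumPy. Only the 640 nm wavelength, the 256 × 256 window, the random phase masks between 0 and $\pi/2$, and a distance of the order of 3 cm are taken from the text. The pixel pitch, the beam waist, the number of summed masks and the function name `angular_spectrum` are assumptions made for illustration, and the sketch covers a single free-space step rather than the full two-passage geometry.

```python
import numpy as np

def angular_spectrum(field, wavelength, pixel_pitch, distance):
    """Propagate a sampled complex field over `distance` in free space."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=pixel_pitch)
    fy = np.fft.fftfreq(ny, d=pixel_pitch)
    FX, FY = np.meshgrid(fx, fy)
    # Squared longitudinal spatial frequency; negative values are evanescent.
    arg = 1.0 / wavelength ** 2 - FX ** 2 - FY ** 2
    kz = 2.0 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    transfer = np.where(arg > 0, np.exp(1j * kz * distance), 0.0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

wavelength = 640e-9   # from the text
pitch = 8e-6          # ASSUMED SLM pixel pitch
distance = 0.03       # of the order of the SLM-mirror separation in the text
waist = 200e-6        # ASSUMED Gaussian beam waist
n = 256

coords = (np.arange(n) - n / 2) * pitch
X, Y = np.meshgrid(coords, coords)
beam = np.exp(-(X ** 2 + Y ** 2) / waist ** 2).astype(complex)

# Sum the output intensities for several random macropixel phase masks.
rng = np.random.default_rng(0)
heatmap = np.zeros((n, n))
for _ in range(20):   # the text uses 1k masks; 20 keeps the sketch fast
    macro = rng.uniform(0.0, np.pi / 2, (32, 32))
    phase = np.kron(macro, np.ones((8, 8)))
    out = angular_spectrum(beam * np.exp(1j * phase), wavelength, pitch, distance)
    heatmap += np.abs(out) ** 2

print(heatmap.shape, float(heatmap.sum()))
```

With the assumed pitch, all sampled spatial frequencies are propagating, so the transfer function has unit modulus and the total power is conserved by each propagation step.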

Actor-NN controlling the model
In the previous step, the Model-NN was trained to mimic the experimental setup. Now we use another NN, the Actor-NN, to control the model. In this step, the trained Model-NN is frozen, as shown in Figure 2b. As the Model-NN was trained on all-experimental data, it contains knowledge of the system. This is used in the offline training of the Actor-NN to predict the phase masks required to perform the desired beam shaping task. The Actor-NN architecture is again the single-layer perceptron shown in Figure 3. The input of the Actor-NN can be any desired intensity distribution. We use 150 × 150 pixel, 8 bit intensity samples of the EMNIST database [29]. In Figure 6 (left column), three samples are shown. The corresponding phase mask prediction of the Actor-NN (see Fig. 6, middle left column) is forwarded to the frozen Model-NN, which predicts the MPLC output beam intensity. In the middle right column of Figure 6, three predictions from the Model-NN are shown. MSE again defines the loss function, optimizing the Actor-NN to control the Model-NN, which represents the experimental setup.
After training of the entire Actor-Model structure (see Fig. 2b), the Actor-NN is frozen and is used to generate phase masks for the experiment, as shown in Figure 2c. The phase masks provided by the Actor-NN are displayed on the SLM to run the MPLC. In Figure 6 (right column), three camera images capturing the MPLC output are shown. The images are taken after the setup was calibrated with the approach introduced. We achieve a correlation between the ground truth and the Model-NN's prediction of $C \approx 0.7$. For the measured intensity, we achieve a correlation of $C \approx 0.65$ compared to the Model-NN's prediction and $C \approx 0.6$ compared to the ground truth.
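The Actor-Model training loop can be sketched as follows, under several assumptions. The frozen Model-NN is represented by fixed random weights (`Wm`, `bm`) rather than a network trained on measurements, the target images are synthetic instead of EMNIST samples, the sizes are reduced, and plain gradient descent replaces the optimizer. Scaling the Actor-NN's sigmoid output by $\pi/2$ to obtain phase values is also an assumption. What the sketch does show is the central idea: the MSE between the model's predicted intensity and the target is back-propagated through the frozen model, and only the actor's weights are updated.

```python
import numpy as np

PHASE_MAX = np.pi / 2  # phase interval stated in the text

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model_forward(phase, Wm, bm):
    """Frozen Model-NN: phase mask -> predicted intensity image."""
    return sigmoid(phase @ Wm + bm)

def actor_forward(target, Wa, ba):
    """Actor-NN: target intensity image -> phase mask (ASSUMED scaling)."""
    return PHASE_MAX * sigmoid(target @ Wa + ba)

def train_actor(targets, Wm, bm, epochs=500, lr=0.2, seed=0):
    rng = np.random.default_rng(seed)
    Wa = rng.normal(0.0, 0.01, (targets.shape[1], Wm.shape[0]))
    ba = np.zeros(Wm.shape[0])
    history = []
    for _ in range(epochs):
        s = sigmoid(targets @ Wa + ba)
        pred = model_forward(PHASE_MAX * s, Wm, bm)
        err = pred - targets
        history.append(float(np.mean(err ** 2)))
        # Back-propagate the MSE through the frozen model to the actor.
        grad_zm = 2.0 * err * pred * (1.0 - pred) / err.size
        grad_phase = grad_zm @ Wm.T              # Wm itself is never updated
        grad_za = grad_phase * PHASE_MAX * s * (1.0 - s)
        Wa -= lr * targets.T @ grad_za
        ba -= lr * grad_za.sum(axis=0)
    return Wa, ba, history

# Hypothetical frozen model and reachable synthetic targets (NOT EMNIST).
rng = np.random.default_rng(2)
n_samples, n_mask, n_img = 200, 16, 36
Wm = rng.normal(0.0, 1.0, (n_mask, n_img))
bm = np.zeros(n_img)
true_phase = rng.uniform(0.0, PHASE_MAX, (n_samples, n_mask))
targets = model_forward(true_phase, Wm, bm)

Wa, ba, history = train_actor(targets, Wm, bm)
masks = actor_forward(targets, Wa, ba)  # these would be displayed on the SLM
print("MSE:", history[0], "->", history[-1])
```

After training, only `actor_forward` is needed at run time: it turns a desired intensity pattern directly into a phase mask, and the model is no longer evaluated.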

Discussion
Our results show that the smart Actor-Model approach enables targeted shaping of light beams that are input to an MPLC device. With this calibration method, we treat the optical setup as a black box. Thus, we do not require pixel-precise alignment in the experiment to match an offline calculation, such as the WMA. Instead, we train a digital twin of the MPLC setup called Model-NN that incorporates knowledge of the experimental environment, reducing misalignment. To this end, proper training data of the setup needs to be created. Training data of the Model-NN comprises SLM phase masks at the input and corresponding intensity camera images at the output. For the actor, training is done by using the EMNIST data set as the ground truth. The prediction of the Actor-NN during training is directly connected to the input phase mask of the fixed Model-NN, which predicts the system's output intensity. The Actor-NN is trained by comparing the predicted intensity with the ground truth images. The Actor-NN's performance is limited by the performance of the Model-NN. Additionally, the Actor can only provide phase masks that shape a certain area on the camera, which is mainly the center. By reducing the ground truth intensity to this area, the performance is increased significantly to $C \approx 0.7$, where $C \approx 0.82$ is the maximum achievable correlation given the Model-NN's performance on unknown data.
Figure 4. Experimental data (i.e. SLM phase masks and intensity camera images) is used for the Model-NN, while the EMNIST database is used for the Actor-NN. We used 2k samples each, where 1600 samples are used for training and 400 for testing. Both NNs consist of a single-layer structure with sigmoid activation function. MSE is used as loss function with the Adam optimizer, whereas fidelity is used as performance indicator. Convergence is observed in all scenarios, i.e. for training (solid) and test data (dashed) of the Model-NN (red) and Actor-NN (blue).
In our investigations, we use simple single-layer perceptrons for a dual-passage configuration. Because the number of parameters in fully-connected layers grows with the product of the numbers of input and output pixels, more sophisticated NN architectures are required for larger images, e.g. DenseNet or MTNet [31,32]. With our approach, offline training of both the Model-NN and the Actor-NN is possible. This reduces the on-setup time for training data acquisition. In principle, other online training techniques like reinforcement learning [33,34] can be employed to control the system. However, such approaches require increased on-setup time and therefore suffer from instabilities such as mechanical drifts or fluctuations. This is particularly critical if interferometric training data, e.g. from digital holography [35], must be acquired for complex mode multiplexing. We use MSE as loss function during the training of the NNs. However, a custom loss function of multiple weighted measures, e.g. SSIM or Kullback-Leibler divergence, may improve our results significantly, as this allows a trade-off between spatially and probability distributed features.
To perform mode generation with the system shown, the mode profiles must be used as ground truth. This requires complex-valued training data. So far, however, intensity-only images have been used for both model and actor. For mode generation, the system must therefore be extended by an interferometer. The NN architecture should be complemented by complex-valued NNs [36], or by another NN path carrying phase information [28]. After calibration, modes can be launched by directing the output beam to an actual FMF. The FMF input facet is placed at the position defined by the camera sensor in Figure 1, or at an image of it.
The results shown in this work are produced in a single-beam and single-passage configuration. However, to shape coaxial output modes in a multi-beam configuration, all output beams must overlap in a certain region, as shown in Figure 5 for two beams. For all-random phase masks, the overlap area is rather small. In order to increase the overlap, or to allow overlapping of multiple beams, targeted shifting can be applied, for instance by displaying phase tilts on the SLM. Such patterns are included in Fourier domain phase masks [37,38], where a pixel shift induces a tilt in the phase pattern.
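The relation between a phase tilt and a lateral shift follows from the Fourier shift theorem and can be checked numerically. The sketch below is a generic illustration rather than a model of the actual setup: the grid size, beam width and the helper `far_field_peak` are assumptions, and the far field is approximated by a discrete Fourier transform.

```python
import numpy as np

n = 64
y, x = np.mgrid[0:n, 0:n]
# Gaussian beam centred on the grid (width chosen arbitrarily).
beam = np.exp(-((x - n / 2) ** 2 + (y - n / 2) ** 2) / (2 * 6.0 ** 2))

def far_field_peak(field):
    """Pixel position (row, column) of the intensity maximum in the Fourier plane."""
    intensity = np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2
    return np.unravel_index(np.argmax(intensity), intensity.shape)

shift = 5                              # desired shift in Fourier-plane pixels
tilt = 2.0 * np.pi * shift * x / n     # linear phase ramp along x

peak_flat = far_field_peak(beam)
peak_tilted = far_field_peak(beam * np.exp(1j * tilt))
print(peak_flat, peak_tilted)          # the column index moves by `shift`
```

A ramp of $2\pi s$ across the window moves the beam by $s$ pixels in the Fourier plane, so the slope of the tilt sets the displacement and its orientation sets the direction.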
In multi-beam configurations, multiple independent network structures, such as multiple Model-NNs, have to be trained that mimic the propagation properties of independent optical tributaries. In such arrangements, the Model-NN approach shown in this work can straightforwardly be adopted. The NN architecture in a multi-beam arrangement should comprise independent sub-NNs for each beam in both Actor-NN and Model-NN. Since in each passage all beams share the same phase mask, the Actor-NN outputs must be combined, training on a common phase mask. In the solution shown here, we predict a phase mask for one passage that is used as Actor-NN output, i.e. Model-NN input. However, for multi-passage configurations, all SLM pixels referring to different passages can be treated as one accumulated phase mask.
Adding tributaries and thus increasing the number of input beams requires increasing the number of reflection passages. According to a fair estimate with WMA, N passages are required for N input spots [18]. Upscaling the MPLC also necessitates a higher resolution of the Actor-Model-NN architectures, including SLM phase masks and camera images serving as input / output of the Model-NN. Here, we used 32 × 32 pixel phase masks and 150 × 150 pixel camera images. In our investigations, we used macropixels of 8 × 8 SLM pixels. In total, a 256 × 256 pixel SLM window is used. These dimensions result from a consideration of how much a phase change on the SLM induces a meaningful change in the camera image when generating training data for the Model-NN. If higher-resolution SLM windows, i.e. smaller macropixels, are used, the resolution on the camera should be scaled accordingly to capture valuable information.
Figure 6. [...] The correlation between Model-NN prediction and ground truth is $C \approx 0.7$. Right: experimental results. Using the trained Actor-NN, SLM phase masks are generated that are employed for driving the MPLC. The images shown are camera recordings capturing the MPLC output. Note that a single-beam and single-passage configuration is considered. The correlation between experiment and Model-NN prediction is $C \approx 0.65$.

Conclusions
We have demonstrated an intelligent approach to calibrate an MPLC device using experimental data. Although we treat the entire light shaping system as a black box, detailed knowledge about the experimental behavior is gained by using machine learning algorithms, with the Model-NN reaching $C \approx 0.82$ on test data. We have shown that the Actor-Model approach is feasible for online calibration of an all-optical mode multiplexer based on MPLC. In contrast to an offline calculation of SLM phase masks, our approach does not suffer from mismatches between algorithm and experiment, which reduces the alignment effort dramatically. This is particularly beneficial for the deployment of low-complexity SDM networks or the transmission of fragile quantum states.