Neural network modeling of bismuth-doped fiber amplifier

Bismuth-doped fiber amplifiers offer an attractive solution for meeting the continuously growing demand on the bandwidth of modern communication systems. However, practical deployment of such amplifiers requires massive development and optimization efforts, with numerical modeling being the core design tool. The numerical optimization of bismuth-doped fiber amplifiers is challenging due to the large number of unknown parameters in the conventional rate equation models. We propose here a new approach to develop a bismuth-doped fiber amplifier model based on a neural network trained purely with experimental data sets in the E- and S-bands. This method allows a robust prediction of the amplifier operation that incorporates variations of fiber properties due to the manufacturing process and any fluctuations of the amplifier characteristics. Using the proposed approach, the spectral dependencies of gain and noise figure for given bi-directional pump currents and input signal powers have been obtained. A low mean (less than 0.19 dB) and standard deviation (less than 0.09 dB) of the maximum error are achieved for gain and noise figure predictions in the 1410-1490 nm spectral band.


Introduction
Fiber-optic networks are the backbone of the global communications infrastructure that made the modern Internet possible, enabling a multitude of online services and the digital economy. The development of novel approaches for further increasing the capacity of optical communication systems is a focus of research around the world due to the constantly growing data traffic and the corresponding bandwidth demand [1]. Despite the large accessible bandwidth of optical fiber, conventional optical networks exploit only about 10 THz, which is covered by the commercially available Er-doped fiber amplifiers in the C- and L-bands (1530-1620 nm). There are three main current approaches to increasing the capacity of fiber-optic transmission systems: higher-order modulation formats, spatial division multiplexing (SDM), and multi-band transmission (MBT) [2]. Arguably, the most practical technique is MBT, which can utilize the huge and still available spectral bandwidth of the existing fiber base. Unlike SDM, it does not require new fiber deployment. Moreover, the capacity of MBT scales linearly with spectral bandwidth, whereas that of higher-order modulation formats scales logarithmically with the signal-to-noise ratio (SNR) [3]. A potential limiting factor of MBT is stimulated Raman scattering (SRS), which leads to undesired energy transfer from the higher-frequency channels to the lower-frequency ones. Despite the impact of SRS, MBT maximizes the return on investment in the existing infrastructure [2] through transmission in the so-called O, E, S, and U optical bands. However, it involves a significant upgrade of current networks with novel amplifiers for the aforementioned optical bands, which are still being developed and optimized.
A number of doped fiber media operating beyond the C- and L-bands have been reported: neodymium (Nd) [4], praseodymium (Pr) [4], thulium (Tm) [5], and bismuth (Bi) [6,7]. Unlike many other active dopants, Bi active centers allow broadband amplification over the whole spectral range from 1150 to 1500 nm [6-9]. Such spectral flexibility of Bi-doped fibers can be achieved by using different host materials such as aluminosilicate, phosphosilicate, and germanosilicate glass. This unique feature makes Bi-doped fibers one of the most promising amplification tools for MBT. Bismuth-doped fiber systems have shown great potential for telecommunications after successful information transmission in the O- [10] and E-bands [11-13], and simultaneous signal amplification in different amplification bands [8,14].
However, a significant obstacle to understanding bismuth-doped fiber as an active medium for amplifiers is the inability to determine all the fiber parameters required for modeling with the well-known conventional rate equations. This inability is mostly explained by the very low concentration of Bi-related active centers, which cannot be precisely determined. Still, the ability to model bismuth-doped fiber amplifiers (BDFAs) is crucial for applications such as telecommunications, because their gain and noise figure (NF) depend nonlinearly on the pumping scheme configuration, the signal and pump powers and wavelengths, and the fiber temperature [7,8,15]. Thus, it is important to develop an accurate, fast, and simple tool for modeling the signal amplification in a BDFA from only the known initial pump and signal conditions. As a possible solution, neural network (NN)-based techniques have already been applied to Er-doped fiber amplifiers [16,17] and Raman fiber amplifiers [18,19], but not yet to BDFAs. Moreover, such differentiable amplifier models, used in conjunction with differentiable optical channel models, allow power profile optimization using gradient descent in reconfigurable optical networks, as demonstrated in [17].
In this work, we report an NN-based BDFA gain and NF model trained purely on experimental measurements of five-channel amplification in the spectral band of 1410-1490 nm using a BDFA with a bi-directional pumping scheme. Two different data sets are used for both training and testing: one with just seven values of the total input signal power, and one with uniformly distributed values. The proposed model is then used to predict the spectral dependencies of gain and NF for specific total input signal powers and pump diode currents. In addition, the dependency of the maximum absolute error on the training data set size is analyzed. The achieved prediction performance demonstrates the viability of the NN approach as a tool for fast and simple BDFA modeling.
The remainder of the paper is organized as follows. Section 2 describes the experimental setup of gain and NF measurements used for data set acquisition. Section 3 describes the NN-based architecture and the data sets. Section 4 presents the results of numerical simulations for the model trained and tested using different data sets. Section 5 provides a discussion on the importance of the NN approach for modeling of BDFAs. Section 6 concludes the paper.

Experimental setup
The experimental setup for gain and NF measurements is shown in Figure 1. The radiation of the tunable laser tuned to 1410 nm is combined in a multiplexer (MUX) with the radiation of four signal diodes at 1430 nm, 1450 nm, 1470 nm, and 1490 nm. After a 50:50 coupler, the signal radiation passes through a variable optical attenuator (VOA) that allows adjustment of the total input power in the range from −25 dBm to 5 dBm. The radiation after the VOA is characterized in terms of its spectrum by the first channel of the optical spectrum analyzer (OSA) and in terms of its total input signal power by a power meter (PM). The signal radiation then enters the BDFA, which is based on a 320 m long Bi-doped germanosilicate fiber with a 9 µm core consisting of 95 mol% SiO2, 5 mol% GeO2, and <0.01 mol% of bismuth. The same BDFA setup was characterized in terms of the performance of different pumping schemes in [7]. Here we use two 1320 nm pump diodes, controlled by a laser diode (LD) and thermoelectric cooler (TEC) controller, for bi-directional pumping. The 1320 nm isolators provide additional protection of the pump diodes from counter-propagating radiation. The thin film filter wavelength division multiplexers (TFF-WDMs), with very steep reflection and transmission bands, allow efficient coupling of the broadband signal and pump radiation. The spectrum of the signal at the output of the amplifier is recorded using the second channel of the OSA. The recorded spectra for different total input signal powers are then used to determine the gain and NF in the 1410-1490 nm wavelength range using the source subtraction technique described in [20]. The gain and NF characteristics for 1000 mA pump currents and −25 dBm total input signal power are shown in Figure 1b, featuring a maximum gain of 32.9 dB and a minimum NF of 5 dB at 1430 nm and 1450 nm, respectively. Figure 1c shows the dependency of the gain at 1430 nm on the total input signal power for different pump diode currents. The gain exhibits both linear and saturation regimes as the total input power changes, so it is highly important that the NN model performs well in both regimes. In addition, the gain increases nonlinearly with the pump currents.
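To make the extraction step more concrete, the sketch below computes gain and NF for one channel from four OSA readings using the widely used source-subtraction relations, G = (P_out − N_out)/(P_in − P_SSE) and NF = P_ASE/(G·h·ν·B) + 1/G with P_ASE = N_out − G·P_SSE. The exact procedure of [20] is not reproduced in this paper, so these relations, the function name `gain_and_nf_db`, its arguments, and the use of the OSA resolution bandwidth as the noise bandwidth are assumptions made for illustration only.

```python
import math

PLANCK = 6.62607015e-34   # Planck constant, J*s
C_LIGHT = 299_792_458.0   # speed of light, m/s


def dbm_to_watt(p_dbm):
    """Convert a power in dBm to watts."""
    return 1e-3 * 10 ** (p_dbm / 10)


def gain_and_nf_db(wavelength_nm, p_in_dbm, p_out_dbm,
                   sse_dbm, noise_out_dbm, rbw_nm):
    """Gain and NF (both in dB) of one channel via source subtraction.

    All arguments are hypothetical OSA readings at the channel wavelength:
    p_in_dbm / p_out_dbm  : peak power before / after the amplifier
    sse_dbm               : source spontaneous emission level at the input
    noise_out_dbm         : noise level at the output (ASE + amplified SSE)
    rbw_nm                : OSA resolution bandwidth used for noise readings
    """
    wavelength_m = wavelength_nm * 1e-9
    nu = C_LIGHT / wavelength_m                               # optical frequency, Hz
    bandwidth_hz = C_LIGHT * rbw_nm * 1e-9 / wavelength_m ** 2  # noise bandwidth, Hz

    p_in = dbm_to_watt(p_in_dbm)
    p_out = dbm_to_watt(p_out_dbm)
    sse = dbm_to_watt(sse_dbm)
    noise_out = dbm_to_watt(noise_out_dbm)

    # Signal-only gain: noise floors are removed from both peak readings.
    gain = (p_out - noise_out) / (p_in - sse)
    # ASE generated by the amplifier itself: subtract the amplified source noise.
    p_ase = noise_out - gain * sse
    # Signal-spontaneous beat noise term plus shot noise term.
    nf = p_ase / (gain * PLANCK * nu * bandwidth_hz) + 1.0 / gain
    return 10 * math.log10(gain), 10 * math.log10(nf)
```

Applying such a function to each of the five channels for one setting of the pump currents and input power would give one gain profile and one NF profile, which is the kind of target the NN is trained on.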

Machine learning BDFA model
The single-layer NN architecture used to model the BDFA is shown in Figure 2. The inputs of the model are the backward pump diode current I_b, the forward pump diode current I_f, and the total input signal power of the five channels P_in. The outputs of the model are the corresponding gains (G_1410, ..., G_1490) and NFs (NF_1410, ..., NF_1490) for all five signal wavelengths. The model is trained using random projection (RP) [21] to learn the mapping from the pump currents and the total input signal power to the gain and NF. A k-fold cross validation with k = 10 is applied for model selection and hyper-parameter optimization. This process provides the optimized number of hidden nodes N_HN, the activation function f_act, the regularization parameter λ, and the variance σ of the normal distribution. The regularization parameter λ is then used to calculate the output weights (W_out), and the variance σ is used to assign the hidden layer weights (W_1). The values of these parameters for the different training data sets are given in the results section. To improve the prediction capabilities of the NN, model averaging over 20 independently trained NNs is employed [18]. The gain and NF predictions are therefore the average of the outputs of the 20 NNs.
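The training procedure described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the hidden weights are drawn once from a zero-mean normal distribution and left untrained, the output weights are obtained by regularized least squares, W_out = (HᵀH + λI)⁻¹HᵀY, and several such networks are averaged. The class and function names, the bias column, and the assumption that the inputs have already been scaled to roughly unit range are all hypothetical; the default hyper-parameters are the Case 1 values reported in the results section.

```python
import numpy as np


class RandomProjectionNN:
    """Single-hidden-layer network with fixed random hidden weights."""

    def __init__(self, n_hidden, lam, sigma, rng):
        self.n_hidden = n_hidden  # number of hidden nodes (N_HN)
        self.lam = lam            # regularization parameter
        self.sigma = sigma        # variance of the hidden-weight distribution
        self.rng = rng

    def _hidden(self, X):
        # Append a constant column so each hidden node also gets a bias.
        Xb = np.hstack([X, np.ones((X.shape[0], 1))])
        return np.sin(Xb @ self.W1)  # f_act = sin(x)

    def fit(self, X, Y):
        # X: (n_samples, 3) scaled inputs, Y: (n_samples, n_outputs) targets in dB.
        n_in = X.shape[1] + 1
        # Hidden weights are random and never updated; sigma is a variance,
        # so the standard deviation passed to the sampler is sqrt(sigma).
        self.W1 = self.rng.normal(0.0, np.sqrt(self.sigma),
                                  size=(n_in, self.n_hidden))
        H = self._hidden(X)
        # Regularized least squares for the output weights.
        A = H.T @ H + self.lam * np.eye(self.n_hidden)
        self.W_out = np.linalg.solve(A, H.T @ Y)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.W_out


def fit_ensemble(X, Y, n_models=20, n_hidden=1000, lam=1e-10,
                 sigma=1e-2, seed=0):
    """Train n_models networks that differ only in their random hidden weights."""
    rng = np.random.default_rng(seed)
    return [RandomProjectionNN(n_hidden, lam, sigma, rng).fit(X, Y)
            for _ in range(n_models)]


def predict_ensemble(models, X):
    """Average the predictions of all networks (model averaging)."""
    return np.mean([m.predict(X) for m in models], axis=0)
```

Because only a linear system is solved per network, training is fast, which is what makes repeating it inside a 10-fold cross validation and over 20 ensemble members practical.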
The experimental data acquisition for the NN training consists of N different current values for the backward (I_b) and forward (I_f) pumps in the range of [200 : 1000] mA and different values of the total input signal power (P_in) in the range of [−25 : 5] dBm. Each case described by (I_b, I_f, P_in) is applied to the experimental setup (Fig. 1a), and the corresponding output spectra are measured and processed to obtain the gain and NF of the five signal channels. The final data set thus pairs each input triple (I_b, I_f, P_in) with the measured gain and NF values of the five channels.

Results
Firstly, an NN model is trained and tested using a data set with discrete total input signal power levels. The parameter values for the different combinations of testing and training data sets are presented in Table 1. For the first case (Case 1 in Tab. 1), the data set is generated for each discrete P_in in the set {−25, −20, −15, −10, −5, 0, 5} dBm. The data set for each discrete P_in consists of 3000 different I_b and I_f values drawn from uniform distributions on a log scale, so that the gain and NF in dB are also uniformly distributed. The currents are converted back to the linear scale before being applied to the experimental setup. This process gives a total of 21 000 points for all P_in cases. The k-fold cross validation performed over the training and validation portion of the data set results in N_HN = 1000, f_act = sin(x), λ = 10^−10, and σ = 10^−2. This NN model is trained on the training portion (63%) of the data set. The prediction performance is evaluated in terms of the maximum absolute error E_MAX between the predicted and target profiles (for gain and NF) over the applied channels.
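The construction of the Case 1 operating points can be sketched as below, using only the Python standard library. The function names are hypothetical. The 63% training share and the 900 test points per power level come from the text; the remaining 7% is assumed here to be the validation portion, and the split is assumed to be made separately for each power level.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw a value uniformly on a log scale and return it on the linear scale."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def build_discrete_data_set(n_per_power=3000,
                            powers_dbm=(-25, -20, -15, -10, -5, 0, 5),
                            current_range_ma=(200.0, 1000.0),
                            seed=0):
    """Return (I_b, I_f, P_in) operating points for every discrete input power."""
    rng = random.Random(seed)
    points = []
    for p_in in powers_dbm:
        for _ in range(n_per_power):
            i_b = sample_log_uniform(rng, *current_range_ma)
            i_f = sample_log_uniform(rng, *current_range_ma)
            points.append((i_b, i_f, p_in))
    return points


def split_per_power(points, train_frac=0.63, val_frac=0.07, seed=0):
    """Split into training, validation and test parts within each power level."""
    rng = random.Random(seed)
    by_power = {}
    for point in points:
        by_power.setdefault(point[2], []).append(point)

    train, val, test = [], [], []
    for group in by_power.values():
        rng.shuffle(group)
        n_train = round(train_frac * len(group))
        n_val = round(val_frac * len(group))
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]  # whatever remains is held out for testing
    return train, val, test
```

With the default arguments this yields 21 000 operating points, of which 900 per power level (6300 in total) are held out for testing.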
The probability density functions (PDFs) for E_MAX are shown in Figure 3a. They are obtained over 6300 test points (900 for each of the 7 input powers). The obtained mean E_MAX for gain and NF are 0.16 and 0.15 dB, respectively. The standard deviation of E_MAX is ±0.09 dB for gain and ±0.07 dB for NF. These values indicate a high prediction accuracy using just a single NN model, particularly considering the large input power dynamic range of the BDFA and the nonlinear behavior of its gain and NF under variations of both pump and input signal powers. Figure 3b presents the target and predicted profiles for the worst and the best gain and NF predictions. The predicted and target profiles match very well overall, and even the worst cases show a good correspondence with the actual performance.
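The error metric used here can be written down in a few lines. In the sketch below, each profile is a list of per-channel values in dB (gain or NF), E_MAX is the largest absolute per-channel deviation for one test point, and the mean and standard deviation are then taken over all test points. The function names are hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
import statistics


def max_abs_error(predicted, target):
    """E_MAX for one test point: largest absolute error over the channels."""
    return max(abs(p - t) for p, t in zip(predicted, target))


def emax_statistics(predicted_profiles, target_profiles):
    """Mean and standard deviation of E_MAX over a set of test points."""
    errors = [max_abs_error(p, t)
              for p, t in zip(predicted_profiles, target_profiles)]
    return statistics.mean(errors), statistics.pstdev(errors)
```

The same two functions would be applied once to the gain profiles and once to the NF profiles, giving the two pairs of numbers quoted for each case.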
To verify its generalization ability, the NN model trained with the discrete data set is also tested using the data set with uniformly distributed total input signal power values (Case 2 in Tab. 1). The testing data set consists of 2700 points with P_in, I_b, and I_f values uniformly distributed in the range of [−25 : 5] dBm for P_in and [200 : 1000] mA for the pump currents (on a log scale). The probability density functions for gain and NF are shown in Figure 3c. The obtained E_MAX values for gain and NF are 0.76 dB and 0.69 dB, with error deviations of ±0.39 dB and ±0.32 dB, respectively. The worst and the best predictions of gain and NF are presented in Figure 3d. Even though the achieved results have a worse performance than the model predicting the behavior of signals with power levels [...] Figure 3e presents the target and predicted profiles for the worst and the best predictions for gain and NF. A very good correspondence between the predicted and real profiles can be seen even for the worst-case scenario for gain and NF.
Finally, the performance of the proposed framework in all three cases is tested with different training data set sizes. The dependency of E_MAX for gain and NF on the training data set size is presented in Figure 4. The discrete model shows similar behavior for both testing data sets (curves 1 and 2 in Fig. 4), with the error approaching its minimum at around 6500 data points. In contrast, the model trained with randomly distributed total input signal power values (curve 3 in Fig. 4) requires the smallest data set, around 2000 points, to be completely trained. The worst performance of Case 3, obtained with 450 data points, is comparable to that achieved in Case 2 with the maximum number of data points. The results show that further increasing the data set size will not improve the gain and NF predictions of the proposed framework in any of the cases. Using a data set with randomly distributed total input signal power values for training significantly increases the accuracy of amplification predictions for any input signal power within the range of the training data set. It also reduces the data set size required for training. The relatively small data set required for training in Case 3 can shorten the data acquisition time and also allows such a model to be used when automatic data acquisition is challenging or impossible.

Discussion
We now discuss the obtained results in the context of the general problem of modeling unknown active media using NNs, considering BDFAs as a particular example. It is important to recall that the conventional rate equations cannot be applied to BDFA modeling because the concentration of Bi-related centers is very low and therefore cannot be measured using conventional methods. This is the key reason why novel modeling approaches are required to predict the performance of a specific black-box amplifier. The proposed simple NN has shown remarkable performance in terms of gain and NF predictions in all the considered cases, with both discrete and random data sets. The comparison between different data set sizes suggests that the most convenient way to use the proposed network is with a small data set of randomly distributed values.
We intentionally use a simple NN here to demonstrate and stress that a nonlinear and complex system such as a BDFA can be modeled using an elementary machine learning approach. However, it is evident that the model can be readily improved further to allow even more efficient parameter extrapolation, for example for the optimization of the fiber length and the pump wavelengths and powers. This is beyond the scope of the current proof-of-principle work. We anticipate that our results pave the way for further studies of the application of the NN approach to modeling of BDFAs, making practical deployment of this type of optical amplifier possible.

Conclusion
The demonstrated NN-based framework, trained purely with experimental measurements, showed high accuracy in the prediction of both BDFA gain and NF for five signal channels in the 1410-1490 nm wavelength range. The proposed model was trained and tested using two different experimentally acquired data sets, based on grid and randomly distributed signal power values. The results indicate that using the data set with randomly distributed signal power values is preferable for predicting the signal amplification for any initial power value within the range of the training data set. Another advantage of such a data set is the relatively small number of data points required for training the framework. The predicted BDFA performance shows good agreement with the experimental results of signal amplification in both the linear and saturation regimes, confirming that the proposed NN-based framework can be used for BDFA optimization. The proposed model is a first step towards a reliable and simple modeling tool that can be applied to the optimization of BDFA setups in the future.

Marie Skłodowska-Curie grant agreement 814276, 813144 and 754462, the Villum Foundations (VYI OPTIC-AI grant no. 29344), the European Research Council through the ERC-CoG FRECOM project (grant agreement no.

Figure 1. a) Experimental setup for BDFA characterization and data set acquisition; b) Amplifier gain and noise figure as a function of wavelength achieved with 1000 mA pump currents and −25 dBm signal power; c) Amplifier gain at 1430 nm as a function of total input signal power. TL: tunable laser; MUX: multiplexer; VOA: variable optical attenuator; LD: laser diode; TEC: thermoelectric cooler; Bi: Bi-doped fiber; TFF-WDM: thin film filter wavelength division multiplexer; OSA: optical spectrum analyzer; PM: power meter.

Figure 2. Neural network architecture for learning the mapping between inputs (signal powers and pump currents) and outputs (gain and NF profiles).

Figure 3. Probability density functions (PDFs) for gain and NF predictions for a) Case 1; c) Case 2; e) Case 3; the worst and the best gain and NF predictions for b) Case 1; d) Case 2; f) Case 3.

Figure 4. Maximum absolute error E_MAX of gain and NF predictions as a function of training data set size for three different modeling cases indicated in brackets.

Table 1. Parameter values for each modeling case.