Implementation of FORMIDABLE: A generalized differential optical design library with NURBS capabilities

. In this article we describe the implementation of Freeform Optics Raytracer with Manufacturable Imaging Design cApaBiLitiEs (FORMIDABLE): an optical design library capable of simulating optical systems by ray-tracing. Optical performance can be quanti ﬁ ed and optimised using third-party optimisation algorithms. Compared to available commercial optical design and similarly to fast accurate NURBS optimization (FANO), our code can simulate and optimise Non-uniform rational B-Spline (NURBS). It also implements generalized differential capabilities that allows faster convergence compared to state-of-the-art. The implementation of FORMIDABLE and its innovative capabilities are described and illustrated with a representative case-study. The source code is available to eligible third-parties under the ECSL licence.


Introduction
An optical design problem consists of finding an appropriate combination of optical surfaces, reflective or refractive, and of refractive materials which will allow an optical instrument to perform a specified function.
The quality of the imaging function can be quantified with various metrics, wavefront error for diffraction limited systems, geometric spot radius for aberration limited ones, and modulation transfer function that can be computed for both.Distortion and transmission are also criterion of importance as are practical constraints such as volume, weight, cost etc.
An appropriate solution is a compromise between these aforementioned metrics and will be found by adjusting the degrees of freedom of the system.Adjusting a degree of freedom will for example modify the shape of an optical surface or its position.
To accomplish this task, designers leverages computedaided design software which rely partly on non-linear numerical optimisation where the fitness of the system is encoded in a scalar function of the degrees of freedom.Optimisation algorithms are then used to find a minimum from a starting point supplied by the designer.
It is increasingly common to rely on surfaces that have no axial symmetry, usually referred to as freeform surfaces [1], to add degrees of freedom.Adding extra degrees of freedom allow finding solutions to an optical design problem that can outperform previous solutions on most metrics, though usually not on all since freeform systems are often more costly [2].
The choice of description of freeform surfaces has been the subject of a vigorous debate, Zernike polynomials being often the favored choice [3].On the other hand, Chrisp [4] have pionered Non-uniform rational B-splines in optical design of imaging systems and have shown advantages compared to other descriptions.
The advantages of NURBS comes from local control.Indeed, NURBS surfaces are controlled by a grid of control points together with their weights.Each control point influences the surface in only a limited region.However, to succesfully optimise NURBS, a custom optical design software is required as commercial optical design codes empirically show poor performances with NURBS potentially because they require many degrees of freedom [4].Fast accurate NURBS optimization (FANO) [5] is an example of such a software.
In the following article we describe our code Freeform Optics Raytracer with Manufacturable Imaging Design cApaBiLitiEs (FORMIDABLE) that also allows design of optical surfaces described as NURBS in optical systems.
Compared to FANO, FORMIDABLE also implements the following functionalities: differential ray tracing [6] which greatly speeds up optimisation of problems with large number of degrees of freedom, and ray-aiming, which allows to optimise system with a physical pupil defined on one of the intermediate surfaces.In the case of ray-aiming, it is unclear to us if FANO implements it, one example [7] does have the physical stop defined on surface 2, which would require rayaiming though most of the examples have the physical stop on the first surface.
Another objective of FORMIDABLE was to use its NURBS capabilities to simulate the impact of structural deformation on optical performance, as NURBS are a standard widely used in mechanical Computed-aided software (CAD) software hence the acronym Freeform Optics Raytracer with Manufacturable Imaging Design cApaBiLitiEs (FORMIDABLE).
In the following sections we will first introduce the concept of differential ray tracing and its implementation.We will then describe the challenges FORMIDABLE overcome in order to optimise a NURBS based system in the context of a case-study.Finally we will describe the architecture of FORMIDABLE.

Differential ray tracing
Differential ray tracing consist into simultaneously computing a ray trace and its derivatives.
We define a ray as a curve perpendicular to a wavefront of light.In this article we only consider mediums of homogenous refractive index separated by optical surfaces, so rays can be described by a line.
Mathematically we represent a ray by a point p and a normalized direction vector direction k.
At each optical surface, the ray will either be refracted or reflected to a new ray.We call this sequence of rays the ray path.
In the context of optical ray tracing we are only interested in ray paths that originates from a point in the field of view and are not vignetted by the pupil of our instrument.
We call the phase space the combination of the field point, the pupil, and the wavelength define the ray tracing function in equation ( 1) as the function that associates a ray path / to phase a point in the phase space and s an optical system.
We note h the part of / that denotes the field point, r the part that denotes the pupil point, and k wavelength the wavelength.raytraceð/; sÞ ¼ w: In equation (1), s is the optical system in consideration, in concrete terms it is a vector of what might vary in an optical system (curvatures, element position etc.), or as refered earlier the degrees of freedom of the optical design problem.The right hand side of equation ( 1), w is the ray path of the system which can be represented as the sequence of rays ffp 1 ; k 1 gg; . . .; fp N ; k N gg .
But we will often prefer a more compact representations eliminating redundant information, either the sequence of intersection points fp 1 ; . . .; p N g (k i are ommited as redundant) or the sequence fu 1 ; . . .; u N g where u i is the argument of p i with respect to the surface function.
Indeed, we limit ourselves to the design of optical surfaces that can be represented parametrically, so for each optical surfaces it exists a function associating a point in three-dimensional space p to a point in two-dimensional parameter space u (Eq.( 2)), To compute the derivative of equation ( 1) in the general case we need a mathematical framework that allows to differentiate functions that do not necessarily have an analytical expression.The framework was introduced in literature [6] and is described here in updated terms.

Implicit differentiation
Given y and x two vector variables such that there is a function mapping x to y but without access to the analytic expression of such function.
What we do have is the analytic expression of f that satisfies f(x,y) = 0.
Knowing x it is possible to compute y by solving f with an algorithm such as Newton's descent, but how to compute the jacobian matrix @y @x ?
To do so, we first introduce two differential operators: the partial differential operator @ Á =@Á and the total differential operator dÁ/dÁ.
The total differential operator applied to f takes into account the fact that y depends on x and can be expressed as a linear combination of partial differential operators that neglect cross terms by applying the chain rules as in equation (3).
Since we have the analytic expression of f, if it is differentiable we obtain the jacobian matrix @y @x by solving the linear system.Of course equation (4) implies that f and y have the same dimensionality.
We can apply the implicit differentiation technique to the intersection calculation of a ray and a surface.
For surfaces that can be described as quadrics (such as spheres, conics etc.) the intersection calculation accepts a closed form solution therefore differentiation is trivial.
For intersection of a ray with a NURBS we use an iterative algorithm [8] which searches for the zero of a two-dimensional error function.
We can therefore differentiate u using the implicit function theorem applied to this two-dimensional intersection error function.
To further compute the derivatives of w from the derivatives of u we use the method that we introduce in Section 2.3.
In the following section, we describe another method implemented in FORMIDABLE, implicit differentiation using Fermat path principle that allows to differentiate directly a ray path.
Whether differentiating using the fermat error function or using the intersection error function will come down to computation time, typically intersection method are more efficient when more surfaces are considered.

Fermat path principle
We recall that the Fermat path principle states that rays travel along a stationary path throughout the system.Let L be the optical path throughout an optical system and n i;iþ1 the refractive index between surface i and i þ 1 and p i p iþ1 k kis the Euclidean distance between p i and p i+1 .
As a consequence of Fermat's path principle we have: we define in equation ( 7), the vector F composed of the numerators of the @L=@w terms.
We call F the offence against Fermat path principle.F must be equal to 0 given equation ( 6) for the ray path to be physical.We note that there the dimensionality of F is equal to the dimensionality of w.
Therefore, we can apply the implicit differentiation technique to F to differentiate w with respect to / or s (Eq.( 8)).
The term @F @w represents the sensitivity of the offence against fermat path principle to small errors in the ray path.This term can be computed analytically if the curvatures of the optical surfaces can be computed analytically.
The term @F @/ represents the sensitivity of the offence against fermat path principle to small changes in the field or the pupil coordinate of the ray.
The term @F @s represents the sensitivity of the offence against fermat path principle to small changes in the geometry of the system.To be computed it will require that the optical surfaces are differentiable with respect to the variable parameters of the optical system.For example, if the optical system has a spherical optical surface with a variable curvature, it is possible to differentiate a point on the surface with respect to the curvature and therefore it is possible to compute analytically @F @s .In most cases the three terms above will be computable analytically and therefore the ray tracing will be differentiable.

Differentiating merit functions
In the previous section we have described how to differentiate a ray tracing.
However, merit functions are typically calculated from the results of multiple operations applied on multiple ray traces.
We will assume that merit functions are composed of operations that are all differentiable, but that are too complex for hand-derivation of derivatives to be practical.
In this case, automatic differentiation (AD) is the tool of choice [9].It is one of the main algorithms behind the progresses of deep learning the recent years.In the following section we will illustrate the concepts behind AD.
Let us first assume that our merit function can be described as an expression which can be put in the form of a directed acyclic graph (DAG).In Figure 1 such a DAG is represented for the calculation of the spherical sag.The sub-expression that is reused corresponds to the expression of The DAG can be converted into a list of elementary operations that a computer will perform in sequence to compute the expression.In the first column of Table 1 each intermediary calculation is denoted by an intermediary variable a i with a 12 corresponding to the result of our computation.
All elementary operations of Table 1 can be differentiated.AD assumes that the computer has a database of differentiation rules for each elementary steps.
There are two modes to aggregate the results of elementary rules into a program, the forward mode and the reverse mode [9].FORMIDABLE uses the forward mode that we introduce next.
In parallel of the computation of Table 1 we will also perform the calculation of its derivative.We introduce the notation: The second column of Table 1 introduces the calculations of the derivative.
The main advantage compared to the tree approach introduced earlier is that each operation can be done in sequence, forwarding the results of the differentiation to the final result.
The disadvantage is that it requires to set the value of _ x, _ y and _ c.These values are called the seeds and will be set for example to _ x ¼ 1, _ y ¼ 0 and _ c ¼ 0 to compute the derivatives with respect to x.
So if we want to compute all three derivatives (because we have three inputs) we would have to repeat the computation 3 times with different seed values.
However if our calculation had multiple outputs, (more than one root in the DAG illustrated in Fig. 1) we would only have to perform the calculation once to have the derivatives of all the outputs with respect to one input.This is one key property of forward mode differentiation, it is efficient for many outputs with few inputs, so it is well suited for ray-tracing as we will have typically more rays than degrees of freedom.
There are many different strategies to implement AD we will focus on the concepts behind the library For-wardDiff.jl[10] which FORMIDABLE is built upon.
This library is implemented in the Julia [11] programming language as is the core of FORMIDABLE.
In Figure 1, leaf nodes represent inputs and constant values, while the other nodes are operations.
For example multiply denotes the application of the multiplication operator to the inputs.
The idea behind ForwardDiff.jlimplementation is to change the implementation of the operations in a way that allows the computation of the derivative.
For that ForwardDiff.jluses Dual numbers.They bundle in the same variable a value (called primal value) and its derivatives.
Dual numbers can then be used to compute derivatives in the forward mode For example in Table 1 dual numbers would represent an entire row of the table ða i ; _ a i Þ.Then operators like the multiplication operator are redefined to operate on dual numbers.
For example, the multiplication operator would be defined as in equation (10).
In terms of implementation, the library ForwardDiff.jluses Julia multiple dispatch to call the correct function.This is also made possible, because Julia is a functional language meaning that multiplication is implemented with conventional functions that can be extended by the user.
We conclude this section by noting that we have shown how AD can be used to differentiate complex programs as long as the elementary functions that compose this program have known differentiation rules.
Since we have also defined the differentiation rule for ray-tracing we now have all the components required to implement a generalized differential ray-tracer which we will describe in the next section.

Case study
The case study we chose to test FORMIDABLE and uncover potential limitations is inspired from the literature [7].Primary properties are in Table 2. Table 1.Operations to compute simulatenously sag and derivative in forward mode.
In the remaining of the section we will first focus on three difficulties that were encountered: definition of aperture on NURBS surfaces, ray-aiming, NURBS and the choice of the optimisation algorithm, before presenting the results.

Projected aperture
Optical design software usually only consider surfaces that can be defined as sag surfaces.
A sag surface defines a point in a local coordinate system as in equation (11).
NURBS surfaces are of a more general form described in equation (12).
In the case of a sag surfaces, we can define an aperture A as a function of x and y which are projection of P on a reference plane as in Figure 2.
For example for a centered circular aperture A would be defined as in equation (13).
In equation ( 13) R defines the radius of the aperture and A is negative if the ray is blocked by the aperture.
In the case of a NURBS u and v do not correspond to a projection on a plane.Often they do not even correspond to physical quantities as they are normalized between 0 and 1.
ProjectedAperture allows bypassing this problem by defining an, aperture plane independent of the surface definition (see Fig. 3) and adds the following fields: aperture: a "classical" aperture, a: tangent vector of the aperture plane, b: tangent vector of the aperture plane (2nd direction), p: offset point, moves the coordinate system of the aperture plane.
A projected aperture will then project the intersection point onto the aperture plane plane, and apply the aperture on this plane.This allows to completely decouple the surface definition from the aperture definition.

Ray-aiming
The first preliminary tests of optimisation showed that it is critical to eliminate as much as possible discontinuities in the merit function.Some discontinuities are unavoidable, when the optimizer tries to evaluate the merit function for an unphysical system (impossibly large curvature for example).Some discontinuities are due to a Formidable calculation artifacts and are a false positive to be avoided.
False positives were found to come from ray-aiming and from the definition of the physical pupil shape.
Ray-aiming is particularly difficult in off-axis systems because it is difficult to locate the image of the physical pupil in the object space.
A simple ray-aiming implementation can be done naively by an optimisation whose guess was that the ray would originate from the center of the first surface (Fig. 4).
This could fail in two ways (in red in Fig. 4): Either because the guess provided was too far from the minima, so it would not converge.Or because they were multiple ray-aiming solutions.The first case requires a better guess than the center of the first surface, the second case is harder to solve, and can typically happen where there is a reflection at a very shallow angle that manages to intersect the pupil at the correct location.
We solved the ray-aiming problem by an algorithm in three parts.
The first part deals with finding the On-axis Chief Ray (OAR) that is the chief ray of the center field.
Since this search is done relatively rarely we invest a significant amount of computational resources to ensure that the correct OAR is found.This is done in the following steps: Tracing in reverse from the center of the pupil, we scan the full sphere of incidence angle to search for directions that trace back to the entrance.We obtain reverse OAR candidates that intersect the pupil at the correct location but do not originate from the correct field points.Tracing in the direct direction and using as a guess the intersection of the OAR candidates with the first surface we adjust this intersection to find direct OAR candidates that intersect the pupil at the correct location and come from the correct field points.We obtain less direct OAR candidates than reverse OAR candidate because the adjustment is not always successful.Furthermore, most of the direct OAR candidates converge to the same ray-aiming solution.
We eliminate non-physical rays that is rays whose direction is sometime opposite the physical propagation of light.To allow tracing to virtual surfaces, (exit pupil for example) we disregard this rule when the direction violation occurs before or after a dummy surface.
If there is still multiple candidates we select the one that has paraxial derivatives that come closer to an imaging function.
This process is depicted in Figure 5.If the system is axial symmetric, this procedure can be disabled.Now that we have an OAR we can compute the physical pupil shape.For this we trace a sample of rays along the edge of the target pupil shape in the object space to the physical pupil and we fit the physical pupil to this sample.This is done with typically 100 rays, much more than what optical design software will typically do.This large number avoids a situation where a couple of NURBS control points could be used by the optimizer to artificially reduce the pupil size.
Once we have the physical pupil shape determined we can compute a raymap.
We progressively fill a hyper cube sampling field, pupil and wavelength of rays.Each ray is traced using the closest ray already traced as a guess back to the OAR, ensuring continuity of the ray map.
If the sampling is sufficiently dense, the expensive procedure for the OAR does not need to be repeated for each ray.
Once the raymap is filled, each ray trace will first look in the raymap for the closest ray that has been traced and use it as a guess.This ensures very robust ray-aiming and is also a speed improvement.

Optimisation
Testing showed that the Levenberg-Marquardt algorithm (also known as damped-least squares) has remarkably good performance on optical problems.
Since we are using forward-mode differentiation, we get the jacobian matrix for free, which would not be the case in reverse-mode differentiation.
Typical optical merit function try to minimize the quadratic sum of components that we will call residuals.
Levenberg-Marquadt tries to bring as close to zero as possible the residuals.This should be completely equivalent but as explained in literature [12] under the assumption that this residuals will indeed come closer to zero it allows Levenberg-Marquardt to behave like 2nd order derivative information was provided.Of the open-source implementations available Levenberg-Marquardt implementation available in SciPy [13] was found to be the most efficient implementation in terms of speed of convergence and quality of the solution found.
Therefore, the direction of descent chosen is usually more efficient than just following the gradient of the merit function.
With this observation, convergence was faster but the final performance still disappointing.A more careful investigation showed that the jacobian matrix conditionning was poor and the effective rank as calculated numerically was deficient.
This was solved by first removing degrees of freedom that were redundant.For example, the effect of decenters of entire NURBS on the merit function was found numerically to be strongly correlated with the control points displacement.Lastly, the remaining problem was with control points of the NURBS that were not contributing in the first order of the merit function.This is typically control points outside of the circular footprint because typical NURBS will have a rectangular bounding curve.
We first tried to solve this problem by adding a penalty either on the curvature of the NURBS (penalized to stay as constant as possible) or on the departure from the starting point.This was found to converge to poor results.What was successful was to optimise using a rectangular pupil shape to ensure that every grid point of the NURBS contributes.
This requires each NURBS to be tailored to be just slightly larger that its effective aperture before optimisation, except the pupil which needs a larger oversize due to the way the physical stop is computed.
The cause of this problem is that we define NURBS with a regular grid of points.To directly optimize circular pupils and NURBS, it will be necessary to investigate optimisation of irregular grids in FORMIDABLE.
A system obtained with this method is depicted in Figure 6, with its RMS spot radius field map depicted in Figure 7.For comparison a system was design with the same properties in CodeV using polynomials with inferior performance as depecited in Figure 8.
4 Software architecture 4.1 The choice of Julia Julia [11] is a high-performance language that aims at a familiar syntax for users of Matlab [14] and Python [15].
It solves the two language problems encountered by Python users when a language convenient to use (Python) must be often supplemented by another language for the performance demanding parts.
The two language problems has the following drawbacks: Developer of the software need to be proficient in two languages.Users of such software also need to be learned in two languages if they want to have a deep understanding of the internal workings.External tools such as debuggers, profilers are difficult to use.Solving the two language problem is achieved by just-intime compilation.In a nutshell, when a Julia function is called, Julia analyses the arguments type (e.g.float, integer or more complex types) and compiles an optimised version of this function (called a method) for this particular argument type combination (called a signature).
This strategy is also employed by Numba [16], a Python library aimed at high performance.
It has the advantage that while Numba is limited to a subset of Python functionalities in Julia no such limitations exist.
Finally Julia has great interoperability with Python [17], allowing Python users to use FORMIDABLE easily.
FORMIDABLE is divided in packages to simplify the management of the third-party dependencies.It allows the end-user to install only necessary packages without many useless dependencies.
Therefore, functions and data structures are grouped based on bringing-in similar dependencies as illustrated in Figure 9.
Formidable.jl is the core packages including everything relating to the raytrace of an optical system and the definition of the optical system.It also includes dependencies required to perform the differential ray-tracing.
All other packages depend on it.FormidableViewer.jlincludes the plotting functions and depends on third party plotting libraries such as Plots.jl[18] and MeshCat.jl[19] that we did not want to add as dependencies to Formidable.jl core.
FormidableOptim.jl implements conversion utilities to allow optimisation of optical merit functions with third-party optimisation libraries.
PyFormidable is a python package that implements the bridge to python.

Numerical performance
FORMIDABLE has been developed not with the goal of having the ultimate performance but with the goal of being performant enough, comparable with performance indicated in litterature [7].
On a CPU clocked at 2.2 FORMIDABLE is capable of computing a ray, NURBS intersection in 0.6 ls.
Furthermore, in the merit function calculation, batch ray tracing is parallelized on all available CPU threads.
Automatic differentiation brings advantages in performance as well, the merit function of the case study presented above is composed of 228 degrees of freedom and is evaluated in 116 ms while its gradient takes 4.581 s to evaluate.
So the finite differentiation logically takes 2 Â 228 Â 116 ms = 52.9s on computation only, automatic differentiation is about 11.5Â faster.This is due to the fact that some expensive calculations (especially the ray-aiming and computation of the raymap) are be done only once, and the derivatives calculated without having to redo the ray-aiming twice for each degree of freedom.

Figure 1 .
Figure 1.Directed acyclic graph for the calculation of the sphere sag.

Figure 2 .
Figure 2. A ray vignetted by an aperture on a surface defined by a sag function.

Figure 3 .
Figure 3.A ray vignetted by an aperture on a generic surface.The aperture plane is defined independantly using a and b vectors (offset point omitted).

Figure 4 .
Figure 4. First naive algorithm for ray aiming.

Figure 9 .
Figure 9. Package hierarchy and main dependencies.
At the time of writing this article FORMIDABLE implements the following functionalities: Surfaces -Conic/aspherical surfaces -XY polynomials -ZERNIKE polynomials -NURBS -Linear diffraction gratings Analysis -2D and 3D layout (discontinuities plotted on a 3D layout) -Transverse ray diagrams -Spot diagrams -Wavefront error maps -Distortion maps -NURBS curvature maps Due to its nature (julia programming and available source) it is comparatively very easy to implement new functionalities.FORMIDABLE [22] is available under the ESA Software Community Licence [23] following registration.

Table 2 .
Case study primary properties.