This application claims priority under 35 U.S.C. 119 from French Patent Application, Serial Number FR 2104801, filed May 6, 2021, the contents of which are hereby incorporated by reference.
This application further claims priority under 35 U.S.C. 119 from European Patent Application Number EP21306565, filed Nov. 8, 2021, the contents of which are hereby incorporated by reference.
A field of this invention is information retrieval and recommendation. More particularly, embodiments of the invention relate to computer-implemented methods and systems for fairly ranking objects that are retrieved.
Generally, information retrieval and recommendation involve two stages. The first stage focuses on retrieving a candidate set of results and the second stage focuses on ranking the candidate set of results.
The candidate set of results may include search results (e.g., lists of links to documents from search results in response to a query) and recommendations (e.g., lists of Points-Of-Interest recommendations in response to an identified location, or lists of recommendations of songs in response to genre selection, etc.).
For instance, for information retrieval, a query may be input to a first-stage retriever, which processes the query (for example, based on relevance), and accordingly retrieves a set of documents. A second-stage or ranker (or reranker) then ranks the retrieved set of documents and outputs a ranked set of documents, which, for instance, can be equal to or fewer in number than the first set.
Users (e.g., content consumers) expect the most relevant results to have the highest exposure, whereas providers (e.g., content producers) seek to have a fair (or equitable) exposure to their content. Hence, when ranking the candidate set of results in the second stage, it is preferred that the ranking method balances between utility (which represents users who access the set of results) and fairness (which represents the providers who make up the set of results).
While some ranking methods exist that balance utility and fairness, their complexity is generally too high to permit their use in any realistic scenario. Hence, there continues to be a need for an optimal method (e.g., capable of computing exact optimal solutions) whose complexity is reduced compared to existing methods so as to enable, for instance, Web-scale fair-useful ranking.
Example methods of the present invention provide according to a first aspect a computer-implemented method for ranking a set of objects that includes:
receiving the set of objects and a set of objective functions;
defining a decision space having n decision variables using a permutohedron, where n is the number of objects to rank and where vertices of the permutohedron represent permutations of exposures provided to the objects in the set by corresponding rankings;
determining a Pareto-set for the set of objective functions;
with a Pareto-optimal point in the Pareto-set, determining a distribution over rankings for the objects in the set using the decision space, where a proportion is associated to each ranking in said distribution;
selecting a sequence of rankings for the objects in the set from the distribution over rankings in accordance with their proportions; and
outputting the selected sequence of rankings.
Certain preferred, but non-limiting aspects of the method according to the first aspect are as follows:
(i) determining an arbitrary vertex of the decision space;
(ii) drawing a line starting at the arbitrary vertex through the target exposure until the line intersects a face of the decision space;
(iii) repeating (i) and (ii) on the intersected face of the decision space using the new intersection point instead of the target exposure, until the newly intersected face is a vertex;
According to a second, a third, a fourth, and a fifth aspect, respectively, the invention may provide: a computer program product comprising code instructions which, when said program is executed on a computer, cause the computer to perform the method according to the first aspect of the invention; a computer-readable medium having stored thereon the computer program product; a data processing device comprising a processor configured to perform the method according to the first aspect of the invention; and/or a system for information retrieval including a computer-implemented first-stage retriever configured to receive a query and generate a set of objects, and a computer-implemented second-stage ranker configured to rank the set of objects according to the first aspect of the invention.
The accompanying drawings are incorporated into the specification for the purpose of explaining the principles of the embodiments. The drawings are not to be construed as limiting the invention to only the illustrated and described embodiments or to how they can be made and used. Further features and advantages will become apparent from the following and, more particularly, from the description of the embodiments as illustrated in the accompanying drawings, wherein:
The disclosed computer-implemented method and embodiments for ranking objects may be implemented within an architecture (e.g., a network or system architecture) such as illustrated in
Example methods provided herein may be implemented by a processor such as the processor 112 or other processor in the server 100 and/or client devices 102. It will be appreciated that the processor 112 can include either a single processor or multiple processors operating in series or in parallel. Memory used in example methods may be embodied, for instance, in memory 113 and/or suitable storage in the server 100, client devices 102b-e, a connected remote storage, or any combination. Memory can include one or more memories or memory elements or structures, including combinations of memory types and/or locations. Data in memory can be stored in any suitable format for data retrieval and processing.
Server 100 may include, but is not limited to, dedicated servers, cloud-based servers, or a combination (e.g., shared). Data streams may be communicated from, received by, and/or generated by the server 100 and/or the client devices 102b-e.
Client devices 102b-e may be any processor-based device, terminal, etc., and/or may be embodied in a client application executable by a processor-based device, etc. Client devices may be disposed within the server 100 and/or external to the server (local or remote, or any combination) and in communication with the server. Example client devices 102b-e include, but are not limited to, autonomous vehicle 102b, robot 102c, computer 102d, mobile communication devices (e.g., smartphones, tablet computers, etc.) such as smartphone 102e, as well as various processor-based devices not shown in
Example methods provided herein address the problem of designing optimal fair-useful ranking policies efficiently using a set of optimization/decision variables. A first feature of example methods chooses as optimization/decision variables “item exposure” variables that act as key links between important objectives in what constitutes an ideal ranking: in particular a utility objective, which represents typically the user or consumer viewpoint, and a fairness objective, which represents typically the provider or supplier viewpoint.
Advantageously, a second feature of example methods can express the optimization problem with only n decision variables in a decision space which is (e.g., can be represented by) a generalized permutohedron, where n is the number of objects to rank, while keeping the expressiveness of the policy to fully control the utility and fairness objectives separately and exactly. Vertices of example permutohedrons disclosed can represent the exposure associated with a corresponding rank, and such permutohedrons are referred to herein as “Expohedrons”. The decision space provided by the permutohedron (e.g., Expohedron) allows one to represent any distribution (or convex combination) over rankings (or, synonymously, over permutations) and to reason geometrically in this space to solve the Utility-Fairness trade-off. In particular, the whole Pareto set of the MOO (Multi-objective Optimization) problem can be easily obtained without passing through explicit scalarisation techniques, thus reducing required processing time and resources.
Example methods can determine the optimal policy using unbiased estimates of relevance scores with uniform estimation quality over the objects and an exposure model with known structure and parameters.
Example methods operate with a complexity of O(n^2 log n). Such methods can apply geometric reasoning. Most of the method's steps may be expressed in closed-form equations, and the other steps can be provided by sorting operations. Moreover, the entire Pareto-set can be generated analytically and geometrically, without relying on, say, a scalarization technique to scan the entire frontier.
At 202, server 100 receives from any suitable source(s), including external and/or internal sources, a set of objects, an (e.g., unbiased) relevance score for each object in the set of objects, a list of exposures which are each associated with a rank, and objective functions including a ranking fairness objective function and a ranking utility objective function. At 204, server 100 defines a decision space using a permutohedron having n decision variables, where n is the number of objects to rank and where vertices of the permutohedron represent permutations of exposures provided to the objects in the set by corresponding rankings.
At 206, using the ranking fairness objective function and the ranking utility objective function, server 100 computes a Pareto-set (i.e., the set of non-dominated solutions where there are no other solutions that are better for all objective functions; for example, where there are no exposures that offer larger utility and better fairness at the same time) within the decision space defined by the list of exposures between a first point representing optimal fairness and a second point representing optimal utility. Optimal fairness types in example methods may include, for instance, demographic fairness and meritocratic fairness. As described in more detail below, the ideal exposure for demographic fairness allows equal exposure, whereas for meritocratic fairness exposure may be defined based on relevance (e.g., defined using a vector of relevance). In yet other embodiments, the optimal fairness type may be additionally or alternatively customized based on a defined proportion other than fairness (e.g., for allocating prize money to be paid).
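The notion of a non-dominated solution can be illustrated with a brute-force filter over a handful of candidate points. This is only a sketch of the definition, not the geometric procedure of the example methods; the function name `pareto_front` and the candidate values are hypothetical, and the usual strict-dominance convention (at least as good on both objectives, strictly better on one) is assumed.

```python
def pareto_front(points):
    """Return the points that no other point dominates.

    Each point is a (utility, fairness) pair; larger is better for both.
    Assumption: q dominates p when q is at least as good on both
    objectives and strictly better on at least one.
    """
    front = []
    for i, p in enumerate(points):
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical (utility, fairness) values for four candidate exposures.
candidates = [(1.0, -0.5), (0.8, -0.1), (0.7, -0.3), (0.5, 0.0)]
print(pareto_front(candidates))
```

Here the candidate (0.7, -0.3) is dropped because (0.8, -0.1) offers both larger utility and better fairness.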
Example fairness types consider individual fairness as opposed to group fairness. Individual fairness tries to ensure equity at the level of individual objects, while group fairness assumes that objects can be related to groups and that equity is ensured at the group level, typically preventing some minority group from being disadvantaged.
At 207, the server 100 outputs (internally or externally) the Pareto-set to a decision-maker. At 208, server 100 receives (internally or externally) a point in the Pareto-set which translates to a target exposure within the decision space. This target exposure corresponds to an exposure across the objects in the set. The target exposure in an embodiment may be defined by a decision-maker internal or external to the server 100 that sets a defined utility/fairness trade-off, which could be set on a case-by-case basis or more generally as fixed constraints (e.g., set constraints where fairness does not fall below a predefined threshold). In some embodiments the decision-maker may be an administrator. In other embodiments the decision-maker may be an automated system. Combination of administrators and automated systems may also be used.
At 210, using the target exposure received from the decision-maker as an optimal trade-off in the Pareto-set, server 100 determines a distribution over rankings (e.g., a weighted set of rankings) which achieve on average the target exposure for the objects in the set, where each ranking of the distribution over rankings corresponds to a vertex in the decision space. In one embodiment, determining the distribution over rankings can be performed as follows: (i) an arbitrary vertex of the decision space is determined; (ii) a line is drawn (e.g., computed) starting at the arbitrary vertex through the target exposure received from the decision-maker until the line intersects a face of the decision space; (iii) the steps (i) and (ii) are repeated on the intersected face of the decision space using the intersection point instead of the decision-maker's target exposure, until the newly intersected face is a vertex. Steps (i)-(iii) can be performed as many times as there are objects in the set, depending on when the intersection is a vertex at step (iii). Each vertex of the decision space has an associated proportion.
At 212, server 100 deploys the distribution over rankings by selecting a sequence of rankings for the set of objects from the distribution over rankings in accordance with their proportions.
In other embodiments, as mentioned above, the method according to the embodiments of
In one exemplary embodiment, a non-personalized query (e.g., a query made by an anonymous user) is received repeatedly for general points of interest at a defined map location. The set of objects in this exemplary embodiment may be a list of general points of interest such as restaurants, museums, shops, and gas stations. In another exemplary embodiment, a non-personalized keyword query is received repeatedly by a search engine. The set of objects in this exemplary embodiment may be a list of links to documents such as web pages. As these queries are repeated over time, the ordering of the sets of objects varies in accordance with their respective sequence of rankings to achieve their target exposure.
Based on the selected sequence of rankings at 212, one, all, or a subset of the set of objects can be presented, e.g., in a Search Engine Result Page (SERP) that is prepared and provided, e.g., transmitted, to an external or internal device for presenting (e.g., displaying, announcing, printing, importing, exporting, storing, etc.). For instance, a SERP including one, a subset, or all of the objects, where such objects are respectively located based on their determined rank within the selected sequence, can be generated and transmitted to a terminal of a server 100 or client device 102 for displaying on a display. If one or a subset of the objects are presented in the SERP instead of all objects, such presented objects may be, for instance, those objects having a respectively higher ranking in the selected sequence than others in the set of objects.
Exposure, also known as “attention” or “examination” in the field of Information Retrieval (IR), can be defined as the probability that the user will examine an object (such as but not limited to a document) in a certain location of a Search Engine Result Page (SERP). Exposure values (e.g., forming part of the list of exposures received at step 202(iii) in
From the consumer's (i.e., user's) viewpoint, the consumer desires that the more relevant objects be given a higher exposure, so as not to spend time looking for relevant objects in a poorly visible location of the SERP. This desire by the consumer may be expressed in example methods by defining the utility of a ranking as the dot product between an exposure vector (i.e., the vector made up of the exposure values provided by the ranking to each object) and a gain vector (i.e., the vector made up of the gains of each object, the gain of an object being defined by an arbitrary monotonically increasing function of the relevance score of the object). Known information retrieval utility measures such as but not limited to Discounted Cumulative Gain (DCG) and Expected Reciprocal Rank (ERR) reflect this formulation, with particular choices of the exposure model parameters. DCG, for instance, assumes that the exposure of an object at rank k is given by 1/log_2(1+k) and that the gain function g(d) is given by g(d)=2^{rel(d)}, where rel(d) is the relevance score of d. The ERR measure is based on the cascade model and assumes that the exposure depends on the relevance of previous objects in the list; in particular, once a user is satisfied with an object, the exposure of the next objects in the list will be zero.
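The dot-product formulation of utility with the DCG choices stated above can be sketched as follows. The helper names and the relevance values are hypothetical and chosen only for illustration.

```python
import math


def dcg_rank_exposures(n):
    """Exposure of rank k (1-based) under the DCG assumption: 1/log2(1+k)."""
    return [1.0 / math.log2(1 + k) for k in range(1, n + 1)]


def exposure_vector(ranking, rank_exposures):
    """Exposure received by each object.

    ranking[k] is the index of the object placed at rank k+1.
    """
    exposure = [0.0] * len(ranking)
    for k, obj in enumerate(ranking):
        exposure[obj] = rank_exposures[k]
    return exposure


def utility(exposure, gains):
    """Utility of a ranking: dot product of exposure and gain vectors."""
    return sum(e * g for e, g in zip(exposure, gains))


relevance = [3, 1, 2]                  # hypothetical relevance scores
gains = [2.0 ** r for r in relevance]  # DCG gain g(d) = 2^rel(d)
gamma = dcg_rank_exposures(len(relevance))

by_relevance = [0, 2, 1]  # most relevant object first
reversed_order = [1, 2, 0]  # least relevant object first

print(utility(exposure_vector(by_relevance, gamma), gains))
print(utility(exposure_vector(reversed_order, gamma), gains))
```

Ranking the objects by decreasing relevance gives the larger of the two utilities, as expected from the consumer's viewpoint.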
From a provider's viewpoint, the provider would like their own objects to have a higher exposure in the SERP. However, because there are many providers, a non-disparate treatment of the objects presented to users is desired.
Both the consumer's and the provider's viewpoints, which correspond respectively to a “Utility” objective and “Fairness” criteria, can be expressed in example methods disclosed herein in terms of “exposure.” Exposure acts as a link between objectives of an ideal ranking: the utility objective (e.g., which represents the user or the consumer viewpoint) and the fairness criterion (e.g., which represents the object provider or supplier viewpoint).
An example permutohedron is a polytope, where each vertex corresponds to a particular ranking or permutation over n objects (e.g., documents) and the polytope is the convex hull of these vertices. This polytope is embedded in an n-dimensional space, but is actually (n−1) dimensional. For example,
In addition, every facet of the 3-D object 300 represents a partial ordering of the n objects into two groups. For example, the facet 308 including vertices (4312), (3412), (2413), (2314), (3214), and (4213) represents a partial ordering where d3 is always first (i.e., d3@rank 1) followed by the three other documents in any order; the facet 310 including vertices (3214), (2314), (1324), (1234), (2134), and (3124) represents a partial ordering where d4 is always last (i.e., d4@rank 4) preceded by the three other documents in any order; and the square facet 312 including vertices (2413), (1423), (1324), and (2314) represents a partial ordering where d1 and d3 occupy the first two positions in any order, followed by d2 and d4 in any order in the last two positions (i.e., (d1 and d3) before (d2 and d4)).
More generally, each face of dimension (n−k), which generalizes the notion of facet by following a hierarchy with decreasing dimensionality (e.g., facet→edge→vertex for n=4) represents all possible distributions (or convex combinations) respecting a certain partial ordering of the n objects into k groups. Referring again to
Caratheodory's theorem states that any point in the convex hull of a set P of m points v_i, embedded in a d-dimensional space (v_i ∈ R^d for all i=1, . . . , m), can be decomposed into a convex combination of at most (d+1) of these points.
When the permutohedron is considered as a special case, any point of the permutohedron, which is a d-dimensional object with d=(n−1) (n designating the number of objects), can be decomposed into a distribution over at most n rankings. In an embodiment a procedure known as the GLS procedure (Grotschel, Lovasz and Schrijver) can be used to determine one such decomposition (in general, more than one single decomposition is possible). An example of the GLS procedure (see Grotschel et al., “Geometric Algorithms and Combinatorial Optimization,” published in Springer Science & Business Media, December 2012) is illustrated in
As shown in
Example ranking methods set forth herein will now be described in further detail. The following notation is used for the purpose of formally describing features of example methods:
Example methods can use a ranking policy, denoted herein as π(q), that is both useful and fair by expressing everything (e.g., consumer-oriented utility and provider-oriented fairness) in terms of a single set of decision/optimization variables, which are referred to herein as “control levers.” The control levers define an exposure vector, denoted as ∈π(q). This vector is indexed by object (the first component corresponds to document d1, etc.). This means that, instead of working directly in a decision space defined using a permutohedron where vertex coordinates represent ranks, a modified polytope, referred to herein as an Expohedron, can be used by the disclosed method, where vertex coordinates represent the exposure associated with the corresponding rank. In other words, vertices of the Expohedron represent permutations of exposures provided to the objects by corresponding rankings. While there can exist a 1:1 correspondence between the permutohedron and the Expohedron, the Expohedron directly represents the control lever space (i.e., decision space). Referring again to the exposure vector ∈π(q): the ranking policy is a distribution over at most m rankings (m≤n), ∈π(q) = Σ_{i=1}^{m} α_i ∈σ_i
This optimization problem may be expressed as a multi-objective optimization problem. A Pareto-set (i.e., the set of feasible non-dominated solutions) can be determined by geometric reasoning. Then, a particular trade-off in this determined Pareto-set can be selected or determined, e.g., by a decision-maker, where the trade-off is one target point in the decision space (i.e., a target exposure). This point can be decomposed as a convex combination of at most n rankings, as known from Caratheodory's theorem as provided above.
Once this combination is determined, it (i.e., the ordering of a set of objects from the distribution over rankings) may be deployed through a fair scheduling strategy, which in one embodiment uses low-discrepancy sequences, such as the golden-ratio low-discrepancy sequences (see “Weighted Round Robin (Weighted Random Integers) Using the Golden Ratio Low Discrepancy Sequence”, published on the Internet at demofox.org, June 2020). Those skilled in the art will appreciate that other scheduling strategies could be used in alternate embodiments such as but not limited to algorithms similar to m-balanced words or, equivalently, Stride Scheduling.
An advantage of example methods is that several or even most steps may be performed using geometric reasoning, which in practice leads to simple algebraic, closed-form solutions. A further advantage of example methods is that they offer a time complexity in O(n^2 log(n)). Yet a further advantage of example methods is that they operate in an n-dimensional space, instead of, for instance, an n!-dimensional space or an n^2-dimensional space. Consequently, the number of decision/optimization variables employed by example methods is not larger than n, allowing any optimal solution to be implemented as a distribution over at most n rankings.
Example methods can be provided for a general class of exposure models referred to as a “Position-Based Model” (PBM). This family of models assumes that the exposure of an object only depends on its rank. Each rank k is then associated with a parameter γk which represents the probability that this rank will be examined by the user. Other classes of models may be processed using example methods.
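Under a Position-Based Model, every ranking hands out the same set of rank exposures, only to different objects. The sketch below enumerates the exposure vectors of all rankings for a small n; the parameter values in `gamma` and the function name `expohedron_vertices` are hypothetical.

```python
from itertools import permutations

# Hypothetical examination probabilities for ranks 1, 2 and 3.
gamma = (1.0, 0.6, 0.3)


def expohedron_vertices(gamma):
    """One exposure vector per ranking.

    Coordinate i of a vertex is the exposure received by object i.
    """
    n = len(gamma)
    vertices = []
    for ranks in permutations(range(n)):
        # ranks[i] is the 0-based rank assigned to object i.
        vertices.append(tuple(gamma[r] for r in ranks))
    return vertices


vertices = expohedron_vertices(gamma)
for v in vertices:
    print(v, "sum =", round(sum(v), 10))
```

Each of the n! vectors is a permutation of `gamma`, so all of them have the same coordinate sum and lie on a common hyperplane.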
In embodiments, an example method uses a PBM-type exposure model, characterized by a fixed set of n parameters γ=(γ1, γ2, . . . , γn). Without loss of generality, it is assumed that the γk are sorted by decreasing value. This does not necessarily imply that the examination probabilities are decreasing with the rank, even if it often is the case in practice. Under this model, the sum of object (e.g., document) exposures in a ranked list is always the same and equal to Σ_{k=1}^{n} γ_k. A particular example of an Expohedron for n=3 is shown in
Any point in the Expohedron shown in
There is a straightforward mathematical way to check whether a point belongs to the Expohedron, and this way is exploited by example methods as explained herein. This is referred to as the majorization condition in mathematics, which provides that a point ∈=(∈1, . . . , ∈n) belongs to the Expohedron if and only if ∈ is majorized by γ, which is written as ∈ ≺ γ.
The mathematical definition of majorization is the following: ∈ ≺ γ if and only if Σ_{i=1}^{k} ∈↓_i ≤ Σ_{i=1}^{k} γ↓_i for all k<n, and Σ_{i=1}^{n} ∈↓_i = Σ_{i=1}^{n} γ↓_i, where x↓ denotes the vector with the same components as x, but sorted in descending order.
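The majorization condition translates directly into a membership test based on sorting and prefix sums. The following is a minimal sketch; the function name `is_majorized`, the numerical tolerance `tol`, and the sample values are assumptions made for illustration.

```python
def is_majorized(point, gamma, tol=1e-9):
    """Return True if `point` is majorized by `gamma`.

    Both vectors are sorted in descending order; every prefix sum of
    `point` must not exceed that of `gamma`, and the totals must match.
    `tol` is an assumed tolerance for floating-point comparisons.
    """
    if len(point) != len(gamma):
        return False
    p = sorted(point, reverse=True)
    g = sorted(gamma, reverse=True)
    prefix_p = prefix_g = 0.0
    for k in range(len(p) - 1):  # prefixes of length 1 .. n-1
        prefix_p += p[k]
        prefix_g += g[k]
        if prefix_p > prefix_g + tol:
            return False
    return abs(sum(p) - sum(g)) <= tol


gamma = [1.0, 0.6, 0.3]  # hypothetical PBM parameters
print(is_majorized([0.6, 1.0, 0.3], gamma))  # a vertex
print(is_majorized([1.2, 0.4, 0.3], gamma))  # largest exposure too high
```

The cost is dominated by the two sorts, which is consistent with an O(n log n) membership check.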
“Zones” as used herein are defined as sets of points that have coordinates (i.e., exposures) in the same order. More particularly, these points can correspond to vectors such that the indices of the components sorted in increasing order are the same. In Python, for example, this corresponds to arrays for which the outputs of the argsort function are identical. There are as many zones as vertices in the Expohedron, and each zone contains only one vertex. An example zone is an unbounded pyramid, whose apex is the barycenter and whose semi-axes correspond to the lines joining the Expohedron barycenter to the barycenter of each facet adjacent to the unique vertex that the zone contains.
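The zone test described above can be sketched without any external library. The helper `argsort` below plays the role of an array library's argsort function, and `same_zone` is a hypothetical name; ties between components, such as those of the barycenter, are not treated specially in this sketch.

```python
def argsort(values):
    """Indices that would sort `values` in increasing order."""
    return sorted(range(len(values)), key=lambda i: values[i])


def same_zone(a, b):
    """True when the components of a and b are ordered identically.

    Assumption: neither vector has tied components.
    """
    return argsort(a) == argsort(b)


print(same_zone([0.9, 0.2, 0.5], [1.0, 0.3, 0.6]))
print(same_zone([0.9, 0.2, 0.5], [0.2, 0.9, 0.5]))
```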
Given a point in the Expohedron decision space, which has coordinates given by ∈π(q), an example method can include and/or consider the following, each of which is illustrated in further detail respectively in the sections that follow:
The example Utility criterion provides that objects with high relevance score or, more generally, with high gain should have a higher exposure. Without loss of generality, ρ is defined as the vector of the gains (or relevance scores, if the gain function is chosen as the identity), normalized in the same units as the exposure vector, in the sense that Σ_{i=1}^{n} ρ_i = Σ_{i=1}^{n} γ_i, which is a constant for a given PBM. This implies that ρ is located on the same hyperplane as the exposure vectors and that they can be directly compared, composed or visualized jointly on the projected Expohedron.
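The normalization of the gain vector, and the exposure assignment that follows the Utility criterion most closely, can be sketched as below. The function names and sample values are hypothetical. The assignment relies on the rearrangement inequality: a dot product of two vectors is largest when both are sorted in the same order.

```python
def normalize_gains(gains, gamma):
    """Rescale gains so that they sum to sum(gamma)."""
    scale = sum(gamma) / sum(gains)
    return [g * scale for g in gains]


def max_utility_exposure(gains, gamma):
    """Give the largest exposure to the object with the largest gain, etc."""
    order = sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)
    exposures_desc = sorted(gamma, reverse=True)
    exposure = [0.0] * len(gains)
    for rank, obj in enumerate(order):
        exposure[obj] = exposures_desc[rank]
    return exposure


gamma = [1.0, 0.6, 0.3]  # hypothetical PBM parameters
gains = [4.0, 8.0, 2.0]  # hypothetical gains
rho = normalize_gains(gains, gamma)
best = max_utility_exposure(gains, gamma)

print(rho)
print(best)
```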
Utility may be expressed, for example, as the dot product between the relevance vector and the exposure vector: U(∈) = ρ^T ∈. Consequently, equi-utility surfaces in the Expohedron are hyperplanes whose normal is equal to ρ, as illustrated in
Given this mathematical expression of the Utility, the max-Utility ranking policy in the Expohedron may be found using the point ∈* (or in some example methods the set of points) located on a face of the Expohedron whose projection onto ρ, i.e.,
is the largest (see, for example, the point 708 on the face 710 in
When the relevance vector has ties (i.e., where at least two elements of the relevance vector are equal), as represented on
An example fairness criterion will now be expressed in the Expohedron framework. The individual Demographic fairness criterion states that, ideally, all objects (e.g., documents) should have the same exposure. As the sum of the exposures is a constant, it means that the target exposure of the Demographic fairness policy is the barycenter of the Expohedron:
Thus, the fairness criterion can be defined as a quadratic function, for instance as the proximity (or minus the distance) to the barycenter: F_d(∈) = −‖∈ − β‖_2^2.
Considering now Meritocratic Fairness, the ideal exposure vector should be proportional to the relevance vector or, more generally, to the merit vector denoted as ρ′, where the merit of an object is defined as a monotonically increasing function of the relevance score of this object or, equivalently, of its gain (the proportionality constant is equal to one here, when working with a merit vector normalized in the same units as the exposure vector, in the sense that Σ_{i=1}^{n} ρ′_i = Σ_{i=1}^{n} γ_i). ρ and ρ′ are located in the same zone, because of the monotonically increasing relationship linking them. ρ and ρ′ may be, but need not be, chosen as equal, and identical to the relevance score vector (i.e., the gain and merit functions are chosen as the identity function). Formally, the Meritocratic fairness of a policy with exposure ∈ can be defined as a quadratic function, for instance as the proximity (or minus the distance) to the normalised relevance vector: F_m(∈) = −‖∈ − ρ′‖_2^2.
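Both fairness criteria reduce to a negative squared distance to a target point: the barycenter for Demographic fairness and the normalized merit vector for Meritocratic fairness. The sketch below uses hypothetical helper names and sample values.

```python
def barycenter(gamma):
    """Equal exposure for every object: the mean of the rank exposures."""
    mean = sum(gamma) / len(gamma)
    return [mean] * len(gamma)


def normalize_merit(merit, gamma):
    """Rescale merits so that they sum to sum(gamma)."""
    scale = sum(gamma) / sum(merit)
    return [m * scale for m in merit]


def fairness(exposure, target):
    """Negative squared Euclidean distance to the ideal exposure."""
    return -sum((e - t) ** 2 for e, t in zip(exposure, target))


gamma = [1.0, 0.6, 0.3]      # hypothetical PBM parameters
relevance = [0.5, 0.3, 0.2]  # hypothetical relevance scores (merit = identity)

beta = barycenter(gamma)
rho_prime = normalize_merit(relevance, gamma)

print(fairness(gamma, beta))       # demographic fairness of one ranking
print(fairness(gamma, rho_prime))  # meritocratic fairness of the same ranking
```

Both functions reach their maximum value of zero exactly at their respective target points.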
As illustrated in
It could happen that the merit vector is outside the Expohedron, namely when ρ′ ⊀ γ (i.e., the majorization condition is not fulfilled). In this case, it is possible to relax the pure proportionality relationship into an affine relationship, with an offset as small as possible while still being in the Expohedron (e.g., the definition of fairness as expressed in Biega et al., “Overview of the TREC 2019 Fair Ranking Track”, in arXiv:2003.11650, March 2020, and in Diaz et al., “Evaluating Stochastic Rankings with Expected Exposure”, in Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pages 275-284, October 2020). This amounts to choosing a target vector which is at the intersection of the merit vector axis and the border of the Expohedron.
Denote G_k = Σ_{i=1}^{k} γ_i and R_k = Σ_{i=1}^{k} ρ′↓_i. The intersection of the relevance vector axis and the border of the Expohedron is given by the affine transformation (referred to in
which expresses that Σ_{i=1}^{k} ρ̃_i ≤ Σ_{i=1}^{k} γ_i, and that b is the smallest value that ensures that condition for all k<n; and
which expresses that Σ_{i=1}^{n} ρ̃_i = Σ_{i=1}^{n} γ_i.
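The two conditions above can be turned into a small computation under one possible reading, which is an assumption of this sketch: the feasible target has the form ρ̃_i = a·ρ′_i + b, with ρ′ already normalized so that its components sum to G_n. The sum condition then gives a = 1 − n·b/G_n, and substituting into each prefix condition gives a lower bound on b. The function name and sample values are hypothetical.

```python
def affine_feasible_merit(merit, gamma):
    """Map an infeasible merit vector onto the Expohedron border.

    Assumed form: target_i = a * merit_i + b, with sum(merit) == sum(gamma).
    b is the smallest non-negative offset satisfying every prefix
    condition; a then follows from the condition on the total.
    """
    n = len(gamma)
    g = sorted(gamma, reverse=True)
    r = sorted(merit, reverse=True)
    total = sum(g)
    b = 0.0
    G_k = R_k = 0.0
    for k in range(1, n):
        G_k += g[k - 1]
        R_k += r[k - 1]
        denom = n * R_k / total - k
        if R_k > G_k and denom > 1e-12:
            # Prefix condition: a*R_k + k*b <= G_k with a = 1 - n*b/total.
            b = max(b, (R_k - G_k) / denom)
    a = 1.0 - n * b / total
    return [a * m + b for m in merit]


gamma = [1.0, 0.6, 0.3]        # hypothetical PBM parameters
merit = [1.52, 0.285, 0.095]   # hypothetical merit, sums to 1.9, infeasible
target = affine_feasible_merit(merit, gamma)
print(target)
```

In this example the largest component of the result equals the largest rank exposure, so the target sits on the border, and a merit vector that is already feasible is returned unchanged (b = 0, a = 1).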
An alternative solution to the affine transformation for choosing an alternative but feasible meritocratic fairness point is to compute an orthogonal projection, for which a result for the example shown in
Having determined how to compute and how to optimize the Utility and Fairness separately, the complete Pareto-set of the multi-objective Utility-Fairness Problem can be computed.
Explaining the method intuitively and geometrically, an example computation method starts from one extreme of the Pareto-set, namely the ‘purely fair’ solution. Then, to draw (e.g., compute) the entire Pareto-set, the gain vector direction is (computationally) followed. It may be shown that all these points are not dominated by any other point in the Expohedron and correspond to some optimal trade-off between fairness and utility. If the gain vector direction is always followed, the border of the Expohedron will necessarily be crossed at a certain point, which means that the solution is no longer feasible. The direction of the gain vector projected on the (n−2)-dimensional facet that was just crossed is then (computationally) followed. By following this new direction, a new face is crossed, which is this time (n−3)-dimensional, and, once again, the direction of the gain vector projected on that new face should be followed. Finding the projection of the gain vector onto any face has a closed-form expression. This path-following procedure is re-iterated computationally until finally a non-dominated max-Utility solution is reached. Along that path, every point corresponds to a strongly non-dominated solution of the utility-fairness trade-off; in other words, the path is a Pareto-set.
The path-following procedure described above is illustrated in
More formally described, the Pareto-optimal set is the union of (n−1) line segments that connect ν(i-1) to ν(i), for i=1, . . . , (n−1). In the following, without loss of generality and for the sake of notation simplicity, it is assumed that the objects (e.g., documents) are sorted by decreasing order of gain, namely ρ=(ρ1, ρ2, . . . , ρn) with ρ1≥ρ2≥ . . . ≥ρn. The initial point ν(0) is either β or ρ′ (replaced by ρ̃ if ρ′ is infeasible), depending on whether demographic or meritocratic fairness is considered. When establishing the Pareto-set, the points are always located in the same zone (i.e., the zone of the max-Utility or PRP vertex; it is noted that the barycenter belongs to all zones, as all zones have the barycenter as apex) because the order of the components of the corresponding exposure vectors is not changed when a vector in the direction of the gain vector is incrementally added to these exposure vectors, even if it is projected on a face.
An embodiment of the Pareto-set building method is set out in
As set forth herein, a point in the Pareto-set, which can be chosen for instance by a decision-maker, can translate to a target exposure within the decision space. This point in the Pareto-set can be decomposed as a convex combination of at most n rankings. To begin this decomposition problem, example methods can particularize and adapt the general GLS procedure described above to the structure of the Expohedron polytope.
One embodiment for realizing the decomposition is set forth in pseudo-code in
At line 9 in
The Bisection method itself requires a number of iterations that is independent of n (e.g., 5-10 iterations). Checking the majorization condition inside the Bisection method has O(n log n) complexity, so that, with at most n rankings in the decomposition, the total complexity of this method is O(n² log n).
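The following Python sketch illustrates one way such a bisection with a majorization check could look. It is not a transcription of the pseudo-code referenced above. It assumes that feasibility of a point is tested by checking that the point is majorized by the exposure vector, and that the search is for the largest feasible step along a direction whose components sum to zero. The names `is_majorized`, `max_feasible_step`, `gamma`, `direction`, and the default iteration count are illustrative assumptions.

```python
from itertools import accumulate


def is_majorized(x, gamma, tol=1e-9):
    """Return True if x is majorized by gamma.

    Sorting dominates the cost, so the check runs in O(n log n).
    """
    if len(x) != len(gamma):
        return False
    prefix_x = list(accumulate(sorted(x, reverse=True)))
    prefix_g = list(accumulate(sorted(gamma, reverse=True)))
    # The totals must agree, and every partial sum of the sorted x
    # must not exceed the corresponding partial sum of the sorted gamma.
    if abs(prefix_x[-1] - prefix_g[-1]) > tol:
        return False
    return all(a <= b + tol for a, b in zip(prefix_x, prefix_g))


def max_feasible_step(x, direction, gamma, hi=1.0, iterations=30):
    """Bisection for the largest t in [0, hi] such that x + t * direction
    is still majorized by gamma.

    Assumes x itself is feasible and that the feasible set is convex, so
    the feasible steps form an interval starting at 0 (hypothetical setup).
    """
    lo = 0.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        point = [xi + mid * di for xi, di in zip(x, direction)]
        if is_majorized(point, gamma):
            lo = mid   # still feasible: move the lower bound up
        else:
            hi = mid   # infeasible: move the upper bound down
    return lo
```

Each iteration halves the search interval, so the number of iterations depends only on the desired precision and not on n, while each feasibility test costs one sort.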
Given the decomposition of the target exposure into a distribution over rankings, any of several methods, alone or in combination, may be used to deploy the distribution in the form of a sequence of rankings.
For example, stochastic sampling (i.e., random number generators) may be used to deploy a distribution over rankings. In other embodiments, Low-Discrepancy Sequences may be used (e.g., see Martin Roberts, “The unreasonable effectiveness of quasirandom sequences”, Apr 2018).
Low-Discrepancy Sequences (LDS) are provided such that, for all t, the sub-sequence of rankings R1, R2, . . . , Rt has low discrepancy (i.e., the proportion of each ranking in the sub-sequence is close to the desired proportion, which is the proportion in the infinite sequence). Low-Discrepancy Sequences are typically quasi-random sequences of numbers in the [0,1] interval that are as close as possible to the uniform distribution. These sequences of floats in [0,1] may be transformed into sequences of rankings with the desired proportions by comparing each generated float with the stacked (i.e., cumulated) values of the proportions.
An additive-recurrence sequence based on an irrational number (also called a Kronecker, Weyl, or Richtmyer sequence) may be used in embodiments, in particular one based on the golden ratio, which is in some sense the most irrational number. The general recursive form of the sequence is:
s_{n+1} = (s_n + α) mod 1
with α=(√5−1)/2, which is the value achieving the optimal discrepancy for this additive-recurrence sequence class of LDS.
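As a minimal sketch of how the additive recurrence and the cumulated proportions described above could be combined, the following Python code generates the golden-ratio sequence and maps each value to a ranking. The function names, the `seed` parameter, and the use of string labels to stand in for rankings are assumptions made for illustration only.

```python
import math
from bisect import bisect_right
from itertools import accumulate

# alpha = (sqrt(5) - 1) / 2, as given in the text.
GOLDEN_ALPHA = (math.sqrt(5) - 1) / 2


def golden_sequence(seed=0.0):
    """Yield s_1, s_2, ... with s_{n+1} = (s_n + alpha) mod 1 and s_0 = seed."""
    s = seed
    while True:
        s = (s + GOLDEN_ALPHA) % 1.0
        yield s


def deploy_rankings(rankings, proportions, count, seed=0.0):
    """Return `count` rankings drawn via the low-discrepancy sequence.

    Each generated float is compared with the cumulated proportions;
    the first cumulated value strictly above the float selects the ranking.
    """
    cumulative = list(accumulate(proportions))
    generator = golden_sequence(seed)
    sequence = []
    for _ in range(count):
        u = next(generator)
        # Clamp guards against proportions summing to slightly less than 1.
        index = min(bisect_right(cumulative, u), len(rankings) - 1)
        sequence.append(rankings[index])
    return sequence
```

For example, `deploy_rankings(["A", "B", "C"], [0.5, 0.3, 0.2], 5)` would be called with three placeholder rankings and their target proportions; in practice the rankings would be the permutations obtained from the decomposition.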
Families of efficient sampling strategies other than Low-Discrepancy Sequences may alternatively or additionally be used. For instance, strategies based on Stride Scheduling or, equivalently, m-balanced words, can be used as well, and can provide very similar performance.
When expressed in the terms of the example problem, a generator of m-balanced words produces a sequence of rankings such that, in any pair of sub-sequences of identical length, the number of occurrences of any ranking differs by at most m. In other words, this generator guarantees that the generated sequence delivers the rankings with proportions as close as possible to the target ones. In theory, but not wishing to be bound by theory, the best achievable m is, in an example case, at most equal to n−1. An example algorithm capable of efficiently generating m-balanced sequences of rankings, given a certain distribution of rankings, is provided in Algorithm 1 of Shinya Sano, Naoto Miyoshi, and Ryohei Kataoka. 2004. m-Balanced words: A generalization of balanced words. Theoretical Computer Science 314, 1-2 (Feb. 2004), 97-120. https://doi.org/10.1016/j.tcs.2003.11.021. This generator is equivalent to the well-known Stride Scheduling algorithm, used to generate fair sequences in resource (CPU) management for concurrent processes, as described, for instance, in C. A. Waldspurger and W. E. Weihl. 1995. Stride Scheduling: Deterministic Proportional-Share Resource Management. Technical Report. Massachusetts Institute of Technology, USA.
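A basic stride-scheduling loop applied to rankings may be sketched as follows. This is a simplified illustration and not Algorithm 1 of the cited paper: each ranking receives a stride inversely proportional to its target proportion, and the ranking with the smallest accumulated pass value is emitted next. The function name, the `scale` constant, the tie-breaking by index, and the use of floating-point strides are assumptions.

```python
import heapq


def stride_schedule(rankings, proportions, count, scale=1_000_000):
    """Return `count` rankings in a deterministic proportional-share order."""
    heap = []
    for index, proportion in enumerate(proportions):
        if proportion <= 0:
            continue  # a ranking with zero weight is never scheduled
        stride = scale / proportion
        # Entry layout: (pass value, tie-breaking index, stride).
        heapq.heappush(heap, (stride, index, stride))
    if not heap and count > 0:
        raise ValueError("at least one proportion must be positive")

    sequence = []
    for _ in range(count):
        # Emit the ranking with the smallest pass value, then advance
        # its pass value by its stride.
        pass_value, index, stride = heapq.heappop(heap)
        sequence.append(rankings[index])
        heapq.heappush(heap, (pass_value + stride, index, stride))
    return sequence
```

With a heap, each emitted ranking costs a logarithmic number of operations in the number of rankings carried by the distribution, and the sequence is fully deterministic, unlike stochastic sampling.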
Example methods set forth for ranking objects may be provided as a computer program product comprising code instructions to execute these methods (for example using data processors 112 of the server 100 and the client devices 120), and storage means readable by computer equipment (for example using memory 113 of the server 100 and the client devices 120) provided with this computer program product for storing such code instructions.
The foregoing description is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses. The broad teachings of the disclosure may be implemented in a variety of forms. Therefore, while this disclosure includes particular examples, the true scope of the disclosure should not be so limited since other modifications will become apparent upon a study of the drawings, the specification, and the following claims. It should be understood that one or more steps within a method may be executed in different order (or concurrently) without altering the principles of the present disclosure. Further, although each of the embodiments is described above as having certain features, any one or more of those features described with respect to any embodiment of the disclosure may be implemented in and/or combined with features of any of the other embodiments, even if that combination is not explicitly described. In other words, the described embodiments are not mutually exclusive, and permutations of one or more embodiments with one another remain within the scope of this disclosure. All documents cited herein are hereby incorporated by reference in their entirety, without an admission that any of these documents constitute prior art.
Each module may include one or more interface circuits. In some examples, the interface circuits may include wired or wireless interfaces that are connected to a local area network (LAN), the Internet, a wide area network (WAN), or combinations thereof. The functionality of any given module of the present disclosure may be distributed among multiple modules that are connected via interface circuits. For example, multiple modules may allow load balancing. In a further example, a server (also known as remote, or cloud) module may accomplish some functionality on behalf of a client module. Each module may be implemented using code. The term code, as used above, may include software, firmware, and/or microcode, and may refer to programs, routines, functions, classes, data structures, and/or objects.
The term memory circuit is a subset of the term computer-readable medium. The term computer-readable medium, as used herein, does not encompass transitory electrical or electromagnetic signals propagating through a medium (such as on a carrier wave); the term computer-readable medium may therefore be considered tangible and non-transitory. Non-limiting examples of a non-transitory, tangible computer-readable medium are nonvolatile memory circuits (such as a flash memory circuit, an erasable programmable read-only memory circuit, or a mask read-only memory circuit), volatile memory circuits (such as a static random access memory circuit or a dynamic random access memory circuit), magnetic storage media (such as an analog or digital magnetic tape or a hard disk drive), and optical storage media (such as a CD, a DVD, or a Blu-ray Disc).
The systems and methods described in this application may be partially or fully implemented by a special purpose computer created by configuring a general purpose computer to execute one or more particular functions embodied in computer programs. The functional blocks, flowchart components, and other elements described above serve as software specifications, which may be translated into the computer programs by the routine work of a skilled technician or programmer.
The computer programs include processor-executable instructions that are stored on at least one non-transitory, tangible computer-readable medium. The computer programs may also include or rely on stored data. The computer programs may encompass a basic input/output system (BIOS) that interacts with hardware of the special purpose computer, device drivers that interact with particular devices of the special purpose computer, one or more operating systems, user applications, background services, background applications, etc.
It will be appreciated that variations of the above-disclosed embodiments and other features and functions, or alternatives thereof, may be desirably combined into many other different systems or applications. Also, various presently unforeseen or unanticipated alternatives, modifications, variations, or improvements therein may be subsequently made by those skilled in the art which are also intended to be encompassed by the description above and the following claims.
Number | Date | Country | Kind |
---|---|---|---|
2104801 | May 2021 | FR | national |
21306565.9 | Nov 2021 | EP | regional |