According to a 2017 study, 11.7% of convictions in the US were wrong; in cases where DNA evidence was present, that number rose to 12.6%. K. Walsh, J. Hussemann, A. Flynn, J. Yahner, and L. Golian, “Estimating the prevalence of wrongful convictions,” https://www.ojp.gov/pdffiles1/nij/grants/251115.pdf, 2017. Further, according to a report (https://www.sentencingproject.org/publications/un-report-on-racial-disparities/) submitted to the UN, African-American adults in the US are 5.9 times as likely to be incarcerated as whites, and Hispanics are 3.1 times as likely. These findings portray fundamental flaws in the US judicial process. Because judges sometimes make mistakes, a need has been recognized for judicial analysis systems that assist in and improve judicial decisioning.
It is with these concepts in mind, among others, that various aspects of the present disclosure were conceived.
The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosure. The summary is not an extensive overview of the disclosure. It is neither intended to identify key or critical elements of the disclosure nor to delineate the scope of the disclosure. The following summary merely presents some concepts of the disclosure in a simplified form as a prelude to the description below.
Some aspects described herein involve a system including a logical computing framework (JUST) within which judges or jury members can record propositions about a case and witness statements in which a witness asserts that certain propositions are true. The logical computing framework may provide a user interface that allows the judge or jury members to assign a probability reflecting his or her belief in a witness statement. A world is an assignment of true or false to each proposition, which is required to satisfy case-specific integrity constraints. The logical computing framework employs an explicit algorithm that calculates the k most likely worlds without using independence assumptions between propositions. The judge may use these calculated top-k most probable worlds to make his or her final decision. For this computation, the logical computing framework incorporates and uses a suite of “combination” functions. Additionally, the logical computing framework incorporates an implicit, more efficient algorithm. As provided below, JUST has been tested using 10 combination functions on 5 real-world court cases and 19 TV court cases, and the illustrative examples show the combinations under which JUST works well in practice.
The details of these and other aspects of the disclosure are set forth in the accompanying drawings and description below. Other features and advantages of the disclosure will be apparent from the drawings and description.
The foregoing and other objects, features, and advantages of the present disclosure set forth herein will be apparent from the following description of particular embodiments of those inventive concepts, as illustrated in the accompanying drawings. Also, in the drawings the like reference characters refer to the same parts throughout the different views. The drawings depict only typical embodiments of the present disclosure and, therefore, are not to be considered limiting in scope.
Aspects of the present disclosure relate to data science, evidence-based decision making and, more particularly, to a software package and computing system enabled for assembling and encoding factual statements from witnesses, assigning probabilities that the statements are truthful, automatically incorporating case-specific integrity constraints, using combination functions to render the k most probable worlds, and enabling a judge to issue a verdict based on the output of the system.
Judges sometimes might not reason logically; for example, a judge may be biased towards certain segments of the population (e.g., women, minorities, etc.). As such, the JUST computing framework allows judges to provide non-arbitrary decisions, thereby improving social justice. For example, analysis and operations performed by the JUST computing framework have the potential to reduce bias by making judges (or juries) assign probabilities to witness statements. The computing system provides an automatic framework that allows judges' decisioning to be more logically well-founded based on probabilistic logic, so that judges may render fairer judgements. Output from the JUST computing framework may be presented to the judge. A user interface may present the most likely scenarios generated based on the judge's (or jury's) own assessment(s) and the laws of probability, where the automatically generated information may assist the judge in making a better decision. At the very least, the judge will be presented with possible scenarios that conflict with what the judge may have in mind, thus leading to a better-informed final decision. In addition, the assessments provided via the JUST computing framework may be made part of the legal record, which can allow society to better monitor judicial bias. While reversing centuries of racial bias can be a challenge, the JUST computing framework's analysis and output may improve legal outcomes and may make court procedures fairer.
The computing framework starts with witness statements and an assessment of the witness statements that the judge provides. The judge is then presented with the most likely scenarios based on the judge's own assessment (or the jury's assessment if the judge prefers that, or a combination of the judge's and jury's assessments if that is preferable) and the laws of probability, to assist in making a more grounded decision. The computing framework allows the judge to see possible scenarios that conflict with what the judge may have previously had in mind and thus allows the judge to make a better-informed final decision. The computing framework may lead to better judicial decisions and reduce biasing problems. The attached Appendix entitled “Judicial Support Tool: Finding the k-Most Likely Judicial Worlds” highlights examples and details related to at least the structures and methods described herein and is incorporated by reference in its entirety herein.
The systems and methods that implement the JUST computing framework assist judges and/or juries in considering the evidence more objectively. JUST proceeds in four broad steps, as shown in the accompanying drawings. In a first step, witness statements are captured as propositions about the case. This can be easily done, for example, when transcribing witness statements and depositions, such as in US courtroom proceedings. In a second step, judges (or juries as indicated in [024]) can then use their own judgement to assign probabilities to whether those statements are true; to reduce dependence on biased decisioning from a judge (or jury), the judge and jury may independently assign probabilities, from which a median value may be used. In the third step, JUST identifies a space of k possible worlds, along with their probabilities. These possible worlds need to satisfy various integrity constraints. By showing these k possible worlds to a judge, JUST makes judges aware of different ways in which the evidence in the case can be viewed, along with the probabilities of those worlds based on the judge's own assessment function (or the jury's assessment function if that is preferred by the judge, or a combination of the two). At the end of the day, the JUST computing framework allows the judge to make a better-informed final decision, while keeping in mind the alternative worlds presented by the JUST system.
A problem solved by JUST may first be defined with reference to a universe of propositions. In Step 1, a witness may say some propositions are true, say others are false, and offer no opinion about the rest. Witness testimony can conflict. A set of integrity constraints (ICs) is also specified sometime after Step 1 and before Step 3 can be executed. Possible worlds are interpretations in logic that satisfy the integrity constraints. In Step 2, judges can assign a probability to each witness statement denoting how much the judge believes that particular witness statement. For instance, in the famous Casey Anthony murder trial (https://en.wikipedia.org/wiki/Death_of_Caylee_Anthony) in the US, Casey's father said “Casey sedated her daughter.” A judge may assign a 75% probability that the witness was telling the truth, while another witness might say the opposite and the judge (or jury) may assign a different probability to that witness's statement. The JUST computing framework uses a class of “combination functions” (CFs) to combine these probabilities; these functions induce probability distributions on the space of possible worlds in different ways. The JUST computing framework then presents two approximation algorithms to find the top-k most likely possible worlds, which can be presented to a judge (or jury) as options to consider before rendering a final decision. A prototype implementation of the JUST computing framework was built, and its performance was evaluated over 19 TV legal cases and 5 real-world court cases. The JUST computing framework was found to perform very well, achieving high prediction performance when comparing the ground truth and the top-k possible worlds.
Below, the framework is formally defined, the different “combination functions” to combine multiple witness statements are introduced, algorithms to compute most likely possible worlds are presented, and an experimental evaluation over several real-world cases is reported. The related work and conclusions are presented at the end.
Within the JUST computing framework, the existence of a set 𝒮 of “sources” and a set 𝒫 of (Boolean) propositions is assumed. For example, sources might be those who provided testimony about the Casey Anthony case. The propositions refer to the specific statements that they made, such as “Casey sedated her daughter Caylee.” Assume that 𝒫 does not contain propositions with “complementary” meaning, e.g., p=“Casey sedated her daughter Caylee” and p′=“Casey did not sedate her daughter Caylee.” (Note that there is no loss of generality in this assumption, as p′ can be modeled by assigning the truth value false to p, since p is a Boolean proposition that can take the truth values true or false. Moreover, court recording companies can easily ensure this when they produce transcripts and define propositions if they use a system like JUST.) Witness functions, defined below, capture statements made by witnesses.
A witness function is a partial mapping ω: 𝒮 × 𝒫 → {0,1}.
Intuitively, ω(s,p)=1 when source s said proposition p is true, while ω(s,p)=0 when s said p is false. Also, ω is a partial mapping, and ω(s,p) is not defined in cases when s did not say anything about p.
An assessment function provides a “confidence” for each statement by some source s about some proposition p. This assessment function reflects the judge's assessment of the veracity of a specific statement by a witness. It can easily be implemented via a graphical user interface in which the judge can log his or her degree of belief in a witness's remarks on a specific statement.
An assessment function for a witness function ω is a partial mapping α: 𝒮 × 𝒫 → [0,1] such that for every source s ∈ 𝒮 and every proposition p ∈ 𝒫, α(s,p) is defined iff ω(s,p) is defined.
Intuitively, α(s,p) returns a real value in [0,1] assessing the judge's belief that s's statement about proposition p is true. For example, α(s,p)=0.7 says that the assessment function believes the statement by source s about proposition p is true with a confidence of 70%. Such assessment functions are provided by the judge or jury. It is important to note that if each member of the jury provides their own assessment of a statement by a specific witness about proposition p, then these assessments can be combined by the JUST system in accordance with a preference expressed by the judge. Suppose there are 5 jurors who assign such a statement a probability of 0.5, 0.7, 0.3, 0.9, and 1, respectively. The judge may combine these probabilities by taking their average, which is 0.68. Or the judge may take the median of these probabilities, which would be 0.7. Or the judge may take the average after eliminating the highest (e.g., 1) and lowest (e.g., 0.3) assignments, leading to a combined probability of 0.7. The reader will note that many other ways can be used by the judge to combine the probabilities provided by the jury members. Or the judge may choose to use only the probability he or she assigns to the witness statements, or some combination of the jury members' assigned probabilities and his or her own.
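For illustration, these juror-combination options can be sketched in a few lines of Python (the function name and rule names are illustrative, not part of the JUST specification):

```python
from statistics import mean, median

def combine_juror_probabilities(probs, method="median"):
    """Combine per-juror probabilities for one witness statement."""
    if method == "average":
        return mean(probs)
    if method == "median":
        return median(probs)
    if method == "trimmed":  # drop the single highest and lowest assessments
        return mean(sorted(probs)[1:-1])
    raise ValueError(f"unknown method: {method}")

jurors = [0.5, 0.7, 0.3, 0.9, 1.0]
print(combine_juror_probabilities(jurors, "average"))  # 0.68
print(combine_juror_probabilities(jurors, "median"))   # 0.7
print(combine_juror_probabilities(jurors, "trimmed"))  # ~0.7
```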
Example 1. Consider 4 sources 𝒮 = {s1, . . . , s4} and 4 propositions 𝒫 = {p1, . . . , p4}, depicted as a bipartite graph. In Step 1, the witness function is captured by the edges connecting the sources to the propositions. A solid (resp., dashed) edge denotes ω(s,p)=1 (resp., ω(s,p)=0). The assessment function is captured via the edge weights. For example, the dashed edge from s1 to p3 with weight 0.6 specifies that s1 said that p3 (namely, “Caylee was with the nanny”) is false, and the assessment function assesses s1's testimony to be true with 60% probability.
The notion of a possible world models one possible scenario that might have occurred.
A possible world is a total mapping γ: 𝒫 → {0,1}.
Intuitively, a possible world labels every single proposition as being either true (1) or false (0). PW is used to denote the set of all possible worlds over 𝒫.
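For instance, the space PW can be enumerated directly for a small example (a minimal Python sketch; proposition names are illustrative):

```python
from itertools import product

propositions = ["p1", "p2", "p3", "p4"]

# A possible world is a total mapping from propositions to {0, 1}.
possible_worlds = [dict(zip(propositions, bits))
                   for bits in product([0, 1], repeat=len(propositions))]
print(len(possible_worlds))  # 16, as in Example 2 below
```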
Example 2. The running example has 16 possible worlds, depending on whether the function assigns 0 or 1 to each of the four propositions; they are listed in Table I.
An important point to note from the above example is that the set of all possible worlds is exponential in the number of propositions. For instance, if there are 100 propositions, then there will be 2^100 possible worlds, i.e., 1,267,650,600,228,229,401,496,703,205,376 possible worlds. This will prove to be a major challenge in computing the k most probable worlds, as discussed below.
In some real-world situations, partial or total ground truth may exist, but in situations like judicial cases, and in general in the presence of uncertainty, ground truth does not necessarily exist about what occurred and what did not occur. A possible world captures one possible scenario that might have occurred. To find the k most probable worlds, all possible worlds must be considered. However, some of them may be inconsistent with the semantics (e.g., meaning) of the propositions considered. As usual, the semantics of cases are captured via two types of integrity constraints, which are introduced below.
A denial constraint is of the form ¬(p1∧ . . . ∧pn), with n≥1 and each pi ∈ 𝒫.
Intuitively, a denial constraint says that the propositions p1, . . . , pn cannot all be simultaneously true.
A definite constraint is of the form p1∧ . . . ∧pn→p0, with n≥0 and each pi ∈ 𝒫.
Intuitively, a definite constraint says that if the propositions p1, . . . , pn are all simultaneously true, then p0 must be true. When n=0, then p0 must be true.
Example 3 In the running example, the following two constraints may exist (referring to some fixed time point):
The first is a denial constraint saying that “Casey was unemployed” and “Casey was working at Universal Studios” cannot both be true at the same time. The second is a definite constraint saying that if “Caylee was with the nanny” is true, then “Casey had a nanny” must also be true. (Note—A simple graphical user interface or GUI can enable court transcription companies to easily write the propositions and the ICs in English, as has been done in the Casey Anthony examples, so personnel only need to know how to use the GUI and do not have to know logic.)
Next, what it means for a possible world to satisfy a given integrity constraint is defined.
A possible world γ satisfies a denial constraint denc of the form ¬(p1∧ . . . ∧pn) iff there exists 1≤j≤n such that γ(pj)=0. A possible world γ satisfies a definite constraint defc of the form p1∧ . . . ∧pn→p0 iff γ(p0)=1 or there exists 1≤j≤n such that γ(pj)=0. Write γ|=denc and γ|=defc to denote satisfaction in these two cases.
Example 4. The possible worlds γ5, γ7, γ13, and γ15 (see Table I) do not satisfy the denial constraint of Example 3; all other possible worlds do.
Given a set IC of integrity constraints, say that a possible world γ satisfies IC, denoted γ |= IC, iff γ |= ic for all ic ∈ IC; otherwise, γ does not satisfy IC, written γ ⊭ IC. Building upon the running example, the possible worlds that satisfy the integrity constraints IC of Example 3 are γ0, γ1, γ4, γ8, γ9, γ10, γ11, γ12, and γ14.
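A minimal Python sketch of such an integrity-constraint check, assuming worlds are represented as dicts mapping proposition names to 0/1 (the mapping of p1..p4 to the English statements is assumed here, not taken from the source):

```python
from itertools import product

def satisfies_denial(world, props):
    # ¬(p1 ∧ ... ∧ pn): at least one pj must be false
    return any(world[p] == 0 for p in props)

def satisfies_definite(world, body, head):
    # p1 ∧ ... ∧ pn → p0: the head is true or some body proposition is false
    return world[head] == 1 or any(world[p] == 0 for p in body)

def satisfies_ic(world, denials, definites):
    return (all(satisfies_denial(world, ps) for ps in denials) and
            all(satisfies_definite(world, b, h) for b, h in definites))

# Constraints in the spirit of Example 3:
denials = [("p1", "p2")]        # "unemployed" and "at Universal" conflict
definites = [(("p3",), "p4")]   # "with the nanny" implies "had a nanny"
worlds = [dict(zip(["p1", "p2", "p3", "p4"], bits))
          for bits in product([0, 1], repeat=4)]
print(sum(satisfies_ic(w, denials, definites) for w in worlds))
# 9 worlds satisfy IC, consistent with the running example
```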
Next, a probability that a possible world is the actual world is associated with each possible world. To do so, a possible world distribution is defined.
A possible world distribution w.r.t. a set IC of integrity constraints is a probability distribution ρIC over PW such that ρIC(γ)=0 for all possible worlds γ ∈ PW with γ ⊭ IC.
The last column of Table I shows a sample possible world distribution ρIC w.r.t. the set IC of integrity constraints of the running example. For instance, this possible world distribution assigns a 10% probability of being correct to each of γ1, γ4, γ9, and γ14, a 15% probability of being correct to γ10, a 20% probability of being correct to each of γ0 and γ8, and a 5% probability to one of the remaining possible worlds.
Example 5. Let us consider proposition p3 in the running example. Sources s3 and s4 said that p3 is true, with assessment function values 0.3 and 0.5, respectively, while source s1 said that p3 is false, with assessment function value 0.6.
This causes a dilemma. Both sources that say p3 is true have low credibility—s3 is assessed as lying with 70% probability, while s4 is assessed as lying with 50% probability. What should be inferred from this? It is clear that this situation can be resolved in many different ways.
To address this issue, combination functions are introduced, which return a confidence value about a proposition being true, on the basis of a witness and an assessment function.
Definition 8 (Combination Function) A combination function (CF) is a function cf that takes as input a proposition p, a witness function ω, and an assessment function α for ω, and returns a value in [0,1] as output.
Rather than committing to a specific combination function, a general definition is provided that allows different concrete instances that can accommodate different needs, depending on the application at hand. In the following, 10 concrete combination functions are proposed. A judge making a decision can either choose a default combination function within JUST or pick one of the 10 suggested by us or use some other combination function altogether. Of course, this library of 10 combination functions can be extended easily if future research suggests better ones.
Now, given a witness function ω, an assessment function α for ω, a set of integrity constraints IC, and having chosen a combination function cf, possible world distributions may be found by solving the set LC of linear constraints defined as follows:
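The constraint set itself does not survive in the text above; a plausible reconstruction, based on the description in the next paragraph (the exact formulation in the source is assumed), is:

```latex
\begin{aligned}
&\textstyle\sum_{\gamma_j \models IC,\ \gamma_j(p_i)=1} X_j \;\le\; cf(p_i,\omega,\alpha)
  && \text{for each proposition } p_i,\\
&\textstyle\sum_{\gamma_j \models IC} X_j \;=\; 1,
  \qquad X_j \ge 0 && \text{for each possible world } \gamma_j \models IC.
\end{aligned}
```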
In the set of linear constraints above, each variable Xj stands for the (unknown) probability of a possible world γj satisfying the integrity constraints. Also, the value returned by the combination function for proposition pi is used as an upper bound on the probability of pi being true.
It is easy to see that each solution of LC corresponds to a possible world distribution, and vice versa. In general, LC can have multiple solutions (e.g., possible world distributions) because independence between propositions is not assumed. These solutions can be found by using an off-the-shelf linear program solving tool such as CPLEX or Gurobi. Thus, a fixed possible world γ can have different probabilities w.r.t. different possible world distributions. One way to get a single probability for γ is to define it as its average probability across all possible world distributions. Then, a set of top-k possible worlds is a set of k possible worlds with the highest average probability.
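As a rough illustration of this step, the following Python sketch finds feasible solutions of LC with an off-the-shelf LP solver by optimizing random objectives; this reaches only vertices of the solution polytope and is a crude stand-in for proper uniform sampling (all names are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def sample_lc_solutions(worlds, cf_values, n_samples=20, seed=0):
    """worlds: list of dicts mapping propositions to 0/1 (IC-satisfying only);
    cf_values: dict mapping each proposition to its CF output."""
    rng = np.random.default_rng(seed)
    props = list(cf_values)
    # One row per proposition: sum of X_j over worlds where p is true <= cf(p).
    A_ub = np.array([[float(w[p]) for w in worlds] for p in props])
    b_ub = np.array([cf_values[p] for p in props])
    A_eq, b_eq = np.ones((1, len(worlds))), np.array([1.0])  # sum to 1
    samples = []
    for _ in range(n_samples):
        res = linprog(rng.standard_normal(len(worlds)), A_ub=A_ub, b_ub=b_ub,
                      A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
        if res.success:
            samples.append(res.x)
    return samples
```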
Qualitative Assessments. Judges or juries may be uncomfortable assigning probabilities to witness statements. Suppose a judge/jury prefers to assign a “low”, “medium”, or “high” rating to the credibility of a witness statement. These can be easily converted to the intervals [0, ⅓], [⅓, ⅔], and [⅔, 1], respectively. This can be easily handled via a small modification to the JUST framework: (i) the definition of combination function would be replaced by a function cf′ that behaves just like cf except that cf′(pi, ω, α) would return a probability interval rather than a probability; (ii) the first constraint shown above would be replaced by the constraint:
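A plausible form of the replacement constraint, reconstructed from the description above (the exact source formula is not preserved):

```latex
\ell_i \;\le\; \textstyle\sum_{\gamma_j \models IC,\ \gamma_j(p_i)=1} X_j \;\le\; u_i,
\qquad \text{where } cf'(p_i,\omega,\alpha) = [\ell_i, u_i].
```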
When multiple witnesses disagree about a proposition and the assessment function for the witnesses varies, a judge can choose a combination function (CF) to combine the values into a single number representing the probability of the proposition being true. Below, two classes of CFs are introduced.
Standard CFs. The simplest CFs use aggregate operators Φ (the 5 aggregates min, max, avg, median, and mode are considered in the experiments) that take a multiset of real numbers in [0,1] and return a number in the [0,1] interval. Specifically, the basic idea to determine the probability of a proposition p is to combine all statements made about p as follows: take the value α(s,p) for each statement (made by some source s) claiming that p is true, take the value 1−α(s′,p) for each statement (made by some source s′) claiming that p is false, and then apply an aggregate operator Φ over all such collected values.
Given an aggregate operator Φ and a proposition p, define the combination function:
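The formula itself is not preserved above; a reconstruction consistent with the surrounding description and with Example 6 (where ⊎ denotes multiset union) is:

```latex
cf_\Phi(p,\omega,\alpha) \;=\; \Phi\big(\{\alpha(s,p) \mid \omega(s,p)=1\} \,\uplus\, \{1-\alpha(s',p) \mid \omega(s',p)=0\}\big).
```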
Example 6. Consider proposition p3 of the running example. All statements made about p3 (along with their assessment function values) are reported in Example 5. Thus, source s3 (resp., s4) is claiming p3 is true and the assessment function value for such a statement is 0.3 (resp., 0.5), while source s1 is claiming p3 is false and the assessment function value for such a statement is 0.6. Using, e.g., the average aggregate operator results in cfavg(p3, ω, α) = avg({0.3, 0.5, (1−0.6)}) = 0.4.
Trust-Based CFs. The basic idea of this family of CFs is to assign a score to each source, measuring her/his “reliability,” and then compute each proposition's probability on the basis of such scores. Trust-based CFs determine the probability of a proposition p by also looking at the ω(s,p′) and α(s,p′) values with p′≠p, e.g., by looking at witness statements for propositions other than p. The trust of source s is defined as the average of the assessment function values for s's statements. However, other definitions of trust can easily be substituted here (e.g., the median of the assessment function values for s's statements). The probability of a proposition p is then computed by applying an aggregate operator to the trust values of the sources who made statements about p.
For each source s, its trust score is defined as
T(s)=avg({α(s,p)|ω(s,p) is defined}).
Then, given an aggregate operator Φ and a proposition p, the combination function is defined as:
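Again the formula is not preserved; a reconstruction consistent with Example 7 below is:

```latex
cf_\Phi^{trust}(p,\omega,\alpha) \;=\; \Phi\big(\{T(s) \mid \omega(s,p)=1\} \,\uplus\, \{1-T(s') \mid \omega(s',p)=0\}\big).
```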
Example 7. Consider again proposition p3 of the running example. The sources who made statements about p3 are s1, s3, and s4, whose trust scores are as follows:
T(s1)=avg({0.2,0.8,0.6})=0.53,
T(s3)=avg({0.6,0.3})=0.45,
T(s4)=avg({0.5,0.7})=0.6.
Using, e.g., the max aggregate operator results in cfmaxtrust(p3, ω, α) = max({(1−0.53), 0.45, 0.6}) = 0.6.
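Both CF families can be sketched in Python; the data below reproduces Examples 6 and 7 (the assignment of the sources' other statements to particular propositions "pa"/"pb" is hypothetical—only the assessment multisets are taken from Example 7):

```python
from statistics import mean

def standard_cf(p, witness, assessment, agg=mean):
    """witness: dict (source, prop) -> True/False; assessment: dict -> [0,1]."""
    vals = [assessment[s, q] if truthy else 1 - assessment[s, q]
            for (s, q), truthy in witness.items() if q == p]
    return agg(vals)

def trust(s, assessment):
    # Trust of a source: average assessment over all of its statements.
    return mean(v for (src, _), v in assessment.items() if src == s)

def trust_cf(p, witness, assessment, agg=max):
    vals = [trust(s, assessment) if truthy else 1 - trust(s, assessment)
            for (s, q), truthy in witness.items() if q == p]
    return agg(vals)

witness = {("s1", "p3"): False, ("s3", "p3"): True, ("s4", "p3"): True,
           ("s1", "pa"): True, ("s1", "pb"): True,  # hypothetical placements
           ("s3", "pa"): True, ("s4", "pa"): True}
assessment = {("s1", "p3"): 0.6, ("s3", "p3"): 0.3, ("s4", "p3"): 0.5,
              ("s1", "pa"): 0.2, ("s1", "pb"): 0.8,
              ("s3", "pa"): 0.6, ("s4", "pa"): 0.7}
print(round(standard_cf("p3", witness, assessment), 2))  # 0.4  (Example 6)
print(round(trust_cf("p3", witness, assessment), 2))     # 0.6  (Example 7)
```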
In this section, two approximation approaches to compute most likely possible worlds are presented. The explicit approach (JUSTexp) makes no independence assumptions whatsoever: it computes the average probability of each possible world over a sample of solutions of LC and then returns a top-k set of most likely ones (w.r.t. the average probability). The implicit approach (JUSTimp) assumes that only the propositions appearing in no integrity constraint are independent, reducing the computational effort.
More specifically, JUSTexp (see Algorithm 1) works as follows. First, a set S of solutions of LC (e.g., probability distributions over possible worlds) is randomly sampled (line 1). For this, the well-known Hit-and-Run walk may be used (see R. L. Smith, “Efficient monte carlo procedures for generating points uniformly distributed over bounded regions,” Operations Research, vol. 32, no. 6, pp. 1296-1308, 1984, and L. Lovász and S. S. Vempala, “Hit-and-run from a corner,” SIAM Journal on Computing, vol. 35, no. 4, pp. 985-1005, 2006). A number of other algorithms are also usable for such sampling. Then, for each possible world γ, its expected probability EP(γ) (w.r.t. S) is computed (lines 2-4). Finally, a set of top-k (w.r.t. EP) possible worlds is returned (lines 5-6).
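At a high level, JUSTexp reduces to a few lines once a sampler is available. The sketch below assumes a callable `sampler(s)` that returns s sampled LC solutions (e.g., a closure over `sample_lc_solutions` above); it is a stand-in, not the actual implementation:

```python
import numpy as np

def just_exp(worlds, sampler, k, s=20):
    """Average each world's probability over s sampled LC solutions,
    then return the k worlds with the highest expected probability."""
    S = np.array(sampler(s))        # shape: (s, number of worlds)
    ep = S.mean(axis=0)             # EP(world) for each world
    best = np.argsort(-ep)[:k]
    return [(worlds[i], float(ep[i])) for i in best]
```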
Before presenting JUSTimp, additional notation is introduced next. Possible worlds defined w.r.t. subsets 𝒫′ of 𝒫, e.g., functions of the form γ′: 𝒫′ → {0,1}, need to be considered. For notational convenience, such a function γ′ is represented also as the set {pi | pi ∈ 𝒫′ and γ′(pi)=1} ∪ {¬pi | pi ∈ 𝒫′ and γ′(pi)=0}. PW[𝒫′] (resp., LC[𝒫′]) is used to denote the set of possible worlds (resp., linear constraints) w.r.t. only the propositions in 𝒫′. Under the independence assumption, the probability P(γ′) of a possible world γ′ in PW[𝒫′] is defined to be P(γ′) = Π pi∈γ′ P(pi) × Π ¬pi∈γ′ (1−P(pi)), where P(pi) is the probability of pi being true (the value returned by the chosen combination function). When γ′=Ø, P(γ′) is defined to be 0. For a set IC of integrity constraints, 𝒫IC is the set of propositions in 𝒫 that appear in IC, while 𝒫\𝒫IC is the set of remaining propositions.
JUSTimp (see Algorithm 2) works as follows. First, a set S of solutions of LC[𝒫IC] is randomly sampled (line 1). Then, for each possible world γ in PW[𝒫IC], its expected probability EP(γ) (w.r.t. S) is computed (lines 2-4). Then, a set of top-k′ possible worlds in PW[𝒫\𝒫IC] is computed under the independence assumption (line 5). Next, for each γ in PW[𝒫IC] and γ′ from the previous step, γ and γ′ are combined into a possible world γ* with probability EP(γ)×P(γ′) (lines 6-9). Finally, a set of top-k possible worlds from those computed at the previous step is returned (lines 10-11).
We still need to show how to compute a set of top-k′ possible worlds under the independence assumption (line 5 of JUSTimp). This problem is addressed by providing a dynamic programming algorithm, called JUSTind (see Algorithm 3), whose worst-case time complexity is O(m·k′), where m is the number of propositions in 𝒫\𝒫IC.
One key idea of Algorithm 3 is to build possible worlds bottom-up. It starts by considering only one proposition and then iteratively considers the remaining ones, one at a time.
The details of JUSTind are presented next. On line 1, m is set to the number of propositions. On lines 2-9, three arrays Top, Lt, and Lf are introduced, each with k′ entries initialized to the empty set. The array Top stores the most likely possible worlds for the propositions considered so far, while Lt and Lf are auxiliary arrays whose role will be explained shortly.
Line 10 introduces the integer n, which is initialized to 1 and then updated on line 17, whose meaning is as follows. The value of n after being updated on line 17 is the number of most likely possible worlds computed so far: n is strictly lower than k′ when i propositions are being considered and their number of possible worlds is 2^i < k′—indeed, in such a case, n = 2^i; when i becomes high enough, e.g., such that 2^i ≥ k′, n becomes equal to k′.
The for loop on lines 11-21 performs m iterations. At any iteration i, the top-n most likely possible worlds are computed on lines 12-20 by considering only the first i propositions (an arbitrary order over the propositions can be used). Specifically, on lines 12-15, each possible world in Top is augmented with proposition pi being true (resp., false) and the resulting world is stored in Lt (resp., Lf)—see line 13 (resp., line 14). Next, Lt and Lf are merged into an array L sorted by descending P value (line 16). Then, the top-n possible worlds in L are copied into Top (lines 18-20).
Finally, after m iterations of the for loop on lines 11-21, the array Top, containing the top-k′ most likely possible worlds (sorted by descending probability) w.r.t. all propositions in 𝒫\𝒫IC, is returned (line 22).
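A compact Python rendering of the JUSTind recurrence follows (a full sort is used in place of the O(k′) sorted merge of Lt and Lf, for brevity; names are illustrative, and the neutral seed probability 1.0 for the empty world is an implementation convenience):

```python
def just_ind(prop_probs, k_prime):
    """prop_probs: dict mapping each independent proposition to P(p), the
    chosen CF's output. Returns the top-k' worlds as (literal-set, prob)."""
    top = [(frozenset(), 1.0)]                    # start from the empty world
    for p, prob in prop_probs.items():            # one proposition at a time
        lt = [(w | {(p, 1)}, q * prob) for w, q in top]        # p true
        lf = [(w | {(p, 0)}, q * (1 - prob)) for w, q in top]  # p false
        top = sorted(lt + lf, key=lambda e: e[1], reverse=True)[:k_prime]
    return top
```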
The following two theorems state the correctness and the worst-case time complexity of Algorithm 3.
Let 𝒫\𝒫IC = {p1, ... , pm}. For 0 ≤ i ≤ m, use the following notation: 𝒫i is the set of the first i propositions in 𝒫\𝒫IC (here an arbitrary order of the propositions is assumed), with 𝒫0 = Ø; Γi is the set of possible worlds over 𝒫i, that is, when only the propositions in 𝒫i are considered—define Γ0 = {Ø}; Tik′ is a set of top-k′ possible worlds of Γi w.r.t. P; and Li is the set obtained by extending every possible world in Ti-1k′ with pi being true and, separately, with pi being false.
It is proven that the following optimal substructure property holds for each 1 ≤ i ≤ m: Tik′ = Top-k′(Li). (1)
Let Si be the result of Top-k′(Li). What must be shown is that Si = Tik′. When Ti-1k′ = Γi-1, the claim is straightforward, since Li = Γi in such a case. Below, consider the case where Ti-1k′ ⊊ Γi-1. To prove Equation 1, it is shown that P(γ) ≥ P(γ′) for every γ ∈ Si and γ′ ∈ Γi\Li (notice that here it suffices to consider Γi\Li rather than Γi\Si because Si contains top-k′ elements of Li). This is proven by showing that P(γ) ≥ P(γ′), where γ (resp., γ′) is a possible world with minimum (resp., maximum) P value among the possible worlds in Si (resp., Γi\Li).
Let γ″ (resp., γ‴) be a possible world in Ti-1k′ (resp., Γi-1\Ti-1k′) with minimum (resp., maximum) P value. Clearly, P(γ″) ≥ P(γ‴), since Ti-1k′ is a top-k′ set of Γi-1. Let pimax = max{P(pi), P(¬pi)}.
It can be easily verified that JUSTind is an implementation of Equation (1) above which eventually returns Tmk′.
Proof. Notice that n ≤ k′ at any time, as n is initially equal to 1 (line 10) and is then updated only on line 17 to min{2×n, k′}.
Line 1 takes Θ(1) time. Lines 2-9 take Θ(k′) time. Line 10 takes Θ(1) time.
Let us now analyze the cost of a single iteration of the for loop on lines 11-21, namely lines 12-20.
Lines 12-15 take Θ(n) time, which is O(k′), since n ≤ k′—here the addition of the elements pi and ¬pi to the possible worlds takes constant time.
The time complexity of line 16 is discussed next. Notice that Top is always sorted by descending P value: initially all its elements are the empty set (see line 6); then, Top is updated only on lines 18-20, where the first n elements of L are copied into Top, and since L is sorted by descending P (see line 16), so is Top. Notice that the elements of Top after the n-th one, if any, are the empty set, and thus their P value is 0 by definition. As a consequence, Lt (resp., Lf) is always sorted by descending P as well: initially, all its elements are the empty set—see line 7 (resp., line 8); then, Lt (resp., Lf) is updated only in the for loop on lines 12-15, where the first n elements of Top are copied into Lt (resp., Lf) with the addition of the same element pi (resp., ¬pi) to each, which preserves the descending order. Since Lt and Lf are sorted, merging them into L on line 16 takes O(k′) time.
Line 17 takes linear time in the size of k′. Lines 18-20 take Θ(n) time, which is O(k′), since n≤k′.
Thus, a single iteration of the for loop on lines 11-21 takes O(k′) time. Since the for loop performs Θ(m) iterations, lines 11-21 takes O(m·k′) time.
Line 22 takes Θ(1) time.
It follows from the analysis above that the worst-case time complexity of Algorithm 3 is O(m·k′).
In this section, the experiments conducted with 24 datasets (19 TV legal cases and 5 real-world court cases) to assess the efficacy of our framework are reported.
JUSTexp, JUSTimp, JUSTind, the CFs, and the integrity constraint checker were implemented using one or more computing languages, such as Python. Experiments were run on a 3.10 GHz Intel Core i9-9960X CPU with 131 GB of RAM, running Ubuntu 18.04.5. The number of samples (e.g., the s input in Algorithms 1 and 2) was set to 20. For JUSTimp, the number k′ of possible worlds considered under the independence assumption was k′=|PW[𝒫IC]|, that is, the same number of possible worlds induced by propositions appearing in the integrity constraints. Min, max, avg, mode, and median were used in the standard CFs as per Definition 9, and their trust-based counterparts were also considered (see Definition 10). In all, 10 CFs were considered.
Datasets. 5 real-world trial cases were considered: Casey Anthony (CA) [https://en.wikipedia.org/wiki/Death_of_Caylee_Anthony] (30.5 hours of video), Ashley McArthur (AMA) [https://heavy.com/news/ashley-mcarthur-today] (14 hours of video), Jacob Cayer (JC) [https://www.greenbaypressgazette.com/story/news/2020/08/13/jacob-cayer-trial-jury-decide-insanity-after-murder-verdict/3361870001/] (22 hours of video), Nathaniel Rowland (NR) [https://apnews.com/article/south-carolina-uber-nathaniel-rowland-samantha-josephson-28626a2e564963fb1fld8a452b016207] (16 hours of video), and Joshua Aide (JA) [https://fox11online.com/news/crime/man-who-killed-ex-girlfriends-father-shot-2-others-sentenced-to-life] (17.5 hours of video). Nineteen additional cases were considered that were taken from the US TV show “Judge Judy” (JJ); these are 10 minutes long on average.
The sources, propositions, and witness functions were manually annotated. Because no judges could be convinced to spend the time necessary to provide the assessment function, an online “deception detection algorithm” was used [Z. Wu, B. Singh, L. S. Davis, and V. S. Subrahmanian, “Deception detection in videos,” in Proc. AAAI, 2018, pp. 1695-1702] that was trained on a real-world courtroom dataset [V. Pérez-Rosas, M. Abouelenien, R. Mihalcea, and M. Burzo, “Deception detection using real-life trial data,” in Proc. ICMI, 2015, p. 59-66]. Of course, this is just a proxy for a judge's assessments, and it is not suggested that deception detection algorithms be used for assessment.
For the CA, AMA, JC, NR, and JA datasets, 7, 3, 7, 5, and 6 integrity constraints were defined, respectively; these were the natural number of ICs for the propositions in each case. The number of ICs for the JJ datasets ranged from 0 to 3, as these cases were short (10 minutes compared to, e.g., 30.5 hours for CA) and had a smaller number of propositions.
Prediction Performance. For each dataset, its video(s) were manually watched and the real ground truth (which is a possible world) was captured based on what the judge in the case decided to be true vs. false. In the following, two metrics are defined to measure prediction performance, denoted PP[i] and PP*[i], whose goal is to assess (in different ways) the quality of the first i possible worlds, for 1 ≤ i ≤ k.
A definition of PP[i] follows. For a given dataset, let γ* be the ground truth possible world and γ1, . . . , γn be the possible worlds that were computed, sorted by descending probability. For 1 ≤ i ≤ n and for each proposition pj ∈ 𝒫, F[i][j] = 1 is defined if there exists a possible world γh with h ≤ i s.t. γ*(pj) = γh(pj), that is, if there is a possible world among the first i ones that agrees with the ground truth on pj's truth value; otherwise, F[i][j] = 0. Then, for each 1 ≤ i ≤ n, the prediction performance of the top-i possible worlds, denoted PP[i], is defined as the average of F[i][j] across all propositions pj. The second metric proposed is more demanding and is based on the following idea. In an “ideal” set of top-k possible worlds, the first one (e.g., the most likely one) is the ground truth, followed by all possible worlds that differ from the ground truth in the truth value of one proposition, then all possible worlds that differ from the ground truth in the truth values of two propositions, and so forth.
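A direct Python transcription of PP[i] (names are illustrative):

```python
def pp(worlds, ground_truth, propositions):
    """worlds: computed possible worlds sorted by descending probability.
    Returns [PP[1], ..., PP[n]]: the fraction of propositions whose
    ground-truth value is matched by at least one of the first i worlds."""
    matched = {p: False for p in propositions}
    scores = []
    for world in worlds:
        for p in propositions:
            if world[p] == ground_truth[p]:
                matched[p] = True
        scores.append(sum(matched.values()) / len(propositions))
    return scores
```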
In general, if m is the number of propositions, then the number of possible worlds that differ from the ground truth on j proposition truth values is the binomial coefficient C(m, j), with 0 ≤ j ≤ m. Thus, there is C(m, 0) = 1 possible world differing from the ground truth on 0 propositions (this is the ground truth itself), there are C(m, 1) = m possible worlds differing from the ground truth on 1 proposition, and so forth. In an ideal set of top-k possible worlds, the i-th possible world (1 ≤ i ≤ k) is the same as the ground truth except for the truth value of xi propositions, where 0 ≤ xi ≤ m is the smallest integer s.t. i ≤ Σh=0..xi C(m, h). Equivalently, the i-th possible world agrees with the ground truth on m−xi proposition truth values; ni is used to denote the value m−xi. Thus, ni must be read as the number of propositions on which the ground truth and the i-th possible world in the ideal result agree.
Now let γ1, . . . , γk be the possible worlds that were computed, sorted by descending probability. The goal is to measure how much this (sorted) set differs from the ideal result discussed above by assigning a score to each possible world γi measuring how much γi differs from a possible world in the same position in an ideal result; the average score across all possible worlds is then computed. For a possible world γi, let nγi be the number of propositions on which γi and the ground truth agree. Then, score(γi) = min{1, nγi/ni}, and PP*[i] is defined as the average of score(γ1), . . . , score(γi).
JUSTexp vs. JUSTimp. While JUSTimp terminated in all 24 cases, JUSTexp terminated within half an hour only on the JA, JJ6, JJ11, JJ14, JJ16, and JJ19 datasets. Since results did not vary substantially across different CF aggregate operators, a representative case, cfavg, is discussed in the following (standard vs. trust-based CFs are discussed later on). Running times (in seconds) of JUSTexp and JUSTimp are reported in Table II.
Also, recall that JUSTimp was able to provide results on the remaining 18 datasets too (requiring little time) while JUSTexp was not. It was concluded that JUSTimp achieves a good trade-off between quality of the results and running time.
Standard vs. trust-based CFs. Standard CFs were compared against their trust-based counterparts using JUSTimp (as it was able to terminate over all datasets). The running times of a CF and its trust-based counterpart were very close in all cases; thus, the choice of CF does not significantly affect run time. As such, the quality of the results is the focus.
To summarize, our evaluation shows that JUSTimp provides results of very high quality and has low running times, and that standard CFs are better than trust-based ones, with no significant difference across the different aggregates.
There is much work on legal systems using logical methods [R. A. Kowalski, “Legislation as logic programs,” in Informatics and the Foundations of Legal Reasoning, 1995, pp. 325-356], [R. Kowalski and A. Datoo, “Logical english meets legal english for swaps and derivatives,” Artificial Intelligence and Law, August 2021]. [J. van Benthem, D. Fernández-Duque, and E. Pacuit, “Evidence logic: A new look at neighborhood structures,” in Proc. Advances in Modal Logic, 2012, pp. 97-118] introduced evidence logic as a way to model epistemic agents faced with possibly contradictory evidence from different sources, where each agent has a collection of possible worlds, one of which the agent believes to be the actual world. [F. Liu and E. Lorini, “Reasoning about belief, evidence and trust in a multi-agent setting,” in Proc. PRIMA, vol. 10621, 2017, pp. 71-89] introduced a logic where an agent accumulates evidence in support of a given fact from other agents in the society, and the body of evidence in support of that fact can become a reason to believe it. [U. J. Schild, “Criminal sentencing and intelligent decision support,” Artificial Intelligence and Law, vol. 6, no. 2, pp. 151-202, 1998] proposed a case-based sentencing support system. [P. Leith, “The judge and the computer: How best ‘decision support’?” Artificial Intelligence and Law, vol. 6, no. 2, pp. 289-309, 1998] created an automated decision support system for probation officers. [A. Karamlou, K. Cyras, and F. Toni, “Deciding the winner of a debate using bipolar argumentation,” in Proc. AAMAS, 2019, pp. 2366-2368] developed an argumentation framework to reason about debate outcomes. [T. K. Wah and M. Muniandy, “Courtroom decision support system using case based reasoning,” Procedia—Social and Behavioral Sciences, vol. 129, pp. 489-495, 2014] designed a case-based decision support system allowing plaintiff and defendant to solve their legal case without involvement of an actual judge. [I. Mokanov, D. Shane, and B. Cerat, “Facts2law: Using deep learning to provide a legal qualification to a set of facts,” in Proc. ICAIL, 2019, p. 268-269] developed Facts2Law, a model for matching a set of facts to relevant legal sources.
Some of the work in the area deals with situations where witnesses remember the same events differently [R. Anderson, “The rashomon effect and communication,” Canadian Journal of Communication, vol. 41, no. 2, 2016], [W. D. Roth and J. D. Mehta, “The rashomon effect: Combining positivist and interpretivist approaches in the analysis of contested events,” Sociological Methods & Research, vol. 31, no. 2, pp. 131-173, 2002]. [A. Josang and V. A. Bondi, “Legal reasoning with subjective logic,” Artificial Intelligence and Law, vol. 8, no. 4, pp. 289-315, 2000] developed a logic for dealing with uncertainties in courtroom settings. [L. van Leeuwen and B. Verheij, “A comparison of two hybrid methods for analyzing evidential reasoning,” Frontiers in Artificial Intelligence and Applications, vol. 322, no. Legal Knowledge and Information Systems, p. 53-62, 2019] analyzes and compares Bayesian networks with embedded scenarios and formal analysis of argument validity. There is also much work on detecting deception in trials [T. Fornaciari and M. Poesio, “Automatic deception detection in italian court cases,” Artificial Intelligence and Law, vol. 21, no. 3, pp. 303-340, 2013], [V. Pérez-Rosas, M. Abouelenien, R. Mihalcea, and M. Burzo, “Deception detection using real-life trial data,” in Proc. ICMI, 2015, p. 59-66]. A few efforts try to predict court decisions [N. Bagherian-Marandi, M. Ravanshadnia, and M.-R. Akbarzadeh-T, “Two-layered fuzzy logic-based model for predicting court decisions in construction contract disputes,” Artificial Intelligence and Law, vol. 29, no. 4, pp. 453-484, 2021], [M. Medvedeva, M. Vols, and M. Wieling, “Using machine learning to predict decisions of the european court of human rights,” Artificial Intelligence and Law, vol. 28, no. 2, pp. 237-266, 2020], and [A. Deeks, “The judicial demand for explainable artificial intelligence,” Columbia Law Review, vol. 119, no. 7, pp. 1829-1850, 2019]. [G. Van Opdorp, R. Walker, J. Schrickx, C. Groendijk, and P. Van den Berg, “Networks at work: a connectionist approach to non-deductive legal reasoning,” in Proceedings of the 3rd International Conference on Artificial Intelligence and Law, 1991, pp. 278-287] provides a connectionist approach to resolving semantically indeterminate terms like “suitable employment” in a legal context; such an approach might complement JUST in encoding language via propositions. [D. E. Rose, A symbolic and connectionist approach to legal information retrieval. Psychology Press, 2013] combines connectionist and symbolic reasoning for legal information retrieval; such methods may be useful in automatically extracting semantic knowledge (e.g., propositions) from legal transcripts.
Our work builds upon ideas on possible worlds from probabilistic/possibilistic logic and inconsistency and uncertainty management, which all address scenarios where uncertainty about the real state of the world arises [L. D. Raedt, A. Kimmig, and H. Toivonen, “Problog: A probabilistic prolog and its application in link discovery,” in Proc. IJCAI, 2007, pp. 2462-2467], [R. T. Ng and V. S. Subrahmanian, “Probabilistic logic programming,” Information and Computation, vol. 101, no. 2, pp. 150-201, 1992], [D. Dubois and H. Prade, “Possibilistic logic—an overview,” in Computational Logic, ser. Handbook of the History of Logic, J. H. Siekmann, Ed., 2014, vol. 9, pp. 283-342], [J. Grant and A. Hunter, “Measuring inconsistency in knowledgebases,” Journal of Intelligent Information Systems, vol. 27, no. 2, pp. 159-184, 2006], [T. Lukasiewicz, “Probabilistic logic programming with conditional constraints,” ACM Transactions on Computational Logic, vol. 2, no. 3, pp. 289-339, 2001]. We are not aware of work on efficient algorithms to find the top-k most probable worlds in real-world situations, especially when there are integrity constraints and where some propositions are independent of others while some dependencies still exist between other propositions.
Changing centuries of potential bias cannot be done via one paper, so this paper represents a small start in this direction. Many questions remain about how JUST can be deployed. (Q1) Won't a biased judge just assign biased probabilities to witness statements? There are at least two approaches to address this. One is to set up a framework where the judge's assessments are part of the record. In this case, lawyers for the aggrieved party can appeal based on the judge's assessments—and additionally, a judge's record of disbelieving statements made by minorities may now be validated by data. Another is to have the jury assign the probabilities and take, say, the median probability assigned by the jury. (Q2) Isn't it challenging to encode natural language statements in logic? We did not find this hard to do when we coded our data. If a court reporting company writes down the propositions (along with the sentences from which the propositions are derived), a judge or jury can modify the statements to suit their understanding. (Q3) What about inconsistencies between statements made by a witness? The integrity constraints proposed within JUST are present exactly for this purpose. Recognize that a given set of integrity constraints may not flag some inconsistencies; in this case, the judge/jury can add more integrity constraints.
The JUST framework is presented, within which witness statements and the confidence in them are used to calculate the most likely scenarios in judicial cases.
Note that JUST does not use machine learning: rather, it computes the k most probable worlds based on the judge's assessment (or the jury's assessment, or a combination of the judge's and jury's assessments) of the veracity of witness statements and the combination functions selected by the judge. This is an extremely challenging task because the set of all possible worlds is exponential in the number of propositions. Taming this complexity is a large advancement over present methods (Step 3 of the JUST architecture).
JUST has the potential to reduce judicial bias but changing centuries of bias is challenging, so JUST is a first step in this direction. JUST uses a judge's own assessment to compute the top-k most probable worlds. The judge will thus be aware of the mathematically most likely scenarios consistent with her own assessments. At the very least, the judge will see possible worlds that conflict with her/his views, and their final decision will therefore be better informed. If the judge's assessments are part of the formal record of the case, a consistent bias exhibited by a judge is more likely to be uncovered by the data and potentially lead to successful appeals.
The processor platform 1400 of the illustrated example includes a processor 1406. The processor 1406 of the illustrated example is hardware. For example, the processor 1406 can be implemented by integrated circuits, logic circuits, microprocessors, or controllers from any desired family or manufacturer, and may be distributed over one or more computing devices.
The processor 1406 of the illustrated example includes a local memory 1408 (e.g., a cache memory device).
The processor platform 1400 of the illustrated example also includes an interface circuit 1414. The interface circuit 1414 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB), and/or a PCI express interface.
In the illustrated example, one or more input devices 1412 are connected to the interface circuit 1414. The input device(s) 1412 permit(s) a user to enter data and commands into the processor 1406. The input device(s) can be implemented by, for example, a sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, and/or a voice recognition system.
One or more output devices 1416 are also connected to the interface circuit 1414 of the illustrated example. The output devices 1416 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display, a cathode ray tube display (CRT), a touchscreen, a tactile output device, and/or speakers). The interface circuit 1414 of the illustrated example, thus, typically includes a graphics driver card, a graphics driver chip, or a graphics driver processor.
The interface circuit 1414 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem and/or network interface card to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 1424 (e.g., an Ethernet connection, a digital subscriber line (DSL), a telephone line, coaxial cable, a cellular telephone system, etc.).
The processor platform 1400 of the illustrated example also includes one or more mass storage devices 1410 for storing software and/or data. Examples of such mass storage devices 1410 include floppy disk drives, hard drive disks, compact disk drives, Blu-ray disk drives, RAID systems, and digital versatile disk (DVD) drives.
The coded instructions 1420 of the illustrated example may be stored in the mass storage device 1410 and/or on a removable tangible computer-readable storage medium.
From the foregoing, it will be appreciated that the above-disclosed methods, apparatus, and articles of manufacture improve the functioning of a computer and/or computing device and improve the analysis and computation of probabilities associated with the veracity of propositions in a legal matter.
Although certain example methods, apparatus and articles of manufacture have been described herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all methods, apparatus and articles of manufacture fairly falling within the scope of the claims of this patent.
The foregoing merely illustrates the principles of the disclosure. Various modifications and alterations to the described embodiments will be apparent to those skilled in the art in view of the teachings herein. It will thus be appreciated that those skilled in the art will be able to devise numerous systems, arrangements, and methods which, although not explicitly shown or described herein, embody the principles of the disclosure and are thus within the spirit and scope of the present disclosure. From the above description and drawings, it will be understood by those of ordinary skill in the art that the particular embodiments shown and described are for purposes of illustrations only and are not intended to limit the scope of the present disclosure. References to details of particular embodiments are not intended to limit the scope of the disclosure.
This is a utility application claiming priority to and incorporating by reference provisional application entitled “Judicial Support Tool Computing System”, Ser. No. 63/541,555 filed Sep. 29, 2023.