Aspects of the disclosure relate to detecting fraud. Specifically, aspects of the disclosure relate to using machine learning to detect fraud in publicly available forms.
In today's fast-moving world it may be difficult to identify fraudulent activities, such as fraudulent activities in an organization. Fraudulent activities may even take place in public organizations such as public companies and still evade the public eye. Even when there is a suspicion of fraud, it may be difficult to raise an allegation of fraud due to potentially weak evidence. A ramification of missed fraudulent activity may be an increase in front companies, shell companies, and money laundering, among other problems. Another outcome of unresolved fraudulent activity may be an erosion of confidence in the economy that hosts the suspected culprits.
National organizations were established to serve as safeguards against fraud among public organizations. For example, in the United States, the Securities and Exchange Commission (SEC) has been established. The SEC requires public disclosures by organizations such as public companies. However, even with the presence of organizations such as the SEC, detecting fraud even in public companies continues to prove challenging. Part of the challenge may include identifying fraud that may be hidden in large volumes of publicly available documents submitted by companies to the SEC.
What is needed is an apparatus and method for identifying potential fraud among organizations such as public companies.
Provided may be an apparatus and method for identifying potential fraud among organizations such as public companies. For example, provided may be an apparatus and method for sorting through a large volume of publicly available information about an organization such as a public company. The publicly available information may be accessible on a publicly available electronic portal.
Fraud detection apparatus and methods may include automation. Automation may save time and resources. Automation may eliminate or reduce manual processing of data. Automation may allow for focused manual processing of data. For example, manual processing may be used as a quality check for the automated process. Manual processing may be used for other parts of the fraud detection process. A computer processor running a machine learning model may automate the fraud detection apparatus and method.
Provided may be apparatus, methods, and systems for alerting an organization about activity that may be fraudulent. Methods may include using a computer processor to collect forms submitted to an electronic portal of the Securities and Exchange Commission (SEC), where the forms may be related to an organization. The organization may have submitted the forms. Another party may have submitted the forms. The one or more forms submitted to the electronic portal of the SEC may include SEC Form 10-K, SEC Form 8-K, SEC Form 10-Q, SEC Form 4, and SEC Form SD.
Methods may include the computer processor collecting the forms every 45 days or less. The computer processor may collect the forms every 15 days or less. The computer processor may collect the forms every 8 days or less. The computer processor may collect the forms every 36 hours or less. The computer processor may collect the forms in real-time as they are provided on the electronic portal. Real-time may be 2 hours or less. Real-time may be 1 hour or less. Real-time may be 30 minutes or less. Real-time may be 15 minutes or less. Real-time may be 5 minutes or less.
Methods may include the computer processor cleaning and preprocessing data found in the forms to produce cleaned and preprocessed data.
Methods may include the computer processor running machine learning models to extract sets of features from the cleaned and preprocessed data.
For example, the sets of features may include a set of features relating to liquid, solvency, and profitability ratio classification. The sets of features may include a set of features relating to disclosure classification. The sets of features may include a set of features relating to sentiment analysis. The sets of features may include a set of features relating to anomaly detection classification. The sets of features may include a set of features relating to ownership analysis classification. The sets of features may include a set of features relating to environmental, social, and governance (ESG) disclosure classification.
Methods may include the computer processor running machine learning models to determine if a threshold has been exceeded indicating a risk of fraud.
The machine learning model may include a liquid, solvency, and profitability ratio classification machine learning model. The machine learning model may include a disclosure classification machine learning model. The machine learning model may include a sentiment analysis machine learning model. The machine learning model may include an anomaly detection classification machine learning model. The machine learning model may include an ownership analysis classification machine learning model. The machine learning model may include an ESG disclosure classification machine learning model.
Determining if a threshold has been exceeded may indicate a risk of fraud. Exceeding a threshold when running a liquid, solvency, and profitability ratio classification machine learning model may indicate a detection of one or more unusual liquid, solvency, and profitability ratios. Exceeding a threshold when running a disclosure classification machine learning model may indicate a detection of one or more ambiguous disclosures. Exceeding a threshold when running a sentiment analysis machine learning model may indicate a detection of market manipulation. Exceeding a threshold when running an anomaly detection classification machine learning model may indicate a detection of one or more anomalies. Exceeding a threshold when running an ownership analysis classification machine learning model may indicate a detection of one or more suspicious owners. Exceeding a threshold when running an ESG disclosure classification machine learning model may indicate a detection of one or more fraudulent disclosures.
Methods may include a computer processor notifying an administrator in an organization when one or more thresholds have been exceeded. Methods may include a computer processor notifying an administrator in an organization when two or more thresholds have been exceeded. Methods may include a computer processor notifying an administrator in an organization when three or more thresholds have been exceeded. Methods may include a computer processor notifying an administrator in an organization when four or more thresholds have been exceeded. Methods may include a computer processor notifying an administrator in an organization when five or more thresholds have been exceeded. Methods may include a computer processor notifying an administrator in an organization when six or more thresholds have been exceeded. The organization may be the same organization as the organization to which the forms are related. The organization may be an organization different from the organization to which the forms are related.
Methods may include a computer processor informing the administrator in the organization of an identity of the threshold which has been exceeded.
Methods may include where the organization providing the forms and the organization determining if there is a risk of fraud are different organizations. Methods may include where the organization providing the forms and the organization determining if there is a risk of fraud are the same organization.
Methods may include collecting the forms from the electronic portal every 36 hours or less.
Methods may include applying a time series analysis to a machine learning model. Methods may include notifying an administrator in the second organization, using the computer processor, when an unusual temporal pattern has been detected.
Methods may include applying a clustering classification to one or more machine learning models. Methods may include the computer processor notifying an administrator in the organization when detecting an anomalous cluster.
The objects and advantages of the invention will be apparent upon consideration of the following detailed description, taken in conjunction with the accompanying drawings, in which like reference characters refer to like parts throughout, and in which:
Apparatus, methods, and systems for alerting an organization about activity that may be fraudulent are provided. Methods may include using a computer processor to collect forms submitted to an electronic portal. The forms may be publicly available. The forms may be related to an organization. The organization may have submitted the forms. Another party may have submitted the forms.
Methods may include the computer processor collecting the forms every 36 hours or less. The computer processor may collect the forms in real-time as they are provided on the electronic portal. Real-time may be 2 hours or less. Real-time may be 1 hour or less. Real-time may be 30 minutes or less. Real-time may be 15 minutes or less. Real-time may be 5 minutes or less.
Methods may include the computer processor cleaning and preprocessing the forms to produce cleaned and preprocessed data.
Methods may include the computer processor running machine learning models to extract sets of features from the cleaned and preprocessed data.
For example, the sets of features may include a set of features relating to liquid, solvency, and profitability ratio classification. The sets of features may include a set of features relating to disclosure classification. The sets of features may include a set of features relating to sentiment analysis. The sets of features may include a set of features relating to anomaly detection classification. The sets of features may include a set of features relating to ownership analysis classification. The sets of features may include a set of features relating to ESG disclosure classification.
Methods may include the computer processor running machine learning models to determine if a threshold has been exceeded indicating a risk of fraud.
The machine learning model may include a liquid, solvency, and profitability ratio classification machine learning model. The machine learning model may include a disclosure classification machine learning model. The machine learning model may include a sentiment analysis machine learning model. The machine learning model may include an anomaly detection classification machine learning model. The machine learning model may include an ownership analysis classification machine learning model. The machine learning model may include an ESG disclosure classification machine learning model.
Determining if a threshold has been exceeded may indicate a risk of fraud. Exceeding a threshold when running a liquid, solvency, and profitability ratio classification machine learning model may indicate a detection of one or more unusual liquid, solvency, and profitability ratios. Exceeding a threshold when running a disclosure classification machine learning model may indicate a detection of one or more ambiguous disclosures. Exceeding a threshold when running a sentiment analysis machine learning model may indicate a detection of market manipulation. Exceeding a threshold when running an anomaly detection classification machine learning model may indicate a detection of one or more anomalies, for example, an organization with no employees. Exceeding a threshold when running an ownership analysis classification machine learning model may indicate a detection of one or more suspicious owners. Exceeding a threshold when running an ESG disclosure classification machine learning model may indicate a detection of one or more fraudulent disclosures.
A sentiment analysis machine learning model may use a library of terms that indicate negative, neutral, or positive sentiment. An example of a dictionary containing a library of terms related to sentiment is the Loughran-McDonald Master Dictionary with Sentiment Word Lists. Manipulation of sentiment may include the use of sentiment-indicating terms that convey a sentiment different from the sentiment that the disclosed forms indicate should be conveyed.
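The following Python sketch is illustrative only and shows one way dictionary-based sentiment scoring could be performed; the short word lists and the 0.05 threshold are placeholder assumptions, not the actual Loughran-McDonald lists or values specified by this disclosure.

```python
# Minimal sketch of dictionary-based sentiment scoring for a filing's text.
# The word lists and the 0.05 threshold are illustrative placeholders.
import re

NEGATIVE_TERMS = {"loss", "impairment", "litigation", "decline", "restatement"}
POSITIVE_TERMS = {"growth", "improvement", "strong", "record", "gain"}

def sentiment_score(text: str) -> float:
    """Return (positive - negative) term count divided by total tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    pos = sum(token in POSITIVE_TERMS for token in tokens)
    neg = sum(token in NEGATIVE_TERMS for token in tokens)
    return (pos - neg) / len(tokens)

def sentiment_flag(text: str, expected_sign: int, threshold: float = 0.05) -> bool:
    """Flag a filing whose measured sentiment opposes the sentiment its
    disclosures indicate should be conveyed (expected_sign is +1 or -1)."""
    return (sentiment_score(text) * expected_sign) < -threshold
```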
The forms may include SEC Form 10-K. The forms may include SEC Form 8-K. The forms may include SEC Form 10-Q. The forms may include SEC Form 4. The forms may include SEC Form SD. The electronic portal may include a portal of the Securities and Exchange Commission (SEC). The electronic portal may include the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database.
Methods may include a computer processor notifying an administrator in an organization when one or more thresholds have been exceeded. Methods may include a computer processor notifying an administrator in an organization when two or more thresholds have been exceeded. Methods may include a computer processor notifying an administrator in an organization when three or more thresholds have been exceeded. Methods may include a computer processor notifying an administrator in an organization when four or more thresholds have been exceeded. Methods may include a computer processor notifying an administrator in an organization when five or more thresholds have been exceeded. Methods may include a computer processor notifying an administrator in an organization when six or more thresholds have been exceeded. The organization may be the same organization as the organization to which the forms are related. The organization may be an organization different from the organization to which the forms are related.
Methods may include a computer processor informing the administrator in the organization of an identity of the threshold which has been exceeded.
Methods may include where the organization providing the forms and the organization determining if there is a risk of fraud are different organizations. Methods may include where the organization providing the forms and the organization determining if there is a risk of fraud are the same organization.
Methods may include applying a time series analysis to a machine learning model. Methods may include notifying an administrator in the second organization, using the computer processor, when an unusual temporal pattern has been detected.
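The following Python sketch is illustrative only and shows one way a time series analysis could flag an unusual temporal pattern; the quarterly values and the z-score cutoff are placeholder assumptions.

```python
# Minimal sketch of a time series check for an unusual temporal pattern in a
# quarterly metric drawn from the cleaned filings. Values and the 3.0 cutoff
# are assumed example inputs, not values from the disclosure.
import pandas as pd

quarterly_revenue = pd.Series(
    [100.0, 104.0, 101.0, 107.0, 103.0, 240.0],
    index=pd.period_range("2023Q1", periods=6, freq="Q"),
)

# Compare each quarter against the trailing four quarters that precede it.
trailing_mean = quarterly_revenue.rolling(window=4).mean().shift(1)
trailing_std = quarterly_revenue.rolling(window=4).std().shift(1)
z_scores = (quarterly_revenue - trailing_mean) / trailing_std

# Periods deviating strongly from the trailing window may prompt a notification.
unusual_periods = z_scores[z_scores.abs() > 3.0]
print(unusual_periods)
```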
Methods may include applying a clustering classification to one or more machine learning models. Methods may include the computer processor notifying an administrator in the organization when detecting an anomalous cluster.
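The following Python sketch is illustrative only and shows one way a clustering classification could surface an anomalous cluster; the feature rows, number of clusters, and small-cluster cutoff are placeholder assumptions.

```python
# Minimal sketch of a clustering check over extracted features. A very small
# cluster is treated here as anomalous and may prompt a notification.
import numpy as np
from sklearn.cluster import KMeans

features = np.array([
    [0.9, 120, 1], [1.1, 115, 1], [1.0, 130, 1],   # typical filers
    [1.2, 118, 1], [0.95, 125, 1],
    [8.5, 0, 0],                                    # outlying filer (e.g., no employees)
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
labels = kmeans.labels_

counts = np.bincount(labels)
anomalous_clusters = np.where(counts <= 1)[0]
flagged_rows = np.where(np.isin(labels, anomalous_clusters))[0]
print(flagged_rows)
```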
Apparatus and methods described herein are illustrative. Apparatus and methods in accordance with this disclosure will now be described in connection with the figures, which form a part hereof. The figures show illustrative features of apparatus and method steps in accordance with the principles of this disclosure. It is to be understood that other embodiments may be utilized, and that structural, functional, and procedural modifications may be made without departing from the scope and spirit of the present disclosure.
The steps of methods may be performed in an order other than the order shown or described herein. Embodiments may omit steps shown or described in connection with illustrative methods. Embodiments may include steps that are neither shown nor described in connection with illustrative methods.
Illustrative method steps may be combined. For example, an illustrative method may include steps shown in connection with another illustrative method.
Apparatus may omit features shown or described in connection with illustrative apparatus. Embodiments may include features that are neither shown nor described in connection with the illustrative apparatus. Features of illustrative apparatus may be combined. For example, an illustrative embodiment may include features shown in connection with another illustrative embodiment.
Computer 101 may have a processor 103 for controlling the operation of the device and its associated components, and may include RAM 105, ROM 107, input/output (“I/O”) 109, and a non-transitory or non-volatile memory 115. Machine-readable memory may be configured to store information in machine-readable data structures. Processor 103 may also execute all software running on the computer. Other components commonly used for computers, such as EEPROM or Flash memory or any other suitable components, may also be part of the computer 101.
Memory 115 may be comprised of any suitable permanent storage technology, e.g., a hard drive. Memory 115 may store software including the operating system 117 and application program(s) 119 along with any data 111 needed for the operation of the system 100. Memory 115 may also store videos, text, and/or audio assistance files. The data stored in memory 115 may also be stored in cache memory, or any other suitable memory.
I/O module 109 may include connectivity to a microphone, keyboard, touch screen, mouse, and/or stylus through which input may be provided into computer 101. The input may include input relating to cursor movement. The input/output module may also include one or more speakers for providing audio output and a video display device for providing textual, audio, audiovisual, and/or graphical output. The input and output may be related to computer application functionality.
System 100 may be connected to other systems via a local area network (LAN) interface 113. System 100 may operate in a networked environment supporting connections to one or more remote computers, such as terminals 141 and 151. Terminals 141 and 151 may be personal computers or servers that include many or all the elements described above relative to system 100. The network connections depicted in
It will be appreciated that the network connections shown are illustrative and other means of establishing a communications link between computers may be used. The existence of various well-known protocols such as TCP/IP, Ethernet, FTP, HTTP, and the like is presumed, and the system can be operated in a client-server configuration to permit retrieval of data from a web-based server or application programming interface (API). Web-based, for the purposes of this application, is to be understood to include a cloud-based system. The web-based server may transmit data to any other suitable computer system. The web-based server may also send computer-readable instructions, together with the data, to any suitable computer system. The computer-readable instructions may include instructions to store the data in cache memory, the hard drive, secondary memory, or any other suitable memory.
Additionally, application program(s) 119, which may be used by computer 101, may include computer executable instructions for invoking functionality related to communication, such as e-mail, Short Message Service (SMS), and voice input and speech recognition applications. Application program(s) 119 (which may be alternatively referred to herein as “plugins,” “applications,” or “apps”) may include computer executable instructions for invoking functionality related to performing various tasks. Application program(s) 119 may utilize one or more algorithms that process received executable instructions, perform power management routines or other suitable tasks.
Application program(s) 119 may include computer executable instructions (alternatively referred to as “programs”). The computer executable instructions may be embodied in hardware or firmware (not shown). Computer 101 may execute the instructions embodied by the application program(s) 119 to perform various functions.
Application program(s) 119 may utilize the computer-executable instructions executed by a processor. Generally, programs include routines, programs, objects, components, data structures, etc., that perform tasks or implement abstract data types. A computing system may be operational with distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, a program may be in both local and remote computer storage media including memory storage devices. Computing systems may rely on a network of remote servers hosted on the Internet to store, manage, and process data (e.g., “cloud computing” and/or “fog computing”).
Any information described above in connection with data 111, and any other suitable information, may be stored in memory 115.
The invention may be described in the context of computer-executable instructions, such as application(s) 119, being executed by a computer. Generally, programs include routines, programs, objects, components, data structures, etc., that perform tasks or implement particular data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, programs may be in both local and remote computer storage media including memory storage devices. It should be noted that such programs may be considered for the purposes of this application as engines with respect to the performance of the tasks to which the programs are assigned.
Computer 101 and/or terminals 141 and 151 may also include various other components, such as a battery, speaker, and/or antennas (not shown). Components of computer system 101 may be linked by a system bus, wirelessly or by other suitable interconnections. Components of computer system 101 may be present on one or more circuit boards. In some embodiments, the components may be integrated into a single chip. The chip may be silicon-based.
Terminal 141 and/or terminal 151 may be portable devices such as a laptop, cell phone, tablet, smartphone, or any other computing system for receiving, storing, transmitting and/or displaying relevant information. Terminal 141 and/or terminal 151 may be one or more user devices. Terminals 141 and 151 may be identical to system 100 or different. The differences may be related to hardware components and/or software components.
The invention may be operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to, personal computers, server computers, hand-held or laptop devices, tablets, mobile phones, smart phones and/or other personal digital assistants (“PDAs”), multiprocessor systems, microprocessor-based systems, cloud-based systems, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
Apparatus 200 may include one or more of the following components: I/O circuitry 204, which may include a transmitter device and a receiver device and may interface with fiber optic cable, coaxial cable, telephone lines, wireless devices, PHY layer hardware, a keypad/display control device or any other suitable media or devices; peripheral devices 206, which may include counter timers, real-time timers, power-on reset generators or any other suitable peripheral devices; logical processing device 208, which may compute data structural information and structural parameters of the data; and machine-readable memory 210.
Machine-readable memory 210 may be configured to store in machine-readable data structures: machine executable instructions, (which may be alternatively referred to herein as “computer instructions” or “computer code”), applications such as applications 119, signals, and/or any other suitable information or data structures.
Components 202, 204, 206, 208 and 210 may be coupled together by a system bus or other interconnections 212 and may be present on one or more circuit boards such as circuit board 220. In some embodiments, the components may be integrated into a single chip. The chip may be silicon-based.
Computer processor 306 may train fraud detection engine 310. Computer processor 306 may use the forms to train fraud detection engine 310. Computer processor 306 may store the forms at forms storage 312. Computer processor 306 may clean and preprocess the forms at model data 314. Model data 314 may include one set of cleaned and preprocessed data. Model data 314 may include multiple sets of cleaned and preprocessed data. For example, sets of cleaned and preprocessed model data 314 may include CP1t, CP2t, CP3t, through CPnt, referring to a first, second, third, and “n” set of cleaned and preprocessed model data. Examples of multiple sets of cleaned and preprocessed data may be found in Table 1.
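The following Python sketch is illustrative only and shows one way a filing could be cleaned and preprocessed into a set such as CP1t; the tag-stripping and normalization rules are placeholder assumptions.

```python
# Minimal sketch of cleaning and preprocessing a filing's raw HTML/text into a
# cleaned string suitable for feature extraction.
import html
import re

def clean_filing(raw: str) -> str:
    """Strip markup, decode entities, normalize whitespace, and lowercase."""
    text = re.sub(r"<[^>]+>", " ", raw)           # remove HTML/XBRL tags
    text = html.unescape(text)                     # decode &amp;, &#160;, etc.
    text = re.sub(r"[^a-z0-9.,;:%$()\-\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()

# Example: one cleaned-and-preprocessed set (e.g., CP1t) built from raw filings.
raw_forms = ["<p>Total revenue was $1,200&#160;million.</p>"]
cp1t = [clean_filing(raw) for raw in raw_forms]
print(cp1t)
```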
Computer processor 306 may extract features from model data 314 to obtain feature extractions 316. Computer processor 306 may run a machine learning (ML) model to extract features from model data 314 to obtain feature extractions 316. For example, feature extractions 316 may include FX1t, FX2t, FX3t, through FXnt, referring to a first, second, third, and “n” set of feature extractions.
Feature extractions 316 may be used to train ML model 318. Fraud detection engine 310 may include fraud detection ML model 318. Extracting features may identify the most discriminating characteristics in the cleaned and processed forms, which a machine learning algorithm can more easily utilize.
Examples of feature extractions may be found in Table 2.
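The following Python sketch is illustrative only and shows one way a feature extraction such as FX1t could be produced from cleaned filing text; the sample documents and the feature cap are placeholder assumptions.

```python
# Minimal sketch of one feature-extraction step: turning cleaned filing text
# into numeric features that a classifier can use.
from sklearn.feature_extraction.text import TfidfVectorizer

cleaned_filings = [
    "total revenue was 1,200 million for the period",
    "the registrant has no employees and no physical office",
]

vectorizer = TfidfVectorizer(max_features=1000, ngram_range=(1, 2))
fx1t = vectorizer.fit_transform(cleaned_filings)   # sparse feature matrix
print(fx1t.shape, len(vectorizer.get_feature_names_out()))
```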
ML model 318 may include training a comprehensive ML model with various subunits performing specific tasks. ML model 318 may include training separate ML models with each ML model to perform a specific task. For example, ML model 318 may include ML1t, ML2t, ML3t, through MLnt, referring to a first, second, third, and “n” set of machine learning models. A computer processor may train ML model 318 to return a value that may be used to determine the presence of a fraud risk. Feature extractions 316 may be used to train ML model 318. For example, if a threshold set by the model is exceeded, the model may indicate the presence of a fraud risk. Examples of ML models and what the ML models may indicate regarding the risk of fraud may be found in Table 3.
Types of ML models may include support vector machines. Types of ML models may include logistic regression. Types of ML models may include random forest. Types of ML models may include decision tree. Types of ML models may include classification models. Types of ML models may include clustering models. Types of ML models may include natural language processing models.
Multiple ML models may be run on a data set, for example to identify sets of features. Multiple ML models may be run on a data set, for example to predict when a threshold has been exceeded. The best ML model may be chosen from the different models tried. The best ML model may mean the ML model that identifies sets of features with the most clarity of any ML models tried. The best ML model may mean the ML model that identifies when a threshold has been exceeded, and thereby predicts potential fraud, with the highest accuracy of any ML models tried.
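The following Python sketch is illustrative only and shows one way several candidate model types could be tried and the best one chosen by cross-validated accuracy; the synthetic data and labels are placeholder assumptions for filings tagged as fraud or not fraud during training.

```python
# Minimal sketch of trying several candidate model types and keeping the one
# with the highest cross-validated accuracy.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)     # placeholder labels

candidates = {
    "svm": SVC(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best_name = max(scores, key=scores.get)
print(best_name, scores[best_name])
```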
Training ML model 318 may include training a liquid, solvency, and profitability ratio classification ML model to indicate an unusual liquid, solvency, and/or profitability ratio when an output value from the model exceeds a threshold. An unusual liquid, solvency, and/or profitability ratio may indicate fraud. For example, fraud may include misguiding the public to make incorrect conclusions about an organization's ability to operate well going into the future.
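The following Python sketch is illustrative only and shows one way liquid, solvency, and profitability ratios could be computed from parsed filing figures and flagged as unusual; the figures and flag ranges are placeholder assumptions.

```python
# Minimal sketch of computing liquid, solvency, and profitability ratios from
# figures parsed out of a filing and flagging values outside expected ranges.
def compute_ratios(fin: dict) -> dict:
    return {
        "liquid_ratio": (fin["current_assets"] - fin["inventory"]) / fin["current_liabilities"],
        "solvency_ratio": fin["total_debt"] / fin["total_equity"],
        "profit_margin": fin["net_income"] / fin["revenue"],
    }

# Assumed example ranges outside of which a ratio is treated as unusual.
EXPECTED_RANGES = {
    "liquid_ratio": (0.5, 3.0),
    "solvency_ratio": (0.0, 2.5),
    "profit_margin": (-0.2, 0.4),
}

filing_figures = {
    "current_assets": 50.0, "inventory": 5.0, "current_liabilities": 4.0,
    "total_debt": 90.0, "total_equity": 10.0,
    "net_income": 30.0, "revenue": 40.0,
}

ratios = compute_ratios(filing_figures)
unusual = {name: value for name, value in ratios.items()
           if not EXPECTED_RANGES[name][0] <= value <= EXPECTED_RANGES[name][1]}
print(unusual)
```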
Training ML model 318 may include training a disclosure classification ML model to indicate an ambiguous disclosure when an output value from the model exceeds a threshold. An ambiguous disclosure may indicate fraud. For example, fraud may include performing fraudulent activity while describing those activities in vague, ambiguous terms and thereby evading scrutiny necessary to identify the fraud.
Training ML model 318 may include training a sentiment analysis ML model to indicate an attempt to manipulate a market when an output value from the model exceeds a threshold. An attempt to manipulate a market may indicate fraud. For example, fraud may include manipulating a market to shift the price of a stock of the organization. Manipulating a market may include manipulation of sentiment by using terms which convey a sentiment different from the sentiment that the disclosed forms indicate should be conveyed.
Training ML model 318 may include training an anomaly detection classification ML model to indicate an anomaly when an output value from the model exceeds a threshold. An anomaly may indicate fraud. An anomaly may identify unusual patterns or behaviors in the organization. For example, unusual patterns or behaviors may include no employees. For example, unusual patterns or behaviors may include no physical location. The computer processor may use natural language processing (NLP) techniques to identify anomalies.
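The following Python sketch is illustrative only and shows one way an anomaly detection classification could flag an organization with unusual characteristics, such as no employees; an isolation forest is used here as one possible technique, and the feature rows and contamination setting are placeholder assumptions.

```python
# Minimal sketch of an anomaly detection classification step using an
# isolation forest over simple per-organization features.
import numpy as np
from sklearn.ensemble import IsolationForest

# Columns: employees, physical locations disclosed, filings in the period.
org_features = np.array([
    [250, 3, 6], [1200, 10, 8], [90, 1, 5], [400, 4, 7], [150, 2, 6],
    [0, 0, 1],   # organization with no employees and no physical location
])

detector = IsolationForest(contamination=0.2, random_state=0).fit(org_features)
labels = detector.predict(org_features)      # -1 marks an anomaly
anomalous_rows = np.where(labels == -1)[0]
print(anomalous_rows)
```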
Training ML model 318 may include training an ownership analysis classification ML model to indicate a suspicious owner when an output value from the model exceeds a threshold. A suspicious owner may indicate fraud. For example, the fraud may include a front company and a shell company.
Training ML model 318 may include training an ESG disclosure classification ML model to indicate a fraudulent disclosure when an output value from the model exceeds a threshold. A fraudulent disclosure may indicate fraud. For example, the fraud may include a misleading disclosure about the organization's compliance with ESG regulations.
Computer processor 306 may train ML model 318 iteratively. For example, the trained fraud detection ML model 318 from above may be tested with a new extraction of forms from electronic portal 308. Computer processor 306 may store the forms at form storage 312. Computer processor 306 may clean and preprocess the forms at model data 314. Computer processor 306 may extract features from model data 314 to obtain feature extractions 316. Feature extractions 316 may be used to test and fine-tune ML model 318. Computer processor 306 may test ML model 318 with forms that contain examples which would exceed a threshold of ML model 318 and examples which would not exceed a threshold of ML model 318. Exceeding a threshold may indicate a possibility of fraud. Computer processor 306 may measure the accuracy of predictions by ML model 318. When ML model 318 predicts the presence of fraud and the absence of fraud with an accuracy that meets an accuracy threshold, then computer processor 306 may determine that ML model 318 is ready to move past training stage 302 and move to implementation stage 304. ML model 318 may be included in ML model 342.
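The following Python sketch is illustrative only and shows one way the accuracy gate between training stage 302 and implementation stage 304 could be evaluated; the synthetic data and the 0.9 accuracy threshold are placeholder assumptions.

```python
# Minimal sketch of measuring prediction accuracy on held-out examples and
# checking it against an accuracy threshold before moving to implementation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = (X[:, 0] - X[:, 2] > 0).astype(int)          # placeholder fraud / no-fraud labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
ACCURACY_THRESHOLD = 0.9
ready_for_implementation = accuracy >= ACCURACY_THRESHOLD
print(accuracy, ready_for_implementation)
```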
Implementation stage 304 may include a computer processor 330 communicating with electronic portal 332 to obtain forms. Electronic portal 332 may include a Securities and Exchange Commission (SEC) portal. Electronic portal 332 may include the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database. The forms may include electronic forms. The forms may include financial forms. The forms may include submissions by an organization to an SEC portal. The forms may include submissions by an organization to the EDGAR database. The forms may include SEC Form 10-K. The forms may include SEC Form 8-K. The forms may include SEC Form 10-Q. The forms may include SEC Form 4. The forms may include SEC Form SD.
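The following Python sketch is illustrative only and shows one way recent filing metadata could be collected from the publicly documented EDGAR submissions endpoint; the CIK, the User-Agent contact string, and the selected form types are placeholder assumptions, and the endpoint path and field names should be confirmed against current SEC documentation.

```python
# Minimal sketch of collecting recent filing metadata for one organization
# from the EDGAR submissions endpoint. All identifiers are placeholders.
import requests

CIK = "0000320193"  # placeholder 10-digit CIK
url = f"https://data.sec.gov/submissions/CIK{CIK}.json"
headers = {"User-Agent": "example-org fraud-screening contact@example.com"}

response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()
recent = response.json()["filings"]["recent"]

# Keep only the form types the fraud detection engine consumes.
WANTED = {"10-K", "8-K", "10-Q", "4", "SD"}
filings = [
    {"form": form, "accession": acc, "filed": date}
    for form, acc, date in zip(recent["form"], recent["accessionNumber"], recent["filingDate"])
    if form in WANTED
]
print(filings[:5])
```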
Computer processor 330 may include computer processor 306. Computer processor 330 may be a computer processor that does not include computer processor 306. Electronic portal 332 may include electronic portal 308. Electronic portal 332 may be an electronic portal that does not include electronic portal 308.
Computer processor 330 may communicate with electronic portal 332 to obtain forms. Computer processor 330 may implement fraud detection engine 334. Computer processor 330 may run fraud detection engine 334 by using the forms found at forms storage 336. Computer processor 330 may clean and preprocess the forms at model data 338. For example, sets of cleaned and preprocessed model data 338 may include CP1i, CP2i, CP3i, through CPni, referring to a first, second, third, and “n” set of cleaned and preprocessed model data. Examples of multiple sets of cleaned and preprocessed data may be found in Table 1.
Computer processor 330 may extract features from model data 338 to obtain feature extractions 340. For example, feature extractions 340 may include FX1i, FX2i, FX3i, through FXni, referring to a first, second, third, and “n” set of feature extractions. Computer processor 330 may run a ML model to extract features from model data 338 to obtain feature extractions 340.
Feature extractions 340 may be used to implement ML model 342. Fraud detection engine 334 may include fraud detection ML model 342. Extracting features may identify the most discriminating characteristics in the cleaned and processed forms, which a machine learning algorithm can more easily utilize. Examples of feature extractions may be found in Table 2.
ML model 342 may include implementing a comprehensive ML model with various subunits performing specific tasks. ML model 342 may include implementing separate ML models with each ML model to perform a specific task. For example, ML model 342 may include ML1i, ML2i, ML3i, through MLni, referring to a first, second, third, and “n” set of machine learning models. A computer processor may implement ML model 342 to return a value used to determine the presence of a fraud risk. The computer processor may implement ML model 342 using feature extractions 340 to return a value that may be used to determine the presence of a fraud risk. For example, when a threshold set by the model is exceeded, the model may indicate the presence of a fraud risk. ML model 342 may include ML model 318. ML model 318 may have been trained in training stage 302. Examples of ML models and what the ML models may indicate regarding the risk of fraud may be found in Table 3.
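The following Python sketch is illustrative only and shows one way each model's output could be compared against its threshold during implementation stage 304; the scores and thresholds are placeholder assumptions rather than outputs of the disclosed models.

```python
# Minimal sketch of comparing each model's output value against its threshold
# and collecting the models whose thresholds indicate a risk of fraud.
model_scores = {
    "ratio_classification": 0.82,
    "disclosure_classification": 0.31,
    "sentiment_analysis": 0.64,
    "anomaly_detection": 0.91,
    "ownership_analysis": 0.12,
    "esg_disclosure": 0.47,
}
thresholds = {name: 0.6 for name in model_scores}   # assumed per-model thresholds

exceeded = {name: score for name, score in model_scores.items()
            if score > thresholds[name]}
print(exceeded)
```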
ML model 342 may include a liquid, solvency, and profitability ratio classification ML model, which may indicate an unusual liquid, solvency, and/or profitability ratio when an output value from the model exceeds a threshold. An unusual liquid, solvency, and/or profitability ratio may indicate fraud. For example, fraud may include misguiding the public to make incorrect conclusions about an organization's ability to operate well going into the future.
ML model 342 may include a disclosure classification ML model, which may indicate an ambiguous disclosure when an output value from the model exceeds a threshold. An ambiguous disclosure may indicate fraud. For example, fraud may include performing fraudulent activity while describing those activities in vague, ambiguous terms and thereby evading scrutiny necessary to identify the fraud.
ML model 342 may include a sentiment analysis ML model, which may indicate an attempt to manipulate a market when an output value from the model exceeds a threshold. An attempt to manipulate a market may indicate fraud. For example, fraud may include manipulating a market to shift the price of a stock of the organization.
Manipulating a market may include changing public sentiment by altering how an organization is messaging information. The organization may message its information in a way that deviates from historical patterns to manipulate public sentiment about the organization. The altered sentiment may affect trading patterns in the organization's stock.
ML model 342 may include an anomaly detection classification ML model, which may indicate an anomaly when an output value from the model exceeds a threshold. An anomaly may indicate fraud. For example, the fraud may include something unusual about the organization such as no employees or no physical location.
ML model 342 may include an ownership analysis classification ML model, which may indicate a suspicious owner when an output value from the model exceeds a threshold. A suspicious owner may indicate fraud. For example, the fraud may include a front company and a shell company.
ML model 342 may include an ESG disclosure classification ML model, which may indicate a fraudulent disclosure when an output value from the model exceeds a threshold. A fraudulent disclosure may indicate fraud. For example, the fraud may include a misleading disclosure about the organization's compliance with ESG regulations.
When computer processor 330 detects a presence of risk of fraud, action 344 may be taken. Action 344 may include preparing a report. Action 344 may include preparing a report and sending it to the organization. Action 344 may include preparing a report and sending it to an administrator in the organization.
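The following Python sketch is illustrative only and shows one way a report for action 344 could be assembled; the report fields and the delivery mechanism are placeholder assumptions, as the disclosure does not specify a report format.

```python
# Minimal sketch of assembling a report naming the exceeded thresholds for
# delivery to an administrator in the organization.
import json
from datetime import datetime, timezone

def build_report(organization: str, exceeded: dict) -> str:
    return json.dumps({
        "organization": organization,
        "generated": datetime.now(timezone.utc).isoformat(),
        "exceeded_thresholds": sorted(name for name, hit in exceeded.items() if hit),
        "recommendation": "review filings for potential fraud",
    }, indent=2)

report = build_report("Example Filer Inc.",
                      {"anomaly_detection": True, "sentiment_analysis": False})
print(report)
```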
At step 408, the computer processor may run an ML model to extract features from the cleaned and processed form where the extracted features relate to liquid, solvency, and profitability ratio classification. At step 410, using the extracted features, the computer processor may train a liquid, solvency, and profitability ratio classification ML model to detect when a threshold has been exceeded. Exceeding a threshold may indicate a potentially unusual ratio. Exceeding a threshold may include other suitable indications of potential fraud.
At step 412, the computer processor may run an ML model to extract features from the cleaned and processed form where the extracted features relate to disclosure classification. At step 414, using the extracted features, the computer processor may train a disclosure classification ML model to detect when a threshold has been exceeded. Exceeding a threshold may indicate a potentially ambiguous disclosure. Exceeding a threshold may include other suitable indications of potential fraud.
At step 416, the computer processor may run an ML model to extract features from the cleaned and processed form where the extracted features relate to sentiment analysis. At step 418, using the extracted features, the computer processor may train a sentiment analysis ML model to detect when a threshold has been exceeded. Exceeding a threshold may indicate a potential attempt by the organization to manipulate a market to shift the price of a stock of the organization. Exceeding a threshold may include other suitable indications of potential fraud.
At step 420, the computer processor may run an ML model to extract features from the cleaned and processed form where the extracted features relate to anomaly detection classification. At step 422, using the extracted features, the computer processor may train an anomaly detection classification ML model to detect when a threshold has been exceeded. Exceeding a threshold may indicate a potential anomaly. Exceeding a threshold may include other suitable indications of potential fraud.
At step 424, the computer processor may run an ML model to extract features from the cleaned and processed form where the extracted features relate to ownership analysis classification. At step 426, using the extracted features, the computer processor may train an ownership analysis classification ML model to detect when a threshold has been exceeded. Exceeding a threshold may indicate a potentially suspicious owner. Exceeding a threshold may include other suitable indications of potential fraud.
At step 428, the computer processor may run an ML model to extract features from the cleaned and processed form where the extracted features relate to ESG disclosure classification. At step 430, using the extracted features, the computer processor may train an ESG disclosure classification ML model to detect when a threshold has been exceeded. Exceeding a threshold may indicate a potentially fraudulent disclosure. Exceeding a threshold may include other suitable indications of potential fraud.
Step 432 may be the end of the training phase. At step 432, the computer processor may provide a trained engine for implementation in detecting fraud in an organization.
At step 508, the computer processor may run an ML model to extract features from the cleaned and processed form where the extracted features relate to liquid, solvency, and profitability ratio classification. At step 510, using the extracted features, the computer processor may run a liquid, solvency, and profitability ratio classification ML model to detect when a threshold has been exceeded. The ML model may be the liquid, solvency, and profitability ratio classification ML model trained at step 410 in
At step 512, the computer processor may run an ML model to extract features from the cleaned and processed form where the extracted features relate to disclosure classification. At step 514, using the extracted features, the computer processor may run a disclosure classification ML model to detect when a threshold has been exceeded. The ML model may be the disclosure classification ML model trained at step 414 in
At step 516, the computer processor may run an ML model to extract features from the cleaned and processed form where the extracted features relate to sentiment analysis. At step 518, using the extracted features, the computer processor may run a sentiment analysis ML model to detect when a threshold has been exceeded. The ML model may be the sentiment analysis ML model trained at step 418 in
At step 520, the computer processor may run an ML model to extract features from the cleaned and processed form where the extracted features relate to anomaly detection classification. At step 522, using the extracted features, the computer processor may run an anomaly detection classification ML model to detect when a threshold has been exceeded. The ML model may be the anomaly detection classification ML model trained at step 422 in
At step 524, the computer processor may run an ML model to extract features from the cleaned and processed form where the extracted features relate to ownership analysis classification. At step 526, using the extracted features, the computer processor may run an ownership analysis classification ML model to detect when a threshold has been exceeded. The ML model may be the ownership analysis classification ML model trained at step 426 in
At step 528, the computer processor may run an ML model to extract features from the cleaned and processed form where the extracted features relate to ESG disclosure classification. At step 530, using the extracted features, the computer processor may run an ESG disclosure classification ML model to detect when a threshold has been exceeded. The ML model may be the ESG disclosure classification ML model trained at step 430 in
At step 532, a computer processor may determine if a threshold has been exceeded. If a threshold has been exceeded, at step 534 the computer processor may present a report to the organization that a threshold has been exceeded. The computer processor may present a report to an administrator in the organization that a threshold has been exceeded. The organization may be the same organization as the organization where the exceeded threshold was found. The organization may be a different organization from the organization where the exceeded threshold was found. The organization may be an organization determining whether to provide funds to the organization that provided the forms which caused the threshold to be exceeded.
If a threshold has not been exceeded, at step 536 the computer processor may provide the organization with a report that no thresholds have been exceeded.
Thus, systems and methods for alerting an organization about potential fraud are provided. Systems and methods for using a machine learning model to assess forms available online to identify potential fraud and alert an organization are provided. Persons skilled in the art will appreciate that the present invention can be practiced by other than the described embodiments, which are presented for purposes of illustration rather than of limitation. The present invention is limited only by the claims that follow.