Claims
- 1. A method for inferring a behavioral characteristic of an entity from a large volume of multi-entity transaction data, comprising: extracting N ordered pairs from a telephone number database, the N ordered pairs having a telephone number and a business status value indicating whether the telephone number belongs to a business; storing transaction data including a plurality of call detail records, each of said records having an originating telephone number, a dialed telephone number, a connect time and a duration; extracting a first sequence of transactions corresponding to the N ordered pair telephone numbers from the transaction data; identifying a plurality of features indicative of the business status value within the first sequence of transactions; building a model to predict the business status value from the features; extracting a second sequence of transactions corresponding to a telephone number of an entity from the transaction data; predicting a business status value for the entity using the model and the second sequence of transactions; and inferring whether the entity is a business from the predicted business status value.
- 2. The method of claim 1 further comprising: analyzing a subsequent set of call detail records using the model; and determining a revised probability that the entity is a business based on the analysis of said subsequent set of call detail records and an earlier determined probability.
- 3. The method of claim 2, wherein said building the model includes processing staging and call aggregation.
- 4. The method of claim 3, wherein said processing staging occurs over a 24 hour period.
- 5. The method of claim 1, further comprising: forming a calling profile for each originating telephone number by binning the call detail records associated with the originating telephone number into a four-dimensional data array.
- 6. The method of claim 5, wherein said data array includes a day-of-week dimension, a time-of-day dimension, a duration dimension, and a status dimension.
- 7. The method of claim 1, wherein said model is formed using logistic regression techniques.
- 8. The method of claim 7, wherein said model is regularized using a ridge penalty.
- 9. The method of claim 1, wherein said building the model occurs over an update period.
- 10. The method of claim 9, wherein said update period is one day.
- 11. The method of claim 2, wherein said determining a revised probability includes updating the earlier determined probability based on exponential weighting and an aging factor.
- 12. The method of claim 1, wherein said model is based on linear regression.
- 13. The method of claim 1, wherein said model is based on decision trees.
- 14. The method of claim 1, wherein said model is based on neural nets.
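As an illustration of claims 5, 6 and 11, the following is a minimal sketch of how call detail records might be binned into a four-dimensional calling profile, and how an earlier probability might be aged when a new estimate arrives. The claims name the four dimensions but not their resolution, so the bin edges, the status categories (taken here to describe the dialed number), the function names, and the exact form of the exponentially weighted update are all assumptions made for this example, not details from the claims.

```python
from datetime import datetime

# Hypothetical bin edges and categories: assumptions for this sketch only.
TIME_OF_DAY_EDGES = [0, 6, 12, 18]           # hour of day: four bins
DURATION_EDGES = [0, 60, 300, 1800]          # seconds: last bin is open-ended
STATUSES = ["business", "residence", "unknown"]  # assumed status of the dialed number


def bin_index(value, edges):
    """Return the index of the last edge that is <= value (edges are ascending)."""
    idx = 0
    for i, edge in enumerate(edges):
        if value >= edge:
            idx = i
    return idx


def calling_profile(records):
    """Bin one originating number's records into a 7 x 4 x 4 x 3 count array.

    Each record is (connect_time, duration_seconds, dialed_status).
    Dimensions: day of week, time of day, duration, status.
    """
    profile = [[[[0] * len(STATUSES) for _ in DURATION_EDGES]
                for _ in TIME_OF_DAY_EDGES] for _ in range(7)]
    for connect_time, duration_s, dialed_status in records:
        dow = connect_time.weekday()  # 0 = Monday ... 6 = Sunday
        tod = bin_index(connect_time.hour, TIME_OF_DAY_EDGES)
        dur = bin_index(duration_s, DURATION_EDGES)
        sta = STATUSES.index(dialed_status)
        profile[dow][tod][dur][sta] += 1
    return profile


def update_probability(previous_p, new_p, aging_factor=0.9):
    """Assumed exponentially weighted update: the earlier probability is
    down-weighted by aging_factor and blended with the new estimate."""
    return aging_factor * previous_p + (1.0 - aging_factor) * new_p


# Example: three calls from one originating number.
records = [
    (datetime(1998, 3, 23, 9, 15), 45, "business"),     # Monday morning, short
    (datetime(1998, 3, 23, 14, 30), 400, "business"),   # Monday afternoon, medium
    (datetime(1998, 3, 28, 20, 0), 2000, "residence"),  # Saturday evening, long
]
profile = calling_profile(records)
revised = update_probability(previous_p=0.5, new_p=1.0, aging_factor=0.9)
```

In this sketch the flattened counts of `profile` would serve as candidate features for the model of claim 1, and `update_probability` would be applied once per update period. A larger `aging_factor` makes the revised probability respond more slowly to new call detail records.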
CROSS REFERENCE TO RELATED APPLICATION
This application is related to and claims priority from provisional application No. 60/079,320 entitled TELECOMMUNICATIONS DATA MINING, filed Mar. 25, 1998, the disclosure of which is hereby incorporated by reference.
US Referenced Citations (6)
| Number | Name | Date | Kind |
| --- | --- | --- | --- |
| 6108658 | Lindsay et al. | Aug 2000 | A |
| 6119103 | Basch et al. | Sep 2000 | A |
| 6173280 | Ramkumar et al. | Jan 2001 | B1 |
| 6185559 | Brin et al. | Feb 2001 | B1 |
| 6188751 | Scherer | Feb 2001 | B1 |
| 6240411 | Thearling | May 2001 | B1 |
Non-Patent Literature Citations (1)
| Entry |
| --- |
| Glymour et al., "Statistical Inference and Data Mining," Communications of the ACM, vol. 39, No. 11 (Nov. 1996), pp. 35-41. |
Provisional Applications (1)
| Number | Date | Country |
| --- | --- | --- |
| 60/079320 | Mar 1998 | US |