This application relates to the field of storage technologies, and in particular to a deduplication method and apparatus.
With the development of technologies, more and more data needs to be stored in storage systems. To save storage space of a storage system, a deduplication technology has been proposed. To be specific, if a plurality of copies of a piece of data are stored in the storage system, the redundant copies are deleted and only one copy of the data is retained, so that the storage space occupied by the data is reduced.
Currently, one deduplication technology is implemented as follows. First, a fingerprint of each piece of data is calculated, the data is stored, and a mapping between the fingerprint and a storage address of the data is recorded. Then, bulk deduplication is performed on the stored data as to-be-deduplicated data. The bulk deduplication includes: querying a fingerprint table to determine whether the fingerprint of the stored data already exists in the fingerprint table; determining that the data is duplicated data if the fingerprint exists, or determining that the data is unique data if the fingerprint does not exist; and deleting the mapping between the fingerprint of the data and the storage address. It can be learned that, in the current deduplication technology, the fingerprint table needs to be searched for the fingerprints of all to-be-deduplicated data to determine whether the data is duplicated data, resulting in low deduplication efficiency.
This application provides a deduplication method and apparatus, to improve efficiency of a deduplication technology.
According to a first aspect, a deduplication method is provided. In the method, a fingerprint record that includes a plurality of fingerprint record items is first obtained, where each fingerprint record item includes a fingerprint and a storage address of data corresponding to the fingerprint. If two pieces of data are the same but stored at different storage addresses, a different fingerprint record item is generated for each of the two pieces of data. The two fingerprint record items include a same fingerprint but storage addresses corresponding to the fingerprints are different. After the fingerprint record is obtained, at least two first fingerprint record items that include a same fingerprint are determined from the fingerprint record. For example, the at least two first fingerprint record items each include a first fingerprint. Then, deduplication is performed on data corresponding to the first fingerprints in the at least two first fingerprint record items, the at least two first fingerprint record items are deleted, and a stub of the first fingerprint is recorded in the fingerprint record, where the stub of the first fingerprint is used to indicate that the first fingerprint is a duplicated fingerprint.
In the foregoing technical solution, because a stub corresponding to a duplicated fingerprint is added to the fingerprint record, a fingerprint included in a fingerprint record item may be directly determined as a duplicated fingerprint by using the stub, and there is no need, as in the conventional technology, to search a fingerprint table for the fingerprints of all to-be-deduplicated data to determine whether the data is duplicated data. Therefore, in this application, a duplicated fingerprint may be quickly determined, and deduplication is performed on data corresponding to the duplicated fingerprint, so that efficiency of a deduplication technology can be improved.
In a possible design, when new data is written into a storage system, a fingerprint record item corresponding to the new data is recorded in the fingerprint record. In an example, the fingerprint record item corresponding to the new data is recorded as a second fingerprint record item, and the second fingerprint record item includes the first fingerprint and a storage address of the new data. The stub of the first fingerprint indicates that the first fingerprint is a duplicated fingerprint, and the second fingerprint record item includes the first fingerprint. Therefore, it is determined that the first fingerprint in the second fingerprint record item is a duplicated fingerprint, and then deduplication is performed on the new data.
In the foregoing technical solution, after the new data is stored in the storage system, a fingerprint corresponding to the new data may be compared with the stub. If the fingerprint corresponding to the new data is the same as the fingerprint indicated by the stub, deduplication may be performed on the new data, so that deduplication may be performed on the data without querying the fingerprint table. This saves a process of querying the fingerprint table and improves the efficiency of the deduplication technology.
In a possible design, after the newly written data corresponding to the second fingerprint record item is deleted, the second fingerprint record item may be deleted.
In the foregoing technical solution, deleting an invalid fingerprint record item can reduce storage space occupied by the fingerprint record, so that utilization of the storage space can be improved.
In a possible design, when the storage space occupied by the fingerprint record is greater than or equal to a first threshold, a third fingerprint record item in the fingerprint record may be deleted. A fingerprint included in the third fingerprint record item is different from fingerprints included in other fingerprint record items in the fingerprint record. That is, a fingerprint record item whose fingerprint occurs only once in the fingerprint record is deleted.
In the foregoing technical solution, as more data is written into the storage system, the storage space occupied by the fingerprint record becomes larger, and the storage space occupied by the fingerprint record is greater than or equal to the first threshold after a specific time period. If a fingerprint record item occurs once within this time period, it indicates that a probability of repeatedly storing data corresponding to the fingerprint record item is small, and the fingerprint record item needs to wait for a longer time before deduplication can be performed. Therefore, the fingerprint record item may be directly deleted, to reduce the storage space occupied by the fingerprint record.
In a possible design, when the storage space occupied by the fingerprint record is greater than or equal to the first threshold, a fourth fingerprint record item may be deleted, where a duration for which the fourth fingerprint record item has been stored in the fingerprint record is greater than or equal to a second threshold. That is, a fingerprint record item with an earlier write time is deleted from the fingerprint record.
In the foregoing technical solution, if the data has been overwritten, the data will not be repeatedly stored in the storage system, and there is no need to perform deduplication on the data. An earlier time of writing a fingerprint record item into the fingerprint record indicates that data corresponding to the fingerprint record item is more likely to be overwritten with new data, so that the fingerprint record item written into the fingerprint record early may be deleted, to reduce the storage space occupied by the fingerprint record.
In a possible design, when the storage space occupied by the fingerprint record is greater than or equal to the first threshold, a fifth fingerprint record item in the fingerprint record may be deleted, where the fingerprint record does not record a predetermined quantity of fifth fingerprint record items within a predetermined time period. That is, a fingerprint record item that occurs less frequently in the fingerprint record is deleted.
In the foregoing technical solution, if a fingerprint record item occurs less frequently within a predetermined time period, it indicates that a probability of repeatedly storing data corresponding to the fingerprint is small. Therefore, the fingerprint record item may be directly deleted, to reduce the storage space occupied by the fingerprint record.
In a possible design, if a stub of a second fingerprint is recorded in the fingerprint record, and the stub of the second fingerprint is used to indicate that the second fingerprint is a duplicated fingerprint, when the storage space occupied by the fingerprint record is greater than or equal to the first threshold, it may be determined whether the fingerprint record records a predetermined quantity of third fingerprint record items that include the second fingerprint within a predetermined time period. If the fingerprint record does not record the predetermined quantity of third fingerprint record items within the predetermined time period, the stub of the second fingerprint in the fingerprint record is deleted.
In the foregoing technical solution, if after a stub of a fingerprint is recorded in the fingerprint record, fewer fingerprint record items corresponding to the fingerprint are subsequently recorded, it indicates that a quantity of times a duplicated fingerprint is determined by using the stub of the fingerprint is small, that is, the stub of the fingerprint contributes less to determining the duplicated fingerprint, so that the stub of the fingerprint can be deleted, to reduce the storage space occupied by the fingerprint record.
In this embodiment of this application, the first threshold, the second threshold, the predetermined quantity, and the predetermined time period are not limited.
According to a second aspect, a deduplication apparatus is provided. The deduplication apparatus may be a storage server, or may be an apparatus in a storage server. The deduplication apparatus includes a processor that is configured to implement the method described in the first aspect. The deduplication apparatus may further include a memory that is configured to store program instructions and data. The memory is coupled to the processor. The processor may invoke and execute the program instructions stored in the memory, to implement the method according to the first aspect. The deduplication apparatus may further include a communications interface. The communications interface is used by the deduplication apparatus to communicate with another device. For example, the another device is a client in a storage system.
In a possible design, the deduplication apparatus includes the processor and the communications interface.
The communications interface is configured to obtain a fingerprint record, where the fingerprint record includes a plurality of fingerprint record items, and each fingerprint record item includes a fingerprint.
The processor is configured to: determine at least two first fingerprint record items from the fingerprint record, where each first fingerprint record item includes a first fingerprint and a storage address of data corresponding to the first fingerprint, and storage addresses of data corresponding to the first fingerprints of the at least two first fingerprint record items are different;
perform deduplication on the data corresponding to the first fingerprints in the at least two first fingerprint record items;
delete the at least two first fingerprint record items; and
record a stub of the first fingerprint in the fingerprint record, where the stub of the first fingerprint is used to indicate that the first fingerprint is a duplicated fingerprint.
In a possible design, the processor is further configured to:
record a second fingerprint record item in the fingerprint record, where the second fingerprint record item includes the first fingerprint and a new storage address of the data corresponding to the first fingerprint, and the data corresponding to the first fingerprint in the second fingerprint record item is newly written data;
determine, based on the stub of the first fingerprint, that the first fingerprint in the second fingerprint record item is a duplicated fingerprint; and
perform deduplication on the newly written data.
In a possible design, the processor is further configured to:
delete the second fingerprint record item.
In a possible design, the processor is further configured to:
delete a third fingerprint record item when storage space occupied by the fingerprint record is greater than or equal to a first threshold, where a fingerprint included in the third fingerprint record item is different from fingerprints included in other fingerprint record items in the fingerprint record.
In a possible design, the processor is further configured to:
delete a fourth fingerprint record item when the storage space occupied by the fingerprint record is greater than or equal to the first threshold, where duration of storing the fourth fingerprint record item in the fingerprint record is greater than or equal to a second threshold.
In a possible design, the processor is further configured to:
delete a fifth fingerprint record item in the fingerprint record when the storage space occupied by the fingerprint record is greater than or equal to the first threshold, where the fingerprint record does not record a predetermined quantity of fifth fingerprint record items within a predetermined time period.
In a possible design, the processor is further configured to:
when the storage space occupied by the fingerprint record is greater than or equal to the first threshold, determine whether the fingerprint record records a predetermined quantity of third fingerprint record items within a predetermined time period; and
delete a stub of a second fingerprint in the fingerprint record when the fingerprint record does not record the predetermined quantity of third fingerprint record items within the predetermined time period, where the stub of the second fingerprint is used to indicate that the second fingerprint is a duplicated fingerprint, and the third fingerprint record item includes the second fingerprint.
According to a third aspect, a deduplication apparatus is provided. The deduplication apparatus may be a storage server, or may be an apparatus in a storage server. The deduplication apparatus may include a processing module and a communications module, and the modules may execute corresponding functions executed in any one of the design examples in the first aspect.
The communications module is configured to obtain a fingerprint record, where the fingerprint record includes a plurality of fingerprint record items, and each fingerprint record item includes a fingerprint.
The processing module is configured to: determine at least two first fingerprint record items from the fingerprint record, where each first fingerprint record item includes a first fingerprint and a storage address of data corresponding to the first fingerprint, and storage addresses of data corresponding to the first fingerprints of the at least two first fingerprint record items are different;
perform deduplication on the data corresponding to the first fingerprints in the at least two first fingerprint record items;
delete the at least two first fingerprint record items; and
record a stub of the first fingerprint in the fingerprint record, where the stub of the first fingerprint is used to indicate that the first fingerprint is a duplicated fingerprint.
According to a fourth aspect, an embodiment of this application further provides a computer-readable storage medium, including instructions, and when the instructions are run on a computer, the computer is enabled to perform the method according to the first aspect or any design in the first aspect.
According to a fifth aspect, an embodiment of this application further provides a computer program product, including instructions, and when the instructions are run on a computer, the computer is enabled to perform the method according to the first aspect or any design in the first aspect.
According to a sixth aspect, an embodiment of this application provides a chip system. The chip system includes a processor, and may further include a memory, and is configured to implement the method according to the first aspect or any design of the first aspect. The chip system may include a chip, or may include a chip and another discrete device.
According to a seventh aspect, an embodiment of this application provides a storage system. The storage system includes a storage device and the deduplication apparatus in the second aspect and any design of the second aspect; or the storage system includes a storage device and the deduplication apparatus in the third aspect and any design of the third aspect.
For beneficial effects of the second aspect to the sixth aspect and the implementations of the second aspect to the sixth aspect, refer to the descriptions of the beneficial effects of the method in the first aspect and the implementations of the first aspect.
To make objectives, technical solutions, and advantages of embodiments of this application clearer, the following further describes the embodiments of this application in detail with reference to the accompanying drawings.
The following describes technical terms in this application, to help a person skilled in the art understand the technical solutions of this application.
(1) A deduplication technology may include an inline deduplication mode and a post-process deduplication mode based on a moment at which deduplication is performed. The inline deduplication mode means that deduplication is performed before data in a cache of a storage system is stored in a storage device, and then data obtained after the deduplication is performed is stored in the storage device. The post-process deduplication mode means that after a fingerprint of the data in the cache is calculated and the data in the cache is stored in the storage device, a mapping between the fingerprint of the data and a storage address is recorded, the mapping is read in a preset time period (for example, when the storage system is idle), deduplication is performed on the data based on the fingerprint in the mapping, and the data after the deduplication is performed is stored in a deduplication area of the storage device. It should be noted that the technical solutions in this embodiment of this application are an improvement of the post-process deduplication mode.
(2) A fingerprint table is used to record a mapping between a fingerprint of unique data obtained after deduplication and a storage address of the unique data in a deduplication area. The deduplication area is a storage area, in the storage system, for storing the unique data obtained after the deduplication.
(3) In the embodiments of this application, “a plurality of” means two or more. In view of this, in the embodiments of this application, “a plurality of” may also be understood as “at least two”. “At least one” may be understood as one or more, for example, understood as one, two, or more. For example, “including at least one” means including one, two, or more, and does not limit which is included. For example, “including at least one of A, B, and C” may represent the following cases: A is included, B is included, C is included, A and B are included, A and C are included, B and C are included, or A, B, and C are included. The term “and/or” describes an association relationship for describing associated objects and represents that three relationships may exist. For example, A and/or B may represent the following three cases: Only A exists, both A and B exist, and only B exists. In addition, unless otherwise specified, the character “/” usually indicates an “or” relationship between the associated objects.
Unless otherwise stated, in the embodiments of this application, ordinal numbers such as “first” and “second” are intended to distinguish between a plurality of objects, and not intended to limit a sequence, a time sequence, a priority, or importance of the plurality of objects.
The following describes the deduplication method according to the embodiments of this application with reference to the accompanying drawings.
In
S201: The storage system obtains a fingerprint record.
Each fingerprint record item includes a mapping between a fingerprint and a storage address of data corresponding to the fingerprint. During post-process deduplication, the storage system receives data, calculates a fingerprint of the data, stores the data, generates a fingerprint record item, and performs deduplication on the stored data in a preset time period (for example, when the storage system is idle). The fingerprint record item includes a mapping between the fingerprint of the data and a storage address of the data.
In a specific implementation, the fingerprint record may be recorded in the form of a log, or the fingerprint record may be recorded in the form of an entry. This is not limited in this embodiment of this application.
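The write path that produces fingerprint record items can be illustrated with a minimal sketch. It assumes SHA-256 as the fingerprint function, integer storage addresses, and an in-memory list as a log-form fingerprint record; none of these is specified in this application, and the names `RecordItem` and `write_data` are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordItem:
    fingerprint: str  # fingerprint of the data (SHA-256 hex digest is an assumption)
    address: int      # storage address of the data (hypothetical integer address)


def write_data(storage: dict, record: list, address: int, data: bytes) -> RecordItem:
    """Store the data first, then log a fingerprint record item (post-process mode)."""
    fingerprint = hashlib.sha256(data).hexdigest()
    storage[address] = data          # the data is stored without inline deduplication
    item = RecordItem(fingerprint, address)
    record.append(item)              # log-form fingerprint record
    return item


storage, record = {}, []
a = write_data(storage, record, 0, b"block A")
b = write_data(storage, record, 1, b"block A")  # same data, different address
c = write_data(storage, record, 2, b"block B")
```

In this sketch, `a` and `b` carry the same fingerprint but different storage addresses, which is the situation that later deduplication steps look for.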
In an example, as shown in
It should be noted that in a scenario in which the storage system is a distributed storage system, that the storage system obtains a fingerprint record specifically means that a server of the storage system obtains a fingerprint record. When the storage system is in another scenario, the fingerprint record may be obtained by another apparatus or device. For example, in a scenario in which the storage system is a storage array, that the storage system obtains a fingerprint record specifically means that an array controller of the storage array obtains a fingerprint record.
S202: The storage system sorts the fingerprint record items.
Specifically, the storage system may sort the fingerprint record items in the fingerprint record in an ascending order of FPs in the fingerprint record items. In this way, the fingerprint record items with a same fingerprint are arranged together. For example, in
S203: The storage system determines a duplicated fingerprint from the fingerprint record.
The storage system determines the duplicated fingerprint from the fingerprint record based on a threshold of the duplicated fingerprint. Specifically, the storage system determines, based on the fingerprint record after sorting, whether a quantity of times that the fingerprint record items that include a same fingerprint occur is greater than or equal to the threshold, and if the quantity of times is greater than or equal to the threshold, the storage system determines that the fingerprint is a duplicated fingerprint.
If a fingerprint is a duplicated fingerprint, it indicates that data stored in storage addresses in the fingerprint record items that include the same fingerprint is duplicated data.
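Steps S202 and S203 can be sketched as follows. This is only an illustration: the fingerprint record is modeled as a list of (fingerprint, storage address) pairs, the comparison is assumed to be "greater than or equal to the threshold", and the function name `find_duplicated_fingerprints` is hypothetical.

```python
from itertools import groupby


def find_duplicated_fingerprints(record, threshold):
    """Return fingerprints that occur at least `threshold` times in the record.

    `record` is a list of (fingerprint, address) pairs.
    """
    # S202: sort by fingerprint so that equal fingerprints become adjacent.
    ordered = sorted(record, key=lambda item: item[0])
    duplicated = []
    # S203: count each run of equal fingerprints and compare with the threshold.
    for fingerprint, group in groupby(ordered, key=lambda item: item[0]):
        if sum(1 for _ in group) >= threshold:
            duplicated.append(fingerprint)
    return duplicated


record = [("FP_2", 10), ("FP_1", 11), ("FP_1", 12),
          ("FP_3", 13), ("FP_1", 14), ("FP_2", 15)]
duplicated = find_duplicated_fingerprints(record, threshold=3)
```

Sorting first means that a single linear pass is enough to count occurrences, which is why the sorting step precedes the threshold check.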
In an example, the threshold may be 3. In the fingerprint record shown in
S204: The storage system performs deduplication on data corresponding to a fingerprint that is determined as a duplicated fingerprint in a fingerprint record item.
The fingerprint record shown in
S205: The storage system deletes the fingerprint record item that includes the duplicated fingerprint from the fingerprint record.
In an example, the fingerprint record item that includes the duplicated fingerprint is deleted from the fingerprint record. For example, after the fingerprint record items that include FP_1 and FP_4 are deleted, a fingerprint record shown in
In the foregoing description, deduplication is performed on data whose fingerprint occurs in the fingerprint record a quantity of times that reaches the threshold, so that a deduplication rate of the storage system is improved. However, if the fingerprint record items that include these fingerprints are deleted from the fingerprint record after the deduplication, then when data corresponding to these fingerprints is later written into the storage system, deduplication cannot be performed on the newly written data, because the fingerprint record no longer includes fingerprint record items with these fingerprints and the repetition count of the fingerprint of the newly written data cannot reach the threshold. To resolve this problem, this embodiment of this application further includes:
S206: The storage system records, in the fingerprint record, a stub of the fingerprint in the deleted fingerprint record item.
In this embodiment of this application, the stub of the fingerprint in the deleted fingerprint record item is used to indicate that the fingerprint in the deleted fingerprint record item is a duplicated fingerprint.
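Steps S204 to S206 can be sketched as follows. The sketch assumes that the fingerprint record is a list of (fingerprint, storage address) pairs and models the stubs as a separate set of fingerprints; the application does not prescribe how a stub is encoded, and `deduplicate_and_stub` is a hypothetical name.

```python
def deduplicate_and_stub(record, stubs, duplicated):
    """Drop the record items of duplicated fingerprints and leave one stub each.

    Returns the storage addresses whose data should be deduplicated.
    """
    duplicated = set(duplicated)
    # S204: collect the addresses of the data to be deduplicated.
    addresses = [addr for fp, addr in record if fp in duplicated]
    # S205: delete the record items that include a duplicated fingerprint.
    record[:] = [(fp, addr) for fp, addr in record if fp not in duplicated]
    # S206: record a stub so that the fingerprint stays known as duplicated.
    stubs.update(duplicated)
    return addresses


record = [("FP_1", 11), ("FP_2", 10), ("FP_1", 12), ("FP_1", 14)]
stubs = set()
addresses = deduplicate_and_stub(record, stubs, ["FP_1"])
```

After the call, the record holds only the items of fingerprints that are not yet duplicated, while the stub keeps the information that FP_1 has already been deduplicated.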
Specifically, in the fingerprint record shown in
S207: The storage system records a new fingerprint record item in the fingerprint record.
In this embodiment of this application, the new fingerprint record item includes the fingerprint FP_1 and a new storage address of data corresponding to the FP_1, and the data corresponding to the fingerprint FP_1 in the new fingerprint record item is newly written data.
The storage system receives new data, calculates a fingerprint of the new data, stores the new data, and generates a fingerprint record item corresponding to the new data.
S208: The storage system determines, based on the stub of the fingerprint in the deleted fingerprint record item, that the fingerprint in the new fingerprint record item is a duplicated fingerprint.
After the new fingerprint record item is recorded in the fingerprint record, the new fingerprint record item is compared with a stub in the fingerprint record, to determine whether the fingerprint in the new fingerprint record item is the same as a fingerprint corresponding to the stub. If the fingerprint in the new fingerprint record item is the same as the fingerprint corresponding to the stub, it is determined that the fingerprint in the new fingerprint record item is a duplicated fingerprint. If the fingerprint in the new fingerprint record item is not the same as the fingerprint corresponding to the stub, the fingerprint is not determined as a duplicated fingerprint at this point, and deduplication is not performed until the repetition times of the fingerprint reach the threshold.
In an example, in a fingerprint record in
In this way, after the new data is stored in the storage system, the fingerprint corresponding to the new data may be compared with the stub. If the fingerprint corresponding to the new data is the same as the fingerprint indicated by the stub, deduplication may be performed on the new data without waiting for repetition times of the fingerprint corresponding to the new data to reach the threshold. This can improve efficiency of a deduplication technology.
S209: The storage system performs deduplication on the newly written data.
If the fingerprint of the new data is a duplicated fingerprint, it indicates that the data has been stored in the storage device, so that deduplication can be directly performed on the new data.
It should be noted that when deduplication is performed on the newly written data, because the fingerprint table already stores the fingerprint, a mapping between a host access address of the new data and the fingerprint FP_1 may be directly established without querying the fingerprint table. Therefore, a delay of deduplication can be reduced.
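Steps S207 to S210 can be sketched as follows, again modeling the fingerprint record as a list of (fingerprint, storage address) pairs and the stubs as a set of fingerprints. These representations and the name `handle_new_item` are assumptions made for illustration; the actual deduplication of the data in S209 is not modeled.

```python
def handle_new_item(record, stubs, fingerprint, address):
    """Record a new item, then deduplicate at once if a stub marks its fingerprint.

    Returns True if the new data was identified as duplicated via a stub.
    """
    record.append((fingerprint, address))   # S207: record the new item
    if fingerprint in stubs:                # S208: compare with the stubs
        # S209 would deduplicate the data stored at `address` here (not modeled).
        record.pop()                        # S210: delete the new record item
        return True
    return False                            # wait until the threshold is reached


record = [("FP_2", 10)]
stubs = {"FP_1"}
hit = handle_new_item(record, stubs, "FP_1", 20)    # fingerprint has a stub
miss = handle_new_item(record, stubs, "FP_8", 23)   # fingerprint has no stub
```

The membership test against the stubs replaces both the wait for the repetition threshold and the fingerprint table query for data whose fingerprint is already known to be duplicated.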
S210: The storage system deletes the new fingerprint record item.
After deduplication is performed on the newly written service data, the new fingerprint record item corresponding to the new data is deleted from the fingerprint record, to obtain a fingerprint record shown in
It should be noted that the new data may be different from data already stored in the storage system. For example, the new data further includes data 23, a fingerprint FP_8 of the data 23 is obtained through calculation, a token corresponding to the service data 23 is a token_23, and a fingerprint record shown in
S211: The storage system deletes some fingerprint record items when the storage space occupied by the fingerprint record is greater than or equal to a first threshold.
In this embodiment of this application, the fingerprint record is stored in a deduplication metadata space. Because the deduplication metadata space is limited, as more data is written into the storage system, the storage space occupied by the fingerprint record may exceed the first threshold, where the first threshold may be, for example, 80% or 70% of a maximum value of the deduplication metadata space. If the storage space occupied by the fingerprint record exceeds the first threshold, as shown in
In this embodiment of this application, the deleting of some fingerprint record items may include but is not limited to the following manners.
Manner 1:
Delete a third fingerprint record item. A fingerprint included in the third fingerprint record item is different from fingerprints included in other fingerprint record items in the fingerprint record. That is, a fingerprint record item that occurs once in the fingerprint record is deleted.
If a fingerprint record item occurs once within this time period, it indicates that a probability of repeatedly storing data corresponding to the fingerprint is small, and the fingerprint record item needs to wait for a longer time before deduplication can be performed. Therefore, the fingerprint record item may be directly deleted, to reduce the storage space occupied by the fingerprint record.
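Manner 1 can be sketched as follows, assuming the fingerprint record is a list of (fingerprint, storage address) pairs; the function name `evict_single_occurrence` is hypothetical.

```python
from collections import Counter


def evict_single_occurrence(record):
    """Return the record without items whose fingerprint occurs only once."""
    counts = Counter(fp for fp, _ in record)
    return [(fp, addr) for fp, addr in record if counts[fp] > 1]


record = [("FP_2", 10), ("FP_3", 13), ("FP_2", 15), ("FP_5", 16)]
kept = evict_single_occurrence(record)
```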
In an example, in
Manner 2:
Delete a fourth fingerprint record item. Duration of storing the fourth fingerprint record item in the fingerprint record is greater than or equal to a second threshold. That is, a fingerprint record item with an earlier write time is deleted from the fingerprint record.
Because an earlier time of writing a fingerprint record item into the fingerprint record indicates that data corresponding to the fingerprint record item is more likely to be overwritten with new data, if the data has been overwritten, the data will not be repeatedly stored in the storage system, and there is no need to perform deduplication on the data. Therefore, a fingerprint record item written into the fingerprint record early may be deleted, to reduce the storage space occupied by the fingerprint record.
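Manner 2 can be sketched as follows. The sketch assumes that each fingerprint record item additionally carries a write time (here an abstract integer tick), which is one possible way to measure how long an item has been stored; the function name `evict_old_items` is hypothetical.

```python
def evict_old_items(record, now, second_threshold):
    """Return the record without items stored for at least `second_threshold`.

    `record` is a list of (fingerprint, address, written_at) triples.
    """
    return [item for item in record if now - item[2] < second_threshold]


record = [("FP_2", 10, 1), ("FP_3", 13, 15), ("FP_5", 16, 30)]
kept = evict_old_items(record, now=30, second_threshold=20)
```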
In an example, if data is written into the storage system in sequence, a smaller storage address of the data indicates a longer storage time of the data in the storage system. Accordingly, a fingerprint record item corresponding to the data is stored for a longer time in the fingerprint record. Therefore, duration of storing a fingerprint record item in the fingerprint record may be determined based on a value of a token. The second threshold may be a difference between maximum values of tokens in fingerprint record items, and the difference may be 20, 15, or the like. For example, the difference is 20. In
Manner 3:
Delete a fifth fingerprint record item in the fingerprint record. The fingerprint record does not record a predetermined quantity of fifth fingerprint record items within a predetermined time period. That is, a fingerprint record item that occurs less frequently in the fingerprint record is deleted.
If a fingerprint record item occurs less frequently within a predetermined time period, it indicates that a probability of repeatedly storing data corresponding to the fingerprint is small. Therefore, the fingerprint record item may be directly deleted, to reduce the storage space occupied by the fingerprint record.
In an example, the predetermined quantity may be 1 (or 2), that is, a fingerprint record item corresponding to a fingerprint whose quantity of occurrences is less than or equal to 1 (or 2) in the fingerprint record is deleted. When the value of the predetermined quantity is 1, a result in this manner is the same as that in Manner 1. When the value of the predetermined quantity is 2, for a specific process of this manner, refer to Manner 1. Details are not described herein again.
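Manner 3 can be sketched as follows. The sketch assumes that each fingerprint record item carries a write time, that the predetermined time period is the window ending at the current time, and that an item is deleted when its fingerprint occurs no more than the predetermined quantity of times within that window; the function name `evict_infrequent` is hypothetical.

```python
from collections import Counter


def evict_infrequent(record, now, period, quantity):
    """Return the record without items whose fingerprint is infrequent.

    `record` is a list of (fingerprint, address, written_at) triples. A
    fingerprint is infrequent if it occurs at most `quantity` times among
    the items written within the last `period` ticks.
    """
    recent = Counter(fp for fp, _, written_at in record
                     if now - written_at <= period)
    return [item for item in record if recent[item[0]] > quantity]


record = [("FP_2", 10, 1), ("FP_2", 15, 5), ("FP_3", 13, 6),
          ("FP_6", 17, 7), ("FP_6", 18, 8), ("FP_6", 19, 9)]
kept = evict_infrequent(record, now=10, period=6, quantity=2)
```

In this example, FP_2 occurs only once inside the window (its first item was written too early to count), so both FP_2 items are deleted together with the single FP_3 item.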
Manner 4:
If a stub of a second fingerprint is recorded in the fingerprint record, and the stub of the second fingerprint is used to indicate that the second fingerprint is a duplicated fingerprint, it is determined whether the fingerprint record records a predetermined quantity of third fingerprint record items that include the second fingerprint within a predetermined time period. If the fingerprint record does not record the predetermined quantity of third fingerprint record items within the predetermined time period, the stub of the second fingerprint in the fingerprint record is deleted.
If, after a stub of a fingerprint is recorded in the fingerprint record, only a few fingerprint record items corresponding to the fingerprint are subsequently recorded in the fingerprint record, the stub is rarely used to determine a duplicated fingerprint. In other words, the stub of the fingerprint contributes little to determining the duplicated fingerprint. Therefore, the stub of the fingerprint can be deleted, to reduce the storage space occupied by the fingerprint record.
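The stub check in Manner 4 may be sketched as follows. The sketch assumes that each stub carries a counter of how many fingerprint record items with the same fingerprint were recorded within the predetermined time period. The class name `Stub`, the field `hits_in_period`, and the function `prune_stubs` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Stub:
    """Marks a fingerprint as duplicated (hypothetical layout)."""
    fingerprint: str
    # Assumed counter: record items with this fingerprint that were
    # recorded within the predetermined time period.
    hits_in_period: int = 0


def prune_stubs(stubs, predetermined_quantity):
    """Keep only stubs that were hit often enough in the period.

    `stubs` maps a fingerprint to its Stub. A stub is deleted when fewer
    than `predetermined_quantity` matching record items were recorded.
    """
    return {
        fingerprint: stub
        for fingerprint, stub in stubs.items()
        if stub.hits_in_period >= predetermined_quantity
    }


# Hypothetical sample: fpA was hit three times, fpB never.
sample_stubs = {
    "fpA": Stub("fpA", hits_in_period=3),
    "fpB": Stub("fpB", hits_in_period=0),
}
```

In this sample, a predetermined quantity of 1 keeps the stub of `fpA` and deletes the stub of `fpB`, because no fingerprint record item that includes `fpB` was recorded in the period.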
In an example, a quantity of fingerprints corresponding to a stub within a preset time period may be recorded in a record item corresponding to the stub. For example, a number parameter is added to a token, and a value of the number parameter is the quantity of fingerprints corresponding to the stub in the preset time period. If the value of the number parameter is 3, it indicates that a fingerprint record item including the fingerprint corresponding to the stub is recorded three times in the preset time period, as shown in
In another example, a time point at which a duplicated fingerprint is determined last time by using a stub of a fingerprint may be recorded in a record item corresponding to the stub. For example, a sorting process in which a duplicated fingerprint is determined by using the stub may be recorded, and may be marked as sorted. As shown in
Manner 5:
Any two or more of Manners 1 to 4 may be combined.
In an example, Manner 1 is combined with Manner 2. In
In addition, in this embodiment of this application, after it is determined that the storage space occupied by the fingerprint record is greater than or equal to the first threshold, a storage server may first determine a quantity of fingerprint record items that need to be deleted, and then delete a corresponding quantity of fingerprint record items from the fingerprint record. For example, space occupied by each fingerprint record item is the same. In this case, the storage server may determine a maximum quantity of fingerprint record items that may be stored in the fingerprint record. For example, a maximum of 30 fingerprint record items may be stored. When the quantity of fingerprint record items reaches 33, it may be determined that three fingerprint record items need to be deleted. The three fingerprint record items that need to be deleted are determined based on any one of the foregoing five manners. Therefore, after three fingerprint record items that meet a condition are determined, the determined three fingerprint record items may be deleted without traversing the entire fingerprint record. This can improve efficiency.
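The early-stop behavior described above may be sketched as follows, assuming that every fingerprint record item occupies the same space, so that the first threshold can be expressed as a maximum item count. The function `select_for_deletion` and the callback `meets_condition` are hypothetical. The callback stands in for whichever of the foregoing manners is used.

```python
def select_for_deletion(record, max_items, meets_condition):
    """Return indexes of the items to delete, stopping as early as possible.

    `record` is a list of fingerprint record items, `max_items` is the
    assumed maximum quantity of items the record may hold, and
    `meets_condition` is a hypothetical predicate standing in for any one
    of the deletion manners.
    """
    # Quantity of items that must be deleted to get back under the limit.
    excess = len(record) - max_items
    if excess <= 0:
        return []

    selected = []
    for index, item in enumerate(record):
        if meets_condition(item):
            selected.append(index)
            # Stop once enough items are found; the rest of the record
            # is not traversed.
            if len(selected) == excess:
                break
    return selected


# Hypothetical sample: 33 items identified by token values 0..32, with a
# limit of 30 items. Items with an even token are treated as deletable.
sample_tokens = list(range(33))
to_delete = select_for_deletion(sample_tokens, 30, lambda token: token % 2 == 0)
```

In this sample the scan stops at the fifth item, because the three deletable items are found at indexes 0, 2, and 4.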
It should be noted that a fingerprint record item that needs to be deleted may be determined in other manners, which are not illustrated herein.
In the foregoing technical solution, because a stub corresponding to a duplicated fingerprint is added to a fingerprint record item, a fingerprint included in the fingerprint record item may be directly determined as a duplicated fingerprint by using the stub. In the conventional technology, by contrast, a fingerprint is determined to be a duplicated fingerprint only after the fingerprint is repeated a specific quantity of times. Therefore, in the technical solution, a duplicated fingerprint can be determined quickly, and deduplication can be performed on data corresponding to the duplicated fingerprint. This can improve the efficiency of the deduplication technology.
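The shortcut provided by the stub may be illustrated with a minimal sketch. It assumes that stubs are kept as a set of fingerprints and that the fingerprint record is a list of (fingerprint, storage address) pairs. The function name `handle_new_fingerprint` and this layout are hypothetical, and the sketch does not cover how a stub is first created.

```python
def handle_new_fingerprint(fingerprint, address, stubs, record):
    """Return True if the fingerprint is identified as duplicated at once.

    `stubs` is an assumed set of fingerprints already marked as
    duplicated. `record` is an assumed list of (fingerprint, address)
    pairs that still await later comparison.
    """
    if fingerprint in stubs:
        # A stub exists, so the fingerprint is known to be duplicated
        # without waiting for it to repeat in the record.
        return True
    # No stub yet: record the item for later comparison.
    record.append((fingerprint, address))
    return False


# Hypothetical sample: fpA already has a stub, fpB does not.
sample_stub_set = {"fpA"}
pending_record = []
first_result = handle_new_fingerprint("fpA", 0, sample_stub_set, pending_record)
second_result = handle_new_fingerprint("fpB", 8, sample_stub_set, pending_record)
```

In this sample, `fpA` is identified as duplicated immediately, whereas `fpB` is only appended to the record.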
In addition, it should be noted that, in the embodiment shown in
In the foregoing embodiments provided in this application, to implement the functions in the method provided in the foregoing embodiments of this application, the storage system may include a hardware structure and/or a software module, to implement the foregoing functions by using the hardware structure, the software module, or a combination of the hardware structure and the software module. Whether a specific function of the foregoing functions is implemented in a manner of a hardware structure, a software module, or a combination of a hardware structure and a software module depends on particular applications and design constraints of the technical solutions.
The deduplication apparatus 1100 may include a processing module 1101 and a communications module 1102.
The processing module 1101 may be configured to perform steps S201 to S211 in the embodiment shown in
The communications module 1102 may be configured to support the communications system in the embodiment shown in
All related content of the steps in the foregoing method embodiments may be cited in function descriptions of corresponding function modules. Details are not described herein again.
In the embodiment shown in
The deduplication apparatus 1200 includes at least one processor 1220 configured to implement, or support the deduplication apparatus 1200 in implementing, a function of the storage server in the method provided in the embodiment of this application. For example, the processor 1220 may perform deduplication on newly written data. For details, refer to the detailed description in the method embodiment. Details are not described herein again.
The deduplication apparatus 1200 may further include at least one memory 1230 configured to store program instructions and/or data. The memory 1230 is coupled to the processor 1220. Coupling in the embodiments of this application is an indirect coupling or a communication connection between apparatuses, units, or modules, may be in an electrical, a mechanical, or another form, and is used for information exchange between the apparatuses, the units, or the modules. The processor 1220 may cooperate with the memory 1230. The processor 1220 may execute the program instructions stored in the memory 1230. At least one of the at least one memory may be included in the processor.
The deduplication apparatus 1200 may further include a communications interface 1210 configured to communicate with another device through a transmission medium, so that the deduplication apparatus 1200 can communicate with the other device. For example, the other device may be a storage client or a storage device. The processor 1220 may send and receive data through the communications interface 1210.
This embodiment of this application does not limit a specific connection medium between the communications interface 1210, the processor 1220, and the memory 1230. In this embodiment of this application, the memory 1230, the processor 1220, and the communications interface 1210 are connected through a bus 1240 in
In this embodiment of this application, the processor 1220 may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed with reference to the embodiments of this application may be directly performed by using a hardware processor, or may be performed by using a combination of hardware in the processor and a software module.
In this embodiment of this application, the memory 1230 may be a non-volatile memory, for example, a hard disk drive (HDD) or a solid-state drive (SSD), or may be a volatile memory, for example, a random-access memory (RAM). Alternatively, the memory may be any other medium that can carry or store expected program code in a form of an instruction or a data structure and that can be accessed by a computer, but is not limited thereto. The memory in this embodiment of this application may alternatively be a circuit or any other apparatus that can implement a storage function, and is configured to store program instructions and/or data.
An embodiment of this application further provides a computer-readable storage medium including instructions. When the instructions are run on a computer, the computer is enabled to perform the method implemented by the storage server in the embodiment shown in
An embodiment of this application further provides a computer program product including instructions. When the instructions are run on a computer, the computer is enabled to perform the method implemented by the storage server in the embodiment shown in
An embodiment of this application provides a chip system. The chip system includes a processor, and may further include a memory, and is configured to implement a function of the storage server in the foregoing method. The chip system may include a chip, or may include a chip and another discrete device.
An embodiment of this application provides a storage system. The storage system includes a storage device and a storage server in the embodiment shown in
All or some of the foregoing methods in the embodiments of this application may be implemented by software, hardware, firmware, or any combination thereof. When software is used to implement the embodiments, all or some of the embodiments may be implemented in a form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or some of the procedures or the functions according to the embodiments of this application are generated. The computer may be a general-purpose computer, a dedicated computer, a computer network, a network device, user equipment, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or may be transmitted from a computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, a coaxial cable, an optical fiber, or a digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or a data center, integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), a semiconductor medium (for example, an SSD), or the like.
Number | Date | Country | Kind |
---|---|---|---|
201910748958.1 | Aug 2019 | CN | national |
This application is a continuation of International Application No. PCT/CN2020/104846, filed on Jul. 27, 2020, which claims priority to Chinese Patent Application No. 201910748958.1, filed on Aug. 14, 2019. The disclosures of the aforementioned applications are hereby incorporated by reference in their entireties.
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/CN2020/104846 | Jul 2020 | US |
Child | 17671224 | | US |