This application claims priority to Chinese Patent Application No. 202210146843.7, entitled “VOICE CHIP IMPLEMENTATION METHOD, VOICE CHIP, AND RELATED DEVICE”, and filed on Feb. 17, 2022.
The present disclosure relates to the technical field of artificial intelligence (AI), and in particular, to a method of voice chip implementation, a voice chip, an intelligent voice product, an electronic device, and a storage medium in fields such as intelligent voice and AI chips.
In a current intelligent voice product such as a smart speaker, a dual-chip design is generally adopted. That is, a chip processing function in the intelligent voice product is completed by using two voice chips, and the two chips are configured to complete different functions respectively. However, implementation costs are high due to the need to use two chips.
The present disclosure provides a method of voice chip implementation, a voice chip, an intelligent voice product, an electronic device, and a storage medium.
A method of voice chip implementation, including:
constructing a voice chip including a first Digital Signal Processor (DSP) and a second DSP, the first DSP and the second DSP corresponding to a same Digital Signal Processor core Intellectual Property (DSP core IP) and adopting heterogeneous designs; and
completing a chip processing function in a corresponding intelligent voice product by using the voice chip, wherein different functions are completed by using the first DSP and the second DSP respectively.
A voice chip, including: a first DSP and a second DSP;
the first DSP and the second DSP corresponding to a same DSP core IP; and
the voice chip being configured to realize a chip processing function in an intelligent voice product, wherein the first DSP and the second DSP adopt heterogeneous designs and are respectively configured to implement different functions in the intelligent voice product.
An intelligent voice product, including: the voice chip as described above.
An electronic device, including:
at least one processor; and
a memory in communication connection with the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method as described above.
A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are configured to cause a computer to perform the method as described above.
A computer program product, including a computer program/instruction, wherein, when the computer program/instruction is executed by a processor, the method as described above is performed.
One of the above disclosed embodiments has the following advantages or beneficial effects. The chip processing function in the corresponding intelligent voice product can be completed by using the constructed voice chip including two DSPs, and the two DSPs can adopt heterogeneous designs and be configured to complete different functions respectively, so as to realize replacement of the dual chips in the original intelligent voice product with a single chip, thereby reducing implementation costs. Besides, the two DSPs correspond to a same DSP core IP, thereby further reducing the implementation costs.
It should be understood that the content described in this part is neither intended to identify key or significant features of the embodiments of the present disclosure, nor intended to limit the scope of the present disclosure. Other features of the present disclosure will be made easier to understand through the following description.
The accompanying drawings are intended to provide a better understanding of the solutions and do not constitute a limitation on the present disclosure. In the drawings,
Exemplary embodiments of the present disclosure are illustrated below with reference to the accompanying drawings, which include various details of the present disclosure to facilitate understanding and should be considered only as exemplary. Therefore, those of ordinary skill in the art should be aware that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Similarly, for clarity and simplicity, descriptions of well-known functions and structures are omitted in the following description.
In addition, it should be understood that the term “and/or” herein merely describes an association relationship between associated objects, indicating that three relationships may exist. For example, A and/or B indicates three cases: A alone, A and B together, and B alone. Besides, the character “/” herein generally means that the associated objects before and after it are in an “or” relationship.
In step 101, a voice chip including a first DSP and a second DSP is constructed, the first DSP and the second DSP corresponding to a same DSP core IP and adopting heterogeneous designs.
In step 102, a chip processing function in a corresponding intelligent voice product is completed by using the voice chip, wherein different functions are completed by using the first DSP and the second DSP respectively.
As can be seen, in the solution of the above method embodiment, the chip processing function in the corresponding intelligent voice product can be completed by using the constructed voice chip including two DSPs, and the two DSPs can adopt heterogeneous designs and be configured to complete different functions respectively, so as to realize replacement of the dual chips in the original intelligent voice product with a single chip, thereby reducing implementation costs. Besides, the two DSPs correspond to a same DSP core IP, thereby further reducing the implementation costs.
For ease of expression, the two DSPs in the voice chip are called the first DSP and the second DSP respectively. Since the DSP core IP is costly, in the solution of the present disclosure, the first DSP and the second DSP may correspond to the same DSP core IP to reduce the implementation costs.
The two DSPs are configured to complete different functions respectively. In an embodiment of the present disclosure, the function completed by the first DSP includes: voice wake-up and voice recognition, and the function completed by the second DSP includes: an operating system, voice compression transmission, and extended wireless protocol connection.
For example, the operating system may be a Free Embedded Real-time Operating System (FreeRTOS), and a wireless protocol may include Bluetooth (BT), Wireless Fidelity (WIFI), and the like.
Different functions correspond to different requirements. For example, in order to realize the functions of voice wake-up and voice recognition, a required storage capacity is relatively fixed and relatively small, preferably 256 KB, but real-time requirements for programs and data are relatively high. However, in order to realize the functions such as the operating system, the voice compression transmission, and the extended wireless protocol connection, the required storage capacity is relatively large and not fixed, but the real-time requirements for programs and data are relatively low. To this end, in the solution of the present disclosure, the first DSP and the second DSP may adopt heterogeneous designs.
Specifically, in an embodiment of the present disclosure, the first DSP may adopt a standard configuration, including: a first DSP core, a first program memory, a first data memory, a first program cache (Icache), a first data cache (Dcache), a first advanced extensible interface master (AXI_M) bus interface, and a first advanced extensible interface slave (AXI_S) bus interface. The first program memory may be an instruction random access memory (IRAM), and the first data memory may be a data random access memory (DRAM); here, DRAM denotes the local data memory of the DSP rather than a dynamic random access memory.
In an embodiment of the present disclosure, the second DSP may adopt a non-standard configuration, including: a second DSP core, a second Icache, a second Dcache, and a second AXI_M bus interface.
For ease of expression, the DSP core in the first DSP is called the first DSP core, the DSP core in the second DSP is called the second DSP core, and others are similar thereto. Details are not described again.
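By way of a non-limiting illustration, the two configurations listed above may be sketched in Python as follows. The class name `DspConfig`, the component labels, and the value `"shared-core-ip"` are hypothetical and are chosen for illustration only; the sketch merely records which components each DSP includes and checks that the two DSPs share one core IP while differing in configuration.

```python
from dataclasses import dataclass

# Component labels mirror the text; the identifiers themselves are illustrative.
STANDARD_COMPONENTS = frozenset({
    "dsp_core", "program_memory", "data_memory",
    "icache", "dcache", "axi_m", "axi_s",
})
NON_STANDARD_COMPONENTS = frozenset({"dsp_core", "icache", "dcache", "axi_m"})


@dataclass(frozen=True)
class DspConfig:
    name: str
    core_ip: str           # hypothetical label for the licensed DSP core IP
    components: frozenset  # components present in this DSP


def is_heterogeneous_same_ip(a: DspConfig, b: DspConfig) -> bool:
    """True when both DSPs use one core IP but differ in their components."""
    return a.core_ip == b.core_ip and a.components != b.components


first_dsp = DspConfig("first DSP", "shared-core-ip", STANDARD_COMPONENTS)
second_dsp = DspConfig("second DSP", "shared-core-ip", NON_STANDARD_COMPONENTS)

# Components present in the first DSP but absent from the second DSP.
difference = first_dsp.components - second_dsp.components
```

In this sketch, `difference` contains the program memory, the data memory, and the AXI_S bus interface, which are the components present only in the standard configuration.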
For the first DSP and the second DSP, the DSP core therein may be configured to complete various calculation and control functions; the program memory may be configured to store a DSP running program and uses a local interface; the data memory may be configured to store DSP interaction data and uses a local interface; the Icache may be configured to store a DSP extended program or to access a program on an external device; the Dcache may be configured to store DSP extended data or to access data on the external device; the AXI_M bus interface may be configured for the DSP core to perform data operations on the external device; and the AXI_S bus interface may be configured for the external device to operate storage spaces of the program memory and the data memory.
Devices may be classified as master devices and slave devices. A master device takes the initiative: for example, it may read data from a slave device, write data to the slave device, and the like. A slave device does not take the initiative, and is read from or written to by the master device.
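As a minimal sketch of the master/slave relationship, the following Python example models a slave that only responds to requests and a master that initiates every read and write. The class names `SlaveMemory` and `Master` and the 16-byte size are assumptions made for illustration; the sketch does not model the actual AXI protocol.

```python
class SlaveMemory:
    """Passive device: it only responds to requests and never starts one."""

    def __init__(self, size: int) -> None:
        self._cells = bytearray(size)

    def handle_read(self, addr: int) -> int:
        return self._cells[addr]

    def handle_write(self, addr: int, value: int) -> None:
        self._cells[addr] = value


class Master:
    """Active device: every transfer is initiated from this side."""

    def __init__(self, slave: SlaveMemory) -> None:
        self._slave = slave

    def write(self, addr: int, value: int) -> None:
        self._slave.handle_write(addr, value)

    def read(self, addr: int) -> int:
        return self._slave.handle_read(addr)


slave = SlaveMemory(16)
master = Master(slave)
master.write(3, 0x5A)     # the master writes data into the slave
result = master.read(3)   # the master reads the data back from the slave
```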
As described above, the first DSP may adopt a standard configuration, that is, there is no need to modify the first DSP, thereby being compatible with existing implementations, reducing implementation complexity, and so on.
Compared with the first DSP, the program memory and the data memory are removed from the second DSP. Correspondingly, in an embodiment of the present disclosure, the second DSP may access the external device through the second Icache and the second Dcache to acquire required programs and data.
In an embodiment of the present disclosure, the external device is located in the voice chip, and may include a Double Data Rate (DDR) synchronous DRAM controller and/or a Pseudo Static Random Access Memory (PSRAM).
The external device, such as the DDR synchronous DRAM controller and/or the PSRAM, has a relatively large storage space, and can therefore meet the requirements of the second DSP for completing functions such as the operating system, the voice compression transmission, and the extended wireless protocol connection.
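The manner in which a DSP without local memories may obtain programs and data through a cache can be sketched as follows. The present disclosure does not specify a cache organization; the direct-mapped layout, the line size, the number of lines, and the class names `ExternalMemory` and `DirectMappedCache` are all assumptions made for illustration.

```python
class ExternalMemory:
    """Stands in for a large external store such as a PSRAM (illustrative)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.reads = 0  # number of line fetches served

    def read_line(self, base: int, size: int) -> bytes:
        self.reads += 1
        return self.data[base:base + size]


class DirectMappedCache:
    """Read-only direct-mapped cache in front of an external memory."""

    def __init__(self, backing: ExternalMemory,
                 num_lines: int = 4, line_size: int = 16) -> None:
        self.backing = backing
        self.num_lines = num_lines
        self.line_size = line_size
        self.lines = {}  # line index -> (tag, line bytes)
        self.hits = 0
        self.misses = 0

    def read(self, addr: int) -> int:
        line_no = addr // self.line_size
        index = line_no % self.num_lines
        tag = line_no // self.num_lines
        entry = self.lines.get(index)
        if entry is None or entry[0] != tag:
            # Miss: fetch the whole line from the external memory.
            self.misses += 1
            line = self.backing.read_line(line_no * self.line_size,
                                          self.line_size)
            entry = (tag, line)
            self.lines[index] = entry
        else:
            self.hits += 1
        return entry[1][addr % self.line_size]


psram = ExternalMemory(bytes(range(256)))
icache = DirectMappedCache(psram)
values = [icache.read(a) for a in (0x20, 0x21, 0x22)]
```

In this sketch, only the first of the three reads reaches the external memory; the other two are served from the cached line.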
The program memory and the data memory are removed from the second DSP, and the AXI_S bus interface may be configured for the external device to operate storage spaces of the program memory and the data memory. Therefore, correspondingly, the AXI_S bus interface in the second DSP can be removed.
In other words, the program memory, the data memory, and the AXI_S bus interface may be removed from the second DSP, thereby reducing implementation costs of the voice chip as a whole.
In an embodiment of the present disclosure, the second DSP may be further configured to share storage spaces of the first program memory and the first data memory through the second AXI_M bus interface and the first AXI_S bus interface, thereby increasing the storage space of the second DSP, improving efficiency of information interaction between the two DSPs, and so on.
In an actual application, simple address decoding may be performed on the first AXI_S bus interface, so that the second DSP can share the storage spaces of the first program memory and the first data memory through the second AXI_M bus interface and the first AXI_S bus interface.
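The address decoding mentioned above may be sketched as follows. The base addresses, the window sizes, and the function name `decode` are hypothetical; the disclosure does not give a memory map. The sketch maps an address arriving on the first AXI_S bus interface to either the first program memory or the first data memory, together with an offset inside that memory.

```python
from typing import Optional, Tuple

# Hypothetical memory map for the windows exposed on the first AXI_S interface.
PROGRAM_BASE, PROGRAM_SIZE = 0x0000_0000, 128 * 1024
DATA_BASE, DATA_SIZE = 0x0002_0000, 128 * 1024


def decode(addr: int) -> Optional[Tuple[str, int]]:
    """Return (target memory, offset) for an address, or None if unmapped."""
    if PROGRAM_BASE <= addr < PROGRAM_BASE + PROGRAM_SIZE:
        return ("first_program_memory", addr - PROGRAM_BASE)
    if DATA_BASE <= addr < DATA_BASE + DATA_SIZE:
        return ("first_data_memory", addr - DATA_BASE)
    return None


# An access issued by the second DSP through its AXI_M interface.
target = decode(0x0002_0004)
```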
Based on the above introduction,
In an actual application, in addition to the first DSP, the second DSP, and the DDR synchronous DRAM controller and/or the PSRAM, the voice chip constructed in the manner according to the present disclosure may further include some other components, as shown in
Compared with the first DSP and the second DSP, other components may all be referred to as an external device.
The NPU may be configured to implement a neural network algorithm. The DDR synchronous DRAM controller and/or the PSRAM are/is used as an external storage device, and may be configured to store programs and data required by the second DSP, and so on. The peripheral module may include communication interfaces such as a Serial Peripheral Interface (SPI), a high-speed Universal Asynchronous Receiver/Transmitter (UART) interface, and a Secure Digital Input and Output (SDIO) interface. The high-speed UART interface may be configured for BT connection, and the SDIO interface may be configured for connection of a WIFI module to realize a wireless communication extension function. The ROM and/or the flash may be configured to store a boot program of the DSP. For example, the first DSP may acquire the boot program from the ROM and/or the flash and, after completing a startup operation, release a startup right to the second DSP; the second DSP then acquires the boot program from the ROM and/or the flash, completes a startup operation, and so on. The voice input/output module may be configured for voice input and output, and may support devices in various audio formats such as an Inter-IC Sound (I2S) bus, Pulse Density Modulation (PDM), and Time Division Multiplexing (TDM).
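The startup handover between the two DSPs may be sketched as follows. The class name `BootArbiter`, the DSP labels, and the boot program name `"boot.bin"` are hypothetical; the sketch only enforces that a DSP may start when it holds the startup right and that the right passes to the next DSP once startup completes.

```python
class BootArbiter:
    """Tracks which DSP currently holds the startup right (illustrative)."""

    def __init__(self, order) -> None:
        self._order = list(order)
        self._index = 0
        self.log = []  # (dsp, boot program) pairs in startup order

    def holder(self):
        """Return the DSP holding the startup right, or None when all booted."""
        if self._index < len(self._order):
            return self._order[self._index]
        return None

    def boot(self, dsp: str, boot_program: str) -> None:
        if dsp != self.holder():
            raise RuntimeError(f"{dsp} does not hold the startup right")
        self.log.append((dsp, boot_program))  # startup operation completed
        self._index += 1                      # release the right to the next DSP


arbiter = BootArbiter(["first_dsp", "second_dsp"])
arbiter.boot("first_dsp", "boot.bin")
arbiter.boot("second_dsp", "boot.bin")
```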
As shown in
It is to be noted that, to make the description brief, the foregoing method embodiments are expressed as a series of actions. However, those skilled in the art should appreciate that the present disclosure is not limited to the described action sequence, because according to the present disclosure, some steps may be performed in other sequences or performed simultaneously. In addition, those skilled in the art should also appreciate that all the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily mandatory to the present disclosure.
The above is the introduction to the method embodiments. The following is a further illustration of the solutions of the present disclosure through apparatus embodiments.
The first DSP 501 and the second DSP 502 correspond to a same DSP core IP.
The voice chip 500 is configured to realize a chip processing function in an intelligent voice product, wherein the first DSP 501 and the second DSP 502 adopt heterogeneous designs and are respectively configured to implement different functions in the intelligent voice product.
In the solution of the above embodiment, the chip processing function in the corresponding intelligent voice product can be completed by using the constructed voice chip including two DSPs, and the two DSPs can adopt heterogeneous designs and be configured to complete different functions respectively, so as to realize replacement of the dual chips in the original intelligent voice product with a single chip, thereby reducing implementation costs. Besides, the two DSPs correspond to a same DSP core IP, thereby further reducing the implementation costs.
The two DSPs are configured to complete different functions respectively. In an embodiment of the present disclosure, the function completed by the first DSP 501 may include: voice wake-up and voice recognition, and the function completed by the second DSP 502 may include: an operating system, voice compression transmission, and extended wireless protocol connection.
Different functions correspond to different requirements. For example, in order to realize the functions of voice wake-up and voice recognition, a required storage capacity is relatively fixed and relatively small, but real-time requirements for programs and data are relatively high. However, in order to realize the functions such as the operating system, the voice compression transmission, and the extended wireless protocol connection, the required storage capacity is relatively large and not fixed, but the real-time requirements for programs and data are relatively low. To this end, in the solution of the present disclosure, the first DSP 501 and the second DSP 502 may adopt heterogeneous designs.
Specifically, in an embodiment of the present disclosure, the first DSP 501 may adopt a standard configuration, including: a first DSP core, a first program memory, a first data memory, a first Icache, a first Dcache, a first AXI_M bus interface, and a first AXI_S bus interface.
In an embodiment of the present disclosure, the second DSP 502 may adopt a non-standard configuration, including: a second DSP core, a second Icache, a second Dcache, and a second AXI_M bus interface.
Compared with the first DSP 501, the program memory, the data memory, and the AXI_S bus interface are removed from the second DSP 502.
Correspondingly, in an embodiment of the present disclosure, the second DSP 502 may further access the external device 503 through the second Icache and the second Dcache to acquire required programs and data.
In an embodiment of the present disclosure, the external device 503 may include: a DDR synchronous DRAM controller and/or a PSRAM.
The second DSP 502 in the embodiments shown in
Specific operation flows of the embodiments shown in
In brief, by use of the solution in the above embodiment, the dual chips in the original intelligent voice product can be replaced with a single chip, thereby reducing the implementation costs and making the product more competitive. In addition, the intelligent voice product may be a variety of intelligent voice products such as a smart speaker and a vehicle-mounted voice product, and has wide applicability.
The solutions of the present disclosure may be applied to the field of AI, and in particular, relate to fields such as intelligent voice and AI chips. AI is a discipline that studies how to make computers simulate certain thinking processes and intelligent behaviors (such as learning, reasoning, thinking, and planning) of human beings, which includes hardware technologies and software technologies. The AI hardware technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, and other technologies. The AI software technologies mainly include a computer vision technology, a speech recognition technology, a natural language processing technology, machine learning/deep learning, a big data processing technology, a knowledge graph technology, and other major directions.
The voice in the embodiments of the present disclosure is not specific to a specific user and does not reflect a specific user's personal information. In addition, the voice can be acquired in various public, legal and compliant manners, such as acquired from the user after the user's authorization.
Collection, storage, use, processing, transmission, provision, and disclosure of users' personal information involved in the technical solutions of the present disclosure comply with relevant laws and regulations, and do not violate public order and morals.
According to embodiments of the present disclosure, the present disclosure further provides an electronic device, a readable storage medium, and a computer program product.
As shown in
A plurality of components in the device 800 are connected to the I/O interface 805, including an input unit 806, such as a keyboard and a mouse; an output unit 807, such as various displays and speakers; a storage unit 808, such as disks and discs; and a communication unit 809, such as a network card, a modem and a wireless communication transceiver. The communication unit 809 allows the device 800 to exchange information/data with other devices over computer networks such as the Internet and/or various telecommunications networks.
The computing unit 801 may be a variety of general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various AI computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller or microcontroller, etc. The computing unit 801 performs the methods and processing described above, such as the method described in the present disclosure. For example, in some embodiments, the method described in the present disclosure may be implemented as a computer software program that is tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of a computer program may be loaded and/or installed on the device 800 via the ROM 802 and/or the communication unit 809. One or more steps of the method described in the present disclosure may be performed when the computer program is loaded into the RAM 803 and executed by the computing unit 801. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the method described in the present disclosure by any other appropriate means (for example, by means of firmware).
Various implementations of the systems and technologies disclosed herein can be realized in a digital electronic circuit system, an integrated circuit system, a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), an application-specific standard product (ASSP), a system on chip (SOC), a complex programmable logic device (CPLD), computer hardware, firmware, software, and/or combinations thereof. Such implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which can be special or general purpose, configured to receive data and commands from a storage system, at least one input apparatus, and at least one output apparatus, and to transmit data and commands to the storage system, the at least one input apparatus, and the at least one output apparatus.
Program codes configured to implement the methods in the present disclosure may be written in any combination of one or more programming languages. Such program codes may be supplied to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to enable the function/operation specified in the flowchart and/or block diagram to be implemented when the program codes are executed by the processor or controller. The program codes may be executed entirely on a machine, partially on a machine, as a stand-alone software package partially on a machine and partially on a remote machine, or entirely on a remote machine or a server.
In the context of the present disclosure, the machine-readable medium may be a tangible medium which may include or store programs for use by or in conjunction with an instruction execution system, apparatus or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or any suitable combination thereof. More specific examples of a machine-readable storage medium may include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a RAM, a ROM, an erasable programmable read only memory (EPROM or flash memory), an optical fiber, a compact disk read only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination thereof.
To provide interaction with a user, the systems and technologies described here can be implemented on a computer. The computer has: a display apparatus (e.g., a cathode-ray tube (CRT) or a liquid crystal display (LCD) monitor) for displaying information to the user; and a keyboard and a pointing apparatus (e.g., a mouse or trackball) through which the user may provide input for the computer. Other kinds of apparatuses may also be configured to provide interaction with the user. For example, a feedback provided for the user may be any form of sensory feedback (e.g., visual, auditory, or tactile feedback); and input from the user may be received in any form (including sound input, speech input, or tactile input).
The systems and technologies described herein can be implemented in a computing system including back-end components (e.g., a data server), or a computing system including middleware components (e.g., an application server), or a computing system including front-end components (e.g., a user computer with a graphical user interface or web browser through which the user can interact with implementations of the systems and technologies described here), or a computing system including any combination of such back-end components, middleware components or front-end components. The components of the system can be connected to each other through any form or medium of digital data communication (e.g., a communication network). Examples of the communication network include: a local area network (LAN), a wide area network (WAN), and the Internet.
The computer system may include a client and a server. The client and the server are generally far away from each other and generally interact via the communication network. A relationship between the client and the server is generated through computer programs that run on a corresponding computer and have a client-server relationship with each other. The server may be a cloud server, a distributed system server, or a server combined with blockchain.
It should be understood that the steps can be reordered, added, or deleted using the various forms of processes shown above. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different sequences, provided that desired results of the technical solutions disclosed in the present disclosure are achieved, which is not limited herein.
The above specific implementations do not limit the scope of protection of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and replacements can be made according to design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principle of the present disclosure all should be included in the scope of protection of the present disclosure.
Number | Date | Country | Kind |
---|---|---|---|
202210146843.7 | Feb 2022 | CN | national |

Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2022/101760 | 6/28/2022 | WO | |