1. Field of the Invention
The present invention relates to a sensor network system, such as a microphone array network system provided for acquiring speech of high sound quality, and to a communication method therefor.
2. Description of the Related Art
Conventionally, in application systems that utilize vocal sound (e.g., an audio teleconference system in which a plurality of microphones are connected, a speech recognition robot system, or a system having various speech interfaces), various speech processing practices such as speech source localization, speech source separation, noise cancellation and echo cancellation are performed so as to utilize the vocal sound with high sound quality. In particular, microphone arrays mainly intended for the processing of speech source localization and speech source separation have been broadly researched for the purpose of acquiring vocal sound with high sound quality. In this case, the speech source localization specifies the direction and position of a speech source from sound arrival time differences, and the speech source separation extracts a specific speech source in a specific direction by suppressing sound sources that become noises, utilizing the results of the speech source localization.
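As a purely illustrative aid (not the method of any cited document), the arrival-time-difference principle mentioned above can be sketched for a single microphone pair; the sampling rate, microphone spacing and helper names below are assumptions:

```python
# A minimal cross-correlation TDOA sketch: the lag of the correlation peak
# between two microphone signals gives the time difference of arrival,
# which maps to an arrival angle for a far-field source.
import numpy as np

def estimate_arrival_angle(x1, x2, fs, mic_spacing, c=343.0):
    """Arrival angle (radians) from the TDOA between two microphone signals."""
    corr = np.correlate(x1, x2, mode="full")      # cross-correlation
    lag = np.argmax(corr) - (len(x2) - 1)         # numpy's lag convention
    tau = -lag / fs                               # >0: sound reaches mic 1 first
    # Far-field geometry: tau = (d / c) * sin(theta).
    return np.arcsin(np.clip(c * tau / mic_spacing, -1.0, 1.0))

fs, d = 16000, 0.05                               # 16 kHz, 5 cm spacing (assumed)
s = np.random.randn(1024)
x1, x2 = s, np.roll(s, 2)                         # mic 2 hears the sound 2 samples later
print(np.degrees(estimate_arrival_angle(x1, x2, fs, d)))   # ~59 degrees
```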
It has been known that speech processing using microphone arrays normally improves its performance in noise processing and the like as the number of microphones increases. Moreover, in such speech processing, there are a number of speech source localization techniques using the position information of a speech source (See, for example, the Non-Patent Document 1). The speech processing becomes more effective as the results of the speech source localization become more accurate. In other words, it is required to concurrently improve the accuracy of the speech source localization and the noise cancellation intended for higher sound quality by increasing the number of microphones.
In a speech source localization method using a conventional large-scale microphone array, the positional range of a speech source is divided into mesh-shaped intervals, and the speech source position is stochastically calculated for each interval. For this calculation, there has been the practice of collecting all the speech data in one place, in a speech processing server such as a workstation, and collectively processing all the speech data to estimate the position of the speech source (See, for example, the Non-Patent Document 2). In the case of such collective processing of all the speech data, the signal wiring length and the communication traffic between the microphones for vocal sound collection and the speech processing server, and the calculation amount in the speech processing server, have been vast. There is such a problem that the microphones cannot be increased in number due to the following:
(a) the increase in the wiring length, the communication traffic and the calculation amount in the speech processing server; and
(b) such a physical limitation that a large number of A/D converters cannot be arranged in the one place of the speech processing server.
Moreover, there is also a problem of noise occurring due to the increase in the signal wiring length. Therefore, it has been difficult to increase the number of microphones for higher sound quality.
As a method for improving on the above problems, there has been known a speech processing system with a microphone array in which a plurality of microphones are grouped into small arrays that are then aggregated (See, for example, the Non-Patent Document 3). However, even in such a speech processing system, the speech data of all the microphones obtained in the small arrays are aggregated into a speech processing server in one place via a network, and this leads to a problem of an increase in the communication traffic of the network. Moreover, there is such a problem that a speech processing delay occurs in accordance with the increase in the communication data amount and the communication traffic.
Moreover, in order to satisfy demands for sound pickup in future ubiquitous systems and television conference systems, a greater number of microphones are necessary (See, for example, the Patent Document 1). However, in the current network system with a microphone array as described above, the speech data obtained by the microphone array is merely transmitted to the server as it is. The present inventors have found no system in which the node devices of a microphone array mutually exchange position information of the speech source so as to reduce the calculation amount of the entire system and the communication traffic of the network. Therefore, a system architecture that reduces the calculation amount of the entire system and suppresses the communication traffic of the network, on the assumption of an increase in the scale of the microphone array network system, becomes important.
As described above, it has been demanded to improve the speech source localization accuracy by using a large number of microphone arrays while suppressing the communication traffic and the calculation amount in the speech processing server, and to effectively perform the speech processing of noise cancellation and so on. Moreover, position measurement systems using a speech source have been proposed in recent years. For example, the Patent Document 2 discloses computing the position of an ultrasonic tag by using the ultrasonic tag and a microphone array. Further, the Patent Document 3 discloses sound pickup by using a microphone array.
Prior art documents related to the present invention are as follows:
Patent Document 1: Japanese patent laid-open publication No. JP 2008-113164 A;
Patent Document 2: Pamphlet of International Publication No. WO 2008/026463 A1;
Patent Document 3: Japanese patent laid-open publication No. JP 2008-058342 A; and
Patent Document 4: Japanese patent laid-open publication No. JP 2008-099075 A.
Non-Patent Document 1: Ralph O. Schmidt, “Multiple Emitter Location and Signal Parameter Estimation”, IEEE Transactions on Antennas and Propagation, Vol. AP-34, No. 3, pp. 276-280, March 1986.
Non-Patent Document 2: Eugene Weinstein et al., “Loud: A 1020-node modular microphone array and beamformer for intelligent computing spaces”, MIT, MIT/LCS Technical Memo MIT-LCS-TM-642, April 2004.
Non-Patent Document 3: Alessio Brutti et al., “Classification of Acoustic Maps to Determine Speaker Position and Orientation from a Distributed Microphone Network”, In Proceedings of ICASSP, Vol. IV, pp. 493-496, April 2007.
Non-Patent Document 4: Wendi Rabiner Heinzelman et al., “Energy-Efficient Communication Protocol for Wireless Microsensor Networks”, Proceedings of the 33rd Hawaii International Conference on System Sciences, 2000, Vol. 8, pp. 1-10, January 2000.
Non-Patent Document 5: Vivek Katiyar et al., “A Survey on Clustering Algorithms for Heterogeneous Wireless Sensor Networks”, International Journal of Advanced Networking and Applications, Vol. 02, Issue 04, pp. 745-754, 2011.
Non-Patent Document 6: J. Benesty et al., “Springer Handbook of Speech Processing”, Springer, 50. Microphone Arrays, pp. 1021-1041, 2008.
Non-Patent Document 7: Futoshi Asano et al., “Sound Source Localization and Signal Separation for Office Robot “Jijo-2””, Proceedings of the 1999 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, Taipei, Taiwan, R.O.C., pp. 243-248, August 1999.
Non-Patent Document 8: Miklos Maroti et al., “The Flooding Time Synchronization Protocol”, Proceedings of 2nd ACM SenSys, pp. 39-49, November 2004.
Non-Patent Document 9: Takashi Takeuchi et al., “Cross-Layer Design for Low-Power Wireless Sensor Node Using Wave Clock”, IEICE Transactions on Communications, Vol. E91-B, No. 11, pp. 3480-3488, November 2008.
Non-Patent Document 10: Maleq Khan et al., “Distributed Algorithms for Constructing Approximate Minimum Spanning Trees in Wireless Sensor Networks”, IEEE Transactions on Parallel and Distributed Systems, Vol. 20, No. 1, pp. 124-139, January 2009.
Non-Patent Document 11: Wei Ye et al., “Medium Access Control With Coordinated Adaptive Sleeping for Wireless Sensor Networks”, IEEE/ACM Transactions on Networking, Vol. 12, No. 3, pp. 493-506, June 2004.
However, the position measurement functions of the GPS and WiFi systems mounted on many mobile terminals have such a problem that the positional relation between terminals at a short distance of tens of centimeters cannot be acquired, even though a rough position on a map can be acquired.
For example, the Non-Patent Document 4 discloses a communication protocol to perform wireless communications by efficiently using transmission energy in a wireless sensor network. Moreover, the Non-Patent Document 5 discloses using a clustering technique for lengthening the lifetime of the sensor network as a method for reducing the energy consumption in a wireless sensor network.
However, the prior art clustering method, which is a technique limited to the network layer, considers neither the object of sensing (the application layer) nor the hardware configuration of the node devices. This leads to such a problem that the prior art technique is not suited to an application that needs to configure paths based on the actual physical position of the signal source.
An object of the present invention is to solve the aforementioned problems and to provide a sensor network system, such as a microphone array network system, capable of performing data aggregation more efficiently than in the prior art, remarkably reducing the network traffic, and reducing the power consumption of the sensor node devices, as well as a communication method therefor.
In order to achieve the aforementioned objective, according to one aspect of the present invention, there is provided a sensor network system including a plurality of node devices each having a sensor array and known position information. The node devices are connected with each other in a network via predetermined propagation paths by using a predetermined communication protocol, and the sensor network system collects data measured at each of the node devices so that the data are aggregated into one base station in a time-synchronized manner. Each of the node devices includes a sensor array, a direction estimation processor part, and a communication processor part. The sensor array is configured by arranging a plurality of sensors in an array form. The direction estimation processor part operates, upon detecting a signal from a predetermined signal source received by the sensor array, to transmit a detection message to the base station, to estimate an arrival direction angle of the signal, and to transmit an angle estimation value to the base station; the direction estimation processor part is also activated, in response to an activation message received via a predetermined number of hops from another node device that has detected a signal, to estimate an arrival direction angle of the signal and transmit an angle estimation value to the base station. The communication processor part performs an emphasizing process on the signal from the predetermined signal source received by the sensor array for each of the node devices belonging to a cluster designated by the base station in correspondence with the signal source, and transmits the signal that has undergone the emphasizing process to the base station. The base station calculates a position of the signal source on the basis of the angle estimation value of the signal from each of the node devices and the position information of each of the node devices, designates the node device located nearest to the signal source as a cluster head node device, and transmits information of the position of the signal source and the designated cluster head node device to each of the node devices, thereby clustering each of the node devices located within the number of hops from the cluster head node device as a node belonging to each cluster. Each of the node devices performs the emphasizing process on the signal from the predetermined signal source received by the sensor array for each of the node devices belonging to the cluster designated by the base station, and transmits the signal that has undergone the emphasizing process to the base station.
In the above-mentioned sensor network system, each of the node devices is set into a sleep mode before detecting the signal and before receiving the activation message, and power supply to circuits other than a circuit that detects the signal and a circuit that receives the activation message is stopped.
In addition, in the above-mentioned sensor network system, the sensor is a microphone that detects speech.
According to another aspect of the present invention, there is provided a communication method for use in a sensor network system including a plurality of node devices each having a sensor array and known position information. The node devices are connected with each other in a network via predetermined propagation paths by using a predetermined communication protocol, and the sensor network system collects data measured at each of the node devices so that the data are aggregated into one base station in a time-synchronized manner. Each of the node devices includes a sensor array, a direction estimation processor part, and a communication processor part. The sensor array is configured by arranging a plurality of sensors in an array form. The direction estimation processor part operates, upon detecting a signal from a predetermined signal source received by the sensor array, to transmit a detection message to the base station, to estimate an arrival direction angle of the signal, and to transmit an angle estimation value to the base station; the direction estimation processor part is also activated, in response to an activation message received via a predetermined number of hops from another node device that has detected a signal, to estimate an arrival direction angle of the signal and transmit an angle estimation value to the base station. The communication processor part performs an emphasizing process on the signal from the predetermined signal source received by the sensor array for each of the node devices belonging to a cluster designated by the base station in correspondence with the signal source, and transmits the signal that has undergone the emphasizing process to the base station. The communication method includes the following steps:
calculating by the base station a position of the signal source on the basis of the angle estimation value of the signal from each of the node devices and position information of each of the node devices, designating a node device located nearest to the signal source as a cluster head node device, and transmitting information of the position of the signal source and the designated cluster head node device to each of the node devices, thereby clustering each of the node devices located within the number of hops from the cluster head node device as a node device belonging to each cluster, and
performing an emphasizing process by each of the node devices on the signal from the predetermined signal source received by the sensor array, for each of the node devices belonging to the cluster designated by the base station in correspondence with the signal source, and transmitting the signal that has undergone the emphasizing process to the base station.
The above-mentioned communication method further includes a step of setting each of the node devices into a sleep mode before detecting the signal and before receiving the activation message, and stopping power supply to circuits other than a circuit that detects the signal and a circuit that receives the activation message.
In addition, in the above-mentioned communication method, the sensor is a microphone that detects speech.
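The clustering step summarized above can be pictured with a short sketch. The least-squares triangulation, the hop-count adjacency list, and all helper names below are illustrative assumptions, not the claimed method:

```python
# Illustrative base-station step: locate the signal source from the nodes'
# angle estimates and known positions, designate the nearest node as the
# cluster head, and cluster the nodes within a hop limit of the head.
import numpy as np
from collections import deque

def triangulate(positions, angles):
    """Least-squares intersection of the nodes' bearing lines (an assumed
    estimator; any consistent position calculation could be substituted)."""
    # A line through p with bearing a has unit normal n = (-sin a, cos a),
    # and every point x on that line satisfies n . x = n . p.
    n = np.stack([-np.sin(angles), np.cos(angles)], axis=1)
    b = np.einsum("ij,ij->i", n, positions)
    return np.linalg.lstsq(n, b, rcond=None)[0]

def form_cluster(positions, adjacency, source, max_hops):
    """Nearest node becomes cluster head; BFS collects nodes within max_hops."""
    head = int(np.argmin(np.linalg.norm(positions - source, axis=1)))
    hops, frontier = {head: 0}, deque([head])
    while frontier:
        u = frontier.popleft()
        if hops[u] == max_hops:
            continue
        for v in adjacency[u]:
            if v not in hops:
                hops[v] = hops[u] + 1
                frontier.append(v)
    return head, sorted(hops)

pos = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])
ang = np.array([np.arctan2(1.0, 3.0),      # node 0 sees the source at (3, 1)
                np.arctan2(1.0, -1.0),
                np.arctan2(-2.0, 3.0)])
src = triangulate(pos, ang)                # ~ [3. 1.]
print(src, form_cluster(pos, {0: [1, 2], 1: [0], 2: [0]}, src, max_hops=1))
```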
Therefore, according to the sensor network system and the communication method therefor of the present invention, network paths specialized for data aggregation and matched to the physical arrangement of a plurality of signal sources are configured by utilizing the signal of the object of sensing for the clustering, the cluster head determination, and the routing on the sensor network. Redundant paths are thereby reduced, and the efficiency of data aggregation is improved at the same time. Moreover, by virtue of the reduced communication overhead for configuring the paths, the network traffic is reduced, and the operating time of the communication circuit, which consumes large power, can be shortened. Therefore, in comparison to the prior art, the data aggregation can be performed more efficiently, the network traffic can be remarkably reduced, and the power consumption of the sensor node devices can be reduced.
These and other objects and features of the present invention will become clear from the following description taken in conjunction with the preferred embodiments thereof with reference to the accompanying drawings throughout which like parts are designated by like reference numerals, and in which:
Preferred embodiments of the present invention will be described below with reference to the drawings. In the following preferred embodiments, like components are denoted by like reference numerals.
As described above in connection with the prior art, an independent distributed routing algorithm is indispensable in a sensor network configured to include a large number of node devices. A plurality of signal sources of the object of sensing exist in a sensing area, and routing using clustering is effective for configuring optimal paths for them. According to the preferred embodiments of the present invention, described below are a sensor network system that is relevant to a microphone array network system intended for acquiring speech of high sound quality and is capable of efficiently performing data aggregation by using a speech source localization system, and a communication method therefor.
First Preferred Embodiment
Referring to the drawings, each sensor node device of the first preferred embodiment is configured to include the following components:
(1) an AD converter circuit 51 connected to a plurality of sound pickup microphones 1;
(2) a speech detection processor part (a processor part for voice activity detection, hereinafter referred to as a VAD processor part) 52 connected to the AD converter circuit 51 to detect a speech signal;
(3) an SRAM (Static Random Access Memory) 54, which temporarily stores a speech signal, or a sound signal or the like (the sound signal means a signal at an audio frequency of, for example, 500 Hz, or an ultrasonic signal), that has been subjected to AD conversion by the AD converter circuit 51;
(4) an SSL processor part 55, which executes speech source localization processing to estimate the position of a speech source for the digital data of a speech signal or the like outputted from the SRAM 54, and outputs the results to the SSS processor part 56;
(5) an SSS processor part 56, which executes a speech source separation process to extract a specific speech source for the digital data of the speech signal or the like outputted from the SRAM 54 and the SSL processor part 55, and collects speech data of high SNR obtained as the results of the process by transceiving the data to and from other node devices via a network interface circuit 57; and
(6) a network interface circuit 57, which configures a data communication part to transceive speech data, and is connected to other peripheral sensor node devices Nn (n=1, 2, . . . , N).
The sensor node devices Nn (n=0, 1, 2, . . . , N) have the same configuration as each other, and the sensor node device N0 of the base station can obtain speech data whose SNR is further improved by aggregating the speech data in the network. It is noted that the VAD processor part 52 and a power supply manager part 53 are used for the speech source localization of the first preferred embodiment, whereas they are not used in principle in the position estimation of the second preferred embodiment. Moreover, the distance estimation described later is executed in, for example, the SSL processor part 55.
In the system configured as above, the input speech data from the 16 microphones 1 is digitized by the AD converter circuit 51, and the speech data is stored into the SRAM 54. Subsequently, the data is used for the speech source localization and the speech source separation. The power management for this speech processing is executed by the power supply manager part 53, which saves standby electricity, and the VAD processor part 52. The speech processor part is turned off when no speech exists in the periphery of the microphone array; this power management is essential because the large number of microphones 1 would otherwise waste much power when not in use.
The distinguished features of the present system are as follows.
(1) In order to activate the entire node device, low-power speech activity detection is performed.
(2) For the speech source localization, the direction and position of the speech source are estimated.
(3) In order to reduce the sound noise level, the speech source separation process is performed.
Moreover, the sub-array node devices are mutually connected to support intercommunications. Therefore, the speech data obtained at the node devices can be collected to further improve the SNR of the speech source. In the present system, a number of microphone arrays are configured via interactions with the peripheral node devices. Therefore, calculation can be distributed among the node devices. The present system has scalability (extendability) in the aspect of the number of microphones. Moreover, each of the node devices executes preparatory processing for the picked-up speech data.
The microphone array network of the present preferred embodiment is configured to include a large number of microphones, whose power consumption easily becomes enormous. An intelligent microphone array system according to the present preferred embodiment is required to operate with a limited energy source so as to save power as far as possible. Since the speech processing unit and the microphone amplifier consume power to a certain extent even when the environment is quiet, power-saving speech processing is effective. Although the present inventors have previously proposed a low-power-consumption VAD hardware implementation to reduce the standby electricity of the sub-arrays in a conventional apparatus, a zero-cross algorithm for the VAD is used in the present preferred embodiment.
According to the VAD of the present inventors, the sampling frequency can be reduced to 2 kHz, and the bit count per sample can be set to 10 bits. A single microphone is sufficient for detecting a signal, and therefore the remaining 15 microphones are likewise turned off. These values are sufficient for detecting human speech, and in this case the implementation in a 0.18-μm CMOS process consumes only 3.49 μW.
By separating the low-power VAD processor part 52 from the speech processor part, the speech processor part (the SSL processor part 55, the SSS processor part 56, etc.) can be turned off by using the power supply manager part 53. Further, not all the VAD processor parts 52 of all the node devices are required to operate; the VAD processor part 52 is activated in merely a limited number of node devices in the system. In the VAD processor part 52, upon detection of a speech signal, the processing relevant to the main signal starts execution, and the sampling frequency and the bit count are increased to sufficient values. It is noted that the parameters that determine the analog factors in the specifications of the AD converter circuit 51 can be changed in accordance with the specific application integrated in the system.
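A minimal sketch of a zero-cross style detector follows; the frame size, amplitude gate, and threshold are illustrative assumptions, not the parameters of the implemented hardware:

```python
# Zero-cross style VAD sketch: a frame is flagged as speech when the signal
# crosses the +/-amp band enough times; low-level noise stays below the gate.
import numpy as np

def zero_cross_vad(x, frame=64, amp=0.05, min_crossings=4):
    """Return one True/False flag per frame of the input signal."""
    flags = []
    for i in range(0, len(x) - frame + 1, frame):
        f = x[i:i + frame]
        gated = np.where(np.abs(f) > amp, np.sign(f), 0)   # ignore weak samples
        active = gated[gated != 0]                         # kept samples (+/-1)
        crossings = np.count_nonzero(np.diff(active))      # sign changes
        flags.append(crossings >= min_crossings)
    return flags

fs = 2000                                    # low-rate sampling, as in the text
t = np.arange(fs) / fs
speech = 0.3 * np.sin(2 * np.pi * 200 * t)   # toy stand-in for a voiced sound
noise = 0.01 * np.random.randn(fs)
print(any(zero_cross_vad(noise)), any(zero_cross_vad(noise + speech)))
```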
Next, a distributedly arranged speech capturing process is described below. Techniques for this process are roughly classified into the following two:
(1) a technique using geometrical position information; and
(2) a statistical technique, which uses no position information, to emphasize the main speech source.
The system of the present preferred embodiment is premised on the fact that the node device positions in the network are known, and therefore, a delay-sum beamforming algorithm, which is classified as a geometrical method, is adopted (See, for example, the Non-Patent Document 6).
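For reference, delay-and-sum beamforming time-aligns each microphone channel according to its propagation delay from an assumed source position and averages the aligned channels. The following is a minimal sketch under assumed geometry and variable names, not the implementation of the present system:

```python
# Delay-and-sum sketch: advance each channel by its extra propagation delay
# (in samples) to the source, then average; the source adds coherently.
import numpy as np

def delay_and_sum(signals, mic_pos, src_pos, fs, c=343.0):
    """signals: (channels, samples); mic_pos: (channels, dims); src_pos: (dims,)."""
    dists = np.linalg.norm(mic_pos - src_pos, axis=1)
    delays = np.round((dists - dists.min()) / c * fs).astype(int)  # samples
    n = signals.shape[1] - delays.max()
    aligned = np.stack([sig[d:d + n] for sig, d in zip(signals, delays)])
    return aligned.mean(axis=0)    # coherent for the source, incoherent for noise

mics = np.array([[0.0, 0.0], [0.05, 0.0], [0.10, 0.0]])   # 5 cm pitch (assumed)
out = delay_and_sum(np.random.randn(3, 1024), mics, np.array([1.0, 2.0]), 16000)
print(out.shape)
```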
Detailed descriptions are provided below separately for a case of two-dimensional speech source localization and a case of three-dimensional speech source localization. First of all, the two-dimensional speech source localization method of the present invention is described with reference to the drawings.
Then, weighting is performed for the intersections of the speech source direction estimation results of two node devices between the node device 1 and the node device 2, between the node device 1 and the node device 3, and so on. In this case, the weight is determined on the basis of the maximum response intensity of the MUSIC method of each of the node devices (e.g., the product of the maximum response intensities of the two node devices).
The balloons (positions and scales) that represent the plurality of obtained weights become speech source position candidates. Then, the speech source position is estimated by obtaining the barycenter of the plurality of obtained speech source position candidates.
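The two-dimensional procedure above can be sketched as follows; the helper names are illustrative, the intersections are computed by elementary line geometry, and the MUSIC maximum response intensities are taken as given inputs:

```python
# 2D localization sketch: intersect the bearing lines of every node pair,
# weight each intersection by the product of the two nodes' maximum response
# intensities, and return the weighted barycenter of the candidates.
import numpy as np
from itertools import combinations

def locate_2d(positions, angles, intensities):
    candidates, weights = [], []
    for i, j in combinations(range(len(positions)), 2):
        di = np.array([np.cos(angles[i]), np.sin(angles[i])])
        dj = np.array([np.cos(angles[j]), np.sin(angles[j])])
        A = np.column_stack([di, -dj])          # solve p_i + t*di = p_j + s*dj
        if abs(np.linalg.det(A)) < 1e-9:        # (nearly) parallel bearings
            continue
        t, _ = np.linalg.solve(A, positions[j] - positions[i])
        candidates.append(positions[i] + t * di)
        weights.append(intensities[i] * intensities[j])
    c, w = np.array(candidates), np.array(weights)
    return (w[:, None] * c).sum(axis=0) / w.sum()   # weighted barycenter

pos = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 4.0]])
src = np.array([3.0, 2.0])
ang = [np.arctan2(*(src - p)[::-1]) for p in pos]         # exact bearings
print(locate_2d(pos, ang, intensities=[1.0, 0.8, 0.9]))   # ~ [3. 2.]
```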
The three-dimensional speech source localization method of the present invention is described next with reference to the drawings.
Then, weighting is performed for the intersections of the speech source direction estimation results of two node devices between the node device 1 and the node device 2, between the node device 1 and the node device 3, and so on. However, it is often the case in three dimensions that no intersection can be obtained. Therefore, a virtual intersection is obtained on the line segment that connects, at the shortest distance, the straight lines of the speech source direction estimation results of the two node devices. It is noted that the weight is determined on the basis of the maximum response intensity of the MUSIC method at each of the node devices (e.g., the product of the maximum response intensities of the two node devices), in a manner similar to that of the two-dimensional case.
The balloons (positions and scales) that represent the plurality of obtained weights become speech source position candidates. Then, the speech source position is estimated by obtaining the barycenter of the plurality of obtained speech source position candidates.
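For the three-dimensional case, the virtual intersection of two bearing lines can be taken as the midpoint of their shortest connecting segment. A minimal sketch using the standard closest-point formulas and assumed names follows:

```python
# "Virtual intersection" sketch: midpoint of the shortest segment between
# two (generally skew) 3D bearing lines p1 + t*d1 and p2 + s*d2.
import numpy as np

def virtual_intersection(p1, d1, p2, d2):
    d1, d2 = d1 / np.linalg.norm(d1), d2 / np.linalg.norm(d2)
    n = np.cross(d1, d2)
    nn = np.dot(n, n)
    if nn < 1e-12:                          # parallel bearings: no useful point
        return None
    r = p2 - p1
    t = np.dot(np.cross(r, d2), n) / nn     # closest-approach parameters
    s = np.dot(np.cross(r, d1), n) / nn
    return 0.5 * ((p1 + t * d1) + (p2 + s * d2))

p = virtual_intersection(np.zeros(3), np.array([1.0, 0.0, 0.0]),
                         np.array([1.0, -1.0, 0.0]), np.array([0.0, 1.0, 0.0]))
print(p)   # these lines actually meet at (1, 0, 0), so the midpoint is ~[1. 0. 0.]
```

The returned points can then be weighted and averaged exactly as in the two-dimensional sketch above.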
One implemental example of the present invention is described.
Then, the processor 4 of the speech pickup processor part 2 transmits the speech source direction estimation results and the maximum response intensity to the speech processing server 20.
As described above, speech localization is distributedly performed in each of the node devices, the results are integrated in the speech processing server, and the aforementioned two-dimensional localization and the three-dimensional localization processing are performed to estimate the position of the speech source.
The node device having the microphone array subjects the signal from the microphone array to A/D conversion (step S11), and receives the speech pickup signal of each microphone as an input (step S13). By using the speech signals picked up from the microphones, the direction of the speech source is estimated by the processor mounted on the node device operating as the speech pickup processor part (step S15).
In the speech processing server, the data sent from respective node devices are received (step S21). A plurality of speech source position candidates are calculated from the maximum response intensity of each of the node devices (step S23). Then, the position of the speech source is estimated on the basis of the speech source direction estimation result (A) and the maximum response intensity (B) (step S25).
The three-dimensional speech source localization accuracy is described below.
In three Cases A to C, the number of node devices and the dispersion of the speech source direction estimation errors of the node devices were changed, and the results of three-dimensional position estimation were compared with one another. In the three-dimensional position estimation, each of the node devices selects one other party of communication at random and obtains a virtual intersection.
The results of the measurement are shown in the drawings.
Another implemental example of the present invention is described.
Then, the processor 4 of the speech pickup processor part 2 exchanges the data of the speech source direction estimation results with adjacent node devices and other node devices. The processor 4 of the speech pickup processor part 2 executes the processing of the aforementioned two-dimensional localization or three-dimensional localization from the speech source direction estimation results and the maximum response intensities of the plurality of node devices including the self-node device, and estimates the position of the speech source.
Second Preferred Embodiment
The sensor node device has the configuration shown in the drawings.
In order to aggregate speech data among the sensor node devices N0 to N2 in the present preferred embodiment, it is required to synchronize the time (timer value) at all the sensor node devices in the network. In the present preferred embodiment, a synchronization technique configured by adding linear interpolation to the known flooding time synchronization protocol (FTSP) is used. The FTSP achieves high-accuracy synchronization only by simple one-way communications. Although the synchronization accuracy of the FTSP is one microsecond or less between adjacent sensor node devices, there are variations among the quartz oscillators of the sensor node devices, and a time deviation disadvantageously grows with the lapse of time after the synchronization process.
In the proposed system of the present preferred embodiment, the time deviation between sensor node devices is recorded at each time synchronization by the FTSP, and the time progress of the timer is adjusted by linear interpolation. Using the reception time stamp recorded at the first synchronization as the reference timer value on the receiving side, the time progress of the timer is adjusted over the interval up to the time stamp of the second synchronization, so that the dispersion of the oscillation frequency can be corrected. With this arrangement, the time deviation after completion of the synchronization can be suppressed to within 0.17 microseconds per second. Even if the time synchronization by the FTSP occurs only once per minute, the time deviation between sensor node devices is suppressed to within 10 microseconds by performing the linear interpolation, and the performance of the speech source separation can be maintained.
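A minimal sketch of the linear-interpolation idea follows, assuming two (local, master) time-stamp pairs from successive synchronization rounds and illustrative names; it is not the FTSP implementation itself:

```python
# Clock-skew correction sketch: two time-stamp pairs give the rate at which
# the master clock advances per local tick; later local timer values are
# then mapped onto the master time line by linear interpolation/extrapolation.
class SkewCorrectedClock:
    def __init__(self, local1, master1, local2, master2):
        self.rate = (master2 - master1) / (local2 - local1)  # master per local tick
        self.local_ref, self.master_ref = local2, master2

    def master_time(self, local_now):
        """Interpolated master time for a local timer value."""
        return self.master_ref + (local_now - self.local_ref) * self.rate

# Example: the local oscillator runs 50 ppm fast (an assumed figure).
clk = SkewCorrectedClock(0.0, 0.0, 60.00300, 60.0)
print(clk.master_time(120.00600))   # ~120.0 despite the oscillator skew
```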
By storing a relative time (e.g., the elapsed time on the assumption that the time when the first sensor node device is turned on is zero) or an absolute time (e.g., the day, hour, minute and second on a calendar), the time synchronization is performed among the sensor node devices by the aforementioned method. The time synchronization is used for measuring the accurate distance between sensor node devices, as described later.
Referring to the drawings, in step S31, the distances between the tablet T1 and each of the tablets T2, T3 and T4 are measured, and the angles to the tablet T1 are estimated, calculated and stored by the SSL processing of the tablets T2, T3 and T4, following the same pattern as steps S32 to S34 described below.
Subsequently, in step S32, the tablet T1 transmits an “SSL instruction signal of an instruction to prepare for receiving with the microphone 1 and execute the SSL processing in response to the sound signal” to the tablets T3 and T4, and thereafter, transmits a sound generation instruction signal to generate a sound signal to the tablet T2 after a lapse of a predetermined time. In this case, the tablet T1 is also brought into a standby state of the sound signal. The tablet T2 generates a sound signal in response to the sound generation instruction signal, and transmits the signal to the tablets T1, T3 and T4. The tablets T1, T3 and T4 estimate and calculate the arrival direction of the sound signal by executing the speech source localizing process on the basis of the received sound signal using the MUSIC method described in detail in the first preferred embodiment, and store the calculated results into the built-in memory. That is, an angle to the tablet T2 is estimated, calculated and stored by the SSL processing of the tablets T1, T3 and T4.
Further, in step S33, the tablet T1 transmits an “SSL instruction signal of an instruction to prepare for receiving with the microphone 1 and execute the SSL processing in response to the sound signal” to the tablets T2 and T4, and thereafter, transmits a sound generation instruction signal to generate a sound signal to the tablet T3 after a lapse of a predetermined time. In this case, the tablet T1 is also brought into the standby state of the sound signal. The tablet T3 generates a sound signal in response to the sound generation instruction signal, and transmits the signal to the tablets T1, T2 and T4. The tablets T1, T2 and T4 estimate and calculate the arrival direction of the sound signal by executing the speech source localizing process on the basis of the received sound signal using the MUSIC method described in detail in the first preferred embodiment, and store the calculated results into the built-in memory. That is, an angle to the tablet T3 is estimated, calculated and stored by the SSL processing of the tablets T1, T2 and T4.
Furthermore, in step S34, the tablet T1 transmits an “SSL instruction signal of an instruction to prepare for receiving with the microphone 1 and execute the SSL processing in response to the sound signal” to the tablets T2 and T3, and thereafter, transmits a sound generation instruction signal to generate a sound signal to the tablet T4 after a lapse of a predetermined time. In this case, the tablet T1 is also brought into the standby state of the sound signal. The tablet T4 generates a sound signal in response to the sound generation instruction signal, and transmits the signal to the tablets T1, T2 and T3. The tablets T1, T2 and T3 estimate and calculate the arrival direction of the sound signal by executing the speech source localizing process on the basis of the received sound signal using the MUSIC method described in detail in the first preferred embodiment, and store the calculated results into the built-in memory. That is, an angle to the tablet T4 is estimated, calculated and stored by the SSL processing of the tablets T1, T2 and T3.
Subsequently, in step S35 to perform data communications, the tablet T1 transmits an information reply instruction signal to the tablet T2. In response to this, the tablet T2 sends an information reply signal that includes the distance between the tablets T1 and T2 calculated in step S31 and the angles when the tablets T1, T3 and T4 are viewed from the tablet T2 calculated in steps S31 to S34 back to the tablet T1. Moreover, the tablet T1 transmits an information reply instruction signal to the tablet T3. In response to this, the tablet T3 sends an information reply signal that includes the distance between the tablets T1 and T3 calculated in step S31 and the angles when the tablets T1, T2 and T4 are viewed from the tablet T3 calculated in steps S31 to S34 back to the tablet T1. Further, the tablet T1 transmits an information reply instruction signal to the tablet T4. In response to this, the tablet T4 sends an information reply signal that includes the distance between the tablets T1 and T4 calculated in step S31 and the angles when the tablets T1, T2 and T3 are viewed from the tablet T4 calculated in steps S31 to S34 back to the tablet T1.
In the SSL general processing of the tablet T1, the tablet T1 calculates the distances between the tablets, as follows, on the basis of the information collected as described above.
The SSL general processing of the tablet T1 may be performed by only the tablet T1 that is the master, or may be performed by all the tablets T1 to T4. That is, the SSL general processing may be executed by at least one tablet or server apparatus (e.g., a server apparatus SV).
The lengths of the other sides can be obtained likewise by using the twelve angles and the length d. If each sensor node device can perform the aforementioned time synchronization, each sensor node device can obtain the distance from the difference between the speech start time and the arrival time. Although the number of node devices is four in the above example, the present invention is not limited to this.
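As an illustration of the geometry, one unknown side of a triangle follows from the known side d and two estimated interior angles (the use of the law of sines here is an assumption for illustration), and a distance follows from the time of flight once the timers are synchronized:

```python
# Two geometric ingredients of the tablet positioning described above.
import math

def side_from_angles(d, angle_at_a, angle_at_b):
    """Side AC of triangle ABC, given side AB = d and the interior angles
    at A and B; the third angle is pi minus the other two (law of sines)."""
    angle_at_c = math.pi - angle_at_a - angle_at_b
    return d * math.sin(angle_at_b) / math.sin(angle_at_c)

def distance_from_tof(t_start, t_arrival, c=343.0):
    """Distance from the difference between the speech start time and the
    arrival time, valid once the node timers are synchronized."""
    return c * (t_arrival - t_start)

# 3-4-5 triangle check: AB = 4 with ~36.87 deg at A and 90 deg at B gives AC ~ 5.
print(side_from_angles(4.0, math.radians(36.87), math.radians(90.0)))
print(distance_from_tof(0.000, 0.010))   # a 10 ms flight is ~3.43 m
```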
Although the two-dimensional position is estimated in the above second preferred embodiment, the present invention is not limited to this, and the three-dimensional position may be estimated by using a similar numerical expression.
Further, the mounting of the sensor node devices on mobile terminals is described below. Regarding the practical use of the network system, it is conceivable not only to use sensor node devices fixed to a wall or a ceiling but also to mount them on a mobile terminal such as a robot. If the position of a person to be recognized can be estimated, it is possible to make a robot approach the person for image collection of higher resolution and speech recognition of higher accuracy. Moreover, mobile terminals such as the recently popularized smart phones have difficulties in acquiring the positional relations of terminals at a short distance, although they can acquire their own current positions by using the GPS function. However, if the sensor node devices of the present network system are mounted on mobile terminals, it is possible to acquire the positional relations of terminals that are located at a short distance and cannot be discriminated by the GPS function or the like, by performing speech source localization while mutually dispatching speeches from the terminals. In the present preferred embodiment, two applications that utilize the positional relations of the terminals, namely a message exchange system and a multiplayer hockey game system, were implemented by using the Java programming language.
In the present preferred embodiment, a tablet personal computer to execute the application and a prototype sensor node device were connected together. A general-purpose OS is mounted as the OS of the tablet personal computer, and a wireless network is configured by a wireless LAN function attached to two USB 2.0 ports and compliant with the IEEE 802.11b/g/n protocols. The microphones of the prototype sensor node device are arranged at intervals of 5 cm on the four sides of the tablet personal computer, and a speech source localization module operates on the sensor node device (configured by an FPGA) to output localization results to the tablet personal computer. The position estimation accuracy in the present preferred embodiment is about several centimeters, which is remarkably higher than that of the prior art.
Third Preferred Embodiment
Referring to the drawings, each sensor node device of the third preferred embodiment is configured to include the following components:
(1) an AD converter circuit 51 connected to a plurality of microphones 1 for speech pickup;
(2) a VAD processor part 52 connected to the AD converter circuit 51 to detect a speech signal;
(3) an SRAM 54, which temporarily stores speech data of a speech signal, a sound signal or the like that has been subjected to AD conversion by the AD converter circuit 51;
(4) a delay-sum circuit part 58, which executes delay-sum processing for the speech data stored in the SRAM 54;
(5) a microprocessor unit (MPU), which executes speech source localization processing to estimate the position of the speech source for the speech data outputted from the SRAM 54, subjects the results to speech source separation processing (SSS processing) and other processing, and collects high-SNR speech data obtained as the results of the processing by transceiving the data to and from other node devices via a data communication part 57a;
(6) a timer and parameter memory 57b, which includes a timer for time synchronization processing and a parameter memory to store parameters for data communications, and is connected to the data communication part 57a and the MPU 50; and
(7) a data communication part 57a, which configures a network interface circuit to transceive the speech data, control packets and so on, and is connected to other peripheral sensor node devices Nn (n=1, 2, . . . , N).
Although the sensor node devices Nn (n=1, 2, . . . , N) have mutually similar configurations, the sensor node device N0 of the base station can obtain speech data whose SNR is further improved by aggregating the speech data in the network.
Referring to the drawings, the data communication part 57a is configured to include the following:
(1) a physical layer circuit part 61, which transceives speech data, control packets and so on, and is connected to other peripheral sensor node devices Nn (n=1, 2, . . . , N);
(2) an MAC processor part 62, which executes medium access control processing of speech data, control packets and so on, and is connected to the physical layer circuit part 61 and a time synchronizing part 63;
(3) a time synchronizing part 63, which executes time synchronization processing with other node devices, and is connected to the MAC processor part 62 and the timer and parameter memory 57b;
(4) a receiving buffer 64, which temporarily stores the speech data or data of control packets and so on extracted by the MAC processor part 62, and outputs them to a header analyzer 66;
(5) a transmission buffer 65, which temporarily stores packets of speech data, control packets and so on generated by the packet generator part 68, and outputs them to the MAC processor part 62;
(6) a header analyzer 66, which receives the packet stored in the receiving buffer 64, analyzes the header of the packet, and outputs the results to a routing processor part 67 or a VAD processor part 50, a delay-sum circuit part 52, and an MPU 59;
(7) a routing processor part 67, which determines, on the basis of the analysis results from the header analyzer 66, the routing as to which node device the packet is to be transmitted to, and outputs the result to the packet generator part 68; and
(8) a packet generator part 68, which receives the speech data from the delay-sum circuit part 52 or the control data from the MPU 59, generates a predetermined packet on the basis of the routing instruction from the routing processor part 67, and outputs the packet to the MAC processor part 62 via the transmission buffer 65.
Moreover, referring to the drawings, the timer and parameter memory 57b stores the following information:
(1) self-node device information (node device ID and XY coordinates of the self-node device) that has been preparatorily determined and stored;
(2) path information (part 1) (transmission destination node device ID in the base station direction) acquired at time period T11;
(3) path information (part 2) (transmission destination node device ID of cluster CL1, transmission destination node device ID of cluster CL2, . . . , transmission destination node device ID of cluster CLN) acquired at time period T12; and
(4) cluster information (cluster head node device ID (cluster CL1), XY coordinates of speech source SS1, cluster head node device ID (cluster CL2), XY coordinates of speech source SS2, . . . , cluster head node device ID (cluster CLN), XY coordinates of speech source SSN) acquired at time periods T13 and T14.
It is assumed that the node devices Nn (n=1, 2, . . . , N) are located on a flat plane and have predetermined known coordinates in a predetermined XY coordinate system, and that the position of each speech source is measured by the position measurement processing.
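For illustration only, the items enumerated above can be pictured as the following record; the field names are assumptions, not the actual memory layout:

```python
# Sketch of one node's timer-and-parameter-memory contents, mirroring the
# four enumerated items: self info, path info (parts 1 and 2), cluster info.
from dataclasses import dataclass, field

@dataclass
class NodeParameterMemory:
    node_id: int                       # (1) self-node device information
    xy: tuple                          #     known XY coordinates of the node
    next_hop_to_base: int = -1         # (2) path info (part 1), time period T11
    next_hop_per_cluster: dict = field(default_factory=dict)  # (3) part 2, T12
    cluster_head: dict = field(default_factory=dict)   # (4) cluster -> head ID
    source_xy: dict = field(default_factory=dict)      #     cluster -> source XY

mem = NodeParameterMemory(node_id=5, xy=(2.0, 3.0), next_hop_to_base=4)
mem.cluster_head["CL1"] = 7
mem.source_xy["CL1"] = (1.5, 0.5)
```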
An operation example of the present preferred embodiment is described below.
Subsequently, when speech signals are generated from the two speech sources SSA and SSB, the node devices located near the speech sources (the node devices N4 to N7, indicated by black circles) detect the speech signals and transmit wakeup messages to the peripheral node devices located within a predetermined number of hops.
Subsequently, the node device at which the VAD processor part 52 responded and the node devices (the node devices N1 to N8 other than the base station N0 in the operation example) activated by the wakeup messages estimate the direction of the speech source by using the microphone array network system, and transmit the results to the base station N0. The path used at this time is the path of the spanning tree configured at the time period T11.
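A minimal sketch of hop-limited wakeup flooding follows, with an assumed adjacency-list topology and message format; it is illustrative, not the protocol of the present system:

```python
# TTL-limited flood sketch: a node whose VAD fires floods a wakeup message
# whose hop counter is decremented at each relay, so only node devices
# within the hop limit leave the sleep mode.
from collections import deque

def flood_wakeup(adjacency, origin, hop_limit):
    """Return the set of nodes activated by a TTL-limited flood from origin."""
    awake, queue = {origin}, deque([(origin, hop_limit)])
    while queue:
        node, ttl = queue.popleft()
        if ttl == 0:
            continue
        for neighbor in adjacency[node]:
            if neighbor not in awake:      # each node relays only once
                awake.add(neighbor)
                queue.append((neighbor, ttl - 1))
    return awake

line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}   # a 5-node line
print(flood_wakeup(line, origin=2, hop_limit=1))            # {1, 2, 3}
```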
In this case, the sub-arrays are connected together by using UTP cables. The 10BASE-T Ethernet (registered trademark) protocol is used as the physical layer. In the data-link layer, a protocol that adopts LPL (Low-Power Listening) is used to reduce the power consumption (See, for example, the Non-Patent Document 11).
The present inventors conducted experiments with three sub-arrays.
According to the signal waveforms measured after the time synchronization processing, the maximum time lag immediately after completion of the FTSP synchronization processing was 1 microsecond, and the maximum time lags between sub-arrays with and without the linear interpolation were 10 microseconds and 900 microseconds per minute, respectively.
As described above, according to the prior art cluster-based routing, clustering has been performed on the basis of only the information of the network layer. On the other hand, in order to configure a path optimized to each signal source in an environment in which a plurality of signal sources of the object of sensing exist in a large-scale sensor network, a sensor node device clustering technique based on the sensing information is necessary. Accordingly, the method of the present invention realizes path formation more specialized for the application by using the sensed signal information (information of the application layer) in the cluster head selection and the cluster configuration. Moreover, by combining the method with a wakeup mechanism (hardware) such as the VAD processor part 52 in the microphone array network, the power-saving performance can be further improved.
Although the sensor network system relevant to the microphone array network system intended for acquiring speech of high sound quality has been described in the aforementioned preferred embodiments, the present invention is not limited to this and may be applied to sensor network systems relevant to a variety of sensors for temperature, humidity, person detection, animal detection, stress detection, optical detection, and the like.
Although the present invention has been fully described in connection with the preferred embodiments thereof with reference to the accompanying drawings, it is to be noted that various changes and modifications are apparent to those skilled in the art. Such changes and modifications are to be understood as included within the scope of the present invention as defined by the appended claims unless they depart therefrom.