The present invention relates to the field of optical devices for the acquisition and analysis of man-made shapes, images, colors, and signs, allowing a synesthetic experience through the association of sounds with the produced graphic elements. It is applicable in various communicative contexts, such as interactive videogames, computer science, neuroscience and neuroimaging, the medical field for therapy (color therapy and music therapy) and/or neurostimulation through BCI (Brain-Computer Interface), HCI (Human-Computer Interaction), and SSD (Sensory Substitution Device), visual arts, and social arts, and is also usable in the pedagogical field.
The invention substantially consists of a device designed to generate a sound from an image either chosen or produced by a user, through a specific image-capturing tool and software for processing the image and associating the sound; the invention is usable in various contexts ranging from the recreational to the artistic or even the educational.
Devices such as overhead projectors on which it is possible to project previously prepared images, such as photographs or the like, and slides written either beforehand or at the moment, are currently known.
Image projection devices are also known, which work both by means of slides, obviously prepared beforehand, and by acquiring images from digital media and in digital format.
In any case, these known devices allow only the projection of the acquired image, and graphic processing of the image is possible only with the overhead projectors.
None of the devices listed above also allows associating a specific sound with the projected graphic element.
The present invention allows generating/associating a specific sound with any graphic element, said graphic element possibly being a line, a shape, a photograph, or simply a color.
Furthermore, the invention also allows the association of specific sounds with graphic elements generated at the moment, as a function of the space occupied by the latter and the time taken to generate them (DRAWING).
According to the invention, such a sound association/creation is achieved by using a hardware medium in combination with dedicated software capable of:
This allows the user to interact and compose coherently through the space/time of the visible matter (visible spectrum of the drawing) and the space/time of the audible matter (sound spectrum of the waveform), in order to control directly the modulation of sound frequencies (additive sound synthesis) from the color through the drawing (additive RGB mixing), thus generating sound at every variation in space and time.
A better understanding of the invention will be achieved by means of the following detailed description and with reference to the accompanying drawings, which show a preferred embodiment by way of non-limiting example.
In the drawings:
With reference to the figures listed above, the present invention comprises a hardware device which allows performing the analog image insertion operations and in which a system for acquiring the produced images is provided; the latter works in combination with a software component which allows the images to be acquired, processed, and encoded appropriately and finally converted from analog (original form) to digital as an acoustic spectrum.
Hardware
The hardware device substantially consists of an external module which, in a preferred but non-limiting embodiment, is shaped as a parallelepiped with a square base.
Said module is internally divided into two different superimposed parts and separated by a panel parallel to the bases which defines two superimposed compartments and which is provided with a hole in which the video acquisition device is housed, preferably consisting of a camera oriented towards the upper base.
The upper base consists of a plate of transparent material and serves as the working and measuring surface.
This transparent plate, which forms the working surface, is preferably made of high-clarity glass with a single-layer, anti-reflective treatment applied to the part of the plate facing the inside of the module, i.e., towards the camera or other image acquisition device.
Said glass plate is made to maximize light transmission by eliminating the undesired reflections and refractions while maintaining the correct chromatic characteristics of the light passing therethrough.
The upper part between the glass plate and the surface containing the imaging device is internally provided with laterally arranged lighting means.
In a preferred but non-limiting embodiment, the lighting means comprise LEDs and opaque glass; more specifically, the invention provides at least two LEDs arranged on two opposite side surfaces of the upper part of the module and housed inside light units embedded in the supporting structure of the module. Advantageously, the LEDs are equipped with white opal diffuser glass.
The lower part, between the surface containing the image acquisition device and the lower base of the module, is entirely covered with a material adapted to absorb light and which allows avoiding the diffusion and refraction of undesired lights and reflections inside the module; this lower part substantially consists of a technical compartment to allow possible maintenance actions or adjustments of the sensor, as well as to obtain a sufficiently high module to operate without needing support surfaces for the structure.
It is worth noting that it is also possible not to provide said lower technical compartment, e.g., by constraining the sensor directly to the bottom of the upper part.
According to the invention all the surfaces of the top are also coated with the light-absorbing material, except for the transparent plate and the lighting means.
In a preferred but non-limiting embodiment, said light-absorbing material is preferably black velvet.
The lighting means are such as to ensure correct illumination of the working space, as well as a homogeneous diffusion of the light in the upper part of the module, which ensures uniformity of illumination without shadows or reflections with respect to the lens of the acquisition device, so as not to create areas which are too bright or too shaded and which would distort the image acquisition.
According to the invention, the image acquisition device is a camera, which consists of a USB module with a CMOS sensor and an interchangeable lens.
The choice of the sensor is mainly related to the number of pixels that the software can use for the acquisition; an increase in the pixel-management capacity of the software may be followed by a different choice of acquisition device, oriented towards devices with a higher resolution, because there is no resolution/sensitivity/size constraint in the device described here.
According to the invention, the acquisition device will be able to work both in RGB (color) mode and in grayscale (monochrome) mode; for this purpose, the device can be chosen from a monochrome and a color sensor, with the possibility to interchange them.
In addition to maintaining an ideal viewing angle, the lens configuration is determined by the need to “isolate” the two-dimensional working surface through the depth-of-field effect normally given by lenses, so as to focus only on that surface and exclude everything beyond it through progressive blurring (physiological to all lens-based optical systems, created by gradually moving away from the focus point, which here is the outer surface of the glass plate). This separation effect between the two-dimensional surface and everything beyond it is conceived as an aid to the acquisition software, which is thus facilitated in distinguishing between what is placed/created/traced on the working surface and everything beyond it, in the surrounding environment and in front of the glass itself (operator, ambient light, etc.). This selection of the figures by focusing on the surface maximizes system accuracy, ensuring that the sound conversion of the images concentrates as much as possible on the images/shapes created/placed on the surface rather than on those beyond it.
In a preferred but non-limiting embodiment, the sensor selected is a Sony IMX322 (1/2.9-inch diagonal size, 2.07 Mpx, Full HD 1920×1080), chosen after careful analysis as a good compromise between sensor size, shooting fluidity, image quality, light sensitivity, dynamic range, cost, and availability.
In the (non-limiting) constructional example described, the shooting optics consist of a lens for 1/2.7-inch format sensors with a variable focal length of 2.8–12 mm, a focal ratio of f/1.4, manual focus, and a CS-thread mount dedicated to CCTV cameras.
The arrangement of the camera, i.e., of the image acquisition device, was established by calculating a field angle between 40° and 55° to simulate a viewing angle similar to that of the human eye and to decrease the natural geometric distortions caused by the shooting optics.
Advantageously, this choice contributes to the correct selection and calibration of the focus plane of the optics on the two-dimensional working area represented by the transparent plate.
Furthermore, an integrated electronic board is provided, equipped with local cooling means for the sensor/processor system, preferably of the Peltier-cell type with a 12 V power supply and an RMS power of 60 W, also provided with a 12 V axial fan and an integrated aluminum heat sink, required for the dissipation of the heat generated by the continuous and prolonged operation of the CMOS sensor inside the module.
Advantageously, the cooling system limits the signal degradation due to heat development, thus limiting the “dark noise” effect, i.e., the so-called “thermal noise.”
As mentioned, in the example shown, the surface containing the camera is arranged parallel to the two upper and lower bases and at a distance from the transparent plate such as to ensure a correct shooting angle, e.g. 80 cm.
In the preferred, non-limiting embodiment described so far, the hardware is substantially a parallelepiped with a height of about 110 cm and a square base; the glass plate used as the upper base and working surface has a size of 50×50 cm and a thickness of 1.5 cm; the surface containing the acquisition device is advantageously placed at a distance of about 80 cm from the glass plate on the upper base.
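The stated dimensions can be checked against the 40°–55° field-angle requirement mentioned earlier. The short computation below is purely illustrative (it is not part of the described embodiment) and assumes the field angle is measured across the diagonal of the 50×50 cm plate as seen from the camera at 80 cm:

```python
import math

# Plate and camera geometry from the described embodiment (illustrative check;
# measuring the field angle across the plate diagonal is an assumption).
plate_side_cm = 50.0          # square working surface, 50 x 50 cm
camera_distance_cm = 80.0     # sensor-to-plate distance

# Field angle needed to frame the whole plate across its diagonal.
half_diagonal = plate_side_cm * math.sqrt(2) / 2
field_angle_deg = 2 * math.degrees(math.atan(half_diagonal / camera_distance_cm))

print(round(field_angle_deg, 1))  # ~47.7 degrees, within the stated 40-55 degree range
```

The result falls roughly in the middle of the 40°–55° window, consistent with the geometry described in the text.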
Software
As mentioned, the hardware component works in combination with a software component the purpose of which substantially is to convert the light spectrum into the acoustic spectrum.
Such a linear conversion allows the user to modulate and control the additive synthesis of waveforms produced over time (sound) through their own actions to move/draw images on the acquisition surface (space) during a given time interval (time). The ultimate goal is to allow full control of the synthetic modulation of sounds (WAVEFORMS) as a function of the images moved or drawn by the user.
In the described example, the software at hand was developed as a patch, or extension, of a commercially known program, Max/MSP by Cycling '74.
Max is a graphical development environment for music and multimedia, designed and updated by the software company Cycling '74, based in San Francisco, California; it has been used for over fifteen years by composers, performers, software designers, researchers, and artists interested in creating interactive software.
An API allows third parties to develop new routines (referred to as external objects). As a result, Max has a large user base of programmers not affiliated with Cycling '74.
Precisely by virtue of its extensible design and graphical interface, Max is commonly considered a sort of lingua franca for the software development related to interactive music.
The developed patch detects the RGB values of the video and converts them into audio frequencies; each frequency then has its own intensity and duration, derived from the saturation and the brightness, respectively.
The operation of the patch is as follows: only the controls available to the operator are displayed on the patch start page, or presentation mode; these are:
According to the invention, the patch is configured to map the working area and allow a coherent generation/transformation (input-output) of sinusoidal waveforms over time (sound) through the acquisition of images moved or drawn by the user in the working space (drawing); the video image is processed within an RGB matrix with a size of, e.g., 640×480 pixels, compatible with the performance of the software which, in the version used, does not support the number of calculations required for processing higher resolutions.
Each pixel in this matrix is defined by three values corresponding to its red (R), green (G), and blue (B) components; each of these values is in a range from 0 to 255.
The matrix of RGB values from the workspace is converted by the program into a frequency matrix: the three RGB values of each individual pixel are added over time, and their sum (the additive mixture of the RGB values used over time) is converted into a sound-frequency value (additive synthesis of the sound frequencies from the space occupied by the images) lying in a range from 64 to 8000 Hz. This conversion follows a relationship of substantially direct proportionality: for example, as the sum of the RGB values increases, the frequency of the corresponding sound approaches the upper end of the frequency range. The user can thus perform sound modulation in real time (additive synthesis of sounds over time) through the neuromotor activity related to drawing and to the images moved/drawn on the surface (additive mixing of colors in space).
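The per-pixel RGB-sum-to-frequency conversion can be sketched as follows. This is an illustrative Python sketch, not the actual Max/MSP patch; the text only states a substantially direct proportionality, so the simple linear map below is an assumption:

```python
def pixel_to_frequency(r, g, b, f_min=64.0, f_max=8000.0):
    """Map the sum of a pixel's RGB values (0..765) to a frequency in Hz.

    The described relationship is a substantially direct proportionality
    between the RGB sum and the output frequency; a plain linear map over
    the stated 64-8000 Hz range is assumed here for illustration.
    """
    rgb_sum = r + g + b                           # additive RGB mixture, 0..765
    return f_min + (f_max - f_min) * rgb_sum / 765.0

# A black pixel maps to the lower bound, a white pixel to the upper bound.
print(pixel_to_frequency(0, 0, 0))        # 64.0 Hz
print(pixel_to_frequency(255, 255, 255))  # 8000.0 Hz
```

Any monotonically increasing mapping would equally satisfy the description; the linear form is only the simplest choice.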
In other words, each user-generated image variation in SPACE and TIME (i.e., in the time it took to make that variation) corresponds to a degree of additive mixing of RGB values in space, which is directly proportional to a degree of additive synthesis of sounds over time. Such a ratio is closely related to the values of the HSL matrix, to space, and to time, and allows the user to modulate the sinusoidal waveforms through his/her actions on the images (pixels and RGB values). Said sum of RGB values corresponds to the additive mixture of the color frequencies used by the user in the image space in a given time interval, and is directly related (or proportional) to the additive synthesis of the sounds generated over time as a result of the user's own actions.
The RGB matrix, obtained for the entire acquired image, is converted by means of commercial programs into a new matrix of HSL values (hue, saturation, and brightness), 640×480 pixels in size. These HSL values, which are related to the space and time corresponding to the image variations, allow the user to use the visible spectrum of RGB values to modulate the sound spectrum of frequency values.
Again in this case, the software component extracts two lists of values, related to brightness and saturation respectively, ranging from 0 to 255 for each pixel of the matrix.
According to the invention, the brightness value is interpreted and converted by the software component as a sound duration value, while the saturation value corresponds to the sound intensity and is thus dependent on the amount of color (RGB) detected by the camera.
It is known that the parameters defining a sound are frequency, intensity, and duration; thus, all the values identified by the software component allow associating with each detected pixel a frequency (the sum of the R, G, and B values converted into the auditory frequency range), an intensity (corresponding to the saturation value from the HSL matrix), and a duration (corresponding to the brightness value from the HSL matrix), and therefore a sound.
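The full per-pixel mapping to the three sound parameters can be illustrated as below. This is a hedged sketch, not the patent's actual patch: the linear frequency map and the 0..1 scales for intensity and duration are assumptions (the patch itself works with 0..255 lists), and Python's `colorsys` HLS conversion stands in for whatever HSL routine the software uses:

```python
import colorsys

def pixel_to_sound(r, g, b, f_min=64.0, f_max=8000.0):
    """Derive (frequency_hz, intensity, duration) for one pixel.

    Illustrative sketch of the described mapping: the RGB sum sets the
    frequency, HSL saturation sets the sound intensity, and brightness
    (lightness) sets the sound duration. Scales and the linear frequency
    map are assumptions made for this example.
    """
    frequency = f_min + (f_max - f_min) * (r + g + b) / 765.0
    # colorsys works in HLS order, with channels normalised to 0..1
    hue, lightness, saturation = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
    intensity = saturation   # sound intensity from saturation
    duration = lightness     # sound duration from brightness
    return frequency, intensity, duration

# Pure red: mid-range frequency, full saturation (intensity), medium lightness.
print(pixel_to_sound(255, 0, 0))
```

Each pixel thus yields one fully specified sound, matching the statement that frequency, intensity, and duration together uniquely define it.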
In this respect, it is worth noting that, according to a specific feature of the invention, the particular choice of the aforesaid parameters to generate a sound from the image acquired on the transparent flat surface of said hardware component (Visible Spectrum => Sound Spectrum) is innovative and original in that it allows taking into account the SPACE occupied by the image moved or drawn by the user (which determines the output sound intensity) and the TIME taken by the user to draw that image on the transparent surface of the hardware component (which determines the output sound duration).
According to the invention, the camera sensor acquires the image by either placing or translating an image on the glass plate or even drawing it directly thereon.
Through the software component it is possible to manage, as mentioned, the data transmission from the hardware component; these data are received by the software component itself, which processes them by attributing RGB color “quantity” values to each identified pixel, and these values determine the frequency of the sound to be associated.
This RGB matrix is converted into an HSL array the saturation and brightness values of which define the intensity and duration of the sound, respectively.
The three values of frequency, intensity, and duration thus obtained uniquely define a sound related to a specific pixel.
It is worth noting that each sign drawn and/or each image placed on the plate corresponds to a specific sound because the matrix is processed in real-time, so even a “movement” of the image from one point to another of the glass plate will result in a variation in the parameters mentioned above and a consequent sound variation.
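The real-time behavior described above (every movement of an image changes the per-pixel parameters, and hence the sound) amounts to re-running an additive synthesis over the active pixels on every acquired frame. The sketch below is a minimal, assumed illustration of that stage only; the actual patch runs in Max/MSP's DSP graph, and per-pixel durations are omitted for brevity:

```python
import math

def synthesize(partials, sample_rate=8000, n_samples=64):
    """Additively synthesize a waveform from (frequency_hz, amplitude) pairs.

    Minimal sketch of the additive-synthesis stage: each active pixel
    contributes one sine partial whose frequency and amplitude derive from
    its RGB/HSL values as described in the text. Re-running this on every
    acquired frame produces the real-time sound variation when an image is
    moved on the glass plate.
    """
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        value = sum(a * math.sin(2 * math.pi * f * t) for f, a in partials)
        samples.append(value)
    return samples

# Two hypothetical pixel-derived partials; moving a shape would change them.
wave = synthesize([(440.0, 0.5), (880.0, 0.25)])
print(len(wave))  # 64
```

Because the partial list is rebuilt from the current frame, translating an image across the plate changes the partials and therefore the output waveform, as the text states.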
A variant of the invention (not shown) provides for the additional use of a neuroimaging apparatus, e.g., of the type comprising a helmet which is wearable by a user/subject to detect brain activity while drawing, where said apparatus generates images of the subject's brain activity in real time, and where said images are used, in addition to those drawn on the transparent surface of the hardware component, to generate an overall sound given by the sum of the sounds generated from the drawn images and the sounds generated from the images of the corresponding brain activity.
Thereby, the overall sound, generated by this variant of the invention, would take into account not only the image drawn by the user but also the effect on his/her brain (through the image of his/her brain activity) while:
The overall sound thus depends not only on the drawn image but also on the stimuli of the subject drawing it, while it is being drawn.
Number | Date | Country | Kind |
---|---|---|---|
102020000022453 | Sep 2020 | IT | national |
This application is the U.S. national phase of Inter-national Application No. PCT/IB2021/058685 filed Sep. 23, 2021, which designated the U.S. and claims priority to IT Patent Application No. 102020000022453 filed Sep. 23, 2020, the entire contents of each of which are hereby incorporated by reference.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/IB2021/058685 | 9/23/2021 | WO |