Download datasets
We have made several databases of anonymized (depersonalized) biometric images publicly available for research use only (statistical data analysis to identify patterns and classify images). All databases are fully anonymized: they are not used, and cannot be used, to identify specific subjects, because they do not contain the information needed to do so.

Links to databases of other researchers (MNIST, Jakobovski) are located at the end of the section.
Echograms of the ear canal
AIC-ears-75
(version 1.0)
number of test subjects: 75 people
A dataset of anonymized (depersonalized) ear canal echograms from 75 subjects aged 18 to 40 years. Each echogram is stored as a WAV file (mono, 44 kHz, 16 bits). For each subject, 15 (±1) measurements of each ear were made; after each measurement, the subject took off the headphones and put them on again (the recording device has the form of headphones with a built-in microphone that records the reflected signal). Each subject listened to a mono audio signal whose frequency rises and then falls (a swept sine), obtained by linear frequency modulation (a chirp signal). The signal frequency varied from 1 kHz to 14 kHz, and the signal lasted 10 seconds (the frequency increases for 5 seconds, then decreases for 5 seconds). The dataset includes 2 folders, one for the right ear and one for the left; each folder contains 75 subfolders with the ear measurements of the corresponding subjects. DOWNLOAD
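The probe signal described above can be sketched as follows. This is only an illustration of a linear up-down chirp, not the authors' generator: the function name, the exact sample rate of 44100 Hz (the page says only "44 kHz") and the unit amplitude are assumptions.

```python
import math

def up_down_chirp(f_low=1000.0, f_high=14000.0, half_duration=5.0, sample_rate=44100):
    """Linear chirp rising from f_low to f_high, then mirrored back down.

    sample_rate=44100 is an assumption; the dataset page states only 44 kHz.
    """
    n = int(round(half_duration * sample_rate))
    k = (f_high - f_low) / half_duration  # sweep rate in Hz per second
    up = []
    for i in range(n):
        t = i / sample_rate
        # Phase is 2*pi times the integral of f(t) = f_low + k*t
        phase = 2.0 * math.pi * (f_low * t + 0.5 * k * t * t)
        up.append(math.sin(phase))
    # Time-reversing the rising half gives a half whose frequency falls
    return up + up[::-1]
```

With the default arguments the result is 10 seconds long (5 seconds up, 5 seconds down), matching the durations given in the description.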
Handwritten images №1
AIC-sign-130
(version 1.0)
Genuine images (65 signers), Impostor images (65 signers)
Anonymized (depersonalized) handwritten images were produced by 130 signers aged 18 to 50 years. Each subject chose a word or set of characters at their own discretion and wrote it on a Wacom graphics tablet with a sampling rate of 200 points per second and 1024 levels of pen pressure. The database is provided in two versions:
1. The raw dataset includes:
- “Genuine” images - 65 folders, each of which contains the handwritten images of one subject (65 classes of images);
- “Impostor” images - one folder with 650 images (one class of images), produced by 65 other subjects (10 per person). It is recommended to use this folder as a test sample of “Impostors” to estimate the probability of a type II error (“false acceptance of an Impostor”).
Each handwritten image is stored in a separate text file (SVG format, not an image format), in which each line describes one point (sample) of the handwritten image: x, y, and p (x and y are the coordinates of the point, p is the pen pressure on the tablet). Each image is the sequence of points (the dynamics) of writing the word or characters, in the order in which the input device registered them. DOWNLOAD
2. Processed data: an XML file (in a shortened format) that describes the same images as vectors of 556 features. The features were extracted using the author's technique. DOWNLOAD
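A raw file with one "x, y, p" point per line could be read roughly as shown below. This is a sketch under assumptions: the field separator is not documented here, so the code accepts commas, semicolons or whitespace and assumes decimal points rather than decimal commas. The names PenPoint, parse_signature and duration_seconds are hypothetical.

```python
import re
from typing import List, NamedTuple

class PenPoint(NamedTuple):
    x: float  # horizontal coordinate
    y: float  # vertical coordinate
    p: float  # pen pressure

def parse_signature(text: str) -> List[PenPoint]:
    """Parse lines of three numbers; lines that do not fit are skipped."""
    points = []
    for line in text.splitlines():
        # Assumed separators: comma, semicolon or whitespace
        fields = [f for f in re.split(r"[,;\s]+", line.strip()) if f]
        if len(fields) != 3:
            continue
        try:
            x, y, p = (float(f) for f in fields)
        except ValueError:
            continue
        points.append(PenPoint(x, y, p))
    return points

def duration_seconds(points: List[PenPoint], rate_hz: float = 200.0) -> float:
    """Approximate writing time, given the stated 200 points per second."""
    return len(points) / rate_hz
```

Keeping the points in file order preserves the dynamics of the handwriting, which is what distinguishes these records from a static picture of the word.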
Handwritten images №2
AIC-sign-24
(version 1.0)
24 signers
A small set of features extracted from anonymized (depersonalized) handwritten images of 24 subjects (24 classes of images) aged 18 to 35 years. The images were captured on a Wacom tablet (sampling rate of 200 points per second, 1024 pressure levels) and are represented as vectors of 335 features obtained using the author's simplified method. The files do not contain the handwritten images themselves. The dataset is provided in 2 versions:
- full XML file format. DOWNLOAD
- shortened XML file format. DOWNLOAD
Keyboard handwriting
AIC-key-32
(version 1.0)
number of test subjects: 32 people
A small set of anonymized (depersonalized) keyboard handwriting (keystroke dynamics) data from 32 subjects (32 image classes) aged 18 to 35 years. Before taking the tests, each subject had rested the previous day and was in a calm state during the experiments. The neurological status of all subjects was assessed as normal before the experiment. The images are vectors of 63 features: time delays, namely key hold times and pauses between presses of adjacent keys, recorded while typing the phrase “the security system must be constantly improved”. The dataset is provided as an XML file in a shortened format.
DOWNLOAD
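The two kinds of time delays mentioned above (key hold times and pauses between adjacent keys) can be illustrated with a small sketch. The event format, the function name and the definition of a pause as the interval from the release of one key to the press of the next are assumptions; the dataset itself ships precomputed feature vectors, and its exact feature layout is not described here.

```python
def keystroke_features(events):
    """Build a feature vector from (key, press_time, release_time) tuples.

    Assumed layout: all hold times first, then all pauses.
    A pause is taken as next press time minus current release time.
    """
    holds = [release - press for _, press, release in events]
    pauses = [
        events[i + 1][1] - events[i][2]  # may be negative if keys overlap
        for i in range(len(events) - 1)
    ]
    return holds + pauses

# Hypothetical timings in milliseconds for typing "the"
sample = [("t", 0, 100), ("h", 150, 300), ("e", 350, 400)]
features = keystroke_features(sample)
```

Under this layout, n keystrokes yield n hold times and n - 1 pauses.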
Speech images
AIC-spkr-130
(version 1.1)
Genuine images (65 speakers),
Impostor images (65 speakers)
Anonymized (depersonalized) images of speech passwords were recorded from 130 speakers aged 18 to 50 years. The voice passwords were short phrases of one, two or three short words (“access control”, “allow access”). Each subject chose a password from a previously prepared dictionary. The subjects' passwords are not unique (i.e., one passphrase could be used by several subjects). The set contains “raw” data, divided by input attempts. Each image is a WAV audio file (mono, 8 kHz, 16 bits) that contains one utterance of the speech password (the speaker said the password once). The dataset includes:
- “Genuine” images - 65 folders, each of which contains the speech passwords of one subject (65 classes of images);
- “Impostor” images - one folder with 650 images (one class of images) of other speech passwords, recorded from 65 other subjects (10 per person). It is recommended to use this folder as a test sample of “Impostors” to estimate the probability of a type II error (“false acceptance of an Impostor”).
DOWNLOAD
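The recommended use of the “Impostor” folder, estimating the probability of false acceptance, can be sketched as below. It assumes a verifier that returns a similarity score per image, with higher scores meaning "more like the genuine subject", and a fixed acceptance threshold; the function name and the score convention are illustrative and not part of the dataset.

```python
def false_acceptance_rate(impostor_scores, threshold):
    """Fraction of impostor samples that the verifier would accept.

    Assumes a sample is accepted when its score is >= threshold.
    """
    if not impostor_scores:
        raise ValueError("at least one impostor score is required")
    accepted = sum(1 for score in impostor_scores if score >= threshold)
    return accepted / len(impostor_scores)

# Hypothetical scores for four impostor images
rate = false_acceptance_rate([0.1, 0.4, 0.8, 0.9], threshold=0.5)
```

With the 650 impostor images of this set, each accepted image adds 1/650 (about 0.15%) to the estimate, which bounds the resolution of the measurement.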
Mouse movement data
AIC-mouse-19
(version 1.0)
number of test subjects: 19 people
A small set of anonymized (depersonalized) data on computer mouse movement. 19 subjects (19 image classes) aged 18 to 30 years used a special device (a computer mouse equipped with sensors: a gyroscope and an accelerometer) to perform test tasks in a specially developed program. The subjects completed two series of tests, each consisting of 20 tests. In the first series, subjects had to move the cursor between interface elements with fixed positions on the screen; the second series included similar tests, but the interface elements were placed randomly. The image of the mouse movement trajectories consists of three acceleration functions along the OX, OY and OZ axes and a time function. The image file format is text SVG; each line in the file has the format “acceleration_Ox : acceleration_Oy : acceleration_Oz : time”. The set contains two folders, one per series of tests; each folder contains 19 folders with the data of the corresponding subjects. DOWNLOAD
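Lines in the colon-separated format quoted above could be parsed with a sketch like the following. It assumes that all four fields are plain decimal numbers (in particular, that the time field is a single number and contains no colons of its own); the units are not stated on this page, and the function names are hypothetical.

```python
def parse_mouse_line(line):
    """Split 'ax : ay : az : time' into four floats."""
    parts = [part.strip() for part in line.split(":")]
    if len(parts) != 4:
        raise ValueError("expected 4 colon-separated fields, got %d" % len(parts))
    ax, ay, az, t = (float(part) for part in parts)
    return ax, ay, az, t

def parse_mouse_file(text):
    """Parse every non-blank line of a recording, in file order."""
    return [parse_mouse_line(line) for line in text.splitlines() if line.strip()]

# Hypothetical two-sample recording
samples = parse_mouse_file("0.1 : -0.2 : 9.8 : 15\n0.0 : 0.1 : 9.7 : 20\n")
```

Raising on malformed lines, rather than skipping them silently, makes it easier to notice if the real files deviate from the assumed format.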
Electroencephalogram (EEG database №1)
AIC-eeg-19
(version 1.0)
device: Neuron-Spectrum-4/P
number of test subjects: 18 people
Anonymized (depersonalized) electroencephalograms (EEG) of 18 subjects aged 21 to 30 years. Before taking the tests, each subject had rested the previous day and was in a calm state during the experiments. The neurological status of all subjects was assessed as normal before the experiment. The experiments are aimed at producing stable individual brain responses to a visual stimulus, which arise primarily in the striate and extrastriate cortex (Brodmann areas 17, 18 and 19). The data were collected with the participation of a practicing neurophysiologist (neurologist, somnologist, candidate of medical sciences), which supports the correct use of the electroencephalographic equipment. Wearing virtual reality glasses, the subjects viewed static (non-animated) visual stimuli while lying down. Geometric shapes of various colors, as well as colored Rorschach inkblots, were used as stimuli. At the same time, the EEG was recorded using the Neuron-Spectrum-4/P from Neurosoft, a 21-channel electroencephalograph (noise level below 0.3 μV, sampling rate of 5000 Hz per channel, subsequently downsampled to 500 Hz). Of the 21 channels, 10 electrodes were used: Fpz, Fp1, Fp2, Fz, F3, F4, Cz, Oz, O1, O2, in a monopolar montage according to the “10-20” scheme. Most subjects completed the series of experiments twice, on different days (the files are marked with the numbers 1 and 2). The EEGs are provided in the following forms:
  • files in EDF+ format (DOWNLOAD);
  • files in the universal *.SHV format for quick loading into AIC; the recordings are cut into images 5 seconds (DOWNLOAD), 2.5 seconds (DOWNLOAD) and 1 second (DOWNLOAD) long.
Subject numbers are from 25 to 42.
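Cutting a recording into fixed-length images, as in the 5-second, 2.5-second and 1-second variants, can be sketched as follows. The sketch assumes non-overlapping windows over a single channel already at 500 Hz and drops any incomplete window at the end; how the published files were actually cut is not specified here, and the function name is hypothetical.

```python
def segment(samples, sample_rate_hz=500, window_seconds=5.0):
    """Split one channel into consecutive non-overlapping windows.

    A trailing remainder shorter than one window is discarded (assumption).
    """
    window = int(round(sample_rate_hz * window_seconds))
    if window <= 0:
        raise ValueError("window must contain at least one sample")
    last_start = len(samples) - window
    return [samples[i:i + window] for i in range(0, last_start + 1, window)]

# Hypothetical 12-second channel at 500 Hz
channel = list(range(6000))
five_s = segment(channel, 500, 5.0)
half_five_s = segment(channel, 500, 2.5)
one_s = segment(channel, 500, 1.0)
```

At 500 Hz these windows contain 2500, 1250 and 500 samples per channel respectively.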
Thermograms of the face and neck
(version 1.0)
taking into account the psychophysiological state (PPS) of the subjects
number of test subjects: 84 people

Anonymized (depersonalized) thermograms of the face and neck (together in one frame) of 84 subjects aged 18 to 28 years were recorded with a FLIR E60 thermal imager (resolution 320x240). Each subject was recorded at different times in the following six psychophysiological states (PPS):
  • “normal” (before the experiment, the subject was not subjected to any influences, and the subject's neurological status was assessed as normal);
  • 3 stages of alcohol intoxication with blood alcohol content of 0.02-0.03‰, 0.03-0.05‰ and 0.05-0.1‰ (before the experiment, the subjects consumed alcohol; the dosage was calculated using the Widmark formula);
  • drowsiness (before the experiment, the subjects took sedatives, valerian and motherwort, in accordance with the enclosed instructions);
  • stress (before the experiment, the subjects took the Stroop test).

The set contains “raw” data: each thermogram is stored in a separate binary file. The dataset includes 6 folders; each folder contains 2520 thermograms of subjects in the corresponding state (30 thermograms per person). DOWNLOAD


Links to databases of other researchers
(version 1.0)
that were used in AIC training examples
MNIST database
is a classic set of anonymized (depersonalized) monochrome images of handwritten digits. For convenience, this database is built into AIC and does not need to be loaded separately when working in the Neural Network Designer (you can select it when training and testing a neural network). We have also duplicated the MNIST database in a format more convenient for loading into AIC (the 4 original archives have been unpacked and grouped into two folders: test and train).
We recommend using this link:
https://cloud.mail.ru/public/2zsb/3Us2KHBz3
Original link to MNIST database:
http://yann.lecun.com/exdb/mnist/

Jakobovski base
is a set of anonymized (depersonalized) audio recordings of spoken digits. We duplicated this database in a format more convenient for loading into AIC (we sorted the speech images into folders so that they are grouped by class after loading). We recommend using this link:
https://cloud.mail.ru/public/52Zx/3XEcEXRkr
Original link to the Jakobovski database (recordings are located in the recordings folder):
https://github.com/Jakobovski/free-spoken-digit-dataset
