Welcome!

Student Projects

Where creativity meets innovation!
This is a space dedicated to showcasing the brilliant work and cutting-edge ideas of our students. Here, you'll find a diverse range of projects spanning various fields of study, from technology and engineering to the latest signal processing and AI technologies.

Active Projects

Projects

2025

Learning-Based Source Separation Using RRIR-Augmented Deep Neural Networks (DNNs)

Description

The objective of this project is to develop a deep neural network (DNN)-based source separation model that incorporates Relative Room Impulse Response (RRIR) features to enhance speech extraction in real-room acoustic environments. The process involves recording multi-speaker RIRs using a two-microphone setup and computing RRIRs to serve as input features for training a speech separation network such as U-Net, Conv-TasNet, or Wave-U-Net. The model will be trained and evaluated on both simulated and real-world room conditions, with performance compared against traditional blind source separation (BSS) methods like IVA and ICA, as well as standard deep learning approaches. The expected outcome is a robust RRIR-augmented DNN that demonstrates improved separation quality and speech intelligibility, outperforming conventional deep learning-based BSS techniques.
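As a starting point for the feature pipeline, the RRIR between the two microphones can be estimated by regularized spectral division of the two measured RIRs. A minimal NumPy sketch under that assumption; the regularization constant eps is an illustrative choice, not a tuned value:

    import numpy as np

    def relative_rir(h_ref, h_other, n_fft=None, eps=1e-8):
        """Estimate the RRIR from the reference mic to the other mic by
        regularized spectral division of the two measured RIRs."""
        n = n_fft or len(h_ref) + len(h_other)
        H_ref = np.fft.rfft(h_ref, n)
        H_oth = np.fft.rfft(h_other, n)
        # Tikhonov-style regularization avoids dividing by near-zero bins
        rrir_spec = H_oth * np.conj(H_ref) / (np.abs(H_ref) ** 2 + eps)
        return np.fft.irfft(rrir_spec, n)

The resulting RRIRs (or their STFT-domain counterparts) would then be stacked with the mixture spectrogram as extra input channels to the separation network.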

Deep Learning-Based Prediction of Energy Decay Curves (EDC) from Room Geometry and Materials

Description

The objective of this project is to develop a deep learning model, specifically using LSTM, to predict Energy Decay Curves (EDCs) based on detailed room characteristics such as dimensions, material absorption coefficients, scattering properties, and other geometric features. The workflow includes generating or collecting Room Impulse Responses (RIRs), extracting corresponding EDCs, and compiling a dataset of room parameters. After normalizing these features, the LSTM model will be trained using loss functions like MSE and optimizers such as Adam or SGD. The model’s performance will be assessed using metrics like RMSE, MAE, and correlation, and validated against real measured EDCs. The project will also deliver an interactive visualization tool to compare predicted and actual EDCs, demonstrating the model's effectiveness and practical utility in room acoustics analysis.
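The EDC targets for such a model are typically derived from RIRs by Schroeder backward integration. A minimal NumPy sketch; the small constant added before the logarithm is only there to avoid log(0):

    import numpy as np

    def energy_decay_curve(rir):
        """Schroeder backward integration of a room impulse response,
        returned in dB and normalized to 0 dB at t = 0."""
        energy = np.cumsum(rir[::-1] ** 2)[::-1]   # backward-integrated energy
        return 10.0 * np.log10(energy / energy[0] + 1e-12)

Each training pair is then (normalized room-parameter vector, EDC sequence), with the LSTM regressing the decay curve sample by sample.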

Natural Language Processing (NLP) for Text-to-Sign Language Conversion

Description

The objective of this project is to develop an NLP-based system for converting written text into sign language, enabling more accessible communication for the deaf and hard-of-hearing community. The project begins with a comprehensive literature review of existing models and approaches for text-to-sign language translation. A basic parser will be developed using NLP techniques, incorporating tokenization and sequence mapping to translate input text into corresponding sign gestures. The expected outcome includes a Python-based text preprocessor that maps words to sign language symbols, and a trained NLP model capable of understanding sentence structure and generating coherent sequences of sign language representations.
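A hedged sketch of the first parsing stage: tokenize the input text and map each token to a sign gloss, dropping function words that glossing conventions often omit. The tiny lexicon here is a hypothetical placeholder for a real sign-language dictionary:

    import re

    # hypothetical mini-lexicon; None marks words dropped in glossing
    SIGN_LEXICON = {"hello": "HELLO", "my": "MY", "name": "NAME", "is": None}

    def text_to_gloss(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        # unknown words fall back to an uppercase gloss placeholder
        glosses = [SIGN_LEXICON.get(t, t.upper()) for t in tokens]
        return [g for g in glosses if g is not None]

    print(text_to_gloss("Hello, my name is Ada."))  # ['HELLO', 'MY', 'NAME', 'ADA']

The trained NLP model would replace this lookup with sequence-to-sequence mapping that handles word order and grammatical structure.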

Blind Source Separation Using Relative Room Impulse Response (RRIR) Estimation

Description

The objective of this project is to develop a blind source separation (BSS) system that leverages Relative Room Impulse Response (RRIR) from a two-microphone array to separate overlapping speech in reverberant environments. The process involves recording Room Impulse Responses (RIR) using a controlled setup, computing RRIR to model cross-talk, and integrating this information into a Frequency-Domain Independent Component Analysis (FD-ICA) framework as a constraint. The system will be tested in both simulated and real-world acoustic settings, and its performance will be evaluated using metrics like Signal-to-Distortion Ratio (SDR) and Perceptual Evaluation of Speech Quality (PESQ). The expected outcome is a novel RRIR-guided ICA method that improves speaker separation by reducing cross-talk and enhancing audio quality compared to traditional ICA-based BSS techniques.
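As one concrete piece of the evaluation stage, separation quality can be scored with the BSS Eval metrics. A minimal sketch, assuming the mir_eval package is installed; the (n_sources, n_samples) array layout is its expected input format:

    import numpy as np
    import mir_eval

    def separation_scores(references, estimates):
        # references/estimates: arrays of shape (n_sources, n_samples)
        sdr, sir, sar, perm = mir_eval.separation.bss_eval_sources(
            np.asarray(references), np.asarray(estimates))
        return {"SDR": sdr, "SIR": sir, "SAR": sar, "permutation": perm}

PESQ would be scored with a separate tool (for example the pesq Python package), since mir_eval does not provide it.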

Collection and Preparation of Multichannel Audio Datasets

Description

The objective of this project is to collect or create comprehensive multichannel audio datasets that accurately simulate diverse and dynamic acoustic environments. These environments may include changing room acoustics, varying speaker configurations, and mobile sound sources to reflect real-world complexity. The focus will be on ensuring high-quality recordings, proper microphone array setups, and detailed metadata. The expected outcome is a well-structured, annotated dataset that can be used for training, testing, and benchmarking algorithms in applications such as source separation, localization, beamforming, and audio scene analysis.
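One way the capture-plus-metadata workflow might look in practice, assuming the sounddevice and soundfile packages and an interface exposing four input channels; the array geometry and room label are illustrative:

    import json
    import sounddevice as sd
    import soundfile as sf

    FS, CHANNELS, SECONDS = 48000, 4, 10

    audio = sd.rec(int(FS * SECONDS), samplerate=FS, channels=CHANNELS)
    sd.wait()
    sf.write("take_001.wav", audio, FS, subtype="PCM_24")

    metadata = {  # a sidecar file keeps every take reproducible
        "file": "take_001.wav", "fs": FS, "channels": CHANNELS,
        "room": "lab A",  # illustrative label
        "array_geometry_m": [[0, 0, 0], [0.05, 0, 0],
                             [0.10, 0, 0], [0.15, 0, 0]],
        "sources": ["speaker walking left to right"],
    }
    with open("take_001.json", "w") as f:
        json.dump(metadata, f, indent=2)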

Implement a rate control algorithm based on perceptual quality metrics (VMAF, SSIM, LPIPS) instead of PSNR

Description

This project aims to implement a rate control algorithm that optimizes video encoding based on perceptual quality metrics like VMAF, SSIM, and LPIPS, rather than relying on PSNR. A Python-based encoder will be developed, dynamically adjusting bitrate using integrated tools such as FFmpeg, OpenCV, and machine learning models. The primary goals are to enhance perceived visual quality at lower bitrates and to minimize bitrate fluctuations in streaming scenarios, ultimately improving user experience.
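A hedged sketch of the core control loop: encode at a candidate bitrate, score the result with FFmpeg's libvmaf filter, and lower the bitrate until a quality floor is reached. It assumes an FFmpeg build with libvmaf enabled; the 93.0 target and 0.8 step are illustrative choices:

    import re
    import subprocess

    def encode(src, out, kbps):
        subprocess.run(["ffmpeg", "-y", "-i", src, "-c:v", "libx264",
                        "-b:v", f"{kbps}k", out], check=True)

    def vmaf_score(distorted, reference):
        proc = subprocess.run(
            ["ffmpeg", "-i", distorted, "-i", reference,
             "-lavfi", "libvmaf", "-f", "null", "-"],
            capture_output=True, text=True)
        return float(re.search(r"VMAF score: ([0-9.]+)", proc.stderr).group(1))

    bitrate = 4000                      # starting bitrate in kbit/s
    while bitrate > 500:
        encode("reference.mp4", "candidate.mp4", bitrate)
        if vmaf_score("candidate.mp4", "reference.mp4") < 93.0:
            break                       # quality floor reached; keep last rate
        bitrate = int(bitrate * 0.8)    # quality still fine; spend fewer bits

SSIM or LPIPS could be swapped into the same loop; only the scoring function changes.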

Implement and evaluate lapped transforms for video coding, comparing them with standard DCT-based approaches

Description

The objective of this project is to implement and evaluate lapped transforms for video coding, comparing their performance with traditional block-based DCT methods. The focus will be on implementing lapped orthogonal transforms and analyzing their impact on compression quality using metrics like PSNR, SSIM, and perceptual quality. Key goals include reducing blocking artifacts and enhancing visual quality, especially in low-bit-rate video scenarios.
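For intuition, the sketch below implements a one-dimensional lapped transform, the MDCT with a sine window, whose 50% overlap-add round trip reconstructs the signal perfectly because the window satisfies the Princen-Bradley condition. A 2D, image-block version of the same idea is what the project would compare against non-overlapping DCT blocks:

    import numpy as np

    def mdct_matrix(N):
        n = np.arange(2 * N)[None, :]
        k = np.arange(N)[:, None]
        return np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))

    def mdct_roundtrip(x, N):
        """Forward/inverse MDCT with 50% overlap-add; blocks of 2N samples
        yield N coefficients each (critical sampling)."""
        w = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))  # sine window
        C = mdct_matrix(N)
        x = np.concatenate([np.zeros(N), x, np.zeros(N)])  # boundary padding
        y = np.zeros_like(x)
        for start in range(0, len(x) - 2 * N + 1, N):
            block = x[start:start + 2 * N]
            coeffs = C @ (w * block)                            # analysis
            y[start:start + 2 * N] += (2.0 / N) * w * (C.T @ coeffs)  # synthesis
        return y[N:-N]

    x = np.random.randn(8 * 64)
    print(np.max(np.abs(x - mdct_roundtrip(x, 64))))  # near machine precision

The DCT baseline replaces the overlapped blocks with independent length-N transforms, which is exactly where blocking artifacts originate.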

Implement adaptive critically sampled filter banks for motion-compensated video compression

Description

The objective of this project is to implement adaptive, critically sampled filter banks to enhance motion-compensated video compression. The approach involves designing filters tailored to various motion types, such as slow or fast-moving scenes, using SciPy and OpenCV. The performance will be evaluated by comparing the motion prediction efficiency against HEVC standards. The primary goals are to improve the accuracy of motion compensation and reduce bit rates, particularly in high-motion video sequences.
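A strongly simplified sketch of the adaptation idea: estimate per-frame motion with OpenCV's Farneback optical flow and switch between two candidate kernels accordingly. The kernels and the speed threshold are illustrative placeholders, not designed critically sampled filter banks:

    import cv2
    import numpy as np

    SLOW_KERNEL = np.ones((5, 5), np.float32) / 25.0   # longer support
    FAST_KERNEL = np.ones((3, 3), np.float32) / 9.0    # shorter support

    def filtered_frame(prev_gray, curr_gray, speed_thresh=1.0):
        flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        speed = np.linalg.norm(flow, axis=2).mean()    # mean motion magnitude
        kernel = FAST_KERNEL if speed > speed_thresh else SLOW_KERNEL
        return cv2.filter2D(curr_gray.astype(np.float32), -1, kernel)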

Develop a web tool for video analysis with transformations, filtering, convolution, and effects using a Flask back-end

Description

The goal of this project is to develop a web-based tool for real-time video analysis, incorporating advanced video transformations, filtering, convolution, and special effects. The system will be built on a Flask back-end and will leverage OpenCV and FFmpeg for efficient video processing. The tool will support operations such as frame-by-frame manipulation, applying visual effects, performing convolutions for edge detection or blurring, and other analytical transformations. A RESTful Flask API will serve as the backbone, interfacing with a responsive and user-friendly web interface where users can upload videos, apply desired operations, and view results in real-time or download them. To ensure the tool performs efficiently with large video files, optimizations like streaming processing, asynchronous handling, and file caching will be integrated. Deployment will be handled using a scalable solution like Docker or a cloud platform, and comprehensive documentation will be provided to assist users and developers. The final outcome will be a powerful, interactive platform for video analysis and manipulation with a streamlined, Flask-based architecture.
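A minimal sketch of one such route, assuming Flask and OpenCV: the client uploads a video, a convolution is applied frame by frame, and the processed file is returned (upload validation, streaming, and caching are omitted):

    import cv2
    import numpy as np
    from flask import Flask, request, send_file

    app = Flask(__name__)

    @app.route("/blur", methods=["POST"])
    def blur():
        request.files["video"].save("in.mp4")
        cap = cv2.VideoCapture("in.mp4")
        fps = cap.get(cv2.CAP_PROP_FPS)
        w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        out = cv2.VideoWriter("out.mp4", cv2.VideoWriter_fourcc(*"mp4v"),
                              fps, (w, h))
        kernel = np.ones((5, 5), np.float32) / 25.0   # simple box-blur example
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            out.write(cv2.filter2D(frame, -1, kernel))
        cap.release()
        out.release()
        return send_file("out.mp4")

    if __name__ == "__main__":
        app.run(debug=True)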

2024

Fast Learning for Adaptive Source Separation in New Environments

Description

The project will focus on leveraging deep learning models for source separation that adapt to new environments in real time. The aim is to improve separation performance in complex and dynamic soundscapes (for example, a noisy environment with overlapping speakers and background speech from other people at a distance in a reverberant room) using deep learning models such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), or transformers.

Audio Classification for Environmental Monitoring

Description

Your environment is a public park. Create a sound file of fifteen minutes' duration that contains a variety of sound sources (such as people talking, someone calling out, birdsong, children shouting, and fountain sounds) and identify all the sound sources present in the environment. Also calculate each source's percentage contribution to the environment over the fifteen-minute (15 min) duration. Use a machine learning algorithm and train the model on four different environments.
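One plausible shape for the classification step, assuming librosa for features and scikit-learn for the model; the class list and classifier choice are illustrative, and the training data would come from the four labeled environments:

    import numpy as np
    import librosa
    from sklearn.ensemble import RandomForestClassifier

    CLASSES = ["speech", "birds", "children", "fountain"]  # illustrative labels

    def frame_features(path, fs=22050):
        y, _ = librosa.load(path, sr=fs)
        mfcc = librosa.feature.mfcc(y=y, sr=fs, n_mfcc=20)
        return mfcc.T          # one 20-dim feature row per analysis frame

    # clf = RandomForestClassifier().fit(train_X, train_y)  # trained elsewhere
    def source_percentages(clf, path):
        pred = clf.predict(frame_features(path))
        counts = np.array([(pred == c).sum() for c in CLASSES], float)
        return dict(zip(CLASSES, 100.0 * counts / counts.sum()))

The percentage contribution then falls out of the fraction of frames assigned to each class over the fifteen minutes.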

Build systems capable of removing unwanted noise from audio signals in real time

Description

You are in a situation where a public announcement is playing in a highly reverberant indoor scene (e.g., an airport), and the unwanted noise is background noise (e.g., people chatting, footsteps, and ambient pink noise). Your task is to develop an algorithm that removes all the unwanted sound and reproduces the announcement as clearly as possible, without much loss in audio quality compared to the original announcement.
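A classical baseline worth trying before deep learning is spectral subtraction. A minimal SciPy sketch, assuming the first couple of seconds contain background noise only; the 5% spectral floor is an illustrative choice:

    import numpy as np
    from scipy.signal import stft, istft

    def denoise(x, fs, noise_seconds=2.0, nperseg=1024):
        f, t, X = stft(x, fs, nperseg=nperseg)
        noise_frames = int(noise_seconds * fs / (nperseg // 2))  # hop = nperseg/2
        noise_mag = np.abs(X[:, :noise_frames]).mean(axis=1, keepdims=True)
        # subtract the noise estimate, keeping a small spectral floor
        mag = np.maximum(np.abs(X) - noise_mag, 0.05 * np.abs(X))
        _, y = istft(mag * np.exp(1j * np.angle(X)), fs, nperseg=nperseg)
        return y

Reverberation is not addressed by this baseline; dereverberation would be a separate stage on top of it.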

Separate mixed audio signals into individual sources, such as voices and background music

Description

You are hosting a party at your home, where your friends are chatting against a background of live instrumental music. Take a scene in which two of the friends are talking (two speech signals) while three of the friends play piano, trumpet, and guitar simultaneously. Your task is to generate this situation as a single-channel mixed audio file and develop an algorithm to separate each sound source (channel).
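Since the mixture is single-channel, multichannel methods such as ICA do not apply directly. One hedged baseline is non-negative matrix factorization on the magnitude spectrogram, sketched below with scikit-learn; real instruments usually need several NMF components per source, so n_sources here is only a first approximation:

    import numpy as np
    from scipy.signal import stft, istft
    from sklearn.decomposition import NMF

    def nmf_separate(mix, fs, n_sources=5, nperseg=2048):
        f, t, X = stft(mix, fs, nperseg=nperseg)
        mag = np.abs(X)
        model = NMF(n_components=n_sources, init="random",
                    max_iter=400, random_state=0)
        W = model.fit_transform(mag)        # spectral templates
        H = model.components_               # time activations
        parts = []
        for i in range(n_sources):
            mask = np.outer(W[:, i], H[i]) / (W @ H + 1e-9)  # Wiener-like mask
            _, y = istft(mask * X, fs, nperseg=nperseg)
            parts.append(y)
        return parts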

Room acoustics simulation models implemented in Python, such as the image source method

Description

Implement a room acoustics simulation tool using the image source method to model sound propagation and reflections in an enclosed room. The simulation will calculate the impulse response of at least three rooms of different sizes and with different acoustic treatments (such as a furnished room, an empty room, a large room with acoustic treatment, and a large room without acoustic treatment) for different sound source and receiver positions (at least four source positions and four receiver positions), helping to understand how reverberation, absorption, and reflections affect sound in rooms. Implement frequency-dependent absorption, where the absorption coefficients vary across frequency bands (low, mid, high).
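Rather than starting from scratch, a first version can lean on the pyroomacoustics package, which implements the image source method for shoebox rooms. A minimal sketch, where the geometry, absorption value, and reflection order are illustrative:

    import pyroomacoustics as pra

    room = pra.ShoeBox([6.0, 4.0, 3.0], fs=16000,
                       materials=pra.Material(0.3),  # flat absorption of 0.3
                       max_order=17)                 # image-source order
    room.add_source([2.0, 1.5, 1.2])
    room.add_microphone([4.0, 2.5, 1.2])
    room.compute_rir()
    rir = room.rir[0][0]    # impulse response: mic 0, source 0

pyroomacoustics also accepts frequency-dependent material definitions, which maps directly onto the band-dependent absorption requirement; alternatively, the simulation can be run once per band with band-specific coefficients.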

Construct a Room Impulse Response (RIR) using Ray Tracing (RT) Simulation Methods

Description

Implement a room acoustics simulation tool using ray tracing (RT) methods to model sound propagation and reflections in an enclosed room. The simulation will calculate the impulse response of at least three rooms of different sizes and with different acoustic treatments (such as a furnished room, an empty room, a large room with acoustic treatment, and a large room without acoustic treatment) for different sound source and receiver positions (at least four source positions and four receiver positions), helping to understand how reverberation, absorption, and reflections affect sound in rooms. Implement frequency-dependent absorption, where the absorption coefficients vary across frequency bands (low, mid, high). Extend the simulation to handle higher-order reflections more efficiently using ray tracing methods.
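The same pyroomacoustics shoebox model can be switched to hybrid image-source plus ray tracing, which is one way to reach higher-order reflections efficiently. A hedged sketch of the extra calls; the ray count, receiver radius, and material values are illustrative:

    import pyroomacoustics as pra

    room = pra.ShoeBox([10.0, 7.0, 4.0], fs=16000,
                       materials=pra.Material(0.2, 0.1),  # absorption, scattering
                       max_order=3,                       # low ISM order; RT does the rest
                       ray_tracing=True, air_absorption=True)
    room.set_ray_tracing(n_rays=10000, receiver_radius=0.5)
    room.add_source([2.0, 2.0, 1.5])
    room.add_microphone([8.0, 5.0, 1.5])
    room.compute_rir()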

A web-based tool using audio coders such as FFmpeg to analyze signals

Description

A web-based tool (with a back-end such as Flask and a front-end built with React or HTML, JS, and CSS) that uses audio coders such as FFmpeg to analyze signals in terms of bit rate, quantization, sampling, and different distortions, and to examine their effect on the audio.
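A minimal sketch of the analysis endpoint, assuming Flask on the back-end: the uploaded file is passed to ffprobe (part of the FFmpeg suite) and the extracted stream parameters are returned as JSON for the front-end to render:

    import json
    import subprocess
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    @app.route("/analyze", methods=["POST"])
    def analyze():
        request.files["audio"].save("upload.wav")
        probe = subprocess.run(
            ["ffprobe", "-v", "quiet", "-print_format", "json",
             "-show_format", "-show_streams", "upload.wav"],
            capture_output=True, text=True)
        info = json.loads(probe.stdout)
        s = info["streams"][0]
        return jsonify(codec=s.get("codec_name"),
                       sample_rate=s.get("sample_rate"),
                       channels=s.get("channels"),
                       bit_rate=info["format"].get("bit_rate"))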

Audio source separation with black box optimization in different reverberation environments

Description

Audio source separation with black-box optimization in reverberant environments involves using optimization algorithms to isolate distinct sound sources from a mixture, without needing detailed knowledge of the environment. In reverberant spaces, where echoes and reflections distort audio, this method adjusts parameters iteratively to improve source separation. By exploring different configurations and evaluating outcomes, black-box optimization can adapt to varying room acoustics and effectively separate desired audio signals, such as speech or music, from unwanted reverberations, even in complex acoustic settings.
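A toy illustration of the idea, assuming an instantaneous (non-reverberant) 2x2 mixture for simplicity: after whitening, a single unmixing rotation angle is searched with SciPy's differential evolution, and candidates are scored only through a black-box objective (output non-Gaussianity). A reverberant version would search filter parameters instead of one angle:

    import numpy as np
    from scipy.optimize import differential_evolution
    from scipy.stats import kurtosis

    rng = np.random.default_rng(0)
    s = np.vstack([np.sign(rng.standard_normal(20000)),   # source 1 (binary)
                   rng.uniform(-1, 1, 20000)])            # source 2 (uniform)
    x = np.array([[0.8, 0.6], [0.4, 0.9]]) @ s            # two-mic mixture

    # whiten the mixture so a pure rotation can finish the unmixing
    d, E = np.linalg.eigh(np.cov(x))
    xw = E @ np.diag(1.0 / np.sqrt(d)) @ E.T @ x

    def neg_kurtosis(theta):
        c, t = np.cos(theta[0]), np.sin(theta[0])
        y = np.array([[c, -t], [t, c]]) @ xw
        return -np.abs(kurtosis(y, axis=1)).sum()   # black-box objective

    result = differential_evolution(neg_kurtosis, bounds=[(0.0, np.pi)])
    print(result.x, -result.fun)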

Loudspeaker beamforming to separate two audio streams in different languages from the main mixed audio stream

Description

Loudspeaker beamforming is a technique used to enhance audio signal separation by directing sound energy toward a specific region or listener while minimizing interference from other directions. In the context of separating two audio streams in different languages from a mixed audio signal, beamforming can be employed to spatially isolate the sound sources. By using an array of loudspeakers, beamforming algorithms can shape the sound wavefronts, focusing on distinct sound sources based on their spatial position. This enables the separation of audio signals from two distinct language sources, effectively isolating them from the main mixed stream.

Separate mixed audio signals into individual sources, such as voices and background sounds (taking the cocktail party effect into consideration)

Description

The cocktail party effect, where multiple conversations or noises blend together, highlights the difficulty of isolating a single voice from surrounding distractions. Techniques like blind source separation, spatial filtering, and machine learning are commonly used to separate these sources by analyzing their characteristics, such as their location, frequency patterns, or temporal features. By accounting for the acoustic properties of the environment and leveraging advanced algorithms, these methods aim to improve clarity and intelligibility in mixed audio signals.
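The standard instantaneous-mixing toy version of this problem can be solved with FastICA from scikit-learn, sketched below with synthetic stand-ins for the sources; reverberant rooms need convolutive extensions of the same idea:

    import numpy as np
    from sklearn.decomposition import FastICA

    t = np.linspace(0, 8, 64000)
    s1 = np.sin(2 * np.pi * 220 * t)                 # "music" stand-in
    s2 = np.sign(np.sin(2 * np.pi * 3 * t))          # "speech" stand-in
    S = np.c_[s1, s2]
    X = S @ np.array([[1.0, 0.5], [0.4, 1.0]]).T     # two mixed observations

    ica = FastICA(n_components=2, random_state=0)
    recovered = ica.fit_transform(X)   # columns approximate s1 and s2
                                       # (up to permutation and scaling)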

Personalized HRTF (HRIR) synthesis using image processing and machine learning: capture the ear shape, calculate parameters, and select the best HRTF from a database; conduct a listening experiment to validate the best personalized HRTFs

Description

Capture the ear shape and calculate the parameters required to estimate the HRTF, then select the best HRTF from a database (arrange a dataset of 100 HRTFs from open sources). Conduct a MUSHRA listening experiment to validate the best personalized HRTFs. (The listening experiment must have at least 5 participants, and you have to estimate the HRTFs of each individual participant.)
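A hedged sketch of the database-selection step: normalize the anthropometric parameters extracted from the ear images and pick the nearest database entry. The parameter set and the file name are hypothetical placeholders for whatever the image-processing stage produces:

    import numpy as np

    # hypothetical: one row per database subject; columns could be, e.g.,
    # pinna height, cavum concha depth, and ear width in mm
    database_params = np.load("hrtf_anthropometrics.npy")   # shape (100, 3)

    def best_hrtf_index(subject_params, db=database_params):
        mean, std = db.mean(axis=0), db.std(axis=0)
        db_std = (db - mean) / std                  # normalize each feature
        subj = (np.asarray(subject_params) - mean) / std
        return int(np.argmin(np.linalg.norm(db_std - subj, axis=1)))

The index returned selects the HRTF set that then goes into the MUSHRA comparison.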

Reconstruct missing or corrupted parts of signals using interpolation or estimation techniques. Recovering missing samples in audio recordings for seamless playback

Description

In real-world audio recordings, there are often instances where parts of the signal are missing or corrupted due to factors such as noise, data loss, or hardware limitations. This assignment challenges students to recover these missing sections, allowing the audio to play back smoothly. Students will learn to implement interpolation and estimation techniques, such as linear interpolation, spline interpolation, or advanced machine learning methods, to estimate and fill in the missing audio samples. The process will involve analyzing the surrounding audio context, selecting the appropriate method for reconstruction, and testing the results to ensure the reconstructed audio is as natural and seamless as possible.
By the end of this assignment, students will have a working understanding of how to effectively recover missing parts of audio signals and ensure that the playback remains continuous and natural-sounding.
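A minimal sketch of the interpolation route, assuming SciPy is available and that missing samples are marked as NaN; cubic splines handle short gaps well, while longer gaps call for the model-based or learning methods mentioned above:

    import numpy as np
    from scipy.interpolate import CubicSpline

    def fill_gaps(x):
        idx = np.arange(len(x))
        good = ~np.isnan(x)                      # mask of intact samples
        spline = CubicSpline(idx[good], x[good])
        y = x.copy()
        y[~good] = spline(idx[~good])            # estimate the missing samples
        return y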

Analyze sequential data to identify patterns, trends, and anomalies over time

Description

Develop a Python-based project that uses ARIMA models to forecast time series data (e.g., weather). The goal is to analyze historical data to identify trends, seasonal patterns, and anomalies, and then use this information to make future predictions. After fitting the ARIMA model and forecasting, evaluate the model's performance using error metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), or Root Mean Squared Error (RMSE). Integrate real-time data streams (e.g., weather updates) and make rolling predictions using the ARIMA model.
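A minimal statsmodels sketch of the fit-forecast-evaluate loop; series is assumed to be a pandas Series of, e.g., daily temperatures, and the (2, 1, 2) order is an illustrative starting point rather than a tuned choice:

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    def fit_and_score(series, order=(2, 1, 2), horizon=14):
        train, test = series[:-horizon], series[-horizon:]
        model = ARIMA(train, order=order).fit()
        forecast = model.forecast(steps=horizon)
        err = forecast.values - test.values
        mae = np.mean(np.abs(err))
        rmse = np.sqrt(np.mean(err ** 2))
        return forecast, mae, rmse

For rolling predictions on streaming data, the fitted results object can be extended with newly arrived observations (statsmodels exposes an append method on the results) before forecasting again.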

Process signals received by sensor arrays to estimate the direction of arrival of incoming signals. Implementing beamforming techniques for localizing sound sources in audio recordings

Description

In this project, you will process audio signals received by sensor arrays to estimate the direction of arrival (DOA) of incoming sound sources. By implementing beamforming techniques such as Delay-and-Sum and Minimum Variance Distortionless Response (MVDR), you will localize sound sources in audio recordings, enhancing the signal from the desired direction while reducing interference. The goal is to accurately determine the location of sound sources and improve the clarity of the audio by applying spatial filtering. You will also evaluate the performance of the beamforming methods and visualize the results.
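A narrowband sketch of the scanning step for a uniform linear array, assuming far-field sources, known microphone spacing d, and STFT snapshots of a single frequency bin; the conventional (delay-and-sum) beamformer power is evaluated over candidate angles and peaks near the true DOA:

    import numpy as np

    def doa_spectrum(X, freq, d=0.05, c=343.0,
                     angles=np.linspace(-90, 90, 181)):
        """X: (n_mics, n_snapshots) complex STFT values at `freq` Hz."""
        n_mics = X.shape[0]
        R = X @ X.conj().T / X.shape[1]          # spatial covariance estimate
        power = []
        for a in np.deg2rad(angles):
            delays = d * np.arange(n_mics) * np.sin(a) / c
            v = np.exp(-2j * np.pi * freq * delays) / np.sqrt(n_mics)
            power.append(np.real(v.conj() @ R @ v))  # beamformer output power
        return angles, np.array(power)               # argmax gives the DOA

MVDR replaces the steering-vector power with one that uses the inverse covariance, sharpening the peaks at the cost of matrix inversion.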

Design Ultrasound Based 3D Indoor Positioning System

Description

This project involves designing an indoor positioning system that uses ultrasound technology to accurately determine the 3D location of objects or individuals within a defined indoor space. By leveraging the time-of-flight of ultrasound signals between transmitters and receivers, the system can calculate distances and triangulate the position in three dimensions. The project will focus on optimizing signal accuracy, minimizing interference, and implementing real-time tracking. The goal is to create a reliable and efficient solution for indoor navigation, suitable for applications like asset tracking, navigation in large buildings, or augmented reality experiences.
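The position solve itself is a small least-squares problem once times of flight are measured. A hedged sketch with SciPy, assuming four fixed beacons at known coordinates and one-way time-of-flight measurements; the beacon layout is illustrative:

    import numpy as np
    from scipy.optimize import least_squares

    beacons = np.array([[0, 0, 2.5], [5, 0, 2.5],
                        [0, 4, 2.5], [5, 4, 2.5]])   # known positions (m)
    c_sound = 343.0                                  # m/s at roughly 20 C

    def locate(tof_seconds, guess=(2.5, 2.0, 1.0)):
        dists = c_sound * np.asarray(tof_seconds)    # one-way time of flight
        residual = lambda p: np.linalg.norm(beacons - p, axis=1) - dists
        return least_squares(residual, guess).x      # best-fit 3-D position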

Sound Event Localization and Detection in general conversation or music-listening environments, such as glass breaking on the floor or a window during other voice and music activity

Description

This project focuses on detecting and localizing specific sound events, such as the breaking of glass, within environments with background noise like conversations or music. By using advanced audio processing and machine learning techniques, the system will differentiate and pinpoint events of interest amidst overlapping sounds. The goal is to accurately identify the occurrence and location of these sound events in real-time, even in dynamic and noisy environments, making it applicable for applications like security systems, emergency response, or smart home technologies.