* Search and remove audio sections
From: Dani @ 2020-11-17 15:52 UTC
To: sox-users@lists.sourceforge.net

Hi,

I have a bunch of old MP3 podcasts that have ads in them, at the beginning and the end. These are short bits of podcasts (up to 10 minutes each), and the ads are quite distracting.
The ads are about 30 seconds long and usually have a small familiar jingle before they start and after they end.

I was wondering if there is an ability using SoX (or other tool) to do a "search and remove" on these, in a batch format - that would apply to hundreds of these files.
Something in the form of:

%jingle% -> the familiar jingle at the start and end of the ad, so... mimicking a made-up wildcard/regex search:

Search for: (%jingle% * %jingle%) ( * ) (%jingle% * %jingle%)
Replace: ($2) - meaning - I leave only the center part.

Is that something that can be done with audio?

Thanks,
Dani.

_______________________________________________
Sox-users mailing list
Sox-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sox-users
* Re: Search and remove audio sections
From: Jeremy Nicoll - ml sox users @ 2020-11-17 20:42 UTC
To: sox-users

On 2020-11-17 15:52, Dani wrote:
> Is that something that can be done with audio?

I don't know.

If the jingles at the start and end of each ad are binary equal (which they might be if an automated system placed copies of their contents in the files), then in theory one could use a conventional file search utility to locate each one.

Translating the byte offset from the start of an mp3 into the hh:mm:ss (or sample count) position might be complicated, especially if the mp3s are stored with a variable bit rate rather than a fixed one.

Recognising the jingles might be hard if any aspect of mp3 compression of the audio means that the same jingle doesn't appear in exactly the same bit and byte pattern in each file.
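A minimal Python sketch of the byte-search idea, for illustration only. It assumes you already have the jingle as a raw byte string cut from one of the files (the `jingle` bytes are hypothetical), and that the file is constant bit rate, so a byte offset maps to time as (offset - audio_start) × 8 / bitrate. The start of the audio data (after any ID3 tag) and the bitrate are inputs you would have to supply yourself; frame padding and VBR make the conversion approximate at best.

```python
def find_all(data: bytes, pattern: bytes) -> list[int]:
    """Return every byte offset at which `pattern` occurs in `data`."""
    offsets = []
    start = data.find(pattern)
    while start != -1:
        offsets.append(start)
        start = data.find(pattern, start + 1)  # allow overlapping matches
    return offsets


def cbr_offset_to_seconds(offset: int, audio_start: int, bitrate_kbps: int) -> float:
    """Estimate the playback time of a byte offset in a constant-bit-rate MP3.

    Assumes bitrate_kbps * 1000 bits of data per second of audio, so this is
    only an estimate: padding, VBR and extra tags all throw it off.
    """
    return (offset - audio_start) * 8 / (bitrate_kbps * 1000)


def format_hms(seconds: float) -> str:
    """Format seconds as hh:mm:ss.ff."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h):02d}:{int(m):02d}:{s:05.2f}"
```

For example, at 128 kbps a match 480000 bytes into the audio data corresponds to 30 seconds, which `format_hms` renders as `00:00:30.00`.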
If the files contain, say, continuous music (or maybe even speech), then a tiny gap (hopefully of digital silence), then a jingle, then a second tiny gap, then more content, I think you could possibly look for the positions of the gaps.

If there are no gaps, or if eg the transition from speech or music to jingle usually involves a jump in volume, you could look for those instead.

If you knew where they seemed to be, you could sanity-check them, ie decide that they probably do enclose an ad if they are (say) between 29.3 and 30.7 secs apart.

If that was ok, you could generate "trim" commands to remove them.

It might be possible, if you first did some fairly extreme eq changes on (a copy of) the file, eg to make speech sounds very quiet but leave music at a higher level, to make it easier to spot the transitions.

A quick look at the sox manual suggests that the "silence" or "vad" effects, applied creatively (perhaps just to small snippets of each file), might also help to identify where things are. That's because if an effect can remove a gap (if there is one) from (say) a 1 sec piece of audio, then that's easy to identify: the output comes out shorter than the input. You'd certainly need to experiment...

I've a script (written in ooRexx, for use on a Windows system) that essentially issues

    sox inputfile outputfile "trim" trimparm "stats"

with trimparm defined to extract eg a 3-second period of the inputfile (eg from 15 seconds in, through to one sample less than 18 seconds in). It then reads the "stats" output and stores the peak level information. It does that for every 3-second period of data between two points in the file. Thus from each short chunk of the file it produces a line of information like

    Pk lev dB -21.31 -21.31 -24.15

(which you'll see is one line of the stats output, if you look in the sox manual), which is, first, the highest level from both/either channel, then the highest level from the left channel, then the highest from the right channel.
The script also calculates the difference between the left and right peak levels. Across the whole set of slices it also tracks the minimum, maximum and average of each column. The result is a file containing eg

(warning: these lines might wrap and need to be copied elsewhere to read them more easily)

    sl  hh:mm:ss.fr    hh:mm:ss.fr       Both    Left   Right    Diff
    --  -----------    --------------  ------  ------  ------  ------
     1  00:00:00.00 to 00:00:03.00-1s  -33.46  -33.46  -34.27    0.81
     2  00:00:03.00 to 00:00:06.00-1s  -23.06  -23.06  -28.17    5.11
     3  00:00:06.00 to 00:00:09.00-1s  -23.01  -23.01  -25.27    2.26
     4  00:00:09.00 to 00:00:12.00-1s  -16.95  -16.95  -19.82    2.87
     5  00:00:12.00 to 00:00:15.00-1s  -18.51  -18.51  -20.37    1.86
     6  00:00:15.00 to 00:00:18.00-1s  -25.16  -25.16  -26.32    1.16
     7  00:00:18.00 to 00:00:21.00-1s  -22.36  -22.36  -28.50    6.14
     8  00:00:21.00 to 00:00:24.00-1s  -21.64  -21.64  -25.37    3.73
     9  00:00:24.00 to 00:00:27.00-1s  -21.30  -21.30  -24.37    3.07
    10  00:00:27.00 to 00:00:30.00-1s  -22.11  -22.11  -24.41    2.30
    ...

    Minima:                            -36.70  -36.70  -37.60    0.06
    at slice:                              27      27      27      13

    Maxima:                            -13.84  -13.84  -14.58    6.14
    at slice:                              11      11      11       7

    Average:                           -23.50  -21.27  -26.06    2.56

(where "sl" means "slice", and in "hh:mm:ss.fr" the "fr" means the fraction of a second, ie tenths and hundredths).

I wrote this because I was trying to process recordings of a choir, and while the stats effect applied to each whole song told me that one side of the choir was mostly louder than the other, this was not true for every song they'd performed. The script to look at the situation every 3 secs helped me find out why, for example whether applause levels or random audience noise were causing peaks that weren't characteristic of the music.

I picked 3-second slices for no good reason. One could use every tenth of a second, but then there'd be 30 times more results... One could possibly use a results file like this to look for predictable level changes (give or take half a dB or so).

It just gave me a better idea of what was going on.
However, it was also in a format that could have been read by another program if it was trying to detect moments of interest.

-- 
Jeremy Nicoll - my opinions are my own
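Jeremy's script is written in ooRexx and isn't shown, so the following Python sketch only mirrors the bookkeeping he describes; it is not his code. It assumes each slice's stats report has been captured as text (the stats effect prints to stderr, and `sox FILE -n trim START 3 stats`, with `-n` as the null output, is one way to run it per slice). It takes the numbers from the `Pk lev dB` line and computes the Diff column plus per-column minima and maxima (with 1-based slice numbers) and averages. Treating Diff as |Left - Right| is an assumption that matches the rows in the table above.

```python
def slice_commands(path: str, start: float, end: float,
                   width: float = 3.0) -> list[list[str]]:
    """Build one `sox PATH -n trim START WIDTH stats` command per slice."""
    cmds = []
    i = 0
    while start + i * width < end:
        t = start + i * width  # computed from i to avoid float drift
        cmds.append(["sox", path, "-n", "trim", f"{t:g}", f"{width:g}", "stats"])
        i += 1
    return cmds


def parse_pk_lev(stats_text: str) -> tuple[float, ...]:
    """Return the numbers on the 'Pk lev dB' line of a sox stats report."""
    for line in stats_text.splitlines():
        fields = line.split()
        if fields[:3] == ["Pk", "lev", "dB"]:
            return tuple(float(v) for v in fields[3:])  # "-inf" parses too
    raise ValueError("no 'Pk lev dB' line in stats output")


def summarize(rows: list[tuple[float, float, float]]):
    """rows: (both, left, right) peak levels per slice, in slice order.

    Returns the rows with a Diff column appended, then per-column
    (value, slice) minima, (value, slice) maxima, and averages.
    """
    table = [(both, left, right, round(abs(left - right), 2))
             for both, left, right in rows]
    columns = list(zip(*table))
    minima = [(min(c), c.index(min(c)) + 1) for c in columns]
    maxima = [(max(c), c.index(max(c)) + 1) for c in columns]
    averages = [round(sum(c) / len(c), 2) for c in columns]
    return table, minima, maxima, averages
```

Each command could be run with `subprocess.run(cmd, capture_output=True, text=True)`, passing its `.stderr` to `parse_pk_lev`. The resulting rows are also in a form another program could scan for sudden level jumps.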
* Re: Search and remove audio sections
From: Jeff Learman @ 2020-11-17 21:40 UTC
To: sox-users

How many is "a bunch"? Unless you have hundreds, or unless there's some real obvious audible flag to indicate the ads, it'd be easier to find a good simple GUI audio editor that lets you simply select and delete the ads.

My guess is that this really isn't going to be easy to do, and you'll have to spend the time to audit the results of trials to see how many files they worked on, and in the end it'll take a lot of time.
* Re: Search and remove audio sections
From: Jeremy Nicoll - ml sox users @ 2020-11-18 0:25 UTC
To: sox-users

On 2020-11-17 21:40, Jeff Learman wrote:
> How many is "a bunch"? Unless you have hundreds, or unless there's
> some real obvious audible flag to indicate the ads, it'd be easier to
> find a good simple GUI audio editor that lets you simply select and
> delete the ads.

Certainly that would make removal of the ads easier, but finding them is still going to be tricky if the podcast files are random extracts of longer files (ie the ads may be anywhere at all in them, not eg always around the, say, 4-minute point). Assuming that one can near-instantly identify something as jingle or ad if one clicks play at some point on a timeline, one's still going to have to do that manually maybe 30 times (ie every 20 seconds) in a ten-minute file.

It might still be easier to use a script to generate (say) 2-second snips of each source file at 20-25 second intervals, then concatenate them and listen to a set to find out if any of them seem to contain ads/jingles.

> My guess is that this really isn't going to be easy to do, and you'll
> have to spend the time to audit the results of trials to see how many
> files they worked on, and in the end it'll take a lot of time.

I think that too.

-- 
Jeremy Nicoll - my opinions are my own
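The snip-and-concatenate step might not even need separate files: according to the sox manual, the trim effect starts copying at its first position and then alternates between discarding and copying at each further position, with a leading `=` marking an absolute position. Here is a minimal Python sketch that builds such a command, assuming you already know each file's duration in seconds (e.g. from `soxi -D`). The 2-second snip and 20-second interval are just the values suggested above, and the file names are placeholders.

```python
def preview_command(src: str, dst: str, duration: float,
                    snip: float = 2.0, interval: float = 20.0) -> list[str]:
    """Build `sox SRC DST trim ...` keeping `snip` seconds out of every `interval`.

    Positions come in (start, end) pairs: trim copies from each start to its
    end and discards until the next start, so DST holds the snips back to back.
    """
    positions = []
    start = 0.0
    while start < duration:
        end = min(start + snip, duration)
        positions += [f"={start:g}", f"={end:g}"]
        start += interval
    return ["sox", src, dst, "trim", *positions]
```

Because the positions come in pairs, the final one stops the copying, so nothing after the last snip reaches the output.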
* Re: Search and remove audio sections
From: Dani @ 2020-11-18 8:01 UTC
To: sox-users@lists.sourceforge.net

Thank you all.
Yeah - I figured it might be a very complicated task.
I might just resort to listening to the ads after all, as much as I hate them...
Or - going the long way and manually removing those by the hundreds...

Thanks again!
* Re: Search and remove audio sections
From: Rafal Maszkowski @ 2020-11-19 2:21 UTC
To: sox-users

On Tue, Nov 17, 2020 at 03:52:52PM +0000, Dani wrote:
> Is that something that can be done with audio?

I am very interested in comparing recordings and in searching for sound samples in recordings. I worked quite a lot on this last year, and my work may be sufficient for my purposes, but it is unfinished and I have not used it for a year. So it is not a ready-made solution, but something you can try to work on and improve. It should not be very difficult to adapt it to your purposes, but more work is needed to make it universal.

How it works: I really miss MPEG-7 in sox. There is not even a beginning of it there, so I have used the mpeg7ease "ease" program to extract audio spectrum envelopes (ASEs) of the sound (the aselnb script), and my own program to compare them or search for extracts.
Generating ASEs (FILE may be any sound or video file that ffmpeg understands; in the end only the audio is used):

    aselnb FILE…

Then we can compare various ~/.ease/cache/*.ease files and get either a positive result, with the time shift between them, or a negative one:

    aselnbcmp -v -c "$name1" "$name2"
    aselnbcmp -v -c -P 16 -S 8 "$name1" "$name2"

… or search for one in another:

    aselnbcmp -v -s ~/.ease/cache/needle.ease ~/.ease/cache/haystack.ease

The software I have used or written can be found at:
ftp://ftp.icm.edu.pl/private/rzm/patches/ase/

R.

-- 
"He fights with utter zeal against the intellect" - from the personnel files of Prof. A. Baeumler
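The message doesn't describe how aselnbcmp does its matching, so the following is only a generic illustration of "search one in another", not Rafal's algorithm. It is a Python sketch of sliding normalized cross-correlation over one scalar feature per frame. Real ASE features have several frequency bands per frame, and the frame duration needed to turn the returned offset into seconds would have to come from whatever extracted the features; both are assumptions here.

```python
import math


def best_match(needle: list[float], haystack: list[float]) -> tuple[int, float]:
    """Return (frame offset, score) of the best normalized cross-correlation match.

    The score is a Pearson correlation in [-1, 1]; values near 1 suggest the
    needle occurs at that offset, even if its level was scaled or shifted.
    """
    n = len(needle)
    if n == 0 or n > len(haystack):
        raise ValueError("needle must be non-empty and no longer than haystack")
    mean_n = sum(needle) / n
    dn = [x - mean_n for x in needle]
    norm_n = math.sqrt(sum(x * x for x in dn))
    best = (-1, -math.inf)
    for off in range(len(haystack) - n + 1):
        window = haystack[off:off + n]
        mean_w = sum(window) / n
        dw = [x - mean_w for x in window]
        norm_w = math.sqrt(sum(x * x for x in dw))
        if norm_n == 0 or norm_w == 0:
            continue  # a flat segment has no defined correlation
        score = sum(a * b for a, b in zip(dn, dw)) / (norm_n * norm_w)
        if score > best[1]:
            best = (off, score)
    return best
```

Mean removal and normalization make the score insensitive to a louder or quieter copy of the jingle. A fixed threshold on the score (say, above 0.9) would still need tuning on real files.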
* Re: Search and remove audio sections
From: Jan Stary @ 2020-11-20 14:18 UTC
To: sox-users

On Nov 17 15:52:52, Dani@softco.co.il wrote:
> I have a bunch of old MP3 podcasts that have ads in them,
> at the beginning and the end.

Are the ads always at the beginning and at the end, and never anywhere else?

> The ads are about 30 seconds long and usually
> have a small familiar jingle before they start and after they end.

"Usually": so not all of them have the jingle(s), or the jingle is not always the same, right?

> I was wondering if there is an ability using SoX (or other tool)
> to do a "search and remove" on these, in a batch format
> - that would apply to hundreds of these files.

General audio search is quite hard. But if you intend to actually listen to the 10 minutes of podcast, removing the ads manually from the beginning and end is a matter of seconds on top of those 10 minutes.

On Nov 17 20:42:51, jn.ml.sxu.88@wingsandbeaks.org.uk wrote:
> If the jingles at the start and end of each ad are binary equal (which
> they might be if an automated system placed copies of their contents
> in the files) then in theory one could use a conventional file search
> utility to locate each one.

A prerequisite of that would be that all the files are in the very same binary format. We know they are mp3s, but are they the same sample rate, bitrate, etc? Because even if the jingle is one and the same every time, it won't be the same once encoded into the individual mp3s.

> Recognising the jingles might be hard if any aspect of mp3 compression
> of the audio means that successive parts of jingles don't appear in the
> exact same bit- and byte- pattern in each file.

Yes, and they probably won't.
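Jan's question about sample rate and bitrate can be checked per file by reading the first MPEG audio frame header. Here is a minimal Python sketch, assuming MPEG-1 Layer III (the usual case for podcasts) with the audio data starting after an optional ID3v2 tag. It does not handle MPEG-2/2.5, free-format streams or VBR info frames. It also accepts the first sync word it finds, whereas a robust tool would confirm that further frames follow.

```python
# MPEG-1 Layer III bitrates (kbps) by 4-bit index; 0 = free format, 15 = invalid.
BITRATES_V1_L3 = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128,
                  160, 192, 224, 256, 320, None]
# MPEG-1 sample rates (Hz) by 2-bit index; 3 is reserved.
SAMPLE_RATES_V1 = [44100, 48000, 32000, None]


def id3v2_size(data: bytes) -> int:
    """Length in bytes of a leading ID3v2 tag (0 if there is none)."""
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for b in data[6:10]:  # four "syncsafe" bytes, 7 bits each
            size = (size << 7) | (b & 0x7F)
        footer = 10 if data[5] & 0x10 else 0  # footer flag (ID3v2.4)
        return 10 + size + footer
    return 0


def first_frame_info(data: bytes) -> dict:
    """Decode the first MPEG-1 Layer III frame header after any ID3v2 tag."""
    i = id3v2_size(data)
    while i + 4 <= len(data):
        b1, b2, b3 = data[i + 1], data[i + 2], data[i + 3]
        if data[i] == 0xFF and (b1 & 0xE0) == 0xE0:  # 11-bit frame sync
            version = (b1 >> 3) & 0x03  # 3 = MPEG-1
            layer = (b1 >> 1) & 0x03    # 1 = Layer III
            if version == 3 and layer == 1:
                bitrate = BITRATES_V1_L3[b2 >> 4]
                rate = SAMPLE_RATES_V1[(b2 >> 2) & 0x03]
                if bitrate and rate:
                    return {"offset": i, "bitrate_kbps": bitrate,
                            "sample_rate": rate,
                            "channel_mode": b3 >> 6}  # 3 = mono
        i += 1
    raise ValueError("no MPEG-1 Layer III frame header found")
```

Reading each whole file (`open(path, "rb").read()`) and comparing the returned dictionaries, ignoring `offset`, would show quickly whether the files even share an encoding. As Jan says, a shared encoding still does not guarantee identical bytes for the jingle.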
> If the files contain, say, continuous music (or maybe even speech) then
> there's a tiny gap (hopefully of digital silence) then a jingle then a
> second tiny gap then more content, I think you could possibly look for
> the positions of the gaps.

The ads are supposed to be at the beginning and end. But cutting at silence is what I would go for first, if there is a telling silence around the ads, of course.

Jan
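Jan's suggestion of cutting at silence, together with Jeremy's sanity check on gap spacing, can be sketched as follows. This is a hypothetical Python illustration, not a sox feature. The sox silence effect is the more direct route, and this sketch only makes the detection logic explicit. It assumes the audio has already been decoded to mono float samples in [-1, 1], for instance by converting with sox to WAV and reading it with Python's `wave` module. It finds quiet runs that last at least a minimum time and pairs gaps that are 29.3 to 30.7 s apart (Jeremy's window). The threshold and minimum gap length are guesses to tune by experiment.

```python
def find_gaps(samples: list[float], rate: int,
              threshold: float = 0.001, min_gap: float = 0.2) -> list[float]:
    """Midpoints (seconds) of runs where |sample| < threshold for >= min_gap s."""
    gaps = []
    run_start = None
    for i, s in enumerate(samples):
        quiet = abs(s) < threshold
        if quiet and run_start is None:
            run_start = i
        elif not quiet and run_start is not None:
            if (i - run_start) / rate >= min_gap:
                gaps.append((run_start + i) / 2 / rate)
            run_start = None
    if run_start is not None and (len(samples) - run_start) / rate >= min_gap:
        gaps.append((run_start + len(samples)) / 2 / rate)  # file ends quiet
    return gaps


def ad_spans(gaps: list[float], lo: float = 29.3,
             hi: float = 30.7) -> list[tuple[float, float]]:
    """Pairs of (sorted) gap times whose spacing is plausible for a ~30 s ad."""
    return [(a, b) for i, a in enumerate(gaps) for b in gaps[i + 1:]
            if lo <= b - a <= hi]


def keep_middle_command(src: str, dst: str,
                        intro_end: float, outro_start: float) -> list[str]:
    """sox command keeping only the audio between the opening and closing ads."""
    return ["sox", src, dst, "trim", f"={intro_end:.2f}", f"={outro_start:.2f}"]
```

With Dani's layout, one would take the later gap of the first span as the end of the opening ad and the earlier gap of the last span as the start of the closing ad. Those two times then go to `keep_middle_command`. Files that yield no plausible spans are the ones to check by ear.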
Thread overview: 7+ messages
2020-11-17 15:52 Search and remove audio sections Dani
2020-11-17 20:42 ` Jeremy Nicoll - ml sox users
2020-11-17 21:40   ` Jeff Learman
2020-11-18  0:25     ` Jeremy Nicoll - ml sox users
2020-11-18  8:01       ` Dani
2020-11-19  2:21 ` Rafal Maszkowski
2020-11-20 14:18 ` Jan Stary
Code repositories for project(s) associated with this public inbox:
https://80x24.org/mirrors/sox.git