From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
Date: Tue, 17 Nov 2020 20:42:51 +0000
From: Jeremy Nicoll - ml sox users <jn.ml.sxu.88@wingsandbeaks.org.uk>
To: sox-users@lists.sourceforge.net
In-Reply-To: <DB8P195MB0741A75BDDBFA908FCBBD407F3E20@DB8P195MB0741.EURP195.PROD.OUTLOOK.COM>
References: <DB8P195MB0741A75BDDBFA908FCBBD407F3E20@DB8P195MB0741.EURP195.PROD.OUTLOOK.COM>
Message-ID: <5e992665db5dc96823d9ef6830430718@wingsandbeaks.org.uk>
X-Sender: jn.ml.sxu.88@wingsandbeaks.org.uk
User-Agent: Roundcube Webmail/1.3.15
X-Headers-End: 1kf7or-001bbP-2A
Subject: Re: Search and remove audio sections
X-BeenThere: sox-users@lists.sourceforge.net
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: <sox-users.lists.sourceforge.net>
List-Unsubscribe: <https://lists.sourceforge.net/lists/options/sox-users>,
 <mailto:sox-users-request@lists.sourceforge.net?subject=unsubscribe>
List-Archive: <http://sourceforge.net/mailarchive/forum.php?forum_name=sox-users>
List-Post: <mailto:sox-users@lists.sourceforge.net>
List-Help: <mailto:sox-users-request@lists.sourceforge.net?subject=help>
List-Subscribe: <https://lists.sourceforge.net/lists/listinfo/sox-users>,
 <mailto:sox-users-request@lists.sourceforge.net?subject=subscribe>
Reply-To: sox-users@lists.sourceforge.net
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Errors-To: sox-users-bounces@lists.sourceforge.net

On 2020-11-17 15:52, Dani wrote:
> Hi,
> 
> I have a bunch of old MP3 podcasts that have ads in them, at the
> beginning and the end. These are short bits of podcasts (up to 10
> minutes each), and the ads are quite distracting.
> The ads are about 30 seconds long and usually have a small familiar
> jingle before they start and after they end.

> I was wondering if there is an ability using SoX (or other tool) to do
> a "search and remove" on these, in a batch format - that would apply
> to hundreds of these files.
> Something in the form of:
> %jingle% -> the familiar jingle at the start and end of the ad, so...
> mimicking a made-up wildcard/regex search:
> Search for:  (%jingle% * %jingle%) ( * ) (%jingle% * %jingle%)
> Replace: ($2) - meaning - I leave only the center part.
> Is that something that can be done with audio?

I don't know.

If the jingles at the start and end of each ad are binary equal (which
they might be if an automated system placed copies of their contents
in the files) then in theory one could use a conventional file search
utility to locate each one.
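
A minimal Python sketch of that kind of search, assuming the jingle has
already been cut out of one of the files and saved as its own run of
bytes.  The function name and the file names in the comments are made up
for illustration:

```python
def find_all_offsets(data: bytes, pattern: bytes) -> list[int]:
    """Return every byte offset at which pattern occurs in data."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    offsets = []
    pos = data.find(pattern)
    while pos != -1:
        offsets.append(pos)
        # Step one byte forward so overlapping matches are also found.
        pos = data.find(pattern, pos + 1)
    return offsets


# Hypothetical usage: both file names are placeholders.
# podcast = open("episode.mp3", "rb").read()
# jingle = open("jingle-bytes.bin", "rb").read()
# print(find_all_offsets(podcast, jingle))
```

This only helps if the jingle really is byte-for-byte identical in every
file, which is exactly the thing I'm not sure about.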

Translating the byte offset from the start of an mp3 into the hh:mm:ss
(or sample count) position might be complicated, especially if the mp3s
are stored with a variable bit rate rather than a fixed one.
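
For the simple fixed-bit-rate case the arithmetic is easy enough to
sketch.  This assumes a constant bit rate and a known size for any tag
data (eg ID3v2) at the front of the file; both are things you'd have to
establish for your own files, and none of this applies to variable bit
rate.  The function and parameter names are illustrative only:

```python
def cbr_offset_to_seconds(byte_offset: int, bitrate_kbps: int,
                          leading_tag_bytes: int = 0) -> float:
    """Estimate the playback time of a byte offset in a constant-bit-rate mp3."""
    audio_bytes = byte_offset - leading_tag_bytes
    if audio_bytes < 0:
        raise ValueError("offset lies inside the leading tag data")
    # An mp3 "kbps" is 1000 bits per second, and there are 8 bits per byte.
    return audio_bytes * 8 / (bitrate_kbps * 1000)
```

At 128 kbps, for instance, 16000 bytes of audio data corresponds to one
second.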

Recognising the jingles might be hard if any aspect of mp3 compression
of the audio means that successive parts of jingles don't appear in the
exact same bit- and byte- pattern in each file.

If the files contain, say, continuous music (or maybe even speech), then
a tiny gap (hopefully of digital silence), then a jingle, then a second
tiny gap, then more content, I think you could possibly look for the
positions of the gaps.
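
If the audio has been decoded to raw samples, looking for runs of
digital silence can be sketched in a few lines of Python.  This assumes
one channel of integer samples already loaded into a list; the function
name and the min_len and threshold parameters are made up for the
example:

```python
def find_silent_runs(samples, min_len, threshold=0):
    """Return (start, end) index pairs of runs where abs(sample) <= threshold.

    end is exclusive; runs shorter than min_len samples are ignored.
    """
    runs = []
    start = None
    for i, s in enumerate(samples):
        if abs(s) <= threshold:
            if start is None:
                start = i          # a quiet run begins here
        elif start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    # Handle a quiet run that lasts until the end of the data.
    if start is not None and len(samples) - start >= min_len:
        runs.append((start, len(samples)))
    return runs
```

Dividing the sample indexes by the sample rate gives positions in
seconds.  A threshold above 0 would be needed if the gaps turn out to be
very quiet rather than truly silent.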

If there are no gaps, or if eg the transition from speech or music to
jingle usually involves a jump in volume, you could look for those jumps.

If you knew where they seemed to be you could sanity-check them - ie
decide that they probably do enclose an ad if they are (say) between
29.3 and 30.7 secs apart.
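
That sanity check could look something like this in Python, assuming
you already have a list of candidate boundary times (in seconds) from
whatever detection method you ended up with.  The 29.3 and 30.7 limits
are just the example figures above, and the function name is made up:

```python
def plausible_ad_spans(boundaries, min_len=29.3, max_len=30.7):
    """Pair up candidate boundary times (seconds) that are about one ad apart."""
    times = sorted(boundaries)
    spans = []
    for i, start in enumerate(times):
        for end in times[i + 1:]:
            if end - start > max_len:
                break              # later boundaries are even further away
            if end - start >= min_len:
                spans.append((start, end))
    return spans
```

Given boundaries at 0.0, 30.1, 45.0, 570.0 and 600.2 seconds, this keeps
the pairs (0.0, 30.1) and (570.0, 600.2) and ignores the stray boundary
at 45.0.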

If that was ok, you could generate "trim" commands to remove them.
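
As a sketch of generating such a command: this assumes one ad at the
very start and one at the very end, as in Dani's description, and uses
the "=" prefix that the sox manual describes for giving "trim" an
absolute position.  The file names are placeholders and the function
name is made up:

```python
def keep_middle_command(infile, outfile, first_ad_end, last_ad_start):
    """Build a sox command that keeps only the audio between the two ads.

    Times are in seconds from the start of the file.
    """
    if last_ad_start <= first_ad_end:
        raise ValueError("nothing would be left between the ads")
    return [
        "sox", infile, outfile,
        "trim", f"{first_ad_end:.2f}", f"={last_ad_start:.2f}",
    ]


# Prints: sox in.mp3 out.mp3 trim 30.10 =570.00
print(" ".join(keep_middle_command("in.mp3", "out.mp3", 30.1, 570.0)))
```

Returning a list rather than one string means it can be handed straight
to something like subprocess.run without worrying about quoting file
names that contain spaces.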


Spotting the transitions might be easier if you first made some fairly
extreme eq changes to (a copy of) the file, eg to try to make speech
sounds very quiet but leave music at a higher level.

A quick look at the sox manual suggests that the "silence" or "vad"
effects, applied creatively (perhaps just to small snippets of each
file) might also help to identify where things are.  That's because
if an effect removes a gap (if there is one) from (say) a 1 sec piece
of audio, the output comes out shorter than the input, and that's easy
to detect.  You'd certainly need to experiment...


I've a script (written in oorexx, for use on a Windows system) that
essentially issues

  sox inputfile outputfile "trim" trimparm "stats"

with trimparm defined to extract eg a 3 second period of the inputfile
(eg from 15 seconds in, through to one sample less than 18 seconds in)
then it reads the "stats" output and stores the peak level information.
It does that for every 3-second period of data between two points in
the file.  Thus from each short chunk of the file it produces a line
of information like

                 Pk lev dB     -21.31    -21.31    -24.15

(which you'll see is one line of the stats output, if you look in the
sox manual)

which is, first the highest level from both/either channel, then the
highest level from the left channel, then the highest from the right
channel.  The script also calculates the difference between the left
and right peak levels.  Across the whole set of slices it also tracks
the maximum and minimum of each of those values.  The result is a file
containing eg

(warning: these lines might wrap and need to be copied elsewhere to read
them more easily)

sl hh:mm:ss.fr    hh:mm:ss.fr            Both        Left       Right        Diff
-- -----------    -----------         -------     -------     -------     -------
 1 00:00:00.00 to 00:00:03.00-1s       -33.46      -33.46      -34.27        0.81
 2 00:00:03.00 to 00:00:06.00-1s       -23.06      -23.06      -28.17        5.11
 3 00:00:06.00 to 00:00:09.00-1s       -23.01      -23.01      -25.27        2.26
 4 00:00:09.00 to 00:00:12.00-1s       -16.95      -16.95      -19.82        2.87
 5 00:00:12.00 to 00:00:15.00-1s       -18.51      -18.51      -20.37        1.86
 6 00:00:15.00 to 00:00:18.00-1s       -25.16      -25.16      -26.32        1.16
 7 00:00:18.00 to 00:00:21.00-1s       -22.36      -22.36      -28.50        6.14
 8 00:00:21.00 to 00:00:24.00-1s       -21.64      -21.64      -25.37        3.73
 9 00:00:24.00 to 00:00:27.00-1s       -21.30      -21.30      -24.37        3.07
10 00:00:27.00 to 00:00:30.00-1s       -22.11      -22.11      -24.41        2.30

...

     Minima:      -36.70      -36.70      -37.60        0.06
   at slice:          27          27          27          13


     Maxima:      -13.84      -13.84      -14.58        6.14
   at slice:          11          11          11           7


    Average:      -23.50      -21.27      -26.06        2.56


(where "sl" means "slice", and in "hh:mm:ss.fr" the "fr" means
"fraction" of a second, ie tenths and hundredths)
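
For anyone wanting to try the same idea without oorexx, the core of it
might look roughly like this in Python.  This is not the oorexx script
itself, just a sketch of the same steps.  It assumes sox is on the PATH
and that the input is stereo, uses "-n" (the null output) instead of a
real output file, and relies on the "stats" effect writing its report to
stderr.  The function names are made up:

```python
import subprocess


def parse_pk_lev(stats_text):
    """Extract the numbers from the 'Pk lev dB' line of sox stats output."""
    for line in stats_text.splitlines():
        if line.strip().startswith("Pk lev dB"):
            # Skip the three words of the label, keep the numeric columns.
            return [float(v) for v in line.split()[3:]]
    raise ValueError("no 'Pk lev dB' line found")


def slice_peaks(infile, start, end, step=3.0):
    """Yield (slice_start, overall, left, right, diff) per slice of a stereo file."""
    t = start
    while t < end:
        # "trim t step" keeps step seconds of audio starting at t seconds.
        result = subprocess.run(
            ["sox", infile, "-n", "trim", str(t), str(step), "stats"],
            capture_output=True, text=True, check=True,
        )
        overall, left, right = parse_pk_lev(result.stderr)
        yield t, overall, left, right, round(abs(left - right), 2)
        t += step
```

Something like "for row in slice_peaks("song.wav", 0, 30): print(row)"
would then give one tuple per 3-second slice ("song.wav" being a
placeholder name).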

I wrote this because I was trying to process recordings of a choir, and
while the stats effect applied to each whole song told me that one side
of the choir was mostly louder than the other, this was not true for
every song they'd performed.  Using the script to look at the situation
every 3 secs helped me find out why - for example whether applause
levels or random audience noise were causing peaks that weren't
characteristic of the music.

I picked 3 second slices for no good reason.  One could use every tenth
of a second but then there'd be 30 times more results...   One could
possibly use a results file like this to look for predictable level
changes (give or take half a dB or so).
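
A rough Python sketch of that kind of search, assuming the peak levels
have already been read from the results file into a list, one value per
slice.  The function name and the 10 dB figure are only for
illustration:

```python
def find_level_changes(peaks_db, expected_change_db, tolerance_db=0.5):
    """Return indexes of slices whose peak differs from the previous slice's
    peak by expected_change_db, give or take tolerance_db."""
    hits = []
    for i in range(1, len(peaks_db)):
        change = peaks_db[i] - peaks_db[i - 1]
        if abs(change - expected_change_db) <= tolerance_db:
            hits.append(i)
    return hits


# The "Both" column of the first six slices in the table above.
both = [-33.46, -23.06, -23.01, -16.95, -18.51, -25.16]
# Prints [1]: only the step from slice 1 to slice 2 (list indexes 0 and 1)
# is a rise of roughly 10 dB.
print(find_level_changes(both, 10.0))
```

Whether the jump into or out of a jingle is consistent enough from file
to file for this to work is something only experiment would show.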

It just gave me a better idea of what was going on.  However, it was
also in a format that could have been read by another program if it was
trying to detect moments of interest.

-- 
Jeremy Nicoll - my opinions are my own

