From: "Graff, David E"
To: "sox-users@lists.sourceforge.net"
Date: Thu, 29 Jun 2017 14:20:18 +0000
Subject: Re: split stereo by channel and silence

According to the online docs for Kaldi (http://kaldi-asr.org/doc/tools.html), you should find a utility called "extract-segments", which will take either a 1- or 2-channel wav file as input and will produce as output a listing of speech segments with their time stamps. (It looks like using it on single-channel data is easier/better, and it makes sense to do it this way, because the use of time stamps on the original data means that "silence" regions are not deleted from the data, so portions of interest in the two separate channels retain their original alignment relative to each other -- each speech segment can be handled independently of others, and has a unique identifier to keep track of its position in the overall timeline of the original recording.)


I haven't used Kaldi at all myself, but this approach to speech detection (using a listing of time offsets, while preserving the full content of the original recording) is a pretty common procedure.
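
To make the time-offset listing idea concrete, here is a minimal sketch in Python using only the standard library. It is not Kaldi's method and not SoX's silence effect: the peak-amplitude threshold, the 20 ms frame size, the 300 ms gap tolerance, the 16-bit PCM restriction and the ID format are all assumptions chosen for illustration. Each channel is scanned separately, nothing is deleted from the recording, and every segment gets an ID built from its start time so that a plain sort gives the play order.

```python
import array
import sys
import wave


def read_channels(path):
    """Return (sample_rate, [samples per channel]) for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = wf.getnchannels()
        rate = wf.getframerate()
        samples = array.array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    # Samples are interleaved (L, R, L, R, ...); de-interleave by striding.
    return rate, [samples[c::n_channels] for c in range(n_channels)]


def find_segments(samples, rate, threshold=500, frame_ms=20, min_silence_ms=300):
    """Return [(start_sec, end_sec), ...] for runs of loud frames.

    A frame is "loud" when its peak absolute sample exceeds `threshold`.
    Silent gaps of up to `min_silence_ms` do not end a segment.
    """
    frame_len = max(1, rate * frame_ms // 1000)
    max_gap = min_silence_ms // frame_ms  # silent frames tolerated inside a segment
    n_frames = (len(samples) + frame_len - 1) // frame_len
    segments = []
    start = None      # frame index where the current segment began
    last_loud = None  # index of the most recent loud frame
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        if max(abs(s) for s in frame) > threshold:
            if start is None:
                start = i
            last_loud = i
        elif start is not None and i - last_loud > max_gap:
            segments.append((start * frame_len / rate,
                             (last_loud + 1) * frame_len / rate))
            start = None
    if start is not None:  # recording ended while a segment was still open
        end_sample = min((last_loud + 1) * frame_len, len(samples))
        segments.append((start * frame_len / rate, end_sample / rate))
    return segments


def segment_listing(channels, rate, names=("L", "R")):
    """Return (segment_id, channel, start_sec, end_sec) rows in timeline order."""
    rows = []
    for name, samples in zip(names, channels):
        for start, end in find_segments(samples, rate):
            # Zero-padded milliseconds first, so the IDs sort by start time.
            rows.append((f"{round(start * 1000):08d}.{name}", name, start, end))
    return sorted(rows, key=lambda row: (row[2], row[1]))
```

With a hypothetical file name, `rate, channels = read_channels("conversation.wav")` followed by `segment_listing(channels, rate)` gives one row per speech segment. Overlapping speech simply shows up as L and R rows whose time ranges overlap, and the start and end offsets can be used to cut clips from the original file afterwards.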


   Dave Graff



From: Jon Nichols <jonlnichols@gmail.com>
Sent: Thursday, June 29, 2017 9:50:14 AM
To: sox-users@lists.sourceforge.net
Subject: Re: [SoX-users] split stereo by channel and silence
 
the reason why is i'm trying to use an ASR (Kaldi to be exact) to transcribe the audio. it seems to work better on short audio clips, which is why i split on silence, and keeping the channels separate makes it easy to know who the speaker is. plus, the audio was unintelligible to my model when they were speaking over each other in a single mono file.

i'm still very new to figuring out how to use Kaldi, so there easily could be a better way within that tool to handle this.

On Thu, Jun 29, 2017 at 5:15 AM, Jan Stary <hans@stare.cz> wrote:
On Jun 26 17:07:31, jonlnichols@gmail.com wrote:
> i have stereo wav files, in which each channel is a different speaker in a
> conversation. trying to figure out how best to split a stereo file by both
> its channel and silence, but still know the order the files should be played
> in to hear the conversation as a whole.

Why do you want to do this?

> i don't want to merge the 2
> channels because often 1 channel has more background noise than the other
> and sometimes speakers will speak over each other and keeping them separate
> will make it easier to understand them.

You can play the one and then play the other, or just the parts
where they speak over each other.

> the problem is, for playback sometimes i should play multiple R.###.wav
> files in a row, or multiple L.###.wav files and i have no way of knowing when i
> should do this with my current setup.

If you play the L and R files in a sequence (whether one-by-one
or with an occasional cluster of L or R as you describe), it will
not be the conversation that happened, exactly in the places
where they spoke over each other.

> instead of just having an increment counter for the name, is there a way to
> have it use the starting time (in seconds or whatever) for that segment
> of the file? that way i'd have the below files and could just sort by the
> number for the play order.

First please describe _why_ you are doing this.
Are the parts when they both speak so unintelligible
that you need to separate them into two mono streams
to actually hear what each is saying?

        Jan


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Sox-users mailing list
Sox-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sox-users
