From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Mans Rullgard Newsgroups: gmane.comp.audio.sox.devel Subject: [PATCH 5/6] Add a sigma-delta modulator for DSD encoding Date: Wed, 16 Sep 2015 15:16:29 +0100 Message-ID: <1442412990-20764-5-git-send-email-mans@mansr.com> References: <1442412990-20764-1-git-send-email-mans@mansr.com> Reply-To: sox-devel@lists.sourceforge.net NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Trace: ger.gmane.org 1442413042 2089 80.91.229.3 (16 Sep 2015 14:17:22 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Wed, 16 Sep 2015 14:17:22 +0000 (UTC) To: sox-devel@lists.sourceforge.net Original-X-From: sox-devel-bounces@lists.sourceforge.net Wed Sep 16 16:17:07 2015 Return-path: Envelope-to: gcasd-sox-devel@m.gmane.org X-ACL-Warn: X-Mailer: git-send-email 2.5.2 In-Reply-To: <1442412990-20764-1-git-send-email-mans@mansr.com> X-Spam-Score: 0.1 (/) X-Spam-Report: Spam Filtering performed by mx.sourceforge.net. See http://spamassassin.org/tag/ for more details. -0.0 T_RP_MATCHES_RCVD Envelope sender domain matches handover relay domain 0.1 AWL AWL: Adjusted score from AWL reputation of From: address X-Headers-End: 1ZcDW2-00083B-6F X-BeenThere: sox-devel@lists.sourceforge.net X-Mailman-Version: 2.1.9 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: sox-devel-bounces@lists.sourceforge.net Xref: news.gmane.org gmane.comp.audio.sox.devel:425 Archived-At: Received: from lists.sourceforge.net ([216.34.181.88]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1ZcDWC-0003NZ-CJ for gcasd-sox-devel@m.gmane.org; Wed, 16 Sep 2015 16:17:04 +0200 Received: from localhost ([127.0.0.1] helo=sfs-ml-2.v29.ch3.sourceforge.com) by sfs-ml-2.v29.ch3.sourceforge.com with esmtp (Exim 4.76) (envelope-from ) id 1ZcDWA-0003NX-Km; Wed, 16 Sep 2015 14:17:02 +0000 Received: from sog-mx-2.v43.ch3.sourceforge.com ([172.29.43.192] helo=mx.sourceforge.net) by sfs-ml-2.v29.ch3.sourceforge.com with esmtp (Exim 4.76) (envelope-from ) id 1ZcDWA-0003NQ-91 for sox-devel@lists.sourceforge.net; Wed, 16 Sep 2015 14:17:02 +0000 Received: from unicorn.mansr.com ([81.2.72.234]) by sog-mx-2.v43.ch3.sourceforge.com with esmtp (Exim 4.76) id 1ZcDW2-00083B-6F for sox-devel@lists.sourceforge.net; Wed, 16 Sep 2015 14:17:02 +0000 Received: by unicorn.mansr.com (Postfix, from userid 51770) id 20E3015393; Wed, 16 Sep 2015 15:16:47 +0100 (BST) This adds a sigma-delta modulator for 1-bit (DSD) encoding. It is invoked by the "dither" effect when the output precision is 1-bit or manually with choice of the following noise-shaping filters: fast Reasonably good quality while fast enough for real-time operation. This is the default. hq Lower noise and distortion than "fast" at the expense of being much slower. audiophile Somewhat better quality than "hq" and almost twice as slow. goldenear Slightly higher quality than "audiophile" and considerably slower. Prior to this encoder, the sampling rate should be increased, e.g. by means of the "rate" effect. --- msvc10/LibSoX.vcxproj | 1 + msvc10/LibSoX.vcxproj.filters | 3 + sox.1 | 39 +++ src/Makefile.am | 4 +- src/dither.c | 46 ++- src/effects.h | 1 + src/sdm.c | 666 ++++++++++++++++++++++++++++++++++++++++++ src/sdm.h | 42 +++ src/sdm_x86.h | 235 +++++++++++++++ 9 files changed, 1025 insertions(+), 12 deletions(-) create mode 100644 src/sdm.c create mode 100644 src/sdm.h create mode 100644 src/sdm_x86.h diff --git a/msvc10/LibSoX.vcxproj b/msvc10/LibSoX.vcxproj index c38a764..4a475cf 100644 --- a/msvc10/LibSoX.vcxproj +++ b/msvc10/LibSoX.vcxproj @@ -174,6 +174,7 @@ true true + true true diff --git a/msvc10/LibSoX.vcxproj.filters b/msvc10/LibSoX.vcxproj.filters index bed7cb2..01bda8c 100644 --- a/msvc10/LibSoX.vcxproj.filters +++ b/msvc10/LibSoX.vcxproj.filters @@ -603,5 +603,8 @@ Format Sources + + Effect Sources + diff --git a/sox.1 b/sox.1 index 2c4ca47..98a84a1 100644 --- a/sox.1 +++ b/sox.1 @@ -2012,6 +2012,10 @@ option is not given, then the pseudo-random number generator used to generate the white noise will be `reseeded', i.e. the generated noise will be different between invocations. .SP +If the target precision is 1-bit, the \fBsdm\fR effect is applied +automatically with default settings. Invoke it manually to control its +options. +.SP This effect should not be followed by any other effect that affects the audio. .SP @@ -2991,6 +2995,41 @@ The sampling rate must be one of: 44\*d1, 48, 88\*d2, 96 kHz. .SP This effect supports the \fB\-\-plot\fR global option. .TP +\fBsdm\fR [\fB\-f \fIfilter\fR] [\fB\-t \fIorder\fR] [\fB\-n \fInum\fR] [\fB-l \fIlatency\fR] +Apply a 1-bit sigma-delta modulator producing DSD output. The input +should be previously upsampled, e.g. with the \fBrate\fR effect, to a +high rate, 2\*d8224MHz for DSD64. The \fB\-f\fR option selects the +noise-shaping filter from the following list: +.RS +.IP \fBfast\fR +Reasonably good quality while fast enough for real-time operation. +This is the default. +.IP \fBhq\fR +Lower noise and distortion than \fBfast\fR at the expense of being +much slower. +.IP \fBaudiophile\fR +Somewhat better quality than \fBhq\fR and almost twice as slow. +.IP \fBgoldenear\fR +Slightly higher quality than \fBaudiophile\fR and considerably slower. +.RE +.TP +\ +All but the \fBfast\fR filter perform a partial trellis/viterbi search +with preset parameters which can be overridden using the following +options: +.RS +.IP \fB\-t \fIorder\fR +Set the trellis order, max 32. +.IP \fB\-n \fInum\fR +Set the number of paths to consider, max 32. +.IP \fB\-n \fIlatency\fR +Set the output latency, max 2048. +.RE +.TP +\ +The result of using these overrides is hard to predict and can include +high noise levels or instability. Caution is advised. +.TP \fBsilence \fR[\fB\-l\fR] \fIabove-periods\fR [\fIduration threshold\fR[\fBd\fR\^|\^\fB%\fR] [\fIbelow-periods duration threshold\fR[\fBd\fR\^|\^\fB%\fR]] .SP diff --git a/src/Makefile.am b/src/Makefile.am index e047580..ca7bae7 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -76,8 +76,8 @@ libsox_la_SOURCES += \ mcompand_xover.h noiseprof.c noisered.c \ noisered.h output.c overdrive.c pad.c phaser.c rate.c \ rate_filters.h rate_half_fir.h rate_poly_fir0.h rate_poly_fir.h \ - remix.c repeat.c reverb.c reverse.c silence.c sinc.c skeleff.c \ - speed.c splice.c stat.c stats.c stretch.c swap.c \ + remix.c repeat.c reverb.c reverse.c sdm.c sdm.h sdm_x86.h silence.c \ + sinc.c skeleff.c speed.c splice.c stat.c stats.c stretch.c swap.c \ synth.c tempo.c tremolo.c trim.c upsample.c vad.c vol.c \ ignore-warning.h if HAVE_PNG diff --git a/src/dither.c b/src/dither.c index 3351615..2af827a 100644 --- a/src/dither.c +++ b/src/dither.c @@ -20,6 +20,7 @@ #endif #include "sox_i.h" +#include "sdm.h" #include #undef RANQD1 @@ -263,6 +264,7 @@ typedef struct { double const * coefs; sox_bool dither_off; sox_effect_handler_flow flow; + sdm_t *sdm; } priv_t; #define CONVOLVE _ _ _ _ @@ -292,6 +294,13 @@ typedef struct { #define N 20 #include "dither.h" +static int flow_sdm(sox_effect_t * effp, const sox_sample_t * ibuf, + sox_sample_t * obuf, size_t * isamp, size_t * osamp) +{ + priv_t * p = (priv_t *)effp->priv; + return sdm_process(p->sdm, ibuf, obuf, isamp, osamp); +} + static int flow_no_shape(sox_effect_t * effp, const sox_sample_t * ibuf, sox_sample_t * obuf, size_t * isamp, size_t * osamp) { @@ -364,17 +373,17 @@ static int start(sox_effect_t * effp) if (effp->in_signal.precision <= p->prec || p->prec > 24) return SOX_EFF_NULL; /* Dithering not needed at this resolution */ - if (p->prec == 1) { - /* The general dither routines don't work in this case, so notify - user and leave it at that for now. - TODO: Some special-case treatment of 1-bit noise shaping will be - needed for meaningful DSD write support. */ - lsx_warn("Dithering/noise-shaping to 1 bit is currently not supported."); - return SOX_EFF_NULL; - } - effp->out_signal.precision = p->prec; + if (p->prec == 1) { + p->sdm = sdm_init(NULL, 0, 0, 0); + if (!p->sdm) + return SOX_EOF; + + p->flow = flow_sdm; + return SOX_SUCCESS; + } + p->flow = flow_no_shape; if (p->filter_name) { filter_t const * f; @@ -418,6 +427,23 @@ static int flow(sox_effect_t * effp, const sox_sample_t * ibuf, return p->flow(effp, ibuf, obuf, isamp, osamp); } +static int drain(sox_effect_t * effp, sox_sample_t * obuf, size_t * osamp) +{ + priv_t * p = (priv_t *)effp->priv; + if (p->sdm) + return sdm_drain(p->sdm, obuf, osamp); + *osamp = 0; + return SOX_SUCCESS; +} + +static int stop(sox_effect_t * effp) +{ + priv_t * p = (priv_t *)effp->priv; + if (p->sdm) + sdm_close(p->sdm); + return SOX_SUCCESS; +} + sox_effect_handler_t const * lsx_dither_effect_fn(void) { static sox_effect_handler_t handler = { @@ -430,7 +456,7 @@ sox_effect_handler_t const * lsx_dither_effect_fn(void) "\n shibata, low-shibata, high-shibata." "\n -a Automatically turn on & off dithering as needed (use with caution!)" "\n -p bits Override the target sample precision", - SOX_EFF_PREC, getopts, start, flow, 0, 0, 0, sizeof(priv_t) + SOX_EFF_PREC, getopts, start, flow, drain, stop, 0, sizeof(priv_t) }; return &handler; } diff --git a/src/effects.h b/src/effects.h index 450a5c2..e8987de 100644 --- a/src/effects.h +++ b/src/effects.h @@ -66,6 +66,7 @@ EFFECT(reverb) EFFECT(reverse) EFFECT(riaa) + EFFECT(sdm) EFFECT(silence) EFFECT(sinc) #ifdef HAVE_PNG diff --git a/src/sdm.c b/src/sdm.c new file mode 100644 index 0000000..4770ba7 --- /dev/null +++ b/src/sdm.c @@ -0,0 +1,666 @@ +/* Sigma-Delta modulator + * Copyright (c) 2015 Mans Rullgard + * + * This library is free software; you can redistribute it and/or modify it + * under the terms of the GNU Lesser General Public License as published by + * the Free Software Foundation; either version 2.1 of the License, or (at + * your option) any later version. + * + * This library is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser + * General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public License + * along with this library; if not, write to the Free Software Foundation, + * Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +/* + * References: + * + * Derk Reefman, Erwin Janssen. 2002. + * "Signal processing for Direct Stream Digital: A tutorial for + * digital Sigma Delta modulation and 1-bit digital audio processing" + * http://www.emmlabs.com/pdf/papers/DerkSigmaDelta.pdf + * + * P.J.A. Harpe. 2003. + * "Trellis-type Sigma Delta Modulators for Super Audio CD applications" + * http://www.pieterharpe.nl/docs/report_trunc.pdf + * + * Richard Schreier. 2000-2011. + * "Delta Sigma Toolbox" + * http://www.mathworks.com/matlabcentral/fileexchange/19-delta-sigma-toolbox + */ + +#define _ISOC11_SOURCE + +#include "sox_i.h" +#include "sdm.h" + +#define MAX_FILTER_ORDER 8 +#define PATH_HASH_SIZE 128 +#define PATH_HASH_MASK (PATH_HASH_SIZE - 1) + +typedef struct { + const double a[MAX_FILTER_ORDER]; + const double g[MAX_FILTER_ORDER]; + int32_t order; + double scale; + const char *name; + int trellis_order; + int trellis_num; + int trellis_lat; +} LSX_ALIGN(32) sdm_filter_t; + +typedef struct sdm_state { + double state[MAX_FILTER_ORDER]; + double cost; + uint32_t path; + uint8_t next; + uint8_t hist; + uint8_t hist_used; + struct sdm_state *parent; + struct sdm_state *path_list; +} LSX_ALIGN(32) sdm_state_t; + +typedef struct { + sdm_state_t sdm[2 * SDM_TRELLIS_MAX_NUM]; + sdm_state_t *act[SDM_TRELLIS_MAX_NUM]; +} sdm_trellis_t; + +struct sdm { + sdm_trellis_t trellis[2]; + sdm_state_t *path_hash[PATH_HASH_SIZE]; + uint8_t hist_free[2 * SDM_TRELLIS_MAX_NUM]; + unsigned hist_fnum; + uint32_t trellis_mask; + uint32_t trellis_num; + uint32_t trellis_lat; + unsigned num_cands; + unsigned pos; + unsigned pending; + unsigned draining; + unsigned idx; + const sdm_filter_t *filter; + double prev_y; + uint64_t conv_fail; + uint8_t hist[2 * SDM_TRELLIS_MAX_NUM][SDM_TRELLIS_MAX_LAT / 8]; +}; + +static sdm_filter_t sdm_filter_fast = { + { + 8.11979821108649e-01, 3.21578526301959e-01, + 8.03842133084308e-02, 1.36652129069769e-02, + 1.62614939720868e-03, 1.18730980344801e-04, + 5.81753857463105e-06, -4.43443601283455e-08, + }, + { + 8.10778762576884e-05, 0, 6.65340842513387e-04, 0, + 1.52852264942192e-03, 0, 2.22035724073886e-03, 0, + }, + 8, + 0.492, + "fast", + 0, 0, 0, +}; + +static sdm_filter_t sdm_filter_hq = { + { + 1.05966158780858e+00, 5.47009636009057e-01, + 1.76263553121650e-01, 3.79953988065231e-02, + 5.31936695611806e-03, 4.64865473231071e-04, + 1.21930947998838e-05, + }, + { + 0, 3.96825873999969e-04, 0, 1.32436089566069e-03, + 0, 2.16898568341885e-03, 0, + }, + 7, + 0.50, + "hq", + 16, + 10, + 1280, +}; + +static sdm_filter_t sdm_filter_audiophile = { + { + 1.17270840974752e+00, 6.69435755948125e-01, + 2.38385844332401e-01, 5.67404687000751e-02, + 8.79926385368848e-03, 8.47664163271991e-04, + 2.69551713329985e-05, + }, + { + 0, 3.96825873999969e-04, 0, 1.32436089566069e-03, + 0, 2.16898568341885e-03, 0, + }, + 7, + 0.50, + "audiophile", + 24, + 16, + 1664, +}; + +static sdm_filter_t sdm_filter_goldenear = { + { + 1.33055162190254e+00, 8.60392723676436e-01, + 3.46524494169335e-01, 9.31146164773126e-02, + 1.63339758570028e-02, 1.76908163241072e-03, + 6.86294038857449e-05, + }, + { + 0, 3.96825873999969e-04, 0, 1.32436089566069e-03, + 0, 2.16898568341885e-03, 0, + }, + 7, + 0.50, + "goldenear", + 24, + 24, + 2048, +}; + +static const sdm_filter_t *sdm_filters[] = { + &sdm_filter_fast, + &sdm_filter_hq, + &sdm_filter_audiophile, + &sdm_filter_goldenear, + NULL, +}; + +static const sdm_filter_t *sdm_find_filter(const char *name) +{ + int i; + + if (!name) + return sdm_filters[0]; + + for (i = 0; sdm_filters[i]; i++) + if (!strcmp(name, sdm_filters[i]->name)) + return sdm_filters[i]; + + return NULL; +} + +#include "sdm_x86.h" + +#ifndef sdm_filter_calc +static double sdm_filter_calc(const double *s, double *d, + const sdm_filter_t *f, + double x, double y) +{ + const double *a = f->a; + const double *g = f->g; + double v; + int i; + + d[0] = s[0] - g[0] * s[1] + x - y; + v = x + a[0] * d[0]; + + for (i = 1; i < f->order - 1; i++) { + d[i] = s[i] + s[i - 1] - g[i] * s[i + 1]; + v += a[i] * d[i]; + } + + d[i] = s[i] + s[i - 1]; + v += a[i] * d[i]; + + return v; +} +#endif + +#ifndef sdm_filter_calc2 +static void sdm_filter_calc2(sdm_state_t *src, sdm_state_t *dst, + const sdm_filter_t *f, double x) +{ + const double *a = f->a; + double v; + int i; + + v = sdm_filter_calc(src->state, dst[0].state, f, x, 0.0); + + for (i = 0; i < f->order; i++) + dst[1].state[i] = dst[0].state[i]; + + dst[0].state[0] += 1.0; + dst[1].state[0] -= 1.0; + + dst[0].cost = src->cost + sqr(v + a[0]); + dst[1].cost = src->cost + sqr(v - a[0]); +} +#endif + +static inline unsigned sdm_histbuf_get(sdm_t *p) +{ + return p->hist_free[--p->hist_fnum]; +} + +static inline void sdm_histbuf_put(sdm_t *p, unsigned h) +{ + p->hist_free[p->hist_fnum++] = h; +} + +static inline unsigned get_bit(uint8_t *p, unsigned i) +{ + return (p[i >> 3] >> (i & 7)) & 1; +} + +static inline void put_bit(uint8_t *p, unsigned i, unsigned v) +{ + int b = p[i >> 3]; + int s = i & 7; + b &= ~(1 << s); + b |= v << s; + p[i >> 3] = b; +} + +static inline unsigned sdm_hist_get(sdm_t *p, unsigned h, unsigned i) +{ + return get_bit(p->hist[h], i); +} + +static inline void sdm_hist_put(sdm_t *p, unsigned h, unsigned i, unsigned v) +{ + put_bit(p->hist[h], i, v); +} + +static inline void sdm_hist_copy(sdm_t *p, unsigned d, unsigned s) +{ + memcpy(p->hist[d], p->hist[s], (size_t)(p->trellis_lat + 7) / 8); +} + +static inline int64_t dbl2int64(double a) +{ + union { double d; int64_t i; } v; + v.d = a; + return v.i; +} + +static inline int sdm_cmplt(sdm_state_t *a, sdm_state_t *b) +{ + return dbl2int64(a->cost) < dbl2int64(b->cost); +} + +static inline int sdm_cmple(sdm_state_t *a, sdm_state_t *b) +{ + return dbl2int64(a->cost) <= dbl2int64(b->cost); +} + +static sdm_state_t *sdm_check_path(sdm_t *p, sdm_state_t *s) +{ + unsigned index = s->path & PATH_HASH_MASK; + sdm_state_t **hash = p->path_hash; + sdm_state_t *t = hash[index]; + + while (t) { + if (t->path == s->path) + return t; + t = t->path_list; + } + + s->path_list = hash[index]; + hash[index] = s; + + return NULL; +} + +static unsigned sdm_sort_cands(sdm_t *p, sdm_trellis_t *st) +{ + sdm_state_t *r, *s, *t; + sdm_state_t *min; + unsigned i, j, n; + + for (i = 0; i < 2 * p->num_cands; i++) { + s = &st->sdm[i]; + p->path_hash[s->path & PATH_HASH_MASK] = NULL; + if (!i || sdm_cmplt(s, min)) + min = s; + } + + for (i = 0, n = 0; i < 2 * p->num_cands; i++) { + s = &st->sdm[i]; + + if (s->next != min->next) + continue; + + if (n == p->trellis_num && sdm_cmple(st->act[n - 1], s)) + continue; + + t = sdm_check_path(p, s); + + if (!t) { + for (j = n; j > 0; j--) { + t = st->act[j - 1]; + if (sdm_cmple(t, s)) + break; + st->act[j] = t; + } + if (j < p->trellis_num) + st->act[j] = s; + if (n < p->trellis_num) + n++; + continue; + } + + if (sdm_cmple(t, s)) + continue; + + for (j = 0; j < n; j++) { + r = st->act[j]; + if (sdm_cmple(s, r)) + break; + } + + st->act[j++] = s; + + while (r != t && j < n) { + sdm_state_t *u = st->act[j]; + st->act[j] = r; + r = u; + j++; + } + } + + return n; +} + +static inline void sdm_step(sdm_t *p, sdm_state_t *cur, sdm_state_t *next, + double x) +{ + const sdm_filter_t *f = p->filter; + int i; + + sdm_filter_calc2(cur, next, f, x); + + for (i = 0; i < 2; i++) { + next[i].path = (cur->path << 1 | i) & p->trellis_mask; + next[i].hist = cur->hist; + next[i].next = cur->next; + next[i].parent = cur; + } +} + +static sox_sample_t sdm_sample_trellis(sdm_t *p, double x) +{ + sdm_trellis_t *st_cur = &p->trellis[p->idx]; + sdm_trellis_t *st_next = &p->trellis[p->idx ^ 1]; + double min_cost; + unsigned new_cands; + unsigned next_pos; + unsigned output; + unsigned i; + + next_pos = p->pos + 1; + if (next_pos == p->trellis_lat) + next_pos = 0; + + for (i = 0; i < p->num_cands; i++) { + sdm_state_t *cur = st_cur->act[i]; + sdm_state_t *next = &st_next->sdm[2 * i]; + sdm_step(p, cur, next, x); + cur->next = sdm_hist_get(p, cur->hist, next_pos); + cur->hist_used = 0; + } + + new_cands = sdm_sort_cands(p, st_next); + min_cost = st_next->act[0]->cost; + output = st_next->act[0]->next; + + for (i = 0; i < new_cands; i++) { + sdm_state_t *s = st_next->act[i]; + if (s->parent->hist_used) { + unsigned h = sdm_histbuf_get(p); + sdm_hist_copy(p, h, s->hist); + s->hist = h; + } else { + s->parent->hist_used = 1; + } + + s->cost -= min_cost; + s->next = s->parent->next; + sdm_hist_put(p, s->hist, p->pos, s->path & 1); + } + + for (i = 0; i < p->num_cands; i++) { + sdm_state_t *s = st_cur->act[i]; + if (!s->hist_used) + sdm_histbuf_put(p, s->hist); + } + + if (new_cands < p->num_cands) + p->conv_fail++; + + p->num_cands = new_cands; + p->pos = next_pos; + p->idx ^= 1; + + return output ? SOX_SAMPLE_MAX : -SOX_SAMPLE_MAX; +} + +static sox_sample_t sdm_sample(sdm_t *p, double x) +{ + const sdm_filter_t *f = p->filter; + double *s0 = p->trellis[0].sdm[p->idx].state; + double *s1 = p->trellis[0].sdm[p->idx ^ 1].state; + double y, v; + + v = sdm_filter_calc(s0, s1, f, x, p->prev_y); + y = sign(v); + + p->idx ^= 1; + p->prev_y = y; + + return y * SOX_SAMPLE_MAX; +} + +int sdm_process(sdm_t *p, const sox_sample_t *ibuf, sox_sample_t *obuf, + size_t *ilen, size_t *olen) +{ + sox_sample_t *out = obuf; + size_t len = *ilen = min(*ilen, *olen); + double scale = p->filter->scale; + double x; + + if (p->trellis_mask) { + if (p->pending < p->trellis_lat) { + size_t pre = min(p->trellis_lat - p->pending, len); + p->pending += pre; + len -= pre; + while (pre--) { + x = *ibuf++ * scale * (1.0 / SOX_SAMPLE_MAX); + sdm_sample_trellis(p, x); + } + } + while (len--) { + x = *ibuf++ * scale * (1.0 / SOX_SAMPLE_MAX); + *out++ = sdm_sample_trellis(p, x); + } + } else { + while (len--) { + x = *ibuf++ * scale * (1.0 / SOX_SAMPLE_MAX); + *out++ = sdm_sample(p, x); + } + } + + *olen = out - obuf; + + return SOX_SUCCESS; +} + +int sdm_drain(sdm_t *p, sox_sample_t *obuf, size_t *olen) +{ + if (p->trellis_mask) { + size_t len = *olen = min(p->pending, *olen); + + if (!p->draining && p->pending < p->trellis_lat) { + unsigned flush = p->trellis_lat - p->pending; + while (flush--) + sdm_sample_trellis(p, 0.0); + } + + p->draining = 1; + p->pending -= len; + + while (len--) + *obuf++ = sdm_sample_trellis(p, 0.0); + } else { + *olen = 0; + } + + return SOX_SUCCESS; +} + +sdm_t *sdm_init(const char *filter_name, + unsigned trellis_order, + unsigned trellis_num, + unsigned trellis_latency) +{ + sdm_t *p; + const sdm_filter_t *f; + sdm_trellis_t *st; + unsigned i; + + if (trellis_order > SDM_TRELLIS_MAX_ORDER) { + lsx_fail("trellis order too high (max %d)", SDM_TRELLIS_MAX_ORDER); + return NULL; + } + + if (trellis_num > SDM_TRELLIS_MAX_NUM) { + lsx_fail("trellis size too high (max %d)", SDM_TRELLIS_MAX_NUM); + return NULL; + } + + if (trellis_latency > SDM_TRELLIS_MAX_LAT) { + lsx_fail("trellis latency too high (max %d)", SDM_TRELLIS_MAX_LAT); + return NULL; + } + + p = aligned_alloc((size_t)32, sizeof(*p)); + if (!p) + return NULL; + + memset(p, 0, sizeof(*p)); + + p->filter = sdm_find_filter(filter_name); + if (!p->filter) { + lsx_fail("invalid filter name `%s'", filter_name); + return NULL; + } + + f = p->filter; + st = &p->trellis[0]; + + if (trellis_order || f->trellis_order) { + if (trellis_order < 1) + trellis_order = f->trellis_order ? f->trellis_order : 13; + + if (trellis_num) + p->trellis_num = trellis_num; + else + p->trellis_num = f->trellis_num ? f->trellis_num : 8; + + if (trellis_latency) + p->trellis_lat = trellis_latency; + else + p->trellis_lat = f->trellis_lat ? f->trellis_lat : 1024; + + p->trellis_mask = ((uint64_t)1 << trellis_order) - 1; + + for (i = 0; i < 2 * p->trellis_num; i++) + sdm_histbuf_put(p, i); + + p->num_cands = 1; + + st->sdm[0].hist = sdm_histbuf_get(p); + st->sdm[0].path = 0; + st->act[0] = &st->sdm[0]; + } + + return p; +} + +void sdm_close(sdm_t *p) +{ + if (p->conv_fail) + lsx_warn("failed to converge %"PRId64" times", p->conv_fail); + + aligned_free(p); +} + +typedef struct sdm_effect { + sdm_t *sdm; + const char *filter_name; + uint32_t trellis_order; + uint32_t trellis_num; + uint32_t trellis_lat; +} sdm_effect_t; + +static int getopts(sox_effect_t *effp, int argc, char **argv) +{ + sdm_effect_t *p = effp->priv; + lsx_getopt_t optstate; + int c; + + lsx_getopt_init(argc, argv, "+f:t:n:l:", NULL, lsx_getopt_flag_none, + 1, &optstate); + + while ((c = lsx_getopt(&optstate)) != -1) switch (c) { + case 'f': p->filter_name = optstate.arg; break; + GETOPT_NUMERIC(optstate, 't', trellis_num, 8, SDM_TRELLIS_MAX_ORDER) + GETOPT_NUMERIC(optstate, 'n', trellis_num, 8, SDM_TRELLIS_MAX_NUM) + GETOPT_NUMERIC(optstate, 'l', trellis_lat, 100, SDM_TRELLIS_MAX_LAT) + default: lsx_fail("invalid option `-%c'", optstate.opt); return lsx_usage(effp); + } + + return argc != optstate.ind ? lsx_usage(effp) : SOX_SUCCESS; +} + +static int start(sox_effect_t *effp) +{ + sdm_effect_t *p = effp->priv; + + p->sdm = sdm_init(p->filter_name, p->trellis_order, + p->trellis_num, p->trellis_lat); + if (!p->sdm) + return SOX_EOF; + + effp->out_signal.precision = 1; + + return SOX_SUCCESS; +} + +static int flow(sox_effect_t *effp, const sox_sample_t *ibuf, + sox_sample_t *obuf, size_t *isamp, size_t *osamp) +{ + sdm_effect_t *p = effp->priv; + return sdm_process(p->sdm, ibuf, obuf, isamp, osamp); +} + +static int drain(sox_effect_t *effp, sox_sample_t *obuf, size_t *osamp) +{ + sdm_effect_t *p = effp->priv; + return sdm_drain(p->sdm, obuf, osamp); +} + +static int stop(sox_effect_t *effp) +{ + sdm_effect_t *p = effp->priv; + sdm_close(p->sdm); + return SOX_SUCCESS; +} + +const sox_effect_handler_t *lsx_sdm_effect_fn(void) +{ + static sox_effect_handler_t handler = { + "sdm", "[-f filter] [-t order] [-n num] [-l latency]" + "\n -f Set filter to one of: fast, hq, audiophile, goldenear" + "\n Advanced options:" + "\n -t Override trellis order" + "\n -n Override number of trellis paths" + "\n -l Override trellis latency", + SOX_EFF_PREC, getopts, start, flow, drain, stop, 0, sizeof(sdm_effect_t), + }; + return &handler; +} diff --git a/src/sdm.h b/src/sdm.h new file mode 100644 index 0000000..98082aa --- /dev/null +++ b/src/sdm.h @@ -0,0 +1,42 @@ +/* Sigma-Delta modulator + * Copyright (c) 2015 Mans Rullgard + * + * This library is free software; you can redistribute it and/or modify it + * under the terms of the GNU Lesser General Public License as published by + * the Free Software Foundation; either version 2.1 of the License, or (at + * your option) any later version. + * + * This library is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser + * General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public License + * along with this library; if not, write to the Free Software Foundation, + * Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#ifndef SOX_SDM_H +#define SOX_SDM_H + +#include "sox_i.h" + +#define SDM_TRELLIS_MAX_ORDER 32 +#define SDM_TRELLIS_MAX_NUM 32 +#define SDM_TRELLIS_MAX_LAT 2048 + +typedef struct sdm sdm_t; + +sdm_t *sdm_init(const char *filter_name, + unsigned trellis_order, + unsigned trellis_num, + unsigned trellis_latency); + +int sdm_process(sdm_t *s, const sox_sample_t *ibuf, sox_sample_t *obuf, + size_t *ilen, size_t *olen); + +int sdm_drain(sdm_t *s, sox_sample_t *obuf, size_t *olen); + +void sdm_close(sdm_t *s); + +#endif diff --git a/src/sdm_x86.h b/src/sdm_x86.h new file mode 100644 index 0000000..a40833d --- /dev/null +++ b/src/sdm_x86.h @@ -0,0 +1,235 @@ +/* Sigma-Delta modulator + * Copyright (c) 2015 Mans Rullgard + * + * This library is free software; you can redistribute it and/or modify it + * under the terms of the GNU Lesser General Public License as published by + * the Free Software Foundation; either version 2.1 of the License, or (at + * your option) any later version. + * + * This library is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser + * General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public License + * along with this library; if not, write to the Free Software Foundation, + * Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#ifndef SOX_SDM_X86_H +#define SOX_SDM_X86_H + +#ifdef __AVX__ + +#include + +#define SDM_FILTER_AVX(s0, s1, src, x) do { \ + __m256d tx, t0, t1, t2; \ + __m256d p0, p1; \ + __m256d sx; \ + \ + sx = _mm256_set1_pd(x); \ + s0 = _mm256_load_pd(src); \ + s1 = _mm256_load_pd(src + 4); \ + \ + p0 = _mm256_permute_pd(s0, 5); \ + p1 = _mm256_permute_pd(s1, 5); \ + \ + tx = _mm256_blend_pd(p0, _mm256_permute2f128_pd(sx, p0, 0x21), 0x5); \ + t0 = _mm256_blend_pd(p1, _mm256_permute2f128_pd(p0, p1, 0x21), 0x5); \ + t1 = _mm256_permute2f128_pd(tx, t0, 0x21); \ + t2 = _mm256_blend_pd(p1, _mm256_permute2f128_pd(p1, p1, 0x21), 0xa); \ + \ + s0 = _mm256_add_pd(s0, tx); \ + s1 = _mm256_add_pd(s1, t0); \ + \ + s0 = _mm256_sub_pd(s0, _mm256_mul_pd(t1, _mm256_load_pd(g))); \ + s1 = _mm256_sub_pd(s1, _mm256_mul_pd(t2, _mm256_load_pd(g + 4))); \ + } while (0) + +#define sdm_filter_calc sdm_filter_calc_avx +static double sdm_filter_calc_avx(const double *src, double *dst, + const sdm_filter_t *f, + double x, double y) +{ + const double *a = f->a; + const double *g = f->g; + __m256d s0, s1; + __m256d v0, v1; + __m128d v; + + SDM_FILTER_AVX(s0, s1, src, x - y); + + _mm256_store_pd(dst, s0); + _mm256_store_pd(dst + 4, s1); + + v0 = _mm256_mul_pd(s0, _mm256_load_pd(a)); + v1 = _mm256_mul_pd(s1, _mm256_load_pd(a + 4)); + v0 = _mm256_add_pd(v0, v1); + + v = _mm_add_pd(_mm256_castpd256_pd128(v0), _mm256_extractf128_pd(v0, 1)); + v = _mm_add_pd(v, _mm_permute_pd(v, 1)); + + return x + _mm_cvtsd_f64(v); +} + +#define sdm_filter_calc2 sdm_filter_calc2_avx +static void sdm_filter_calc2_avx(sdm_state_t *src, sdm_state_t *dst, + const sdm_filter_t *f, double x) +{ + const double *a = f->a; + const double *g = f->g; + __m256d s0, s1; + __m256d t0, t1; + __m256d v0, v1, v2; + __m128d r0, r1; + __m256d a0; + + SDM_FILTER_AVX(s0, s1, src->state, x); + + t1 = _mm256_set_pd(0.0, 0.0, 0.0, 1.0); + t0 = _mm256_sub_pd(s0, t1); + s0 = _mm256_add_pd(s0, t1); + + _mm256_store_pd(dst[0].state, s0); + _mm256_store_pd(dst[0].state + 4, s1); + + _mm256_store_pd(dst[1].state, t0); + _mm256_store_pd(dst[1].state + 4, s1); + + a0 = _mm256_load_pd(a); + v0 = _mm256_mul_pd(s0, a0); + v1 = _mm256_mul_pd(t0, a0); + v2 = _mm256_mul_pd(s1, _mm256_load_pd(a + 4)); + + v0 = _mm256_add_pd(v0, v2); + v1 = _mm256_add_pd(v1, v2); + + r0 = _mm_add_pd(_mm256_castpd256_pd128(v0), _mm256_extractf128_pd(v0, 1)); + r1 = _mm_add_pd(_mm256_castpd256_pd128(v1), _mm256_extractf128_pd(v1, 1)); + + r0 = _mm_add_pd(r0, _mm_permute_pd(r1, 1)); + r0 = _mm_add_pd(r0, _mm_set1_pd(x)); + r0 = _mm_mul_pd(r0, r0); + + r0 = _mm_add_pd(r0, _mm_load1_pd(&src->cost)); + + _mm_storel_pd(&dst[0].cost, r0); + _mm_storeh_pd(&dst[1].cost, r0); +} + +#elif defined __SSE2__ || defined _M_X64 || \ + (defined _M_IX86_FP && _M_IX86_FP >= 2) + +#include + +#define SWAPHL(x) \ + _mm_castsi128_pd(_mm_shuffle_epi32(_mm_castpd_si128(x), 0x4e)) + +#define SDM_FILTER_SSE2(s0, s1, s2, s3, src, x) do { \ + __m128d tx, t0, t1, t2, t3; \ + __m128d sx; \ + \ + sx = _mm_set1_pd(x); \ + s0 = _mm_load_pd(src); \ + s1 = _mm_load_pd(src + 2); \ + s2 = _mm_load_pd(src + 4); \ + s3 = _mm_load_pd(src + 6); \ + \ + tx = _mm_shuffle_pd(sx, s0, 1); \ + t0 = _mm_shuffle_pd(s0, s1, 1); \ + t1 = _mm_shuffle_pd(s1, s2, 1); \ + t2 = _mm_shuffle_pd(s2, s3, 1); \ + t3 = _mm_shuffle_pd(s3, s3, 1); \ + \ + s0 = _mm_add_pd(s0, tx); \ + s1 = _mm_add_pd(s1, t0); \ + s2 = _mm_add_pd(s2, t1); \ + s3 = _mm_add_pd(s3, t2); \ + \ + s0 = _mm_sub_pd(s0, _mm_mul_pd(t0, _mm_load_pd(g))); \ + s1 = _mm_sub_pd(s1, _mm_mul_pd(t1, _mm_load_pd(g + 2))); \ + s2 = _mm_sub_pd(s2, _mm_mul_pd(t2, _mm_load_pd(g + 4))); \ + s3 = _mm_sub_pd(s3, _mm_mul_pd(t3, _mm_load_pd(g + 6))); \ + } while (0) + +#define sdm_filter_calc sdm_filter_calc_sse2 +static double sdm_filter_calc_sse2(const double *src, double *dst, + const sdm_filter_t *f, + double x, double y) +{ + const double *a = f->a; + const double *g = f->g; + __m128d s0, s1, s2, s3; + __m128d v0, v1; + + SDM_FILTER_SSE2(s0, s1, s2, s3, src, x - y); + + _mm_store_pd(dst, s0); + _mm_store_pd(dst + 2, s1); + _mm_store_pd(dst + 4, s2); + _mm_store_pd(dst + 6, s3); + + v0 = _mm_mul_pd(s0, _mm_load_pd(a)); + v1 = _mm_mul_pd(s1, _mm_load_pd(a + 2)); + v0 = _mm_add_pd(v0, _mm_mul_pd(s2, _mm_load_pd(a + 4))); + v1 = _mm_add_pd(v1, _mm_mul_pd(s3, _mm_load_pd(a + 6))); + v0 = _mm_add_pd(v0, v1); + v0 = _mm_add_pd(v0, SWAPHL(v0)); + + return x + _mm_cvtsd_f64(v0); +} + +#define sdm_filter_calc2 sdm_filter_calc2_sse2 +static void sdm_filter_calc2_sse2(sdm_state_t *src, sdm_state_t *dst, + const sdm_filter_t *f, double x) +{ + const double *a = f->a; + const double *g = f->g; + __m128d s0, s1, s2, s3; + __m128d v0, v1, v2, v3; + __m128d t0, t1; + __m128d a0; + + SDM_FILTER_SSE2(s0, s1, s2, s3, src->state, x); + + t1 = _mm_set_sd(1.0); + t0 = _mm_sub_pd(s0, t1); + s0 = _mm_add_pd(s0, t1); + + _mm_store_pd(dst[0].state, s0); + _mm_store_pd(dst[0].state + 2, s1); + _mm_store_pd(dst[0].state + 4, s2); + _mm_store_pd(dst[0].state + 6, s3); + + _mm_store_pd(dst[1].state, t0); + _mm_store_pd(dst[1].state + 2, s1); + _mm_store_pd(dst[1].state + 4, s2); + _mm_store_pd(dst[1].state + 6, s3); + + a0 = _mm_load_pd(a); + v0 = _mm_mul_pd(s0, a0); + t0 = _mm_mul_pd(t0, a0); + v1 = _mm_mul_pd(s1, _mm_load_pd(a + 2)); + v2 = _mm_mul_pd(s2, _mm_load_pd(a + 4)); + v3 = _mm_mul_pd(s3, _mm_load_pd(a + 6)); + + v1 = _mm_add_pd(v1, v2); + v1 = _mm_add_pd(v1, v3); + + v0 = _mm_add_pd(v0, v1); + t0 = _mm_add_pd(t0, v1); + + v0 = _mm_add_pd(v0, SWAPHL(t0)); + v0 = _mm_add_pd(v0, _mm_set1_pd(x)); + + v0 = _mm_mul_pd(v0, v0); + v0 = _mm_add_pd(v0, _mm_load1_pd(&src->cost)); + + _mm_storel_pd(&dst[0].cost, v0); + _mm_storeh_pd(&dst[1].cost, v0); +} + +#endif + +#endif -- 2.5.2 ------------------------------------------------------------------------------ Monitor Your Dynamic Infrastructure at Any Scale With Datadog! Get real-time metrics from all of your servers, apps and tools in one place. SourceForge users - Click here to start your Free Trial of Datadog now! http://pubads.g.doubleclick.net/gampad/clk?id=241902991&iu=/4140