Common Voice is a really cool project attempting to collect recordings from people of all ages, genders, dialects, etc. so that future speech recognition can become more reliable, and so that we do not need to depend on things like Google's captioning service. If you care about that, or just like reading out or listening to random sentences, please optionally make an account (to help make the data more easily searchable) and contribute.
This absolutely does not go just for English speakers, BTW. English is doing just fine. If you speak any other languages, please help add, record and sort sentences in it, instead. For example, the Slovenian database is growing, slowly but surely!
@human_dude Yup, agreed! Every so often you get a bad clip and have to reject it; there is one regular with a severe speech impediment that I try to approve when possible and feel really guilty when rejecting, but ... they skip words, or half-pronounce them, and it'd just throw the algorithm off. :ms_frown:
But generally, it's beautiful voices with all sorts of accents, and just lovely to listen to.

@Mayana oh it would be so nice if their voice helped out too, especially since they have a speech impediment; computers need to learn to recognize those too.
But yeah, so many nice accents, and so cool to hear that people just do this :)

@human_dude Agreed. And as I said, I accept their entries whenever what they say actually matches what should be said. But sometimes that just isn't the case, and I have no idea how to handle such events.
But in general, I'm really happy when I hear people recording these in a conversational tone, rather than it being noticeable that they're reading. It's not always possible of course, and I think a more conversational dataset would help. But as long as it is understandable and all the words are there as written (no cannot instead of can't or vice versa) it's all good.

@Mayana half-pronounced words are still pronounced.

And please, as an accented person, accents aren't lovely, they're who we are. To me, I don't have an accent at all.

@scruss There is indeed a line, and I try to walk on the accepting side.
Hah, apologies. I hate mine, so I might've gone a bit out of my way to be a positive voice to those who might think likewise. Which ... well, from personal experience I ought to know it doesn't change much anyway. Only getting out there and talking does.
Edit: Wait up. I didn't even say the accents themselves are lovely. I just said the variety of voices (which naturally have all sorts of accents) is lovely to listen to. And I mean, I'm ESL, so it's not like this is coming from a place of "Oh all those silly foreigners with their silly accents are so lovely and cute!", you know?

it’s quite buried in the website but common voice data is released cc0

(in my mind that’s a good thing compared to something with weird conditions like the MPL that firefox and things use.)

@Mayana If you care about my Rhapsode project, I highly encourage you to participate in Mozilla Common Voice! (See above toot)

Speech recognition is my single biggest weakness in localization (not that I'm unimpressed by Voice2JSON's efforts there), so lending all speech-to-text engines more training data will help Rhapsode work across all languages!

Also yes: I'd love automated transcriptions in all our video players. Did you know PocketSphinx & Kaldi support GStreamer?

@alcinnz At first I thought you were telling *me* to contribute, because you tagged me at the top, and was incredibly confused. :ms_joy:
(But to be clear, I obviously do that fairly regularly.)

@Mayana Fair enough. No, I was just intending to give you an extra boost!

@Mayana wta*f*f: native languages of Cymraeg, Gaeilge, and Kernowek — but no Scots?

I'm still hacked off on how the Scots language Mozilla package was treated years ago: "we don't list joke languages" was the response

@scruss That's a fucked up response! I'm very sorry.
Perhaps it would work out better if you made another issue about this, and convinced some more people to weigh in? I don't blame you if you don't have the energy for that, though.

@Mayana now I see there's "Accent: Scottish English" … oh dear

How do I raise issues?

@scruss @Mayana I thought Scots and Scottish accent of English are two things?

By the way, just an FYI... you don't actually *need* to have an account to contribute to #Mozilla #CommonVoice. All having an account does is to remember your stats for how much you've contributed.

I use the #CVproject app on #FDroid to contribute without an account, regularly!

Anyway, for anyone reading, please contribute! This is sorely needed for #accessibility and #foss #voiceassistants !

@Blort Right! I did put an "optionally" in there, but I suppose that's easy enough to miss or not understand. :ms_smile:

@juleLe @Mayana Yes, that's the one.

PS Sorry for not replying sooner. I missed seeing the notification!

@Mayana What licence are they using for this? I hope it stops big tech from using it

@fatboy CC0, I believe. That's basically public domain. Even big tech can use it.
I understand your desire to prevent that. But, given the level of attention FOSS generally gives to accessibility ... as long as better caption services are being made, I'm happy, regardless of by whom.
Naturally, this could also end up in the next shitty, privacy-unfriendly voice assistant. That's unfortunate. But oh well.

@Mayana @fatboy Thing is, Mozilla could prevent that by licensing it under CC ShareAlike instead. But then, they wouldn’t be the Mozilla Corporation that exists by getting half a billion dollars from Google every year and whose head of EU public policy told me at a conference once “why do you hold us to a higher standard, we’re just another Silicon Valley tech company?”

This is exactly the reason I did not contribute, even though I think my dialect should be represented there.
@Mayana @fatboy

@kirschwipfel @aral @fatboy People have different priorities. For all of you, the love of FLOSS and dislike of tech corporations is a big one. For me, while the situation certainly isn't ideal, the possibility of better accessibility for hard-of-hearing folks is far more important. While yes, these datasets can be used by big corps, they can also be used by open source software, and that's what we should be supporting.
I tooted this to bring Common Voice to the attention of those who'd find it interesting. If you do not, nobody is forcing you into anything. I think I'll mute this thread now, if you don't mind. :ms_smile:

@Mayana Please don’t misunderstand: the reason I feel so strongly about this is because I feel it’s an unconscionable injustice that the only two options we have available to us are either sign away our human rights to big corporations or get better accessibility when that’s a false dichotomy created by Big Tech/Silicon Valley. It’s entirely possible that we can keep our human rights AND have better accessibility. Just not within capitalism / the Silicon Valley model.

@Mayana oh I thought the Common Voice project was dead. Nice to see it's still active!

