• Thank you for being willing to be interviewed by me. I just had a talk with Andres. He finds the use of algorithms in public participation quite advanced and quite new in digital geography. He wants to understand a little bit more, and I think that’s probably one of... maybe it has potential that I can write a chapter around it. There is a lot of debate about whether algorithms or machine learning, this kind of technique, have their own agency in making decisions, or how they participate or collaborate with human actors in the public-participation process.

  • I think the use of pol.is is very interesting and experimental. We’re from more of a geography background, so we both have some doubts about the algorithm. We want to open the black box, to understand it. I think the knowledge could be shared with other researchers so that in the future more discourse or discussion can be established. Thank you very much.

  • Certainly, my pleasure. Can you see my screen?

  • Let me see. Yeah. I can see you.

  • I’m supposedly sharing my screen, great.

  • The big question here is, as you said, the space itself behaves like an agent in a discussion.

  • When we say we use pol.is to automate away a facilitator’s job, we mean mostly that it does things that a human would necessarily do if we announced the methods beforehand, in the sense that it’s predictable or explainable, as the jargon currently says.

  • It is not strictly speaking an artificial intelligence in a sense of deep learning but rather as in really handy automation, which is supposedly easy to explain. We can, of course, demonstrate it by having me explain it to you here.

  • (laughter)

  • It is not deep learning. As you said, it’s a marketing choice that Colin and friends call it AI-powered conversations, but that’s a separate conversation, between you and Colin preferably. Let’s get to your questions.

  • My first question is I have some thoughts about the clustering. I’m not so sure if it’s clustering algorithm. According to the first conversation that we had, you explained how you choose the most representative comment within one cluster or across different clusters. I don’t know how do you break it down into clusters.

  • As you can see here, in the polisMath repository, there is, very simply, a maximum number of clusters, which is five.

  • The idea, the intuition being it is a k-means algorithm, with k being anywhere from one to five. It recalculates the k value, which is the number of clusters, every time a new vote or a new comment comes in.

  • It doesn’t suddenly change the cluster number, for UI reasons. There is a buffer. For example, there are three groups at the beginning. After a while, the system thinks it’s better to have four groups, but it will not change from three to four until the ideal number has consistently been four, four times in a row. That is to say, after four actions, the ideal group number remains four. It will then change.

  • There is a buffer. If, after two or three votes, the system thinks it’s better to go back to three, then the user doesn’t see it suddenly splitting and suddenly joining. It is, I think, mostly a user experience thing.
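
  The buffer described above can be sketched in Python as follows. This is an illustration only, not the polisMath code: the class name `ClusterCountBuffer`, the `patience` parameter, and the rule that any interruption resets the streak are all assumptions made for the sketch.

```python
class ClusterCountBuffer:
    """Keep the displayed number of groups steady until a new ideal k persists."""

    def __init__(self, initial_k, patience=4):
        self.shown_k = initial_k    # what the participants currently see
        self.patience = patience    # hypothetical: consecutive updates required
        self._candidate = None      # the k the system would like to switch to
        self._streak = 0            # how many updates in a row it has asked for it

    def update(self, ideal_k):
        """Feed the ideal k computed after one vote or comment; return the k to show."""
        if ideal_k == self.shown_k:
            # The ideal k agrees with the display again: forget any pending change.
            self._candidate, self._streak = None, 0
        elif ideal_k == self._candidate:
            self._streak += 1
        else:
            self._candidate, self._streak = ideal_k, 1

        if self._candidate is not None and self._streak >= self.patience:
            self.shown_k = self._candidate
            self._candidate, self._streak = None, 0
        return self.shown_k


buffer = ClusterCountBuffer(initial_k=3)
shown = [buffer.update(k) for k in [4, 4, 3, 4, 4, 4, 4]]
```

  In this run the display stays at three groups through the brief return to three, and only switches to four once four consecutive updates have asked for four.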

  • How does it think whether three is better or four is better? It is a standard algorithm to choose the best k. It uses an idea called silhouette. The silhouette value is a value from -1 to 1 that determines whether a group is a good fit -- that is to say, whether the points in a certain group or a certain cluster actually belong to that cluster -- by calculating the distance between them. It is a very standard algorithm.

  • If you check the Wikipedia article on determining the number of clusters in a set, I think it uses the standard silhouette method to try to partition it into two, three, four, or five groups, and then see whether any of these k values yields a better silhouette number for all the groups involved. It chooses the one that’s best. It is quite standard, statistically speaking.
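
  A minimal Python sketch of choosing k by silhouette, under stated assumptions. The function names (`farthest_first`, `kmeans`, `mean_silhouette`, `choose_k`) are hypothetical, the farthest-first seeding is chosen here only to make the example deterministic, and the sketch compares k from two to five because the silhouette is not defined for a single cluster. It does not reproduce the polisMath implementation.

```python
import math


def farthest_first(points, k):
    # Deterministic seeding (an assumption for reproducibility): start at the
    # first point, then keep adding the point farthest from all chosen centres.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iters=100):
    centers = farthest_first(points, k)
    labels = []
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centre.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: each centre moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def mean_silhouette(points, labels):
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0  # undefined for one cluster; treated as 0 in this sketch
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # usual convention: a singleton contributes 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for name, other in clusters.items() if name != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, max_k=5):
    best_k, best_score = None, -2.0
    for k in range(2, min(max_k, len(points)) + 1):
        score = mean_silhouette(points, kmeans(points, k))
        if score > best_score:
            best_k, best_score = k, score
    return best_k


# Three well-separated blobs of three points each.
blobs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
best = choose_k(blobs)
```

  For the three blobs in the example, splitting into three groups gives the highest mean silhouette, so `best` is three.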

  • Is there any particular reason that you chose the k-means algorithm and the associated method?

  • First, I didn’t code the math part. We can simply say it is mostly Christopher’s decision. To have an in-depth discussion of which methods Christopher evaluated, you’ll have to interview Christopher instead.

  • Personally, I find the k-means and silhouette method easy to explain. It’s the main attraction to me, but, because I didn’t write this code, I don’t really have any intuition on whether there is an even more explainable method that’s around.

  • Perhaps, if it’s possible, I would like to also interview Christopher or Colin in the follow-up.

  • Yes, certainly, on both the marketing and mathematics departments.

  • Thank you. I’ve checked Wikipedia, and they say that normally a clustering algorithm is a set of many algorithms. When you say you use the k-means algorithm, it’s not only one algorithm. It’s like a collective, to my understanding.

  • I’m just a bit curious. When you showed me earlier the screen, I think that’s the parameter, the setting for the formula or the algorithm that you used, but I wonder where I can see all the algorithms that you use in pol.is.

  • Certainly. It’s in one, single GitHub repository called polisMath.

  • I went there, but if I type algorithm, there’s nothing. I wonder what’s a name that you use, so that I can search by myself.

  • It is written in Clojure, which is a Lisp. It is very mathy, [laughs] but it’s not very easy to follow if you are used to a more, let’s say, object-oriented way of programming. Most of the algorithms you’re interested in are in this one, single file called Clusters. It’s just cluster.clj in polisMath/math.

  • I think it is pretty standard, iterated, k-means clustering. I think what you’re looking at is the entire algorithm here. I’m sure that you can dive into it and find the hyperparameters and parameters to this algorithm. I think the pol.is team did not deviate from the standard statistics or mathematics at all when coding this part.

  • OK, thank you. I always think that’s very interesting, but there’s another group in academia talking about there are loads of algorithms that have been used in our everyday lives.

  • Of course, social scientists, we don’t really understand it. We want to understand it in an amateur way or in a layman way, so that we can share and we can explain it in an easier way to the common people or to other people who are interested.

  • Even though I can’t code and there are some barriers for us when we try to understand, I think that’s the starting point. [laughs] Thank you very much for explaining everything to me.

  • To introduce their intuitions, there is a kind of string after each function. Right after the definition line for silhouette, it says -- in English, more or less -- what it’s trying to do in the code afterwards. It should be somewhat helpful if you want to walk through this entire file.

  • Right. Thank you very much. I will definitely check it out. The second question is, when you decided to use pol.is to facilitate all those comments, I wonder what did you expect? What kind of result or what kind of effect did you expect to get from pol.is?

  • When we start a pol.is discussion, as we previously discussed, it’s usually because the sheer amount of input comments exceeds the reading comprehension [laughs] time for everybody involved.

  • The first effect I would like everybody to have is to have an overview of what kind of arguments are there. It is easy to overlook that there are better or more interesting arguments if one is just focused on one particular conversation thread.

  • With pol.is, I think the main effect I want to give is more or less the overview effect, which is the effect that lets people see that there is a variety, or we should say a distribution, of all the possible ideas in an idea space. That is the first thing that I would like people to consider.

  • I wondered, for some particular issues, such as Airbnb or Uber, why did you choose to use pol.is and not Discourse in the questionnaire stage, to give the big picture?

  • I think I already answered. It is simply because the incoming opinions are so much in volume and in diversity that it is practically impossible, had we used Discourse or a traditional forum method, for everybody to read it all. Even when people can read it all, they would not be able to tease out consensus from the sheer volume of opinions.

  • The same thing happened in the drunk driving case. In particular, we already know that there are people who are interested who will iterate the same opinions over and over. In pol.is, people who share the same idea would just consolidate into a single dot. They would not waste everybody else’s time by revisiting the same points over and over. Rather, we reward consensus-making behavior.

  • I think my question is, in vTaiwan, there are a number of issues. I found some of them you don’t use pol.is in the questionnaire stage, and some you use pol.is. I wonder. Is there any rationale behind it?

  • We usually use pol.is if we think that there are a lot of opinionated members of society who will want to come to this space. We don’t use pol.is if we think that there are not such strong emotions, feelings, or reflections, and the volume is something that a forum can handle or a questionnaire can handle. Then we use forums instead.

  • Basically, we use forums because we know that we can moderate the forums ourselves. When the volume or variety grows to such a point that human moderation is no longer feasible, instead of giving up, we use an automated moderator. That’s the intuition.

  • I’m curious. In the very beginning, when you start to collect different opinions, before collecting opinion, how did you know this issue will be very popular, like most people will participate?

  • You can analyze this topic and see whether it is hot in social media, in traditional media, whether it is being actively followed, or things like that. There are ways to analyze the traditional and social media to see if any particular topic is likely to get people who are very opinionated or a lot of people.

  • For things that the government or the society doesn’t have such strong opinions on, we don’t have to handle this volume of input.

  • I see your point. I want to skip the third question, but I will come back to it. I will go to the fifth...ah the fourth question. The fourth question is, when you say there’s a moderator in Discourse, in a forum format, I wonder whether there is also comment moderation in pol.is. There’s a by-default setting, such as lazy or strict. I wonder, which mode did you choose?

  • We tried both ways. Initially, in the UberX case, we did strict moderation, but I watched it almost constantly, approving new comments as quickly as I could.

  • Afterwards, for Airbnb, we went for lazy, that is to say "pass by default" moderation, which created an explosion in the number of comments. The comments were not very high quality, at that. We had to very quickly prune the comments, as Billy did during the drunk-driving one.

  • I don’t have a very good rule. If the moderator can be 24/7 looking at the moderation, then strict moderation is good. It doesn’t waste anybody’s time. If the moderator can only visit a few times a day, then perhaps lazy moderation is good. I don’t really have a very fine heuristic on this. It’s mostly a trade-off between the time wasted on the participants and the time spent by moderators.

  • That’s quite interesting. You were the moderator for the strict mode for Airbnb?

  • No, for UberX. For Airbnb, I was also the moderator, but it started in lazy mode.

  • Are you the only moderator for...

  • Pol.is, at that time, only allowed one moderator. There was no group moderation.

  • It’s very recently that vTaiwan created a group account and that we made pol.is run independent of Heroku, so we can share a group moderator. All these are very recent advancements. Every case before this year, there’s, by necessity, only one moderator.

  • Wow, you did a lot of jobs, lots of tasks. [laughs]

  • That’s very interesting. I wonder, how did you moderate in the strict moderation mode? How did you approve each comment? Were there any protocols or principles behind it?

  • Easily, if it’s a duplicate. If the point has already been made by a comment that addresses, strictly speaking, exactly the same thing, or by something very highly correlated with it, then I tend not to let it through.

  • If it’s making a new point or a point that’s not equivalent, although somewhat similar, to a previous comment, then I let it through. It’s mostly de-duplication.

  • So, you have to remember, when the number of comments grows really large, you will have to remember that there are maybe 100 comments which are very different from each other.

  • When a new comment comes in, you have to remember, "Oh, this goes into the first one."

  • Yes, that’s right. Really substantial comments, they tend to resonate already with one particular group or the other, so it’s not that hard. We already have to do this if we are doing a face-to-face deliberation. If people bring up points that are already covered in the handbook or in any of the stakeholder materials, then you have to understand or remember that.

  • It is easier for a facilitator to simply navigate the comment space, just as we did with RealtimeBoard. It is certainly already easier to simply see one incoming comment and remember whether it’s a duplicate, than to see one incoming comment and think, "Which group will tend to agree with this?" which is what a face-to-face moderator often has to do.

  • By leaving that part to an algorithm, we can focus on the quality and diversity of the comments. While it is work, it is already significantly less work.

  • That’s quite interesting. Going back to the third question. Will pol.is ensure every comment is viewed by more or less the same number of people, to make it fair when it singles out the most representative comment in one group?

  • Yes, it does that. We are also talking about an algorithm change recently. If you interview Christopher, you can go into depth on that. There is a tension between wanting to give every comment the same number of votes or eyeballs, in order to make sure that they get fair treatment, versus making the most efficient use of a voter’s time spent on pol.is.

  • If there are, say, dozens or even hundreds of comments, a random distribution will mean that the quality of the comments seen by any viewer is, by necessity, not very correlated. That is to say, how I vote on my first comment or the second comment does not really give any indication of what kind of comments I will see on the third or on the fourth vote.

  • That creates a somewhat different experience compared to what we are now thinking about, which is to give the voters the most controversial comments first. After two or three votes, we can be reasonably sure which group that this person belongs to.

  • Then we give them, as fairly as possible, the comments that this particular group tends to be divided about, in order to further refine the comment space. There is an argument to be made for using the existing votes of one person to inform which question or comment to show, versus the current way, which is to give every comment, as fairly as possible, the same number of eyeballs.
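
  The two presentation orders being contrasted can be sketched as two sort keys in Python. This is a toy illustration, not pol.is’s routing logic: the dictionary fields, the function names, and the `controversy` score (1 for an even split, 0 for a unanimous vote) are assumptions made for the example.

```python
def fair_order(comments):
    """Current idea: show the least-seen comments first, so eyeballs even out."""
    return sorted(comments, key=lambda c: c["agree"] + c["disagree"])


def controversy(comment):
    """Hypothetical score: 1.0 when votes split 50/50, 0.0 when unanimous."""
    total = comment["agree"] + comment["disagree"]
    if total == 0:
        return 1.0  # assumption: an unseen comment counts as maximally uncertain
    agree_share = comment["agree"] / total
    return 1.0 - abs(2 * agree_share - 1)


def controversial_first(comments):
    """Proposed idea: show the most divisive comments first."""
    return sorted(comments, key=controversy, reverse=True)


sample = [
    {"id": 1, "agree": 50, "disagree": 50},  # heavily voted, evenly split
    {"id": 2, "agree": 9, "disagree": 1},    # near-unanimous
    {"id": 3, "agree": 3, "disagree": 2},    # barely seen, mildly split
]
fair_ids = [c["id"] for c in fair_order(sample)]
divisive_ids = [c["id"] for c in controversial_first(sample)]
```

  With this sample the fair order leads with the barely seen comment 3, while the controversial-first order leads with the evenly split comment 1 and leaves the near-unanimous comment 2 for last, which is the trade-off discussed in this answer.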

  • We haven’t really done an empirical trial of the new, let’s say, more narrative-based way of presenting comments. I can’t really tell you how the experience differs. The design intuition is that people will become more engaged, because they get more relevant and controversial comments early on. It engages them in constant thinking based on where they are.

  • Of course, I can argue the other way, which is that they don’t get exposed to things that everybody tends to vote yes or tends to vote no on. These are useful, too, because they establish a rapport or an understanding that people can come to agreement, even if they have controversies.

  • By having the controversies upfront, it may create an experience that emphasizes the controversies. I don’t really know, because we haven’t made a field study yet.

  • I think that’s a very, very interesting argument. I’ll definitely follow up with Christopher. I think that sometimes, because of the mathematics or algorithms, the statistics behind the...They will slightly or subtly change the way people participate, or even change how they feel.

  • Sometimes they will get very sympathetic, because you create another button and say, "You can say yes, or you can say what." The emotional changes will be quite...That’s very interesting.

  • That comes to the fourth question. Have you tried to get a better performance out of pol.is? For example, I checked the document. They recommend every user, if they want to use pol.is, for example, to leave seed comments, to make sure there are some diversities in the very beginning or...

  • We definitely do that, just as a face-to-face deliberation starts with an inform phase, where people get a handbook that lists the major arguments and positions from the relevant stakeholders. We try to put, as succinctly as possible, the various positions that we get from the stakeholders beforehand into seed comments.

  • We used to have this formula, which we still follow, more or less -- not religiously -- in that we have nine seed comments at first. Three of them are profiling questions. That is to say, in the UberX case, there would be, "Do I have a professional driver’s license?" "Have I used Uber before?" and things like that.

  • Three of them will be position questions, "I think taxation is important," "I think insurance is important," "I think registration is important," and so on. Three of them will be wish questions, like, "I wish the government will do more to rein UberX in," and things like that.

  • The initial seed questions are meant to let people know that there are good answers, both yes and no, to those profile questions, which means there are people who are like them and people who are not like them. The positions often follow from the profiles, but not always.

  • The wish or the statements that, as the government, we have received from the lobbyists, we also try to put them into seed questions. After all, that is what people considered important enough to raise as lobbying statements. What we want out of pol.is is a refinement out of those very strong or very polarized positions.

  • Where did you get the questions for those seeds from?

  • We asked the stakeholders, "If there is one thing that you can put in as your wish or your position, what would that thing be?"

  • Can I still find the archive questions?

  • You certainly can. I have a slide that shows exactly that. I think I even recorded it as an online course, how to use pol.is. I’ll send you that afterwards.

  • Yes, thank you very much. Also, that document on GitBook states that there’s a limit, that it can only have fewer than a thousand comments at a time. Sometimes, because vTaiwan is quite popular, for, say, Uber or Airbnb, there are more than a thousand comments.

  • I wonder. Is it pushing the capacity or the limit of pol.is, or it’s not a concern of yours?

  • If it is things like UberX or Airbnb, people tend to be willing to spend a lot of time on pol.is, even answering all the comments. It is less of a concern.

  • We do think that, for more mild topics, where people feel less strongly, then it may be good to moderate out most of the repetitive comments and/or introduce a different presentation order, where people are presented the more controversial comments first. As I mentioned to you, that’s Christopher’s idea.

  • There is a certain amount of guesswork we have to do about how much time any voter, any participant, is willing to spend on pol.is. Beyond that number, we will see that the sampling gets less and less effective. Each comment has less and less significant participation, so we’ll have to change accordingly. For the really hot cases, like UberX and Airbnb, I am not that worried.

  • OK, I see. That’s fine. I think we can go to...This is a quite interesting one, the last one, encourage participants to stop back a few times to vote on new comments, which are only showing up very recently. You don’t want them to comment at the end, because if they come later, then they might not be...when it comes to the first point that if they are all being voted by a certain amount of people.

  • If they leave their emails, they’ll get reminders. That’s something that pol.is already does. Disallowing new comments toward the end may be useful, but it may not. In Airbnb’s case, two days before the pol.is period draws to a close, there’s a huge influx of participants, because Airbnb sent an email to all their members in Taiwan asking them to come to pol.is.

  • If we disallow new comments at that point, we will not get a high-quality conversation. Despite there being only 48 hours or something left, they still had a lot of very useful interactions. We can always extend the date. If we don’t get a super majority or if we don’t get resonating, high-quality comments, we can always say, "OK, we run it for another week." It is not a hard deadline.

  • That’s quite interesting. I wonder, how do you judge if it is a high-quality conversation?

  • It is a high-quality conversation if we manage to get people from different groups to nevertheless agree on something. That’s the whole point.

  • The whole point is to get a consensus out of the participants, out of different opinions.

  • That’s exactly right. By consensus, I think it was Christopher who came up with something like this:

  • \[ Cons(c) =_{def} \prod_{g \in G} 2 \max ( P_a(g, c) - 0.5, 0 ) \]

  • Which means that there exists a certain comment, C, for which, in every group in G, there is a majority of people who agree -- this is basically saying anything less than 50 percent is treated as zero -- and the remainder is then multiplied by two so that it still normalizes to 100 percent. It multiplies its support across all groups, the higher the better.

  • The idea is that if there emerges a comment C that is agreed on by everybody in every group, then of course it has a perfect score. If it’s agreed on by all groups except one minority group, and in that minority group fewer than half of the people agree with it, then that comment scores zero.

  • It has to get majority support from all groups in order to have a non-zero value of the consensus function. The idea is that if we have a handful -- that is to say more than three, I would guess -- of comments, C, with a high enough consensus score that is consistent, then the pol.is conversation is a success. Otherwise, we may have to run it a little bit longer.
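
  The consensus score can be worked through in a few lines of Python. The sketch follows the spoken description (any agree share at or below 50 percent counts as zero, the remainder is doubled, and the per-group values are multiplied together); the function name `consensus` and the list-of-shares input are assumptions for illustration.

```python
def consensus(agree_share_by_group):
    """Product over groups of 2 * max(share - 0.5, 0), with shares in [0, 1]."""
    score = 1.0
    for share in agree_share_by_group:
        score *= 2 * max(share - 0.5, 0.0)
    return score


everyone_agrees = consensus([1.0, 1.0, 1.0])   # every group fully agrees
one_group_below_half = consensus([0.9, 0.4])   # one group has under half agreeing
solid_majorities = consensus([0.75, 0.75])     # 75 percent agreement in both groups
```

  Full agreement in every group gives the perfect score of 1, a single group without a majority pulls the whole product to 0, and two groups at 75 percent each give 0.5 × 0.5 = 0.25, which shows how quickly the product drops when support is merely solid rather than overwhelming.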

  • That’s very interesting. Christopher came up with this formula himself, the consensus formula?

  • That’s brilliant. That’s very interesting. I’m definitely going to talk to him.

  • Basically, you have the forum tools and the pol.is tool. You use these tools to gather different ideas on the direction or the trend of where you see the issues. How do they integrate? Since you have so many viewpoints, how are they gathered and brought to the next stage, where you’re having the expert meeting, the face-to-face one?

  • Nowadays, it’s simple. We just ask the pol.is algorithm to come up with a report and to read from the report.

  • Before we had a report, we had to do something manually that is very much like the report, only much more tedious. The idea is that we first list the consensus arguments, and then we try to find, in each group, which are the comments that are representative of that group.

  • The comments kind of define that group, meaning this particular group agrees on this, but pretty much nobody else agrees on it this strongly. That’s it. That’s all we do.
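
  The manual step of finding the comments that "define" a group can be sketched as ranking comments by how much more a group agrees than everyone else does. This simple difference is an assumption chosen for clarity; it is not claimed to be the statistic pol.is or its report uses, and the names below are hypothetical.

```python
def representative_comments(agree_shares, group, top_n=2):
    """agree_shares maps comment id -> {group name: share of that group agreeing}."""
    scored = []
    for comment_id, by_group in agree_shares.items():
        others = [share for name, share in by_group.items() if name != group]
        if not others:
            continue  # nothing to compare against
        # How much more strongly this group agrees than the other groups do on average.
        gap = by_group[group] - sum(others) / len(others)
        scored.append((gap, comment_id))
    scored.sort(reverse=True)
    return [comment_id for gap, comment_id in scored[:top_n]]


shares = {
    "a": {"g1": 0.9, "g2": 0.2},  # g1 agrees, g2 mostly does not
    "b": {"g1": 0.8, "g2": 0.8},  # shared by both groups, so it defines neither
    "c": {"g1": 0.3, "g2": 0.9},  # g2 agrees, g1 mostly does not
}
defines_g1 = representative_comments(shares, "g1", top_n=1)
defines_g2 = representative_comments(shares, "g2", top_n=1)
```

  Comment "b" has high agreement everywhere, so it belongs with the consensus arguments rather than with either group’s distinguishing comments.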

  • We let the face-to-face expert meeting or we let people online know, through a presentation, how many groups there are. What are the majority’s consensus arguments? What are the non-consensus, but nevertheless distinguishing sentiments within every particular group? That’s pretty much it.

  • When you are, say, facilitating the face-to-face expert meeting...

  • I start with a briefing of the pol.is phase.

  • There’s a minority opinion, which was still in the presentation. During the discussion with experts, all kinds of stakeholders, as a facilitator, how did you treat the minority opinions in these conversations or discussions?

  • If it’s a minority group, one of the telling points of pol.is is that it doesn’t ignore minority groups at all. The minority group, I will say, "This group only has 10 percent of population, but it brings a very good point, which is blah, blah, blah." If it is a comment that nobody agrees on, like a negative consensus, then maybe it’s just a really bad idea. [laughs]

  • I don’t actually present those, unless it’s significant in the sense that the reverse of it is still a useful sentiment. I remember I picked one from the UberX case, which is to say, "We should not have this conversation. We should just fine Uber." [laughs]

  • A majority of people disagree with that. I shouldn’t, strictly speaking, show it in the presentation, but I think it helps the discussion to let people know that most people don’t feel that we should stop the discussion. We should let the discussion go on.

  • It is somewhat arbitrary, I admit. Otherwise, minority groups do get representation, but not those minority comments that are ignored by everybody.

  • That’s really interesting. I’ve heard from one of your colleagues that pol.is was once shut down. It was at the end of one of the issues, which he forgot. I want to follow up. Do you know what happened and what caused it to shut down?

  • I don’t remember anything like that. I remember that the report function was broken when we wanted to get the drunk-driving one. Billy had to do it manually. I think that morning the report function was repaired, so we were able to include the report function’s output. The PDF file is fine.

  • We didn’t have a report function before, anyway, [laughs] so it’s all like that. It is the first time we tried the reporting function. That’s it. I think that’s all it is.

  • When you say you don’t have the report function, is that because you use pol.is as SaaS?

  • No, the reporting function is a new feature. It didn’t get completed until, I think, around drunk driving and NCII. When we were doing online liquor sales, Airbnb, or Uber, there’s no reporting function to speak of.

  • Why do they want to have this function?

  • It is what everybody asked them to write... It is tedious work that all the facilitators have to do. There is no human creativity in it, which makes it a prime candidate to be automated.

  • Do you still have the spreadsheets or the sheets where you did that tedious work, so that I can take a look...

  • There is a GitHub repository called polis-tally. If you look at polis-tally, you probably have everything here. This is the pol.is conversation. If you search for the first nine comments, you will see the seed questions that I just talked about, "I am a Taxi driver," "I’m an Uber driver," "I think accident insurance is important," "Taxation is important."

  • "Conflict resolution should be handled by the Ministry of Transport and Communications," "I think surge pricing is fair," "I have used Uber," "I think multiple dispatch systems can be allowed on the same taxi," "I think there should be prominent display on a car to transfer service," and, "I think to run after such unlicensed operation is the MOTC’s duty," and so on.

  • This is basically the matrix that I operated off before the reporting function. This is sorted by consensus, by the way. You can see the highest consensus is really pretty high.

  • They are 97% and 96%, respectively.

  • That’s very interesting. You said those are the seed comments?

  • No, this is the final result, sorted by percentage of consensus. I was just searching for single-digit indices -- which means they are seed comments, because they’re the first nine -- to show you. Of all the seed comments, only the accident insurance one made it to the complete consensus list.

  • Only this one made it?

  • Yeah, I think only this one made it to the list.

  • The conversation actually grows more than the seed comments.

  • This is why we use pol.is. Otherwise, we’d just send out a survey, right?

  • Yeah, that makes a lot of sense. Next one, I’ve seen it on GitHub, they say there’s a data export function. Have you ever used the data export function?

  • Of course. What you’re looking at is the data export. Of course, I wrote some programs to make it easier to see. The raw export is something like this, which is not very easy to comprehend. It is actually easy if I explain a little bit. Every comment has a body, an index, a number of agrees, a number of disagrees, and a percentage.

  • For each index, you can match it with one voter. This voter, participant 1, belongs to the first group, the k cluster, the first cluster. They had posted no comments, and they voted 29 times, agreeing 14 times and disagreeing 10 times.

  • For each of the comments that are there -- there are almost 200 comments -- here is how they voted. One is a yes, negative one is a no, zero is a skip. If you don’t see anything, that person hasn’t seen this comment before.
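
  A raw export of the kind described here could be read with a short Python script. The sketch below assumes a CSV layout with one row per participant and one column per comment index; the column names (`participant`, `group`, `n_comments`, `n_votes`, `n_agree`, `n_disagree`) and the sample rows are made up for illustration and may not match the actual export files in polis-tally.

```python
import csv
import io

# Hypothetical sample: 1 = agree, -1 = disagree, 0 = skip, empty = never shown.
SAMPLE_EXPORT = """participant,group,n_comments,n_votes,n_agree,n_disagree,0,1,2
1,0,0,3,2,1,1,-1,1
2,1,1,2,0,1,-1,0,
"""


def load_votes(text):
    """Return {participant id: {"group": int, "votes": {comment index: vote}}}."""
    participants = {}
    for row in csv.DictReader(io.StringIO(text)):
        votes = {}
        for column, value in row.items():
            # Comment columns are the ones whose header is a plain number;
            # an empty cell means the participant never saw that comment.
            if column and column.isdigit() and value:
                votes[int(column)] = int(value)
        participants[int(row["participant"])] = {
            "group": int(row["group"]),
            "votes": votes,
        }
    return participants


def tally(participants, comment_index):
    """Count agrees, disagrees and skips for one comment across all participants."""
    counts = {1: 0, -1: 0, 0: 0}
    for record in participants.values():
        vote = record["votes"].get(comment_index)
        if vote is not None:
            counts[vote] += 1
    return {"agree": counts[1], "disagree": counts[-1], "skip": counts[0]}


data = load_votes(SAMPLE_EXPORT)
comment_zero = tally(data, 0)
```

  Keeping "skip" (zero) separate from "never shown" (missing) matters for any later analysis, because only the former is an actual reaction to the comment.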

  • That’s very interesting. Is this also on GitHub?

  • This is on GitHub. If you search for polis-tally, all the pol.is we have run, I have dumped its data and committed it.

  • Would you be able to know the geography, the gender, age, or the profiles of the participants?

  • We don’t ask those questions, so we don’t.

  • In the pol.is privacy policy, it says that they might possibly also collect the data when you register either through Google, through Facebook, or Twitter, whatever.

  • That’s right. If Facebook chooses to reveal your gender, it is possible, for people who sign up through Facebook, to cross-analyze their profiles. Although technically we can do that now with our own pol.is login and our own non-Heroku version of pol.is, we haven’t even begun to think about doing it.

  • Now that you mention it, we can do this, technically, but we’re not doing it now.

  • Christopher or Colin, do they do these analyses? The privacy policy looks like they are going to do some cross-reference...

  • They do do geography analysis. I remember, for the UberX case, they correlated the population in Taiwan with the number of participants in the UberX conversation in Taiwan. They found that it is very even. There is no overconcentration in large cities, which makes us really happy. I don’t know which other analyses they do, so maybe you have to ask them.

  • I probably need to ask them. This is quite popular to use this data. When you have a conversation, those data are collected accordingly, so that you can recycle or reuse those data, kind of like a data market.

  • I do agree it’s useful. It’s just maybe useful from an analytic sense or academic sense, maybe not that relevant to the deliberation itself.

  • Certainly, if I think the gender or age is important, I will have that included as one pol.is question. If it’s not, then it’s circumstantial. People do not necessarily use Facebook to log in, for one, and people’s Facebook profile may not be that accurate, for another.

  • Yes, exactly. That’s true. I agree with you.

  • The next one, I wonder, all the data collected in the process of pol.is, who has the ownership? I checked the privacy policy of vTaiwan. It said most of the data is under the CC0 license.

  • That’s right, so it belongs to everybody — or it belongs to nobody, depending.

  • No one, I would say in terms of ownership, and that’s very interesting.

  • When I checked the privacy policy in pol.is, they said all data and every piece of source code are owned by pol.is and subject to intellectual and proprietary rights and legal protection.

  • You have used pol.is in a quite interesting way, in a fast way, in a direct way. Did you have the control of the data? Who owns this data?

  • The first thing is that, the privacy statement here, I don’t think pol.is says that your IP address and things like that become their intellectual property... I don’t think it says anything like that here. Maybe I’m mistaken.

  • I think one of them, they say, for example, the comment. When you use the user materials...Is it materials? Yes, user materials. It’s you have a comment in a public domain, and then they might use this data for further analysis. Also, according to I think there is an intellectual...

  • No, I don’t think there is an intellectual property. Of course, by necessity, just by showing your comment to other people, you have to give pol.is a non-exclusive right to show it to other people. Otherwise, there is really no point of pol.is existing.

  • I don’t think you somehow become a contracted author, give away the IP right, and can no longer use those copyrighted statements yourself. I don’t see anything like that here.

  • I think there’s another part, which is to check the source code.

  • This is the terms of use. This refers to the source code of pol.is itself. Of course they will want to assert intellectual property on the pol.is code. Otherwise, they can’t put it under AGPL.

  • Also, I think it takes several lines to talk about how...where all these materials -- I think they used a collective word, materials, to express all this data and all this stuff -- are owned by pol.is.

  • Yeah, but this is about materials. This is not about user material. This is things that Christopher has written. This is not what a participant has written. Of course, pol.is will want to retain IP rights for their code, because otherwise, they will not be able to release it under an open-source license; they will have to check with everybody else.

  • I don’t think those two materials are the same thing.

  • When they say there’s data...I wonder when these comments or all kinds of things becoming data are saved in a data format. If they say they own this ownership input, it’s embedded in Taiwan, I wondered, do you have a say to use this data generated by pol.is? I think that’s my question.

  • I think the scope is different. This talks about the service, for which you can consider the input to be votes and comments and the output to be a visual display. Then, pol.is is essentially saying that how they generate the output from the input is the intellectual property. It doesn’t really say anything about the votes themselves and the comments themselves.

  • Of course, the votes and comments are under CC0 if it’s a vTaiwan context, or under whatever IP right belongs to the person who wrote them. All that pol.is is asking for is a non-exclusive right to show your comments to other people. I think that is pretty reasonable, actually.

  • Christopher wrote this, the term of use and the...

  • No, no, no, the code. I don’t know which lawyer wrote this. You have to talk to their lawyer, but I think the intent is never to control the intellectual property of user-generated material. I’m pretty sure about it.

  • Thank you. According to GNU AGPL Version 3.0, if through open source you get the source code of pol.is and you modify it, you will probably have to publish the modified pol.is. Does it apply to the Sandstorm pol.is?

  • Yeah, of course. We will have to provide the modified code to any user of the system. That is actually a Sandstorm feature. You can, whenever you use a Sandstorm system, download any data you have in the Sandstorm as well as a copy of the code that is running in Sandstorm.

  • This is called portability. You can move to your own hardware anything that you participate in on a Sandstorm instance. Sandstorm is designed with this kind of free software copyleft in the cloud in mind, in the sense that any user is able to obtain not only the data stored on the cloud, but also a copy of the code that’s running in the cloud.

  • By that, I think we’re already AGPL-compliant. In addition to that, I think our modifications are all done publicly on GitHub anyway. People can just follow our fork and chase the changes we made.

  • That’s amazing. That’s very interesting. [laughs] Next one, do you have another meeting afterwards? I noticed that the time is up, but there’s still...

  • No, it’s OK. It’s OK. We can go on for another, I don’t know, half an hour or something.

  • The chief of Department of IT in Taipei City, Lee Wei-Bin, he mentioned you in an interview. He mentioned that when in the government and outside of government state, g0v, when they’re trying to build software or a product, they might have some differences, the value they believe or the principle they have.

  • For example, his perspective, he thinks g0v, when they’re doing a project is focusing on very agile and very specific and very fast. They want to be very effective.

  • In the government maybe they focus on the stability and the safety of that piece of software or the system. I wonder, when you are building...vTaiwan is somehow very interesting. It’s somehow in between. It’s a project proposed by a former minister in Executive Yuan, but it’s been built and maintained largely by g0v.

  • I wonder, when you are doing this, when you are building or designing the infrastructure of vTaiwan, is there any trade-off? There are some people coming from the government. They might have different ideas of the design, what vTaiwan should look like or what kind of software this should be, compared to you, who were from g0v at the time, and other people.

  • People who want control from the government side all participate in the Join platform instead. I think the best thing for vTaiwan was that there is, concurrently speaking, a Join platform. People who want government control can put their ideas to the Join platform and leave vTaiwan to experiments.

  • On the other hand, of course, cybersecurity and safety are very important, which is why we use pretty much tried-and-true open source components, pol.is being the one exception. When we started using it, it was not yet open source. There is a significant algorithm in it. We don’t necessarily have to trust their explanation of it. We have to do a lot of validations and so on.

  • I think Chia-Liang Kao recommended pol.is because, first, he knew Colin face-to-face. We wanted to put pressure on them to open source pol.is. Before we can apply pressure, we have to let the pol.is team see that it helps to open source things if large governments are going to use pol.is as part of the decision-making platform.

  • To get credibility, you have to open source it. If it is just a few hobbyists or private sector people using pol.is, they may not care that much about algorithmic transparency. We have to prove that pol.is can be used on a massive scale before we can convince them to open source it.

  • We took a risk by using a proprietary SaaS software as part of the vTaiwan stack. Otherwise, we are building on very solid, open-source proven grounds.

  • That’s very interesting. I didn’t know it was g0v, Chia-Liang Kao, and you who pushed them to open source pol.is.

  • I’m going to follow up with you. That’s a very interesting story. The next one, there are some discussion around civic tech and the city.

  • I mean to say Decide Madrid, Consul, to some degree has benefited from, say, the resources from Madrid City Hall. Outside that, from Media Lab or from the very talented software engineers who are residents in Madrid, or all kinds, this city as a space for all kinds of experiment and civic tech would be one of them.

  • This kind of discussion has been had in urban geography or in human geography a lot. I’m just curious, because I found that the Sunflower Movement or all kinds of activity from g0v, even vTaiwan, the data hackathon, all these hackathons, they are all located and happen in Taipei. I wonder, what do you think? Is there any connection, or did it just happen accidentally?

  • No, it’s not an accident. In Taiwan, we have 20 years of open source movement, with COSCUP, with OSDC, with the meet-ups, with the various conferences inspired by COSCUP and OSDC. There’s a huge amount of people who already are gathering in Taipei.

  • There are, of course, the MOPCON community, and there’s a bunch of people around NCKU in Tainan. By and large, the largest annual events in open source happen in Taipei, and usually around Academia Sinica.

  • Academia Sinica, in the early 2000s, had this national plan to push open source, and it established a so-called OpenFoundry to build infrastructure.

  • It’s just like GitHub before GitHub existed, to basically give the developers not just an online space, but also a legal space through the introduction of Creative Commons and licensing advisors, as well as offline space.

  • That is to say, the Academia Sinica building itself provides a lot of practically free venues for large-scale events.

  • g0v itself would not have been able to accommodate hundreds of participants if not for @scw’s intervention that allowed the Academia Sinica Department of Information Technologies to join in and serve as a stable venue, and so on and so forth.

  • There are many offline spaces, online spaces, and legal frameworks that have supported the open source community throughout the past 20 years. g0v just rides on this wave. There are many other teams riding on this wave, as well. g0v is just the civic tech part of it.

  • That’s very interesting. I’m going to follow with other [laughs] people from g0v, and so the last one...

  • I had a talk with Andres because he went to g0v summit 2014, and he said he interviewed you and other key persons. He thinks the value or the logic behind these agile, or anarchy, or either decentralized value of vTaiwan has some connection with the Silicon Valley.

  • Certainly, the hacker culture, as we call it, but it’s not really just Silicon Valley. The hacker culture is also MIT. It is also European, like the Chaos Computer Club. It is pretty global, but it is part of the hacker culture.

  • That’s his point of view, and I think it’s more than that, so I wonder what else...Are there any specific cases or a particular culture that you took as a reference to build g0v, or to shape the values of g0v?

  • The leaderless values, I think it is part of the Internet itself. The Internet is basically rewarding innovations without permission, and we, I think, very consciously connect to the global open-source movement.

  • Which, again, after the invention of decentralized version control, SVK and BitKeeper, and, of course, Git now, everybody is free to fork any project in any way, knowing that if other people think it’s a good idea, they can merge it with minimal cost.

  • Because @clkao and I were both developers of decentralized version control systems, we naturally took a lot of the analogy in the decentralized version control system culture, and later the Git culture, into the formulation of the g0v ethos, which again rewards forking, and rewards innovation without permission.

  • We also have a lot of experience with open space technology, in particular the BarCamp and Foo Camp idea of open space technology, or unconference. That is, of course, a pretty Silicon Valley idea.

  • The first BarCamp was organized at Socialtext, a company that I worked with for eight years, and @clkao for about five years. There are various other g0v participants who worked in some way or another with the Socialtext people.

  • Socialtext, even in Silicon Valley, is considered very radically decentralized. There’s a lot of Socialtext culture in the g0v culture, as well.

  • Otherwise, I think it is also very local. People in Taiwan already have a very active meet-up culture, a very active culture where there’s a rough consensus, and people just go and do something.

  • The organization management part owes a lot to the Mozilla community, and also the local COSCUP community in particular. You’ll probably want to interview the OCF people who are involved with g0v summit planning, because the g0v summit in particular is modeled after COSCUP.

  • You mentioned something very interesting. Where you have global experiences, sometimes you have to translate, or to make it localized in a specific context, and each context has its personality or characteristics.

  • It’s not that easy, or it’s not that it’s all the same. I think showing some character of the context in Taiwan would be something I want to figure out.

  • Last question, because I read some text on the archive vTaiwan, they do a hackathon history, and so I learned that you refer to RegulationRoom as kind of like the prototype.

  • Yeah. We basically recreated RegulationRoom using Discourse, and some GitBook and other technologies. RegulationRoom was built on Drupal, and we find its interface not exactly the best to modify.

  • The design principles, I think, are really advanced, and there is something very concrete in its output as well, the synthetic documents and so on. We basically started exactly where RegulationRoom left off.

  • I also think RegulationRoom leaves a very good trail in the sense that they work with IBM and other research institutes to publish a series of papers on exactly what kind of training they gave to the online moderators, what kind of preparatory material it gave, what kind of interventions it had, and so on.

  • We also think that it’s essential that the public servants involved get a course on how exactly this thing works instead of just being another forum to attend. I think having a concrete prototype as well as theory is very helpful.

  • Why did you choose RegulationRoom? Of course, you mentioned there’s advantages to all this concrete profile they’re doing, that had been done by the people from RegulationRoom...

  • By far, it’s the most advanced. I haven’t seen any national-level regulation making that has had quite the success that RegulationRoom had.

  • Basically, we picked the state of the art and started from there.

  • In one of your speeches, you mentioned there are difficulties, but Michel Hess, this professor...

  • Yeah. That’s one of the meta-analyses that mentioned RegulationRoom as well as others.

  • He mentioned that there are three big issues they encounter. The first one is the ignorant wall. Then the second one is the silly wall. That’s just my translation. The third one is the too much information explosion wall, as three barriers they encountered.

  • I can get you the exact formulations in his paper later.

  • Because you know that they had some difficulties, or you know exactly what kind of issues they had, when you were building and designing the infrastructure of vTaiwan, did you think of a way to circumvent or try to prevent these things from happening?

  • Yeah, certainly. We try to have a lexicon or a mini dictionary to make the ignorance not that much of a problem. We try to have dedicated moderators in order to foster a positive conversation instead of a toxic one.

  • We explicitly choose Discourse because it has excellent moderation tools so that it’s possible to edit away the part that is an ad hominem attack, but still leave the part that gives a substantial contribution. This kind of piecemeal moderation is something that not many other systems have.

  • Finally, for information overload, that’s where pol.is comes in. Even before pol.is, we tried to make finer subtopics of each policy subject so that people don’t go all over the place but can instead focus on one specific aspect at a time. We learned from their advice and the issues that they ran into.

  • One last one. Because he problematized this issue based on his experiences in the US, in the States, I wonder, do you think digital forum culture in Taiwan also shares the same issues, or are there other issues happening in the context of Taiwan, from your previous experiences?

  • From my previous experience, in Taiwan we’re lucky in that we don’t have to advertise a lot to engage people in politics. People are very engaged in politics. You don’t have to mobilize them.

  • We don’t have to think that much about rural areas and digital gap because there’s broadband everywhere. There’s advantages in Taiwan in that we can deploy cutting-edge systems without fearing that nobody will get on them. That’s the thing we don’t have to worry about.

  • As for the culture of the people who are the most ignorant, comment the loudest, the trolls culture, as well as a lot of forum things, of course it is the same everywhere because it is a property of the medium. It is not a property of a population because it rewards things that capture people’s attention, and so the attention seekers dominate the discussion.

  • This is a problem that’s also problematized by the Discourse team, which is why they call themselves the Civilized Discourse Construction Kit, because they find most online discourse to be not civilized. They try to invent or construct a system that makes it more civil.

  • By using Discourse, we’re basically standing on the shoulders of the troll masters of the Internet who have all operated and constructed a lot of forums that measure in millions of users. By using Discourse, we’re pretty sure that it will scale, but the RegulationRoom people had to reinvent a lot of it themselves.

  • You were saying the moderators, are they from the g0v community, or a team from Discourse?

  • Discourse provides the moderation tools. The vTaiwan community provides the moderators. For every single comment, there’s a huge number of tools that could be used. You can flag it. You can put a badge on it. You can edit part of it. There’s a moderation history. There are many tools a moderator can use.

  • You can also give out badges to reward useful behavior. You can also assign moderation rights to people who are active in the community. People who have freshly registered cannot post pictures. Only after they become good citizens and participate for a while do they gain the right to post pictures.

  • There are thousands of very small things like this in Discourse that generally encourage civil behavior.

  • That’s fascinating. I’m going to follow up on Discourse, which I haven’t done much homework on. I will look into it a little bit more.

  • As you can see, there is a huge number of knobs you can tweak to let a system trust the user to a degree where they can post new topics, where they can reply, where they can post pictures, and so on and so forth.
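
A minimal sketch of how such trust knobs could work: a user earns a level through participation, and each action requires a minimum level. This is not Discourse's actual implementation; the names `LEVEL_RULES`, `REQUIRED_LEVEL`, `trust_level`, and `can`, and all thresholds, are assumptions made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical knobs (assumption): (level, minimum days active, minimum posts read),
# listed from the highest level down. These are not Discourse's real defaults.
LEVEL_RULES = [
    (2, 15, 100),
    (1, 1, 30),
]

# Hypothetical minimum level required for each action.
REQUIRED_LEVEL = {
    "reply": 0,
    "post_topic": 1,
    "post_picture": 1,
    "moderate": 2,
}


@dataclass
class User:
    days_active: int = 0
    posts_read: int = 0


def trust_level(user):
    """Return the highest level whose thresholds the user meets, else 0."""
    for level, min_days, min_posts_read in LEVEL_RULES:
        if user.days_active >= min_days and user.posts_read >= min_posts_read:
            return level
    return 0


def can(user, action):
    """Check whether the user's trust level permits the given action."""
    return trust_level(user) >= REQUIRED_LEVEL[action]


newcomer = User()
regular = User(days_active=20, posts_read=500)
print(can(newcomer, "reply"), can(newcomer, "post_picture"))
print(can(regular, "post_picture"), can(regular, "moderate"))
```

Keeping the thresholds in plain tables means a community can tune them without touching the permission logic.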

  • This is very, very interesting. I’m going to tell Andres about it. I think he might find this very interesting. Sorry for asking you to spend more time. Thank you very much. Perhaps I’ll probably have to ask you some more questions in the future. I will let you know.

  • It’s fine. Those are very good questions.

  • I’m happy that you agree for me to post this online so that the wider research community can benefit from it also.

  • Of course. Thank you very much.