View Section: 2017-02-14 Open discussion with Rufus Pollock

( Whiteboard: https://ey.pdis.nat.gov.tw/shared/mz-dL7DJofqEZLbuZ00CG8AsgK13jqlHMV_WfUXKlPa )
前後文Link in context 連結Link
Audrey Tang
2:00 PM today, we’re going to design one of the larger common API — not with the OpenAPI spec — but a common, standard‑based refactoring of our national disaster response systems.
前後文Link in context 連結Link
Audrey Tang
That is actually a good, concrete example to talk about, because that would involve merging two established governmental units, at least their front‑end websites, together, and establishing a common data exchange pipeline around disaster data.
前後文Link in context 連結Link
Audrey Tang
Also, we need to somehow demonstrate that it’s better if you open‑source the part of code which may suffer from performance problems, rather than keeping it tied to proprietary DBs and and throwing more hardware at it.
前後文Link in context 連結Link
Audrey Tang
There’s many dimensions of that operation, and the kick‑off meeting is at 2:00 PM.
前後文Link in context 連結Link
Rufus Pollock
Perfect.
前後文Link in context 連結Link
Audrey Tang
So that’s actually a good case we can talk about.
前後文Link in context 連結Link
Audrey Tang
I’m interested in your book, also. If you want to develop some thoughts, I can help philosophizing, too.
前後文Link in context 連結Link
Rufus Pollock
We can talk about that, as well. One other thing, let me see if I can go and open...where did I write these notes?
前後文Link in context 連結Link
Rufus Pollock
Let’s discuss these topics. One is the practical platform question.
前後文Link in context 連結Link
Rufus Pollock
Second, there’s obviously the book, but particularly out of that is what I call the "open fund" through use open software in gov. Then we might add to that. Let me have a look. Here we go. This is the one I want. I guess I can actually open my OECD slides on data‑driven government.
前後文Link in context 連結Link
( Website: http://rufuspollock.org/2016/09/14/visiting-estonia-21-23-september-2016/ )
前後文Link in context 連結Link
Audrey Tang
Also allocations.
前後文Link in context 連結Link
Rufus Pollock
In terms of the platforming, do you want to tell me a little bit about... if we take 20 minutes on that. One thing I could also talk about a little bit is the Frictionless Data or data containerization ideas, if that would be useful.
前後文Link in context 連結Link
Audrey Tang
Sure, of course.
前後文Link in context 連結Link
Rufus Pollock
Here we go. I’ve got some things here. That’s the one I want, view, into full screen.
前後文Link in context 連結Link
Audrey Tang
You want to project this out?
前後文Link in context 連結Link
Rufus Pollock
I could just talk about...some of the ideas might be quite common for the other things. The one question I would have at the moment is you guys all... I will speak in English. Is that OK?
前後文Link in context 連結Link
Audrey Tang
Yeah, of course.
前後文Link in context 連結Link
Rufus Pollock
My Chinese is not good enough.
前後文Link in context 連結Link
Audrey Tang
It’s OK. If you speak JavaScript or SQL all of us also understand.
前後文Link in context 連結Link
(laughter)
前後文Link in context 連結Link
Rufus Pollock
There’s a really general problem, which is we want to reduce the friction of getting stuff from A to B, from tool A to tool B.
前後文Link in context 連結Link
Rufus Pollock
One thing that you could imagine would be that you are a government official, working in Excel. An ordinary policy‑maker or analyst in one of your departments is probably never going to use JSON, but they will use Excel.
前後文Link in context 連結Link
Rufus Pollock
Imagine a journalist at a newspaper. They might use Excel. They might use JSON if they’re a data journalist. Imagine what it would be like if it was one click to import data from anyone or anything on one of the data platforms into the tool of your choice, into R, into whatever, and then even to do things with it in there. What would you need to have that?
前後文Link in context 連結Link
Rufus Pollock
You need an exchange format, obviously. You need some way to exchange between these tools, and the other is you need tooling.
前後文Link in context 連結Link
Rufus Pollock
I emphasize that often we — certainly myself — as technologists, tend to focus on the exchange format, "If only everyone used RDF. If only everyone used JSON and JSON Schema."
前後文Link in context 連結Link
Rufus Pollock
That is part of the story, in that if you tried to convert from every format out there to every other format out there it would be a nightmare. There are too many of them.
前後文Link in context 連結Link
Rufus Pollock
At the same time, the tooling is hugely important. The integration into people’s existing tool workflow is crucial for adoption. We sometimes think, "If we build it, they will come. If we have our wonderful format, everyone will just adopt it." They won’t.
前後文Link in context 連結Link
Rufus Pollock
Also, in the Frictionless Data project here, the other principles that aren’t necessarily specific, but are distinguishing, are the Zen‑like simplicity.
前後文Link in context 連結Link
Rufus Pollock
For example, RDF does not have Zen‑like simplicity. A focus on existing tooling, for example, in the Frictionless Data project, is that we don’t really invent any data formats. We just invent some degree of a wrapper around some of them.
前後文Link in context 連結Link
Rufus Pollock
For example, we use CSV for tabular data because every tool on the planet that supports tabular data basically supports CSV. Every tool out there supports CSV. [laughs] It’s the most basic tabular structure you could have and it’s web oriented.
前後文Link in context 連結Link
Rufus Pollock
That’s another distinguishing feature, which is data containers. It’s this idea around containerization, which is a little bit different. It has a relationship with APIs, but a little bit different, this idea that you want to ship from different tools to different tools and allow the tool to unwrap what’s in the box, if you can.
前後文Link in context 連結Link
Rufus Pollock
Let’s talk about CSV as maybe exemplifying our approach. For example, why even CSV as a basic tabular format than let’s say JSON? Excel doesn’t support JSON as a native import format. Even if, in our format, you throw away all the data package stuff, you still are left with CSV.
前後文Link in context 連結Link
Rufus Pollock
Even if you don’t understand data packages, you could open the CSV. Whereas most average bureaucrats will not know what JSON is. You will need to build some tool that takes JSON to whatever they have, to some extent.
前後文Link in context 連結Link
Rufus Pollock
I’ll maybe just make a comment here. I was involved in an effort that’s something called IATI, for the International Aid Transparency Initiative. I‑A‑T‑I. It started about 10 years ago now, maybe more. I was involved in it from quite early on. Its aim was that people wanted to report about aid projects. There are people who pay for aid projects, and there are people who received money from aid projects.
前後文Link in context 連結Link
Rufus Pollock
You have issues where you have multiple donors funding the same project without realizing it. Donors don’t realize that there are some projects which are not being funded at all that are important. There’s just a lack of joined up thinking, which is similar to government sometimes. [laughs]
前後文Link in context 連結Link
Rufus Pollock
They created, against some of my advice, at the time their standard was built around XML, which was the JSON of 10 years ago. [laughs] JSON’s maybe better. It’s simpler. It’s web oriented and so on.
前後文Link in context 連結Link
Rufus Pollock
The thing is, they ended up having to spend huge amounts of time. First of all, no one natively can support XML. For example, we spent a load of time writing converters to convert XML back into CSV. That’s what all the analysts, all the people who want to look at the data, or have a spreadsheet and they want to import a spreadsheet. I would just go on about that.
前後文Link in context 連結Link
Rufus Pollock
Just to explain this format, and when you went for this format, basically the structure is to try and take something people are already using, which is CSV.
前後文Link in context 連結Link
Rufus Pollock
I don’t know if you know what I mean by progressive enhancement in JavaScript... It’s less done now, but traditionally progressive enhancement in web applications.
前後文Link in context 連結Link
Rufus Pollock
This is your HTML page. Then you can add JavaScript and progressively enhance it. You’ve got the CSV. Then you can add basically a schema, which is what’s missing in CSV. You don’t know the types. You don’t have rich types.
前後文Link in context 連結Link
Rufus Pollock
Then you can add a descriptor which said, "This data set was created by Audrey," or it was created by you. It was created on this day. This is the license. This is where you got it from. For example, even if you were doing JSON schema based APIs, you still probably need an agreement on what that standard metadata that you ship at the top of your API is.
前後文Link in context 連結Link
Rufus Pollock
You’ve got it perfect. You’re going to bring up an example. You can look at the data package JSON.
前後文Link in context 連結Link
Audrey Tang
Sure. Let’s look at the CSV.
前後文Link in context 連結Link
( Website: https://github.com/datasets/gdp/blob/master/data/gdp.csv )
前後文Link in context 連結Link
Rufus Pollock
Exactly.
前後文Link in context 連結Link
Audrey Tang
Because it’s progressive. Right?
前後文Link in context 連結Link
Rufus Pollock
Exactly.
前後文Link in context 連結Link
Audrey Tang
[laughs] It should be self‑descriptive. As you can see the existing github tooling is happy with that.
前後文Link in context 連結Link
Rufus Pollock
Yes. Exactly.
前後文Link in context 連結Link
Audrey Tang
You can store it in everything...
前後文Link in context 連結Link
Rufus Pollock
You can store it.
前後文Link in context 連結Link
Audrey Tang
...and find whatever. Then you can also see what’s a valid GDP value. Now that you know the country code has to be part of the iso-3-geo-codes/id geo codes.
前後文Link in context 連結Link
Rufus Pollock
Exactly. If you like, this is the part which I think is the least developed, but most exciting. What was the exciting think about link data? Link data I think fundamentally ultimately hasn’t really worked, but what was exciting was foreign keys.
前後文Link in context 連結Link
Rufus Pollock
One of the problems at the moment with the web, if you’ve ever done... You guys have probably taken part in hack days or anything like that. You spend 90 percent of your time preparing the data, downloading it, merging it with some other data set, tidying it up. One of the things here, like that foreign key would be a breakthrough, would be we start to be able to if we had some...
前後文Link in context 連結Link
Audrey Tang
...core data packages...
前後文Link in context 連結Link
Rufus Pollock
...that we could reference them again and again.
前後文Link in context 連結Link
Audrey Tang
It’s there.
前後文Link in context 連結Link
( Website: http://data.okfn.org/roadmap/core-datasets )
前後文Link in context 連結Link
Rufus Pollock
Exactly.
前後文Link in context 連結Link
Audrey Tang
[laughs]
前後文Link in context 連結Link
Rufus Pollock
There it is. Exactly. The irony I’d also say is you don’t need the whole link. One of the things I’d emphasize here. First of all, going back to SQL, foreign keys are more familiar. They’re very simple here. We still working on that, but the idea would then be you’d be able to do a join, and for example join into this data set some other information.
前後文Link in context 連結Link
Rufus Pollock
Here by the way, someone’s already bothered to denormalize a little bit. They’ve already inlined the country name in perfectly normal form, like in the database. You wouldn’t have the country name. You’d just have the ID, which is the foreign key and then the GDP values. Someone’s already denormalized a little bit.
前後文Link in context 連結Link
Rufus Pollock
Let’s say you wanted the continent or you wanted the region. For example a classic thing you might want to do is like, "I’ve got GDP, but I want to aggregate by region." In fact, if you look at the World Bank data, they kind of do this. Often they’ll include in their data set aggregates, for example, GDP for Europe.
前後文Link in context 連結Link
Rufus Pollock
Let me give you another one actually that I’m...I could, on my laptop, just very quickly give you a stupid example.
前後文Link in context 連結Link
Audrey Tang
There’s a text file here that describes all the core data sets.
前後文Link in context 連結Link
Rufus Pollock
At the moment linked most of the core data. It’s actually more coming. This is the list of all of them. You can build data. What’s also nice is, if you start up to their package, if you want to build really lightweight data catalogs, you don’t even need full on CCAN or something like that. You can just write a text file and then almost do it in JavaScript. Just have a little browser of your data packages.
前後文Link in context 連結Link
Rufus Pollock
Let me get you data.oecd. Here’s an example of something I’m actually interested in at the moment, just by coincidence. I’m putting up data.oecd. I think if you just search for OECD pharmaceutical expenditure, I think it’s the top, possibly that one. Here we go.
前後文Link in context 連結Link
Rufus Pollock
What’s great is if you just change the time by the way. Just go down a little bit here, and change the time to all the time. If you drag that over...
前後文Link in context 連結Link
Audrey Tang
Like this?
前後文Link in context 連結Link
Rufus Pollock
Just as far over as you can drag that. Then if you just change, for example, this to GDP per capita. Actually US dollars per capita is even better.
前後文Link in context 連結Link
Audrey Tang
I see.
前後文Link in context 連結Link
Rufus Pollock
I can’t remember who this is. I think this is the US.
前後文Link in context 連結Link
Rufus Pollock
By the way, this is how much we spend on buying...
前後文Link in context 連結Link
Audrey Tang
You’re right, by the way. This is the US.
前後文Link in context 連結Link
Rufus Pollock
One of the things, the US has gone crazy. This is partly Hepatitis C. There’s one drug category in 2014 that’s released. For example, I know you’re in Taiwan. I don’t know the numbers. In mainland China, there at 30 million people with Hepatitis C who will need this expensive...
前後文Link in context 連結Link
Rufus Pollock
The point being, just by the way, is if you just look at this dropdown list. Let’s say I wanted total spending on pharmaceuticals, which is the number I actually want. Guess what? It’s not in this list. I would need to join this data set with population per capita over time to do this.
前後文Link in context 連結Link
Rufus Pollock
That was just one example. We constantly need to do joined and often on the same data set, population, GDP. In fact, the data sets you normally need to join on are quite common. Not always the same, but often they’re the same data sets. Going back, I think you had a nice example of the data package that you...
前後文Link in context 連結Link
Audrey Tang
The GDP one?
前後文Link in context 連結Link
Rufus Pollock
Yeah. The GDP one was perfect.
前後文Link in context 連結Link
Audrey Tang
It was linking to the registry. The thing that it links is not the core yet because it’s not yet curated.
前後文Link in context 連結Link
Rufus Pollock
I should check that, whether it’s there or not. It’s a good...
前後文Link in context 連結Link
Audrey Tang
...it is a good example anyway.
前後文Link in context 連結Link
Rufus Pollock
It’s a good example anyway. I would check that. The other thing we have here that you can see by the way, this is the data packages. Just to look at it, it is actually in JSON. We’re also thinking that you could describe these in JSON, but also in possibly actually CSV itself.
前後文Link in context 連結Link
Rufus Pollock
My point is to the top. First of all, even if you’re using Swagger, the very top part of this before we have the resources...
前後文Link in context 連結Link
Rufus Pollock
The resources looks like a little bit how you describe your fields in Swagger.
前後文Link in context 連結Link
Audrey Tang
Yeah. It’s the JSON Schema part.
前後文Link in context 連結Link
Rufus Pollock
This part here, you could still be useful as just a convention whenever you’re shipping a file in your API. You may need to send some metadata with it. What’s the license? What is the title? I don’t know. When was it last updated? There’s some standard metadata that you might adopt, that’s just based on having gone through NPM or looking ASCII code metadata.
前後文Link in context 連結Link
Rufus Pollock
The thing that then is interesting is the description. This is the resources, which is very simple, which is the description of the fields in the file. It’s heavily based actually, some parts are on JSON schema. It’s unfortunately not exactly identical with it. It’s difficult to exactly converge it.
前後文Link in context 連結Link
Rufus Pollock
The question then is also you can extend the metadata. For example, this is a data package for describing tabular data. Your resources can be anything.
前後文Link in context 連結Link
Rufus Pollock
In a data package, you can put inside a container. You can put anything. You can put binary data in. It wouldn’t be able to be described so usefully. The resource here wouldn’t have any schema. It would just say name. It would just have those exactly, is the JSON one, exactly.
前後文Link in context 連結Link
Rufus Pollock
This is actually one with also a remote reference, where it says, "Hey. My resource lives somewhere else." You can package up stuff that lives somewhere else.
前後文Link in context 連結Link
Audrey Tang
Which is our usual way of doing things.
前後文Link in context 連結Link
Rufus Pollock
Right. You can also extend it. Finally, for example, I don’t know if we pull up fiscal data package...
前後文Link in context 連結Link
( Website: http://specs.frictionlessdata.io/fiscal-data-package/ )
前後文Link in context 連結Link
Audrey Tang
Sure.
前後文Link in context 連結Link
Rufus Pollock
Looking at this diagram, you can see on the tabular data package, which builds on top of data package JSON Table Schema and CSV. Then you can extend it further by saying, "I must have these fields." You might have restaurant inspections. This is sort of what you’re doing with your Swagger APIs probably. You’re going to have a list of schemas.
前後文Link in context 連結Link
Audrey Tang
This is the DITA idea back in the XML era.
前後文Link in context 連結Link
Rufus Pollock
Exactly.
前後文Link in context 連結Link
Audrey Tang
The Darwin Information Typing Architecture.
前後文Link in context 連結Link
Rufus Pollock
Exactly.
前後文Link in context 連結Link
Rufus Pollock
There’s a lot of people who have talked about this in the last few years, or last decade. Obviously even I talked in the US quite a bit. Where standardizing really simple specifications for like, "Hey. You’re going to put out restaurant inspection data, like ’Was the restaurant clean or not?’ What would be the columns you’d have?"
前後文Link in context 連結Link
Rufus Pollock
The thing I think for that to work is an obvious idea. It needs to be super simple, and it needs feedback. It’s so easy to write a spec. It’s really easy to say, "You should have column this, this, and this." It’s a totally different thing for people to actually do that. [laughs] The question is, for example, you need good validation.
前後文Link in context 連結Link
Rufus Pollock
For example, there’s this site called Good Tables that we’ve put up. Obviously they’re schema validated. Here we go. It’s the first one. There are themes like JSON schema validated. All I would say is having looked at them, and even worked at our own site, is let’s say you pick an example. Click on this one here.
前後文Link in context 連結Link
( Website: http://goodtables.okfnlabs.org/ )
前後文Link in context 連結Link
Audrey Tang
Sure.
前後文Link in context 連結Link
Rufus Pollock
If you click on that and then hit validate, you really need to work on your user experience. For example, I don’t think this is still a great user experience. You really need to find a way, and this is harder than you think, to tell people who are going to produce data for you, what’s not right about it if they make errors.
前後文Link in context 連結Link
Rufus Pollock
In a certain way, when you’re doing your JSON API, it all gets easy because you’ve got tech geeks that are producing the API. If you assume people further down the chain have to...In a way that just hides your complexity. That means someone inside the department needs to get the data in a form that can be published through that API.
前後文Link in context 連結Link
Audrey Tang
Yeah. Other system depends on it. It would break...
前後文Link in context 連結Link
Rufus Pollock
It will break.
前後文Link in context 連結Link
Audrey Tang
...if it somehow invalidates.
前後文Link in context 連結Link
Rufus Pollock
Right. I got it. That requires a lot of tech. How do I put it? Your real goal ultimately has got to be that civil servants can publish quite a lot of data themselves.
前後文Link in context 連結Link
Rufus Pollock
At the moment we tend to have a model where basically the civil servants won’t do stuff, but there will be an IT team who will take that data and put it into some schema that we have defined. That has a lot of demand on your IT team.
前後文Link in context 連結Link
Audrey Tang
I don’t think that’s the case in Taiwan though.
前後文Link in context 連結Link
Rufus Pollock
My question is here. Let’s pick something really...I don’t know. Inside the department you’ve probably done some research already. What’s a data set that people are producing by hand, where they’re filling out the Excel spreadsheet by hand?
前後文Link in context 連結Link
Audrey Tang
Meeting logs. That’s the primary example.
前後文Link in context 連結Link
Rufus Pollock
Great. Do they do that in Excel or do they have now a web form?
前後文Link in context 連結Link
Audrey Tang
Meeting logs are done in free text, mostly through emails and as Word documents in official document systems. Mostly printed in paper form and re‑scanned as JPEG files, if I’m not mistaken, or at best, as attachment to emails.
前後文Link in context 連結Link
Rufus Pollock
Perfect. Do you want that to become structured data like a spreadsheet?
前後文Link in context 連結Link
Audrey Tang
Yes.
前後文Link in context 連結Link
Rufus Pollock
In each of the departments you’re looking a little bit... What do you think it’s going to happen?
前後文Link in context 連結Link
Shuyang Lin
I never thought about that. That’s a good question.
前後文Link in context 連結Link
Audrey Tang
What we’re trying to do here is to... I was explaining the Taiwan situation a bit. I understand that there is usually an IT department in other countries, like in each siloed unit.
前後文Link in context 連結Link
Rufus Pollock
There might be an IT department. There might be a common one, as well.
前後文Link in context 連結Link
Audrey Tang
Sure, of course. They would be in a larger IT office or something. In some cases they report directly to an upper level IT unit, sometimes they report to the organization, the command structure doesn’t matter here.
前後文Link in context 連結Link
Audrey Tang
What matters is that whenever we want to do any data exchange in a normal way, there has to be either a hub in the meta-level, or a direction connection between these IT guys. Then the public servants are blissfully ignorant of it. That’s your picture.
前後文Link in context 連結Link
Audrey Tang
In Taiwan it doesn’t work that way. [laughs] In Taiwan the “IT” people is just a part of the picture. Let’s put it lowercase, like “it”.
前後文Link in context 連結Link
(laughter)
前後文Link in context 連結Link
Audrey Tang
Most of the time, it’s the private sector vendors...
前後文Link in context 連結Link
Rufus Pollock
Who do all the stuff?
前後文Link in context 連結Link
Audrey Tang
...who do all the stuff in building systems. Correct me if I’m wrong, but all of the cases that I’ve seen, it’s an externally procured provider that’s renewed annually, that provides the user-visible experiences. Maybe the infrastructure, the actual machines are maintained by the IT department, but the IT are more like MIS that way than actual information technology.
前後文Link in context 連結Link
Rufus Pollock
That’s true in many other governments too. They’ve communicated they’ve outsourced a lot of stuff.
前後文Link in context 連結Link
Audrey Tang
We outsource maybe most of our stuff.
前後文Link in context 連結Link
Rufus Pollock
Let’s go to the question of structured logs. Let’s call him or her who has to report the meeting logs? What’s her name or his name? Should we give them a name, just for the moment, for doing user research?
前後文Link in context 連結Link
Audrey Tang
Alice.
前後文Link in context 連結Link
Rufus Pollock
Alice here has to report the meeting log. You know what it’s like having a structured data. How is that going to work? Which unit are you going to create? Are you going to create a web form where she has to type it in?
前後文Link in context 連結Link
Audrey Tang
No. In PDIS usually just send her an Etherpad link.
前後文Link in context 連結Link
Rufus Pollock
You send her an EtherCalc link?
前後文Link in context 連結Link
Audrey Tang
An Etherpad link.
前後文Link in context 連結Link
Rufus Pollock
An Etherpad link?
前後文Link in context 連結Link
Audrey Tang
Yeah.
前後文Link in context 連結Link
Rufus Pollock
How do you get the structured data out of that, like the time our meeting started?
前後文Link in context 連結Link
Audrey Tang
We write a parser.
前後文Link in context 連結Link
( Website: https://github.com/audreyt/pad2an )
前後文Link in context 連結Link
Rufus Pollock
You give her a structure there in the Etherpad that they must follow?
前後文Link in context 連結Link
Audrey Tang
We just gave them a template basically.
前後文Link in context 連結Link
Rufus Pollock
You give them a template.
前後文Link in context 連結Link
Audrey Tang
The Etherpad always starts with the date of the meeting, the title of the meeting, and then speeches.
前後文Link in context 連結Link
Audrey Tang
It feels very automated; just start typing whatever. We put on a line of description that said, "Try to put a colon between the speaker and the words they speak." That’s it.
前後文Link in context 連結Link
Rufus Pollock
My point though is that as you do this, you’ve created a little format, which is your Etherpad format for reporting speeches. You will need it to be machine-validated.
前後文Link in context 連結Link
Audrey Tang
Yeah.
前後文Link in context 連結Link
Rufus Pollock
Sometimes people won’t follow the format. For example, one common thing I’ll do is type dates that you can’t pass as dates. Going back to that Good Table, one of the general challenges you have is of giving people good format.
前後文Link in context 連結Link
Rufus Pollock
What’s wrong here is there was a header column that was empty. Here the row dimensions are incorrect compared to the thing. For example, here there’s no value for amount of description.
前後文Link in context 連結Link
Audrey Tang
When they hit publish, what we need to do is to have this very clear, instead of random XML error from SayIt.
前後文Link in context 連結Link
Audrey Tang
Yes, I do agree, it’s an experience we need to work on.
前後文Link in context 連結Link
Rufus Pollock
I’m just trying to find...
前後文Link in context 連結Link
Audrey Tang
Currently, it’s either completely blocked through some cryptic error messages, or it goes through.
前後文Link in context 連結Link
Rufus Pollock
Let me see if this works. Thank you very much. I probably actually won’t have to, but it’s very interesting see how I do on the...Is this going to work here?
前後文Link in context 連結Link
Audrey Tang
Just to illustrate, basically, this is from the CastingWords folks. CastingWords is an external vendor. They have their own format. They can produce a VTT or an SRT, which would then timecode it with the YouTube.
前後文Link in context 連結Link
Audrey Tang
We handle that, too. Oftentimes, we pay for the cheaper option when we don’t care about the timecodes, which is most of the time, where they just paragraph delimit the speakers, and so on. Then we write a parser that translate it into a command console, and then feed it to SayIt. That’s our current platform.
前後文Link in context 連結Link
( Website: https://sayit.archive.tw/speeches )
前後文Link in context 連結Link
Rufus Pollock
All I’m going to say is, some things, for example, this is the UK dashboard we built for these. We’re also trying to build these dashboards at the moment. This is an example where this is spending data. Each government department’s supposed to publish spending data every month on what they spent money on.
前後文Link in context 連結Link
Rufus Pollock
Not just their budget, but individual line items, like we bought this mobile subscription, or whatever. I was involved — when I was on the government transparent board — in designing the format. Guess what? It’s written in CSV. It’s super simple. It’s in Excel or CSV.
前後文Link in context 連結Link
Rufus Pollock
It has basically eight columns. That’s all it has. Maybe six columns are compulsory. Now, I want you to guess. They’ve actually published now about 3,000 CSVs, because they publish roughly one every month. How many of them are valid?
前後文Link in context 連結Link
Shuyang Lin
How many?
前後文Link in context 連結Link
Rufus Pollock
How many out of 1,234 are valid?
前後文Link in context 連結Link
Shuyang Lin
Valid, you mean?
前後文Link in context 連結Link
Rufus Pollock
They actually have the six columns they’re supposed to, and the structure of those columns, if it’s supposed to be a number, it’s a number. If it’s a supposed to be a string...
前後文Link in context 連結Link
Shuyang Lin
Is there a person who really fill out the form?
前後文Link in context 連結Link
Rufus Pollock
They did it according to the structure.
前後文Link in context 連結Link
Shuyang Lin
Is it half, 50 percent?
前後文Link in context 連結Link
Rufus Pollock
Half, 50 percent. What do you guess?
前後文Link in context 連結Link
(pause)
前後文Link in context 連結Link
Rufus Pollock
80 percent, 20 percent, 10 percent, 100 percent?
前後文Link in context 連結Link
(discussions)
前後文Link in context 連結Link
Rufus Pollock
30 percent. OK, 30 percent.
前後文Link in context 連結Link
Rufus Pollock
I hear 20 percent.
前後文Link in context 連結Link
Audrey Tang
Five percent.
前後文Link in context 連結Link
Rufus Pollock
Five percent. The answer is zero. They didn’t manage to publish a single valid...
前後文Link in context 連結Link
Audrey Tang
I’m too optimistic. That’s great to know.
前後文Link in context 連結Link
(laughter)
前後文Link in context 連結Link
Shuyang Lin
For this one.
前後文Link in context 連結Link
Rufus Pollock
In addition, you need to track if they keep publishing. Many of them just stop publishing the files. Since the government’s got less interested in transparency, you can just tell here, and we haven’t been running this quite as recently. For example, many of these seem to have not published for quite a long time.
前後文Link in context 連結Link
Rufus Pollock
For example, these guys, this is one department, which is the Ministry of Defense has published since December 2016. The others are often quite a long time ago, like April 2016, etc. The file, one of the things that we experience, that we went and did user research with the people...
前後文Link in context 連結Link
Rufus Pollock
Because the civil servants, they’re not trying to be bad. They’re not like they don’t like this. Actually, many of the departments love this. In fact, their heads of department love this data, because they find it really useful for tracking...
前後文Link in context 連結Link
Audrey Tang
It’s like their GPS for quality.
前後文Link in context 連結Link
Rufus Pollock
They don’t just like this. They like the data. They like having the data. They’re not against it, but they struggle to publish, and do the Good Tables. What they found is just they don’t understand why changing the column names matters.
前後文Link in context 連結Link
Rufus Pollock
For example, they often just change the name of the column in the spreadsheet in random ways. They also do things. They often publish dates that aren’t valid, or change the format of the date. Sometimes they save from Excel when it’s switched to US mode, and then they put the month first, rather than the day first, all of this kind of stuff.
前後文Link in context 連結Link
Rufus Pollock
What it means, by the way, is if you want to actually do a dashboard of government spending, which you’d quite like to build out of this, it’s very painful. It’s very time‑consuming to clean this data up, and actually use it.
前後文Link in context 連結Link
Rufus Pollock
Why I’m trying to get at that is that it’s not just having the schema. It’s building tooling that gives feedback to users on are they valid with the format?
前後文Link in context 連結Link
Rufus Pollock
I don’t know in your Etherpad parser, have you done any experience of not having invalid Etherpad documents? What happens when they get it wrong?
前後文Link in context 連結Link
Audrey Tang
Then you see a broken XML message, a random message that says it’s invalid XML, basically, which is not helpful for the person. You have to click “back,” and then try to figure out where it’s wrong.
前後文Link in context 連結Link
Rufus Pollock
Try to figure out what’s gone wrong.
前後文Link in context 連結Link
Audrey Tang
It’s right. I wrote the conversion script myself, and I didn’t do that well on the error reporting thing.
前後文Link in context 連結Link
Rufus Pollock
It’s hard. I tell you, it’s like good UX for Apple iPhones. It’s harder than you think.
前後文Link in context 連結Link
Shuyang Lin
Every time I found a problem, I go to her.
前後文Link in context 連結Link
Audrey Tang
Well, I know exactly how hard it is, because I did write a parser and error reporter for the Perl 6 language.
前後文Link in context 連結Link
Audrey Tang
Reporting on the exact position when it goes wrong the user a helpful message is really difficult. I spent maybe a year with Larry Wall on that. [laughs]
前後文Link in context 連結Link
Audrey Tang
I know how hard it is. It’s just in the case of meeting logs, I was like, "No, it’s not worth spending that much time on it."
前後文Link in context 連結Link
Rufus Pollock
I got you. It’s stuff like also we’re geeks. For us, line column numbers and rows are really valuable. It turns out, in our experience, when we try to do it that way — I don’t know if you have just the good tables, if you just put it up again just for a moment here — it’s not perfect.
前後文Link in context 連結Link
Rufus Pollock
Originally, we could just say something like, "Header column number one is empty." This is a geek message. This was written by the...
前後文Link in context 連結Link
Audrey Tang
Yeah, it’s like, "I don’t know what to do." This is like my XML messages. It’s like saying the starter tag is empty.
前後文Link in context 連結Link
Rufus Pollock
Most ordinary users don’t even know what column number one is. You want to draw it like they visualize it. That’s why we try to start drawing out, but it’s harder than you think. [laughs] For example, how do you explain date formats?
前後文Link in context 連結Link
Rufus Pollock
It turns out that just explaining good date formats is harder than you think. It’s not just explaining the date format. For example, many of them go, "Oh, OK, but how do I go to Excel and change the date format?" They don’t know how to change date formats in Excel.
前後文Link in context 連結Link
Rufus Pollock
There’s also weird stuff, like Excel may show you one date format, but when it exports, it exports in a different date format. It’s things like that that turn out to be hard to do...
前後文Link in context 連結Link
Audrey Tang
Which is always the case here, by the way, because we start counting years from 1911 in the government.
前後文Link in context 連結Link
Rufus Pollock
From the revolution.
前後文Link in context 連結Link
Audrey Tang
From the revolution. It means that Excel is never storing our date formats natively.
前後文Link in context 連結Link
Rufus Pollock
Right. The thing is, all I’m saying is, this tooling, that’s the really valuable part of format. I can make up format, whatever, building this kind of tooling, so this tooling is built for data JSON table schema.
前後文Link in context 連結Link
Rufus Pollock
This is an error reporting system that’s built for JSON table schema. You can do, if you want, cool things like once you have this, and this is a Python library, and there’s also a JavaScript library. Now, we’re implementing libraries in every language.
前後文Link in context 連結Link
Rufus Pollock
You could do things like install it in Travis. Every time you push a file to GitHub, it will check whether your data is valid, and fail or succeed on your pull request.
前後文Link in context 連結Link
Audrey Tang
The PDIS.tw team here wrote that for our meeting tracking systems.
前後文Link in context 連結Link
Rufus Pollock
You know that. You can do that kind of integration. Here, the other thing I was going to mention, actually, is fiscal data package. There’s a whole specification that goes beyond just having a tabular data package. I think you search for fiscal data package.
前後文Link in context 連結Link
( Website: http://specs.frictionlessdata.io/fiscal-data-package/ )
前後文Link in context 連結Link
Rufus Pollock
There you go, the fiscal data package. This, what it does is it says, "It’s a data package. It’s a tabular data package." It’s got to have tables in it, CSV. Then it says what columns you’ve got to have. It has a model for that.
前後文Link in context 連結Link
Rufus Pollock
This, it basically says, "Be in CSV." Basically, it says, "It must be a tabular data package." Then it says what extra columns or stuff you must have.
前後文Link in context 連結Link
Audrey Tang
Like the Darwin information typing architecture.
前後文Link in context 連結Link
Rufus Pollock
Yes, that’s how you described it.
前後文Link in context 連結Link
Rufus Pollock
Really, by the way, the next level that you probably want if you’re going up, is you’ve got a basic container, like a steel box. Then you say, "My steel box is designed for pallets."
前後文Link in context 連結Link
Rufus Pollock
People know what I mean by a pallet. By the way, containers don’t all look the same, it turns out. There’s ones specialized for particular functions. Then you could say, "Mine is a steel box for pallets that hold bicycles."
前後文Link in context 連結Link
Rufus Pollock
Here, what you also have, really is, if you know what I mean by OLAP, you probably want to get into dimensional, not just saying, "Hey, I’ve got this columns," but columns go together. For example, these three columns together describe a customer. These three columns together describe a department, or so on.
前後文Link in context 連結Link
Rufus Pollock
The question is, then, on the tools, what we want to remember at the very beginning is I should have one click to install, just like you can type npm install, and the library installs. Now, geeks do that, but whether you’re in R, or whether you’re in Excel, or whatever in, you could install data packages or data into your system with one click.
前後文Link in context 連結Link
Rufus Pollock
The next step has to be, in all of this, it’s just like, "Here is all the tooling you can do. What platform integration can you build?" Going back just a point about see JSON, JSON’s awesome, you can all kinds of stuff with your JSON.
前後文Link in context 連結Link
Rufus Pollock
Just take an example. One of the standard things people do with data is put it in a relational database, or put it in spreadsheets. They dominate almost any other system — even NoSQL — by orders of magnitude today in terms of installed base.
前後文Link in context 連結Link
Rufus Pollock
Now, when you have JSON, people do all kinds of fun stuff with JSON. For example, they nest entities. You take one entity, and then inside of it, you embed something else. For example, you might have users, and then you put their posts inside them.
前後文Link in context 連結Link
Rufus Pollock
Once you start doing that, you have all kinds of fun normalization games. It’s not trivial to take JSON back into a relational database.
前後文Link in context 連結Link
Audrey Tang
Hmm, you can just create a field with type JSON. Problem solved.
前後文Link in context 連結Link
(laughter)
前後文Link in context 連結Link
Rufus Pollock
Yeah, but in that case, you’re basically just doing NoSQL in a relational database. You can do it.
前後文Link in context 連結Link
Rufus Pollock
All I’m pointing out is that, for example, I’ve had this debate with standardization efforts. For example, this thing called the Open Contracting Partnership.
前後文Link in context 連結Link
Rufus Pollock
Tim Davies is the guy who was quite involved in IATI. The thing is, in life, how do I put it? Many formats get dominated by publisher concerns, not by consumer concerns.
前後文Link in context 連結Link
Rufus Pollock
Because the people who write the spec are often involved in producing the data, they tend to focus on their concerns as publishers to express what they want to express. Actually, the success of many of your formats is going to depend on consumption.
前後文Link in context 連結Link
Rufus Pollock
From a point of view of publishers, HTML is pretty rubbish. HTML has no semantic structure, basically, until HTML5.
前後文Link in context 連結Link
Audrey Tang
I already wrote that on the whiteboard for you. [laughs]
前後文Link in context 連結Link
Rufus Pollock
Even HTML4 is better than XHTML.
前後文Link in context 連結Link
Audrey Tang
Agreed.
前後文Link in context 連結Link
Rufus Pollock
It was worse as a structured publishing format, but it was easier to consume. It won. One of my comments is that JSON is great, and you can convert to it, but for example, it’s great for APIs for geeks.
前後文Link in context 連結Link
Rufus Pollock
You can express however you want it. We could even have GraphQL etc. In some sense, those are optimized for a certain kind of consumption. GraphQL is a JSON API optimized for consumption of mobile or other devices, where bandwidth matters, or complexity of queries matter.
前後文Link in context 連結Link
Rufus Pollock
For your exchange with a lot of your data, once you’ve got a JSON field, for a lot of the tools, you want to get data back into, it’s not especially convenient, unless you’re building for web application, unless your web APIs are optimized for consumption by web applications.
前後文Link in context 連結Link
Audrey Tang
Which is the case we’re doing now.
前後文Link in context 連結Link
Rufus Pollock
Right, but you’ve got to ask yourself, is most of your data designed for consumption by web applications, or is it for consumption back into a database, or back into something else? I’m only pointing this out. I’m not saying you can convert CSV to JSON incredibly easily, like it’s trivial.
前後文Link in context 連結Link
Rufus Pollock
If you have foreign keys, you can even do complex joins. You can convert JSON back into a normalized structure with tables with no nesting.
前後文Link in context 連結Link
Audrey Tang
The only things you can do, is either exploding it to individual tables, or declare bankruptcy, and say it’s a JSON field.
前後文Link in context 連結Link
Rufus Pollock
Right, or it’s a JSON field, but is it a good example? Who here knows Redux?
前後文Link in context 連結Link
Audrey Tang
R‑E‑D‑U‑X?
前後文Link in context 連結Link
Rufus Pollock
Like React?
前後文Link in context 連結Link
Audrey Tang
Sure, of course.
前後文Link in context 連結Link
Rufus Pollock
Like front end. Do people use React and Redux?
前後文Link in context 連結Link
Audrey Tang
We use Angular 2, React, Vue, whatever.
前後文Link in context 連結Link
Rufus Pollock
One of the funny things about Redux is that if you notice the design of a Redux store, your Redux store is like a local cache of your remote database. One of the things you do in Redux is they encourage you to normalize your data.
前後文Link in context 連結Link
Rufus Pollock
To un‑normalize it, to un‑nest it. If you read the Redux docs, they’ll talk quite a bit about why you need to renormalize your data.
前後文Link in context 連結Link
Audrey Tang
Because it doesn’t want nesting.
前後文Link in context 連結Link
Rufus Pollock
It doesn’t want nesting. Why? It’s because it’s hard to do atomic updates of individual items. it becomes a very complex thing to track. Let’s say, for example, you retrieve a post from the server. If that’s nested under the user, then you have to go to...
前後文Link in context 連結Link
Audrey Tang
It’s expensive, but we can solve it with JSON operational transformation.
前後文Link in context 連結Link
Audrey Tang
That’s a lot of technical details that we don’t want to get into, though.
前後文Link in context 連結Link
Audrey Tang
We know nesting is more trouble than flat. That’s a consensus.
前後文Link in context 連結Link
Rufus Pollock
There’s the thing. I’ve got you. OData, by the way, is like, OData and data packages have quite a lot in common. OData is quite similar. I want to stop there on what I’m saying about data packages. What I would offer is terms of value whatever route you’re going down.
前後文Link in context 連結Link
Rufus Pollock
One is common descriptive metadata. You could adopt some of the similar descriptive metadata. The other is this question on publisher/user experience. For example, you might want to have a JSON API. At the end, that’s what you’re defining in Swagger.
前後文Link in context 連結Link
Rufus Pollock
As an experience, maybe you want to have a data package intermediary there. They’re going to publish from maybe Excel.
前後文Link in context 連結Link
Audrey Tang
Yes. We already agreed on that in the previous meeting. [laughs]
前後文Link in context 連結Link
Rufus Pollock
We did, exactly. I’m just saying, you might want to publish to that, and then convert to JSON for your JSON API.
前後文Link in context 連結Link
Audrey Tang
We also agreed to that.
前後文Link in context 連結Link
Rufus Pollock
Those are things that are valuable. We can stop on that one, then, for the moment. I’d like if you could tell me a little bit about this...
前後文Link in context 連結Link
Audrey Tang
No, it’s just fine. To recap, to augment our realities a little bit, there’s a note that we made some time ago. I think it’s on your Twitter. It’s also on my Twitter, so maybe we’ll bring it up on Twitter.
前後文Link in context 連結Link
( Website: https://twitter.com/audreyt/status/829201093368356864 )
前後文Link in context 連結Link
Audrey Tang
As the road map of what we’re doing this afternoon, it is a great example. This is the drawing that we had.
前後文Link in context 連結Link
Audrey Tang
What we essentially said was that there’s an existing system, which for one reason or another, doesn’t have a vendor that responds as quick as we would like to feature amendments, which is the E‑M‑I‑C system, just to put names to it. It’s the EMIC System.
前後文Link in context 連結Link
Audrey Tang
The EMIC System is where people go when we see typhoons, or any kind of natural disasters. The EMIC System has a back end for all levels of governmental units to register whatever information they have on, for example, a road is broken, or for example, a tree has fallen.
前後文Link in context 連結Link
Audrey Tang
Usually, their user experience is that of instant messages, which means that it’s unstructured data. Well, at least it’s plaintext, not handwriting.
前後文Link in context 連結Link
Audrey Tang
Maybe not as unstructured as it can get, but still, for textual data, it’s just a line of description.
前後文Link in context 連結Link
Audrey Tang
People then look at that line, and decide what to do with it. Most of the time, they just shove it into this time‑stamped append‑only log of disaster data exchange, and then forward to people who want to subscribe to this kind of alert information.
前後文Link in context 連結Link
Audrey Tang
Then for those subscribers, they maybe add a tag, or maybe even without a tag, meaning they just want to subscribe all the information from the Taipei City, for example.
前後文Link in context 連結Link
Audrey Tang
Then they get a relay of all those aggregated information out of it.
前後文Link in context 連結Link
Audrey Tang
What we would like to do, is if it describes a particular place, and we know the place already, we would like to convert it to some kind of geotagging. We’re geotagging it.
前後文Link in context 連結Link
Rufus Pollock
You enrich it, basically. There’s an enrichment.
前後文Link in context 連結Link
Audrey Tang
There’s an enrichment, exactly. Then, of course, the reported time may not be the same as the time that it actually happened. You also want that part of the ETL, which is the time part, not just the space part.
前後文Link in context 連結Link
Audrey Tang
The end result, what we want, of course, is a layer on top of a public-visible map, like OpenStreetMap, with anything that let people query what the disaster is like around your neighborhood.
前後文Link in context 連結Link
Audrey Tang
Of course, the EMIC System is already under a lot of stress performance‑wise from the existing governmental users, and the existing aggregation, and the queries.
前後文Link in context 連結Link
Audrey Tang
The existing hardware would not support any more ETL. What we said last meeting, even though we didn’t say EMIC specifically, is that because it’s all in this huge relational database, we need somehow to get it to export in bulk regularly, with a clean data description language that we can amend as the enrichment goes on.
前後文Link in context 連結Link
Audrey Tang
For example, out of every message, we currently only extract time and text. At some point we would like to extend a TopoJSON-compatible description, like when it talks about road we want the road there, which is a line or a path and so on. We don’t have the complete listing yet.
前後文Link in context 連結Link
Audrey Tang
We want the DDL to be able to grow, and when we grow it we want the API to stay clean, compatible with existing consumers but also versioned in some way.
前後文Link in context 連結Link
Audrey Tang
There’s already users in other government agency who broadcasts push notifications, and we don’t want those services to stop. It would be silly to rebuild that. We want this to continue, but against a clean API from now on which is where an data schema comes in. It’s already communicating over HTTP anyway, so why not structure it?
前後文Link in context 連結Link
Audrey Tang
What we talk about is we take the DDL here and say, "Every hour get me a data package of everything that happens around EMIC, and then publish it as open data packages," and then send it to a second system which is called the NCDR system. The NCDR system currently is for disaster prediction and reduction. The R stands for reduction I think.
前後文Link in context 連結Link
Audrey Tang
While NCDR doesn’t have the official mandate to publish alerts around disasters, their develops are in‑house, and they’ve got some good ArcGIS developers there. They can very easily do visualizations that not only look beautiful, but it’s also very useful.
前後文Link in context 連結Link
Audrey Tang
What they don’t have at the moment is first‑class access to the aggregated database of EMIC.
前後文Link in context 連結Link
Audrey Tang
Data packages obviously is a way for people who don’t have a relationship, let’s say, with each other, to nevertheless fully consume the other person’s data with the confidence that it will not break. What we are saying is that on the national open‑data platform, we would now house a schema for the data package.
前後文Link in context 連結Link
Audrey Tang
The EMIC will start producing structured data, and then the existing National Development Council’s open data validation team...I don’t know whether you’ve talked to these people.
前後文Link in context 連結Link
Rufus Pollock
No. I haven’t.
前後文Link in context 連結Link
Audrey Tang
They already have a data quality program this year. They will use machines to ensure that the EMIC keeps honoring its open data promise.
前後文Link in context 連結Link
Audrey Tang
Note that it’s not checking some API description. It’s just plain open data.
前後文Link in context 連結Link
Audrey Tang
Now we have NCDR consuming this data, and during disaster we want the frequency to pick up.
前後文Link in context 連結Link
Audrey Tang
We also want to do enrichments at the NCDR, because there are now developers in‑house. We can tell them to get all sort of push notifications, as long as it’s not called government official alerts which is outside of their purview. They can do a lot of even more value adding with for example the weather data, Water Dam control data and so on.
前後文Link in context 連結Link
Audrey Tang
Also they already partner with Google and other vendors, with standard CAP protocols. Then they can publish not with their own API, but with the existing web‑based APIs for disaster recovery to the consumers of Facebook Safety Check.
前後文Link in context 連結Link
Rufus Pollock
Yeah.
前後文Link in context 連結Link
Audrey Tang
And Google, and whatever. This is standardized. We don’t need to do anything here. It can also publish its own enriched data as a kind of derived data set, as also open data on the national data platform. People who want data don’t really need to look at the raw EMIC data.
前後文Link in context 連結Link
Audrey Tang
They can look at the enriched data that NCDR will already have, not only the raw fields but also the enriched data fields. That’s what we planned essentially with us meeting, without me telling you the used case. That’s essentially the idea.
前後文Link in context 連結Link
Rufus Pollock
That’s really great. One of the things I think about is that certainly when you’re having large ingestion of raw data, obviously one of the other things you can do is start... often, at the beginning, you could just be dumping...
前後文Link in context 連結Link
Rufus Pollock
Rather than put it into a structured database, one of the things I like at the moment is you could just use flat files. If you have a DDL and you’re dumping data packages, if you’re getting a lot of raw log data in, you could actually run a lot of your enrichment pipeline.
前後文Link in context 連結Link
Rufus Pollock
For example, we have this thing at the moment we’re calling — that’s about to be a post, I think — data package pipelines. Basically, ETL pipelines run around packaged input of each stage of the data package. They output the next stage of the data package.
前後文Link in context 連結Link
Rufus Pollock
If you do quite a bit of ETL, one of your pain points is normally, as they get complicated, "What is the agreement between different stages of the pipeline?"
前後文Link in context 連結Link
Rufus Pollock
We’re using this at the moment. This is very crude in the sense it’s a mini‑ETL language for a pipeline that’s based around tabular data, particularly.
前後文Link in context 連結Link
( Website : https://github.com/frictionlessdata/datapackage-pipelines )
前後文Link in context 連結Link
Audrey Tang
It’s kind of naive.
前後文Link in context 連結Link
Rufus Pollock
Yes, a very naive one, but even if you’re doing a complex code one, you could still have your contract between different parts of the pipeline be written around it. I’m doing one at the moment where I’m looking at cyber security incident data, and more also infection data, like your machine is not yet used for anything, but it’s vulnerable. We have billions of rows come in.
前後文Link in context 連結Link
Rufus Pollock
You can have a similar enrichment. For example, you just have an IP address. You need to add the geo location. You need to add other information.
前後文Link in context 連結Link
Rufus Pollock
This kind of aspect of, "How do you build that pipeline?" comes up a lot. I think it’s very useful.
前後文Link in context 連結Link
Audrey Tang
The "run" here is arbitrary dot‑separated identifiers that mean something to the tools. It doesn’t have a typology?
前後文Link in context 連結Link
Rufus Pollock
This is the name of an ETL stage. You could imagine this getting more enriched, where you even have a little library of the things that you can do. At the moment, it’s very basic.
前後文Link in context 連結Link
Audrey Tang
So it’s designed like a Docker Compose file?
前後文Link in context 連結Link
Rufus Pollock
Yeah, and we do run this in Docker, exactly. What you just say is, "OK, yeah, exactly." You can add things, format, and you can attach things.
前後文Link in context 連結Link
Rufus Pollock
It’s a really simple thing. It’s just a very basic convention of data packages, but it’s the tooling around it.
前後文Link in context 連結Link
Audrey Tang
So all the drawings here that we do can be described by a less naive, but still useful — usefully inspectionable — variant of data package pipeline.
前後文Link in context 連結Link
Rufus Pollock
Yeah. Just to be clear, you don’t have to do data packages with CSV. To really emphasizes it, you can have data packages with JSON in them.
前後文Link in context 連結Link
Rufus Pollock
Data packages, at the most naive level, are just like a steel container. You can put what you like inside them. You can have JSON data and a JSON Schema.
前後文Link in context 連結Link
Rufus Pollock
For tabular data, in my experience, the simplest thing is CSV and using JSON for table schema. Why? Size constraints, you could still CSV on gigabytes of CSV. You can stream it. You can stream JSON, but it’s a bit pain...the pause is much more complicated to stream.
前後文Link in context 連結Link
Rufus Pollock
You can stream CSV, gigabytes of it, over the web. You can stream it. Every language has well‑built, iterative structures for reading gigabytes of CSV and not breaking, like not having to load it all into memory ‑‑ just stuff like that, which, when you’re building ETL pipelines, really matters.
前後文Link in context 連結Link
Rufus Pollock
Should we talk about the open fund?
前後文Link in context 連結Link
Audrey Tang
Sure, of course.
前後文Link in context 連結Link
Audrey Tang
Thank you for the consultation, wizard.
前後文Link in context 連結Link
(laughter)
前後文Link in context 連結Link
Audrey Tang
Really, this is great. Before this case, we’ve never really dealt with true cross‑ministry regular pipelines.
前後文Link in context 連結Link
Audrey Tang
We dealt with different units in the National Development Council, and that’s working pretty well with the regulation preview and also visualization of the national budget.
前後文Link in context 連結Link
Audrey Tang
These are managed by different units within the NDC, so it’s easy, and they have a common IT department.
前後文Link in context 連結Link
Audrey Tang
This afternoon is the first case where we’re going in and say to two different ministries, "Hey, you’re going to do things the new way." I’m pretty excited. If this can be made to work and well‑documented, we can do this with more ministries.
前後文Link in context 連結Link
Rufus Pollock
That would be great. The one other question that I wanted to talk about was open software, actually more than the open data side. That also comes to the open fund.
前後文Link in context 連結Link
Rufus Pollock
I could start with a question, which is, does anyone know the list of vendors, the IT, the national government or city governments, for last year how much they spent?
前後文Link in context 連結Link
Audrey Tang
Of course. It’s our basic procurement data.
前後文Link in context 連結Link
Rufus Pollock
Great.
前後文Link in context 連結Link
Audrey Tang
This shouldn’t take long to find; it’s between $2-3 billion TWD per year.
前後文Link in context 連結Link
Rufus Pollock
The question is, has anyone then checked how much of that spending went to on open software to pay for or buy software or services? It could be actually running a system that used open software versus closed software.
前後文Link in context 連結Link
Audrey Tang
It’s got to be less that 0.1 percent, so I don’t think anyone...
前後文Link in context 連結Link
Rufus Pollock
No one bothers looking at it, got it.
前後文Link in context 連結Link
Audrey Tang
Exactly. No one bothered counting.
前後文Link in context 連結Link
Rufus Pollock
It did not even register.
前後文Link in context 連結Link
Audrey Tang
Actually, when I’m look at the raw numbers now, a lot of it is in services. It’s in building an IT system or maintaining an IT system, not on software licensing.
前後文Link in context 連結Link
Rufus Pollock
I got that, but the question would be how much of the software inside of that service was open software versus closed software, how much that would be. Often, governments don’t know. That would be my question.
前後文Link in context 連結Link
Audrey Tang
We do have a breakdown.
前後文Link in context 連結Link
Audrey Tang
It’s mostly hiring services, it’s building, it’s maintaining, and very little on licensing actually.
前後文Link in context 連結Link
Rufus Pollock
The classic example I also give is — I won’t bring up the diagram — let’s say this a piece of medicine. This is a pill that you take. How much of the cost of that is the information and how much of that is the cost of the manufacturing, the actual chemicals?
前後文Link in context 連結Link
Rufus Pollock
Let’s say you buy a treatment. You buy $100 of pills. How much of that $100 is the manufacturing cost? How much of that is the information cost? It will vary, right?
前後文Link in context 連結Link
Audrey Tang
It will vary depending whether it’s a generic or it’s a label drug.
前後文Link in context 連結Link
Rufus Pollock
Let’s say, an in‑patent drug, it might be that a dollar of that, if we would really draw this diagram correctly, this tiny bit down here would be the cost of the actual chemist and the actual manufacturing it. $99 would be the cost of the information. You can ask the same thing when you buy services.
前後文Link in context 連結Link
Rufus Pollock
It’s true people don’t normally buy a drug recipe. They buy a drug. They buy the actual pill. Similarly, most people actually buy a working software system with servers that are running and doing stuff, but, obviously, inside of that is the sysadmins, the electricity, the energy to run all this service. The other part is the software, the know‑how.
前後文Link in context 連結Link
Rufus Pollock
The question would be to ask of how much was spent on IT last year, how much of it was crudely for buying information, and the know‑how, and the actual rest of the software, and running that, and how much of it was paying for the engineers and the electricity to run the computers and stuff like that? I’m going to ask that as one question.
前後文Link in context 連結Link
Rufus Pollock
The question would be, how much of that went on information or buying information as opposed to paying for just the electricity and all the other stuff, which you pay for whether you’re using open software or closed software? How much of that went on open software? Then you could ask, what’s your goal?
前後文Link in context 連結Link
Rufus Pollock
How much of government spending should go on open software or closed software and why? The other question I would ask you is, if you went around who is in charge of IT budgets in given departments or across government, what’s the name of the guy who runs that right now?
前後文Link in context 連結Link
Audrey Tang
There’s a fixed portion that goes to SMEs.
前後文Link in context 連結Link
Rufus Pollock
A fixed portion?
前後文Link in context 連結Link
Audrey Tang
Yeah. There’s a fixed SME portion.
前後文Link in context 連結Link
Rufus Pollock
How much is it?
前後文Link in context 連結Link
Audrey Tang
Just a second. Let me double‑check before I say anything silly.
前後文Link in context 連結Link
( For 2016, it’s 40% at a minimum. Source: http://gpw.moeasmea.gov.tw/moeasmea/wSite/ct?xItem=60540&ctNode=822&mp=00205/ )
前後文Link in context 連結Link
Rufus Pollock
There’s a fixed portion? We could ask ourselves, what is their goal?
前後文Link in context 連結Link
Rufus Pollock
For me, a goal of having open software, that isn’t my goal. My goal is value for money, a good deal for Taiwan citizens or for British citizens because we don’t care about open software or closed software for its sake unless we’re Richard Stallman maybe. I’m joking.
前後文Link in context 連結Link
Rufus Pollock
Probably he doesn’t care. He cares about freedom. Richard Stallman doesn’t care about free software. Actually, he cares about freedom. Similarly, maybe whoever runs spending here probably cares about or they at least say they care about value for money.
前後文Link in context 連結Link
Rufus Pollock
My questions is, if you went around now and asked decision makers in Taiwan’s government what role do they think open software has in getting value for money, would they even know what open software was? Would they know what was good or not good about it?
前後文Link in context 連結Link
Audrey Tang
There was a huge push around OpenOffice in 2013, which is what I was trying to find, because there was this justification paper that says how much licensing cost it saves over Microsoft Office subscriptions and so on.
前後文Link in context 連結Link
Audrey Tang
I couldn’t quite find the report, but the original justification was very close to what you have said, which is to promote a local community around the then OpenOffice, now LibreOffice community and for people to get in touch with the Chinese text processing part of it instead of waiting for Microsoft to roll next.
前後文Link in context 連結Link
Audrey Tang
The main argument was that we can save this much amount on licensing cost, which I was trying to find but couldn’t.
前後文Link in context 連結Link
( The 2013 report on OpenOffice.org: http://www.dgbas.gov.tw/public/Data/3258342471.pdf )
前後文Link in context 連結Link
Rufus Pollock
My question here is let’s say you had a goal to have more open software; we’re convinced that more open software would mean better value for money for Taiwan. It would mean better IT systems probably, that they were more agile, they were more adaptable to Taiwan’s needs, and that they were probably in the long run cheaper.
前後文Link in context 連結Link
Rufus Pollock
I actually come to the cheaper item last. The number one item is freedom to adapt to your needs faster and better.
前後文Link in context 連結Link
Audrey Tang
I’m pretty sure the IT decision makers know about this, but they don’t usually consider it practical. [laughs]
前後文Link in context 連結Link
Audrey Tang
They have heard of this argument a lot.
前後文Link in context 連結Link
Rufus Pollock
What I want to talk about now is practical. I went and talked to the OECD leaders. I was saying like, "What can you change? What can you guys, what can you change while you’re here in office?" You could build systems. The thing is you’ll, at some point, probably all leave. It’s natural. You’ll get hired by somebody else. Maybe you’re here anyway on a six‑month period.
前後文Link in context 連結Link
Rufus Pollock
What often stay around in bureaucracies is process. The rules that you’ve put in place, if you want to buy differently, you need to change the rules by how you buy. Rather than focusing on what you buy, rather than us going and having an argument directly about, "We should buy more open software," you want to talk about how you’re going to buy. I think it’s actually on my website, if you wanted to put it up.
前後文Link in context 連結Link
Rufus Pollock
Let me just see. Rufus Pollock. I think if you search for Rufus Pollock. Let’s just have a look. If you look for Rufus Pollock, Rufus OECD, I think that might work. I think the second slide or item which needs to be Twitted [indecipherable 70:09] my website. I have to say Google is constantly reducing the priority of individual personal websites. If you then click on the link, I think it’s here. You’ll have them. This slide does not show up...
前後文Link in context 連結Link
Audrey Tang
I don’t think it’s personal. It’s insecure, not HTTPS, and embedded from a HTTPS site.
前後文Link in context 連結Link
Rufus Pollock
Is that what it is?
前後文Link in context 連結Link
Audrey Tang
That is what it is. [laughs]
前後文Link in context 連結Link
Rufus Pollock
If you click on this...
前後文Link in context 連結Link
Audrey Tang
You probably want to run Let’s Encrypt on your website.
前後文Link in context 連結Link
Rufus Pollock
Yeah, I could get a cert. I could turn it on.
前後文Link in context 連結Link
Rufus Pollock
If you just scroll down quite a way, there were some examples. Keep going.
前後文Link in context 連結Link
( Website: http://rufuspollock.org/wp-content/uploads/2016/09/data-driven-government-oecd-cio-sep-2016.pdf )
前後文Link in context 連結Link
Rufus Pollock
One more thing: You can only procure what you know. One of the big problems for IT also is if you want to go and ask someone, one great question to ask people who are buying IT of any kind. Just ask them, "What’s the difference between buying software, or buying an IT service, and buying a chair?" What’s the difference?
前後文Link in context 連結Link
Audrey Tang
The stock of the common procurement doesn’t run dry when you buy software.
前後文Link in context 連結Link
Rufus Pollock
That’s one point, absolutely.
前後文Link in context 連結Link
Rufus Pollock
What’s the other thing? How much have chairs changed in the last 100 years? How different are these chairs from 15 or 20 years ago?
前後文Link in context 連結Link
Audrey Tang
Not since the Bauhaus.
前後文Link in context 連結Link
Rufus Pollock
Bauhaus.
前後文Link in context 連結Link
Audrey Tang
[laughs]
前後文Link in context 連結Link
Rufus Pollock
These chairs have not changed. The basic design of a chair has not changed in hundreds of years, but computers have changed quite quickly. In addition, you can only buy what you know.
前後文Link in context 連結Link
Rufus Pollock
I think it was a famous saying, everyone attributes stuff to Steve Jobs, like they attribute it to Einstein or whatever. Basically, you can only buy what you can imagine.
前後文Link in context 連結Link
Rufus Pollock
If you’re going to create something that people can’t imagine, you have to create it for them to know it.
前後文Link in context 連結Link
Rufus Pollock
The problem with buying software is people are trying to procure things they don’t always know that they can have. They only go with what they can already imagine.
前後文Link in context 連結Link
Rufus Pollock
Also, chairs are very standardized. You can compete on price. There are a few different attributes that you can rate. Software’s not like that.
前後文Link in context 連結Link
Rufus Pollock
In addition, when you buy a chair, you use it for 10 years. Then it gets old. Then you maybe throw it away. You buy new chairs. Let’s say these chair wear out and we want to buy new chairs.
前後文Link in context 連結Link
Rufus Pollock
There’s basically no connection between the chairs we buy next and the chairs we had before.
前後文Link in context 連結Link
Audrey Tang
Except they have the same API.
前後文Link in context 連結Link
Rufus Pollock
We’re talking about chairs here.
前後文Link in context 連結Link
(laughter)
前後文Link in context 連結Link
Rufus Pollock
Yeah. They have the same API. You can still sit on them, that does not change. That’s a very good point.
前後文Link in context 連結Link
Rufus Pollock
But the point is there’s no relationship. If these chairs are really bad and really uncomfortable, you can buy a different set of chairs. It’s not a problem.
前後文Link in context 連結Link
Rufus Pollock
With software, when you buy one thing, normally you have to lock it. Even the people who come along and say, "Don’t worry. We’ll have standard open APIs. You don’t need to worry what’s behind the scenes anymore," that’s just rubbish.
前後文Link in context 連結Link
Rufus Pollock
Don’t believe them. Don’t even believe them if they’re Audrey Tang. It’s not true. The truth is...
前後文Link in context 連結Link
Audrey Tang
...they’re still bad chairs. [laughs]
前後文Link in context 連結Link
Rufus Pollock
...software is way more complicated than its API.
前後文Link in context 連結Link
Rufus Pollock
APIs are entirely surface in the interaction between different things.
前後文Link in context 連結Link
Rufus Pollock
Just for example, go back to a story. Late 1980s, IBM is still one of the richest companies in the world. IBM say, "Well, look. We know what the problem is. People are locked in to the DOS APIs. All we need to do is clone the DOS APIs. It’s not that hard. And you’re going to be completely competitive with them."
前後文Link in context 連結Link
Rufus Pollock
IBM spent like four or five years and a hundreds of millions, if not billions, of dollars trying to clone the Microsoft APIs to be feature compatible, so that people can run their applications on IBM’s system just like they can run it on Windows or DOS. They don’t succeed.
前後文Link in context 連結Link
Audrey Tang
That was also because the standard was set by a single vendor. It’s a very old debate.
前後文Link in context 連結Link
Rufus Pollock
It’s not just that. Look at AWS. Look at it. There’s AWS for cloud. Cloud APIs aren’t that hard to clone. Yes, they are. Look how fast AWS is producing new services. Now there’s Lambda and there’s this. Every week there’s a new service from them.
前後文Link in context 連結Link
Rufus Pollock
People always start out and say, "Oh, API’s so simple." Also what happens when you need to add stuff to the API? Actually the software beneath determines how quickly and how performantly you can add new features to your API.
前後文Link in context 連結Link
Rufus Pollock
Often, by the way, web APIs still don’t cut it. Often you actually do need to integrate at the machine level. Web APIs all still have significant latency and performance issues. Serializing JSON in and out of APIs is not hugely performed. You often end up having to go back to the system.
前後文Link in context 連結Link
Rufus Pollock
Believe me, I know it. Also, when people actually want to switch vendors...
前後文Link in context 連結Link
Rufus Pollock
For example, I went on a contract, bid on a contract years ago. The World Bank and their enthusiasm at the beginning of the open data boom...
前後文Link in context 連結Link
Rufus Pollock
The finance section at the World Bank. Not the World Bank main part that bought from Socrata, which was a proprietary vendor of data platforms. They did an open bid. It wasn’t really an open bid, but they said it was an open bid.
前後文Link in context 連結Link
Rufus Pollock
They said, "OK. We’re opening bidding." The bidding ended at the end of April. They said, "By the beginning of June, when our contract expires, you need to have migrated all the data to our new system and be compatible with the old API from Socrata."
前後文Link in context 連結Link
Rufus Pollock
Socrata had that open API. No one else implemented it. Also you had to migrate all the data, comply with all the performance quality [indecipherable 75:55] . That’s classic stuff.
前後文Link in context 連結Link
Rufus Pollock
"Oh, yes. The API is open, but you need to be able to stand up, migrate all the data, do all the other stuff, to really have competition." Remember, the guy who’s losing the business is not going to be super helpful about migrating all this stuff in.
前後文Link in context 連結Link
Rufus Pollock
There will always be the features that they added that you really needed, whatever it is. It’s really hard to maintain true backwards compatibility for all the edge cases, for all the little bits of API that have been added that you didn’t notice.
前後文Link in context 連結Link
Rufus Pollock
The real guarantee of competition is open software. Ultimately you must have the stuff underneath open, or you don’t get real freedom of choice.
前後文Link in context 連結Link
Rufus Pollock
The thing is, "How do you get more open software?" Let’s just go to the next slide for a moment. This is what I want to emphasize. A lot of time we start saying, "We want more open software of something." The question is how you buy. That’s what you can change. Let’s just go to the next slide.
前後文Link in context 連結Link
Rufus Pollock
I’m sorry to have this annoying scroll.
前後文Link in context 連結Link
Audrey Tang
No. It’s fine.
前後文Link in context 連結Link
Rufus Pollock
What can you do? This is a little bit for the OECD. It’s a little bit like I customized it for them.
前後文Link in context 連結Link
Rufus Pollock
One of them is agile‑lean, another is explicitly favor open software in procurement deals in Taiwan. Do you have an explicit category that allows for open software?
前後文Link in context 連結Link
Rufus Pollock
How big is it? Have you guys ever bought...
前後文Link in context 連結Link
Audrey Tang
We do.
前後文Link in context 連結Link
Rufus Pollock
How big is the points relative to other items?
前後文Link in context 連結Link
Audrey Tang
It’s a section called intellectual property ownership. We just revamped that section.
前後文Link in context 連結Link
Rufus Pollock
Often what happens is people say, "We want to just have the right to the software." It’s often clever by the way.
前後文Link in context 連結Link
Rufus Pollock
If you’re smart, what you’ll do is you’ll say, "Everything that you’re going to buy..." I won’t name a vendor. Let’s say I’m vendor X. You’ll come and buy from me. I’ll say, "Yes. Yes. Everything I build under this contract, everything I build under this contract, will be open software."
前後文Link in context 連結Link
Rufus Pollock
What I build under this contract will just be the layer, a thin layer, on top of all my existing proprietary technologies. Yes. At the end of this contract, you will own this layer or it will be open, this layer of open software. It’s completely useless without the rest of the stack.
前後文Link in context 連結Link
Audrey Tang
Yes. We’re all painfully familiar with this phenomenon.
前後文Link in context 連結Link
Rufus Pollock
You’re all painfully familiar.
前後文Link in context 連結Link
Audrey Tang
We’ve been through this many, many times. [laughs]
前後文Link in context 連結Link
Rufus Pollock
This is why, guys. You not just being coders, but getting in there and writing the procurement rules, really carefully. Saying stuff like, "this scoring is not just for the software we buy under this contract, but it’s for the entire solution that must be needed to run it."
前後文Link in context 連結Link
Audrey Tang
Down to operating system level.
前後文Link in context 連結Link
Rufus Pollock
Exactly. The other one is like principles for buying agile. The two I want to talk about is you’ve got open. Buying agile makes a huge difference. It’s hard for governments, because they like to have spec and deliver. They find that hard to do.
前後文Link in context 連結Link
Audrey Tang
Yeah, we don’t have spec-free procurements here either.
前後文Link in context 連結Link
Audrey Tang
But I do see your point.
前後文Link in context 連結Link
Rufus Pollock
Open offsets. That’s the one I wanted to talk, which is setting up rather than the whole thing that gets you out of...
前後文Link in context 連結Link
Rufus Pollock
The thing still of traditional procurement, even when you procure open software, is you know what you’re buying. I want a chair. How much will it cost to have a chair? I want a website with this, this, and this feature.
前後文Link in context 連結Link
Rufus Pollock
To get out of that space, you either need to go the agile route, where you stop buying software and you buy development time basically.
前後文Link in context 連結Link
Audrey Tang
Which is what we’re doing in PDIS.tw.
前後文Link in context 連結Link
Rufus Pollock
Right. Good idea. The challenge of that is you end up having to build most of the software yourself.
前後文Link in context 連結Link
Audrey Tang
Well, we have a time frame. We put out a budget. We can still outsource standard-compliant solutions and welcome bidding.
前後文Link in context 連結Link
Audrey Tang
People who come up with WordPress or Drupal in their stacks automatically gets a better seat in the bidding.
前後文Link in context 連結Link
Rufus Pollock
I got that. What’s tough about it is still...
前後文Link in context 連結Link
Rufus Pollock
Let’s have a piece of software that will be valuable if we build it, to Taiwan, to Great Britain, to Brazil. Here in Taiwan you can procure agilely, but you then have to absorb either all of the cost for that piece of software...
前後文Link in context 連結Link
Audrey Tang
Yeah, all of the cost, mostly.
前後文Link in context 連結Link
Rufus Pollock
Right. Then that makes it hard for the pieces of software that are only feasible when they’re supplied to multiple countries. You have a coordination problem.
前後文Link in context 連結Link
Rufus Pollock
Now one route around that is rather than going through traditional procurement, where you say, "I want X, and I’m going to put up this amount of money for it," you can go to an open offsets model where we have this fund. We put money in it. Then we just pay out of the fund for what we turned out to use.
前後文Link in context 連結Link
Rufus Pollock
That means I, as an open software vendor, like of the classic VC company, I still have to take the risk. I am going to build this piece of software. If it gets used, I know I get paid out of these funds.
前後文Link in context 連結Link
Rufus Pollock
It means you invert it. The government no longer needs to know it has chairs. It just looks at the end of the year and says, "What did we use? Oh, we used 5 tables, 10 chairs, 3 tanks." [laughs] Whatever we bought, we’ll pay for out of our open offset fund. The vendors take the risk capital of creating the software in the first place.
前後文Link in context 連結Link
Audrey Tang
It’s an open alternative to the model of SaaS, basically.
前後文Link in context 連結Link
Rufus Pollock
Exactly. It’s the open version of SaaS. It’s even better, by the way, because you can distinguish in your payments in the fund. You can pay for the service part of the SaaS separate from the open information.
前後文Link in context 連結Link
Rufus Pollock
Why is that important? You don’t just want to pay the vendor who supplied you the SaaS service.
前後文Link in context 連結Link
Rufus Pollock
For example, let’s say someone supplies WordPress services to you. Your procurement will probably just cover the SaaS part of it, the part of running the service. You often won’t actually be paying, unless you’re buying new features, for any of the underlying development of WordPress, but you kind of care of that.
前後文Link in context 連結Link
Rufus Pollock
The open offsets fund could even be separated from the SaaS procurement. The offsets fund pays for the software.
前後文Link in context 連結Link
Audrey Tang
What I’m saying is the development needs to piggyback on subscriptions, because it’s the same cycle. We renew annually. This option is to renew annually, because conceptually it’s the same deliverable.
前後文Link in context 連結Link
Rufus Pollock
Exactly, it’s renewed annually. Let’s just scroll down one more. This is the guide. This is the things. This is why software is different from chairs and bridges. Switching cost is significant, and governments are bad at negotiating. Governments are pretty terrible at negotiating.
前後文Link in context 連結Link
Rufus Pollock
We can keep scrolling down, keep going. This is the current model and this would be the new model. The new model, government has these IT funds. You pay on use, which is the open offsets. You have spec‑and‑deliver and you have agile. You have three routes, rather the traditional just spec and delivery.
前後文Link in context 連結Link
Audrey Tang
What we have now is if it’s SaaS, then we call it the cloud procurement, which is the top part.
前後文Link in context 連結Link
Rufus Pollock
Which is pay‑on‑use.
前後文Link in context 連結Link
Audrey Tang
Which is pay‑on‑use.
前後文Link in context 連結Link
Rufus Pollock
That’s still not open offsets.
前後文Link in context 連結Link
Audrey Tang
No, it’s not.
前後文Link in context 連結Link
Audrey Tang
What I’m saying is that currently in Taiwan’s IT procurement, everything that’s SaaS is counted differently than the spec‑and‑deliver. We’re trying to convert the spec‑and‑deliver line here.
前後文Link in context 連結Link
Audrey Tang
That was the picture, but these are two different terms on the procurement sheet.
前後文Link in context 連結Link
Rufus Pollock
That’s great. The only thing I’d point out is that cloud procurement is just like traditional software buying, lock‑in. The problems of lock‑in to a cloud provider are just as big.
前後文Link in context 連結Link
Rufus Pollock
If you look at the cloud providers and you go and look at the ones in California, like Workday, any of the really big ones, and you look at their revenue recognition and the way they think about it, they’re clearly willing to lose massive amounts of money to acquire customers.
前後文Link in context 連結Link
Rufus Pollock
Why are they willing to do that?
前後文Link in context 連結Link
Audrey Tang
The want to sell the data?
前後文Link in context 連結Link
Rufus Pollock
No, not because they want to sell the data.
前後文Link in context 連結Link
Audrey Tang
OK, maybe not... [laughs]
前後文Link in context 連結Link
Rufus Pollock
Because you’re locked in. Once you’re on their platform, once you’re on Workday for HR, you’re on Salesforce, or you’re on these platforms, you are not going to switch off easily.
前後文Link in context 連結Link
Audrey Tang
Yeah...
前後文Link in context 連結Link
Rufus Pollock
They are running the same model that old software...that Windows ran. It’s no different. It’s just a recurring revenue rather than a fixed...Windows, you still pay for upgrades. People think it’s so different. Windows used to upgrade every two, three, four years. You bought it at the time, and then you upgraded four years later, but you kept having to upgrade.
前後文Link in context 連結Link
Rufus Pollock
If we keep going down, we’re nearly finished. But my advice for you guys, that’s the agile and lean part, and we go open offsets, it’s the Heartbleed bug.
前後文Link in context 連結Link
Rufus Pollock
Start tracking use. My question to you guys would be to say, in terms of policy that you could have, one is could you start tracking use? Is there any way to set up a small, at the beginning, open fund? Say to government, "Hey, could we take some of this cloud budget, maybe five percent right now, and put it in a fund?"
前後文Link in context 連結Link
Rufus Pollock
"It’s completely limited to open software. We’re not paying for stuff we actually use, like a SaaS license, but we look to the software inside of it. We’re paying the core developers. If EtherCalc gets used or Etherpad or HackMD or WordPress or CKAN gets used, you pay the original developers."
前後文Link in context 連結Link
Audrey Tang
We’ve been making some changes.
前後文Link in context 連結Link
Audrey Tang
Literally the first thing I did as the Digital Minister is to change the procurement rules.
前後文Link in context 連結Link
Rufus Pollock
Wow, it’s so great.
前後文Link in context 連結Link
Audrey Tang
I’ve been thinking about this for years.
前後文Link in context 連結Link
Rufus Pollock
You’ve got to do it. Ideas are cheap. Implementation...
前後文Link in context 連結Link
Audrey Tang
Right, and then we put it up for 60 days of public consultation. I think it’s drawing to a close now. Let’s look at the actual comments. It’s very tricky.
前後文Link in context 連結Link
Audrey Tang
There’s 14 days left. We can’t really do anything before that. I’m sorry that it still says OpenAPI, but it will say common API standards at some point.
前後文Link in context 連結Link
Rufus Pollock
Where’s the open fund in that proposal? Is there a proposal to A, in your procurement...how are you going to measure...let’s say procurement thing, and you’re going to have points, just to get really practical. You say, "OK, we want open software, and we’re going to give points in procurement for software being open."
前後文Link in context 連結Link
Rufus Pollock
How will you assess that? Do you check with the vendor? What’s the verification process to check the stack they supply is actually open?
前後文Link in context 連結Link
Audrey Tang
This is where the cloud procurement and the spec — or agile — procurement differs.
前後文Link in context 連結Link
Audrey Tang
We don’t have a good story for the latter; this is what all of us are painfully aware of.
前後文Link in context 連結Link
Audrey Tang
You can say, "You must use PostgreSQL," but you must know you want PostgreSQL going in. For many government agencies, this is simply out of their consideration, so they just say, "OK, the web application source code must be open," or something.
前後文Link in context 連結Link
Audrey Tang
Then it ends up paying a lot on Oracle license, which is your classic example.
前後文Link in context 連結Link
Rufus Pollock
It must be the whole stack, not just the...
前後文Link in context 連結Link
Audrey Tang
It’s true. We don’t have a good story here.
前後文Link in context 連結Link
Rufus Pollock
Without that, it doesn’t mean that much. That’s the problem. Without that...
前後文Link in context 連結Link
Audrey Tang
It just means you can swap out the top-layer application vendors.
前後文Link in context 連結Link
Rufus Pollock
The vendors are smart at hacking the system.
前後文Link in context 連結Link
Audrey Tang
Yes. That means the new vendor must still know Oracle to get the winning bid. Even they get a whole open source, open data system, they must know Oracle.
前後文Link in context 連結Link
Audrey Tang
We haven’t solved that.
前後文Link in context 連結Link
Audrey Tang
However, for the cloud part, this is my main target of this procurement change. First, it’s software‑as‑a‑service, but we’re insisting now running on local infrastructure.
前後文Link in context 連結Link
Audrey Tang
That actually rules out pretty much everybody who relies on this sticky lock-in. Even if they say it’s proprietary software, it still has to run on the data center here.
前後文Link in context 連結Link
Audrey Tang
We do it not for data localization purposes, but really for know‑how localization, so that people can locally inspect what’s going on.
前後文Link in context 連結Link
Audrey Tang
As you probably are aware, if they use Oracle and the Oracle Access Manager’s stored procedures, a lot of the procedures is plain text. It’s not binaries.
前後文Link in context 連結Link
Audrey Tang
Even if we don’t manage to convince to get the entire lower stack binaries to convert to open software, you can still inspect pretty much everything, during a running system, to figure out how it’s working.
前後文Link in context 連結Link
Rufus Pollock
I got that. Yes, it is great.
前後文Link in context 連結Link
Rufus Pollock
I guess my suggestion is, and I remain at your disposal, would be, one, see if you can get these numbers, which is, "How much was spent?" and "How much was spent on open?"
前後文Link in context 連結Link
Audrey Tang
We can measure that for cloud procurements.
前後文Link in context 連結Link
Rufus Pollock
Two, could you try getting open funds set up, whose sole purpose was to say, "We paid for the open software we used last year." It might be open SSL Library, it might be WordPress, it might be CKAN, but run an experiment. It could be two percent of the IT budget to say, "Let’s run an experiment next year and see if this delivers us value for money," and see what happens.
前後文Link in context 連結Link
Rufus Pollock
You could even limit it and say, "We’ll only spend it in Taiwan," which would be really good for the government. We’ll say, "We’ll spend it on open software, but it has to be a CKAN developer in Taiwan, it has to be a WordPress developer in Taiwan, it has to someone...we will spend this money, but it will be with local economy," which is very nice. The money stays in the economy. Just try and run and open fund.
前後文Link in context 連結Link
Rufus Pollock
The other is to try and write a rule in about assessing the entire stack. Don’t put the burden on the government. You could say, "A vendor, when bidding, has to assess what percentage," and you could do it in value terms or something. You could say to them, "In terms of the cost of running this stack, what percentage of it is open software?"
前後文Link in context 連結Link
Rufus Pollock
It’s up to you to say, if you’re going to spend $50,000 running X but you’re going to spend $100,000 buying Oracle licenses, the $50,000 means you’re only doing a third...
前後文Link in context 連結Link
Audrey Tang
There’s a part in the Digital Nation Plan that develops automated tools to look at all the source code and binaries that a bidding vendor submits, and then try to figure out, first, how much of it is open licensed, how much of it is Creative Commons license, which may not be open, by the way.
前後文Link in context 連結Link
Rufus Pollock
Yes. We can just say open license. Don’t say CC license. You say open license and non‑open license.
前後文Link in context 連結Link
Audrey Tang
Well, even if it’s CC‑ND, we want to know its license.
前後文Link in context 連結Link
Rufus Pollock
Then that’s great, but it’s like going into the discussion of all the weird freeware. I’d say open license, and then you say liberal license...
前後文Link in context 連結Link
Audrey Tang
Yeah. Technically it’s "free culture license" if we are talking in a CC way.
前後文Link in context 連結Link
Rufus Pollock
Don’t worry about the free cultural works definition. Open Definition covers all of that.
前後文Link in context 連結Link
Audrey Tang
I’m aware of that. I’m just saying there’s non‑code part, too.
前後文Link in context 連結Link
Rufus Pollock
Yeah, exactly.
前後文Link in context 連結Link
Audrey Tang
Saying "open software" doesn’t cover the whole of it, the non-software parts.
前後文Link in context 連結Link
Rufus Pollock
That’s a very good point.
前後文Link in context 連結Link
Audrey Tang
In any case, what I’m saying is that we are, as part of the Digital Nation Plan, developing this automated assessment tool, so we can get some useful numbers out of it. A breakdown of all the licenses, count of lines, and so on.
前後文Link in context 連結Link
Audrey Tang
The other easy part is to get the local talents and everybody trained to specific technologies...we count them as common things in the stacks, like Tensorflow, OpenStack, maybe Docker or something. Then we say, "OK, so for these critical parts of the infrastructure, since all the procurements..."
前後文Link in context 連結Link
Audrey Tang
You see the lines of code that use Docker, for example. Then we say, "OK, it’s important to have the local Docker community that is able to maintain this kind of thing." Then we grade people with three, maybe four levels, like being able to install and use it, maybe able to customize it, and to be able to contribute to it, for example.
前後文Link in context 連結Link
Audrey Tang
Then we want this many people empowered over four years. That’s part of the Digital Nation Plan. It’s in it already.
前後文Link in context 連結Link
Audrey Tang
The hard part is to tie this training budget with the cloud usage numbers. Currently in the Digital Nation Plan, these are completely different funds. This is for the enrichment of the community, and this is for reducing the licensing costs. We’re doing both, but not as the same project.
前後文Link in context 連結Link
Rufus Pollock
Maybe, first, if you’re going to write it on your piece of paper, because then it’ll go into your notes, I said there were three things I would suggest.
前後文Link in context 連結Link
Rufus Pollock
One was stacks, just how much is spent on open.
前後文Link in context 連結Link
Audrey Tang
Got it.
前後文Link in context 連結Link
Rufus Pollock
Number two is an open fund of five percent of the IP budget. That means that money is spent on an automated, algorithmic basis, based on just what open information we use. It could be content. By the way, it doesn’t have to be software, but just of X percent.
前後文Link in context 連結Link
Rufus Pollock
The third item is percentage, basically a procurement rule where you assess the percentage of open info in a bid, and there’s a points weighting for that. By the way, the points weighting must be very heavily on 100 percent. People make this mistake that somehow 50 percent open is as good as 100 percent open.
前後文Link in context 連結Link
Rufus Pollock
Unfortunately, it’s like imagine in all those movies...has anyone seen those James Bond movies, where at the very end he just has to remove one thing from the bomb to stop it detonating? Similarly, if you have an entirely open system, but one crucial part of it is closed, then all the value resides there. You need to give a big value for 100 percent, and then the moment you’re under 100 percent...
前後文Link in context 連結Link
Audrey Tang
You want a curve like a flipped power-law graph.
前後文Link in context 連結Link
Rufus Pollock
Yeah, exactly. You want to score a curve like that.
前後文Link in context 連結Link
Rufus Pollock
Otherwise, people just game the system with, "Oh, I got a little bit." It’s true that it’s better to have 50 percent than zero percent, but most will go, "Well, we’re using Python on Azure," or something like that. Yes, but...
前後文Link in context 連結Link
Rufus Pollock
I’m at time with you guys. I’ve got to get to another meeting in a moment, but I’ve really, really appreciated this, and really appreciated going through it with all of you guys.
前後文Link in context 連結Link
Rufus Pollock
Please be in touch. I know, Audrey, I’ve got your email. You’ve got my book on GitLab. Please keep in touch. If there’s anything else on anyone’s mind before I go today?
前後文Link in context 連結Link
Audrey Tang
Everybody can Tweet at @rufuspollock.
前後文Link in context 連結Link
Rufus Pollock
We can Tweet everybody.
前後文Link in context 連結Link
Audrey Tang
He likes and he re‑Tweets. Now I know this.
前後文Link in context 連結Link
Rufus Pollock
I check it every now and then...
前後文Link in context 連結Link
Rufus Pollock
It’s really good to see everyone.
前後文Link in context 連結Link
Audrey Tang
Yup.
前後文Link in context 連結Link
Rufus Pollock
Thank you. One last thing, could I get a picture. I’d like to put a picture up.
前後文Link in context 連結Link
(laughter)
前後文Link in context 連結Link
Rufus Pollock
I don’t normally do this, but...
前後文Link in context 連結Link
Shuyang Lin
I’ll pull it together with the record file.
前後文Link in context 連結Link
Rufus Pollock
Can you take a picture as well?
前後文Link in context 連結Link
Shuyang Lin
Yeah.
前後文Link in context 連結Link
Rufus Pollock
Can you take one? That’s probably even better, and you can send it to me under a CC BY license.
前後文Link in context 連結Link
Audrey Tang
We do CC0 here.
前後文Link in context 連結Link
Rufus Pollock
CC Zero is...
前後文Link in context 連結Link
Audrey Tang
...is compatible with everything.
前後文Link in context 連結Link
Shuyang Lin
Three, two, one. There you are.
前後文Link in context 連結Link
Rufus Pollock
Is that good? Are you happy?
前後文Link in context 連結Link
Shuyang Lin
One, two, three, cheese.
前後文Link in context 連結Link
Rufus Pollock
One, two, three, cheese.
前後文Link in context 連結Link
Shuyang Lin
Yay.
前後文Link in context 連結Link
Rufus Pollock
Yeah.
前後文Link in context 連結Link
Shuyang Lin
OK.
前後文Link in context 連結Link
Rufus Pollock
Thank you very much.
前後文Link in context 連結Link
Shuyang Lin
Thank you very much.
前後文Link in context 連結Link
Rufus Pollock
Real pleasure. 謝謝.
前後文Link in context 連結Link