Serverless Stack - Episode 02 - Focus on IAM

Before reading this article, it would be useful to understand:
You can always read tons of nitty gritty AWS documentation here.

My adventure from last episode continues here, tying broad IAM concepts to some specific use cases involving resources. As discussed in the previous episode, IAM (Identity and Access Management) is an AWS service that, basically, allows us to manage access to our AWS resources. This includes controlling whether users can access our resources, and which users can, as well as what they can do with those resources, all with fine granularity.

Recall that the relationship between a user and a resource is not direct. What we are really doing in AWS IAM is creating policies that govern access to resources. These policies can then be applied to users via roles, thus granting those users access to perform specified actions against specified AWS resources.

Here is a way of thinking that has helped me understand the IAM paradigm. When we create a user in IAM, that user can log into our AWS account console with login credentials, just like the user accounts we're used to dealing with elsewhere. But this user does not inherently have permission to do anything! According to the AWS way of doing things, we do not provide any permissions directly to this user. By its nature, a user can't do anything, or access anything... until we give them a role.

A role is simply a collection of policies. For example, we might create an S3 admin role with related policies, such as a policy to create and delete S3 buckets, a policy to add and remove files in our buckets, and a policy to read files from our S3 buckets. We might then create a more restricted S3 dev role that only has policies to add, remove, and read files, but not to mess with buckets. Finally, we might create an S3 read role that only has a policy to enable reading files from S3. (In this simple case, it would be silly to build all of this by hand, since AWS provides common, prepackaged equivalents, such as its managed S3 full-access and read-only policies, among many others; but this will serve as an example.)
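Here's a rough sketch of those three example roles expressed as IAM-style JSON policy documents, built in Python so they're easy to inspect. The bucket name, the `allow` helper, and the role names are hypothetical; the `Version` string, the `Effect`/`Action`/`Resource` keys, and the `s3:` action names follow IAM's policy grammar. In real AWS, a role would also need a trust policy saying who may assume it, which this sketch leaves out.

```python
import json

# Hypothetical bucket; "arn:aws:s3:::<name>" is the shape of an S3 bucket ARN.
BUCKET_ARN = "arn:aws:s3:::example-cyoag-bucket"
OBJECTS_ARN = BUCKET_ARN + "/*"  # every object inside that bucket


def allow(actions, resource):
    """Build a minimal policy document granting `actions` on `resource`."""
    return {
        "Version": "2012-10-17",  # current IAM policy language version
        "Statement": [
            {"Effect": "Allow", "Action": list(actions), "Resource": resource}
        ],
    }


# One small, reusable policy per capability described above.
MANAGE_BUCKETS = allow(["s3:CreateBucket", "s3:DeleteBucket"], BUCKET_ARN)
WRITE_FILES = allow(["s3:PutObject", "s3:DeleteObject"], OBJECTS_ARN)
READ_FILES = allow(["s3:GetObject"], OBJECTS_ARN)

# Roles as collections of policies; the same policy objects are shared across roles.
ROLES = {
    "s3-admin": [MANAGE_BUCKETS, WRITE_FILES, READ_FILES],
    "s3-dev": [WRITE_FILES, READ_FILES],
    "s3-read": [READ_FILES],
}

if __name__ == "__main__":
    print(json.dumps(READ_FILES, indent=2))
```

Note how `READ_FILES` is written once and reused by all three roles, which is the reusability the next paragraph describes.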

What it boils down to is that policies are what actually grant permission to access resources. Policies are reusable and can be applied to any number of roles. Roles are then handed out to users, like an access badge. Roles are also reusable: one role can be handed out to many users, and a single user can carry multiple roles. Roles can also be granted to external users (e.g. AWS users who belong to other organizations), to AWS services, which aren't humans but need to interact with each other in safe, restricted ways, and to others. As you can see, this is a complex system, and we haven't even begun to dive into the granularity AWS gives us to manage access to our resources! But this is a start.
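To make the badge model concrete, here is a toy Python model of the relationships described above: policies are sets of (action, resource pattern) grants, roles bundle policies, users carry roles, and anything not granted is denied. The class names, `is_allowed`, and the bucket ARNs are hypothetical. This is not how AWS actually evaluates permissions (real IAM also has explicit Deny statements, conditions, permission boundaries, and more); it only captures the default-deny idea and the reuse of policies and roles.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Policy:
    name: str
    grants: frozenset  # (action, resource pattern) pairs


@dataclass
class Role:
    name: str
    policies: list


@dataclass
class User:
    name: str
    roles: list = field(default_factory=list)


def is_allowed(user, action, resource):
    """Allowed only if some policy on some role grants it; otherwise denied."""
    for role in user.roles:
        for policy in role.policies:
            for granted_action, pattern in policy.grants:
                # Simplified '*' wildcard matching; real IAM evaluation is richer.
                if granted_action == action and fnmatchcase(resource, pattern):
                    return True
    return False  # default deny: a user with no roles can do nothing


# One policy reused by two roles, and a user who can carry several roles.
read_files = Policy("read-files", frozenset({("s3:GetObject", "arn:aws:s3:::example-bucket/*")}))
write_files = Policy("write-files", frozenset({("s3:PutObject", "arn:aws:s3:::example-bucket/*")}))
reader = Role("s3-read", [read_files])
dev = Role("s3-dev", [read_files, write_files])

alice = User("alice")
```

A freshly created `alice` can't read anything; appending `reader` to `alice.roles` lets her read, and appending `dev` as well adds write access without duplicating the read policy.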

The final thing I'll mention this episode is the ARN, or Amazon Resource Name, because this is the concept that gave me the most headaches in my past run-ins with AWS resource management. Essentially everything in AWS has an ARN, which is a "globally unique identifier" for a resource in the AWS ecosystem, including users! When we design or assign policies (to attach to roles that we grant to the actors in our application architecture), we need to know the relevant ARNs, because that is the name by which policies refer to resources in our ecosystem.
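ARNs follow the general pattern `arn:partition:service:region:account-id:resource`, and some fields are simply empty: IAM is global, so IAM ARNs have no region, and S3 bucket ARNs omit both region and account. A small Python parser makes that structure visible. The `Arn` type and `parse_arn` are illustrative names; `123456789012` is the placeholder account ID AWS uses in its own docs, and the user, bucket, and function names are made up.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str  # usually "aws"
    service: str    # e.g. "iam", "s3", "lambda"
    region: str     # empty for global services such as IAM
    account: str    # empty for S3 bucket ARNs
    resource: str   # may itself contain ':' or '/'


def parse_arn(arn: str) -> Arn:
    # Split at most 5 times so any ':' inside the resource part is preserved.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    _, partition, service, region, account, resource = parts
    return Arn(partition, service, region, account, resource)


if __name__ == "__main__":
    print(parse_arn("arn:aws:iam::123456789012:user/alice"))
    print(parse_arn("arn:aws:s3:::example-bucket"))
```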

Serverless Stack - Episode 01 - The Beginning

In a past job I was exposed to AWS, and in particular to "serverless" hosting (via AWS Lambda). I remember loving it, and I've decided to revisit it to deepen my knowledge and make it a more reliable part of my software development toolbox. A friend pointed me toward the Serverless Stack tutorials online, and I'm going to try them out. They seem to present a pretty comprehensive look at setting up a simple but feature-complete application using the serverless strategy, in conjunction with a few other technologies I want to deepen my knowledge of (such as JavaScript, React, and NoSQL). The goal is to complete this tutorial while gathering my findings and notes here in blog format, because as we all know...

The best way to learn is to teach.

So where to begin? Well, first of all, "serverless" doesn't mean serverless, per se. It means that your software doesn't get its own server, not even its own VM. Your code still runs on a server, somewhere... it's just that Lambda runs your code on a shared server, where code for other applications is also being run. An isolated, transient environment is created for your code to run in, with all of the required security provisions, and it's torn down once it's no longer needed (Lambda may keep it around briefly and reuse it for later invocations).
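A minimal sketch of what "your code" looks like inside that environment, assuming the Python runtime: Lambda calls a handler function with an `event` and a `context` object (`lambda_handler` is the conventional name, but the handler is configurable). Module-level code runs once when an environment is created; if Lambda reuses a warm environment for a later invocation, module-level state survives, which the counter below exposes. The returned field names are made up for illustration, and you shouldn't rely on reuse, since it isn't guaranteed.

```python
import time

# Runs once per execution environment (a "cold start"), not once per request.
_ENVIRONMENT_STARTED = time.monotonic()
_invocations = 0


def lambda_handler(event, context):
    """Entry point Lambda invokes with the triggering event and a runtime context."""
    global _invocations
    _invocations += 1  # only persists if Lambda reuses this environment
    return {
        "invocations_in_this_environment": _invocations,
        "environment_age_seconds": round(time.monotonic() - _ENVIRONMENT_STARTED, 3),
    }
```

Called locally as `lambda_handler({}, None)`, the count goes up on every call, because the same Python process stands in for a warm environment.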

One general note about AWS is that its development console is always subject to change. I've seen quite a few iterations of it over the years. Typically, an iteration of the console survives for less time than it takes me to complete a single project, though the changes are usually small enough not to impede usability. For newer users unfamiliar with AWS, however, following tutorials can be challenging. When I started the Serverless Stack tutorials, for example, the AWS console already didn't match the tutorial guides. It's a small thing, but spent cycles are spent cycles. The AWS console can be a little challenging to learn, but once you get started it becomes more natural. The UI, while ever shifting like the sands of a lost desert, tends to present a clear path to the features you need (as long as you know what it is you need, which is where the tutorial will hopefully come into play).

I digress. The first task we encounter in the Serverless Stack tutorial is creating IAM (Identity and Access Management) users. IAM is the AWS service that allows us to create and manage AWS users, not the end users of our application. In other words, we can create users who have access to log in to AWS and requisition resources, manage application content, view logs, etc. Interestingly and importantly, this also covers our applications: the applications and services we run on AWS are, themselves, users in this sense, and they need roles defined for them in IAM.
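For the applications-as-users point, the piece that lets an AWS service use a role is the role's trust policy. Below is a minimal sketch, in Python, of the trust policy that lets AWS Lambda assume a role. The helper name is made up; the `Principal`/`Service` element and the `sts:AssumeRole` action are IAM's standard form for this.

```python
import json


def service_trust_policy(service_principal):
    """Trust policy allowing an AWS service (e.g. Lambda) to assume a role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service_principal},
                "Action": "sts:AssumeRole",
            }
        ],
    }


LAMBDA_TRUST = service_trust_policy("lambda.amazonaws.com")

if __name__ == "__main__":
    print(json.dumps(LAMBDA_TRUST, indent=2))
```

With boto3, a document like this is what you'd pass, serialized to a JSON string, as the `AssumeRolePolicyDocument` argument of the IAM client's `create_role` call; the permission policies that say what the function may actually do are attached to the role separately.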

The users we create are then the ones we and our applications use to run and manage our application on AWS. It is recommended that we not use the root user for these scenarios, because mistakes we make as root have no fail-safes in place to prevent erroneous behavior (such as turning on an expensive resource and leaving it running for days before we realize what we did wrong). Working as a restricted user forces an extra layer of attentiveness and intention onto the process of managing our AWS application.


Switching to React - First Impressions

In earlier posts I wrote that I planned to use Handlebars as my templating engine for the CYOAG project. Since then, things have happened that caused me to change direction toward ReactJS. I'll take a moment here to say that I've had no contact with or sponsorship from the Handlebars or React teams. This is all just my opinion, formed from personal experience working with these technologies at the side-project level. Incidentally, I'm enjoying working with React in spite of my opinion of Facebook, which we have to thank for React. So there's some food for thought.

So why did I make the switch?

I realized Handlebars was more challenging to learn than a templating system needs to be. I didn't find community support to be as extensive or helpful as I'd like. Handlebars also hasn't enjoyed industry adoption as extensively as React, so from a career growth aspect React seemed a better choice. Ironically, I originally chose Handlebars because we used it on a project at work. But after talking to the lead dev on that project, I learned it had been selected not for its quality or utility, but due to circumstance. Finally, having built my first working Handlebars prototype, it just didn't feel right. There's no remedy for that.

So, React. It's already big in the web dev scene, has been for some time, and continues to grow. Whether it'll become a longstanding mainstay in industry remains to be seen, but signs point to yes.

I'm surprised to say it, but React was easier to pick up than Handlebars. That surprises me because React has lots of new ideas and syntax you have to learn, while Handlebars has relatively few and advertises itself as a "no frustration" templating system, which, I can attest, is patently false. React, likewise, advertises itself as "painless," also a lie. I'm being a little harsh here, but the point is that I find React less painful than Handlebars, even though neither is a walk in the park.

To be fair, Handlebars lends itself better to smaller projects. The infrastructure and planning required to use it are lighter. To get Hello World up in React, you have to do more work, download more libraries, and learn more technology: JSX, props, state, rendering, components and their lifecycles, etc. In Handlebars, you slap up a template and you're off to the races (relatively-ish). Build processes are another story: I found precompilation roughly equally difficult in Handlebars and React. The documentation is better for React, though, thanks to superior community support. More community support means more flavors of help, and a greater likelihood that one of them will match your style of learning.

The appeal of React is reusability, structure, and organization. From a mile-high perspective, let's say you're the app lead for Facebook, and you want a UI component to let people "Like" things. All kinds of things: wall posts, images, events, shared links, comments, everything. With React and some proper planning, you can make a single component displaying a "Like" link that dynamically hooks up to whatever needs to be "Like"d. The component itself doesn't care what it belongs to. You can plug and play it wherever. One downside, as mentioned above, is that the initial development of this component is relatively heavy. Once you have it built, though, you can reuse it dozens, hundreds, millions of times, at very little dev cost.

Another downside is that you have to be conscious of your component structure when building larger applications. CYOAG is not a large application, but if I hadn't thought about component structure in advance, and thus planned out how state was to be managed and communicated among components, I'd have found myself having a real bad time. The complexity of component interactions isn't hard to wrap your head around, but once you've built them, that complexity makes maintenance and modification a chore, and a considerable risk.

Part of this is because state and props "cascade" (my term) from parent components to child components when you build an application The React Way. So if anything changes along the cascade path, you have to make sure you account for it at every step and in every handler. Of course, things change as we go; that's life. But I think we can all agree it's to our advantage to minimize this, and with React, both the payoff of good design and the cost of poor design are amplified.

In short, React encourages better software engineering principles. Other technologies allow you to start hacking out a solution before thinking it through, which can lead to other kinds of headaches down the road.

Non-technical Example

We can use my (thus far incomplete) React component design doc for CYOAG as an example. I won't bother quoting the doc here; you can read it if you like. In short, the application state is best managed at the top level. One easy-to-explain reason is that if you let children manage state, you'll quickly find yourself making more calls just to integrate components, and having a hard time communicating among them. So we manage the state at the top level and provide getters and setters for particular parts of the state to children through props.

When building your components, it's helpful to know what state you'll need to pass down to children, from the ancestor's point of view; what props to expect to receive, from the descendant's point of view; and what functions will be needed to manage and pass these around. And later, if changes need to be made, you need to keep this "flow" of state in mind to preserve it.

These are my early days with React, and my opinions and understanding will undoubtedly evolve as I build more applications with it. Once I start using this code "in production" on CYOAG, I'll have plenty more to say, I'm sure.

Thanks for reading!
- Steven Kitzes


Circular Dependency in Relational Databases

This post is going to have an element of storytelling to it, because while some elementary understanding of SQL will be required, the Cliff's Notes seem to tell it all:

You probably want to avoid circular dependencies in relational databases.

These Cliff's Notes are incomplete, a little vague, a bit of an oversimplification, and up for debate besides. Some RDBMs (relational database managers) handle circular dependency better than others, and some problem domains necessitate circular dependency. Take, for example, table representations of nodes in a graph that allows cyclical paths.

However, not all problems necessitate circular dependency, and some RDBMs prohibit, or aren't capable of handling, circular dependencies. In these cases, we need to design around the limitations of our RDBM, and in any case, it's good to be aware of circular dependencies, just on principle. The purpose of this post is thus to share my own early explorations of database design, and to describe how I broke out of the circular dependency trap on a small side project.

Some project context will help here. I've been working on a passion project for a little while now. The high level idea is to build a "Create Your Own Adventure" game, where all content is user-generated, and the best story line branches rise to the top through a voting system like that used by Reddit and StackOverflow. An important early part of this project involved designing a decent database schema to build the rest of the project on top of.

You may wonder why I chose an RDB (relational database) for this in the first place, rather than an alternative such as some flavor of NoSQL. I plan to host this on the AWS (Amazon Web Services) free tier, so I can learn more about AWS. Unfortunately, at the time I started this project, the AWS free tier didn't offer any option other than RDBs. Incidentally, an RDB is almost certainly best for this type of application due to the interrelationships among different types of data, but that's a topic for another time.

So, how did a circular dependency surface in my design?

One feature I want this app to have is position persistence. When a visitor is reading the story, I want to remember where they left off, so they can conveniently come back and resume reading the story at a later date. My plan was to give the Users table a column called Current, to track a given user's position in the story. In other words, a story node would appear as a foreign key in the Users table.

The Nodes table was designed with an Author column to record who had authored a given story node. This was to support a variety of features, such as prohibiting authors from voting on their own posts, and allowing an author to edit their contributions, while protecting those posts from being edited by others. Unfortunately, you'll notice this means a user (author) is now a foreign key in the Nodes table.

Uh oh.

Now I have a node ID as a foreign key in the Users table, and a user ID as a foreign key in the Nodes table. This is a circular dependency of sorts, and it's a sticking point in SQL. You'll find some argument as to whether circular dependencies are fundamentally bad practice, or whether SQL just suffers from some unhappy limitations. After all, what's conceptually wrong with tracking the current node in the Users table, and the author in the Nodes table? There's no conflict of authority over the data, since a user doesn't own its current node in this read-only context; it's just a position marker. To be fair, SQL doesn't understand our motivations, and whether an RDBM's designers could have done more to let us convey these human intuitions to the machine is too philosophical a question for this post.

Whatever the reason, and whatever our opinion of it, MySQL resists this kind of interdependency: with foreign key checks enabled, a table can't declare a foreign key to a table that doesn't exist yet, so Users and Nodes can't simply be created with both foreign keys in place. The solution I came to was to create a simple table called Positions, with a compound primary key made of two foreign keys: a user ID from the Users table and a node ID from the Nodes table. Despite some apparent duplication of information, this works for the simple reason that the Positions table relies on Users and Nodes, but neither Users nor Nodes relies on Positions. I'll try to illustrate the improvement in design without opening Visio:

An arrow (->) implies dependency: "A -> B" means table A holds a foreign key into table B.

Original design:

Users -> Nodes
Nodes -> Users

As you can see, this yields Users -> Nodes -> Users, and so on, ad infinitum. This is bad.


Revised design, with the Positions table:

Nodes -> Users
Positions -> Users
Positions -> Nodes

We can combine two of these lines together to yield:

Positions -> Nodes -> Users
Positions -> Users

As you can see, this doesn't introduce any loops, or circular dependencies, the way my first design did.
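The same check can be automated: treat each arrow as "must be created after," and a topological sort either yields a valid table creation order or reports a cycle. This is a sketch in Python using the standard library's `graphlib` (Python 3.9+) and an in-memory SQLite database as a stand-in for MySQL. The `author`, `user_id`, and `node_id` columns mirror the post; the other column names and the helper functions are illustrative, not the actual CYOAG schema.

```python
import sqlite3
from graphlib import CycleError, TopologicalSorter

# "A -> B" from the diagrams: table A holds a foreign key into table B.
ORIGINAL = {"Users": {"Nodes"}, "Nodes": {"Users"}}
REVISED = {"Nodes": {"Users"}, "Positions": {"Users", "Nodes"}}


def creation_order(depends_on):
    """Return a table creation order, or None if the dependencies form a cycle."""
    try:
        # Values are treated as predecessors, so every referenced table
        # comes out before the tables that reference it.
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError:
        return None


# Hypothetical DDL for the revised design.
DDL = {
    "Users": "CREATE TABLE Users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "Nodes": """CREATE TABLE Nodes (
        id INTEGER PRIMARY KEY,
        author INTEGER NOT NULL REFERENCES Users(id),
        body TEXT NOT NULL)""",
    "Positions": """CREATE TABLE Positions (
        user_id INTEGER NOT NULL REFERENCES Users(id),
        node_id INTEGER NOT NULL REFERENCES Nodes(id),
        PRIMARY KEY (user_id, node_id))""",
}


def build_schema(conn):
    order = creation_order(REVISED)
    if order is None:
        raise ValueError("circular dependency between tables")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
    for table in order:
        conn.execute(DDL[table])
    return order


if __name__ == "__main__":
    print(creation_order(ORIGINAL))  # None: Users and Nodes depend on each other
    conn = sqlite3.connect(":memory:")
    print(build_schema(conn))
    conn.execute("INSERT INTO Users VALUES (1, 'alice')")
    conn.execute("INSERT INTO Nodes VALUES (10, 1, 'You wake up in a cave.')")
    conn.execute("INSERT INTO Positions VALUES (1, 10)")  # alice is reading node 10
```

One design note: the compound primary key, as described above, allows a user to have several rows in Positions. If each reader should have exactly one saved position, making `user_id` alone the primary key would enforce that.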

Thanks for reading!
- Steven Kitzes


Alexa Skill Hackathon Takeaways

This past weekend we had a hackathon at work, focused on developing Skills for the (relatively) new Amazon Echo. The purpose of the hackathon was to expose us to a technology we hadn't used before and explore what use cases might exist that we could leverage either internally, or for customers. We had a great time and built some really fun Skills along the way to learning the Alexa Skill development tools. We found there were pros and cons, as with anything, and I personally took away a few key lessons from my short time working with the Echo.

I want to take a moment here to point out that I wasn't paid or encouraged by anyone, anyone at all, to share my experience on this blog. Amazon, to my knowledge, had no part in this hackathon; it was an internal event we did just for fun and exploration. I just want to solidify what I learned by revisiting it in a writeup. So let's talk about the Echo!

When you buy the Amazon Echo, you are buying a very nice piece of hardware that connects you to a service named Alexa, which is where the magic happens. You make your voice request, and Alexa uses Skills (voice applications) to do something useful or fun in response. First of all, I want to say that Alexa is a lot of fun to work with from a creative product standpoint. Working with Alexa's voice API opens up a vast landscape of opportunities to be creative in ways that feel fresh and new. Alexa is also a lot of fun to use from the consumer standpoint, for me at least. I've heard many people say they don't know what they would do with an Echo if they had one; to anyone with that complaint, I'd say the old cliché holds true here: there's an app for that.

Alexa has tons of capabilities out of the box, covering everything from checking the weather, to listening to podcasts, news, and music, to telling jokes, to making purchases on Amazon through your account. The system can integrate with home automation tools, and work in concert with apps on your phone to manage shopping lists, to-do lists, and other handy utility features. Additional Skills can be obtained from the community through the Skill Store to cover all kinds of additional fun and useful scenarios. Alexa has lots to offer.

In case this is reading too much like an advertisement, don't worry. There's plenty for Amazon to work out with the Echo before I would ever buy one at full price. That brings me to my first concern: I find it unjustifiably expensive. You're buying a speaker with a hardcore microphone array, connected to the web via Wi-Fi; that's really all the hardware you get, and it comes in just shy of $200. It is a very nice enclosure, and the speaker itself is exemplary, but without internet this thing is a brick. Apart from listening for its wake word, the Echo does no real work locally. All services are performed by Alexa on Amazon's side and piped back to you over the web. I can go to OK Google, or Siri, or even Cortana for a lot of what Alexa can provide, so $180 feels brutal for the sole perk of having your voice assistant always on. A big downside of the Echo, compared with voice services provided by Google, Apple, and even Microsoft, is that the Echo is immobile, whereas your phone is always with you. Adding to the cost, some of the handy features (music providers being a prominent one) require subscription fees; and the buy-by-voice feature, while convenient, can feel a little dangerous, since Alexa doesn't recognize which voice is yours. (Buying each other socks without permission became a running gag throughout the hackathon.)

Along those lines, Alexa doesn't always understand voice input clearly, so you sometimes find yourself repeating commands to get her to hear you right. I find Google's voice recognition to be a tad more reliable. Alexa also does poorly if you try to give her a command in a room with other people speaking. She can't differentiate among voices, and this is very frustrating at times. Granted, this is an emerging technology, and natural language and voice recognition are very challenging problems for computers to handle, but from a practical, daily usage standpoint, this is often a frustrating shortcoming.

By far, though, my biggest gripe about Alexa is as a developer. Every technology has its idiosyncrasies, but I found learning the Alexa Skill developer tools to be a particularly harrowing experience. First of all, the default path is to host your Skills on AWS Lambda (the main alternative is running your own HTTPS endpoint). AWS Lambda is a cool idea, but developing Alexa Skills on it is a new kind of challenge, and not a particularly fulfilling one. The tools for building Skills are a little clunky, and the documentation is practically non-existent. They do have some nice features, like the ability to test your Skills without going through the time-consuming process of conversing with Alexa repeatedly. However, to deploy an Alexa Skill you need to use two separate dashboards: the Skill's logic is deployed on AWS Lambda through the AWS console, but you have to separately log into the Amazon Developer Console to access Alexa configurations and tools for defining things like user interactions and recognized user phrases. Where is this documented? Great question.

The feedback from the system when something goes wrong is minimal. You'll get messages like "Syntax error." That's it. What kind of syntax error? Where? Your only options are to lint the code yourself and hope any mistakes can be found that way, or to be so good at programming that you never make typos or logic errors. There's not much available in terms of debugging help. By its nature, AWS Lambda is also weak at session and state management, since functions are meant to be stateless between invocations. This is one reason why some of us at the hackathon resorted to using Alexa as a forwarding service, simply translating voice into API calls to a remote server (EC2 in this case) that hosted more complex service implementations outside of AWS Lambda.
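Here's a minimal sketch of that forwarding pattern, assuming the Python runtime on AWS Lambda. The request and response shapes (`request.type`, `intent.name`, `slots`, `outputSpeech`, `shouldEndSession`) follow the Alexa Skills Kit JSON interface; `BACKEND_URL`, the intent names, and the backend's `{"text": ...}` reply format are hypothetical placeholders for whatever the remote server exposes. The network call is passed in as a function so the Alexa-facing logic can be exercised without a server.

```python
import json
import urllib.request

# Hypothetical remote service (e.g. running on EC2) that does the real work.
BACKEND_URL = "https://example.com/alexa-backend"


def speech(text, end_session=True):
    """Build a plain-text Alexa response."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def fetch_from_backend(intent_name, slots):
    """POST the intent to the remote server; assumes it replies with {"text": ...}."""
    body = json.dumps({"intent": intent_name, "slots": slots}).encode("utf-8")
    req = urllib.request.Request(
        BACKEND_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read())["text"]


def handle(event, fetch):
    """Translate an Alexa request into a backend call and wrap the answer as speech."""
    request = event.get("request", {})
    if request.get("type") != "IntentRequest":
        # e.g. a LaunchRequest: greet and keep the session open for a follow-up.
        return speech("Welcome. What would you like to do?", end_session=False)
    intent = request["intent"]
    slots = {name: slot.get("value") for name, slot in intent.get("slots", {}).items()}
    return speech(fetch(intent["name"], slots))


def lambda_handler(event, context):
    return handle(event, fetch_from_backend)
```

Keeping Lambda this thin means any session or story state lives on the remote server, which sidesteps Lambda's statelessness rather than fighting it.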

As mentioned earlier, the documentation for this set of developer tools is sorely lacking. The official Amazon guide points to a blog post from 4 months ago that is not only incomplete but already out of date, with screenshots and instructions that are flat-out incorrect. It was a headache trial-and-erroring our way through developing our first Alexa Skill and hunting down third-party tutorials for Skill development. We were able to pull it off, but it was a lot harder than it needed to be. It could have been an hour-long process if the documentation had been halfway complete, or at least up to date. I would expect sketchy documentation from open source projects made by volunteers, not from a top tech company, on a service they profit from, that other developers are meant to use as part of a business model. Maybe I expect too much, but fair or not, that is my expectation of a for-profit system provided by a sixty-five-billion-dollar tech company.

It really doesn't feel like Amazon is bringing their A-game to the Alexa developer community. If these issues cropped up during an open beta or in a staging environment, that would be one thing, but this is a public-facing, monetized platform. One or two of the features are sub-labeled as beta as of this writing, but the features work fine; the process and interface design are just abysmal, and the documentation is being provided by home-use amateurs because Amazon isn't providing it. I personally find that to be a weak effort on Amazon's part.

Overall, I do like the Echo and Alexa, and I would like to have one if the price were more reasonable. I find it fun and useful. I would also be excited at the opportunity to do more development on Alexa Skills now that I know the process, but it was needlessly painful to learn the ropes compared with other platforms.

Thanks for reading!
- Steven Kitzes